Relational Learning via Collective Matrix Factorization, SIGKDD 2008
A Bayesian Matrix Factorization Model for Relational Data, UAI 2010
Authors: Ajit P. Singh & Geoffrey J. Gordon
Presenter: Xian Xing Zhang
Basic ideas
• Collective matrix factorization (CMF) is proposed for relational learning when an entity participates in multiple relations.
• Several matrices (with different types of support) are factored simultaneously, with shared parameters for shared entities.
• CMF is extended to a hierarchical Bayesian model to enhance the sharing of statistical strength.
An example application: functional Magnetic Resonance Imaging (fMRI)
• fMRI data can be viewed as a real-valued relation, Response(stimulus, voxel) ∈ [0, 1]
• Stimulus side information is a binary relation, Co-occurs(word, stimulus) ∈ {0, 1}, collected as statistics of whether the stimulus word co-occurs with other commonly used words in a large corpus
• The goal is to predict unobserved values of the Response relation
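As a rough sketch of the data layout (the sizes below are made up for illustration; the real dimensions come from the fMRI study):

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, r = 5000, 60, 500  # hypothetical counts: words, stimuli, voxels

# Co-occurs(word, stimulus): binary m x n matrix X
X = (rng.random((m, n)) < 0.05).astype(float)

# Response(stimulus, voxel): real-valued n x r matrix Y in [0, 1]
Y = rng.random((n, r))

# The stimulus entity indexes the columns of X and the rows of Y,
# so a shared stimulus factor can tie the two factorizations together.
```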
Basic model description
• In the fMRI example, the Co-occurs relation is an m×n matrix X; the Response relation is an n×r matrix Y.
• Each matrix gets its own likelihood: Co-occurs (p_X) is modeled by a Bernoulli distribution, Response (p_Y) by a Gaussian.
• The two factorizations share the factor associated with the stimulus entity, which is what couples them.
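A minimal NumPy sketch of the resulting joint objective, assuming a logistic link for the Bernoulli matrix and an identity link for the Gaussian one; the single mixing weight alpha is a simplification of the paper's per-matrix weighting:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def cmf_neg_log_lik(U, V, Z, X, Y, alpha=0.5, sigma2=1.0):
    """Joint negative log-likelihood of a two-matrix CMF model.

    X (m x n, binary) ~ Bernoulli(sigmoid(U @ V.T))   # Co-occurs
    Y (n x r, real)   ~ Gaussian(V @ Z.T, sigma2)     # Response
    V (stimulus factors) is shared between both factorizations;
    alpha trades off the two relations' contributions.
    """
    P = sigmoid(U @ V.T)
    eps = 1e-10  # guard against log(0)
    bern = -np.sum(X * np.log(P + eps) + (1 - X) * np.log(1 - P + eps))
    gauss = np.sum((Y - V @ Z.T) ** 2) / (2 * sigma2)
    return alpha * bern + (1 - alpha) * gauss
```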
Hierarchical Collective Matrix Factorization
• Information between entities can only be shared indirectly, through another factor: e.g., in f(UV′), two distinct rows of U are correlated only through V.
• The hierarchical prior acts as a shrinkage estimator for the rows of a factor, pooling information indirectly through Θ.
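A sketch of the shrinkage effect, assuming Θ = (μ, Σ) parameterizes a Gaussian prior over the rows of a factor (the paper places a further hyperprior on Θ, which is omitted here):

```python
import numpy as np

def gaussian_prior_neg_log(U, mu, Sigma):
    """Prior term: -log N(u_i | mu, Sigma) summed over the rows of U.

    Every row of U is shrunk toward the shared hyperparameters
    Theta = (mu, Sigma), so information is pooled across rows even
    though the likelihood couples them only indirectly through V.
    """
    d = U - mu                                  # broadcast mu over rows
    Sinv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', d, Sinv, d)  # sum of Mahalanobis terms
    return 0.5 * (quad + U.shape[0] * logdet)
```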
Bayesian Inference
• Hessian Metropolis-Hastings (HMH):
• Random-walk Metropolis-Hastings samples from a Gaussian proposal whose mean is the current sample F_i(t) and whose covariance is fixed, which is problematic: the proposal ignores the local shape of the posterior, and its scale must be tuned by hand.
• HMH uses both the gradient and the Hessian to automatically construct a proposal distribution at each sampling step. This is claimed as the main technical contribution of the UAI 2010 paper.
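A minimal sketch of one HMH update for a single factor row, assuming caller-supplied log_post, grad, and hess callables and a negative-definite Hessian at the current point; this illustrates the idea rather than reproducing the paper's exact algorithm:

```python
import numpy as np

def hmh_step(u, log_post, grad, hess, rng):
    """One Hessian Metropolis-Hastings step for a factor row u.

    The proposal is a Gaussian built from a Newton step:
      mean = u - H^{-1} g,  covariance = -H^{-1},
    where g and H are the gradient and Hessian of the log posterior
    at the point of evaluation. Because the proposal depends on where
    it is built, the acceptance ratio needs both q(u*|u) and q(u|u*).
    """
    def proposal_params(x):
        g, H = grad(x), hess(x)
        cov = np.linalg.inv(-H)   # -H assumed positive definite
        mean = x + cov @ g        # Newton step: x - H^{-1} g
        return mean, cov

    def log_q(x, mean, cov):
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (d @ np.linalg.solve(cov, d) + logdet)

    mean_f, cov_f = proposal_params(u)
    u_star = rng.multivariate_normal(mean_f, cov_f)
    mean_b, cov_b = proposal_params(u_star)

    log_ratio = (log_post(u_star) - log_post(u)
                 + log_q(u, mean_b, cov_b) - log_q(u_star, mean_f, cov_f))
    return u_star if np.log(rng.random()) < log_ratio else u
```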
Experiment setting
• The Co-occurs(word, stimulus) relation is collected by measuring whether or not the stimulus word occurs within five tokens of a word in the Google Tera-word corpus.
• Hold-out prediction (to predict held-out entries of Y)
• Fold-in prediction (to predict a new row of Y)
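A sketch of fold-in prediction under the same illustrative model: the new stimulus's factor is fit from its observed Co-occurs column with the word factors held fixed, and its Response row is then predicted (function and variable names are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize

def fold_in(x_new, U, Z):
    """Predict the Response row of a previously unseen stimulus.

    x_new : (m,) binary Co-occurs column for the new stimulus
    U     : (m, k) learned word factors, held fixed
    Z     : (r, k) learned voxel factors, held fixed
    """
    def neg_log_lik(v):
        # Bernoulli likelihood of the observed co-occurrence column
        p = 1.0 / (1.0 + np.exp(-(U @ v)))
        eps = 1e-10
        return -np.sum(x_new * np.log(p + eps)
                       + (1 - x_new) * np.log(1 - p + eps))

    k = U.shape[1]
    v_new = minimize(neg_log_lik, np.zeros(k)).x  # fit the new factor row
    return v_new @ Z.T                            # predicted Response row
```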
Discussions
• Existing methods force one to choose between ignoring parameter uncertainty and making Gaussianity assumptions.
• Modeling non-Gaussian response types significantly improves predictive accuracy.
• While non-Gaussianity complicates the construction of proposal distributions for Metropolis-Hastings, its significant impact on predictive accuracy makes the extra machinery worthwhile.