Relational Learning via Collective Matrix Factorization (SIGKDD 2008)
A Bayesian Matrix Factorization Model for Relational Data (UAI 2010)
Authors: Ajit P. Singh & Geoffrey J. Gordon
Presenter: Xian Xing Zhang
Basic ideas
• Collective matrix factorization (CMF) is proposed for relational learning when an entity participates in multiple relations.
• Several matrices (with different types of support) are factored simultaneously with shared parameters.
• CMF is extended to a hierarchical Bayesian model to enhance the sharing of statistical strength.
An example of application
• Functional Magnetic Resonance Imaging (fMRI):
• fMRI data can be viewed as a real-valued relation, Response(stimulus, voxel) ∈ [0, 1].
• Stimulus side-information: a binary relation, Co-occurs(word, stimulus) ∈ {0, 1}, recording whether the stimulus word co-occurs with other commonly used words in a large corpus.
• The goal is to predict unobserved values of the Response relation.
Basic model description
• In the fMRI example, the Co-occurs relation is an m×n matrix X; the Response relation is an n×r matrix Y.
• Likelihood of each matrix X and Y: Co-occurs (p_X) is modeled by a Bernoulli distribution; Response (p_Y) is modeled by a Gaussian.
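The two likelihoods can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the factor names U (words), V (stimuli) and Z (voxels), the latent dimension k, the logistic link for p_X, and the noise scale sigma are all assumed here. The point it shows is that the stimulus factor V appears in both terms, which is how the two matrices share parameters.

```python
import numpy as np

def cmf_log_likelihood(X, Y, U, V, Z, sigma=1.0):
    """Joint log-likelihood of X (m x n, binary) and Y (n x r, real-valued).

    Assumed model: X ~ Bernoulli with logits U V', Y ~ Gaussian with mean V Z'.
    The stimulus factor V (n x k) is shared by both terms.
    """
    logits = U @ V.T                                   # m x n
    # Bernoulli log-likelihood in its numerically stable logit form.
    log_px = np.sum(X * logits - np.logaddexp(0.0, logits))
    resid = Y - V @ Z.T                                # n x r
    log_py = np.sum(-0.5 * (resid / sigma) ** 2
                    - np.log(sigma) - 0.5 * np.log(2.0 * np.pi))
    return float(log_px + log_py)

# Toy sizes (hypothetical): m words, n stimuli, r voxels, k latent dimensions.
rng = np.random.default_rng(0)
m, n, r, k = 5, 4, 3, 2
U, V, Z = rng.normal(size=(m, k)), rng.normal(size=(n, k)), rng.normal(size=(r, k))
X = (rng.uniform(size=(m, n)) < 0.5).astype(float)
Y = rng.normal(size=(n, r))
joint = cmf_log_likelihood(X, Y, U, V, Z)
```

Because V enters both terms, a gradient step or a sampling move on V is informed by the co-occurrence data and the fMRI responses at the same time.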
Hierarchical Collective Matrix Factorization
• Information between entities can only be shared indirectly, through another factor: e.g., in f(UV’), two distinct rows of U are correlated only through V.
• The hierarchical prior acts as a shrinkage estimator for the rows of a factor, pooling information indirectly, through Θ.
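One way to write down such a hierarchical prior is given below. This is a sketch under assumptions: the slide does not spell out the form of Θ, so a Gaussian prior on each row with a shared mean μ and covariance Σ is assumed here, and the hyperprior p(Θ) is left unspecified.

```latex
U_{i\cdot} \mid \Theta \;\sim\; \mathcal{N}(\mu, \Sigma), \quad i = 1, \dots, m,
\qquad \Theta = (\mu, \Sigma) \;\sim\; p(\Theta)
```

Under this assumed form, every row of U is pulled toward the common mean μ, and because μ and Σ are themselves inferred from all rows, each row borrows strength from the others through Θ.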
Bayesian Inference
• Hessian Metropolis-Hastings (HMH):
• Random walk Metropolis-Hastings samples from a proposal distribution defined by a Gaussian whose mean equals the sample at time t, F_i(t), and whose covariance matrix is fixed in advance, which is problematic.
• HMH uses both the gradient and the Hessian to automatically construct a proposal distribution at each sampling step. This is claimed as the main technical contribution of the UAI 2010 paper.
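The idea of building the proposal from local curvature can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: it assumes the proposal is a Gaussian centred at a Newton step with covariance equal to the negative inverse Hessian, and it uses a toy target (one factor row f with Bernoulli observations x, a fixed factor V, and a Gaussian prior with precision tau). All of these names and choices are assumptions made for the example.

```python
import numpy as np

def log_post(f, V, x, tau=1.0):
    """Log posterior (up to a constant) of one factor row f, with V held fixed."""
    z = V @ f
    return float(x @ z - np.logaddexp(0.0, z).sum() - 0.5 * tau * (f @ f))

def grad_hess(f, V, x, tau=1.0):
    """Gradient and Hessian of log_post with respect to f."""
    s = 1.0 / (1.0 + np.exp(-(V @ f)))
    grad = V.T @ (x - s) - tau * f
    hess = -(V.T * (s * (1.0 - s))) @ V - tau * np.eye(f.size)
    return grad, hess

def proposal(f, V, x):
    """Gaussian proposal: mean is a Newton step from f, covariance is (-H)^{-1}."""
    grad, hess = grad_hess(f, V, x)
    cov = np.linalg.inv(-hess)
    cov = 0.5 * (cov + cov.T)          # remove tiny numerical asymmetry
    return f + cov @ grad, cov

def log_gauss(y, mean, cov):
    """Log density of a multivariate Gaussian evaluated at y."""
    d = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.solve(cov, d) + logdet + d.size * np.log(2.0 * np.pi))

def hmh_step(f, V, x, rng):
    """One Metropolis-Hastings step with a gradient/Hessian-based proposal."""
    mean_f, cov_f = proposal(f, V, x)
    cand = rng.multivariate_normal(mean_f, cov_f)
    mean_c, cov_c = proposal(cand, V, x)
    # The proposal is asymmetric, so forward and reverse densities both enter.
    log_alpha = (log_post(cand, V, x) + log_gauss(f, mean_c, cov_c)
                 - log_post(f, V, x) - log_gauss(cand, mean_f, cov_f))
    if np.log(rng.uniform()) < log_alpha:
        return cand, True
    return f, False

# Toy run on synthetic data (all values hypothetical).
rng = np.random.default_rng(0)
V = rng.normal(size=(30, 2))
x = (rng.uniform(size=30) < 0.5).astype(float)
f = np.zeros(2)
samples, accepted = [], 0
for _ in range(300):
    f, ok = hmh_step(f, V, x, rng)
    samples.append(f)
    accepted += ok
samples = np.array(samples)
accept_rate = accepted / len(samples)
```

Unlike a random walk proposal, no step size has to be tuned by hand here: the proposal's location and scale are recomputed from the target's gradient and Hessian at every step. The toy target is log-concave, so the negative Hessian is always positive definite; a target without that property would need an extra safeguard.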
Experiment setting
• The Co-occurs(word, stimulus) relation is collected by measuring whether or not the stimulus word occurs within five tokens of a word in the Google Tera-word corpus.
• Hold-out prediction:
• Fold-in prediction (to predict a new row in Y)
Discussions
• Existing methods force one to choose between ignoring parameter uncertainty or making Gaussianity assumptions.
• Non-Gaussian response types significantly improve predictive accuracy.
• While non-Gaussianity complicates the construction of proposal distributions for Metropolis-Hastings, it does have a significant impact on predictive accuracy.