Nonparametric Latent Feature Models for Link Prediction
Kurt T. Miller, Thomas L. Griffiths, Michael I. Jordan. NIPS 2009.
Presented by Minhua Chen, 06.04.2010.
Problem Formulation
• Link prediction in a social network (binary matrix completion). The link matrix Y has entries:
    Yij = 1: person i is linked to person j.
    Yij = 0: person i is not linked to person j.
    Yij = ?: unobserved entry to be filled in.
• Linkage can stand for different relations, e.g., friends or not, colleagues or not.
• If the network is a directed graph, then Y can be asymmetric.
• Observed entries + auxiliary information (optional) → unobserved entries
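As a small illustration of the matrix-completion setup, the sketch below stores a toy directed network with `None` standing in for the "?" entries and separates observed from unobserved positions. The 4-person matrix and the helper name `split_observed` are made up for this example and do not come from the paper.

```python
# Toy 4-person directed network (hypothetical data).
# None marks an unobserved entry, i.e. the "?" in the problem statement.
Y = [
    [0, 1, None, 0],
    [1, 0, 1, None],
    [0, None, 0, 1],
    [None, 0, 0, 0],
]


def split_observed(Y):
    """Return (observed, missing).

    observed maps (i, j) -> 0/1 for known entries;
    missing lists the (i, j) positions that must be predicted.
    """
    observed, missing = {}, []
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y is None:
                missing.append((i, j))
            else:
                observed[(i, j)] = y
    return observed, missing


observed, missing = split_observed(Y)
# Directed graph: Y[2][3] and Y[3][2] are allowed to differ.
is_asymmetric = Y[2][3] != Y[3][2]
```

Link prediction then amounts to assigning a probability of a link to every position in `missing`, using only `observed` (and any auxiliary covariates).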
Methods
• Class-based models
    Entities are clustered into classes, and linkage is determined by which classes the entities belong to.
    Models: Infinite Relational Model (IRM), Mixed Membership Stochastic Blockmodel (MMSB).
    Disadvantage: a clustering description is coarse and not very expressive.
• Latent-feature models
    Interactions between latent features determine the linkage.
    This paper extends the latent-feature approach to a nonparametric model using the Indian Buffet Process (IBP).
    The number of latent features can be inferred, as well as their interactions.
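One way to see the difference in expressiveness is to compare a one-hot class assignment with a multi-feature assignment under the same weights. The sketch below assumes the bilinear score Z_i W Z_j^T that the latent-feature model uses; the weights and assignments are invented for illustration, and the one-hot case is only a simplified stand-in for a class-based model, not an implementation of IRM or MMSB.

```python
def score(Z, W, i, j):
    """Bilinear interaction Z_i W Z_j^T between the feature rows of i and j."""
    K = len(W)
    return sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))


# Hypothetical 2 x 2 feature-interaction weights.
W = [[2.0, -1.0],
     [-3.0, 0.5]]

# Class-based view: every person belongs to exactly one class (one-hot rows),
# so the score is a single entry of W picked out by the two classes.
Z_class = [[1, 0], [1, 0], [0, 1]]

# Latent-feature view: a person may hold several features at once,
# so the score sums every pairwise feature interaction.
Z_feat = [[1, 0], [1, 1], [0, 1]]

class_score = score(Z_class, W, 0, 2)  # equals W[0][1]
feat_score = score(Z_feat, W, 1, 2)    # equals W[0][1] + W[1][1]
```

With one-hot rows there are only K × K distinct scores; with K binary features, each person can take one of 2^K feature combinations, which is where the extra flexibility comes from.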
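Model
• Define Z to be a binary N × K matrix with N people and K latent features.
• Define W to be a K × K weight matrix for the K latent features.
• The model is
    $\Pr(Y_{ij} = 1 \mid Z, W) = \sigma(Z_i W Z_j^\top)$,
  where $Z_i$ is row i of Z and $\sigma$ maps real values to (0, 1), e.g. the logistic sigmoid.
• Or expressed in more detail:
    $\Pr(Y_{ij} = 1 \mid Z, W) = \sigma\Big(\sum_{k=1}^{K} \sum_{k'=1}^{K} Z_{ik} W_{kk'} Z_{jk'}\Big)$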
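The generative side of this model can be sketched in a few lines. The code below assumes the bilinear form Pr(Yij = 1) = sigmoid(Z_i W Z_j^T) with a logistic link and Gaussian weights. For simplicity it fixes K and draws features as independent coin flips, which is a finite stand-in and not the IBP prior used in the paper; `sample_network`, `feature_prob` and `w_std` are hypothetical names.

```python
import math
import random


def sigmoid(x):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def link_prob(Z, W, i, j):
    """Probability of a link from i to j: sigmoid(Z_i W Z_j^T)."""
    K = len(W)
    s = sum(Z[i][k] * W[k][l] * Z[j][l] for k in range(K) for l in range(K))
    return sigmoid(s)


def sample_network(N, K, feature_prob=0.3, w_std=1.0, seed=0):
    """Draw (Z, W, Y) from a finite-K simplification of the model.

    Assumption: features are i.i.d. Bernoulli(feature_prob) rather than IBP.
    """
    rng = random.Random(seed)
    Z = [[1 if rng.random() < feature_prob else 0 for _ in range(K)]
         for _ in range(N)]
    W = [[rng.gauss(0.0, w_std) for _ in range(K)] for _ in range(K)]
    Y = [[1 if rng.random() < link_prob(Z, W, i, j) else 0 for j in range(N)]
         for i in range(N)]
    return Z, W, Y


Z, W, Y = sample_network(N=5, K=3)
```

Note that W need not be symmetric, so `link_prob(Z, W, i, j)` and `link_prob(Z, W, j, i)` can differ, which matches the directed-graph case.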
Results on Synthetic Data
[Figure panels: (c) ground truth of Z, (d) generated Y, (e) inferred Z]
Although the missing values are imputed correctly, the inferred Z differs from the ground truth. This indicates that the latent features are not identifiable: different Z can explain the same Y.
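One simple source of this ambiguity (not necessarily the only one behind the figure) is relabeling: permuting the feature columns of Z and applying the same permutation to the rows and columns of W leaves every score Z_i W Z_j^T unchanged. The sketch below checks this on invented values; `bilinear_scores` and `permute_features` are hypothetical helper names.

```python
def bilinear_scores(Z, W):
    """Matrix of scores S[i][j] = Z_i W Z_j^T for all pairs (i, j)."""
    N, K = len(Z), len(W)
    return [[sum(Z[i][k] * W[k][l] * Z[j][l]
                 for k in range(K) for l in range(K))
             for j in range(N)]
            for i in range(N)]


def permute_features(Z, W, perm):
    """Relabel features: new feature a is old feature perm[a]."""
    Zp = [[row[p] for p in perm] for row in Z]
    Wp = [[W[p][q] for q in perm] for p in perm]
    return Zp, Wp


# Hypothetical feature assignments and weights.
Z = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
W = [[1.5, -2.0, 0.25],
     [0.5, 1.0, -1.0],
     [2.0, 0.0, -0.5]]

Zp, Wp = permute_features(Z, W, perm=[2, 0, 1])
S, Sp = bilinear_scores(Z, W), bilinear_scores(Zp, Wp)
same_scores = all(abs(S[i][j] - Sp[i][j]) < 1e-12
                  for i in range(len(Z)) for j in range(len(Z)))
```

Here `Zp` differs from `Z` entry by entry, yet both give identical link probabilities, so predictions on held-out entries cannot distinguish them.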
Results on Multi-Task Data
• The Countries data contains 54 relation matrices among 14 countries, along with 90 given covariates.
• The Alyawarra data contains 26 kinship relationship matrices of 104 people in the Alyawarra tribe in Central Australia.
• For each dataset, 80% of the data is used for training and the remaining 20% for testing.
• With proper initialization, the latent feature relational model (LFRM) outperforms IRM and MMSB.
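A minimal sketch of an 80/20 hold-out over matrix entries is shown below. The slides do not say how the split was drawn, so splitting uniformly at random over (i, j) pairs and excluding self-links are assumptions made here; `holdout_split` is a hypothetical helper, and N = 14 simply mirrors the number of countries.

```python
import random


def holdout_split(pairs, train_frac=0.8, seed=0):
    """Shuffle entry positions and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


N = 14
# Assumption: self-links (i == j) are not part of the prediction task.
pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
train, test = holdout_split(pairs)
```

Entries in `test` would be treated as unobserved during inference and scored afterwards against their true values.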
Results on Single-Task Data
[Figure: AUC performance]
• The 234 authors who published with the most other people in NIPS 1-17 are used, and their coauthorship matrix is constructed.
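For reference, AUC on held-out entries can be computed directly from its definition: the probability that a randomly chosen true link receives a higher predicted score than a randomly chosen non-link, counting ties as one half. The sketch below is a plain quadratic-time version with invented example scores; it is not the evaluation code used for the reported results.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels and predicted link probabilities.
held_out_labels = [1, 0, 1, 0]
predicted = [0.9, 0.8, 0.3, 0.1]
example_auc = auc(held_out_labels, predicted)
```

AUC depends only on the ranking of the predictions, so it is unaffected by the relabeling ambiguity in Z and by any monotone rescaling of the scores.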