Probabilistic Models for Matrix Completion Problems
Arindam Banerjee (banerjee@cs.umn.edu)
Dept of Computer Science & Engineering, University of Minnesota, Twin Cities
March 11, 2011
Recommendation Systems
• Movies, e.g.: Title: Gone with the Wind; Release year: 1939; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war, …
• Users, e.g.: Age: 28; Gender: Male; Job: Salesman; Interest: Travel, …
[Figure: movie ratings matrix, with most entries missing]
Advertisements on the Web
• Products, e.g.: Category: Sports shoes; Brand: Nike; Ratings: 4.2/5; …
• Webpages, e.g.: Category: Baby; URL: babyearth.com; Content: Webpage text; Hyperlinks: …
[Figure: click-through-rate (CTR) matrix over webpages × products; entries are small percentages, mostly missing]
Forest Ecology
• Traits, e.g.: Leaf(N), Leaf(P), SLA, Leaf-Size, Wood density, …
• Plants as rows, traits as columns
[Figure: plant trait matrix from the TRY db (Jens Kattge, Peter Reich, et al.), with many missing entries]
The Main Idea
[Figure: probabilistic matrix completion]
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Graphical Models: What and Why
• Statistical Machine Learning
  • Build diagnostic/predictive models from data
  • Uncertainty quantification based on (minimal) assumptions
• The I.I.D. assumption
  • Data is independently and identically distributed
  • Example: words in a document drawn i.i.d. from the dictionary
• Graphical models
  • Assume (graphical) dependencies between (random) variables
  • Closer to reality; domain knowledge can be captured
  • Learning/inference is much more difficult
Flavors of Graphical Models
• Basic nomenclature
  • Node = random variable, which may be observed or hidden
  • Edge = statistical dependency
  • Two popular flavors: 'directed' and 'undirected'
• Directed graphs
  • A directed graph between random variables, capturing causal dependencies
  • Examples: Bayesian networks, Hidden Markov Models
  • Joint distribution is a product of P(child | parents)
• Undirected graphs
  • An undirected graph between random variables
  • Examples: Markov/conditional random fields
  • Joint distribution in terms of potential functions
[Figure: example graph over random variables X1, …, X5]
Bayesian Networks
• Joint distribution in terms of P(X | Parents(X)), as written out below
[Figure: example Bayesian network over X1, …, X5]
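Concretely, for a network over variables $X_1, \dots, X_n$:

$$P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)$$

Each variable contributes one conditional probability table given its parents, so the joint is specified by far fewer numbers than a full joint table over all the variables.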
Example I: Burglary Network
[Figure: burglary/alarm Bayesian network]

Example II: Rain Network
[Figure: rain Bayesian network]

Example III: Car Problem Diagnosis
[Figure: car diagnosis Bayesian network]
Latent Variable Models
• Bayesian network with hidden variables
• Semantically more accurate, fewer parameters
• Example: compute probability of heart disease
Inference
• Some variables in the Bayes net are observed
  • The evidence/data, e.g., John has not called, Mary has called
• Inference
  • How to compute the value/probability of the other variables
  • Example: what is the probability of Burglary, i.e., P(b | ¬j, m)? (computed in the sketch below)
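A minimal sketch of this query by inference by enumeration. The CPT values below are the standard textbook numbers for the burglary/alarm network (Russell & Norvig); the slide's exact figure is assumed to match them.

```python
import itertools

# Textbook CPTs for the burglary/alarm network (Russell & Norvig values).
P_B = {True: 0.001, False: 0.999}                 # P(Burglary)
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Full joint probability as a product of P(child | parents)."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(b | ¬j, m): sum the joint over the hidden variables (E, A), then normalize.
num = sum(joint(True, e, a, False, True)
          for e, a in itertools.product([True, False], repeat=2))
den = sum(joint(b, e, a, False, True)
          for b, e, a in itertools.product([True, False], repeat=3))
print(num / den)  # ≈ 0.0069
```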
Inference Algorithms
• Graphs without loops
  • Efficient exact inference algorithms are possible
  • Sum-product algorithm, and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  • Junction tree algorithms
    • Convert into a graph without loops
    • May lead to an exponentially large graph and an inefficient algorithm
  • Sum-product algorithm, disregarding loops
    • Active research topic; correct convergence not guaranteed
    • Works well in practice, e.g., turbo codes
  • Approximate inference
Approximate Inference
• Variational inference
  • Deterministic approximation
  • Approximate a complex true distribution/domain
  • Replace with a family of simple distributions/domains
  • Use the best approximation in the family
  • Examples: mean-field, expectation propagation
• Stochastic inference
  • Simple sampling approaches
  • Markov Chain Monte Carlo methods (MCMC): a powerful family of methods
  • Gibbs sampling: a useful special case of MCMC (a minimal sketch follows below)
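As an illustration of Gibbs sampling (a toy example, not from the talk): to sample a bivariate standard normal with correlation rho, repeatedly resample each coordinate from its full conditional, which is itself Gaussian.

```python
import random

# Toy Gibbs sampler for a bivariate standard normal with correlation rho.
# Each full conditional is Gaussian:  x | y ~ N(rho*y, 1 - rho^2).
rho = 0.8
sd = (1 - rho ** 2) ** 0.5

x, y = 0.0, 0.0
samples = []
for t in range(11000):
    x = random.gauss(rho * y, sd)   # resample x from P(x | y)
    y = random.gauss(rho * x, sd)   # resample y from P(y | x)
    if t >= 1000:                   # discard burn-in
        samples.append((x, y))

# The empirical correlation of the samples should approach rho.
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
print(cov / (vx * vy) ** 0.5)       # ≈ 0.8
```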
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Example: Gene Expression Analysis
[Figure: gene expression matrix, original vs. co-clustered]
Co-clustering and Matrix Approximation
[Figure: co-clustering as a matrix approximation]
Probabilistic Co-clustering
[Figure: data matrix with row clusters and column clusters along the margins]
Generative Process
• Assume a mixed membership for each row and column
• Assume a Gaussian for each co-cluster
• Pick row/column clusters
• Generate each entry of the matrix
(A sketch of this process follows below.)
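A minimal sketch of this generative process. The sizes, Dirichlet parameter, and co-cluster means/variance are illustrative assumptions, not values from the talk.

```python
import random

# Illustrative sizes and hyperparameters (assumptions, not from the talk).
k1, k2 = 3, 3              # number of row / column clusters
n_rows, n_cols = 5, 4

def dirichlet(alpha, k):
    """Sample a k-dimensional mixed-membership (probability) vector."""
    w = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(w)
    return [wi / s for wi in w]

# Mixed membership for each row and each column.
pi_rows = [dirichlet(1.0, k1) for _ in range(n_rows)]
pi_cols = [dirichlet(1.0, k2) for _ in range(n_cols)]

# One Gaussian mean per co-cluster, with a shared standard deviation.
mu = [[random.gauss(0, 2) for _ in range(k2)] for _ in range(k1)]
sigma = 0.5

def sample_entry(i, j):
    z1 = random.choices(range(k1), weights=pi_rows[i])[0]  # row cluster
    z2 = random.choices(range(k2), weights=pi_cols[j])[0]  # column cluster
    return random.gauss(mu[z1][z2], sigma)                 # matrix entry

X = [[sample_entry(i, j) for j in range(n_cols)] for i in range(n_rows)]
```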
Bayesian Co-clustering (BCC)
• A Dirichlet distribution over all possible mixed memberships
Background: Plate Diagrams
• Compact representation of large Bayesian networks
[Figure: a node a with children b1, b2, b3, drawn compactly as a single node b inside a plate replicated 3 times]
Bayesian Co-clustering (BCC)
[Figure: BCC plate diagram]
Recall: The Inference Problem
• What is P(b | ¬j, m)?
Bayesian Co-clustering (BCC)
[Figure: BCC plate diagram, revisited for the inference problem]
Learning: Inference and Estimation
• Learning
  • Estimate model parameters
  • Infer 'mixed memberships' of individual rows and columns
  • Expectation Maximization (EM)
• Issues
  • Posterior probability cannot be obtained in closed form
  • Parameter estimation cannot be done directly
• Approach: variational inference
Variational Inference
• Introduce a variational distribution q to approximate the true posterior p
• Use Jensen's inequality to get a tractable lower bound on the log-likelihood
• Maximize the lower bound w.r.t. q
  • Alternatively, minimize the KL divergence between q and p
• Maximize the lower bound w.r.t. the model parameters
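In standard notation, with hidden variables $z$, observations $x$, variational distribution $q$, and parameters $\theta$, the bound reads:

$$\log p(x \mid \theta) \;=\; \log \int q(z)\,\frac{p(x, z \mid \theta)}{q(z)}\,dz \;\ge\; \mathbb{E}_{q}\bigl[\log p(x, z \mid \theta)\bigr] \;-\; \mathbb{E}_{q}\bigl[\log q(z)\bigr]$$

The gap in the inequality is exactly $\mathrm{KL}\bigl(q(z)\,\|\,p(z \mid x, \theta)\bigr)$, so maximizing the bound over $q$ is equivalent to minimizing that KL divergence.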
Variational EM for BCC
• The variational objective is a lower bound of the log-likelihood
• Alternate between maximizing the bound w.r.t. the variational distribution (E-step) and w.r.t. the model parameters (M-step)
Residual Bayesian Co-clustering (RBC)
• (m1, m2): row/column means
• (bm1, bm2): row/column biases
• (z1, z2) determine the distribution
• Users/movies may have bias
Results: Datasets
• Movielens: movie recommendation data
  • 100,000 ratings (1-5) for 1682 movies by 943 users (6.3% dense)
  • 1 million ratings for 3900 movies by 6040 users (4.2% dense)
• Foodmart: transaction data
  • 164,558 sales records for 7803 customers and 1559 products (1.35% dense)
• Jester: joke rating data
  • 100,000 ratings (-10.00 to +10.00) for 100 jokes from 1000 users (100% dense)
BCC, RBC vs. Co-clustering Algorithms
• BCC and RBC have the best performance
• RBC and RBC-FF perform better than BCC
[Figure: results on Jester]
RBC vs. Other Co-clustering Algorithms
[Figures: results on Movielens and Foodmart]
RBC vs. SVD, NNMF, and CORR
• RBC and RBC-FF are competitive with the other algorithms
[Figure: results on Jester]
RBC vs. SVD, NNMF, and CORR
[Figures: results on Movielens and Foodmart]
SVD vs. Parallel RBC
• Parallel RBC scales well to large matrices
[Figure: scaling comparison]
Co-embedding: Users
[Figure: co-embedding of users]

Co-embedding: Movies
[Figure: co-embedding of movies]
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Matrix Factorization
• Singular value decomposition: X ≈ U Σ V^T
• Problems
  • Large matrices, with millions of rows/columns: SVD can be rather slow
  • Sparse matrices, where most entries are missing: traditional approaches cannot handle missing entries
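For reference (a standard result, not specific to this talk), the rank-$k$ truncated SVD is

$$X \;\approx\; U_k \Sigma_k V_k^{\top}, \qquad U_k \in \mathbb{R}^{n \times k},\quad \Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k),\quad V_k \in \mathbb{R}^{m \times k}$$

which by the Eckart-Young theorem is the best rank-$k$ approximation of $X$ in Frobenius norm, but it requires a fully observed $X$; this is exactly what motivates the factorization models that follow.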
Matrix Factorization: "Funk SVD"
• Model X ∈ R^{n×m} as U V^T, where U ∈ R^{n×k} and V ∈ R^{m×k}
• Predicted entry: X̂_ij = u_i^T v_j
• Error on an observed entry: (X_ij − X̂_ij)² = (X_ij − u_i^T v_j)²
• Alternately optimize U and V over the observed entries (a sketch follows below)
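A minimal sketch of this style of factorization, using stochastic gradient descent on the observed entries; the hyperparameters and the added L2 regularization are illustrative assumptions, not values from the slide.

```python
import random

def factorize(observed, n, m, k=10, lr=0.01, reg=0.02, epochs=50):
    """observed: list of (i, j, x_ij) triples for the known matrix entries."""
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        random.shuffle(observed)
        for i, j, x in observed:
            pred = sum(U[i][f] * V[j][f] for f in range(k))
            err = x - pred
            for f in range(k):  # gradient step touches only u_i and v_j
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V

# Usage: ratings = [(0, 1, 4.0), (2, 3, 2.5), ...]
# U, V = factorize(ratings, n_users, n_movies)
```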
Probabilistic Matrix Factorization (PMF)
• Priors: u_i ~ N(0, σ_u² I), v_j ~ N(0, σ_v² I)
• Likelihood: X_ij ~ N(u_i^T v_j, σ²)
• Inference using gradient descent
[R. Salakhutdinov and A. Mnih, NIPS 2007]
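With this Gaussian likelihood and these Gaussian priors, MAP estimation of U and V reduces (a standard derivation, reproduced for completeness) to regularized squared error over the set of observed entries $\Omega$:

$$\min_{U, V}\; \sum_{(i,j) \in \Omega} \bigl(X_{ij} - u_i^{\top} v_j\bigr)^2 \;+\; \lambda_u \sum_i \lVert u_i \rVert^2 \;+\; \lambda_v \sum_j \lVert v_j \rVert^2, \qquad \lambda_u = \sigma^2/\sigma_u^2,\; \lambda_v = \sigma^2/\sigma_v^2$$

This is the objective the gradient descent on this slide optimizes, and it coincides with the regularized "Funk SVD" objective sketched earlier.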
Bayesian Probabilistic Matrix Factorization (BPMF)
• Gaussian-Wishart hyperpriors: µ_u ~ N(µ_0, Λ_u) with Λ_u ~ W(ν_0, W_0), and µ_v ~ N(µ_0, Λ_v) with Λ_v ~ W(ν_0, W_0)
• u_i ~ N(µ_u, Λ_u), v_j ~ N(µ_v, Λ_v)
• X_ij ~ N(u_i^T v_j, σ²)
• Inference using MCMC
[R. Salakhutdinov and A. Mnih, ICML 2008]
Parametric PMF (PPMF)
• Are the priors used in PMF and BPMF suitable?
• PMF: diagonal covariance, u_i ~ N(0, σ_u² I) and v_j ~ N(0, σ_v² I)
• BPMF: full covariance, with "hyperprior": u_i ~ N(µ_u, Λ_u) and v_j ~ N(µ_v, Λ_v)
• Parametric PMF (PPMF): full covariance, but no "hyperprior": u_i ~ N(µ_u, Λ_u) and v_j ~ N(µ_v, Λ_v), with the means and covariances treated as model parameters
PPMF
[Figure: PPMF plate diagram]
PPMF with Mixture Models (MPMF)
• What if the row (column) items belong to several groups?
• Parametric PMF (PPMF): a single Gaussian generates all u_i (or v_j)
• Mixture PMF (MPMF): a mixture of Gaussians, e.g., N_1(µ_1u, Λ_1u), N_2(µ_2u, Λ_2u), N_3(µ_3u, Λ_3u), represents the set of groups; each u_i (or v_j) is generated from one of the Gaussians
MPMF
[Figure: MPMF plate diagram]
PMF with Side Information: LDA-MPMF
• Can we use side information to improve accuracy?
• LDA-MPMF: u_i and the side information share a membership vector
[Figure: users, movies, and side information; each mixture component N_k(µ_ku, Λ_ku) over latent factors is paired with a distribution p_k(θ_ku) over side information, k = 1, 2, 3]
LDA-MPMF
[Figure: LDA-MPMF plate diagram]
PMF with Side Information: CTM-PPMF
• LDA-MPMF: u_i and the side information share a membership vector
• CTM-PPMF: u_i is converted to a membership vector to generate the side information
[Figure: users, movies, and side information; a single Gaussian N(µ_u, Λ_u) over u_i, with distributions p_k(θ_ku) over side information, k = 1, 2, 3]