Probabilistic Models for Matrix Completion Problems
Arindam Banerjee (banerjee@cs.umn.edu)
Dept of Computer Science & Engineering, University of Minnesota, Twin Cities
March 11, 2011
Recommendation Systems
• Movies, e.g.: Title: Gone with the Wind; Release year: 1939; Cast: Vivien Leigh, Clark Gable; Genre: War, Romance; Awards: 8 Oscars; Keywords: Love, Civil war, …
• Users, e.g.: Age: 28; Gender: Male; Job: Salesman; Interest: Travel, …
[Figure: movie ratings matrix, with most entries missing]
Advertisements on the Web
• Products, e.g.: Category: Sports shoes; Brand: Nike; Ratings: 4.2/5; …
• Webpages, e.g.: Category: Baby; URL: babyearth.com; Content: Webpage text; Hyperlinks: …
[Figure: click-through-rate (CTR) matrix over webpages × products; entries are small percentages, mostly missing]
Forest Ecology
• Traits, e.g.: Leaf(N), Leaf(P), SLA, Leaf-Size, Wood density, …
• Plants as rows, traits as columns
[Figure: plant trait matrix from the TRY db (Jens Kattge, Peter Reich, et al.), with many missing entries]
The Main Idea
[Figure: probabilistic matrix completion]
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Graphical Models: What and Why
• Statistical Machine Learning
  • Build diagnostic/predictive models from data
  • Uncertainty quantification based on (minimal) assumptions
• The I.I.D. assumption
  • Data is independently and identically distributed
  • Example: words in a document drawn i.i.d. from the dictionary
• Graphical models
  • Assume (graphical) dependencies between (random) variables
  • Closer to reality; domain knowledge can be captured
  • Learning/inference is much more difficult
Flavors of Graphical Models
• Basic nomenclature
  • Node = random variable, which may be observed or hidden
  • Edge = statistical dependency
  • Two popular flavors: 'directed' and 'undirected'
• Directed graphs
  • A directed graph between random variables, capturing causal dependencies
  • Examples: Bayesian networks, Hidden Markov Models
  • Joint distribution is a product of P(child | parents)
• Undirected graphs
  • An undirected graph between random variables
  • Examples: Markov/conditional random fields
  • Joint distribution in terms of potential functions
[Figure: example graph over random variables X1, …, X5]
Bayesian Networks
• Joint distribution in terms of P(X | Parents(X)), as written out below
[Figure: example Bayesian network over X1, …, X5]
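Concretely, for a network over variables $X_1, \dots, X_n$:

$$P(X_1, \dots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Parents}(X_i)\bigr)$$

Each variable contributes one conditional probability table given its parents, so the joint is specified by far fewer numbers than a full joint table over all the variables.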
Example I: Burglary Network
[Figure: burglary/alarm Bayesian network]

Example II: Rain Network
[Figure: rain Bayesian network]

Example III: Car Problem Diagnosis
[Figure: car diagnosis Bayesian network]
Latent Variable Models
• Bayesian network with hidden variables
• Semantically more accurate, fewer parameters
• Example: compute probability of heart disease
Inference
• Some variables in the Bayes net are observed
  • The evidence/data, e.g., John has not called, Mary has called
• Inference
  • How to compute the value/probability of the other variables
  • Example: what is the probability of Burglary, i.e., P(b | ¬j, m)? (computed in the sketch below)
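A minimal sketch of this query by inference by enumeration. The CPT values below are the standard textbook numbers for the burglary/alarm network (Russell & Norvig); the slide's exact figure is assumed to match them.

```python
import itertools

# Textbook CPTs for the burglary/alarm network (Russell & Norvig values).
P_B = {True: 0.001, False: 0.999}                 # P(Burglary)
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                   # P(JohnCalls | Alarm)
P_M = {True: 0.70, False: 0.01}                   # P(MaryCalls | Alarm)

def joint(b, e, a, j, m):
    """Full joint probability as a product of P(child | parents)."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(b | ¬j, m): sum the joint over the hidden variables (E, A), then normalize.
num = sum(joint(True, e, a, False, True)
          for e, a in itertools.product([True, False], repeat=2))
den = sum(joint(b, e, a, False, True)
          for b, e, a in itertools.product([True, False], repeat=3))
print(num / den)  # ≈ 0.0069
```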
Inference Algorithms
• Graphs without loops
  • Efficient exact inference algorithms are possible
  • Sum-product algorithm, and its special cases
    • Belief propagation in Bayes nets
    • Forward-Backward algorithm in Hidden Markov Models (HMMs)
• Graphs with loops
  • Junction tree algorithms
    • Convert into a graph without loops
    • May lead to an exponentially large graph and an inefficient algorithm
  • Sum-product algorithm, disregarding loops
    • Active research topic; correct convergence not guaranteed
    • Works well in practice, e.g., turbo codes
  • Approximate inference
Approximate Inference
• Variational inference
  • Deterministic approximation
  • Approximate a complex true distribution/domain
  • Replace with a family of simple distributions/domains
  • Use the best approximation in the family
  • Examples: mean-field, expectation propagation
• Stochastic inference
  • Simple sampling approaches
  • Markov Chain Monte Carlo methods (MCMC): a powerful family of methods
  • Gibbs sampling: a useful special case of MCMC (a minimal sketch follows below)
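As an illustration of Gibbs sampling (a toy example, not from the talk): to sample a bivariate standard normal with correlation rho, repeatedly resample each coordinate from its full conditional, which is itself Gaussian.

```python
import random

# Toy Gibbs sampler for a bivariate standard normal with correlation rho.
# Each full conditional is Gaussian:  x | y ~ N(rho*y, 1 - rho^2).
rho = 0.8
sd = (1 - rho ** 2) ** 0.5

x, y = 0.0, 0.0
samples = []
for t in range(11000):
    x = random.gauss(rho * y, sd)   # resample x from P(x | y)
    y = random.gauss(rho * x, sd)   # resample y from P(y | x)
    if t >= 1000:                   # discard burn-in
        samples.append((x, y))

# The empirical correlation of the samples should approach rho.
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
print(cov / (vx * vy) ** 0.5)       # ≈ 0.8
```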
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Example: Gene Expression Analysis
[Figure: gene expression matrix, original vs. co-clustered]
Co-clustering and Matrix Approximation
[Figure: co-clustering as a matrix approximation]
Probabilistic Co-clustering
[Figure: data matrix with row clusters and column clusters along the margins]
Generative Process
• Assume a mixed membership for each row and column
• Assume a Gaussian for each co-cluster
• Pick row/column clusters
• Generate each entry of the matrix
(A sketch of this process follows below.)
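A minimal sketch of this generative process. The sizes, Dirichlet parameter, and co-cluster means/variance are illustrative assumptions, not values from the talk.

```python
import random

# Illustrative sizes and hyperparameters (assumptions, not from the talk).
k1, k2 = 3, 3              # number of row / column clusters
n_rows, n_cols = 5, 4

def dirichlet(alpha, k):
    """Sample a k-dimensional mixed-membership (probability) vector."""
    w = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(w)
    return [wi / s for wi in w]

# Mixed membership for each row and each column.
pi_rows = [dirichlet(1.0, k1) for _ in range(n_rows)]
pi_cols = [dirichlet(1.0, k2) for _ in range(n_cols)]

# One Gaussian mean per co-cluster, with a shared standard deviation.
mu = [[random.gauss(0, 2) for _ in range(k2)] for _ in range(k1)]
sigma = 0.5

def sample_entry(i, j):
    z1 = random.choices(range(k1), weights=pi_rows[i])[0]  # row cluster
    z2 = random.choices(range(k2), weights=pi_cols[j])[0]  # column cluster
    return random.gauss(mu[z1][z2], sigma)                 # matrix entry

X = [[sample_entry(i, j) for j in range(n_cols)] for i in range(n_rows)]
```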
Bayesian Co-clustering (BCC)
• A Dirichlet distribution over all possible mixed memberships
Background: Plate Diagrams
• Compact representation of large Bayesian networks
[Figure: a node a with children b1, b2, b3, drawn compactly as a single node b inside a plate replicated 3 times]
Bayesian Co-clustering (BCC)
[Figure: BCC plate diagram]
Recall: The Inference Problem
• What is P(b | ¬j, m)?
Bayesian Co-clustering (BCC)
[Figure: BCC plate diagram, revisited for the inference problem]
Learning: Inference and Estimation
• Learning
  • Estimate model parameters
  • Infer 'mixed memberships' of individual rows and columns
  • Expectation Maximization (EM)
• Issues
  • Posterior probability cannot be obtained in closed form
  • Parameter estimation cannot be done directly
• Approach: variational inference
Variational Inference
• Introduce a variational distribution q to approximate the true posterior p
• Use Jensen's inequality to get a tractable lower bound on the log-likelihood
• Maximize the lower bound w.r.t. q
  • Alternatively, minimize the KL divergence between q and p
• Maximize the lower bound w.r.t. the model parameters
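In standard notation, with hidden variables $z$, observations $x$, variational distribution $q$, and parameters $\theta$, the bound reads:

$$\log p(x \mid \theta) \;=\; \log \int q(z)\,\frac{p(x, z \mid \theta)}{q(z)}\,dz \;\ge\; \mathbb{E}_{q}\bigl[\log p(x, z \mid \theta)\bigr] \;-\; \mathbb{E}_{q}\bigl[\log q(z)\bigr]$$

The gap in the inequality is exactly $\mathrm{KL}\bigl(q(z)\,\|\,p(z \mid x, \theta)\bigr)$, so maximizing the bound over $q$ is equivalent to minimizing that KL divergence.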
Variational EM for BCC
• The variational objective is a lower bound of the log-likelihood
• Alternate between maximizing the bound w.r.t. the variational distribution (E-step) and w.r.t. the model parameters (M-step)
Residual Bayesian Co-clustering (RBC)
• (m1, m2): row/column means
• (bm1, bm2): row/column biases
• (z1, z2) determine the distribution
• Users/movies may have bias
Results: Datasets
• Movielens: movie recommendation data
  • 100,000 ratings (1-5) for 1682 movies by 943 users (6.3% dense)
  • 1 million ratings for 3900 movies by 6040 users (4.2% dense)
• Foodmart: transaction data
  • 164,558 sales records for 7803 customers and 1559 products (1.35% dense)
• Jester: joke rating data
  • 100,000 ratings (-10.00 to +10.00) for 100 jokes from 1000 users (100% dense)
BCC, RBC vs. Co-clustering Algorithms
• BCC and RBC have the best performance
• RBC and RBC-FF perform better than BCC
[Figure: results on Jester]
RBC vs. Other Co-clustering Algorithms
[Figures: results on Movielens and Foodmart]
RBC vs. SVD, NNMF, and CORR
• RBC and RBC-FF are competitive with the other algorithms
[Figure: results on Jester]
RBC vs. SVD, NNMF, and CORR
[Figures: results on Movielens and Foodmart]
SVD vs. Parallel RBC
• Parallel RBC scales well to large matrices
[Figure: scaling comparison]
Co-embedding: Users
[Figure: co-embedding of users]

Co-embedding: Movies
[Figure: co-embedding of movies]
Overview
• Graphical Models
  • Bayesian Networks
  • Inference
• Probabilistic Co-clustering
  • Structure: Simultaneous Row-Column Clustering
  • Bayesian models, Inference
• Probabilistic Matrix Factorization
  • Structure: Low-Rank Factorization
  • Bayesian models, Inference
Matrix Factorization
• Singular value decomposition: X ≈ U Σ V^T
• Problems
  • Large matrices, with millions of rows/columns: SVD can be rather slow
  • Sparse matrices, where most entries are missing: traditional approaches cannot handle missing entries
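For reference (a standard result, not specific to this talk), the rank-$k$ truncated SVD is

$$X \;\approx\; U_k \Sigma_k V_k^{\top}, \qquad U_k \in \mathbb{R}^{n \times k},\quad \Sigma_k = \mathrm{diag}(\sigma_1, \dots, \sigma_k),\quad V_k \in \mathbb{R}^{m \times k}$$

which by the Eckart-Young theorem is the best rank-$k$ approximation of $X$ in Frobenius norm, but it requires a fully observed $X$; this is exactly what motivates the factorization models that follow.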
Matrix Factorization: "Funk SVD"
• Model X ∈ R^{n×m} as U V^T, where U ∈ R^{n×k} and V ∈ R^{m×k}
• Predicted entry: X̂_ij = u_i^T v_j
• Error on an observed entry: (X_ij − X̂_ij)² = (X_ij − u_i^T v_j)²
• Alternately optimize U and V over the observed entries (a sketch follows below)
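A minimal sketch of this style of factorization, using stochastic gradient descent on the observed entries; the hyperparameters and the added L2 regularization are illustrative assumptions, not values from the slide.

```python
import random

def factorize(observed, n, m, k=10, lr=0.01, reg=0.02, epochs=50):
    """observed: list of (i, j, x_ij) triples for the known matrix entries."""
    U = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(epochs):
        random.shuffle(observed)
        for i, j, x in observed:
            pred = sum(U[i][f] * V[j][f] for f in range(k))
            err = x - pred
            for f in range(k):  # gradient step touches only u_i and v_j
                u, v = U[i][f], V[j][f]
                U[i][f] += lr * (err * v - reg * u)
                V[j][f] += lr * (err * u - reg * v)
    return U, V

# Usage: ratings = [(0, 1, 4.0), (2, 3, 2.5), ...]
# U, V = factorize(ratings, n_users, n_movies)
```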
Probabilistic Matrix Factorization (PMF)
• Priors: u_i ~ N(0, σ_u² I), v_j ~ N(0, σ_v² I)
• Likelihood: X_ij ~ N(u_i^T v_j, σ²)
• Inference using gradient descent
[R. Salakhutdinov and A. Mnih, NIPS 2007]
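With this Gaussian likelihood and these Gaussian priors, MAP estimation of U and V reduces (a standard derivation, reproduced for completeness) to regularized squared error over the set of observed entries $\Omega$:

$$\min_{U, V}\; \sum_{(i,j) \in \Omega} \bigl(X_{ij} - u_i^{\top} v_j\bigr)^2 \;+\; \lambda_u \sum_i \lVert u_i \rVert^2 \;+\; \lambda_v \sum_j \lVert v_j \rVert^2, \qquad \lambda_u = \sigma^2/\sigma_u^2,\; \lambda_v = \sigma^2/\sigma_v^2$$

This is the objective the gradient descent on this slide optimizes, and it coincides with the regularized "Funk SVD" objective sketched earlier.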
Bayesian Probabilistic Matrix Factorization (BPMF)
• Gaussian-Wishart hyperpriors: µ_u ~ N(µ_0, Λ_u) with Λ_u ~ W(ν_0, W_0), and µ_v ~ N(µ_0, Λ_v) with Λ_v ~ W(ν_0, W_0)
• u_i ~ N(µ_u, Λ_u), v_j ~ N(µ_v, Λ_v)
• X_ij ~ N(u_i^T v_j, σ²)
• Inference using MCMC
[R. Salakhutdinov and A. Mnih, ICML 2008]
Parametric PMF (PPMF)
• Are the priors used in PMF and BPMF suitable?
• PMF: diagonal covariance, u_i ~ N(0, σ_u² I) and v_j ~ N(0, σ_v² I)
• BPMF: full covariance, with "hyperprior": u_i ~ N(µ_u, Λ_u) and v_j ~ N(µ_v, Λ_v)
• Parametric PMF (PPMF): full covariance, but no "hyperprior": u_i ~ N(µ_u, Λ_u) and v_j ~ N(µ_v, Λ_v), with the means and covariances treated as model parameters
PPMF
[Figure: PPMF plate diagram]
PPMF with Mixture Models (MPMF)
• What if the row (column) items belong to several groups?
• Parametric PMF (PPMF): a single Gaussian generates all u_i (or v_j)
• Mixture PMF (MPMF): a mixture of Gaussians, e.g., N_1(µ_1u, Λ_1u), N_2(µ_2u, Λ_2u), N_3(µ_3u, Λ_3u), represents the set of groups; each u_i (or v_j) is generated from one of the Gaussians
MPMF
[Figure: MPMF plate diagram]
PMF with Side Information: LDA-MPMF
• Can we use side information to improve accuracy?
• LDA-MPMF: u_i and the side information share a membership vector
[Figure: users, movies, and side information; each mixture component N_k(µ_ku, Λ_ku) over latent factors is paired with a distribution p_k(θ_ku) over side information, k = 1, 2, 3]
LDA-MPMF
[Figure: LDA-MPMF plate diagram]
PMF with Side Information: CTM-PPMF
• LDA-MPMF: u_i and the side information share a membership vector
• CTM-PPMF: u_i is converted to a membership vector to generate the side information
[Figure: users, movies, and side information; a single Gaussian N(µ_u, Λ_u) over u_i, with distributions p_k(θ_ku) over side information, k = 1, 2, 3]