Principled Regularization for Probabilistic Matrix Factorization
Robert Bell, Suhrid Balakrishnan
AT&T Labs-Research
Duke Workshop on Sensing and Analysis of High-Dimensional Data, July 26-28, 2011
Probabilistic Matrix Factorization (PMF) • Approximate a large n-by-m matrix R by • M = PᵀQ • P and Q each have k rows, k << n, m • m_ui = p_uᵀq_i • R may be sparsely populated • Prime tool in Netflix Prize • 99% of ratings were missing
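[A minimal NumPy sketch of fitting a PMF model to observed cells by stochastic gradient descent. This is not the authors' code (the talk's own fitting method is coordinate-wise Newton, sketched later); the function name and defaults are illustrative. Factors are stored one row per user/item, the transpose of the slides' k-by-n convention.]

```python
import numpy as np

def fit_pmf(users, items, ratings, n, m, k=2, lr=0.01, lam=0.1, epochs=50, seed=0):
    """Fit m_ui = p_u . q_i to the observed cells of R by SGD (illustrative)."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n, k))   # user factors, one row per user
    Q = 0.1 * rng.standard_normal((m, k))   # item factors, one row per item
    for _ in range(epochs):
        for u, i, r in zip(users, items, ratings):
            err = r - P[u] @ Q[i]
            pu = P[u].copy()                # keep pre-update value for Q's step
            P[u] += lr * (err * Q[i] - lam * pu)
            Q[i] += lr * (err * pu - lam * Q[i])
    return P, Q
```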
Regularization for PMF • Needed to avoid overfitting • Even after limiting rank of M • Critical for sparse, imbalanced data • Penalized least squares • Minimize Σ_{(u,i) observed} (r_ui − p_uᵀq_i)² + λ(‖P‖² + ‖Q‖²) • or Σ_{(u,i) observed} (r_ui − p_uᵀq_i)² + λ_P‖P‖² + λ_Q‖Q‖² • λ's selected by cross validation
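[The slide's two penalized least-squares criteria, written out as a small helper; a sketch under the same row-per-user/item layout as above, with names of my choosing.]

```python
import numpy as np

def pmf_objective(P, Q, users, items, ratings, lam_P, lam_Q):
    """Squared error over observed cells plus lam_P*||P||^2 + lam_Q*||Q||^2.
    Setting lam_P == lam_Q recovers the single-lambda variant."""
    preds = np.sum(P[users] * Q[items], axis=1)   # p_u . q_i per observation
    sse = np.sum((ratings - preds) ** 2)
    return sse + lam_P * np.sum(P ** 2) + lam_Q * np.sum(Q ** 2)
```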
Research Questions • Should we use separate λ_P and λ_Q? • Should we use k separate λ's for each dimension of P and Q?
Matrix Completion with Noise (Candès and Plan, Proc IEEE, 2010) • Rank reduction without explicit factors • No pre-specification of k, rank(M) • Regularization applied directly to M • Trace norm, aka nuclear norm • Sum of the singular values of M • Minimize ‖M‖_* subject to Σ_{(u,i) observed} (r_ui − m_ui)² ≤ δ • "Equivalent" to L2 regularization for P, Q
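[The "equivalence" rests on a standard identity: ‖M‖_* = min over factorizations M = PᵀQ of ½(‖P‖² + ‖Q‖²), with the minimum attained by the balanced SVD factorization. A short numeric check of that identity (my own sketch, not from the talk):]

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # low-rank matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)
trace_norm = s.sum()                       # sum of singular values of M

# Balanced factorization M = P^T Q built from the SVD
P = (U * np.sqrt(s)).T                     # k-by-n, as on the slides
Q = np.sqrt(s)[:, None] * Vt               # k-by-m
half_l2 = 0.5 * (np.sum(P**2) + np.sum(Q**2))

print(np.allclose(P.T @ Q, M))             # factorization reproduces M
print(np.allclose(trace_norm, half_l2))    # ||M||_* = min 0.5(||P||^2 + ||Q||^2)
```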
Research Questions • Should we use separate λ_P and λ_Q? • Should we use k separate λ's for each dimension of P and Q? • Should we use the trace norm for regularization?
Bayesian Matrix Factorization (BPMF) (Salakhutdinov and Mnih, ICML 2008) • Let r_ui ~ N(p_uᵀq_i, σ²) • No PMF-type regularization • p_u ~ N(μ_P, Λ_P⁻¹) and q_i ~ N(μ_Q, Λ_Q⁻¹) • Priors for σ², μ_P, μ_Q, Λ_P, Λ_Q • Fit by Gibbs sampling • Substantial reduction in prediction error relative to PMF with L2 regularization
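[One ingredient of that Gibbs sampler, as a sketch: the conditional for p_u given everything else is Gaussian, so it can be drawn exactly. Names and argument layout are my own; the full BPMF sampler alternates this with the symmetric item update and with draws of (μ_P, Λ_P), (μ_Q, Λ_Q) from their Gaussian-Wishart conditionals.]

```python
import numpy as np

def sample_user_factor(r_u, Q_u, mu_P, Lam_P, sigma2, rng):
    """One Gibbs draw of p_u from its Gaussian conditional.
    r_u: user u's ratings; Q_u: matching item factors (one row per rating)."""
    prec = Lam_P + (Q_u.T @ Q_u) / sigma2              # posterior precision
    cov = np.linalg.inv(prec)
    mean = cov @ (Lam_P @ mu_P + (Q_u.T @ r_u) / sigma2)
    return rng.multivariate_normal(mean, cov)
```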
Research Questions • Should we use separate λ_P and λ_Q? • Should we use k separate reg. parameters for each dimension of P and Q? • Should we use the trace norm for regularization? • Does BPMF "regularize" appropriately?
Matrix Factorization with Biases • Let m_ui = μ + a_u + b_i + p_uᵀq_i • Regularization similar to before • Minimize Σ_{(u,i) observed} (r_ui − m_ui)² + λ(‖a‖² + ‖b‖² + ‖P‖² + ‖Q‖²) • or Σ_{(u,i) observed} (r_ui − m_ui)² + λ_a‖a‖² + λ_b‖b‖² + λ_P‖P‖² + λ_Q‖Q‖²
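[A sketch of one SGD update for the biased model with the separate-λ penalty; illustrative only (names lam_a, lam_b, lam_PQ are mine, and μ is held fixed, e.g. at the global training mean).]

```python
import numpy as np

def sgd_step_with_biases(u, i, r, mu, a, b, P, Q, lr=0.01,
                         lam_a=0.1, lam_b=0.1, lam_PQ=0.1):
    """One SGD update for m_ui = mu + a_u + b_i + p_u . q_i with separate
    penalties on user biases, item biases, and factors."""
    err = r - (mu + a[u] + b[i] + P[u] @ Q[i])
    a[u] += lr * (err - lam_a * a[u])
    b[i] += lr * (err - lam_b * b[i])
    pu = P[u].copy()                       # pre-update copy for Q's step
    P[u] += lr * (err * Q[i] - lam_PQ * pu)
    Q[i] += lr * (err * pu - lam_PQ * Q[i])
```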
Research Questions • Should we use separate λ_P and λ_Q? • Should we use k separate reg. parameters for each dimension of P and Q? • Should we use the trace norm for regularization? • Does BPMF "regularize" appropriately? • Should we use separate λ's for the biases?
Some Things this Talk Will Not Cover • Various extensions of PMF • Combining explicit and implicit feedback • Time varying factors • Non-negative matrix factorization • L1 regularization • λ's depending on user or item sample sizes • Efficiency of optimization algorithms • Use Newton's method, each coordinate separately • Iterate to convergence
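[One plausible reading of "Newton's method, each coordinate separately", sketched below: with squared loss and an L2 penalty, the objective is quadratic in any single coordinate, so one Newton step lands on that coordinate's exact minimizer, and "iterate to convergence" means cycling over coordinates. This is my reconstruction, not the authors' code.]

```python
import numpy as np

def newton_coord_update(p_u, f, r_u, Q_u, lam):
    """Exact update of coordinate p_uf for squared loss + L2 penalty.
    Q_u: factors of the items user u rated (rows); r_u: those ratings."""
    resid = r_u - Q_u @ p_u + Q_u[:, f] * p_u[f]  # residual excluding coord f
    p_u[f] = (Q_u[:, f] @ resid) / (Q_u[:, f] @ Q_u[:, f] + lam)
```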
No Need for Separate λ_P and λ_Q • M = (cP)ᵀ(c⁻¹Q) is invariant for c ≠ 0 • For initial P and Q • Solve for c to minimize λ_P‖cP‖² + λ_Q‖c⁻¹Q‖² • c = (λ_Q‖Q‖² / λ_P‖P‖²)^(1/4) • Gives minimized penalty 2(λ_Pλ_Q)^(1/2)‖P‖‖Q‖, which depends only on the product λ_Pλ_Q • Sufficient to let λ_P = λ_Q = λ_PQ
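[A quick numeric check of the slide's rescaling argument (my own sketch): predictions are unchanged by c, and at the optimal c the penalty collapses to a function of λ_Pλ_Q alone.]

```python
import numpy as np

rng = np.random.default_rng(2)
P = rng.standard_normal((3, 10))   # k-by-n, as on the slides
Q = rng.standard_normal((3, 8))    # k-by-m
lam_P, lam_Q = 0.5, 2.0

# Optimal rescaling from minimizing lam_P*||cP||^2 + lam_Q*||Q/c||^2 over c
c = (lam_Q * np.sum(Q**2) / (lam_P * np.sum(P**2))) ** 0.25

print(np.allclose(P.T @ Q, (c * P).T @ (Q / c)))   # M invariant under c

penalty = lam_P * np.sum((c * P)**2) + lam_Q * np.sum((Q / c)**2)
target = 2 * np.sqrt(lam_P * lam_Q) * np.sqrt(np.sum(P**2) * np.sum(Q**2))
print(np.isclose(penalty, target))  # minimized penalty: 2*sqrt(lam_P*lam_Q)*||P||*||Q||
```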
Bayesian Motivation for L2 Regularization • Simplest case: only one item • R is n-by-1 • r_u1 = a_1 + ε_u, a_1 ~ N(0, σ_a²), ε_u ~ N(0, σ²) • Posterior mean (or MAP) of a_1 satisfies the penalized least squares criterion with • λ_a = σ²/σ_a² • Best λ is inversely proportional to σ_a²
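[The resulting MAP estimate in closed form: the sample mean shrunk by λ_a = σ²/σ_a². A tiny sketch; the function name is mine.]

```python
import numpy as np

def map_bias(ratings, sigma2, sigma2_a):
    """MAP estimate of a_1 from n noisy ratings of a single item:
    sum(r) / (n + lam), with lam = sigma2 / sigma2_a."""
    lam = sigma2 / sigma2_a
    return np.sum(ratings) / (len(ratings) + lam)

# Larger prior variance sigma2_a -> smaller lam -> less shrinkage toward 0
print(map_bias(np.array([1.0, 0.6, 0.8]), sigma2=1.0, sigma2_a=0.09))
```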
Implications for Regularization of PMF • Allow λ_a ≠ λ_b • If σ_a² ≠ σ_b² • Allow λ_a ≠ λ_b ≠ λ_PQ • Allow λ_PQ1 ≠ λ_PQ2 ≠ … ≠ λ_PQk? • Trace norm does not • BPMF appears to
Simulation Experiment Structure • n = 2,500 users, m = 400 items • 250,000 observed ratings • 150,000 in Training (to estimate a, b, P, Q) • 50,000 in Validation (to tune λ's) • 50,000 in Test (to estimate MSE) • Substantial imbalance in ratings • 8 to 134 ratings per user in Training data • 33 to 988 ratings per item in Training data
Simulation Model • r_ui = a_u + b_i + p_u1 q_i1 + p_u2 q_i2 + ε_ui • Elements of a, b, P, Q, and ε • Independent normals with mean 0 • Var(a_u) = 0.09 • Var(b_i) = 0.16 • Var(p_u1 q_i1) = 0.04 • Var(p_u2 q_i2) = 0.01 • Var(ε_ui) = 1.00
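[A generator for data from this model, as a sketch. For a product of independent zero-mean normals, Var(pq) = Var(p)Var(q), so giving each factor standard deviation (target variance)^(1/4) hits the slide's factor-product variances. Two deliberate simplifications flagged in comments: (u, i) pairs are sampled uniformly (the talk's design was imbalanced), and splitting the draws 150k/50k/50k per the previous slide is left to the caller.]

```python
import numpy as np

def simulate_ratings(n=2500, m=400, n_obs=250_000, seed=0):
    """Draw (u, i, r_ui, m_ui) from the slide's model (illustrative)."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0, np.sqrt(0.09), n)              # user biases
    b = rng.normal(0, np.sqrt(0.16), m)              # item biases
    sd = np.array([0.04, 0.01]) ** 0.25              # per-factor std devs
    P = rng.normal(0, sd, (n, 2))
    Q = rng.normal(0, sd, (m, 2))
    u = rng.integers(0, n, n_obs)                    # uniform, unlike the talk's
    i = rng.integers(0, m, n_obs)                    # imbalanced design
    mean = a[u] + b[i] + np.sum(P[u] * Q[i], axis=1)  # m_ui = E(r_ui)
    r = mean + rng.normal(0, 1.0, n_obs)             # Var(eps) = 1.00
    return u, i, r, mean
```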
Evaluation • Test MSE for estimation of m_ui = E(r_ui) • MSE = (1/|Test|) Σ_{(u,i) in Test} (m̂_ui − m_ui)² • Limitations • Not real data • Only one replication • No standard errors
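[The criterion above in one line; note it scores estimates against the noise-free means m_ui rather than the noisy ratings, which is only possible because the data are simulated.]

```python
import numpy as np

def test_mse(m_hat, m_true):
    """Mean squared error of estimated means against the true m_ui."""
    return float(np.mean((m_hat - m_true) ** 2))
```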
Results for Matrix Completion • Performs poorly on raw ratings • MSE = .0693 • Not designed to estimate biases • Fit to residuals from PMF with k = 0 • MSE = .0477 • “Recovered” rank was 1 • Worse than MSE’s from PMF: .0428 to .0439
Results for BPMF • Raw ratings • MSE = .0498, using k = 3 • Early stopping • Not designed to estimate biases • Fit to residuals from PMF with k = 0 • MSE = .0433, using k = 2 • Near .0428, for best PMF w/ biases
Summary • No need for separate λ_P and λ_Q • Theory suggests using separate λ's for distinct sets of exchangeable parameters • Biases vs. factors • For individual factors • Tentative simulation results support need for separate λ's across factors • BPMF does so automatically • PMF requires a way to do efficient tuning