Tea Talk: Weighted Low Rank Approximations
Ben Marlin
Machine Learning Group, Department of Computer Science, University of Toronto
April 30, 2003
Paper Details:
Title: Weighted Low Rank Approximations
Authors: Nathan Srebro, Tommi Jaakkola (MIT)
Submitted: ICML 2003
URL: http://www.ai.mit.edu/~nati/LowRank/icml.pdf
Motivation:
Missing Data: Weighted LRA naturally handles data matrices with missing elements by using a 0/1 weight matrix.
Noisy Data: Weighted LRA naturally handles data matrices with a different noise variance estimate σ_ij for each element of the matrix by setting W_ij = 1/σ_ij.
The Problem:
Given an n×m data matrix D and an n×m weight matrix W, construct a rank-K approximation X = UV' to D that minimizes the error in the weighted Frobenius norm E_WF.
[Diagram: D (n×m) and W (n×m); X = UV' with U (n×K) and V' (K×m).]
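For concreteness, the objective being minimized is E_WF(X) = Σ_ij W_ij (D_ij − X_ij)². A minimal MATLAB sketch of evaluating it (the sizes and random inputs here are illustrative only):

    n = 5; m = 4; K = 2;
    D = randn(n, m); W = rand(n, m);     % data matrix and weight matrix
    U = randn(n, K); V = randn(m, K);    % rank-K factors
    X = U*V';                            % rank-K approximation
    EWF = sum(sum(W .* (D - X).^2));     % weighted Frobenius error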
Relationship to Standard SVD:
Adding the requirement that U and V are orthogonal results in a weighted low rank approximation analogous to SVD.
Critical points of E_WF can be local minima that are not global minima.
wSVD does not admit a solution based on eigenvectors of the data matrix D.
Optimization Approach: Main Idea:
For a given V, the optimal U_V* can be calculated analytically, as can the gradient of the projected objective function E_WF*(V) = E_WF(U_V*, V). Thus, perform gradient descent on E_WF*(V).
Here d(W_i) is the m×m matrix with the i-th row of W along the diagonal, and D_i is the i-th row of D (see the sketch below).
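The analytic solution referenced above is presumably the row-wise weighted least-squares solution U_i* = D_i d(W_i) V (V' d(W_i) V)^(-1); a MATLAB sketch under that assumption (Ustar and the random inputs are illustrative names only):

    n = 5; m = 4; K = 2;
    D = randn(n, m); W = rand(n, m); V = randn(m, K);
    Ustar = zeros(n, K);
    for i = 1:n
        dWi = diag(W(i, :));                               % d(W_i): m x m diagonal matrix
        Ustar(i, :) = D(i, :) * dWi * V / (V' * dWi * V);  % U_i* = D_i d(W_i) V (V' d(W_i) V)^-1
    end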
Missing Value Approach: Main Idea:
Consider a model of the data matrix given by D = X + Z, where Z is white Gaussian noise. The weighted cost of X is equivalent to the log-likelihood of the observed variables. This suggests an EM approach: in the E-step the missing values in D are filled in according to the values in X, creating a matrix F; in the M-step X is re-estimated as the rank-K SVD of F.
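As a quick check of that equivalence (assuming 0/1 weights and unit-variance noise Z): −log p(D_observed | X) = ½ Σ_{(i,j): W_ij = 1} (D_ij − X_ij)² + const = ½ E_WF(X) + const, so minimizing the weighted cost is exactly maximum likelihood over the observed entries.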
Missing Value Approach: Extension to General Weights:
Consider a system with several data matrices D_n = X + Z_n, where the Z_n are independent Gaussian white noise. The maximum likelihood X in this case is found by taking the rank-K SVD of the mean of the F_n's.
Now consider a weighted rank-K approximation problem where W_ij = w_ij/N and w_ij ∈ {1, …, N}. Such a problem can be converted to the type of problem described above by observing D_ij in w_ij of a total of N matrices D_n. For any N, the mean of the N matrices F_n is given (element-wise) by:
F̄_ij = W_ij D_ij + (1 − W_ij) X_ij
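A small MATLAB sketch (illustrative names and sizes) verifying that averaging the N filled-in matrices F_n reproduces the weighted fill-in rule above:

    N = 10; n = 3; m = 2;
    w = randi([1 N], n, m);  W = w / N;        % integer counts and the resulting weights
    D = randn(n, m); X = randn(n, m);
    Fbar = zeros(n, m);
    for k = 1:N
        obs = (k <= w);                        % the k-th copy observes D_ij iff k <= w_ij
        Fbar = Fbar + (obs.*D + ~obs.*X) / N;  % E-step fill-in for that copy, averaged
    end
    max(max(abs(Fbar - (W.*D + (1-W).*X))))    % essentially zero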
Missing Value Approach: EM Algorithm:
This approach yields an extremely simple EM algorithm:
E-Step: F_t = W∘D + (1 − W)∘X_t (fill in the unobserved entries from the current estimate).
M-Step: Obtain U, S, V from the SVD of F_t, keep the top K singular values, and set X_{t+1} = U S V'.

MATLAB implementation:

    function X = wsvd(D, W, K)
        % Weighted rank-K approximation of D via the EM iteration above.
        X = zeros(size(D));
        Xold = inf*ones(size(D));
        while (sum(sum((X - Xold).^2)) > eps)
            Xold = X;
            [U, S, V] = svd(W.*D + (1-W).*X);   % E-step: fill in, then factor
            S(K+1:end, K+1:end) = 0;            % keep only the top K singular values
            X = U*S*V';                         % M-step: rank-K reconstruction
        end
    end
Example: Synthetic Rank-2 Matrix:

    Data (unobserved entries blank)    Weights     wSVD, K=2
    0.92  0.75   -    0.33             1 1 0 1     0.93  0.75  0.58  0.33
    0.98  0.90   -    0.38             1 1 0 1     0.98  0.90  0.49  0.38
    1.19  1.05  0.65  0.45             1 1 1 1     1.19  1.05  0.65  0.45
     -    0.58  0.30   -               0 1 1 0     0.62  0.58  0.30  0.25
    1.06  0.86  0.86   -               1 1 1 0     1.06  0.86  0.86  0.37
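A sketch of running a similar experiment with the wsvd function above (random data, not the exact matrix from the slide):

    n = 5; m = 4; K = 2;
    Dtrue = rand(n, K) * rand(K, m);     % synthetic rank-2 data matrix
    W = double(rand(n, m) > 0.25);       % 0/1 weights: roughly 75% of entries observed
    X = wsvd(Dtrue, W, K);               % weighted rank-2 approximation
    disp([Dtrue(:) X(:)])                % recovered entries should track the true values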