
Tea Talk: Weighted Low Rank Approximations

Presentation Transcript


  1. Tea Talk: Weighted Low Rank Approximations. Ben Marlin, Machine Learning Group, Department of Computer Science, University of Toronto. April 30, 2003.

  2. Paper Details: Title: Weighted Low Rank Approximations. Authors: Nathan Srebro, Tommi Jaakkola (MIT). Submitted: ICML 2003. URL: http://www.ai.mit.edu/~nati/LowRank/icml.pdf

  3. Motivation: Missing Data: Weighted LRA naturally handles data matrices with missing elements by using a 0/1 weight matrix. Noisy Data: Weighted LRA naturally handles data matrices with a different noise variance estimate σij for each element of the matrix by setting Wij = 1/σij.
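     As a concrete illustration of the 0/1 weighting for missing data, a hypothetical MATLAB fragment (it assumes missing entries of D are stored as NaN; this is not from the slides):

     W = double(~isnan(D));   % 1 where an entry is observed, 0 where it is missing
     D(isnan(D)) = 0;         % the stored value at missing positions no longer matters once Wij = 0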

  4. The Problem: Given an n x m data matrix D and an n x m weight matrix W, construct a rank-K approximation X = UV' to D that minimizes the error in the weighted Frobenius norm EWF. (Figure: D, W, and X are n x m matrices; U is n x K; V' is K x m.)
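     Written out elementwise, the weighted Frobenius error being minimized is:

     E_{WF}(U, V) \;=\; \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} \bigl( D_{ij} - (U V^{\top})_{ij} \bigr)^{2}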

  5. Relationship to standard SVD: Adding the requirement that U and V are orthogonal results in a weighted low rank approximation analogous to SVD. However, critical points of EWF can be local minima that are not global minima, and wSVD does not admit a solution based on eigenvectors of the data matrix D.

  6. Optimization Approach: Main Idea: For a given V, the optimal UV* can be calculated analytically, as can the gradient of the projected objective function E*WF(V) = EWF(UV*, V); gradient descent is therefore performed on E*WF(V). Here d(Wi) is the m x m diagonal matrix with the ith row of W along the diagonal and Di is the ith row of D (the per-row solution is sketched below).
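     One way to write the analytic solution for U referred to above, using the d(Wi) and Di defined on the slide (each row of U solves an independent weighted least-squares problem; this is a sketch consistent with those definitions, not a quotation of the paper's formula):

     (U_V^*)_i \;=\; D_i \, d(W_i) \, V \, \bigl( V^{\top} d(W_i) \, V \bigr)^{-1}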

  7. Missing Value Approach: Main Idea: Consider a model of the data matrix given by D = X + Z, where Z is white Gaussian noise. The weighted cost of X is equivalent to the log-likelihood of the observed variables. This suggests an EM approach: in the E-step, the missing values in D are filled in according to the values in X, creating a matrix F; in the M-step, X is re-estimated as the rank-K SVD of F.
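     In symbols, for a 0/1 weight matrix the two steps read as follows (SVD_K denotes the rank-K truncated SVD; the superscript t indexing EM iterations is a notational choice, not taken from the slide):

     F^{(t)}_{ij} = \begin{cases} D_{ij}, & W_{ij} = 1 \\ X^{(t)}_{ij}, & W_{ij} = 0 \end{cases}
     \qquad\qquad
     X^{(t+1)} = \mathrm{SVD}_K\bigl( F^{(t)} \bigr)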

  8. Missing Value Approach: Extension to General Weights: Consider a system with several data matrices Dn = X + Zn, where the Zn are independent Gaussian white noise. The maximum likelihood X in this case is found by taking the rank-K SVD of the mean of the Fn's. Now consider a weighted rank-K approximation problem where Wij = wij/N and wij ∈ {1, …, N}. Such a problem can be converted to the type of problem described above by observing Dij in wij of a total of N Dn's. For any N, the mean F of the N matrices Fn is given by Fij = Wij Dij + (1 - Wij) Xij (a short derivation follows below).
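     A short check of that expression, assuming the observation scheme just described (entry (i, j) is observed as Dij in wij of the N filled-in matrices and filled in as Xij in the remaining N - wij):

     \bar{F}_{ij} \;=\; \frac{w_{ij} D_{ij} + (N - w_{ij}) X_{ij}}{N} \;=\; W_{ij} D_{ij} + (1 - W_{ij}) X_{ij}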

  9. Missing Value Approach: EM Algorithm: This approach yields an extremely simple EM algorithm.
     E-Step: F(t+1) = W.*D + (1-W).*X(t), i.e. observed entries are taken from D and the rest are filled in from the current estimate X(t).
     M-Step: obtain U, S, V from the SVD of F(t+1), zero all but the top K singular values, and set X(t+1) = U*S*V'.

     function X = wsvd(D, W, K)
     % Weighted rank-K approximation of D by EM: fill in, truncate, repeat.
     X    = zeros(size(D));
     Xold = inf * ones(size(D));
     while (sum(sum((X - Xold).^2)) > eps)
         Xold = X;
         [U, S, V] = svd(W.*D + (1-W).*X);   % E-step: F = W.*D + (1-W).*X
         S(K+1:end, K+1:end) = 0;            % keep only the top K singular values
         X = U * S * V';                     % M-step: rank-K reconstruction
     end
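     A minimal usage sketch for the wsvd routine above, on hypothetical data (a random rank-2 matrix with a random 0/1 mask; not the example from the talk):

     D = rand(5, 2) * rand(2, 4);    % synthetic rank-2, 5 x 4 data matrix
     W = double(rand(5, 4) > 0.2);   % 0/1 weights: roughly 20% of entries treated as missing
     X = wsvd(D, W, 2);              % weight-0 entries are ignored by the fit and filled in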

  10. Example: Synthetic Rank 2 Matrix (? marks a missing entry, i.e. weight 0):

      Data                    Weights     wSVD (K=2)
      0.92 0.75   ?  0.33     1 1 0 1     0.93 0.75 0.58 0.33
      0.98 0.90   ?  0.38     1 1 0 1     0.98 0.90 0.49 0.38
      1.19 1.05 0.65 0.45     1 1 1 1     1.19 1.05 0.65 0.45
        ?  0.58 0.30   ?      0 1 1 0     0.62 0.58 0.30 0.25
      1.06 0.86 0.86   ?      1 1 1 0     1.06 0.86 0.86 0.37

  11. The End
