Dimensionality Reduction
• High-dimensional == many features
• Find concepts/topics/genres:
  • Documents: features are thousands of words, millions of word pairs
  • Surveys: Netflix, 480k users x 177k movies
Slides by Jure Leskovec
Dimensionality Reduction
• Compress / reduce dimensionality:
  • 10^6 rows; 10^3 columns; no updates
  • random access to any cell(s); small error: OK
Dimensionality Reduction
• Assumption: data lies on or near a low d-dimensional subspace
• Axes of this subspace are an effective representation of the data
Why Reduce Dimensions?
• Discover hidden correlations/topics
  • Words that occur commonly together
• Remove redundant and noisy features
  • Not all words are useful
• Interpretation and visualization
• Easier storage and processing of the data
SVD - Definition
A[m x n] = U[m x r] Σ[r x r] (V[n x r])^T
• A: input data matrix; m x n matrix (e.g., m documents, n terms)
• U: left singular vectors; m x r matrix (m documents, r concepts)
• Σ: singular values; r x r diagonal matrix (strength of each ‘concept’); r is the rank of the matrix A
• V: right singular vectors; n x r matrix (n terms, r concepts)
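This factorization can be sketched directly with NumPy's thin SVD (the small data matrix below is illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data matrix: m=4 "documents" x n=3 "terms" (values assumed)
A = np.array([[1.0, 1.0, 0.0],
              [3.0, 3.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 1.0]])

# Thin ("economy") SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)   # (4, 3) (3,) (3, 3)

# Reconstruction check: the product of the factors gives back A
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True
```

`full_matrices=False` returns the economy-size factors matching the m x r, r x r, n x r shapes in the definition above.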
[Diagram: A (m x n) = U (m x r) · Σ (r x r) · V^T (r x n)]
SVD can equivalently be written as a sum of rank-1 terms: A = σ1 u1 v1^T + σ2 u2 v2^T + …, where each σi is a scalar and each ui, vi is a vector.
SVD - Properties
It is always possible to decompose a real matrix A into A = U Σ V^T, where
• U, Σ, V: unique
• U, V: column orthonormal: U^T U = I; V^T V = I (I: identity matrix); columns are orthogonal unit vectors
• Σ: diagonal; entries (singular values) are nonnegative and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0)
SVD - Example: Users-to-Movies
A = U Σ V^T: rows of A are users, columns are the movies Matrix, Alien, Serenity, Casablanca, and Amelie; the first rows are SciFi fans, the remaining rows Romance fans. Two ‘concepts’ emerge: a SciFi-concept and a Romance-concept.
• U is the ‘user-to-concept’ similarity matrix
• Σ gives the ‘strength’ of each concept (e.g., the strength of the SciFi-concept)
• V is the ‘movie-to-concept’ similarity matrix
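A minimal sketch of this example, assuming a block-structured ratings matrix of this flavor (the exact values on the slides are not reproduced here):

```python
import numpy as np

# Hypothetical ratings; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],    # SciFi fans rate only SciFi movies
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],    # Romance fans rate only romance movies
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Only two singular values are non-negligible: two 'concepts'
print(np.round(s, 2))

# First right singular vector (SciFi-concept) loads only on the
# first three movie columns (up to an overall sign)
print(np.round(Vt[0], 2))
```

Because the matrix is exactly block-structured, the SciFi and Romance concepts fall out cleanly; real ratings data would give approximate versions of the same structure.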
SVD - Interpretation #1
‘movies’, ‘users’ and ‘concepts’:
• U: user-to-concept similarity matrix
• V: movie-to-concept similarity matrix
• Σ: its diagonal elements give the ‘strength’ of each concept
SVD - Interpretation #2
SVD gives the best axis to project on: ‘best’ = minimum sum of squares of projection errors, i.e., minimum reconstruction error. [Plot: user ratings in the (Movie 1 rating, Movie 2 rating) plane; v1 is the first right singular vector]
SVD - Interpretation #2
A = U Σ V^T: the points project onto the first right singular vector v1, and σ1 measures the variance (‘spread’) along the v1 axis.
SVD - Interpretation #2, more details
Q: How exactly is dimensionality reduction done?
A: Set the smallest singular values to zero. This yields a low-rank matrix B ≈ A built from the truncated U, Σ, and V^T, and the error is small in the Frobenius norm:
• Frobenius norm: ǁMǁF = √(Σij Mij^2)
• ǁA - BǁF = √(Σij (Aij - Bij)^2) is ‘small’
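A sketch of the truncation step, assuming synthetic rank-2 data plus a little noise; it also checks that the squared Frobenius error of the rank-k truncation equals the sum of the discarded squared singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: a rank-2 signal plus small noise
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 10)) \
    + 0.01 * rng.normal(size=(20, 10))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
# Zero out all but the k largest singular values (keep k leading factors)
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(A - B, 'fro')           # ǁA - BǁF
print(err)                                    # small: only the noise is lost
# Squared error == sum of the discarded squared singular values
print(np.isclose(err**2, np.sum(s[k:]**2)))   # True
```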
SVD - Best Low Rank Approximation
Theorem: Let A = U Σ V^T (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r) and let B = U S V^T, where S is a diagonal matrix with si = σi for i = 1…k and si = 0 otherwise. Then B is a best rank-k approximation to A: B is a solution to minB ǁA - BǁF over rank(B) = k.
Why? We need two facts:
• ǁMǁF^2 = Σi qii^2 when M = P Q R with P column orthonormal, R row orthonormal, and Q diagonal
• U Σ V^T - U S V^T = U (Σ - S) V^T
Applying both facts, ǁA - BǁF^2 = ǁU (Σ - S) V^TǁF^2 = Σi (σi - si)^2. To minimize this with at most k nonzero si, we set si = σi for the k largest singular values (i = 1…k) and si = 0 otherwise.
SVD - Interpretation #2
Equivalent: ‘spectral decomposition’ of the matrix as a sum of rank-1 terms:
A = σ1 u1 v1^T + σ2 u2 v2^T + …
where each ui is m x 1 and each vi^T is 1 x n. Assume σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0.
Why is setting the small σs to zero the thing to do? The vectors ui and vi are unit length, so σi scales each term; zeroing small σs introduces the least error.
SVD - Interpretation #2
Q: How many σs to keep?
A: Rule of thumb: keep enough singular values to retain 80-90% of the ‘energy’ (= Σi σi^2).
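The rule of thumb can be sketched as follows, assuming synthetic rank-3 data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data of (numerical) rank 3
A = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 30))

s = np.linalg.svd(A, compute_uv=False)       # singular values only
energy = np.cumsum(s**2) / np.sum(s**2)      # cumulative energy fraction

# Smallest k retaining at least 90% of the energy
k = int(np.searchsorted(energy, 0.90)) + 1
print(k)   # at most 3 here, since the data is rank 3
```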
SVD - Complexity
• To compute the SVD: O(nm^2) or O(n^2 m), whichever is less
• But less work is needed:
  • if we just want the singular values
  • or if we want only the first k singular vectors
  • or if the matrix is sparse
• Implemented in linear algebra packages like LINPACK, Matlab, S-Plus, Mathematica, …
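For sparse matrices where only the top k singular triplets are wanted, SciPy's `scipy.sparse.linalg.svds` avoids the full O(nm^2) computation (the matrix below is random and purely illustrative):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# A large, sparse "ratings"-style matrix: 1% of entries are nonzero
A = sparse_random(1000, 500, density=0.01, random_state=0)

# Truncated SVD: only the k largest singular values/vectors are computed
k = 5
U, s, Vt = svds(A, k=k)   # note: svds returns σs in ascending order

print(U.shape, s.shape, Vt.shape)   # (1000, 5) (5,) (5, 500)
```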
SVD - Conclusions so far
• SVD: A = U Σ V^T: unique
  • U: user-to-concept similarities
  • V: movie-to-concept similarities
  • Σ: strength of each concept
• Dimensionality reduction: keep the few largest singular values (80-90% of the ‘energy’)
• SVD picks up linear correlations
Case study: How to query?
Q: Find users that like ‘Matrix’
A: Map the query into ‘concept space’. How? A query q is a vector over the movies (Matrix, Alien, Serenity, Casablanca, Amelie); project it into concept space by taking the inner product of q with each ‘concept’ vector vi (e.g., q·v1 for the first concept). [Plot: q in the (Matrix, Alien) plane, projected onto v1]
Case study: How to query?
Compactly, we have: qconcept = q V, where V holds the movie-to-concept similarities. E.g., a query q that rates only ‘Matrix’ maps onto the SciFi-concept.
Case study: How to query?
How would a user d that rated (‘Alien’, ‘Serenity’) be handled? The same way: dconcept = d V, again using the movie-to-concept similarities in V.
Case study: How to query?
Observation: user d, who rated (‘Alien’, ‘Serenity’), turns out to be similar to the query “user” q, who rated only (‘Matrix’), although d did not rate ‘Matrix’: in movie space their similarity is 0, but in concept space their similarity is ≠ 0 (both load on the SciFi-concept).
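A sketch of this observation, assuming a block-structured ratings matrix (all values illustrative):

```python
import numpy as np

# Hypothetical ratings; columns: Matrix, Alien, Serenity, Casablanca, Amelie
A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 0, 0, 2, 2]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt[:2].T                       # keep the 2 strongest concepts

q = np.array([5.0, 0, 0, 0, 0])    # query user: rated only 'Matrix'
d = np.array([0, 4.0, 5.0, 0, 0])  # user d: rated only 'Alien', 'Serenity'

print(q @ d)                       # 0.0: no overlap in movie space
q_c, d_c = q @ V, d @ V            # map both into concept space
print(q_c @ d_c)                   # positive: both load on the SciFi-concept
```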
SVD: Drawbacks
• Optimal low-rank approximation: in Frobenius norm
• Interpretability problem: a singular vector specifies a linear combination of all input columns or rows
• Lack of sparsity: singular vectors are dense!