
Dimensionality Reduction: SVD & CUR


Presentation Transcript


  1. Note to other teachers and users of these slides: We would be delighted if you found our material useful for your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org Dimensionality Reduction: SVD & CUR Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org

  2. Dimensionality Reduction • Assumption: Data lies on or near a low d-dimensional subspace • Axes of this subspace are an effective representation of the data J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  3. Dimensionality Reduction • Compress / reduce dimensionality: • 10^6 rows; 10^3 columns; no updates • Random access to any cell(s); small error: OK The example matrix is really “2-dimensional”: all rows can be reconstructed by scaling [1 1 1 0 0] or [0 0 0 1 1] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  4. Rank of a Matrix • Q: What is the rank of a matrix A? • A: Number of linearly independent columns of A • For example, the matrix A with rows [1 2 1], [-2 -3 1], [3 5 0] has rank r=2 • Why? The first two rows are linearly independent, so the rank is at least 2, but all three rows are linearly dependent (the first is equal to the sum of the second and third) so the rank must be less than 3. • Why do we care about low rank? • We can write A in terms of two “basis” vectors: [1 2 1], [-2 -3 1] • The new coordinates of the three rows are then: [1 0], [0 1], [1 -1] (see the sketch below) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
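  [Aside, not part of the original slides: a minimal numpy sketch of this change of basis. The matrix A is reconstructed here from the row dependency stated above; the coordinates come out as [1 0], [0 1], [1 -1].]

      import numpy as np

      A = np.array([[ 1,  2, 1],
                    [-2, -3, 1],
                    [ 3,  5, 0]], dtype=float)
      basis = A[:2]                       # the two linearly independent rows

      # Solve coords @ basis = A for the new 2-D coordinates of each row
      x, *_ = np.linalg.lstsq(basis.T, A.T, rcond=None)
      coords = x.T                        # one coordinate pair per row of A
      print(np.round(coords, 6))          # [[1. 0.] [0. 1.] [1. -1.]]
      print(np.linalg.matrix_rank(A))     # 2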

  5. Rank is “Dimensionality” • Cloud of points A, B, C in 3D space • Think of the point positions as a matrix (1 row per point) • We can rewrite the coordinates more efficiently! • Old basis vectors: [1 0 0], [0 1 0], [0 0 1] • New basis vectors: [1 2 1], [-2 -3 1] • Then the points have new coordinates, A: [1 0], B: [0 1], C: [1 1] • Notice: We reduced the number of coordinates! J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  6. Dimensionality Reduction • Goal of dimensionality reduction is to discover the axis of the data! Rather than representing every point with 2 coordinates, we represent each point with 1 coordinate (corresponding to the position of the point on the red line). By doing this we incur a bit of error, as the points do not exactly lie on the line. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  7. Why Reduce Dimensions? • Discover hidden correlations/topics • Words that occur commonly together • Remove redundant and noisy features • Not all words are useful • Interpretation and visualization • Easier storage and processing of the data J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  8. SVD - Definition A[m×n] = U[m×r] Σ[r×r] (V[n×r])T • A: Input data matrix • m×n matrix (e.g., m documents, n terms) • U: Left singular vectors • m×r matrix (m documents, r concepts) • Σ: Singular values • r×r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix A) • V: Right singular vectors • n×r matrix (n terms, r concepts) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
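  [Aside, not part of the original slides: a minimal sketch of the reduced SVD in numpy, checking the shapes given above.]

      import numpy as np

      m, n = 7, 5
      A = np.random.rand(m, n)              # stand-in for an m-documents x n-terms matrix

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      Sigma = np.diag(s)                    # the diagonal matrix of singular values

      print(U.shape, Sigma.shape, Vt.shape) # (7, 5) (5, 5) (5, 5): m×r, r×r, r×n with r = min(m, n)
      print(np.allclose(A, U @ Sigma @ Vt)) # True: A = U Σ VT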

  9. SVD [Diagram: the m×n matrix A written as the product of U (m×r), Σ (r×r), and VT (r×n)] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  10. SVD [Diagram: A written as a sum of rank-one matrices] A ≈ σ1 u1 v1T + σ2 u2 v2T + … where each σi is a scalar and each ui and vi is a vector. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  11. SVD - Properties It is always possible to decompose a real matrix A into A = U Σ VT, where • U, Σ, V: unique • U, V: column orthonormal • UTU = I; VTV = I (I: identity matrix) • (Columns are orthogonal unit vectors) • Σ: diagonal • Entries (singular values) are positive, and sorted in decreasing order (σ1 ≥ σ2 ≥ … ≥ 0) Nice proof of uniqueness: http://www.mpi-inf.mpg.de/~bast/ir-seminar-ws04/lecture2.pdf J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
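  [Aside, not part of the original slides: a quick numerical check of these properties.]

      import numpy as np

      A = np.random.rand(6, 4)
      U, s, Vt = np.linalg.svd(A, full_matrices=False)

      print(np.allclose(U.T @ U, np.eye(U.shape[1])))    # True: UTU = I
      print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))) # True: VTV = I
      print(np.all(s[:-1] >= s[1:]), np.all(s >= 0))     # True True: sorted, nonnegative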

  12. n m SVD – Example: Users-to-Movies • A = U  VT - example: Users to Movies Matrix Alien Serenity Casablanca Amelie 1 1 1 0 0 3 3 3 0 0 4 4 4 0 0 5 5 5 0 0 0 2 0 4 4 0 0 0 5 5 0 1 0 2 2 SciFi VT  = Romnce U “Concepts” AKALatent dimensionsAKA Latent factors J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  13. SVD – Example: Users-to-Movies • A = U Σ VT - example:
      A =
        1 1 1 0 0
        3 3 3 0 0
        4 4 4 0 0
        5 5 5 0 0
        0 2 0 4 4
        0 0 0 5 5
        0 1 0 2 2
      U =
        0.13  0.02 -0.01
        0.41  0.07 -0.03
        0.55  0.09 -0.04
        0.68  0.11 -0.05
        0.15 -0.59  0.65
        0.07 -0.73 -0.67
        0.07 -0.29  0.32
      Σ =
        12.4  0    0
         0    9.5  0
         0    0    1.3
      VT =
        0.56  0.59  0.56  0.09  0.09
        0.12 -0.02  0.12 -0.69 -0.69
        0.40 -0.80  0.40  0.09  0.09
      J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
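  [Aside, not part of the original slides: recomputing this decomposition in numpy. Note that svd may flip the signs of matching columns of U and rows of VT; the factorization is otherwise the same up to rounding.]

      import numpy as np

      A = np.array([[1, 1, 1, 0, 0],
                    [3, 3, 3, 0, 0],
                    [4, 4, 4, 0, 0],
                    [5, 5, 5, 0, 0],
                    [0, 2, 0, 4, 4],
                    [0, 0, 0, 5, 5],
                    [0, 1, 0, 2, 2]], dtype=float)

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      print(np.round(s[:3], 1))      # [12.4  9.5  1.3]  (A has rank 3, so the rest are ~0)
      print(np.round(U[:, :3], 2))   # matches the slide's U up to sign
      print(np.round(Vt[:3], 2))     # matches the slide's VT up to sign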

  14. SVD – Example: Users-to-Movies • A = U Σ VT - example (same decomposition as above): the first concept is the SciFi-concept, the second is the Romance-concept. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  15. SVD – Example: Users-to-Movies • A = U Σ VT - example (same decomposition as above): U is the “user-to-concept” similarity matrix; its first column scores each user against the SciFi-concept, its second against the Romance-concept. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  16. SVD – Example: Users-to-Movies • A = U Σ VT - example (same decomposition as above): the diagonal of Σ gives the “strength” of each concept; e.g., 12.4 is the strength of the SciFi-concept. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  17. SVD – Example: Users-to-Movies • A = U Σ VT - example (same decomposition as above): V is the “movie-to-concept” similarity matrix; the first row of VT is the SciFi-concept. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  18. SVD - Interpretation #1 ‘movies’, ‘users’ and ‘concepts’: • U: user-to-concept similarity matrix • V: movie-to-concept similarity matrix • Σ: its diagonal elements give the ‘strength’ of each concept J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  19. Dimensionality Reduction with SVD

  20. SVD – Dimensionality Reduction • Instead of using two coordinates to describe point locations, let’s use only one coordinate • A point’s position is its location along the vector v1 • How to choose v1? Minimize the reconstruction error [Figure: scatter of (Movie 1 rating, Movie 2 rating) pairs with the first right singular vector v1 drawn through them] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  21. SVD – Dimensionality Reduction • Goal: Minimize the sum of reconstruction errors Σi Σj (xij - zij)² • where the xij are the “old” and the zij are the “new” coordinates • SVD gives the ‘best’ axis to project on: • ‘best’ = minimizing the sum of reconstruction errors (see the sketch below) [Figure: same ratings scatter with the first right singular vector v1] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
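  [Aside, not part of the original slides: a sketch of projecting 2-D points onto the first right singular vector; the rating data here is made up.]

      import numpy as np

      X = np.array([[1.0, 0.8],
                    [2.0, 2.1],
                    [3.0, 2.9],
                    [4.0, 4.2]])          # hypothetical (Movie 1, Movie 2) ratings

      U, s, Vt = np.linalg.svd(X, full_matrices=False)
      v1 = Vt[0]                           # first right singular vector

      z = X @ v1                           # the single "new" coordinate per point
      X_hat = np.outer(z, v1)              # reconstruction back in 2-D

      print(np.sum((X - X_hat) ** 2))      # the (small) sum of reconstruction errors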

  22. SVD - Interpretation #2 • A = U Σ VT - example (same decomposition as above): • V: “movie-to-concept” matrix • U: “user-to-concept” matrix [Figure: ratings scatter with the first right singular vector v1; axes: Movie 1 rating, Movie 2 rating] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  23. SVD - Interpretation #2 • A = U Σ VT - example (same decomposition as above): σ1 measures the variance (‘spread’) on the v1 axis [Figure: same ratings scatter with v1] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  24. SVD - Interpretation #2 A = U Σ VT - example: • U Σ gives the coordinates of the points in the projection axes • Projection of the users on the “SciFi” axis is the first column of
      U Σ =
        1.61  0.19 -0.01
        5.08  0.66 -0.03
        6.82  0.85 -0.05
        8.43  1.04 -0.06
        1.86 -5.60  0.84
        0.86 -6.93 -0.87
        0.86 -2.75  0.41
      [Figure: ratings scatter with v1]
      J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  25. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? (decomposition repeated from above: A = U Σ VT with σ = 12.4, 9.5, 1.3) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  26. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero (same decomposition as above) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  27. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero; here σ3 = 1.3 is set to 0, so the equality A = U Σ VT becomes an approximation A ≈ U Σ VT J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  28. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero. With σ3 = 0, the third column of U and the third row of VT no longer contribute and can be dropped. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  29. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero and keep only the first two columns of U, the top-left 2×2 block of Σ, and the first two rows of VT:
      A ≈ U' Σ' V'T with
      U' =
        0.13  0.02
        0.41  0.07
        0.55  0.09
        0.68  0.11
        0.15 -0.59
        0.07 -0.73
        0.07 -0.29
      Σ' =
        12.4  0
         0    9.5
      V'T =
        0.56  0.59  0.56  0.09  0.09
        0.12 -0.02  0.12 -0.69 -0.69
      J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  30. SVD - Interpretation #2 More details • Q: How exactly is dim. reduction done? • A: Set the smallest singular values to zero. The resulting rank-2 reconstruction is
      B =
        0.92  0.95  0.92  0.01  0.01
        2.91  3.01  2.91 -0.01 -0.01
        3.90  4.04  3.90  0.01  0.01
        4.82  5.00  4.82  0.03  0.03
        0.70  0.53  0.70  4.11  4.11
       -0.69  1.34 -0.69  4.78  4.78
        0.32  0.23  0.32  2.01  2.01
      which is close to the original A. Frobenius norm: ǁMǁF = √(Σij Mij²), and ǁA-BǁF = √(Σij (Aij-Bij)²) is “small”
      J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
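  [Aside, not part of the original slides: the truncation and its Frobenius error in numpy.]

      import numpy as np

      A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
                    [5, 5, 5, 0, 0], [0, 2, 0, 4, 4], [0, 0, 0, 5, 5],
                    [0, 1, 0, 2, 2]], dtype=float)

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2
      B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # zero all but the top-k singular values

      print(np.round(B, 2))                      # the matrix shown on the slide
      print(np.linalg.norm(A - B, 'fro'))        # ǁA-BǁF, equal to the dropped σ3 ≈ 1.3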

  31. SVD – Best Low Rank Approx. [Diagram: A = U Σ VT with all r singular values, versus B = U Σ VT with only the top singular values kept; B is the best approximation of A] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  32. SVD – Best Low Rank Approx. • Theorem: Let A = U Σ VT and B = U S VT where S = diagonal r×r matrix with si = σi (i = 1…k) else si = 0. Then B is a best rank(B) = k approximation to A. What do we mean by “best”: • B is a solution to minB ǁA-BǁF where rank(B) = k J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  33. SVD – Best Low Rank Approx. Details! • Theorem: Let A = U Σ VT (σ1 ≥ σ2 ≥ …, rank(A) = r), then B = U S VT • S = diagonal r×r matrix where si = σi (i = 1…k) else si = 0 is a best rank-k approximation to A: • B is a solution to minB ǁA-BǁF where rank(B) = k • We will need 2 facts: • ǁMǁF² = Σi qii² where M = P Q R is the SVD of M • U Σ VT - U S VT = U (Σ - S) VT J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  34. SVD – Best Low Rank Approx. Details! • We will need 2 facts: • ǁMǁF² = Σi qii² where M = P Q R is the SVD of M • U Σ VT - U S VT = U (Σ - S) VT We apply: P column orthonormal, R row orthonormal, Q diagonal J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  35. SVD – Best Low Rank Approx. Details! • A = U Σ VT, B = U S VT (σ1 ≥ σ2 ≥ … ≥ 0, rank(A) = r) • S = diagonal r×r matrix where si = σi (i = 1…k) else si = 0, then B is a solution to minB ǁA-BǁF, rank(B) = k • Why? By the two facts, ǁA-BǁF = ǁU (Σ - S) VTǁF = ǁΣ - SǁF • So we want to choose the si to minimize Σi (σi - si)² • Solution is to set si = σi (i = 1…k) and the other si = 0 • We used: U Σ VT - U S VT = U (Σ - S) VT J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  36. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: A = σ1 u1 v1T + σ2 u2 v2T (for the users-to-movies matrix above) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  37. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: A = σ1 u1 v1T + σ2 u2 v2T + … (k terms; each ui is an m×1 column and each viT a 1×n row) Assume: σ1 ≥ σ2 ≥ σ3 ≥ … ≥ 0. Why is setting small σi to 0 the right thing to do? Vectors ui and vi are unit length, so σi scales them. So, zeroing small σi introduces less error. J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
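  [Aside, not part of the original slides: building A term by term from its rank-one pieces; the error shrinks as each σi ui viT is added.]

      import numpy as np

      A = np.array([[1, 1, 1, 0, 0], [3, 3, 3, 0, 0], [4, 4, 4, 0, 0],
                    [5, 5, 5, 0, 0], [0, 2, 0, 4, 4], [0, 0, 0, 5, 5],
                    [0, 1, 0, 2, 2]], dtype=float)
      U, s, Vt = np.linalg.svd(A, full_matrices=False)

      B = np.zeros_like(A)
      for i in range(len(s)):
          B += s[i] * np.outer(U[:, i], Vt[i])   # add the i-th rank-one term
          print(i + 1, 'terms, error', round(np.linalg.norm(A - B, 'fro'), 3))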

  38. SVD - Interpretation #2 Q: How many σs to keep? A: Rule of thumb: keep enough to retain 80-90% of the ‘energy’ (= Σi σi²); see the sketch below [Diagram: A as the sum σ1 u1 v1T + σ2 u2 v2T + …] Assume: σ1 ≥ σ2 ≥ σ3 ≥ … J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
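  [Aside, not part of the original slides: picking the smallest k that retains 90% of the energy.]

      import numpy as np

      A = np.random.rand(50, 20)                  # stand-in data matrix
      s = np.linalg.svd(A, compute_uv=False)      # singular values only

      energy = np.cumsum(s ** 2) / np.sum(s ** 2)
      k = int(np.searchsorted(energy, 0.90)) + 1  # smallest k with >= 90% energy
      print(k, energy[k - 1])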

  39. SVD - Complexity • To compute the full SVD: • O(nm²) or O(n²m) (whichever is less) • But: • Less work, if we just want the singular values • or if we want only the first k singular vectors • or if the matrix is sparse • Implemented in linear algebra packages like • LINPACK, Matlab, SPlus, Mathematica ... J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
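  [Aside, not part of the original slides: for the sparse / first-k case, a truncated solver such as scipy's svds is far cheaper than a full SVD.]

      import numpy as np
      from scipy.sparse import random as sparse_random
      from scipy.sparse.linalg import svds

      A = sparse_random(10000, 1000, density=0.001, format='csr', random_state=0)
      U, s, Vt = svds(A, k=10)     # only the top-10 singular triplets
      print(s[::-1])               # svds returns singular values in ascending order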

  40. SVD - Conclusions so far • SVD: A = U Σ VT: unique • U: user-to-concept similarities • V: movie-to-concept similarities • Σ: strength of each concept • Dimensionality reduction: • keep the few largest singular values (80-90% of the ‘energy’) • SVD: picks up linear correlations J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  41. Relation to Eigen-decomposition • SVD gives us: • A = U Σ VT • Eigen-decomposition: • A = X L XT • A is symmetric • U, V, X are orthonormal (UTU = I) • L, Σ are diagonal • Now let’s calculate: • AAT = U Σ VT (U Σ VT)T = U Σ VT (V ΣT UT) = U Σ ΣT UT • ATA = V ΣT UT (U Σ VT) = V ΣT Σ VT J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  42. Relation to Eigen-decomposition • SVD gives us: • A = U Σ VT • Eigen-decomposition: • A = X L XT • A is symmetric • U, V, X are orthonormal (UTU = I) • L, Σ are diagonal • Now let’s calculate: • AAT = U Σ VT (U Σ VT)T = U Σ VT (V ΣT UT) = U Σ ΣT UT, which has the eigen-decomposition form X L² XT with X = U • ATA = V ΣT UT (U Σ VT) = V ΣT Σ VT, i.e., X L² XT with X = V This shows how to compute the SVD using eigenvalue decomposition! J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
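  [Aside, not part of the original slides: recovering the SVD from the eigen-decomposition of ATA, as the slide suggests. Fine for small dense matrices, though numerically less stable than calling svd directly.]

      import numpy as np

      A = np.random.rand(6, 4)

      evals, V = np.linalg.eigh(A.T @ A)        # A^T A = V Σ² V^T
      order = np.argsort(evals)[::-1]           # sort eigenvalues in decreasing order
      evals, V = evals[order], V[:, order]

      s = np.sqrt(np.clip(evals, 0, None))      # singular values
      U = (A @ V) / s                           # u_i = A v_i / σ_i

      print(np.allclose(A, U @ np.diag(s) @ V.T))                 # True
      print(np.allclose(s, np.linalg.svd(A, compute_uv=False)))   # True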

  43. Example of SVD & Conclusion

  44. Case study: How to query? • Q: Find users that like ‘Matrix’ • A: Map the query into a ‘concept space’ – how? (users-to-movies decomposition A = U Σ VT repeated from above) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  45. Case study: How to query? • Q: Find users that like ‘Matrix’ • A: Map the query into a ‘concept space’ – how? The query q is a rating vector over (Matrix, Alien, Serenity, Casablanca, Amelie) with a rating only for ‘Matrix’ [Figure: q plotted against the Matrix and Alien axes, with concept vectors v1, v2] • Project into concept space: inner product with each ‘concept’ vector vi J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  46. Case study: How to query? • Q: Find users that like ‘Matrix’ • A: Map the query into a ‘concept space’ – how? [Figure: the projection q·v1 of q onto the first concept vector v1] • Project into concept space: inner product with each ‘concept’ vector vi J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  47. Case study: How to query? Compactly, we have: qconcept = q V E.g.: for q = [5 0 0 0 0] (ratings over Matrix, Alien, Serenity, Casablanca, Amelie) and the movie-to-concept similarities
      V =
        0.56  0.12
        0.59 -0.02
        0.56  0.12
        0.09 -0.69
        0.09 -0.69
      we get qconcept = q V = [2.8 0.6], a strong score on the SciFi-concept.
      J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
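  [Aside, not part of the original slides: the mapping in numpy, using the rounded V from the slide.]

      import numpy as np

      V = np.array([[0.56,  0.12],
                    [0.59, -0.02],
                    [0.56,  0.12],
                    [0.09, -0.69],
                    [0.09, -0.69]])      # movie-to-concept (first two concepts)

      q = np.array([5, 0, 0, 0, 0])      # rated only 'Matrix'
      print(q @ V)                        # [2.8  0.6]: strong SciFi, weak Romance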

  48. Case study: How to query? • How would a user d that rated only (‘Alien’, ‘Serenity’) be handled? The same way: dconcept = d V E.g.: with the movie-to-concept similarities V as above, d maps to dconcept = [5.2 0.4] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

  49. Case study: How to query? • Observation: User d that rated (‘Alien’, ‘Serenity’) will be similar to user q that rated (‘Matrix’), although d and q have zero ratings in common! In concept space: d = [5.2 0.4], q = [2.8 0.6] • Zero ratings in common, but similarity ≠ 0 (see the sketch below) J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
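  [Aside, not part of the original slides: the cosine similarity of d and q in concept space, using the values above.]

      import numpy as np

      d = np.array([5.2, 0.4])
      q = np.array([2.8, 0.6])

      cos = d @ q / (np.linalg.norm(d) * np.linalg.norm(q))
      print(round(cos, 3))   # 0.991: both users lean strongly SciFi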

  50. SVD: Drawbacks • Optimal low-rank approximation in terms of Frobenius norm • Interpretability problem: • A singular vector specifies a linear combination of all input columns or rows • Lack of sparsity: • Singular vectors are dense! [Diagram: U, Σ, and VT are dense even when A is sparse] J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
