
Singular Value Decomposition and Data Management


  1. Singular Value Decomposition and Data Management

  2. SVD - Detailed outline • Motivation • Definition - properties • Interpretation • Complexity • Case studies • Additional properties

  3. SVD - Motivation • problem #1: text - LSI: find ‘concepts’ • problem #2: compression / dim. reduction

  4. SVD - Motivation • problem #1: text - LSI: find ‘concepts’

  5. SVD - Motivation • problem #2: compress / reduce dimensionality

  6. Problem - specs • ~10^6 rows; ~10^3 columns; no updates • random access to any cell(s); small error: OK

  7. SVD - Motivation

  8. SVD - Motivation

  9. SVD - Definition $A_{[n \times m]} = U_{[n \times r]} \, \Lambda_{[r \times r]} \, (V_{[m \times r]})^T$ • A: n x m matrix (e.g., n documents, m terms) • U: n x r matrix (n documents, r concepts) • Λ: r x r diagonal matrix (strength of each ‘concept’) (r: rank of the matrix) • V: m x r matrix (m terms, r concepts)
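To make the shapes in the definition concrete, here is a minimal sketch using NumPy's `np.linalg.svd` (NumPy is our choice of tool, not something the slides prescribe). The 4 x 3 matrix is hypothetical. NumPy returns min(n, m) components, so the sketch trims them to r = rank(A) to match $U_{[n \times r]}$, $\Lambda_{[r \times r]}$, $V_{[m \times r]}$.

```python
import numpy as np

# Hypothetical document-term matrix: n = 4 documents (rows), m = 3 terms (columns).
A = np.array([
    [1.0, 1.0, 0.0],
    [2.0, 2.0, 0.0],
    [0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0],
])

# Reduced SVD: U is n x k, s holds k singular values, Vt is k x m, with k = min(n, m).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only r = rank(A) components so the shapes match the slide.
r = int(np.linalg.matrix_rank(A))
U_r, Lam_r, V_r = U[:, :r], np.diag(s[:r]), Vt[:r, :].T

print(r, U_r.shape, Lam_r.shape, V_r.shape)   # prints: 2 (4, 2) (2, 2) (3, 2)
# A = U Lambda V^T, up to floating-point round-off.
print(np.allclose(A, U_r @ Lam_r @ V_r.T))
```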

  10. SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into $A = U \Lambda V^T$, where • U, Λ, V: unique (*) • U, V: column orthonormal (i.e., columns are unit vectors, orthogonal to each other) • $U^T U = I$; $V^T V = I$ (I: identity matrix) • Λ: singular values, non-negative and sorted in decreasing order
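A quick numerical check of the theorem's claims, sketched with NumPy on a random (hypothetical) 6 x 4 matrix. The last check flips the sign of one pair $(u_1, v_1)$ and still recovers A; this is one standard caveat to uniqueness and is not necessarily what the slide's (*) marks.

```python
import numpy as np

# Hypothetical data: any real matrix works, so use a random 6 x 4 one (fixed seed).
rng = np.random.default_rng(0)
A = rng.random((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# U and V are column-orthonormal: U^T U = I and V^T V = I.
print(np.allclose(U.T @ U, np.eye(U.shape[1])))
print(np.allclose(V.T @ V, np.eye(V.shape[1])))

# Singular values: non-negative and sorted in decreasing order.
print(np.all(s >= 0), np.all(np.diff(s) <= 0))

# Flipping the sign of a matching pair (u_1, v_1) still reproduces A,
# so the factors are unique only up to such sign changes.
U2, V2 = U.copy(), V.copy()
U2[:, 0] *= -1
V2[:, 0] *= -1
print(np.allclose(A, U2 @ np.diag(s) @ V2.T))
```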

  11. SVD - Example • $A = U \Lambda V^T$ - example: [figure: document-term matrix with CS and MD documents as rows and the terms data, inf., retrieval, brain, lung as columns, shown as the product U x Λ x V^T]

  12. SVD - Example • $A = U \Lambda V^T$ - example: [figure: the same product, with the CS-concept and the MD-concept marked]

  13. SVD - Example • $A = U \Lambda V^T$ - example: [figure: U marked as the doc-to-concept similarity matrix]

  14. SVD - Example • $A = U \Lambda V^T$ - example: [figure: the entry of Λ giving the ‘strength’ of the CS-concept marked]

  15. SVD - Example • $A = U \Lambda V^T$ - example: [figure: the term-to-concept similarity matrix, with the CS-concept marked]

  16. SVD - Example • $A = U \Lambda V^T$ - example: [figure: the term-to-concept similarity matrix, continued]
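The figures for this example did not survive transcription, so the sketch below uses a hypothetical 7 x 5 document-term matrix with the same two groups: CS documents that use data, inf., retrieval, and MD documents that use brain, lung. It shows how NumPy's `np.linalg.svd` recovers one concept per group, with its strength and its term-to-concept weights. Absolute values are printed because each singular vector is determined only up to sign.

```python
import numpy as np

# Hypothetical document-term matrix modeled on the slides' example.
terms = ["data", "inf.", "retrieval", "brain", "lung"]
A = np.array([
    [1, 1, 1, 0, 0],   # CS document
    [2, 2, 2, 0, 0],   # CS document
    [1, 1, 1, 0, 0],   # CS document
    [5, 5, 5, 0, 0],   # CS document
    [0, 0, 0, 2, 2],   # MD document
    [0, 0, 0, 3, 3],   # MD document
    [0, 0, 0, 1, 1],   # MD document
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = int(np.linalg.matrix_rank(A))       # 2: a CS-concept and an MD-concept
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]

print("rank:", r)
# For this matrix the strengths are sqrt(93) ~ 9.64 (CS) and sqrt(28) ~ 5.29 (MD).
print("concept strengths (diagonal of Lambda):", np.round(s, 2))
for i, row in enumerate(Vt):
    weights = {t: round(float(abs(w)), 2) for t, w in zip(terms, row)}
    print(f"concept {i} term weights:", weights)
```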

  17. SVD - Detailed outline • Motivation • Definition - properties • Interpretation • Complexity • Case studies • Additional properties

  18. SVD - Interpretation #1 ‘documents’, ‘terms’ and ‘concepts’: • U: document-to-concept similarity matrix • V: term-to-concept similarity matrix • Λ: its diagonal elements give the ‘strength’ of each concept
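Reading the three factors as in Interpretation #1, a minimal NumPy sketch on a hypothetical 4 x 5 matrix can label each document and each term with the concept it loads on most heavily. The `argmax` over absolute values is our own simple assignment rule, not part of the slides.

```python
import numpy as np

# Hypothetical document-term matrix; columns: data, inf., retrieval, brain, lung.
A = np.array([
    [1, 1, 1, 0, 0],   # CS document
    [3, 3, 3, 0, 0],   # CS document
    [0, 0, 0, 2, 2],   # MD document
    [0, 0, 0, 1, 1],   # MD document
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = int(np.linalg.matrix_rank(A))      # number of concepts to keep
U, s, V = U[:, :k], s[:k], Vt[:k, :].T

# U: document-to-concept similarities; V: term-to-concept similarities;
# s (the diagonal of Lambda): the strength of each concept.
doc_concept = np.argmax(np.abs(U), axis=1)    # strongest concept per document
term_concept = np.argmax(np.abs(V), axis=1)   # strongest concept per term

print("document -> concept:", doc_concept.tolist())
print("term -> concept:", term_concept.tolist())
print("concept strengths:", np.round(s, 2).tolist())
```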

  19. SVD - Interpretation #2 • best axis to project on: (‘best’ = min sum of squares of projection errors)

  20. SVD - Motivation

  21. SVD - Interpretation #2 • SVD gives the best axis to project on, v1 (minimum RMS error) [figure: data points and the axis v1]
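To check Interpretation #2 numerically, this sketch (NumPy; the 2-D points are synthetic and hypothetical) compares the sum of squared projection errors along v1 with the error along 361 evenly spaced directions. Like a plain SVD of the data matrix, it uses axes through the origin and does not mean-center the points.

```python
import numpy as np

# Hypothetical 2-D points (rows), spread mostly along one direction.
rng = np.random.default_rng(1)
t = rng.normal(size=200)
X = np.column_stack([t, 0.5 * t]) + 0.1 * rng.normal(size=(200, 2))

def projection_error(X, axis):
    """Sum of squared distances from each point to the line through the origin along `axis`."""
    axis = axis / np.linalg.norm(axis)
    proj = np.outer(X @ axis, axis)   # projection of every point onto the axis
    return float(np.sum((X - proj) ** 2))

# v1 = first right singular vector = first row of V^T.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
v1 = Vt[0]

# Compare against many other unit directions in the plane.
angles = np.linspace(0, np.pi, 361)
errors = [projection_error(X, np.array([np.cos(a), np.sin(a)])) for a in angles]

print("error along v1:      ", round(projection_error(X, v1), 4))
print("best error on a grid:", round(min(errors), 4))
```

For a 2-column matrix the error along v1 equals $\lambda_2^2$, because the squared Frobenius norm splits as $\lambda_1^2 + \lambda_2^2$ and the projection keeps $\lambda_1^2$.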

  22. SVD - Interpretation #2

  23. SVD - Interpretation #2 • $A = U \Lambda V^T$ - example: [figure: the product U x Λ x V^T, with v1 marked]

  24. SVD - Interpretation #2 • $A = U \Lambda V^T$ - example: variance (‘spread’) on the v1 axis [figure: the product U x Λ x V^T]

  25. SVD - Interpretation #2 • $A = U \Lambda V^T$ - example: • UΛ gives the coordinates of the points on the projection axis [figure: the product U x Λ x V^T]
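Slide 25's claim that UΛ gives the coordinates can be checked directly: since $V^T V = I$, multiplying $A = U \Lambda V^T$ on the right by V gives $A V = U \Lambda$. A minimal NumPy sketch on a hypothetical 5 x 3 matrix:

```python
import numpy as np

# Hypothetical data matrix: 5 points (rows) in 3 dimensions.
A = np.array([
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 1.0],
    [3.0, 5.9, 1.6],
    [0.5, 1.0, 0.2],
    [4.0, 8.2, 2.1],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Coordinates of the points on the projection axes v1, v2, ...
coords_from_U = U * s          # U @ diag(s), i.e. U Lambda (scales column i by s[i])
coords_from_A = A @ Vt.T       # project each row of A onto the columns of V

print(np.allclose(coords_from_U, coords_from_A))   # the two agree
print("coordinates on v1:", np.round(coords_from_U[:, 0], 3))
```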

  26. SVD - Interpretation #2 • More details • Q: how exactly is dim. reduction done? [figure: the product U x Λ x V^T]

  27. SVD - Interpretation #2 • More details • Q: how exactly is dim. reduction done? • A: set the smallest singular values to zero. [figure: the product U x Λ x V^T]
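A sketch of the answer on slide 27, assuming NumPy: zero out all but the k largest singular values and multiply the factors back together. `truncate_svd` is a hypothetical helper name, and the input is a made-up, nearly rank-1 matrix.

```python
import numpy as np

def truncate_svd(A, k):
    """Rank-k approximation of A: keep the k largest singular values, set the rest to zero."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_trunc = s.copy()
    s_trunc[k:] = 0.0                      # zero out the smallest singular values
    return U @ np.diag(s_trunc) @ Vt, s

# Hypothetical matrix: a rank-1 pattern plus a small perturbation.
rng = np.random.default_rng(2)
A = np.outer([1, 2, 3, 4], [1, 1, 2]) + 0.01 * rng.normal(size=(4, 3))

A1, s = truncate_svd(A, k=1)
print("singular values:", np.round(s, 3))
print("max abs error of the rank-1 approximation:", float(np.max(np.abs(A - A1))))
```

The Frobenius error of the result equals the square root of the sum of the squared discarded singular values, which is the quantity the Eckart and Young theorem (slide 47) shows to be optimal.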

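  28. SVD - Interpretation #2 [figure: A ≈ U x Λ x V^T with the smallest singular values set to zero]

  29. SVD - Interpretation #2 [figure: continued]

  30. SVD - Interpretation #2 [figure: continued]

  31. SVD - Interpretation #2 [figure: the resulting approximation of A]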

  32. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: $A = U \Lambda V^T$

  33. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: $A = [u_1 \; u_2 \; \cdots] \, \mathrm{diag}(\lambda_1, \lambda_2, \ldots) \, [v_1 \; v_2 \; \cdots]^T$

  34. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the (n x m) matrix: $A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$

  35. SVD - Interpretation #2 ‘spectral decomposition’ of the (n x m) matrix: $A = \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$, a sum of r terms, where each $u_i$ is n x 1 and each $v_i^T$ is 1 x m
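The spectral decomposition can be verified by rebuilding A as a sum of rank-1 terms $\lambda_i u_i v_i^T$. A NumPy sketch on a hypothetical 4 x 3 matrix:

```python
import numpy as np

# Hypothetical n x m matrix (n = 4, m = 3).
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [4.0, 1.0, 0.0],
    [2.0, 2.0, 2.0],
])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A = lambda_1 u_1 v_1^T + lambda_2 u_2 v_2^T + ...: each term is an (n x 1)
# column times a (1 x m) row, i.e. an n x m matrix of rank 1.
terms = [s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s))]
print([term.shape for term in terms])
print(np.allclose(A, sum(terms)))
```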

  36. SVD - Interpretation #2 approximation / dim. reduction: keep only the first few terms (Q: how many?), assuming $\lambda_1 \geq \lambda_2 \geq \cdots$: $A \approx \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$ • To do the mapping, use $V^T$: $X' = V^T X$
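A sketch of the mapping $x' = V^T x$ for LSI-style queries, assuming NumPy, a hypothetical 4 x 5 document-term matrix, and k = 2 kept terms. A query that mentions only ‘data’ and one that mentions only ‘retrieval’ land on the same point in concept space, which is the ‘concept’ matching of problem #1.

```python
import numpy as np

# Hypothetical document-term matrix; columns: data, inf., retrieval, brain, lung.
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [0, 0, 0, 2, 2],
    [0, 0, 0, 1, 1],
], dtype=float)

_, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                       # number of terms (concepts) kept
Vk_T = Vt[:k, :]            # k x m: the first k rows of V^T

# Two queries written as term vectors (column vectors in term space).
x = np.array([1, 0, 0, 0, 0], dtype=float)    # only "data"
x2 = np.array([0, 0, 1, 0, 0], dtype=float)   # only "retrieval"

x_concepts = Vk_T @ x       # x' = V^T x: coordinates in concept space
x2_concepts = Vk_T @ x2

print(np.round(np.abs(x_concepts), 3), np.round(np.abs(x2_concepts), 3))
```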

  37. SVD - Interpretation #2 A (heuristic - [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the $\lambda_i$'s), assuming $\lambda_1 \geq \lambda_2 \geq \cdots$: $A \approx \lambda_1 u_1 v_1^T + \lambda_2 u_2 v_2^T + \cdots$
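The 80-90% energy heuristic translates into a few lines. This sketch assumes NumPy, a hypothetical helper `choose_k`, and made-up singular values.

```python
import numpy as np

def choose_k(singular_values, energy=0.9):
    """Smallest k whose first k singular values keep at least `energy` of the total.

    'Energy' is the sum of squared singular values; singular values are
    assumed to be sorted in decreasing order.
    """
    sq = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(sq) / sq.sum()
    # First index i with cumulative[i] >= energy; k = i + 1 terms are kept.
    k = int(np.searchsorted(cumulative, energy)) + 1
    return min(k, len(sq))     # guard against round-off when energy is close to 1

# Hypothetical singular values.
s = [10.0, 5.0, 2.0, 1.0]
print(choose_k(s, 0.8), choose_k(s, 0.9), choose_k(s, 0.99))   # 2 2 3
```

With singular values 10, 5, 2, 1 the energies are 100, 25, 4, 1, so the first two terms already keep 125/130, about 96% of the energy.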

  38. SVD - Interpretation #3 • finds non-zero ‘blobs’ in a data matrix [figure: block-structured matrix written as U x Λ x V^T]

  39. SVD - Interpretation #3 • finds non-zero ‘blobs’ in a data matrix [figure: continued]

  40. SVD - Interpretation #3 • Drill: find the SVD ‘by inspection’! • Q: rank = ?? [figure: the matrix to decompose, with U, Λ, V^T left as ??]

  41. SVD - Interpretation #3 • A: rank = 2 (2 linearly independent rows/cols) [figure: U, Λ, V^T still to be filled in]

  42. SVD - Interpretation #3 • A: rank = 2 (2 linearly independent rows/cols) [figure: candidate column vectors; are they orthogonal?]

  43. SVD - Interpretation #3 • the column vectors are orthogonal, but not unit vectors [figure: candidate factors with zero blocks]

  44. SVD - Interpretation #3 • and the singular values are: [figure: the singular values in Λ]

  45. SVD - Interpretation #3 • A: SVD properties: • the matrix product should give back matrix A • matrix U should be column-orthonormal, i.e., its columns should be unit vectors, orthogonal to each other • ditto for matrix V • matrix Λ should be diagonal, with positive values
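The drill can be replayed in code. Assuming a hypothetical 5 x 4 matrix with two non-zero blobs, the sketch below writes each blob as an outer product of 0/1 vectors, normalizes those vectors (moving their lengths into Λ), and then checks the properties listed on slide 45, comparing the result against NumPy.

```python
import numpy as np

# Hypothetical block matrix with two non-zero 'blobs'.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# By inspection: each blob is the outer product of a 0/1 column and a 0/1 row.
blobs = [
    (np.array([1, 1, 0, 0, 0.0]), np.array([1, 1, 0, 0.0])),
    (np.array([0, 0, 1, 1, 1.0]), np.array([0, 0, 1, 1.0])),
]

# These vectors are orthogonal but not unit vectors: normalize them and
# move their lengths into the singular values.
cols = []
for u_raw, v_raw in blobs:
    nu, nv = np.linalg.norm(u_raw), np.linalg.norm(v_raw)
    cols.append((nu * nv, u_raw / nu, v_raw / nv))
cols.sort(key=lambda c: c[0], reverse=True)   # singular values in decreasing order

lam = np.array([c[0] for c in cols])
U = np.column_stack([c[1] for c in cols])
V = np.column_stack([c[2] for c in cols])

print(np.allclose(U @ np.diag(lam) @ V.T, A))                            # gives back A
print(np.allclose(U.T @ U, np.eye(2)), np.allclose(V.T @ V, np.eye(2)))  # column-orthonormal
print(np.round(lam, 3), np.allclose(lam, np.linalg.svd(A, compute_uv=False)[:2]))
```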

  46. SVD - Complexity • O(n · m · m) or O(n · n · m), whichever is less • less work if we just want the singular values • or if we want only the first k left singular vectors • or if the matrix is sparse [Berry] • Implemented in any linear algebra package (LINPACK, MATLAB, S-PLUS, Mathematica, ...)
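Two of the cheaper cases on the slide, sketched with NumPy on a hypothetical random 500 x 40 matrix: `compute_uv=False` returns only the singular values, and a basic power iteration on $A^T A$ finds the first singular triplet using only matrix-vector products, which is also what makes sparse matrices cheap. Power iteration is our illustrative choice; it is not necessarily the method of [Berry], and practical sparse solvers typically use more robust Lanczos-type iterations.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((500, 40))     # hypothetical n x m data matrix

# Only the singular values: U and V are not computed.
s = np.linalg.svd(A, compute_uv=False)

def top_singular_triplet(A, iters=200, seed=0):
    """First singular triplet (u1, sigma1, v1) by power iteration on A^T A."""
    v = np.random.default_rng(seed).normal(size=A.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A.T @ (A @ v)         # multiply by A^T A without forming it
        v = w / np.linalg.norm(w)
    Av = A @ v
    sigma = np.linalg.norm(Av)
    return Av / sigma, sigma, v

u1, sigma1, v1 = top_singular_triplet(A)
print(round(float(sigma1), 6), round(float(s[0]), 6))
```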

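  47. Optimality of SVD • Def: the Frobenius norm of an n x m matrix M is (reminder) $\|M\|_F = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} M_{ij}^2}$ • The rank of a matrix M is the number of linearly independent rows (or columns) of M • Let $A = U \Lambda V^T$ and $A_k = U_k \Lambda_k V_k^T$ (the SVD approximation of A that keeps the k largest singular values); $A_k$ is an n x m matrix, $U_k$ is n x k, $\Lambda_k$ is k x k, and $V_k$ is m x k • Theorem [Eckart and Young]: among all n x m matrices C of rank at most k, we have that $\|A - A_k\|_F \leq \|A - C\|_F$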
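A numerical spot-check of the Eckart and Young theorem (not a proof), assuming NumPy and a hypothetical random 8 x 6 matrix with k = 2: the Frobenius error of $A_k$ equals $\sqrt{\lambda_{k+1}^2 + \cdots + \lambda_r^2}$, and 1,000 random rank-k matrices all do worse.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(8, 6))   # hypothetical n x m matrix
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err_svd = np.linalg.norm(A - A_k, "fro")
# The Frobenius error of A_k equals sqrt(lambda_{k+1}^2 + ... + lambda_r^2).
print(np.isclose(err_svd, np.sqrt(np.sum(s[k:] ** 2))))

# Random rank-k competitors C = B1 @ B2 (n x k times k x m) never do better.
competitors = [rng.normal(size=(8, k)) @ rng.normal(size=(k, 6)) for _ in range(1000)]
print(all(np.linalg.norm(A - C, "fro") >= err_svd for C in competitors))
```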

  48. Kleinberg’s Algorithm • Main idea: in many cases, when you search the web using some terms, the most relevant pages may not contain these terms (or may contain them only a few times) • Harvard: www.harvard.edu • Search engines: yahoo, google, altavista • Authorities and hubs

  49. Kleinberg’s algorithm • Problem definition: given the web and a query, find the most ‘authoritative’ web pages for this query • Step 0: find all pages containing the query terms (root set) • Step 1: expand by one move forward and backward (base set)

  50. Kleinberg’s algorithm • Step 1: expand by one move forward and backward
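Steps 0 and 1 on a toy graph, as a plain-Python sketch. The graph, the page texts, and the rule that a page must contain every query term are hypothetical assumptions made for illustration.

```python
# Hypothetical toy web graph: page -> set of pages it links to.
links = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": set(),
    "f": {"d"},
}
# Hypothetical page texts; only used to build the root set.
text = {
    "a": "svd tutorial",
    "b": "linear algebra notes",
    "c": "svd and lsi",
    "d": "course page",
    "e": "contact",
    "f": "bookmarks",
}

def root_set(query_terms):
    """Step 0: all pages containing every query term."""
    return {p for p, t in text.items() if all(q in t.split() for q in query_terms)}

def base_set(root):
    """Step 1: expand the root set by one move forward (out-links) and backward (in-links)."""
    forward = {q for p in root for q in links[p]}
    backward = {p for p, outs in links.items() if outs & root}
    return root | forward | backward

R = root_set(["svd"])
B = base_set(R)
print(sorted(R), sorted(B))   # ['a', 'c'] ['a', 'b', 'c', 'd']
```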
