
Optimal Column-Based Low-Rank Matrix Reconstruction


Presentation Transcript


  1. Optimal Column-Based Low-Rank Matrix Reconstruction SODA’12 Ali Kemal Sinop Joint work with Prof. Venkatesan Guruswami

  2. Outline • Introduction • Notation • Problem Definition • Motivations • Results • Upper Bound • Randomized Algorithm • Summary SODA 2012: Ali Kemal Sinop

  3. Notation: Vectors and Matrices • X: m-by-n real matrix. • Xi: the ith column of X. • C: a subset of the columns of X. • XC: the sub-matrix of X formed by the columns in C.

  4. Formal Problem Definition • Given an m-by-n matrix X = [X1 X2 ... Xn], • Find r columns, C, which minimize Σᵢ d(Xᵢ, span(XC))², • which is equal to Σᵢ ‖Xᵢ − Π_C Xᵢ‖²: the squared projection distance of each Xᵢ to span(XC).
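The objective on this slide can be evaluated directly. Below is a small sketch (assuming numpy; the exhaustive search over subsets is for illustration on tiny matrices, not the paper's algorithm):

```python
import itertools
import numpy as np

def reconstruction_error(X, C):
    """Sum of squared projection distances of all columns of X
    onto span(X_C), i.e. ||X - P_C X||_F^2."""
    XC = X[:, list(C)]
    # Least-squares coefficients project every column of X onto span(X_C).
    coeffs, *_ = np.linalg.lstsq(XC, X, rcond=None)
    residual = X - XC @ coeffs
    return np.sum(residual ** 2)

def best_columns(X, r):
    """Exhaustively find the r columns minimizing the reconstruction error."""
    n = X.shape[1]
    return min(itertools.combinations(range(n), r),
               key=lambda C: reconstruction_error(X, C))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 6))
C = best_columns(X, 2)
err = reconstruction_error(X, C)
```

The exhaustive search makes the combinatorial nature of the problem concrete: there are C(n, r) candidate subsets, which is why the paper's sampling-based algorithms matter.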

  5. What is Distance to Span? • Given any matrix A and vector x, d(x, span(A)) = min_y ‖x − Ay‖. • By the Pythagorean theorem, ‖x‖² = ‖Π_A x‖² + d(x, span(A))². • Thus d(x, span(A))² = ‖Π_A^⊥ x‖², where Π_A^⊥ is the orthogonal projection matrix onto the orthogonal complement of the column space of A (the null space of Aᵀ).
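The distance-to-span definition and the Pythagorean identity above can be checked numerically (a minimal sketch, assuming numpy; `dist_to_span` is a name introduced here for illustration):

```python
import numpy as np

def dist_to_span(A, x):
    """Distance from x to the column span of A: the norm of the
    least-squares residual, i.e. the component of x orthogonal
    to the column space."""
    y, *_ = np.linalg.lstsq(A, x, rcond=None)
    return np.linalg.norm(x - A @ y)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 2))
x = rng.standard_normal(4)

d = dist_to_span(A, x)
# Projection of x onto span(A), for the Pythagorean check below.
y, *_ = np.linalg.lstsq(A, x, rcond=None)
proj = A @ y
```

Since the least-squares residual is orthogonal to the column space, ‖x‖² splits exactly into ‖proj‖² + d², matching the slide's identity.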

  6. Back to Formal Problem Definition • Given an m-by-n matrix X = [X1 X2 ... Xn], • Find r columns, C, which minimize the reconstruction error. • No books, • No web pages, • No images. • Only geometry.

  7. Problem Formulation • Given an m-by-n matrix X = [X1 X2 ... Xn], • Find r columns, C, which minimize: Reconstruction Error = Σᵢ ‖Π_C^⊥ Xᵢ‖² = ‖Π_C^⊥ X‖_F², where Π_C^⊥ is the orthogonal projection matrix onto the orthogonal complement of span(XC). • No books, • No web pages, • No images. • Only geometry.

  8. An Example • n=2, m=2, r=1: X1 and X2 at a 135° angle from the origin (taking both to be unit vectors). • For C={1}: error = d(X2, span(X1))² = sin²135° = 1/2. • For C={2}: error = 1/2, by symmetry.
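The 135° example works out numerically as follows (a sketch assuming numpy and unit-length X1, X2, since the slide's figure does not specify lengths):

```python
import numpy as np

theta = np.deg2rad(135)
X1 = np.array([1.0, 0.0])
X2 = np.array([np.cos(theta), np.sin(theta)])  # unit vector at 135 degrees from X1

def err_keeping(kept, other):
    # Squared distance of `other` to the line spanned by `kept`.
    proj = (other @ kept) / (kept @ kept) * kept
    return np.sum((other - proj) ** 2)

e1 = err_keeping(X1, X2)  # choose C = {1}, measure X2's residual
e2 = err_keeping(X2, X1)  # choose C = {2}, measure X1's residual
```

Both choices give error sin²(135°) = 1/2, as the slide states.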

  9. What is the minimum possible? • X is m-by-n; XC is m-by-r, so the rank of XC is at most |C| = r. • Replace the column restriction with a rank restriction: choose any matrix X(r) of rank r. • Since |C| ≤ r implies rank ≤ r, the best rank-r approximation error lower-bounds the best column-selection error.

  10. Low Rank Matrix Approximation • Therefore min_C Σᵢ d(Xᵢ, span(XC))² ≥ ‖X − X(r)‖_F², where X(r) is a rank-r matrix minimizing ‖X − X(r)‖_F². • X(r) can be found by the Singular Value Decomposition (SVD).
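The SVD construction of X(r) is standard (Eckart–Young); a minimal sketch with numpy, where `best_rank_r` is a name introduced here:

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation of X in Frobenius norm,
    via truncated SVD (Eckart-Young theorem)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 5))
r = 2
Xr = best_rank_r(X, r)
err = np.linalg.norm(X - Xr, "fro") ** 2

# Eckart-Young: the error equals the sum of the discarded squared singular values.
s = np.linalg.svd(X, compute_uv=False)
tail = np.sum(s[r:] ** 2)
```

The `tail` quantity is exactly the Σ_{i>r} σᵢ² expression used on the next slide.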

  11. Singular Values of X • There exist m unique non-negative reals σ1 ≥ σ2 ≥ ... ≥ σm ≥ 0, the singular values of X. • Best rank-r reconstruction error: ‖X − X(r)‖_F² = Σ_{i>r} σᵢ². • The singular value profile acts as a “smooth rank” of X. • For example, if rank(X) = k, then σ_{k+1} = ... = σm = 0, and the best rank-k error is 0.

  12. First Example • n=2, m=2, r=1. Remember the 135° example: X1, X2 at 135° from the origin. • Quick check: how does the best single-column error compare to the best rank-1 error σ2²? Is this the worst possible ratio?

  13. Our Goal: Do as well as best rank-k • Given a target rank k, • and an allowed error ε > 0, • Choose the smallest C, |C| = r, such that Σᵢ d(Xᵢ, span(XC))² ≤ (1+ε) Σ_{i>k} σᵢ² (the best possible rank-k approximation error). • How does r depend on k and ε?

  14. Practical Motivations • [Drineas, Mahoney’09] DNA microarray data: unsupervised feature selection for cancer detection. • Column Selection + k-means gives better classification. • The same idea applies to many classification problems.

  15. Theoretical Applications • Our motivation: r = the number of columns needed to get within a (1+ε) factor of the best rank-k approximation. • [Guruswami, S’11] Approximation schemes for many graph partitioning problems. Running time: exponential in r = r(k, ε), where k = number of eigenvalues < 1−ε. • [Guruswami, S’12] Significantly faster algorithms for sparsest cut, etc. Running time: exponential in r = r(k, ε), where k = number of eigenvalues < Φ/ε.

  16. Previous Results • [Frieze, Kannan, Vempala’04] Introduced this problem. • [Deshpande, Vempala’06] • [Sarlos’06] • [Deshpande, Rademacher’10] r = k columns achieve a (k+1) factor, i.e., ε = k.

  17. Recent Results • [This paper] We show that r is optimal (up to low-order terms): • r = k/ε + k − 1 columns suffice, • and r ≥ k/ε − o(k) columns are necessary. • A randomized algorithm, and a deterministic algorithm using [Deshpande, Rademacher’10] (ω = matrix multiplication exponent). • (Independently) [Boutsidis, Drineas, Magdon-Ismail’11]: r ≤ 2k/ε columns, in randomized time O(knm/ε + k³ε^(−2/3)·n).

  18. Outline • Introduction • Upper Bound • Strategy • An Algebraic Expression • Eliminating Min • Wrapping Up • Randomized Algorithm • Summary

  19. Upper Bound • Input: m-by-n matrix X, target rank k, number of columns r. • Problem: relate the column-selection error to the best possible rank-k approximation error. • Our approach: • Represent the error in an algebraic form. • Eliminate the minimum by randomly sampling C. • Express the error as a function of the σ’s. • Bound it in terms of Σ_{i>k} σᵢ².

  20. An Algebraic Expression • Remember, our problem is min_C Σᵢ d(Xᵢ, span(XC))². • The distance d(Xᵢ, span(XC)) is hard to manipulate directly. • Is there an equivalent algebraic expression?

  21. Base Case: r=1 • A simple case. When C = {c}: d(Xᵢ, span(X_c)) is the height of Xᵢ over the line spanned by X_c, so area(X_c, Xᵢ) = ‖X_c‖ · d(Xᵢ, span(X_c)).

  22. Case of r=2 • Consider C = {c, d} in 3 dimensions: d(Xᵢ, span(XC)) is the height of Xᵢ over the plane spanned by X_c and X_d, so vol(X_c, X_d, Xᵢ) = area(X_c, X_d) · d(Xᵢ, span(XC)).

  23. General Case • Fact: Volume² = determinant: vol(XC)² = det(XCᵀ XC). • Using the Volume = Base-Volume × Height formula: vol(X_{C∪{i}}) = vol(XC) · d(Xᵢ, span(XC)). • Hence d(Xᵢ, span(XC))² = det(X_{C∪{i}}ᵀ X_{C∪{i}}) / det(XCᵀ XC).
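The determinant-ratio identity above is easy to verify numerically (a sketch assuming numpy, with an arbitrary small random matrix and subset):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 4))
C = [0, 1]   # current column subset
i = 2        # candidate column to add

XC = X[:, C]
XCi = X[:, C + [i]]

# Squared volumes via Gram determinants: vol(X_C)^2 = det(X_C^T X_C).
vol2_C = np.linalg.det(XC.T @ XC)
vol2_Ci = np.linalg.det(XCi.T @ XCi)

# Squared distance of X_i to span(X_C), computed independently by least squares.
y, *_ = np.linalg.lstsq(XC, X[:, i], rcond=None)
d2 = np.sum((X[:, i] - XC @ y) ** 2)
```

The ratio vol2_Ci / vol2_C matches d2, confirming the "base-volume times height" derivation.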

  24. Eliminating Min • Volume Sampling [Deshpande, Rademacher, Vempala, Wang’06]: • Choose C with probability proportional to vol(XC)² = det(XCᵀ XC).

  25. Symmetric Forms • Fact: For any k, the sum of det(XCᵀ XC) over all k-subsets C equals the kth elementary symmetric polynomial of the squared singular values: Σ_{|C|=k} det(XCᵀ XC) = e_k(σ1², ..., σm²).

  26. Schur Concavity • Hence the expected error under volume sampling is governed by the ratio e_{r+1}(σ1², ..., σm²) / e_r(σ1², ..., σm²). • This ratio is Schur-concave: spreading the same total mass more evenly across σ1², ..., σm² can only increase it, so the worst case is a flat spectrum. (Figure: two spectra σ1, σ2, ..., σk, σ_{k+1}, ..., σm with the same total mass; the flatter one gives the larger ratio.)

  27. Wrapping Up • For r = k/ε + k − 1, the resulting bound is a (1+ε) factor of the best rank-k error. QED

  28. Algorithms for Choosing C • (Main idea) A nice recursion: • Randomized Algorithm: • Choose the first column j with the appropriate (volume-sampling) probability. • This step can be done efficiently. • For all i, project the columns onto the orthogonal complement of X_j. • Recurse: choose the remaining r−1 columns on these projected vectors.
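The recursive structure (pick a column, project it out, recurse) can be sketched as follows. Note the caveat: for simplicity this sketch samples each column proportionally to its squared residual norm, i.e., adaptive sampling in the style of Deshpande–Vempala, rather than the exact volume-sampling marginals the paper's algorithm computes; it illustrates the recursion, not the precise distribution:

```python
import numpy as np

def adaptive_select(X, r, rng):
    """Simplified iterative column selection: repeatedly sample a column
    with probability proportional to its squared residual norm, then
    project all columns onto the orthogonal complement of the chosen one.
    Illustration of the recursive structure only -- NOT the paper's
    exact volume-sampling marginals."""
    R = X.astype(float).copy()
    chosen = []
    for _ in range(r):
        norms2 = np.sum(R ** 2, axis=0)
        probs = norms2 / norms2.sum()
        j = rng.choice(X.shape[1], p=probs)
        chosen.append(int(j))
        v = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(v, v @ R)  # project every column away from direction v
    return chosen

rng = np.random.default_rng(5)
X = rng.standard_normal((6, 8))
C = adaptive_select(X, 3, rng)
```

After a column is chosen its residual becomes zero, so it is (almost surely) never picked again, mirroring the "choose r−1 columns on these vectors" recursion.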

  29. Outline • Introduction • Upper Bound • Randomized Algorithm • Summary

  30. Summary • (Upper Bound) r = k/ε + k − 1 columns suffice to achieve (1+ε) × the best rank-k error. • (Randomized) Such columns can be found in time r · T_SVD (= O(k) · T_SVD for constant ε). • (Lower Bound) k/ε − o(k) columns are needed. Thanks! Job market alert.

  31.
