Optimal Column-Based Low-Rank Matrix Reconstruction

SODA’12
Ali Kemal Sinop
Joint work with Prof. Venkatesan Guruswami

Outline
• Introduction
  • Notation
  • Problem Definition
  • Motivations
  • Results
• Upper Bound
• Randomized Algorithm
• Summary
SODA 2012: Ali Kemal Sinop

Notation: Vectors and Matrices
• $X$: m-by-n real matrix.
• $X_i$: ith column of $X$.
• $C$: subset of columns of $X$.
• $X_C$: sub-matrix of $X$ on $C$.

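Formal Problem Definition
• Given m-by-n matrix $X = [X_1\ X_2\ \dots\ X_n]$,
• Find r columns, C, which minimize $\|X - \Pi_C X\|_F^2$, where $\Pi_C$ is the orthogonal projection onto the span of $X_C$,
• which is equal to $\sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2$, where $d(X_i, \mathrm{span}(X_C))$ is the projection distance of $X_i$ to $X_C$.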

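What is Distance to Span?
• Given any matrix $A$ and vector $x$, let $\Pi_A$ be the orthogonal projection onto the column span of $A$.
• (Pythagorean Theorem) $d(x, \mathrm{span}(A))^2 = \|x\|^2 - \|\Pi_A x\|^2$.
• Thus $d(x, \mathrm{span}(A)) = \|\Pi_A^{\perp} x\|$, where $\Pi_A^{\perp} = I - \Pi_A$ is the orthogonal projection matrix onto the orthogonal complement of the column span of $A$ (the null space of $A^T$).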
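A minimal numerical sketch of the distance-to-span idea, assuming NumPy and using a least-squares solve to obtain the projection. The function name `distance_to_span` and the test vectors are illustrative choices, not taken from the talk.

```python
import numpy as np

def distance_to_span(A, x):
    """Euclidean distance from x to the column span of A."""
    # Least squares gives the coefficients of the closest point in span(A).
    coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
    projection = A @ coeffs
    return float(np.linalg.norm(x - projection))

# Illustrative data: the columns of A span the xy-plane in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, 4.0, 12.0])

d = distance_to_span(A, x)
# Pythagorean theorem: |x|^2 = |projection|^2 + d^2.
projection_norm_sq = float(np.linalg.norm(x) ** 2 - d ** 2)
```

Here the component of `x` outside the plane is its third coordinate, so `d` is the length of that component and `projection_norm_sq` is the squared length of the in-plane part.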

Back to Formal Problem Definition
• Given m-by-n matrix $X = [X_1\ X_2\ \dots\ X_n]$,
• Find r columns, C, which minimize $\sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2$.
• No books,
• No web pages,
• No images.
• Only geometry.

Problem Formulation
• Given m-by-n matrix $X = [X_1\ X_2\ \dots\ X_n]$,
• Find r columns, C, which minimize the reconstruction error:
  Reconstruction Error = $\|\Pi_C^{\perp} X\|_F^2 = \sum_{i=1}^{n} \|\Pi_C^{\perp} X_i\|^2$,
  where $\Pi_C^{\perp}$ is the orthogonal projection matrix onto the orthogonal complement of the span of $X_C$ (the null space of $X_C^T$).

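An Example
• n=2, m=2, r=1.
• [Figure: $X_1$ and $X_2$ drawn from the origin, 135° apart.]
• For C={1}, the error is the squared distance of $X_2$ to span($X_1$).
• For C={2}, the error is the squared distance of $X_1$ to span($X_2$).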
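The example can be reproduced numerically. The sketch below assumes both vectors have unit length (the transcript preserves only the 135° angle, not the lengths) and uses a hypothetical helper `reconstruction_error` that sums the squared distances of all columns to the span of the chosen ones.

```python
import numpy as np

def reconstruction_error(X, C):
    """Sum of squared distances of every column of X to span(X_C)."""
    XC = X[:, C]
    # Project all columns of X onto span(X_C) via least squares.
    coeffs, *_ = np.linalg.lstsq(XC, X, rcond=None)
    residual = X - XC @ coeffs
    return float(np.sum(residual ** 2))

# Assumption: X_1 and X_2 are unit vectors 135 degrees apart.
theta = np.deg2rad(135.0)
X = np.array([[1.0, np.cos(theta)],
              [0.0, np.sin(theta)]])

err_first = reconstruction_error(X, [0])   # C = {1} in the slide's 1-based numbering
err_second = reconstruction_error(X, [1])  # C = {2}
```

Under the unit-length assumption both choices give the same error, sin²(135°) = 1/2, because each vector lies at the same angle from the line spanned by the other.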

What is the minimum possible?
• $X$ is m-by-n.
• $X_C$ is m-by-r:
  • Rank of $X_C$ is at most |C| = r.
• Replace column restriction with rank restriction:
  • Choose any matrix $X_{(r)}$ of rank r.
• Every choice with |C| ≤ r yields a reconstruction of rank ≤ r, so minimizing over all rank ≤ r matrices can only lower the error.

Low Rank Matrix Approximation
• $X_{(r)}$: a rank-r matrix minimizing $\|X - X_{(r)}\|_F^2$.
• Therefore $\min_{|C| = r} \sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2 \ge \|X - X_{(r)}\|_F^2$.
• $X_{(r)}$ can be found by Singular Value Decomposition (SVD).

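Singular Values of X
• There exist m unique non-negative reals, $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_m \ge 0$.
• Best rank-r reconstruction error: $\|X - X_{(r)}\|_F^2 = \sum_{i > r} \sigma_i^2$.
• “Smooth rank” of X.
• For example, if rank(X) = k, then $\sum_{i > k} \sigma_i^2 = 0$.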
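As a quick check of the statement that the best rank-r error equals the sum of the trailing squared singular values, here is a small sketch using NumPy's SVD. The helper names and the random test matrix are illustrative assumptions.

```python
import numpy as np

def best_rank_r_error(X, r):
    """Sum of squared singular values beyond the r largest."""
    s = np.linalg.svd(X, compute_uv=False)  # returned in descending order
    return float(np.sum(s[r:] ** 2))

def best_rank_r_approx(X, r):
    """Truncated SVD: keep only the top r singular triplets."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
r = 2

# Squared Frobenius distance between X and its truncated SVD.
direct_error = float(np.sum((X - best_rank_r_approx(X, r)) ** 2))
tail_error = best_rank_r_error(X, r)
```

The two quantities agree up to floating-point error, and `best_rank_r_error(X, r)` is zero whenever r is at least the rank of `X`.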

First Example
• n=2, m=2, r=1.
• Remember: [Figure: $X_1$ and $X_2$ drawn from the origin, 135° apart.]
• Quick check: Worst possible?

Our Goal: Do as well as best rank-k
• Given target rank k,
• Allowed error ε>0,
• Choose smallest C, |C| = r, such that $\sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2 \le (1+\varepsilon) \sum_{i > k} \sigma_i^2$, where $\sum_{i > k} \sigma_i^2$ is the best possible rank-k approximation error.
• How does r depend on k and ε?

Practical Motivations
• [Drineas, Mahoney’09] DNA microarray:
  • Unsupervised feature selection for cancer detection.
  • Column Selection + K-means: Better classification.
• Many classification problems:
  • Same idea.

Theoretical Applications
• Our motivation.
• [Guruswami, S’11] Approximation schemes for many graph partitioning problems.
  • Running time: Exponential in r=r(k,ε), where k = number of eigenvalues < 1-ε.
• [Guruswami, S’12] Significantly faster algorithm for sparsest cut, etc.
  • Running time: Exponential in r=r(k,ε), where k = number of eigenvalues < Φ/ε.
• r = number of columns needed to get within a (1+ε) factor of the best rank-k approximation.

Previous Results
• [Frieze, Kannan, Vempala’04] Introduced this problem.
• [Deshpande, Vempala’06]
• [Sarlos’06]
• [Deshpande, Rademacher’10] r = k when the approximation factor is k+1 (that is, ε = k).

Recent Results
• [This paper] We showed
  • r = k/ε + k - 1 columns suffice,
  • and r ≥ k/ε - o(k) columns are necessary.
  • r is optimal (up to low order terms).
  • A randomized algorithm in time $r \cdot T_{SVD}$.
  • A deterministic algorithm, using [Deshpande, Rademacher’10], with running time stated in terms of ω, the matrix multiplication exponent.
• (Independently) [Boutsidis, Drineas, Magdon-Ismail’11]
  • r ≤ 2k/ε columns,
  • In randomized time $O(knm/\varepsilon + k^3 \varepsilon^{-2/3} n)$.

Outline
• Introduction
• Upper Bound
  • Strategy
  • An Algebraic Expression
  • Eliminating Min
  • Wrapping Up
• Randomized Algorithm
• Summary

Upper Bound
• Input: m-by-n matrix X, target rank k, number of columns r.
• Problem: Relate $\min_{|C| = r} \sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2$ to $\sum_{i > k} \sigma_i^2$, the best possible rank-k approximation error.
• Our Approach:
  • Represent the distance in an algebraic form.
  • Eliminate the minimum by randomly sampling C.
  • Represent the error as a function of the σ’s.
  • Bound it in terms of the best possible rank-k approximation error.

An Algebraic Expression
• Remember, our problem is: $\min_{|C| = r} \sum_{i=1}^{n} d(X_i, \mathrm{span}(X_C))^2$.
• $d(X_i, \mathrm{span}(X_C))$ is hard to manipulate.
• An equivalent algebraic expression?

Base Case: r=1
• A simple case. When C={c}: $d(X_i, \mathrm{span}(X_c))^2 = \|X_i\|^2 - \langle X_i, X_c \rangle^2 / \|X_c\|^2$.
• [Figure: vectors $X_c$ and $X_i$.]

Case of r=2
• Consider C={c,d} in 3 dimensions.
• [Figure: vectors $X_c$, $X_d$ and $X_i$.]

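General Case
• Fact: Volume² = determinant: $\mathrm{Vol}(X_C)^2 = \det(X_C^T X_C)$, where $\mathrm{Vol}(X_C)$ is the volume of the parallelepiped spanned by the columns of $X_C$.
• Using the Volume = Base-Volume * Height formula, $\mathrm{Vol}(X_{C \cup \{i\}}) = \mathrm{Vol}(X_C) \cdot d(X_i, \mathrm{span}(X_C))$.
• Hence $d(X_i, \mathrm{span}(X_C))^2 = \det(X_{C \cup \{i\}}^T X_{C \cup \{i\}}) / \det(X_C^T X_C)$.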
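The volume argument can be sanity-checked numerically: the ratio of two Gram determinants should match the squared distance computed by direct projection. This is a sketch on a random matrix; the helper names are hypothetical, and the columns in C are assumed to be linearly independent so the denominator is nonzero.

```python
import numpy as np

def gram_det(X, cols):
    """det(X_S^T X_S): squared volume of the parallelepiped on the chosen columns."""
    S = X[:, cols]
    return float(np.linalg.det(S.T @ S))

def dist_sq_via_det(X, C, i):
    """Squared distance of X_i to span(X_C) as a ratio of squared volumes."""
    return gram_det(X, list(C) + [i]) / gram_det(X, list(C))

def dist_sq_via_projection(X, C, i):
    """The same quantity computed by explicit least-squares projection."""
    XC = X[:, C]
    coeffs, *_ = np.linalg.lstsq(XC, X[:, i], rcond=None)
    return float(np.sum((X[:, i] - XC @ coeffs) ** 2))

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))

by_det = dist_sq_via_det(X, [0, 2], 5)
by_projection = dist_sq_via_projection(X, [0, 2], 5)
```

The determinant form has no explicit projection in it, which is what makes it convenient to sum over all column subsets in the later steps.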

Eliminating Min
• Volume Sampling [Deshpande, Rademacher, Vempala, Wang’06]
• Choose C with probability proportional to $\det(X_C^T X_C)$.

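Symmetric Forms
• Fact: For any k, the kth elementary symmetric polynomial of the squared singular values satisfies $e_k(\sigma_1^2, \dots, \sigma_m^2) = \sum_{|S| = k} \prod_{i \in S} \sigma_i^2 = \sum_{|C| = k} \det(X_C^T X_C)$.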
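The identity between the elementary symmetric polynomial of the squared singular values and the sum of squared subset volumes can be verified by brute force on a small matrix. The sketch below is only illustrative: it enumerates all column subsets, which is feasible only for tiny n, and the function names are assumptions.

```python
import itertools
import numpy as np

def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial, by the standard dynamic program."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for v in values:
        # Update in descending order so each value is used at most once.
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return float(e[k])

def sum_of_subset_volumes(X, k):
    """Sum of det(X_C^T X_C) over all column subsets C of size k."""
    total = 0.0
    for C in itertools.combinations(range(X.shape[1]), k):
        XC = X[:, list(C)]
        total += np.linalg.det(XC.T @ XC)
    return float(total)

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 6))
sigma_sq = np.linalg.svd(X, compute_uv=False) ** 2

lhs = sum_of_subset_volumes(X, 3)
rhs = elementary_symmetric(sigma_sq, 3)
```

This sum is exactly the normalizing constant of volume sampling, so the sampling probabilities can be written in terms of the singular values alone.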

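Schur Concavity
• Hence the expected error under volume sampling is $(r+1) \, e_{r+1}(\sigma_1^2, \dots, \sigma_m^2) / e_r(\sigma_1^2, \dots, \sigma_m^2)$, where $e_j$ is the jth elementary symmetric polynomial.
• This ratio is Schur-concave. In other words:
• [Figure: three singular-value profiles over $\sigma_1, \sigma_2, \dots, \sigma_k, \sigma_{k+1}, \dots, \sigma_m$, compared by “<”.]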

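Wrapping Up
• The expected error under volume sampling is at most $\frac{r+1}{r+1-k} \sum_{i > k} \sigma_i^2$. QED
• For r = k/ε + k - 1, this factor is (1+ε), since r+1 = k/ε + k and r+1-k = k/ε.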
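The whole chain can be checked by brute force on a small random matrix: compute the exact expected error of volume sampling by enumerating every subset, then compare it with the closed form in the singular values and with the final bound. This is a sketch under the assumption that all size-r column subsets are linearly independent (true with probability 1 for the Gaussian test matrix); the helper names are illustrative.

```python
import itertools
import numpy as np

def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial via dynamic programming."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for v in values:
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return float(e[k])

def reconstruction_error(X, C):
    """Sum of squared distances of all columns of X to span(X_C)."""
    XC = X[:, C]
    coeffs, *_ = np.linalg.lstsq(XC, X, rcond=None)
    return float(np.sum((X - XC @ coeffs) ** 2))

def expected_error_volume_sampling(X, r):
    """Exact expectation over all size-r subsets, weighted by det(X_C^T X_C)."""
    subsets = [list(C) for C in itertools.combinations(range(X.shape[1]), r)]
    weights = np.array([np.linalg.det(X[:, C].T @ X[:, C]) for C in subsets])
    errors = np.array([reconstruction_error(X, C) for C in subsets])
    return float(weights @ errors / weights.sum())

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 7))
k, r = 2, 3
sigma_sq = np.linalg.svd(X, compute_uv=False) ** 2

expected = expected_error_volume_sampling(X, r)
closed_form = (r + 1) * elementary_symmetric(sigma_sq, r + 1) / elementary_symmetric(sigma_sq, r)
bound = (r + 1) / (r + 1 - k) * float(np.sum(sigma_sq[k:]))
```

Since the expectation is below the bound, at least one subset C of size r must achieve it, which is how random sampling eliminates the minimum.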

Algorithms for Choosing C
• (Main idea) A nice recursion.
• Randomized Algorithm:
  • Choose column j with the probability given by the recursion.
  • This can be done in the time of one SVD.
  • For all i, project $X_i$ onto the orthogonal complement of $X_j$.
  • Choose r-1 columns on these vectors.
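The transcript does not preserve the recursion or the sampling probabilities, so the following is only a sketch of one standard way to realize "pick a column, project it out, recurse" for volume sampling. It assumes the first column j is drawn with probability proportional to $\|X_j\|^2 \cdot e_{r-1}$ of the squared singular values of the matrix with $X_j$ projected out. It recomputes an SVD per candidate column, so it is much slower than the running time quoted in the talk; the function names are hypothetical.

```python
import numpy as np

def elementary_symmetric(values, k):
    """k-th elementary symmetric polynomial via dynamic programming."""
    e = np.zeros(k + 1)
    e[0] = 1.0
    for v in values:
        for j in range(k, 0, -1):
            e[j] += v * e[j - 1]
    return float(e[k])

def project_out(X, j):
    """Project every column of X onto the orthogonal complement of column j."""
    xj = X[:, j].copy()
    Y = X - np.outer(xj, xj @ X) / float(xj @ xj)
    Y[:, j] = 0.0  # exact zero, so column j can never be chosen again
    return Y

def first_column_probs(X, r):
    """Assumed marginal: P(j) proportional to |X_j|^2 * e_{r-1}(spectrum after projecting out X_j)."""
    weights = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        norm_sq = float(X[:, j] @ X[:, j])
        if norm_sq == 0.0:
            continue  # zero columns contribute zero volume
        sigma_sq = np.linalg.svd(project_out(X, j), compute_uv=False) ** 2
        weights[j] = norm_sq * elementary_symmetric(sigma_sq, r - 1)
    return weights / weights.sum()

def volume_sample(X, r, rng):
    """Draw r distinct columns one at a time, projecting after each pick."""
    X = np.array(X, dtype=float)
    chosen = []
    for remaining in range(r, 0, -1):
        probs = first_column_probs(X, remaining)
        j = int(rng.choice(X.shape[1], p=probs))
        chosen.append(j)
        X = project_out(X, j)
    return sorted(chosen)

rng = np.random.default_rng(4)
X = rng.standard_normal((4, 6))
probs = first_column_probs(X, 3)
sample = volume_sample(X, 3, rng)
```

Because the squared volume of C factors as $\|X_j\|^2$ times the squared volume of the remaining columns after projecting out $X_j$, drawing columns this way yields a subset distributed proportionally to $\det(X_C^T X_C)$.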

Outline
• Introduction
• Upper Bound
• Randomized Algorithm
• Summary

Summary
• (Upper Bound) r = k/ε + k - 1 columns suffice to achieve (1+ε) * best rank-k error.
• (Randomized) Such columns can be found in time $r \cdot T_{SVD}$, which is $O(k) \cdot T_{SVD}$ for constant ε.
• (Lower Bound) k/ε - o(k) columns needed.
Thanks! Job market alert.
