Numerical Linear Algebra in the Streaming Model
Ken Clarkson (IBM), David Woodruff (IBM)
The Problems
Given an n × d matrix A and an n × d' matrix B, we want estimators for:
• The matrix product AᵀB
• The matrix X* minimizing ||AX − B||
  • A slightly generalized version of least-squares regression
• Given an integer k, the matrix Ak of rank k minimizing ||A − Ak||
We consider the Frobenius matrix norm: the square root of the sum of the squared entries.
General Properties of Our Algorithms
• 1 pass over the matrix entries, given in any order (multiple updates to an entry are allowed)
• Maintain compressed versions, or "sketches", of the matrices
• Do a small amount of work per entry to maintain the sketches
• Output the result using the sketches
• Randomized approximation algorithms
Matrix Compression Methods
In a line of similar efforts:
• Element-wise sampling [AM01], [AHK06]
• Sketching / random projection: maintain a small number of random linear combinations of rows or columns [S06]
• Row / column sampling [DK01], [DKM04], [DMM08]
  • Usually more than 1 pass
• Here: sketching
Outline
• Matrix product
• Regression
• Low-rank approximation
Approximate Matrix Product
• A and B have n rows; we want to estimate AᵀB
• Let S be an n × m sign (Rademacher) matrix
  • Each entry is +1 or −1 with probability ½
  • m is small, to be specified
  • Entries are O(log(1/δ))-wise independent
• The sketches are SᵀA and SᵀB
• Our estimate of AᵀB is [AᵀS][SᵀB]/m
• It is easy to maintain the sketches given updates
  • O(m) time per update, O(mc log(nc)) bits of space for SᵀA and SᵀB (c is the total number of columns of A and B)
• Output [AᵀS][SᵀB]/m using fast rectangular matrix multiplication
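A minimal NumPy sketch of this estimator and its streaming update. The sizes n, d, d', m are arbitrary illustrative choices, and the sign matrix is fully independent here rather than O(log(1/δ))-wise independent as in the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d2, m = 1000, 20, 30, 400

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d2))

# n x m sketching matrix of i.i.d. +/-1 entries (the talk only needs
# limited independence; full independence is used here for brevity).
S = rng.choice([-1.0, 1.0], size=(n, m))

# The sketches the streaming algorithm maintains.
StA = S.T @ A   # m x d
StB = S.T @ B   # m x d'

def update(sketch, i, j, delta):
    # Process a stream update "A[i, j] += delta" in O(m) time:
    # only column j of S^T A changes, by delta * (row i of S).
    sketch[:, j] += delta * S[i, :]

# Estimate of A^T B output from the sketches at the end of the stream.
est = (StA.T @ StB) / m

err = np.linalg.norm(est - A.T @ B)              # Frobenius-norm error
scale = np.linalg.norm(A) * np.linalg.norm(B)    # ||A||_F * ||B||_F
print(f"relative error ||est - A^T B|| / (||A|| ||B||) = {err / scale:.4f}")
```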
Expected Error and a Tail Estimate
• Using linearity of expectation, E[AᵀSSᵀB/m] = AᵀE[SSᵀ]B/m = AᵀB
• Moreover, for δ, ε > 0, there is an m = O(ε⁻² log(1/δ)) so that
  Pr[ ||AᵀSSᵀB/m − AᵀB|| > ε ||A|| ||B|| ] ≤ δ
  (again ||C|| = [Σᵢ,ⱼ Cᵢ,ⱼ²]^(1/2))
• This tail estimate seems to be new
  • It follows from bounding the O(log(1/δ))-th moment of ||AᵀSSᵀB/m − AᵀB||
• Improves the space of Sarlós' 1-pass algorithm by a log c factor
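A quick empirical check of this tail behavior, continuing the sketch above. The parameters here are ad hoc (m is not set by the O(ε⁻² log(1/δ)) formula), so this only illustrates the shape of the guarantee:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, trials = 500, 10, 300
eps = 0.2
m = 100  # ad hoc choice, not the formula from the analysis

A = rng.standard_normal((n, d))
B = rng.standard_normal((n, d))
exact = A.T @ B
thresh = eps * np.linalg.norm(A) * np.linalg.norm(B)

failures = 0
for _ in range(trials):
    S = rng.choice([-1.0, 1.0], size=(n, m))
    est = (A.T @ S) @ (S.T @ B) / m
    failures += np.linalg.norm(est - exact) > thresh

# Fraction of trials with error above eps * ||A||_F * ||B||_F.
print(f"empirical Pr[error > eps * ||A|| ||B||] = {failures / trials:.3f}")
```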
Matrix Product Lower Bound
• Our algorithm is space-optimal for constant δ
  • A new lower bound
• Reduction from a communication game: Augmented Indexing, with players Alice and Bob
  • Alice has a random x ∈ {0,1}^s
  • Bob has a random i ∈ {1, 2, …, s}, and also x_{i+1}, …, x_s
  • Alice sends Bob one message
  • Bob should output x_i with probability at least 2/3
• Theorem [MNSW]: the message must be Ω(s) bits on average
Lower Bound Proof
• Set s := Θ(c ε⁻² log(cn))
• Alice makes a matrix U, using x_1 … x_s
• Bob makes matrices U' and B, using i and x_{i+1}, …, x_s
• The algorithm's input will be A := U + U' and B
  • A and B are n × c/2
• Alice runs the streaming matrix product Alg on U and sends Alg's state to Bob
• Bob continues Alg with A := U + U' and B
• AᵀB determines x_i with probability at least 2/3
  • By the choice of U, U', and B
  • This solves Augmented Indexing
• So the space of Alg must be Ω(s) = Ω(c ε⁻² log(cn)) bits
Lower Bound Proof
• U = U(1); U(2); …; U(log(cn)); 0s
  • U(k) is a Θ(ε⁻²) × c/2 submatrix with entries in {−10^k, 10^k}
  • U(k)_{i,j} = 10^k if the matched entry of x is 0, else U(k)_{i,j} = −10^k
• Bob's index i corresponds to U(k*)_{i*,j*}
• U' is such that A = U + U' = U(1); U(2); …; U(k*); 0s
  • U' is determined from x_{i+1}, …, x_s
• AᵀB is the i*-th row of U(k*)
• ||A|| ≈ ||U(k*)||, since the entries of A grow geometrically
• ε²||A||²·||B||², the squared error, is small, so most entries of the approximation to AᵀB have the correct sign
Linear Regression
• The problem: min_X ||AX − B||
  • The X* minimizing this is X* = A⁻B, where A⁻ is the pseudo-inverse of A
• The algorithm:
  • Maintain SᵀA and SᵀB
  • Return the X' solving min_X ||Sᵀ(AX − B)||
• Main claim: if A has rank k, there is an m = O(k ε⁻¹ log(1/δ)) so that, with probability at least 1 − δ,
  ||AX' − B|| ≤ (1 + ε)||AX* − B||
  • That is, the relative error of X' is small
• A new lower bound (omitted here) shows our space is optimal
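A minimal sketch of this algorithm, with ad hoc sizes (m is not set by the k ε⁻¹ log(1/δ) formula, and the sign entries are fully independent):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, d2, m = 2000, 10, 3, 300

# A regression instance: B is near, but not in, the column space of A.
A = rng.standard_normal((n, d))
B = A @ rng.standard_normal((d, d2)) + 0.5 * rng.standard_normal((n, d2))

S = rng.choice([-1.0, 1.0], size=(n, m))
StA, StB = S.T @ A, S.T @ B   # the only state kept by the streaming algorithm

# X' solves min_X ||S^T (A X - B)||; lstsq returns a minimizer.
X_sketch = np.linalg.lstsq(StA, StB, rcond=None)[0]
X_opt = np.linalg.lstsq(A, B, rcond=None)[0]

cost = lambda X: np.linalg.norm(A @ X - B)
# Ratio should be close to 1, i.e. X' is a (1+eps)-approximate solution.
print(f"||A X' - B|| / ||A X* - B|| = {cost(X_sketch) / cost(X_opt):.4f}")
```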
Regression Analysis
• Why should X' be good?
• First reduce to showing that ||A(X* − X')|| is small
  • Use the normal equation AᵀAX* = AᵀB
  • It implies ||AX' − B||² = ||AX* − B||² + ||A(X' − X*)||²
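For completeness, the one-line derivation of that identity (notation as above, with the Frobenius inner product written as a trace):

```latex
\|AX'-B\|_F^2
  = \|A(X'-X^*) + (AX^*-B)\|_F^2
  = \|A(X'-X^*)\|_F^2 + \|AX^*-B\|_F^2
    + 2\,\operatorname{tr}\!\left[(X'-X^*)^\top\!\left(A^\top A X^* - A^\top B\right)\right],
```

and the trace term vanishes by the normal equation AᵀAX* = AᵀB.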
Regression Analysis Continued
• Bound ||A(X' − X*)||² using several tools:
  • The normal equation (SᵀA)ᵀ(SᵀA)X' = (SᵀA)ᵀ(SᵀB)
  • Our tail estimate for matrix product
  • Subspace JL: for m = O(k ε⁻¹ log(1/δ)), Sᵀ approximately preserves the lengths of all vectors in a k-dimensional subspace
• Overall, ||A(X' − X*)||² = O(ε)||AX* − B||²
• But ||AX' − B||² = ||AX* − B||² + ||A(X' − X*)||², so ||AX' − B||² ≤ (1 + O(ε))||AX* − B||²
Best Low-Rank Approximation
• For any matrix A and integer k, there is a matrix Ak of rank k that is closest to A among all matrices of rank k
  • Applications: LSI, PCA, recommendation systems, clustering
• The sketch SᵀA holds information about A:
  there is a rank-k matrix Ak' in the rowspace of SᵀA so that ||A − Ak'|| ≤ (1 + ε)||A − Ak||
Low-Rank Approximation via Regression
• Why is there such an Ak'? Apply the regression results with A → Ak and B → A
  • The X' minimizing ||Sᵀ(AkX − A)|| has ||AkX' − A|| ≤ (1 + ε)||AkX* − A||
  • But here X* = I, and X' = (SᵀAk)⁻SᵀA
• So the matrix AkX' = Ak(SᵀAk)⁻SᵀA:
  • Has rank k
  • Is in the rowspace of SᵀA
  • Is within a 1 + ε factor of the smallest distance to A over rank-k matrices
• Problem: this seems to require 2 passes, since Ak is not known until A has been seen
1 Pass Algorithm
• Suppose R is a d × m sign matrix. By the regression results, transposed, the columnspace of AR contains a good rank-k approximation to A:
  the X' minimizing ||ARX − A|| has ||ARX' − A|| ≤ (1 + ε)||A − Ak||
• Apply the regression results with A → AR and B → A, and let X'' minimize ||Sᵀ(ARX − A)||
  • So X'' = (SᵀAR)⁻SᵀA has ||ARX'' − A|| ≤ (1 + ε)||ARX' − A|| ≤ (1 + ε)²||A − Ak||
• The algorithm maintains AR and SᵀA, and computes ARX'' = AR(SᵀAR)⁻SᵀA
• Finally, compute the best rank-k approximation to ARX'' in the columnspace of AR (ARX'' has rank O(kε⁻²))
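A minimal sketch of this 1-pass algorithm. The sketch widths m_R and m_S are ad hoc choices rather than the talk's functions of k and ε, and ARX'' is formed explicitly for clarity; the streaming algorithm would keep only the small factors:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 500, 100, 5
m_R, m_S = 40, 80   # sketch widths for R and S (ad hoc choices)

# A test matrix that is approximately rank k.
A = rng.standard_normal((n, k)) @ rng.standard_normal((k, d)) \
    + 0.01 * rng.standard_normal((n, d))

R = rng.choice([-1.0, 1.0], size=(d, m_R))
S = rng.choice([-1.0, 1.0], size=(n, m_S))

AR, StA = A @ R, S.T @ A   # the two sketches maintained in one pass

# A R X'' = A R (S^T A R)^- S^T A, with (.)^- the pseudo-inverse.
ARX = AR @ np.linalg.pinv(StA @ R) @ StA

# Best rank-k approximation to A R X'' via its SVD; its columns
# automatically lie in the columnspace of A R.
U, s, Vt = np.linalg.svd(ARX, full_matrices=False)
Ak_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Compare with the true best rank-k error ||A - A_k||
# (the root of the sum of squared tail singular values of A).
s_A = np.linalg.svd(A, compute_uv=False)
opt = np.sqrt(np.sum(s_A[k:] ** 2))
print(f"||A - Ak_hat|| / ||A - A_k|| = {np.linalg.norm(A - Ak_hat) / opt:.4f}")
```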
Concluding Remarks
• The space bounds are tight for matrix product and regression
• We get a 1-pass, O(kε⁻²(n + dε⁻²) log(nd))-space algorithm for low-rank approximation
  • The first sublinear-space 1-pass streaming algorithm with relative error
  • A new lower bound (omitted) shows our space is almost optimal
• The total work is O(Nkε⁻⁴), where N is the number of non-zero entries of A
  • In the next talk, [NDT] give a faster algorithm for dense matrices
• Open questions: improve the dependence on ε; find tight bounds for multiple passes
A Lower Bound
[Figure: Alice's binary string x is encoded as a matrix A; Bob holds an index i and the bits x_{i+1}, x_{i+2}, …. A is built from blocks of k rows and kε⁻¹ columns whose entries grow geometrically in magnitude (±10, ±100, ±1000, ±10000, …), padded with 0s; the last n − k rows are 0. The error is then dominated by the block of interest, and Bob also inserts a k × k identity submatrix into the block of interest.]
Lower Bound Details
[Figure: the block of interest has k rows and kε⁻¹ columns; Bob inserts a k × k identity submatrix scaled by a large value P, i.e. P·Ik, with 0s elsewhere in the block and n − k zero rows below.]
• Show that any rank-k approximation must err on all of the shaded region
• So a good rank-k approximation likely has the correct sign on Bob's entry