Numerical Linear Algebra in the Streaming Model
Ken Clarkson - IBM · David Woodruff - IBM
The Problems
Given an n x d matrix A and an n x d' matrix B, we want estimators for
• The matrix product A^T B
• The matrix X* minimizing ||AX - B||
  • A slightly generalized version of least-squares regression
• Given an integer k, the matrix A_k of rank k minimizing ||A - A_k||
• We consider the Frobenius matrix norm: the square root of the sum of squared entries
General Properties of Our Algorithms
• 1 pass over the matrix entries, given in any order (multiple updates allowed)
• Maintain compressed versions, or "sketches", of the matrices
• Do a small amount of work per entry to maintain the sketches
• Output the result using the sketches
• Randomized approximation algorithms
Matrix Compression Methods
In a line of similar efforts…
• Element-wise sampling [AM01], [AHK06]
• Sketching / random projection: maintain a small number of random linear combinations of rows or columns [S06]
• Row / column sampling [DK01], [DKM04], [DMM08]
  • Usually more than 1 pass
• Here: sketching
Outline
• Matrix Product
• Regression
• Low-rank approximation
Approximate Matrix Product
• A and B have n rows; we want to estimate A^T B
• Let S be an n x m sign (Rademacher) matrix
  • Each entry is +1 or -1 with probability ½
  • m is small, to be specified
  • Entries are O(log 1/δ)-wise independent
• The sketches are S^T A and S^T B
  • Our estimate of A^T B is [A^T S][S^T B]/m
• Easy to maintain the sketches given updates
  • O(m) time per update; O(mc log(nc)) bits of space for S^T A and S^T B, with c the total number of columns of A and B
• Output [A^T S][S^T B]/m using fast rectangular matrix multiplication
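As a concrete illustration, here is a minimal NumPy sketch (ours, not the authors' code) of this estimator: it maintains S^T A and S^T B under a stream of entry updates, then forms [A^T S][S^T B]/m. For simplicity it uses fully independent signs rather than O(log 1/δ)-wise independent ones, and the helper names update_A / update_B are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, dp, m = 400, 20, 30, 200             # m is O(eps^-2 log(1/delta))

S = rng.choice([-1.0, 1.0], size=(n, m))   # n x m sign (Rademacher) matrix
StA = np.zeros((m, d))                     # sketch S^T A
StB = np.zeros((m, dp))                    # sketch S^T B

def update_A(i, j, delta):
    """Stream update A[i, j] += delta: O(m) work on the sketch."""
    StA[:, j] += delta * S[i, :]

def update_B(i, j, delta):
    """Stream update B[i, j] += delta: O(m) work on the sketch."""
    StB[:, j] += delta * S[i, :]

# Feed dense random test matrices through the update interface.
A = rng.standard_normal((n, d))
B = rng.standard_normal((n, dp))
for i in range(n):
    for j in range(d):
        update_A(i, j, A[i, j])
    for j in range(dp):
        update_B(i, j, B[i, j])

est = StA.T @ StB / m                      # estimate of A^T B
err = np.linalg.norm(est - A.T @ B)        # Frobenius-norm error
print(err / (np.linalg.norm(A) * np.linalg.norm(B)))  # roughly m^{-1/2}
```

The printed relative error should concentrate around m^{-1/2}, matching the ε ||A|| ||B|| guarantee on the next slide with ε = Θ(m^{-1/2}).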
Expected Error and a Tail Estimate
• Using linearity of expectation, E[A^T S S^T B/m] = A^T E[S S^T] B/m = A^T B
• Moreover, for δ, ε > 0, there is m = O(ε^{-2} log(1/δ)) so that
  Pr[ ||A^T S S^T B/m - A^T B|| > ε ||A|| ||B|| ] ≤ δ
  (again ||C|| = [Σ_{i,j} C_{i,j}^2]^{1/2})
• This tail estimate seems to be new
  • Follows from bounding the O(log 1/δ)-th moment of ||A^T S S^T B/m - A^T B||
• Improves the space of Sarlós' 1-pass algorithm by a log c factor
Matrix Product Lower Bound
• Our algorithm is space-optimal for constant δ
  • a new lower bound
• Reduction from a communication game: Augmented Indexing, with players Alice and Bob
  • Alice has a random x ∈ {0,1}^s
  • Bob has a random i ∈ {1, 2, …, s}, and also x_{i+1}, …, x_s
  • Alice sends Bob one message
  • Bob should output x_i with probability at least 2/3
• Theorem [MNSW]: the message must be Ω(s) bits on average
Lower Bound Proof
• Set s := Θ(c ε^{-2} log(cn))
• Alice makes a matrix U, using x_1 … x_s
• Bob makes matrices U' and B, using i and x_{i+1}, …, x_s
  • The algorithm's input will be A := U + U' and B
  • A and B are n x c/2
• Alice runs the streaming matrix product algorithm on U and sends the algorithm's state to Bob
• Bob continues the algorithm with A := U + U' and B
• A^T B determines x_i with probability at least 2/3
  • by the choice of U, U', and B
  • solving Augmented Indexing
• So the space of the algorithm must be Ω(s) = Ω(c ε^{-2} log(cn)) bits
Lower Bound Proof
• U = [U^(1); U^(2); …; U^(log(cn)); 0s]
• U^(k) is a Θ(ε^{-2}) x c/2 submatrix with entries in {-10^k, 10^k}
  • U^(k)_{i,j} = 10^k if the matched entry of x is 0, else U^(k)_{i,j} = -10^k
• Bob's index i corresponds to an entry U^(k*)_{i*,j*}
• U' is such that A = U + U' = [U^(1); U^(2); …; U^(k*); 0s]
  • U' is determined from x_{i+1}, …, x_s
• A^T B is the i*-th row of U^(k*)
• ||A|| ≈ ||U^(k*)||, since the entries of A grow geometrically
• ε^2 ||A||^2 · ||B||^2, the squared error, is small, so most entries of the approximation to A^T B have the correct sign
Linear Regression
• The problem: min_X ||AX - B||
• The minimizing X* is X* = A^- B, where A^- is the pseudo-inverse of A
• The algorithm:
  • Maintain S^T A and S^T B
  • Return X' solving min_X ||S^T (AX - B)||
• Main claim: if A has rank k, there is an m = O(k ε^{-1} log(1/δ)) so that with probability at least 1 - δ, ||AX' - B|| ≤ (1+ε) ||AX* - B||
  • That is, the relative error of X' is small
• A new lower bound (omitted here) shows our space is optimal
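A minimal NumPy sketch of this step (again ours, not the authors' code); the sketch width m below is illustrative rather than the O(k ε^{-1} log(1/δ)) bound:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, dp, m = 2000, 10, 3, 200

# A synthetic instance: a near-exact linear model plus noise.
A = rng.standard_normal((n, d))
B = A @ rng.standard_normal((d, dp)) + 0.1 * rng.standard_normal((n, dp))

S = rng.choice([-1.0, 1.0], size=(n, m))   # n x m sign matrix
StA, StB = S.T @ A, S.T @ B                # the maintained sketches

X_sketch = np.linalg.lstsq(StA, StB, rcond=None)[0]  # X' = argmin ||S^T(AX-B)||
X_exact = np.linalg.lstsq(A, B, rcond=None)[0]       # X* for comparison

print(np.linalg.norm(A @ X_sketch - B))    # typically (1+eps) * optimal
print(np.linalg.norm(A @ X_exact - B))     # the optimal residual ||AX*-B||
```

Note that the solve runs on the small m x d and m x d' sketches, never on A or B themselves.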
Regression Analysis
• Why should X' be good?
• First reduce to showing that ||A(X* - X')|| is small
  • Use the normal equation A^T A X* = A^T B
  • It implies ||AX' - B||^2 = ||AX* - B||^2 + ||A(X' - X*)||^2
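A one-line justification of that decomposition (our expansion, using the Frobenius inner product ⟨C, D⟩ = tr(C^T D)):

```latex
\|AX'-B\|^2 = \|(AX^*-B) + A(X'-X^*)\|^2
            = \|AX^*-B\|^2 + 2\,\langle AX^*-B,\ A(X'-X^*)\rangle + \|A(X'-X^*)\|^2
```

The cross term vanishes because ⟨AX*-B, A(X'-X*)⟩ = tr[(X'-X*)^T A^T (AX*-B)] = tr[(X'-X*)^T (A^T A X* - A^T B)] = 0 by the normal equation.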
Regression Analysis Continued
• Bound ||A(X' - X*)||^2 using several tools:
  • The sketched normal equation (S^T A)^T (S^T A) X' = (S^T A)^T (S^T B)
  • Our tail estimate for matrix product
  • Subspace JL: for m = O(k ε^{-1} log(1/δ)), S^T approximately preserves the lengths of all vectors in a k-dimensional subspace
• Overall, ||A(X' - X*)||^2 = O(ε) ||AX* - B||^2
• Combined with ||AX' - B||^2 = ||AX* - B||^2 + ||A(X' - X*)||^2, this gives ||AX' - B||^2 ≤ (1 + O(ε)) ||AX* - B||^2
Best Low-Rank Approximation
• For any matrix A and integer k, there is a matrix A_k of rank k that is closest to A among all matrices of rank k
  • Applications: LSI, PCA, recommendation systems, clustering
• The sketch S^T A holds information about A
• There is a rank-k matrix A_k' in the rowspace of S^T A such that ||A - A_k'|| ≤ (1+ε) ||A - A_k||
Low-Rank Approximation via Regression
• Why is there such an A_k'? Apply the regression results with A → A_k and B → A
• The X' minimizing ||S^T (A_k X - A)|| has ||A_k X' - A|| ≤ (1+ε) ||A_k X* - A||
• But here X* = I, and X' = (S^T A_k)^- S^T A
• So the matrix A_k X' = A_k (S^T A_k)^- S^T A:
  • Has rank k
  • Is in the rowspace of S^T A
  • Is within 1+ε of the smallest distance to a rank-k matrix
• Problem: this seems to require 2 passes, since A_k is not available during the stream
1 Pass Algorithm
• Suppose R is a d x m sign matrix. By the regression results (applied to transposes), the columnspace of AR contains a good rank-k approximation to A
  • The X' minimizing ||AR X - A|| has ||AR X' - A|| ≤ (1+ε) ||A - A_k||
• Apply the regression results with A → AR and B → A, with X'' minimizing ||S^T (AR X - A)||
  • So X'' = (S^T AR)^- S^T A has ||AR X'' - A|| ≤ (1+ε) ||AR X' - A|| ≤ (1+ε)^2 ||A - A_k||
• The algorithm maintains AR and S^T A, and computes AR X'' = AR (S^T AR)^- S^T A
• Compute the best rank-k approximation to AR X'' in the columnspace of AR (AR X'' has rank O(k ε^{-2})); see the sketch below
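A minimal NumPy sketch of the whole 1-pass procedure (ours, not the authors' implementation). The sketch widths mR and mS are illustrative stand-ins for the poly(k, 1/ε) widths in the analysis, and S^T AR is formed as (S^T A)R, so only AR and S^T A need to be maintained during the stream:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 500, 300, 5
mR, mS = 60, 120                           # illustrative sketch widths

# A low-rank-plus-noise test matrix.
A = (rng.standard_normal((n, k)) @ rng.standard_normal((k, d))
     + 0.01 * rng.standard_normal((n, d)))

R = rng.choice([-1.0, 1.0], size=(d, mR))  # d x mR sign matrix
S = rng.choice([-1.0, 1.0], size=(n, mS))  # n x mS sign matrix

AR = A @ R                                 # maintained sketch, n x mR
StA = S.T @ A                              # maintained sketch, mS x d

# X'' = (S^T A R)^- S^T A; note S^T A R = (S^T A) R needs no extra pass.
Xpp = np.linalg.pinv(StA @ R) @ StA
ARXpp = AR @ Xpp                           # rank <= mR approximation of A

# Best rank-k approximation to AR X'' via truncated SVD; its columnspace
# lies in that of AR, since AR X'' does.
U, s, Vt = np.linalg.svd(ARXpp, full_matrices=False)
Ak_prime = (U[:, :k] * s[:k]) @ Vt[:k, :]

_, s_A, _ = np.linalg.svd(A, full_matrices=False)
print(np.linalg.norm(A - Ak_prime))        # close to the optimum below
print(np.linalg.norm(s_A[k:]))             # ||A - A_k||, the best possible
```

Because AR X'' already lies in the columnspace of AR, truncating its SVD to rank k gives the required rank-k matrix in that columnspace.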
Concluding Remarks
• Space bounds are tight for product and regression
• We get 1 pass and O(k ε^{-2} (n + d ε^{-2}) log(nd)) space for low-rank approximation
  • The first sublinear 1-pass streaming algorithm with relative error
  • A new lower bound (omitted) shows our space is almost optimal
• Total work is O(N k ε^{-4}), where N is the number of non-zero entries of A
  • In the next talk, [NDT] give a faster algorithm for dense matrices
• Open: improve the dependence on ε; tight bounds for multiple passes
A Lower Bound
• Alice has a binary string x; Bob has an index i and x_{i+1}, x_{i+2}, …
• The matrix A is built from stacked blocks, each of k rows and k ε^{-1} columns, padded with 0s and n-k zero rows; the sign pattern of block t encodes bits of x, and its entries are ±10^t, growing geometrically across blocks
• The error is then dominated by the block of interest
• Bob also inserts a k x k identity submatrix into the block of interest
[Figure: block structure of A, with sign patterns encoding x]
Lower Bound Details
• Block of interest: k rows and k ε^{-1} columns, surrounded by 0s, with n-k zero rows below
• Bob inserts a k x k identity submatrix scaled by a large value P (i.e., P·I_k) into the block of interest
• Show that any rank-k approximation must err on all of the remaining (shaded) region of the block
• So a good rank-k approximation likely has the correct sign on Bob's entry