Sparse Recovery ( Using Sparse Matrices)



  1. Sparse Recovery (Using Sparse Matrices). Piotr Indyk, MIT

  2. Heavy Hitters
  • Also called frequent elements and elephants
  • Define HH_p^φ(x) = { i : |x_i| ≥ φ ||x||_p }
  • L_p Heavy Hitter Problem:
    • Parameters: φ and φ' (often φ' = φ − ε)
    • Goal: return a set S of coordinates such that
      • S contains HH_p^φ(x)
      • S is included in HH_p^φ'(x)
  • L_p Point Query Problem:
    • Parameter: ε
    • Goal: at the end of the stream, given i, report x*_i = x_i ± ε ||x||_p
  • Can solve the L2 point query problem with parameter ε and failure probability P by storing O(1/ε² log(1/P)) numbers ("sketches") [A-Gibbons-M-S'99], [Gilbert-Kotidis-Muthukrishnan-Strauss'01]
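To make the definition concrete, here is a tiny direct computation of HH_p^φ(x) on a toy vector (the vector, φ, and the function name are made-up illustration; no sketching is involved yet):

```python
import numpy as np

def heavy_hitters(x, phi, p=1):
    """Return HH_p^phi(x) = {i : |x_i| >= phi * ||x||_p}, computed exactly."""
    norm = np.linalg.norm(x, ord=p)
    return {i for i, v in enumerate(x) if abs(v) >= phi * norm}

x = np.array([9.0, 0.5, 0.5, 3.0, 0.2, 0.8])   # toy signal, ||x||_1 = 14
print(heavy_hitters(x, phi=0.2, p=1))           # {0, 3}: only |9| and |3| exceed 0.2 * 14 = 2.8
```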

  3. Sparse Recovery (approximation theory, learning Fourier coefficients, linear sketching, finite rate of innovation, compressed sensing, ...)
  • Setup:
    • Data/signal in n-dimensional space: x. E.g., x is a 256x256 image, so n = 65536
    • Goal: compress x into a "sketch" Ax, where A is an m x n "sketch matrix", m << n
  • Requirements:
    • Plan A: want to recover x from Ax
      • Impossible: underdetermined system of equations
    • Plan B: want to recover an "approximation" x* of x
      • Sparsity parameter k
      • Informally: want to recover the largest k coordinates of x
      • Formally: want x* such that ||x* − x||_p ≤ C(k) min_{x'} ||x' − x||_q over all x' that are k-sparse (at most k non-zero entries)
  • Want:
    • Good compression (small m = m(k, n))
    • Efficient algorithms for encoding and recovery
  • Why linear compression?

  4. Application I: Monitoring Network Traffic Data Streams
  • Router routes packets
    • Where do they come from?
    • Where do they go to?
  • Ideally, would like to maintain a traffic matrix x[·,·] indexed by (source, destination)
    • Easy to update: given a (src, dst) packet, increment x_{src,dst}
    • Requires way too much space! (2^32 x 2^32 entries)
    • Need to compress x, increment easily
  • Using linear compression we can (see the sketch below):
    • Maintain the sketch Ax under increments to x, since A(x + Δ) = Ax + AΔ
    • Recover x* from Ax
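A minimal sketch of this update rule. The random Gaussian A, the sizes, and the update() helper are illustrative assumptions; the talk's sparse matrices support the same linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1_000, 50                      # n = number of (src, dst) pairs (tiny for illustration)
A = rng.standard_normal((m, n))       # any fixed sketch matrix works for this point

x = np.zeros(n)                       # true traffic vector (kept here only to verify)
sketch = np.zeros(m)                  # Ax, maintained incrementally

def update(i, delta=1.0):
    """Process one packet for flow i: x <- x + delta*e_i, so Ax <- Ax + delta*A[:, i]."""
    x[i] += delta
    sketch[:] += delta * A[:, i]

for i in rng.integers(0, n, size=5_000):
    update(i)

assert np.allclose(sketch, A @ x)     # A(x + Delta) = Ax + A*Delta, applied packet by packet
```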

  5. Applications, ctd.
  • Single pixel camera [Wakin, Laska, Duarte, Baron, Sarvotham, Takhar, Kelly, Baraniuk'06]
  • Pooling experiments [Kainkaryam, Woolf'08], [Hassibi et al.'07], [Dai-Sheikh, Milenkovic, Baraniuk], [Shental-Amir-Zuk'09], [Erlich-Shental-Amir-Zuk'09]

  6. Constructing matrix A
  • "Most" matrices A work
  • Sparse matrices:
    • Data stream algorithms
    • Coding theory (LDPCs)
  • Dense matrices:
    • Compressed sensing
    • Complexity/learning theory (Fourier matrices)
  • "Traditional" tradeoffs:
    • Sparse: computationally more efficient, explicit
    • Dense: shorter sketches
  • Recent results: the "best of both worlds"

  7. Results
  (Table of results omitted; entries are rated on the scale Excellent / Very Good / Good / Fair.)
  • Caveats:
    (1) most "dominated" results not shown
    (2) only results for general vectors x are displayed
    (3) sometimes the matrix type matters (Fourier, etc.)

  8. Part I
  This talk:
  • Assume x ≥ 0 (this simplifies the algorithm and analysis; see the original paper for the general case)
  • Prove the following l∞/l1 guarantee: ||x − x*||_∞ ≤ C Err1/k
    This is actually stronger than the l1/l1 guarantee (cf. [CM'06], see also the Appendix)
  Note: [CM'04] originally proved a weaker statement, where ||x − x*||_∞ ≤ C ||x||_1/k. The stronger guarantee follows from the analysis of [CCF'02] (cf. [GGIKMS'02]), who applied it to Err2.
  Theorem: There is a distribution over m x n matrices A, m = O(k log n), such that for any x, given Ax, we can recover x* such that ||x − x*||_1 ≤ C Err1, where Err1 = min_{k-sparse x'} ||x − x'||_1, with probability 1 − 1/n. The recovery algorithm runs in O(n log n) time.

  9. First attempt
  • Matrix view:
    • A 0-1 w x n matrix A, with one 1 per column
    • The i-th column has its 1 at position h(i), where h(i) is chosen uniformly at random from {1…w}
  • Hashing view:
    • Z = Ax
    • h hashes coordinates into "buckets" Z_1 … Z_w
  • Estimator: x*_i = Z_{h(i)}
  (Figure: the 0-1 sketch matrix and buckets Z_1 … Z_{(4/α)k}; the bucket containing x_i yields the estimate x*_i.)
  • Closely related: [Estan-Varghese'03], "counting" Bloom filters
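A small numerical sketch of this first attempt (the signal, α, and seed below are arbitrary demo choices):

```python
import numpy as np

def build_sketch(x, w, seed=0):
    """One hash function h: [n] -> [w]; bucket Z_j = sum of x_i with h(i) = j."""
    rng = np.random.default_rng(seed)
    h = rng.integers(0, w, size=len(x))      # random bucket per coordinate (position of the 1 in column i)
    Z = np.zeros(w)
    np.add.at(Z, h, x)                       # Z = Ax for the 0-1 matrix described above
    return h, Z

def estimate(i, h, Z):
    return Z[h[i]]                           # x*_i = Z_{h(i)}; overestimates when x >= 0

n, k, alpha = 10_000, 10, 0.5
w = int((4 / alpha) * k)                     # (4/alpha)k buckets, as on the slide
x = np.zeros(n); x[:k] = 100.0               # k "elephants"
x += np.random.default_rng(1).random(n)      # plus small "mice"
h, Z = build_sketch(x, w)
print(x[0], estimate(0, h, Z))               # estimate >= true value, and close with good probability
```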

  10. Analysis
  • We show that x*_i ≤ x_i + α Err/k with probability > 1/2
  • Assume |x_{i1}| ≥ … ≥ |x_{in}| (coordinates sorted by magnitude) and let S = {i1 … ik} (the "elephants")
  • When is x*_i > x_i + α Err/k?
    • Event 1: S and i collide, i.e., h(i) ∈ h(S − {i})
      Probability: at most k/((4/α)k) = α/4 < 1/4 (if α < 1)
    • Event 2: many "mice" collide with i, i.e., ∑_{t ∉ S∪{i}, h(t)=h(i)} x_t > α Err/k
      Probability: at most 1/4 (Markov inequality; see the bound below)
  • Total probability of "bad" events < 1/2
  (Figure: elephants x_{i1}, x_{i2}, …, x_{ik} and the coordinate x_i hashed into buckets Z_1 … Z_{(4/α)k}.)
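For completeness, the Markov step spelled out, under the slide's assumptions x ≥ 0, w = (4/α)k buckets, and Err = ∑_{t ∉ S} x_t:

```latex
\mathbb{E}\!\left[\sum_{t \notin S\cup\{i\},\; h(t)=h(i)} x_t\right]
  = \sum_{t \notin S\cup\{i\}} x_t \cdot \Pr[h(t)=h(i)]
  \le \frac{\mathrm{Err}}{w}
  = \frac{\alpha\,\mathrm{Err}}{4k},
\qquad\text{so by Markov}\qquad
\Pr\!\left[\sum_{t \notin S\cup\{i\},\; h(t)=h(i)} x_t > \frac{\alpha\,\mathrm{Err}}{k}\right] \le \frac{1}{4}.
```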

  11. Second try
  • Algorithm:
    • Maintain d functions h_1 … h_d and vectors Z^1 … Z^d
    • Estimator: x*_i = min_t Z^t_{h_t(i)}
  • Analysis:
    • Pr[ |x*_i − x_i| ≥ α Err/k ] ≤ 1/2^d
    • Setting d = O(log n) (and thus m = O(k log n)) ensures that w.h.p. |x*_i − x_i| < α Err/k
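An illustrative implementation of this d-repetition scheme, essentially the Count-Min sketch cited as [CM'04] above (the class name, widths, and test signal are demo choices):

```python
import numpy as np

class MinSketch:
    """d independent hash tables of width w; for x >= 0, x*_i = min_t Z^t[h_t(i)]
    overestimates x_i by at most alpha*Err/k with high probability."""
    def __init__(self, n, w, d, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, w, size=(d, n))   # d random hash functions, stored explicitly
        self.Z = np.zeros((d, w))                  # d bucket vectors, m = d*w numbers total

    def update(self, i, delta):                    # stream update x_i += delta
        for t in range(self.Z.shape[0]):
            self.Z[t, self.h[t, i]] += delta

    def estimate(self, i):
        return min(self.Z[t, self.h[t, i]] for t in range(self.Z.shape[0]))

n, k, alpha = 100_000, 20, 0.5
w, d = int((4 / alpha) * k), 10                    # w = (4/alpha)k, d = O(log n)
sk = MinSketch(n, w, d)
rng = np.random.default_rng(1)
x = np.zeros(n)
for i, v in [(3, 500.0), (7, 300.0)] + [(int(j), 1.0) for j in rng.integers(0, n, 5000)]:
    x[i] += v
    sk.update(i, v)
print(x[3], sk.estimate(3))                        # estimate is close to the true value 500+
```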

  12. Part II

  13. Compressed Sensing
  • A.k.a. compressive sensing or compressive sampling [Candes-Romberg-Tao'04; Donoho'04]
  • Signal acquisition/processing framework:
    • Want to acquire a signal x = [x_1 … x_n]
    • Acquisition proceeds by computing Ax + noise, of dimension m << n
      • We ignore noise in this lecture
    • From Ax we want to recover an approximation x* of x
      • Note: x* does not have to be k-sparse in general
    • Method: solve the following program:
      minimize ||x*||_1 subject to Ax* = Ax
    • Guarantee: for some C > 1
      (1) ||x − x*||_1 ≤ C min_{k-sparse x''} ||x − x''||_1
      (2) ||x − x*||_2 ≤ C min_{k-sparse x''} ||x − x''||_1 / k^{1/2}

  14. Linear Programming
  • Recovery:
    • minimize ||x*||_1
    • subject to Ax* = Ax
  • This is a linear program:
    • minimize ∑_i t_i
    • subject to
      • −t_i ≤ x*_i ≤ t_i
      • Ax* = Ax
  • Can solve in n^Θ(1) time
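A sketch of this LP reduction using scipy.optimize.linprog; the helper name l1_minimize, the Gaussian A, and the problem sizes are illustrative assumptions, not part of the talk:

```python
import numpy as np
from scipy.optimize import linprog

def l1_minimize(A, b):
    """Solve min ||x||_1 s.t. Ax = b as an LP over variables (x, t):
    minimize sum(t) subject to -t <= x <= t and Ax = b."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])             # objective: sum of t_i
    # x - t <= 0 and -x - t <= 0 encode |x_i| <= t_i
    A_ub = np.block([[np.eye(n), -np.eye(n)], [-np.eye(n), -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])                   # Ax = b; t does not appear here
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]

rng = np.random.default_rng(0)
n, m, k = 200, 60, 5
A = rng.standard_normal((m, n)) / np.sqrt(m)                  # Gaussian sketch, illustrative
x = np.zeros(n); x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_star = l1_minimize(A, A @ x)
print(np.linalg.norm(x - x_star, 1))                          # typically ~0 for an exactly k-sparse x
```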

  15. Intuition
  • LP:
    • minimize ||x*||_1
    • subject to Ax* = Ax
  • At first sight, somewhat mysterious
  • But the approach has a long history in signal processing, statistics, etc.
  • Intuition:
    • The actual goal is to minimize ||x*||_0
    • But this comes down to the same thing (if A is "nice")
    • The choice of L1 is crucial (L2 does not work)
  (Figure: the feasible set Ax* = Ax for n=2, m=1, k=1.)

  16. Restricted Isometry Property
  • A matrix A satisfies the RIP of order k with constant δ if for any k-sparse vector x we have
    (1 − δ) ||x||_2 ≤ ||Ax||_2 ≤ (1 + δ) ||x||_2
  • Theorem 1: If each entry of A is i.i.d. N(0,1), and m = Θ(k log(n/k)), then A satisfies the (k, 1/3)-RIP w.h.p.
  • Theorem 2: The (O(k), 1/3)-RIP implies the L2/L1 guarantee
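An empirical illustration of Theorem 1 (checking the RIP exactly over all k-sparse vectors is intractable, so this only samples random k-sparse x; the 1/√m column scaling and the constant 4 are demo assumptions):

```python
import numpy as np

def rip_ratio_samples(A, k, trials=2000, seed=0):
    """Sample ||Ax||_2 / ||x||_2 over random k-sparse x to see the concentration."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    ratios = []
    for _ in range(trials):
        x = np.zeros(n)
        idx = rng.choice(n, size=k, replace=False)
        x[idx] = rng.standard_normal(k)
        ratios.append(np.linalg.norm(A @ x) / np.linalg.norm(x))
    return min(ratios), max(ratios)

n, k = 1000, 10
m = int(4 * k * np.log(n / k))                      # m = Theta(k log(n/k)); constant 4 is arbitrary
A = np.random.default_rng(1).standard_normal((m, n)) / np.sqrt(m)   # scaled so E||Ax||^2 = ||x||^2
print(rip_ratio_samples(A, k))                      # both extremes stay close to 1
```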

  17. Dense vs. sparse
  • Restricted Isometry Property (RIP) [Candes-Tao'04]:
    Δ is k-sparse ⇒ ||Δ||_2 ≤ ||AΔ||_2 ≤ C ||Δ||_2
  • Consider m x n 0-1 matrices with d ones per column
  • Do they satisfy RIP?
    • No, unless m = Ω(k²) [Chandar'07]
  • However, they can satisfy the following RIP-1 property [Berinde-Gilbert-Indyk-Karloff-Strauss'08]:
    Δ is k-sparse ⇒ d(1 − ε) ||Δ||_1 ≤ ||AΔ||_1 ≤ d ||Δ||_1
  • Sufficient (and necessary) condition: the underlying graph is a (k, d(1 − ε/2))-expander

  18. Expanders
  • A bipartite graph (n left nodes, m right nodes, left degree d) is a (k, d(1 − ε))-expander if for any left set S with |S| ≤ k we have |N(S)| ≥ (1 − ε) d |S|
  • Objects well-studied in theoretical computer science and coding theory
  • Constructions:
    • Probabilistic: m = O(k log(n/k))
    • Explicit: m = k · quasipolylog n
  • High expansion implies RIP-1 [Berinde-Gilbert-Indyk-Karloff-Strauss'08]:
    Δ is k-sparse ⇒ d(1 − ε) ||Δ||_1 ≤ ||AΔ||_1 ≤ d ||Δ||_1
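A quick numerical sanity check of the RIP-1 scaling for a random d-left-regular 0-1 matrix (the construction and sizes are illustrative; a random such graph is merely likely to be a good expander):

```python
import numpy as np

def random_left_regular(n, m, d, seed=0):
    """Adjacency matrix of a random bipartite graph with exactly d ones per column."""
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n))
    for i in range(n):
        A[rng.choice(m, size=d, replace=False), i] = 1.0     # column i = neighborhood N(i)
    return A

n, m, d, k = 2000, 400, 8, 10
A = random_left_regular(n, m, d)
rng = np.random.default_rng(1)
for _ in range(5):
    delta = np.zeros(n)
    delta[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratio = np.linalg.norm(A @ delta, 1) / (d * np.linalg.norm(delta, 1))
    print(f"||A delta||_1 / (d ||delta||_1) = {ratio:.3f}")   # stays between 1 - eps and 1
```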

  19. Proof: d(1 − ε/2)-expansion ⇒ RIP-1
  • Want to show that for any k-sparse Δ we have
    d(1 − ε) ||Δ||_1 ≤ ||AΔ||_1 ≤ d ||Δ||_1
  • RHS inequality holds for any Δ
  • LHS inequality:
    • W.l.o.g. assume |Δ_1| ≥ … ≥ |Δ_k| ≥ |Δ_{k+1}| = … = |Δ_n| = 0
    • Consider the edges e = (i, j) in lexicographic order
    • For each edge e = (i, j) define r(e) such that
      • r(e) = −1 if there exists an edge (i', j) < (i, j)
      • r(e) = +1 if there is no such edge
    • Claim 1: ||AΔ||_1 ≥ ∑_{e=(i,j)} |Δ_i| r(e)
    • Claim 2: ∑_{e=(i,j)} |Δ_i| r(e) ≥ (1 − ε) d ||Δ||_1

  20. Recovery: algorithms

  21. L1 minimization
  • Algorithm:
    • minimize ||x*||_1
    • subject to Ax* = Ax
  • Theoretical performance:
    • RIP-1 replaces RIP-2
    • L1/L1 guarantee
  • Experimental performance:
    (Plot over k/m and m/n. Thick line: Gaussian matrices; thin lines: probability levels for sparse matrices (d=10). Same performance!)

  22. Matching Pursuit(s)
  • Iterative algorithm: given the current approximation x*:
    • Find (possibly several) i s.t. A_i "correlates" with Ax − Ax*. This yields i and z s.t. ||x* + z e_i − x||_p << ||x* − x||_p
    • Update x*
    • Sparsify x* (keep only the k largest entries)
    • Repeat
  • Norms:
    • p = 2: CoSaMP, SP, IHT, etc. (RIP)
    • p = 1: SMP, SSMP (RIP-1)
    • p = 0: LDPC bit flipping (sparse matrices)

  23. Sequential Sparse Matching Pursuit [Berinde-Indyk'09]
  • Algorithm:
    • x* = 0
    • Repeat T times:
      • Repeat S = O(k) times:
        • Find i and z that minimize* ||A(x* + z e_i) − Ax||_1
        • x* = x* + z e_i
      • Sparsify x* (set all but the k largest entries of x* to 0)
  • Similar to SMP, but updates are done sequentially
  * Set z = median[(Ax* − Ax)_{N(i)}]. Instead, one could first optimize (gradient) i and then z [Fuchs'09]
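A toy, unoptimized rendering of this loop for a 0-1 left-regular matrix A (the exhaustive search over i and the default S = 2k are simplifications for illustration, not the paper's implementation):

```python
import numpy as np

def ssmp(A, b, k, T=20, S=None):
    """Sequential Sparse Matching Pursuit sketch for a 0-1, d-left-regular A and b = Ax."""
    m, n = A.shape
    S = S if S is not None else 2 * k
    x_star = np.zeros(n)
    for _ in range(T):
        for _ in range(S):
            r = b - A @ x_star                        # current residual Ax - Ax*
            best_i, best_z, best_gain = None, 0.0, 0.0
            for i in range(n):
                Ni = A[:, i] > 0                      # neighbors N(i) of left node i
                z = np.median(r[Ni])                  # z minimizing ||A(x* + z e_i) - b||_1
                gain = np.sum(np.abs(r[Ni])) - np.sum(np.abs(r[Ni] - z))
                if gain > best_gain:
                    best_i, best_z, best_gain = i, z, gain
            if best_i is None:                        # no improving update found
                break
            x_star[best_i] += best_z                  # sequential update
        keep = np.argsort(np.abs(x_star))[-k:]        # sparsify: keep the k largest entries
        sparsified = np.zeros(n)
        sparsified[keep] = x_star[keep]
        x_star = sparsified
    return x_star

# usage sketch (assumes a d-left-regular 0-1 matrix such as the random_left_regular helper above):
#   x_star = ssmp(A, A @ x, k)
```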

  24. Experiments
  • 256x256 input; SSMP is run with S = 10000, T = 20; SMP is run for 100 iterations; matrix sparsity is d = 8.

  25. Conclusions
  • Sparse approximation using sparse matrices
  • State of the art: deterministically can do 2 out of 3 (this talk):
    • Near-linear encoding/decoding
    • O(k log(n/k)) measurements
    • Approximation guarantee with respect to the L2/L1 norm
  • Open problems:
    • 3 out of 3?
    • Explicit constructions?

  26. References
  • Survey: A. Gilbert, P. Indyk, "Sparse recovery using sparse matrices", Proceedings of the IEEE, June 2010.
  • Courses:
    • "Streaming, sketching, and sub-linear space algorithms", Fall'07
    • "Sub-linear algorithms" (with Ronitt Rubinfeld), Fall'10
  • Blogs:
    • Nuit Blanche: nuit-blanche.blogspot.com/
