1+eps-Approximate Sparse Recovery. Eric Price, MIT. David Woodruff, IBM Almaden. Compressed Sensing: choose an $r \times n$ matrix $A$; given $x \in \mathbb{R}^n$, compute $Ax$; output a vector $y$ so that $\|x-y\|_p \le (1+\varepsilon)\,\|x-x_{\mathrm{top}\,k}\|_p$, where $x_{\mathrm{top}\,k}$ is the k-sparse vector of largest magnitude coefficients of $x$.
1+eps-Approximate Sparse Recovery Eric Price MIT David Woodruff IBM Almaden
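Compressed Sensing • Choose an $r \times n$ matrix $A$ • Given $x \in \mathbb{R}^n$ • Compute $Ax$ • Output a vector $y$ so that $\|x-y\|_p \le (1+\varepsilon)\,\|x-x_{\mathrm{top}\,k}\|_p$ • $x_{\mathrm{top}\,k}$ is the k-sparse vector of largest magnitude coefficients of $x$ • $p = 1$ or $p = 2$ • Minimize the number $r = r(n, k, \varepsilon)$ of “measurements” • The guarantee should hold with probability greater than 2/3 over the choice of $A$: $\Pr_A[\,\cdot\,] > 2/3$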
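The recovery guarantee on this slide can be checked directly for any candidate output. Below is a minimal Python sketch, assuming vectors are plain lists; the helper names `top_k`, `lp_norm`, and `is_good_recovery` are illustrative and do not come from the talk.

```python
def top_k(x, k):
    """Return the k-sparse vector that keeps the k largest-magnitude entries of x."""
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [x[i] if i in keep else 0.0 for i in range(len(x))]


def lp_norm(v, p):
    """The l_p norm of v for p >= 1."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def is_good_recovery(x, y, k, eps, p):
    """Check ||x - y||_p <= (1 + eps) * ||x - x_top_k||_p (with a tiny float slack)."""
    tail = lp_norm([a - b for a, b in zip(x, top_k(x, k))], p)
    err = lp_norm([a - b for a, b in zip(x, y)], p)
    return err <= (1 + eps) * tail + 1e-12


x = [5.0, -0.1, 0.2, 3.0, 0.1]
best = top_k(x, 2)          # keeps the entries 5.0 and 3.0
zero = [0.0] * len(x)       # the all-zeros output
```

Here `is_good_recovery(x, best, 2, 0.1, 1)` holds, while the all-zeros vector fails the same check because it misses both large coordinates.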
Previous Work • p = 1 [IR, …]: $r = O(k \log(n/k) / \varepsilon)$ (deterministic A) • p = 2 [GLPS]: $r = O(k \log(n/k) / \varepsilon)$ • In both cases, $r = \Omega(k \log(n/k))$ [DIPW] • What is the dependence on $\varepsilon$?
Why 1+ε is Important • Suppose $x = e_i + u$ • $e_i = (0, 0, \ldots, 0, 1, 0, \ldots, 0)$ • $u$ is a random unit vector orthogonal to $e_i$ • Consider $y = 0^n$ • $\|x-y\|_2 = \|x\|_2 \le 2^{1/2} \cdot \|x-e_i\|_2$, so the all-zeros vector is already a $\sqrt{2}$-approximation. It’s a trivial solution! • (1+ε)-approximate recovery fixes this • In some applications, can have $1/\varepsilon = 100$, $\log n = 32$
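The trivial-solution claim is easy to confirm numerically. The following Python sketch builds $x = e_i + u$ and compares the error of the all-zeros output with the error of the best 1-sparse approximation. The function name `trivial_ratio` and the use of Gaussian samples to draw a random direction are assumptions made for illustration.

```python
import math
import random


def trivial_ratio(n, i, seed=0):
    """Return ||x - 0||_2 / ||x - e_i||_2 for x = e_i + u, u a random unit vector orthogonal to e_i."""
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u[i] = 0.0                                   # make u orthogonal to e_i
    norm_u = math.sqrt(sum(t * t for t in u))
    u = [t / norm_u for t in u]                  # normalize to unit length
    x = list(u)
    x[i] += 1.0                                  # x = e_i + u
    err_zero = math.sqrt(sum(t * t for t in x))  # error of the output y = 0
    err_best = math.sqrt(sum(t * t for j, t in enumerate(x) if j != i))  # ||x - e_i||_2 = ||u||_2
    return err_zero / err_best


ratio = trivial_ratio(100, 3)
```

The ratio equals $\sqrt{2}$ up to rounding, so a constant-factor guarantee with factor $\sqrt{2}$ or larger can be met on this input without learning anything about $i$.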
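Our Results Vs. Previous Work • p = 1: previous [IR, …] $r = O(k \log(n/k) / \varepsilon)$; ours $r = O(k \log(n/k) \cdot \log^2(1/\varepsilon) / \varepsilon^{1/2})$ (randomized) and $r = \Omega(k \log(1/\varepsilon) / \varepsilon^{1/2})$ • p = 2: previous [GLPS] $r = O(k \log(n/k) / \varepsilon)$; ours $r = \Omega(k \log(n/k) / \varepsilon)$ • Previous lower bounds: $\Omega(k \log(n/k))$ • Lower bounds are for randomized schemes with constant success probability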
Comparison to Deterministic Schemes • We get an $r = \tilde{O}(k/\varepsilon^{1/2})$ randomized upper bound for p = 1 • We show $\Omega(k \log(n/k) / \varepsilon)$ for p = 1 for deterministic schemes • So randomized is easier than deterministic
Our Sparse-Output Results • Output a vector $y$ from $Ax$ so that $\|x-y\|_p \le (1+\varepsilon)\,\|x-x_{\mathrm{top}\,k}\|_p$ • Sometimes want $y$ to be k-sparse: $r = \tilde{\Omega}(k/\varepsilon^{p})$ • Both results tight up to logarithmic factors • Recall that for non-sparse output $r = \tilde{\Theta}(k/\varepsilon^{p/2})$
Talk Outline • $\tilde{O}(k/\varepsilon^{1/2})$ upper bound for p = 1 • Lower bounds
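Simplifications • Want $\tilde{O}(k/\varepsilon^{1/2})$ for p = 1 • Replace k with 1 • Sample a 1/k fraction of coordinates • Solve the problem for k = 1 on the sample • Repeat $\tilde{O}(k)$ times independently • Combine the solutions found • [Figure: a vector with entries $\varepsilon/k, \varepsilon/k, \ldots, \varepsilon/k, 1/n, 1/n, \ldots, 1/n$ is sampled down to $\varepsilon/k, 1/n, \ldots, 1/n$]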
k = 1 • Assume $\|x-x_{\mathrm{top}}\|_1 = 1$ and $x_{\mathrm{top}} = \varepsilon$ • First attempt • Use CountMin [CM] • Randomly partition coordinates into $B$ buckets, maintain the sum in each bucket • The expected $\ell_1$-mass of “noise” in a bucket is $1/B$ • If $B = \Theta(1/\varepsilon)$, most buckets have count $< \varepsilon/2$, but the bucket that contains $x_{\mathrm{top}}$ has count $> \varepsilon/2$ • Repeat $O(\log n)$ times
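The first attempt can be sketched in a few lines of Python. This is an illustration rather than the talk's implementation. It assumes a non-negative input, simulates the hash functions with fully random bucket assignments, and lets the decoder reuse those assignments. The name `countmin_find_heavy` and the constant in $B = 4/\varepsilon$ are choices made here.

```python
import random


def countmin_find_heavy(x, num_buckets, reps, seed=0):
    """Return the coordinate whose CountMin estimate (min over repetitions) is largest."""
    rng = random.Random(seed)
    n = len(x)
    estimate = [float("inf")] * n
    for _ in range(reps):
        # Random partition of the coordinates into buckets.
        h = [rng.randrange(num_buckets) for _ in range(n)]
        buckets = [0.0] * num_buckets
        for i, v in enumerate(x):
            buckets[h[i]] += v                   # each bucket sum is one linear measurement
        for i in range(n):
            estimate[i] = min(estimate[i], buckets[h[i]])
    return max(range(n), key=lambda i: estimate[i])


eps = 0.1
n = 1000
x = [1.0 / (n - 1)] * n      # tail with l1-mass about 1, spread evenly
x[417] = eps                 # the single heavy coordinate, x_top = eps
found = countmin_find_heavy(x, num_buckets=40, reps=10)   # B = 4/eps = Theta(1/eps)
```

The sketch uses `num_buckets * reps` measurements in total, which is the $\Theta(1/\varepsilon)$-per-repetition cost that the next slides try to reduce.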
Second Attempt • But we wanted $\tilde{O}(1/\varepsilon^{1/2})$ measurements • Error in a bucket is $1/B$, need $B \approx 1/\varepsilon$ • What about CountSketch? [CCF-C] • Give each coordinate $i$ a random $\sigma(i) \in \{-1, 1\}$ • Randomly partition coordinates into $B$ buckets, maintain $\sum_{i:\, h(i) = j} \sigma(i) \cdot x_i$ in the j-th bucket • Bucket error is $\bigl(\sum_{i \ne \mathrm{top}} x_i^2 / B\bigr)^{1/2}$ • Is this better?
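For comparison with CountMin, here is a minimal CountSketch sketch in Python. The signed bucket sums follow the slide. Taking the median over independent repetitions is the usual CountSketch estimator and is an assumption here, since the slide does not spell out the decoding step. Hash and sign functions are simulated with fully random choices, and the name `countsketch_estimates` is illustrative.

```python
import random
import statistics


def countsketch_estimates(x, num_buckets, reps, seed=0):
    """Estimate every coordinate of x as the median over reps of sigma(i) * bucket[h(i)]."""
    rng = random.Random(seed)
    n = len(x)
    per_rep = []
    for _ in range(reps):
        h = [rng.randrange(num_buckets) for _ in range(n)]   # bucket of each coordinate
        sigma = [rng.choice((-1, 1)) for _ in range(n)]      # random sign of each coordinate
        buckets = [0.0] * num_buckets
        for i, v in enumerate(x):
            buckets[h[i]] += sigma[i] * v                    # signed bucket sum
        per_rep.append([sigma[i] * buckets[h[i]] for i in range(n)])
    return [statistics.median(column) for column in zip(*per_rep)]


n = 200
x = [0.01 * (-1) ** i for i in range(n)]   # small tail entries of mixed sign
x[7] = 5.0                                 # one heavy coordinate
est = countsketch_estimates(x, num_buckets=50, reps=11)
heaviest = max(range(n), key=lambda i: abs(est[i]))
```

Because of the random signs, tail entries in a bucket partly cancel, which is why the bucket error depends on the $\ell_2$ mass of the tail rather than its $\ell_1$ mass.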
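CountSketch • Bucket error $\mathrm{Err} = \bigl(\sum_{i \ne \mathrm{top}} x_i^2 / B\bigr)^{1/2}$ • All $|x_i| \le \varepsilon$ and $\|x-x_{\mathrm{top}}\|_1 = 1$ • $\sum_{i \ne \mathrm{top}} x_i^2 \le (1/\varepsilon) \cdot \varepsilon^2 \le \varepsilon$ • So $\mathrm{Err} \le (\varepsilon/B)^{1/2}$, which needs to be at most $\varepsilon$ • Solving, $B \ge 1/\varepsilon$ • CountSketch isn’t better than CountMin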
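The steps on this slide can be written out as one chain. The first inequality below bounds the sum of squares by the largest entry times the $\ell_1$ mass, and it is tight when the tail consists of $1/\varepsilon$ entries of size $\varepsilon$, which is the case the slide uses.

```latex
\begin{align*}
\sum_{i \ne \mathrm{top}} x_i^2
  &\le \Bigl(\max_{i \ne \mathrm{top}} |x_i|\Bigr) \sum_{i \ne \mathrm{top}} |x_i|
   \le \varepsilon \cdot 1 = \varepsilon, \\
\mathrm{Err}
  &= \Bigl(\tfrac{1}{B} \sum_{i \ne \mathrm{top}} x_i^2\Bigr)^{1/2}
   \le (\varepsilon / B)^{1/2}, \\
(\varepsilon / B)^{1/2} \le \varepsilon
  &\iff \varepsilon / B \le \varepsilon^2
   \iff B \ge 1/\varepsilon .
\end{align*}
```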
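Main Idea • We insist on using CountSketch with $B = 1/\varepsilon^{1/2}$ • Suppose $\mathrm{Err} = \bigl(\sum_{i \ne \mathrm{top}} x_i^2 / B\bigr)^{1/2} = \varepsilon$ • This means $\sum_{i \ne \mathrm{top}} x_i^2 = \varepsilon^{3/2}$ • Forget about $x_{\mathrm{top}}$! • Let’s make up the mass another way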
Main Idea • We have: $\sum_{i \ne \mathrm{top}} x_i^2 = \varepsilon^{3/2}$ • Intuition: suppose all $x_i$, $i \ne \mathrm{top}$, are the same or 0 • Then: (# non-zero) · value = 1 and (# non-zero) · value$^2$ = $\varepsilon^{3/2}$ • Hence, value = $\varepsilon^{3/2}$ and # non-zero = $1/\varepsilon^{3/2}$ • Sample an ε-fraction of coordinates uniformly at random! • value = $\varepsilon^{3/2}$ and # non-zero sampled = $1/\varepsilon^{1/2}$, so $\ell_1$-contribution = $\varepsilon$ • Find all non-zeros with $\tilde{O}(1/\varepsilon^{1/2})$ measurements
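The arithmetic behind the intuition is short enough to write out. The symbols $N$ (number of non-zero tail entries) and $v$ (their common value) are introduced here for illustration only.

```latex
\begin{align*}
N v = 1, \quad N v^2 = \varepsilon^{3/2}
  &\implies v = \frac{N v^2}{N v} = \varepsilon^{3/2}, \qquad
            N = \frac{1}{v} = \varepsilon^{-3/2}, \\
\text{expected number of sampled non-zeros}
  &= \varepsilon \cdot N = \varepsilon^{-1/2}, \\
\ell_1\text{-mass of the sampled non-zeros}
  &\approx \varepsilon^{-1/2} \cdot \varepsilon^{3/2} = \varepsilon .
\end{align*}
```

So the sampled tail entries together carry about as much $\ell_1$ mass as $x_{\mathrm{top}}$ itself, and there are only about $1/\varepsilon^{1/2}$ of them to find.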
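General Setting • $\sum_{i \ne \mathrm{top}} x_i^2 = \varepsilon^{3/2}$ • $S_j = \{ i \mid 1/4^{j} < x_i^2 \le 1/4^{j-1} \}$ • $\sum_{i \ne \mathrm{top}} x_i^2 = \varepsilon^{3/2}$ implies there is a $j$ for which $|S_j|/4^{j} = \tilde{\Omega}(\varepsilon^{3/2})$ • [Figure: coordinate values grouped by level: $\varepsilon^{3/4}$; …; $16\varepsilon^{3/2}, \ldots, 16\varepsilon^{3/2}$; $4\varepsilon^{3/2}, \ldots, 4\varepsilon^{3/2}$; $\varepsilon^{3/2}, \ldots, \varepsilon^{3/2}$]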
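The level sets $S_j$ are simple to compute. The Python sketch below groups the tail coordinates by level and picks the level with the largest $|S_j|/4^j$. It assumes $|x_i| \le 1$ for all tail entries, so every level index is at least 1. The function names are illustrative.

```python
def level_sets(x, top):
    """Group coordinates i != top into S_j, where 4**-j < x[i]**2 <= 4**-(j-1)."""
    sets = {}
    for i, v in enumerate(x):
        if i == top or v == 0:
            continue
        sq = v * v
        j = 1
        while sq <= 4.0 ** (-j):   # advance until 4**-j drops strictly below x_i^2
            j += 1
        sets.setdefault(j, []).append(i)
    return sets


def heaviest_level(sets):
    """Return the level j that maximizes |S_j| / 4**j."""
    return max(sets, key=lambda j: len(sets[j]) / 4.0 ** j)


x = [0.9, 0.5, 0.3, 0.25, 0.1]
sets = level_sets(x, top=0)
best_j = heaviest_level(sets)
```

Since the levels partition the tail and each entry of $S_j$ has square between $4^{-j}$ and $4^{-(j-1)}$, the level returned by `heaviest_level` accounts for a large share of the tail's squared mass.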
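General Setting • If $|S_j| < 1/\varepsilon^{1/2}$, then $1/4^{j} > \varepsilon^2$, so $1/2^{j} > \varepsilon$, which can’t happen since all $|x_i| \le \varepsilon$ • Else, sample at rate $1/(|S_j| \varepsilon^{1/2})$ to get $1/\varepsilon^{1/2}$ elements of $S_j$ • The $\ell_1$-mass of $S_j$ in the sample is $> \varepsilon$ • Can we find the sampled elements of $S_j$? Use $\sum_{i \ne \mathrm{top}} x_i^2 = \varepsilon^{3/2}$ • The $\ell_2^2$ of the sample is about $\varepsilon^{3/2} \cdot 1/(|S_j| \varepsilon^{1/2}) = \varepsilon/|S_j|$ • Using CountSketch with $1/\varepsilon^{1/2}$ buckets: bucket error $= \sqrt{\varepsilon^{1/2} \cdot \varepsilon^{3/2} \cdot 1/(|S_j| \varepsilon^{1/2})} = \sqrt{\varepsilon^{3/2}/|S_j|} < 1/2^{j}$, since $|S_j|/4^{j} > \varepsilon^{3/2}$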
Algorithm Wrapup • Sub-sample $O(\log(1/\varepsilon))$ times in powers of 2 • In each level of sub-sampling maintain a CountSketch with $\tilde{O}(1/\varepsilon^{1/2})$ buckets • Find as many heavy coordinates as you can! • Intuition: if CountSketch fails, there are many heavy elements that can be found by sub-sampling • Wouldn’t work for CountMin, as the bucket error could be $\varepsilon$ because of $n-1$ items each of value $\varepsilon/(n-1)$
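The sub-sampling schedule can be sketched as follows. This Python fragment only produces the coordinate subsets for each level; a CountSketch would then be maintained on each subset. Two details are assumptions made here and not stated on the slide: the samples are nested (each level is a subset of the previous one), and the number of levels is $\lceil \log_2(1/\varepsilon) \rceil + 1$.

```python
import math
import random


def subsample_levels(n, eps, seed=0):
    """Return index lists; level l keeps each coordinate with probability 2**-l."""
    rng = random.Random(seed)
    num_levels = math.ceil(math.log2(1.0 / eps)) + 1
    depth = []
    for _ in range(n):
        d = 0
        # Count consecutive fair-coin successes, so that P(depth >= l) = 2**-l.
        while d < num_levels and rng.random() < 0.5:
            d += 1
        depth.append(d)
    return [[i for i in range(n) if depth[i] >= l] for l in range(num_levels)]


levels = subsample_levels(1000, 0.01)
```

Level 0 contains every coordinate, and each later level halves the expected sample size, so some level is close to any target sampling rate between $\varepsilon$ and 1.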
Talk Outline • $\tilde{O}(k/\varepsilon^{1/2})$ upper bound for p = 1 • Lower bounds
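Our Results • General results: • $\tilde{\Omega}(k/\varepsilon^{1/2})$ for p = 1 • $\Omega(k \log(n/k) / \varepsilon)$ for p = 2 • Sparse output: • $\tilde{\Omega}(k/\varepsilon)$ for p = 1 • $\tilde{\Omega}(k/\varepsilon^2)$ for p = 2 • Deterministic: • $\Omega(k \log(n/k) / \varepsilon)$ for p = 1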
Simultaneous Communication Complexity • [Figure: Alice holds $x$ and sends $M_A(x)$; Bob holds $y$ and sends $M_B(y)$; the referee asks: what is $f(x,y)$?] • Alice and Bob each send a single message to the referee, who outputs $f(x,y)$ with constant probability • Communication cost CC(f) is the maximum message length, over the randomness of the protocol and all possible inputs • Parties share randomness
Reduction to Compressed Sensing • Shared randomness decides matrix A • Alice sends Ax to referee • Bob sends Ay to referee • Referee computes A(x+y), uses compressed sensing recovery algorithm • If output of algorithm solves f(x,y), then # rows of A * # bits per measurement > CC(f)
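The reduction relies only on linearity of the sketch: the referee can add the two messages to obtain $A(x+y)$ without seeing $x$ or $y$. A minimal Python sketch of that step follows, assuming a random sign matrix drawn from a shared seed. The matrix distribution and the names `matvec` and `referee_sketch` are illustrative choices, not the talk's construction.

```python
import random


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def referee_sketch(shared_seed, num_rows, x, y):
    """Return (Ax + Ay, A(x + y)) for a matrix A derived from the shared randomness."""
    rng = random.Random(shared_seed)             # both parties use the same seed
    n = len(x)
    A = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(num_rows)]
    msg_alice = matvec(A, x)                     # Alice's message to the referee
    msg_bob = matvec(A, y)                       # Bob's message to the referee
    combined = [a + b for a, b in zip(msg_alice, msg_bob)]
    direct = matvec(A, [a + b for a, b in zip(x, y)])
    return combined, direct


combined, direct = referee_sketch(42, 4, [3, 0, -1, 2, 0, 5], [0, 1, 1, -2, 4, 0])
```

Integer inputs are used so that the two sides agree exactly. Each message consists of `num_rows` numbers, which is why the number of rows times the bits per measurement must exceed CC(f).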
A Unified View • General results: Direct-Sum Gap-$\ell_1$ • $\tilde{\Omega}(k/\varepsilon^{1/2})$ for p = 1 • $\tilde{\Omega}(k/\varepsilon)$ for p = 2 • Sparse output: Indexing • $\tilde{\Omega}(k/\varepsilon)$ for p = 1 • $\tilde{\Omega}(k/\varepsilon^2)$ for p = 2 • Deterministic: Equality • $\Omega(k \log(n/k) / \varepsilon)$ for p = 1 • Tighter log factors achievable by looking at Gaussian channels
General Results: k = 1, p = 1 • Alice and Bob have $x$, $y$, respectively, in $\mathbb{R}^m$ • There is a unique $i^*$ for which $(x+y)_{i^*} = d$ • For all $j \ne i^*$, $(x+y)_j \in \{0, c, -c\}$, where $|c| < |d|$ • Finding $i^*$ requires $\Omega(m/(d/c)^2)$ communication [SS, BJKS] • $m = 1/\varepsilon^{3/2}$, $c = \varepsilon^{3/2}$, $d = \varepsilon$ • Need $\Omega(1/\varepsilon^{1/2})$ communication
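Substituting the slide's parameters into the $\Omega(m/(d/c)^2)$ bound gives the stated communication lower bound:

```latex
\[
\frac{m}{(d/c)^2}
  = \frac{\varepsilon^{-3/2}}{\bigl(\varepsilon / \varepsilon^{3/2}\bigr)^{2}}
  = \frac{\varepsilon^{-3/2}}{\bigl(\varepsilon^{-1/2}\bigr)^{2}}
  = \frac{\varepsilon^{-3/2}}{\varepsilon^{-1}}
  = \varepsilon^{-1/2}.
\]
```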
General Results: k = 1, p = 1 • But the compressed sensing algorithm doesn’t need to find $i^*$ • If it does not, then it needs to transmit a lot of information about the tail • Tail is a random low-weight vector in $\{0, \varepsilon^{3/2}, -\varepsilon^{3/2}\}^{1/\varepsilon^{3}}$ • Uses a distributional lower bound and RS codes • Send a vector $y$ within $1-\varepsilon$ of the tail in $\ell_1$-norm • Needs $1/\varepsilon^{1/2}$ communication
General Results: k = 1, p = 2 • Same argument, different parameters • $\Omega(1/\varepsilon)$ communication • What about general k?
Handling General k • Bounded Round Direct Sum Theorem [BR] (with slight modification): given k copies of a function $f$, with input pairs independently drawn from $\mu$, solving a 2/3 fraction needs communication $\Omega(k \cdot \mathrm{CC}_\mu(f))$ • [Figure: instance for p = 1, made of k blocks, each containing an entry $\varepsilon^{1/2}$ alongside entries $\varepsilon^{3/2}, \ldots, \varepsilon^{3/2}$]
Handling General k • $\mathrm{CC} = \Omega(k/\varepsilon^{1/2})$ for p = 1 • $\mathrm{CC} = \Omega(k/\varepsilon)$ for p = 2 • What is implied about compressed sensing?
Rounding Matrices [DIPW] • A is a matrix of real numbers • Can assume orthonormal rows • Round the entries of A to O(log n) bits, obtaining matrix A’ • Careful • A’x = A(x+s) for “small” s • But s depends on A, no guarantee recovery works • Can be fixed by looking at A(x+s+u) for random u
Lower Bounds for Compressed Sensing • # rows of A * # bits per measurement > CC(f) • By rounding, # bits per measurement = $O(\log n)$ • In our hard instances, universe size = $\mathrm{poly}(k/\varepsilon)$ • So # rows of A * $O(\log(k/\varepsilon))$ > CC(f) • # rows of A = $\tilde{\Omega}(k/\varepsilon^{1/2})$ for p = 1 • # rows of A = $\tilde{\Omega}(k/\varepsilon)$ for p = 2
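Writing $r$ for the number of rows of $A$, the last two bullets follow by dividing the communication bounds by the bits per measurement. The logarithmic factor lost in this step is what the tilde hides.

```latex
\begin{align*}
r \cdot O(\log(k/\varepsilon)) > \mathrm{CC}(f) = \Omega(k/\varepsilon^{1/2})
  &\implies r = \Omega\!\left(\frac{k}{\varepsilon^{1/2} \log(k/\varepsilon)}\right)
            = \tilde{\Omega}(k/\varepsilon^{1/2}) && (p = 1), \\
r \cdot O(\log(k/\varepsilon)) > \mathrm{CC}(f) = \Omega(k/\varepsilon)
  &\implies r = \Omega\!\left(\frac{k}{\varepsilon \log(k/\varepsilon)}\right)
            = \tilde{\Omega}(k/\varepsilon) && (p = 2).
\end{align*}
```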
Sparse-Output Results • Sparse output: Indexing • $\tilde{\Omega}(k/\varepsilon)$ for p = 1 • $\tilde{\Omega}(k/\varepsilon^2)$ for p = 2
Sparse Output Results - Indexing • One party holds $x \in \{0,1\}^n$, the other holds $i \in \{1, 2, \ldots, n\}$ • What is $x_i$? • $\mathrm{CC}(\text{Indexing}) = \Omega(n)$
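$\Omega(1/\varepsilon)$ Bound for k = 1, p = 1 • $x \in \{-\varepsilon, \varepsilon\}^{1/\varepsilon}$, $y = e_i$ • Consider $x+y$ • If the output is required to be 1-sparse, it must place mass on the i-th coordinate • Mass must be $1+\varepsilon$ if $x_i = \varepsilon$, otherwise $1-\varepsilon$ • Generalizes to k > 1 to give $\tilde{\Omega}(k/\varepsilon)$ • Generalizes to p = 2 to give $\tilde{\Omega}(k/\varepsilon^2)$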
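The way a 1-sparse output reveals $x_i$ can be illustrated with a short Python sketch. It uses the exact largest-magnitude entry of $x + e_i$ as a stand-in for the output of a 1-sparse recovery algorithm, which is a simplification made here; an actual algorithm would only be approximately correct. The function name is hypothetical.

```python
def decode_bit_from_sparse_output(x, i):
    """Return (support index, whether x[i] is +eps) read off a 1-sparse view of x + e_i."""
    z = list(x)
    z[i] += 1.0                                  # z = x + y with y = e_i
    # Stand-in for the recovery output: the single largest-magnitude entry of z.
    j = max(range(len(z)), key=lambda t: abs(z[t]))
    mass = z[j]
    return j, mass > 1.0                         # mass 1 + eps means x_i = +eps


eps = 0.1
x = [eps, -eps, -eps, eps, eps, -eps, eps, -eps, -eps, eps]   # length 1/eps
decoded = [decode_bit_from_sparse_output(x, i) for i in range(len(x))]
```

Since every index $i$ can be decoded this way, the sketch of $x$ must carry enough information to solve Indexing on $1/\varepsilon$ bits.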
Deterministic Results • Deterministic: Equality • $\Omega(k \log(n/k) / \varepsilon)$ for p = 1
Deterministic Results - Equality • One party holds $x \in \{0,1\}^n$, the other holds $y \in \{0,1\}^n$ • Is $x = y$? • Deterministic $\mathrm{CC}(\text{Equality}) = \Omega(n)$
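$\Omega(k \log(n/k) / \varepsilon)$ for p = 1 • Choose $\log n$ signals $x^1, \ldots, x^{\log n}$, each with $k/\varepsilon$ values equal to $\varepsilon/k$ • $x = \sum_{i=1}^{\log n} 10^i x^i$ • Choose $\log n$ signals $y^1, \ldots, y^{\log n}$, each with $k/\varepsilon$ values equal to $\varepsilon/k$ • $y = \sum_{i=1}^{\log n} 10^i y^i$ • Consider $x-y$ • Compressed sensing output is $0^n$ iff $x = y$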
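General Results - Gaussian Channels (k = 1, p = 2) • Alice has a signal $x = \varepsilon^{1/2} e_i$ for random $i \in [n]$ • Alice transmits $x$ over a noisy channel with independent $N(0, 1/n)$ noise on each coordinate • Consider any row vector $a$ of $A$ • Channel output = $\langle a,x \rangle + \langle a,y \rangle$, where $\langle a,y \rangle$ is $N(0, \|a\|_2^2/n)$ • $\mathbb{E}_i[\langle a,x \rangle^2] = \varepsilon \|a\|_2^2/n$ • Shannon-Hartley Theorem: $I(i;\, \langle a,x \rangle + \langle a,y \rangle) = I(\langle a,x \rangle;\, \langle a,x \rangle + \langle a,y \rangle) \le \tfrac{1}{2} \log(1+\varepsilon) = O(\varepsilon)$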
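The $O(\varepsilon)$ bound comes from the signal-to-noise ratio of a single measurement. The abbreviation SNR is introduced here for illustration; the last step uses $\log(1+t) \le t$ for the natural logarithm, and a different base only changes the constant.

```latex
\begin{align*}
\mathrm{SNR}
  &= \frac{\mathbb{E}_i[\langle a, x \rangle^2]}{\operatorname{Var}(\langle a, y \rangle)}
   = \frac{\varepsilon \, \|a\|_2^2 / n}{\|a\|_2^2 / n}
   = \varepsilon, \\
I(\langle a, x \rangle;\, \langle a, x \rangle + \langle a, y \rangle)
  &\le \tfrac{1}{2} \log(1 + \mathrm{SNR})
   = \tfrac{1}{2} \log(1 + \varepsilon)
   \le \tfrac{\varepsilon}{2}
   = O(\varepsilon).
\end{align*}
```

The norm of the row $a$ cancels, so rescaling a measurement does not help: every row reveals only $O(\varepsilon)$ bits about $i$.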
Summary of Results • General results • $\tilde{\Theta}(k/\varepsilon^{p/2})$ • Sparse output • $\tilde{\Theta}(k/\varepsilon^{p})$ • Deterministic • $\Theta(k \log(n/k) / \varepsilon)$ for p = 1