Tight Lower Bounds for the Distinct Elements Problem David Woodruff MIT dpwood@mit.edu Joint work with Piotr Indyk
The Problem
[Figure: example stream 4 3 7 3 1 1 0 …]
• Stream of elements a_1, …, a_n, each in {1, …, m}
• Want F0 = # of distinct elements
• Elements in adversarial order
• Algorithms given one pass over stream
• Goal: Minimum-space algorithm
A Trivial Algorithm
[Figure: stream 4 3 7 3 1 1 0 …; characteristic vector goes from 00000000 to 10011011]
• Keep m-bit characteristic vector v of stream
• j in stream ⇔ v_j = 1
• F0 = wt(10011011) = 5
• Space = m bits
Can we do better?
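The trivial algorithm can be sketched in a few lines of Python. This is only an illustration: the function name `distinct_elements_exact` is hypothetical, and the universe is taken to be {0, …, m-1} (an assumption, chosen so that the example stream containing 0 fits).

```python
def distinct_elements_exact(stream, m):
    """Count distinct elements of a stream over {0, ..., m-1} using m bits."""
    v = [0] * m          # m-bit characteristic vector, initially all zeros
    for j in stream:     # single pass over the stream
        v[j] = 1         # j appears in the stream  <=>  v[j] = 1
    return sum(v)        # F0 = Hamming weight of v


# The example stream, over an assumed universe of size m = 8.
print(distinct_elements_exact([4, 3, 7, 3, 1, 1, 0], m=8))  # prints 5
```

The space cost is the length of `v`, which is why the question is whether fewer than m bits can suffice.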
Negative Results
• Any algorithm computing F0 exactly must use Ω(m) space [AMS96]
• Any deterministic alg. that outputs x with |F0 - x| < εF0 must use Ω(m) space [AMS96]
• What about randomized approximation algorithms?
Rand. Approx. Algorithms for F0
• O(log log m / ε² + log m · log(1/ε))-space alg. outputs x with Pr[|F0 - x| < εF0] > ¾ [BJKST02]
• Lots of hashing tricks
Is this optimal?
• Previous lower bounds
• Ω(log m) [AMS96]
• Ω(1/ε) [Bar-Yossef]
• Open Problem of [BJKST02]: GAP: 1/ε << 1/ε²
Idea Behind Lower Bounds
[Figure: Alice holds x ∈ {0,1}^m and runs the (1 ± ε) F0 algorithm A on stream s(x); the internal state S of A is passed to Bob, who holds y ∈ {0,1}^m and continues running A on stream s(y)]
• Compute (1 ± ε) F0(s(x) ∘ s(y)) w.p. > ¾
• Idea: If can decide f(x,y) w.p. > ¾, space used by A is at least f's rand. 1-way comm. complexity
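The way a one-pass algorithm yields a one-way protocol can be sketched as follows. Everything here is illustrative: `BitVectorF0` is a stand-in exact counter (not an approximation algorithm from the talk), and `alice_message` and `bob_output` are hypothetical names. The point is only that Alice's single message is the algorithm's internal state, so the message length is bounded by the space used.

```python
class BitVectorF0:
    """Stand-in one-pass F0 algorithm: exact counting with an m-entry bit vector."""

    def __init__(self, m, state=None):
        # Resume from a saved state if one is given, else start empty.
        self.bits = list(state) if state is not None else [0] * m

    def process(self, stream):
        for j in stream:
            self.bits[j] = 1

    def estimate(self):
        return sum(self.bits)


def alice_message(x_stream, m):
    """Alice runs the algorithm on s(x) and sends its internal state S."""
    alg = BitVectorF0(m)
    alg.process(x_stream)
    return bytes(alg.bits)  # one byte per bit here, purely for simplicity


def bob_output(message, y_stream):
    """Bob restores the state, continues on s(y), and reads off F0(s(x) ∘ s(y))."""
    alg = BitVectorF0(len(message), state=message)
    alg.process(y_stream)
    return alg.estimate()


msg = alice_message([4, 3, 7], m=8)
print(bob_output(msg, [3, 1, 1, 0]))  # prints 5
```

Any lower bound on the one-way communication needed to decide f therefore transfers to the size of the state S.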
Randomized 1-way comm. complexity
• Boolean function f: X × Y → {0,1}
• Alice has x ∈ X, Bob y ∈ Y. Bob wants f(x,y)
• Only 1 message sent: must be from Alice to Bob
• Comm. cost of protocol = expected length of longest message sent over all inputs
• δ-error randomized 1-way comm. complexity of f, R_δ(f), is comm. cost of optimal protocol computing f w.p. ≥ 1 - δ
• How do we lower bound R_δ(f)?
The VC Dimension [KNR]
• F = {f : X → {0,1}} family of Boolean functions
• f ∈ F is length-|X| "bit string"
• For S ⊆ X, shatter coefficient SC(f_S) of S is |{f|_S : f ∈ F}| = # distinct bit strings when F restricted to S
• SC(F, p) = max over S ⊆ X with |S| = p of SC(f_S)
• If SC(f_S) = 2^|S|, S shattered by F
• VC Dimension of F, VCD(F), = size of largest S shattered by F
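For a small finite family these definitions can be evaluated by brute force. The sketch below assumes X = {0, …, n-1} and represents each f ∈ F as a tuple of n bits; the helper names and the threshold-function example are illustrative choices, not part of the talk.

```python
from itertools import combinations


def shatter_coefficient(family, S):
    """Number of distinct bit strings when the family is restricted to S."""
    return len({tuple(f[i] for i in S) for f in family})


def sc(family, n, p):
    """SC(F, p): largest shatter coefficient over all size-p subsets of X."""
    return max(shatter_coefficient(family, S) for S in combinations(range(n), p))


def vc_dimension(family, n):
    """Size of the largest S whose shatter coefficient is 2^|S| (0 if none)."""
    best = 0
    for p in range(1, n + 1):
        if sc(family, n, p) == 2 ** p:
            best = p
    return best


# Threshold functions on 4 points: f_k(i) = 1 iff i >= k, for k = 0..4.
n = 4
thresholds = [tuple(1 if i >= k else 0 for i in range(n)) for k in range(n + 1)]
print(sc(thresholds, n, 2), vc_dimension(thresholds, n))  # prints 3 1
```

No pair of points is shattered by thresholds because the pattern (1, 0) never occurs, so the VC dimension is 1 even though the family has 5 members.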
Shatter Coefficient Theorem
• Notation: For f: X × Y → {0,1}, define f_X = { f_x : Y → {0,1} | x ∈ X }, where f_x(y) = f(x,y)
• Theorem [BJKS]: For every f: X × Y → {0,1} and every p ≥ VCD(f_X), R_{1/4}(f) = Ω(log(SC(f_X, p)))
The Ω(1/ε) Lower Bound [Bar-Yossef]
• Alice has x ∈_R {0,1}^m, wt(x) = m/2
• Bob has y ∈_R {0,1}^m, wt(y) = εm and:
• Either wt(x ∧ y) = 0 (f(x,y) = 0) OR wt(x ∧ y) = εm (f(x,y) = 1)
• R_{1/4}(f) = Ω(VCD(f_X)) = Ω(1/ε) [Bar-Yossef]
• s(x), s(y) any streams w/ char. vectors x, y
• f(x,y) = 1 → F0(s(x) ∘ s(y)) = m/2
• f(x,y) = 0 → F0(s(x) ∘ s(y)) = m/2 + εm
• (1 + ε')m/2 < (1 - ε')(m/2 + εm) for ε' = Θ(ε)
• Hence, can decide f → F0 alg. uses Ω(1/ε) space
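The two F0 values and the separation inequality can be checked numerically. This is a sketch with hypothetical helper names; the concrete values m = 16, ε = 1/4 and ε' = ε/2 are assumptions picked for the example (rearranging the inequality shows any ε' < ε/(1+ε) works).

```python
def f0_of_concatenation(x, y):
    """F0(s(x) ∘ s(y)) for streams with characteristic vectors x and y."""
    return sum(1 for xi, yi in zip(x, y) if xi or yi)


def separated(m, eps, eps_prime):
    """Check (1 + eps') * m/2 < (1 - eps') * (m/2 + eps*m)."""
    return (1 + eps_prime) * m / 2 < (1 - eps_prime) * (m / 2 + eps * m)


m, eps = 16, 0.25
k = int(eps * m)                        # wt(y) = eps * m = 4
x = [1] * (m // 2) + [0] * (m // 2)     # wt(x) = m/2
y_inside = [1] * k + [0] * (m - k)      # wt(x AND y) = eps*m, so f(x,y) = 1
y_disjoint = [0] * (m - k) + [1] * k    # wt(x AND y) = 0,     so f(x,y) = 0

print(f0_of_concatenation(x, y_inside))    # prints 8  (= m/2)
print(f0_of_concatenation(x, y_disjoint))  # prints 12 (= m/2 + eps*m)
print(separated(m, eps, eps / 2))          # prints True
```

With ε' as large as ε itself the check fails for these values, which is why ε' must be a sufficiently small constant multiple of ε.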
Our Results
• Remainder of talk: Ω(1/ε²) lower bound for ε = Ω(m^(-1/(9+k))) for any k > 0
• → O(log log m / ε² + log m · log(1/ε)) upper bound almost optimal
• IDEA: Reduce from protocol for computing dot product
The Promise Problem
• t = Θ(1/ε²), Y = basis of unit vectors of R^t
• Alice: x ∈ [0,1]^t, ||x|| = 1. Bob: y ∈ Y
• Promise Problem Π: ⟨x,y⟩ = 0 (f(x,y) = 0) OR ⟨x,y⟩ = 2/t^(1/2) (f(x,y) = 1)
• X = {x ∈ [0,1]^t : ||x|| = 1 and ∃ y ∈ Y s.t. (x,y) ∈ Π}
• We lower bound R_{1/4}(f) via SC(f_X, t)
Bounding SC(f_X, t)
• Theorem: SC(f_X, t/4) = 2^Ω(t)
• Proof:
• ∀ T ⊂ Y s.t. |T| = t/4, put x_T = (2/t^(1/2)) · Σ_{e ∈ T} e
• Define X_1 ⊂ X as X_1 = {x_T | T ⊂ Y, |T| = t/4}
• Claim: ∀ s ∈ {0,1}^t w/ wt(s) = t/4, s ∈ truth table of f_{X_1}
• Proof:
• Let s ∈ {0,1}^t with 1s in positions i_1, …, i_{t/4}
• Put T = {e_{i_1}, …, e_{i_{t/4}}}. ∀ e ∈ T, ⟨e, x_T⟩ = 2/t^(1/2) = 2ε
• ∀ e ∈ Y - T, ⟨e, x_T⟩ = 0
• There are 2^Ω(t) such s
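The construction of the vectors x_T can be checked by enumeration for a small t. This is an illustrative sketch: t = 8 is an assumed toy value, basis vectors are identified with their indices 0, …, t-1, and `x_T` and `truth_table_row` are hypothetical helper names.

```python
from itertools import combinations
from math import comb, isclose, sqrt


def x_T(T, t):
    """x_T = (2/sqrt(t)) * (sum of the basis vectors e_i with i in T)."""
    return [2 / sqrt(t) if i in T else 0.0 for i in range(t)]


def truth_table_row(x, t):
    """Row of f_x over the basis: <x, e_i> = x[i], and f = 1 iff it equals 2/sqrt(t)."""
    return tuple(1 if isclose(x[i], 2 / sqrt(t)) else 0 for i in range(t))


t = 8
rows = set()
for T in combinations(range(t), t // 4):
    x = x_T(set(T), t)
    assert isclose(sum(c * c for c in x), 1.0)  # each x_T has unit norm
    rows.add(truth_table_row(x, t))

print(len(rows), comb(t, t // 4))  # prints 28 28
```

Every weight-t/4 string appears as a row, so the number of distinct rows is the binomial coefficient C(t, t/4), which grows as 2^Ω(t).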
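Bounding R_{1/4}(f)
• Corollary: R_{1/4}(f) = Ω(t) = Ω(1/ε²), by the Shatter Coefficient Theorem and the bound SC(f_X, t/4) = 2^Ω(t)
• Reduction: we need protocol computing f with communication = space used by any (1 ± ε) F0 approx. alg.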
Reduction
• Recall:
• ⟨x,y⟩ = 0 if f(x,y) = 0
• ⟨x,y⟩ = 2/t^(1/2) if f(x,y) = 1
• Goal: Reduce "separation" of ⟨x,y⟩ to separation of F0(s(x) ∘ s(y)) for streams s(x), s(y) Alice/Bob can derive from x, y
• Use relation: ||y - x||² = ||y||² + ||x||² - 2⟨x,y⟩
• f(x,y) = 0 → ||y - x|| = 2^(1/2)
• f(x,y) = 1 → ||y - x|| < 2^(1/2) (1 - 1/t^(1/2)) = 2^(1/2) (1 - Θ(ε))
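The two distance cases can be verified on a concrete instance. The values below are assumptions chosen for the example (t = 16, T the first four basis vectors); `dist` is a hypothetical helper.

```python
from math import sqrt


def dist(x, y):
    """Euclidean distance ||y - x||."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


t = 16
T = {0, 1, 2, 3}                                      # |T| = t/4
x = [2 / sqrt(t) if i in T else 0.0 for i in range(t)]  # unit vector x_T
e_in = [1.0 if i == 0 else 0.0 for i in range(t)]     # basis vector in T:     <x,e> = 2/sqrt(t)
e_out = [1.0 if i == 5 else 0.0 for i in range(t)]    # basis vector not in T: <x,e> = 0

print(dist(x, e_out))                  # f = 0 case: sqrt(2)
print(dist(x, e_in))                   # f = 1 case: sqrt(2 - 4/sqrt(t)) = 1.0 here
print(sqrt(2) * (1 - 1 / sqrt(t)))     # upper bound for the f = 1 case
```

The f = 1 bound follows from ||y - x||² = 2 - 4/t^(1/2) together with (1 - u)^(1/2) < 1 - u/2 for 0 < u ≤ 1, applied with u = 2/t^(1/2).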
Overview of Reduction
[Figure: pipeline from Alice's x ∈ [0,1]^t, ||x|| = 1, and Bob's y ∈ E to the streams s(x'), s(y')]
1. Low-distortion embedding φ: l_2^t → l_1^poly(t), giving φ(x), φ(y)
2. Rational approximation
3. Scale rationals to integers
4. Convert integer coords to unary to get {0,1} vectors x', y'
• F0 alg. runs on s(x'); its state is passed on and the F0 alg. continues on s(y')
• F0(s(x') ∘ s(y')) can decide f(x,y) w.p. ≥ 3/4
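Embedding l_2^t into l_1^poly(t)
• A (1+γ)-distortion embedding φ: l_2^t → l_1^d is a mapping s.t. ∀ p, q ∈ l_2^t, ||p - q||_2 ≤ ||φ(p) - φ(q)||_1 ≤ (1+γ) ||p - q||_2
• Theorem [FLM77]: ∀ γ ∃ a (1+γ)-distortion embedding φ: l_2^t → l_1^d with d = O(t · log(1/γ) / γ²)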
Embedding l_2^t into l_1^d
[Figure: x ∈ [0,1]^t, ||x|| = 1, and y ∈ E pass through the low-distortion embedding φ: l_2^t → l_1^d to give φ(x), φ(y)]
• Using Theorem [FLM77], Alice/Bob get φ(x), φ(y) ∈ R^d with d = O(t · log(1/γ) / γ²)
• γ specified later
Rational Approximation
• z = z(t): N → N; assume z ≥ d
• Approximate each coord. of output of embedding by integer multiple of 1/z
Scaling
• Alice (resp. Bob) multiplies each coord. of φ'(x) (resp. φ'(y)), the rational approximation of φ(x) (resp. φ(y)), by z
• Obtains s(φ'(x)) (resp. s(φ'(y)))
• Claim: coords. are integers in range [-2z, 2z]
• Proof:
• |φ'(·)| ≤ |φ(·)| + d/z ≤ 2
• |s(φ'(·))| = z |φ'(·)|
Converting to Unary
• For i = 1 to d
• j ← s(φ'(x))_i
• Replace s(φ'(x))_i with 1^(2z+j) 0^(2z-j)
• Bob does same for s(φ'(y))
• x', y' denote new length-4dz bitstrings
• wt(x') = |s(φ'(x))|, wt(y') = |s(φ'(y))|
• Δ(x',y') = |s(φ'(x)) - s(φ'(y))|
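The rounding, scaling and unary steps can be sketched together, assuming the input vectors already have coordinates in [-2, 2]. The function names are hypothetical, the embedding itself is not implemented (arbitrary vectors stand in for its output), and rounding and scaling are merged into a single `round(c * z)` for brevity.

```python
def to_unary_bits(v, z):
    """Round each coord of v to a multiple of 1/z, scale by z, then unary-encode.

    Assumes every coordinate of v lies in [-2, 2], so each scaled integer j
    lies in [-2z, 2z] and is encoded as (2z + j) ones followed by (2z - j) zeros.
    """
    ints, bits = [], []
    for c in v:
        j = round(c * z)                  # j / z is the rational approximation of c
        ints.append(j)
        bits.extend([1] * (2 * z + j) + [0] * (2 * z - j))
    return ints, bits


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(p != q for p, q in zip(a, b))


z = 4
ints_u, bits_u = to_unary_bits([0.5, -1.25, 2.0], z)   # scaled ints: [2, -5, 8]
ints_v, bits_v = to_unary_bits([0.0, 1.0, -2.0], z)    # scaled ints: [0, 4, -8]

l1 = sum(abs(a - b) for a, b in zip(ints_u, ints_v))
print(len(bits_u), hamming(bits_u, bits_v), l1)  # prints 48 27 27
```

Each coordinate occupies 4z bits, giving length 4dz overall, and two unary blocks with 2z + j and 2z + k leading ones differ in exactly |j - k| positions, so Hamming distance equals the l1 distance of the scaled integer vectors.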
Reducing Δ(x',y') to F0
• Alice (Bob) chooses stream a_{x'} (a_{y'}) with char. vector x' (y')
• Lemma: If β_1 < wt(x'), wt(y') < β_2, then β_1 + Δ(x',y')/2 < F0(a_{x'} ∘ a_{y'}) < β_2 + Δ(x',y')/2
• Follows from fact: F0(a_{x'} ∘ a_{y'}) = wt(x' ∨ y')
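The lemma rests on the identity wt(x' ∨ y') = (wt(x') + wt(y') + Δ(x',y'))/2, which can be spot-checked on random bit strings. This is a sketch with hypothetical helper names; the string length and trial count are arbitrary choices.

```python
import random


def wt(b):
    """Hamming weight of a bit list."""
    return sum(b)


def hamming(a, b):
    """Hamming distance between two equal-length bit lists."""
    return sum(p != q for p, q in zip(a, b))


def f0_concat(x, y):
    """F0 of a stream with char. vector x followed by one with char. vector y."""
    seen = {i for i, bit in enumerate(x) if bit}
    seen |= {i for i, bit in enumerate(y) if bit}
    return len(seen)


def check_identity(trials=1000, length=32, seed=0):
    """Return True if 2 * F0 == wt(x) + wt(y) + Hamming(x, y) on every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(length)]
        y = [rng.randint(0, 1) for _ in range(length)]
        if 2 * f0_concat(x, y) != wt(x) + wt(y) + hamming(x, y):
            return False
    return True


print(check_identity())  # prints True
```

Bounding both weights strictly between the same lower and upper limits then bounds F0 between those limits plus Δ(x',y')/2, which is the statement of the lemma.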
Reducing Δ(x',y') to F0
• Use lemma to bound F0(a_{x'} ∘ a_{y'}) in the two cases f(x,y) = 0 and f(x,y) = 1
• Set γ = Θ(ε), z = Θ(1/ε⁵ · log 1/ε) so that the two cases are distinguished by a (1 ± Θ(ε)) F0 alg.
Conclusions
• a_{x'}, a_{y'} must be in universe of size ≥ 4zd = Θ(log(1/ε)/ε⁹)
• Reduction only valid if 4zd ≤ m
• Ω(1/ε²) bound for ε = Ω(m^(-1/(9+k))) ∀ k > 0
• Recently lower bound improved to:
• Ω(1/ε²) for ε ≥ m^(-1/2), which is optimal
• Find set of vectors directly in Hamming space via involved prob. method argument