Efficient Algorithms via Precision Sampling. Robert Krauthgamer (Weizmann Institute); joint work with Alexandr Andoni (Microsoft Research) and Krzysztof Onak (CMU). Goal: compute the fraction of Dacians in the empire, i.e., estimate S = a_1 + a_2 + … + a_n where a_i ∈ [0,1].
Efficient Algorithms via Precision Sampling
Robert Krauthgamer (Weizmann Institute)
Joint work with: Alexandr Andoni (Microsoft Research), Krzysztof Onak (CMU)
Goal
• Compute the fraction of Dacians in the empire
• Estimate S = a_1 + a_2 + … + a_n where a_i ∈ [0,1]
Sampling
• Send accountants to a subset J of provinces, |J| = m
• Estimator: S̃ = (n/m) · ∑_{j∈J} a_j
• Chebyshev bound: with 90% success probability, 0.5·S - O(n/m) < S̃ < 2·S + O(n/m)
• For constant additive error, need m ~ n
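The subsampling estimator above can be sketched in a few lines of Python. This is only an illustration: the function name `subsample_estimate` and the choice to sample J without replacement are assumptions, not something the slides specify.

```python
import random

def subsample_estimate(a, m, rng):
    """Estimate S = sum(a) from m provinces chosen at random.

    Hypothetical helper: J is drawn uniformly without replacement,
    and the partial sum is rescaled by n/m.
    """
    n = len(a)
    J = rng.sample(range(n), m)          # the visited provinces
    return (n / m) * sum(a[j] for j in J)

if __name__ == "__main__":
    rng = random.Random(0)
    # Sparse input: only three provinces have a_i = 1.
    a = [1.0] * 3 + [0.0] * 997
    # With m much smaller than n the sample usually misses all three,
    # which is why constant additive error needs m ~ n.
    print(subsample_estimate(a, 10, rng))
    print(subsample_estimate(a, len(a), rng))
```

When m = n every province is visited and the estimate equals S exactly; for small m on a sparse input, the estimate is typically either 0 or a large multiple of n/m.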
Precision Sampling Framework
• Send accountants to each province, but require only approximate counts
• Estimate a_i up to a pre-selected precision u_i, i.e., |a_i - ã_i| < u_i
• Challenge: achieve a good tradeoff between
  • quality of approximation to S
  • total cost of computing each ã_i (within precision u_i)
Formalization
• What is our cost model?
  • Here, average cost = (1/n) · ∑_i 1/u_i
  • Achieving precision u_i requires 1/u_i “resources”: e.g., if a_i is itself a sum a_i = ∑_j a_ij computed by subsampling, then one needs Θ(1/u_i) samples
• For example, can choose all u_i = 1/n
  • Average cost ≈ n
  • This is best possible, if estimator S̃ = ∑_i ã_i
• The game:
  Estimator (Alg)                        Adversary
  1. fix precisions u_i                  1. fix (hidden) a_1, a_2, …, a_n
                                         2. fix ã_1, ã_2, …, ã_n s.t. |a_i - ã_i| < u_i
  3. report S̃ s.t. |∑_i a_i - S̃| < 1
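As a small worked instance of this cost model, the following Python sketch computes the average cost for the uniform choice u_i = 1/n and checks that the plain-sum estimator then has additive error below 1. The helper names and the particular adversarial perturbation (shifting every ã_i upward by 0.99·u_i) are assumptions made for illustration.

```python
def average_cost(u):
    """Average cost (1/n) * sum_i 1/u_i for a list of precisions u."""
    return sum(1.0 / u_i for u_i in u) / len(u)

def plain_sum_estimate(a_tilde):
    """The naive estimator: add up the approximate values."""
    return sum(a_tilde)

if __name__ == "__main__":
    n = 100
    a = [0.3] * n                         # hidden values (illustrative)
    u = [1.0 / n] * n                     # all precisions equal to 1/n
    # Hypothetical adversary: push every value up by almost u_i.
    a_tilde = [a_i + 0.99 * u_i for a_i, u_i in zip(a, u)]
    print(average_cost(u))                          # about n
    print(abs(plain_sum_estimate(a_tilde) - sum(a)))  # stays below 1
```

The errors add up in the worst case, so the plain sum needs every u_i to be about 1/n, which is what makes the average cost about n.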
Precision Sampling Lemma
• Goal: estimate ∑ a_i from {ã_i} satisfying |a_i - ã_i| < u_i.
• Precision Sampling Lemma: can get, with 90% success:
  • O(1) additive error and 1.5 multiplicative error: S - O(1) < S̃ < 1.5·S + O(1)
  • with average cost O(log n)
• More generally: ε additive error and 1+ε multiplicative error: S - ε < S̃ < (1+ε)·S + ε, with average cost O(ε^-3 log n)
• Example: distinguish ∑ a_i = 3 vs. ∑ a_i = 1
  • Consider two extreme cases:
    • if three a_i = 1: estimate all a_i with a crude approximation (u_i = 0.1)
    • if all a_i = 3/n: estimate a few with a good approximation u_i = 1/n, the rest with u_i = 1
Precision Sampling Algorithm
• Precision Sampling Lemma: can get, with 90% success:
  • O(1) additive error and 1.5 multiplicative error: S - O(1) < S̃ < 1.5·S + O(1)
  • with average cost equal to O(log n)
• Algorithm:
  • Choose each u_i ∈ [0,1] i.i.d.
  • Estimator: S̃ = count the number of i’s s.t. ã_i/u_i > 6 (and normalize)
• Outline of analysis:
  • E[S̃] = ∑_i Pr[ã_i/u_i > 6] = ∑_i Pr[a_i > (6±1)·u_i] ≈ ∑_i a_i/6
  • Actually, ã_i may also have 1.5-multiplicative error w.r.t. a_i
  • E[1/u_i] = O(log n) w.h.p. (after truncation)
• For the general version (ε additive and 1+ε multiplicative error, S - ε < S̃ < (1+ε)·S + ε, average cost O(ε^-3 log n)):
  • concrete distribution of u_i = minimum of O(ε^-3) uniform r.v.’s
  • estimator = function of [ã_i/u_i - 4/ε]^+ and the u_i’s
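The constant-factor version of the algorithm can be simulated as follows. This is a minimal sketch under stated assumptions: u_i is taken uniform on (0,1] (consistent with Pr[a_i > 6·u_i] = a_i/6 in the analysis), the adversary is modeled as random noise of magnitude below u_i, the normalization is multiplication by 6, and no truncation of u_i is applied. The function name is hypothetical.

```python
import random

def precision_sampling_estimate(a, rng, threshold=6.0):
    """Return (estimate of sum(a), average cost (1/n) * sum_i 1/u_i)."""
    count = 0
    total_cost = 0.0
    for a_i in a:
        u_i = 1.0 - rng.random()          # assumed uniform in (0, 1]
        total_cost += 1.0 / u_i           # heavy-tailed: no truncation here
        # Simulated adversary: any value with |a_i - approx| < u_i is allowed.
        approx = a_i + 0.99 * u_i * (2.0 * rng.random() - 1.0)
        if approx / u_i > threshold:
            count += 1
    # Each index is counted with probability about a_i / 6, so rescale by 6.
    return threshold * count, total_cost / len(a)

if __name__ == "__main__":
    rng = random.Random(0)
    a = [0.5] * 20000
    estimate, cost = precision_sampling_estimate(a, rng)
    print(sum(a), estimate, cost)
```

A single run is noisy when S is small, since the count is then a sum of a few rare events; the example uses a large S so the constant-factor behavior is visible.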
Why?
• Save time:
  • Problem: computing edit distance between two strings [FOCS’10]
  • new algorithm that obtains (log n)^{1/ε} approximation in n^{1+O(ε)} time
  • via property-testing-like algorithm using Precision Sampling (recursively)
• Save space:
  • Problem: compute norms/moments of frequencies in a data-stream [FOCS’11]
  • a simple and unified approach to compute all ℓ_p-norms/moments, and related problems
Streaming/sketching
• Challenge: log statistics of the data, using small space
[Figure: stream of IP addresses 131.107.65.14, 18.0.1.12, 131.107.65.14, 80.97.56.20, 18.0.1.12, 80.97.56.20, 131.107.65.14]
Streaming moments
• Setup:
  • (1+ε)-estimate frequencies in small space
  • Let x_i = frequency of IP i
  • p-th moment: ∑_i x_i^p
  • p = 1: keep one counter!
  • p ∈ [0,2]: space O(ε^-2 · log n) [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11]
  • p > 2: space Õ_ε(n^{1-2/p}) [AMS’96, SS’02, BJKS’02, CKS’03, IW’05, BGKS’06, BO’11]
• Generally, x ∈ R^n (updates: to coordinate i with ±1)
• Sketch = embedding into a “space” of small dimension
• Usually, linear L: R^n → R^m for m ≪ n, thus L(x ± e_i) = Lx ± Le_i
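The linearity property L(x ± e_i) = Lx ± Le_i is what lets a sketch be maintained under stream updates without storing x. The Python sketch below illustrates this with a random ±1 matrix; the matrix choice and all function names are assumptions for illustration, and a real sketch would generate the entries from hash functions rather than store an m×n matrix.

```python
import random

def make_sign_matrix(m, n, seed=0):
    """Hypothetical linear map L: R^n -> R^m with random +/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]

def apply_sketch(L, x):
    """Compute Lx directly from the full vector x."""
    return [sum(row[i] * x[i] for i in range(len(x))) for row in L]

def sketch_stream(L, updates):
    """Maintain Lx under updates (i, delta) without ever storing x."""
    sketch = [0] * len(L)
    for i, delta in updates:
        for j, row in enumerate(L):
            sketch[j] += delta * row[i]   # add delta * L e_i
    return sketch

if __name__ == "__main__":
    L = make_sign_matrix(4, 6, seed=1)
    updates = [(0, 1), (3, 1), (0, 1), (5, -1), (3, -1)]
    x = [0] * 6
    for i, delta in updates:
        x[i] += delta
    print(sketch_stream(L, updates) == apply_sketch(L, x))
```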
ℓ_p moments
• Theorem: linear sketch for ℓ_p with O(1) approximation, and O(n^{1-2/p} log n) space (90% succ. prob.).
  • = weak embedding of ℓ_p^n into ℓ_∞^m of dim m = O(n^{1-2/p} log n)
• Sketch:
  • pick random u_i ∈ [0,1], r_i ∈ {±1}, and let y_i = r_i · x_i / u_i^{1/p}
  • throw the y_i’s into a hash table H with m = O(n^{1-2/p} log n) cells
• Estimator:
  • via PSL or just max_{j∈[m]} |H[j]|^p
• Randomness: O(1) independence suffices
[Figure: vector x hashed into table H with cells 1 … m]
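The sketch and the simple max estimator can be written out as below. This is an illustrative Python version under several assumptions: u_i is uniform on (0,1], the per-coordinate randomness (u_i, r_i, hash cell) is derived from a seeded PRNG keyed by i rather than from a limited-independence hash family, and the class name `LpMaxSketch` is hypothetical.

```python
import random

class LpMaxSketch:
    """Hash table H of m cells holding sums of y_i = r_i * x_i / u_i^(1/p)."""

    def __init__(self, m, p, seed=0):
        self.m, self.p, self.seed = m, p, seed
        self.table = [0.0] * m

    def _randomness(self, i):
        # Re-derive the same (u_i, r_i, cell) for coordinate i on every update.
        rng = random.Random(f"{self.seed}:{i}")
        u = 1.0 - rng.random()                 # assumed uniform in (0, 1]
        r = 1 if rng.random() < 0.5 else -1    # random sign
        cell = rng.randrange(self.m)           # hash cell in [0, m)
        return u, r, cell

    def update(self, i, delta):
        """Process the stream update x_i += delta (the sketch is linear)."""
        u, r, cell = self._randomness(i)
        self.table[cell] += r * delta / u ** (1.0 / self.p)

    def estimate(self):
        """Max estimator: max_j |H[j]|^p."""
        return max(abs(h) for h in self.table) ** self.p

if __name__ == "__main__":
    sk = LpMaxSketch(m=16, p=3, seed=1)
    for i, delta in [(7, 5), (2, 1), (9, 1)]:
        sk.update(i, delta)
    print(sk.estimate())
```

Because each update adds a fixed multiple of delta to a single cell, opposite updates cancel, which is the linearity the theorem relies on.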
Under the Hood: Using PSL
• Idea: Use PSL to compute the sum ||x||_p^p = ∑_i |x_i|^p
  • Assume ||x||_2 = 1 by scaling
  • Set PSL additive error ε small compared to ||x||_2^p / n^{p/2-1} ≤ ||x||_p^p
• Outline:
  • 1. Pick u_i’s according to PSL and let y_i = x_i / u_i^{1/p}
  • 2. Compute every y_i^p = x_i^p / u_i within additive approximation 1
    • done via heavy hitters of the vector y
  • 3. Use PSL on |y_i^p · u_i| = |x_i|^p to compute the sum ∑_i |x_i|^p
• Space bound is controlled by the norm ||y||_2^2,
  • since heavy hitters under ℓ_2 is the best we can do
  • Notice E||y||_2^2 = ||x||_2^2 · E[1/u^{2/p}] ≤ (1/ε)^{2/p} = (n^{p/2-1})^{2/p} = n^{1-2/p}
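The quantity that the additive error ε is compared against comes from the standard norm inequality ||x||_2^p / n^{p/2-1} ≤ ||x||_p^p for p > 2. The short Python check below evaluates both sides on two extreme vectors; the helper names are illustrative only.

```python
def lp_norm(x, p):
    """The l_p norm (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)

def additive_error_scale(x, p):
    """||x||_2^p / n^(p/2 - 1), the lower bound on ||x||_p^p used above."""
    n = len(x)
    return lp_norm(x, 2) ** p / n ** (p / 2.0 - 1.0)

if __name__ == "__main__":
    p = 3
    flat = [1.0] * 64                 # all mass spread evenly: bound is tight
    spike = [1.0] + [0.0] * 63        # all mass on one coordinate
    for x in (flat, spike):
        print(additive_error_scale(x, p), lp_norm(x, p) ** p)
```

The flat vector attains equality up to rounding, while the spike vector shows the bound can be loose by a factor of n^{p/2-1}.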
More Streaming Algorithms
• Other streaming algorithms:
  • Same algorithm for all p-moments, including p ≤ 2
    • For p > 2, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    • For p ≤ 2, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  • Algorithms for mixed norms (ℓ_p of ℓ_q) [CM05, GBD08, JW09]
    • Space bounded by (Rademacher) p-type constant
  • Algorithm for ℓ_p-sampling problem [MW’10]
    • This work extended to give tight bounds by [JST’10]
• Connections:
  • Inspired by the streaming algorithm of [IW05], but simpler
  • Turns out to be a distant relative of Priority Sampling [DLT’07]
Finale
• Other applications for the Precision Sampling framework?
• Better algorithms for precision sampling?
  • For average cost (for 1+ε approximation)
    • Upper bound: O(ε^-3 log n) (tight for our algorithm)
    • Lower bound: Ω(ε^-2 log n)
  • Bounds for other cost models?
    • E.g., for 1/square root of precision, the bound is O(ε^-3/2)
• Other forms of “access” to the a_i’s?
Thank you!