Sublinear Algorithms via Precision Sampling
Alexandr Andoni (Microsoft Research)
Joint work with: Robert Krauthgamer (Weizmann Inst.), Krzysztof Onak (CMU)
Goal
• Compute the number of Dacians in the empire
• Estimate S = a_1 + a_2 + … + a_n, where a_i ∈ [0,1], sublinearly…
Sampling
• Send accountants to a subset J of provinces, with m = |J|
• Estimator: S̃ = (n/m) · ∑_{j∈J} a_j
• Chebyshev bound: with 90% success probability, 0.5·S - O(n/m) < S̃ < 2·S + O(n/m)
• For constant additive error, need m ~ n
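A minimal sketch of the plain sampling estimator on this slide, assuming J is drawn uniformly at random without replacement. The function name `sample_sum_estimate` and the toy data are illustrative and not from the talk.

```python
import random

def sample_sum_estimate(a, m, rng):
    """Estimate S = sum(a) from m provinces chosen uniformly without replacement."""
    n = len(a)
    J = rng.sample(range(n), m)           # subset J of provinces, |J| = m
    return sum(a[j] for j in J) * n / m   # rescale the sampled sum by n/m

if __name__ == "__main__":
    rng = random.Random(0)
    a = [rng.random() for _ in range(10_000)]   # hypothetical counts a_i in [0,1]
    print("true S:", sum(a))
    print("estimate with m=500:", sample_sum_estimate(a, 500, rng))
```

The additive error of this estimator scales like n/m, which is why a constant additive error forces m to be on the order of n.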
Precision Sampling Framework
• Send accountants to each province, but require only approximate counts
• Estimate ã_i up to some pre-selected precision u_i: |a_i - ã_i| < u_i
• Challenge: achieve a good trade-off between
  • quality of approximation to S
  • total cost of estimating each ã_i to precision u_i
Formalization
• What is cost?
  • Here, average cost = (1/n) · ∑ 1/u_i
  • To achieve precision u_i, use 1/u_i "resources": e.g., if a_i is itself a sum a_i = ∑_j a_ij computed by subsampling, then one needs Θ(1/u_i) samples
• For example, can choose all u_i = 1/n
  • Average cost ≈ n
  • This is best possible, if the estimator is S̃ = ∑ ã_i
• The game between the Sum Estimator and the Adversary:
  1. Sum Estimator: fix precisions u_i. Adversary: fix a_1, a_2, …, a_n
  2. Adversary: fix ã_1, ã_2, …, ã_n s.t. |a_i - ã_i| < u_i
  3. Sum Estimator: given ã_1, ã_2, …, ã_n, output S̃ s.t. |∑ a_i - S̃| < 1
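The cost model can be checked numerically. The sketch below computes the average cost (1/n) · ∑ 1/u_i and the worst-case error of the naive estimator S̃ = ∑ ã_i, assuming an adversary that shifts every ã_i by almost u_i in the same direction. The helper names are hypothetical.

```python
def average_cost(u):
    """Average cost (1/n) * sum(1/u_i) of a precision vector u."""
    return sum(1.0 / ui for ui in u) / len(u)

def worst_case_sum_error(u):
    """Supremum of |sum(a) - sum(a_tilde)| when each |a_i - a_tilde_i| < u_i.

    Assumption: the adversary pushes every a_tilde_i the same way, so the
    per-coordinate errors add up to (just under) sum(u).
    """
    return sum(u)

if __name__ == "__main__":
    n = 1000
    u = [1.0 / n] * n                 # the slide's example: all u_i = 1/n
    print(average_cost(u))            # approximately n
    print(worst_case_sum_error(u))    # approximately 1
```

With u_i = 1/n the naive sum just meets the additive error 1 target, at an average cost of about n.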
Precision Sampling Lemma
• Goal: estimate ∑ a_i from {ã_i} satisfying |a_i - ã_i| < u_i
• Precision Sampling Lemma: can get, with 90% success:
  • O(1) additive error and 1.5 multiplicative error: S - O(1) < S̃ < 1.5·S + O(1)
  • with average cost equal to O(log n)
• More generally: ε additive error and (1+ε) multiplicative error: S - ε < S̃ < (1+ε)·S + ε, with average cost O(ε^-3 · log n)
• Example: distinguish ∑ a_i = 5 vs ∑ a_i = 0
• Consider two extreme cases:
  • if five a_i = 1: sample all, but need only a crude approximation (u_i = 1/10)
  • if all a_i = 5/n: only a few with a good approximation (u_i = 1/n), and the rest with u_i = 1
Precision Sampling Algorithm
• Precision Sampling Lemma: can get, with 90% success:
  • O(1) additive error and 1.5 multiplicative error: S - O(1) < S̃ < 1.5·S + O(1)
  • with average cost equal to O(log n)
• Algorithm:
  • Choose each u_i ∈ [0,1] i.i.d.
  • Estimator: S̃ = number of i's s.t. ã_i / u_i > 6 (modulo a normalization constant)
• Proof of correctness:
  • we use only ã_i which are (1+ε)-approximations to a_i
  • E[S̃] ≈ ∑ Pr[a_i / u_i > 6] = ∑ a_i / 6 (for u_i uniform on [0,1])
  • E[1/u] = O(log n) w.h.p.
• General version, with ε additive error and (1+ε) multiplicative error: S - ε < S̃ < (1+ε)·S + ε, average cost O(ε^-3 · log n)
  • each u_i is drawn from a concrete distribution: the minimum of O(ε^-3) uniform random variables
  • the estimator is a function of [ã_i / u_i - 4/ε]^+ and the u_i's
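A minimal simulation of the basic algorithm on this slide, assuming the u_i are uniform on (0,1] and taking the normalization constant to be the threshold 6, since E[count] ≈ ∑ a_i / 6. The oracle `approx(i, u)` is a hypothetical stand-in for whatever procedure returns ã_i with |a_i - ã_i| < u.

```python
import random

def precision_sampling_estimate(n, approx, rng, threshold=6.0):
    """Return (estimate of sum a_i, average cost) for the basic threshold estimator.

    approx(i, u) must return some a_tilde with |a_i - a_tilde| < u.
    """
    count = 0
    total_cost = 0.0
    for i in range(n):
        u = 1.0 - rng.random()        # uniform on (0, 1], avoids division by zero
        a_tilde = approx(i, u)
        total_cost += 1.0 / u         # precision u is charged 1/u resources
        if a_tilde / u > threshold:
            count += 1
    # Pr[a_i / u_i > 6] = a_i / 6, so rescale the count by the threshold
    return threshold * count, total_cost / n

if __name__ == "__main__":
    rng = random.Random(0)
    n = 100_000
    a = [5.0 / n] * n                 # the slide's second extreme case, S = 5
    noisy = lambda i, u: a[i] + 0.999 * rng.uniform(-u, u)   # simulated adversary
    print(precision_sampling_estimate(n, noisy, rng))
```

Because ã_i / u_i > 6 forces a_i / u_i > 5, only items that are already well approximated contribute to the count. The estimate is noisy for small S, which is consistent with the O(1) additive slack in the lemma.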
Why?
• Save time:
  • Problem: computing edit distance between two strings
  • new algorithm that obtains a (log n)^(1/ε) approximation in n^(1+O(ε)) time
  • via an efficient property-testing algorithm that uses Precision Sampling
  • More details: see the talk by Robi on Friday!
• Save space:
  • Problem: compute norms/frequency moments in streams
  • gives a simple and unified approach to compute all l_p, F_k moments, and other goodies
  • More details: now
Streaming frequencies
• Setup:
  • (1+ε)-estimate frequencies in small space
  • Let x_i = frequency of ethnicity i
  • k-th moment: ∑ x_i^k
  • k ∈ [0,2]: space O(1/ε^2) [AMS'96, I'00, GC07, Li08, NW10, KNW10, KNPW11]
  • k > 2: space Õ(n^(1-2/k)) [AMS'96, SS'02, BYJKS'02, CKS'03, IW'05, BGKS'06, BO10]
• Sometimes frequencies x_i are negative:
  • If measuring traffic difference (delay, etc.)
  • We want a linear "dim reduction" L: R^n → R^m, m << n
Norm Estimation via Precision Sampling
• Idea:
  • Use PSL to compute the sum ||x||_k^k = ∑ |x_i|^k
• General approach:
  1. Pick u_i's according to PSL and let y_i = x_i / u_i^(1/k)
  2. Compute all y_i^k up to additive approximation O(1)
     • Can be done by computing the heavy hitters of the vector y
  3. Use PSL to compute the sum ||x||_k^k = ∑ |x_i|^k
• Space bound is controlled by the norm ||y||_2
  • Since heavy hitters under l_2 is the best we can do
  • Note that ||y||_2 ≤ ||x||_2 · E[1/u_i]
Streaming F_k moments
• Theorem: linear sketch for F_k with O(1) approximation, O(1) update, and O(n^(1-2/k) · log n) space (in words).
• Sketch:
  • Pick random u_i ∈ [0,1], s_i ∈ {±1}, and let y_i = s_i · x_i / u_i^(1/k)
  • throw into one hash table H,
  • size m = O(n^(1-2/k) · log n) cells
• Update: on (i, a)
  • H[h(i)] += s_i · a / u_i^(1/k)
• Estimator:
  • max_{j∈[m]} |H[j]|^k
• Randomness: O(1) independence suffices
[Figure: the vector x hashed into the table H]
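The update and estimator rules on this slide can be sketched as a small class. This is an illustration under stated assumptions, not the authors' implementation: the class name `FkSketch` is hypothetical, h(i), s_i and u_i are derived from a per-index seeded `random.Random` rather than from the O(1)-wise independent hash families the slide mentions, and the estimator's normalization constant is omitted.

```python
import random

class FkSketch:
    """Single hash table H of m cells holding sums of s_i * x_i / u_i^(1/k)."""

    def __init__(self, m, k, seed=0):
        self.m, self.k, self.seed = m, k, seed
        self.H = [0.0] * m

    def _params(self, i):
        # Assumption: a seeded PRNG per index stands in for limited-independence hashing.
        r = random.Random(f"{self.seed}:{i}")
        cell = r.randrange(self.m)        # h(i)
        sign = r.choice((-1.0, 1.0))      # s_i
        u = 1.0 - r.random()              # u_i, uniform on (0, 1]
        return cell, sign, u

    def update(self, i, a):
        """Process stream update (i, a): x_i += a (a may be negative)."""
        cell, sign, u = self._params(i)
        self.H[cell] += sign * a / u ** (1.0 / self.k)

    def estimate(self):
        """max_j |H[j]|^k, up to a normalization constant."""
        return max(abs(v) for v in self.H) ** self.k

if __name__ == "__main__":
    sk = FkSketch(m=64, k=3)
    for i in range(1000):
        sk.update(i, 1.0)                 # x = all-ones vector, F_3 = 1000
    print(sk.estimate())
```

The sketch is linear in x, so an update (i, a) followed by (i, -a) cancels, which is what allows negative frequencies.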
More Streaming Algorithms
• Other streaming algorithms:
  • Algorithm for all k-moments, including k ≤ 2
    • For k > 2, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    • For k ≤ 2, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  • Improved algorithm for mixed norms (l_p of l_k) [CM05, GBD08, JW09]
    • space bounded by the (Rademacher) p-type constant
  • Algorithm for the l_p-sampling problem [MW'10]
    • This work was extended to give tight bounds by [JST'11]
• Connections:
  • Inspired by the streaming algorithm of [IW05], but simpler
  • Turns out to be a distant relative of Priority Sampling [DLT'07]
Finale
• Other applications for the Precision Sampling framework?
• Better algorithms for precision sampling?
  • Best bound for average cost (for 1+ε approximation)
    • Upper bound: O(ε^-3 · log n) (tight for our algorithm)
    • Lower bound: Ω(ε^-2 · log n)
  • Bounds for other cost models?
    • E.g., for 1/(square root of precision), the bound is O(ε^(-3/2))
• Other forms of "access" to the a_i's?