
Sublinear Algorithms via Precision Sampling

Sublinear Algorithms via Precision Sampling. Alexandr Andoni (Microsoft Research), joint work with Robert Krauthgamer (Weizmann Inst.) and Krzysztof Onak (CMU). Goal: compute the number of Dacians in the empire, i.e., estimate S = a_1 + a_2 + … + a_n, where a_i ∈ [0,1], sublinearly.


Presentation Transcript


  1. Sublinear Algorithms via Precision Sampling
     Alexandr Andoni (Microsoft Research)
     joint work with: Robert Krauthgamer (Weizmann Inst.), Krzysztof Onak (CMU)

  2. Goal
     • Compute the number of Dacians in the empire
     • Estimate S = a_1 + a_2 + … + a_n, where a_i ∈ [0,1]
     • sublinearly…

  3. Sampling
     • Send accountants to a subset J of provinces, where |J| = m
     • Estimator: S̃ = (n/|J|) · ∑_{j∈J} a_j
     • Chebyshev bound: with 90% success probability, 0.5·S – O(n/m) < S̃ < 2·S + O(n/m)
     • For constant additive error, need m ~ n
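The estimator on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `sampling_estimate`, the synthetic values `a`, and the choice to sample without replacement are assumptions, not part of the talk.

```python
import random

def sampling_estimate(a, m, rng):
    """Estimate sum(a) from m provinces chosen uniformly without replacement."""
    n = len(a)
    J = rng.sample(range(n), m)            # the subset J of provinces, |J| = m
    return (n / m) * sum(a[j] for j in J)  # S~ = (n/|J|) * sum_{j in J} a_j

rng = random.Random(0)
a = [rng.random() for _ in range(1000)]    # hypothetical counts a_i in [0, 1]
full = sampling_estimate(a, len(a), rng)   # m = n: every province is visited
small = sampling_estimate(a, 50, rng)      # m << n: a noisier estimate
```

With m = n the estimate coincides with the true sum, which matches the slide's point that plain sampling needs m ~ n for constant additive error.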

  4. Precision Sampling Framework
     • Send accountants to each province, but require only approximate counts
     • Estimate ã_i, up to some pre-selected precision u_i: |a_i – ã_i| < u_i
     • Challenge: achieve a good trade-off between
       • quality of approximation to S
       • total cost of estimating each ã_i to precision u_i

  5. Formalization
     • Game between a Sum Estimator and an Adversary:
       • 1. Sum Estimator: fix precisions u_i
       • 1. Adversary: fix a_1, a_2, …, a_n
       • 2. Adversary: fix ã_1, ã_2, …, ã_n s.t. |a_i – ã_i| < u_i
       • 3. Sum Estimator: given ã_1, ã_2, …, ã_n, output S̃ s.t. |∑ a_i – S̃| < 1
     • What is cost?
       • Here, average cost = (1/n) · ∑ 1/u_i
       • to achieve precision u_i, use 1/u_i “resources”: e.g., if a_i is itself a sum a_i = ∑_j a_ij computed by subsampling, then one needs Θ(1/u_i) samples
     • For example, can choose all u_i = 1/n
       • Average cost ≈ n
       • This is best possible, if estimator S̃ = ∑ ã_i
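To make the cost measure concrete, here is a small Python sketch of the average cost (1/n) · ∑ 1/u_i. The helper name `average_cost` and the two precision vectors are hypothetical examples chosen for illustration.

```python
def average_cost(u):
    """Average cost (1/n) * sum(1/u_i) of a list of precisions u_i."""
    return sum(1.0 / ui for ui in u) / len(u)

n = 1000
# All precisions equal to 1/n, as in the example on the slide.
uniform_cost = average_cost([1.0 / n] * n)
# Hypothetical mix: 10 fine precisions (1/n) and 990 crude ones (1).
mixed_cost = average_cost([1.0 / n] * 10 + [1.0] * 990)
```

The uniform choice costs about n on average, while the mixed choice costs about 11, which is the kind of saving the framework is after.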

  6. Precision Sampling Lemma
     • Goal: estimate ∑ a_i from {ã_i} satisfying |a_i – ã_i| < u_i
     • Precision Sampling Lemma: can get, with 90% success:
       • O(1) additive error and 1.5 multiplicative error: S – O(1) < S̃ < 1.5·S + O(1)
       • with average cost equal to O(log n)
     • General version (slide overlay): ε additive error and 1+ε multiplicative error: S – ε < S̃ < (1+ε)·S + ε, with average cost O(ε^(-3) log n)
     • Example: distinguish ∑ a_i = 5 vs ∑ a_i = 0
     • Consider two extreme cases:
       • if five a_i = 1: sample all, but need only crude approx (u_i = 1/10)
       • if all a_i = 5/n: only few with good approx u_i = 1/n, and the rest with u_i = 1

  7. Precision Sampling Algorithm
     • Precision Sampling Lemma: can get, with 90% success:
       • O(1) additive error and 1.5 multiplicative error: S – O(1) < S̃ < 1.5·S + O(1)
       • with average cost equal to O(log n)
     • Algorithm:
       • Choose each u_i ∈ [0,1] i.i.d.
       • Estimator: S̃ = count number of i’s s.t. ã_i / u_i > 6 (modulo a normalization constant)
     • Proof of correctness:
       • we use only ã_i which are (1+ε)-approximation to a_i
       • E[S̃] ≈ ∑ Pr[a_i / u_i > 6] = ∑ a_i/6
       • E[1/u] = O(log n) w.h.p.
     • General version (slide overlay): ε additive error and 1+ε multiplicative error: S – ε < S̃ < (1+ε)·S + ε, with average cost O(ε^(-3) log n)
       • u_i drawn from a concrete distribution = minimum of O(ε^(-3)) u.r.v.
       • Estimator: a function of [ã_i / u_i – 4/ε]^+ and the u_i’s
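The basic (constant-factor) algorithm on this slide can be simulated as follows. This is a rough sketch under stated assumptions: the u_i are taken uniform on (0, 1], the approximations ã_i are simulated by adding random noise of magnitude at most u_i, and the "normalization constant" is read as multiplying the count by the threshold 6, since Pr[a_i / u_i > 6] = a_i / 6. The function name and test input are hypothetical.

```python
import random

def precision_sampling_estimate(a, rng, threshold=6.0):
    """Return (estimate of sum(a), average cost (1/n) * sum(1/u_i))."""
    count = 0
    total_cost = 0.0
    for ai in a:
        ui = 1.0 - rng.random()              # u_i uniform in (0, 1]
        a_tilde = ai + rng.uniform(-ui, ui)  # simulated estimate within +-u_i of a_i
        if a_tilde / ui > threshold:         # count i's with a~_i / u_i > 6
            count += 1
        total_cost += 1.0 / ui               # precision u_i costs 1/u_i
    return threshold * count, total_cost / len(a)

rng = random.Random(1)
a = [0.5] * 100_000                          # hypothetical input with S = 50000
S = sum(a)
estimate, avg_cost = precision_sampling_estimate(a, rng)
```

In this simulation the estimate stays within a constant factor of S, while the average cost is dominated by the few very small u_i, which is why the bound on the cost is stated with high probability rather than in expectation.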

  8. Why?
     • Save time:
       • Problem: computing edit distance between two strings
       • new algorithm that obtains (log n)^(1/ε) approximation in n^(1+O(ε)) time
       • via efficient property-testing algorithm that uses Precision Sampling
       • More details: see the talk by Robi on Friday!
     • Save space:
       • Problem: compute norms/frequency moments in streams
       • gives a simple and unified approach to compute all l_p, F_k moments, and other goodies
       • More details: now

  9. Streaming frequencies
     • Setup:
       • 1+ε estimate frequencies in small space
       • Let x_i = frequency of ethnicity i
       • kth moment: ∑ x_i^k
       • k ∈ [0,2]: space O(1/ε^2) [AMS’96, I’00, GC07, Li08, NW10, KNW10, KNPW11]
       • k > 2: space Õ(n^(1-2/k)) [AMS’96, SS’02, BYJKS’02, CKS’03, IW’05, BGKS’06, BO10]
     • Sometimes frequencies x_i are negative:
       • If measuring traffic difference (delay, etc.)
       • We want linear “dim reduction” L: R^n → R^m, m << n

  10. Norm Estimation via Precision Sampling
     • Idea:
       • Use PSL to compute the sum ||x||_k^k = ∑ |x_i|^k
     • General approach:
       • 1. Pick u_i’s according to PSL and let y_i = x_i / u_i^(1/k)
       • 2. Compute all y_i^k up to additive approximation O(1)
         • Can be done by computing the heavy hitters of the vector y
       • 3. Use PSL to compute the sum ||x||_k^k = ∑ |x_i|^k
     • Space bound is controlled by the norm ||y||_2
       • Since heavy hitters under l_2 is the best we can do
       • Note that ||y||_2 ≤ ||x||_2 · E[1/u_i]

  11. Streaming F_k moments
     • Theorem: linear sketch for F_k with O(1) approximation, O(1) update, and O(n^(1-2/k) log n) space (in words).
     • Sketch:
       • Pick random u_i ∈ [0,1], s_i ∈ {±1}, and let y_i = s_i · x_i / u_i^(1/k)
       • throw into one hash table H,
       • size m = O(n^(1-2/k) log n) cells
     • Update: on (i, a)
       • H[h(i)] += s_i · a / u_i^(1/k)
     • Estimator:
       • max_{j∈[m]} |H[j]|^k
     • Randomness: O(1) independence suffices
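The sketch on this slide can be written out as a short Python class. This is an illustrative sketch only: the class name `FkSketch` is hypothetical, and for readability it stores u_i, s_i, and h(i) in explicit arrays of length n using fully random choices, which defeats the space bound. The slide notes that O(1)-wise independent hash functions suffice. The estimator is the one on the slide, without any normalization constant.

```python
import random

class FkSketch:
    """Toy linear sketch: one hash table H with m cells for a vector x of length n."""

    def __init__(self, n, m, k, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.u = [1.0 - rng.random() for _ in range(n)]   # u_i in (0, 1]
        self.s = [rng.choice((-1, 1)) for _ in range(n)]  # random signs s_i
        self.h = [rng.randrange(m) for _ in range(n)]     # hash h(i) into [m]
        self.H = [0.0] * m

    def update(self, i, a):
        # H[h(i)] += s_i * a / u_i^(1/k)
        self.H[self.h[i]] += self.s[i] * a / self.u[i] ** (1.0 / self.k)

    def estimate(self):
        # max_j |H[j]|^k
        return max(abs(cell) for cell in self.H) ** self.k

sketch = FkSketch(n=100, m=16, k=3)
sketch.update(7, 2.0)           # x = 2 * e_7
single = sketch.estimate()      # equals |2|^3 / u_7 for a one-coordinate vector
sketch.update(7, -2.0)          # negative update cancels the first one
cancelled = sketch.estimate()
```

Because the update is linear in x, a deletion exactly undoes an insertion, which is what allows negative frequencies in the stream.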

  12. More Streaming Algorithms
     • Other streaming algorithms:
       • Algorithm for all k-moments, including k ≤ 2
         • For k > 2, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
         • For k ≤ 2, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
       • Improved algorithm for mixed norms (l_p of l_k) [CM05, GBD08, JW09]
         • space bounded by (Rademacher) p-type constant
       • Algorithm for l_p-sampling problem [MW’10]
         • This work extended to give tight bounds by [JST’11]
     • Connections:
       • Inspired by the streaming algorithm of [IW05], but simpler
       • Turns out to be a distant relative of Priority Sampling [DLT’07]

  13. Finale
     • Other applications for the Precision Sampling framework?
     • Better algorithms for precision sampling?
       • Best bound for average cost (for 1+ε approximation)
         • Upper bound: O(1/ε^3 · log n) (tight for our algorithm)
         • Lower bound: Ω(1/ε^2 · log n)
       • Bounds for other cost models?
         • E.g., for 1/square root of precision, the bound is O(1/ε^(3/2))
     • Other forms of “access” to a_i’s?
