Efficient Algorithms via Precision Sampling

Robert Krauthgamer (Weizmann Institute). Joint work with Alexandr Andoni (Microsoft Research) and Krzysztof Onak (CMU).

Goal
• Compute the fraction of Dacians in the empire
• Estimate $S = a_1 + a_2 + \dots + a_n$, where $a_i \in [0,1]$

Sampling
• Send accountants to a subset $J$ of provinces, $|J| = m$
• Estimator: $\tilde{S} = \frac{n}{m} \sum_{j \in J} a_j$
• Chebyshev bound: with 90% success probability, $0.5\,S - O(n/m) < \tilde{S} < 2S + O(n/m)$
• For constant additive error, need $m \sim n$
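A minimal Python sketch of the uniform-sampling estimator above. The function name `sample_sum_estimate`, the fixed seed, and the toy input (three provinces at 1, all others at 0, so $S = 3$) are illustrative assumptions, not part of the talk. It suggests why a small $m$ gives a poor additive error: every sampled province moves the estimate in steps of $n/m$.

```python
import random

def sample_sum_estimate(a, m, rng):
    """Scale the sum over m uniformly chosen provinces by n/m."""
    n = len(a)
    J = rng.sample(range(n), m)          # subset J with |J| = m, no repeats
    return (n / m) * sum(a[j] for j in J)

rng = random.Random(0)                   # fixed seed, only for reproducibility
n = 10_000
a = [0.0] * n
for i in rng.sample(range(n), 3):        # hypothetical input with S = 3
    a[i] = 1.0

coarse = sample_sum_estimate(a, 100, rng)   # m = 100: steps of n/m = 100
exact = sample_sum_estimate(a, n, rng)      # m = n: every province is visited
print(coarse, exact)
```

With $m = 100$ the estimate can only be a multiple of 100, so it cannot separate $S = 3$ from $S = 1$; with $m = n$ it recovers $S$ exactly.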

Precision Sampling Framework
• Send accountants to each province, but require only approximate counts
• Estimate $a_i$ up to a pre-selected precision $u_i$, i.e. $|a_i - \tilde{a}_i| < u_i$
• Challenge: achieve a good tradeoff between
  • quality of the approximation to $S$
  • total cost of computing each $\tilde{a}_i$ (within precision $u_i$)

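Formalization
• What is our cost model?
  • Here, average cost $= \frac{1}{n} \sum_i 1/u_i$
  • Achieving precision $u_i$ requires $1/u_i$ “resources”: e.g., if $a_i$ is itself a sum $a_i = \sum_j a_{ij}$ computed by subsampling, then one needs $\Theta(1/u_i)$ samples
• For example, can choose all $u_i = 1/n$
  • Average cost ≈ $n$
  • This is best possible if the estimator is $\tilde{S} = \sum_i \tilde{a}_i$
• The game between the Estimator (Alg) and the Adversary:
  1. Alg fixes the precisions $u_i$; the Adversary fixes (hidden) $a_1, a_2, \dots, a_n$
  2. The Adversary fixes $\tilde{a}_1, \tilde{a}_2, \dots, \tilde{a}_n$ s.t. $|a_i - \tilde{a}_i| < u_i$
  3. Alg reports $\tilde{S}$ s.t. $|\sum_i a_i - \tilde{S}| < 1$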
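The cost model can be checked on the slide's own example, $u_i = 1/n$. This is a small sketch: the helper names are hypothetical, and $n = 1024$ is chosen only because a power of two keeps the floating-point arithmetic exact.

```python
def average_cost(u):
    """Average cost (1/n) * sum_i 1/u_i for a list of precisions u."""
    return sum(1.0 / ui for ui in u) / len(u)

def plain_sum_error_bound(u):
    """Worst-case error of the plain estimator sum_i ã_i when |a_i - ã_i| < u_i."""
    return sum(u)

n = 1024                       # assumed size, a power of two
u = [1.0 / n] * n              # all precisions equal to 1/n
cost = average_cost(u)
err = plain_sum_error_bound(u)
print(cost, err)
```

The plain sum meets the additive-error target of 1, but only at average cost $n$, which is what the Precision Sampling Lemma improves on.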

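Precision Sampling Lemma
• Goal: estimate $\sum_i a_i$ from $\{\tilde{a}_i\}$ satisfying $|a_i - \tilde{a}_i| < u_i$
• Precision Sampling Lemma: can get, with 90% success:
  • $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5\,S + O(1)$
  • with average cost $O(\log n)$
• More generally: additive error $\varepsilon$ and multiplicative error $1+\varepsilon$, i.e. $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$, with average cost $O(\varepsilon^{-3} \log n)$
• Example: distinguish $\sum_i a_i = 3$ vs $\sum_i a_i = 1$
• Consider two extreme cases:
  • if three $a_i = 1$: estimate all $a_i$ with a crude approximation ($u_i = 0.1$)
  • if all $a_i = 3/n$: estimate a few with a good approximation ($u_i = 1/n$), the rest with $u_i = 1$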

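Precision Sampling Algorithm
• Precision Sampling Lemma: can get, with 90% success:
  • $O(1)$ additive error and 1.5 multiplicative error: $S - O(1) < \tilde{S} < 1.5\,S + O(1)$
  • with average cost equal to $O(\log n)$
• More generally: $S - \varepsilon < \tilde{S} < (1+\varepsilon) S + \varepsilon$, with average cost $O(\varepsilon^{-3} \log n)$
• Algorithm:
  • Choose each $u_i \in [0,1]$ i.i.d.
  • Estimator: $\tilde{S}$ = count the number of $i$’s s.t. $\tilde{a}_i / u_i > 6$ (and normalize)
• For the $1+\varepsilon$ version:
  • concrete distribution of $u_i$ = minimum of $O(\varepsilon^{-3})$ uniform r.v.
  • the estimator is a function of $[\tilde{a}_i / u_i - 4/\varepsilon]_+$ and the $u_i$’s
• Outline of analysis:
  • $E[\tilde{S}] = \sum_i \Pr[\tilde{a}_i / u_i > 6] = \sum_i \Pr[a_i > (6 \pm 1) u_i] \approx \sum_i a_i / 6$
  • Actually, $\tilde{a}_i$ may also have 1.5-multiplicative error w.r.t. $a_i$
  • $E[1/u_i] = O(\log n)$ w.h.p. (after truncation)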
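A rough simulation of the counting estimator, under several assumptions that are not from the slides: the $u_i$ are drawn uniformly from $(0,1]$, random noise of magnitude below $u_i$ stands in for the adversary, the count is normalized by the threshold 6, and the result is averaged over independent repetitions purely to reduce variance in this demo. The function name and inputs are hypothetical. It runs the slide's two extreme cases with $S = 3$.

```python
import random

def precision_sampling_estimate(a, reps, rng, threshold=6.0):
    """Average over reps of: threshold * #{i : ã_i / u_i > threshold}."""
    hits = 0
    for _ in range(reps):
        for ai in a:
            u = 1.0 - rng.random()                         # precision u_i in (0, 1]
            noise = 0.99 * u * (2.0 * rng.random() - 1.0)  # assumed noise, |noise| < u_i
            a_tilde = ai + noise                           # approximation with |a_i - ã_i| < u_i
            if a_tilde > threshold * u:                    # same test as ã_i / u_i > threshold
                hits += 1
    return threshold * hits / reps

rng = random.Random(1)                        # fixed seed, only for reproducibility
n = 200
concentrated = [1.0] * 3 + [0.0] * (n - 3)    # three a_i = 1
spread = [3.0 / n] * n                        # all a_i = 3/n
est_concentrated = precision_sampling_estimate(concentrated, 2000, rng)
est_spread = precision_sampling_estimate(spread, 2000, rng)
print(est_concentrated, est_spread)
```

Both estimates should land near 3, even though almost every $\tilde{a}_i$ is known only very coarsely. The repetitions here are a demo device and do not reproduce the cost accounting of the lemma.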

Why?
• Save time:
  • Problem: computing edit distance between two strings [FOCS’10]
  • new algorithm that obtains $(\log n)^{1/\varepsilon}$ approximation in $n^{1+O(\varepsilon)}$ time
  • via a property-testing-like algorithm using Precision Sampling (recursively)
• Save space:
  • Problem: compute norms/moments of frequencies in a data stream [FOCS’11]
  • a simple and unified approach to compute all $\ell_p$-norms/moments, and related problems

Streaming/sketching
• Challenge: log statistics of the data, using small space
• [Figure: a stream of IP addresses: 131.107.65.14, 18.0.1.12, 131.107.65.14, 80.97.56.20, 18.0.1.12, 80.97.56.20, 131.107.65.14]

Streaming moments
• Setup:
  • $(1+\varepsilon)$-estimate frequencies in small space
  • Let $x_i$ = frequency of IP $i$
  • $p$th moment: $\sum_i x_i^p$
  • $p = 1$: keep one counter!
  • $p \in [0,2]$: space $O(\varepsilon^{-2} \cdot \log n)$ [AMS’96, I’00, GC’07, Li’08, NW’10, KNW’10, KNPW’11]
  • $p > 2$: space $\tilde{O}_\varepsilon(n^{1-2/p})$ [AMS’96, SS’02, BJKS’02, CKS’03, IW’05, BGKS’06, BO’11]
• Generally, $x \in \mathbb{R}^n$ (updates: to coordinate $i$ with $\pm 1$)
• Sketch = embedding into a “space” of small dimension
• Usually, linear $L: \mathbb{R}^n \to \mathbb{R}^m$ for $m \ll n$, thus $L(x \pm e_i) = Lx \pm Le_i$
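To illustrate the linearity $L(x \pm e_i) = Lx \pm Le_i$, here is a small sketch that uses a random sign matrix as $L$, in the style of the second-moment estimator of [AMS’96]. The class name `SignSketch`, the sizes, and the mapping of the three IP addresses to ids 0, 1, 2 are assumptions. The matrix is stored explicitly for readability, so this toy version is not itself small-space.

```python
import random

class SignSketch:
    """Linear sketch z = L x with a random sign matrix L (m rows, n columns)."""

    def __init__(self, n, m, seed=0):
        rng = random.Random(seed)
        self.L = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
        self.z = [0] * m

    def update(self, i, delta):
        """Apply x <- x + delta * e_i; by linearity, z <- z + delta * L e_i."""
        for j, row in enumerate(self.L):
            self.z[j] += delta * row[i]

    def estimate_f2(self):
        """Average of z_j^2, an unbiased estimate of sum_i x_i^2."""
        return sum(v * v for v in self.z) / len(self.z)

stream = [0, 1, 0, 2, 1, 2, 0]        # hypothetical ids for the three IPs
sk = SignSketch(n=3, m=400)
for i in stream:
    sk.update(i, +1)

x = [3, 2, 2]                         # resulting frequency vector
direct = [sum(row[i] * x[i] for i in range(3)) for row in sk.L]
f2 = sk.estimate_f2()
print(f2)                             # the true second moment is 3^2 + 2^2 + 2^2 = 17
```

Processing the stream one update at a time yields the same sketch as applying $L$ to the final frequency vector, which is what makes deletions and merging of sketches possible.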

$\ell_p$ moments
• Theorem: linear sketch for $\ell_p$ with $O(1)$ approximation and $O(n^{1-2/p} \log n)$ space (90% succ. prob.)
  • = weak embedding of $\ell_p^n$ into $\ell_\infty^m$ of dim $m = O(n^{1-2/p} \log n)$
• Sketch:
  • pick random $u_i \in [0,1]$, $r_i \in \{\pm 1\}$ and let $y_i = r_i \cdot x_i / u_i^{1/p}$
  • throw the $y_i$’s into a hash table $H$ with $m = O(n^{1-2/p} \log n)$ cells
• Estimator:
  • via PSL or just $\max_{j \in [m]} |H[j]|^p$
• Randomness: $O(1)$ independence suffices
• [Figure: the vector $x$ hashed into the table $H$ with cells $1, \dots, m$]
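The sketch and the simple max estimator can be written out in a few lines. This is a toy version under stated assumptions: it processes a whole vector at once with fully random $u_i$, $r_i$, and cell choices, whereas a streaming implementation would derive them from hash functions of $i$ so that repeated updates to the same coordinate agree. The function names and the test vector are hypothetical.

```python
import random

def lp_sketch(x, p, m, rng):
    """Hash y_i = r_i * x_i / u_i**(1/p) into a table H with m cells."""
    H = [0.0] * m
    for xi in x:
        u = 1.0 - rng.random()           # u_i in (0, 1]
        r = rng.choice((-1.0, 1.0))      # random sign r_i
        cell = rng.randrange(m)          # cell that coordinate i is thrown into
        H[cell] += r * xi / u ** (1.0 / p)
    return H

def max_estimate(H, p):
    """The simple estimator max_j |H[j]|^p."""
    return max(abs(h) for h in H) ** p

rng = random.Random(2)                   # fixed seed, only for reproducibility
p, m = 3, 8
single = [0.0, 0.0, 2.0, 0.0]            # one nonzero coordinate, so sum_i |x_i|^p = 8
est_single = max_estimate(lp_sketch(single, p, m, rng), p)
est_zero = max_estimate(lp_sketch([0.0] * 4, p, m, rng), p)
print(est_single, est_zero)
```

For a single nonzero coordinate the estimate equals $|x_i|^p / u_i$, so it is never below the true moment and is within a constant factor of it whenever $u_i$ is not too small.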

Under the Hood: Using PSL
• Idea: Use PSL to compute the sum $\|x\|_p^p = \sum_i |x_i|^p$
  • Assume $\|x\|_2 = 1$ by scaling
  • Set PSL additive error $\varepsilon$ small compared to $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$
• Outline:
  1. Pick $u_i$’s according to PSL and let $y_i = x_i / u_i^{1/p}$
  2. Compute every $y_i^p = x_i^p / u_i$ within additive approximation 1
     • done via heavy hitters of the vector $y$
  3. Use PSL on $|y_i^p u_i| = |x_i|^p$ to compute the sum $\sum_i |x_i|^p$
• Space bound is controlled by the norm $\|y\|_2^2$
  • since heavy hitters under $\ell_2$ is the best we can do
  • Notice $E\|y\|_2^2 = \|x\|_2^2 \cdot E[1/u^{2/p}] \le (1/\varepsilon)^{2/p} = (n^{p/2-1})^{2/p}$
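The exponent arithmetic behind the space bound can be spelled out as follows. This is a worked sketch, reading the slide's comparison as $\|x\|_2^p / n^{p/2-1} \le \|x\|_p^p$ and taking $\varepsilon \approx n^{-(p/2-1)}$ once $\|x\|_2 = 1$. The bound on $E[1/u^{2/p}]$ is used as stated on the slide and is not rederived here.

```latex
% Hoelder's inequality with exponents p/2 and p/(p-2), for p > 2:
\|x\|_2^2 = \sum_i x_i^2 \cdot 1
  \le \Big( \sum_i |x_i|^p \Big)^{2/p} \, n^{1 - 2/p}
\quad\Longrightarrow\quad
\frac{\|x\|_2^p}{n^{p/2-1}} \le \|x\|_p^p .

% With \|x\|_2 = 1 and \varepsilon \approx n^{-(p/2-1)}:
(1/\varepsilon)^{2/p}
  = \big( n^{p/2-1} \big)^{2/p}
  = n^{(p/2-1) \cdot (2/p)}
  = n^{1 - 2/p} .
```

For instance, $p = 4$ gives $n^{1/2}$, which matches the $n^{1-2/p}$ term in the space bound of the theorem.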

More Streaming Algorithms
• Other streaming algorithms:
  • Same algorithm for all $p$-moments, including $p \le 2$
    • For $p > 2$, improves existing space bounds [AMS96, IW05, BGKS06, BO10]
    • For $p \le 2$, worse space bounds [AMS96, I00, GC07, Li08, NW10, KNW10, KNPW11]
  • Algorithms for mixed norms ($\ell_p$ of $\ell_q$) [CM05, GBD08, JW09]
    • Space bounded by (Rademacher) $p$-type constant
  • Algorithm for the $\ell_p$-sampling problem [MW’10]
    • This work extended to give tight bounds by [JST’10]
• Connections:
  • Inspired by the streaming algorithm of [IW05], but simpler
  • Turns out to be a distant relative of Priority Sampling [DLT’07]

Finale
• Other applications for the Precision Sampling framework?
• Better algorithms for precision sampling?
  • For average cost (for $1+\varepsilon$ approximation):
    • Upper bound: $O(\varepsilon^{-3} \log n)$ (tight for our algorithm)
    • Lower bound: $\Omega(\varepsilon^{-2} \log n)$
  • Bounds for other cost models?
    • E.g., for 1/square root of precision, the bound is $O(\varepsilon^{-3/2})$
• Other forms of “access” to $a_i$’s?
Thank you!
