This talk presents streaming algorithms for estimating entropy (Shannon, Rényi, and Tsallis), with applications to network monitoring and anomaly detection. The authors give additive and multiplicative (1±ε)-approximation algorithms that are space-efficient and handle deletions.
Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)
Streaming Model
• A vector x ∈ ℤⁿ starts at x = (0, 0, 0, 0, …, 0) and receives m updates, e.g.:
  • Increment x1: x = (1, 0, 0, 0, …, 0)
  • Increment x4: x = (1, 0, 0, 1, …, 0)
  • … after all m updates: x = (9, 2, 0, 5, …, 12)
• Goal: compute statistics of x, e.g. ||x||1, ||x||2, …
• Trivial solution: store x (or store all updates), O(n·log(m)) space
• Goal: compute using O(polylog(nm)) space
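As a concrete illustration of the model (a minimal sketch of our own; the class name and the example stream are invented), here is the trivial solution in Python: it stores all of x, which is exactly the O(n·log m) cost a streaming algorithm must avoid.

    import math

    class TrivialStream:
        # Trivial baseline: stores the whole vector x, using O(n log m) bits.
        def __init__(self, n):
            self.x = [0] * n

        def update(self, i, delta=1):
            self.x[i] += delta                # delta = -1 models a deletion

        def l1(self):
            return sum(abs(v) for v in self.x)

        def l2(self):
            return math.sqrt(sum(v * v for v in self.x))

    s = TrivialStream(5)
    for i in [0, 3, 0, 3, 4]:                 # a stream of m = 5 increments
        s.update(i)
    print(s.l1(), s.l2())                     # 5 and sqrt(9) = 3.0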
Streaming Algorithms (a very brief introduction)
• Fact: [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Indyk-Woodruff ’05], [Bhuvanagiri et al. ’06], [Indyk ’06], [Li ’08], [Li ’09]: can compute (1±ε)Fp, where Fp = Σi |xi|^p, using
  • O(ε⁻² log^c n) bits of space (if 0 ≤ p ≤ 2)
  • O(ε^−O(1) · n^(1−2/p) · log^O(1)(n)) bits (if 2 < p)
• Another Fact: these bounds are mostly optimal: [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Saks-Sun ’02], [Chakrabarti-Khot-Sun ’03], [Indyk-Woodruff ’03], [Woodruff ’04]
• Proofs use communication complexity and information theory
Practical Motivation
• General goal: dealing with massive data sets
  • Internet traffic, large databases, …
• Network monitoring & anomaly detection
  • Stream consists of internet packets
  • xi = # packets sent to port i
  • Under typical conditions, x is very concentrated
  • Under a “port scan attack”, x is less concentrated
  • Can detect by estimating empirical entropy [Lakhina et al. ’05], [Xu et al. ’05], [Zhao et al. ’07]
Entropy
• Probability distribution a = (a1, a2, …, an)
• Entropy H(a) = −Σi ai lg(ai)
• Examples:
  • a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
  • a = (0, …, 0, 1, 0, …, 0): H(a) = 0
• small when concentrated, LARGE when not
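A quick numerical check of the two examples (a minimal sketch; the function name is ours):

    from math import log2

    def shannon_entropy(a):
        # H(a) = -sum_i a_i * lg(a_i), with the convention 0 * lg(0) = 0
        return -sum(p * log2(p) for p in a if p > 0)

    n = 8
    print(shannon_entropy([1 / n] * n))       # uniform: lg(8) = 3.0
    print(shannon_entropy([0, 0, 1, 0]))      # point mass: 0.0 (concentrated)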
Streaming Algorithms for Entropy
• How much space to estimate H(x)?
• [Guha-McGregor-Venkatasubramanian ’06], [Chakrabarti-Do Ba-Muthu ’06], [Bhuvanagiri-Ganguly ’06]
• [Chakrabarti-Cormode-McGregor ’07]:
  • multiplicative (1±ε) approx: O(ε⁻² log² m) bits
  • additive ε approx: O(ε⁻² log⁴ m) bits
  • Ω(ε⁻²) lower bound for both
• Our contributions:
  • Additive ε or multiplicative (1±ε) approximation
  • Õ(ε⁻² log³ m) bits, and can handle deletions
  • Can sketch entropy in the same space
First Idea
• If you can estimate Fp for p ≈ 1, then you can estimate H(x)
• Why? Rényi entropy
Review of Rényi
• Definition: Hp(x) = lg(Σi ai^p) / (1 − p)
• Convergence to Shannon: Hp(x) → H(x) as p → 1
[plot: Hp(x) as a function of p; photos of Claude Shannon and Alfréd Rényi]
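The convergence is easy to see numerically (our sketch; the distribution is an arbitrary example):

    from math import log2

    def renyi_entropy(a, p):
        # H_p(a) = lg(sum_i a_i^p) / (1 - p), defined for p != 1
        return log2(sum(q ** p for q in a if q > 0)) / (1 - p)

    a = [0.5, 0.25, 0.125, 0.125]
    shannon = -sum(q * log2(q) for q in a)    # 1.75 bits
    for p in [2.0, 1.1, 1.01, 1.001]:
        print(p, renyi_entropy(a, p))         # approaches 1.75 as p -> 1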
Overview of Algorithm
• Set p = 1.01, and consider the empirical distribution x/||x||1
• Compute F̃p = (1±ε)Fp (using Li’s “compressed counting”)
• Set H̃p = lg(F̃p / F1^p) / (1 − p)
• So H̃p = Hp(x) ± O(ε/|1 − p|)
• Analysis: as p → 1, the gap |Hp(x) − H(x)| gets better, but the ±O(ε/|1 − p|) estimation error gets worse!
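In exact arithmetic the estimator is just a few lines (a sketch of ours: the real algorithm replaces frequency_moment with Li’s compressed-counting sketch, which returns (1±ε)Fp in small space):

    from math import log2

    def frequency_moment(x, p):
        # F_p = sum_i |x_i|^p; the streaming algorithm only gets a (1±eps) estimate
        return sum(abs(v) ** p for v in x if v != 0)

    def renyi_from_moments(x, p):
        # H_p of the empirical distribution x/||x||_1, i.e. lg(F_p / F_1^p) / (1 - p)
        f1, fp = frequency_moment(x, 1), frequency_moment(x, p)
        return log2(fp / f1 ** p) / (1 - p)

    x = [9, 2, 0, 5, 12]
    print(renyi_from_moments(x, 1.01))        # close to Shannon entropy of x/||x||_1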
Making the Tradeoff
• How quickly does Hp(x) converge to H(x)?
• Theorem: let x be a distribution with mini xi ≥ 1/m. Taking p close enough to 1 (as a function of ε and log m) makes Hp(x) a multiplicative (1±ε) approximation to H(x); a slightly closer choice makes it an additive ±ε approximation.
• Plugging in: O(ε⁻³ log⁴ m) bits of space suffice for an additive approximation
Proof: A Trick Worth Remembering
• l’Hôpital’s rule: let f : ℝ → ℝ and g : ℝ → ℝ be such that lim f(z) = lim g(z) = 0 and lim f′(z)/g′(z) exists; then lim f(z)/g(z) = lim f′(z)/g′(z)
• It actually says more! It says f(z)/g(z) converges to its limit at least as fast as f′(z)/g′(z) does.
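To see the trick in action here (a worked step of our own, using the definitions above with a = x/||x||1): Hp is a ratio of two functions that both vanish at p = 1, and l’Hôpital’s rule recovers Shannon entropy in the limit:

    \[
      H_p(x) \;=\; \frac{f(p)}{g(p)}, \qquad
      f(p) \;=\; \lg\Big(\sum_i a_i^{\,p}\Big), \qquad
      g(p) \;=\; 1 - p,
    \]
    \[
      f(1) = g(1) = 0, \qquad
      \lim_{p \to 1} H_p(x) \;=\; \frac{f'(1)}{g'(1)}
        \;=\; \frac{\sum_i a_i \lg a_i}{-1} \;=\; H(x).
    \]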
Improvements
[plot: Hp(x) vs. p, comparing Shannon (the limit at p = 1), a single Rényi estimate, and multiple Rényi estimates used for interpolation]
• Status: additive ε approx using O(ε⁻³ log⁴ m) bits
• How to reduce space further?
• Interpolate with multiple points: Hp1(x), Hp2(x), …
Analyzing Interpolation
• Let f(z) be a C^(k+1) function
• Interpolate f with the polynomial q satisfying q(zi) = f(zi), 0 ≤ i ≤ k
• Fact: f(y) − q(y) = f^(k+1)(ξ) · ∏i (y − zi) / (k+1)! for some ξ ∈ [a, b], where y, zi ∈ [a, b]
• Our case: set f(z) = H1+z(x)
• Goal: analyze f^(k+1)(z)
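Here is the interpolation step in miniature (our sketch, with exact moments; the real algorithm feeds in noisy estimates): interpolate f(z) = H1+z(x) at a few points z < 0 and read off the value at z = 0.

    from math import log2

    def f(x, z):
        # f(z) = H_{1+z}(x) for the empirical distribution of x (z != 0)
        s = sum(x)
        return log2(sum((v / s) ** (1 + z) for v in x if v > 0)) / (-z)

    def lagrange_eval(zs, fs, y):
        # evaluate at y the unique degree-k polynomial through (zs[i], fs[i])
        total = 0.0
        for i, zi in enumerate(zs):
            w = fs[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (y - zj) / (zi - zj)
            total += w
        return total

    x = [9, 2, 0, 5, 12]
    zs = [-0.04, -0.03, -0.02, -0.01]
    print(lagrange_eval(zs, [f(x, z) for z in zs], 0.0))   # ~ Shannon entropy of x/||x||_1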
Bounding Derivatives
• Rényi derivatives are messy to analyze
• Switch to Tsallis entropy: f(z) = S1+z(x), where Sq(x) = (1 − Σi ai^q) / (q − 1)
• Can prove Tsallis also converges to Shannon
• Fact: the derivatives f^(k+1) on [a, b] are small enough (when a = −O(1/(k·log m)), b = 0) that one can set k = log(1/ε) + loglog m
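Numerically, the Tsallis version behaves just like the Rényi one (our sketch; note that Sq converges to Shannon entropy in nats, i.e. with natural logarithms):

    from math import log

    def tsallis(a, q):
        # S_q(a) = (1 - sum_i a_i^q) / (q - 1); -> -sum_i a_i ln(a_i) as q -> 1
        return (1 - sum(p ** q for p in a if p > 0)) / (q - 1)

    a = [0.5, 0.25, 0.125, 0.125]
    shannon_nats = -sum(p * log(p) for p in a)
    for q in [1.1, 1.01, 1.001]:
        print(q, tsallis(a, q), shannon_nats)   # converges as q -> 1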
Key Ingredient: Noisy Interpolation
• We don’t have f(zi); we have f(zi) ± ε
• How to interpolate in the presence of noise?
• Idea: pick the zi very carefully
Chebyshev Polynomials
• Rogosinski’s Theorem: if q(x) has degree k and |q(βj)| ≤ 1 at the Chebyshev extreme points βj = cos(jπ/k) (0 ≤ j ≤ k), then |q(x)| ≤ |Tk(x)| for |x| > 1
• Map [−1, 1] onto the interpolation interval [z0, zk]
• Choose zj to be the image of βj, j = 0, …, k
• Let q̃(z) interpolate the noisy values f(zj) ± ε, and let q(z) interpolate f(zj)
• r(z) = (q̃(z) − q(z))/ε satisfies Rogosinski’s conditions!
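A small numeric check of this mechanism (entirely our sketch: the interval, test function, and noise level are arbitrary): perturb samples at the mapped Chebyshev points by ±ε, interpolate, and compare the resulting error at z = 0 against the ε·|Tk(preimage(0))| bound.

    import math, random

    k, eps = 4, 1e-3
    z0, zk = -0.5, -0.05                        # interpolation interval (left of 0)
    betas = [math.cos(j * math.pi / k) for j in range(k + 1)]   # Chebyshev extreme points
    zs = [z0 + (b + 1) * (zk - z0) / 2 for b in betas]          # image in [z0, zk]

    def lagrange_eval(zs, fs, y):
        total = 0.0
        for i, zi in enumerate(zs):
            w = fs[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (y - zj) / (zi - zj)
            total += w
        return total

    f = math.exp                                # any smooth test function
    clean = [f(z) for z in zs]
    noisy = [v + random.uniform(-eps, eps) for v in clean]
    err = abs(lagrange_eval(zs, noisy, 0.0) - lagrange_eval(zs, clean, 0.0))

    t = 2 * (0.0 - z0) / (zk - z0) - 1          # preimage of 0 in [-1, 1] coordinates
    Tk = math.cosh(k * math.acosh(t))           # |T_k(t)| for t > 1
    print(err, eps * Tk)                        # err <= eps * |T_k(preimage(0))|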
Tradeoff in Choosing zk
• Tk grows quickly once it leaves [z0, zk]
• zk close to 0 ⇒ |Tk(preimage(0))| still small
• …but zk close to 0 ⇒ high space complexity
• Just how close do we need 0 and zk to be?
[diagram: the interpolation interval [z0, zk] sitting just to the left of 0 on the real line]
The Magic of Chebyshev
• [Paturi ’92]: Tk(1 + 1/k^c) ≤ e^(4k^(1−c/2)). Set c = 2: then Tk(1 + 1/k²) ≤ e⁴ = O(1).
• It suffices to set zk = −O(1/(k³·log m))
• Translates to Õ(ε⁻² log³ m) space
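A quick sanity check of Paturi’s bound with c = 2 (our sketch): just outside [−1, 1], at 1 + 1/k², Tk stays bounded by a constant for every k.

    import math

    def cheb(k, t):
        # T_k(t) for t >= 1, via the identity T_k(t) = cosh(k * acosh(t))
        return math.cosh(k * math.acosh(t))

    for k in [4, 16, 64, 256]:
        print(k, cheb(k, 1 + 1 / k ** 2))       # stays below e^4 ~ 54.6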
The Final Algorithm (additive approximation)
• Set k = lg(1/ε) + lglg(m) and zj = (k²·cos(jπ/k) − (k²+1)) / (9k³·lg(m)) for 0 ≤ j ≤ k
• Estimate S̃1+zj = (1 − F̃1+zj/(F̃1)^(1+zj)) / zj for 0 ≤ j ≤ k
• Interpolate the degree-k polynomial q with q(zj) = S̃1+zj
• Output q(0)
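Putting the pieces together, here is an end-to-end sketch in Python. To keep it self-contained we compute the moments F1+zj exactly; the actual algorithm replaces these with (1±ε) compressed-counting estimates, which is where the Õ(ε⁻² log³ m) space bound comes from. The output is in nats, since Tsallis entropy converges to −Σi ai·ln(ai).

    import math

    def additive_entropy_estimate(x, eps):
        # Sketch of the additive algorithm, with exact moments standing in
        # for the (1±eps) compressed-counting estimates F~_{1+z_j}.
        m = sum(abs(v) for v in x)
        k = max(2, round(math.log2(1 / eps) + math.log2(math.log2(m))))
        zs = [(k ** 2 * math.cos(j * math.pi / k) - (k ** 2 + 1)) / (9 * k ** 3 * math.log2(m))
              for j in range(k + 1)]
        S = []
        for z in zs:
            fpz = sum(abs(v) ** (1 + z) for v in x if v != 0)   # stand-in for F~_{1+z}
            S.append((1 - fpz / m ** (1 + z)) / z)              # Tsallis estimate S~_{1+z}
        est = 0.0                      # evaluate the interpolating polynomial q at 0
        for i, zi in enumerate(zs):
            w = S[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (0.0 - zj) / (zi - zj)
            est += w
        return est

    x = [9, 2, 0, 5, 12]
    m = sum(x)
    print(additive_entropy_estimate(x, 0.01))
    print(-sum(v / m * math.log(v / m) for v in x if v))        # true entropy (nats)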
Multiplicative Approximation
• How to get a multiplicative approximation?
• An additive approximation is already multiplicative, unless H(x) is small
• H(x) small ⇒ one frequency is large [CCM ’07]
• Suppose x1 is the largest frequency, and define the residual moments RFp = Σi>1 xi^p
• We combine (1±ε)RF1 and (1±ε)RF1+zj to get (1±ε)f(zj)
• Question: how do we get (1±ε)RFp?
• Two different approaches:
  • A general approach (for any p, and negative frequencies)
  • An approach exploiting p ≈ 1, only for nonnegative frequencies (better by a log(m) factor)
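For reference, the residual moment in exact arithmetic (our sketch; the streaming algorithm must of course estimate this from a sketch rather than from x itself):

    def residual_moment(x, p):
        # RF_p = sum of |x_i|^p over all i except one maximizer of |x_i|
        i_max = max(range(len(x)), key=lambda i: abs(x[i]))
        return sum(abs(v) ** p for i, v in enumerate(x) if i != i_max and v != 0)

    print(residual_moment([9, 2, 0, 5, 12], 1.01))   # drops the heavy entry 12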
Questions / Thoughts
• For what other problems can we use this “generalize-then-interpolate” strategy?
  • Some non-streaming problems too?
• The power of moments?
• The power of residual moments?
  • CountMin [CM ’05] + CountSketch [CCF ’02], HSS [Ganguly et al.]
• WANTED: Faster moment estimation (some progress in [Cormode-Ganguly ’07])