
Sketching and Streaming Entropy via Approximation Theory

This talk presents streaming algorithms for estimating entropy measures such as Shannon and Rényi entropy, with applications in network monitoring and anomaly detection. The authors give additive and multiplicative approximation algorithms that are space-efficient and can handle deletions.


Presentation Transcript


  1. Sketching and Streaming Entropy via Approximation Theory. Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)

  2. Streaming Model • A vector x ∈ ℤ^n starts as x = (0, 0, 0, 0, …, 0) and receives a stream of m updates such as "increment x_1" and "increment x_4": x = (1, 0, 0, 0, …, 0), then x = (1, 0, 0, 1, …, 0), eventually e.g. x = (9, 2, 0, 5, …, 12) • Goal: compute statistics of x, e.g. ||x||_1, ||x||_2, … • Trivial solution: store x (or store all updates), using O(n·log(m)) space • Goal: compute using O(polylog(nm)) space
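A minimal, hedged sketch of this model in Python (names like `increment` are illustrative, not from the paper). It shows the trivial solution that stores all of x in O(n·log m) space, which the paper's sketches are designed to avoid.

```python
n = 5
x = [0] * n                        # x in Z^n, initially all zeros

def increment(i, delta=1):
    """Process one stream update: add delta to x_i (delta < 0 models deletions)."""
    x[i] += delta

# A stream of m = 5 updates:
for i in [0, 3, 0, 1, 3]:
    increment(i)

l1 = sum(abs(v) for v in x)            # ||x||_1
l2 = sum(v * v for v in x) ** 0.5      # ||x||_2
print(x, l1, l2)                       # [2, 1, 0, 2, 0] 5 3.0
```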

  3. Streaming Algorithms (a very brief introduction) • Fact: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Indyk-Woodruff '05], [Bhuvanagiri et al. '06], [Indyk '06], [Li '08], [Li '09] can compute a (1±ε)-approximation of F_p = Σ_i |x_i|^p using O(ε^(-2)·log^c n) bits of space (if 0 ≤ p ≤ 2), or O(ε^(-O(1))·n^(1-2/p)·log^O(1)(n)) bits (if 2 < p) • Another fact: these bounds are mostly optimal: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Saks-Sun '02], [Chakrabarti-Khot-Sun '03], [Indyk-Woodruff '03], [Woodruff '04] • Proofs use communication complexity and information theory
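As a hedged illustration of this line of work, here is the classic AMS estimator for F_2 = Σ_i x_i^2 ([Alon-Matias-Szegedy '99]). It is illustrative only: it stores a full random sign vector per repetition instead of a 4-wise independent hash family, so it does not achieve the small-space bound quoted above.

```python
import random
import statistics

def ams_f2(stream, n, groups=9, per_group=30):
    """Median-of-means AMS estimate of F2 from a stream of (i, delta) updates."""
    means = []
    for _ in range(groups):
        squares = []
        for _ in range(per_group):
            s = [random.choice((-1, 1)) for _ in range(n)]  # random signs s_i
            z = sum(s[i] * delta for i, delta in stream)    # z = sum_i s_i * x_i
            squares.append(z * z)                           # E[z^2] = F2
        means.append(sum(squares) / per_group)
    return statistics.median(means)

stream = [(0, 1), (3, 1), (0, 1), (1, 1), (3, 1)]  # x = (2, 1, 0, 2, 0)
print(ams_f2(stream, n=5))                          # true F2 = 9
```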

  4. Practical Motivation • General goal: dealing with massive data sets • Internet traffic, large databases, … • Network monitoring & anomaly detection • Stream consists of internet packets; x_i = # packets sent to port i • Under typical conditions, x is very concentrated • Under a "port scan attack", x is less concentrated • Can detect by estimating empirical entropy [Lakhina et al. '05], [Xu et al. '05], [Zhao et al. '07]

  5. Entropy • Probability distribution a = (a_1, a_2, …, a_n) • Entropy H(a) = -Σ_i a_i·lg(a_i) • Examples: a = (1/n, 1/n, …, 1/n) : H(a) = lg(n); a = (0, …, 0, 1, 0, …, 0) : H(a) = 0 • small when concentrated, LARGE when not
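A direct computation of this definition in Python (with the usual convention 0·lg 0 = 0), checking the two examples above:

```python
from math import log2

def shannon_entropy(a):
    """H(a) = -sum_i a_i * lg(a_i), with 0 * lg(0) taken as 0."""
    return -sum(p * log2(p) for p in a if p > 0)

n = 8
print(shannon_entropy([1.0 / n] * n))   # uniform: lg(8) = 3.0
print(shannon_entropy([0, 0, 1, 0]))    # point mass: 0.0
```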

  6. Streaming Algorithms for Entropy • How much space to estimate H(x)? • [Guha-McGregor-Venkatasubramanian '06], [Chakrabarti-Do Ba-Muthu '06], [Bhuvanagiri-Ganguly '06] • [Chakrabarti-Cormode-McGregor '07]: multiplicative (1±ε) approx: O(ε^(-2)·log^2 m) bits; additive ε approx: O(ε^(-2)·log^4 m) bits; Ω(ε^(-2)) lower bound for both • Our contributions: additive ε or multiplicative (1±ε) approximation in Õ(ε^(-2)·log^3 m) bits, which can handle deletions, and can sketch entropy in the same space

  7. First Idea • If you can estimate F_p for p ≈ 1, then you can estimate H(x) • Why? Rényi entropy

  8. Review of Rényi • Definition: H_p(x) = lg(Σ_i x_i^p)/(1-p) • Convergence to Shannon: H_p(x) → H(x) as p → 1 • (Slide figure: plot of H_p(x) against p, with portraits of Claude Shannon and Alfréd Rényi)
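A hedged numerical illustration of this convergence, on an arbitrary example distribution a:

```python
from math import log2

def renyi_entropy(a, p):
    """H_p(a) = lg(sum_i a_i^p) / (1 - p), for p != 1."""
    return log2(sum(q ** p for q in a if q > 0)) / (1 - p)

a = [0.5, 0.25, 0.125, 0.125]            # Shannon entropy H(a) = 1.75
for p in [2.0, 1.1, 1.01, 1.001]:
    print(p, renyi_entropy(a, p))        # tends to 1.75 as p -> 1
```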

  9. Analysis / Overview of Algorithm • Set p = 1.01 and let x̃ = x/||x||_1, so x̃ is a distribution • Compute F̃_p, a (1±ε)-approximation of F_p(x̃) (using Li's "compressed counting") • Set H̃_p = lg(F̃_p)/(1-p) • So H̃_p ≈ H_p(x̃) ≈ H(x̃) • As p → 1, the approximation H_p(x̃) ≈ H(x̃) gets better, but the error amplification from the 1/(1-p) factor gets worse!
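A hedged numeric illustration of that tradeoff (my example values, not the paper's): a (1±ε) multiplicative error in F_p turns into roughly ε/(|1-p|·ln 2) additive error in H_p, which blows up as p → 1.

```python
from math import log2

def hp_from_fp(fp, p):
    """H_p from the moment F_p of a distribution (so F_1 = 1)."""
    return log2(fp) / (1 - p)

a = [0.5, 0.25, 0.125, 0.125]
eps = 0.01
for p in [1.1, 1.01, 1.001]:
    fp = sum(q ** p for q in a)
    err = abs(hp_from_fp(fp * (1 + eps), p) - hp_from_fp(fp, p))
    print(p, err)        # additive error ~ eps / (|1 - p| * ln 2)
```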

  10. Making the Tradeoff • How quickly does H_p(x) converge to H(x)? • Theorem (informal; the exact choices of p are in the paper): let x be a distribution with min_i x_i ≥ 1/m; taking p close enough to 1, as a function of ε and log m, makes H_p(x) a multiplicative (1±ε) approximation of H(x), and taking p closer still makes it an additive ε approximation • Plugging in: O(ε^(-3)·log^4 m) bits of space suffice for an additive ε approximation

  11. Proof: A Trick Worth Remembering • Let f : ℝ → ℝ and g : ℝ → ℝ be such that f(x) → 0 and g(x) → 0 as x → a • l'Hôpital's rule says that lim f(x)/g(x) = lim f'(x)/g'(x) • It actually says more! It says f(x)/g(x) converges to the limit at least as fast as f'(x)/g'(x) does
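One standard way to make that quantitative (my phrasing via the Cauchy mean value theorem, not necessarily the authors' exact statement):

```latex
% If f(a) = g(a) = 0 and g' is nonzero near a, Cauchy's mean value
% theorem gives, for each x, some xi between a and x with
\[
  \frac{f(x)}{g(x)} = \frac{f'(\xi)}{g'(\xi)},
  \qquad\text{hence}\qquad
  \left|\frac{f(x)}{g(x)} - L\right|
  \le \sup_{\xi\,\text{between}\,a\,\text{and}\,x}
      \left|\frac{f'(\xi)}{g'(\xi)} - L\right|,
\]
% so f/g converges to L at least as fast as f'/g' does.
```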

  12. Improvements • Status: additive ε approx using O(ε^(-3)·log^4 m) bits • How to reduce space further? • Interpolate with multiple points: H_p1(x), H_p2(x), … • (Slide figure: H_p(x) against p; legend: Shannon, Multiple Rényis, Single Rényi)

  13. Analyzing Interpolation • Let f(z) be a C^(k+1) function • Interpolate f with polynomial q with q(z_i) = f(z_i), 0 ≤ i ≤ k • Fact: f(y) - q(y) = f^(k+1)(ξ)·Π_(i=0..k) (y - z_i)/(k+1)! for some ξ, where y, z_i, ξ ∈ [a,b] • Our case: set f(z) = H_(1+z)(x) • Goal: analyze f^(k+1)(z)
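A hedged check of this error bound on a toy function (f = exp, whose derivatives are all exp, so |f^(k+1)| ≤ e on [0,1]):

```python
from math import exp, factorial

def lagrange_eval(zs, fs, y):
    """Evaluate the interpolating polynomial q with q(z_i) = f_i at y."""
    total = 0.0
    for i, zi in enumerate(zs):
        w = 1.0
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += fs[i] * w
    return total

zs = [0.0, 0.25, 0.5, 0.75, 1.0]            # k + 1 = 5 nodes in [0, 1]
fs = [exp(z) for z in zs]
y = 0.6
err = abs(exp(y) - lagrange_eval(zs, fs, y))
prod = 1.0
for z in zs:
    prod *= abs(y - z)
bound = exp(1) / factorial(len(zs)) * prod  # sup|f^(k+1)| / (k+1)! * prod|y - z_i|
print(err, bound)                           # err <= bound
```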

  14. Bounding Derivatives • Rényi derivatives are messy to analyze • Switch to Tsallis entropy: define f(z) = S_(1+z)(x), where S_q(x) = (1 - Σ_i x_i^q)/(q - 1) • Can prove Tsallis also converges to Shannon • Fact: the derivative bounds (when a = -O(1/(k·log m)), b = 0) let us set k = log(1/ε) + loglog m
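A hedged numerical check that Tsallis converges to Shannon as q → 1 (note this formula converges to entropy in natural-log units; divide by ln 2 to compare with the lg-based H):

```python
from math import log

def tsallis_entropy(a, q):
    """S_q(a) = (1 - sum_i a_i^q) / (q - 1), for q != 1."""
    return (1 - sum(p ** q for p in a if p > 0)) / (q - 1)

a = [0.5, 0.25, 0.125, 0.125]
shannon_ln = -sum(p * log(p) for p in a)     # Shannon entropy in ln units
for q in [1.1, 1.01, 1.001]:
    print(q, tsallis_entropy(a, q), shannon_ln)
```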

  15. Key Ingredient: Noisy Interpolation • We don't have f(z_i), we have f(z_i) ± ε • How to interpolate in presence of noise? • Idea: we pick our z_i very carefully

  16. Chebyshev Polynomials • Rogosinski's Theorem: if q(x) has degree k and |q(β_j)| ≤ 1 at the Chebyshev extrema β_j = cos(jπ/k) (0 ≤ j ≤ k), then |q(x)| ≤ |T_k(x)| for |x| > 1 • Map [-1,1] onto interpolation interval [z_0, z_k]; choose z_j to be the image of β_j, j = 0, …, k • Let q̃(z) interpolate f(z_j) ± ε and q(z) interpolate f(z_j); then r(z) = (q̃(z) - q(z))/ε satisfies Rogosinski's conditions!
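A hedged demo of why these nodes tame noise: interpolate pure ±ε noise at images of the Chebyshev extrema and check that the resulting polynomial stays below ε·|T_k(x)| just outside the interval (the interval endpoints and ε are my example values):

```python
import math, random

def lagrange_eval(zs, fs, y):
    total = 0.0
    for i, zi in enumerate(zs):
        w = 1.0
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += fs[i] * w
    return total

k, eps = 8, 1e-3
betas = [math.cos(j * math.pi / k) for j in range(k + 1)]  # Chebyshev extrema
z0, zk = -0.5, 0.0
zs = [z0 + (b + 1) * (zk - z0) / 2 for b in betas]         # images in [z0, zk]
noise = [random.choice((-eps, eps)) for _ in zs]           # r(z_j) = +-eps

y = 0.05                                    # slightly outside [z0, zk]
t = 2 * (y - z0) / (zk - z0) - 1            # preimage of y, with t > 1
tk = math.cosh(k * math.acosh(t))           # T_k(t) for t > 1
print(abs(lagrange_eval(zs, noise, y)), eps * tk)  # noise poly <= eps * T_k
```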

  17. Tradeoff in Choosing z_k • T_k grows quickly once leaving [z_0, z_k] • z_k close to 0 ⇒ |T_k(preimage(0))| still small • …but z_k close to 0 ⇒ high space complexity • Just how close do we need 0 and z_k to be?

  18. The Magic of Chebyshev • [Paturi '92]: T_k(1 + 1/k^c) ≤ e^(4·k^(1-c/2)). Set c = 2. • Suffices to set z_k = -O(1/(k^3·log m)) • Translates to Õ(ε^(-2)·log^3 m) space
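A quick numeric sanity check of the quoted bound with c = 2, where it reads T_k(1 + 1/k^2) ≤ e^4 for every k:

```python
import math

for k in [4, 16, 64, 256]:
    x = 1 + 1.0 / k ** 2
    tk = math.cosh(k * math.acosh(x))   # T_k(x) for x >= 1
    print(k, tk, math.exp(4))           # T_k stays far below e^4 ~ 54.6
```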

  19. The Final Algorithm (additive approximation) • Set k = lg(1/ε) + lglg(m) and z_j = (k^2·cos(jπ/k) - (k^2+1))/(9·k^3·lg(m)) (0 ≤ j ≤ k) • Estimate S̃_(1+z_j) = (1 - (F̃_(1+z_j)/(F_1)^(1+z_j)))/z_j for 0 ≤ j ≤ k • Interpolate the degree-k polynomial q̃ with q̃(z_j) = S̃_(1+z_j) • Output q̃(0)
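A hedged, non-streaming mock of this algorithm: the moments F_(1+z_j) are computed exactly from x below, whereas the actual algorithm replaces them with (1±ε) compressed-counting sketches (that substitution is where the space bound comes from); the output is in natural-log units here.

```python
import math

def lagrange_eval(zs, fs, y):
    total = 0.0
    for i, zi in enumerate(zs):
        w = 1.0
        for j, zj in enumerate(zs):
            if j != i:
                w *= (y - zj) / (zi - zj)
        total += fs[i] * w
    return total

x = [9, 2, 0, 5, 12, 1, 1]                       # frequency vector
m = sum(x)                                       # F_1
eps = 0.01
k = int(math.log2(1 / eps) + math.log2(math.log2(m))) + 1

# Nodes z_j from the slide: images of the Chebyshev extrema cos(j*pi/k),
# squeezed into a tiny interval just below 0.
zs = [(k**2 * math.cos(j * math.pi / k) - (k**2 + 1)) / (9 * k**3 * math.log2(m))
      for j in range(k + 1)]

def tsallis_est(z):
    f1z = sum(v ** (1 + z) for v in x if v > 0)  # F_{1+z}, exact stand-in here
    return (1 - f1z / m ** (1 + z)) / z          # S_{1+z} via the moment ratio

q0 = lagrange_eval(zs, [tsallis_est(z) for z in zs], 0.0)
true_h = -sum(v / m * math.log(v / m) for v in x if v > 0)
print(q0, true_h)                                # q0 ~ H(x/m) in ln units
```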

  20. Multiplicative Approximation • How to get a multiplicative approximation? • Additive approximation is already multiplicative, unless H(x) is small • H(x) small ⇒ one frequency dominates, so work with residual moments [CCM '07] • Suppose x_i* is the largest frequency and define the residual moment RF_p = F_p - (x_i*)^p • We combine (1±ε)·RF_1 and (1±ε)·RF_(1+z_j) to get (1±ε)·f(z_j) • Question: how do we get (1±ε)·RF_p? • Two different approaches: a general approach (for any p, and negative frequencies), and an approach exploiting p ≈ 1, only for nonnegative frequencies (better by log(m))

  21. Questions / Thoughts • For what other problems can we use this "generalize-then-interpolate" strategy? Some non-streaming problems too? • The power of moments? The power of residual moments? CountMin (CM '05) + CountSketch (CCF '02) → HSS (Ganguly et al.) • WANTED: faster moment estimation (some progress in [Cormode-Ganguly '07])
