This talk presents streaming algorithms for estimating entropy (Shannon, Rényi, and Tsallis), with applications to network monitoring and anomaly detection. The authors give additive and multiplicative (1±ε)-approximation algorithms that are space-efficient and handle deletions.
Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)
Streaming Model
• A vector x ∈ ℤⁿ starts at x = (0, 0, 0, 0, …, 0) and receives m updates, e.g.:
  • Increment x1: x = (1, 0, 0, 0, …, 0)
  • Increment x4: x = (1, 0, 0, 1, …, 0)
  • … after all m updates: x = (9, 2, 0, 5, …, 12)
• Goal: compute statistics of x, e.g. ||x||1, ||x||2, …
• Trivial solution: store x (or store all updates), O(n·log(m)) space
• Goal: compute using O(polylog(nm)) space
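As a concrete illustration of the model (a minimal sketch of our own; the class name and the example stream are invented), here is the trivial solution in Python: it stores all of x, which is exactly the O(n·log m) cost a streaming algorithm must avoid.

    import math

    class TrivialStream:
        # Trivial baseline: stores the whole vector x, using O(n log m) bits.
        def __init__(self, n):
            self.x = [0] * n

        def update(self, i, delta=1):
            self.x[i] += delta                # delta = -1 models a deletion

        def l1(self):
            return sum(abs(v) for v in self.x)

        def l2(self):
            return math.sqrt(sum(v * v for v in self.x))

    s = TrivialStream(5)
    for i in [0, 3, 0, 3, 4]:                 # a stream of m = 5 increments
        s.update(i)
    print(s.l1(), s.l2())                     # 5 and sqrt(9) = 3.0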
Streaming Algorithms (a very brief introduction)
• Fact: [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Indyk-Woodruff ’05], [Bhuvanagiri et al. ’06], [Indyk ’06], [Li ’08], [Li ’09]: can compute (1±ε)Fp, where Fp = Σi |xi|^p, using
  • O(ε⁻² log^c n) bits of space (if 0 ≤ p ≤ 2)
  • O(ε^−O(1) · n^(1−2/p) · log^O(1)(n)) bits (if 2 < p)
• Another Fact: these bounds are mostly optimal: [Alon-Matias-Szegedy ’99], [Bar-Yossef et al. ’02], [Saks-Sun ’02], [Chakrabarti-Khot-Sun ’03], [Indyk-Woodruff ’03], [Woodruff ’04]
• Proofs use communication complexity and information theory
Practical Motivation
• General goal: dealing with massive data sets
  • Internet traffic, large databases, …
• Network monitoring & anomaly detection
  • Stream consists of internet packets
  • xi = # packets sent to port i
  • Under typical conditions, x is very concentrated
  • Under a “port scan attack”, x is less concentrated
  • Can detect by estimating empirical entropy [Lakhina et al. ’05], [Xu et al. ’05], [Zhao et al. ’07]
Entropy
• Probability distribution a = (a1, a2, …, an)
• Entropy H(a) = −Σi ai lg(ai)
• Examples:
  • a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
  • a = (0, …, 0, 1, 0, …, 0): H(a) = 0
• small when concentrated, LARGE when not
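A quick numerical check of the two examples (a minimal sketch; the function name is ours):

    from math import log2

    def shannon_entropy(a):
        # H(a) = -sum_i a_i * lg(a_i), with the convention 0 * lg(0) = 0
        return -sum(p * log2(p) for p in a if p > 0)

    n = 8
    print(shannon_entropy([1 / n] * n))       # uniform: lg(8) = 3.0
    print(shannon_entropy([0, 0, 1, 0]))      # point mass: 0.0 (concentrated)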
Streaming Algorithms for Entropy
• How much space to estimate H(x)?
• [Guha-McGregor-Venkatasubramanian ’06], [Chakrabarti-Do Ba-Muthu ’06], [Bhuvanagiri-Ganguly ’06]
• [Chakrabarti-Cormode-McGregor ’07]:
  • multiplicative (1±ε) approx: O(ε⁻² log² m) bits
  • additive ε approx: O(ε⁻² log⁴ m) bits
  • Ω(ε⁻²) lower bound for both
• Our contributions:
  • Additive ε or multiplicative (1±ε) approximation
  • Õ(ε⁻² log³ m) bits, and can handle deletions
  • Can sketch entropy in the same space
First Idea
• If you can estimate Fp for p ≈ 1, then you can estimate H(x)
• Why? Rényi entropy
Review of Rényi
• Definition: Hp(x) = lg(Σi ai^p) / (1 − p)
• Convergence to Shannon: Hp(x) → H(x) as p → 1
[plot: Hp(x) as a function of p; photos of Claude Shannon and Alfréd Rényi]
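The convergence is easy to see numerically (our sketch; the distribution is an arbitrary example):

    from math import log2

    def renyi_entropy(a, p):
        # H_p(a) = lg(sum_i a_i^p) / (1 - p), defined for p != 1
        return log2(sum(q ** p for q in a if q > 0)) / (1 - p)

    a = [0.5, 0.25, 0.125, 0.125]
    shannon = -sum(q * log2(q) for q in a)    # 1.75 bits
    for p in [2.0, 1.1, 1.01, 1.001]:
        print(p, renyi_entropy(a, p))         # approaches 1.75 as p -> 1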
Overview of Algorithm
• Set p = 1.01, and consider the empirical distribution x/||x||1
• Compute F̃p = (1±ε)Fp (using Li’s “compressed counting”)
• Set H̃p = lg(F̃p / F1^p) / (1 − p)
• So H̃p = Hp(x) ± O(ε/|1 − p|)
• Analysis: as p → 1, the gap |Hp(x) − H(x)| gets better, but the ±O(ε/|1 − p|) estimation error gets worse!
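In exact arithmetic the estimator is just a few lines (a sketch of ours: the real algorithm replaces frequency_moment with Li’s compressed-counting sketch, which returns (1±ε)Fp in small space):

    from math import log2

    def frequency_moment(x, p):
        # F_p = sum_i |x_i|^p; the streaming algorithm only gets a (1±eps) estimate
        return sum(abs(v) ** p for v in x if v != 0)

    def renyi_from_moments(x, p):
        # H_p of the empirical distribution x/||x||_1, i.e. lg(F_p / F_1^p) / (1 - p)
        f1, fp = frequency_moment(x, 1), frequency_moment(x, p)
        return log2(fp / f1 ** p) / (1 - p)

    x = [9, 2, 0, 5, 12]
    print(renyi_from_moments(x, 1.01))        # close to Shannon entropy of x/||x||_1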
Making the Tradeoff
• How quickly does Hp(x) converge to H(x)?
• Theorem: let x be a distribution with mini xi ≥ 1/m. Taking p close enough to 1 (as a function of ε and log m) makes Hp(x) a multiplicative (1±ε) approximation to H(x); a slightly closer choice makes it an additive ±ε approximation.
• Plugging in: O(ε⁻³ log⁴ m) bits of space suffice for an additive approximation
Proof: A Trick Worth Remembering
• l’Hôpital’s rule: let f : ℝ → ℝ and g : ℝ → ℝ be such that lim f(z) = lim g(z) = 0 and lim f′(z)/g′(z) exists; then lim f(z)/g(z) = lim f′(z)/g′(z)
• It actually says more! It says f(z)/g(z) converges to its limit at least as fast as f′(z)/g′(z) does.
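To see the trick in action here (a worked step of our own, using the definitions above with a = x/||x||1): Hp is a ratio of two functions that both vanish at p = 1, and l’Hôpital’s rule recovers Shannon entropy in the limit:

    \[
      H_p(x) \;=\; \frac{f(p)}{g(p)}, \qquad
      f(p) \;=\; \lg\Big(\sum_i a_i^{\,p}\Big), \qquad
      g(p) \;=\; 1 - p,
    \]
    \[
      f(1) = g(1) = 0, \qquad
      \lim_{p \to 1} H_p(x) \;=\; \frac{f'(1)}{g'(1)}
        \;=\; \frac{\sum_i a_i \lg a_i}{-1} \;=\; H(x).
    \]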
Improvements
[plot: Hp(x) vs. p, comparing Shannon (the limit at p = 1), a single Rényi estimate, and multiple Rényi estimates used for interpolation]
• Status: additive ε approx using O(ε⁻³ log⁴ m) bits
• How to reduce space further?
• Interpolate with multiple points: Hp1(x), Hp2(x), …
Analyzing Interpolation
• Let f(z) be a C^(k+1) function
• Interpolate f with the polynomial q satisfying q(zi) = f(zi), 0 ≤ i ≤ k
• Fact: f(y) − q(y) = f^(k+1)(ξ) · ∏i (y − zi) / (k+1)! for some ξ ∈ [a, b], where y, zi ∈ [a, b]
• Our case: set f(z) = H1+z(x)
• Goal: analyze f^(k+1)(z)
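Here is the interpolation step in miniature (our sketch, with exact moments; the real algorithm feeds in noisy estimates): interpolate f(z) = H1+z(x) at a few points z < 0 and read off the value at z = 0.

    from math import log2

    def f(x, z):
        # f(z) = H_{1+z}(x) for the empirical distribution of x (z != 0)
        s = sum(x)
        return log2(sum((v / s) ** (1 + z) for v in x if v > 0)) / (-z)

    def lagrange_eval(zs, fs, y):
        # evaluate at y the unique degree-k polynomial through (zs[i], fs[i])
        total = 0.0
        for i, zi in enumerate(zs):
            w = fs[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (y - zj) / (zi - zj)
            total += w
        return total

    x = [9, 2, 0, 5, 12]
    zs = [-0.04, -0.03, -0.02, -0.01]
    print(lagrange_eval(zs, [f(x, z) for z in zs], 0.0))   # ~ Shannon entropy of x/||x||_1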
Bounding Derivatives
• Rényi derivatives are messy to analyze
• Switch to Tsallis entropy: f(z) = S1+z(x), where Sq(x) = (1 − Σi ai^q) / (q − 1)
• Can prove Tsallis also converges to Shannon
• Fact: the derivatives f^(k+1) on [a, b] are small enough (when a = −O(1/(k·log m)), b = 0) that one can set k = log(1/ε) + loglog m
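Numerically, the Tsallis version behaves just like the Rényi one (our sketch; note that Sq converges to Shannon entropy in nats, i.e. with natural logarithms):

    from math import log

    def tsallis(a, q):
        # S_q(a) = (1 - sum_i a_i^q) / (q - 1); -> -sum_i a_i ln(a_i) as q -> 1
        return (1 - sum(p ** q for p in a if p > 0)) / (q - 1)

    a = [0.5, 0.25, 0.125, 0.125]
    shannon_nats = -sum(p * log(p) for p in a)
    for q in [1.1, 1.01, 1.001]:
        print(q, tsallis(a, q), shannon_nats)   # converges as q -> 1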
Key Ingredient: Noisy Interpolation
• We don’t have f(zi); we have f(zi) ± ε
• How to interpolate in the presence of noise?
• Idea: pick the zi very carefully
Chebyshev Polynomials
• Rogosinski’s Theorem: if q(x) has degree k and |q(βj)| ≤ 1 at the Chebyshev extreme points βj = cos(jπ/k) (0 ≤ j ≤ k), then |q(x)| ≤ |Tk(x)| for |x| > 1
• Map [−1, 1] onto the interpolation interval [z0, zk]
• Choose zj to be the image of βj, j = 0, …, k
• Let q̃(z) interpolate the noisy values f(zj) ± ε, and let q(z) interpolate f(zj)
• r(z) = (q̃(z) − q(z))/ε satisfies Rogosinski’s conditions!
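A small numeric check of this mechanism (entirely our sketch: the interval, test function, and noise level are arbitrary): perturb samples at the mapped Chebyshev points by ±ε, interpolate, and compare the resulting error at z = 0 against the ε·|Tk(preimage(0))| bound.

    import math, random

    k, eps = 4, 1e-3
    z0, zk = -0.5, -0.05                        # interpolation interval (left of 0)
    betas = [math.cos(j * math.pi / k) for j in range(k + 1)]   # Chebyshev extreme points
    zs = [z0 + (b + 1) * (zk - z0) / 2 for b in betas]          # image in [z0, zk]

    def lagrange_eval(zs, fs, y):
        total = 0.0
        for i, zi in enumerate(zs):
            w = fs[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (y - zj) / (zi - zj)
            total += w
        return total

    f = math.exp                                # any smooth test function
    clean = [f(z) for z in zs]
    noisy = [v + random.uniform(-eps, eps) for v in clean]
    err = abs(lagrange_eval(zs, noisy, 0.0) - lagrange_eval(zs, clean, 0.0))

    t = 2 * (0.0 - z0) / (zk - z0) - 1          # preimage of 0 in [-1, 1] coordinates
    Tk = math.cosh(k * math.acosh(t))           # |T_k(t)| for t > 1
    print(err, eps * Tk)                        # err <= eps * |T_k(preimage(0))|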
Tradeoff in Choosing zk
• Tk grows quickly once it leaves [z0, zk]
• zk close to 0 ⇒ |Tk(preimage(0))| still small
• …but zk close to 0 ⇒ high space complexity
• Just how close do we need 0 and zk to be?
[diagram: the interpolation interval [z0, zk] sitting just to the left of 0 on the real line]
The Magic of Chebyshev
• [Paturi ’92]: Tk(1 + 1/k^c) ≤ e^(4k^(1−c/2)). Set c = 2: then Tk(1 + 1/k²) ≤ e⁴ = O(1).
• It suffices to set zk = −O(1/(k³·log m))
• Translates to Õ(ε⁻² log³ m) space
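A quick sanity check of Paturi’s bound with c = 2 (our sketch): just outside [−1, 1], at 1 + 1/k², Tk stays bounded by a constant for every k.

    import math

    def cheb(k, t):
        # T_k(t) for t >= 1, via the identity T_k(t) = cosh(k * acosh(t))
        return math.cosh(k * math.acosh(t))

    for k in [4, 16, 64, 256]:
        print(k, cheb(k, 1 + 1 / k ** 2))       # stays below e^4 ~ 54.6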
The Final Algorithm (additive approximation)
• Set k = lg(1/ε) + lglg(m) and zj = (k²·cos(jπ/k) − (k²+1)) / (9k³·lg(m)) for 0 ≤ j ≤ k
• Estimate S̃1+zj = (1 − F̃1+zj/(F̃1)^(1+zj)) / zj for 0 ≤ j ≤ k
• Interpolate the degree-k polynomial q with q(zj) = S̃1+zj
• Output q(0)
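Putting the pieces together, here is an end-to-end sketch in Python. To keep it self-contained we compute the moments F1+zj exactly; the actual algorithm replaces these with (1±ε) compressed-counting estimates, which is where the Õ(ε⁻² log³ m) space bound comes from. The output is in nats, since Tsallis entropy converges to −Σi ai·ln(ai).

    import math

    def additive_entropy_estimate(x, eps):
        # Sketch of the additive algorithm, with exact moments standing in
        # for the (1±eps) compressed-counting estimates F~_{1+z_j}.
        m = sum(abs(v) for v in x)
        k = max(2, round(math.log2(1 / eps) + math.log2(math.log2(m))))
        zs = [(k ** 2 * math.cos(j * math.pi / k) - (k ** 2 + 1)) / (9 * k ** 3 * math.log2(m))
              for j in range(k + 1)]
        S = []
        for z in zs:
            fpz = sum(abs(v) ** (1 + z) for v in x if v != 0)   # stand-in for F~_{1+z}
            S.append((1 - fpz / m ** (1 + z)) / z)              # Tsallis estimate S~_{1+z}
        est = 0.0                      # evaluate the interpolating polynomial q at 0
        for i, zi in enumerate(zs):
            w = S[i]
            for j, zj in enumerate(zs):
                if j != i:
                    w *= (0.0 - zj) / (zi - zj)
            est += w
        return est

    x = [9, 2, 0, 5, 12]
    m = sum(x)
    print(additive_entropy_estimate(x, 0.01))
    print(-sum(v / m * math.log(v / m) for v in x if v))        # true entropy (nats)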
Multiplicative Approximation
• How to get a multiplicative approximation?
• An additive approximation is already multiplicative, unless H(x) is small
• H(x) small ⇒ one frequency is large [CCM ’07]
• Suppose x1 is the largest frequency, and define the residual moments RFp = Σi>1 xi^p
• We combine (1±ε)RF1 and (1±ε)RF1+zj to get (1±ε)f(zj)
• Question: how do we get (1±ε)RFp?
• Two different approaches:
  • A general approach (for any p, and negative frequencies)
  • An approach exploiting p ≈ 1, only for nonnegative frequencies (better by a log(m) factor)
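For reference, the residual moment in exact arithmetic (our sketch; the streaming algorithm must of course estimate this from a sketch rather than from x itself):

    def residual_moment(x, p):
        # RF_p = sum of |x_i|^p over all i except one maximizer of |x_i|
        i_max = max(range(len(x)), key=lambda i: abs(x[i]))
        return sum(abs(v) ** p for i, v in enumerate(x) if i != i_max and v != 0)

    print(residual_moment([9, 2, 0, 5, 12], 1.01))   # drops the heavy entry 12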
Questions / Thoughts
• For what other problems can we use this “generalize-then-interpolate” strategy?
  • Some non-streaming problems too?
• The power of moments?
• The power of residual moments?
  • CountMin [CM ’05] + CountSketch [CCF ’02], HSS [Ganguly et al.]
• WANTED: Faster moment estimation (some progress in [Cormode-Ganguly ’07])