Sketching and Streaming Entropy via Approximation Theory
Nick Harvey (MSR/Waterloo), Jelani Nelson (MIT), Krzysztof Onak (MIT)
Streaming Model
• m updates, e.g. "Increment x4", "Increment x1", …
• x ∈ ℤn, starting from x = (0, 0, 0, 0, …, 0), e.g. x = (1, 0, 0, 0, …, 0) → x = (1, 0, 0, 1, …, 0) → … → x = (9, 2, 0, 5, …, 12)
• Goal: compute statistics, e.g. ||x||1, ||x||2, …
• Trivial solution: store x (or store all updates): O(n·log(m)) space
• Goal: compute using O(polylog(nm)) space
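As a concrete illustration of the model, here is a minimal Python sketch of an AMS-style F2 estimator that processes the updates one at a time and never stores x explicitly. The function name and the explicit sign matrix are illustrative choices of mine; a genuinely small-space implementation would generate the signs from 4-wise independent hash functions rather than storing an r-by-n matrix.

```python
import numpy as np

def ams_f2_estimate(updates, n, r=400, seed=0):
    """Estimate F2 = ||x||_2^2 from a stream of (coordinate, increment)
    updates without materializing x, in the spirit of [AMS '99].
    The r-by-n sign matrix is for clarity only; a real streaming
    algorithm would use 4-wise independent hashing instead."""
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(r, n))
    counters = np.zeros(r)               # the entire "sketch"
    for i, delta in updates:             # one pass over the stream
        counters += delta * signs[:, i]  # counter_j tracks <sign_j, x>
    return float(np.mean(counters ** 2)) # E[<s, x>^2] = ||x||_2^2

stream = [(4, 1), (1, 1), (4, 1), (0, 2)]   # increments to x4, x1, x4, x0
print(ams_f2_estimate(stream, n=10))        # true F2 = 2^2 + 1^2 + 2^2 = 9
```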
Streaming Algorithms (a very brief introduction)
• Fact: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Indyk-Woodruff '05], [Bhuvanagiri et al. '06], [Indyk '06], [Li '08], [Li '09]: can compute F̃p = (1±ε)Fp (where Fp = Σi |xi|^p is the p-th frequency moment) using O(ε^-2 log^c(n)) bits of space (if 0 ≤ p ≤ 2), or O(ε^-O(1)·n^(1-2/p)·log^O(1)(n)) bits (if 2 < p)
• Another Fact: these bounds are mostly optimal: [Alon-Matias-Szegedy '99], [Bar-Yossef et al. '02], [Saks-Sun '02], [Chakrabarti-Khot-Sun '03], [Indyk-Woodruff '03], [Woodruff '04]
• Proofs use communication complexity and information theory
Practical Motivation
• General goal: dealing with massive data sets
  • Internet traffic, large databases, …
• Network monitoring & anomaly detection
  • Stream consists of internet packets; xi = # packets sent to port i
  • Under typical conditions, x is very concentrated
  • Under a "port scan attack", x is less concentrated
  • Can detect by estimating empirical entropy [Lakhina et al. '05], [Xu et al. '05], [Zhao et al. '07]
Entropy
• Probability distribution a = (a1, a2, …, an)
• Entropy H(a) = -Σi ai·lg(ai)
• Examples:
  • a = (1/n, 1/n, …, 1/n): H(a) = lg(n)
  • a = (0, …, 0, 1, 0, …, 0): H(a) = 0
• Small when concentrated, LARGE when not
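A minimal Python helper (the function name is mine) computing the empirical entropy of a count vector, matching the definition above:

```python
import numpy as np

def empirical_entropy(counts):
    """Shannon entropy (in bits) of the distribution counts / sum(counts)."""
    a = np.asarray(counts, dtype=float)
    a = a[a > 0] / a.sum()            # drop zero counts, normalize
    return float(-np.sum(a * np.log2(a)))

print(empirical_entropy([1] * 8))     # uniform over 8 items -> lg(8) = 3.0
print(empirical_entropy([0, 7, 0]))   # all mass on one item -> 0.0
```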
Streaming Algorithms for Entropy
• How much space to estimate H(x)?
  • [Guha-McGregor-Venkatasubramanian '06], [Chakrabarti-Do Ba-Muthu '06], [Bhuvanagiri-Ganguly '06]
  • [Chakrabarti-Cormode-McGregor '07]: multiplicative (1±ε) approximation in O(ε^-2 log^2(m)) bits; additive ε approximation in O(ε^-2 log^4(m)) bits; Ω(ε^-2) lower bound for both
• Our contributions:
  • Additive ε or multiplicative (1±ε) approximation
  • Õ(ε^-2 log^3(m)) bits, and can handle deletions
  • Can sketch entropy in the same space
First Idea
• If you can estimate Fp for p ≈ 1, then you can estimate H(x)
• Why? Rényi entropy
Review of Rényi Entropy
• Definition: Hp(a) = lg(Σi ai^p) / (1 - p), for p ≠ 1
• Convergence to Shannon: Hp(a) → H(a) as p → 1
[Figure: Hp(x) as a function of p, approaching H(x) as p → 1; portraits of Claude Shannon and Alfred Rényi]
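A quick numerical illustration of the convergence (the helper name and the example distribution are mine):

```python
import numpy as np

def renyi_entropy(a, p):
    """Renyi entropy H_p(a) = lg(sum_i a_i^p) / (1 - p), for p != 1."""
    a = np.asarray(a, dtype=float)
    a = a[a > 0]
    return float(np.log2(np.sum(a ** p)) / (1.0 - p))

a = np.array([0.5, 0.25, 0.125, 0.125])    # Shannon entropy H(a) = 1.75 bits
for p in [2.0, 1.5, 1.1, 1.01, 1.001]:
    print(p, renyi_entropy(a, p))          # values approach 1.75 as p -> 1
```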
Overview of Algorithm
• Set p = 1.01 and let x̃ be the empirical distribution x/||x||1
• Compute F̃p = (1±ε)Fp (using Li's "compressed counting")
• Set H̃p = lg(F̃p/(F̃1)^p) / (1 - p)
• So H̃p ≈ Hp(x̃) ≈ H(x̃)
• Analysis: as p → 1, the approximation Hp(x̃) ≈ H(x̃) gets better, but the accuracy needed from the moment estimate, and hence the space, gets worse!
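A non-streaming sketch of this single-point estimator: exact moments stand in for the compressed-counting estimates F̃p, so only the identity Hp(x/F1) = lg(Fp/F1^p)/(1-p) is being illustrated (function name and toy vector are mine).

```python
import numpy as np

def renyi_from_moments(F_p, F_1, p):
    """H_p of the normalized vector x/F_1, written via frequency moments:
    sum_i (x_i/F_1)^p = F_p / F_1^p."""
    return float(np.log2(F_p / F_1 ** p) / (1.0 - p))

x = np.array([9.0, 2.0, 0.0, 5.0, 12.0])
p = 1.01
F_p, F_1 = np.sum(x ** p), np.sum(x)
print(renyi_from_moments(F_p, F_1, p))   # close to the Shannon entropy of x/F_1
```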
Making the Tradeoff
• How quickly does Hp(x) converge to H(x)?
• Theorem: Let x be a distribution with mini xi ≥ 1/m.
  • Multiplicative approximation: for p sufficiently close to 1 (as a function of ε and m), Hp(x) is a multiplicative (1±ε) approximation of H(x)
  • Additive approximation: for p sufficiently close to 1 (as a function of ε and m), Hp(x) is within an additive ε of H(x)
• Plugging in: O(ε^-3 log^4(m)) bits of space suffice for additive approximation
Proof: A Trick Worth Remembering
• Let f: ℝ → ℝ and g: ℝ → ℝ both tend to 0, with the limit of f′/g′ existing.
• l'Hôpital's rule says that lim f(x)/g(x) = lim f′(x)/g′(x)
• It actually says more! It says that f/g converges to this limit at least as fast as f′/g′ does.
Improvements
[Figure: Hp(x) as a function of p; legend: Shannon, Multiple Rényis, Single Rényi]
• Status: additive ε approximation using O(ε^-3 log^4(m)) bits
• How to reduce space further?
• Interpolate with multiple points: Hp1(x), Hp2(x), …
Analyzing Interpolation
• Let f(z) be a C^(k+1) function
• Interpolate f with the polynomial q satisfying q(zi) = f(zi), 0 ≤ i ≤ k
• Fact: f(y) - q(y) = f^(k+1)(ξ)/(k+1)! · Πi (y - zi) for some ξ, where y, zi, ξ ∈ [a, b]
• Our case: set f(z) = H1+z(x)
• Goal: analyze f^(k+1)(z)
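A small numerical check of the interpolation-error fact, using the simple test function f(z) = e^z in place of H1+z(x); the choice of function, nodes, and evaluation point is mine.

```python
import numpy as np
from math import factorial

# Interpolate f(z) = exp(z) at k+1 nodes in [a, b], then compare the true
# error at a point y with the bound  max|f^(k+1)| / (k+1)!  *  prod_i |y - z_i|.
f = np.exp
a, b, k, y = -0.5, 0.0, 4, 0.1
z = np.linspace(a, b, k + 1)
q = np.polynomial.Polynomial.fit(z, f(z), deg=k)
error = abs(f(y) - q(y))
bound = np.exp(max(b, y)) / factorial(k + 1) * np.prod(np.abs(y - z))
print(error, bound)        # error <= bound
```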
Bounding Derivatives
• Rényi derivatives are messy to analyze
• Switch to Tsallis entropy: define f(z) = S1+z(x), where Sq(x) = (1 - Σi xi^q)/(q - 1)
• Can prove Tsallis also converges to Shannon as q → 1
• Fact: the derivatives f^(k+1) are small enough on [a, b] (when a = -O(1/(k·log m)), b = 0) that we can set k = log(1/ε) + loglog(m)
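A sketch of the Tsallis-via-moments identity that reappears in the final algorithm, again with exact moments standing in for the streaming estimates; note the raw Tsallis limit is the Shannon entropy in nats, so multiply by lg(e) for bits (function name and toy vector are mine).

```python
import numpy as np

def tsallis_from_moments(F_q, F_1, q):
    """Tsallis entropy S_q of x/F_1 via moments:
    S_q = (1 - sum_i (x_i/F_1)^q) / (q - 1) = (1 - F_q / F_1^q) / (q - 1)."""
    return float((1.0 - F_q / F_1 ** q) / (q - 1.0))

x = np.array([9.0, 2.0, 0.0, 5.0, 12.0])
a = x[x > 0] / x.sum()
H_nats = -np.sum(a * np.log(a))            # Shannon entropy in nats
for q in [1.1, 1.01, 1.001]:
    print(q, tsallis_from_moments(np.sum(x ** q), np.sum(x), q), H_nats)
```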
Key Ingredient: Noisy Interpolation
• We don't have f(zi); we have f(zi) ± ε
• How do we interpolate in the presence of noise?
• Idea: pick the zi very carefully
Chebyshev Polynomials
• Rogosinski's Theorem: if q(x) has degree k and |q(βj)| ≤ 1 at the Chebyshev extrema βj = cos(jπ/k), 0 ≤ j ≤ k, then |q(x)| ≤ |Tk(x)| for |x| > 1
• Map [-1, 1] onto the interpolation interval [z0, zk]
• Choose zj to be the image of βj, j = 0, …, k
• Let q̃(z) interpolate the noisy values f(zj) ± ε and let q(z) interpolate f(zj)
• r(z) = (q̃(z) - q(z))/ε satisfies Rogosinski's conditions!
Tradeoff in Choosing zk
• Tk grows quickly once its argument leaves [z0, zk]
• zk close to 0 ⟹ |Tk(preimage(0))| still small
• …but zk close to 0 ⟹ high space complexity
• Just how close do we need 0 and zk to be?
[Figure: the points z0, zk, and 0 on the real line]
The Magic of Chebyshev
• [Paturi '92]: Tk(1 + 1/k^c) ≤ e^(4k^(1-c/2)). Set c = 2.
• Suffices to set zk = -O(1/(k^3·log m))
• Translates to Õ(ε^-2 log^3(m)) space
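A quick numerical check of this behaviour (the degrees tested are arbitrary): Tk(1 + 1/k^2) stays below the constant e^4 from Paturi's bound with c = 2, even as the degree k grows.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

# Evaluate T_k just outside [-1, 1], at 1 + 1/k^2, and compare with e^4.
for k in [8, 16, 32, 64]:
    Tk = C.Chebyshev.basis(k)
    print(k, float(Tk(1 + 1 / k**2)), float(np.exp(4)))
```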
The Final Algorithm (additive approximation)
• Set k = lg(1/ε) + lglg(m) and zj = (k^2·cos(jπ/k) - (k^2 + 1)) / (9k^3·lg(m)) for 0 ≤ j ≤ k
• Estimate S̃1+zj = (1 - F̃1+zj/(F̃1)^(1+zj)) / zj for 0 ≤ j ≤ k
• Interpolate the degree-k polynomial q̃ with q̃(zj) = S̃1+zj
• Output q̃(0)
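A non-streaming Python sketch of the whole pipeline, with exact moments Fp standing in for the (1±ε) streaming estimates F̃p, so it only illustrates the choice of nodes zj and the interpolation step, not the space-efficient implementation. The lg(e) factor converts the Tsallis limit (nats) to bits, matching the lg-based definition of H above; the function name and toy vector are mine.

```python
import numpy as np

def entropy_via_interpolation(x, eps=0.05):
    """Sketch of the final algorithm with exact moments F_p in place of
    the streaming (1 +/- eps)-estimates; returns an entropy estimate in bits."""
    x = np.asarray(x, dtype=float)
    m = x.sum()                                    # total number of updates
    k = int(np.ceil(np.log2(1 / eps) + np.log2(np.log2(m))))
    j = np.arange(k + 1)
    z = (k**2 * np.cos(j * np.pi / k) - (k**2 + 1)) / (9 * k**3 * np.log2(m))
    F1 = x.sum()
    # Tsallis estimates S_{1+z_j} of the normalized vector, via moments
    S = np.array([(1 - np.sum(x ** (1 + zj)) / F1 ** (1 + zj)) / zj for zj in z])
    q = np.polynomial.Polynomial.fit(z, S, deg=k)  # interpolate, then evaluate at 0
    return float(q(0.0)) * np.log2(np.e)           # nats -> bits

x = np.array([9.0, 2.0, 0.0, 5.0, 12.0])
a = x[x > 0] / x.sum()
print(entropy_via_interpolation(x), -np.sum(a * np.log2(a)))  # estimate vs. true H
```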
Multiplicative Approximation
• How do we get a multiplicative approximation?
• An additive approximation is already multiplicative, unless H(x) is small
• H(x) small ⟹ some coordinate of x is large [CCM '07]
• Suppose i* is the largest coordinate and define the residual moments RFp = Σ(i ≠ i*) xi^p
• We combine (1±ε)RF1 and (1±ε)RF1+zj to get (1±ε)f(zj)
• Question: how do we get (1±ε)RFp?
• Two different approaches:
  • A general approach (for any p, and negative frequencies)
  • An approach exploiting p ≈ 1, only for nonnegative frequencies (better by log(m))
Questions / Thoughts
• For what other problems can we use this "generalize-then-interpolate" strategy?
  • Some non-streaming problems too?
• The power of moments? The power of residual moments?
  • CountMin (CM '05) + CountSketch (CCF '02), HSS (Ganguly et al.)
• WANTED: faster moment estimation (some progress in [Cormode-Ganguly '07])