Tight Bounds for Distributed Functional Monitoring
David Woodruff, IBM Almaden
Qin Zhang, Aarhus University MADALGO
Distributed Functional Monitoring
• Setting: k sites P_1, P_2, …, P_k, each holding an input vector x_i, communicate with a coordinator C
• Updates arrive over time at the sites: x_i ← x_i + e_j
• Static case vs. dynamic case
• Problems on x_1 + x_2 + … + x_k: sampling, p-norms, heavy hitters, compressed sensing, quantiles, entropy
• Authors: Can, Cormode, Huang, Muthukrishnan, Patt-Shamir, Shafrir, Tirthapura, Wang, Yi, Zhao, many others
Motivation
• Data distributed and stored in the cloud
  • Impractical to put data on a single device
• Sensor networks
  • Communication very power-intensive
• Network routers
  • Bandwidth limitations
Problems
What is the randomized communication cost of these problems? I.e., the minimal cost of a protocol that, for every input, fails with probability < 1/3
Static case, dynamic case
• Which functions f(x_1, …, x_k) do we care about?
• x_1, …, x_k are non-negative length-n vectors
• x = Σ_{i=1}^k x_i
• f(x_1, …, x_k) = |x|_p = (Σ_{j=1}^n x_j^p)^{1/p}
• |x|_0 is the number of non-zero coordinates
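For concreteness, here is a minimal Python sketch (not a protocol from the talk; the function name and the toy example are illustrative only) that evaluates f on the coordinate-wise sum of the sites' vectors:

```python
# Illustrative only: compute f(x_1, ..., x_k) = |x|_p for x = x_1 + ... + x_k,
# where the x_i are non-negative vectors of the same length n.
def p_norm_of_sum(vectors, p):
    """vectors: list of k equal-length non-negative lists; p = 0 counts non-zero coordinates."""
    n = len(vectors[0])
    x = [sum(v[j] for v in vectors) for j in range(n)]   # coordinate-wise sum
    if p == 0:
        return sum(1 for xj in x if xj != 0)             # |x|_0
    return sum(xj ** p for xj in x) ** (1.0 / p)         # (sum_j x_j^p)^(1/p)

# Toy example: k = 3 sites, n = 4 coordinates; x = (4, 0, 3, 0), so |x|_2 = 5.0
print(p_norm_of_sum([[1, 0, 2, 0], [0, 0, 1, 0], [3, 0, 0, 0]], 2))
```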
Exact Answers
• An Ω(n) communication bound for computing |x|_p, p ≠ 1
• Reduction from 2-Player Set-Disjointness (DISJ)
  • Alice has a set S ⊆ [n] of size n/4
  • Bob has a set T ⊆ [n] of size n/4, with either |S ∩ T| = 0 or |S ∩ T| = 1
  • Is S ∩ T = ∅?
  • |S ∩ T| = 1 ⇒ DISJ(S, T) = 1; |S ∩ T| = 0 ⇒ DISJ(S, T) = 0
• [KS, R]: Ω(n) communication
• Prohibitive for applications
Approximate Answers
Goal: output f(x_1, …, x_k) = (1 ± ε) |x|_p
What is the randomized communication cost as a function of k, ε, and n? Ignore log(nk/ε) factors
Previous Results
Lower bounds in the static model, upper bounds in the dynamic model (underlying vectors are non-negative)
• |x|_0: Ω(k + ε^{-2}) and O(k · ε^{-2})
• |x|_p: Ω(k + ε^{-2})
• |x|_2: O(k^2/ε + k^{1.5}/ε^3)
• |x|_p, p > 2: O(k^{2p+1} n^{1-2/p} · poly(1/ε))
Our Results
First lower bounds to depend on the product of k and ε^{-2}
Lower bounds in the static model, upper bounds in the dynamic model (underlying vectors are non-negative)
• |x|_0: was Ω(k + ε^{-2}) and O(k · ε^{-2}); new lower bound Ω(k · ε^{-2})
• |x|_p: was Ω(k + ε^{-2}); new lower bound Ω(k^{p-1} · ε^{-2}). Talk will focus on p = 2
• |x|_2: was O(k^2/ε + k^{1.5}/ε^3); new upper bound O(k · poly(1/ε))
• |x|_p, p > 2: was O(k^{2p+1} n^{1-2/p} · poly(1/ε)); new upper bound O(k^{p-1} · poly(1/ε)), which does not depend polynomially on n
Talk Outline
• Lower Bounds
  • Non-zero elements
  • Euclidean norm
• Upper Bounds
  • p-norm
Previous Lower Bounds
• Lower bounds for any p-norm, p ≠ 1
  • [CMY]: Ω(k)
  • [ABC]: Ω(ε^{-2})
• Reduction from Gap-Orthogonality (GAP-ORT)
  • Alice and Bob have u, v ∈ {0,1}^{1/ε^2}, respectively
  • Decide whether |Δ(u, v) - 1/(2ε^2)| < 1/ε or |Δ(u, v) - 1/(2ε^2)| > 2/ε, where Δ denotes Hamming distance
  • [CR, S]: Ω(ε^{-2}) communication
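A small illustrative sketch of this promise, reading Δ(u, v) as the Hamming distance (the function name and output labels are assumptions, not notation from the talk):

```python
# Illustrative only: classify a GAP-ORT input u, v in {0,1}^(1/eps^2) by how far
# the Hamming distance is from 1/(2 eps^2).
def gap_ort_label(u, v, eps):
    center = 1.0 / (2 * eps ** 2)
    dist = sum(ui != vi for ui, vi in zip(u, v))   # Hamming distance Delta(u, v)
    if abs(dist - center) < 1.0 / eps:
        return "NEAR"                              # |Delta(u,v) - 1/(2 eps^2)| < 1/eps
    if abs(dist - center) > 2.0 / eps:
        return "FAR"                               # |Delta(u,v) - 1/(2 eps^2)| > 2/eps
    return "DON'T CARE"                            # outside the promise
```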
Talk Outline
• Lower Bounds
  • Non-zero elements
  • Euclidean norm
• Upper Bounds
  • p-norm
Lower Bound for Distinct Elements
• Improve the bound to the optimal Ω(k · ε^{-2})
• Simpler problem: k-GAP-THRESH
  • Each site P_i holds a bit Z_i
  • The Z_i are i.i.d. Bernoulli(β)
  • Decide if Σ_{i=1}^k Z_i > βk + (βk)^{1/2} or Σ_{i=1}^k Z_i < βk - (βk)^{1/2}; otherwise don't care
• Rectangle property: for any correct protocol transcript τ, the bits Z_1, Z_2, …, Z_k are independent conditioned on τ
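A minimal simulation sketch of one k-GAP-THRESH instance under this distribution (the names and the concrete parameters in the example call are assumptions):

```python
# Illustrative only: draw Z_1, ..., Z_k i.i.d. Bernoulli(beta) and classify the sum
# relative to the threshold beta*k with a (beta*k)^(1/2) gap.
import math
import random

def k_gap_thresh_instance(k, beta, seed=None):
    rng = random.Random(seed)
    Z = [1 if rng.random() < beta else 0 for _ in range(k)]
    s, thr, gap = sum(Z), beta * k, math.sqrt(beta * k)
    if s > thr + gap:
        label = "ABOVE"        # must answer "sum is above the threshold"
    elif s < thr - gap:
        label = "BELOW"        # must answer "sum is below the threshold"
    else:
        label = "DON'T CARE"   # any answer is acceptable under the promise
    return Z, s, label

Z, s, label = k_gap_thresh_instance(k=1000, beta=0.05, seed=0)
print(s, label)
```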
A Key Lemma
• Lemma: for any protocol Π which succeeds with probability > .9999, the transcript τ is such that, with probability > 1/2, for at least k/2 different i, H(Z_i | τ) < H(.01β)
• Proof: suppose τ does not satisfy this
  • With large probability, βk - O((βk)^{1/2}) < E[Σ_{i=1}^k Z_i | τ] < βk + O((βk)^{1/2})
  • Since the Z_i are independent given τ, Σ_{i=1}^k Z_i | τ is a sum of independent Bernoullis
  • Since most H(Z_i | τ) are large, by anti-concentration both events occur with constant probability: Σ_{i=1}^k Z_i | τ > βk + (βk)^{1/2} and Σ_{i=1}^k Z_i | τ < βk - (βk)^{1/2}
  • So Π can't succeed with large probability
Composition Idea
• Can think of C as a player
• The input to P_i in k-GAP-THRESH, denoted Z_i, is the output of a 2-party Disjointness (DISJ) instance between C and P_i
  • Let X be a random set of size 1/(4ε^2) from {1, 2, …, 1/ε^2}
  • For each i, if Z_i = 1, then choose Y_i so that DISJ(X, Y_i) = 1; else choose Y_i so that DISJ(X, Y_i) = 0
  • Distributional complexity Ω(1/ε^2) [Razborov]
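A rough, illustrative generator for such a composed instance (an assumption-laden reconstruction, not the paper's exact construction): the coordinator holds a random set X, and each site's set Y_i is chosen so that DISJ(X, Y_i) equals the bit Z_i.

```python
# Illustrative only: build DISJ inputs (X, Y_1, ..., Y_k) consistent with given bits Z_i.
import random

def make_composed_instance(k, eps, Z, seed=None):
    rng = random.Random(seed)
    n = int(1 / eps ** 2)
    universe = list(range(1, n + 1))
    X = set(rng.sample(universe, n // 4))          # coordinator's random set, size 1/(4 eps^2)
    Ys = []
    for i in range(k):
        outside = [e for e in universe if e not in X]
        Yi = set(rng.sample(outside, n // 4))      # disjoint from X, so DISJ(X, Y_i) = 0
        if Z[i] == 1:                              # force exactly one intersection: DISJ(X, Y_i) = 1
            Yi.pop()
            Yi.add(rng.choice(sorted(X)))
        Ys.append(Yi)
    return X, Ys
```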
Putting it All Together
• Key Lemma ⇒ for most i, H(Z_i | τ) < H(.01β)
• Since H(Z_i) = H(β) for all i, for most i the protocol Π solves DISJ(X, Y_i) with constant probability
• Since the Z_i | τ are independent, solving DISJ requires communication Ω(ε^{-2}) on each of k/2 copies
• Total communication is Ω(k · ε^{-2})
• Can show a reduction:
  • |x|_0 > 1/(2ε^2) + 1/ε if Σ_{i=1}^k Z_i > βk + (βk)^{1/2}
  • |x|_0 < 1/(2ε^2) - 1/ε if Σ_{i=1}^k Z_i < βk - (βk)^{1/2}
Talk Outline
• Lower Bounds
  • Non-zero elements
  • Euclidean norm
• Upper Bounds
  • p-norm
Lower Bound for Euclidean Norm
• Improve the Ω(k + ε^{-2}) bound to the optimal Ω(k · ε^{-2})
• Base problem: Gap-Orthogonality (GAP-ORT(X, Y))
  • Consider the uniform distribution on (X, Y)
• We observe an information lower bound for GAP-ORT
  • Sherstov's lower bound for GAP-ORT holds for the uniform distribution on (X, Y)
  • [BBCR] + [Sherstov] ⇒ for any protocol Π and t > 0, either I(X, Y; Π) = Ω(1/(ε^2 log t)) or Π uses t^{Ω(1)} communication
Information Implications
• By the chain rule, I(X, Y; Π) = Σ_{i=1}^{1/ε^2} I(X_i, Y_i; Π | X_{<i}, Y_{<i}) = Ω(ε^{-2})
• For most i, I(X_i, Y_i; Π | X_{<i}, Y_{<i}) = Ω(1)
• Maximum Likelihood Principle: non-trivial advantage in guessing (X_i, Y_i)
2-BIT k-Party DISJ
We compose GAP-ORT with a variant of k-party DISJ
• Sites P_1, …, P_k hold sets T_1, …, T_k ⊆ [k^2]
• Choose a random j ∈ [k^2]; one of four cases holds:
  • j doesn't occur in any T_i
  • j occurs only in T_1, …, T_{k/2}
  • j occurs only in T_{k/2+1}, …, T_k
  • j occurs in T_1, …, T_k
• All j' ≠ j occur in at most one set T_i (assume k ≥ 4)
• We show an Ω(k) information cost
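An illustrative generator for this kind of input (the case names and the way the remaining elements are spread over the sets are assumptions; the slide only fixes the constraints on j and on the other elements):

```python
# Illustrative only: sets T_1, ..., T_k over [k^2]; every element other than j lies in
# at most one set, and j occurs according to one of the four cases.
import random

def two_bit_disj_instance(k, case, seed=None):
    rng = random.Random(seed)
    universe = list(range(1, k * k + 1))
    j = rng.choice(universe)
    others = [e for e in universe if e != j]
    rng.shuffle(others)
    Ts = [set(others[i::k]) for i in range(k)]    # each other element in exactly one set
    owners = {
        "none": [],
        "first_half": range(k // 2),
        "second_half": range(k // 2, k),
        "all": range(k),
    }[case]
    for i in owners:
        Ts[i].add(j)
    return j, Ts
```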
Rough Composition Idea
• There are 1/ε^2 instances of 2-BIT k-party DISJ; the bits X_i and Y_i in GAP-ORT determine the output of the i-th instance
• Information adds (if we condition on enough "helper" variables)
• P_i participates in all instances
• An algorithm for approximating the Euclidean norm solves GAP-ORT, and therefore solves most 2-BIT k-party DISJ instances
• Show that Ω(k/ε^2) overall information is revealed
Talk Outline
• Lower Bounds
  • Non-zero elements
  • Euclidean norm
• Upper Bounds
  • p-norm
Algorithm for p-norm
• We get k^{p-1} · poly(1/ε), improving k^{2p+1} n^{1-2/p} · poly(1/ε) for general p and O(k^2/ε + k^{1.5}/ε^3) for p = 2
• Our protocol is the first 1-way protocol, that is, all communication is from the sites to the coordinator
• Focus on the Euclidean norm (p = 2) in this talk
  • Non-negative vectors
  • Just determine if the Euclidean norm exceeds a threshold θ
The Most Naïve Thing to Do
• x_i is Site i's current vector
• x = Σ_{i=1}^k x_i
• Suppose Site i sees an update x_i ← x_i + e_j
• Send j to the coordinator with a certain probability that depends only on k and θ?
Sample and Send
• Send each update with probability at least 1/k; communication = O(k), so okay
• Hard instance: suppose x has k^4 coordinates that are 1, and may also have a unique coordinate equal to k^2, occurring k times on each site
  • Without the heavy coordinate |x|_2^2 = k^4; with it |x|_2^2 = 2k^4
• Send each update with probability 1/k^2
  • Will find the large coordinate
  • But communication is Ω(k^2)
What Is Happening?
• Sampling with probability ≈ 1/k^2 is good for getting a few samples of the heavy item
• But all the light coordinates are in the way, making the communication Ω(k^2)
• Suppose we put a barrier of k: sample with probability ≈ 1/k^2, but only send an item if it has occurred at least k times on a site
• Now communication is O(1) and the heavy coordinate is found
• But the light coordinates also contribute to the overall |x|_2 value
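A minimal per-site sketch of this "sample with a barrier" rule (the class name, the callback, and the exact constants are assumptions):

```python
# Illustrative only: sample updates at rate ~1/k^2, but only report a coordinate once
# it has occurred at least k times on this site.
import random
from collections import defaultdict

class BarrierSamplingSite:
    def __init__(self, k, send_to_coordinator, seed=None):
        self.k = k
        self.rate = 1.0 / (k * k)            # sampling probability ~ 1/k^2
        self.counts = defaultdict(int)       # local counts x_ij
        self.send = send_to_coordinator      # callback standing in for a message to C
        self.rng = random.Random(seed)

    def update(self, j):
        self.counts[j] += 1
        # barrier: ignore coordinates that have occurred fewer than k times locally
        if self.counts[j] >= self.k and self.rng.random() < self.rate:
            self.send(j)
```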
Algorithm for Euclidean Norm
• Sample at different scales, with different barriers
• Use the public coin to create O(log n) groups T_1, …, T_{log n} of the n input coordinates
  • T_z contains n/2^z random coordinates
• Suppose Site i sees the update x_i ← x_i + e_j
  • For each T_z containing j: if x_ij > (θ/2^z)^{1/2}/k, then with probability (2^z/θ)^{1/2} · poly(ε^{-1} log n), send (j, z) to the coordinator
• Expected communication Õ(k)
• If a group of coordinates contributes to |x|_2, there is a z for which a few coordinates in the group are sampled multiple times
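A minimal site-side sketch of this multi-scale rule; it replaces the public coin with a shared hash seed and sets the unspecified poly(ε^{-1} log n) factor to 1, so it is a reconstruction under assumptions rather than the exact protocol:

```python
# Illustrative only: on an update x_i <- x_i + e_j, check each scale z whose group T_z
# contains j; if the local count clears the barrier (theta/2^z)^(1/2)/k, send (j, z)
# with probability ~ (2^z/theta)^(1/2).
import math
import random
from collections import defaultdict

class MultiScaleSite:
    def __init__(self, k, n, theta, send_to_coordinator, seed=0):
        self.k, self.theta = k, theta
        self.num_scales = max(1, int(math.log2(n)))
        self.counts = defaultdict(int)          # local counts x_ij
        self.send = send_to_coordinator
        self.seed = seed                        # shared seed stands in for the public coin
        self.rng = random.Random(seed)

    def in_group(self, j, z):
        # coordinate j belongs to T_z with probability 2^{-z}, consistently across sites
        return random.Random(f"{self.seed}:{j}:{z}").random() < 2.0 ** (-z)

    def update(self, j):
        self.counts[j] += 1
        for z in range(1, self.num_scales + 1):
            if not self.in_group(j, z):
                continue
            barrier = math.sqrt(self.theta / 2 ** z) / self.k    # per-site barrier at scale z
            rate = min(1.0, math.sqrt(2 ** z / self.theta))      # sampling rate at scale z
            if self.counts[j] > barrier and self.rng.random() < rate:
                self.send((j, z))                                # report (coordinate, scale)
```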
Conclusions
• Improved communication lower and upper bounds for estimating |x|_p
• Implies tight lower bounds for estimating entropy, heavy hitters, and quantiles
• Implications for the data stream model:
  • First lower bound for |x|_0 without Gap-Hamming
  • A useful information cost lower bound for Gap-Hamming, or else the protocol has very large communication
  • Improves the Ω(n^{1-2/p}/ε^{2/p}) bound for estimating |x|_p in a stream to Ω(n^{1-2/p}/ε^{4/p})