Continuous Distributed Counting for Non-monotonic Streams • Milan Vojnović, Microsoft Research • Joint work with Zhenming Liu and Božidar Radunović • January 14, 2012
SUM Tracking Problem • Maintain estimate [Figure: sites 1, 2, 3, …, k; SUM]
SUM Tracking: Applications • Ex 1: database queries, e.g. SELECT SUM(AdBids) FROM Ads • Ex 2: iterative solving input data
Related Work • Count tracking [Huang, Yi and Zhang, 2011] • Worst-case input, monotonic sum • Expected communication cost, for : and lower bound • Lower bound for worst-case input [Arackaparambil, Brody and Chakrabarti, 2009] • Expected communication cost:
Questions • Worst case complexity Ex. +1, -1, +1, … • Complexity under random input? • Random permutation • Random i.i.d. • Fractional Brownian motion
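The worst-case example +1, -1, +1, … can be checked with a small sketch. It assumes a relative-error guarantee of the form |estimate - sum| <= eps * |sum| (the slides' exact formula did not survive extraction) and a hypothetical single-site tracker, `messages_needed`, that reports only when the guarantee would otherwise be violated.

```python
def messages_needed(updates, eps):
    """Count the updates at which a single-site tracker is forced to report.

    Assumption (not from the slides): the estimate must always satisfy
    |estimate - total| <= eps * |total|.
    """
    total, estimate, messages = 0, 0, 0
    for x in updates:
        total += x
        if abs(estimate - total) > eps * abs(total):
            estimate = total      # report the exact sum
            messages += 1
    return messages


n = 1000
alternating = [1 if t % 2 == 0 else -1 for t in range(n)]  # +1, -1, +1, ...
monotonic = [1] * n                                        # +1, +1, +1, ...

alt_cost = messages_needed(alternating, eps=0.5)
mono_cost = messages_needed(monotonic, eps=0.5)
```

On the alternating input the sum flips between 1 and 0, so every update forces a report (`alt_cost` equals `n`), while the monotonic input needs only logarithmically many reports. This gap is why the random-input questions on this slide are interesting.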
Outline • Upper bounds • Lower bounds • Applications
Our Tracker Algorithm • Sampling-based algorithm: upon arrival of an update, send a message to the coordinator with a sampling probability • If any site sends a message: sync all [Figure: sites with local sums S_1, …, S_k and a coordinator holding S = S_1 + … + S_k; labels M_i = 1, X_i]
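A minimal simulation sketch of the sample-then-sync idea on this slide. The slide's sampling probability was lost in extraction, so the rule used here, p = min(1, c / (eps * estimate)^2), is an illustrative assumption, as are the function name `simulate_tracker`, the constant `c`, and the round-robin arrival order.

```python
import random


def simulate_tracker(streams, eps, c=1.0, seed=0):
    """Simulate k sites feeding +1/-1 updates to one coordinator.

    streams: list of k equal-length lists of updates, served round-robin.
    Returns (number of syncs, coordinator estimate, true sum).
    """
    rng = random.Random(seed)
    k = len(streams)
    local = [0] * k   # exact local sums S_i held at the sites
    estimate = 0      # last synced global sum, known to every site
    syncs = 0
    for t in range(len(streams[0])):
        for i in range(k):
            local[i] += streams[i][t]
            # ASSUMED sampling rule (not from the slides): always report
            # while the estimate is zero, otherwise sample less often as
            # the known sum grows in magnitude.
            if estimate == 0:
                p = 1.0
            else:
                p = min(1.0, c / (eps * estimate) ** 2)
            if rng.random() < p:
                estimate = sum(local)   # "sync all": collect every S_i
                syncs += 1
    return syncs, estimate, sum(local)


gen = random.Random(1)
streams = [[gen.choice((-1, 1)) for _ in range(2000)] for _ in range(4)]
syncs, estimate, true_sum = simulate_tracker(streams, eps=0.1)
```

Each sync is counted once here; a real deployment would charge on the order of k messages per sync, since the coordinator has to hear from every site and broadcast the new sum.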
Algorithm’s Modes [Figure: regions labeled Sample, Use two monotonic counters, Always report, Sample]
Communication Cost Upper Bound: Single Site • Input: i.i.d. Bernoulli • Sampling probability with and Expected communication cost:
Proof Key Idea [Figure: message sent]
Communication Cost Upper Bound: Multiple Sites • sites • Updates i.i.d. Bernoulli • large enough and Expected communication cost:
Communication Cost Upper Bound: Unknown Drift Case • Input: i.i.d. Bernoulli • : unknown drift parameter Expected communication cost:
Communication Cost Upper Bound: Random Permutation Input • Input: a random permutation of values • sufficiently large and Expected communication cost:
Communication Cost Upper Bound: Fractional Brownian Motion • Input: a fractional Brownian motion with Hurst parameter Expected communication cost:
Outline • Upper bounds • Lower bounds • Applications
Lower Bounds: Single Site, Zero Drift • Input: i.i.d. Bernoulli Expected communication cost:
Lower Bounds: Multiple Sites • Input: i.i.d. Bernoulli or a random permutation Expected communication cost:
Proof Key Ideas • Under , maximum deviation [Figure: sites 1, 2, …, k]
Proof Key Ideas (cont’d) k-input problem: • Query: or ? • Answer: incorrect only if and the answer is under or under • Lemma: messages are necessary to answer the query correctly with a constant positive probability [Figure: sites 1, 2, …, k with i.i.d. inputs]
Outline • Upper bounds • Lower bounds • Applications
App 1: Tracking (cont’d) • Input: random permutation of • , • Problem: track within a prescribed relative tolerance with high probability
AMS Sketch • , 4-wise independent hash function
App 1: Tracking (cont’d) • AMS: within w.p. using and • Sum tracking: Expected total communication:
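To illustrate why an AMS sketch reduces to sum tracking, here is a minimal sketch of an AMS-style estimator of the second frequency moment. Each counter is a running sum of ±1 terms, so it is itself a non-monotonic sum. Everything specific below is an assumption rather than the slides' construction: the class name `AMSSketch`, the degree-3 polynomial hash over a Mersenne prime (whose parity bit is only approximately unbiased), and the use of a plain mean instead of a median of means.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the hash field size (assumed)


class AMSSketch:
    """AMS-style F2 estimator; items are integers in [0, P)."""

    def __init__(self, num_counters, seed=0):
        rng = random.Random(seed)
        # One random degree-3 polynomial per counter: a standard way to
        # get a 4-wise independent hash family over the field Z_P.
        self.coeffs = [[rng.randrange(P) for _ in range(4)]
                       for _ in range(num_counters)]
        self.z = [0] * num_counters

    @staticmethod
    def _sign(coeffs, item):
        a, b, c, d = coeffs
        h = (((a * item + b) * item + c) * item + d) % P
        return 1 if h & 1 else -1   # map the hash value to +1 / -1

    def update(self, item, delta=1):
        # Each counter receives +delta or -delta: a non-monotonic sum.
        for j, co in enumerate(self.coeffs):
            self.z[j] += delta * self._sign(co, item)

    def estimate_f2(self):
        # Mean of squared counters; AMS proper uses a median of means.
        return sum(z * z for z in self.z) / len(self.z)


sketch = AMSSketch(num_counters=64, seed=42)
for _ in range(5):
    sketch.update(7)            # item 7 arrives five times
single_item_f2 = sketch.estimate_f2()
```

With a single distinct item of frequency 5, every counter equals +5 or -5, so the estimate is exactly 25. In a distributed setting each counter could be maintained by a separate sum tracker.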
App 2: Bayesian Linear Regression • Feature vector , output • Prior → posterior • Sum tracking: • Under random permutation input, the expected communication cost =
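A minimal sketch of how a Bayesian linear regression posterior depends on the data only through running sums, which is what makes sum tracking applicable. It assumes the textbook conjugate model with a scalar feature, a Gaussian prior on the weight, and known noise precision. The slide's own formulas did not survive extraction, so the function name `posterior_from_sums` and all parameters are illustrative.

```python
def posterior_from_sums(sum_xx, sum_xy, prior_mean=0.0,
                        prior_precision=1.0, noise_precision=1.0):
    """Posterior (mean, precision) of the weight w in y = w * x + noise.

    Assumed model: w ~ N(prior_mean, 1 / prior_precision) and Gaussian
    noise with precision noise_precision. The data enter only through
    sum_xx = sum of x^2 and sum_xy = sum of x * y.
    """
    precision = prior_precision + noise_precision * sum_xx
    mean = (prior_precision * prior_mean
            + noise_precision * sum_xy) / precision
    return mean, precision


# Hypothetical observations (x, y); the last one contributes a negative
# term to sum_xy, so that running sum is non-monotonic.
data = [(1.0, 2.0), (-2.0, -4.0), (0.5, -1.0)]

sum_xx = 0.0
sum_xy = 0.0
for x, y in data:
    sum_xx += x * x   # these two running sums are what a tracker
    sum_xy += x * y   # would maintain across sites

post_mean, post_precision = posterior_from_sums(sum_xx, sum_xy)
```

For a feature vector rather than a scalar, the same idea applies entry by entry to the sums of x xᵀ and of y x.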
Summary • We considered the sum tracking problem with non-monotonic distributed streams under random permutation, random i.i.d. and fractional Brownian motion inputs • We derived a practical algorithm that has order-optimal communication complexity