Distributed Top-K Monitoring. Brian Babcock & Chris Olston. Presented by Yuval Altman. To be presented at ACM SIGMOD 2003 International Conference on Management of Data.
The problem Continuously report the k largest values obtained from distributed data streams.
Motivation - • Google is the most popular search engine in the world. • Servers in multiple sites in the world handle millions of queries an hour. • What are the top 20 search terms?
The problem • Continuously report the k largest values obtained from distributed data streams. • Multiple sources - physically far away • Communication is expensive. • Inefficient to transmit large amounts of data • Streaming model • Values change over time • Approximation may be sufficient
Formal problem definition • m+1 nodes: • Monitor nodes: N1, N2, …, Nm • Coordinator node: N0 • Set of n data objects U = {O1, O2, …, On} • e.g. search terms, IP addresses • Objects are associated with real values V1, V2, …, Vn • e.g. number of requests (DNS queries) for an IP address in the last 15 minutes
Distributed streaming model • Values are updated through a sequence of <Oi, Nj, Δ> tuples, where: • Nj detects a change of Δ in the value Vi of Oi. • The change is not seen by the other nodes Nk (k ≠ j). • For each node Nj, define partial values V1,j, V2,j, …, Vn,j: Vi,j = Σ Δ over all tuples <Oi, Nj, Δ> • The value Vi of an object Oi: Vi = Σj Vi,j
Model example • U = {O1, O2, O3, O4} • N1 sees <O1, N1, 2> <O2, N1, 3> <O4, N1, 4> <O3, N1, 2> <O1, N1, 1>, so V1,1 = 3, V2,1 = 3, V3,1 = 2, V4,1 = 4 • N2 sees <O2, N2, 3> <O4, N2, 5> <O4, N2, -2> <O3, N2, 4> <O3, N2, 5>, so V1,2 = 0, V2,2 = 3, V3,2 = 9, V4,2 = 3 • N3 sees <O2, N3, -1> <O3, N3, 4> <O2, N3, 2> <O3, N3, 3> <O2, N3, 5>, so V1,3 = 0, V2,3 = 6, V3,3 = 7, V4,3 = 0 • Totals: V1 = 3, V2 = 12, V3 = 18, V4 = 7
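The bookkeeping in this example can be reproduced with a short Python sketch. The tuple layout `(object, node, delta)` and the function name `aggregate` are illustrative assumptions, not something the slides define.

```python
from collections import defaultdict

# Update tuples <object, node, delta>, copied from the model example.
updates = [
    ("O1", "N1", 2), ("O2", "N1", 3), ("O4", "N1", 4), ("O3", "N1", 2), ("O1", "N1", 1),
    ("O2", "N2", 3), ("O4", "N2", 5), ("O4", "N2", -2), ("O3", "N2", 4), ("O3", "N2", 5),
    ("O2", "N3", -1), ("O3", "N3", 4), ("O2", "N3", 2), ("O3", "N3", 3), ("O2", "N3", 5),
]

def aggregate(updates):
    """Return (partial, total): partial[(obj, node)] = V_{i,j}, total[obj] = V_i."""
    partial = defaultdict(int)
    total = defaultdict(int)
    for obj, node, delta in updates:
        partial[(obj, node)] += delta   # only node j sees this change
        total[obj] += delta             # V_i is the sum over all nodes
    return dict(partial), dict(total)

partial, total = aggregate(updates)
```

A pair that never received an update (for example O1 at N2) is simply absent from `partial`, which corresponds to a partial value of 0.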
Using the model • Top-k IP addresses in the last 15 minutes: • Emit <IPAddr, Router, 1> when receiving a request for an IP address. • Emit a cancelling <IPAddr, Router, -1> 15 minutes afterwards. • Can adopt a different strategy: • Emit <IPAddr, Router, 15> when receiving a request. • Emit <IPAddr, Router, -1> once a minute for the next 15 minutes.
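A minimal sketch of the two emission strategies, assuming time is measured in whole minutes and each scheduled update is a `(time, object, node, delta)` tuple. The function names and the tuple layout are hypothetical.

```python
def count_window(ip, router, t, window=15):
    """Strategy 1: +1 on arrival, one cancelling -1 when the window has passed."""
    return [(t, ip, router, 1), (t + window, ip, router, -1)]

def decay_window(ip, router, t, window=15):
    """Strategy 2: +window on arrival, then -1 once a minute for `window` minutes."""
    arrival = [(t, ip, router, window)]
    decay = [(t + k, ip, router, -1) for k in range(1, window + 1)]
    return arrival + decay

# 192.0.2.1 is a documentation-only address, used purely as a placeholder.
strategy1 = count_window("192.0.2.1", "R1", t=0)
strategy2 = decay_window("192.0.2.1", "R1", t=0)
```

In both cases the deltas for one request sum to zero, so a request stops contributing once its window has expired. The second strategy weights recent requests more heavily.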
The problem • Example: ε = 5, values 100, 97, 95, 92, 90, 88, 87, 83, 80, 75 • The coordinator node N0 must report a set T ⊆ U, |T| = k, that represents the top-k data objects. • T must be correct within ε. • Formally: if Ot ∈ T and Os ∈ U-T, then Vt + ε ≥ Vs
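The correctness condition can be checked mechanically. The sketch below assumes the true values are available in one dictionary, which in the real setting only holds after the coordinator has collected them. The object labels `a` to `j` and the choice k = 3 are made up for illustration.

```python
def is_valid_top_k(values, top, eps):
    """True if V_t + eps >= V_s for every O_t in `top` and every O_s outside it."""
    inside = [v for o, v in values.items() if o in top]
    outside = [v for o, v in values.items() if o not in top]
    if not inside or not outside:
        return True  # nothing to compare
    # Comparing the weakest member of T with the strongest non-member is enough.
    return min(inside) + eps >= max(outside)

# Hypothetical labels for ten values, with eps = 5.
values = dict(zip("abcdefghij", [100, 97, 95, 92, 90, 88, 87, 83, 80, 75]))
exact = is_valid_top_k(values, {"a", "b", "c"}, eps=5)
approx = is_valid_top_k(values, {"a", "b", "d"}, eps=5)   # 92 + 5 >= 95
too_far = is_valid_top_k(values, {"a", "b", "f"}, eps=5)  # 88 + 5 < 95
```

More than one set can be valid at the same time when ε > 0, which is what gives the algorithm room to avoid communication.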
Related work • One time distributed top-k calculation • Bruno, Gravano, Marian 2002 • Fagin, Lotem, Naor 2001 • Much better than transmitting all the values to coordinator node • Not streaming • no means to detect changes to data • Running algorithm continuously is very expensive • Monitor nodes have limited query capabilities • Sorted (GetNext) and random (GetValue)
Related work • Streaming top-k monitoring from single source • Charikar, Chen, Farach-Colton 2002 • Manku, Motwani 2002 • Gibbons, Matias 1998 • Randomized Algorithms • Focus on minimizing space • Reminder: The objective is to minimize communication costs
Overview of algorithm • Initialize a top-k set at the coordinator node • Set arithmetic constraints at monitor nodes • Depend on current top-k set • Constraints valid → no communication • Constraints invalidated → resolution • Possibly new top-k set • Reallocation of constraints
Choosing the constraints • Ideally, data is distributed evenly across monitor nodes, such that the local top-k sets are the same • In this case, the global top-k set matches the local top-k sets • It suffices that local constraints remain valid • N1 (US): Money=100, Sex=98, Health=94, Mail=92 • N2 (Germany): Sex=30, Money=20, Mail=5, Health=3 • N3 (Japan): Money=50, Sex=5, Mail=4, Health=1 • Global list: Money=170, Sex=133, Mail=101, Health=98
Adjustment factors • In real life, data is not distributed evenly • Updates: <Sex, N1, -8>, <Health, N3, 5> • N1 (US): Money=100, Health=94, Mail=92, Sex=90 • N2 (Germany): Sex=30, Money=20, Mail=5, Health=3 • N3 (Japan): Money=50, Health=6, Sex=5, Mail=4 • Global list: Money=170, Sex=125, Health=103, Mail=101 • Local constraints are invalidated, but the global top-k set is still valid
Adjustment factors • For each node Nj and object Oi, associate an adjustment factor δi,j • Constraints are evaluated after adding the adjustment factors • If Ot ∈ T and Os ∈ U-T: Vt,j + δt,j ≥ Vs,j + δs,j • Adjustment factors for each object sum to zero: Σj δi,j = 0 • This ensures the adjusted partial values of each object still sum to its true value Vi
Adjustment factors example • Before: N1 (US): Money=100, Health=94, Mail=92, Sex=90 • N2 (Germany): Sex=30, Money=20, Mail=5, Health=3 • N3 (Japan): Money=50, Health=6, Sex=5, Mail=4 • Global list: Money=170, Sex=125, Health=103, Mail=101 • Adjustment factors: δSex,1 = 10, δSex,2 = -15, δSex,3 = 5 • After adjustment: N1 (US): Money=100, Sex=100, Health=94, Mail=92 • N2 (Germany): Money=20, Sex=15, Mail=5, Health=3 • N3 (Japan): Money=50, Sex=10, Health=6, Mail=4 • Global list: Money=170, Sex=125, Health=103, Mail=101
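To see the effect of the adjustment factors at a single node, the following sketch lists the violated local constraints before and after adding them. The function name and the dictionary representation are assumptions made for illustration. The numbers are those of node N1 in the example, with T = {Money, Sex}.

```python
def violated_pairs(partial, delta, top):
    """Pairs (t, s), t in T and s outside T, with V_t + d_t < V_s + d_s at this node."""
    adjusted = {o: v + delta.get(o, 0) for o, v in partial.items()}
    return [
        (t, s)
        for t in sorted(top)
        for s in sorted(adjusted)
        if s not in top and adjusted[t] < adjusted[s]
    ]

n1 = {"Money": 100, "Health": 94, "Mail": 92, "Sex": 90}
top = {"Money", "Sex"}

before = violated_pairs(n1, {}, top)            # no adjustment factors yet
after = violated_pairs(n1, {"Sex": 10}, top)    # the factor 10 for Sex at N1

# The three factors for Sex (N1, N2, N3) must cancel out.
sex_factors = [10, -15, 5]
```

Without adjustment, Sex falls behind both Health and Mail at N1. With the factor 10 the local constraints hold again, and because the factors for Sex sum to zero the global value is unchanged.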
Coordinator adjustment factor • For each object Oi, add an adjustment factor δi,0 at the coordinator node • Factors for each object Oi must still sum to 0: Σj δi,j = 0 for j = 0, …, m • To allow error ε, if Ot ∈ T and Os ∈ U-T: • Give Ot values a “bonus” of ε • Let Vt,0 = Vs,0 = 0 • The constraint: δt,0 + ε ≥ δs,0
Allowing error – example • ε = 5 • Update: <Health, N3, 40> • N1 (US): Money=100, Sex=98, Health=94, Mail=92 • N2 (Germany): Sex=30, Money=20, Mail=5, Health=3 • N3 (Japan): Money=50, Health=41, Sex=5, Mail=4 • Global list: Money=170, Health=138, Sex=133, Mail=101 • δSex,1 = -4, δSex,2 = -25, δSex,3 = 29 • δHealth,2 = 2, δHealth,3 = -7 • Coordinator constraint: δSex,0 + 5 ≥ δHealth,0 • The trick: δHealth,0 = 5
Why do adjustment factors work? • For Ot ∈ T and Os ∈ U-T: • As long as, for each node Nj, the adjusted constraints and the coordinator constraint are valid: • Vt,j + δt,j ≥ Vs,j + δs,j • δt,0 + ε ≥ δs,0 • We can sum the node constraints and the error constraint and get: Vt + ε ≥ Vs
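The summation argument can be written out step by step. This is a sketch of the derivation using only the definitions given so far: partial values sum to the true value, and the adjustment factors of each object, including the coordinator's, sum to zero.

```latex
\begin{align*}
\sum_{j=1}^{m}\bigl(V_{t,j}+\delta_{t,j}\bigr) &\ge \sum_{j=1}^{m}\bigl(V_{s,j}+\delta_{s,j}\bigr)
  && \text{(sum of the $m$ node constraints)}\\
\delta_{t,0}+\varepsilon &\ge \delta_{s,0}
  && \text{(coordinator constraint)}\\
V_t+\sum_{j=0}^{m}\delta_{t,j}+\varepsilon &\ge V_s+\sum_{j=0}^{m}\delta_{s,j}
  && \text{(add both lines, using } V_i=\textstyle\sum_{j=1}^{m}V_{i,j}\text{)}\\
V_t+\varepsilon &\ge V_s
  && \text{(since } \textstyle\sum_{j=0}^{m}\delta_{i,j}=0 \text{ for every object)}
\end{align*}
```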
Algorithm details • Coordinator node N0 maintains • Current approximate top-k set • All adjustment factors δi,j • Each monitor node Nj maintains • Current approximate top-k set • For each object Oi • Partial value: Vi,j • Relevant adjustment factor: δi,j
Algorithm details • Initialization. Coordinator: • Computes the approximate top-k set once. • Chooses adjustment factors • Sends adjustment factors and top-k set to monitors • Monitor node constraints: • For Ot ∈ T and Os ∈ U-T: Vt,j + δt,j ≥ Vs,j + δs,j • Adjustment factor constraints: • For each object Oi: Σj δi,j = 0 • For objects Ot ∈ T and Os ∈ U-T: δt,0 + ε ≥ δs,0
Algorithm for monitor node Nj • While (1) • Read tuple <Oi, Nj, Δ> • Vi,j = Vi,j + Δ • Check constraints: for Ot ∈ T and Os ∈ U-T: Vt,j + δt,j ≥ Vs,j + δs,j • If invalid, initiate resolution. • End • To check constraints: use two heaps (or Fibonacci heaps)
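The monitor loop can be sketched as a small class. This is an illustration under stated assumptions, not the authors' implementation: the class and method names are invented, and the constraint check scans all objects with `min` and `max` for brevity, where the slide suggests two heaps to find the same two extremes efficiently. The starting values and the update of 6 to Mail are hypothetical.

```python
class MonitorNode:
    def __init__(self, partial, delta, top):
        self.partial = dict(partial)  # V_{i,j} for this node
        self.delta = dict(delta)      # adjustment factors for this node
        self.top = set(top)           # current approximate top-k set T

    def adjusted(self, obj):
        return self.partial.get(obj, 0) + self.delta.get(obj, 0)

    def violated(self):
        """Objects involved in at least one violated constraint (the set F)."""
        objs = set(self.partial) | set(self.delta) | self.top
        inside = [o for o in objs if o in self.top]
        outside = [o for o in objs if o not in self.top]
        if not inside or not outside:
            return set()
        lowest_in = min(self.adjusted(o) for o in inside)
        highest_out = max(self.adjusted(o) for o in outside)
        if lowest_in >= highest_out:
            return set()  # all constraints hold, no communication needed
        return ({t for t in inside if self.adjusted(t) < highest_out}
                | {s for s in outside if self.adjusted(s) > lowest_in})

    def on_tuple(self, obj, change):
        """Apply one update; a non-empty result means resolution must start."""
        self.partial[obj] = self.partial.get(obj, 0) + change
        return self.violated()

# Hypothetical starting state for one node, with T = {Money, Sex}.
node = MonitorNode({"Money": 50, "Sex": 5, "Mail": 4, "Health": 1, "Love": 0},
                   {}, {"Money", "Sex"})
F = node.on_tuple("Mail", 6)  # Mail rises to 10 and overtakes Sex
```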
Resolution – phase 1 • First, Nj sends a message to N0 with: • F - the set of objects involved in violated constraints • All partial values for objects in R = F ∪ T • The border value Bj - the maximum adjusted value of an object not in the resolution set • Example: N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0 • F3 = {Mail, Sex}, R3 = {Money, Mail, Sex} • Vmoney,3 = 50, Vmail,3 = 10, Vsex,3 = 5, B3 = 1
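A sketch of how a monitor node might assemble its phase 1 message. The dictionary layout and the function name are assumptions. The border value follows the definition on this slide, the maximum adjusted value outside the resolution set, and is `None` in the hypothetical case where no object lies outside it.

```python
def phase1_message(partial, delta, top, violated):
    """Build the message a monitor node sends to the coordinator in phase 1."""
    resolution = set(violated) | set(top)  # R = F united with T
    outside = [o for o in partial if o not in resolution]
    border = max((partial[o] + delta.get(o, 0) for o in outside), default=None)
    return {
        "F": set(violated),
        "values": {o: partial.get(o, 0) for o in resolution},
        "border": border,
    }

# Node N3 from the example, with T = {Money, Sex} and all factors zero.
n3 = {"Money": 50, "Mail": 10, "Sex": 5, "Health": 1, "Love": 0}
msg = phase1_message(n3, {}, top={"Money", "Sex"}, violated={"Mail", "Sex"})
```

Only the values for the resolution set and a single border value are sent, which keeps the message small regardless of how many objects the node tracks.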
Resolution – phase 2 • The coordinator N0 attempts to resolve the constraints using the δ*,0 slack • For each violated constraint, N0 tests: • Vt,j + δt,j + δt,0 + ε ≥ Vs,j + δs,j + δs,0 • If all tests succeed, the top-k set is valid, and there’s no need to communicate with other nodes. • N0 reallocates adjustment factors. • Resolution is over • If at least one test fails, proceed to phase 3
Phase 2 resolution example • ε = 5, δ*,* = 0 • Before: N1: Money=100, Sex=98, Mail=96, Health=92 • N2: Money=35, Sex=20, Mail=5, Health=3 • N3: Money=50, Sex=5, Mail=4, Health=1 • Global list: Money=185, Sex=123, Mail=105, Health=96 • Update: <Mail, N2, 17> • After: N1: Money=100, Sex=98, Mail=96, Health=92 • N2: Money=35, Mail=22, Sex=20, Health=3 • N3: Money=50, Sex=5, Mail=4, Health=1 • Global list: Money=185, Sex=123, Mail=122, Health=96 • To fix: δSex,0 = -2, δSex,2 = 2
Phase 2 resolution failure • Starting from δSex,0 = -2, δSex,2 = 2 • Update: <Sex, N2, 5> • N1: Money=100, Sex=98, Mail=96, Health=92 • N2: Money=35, Sex=27 (25 + δSex,2), Mail=22, Health=3 • N3: Money=50, Sex=5, Mail=4, Health=1 • Global list: Money=185, Sex=128, Mail=122, Health=96 • Update: <Mail, N3, 5> • N1: Money=100, Sex=98, Mail=96, Health=92 • N2: Money=35, Sex=27, Mail=22, Health=3 • N3: Money=50, Mail=9, Sex=5, Health=1 • Global list: Money=185, Sex=128, Mail=127, Health=96 • Can’t “loan” 4 from δSex,0
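The phase 2 test is a per-constraint inequality, so the success and failure cases above can be replayed in a few lines. The function name and argument layout are assumptions; `coord_delta` stands for the coordinator's factors δ*,0.

```python
def phase2_ok(node_partial, node_delta, coord_delta, pairs, eps):
    """True if the coordinator's slack covers every violated constraint (t, s)."""
    for t, s in pairs:
        lhs = node_partial[t] + node_delta.get(t, 0) + coord_delta.get(t, 0) + eps
        rhs = node_partial[s] + node_delta.get(s, 0) + coord_delta.get(s, 0)
        if lhs < rhs:
            return False  # at least one test fails: phase 3 is needed
    return True

# Success case: Mail at N2 rises to 22 while all factors are still zero.
n2 = {"Money": 35, "Mail": 22, "Sex": 20, "Health": 3}
success = phase2_ok(n2, {}, {}, [("Sex", "Mail")], eps=5)          # 25 >= 22

# Failure case: Mail at N3 rises to 9 after the coordinator has lent 2 of its slack.
n3 = {"Money": 50, "Mail": 9, "Sex": 5, "Health": 1}
failure = phase2_ok(n3, {}, {"Sex": -2}, [("Sex", "Mail")], eps=5)  # 8 < 9
```

In the failure case the node needs 4 more for Sex, but only ε + δSex,0 = 3 remains at the coordinator.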
Resolution – phase 3 • The coordinator N0 contacts all the nodes Ni excluding Nj, requesting: • Partial values for objects in R = F ∪ T • Border values Bi • N0 sums the partial values and sorts them to compute the new top-k list T’ • N0 reallocates new adjustment factors for T’ • N0 sends T’ and adjustment factors to all nodes
Resolution – summary • Phase 1 - Nj detects failed constraints and notifies N0. Initiates resolution for R = F ∪ T • Phase 2 – N0 attempts to resolve constraints using δ*,0 – the “bank” • If successful, reallocate adjustment factors & stop • Phase 3 - N0 requests all updated partial values for R, sorts, computes new top-k list • Reallocate adjustment factors
Resolution performance • Message count is the means to measure algorithm performance • Messages are usually small • Only the resolution set R = F ∪ T is involved • Two-phase resolution • Initiation + reallocation • Only two messages • Three-phase resolution • Initiation + query + reallocation • 1 + 2(m-1) + m = 3m - 1 messages
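The message counts follow directly from the phases. A small sketch, with a hypothetical function name, makes the arithmetic explicit for m monitor nodes.

```python
def resolution_messages(m, three_phase):
    """Messages exchanged in one resolution with m monitor nodes."""
    if not three_phase:
        return 2                # initiation + reallocation to the initiating node
    initiation = 1              # the initiating node notifies the coordinator
    query = 2 * (m - 1)         # request and reply for every other node
    reallocation = m            # new top-k set and factors to all nodes
    return initiation + query + reallocation

messages_for_four_nodes = resolution_messages(4, three_phase=True)
```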
Adjustment factor reallocation • Input: • Top-k list T’ • Partial values in resolution set R • Border values • Output: • New adjustment factors δi,j • Method - for each object: • Meet border value constraints • Calculate leeway • Distribute leeway evenly • Example: node values Money=50, Mail=10, Sex=5, Health=1, Love=0 • F = {Mail, Sex}, R = {Money, Mail, Sex} • Vmoney = 50, Vmail = 10, Vsex = 5, B = 1
Leeway computation • For each object in R, compute the leeway λ: the slack above the sum of border values • Define: • Sum of border values: B = Σj Bj • Computed values: Vi = Σj Vi,j • Vi,0 = 0; B0 = max(δi,0) over Oi not in R • If Oi ∈ T’: λi = Vi - B + ε • Otherwise: λi = Vi - B
Leeway computation example • ε = 0 • N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85 • N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3 • N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0 • Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90 • B = 94 + 5 + 1 = 100 • λmoney = 170 - B = 70 • λsex = 133 - B = 33 • λmail = 107 - B = 7
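The leeway formula can be checked against this example with a short sketch. The function name is an assumption, and the coordinator's border value B0 is taken to be 0 here, as the example's sum of border values implies.

```python
def leeways(totals, borders, new_top, eps):
    """Leeway per object in R: V_i - B, plus eps for objects in the new top-k set."""
    B = sum(borders)
    return {o: v - B + (eps if o in new_top else 0) for o, v in totals.items()}

totals = {"Money": 170, "Sex": 133, "Mail": 107}  # summed partial values for R
borders = [0, 94, 5, 1]                            # B0 (assumed 0), B1, B2, B3
lam = leeways(totals, borders, new_top={"Money", "Sex"}, eps=0)
```

With a non-zero ε, only the objects in the new top-k set receive the extra ε of leeway.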
Leeway distribution • Initialization: meet constraints • δi,j = Bj - Vi,j • For Oi ∈ T’, j = 0: δi,0 = B0 - ε • Leeway distribution: • δi,j = δi,j + (λi / m) • Correctness: Vt,j + δt,j ≥ Vs,j + δs,j • If Os ∉ R: follows from Vt,j + δt,j ≥ Bj • If Os ∈ R: follows from λt ≥ λs
Leeway distribution example • N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85 • N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3 • N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0 • Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90 • λsex = 33 • δsex,1 = B1 - Vsex,1 + 33/3 = 94 - 98 + 11 = 7 • δsex,2 = B2 - Vsex,2 + 33/3 = 5 - 30 + 11 = -14 • δsex,3 = B3 - Vsex,3 + 33/3 = 1 - 5 + 11 = 7
Leeway distribution example • λmoney = 70, split into integer shares 24 + 23 + 23 • δmoney,1 = B1 - Vmoney,1 + 24 = 94 - 100 + 24 = 18 • δmoney,2 = B2 - Vmoney,2 + 23 = 5 - 20 + 23 = 8 • δmoney,3 = B3 - Vmoney,3 + 23 = 1 - 50 + 23 = -26 • λmail = 7, split into integer shares 3 + 2 + 2 • δmail,1 = B1 - Vmail,1 + 3 = 94 - 92 + 3 = 5 • δmail,2 = B2 - Vmail,2 + 2 = 5 - 5 + 2 = 2 • δmail,3 = B3 - Vmail,3 + 2 = 1 - 10 + 2 = -7
Reallocation results • Before (partial values): N1 (US): Money=100, Sex=98, Health=94, Mail=92, Love=85 • N2 (Germany): Sex=30, Money=20, Mail=5, Love=5, Health=3 • N3 (Japan): Money=50, Mail=10, Sex=5, Health=1, Love=0 • Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90 • After (adjusted values): N1 (US): Money=118, Sex=105, Mail=97, Health=94, Love=85 • N2 (Germany): Money=28, Sex=16, Mail=7, Love=5, Health=3 • N3 (Japan): Money=24, Sex=12, Mail=3, Health=1, Love=0 • Global list: Money=170, Sex=133, Mail=107, Health=98, Love=90
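The even reallocation can be replayed end to end. The sketch below gives no leeway to the coordinator and splits each leeway into integer shares, handing the remainder to the first nodes. That rounding rule is an assumption chosen because it reproduces the shares used in the worked example; the function name is hypothetical.

```python
def reallocate(partials, borders, lam):
    """Return delta[j][obj] for monitor nodes j = 0..m-1, spreading leeway evenly."""
    m = len(partials)
    delta = [dict() for _ in range(m)]
    for obj, leeway in lam.items():
        base, extra = divmod(leeway, m)
        for j in range(m):
            share = base + (1 if j < extra else 0)  # assumed rounding rule
            # First meet the border value, then add this node's share of the leeway.
            delta[j][obj] = borders[j] - partials[j][obj] + share
    return delta

partials = [
    {"Money": 100, "Sex": 98, "Mail": 92},  # N1
    {"Money": 20, "Sex": 30, "Mail": 5},    # N2
    {"Money": 50, "Sex": 5, "Mail": 10},    # N3
]
borders = [94, 5, 1]
lam = {"Money": 70, "Sex": 33, "Mail": 7}

delta = reallocate(partials, borders, lam)
adjusted = [{o: partials[j][o] + delta[j][o] for o in lam} for j in range(3)]
```

After reallocation every node ranks Money, Sex, Mail in the same order, and each object's factors sum to zero, so the global list is unchanged.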
Leeway distribution to N0 • Leeway is also distributed to the coordinator node • ε is added to the leeway computation for Ot ∈ T’ • Initialization of δt,0 for Ot ∈ T’ is B0 - ε • Any addition can be “loaned” to monitor nodes • Amount distributed to N0: • Higher (λi / 2) – less chance of phase 3 in resolution • Lower (0) – fewer resolutions (more leeway for monitor nodes)
Proportional leeway distribution • Allocate more leeway to monitor nodes that are updated more often • Top-k more likely to change there • Good for monitor nodes that exhibit characteristic behavior • Google locations • Enterprise routers
Experiments • Query 1: • FIFA ’98 servers at 4 locations throughout the world. • Top 20 Web site pages by hit statistics • Query 2: • Most loaded server in a cluster • Single value per monitor node • Query 3: • Berkeley-to-world WAN link, with 4 monitor points • Top 20 destination hosts by number of outgoing TCP packets
Analysis of results • Allowing error improves results dramatically • Leeway for N0 – dominant factor • Low ε – half the leeway to N0 • Low ε means little leeway • Resolutions are bound to happen, so make them less expensive • High ε – no leeway to N0
Analysis of results • Whether even or proportional leeway distribution is better depends on the query. • Server load – proportional • Berkeley WAN – monitor nodes simulated, so even distribution is better • FIFA – proportional for lower ε, even for higher ε.
Comparison to alternative • Caching • Coordinator holds cached partial data values • Monitor must send an update to the coordinator when a partial value deviates by ε/2m • Coordinator will always have each object's value correct within ε/2 • Top-k list always correct within ε
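A minimal sketch of the monitor side of the caching alternative. The class and method names are invented, and it assumes an update is sent as soon as the deviation from the cached value strictly exceeds ε/2m, so that the cached value never drifts further than that.

```python
class CachingMonitor:
    def __init__(self, eps, m):
        self.threshold = eps / (2 * m)  # allowed drift per partial value
        self.value = {}                 # current partial values at this node
        self.cached = {}                # values last sent to the coordinator

    def on_tuple(self, obj, change):
        """Apply an update; return the value to send to the coordinator, or None."""
        self.value[obj] = self.value.get(obj, 0) + change
        if abs(self.value[obj] - self.cached.get(obj, 0)) > self.threshold:
            self.cached[obj] = self.value[obj]
            return self.value[obj]
        return None

monitor = CachingMonitor(eps=8, m=2)    # threshold = 2.0
sent = [monitor.on_tuple("a", 1),       # drift 1: stays local
        monitor.on_tuple("a", 2),       # drift 3: value 3 is sent
        monitor.on_tuple("a", -1)]      # drift 1 from the new cache: stays local
```

With m nodes each off by at most ε/2m, the coordinator's sum for an object is off by at most ε/2, and comparing two such sums gives the overall error bound ε.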
Results: Note the log scale!
Summary • Problem – find top-k set within error ε • Distributed – multiple sources • Streaming – frequent updates • Naive approach • Transmit streams to coordinator node • If error is allowed, transmit only when deviation from cached value threatens correctness • New approach offers dramatic improvement over the naive approach for low-medium ε.
Summary • Use adjustment factors to establish constraints • Monitor node initiates resolution when a constraint gets broken • Resolution • Attempt to use coordinator node leeway. If successful, fix constraints by adjustment factor reallocation. • Otherwise, get partial values for the resolution set from all nodes, compute new top-k set. Reallocate leeway to all nodes. • Reallocation • Distribute leeway evenly between monitor nodes • Distribute leeway to the coordinator node when ε is low