C. Georgiou, S. Gilbert, R. Guerraoui, D. R. Kowalski. On the Complexity of Asynchronous Gossip. Presented by: Tamar Aizikowitz, Spring 2009.
Introduction • In previous lectures, we considered the problem of gossiping in synchronous systems. • It is common to argue that distributed applications are generally synchronous. However, sometimes… • Delay bounds are not known. • Known bounds may be overly conservative. • Today, we consider gossip in asynchronous systems. • No a priori bounds on message delay or relative processor speeds.
Outline • Model Definition and Assumptions • Lower Bound and “Cost of Asynchrony” • Asynchronous Gossip Algorithms • EARS • SEARS • TEARS • Application for Randomized Consensus
Model Definitions • Asynchronous message-passing • Fixed set of n processes, with known unique identifiers: [n] = {1,…,n} • Direct communication between all processes • Physical communication network ignored • Up to f < n crash failures • No lost or corrupt messages
Timing Assumptions • Time proceeds in discrete steps. • Each step, some subset of the processes is scheduled to take a local step. • In each local step, a process receives pending messages, performs local computations, and sends messages. • As long as a process has not crashed, it will eventually be scheduled for a local step.
Timing Bounds • For a given execution, we define bounds on delays: • d = maximum message delivery time • If p sent m to q at time t, then q will receive m no later than time t + d (assuming q is not crashed). • Simulates communication delay. • δ = maximum step size • Every δ time steps, every non-crashed process is scheduled at least once. • Simulates relative processor speeds.
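As a concrete illustration, here is a minimal Python sketch (my own, not from the paper or slides) that checks whether a recorded execution trace respects given bounds d and δ; the trace representation is an assumption made purely for this example:

    def respects_bounds(sends, recvs, steps, crashes, d, delta, end_time):
        # sends:   list of (msg_id, send_time, receiver)
        # recvs:   dict msg_id -> receive_time
        # steps:   dict process -> sorted list of times it took a local step
        # crashes: dict process -> crash_time (absent: the process never crashes)
        for msg_id, t_send, q in sends:
            alive_until = crashes.get(q, end_time)
            if t_send + d <= alive_until:                      # q still alive at the deadline
                if recvs.get(msg_id, end_time + 1) > t_send + d:
                    return False                               # delivery bound d violated
        for p, times in steps.items():
            alive_until = crashes.get(p, end_time)
            prev = 0
            for t in times + [alive_until]:
                if t - prev > delta:                           # p went unscheduled for more than delta
                    return False
                prev = t
        return True

For example, with d = 3 and δ = 2, a message sent at time 10 to a live process must be received by time 13, and every non-crashed process must take a step in every window of length 2.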
Gossip • Every process has a rumor it wants to spread. • A gossip protocol must satisfy: • Rumor gathering: eventually, every correct process has collected the rumors of all other correct processes. • Validity: any rumor added to a process's collection must be the initial rumor of some process. • Quiescence: eventually, every process stops sending messages forever.
Gossip Continued… • Gossip completes when each correct process has received the rumors of all other correct processes and has stopped sending messages, and all other processes have crashed. • Note: In an asynchronous system, a process can never terminate, since it can never be "sure" that it has received all messages. • It can, however, stop sending messages.
Complexity Measures • Let A be an asynchronous gossip algorithm. • A has time complexity Tas(d,δ) and message complexity Mas(d,δ) if for every infinite execution where the bounds d and δ hold: • Every correct process completes by expected time Tas • The total number of messages sent is ≤ Mas • If d = δ = 1 are known a priori to the algorithm, then A is synchronous, and Ts, Ms are defined analogously.
Adversary Models • We consider two adversary models… • Adaptive Adversary: • Schedules processes, message deliveries, and crashes dynamically during the computation. • Determines the bounds d and δ. • Knows the distribution of the algorithm's random choices (but not their outcomes in advance). • Oblivious Adversary: • Determines the entire schedule beforehand, independently of the algorithm's random choices.
Lower Bound The “cost” of asynchrony.
Background • Best results for synchronous gossip: • Time: O(polylog n) • Messages: O(n polylog n) • B.S. Chlebus, D.R. Kowalski, Time and Communication Efficient Consensus for Crash Failures (will be presented next week…) • Trivial algorithm for asynchronous gossip (everyone sends to everyone): • Time: O(d+δ) • Messages: Θ(n²)
Lower Bound • Theorem 1: For every gossip algorithm A, there exist d, δ ≥ 1 and an adaptive adversary that causes up to f < n failures such that, in expectation, either: • Mas(d,δ) = Ω(n + f²), or • Tas(d,δ) = Ω(f(d+δ)) • In other words… no randomized asynchronous gossip protocol can be both time and message efficient against an adaptive adversary. • Efficient = w.r.t. the best known synchronous protocol.
Adversary Strategy • Main idea: there are two types of gossiping techniques… • Send to many ⟹ message inefficient • Send to few ⟹ time inefficient
Proof of Lower Bound • The Ω(n) lower bound for the number of messages is straightforward: every process needs to send its rumor to at least one other process. • Therefore, we need to show Ω(f²) for the number of messages or Ω(f(d+δ)) for the time…
Divide and Conquer • Set f = min{f, n/4} • Partition [n] into two sets, S1 and S2, with • |S1| = n − f/2 and |S2| = f/2 • Execute the set S1 with d = δ = 1 until all processes in S1 complete and cease to send messages.
Choose Adversary Strategy • Let t be the time at which S1 completes. • If t > f: fail all processes in S2 ⟹ gossip is complete at time t. • Since d = δ = 1 and t > f ⟹ t = Ω(f(d+δ)) ✓ • If t ≤ f: check whether most processes in S2 send "many" messages or "few" messages, and apply the appropriate adversarial strategy: • Many messages ⟹ Mas(d,δ) = Ω(f²) • Few messages ⟹ Tas(d,δ) = Ω(f(d+δ))
Examine S2 • For each p in S2, simulate: • p receives all messages sent to it from S1 • p executes f/2 isolated steps, i.e., steps in which it does not receive any messages • p is promiscuous if, in expectation, p sends at least f/32 messages. • Let P ⊆ S2 denote the set of promiscuous processes.
S2 Mostly Promiscuous • Case 1: |P| ≥ f/4 (most processes are promiscuous) • At time t, deliver all messages from S1 to S2. • Schedule all processes from S2 in each of the next f/2 time steps ⟹ δ = 1 • Do not deliver any messages ⟹ d > f/2 • All processes in S2 have now taken f/2 isolated steps • In expectation, each process in P sends at least f/32 messages • Mas(d,δ) = Ω(f/4 ∙ f/32) = Ω(f²) ✓
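Spelled out, the expected number of messages sent in this case is:

    \[
      \mathbb{E}[\#\text{messages}] \;\ge\; |P|\cdot\frac{f}{32} \;\ge\; \frac{f}{4}\cdot\frac{f}{32} \;=\; \frac{f^2}{128} \;=\; \Omega(f^2).
    \]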
S2 Mostly Non-promiscuous • Case 2: |P| < f/4 (most processes are not promiscuous) • Let NonP = S2 − P, i.e., the non-promiscuous processes. • Main idea: find two processes in NonP with a constant probability of not communicating directly, and make sure they do not communicate for a "long" time.
Finding Two Disconnected Processes • We need to find two processes with a constant probability of not communicating directly… • N(p) = all processes q s.t. p sends a message to q with probability < 1/4 during its f/2 isolated steps. • For p ∈ NonP, the number of processes not in N(p) is less than f/8. • Otherwise, p would send a message with probability ≥ 1/4 to at least f/8 processes ⟹ p would send at least f/32 messages in expectation, contradicting p ∈ NonP.
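The arithmetic behind this contradiction:

    \[
      \#\{q \notin N(p)\} \ \ge\ \frac{f}{8}
      \ \Longrightarrow\
      \mathbb{E}[\#\text{messages sent by } p] \ \ge\ \frac{f}{8}\cdot\frac{1}{4} \ =\ \frac{f}{32},
    \]

which would make p promiscuous.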
Finding Two Disconnected Processes • Claim: For p ∈ NonP, "many" processes of NonP lie in N(p): • |N(p) ∩ NonP| ≥ f/8, since |NonP| ≥ f/4 and fewer than f/8 processes lie outside N(p).
Finding Two Disconnected Processes • Consider the following directed graph: • Nodes: the processes of NonP, i.e., at least f/4 nodes • Edges: p → q whenever q ∈ N(p) • Each p has at least f/8 outgoing edges ⟹ there are at least f/8 ∙ f/4 = f²/32 edges in total. • There are only (f/4 choose 2) = (f/4)(f/4 − 1)/2 = f²/32 − f/8 unordered pairs of nodes. • Since there are more edges than pairs, there exists a bi-directional edge in the graph. • That is, there exist p, q s.t. p ∈ N(q) and q ∈ N(p) ⟹ p, q have a constant probability of not communicating directly.
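In LaTeX form, the pigeonhole count behind the last step:

    \[
      \#\text{edges} \ \ge\ \frac{f}{8}\cdot\frac{f}{4} \ =\ \frac{f^2}{32}
      \ >\ \binom{f/4}{2} \ =\ \frac{\tfrac{f}{4}\left(\tfrac{f}{4}-1\right)}{2} \ =\ \frac{f^2}{32}-\frac{f}{8} \ =\ \#\text{pairs},
    \]

so some unordered pair {p, q} must carry an edge in each direction.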
Isolating Two Processes • At time t, fail all processes in S2 except p and q. • Execute p and q for f/2 local steps with d = 1. • Fail every process in S1 that p or q sends a message to.
Isolating Two Processes Continued… • Pr[p, q do not communicate directly] ≥ (1 − 1/4)(1 − 1/4) = 9/16 • All processes in S1 that receive messages are failed ⟹ p and q are isolated with probability at least 9/16. • By Markov's inequality, the probability that p (respectively q) sends fewer than f/8 messages is at least 3/4: • Pr[X ≥ f/8] ≤ (f/32)/(f/8) = 1/4 • With probability at least 9/16, p and q together send at most f/4 messages. • Number of failed processes ≤ f/4 + f/2 − 2 = 3f/4 − 2 < f.
Proof of Lower Bound: Completion! • Using a union bound, the probability that p, q do not communicate and that they send no more than f/4 messages is at least 1 − (7/16 + 7/16) = 1/8. • In this case, gossip is not complete after the f/2 local steps, as p and q do not know each other's rumors. • Since d = 1 and each local step takes δ time ⟹ p, q run for time at least (d+δ)f/2 with probability at least 1/8. • In expectation, Tas(d,δ) = Ω(f(d+δ)).
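Putting the pieces together in LaTeX (restating the slides' calculation):

    \[
      \Pr[\text{$p,q$ isolated and silent}] \ \ge\ 1-\left(\tfrac{7}{16}+\tfrac{7}{16}\right) \ =\ \tfrac18,
      \qquad
      \mathbb{E}[T_{as}] \ \ge\ \tfrac18\cdot\tfrac{f}{2}\,(d+\delta) \ =\ \Omega\big(f(d+\delta)\big).
    \]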
Cost of Asynchrony • Consider the worst-case ratio between asynchronous algorithms and synchronous ones: • CostT = Tas / min Ts • CostM = Mas / min Ms • Based on Theorem 1 we have: • CostT = Ω(f) • CostM = Ω(1 + f²/n) • Note: For f = Θ(n), we get either a Θ(n) slowdown or a Θ(n) increase in the number of messages.
Gossip Algorithms EARS SEARS TEARS
Epidemic Asynchronous Rumor Spreading (EARS) • Each process p maintains the following data: • rp = the rumor of process p • Vp = the set of all rumors known to p • Ip = a set of pairs (r, q) s.t. p knows that r was sent to q • Lp = { q | ∃ r ∈ Vp such that (r, q) ∉ Ip }, i.e., the processes that may still be missing a rumor known to p • Main idea: • Send Vp and Ip to a random process • Update Vp and Ip according to the messages received • Use Lp to decide when to "sleep"
EARS(rp)
• Init: Vp ← {rp} ; Ip ← Ø ; Lp ← [n] ; sleep_cnt ← 0
• repeat:
  • for every message m = ⟨V, I⟩ received do
    • Vp ← Vp ∪ m.V ; Ip ← Ip ∪ m.I
    • update Lp based on Vp and Ip
  • if Lp = Ø then sleep_cnt++ else sleep_cnt ← 0
  • if sleep_cnt < Θ(n/(n−f) ∙ log n) then
    • choose q uniformly at random from [n]
    • send m = ⟨Vp, Ip⟩ to q
    • for every r in Vp do Ip ← Ip ∪ {(r, q)}
    • update Lp based on Vp and Ip
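A minimal Python sketch of a single EARS process, following the pseudocode above (the class and method names are my own, and the surrounding network/scheduler that delivers messages and invokes local steps is abstracted away):

    import math, random

    class EARSProcess:
        # Illustrative sketch of one EARS process (assumed structure, not the
        # authors' code); the network layer that delivers messages is external.
        def __init__(self, pid, rumor, n, f):
            self.pid, self.n = pid, n
            self.V = {rumor}                      # V_p: rumors known to p
            self.I = set()                        # I_p: pairs (rumor, q) that p knows were sent to q
            self.sleep_cnt = 0
            # Theta(n/(n-f) * log n) shut-down threshold, as in the slide
            self.threshold = math.ceil(n / max(n - f, 1) * math.log(n + 1))

        def missing(self):
            # L_p: processes that, as far as p knows, still miss some rumor in V_p
            return {q for q in range(self.n)
                    if any((r, q) not in self.I for r in self.V)}

        def local_step(self, inbox):
            # inbox: list of (V, I) messages delivered at this step;
            # returns a list of (destination, message) pairs for the network.
            for V, I in inbox:
                self.V |= V
                self.I |= I
            self.sleep_cnt = self.sleep_cnt + 1 if not self.missing() else 0
            if self.sleep_cnt >= self.threshold:  # quiescent: stop sending
                return []
            q = random.randrange(self.n)          # gossip to a uniformly random process
            self.I |= {(r, q) for r in self.V}    # remember what was sent to q
            return [(q, (frozenset(self.V), frozenset(self.I)))]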
EARS Analysis • Rumor gathering: • Every correct process eventually takes a local step and sends its rumor to another process. • Every process that receives this rumor continues spreading it until it knows that all processes have received it. • Eventually, w.h.p., every process has received the rumor. • Validity: only original rp values are gossiped. • Quiescence: after all processes have gathered all the rumors, all the Lp sets become empty, and eventually, w.h.p., all processes go to sleep.
EARS Analysis Continued… • Theorem 6: Algorithm EARS completes gossip w.h.p. under an oblivious adversary with • O(n/(n−f) ∙ log²n ∙ (d+δ)) time complexity • O(n log³n ∙ (d+δ)) message complexity • Note: for small f and d = δ = 1, the complexity is comparable to the best synchronous algorithm: • O(log²n) time complexity • O(n log³n) message complexity
Spamming EARS (SEARS) • Same as EARS except: • Each message is sent to Θ(n^ε log n) processes • Only one shut-down step • Theorem 7: For every constant ε < 1, algorithm SEARS has, w.h.p., • O(n/(ε(n−f)) ∙ (d+δ)) time complexity • O(n^(2ε)/(ε(n−f)) ∙ log n ∙ (d+δ)) message complexity • Note: for f < n/2 the time is constant w.r.t. n. • Intuition: send more messages each round to save time, but pay with a higher message complexity.
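A sketch of the SEARS variation, building on the EARSProcess sketch above (the fan-out formula and the default ε are illustrative assumptions):

    import math, random

    class SEARSProcess(EARSProcess):
        # Same state as EARS, but each sending step spams Theta(n^eps * log n)
        # random targets, and a single quiet step suffices to shut down.
        def __init__(self, pid, rumor, n, f, eps=0.5):
            super().__init__(pid, rumor, n, f)
            self.fanout = min(n, math.ceil(n ** eps * math.log(n + 1)))
            self.threshold = 1                    # "only one shut-down step"

        def local_step(self, inbox):
            for V, I in inbox:
                self.V |= V
                self.I |= I
            self.sleep_cnt = self.sleep_cnt + 1 if not self.missing() else 0
            if self.sleep_cnt >= self.threshold:
                return []
            targets = random.sample(range(self.n), k=self.fanout)
            self.I |= {(r, q) for r in self.V for q in targets}
            return [(q, (frozenset(self.V), frozenset(self.I))) for q in targets]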
Two-hop EARS (TEARS) • Majority gossip: each correct process is only required to receive a majority of the rumors. • Assumption: f < n/2 • Majority gossip is useful for applications such as consensus… • Main idea: a two-phase algorithm: • Phase 1: send your rumor to a random set of processes • Phase 2: once a certain number of phase 1 messages have been received, send all known rumors to a second random set of processes
TEARS(rp) Sketch
• Init:
  • a ← 4 ∙ n^(1/2) ∙ log n ; Vp ← {rp} ; first_cnt ← 0
  • set1 ← for every q, put q in set1 with probability a/n
  • set2 ← for every q, put q in set2 with probability a/n
• for every q in set1 do send m = ⟨Vp, first⟩ to q
• for every message m received do
  • Vp ← Vp ∪ m.V
  • if m.flag = first then first_cnt++
  • if pred(first_cnt) then // check the number of first-level messages
    • for every q in set2 do send m = ⟨Vp, second⟩ to q
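A Python sketch of the same two-hop structure (assumed shape, not the authors' code; in particular, the threshold used in pred() below is only a placeholder for the paper's test on first_cnt):

    import math, random

    class TEARSProcess:
        def __init__(self, pid, rumor, n):
            self.pid, self.n = pid, n
            self.a = 4 * math.sqrt(n) * math.log(n + 1)
            self.V = {rumor}
            self.first_cnt = 0
            self.sent_second = False
            prob = min(1.0, self.a / n)
            self.set1 = [q for q in range(n) if random.random() < prob]
            self.set2 = [q for q in range(n) if random.random() < prob]

        def start(self):
            # Phase 1: send own rumor to every process in set1
            return [(q, (frozenset(self.V), "first")) for q in self.set1]

        def pred(self):
            # Placeholder threshold: "enough" first-level messages have arrived
            return self.first_cnt >= self.a / 2

        def on_message(self, V, flag):
            self.V |= V
            if flag == "first":
                self.first_cnt += 1
            if self.pred() and not self.sent_second:
                self.sent_second = True
                # Phase 2: forward all rumors collected so far to every process in set2
                return [(q, (frozenset(self.V), "second")) for q in self.set2]
            return []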
TEARS Correctness • Best-case analysis: the sets set1 and set2 have size a in expectation. Therefore: • Each process sends its rumor, in expectation, to a processes in the first phase. • Every process eventually receives a first-level messages carrying processes' rumors. • If all processes receive all a first-level messages before sending their second-level messages, then they send second-level messages carrying a rumors each. • Every process then receives a² = 16n log²n > n/2 rumors.
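Checking the arithmetic on a:

    \[
      a \ =\ 4\sqrt{n}\,\log n \ \Longrightarrow\ a^2 \ =\ 16\,n\log^2 n \ \ge\ \frac{n}{2} \quad \text{for every } n \ge 2.
    \]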
TEARS Correctness Continued… • Worst-case analysis: using a Chernoff bound, it can be shown that, w.h.p., • a sufficient number of rumors reach a sufficient number of processes in first-level messages before those processes finish their second phase. • Such rumors are called well-distributed rumors. • These rumors are then sent by "enough" processes in second-phase messages. • Therefore, w.h.p., each process receives enough additional rumors to complement the well-distributed ones up to at least a majority.
TEARS Analysis • Theorem 12: Algorithm TEARS completes majority gossip w.h.p. under an oblivious adversary with: • Time complexity: O(d+δ) • Message complexity: O(n^(7/4) log²n) • Proof of time complexity: • By time δ, all first-level messages have been sent. • By time δ+d, all of these messages have arrived. • By time 2δ+d, all second-level messages have been sent. • By time 2δ+2d, all of these messages have arrived. • Gossip therefore completes in O(d+δ) time.
The Consensus Problem • n processes, each with an initial value vp. • Each process must choose an output value dp satisfying: • Agreement: all output values are the same. • Validity: every output value is vp for some p. • Termination: every process eventually decides and outputs a value, w.h.p. (preferably with probability 1). • Recall: deterministic (non-randomized) consensus is impossible in an asynchronous system with even one crash failure.
The Rabin-Canetti Framework
• Initially: r ← 1 and prefer ← vp
• while true do
  • votes ← get-core(⟨vote, prefer, r⟩) // get votes of a majority
  • let v be the majority value among the phase r votes
  • if all phase r votes are v then dp ← v // decide v
  • outcomes ← get-core(⟨outcome, v, r⟩)
  • if all phase r outcome values equal some w then prefer ← w
  • else prefer ← common-coin()
  • r++
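A compact Python sketch of this loop (an assumed shape, not a complete implementation): get_core and common_coin are supplied by the surrounding system, and unlike the loop above, the sketch returns once it has decided:

    from collections import Counter

    def rabin_canetti(v_p, get_core, common_coin):
        # get_core(tag, value, r): returns the list of phase-r values gathered
        # from a majority; common_coin(): returns a shared random value.
        r, prefer = 1, v_p
        decided = None
        while decided is None:
            votes = get_core("vote", prefer, r)        # get votes of a majority
            v, _ = Counter(votes).most_common(1)[0]    # majority value of phase r
            if all(x == v for x in votes):
                decided = v                            # decide v
            outcomes = get_core("outcome", v, r)
            if all(x == outcomes[0] for x in outcomes):
                prefer = outcomes[0]
            else:
                prefer = common_coin()
            r += 1
        return decided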
Routine get-core
• Initially: set1 = set2 = set3 = Ø ; values[j] = ⊥ for all j
• when get-core(val) is invoked do
  • values[i] ← val
  • broadcast(1, val)
• when (1, v) is received from pj do
  • values[j] ← v
  • add j to set1
  • if |set1| = n − f then broadcast(2, values)
• when (2, V) is received from pj do
  • merge V into values
  • add j to set2
  • if |set2| = n − f then broadcast(3, values)
• when (3, V) is received from pj do
  • merge V into values
  • add j to set3
  • if |set3| = n − f then return(values)
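A Python sketch of the same routine in event-handler form (assumed structure; the broadcast primitive and message delivery are supplied by the caller):

    class GetCore:
        # Three-round gathering: each process re-broadcasts after hearing
        # from n - f processes in the previous round.
        def __init__(self, i, n, f, broadcast):
            self.i, self.n, self.f = i, n, f
            self.broadcast = broadcast
            self.values = [None] * n                 # None stands for "undefined"
            self.set1, self.set2, self.set3 = set(), set(), set()
            self.result = None                       # set once |set3| = n - f

        def invoke(self, val):
            self.values[self.i] = val
            self.broadcast((1, val, self.i))

        def on_message(self, msg):
            kind, payload, j = msg
            if kind == 1:
                self.values[j] = payload
                self.set1.add(j)
                if len(self.set1) == self.n - self.f:
                    self.broadcast((2, list(self.values), self.i))
            elif kind == 2:
                self._merge(payload)
                self.set2.add(j)
                if len(self.set2) == self.n - self.f:
                    self.broadcast((3, list(self.values), self.i))
            elif kind == 3:
                self._merge(payload)
                self.set3.add(j)
                if len(self.set3) == self.n - self.f:
                    self.result = list(self.values)  # corresponds to return(values)

        def _merge(self, V):
            for k, v in enumerate(V):
                if self.values[k] is None:
                    self.values[k] = v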
Implementing get-core using Gossip • Replace the broadcast sends with asynchronous gossip. • Majority gossip is sufficient. • Note: the gossip instances start asynchronously (not all processes finish phase 1 and start phase 2 at the same time). • Assuming a process begins gossiping as soon as it receives a rumor, the asymptotic complexity remains the same. • To achieve this, if a process receives a rumor from a gossip protocol it has not yet initiated, it adopts the state of the sender and proceeds to gossip accordingly.
Analysis of the Algorithms • Theorem 13: For an oblivious adversary and f < n/2, the consensus algorithms based on EARS, SEARS and TEARS using the Rabin-Canetti framework have the same complexity as the underlying gossip protocols. • In particular, the algorithm based on TEARS has: • O(d+δ) time complexity • O(n^(7/4) log²n) message complexity • This is the first randomized asynchronous consensus algorithm that terminates in constant time w.r.t. n with strictly sub-quadratic message complexity.