Timeliness, Failure Detectors, and Consensus Performance • Idit Keidar and Alexander Shraer • Technion – Israel Institute of Technology
Basic Model • Message passing • Links between every pair of processes • do not create, duplicate or alter messages (integrity) • Process and link failures
Eventually Stable (Indulgent) Models • Initially asynchronous • for an unbounded period of time • Eventually reach stabilization • GST (Global Stabilization Time) • following GST, certain assumptions hold • Examples • ES (Eventual Synchrony): starting from GST, all links have a bound on message delay [Dwork, Lynch, Stockmeyer 88] • failure detectors [Chandra, Toueg 96], [Chandra, Hadzilacos, Toueg 96]
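To make the ES condition concrete, here is a minimal sketch that checks it on a finite trace. It assumes each message is recorded as a (send_time, recv_time) pair and that the delay bound `delta` is known; `Message` and `satisfies_es` are hypothetical names, not from the talk. Messages sent before GST are unconstrained, while every message sent at or after GST must arrive within `delta`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Message:
    send_time: float
    recv_time: Optional[float]  # None means the message was never delivered


def satisfies_es(msgs: Sequence[Message], gst: float, delta: float) -> bool:
    """Check ES on a finite trace: every message sent at or after GST is
    delivered within delta time units. Messages sent before GST belong to
    the asynchronous prefix and are not constrained."""
    for m in msgs:
        if m.send_time < gst:
            continue  # pre-GST: arbitrary delay allowed
        if m.recv_time is None or m.recv_time - m.send_time > delta:
            return False
    return True


trace = [Message(0.0, 50.0), Message(10.0, 11.0), Message(12.0, 12.5)]
print(satisfies_es(trace, gst=10.0, delta=1.0))  # True: the slow message predates GST
```

Moving GST earlier (e.g. to 0.0) makes the same trace violate ES, because the 50-unit delay then falls after stabilization.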
Indulgent Models: Research Trend • Weaken post-GST assumptions as much as possible [Guerraoui, Schiper 96], [Aguilera et al. 03, 04], [Malkhi et al. 05] • Weaker = better?
Indulgent Models: Research Trend • “You only need ONE machine with eventually ONE timely link. Buy the hardware to ensure it, set the timeout accordingly, and EVERYTHING WILL WORK.”
Consensus with Weak Assumptions • “Why isn’t anything happening???” • Network: “Don’t worry! It will eventually happen!”
What’s Going On? • In practice, bounds only need to hold “long enough” for the algorithm to finish, i.e., for its running time TA • But TA depends on our synchrony assumptions • with weak assumptions, TA might be unbounded • For practical systems, eventual completion of the job is not enough!
Our Goal • Understand the relationship between: • assumptions (1 timely link, failure detectors, etc.) that eventually hold • performance of algorithms that exploit these assumptions, and only them • Challenge: How do we understand the performance of asynchronous algorithms that make very different assumptions?
Typical Metric: Count “Rounds” • Algorithms normally progress in rounds, though rounds are not synchronized among processes
at process pi:
  forever do
    send messages
    receive messages while (!some conditions)
    compute…
• Previous work: • look at synchronous runs (every message takes exactly δ time) • count rounds or “δ”s [Keidar, Rajsbaum 01], [Dutta, Guerraoui 02], [Guerraoui, Raynal 04], [Dutta et al. 03], etc.
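The loop above can be simulated in a few lines. This is a sketch under stated assumptions, not the authors' code: all processes run in lockstep for simulation purposes only, a hypothetical scheduler `arrival(p, k)` fixes the order in which round-k messages reach process p, and `end_of_round` plays the role of “some conditions”. The names `simulate_rounds`, `ring_order`, `majority`, and `keep_max` are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Set


def simulate_rounds(states: List[int],
                    arrival: Callable[[int, int], Sequence[int]],
                    end_of_round: Callable[[Set[int]], bool],
                    compute: Callable[[int, Dict[int, int]], int],
                    rounds: int) -> List[int]:
    """Run `rounds` iterations of send / receive-while / compute.

    arrival(p, k): order in which round-k messages reach p (hypothetical scheduler).
    end_of_round(senders): waiting condition; p stops receiving once it holds.
    """
    n = len(states)
    for k in range(1, rounds + 1):
        sent = list(states)  # send: every process sends its current state
        new_states = []
        for p in range(n):
            received: Dict[int, int] = {}
            # receive messages while (!end_of_round)
            for q in arrival(p, k):
                if end_of_round(set(received)):
                    break
                received[q] = sent[q]
            new_states.append(compute(states[p], received))  # compute
        states = new_states
    return states


N = 5

def ring_order(p: int, k: int) -> List[int]:
    return [(p + i) % N for i in range(N)]  # p hears itself, then its successors

def majority(senders: Set[int]) -> bool:
    return len(senders) > N // 2

def keep_max(own: int, received: Dict[int, int]) -> int:
    return max([own, *received.values()])

print(simulate_rounds([1, 2, 3, 4, 9], ring_order, majority, keep_max, 2))  # [9, 9, 9, 9, 9]
```

Swapping `majority` for a different predicate changes the model without touching `keep_max`, which is the separation of logic and waiting condition that GIRAF formalizes.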
Are All “Rounds” the Same? • Algorithm 1 waits for messages from a majority that includes a pre-defined leader in each round • takes 3 rounds • Algorithm 2 waits for messages from all (unsuspected) processes in each round • E.g., group membership • takes 2 rounds
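The two waiting conditions can be written as end-of-round predicates, which shows why their rounds are not equally expensive: the second must wait for every unsuspected process, so one slow but unsuspected process stalls the round. The function names and the `senders`/`suspected` parameters are hypothetical, chosen for this sketch.

```python
from typing import AbstractSet


def majority_with_leader(senders: AbstractSet[int], n: int, leader: int) -> bool:
    """Algorithm 1: end the round once a majority, including the leader, was heard from."""
    return len(senders) > n // 2 and leader in senders


def all_unsuspected(senders: AbstractSet[int], n: int, suspected: AbstractSet[int]) -> bool:
    """Algorithm 2: end the round once every currently unsuspected process was heard from."""
    unsuspected = set(range(n)) - set(suspected)
    return unsuspected <= set(senders)


# Five processes; the leader (0) and two others have been heard from.
heard = {0, 1, 2}
print(majority_with_leader(heard, 5, leader=0))   # True
print(all_unsuspected(heard, 5, suspected={4}))   # False: still waiting for process 3
```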
GIRAF: General Round-based Algorithm Framework • Inspired by Gafni’s RRFD, generalizes it • Organize algorithms into rounds • Separate algorithm logic from waiting condition • Waiting condition defines model • Allows reasoning about lower and upper bounds for rounds of different types
Defining Properties in GIRAF • Environment can have • perpetual properties • eventual properties • In every run r, there exists a round GSR(r) • GSR(r) – the first round from which: • no process fails • all eventual properties hold in each round
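On a finite recorded trace, GSR(r) can be read off as the start of the longest suffix of rounds in which no process fails and all eventual properties hold. A minimal sketch, assuming the trace is summarized as two per-round boolean lists (a hypothetical format) with rounds indexed from 0:

```python
from typing import Optional, Sequence


def gsr_of_trace(someone_fails: Sequence[bool], eventual_ok: Sequence[bool]) -> Optional[int]:
    """First round index from which, for the rest of this finite trace,
    no process fails and all eventual properties hold.

    someone_fails[k]: some process crashes in round k.
    eventual_ok[k]:   every eventual property holds in round k.
    Returns None if the last recorded round already violates the conditions.
    """
    start = None
    for k in range(len(someone_fails) - 1, -1, -1):  # scan the suffix backwards
        if someone_fails[k] or not eventual_ok[k]:
            break
        start = k
    return start


print(gsr_of_trace([False, True, False, False, False],
                   [True, True, False, True, True]))  # 3
```

In an actual (infinite) run, GSR exists by assumption; the `None` case only reflects that a finite prefix may end before stabilization.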
Defining Timeliness • Timely link in round k: pd receives the round k message of ps in round k • if pd is correct and ps executes round k (end-of-round occurs in round k) • Time-free!
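Because the definition compares only round numbers, it can be checked on a delivery log with no clocks at all. This sketch assumes a hypothetical log format mapping (src, dst, round_sent) to the round in which dst received the message; undelivered messages are absent. `timely_in_round` is an illustrative name.

```python
from typing import Dict, Tuple

# Hypothetical trace format: (src, dst, round_sent) -> round in which dst received it.
Deliveries = Dict[Tuple[int, int, int], int]


def timely_in_round(deliveries: Deliveries, src: int, dst: int, k: int,
                    dst_correct: bool, src_executes_round: bool) -> bool:
    """Is the link src -> dst timely in round k? Only round numbers are compared."""
    if not (dst_correct and src_executes_round):
        return True  # the definition places no obligation on this link in round k
    return deliveries.get((src, dst, k)) == k


log: Deliveries = {(0, 1, 3): 3, (0, 1, 4): 5}
print(timely_in_round(log, 0, 1, 3, True, True))  # True
print(timely_in_round(log, 0, 1, 4, True, True))  # False: round-4 message arrived in round 5
```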
Some Results: Context • Consensus problem • Global decision time metric • Time until all correct processes decide • Message passing • Crash failures • t < n/2 potential failures out of n > 1 processes
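The global decision metric can be computed from the round in which each process decides. This sketch assumes rounds are counted inclusively from GSR (a decision in round GSR+2 counts as 3 rounds), which is how we read the “Rounds: GSR, GSR+1, GSR+2” labels of the example run; `rounds_from_gsr` and its parameters are hypothetical.

```python
from typing import Mapping, Optional, Set


def rounds_from_gsr(decide_round: Mapping[int, int], correct: Set[int], gsr: int) -> Optional[int]:
    """Global decision time: rounds from GSR until the last correct process decides,
    counted inclusively. Faulty processes are ignored. Returns None if some correct
    process has not decided in the recorded trace. Assumes `correct` is non-empty,
    which holds since t < n/2."""
    if any(p not in decide_round for p in correct):
        return None
    last = max(decide_round[p] for p in correct)
    return max(0, last - gsr + 1)


print(rounds_from_gsr({0: 102, 1: 102, 2: 101}, {0, 1, 2}, gsr=100))  # 3
```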
◊LM Model: Leader and Majority • Nothing required before GSR • In every round k ≥ GSR • Every correct process receives a round k message from a majority of processes, one of which is the Ω-leader. • In practice, requires much shorter timeouts than Eventual Synchrony [Bakr, Keidar]
◊LM: Previous Work • Most Ω-based algorithms wait for majority in each round (not ◊LM) • Paxos [Lamport 98] works for ◊LM • Takes constant number of rounds in Eventual Synchrony (ES) • But how many rounds without ES?
Paxos Run in ES • [Figure: the Ω-leader sends (“prepare”, 2) and later (“prepare”, 21); processes answer yes or no according to their BallotNum; after (“prepare”, 21) is answered yes, the leader sends (Commit, 21, v1) and all decide v1. BallotNum: number of attempts to decide initiated by leaders.]
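Paxos in ◊LM (w/out ES) • [Figure: rounds GSR through GSR+3; the Ω-leader sends (“prepare”, 2), (“prepare”, 9), (“prepare”, 14), which are rejected with replies such as “no (5)”, “no (8)”, “no (13)” reporting higher BallotNum values (up to 20), so the leader retries with a higher ballot each round.] • Commit may take Ω(n) rounds!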
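The Ω(n) behavior comes from the leader discovering higher ballots one at a time. The sketch below counts prepare rounds under an assumed adversarial schedule that is not taken from the talk: in each round the responding majority contains only one process with a higher ballot (the smallest one above the leader's), and the leader retries with that ballot + 1. `prepare_attempts` is a hypothetical name.

```python
from typing import Iterable


def prepare_attempts(ballots: Iterable[int], leader_ballot: int) -> int:
    """Count prepare rounds until no process holds a higher ballot.

    Assumed schedule: each round reveals exactly one higher ballot to the
    leader (the smallest one above its current ballot).
    """
    held = sorted(ballots)
    b = leader_ballot
    rounds = 0
    while True:
        rounds += 1
        higher = [x for x in held if x > b]
        if not higher:
            return rounds  # nobody can answer "no": this prepare succeeds
        b = higher[0] + 1  # one "no (x)" per round reveals only one higher ballot


print(prepare_attempts([5, 8, 13, 20], 2))  # 5
```

With m distinct higher ballots spread over the processes, the count is m + 1, i.e., linear in the number of processes.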
What Can We Hope For? • Tight lower bound for ES: 3 rounds from GSR [DGK05] • ◊LM is weaker than ES • One might expect consensus to take longer in ◊LM than in ES
Result 1: Don't Need ES • Leader and majority can give you the same performance! • Algorithm that matches the 3-round lower bound for ES!
Our ◊LM Algorithm in a Nutshell • Commit with increasing ballot numbers, decide on a value committed by a majority • like Paxos, etc. • Challenge: we don’t know all ballots; how do we choose the new one to be the highest? • Solution: choose it to be the round number • Challenge: rounds are wasted if a prepare/commit fails • Solution: pipeline prepares and commits: try in each round • Challenge: do they really need to say no? • Solution: support the leader’s prepare even if you have a higher ballot number • Challenge: a higher number may reflect a later decision! Won’t agreement be compromised? • Solution: a new field, “trustMe”, ensures the supported leader doesn't miss real decisions
Example Run: GSR=100 • [Figure: rounds GSR, GSR+1, GSR+2; processes start with ballot numbers 1, 5, 8, 13, 20; the Ω-leader sends &lt;PREPARE, …, trustMe&gt; and processes move to ballot 101; then all PREPARE with !trustMe, all COMMIT, and all DECIDE. One earlier value is annotated “Did not lead to decision”.]
Question 2: ◊S and Ω Equivalent? • ◊S and Ω equivalent in the “classical” sense [Chandra, Hadzilacos, Toueg 96] • Weakest for consensus • ◊S: eventually (from GSR onward), • all faulty processes are suspected by every correct process • there exists one correct process that is not suspected by any correct process. • Can we substitute Ω with ◊S in ◊LM?
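The two properties can be checked on a single post-GSR snapshot of failure-detector outputs. In the example, ◊S holds because nobody suspects process 0, yet no process is told which process that is; Ω, by contrast, names one common leader. This is an informal intuition only, not the proof behind Result 2. The snapshot format and function names are hypothetical.

```python
from typing import Mapping, Set


def eventually_s_holds(suspects: Mapping[int, Set[int]], correct: Set[int], faulty: Set[int]) -> bool:
    """One post-GSR snapshot: suspects[p] is the set correct process p suspects."""
    completeness = all(faulty <= suspects[p] for p in correct)
    # some correct process is suspected by no correct process
    accuracy = any(all(q not in suspects[p] for p in correct) for q in correct)
    return completeness and accuracy


def omega_holds(leader_of: Mapping[int, int], correct: Set[int]) -> bool:
    """All correct processes trust the same correct leader."""
    leaders = {leader_of[p] for p in correct}
    return len(leaders) == 1 and leaders <= correct


correct, faulty = {0, 1, 2}, {3}
snapshot = {0: {1, 3}, 1: {2, 3}, 2: {3}}
print(eventually_s_holds(snapshot, correct, faulty))  # True: nobody suspects 0
```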
Result 2: ◊S and Ω not that Equivalent • With ◊S in place of Ω, consensus takes linear time from GSR • By reduction to the mobile failure model [Santoro, Widmayer 89]
Result 3: Do We Need Oracles? • Timely communication with a majority suffices! • ◊AFM (All-From-Majority), simplified: • In every round k ≥ GSR, every correct process p receives round k messages from a majority of processes, and p’s message reaches a majority of processes. • Decision in 5 rounds from GSR • First constant-time algorithm without an oracle or ES • Idea: information passes to all nodes in 2 rounds
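The 2-round idea rests on majority intersection: the processes that received p’s round-k message form a majority A, the processes that q hears from in round k+1 form a majority B, and A and B must share some relay r. The brute-force check below confirms the intersection for small n. It assumes, for illustration, that r includes what it learned in round k in its round-(k+1) message; the function names are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def minimal_majorities(n: int) -> List[Set[int]]:
    """All subsets of {0..n-1} of size floor(n/2)+1; every majority contains one."""
    return [set(c) for c in combinations(range(n), n // 2 + 1)]


def relay_always_exists(n: int) -> bool:
    """A = processes that got p's round-k message; B = processes q hears from in
    round k+1. If both are majorities, some r in A & B can relay p's information to q."""
    ms = minimal_majorities(n)
    return all(a & b for a in ms for b in ms)


print(all(relay_always_exists(n) for n in range(2, 8)))  # True
```

Checking only minimal majorities suffices, because any larger majority contains one of them and therefore intersects at least as much.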
Result 4: Can We Assume Less? • ◊MFM: Majority from Majority • The rest receive a message from a minority • Only a little missing for ◊AFM • Stronger than models in the literature [Aguilera et al. 03, 04], [Malkhi et al. 05] • Bounded time from GSR is impossible!
Conclusions • Which guarantees should one implement? • weaker ≠ better • some previously suggested assumptions are too weak • sometimes a little stronger = much better • worth longer timeouts / better hardware • ES is not essential • not worth longer timeouts / better hardware • future: more models, bounds to explore • GIRAF