Asynchronous Consensus Ken Birman
Outline of talk • Reminder about models • Asynchronous consensus: Impossibility result • Solution to the problem • With an “oracle” that detects failures • Without oracles, using timeout • Big issues? Revisit from Byzantine agreement • Is this model realistic? In what ways is it “legitimate”? • Should we focus on impossibility, or “possibility”? • Asynchronous consensus in real world systems
Distributed Computing Models • Recall that we had two models • To reason about networks and applications we need to be precise about the setting in which our protocols run • But “real world” networks are very complex • They can drop packets, or reorder them • Intruders might be able to intercept and modify data • Timing is totally unpredictable
Asynchronous network model • Asynchronous because we lack clocks: • Network can arbitrarily delay a message • But we assume that messages are sequenced and retransmitted (arbitrary numbers of times), so they eventually get through. • “Free” to say: lossless, ordered • No value to assumptions about process speed • Failures in asynchronous model? • Usually, limited to process “crash” faults • If detectable, we call this “fail-stop” – but how to detect?
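A minimal sketch of this network model, assuming a toy simulator rather than any real transport: messages are never lost, each channel is FIFO, and the order in which channels make progress is arbitrary. The class `AsyncNetwork` and the helper `run_to_quiescence` are hypothetical names introduced for illustration only.

```python
import random
from collections import deque


class AsyncNetwork:
    """Toy asynchronous network: lossless, FIFO per channel, unbounded delay."""

    def __init__(self, seed=0):
        self.channels = {}              # (src, dst) -> deque of pending messages
        self.rng = random.Random(seed)  # stands in for the arbitrary scheduler

    def send(self, src, dst, msg):
        self.channels.setdefault((src, dst), deque()).append(msg)

    def pending(self):
        return sum(len(q) for q in self.channels.values())

    def deliver_one(self):
        """Deliver the head of one arbitrarily chosen non-empty channel."""
        ready = sorted(ch for ch, q in self.channels.items() if q)
        if not ready:
            return None
        src, dst = self.rng.choice(ready)
        return src, dst, self.channels[(src, dst)].popleft()


def run_to_quiescence(net):
    """Keep delivering until nothing is pending; return the delivery log."""
    log = []
    while net.pending():
        log.append(net.deliver_one())
    return log
```

The random choice only models "no assumption about timing". A true adversary may pick any order it likes, as long as every message is eventually delivered.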
An asynchronous network Not causal!
An asynchronous network Time shrinks…
An asynchronous network Time shrinks… Time stretches…
Justification? • If we can do something in the asynchronous model, we can probably do it even better in a real network • Clocks, a-priori knowledge can only help… • But today we will focus on an impossibility result • By definition, impossibility in this model means “xxx can’t always be done”
Paradigms • Fundamental problems, the solution of which yields general insight into a broad class of questions • In distributed systems: • Agreement (on value proposed by a leader) • Consensus (everyone proposes a value… pick one) • Electing a leader • Atomic broadcast/multicast (send a message, reliably, to everyone who isn’t faulty, such that concurrent messages are delivered in the same order everywhere) • Deadlock detection, clock or process synchronization, taking a snapshot (“picture”) of the system state….
Consensus problem • Models distributed agreement • Comes in various forms (with subtle differences in the associated results)! • With a leader: leader gives an order, like “attack”, and non-faulty participants either attack or do nothing, despite some limited number of failures: Byzantine Agreement • Without a leader: participants have an initial vote; protocol runs and eventually all non-faulty participants choose the same outcome, and it is one of the initial votes (typically, 0 or 1): Fault-tolerant Consensus
Consensus problem [Figure: processes labeled P0, Q0, R1, then P1, Q1, R1]
Fault-tolerance • Goal: an algorithm tolerant of one failure • Failure: process crashes but this is not detectable • So the algorithm must work both in the face of arbitrary message delay caused by the network, and in the event of a single failure
If some process stays up… • Suppose we knew that P won’t fail • Then P could simply broadcast its input • All would “decide” upon this value • Solves the problem
If one process stays up • Indeed, suppose that P stays up only long enough to send one message • But there is only one failure • And we knew that P would “lead” • Then we can relay P’s message, using an all-to-all broadcast
Algorithm • P: broadcast my input • Q ≠ P: on receiving P’s message for the first time, broadcast a copy • Tolerates anything except failure of P in the first step, but we need to agree upon “P” before starting (i.e., P is the least ranked process, using alphabetic ranking)
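The relay step can be sketched as a small simulation. This is an illustration under stated assumptions, not the lecture's code: the function name `relay_broadcast` and its parameter `reached_before_crash` (the set of processes that P's broadcast reached before P possibly crashed) are hypothetical, and the single permitted failure is assumed to be P itself.

```python
def relay_broadcast(processes, leader, leader_input, reached_before_crash):
    """Simulate the leader-then-relay algorithm.

    `reached_before_crash` must be non-empty: the algorithm tolerates
    anything except the leader failing before it sends a single message.
    Returns a dict mapping each non-leader process to its decided value.
    """
    if not reached_before_crash:
        raise ValueError("leader failed in the first step; not tolerated")
    decided = {}
    inbox = [(q, leader_input) for q in sorted(reached_before_crash)]
    while inbox:
        q, value = inbox.pop()
        if q in decided:
            continue  # relay only on first receipt
        decided[q] = value
        # q relays a copy to every other non-leader process.
        inbox.extend((r, value) for r in processes if r != q and r != leader)
    return decided
```

For example, with processes P, Q, R, S and a leader P that reaches only R before crashing, R's relay still carries P's input to Q and S.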
Another algorithm • All processes start by broadcasting their own value to all other processes • If we know that there is always exactly one failure, each process could wait until it has n-1 values, then decide using any deterministic rule • But this doesn’t work if sometimes we have one failure, sometimes none
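A small worked example of why this breaks, assuming a process counts its own value toward the n-1 it waits for, that the deterministic rule is `min`, and that a crashed process sends nothing at all. The function `decide` is a hypothetical name used only for this sketch.

```python
def decide(own_value, received, n):
    """Decide once n-1 values (own plus received, in arrival order) are in
    hand, using min as the deterministic rule; return None while waiting."""
    values = [own_value] + list(received)
    if len(values) < n - 1:
        return None  # still waiting for more messages
    return min(values[: n - 1])


n = 3
# Inputs: P=0, Q=1, R=1.

# Exactly one failure (R crashed before sending): P and Q see the same values.
p_with_crash = decide(0, [1], n)   # P hears Q
q_with_crash = decide(1, [0], n)   # Q hears P

# No failure: arrival order differs per process, so the first n-1 values differ.
p_no_crash = decide(0, [1], n)     # P hears Q first
q_no_crash = decide(1, [1], n)     # Q hears R first, P's message is delayed
```

In the crash run both survivors decide 0. In the failure-free run P decides 0 while Q decides 1, because neither can tell a slow process from a crashed one.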
FLP result • Considers general case • Assumes an algorithm that can decide with zero or one failures • Proves that this algorithm can be prevented from reaching decision, indefinitely
Basic idea • Think of system state as a “configuration” • Configuration is v-valent if decision to pick v has become inevitable: all runs lead to v • If not 0-valent or 1-valent, configuration is bivalent • Initial configurations include • At least one 0-valent: {0,0,…,0} • At least one 1-valent: {1,1,…,1} • At least one bivalent: {0,0,…,1,1}
Basic idea [Figure: regions of 0-valent configurations, bivalent configurations, and 1-valent configurations]
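The valence definitions can be made concrete on a toy configuration graph. This is only a sketch: the graph, the names `successors` and `decision`, and both functions are assumptions for illustration. A real protocol's configuration space is far larger and usually infinite.

```python
def reachable_decisions(config, successors, decision):
    """Collect every decision value reachable from `config`."""
    seen, stack, outcomes = set(), [config], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        if c in decision:
            outcomes.add(decision[c])
        stack.extend(successors.get(c, ()))
    return outcomes


def valence(config, successors, decision):
    """Classify a configuration by the decisions its runs can reach."""
    outcomes = reachable_decisions(config, successors, decision)
    if outcomes == {0}:
        return "0-valent"
    if outcomes == {1}:
        return "1-valent"
    if outcomes == {0, 1}:
        return "bivalent"
    return "no decision reachable"


# Hypothetical graph: from C, one step leads toward deciding 0, another toward 1.
successors = {"C": ["C0", "C1"], "C0": ["D0"], "C1": ["D1"]}
decision = {"D0": 0, "D1": 1}
```

Here `C` is bivalent, while `C0` and `C1` are already univalent even though no process has decided yet.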
Transitions between configurations • Configuration is a set of processes and messages • Applying a message to a process changes its state, hence it moves us to a new configuration • Because the system is asynchronous, can’t predict which of a set of concurrent messages will be delivered “next” • But because processes only communicate by messages, this is unimportant
Basic Lemma • Suppose that from some configuration C, the schedules σ1 and σ2 lead to configurations C1 and C2, respectively. • If the sets of processes taking actions in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1 and σ1 to C2, and both lead to the same configuration C3
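Basic Lemma [Figure: diamond diagram. From C, schedule σ1 leads to C1 and schedule σ2 leads to C2; applying σ2 to C1 and σ1 to C2 both lead to C3]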
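The commutativity in the lemma can be checked on a deliberately simplified model, assuming a configuration is just a mapping from each process to the tuple of messages it has consumed (the message buffer is left out). `apply_schedule` is a hypothetical helper for this sketch.

```python
def apply_schedule(config, schedule):
    """Apply a schedule of (process, message) events to a configuration.

    Each event changes only the state of the process that takes the step.
    """
    new = dict(config)
    for proc, msg in schedule:
        new[proc] = new[proc] + (msg,)
    return new


C = {"P": (), "Q": (), "R": ()}
sigma1 = [("P", "m1"), ("P", "m2")]   # only P takes steps
sigma2 = [("Q", "m3")]                # only Q takes steps

C1 = apply_schedule(C, sigma1)
C2 = apply_schedule(C, sigma2)
C3_via_C1 = apply_schedule(C1, sigma2)
C3_via_C2 = apply_schedule(C2, sigma1)
```

Because the two schedules touch disjoint sets of processes, both paths end in the same configuration. Two schedules that both step P would not commute in general.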
Main result • No consensus protocol is totally correct in spite of one fault • Note: Uses total in formal sense (guarantee of termination)
Basic FLP theorem • Suppose we are in a bivalent configuration now and later will enter a univalent configuration • We can draw a form of frontier, such that a single message to a single process triggers the transition from bivalent to univalent
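Basic FLP theorem [Figure: bivalent configuration C, events e and e’, configurations D0, C1, and D1, and the frontier between the bivalent and univalent regions]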
Single step decides • They prove that any run that goes from a bivalent state to a univalent state has a single decision step, e • They show that it is always possible to schedule events so as to block such steps • Eventually, e can be scheduled but in a state where it no longer triggers a decision
Basic FLP theorem • They show that we can delay this “magic message” and cause the system to take at least one step, remaining in a new bivalent configuration • Uses the diamond-relation seen earlier • But this implies that in a bivalent state there are runs of indefinite length that remain bivalent • Proves the impossibility of fault-tolerant consensus
Notes on FLP • No failures actually occur in this run, just delayed messages • Result is purely abstract. What does it “mean”? • Says nothing about how probable this adversarial run might be, only that at least one such run exists
FLP intuition • Suppose that we start a system up with n processes • Run for a while… close to picking value associated with process “p” • Someone will do this for the first time, presumably on receiving some message from q • If we delay that message, and yet our protocol is “fault-tolerant”, it will somehow reconfigure • Now allow the delayed message to get through but delay some other message
Key insight • FLP is about forcing a system to attempt a form of reconfiguration • This takes time • Each “unfortunate” suspected failure causes such a reconfiguration
FLP and our first algorithm • P is the leader and is supposed to send its input to Q • Q “times out” and • Tells everyone that P has apparently failed • Then can disseminate its own value • If P wakes up, we re-admit it to the system but it is no longer considered least ranked • One can make such algorithms work… • But they can be attacked by delaying first P, then Q, then R, etc
FLP in the real world • Real systems are subject to this impossibility result • But in fact often are subject to even more severe limitations, such as inability to tolerate network partition failures • Also, asynchronous consensus may be too slow for our taste • And FLP attack is not probable in a real system • Requires a very smart adversary!
Chandra/Toueg • Showed that FLP applies to many problems, not just consensus • In particular, they show that FLP applies to group membership, reliable multicast • So these practical problems are impossible in asynchronous systems, in formal sense • But they also look at the weakest condition under which consensus can be solved
Chandra/Toueg Idea • Separate problem into • The consensus algorithm itself • A “failure detector:” a form of oracle that announces suspected failure • But it can change its mind • Question: what is the weakest oracle for which consensus is always solvable?
Sample properties • Completeness: detection of every crash • Strong completeness: Eventually, every process that crashes is permanently suspected by every correct process • Weak completeness: Eventually, every process that crashes is permanently suspected by some correct process
Sample properties • Accuracy: does it make mistakes? • Strong accuracy: No process is suspected before it crashes. • Weak accuracy: Some correct process is never suspected • Eventual strong accuracy: there is a time after which correct processes are not suspected by any correct process • Eventual weak accuracy: there is a time after which some correct process is not suspected by any correct process
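The completeness and accuracy properties can be checked mechanically on a finite trace. This is a rough sketch under an explicit assumption: "eventually" and "permanently" are read as "from some time until the end of the recorded trace", which only approximates the real definitions over infinite runs. The trace format (a list of snapshots mapping each correct observer to its suspect set) and all function names are hypothetical.

```python
def permanently_suspected_from(history, observer, target):
    """Earliest time from which `observer` suspects `target` through the end
    of the trace, or None if there is no such time."""
    start = None
    for t, snapshot in enumerate(history):
        if target in snapshot[observer]:
            if start is None:
                start = t
        else:
            start = None
    return start


def strong_completeness(history, correct, crashed):
    """Every crashed process ends up suspected by every correct process."""
    return all(permanently_suspected_from(history, p, c) is not None
               for c in crashed for p in correct)


def weak_completeness(history, correct, crashed):
    """Every crashed process ends up suspected by some correct process."""
    return all(any(permanently_suspected_from(history, p, c) is not None
                   for p in correct)
               for c in crashed)


def eventual_weak_accuracy(history, correct):
    """Some correct process is unsuspected by all correct processes over a
    suffix of the trace; on a finite trace the final snapshot decides this."""
    last = history[-1]
    return any(all(q not in last[p] for p in correct) for q in correct)


correct, crashed = {"P", "Q"}, {"R"}
history = [
    {"P": set(), "Q": {"P"}},   # Q wrongly suspects P
    {"P": {"R"}, "Q": {"P"}},   # P notices that R has crashed
    {"P": {"R"}, "Q": set()},   # Q retracts its mistake
]
```

On this trace the detector makes a mistake early, so strong accuracy fails, yet weak completeness and eventual weak accuracy both hold.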
Perfect Detector? • Named Perfect, written P • Strong completeness and strong accuracy • Immediately detects all failures • Never makes mistakes
Example of a failure detector • The detector they call W: “eventually weak” • More commonly: ◇W: “diamond-W” • Defined by two properties: • There is a time after which every process that crashes is suspected by some correct process • There is a time after which some correct process is never suspected by any correct process • Think: “we can eventually agree upon a leader.” If it crashes, “we eventually, accurately detect the crash”
◇W: Weakest failure detector • They show that ◇W is the weakest failure detector for which consensus is guaranteed to be achieved • Algorithm is pretty simple • Rotate a token around a ring of processes • Decision can occur once token makes it around once without a change in failure-suspicion status for any process • Subsequently, as token is passed, each recipient learns the decision outcome
Rotating a token versus 2-phase commit [Figure: message-flow diagram labeled “Propose v…”, “ack…”, “Decide v”, with a “phase” marked]
Rotating a token versus 2-phase commit • Their protocol is basically a 2-phase commit • But with n processes, 2PC requires 2(n-1) messages in the first phase (propose plus acks) and n-1 in the second (decide), 3(n-1) total • Passing a token only requires n messages per phase, for 2n total (when nothing fails) • Tolerates f < n/2 failures
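The message counts on this slide can be tabulated directly. A minimal sketch, assuming the failure-free totals stated above (3(n-1) for 2PC, 2n for the token) and the bound f < n/2; the function names are hypothetical.

```python
def two_phase_commit_messages(n):
    """Total messages for 2PC among n processes when nothing fails."""
    return 3 * (n - 1)


def token_ring_messages(n):
    """Total messages for two trips of the token around n processes."""
    return 2 * n


def max_tolerated_failures(n):
    """Largest integer f satisfying f < n/2."""
    return (n - 1) // 2


# (n, 2PC total, token total, tolerated failures)
comparison = [(n, two_phase_commit_messages(n), token_ring_messages(n),
               max_tolerated_failures(n)) for n in (3, 4, 5, 10)]
```

Under these formulas the two schemes tie at n = 3, and the token needs fewer messages for every larger n.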
Set of problems solvable in: [Figure relating the problems clock synchronization, TRB, non-blocking atomic commit, consensus, atomic broadcast, and reliable broadcast to the models in which they are solvable: synchronous systems, asynchronous using P, asynchronous using ◇W, and asynchronous] • TRB: Byzantine Generals with only crash failures
Building systems with ◇W • Unfortunately, this failure detector is not implementable • Using timeouts we can make mistakes at arbitrary times • But with long enough timeouts, could produce a close approximation to ◇W
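One common way to approximate such a detector with timeouts is heartbeat-based suspicion, where the timeout for a peer grows each time that peer is suspected by mistake. The sketch below is an assumption-laden illustration, not something described in the slides: the class `TimeoutDetector`, its parameters, and the doubling policy are all hypothetical, and nothing here escapes the impossibility result, since a slow enough network can still trigger a false suspicion.

```python
class TimeoutDetector:
    """Heartbeat-based suspicion with a timeout that grows after each mistake."""

    def __init__(self, peers, initial_timeout=1.0, backoff=2.0):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heard = {p: 0.0 for p in peers}
        self.suspected = set()
        self.backoff = backoff

    def heartbeat(self, peer, now):
        """Record a heartbeat from `peer` received at time `now`."""
        self.last_heard[peer] = now
        if peer in self.suspected:
            # We suspected a live process: retract and be more patient.
            self.suspected.discard(peer)
            self.timeout[peer] *= self.backoff

    def tick(self, now):
        """Suspect every peer that has been silent longer than its timeout."""
        for peer, heard in self.last_heard.items():
            if now - heard > self.timeout[peer]:
                self.suspected.add(peer)
        return set(self.suspected)
```

A detector like this can change its mind, which matches the oracle described earlier: suspicions are retracted when a heartbeat arrives, and each retraction makes the next false suspicion of that peer less likely.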
Would we want to? • Question: are we solving the right problem? • Pros and cons of asynchronous consensus • Think about an air traffic control application • Find one problem for which asynchronous consensus is a good match • Find one problem for which the match is poor
French ATC system (simplified) Onboard Radar X.500 Directory Controllers Air Traffic Database (flight plans, etc)
Potential applications • Maintaining replicated state within console clusters • Distributing radar data to participants • Distributing data over wide-area links within large geographic scale • Management and control (administration) of the overall system • Distributing security keys to prevent unauthorized action • Agreement when flight control handoffs occur
Broad conclusions? • The protocol seems unsuitable for high availability applications • If the core of the system must make progress, the agreement property itself is too strong • If a process becomes unresponsive, we might not want to wait for it to recover • Also, since we can’t implement any of these failure detectors, the whole issue is abstract… • Hence real systems don’t try to solve consensus as defined and used in these kinds of protocols!
Value of FLP/Consensus • A clear and elegant problem statement • Highlights limitations • Perhaps with clocks we can overcome them • More likely, we need a different notion of failure • “Crash failure” is too narrow, “unreachable” also treated as failure in many real systems • Caused much debate about real systems
Nature of debate • We’ll see many practical systems soon • Do they evade FLP in some way? • Or are they subject to FLP? If so, what problem do they “solve”, given that consensus (and most problems reduce to consensus) is impossible to solve? • Or are they subject to even more stringent limitations? • Is fault-tolerant consensus even an issue in real systems?