CS 294-8 Consensus http://www.cs.berkeley.edu/~yelick/294
Agenda • Overview and Administrivia • Specifications and verification • Consensus • Practical Issues in Consensus • Note: due in part to unreliable network and lack of reliability in ppt, most slides are stolen from Lamport.
Administrivia • So far: readings in distributed and fault tolerant systems • Next: specifying and reasoning about these systems • Readings for next few weeks will be set by Thursday • For Thursday: • SmartBridge (for talk) • Frangipani (for discussion)
Course Overview • So far: reading “systems” papers • Next few weeks: reading papers on algorithms and proofs • Why? • I know my algorithm works, but… • I found a missing case when I was implementing… • My advisor (or the PC) doesn’t believe me…
Agenda • Overview and Administrivia • Specifications and verification • Consensus • Practical Issues in Consensus
Highly Available Computing • High availability means either perfection or redundancy. • The system can work even when some parts are broken. • The simplest redundancy is replication: • Several copies of each part. • Each non-faulty copy does the same thing. • Every computing system works as a state machine. • So a replicated state machine can do highly available computing.
Replicated State Machines • If a state machine is deterministic, then feeding two copies the same inputs will produce the same outputs and states. • We call each copy a process. • So all we need is to agree on the inputs. • Examples: • Replicated storage with Read(a) and Write(a, d) steps. • Airplane flight control system with ReadInstrument(i) and RaiseFlaps(d) steps.
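The replicated storage example can be sketched in Python. This is a minimal illustration, not code from the lecture: the `StorageMachine` class, the `run` helper, and the tuple encoding of `Read(a)` and `Write(a, d)` are all assumptions made for the sketch. It shows that two copies of a deterministic machine fed the same inputs end in the same state with the same outputs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: ("write", address, data) or ("read", address, None).
Input = Tuple[str, str, Optional[int]]


class StorageMachine:
    """Deterministic state machine whose state maps addresses to data."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def step(self, inp: Input) -> Optional[int]:
        """Apply one input and return its output (None for writes)."""
        op, addr, data = inp
        if op == "write":
            if data is None:
                raise ValueError("write needs a data value")
            self.state[addr] = data
            return None
        if op == "read":
            return self.state.get(addr)
        raise ValueError(f"unknown operation: {op}")


def run(inputs: List[Input]) -> Tuple[List[Optional[int]], Dict[str, int]]:
    """Run a fresh copy (a process) on the given input sequence."""
    machine = StorageMachine()
    outputs = [machine.step(inp) for inp in inputs]
    return outputs, machine.state


# Both processes receive the same agreed-upon input sequence.
agreed_inputs: List[Input] = [
    ("write", "x", 3),
    ("read", "x", None),
    ("write", "x", 5),
]
replica_a = run(agreed_inputs)
replica_b = run(agreed_inputs)
```

Because `step` depends only on the current state and the input, `replica_a` and `replica_b` are equal. The hard part, agreeing on `agreed_inputs`, is what consensus provides.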
State Machine Approach • A distributed system is: • A finite set of processes • A process is: • A set of states, with one initial state • A set of events or actions • An execution is a possibly infinite sequence of alternating states and actions: s0 --a0--> s1 --a1--> s2 --a2--> ...
Properties • A stuttering transition has the form s --> s (the state does not change) • A property is a set of executions closed under stuttering [Abadi, Lamport 1990] • The clock still ticks after a program terminates • Stuttering is also useful in mapping between levels of abstraction
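Closure under stuttering can be illustrated on finite state sequences. The sketch below is an assumption-laden simplification: it only handles finite sequences, it represents an execution by its states alone, and the names `destutter` and `stutter_equivalent` are hypothetical. Two sequences that differ only in repeated states collapse to the same stutter-free form, so a property closed under stuttering must contain both or neither.

```python
from typing import Hashable, List, Sequence


def destutter(states: Sequence[Hashable]) -> List[Hashable]:
    """Drop every state that merely repeats its predecessor."""
    result: List[Hashable] = []
    for state in states:
        if not result or result[-1] != state:
            result.append(state)
    return result


def stutter_equivalent(a: Sequence[Hashable], b: Sequence[Hashable]) -> bool:
    """True when a and b differ only by stuttering steps."""
    return destutter(a) == destutter(b)
```

For example, `stutter_equivalent([0, 1, 2], [0, 0, 1, 2, 2])` holds, while `[0, 1]` and `[1, 0]` are not equivalent because the order of distinct states differs.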
Safety Properties • Informally: A safety property is one that says something bad doesn’t happen • Formally: A property P is a safety property iff: • If s is in P then any finite prefix of s is in P • Additionally, • If s is not in P then there is some finite prefix of s that is not in P • There is a point at which an illegal transition occurred • Safety properties can be finitely refuted.
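Finite refutation can be made concrete for the common case where the "bad thing" is an illegal transition. The following is a sketch under that assumption; `first_violation` and the example rule `never_decreases` are hypothetical names, not part of the lecture. It returns the index at which an illegal step occurs, so the prefix ending at that index is a finite witness that the execution is not in the property.

```python
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")


def first_violation(
    states: Sequence[S], allowed: Callable[[S, S], bool]
) -> Optional[int]:
    """Return the smallest i such that the step from states[i - 1] to
    states[i] is not allowed, or None if every step is allowed."""
    for i in range(1, len(states)):
        if not allowed(states[i - 1], states[i]):
            return i
    return None


def never_decreases(old: int, new: int) -> bool:
    """Example safety rule: a counter must never go down."""
    return new >= old


good = [0, 1, 1, 2]
bad = [0, 1, 0, 5]
bad_index = first_violation(bad, never_decreases)
```

Here the prefix `bad[: bad_index + 1]` already refutes the property, whatever states follow it. No analogous finite check exists for a liveness property.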
Liveness Properties • Informally: A liveness property says something good eventually happens • Formally: A property P is a liveness property iff: • Every finite behavior is a prefix of some behavior in P • Additionally, • Can always “complete” a finite behavior into one that is in P • Liveness properties cannot be finitely refuted.
Safety and Liveness • Every property (i.e., every set of behaviors) is the conjunction of: • A safety property and • A liveness property • Due to Alpern and Schneider, based on basic results from topology
Visible Behavior • A specification identifies a subset of its actions (or its state variables) as externally visible. • A state machine defines a set of allowable executions: • state: a set of values, usually divided into named variables. • actions: named changes in the state; internal and external. • State machines may be nondeterministic • In fact, Lampson encourages this in specs to allow flexibility in implementations
Implements • Y implements X if • every external behavior of Y is an external behavior of X • This expresses the idea that Y implements X if you can’t tell Y apart from X by looking only at the external actions • Examples: abstract data types, databases, distributed systems • Note: Lampson implicitly deals with finite behaviors, and therefore states the liveness property separately. (Doesn’t treat liveness in the proofs.)
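For small finite state machines, the "implements" relation can be checked by brute force on finite behaviors. The sketch below is illustrative only: the dictionary encoding of machines, the functions `external_traces` and `implements`, and the request/response example are assumptions. It enumerates every external trace up to a bounded length, hiding internal actions, and tests set inclusion. Like the finite-behavior treatment noted above, it says nothing about liveness.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> list of (action, next_state).
Machine = Dict[str, List[Tuple[str, str]]]
Trace = Tuple[str, ...]


def external_traces(
    machine: Machine, start: str, external: Set[str], max_len: int
) -> Set[Trace]:
    """All sequences of external actions of length <= max_len."""
    seen: Set[Tuple[str, Trace]] = {(start, ())}
    stack: List[Tuple[str, Trace]] = [(start, ())]
    while stack:
        state, trace = stack.pop()
        for action, nxt in machine.get(state, []):
            # Internal actions change the state but leave the trace alone.
            new_trace = trace + (action,) if action in external else trace
            if len(new_trace) > max_len:
                continue
            if (nxt, new_trace) not in seen:
                seen.add((nxt, new_trace))
                stack.append((nxt, new_trace))
    return {trace for _, trace in seen}


def implements(
    impl: Machine, impl_start: str,
    spec: Machine, spec_start: str,
    external: Set[str], max_len: int,
) -> bool:
    """Bounded check: every external trace of impl is one of spec."""
    return external_traces(impl, impl_start, external, max_len) <= \
        external_traces(spec, spec_start, external, max_len)


external = {"req", "resp"}
spec: Machine = {"idle": [("req", "busy")], "busy": [("resp", "idle")]}
# The implementation adds an internal "log" step between req and resp.
impl: Machine = {
    "idle": [("req", "logging")],
    "logging": [("log", "busy")],
    "busy": [("resp", "idle")],
}
# A faulty machine that responds without any request.
faulty: Machine = {"idle": [("resp", "idle")]}
```

With these definitions, `implements(impl, "idle", spec, "idle", external, 4)` holds because the internal `log` step is invisible, while `faulty` does not implement `spec`. Real proofs use an abstraction function and simulation rather than enumeration, which does not scale.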
Agenda • Overview and Administrivia • Specifications and verification • Consensus • Practical Issues in Consensus
Use of Consensus • Agreeing on some value is called consensus. • A replicated state machine needs to agree on a sequence of values: • Input 1 Write(x, 3) • Input 2 Read(x) • . . .
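One way to picture "a sequence of values" is a numbered log in which each slot is decided by a separate consensus instance. The sketch below assumes that picture; the `Replica` class, its `learn` method, and the slot numbering are hypothetical, and the consensus protocol itself is not shown. A replica may learn decisions out of order, but it applies them strictly in slot order, so all replicas still pass through the same states.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: ("write", address, data) or ("read", address, None).
Input = Tuple[str, str, Optional[int]]


class Replica:
    def __init__(self) -> None:
        self.decided: Dict[int, Input] = {}   # slot number -> agreed input
        self.next_slot = 1                    # next slot to apply
        self.store: Dict[str, Optional[int]] = {}
        self.read_results: List[Optional[int]] = []

    def learn(self, slot: int, inp: Input) -> None:
        """Record the value decided for a slot, then apply what is ready."""
        # Consensus must never decide two different values for one slot.
        if self.decided.get(slot, inp) != inp:
            raise ValueError(f"conflicting decisions for slot {slot}")
        self.decided[slot] = inp
        # Apply in slot order; stop at the first undecided slot.
        while self.next_slot in self.decided:
            op, addr, data = self.decided[self.next_slot]
            if op == "write":
                self.store[addr] = data
            else:
                self.read_results.append(self.store.get(addr))
            self.next_slot += 1


write_x: Input = ("write", "x", 3)
read_x: Input = ("read", "x", None)

in_order = Replica()
in_order.learn(1, write_x)
in_order.learn(2, read_x)

out_of_order = Replica()
out_of_order.learn(2, read_x)            # cannot be applied yet
results_before_gap_filled = list(out_of_order.read_results)
out_of_order.learn(1, write_x)           # now both slots are applied
```

The replica that learned slot 2 first holds it back until slot 1 arrives, so both replicas return the same result for `Read(x)`.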
Paxos Assumptions • Each legislator has • A ledger (stable storage) • An hourglass for time • Communication • Point-to-point, fully connected network • Unreliable: loss and delay allowed • Failures • Legislators may come and go (processor failure) • They are honest – no Byzantine failures
Agenda • Overview and Administrivia • Specifications and verification • Consensus • Practical Issues in Consensus
Summary • How to build a highly available system using consensus. • Run a replicated deterministic state machine, and get consensus on each input. • Use leases to replace most of the consensus steps with actions by one process. • The most fault-tolerant algorithm for consensus without real-time guarantees. • Lamport’s “Paxos” algorithm, based on • How to design and understand a concurrent, fault-tolerant system. • Write a simple spec as a state machine. • Define abstract function and show simulation.