210 likes | 253 Views
Paxos. Student Presentation by Jeremy Trimble. Overview. Distributed State Machine Distributed Consensus Fault-tolerance Requirements/Assumptions Paxos Algorithm Concepts The Algorithm Itself Examples Multi-Paxos Variations of Paxos. Distributed State Machine.
E N D
CS 5204 – Operating Systems Paxos Student Presentation by Jeremy Trimble
CS 5204 – Operating Systems Overview • Distributed State Machine • Distributed Consensus • Fault-tolerance • Requirements/Assumptions • Paxos Algorithm • Concepts • The Algorithm Itself • Examples • Multi-Paxos • Variations of Paxos
CS 5204 – Operating Systems Distributed State Machine • Fault-tolerance through replication. • Need to ensure that replicas remain consistent. • Replicas must process requests in the same order.
CS 5204 – Operating Systems The Distributed Consensus Problem • In a distributed system, how can we: • Select a single action among many options? • How can this be done in a fault-tolerant way? • Simple solution: • A single node acts as the “decider.” • But this is not fault tolerant. (What if the decider fails?) • A better solution: Paxos
CS 5204 – Operating Systems Fault-Tolerant Consensus • Requirements • Safety • Only a value that has been proposed may be chosen. • Only a single value is chosen. • A process never learns that a value has been chosen until it actually has been. • Goals • Liveness • FLP Impossibility Proof (1985)
CS 5204 – Operating Systems Assumptions • Failures • “Fail Stop” assumption • When a node fails, it ceases to function entirely. • May resume normal operation when restarted. • Messages • May be lost. • May be duplicated. • May be delayed (and thus reordered). • May not be corrupt. • Stable Storage
CS 5204 – Operating Systems Paxos Terms P1 • Proposal • An alternative proposed by a proposer. • Consists of a unique number and a proposed value. ( 42, B ) • We say a value is chosen when consensus is reached on that value. • Proposer • Suggests values for consideration by Acceptors. • Advocates for a client. • Acceptor • Considers the values proposed by proposers. • Renders an accept/reject decision. • Learner • Learns the chosen value. • In practice, each node will usually play all three roles. A1
CS 5204 – Operating Systems Strong Majority • “Strong Majority” / “Quorum” • A set of acceptors consisting of more than half of all acceptors. • Any two quorums have a nonempty intersection. • Helps avoid “split-brain” problem. • Acceptors decisions are not in agreement. • Common node acts as “tie-breaker.” • In a system with 2F+1 acceptors, F acceptors can fail and we'll be OK. A2 A5 A1 A6 A7 A4 A3 Quorums in a system with seven acceptors.
CS 5204 – Operating Systems Consensus time A1 (N1, V1) (N5, V3) A2 (N2, V2) (N7, V3) A3 (N4, V1) (N6, V3) A4 (N3, V3) A5 (N2, V2) (N7, V3) consensus reached, V3 chosen • Values proposed by proposers are constrained so that once consensus has been reached, all future proposals will carry the chosen value. • P2c . For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either: • (a) no acceptor in S has accepted any proposal numbered less than n, or • (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S.
CS 5204 – Operating Systems Basic Paxos Algorithm Phase 1a: “Prepare” Select proposal number* N and send a prepare(N) request to a quorum of acceptors. Phase 1b: “Promise” If N > number of any previous promises or acceptances, * promise to never accept any future proposal less than N, - send a promise(N, U) response (where U is the highest-numbered proposal accepted so far (if any)) Proposer Phase 2a: “Accept!” If proposer received promise responses from a quorum, - send an accept(N, W) request to those acceptors (where W is the value of the highest-numbered proposal among the promise responses, or any value if no promise contained a proposal) Acceptor Phase 2b: “Accepted” If N >= number of any previous promise, * accept the proposal - send an accepted notification to the learner * = record to stable storage
CS 5204 – Operating Systems P1 A2 A3 A1 start prepare(1) prepare(1) promise(1, -) promise(1) accept(1, A) prepare(1) accepted(1, A) prepare(1) time
CS 5204 – Operating Systems P1 P2 A2 A3 A1 accepted(1, A) prepare(1) continued... prepare(2) prepare(1) promise(2, -) promise(2, (1,A)) accept(2, A) prepare(1) accepted(2, A) accepted(2, I) time
CS 5204 – Operating Systems P1 P2 A2 A3 A1 start prepare(1) prepare(1) promise(1) promise(1) prepare(2) prepare(1) promise(2, -) prepare(1) accept(1, C) prepare(1) accepted(1, C) accept(2, D) prepare(1) time accepted(2, D) accepted(1, I)
CS 5204 – Operating Systems P1 P2 A2 A3 A1 start prepare(1) prepare(1) promise(1) promise(1) prepare(2) prepare(2) prepare(1) prepare(1) promise(2, -) prepare(1) accept(1, F) prepare(1) accepted(1, F) prepare(3) prepare(1) ... prepare(4) time prepare(1) ...
CS 5204 – Operating Systems Learning the Chosen Value Acceptors notify some set of learners upon acceptance. Distinguished Learner Other Considerations • Liveness • Can't be guaranteed in general. • Distinguished Proposer • All proposals are funneled through one node. • Can re-elect on failure. • A node may play the role of both distinguished proposer and distinguished learner – we call such a node the master.
CS 5204 – Operating Systems Multi-Paxos • A single instance of Paxos yields a single chosen value. • To form a sequence of chosen values, simply apply Paxos iteratively. • To distinguish, include an instance number in messages. • Facilitates replication of a state machine. chosen value S P ??? O “time” instance 39 40 41 42
CS 5204 – Operating Systems Fast Paxos Clients send accept messages to acceptors. Master is responsible for breaking ties. Reduces message traffic. Paxos Variations • Cheap Paxos • Reconfiguration • Eject failed acceptors. • Fault-tolerant with only F+1 nodes (vs 2F+1). • Failures must not happen too quickly. • Byzantine Paxos • Arbitrary failures – lying, collusion, fabricated messages, selective non-participation. • Adds an extra “verify” phase to the algorithm.
CS 5204 – Operating Systems Conclusion • State-Machine Replication • Distributed Consensus • Basic Paxos • Examples • Optimizations • Variations
CS 5204 – Operating Systems Questions?
CS 5204 – Operating Systems References • Paxos Made Simple • http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Simple-Lamport.pdf • Paxos Made Live • http://courses.cs.vt.edu/cs5204/fall10-kafura-NVC/Papers/FaultTolerance/Paxos-Chubby.pdf • Wikipedia – Paxos Algorithm • http://en.wikipedia.org/wiki/Paxos_algorithm • The Byzantine Generals Problem • http://research.microsoft.com/en-us/um/people/lamport/pubs/byz.pdf • Impossibility of distributed consensus with one faulty process • http://portal.acm.org/citation.cfm?doid=3149.214121