Paxos and Replicated State Machine (RSM)
Outline • Basic Concepts of Replicated State Machine • Paxos Made Simple
Replicated State Machine • We can replicate data, but how can we guarantee that it is replicated correctly? • How can we replicate computation? • By using a replicated state machine. • Suppose each process executes operations deterministically (or can be made deterministic). • If a group of servers start from the same initial state and execute the same sequence of operations, they all reach the same final state.
Replicated State Machine • So we can start a group of processes and make them execute the same sequence of operations. • Running multiple processes this way provides fault tolerance. • Many uses: lock servers (as in the lab), reliable databases, reliable replicated file systems.
What is crucial for RSMs? • The members of an RSM must agree on the order of the operations. • Thus, when there are two or more alternative operations, the system must decide which one is chosen. • Decide means: every member of the RSM agrees to perform one specific operation, and only that operation. • So the problem reduces to a consensus problem: • How can a group of processes agree on “something”? Here, “something” means the operations (values!) that the RSM will execute. The fundamental question is how a group of processes agrees on a single value.
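The idea on these two slides can be sketched in a few lines of Python. This is only an illustration under assumptions: the `KVReplica` class, its three operations, and the sample log are hypothetical names made up for the example, not part of the lecture or the lab.

```python
from dataclasses import dataclass, field


@dataclass
class KVReplica:
    """A deterministic state machine: a tiny key-value store (hypothetical)."""
    state: dict = field(default_factory=dict)

    def apply(self, op):
        kind, key, value = op
        if kind == "put":
            self.state[key] = value
        elif kind == "add":
            self.state[key] = self.state.get(key, 0) + value
        elif kind == "delete":
            self.state.pop(key, None)


def run(replica, log):
    """Apply every operation of the log, in order, and return the final state."""
    for op in log:
        replica.apply(op)
    return replica.state


log = [("put", "x", 1), ("add", "x", 4), ("put", "y", 7), ("delete", "y", None)]

# Three replicas, same initial state, same sequence of operations.
states = [run(KVReplica(), log) for _ in range(3)]
same_order_agrees = all(s == states[0] for s in states)

# The same two operations in a different order give a different state,
# which is why the replicas must agree on the order.
original = [("put", "x", 1), ("add", "x", 4)]
reordered = [("add", "x", 4), ("put", "x", 1)]
order_matters = run(KVReplica(), original) != run(KVReplica(), reordered)
```

Here all three replicas end with `{"x": 5}`, while the reordered log ends with `{"x": 1}`.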
Consensus: decide a single value • FLP tells us that in an asynchronous system no deterministic algorithm can guarantee distributed consensus if even one process may fail. • Michael J. Fischer, Nancy Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2):374–382, April 1985. • FLP holds for the fully general environment (an asynchronous network). • In practice, the situation is not that bad. We have spent a lot of money on clusters and networks; they should not work that poorly! • Paxos works in a partially synchronous environment (which is a practical assumption) and can achieve consensus eventually. • e.g. it works in a cluster environment. • Paxos can be used to decide a value among a group of processes (nodes, computers). • Leslie Lamport, Paxos Made Simple, 01 Nov 2001
Paxos • Paxos makes a group of processes agree on the same value despite process failures, network failures, and network delays. • Thus, it can be used as the building block for an RSM. • All processes in the group decide the same value for the next operation. • To keep the problem meaningful, we rule out the trivial solution: hard-coding a predefined value in each process and having them all just accept that value.
The overall structure of Paxos • Assume a collection of processes that can propose values. (Where do values come from? Someone has to make proposals. We call them Proposers.) • Someone must accept or reject each proposal. Depending on whether the acceptors accept or reject a value, the value may be chosen. We call them Acceptors. • Once a value has been chosen, processes should be able to learn it. (Eventually someone will know that the system has decided on a value. We call those who learn the current state of the system Learners.) • There is no oracle: each process can act only by following the steps of the algorithm, using its local state and the messages it receives from others. • Proposers, Acceptors, and Learners (agents) are the three roles in the consensus algorithm. In an implementation, a single process may act as more than one agent.
Again, the goal • The goal is to ensure that some proposed value is eventually chosen and that, once a value has been chosen, processes can eventually learn it. • (Chosen means that a single value has been decided, i.e. locked into the system.) • Eventually means there is no known time bound on when a value is chosen.
Recall the two properties of distributed algorithms • Safety (Correctness) • Bad things never happen. • No process in the group decides a value different from the others. • The value should be meaningful. (Deciding NOP for every operation does not count! Only a proposed value can be chosen.) • Liveness • Good things eventually happen. • Eventually, the processes all agree on a single value.
Safety requirements for consensus: • Only a value that has been proposed may be chosen. • Only a single value is chosen. • A process never learns that a value has been chosen unless it actually has been. • Liveness means that a single value is eventually chosen; we leave this issue to the end of the lecture.
Assumptions of Paxos • Agents can communicate with one another by sending messages. • Agents operate at arbitrary speed, may fail by stopping, and may restart. Since all agents may fail after a value is chosen and then restart, a solution is impossible unless an agent that has failed and restarted can remember some information (e.g. by keeping it on disk). • Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted (no Byzantine faults, no code penetration). • This model fits some practical environments, such as clusters in a data center.
Consensus is hard • Network failure • Process failure • Network delay • Membership change: processes join and leave the system
A single acceptor? • How: • A proposer sends a proposal to the acceptor, and the acceptor chooses the first proposed value that it receives. • Does it work? • No, the single acceptor can fail. • So any algorithm that works must use multiple acceptors. • A proposer sends a proposed value to a set of acceptors. An acceptor may (or may not) accept the proposed value. • Chosen (Decided): The value is chosen when a large enough set of acceptors have accepted it.
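The "remember some information across a restart" assumption could look roughly like the sketch below. It is one possible approach, not the lecture's or the lab's: the function names, the JSON layout, and the choice of what to store (the highest PREPARE number answered and the last accepted proposal) are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_state(path, promised, accepted):
    """Durably store an acceptor's state before it replies to anyone.

    promised: highest PREPARE number answered so far (int)
    accepted: last accepted proposal as (number, value), or None
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"promised": promised, "accepted": accepted}, f)
        f.flush()
        os.fsync(f.fileno())      # force the bytes to disk
    os.replace(tmp_path, path)    # atomically swap in the new file


def load_state(path):
    """Reload the state after a restart; a fresh acceptor has no history."""
    try:
        with open(path) as f:
            data = json.load(f)
    except FileNotFoundError:
        return 0, None
    accepted = data["accepted"]
    return data["promised"], (tuple(accepted) if accepted is not None else None)
```

Writing to a temporary file and renaming it means a crash in the middle of a save leaves the previous state file intact.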
How large is large enough? • Chosen (Decided): The value is chosen when a large enough set of acceptors have accepted it. • To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents. • Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value. • “Large enough” can be defined in other ways than “majority”, as long as any two such sets share at least one acceptor.
First requirement we should meet • Suppose we are very lucky: no failures, no message loss, no network delay. Everything works well. We want a value to be chosen even if only one value is proposed by a single proposer. (When everything works, we can certainly expect this.) • This suggests the requirement (for any algorithm that really works): • P1. An acceptor must accept the first proposal that it receives.
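The claim that any two majorities share an acceptor can be checked by brute force for small groups. The helper names below are hypothetical and the check is only a sanity test, not a proof.

```python
from itertools import combinations


def majorities(acceptors):
    """Yield every subset that contains more than half of the acceptors."""
    members = list(acceptors)
    need = len(members) // 2 + 1
    for size in range(need, len(members) + 1):
        for group in combinations(members, size):
            yield frozenset(group)


def all_pairs_intersect(acceptors):
    """True if every pair of majorities has at least one acceptor in common."""
    quorums = list(majorities(acceptors))
    return all(a & b for a, b in combinations(quorums, 2))


# Exactly half is not enough: with 4 acceptors, {0, 1} and {2, 3} are disjoint.
halves_can_miss = not (frozenset({0, 1}) & frozenset({2, 3}))
```

For example, `all_pairs_intersect(range(5))` checks all 16 majorities of a five-acceptor group against each other.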
But… • With several proposers, every acceptor may have accepted a value, yet no single value is accepted by a majority of them. • Even with only two proposed values, if each is accepted by about half the acceptors, the failure of a single acceptor could make it impossible to learn which of the values was chosen.
Proposal • P1, together with the requirement that a value is chosen only when it is accepted by a majority of acceptors, implies that an acceptor must be allowed to accept more than one proposal. • An algorithm that works must therefore use multiple proposals. We distinguish proposals by tagging each with a natural number. • A proposal: <proposal_number, proposal_value> • Different proposals have different numbers (but may have the same value).
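Both problems on this slide can be reproduced with a small counting function. The acceptor names and values are made up for the example, and each acceptor is assumed to accept at most one value, as in the scheme discussed so far.

```python
from collections import Counter


def chosen_value(accepted, total_acceptors):
    """Return the value accepted by a majority of all acceptors, or None.

    accepted maps each acceptor we can reach to the single value it accepted.
    """
    counts = Counter(accepted.values())
    for value, count in counts.items():
        if count > total_acceptors // 2:
            return value
    return None


# Three proposers, three acceptors, each acceptor took the first value it saw.
split_vote = chosen_value({"a1": "X", "a2": "Y", "a3": "Z"}, 3)

# Two values only: X really was chosen by a1 and a3 ...
before_failure = chosen_value({"a1": "X", "a2": "Y", "a3": "X"}, 3)
# ... but once a3 fails, the survivors cannot tell whether X or Y was chosen.
after_failure = chosen_value({"a1": "X", "a2": "Y"}, 3)
```

`split_vote` and `after_failure` are both `None`, while `before_failure` is `"X"`.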
Different proposals from different proposers • This is one method; you can define others. • Which method does the lab use?
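One common way to give different proposers disjoint proposal numbers is to let each proposer start at its own id and step by the number of proposers. This is an assumed scheme for illustration, not necessarily the one on the slide or in the lab; `proposal_numbers` and its parameters are hypothetical names.

```python
from itertools import islice


def proposal_numbers(proposer_id, num_proposers):
    """Yield proposer_id, proposer_id + num_proposers, ... (ids start at 1)."""
    if not 1 <= proposer_id <= num_proposers:
        raise ValueError("proposer_id must be in 1..num_proposers")
    n = proposer_id
    while True:
        yield n
        n += num_proposers


# With three proposers, the first four numbers of each never collide.
streams = {
    pid: list(islice(proposal_numbers(pid, 3), 4))
    for pid in (1, 2, 3)
}
```

Proposer 1 uses 1, 4, 7, 10; proposer 2 uses 2, 5, 8, 11; proposer 3 uses 3, 6, 9, 12.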
Proposal Chosen? • A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (both its number and its value) has been chosen. • We have not discussed any algorithm yet. Imagine that a given acceptor can accept multiple proposals. Then we can allow multiple proposals to be chosen. • However: • P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v. (Once a proposal is chosen, its value must not be overturned by later execution of the algorithm. In a distributed environment, the algorithm might run forever until somebody tells it to stop.) • Since numbers are totally ordered, condition P2 guarantees the crucial safety property that only a single value is chosen.
Strengthening the P2 requirement: • To be chosen, a proposal must be accepted by at least one acceptor. So we can satisfy P2 by satisfying: • P2a. If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v. • But someone might propose another value after value v has been chosen with proposal number n. Further strengthening: • P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v. • So if we satisfy P2b, we satisfy P2a and therefore P2. • What does P2b mean? • A proposer must not pick its proposal arbitrarily. It has to do something before making its proposal. Learning the history is easy; predicting the future is difficult.
How to get P2b? • Assume that some proposal with number m and value v is chosen, and show that any proposal issued with number n > m also has value v. • Use induction on n: assume every proposal issued with a number in m … (n-1) has value v. • For the proposal numbered m to be chosen, there must be some set C consisting of a majority of acceptors such that every acceptor in C accepted it. • Thus: • Every acceptor in C has accepted a proposal with number in m … (n-1), and every proposal with number in m … (n-1) accepted by any acceptor has value v.
How to make the proposals? • Since any set S consisting of a majority of acceptors contains at least one member of C, we can conclude that a proposal numbered n has value v by ensuring that the following invariant is maintained: • P2c. For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S. • So let the proposer learn something first and then make its proposal.
Paxos Algorithm • Until now, we have not discussed any algorithm. Let's look at Paxos. • Proposer: prepares proposals • Acceptor: accepts or rejects (does not accept) proposals • Learner: learns the current status of the system. If a value is decided, it notifies those who are supposed to know the value.
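Read as a rule for the proposer, P2c says how to pick the value once a majority S has reported what it accepted. A minimal sketch, assuming each acceptor in S reports either `None` or its highest-numbered accepted proposal as a `(number, value)` pair; the function name is hypothetical.

```python
def value_to_propose(responses, own_value):
    """Pick the value for a new proposal from the reports of a majority S.

    responses: one entry per acceptor in S, either None (nothing accepted)
               or the acceptor's highest-numbered accepted (number, value).
    own_value: the value the proposer would like to propose.
    """
    accepted = [p for p in responses if p is not None]
    if not accepted:
        # Case (a): nobody in S has accepted anything, so any value is safe.
        return own_value
    # Case (b): adopt the value of the highest-numbered accepted proposal.
    return max(accepted, key=lambda p: p[0])[1]
```

For instance, with reports `[(3, "A"), None, (5, "B")]` the proposer must propose `"B"`, whatever value it started with.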
Step 1: Prepare (a) A proposer selects a proposal number n and sends a PREPARE request with number n to a majority of acceptors.
Step 2: Promise • PROMISE n – Acceptor will accept proposals only numbered n or higher • Proposer 1 is ineligible because a quorum has voted for a higher number than j • (b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted. • P1a. An acceptor can accept a proposal numbered n iff it has not responded to a prepare request having a number greater than n.
Step 3: Accept! (a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an ACCEPT request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals.
Step 4: Accepted (b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a PREPARE request having a number greater than n.
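Steps 1 to 4 can be put together in a small in-memory simulation. This is a sketch under strong assumptions: there is no network, no failure, and no persistence, every call is synchronous, and the class and function names are made up for the example.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0     # highest PREPARE number responded to so far
        self.accepted = None  # highest-numbered accepted (number, value)

    def on_prepare(self, n):
        """Step 2: promise, and report the highest-numbered accepted proposal."""
        if n > self.promised:
            self.promised = n
            return ("PROMISE", self.accepted)
        return None  # stale request: ignore it

    def on_accept(self, n, value):
        """Step 4: accept unless a higher-numbered PREPARE was answered."""
        if n >= self.promised:
            self.accepted = (n, value)
            return True
        return False


def propose(acceptors, n, value):
    """Steps 1 and 3 for one proposer; returns the chosen value or None."""
    majority = len(acceptors) // 2 + 1
    replies = [r for r in (a.on_prepare(n) for a in acceptors) if r is not None]
    if len(replies) < majority:
        return None  # no majority of promises: give up on number n
    prior = [acc for _, acc in replies if acc is not None]
    if prior:
        # Some acceptor already accepted something: adopt that value.
        value = max(prior, key=lambda p: p[0])[1]
    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None
```

With three fresh acceptors, `propose(acceptors, 1, "A")` returns `"A"`. A later `propose(acceptors, 2, "B")` also returns `"A"`, because the prepare phase reports the earlier proposal, and a late `propose(acceptors, 1, "C")` returns `None` because every acceptor has already promised number 2.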
Learning values • If a learner interrogates the system, a quorum will respond with fact V_k. • A learner sends a LEARN request to all (or a majority) of the acceptors. • The acceptors respond with their accepted proposals. • If a proposal has been accepted by a majority of the acceptors, that proposal is the decided one.
Proposer Code
struct proposal {number, value}   // number >= 1
proposer_make_proposal(n, pvalue)
    send(PREPARE, n) to a majority of accepters;
    wait until [received (ACK-PREPARE, proposal) from a majority of accepters]
    received_proposals = [all received proposals]
    old_max_proposal = the proposal in received_proposals with the maximal proposal number   // null if none
    if old_max_proposal == null
        newproposal = (n, pvalue);
    else if old_max_proposal.number > n
        abandon_making_proposal; return;   // abandon
    else
        newproposal = (n, old_max_proposal.value);
    send(ACCEPT, newproposal) to a majority of accepters;   // or all accepters
Accepter response to PREPARE
old_prepare_number;    // highest PREPARE number responded to so far
accepted_proposals;    // all proposals accepted so far
accepter_on_receive_prepare(PREPARE, number, proposer)
    if number > old_prepare_number
        old_prepare_number = number;
        old_max_proposal = the proposal in accepted_proposals with the max proposal number   // null if none
        send(ACK-PREPARE, number, old_max_proposal) to proposer
    else
        either also send back the old_max_proposal or just ignore the message
Accepter response to ACCEPT
accepter_on_receive_accept(ACCEPT, proposal, proposer)
    if proposal.number ≥ old_prepare_number
        accepted_proposals = accepted_proposals ∪ {proposal}
    else
        either send back the old_max_proposal or just ignore the message
Learner
repeat
    send(LEARN) to all accepters
    accepted_proposals = all proposals replied
until there exists a proposal that has been accepted by a majority of accepters
that proposal is chosen
Accepter response to LEARN
accepter_on_receive_learn(learner)
    send(ACK-LEARN, accepted_proposals) to learner
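The learner's test, "a proposal accepted by a majority of accepters", might be written as follows. The sketch assumes each ACK-LEARN reply carries the set of `(number, value)` proposals that one accepter has accepted; the function name and reply format are assumptions, not the lab's interface.

```python
from collections import Counter


def learn(replies, num_accepters):
    """Return the chosen value, or None if no proposal has a majority yet.

    replies: one collection of accepted (number, value) proposals per
             accepter that answered the LEARN request.
    """
    counts = Counter()
    for accepted in replies:
        for proposal in set(accepted):  # count each accepter once per proposal
            counts[proposal] += 1
    for (number, value), count in counts.items():
        if count > num_accepters // 2:
            return value
    return None
```

Several proposals may reach a majority, but by P2 they all carry the same value, so returning the value of the first one found is enough. A learner that gets `None` simply repeats the LEARN round later.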
Why is Paxos correct? • The key is “do not overturn a value once it is chosen”. The proposer follows the algorithm strictly and makes its proposal based on the history information it collects. • To prove Paxos correct, we must prove: • P2b. If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v. • This means that once a value is decided, no further action may destroy the decision. • If proposal <m,v> is decided, the prepare phase restricts any later proposer to proposing only value v, because the proposer must use the value returned by the acceptors. A proposer can make a proposal only after getting responses from a majority of acceptors, and that majority necessarily intersects the majority that accepted <m,v>. • This is what P2c says. • So Paxos will not destroy a value once it is decided, i.e. it meets the safety requirement. What about the liveness requirement? Will Paxos actually reach consensus among a group of processes?
Progress (liveness) • Even just two proposers can drive the system into livelock, each repeatedly preempting the other with a higher-numbered proposal so that nothing useful is done. • So there should be only one active proposer, which can be considered the leader of the group.
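The livelock can be reproduced with a deliberately adversarial schedule: each proposer's PREPARE reaches the acceptors just before the other proposer's ACCEPT does. The simulation below is a toy under that assumed schedule, with hypothetical names throughout.

```python
class Acceptor:
    def __init__(self):
        self.promised = 0   # highest PREPARE number responded to
        self.accepted = []  # accepted (number, value) proposals

    def on_prepare(self, n):
        if n > self.promised:
            self.promised = n

    def on_accept(self, n, value):
        if n >= self.promised:
            self.accepted.append((n, value))
            return True
        return False


def duel(rounds):
    """Two proposers alternate; return how many ACCEPTs ever succeeded."""
    acceptors = [Acceptor() for _ in range(3)]
    n = 0
    in_flight = None  # the ACCEPT request of the previous proposer
    accepts = 0
    for _ in range(rounds):
        for value in ("from-p", "from-q"):
            n += 1
            for a in acceptors:      # the new, higher-numbered PREPARE lands first
                a.on_prepare(n)
            if in_flight is not None:
                # the rival's ACCEPT arrives too late and is refused
                accepts += sum(a.on_accept(*in_flight) for a in acceptors)
            in_flight = (n, value)
    return accepts


def lone_proposer():
    """With a single proposer, the same acceptors accept immediately."""
    acceptors = [Acceptor() for _ in range(3)]
    for a in acceptors:
        a.on_prepare(1)
    return sum(a.on_accept(1, "from-p") for a in acceptors)
```

`duel(100)` returns 0: two hundred proposals are issued and none is ever accepted, whereas `lone_proposer()` gets all three acceptors to accept on the first try.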
Leader Election(1)
leader;          // leader process, initialized to p2, the process with smallest id
proposer_self;   // each proposer has its own id
proposer_start_leader_election
    repeat periodically forever
        send(ELECTION) to all proposers
        wait for a while, receiving leader election messages;   // "a while" can be 2x(largest latency)
        active_proposers = all proposers that sent back the ACK-ELECTION message
        leader = the proposer in active_proposers with minimal proposer_id
Leader Election(2)
proposer_on_receive_election(proposer)
    send(ACK-ELECTION, proposer_self) to proposer
Leader Election(3, leader code)
current_proposal_number;
proposer_make_proposals
    repeat forever
        wait for a while;   // 3x(maximal latency)
        if leader == proposer_self
            stop the existing proposer_make_proposal;
            current_proposal_number = current_proposal_number + np;
            start a new call of proposer_make_proposal;
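The election rule itself, "the active proposer with the smallest id leads", fits in one function. The sketch assumes a proposer always counts itself as active, which the pseudocode leaves implicit; the function name and the sample ids are hypothetical.

```python
def elect_leader(self_id, ack_ids):
    """Pick the leader from this proposer's point of view.

    self_id: id of the proposer running the election
    ack_ids: ids of the proposers that sent back ACK-ELECTION in time
    """
    active_proposers = {self_id, *ack_ids}  # assumption: we count ourselves
    return min(active_proposers)


# Proposers 2, 3 and 5 are all up: everyone elects 2.
all_up = [elect_leader(me, [2, 3, 5]) for me in (2, 3, 5)]

# Proposer 2 has crashed and no longer answers: 3 takes over.
after_crash = [elect_leader(me, [3, 5]) for me in (3, 5)]
```

Because replies can be delayed or lost, two proposers may briefly disagree about who leads. That only costs progress, not safety, since the acceptor rules hold no matter how many proposers are active.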
Discussion • 1. Can these two proposals be considered the same: <100, “hello”> and <200, “hello”>? Consider a group of only three acceptors and use it to illustrate the principle of Paxos proposals. • 2. If everything is OK, what about the performance of Paxos? How can a bunch of operations be batched?