
PAXOS




  1. PAXOS Lecture by Avi Eyal Based on: Deconstructing Paxos – by Rajsbaum Paxos Made Simple – by Lamport Reconstructing Paxos – by Rajsbaum

  2. Our Goals • Agree on values (Consensus) • Arrange those values in a “Total Order”

  3. The Scene • Complete graph • Asynchronous system and no FIFO • Machine may crash (first we deal with “crash-stop”) • No Byzantine errors • No corruption of messages • The number of machines is known • The system stabilizes after a finite time

  4. A word about stability By the FLP theorem, Consensus is not solvable in an asynchronous system if even a single process might crash. We assume that after an unknown finite time, every process that crashes, crashes for good, and every active process is active for good (i.e. no process is unstable forever)

  5. Consensus • If process Pi proposes a value over and over, then either Pi crashes or Pi decides. • If Pi decides on a value, then eventually every correct process decides the same value.

  6. Consensus How can we assure that only a single value is chosen when some machines are unstable? Do we need a consistent leader? What if we had more than one leader at a time?

  7. Consensus A decision will be taken by a majority of the processes, and we will make sure that the rest get the message. We will show that we do NOT need a consistent leader at that point, but… if we have 2 leaders, they may cause each other's proposals to abort.

  8. Proposers & Witnesses “Read” • Make sure that more than half of the witnesses will not work with someone whose round number is less than mine. • Get a decided value if exists. “Write” • Set a value to more than half of the witnesses

  9. The read/write exchange between a Proposer and a Witness:
  • Proposer → Witness: [“read”, k]; the witness updates readj
  • Witness → Proposer: [ackRead, k, writej, vj] (the proposer updates v*) or [nackRead, k] (the proposer aborts)
  • Proposer → Witness: [“write”, k, v*]; the witness updates writej, vj
  • Witness → Proposer: [ackWrite, k] (the proposer decides v*) or [nackWrite, k] (the proposer aborts)

  10. Consensus

  Proposer:
  Propose(v)
    k = k + n
    send [“read”, k] to all
    wait for more than n/2 replies [ackRead, k’, v’]
      if any nackRead received: abort
    v* = the v’ with maximal k’, or v if none exists
    send [“write”, k, v*] to all
    wait for more than n/2 replies [ackWrite, k]
      if a nackWrite received: abort
    decide(v*)

  Witness:
  Upon receive [“read”, k]
    if k < readi or k < writei: reply [nackRead, k]
    else: readi = k; reply [ackRead, k, writei, vi]

  Upon receive [“write”, k, v*]
    if writei > k or readi > k: reply [nackWrite, k]
    else: writei = k; vi = v*; reply [ackWrite, k]
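The pseudocode above can be sketched as a single-process, in-memory simulation. This is an illustrative sketch only: it assumes instant message delivery and no crashes, and the names (Witness, propose) are made up for the example, not taken from any Paxos library.

```python
# Minimal in-memory sketch of the slide's single-decree consensus.
# Witness holds read_i / write_i / v_i; propose() runs one round k.

class Witness:
    def __init__(self):
        self.read_k = 0    # read_i: highest round promised
        self.write_k = 0   # write_i: highest round accepted
        self.v = None      # v_i: last accepted value

    def on_read(self, k):
        # Refuse rounds older than anything already promised or accepted.
        if k < self.read_k or k < self.write_k:
            return ("nackRead", k)
        self.read_k = k
        return ("ackRead", k, self.write_k, self.v)

    def on_write(self, k, v):
        if self.write_k > k or self.read_k > k:
            return ("nackWrite", k)
        self.write_k, self.v = k, v
        return ("ackWrite", k)


def propose(witnesses, k, v):
    """One attempt at round k; returns the decided value, or None on abort."""
    majority = len(witnesses) // 2 + 1
    # "Read" phase: abort on any nack, as on the slide.
    acks = []
    for w in witnesses:
        reply = w.on_read(k)
        if reply[0] == "nackRead":
            return None
        acks.append(reply)
    if len(acks) < majority:
        return None
    # Adopt the value accepted at the highest round, if any witness has one.
    _, _, _, wv = max(acks, key=lambda a: a[2])
    v_star = wv if wv is not None else v
    # "Write" phase: set v* at a majority of witnesses.
    ok = 0
    for w in witnesses:
        reply = w.on_write(k, v_star)
        if reply[0] == "nackWrite":
            return None
        ok += 1
    return v_star if ok >= majority else None
```

With three witnesses, a proposal at round 1 decides its own value, and any later proposer is forced to adopt that value rather than its own.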

  11. Some notes about the Consensus algorithm • It is possible that Pi proposes a value, does not decide, and then Pj can decide this value even if Pi has crashed (after “write”). • When 2 leaders are proposing simultaneously, possibly none of them will decide. • If less than half the processes have answered the “write” query, we cannot be sure what the decided value will be. (It depends if the next proposer will get an answer from them or not).
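The second note, that two simultaneous leaders may both fail to decide, can be reproduced concretely. The sketch below is illustrative (the Witness class just encodes the slide's read/write rules): each leader's read phase invalidates the other leader's pending write.

```python
# Two leaders abort each other by alternately grabbing higher rounds.

class Witness:
    def __init__(self):
        self.read_k = self.write_k = 0
        self.v = None

    def on_read(self, k):
        if k < self.read_k or k < self.write_k:
            return "nackRead"
        self.read_k = k
        return "ackRead"

    def on_write(self, k, v):
        if self.write_k > k or self.read_k > k:
            return "nackWrite"
        self.write_k, self.v = k, v
        return "ackWrite"


witnesses = [Witness() for _ in range(3)]

def read_all(k):
    return {w.on_read(k) for w in witnesses}

def write_all(k, v):
    return {w.on_write(k, v) for w in witnesses}

# Leader A reads at round 1; leader B reads at round 2 before A writes.
assert read_all(1) == {"ackRead"}
assert read_all(2) == {"ackRead"}
assert write_all(1, "a") == {"nackWrite"}  # A aborts: round 2 was promised
# A retries at round 3 before B writes, so B aborts too.
assert read_all(3) == {"ackRead"}
assert write_all(2, "b") == {"nackWrite"}  # B aborts: round 3 was promised
```

If this interleaving repeats forever, neither leader ever decides, which is why the algorithm guarantees progress only once the system stabilizes on one leader.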

  12. Total Order • If Pi delivers m then eventually every correct process delivers m. • If Pi delivers m, m’ in this order then Pj delivers m, m’ in the same order.

  13. Total Order Can we do that without a leader? For how long will we need that leader? What if we had more than one leader?

  14. The Paxos Algorithm • Each process maintains the id of its current leader • Proposing values is done through the leader • The leader sequences the orders and then uses Consensus in order to agree on the sequence.

  15. The Paxos Algorithm The messages proposed contain values and order numbers. A leader may take care of a few orders at the same time.

  16. [Sequence diagram] Pi and Pj submit messages m and m’ to the leader; the leader runs Propose(6, m) and Propose(7, m’) concurrently. Consensus yields Decide(6, m’*) and Decide(7, m*), and the decided pairs (6, m’*) and (7, m*) are sent to Pi and Pj.

  17. Data Structures • TO_Delivered[] • TO_Undelivered[] • AwaitToBeDelivered[] used upon delivery • nextBatch

  18. The Paxos Algorithm – leader

  Converge(L, m)
    returned = abort
    while (returned == abort)
      returned = propose(L, m)   // repeat until decide
    send [decision, L, m] to all processes

  Upon new message m
    verify that m has not yet been delivered
    find a k that does not have a Converge(k, *) active
    Converge(k, m)
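The Converge loop can be sketched as follows; propose and send_to_all are stand-ins supplied by the caller, and None plays the role of "abort". These names are illustrative, not from the source.

```python
# Sketch of the leader's Converge loop: retry propose until it decides,
# then broadcast the decision to all processes.

def converge(L, m, propose, send_to_all):
    decided = None
    while decided is None:          # repeat until propose succeeds
        decided = propose(L, m)
    send_to_all(("decision", L, decided))
    return decided
```

For instance, a propose that aborts twice before succeeding still converges and broadcasts exactly one decision.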

  19. The Paxos Algorithm – process

  Upon new message m, or upon leader change
    verify that m has not yet been delivered
    send TO_Undelivered + m to the leader

  Upon receive [decision/update, kj, m] from Pj
    stop Converge(kj, *) if active
    if kj = nextBatch: deliver (kj, m) and return
    if kj < nextBatch: update Pj with its missing messages
    if kj > nextBatch:
      AwaitToBeDelivered[kj] = m   // will be used upon delivery
      send [update, nextBatch-1, TO_Delivered] to all in order to be updated
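The in-order delivery bookkeeping above can be sketched like this: decisions may arrive out of batch order, so AwaitToBeDelivered buffers them until nextBatch catches up. Field names follow the slide's data structures; the class and method are illustrative.

```python
# Sketch of a process's delivery state: buffer out-of-order decisions,
# deliver consecutively from nextBatch.

class Process:
    def __init__(self):
        self.TO_Delivered = []          # messages delivered, in order
        self.AwaitToBeDelivered = {}    # batch number -> buffered message
        self.nextBatch = 0

    def on_decision(self, k, m):
        if k < self.nextBatch:
            return                      # already delivered; sender is behind
        self.AwaitToBeDelivered[k] = m
        # Deliver every consecutive batch that is now available.
        while self.nextBatch in self.AwaitToBeDelivered:
            self.TO_Delivered.append(
                self.AwaitToBeDelivered.pop(self.nextBatch))
            self.nextBatch += 1
```

Receiving batch 2 before batches 0 and 1 delivers nothing; once 0 and 1 arrive, all three are delivered in order.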

  20. Fail Recovery • Each process holds readi, writei, vi, TO_Delivered and nextBatch in stable storage in order to recover consistently after a crash. • If a leader proposes, crashes, recovers and proposes again, it might mistake an answer to the first proposal for an answer to the second. Replies to the proposer should therefore contain the message they answer. • A process should remember all the messages it has answered and give the same reply to a repeated message, in case a proposer proposes twice with the same value.

  21. Tradeoffs • If we know that most of the processes never crash, we can rely on them instead of using the stable storage. • If there are unstable processes, who elect themselves as leader over and over, we can store for each process the leaders of all other processes. A process will then elect a leader only if most of the processes have elected that leader (assuming most processes never crash).
