S-Paxos: Eliminating the Leader Bottleneck

Martin Biely, Zarko Milosevic, Nuno Santos, André Schiper. École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. October 9, 2012.

Presentation Transcript


  1. S-Paxos: Eliminating the Leader Bottleneck. Martin Biely, Zarko Milosevic, Nuno Santos, André Schiper, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. October 9, 2012

  2. Context and Motivation: State Machine Replication
    • Consistency among replicas is ensured by:
        • a deterministic service
        • the same initial state
        • the same sequence of requests
    • System model:
        • partially synchronous
        • crash-stop failures (up to a maximum number of crashes)
    [Figure: clients send requests to a replicated service; each replica runs the service, and an ordering protocol (Paxos) orders the requests]
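
The three consistency conditions above can be illustrated with a toy example. This is a minimal sketch, not part of S-Paxos: the counter service, its operations, and the fixed `ordered_log` (standing in for the sequence an ordering protocol such as Paxos would decide) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CounterReplica:
    """Hypothetical deterministic service: a map of named integer counters."""
    state: dict = field(default_factory=dict)  # every replica starts from the same (empty) state

    def apply(self, request: tuple) -> int:
        # The result depends only on the current state and the request: no clocks, no randomness.
        op, key, amount = request
        if op == "add":
            self.state[key] = self.state.get(key, 0) + amount
        elif op == "set":
            self.state[key] = amount
        else:
            raise ValueError(f"unknown operation {op!r}")
        return self.state[key]


# Stand-in for the sequence of requests decided by the ordering protocol.
ordered_log = [("add", "x", 5), ("set", "y", 2), ("add", "x", -1), ("add", "y", 3)]

replicas = [CounterReplica() for _ in range(3)]
for request in ordered_log:      # same sequence of requests...
    for replica in replicas:
        replica.apply(request)   # ...applied by every replica

assert all(r.state == replicas[0].state for r in replicas)
print(replicas[0].state)  # {'x': 4, 'y': 5}
```

If one replica applied `("add", "y", 3)` before `("set", "y", 2)`, it would end with `y == 2` instead of `5`, which is why all replicas must agree on a single sequence.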

  3. Context and Motivation: The Paxos Protocol
    • Paxos is a leader-based protocol
        • a distinguished process (the leader) coordinates the others (the followers)
    • Observation: the leader receives and sends more messages than the followers
        • a potential system bottleneck

  4. Context and Motivation: Paxos Performance
    • Experimental settings:
        • JPaxos: an implementation of Paxos in Java (the protocol shown previously)
        • n = 3, request size = 20 bytes, CPU: 2x2 cores @ 2.2 GHz
    • Result: the bottleneck in Paxos is typically the leader

  5. Context and Motivation: Paxos is Leader-centric
    • Leader-centric protocol:
        • the leader does considerably more work than the followers
        • therefore, the leader is prone to becoming the system bottleneck
    • Paxos and most other leader-based protocols are also leader-centric

  6. Context and Motivation: Leader-based vs. Leader-centric
    • Note that leader-based ≠ leader-centric
        • leader-based is an algorithmic concept: the leader is a distinguished process
        • leader-centric is about resource usage: the leader is a bottleneck
    • Question: must leader-based protocols like Paxos also be leader-centric?

  7. S-Paxos Overview: Leader-based but Not Leader-centric

  8. S-Paxos Overview: Why Paxos is Leader-centric
    • The leader:
        • receives requests from clients
        • coordinates the protocol to order requests
        • replies to clients
    • The followers do much less:
        • receive client requests from the leader
        • acknowledge the order proposed by the leader
    • Underlying problem: unbalanced resource utilization
        • the leader runs out of resources (CPU, network bandwidth)
        • while the followers are lightly loaded
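
To see why this imbalance grows with the number of replicas, the tasks listed above can be turned into a rough per-request count. This is a simplified, hypothetical model, not taken from the slides or from JPaxos: it assumes no batching, that clients contact only the leader, that each Phase 2a message carries the full request to every follower, that each follower answers with one small Phase 2b message to the leader, and that a reply is as small as an acknowledgement (`ack_bytes` is an assumed value).

```python
def paxos_per_request_load(n: int, request_bytes: int, ack_bytes: int = 16) -> dict:
    """Rough per-request (messages, payload bytes) handled by the leader and by one follower.

    Simplified model (assumptions, see lead-in): no batching, clients talk only to the
    leader, Phase 2a carries the full request, Phase 2b messages and replies are ack-sized.
    """
    followers = n - 1
    leader_msgs = (1              # client request received
                   + followers    # Phase 2a sent, one per follower
                   + followers    # Phase 2b received, one per follower
                   + 1)           # reply sent to the client
    leader_bytes = (request_bytes
                    + followers * request_bytes
                    + followers * ack_bytes
                    + ack_bytes)
    follower_msgs = 2             # one Phase 2a in, one Phase 2b out
    follower_bytes = request_bytes + ack_bytes
    return {"leader": (leader_msgs, leader_bytes),
            "follower": (follower_msgs, follower_bytes)}


for n in (3, 5, 7):
    print(n, paxos_per_request_load(n, request_bytes=20))
# 3 {'leader': (6, 108), 'follower': (2, 36)}
# 5 {'leader': (10, 180), 'follower': (2, 36)}
# 7 {'leader': (14, 252), 'follower': (2, 36)}
```

Under these assumptions the leader's load grows linearly with n while a follower's stays constant, which is the "leader runs out of resources while followers are lightly loaded" pattern described above.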

  9. S-Paxos Overview: S-Paxos, a Balanced Paxos Variant
    • S-Paxos balances the workload across replicas
        • the leader and the followers have similar resource usage
        • the full resources of all replicas become available to the ordering protocol
    • S-Paxos is leader-based but not leader-centric
    • It combines several well-known ideas in a novel way:
        • all replicas handle client communication
        • all replicas disseminate requests
        • ordering is done on request IDs

  10. S-Paxos Overview: Key Idea 1, Distribute Client Communication (all replicas handle client communication)
    • Commonly used in practice, for instance in ZooKeeper
    • By itself, however, this is still leader-centric:
        • the leader runs the ordering protocol on full requests (Phase 2a messages of Paxos)
        • followers have to forward client requests to the leader
        • and the leader, in turn, sends those requests to the other followers

  11. S-Paxos Overview: Key Idea 2, Distribute Request Dissemination
    • Note that Phase 2a messages have a dual purpose:
        • disseminating requests
        • establishing the order
    • In S-Paxos, all replicas disseminate requests, and ordering is performed on IDs
    • S-Paxos separates dissemination from ordering
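
Separating dissemination from ordering means replicas may receive requests in different orders and still execute them identically, because only the decided sequence of IDs matters. The sketch below is a hypothetical illustration: the `RequestId` format (client number plus per-client sequence number), the `Replica` class, and the fixed `decided` list standing in for the Paxos output are assumptions, not details from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestId:
    """Hypothetical ID format: the issuing client plus a per-client sequence number."""
    client_id: int
    seq: int


class Replica:
    def __init__(self) -> None:
        self.requests: dict = {}  # RequestId -> payload, filled by the dissemination layer

    def on_disseminated(self, rid: RequestId, payload: bytes) -> None:
        # Requests may arrive in any order; arrival order does not determine execution order.
        self.requests[rid] = payload

    def materialize(self, decided: list) -> list:
        # The ordering layer agreed only on IDs; payloads are looked up locally.
        return [self.requests[rid] for rid in decided]


id_a, id_b, id_c = RequestId(1, 0), RequestId(2, 0), RequestId(1, 1)
payloads = {id_a: b"put k1", id_b: b"put k2", id_c: b"del k1"}

r1, r2 = Replica(), Replica()
for rid in (id_a, id_b, id_c):   # r1 receives the requests in one order...
    r1.on_disseminated(rid, payloads[rid])
for rid in (id_c, id_a, id_b):   # ...and r2 in a different one
    r2.on_disseminated(rid, payloads[rid])

decided = [id_b, id_a, id_c]     # assumed output of Paxos run over IDs only
assert r1.materialize(decided) == r2.materialize(decided)
print(r1.materialize(decided))  # [b'put k2', b'put k1', b'del k1']
```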

  12. S-Paxos Overview: S-Paxos Architecture and Data Flow

  13. S-Paxos Overview: S-Paxos Balances Work Among Replicas
    • Client communication and request dissemination are usually the bulk of the load
        • in S-Paxos, these tasks are performed by all replicas
    • The leader still has to coordinate the ordering protocol
        • but IDs are small messages
        • so the leader has minimal additional overhead
    • Two levels of batching further reduce the load on the leader:
        • dissemination layer: batch client requests and use the ordering layer to order the IDs of the batches
        • ordering layer: the usual Paxos batching, in this case batches of batch IDs
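
The two batching levels can be sketched with a single greedy packing routine applied twice. The byte limits below are the experimental values listed on slide 19 (20-byte requests, 1450-byte dissemination batches, 50-byte ordering batches); the greedy strategy, the `make_batches` helper, and the 8-byte batch ID size are hypothetical choices for illustration only.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def make_batches(items: List[T], size_of: Callable[[T], int], limit: int) -> List[List[T]]:
    """Greedy batching: close the current batch when the next item would exceed `limit` bytes."""
    batches: List[List[T]] = []
    current: List[T] = []
    used = 0
    for item in items:
        size = size_of(item)
        if current and used + size > limit:
            batches.append(current)
            current, used = [], 0
        current.append(item)
        used += size
    if current:
        batches.append(current)
    return batches


REQUEST_BYTES = 20          # request size from the experiments
DISSEMINATION_LIMIT = 1450  # dissemination-layer batch size from the experiments
ORDERING_LIMIT = 50         # ordering-layer batch size from the experiments
BATCH_ID_BYTES = 8          # hypothetical: size of one batch ID on the wire

requests = [f"req-{i}" for i in range(1000)]
# Level 1 (dissemination layer): client requests -> request batches, each with one ID.
request_batches = make_batches(requests, lambda _: REQUEST_BYTES, DISSEMINATION_LIMIT)
batch_ids = list(range(len(request_batches)))
# Level 2 (ordering layer): batch IDs -> one Paxos proposal per ordering batch.
proposals = make_batches(batch_ids, lambda _: BATCH_ID_BYTES, ORDERING_LIMIT)

print(len(request_batches), len(proposals))  # 14 3
```

With these numbers, each request batch holds 72 requests and each proposal carries at most 6 batch IDs, so 1000 client requests need only 3 ordering proposals from the leader.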

  14. S-Paxos Overview: Benefits in the Presence of Faults
    • Faster view change
        • since IDs are small, Phase 1 of Paxos completes quickly
    • Failures affecting the leader have less impact on throughput
        • the ordering protocol is interrupted, but the dissemination protocol continues among the working replicas
        • when a correct leader emerges, it can quickly order the IDs of the requests that were disseminated while there was no leader

  15. Dissemination Layer Protocol

  16. Dissemination Layer Protocol: Dissemination Layer Overview
    • Dissemination layer tasks:
        • receive requests from clients
        • disseminate requests and IDs to all replicas
        • initiate the ordering of IDs
        • execute requests in the order established for their IDs
    • Challenges:
        • once an ID is decided, the corresponding request must remain available in the system
        • the view change must be coordinated between the ordering and dissemination layers to ensure that each ID is ordered once and only once
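
The execution task above requires two inputs per request: the payload from the dissemination layer and the decision from the ordering layer. A minimal sketch, assuming decisions are numbered by Paxos instance and executed strictly in instance order; the `Executor` class and the string IDs are hypothetical.

```python
from typing import Optional


class Executor:
    """Hypothetical sketch of the execution step of the dissemination layer.

    A request is executed only when both its payload and the decision for its ID are
    known, and strictly in the order of the Paxos instances that decided the IDs.
    """

    def __init__(self) -> None:
        self.requests: dict = {}   # ID -> request payload (from dissemination)
        self.decided: dict = {}    # Paxos instance number -> decided ID
        self.next_instance = 0     # next instance whose ID should be executed
        self.executed: list = []

    def on_request(self, rid: str, payload: bytes) -> None:
        self.requests[rid] = payload
        self._try_execute()

    def on_decision(self, instance: int, rid: str) -> None:
        self.decided[instance] = rid
        self._try_execute()

    def missing(self) -> Optional[str]:
        """ID that blocks execution: decided, but its request has not arrived yet."""
        rid = self.decided.get(self.next_instance)
        return rid if rid is not None and rid not in self.requests else None

    def _try_execute(self) -> None:
        while self.next_instance in self.decided:
            rid = self.decided[self.next_instance]
            if rid not in self.requests:
                return             # blocked until the request arrives (or is polled)
            self.executed.append(self.requests[rid])
            self.next_instance += 1


ex = Executor()
ex.on_decision(1, "c1:1")
ex.on_decision(0, "c2:0")
ex.on_request("c1:1", b"B")
print(ex.executed, ex.missing())  # [] c2:0
ex.on_request("c2:0", b"A")
print(ex.executed)                # [b'A', b'B']
```

The sketch does not handle the second challenge (ordering each ID once and only once across view changes); a real implementation would also need to skip duplicate decisions for the same ID.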

  17. Dissemination Layer Protocol: Overview of the Protocol
    • Disseminating requests (an optimistic implementation of reliable broadcast):
        • when a replica receives a request from a client, it broadcasts <request,ID>
        • replicas acknowledge reception of forwarded requests by broadcasting <Ack,ID>
    • Proposing IDs:
        • the leader proposes an ID once the corresponding request is stable, that is, once it has received acknowledgements for the ID
    • Executing requests:
        • a replica must have both the request and the decision for the corresponding ID
        • if an ID is decided before its request is received, the replica polls other replicas for the request after a small delay
        • the request is stable, so at least one correct replica has it
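
The "proposing IDs" step can be sketched as ack counting at the leader. The slides do not state how many acknowledgements make a request stable; this sketch assumes f+1 distinct replicas, the smallest number that guarantees at least one correct holder when up to f replicas may crash. The `StabilityTracker` class and its method names are hypothetical.

```python
from collections import defaultdict


class StabilityTracker:
    """Hypothetical sketch: the leader's bookkeeping of <Ack,ID> messages.

    Assumption: an ID counts as stable once f+1 distinct replicas have acknowledged it;
    with at most f crashes, at least one correct replica then holds the request.
    """

    def __init__(self, f: int) -> None:
        self.threshold = f + 1
        self.acks = defaultdict(set)  # ID -> set of replicas that acknowledged it
        self.proposed = set()         # IDs already handed to the ordering layer

    def on_ack(self, rid: str, replica: int) -> bool:
        """Record an ack; return True exactly once, when rid first becomes stable."""
        if rid in self.proposed:
            return False
        self.acks[rid].add(replica)   # a set ignores duplicate (retransmitted) acks
        if len(self.acks[rid]) >= self.threshold:
            self.proposed.add(rid)
            return True               # the leader would now propose rid
        return False


tracker = StabilityTracker(f=1)       # n = 3 replicas tolerate f = 1 crash
print(tracker.on_ack("r1", 0))  # False: 1 of 2 acks
print(tracker.on_ack("r1", 0))  # False: retransmitted ack, still 1 distinct replica
print(tracker.on_ack("r1", 2))  # True: stable, propose r1
print(tracker.on_ack("r1", 1))  # False: already proposed
```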

  18. Performance Evaluation

  19. Experimental Evaluation: Performance Evaluation
    • S-Paxos is implemented on top of JPaxos, a Java implementation of Paxos
    • Experiments compare:
        • JPaxos (leader-centric)
        • S-Paxos (not leader-centric)
    • Testbed: Grid'5000 (helios cluster)
        • CPU: 2x2 cores @ 2.2 GHz
        • network: 1 Gbit Ethernet
    • Experimental parameters:
        • request size: 20 bytes
        • batch size: S-Paxos uses 1450 bytes in the dissemination layer and 50 bytes in the ordering layer; JPaxos uses 1450 bytes
        • null service

  20. Experimental Evaluation: Load Distribution
    [Figure: average CPU utilization; panels: JPaxos, S-Paxos]

  21. Experimental Evaluation: Performance with Increasing Number of Clients (n = 3)
    [Figure panels: throughput; response time]

  22. Experimental Evaluation: Scalability
    [Figure: throughput]

  23. Experimental Evaluation: Throughput with Crashes
    • Request size: 1 KB, batch size: 8 KB, crash of the leader

  24. Experimental Evaluation: False Suspicions
    • The leader is (wrongly) suspected every 10 seconds

  25. Conclusion
    • A leader-based protocol does not need to be leader-centric
    • S-Paxos balances the workload across replicas
    • Benefits:
        • better performance for the same number of replicas
        • better scalability with the number of replicas
        • better performance in the presence of faults

  26. Additional slides

  27. Dissemination Layer Protocol: Discussion
    • Broadcast of <request,ID>: best effort, no retransmission
        • avoids the cost of reliable broadcast on requests
        • recovering from partial delivery (message loss or crashes):
            • if the request does not become stable, the client times out and retransmits
            • if the request becomes stable, then after its ID is decided, replicas poll other replicas for the request
    • Broadcast of <Ack,ID>: with retransmission
        • ensures that once a request is stable, it will be proposed
        • almost free in practice: acks are small and can be piggybacked on other messages
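
The second recovery case above (request stable, ID decided, request missing locally) can be sketched as a polling loop. This is a hypothetical illustration: the slides only say that replicas "poll other replicas"; restricting the poll to replicas that acknowledged the ID, the `recover_request` function, and the `fetch` callable standing in for a network request are all assumptions.

```python
from typing import Callable, Optional


def recover_request(rid: str,
                    local: dict,
                    ackers: set,
                    me: int,
                    fetch: Callable[[int, str], Optional[bytes]]) -> Optional[bytes]:
    """Hypothetical poll step, run (after a small delay) when `rid` is decided but missing locally.

    Assumption: only replicas that acknowledged `rid` are polled, since they are known to
    hold the request; `fetch` stands in for a remote call and returns None on failure.
    """
    if rid in local:                  # the request arrived in the meantime
        return local[rid]
    for replica in sorted(ackers - {me}):
        payload = fetch(replica, rid)
        if payload is not None:
            local[rid] = payload      # keep it so the request stays available
            return payload
    return None                       # every polled replica failed; retry later


# Replica 1 lost <request,ID> for "r7"; replicas 0 and 2 acknowledged it, but 0 has crashed.
stores = {0: {"r7": b"op"}, 1: {}, 2: {"r7": b"op"}}
crashed = {0}


def fetch(replica: int, rid: str) -> Optional[bytes]:
    return None if replica in crashed else stores[replica].get(rid)


print(recover_request("r7", stores[1], ackers={0, 2}, me=1, fetch=fetch))  # b'op'
```

Because an ID is only proposed after its request is stable, at least one correct acknowledger exists, so under the stated assumptions the loop eventually finds the request once transient failures clear.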