S-Paxos: Eliminating the Leader Bottleneck Martin Biely, Zarko Milosevic, Nuno Santos, André Schiper Ecole Polytechnique Fédérale de Lausanne (EPFL), Switzerland October 9, 2012
Context and Motivation Context: State Machine Replication [Figure: clients submit requests to a replicated service; an ordering protocol (Paxos) coordinates the service replicas]
Context and Motivation The Paxos Protocol • Paxos is a leader-based protocol • A distinguished process (leader) coordinates the others (followers) • Observation: the leader receives and sends more messages than the followers • Potential system bottleneck…
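The observation above can be made concrete with some back-of-the-envelope arithmetic. The sketch below (not from the paper's code; it ignores commit/learn messages and assumes the leader handles all client traffic) counts the messages each replica handles to order one request:

```java
// Illustrative sketch: per-request message counts in Paxos Phase 2
// with n replicas, under the simplifying assumptions stated above.
public class PaxosMessageLoad {
    // The leader: receives the client request (1), sends Phase 2a to the
    // n-1 followers, receives n-1 Phase 2b acks, and replies to the client (1).
    public static int leaderMessages(int n) {
        return 1 + (n - 1) + (n - 1) + 1;
    }

    // A follower only receives one Phase 2a and sends one Phase 2b.
    public static int followerMessages(int n) {
        return 2;
    }

    public static void main(String[] args) {
        int n = 3;
        System.out.println("leader:   " + leaderMessages(n));   // 6
        System.out.println("follower: " + followerMessages(n)); // 2
    }
}
```

The leader's load grows linearly with n while a follower's stays constant, which is why the leader saturates first.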
Context and Motivation Paxos Performance Experimental settings • JPaxos – implementation of Paxos in Java (protocol shown previously) • n=3, request size = 20 bytes, CPU 2x2 cores @ 2.2 GHz The bottleneck in Paxos is typically the leader
Context and Motivation Paxos is Leader-centric • Leader-centric protocol • The leader does considerably more work than the followers • Therefore, the leader is prone to being the system bottleneck • Paxos and most leader-based protocols are also leader-centric
Context and Motivation Leader-based vs Leader-centric • Note that leader-based ≠ leader-centric • Leader-based – algorithmic concept, the leader is a distinguished process • Leader-centric – resource usage, the leader is a bottleneck Question: must leader-based protocols like Paxos also be leader-centric?
S-Paxos Overview Leader-based but not leader-centric
S-Paxos Overview Why Paxos is Leader-centric • Leader does the following • Receives requests from clients • Coordinates protocol to order requests • Replies to clients • Followers do much less • Receive client requests from leader • Acknowledge order proposed by leader • Underlying problem: unbalanced resource utilization • Leader runs out of resources (CPU, network bandwidth) • While followers are lightly loaded
S-Paxos Overview S-Paxos: A Balanced Paxos Variant • S-Paxos balances workload across replicas • Leader and followers have similar resource usage • The full resources of all replicas become available to the ordering protocol • S-Paxos is leader-based but not leader-centric • Combines several well-known ideas in a novel way • All replicas handle client communication • All replicas disseminate requests • Ordering done on IDs
S-Paxos Overview S-Paxos key ideas: Distribute client communication • Commonly used in practice • For instance, ZooKeeper • But by itself, still leader-centric • Leader runs the ordering protocol on requests (Phase 2a messages of Paxos) • Followers have to forward requests to the leader • Which in turn sends the requests to the other followers All replicas handle client communication
S-Paxos Overview S-Paxos key ideas: Distribute request dissemination • Note that Phase 2a messages have a dual purpose • Dissemination of requests • Establishing order • All replicas disseminate requests • Ordering performed on IDs S-Paxos separates dissemination from ordering
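The separation described above can be sketched as two independent data paths: dissemination stores each request payload under a small ID on every replica, and the ordering layer runs consensus on IDs only. This is a minimal sketch with hypothetical names, not the JPaxos/S-Paxos API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a single replica's state: dissemination and ordering are
// decoupled, and execution joins them once an ID is decided.
public class IdOrderingSketch {
    private final Map<String, byte[]> disseminated = new HashMap<>();
    private final List<String> decidedIds = new ArrayList<>();

    // Dissemination layer: every replica stores <ID, request> as it arrives.
    public void onDisseminate(String id, byte[] request) {
        disseminated.put(id, request);
    }

    // Ordering layer: Paxos decides on the small ID, not the payload.
    public void onDecide(String id) {
        decidedIds.add(id);
    }

    // Execution: replay requests in the order established for their IDs.
    public List<byte[]> executionOrder() {
        List<byte[]> out = new ArrayList<>();
        for (String id : decidedIds) {
            byte[] req = disseminated.get(id);
            if (req != null) out.add(req);  // else: fetch from another replica (omitted)
        }
        return out;
    }
}
```

Because the ordering layer never carries payloads, leader messages stay small regardless of request size.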
S-Paxos Overview S-Paxos Architecture and Data Flow
S-Paxos Overview S-Paxos balances work among replicas • Client communication and request dissemination usually the bulk of the load • In S-Paxos this task is performed by all replicas • Leader still has to coordinate the ordering protocol • But IDs are small messages • So the leader has minimal additional overhead • Two levels of batching to further reduce load on the leader • Dissemination layer: batch client requests and use the ordering layer to order the IDs of batches • Ordering layer: usual Paxos batching, in this case batches of batch IDs
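The two batching levels can be sketched as follows (hypothetical helper names; the batch sizes mirror the 1450-byte / small-ID settings used in the evaluation, but the packing policy is a simplification):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of two-level batching: requests -> batches (dissemination layer),
// then batch IDs -> Paxos instances (ordering layer).
public class TwoLevelBatching {
    // Level 1: pack requests into batches of at most maxBatchBytes.
    public static List<List<byte[]>> packRequests(List<byte[]> requests, int maxBatchBytes) {
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int size = 0;
        for (byte[] r : requests) {
            if (size + r.length > maxBatchBytes && !current.isEmpty()) {
                batches.add(current);        // batch full: seal it
                current = new ArrayList<>();
                size = 0;
            }
            current.add(r);
            size += r.length;
        }
        if (!current.isEmpty()) batches.add(current);
        return batches;
    }

    // Level 2: the leader proposes several small batch IDs per Paxos instance.
    public static List<List<Integer>> packIds(int batchCount, int idsPerInstance) {
        List<List<Integer>> instances = new ArrayList<>();
        for (int id = 0; id < batchCount; id += idsPerInstance) {
            List<Integer> inst = new ArrayList<>();
            for (int j = id; j < Math.min(id + idsPerInstance, batchCount); j++)
                inst.add(j);
            instances.add(inst);
        }
        return instances;
    }
}
```

With 20-byte requests and 1450-byte batches, one dissemination batch carries 72 requests, and each Paxos instance then orders several such batches at the cost of a few bytes each.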
S-Paxos Overview Benefits in the presence of faults • Faster view change • Since IDs are small, Phase 1 of Paxos completes quickly • Failures affecting the leader have less impact on throughput • Ordering protocol is interrupted, but dissemination protocol continues among working replicas • When a correct leader emerges, it can quickly order the IDs of the requests that were disseminated while there was no leader
Dissemination Layer Protocol Dissemination Layer Overview • Dissemination layer tasks • Receive requests from clients • Disseminate requests and IDs to all replicas • Initiate ordering of IDs • Execute requests in the order established for IDs • Challenges • Once an ID is decided, the corresponding request must remain available in the system • Coordinate view change between ordering and dissemination layers to ensure that IDs are ordered once and only once
Dissemination Layer Protocol Overview of the Protocol
Experimental Evaluation Performance Evaluation • S-Paxos implemented on top of JPaxos, a Java implementation of Paxos • Experiments compare • JPaxos (leader-centric) • S-Paxos (non leader-centric) • Testbed: Grid 5000 (helios cluster) • CPU: 2x2 cores @ 2.2 GHz • Network: 1 Gbit/s Ethernet • Experimental parameters • Request size: 20 bytes • Batch size • S-Paxos: dissemination layer 1450 bytes, ordering layer 50 bytes • JPaxos: 1450 bytes • Null service
Experimental Evaluation Load Distribution: Average CPU Utilization (JPaxos vs S-Paxos)
Experimental Evaluation Performance with Increasing Number of Clients (n=3): Throughput and Response Time
Experimental Evaluation Scalability: Throughput
Experimental Evaluation Throughput with crashes Crash of the leader
Experimental Evaluation False suspicions • Leader is (wrongly) suspected every 10 seconds
Conclusion • A leader-based protocol does not need to be leader-centric • S-Paxos: balances the workload across replicas • Benefits • Better performance for the same number of replicas • Better scalability with the number of replicas • Better performance in the presence of faults
Dissemination Layer Protocol Discussion • Broadcast of &lt;request, ID&gt;: best effort, no retransmission • Avoids the cost of reliable broadcast on requests • Recovering from partial delivery (message loss/crashes): • Request does not become stable – the client times out and retransmits • Request becomes stable – after the ID is decided, replicas poll other replicas for the request • Broadcast of &lt;Ack, ID&gt;: retransmission • Ensures that once a request is stable, it will be proposed • Almost free in practice: acks are small and can be piggybacked on other messages
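The stability rule discussed above can be sketched in a few lines (hypothetical names; a simplification of the protocol): a batch becomes stable once a majority of replicas have acknowledged storing it, so a decided ID is always retrievable from some correct replica. Since acks may be retransmitted, the tracker must be idempotent:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: track <Ack, ID> messages and declare an ID stable once a
// majority of the n replicas have acknowledged it.
public class StabilityTracker {
    private final int n;                                     // number of replicas
    private final Map<String, Set<Integer>> acks = new HashMap<>();
    private final List<String> stable = new ArrayList<>();

    public StabilityTracker(int n) { this.n = n; }

    // Called when <Ack, id> arrives from a replica. Acks are retransmitted,
    // so duplicates are expected; the Set makes counting idempotent.
    public void onAck(String id, int replica) {
        Set<Integer> s = acks.computeIfAbsent(id, k -> new HashSet<>());
        s.add(replica);
        if (s.size() == n / 2 + 1)
            stable.add(id);   // majority reached exactly once: hand to ordering layer
    }

    public List<String> stableIds() { return stable; }
}
```

Only stable IDs are proposed to the ordering layer, which is what makes the best-effort broadcast of the request payloads safe.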