The slides presented at ASPLOS'20 for the paper "Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol".
Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol
Antonios Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, B. Grot, V. Nagarajan, A. Dragojevic†
University of Edinburgh, *Intel, †Microsoft Research
hermes-protocol.com
Distributed datastores
In-memory stores with a read/write API; the backbone of online services.
Need: high performance and fault tolerance → mandates data replication.
Replication 101
Typically 3 to 7 replicas.
Consistency:
• Weak: performance, but nasty surprises
• Strong: programmable and intuitive
Reliable replication protocols:
• Strong consistency even under faults
• Define the actions that execute reads & writes → these determine a datastore's performance
Can reliable protocols provide high performance?
Paxos
The gold standard for strong consistency and fault tolerance.
Low performance: both reads and writes require inter-replica communication → multiple RTTs over the network.
Common-case performance (i.e., no faults) is as bad as the worst case (under faults).
State-of-the-art reliable protocols exploit failure-free operation for performance.
Performance of state-of-the-art protocols
ZAB: local reads from all replicas → fast reads; writes serialize on the leader → low throughput.
CRAQ: local reads from all replicas → fast reads; writes traverse the length of the chain (head to tail) → high latency.
Fast reads, but poor write performance.
Key protocol features for high performance
Goal: low latency + high throughput.
Reads: local from all replicas.
Writes: fast (minimize network hops), decentralized (no serialization points), and fully concurrent (any replica can service a write).
Existing replication protocols are deficient.
Enter Hermes
A broadcast-based, invalidating replication protocol, inspired by multiprocessor cache-coherence protocols. Each replica keeps every object in one of two states: Valid or Invalid.
Fault-free operation of a write (e.g., write(A=3)):
1. The coordinator (the replica servicing the write) broadcasts Invalidations; once an object is Invalid, it cannot serve reads, so no stale reads are possible → strong consistency.
2. Followers acknowledge the Invalidation.
3. The coordinator commits and broadcasts Validations; all replicas can again serve local reads for this object.
Strongest consistency: linearizability. Local reads from all replicas, and a Valid object always holds the latest value.
What about concurrent writes?
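Before turning to concurrent writes, here is a minimal, single-process sketch of the fault-free write path just described. It is not the authors' implementation: the Replica class, the direct method calls standing in for messages, and the synchronous ack collection are assumptions for illustration, and timestamps are omitted until the next slide.

```python
# Illustrative sketch of Hermes' fault-free write path (hypothetical code,
# not the paper's implementation). Messages are modeled as direct method
# calls and acks are collected synchronously; timestamps are omitted here.
from enum import Enum

class State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"

class Replica:
    def __init__(self, node_id, peers=None):
        self.node_id = node_id
        self.peers = peers or []        # the other replicas of the group
        self.store = {}                 # key -> (State, value)

    # Coordinator side: any replica may coordinate a write.
    def write(self, key, value):
        # 1. Invalidate locally and broadcast Invalidations (with the value).
        self.store[key] = (State.INVALID, value)
        acks = [p.on_invalidation(key, value) for p in self.peers]
        # 2. Wait for all followers to acknowledge.
        assert all(acks), "all followers must ack before committing"
        # 3. Commit locally and broadcast Validations.
        self.store[key] = (State.VALID, value)
        for p in self.peers:
            p.on_validation(key)

    # Follower side.
    def on_invalidation(self, key, value):
        # While Invalid, the object cannot serve reads -> no stale reads.
        self.store[key] = (State.INVALID, value)
        return True                     # acknowledgment

    def on_validation(self, key):
        _, value = self.store[key]
        self.store[key] = (State.VALID, value)

    def read(self, key):
        state, value = self.store.get(key, (None, None))
        return value if state == State.VALID else None   # local read, Valid objects only
```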
Concurrent writes = challenge
Challenge: how to efficiently order concurrent writes to the same object (e.g., write(A=3) and write(A=1) issued by different coordinators)?
Solution: store a logical timestamp (TS) along with each object.
- Upon a write: the coordinator increments the TS and sends it with its Invalidations.
- Upon receiving an Invalidation: a follower updates the object's TS.
- When two writes to the same object race: use the node ID to order them.
Broadcast + Invalidations + TS → high-performance writes.
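A small sketch of how such per-object timestamps can be compared, in the spirit of the Lamport-style ordering described above; the field names and the follower-side check are illustrative assumptions, not the paper's code.

```python
# Illustrative Lamport-style timestamp: a version counter bumped by the
# coordinator on each write, with the coordinator's node id as tie-breaker.
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestamp:
    version: int    # incremented by the coordinator for every write
    node_id: int    # breaks ties between writes with the same version

    def __lt__(self, other):
        return (self.version, self.node_id) < (other.version, other.node_id)

def follower_apply_invalidation(current, incoming):
    """Follower-side check (simplified): adopt the Invalidation only if its
    timestamp is higher than the object's current one, so all replicas
    converge on the same winner of a race."""
    cur_ts, _ = current      # (Timestamp or None, value)
    inc_ts, _ = incoming
    return incoming if cur_ts is None or cur_ts < inc_ts else current
```

For example, if replicas 1 and 2 both bump object A from version 4 to 5 (write(A=3) and write(A=1)), every replica compares Timestamp(5, 1) against Timestamp(5, 2) and deterministically keeps the write from node 2.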
Writes in Hermes (Broadcast + Invalidations + TS)
1. Decentralized: fully distributed write ordering at the endpoints.
2. Fully concurrent: any replica can coordinate a write, and writes to different objects proceed in parallel.
3. Fast: writes commit in 1 RTT and never abort.
Awesome! But what about fault tolerance?
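As a toy illustration of these properties using the hypothetical Replica sketch from earlier: two replicas coordinate writes to different objects at the same time, each completing in a single Invalidation/Validation round, after which any replica serves the new values from its local copy.

```python
# Toy run of the earlier sketch: any replica coordinates, writes to different
# objects proceed independently, and local reads see the committed values.
a, b, c = Replica(1), Replica(2), Replica(3)
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

a.write("x", 3)   # replica 1 coordinates a write to x
b.write("y", 7)   # replica 2 coordinates a write to y
print(c.read("x"), c.read("y"))   # -> 3 7, served locally at replica 3
```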
Handling faults in Hermes
Problem: a failure in the middle of a write can permanently leave a replica in the Invalid state.
Solution: send the write value with the Invalidation → early value propagation.
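In the sketch above, the Invalidation already carries the new value; that is the early-value-propagation idea. A hypothetical, simplified recovery helper shows why it matters: a follower left Invalid by a failed coordinator holds the value it needs to replay the write itself (in the full protocol the replay reuses the original timestamp, so repeating it is safe).

```python
# Hypothetical recovery helper (simplified): because Invalidations carry the
# write's value, a replica stuck in Invalid after a coordinator failure can
# replay the write as the new coordinator instead of blocking forever.
def replay_stuck_write(replica, key):
    state, value = replica.store.get(key, (None, None))
    if state == State.INVALID:
        replica.write(key, value)   # re-coordinate using the propagated value
```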