The slides presented at ASPLOS'20 for the paper "Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol".
Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol
Antonios Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, B. Grot, V. Nagarajan, A. Dragojevic†
University of Edinburgh, *Intel, †Microsoft Research
hermes-protocol.com
Distributed datastores
In-memory stores with a read/write API; the backbone of online services.
Need: high performance and fault tolerance → mandates data replication.
Replication 101
Typically 3 to 7 replicas.
Consistency:
• Weak: performance, but nasty surprises
• Strong: programmable and intuitive
Reliable replication protocols:
• Strong consistency even under faults
• Define the actions that execute reads & writes → these determine a datastore's performance
Can reliable protocols provide high performance?
Paxos
The gold standard for strong consistency and fault tolerance.
Low performance: both reads and writes require inter-replica communication → multiple RTTs over the network.
Common-case performance (i.e., no faults) is as bad as the worst case (under faults).
State-of-the-art reliable protocols exploit failure-free operation for performance.
Performance of state-of-the-art protocols
ZAB: local reads from all replicas → fast reads; writes serialize on the leader → low throughput.
CRAQ: local reads from all replicas → fast reads; writes traverse the length of the chain (head to tail) → high latency.
Fast reads, but poor write performance.
Key protocol features for high performance
Goal: low latency + high throughput.
Reads: local from all replicas.
Writes: fast (minimize network hops), decentralized (no serialization points), and fully concurrent (any replica can service a write).
Existing replication protocols are deficient.
Enter Hermes
A broadcast-based, invalidating replication protocol, inspired by multiprocessor cache-coherence protocols. Each replica keeps every object in one of two states: Valid or Invalid.
Fault-free operation of a write (e.g., write(A=3)):
1. The coordinator (the replica servicing the write) broadcasts Invalidations; once an object is Invalid, it cannot serve reads, so no stale reads are possible → strong consistency.
2. Followers acknowledge the Invalidation.
3. The coordinator commits and broadcasts Validations; all replicas can again serve local reads for this object.
Strongest consistency: linearizability. Local reads from all replicas, and a Valid object always holds the latest value.
What about concurrent writes?
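Before turning to concurrent writes, here is a minimal, single-process sketch of the fault-free write path just described. It is not the authors' implementation: the Replica class, the direct method calls standing in for messages, and the synchronous ack collection are assumptions for illustration, and timestamps are omitted until the next slide.

```python
# Illustrative sketch of Hermes' fault-free write path (hypothetical code,
# not the paper's implementation). Messages are modeled as direct method
# calls and acks are collected synchronously; timestamps are omitted here.
from enum import Enum

class State(Enum):
    VALID = "Valid"
    INVALID = "Invalid"

class Replica:
    def __init__(self, node_id, peers=None):
        self.node_id = node_id
        self.peers = peers or []        # the other replicas of the group
        self.store = {}                 # key -> (State, value)

    # Coordinator side: any replica may coordinate a write.
    def write(self, key, value):
        # 1. Invalidate locally and broadcast Invalidations (with the value).
        self.store[key] = (State.INVALID, value)
        acks = [p.on_invalidation(key, value) for p in self.peers]
        # 2. Wait for all followers to acknowledge.
        assert all(acks), "all followers must ack before committing"
        # 3. Commit locally and broadcast Validations.
        self.store[key] = (State.VALID, value)
        for p in self.peers:
            p.on_validation(key)

    # Follower side.
    def on_invalidation(self, key, value):
        # While Invalid, the object cannot serve reads -> no stale reads.
        self.store[key] = (State.INVALID, value)
        return True                     # acknowledgment

    def on_validation(self, key):
        _, value = self.store[key]
        self.store[key] = (State.VALID, value)

    def read(self, key):
        state, value = self.store.get(key, (None, None))
        return value if state == State.VALID else None   # local read, Valid objects only
```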
Concurrent writes = challenge
Challenge: how to efficiently order concurrent writes to the same object (e.g., write(A=3) and write(A=1) issued by different coordinators)?
Solution: store a logical timestamp (TS) along with each object.
- Upon a write: the coordinator increments the TS and sends it with its Invalidations.
- Upon receiving an Invalidation: a follower updates the object's TS.
- When two writes to the same object race: use the node ID to order them.
Broadcast + Invalidations + TS → high-performance writes.
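A small sketch of how such per-object timestamps can be compared, in the spirit of the Lamport-style ordering described above; the field names and the follower-side check are illustrative assumptions, not the paper's code.

```python
# Illustrative Lamport-style timestamp: a version counter bumped by the
# coordinator on each write, with the coordinator's node id as tie-breaker.
from dataclasses import dataclass

@dataclass(frozen=True)
class Timestamp:
    version: int    # incremented by the coordinator for every write
    node_id: int    # breaks ties between writes with the same version

    def __lt__(self, other):
        return (self.version, self.node_id) < (other.version, other.node_id)

def follower_apply_invalidation(current, incoming):
    """Follower-side check (simplified): adopt the Invalidation only if its
    timestamp is higher than the object's current one, so all replicas
    converge on the same winner of a race."""
    cur_ts, _ = current      # (Timestamp or None, value)
    inc_ts, _ = incoming
    return incoming if cur_ts is None or cur_ts < inc_ts else current
```

For example, if replicas 1 and 2 both bump object A from version 4 to 5 (write(A=3) and write(A=1)), every replica compares Timestamp(5, 1) against Timestamp(5, 2) and deterministically keeps the write from node 2.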
Writes in Hermes (Broadcast + Invalidations + TS)
1. Decentralized: fully distributed write ordering at the endpoints.
2. Fully concurrent: any replica can coordinate a write, and writes to different objects proceed in parallel.
3. Fast: writes commit in 1 RTT and never abort.
Awesome! But what about fault tolerance?
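As a toy illustration of these properties using the hypothetical Replica sketch from earlier: two replicas coordinate writes to different objects at the same time, each completing in a single Invalidation/Validation round, after which any replica serves the new values from its local copy.

```python
# Toy run of the earlier sketch: any replica coordinates, writes to different
# objects proceed independently, and local reads see the committed values.
a, b, c = Replica(1), Replica(2), Replica(3)
a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]

a.write("x", 3)   # replica 1 coordinates a write to x
b.write("y", 7)   # replica 2 coordinates a write to y
print(c.read("x"), c.read("y"))   # -> 3 7, served locally at replica 3
```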
Handling faults in Hermes
Problem: a failure in the middle of a write can permanently leave a replica in the Invalid state.
Solution: send the write value with the Invalidation → early value propagation.
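In the sketch above, the Invalidation already carries the new value; that is the early-value-propagation idea. A hypothetical, simplified recovery helper shows why it matters: a follower left Invalid by a failed coordinator holds the value it needs to replay the write itself (in the full protocol the replay reuses the original timestamp, so repeating it is safe).

```python
# Hypothetical recovery helper (simplified): because Invalidations carry the
# write's value, a replica stuck in Invalid after a coordinator failure can
# replay the write as the new coordinator instead of blocking forever.
def replay_stuck_write(replica, key):
    state, value = replica.store.get(key, (None, None))
    if state == State.INVALID:
        replica.write(key, value)   # re-coordinate using the propagated value
```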