Explore the theory of Strong Eventual Consistency (SEC) and Conflict-free Replicated Data Types (CRDTs) in distributed systems. Understand the principles, challenges, and practical solutions for achieving consistency without synchronization. Presented by Ron Zisman, based on the work of Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski.
Conflict-free Replicated Data Types Presented by: Ron Zisman. Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski
Motivation Replication and consistency are essential features of large distributed systems such as the WWW, P2P, and cloud computing • Lots of replicas • Great for fault-tolerance and read latency • Problematic when updates occur • Slow synchronization • Conflicts in case of no synchronization
Motivation We look for an approach that: • supports Replication • guarantees Eventual Consistency • is Fast and Simple • Conflict-free objects = no synchronization whatsoever • Is this practical?
Contributions Theory Strong Eventual Consistency (SEC) • A solution to the CAP problem • Formal definitions • Two sufficient conditions • Strong equivalence between the two • Incomparable to sequential consistency Practice CRDTs = Convergent or Commutative Replicated Data Types • Counters • Set • Directed graph
Strong Consistency Ideal consistency: all replicas know about the update immediately after it executes • Preclude conflicts • Replicas update in the same total order • Any deterministic object • Consensus • Serialization bottleneck • Tolerates < n/2 faults • Correct, but doesn’t scale
Eventual Consistency • Update local and propagate • No foreground synch • Eventual, reliable delivery • On conflict • Arbitrate • Roll back • Consensus moved to background • Better performance • Still complex
Strong Eventual Consistency • Update local and propagate • No synch • Eventual, reliable delivery • No conflict • deterministic outcome of concurrent updates • No consensus: ≤ n-1 faults • Solves the CAP problem
Definition of EC • Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas • Termination: All method executions terminate • Convergence: Correct replicas that have delivered the same updates eventually reach equivalent state • Doesn't preclude roll backs and reconciling
Definition of SEC • Eventual delivery: An update delivered at some correct replica is eventually delivered to all correct replicas • Termination: All method executions terminate • Strong Convergence: Correct replicas that have delivered the same updates have equivalent state
System model System of non-byzantine processes interconnected by an asynchronous network Partition-tolerance and recovery What are the two simple conditions that guarantee strong convergence?
Query • Client sends the query to any of the replicas • Local at source replica • Evaluate synchronously, no side effects
State-based approach An object is a tuple (payload set, initial state, query, update, merge) • Local queries, local updates • Send full state; on receive, merge • An update is said to be 'delivered' at some replica when it is included in its causal history
State-based replication • Local at source: updates u(a), u(b), … • Precondition, compute • Update local payload • Convergence • Episodically: send payload • On delivery: merge payloads • Causal history: on query, unchanged; on update, add the update to the local history; on merge, take the union of the two histories
Semi-lattice A poset ⟨S, ⊑⟩ is a join-semilattice if for all x, y in S a LUB x ⊔ y exists LUB = Least Upper Bound • Associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z) • Commutative: x ⊔ y = y ⊔ x • Idempotent: x ⊔ x = x • Examples: integers with max, sets with union
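The semilattice laws can be checked concretely. A minimal sketch (an assumed illustration, not from the slides) using Python sets, where ⊔ is set union:

```python
# Python sets under union form a join-semilattice: the LUB of any two
# sets is their union, and union is associative, commutative, idempotent.
x, y, z = {1, 2}, {2, 3}, {4}

def lub(a, b):
    return a | b  # set union computes the LUB in this lattice

assert lub(lub(x, y), z) == lub(x, lub(y, z))  # associative
assert lub(x, y) == lub(y, x)                  # commutative
assert lub(x, x) == x                          # idempotent
```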
State-based: monotonic semi-lattice ⇒ CvRDT If: • the payload type forms a semi-lattice • updates are increasing • merge computes the Least Upper Bound then replicas converge to the LUB of the last values
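As a concrete CvRDT sketch (an assumed illustration; the class name and methods are mine), a grow-only set: the payload forms a semi-lattice under union, `add` is an increasing update, and `merge` computes the LUB, so replicas that exchange and merge state converge.

```python
# Minimal state-based CRDT (CvRDT): a grow-only set.
class GSet:
    def __init__(self):
        self.payload = set()       # payload lives in the lattice of sets

    def add(self, e):              # update: strictly increasing in the lattice
        self.payload.add(e)

    def lookup(self, e):           # query: local, no side effects
        return e in self.payload

    def merge(self, other):        # merge computes the LUB (set union)
        self.payload |= other.payload

# Two replicas update independently, then exchange full state and merge.
r1, r2 = GSet(), GSet()
r1.add("a")
r2.add("b")
r1.merge(r2)
r2.merge(r1)
assert r1.payload == r2.payload == {"a", "b"}  # strong convergence
```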
Operation-based approach An object is a tuple (payload set, initial state, query, prepare-update, effect-update, delivery precondition) • prepare-update • Precondition at source • 1st phase: at source, synchronous, no side effects • effect-update • Precondition P against downstream state • 2nd phase: asynchronous, side effects to downstream state
Operation-based replication • on query/prepare-update: local at source; precondition, compute; broadcast to all replicas; causal history unchanged • on effect-update: eventually, at all replicas; downstream precondition; assign local replica; add the update to the causal history
Op-based: commutativity ⇒ CmRDT If: • Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true • Safety: concurrent operations all commute then replicas converge
Monotonic semi-lattice ⇔ Commutative A state-based object can emulate an operation-based object, and vice versa Use state-based reasoning, then convert to operation-based for better efficiency
Comparison • State-based: update ≠ merge operation; simple data types; state includes preceding updates, no separate historical information; inefficient if the payload is large; used in file systems (NFS, Dynamo) • Operation-based: higher level, more complex; more powerful, more constraining; small messages; used in collaborative editing (Treedoc), Bayou, PNUTS • Choose state-based or op-based, as convenient
SEC is incomparable to sequential consistency There is a SEC object that is not sequentially consistent: consider a Set CRDT S with operations add(e) and remove(e) • remove(e) → add(e): e ∈ S • add(e) ║ remove(e′): e ∈ S ∧ e′ ∉ S • add(e) ║ remove(e): e ∈ S (suppose add wins) Consider the following scenario with two replicas: • [add(e); remove(e′)] ║ [add(e′); remove(e)] • a replica that merges the two states has: e ∈ S ∧ e′ ∈ S That merged state will never occur in a sequentially consistent execution (either remove(e) or remove(e′) must be last)
SEC is incomparable to sequential consistency There is a sequentially consistent object that is not SEC: • If no crashes occur, a sequentially consistent object is SEC • Generally, sequential consistency requires consensus to determine the single order of operations, which cannot be solved if n−1 crashes occur (while SEC can tolerate n−1 crashes)
Example CRDTs Multi-master counter Observed-Remove Set Directed Graph
Increment • Payload: integer vector P, one entry per replica • Partial order: x ⊑ y ⇔ ∀i. x.P[i] ≤ y.P[i] • value() = Σᵢ P[i] • increment() = P[myId]++ • merge(x, y) = z with z.P[i] = max(x.P[i], y.P[i]) for all i Multi-master counter
Increment / Decrement • Payload: integer vectors P, N • Partial order: x ⊑ y ⇔ ∀i. x.P[i] ≤ y.P[i] ∧ x.N[i] ≤ y.N[i] • value() = Σᵢ P[i] − Σᵢ N[i] • increment() = P[myId]++ • decrement() = N[myId]++ • merge(x, y) = z with z.P[i] = max(x.P[i], y.P[i]) and z.N[i] = max(x.N[i], y.N[i]) Multi-master counter
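The increment/decrement counter can be sketched as follows (an assumed illustration; the class and method names are mine): two grow-only vectors P and N, merged componentwise by max, with the value being the difference of their sums.

```python
# PN-counter sketch: a state-based multi-master counter supporting
# both increment and decrement via two grow-only vectors.
class PNCounter:
    def __init__(self, n, my_id):
        self.P = [0] * n           # per-replica increment counts
        self.N = [0] * n           # per-replica decrement counts
        self.my_id = my_id

    def value(self):
        return sum(self.P) - sum(self.N)

    def increment(self):
        self.P[self.my_id] += 1    # each replica updates only its own slot

    def decrement(self):
        self.N[self.my_id] += 1

    def merge(self, other):
        # LUB: componentwise max of both vectors
        self.P = [max(a, b) for a, b in zip(self.P, other.P)]
        self.N = [max(a, b) for a, b in zip(self.N, other.N)]

a, b = PNCounter(2, 0), PNCounter(2, 1)
a.increment(); a.increment()
b.decrement()
a.merge(b); b.merge(a)
assert a.value() == b.value() == 1   # replicas converge
```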
Set design alternatives Sequential specification: • {true} add(e) {e ∈ S} • {true} remove(e) {e ∉ S} • Concurrent: {true} add(e) ║ remove(e) {???} • linearizable? • error state? • last writer wins? • add wins? • remove wins?
Observed-Remove Set • Payload: added, removed — sets of (element, unique token) pairs • add(e) = added := added ∪ {(e, α)}, where α is a unique token
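The Observed-Remove Set can be sketched as follows (an assumed illustration; `_tokens` and the method names are mine): `add` tags the element with a fresh token, `remove` moves only the tags the replica has observed into `removed`, so a concurrent add "wins" after merge.

```python
import itertools

_tokens = itertools.count()          # stand-in for globally unique tokens

class ORSet:
    def __init__(self):
        self.added = set()           # (element, token) pairs ever added
        self.removed = set()         # observed (element, token) pairs removed

    def add(self, e):
        self.added.add((e, next(_tokens)))

    def remove(self, e):
        # remove only the tags observed locally; unseen concurrent
        # adds carry fresh tokens and therefore survive
        self.removed |= {(x, t) for (x, t) in self.added if x == e}

    def lookup(self, e):
        return any(x == e for (x, t) in self.added - self.removed)

    def merge(self, other):          # LUB: union both token sets
        self.added |= other.added
        self.removed |= other.removed

# Concurrent add(e) || remove(e): add wins after merge.
r1, r2 = ORSet(), ORSet()
r1.add("e"); r2.merge(r1)            # r2 observes r1's first add
r1.add("e")                          # concurrent re-add at r1 (fresh token)
r2.remove("e")                       # r2 removes only the tags it observed
r1.merge(r2); r2.merge(r1)
assert r1.lookup("e") and r2.lookup("e")   # the concurrent add survives
```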