Conflict-free Replicated Data Types. Presented by: Ron Zisman. Marc Shapiro, Nuno Preguiça, Carlos Baquero and Marek Zawirski.
Motivation
Replication and consistency are essential features of large distributed systems such as the WWW, P2P networks, and cloud computing.
• Lots of replicas
• Great for fault tolerance and read latency
• Problematic when updates occur
• Slow synchronization
• Conflicts in case of no synchronization
Motivation
We look for an approach that:
• supports replication
• guarantees eventual consistency
• is fast and simple
Conflict-free objects = no synchronization whatsoever. Is this practical?
Contributions
Theory: Strong Eventual Consistency (SEC)
• A solution to the CAP problem
• Formal definitions
• Two sufficient conditions
• Strong equivalence between the two
• Incomparable to sequential consistency
Practice: CRDTs = Convergent or Commutative Replicated Data Types
• Counters
• Set
• Directed graph
Strong Consistency
Ideal consistency: all replicas know about the update immediately after it executes
• Precludes conflicts
• Replicas update in the same total order
• Any deterministic object
• Consensus
• Serialization bottleneck
• Tolerates < n/2 faults
• Correct, but doesn’t scale
Eventual Consistency
• Update local and propagate
• No foreground synch
• Eventual, reliable delivery
• On conflict: arbitrate, roll back, reconcile
• Consensus moved to background
• Better performance
• Still complex
Strong Eventual Consistency
• Update local and propagate
• No synch
• Eventual, reliable delivery
• No conflict: deterministic outcome of concurrent updates
• No consensus: tolerates ≤ n-1 faults
• Solves the CAP problem
Definition of EC
• Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas
• Termination: all method executions terminate
• Convergence: correct replicas that have delivered the same updates eventually reach equivalent state
• Doesn’t preclude roll backs and reconciling
Definition of SEC
• Eventual delivery: an update delivered at some correct replica is eventually delivered to all correct replicas
• Termination: all method executions terminate
• Strong convergence: correct replicas that have delivered the same updates have equivalent state
System model
• System of non-Byzantine processes interconnected by an asynchronous network
• Partition tolerance and recovery
What are the two simple conditions that guarantee strong convergence?
Query
• Client sends the query to any of the replicas
• Local at source replica
• Evaluate synchronously, no side effects
State-based approach
An object is a tuple (S, s⁰, q, u, m): payload set S, initial state s⁰, query q, update u, merge m
• Local queries, local updates
• Send full state; on receive, merge
• An update is said to be ‘delivered’ at some replica when it is included in that replica’s causal history
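State-based replication
• Update, local at source: u(a), u(b), … (check precondition, compute, update local payload)
• Convergence: episodically send payload; on delivery, merge payloads
Causal history:
• on query: unchanged
• on update: the update is added to the source replica’s causal history
• on merge: the union of the local and the delivered causal histories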
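Semi-lattice
A poset (S, ≤) is a join-semilattice if for all x, y in S a LUB x ⊔ y exists (LUB = Least Upper Bound). The join ⊔ is:
• Associative: (x ⊔ y) ⊔ z = x ⊔ (y ⊔ z)
• Commutative: x ⊔ y = y ⊔ x
• Idempotent: x ⊔ x = x
Examples: (integers, ≤, max), (sets, ⊆, ∪)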
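The three join properties can be checked by brute force on small samples. The following is a minimal sketch, not part of the original slides; the helper name `check_join_laws` and the sample values are assumptions made for illustration.

```python
from itertools import product

def check_join_laws(join, samples):
    """Return True if `join` is associative, commutative and idempotent on `samples`."""
    for x, y, z in product(samples, repeat=3):
        if join(join(x, y), z) != join(x, join(y, z)):   # associativity
            return False
        if join(x, y) != join(y, x):                     # commutativity
            return False
    return all(join(x, x) == x for x in samples)         # idempotence

ints = [0, 1, 5, 7]
sets = [frozenset(), frozenset({1}), frozenset({1, 2}), frozenset({3})]

max_ok = check_join_laws(max, ints)                    # integers ordered by <=, join = max
union_ok = check_join_laws(frozenset.union, sets)      # sets ordered by inclusion, join = union
plus_ok = check_join_laws(lambda a, b: a + b, ints)    # addition is not idempotent

print(max_ok, union_ok, plus_ok)  # True True False
```

Addition fails only the idempotence check, which is why a plain integer sum cannot serve as a merge function: delivering the same state twice would count it twice.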
State-based: monotonic semi-lattice ⇒ CvRDT
If:
• payload type forms a semi-lattice
• updates are increasing
• merge computes Least Upper Bound
then replicas converge to the LUB of the last values
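As an illustration of the three conditions, here is a minimal sketch of a grow-only set, assuming set inclusion as the partial order and union as the merge. The class name `GSet` and its method names are hypothetical and chosen only for this example.

```python
class GSet:
    """Grow-only set: a state-based object (hypothetical sketch)."""

    def __init__(self):
        self.payload = frozenset()            # initial state: empty set

    def add(self, e):
        # Update: the payload only ever grows, so updates are increasing.
        self.payload = self.payload | {e}

    def lookup(self, e):
        # Query: evaluated locally, no side effects.
        return e in self.payload

    def merge(self, other):
        # Merge: union is the least upper bound under set inclusion.
        self.payload = self.payload | other.payload


a, b, c = GSet(), GSet(), GSet()
a.add("x")
b.add("y")
c.add("z")

# Exchange full states in an arbitrary order, including a duplicate delivery.
a.merge(b)
a.merge(b)
b.merge(c)
c.merge(a)
b.merge(a)
a.merge(c)

print(a.payload == b.payload == c.payload)  # True
```

Because merge is associative, commutative and idempotent, the order and multiplicity of state exchanges do not matter; all three replicas end with the same payload.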
Operation-based approach
An object is a tuple (S, s⁰, q, t, u, P): payload set S, initial state s⁰, query q, prepare-update t, effect-update u, delivery precondition P
• prepare-update: precondition at source; 1st phase: at source, synchronous, no side effects
• effect-update: precondition against downstream state (P); 2nd phase: asynchronous, side effects to downstream state
Operation-based replication
• Local at source: precondition, compute, broadcast to all replicas
• Eventually, at all replicas: downstream precondition, assign local replica
Causal history:
• on query/prepare-update: unchanged
• on effect-update: the delivered update is added to the replica’s causal history
Op-based: commute ⇒ CmRDT
If:
• Liveness: all replicas execute all operations in delivery order where the downstream precondition (P) is true
• Safety: concurrent operations all commute
then replicas converge
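A minimal sketch of an operation-based counter may help here. It assumes every operation is delivered exactly once to every replica (the delivery layer is not modelled), and the names `OpCounter`, `prepare` and `effect` are hypothetical. Since additions commute, every delivery order yields the same value.

```python
import itertools

class OpCounter:
    """Operation-based counter (hypothetical sketch)."""

    def __init__(self):
        self.value = 0

    def prepare(self, amount):
        # 1st phase, at source: no side effects, returns the operation to broadcast.
        return ("add", amount)

    def effect(self, op):
        # 2nd phase, at every replica: apply the operation to the local state.
        _kind, amount = op
        self.value += amount

    def query(self):
        return self.value


source = OpCounter()
ops = [source.prepare(n) for n in (5, -2, 10)]

# Deliver the same operations in every possible order to fresh replicas.
finals = set()
for order in itertools.permutations(ops):
    replica = OpCounter()
    for op in order:
        replica.effect(op)
    finals.add(replica.query())

print(finals)  # {13}
```

Unlike a state-based merge, `effect` here is not idempotent, so this sketch relies on the assumption that no operation is delivered twice.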
Monotonic semi-lattice ⇔ Commutative
• A state-based object can emulate an operation-based object, and vice versa
• Use state-based reasoning and then convert to operation-based for better efficiency
Comparison
State-based:
• Update ≠ merge operation
• Simple data types
• State includes preceding updates; no separate historical information
• Inefficient if payload is large
• File systems (NFS, Dynamo)
Operation-based:
• Update operation
• Higher level, more complex
• More powerful, more constraining
• Small messages
• Collaborative editing (Treedoc), Bayou, PNUTS
Use state-based or op-based, as convenient
SEC is incomparable to sequential consistency
There is a SEC object that is not sequentially-consistent. Consider a Set CRDT S with operations add(e) and remove(e):
• remove(e) → add(e) ⇒ e ∈ S
• add(e) ║ remove(e’) ⇒ e ∈ S ∧ e’ ∉ S
• add(e) ║ remove(e) ⇒ e ∈ S (suppose add wins)
Consider the following scenario with replicas p0, p1, p2:
• p0 executes [add(e); remove(e’)] ║ p1 executes [add(e’); remove(e)]
• p2 merges the states from p0 and p1: e ∈ S ∧ e’ ∈ S
The state of replica p2 will never occur in a sequentially-consistent execution (either remove(e) or remove(e’) must be last)
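The scenario can be replayed with a small add-wins set. The sketch below is an assumption-laden illustration, not the presenters' definition: it tags each added element with a unique token and lets remove affect only the pairs observed locally. The class name `ORSet` and the replica names `p0`, `p1`, `p2` are chosen for this example.

```python
import uuid

class ORSet:
    """Add-wins set with unique tokens (hypothetical state-based sketch)."""

    def __init__(self):
        self.added = set()      # (element, unique token) pairs
        self.removed = set()    # pairs whose removal has been observed

    def add(self, e):
        self.added.add((e, uuid.uuid4().hex))

    def remove(self, e):
        # Only pairs already observed at this replica are removed.
        self.removed |= {pair for pair in self.added if pair[0] == e}

    def lookup(self, e):
        return any(pair[0] == e for pair in self.added - self.removed)

    def merge(self, other):
        self.added |= other.added
        self.removed |= other.removed


p0, p1, p2 = ORSet(), ORSet(), ORSet()

p0.add("e")
p0.remove("e'")    # p0 has never observed e', so nothing is removed
p1.add("e'")
p1.remove("e")     # p1 has never observed e, so nothing is removed

p2.merge(p0)
p2.merge(p1)

print(p2.lookup("e"), p2.lookup("e'"))  # True True
```

Both elements are present at `p2`, a state that no single sequential ordering of the four operations can produce.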
SEC is incomparable to sequential consistency
There is a sequentially-consistent object that is not SEC:
• If no crashes occur, a sequentially-consistent object is SEC
• In general, sequential consistency requires consensus to determine the single order of operations, which cannot be solved if n-1 crashes occur (whereas SEC can tolerate n-1 crashes)
Example CRDTs
• Multi-master counter
• Observed-Remove Set
• Directed Graph
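Multi-master counter: increment only
• Payload: P = [int, int, …] (one entry per replica)
• Partial order: x ≤ y ⇔ ∀i: x.P[i] ≤ y.P[i]
• value() = Σᵢ P[i]
• increment() = P[MyID]++
• merge(x, y) = x ⊔ y = […, max(x.P[i], y.P[i]), …]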
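A minimal sketch of the increment-only counter, assuming a fixed, known number of replicas. The class name `GCounter` and the constructor parameters are hypothetical.

```python
class GCounter:
    """Increment-only multi-master counter (hypothetical sketch)."""

    def __init__(self, replica_id, n_replicas):
        self.my_id = replica_id
        self.P = [0] * n_replicas          # one entry per replica

    def increment(self):
        self.P[self.my_id] += 1            # a replica only touches its own entry

    def value(self):
        return sum(self.P)

    def leq(self, other):
        # Partial order: entry-wise comparison.
        return all(a <= b for a, b in zip(self.P, other.P))

    def merge(self, other):
        # Least upper bound: entry-wise maximum.
        self.P = [max(a, b) for a, b in zip(self.P, other.P)]


r0, r1 = GCounter(0, 2), GCounter(1, 2)
for _ in range(2):
    r0.increment()
for _ in range(3):
    r1.increment()

r0.merge(r1)
r1.merge(r0)
r1.merge(r0)       # duplicate delivery changes nothing

print(r0.value(), r1.value())  # 5 5
```

Keeping one entry per replica is what makes the entry-wise maximum safe: two replicas never increment the same entry, so taking the maximum never loses an increment.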
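Multi-master counter: increment / decrement
• Payload: P = [int, int, …], N = [int, int, …]
• Partial order: x ≤ y ⇔ ∀i: x.P[i] ≤ y.P[i] ∧ x.N[i] ≤ y.N[i]
• value() = Σᵢ P[i] - Σᵢ N[i]
• increment() = P[MyID]++
• decrement() = N[MyID]++
• merge(x, y) = x ⊔ y = ([…, max(x.P[i], y.P[i]), …], […, max(x.N[i], y.N[i]), …])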
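The increment/decrement variant can be sketched the same way, with one vector for increments and one for decrements. As before, the class name `PNCounter` and the fixed replica count are assumptions for illustration.

```python
class PNCounter:
    """Increment/decrement multi-master counter (hypothetical sketch)."""

    def __init__(self, replica_id, n_replicas):
        self.my_id = replica_id
        self.P = [0] * n_replicas          # increments, one entry per replica
        self.N = [0] * n_replicas          # decrements, one entry per replica

    def increment(self):
        self.P[self.my_id] += 1

    def decrement(self):
        self.N[self.my_id] += 1            # decrements also only grow

    def value(self):
        return sum(self.P) - sum(self.N)

    def merge(self, other):
        # Entry-wise maximum on both vectors.
        self.P = [max(a, b) for a, b in zip(self.P, other.P)]
        self.N = [max(a, b) for a, b in zip(self.N, other.N)]


r0, r1 = PNCounter(0, 2), PNCounter(1, 2)
for _ in range(3):
    r0.increment()
r1.increment()
r1.decrement()

r0.merge(r1)
r1.merge(r0)

print(r0.value(), r1.value())  # 3 3
```

Decrementing by growing a second vector keeps every update increasing in the partial order, which a single vector with subtraction would not.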
Set design alternatives
Sequential specification:
• {true} add(e) {e ∈ S}
• {true} remove(e) {e ∉ S}
Concurrent: {true} add(e) ║ remove(e) {???}
• linearizable?
• error state?
• last writer wins?
• add wins?
• remove wins?
Observed-Remove Set • Payload: added, removed (element, unique token) • add(e) =