Consistency in a distributed world
My goals • Quick Review • Distributed Transactions • A Different Approach • CAP Theorem • Eventually Consistent • Best Practices
Quick Review • The Good • Concurrency • Consistency • Integrity • The Bad • Locks • The really ugly • Failures • Large-scale systems?
Distributed Systems • Still the same problems • Concurrency • Consistency • Integrity • But more things can fail
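Two Generals’ Problem • Generals G1 and G2, enemy E • Rules • A single general attacking alone ⇒ defeat • Both generals attacking ⇒ victory • Communication is unreliable: any message can be lost • G1: Let’s attack at 18:45 • G2: confirmation • G1: confirmation … • No finite number of confirmations lets both generals be sure they agree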
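A minimal sketch of the key step in the impossibility argument. It assumes a protocol that exchanges a fixed number n of alternating messages (G1, G2, G1, …) and that the exchange stops at the first lost message; `received_counts` and `last_sender_cannot_tell` are hypothetical helpers. It checks that whoever sends the last message sees exactly the same thing whether that message arrives or is lost.

```python
def received_counts(delivered: int) -> tuple[int, int]:
    """Hypothetical model: message i is sent by G1 if i is even, by G2 if odd.
    Return (messages G1 received, messages G2 received) when the first
    `delivered` messages arrive and all later ones are lost."""
    g1 = sum(1 for i in range(delivered) if i % 2 == 1)  # sent by G2
    g2 = sum(1 for i in range(delivered) if i % 2 == 0)  # sent by G1
    return g1, g2


def last_sender_cannot_tell(n: int) -> bool:
    """Compare the run where all n messages arrive with the run where only
    the last one (message n-1) is lost."""
    sender = 0 if (n - 1) % 2 == 0 else 1  # 0 = G1, 1 = G2
    all_arrive = received_counts(n)
    last_lost = received_counts(n - 1)
    same_for_sender = all_arrive[sender] == last_lost[sender]
    differs_for_receiver = all_arrive[1 - sender] != last_lost[1 - sender]
    return same_for_sender and differs_for_receiver


for n in range(1, 6):
    print(n, last_sender_cannot_tell(n))
```

Because the last sender must act identically in both runs, its final message carries no decision-relevant information and can be dropped; repeating the argument removes every message, so no fixed-length exchange guarantees a coordinated attack.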
Two-Phase Commit (2PC) • Goal: all commit or all rollback • Prepare Phase • Initiator asks other nodes to promise to commit or rollback, even if there’s a failure • If any node cannot prepare ⇒ rollback • Commit Phase • Initiator commits and asks others to do the same
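The two phases can be sketched in memory as follows. This is a simplified illustration, not a production protocol: `Participant` and `two_phase_commit` are hypothetical names, there is no network, no timeouts, and the coordinator does not durably log its decision (a real one must).

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical in-memory resource manager taking part in 2PC."""
    name: str
    can_prepare: bool = True      # False simulates a node that cannot prepare
    state: str = "active"         # active -> prepared -> committed / rolled_back
    log: list[str] = field(default_factory=list)

    def prepare(self) -> bool:
        if not self.can_prepare:
            return False                  # vote "no"
        self.log.append("prepared")       # record the promise before voting "yes"
        self.state = "prepared"
        return True

    def commit(self) -> None:
        self.log.append("commit")
        self.state = "committed"

    def rollback(self) -> None:
        self.log.append("rollback")
        self.state = "rolled_back"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1 (prepare): every node must promise; a single "no" means rollback.
    decision = "commit"
    for p in participants:
        if not p.prepare():
            decision = "rollback"
            break
    # Phase 2: the initiator tells every node the same outcome.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.rollback()
    return decision


print(two_phase_commit([Participant("orders"), Participant("payments")]))  # commit
print(two_phase_commit([Participant("orders"), Participant("payments", can_prepare=False)]))  # rollback
```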
Two-Phase Commit (2PC) • Prepare Phase: promise to commit or rollback • Record the operation in the “REDO” log so that it can either commit or rollback regardless of failures • Place a “read lock” on the modified tables • Flag the transaction as “in-doubt” • Non-failure case • Coordinator asks each node to either commit or rollback • Locks are removed • Failure (e.g. the coordinator crashes after the prepare phase) • Transaction remains “in-doubt” and the locked resources are inaccessible until the outcome is known
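The participant's side of the prepare phase, and why a failure leaves resources inaccessible, can be sketched like this. `PreparedParticipant` is a hypothetical, single-key illustration: the "REDO log" is a plain list, and locking is reduced to refusing reads of a locked key.

```python
class PreparedParticipant:
    """Hypothetical 2PC participant illustrating the in-doubt window."""

    def __init__(self) -> None:
        self.data: dict[str, int] = {"balance": 100}
        self.redo_log: list[tuple[str, int]] = []
        self.locks: set[str] = set()
        self.state = "active"

    def prepare(self, key: str, new_value: int) -> bool:
        # Recorded so the change can be applied on commit or discarded on rollback.
        self.redo_log.append((key, new_value))
        self.locks.add(key)           # held until the outcome is known
        self.state = "in-doubt"
        return True

    def read(self, key: str) -> int:
        if key in self.locks:
            raise RuntimeError(f"{key!r} is locked by an in-doubt transaction")
        return self.data[key]

    def on_decision(self, commit: bool) -> None:
        if commit:
            for key, value in self.redo_log:
                self.data[key] = value
        self.redo_log.clear()
        self.locks.clear()            # only now are the resources released
        self.state = "committed" if commit else "rolled_back"


p = PreparedParticipant()
p.prepare("balance", 70)
# Suppose the coordinator crashes here: p stays "in-doubt" and "balance" stays locked.
try:
    p.read("balance")
except RuntimeError as err:
    print(err)
p.on_decision(commit=True)   # outcome finally learned, e.g. after the coordinator recovers
print(p.read("balance"))     # 70
```

The participant cannot release the lock on its own: having promised, it must wait for the coordinator's decision, which is exactly the blocking behaviour listed under "When it fails".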
Two-Phase Commit (2PC) • Pros • ACID • Transparent (abstraction doesn’t “leak”) • Cons • When it works • Expensive • Read Locks • When it fails • Leaves Locks • “Better” Solutions Exist • More Complex • More expensive • Consensus (Paxos) • Google’s “Chubby”
A Different Approach Your Coffee Shop Doesn’t Use Two-Phase Commit • Employees: Baristas (B) and Cashiers (C) • Process • You order from C • C writes your name on a cup and puts it in a queue • You pay C • B eventually prepares the coffee and calls your name • You pick up your coffee • Asynchronous • Pros: less locking ⇒ more efficient use of resources • Cons: a whole set of different problems …
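The coffee-shop process can be sketched with a queue in which the name on the cup acts as the correlation identifier. The names `Order`, `cashier`, `barista`, `pick_up` and the preparation times in `PREP_TIME` are hypothetical; the barista finishing quick drinks first is an assumption used to show orders completing out of queue order.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    correlation_id: str  # the name written on the cup
    drink: str


# Hypothetical preparation times: quick drinks finish first.
PREP_TIME = {"espresso": 1, "cappuccino": 2, "latte": 3}


def cashier(orders: list[tuple[str, str]]) -> deque[Order]:
    queue: deque[Order] = deque()
    for name, drink in orders:
        queue.append(Order(name, drink))  # enqueue and serve the next customer at once
    return queue


def barista(queue: deque[Order]) -> list[Order]:
    # Take everything queued and finish the quickest drinks first,
    # so completion order need not match queue order.
    pending = list(queue)
    queue.clear()
    return sorted(pending, key=lambda o: PREP_TIME[o.drink])


def pick_up(ready: list[Order], name: str) -> Order:
    # The customer matches the result by correlation id, not by position.
    return next(o for o in ready if o.correlation_id == name)


q = cashier([("Ann", "latte"), ("Bob", "espresso"), ("Cid", "cappuccino")])
ready = barista(q)
print([o.correlation_id for o in ready])  # ['Bob', 'Cid', 'Ann']
print(pick_up(ready, "Ann").drink)        # latte
```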
Asynchronous Problems • Correlation: orders might not be fulfilled in the order they were queued ⇒ correlation identifier • Exception Handling: cannot easily be “abstracted” • Write-off: coffee made but you can’t pay • Retry: coffee was not good (idempotent receivers) • Compensating action: coffee machine breaks • Optimistic “Happy Day” Approach • Pessimistic Approach: Escrow Company • Prepare: debit money • Rollback: credit money back • Commit: do nothing
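The pessimistic escrow approach maps onto three operations, sketched below with a hypothetical `Escrow` class. As an assumption beyond the slide, each operation is also made idempotent per order id, so a retried message (the "idempotent receivers" point) does not debit or credit twice while the hold is open.

```python
class Escrow:
    """Hypothetical escrow: hold the money first, settle later."""

    def __init__(self, balances: dict[str, int]) -> None:
        self.balances = balances
        self.held: dict[str, tuple[str, int]] = {}  # order id -> (customer, amount)

    def prepare(self, order_id: str, customer: str, amount: int) -> bool:
        if order_id in self.held:            # retried message: no second debit
            return True
        if self.balances[customer] < amount:
            return False
        self.balances[customer] -= amount    # prepare: debit money
        self.held[order_id] = (customer, amount)
        return True

    def rollback(self, order_id: str) -> None:
        hold = self.held.pop(order_id, None) # a repeated rollback is a no-op
        if hold is not None:
            customer, amount = hold
            self.balances[customer] += amount  # rollback: credit money back

    def commit(self, order_id: str) -> None:
        # Commit: the money was already debited, so just forget the hold.
        self.held.pop(order_id, None)


e = Escrow({"ann": 10})
e.prepare("order-1", "ann", 4)
e.prepare("order-1", "ann", 4)   # duplicate delivery of the same request
print(e.balances["ann"])         # 6
e.rollback("order-1")            # e.g. the coffee machine broke
print(e.balances["ann"])         # 10
```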
Compensating Action • Not as “simple” as in the ACID world • Some things cannot be compensated for • Shifts burden • From infrastructure (declarative) • To the client (ad-hoc solutions) • Less “economy of scale” • Monitoring, Control, ...
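How the burden shifts to the client can be seen in a small sketch: the application itself must supply an undo for every step and run the undos in reverse when a later step fails. `run_with_compensation` and the coffee-shop steps are hypothetical; a step whose compensation is `None` stands for something that cannot be compensated for.

```python
from typing import Callable, Optional

# (step name, action, compensation or None if it cannot be undone)
Step = tuple[str, Callable[[], None], Optional[Callable[[], None]]]


def run_with_compensation(steps: list[Step], log: list[str]) -> bool:
    done: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()
        except Exception as exc:
            log.append(f"{name} failed: {exc}")
            # Undo the completed steps, most recent first.
            for done_name, _, compensate in reversed(done):
                if compensate is None:
                    log.append(f"cannot compensate {done_name}")
                else:
                    compensate()
                    log.append(f"compensated {done_name}")
            return False
        done.append(step)
    return True


log: list[str] = []


def broken_machine() -> None:
    raise RuntimeError("machine broken")


ok = run_with_compensation([
    ("charge card", lambda: log.append("charged"), lambda: log.append("refunded")),
    ("print receipt", lambda: log.append("receipt printed"), None),
    ("make coffee", broken_machine, None),
], log)
print(ok)   # False
print(log)
```

Nothing here is declarative: the refund logic, the ordering, and the handling of the step that cannot be undone are all ad-hoc application code.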
Conversation Pattern [Diagram: Sync, Half-sync/Half-async and Async conversations]
CAP Theorem • Consistency • Availability • Partition tolerance • All 3 are desirable • You can have any 2, but not all 3 [Diagram: ACID vs BASE (Basically Available, Soft state, Eventually consistent); Distributed Transactions]
Eventually Consistent • Partitions are a given in larger systems, so either availability or consistency must be relaxed • Relax availability • Might refuse a write • Relax consistency • Accept a write even though it may not be reflected in subsequent reads • Strong: after an update, any read returns the updated value • Weak: subsequent reads may miss the update during an inconsistency window • Eventual: if no further updates are made, all reads “eventually” return the updated value • Inconsistency Window = f(delays, load, #replicas, …) http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
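The two ways of coping with a partition can be contrasted in a toy single-value store. `ReplicatedRegister` is hypothetical: relaxing availability means updating every replica before acknowledging and refusing writes during a partition; relaxing consistency means accepting the write on replica 0 and letting the others catch up in `sync()`, which models the inconsistency window.

```python
from typing import Optional


class ReplicatedRegister:
    """Hypothetical single-value store with n replicas."""

    def __init__(self, n_replicas: int, relax: str) -> None:
        if relax not in ("availability", "consistency"):
            raise ValueError("relax must be 'availability' or 'consistency'")
        self.relax = relax
        self.replicas: list[Optional[str]] = [None] * n_replicas
        self.partitioned = False   # True: replica 0 cannot reach the others

    def write(self, value: str) -> bool:
        if self.relax == "availability":
            if self.partitioned:
                return False       # refuse the write rather than diverge
            self.replicas = [value] * len(self.replicas)  # all replicas before ack
            return True
        self.replicas[0] = value   # accept locally; others catch up in sync()
        return True

    def read(self, replica: int) -> Optional[str]:
        return self.replicas[replica]

    def sync(self) -> None:
        if self.partitioned:
            return                 # updates cannot cross the partition yet
        for i in range(1, len(self.replicas)):
            self.replicas[i] = self.replicas[0]


ap = ReplicatedRegister(3, relax="consistency")
ap.partitioned = True
print(ap.write("v1"))   # True: accepted despite the partition
print(ap.read(2))       # None: stale read inside the inconsistency window
ap.partitioned = False
ap.sync()
print(ap.read(2))       # v1: replicas converge once the update propagates

cp = ReplicatedRegister(3, relax="availability")
cp.partitioned = True
print(cp.write("v1"))   # False: write refused during the partition
```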
Eventually Consistent • Causal consistency • A updates, tells B; B reads the updated value • Read-your-writes consistency • Special case of the previous • Session consistency • Read-your-writes is guaranteed as long as the session exists • Monotonic read consistency • Once you have seen a value, you never see a previous version • Monotonic write consistency • Writes by the same process are serialised Amazon’s Dynamo “has brought all of these properties under explicit control of the application architecture”, “allow the application service owner […] to make the trade-offs between consistency, durability, availability, and performance at a certain cost point”
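Session-level guarantees can be enforced on the client side by remembering the highest version the session has written or read and refusing staler replicas. The sketch below is a hypothetical illustration (`Versioned`, `Session`, single writer, integer versions); it gives read-your-writes and monotonic reads within one session.

```python
from dataclasses import dataclass


@dataclass
class Versioned:
    version: int
    value: str


class Session:
    """Hypothetical client session tracking a high-water-mark version."""

    def __init__(self, replicas: list[Versioned]) -> None:
        self.replicas = replicas   # shared with other sessions
        self.high_water = 0        # newest version this session has seen

    def write(self, replica: int, value: str) -> None:
        new = Versioned(self.replicas[replica].version + 1, value)
        self.replicas[replica] = new
        self.high_water = max(self.high_water, new.version)

    def read(self) -> Versioned:
        # Accept only a replica at least as new as anything already seen.
        for r in self.replicas:
            if r.version >= self.high_water:
                self.high_water = r.version
                return r
        raise RuntimeError("no replica is fresh enough yet; retry later")


replicas = [Versioned(1, "old"), Versioned(1, "old")]
s = Session(replicas)
s.write(1, "new")               # so far only replica 1 has the write
print(s.read().value)           # new: stale replica 0 is skipped
print(Session(replicas).read().value)  # old: a fresh session has no such guarantee
```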
Scalability Best Practices: eBay • Avoid Distributed Transactions • “we allow absolutely no client-side or distributed transactions of any kind - no two-phase commit.” • Decouple Functions Asynchronously • Messages and queues • Move Processing To Asynchronous Flows • Execution latency vs user latency • Scale for peak vs scale for average
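Why asynchronous flows allow scaling for average rather than peak load can be shown with a small backlog calculation. The arrival numbers and the capacity of 4 jobs per tick are hypothetical; `simulate_backlog` assumes arrivals and processing happen within the same tick. A synchronous design would need capacity 9 to absorb the peak; with a queue, capacity just above the average of 3 works, and the peak only stretches execution latency while user latency (enqueue and return) stays low.

```python
def simulate_backlog(arrivals: list[int], capacity: int) -> list[int]:
    """Queue length after each tick when `capacity` jobs are processed per tick."""
    backlog = 0
    history: list[int] = []
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - capacity)
        history.append(backlog)
    return history


arrivals = [2, 2, 9, 7, 2, 1, 1, 0]     # hypothetical load with a peak
print(sum(arrivals) / len(arrivals))    # 3.0
print(simulate_backlog(arrivals, capacity=4))  # [0, 0, 5, 8, 6, 3, 0, 0]
```

The backlog grows during the peak and drains afterwards, which is acceptable only when the work does not have to finish before the user gets a response.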
Lessons Learned • Large distributed systems often have different needs and requirements • To maximise “business value” we might need to relax some constraints • Problems are often “wicked” and the best solution depends on a lot of details and dependencies
Thank you (спасибо) Globe of Science and Innovation, CERN