Ordering of Events in Distributed Systems & Eventual Consistency • Jinyang Li
What is consistency? • Consistency model: • A constraint on the system state observable by application operations • Examples: • x86 memory: after write x=5, a later read of x should return 5 • Database: after the transaction x:=x+1; y:=y-1, assert(x+y==const) should hold
Consistency • No right or wrong consistency models • Tradeoff between ease of programmability and efficiency • Consistency is hard in (distributed) systems: • Data replication (caching) • Concurrency • Failures
Consistency challenges: example • Each node has a local copy of state • Read from local state • Send writes to the other node, but do not wait
Consistency challenges: example • CPU0: x=1 (sends W(x)1); if y==0, enter critical section • CPU1: y=1 (sends W(y)1); if x==0, enter critical section
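Does this work? • CPU0: x=1 (sends W(x)1); if y==0, enter critical section: R(y)0 • CPU1: y=1 (sends W(y)1); if x==0, enter critical section: R(x)0 • Both reads return 0, so both CPUs enter the critical section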
What went wrong? • Different CPUs see different event orders! • CPU0 sees: W(x)1 R(y)0 W(y)1 • CPU1 sees: W(y)1 R(x)0 W(x)1
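The failure above can be reproduced with a minimal sketch, assuming two nodes that apply writes locally, queue them for the peer without waiting, and read only local state. The `Node` class and its method names are hypothetical, introduced only for illustration.

```python
class Node:
    """One CPU holding a local copy of the shared variables x and y."""

    def __init__(self, name):
        self.name = name
        self.state = {"x": 0, "y": 0}
        self.inbox = []  # writes from the peer that have not been applied yet

    def write(self, peer, var, value):
        self.state[var] = value          # apply locally right away
        peer.inbox.append((var, value))  # send to the peer, do not wait

    def read(self, var):
        return self.state[var]           # reads only consult local state

    def deliver(self):
        # Apply every pending write from the peer.
        for var, value in self.inbox:
            self.state[var] = value
        self.inbox.clear()


def run():
    cpu0, cpu1 = Node("CPU0"), Node("CPU1")
    cpu0.write(cpu1, "x", 1)             # W(x)1
    cpu1.write(cpu0, "y", 1)             # W(y)1
    in_cs0 = cpu0.read("y") == 0         # R(y)0: W(y)1 has not arrived yet
    in_cs1 = cpu1.read("x") == 0         # R(x)0: W(x)1 has not arrived yet
    cpu0.deliver()
    cpu1.deliver()
    return in_cs0, in_cs1, cpu0.state, cpu1.state


in_cs0, in_cs1, state0, state1 = run()
print(in_cs0, in_cs1)  # both flags are True: mutual exclusion is violated
```

Both copies converge to x=1, y=1 once the messages are delivered, but by then both CPUs have already entered the critical section.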
Strict consistency • Each operation is stamped with a global wall-clock time • Rules: • Each read gets the latest write value • All operations at one CPU have time-stamps in execution order
Strict consistency gives “intuitive” results • No two CPUs in the critical section • Proof: suppose mutual exclusion is violated • CPU0: W(x)1 R(y)0 • CPU1: W(y)1 R(x)0 • Rule 1: read gets latest write, so since R(y) returned 0, W(y)1 must have a timestamp later than R(y)0 • Rule 2 then orders the operations as W(x)1 < R(y)0 < W(y)1 < R(x)0 • This contradicts rule 1: R(x) must see W(x)1
Sequential consistency • Strict consistency is not practical • No global wall-clock available • Sequential consistency is the closest • Rules: There is a total order of ops s.t. • All CPUs see results according to the total order (i.e. reads see most recent writes) • Each CPU’s ops appear in the total order in program order
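The two rules can be checked by brute force on small histories. The following is a sketch, not an efficient algorithm: it tries every interleaving and accepts if one of them keeps each CPU's ops in program order and lets every read return the most recent write. The function name and the history encoding are assumptions made for this example.

```python
from itertools import permutations


def is_sequentially_consistent(histories, initial=0):
    """histories: {cpu: [(kind, var, value), ...]} with kind "W" or "R"."""
    ops = [(cpu, i) for cpu, h in histories.items() for i in range(len(h))]
    for order in permutations(ops):
        next_index = {cpu: 0 for cpu in histories}
        memory = {}
        ok = True
        for cpu, i in order:
            # Rule 2: each CPU's ops must appear in program order.
            if i != next_index[cpu]:
                ok = False
                break
            next_index[cpu] += 1
            kind, var, value = histories[cpu][i]
            if kind == "W":
                memory[var] = value
            # Rule 1: a read must return the most recent write in the order.
            elif memory.get(var, initial) != value:
                ok = False
                break
        if ok:
            return True
    return False


broken = {
    "CPU0": [("W", "x", 1), ("R", "y", 0)],
    "CPU1": [("W", "y", 1), ("R", "x", 0)],
}
fixed = {
    "CPU0": [("W", "x", 1), ("R", "y", 0)],
    "CPU1": [("W", "y", 1), ("R", "x", 1)],
}
print(is_sequentially_consistent(broken), is_sequentially_consistent(fixed))
```

The history in which both reads return 0 admits no valid total order, while the history with R(x)1 and R(y)0 does (W(x)1, R(y)0, W(y)1, R(x)1).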
Lamport clock gives a total order • Each CPU keeps a logical clock • Each CPU increments its logical clock between successive events • A sender includes its clock value in the message • A receiver advances its clock to be greater than the message’s clock value • Lamport clocks define a total order • Ties are broken based on CPU ids
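A minimal sketch of these rules, assuming timestamps are (clock, CPU id) pairs so that tuple comparison breaks ties by CPU id. The class and method names are hypothetical.

```python
class LamportClock:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.time = 0

    def tick(self):
        # Advance the clock between successive local events.
        self.time += 1
        return (self.time, self.cpu_id)

    def send(self):
        # Sending is an event; the returned stamp travels with the message.
        return self.tick()

    def receive(self, msg_stamp):
        # Move past both the local clock and the message's clock value.
        msg_time, _sender = msg_stamp
        self.time = max(self.time, msg_time) + 1
        return (self.time, self.cpu_id)


c0, c1 = LamportClock(0), LamportClock(1)
s0 = c0.send()        # CPU0 sends W(x)1
s1 = c1.send()        # CPU1 sends W(y)1
r1 = c1.receive(s0)   # CPU1 receives W(x)1
r0 = c0.receive(s1)   # CPU0 receives W(y)1
print(s0, s1, r0, r1)
```

Because every stamp is a distinct (clock, CPU id) pair, sorting stamps yields a total order on events.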
Fix the example • Messages: W(x)1 and W(y)1, each answered with an ack • Reads: R(y)0 at CPU0, R(x)1 at CPU1 • CPU0 should see order W(x)1 W(y)1 • CPU1 should see order W(x)1 W(y)1
Lamport clock: an example • Notation: clock,CPU id; S = send, R = receive • CPU0: 1,0 S: W(x)1; 2,0 R: W(y)1; 3,0 S: ack; 4,0 R: ack • CPU1: 1,1 S: W(y)1; 2,1 R: W(x)1; 3,1 S: ack; 4,1 R: ack • Sorted by timestamp: 1,0 S W(x)1; 1,1 S W(y)1; 2,0 R W(y)1; 2,1 R W(x)1; 3,0 S ack; 3,1 S ack; 4,0 R ack; 4,1 R ack • Defines one possible total order: W(x)1 < W(y)1
Lamport clock: an example • CPU0: 1,0 S: W(x)1; 2,0 R: W(y)1; 3,0 S: ack; 4,0 R: ack • CPU1: 1,1 S: W(y)1; 2,1 R: W(x)1; 3,1 S: ack; 4,1 R: ack • [Figure: each CPU’s view of the ordered log as messages arrive; positions that are not yet known are shown as “?????”]
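The total order in this example can be recovered by sorting the events on their stamps. The sketch below assumes each stamp is a (clock, CPU id) pair; the `events` list simply transcribes the timestamps shown on the slide.

```python
# (stamp, S=send / R=receive, message) for both CPUs.
events = [
    ((1, 0), "S", "W(x)1"), ((2, 0), "R", "W(y)1"),
    ((3, 0), "S", "ack"), ((4, 0), "R", "ack"),
    ((1, 1), "S", "W(y)1"), ((2, 1), "R", "W(x)1"),
    ((3, 1), "S", "ack"), ((4, 1), "R", "ack"),
]

# Tuple comparison orders by clock first, then by CPU id to break ties.
total_order = sorted(events, key=lambda e: e[0])

# The order of the writes is the order of their send events.
write_order = [label for _stamp, kind, label in total_order
               if kind == "S" and label != "ack"]
print(write_order)  # ['W(x)1', 'W(y)1']
```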
Beyond Lamport clock • Typical systems obtain a total order differently • Use a single node to order all reads/writes • E.g. the lock_server in Lab1 • Partition state over multiple nodes; each node orders reads/writes for its partition • Invariant: exactly one node is in charge of ordering each piece of state • The ordering node must be online
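A sketch of partitioned ordering, assuming keys are mapped to nodes by a deterministic hash so that exactly one node assigns sequence numbers for any given key. `OrderingNode` and `PartitionedOrdering` are hypothetical names, and failure handling is deliberately left out.

```python
import zlib


class OrderingNode:
    """Assigns consecutive sequence numbers to the ops for the keys it owns."""

    def __init__(self, name):
        self.name = name
        self.next_seq = 0
        self.log = []

    def submit(self, op):
        seq = self.next_seq
        self.next_seq += 1
        self.log.append((seq, op))
        return seq


class PartitionedOrdering:
    def __init__(self, node_names):
        self.nodes = [OrderingNode(n) for n in node_names]

    def owner(self, key):
        # Deterministic hash: every client maps a key to the same single node.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def submit(self, key, op):
        node = self.owner(key)
        return node.name, node.submit((key, op))


po = PartitionedOrdering(["n0", "n1", "n2"])
first = po.submit("x", "W(x)1")
second = po.submit("x", "R(x)")
print(first, second)
```

With a single entry in `node_names`, this degenerates to the one-node case in which a single server orders everything. In either case, an op cannot be ordered while its owning node is offline.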
Weakly consistent systems • Sequential consistency • All reads/writes are applied in total order • Reads must see most recent writes • Eventual consistency (Bayou) • Writes are eventually applied in total order • Reads might not see most recent writes in total order
Why (not) eventual consistency? • Support disconnected operations • Better to read a stale value than nothing • Better to save writes somewhere than nothing • Potentially anomalous application behavior • Stale reads and conflicting writes…
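Bayou • Each node keeps a write log and a version vector • N0: write log empty; version vector 0:0 1:0 2:0 • N1: write log empty; version vector 0:0 1:0 2:0 • N2: write log empty; version vector 0:0 1:0 2:0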
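Bayou propagation • N0: write log 1:0 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:0 2:0 • N1: write log 1:1 W(x); version vector 0:0 1:1 2:0 • N2: write log empty; version vector 0:0 1:0 2:0 • Message in transit: 1:0 W(x), 2:0 W(y), 3:0 W(z) with version vector 0:3 1:0 2:0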
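Bayou propagation • N0: write log 1:0 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:0 2:0 • N1: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:4 2:0 • N2: write log empty; version vector 0:0 1:0 2:0 • Message in transit: 1:1 W(x) with version vector 0:3 1:4 2:0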
Bayou propagation • Which portion of the log is stable? • N0: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:4 1:4 2:0 • N1: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:4 2:0 • N2: write log empty; version vector 0:0 1:0 2:0
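One way to answer the stability question is sketched below. It rests on an assumption the slides do not spell out: a version-vector entry is the highest logical time heard from that node, and node clocks only move forward, so no write with a timestamp at or below the smallest entry can still arrive. The function name and the tuple layout are hypothetical.

```python
def stable_prefix(log, version_vector):
    """log: (timestamp, node_id, op) tuples sorted by (timestamp, node_id)."""
    # No node can still produce a write at or below the smallest entry.
    horizon = min(version_vector.values())
    return [w for w in log if w[0] <= horizon]


log = [(1, 0, "W(x)"), (1, 1, "W(x)"), (2, 0, "W(y)"), (3, 0, "W(z)")]

# N0 has heard nothing from N2 yet, so nothing in its log is stable.
n0_stable = stable_prefix(log, {0: 4, 1: 4, 2: 0})

# With a vector such as 0:3 1:4 2:5, every write up to timestamp 3 is stable.
all_heard = stable_prefix(log, {0: 3, 1: 4, 2: 5})
print(len(n0_stable), len(all_heard))
```

Under this reading, a single node that has not been heard from keeps the horizon at 0 and blocks the whole log from becoming stable.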
Bayou propagation • N0: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:4 1:4 2:0 • N1: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:4 2:0 • N2: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:4 2:5
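Bayou propagation • N0: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:4 1:4 2:0 • N1: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:3 1:6 2:5 • N2: write log 1:0 W(x), 1:1 W(x), 2:0 W(y), 3:0 W(z); version vector 0:4 1:4 2:5 • Message in transit: version vector 0:3 1:4 2:5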
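The propagation steps can be approximated with a simplified sketch of log exchange driven by version vectors. It is not Bayou's actual protocol: the `Replica` class is hypothetical, and unlike the slides it does not bump a replica's own version-vector entry when it receives writes. A sender ships only the writes the receiver's version vector shows it has not seen, and logs stay sorted by (timestamp, node id).

```python
class Replica:
    def __init__(self, node_id, all_ids):
        self.node_id = node_id
        self.clock = 0
        self.log = []                       # (timestamp, node_id, op)
        self.vv = {i: 0 for i in all_ids}   # highest timestamp seen per node

    def write(self, op):
        self.clock += 1
        self.log.append((self.clock, self.node_id, op))
        self.log.sort()
        self.vv[self.node_id] = self.clock

    def missing_for(self, peer_vv):
        # Writes that the peer's version vector shows it has not seen.
        return [w for w in self.log if w[0] > peer_vv[w[1]]]

    def receive(self, writes):
        for ts, node, op in writes:
            if ts > self.vv[node]:
                self.log.append((ts, node, op))
                self.vv[node] = ts
                self.clock = max(self.clock, ts)  # keep later local writes ahead
        self.log.sort()


def anti_entropy(sender, receiver):
    receiver.receive(sender.missing_for(receiver.vv))


ids = [0, 1, 2]
n0, n1, n2 = (Replica(i, ids) for i in ids)
for op in ("W(x)", "W(y)", "W(z)"):
    n0.write(op)
n1.write("W(x)")

anti_entropy(n0, n1)   # N0 -> N1
anti_entropy(n1, n0)   # N1 -> N0
anti_entropy(n1, n2)   # N1 -> N2
print(n2.log)
```

After the three exchanges, all replicas hold the same four writes in the same order, matching the logs shown on the slides.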
Bayou uses a primary to commit a total order • Why is it important to make the log stable? • Stable writes can be committed • The stable portion of the log can be truncated • Problem: If any node is offline, the stable portion of all logs stops growing • Bayou’s solution: • A designated primary defines a total commit order • Primary assigns CSNs (commit-seq-no) • Any write with a known CSN is stable • All stable writes are ordered before tentative writes
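The commit scheme can be sketched as follows, assuming each log entry is a (CSN, timestamp, node id, op) tuple and that a tentative write carries an infinite CSN so it sorts after every committed write. The `Primary` class and `order_key` helper are hypothetical names for this illustration.

```python
import math

TENTATIVE = math.inf  # CSN of a write the primary has not committed yet


def order_key(write):
    csn, timestamp, node_id, _op = write
    # Committed writes sort by CSN; tentative ones by (timestamp, node id).
    return (csn, timestamp, node_id)


class Primary:
    def __init__(self):
        self.next_csn = 1

    def commit(self, write):
        # Assign the next commit sequence number, keeping the write's identity.
        _csn, timestamp, node_id, op = write
        committed = (self.next_csn, timestamp, node_id, op)
        self.next_csn += 1
        return committed


primary = Primary()
committed = [primary.commit((TENTATIVE, t, 0, op))
             for t, op in [(1, "W(x)"), (2, "W(y)"), (3, "W(z)")]]
tentative = (TENTATIVE, 1, 1, "W(x)")

log = sorted(committed + [tentative], key=order_key)
late_commit = primary.commit(tentative)
print(log[-1], late_commit)
```

By timestamp alone, the write from node 1 would sort second, but it reaches the primary after the first three writes, so it receives CSN 4 and commits last.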
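Bayou propagation • N0: write log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z); version vector 0:3 1:0 2:0 • N1: write log ∞:1:1 W(x); version vector 0:0 1:1 2:0 • N2: write log empty; version vector 0:0 1:0 2:0 • Message in transit: ∞:1:1 W(x) with version vector 0:0 1:1 2:0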
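Bayou propagation • N0: write log 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), 4:1:1 W(x); version vector 0:4 1:1 2:0 • N1: write log ∞:1:1 W(x); version vector 0:0 1:1 2:0 • N2: write log empty; version vector 0:0 1:0 2:0 • Message in transit: 1:1:0 W(x), 2:2:0 W(y), 3:3:0 W(z), 4:1:1 W(x) with version vector 0:4 1:1 2:0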
Bayou’s limitations • Primary cannot fail • Server creation & retirement make nodeIDs grow arbitrarily long • Anomalous behaviors for apps? • Calendar app