CS 347: Distributed Databases and Transaction Processing
Data Replication
Hector Garcia-Molina
Notes08
Replication Space
• Updates
  • at any copy
  • at fixed (primary) copy
  • at one copy, but control can migrate
  • no updates

Replication Space
• Correctness
  • no consistency
  • local consistency
  • order preserving
  • serializable schedule
  • 1-copy serializability

Replication Space
• Expected Failures
  • processors: fail-stop, Byzantine?
  • network: reliable, partitions, in-order messages?
  • storage: stable disk?

Replication Space
• Implementation Details
  • update propagation
    • physical log records
    • logical log records
    • SQL updates
    • transactions
  • reads at backup?
  • architecture
    • cross backups
    • multi-computer copy
  • initialization of backup copy
Cross Backups
(diagram: site A holds the primary copy of DB1 and a backup copy of DB2; site B holds the primary copy of DB2 and a backup copy of DB1)

Multi-Computer Sites
(diagram: a primary site with processors P1, P2, P3, logs L1, L2, L3, and data items X1..X3 and Y1..Y3, mirrored by a backup site with processors B1, B2, B3 and logs L1', L2', L3')
1-Safe Backups
• Transactions commit at the primary
• Redo log records are propagated to the backup
• Transactions then commit at the backup
(diagram: primary P1 with log L1 and data X1, Y1; backup B1 with log L1')
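A minimal sketch of the 1-safe flow (class and variable names are illustrative, not from the paper): the primary commits locally first and ships redo records asynchronously, so the backup can lag behind.

```python
from collections import deque

class OneSafePrimary:
    """Commits locally, then ships redo records asynchronously (1-safe)."""
    def __init__(self):
        self.committed = []        # transactions durable at the primary
        self.ship_queue = deque()  # redo records awaiting propagation

    def commit(self, txn, redo_records):
        self.committed.append(txn)  # commit does NOT wait for the backup
        self.ship_queue.append((txn, redo_records))

class OneSafeBackup:
    def __init__(self):
        self.committed = []

    def receive(self, txn, redo_records):
        # Apply the redo records, then commit the transaction at the backup.
        self.committed.append(txn)

primary, backup = OneSafePrimary(), OneSafeBackup()
primary.commit("T1", ["w(X)"])
primary.commit("T2", ["w(Y)"])
# Ship only T1's records, then imagine the primary crashes:
txn, recs = primary.ship_queue.popleft()
backup.receive(txn, recs)
print(primary.committed)  # ['T1', 'T2']
print(backup.committed)   # ['T1'] -- T2 can be lost under 1-safe
```

The gap between the two lists is exactly the "transactions can get lost" problem on the next slide.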
1-Safe Backups
• Transactions can get lost
(diagram: the primary has committed T1, T2, T3, but only T1, T2 have reached the backup; after a failover the backup continues with T4, T5, and T3 is lost)
2-Safe Backups
• Transactions do two-phase commit between primary and backup
• Redo log records are propagated in the prepare message
• Transactions are not lost, but:
  • longer delay, more contention
  • cannot process unless both sites are up
• After a failure, fall back to 1-safe (no backup)
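A hedged sketch of the 2-safe commit path (illustrative names): redo records travel in the prepare message, and the primary commits only after the backup acknowledges, so nothing can be lost, but a down backup blocks processing.

```python
class TwoSafeBackup:
    def __init__(self, up=True):
        self.up = up
        self.prepared, self.committed = {}, []

    def prepare(self, txn, redo_records):
        if not self.up:
            return False           # cannot acknowledge while down
        self.prepared[txn] = redo_records  # redo records arrive with prepare
        return True

    def commit(self, txn):
        self.committed.append(txn)

def two_safe_commit(txn, redo_records, backup, primary_log):
    # Phase 1: ship redo records inside the prepare message.
    if not backup.prepare(txn, redo_records):
        return False               # both sites must be up to commit (2-safe)
    # Phase 2: commit at both sites.
    primary_log.append(txn)
    backup.commit(txn)
    return True

primary_log = []
backup = TwoSafeBackup()
ok1 = two_safe_commit("T1", ["w(X)"], backup, primary_log)
backup.up = False
ok2 = two_safe_commit("T2", ["w(Y)"], backup, primary_log)
print(ok1, ok2, primary_log)  # True False ['T1'] -- T2 blocked, backup down
```

Contrast with the 1-safe sketch: here a committed transaction is always at both sites, at the cost of the extra round trip and the availability restriction.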
What is Correctness?
• In 2-safe
• In 1-safe

What is in the Paper You Read?
• Specific scenario:
  • updates at a fixed primary site
  • each site has multiple computers
  • primary and backup sites are matched one-to-one
  • clean site failures; stable storage; reliable network
  • log shipping
  • no reads at the backup
  • no initialization
Main Problem: Update Dependencies
(diagram, built up over three slides: at the primary site, Ta writes at node P1 (Ta(1)) and at node P2 (Ta(2)); Tb then writes at P1, giving the data dependency Ta → Tb. At the backup site, B1 has received Ta(1) and Tb, but Ta(2) has not yet reached B2.)
• should not install Ta (its Ta(2) update is missing)
• should not install Tb (it depends on Ta)
Dependency Reconstruction Algorithm
• Locking at the backup to detect dependencies
• Ensure locks are granted in the same order as they were granted at the primary
Example: Dependency Reconstruction
(diagram, built up over three slides: tickets reflect the local commit order at each primary node. At P1, Ta(1) carries ticket 5 and Tb carries ticket 6; at P2, Ta(2) carries ticket 18; data dependency Ta → Tb.)
• Say Tb requests its lock first at B1
• Tb's request is delayed until all locks with tickets < 6 have been granted
Epoch Algorithm
• Backup updates are installed in batches (epochs)
• Epoch delimiters are written on the log

Writing Delimiters at the Primary
(diagram: the master and each slave write epoch delimiters 15, 16 into their logs over time)

Problem with Commits
(diagram: the master writes delimiter 16 between T's prepare and commit records)
T's commit record is in epoch 15 in some logs, in epoch 16 in others.
Solution: Bump Epoch
(diagram: same run as above, but the prepare ack reports the slave's epoch number, and the coordinator bumps its epoch if necessary)
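The bump rule amounts to one line (a sketch, not the paper's exact bookkeeping): the commit epoch is the maximum of the coordinator's epoch and the epochs reported in the prepare acks, so T's commit record lands in the same epoch in every log.

```python
def choose_commit_epoch(coordinator_epoch, ack_epochs):
    """Each slave's prepare ack reports its current epoch; the coordinator
    bumps its own epoch to the max, so T's commit is consistent everywhere."""
    return max([coordinator_epoch, *ack_epochs])

# One slave already wrote delimiter 16, another is still in epoch 15:
print(choose_commit_epoch(15, [16, 15]))  # 16
```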
Installing an Epoch at the Backup
(diagram: the master tells the backup to install epoch 16; each backup node processes its log up to the end-of-16 delimiter)
To Install Epoch X at Backup Node J
• Redo transactions:
  • If commit(T) ≤ X, commit T
  • If prepare(T) ≤ X but commit(T) > X:
    • If T's primary peer was the coordinator, do not commit
    • Else check with B', the backup of T's coordinator:
      • If B' commits T in epoch X, then commit T
      • Else do not commit T
  • Otherwise do not commit T (defer to the next epoch)
Here commit(T) ≤ X means that T's commit record is found in epoch X (or earlier) at node J.
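The rule above can be written as a decision function (a sketch with an invented signature; the paper's actual bookkeeping differs): epoch values of `None` stand for records not found in J's log.

```python
def should_install(txn, epoch, commit_epoch, prepare_epoch,
                   peer_was_coordinator, b_prime_commit_epochs):
    """Decide whether backup node J commits txn when installing `epoch`.

    commit_epoch / prepare_epoch: epoch of txn's commit / prepare record
    in J's log (None if the record is absent).
    b_prime_commit_epochs: epochs in which B', the backup of txn's
    coordinator, commits txn (consulted only in the in-doubt case)."""
    if commit_epoch is not None and commit_epoch <= epoch:
        return True                      # commit(T) <= X: safe to commit
    if prepare_epoch is not None and prepare_epoch <= epoch:
        if peer_was_coordinator:
            return False                 # our own primary peer decided later
        return epoch in b_prime_commit_epochs  # follow B''s decision
    return False                         # defer to the next epoch

# In-doubt case: prepare in 15, commit record only in 16, J not coordinator:
print(should_install("T", 15, 16, 15, False, {15}))  # True: B' commits in 15
print(should_install("T", 15, 16, 15, True, set()))  # False: peer coordinated
```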
Why Do We Need the Coordinator Check?
• Assignment: construct two scenarios that look the same to backup node J:
  • In Scenario 1, T should be installed
  • In Scenario 2, T should not be installed
Scenario 1
(diagram: at B', T's commit record falls in epoch 15; at the slave, P(T) is in epoch 15 and C(T) in epoch 16)

Scenario 2
(diagram: at B', T's commit record falls in epoch 16; the slave's log looks exactly as in Scenario 1)

Scenario 3: Possible?
(diagram: logs with P(T) and C(T) placed around delimiters 15, 16, 17)
Note that T commits at the slave but not at B'!!

Scenario 4: Possible?
(diagram: logs with P(T) and C(T) placed around delimiters 15, 16, 17)
Note that T commits at B' but not at the slave!!
Comparison of Options
• 2-safe
• 1-safe
  • dependency reconstruction
  • epoch
• Specific scenario:
  • updates at a fixed primary site
  • each site has multiple computers
  • primary and backup sites are matched one-to-one
  • clean site failures; stable storage; reliable network
  • log shipping
  • no reads at the backup
  • no initialization
How to Evaluate
• What system?
  • actual system(s)
  • simulation
  • testbed
• What transactions?
  • real transactions
  • synthetic transactions

Metrics
• I/O utilization
• CPU utilization
• Throughput (given a max delay?)
• Transaction commit delay
• Backup copy lag
• Network overhead
• Probability of inconsistency
Sample Results
(charts not reproduced here)
And Now For Something Completely Different:
• Updates
  • at any copy  ← next (available copies)
  • at fixed (primary) copy  ← have seen
  • at one copy, but control can migrate
  • no updates
PC-Lock Available Copies
• Transactions write-lock at all available copies
• Transactions read-lock at any available copy
• The primary site (static) manages U, the set of available copies
(diagram: copies X1, X2, X3, X4; one copy is down (*); one is the primary)

Update Transaction
(1) Get U from the primary
(2) Get write locks at the U nodes
(3) Commit at the U nodes
(diagram: transaction T3 with U={C0, C1} runs its updates and 2PC at primary C0 and backup C1; backup C2 is down)
A Potential Problem: Example
(diagram, over two slides: while T3 runs with U={C0, C1}, C2 tells the primary C0 "I am recovering". Later U={C0, C1, C2} and C0 replies "you missed T0, T1, T2", but T3's updates went only to C0 and C1, so C2 also misses T3.)
Solution:
• Initially, transaction T gets a copy U' of U from the primary (or uses a cached value)
• At commit of T, check U' against the current U at the primary (if different, abort T)
Solution Continued
• When Cx recovers:
  • request missed and pending transactions from the primary (the primary updates U)
  • set write locks for the pending transactions
• The primary polls nodes to detect failures (and updates U)
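A toy sketch of the commit-time U'-versus-U check (class and method names invented for illustration):

```python
class PrimarySite:
    """Tracks U, the set of available copies, for PC-lock available copies."""
    def __init__(self, copies):
        self.U = frozenset(copies)

    def recover(self, copy):
        # A recovering node requests missed transactions; primary updates U.
        self.U = self.U | {copy}

    def validate(self, u_seen):
        # Commit-time check: a transaction that read an old U must abort.
        return u_seen == self.U

primary = PrimarySite({"C0", "C1"})
u_prime = primary.U               # T3 starts with U' = {C0, C1}
primary.recover("C2")             # C2 recovers while T3 is still running
print(primary.validate(u_prime))  # False: T3 must abort (C2 missed T3's writes)
```

Aborting T3 here is what prevents the missed-update problem from the previous example: T3 retries against the new U, which now includes C2.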
Example Revisited
(diagram: C2 tells the primary C0 "I am recovering" and is told "you missed T0, T1, T2"; U changes from {C0, C1} to {C0, C1, C2}. When T3, which ran with U'={C0, C1}, sends its prepare messages, the commit is rejected because U has changed.)
Available Copies — No Primary
• Let all nodes have a copy of U (not just the primary)
• To modify U, run a special atomic transaction at all available sites (use a commit protocol)
• E.g.: U1={C1, C2} → U2={C1, C2, C3}: only C1, C2 participate in this transaction
• E.g.: U2={C1, C2, C3} → U3={C1, C2}: only C1, C2 participate in this transaction
Details are tricky...
• What if the commit of a U-change blocks?
Node Recovery (No Primary)
• Get missed updates from any active node
• No unique sequence of transactions
• If all nodes fail, wait for:
  • all to recover
  • a majority to recover
Example
(diagram: a recovering node has Committed: A, B; one active node has Committed: A, B, C, D, E, F and Pending: G; another has Committed: A, C, B, E, D and Pending: F, G, H)
How much information (update values) must be remembered? By whom?
Correctness with Replicated Data
S1: r1[X1] r2[X2] w1[X1] w2[X2]
Is this schedule serializable?
(X1 and X2 are copies of the same item X)
One-Copy Serializable (1SR)
A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database.
To Check 1SR
• Take the schedule S
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X] (Xj is a copy of X)
• Compute the precedence graph P(S)
• If P(S) is acyclic, S is 1SR
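The check above can be sketched directly (a toy version assuming single-letter item names, so `"X1"[0]` recovers the logical item X):

```python
def precedence_edges(schedule):
    """schedule: list of (txn, op, copy), e.g. ("T1", "r", "X1").
    Copies X1, X2 of one item are folded into the logical item X."""
    edges = set()
    for i, (t1, op1, c1) in enumerate(schedule):
        for t2, op2, c2 in schedule[i + 1:]:
            # Conflict: different txns, same logical item, at least one write.
            if t1 != t2 and c1[0] == c2[0] and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def acyclic(edges):
    """Kahn's algorithm: per the slide, S is 1SR if P(S) has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, b in edges:
        indeg[b] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for a, b in edges:
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

# S1 from the earlier slide: r1[X1] r2[X2] w1[X1] w2[X2]
s1 = [("T1", "r", "X1"), ("T2", "r", "X2"),
      ("T1", "w", "X1"), ("T2", "w", "X2")]
e = precedence_edges(s1)
print(acyclic(e))  # False: both T1 -> T2 and T2 -> T1, so S1 is not 1SR
```

S1 is serializable over the copies taken as distinct items, but once the copies are folded into one logical X the graph has a cycle, which is why it fails the 1SR test.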