CS 347: Distributed Databases and Transaction Processing
Data Replication
Hector Garcia-Molina
Notes08
Replication Space
• Updates
  • at any copy
  • at fixed (primary) copy
  • at one copy, but control can migrate
  • no updates

Replication Space
• Correctness
  • no consistency
  • local consistency
  • order preserving
  • serializable schedule
  • 1-copy serializability

Replication Space
• Expected Failures
  • processors: fail-stop, Byzantine?
  • network: reliable, partitions, in-order messages?
  • storage: stable disk?

Replication Space
• Implementation Details
  • update propagation
    • physical log records
    • logical log records
    • SQL updates
    • transactions
  • reads at backup?
  • architecture
    • cross backups
    • multi-computer copy
  • initialization of backup copy
Cross Backups
(diagram: site A holds the primary copy of DB1 and a backup copy of DB2; site B holds the primary copy of DB2 and a backup copy of DB1)

Multi-Computer Sites
(diagram: a primary site with processors P1, P2, P3, logs L1, L2, L3, and data items X1..X3 and Y1..Y3, mirrored by a backup site with processors B1, B2, B3 and logs L1', L2', L3')
1-Safe Backups
• Transactions commit at the primary
• Redo log records are propagated to the backup
• Transactions then commit at the backup
(diagram: primary P1 with log L1 and data X1, Y1; backup B1 with log L1')
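A minimal sketch of the 1-safe flow (class and variable names are illustrative, not from the paper): the primary commits locally first and ships redo records asynchronously, so the backup can lag behind.

```python
from collections import deque

class OneSafePrimary:
    """Commits locally, then ships redo records asynchronously (1-safe)."""
    def __init__(self):
        self.committed = []        # transactions durable at the primary
        self.ship_queue = deque()  # redo records awaiting propagation

    def commit(self, txn, redo_records):
        self.committed.append(txn)  # commit does NOT wait for the backup
        self.ship_queue.append((txn, redo_records))

class OneSafeBackup:
    def __init__(self):
        self.committed = []

    def receive(self, txn, redo_records):
        # Apply the redo records, then commit the transaction at the backup.
        self.committed.append(txn)

primary, backup = OneSafePrimary(), OneSafeBackup()
primary.commit("T1", ["w(X)"])
primary.commit("T2", ["w(Y)"])
# Ship only T1's records, then imagine the primary crashes:
txn, recs = primary.ship_queue.popleft()
backup.receive(txn, recs)
print(primary.committed)  # ['T1', 'T2']
print(backup.committed)   # ['T1'] -- T2 can be lost under 1-safe
```

The gap between the two lists is exactly the "transactions can get lost" problem on the next slide.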
1-Safe Backups
• Transactions can get lost
(diagram: the primary has committed T1, T2, T3, but only T1, T2 have reached the backup; after a failover the backup continues with T4, T5, and T3 is lost)
2-Safe Backups
• Transactions do two-phase commit between primary and backup
• Redo log records are propagated in the prepare message
• Transactions are not lost, but:
  • longer delay, more contention
  • cannot process unless both sites are up
• After a failure, fall back to 1-safe (no backup)
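A hedged sketch of the 2-safe commit path (illustrative names): redo records travel in the prepare message, and the primary commits only after the backup acknowledges, so nothing can be lost, but a down backup blocks processing.

```python
class TwoSafeBackup:
    def __init__(self, up=True):
        self.up = up
        self.prepared, self.committed = {}, []

    def prepare(self, txn, redo_records):
        if not self.up:
            return False           # cannot acknowledge while down
        self.prepared[txn] = redo_records  # redo records arrive with prepare
        return True

    def commit(self, txn):
        self.committed.append(txn)

def two_safe_commit(txn, redo_records, backup, primary_log):
    # Phase 1: ship redo records inside the prepare message.
    if not backup.prepare(txn, redo_records):
        return False               # both sites must be up to commit (2-safe)
    # Phase 2: commit at both sites.
    primary_log.append(txn)
    backup.commit(txn)
    return True

primary_log = []
backup = TwoSafeBackup()
ok1 = two_safe_commit("T1", ["w(X)"], backup, primary_log)
backup.up = False
ok2 = two_safe_commit("T2", ["w(Y)"], backup, primary_log)
print(ok1, ok2, primary_log)  # True False ['T1'] -- T2 blocked, backup down
```

Contrast with the 1-safe sketch: here a committed transaction is always at both sites, at the cost of the extra round trip and the availability restriction.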
What is Correctness?
• In 2-safe
• In 1-safe

What is in the Paper You Read?
• Specific scenario:
  • updates at a fixed primary site
  • each site has multiple computers
  • primary and backup sites are matched one-to-one
  • clean site failures; stable storage; reliable network
  • log shipping
  • no reads at the backup
  • no initialization
Main Problem: Update Dependencies
(diagram, built up over three slides: at the primary site, Ta writes at node P1 (Ta(1)) and at node P2 (Ta(2)); Tb then writes at P1, giving the data dependency Ta → Tb. At the backup site, B1 has received Ta(1) and Tb, but Ta(2) has not yet reached B2.)
• should not install Ta (its Ta(2) update is missing)
• should not install Tb (it depends on Ta)
Dependency Reconstruction Algorithm
• Locking at the backup to detect dependencies
• Ensure locks are granted in the same order as they were granted at the primary
Example: Dependency Reconstruction
(diagram, built up over three slides: tickets reflect the local commit order at each primary node. At P1, Ta(1) carries ticket 5 and Tb carries ticket 6; at P2, Ta(2) carries ticket 18; data dependency Ta → Tb.)
• Say Tb requests its lock first at B1
• Tb's request is delayed until all locks with tickets < 6 have been granted
Epoch Algorithm
• Backup updates are installed in batches (epochs)
• Epoch delimiters are written on the log

Writing Delimiters at the Primary
(diagram: the master and each slave write epoch delimiters 15, 16 into their logs over time)

Problem with Commits
(diagram: the master writes delimiter 16 between T's prepare and commit records)
T's commit record is in epoch 15 in some logs, in epoch 16 in others.
Solution: Bump Epoch
(diagram: same run as above, but the prepare ack reports the slave's epoch number, and the coordinator bumps its epoch if necessary)
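The bump rule amounts to one line (a sketch, not the paper's exact bookkeeping): the commit epoch is the maximum of the coordinator's epoch and the epochs reported in the prepare acks, so T's commit record lands in the same epoch in every log.

```python
def choose_commit_epoch(coordinator_epoch, ack_epochs):
    """Each slave's prepare ack reports its current epoch; the coordinator
    bumps its own epoch to the max, so T's commit is consistent everywhere."""
    return max([coordinator_epoch, *ack_epochs])

# One slave already wrote delimiter 16, another is still in epoch 15:
print(choose_commit_epoch(15, [16, 15]))  # 16
```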
Installing an Epoch at the Backup
(diagram: the master tells the backup to install epoch 16; each backup node processes its log up to the end-of-16 delimiter)
To Install Epoch X at Backup Node J
• Redo transactions:
  • If commit(T) ≤ X, commit T
  • If prepare(T) ≤ X but commit(T) > X:
    • If T's primary peer was the coordinator, do not commit
    • Else check with B', the backup of T's coordinator:
      • If B' commits T in epoch X, then commit T
      • Else do not commit T
  • Otherwise do not commit T (defer to the next epoch)
Here commit(T) ≤ X means that T's commit record is found in epoch X (or earlier) at node J.
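The rule above can be written as a decision function (a sketch with an invented signature; the paper's actual bookkeeping differs): epoch values of `None` stand for records not found in J's log.

```python
def should_install(txn, epoch, commit_epoch, prepare_epoch,
                   peer_was_coordinator, b_prime_commit_epochs):
    """Decide whether backup node J commits txn when installing `epoch`.

    commit_epoch / prepare_epoch: epoch of txn's commit / prepare record
    in J's log (None if the record is absent).
    b_prime_commit_epochs: epochs in which B', the backup of txn's
    coordinator, commits txn (consulted only in the in-doubt case)."""
    if commit_epoch is not None and commit_epoch <= epoch:
        return True                      # commit(T) <= X: safe to commit
    if prepare_epoch is not None and prepare_epoch <= epoch:
        if peer_was_coordinator:
            return False                 # our own primary peer decided later
        return epoch in b_prime_commit_epochs  # follow B''s decision
    return False                         # defer to the next epoch

# In-doubt case: prepare in 15, commit record only in 16, J not coordinator:
print(should_install("T", 15, 16, 15, False, {15}))  # True: B' commits in 15
print(should_install("T", 15, 16, 15, True, set()))  # False: peer coordinated
```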
Why Do We Need the Coordinator Check?
• Assignment: construct two scenarios that look the same to backup node J:
  • In Scenario 1, T should be installed
  • In Scenario 2, T should not be installed
Scenario 1
(diagram: at B', T's commit record falls in epoch 15; at the slave, P(T) is in epoch 15 and C(T) in epoch 16)

Scenario 2
(diagram: at B', T's commit record falls in epoch 16; the slave's log looks exactly as in Scenario 1)

Scenario 3: Possible?
(diagram: logs with P(T) and C(T) placed around delimiters 15, 16, 17)
Note that T commits at the slave but not at B'!!

Scenario 4: Possible?
(diagram: logs with P(T) and C(T) placed around delimiters 15, 16, 17)
Note that T commits at B' but not at the slave!!
Comparison of Options
• 2-safe
• 1-safe
  • dependency reconstruction
  • epoch
• Specific scenario:
  • updates at a fixed primary site
  • each site has multiple computers
  • primary and backup sites are matched one-to-one
  • clean site failures; stable storage; reliable network
  • log shipping
  • no reads at the backup
  • no initialization
How to Evaluate
• What system?
  • actual system(s)
  • simulation
  • testbed
• What transactions?
  • real transactions
  • synthetic transactions

Metrics
• I/O utilization
• CPU utilization
• Throughput (given a max delay?)
• Transaction commit delay
• Backup copy lag
• Network overhead
• Probability of inconsistency
Sample Results
(charts not reproduced here)
And Now For Something Completely Different:
• Updates
  • at any copy  ← next (available copies)
  • at fixed (primary) copy  ← have seen
  • at one copy, but control can migrate
  • no updates
PC-Lock Available Copies
• Transactions write-lock at all available copies
• Transactions read-lock at any available copy
• The primary site (static) manages U, the set of available copies
(diagram: copies X1, X2, X3, X4; one copy is down (*); one is the primary)

Update Transaction
(1) Get U from the primary
(2) Get write locks at the U nodes
(3) Commit at the U nodes
(diagram: transaction T3 with U={C0, C1} runs its updates and 2PC at primary C0 and backup C1; backup C2 is down)
A Potential Problem: Example
(diagram, over two slides: while T3 runs with U={C0, C1}, C2 tells the primary C0 "I am recovering". Later U={C0, C1, C2} and C0 replies "you missed T0, T1, T2", but T3's updates went only to C0 and C1, so C2 also misses T3.)
Solution:
• Initially, transaction T gets a copy U' of U from the primary (or uses a cached value)
• At commit of T, check U' against the current U at the primary (if different, abort T)
Solution Continued
• When Cx recovers:
  • request missed and pending transactions from the primary (the primary updates U)
  • set write locks for the pending transactions
• The primary polls nodes to detect failures (and updates U)
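A toy sketch of the commit-time U'-versus-U check (class and method names invented for illustration):

```python
class PrimarySite:
    """Tracks U, the set of available copies, for PC-lock available copies."""
    def __init__(self, copies):
        self.U = frozenset(copies)

    def recover(self, copy):
        # A recovering node requests missed transactions; primary updates U.
        self.U = self.U | {copy}

    def validate(self, u_seen):
        # Commit-time check: a transaction that read an old U must abort.
        return u_seen == self.U

primary = PrimarySite({"C0", "C1"})
u_prime = primary.U               # T3 starts with U' = {C0, C1}
primary.recover("C2")             # C2 recovers while T3 is still running
print(primary.validate(u_prime))  # False: T3 must abort (C2 missed T3's writes)
```

Aborting T3 here is what prevents the missed-update problem from the previous example: T3 retries against the new U, which now includes C2.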
Example Revisited
(diagram: C2 tells the primary C0 "I am recovering" and is told "you missed T0, T1, T2"; U changes from {C0, C1} to {C0, C1, C2}. When T3, which ran with U'={C0, C1}, sends its prepare messages, the commit is rejected because U has changed.)
Available Copies — No Primary
• Let all nodes have a copy of U (not just the primary)
• To modify U, run a special atomic transaction at all available sites (use a commit protocol)
• E.g.: U1={C1, C2} → U2={C1, C2, C3}: only C1, C2 participate in this transaction
• E.g.: U2={C1, C2, C3} → U3={C1, C2}: only C1, C2 participate in this transaction
Details are tricky...
• What if the commit of a U-change blocks?
Node Recovery (No Primary)
• Get missed updates from any active node
• No unique sequence of transactions
• If all nodes fail, wait for:
  • all to recover
  • a majority to recover
Example
(diagram: a recovering node has Committed: A, B; one active node has Committed: A, B, C, D, E, F and Pending: G; another has Committed: A, C, B, E, D and Pending: F, G, H)
How much information (update values) must be remembered? By whom?
Correctness with Replicated Data
S1: r1[X1] r2[X2] w1[X1] w2[X2]
Is this schedule serializable?
(X1 and X2 are copies of the same item X)
One-Copy Serializable (1SR)
A schedule S on replicated data is 1SR if it is equivalent to a serial history of the same transactions on a one-copy database.
To Check 1SR
• Take the schedule S
• Treat ri[Xj] as ri[X] and wi[Xj] as wi[X] (Xj is a copy of X)
• Compute the precedence graph P(S)
• If P(S) is acyclic, S is 1SR
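The check above can be sketched directly (a toy version assuming single-letter item names, so `"X1"[0]` recovers the logical item X):

```python
def precedence_edges(schedule):
    """schedule: list of (txn, op, copy), e.g. ("T1", "r", "X1").
    Copies X1, X2 of one item are folded into the logical item X."""
    edges = set()
    for i, (t1, op1, c1) in enumerate(schedule):
        for t2, op2, c2 in schedule[i + 1:]:
            # Conflict: different txns, same logical item, at least one write.
            if t1 != t2 and c1[0] == c2[0] and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges

def acyclic(edges):
    """Kahn's algorithm: per the slide, S is 1SR if P(S) has no cycle."""
    nodes = {n for e in edges for n in e}
    indeg = {n: 0 for n in nodes}
    for _, b in edges:
        indeg[b] += 1
    frontier = [n for n in nodes if indeg[n] == 0]
    seen = 0
    while frontier:
        n = frontier.pop()
        seen += 1
        for a, b in edges:
            if a == n:
                indeg[b] -= 1
                if indeg[b] == 0:
                    frontier.append(b)
    return seen == len(nodes)

# S1 from the earlier slide: r1[X1] r2[X2] w1[X1] w2[X2]
s1 = [("T1", "r", "X1"), ("T2", "r", "X2"),
      ("T1", "w", "X1"), ("T2", "w", "X2")]
e = precedence_edges(s1)
print(acyclic(e))  # False: both T1 -> T2 and T2 -> T1, so S1 is not 1SR
```

S1 is serializable over the copies taken as distinct items, but once the copies are folded into one logical X the graph has a cycle, which is why it fails the 1SR test.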