1.32k likes | 1.36k Views
Understand the impact of system failures and concurrent queries on database consistency. Learn about correct transaction processing in parallel to maintain database integrity. Explore different approaches in concurrency control.
E N D
CONCURRENCYCONTROL Spring 2015
Warning This is a first draft I welcome your corrections
One common objective Maintaining database in a consistent state Means here maintaining the integrity of the data After a money transfer between two accounts, the amount debited from the fist account should be equal to the amount credited to the second account Assuming no-fee transfer
Two different problems Handling outcomes of system failures: Server crashes, power failures, … Previous chapter Preventing inconsistencies resulting from concurrent queries/updates that interfere with each other This chapter
The problem Assume that we want to process in parallel A transaction T1crediting a checking account by $2,000 A transaction T2 debiting the same account by $100
A correct schedule T1 reads the current balance ($500) T1 computes the new balance ($2,500) T1 writes the new balance ($2,500) to disk T2 reads the current balance ($2,500) T2 computes the new balance ($2,400) T2 writes the new balance ($2,400) to disk Serial execution of the two transactions
Another correct schedule T2 reads the current balance ($500) T2 computes the new balance ($400) T2 writes the new balance ($400) to disk T1 reads the current balance ($400) T1 computes the new balance ($2,400) T1 writes the new balance ($2,400) to disk Another serial execution
An incorrect schedule T1 reads the current balance ($500) T2 reads the current balance ($500) T1 computes the new balance ($2,500) T1 writes the new balance ($2,500) to disk T2 computes the new balance ($400) T2 writes the new balance ($400) to disk That's not right!
Another problem Assume that we want to process in parallel A transaction T1crediting a checking account X by $2,000 A transaction T2 debiting a checking account Y by $100
A correct schedule T1 reads the current balance of X ($500) T1 computes the new balance of X ($2,500) T1 writes the new balance of X ($2,500) to disk T2 reads the current balance of Y ($800) T2 computes the new balance of Y ($700) T2 writes the new balance of Y ($700) to disk Serial execution of the two transactions
Another correct schedule T1 reads the current balance of X ($500) T2 reads the current balance of Y ($800) T1 computes the new balance of X ($2,500) T1 writes the new balance of X ($2,500) to disk T2 computes the new balance of Y ($700) T2 writes the new balance of Y ($700) to disk A correct concurrent schedule
Two different approaches (I) Most OS ensure the correctness of conflicting operations by forcing them to execute in mutual exclusion Through locks, semaphores, monitors Avoid conflicts but forces operations to execute in some serial order Acceptable for short critical sections Not acceptable for DB Would slow down update rate
Two different approaches (II) DBMS will let potentially conflicting transactions execute concurrently controlt heir actual schedule and guarantee that their outcome will be identical to some serial execution of these transactions Serializability criterion
Notations As far as concurrency control is concerned, a transaction can be represented by the sequence of its read and write operations: r1(A); w1(A)means transaction T1 reads A then updates A A transaction schedule S represents the sequence of the read/write operations of multiple transactions as they are executed r1(A); r2(A); w1(A); w2(B)
Serial schedules A schedule is said to be serial if all its transactions execute One at a time Without any interleaving All serial schedules are assumed to be correct Different orders of execution may produce different outcomes
Example Transaction T1 reads the balance of an account Transaction T2 withdraws $200 Two serial schedules S = r1(A); r2(A); w2(A) S' = r2(A); w2(A); r1(A) are both correct even though T1 will return different values
Serializable schedules An interleaved schedule is said to be serializable if its outcome is identical to some serial schedule for the same transactions All serializable schedules are assumed to be correct
Example Transaction T1 reads the balance of an account Transaction T2 withdraws $200 S = r1(A); r2(A); w2(A) is a serial schedule S' = r2(A); r1(A); w2(A) is interleaved but equivalent to S S' is serializable
Defining equivalence We cannot depend on either The original values of the DB elements being modified Their new values We will use conflict-based serializability:
Conflicting operations Two operations are said to be conflicting if They belong to two different transactions T1 and T2 They access the same DB element A At least one of them is a write
Examples The following pairs of operations conflict: r1(A); w2(A) w1(A); r2(A) w1(A); w2(A) The following pairs do not: r1(A); r2(A) w1(A); r2(B) w1(A); w2(B)
Important property Changing the order of two conflicting operations is likely to affect the correctness of the schedule r1(A); r2(A); w1(A); w2(A) is not equivalent to r1(A); w1(A) r2(A); w2(A) Changing the order of two non-conflicting operations will never affect the correctness of the schedule r2(A); r1(A); w2(A) is equivalent tor1(A); r2(A); w2(A)
Our serializability criterion Two schedules S1 and S2 are said to beconflict equivalent if S1 can be transformed into S2 by a series of swaps on non-conflicting actions. G.-M. A schedule S is said to be conflict serializable if it is conflict-equivalent to some serial schedule. G.-M.
Example r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B)can be successively transformed intor1(B); r2(A); w2(A); w1(B); r3(A); r2(B); w3(A);w2(B)r1(B); r2(A); w1(B); w2(A);r2(B); r3(A);w2(B); w3(A)r1(B); w1(B); r2(A); w2(A); r2(B); w2(B); r3(A); w3(A)which corresponds to a serial execution of the transactions T1, T2 and T3
Another example The schedule r1(A) r2(A) w1(A) w2(A)cannot be transformed into either r1(A) w1(A) r1(A) w1(A)or r1(A) w1(A) r2(A) w2(A)and is not conflict serializable
Definition For a given schedule S, a transaction T1 is said to be aheadof a transaction T2 if there is an action A1 in T1 and an action A2 such that A1 appears ahead of A2 in the schedule, Both A1 and A2 access the same element X of the DB At least one of them is a write We write T1 <S T2
Precedence graph The precedence graph for a schedule S contains: A node for each transaction Ti in S An arc from node Tito node Tkif Ti <S Tk
Example In the schedule r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B)we have T1 <S T2becausew1(B); …; r2(B); T2 <S T3becausew2(A); …; r3(A); 1 2 3
Another example In the schedule r1(A) r2(A) w1(A) w2(A)we have T2 <S T1 T1 <S T2 1 2
Theorem The necessary and sufficient condition for a schedule to be conflict-serializable is that its precedence graph has no cycles
Necessary condition If schedule S contains a cycle, such asT1 T2 T3 … Tn T1, any equivalent serial schedule must have All actions of T1 preceding all actions of T2, All actions of T2 preceding all actions of T3 … All actions of Tn preceding all actions of T1, which is impossible
Sufficient condition Proof by induction Basis: Obvious if the schedule contains a single transaction
Sufficient condition Induction step: Assume it is true for any schedule with n - 1 transactions Let S be a schedule with n transactions Since the precedence graph of S has no cycles, at least one of its nodes has no incoming arcs Take all actions of transaction associated with that node and start new schedule with them What remains of S has no cycles andn - 1transactions
Pessimistic scheduling (I) Uses locks Incomingtransactions Locktable Transactionscheduler Serializabletransactionschedule
Pessimistic scheduling (II) Scheduler will guarantee the serializability of the schedules it issues by enforcing sufficient conditions These conditions will be enforced through locking Outcome will be schedules that satisfy conflict serializability
Locking Two operations li(A): transaction Ti requests and gets an exclusive lock on element A ui(A): transaction Ti releases its exclusive lock on element A
First rule:Well-formed transactions A transaction T can only access an element A of a DB if it has obtained a lock for A and not released it All transactions will eventually release all their locks Ti: … li(A) … ri(A) …wi(A) … ui(A) ...
Second rule:Legal scheduler Scheduler will not grant a lock to a transaction Tk for a DB entity Aif another transaction Ti holds a lock on A Ti: … li(A) … ri(A) …wi(A) … ui(A) ... lk(A) must wait x
Locking is not enough First transaction:l1(A)r1(A)u1(A)………l1(A)w1(A)u1(A) Second transaction:l2(A)r2(A)u2(A)…l2(A)w2(A)u2(A) T1 and T2read same old value of A then enter a race condition when theyupdate it
Two-phase locking Requires all transactions to acquire all their locks before they release any Go through two phases Acquiring their locks Releasing them Locks ownedby transaction Time
Claim: 2PL producesserializable schedules Geometric argument: Locks ownedby T2 Locks ownedby T1 Time
More formally Any legal schedule that only contains consistent two-phase locked transactions is serializable To prove it we will show we can convertS = T1, T2, ..., Tn into an equivalent serial schedule S'
Proof (I) Basis: True for n = 1 Induction step: Let Ti be transaction with first unlock action, say ui(X), in entire schedule One way to start constructing S' would be to move all actions of Ti ahead of the actions of all other transactions
Proof Consider an arbitrary action of Ti, say wi(Y) The sole condition that would prevent us from moving wi(Y) to the beginning of S' would the existence of a conflicting operation wk(Y) in some transaction Tk This would assume the existence in the schedule of a sequence of action containinglk(Y) ...wk(Y) ... Uk(Y)... li(Y) ...wi(Y) ...ui(Y) ...which is contrary to our assumption.
2PL is enough First transaction:l1(A)r1(A)…w1(A)u1(A) Second transaction:l2(A)r2(A)…w2(A)u2(A) T2 must nowwait Schedule is serial and thus serializable
Shared and exclusive locks Shared locks: sli(A) allows transaction Tito access element A in read-only mode Can be shared with other transactions requiring the same mode of access Exclusive locks: xli(A) grants to transaction Ti the exclusiverightto access element A Required for write access
First rule:Well-formed transactions A transaction T can only Perform a read action ri(A) on an element A of a DB if it has obtained a shared or an exclusive lock for A and not released it Perform a write action wi(A) on an element A of a DB if it has obtained an exclusive lock for A and not released it All transactions will eventually release all their locks
Second rule:Legal scheduler Scheduler will not grant: A shared lock to a transaction Tk for a DB entity Aif another transaction Ti holds a exclusive lock on A An exclusive lock to a transaction Tk for a DB entity Aif another transaction Ti holds a a lock on A (either shared or exclusive)
Third rule:Two-phase locking Requires all transactions to acquire all their locks before they release any The option of downgrading a lock and gofrom an exclusive lock xli(A)to ashared lock sli(A) is not offered