720 likes | 879 Views
CSL 771: Database Implementation Transaction Processing. Maya Ramanath All material (including figures) from: Concurrency Control and Recovery in Database Systems Phil Bernstein, Vassos Hadzilacos and Nathan Goodman (http :// research.microsoft.com /en-us/people/ philbe / ccontrol.aspx ).
E N D
CSL 771: Database ImplementationTransaction Processing Maya Ramanath All material (including figures) from: Concurrency Control and Recovery in Database Systems Phil Bernstein, VassosHadzilacos and Nathan Goodman (http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx)
Transactions • Interaction with the DBMS through SQL updateAirlinessetprice = price - price*0.1, status = “cheap” whereprice< 5000 A transaction is a unit of interaction
ACID Properties • Atomicity • Consistency • Isolation • Durability Database system must ensure ACID properties
Atomicity and Consistency • Single transaction • Execution of a transaction: “all-or-nothing” Either a transaction completes in its entirety Or it “does not even start” • As if the transaction never existed • No partial effect must be visible 2 outcomes: A transaction COMMITs or ABORTs
Consistency and Isolation • Multiple transactions • Concurrent execution can cause an inconsistent database state • Each transaction executed as if isolated from the others
Durability • If a transaction commits the effects are permanent • But, durability has a bigger scope • Catastrophic failures (floods, fires, earthquakes)
What we will study… • Concurrency Control • Ensuring atomicity, consistency and isolation when multiple transactions are executed concurrently • Recovery • Ensuring durability and consistency in case of software/hardware failures
Terminology • Data item • A tuple, table, block • Read (x) • Write (x, 5) • Start (T) • Commit (T) • Abort (T) • Active Transaction • A transaction which has neither committed nor aborted
High level model Transaction 2 Transaction 1 Transaction n Transaction Manager Scheduler Disk Recovery Manager Cache Manager
Recoverability (1/2) • Transaction T Aborts • T wrote some data items • T’ read items that T wrote • DBMS has to… • Undo the effects of T • Undo effects of T’ • But, T’ has already committed
Recoverability (2/2) • Let T1,…,Tn be a set of transactions • Ti reads a value written by Tk, k < i • An execution of transactions is recoverable if Ti commits after all Tk commit
Cascading Aborts (1/2) • Because T was aborted, T1,…, Tk also have to be aborted
Cascading Aborts (2/2) • Recoverable executions do not prevent cascading aborts • How can we prevent them then ?
What we learnt so far… Reading a value, committing a transaction Recoverable with cascading aborts Recoverable without cascading aborts Not recoverable
Strict Schedule (1/2) • “Undo”-ing the effects of a transaction • Restore the before image of the data item Equivalent to Final value of y: 3
Strict Schedule (2/2) Initial value of x: 1 Should x be restored to 1 or 3? T1 restores x to 3? T2 restores x to 2? Do not read or write a value which has been written by an active transaction until that transaction has committed or aborted
The Lost Update Problem Assume x is your account balance
Serializable Schedules • Serial schedule • Simply execute transactions one after the other • A serializableschedule is one which equivalent to some serial schedule
Serializable Schedules T1: op11, op12, op13 T2: op21, op22, op23, op24 • Serial schedule • Simply execute transactions one after the other op11, op12, op13 op21, op22, op23, op24 op11, op12, op13 op21, op22, op23, op24 • Serializable schedule • Interleave operations • Ensure end result is equivalent to some serial schedule
Notation r1[x] = Transaction 1, Read (x) w1[x] = Transaction 1, Write (x) c1 = Transaction 1, Commit a1= Transaction 1, Abort r1[x], r1[y], w2[x], r2[y], c1, c2
Histories (1/3) • Operations of transaction T can be represented by a partial order. r1[x] w1[z] c1 r1[y]
Histories (2/3) • Conflicting operations • Of two ops operating on the same data item, if one of them is a write, then the ops conflict • An order has to be specified for conflicting operations
Histories (3/3) • Complete History
Serializable Histories • The goal: Ensure that the interleaving operations guarantee a serializable history. • The method • When are two histories equivalent? • When is a history serial?
Equivalence of Histories (1/2) H ≅H’ if • they are defined over the same set of transactions and they have the same operations • they order conflicting operations the same way
Equivalence of Histories (2/2) y Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Serial History • A complete history is serial if for every pair of transactions Ti and Tk, • all operations of Ti occur before Tk OR • all operations of Tk occur before Ti • A history is serializableif its committed projection is equivalent to a serial history.
Serialization Graph T1 T3 T2
Serializability Theorem A history H is serializable if its serialization graph SG(H) is acyclic On your own How do recoverability, strict schedules, cascading aborts fit into the big picture?
High level model Transaction 2 Transaction 1 Transaction n Transaction Manager Scheduler Disk Recovery Manager Cache Manager
Transaction Management Transaction 1 Transaction 2 Transaction 3 . . . Transaction n • Transaction Manager • Receives Transactions • Sends operations to scheduler Read1(x) Write2(y,k) Read2(x) Commit1 Disk • Scheduler • Execute op • Reject op • Delay op
Locking • Each data item x has a lock associated with it • If T wants to access x • Scheduler first acquires a lock on x • Only one transaction can hold a lock on x • T releases the lock after processing Locking is used by the scheduler to ensure serializability
Notation • Read lock and write lock rl[x], wl[x] • Obtaining read and write locks rli[x], wli[x] • Lock table • Entries of the form [x, r, Ti] • Conflicting locks • pli[x], qlk[y], x = y and p,q conflict • Unlock rui[x], wui[x]
Basic 2-Phase Locking (2PL) RULE 2 pli[x] cannot be released until pi[x] is completed RULE 1 Receive pi[x] is qlk[x] set such that p and q conflict? NO Acquire pli[x] RULE 3 (2 Phase Rule) Once a lock is releasedno other locks may be obtained. YES pi[x] scheduled pi[x] delayed
The 2-phase rule Once a lock is releasedno other locks may be obtained. T1: r1[x] w1[y] c1 T2: w2[x] w2[y] c2 H = rl1[x] r1[x] ru1[x]wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2wl1[y] w1[y] wu1[y] c1 T1 T2
Correctness of 2PL 2PL always produces serializable histories Proof outline STEP 1: Characterize properties of the scheduler STEP 2: Prove that any history with these properties is serializable (That is, SG(H) is acyclic)
Deadlocks (1/2) T1: r1[x] w1[y] c1 T2: w2[y] w2[x] c2 Scheduler rl1[x]wl2[y]r1[x]w2[y]<cannot proceed>
Deadlocks (2/2) Strategies to deal with deadlocks • Timeouts • Leads to inefficiency • Detecting deadlocks • Maintain a wait-for graph, cycle indicates deadlock • Once a deadlock is detected, break the cycle by aborting a transaction • New problem: Starvation
Conservative 2PL • Avoids deadlocks altogether • T declares its readset and writeset • Scheduler tries to acquire all required locks • If not all locks can be acquired, T waits in a queue • T never “starts” until all locks are acquired • Therefore, it can never be involved in a deadlock On your own Strict 2PL (2PL which ensures only strict schedules)
Extra Information • Assumption: Data items are organized in a tree Can we come up with a better (more efficient) protocol?
Tree Locking Protocol (1/3) RULE 2 if x is an intermediate node, and y is a parent of x, the ali[x] is possible only if ali[y] RULE 1 Receive ai[x] NO is alk[x] ? RULE 2 RULE 3 ali[x] cannot be released until ai[x] is completed YES pi[x] scheduled ai[x] delayed RULE 4 Once a lock is releasedthe same lock may not be re-obtained.
Tree Locking Protocol (2/3) • Proposition: If Ti locks x before Tk, then for every v which is a descendant of x, if both Ti and Tk lock v, then Ti locks v before Tk. • Theorem: Tree Locking Protocol always produces Serializable Schedules
Tree Locking Protocol (3/3) • Tree Locking Protocol avoids deadlock • Releases locks earlier than 2PL BUT • Needs to know the access pattern to be effective • Transactions should access nodes from root-to-leaf
Multi-granularity Locking (1/3) • Granularity • Refers to the relative size of the data item • Attribute, tuple, table, page, file, etc. • Efficiency depends on granularity of locking • Allow transactions to lock at different granularities
Multi-granularity Locking (2/3) • Lock Instance Graph • Explicit and Implicit Locks • Intention read and intention write locks • Intention locks conflict with explicit read and write locks but not with other intention locks Source: Concurrency Control and Recovery in Database Systems: Bernstein, Hadzilacos and Goodman
Multi-granularity Locking (3/3) • To set rli[x] or irli[x], first hold irli[y] or iwli[y], such that y is the parent of x. • To set wli[x] or iwli[x], first hold iwli[y], such that y is the parent of x. • To schedule ri[x] (or wi[x]), Ti must hold rli[y] (or wli[y]) where y = x, or y is an ancestor of x. • To release irli[x] (or iwli[x]) no child of x can be locked by Ti
The Phantom Problem • How to lock a tuple, which (currently) does not exist? T1: r1[x1], r1[x2], r1[X], c1 T2: w[x3], w[X], c2 rl1[x1], r1[x1], rl1[x2], r1[x2], wl2[x3], wl[X], w2[x3], wu2[x3,X], c2, rl1[X], ru1[x1,x2,X], c1