450 likes | 540 Views
CS 440 Database Management Systems. Transaction Management: Concurrency Control. Example. Flight( fltNo , fltDate , seatNo , seatStatus ) Check the available seats SELECT seatNo FROM Flight WHERE fltNo =123 AND fltDate =DATE’08-1-1’ AND seatStatus = ‘available’;
E N D
CS 440 Database Management Systems Transaction Management: Concurrency Control
Example Flight(fltNo, fltDate, seatNo, seatStatus) • Check the available seats SELECT seatNo FROM Flight WHERE fltNo=123 AND fltDate=DATE’08-1-1’ AND seatStatus = ‘available’; 2. Book the available seat UPDATE Flight SET seatStatus = ‘occupied’ WHERE fltNo=123 AND fltDate=DATE‘08-1-1’ AND seatNo = 22A
Example • John checks for availability and gets seat 22A • John books seat 22A • Mary checks for availability abd gets seat 22B • Mary books seat 22B
Example • John checks for availability and gets seat 22A • Mary checks for availability abd gets seat 22A • John books seat 22A • Mary books seat 22A Double booking!
Operations of Transactions • Read(A) [R(A)] • Read data item A • WRITE(A) [W(A)] • Write on data item A • Each transaction is a sequence of read and write operations on data items.
Schedule • The order of execution of operations of two or more transactions. Schedule S1 Transaction1 Transaction2 R(A) R(C) W(A) R(B) W(C) R(B) W(B) W(B) Read data item A (usually a tuple) Time
Schedule Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time
Schedule Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time The Lost Update Problem
Schedule Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) F A I L R(A) W(A) Time Dirty reads cause problems.
Isolation • The concurrent execution of transactions must be such that each transaction appears to execute in isolation. • Better performance than serial schedules.
Scheduler Transaction Manager Read/Write request Guarantees Isolation Scheduler Reads and writes Buffers
Serial Schedule • The transactions are performed one after the other. Transaction1 Transaction2 Transaction1 Transaction2 R(A) W(A) R(B) W(B) R(C) W(C) R(B) W(B) R(C) W(C) R(B) W(B) R(A) W(A) R(B) W(B)
Serial Schedules Are Not Desirable • Terrible throughput • I/O activities cannot be done in parallel with CPU access • High average waiting time • A short transaction has to wait for some (many) long transactions to complete.
Serializable Schedules • A serializable schedule is guaranteed to give the same final result as some serial schedule. • Which of these are serializable? Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Read(A) Write(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B)
Serializable Schedules • Why is that last schedule serializable? Read(A) Write(A) Read(A) Write(A) Read(B) Write(B) Read(B) Write(B) Read(A) Write(A) Read(B) Write(B) Read(A) Write(A) Read(B) Write(B)
Notations for Schedules • T1: r1(A); w1(A); r1(B); w1(B); • T2: r2(A); w2(A); r2(B); w2(B); • S: r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B);
Serializability versus Conflict-Serializability • Checking if a transaction is serializable often requires knowing the semantic of transaction. • Hard for scheduler • Conflict-serializability is a stronger notion that serializability that can be checked easier. • Based on the idea of conflict between operations
Conflicting Operations Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time
Conflicting Operations Transaction 1: Add $100 to account A Transaction 2: Add $200 to account A R(A) W(A) R(A) W(A) Time The Lost Update Problem: the order of W(A) and R(A) matters
Conflicting Operations • Two actions of the same transaction - E.g.,ri(X); wi(Y) • Two writes of the same data item - E.g., wi(X); wj(X) • A read and a write of the same data item - E.g., ri(X); wj(X) • They access the same data item. • At least one of them is a write operation.
Swapping Operations • Any pair of operations in a given schedule can be swapped unless they are conflicting operations. • Examples: • ri(X); rj(Y) • ri(X); wj(Y) if X!= Y • wi(X); rj(Y) if X != Y • wi(X); wj(Y) if X != Y
Example r1(A); w1(A); r2(A); w2(A); r1(B); w1(B); r2(B); w2(B); r1(A); w1(A); r2(A); r1(B); w2(A); w1(B); r2(B); w2(B); r1(A); w1(A); r1(B); r2(A); w2(A); w1(B); r2(B); w2(B); r1(A); w1(A); r1(B); r2(A); w1(B); w2(A); r2(B); w2(B); r1(A); w1(A); r1(B); w1(B); r2(A); w2(A); r2(B); w2(B);
Conflict-serializable • Two schedules are conflict-equivalent if they can be turned one into the other by a sequence of nonconflicting swaps of adjacent actions • A schedule is conflict-serializable if it is conflict-equivalent to a serial schedule • Easy to check whether a schedule is conflict-serializable by examining a precedence graph
Test for Conflict-Serializability • The transactions performing conflicting operations in schedule S must appear in the same order of the operations in some conflict-equivalent serial schedule of S. • Summarize these conflicting operations in a precedence graph
Precedence Graph for Schedule S • T1 takes precedence over T2 (T1 <S T2), if there are operations O1of T1 and O2of T2, s.t. • O1is ahead of O2in S • Both O1and O2 access the same data item • At least one of O1and O2is a write operation • Construct a precedence graph and ask if there are any cycles.
Example S: r2(A); r1(B); w2(A); r3(A); w1(B); w3(A); r2(B); w2(B); 1 2 3 S’: r1(B); w1(B); r2(A); w2(A); r2(B); w2(B); r3(A); w3(A);
Example S1: r2(A); r1(B); w2(A); r2(B); r3(A); w1(B); w3(A); w2(B); 1 2 3
Guaranteeing Conflict Serializability • Scheduler must guarantee isolation • Conflict serializability • Two approaches: • Pessimistic: There are many conflicting transactions. • Optimistic: There are a few conflicting transactions.
Pessimistic Approaches: Lock-based • Before a transaction read (write) data item A, it must request a read (write) lock on the data item from the lock manager. • It eventually gives up the lock.
Lock-based Scheduler Transaction Manager Read/Write request Scheduler Lock Manager Reads and writes Buffers
Types of Locks • Shared lock = read lock: Many transactions can hold a shared lock over A simultaneously. • Exclusive lock = write lock: At most one transaction can hold an exclusive lock over A at a any given time. • Lock manager makes decision based on compatibility matrix:
2 Phase Locking (2PL) • A transaction will not request any new lock after gives up one lock. • Each transaction has two phases: • Getting locks (growing) • Releasing locks (shrinking)
Growing Phase • Can only lock data items during this phase. • May also upgrade the locks (from read-lock to write-lock). • Read/ write the locked data items. • No unlocking in this phase
Shrinking Phase • Can only release locks on the data items. • May also downgrade the locks (from write-lock to read-lock). • This phase stars with the first release-lock. • No locking after the first release-lock.
2PLSchedule T1 T2 X-lock(B) W(B) S-lock(A) R(A) Release(A) W(B) Release(B) S-lock(A) R(A) X-lock(A) W(A) Release(A) Upgrade Shrinking
Why does 2PL work? • The locks of conflicting operations are not compatible. • 2PL does not allow the swap of conflicting operations. • Serial order between conflicting operations • All conflicting operations of T1 before T2. • It is possible to swap all non-conflicting operations in transactions to get a serial schedule.
2PLdoes not solve every potential problem We should never have let T1 commit T1 T2 R(A) W(A) W(B) R(A) W(B) T1 Commits Cascading rollback T2 Aborts
How do we deal with this? • Commit transaction T only after all transactions that wrote data that T read have committed. • Only let a transaction read an item after the transaction that last wrote this item has committed.
Strict 2PL • 2PL + a transaction releases its locks only after it has committed. • How does Strict 2PL prevent cascading rollback?
Deadlock • Circular wait among transactions waiting for each other to release a lock. • Prevent: every transaction must lock all items it needs in advance!!? • Detect: • Constitute a wait-for graph • Check the waiting transactions and break the deadlock by aborting one of them.
Even S2PL does not solve the phantom problem T2 T1 Insert a Toyota with price $0 Find the cheapest car in table Car Find the cheapest car in table Car Honda Toyota Non-repeatable read! Would never happen in a serial schedule
Solution to Phantom Phenomena • Lock the index used to reach tuples • Some protocols and locks on locking tree nodes • If there is no index, lock the whole table.
What exactly are we locking anyway? • The most common data lock granularities • Record • Page • Table • If you access a small number of records, the best locking granularity is one record. • If you access many records of the same table, best to have coarser granularity locks. • Programmers (DBA) may interfere.
Optimistic Approaches • Locking has overhead • If you lock a table, all transactions using a record of that table must wait. • You may assume that the conflicts are rare. • Let the transactions run, check if the schedule is serializable. If it is not, abort and re-run the transactions. • Timestamp-based methods • Read the textbook
Isolation Levels • Sometime we do not need serializable schedules • It is OK to have non-repeatable reads • Each isolation level guarantees that some cases will not happen • No lost update, but may have non-repeatable reads. • Allows more concurrency and better throughput.