300 likes | 1.01k Views
Principles of Transaction Management. Outline. Transaction concepts & protocols Performance impact of concurrency control Performance tuning. Application Programmer (e.g., business analyst, Data architect). Application. Sophisticated Application Programmer (e.g., SAP admin).
E N D
Outline • Transaction concepts & protocols • Performance impact of concurrency control • Performance tuning
ApplicationProgrammer(e.g., business analyst, Data architect) Application SophisticatedApplicationProgrammer(e.g., SAP admin) QueryProcessor Indexes Storage Subsystem Concurrency Control Recovery DBA,Tuner Operating System Hardware[Processor(s), Disk(s), Memory]
Transaction Concepts & Protocols • Transaction • A logical unit of database processing • A sequence of begin, reads/writes, end • Unit of recovery, consistency, concurrency • Transaction Processing Systems • Large databases with multiple users executing database transactions • Examples • Banking systems, airline reservations, supermarket checkouts, ...
Transition STATE Transaction States read-item, write-item begin-transaction end-transaction commit Active Partially Committed Committed abort abort Failed Terminated
Interleaved Transactions • A and B are concurrent transactions A A B B Time t1 t2 t3 t4 t5
Transaction “Correctness” • ACID properties • Atomicity • Consistency • Isolation • Durability • Enforced by concurrency control and recovery methods of the DBMS
Serial Schedule • Schedule • A sequence of read & write operations from various transactions • R1[X] W3[Y] R2[X] W2[Y] W1[X] W2[X] • Serial schedule • No interleaved operations from the participating transactions • W3[Z] R3[Y] R1[X] W1[Y] R2[Y] W2[Z] W2[X] • Always correct, but … so slow! • A schedule that is equivalent to some serial schedule is correct too
Serializable Schedule T1 T2 R(A) W(A) R(A) W(A) R(B) W(B) R(B) W(B) Commit Commit
Equivalent Schedules • 2 schedules are equivalent if the transactions • Read the same values • Produce the same output • Have the same effect on the database • Examples • R1[X] W2[X] R3[Y] W1[Y] R2[Y] W3[Z] W2[Z] • W3[Z] R3[Y] R1[X] W1[Y] R2[Y] W2[Z] W2[X] • W2[X] R1[X] W1[Y] R2[Y] W3[Z] W2[Z] R3[Y] • 1 and 2 are equivalent; not 3
Serializable Schedule Theorem • A schedule is serializable if there is a serial schedule such that for every conflicting pair of operations, the two operations appear in the same order in both schedules. • 2 operations conflict if they are on the same object and one is a write • Example 1 is serializable
WR Conflicts T1 T2 T1 transfer $100 from A to B, and T2 increments both and B by 6% (A and B have $200 initially) R(A) ($200) W(A) ($100) R(A) (100) W(A) (106) Dirty read R(B) (200) W(B) (212) Commit R(A) R(B) (212) W(B) (312) Unrepeatable Read (UR) Commit
WW Conflicts T1 T2 T1 to set both A and B to $1000, T2 to set both A and B to $2000 R(A) W(A) ($1000) R(B) W(B) ($2000) R(B) W(B) ($1000) R(A) W(A) ($2000) Commit Lost Update! Commit
Concurrency Control Enforces Serializability • Most commercial DBMS use protocols (a set of rules)which when enforced by DBMS ensure the serializability of all schedules in which transactions participate. • Serializability testing after execution is meaningless; how to rectify? • This done by Concurrency Control
Concurrency Control Protocols • Commercially accepted mechanisms • Locking • Timestamps • Others mechanisms • Multi-version and optimistic protocols • Granularity issues
Locking • Locking is used to synchronize accesses by concurrent transactions on data items • A concept also found in operating systems and concurrent programming • A lock is a variable for a data item, that describes the status of the item with respect to allowable operations
Types of Locks • Binary locks • Locked, or Unlocked • Check before enter; wait when locked; lock after enter; unlock after use (and wakeup one waiting transaction). • Simple but too restrictive • Read/Write locks in commercial DBMS • read-locked • write-locked • Unlocked R-lock W-lock R-lock Y N W-lock N N
Read/Write Locking Scheme • A transaction T must issue read-lock (X) or write-lock before any read-item (X) • T must issue write-lock (X) before any write-item (X) • T must issue unlock-item (X) after completing all read-item (X) and write-item (X) • T will not issue a read-lock (X) if T already holds a read/write lock on X • T will not issue write-lock (X) if T already holds a write lock on X
Does Locking Ensure Serializability? T1 read-lock (Y); read-item (Y); unlock (Y); write-lock (X); read-item (X); X:=X+Y; write-item (X); unlock (X); T2 read-lock (X); read-item (X); unlock (X); write-lock (Y); read-item (Y); Y:=X+Y; write-item (Y); unlock (Y); X unlocked too early Y unlocked too early X == Y (orignal X + originalY) For serializable T1T2, X == X + Y Y == 2Y + originalX? Cannot serialize T1 and T2
Need for Locking Protocol • Locking alone does not ensure serializability! • We need a locking protocol • A set of rules that dictate the positioning of locking and unlocking operations, thus guaranteeing serializability
Two-Phase Locking (2PL) • A transaction follows the two-phase protocol if all locking operations precede the first unlocking operation read-lock (X) write-lock (X) write-lock (Y) read-lock (Y) unlock (X) unlock (Y) Phase 1: Growing Phase 2: Shrinking
2PL Variants • Basic 2PL • Conservative 2PL • Locking operations precede transaction execution • Make sure can acquire necessary locks • Strict 2PL • Unlocking of write-locks after commit (or abort) • Avoid cascading abort • Rigorous 2PL • Unlocking of all locks after commit (or abort)
Limitations of 2PL • Some serializable schedules may not be permitted • Performance not optimal • 2PL (and locking in general) may cause deadlocks and starvation • Deadlock: no transactions can proceed • Starvation: some transaction wait forever
Lock Granularity • Larger size - lower concurrency • Smaller size - higher overhead What is the best item size? Processing a mix of transactions? Depends on the type of transactions Multiple granularity locking scheme, changing the size of the data item dynamically
Performance of Locking Throughput Thrashing # of Active Transactions • Overhead: blocking • Increasing the throughput: • Locking smaller size objects • Reducing locking time • Reducing hot spots
Other CC Protocols • Timestamp based • Multi-version based • Optimistic concurrency control • No checking is done before or during transaction execution • The transaction is validated at the end of execution, by checking if serializability has been violated
Summary of Transaction Concepts Baseline: Serial Schedule Transaction Correctness • Other CC Protocols • Timestamp • Multi-version • Optimistic Strict 2PL A C I D 2PL Ideal: Serializable Schedule
Summary To improve performance Interleave transactions Correctness: ACID Serial schedule is correct Serializable schedule is equivalent to some serial schedule Concurrency control enforces serializability • 2PL • Deadlock • Starvation • Granularity Optimistic Timestamping Multi-version
Performance Impact of Concurrency Control • Lock contention • Deadlock
Performance Impact of Concurrency Control • LONG transactions are penalized