370 likes | 910 Views
Transactions. A transaction is a complete and consistent series of database operations to perform some logical task that is part of a database application. a database application usually consists of many such transactions
E N D
Transactions • A transaction is a complete and consistent series of database operations to perform some logical task that is part of a database application. • a database application usually consists of many such transactions • A transactions starts with the first SELECT, INSERT, UPDATE and DELETE statement and ends with either: • “commit work” – save the work done by the transaction • “rollback work” – undo the work done by the transaction Database Systems
Properties of a Transaction • Transaction management in a DBMS allows a large number of transactions to execute concurrently while maintaining: • Atomicity:a transaction either finishes all operations successfully or has no effect on the database • Consistency:when a transaction completes successfully, it leaves the database in a consistent state • Isolation:transactions run as if no other process is accessing the database while they execute • Durability:when a transaction completes successfully, its effects are written to disk and are never lost! Database Systems
Transactions • A transaction is defined as a series of read and write operations • select is a read type operation, insert is a write operation, delete and update are a read operation followed by a write operation • data items (usually tuples) are operands for the operations in a transaction • most often we disregard or simplify the program logic that takes place in between the database operations • A transaction: R(A) R(B) A := A + 500 W(A) B:= B - 500 W(B) Database Systems
T1 is overwriting a change T2 made T2 is reading something T1 changed (DIRTY READ) As a result, T2 used a value for A that should have never been in the database Transaction Management T1 T2 R(A) R(B) R(B) A := A + 500 W(A) B := B+200 W(B) R(A) B:= B - 500 W(B) A:= A-200 W(A) ABORT Time Database Systems
Transaction Management • What does it mean to correctly execute a series of transactions? When are interleaved read/write operations from transactions valid? • a schedule is a global sequence of database operations from multiple transactions • serializable schedules are equivalent to executing all transactions one by one in some order T1,T2,…,Tk (i.e. T1 first, then T2, etc.) • serializable schedules guarantee isolation and consistency Database Systems
Achieving the ACID Properties • How do we guarantee all schedules generated by a DBMS are serializable? [Concurrency Control] • locking can be used to guarantee serializable schedules as we will see (consistency and isolation) • locking also guarantees atomicity • How do we make sure that if a transaction succeeds, its effects are never lost? [Recovery] • write-ahead logging, achieves durability Database Systems
Concurrency Control • Concurrency control – activity of coordinating the actions of transactions that operate in parallel and access shared data. • The operations of multiple transactions must be interleaved in such a way that the transactions do not interfere with one another. • The results of the execution of the transactions must be the same as if they were run one after the other in a serial fashion – serializability. • A database system must find serializable executions of the transactions it receives. • For simplicity, we will assume transactions are composed of the operations: read(x), write(x), commit and abort. Database Systems
Concurrency Control Problems T1 T2 • Lost update problem: • Dirty read problem: read(x) x = x + 1 read(x) The final value of x is incremented by one rather than by two, as it should be. time x = x + 1 write(x) write(x) T1 T2 read(x) T2 reads a value for x that should not have been part of the database. x = x + 1 write(x) time read(x) abort Database Systems
More Problems • Incorrect summary problem: T1 T2 sum = 0 T1 incorrectly computes the sum of x, y, and z. read(x) sum = sum + x read(y) y = y - 1 write(y) time If T2 is moving money From one account to another and T1 is computing total assets, this can be a serious problem. read(y) sum = sum + y read(z) sum = sum + z read(z) z = z + 1 write(z) Database Systems
Transaction Histories • A history is a partial ordering of the operations in a set of transactions recording how the transactions executed. • Two operations conflict if they operate on the same data item and at least one of them is a write. • in this case it is important to know which of the two operations went first • Let T = {T1, T2, ..., Tn}. A history H over T is a partial order <H that includes all the operations in the {Ti } and that preserves the order of the operations in each Ti. It must also order all pairs of conflicting operations. Database Systems
History Example • Consider the following three transactions: • T1 = r1[x] ---> w1[x] ---> c1 • T2 = r2[x] ---> w2[y] ---> w2[x] ---> c2 • T3 = r3[y] ---> w3[x] ---> w3[y] ---> w3[z] ---> c3 • A history over these transactions is: • r2[x] ---> w2[y] ---> w2[x] ---> c2 • r3[y] ---> w3[x] ---> w3[y] ---> w3[z] ---> c3 H = • r1[x] ---> w1[x] ---> c1 Database Systems
Serialization Graph • A serialization graph of a history H is a directed graph whose nodes are the transactions in H and whose edges are all of the form Ti ---> Tj such that an operation in Ti precedes and conflicts with an operation in Tj. • Example • The serialization graph for this example is: • r3[x] ---> w3[x] ---> c3 • r1[x] ---> w1[x] ---> w1[y] ---> c1 H = • r2[x] ---> w2[y] ---> c2 T2 T1 T3 Database Systems
Serializability Theory • Two histories are equivalent if they order conflicting operations of non-aborted transactions in the same way. • Goal -- determine if there is a serial history equivalent to a given history H. • if there is, then the transactions in H did not interfere with one another • Therefore, we need a serial history consistent with the edges in the serialization graph for the history H. • Such a serial history exists if the serialization graph for history H is acyclic. Database Systems
Example - A Serializable History • Example: • The serialization graph for this example is: • An equivalent serial history is: • r3[x] ---> w3[x] ---> c3 • r1[x] ---> w1[x] ---> w1[y] ---> c1 H = • r2[x] ---> w2[y] ---> c2 T2 T1 T3 T2 T1 T3 Database Systems
Example: Non-Serializable History • r2[x] ---> w2[y] ---> w2[x] ---> c2 • Example: • The serialization graph for this history is: • There are no equivalent serial histories because the serialization graph has a cycle. • these transactions interfere with one another • r3[y] ---> w3[x] ---> w3[y] ---> w3[z] ---> c3 H = • r1[x] ---> w1[x] ---> c1 T1 T2 T3 Database Systems
Serialization Graphs • Given a serialization graph that does not contain a cycle, an equivalent serial schedule can always be constructed: • create an empty schedule S • while the graph is not empty: • find all nodes N in the graph with no incoming edges • order transactions in N in any arbitrary serial order and add them to the end of S for each transaction in N, select the operations in the current schedule and write them in the same order to the new schedule • delete nodes N and all edges from nodes in N to the remaining graph Database Systems
Schedulers • A database system must have a scheduler that orders operations in concurrently executing transactions. • Aggressive (optimistic) scheduler -- assume the best and abort if discover a problem later • the abort undoes operations that the transaction completed before the abort • good when conflicts are rare • Conservative scheduler -- delay operations so that they can be reordered to avoid problems. • less concurrency • good when conflicts are likely Database Systems
Locking to Ensure Serializability • A transaction should have a lock in its possession for any data objects it accesses. • read/shared lock is required to read an item • write/exclusive lock is required to write an item • Many transactions may have read locks on the same data item. • If a transaction holds a write lock on an item, no other transaction may obtain a lock of any kind for this item. Database Systems
Two Phase Locking (2PL) • Before each read and write operation, the transaction manager attempts to obtain the necessary lock for the appropriate data item • if the lock is granted, then the operation is performed • if the lock is not granted, then the whole transaction waits in a queue until the lock can be obtained • A transaction can release a lock at any time • after a transaction releases a lock for the first time, it cannot obtain any new locks on any data items • The COMMIT command in SQL releases all locks a transaction holds, which is a stronger condition than required by 2PL. [Strict Two Phase Locking] Database Systems
2PL THEOREM: All schedules generated by enforcing the 2PL protocol are serializable. The actual read/write events that take place in the database correspond to some serial execution of transactions. PROOF: The conflict graph induced by 2PL compliant schedulers are acyclic. By contradiction, assume the serialization graph has a cycle of the form: Then there must be at least two conflicts. A sample scenario for conflicts 1 and 2: R1(A)…W2(A)...R2(B)…W1(B) conflict1 T1 T2 conflict2 Database Systems
2PL conflict1 T1 T2 R1(A)…W2(A)...R2(B)…W1(B) • Such a cycle is not possible. T2 could only obtain a write lock for “A” if T1 first releases it’s read lock and enters its shrinking phase. • Consequently, T1 will be able to write “B” only if it already holds a write lock for “B”. • But is T1 already holds a write lock for “B”, then T2 will not be able to get the read lock on B that it needs. • Hence, this schedule cannot happen. conflict2 Database Systems
But There is a Problem ! • Two-phase locking is prone to deadlocks. • Example: • T1: r1 [x] ---> w1 [y] ---> c1 • T2: w2 [y] ---> w2 [x] ---> c2 • Lock conversion from read locks to write locks can also cause a problem. • T1: r1 [x] ---> w1 [x] ---> c1 • T2: r2 [x] ---> w2 [x] ---> c2 Database Systems
Deadlocks in 2PL • If a transaction requests a lock of any kind and cannot obtain it, it enters a wait state • this can lead to deadlocks • A wait-for graph to detect deadlocks is constructed as follows: • if transaction Ti is waiting for a lock that transaction Tj holds, draw an edge from Ti to Tj • If the wait-for graph has a cycle, then no transaction in the cycle will ever exit the “wait” state. [DEADLOCK] • One of the transactions participating in a deadlock can be aborted to remove the deadlock Database Systems
Example • T1: R(A) W(B) • T2: R(B) W(A) • Schedule: R1(A) R2(B) • T1 is waiting for T2 to release the lock on B, T2 is waiting for T1 to release the lock on A • Neither of the transactions can complete there is a cycle in the wait-for graph deadlock T1 T2 Database Systems
Deadlocks • Time-outs -- if a transaction has waited too long for a lock, the scheduler “guesses” that it is deadlocked and aborts it • Maintain wait-for graphs • things to consider when aborting transactions: • effort already invested • cost to abort • other potential deadlocks • work remaining • priority and locks already held Database Systems
Implementation Issues • The scheduler in a database system is usually implemented as a combination of a lock manager and a transaction manager. • The lock manager maintains a table of locks and supports the following operations: • lock (transaction id, data item, mode) • unlock (transaction id, data item) • these must be atomic operations • The lock manager also maintains a queue for each data item of transactions waiting for locks on that item. Database Systems
Locking Granularity • Possible lock granularities: • attribute value in a tuple • tuple • disk block • file • whole database • Larger granularities allow less concurrency • Smaller granularities require more overhead • Many database systems provide a small set of choices to pick from. • Row (tuple level) locking only in Oracle. Database Systems
Timestamp Ordering • A unique timestamp ts(Ti) is given to each transaction. • If operations oi[x] from Ti and oj[x] from Tj conflict, then the system processes oi[x] before oj[x] iff ts(Ti) < ts(Tj). • Basic Timestamp Ordering: (aggressive) • operations executed in first come first serve order • if an operation arrives to late, its transaction is aborted • the scheduler must maintain the maximum timestamp of previous reads and writes for each data item to do this • Timestamp Ordering establishes an ordering on the transactions so that the serialization graph will be acyclic. Database Systems
Cascading Abort • By definition, if a transaction aborts, its effects must be completely erased from the database • Example: • if T1 aborts, since T2 has seen the changes caused by T1, then T2 has to be aborted as well -> cascading abort • schedules generated by 2PL may require cascading abort Database Systems
Strict 2PL • A transaction obtains all locks it requires as in normal 2PL. • A transaction holds all locks until it commits • Since all transactions see only the changes made by committed (not aborted) transactions, these transactions do not require cascading aborts • the example on the previous slide could not happen • Remember that most commercial database systems use strict 2PL • this is one of the reasons why Database Systems
Levels of Isolation • 2PL ensures serializability, but at the expense of lower concurrency • “Consistency” may not be vital to some types of transactions • In all the isolations levels, all locks are kept until commit time! Database Systems
Read Uncommitted • Read all tuples regardless of the locks that are on them. • When reading, a short lock (latch) is used to ensure atomicity. • Transactions that are set to be “read uncommitted” are read only and cannot make database updates • Results of such transactions are not guaranteed to be correct! SET TRANSACTION READ UNCOMMITTED ; SELECT count(F.reserved_seats) FROM Flights F WHERE F.flight_id = “AA112” ; Database Systems
Read committed (cursor stability) • There are no read locks (except for latches) ! But, transactions can only read items with no locks. • A transaction needs a write lock to write a data item. • Write locks are not released until the transaction holding them commits. • Avoids all W1(A) -> R2(A) and W1(A) -> W2(A) type conflicts since transaction T2 will only execute after transaction T1 commits • Problem with R1(A) -> … -> W1(A) type transactions Another transaction T2 may read and alter A in between these operations Database Systems
Cursor Stability • Cursor Stability • declare a cursor for update when using “read committed” • a read lock is held on each row the cursor points to until the cursor is moved from that location exec sql set transaction read committed ; exec sql declare cursor deposit for select balance from accounts where branch_id = ‘Alb’ for update of balance ; exec sql open deposit ; (loop through rows in cursor) exec sql fetch deposit into :balance ; balance = balance + 10 ; exec sql update account set balance = :balance where current of deposit ; (end of loop) exec sql close deposit ; exec sql commit work ; Database Systems
Repeatable reads • Hold read locks on rows as well as write locks until commit time • similar to the 2PL mechanism we have seen • Suffers from “phantom update” problems. exec sql set transaction repeatable reads ; exec sql select count(F.reserved_seats) into :num; from Flights F where F.flight_id = “AA112” exec sql update current set status = :num where current.flight_id = “AA112” ; Another transaction may insert a reservation tuple in between these operations making the count incorrect Database Systems
Serializable • Serializable transactions do not have the phantom update problem. • Serializable transactions lock a “predicate”, limiting all read/write access to operations that would satisfy a predicate such as F.flight_id = “AA112” • In Oracle Database Systems