960 likes | 1.13k Views
Database Systems: Transaction Management. Necessity. Database systems are normally being accessed by many users or processes at the same time. Both queries and modifications.
E N D
Necessity • Database systems are normally being accessed by many users or processes at the same time. • Both queries and modifications. • Unlike Operating Systems, which support interaction of processes, a DMBS needs to keep processes away from troublesome interactions.
Example: Bad Interaction • You and your father ( Joint A/c) each take Rs. 10,000 from different ATM’s at about the same time. • The DBMS should make sure one account deduction doesn’t get lost. • Comparison: An OS allows two people to edit a document at the same time. If both write, one’s changes get lost. • What if the connection to the bank is lost during the transaction?
Transaction Management Support • Two main issues to deal with: • Concurrent execution of multiple transactions • Failures of various kinds, such as hardware failures and system crashes
Transactions: Basic concepts Transaction • A logical unit of database processing. • An action or series of actions, carried out by user or application, which accesses or changes contents of database. • Transforms database from one consistent state to another, although consistency may be violated during transaction. Transaction Processing System • Systems with large databases and multiple concurrent users that are executing database transactions. Examples • Banking systems, Airline reservations, Supermarket checkouts, ...
Database Access Operations • Performed on a data item • read-item (X) • write-item (X) • A transaction can have multiple database access operations. • Each transaction has clearly specified beginning and end statements. • A single application program may contain many transactions.
read-item (X), write-item (X) Read_item(X) includes steps: 1. Find the address of the disk block that contains the item X 2. Copy the disk block into a buffer in main memory 3. Copy the item X from the buffer to a program variable ( for simplicity also called X) Write_item(X) includes steps: 1. Find the address of the disk block that contains the item X 2. Copy the disk block into a buffer in main memory 3. Copy the item X from a program variable into its correct location in the buffer 4. Store the updated block from the buffer back to disk
A Simple Transaction T1 T1: read-item (X); X:= X + M; write-item (X); read and write sets of a Transaction read-set of T1 is {X} • reads database item x into a program variable x write-set of T1 is also {X} • writes program value of the variable x into the database item x
Another example T2 read_item(X); X:=X-N; write_item(X); read_item(Y); read_item(Z); Y:=Y+Z+N; write(Y); read_set of T2 is {X,Y,Z}, write_set of T2 is {X,Y}
Transaction Operations A transaction is either completed in its entirely or not done at all. Hence for recovery purpose, the recovery manager keeps track of the following transaction operations. • begin-transaction • read-item • write-item • end-transaction • commit • abort (or rollback)
Transition STATE Transaction States read-item, write-item begin-transaction end-transaction commit Active Partially Committed Committed abort abort Failed Terminated
[start-transaction, T1] … ... [start-transaction, T2] … [commit, T2] … [commit, T1] Commit Point of a Transaction T Marks the successful completion of T and having recordedthe effect of all the transaction operations on the database in the system log. [Commit, T] is recorded in the system log
[start-transaction, T1] … ... [start-transaction, T2] … [commit, T2] … [rollback, T1] Rollback Point of a Transaction T Causes the transaction to end, but by aborting. No effects on the database. [Rollback, T] is recorded in the system log • Failures like division by 0 can also cause rollback, even if the programmer does not request it
Desirable properties: ACID Transactions should possess ACID properties. ACID properties should be enforced by concurrency control and recovery methods of the DBMS • Atomicity – Either the whole process is done or none is.. • Consistency Preservation– Database constraints are preserved.. • Isolation – It appears to the user as if only one process executes at a time. • Durability – Effects of a committed process do not get lost if the system crashes.
Why Concurrency Control is needed? Transaction processing systems are large databases with multiple users executing database transactions Multiple users are concurrently executing transactions A transaction can have several data access operations, some of which could be accessing the same data item
Contd… • Simultaneous execution of transactions over a shared database can create several data integrity and consistency problems. • Transactions could be run serially, but this limits the degree of concurrency or parallelism in system. • Although two transactions may be correct in themselves, interleaving of operations may produce an incorrect result.
!!! Concurrency Problems • Lost update • Dirty read • Incorrect summary • Unrepeatable Read
Lost Update Two transactions have their operations interleaved in such a way that it makes the value of some data items incorrect T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T2 read-item (X); X:=X+M; write-item (X); Item X has an incorrect value because its update by T1 is lost
T i m e Lost update T1 T2 r[x] r[x] x:=x+1 x:=x-1 w[x] w[x] r[y] y:=y-1 w[y] x=5 x=5 x:=6 x:=4 x=6 x=4 y=2 y:=1 y=1 The update of the item x by T1 is lost
Dirty read (temporary update) A transaction updates a database item and then fails. The updated item is accessed by another transaction before it is restored back to its original value T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Abort; T2 read-item (X); X:=X+M; write-item (X); T1 fails and must restore the value of X; meanwhile T2 has read the temporary incorrect value of X
Dirty read (temporary update) T1 T2 r[x] x:=x+1 w[x] r[x] x:=x+2 w[x] r[y] Abort T1 x=5 x:=6 x=6 x=6 x:=8 x=8 r[y] The value of x written to the database is equal to: initial value of x + 3, while it should be x+2
Incorrect summary A transaction aggregating a number of records may read values of some records before, and some after the update by another transaction T3 sum:=0; read-item(A) sum:=sum+A; . . . read-item (X); sum:=sum+X; read-item (Y); sum:=sum+Y; T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T3 reads X after N is subtracted, and reads Y before N is added; a wrong summary is calculated
Unrepeatable read A transaction reads the value of an item twice, and the value of the item is changed by another transaction in between the reads T1 read-item (X); read-item (X) X:=X-N; write-item (X); T2 read-item (X); X:=X+M; write-item (X); T1 reads X again, however T2 has changed the value of X after the first read
Unrepeatable read T1 T2 r[x] x:=3*x w[x] r[x] x:=x+2 w[x] r[y] r[x] y:=y+x/2 w[y] x=4 x:=12 x=12 x=12 x:=14 x=14 y=3 x=14 y:=10 y=10 Value of x used in T1 to compute the update of y is not the value of x expected to be used
Why Recovery is needed? • Two types of storage: volatile (main memory) and nonvolatile. • Volatile storage does not survive system crashes. • Reasons for the need of recovery • Physical problems and catastrophes • Disk failure • System failure • Transaction failure • Local error or exception condition • Concurrency control enforcement. • The system must keep sufficient information to recover from the failure. • DBMS should commit changes for successful transactions and reject changes of aborted transactions.
Desirable properties: ACID Transactions should possess ACID properties. ACID properties should be enforced by concurrency control and recovery methods of the DBMS • Atomicity – Either the whole process is done or none is.. • Consistency Preservation– Database constraints are preserved.. • Isolation – It appears to the user as if only one process executes at a time. • Durability – Effects of a committed process do not get lost if the system crashes.
Schedules • A Schedule is the order of execution of operations from various transactions. • A formal definition of a schedule is: • A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of these transactions, given that Ti S, the order of operations of Ti in S is the same as in the original Ti. • Schedules consider read-item, write-item, commit and abort operations only and the order of operations in S to be a total order.
Example Schedule Sa : r1 (X); r2 (X); w1 (X); r1 (Y); w2 (X); w1 (Y); T2 read-item (X); X:=X+M; write-item (X); T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); Notation r read-item w write-item c commit a abort r1 (X) T1: read-item (X) a2 T2: abort
Characterising Schedules Based on Recoverability • For some types of schedules it is easy to recover but not for all. • Characterise the type of schedules for which recovery is possible and relatively simple: • Recoverable and nonrecoverable schedules • Cascadeless or Avoid cascading rollback (ACR) schedules • Strict schedules
Recoverable Schedules Schedules that can recover from transaction failures such that once a transaction is committed it should never be necessary to roll back. • Non recoverable schedules should not be permitted for execution. • Formally A schedule S is recoverable if no transaction T in S commits until all transactions T` that write an item that T reads, have committed.
Examples of Recoverable Schedules • Sc: r1(X); w1(X); r2(X); r1(Y); w2(X);c2; w1(Y); a1; T2 reads item X fromT1 and then T2 commits before T1 commits. If T1 aborts after c2 then X that T2 read is no longer valid and T2 must be aborted after it has been committed. - Non-recoverable Schedule • Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2; - Recoverable Schedule
More Examples Consider the following schedules for two transaction with interleaved execution. S1: r1[x], r2[y], w1[x], w2[y], r2[x], w2[x], c2, r1[y], a1 S2: r1[x], r2[y], r1[y], w2[y], w1[x], r2[x], w2[x], c1, c2 • S1 is not recoverable since T2 reads item x written by the transaction T1 that failed. • S2 is recoverable; i.e. T2 reads items written by T1 and does not commit before T1. • In a recoverable schedule, no committed transaction ever needs to be rolled back.
T7 T3 T4 T9 T6 T2 Cascading Rollback An uncommitted transaction has to be rolled back because it reads an item from a transaction which failed. It can be time consuming! T8 T5 Suppose that T5 has to be aborted. All transactions ‘reachable’ from T5 are aborted. T1
Cascadeless or ACR Schedules • Cascadeless schedules are recoverable schedules that avoid cascading rollbacks. • A cascadeless schedule is guaranteed not to be rolled back. • Formally A schedule S is cascadeless if every transaction reads only items that were written by committed transactions.
Examples of Cascadeless Schedules • Recoverable Schedule with cascading rollback Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2; T2 has to be rolled back because it reads X from T1 and T1 then aborted, • Cascadeless Schedule ( delaying T2) Sf: r1(X); w1(X); a1; r2(X); w2(X); c2;
Strict Schedules • Strict schedules are recoverable and cascadeless schedules that guarantee correct results. • Strict schedules simplify the recovery process. • Formally A schedule S is strict if transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted).
Examples of Strict Schedules • Recoverable and Cascadeless Schedule with potential incorrect results. Sg: r1(X); w1(X); w2(X); a1; - not a strict schedule because T2 writes X before T1 commits or aborts that last wrote X. • Strict Schedule Sh: r1(X); w1(X); a1; w2(X);
Characterization of Schedules Avoidance of cascading rollback Recoverability Strictness
Characterising Schedules Based on Serializability • Characterise the type of schedules that are considered correct when concurrent transactions are executing. • Serial schedules • Nonserial schedules • Conflict-Serializable schedules
Serial Schedules • Schedules that execute each transaction one by one without any interleaving. • Formally A schedule S is serialif for every transaction T participating in S, all operations of T are executed consecutively; otherwise the schedule is called nonserial. • There are n! serial schedules for n transactions • A serial schedule is always correct, but unacceptable in practice !
Serial Schedules (b) T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T2 read-item (X); X:=X+M; write-item (X); T2 read-item (X); X:=X+M; write-item (X); Examples of Serial Schedules Serial Schedules (a)
Non-serial Schedules (c) T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T2 read-item (X); X:=X+M; write-item (X); T2 read-item (X); X:=X+M; write-item (X); Examples of Nonserial Schedules Non-serial Schedules (d)
Results of Example Schedules • Initial Values of database items • X = 90, Y =90, N = 3, M = 2 • Results: • Schedule (a): Y = 93, X =89 • Schedule (b): Y = 93, X =89 • Schedule (c): Y = 93, X =92 • Schedule (d): Y = 93, X =89 • We are interested in schedule like the schedule (d).
Serializable Schedules • A serializableschedule is a nonserial schedules that is equivalent to some serialschedule. • it gives the correct result in spite of interleaving. • Formally A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions • When are two schedules ‘equivalent’?
Schedule equivalence • Conflict Equivalence The order of any two conflicting operations is the same in both schedules. • View Equivalence Each read operation of a transaction reads the result of the same write operation in both schedules. • Result Equivalence The two schedules produce the same final state of the database. • Other types of Equivalence
Conflict Equivalence and Conflicting Operations Schedules are called conflict equivalent if the order of any two conflicting operations is same in both schedules. • Operations of a schedule are in Conflict if they satisfy all of the following conditions: • they belong to different transactions; • they access the same item X; and • at least one is a write-item (X). An example Sa : r1 (X);r2 (X); w1 (X); r1 (Y); w2 (X); w1 (Y); the Conflicting operations are: {r1(X) and w2(X)}, {r2(X) and w1(X)}, {w2(X) and w1(X)} but {r1(X) and r2(X)}, {w1(X) and w2(Y)} are not in conflict.
Example of Conflict Equivalence • Consider two schedules: S1: …. r1(X); w2(X); ….; S2: …. w2(X); r1(X); ….; • S1 and S2 are not conflict equivalent. • r1(X) and w2(X) are conflicting operations of transactions T1 and T2. • Value read by r1(X) can be different in the two schedules.
T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T2 read-item (X); X:=X+M; write-item (X); T2 read-item (X); X:=X+M; write-item (X); Serial Schedule S`: … w1(X);…; r1(X);…; Conflict Serializable Schedule S: … w1(X); r1(X);…; Conflict Serializable A schedule S is conflict serializable if it is conflict equivalent to some serial schedule S`.
Precedence Graph Precedence Graph is used to test for serializability A directed graph G = (N, E), where N is a set of Nodes, N = {T1, T2, …, Tn} E is a set of directed edges, E = {e1, e2, …, em} Each transaction Ti in the schedule has one node Each edge ei is (Tj Tk) 1 j n, 1 k n The edge ei is created when an operation in Tj is followed by a conflicting operation in Tk The schedule S is serializable iff the graph has no cycles • A path is called a cycle if it starts and ends in the same node and contains at least two nodes. • If the precedence graph contains cycle, the schedule is not conflict serializable.
T1 read-item (X); X:=X-N; write-item (X); read-item (Y); Y:=Y+N; write-item (Y); T2 read-item (X); X:=X+M; write-item (X); Serial Schedule Sa: … w1(X);…; r2(X);…; Precedence Graph for a Serial Schedule T2 T1 X Precedence Graph for Sa