470 likes | 496 Views
Enterprise Database Systems Transaction Processing. Technological Educational Institution of Larissa in collaboration with Staffordshire University Larissa 200 9. Dr. Georgia Garani Dr. Theodoros Mitakos teo_ms@yahoo.com. Agenda. Transaction Processing (Concepts and Theory)
E N D
Enterprise Database SystemsTransaction Processing Technological Educational Institution of Larissa in collaboration with Staffordshire University Larissa 2009 Dr. Georgia Garani Dr. Theodoros Mitakosteo_ms@yahoo.com
Agenda • Transaction Processing (Concepts and Theory) • Concurrency Control Techniques
Introduction • A transaction is an executing program that forms a logical unit of database processing. A transaction includes one or more database access operations (these can include insertion, deletion, modification or retrieval operations). It must be completed in its entirety to ensure correctness. • Transaction processing systems are systems with large databases and hundreds of concurrent users that are executing database transactions (e.g. reservations, banking, credit card processing, supermarkets)
Database Systems Classification • The data model (e.g. relational, object oriented, network) • The number of users supported (multiuser, single user) • The number of sites over which the database is distributed (centralized, distributed, federated) • The cost of the DBMS (low end, expensive) • General purpose, special purpose DBMSs (eg traditional systems, online transaction systems)
Database Systems Classification according of the number of users • Single user: At most one user can use the system at a time. • Multi-user: Many users can access the database concurrently. • Concurrent execution of processes is interleaved in multiprogrammed operating systems
Example in SQL Server BEGIN TRANSACTION /* the beginning of the transaction */ UPDATE employee SET emp_no = 39831 WHERE emp_no = 10102 IF(@@error <> 0) ROLLBACK TRANSACTON /* Rollback of the transaction */ UPDATE works_on SET emp_no = 39831 WHERE emp_no = 10102 IF(@@error <> 0) ROLLBACK TRANSACTON /* Rollback of the transaction */ COMMIT TRANSACTION /* The end of the transaction */ The consistent state of data can be obtained only if both UPDATE or neither of them are executed
TRANSACT-SQL STATEMENTS AND TRANSACTIONS • BEGIN TRANSACTION • BEGIN DISTRIBUTED TRANSACTION • COMMIT TRANSACTION • ROLLBACK TRANSACTION • SAVE TRANSACTION
TRANSACT-SQL STATEMENTS AND TRANSACTIONS • BEGIN TRANSACTION [transaction name] [WITH MARK [‘description’]] transaction name is the name assigned to a transaction WITH MARK option specifies that the transaction is to be marked in the log • A distributed transaction is one that involves databases on more than one server • The COMMIT TRANSACTION statement successfully ends the transaction started with the BEGIN TRANSACTION statement. This means all modifications made by the transaction are stored on the disk. • The SAVE TRANSACTION statement set a save point within a transaction. A save point marks a specified point within the transaction so that all updates that follow can be cancelled without canceling the entire transaction.
EXAMPLE BEGIN TRANSACTION INSERT INTO department(dept_no, dept_name) VALUES(‘d4’,’Sales’) SAVE TRANSACTION a INSERT INTO department(dept_no, dept_name) VALUES(‘d5’,’Research’) SAVE TRANSACTION b INSERT INTO department(dept_no, dept_name) VALUES(‘d6’,’Management’) ROLLBACK TRANSACTION b INSERT INTO department(dept_no, dept_name) VALUES(‘d7’,’Support’) ROLLBACK TRANSACTION a COMMIT TRANSACTION
Each Transact-SQL statement always belongs implicitly or explicitly to a transaction. • When a session operates in implicit transaction mode , selected statements implicitly issue the BEGIN TRANSACTION statement. This means that you do nothing to start such a transaction. However, the end of each implicit transaction must be explicitly committed or rolled back using COMMIT statement.
Transaction Logging • SQL server keeps record of each change it makes to the database during a transaction. This is necessary in case an error occurs during the execution of the transaction. In this case all previously executed statements within the transaction have to rolled back. • SQL server keeps the before and after values in the transaction log. The transaction log is used to roll back or restore a transaction.
Locking • Locks can be applied to the following database objects: • Row • Page (main unit of data storage in SQL server) • Index • Extend (physical unit of storage in which space is allocated to a table) • Table • Database itself The size of the object that is locked is called granularity
Kinds of locks • Row level – Page level • Shared (S) • Exclusive (X) • Update (U) • A shared lock reserves a database object for reading only • An exclusive lock reserves a page or row for the exclusive use of a single transaction • An update lock can be placed only if no other update or exclusive lock exists. On the other hand, it can be placed on objects that already have shared locks
Table level locks • Shared (S) • Exclusive (X) • Intended share (IS) • Intended exclusive (IX) • Shared with intent exclusive (SIX) • Generally, an indent lock shows an intention to lock the next lower object in the hierarchy of the database objects. Therefore, intent locks are placed at a level in the hierarchy above that witch the process intends to lock.
Basic database access operations • Read_item(X): reads a database item X into a program variable • Find the address of the disk block that contains item X • Copy that disk block into a buffer in main memory • Copy item X from the buffer to the program variable named X • Write_item(X) Writes the value of program variable X into the database item X • Find the address of the disk block that contains item X • Copy that disk block into a buffer in main memory • Copy item X from the program variable named X into its correct location in the buffer • Store the updated block from the buffer back to disk
T1 Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y); T2 Read_item(X); X:=X+M; Write_item(X); Example transactions
Why concurrency control is needed • The lost update problem. This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect.
T1 Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y); T2 Read_item(X); X:=X+M; Write_item(X); Example
Temporary Update (or Dirty read) problem: The problem occurs when one transaction updates a database item and then the transaction fails for some reason. The updated item is accessed by another transaction before it is changed back to its original value.
T1 Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Transaction T1 fails and must change the value of X back to its old value; meanwhile T2 has read the “temporary” incorrect value of X T2 Read_item(X); X:=X+M; Write_item(X); The value of X is called dirty data Example
The incorrect summary problem: If one transaction is calculating an aggregate summary function on a number of records while other transactions are updating some of these records, the aggregate function may calculate some values before they are updated and others after they are updated.
T1 Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y); T3 Sum:=0; Read_iem(A); sum:=sum+A; Read_item(X); Sum:=sum+X; Read_item(Y); Sum:=sum+Y; Example
Types of Failures • A system crash • A transaction or system error (e.g. division by zero) • Local errors or exception conditions detected by the transaction (e.g. data for the transaction not found) • Concurrency control enforcement (abort the transaction) • Disk failure (read/write malfunction) • Physical problems and catastrophes.
Transaction states End transacion Begin transaction commit active Partially commited commited abort abort failed terminated
Commit point • A transaction T reaches its commit point when all its operations that access the database have been executed successfully and the effect of all transaction operations on the database have been recorded in the log. Beyond the commit point the transaction is said to be committed and its effect is assumed to be permanently recorded in the database
Desirable properties of transactions • Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all • Consistency preservation: A transaction is consistency preserving if its complete execution takes the database from one consistent state to another • Isolation: A transaction should appear as though it is being executed in isolation from other transactions • Durability: The changes applied to the database by a committed transaction must persist in the database.
Schedules Histories of transactions • A schedule S of n transactions T1, T2, …, Tn is an ordering of the operations of the transactions subject to the constraint that, for each transaction Ti that participates in S, the operations of Ti in S must appear in the same order in which they occur in Ti. • However operations from other transactions Tj can be interleaved with the operations of Ti. • We are interested in read_item, write_item, commit and abort operations.
Conflict • Two operations in a schedule are said to conflict if: • Belong to deferent transactions • Access the same item X • At least one of the operations is write_item(X)
Characterizing schedules based on recoverability • Recoverable schedules: Once a transaction T is committed it should never be necessary to rollback T. The other are called nonrecoverable. • A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed. • A schedule is said to be cascedeless or avoid cascading rollback if every transaction in the schedule reads only items that were written by committed transactions. • A schedule is called strict if transactions can neither read nor write an item X until the last transaction that wrote X has committed.
Characterizing schedules based on serializability • A schedule S is serial if for every transaction T participating in the schedule, all operations of T are executed consequently in the schedule; otherwise the schedule is called nonserial. (Only one transaction at a moment is active) • A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
Defining equivalence • Two schedules are said to be conflict equivalent if the order of any two conflicting operations is the same in both schedules. • View equivalence: • The idea is as long as each read operation of a transaction reads the result of the same write operation in both schedules the write operations of each transaction must produce the same results. The read operations are said to see the same view in both schedules.
In most practical systems methods are determined to ensure serializability without having to test schedules themselves (there exist algorithms though that test if schedules are serializable). According to this approach protocols (set of rules) ensure serializability of all schedules in which transactions participate.
Conditions under which schedule Sh is considered correct even though it is not serializable. • With the additional knowledge, or semantics, that the operations between each ri(I) and wi(I) are commutative, we know that the order of executing the sequences consisting of (read, update, write) is not important as long as each (read, update, write) sequence by a particular transaction Ti on a particular item I is not interrupted by conflicting operations.
Types of locks • Binary locks: A binary lock can have two states or values (locked and unlocked, 0 or 1 for simplicity). A distinct lock is associated with each database item. If the value of the lock on X is 1, item X cannot be accessed y a database operation that requests the item. If the value is 0 the item can be accessed when requested. Lock_item(X), unlock_item(X)
Types of locks • Shared/Exclusive (read/write) locks:Allows several transactions to access the same item X if they all access x for reading purposes only. However, if a transaction is to write an item X it must have exclusive access to X. read_lock(X), write_lock(X), unlock(X)
Conversion of locks: A transaction that already holds a lock on item X is allowed under certain conditions to convert the lock from one locked state to another. • Using binary locks or read/write locks in transactions does not guarantee serializability of schedules on its own.
Two phase locking • A transaction is said to follow the two phase locking protocol if all locking operations read_lock, write lock precede the first unlock operation in the transaction. • Expanding (first) phase during which new locks on items can be acquired but none can be released. • Shrinking (second) phase during which existing locks can be released but no new locks can be acquired.
T1 Read_lock(Y); Read_item(Y); Write_lock(X); Unlock(Y); Read_item(X); X:=X+Y; Write_iem(X); Unlock(X); T2 Read_lock(X); Read_item(X); Write_lock(Y); Unlock(X); Read_item(Y); Y:=Y+X; Write_iem(Y); Unlock(Y); Example T1 and T2 follow the two phase locking protocol
T1 Read_lock(Y); Read_item(Y); Unlock(Y); Write_lock(X); Read_item(X); X:=X+Y; Write_iem(X); Unlock(X); T2 Read_lock(X); Read_item(X); Unlock(X); Write_lock(Y); Read_item(Y); Y:=Y+X; Write_iem(Y); Unlock(Y); Example T1 and T2 do not follow the two phase locking protocol Nonserializable transactions
Variations of two phase locking • Basic 2PL • Conservative 2PL: Requires a transaction to lock all items it access before the transaction begins execution by predeclaring its readset (all items a transaction writes) and writeset (All items a transaction writes). If any of the preceding items cannot be locked the transaction waits until all items are available for locking. It is deadlock free protocol but difficult to implement. • Strict 2PL: The transaction does not release any of the write locks until after it commits or aborts. It is not deadlock free.
Deadlock • Deadlock occurs when each transaction T in a set of two or more transactions is waiting for some item that is locked by some other transaction T’ in the set. Hence, each transaction in the set is in a waiting queue waiting for one of the other transactions in the set to release the lock on an item.
Deadlock prevention protocols • E.g. Conservative 2PL protocol. • Order all items in the database and make sure that a transaction that needs several items will lock them according to that order. (it is not practical in database context) • A transaction time stamp TS(T) is a unique identifier assigned to each transaction.
Time stamp deadlock solutions Suppose that transaction Ti tries to lock an item X but is not able to because X is locked by some other transaction Tj with a conflicting lock. • Wait-die: if TS(Ti)<TS(Tj) then (Ti is older than Tj) Ti is allowed to wait; otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the same timestamp. • Wound-wait: if TS(Ti)<TS(Tj) then (Ti is older than Tj) abort Tj (Ti wounds Tj) and restart it later with the same timestamp; otherwise (Ti younger than Tj) Ti is allowed to wait. • Both schemas end up aborting the younger of the two transactions that may be involved in a deadlock.
Deadlock detection and timeouts • It is a deadlock detection method. • It is better applied when the transactions are short and each transaction locks only a few items or the transaction load is light. • Construct a wait for graph. Whenever a transaction Ti is waiting to lock an item X that is currently locked by a transaction Tj a directed edge is created in the wait for graph. • Timeouts: if a transaction waits for a period longer than a system defined timeout period the system assumes that the transaction may be deadlocked and aborts it.
Starvation • Starvation occurs when a transaction cannot proceed for an indefinite period of time while other transactions in the system continue normally. • Solution: fair waiting scheme like using a first come first served queue. The wait die and wound wait schemes avoid starvation.
Granularity level considerations • The larger the data item size the lower the degree of concurrency permitted. • The smaller the data item size is the more the items in the database (large number of locks to be handled by the system lock manager - more lock and unlock operations will be permitted causing a higher overhead.)
Multiple granularity level locking • The best granularity size depends on the given transaction. A database system should support multiple levels of granularity. • Intention locks: A transaction indicates along the path from the root to the desired node what type of lock shared or exclusive it will require from one of the node’s descendants.