Advanced Database Course (ESED5204)

University of Palestine Software Engineering Department Advanced Database Course (ESED5204) Eng. Hanan Alyazji

Concurrency Control

Concurrency Control • Problem : in a multi-user environment, simultaneous access to data can result in interference and data loss. • Solution : Concurrency Control • Multiple users access databases and use computer systems simultaneously. Example: In concurrent execution environment if T1 conflicts with T2 over a data item A, then the existing concurrency control decides if T1 or T2 should get the A and if the other transaction is rolled-back or waits.

What is Concurrent Process? • The process of managing simultaneous operations against a database so that data integrity is maintained and the operations do not interfere with each other in a multi-user environment. • Concurrency Control process: Ensure serializability of transactions in multiuser database environment. Example: • Airline reservation system. • An airline reservation system is used by hundreds of travel agents and reservation clerks concurrently.

Why Concurrent Process? • Better transaction throughput and response time. • Better utilization of resource. BUT simultaneous execution of transactions creates data integrity & consistency problems: • Lost updates • Uncommitted data • Inconsistent retrievals

1. Lost Update Problem • Two transactions accessing the same database item have their operations interleaved in a way that makes the database item incorrect. Transaction A Transaction B 1 Read data B writes over the same data written by A, without taking into account the changes made by A 1 Read data 2 Write data Write data 2

Example: T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) X = X – N write_item (X ) t i m e • Aftertermination of T2, X = X + M. T1's update to X is lost because T2 wrote over X. Generally, lost update problem is characterized by: • T2 reads X, T1 writes X, and T2 writes X.

Example: X=4 Y=8 N=2 M=3 • item X has incorrect value because its update from T1 is “lost” (overwritten) • T2 reads the value of X before T1 changes it in the database and hence the updated database value resulting from T1 is lost.

2. Uncommitted Read Problem: • Data are not committed when 2 transactions T1 & T2 are executed concurrently and the first transaction is rolled back after second transaction has already accessed uncommitted data. • An uncommitted read problem occurs when a first transaction relies on a change which has not yet been committed, which is rolled back after the second transaction has begun. • An Uncommitted Readis also called “Dirty Read”.

Dirty read • One transaction updates a database item and then the transaction fails. The updated item is accessed by another transaction before it is changed back to its original value. • Dirty Read is data written by an uncommitted transaction. Transaction A Transaction A Transaction B Transaction B 1 1 Write data Write data An Uncommitted Read (also called a Dirty Read) happens when B reads uncommitted data modified by A 1 1 Read data Read data 2 2 2 2

Example: T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) X = X – N write_item (X ) T1 fails t i m e • T2 has already read item X = X - N value, and that value is going to be altered by DBMS back to X. Generally, dirty read problem is characterized by: • T1 writes X, T2 reads X, and T1 fails.

3. Inconsistent retrievals Problem: • Inconsistent retrievals or unrepeatable read Occurs when transaction reads several values but second transaction updates some of them during execution of first. • A transaction, if it reads the same data item more than once, should always read the same value. • It occurs when a transaction T1 reads a database item A, another transaction T2 updates A and commits, and then T1 reads the altered A again.

Example: T2 T1 read_item (X) X = X + M write_item (X) read_item ( X ) read_item (X ) t i m e • Transaction T1has got two different values of X in two subsequent reads, because T2has changed it in the meantime. Generally, unrepeatable read problem is characterized by: • T1 reads X, T2 writes X, and T1 reads X.

Important Note: Having two transactions T1 & T2 if: • T2 reads X, T1 writes X, and T2 writes  Lost update problem • T1 writes X, T2 reads X, and T1 fails  dirty readproblem • T1 reads X, T2 writes X, and T1 reads X  unrepeatable readproblem

A Question for You …… • What is the difference between: Dirty read and Unrepeatable read problem? • Answers: • The difference is: • The dirty read is a consequence of reading updates made by a transaction before it has successfully finished (and has even failed later). • The unrepeatable read is a consequence of allowing a transaction to read data that the other one is altering.

Concurrency Control Techniques • Two basic concurrency control techniques: • Locking • Timestamping • Both are conservative approaches: delay transactions in case they conflict with other transactions.

Locking • A lock is a variable associated with a data item that describes the status of the item with respect to possible operations that can be applied to it. • Locking is an operation which secures a permission to Read or a permission to Write a data item for a transaction. • Example:Lock (X): Data item X is locked in behalf of the requesting transaction. • Unlocking is an operation which removes these permissions from the data item. • Example:Unlock (X): Data item X is made available to all other transactions.

Example: T1 read_lock (Y) read_item (Y); unlock (Y); write_lock (X); read_item (X); X:=X+Y; write_item (X); unlock (X);

Locking • A transaction locks a portion of the database to prevent concurrency problems. • Before reading from a resource a transaction must acquire a read-lock and Before writing to a resource a transaction must acquire a write-lock. • Two locks modes: • Shared mode:shared lock (X). More than one transaction can apply share lock on X for reading its value but no write lock can be applied on X by any other transaction. • Exclusive mode:Write lock (X). Only one write lock on X can exist at any time and no shared lock can be applied by any other transaction on X.

Locking - Basic Rules • If transaction has read lock on an item, the item can be read but not updated. • If transaction has write lock on an item, the item can be both read and updated. • Reads cannot conflict, so multiple transactions can hold read locks simultaneously on the same item. • Write lock gives one transaction exclusive access to an item. • Need to commit a transaction before a write lock is released.

Database requires that all transactions should be well-formed. A transaction is well-formed if: • It must lock the data item before it reads or writes to it. • It must unlock the data item after it is done with it. • It must not lock an already locked data item. • It must not try to unlock a free data item.

Two-Phase Locking (2PL) • Two-Phase Locking occurs if all locking operations precede first unlock operation in the transaction. • All lock operations must precede the first unlock operation. Theorem: If all transactions in a schedule obey locking rules and two phase locking protocol, the schedule is a conflict serializable one.

Example: T1 T2 read-lock(X) read-lock(X) Read(X) Read(X) write-lock(Y) unlock(X) unlock(X) write-lock(Y) Read(Y) Read(Y) Y = Y + X Y = Y + X Write(Y) Write(Y) unlock(Y) unlock(Y) • T1 follows 2PL protocol • All of its locks are acquired before it releases any of them • T2 does not • It releases its lock on X and then goes on to later acquire a lock o Y

Exercise:Which transaction follow the 2PL protocol?? T3 read_lock (Y); read_item (Y); write_lock (X); unlock(Y); read_item (X); X:=X+Y; write_item (X); unlock (X); T1 read_lock (Y) read_item (Y); unlock (Y); write_lock (X); read_item (X); X:=X+Y; write_item (X); unlock (X); T2 read_lock (X); read_item (X); unlock (X); Write_lock (Y); read_item (Y); Y:=X+Y; write_item (Y); unlock (Y); T4 read_lock (X); read_item (X); Write_lock (Y); unlock(X) read_item (Y); Y:=X+Y; write_item (Y); unlock (Y)

Lost Update and Two Phase Locking T2 T1 write_lock(X ) //has to wait write_lock(X ) read_item (X) X = X + M write_item(X ) unlock(X ) read_lock(X ) read_item ( X ) X = X – N write_lock(X ) write_item (X ) unlock(X ) T2 can not obtain a write_lock on X since T1 holds a read lock on X and it has to wait. When T1 releases lock on X, T2 acquires a lock on X and finishes successfully. t i m e Two phase locking provides for a safe conflict serializable schedule.

Two Phase Locking: Dirty Read T2 T1 write_lock(X ) read_item ( X ) X = X – N write_item (X ) write_lock(Y ) unlock(X ) read_item ( Y ) Y = Y + Q write_item (Y ) unlock(Y ) //T1 fails before it commits write_lock(X ) read_item X X = X + M write_item(X ) unlock(X ) If T1 gets exclusive lock on X first, T2 has to wait until T1 unlocks X Two phase locking alone does not solve the dirty read problem, because T2 is allowed to read uncommitted database item X t i m e Two phase locking alone does not solve the dirty read problem.

T1 T2 Deadlocks • 2PL can lead to deadlocks • Different transactions wait for each other to release locks. • Only one way to break deadlock: abort one or more of the transactions. • Represent the waiting relationship as waiting graph • Directed edge from Ti to Tj if Ti waits for Tj T1:T2: xlock (x) write (x) xlock (y) write (y) xlock (y) xlock (x) waiting graph

T1 T2 2. Deadlock Detection • Alternative is to allow deadlocks to happen but to check for them and fix them if found. • Create a waits-for graph: • Nodes are transactions • There is an edge from Ti to Tj if Ti is waiting for Tj to release a lock. • Periodically check for cycles in the waits-for graph.

Deadlock Detection: Example: T1: S(A), S(D), S(B) T2: X(B) X(C) T3: S(D), S(C), X(A) T4: X(B) T1 T2 T4 T3

Deadlock • Two ways of dealing with deadlocks: • Deadlock prevention. • Deadlock detection. 1.Deadlock Prevention: • Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible: • Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts • Wound-wait: If Ti has higher priority, Tj aborts; otherwise Ti waits

wait? Wait-Die Algorithm: • Transactions given a timestamp when they arrive …. ts(Ti) • Ti can only wait for Tj if ts(Ti) < ts(Tj) ...else die Example1: T1 (ts =10) T2 (ts =20) T3 (ts =25) wait wait

Example 2: T1 (ts =22) T2 (ts =20) T3 (ts =25) requests A: wait for T2 or T3? Note: ts between 20 and 25. wait(A)

One option: T1 waits just for T3, transaction holding lock. But when T2 gets lock, T1 will have to die! T1 (ts =22) T2 (ts =20) T3 (ts =25) wait(A) wait(A)

Another option: T1 only gets A lock after T2, T3 complete, so T1 waits for both T2, T3  T1 dies right away! T1 (ts =22) T2 (ts =20) T3 (ts =25) wait(A) wait(A) wait(A)

Yet another option: T1 preempts T2, so T1 only waits for T3; T2 then waits for T3 and T1  T2 dies T1 (ts =22) T2 (ts =20) T3 (ts =25) wait(A) wait(A) wait(A)

wait Wound-wait Algorithm: • Transactions given a timestamp when they arrive … ts(Ti) • Ti wounds Tj if ts(Ti) < ts(Tj) else Ti waits “Wound”: Tj rolls back and gives lock to Ti Example1: T1 (ts =25) T2 (ts =20) T3 (ts =10) wait wait

Example2: T1 (ts =15) T2 (ts =20) T3 (ts =10) requests A: wait for T2 or T3? Note: ts between 10 and 20. wait(A)

One option: T1 waits just for T3, transaction holding lock. But when T2 gets lock, T1 waits for T2 and wounds T2. T1 (ts =15) T2 (ts =20) T3 (ts =10) wait(A) wait(A)

Another option: T1 only gets A lock after T2, T3 complete, so T1 waits for both T2, T3  T2 wounded right away! T1 (ts =15) T2 (ts =20) T3 (ts =10) wait(A) wait(A) wait(A)

Yet another option: T1 preempts T2, so T1 only waits for T3; T2 then waits for T3 and T1...  T2 is spared! T1 (ts =15) T2 (ts =20) T3 (ts =10) wait(A) wait(A) wait(A)

Timestamping • A unique identifier created by DBMS that indicates relative starting time of a transaction. • Transactions with smaller timestamps, get priority in the event of conflict. • A larger timestamp value indicates a younger transaction. • Timestamp based algorithm uses timestamp to serialize the execution of concurrent transactions. • Conflict is resolved by rolling back and restarting transaction. • No locks so no deadlock.

Timestamping • In order to use timestamp values for serializable scheduling of transactions, the transaction manager of a DBMS associates with each database item X two timestamp (TS) values: • Read_TS(X): The timestamp (identifier) of the youngest transaction that has read X successfully. • Write_TS(X): The timestamp (identifier) of the youngest transaction that has written X successfully. Note: If Ti starts before Tj , then TS (Ti ) < TS (Tj) (*Ti is older than Tj)

Basic timestamp ordering • When a transaction tries to read a value • Is there a younger transaction that has already written the data item? • True => Abort (older) transaction • False => Continue and update write timestamp • When a transaction tries to write a data item • Is there a younger transaction that has already read or written the same data? • True => Abort the (older) transaction • False => Continue and update read timestamp

Example 1: • When a transaction tries to read a value • Is there a younger transaction that has already written the data item? • True => Abort (older) transaction • False => Continue No problem, T2 has only read x Transactions can both commit, no concurrency problem.

When a transaction tries to write a data item • Is there a younger transaction that has already read or written the same data? • True => Abort the (older) transaction • False => Continue Example 2: No problem, T2 has only read x Problem, T2 has written x T1 will be rolled back and started again with a new timestamp

Questions? ?

Advanced Database Course (ESED5204)

Advanced Database Course (ESED5204)

Presentation Transcript

Database Course

Database Course

INF5100 Advanced Database Systems

Course: Database Design

Advanced Database Systems

Advanced Database Topics

Advanced Database Topics

Advanced Database Discussion

Advanced Database Systems

Advanced Course

Advanced Database Systems

Advanced Database Systems

Advanced Database Systems

ADVANCED DATABASE CONCEPTS

Advanced Database Systems

Advanced Database Course (ESED5204)

Advanced Database Systems

CM036: Advanced Database

Advanced Database Systems

CISB514: ADVANCED DATABASE

Advanced Database Discussion

Advanced Database Systems