Understanding Concurrency Control for Improved Transaction Processing

Transaction Processing: Concurrency and Serializability 10/4/05

Interleave transactions to improve concurrency; increasing concurrency can increase throughput (performance). • Some interleaved transactions will never violate isolation because they act on different data. • Some interleaved transactions MAY violate isolation.
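As a hedged illustration of the last point, the sketch below replays two hypothetical transactions that each read a and write back the value they read plus one. The function name `run` and the tuple encoding `(transaction, "r" or "w", item)` are assumptions made for this example only.

```python
def run(schedule, initial):
    """Replay a schedule; every write stores (value last read by that txn) + 1."""
    db = dict(initial)
    last_read = {}  # (txn, item) -> value that txn most recently read
    for txn, kind, item in schedule:
        if kind == "r":
            last_read[(txn, item)] = db[item]
        else:
            db[item] = last_read[(txn, item)] + 1
    return db

serial = [(1, "r", "a"), (1, "w", "a"), (2, "r", "a"), (2, "w", "a")]
interleaved = [(1, "r", "a"), (2, "r", "a"), (1, "w", "a"), (2, "w", "a")]

serial_result = run(serial, {"a": 0})            # both increments survive
interleaved_result = run(interleaved, {"a": 0})  # one increment is lost
```

In the interleaved run both transactions read the same starting value, so the second write overwrites the first.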

Concurrency control: An algorithm to (hopefully) permit good interleaving and refuse bad interleaving. • NB: executing a concurrency control algorithm adds overhead to the transaction manager. • That overhead increases response time • and reduces throughput.

Concurrency control • Input to the algorithm is the stream of arriving requests for database reads/writes. • The input comes from the various transactions. • Output is a sequence of database read/write requests. • The output is passed to the portion of the data manager that actually accesses the disk.

A serial schedule has no interleaving between transactions (a transaction completes before another begins). • A schedule is correct if it is equivalent to a serial schedule.
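A minimal sketch of the "no interleaving" test, assuming a schedule is encoded as a list of `(transaction, kind, item)` tuples (an encoding chosen for this example). The hypothetical helper `is_serial` checks that once a transaction gives way to another, it never reappears.

```python
def is_serial(schedule):
    """Return True if each transaction's operations form one contiguous run."""
    finished = set()   # transactions whose run has already ended
    current = None     # transaction whose run is in progress
    for txn, _kind, _item in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn resumed after another one ran
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial = [(1, "r", "a"), (1, "w", "a"), (2, "r", "a")]
interleaved = [(1, "r", "a"), (2, "r", "a"), (1, "w", "a")]
```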

Isolation levels • By relaxing the isolation requirement, more interleaving is possible -- at a greater risk to data integrity. • Isolation levels characterize the amount of isolation imposed.

Commuting operations
Two operations, p1 and p2, commute if, for all possible initial database states:
• p1 returns the same value when executed in order <p1, p2> or <p2, p1>,
• p2 returns the same value when executed in order <p1, p2> or <p2, p1>,
• the database state produced by both sequences is the same.
Notes:
• Commutativity is symmetric.
• Two operations on different data items always commute.
• Two operations on the same data item MAY commute.
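The definition can be tried out with a small sketch. It assumes an operation is modeled as a function from a state (a dict) to a pair `(returned value, new state)`; the helpers `read`, `write`, `add` and `commute` are hypothetical names. The definition quantifies over all initial states, whereas this sketch only tests the sample states it is given, so a True result is evidence rather than proof.

```python
def read(item):
    return lambda state: (state[item], state)

def write(item, value):
    return lambda state: (None, {**state, item: value})

def add(item, delta):
    return lambda state: (None, {**state, item: state[item] + delta})

def commute(p1, p2, states):
    """Check the three conditions of the definition on each sample state."""
    for s in states:
        r1_a, mid = p1(s)          # order <p1, p2>
        r2_a, end_a = p2(mid)
        r2_b, mid = p2(s)          # order <p2, p1>
        r1_b, end_b = p1(mid)
        if (r1_a, r2_a, end_a) != (r1_b, r2_b, end_b):
            return False
    return True

samples = [{"a": 0, "b": 0}, {"a": 3, "b": 7}]
```

With these samples, `commute(add("a", 1), add("a", 2), samples)` holds, which is one instance of two operations on the same item that do commute.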

Conflicting operations
• Two operations that do not commute are conflicting operations.
• E.g., S1: <s11, s12> and S1’: <s12, s11>. If they are run on the same starting state and end up in different states, then s11 and s12 conflict.
• Consider operations issued by two different transactions:
  • A read and a read on the same item always commute.
  • A read and a write on the same item conflict because, though the final state is the same, the value returned by the read depends on the order of the ops.
  • A write and a write on the same item conflict because the final state depends on the order of the ops.
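The read/write rules can be written as a one-line predicate. This is a sketch under the assumption that an operation is a `(transaction, kind, item)` tuple with kind `"r"` or `"w"`; the name `conflicts` is chosen for the example.

```python
def conflicts(op1, op2):
    """True if the ops come from different transactions, touch the same item,
    and at least one of them is a write."""
    t1, k1, x1 = op1
    t2, k2, x2 = op2
    return t1 != t2 and x1 == x2 and "w" in (k1, k2)

read_read = conflicts((1, "r", "a"), (2, "r", "a"))
read_write = conflicts((1, "r", "a"), (2, "w", "a"))
write_write = conflicts((1, "w", "a"), (2, "w", "a"))
different_items = conflicts((1, "w", "a"), (2, "w", "b"))
```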

If S2 can be obtained from S1 by “swapping” commuting operations, then S1 and S2 are equivalent. • Equivalence of schedules is transitive!

Example schedules • Two interleaved transactions T1 (t11, t12), T2 (t21, t22): • S1: s11, s12, s13, s14 • Suppose s12 and s13 commute; then • S2: s11, s13, s12, s14 is equivalent to S1: same start state, same end state.

Schedule equivalence (not the same as E&N’s ‘complete schedule’ definition): • Two schedules of the same set of ops are equivalent iff conflicting operations are ordered in the same way in both schedules. • ==> A schedule S2 can be derived from a schedule S1 by interchanging commuting operations iff conflicting operations are ordered in the same way in both schedules.
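The "same order of conflicting operations" test can be sketched directly. The code assumes read/write operations encoded as `(transaction, kind, item)` tuples and assumes that no operation appears twice in a schedule; `conflict_pairs` and `conflict_equivalent` are hypothetical helper names.

```python
from itertools import combinations

def conflicts(op1, op2):
    t1, k1, x1 = op1
    t2, k2, x2 = op2
    return t1 != t2 and x1 == x2 and "w" in (k1, k2)

def conflict_pairs(schedule):
    """All ordered pairs (earlier, later) of conflicting operations."""
    return {(p, q) for p, q in combinations(schedule, 2) if conflicts(p, q)}

def conflict_equivalent(s1, s2):
    """Same set of ops, and every conflicting pair ordered the same way."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

x = [(1, "r", "a"), (2, "r", "b"), (1, "w", "a"), (2, "w", "b")]
y = [(1, "r", "a"), (1, "w", "a"), (2, "r", "b"), (2, "w", "b")]
p = [(1, "r", "a"), (2, "w", "a")]
q = [(2, "w", "a"), (1, "r", "a")]
```

Here x and y differ only in the order of operations on different items, so they are equivalent; p and q reverse a conflicting read/write pair, so they are not.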

Restatement of Serializable Schedule • A schedule is serializable if it is equivalent to a serial schedule • Equivalent construction: • Commute commuting operators and use transitivity of equivalence, or • Conflicting operations are in the same order in both schedules.

Try this: is S1 serializable (what commutations?), S2? S3?
• T1: <r(a), r(b), b += a, w(b)>
• T2: <r(a), a++, w(a)>
• S1: <r1(a), r2(a), w2(a), r1(b), w1(b)>
• S2: <r1(a), r1(b), r2(a), w1(b), w2(a)>
• S3: <r2(a), r1(a), w2(a), r1(b), w1(b)>

Try this: is S4 serializable? • S4: <r1(a), r2(b), w2(a), w1(b)>

More on schedule equivalence • The preceding definition of equivalence (by commuting, AKA by maintaining the order of conflicting ops) is called conflict equivalence. • A different kind of equivalence is view equivalence: two schedules of the same set of ops are view equivalent if both of the following are true: • Corresponding read ops in each schedule return the same values, • Both schedules yield the same final state.

View equivalence • If corresponding read ops in both schedules return the same values, then the transactions perform the same calculations and write the same results! • I.e., transactions in both schedules have the same view of the database. • Conflict equivalence implies view equivalence. • View equivalence does not imply conflict equivalence. • I.e., conflict equivalence is the stronger condition; but it turns out that conflict equivalence is easier to use for concurrency control.
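A rough sketch of a view-equivalence check. It swaps "returns the same value" for the usual structural stand-in: each read sees the same writer (or the initial value) in both schedules, and each item has the same final writer. It assumes `(transaction, kind, item)` tuples and that a transaction writes a given item at most once; `view` and `view_equivalent` are hypothetical names.

```python
from collections import Counter

def view(schedule):
    """Return (reads_from, final_writer) for a schedule."""
    last_writer = {}      # item -> transaction that most recently wrote it
    seen = Counter()      # (txn, item) -> number of reads so far
    reads_from = {}       # (txn, item, k) -> writer seen by txn's k-th read
    for txn, kind, item in schedule:
        if kind == "r":
            seen[(txn, item)] += 1
            reads_from[(txn, item, seen[(txn, item)])] = last_writer.get(item)
        else:
            last_writer[item] = txn
    return reads_from, last_writer

def view_equivalent(s1, s2):
    return sorted(s1) == sorted(s2) and view(s1) == view(s2)

# T2 and T3 write a without reading it (blind writes).
s = [(1, "r", "a"), (2, "w", "a"), (1, "w", "a"), (3, "w", "a")]
serial = [(1, "r", "a"), (1, "w", "a"), (2, "w", "a"), (3, "w", "a")]
other = [(2, "w", "a"), (1, "r", "a"), (1, "w", "a"), (3, "w", "a")]
```

Schedules s and serial are view equivalent under this check even though they order the conflicting writes w2(a) and w1(a) differently, so they are not conflict equivalent.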

Serialization graphs • A schedule, S, is represented as a directed graph. • Nodes are (committed) transactions. • Edge from Ti to Tj (Ti -> Tj) if: • Some op in Ti, pi, conflicts with some op, pj, in Tj, and • pi appears before pj in S.

Example
• S1: <r1(a), r2(a), w2(a), r1(b), w2(b)>
• T2 writes a after T1 reads a. The ops do not commute: r1(a), w2(a).
• Graph of S1: T1 -> T2
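The edge rule can be applied mechanically. The sketch below assumes `(transaction, kind, item)` tuples for read/write operations and treats every transaction in the schedule as committed; `serialization_graph` is a hypothetical helper name.

```python
from itertools import combinations

def serialization_graph(schedule):
    """Return the set of edges (Ti, Tj): some op of Ti conflicts with,
    and precedes, some op of Tj."""
    edges = set()
    # combinations() yields pairs in schedule order: the first op is earlier.
    for (ti, ki, xi), (tj, kj, xj) in combinations(schedule, 2):
        if ti != tj and xi == xj and "w" in (ki, kj):
            edges.add((ti, tj))
    return edges

# S1 from the example above.
s1 = [(1, "r", "a"), (2, "r", "a"), (2, "w", "a"), (1, "r", "b"), (2, "w", "b")]
graph = serialization_graph(s1)
```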

A schedule is conflict serializable iff its serialization graph is acyclic.
[Figure: serialization graph with nodes T1 through T7]
Topological sorts give conflict equivalent serial schedules, e.g.: T1, T3, T5, T2, T6, T7, T4. Others?
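A sketch of the acyclicity test plus topological sort, using Python's standard `graphlib` module (Python 3.9+). The helper name `serial_order` and the edge sets below are made up for illustration; they are not the edges of the figure.

```python
from graphlib import TopologicalSorter, CycleError

def serial_order(transactions, edges):
    """Return one conflict equivalent serial order, or None if the graph
    has a cycle (the schedule is not conflict serializable)."""
    predecessors = {t: set() for t in transactions}
    for before, after in edges:
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None

chain = serial_order([1, 2, 3], {(1, 2), (2, 3)})   # T1 -> T2 -> T3
cyclic = serial_order([1, 2], {(1, 2), (2, 1)})     # T1 -> T2 -> T1
```

When the graph admits several topological orders, `static_order` returns just one of them; the others are equally valid serial schedules.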

In class
• Using concurrent transactions, deposit to a, withdraw from a, make a (non-serial) schedule:
  • Give the serialization graph.
  • Is it acyclic? If so, give a conflict equivalent serial schedule.
  • Identify commuting operations.
  • Identify conflicting operations.
• Using the concurrent deposit, transfer and withdraw transactions (deposit to a, withdraw from b, transfer takes from b and puts in a), make a (non-serial) schedule:
  • Give the serialization graph.
  • Is it acyclic? Is there a serial schedule?
  • How many total pairs of operations are there?
  • Identify at least some commuting operations.
  • Identify at least some conflicting operations.

A strict concurrency control • A transaction is not allowed to read or write data that has been written by another still active transaction. (Recoverability topic later). • Conflict avoidance: • If operation requests by T1 and T2 do not conflict, they are granted. • Requests don’t conflict if either: • Requests are to different data items, OR • Requests are both reads.
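A minimal sketch of the grant rule, assuming a request is refused (rather than queued) when it conflicts with access already granted to another active transaction. The class `ConflictAvoider` and its methods are hypothetical names for this example.

```python
class ConflictAvoider:
    def __init__(self):
        # item -> list of (txn, kind) granted to still-active transactions
        self.granted = {}

    def request(self, txn, kind, item):
        """Grant the request unless another active transaction holds a
        conflicting access (same item, at least one write)."""
        for holder, held_kind in self.granted.get(item, []):
            if holder != txn and "w" in (kind, held_kind):
                return False
        self.granted.setdefault(item, []).append((txn, kind))
        return True

    def finish(self, txn):
        """Forget everything granted to txn once it terminates."""
        for item in self.granted:
            self.granted[item] = [(t, k) for t, k in self.granted[item] if t != txn]

cc = ConflictAvoider()
results = [
    cc.request(1, "r", "a"),   # granted
    cc.request(2, "r", "a"),   # granted: both requests are reads
    cc.request(2, "w", "a"),   # refused: T1 has read a
    cc.request(2, "w", "b"),   # granted: different item
    cc.request(1, "r", "b"),   # refused: T2, still active, wrote b
]
cc.finish(2)
after_finish = cc.request(1, "r", "b")   # granted once T2 has terminated
```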

In class
• Make the conflict table for the previous algorithm (put X for conflicting requests):

                   Granted op:
  Requested op:    read     write
  read
  write

But if you make a transaction wait … DEADLOCK (a cycle of k transactions waiting for each other)
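A cycle of waiting transactions can be found with a depth-first search. The sketch assumes the waiting relation is available as a dict mapping each transaction to the set of transactions it waits for; `find_deadlock` is a hypothetical helper name.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a waiting cycle, or None."""
    path, done = [], set()   # current DFS path, fully explored transactions

    def visit(txn):
        if txn in done:
            return None
        if txn in path:
            return path[path.index(txn):]   # the cycle closes here
        path.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        path.pop()
        done.add(txn)
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None

deadlocked = find_deadlock({1: {2}, 2: {3}, 3: {1}})   # k = 3 cycle
no_deadlock = find_deadlock({1: {2}, 2: {3}})          # a chain, no cycle
```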

Dealing with deadlock
• Prevention: maintain a data structure that is checked to see whether a request may result in deadlock. If so, some transaction involved in the deadlock must be aborted.
• Timeout: if the time to execute exceeds a threshold, force an abort.
• Timestamp: timestamp the start of each transaction. Use the timestamps to implement a conflict resolution policy:
  • An older transaction never waits for a younger one (e.g., by aborting the younger, even though the younger has been waiting a long time),
  • A younger transaction can only wait for an older one (place the younger on a wait-list).
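The timestamp policy described here is commonly called wound-wait. A minimal sketch, assuming unique start timestamps where a smaller value means an older transaction; the function name `resolve` and the returned strings are made up for the example.

```python
def resolve(requester_ts, holder_ts):
    """Decide what happens when the requester conflicts with the holder."""
    if requester_ts < holder_ts:
        # Requester is older: it never waits, so the younger holder is aborted.
        return "abort holder"
    # Requester is younger: it is placed on the wait-list behind the older holder.
    return "wait"

older_requests = resolve(requester_ts=5, holder_ts=9)
younger_requests = resolve(requester_ts=9, holder_ts=5)
```

Because waiting only ever goes from younger to older, a cycle of waiting transactions cannot form.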

Manual locking: an alternative to AUTOMATIC locking • A transaction explicitly requests concurrency control to grant a lock on a data item, then makes the read/write request. • Concurrency control grants (or refuses) locks.

UNLOCKING • Can be automatic -- when a transaction terminates, all locks held by it are released. • Can be manual -- transaction explicitly releases a lock.

Two phase locking: 2PL • A transaction follows the 2PL protocol if it obtains all of its locks before releasing any of them … • lock phase, followed by unlock phase • Automatic locking is 2PL. • Automatic unlocking is 2PL. • The 2PL protocol produces serializable schedules.
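Whether one transaction's lock activity is two phase can be checked in a single pass. The sketch assumes the activity is given as a list of `("lock" or "unlock", item)` pairs; `is_two_phase` is a hypothetical helper name.

```python
def is_two_phase(actions):
    """True if no lock is acquired after the first unlock."""
    unlocking = False
    for action, _item in actions:
        if action == "unlock":
            unlocking = True
        elif action == "lock" and unlocking:
            return False      # a lock after an unlock breaks 2PL
    return True

two_phase = [("lock", "a"), ("lock", "b"), ("unlock", "a"), ("unlock", "b")]
not_two_phase = [("lock", "a"), ("unlock", "a"), ("lock", "b"), ("unlock", "b")]
```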

For next time, we’ll discuss the paper in the RedBook: “Granularity of Locks …” • How are the different lock modes used? • What are the degrees of consistency? • How does the locking protocol relate to degrees of consistency? • What are the overhead costs of the different locking protocols?
