Concurrency Control

Concurrency Control

Concurrency Control Techniques • Protocols that guarantee serializability 1. Locking 2. Timestamps 3. Multiversion 4. Optimistic - validation or certification

Locking

Serlializable? • R1(X) R4(X) R2(Y) W4(X) R2(X) W2(Y) C1 C4 C2 • R1(Y) R2(X) W2(Y) C2 R1(X) W1(X) C1

Locking • Locking - main technique to control concurrent execution • execution based on concept of locking data items • locks granted to a specific transaction for a particular data item • a lock forces mutual exclusion on data items • lock manager subsystem to keep track of and control access to locks

Binary locks • binary lock - 2 states or values • locked, unlocked • distinct lock for each data item • Suppose: • transaction T must issue a lock request before R, W • issue unlock after R, W • Too restrictive, what if want transactions to read at same time?

Multiple-mode lock • multiple-mode lock - 3 types, indivisible • Read lock - share lock • Write lock - exclusive lock • Unlock - unlock a lock needs 3 fields: <data, lock(R,W), # of reads>

Locking Rules for multiple-mode (theoretical) Rules: 1. T must issue request for R or W lock before any R(X) 2. T must issue request for W lock before any W(X) 3. T must issue Unlock(X) after R or W completed OR once commit assume all remaining locks unlocked 4. If T issues request for W lock(X) and holds R lock on X, must upgrade R lock(X) to W lock(X) 5. If issue R lock(X) and hold W lock on X, must downgrade W lock(X) to R lock(X) 6. T will not issue Unlock(X) unless hold R lock(X) or W lock(X)

Dirty Write example R1(Y) R2(X) W2(Y) C2 R1(X) W1(X) C1 Will locking rules prevent dirty write? T1 granted request R lock on Y, R1(Y), then Unlock T2 granted request R lock on X, R2(X), then Unlock T2 granted request W lock on Y, W2(Y), then Unlock C2 T1 granted request R lock on X, R1(X) T1 granted upgrade to W lock on X, W1(X), then Unlock C1 • Does not guarantee serializability • need 2PL

Basic Two-Phase Locking 2PL • 2PL has a: growing phase - locks acquired shrinking phase - locks released – cannot request new lock during this phase • Advantage • Guarantees serializability • Disadvantage • 2PL limits concurrency

Requesting/Releasing locks • Can upgrade R lock to W lock - must be done in growing phase • Can downgrade W lock to R lock - must be done in shrinking phase • If R(X) then W(X),we will assume request lock as W lock(X) for the R(X) • Theoretically, can release locks whenever done with item, as long as don't request new locks on any other data items • We assume will release when transaction commits

Lock conflicts • If lock conflict, force requesting transaction to block (Bi) wait for transaction holding lock, proceed with other transactions W1(Y) R2(Y) C1 C2 T1 Requests W_lock on Y W1(Y) T2 Requests R_lock on Y and B2 on R2(Y) C1 Unlocks R2(Y) C2

Lost update problem R1(A) R2(A) W1(A) C1 W2(A) C2 R1(A) (B2 on R2(A)) W1(A) C1 R2(A) W2(A) C2 T2 blocked, eventually both commit – no lost update

Example R1(X)R2(X)W1(Y)W2(Z) • Should this block? • NO!! R1(X) R2(X) are not conflicting locks, make sure you understand this

Any problems with Basic 2PL? R1(Y) R2(X) W2(Y) C2 W1(X) C1 R1(Y) R2(X) (B2 on W2(Y)) (B1 on W1(X)) T2 blocked then T1 blocked no lost update, but DEADLOCK Basic 2PL • enforces serializability but deadlocks

Types of 2PL • Basic 2PL • Request locks as needed, release at end of growing phase • Strict 2PL • does not release any write locks until commits or aborts • Not deadlock free (But easier to recover from) • Strong: does not release any read locks until commits or aborts 2PL is most common type of concurrency control in commercial systems, but it is implemented as follows: • Strong Strict 2PL (SS2PL) a.k.a. rigorous 2PL Not really two-phases, only one-phase …

Strategies for deadlock in 2PL 1. Conservative 2PL – not popular • requires transactions to lock all data items before begin executing • prevents deadlock • Declare readset, writeset • or request data items in order

Strategies for deadlock in 2PL 2. No waiting - if Ti unable to obtain a lock, abort/restart after a time delay. T's can abort and restart needlessly. 3. Cautious waiting - if Ti tried to lock X and X is already locked by Tj and if Tj is not blocked: Ti waits else Tj aborts Deadlock free - total ordering of blocking times 4. Timeouts - if Ti waits > threshold, Ti is aborted

Strategies cont’d 5. Deadlock detection - useful if T's rarely access the same items and each T only locks a few items • construct a waits for graph (maintained by lock scheduler) • deadlock when cycle in graph • Problems: • victim selection - which to abort • Livelock - if repeatedly choose same victim to abort/restart • if wait indefinite period of time, need a fair waiting scheme – FCFS

Strategies cont’d 6. Timestamps to prevent deadlocks • Transactions can be assigned timestamps, TS(Ti) if T1 starts before T2, TS(T1) < TS(T2) if T1 starts after T2, TS(T1) > TS(T2) • older Ti has a smaller TS • Timestamp can be: a counter, or the current value of the system clock • wait-die or wound-wait strategies use TSs • Can you think of a strategy using TSs?

Wait-die Suppose Tj tries to lock X, and a CONFLICTING lock is already held by Ti Wait-die: (aborts Transaction requesting lock) if TS(Tj) < TS(Ti) // Tj is older then Tj waits else // Tj is younger Tj aborts and restart with same timestamp R1(Y) R2(X) W2(Y) C2 R1(X) W1(X) C1where TS(T1)<TS(T2) R1(Y) R2(X) (A2 on W2(Y)) R1(X) W1(X) C1 R2(X) W2(Y) C2

Wound-wait Suppose Tj tries to lock X, and a CONFLICTING lock is already held by Ti Wound-wait: (aborts Transaction holding lock) if TS(Tj) < TS(Ti) // Tj is older then Ti aborts and restart with same timestamp else // Tj is younger Tj waits R1(Y) R2(X) W2(Y) C2 R1(X) W1(X) C1 where TS(T1)<TS(T2) R1(Y) R2(X) (B2 on W2(Y)) (A2 on R1(X)) R1(X) W1(X) C1 R2(X) W2(Y) C2

Comparisons • Wait-die - older waits on younger, else younger aborts itself and restarted • favors younger lock holder • Wound-wait - younger waits on older, else older preempts younger lock holder by abort – • favors older requester • T's aborted and restarted even if not deadlocked. • Wait-die: can abort Tj and restart many times in a row • Wound-wait: can be aborted even if obtain all of its locks (not true for wait-die, lock holder not aborted)

Strategies Useful? • Google using Wound-Wait in new semi-relational DB • Classic concurrency control for Real-time Transactions • Assume: T1 holds lock, T2 wants it, dl is deadline If T1(dl) < T2(dl) // T1 earlier deadline T2 waits else abort T1 T2 gets lock // T2 has earlier deadline

Problems with serializability • Scheduling that guarantees perfect serializability can be intrusive on performance • Too many transaction in wait state • If increase number of threads, can reduce the number of transactions active • CPU never fully utilized

Alternatives to serializability • Weakened forms of 2PL locking in SQL levels of isolation • Used instead of degrees of isolation • Can set the isolation level with set transaction statement (can specify R only, W only) 1) read uncommitted 2) read committed – Default Oracle 3) repeatable read 4) serializable

Isolation levels • Lock types used to implement • Short-term lock • guarantees R, W, is atomic • long-term lock • held until Transaction commits or aborts

Read uncommitted • Read uncommitted (for read only Ts) – • no long-term locks used • allow for Read only operations • no dirty writes (since only read) • but dirty reads can occur

Read committed • Read committed – (no dirty reads) • W lock long term, R lock short term • Can only read data that has been written by committed transactions • Unrepeatable reads can occur • Lost update can still occur R1(A) R2(A) W2(A) W1(A) C2 C1 // will not allow R1(A) R2(A) W2(A) C2 W1(A) C1 // may allow

Repeatable read • Repeatable read – • W lock, R lock long term • Repeatable reads, no lost update • But, predicate locking is not guaranteed • Predicate locking – lock only rows that satisfy specified condition (e.g. major =‘CS’) • therefore can have phantom updates due to inserting new rows • e.g. if branch totals in branch table, and insert while computing total

Serializability • Serializable requires R, W lock long term on all data satisfying predicate • How? • lock entire table

Granularity Hierarchy • How to accommodate different granularities of locks by the lock manager • If only a few data items from a table are needed, how to indicate they are locked • Use a tree • Allow data items to be of various sizes • Used with 2PL to guarantee serializability

Tree • Multiple levels of nodes • Highest node is entire DB • Non-leaf node as data associated with descendants • Each node can be locked individually • If lock a node, all ancestors are also locked in appropriate mode • Locks are: • Shared or exclusive • Intention or explicit mode

How to determine if a node at a lower level is locked without searching entire tree? • If a node at a lower level must be explicitly locked, then all ancestor nodes are intention locked as traverse tree • S shared, IS intention-shared • X exclusive, IX intention exclusive • SIX shared with intention exclusive • Strategy most useful for short transactions with few data items and long transactions produce reports form file

SIX • What is SIX? • Subtree rooted in that node is S and then X at lower level • The lock owner can read and change data in the table, partition, or table space. Concurrent processes can read data in the table, partition, or table space, but not change it. Only when the lock owner changes data does it acquire page or row locks. • Does this mean I can share everything except what I want to write to?

Locking protocol • Top down lock: • Use compatibility matrix on next page • T must lock root first (in any mode) • T can lock Q in S or IS only if T has parent of Q in IX or IS mode • T can lock Q in X, SIX, or IX only if T has parent of Q locked in IX or SIX mode • Bottom up unlock: • T can lock node only if not previously unlocked any node • T can unlock Q only if T has no children of Q locked

If a transaction holds a lock in one mode, a requester can be granted a lock in the specified mode

Serializability • Do we always have to use locking to ensure serializability?

Timestamp Ordering • No - Timestamp Ordering • concurrency control techniques based on timestampe (TS) - do not use locks • Can deadlock occur?

Timestamp Ordering

Timestamps • Use timestamp ordering (TO) • in 2PL schedule, serializable by being equivalent to some serial schedule allowed by locking protocols • In TO schedule, serializable by being equivalent to particular order that corresponds to order of transaction TS's • This means conflicting operations must execute in order of their timestamps, e.g. R/Ws to same data item must occur in the same order as their TS

Timestamps (TO) • Basic TO algorithm: • associated with each X, 2 TS values • R_TS(X) - largest TS that has successfully read X • W_TS(X) - largest TS that has successfully written X • If T is aborted, it is restarted with LATER timestamp

TO Algorithms If T issues R(X): if W_TS(X) ≤ TS(T) R(X) and set R_TS(X) = Max(TS(T), R_TS(X)) else // W_TS(X) > TS(T) abort T If T issues W(X): if R_TS(X) > TS(T) or W_TS(X) > TS(T) abort T else W(X) and set W_TS(X) = TS(T) R1(X)R2(X)W1(X)W2(X)C1C2 Assume TS(T1)=1 and TS(T2)=2

Lost Update Example - TO R1(X)R2(X)W1(X)W2(X) Assume TS(T1)=1 and TS(T2)=2 X R_TS W_TS 0 0 1 0 R1(X) 2 0 R2(X) W1(X), abort restart with T1 = 3 2 2 W2(X) C2 3 2 R1(X) 3 3 W1(X) C1 Serializable? Try with TO R1(X)R4(X)R2(Y)R2(X)W4(X)W2(Y) R1(X)R4(X)R2(Y)W4(X)R2(X)W2(Y)

TO vs. 2PL • TO and 2PL guarantee serializability – • Neither allows non-serializable schedules but • Some schedules possible under each, not allowed under the other • Note • If T is aborted (undo, rolled back), any value written by T also must be rolled back (Can have cascading rollback) • Schedules produced are not recoverable, does not ensure recoverable and cascade- less or strict schedules

Multiversion Concurrency Control a.k.a Multiversion Timestamp Ordering

Multiversion Concurrency Control (MVCC) • Multiversion Concurrency Control • Oracle uses multiversions to enforce levels of isolation • Useful for mobile, temporal and real-time DBs • Keep old values when item is updated - several versions maintained • When operation accesses item, appropriate version chosen to ensure serializability • Read older version of item instead of abort • Write new version, keep old one • View serializability not conflict serializability is ensured

Multiversion • Disadvantage - more storage • however, may keep older versions for recovery anyway • Google keeps multiple versions for semi-relational DB

MVCC Algorithm If T issues: R(X) find version i of X with largest W_TS s.t. W_TS(Xi) ≤ TS(T) set R_TS(Xi) = max(TS(T), R_TS(Xi)) If T issues: W(X) find version i of X with largest W_TS s.t. W_TS (Xi) ≤ TS(T) if TS(T) >= R_TS(Xi) create new version Xj with R_TS(Xj) = W_TS(Xj) = TS(T) else abort // TS(T) < R_TS(Xi) so must abort R1(X)R4(X)R2(Y) R2(X)W4(X)W2(Y)R1(X)R2(X)W1(X)W2(X) W1(X)R2(X)W3(X)W2(X)W4(X)

Concurrency Control