CSIS 7102 Spring 2004 Lecture 4: Issues in locking techniques Dr. King-Ip Lin
Table of contents • Implementation of locks • The phantom problem • Multiple granularity locks • Isolation levels
The story so far • Two-phase locking (2PL) as a protocol to ensure conflict serializability • Once a transaction starts releasing locks, it cannot obtain new locks • This ensures that conflicts cannot go in both directions • 2PL does not guarantee recoverability • Strict 2PL ensures recoverability • Rigorous 2PL ensures strict schedules • Deadlock handling in locking • Detection • Time-out • Cycles in wait-for graph • Avoidance • Wait-die vs. wound-wait • Prevention • Conservative 2PL • Ordering of resources
Locks -- implementation • Various kinds of support are needed to implement locking • OS support – lock(X) must be an atomic operation at the OS level • i.e. support for critical sections • Implementation of read(X)/write(X) – automatically adds code for locking • Lock manager – a module to handle and keep track of locks
Locks -- implementation • A transaction Ti issues the standard read/write instructions, without explicit locking calls. • The operation read(D) is processed as:
    if Ti has a lock on D
      then read(D)
      else begin
        if necessary, wait until no other transaction has an X-lock on D
          (or apply any deadlock avoidance rules)
        grant Ti an S-lock on D;
        read(D)
      end
Locks -- implementation • write(D) is processed as:
    if Ti has an X-lock on D
      then write(D)
      else begin
        if necessary, wait until no other transaction has any lock on D
          (or apply any deadlock avoidance rules)
        if Ti has an S-lock on D
          then upgrade the lock on D to an X-lock
          else grant Ti an X-lock on D
        write(D)
      end
• All locks are released after commit or abort
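• A minimal sketch of these wrappers in Python; it assumes a single-threaded model where "waiting" is signalled by an exception, and the lock table, WouldBlock, and all names are illustrative, not from the lecture:

    # Lock state: data item name -> {transaction: mode}, mode "S" or "X".
    locks = {}

    class WouldBlock(Exception):
        """Raised where a real scheduler would block the transaction."""

    def read(txn, item, db):
        held = locks.setdefault(item, {})
        if txn not in held:                                # no lock yet: request S
            if any(m == "X" for t, m in held.items() if t != txn):
                raise WouldBlock                           # another txn holds an X-lock
            held[txn] = "S"                                # grant Ti an S-lock
        return db[item]

    def write(txn, item, db, value):
        held = locks.setdefault(item, {})
        if held.get(txn) != "X":                           # need an X-lock
            if any(t != txn for t in held):                # any other lock blocks us
                raise WouldBlock
            held[txn] = "X"                                # grant, or upgrade S -> X
        db[item] = value

    def release_all(txn):
        """Called after commit or abort: drop every lock the transaction holds."""
        for held in locks.values():
            held.pop(txn, None)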
Locks – Lock manager • A lock manager can be implemented as a separate process to which transactions send lock and unlock requests • The lock manager replies to a lock request by sending a lock grant message (or a message asking the transaction to roll back, in case of a deadlock) • The requesting transaction waits until its request is answered • The lock manager maintains a data structure called a lock table to record granted locks and pending requests • The lock table is usually implemented as an in-memory hash table indexed on the name of the data item being locked
Locks – lock manager • In the usual lock-table diagram, black rectangles indicate granted locks and white ones indicate waiting requests • The lock table also records the type of lock granted or requested • A new request is added to the end of the queue of requests for the data item, and granted if it is compatible with all earlier locks • Unlock requests result in the request being deleted, and later requests are checked to see if they can now be granted • If a transaction aborts, all waiting or granted requests of that transaction are deleted • The lock manager may keep a list of locks held by each transaction, to implement this efficiently
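• A hedged sketch of such a lock table in Python: a hash table from item name to a queue of requests, where a request is granted only if it is compatible with all earlier ones (class and method names are illustrative):

    from collections import defaultdict

    COMPAT = {("S", "S")}                    # the only compatible pair; others conflict

    class LockTable:
        def __init__(self):
            self.table = defaultdict(list)   # item -> [[txn, mode, granted], ...]

        def request(self, txn, item, mode):
            queue = self.table[item]
            granted = all((r[1], mode) in COMPAT for r in queue)
            queue.append([txn, mode, granted])
            return granted                   # False: the transaction must wait

        def release(self, txn, item):
            queue = [r for r in self.table[item] if r[0] != txn]
            for i, r in enumerate(queue):    # later requests may now be grantable
                if not r[2]:
                    r[2] = all((p[1], r[1]) in COMPAT for p in queue[:i])
            self.table[item] = queue

• For example, if T1 holds an X-lock on D, request("T2", "D", "S") returns False and T2 queues behind T1; release("T1", "D") then re-checks the queue and grants T2's request.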
Insertion & Deletion • Does delete(X) conflict with read(X)/write(X)? • Yes. The wrong order leads to errors • Thus delete(X) should be treated like a write operation • Request X-locks • Similarly for the insert(X) operation • The X-lock is given to the newly created tuple
The phantom menace • Consider the following two transactions:
    T1: 1. Select sum(salary) From faculty Where dept = "CS"
        2. Select sum(salary) From faculty Where dept = "Math"
    T2: a. Insert into faculty values ("Lin", "CS", 1000)
        b. Insert into faculty values ("Lam", "Math", 5000)
• There does not seem to be a conflict (in terms of tuples) • Assume initially CS faculty have total salary 1,000,000 and Math faculty have total salary 2,000,000 • Then T1 -> T2 implies the select statements return 1,000,000 and 2,000,000 • T2 -> T1 implies the select statements return 1,001,000 and 2,005,000
The phantom menace • But consider the following schedule:
    T2: a. Insert into faculty values ("Lin", "CS", 1000)
    T1: 1. Select sum(salary) From faculty Where dept = "CS"
    T1: 2. Select sum(salary) From faculty Where dept = "Math"
    T2: b. Insert into faculty values ("Lam", "Math", 5000)
• The output will be 1,001,000 and 2,000,000 • This matches neither serial order: not conflict serializable!
The phantom problem • This is known as the phantom problem • Why does it occur? • No tuples are in conflict • However, a conflict occurs over the tuples that satisfy a certain condition (dept = "CS", dept = "Math") • T1 requires access to ALL tuples satisfying the condition • However, T2 changes the number of tuples satisfying the condition • No quick solution: index-locking is a possibility (later)
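• To make the conflict concrete, here is a small Python sketch, not from the lecture, of checking conflicts at the predicate level (the dict-based predicate representation is an assumption for illustration):

    def predicate_conflicts(read_pred, new_tuple):
        """True if inserting new_tuple would change the answer of a query
        that reads all tuples matching read_pred (an attr -> value dict)."""
        return all(new_tuple.get(a) == v for a, v in read_pred.items())

    # T1 scans dept = "CS"; T2 inserts a CS tuple. No tuple-level lock can
    # catch this conflict, because the tuple does not exist when T1 scans.
    print(predicate_conflicts({"dept": "CS"},
                              {"name": "Lin", "dept": "CS", "salary": 1000}))    # True
    print(predicate_conflicts({"dept": "CS"},
                              {"name": "Lam", "dept": "Math", "salary": 5000}))  # False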
Multiple granularity locking • Another implementation issue: what to lock? • Options • Whole database • Table • Pages (assume no page holds tuples from multiple tables) • Tuples • Part of a tuple (e.g. single attributes) • Pros and cons?
Multiple granularity locking • Example 1: select * from person where age > 30 • How many tuples need to be examined? (suppose we do not have any index) • What kind of lock do you need to issue? • What level do you want to lock?
Multiple granularity locking • Example 2: select * from person where id# = "A1234567" • Now suppose we have an index (say, a B+-tree) on the person table • How many tuples need to be examined? • What kind of lock do you need to issue? • What level do you want to lock?
Multiple granularity locking • Example 3: update person set salary = salary * 1.1 • How many tuples need to be examined? • What kind of lock do you need to issue? • What level do you want to lock?
Multiple granularity locking • Example 4: update person set salary = salary * 1.1 where age > 40 • How many tuples need to be examined? (Assume no index) • What kind of lock do you need to issue? • What level do you want to lock?
Multiple granularity locking • In short • Allowing locking only at the tuple level leads to significant inefficiency for many queries • How about allowing locking only at the table level? • Less concurrency • Can we allow both? • How to implement that?
Multiple granularity locking • Assume we allow both • i.e. for S-lock(X) and X-lock(X), X can be either a table, a page, or a tuple
Multiple granularity locking • Example 1: (no index) select * from person where age > 30 • S-lock(Person) • Example 2: (with index) select * from person where id# = "A1234567" • S-lock(the tuple in Person with id# = "A1234567")
Multiple granularity locking • Example 3: update person set salary = salary * 1.1 • X-lock(Person) • Example 4: update person set salary = salary * 1.1 where age > 40 • S-lock(Person), X-lock(tuples with age > 40)
Multiple granularity locking • How to implement? • Entries in the lock table specify which type of object each lock covers • To request a lock • Check if a lock is already placed on the object • How to check?
Multiple granularity locking • Suppose X-lock(Person) • What do we need to check? • Any lock placed on the Person table • Check if there is an entry for the Person table in the lock table • Is that enough? • What if a lock is placed on a tuple of Person? • However, checking the lock table for every tuple in the Person table is inefficient
Multiple granularity locking • Suppose X-lock(Person) • Alternative solution: • When placing an X-lock on a tuple of Person, place an X-lock on the Person table also • This defeats the purpose of multiple granularity! • What can we do?
Multiple granularity locking • Goal: • Ensure correctness: • Example: if a tuple in Table A is X-locked, then no other transaction can S-lock/X-lock Table A • Ensure concurrency: • Example: if a tuple in Table A is X-locked, a transaction should still be allowed to X-lock another tuple in Table A, provided no transaction S-locks/X-locks the whole table • Ensure efficiency: • Example: one should avoid scanning the whole lock table to determine whether locking is allowed
Multiple granularity locking • Solution: • Correctness: hierarchy • The resources are organized in a hierarchical fashion • E.g. DB = database, A = table, F = pages, r = tuple • Locking must be done from top to bottom • Unlocking must be done from bottom to top • Taking an (S, X)-lock on any node implicitly locks every node below it in the same mode
Multiple granularity locking • Solution: • Concurrency: intention locks (I-locks) • Do not lock the object itself, but declare the intention to lock part of the object below • Three types • IS lock: intention to S-lock a portion of the object below • IX lock: intention to X-lock a portion of the object below • SIX lock: S-lock the object, with the intention to X-lock part of the object below
Multiple granularity locking • Solution: • Efficiency: a separate lock table for each granularity • Lock tables for tables, pages, tuples, etc. • With intention locks, one only needs to check one entry for each table
Multiple granularity locking • Example: S-Lock a tuple in Person table • IS-lock the whole database • IS-lock the Person table • S-lock the required tuple • Example: X-lock a tuple in Person table • IX-lock the whole database • IX-lock the Person table • X-lock the required tuple
Multiple granularity locking • Example: X-lock the whole Person table • IX-lock the whole database • X-lock the Person table
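• These sequences can be computed mechanically from the node's path in the hierarchy; a minimal Python sketch (the path values and function name are illustrative):

    def lock_sequence(path, mode):
        """Locks to request, top-down, to take `mode` ("S" or "X") on the
        last node of `path`; every ancestor gets the matching intention lock."""
        intent = "IS" if mode == "S" else "IX"
        return [(node, intent) for node in path[:-1]] + [(path[-1], mode)]

    print(lock_sequence(["DB", "Person", "tuple42"], "S"))
    # [('DB', 'IS'), ('Person', 'IS'), ('tuple42', 'S')]
    print(lock_sequence(["DB", "Person"], "X"))
    # [('DB', 'IX'), ('Person', 'X')]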
Multiple granularity locking • Revisiting earlier examples • Example 1: (no index) select * from person where age > 30 • IS-lock(Database) • S-lock(Person) • Example 2: (with index) select * from person where id# = "A1234567" • IS-lock(Database) • IS-lock(Person) • S-lock(the tuple in Person with id# = "A1234567")
Multiple granularity locking • Example 3: update person set salary = salary * 1.1 • IX-lock(Database) • X-lock(Person) • Example 4: update person set salary = salary * 1.1 where age > 40 • IX-lock(Database) • SIX-lock(Person) • X-lock(tuples with age > 40) • If the system is smart, it can estimate how many tuples have age > 40, then decide whether to place individual X-locks on those tuples or to upgrade to X-lock(Person)
Multiple granularity locking • Lock compatibility table (row = mode T2 holds, column = mode T1 requests; Y = compatible, N = conflict)

             IS    IX    S     SIX   X
    IS       Y     Y     Y     Y     N
    IX       Y     Y     N     N     N
    S        Y     N     Y     N     N
    SIX      Y     N     N     N     N
    X        N     N     N     N     N
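• The matrix can be encoded directly as data; a minimal Python sketch:

    # Compatible (held, requested) pairs; anything not listed conflicts.
    COMPAT = {
        ("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IS", "SIX"),
        ("IX", "IS"), ("IX", "IX"),
        ("S",  "IS"), ("S",  "S"),
        ("SIX", "IS"),
    }

    def compatible(held, requested):
        return (held, requested) in COMPAT

    print(compatible("IX", "IS"))    # True: intention locks are mutually compatible
    print(compatible("SIX", "IX"))   # False: the S part of SIX conflicts with IX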
Multiple granularity locking • Transaction Ti can lock a node Q, using the following rules: 1. The lock compatibility matrix must be observed. 2. The root of the tree must be locked first, and may be locked in any mode. 3. A node Q can be locked by Ti in S or IS mode only if the parent of Q is currently locked by Ti in either IX or IS mode. 4. A node Q can be locked by Ti in X, SIX, or IX mode only if the parent of Q is currently locked by Ti in either IX or SIX mode. 5. Ti can lock a node only if it has not previously unlocked any node (that is, Ti is two-phase). 6. Ti can unlock a node Q only if none of the children of Q are currently locked by Ti.
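• Rules 3 and 4 reduce to a small check of the requested mode against the mode held on the parent; a hedged Python sketch (function name illustrative):

    def parent_permits(parent_mode, requested):
        """Rule 3: S or IS on Q needs IS or IX on Q's parent.
        Rule 4: X, SIX, or IX on Q needs IX or SIX on Q's parent."""
        if requested in ("S", "IS"):
            return parent_mode in ("IS", "IX")
        if requested in ("X", "SIX", "IX"):
            return parent_mode in ("IX", "SIX")
        return False

    print(parent_permits("IX", "X"))   # True
    print(parent_permits("IS", "X"))   # False: an IS parent only permits reads below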
Isolation levels • The goal of locking is to ensure serializability • To ensure serializability, we require 2PL • No new locks are acquired once lock releasing begins • Locks acquired need to be held for a long time (long locks) • One cannot acquire a lock, do some work with it, and then release it immediately • Adv: ensures serializability • Dis: less concurrency • Is serializability (long locks / two-phase locking) necessary in all cases?
Isolation levels • Example • Consider recording temperatures from an electronic thermometer • We record the temperature every minute onto a new location on the disk • We have variables that store the last hour's mean and variance; they need to be updated hourly
Isolation levels • Example • Consider the following four transactions • T0: Record the current temperature onto the disk • T1: Calculate the mean and the variance of the temperature over the past hour and store them onto the disk • T2: Read the mean from the disk, do two calculations, and store the results in two locations (to be used by other transactions) • T3: Read the mean and variance, do two calculations, and store the results in two locations (to be used by other transactions)
Isolation levels • T0: Record the current temperature onto the disk • It just records the latest temperature • Need S-locks? • No • Need X-locks? • Yes • Need long X-locks? • No, it can release the lock immediately and commit, since we are only recording the "current" temperature; there is no need to check for any other consistency
Isolation levels • T1: Calculate the mean • It reads the most recent values • It writes onto two fixed locations • Need S-locks? • No. The values read are never going to be updated/overwritten/invalidated • That means it reads without requesting any locks • Need X-locks? • Yes • Need long X-locks? i.e. can we do the following:
    X-lock(mean); Write(mean); Unlock(mean);
    X-lock(variance); Write(variance); Unlock(variance)
• We cannot allow that (i.e. we must use long X-locks). Otherwise, another transaction may read the mean and the variance in between, getting values that are not consistent with one another
Isolation levels • T2: Read the mean, do two calculations, and store the results in two locations • It needs to read a valid value for the mean • It writes the calculated results into two separate locations • Need S-locks? • Yes. If we read without obtaining a lock, we may read a value written by a T1 that later aborts, i.e. an invalid value of the mean • Need long S-locks? • No. It only reads the mean once, and once it obtains the lock, the mean it reads is a valid one • Need long X-locks? • Yes, by a similar argument as before
Isolation levels • T3: Read the mean and variance, do two calculations, and store the results in two locations • It needs to read valid values for the mean and the variance • It writes the calculated results into two separate locations • Need S-locks? • Yes. Similar argument as before • Need long S-locks? • Yes. Otherwise, the following schedule may happen, leaving the reader with inconsistent values of the mean and variance:
    T3: S-lock(mean); Read(mean); Unlock(mean)
    T1: Read(readings); X-lock(mean); Write(mean); Unlock(mean); X-lock(variance); Write(variance); Unlock(variance)
    T3: S-lock(variance); Read(variance); Unlock(variance)
• T3 sees the old mean together with the new variance • Need long X-locks? • Yes, similar argument
Isolation levels • Thus we see that in many cases, 2PL is not necessary • We need to decide when it is necessary, and when it is not
Isolation levels • T1: Calculate the mean • No need for S-locks • i.e. it can read data that is uncommitted • But in real life that won't happen anyway, because it reads values written by T0, which are always valid • It needs long X-locks • To ensure the values written are consistent • In other words, it must ensure all its writes commit together (at the end)
Isolation levels • T2: Read the mean to calculate • Needs S-locks • To avoid reading a mean that may not be valid • When T1 has finished writing the mean but not the variance, T1 still holds an X-lock on the mean, so T2 cannot obtain an S-lock on it • Thus T2 cannot read data that is uncommitted (dirty data)
Isolation levels • T3: Read the mean and variance to calculate • Needs long S-locks • To ensure the mean and variance are both valid and consistent with one another • i.e. once T3 has read the mean but before it reads the variance, T3 holds an S-lock on the mean, so T1 cannot update the mean and variance and thereby invalidate the value of the mean T3 has read • Thus no transaction can overwrite (dirty) data that T3 has read before T3 terminates • This implies that if a transaction reads the same item from the disk twice (without updating it), it is guaranteed to see the same value (repeatable read)
Isolation levels • To formalize the notion, introduce isolation levels (levels of consistency) • For each transaction T, four requirements: • T does not overwrite dirty data of other transactions (levels 0, 1, 2, 3) • T does not commit any writes before the end of the transaction (levels 1, 2, 3) • T does not read dirty data of other transactions (levels 2, 3) • Other transactions do not dirty any data read by T before T commits (level 3) • A transaction is said to be at a certain level if the corresponding requirements are satisfied
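• The pairing of levels and requirements can be written down directly; a small Python sketch (names illustrative):

    # Which isolation levels demand each requirement.
    REQUIREMENTS = {
        "does not overwrite others' dirty data":     {0, 1, 2, 3},
        "does not commit writes before end of txn":  {1, 2, 3},
        "does not read others' dirty data":          {2, 3},
        "others do not dirty data it has read":      {3},
    }

    def required_at(level):
        return [req for req, levels in REQUIREMENTS.items() if level in levels]

    print(required_at(2))   # the first three requirements, but not the fourth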
Isolation levels • To ensure the appropriate level of isolation, one needs to apply the appropriate locking protocol • Level 0: short X-locks on data written (like T0) • Level 1: long X-locks (like T1) • Level 2: long X-locks plus short S-locks (like T2) • Level 3: long X-locks plus long S-locks, i.e. full strict two-phase locking (like T3)
Isolation levels • A third way of describing isolation levels, using the ideas of schedules and dependencies: • Given two transactions T1 and T2, suppose T1 has an operation O1[X] and T2 follows with another operation O2[X]; we define: • T1 <3 T2 if at least one of O1, O2 is a write (RW, WR, WW conflicts) • T1 <2 T2 if O1 is a write (WR, WW conflicts) • T1 <1 T2 if both O1 and O2 are writes (WW conflict) • Define graphs G1, G2, G3 similar to the serializability graph (but with <1, <2, <3 as edges respectively) • Theorem: a schedule observes isolation level 1 (2, 3) if the corresponding graph G1 (G2, G3) is acyclic
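• A hedged sketch of this construction in Python, where a schedule is a list of (txn, op, item) triples with op "r" or "w" (the representation is an assumption for illustration); acyclicity is tested with Kahn's algorithm:

    from collections import defaultdict

    def edges(schedule, level):
        """Edge set of G1, G2, or G3 (level = 1, 2, 3) for a schedule."""
        deps = set()
        for i, (t1, o1, x1) in enumerate(schedule):
            for t2, o2, x2 in schedule[i + 1:]:
                if t1 == t2 or x1 != x2:
                    continue
                if (level == 3 and "w" in (o1, o2)) \
                        or (level == 2 and o1 == "w") \
                        or (level == 1 and o1 == "w" and o2 == "w"):
                    deps.add((t1, t2))
        return deps

    def acyclic(nodes, deps):
        out, indeg = defaultdict(set), {n: 0 for n in nodes}
        for a, b in deps:
            out[a].add(b)
            indeg[b] += 1
        ready, seen = [n for n in nodes if indeg[n] == 0], 0
        while ready:
            n = ready.pop()
            seen += 1
            for m in out[n]:
                indeg[m] -= 1
                if indeg[m] == 0:
                    ready.append(m)
        return seen == len(nodes)

    s = [("T1", "w", "A"), ("T2", "r", "A"), ("T2", "w", "A"), ("T1", "r", "A")]
    print(acyclic({"T1", "T2"}, edges(s, 3)))   # False: T1 <3 T2 and T2 <3 T1
    print(acyclic({"T1", "T2"}, edges(s, 1)))   # True: only the WW edge T1 -> T2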
Isolation levels in SQL • SQL defines four isolation levels • Read uncommitted: dirty reads are allowed (roughly level 1) • Read committed: no dirty reads, but reads need not be repeatable (roughly level 2) • Repeatable read: reads are repeatable, though phantoms may still occur (roughly level 3) • Serializable: full serializability, including protection against phantoms