1. Two Phase Locking (2PL): System Components and Approaches of CC
Two Phase Locking
Protocol & variants
basic 2PL, strict (dynamic) 2PL & conservative (static) 2PL
Performance and implementation issues
CC in distributed database systems
2. Components in a Database System
Single-site database system:
transaction manager (TM)
scheduler (S)
recovery manager (RM)
cache manager (CM)
RM + CM together are called the data manager (DM)
TM:
The interface between applications and the rest of the system
Performs pre-processing
Monitors the execution of transactions
Receives operations from transactions
Forwards operations to the scheduler
Handles transaction termination (commit or abort)
3. Components in a Database System
Scheduler:
It controls the execution order of operations from the same or from different transactions
It may delay, reject, or immediately forward an operation to the DM
Operation arrival order vs. operation execution order (not necessarily FCFS)
DM:
It manipulates storage (cache and disk) by providing operations to fetch data from stable storage into volatile storage and to flush data from volatile storage to stable storage
The RM is responsible for the commit and abort of transactions
Distributed DB model:
A communication manager coordinates among the different sites
4. Components in a Database System
5. Centralized Transaction Execution
6. Distributed Transaction Execution
7. Purposes of Concurrency Control
Concurrent execution of transactions: transactions may request access to the same data item at the "same time"
Data conflict (data contention): transaction T1 has accessed data item x; before T1 commits, another transaction T2 wants to access x
Serial execution is good for database consistency but bad for performance
no data conflict
Concurrent execution of transactions is good for performance but may result in an inconsistent database
The bad effect is permanent (the inconsistency stays in the database)
Rules are defined to control the interleaving in transaction execution such that all schedules are correct (serializable) and the resulting database is consistent (with low processing overhead)
8. Approaches in Concurrency Control
Detection method (e.g., serialization graph testing; also called optimistic or aggressive)
When a transaction wants to commit, it is allowed to commit if the schedule is serializable; otherwise, it is aborted (or restarted)
May need to restart a large number of transactions (heavy undo and redo overheads)
Requires high overhead to search the graphs
Preventive method (pessimistic or conservative)
Rules are defined so that all the serialization graphs are acyclic
Use blocking (operations and transactions) to prevent the formation of any cyclic SG
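To make the detection approach above concrete, here is a minimal Python sketch (an illustration, not from the slides: the adjacency-map encoding and the function names are assumptions). The scheduler records conflict edges Ti → Tj in a serialization graph and runs a DFS cycle test when a transaction asks to commit.

```python
# Minimal sketch of the detection (optimistic) approach: test the
# serialization graph for a cycle at commit time.

def has_cycle(graph):
    """DFS cycle test on an adjacency map {node: set of successors}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            c = color.get(m, WHITE)
            if c == GRAY:                 # back edge: a cycle exists
                return True
            if c == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)

def try_commit(sg, tid):
    """Allow the commit only if the SG is (still) acyclic."""
    return not has_cycle(sg)              # False means abort/restart tid

# T1 -> T2 (r1[x] < w2[x]) and T2 -> T1 (w2[y] < w1[y]) form a cycle:
sg = {"T1": {"T2"}, "T2": {"T1"}}
print(try_commit(sg, "T1"))               # False: T1 must be restarted
```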
9. 2 Phase Locking
Similar to the management of critical sections in operating systems
The scheduler (lock manager) maintains a lock table
Rules are defined to determine under which conditions a transaction is allowed to set a lock
Before a transaction is allowed to access a data item, it has to set a lock corresponding to the data item
Mapping between a lock and data items (1-to-1, or 1-to-many)
What is a data item? (Record? Table? Or a field?)
Blocking is used to resolve data conflict (lock conflict)
There are two modes for locking
read mode for read operations (shared locks)
write mode for write operations (exclusive locks)
10. 2 Phase Locking (Example)
11. 2 Phase Locking
Lock compatibility table
Basic Rules
Growing phase
The scheduler sets locks for a transaction based on the data items required by the operations it receives
Shrinking phase
Once the scheduler has released a lock for a transaction, it may not subsequently obtain any more locks for that transaction
Number of locks belonging to a transaction
increases (up) initially and then decreases (down) to none (two phases)
12. Lock Compatibility Table
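The matrix itself was a figure and did not survive the export; reconstructed here is the standard shared/exclusive compatibility it depicts, consistent with the read/write modes on slide 9:

                 lock held on x
lock requested   read (shared)    write (exclusive)
read             compatible       conflict
write            conflict         conflict

Two read locks are compatible; any pairing that involves a write lock conflicts.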
13. Variants of 2PL
For single-site DBS:
Basic 2PL
Strict 2PL (dynamic 2PL)
Conservative 2PL (static 2PL)
For distributed database systems
Centralized lock scheduler
Distributed lock schedulers
Hybrid lock schedulers
14. Basic 2 Phase Locking
Rule 1 (Growing Phase)
When the scheduler receives an operation pi[x] from the TM (from transaction Ti), it tests whether pli[x] conflicts with some qlj[x] that is already set in the lock table. If so, it delays (blocks) pi[x], forcing Ti to wait until it can set the lock it needs
If there is no lock conflict, the scheduler sets pli[x] and then sends pi[x] to the DM for processing
Rule 2 (Growing Phase)
Once the scheduler has set a lock for Ti , say pli[x], it may not release that lock at least until after the DM acknowledges that it has processed the lock’s corresponding operation, pi[x]
Rule 3 (Shrinking Phase)
Once the scheduler has released a lock for a transaction, it may not subsequently obtain any more locks for that transaction
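A minimal Python sketch of these three rules (illustrative only: the scheduler class, the lock-table layout, and the dm helper with its process/acknowledged calls are assumptions, not part of the slides):

```python
# Sketch of a Basic 2PL scheduler enforcing Rules 1-3. "dm" stands for
# a hypothetical data manager object.

READ, WRITE = "r", "w"
COMPATIBLE = {(READ, READ)}        # only two read locks are compatible

class Basic2PLScheduler:
    def __init__(self, dm):
        self.dm = dm
        self.locks = {}            # x -> [(tid, mode), ...] locks set on x
        self.blocked = []          # delayed operations (Rule 1)
        self.shrinking = set()     # transactions that have released a lock

    def receive(self, tid, mode, x):
        # Rule 3: once Ti has released any lock, it may not obtain more
        assert tid not in self.shrinking, "two-phase rule violated"
        holders = self.locks.setdefault(x, [])
        if any(t != tid and (m, mode) not in COMPATIBLE for t, m in holders):
            self.blocked.append((tid, mode, x))   # Rule 1: delay on conflict
            return
        holders.append((tid, mode))               # set pl_i[x] ...
        self.dm.process(tid, mode, x)             # ... then forward p_i[x]

    def release(self, tid, x):
        # Rule 2: release only after the DM acknowledged the operation
        assert self.dm.acknowledged(tid, x)
        self.locks[x] = [(t, m) for t, m in self.locks[x] if t != tid]
        self.shrinking.add(tid)                   # shrinking phase begins
```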
15. Basic 2 Phase Locking
Rule 1: prevents two transactions from concurrently accessing the same data item in conflicting modes
Rule 2: ensures that the DM processes operations on a data item in the order that the scheduler submits them to the DM
Rule 3: ensures the two-phase rule (one growing phase followed by one shrinking phase)
Note:
The (basic) rules do not specify when the commit/abort operation may be performed relative to the lock release operations
The lock release operations may be performed before the commit/abort operation (NOT strict; this can cause the premature-write problem, unrecoverable executions, and cascading aborts)
Note the difference between the arrival order of operations to the scheduler and the execution order (by DM) of operations
16. Basic 2 Phase Locking
17. Why 2 Phase Rules
Example (NOT 2PL):
H1 = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
SG(H1) is T1 → T2 → T1: it is non-serializable
The problem is that T1 releases its lock on x before it gets its lock on y
If T1 releases its lock on x after it gets its lock on y, then the history will be:
H2 = rl1[x] r1[x] wl1[y] w1[y] c1 ru1[x] wu1[y] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2
SG(H2) is T1 → T2: it is serial.
(Locking is a conservative approach)
18. 2 Phase Locking Example
Detailed mechanism of the last example:
Initially, neither transaction owns any locks
The scheduler receives r1[x] from TM. Accordingly, it performs rl1[x] and submits r1[x] to DM
Then DM acknowledges the processing of r1[x] to the scheduler
The scheduler receives w2[x] from the TM. It cannot perform wl2[x], which conflicts with rl1[x], so it delays the execution of w2[x] by placing it in a block queue
The scheduler receives w1[y] from TM. It performs wl1[y] and submits w1[y] to DM.
Then the DM acks the processing of w1[y]
The scheduler receives c1 from the TM, signaling that T1 has terminated. The scheduler sends c1 to the DM. After the DM acks the processing of c1, the scheduler releases rl1[x] and wl1[y]
19. 2 Phase Locking Example
The scheduler performs wl2[x] so that w2[x], which has been delayed, can now be sent to the DM. The DM then acks w2[x]
The scheduler receives w2[y] from the TM. It sets wl2[y] and sends w2[y] to the DM. The DM then acks the processing of w2[y]
T2 terminates and the TM sends c2 to the scheduler. The scheduler sends c2 to the DM. After the DM acks the processing of c2, the scheduler releases wl2[x] and wl2[y]
20. 2 Phase Locking (Example)
21. Strict 2 Phase Locking
Basic 2PL:
The lock release time may be before the commit of a transaction
B2PL cannot prevent cascading aborts and may be unrecoverable
e.g., wl1[x] w1[x] wu1[x] rl2[x] r2[x] c2 .... c1
Strict 2PL (dynamic 2PL)
It differs from Basic 2PL in that it requires the scheduler to release all the locks of a transaction together, after the transaction has terminated
Holds the locks until after the commit of a transaction
e.g., wl1[x] w1[x] c1 wu1[x] rl2[x] r2[x] .... c2
It results in strict execution (no premature write)
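Building on the hypothetical Basic2PLScheduler sketched under slide 14, the only difference in Strict 2PL is where release happens: nothing is released until the commit has been processed, and then everything goes at once.

```python
# Strict 2PL sketch: locks are held past the commit and released together.

class Strict2PLScheduler(Basic2PLScheduler):
    def commit(self, tid):
        self.dm.process(tid, "c", None)    # c_i reaches the DM first
        for x in list(self.locks):         # only now: release all of tid's locks
            self.locks[x] = [(t, m) for t, m in self.locks[x] if t != tid]
        self.shrinking.add(tid)
```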
22. Strict 2 Phase Locking
Hold locks until after the commit.
23. Deadlock Problem in 2PL
Deadlock:
2PL may result in deadlock (both B2PL and S2PL)
Probability of deadlock (lock conflict) depends on (see the estimate after this list):
The number of locks required by a transaction
The total number of locks, which are locked by other transactions
The total number of locks in the system
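As a rough back-of-envelope estimate (an illustrative assumption, not from the slides): if n other transactions each hold k locks drawn uniformly from a database of D items, each of a transaction's own k lock requests conflicts with probability about nk/D, so

\[
P(\text{block}) \;\approx\; 1 - \left(1 - \frac{nk}{D}\right)^{k} \;\approx\; \frac{nk^{2}}{D}
\qquad (nk^{2} \ll D)
\]

i.e., conflict probability grows with the multiprogramming level and (quadratically) with transaction size, and shrinks with database size.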
Lock conversion problem
If a transaction tries to strengthen (upgrade) a read lock to a write lock, deadlock may occur; e.g., T1 and T2 both hold read locks on x, and each then requests a write lock on x, blocking on the other's read lock
24. Lock Conversion Problem
25. Lock Conversion Problem
26. Conservative 2PL
Conservative 2PL (Static 2PL): "static" refers to the fact that the number of locks belonging to a transaction is fixed after it has started execution
It requires a transaction to pre-declare its read-set and write-set of data items
The scheduler tries to set all of the locks needed by a transaction before the start of execution of any of its operations
If the scheduler can set all the locks, the processing of a transaction may start
Otherwise, none of the transaction’s locks will be set
It inserts the transaction and its lock requests into a block queue
Every time the scheduler releases a lock, the queued transactions' lock requests are checked again
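A minimal sketch of the all-or-none lock acquisition, again reusing the hypothetical Basic2PLScheduler from slide 14 (the method names and queue format are assumptions):

```python
# Conservative (Static) 2PL sketch: pre-declared read/write sets are
# locked atomically before execution starts, or not at all.

class Conservative2PLScheduler(Basic2PLScheduler):
    def start(self, tid, read_set, write_set):
        wanted = [(x, READ) for x in read_set] + \
                 [(x, WRITE) for x in write_set]
        if all(self._lockable(tid, x, m) for x, m in wanted):
            for x, m in wanted:                   # set all the locks ...
                self.locks.setdefault(x, []).append((tid, m))
            return True                           # ... then execution starts
        self.blocked.append((tid, wanted))        # none set: queue as a unit
        return False                              # re-checked at each release

    def _lockable(self, tid, x, mode):
        holders = self.locks.get(x, [])
        return all(t == tid or (m, mode) in COMPATIBLE for t, m in holders)
```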
27. Conservative 2PL (Example)
28. Comparing S2PL & C2PL
Probability of lock conflict
Conservative 2PL is higher
Locking overhead
Conservative 2PL is higher
Number of locks
Conservative 2PL is greater
Deadlock
Possible under Strict 2PL but not under Conservative 2PL
Conservative 2PL may not be feasible for some systems (Why?)
29. Correctness of 2PL (For reference only)
To prove that all histories generated according to 2PL are serializable:
1: If oi[x] is in C(H), then oli[x] and oui[x] are in C(H), and oli[x] < oi[x] < oui[x]
2: If pi[x] & qj[x] are conflicting operations in C(H), then either pui[x] < qlj[x] or quj[x] < pli[x]
3: If pi[x] & qi[y] are in C(H), then pli[x] < qui[y]
30. Correctness of 2PL (For reference only)
Lemma 1:
Let H be a 2PL history, and suppose Ti → Tj is in SG(H). Then, for some data item x and some conflicting operations pi[x] & qj[x] in H, pui[x] < qlj[x]
Proof:
Since Ti → Tj, there must exist conflicting operations pi[x] and qj[x] such that pi[x] < qj[x].
From 1,
pli[x] < pi[x] < pui[x] and
qlj[x] < qj[x] < quj[x]
From 2,
either pui[x] < qlj[x] or quj[x] < pli[x]; the latter, combined with 1, would give qj[x] < pi[x], contradicting pi[x] < qj[x]
Then, pui[x] < qlj[x]
31. Correctness of 2PL (For reference only)
Lemma 2:
Let H be a 2PL history, and let T1 → T2 → ... → Tn be a path in SG(H) where n > 1. Then for some data items x and y, and some operations p1[x] and qn[y] in H, pu1[x] < qln[y]
Lemma 3:
Every 2PL history is serializable
A cycle T1 → T2 → ... → Tn → T1 would, by Lemma 2, give pu1[x] < ql1[y] for two operations of T1, contradicting property 3 (the two-phase rule); hence SG(H) is acyclic
32. Implementation of 2PL
The lock scheduler is called the lock manager (LM)
LM maintains a lock table and supports the lock operations such as Lock/Unlock(transaction-id, data item, mode)
Lock operations are invoked very frequently, so they must be implemented very efficiently
The lock operations must be atomic (all-or-none)
The lock table is usually implemented as a hash table with the data item identifier as key to reduce the search delay
An entry in the table for data item x contains a queue header, which points to a list of locks on x that have been set and a list of locks requests that are waiting (block queue)
Since the number of data items and locks can be very large, the LM may limit the size of the lock table by allocating entries dynamically
33. Implementation of 2PL
To make lock release operations more efficient, all the read and write locks of a transaction may be linked together
When a transaction commits, all of its locks are released at the same time with one call to the LM
The lock table should be protected and only be accessed by the LM
34. Implementation of 2PL
35. Implementation of 2PL
A lock manager services the operations:
Lock(trans-id, data-item-id, mode)
Unlock(trans-id, data-item-id)
Unlock(trans-id)
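A sketch of this interface over a hash table, as described on slides 32-33 (the entry layout and the wake-up policy are illustrative assumptions):

```python
# Lock table sketch: hashed on the data-item id; each entry keeps the
# granted locks plus a wait queue, and a per-transaction list makes
# Unlock(trans-id) a single call.

from collections import defaultdict, deque

class LockManager:
    def __init__(self):
        self.table = {}                       # item-id -> entry (hash table)
        self.by_txn = defaultdict(set)        # trans-id -> items it locked

    def lock(self, tid, item, mode):          # mode: "r" or "w"
        e = self.table.setdefault(item, {"granted": [], "waiting": deque()})
        if any(t != tid and "w" in (m, mode) for t, m in e["granted"]):
            e["waiting"].append((tid, mode))  # conflict: block the request
            return False
        e["granted"].append((tid, mode))
        self.by_txn[tid].add(item)
        return True

    def unlock(self, tid, item=None):
        # Unlock(trans-id) with no item releases everything in one call
        items = [item] if item is not None else list(self.by_txn[tid])
        for x in items:
            e = self.table[x]
            e["granted"] = [(t, m) for t, m in e["granted"] if t != tid]
            self.by_txn[tid].discard(x)
            # (waking compatible waiters is omitted in this sketch)
            if not e["granted"] and not e["waiting"]:
                del self.table[x]             # dynamic entry deallocation
```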
36. Locking Performance
Resource contention vs. data contention
Resource contention: the workload on the system resources, e.g., CPU
High RC: long queuing delay for processing
Data contention: the probability of data (lock) conflict
High DC: high blocking (or restart) probability
Factors affecting RC
Workload, e.g., the arrival rate of transactions and the processing time to complete a transaction
Transaction restart probability
Factors affecting data contention
Lock granularity (the mapping of a lock to data items)
Multiprogramming level (the number of concurrent transactions)
Transaction size (the number of operations (data items) in a transaction)
Database size (the number of data items in the database)
37. Locking Granularity
Granularity: the size of the data items to lock
e.g., files, pages, records, fields
Coarse granularity (table locking) implies
very few locks, so little locking overhead
must lock large chunks of data, so high chance of conflict, so concurrency may be low
Fine granularity (field locking) implies
many locks, so high locking overhead
lock conflict occurs only when two transactions try to access the exact same data item (i.e., field) concurrently
A high-performance TPS requires record-level locking
38. Reduce Lock Contention
Reduce lock conflict probability:
use finer grained locks, e.g., by partitioning tables vertically
39. Blocking and Lock Thrashing
40. Interesting Sidelights
By getting all locks before a transaction starts (conservative 2PL), you may increase throughput at the thrashing point because blocked transactions hold no locks
Free of deadlock
But you need to know exactly which locks are needed, and retries of get-all-locks are not cheap (why? more lock checking)
Pure restart policy - abort when there is a lock conflict and restart when the conflict disappears
If aborts are cheap and there is low resource contention, then this policy produces higher throughput before thrashing than a blocking policy
But response time could be greater than a blocking policy
41. How to Reduce Lock Contention
42. Hot Spot Techniques
Hot spot - a data item that is more popular than others, so a large number of transactions need it, e.g.:
summary information (total inventory)
end-of-file marker in data entry application
counter used for assigning serial numbers
Hot spots often create a convoy of transactions and the hot spot lock serializes transaction execution
Special techniques are needed to:
keep the hot data in main memory
delay operations on hot data as late as possible, i.e., till commit time
partition hot spot data
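A small sketch of the "delay until commit" technique for a hot counter, reusing the hypothetical LockManager from slide 35 (the class and method names are assumptions):

```python
# Hot-spot sketch: increments are buffered per transaction, and the
# write lock on the hot item is held only for the instant of commit.

class DeferredCounter:
    def __init__(self):
        self.value = 0
        self.pending = {}                  # tid -> buffered delta

    def increment(self, tid, delta=1):
        # no lock taken here: only the intent is recorded
        self.pending[tid] = self.pending.get(tid, 0) + delta

    def apply_at_commit(self, tid, lm):
        # short critical section (assumes the lock is granted;
        # blocking is handled as usual by the lock manager)
        lm.lock(tid, "hot-counter", "w")
        self.value += self.pending.pop(tid, 0)
        lm.unlock(tid, "hot-counter")
```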
43. Distributed 2PL
Approaches:
distributed
central
hybrid
primary copies (for replicated database)
Distributed 2PL
Each site has a local lock scheduler
The local scheduler is responsible for setting the locks at its site
If the lock required by an operation of a transaction is at a remote site, the TM forwards the lock request to that site
The scheduler at the remote site processes the lock request and then the operation
Advantages: fully distributed, fault tolerance, load balancing
Disadvantages: distributed deadlock
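A sketch of the request routing in distributed 2PL (the site directory and scheduler map are illustrative assumptions; LockManager is the sketch from slide 35):

```python
# Distributed 2PL sketch: each site runs a local lock scheduler, and the
# TM forwards every lock request to the site that owns the data item.

class DistributedTM:
    def __init__(self, site_of, schedulers):
        self.site_of = site_of           # item-id -> site-id directory
        self.schedulers = schedulers     # site-id -> local LockManager

    def submit(self, tid, item, mode):
        site = self.site_of[item]        # may be the local or a remote site
        if self.schedulers[site].lock(tid, item, mode):
            return site                  # lock set: process the op at that site
        return None                      # blocked in that site's queue
```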
44. Distributed 2PL
45. Centralized 2PL
Only one lock scheduler in the whole system
The lock scheduler is situated at one of the sites, usually called the central site (or primary site)
The lock requests from all the sites are sent to the central site
The lock scheduler processes lock requests in the same way as in a single-site DB system
After setting the lock, an ack is sent to the TM that initiated the lock request
The TM does not need to send the operation to a remote site for processing; it processes the operation locally
Advantage: no distributed deadlock
Disadvantages: ???
46. Centralized 2PL
There is only one lock scheduler in the system.
Lock requests are issued to the central scheduler.
[Figure: the coordinating TM at a participating site sends lock requests to the LM at the central site; data processors at the participating sites execute the operations]
47. Hybrid 2PL
Combining distributed 2PL and centralized 2PL
There are several lock schedulers in the system but not all the sites have a lock scheduler
A scheduler is responsible for the locking of the data items maintained at one or several sites
Advantages: combining the benefits of distributed and centralized 2PL
Disadvantages: distributed deadlock is possible
Replicated Database
To reduce the access delay of read operations, replicated data may be maintained at several sites
One of the copies is called primary copy
Lock has to be set on the primary copy before the processing of an operation
48. Distributed Conservative 2PL
How to ensure atomicity in setting the locks of a distributed transaction?
A distributed 2 phase commit approach may be needed
Pre-lock at each site
If all the sites can set the pre-locks, the locking coordinator broadcasts a message to all the sites to convert the pre-locks into actual locks
Problem: long holding time in locking and higher communication overheads
Sequential locking approach
The sites are ordered
The locking sequence follows the site order
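A sketch of the sequential locking approach (illustrative assumptions again: site_of and schedulers are the structures from the distributed sketch above). Because every transaction requests its locks in the same global site order, no waits-for cycle can form across sites.

```python
# Sequential locking sketch: lock requests are sorted by (site, item)
# and acquired strictly in that order, which avoids distributed deadlock.

def acquire_in_site_order(tid, wanted, site_of, schedulers):
    """wanted: list of (item, mode) pairs pre-declared by the transaction."""
    for item, mode in sorted(wanted, key=lambda w: (site_of[w[0]], w[0])):
        if not schedulers[site_of[item]].lock(tid, item, mode):
            return False       # queued at this site; resume here when granted
    return True                # all locks set: the transaction may execute
```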
49. References
Özsu: 11.3
Bernstein: ch3 – 3.1, 3.2, 3.4, 3.6, 3.10, 3.12