1. Two Phase Locking (2PL): System Components and Approaches of CC
Two Phase Locking
Protocol & variants
basic 2PL, strict (dynamic) 2PL & conservative (static) 2PL
Performance and implementation issues
CC in distributed database systems
2. Components in a Database System
Single-site database system:
transaction manager (TM)
scheduler (S)
recovery manager (RM)
cache manager (CM)
RM + CM together are called the data manager (DM)
TM:
The interface between applications and the rest of the system
Performs pre-processing
Monitors the execution of transactions
Receives operations from transactions
Forwards operations to the scheduler
Handles transaction termination (commit or abort)
3. Components in a Database System
Scheduler:
It controls the execution order of operations from the same or from different transactions
It may delay, reject, or immediately forward an operation to the DM
Operation arrival order vs. operation execution order (not necessarily FCFS)
DM:
It manipulates storage (cache and disk) by providing operations to fetch data from stable storage into volatile storage and to flush data from volatile storage to stable storage
The RM is responsible for the commit and abort of transactions
Distributed DB model:
A communication manager coordinates among the different sites
4. Components in a Database System
5. Centralized Transaction Execution
6. Distributed Transaction Execution
7. Purposes of Concurrency Control
Concurrent execution of transactions: transactions may request access to the same data item at the "same time"
Data conflict (data contention): transaction T1 has accessed data item x; before T1 commits, another transaction T2 wants to access x
Serial execution is good for database consistency but bad for performance
no data conflict
Concurrent execution of transactions is good for performance but may result in an inconsistent database
The bad effect is permanent (the inconsistency stays in the database)
Rules are defined to control the interleaving in transaction execution such that all schedules are correct (serializable) and the resulting database is consistent (with low processing overhead)
8. Approaches in Concurrency Control
Detection method (e.g., serialization graph testing; also called optimistic or aggressive)
When a transaction wants to commit, it is allowed to commit if the schedule is serializable; otherwise, it is aborted (or restarted)
May need to restart a large number of transactions (heavy undo and redo overheads)
Requires high overhead to search the graphs
Preventive method (pessimistic or conservative)
Rules are defined so that all the serialization graphs are acyclic
Use blocking (operations and transactions) to prevent the formation of any cyclic SG
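To make the detection approach above concrete, here is a minimal Python sketch (an illustration, not from the slides: the adjacency-map encoding and the function names are assumptions). The scheduler records conflict edges Ti → Tj in a serialization graph and runs a DFS cycle test when a transaction asks to commit.

```python
# Minimal sketch of the detection (optimistic) approach: test the
# serialization graph for a cycle at commit time.

def has_cycle(graph):
    """DFS cycle test on an adjacency map {node: set of successors}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, ()):
            c = color.get(m, WHITE)
            if c == GRAY:                 # back edge: a cycle exists
                return True
            if c == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)

def try_commit(sg, tid):
    """Allow the commit only if the SG is (still) acyclic."""
    return not has_cycle(sg)              # False means abort/restart tid

# T1 -> T2 (r1[x] < w2[x]) and T2 -> T1 (w2[y] < w1[y]) form a cycle:
sg = {"T1": {"T2"}, "T2": {"T1"}}
print(try_commit(sg, "T1"))               # False: T1 must be restarted
```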
9. 2 Phase Locking
Similar to the management of critical sections in operating systems
The scheduler (lock manager) maintains a lock table
Rules are defined to determine under which conditions a transaction is allowed to set a lock
Before a transaction is allowed to access a data item, it has to set a lock corresponding to the data item
Mapping between a lock and data items (1-to-1, or 1-to-many)
What is a data item? (Record? Table? Or a field?)
Blocking is used to resolve data conflict (lock conflict)
There are two modes for locking
read mode for read operations (shared locks)
write mode for write operations (exclusive locks)
10. 2 Phase Locking (Example)
11. 2 Phase Locking
Lock compatibility table
Basic Rules
Growing phase
The scheduler sets locks for a transaction based on the data items required by the operations it receives
Shrinking phase
Once the scheduler has released a lock for a transaction, it may not subsequently obtain any more locks for that transaction
Number of locks belonging to a transaction
increases (up) initially and then decreases (down) to none (two phases)
12. Lock Compatibility Table
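The matrix itself was a figure and did not survive the export; reconstructed here is the standard shared/exclusive compatibility it depicts, consistent with the read/write modes on slide 9:

                 lock held on x
lock requested   read (shared)    write (exclusive)
read             compatible       conflict
write            conflict         conflict

Two read locks are compatible; any pairing that involves a write lock conflicts.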
13. Variants of 2PL
For single-site DBS:
Basic 2PL
Strict 2PL (dynamic 2PL)
Conservative 2PL (static 2PL)
For distributed database systems
Centralized lock scheduler
Distributed lock schedulers
Hybrid lock schedulers
14. Basic 2 Phase Locking
Rule 1 (Growing Phase)
When the scheduler receives an operation pi[x] from the TM (from transaction Ti), it tests whether pli[x] conflicts with some qlj[x] that is already set in the lock table. If so, it delays (blocks) pi[x], forcing Ti to wait until it can set the lock it needs
If there is no lock conflict, the scheduler sets pli[x] and then sends pi[x] to the DM for processing
Rule 2 (Growing Phase)
Once the scheduler has set a lock for Ti , say pli[x], it may not release that lock at least until after the DM acknowledges that it has processed the lock’s corresponding operation, pi[x]
Rule 3 (Shrinking Phase)
Once the scheduler has released a lock for a transaction, it may not subsequently obtain any more locks for that transaction
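A minimal Python sketch of these three rules (illustrative only: the scheduler class, the lock-table layout, and the dm helper with its process/acknowledged calls are assumptions, not part of the slides):

```python
# Sketch of a Basic 2PL scheduler enforcing Rules 1-3. "dm" stands for
# a hypothetical data manager object.

READ, WRITE = "r", "w"
COMPATIBLE = {(READ, READ)}        # only two read locks are compatible

class Basic2PLScheduler:
    def __init__(self, dm):
        self.dm = dm
        self.locks = {}            # x -> [(tid, mode), ...] locks set on x
        self.blocked = []          # delayed operations (Rule 1)
        self.shrinking = set()     # transactions that have released a lock

    def receive(self, tid, mode, x):
        # Rule 3: once Ti has released any lock, it may not obtain more
        assert tid not in self.shrinking, "two-phase rule violated"
        holders = self.locks.setdefault(x, [])
        if any(t != tid and (m, mode) not in COMPATIBLE for t, m in holders):
            self.blocked.append((tid, mode, x))   # Rule 1: delay on conflict
            return
        holders.append((tid, mode))               # set pl_i[x] ...
        self.dm.process(tid, mode, x)             # ... then forward p_i[x]

    def release(self, tid, x):
        # Rule 2: release only after the DM acknowledged the operation
        assert self.dm.acknowledged(tid, x)
        self.locks[x] = [(t, m) for t, m in self.locks[x] if t != tid]
        self.shrinking.add(tid)                   # shrinking phase begins
```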
15. Basic 2 Phase Locking
Rule 1: prevents two transactions from concurrently accessing the same data item in conflicting modes
Rule 2: ensures that the DM processes operations on a data item in the order that the scheduler submits them to the DM
Rule 3: ensures the two-phase rule (one growing phase followed by one shrinking phase)
Note:
The (basic) rules do not specify when the commit/abort operation may be performed relative to the lock release operations
The lock release operations may be performed before the commit/abort operation (NOT strict; this can cause the premature-write problem, unrecoverable executions, and cascading aborts)
Note the difference between the arrival order of operations to the scheduler and the execution order (by DM) of operations
16. Basic 2 Phase Locking
17. Why 2 Phase Rules
Example (NOT 2PL):
H1 = rl1[x] r1[x] ru1[x] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2 wl1[y] w1[y] wu1[y] c1
SG(H1) is T1 → T2 → T1: it is non-serializable
The problem is that T1 releases its lock on x before it gets its lock on y
If T1 releases its lock on x after it gets its lock on y, then the history will be:
H2 = rl1[x] r1[x] wl1[y] w1[y] c1 ru1[x] wu1[y] wl2[x] w2[x] wl2[y] w2[y] wu2[x] wu2[y] c2
SG(H2) is T1 → T2: it is serial.
(Locking is a conservative approach)
18. 2 Phase Locking Example
Detailed mechanism of the last example:
Initially, neither transaction owns any locks
The scheduler receives r1[x] from TM. Accordingly, it performs rl1[x] and submits r1[x] to DM
Then DM acknowledges the processing of r1[x] to the scheduler
The scheduler receives w2[x] from the TM. It cannot perform wl2[x], which conflicts with rl1[x], so it delays the execution of w2[x] by placing it in a block queue
The scheduler receives w1[y] from TM. It performs wl1[y] and submits w1[y] to DM.
Then the DM acks the processing of w1[y]
The scheduler receives c1 from the TM, signaling that T1 has terminated. The scheduler sends c1 to the DM. After the DM acks the processing of c1, the scheduler releases rl1[x] and wl1[y]
19. 2 Phase Locking Example
The scheduler performs wl2[x] so that w2[x], which has been delayed, can now be sent to the DM. The DM then acks w2[x]
The scheduler receives w2[y] from the TM. It sets wl2[y] and sends w2[y] to the DM. The DM then acks the processing of w2[y]
T2 terminates and the TM sends c2 to the scheduler. The scheduler sends c2 to the DM. After the DM acks the processing of c2, the scheduler releases wl2[x] and wl2[y]
20. 2 Phase Locking (Example)
21. Strict 2 Phase Locking
Basic 2PL:
The lock release time may be before the commit of a transaction
B2PL cannot prevent cascading aborts and may be unrecoverable
e.g., wl1[x] w1[x] wu1[x] rl2[x] r2[x] c2 .... c1
Strict 2PL (dynamic 2PL)
It differs from Basic 2PL in that it requires the scheduler to release all the locks of a transaction together, after the transaction has terminated
Holds the locks until after the commit of a transaction
e.g., wl1[x] w1[x] c1 wu1[x] rl2[x] r2[x] .... c2
It results in strict execution (no premature write)
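Building on the hypothetical Basic2PLScheduler sketched under slide 14, the only difference in Strict 2PL is where release happens: nothing is released until the commit has been processed, and then everything goes at once.

```python
# Strict 2PL sketch: locks are held past the commit and released together.

class Strict2PLScheduler(Basic2PLScheduler):
    def commit(self, tid):
        self.dm.process(tid, "c", None)    # c_i reaches the DM first
        for x in list(self.locks):         # only now: release all of tid's locks
            self.locks[x] = [(t, m) for t, m in self.locks[x] if t != tid]
        self.shrinking.add(tid)
```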
22. Strict 2 Phase Locking
Hold locks until after the commit.
23. Deadlock Problem in 2PL
Deadlock:
2PL may result in deadlock (both B2PL and S2PL)
Probability of deadlock (lock conflict) depends on (see the estimate after this list):
The number of locks required by a transaction
The total number of locks, which are locked by other transactions
The total number of locks in the system
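As a rough back-of-envelope estimate (an illustrative assumption, not from the slides): if n other transactions each hold k locks drawn uniformly from a database of D items, each of a transaction's own k lock requests conflicts with probability about nk/D, so

\[
P(\text{block}) \;\approx\; 1 - \left(1 - \frac{nk}{D}\right)^{k} \;\approx\; \frac{nk^{2}}{D}
\qquad (nk^{2} \ll D)
\]

i.e., conflict probability grows with the multiprogramming level and (quadratically) with transaction size, and shrinks with database size.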
Lock conversion problem
If a transaction tries to strengthen (upgrade) a read lock to a write lock, deadlock may occur; e.g., T1 and T2 both hold read locks on x, and each then requests a write lock on x, blocking on the other's read lock
24. Lock Conversion Problem
25. Lock Conversion Problem
26. Conservative 2PL
Conservative 2PL (Static 2PL): "static" refers to the fact that the number of locks belonging to a transaction is fixed after it has started execution
It requires a transaction to pre-declare its read-set and write-set of data items
The scheduler tries to set all of the locks needed by a transaction before the start of execution of any of its operations
If the scheduler can set all the locks, the processing of a transaction may start
Otherwise, none of the transaction’s locks will be set
It inserts the transaction and its lock requests into a block queue
Every time the scheduler releases a lock, the queued transactions' lock requests are checked again
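A minimal sketch of the all-or-none lock acquisition, again reusing the hypothetical Basic2PLScheduler from slide 14 (the method names and queue format are assumptions):

```python
# Conservative (Static) 2PL sketch: pre-declared read/write sets are
# locked atomically before execution starts, or not at all.

class Conservative2PLScheduler(Basic2PLScheduler):
    def start(self, tid, read_set, write_set):
        wanted = [(x, READ) for x in read_set] + \
                 [(x, WRITE) for x in write_set]
        if all(self._lockable(tid, x, m) for x, m in wanted):
            for x, m in wanted:                   # set all the locks ...
                self.locks.setdefault(x, []).append((tid, m))
            return True                           # ... then execution starts
        self.blocked.append((tid, wanted))        # none set: queue as a unit
        return False                              # re-checked at each release

    def _lockable(self, tid, x, mode):
        holders = self.locks.get(x, [])
        return all(t == tid or (m, mode) in COMPATIBLE for t, m in holders)
```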
27. Conservative 2PL (Example)
28. Comparing S2PL & C2PL
Probability of lock conflict
Conservative 2PL is higher
Locking overhead
Conservative 2PL is higher
Number of locks
Conservative 2PL is greater
Deadlock
Possible under Strict 2PL but not under Conservative 2PL
Conservative 2PL may not be feasible for some systems (Why?)
29. Correctness of 2PL (For reference only)
To prove that all histories generated according to 2PL are serializable:
1: If oi[x] is in C(H), then oli[x] and oui[x] are in C(H), and oli[x] < oi[x] < oui[x]
2: If pi[x] & qj[x] are conflicting operations in C(H), then either pui[x] < qlj[x] or quj[x] < pli[x]
3: If pi[x] & qi[y] are in C(H), then pli[x] < qui[y]
30. Correctness of 2PL (For reference only)
Lemma 1:
Let H be a 2PL history, and suppose Ti → Tj is in SG(H). Then, for some data item x and some conflicting operations pi[x] & qj[x] in H, pui[x] < qlj[x]
Proof:
Since Ti → Tj, there must exist conflicting operations pi[x] and qj[x] such that pi[x] < qj[x].
From 1,
pli[x] < pi[x] < pui[x] and
qlj[x] < qj[x] < quj[x]
From 2,
either pui[x] < qlj[x] or quj[x] < pli[x]; the latter, combined with 1, would give qj[x] < pi[x], contradicting pi[x] < qj[x]
Then, pui[x] < qlj[x]
31. Correctness of 2PL (For reference only)
Lemma 2:
Let H be a 2PL history, and let T1 → T2 → ... → Tn be a path in SG(H) where n > 1. Then for some data items x and y, and some operations p1[x] and qn[y] in H, pu1[x] < qln[y]
Lemma 3:
Every 2PL history is serializable
A cycle T1 → T2 → ... → Tn → T1 would, by Lemma 2, give pu1[x] < ql1[y] for two operations of T1, contradicting property 3 (the two-phase rule); hence SG(H) is acyclic
32. Implementation of 2PL
The lock scheduler is called the lock manager (LM)
LM maintains a lock table and supports the lock operations such as Lock/Unlock(transaction-id, data item, mode)
Lock operations are invoked very frequently, so they must be implemented very efficiently
The lock operations must be atomic (all-or-none)
The lock table is usually implemented as a hash table with the data item identifier as key to reduce the search delay
An entry in the table for data item x contains a queue header, which points to a list of locks on x that have been set and a list of locks requests that are waiting (block queue)
Since the number of data items and locks can be very large, the LM may limit the size of the lock table by allocating entries dynamically
33. Implementation of 2PL
To make lock release operations more efficient, all the read and write locks of a transaction may be linked together
When a transaction commits, all of its locks are released at the same time with one call to the LM
The lock table should be protected and only be accessed by the LM
34. Implementation of 2PL
35. Implementation of 2PL
A lock manager services the operations:
Lock(trans-id, data-item-id, mode)
Unlock(trans-id, data-item-id)
Unlock(trans-id)
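A sketch of this interface over a hash table, as described on slides 32-33 (the entry layout and the wake-up policy are illustrative assumptions):

```python
# Lock table sketch: hashed on the data-item id; each entry keeps the
# granted locks plus a wait queue, and a per-transaction list makes
# Unlock(trans-id) a single call.

from collections import defaultdict, deque

class LockManager:
    def __init__(self):
        self.table = {}                       # item-id -> entry (hash table)
        self.by_txn = defaultdict(set)        # trans-id -> items it locked

    def lock(self, tid, item, mode):          # mode: "r" or "w"
        e = self.table.setdefault(item, {"granted": [], "waiting": deque()})
        if any(t != tid and "w" in (m, mode) for t, m in e["granted"]):
            e["waiting"].append((tid, mode))  # conflict: block the request
            return False
        e["granted"].append((tid, mode))
        self.by_txn[tid].add(item)
        return True

    def unlock(self, tid, item=None):
        # Unlock(trans-id) with no item releases everything in one call
        items = [item] if item is not None else list(self.by_txn[tid])
        for x in items:
            e = self.table[x]
            e["granted"] = [(t, m) for t, m in e["granted"] if t != tid]
            self.by_txn[tid].discard(x)
            # (waking compatible waiters is omitted in this sketch)
            if not e["granted"] and not e["waiting"]:
                del self.table[x]             # dynamic entry deallocation
```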
36. Locking Performance
Resource contention vs. data contention
Resource contention: the workload on the system resources, e.g., CPU
High RC: long queuing delay for processing
Data contention: the probability of data (lock) conflict
High DC: high blocking (or restart) probability
Factors affecting RC
Workload, e.g., the arrival rate of transactions and the processing time to complete a transaction
Transaction restart probability
Factors affecting data contention
Lock granularity (the mapping of a lock to data items)
Multiprogramming level (the number of concurrent transactions)
Transaction size (the number of operations (data items) in a transaction)
Database size (the number of data items in the database)
37. Locking Granularity
Granularity: the size of the data items to lock
e.g., files, pages, records, fields
Coarse granularity (table locking) implies
very few locks, so little locking overhead
must lock large chunks of data, so high chance of conflict, so concurrency may be low
Fine granularity (field locking) implies
many locks, so high locking overhead
lock conflict occurs only when two transactions try to access the exact same data item (i.e., field) concurrently
A high-performance TPS requires record-level locking
38. Reduce Lock Contention
Reduce lock conflict probability:
use finer grained locks, e.g., by partitioning tables vertically
39. Blocking and Lock Thrashing
40. Interesting Sidelights
By getting all locks before a transaction starts (conservative 2PL), you may increase throughput at the thrashing point because blocked transactions hold no locks
Free of deadlock
But you need to know exactly which locks are needed, and retries of get-all-locks are not cheap (why? more lock checking)
Pure restart policy - abort when there is a lock conflict and restart when the conflict disappears
If aborts are cheap and there is low resource contention, then this policy produces higher throughput before thrashing than a blocking policy
But response time could be greater than a blocking policy
41. How to Reduce Lock Contention
42. Hot Spot Techniques
Hot spot - a data item that is more popular than others, so a large number of transactions need it, e.g.:
summary information (total inventory)
end-of-file marker in data entry application
counter used for assigning serial numbers
Hot spots often create a convoy of transactions and the hot spot lock serializes transaction execution
Special techniques are needed to:
keep the hot data in main memory
delay operations on hot data as late as possible, i.e., till commit time
partition hot spot data
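A small sketch of the "delay until commit" technique for a hot counter, reusing the hypothetical LockManager from slide 35 (the class and method names are assumptions):

```python
# Hot-spot sketch: increments are buffered per transaction, and the
# write lock on the hot item is held only for the instant of commit.

class DeferredCounter:
    def __init__(self):
        self.value = 0
        self.pending = {}                  # tid -> buffered delta

    def increment(self, tid, delta=1):
        # no lock taken here: only the intent is recorded
        self.pending[tid] = self.pending.get(tid, 0) + delta

    def apply_at_commit(self, tid, lm):
        # short critical section (assumes the lock is granted;
        # blocking is handled as usual by the lock manager)
        lm.lock(tid, "hot-counter", "w")
        self.value += self.pending.pop(tid, 0)
        lm.unlock(tid, "hot-counter")
```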
43. Distributed 2PL
Approaches:
distributed
central
hybrid
primary copies (for replicated database)
Distributed 2PL
Each site has a local lock scheduler
The local scheduler is responsible for setting the locks at its site
If the lock required by an operation of a transaction is at a remote site, the TM forwards the lock request to that site
The scheduler at the remote site processes the lock request and then the operation
Advantages: fully distributed, fault tolerance, load balancing
Disadvantages: distributed deadlock
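A sketch of the request routing in distributed 2PL (the site directory and scheduler map are illustrative assumptions; LockManager is the sketch from slide 35):

```python
# Distributed 2PL sketch: each site runs a local lock scheduler, and the
# TM forwards every lock request to the site that owns the data item.

class DistributedTM:
    def __init__(self, site_of, schedulers):
        self.site_of = site_of           # item-id -> site-id directory
        self.schedulers = schedulers     # site-id -> local LockManager

    def submit(self, tid, item, mode):
        site = self.site_of[item]        # may be the local or a remote site
        if self.schedulers[site].lock(tid, item, mode):
            return site                  # lock set: process the op at that site
        return None                      # blocked in that site's queue
```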
44. Distributed 2PL
45. Centralized 2PL
Only one lock scheduler in the whole system
The lock scheduler is situated at one of the sites, usually called the central site (or primary site)
The lock requests from all the sites are sent to the central site
The lock scheduler processes lock requests in the same way as in a single-site DB system
After setting the lock, an ack is sent to the TM that initiated the lock request
The TM does not need to send the operation to a remote site for processing; it processes the operation locally
Advantage: no distributed deadlock
Disadvantages: ???
46. Centralized 2PL
There is only one lock scheduler in the system.
Lock requests are issued to the central scheduler.
[Figure: the coordinating TM at a participating site sends lock requests to the LM at the central site; data processors at the participating sites execute the operations]
47. Hybrid 2PL
Combining distributed 2PL and centralized 2PL
There are several lock schedulers in the system but not all the sites have a lock scheduler
A scheduler is responsible for the locking of the data items maintained at one or several sites
Advantages: combining the benefits of distributed and centralized 2PL
Disadvantages: distributed deadlock is possible
Replicated Database
To reduce the access delay of read operations, replicated data may be maintained at several sites
One of the copies is called primary copy
Lock has to be set on the primary copy before the processing of an operation
48. Distributed Conservative 2PL
How to ensure atomicity in setting the locks of a distributed transaction?
A distributed 2 phase commit approach may be needed
Pre-lock at each site
If all the sites can set the pre-locks, the locking coordinator broadcasts a message to all the sites to convert the pre-locks into actual locks
Problem: long holding time in locking and higher communication overheads
Sequential locking approach
The sites are ordered
The locking sequence follows the site order
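A sketch of the sequential locking approach (illustrative assumptions again: site_of and schedulers are the structures from the distributed sketch above). Because every transaction requests its locks in the same global site order, no waits-for cycle can form across sites.

```python
# Sequential locking sketch: lock requests are sorted by (site, item)
# and acquired strictly in that order, which avoids distributed deadlock.

def acquire_in_site_order(tid, wanted, site_of, schedulers):
    """wanted: list of (item, mode) pairs pre-declared by the transaction."""
    for item, mode in sorted(wanted, key=lambda w: (site_of[w[0]], w[0])):
        if not schedulers[site_of[item]].lock(tid, item, mode):
            return False       # queued at this site; resume here when granted
    return True                # all locks set: the transaction may execute
```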
49. References
Özsu: 11.3
Bernstein: ch3 – 3.1, 3.2, 3.4, 3.6, 3.10, 3.12