520 likes | 640 Views
Review Concurrency. Enterprise Systems architecture and infrastructure DT211 4 . Desirable Properties of Transactions. 1. Atomicity : A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all.
E N D
Review Concurrency Enterprise Systems architecture and infrastructure DT211 4
Desirable Properties of Transactions • 1. Atomicity: A transaction is an atomic unit of processing; it is either performed in its entirety or not performed at all. • 2. Consistency preservation: A correct execution of the transaction must take the database from one consistent state to another. • 3. Isolation: A transaction should not make its updates visible to other transactions until it is committed; this property, when enforced strictly, solves the temporary update problem and makes cascading rollbacks of transactions unnecessary. • 4. Durability or permanency: Once a transaction changes the database and the changes are committed, these changes must never be lost because of subsequent failure.
Why Concurrency Control? • A Transaction: logical unit of database processing that includes one or more access operations (read retrieval, write insert or update, delete). • Concurrency control is used to ensure the isolation property of concurrently executing transactions via protocols such as locking, timestamping, optimistic concurrency control…
Concurrency problems • 2 general problems can occur when there is no proper concurrency control: • Lost update: • Temporary update (dirty read) • Essentially they both break the isolation property of database transactions • Make updates visible to other transactions before they are committed to the database • There is a conflict in the schedule
Example 1: Lost Updates Transactions • User X: Updating Customer A account with withdrawal of $50 • User Y: Updating Customer A account with deposit of $25 • Customers balance should be (100 –50 + 25 = 75) USER XUSER Y 1 Read Cust A record (Balance = $100) 1 Read Cust A record (Balance = $100) 2 Bal = Bal - 50 (Balance = $50) 2 Bal = Bal + 25 (Balance = $125) 3 Write Cust A record (Balance = $50) 3 Write Cust A record (Balance = $125) Time
Example 3:Temporary Update Problem Transactions • A update to a product that that falters • A product update that completes USER X USER Y 1 Read Prod A record (QOH = 35) 2 Update QOH (+100) 3 Write Prod A record (QOH = 135) 1 Read Prod A QOH (QOH = 135) 2 Update QOH (-30) (QOH = 105) 4 Failure: Rollback 3 Write Prod A QOH (QOH = 105) Begin Recovery… 4 ….commit;
Schedule Conflict • Two operations in a schedule are said to conflict if they satisfy all three of the following conditions: • 1) They belong to different transactions • 2) They access the same item X. • 3) At least one of the operation is a write_item(X). • Non conflicting schedules are classes as serial or more serialisibility (equivalent to serial schedule) • A serializable schedule gives the benefits of concurrency without giving up correctness.
CLASSIFICATION OF CONCURRENCY CONTROL TECHNIQUES • 1. Locking data items to prevent multiple transactions from accessing the items concurrently; a number of locking protocols have been proposed. • 2. Use of timestamps. A timestamp is a unique identifier • for each transaction, generated by the system. • 3. Optimistic Concurrency Control: based on the concept of validation or certification of a transaction after it executes its operations; these are sometimes called optimistic protocols. They proceed optimistically; back up and repair if needed • 4. Pessimistic protocol: do not proceed until knowing that no back up is needed.
Two-Phase Locking • transaction divided into 2 phases: • – growing - new locks acquired but none released • – shrinking - existing locks released but no new ones acquired
Two-Phase Locking (cont.) • If every transaction in a schedule follows the twophase locking protocol, the schedule is guaranteed to be serializable i.e. no concurrency problems will occur. • The twophase locking protocol guarantees serializability however the use of locks can cause two additional problems: deadlock and starvation.
DEADLOCK PREVENTION: • Use of transaction timestamp TS(T) • Two protocols can be used to prevent or more precisely roll-back one transaction in the case of deadlock. • Wait-die (older transaction waits for a younger one…) • Wound –wait protocol (younger waits for older transaction to finish)
Timestamping protocol Transaction Timestamps: the time the transaction starts Data Timestamps • Read-Timestamp is timestamp of largest timestamp to read that data item • Write-Timestamp is timestamp of largest timestampto write (update) that data item • Timestamping prevents deadlock and starvation
Concurrency control using Timestamp • Basic timestamp methods • Write- operation • When a Transaction attempts a write operation on a data item X it must first check that X has not been read or updated by a younger transaction: proceeds if no and rolled back if yes. • Read operation • When a Transaction attempts a read operation on a data item X it must first check that X has not been updated by a younger transaction.proceeds if no and rolled back if yes.
Concurrency Control based on Timestamps • The basic idea or rules are as follows: • 1. Each transaction receives a timestamp when it is initiated at its site of origin. • 2. Each read or write operation which is required by a transaction has the timestamp of the transaction. • 3. For each data item x, the largest timestamp of a read operation and the largest timestamp of a write operation are recorded; they will be indicated as TRD(x) and TWR(x) • 4. Let T be the timestamp of a read operation on data item x. If T < TWR(x), the read operation is rejected and the issuing transaction restarted with a new timestamp; otherwise, the read is executed, and TRD(x) = max(TRD(x), T). • 5. Let T be the timestamp of a write operation on data item x. If T < TWR(x) or T < TRD(x), then the operation is rejected and the issuing transaction is restarted; otherwise, the write is executed, and TWR(x) = T.
Optimistic Currency Control Conditions • Small number of conflicts likely – no ‘hotspots’ for writing Technique Each transaction has up to 3 phases. • Read phase • Execution phase of transaction. • All writes are done in private buffer space. • RS(T): all items read by transaction T • WS(T): all items written by transaction T • Validation Phase • Check is performed to ensure that writes can be installed in the database without violating serializability. • Write Phase • If validation succeeds, actual updates are applied to the database. • If validation fails, the transaction is aborted.
Question to consider • Explain / illustrate each of the concurrency control methods prevents the violation of the ACID properities.
Review Recovery Denis Manley Enterprise Systems DT211 4
RECOVERY TECHNIQUES ARE NEEDED BECAUSE TRANSACTIONS MAY FAIL • A computer failure or system crash: A hard ware or software error occurs during transaction execution. • Concurrency control enforcement: The concurrency control method may decide to abort the transaction • Disk failure: Some disk blocks may lose their data because of a read or write malfunction or because of a disk read/write head crash. • Physical problems and catastrophes: This refers to an endless list of problems that includes power or airconditioning failure, fire. Need disaster recovery as well for such problems .
The System Log • T is the system generated transaction-id. • 1. [start_transaction,T]: Records that transaction T has started execution. • 2. [write_item,T,X,old_value,new_value]: Records that transaction T has changed the value of database item X from old_value to new_value. • 3. [read_item,T,X]: Records that transaction T has read the value of database item X • 4. [commit,T]: Records that transaction T has completed successfully, and affirms that its effect can be committed to the DB. • 5. [abort,T]: Records that transaction T has aborted.
RECOVERY USING LOG RECORDS • If the system crashes, we can recover to a consistent database state by examining the log. • It is possible to undo the effect of these WRITE operations of a transaction T by tracing backward through the log and resetting all items changed by a WRITE operation of T to their old_values. • We can also redo the effect of the WRITE operations of a transaction T by tracing forward through the log and setting all items changed by a WRITE operation of T to their new values.
Commit Point • A transaction T reaches a commit point when all its operations that access the DB have been executed successfully and the effect of all the transaction operations on the DB have been recorded in the log.
Undo/redo • UNDO/REDO (Immediate update): • write-ahead to log on disk • update database anytime • commit allowed before database is completely updated • Goal: Maximize efficiency during normal operation. • Some extra work is required during recovery time. • Following a failure, the following is done. • Redo all transactions for which the log has both “start” and “commit” entries. • Undo all transactions for which the log has “start” entry but no “commit” entry. 24
Example of undo/redo • We consider two transactions executed sequentially by the system. • T1: Read(A) T2: Read(A) A A + 50 A A +10 Read(B) Write(A) B B + 100 Read(D) Write(B) D D -10 Read(C) Read(E) C 2C Read(B) Write(C) E E + B A A + B + C Write(E) Write (A) D D + E Write(D) • The initial values are: A=100 B=300 C=5 D=60 E=80
Example (cont) • The Log 1. <T1 starts> 2. <T1, B, old: 300, new: 400> 3. <T1, C, old: 5, new: 10> 4. <T1, A, old: 100, new: 560> 5. <T1 commits> 6. <T2 starts> 7. <T2, A, old: 560, new: 570> 8. <T2, E, old: 80, new: 480> 9. <T2, D, old: 60, new: 530> 10. <T2 commits> • Output of B can occur anytime after entry 2 is output to the log, etc. Determine action at T = 1, 1 =< T =< 4, 5= < T=< 9, T =10
Example (cont) • Assume a system crash occurs. The log is examined. Various actions are taken depending on the last instruction (actually) written on it.
No-Undo/Redo • NO-UNDO/REDO (Deferred update): • don’t change database until ready to commit • write-ahead to log to disk • change the database after commit is recorded in the log • Advantages • Faster during recovery: no undo. • No before images needed in log. • Disadvantages • Database outputs must wait. • Lots of extra work at commit time.
Undo/No-Redo • UNDO/NO-REDO (Immediate update): • All changed data items need to be output to the disk before commit. • Requires that the write entry first be output to the (stable) log. • At commit: • Output (flush) all changed data items in the cache. • Add commit entry to log. • Advantages • No after images are needed in log. • No transactions need to be redone. • Disadvantages • data requires a flush for each committed write. • Implies lots of I/O traffic.
No-Undo/No-Redo • NO-UNDO/NO-REDO (shadow paging): • No-undo don't change the database during a transaction • No-redo on commit, write changes to the database in a single atomic action • Advantages • Recovery is instantaneous. • No recovery code need be written.
y x x z z z y x z y x z y x z x y y Last committed value of x Last committed value of x Last committed value of x Last committed value of y Last committed value of y Last committed value of y Last committed value of z Last committed value of z Last committed value of z New version of x New version of x New version of x New version of y New version of y New version of y Master Master Master Shadow paging • During a transaction: • After preparing new directory for commit: • After committing:
Checkpointing • Checkpointing speeds up recovery by flushing pages to disk. • During the execution in addition to the activities of the previous method, periodically perform checkpointing. • Output the log buffers to the log. • Force database buffers to the disk. • Output an entry <checkpoint> on the log. • During recovery • Undo all transactions that have not committed. • Redo all transactions that have committed after checkpoint.
T1 T2 T3 Recovery with Checkpoints Tc Tf Time • If the protocol is undo/redo then: • T1 is ok. • T2 and T3 are redone. • T4 is undone T4 Checkpoint System Failure
Question to consider • How might you determine the most appropriate recovery method for your database system?
Denis Manley Enterprise Systems DT2114 Review Query Optimisation
Selection • Primary key, pointsFilmID = 000002 (Film) • PointsTitle = ‘Terminator’ (Film) • Ranges1 < RentalPrice < 4 (Film) • ConjunctionsType = ‘M’ (AND) Distributor = ‘MGM’ (Film) • Selection contains only tuples that satisfy all conjuncts • DisjunctionsPubDate < 1990 (OR) Distributor = ‘MGM’ (Film) • Selection contains tuples formed by the union of all tuples that satisfy the disjuncts
Query Optimization • Transform query into faster, equivalent query equivalent query 1 equivalent query 2 faster query query ... equivalent query n • Heuristic (logical) optimization • Query tree (relational algebra) optimization • Query graph optimization • Cost-based (physical) optimization
Steps in typical Heuristics Optimisation Step 1: Decompose s operations. Step 2: Move s as far down the query tree as possible. Step 3: Rearrange leaf nodes to apply the most restrictive s operations first. Step 4: Form joins from and subsequent s operations. Step 5: Decompose p and move down the query tree as far as possible. Step 6: Identify candidates for combined operations.
Query Tree Optimization Example • What are the names of customers living on Elm Street who have checked out “Terminator”? • SQL query:SELECT NameFROM Customer CU, CheckedOut CH, Film FWHERE T.Title = ’Terminator’ AND F.FilmId = CH.FilmIDAND CU.CustomerID = CH.CustomerID and CU.Street = ‘Elm’
Canonical Query Tree pName sTitle = ‘Terminator’ F.FilmId = CH.FilmID CU.CustomerID = CH.CustomerID CU.Street = ‘Elm’ F CH CU
Apply Selections Early pName s F.FilmId = CH.FilmID sCU.CustomerID = CH.CustomerID sTitle = ‘Terminator’ F sStreet = ‘Elm’ CH CU
Apply More Restrictive Selections Early pName s CU.CustomerID = CH.CustomerID s F.FilmId = CH.FilmID sStreet = ‘Elm’ CU sTitle = ‘Terminator’ CH F
Form Joins pName ⋈CU.CustomerID = CH.CustomerID ⋈F.FilmId = CH.FilmID sStreet = ‘Elm’ CU CH sTitle = ‘Terminator’ F
Apply Projections Early pName ⋈CU.CustomerID = CH.CustomerID ⋈F.FilmId = CH.FilmID pFilmID, CustomerID sStreet = ‘Elm’ pFilmID pFilmID, CustomerID CU sTitle = ‘Terminator’ CH F
Example: Identify Combined Operations 4 pName nCU.CustomerID = CH.CustomerID nF.FilmId = CH.FilmID pFilmID, CustomerID 2 3 sStreet = ‘Elm’ pFilmID pFilmID, CustomerID 1 CU sTitle = ‘Terminator’ CH F
Cost-Based Optimization • Use transformations to generate multiple candidate query trees from the canonical query tree. • Measuring Cost • Typically disk access is the predominant cost, and is also relatively easy to estimate. • Therefore number of block transfers from disk is used as a measure of the actual cost of evaluation. • It is assumed that all transfers of blocks have the same cost. • Do not include cost to writing output to disk. • Cost formulas estimate the cost of executing each operation in each candidate query tree. • The candidate query tree with the least total cost is selected for execution.
Relevant Statistics • Per relation • Tuple size • Number of tuples (records): r • Per attribute • Attribute average size • Number of distinct values for attribute A: dA
Cost Estimation Example 4 pName ⋈CU.CustomerID = CH.CustomerID ⋈F.FilmId = CH.FilmID pFilmID, CustomerID 2 3 sStreet = ‘Elm’ pFilmID pFilmID, CustomerID 1 CU sTitle = ‘Terminator’ CH F
Operation 1: s followed by a p • Statistics • Relation statistics: rFilm= 5,000 • Attribute statistics: sTitle= 1 • Result relation size: 1 tuple. • Cost (in disk accesses): C1 = 1 • Statistics • Relation statistics: rCheckedOut= 40,000 • Attribute statistics: sFilmID= 8 • Result relation size: 8 tuples. • Cost: C2 = 8
Operation 3: s followed by a p • Statistics • Relation statistics: rCustomer= 200 • Attribute statistics:sStreet= 10 • Result relation size: 10 tuples. • Cost: C3 = 10 • Operation: Main memory join on relations in main memory. • Cost: C4 = 0 Total cost: 19