720 likes | 733 Views
Learn about logging systems, serializability, dirty data problems, and recovery methods in transaction management for databases. Understand different lock notions, data consistency, schedulers, and recoverable schedules in this detailed chapter.
E N D
Brice – 遠山研 – 教育輪講 More about transacion management Brice - 教育輪講
About this chapter... • Assume that Chapter 17 and 18 are known... • ??? • Cover issues that were not addressed yet about transaction management Brice - 教育輪講
Overview of Chapter 17... • Logging system • Used to recover the DB state when it crashes • In case of a crash, it is possible to reconsctruct the actions of the committed transactions on the disk copy of the database • No attempt to support serialization • Views of database computation • Values move between non volatile disk, volatile memory and local address space of transaction Brice - 教育輪講
Overview of Chapter 18... • Serializability • Outcome is the same as if transactions were executed sequentially • Schedulers • Allow more possibilities than log managers • Can overwrite a locked element A before committing • Can write and abort without undoing the writing and the scheduler should maintain serializability Brice - 教育輪講
Outline of the presentation • Dirty data problem • Deadlocks • Long transactions Brice - 教育輪講
Outline of the presentation • Dirty data problem • Deadlocks • Long transactions Brice - 教育輪講
Notations with transactions • We use the following notations : • l : lock • xl : exclusive lock • sl : shared lock • u : unlock • r : read • w : write Brice - 教育輪講
The dirty-data problem • Dirty data • A data that has been written by a transaction that is not yet committed • Can appear either in the buffers or on disk • Both can cause trouble Brice - 教育輪講
Dirty-data w/ serializable schedule • Example with serializable schedule • 2 transactions • Shared lock • Can be any number of SL for any element X • Allow a transaction to read the value of X • Exclusive lock • Only 1 or 0 EL on any element X • Allow to read AND write Brice - 教育輪講
Dirty-data w/ serializable schedule T1 T2 A B 25 25 l1(A); r1(A); A := A+100; w1(A); l1(B); u1(A); 125 l2(A); r2(A); A := A*2; w2(A); l2(B) Denied 250 r1(B); Abort; u1(B); l2(B); u2(A); r2(B); B := B*2; w2(B); u2(B); 50 Brice - 教育輪講
Dirty-data w/ serializable schedule • After reading B, T1 abort • The scheduler the lock on B that T1 obtained • If not, B would be forever unavailable to any other transaction • The value that T2 has read is not “consistent” • The A read by T2 has been changed with T1 • The B value read by T2 is prior to T1’s actions • A written by T1 is dirty data • We would need to rollback both T2 and T1 Brice - 教育輪講
Dirty-data w/ timestamp scheduler • Example with timestamp-based scheduler • 3 transactions • Notations : • RT(X) : read time of X • Highest timestamp of a transaction that has read X • WT(X) : write time of X • Highest timestamp of a transaction that has written X • C(X) : commit bit for X is not used • True if and only if the most recent transaction to write X has been committed Brice - 教育輪講
Dirty-data w/ timestamp scheduler T1 T2 T3 A B C 200 150 175 RT=0 RT=0 RT=0 WT=0 WT=0 WT=0 WT=150 w2(B); r1(B); RT=150 r2(A); r3(C); RT=175 w2(C); Abort; WT=0 w3(A); WT=175 Brice - 教育輪講
Dirty-data w/ timestamp scheduler • When T1 read B, T1 does not delay • There is no commit bit check • Otherwise, delayed after T2 aborted • T2 tries to write C in a physically unrealizable way, thus T2 abord • Effect of T2 before the write of B is cancelled • WT of B is reset to what is was before T2 wrote • T1 has been allowed to used cancelled value of B : B is dirty value Brice - 教育輪講
Cascading rollback • If dirty data is available, we might have to perform cascading rollback • When T aborts • We determine which transactions have read data written by T • We abort them • We recursively abort any transaction that have read data written by an aborted transaction Brice - 教育輪講
Undoing aborted transactions • To cancel the effect of aborted transaction • We can use the logs • If the data has not migred to the disk we can restore it from disk copy • Avoiding cascading rollback • Timestamp-based scheduler with commit bit (transaction with dirty data do not proceed) • A validation-based scheduler (only write after it is determined that transaction will commit) Brice - 教育輪講
Allowing recovery • For any logging methods, to allow recovery • The set of transaction regarded as « committed » after recovery must be consistent • Example : • After recovery, T1 is regarded as committed • T1 uses a value written by T2 • Thus, T2 must also remain commited after recovery Brice - 教育輪講
Recoverable schedules • Recoverable schedule • If each transaction commits only after each transaction from which it has read has committed • Examples • Serial : • Not serial : • Non serializable : • Might require cascading rollback • If T1 need to rollback , then T2 need to rollback S1 : w1(A); w1(B); w2(A); r2(B); c1; c2; S2 : w2(A); w1(B); w1(A); r2(B); c1; c2; S3 : w1(A); w1(B); w2(A); r2(B); c2; c1; Brice - 教育輪講
Recoverable schedules • Avoids cascading rollback schedule • ACR schedule • If transactions may read only values written by committed transactions • Forbids the reading of dirty data • Previous exemples S1, S2, S3 • Not ACR because T2 reads from T1 non committed • Example of ACR schedule : S4 : w1(A); w1(B); w2(A); c1; r2(B); c1; Brice - 教育輪講
Managing rollbacks using locking • If the scheduler is lock-based • Strict lock : • A transaction must not release any exclusive locks (or increment locks) ... • Until the transaction has committed or aborted... • And the commit or abort log has been flushed to disk Brice - 教育輪講
Concerning strict schedules • Strict schedule • A schedule that follow the strict lock rule • Properties : • Every strict schedule is ACR • T2 cannot read X of T1 until T1 releases any EL or IL • This release does not occur until after commit • Every strict schedule is serializable • Equivalent to a serial schedule is which each transaction runs instantaneously at the time it commits Brice - 教育輪講
Classes of schedules • Relationship between the differents kind of schedules Serial Serializable Strict ACR Recoverable Brice - 教育輪講
Data in the buffers...? • Under strict schedule, we cannot read dirty data • Data written in a buffer by uncommitted transaction remains locked until it commits • Problem : fixing data in buffer when transaction aborts • Changes must have their effect cancelled • Difficulty depends on the dabatase elements Brice - 教育輪講
Rollback for blocks • If lockable database elements are blocks • Method that does not use the logs! • T has exclusive lock on A and written a new value for A in a buffer and has to abort • A has been locked since T has written its value thus no transaction has read A • Rule : • Blocks written by uncommitted transaction are pinned in main memory Brice - 教育輪講
Managing rollbacks using locking • In that case • We rollback T on aborts • We tell the buffer manager to ignore the value of A • The buffer occupied by A is not written anywhere and is an available buffer • On top of that, if using multiversion system • We remove the value of A written by T from the list of available values of A Brice - 教育輪講
Rollback for small elements • If lockable elements are fraction of a block • A buffer may contain data changed by several transactions • If one abort, we should preserve the changes by the others Brice - 教育輪講
Rollback for small elements • To restore the value of A written by a transaction that has aborded • Read original value of A from DB stored on disk and modify the buffer accordingly • If the log is an undo/redo log we get the value of the former log • Keep a separate main-memory log of the changed made by each transaction, keeped only for when that transaction is active • None of the approaches is ideal Brice - 教育輪講
Group commit • Sometimes, we can avoid reading dirty data without flushing record on disk immediatly • If we flush log records in the order they are written • We can release locks as soon as the commit record is written to the log in a buffer Brice - 教育輪講
Group commit : example • Example : • T1 writes X, finishes, writes COMMIT record on log • But, the log remains in a buffer • T1 has not “committed” but we release T1’s locks • T2 reads X and “commits” but record remains in buffer • T2 is not seen as committed unless T1 is also committed if we flush in order Brice - 教育輪講
Group commit : example • Example : (con’t) • The recovery manager finds that • T1 is committed on disk, so we know that T2 did not read X from uncommitted transaction • T1 is not committed on disk, then neither is T2 and both are aborted. • Group commit • Do not release locks until T finishes and log record appear in a buffer • Flush log blocks in the order they were created Brice - 教育輪講
Problems... • Dirty read easier to fix when elements or blocks • Problems when database elements are blocks • The logging methods require the old or new or both values to be recorded in logs : redundancy • The requierement for the schedule to be recoverable inhibit concurrency Brice - 教育輪講
Logical logging • Logical logging • Only changing to the blocks are described • A small number of bytes of the DB elements are changed (update of a fixed-length field) • Acutally, it has the effect of changing most of the bytes in the DB element • Further changed can prevent this change from being ever undone • Illustrated with the examples Brice - 教育輪講
Logical logging : 1st example • Elements • blocks that contain a set of tuples from some relation • Update • “tuple t had its attribute changed from value v1 to v2” • Insertion • “tuple t with value (v1,v2 ,..) was inserted with offset p” Brice - 教育輪講
Logical logging : 1st example • Implied inverses • Restoration • The value of t[a] is restored from v2 to v1 • Deletion • Tuple t is removed • The operations are idempotent • Same result when performed several times Brice - 教育輪講
Logical logging : 2nd example • Elements • Blocks holding tuples that have variable-length fields • We might have to slide large portions of blocks to make size • Or use overblow block... • Block-plus-overflown must be considered as holding tuples at a “logical” level Brice - 教育輪講
Logical logging : 3nd example • Logical logging of B-tree nodes • Does not permit overflow blocks • Instead of old/new value of a node in the log, we write the changes • Insertion/Deletion of a key/pointer pair for a child • Change of the key associated with a pointer • Splitting or merging of nodes Brice - 教育輪講
Logical logging : 3nd example • Logical logging of B-tree nodes • Allow us to release locks earlier than for a recoverable schedule • Dirty reads of B-tree blocks is not a problem if its purpose is to use the B-tree to locate data we need to access • T read leaf N, but U that last wrote N aborts Brice - 教育輪講
Logical logging : 3nd example • Exemple • T read leaf N, but U that last wrote N aborts • Some changes done to N need to be undone • If T has a key/pointer pair into N, we cannot restore it to before U modification • BUT effect of U on N can be undone, by deleting the key/pointer that U has inserted • U is not the same as before T but no inconstancy • Logical level and not physical level Brice - 教育輪講
Recovering from logical logs • If the logical actions are idempotent • First example and insertion • We do not worry whether we had already inserted • If not • Second and third examples • We cannot associate a particular place for a tuple to be inserted • “the tuple t was inserted somewhere on block B” • Several copies of t on B or if the 1st copy in on disk Brice - 教育輪講
Log sequence number • Log sequence number • Each log is giving a number greater than that of the previous log record • Form of the log : <L, T, A, B> • For each action there is a compensating action that logically undoes the action • If T aborts, then for each action performed, the corresponding compensation action is performed and everything is written in the log • Each block maintains the log sequence number of the last action that affected that block Brice - 教育輪講
How to proceed the recovery • After a crash • Reconstruct the state of the DB at the time of the crash • Find the most recent ceckpoint and determine the active transaction at the time • For each log entry compare the log seq nb on B with the log quence L for the log. • If N < L, redo A :the action was never performed on B • If N > L, do nothing : the effect was felt already • Adjust the set of active transactions accordinly Brice - 教育輪講
How to proceed the recovery • After a crash • The set of transactions that remain active when we reach the end of the log must be aborded • Scan the log from the end back to previous checkpoint • Each time we encounter a record for T that must be aborded, we perform the compensing action for A on B and we record that in the log • If we must abord a transaction that began prior to the most recent checkpoint then continue back in the log until the start-records for all such transactions • Write abort-records in the log for each transaction we had to abort Brice - 教育輪講
Outline of the presentation • Dirty data problem • Deadlocks • Long transactions Brice - 教育輪講
Deadlocks • Deadlock • Each of several transactions is waiting for a resource held by one of the others and none can progress • 2 broad approaches for dealing with that problem • Detect and fix them • Arrange so that they never happen Brice - 教育輪講
Deadlocks detection • We use timeouts • Put a limit on how long a transaction may be active • If a transaction exeeds this time, we roll it back • T in ms, then timout minutes • When T times out and rolls back, it releases its locks or other ressources • It is possible for T involved in a deadlock to complete before “ timing out” Brice - 教育輪講
The wait-for graphs • Wait-for graphs • Indicates which transactions are waiting for locks held by another transaction • Detection and prevention • The latter requires to maintains the graph at all times, refusing cycles in the graph Brice - 教育輪講
The wait-for graphs • There is an arc from node (transaction) T to node U if there is an element A so that : • U holds a lock on A • T is waiting for a lock on A • T cannot get a lock on A in its desired mode unless U first releases its lock on A • We roll back any transaction that makes a request that would cause a cycle Brice - 教育輪講
Wait-for graphs : example • Example • 4 transactions • Each reads an element and writes another T1 : l1(A); r1(A); l1(B); w1(B); u1(A); u1(B); T2 : l2(C); r2(C); l2(A); w2(A); u2(C); u2(A); T3 : l3(B); r3(B); l3(C); w3(C); u3(B); u3(C); T4 : l4(D); r4(D); l4(A); w4(A); u4(D); u4(A); Brice - 教育輪講
Wait-for graphs : example • Schedule with a deadlock T1 T2 T3 T4 1) l1(A); r1(A); 2) l2(C); r2(C); 3) l3(B); r3(B); 4) l4(D); r4(D); 5) l2(A); Denied 6) l3(C); Denied 7) l4(A); Denied 8) l1(B); Denied Brice - 教育輪講
Wait-for graphs : example • Wait-forgraph after step (7) 4 3 2 1 Brice - 教育輪講