90 likes | 97 Views
Understanding database recovery, WAL protocol, transaction failures, system failures, buffer policies, and checkpointing methods for ensuring data integrity and consistency in a DBMS. Learn about COMMIT, ROLLBACK, STEAL, NO-FORCE policies and their impact on data recovery.
E N D
12. Recovery REVIEW: COMMIT is the sucessful end-of-transaction operation. Changes to data items are not made permanent until the COMMIT issued by the TM is acknowledged by the DM. .. | Transaction Manager(s) | .. |SCHEDULER | .. |DATA MANAGER | .. |DATABASE ON DISK | Following that ack, the DBMS must guarantee that the updates will never be lost, no matter what happens! (DB must be recoverable). Techniques used by DBMS to guarantee recoverability to a recent COMMITTED DB STATE (all data items show the value written by a committed transaction and the resulting state is consistent with the integrity constraints) ROLLBACK (ABORT) is the unsuccessful end-of-transaction operation). All changes are undone using the LOG (or JOURNAL). - The on-line LOG holds all updates as they are made. - when on-line log fills up, written to off-line log (usually on tape) - LOGs can grow to be as large as the database itself. Write-Ahead Logging (WAL) Protocol: requires that a log record is physically written with the "last committed value" on it, before that item is changed (overwritten). WAL protocol facilitate "UNDO by re-installing before-values" of all changes (removes effects of abortd transaction).
Transaction Failure TYPES OF FAILURES: Transaction local (ABENDS, NSF check) System failures (DBMS itself fails) Media failure (disk crash) TRANSACTION FAILURE (transactions themselves are responsible for action) e.g., Abnormal program ends (ABENDS), Non-Sufficient Funds (NSF) Transaction code can can trap these and specify remedy (e.g., ROLLBACK). However, in order to facilitate proper transaction actions, system must hold all output messages until COMMIT. Otherwise, this can happen: A T M ///// | | !@#$%! | O ` | o o o | ....ROLLBACK! | > | o o o | BANKER | `-| | o o o | | o o | `----' |_______ | | _ | | | / $ / | | ' ` | _______ |--------------/___/ | `-----' | | | | | |---- | NSF | | | | | |_______ | ^ | | |_____ ____ | | | | ----- | | | | | | | | | L L |_______ | | L | At an ATM cash machine, if the "message" (the cash) is given to the customer before commit, it is probably going to be impossible to ROLLBACK the transaction.
System Failure SYSTEM FAILURE: DBMS itself fails, and the memory contents are lost (including buffers), but the data on disk is undamaged (The Data Manager is allowed to do its job any way it wants to (to optimize its activity). That's the reason for the component separation in the first place (instead of monlithic system). So the DM can be implemented so that the Disk(s) may contain some "uncommitted values" and/or it may not contain all committed values. The Disk(s) may contain uncommitted values if a STEAL policy is used. STEALpolicy: The Buffer Manager can replace a page which still has uncommitted values (write a page to disk that contain uncommitted values) (actually "stealing" a page from 1 trans and give it to another) (Necessary for very long running trans e.g., a payroll processing) The Disk(s) may not contain all committed values if a NO-FORCE policy is used. NO-FORCE policy: Buffer Mgr may not write a page with newly committed values until later. (e.g., In a Banking system, may not be able to afford to force every write immediately) BUFFER POLICIES: |FORCE | STEAL | YES | YES | YES | NO | NO | YES < - the hardest to implement but the best! | NO | NO Although there are system that use either a NO-STEAL or a FORCE policy (or both), we discuss only STEAL, NO-FORCE (STEAL NO-FORCE requires the most demanding recovery system).
Steal, No-force buffer policy In a STEAL NO-FORCE system: All transactions active at fail-time (BEGUN, not ENDed) must be UNDONE. (because some of the changes it made may have been written under the STEAL policy). All transactions committed at fail-time must be idempotently REDONE (because the committed changes it made may have not been written under the NO-FORCE policy). One way is to UNDO all active transactions and then idempotently REDO all committed transactions. Do we have to go all the way back to IPL (Initial Program Load) and REDO all committed transactions? Can that be avoided? YES! Through checkpointing! The System periodically takes a CHECKPOINT. There are many, many checkpointing methods, the next slide shows a "Standard" CHECKPOINT:
2. forcewrites a "checkpoint" record to the log. A CHECKPOINT record must have an "active list" containing all currently active transactions. Steal, No-force checkpoint There are many, many checkpointing methods, this slide shows a "Standard" CHECKPOINT: It is usually done at a quiescent point in time (no activity going on), but not necessarily (i.e., there are "on-the-fly" checkpointing methods, but they are complex). 1. forcewrites all buffers to disk immediately (flushes buffers). 2. forcewrites a "checkpoint" record to the log. A CHECKPOINT record must have an "active list" containing all currently active transactions. Trans | "ca-chunk" | .- | "change record" | : | "ca-chunk" | log : | "COMMIT *-1st flush |-. record: | . | : .- - | then"check-point-rec" | : //// : : | |/ /| (- -)- : \|. | / O `-' / 2 | log | |database| / `._ _|_/ :.<- | buff | | buffer|/ | :* |______|__|_______ | | @@@ :: / ) ^ @o>:: ( / | | @\/ :: `----'| | | |--:: \___/ L L /() :: | |:: V /^\:: | | disk copy| L L:`>tr-log | | database | `>_chpt-rec | |__________| : //// (0 0)- `-'/ _/|_/ - - -' | | ^ | | | | L L
Steal, No-force checkpoint With standard SNF Checkpointing (described above), of the following which must be undone and which must be redone? Active where |------> | ^ ^ BEGIN COMMIT CHECKPOINT CRASH T1 |------->| T2 |---------------------------->| T3 |-----------------------------------------> T4 |--->| T5 |----------> T6 |-------------------------->| After the crash, the RECOVERY PROCESS would: 1. Start at most recent Checkpoint record in LOG containing ACTIVE-list={T2,T3,T6} UNDO-list = ACTIVE-list e.g., UNDO={T2,T3,T6} REDO-list = empty. 2. Scan forward in the LOG from CHECKPOINT record. For each BEGIN encountered, put trans in UNDO-list (UNDO={T4, T5} For each COMMIT encountered, move trans from UNDO to REDO. (e.g., move T4,T2). 3. When LOG is exhausted, Idempotently REDO REDO-list in commit in order. (e.g., {T6, T4, T2} ) UNDO all trans in UNDO-list (e.g., {T3, T5} ) Note: Since transactions are redone in commit-order = REDO-order, it must be the case that the Serial Order to which execution is equivalent is COMMIT order. That is, if another serial order is the order to which the serializability is equivalent, the REDO must be done in that order.
Steal, No-force checkpoint Note: Since transactions are redone in commit-order = REDO-order, it must be the case that the Serial Order to which execution is equivalent is COMMIT order. That is, if another serial order is the order to which the serializability is equivalent, the REDO must be done in that order. In T2 and T4 above, messages may have gone back to the users which were based on and execution order equivalent to SOME serial order (values reported to users were generated by the execution in that order). Thus, RECOVERY must regenerate in the same order. The only way that the RECOVERY process can know what serial order the original execution was equivalent to is that the initial execution be equivalent to some serial order identifiable from the LOG. One order identifiable from the LOG is COMMIT order. Therefore, it is common to demand that the order of execution be equivalent to the serial COMMIT-order. (S2PL does that. Is that why it is so popular?)
Media Failure MEDIA FAILURE (from disk crash) RECOVERY ARCHIVE: periodically dump database (i.e., make an ARCHIVE copy to off-line tape?): 1. Shut down the DBMS (e.g., late at night or during "quiescent" period) 2. Copy the entire database to off-line storage (tape) 3. Bring up the DBMS again 4. Erase the LOG and restart logging ___ | | . . | disk copy | | tape |< - - - - - - - | of | ___ . ___ . | database | |________________ | Following a media failure (disk crash), 1. RESTORE DB from archive, ___ | | . . | disk copy | | tape |- - - - - - - > | of | ___ . ___ . | database | |________________ | 2. REDO transaction-log from archive-time to as near to crash-time as possible (using both the off-line and the on-line log (the on-line is kept on separate disk from the database itself for durability)). This is called ROLL-FORWARD: ___________COMPUTER________ LOG |- - - ->| "ca-chunk" | ____| | "ca-chunk | | "redo transaction" | |--------------. .-------------- | | log | | database | | buffer | | buffer | |_________ |_________ |_________ |
Media Failure MEDIA FAILURE (from disk crash) RECOVERY There are many other methods. DUPLEXING = make two copies of every data item on separate disks (at least separate failure modes). The amount of extra disk space used can be reduced by methods such as Huffman coding to as low as 5% extra disk space, however, in this, the Age Of Infinite Storage is it worth doing? Huffman coding is used in some in RAID systems. (Redundant Arrays of Independent Disks) APPENDIX Storage past, present and future: In 1956, IBM developed RAMAC, a refrig sized disk system with 50 2-ft diam platters. RAMAC had a capacity of 5 megabytes. Since then: 1. The amount of data stored on given area has increased 1,000,000-fold. 2. The transfer speed has increased 3,000-fold. 3. The cost per bit has decreased 500,000-fold (comparable $s). This is due to breakthroughs in 1. "areal density" (# bits/squarech in). 2. revolution speeds. 3. read-write head technologies. How much more higher can disk capacity go? So far predictions of "upper limits" have been made by engineers and they have always been wrong (way wrong). However, we are approaching a limit determined by fundamental physics, not engineering ingenuity. There comes a point beyond which random jiggle of electron spins due to temperature is likely to cause the directions of bit's magnetization to spontaneously reverse within the expected livetime of the disk. This is called the SUPERPARAMAGNETIC LIMIT and it may limit the progress that can be achieved through minaturizing or the "scaling down" of existing technologies. Where is the superparamagnetic limit? Most agree it will be encountered at densities ~120 Gbits/square_inch. At 6.5 sq_in per 3.5 inch surface, that gives ~ 800 Gigabits/surface. or ~ 100 GigaBytes/surface times 50 surfaces, we can conclued that a 3.5 inch hard-drives may go to 5000 GigaBytes/disk= 5 TeraBytes/disk Note that COMMODITY drives today have reached 500 GigaBytes/drive: so another 10=fold increas and we're there with commodity drives!!! Indexing and providing reference paths and access paths to data stores of this size is nearly impossible!