Chapter 21: Recovery

1. Chapter 21: Recovery A portion of these slides is used with the permission of Dr. Ling Lui, Associate Professor, College of Computing, Georgia Tech. The remaining slides represent new material.

3. What is a Transaction? A Transaction is a Logical Unit of Database Processing. It Represents the Collection of Actions that Make Consistent Transformations of System States while Preserving System Consistency

4. Two Sample Transactions Transaction T1 Reads/Writes X/Y, Modifying X by Subtracting N and Y by Adding N Transaction T2 Reads X and Modifies X by Adding M Their Interleaved Execution can Yield Dramatically Different Results! What are the Possibilities?
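
One of the possibilities can be traced with a short sketch. The starting values (X = 100, Y = 50), N = 10, M = 5, and the particular interleaving are assumptions chosen for illustration; the slide gives none of them.

```python
# Hypothetical values; the slide does not specify any.
N, M = 10, 5

def serial(db):
    """Run T1 to completion, then T2."""
    db = dict(db)
    x = db["X"]; db["X"] = x - N      # T1: subtract N from X
    y = db["Y"]; db["Y"] = y + N      # T1: add N to Y
    x = db["X"]; db["X"] = x + M      # T2: add M to X
    return db

def interleaved(db):
    """One bad interleaving: T2 reads X before T1 writes it back (lost update)."""
    db = dict(db)
    t1_x = db["X"]          # T1 read(X)
    t2_x = db["X"]          # T2 read(X), sees the same old value
    db["X"] = t1_x - N      # T1 write(X)
    db["X"] = t2_x + M      # T2 write(X), overwrites T1's update
    t1_y = db["Y"]          # T1 read(Y)
    db["Y"] = t1_y + N      # T1 write(Y)
    return db

start = {"X": 100, "Y": 50}
serial_result = serial(start)            # X = 95, Y = 60
interleaved_result = interleaved(start)  # X = 105, Y = 60: T1's change to X is lost
```

In the interleaved run the total X + Y grows by N + M instead of M, which is the kind of inconsistency that concurrency control and recovery have to prevent.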

5. ACID Transaction Model Database Consists of Set of Data Items Read(x) Gets Last Stored Value in X Write(x) Stores a New Value Into X Atomicity: A Set of R/W Operations that Either Completes Entirely or Not at All Consistency: R/W Operations take the Database from One Consistent State to Another Consistent State Isolation: No Intermediate Values Produced by the R/W Operations will be Visible to Other Transactions Durability: Once the Transaction is Completed, and All the Updates Are Committed, then these Changes Must Never be Lost because of Subsequent Failure

6. Modeling Transactions System State All Aspects which Encompass a Snapshot of the DB Including Assertions, Constraints, Meta-Data, etc., that can be used to Maintain and Verify DB Transactions must Preserve System State by Ensuring DB Consistency Failures Require Corrective Actions via Undo - Correct to a Prior Consistent State Redo - Rerun Aborted/Incomplete Transactions DB Actions in a Transaction Categorized by: Unprotected - Abort Does not Require Undo/Redo Protected - Abort May Require Undo/Redo Real - Once Done - Transaction Cannot be Undone

7. Complications in Transaction Execution What are Different Types of Transactions? Simple Linear Sequence of Actions Inter-Transaction Concurrency What we've seen so far in TP/CC Nested Transactions Transactions within Transactions (e.g. Nested Select) Requires Intra-Transaction Concurrency Control Long-Term Transactions Require Hours or Days for Execution Can't Just Make DB Unavailable for Long Periods Both OCC and PCC are Often Infeasible!

8. Examples of Nested/Long-Lived Nested Travel Agent Example - Reserving a Trip Airline Reservation Hotel Reservation Car Rental A Transaction of Transactions Long-Term Transactions CAD/CAM Consider a Jet Engine Some Analysis Techniques (Structure, Fluids, etc.) may Require Hours and Update DB DB Can't be Unavailable for the Long Term

9. Handling Nested/Long-Term Transactions Time-Domain Addressing Maintain Time History of DB Objects All Versions/Values of Objects are Stored Never Delete - Record All Values (States) For Example Consider Data Item V with Initial Value v Value is "Reset" By Transactions at Different Times - let t Represent time of a Transaction Let T1 T2 T3 be Transaction Execution Order Suppose T1 Accesses v - Record as < v0, t1 > Suppose T2 Accesses v - Record as < v1, t2 > Suppose T3 Accesses v - Record as < v2, t3 > What Happens if t2 > t3 and both are Modifies?
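
The idea of keeping every version can be sketched as follows. The class name `VersionedItem`, its methods, and the numeric timestamps are hypothetical, introduced only for illustration; the sketch keeps the history sorted by timestamp, which is one possible answer to what happens when writes arrive out of timestamp order.

```python
import bisect

class VersionedItem:
    """Keeps every value ever written, ordered by timestamp (never deletes)."""

    def __init__(self, initial_value, t0=0):
        self.times = [t0]
        self.values = [initial_value]

    def write(self, value, t):
        # Insert by timestamp so the history stays sorted even if writes arrive late.
        i = bisect.bisect_right(self.times, t)
        self.times.insert(i, t)
        self.values.insert(i, value)

    def read_as_of(self, t):
        # Return the latest version whose timestamp is <= t.
        i = bisect.bisect_right(self.times, t)
        if i == 0:
            raise KeyError("no version exists at or before time %r" % t)
        return self.values[i - 1]

v = VersionedItem("v0", t0=0)
v.write("v1", t=10)
v.write("v3", t=30)
v.write("v2", t=20)   # arrives late but carries an earlier timestamp
```

Because nothing is overwritten, a long-running transaction can keep reading the state as of its own start time while newer versions accumulate.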

10. Transaction Execution Who Participates in Transaction Execution?

11. Two Phase Commit Policy All Actions for a Transaction are Performed in a Workspace (in Memory) Rather than Directly on the DB Copy of the Data These Actions are Written in Log/Journal (Including the Commit Action) Leads to Two-Phase Commit Policy Transaction Cannot Write to DB Until Committed Transaction Cannot Commit Until All Changes have been Recorded First in the Log/Journal Two Phases are: Phase 1: Write Data in Log/Journal Phase 2: Write Data in DB Failure Can Occur Anytime!
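
A minimal sketch of this policy, assuming an in-memory dictionary stands in for the database and a Python list for the log/journal. The `Transaction` class and its method names are hypothetical.

```python
class Transaction:
    def __init__(self, tid, db, log):
        self.tid, self.db, self.log = tid, db, log
        self.workspace = {}          # private, in-memory copy of changed items

    def read(self, key):
        return self.workspace.get(key, self.db.get(key))

    def write(self, key, value):
        self.workspace[key] = value  # never touches the DB directly

    def commit(self):
        # Phase 1: record every change, then the commit action, in the log.
        for key, value in self.workspace.items():
            self.log.append(("write", self.tid, key, value))
        self.log.append(("commit", self.tid))
        # Phase 2: only now copy the changes into the database.
        self.db.update(self.workspace)

    def abort(self):
        self.workspace.clear()       # nothing reached the DB, so nothing to undo

db, log = {"X": 100}, []
t = Transaction("T1", db, log)
t.write("X", t.read("X") - 10)
before_commit = dict(db)             # the DB is still unchanged at this point
t.commit()
```

A real system would force the log to stable storage at the end of phase 1; the list append here only marks where that would happen.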

12. Why is Two Phase Commit Important? Suppose DB Writes Occur Before Commit Assume a Transaction Aborts in the Middle of Processing Undo DB Changes Made to Actual Database Prior to Failure Relatively Straightforward and Manageable Undo Actions of Other Transactions that Read Information Written by Aborted Transaction Impossible! Undo May Require you to Propagate to Many Other Transactions, Particularly if Aborted Transaction was Long-Duration (hours) Basic Concepts of Recovery are Used in the Non-Locking Optimistic CC Approach!

13. Recovery and Two-Phase Commit Two-Phase Commit Still Requires Recovery Failure Before Any Commit to Log/Journal Must Redo Transaction T and Undo Effects of Any Dependent Transactions that Read Results of T Failure After Partial Commit to Log/Journal Must Undo/Redo Transaction T and Undo Effects of Any Dependent Transactions that Read Results of T Failure After Total Commit to Log/Journal Failure Before Any Writes to DB (Permanent Writes) Use Log to Write to DB Failure After Partial Writes to DB (Permanent Writes) Use Log to Write to DB - Consider Dependent Transactions Failure After All Writes to DB - Not a Problem

14. Why is Recovery Needed? Impossible to Build a Perfect System There will be Failures of Various Types Ability to Recover and Restart Unreliability Can Occur Not Really Failures, but Unexpected Behavior Inconsistent Results at Different Times Unavailability Often Happens Can't Run Transactions When Desired How is Recovery Achieved? Redundancy for Fault-Tolerance Mirroring/Shadowing for Data (Disks are Cheap) Ability to UNDO/REDO to obtain "Correct" and "Consistent" DB State

15. Why is Recovery Needed? Transactions are Liable to Fail for Many Reasons Hardware or Software Failure Deadlock Occurs Transaction Error (e.g., Divide by Zero) after Partial Execution In Either Case We May Need to Abort a Completed Transaction Due to Error in Another Transaction We Must Recover the DB to "Correct" State What do OS's Do? Weekly Backups of File System Incremental Backups (To another Disk) RAID Arrays System and Editor Log Files

16. Database Recovery Approaches Evolved from OS Techniques Backup Copies of Database Tape Copies (early days) and CD Copies Online (Mirror or FTP) Off Site Storage of DB (Daily/Weekly) Maintenance of Journal or Log File Containing All Changes to DB Since Last "Backup" Each Journal Entry Contains Transaction ID Old/New Values of Data Item(s) Beginning/Ending Point of Transaction When Failure Occurs Redo Aborted Transactions/Rollback Completed Transactions/Undo Partially Executed Transaction

17. Recovery Objective Maintaining DB State - Three Possibilities Correct State - Contains Most Recent Copies of Data put Into DB by Users and Contains no Data Deleted by Users Valid State - Contains Information that is Part of a Correct State Consistent State - Valid State Plus DB Information Must Satisfy the User's Consistency Criteria What are Analogies for these Three States?

18. DB State Concepts Consider Source Files (.c) and Object Files (.o) oldtest.c/oldtest.o;test.c/test.o; newtest.c/newtest.o; What are Different States in this Case Correct State Most Recent Source and Object File (newtest.*) Valid State A Source and Object File But Not Necessarily the Source that Corresponds to the Object oldtest.c and newtest.o Consistent State A Source and its Corresponding Object File but Not Necessarily the Most Recent One test.c and test.o

19. Kinds of DB Recovery Recovery to a Correct State Recovery to a Correct State that May have Existed in the Past Recovery to a Possible Previous State (May not have Existed) Recovery to a Valid State (May be Undesirable) Recovery to a Consistent State (Old - Backup Version) Crash Resistance Keep DB in a State Such that If Failure Occurs, System will Always be in Correct State This is Almost Impossible in Practice!

20. Types of Failures in DBMS Transaction Failures Transaction Aborts (Unilaterally/Due to Deadlock) Avg. 3% of Transactions Abort Abnormally System (Site) Failures Processor, Main Memory, Power Supply, ... Main Memory Contents Are Lost, but Secondary Storage Contents Are Safe Partial Vs. Total Failure Media Failures Secondary Storage Devices (Stored Data Is Lost) Head Crash/Controller Failure (?) Communication Failures Lost/Undeliverable Messages Network Partitioning
Transaction failures: Transaction aborts (unilaterally or due to deadlock). A transaction may abort within the transaction logic; for example, a transaction that tries to transfer $1000 from account 1 to account 2 will fail and abort if account 1 doesn't have $1000. Transactions may also abort due to failures of the runtime system or due to deadlock. Avg. 3% of transactions abort abnormally. System (site) failures: Failure of processor, main memory, power supply, ... Main memory contents are lost, but secondary storage contents are safe. Partial vs. total failure. Media failures: Failure of secondary storage devices such that the stored data is lost. Head crash/controller failure (?): the head is the disk's read/write head; the controller is a small computer that translates disk seek commands into physical disk movements. These types of failure are very difficult to recover from; one needs to restore a backup from tapes. Communication failures: Lost/undeliverable messages. Network partitioning: a node or a set of nodes is disconnected from the rest of the network, for example because a fiber was cut, an ISP went down during a software upgrade, or a network router crashed.

21. Jim Gray, Why Do Computers Stop and What can be Done About It?, Tandem Technical Report 85.7, 1985. Failures - Tandem Data (1985)

22. Recovery Concepts Recovery from Failures: Transaction Abort: Rollbacks - To Earlier DB State Machine Crash: Database in Temporary State Undo Aborted Transactions for Permanent DB Changes Redo Committed Transactions Written in Log but Not in DB Dependent on Aborted Transactions Media Failure: Need a Backup Copy Strategies of Keeping the Backup Copy

23. Recovery Management Architecture Volatile Storage Main Memory of the Computer System (RAM) Stable Storage Resilient to Failures and Loses its Contents Only in Media Failures (e.g., Head Crashes on Disks) Implemented via a Combination of Hardware (Non-volatile Storage) and Software (Stable-write, Stable-read, Clean-up) Components

24. Reading Uncommitted Data May Increase Concurrency May Cause Cascaded Aborts 2PL Example: Problem Due to Transaction Durability Property To Avoid Cascaded Aborts: Read Only Committed Data Read(X) in T2 is on "Old" Value of X (Before T1) Cascaded Aborts: Update transaction T1 acquires a write lock on X, processes, and releases the write lock on X. Another transaction T2 acquires a read lock and reads X. T1 now aborts. T2 must be aborted because it read the aborted value of X.
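
The cascade can be sketched as a transitive closure over "reads from" dependencies. The function `cascade_aborts` and the transactions T3 and T4 are hypothetical additions used to show the propagation; the slide only mentions T1 and T2.

```python
def cascade_aborts(aborted, reads_from):
    """Return every transaction that must abort once `aborted` aborts.

    reads_from maps a transaction to the set of transactions whose
    uncommitted writes it has read.
    """
    doomed = {aborted}
    changed = True
    while changed:                       # repeat until no new transaction is added
        changed = False
        for txn, sources in reads_from.items():
            if txn not in doomed and sources & doomed:
                doomed.add(txn)
                changed = True
    return doomed

# T2 read from T1, T3 read from T2, T4 read only committed data.
reads_from = {"T2": {"T1"}, "T3": {"T2"}, "T4": set()}
doomed = cascade_aborts("T1", reads_from)
```

If transactions read only committed data, every `reads_from` set is empty and the result is just the aborted transaction itself.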

25. Atomic Commit Execution Algorithm: All the Writes are Stored in Private Workspaces At Commit Time, All of the Writes are Performed on Database Atomically (Write All or Nothing) If a Transaction Aborts (or Machine/System Crashes Before Commit): Throw Away the Workspaces If the Machine Crashes in the Middle of Writing Use Only Idempotent Writes (No Incr/Decr) Re-executing Transaction is Equivalent to Executing it Once After a Crash, Start Over With All Idempotent Writes Expensive Commit
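
Why increments and decrements are ruled out can be seen in a small sketch. The helper names and the values are assumptions for illustration; the "crash" is simulated simply by running the commit-time writes twice.

```python
def apply_writes(db, writes):
    """Install absolute values; safe to repeat."""
    for key, value in writes:
        db[key] = value

def apply_increments(db, increments):
    """Install relative changes; NOT safe to repeat."""
    for key, delta in increments:
        db[key] += delta

db_a = {"X": 100}
apply_writes(db_a, [("X", 90)])
apply_writes(db_a, [("X", 90)])        # re-run after a simulated crash

db_b = {"X": 100}
apply_increments(db_b, [("X", -10)])
apply_increments(db_b, [("X", -10)])   # re-run after a simulated crash
```

The absolute write leaves X at 90 no matter how many times it is repeated, while the repeated decrement drives X to 80.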

26. Database Log Every Action of a Transaction Must Not Only Perform the Action, but Must Also Write a Log Record to an Append-only File A Log File Maintained by the DBMS System and Residing on Stable Storage When a Change is Made to the Database, a Record Containing Values of the Updated Item is Written to the Log File Log-Based Techniques

27. Log Information The Log Contains Information Used by the Recovery Process to Restore the Consistency of a System The Type of Log Records Include Start: Transaction T Has Started Read: Transaction T Has Read Data Item X Write: Transaction T Has Changed a Value Track the Old Value (Before Image) of X Track the New Value (After Image) of X Abort Commit
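
One possible in-memory shape for these records is sketched below. The `LogRecord` class and its field names are hypothetical; the slide lists only the record types and the before/after images.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class LogRecord:
    kind: str                    # "start", "read", "write", "abort", or "commit"
    txn: str                     # transaction identifier
    item: Optional[str] = None   # data item, for read and write records
    before: Any = None           # before image (old value), for write records
    after: Any = None            # after image (new value), for write records

log = [
    LogRecord("start", "T1"),
    LogRecord("read", "T1", "X"),
    LogRecord("write", "T1", "X", before=100, after=90),
    LogRecord("commit", "T1"),
    LogRecord("start", "T2"),
    LogRecord("write", "T2", "Y", before=50, after=55),
]

# Transactions with a commit record are candidates for REDO; the rest for UNDO.
committed = {r.txn for r in log if r.kind == "commit"}
uncommitted = {r.txn for r in log if r.kind == "start"} - committed
```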

28. Transactions and the Log Consider the Transactions Below The Execution Results in Write Actions to Log Objective: Recover Using Log Which has "State" of DB w.r.t. Concurrently Executing Transactions

29. REDO Protocol REDO'ing an Action Means Performing it Again The REDO Operation Uses the Log Information and Performs the Action that Might have Been Done Before, or Not Done Due to Failures The REDO Operation Generates the New Image

30. UNDO Protocol UNDO'ing an Action Means to Restore the Object to its Before Image The UNDO Operation uses the Log Information and Restores the Old Value of the Object
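
REDO and UNDO can be sketched as two passes over the same log, one forward and one backward. The tuple layout `(kind, txn, item, before, after)` and the sample values are assumptions made for this example.

```python
def redo(db, log, txn):
    """Reapply txn's writes in log order, installing each after image."""
    for kind, t, item, before, after in log:
        if kind == "write" and t == txn:
            db[item] = after

def undo(db, log, txn):
    """Reverse txn's writes in reverse log order, restoring each before image."""
    for kind, t, item, before, after in reversed(log):
        if kind == "write" and t == txn:
            db[item] = before

log = [("write", "T1", "X", 100, 90), ("write", "T1", "X", 90, 70)]
db = {"X": 90}     # simulated crash after only the first write reached the DB

undo(db, log, "T1")
after_undo = dict(db)    # X is back at its original before image, 100

redo(db, log, "T1")
after_redo = dict(db)    # X carries the final after image, 70
```

Both operations install stored images rather than recomputing values, so repeating either one after another crash gives the same result.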

31. Consider Following Schedule/Log What Occurs When the System Crashes? * T3 is rolled back since it did not reach its commit point ** T2 is rolled back since it reads the value of item B written by T3

32. View Schedule Graphically Like CC Algorithms, Recovery Algorithms Track Who is Writing What Data Item Who is Reading Written Data Items Is Write to Permanent DB? Is Read from Committed or Uncommitted Data?

33. Why Logging? Upon Recovery: All of T1's Effects should be Reflected in the Database (REDO If Necessary Due to a Failure) None of T2's Effects should be Reflected in the Database (UNDO If Necessary)

34. What Does System Log Contain? Conceptually, Two Logs Undo Log Contains Transaction Actions Needed to Undo the Effects of a Transaction if there is a Failure Attempt to Bring the DB Back to a Prior Correct State Worst Case - Valid or Consistent State Redo Log Contains Transaction Actions Needed to Redo the Effects of a Transaction that Did Not Complete Attempt to Roll the DB Forward to a New Correct State Avoid Valid or Consistent State

35. When to Write Log Records to Stable Store Assume a Transaction T Updates Page P Fortunate Case System Writes P in Stable Database System Updates Stable Log for this Update SYSTEM FAILURE OCCURS (Before T Commits) We Can Recover (Undo) by Restoring P to its Old State by Using the Log Unfortunate Case System Writes P in Stable Database SYSTEM FAILURE OCCURS (Before Stable Log is Updated) We Cannot Recover From this Failure Because there is No Log Record to Restore the Old Value Solution: Write-ahead Log (WAL) Protocol

36. Write-Ahead Log Protocol WAL Protocol: 1. Before a Stable Database is Updated, the Undo Portion of the Log should be Written to the Stable Log (Force-write) 2. When a Transaction Commits, the Redo Portion of the Log Must Be Written to Stable Log Prior to the Updating of the Stable Database Notice: If a System Crashes Before a Transaction is Completely Committed, then All of its Operations Must Be Undone Need the Before Images - BFIM - (Undo Portion of Log) Once a Transaction is Committed, Some of its Actions Might have to Be Redone Need After Images - AFIM - (Redo Portion of Log)
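
The ordering that WAL enforces can be sketched with an in-memory stand-in for stable storage. The `WALStore` class, its method names, and the `events` trace are hypothetical; a real system would force the log to disk where this sketch appends to a list.

```python
class WALStore:
    def __init__(self, initial):
        self.stable_db = dict(initial)
        self.stable_log = []          # stands in for an append-only file on stable storage
        self.events = []              # order in which stable writes happened

    def _force_log(self, record):
        self.stable_log.append(record)
        self.events.append(("log", record[0]))

    def update(self, txn, item, new_value):
        # The before image (and after image) reach the stable log first ...
        self._force_log(("write", txn, item, self.stable_db.get(item), new_value))
        # ... and only then is the stable database changed.
        self.stable_db[item] = new_value
        self.events.append(("db", item))

    def commit(self, txn):
        self._force_log(("commit", txn))

    def recover(self):
        """Undo every write of a transaction that has no commit record."""
        committed = {r[1] for r in self.stable_log if r[0] == "commit"}
        for kind, txn, *rest in reversed(self.stable_log):
            if kind == "write" and txn not in committed:
                item, before, _after = rest
                self.stable_db[item] = before

store = WALStore({"P": "old"})
store.update("T", "P", "new")
# Simulated crash here, before T commits.
store.recover()
```

Because the log record always precedes the database write, the "unfortunate case" of a changed page with no before image cannot arise.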

37. Logging Interface Possible Execution Strategies: Undo/No-Redo (Immediate Update) No-Undo/Redo (Deferred Update) Undo/Redo No-undo/No-Redo

38. Undo/No-Redo Incremental Log with Immediate Updates (Undo-only) Execution Algorithm: All the Writes are Performed "Directly" on the Stable DB Abort Buffer Manager May have Written Some of the Updated Pages into Stable Database LRM Performs Transaction Undo (Partial Undo) If Transaction T Aborts (or Machine Crashes), Recovery Procedure Undo (T) Must Undo its Effects, e.g., by Consulting with the Log File Commit LRM Issues a Flush Command to the Buffer Manager for All Updated Pages LRM Writes an "End_of_transaction" into Log Recover No Need to Perform Redo Perform Global Undo

39. No-Undo/Redo Incremental Log With Deferred Updates (Redo-only) Execution Algorithm: All Writes are Performed in Private Workspaces (Log Files) Abort None of Updated Pages have Been Written into Stable DB Throw Away the Workspaces, i.e., Release the Fixed Pages Commit LRM Writes an "End_of_transaction" Record into the Log LRM Sends an Unfix Command to the Buffer Manager for All Pages that were Previously Fixed Recover Perform Partial Redo If a Commit is Interrupted by Crash, Must Gradually Redo the Unwritten but Committed Transaction Operations No Need to Perform Global Undo

40. Undo/Redo Execution Algorithm: All the Writes are Gradually Written to the Database, Before or After the Commit Time Abort Buffer Manager May have Written Some of the Updated Pages Into Stable Database LRM Performs Transaction Undo (Partial Undo) to "Cover Up" Inconsistency by Undoing its Effects Commit LRM Writes an "End_of_transaction" Record into the Log Recover For Transactions with a "Begin_transaction" and an "End_of_transaction" Record in the Log, a Partial Redo is Initiated by LRM For Transactions with Only a "Begin_transaction" in the Log, a Global Undo is Executed by LRM
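
The Recover step of the undo/redo strategy can be sketched as below. The tuple-based log format, the record names "begin" and "end", and the sample data are assumptions; the sketch also assumes no transaction read or overwrote another's uncommitted data, so undoing before redoing is safe.

```python
def recover(db, log):
    """Redo transactions with begin and end records; undo those with only a begin."""
    begun = [r[1] for r in log if r[0] == "begin"]
    ended = {r[1] for r in log if r[0] == "end"}
    # Undo pass: walk backwards, restoring before images of unfinished transactions.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in ended:
            _, _txn, item, before, _after = rec
            db[item] = before
    # Redo pass: walk forwards, installing after images of finished transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in ended:
            _, _txn, item, _before, after = rec
            db[item] = after
    return sorted(ended), sorted(t for t in begun if t not in ended)

log = [
    ("begin", "T1"),
    ("write", "T1", "A", 10, 20),
    ("begin", "T2"),
    ("write", "T2", "B", 5, 7),
    ("end", "T1"),
]
# State on disk at the crash: T1's write had not reached the DB, T2's had.
db = {"A": 10, "B": 7}
redone, undone = recover(db, log)
```

After recovery the database holds T1's committed value for A and the original value for B, regardless of which pages the buffer manager happened to flush before the crash.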

41. No-Undo/No-Redo Abort None of the Updated Pages have been Written Into Stable Database Release the Fixed Pages Commit (the Following Have to Be Done Atomically) LRM Issues a Flush Command to the Buffer Manager for All Updated Pages LRM Sends an Unfix Command to the Buffer Manager for All Pages that were Previously Fixed LRM Writes an "End_of_transaction" Record into the Log Recover No Need to Do Anything

42. Log-Based Recovery Summary Types of Failures: Transaction Aborts Machine Crashes Data Movement Policies: Deferred Updates Immediate Updates No Deferred or Immediate Updates Recovery Actions:

43. Log-based Recover Strategies Undo-Only Recovery (Immediate Updates) Immediate Updates on the Database Do Not Leave Blocks on Disk After Commit Force All Blocks at Commit Time Redo-only Recovery (Deferred Updates) Do Not Write Blocks to Disk Before Commit Deferred Update Redo and Undo Recovery: Write to the Database Before or After Commit If Abort, Undo If Crash, Redo/Undo

44. Deferred Update Example - Single-User The [write_item,...] operations of T1 are redone T2 log entries are ignored by the recovery process

45. Deferred Update Example - Multi-User T2 and T3 are ignored because they did not reach their commit points T4 is redone because its commit point is after the last system checkpoint

46. Checkpoints Checkpoints Tell the Recovery Scheme what Changes have Actually Been Made to the Database In Addition to the Log File, the System Periodically Performs Check Points Transaction Save Points Internally Consistent Synchronization Points Long Transactions May Return to them Instead of the Begin-transaction DBMS Check Points Transaction-Consistent Log Record Force All Committed Pages to Disk Flush All Log Records and a Check Point Record DB Recovery May Start From the Last Check Point Instead of the Beginning of the Transaction

47. Augmenting Log with CheckPoints Consider the Prior Schedule Note the Addition of [checkpoint] Entries This is a Save Point that Can be at Logical States (After Commits) Regularly (After Every X Log Entries) Reduce Recovery Effort to go Back to Last Checkpoint
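
How a checkpoint shortens recovery under deferred update can be sketched as follows. The log layout and the data values are assumptions; the transaction names mirror the multi-user deferred-update example, in which only T4 commits after the last checkpoint.

```python
def redo_after_checkpoint(db, log):
    """Redo-only recovery: reapply writes of transactions that committed
    after the last checkpoint; everything else is skipped."""
    last_cp = max((i for i, r in enumerate(log) if r[0] == "checkpoint"), default=-1)
    redo_set = {r[1] for r in log[last_cp + 1:] if r[0] == "commit"}
    for rec in log:                      # a transaction's writes may precede the checkpoint
        if rec[0] == "write" and rec[1] in redo_set:
            _, _txn, item, value = rec
            db[item] = value
    return redo_set

log = [
    ("start", "T1"), ("write", "T1", "D", 20), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T4"), ("write", "T4", "B", 15), ("write", "T4", "A", 20), ("commit", "T4"),
    ("start", "T2"), ("write", "T2", "B", 12),
    ("start", "T3"), ("write", "T3", "A", 30),
]
db = {"A": 1, "B": 2, "D": 20}   # T1's write is already on disk thanks to the checkpoint
redo_set = redo_after_checkpoint(db, log)
```

T1 needs no work because the checkpoint guarantees its pages were forced to disk, and T2 and T3 are ignored because deferred update never let their writes reach the database.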

48. Recovery Technique: Shadow Paging The Database is Partitioned into Fixed-sized Pages (Blocks). The System Maintains Two Tables for Each Transaction: A Current Page Table A Shadow Page Table When the Transaction Starts, the Current and Shadow Page Tables are Identical Current Page Table Kept in Main Memory If it is not too Large Each Update Creates a New Page from the Free-page-list Modify the Current Page Table to Record the Modifications to the Database by Using Pointers to Point to the New Pages That Hold the Modified Data Values Shadow Page Table Saved on Nonvolatile Storage (e.g., Disks) Used to Keep the Pointers to the Old Pages Before the Updates

49. Shadow Paging

50. Shadow Paging Correctness: Case(1): The Transaction has Not Committed When the System Crashes, and the Back up is Needed, Copy the Shadow Page Table into Main Memory Write Case (1) Guarantees that the State is Recovered to the One Before the Execution of the Transaction Case(2): The Transaction has Committed Starting Address of the Current Page Table Replaces the Starting Address of the Last Shadow Page Table All Changes are Reflected in the Current Page Table
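
Both cases can be traced with a small sketch. The `ShadowPagedDB` class, its method names, and the counter that stands in for the free-page list are hypothetical; page contents are plain strings for brevity.

```python
class ShadowPagedDB:
    def __init__(self, pages):
        self.disk = dict(enumerate(pages))          # physical page id -> contents
        self.shadow = {i: i for i in self.disk}     # page table on nonvolatile storage
        self.current = dict(self.shadow)            # working page table in main memory
        self._next_free = len(self.disk)            # stand-in for a free-page list

    def write(self, logical_page, contents):
        # Put the new contents on a fresh page and repoint only the current table.
        new_id = self._next_free
        self._next_free += 1
        self.disk[new_id] = contents
        self.current[logical_page] = new_id

    def read(self, logical_page):
        return self.disk[self.current[logical_page]]

    def commit(self):
        # The current page table replaces the shadow page table.
        self.shadow = dict(self.current)

    def crash_and_recover(self):
        # Main memory is lost; reload the page table from the shadow copy.
        self.current = dict(self.shadow)

db = ShadowPagedDB(["a", "b"])
db.write(0, "a2")
db.crash_and_recover()           # Case 1: crash before commit
after_crash = db.read(0)         # old contents are still reachable

db.write(1, "b2")
db.commit()
db.crash_and_recover()           # Case 2: crash after commit
after_commit = db.read(1)        # committed contents survive
```

The page written in the first, uncommitted attempt is still on disk but no table points to it, which illustrates why this scheme needs garbage collection.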

51. Shadow Paging Advantages: No Overhead of Log File Recovery From Failure is Faster Disadvantages: Pages that are Desirable to Be Physically Close by May Scatter All Over the Disk Each Time a Transaction Commits Pages Containing the Old Version of Changed Data Become Free but Unavailable Garbage Collection is Fired up Periodically so that "Third" and Higher Versions of Same Page are Freed Difficult to Allow Concurrent Execution of Transactions

52. Backups for Recovery Stop the TP and Make a Copy Easy to Implement and Correct Introduces Periods of Unavailability With Two Versions You Can Always Make a Backup (May Be Old) Fuzzy Dump Read the Database Incrementally Example: Read All Accounts, Money Transfer Incremental Reading of Entire Databases Off Site Storage - Backup Tapes/Disks

53. Media Failure Recovery Disks are Cheap Today - There is No Reason Not to have Multiple Hard Drives for DB Copies Concept of "Mirrored" FTP and other Sites

54. Concluding Remarks Intent of Chapter Review and Reinforce Transaction Processing Concepts and Their Relationship to Recovery Introduction to Concepts of Undo, Redo, and Cascading (for Dependent Transactions) Discussion of Correct vs. Valid vs. Consistent Database States What can your Application Live With? What about your Semester Project? Alternative Recovery Techniques Log-Based Checkpoint Shadow-Paging What about Combinations?
