Recovery Concepts

Recovery Concepts Chapter 18 (lightly)

Occasions for Recovery • Non-catastrophic: need Log only • Transaction abort • "Normal" system crash • Catastrophic: need Log + Backup • Physical loss of disks • Pervasive application error • System software error (corrupted filesystem, buffer management errors, etc.) • Virus, sabotage, etc.

ARIES • A particular recovery algorithm emphasized in the book • Actually it uses many standard techniques • Strategies and concepts more important that ARIES per se

(Review) The Log File • Needs to be on "stable storage" • Contains: • Transaction starts/stops • DB writes: "before" and "after" images • befores can be used to rollback an aborted transaction • afters can be used to redo a transaction without reexecuting it • COMMITs and ABORTs The log itself is as critical as the DB! Reliable backups are critical, too!

Strategies Which Anticipate Normal Recovery • Deferred update • Writes are not actually applied to DB until after T commits. • No UNDO is needed. • Implementation: buffers, shadow page table, etc. • Immediate update • Writes are actually applied as T executes • Aborted transactions: UNDO (rollback)

System Crash (without loss of DB) • Assume that ALL uncommitted T's have aborted • Procede as for transaction abort • Allow no new transactions until recovery is complete

Catastrophe • First: restore from a full backup • Then: Roll forward from log • REDO all committed transactions • Apply all logged WRITEs • Could actually REDO changes in reverse chrono order: i.e., only apply latest change • T's interrupted by the catastrophe must be restarted or user notified

Checkpoints • Checkpoint: "snapshot" of a system at a particular time • Might include: info about which T's are running; which pages are "dirty", etc. • Use: May be able to recover from the last checkpoint (rather than reprocessing entire log)

Disaster Recovery via Redundancy • A reliable duplicate copy could be used for "instant" recovery • copy could be "hot" (in use by applications) or only on standby • SW-based • managed by DBMS or OS • could be part of a distributed system • HW-based • RAID: Redundant Array of Inexpensive Disks

Recovery Concepts