CS 540 Database Management Systems

Logging and Recovery: ARIES

The ACID Properties of Transactions
• Atomicity: either all actions of a transaction are done, or none are.
• Consistency: the DB satisfies all the consistency constraints; transactions are expected to preserve consistency.
• Isolation: each transaction behaves as if it were executed alone.
• Durability: once a transaction is completed, its effect must persist.
• Concurrency/Lock Manager: consistency & isolation
• Recovery/Log Manager: atomicity & durability

Motivation
• How to ensure atomicity and durability?
  • transactions may abort (need to “rollback”)
  • what if the DBMS stops running?
• Desired state after the system restarts:
  • T1, T2 & T3 should be durable.
  • T4 & T5 should be aborted (effects not seen).
[Figure: timeline of transactions T1 to T5, ending in a crash]

Assumptions
• Concurrency control is in effect
  • strict 2PL, in particular
  • request S/X locks before read/write
  • all locks are held until end of transaction (strict locking)
• Updates happen “in place” (no shadow pages)
  • data is overwritten on (or deleted from) the disk
• Is there a simple scheme to guarantee atomicity & durability?

Handling the Buffer Pool
• Force writing to disk at commit?
  • poor response time
  • but provides durability
• Steal buffer-pool frames from uncommitted transactions?
  • if not, inefficient use of the buffer
  • if so, how can we ensure atomicity?
• Recovery scheme vs. buffer manager (B.M.):
  • undo-only: can steal? must force?
  • redo-only: no steal? no force?

            No Steal    Steal
Force       Trivial
No Force                Desired

Basic Idea: Logging
• Record redo and undo information in a log
  • sequential writes to the log (put it on a separate disk)
  • minimal info (diff) written to the log, so multiple updates fit in a single log page
• Log: an ordered list of redo/undo actions
  • a log record contains: <XID, pageID, offset, length, old data, new data>
  • and additional control info (which we’ll see soon)

Write-Ahead Logging (WAL)
• Write-Ahead Logging protocol:
  1. must force the log record for an update before the corresponding data page gets to disk.
  2. must force all log records for a xact before commit.
• #1 guarantees atomicity (undo)
• #2 guarantees durability (redo)
• Exactly how is logging (and recovery!) done?
  • we’ll study the ARIES algorithms

ARIES Main Principles
• WAL
• Repeating history during REDO
• Logging changes during UNDO
• Enables:
  • simplicity and flexibility
  • finer-granularity locking (than a page)
  • updates to (different parts of) the same page are streamed in redo/undo
  • redoing and undoing are not necessarily exact physical inverses

WAL & the Log
• Each log record has a unique Log Sequence Number (LSN)
  • LSNs are always increasing
• Each data page contains a pageLSN
  • the LSN of the log record for the latest update to that page
• System keeps track of flushedLSN
  • the max LSN flushed so far
• WAL: before writing a page to disk, ensure
  • pageLSN <= flushedLSN
  • i.e., the log record at pageLSN has already been flushed
[Figure: the log, split into log records already flushed to disk and the “log tail” still in RAM; each data page carries its pageLSN]
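The WAL check can be sketched in a few lines. This is a toy model, not a real buffer manager: the `Log` class, the `write_page` function and the dict-based page and disk are all hypothetical names chosen for illustration, and "flushing" only advances a counter.

```python
class Log:
    """Toy log: records stay in RAM (the "log tail") until flushed."""

    def __init__(self):
        self.records = []       # records[i] has LSN i + 1
        self.flushed_lsn = 0    # max LSN assumed to be on stable storage

    def append(self, payload):
        self.records.append(payload)
        return len(self.records)    # LSNs are always increasing

    def flush(self, up_to_lsn):
        # Stand-in for forcing the log tail to disk up to up_to_lsn.
        self.flushed_lsn = max(self.flushed_lsn, up_to_lsn)


def write_page(log, page, disk):
    """Write a dirty page to disk, enforcing pageLSN <= flushedLSN first."""
    if page["pageLSN"] > log.flushed_lsn:
        log.flush(page["pageLSN"])    # the log record must reach disk first
    disk[page["id"]] = dict(page)     # only now may the data page go out


log = Log()
page = {"id": "P5", "pageLSN": 0, "data": "old"}

# An update: log it first, then change the page and stamp it with the LSN.
lsn = log.append(("T1", "P5", "old", "new"))
page["data"], page["pageLSN"] = "new", lsn

disk = {}
write_page(log, page, disk)
```

After `write_page` returns, `log.flushed_lsn` is at least the page's pageLSN, which is exactly the invariant on the slide.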

Log Records
• LogRecord fields: prevLSN, XID, type, pageID, length, offset, before-image, after-image
  • pageID, length, offset, before-image and after-image: update records only
• Possible log record types:
  • Update
  • Commit
  • Abort
  • End (end of commit or abort)
  • Compensation Log Records (CLRs), for UNDO actions
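One possible in-memory layout for these records, as a sketch only. The field names follow the slide; the `LogRecord` class, the `backward_chain` helper and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LogRecord:
    lsn: int
    prev_lsn: Optional[int]    # previous record of the same xact (None if first)
    xid: str
    type: str                  # "update", "commit", "abort", "end" or "clr"
    # The remaining fields are filled in for update records only.
    page_id: Optional[str] = None
    offset: Optional[int] = None
    length: Optional[int] = None
    before_image: Optional[bytes] = None
    after_image: Optional[bytes] = None


def backward_chain(log: Dict[int, LogRecord], last_lsn: int) -> List[int]:
    """Follow prevLSN pointers from a xact's last record back to its first."""
    chain = []
    lsn: Optional[int] = last_lsn
    while lsn is not None:
        chain.append(lsn)
        lsn = log[lsn].prev_lsn
    return chain


log = {
    10: LogRecord(10, None, "T1", "update", "P5", 0, 3, b"abc", b"xyz"),
    20: LogRecord(20, None, "T2", "update", "P3", 4, 2, b"hi", b"yo"),
    30: LogRecord(30, 10, "T1", "commit"),
}
t1_chain = backward_chain(log, 30)    # T1's records, newest first
```

The prevLSN chain is what later lets abort and the UNDO phase walk one transaction's records without scanning the whole log.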

Other Log-Related State
• Transaction table:
  • one entry per active xact
  • contains XID, status (running/committed/aborted), and lastLSN
• Dirty page table:
  • one entry per dirty page in the buffer pool
  • contains recLSN: the LSN of the log record which first caused the page to be dirty
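A sketch of how an update might maintain both tables during normal execution. The `log_update` function, the dict layouts and the "LSN = position in the log" rule are assumptions for illustration; locking and the buffer pool are not modeled.

```python
log = []            # list of records; LSN = index + 1 (assumption)
xact_table = {}     # XID -> {"status", "lastLSN"}
dirty_pages = {}    # pageID -> recLSN
pages = {}          # pageID -> {"data", "pageLSN"} (stand-in for buffer pool)


def log_update(xid, page_id, new_data):
    """Log an update, then apply it and refresh all bookkeeping."""
    entry = xact_table.setdefault(xid, {"status": "running", "lastLSN": None})
    page = pages.setdefault(page_id, {"data": None, "pageLSN": 0})
    lsn = len(log) + 1
    log.append({
        "LSN": lsn, "prevLSN": entry["lastLSN"], "XID": xid, "type": "update",
        "pageID": page_id, "before": page["data"], "after": new_data,
    })
    entry["lastLSN"] = lsn                  # newest record of this xact
    dirty_pages.setdefault(page_id, lsn)    # recLSN: first dirtying record only
    page["data"], page["pageLSN"] = new_data, lsn
    return lsn


log_update("T1", "P5", "a")    # LSN 1
log_update("T2", "P3", "b")    # LSN 2
log_update("T1", "P5", "c")    # LSN 3: recLSN of P5 stays 1, pageLSN becomes 3
```

Note the asymmetry: recLSN is set once when the page first becomes dirty, while lastLSN and pageLSN move forward on every update.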

Normal Execution of an Xact
• Series of reads & writes, followed by commit or abort
• Strict 2PL
• STEAL, NO-FORCE buffer management, with write-ahead logging

Checkpointing
• Periodic checkpoints:
  • minimize the (analysis) time to recover from a system crash
• Write to log:
  • begin_checkpoint record: indicates when the chkpt began
  • end_checkpoint record: contains the current xact table and dirty page table
• “Fuzzy checkpoint”:
  • other xacts continue to run; these tables are accurate only as of the time of the begin_checkpoint record
  • no attempt to force dirty pages to disk
  • effectiveness limited by the earliest recLSN in the dirty page table
    • i.e., the oldest unwritten change to a dirty page
    • so it is a good idea to periodically flush dirty pages to disk!
• Store the LSN of the chkpt record in the master record
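A minimal sketch of a fuzzy checkpoint over toy in-memory structures. The function names, the dict-based master record and the choice to store the begin_checkpoint LSN in it are assumptions for illustration; no pages are forced to disk.

```python
import copy


def take_checkpoint(log, xact_table, dirty_pages, master):
    """Append begin/end checkpoint records and remember where they start."""
    begin_lsn = len(log) + 1    # assumption: LSN = position in the log
    log.append({"LSN": begin_lsn, "type": "begin_checkpoint"})
    log.append({
        "LSN": begin_lsn + 1,
        "type": "end_checkpoint",
        # Snapshots: later changes to the live tables must not leak in.
        "xact_table": copy.deepcopy(xact_table),
        "dpt": dict(dirty_pages),
    })
    master["checkpoint_lsn"] = begin_lsn
    return begin_lsn


def oldest_rec_lsn(end_checkpoint):
    """Earliest recLSN in the checkpointed dirty page table (None if empty)."""
    dpt = end_checkpoint["dpt"]
    return min(dpt.values()) if dpt else None


log = [{"LSN": 1, "type": "update"}, {"LSN": 2, "type": "update"},
       {"LSN": 3, "type": "update"}]
xact_table = {"T1": {"status": "running", "lastLSN": 3}}
dirty_pages = {"P5": 1, "P3": 2}
master = {}

take_checkpoint(log, xact_table, dirty_pages, master)
xact_table["T1"]["lastLSN"] = 6    # T1 keeps running after the checkpoint
start = oldest_rec_lsn(log[-1])
```

Here the checkpoint sits at LSN 4, yet the oldest recLSN is 1, so redo would still have to begin at LSN 1. That is the sense in which the earliest recLSN limits what a checkpoint buys.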

Big Picture: What’s Stored Where
• LOG: LogRecords (prevLSN, XID, type, pageID, length, offset, before-image, after-image)
• DB: data pages, each with a pageLSN; master record
• RAM: Xact Table (lastLSN = last log record, status); Dirty Page Table (recLSN = first log record); flushedLSN

Simple Transaction Abort
• For now, consider an explicit abort of a xact
  • e.g., validation error, deadlock; no crash involved
• Play back the log in reverse order, UNDOing updates:
  • get the lastLSN of the xact from the xact table
  • follow the chain of log records backward via the prevLSN field
  • before starting undo, write an Abort log record
    • needed for recovering from a crash during undo

Abort, cont.
• To perform UNDO, must have a lock on the data
  • no problem (strict locking)
• Before restoring the old value of a page, write a CLR:
  • you continue logging while you undo
  • a CLR has one extra field: undoNextLSN
    • points to the next LSN to undo (i.e., the prevLSN of the record we’re currently undoing)
  • CLRs are never undone (but might be redone when repeating history after another crash)
• At the end of UNDO, write an “End” log record.
• Example (from the figure): T1 aborts with lastLSN=101, and the prevLSN of record 101 is 98. Undoing 101 writes a CLR at LSN 120 with undoNextLSN=98, and T1’s lastLSN becomes 120.
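The abort procedure from the last two slides can be sketched as follows. Everything here is a toy: the `abort` function, the dict-based log keyed by LSN, and the "next LSN = max + 10" rule are assumptions, and the sketch assumes the transaction's chain holds update records only.

```python
def abort(log, xact_table, pages, xid):
    """Roll back one xact: Abort record, one CLR per update, End record."""

    def append(rec):
        lsn = max(log) + 10    # assumption: LSNs spaced by 10
        rec.update(LSN=lsn, XID=xid, prevLSN=xact_table[xid]["lastLSN"])
        log[lsn] = rec
        xact_table[xid]["lastLSN"] = lsn
        return lsn

    to_undo = xact_table[xid]["lastLSN"]
    append({"type": "abort"})    # written before any undo work starts
    xact_table[xid]["status"] = "aborted"

    while to_undo is not None:
        rec = log[to_undo]       # assumed to be an update record
        # Write the CLR first, then restore the old value.
        clr_lsn = append({
            "type": "clr", "pageID": rec["pageID"], "after": rec["before"],
            "undoNextLSN": rec["prevLSN"],    # what remains to be undone
        })
        pages[rec["pageID"]] = {"data": rec["before"], "pageLSN": clr_lsn}
        to_undo = rec["prevLSN"]

    append({"type": "end"})
    del xact_table[xid]


log = {
    10: {"LSN": 10, "XID": "T1", "prevLSN": None, "type": "update",
         "pageID": "P5", "before": "p5-old", "after": "p5-new"},
    20: {"LSN": 20, "XID": "T1", "prevLSN": 10, "type": "update",
         "pageID": "P3", "before": "p3-old", "after": "p3-new"},
}
xact_table = {"T1": {"status": "running", "lastLSN": 20}}
pages = {"P5": {"data": "p5-new", "pageLSN": 10},
         "P3": {"data": "p3-new", "pageLSN": 20}}

abort(log, xact_table, pages, "T1")
```

Each CLR's undoNextLSN equals the prevLSN of the record it compensates, so a crash in the middle of the rollback can resume from the last CLR instead of starting over.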

Transaction Commit
• Write a commit record to the log
• Flush all log records up to the xact’s lastLSN
  • guarantees that flushedLSN >= lastLSN
• Commit() returns (after synchronous IO)
• Write an End record to the log
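The commit steps, as a sketch over toy structures. The `commit` function and the `state` dict holding flushedLSN are hypothetical; the synchronous log flush is represented only by advancing flushedLSN.

```python
def commit(log, state, xact_table, xid):
    """Commit record, forced log flush, then the End record."""

    def append(rec_type):
        lsn = len(log) + 1    # assumption: LSN = position in the log
        log.append({"LSN": lsn, "prevLSN": xact_table[xid]["lastLSN"],
                    "XID": xid, "type": rec_type})
        xact_table[xid]["lastLSN"] = lsn
        return lsn

    commit_lsn = append("commit")
    xact_table[xid]["status"] = "committed"
    # Stand-in for the synchronous IO: afterwards flushedLSN >= lastLSN.
    state["flushedLSN"] = max(state["flushedLSN"], commit_lsn)
    # A real Commit() call could return to the client at this point.
    append("end")
    del xact_table[xid]
    return commit_lsn


log = [{"LSN": 1, "prevLSN": None, "XID": "T1", "type": "update"}]
state = {"flushedLSN": 0}
xact_table = {"T1": {"status": "running", "lastLSN": 1}}

commit_lsn = commit(log, state, xact_table, "T1")
```

Only the log is forced at commit; the data pages T1 touched may still be dirty in the buffer pool, which is what NO-FORCE means.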

Crash Recovery: Big Picture
• Start from a checkpoint (found via the master record)
• Three phases. Need to:
  • figure out which xacts committed since the checkpoint and which failed (Analysis)
  • REDO all actions
    • repeat history
  • UNDO the effects of failed xacts
[Figure: log timeline ending at CRASH. Analysis (A) runs forward from the last chkpt; Redo (R) runs forward from the smallest recLSN in the dirty page table after Analysis; Undo (U) runs backward to the oldest log record of a xact active at the crash]

Crash Recovery vs. Transaction Abort? • What are the differences?

Crash Recovery vs. Transaction Abort?
• Abort:
  • (state is in memory, then) undo one xact
• Recovery:
  • reconstruct state, then undo all uncommitted xacts
  • reconstruction: analysis + redo
  • undo: must consider the global ordering of undos

Recovery: Analysis Phase
Goal: reconstruct two state tables:
• xact table: which xacts to abort (undo)?
• dirty page table: where to start redo?
Steps:
• (init) Restore the state at the checkpoint
  • via the end_checkpoint record
• (delta after ckpt) Scan the log forward from the ckpt
  • End record: remove the xact from the xact table
  • Other records:
    • add the xact to the xact table (if not already there), set lastLSN=LSN
    • change the xact status if a commit is seen
  • Update record only: if page P is not in the dirty page table,
    • add P to the DPT, set its recLSN=LSN
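The scan above can be sketched as a single forward pass. The `analysis` function and the dict-based records are assumptions for illustration. One deliberate deviation is labeled in the code: CLRs are treated like updates for the dirty page table, since they also modify pages, whereas the slide mentions update records only. The sample log is the one from the recovery example later in the deck, with an empty checkpoint.

```python
def analysis(log_after_ckpt, checkpoint):
    """Rebuild the xact table and dirty page table as of the crash."""
    xact_table = {x: dict(e) for x, e in checkpoint["xact_table"].items()}
    dpt = dict(checkpoint["dpt"])

    for rec in log_after_ckpt:    # forward scan, in LSN order
        xid = rec["XID"]
        if rec["type"] == "end":
            xact_table.pop(xid, None)    # xact is completely finished
            continue
        entry = xact_table.setdefault(xid, {"status": "running", "lastLSN": None})
        entry["lastLSN"] = rec["LSN"]
        if rec["type"] == "commit":
            entry["status"] = "committed"
        elif rec["type"] == "abort":
            entry["status"] = "aborted"
        # Assumption: CLRs dirty pages too, so they are handled like updates.
        if rec["type"] in ("update", "clr"):
            dpt.setdefault(rec["pageID"], rec["LSN"])    # keep the first LSN
    return xact_table, dpt


log = [
    {"LSN": 10, "XID": "T1", "type": "update", "pageID": "P5"},
    {"LSN": 20, "XID": "T2", "type": "update", "pageID": "P3"},
    {"LSN": 30, "XID": "T1", "type": "abort"},
    {"LSN": 40, "XID": "T1", "type": "clr", "pageID": "P5"},
    {"LSN": 45, "XID": "T1", "type": "end"},
    {"LSN": 50, "XID": "T3", "type": "update", "pageID": "P1"},
    {"LSN": 60, "XID": "T2", "type": "update", "pageID": "P5"},
]
xact_table, dpt = analysis(log, {"xact_table": {}, "dpt": {}})
```

T1 drops out at its End record, leaving T2 and T3 as the losers. The DPT may list pages that were in fact written to disk before the crash; the REDO phase filters those out with the pageLSN check.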

Recovery: REDO Phase
• Repeat history to reconstruct the state at the crash:
  • reapply all updates (even of aborted xacts!) and redo CLRs (CLRs are now simply dirty data from before the last crash)
• Scan forward from the earliest recLSN in the DPT. Redo each CLR or update log record (with its LSN), unless:
  • the affected page is not in the Dirty Page Table, or
  • the affected page is in the DPT, but has recLSN > LSN
    • why can this happen? the page was written out and brought back in after this LSN
  • pageLSN (in DB) >= LSN
    • why is this check done last? it requires reading the page from disk (in fact, it also covers the two checks above)
• To REDO an action:
  • reapply the logged action (this works for more than image-based logging!)
  • set pageLSN to LSN. No additional logging!
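The three skip conditions, applied in the order given above, might look like this. The `redo` function, the dict-based pages and the sample after-images are assumptions for illustration; the DPT is the one Analysis would produce for the deck's example log.

```python
def redo(log, dpt, disk_pages):
    """Repeat history from the earliest recLSN; return the LSNs redone."""
    if not dpt:
        return []
    start = min(dpt.values())
    redone = []
    for rec in log:    # forward scan, in LSN order
        if rec["LSN"] < start or rec["type"] not in ("update", "clr"):
            continue
        pid = rec["pageID"]
        if pid not in dpt:            # check 1: page is not dirty
            continue
        if dpt[pid] > rec["LSN"]:     # check 2: dirtied only after this LSN
            continue
        page = disk_pages[pid]        # check 3 needs the page itself
        if page["pageLSN"] >= rec["LSN"]:
            continue                  # change already reached the disk
        page["data"] = rec["after"]   # reapply the logged action
        page["pageLSN"] = rec["LSN"]  # no new log record is written
        redone.append(rec["LSN"])
    return redone


log = [
    {"LSN": 10, "type": "update", "pageID": "P5", "after": "T1@10"},
    {"LSN": 20, "type": "update", "pageID": "P3", "after": "T2@20"},
    {"LSN": 30, "type": "abort"},
    {"LSN": 40, "type": "clr", "pageID": "P5", "after": "P5-orig"},
    {"LSN": 45, "type": "end"},
    {"LSN": 50, "type": "update", "pageID": "P1", "after": "T3@50"},
    {"LSN": 60, "type": "update", "pageID": "P5", "after": "T2@60"},
]
dpt = {"P5": 10, "P3": 20, "P1": 50}
# Assumption: P5 was stolen and written to disk right after LSN 10.
disk_pages = {
    "P5": {"data": "T1@10", "pageLSN": 10},
    "P3": {"data": "P3-orig", "pageLSN": 0},
    "P1": {"data": "P1-orig", "pageLSN": 0},
}
redone = redo(log, dpt, disk_pages)
```

LSN 10 is skipped only by the third check: the DPT rebuilt by Analysis still lists P5 with recLSN 10, and it takes the pageLSN on disk to show that the update had already been written.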

Recovery: UNDO Phase
ToUndo = {lastLSN of all “loser” xacts}
Repeat:
• choose the largest LSN among ToUndo
• if this LSN is a CLR and undoNextLSN == NULL
  • write an End record for this xact
• if this LSN is a CLR and undoNextLSN != NULL
  • add undoNextLSN to ToUndo
  • (what happens to the other CLRs of this xact?)
    • only the last CLR is seen on this chain; the others are not on the chain
• else this LSN is an update: undo the update, write a CLR, add prevLSN to ToUndo
Until ToUndo is empty
[Figure: transactions T1, T2 and T3 with steps numbered 1 to 5]
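The loop above, sketched over a toy log. The `undo` function, the dict-based log keyed by LSN and the "next LSN = max + 10" rule are assumptions, so the LSNs of the new records differ from those in the deck's example. The sketch also writes the End record as soon as an undone update has no prevLSN, and it does not model log flushing.

```python
def undo(log, xact_table, pages):
    """Undo all loser xacts; return the LSNs of the records written."""
    written = []

    def append(xid, rec):
        lsn = max(log) + 10    # assumption: LSNs spaced by 10
        rec.update(LSN=lsn, XID=xid, prevLSN=xact_table[xid]["lastLSN"])
        log[lsn] = rec
        xact_table[xid]["lastLSN"] = lsn
        written.append(lsn)
        return lsn

    to_undo = {e["lastLSN"] for e in xact_table.values()
               if e["status"] != "committed"}
    while to_undo:
        lsn = max(to_undo)    # always the largest LSN: global backward order
        to_undo.remove(lsn)
        rec = log[lsn]
        xid = rec["XID"]
        if rec["type"] == "clr":
            nxt = rec["undoNextLSN"]    # skip work that was already undone
        elif rec["type"] == "update":
            clr = append(xid, {"type": "clr", "pageID": rec["pageID"],
                               "after": rec["before"],
                               "undoNextLSN": rec["prevLSN"]})
            pages[rec["pageID"]] = {"data": rec["before"], "pageLSN": clr}
            nxt = rec["prevLSN"]
        else:
            nxt = rec["prevLSN"]        # e.g. an abort record: nothing to undo
        if nxt is None:
            append(xid, {"type": "end"})    # this xact is fully rolled back
            del xact_table[xid]
        else:
            to_undo.add(nxt)
    return written


# T2 and T3 are the losers; T1 already ended, so its records are omitted.
log = {
    20: {"LSN": 20, "XID": "T2", "prevLSN": None, "type": "update",
         "pageID": "P3", "before": "p3-v0", "after": "p3-v1"},
    50: {"LSN": 50, "XID": "T3", "prevLSN": None, "type": "update",
         "pageID": "P1", "before": "p1-v0", "after": "p1-v1"},
    60: {"LSN": 60, "XID": "T2", "prevLSN": 20, "type": "update",
         "pageID": "P5", "before": "p5-v0", "after": "p5-v1"},
}
xact_table = {"T2": {"status": "running", "lastLSN": 60},
              "T3": {"status": "running", "lastLSN": 50}}
pages = {}
written = undo(log, xact_table, pages)
```

The updates are undone in the order 60, 50, 20, interleaving T2 and T3. If the system crashed again after the first CLR, restart would find that CLR as T2's lastLSN and jump straight to its undoNextLSN (20) instead of undoing LSN 60 a second time.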

Example of Recovery
LSN   LOG
00    begin_checkpoint
05    end_checkpoint
10    update: T1 writes P5
20    update: T2 writes P3
30    T1 abort
40    CLR: Undo T1 LSN 10
45    T1 End
50    update: T3 writes P1
60    update: T2 writes P5
      CRASH, RESTART
[Figure labels: prevLSNs link the log records; RAM holds the Xact Table (lastLSN, status), the DPT (recLSN), flushedLSN and ToUndo]

Example: Crash During Restart!
LSN     LOG
00,05   begin_checkpoint, end_checkpoint
10      update: T1 writes P5
20      update: T2 writes P3
30      T1 abort
40,45   CLR: Undo T1 LSN 10, T1 End
50      update: T3 writes P1
60      update: T2 writes P5
        CRASH, RESTART
70      CLR: Undo T2 LSN 60
80,85   CLR: Undo T3 LSN 50, T3 End
        CRASH, RESTART
90      CLR: Undo T2 LSN 20, T2 End
[Figure labels: undoNextLSN pointers link the CLRs; RAM holds the Xact Table (lastLSN, status), the DPT (recLSN), flushedLSN and ToUndo]

Summary of Logging/Recovery
• Recovery Manager guarantees atomicity & durability.
• Use WAL to allow STEAL/NO-FORCE without sacrificing correctness.
• LSNs identify log records; they are linked into backward chains per transaction (via prevLSN).
• pageLSN allows comparison of a data page with log records (so redo becomes simple).

Summary, Cont.
• Checkpointing: a quick way to limit the amount of log to scan on recovery
• Recovery works in 3 phases:
  • Analysis: forward from the checkpoint.
  • Redo: forward from the oldest recLSN.
  • Undo: backward from the end to the first LSN of the oldest xact alive at the crash.
• Upon Undo, write CLRs.
• Redo “repeats history”: simplifies the logic!

What You Should Know
• Basic idea of ARIES
• What are the three phases of recovery?
• How does each phase work?
• How do the data structures support the 3 phases?

Lessons
• Opening a problem vs. “closing” a problem
  • both are milestones
  • look for long-standing important problems and try to “close” them (aiming at the best solution)
• Simplicity counts!
  • simple = uniform processing (no/few special rules)
• Ensure the solution space considered is complete
  • don’t overlook unusual solutions
