This presentation discusses split snapshots and Skippy indexing as a way to access and retain past states in a storage system. It covers the benefits of taking snapshots, the shortcomings of other approaches, and introduces the Split-COW and Skippy techniques for efficient snapshot management. Implementation details and the non-disruptiveness of the proposed approach are also discussed.
Split Snapshots and Skippy Indexing: Long Live the Past!
Ross Shaull <rshaull@cs.brandeis.edu>
Liuba Shrira <liuba@cs.brandeis.edu>
Brandeis University
Our Idea of a Snapshot
• A window to the past in a storage system
• Access data as it was at the time the snapshot was requested
• System-wide
• Snapshots may be kept forever, i.e., “long-lived” snapshots
• Snapshots are consistent (whatever that means…)
• High frequency (up to continuous data protection, CDP)
Why Take Snapshots?
• Fix operator errors
• Auditing: when did Bob’s salary change, and who made the changes?
• Analysis: how much capital was tied up in blue shirts at the beginning of this fiscal year?
• We don’t necessarily know now what will be interesting in the future
BITE
• Give the storage system a new capability: Back-in-Time Execution (BITE)
• Run read-only code against the current state and any snapshot
• After issuing a request for BITE, no special code is required for accessing data in the snapshot (a usage sketch follows below)
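Below is a minimal sketch of how a BITE-style interface might look from the application side. Every name in it (snap_declare, bite_begin, bite_end, db_get) is hypothetical, invented for illustration, and the database is a stub; the point is only that once bite_begin() has been issued, unmodified read-only code runs against the snapshot.

```c
#include <stdio.h>

typedef unsigned long snap_id_t;

static snap_id_t next_snap = 1;
static snap_id_t active_snap = 0;            /* 0 means "current state" */

static snap_id_t snap_declare(void) { return next_snap++; }
static void bite_begin(snap_id_t s) { active_snap = s; }
static void bite_end(void)          { active_snap = 0; }

/* Stub read: a real system would consult the snapshot page table
 * when a snapshot is active, and the current pages otherwise. */
static const char *db_get(const char *key)
{
    (void)key;
    return active_snap ? "<value as of snapshot>" : "<current value>";
}

int main(void)
{
    snap_id_t before_audit = snap_declare();  /* a window to the past */
    bite_begin(before_audit);
    printf("bob.salary: %s\n", db_get("bob.salary")); /* ordinary read code */
    bite_end();
    return 0;
}
```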
Other Approaches: Databases
• ImmortalDB, Time-Split B-tree (Lomet)
  • Reorganizes the current state
  • Complex
• Snapshot isolation (PostgreSQL, Oracle)
  • An extension to transactions
  • Covers only the recent past
• Oracle Flashback
  • Page-level copies of the recent past (not kept forever)
  • Interface seems similar to BITE
Other Approaches: File Systems
• WAFL (Hitz), ext3cow (Peterson)
  • Limited on-disk locality
  • Application-level consistency is a challenge
• VSS (Sankaran)
  • Blocks disk requests
  • Suitable for backup-type frequency
A Different Approach
• Goals:
  • Avoid declustering the current state
  • Don’t change how the current state is accessed
  • The application requests snapshots
  • Snapshots are “on-line” (not in a warehouse)
• Split Snapshots:
  • Copy the past out incrementally
  • Snapshots are available through a virtualized buffer manager
Our Storage System Model
• A “database”:
  • Has transactions
  • Has a recovery log
  • Organizes data in pages on disk
Our Consistency Model
• Crash consistency
• Imagine that a snapshot is declared, but then, before any modifications can be made, the system crashes
• After restart, recovery kicks in and the current state is restored to *some* consistent point
• All snapshots will have this same consistency guarantee after a crash
Our Storage System Model
[Figure: an application asks for record R; access methods find the table and its root, then search for R through the page table (P1 → address X, P2 → address Y, …), fetching pages P1…Pn from the database disk through the cache.]
Retaining the Past
[Figure: side-by-side comparison of ways to retain past page versions.]
Copy-on-Write (COW)
• The old page table becomes the snapshot page table
[Figure: page-table operations for the sequence “Snapshot S, Modify P1”: the snapshot page table “S” keeps pointing at the old P1, while the current page table points at the new copy.]
Split-COW
• With plain COW, an update to P2 must be reflected in both page tables, which is expensive; Split-COW instead copies the pre-state out (a write-path sketch follows below)
[Figure: Split-COW page tables: the current page table is updated in place, while snapshot page tables SPT(S) and SPT(S+1) point at the copied-out pre-state pages.]
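The following is a compact sketch of a Split-COW write path under simplifying assumptions: in-memory arrays stand in for the database, the snapstore, and the mapping log, and a single epoch counter per page stands in for full snapshot bookkeeping. All names are illustrative, not the system's actual structures. The essential behavior is that the first update to a page after a snapshot copies the pre-state out, so the current database stays clustered and is updated in place.

```c
#include <string.h>

#define NPAGES  8
#define PAGE_SZ 64

static char db[NPAGES][PAGE_SZ];         /* current state, updated in place   */
static char snapstore[1024][PAGE_SZ];    /* copied-out pre-states             */
static int  store_top = 0;

static int cur_epoch = 0;                /* id of the latest snapshot         */
static int page_epoch[NPAGES];           /* epoch each page was last COW'd in */

struct mapping { int epoch, page, addr; };
static struct mapping maplog[4096];      /* append-only mapping log           */
static int maplog_top = 0;

static void snap_declare(void) { cur_epoch++; }

static void page_write(int p, const char *data, size_t len)
{
    if (page_epoch[p] < cur_epoch) {     /* first write since the snapshot?   */
        memcpy(snapstore[store_top], db[p], PAGE_SZ);   /* copy the past out  */
        maplog[maplog_top++] =
            (struct mapping){ cur_epoch, p, store_top };/* append a mapping   */
        store_top++;
        page_epoch[p] = cur_epoch;
    }
    memcpy(db[p], data, len < PAGE_SZ ? len : PAGE_SZ); /* current, in place  */
}

int main(void)
{
    snap_declare();                       /* declare snapshot S               */
    page_write(1, "new", 4);              /* P1's pre-state goes to snapstore */
    return 0;
}
```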
What’s Next
• How do we manage the metadata?
• How will snapshot pages be accessed?
• Can we be non-disruptive?
Metadata Solution
• Metadata (page tables) is created incrementally
• Keeping many SPTs is costly
• Instead, write “mappings” into a log
• Materialize the SPT on demand
Maplog
• Mappings are created incrementally
• They are added to an append-only log
• Start(S) points to the first mapping created after snapshot S is declared
[Figure: a maplog of mappings (P1, P1, P2, P1, P2, P1, P1, P2, P1, P3) with Start pointers for Snap 1 through Snap 6.]
Maplog
• Materialize SPT(S) with a scan that begins at Start(S) (sketched below)
• Notice that the scan reads some mappings that it does not need
[Figure: the same maplog; the scan for SPT(S) passes over redundant mappings for pages it has already resolved.]
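A sketch of the on-demand materialization, reusing the illustrative mapping type from the Split-COW sketch above: the scan walks the maplog forward from Start(S), and the first mapping encountered for each page is the one that belongs to SPT(S); any later mapping for the same page is one of the redundant reads the slide mentions.

```c
#define NPAGES 8

struct mapping { int page, addr; };      /* page -> snapstore address */

/* Build SPT(S): spt[p] becomes the snapstore address holding p's
 * state at snapshot S, or -1 if p was never overwritten after S
 * (its snapshot state is then still the current page).
 * start_s is the maplog index that Start(S) points to. */
static void materialize_spt(const struct mapping *maplog, int log_len,
                            int start_s, int spt[NPAGES])
{
    for (int p = 0; p < NPAGES; p++)
        spt[p] = -1;
    for (int i = start_s; i < log_len; i++)
        if (spt[maplog[i].page] == -1)       /* first-encountered wins */
            spt[maplog[i].page] = maplog[i].addr;
}
```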
Cost of Scanning Maplog
• Let the overwrite cycle length L be the number of page updates required to overwrite the entire database
• A maplog scan cannot be longer than an overwrite cycle
• Let N be the number of pages in the database
• For a uniformly random update workload, L ≈ N ln N (by the “coupon collector’s waiting time” problem; see the derivation below)
• Skew in the update workload lengthens the overwrite cycle: an 80/20 skew (80% of updates hit 20% of the pages) increases L by a factor of 4. Skew hurts!
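For reference, the uniform-workload estimate is the classic coupon collector's waiting time; the worked numbers below use an illustrative N = 10^6, which is an assumption, not a figure from the talk.

```latex
% Expected number of uniformly random page updates needed to touch
% all N pages at least once (one full overwrite cycle):
\[
  \mathbb{E}[L] \;=\; N \sum_{i=1}^{N} \frac{1}{i} \;=\; N H_N
  \;\approx\; N \ln N
\]
% For N = 10^6 pages this gives E[L] of about 1.4 * 10^7 updates,
% i.e., a full maplog scan roughly 14x the size of the database
% itself, before skew makes it worse.
```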
Skippy
• Copy the first-encountered mapping (FEM) within each node to the next level
[Figure: Skippy Level 1 built above the maplog; FEM copies and pointers link each maplog node upward, with Start pointers for Snap 1 through Snap 6.]
Skippy
• Each level cuts the redundant mapping count in half (a construction sketch follows below)
[Figure: the same maplog with Skippy Level 1; a scan at Level 1 skips half the redundant mappings.]
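A sketch of building one Skippy level, again with illustrative types and sizes: the maplog is divided into fixed-size nodes, and only the first-encountered mapping for each page within a node is copied up to the next level, so a scan running at the higher level skips the duplicates inside every node it passes over.

```c
#define NODE_SZ 4      /* mappings per Skippy node (illustrative) */
#define NPAGES  8

struct mapping { int page, addr; };

/* Copy each node's first-encountered mappings into the next level;
 * returns the length of the higher level. */
static int build_skippy_level(const struct mapping *level, int len,
                              struct mapping *next)
{
    int out = 0;
    for (int base = 0; base < len; base += NODE_SZ) {
        int seen[NPAGES] = {0};          /* pages mapped in this node */
        int end = base + NODE_SZ < len ? base + NODE_SZ : len;
        for (int i = base; i < end; i++)
            if (!seen[level[i].page]) {  /* FEM within this node      */
                seen[level[i].page] = 1;
                next[out++] = level[i];
            }
    }
    return out;
}
```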
K-Level Skippy
• Can eliminate the effect of skew, or more (a rough estimate follows below)
• Enables ad-hoc, on-line access to snapshots, whether they are old or young
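If each level halves the redundant mappings, as the previous slide suggests, the scan shortens geometrically with the number of levels. This back-of-envelope extrapolation is an assumption, not a measured result:

```latex
% Approximate scan length after ascending K Skippy levels, assuming
% each level halves the redundant mappings:
\[
  \mathrm{scan}(K) \;\approx\; \frac{L}{2^{K}}
\]
% Under this assumption, K = 2 levels already offset the 4x
% overwrite-cycle blowup quoted earlier for the 80/20 skew.
```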
Accessing Snapshots
• Transparent to layers above the cache
• An indirection layer redirects page requests from a BITE transaction into the snapstore (sketched below)
[Figure: the cache serves current-state reads normally, while BITE reads of overwritten pages are redirected to snapshot copies.]
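A sketch of the redirect decision in the virtualized buffer manager, continuing the illustrative structures from the earlier sketches: each read carries the requesting transaction's materialized SPT (NULL for ordinary transactions), and the read path fetches from either the current database or the snapstore, so everything above the cache stays unchanged.

```c
#include <stddef.h>

#define NPAGES  8
#define PAGE_SZ 64

static char db[NPAGES][PAGE_SZ];         /* current state    */
static char snapstore[1024][PAGE_SZ];    /* copied-out pages */

/* spt == NULL: ordinary transaction, read the current state.
 * spt != NULL: BITE transaction; spt[p] is p's snapstore address, or
 * -1 when the page was never overwritten after the snapshot (the
 * current page is then still the correct version). */
static const char *read_page(int p, const int *spt)
{
    if (spt != NULL && spt[p] != -1)
        return snapstore[spt[p]];        /* redirect into the snapstore */
    return db[p];                        /* normal read path            */
}
```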
Non-Disruptiveness
• Can we create Skippy and COW pre-states without disrupting the current state?
• Key idea:
  • Leverage recovery to defer all snapshot-related writes
  • Write snapshot data in the background to a secondary disk
Implementation
• BDB 4.6.21
• Page cache augmented:
  • COWs write-locked pages
  • Trickles COW’d pages out over time
• Leverages recovery:
  • Metadata is created in memory at transaction commit time, but only written at checkpoint time
  • After a crash, snapshot pages and metadata can be recovered in one log pass
• Costs:
  • A snapshot log record
  • Extra memory
  • Longer checkpoints
Early Disruptiveness Results
• Single-threaded updating workload of 100,000 transactions against a 66 MB database
• We can retain a snapshot after every transaction for a 6–8% penalty to writers
• Tests with readers show little impact on sequential scans (not depicted)
Paper Trail
• Upcoming poster and short paper at ICDE08
• “Skippy: A New Snapshot Indexing Method for Time Travel in the Storage Manager” to appear in SIGMOD08
• Poster and workshop talks: NEDBDay08, SYSTOR08
Recovery Sketch 1
• Snapshots are crash consistent
• Must recover data and metadata for all snapshots since the last checkpoint
• Pages might have been trickled, so the snapstore must be truncated back to the last mapping before the previous checkpoint
• We require only that a snapshot log record be forced into the log with a group commit; no other data or metadata must be logged until checkpoint
Recovery Sketch 2
• Walk backward through the WAL, applying UNDOs
• When a snapshot record is encountered, copy the “dirty” pages and create a mapping
• The trouble is that snapshots can be concurrent with transactions
• Cope with this by “COWing” a page when an UNDO for a different transaction is applied to that page
The Future
• Sometimes we want to scrub the past:
  • Running out of space?
  • Retention windows for SOX compliance
• Change the past-state representation:
  • Deduplication
  • Compression