Journal-guided Resynchronization for Software RAID Timothy E. Denehy, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin, Madison
RAID Consistent Update Problem • RAID task is to maintain consistency • Challenging in the face of crashes • Updates must be applied to more than one disk • Inconsistency means window of vulnerability • Disk failure may lead to data loss
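The problem on this slide can be made concrete with a toy stripe. The following is a minimal sketch, assuming a hypothetical 4-disk RAID-5 stripe with XOR parity and two-byte blocks; the helper name `xor_blocks` and the block values are illustrative only and do not come from the talk.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe of a hypothetical 4-disk RAID-5 array: three data blocks plus parity.
data = [b"\x01\x01", b"\x02\x02", b"\x04\x04"]
parity = xor_blocks(data)

# An update must reach two disks: the data block and the parity block.
# Simulate a crash after the data write but before the parity write.
data[1] = b"\x0f\x0f"  # new data reached disk; parity is now stale

stripe_consistent = xor_blocks(data) == parity

# Window of vulnerability: if disk 0 fails now, rebuilding it from the
# surviving data blocks and the stale parity yields the wrong contents.
reconstructed = xor_blocks([data[1], data[2], parity])
lost_data = reconstructed != b"\x01\x01"
```

Until the stripe is resynchronized, the block on disk 0 is unprotected even though it was never written.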
High-end RAID Solution • Consistent update with non-volatile memory • Logs writes in NVRAM until they reach disk • Performance – logging to NVRAM is fast • Reliability – data is safe in NVRAM • Availability – recovery is fast • But, enterprise systems are expensive
Software RAID Solutions • Consistent update is challenging • Performance versus reliability trade-off • Performance: resynchronization after crash • Scan entire volume to fix inconsistencies • Extremely slow, hours for 100s of GBs to days for TBs • Reliability: lengthens window of vulnerability • Availability: consumes array bandwidth • Reliability: log intentions to a bitmap • Performance: extra writes to maintain bitmap
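The trade-off between a full scan and a write-intent bitmap can be sketched as follows. This is an illustrative toy, not the interface of any real software RAID driver: the class `IntentBitmap`, the region size, and the counters are all assumptions made for the example.

```python
class IntentBitmap:
    """Toy write-intent bitmap: one entry per fixed-size region of the array."""

    def __init__(self, array_blocks, region_blocks):
        self.region_blocks = region_blocks
        n_regions = -(-array_blocks // region_blocks)  # ceiling division
        self.outstanding = [0] * n_regions  # in-flight writes per region
        self.bitmap_writes = 0              # extra I/O spent persisting bits

    def begin_write(self, block):
        region = block // self.region_blocks
        if self.outstanding[region] == 0:
            self.bitmap_writes += 1  # bit must be on disk before the data write
        self.outstanding[region] += 1

    def end_write(self, block):
        region = block // self.region_blocks
        self.outstanding[region] -= 1
        if self.outstanding[region] == 0:
            self.bitmap_writes += 1  # clearing the bit costs another write

    def blocks_to_resync(self):
        dirty = sum(1 for count in self.outstanding if count > 0)
        return dirty * self.region_blocks

bitmap = IntentBitmap(array_blocks=1000, region_blocks=100)
bitmap.begin_write(5)
bitmap.begin_write(7)
bitmap.end_write(5)
# Crash here: block 7 is still in flight, so its region stays marked.
after_crash = bitmap.blocks_to_resync()
full_scan = 1000  # without the bitmap, every block must be checked
```

The bitmap shrinks the resynchronization from the whole array to the marked regions, at the price of the extra writes counted in `bitmap_writes`.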
Cooperative Software RAID Solution • Journaling file systems perform logging • Maintain file system data structure consistency • ext3, ReiserFS, JFS, NTFS • Journal-guided resynchronization • New ext3 mode: declared mode • New software RAID interface: verify read • Achieves performance, reliability, availability
Journal-guided Resync Overview • Crash: What writes were outstanding? • Narrow the range of possible inconsistencies • Obtain information from journal (declared mode) • Restart: journal-guided resynchronization • Use journal to identify outstanding writes • Communicate locations to RAID (verify read) • Check redundancy and repair inconsistencies • Greatly reduce the time for resynchronization
Outline • Problem • ext3 Background and Analysis • ext3 Declared Mode and RAID Verify Read • Journal-guided Resynchronization • Evaluation • Conclusion
ext3 Modes • Data-journaling mode • All data and metadata is written to the journal • Ordered mode (default) • Only metadata is written to the journal • Strict ordering between data and metadata • Writeback mode • Only metadata is written to the journal • No ordering between data and metadata
ext3 Transactions • Updates are grouped into transactions • Transaction states • Running – collect updates in memory • Commit – write updates to journal • Checkpoint – write updates to home locations
ext3 Journal Structures • Journal superblock • Head and tail pointers into journal file • Transaction sequence number • Descriptor block • List of home locations for upcoming blocks • Commit block • Marks the end of a transaction
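The structures listed on this slide can be modeled roughly as below. This is a simplified sketch: the field names are hypothetical and do not match the on-disk ext3 journal format, and the log is represented as a plain Python list.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class JournalSuperblock:
    head: int      # hypothetical: where the next record will be written
    tail: int      # hypothetical: oldest record not yet checkpointed
    sequence: int  # sequence number of the oldest live transaction

@dataclass
class Descriptor:
    sequence: int
    home_locations: List[int]  # home addresses of the blocks that follow

@dataclass
class Commit:
    sequence: int  # marks the end of the transaction with this number

def committed_sequences(log):
    """Return sequence numbers whose descriptor is followed by a commit block."""
    described = set()
    committed = []
    for record in log:
        if isinstance(record, Descriptor):
            described.add(record.sequence)
        elif isinstance(record, Commit) and record.sequence in described:
            committed.append(record.sequence)
    return committed

superblock = JournalSuperblock(head=3, tail=0, sequence=11)
log = [Descriptor(11, [40, 41, 97]), Commit(11), Descriptor(12, [12, 13])]
done = committed_sequences(log)  # transaction 12 has no commit block yet
```

A transaction without a commit block, like 12 here, is treated as incomplete during recovery.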
Data-journaling Write Analysis • Running: collect file system updates in memory • Commit: write descriptor, metadata, and data to journal, wait (bounded) • write commit block to journal, wait (bounded) • Checkpoint: write journaled blocks to home locations, wait (known) • update superblock (known) • [Figure labels: transaction states Running, Committing, Checkpointing; META, DATA ×4; DESC 11, META, DATA ×4, COMM 11; Journal Super; parity blocks P]
Data-journaling Summary • Provides a record of all outstanding writes • Suitable for journal-guided resynchronization • Offers poor performance
Ordered Write Analysis • Running: collect file system updates in memory • pdflush may write data to home locations (unknown) • Commit: write data to home locations, wait (unknown) • write descriptor and metadata to journal, wait (bounded) • write commit block to journal, wait (bounded) • [Figure labels: transaction states Running, Committing; META, DATA ×4; DESC 11, META, DATA, COMM 11; Journal Super; parity blocks P]
Ordered Summary • Does not provide outstanding write record • Unsuitable for journal-guided resynchronization
Declared Mode • Variation of ordered mode • Only metadata is journaled, strict ordering • Declares its intent to write to home locations • New journal structure: declare block • List of home data locations for the transaction • Space and performance overheads
Declared Write Analysis • Running: collect file system updates in memory • pdflush may write data to home locations (unknown) • Commit: write declare block to journal, wait (bounded) • write data to home locations, wait (known) • write descriptor and metadata to journal, wait (bounded) • write commit block to journal, wait (bounded) • [Figure labels: transaction states Running, Committing; META, DATA ×4; DECL 11, DESC 11, META, DATA, COMM 11; Journal Super; parity blocks P]
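The commit ordering above is what makes the home data writes "known": the declare block is on disk before any of them is issued. A rough sketch of that invariant follows, with a transaction modeled as a list of steps; the function names and tuple layout are assumptions for illustration, not ext3 code, and writes made by pdflush during the running state are not modeled.

```python
def declared_commit_steps(data_locations, metadata_locations, sequence):
    """Ordered I/O steps for one commit; each step completes before the next starts."""
    return [
        ("journal", "declare", sequence, list(data_locations)),
        ("home", "data", sequence, list(data_locations)),
        ("journal", "descriptor+metadata", sequence, list(metadata_locations)),
        ("journal", "commit", sequence, []),
    ]

def declared_locations_on_disk(steps, completed):
    """Home data locations recorded in the journal if a crash follows `completed` steps."""
    for target, kind, _sequence, locations in steps[:completed]:
        if target == "journal" and kind == "declare":
            return locations
    return []

steps = declared_commit_steps([200, 201, 202], [7, 8], sequence=12)

# Crash before the declare block is durable: no home data write has been issued.
before_declare = declared_locations_on_disk(steps, completed=0)

# Crash during the home data writes: every possible target is already declared.
during_data = declared_locations_on_disk(steps, completed=1)
```

Because data writes start only after step one completes, any home data block that might be inconsistent appears in the declare block.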
Software RAID Verify Read • File system must communicate possible inconsistencies to the software RAID layer • New interface: verify read request • Read block and verify its redundant information • Repair redundant information if inconsistent • [Figure: blocks of a stripe XORed together and compared with the parity block]
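A verify read on a single XOR-parity stripe might look like the sketch below. It assumes the stripe is an in-memory list of equal-length blocks and that repair means recomputing parity from the data blocks; `verify_read` and its return shape are hypothetical, not the actual software RAID interface.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def verify_read(stripe, parity_index, block_index):
    """Read one block, check the stripe's parity, and repair parity if stale.

    Returns (block, repaired), where repaired says whether parity was rewritten.
    """
    data = [blk for i, blk in enumerate(stripe) if i != parity_index]
    expected = xor_blocks(data)
    repaired = stripe[parity_index] != expected
    if repaired:
        stripe[parity_index] = expected  # bring the redundancy back in sync
    return stripe[block_index], repaired

# Stale parity: block 1 was updated to 0x0f but parity still reflects 0x02.
stripe = [b"\x01", b"\x0f", b"\x04", b"\x07"]
block, repaired = verify_read(stripe, parity_index=3, block_index=1)
_, repaired_again = verify_read(stripe, parity_index=3, block_index=1)
```

The second call finds the stripe consistent, so verify reads are safe to issue for locations that turn out not to need repair.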
Journal-guided Resynchronization • Recovery and resynchronization: • Superblock write: verify read for superblock • Checkpointing: verify reads for descriptor home locations • Committing: verify reads for head of the journal • Home data writes: verify reads for declared home locations • Checkpoint committed transactions • [Figure labels: DECL 11, DESC 11, META, DATA, COMM 11, DECL 12; Journal Super; parity blocks P]
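The recovery pass on this slide can be sketched as a scan that turns journal records into a list of verify read targets. This is one reading of the slide, under stated assumptions: journal records are plain dictionaries, descriptor locations are taken from committed transactions, declared locations are taken from the uncommitted transaction, and the name `resync_targets` is hypothetical.

```python
def resync_targets(journal, superblock_location, journal_head_blocks):
    """Collect block locations that need a verify read after a crash."""
    committed = {rec["seq"] for rec in journal if rec["type"] == "commit"}

    targets = [superblock_location]  # superblock write may have been in flight

    # Checkpointing: committed blocks may have been going to their home locations.
    for rec in journal:
        if rec["type"] == "descriptor" and rec["seq"] in committed:
            targets.extend(rec["home"])

    # Committing: journal writes at the head of the log may have been in flight.
    targets.extend(journal_head_blocks)

    # Home data writes: the uncommitted transaction declared where it would write.
    for rec in journal:
        if rec["type"] == "declare" and rec["seq"] not in committed:
            targets.extend(rec["home"])

    return targets

# Journal contents loosely following the slide's figure: DECL 11 ... COMM 11, DECL 12.
journal = [
    {"type": "declare", "seq": 11, "home": [100, 101]},
    {"type": "descriptor", "seq": 11, "home": [7, 8]},
    {"type": "commit", "seq": 11},
    {"type": "declare", "seq": 12, "home": [200, 201, 202]},
]
targets = resync_targets(journal, superblock_location=0, journal_head_blocks=[900, 901])
```

The number of targets depends on what the journal holds, not on the size of the array.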
Declared Mode Evaluation • Microbenchmarks (versus ordered mode) • Random write (3% slowdown) • Sequential write (5% slowdown) • Sprite create, read, unlink (4% slowdown) • Macrobenchmarks • ssh benchmark (3% speedup for unpack) • Postmark (from 40% speedup to 5% slowdown) • Speedup from globally sorted write order • TPC-B (from 20% to 5% slowdown) • Small transaction size increases declare overhead
Implementation Complexity • Cooperative approach reduces complexity
Resynchronization Experiment • Five-disk, 1 GB RAID-5 array • Foreground process reading a set of files • After 30 seconds, crash and restart machine • Resynchronization begins • Foreground process restarts • Monitor foreground bandwidth and resync
Resynchronization Results • Availability: foreground bandwidth improves from 29.6 to 34.1 MB/s • Reliability: window of vulnerability shrinks from 254 to 0.21 seconds • Resynchronization cost reduced from O(array size) to O(journal size)
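For scale, the reported numbers work out as follows. This is plain arithmetic on the figures from the slide; the variable names are illustrative.

```python
# Figures reported on the slide for the five-disk, 1 GB RAID-5 experiment.
bandwidth_before_mbps = 29.6
bandwidth_after_mbps = 34.1
vulnerability_before_s = 254
vulnerability_after_s = 0.21

# Relative gain in foreground bandwidth during resynchronization.
bandwidth_gain_pct = (bandwidth_after_mbps - bandwidth_before_mbps) / bandwidth_before_mbps * 100

# Factor by which the window of vulnerability shrinks.
vulnerability_factor = vulnerability_before_s / vulnerability_after_s

print(f"foreground bandwidth gain: {bandwidth_gain_pct:.1f}%")      # about 15.2%
print(f"vulnerability window reduced {vulnerability_factor:.0f}x")  # about 1210x
```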
Conclusion • RAID consistent updates are challenging • Analyzed ext3 journaling, declared mode • Identifies outstanding writes after a crash • Software RAID verify read interface • Journal-guided Resynchronization • Leverages functionality, reducing complexity • Provides performance, reliability, and availability • Cooperation between layers is the key
Questions? http://www.cs.wisc.edu/adsl/