
Why panic () ? Improving Reliability through Restartable File Systems

Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift


Presentation Transcript


  1. Swaminathan Sundararaman, Sriram Subramanian, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Michael M. Swift Why panic()? Improving Reliability through Restartable File Systems

  2. Data Availability
  [Figure: GFS master replicated across slave nodes]
  • Applications require data
  • Use the FS to reliably store data
  • Both hardware and software can fail
  • Typical solution:
    • Large clusters for availability
    • Reliability through replication

  3. User Desktop Environment
  [Figure: applications on an OS with FS, RAID controller, and disks]
  • Replication is infeasible for desktop environments
  • Wouldn't RAID work?
    • Can only tolerate H/W failures
  • FS crashes are more severe
    • Services/applications are killed
    • Requiring OS reboot and recovery
  • Need: better reliability in the event of file-system failures

  4. Outline
  • Motivation
  • Background
  • Restartable file systems
  • Advantages and limitations
  • Conclusions

  5. Failure Handling in File Systems
  • Exception paths not tested thoroughly
    • Exceptions: failed I/O, bad arguments, null pointers
  • On errors: call panic, BUG, or BUG_ON
  • After failure: data becomes inaccessible
  • Reason for no recovery code:
    • Hard to apply corrective measures
    • Not straightforward to add recovery

  6. Real-world Example: Linux 2.6.15 ReiserFS

  int journal_mark_dirty(...) {
      struct reiserfs_journal_cnode *cn = NULL;
      if (!cn) {
          cn = get_cnode(p_s_sb);
          if (!cn) {
              reiserfs_panic(p_s_sb, "get_cnode failed!\n");
          }
      }
  }

  File systems already detect failures.

  void reiserfs_panic(struct super_block *sb, ...)
  {
      BUG();  /* this is not actually called, but makes reiserfs_panic() "noreturn" */
      panic("REISERFS: panic %s\n", error_buf);
  }

  Recovery: simplified by a generic recovery mechanism.

  7. Possible Solutions
  [Figure: design space from stateless/lightweight (Nooks/Shadow drivers, SafeDrive, Singularity) to stateful/heavyweight (CuriOS, EROS, Xen, Minix, L4, Nexus)]
  • Code to recover from all failures
    • Not feasible in reality
  • Restart on failure
    • Previous work has taken this approach
  FSes need: stateful & lightweight recovery

  8. Restartable File Systems
  • Goal: build a lightweight & stateful solution to tolerate file-system failures
  • FS failures: completely transparent to applications
  • Solution: a single generic recovery mechanism for any file-system failure
    • Detect failures through assertions
    • Clean up resources used by the file system
    • Restore the file-system state before the crash
    • Continue to service new file-system requests

  9. Challenges
  • Transparency
    • Multiple applications using the FS upon crash
    • Intertwined execution
  • Fault-tolerance
    • Handle a gamut of failures
    • Transform to fail-stop failures
  • Consistency
    • OS and FS could be left in an inconsistent state

  10. Guaranteeing FS Consistency
  • Not all FSes support crash-consistency
  • FS state is constantly modified by applications
  • Periodically checkpoint FS state
    • Mark dirty blocks as copy-on-write
    • Ensure each checkpoint is atomically written
  • On crash: revert back to the last checkpoint
  FS consistency is required to prevent data loss

  11. Overview of Our Approach
  [Timeline figure: an application issues open("file"), write(), read(), write(), write(), close() through the VFS to the file system across Epoch 0 and Epoch 1; legend: completed, in-progress, crash]
  1. Periodically create checkpoints
  2. Crash occurs
  3. Unwind in-flight processes
  4. Move to the most recent checkpoint
  5. Replay completed operations
  6. Re-execute unwound processes

  12. Checkpoint Mechanism
  • File systems are constantly modified
  • Hard to identify a consistent recovery point
  • Naïve solution: prevent any new FS operation and call sync
    • Inefficient and unacceptable overhead

  13. Key Insight
  [Figure: applications → VFS → file systems (ext3, VFAT) → page cache → disk]
  • All requests go through the VFS layer
  • File systems write to disk through the page cache
  • Control requests to the FS and dirty pages to disk

  14. Generic COW-based Checkpoint
  [Figure: regular operation vs. Membrane, at and after a checkpoint — requests are stopped at the VFS while dirty pages in the page cache are marked copy-on-write before reaching disk]

  15. Interaction with Modern FSes
  • Have built-in crash-consistency mechanisms
    • Journaling or snapshotting
  • Seamlessly integrate with these mechanisms
    • Need FSes to indicate the beginning and end of a transaction
  • Works for data and ordered journaling modes
  • Need to combine writeback mode with COW

  16. Light-weight Logging
  • Log operations at the VFS level
    • Need not modify existing file systems
  • Operations: open, close, read, write, symlink, unlink, seek, etc.
  • Logs are thrown away after each checkpoint
  • What about logging writes?

  17. Page Stealing Mechanism
  [Figure: write(fd, buf, offset, count) flowing through the VFS and page cache before the crash, during recovery, and after recovery]
  • Goal: reduce the overhead of logging writes
  • Solution: grab data from the page cache during recovery
  • Mainly used for replaying writes

  18. Handling Non-Determinism

  19. Skip/Trust Unwind Protocol

  20. Evaluation Setup

  21. OpenSSH Benchmark

  22. Postmark Benchmark

  23. Recovery Time Restart ext2 during random-read micro benchmark

  24. Recovery Time (Cont.)

  25. Advantages
  • Improves tolerance to file-system failures
    • Build trust in new file systems (e.g., ext4, btrfs)
  • Quick-fix bug patching
    • Developers can transform corruptions into restarts
    • Restart instead of extensive code restructuring
  • Encourages more integrity checks in FS code
    • Assertions could be seamlessly transformed into restarts
  • File systems more robust to failures/crashes

  26. Limitations
  [Figure: inode# mismatch — before the crash, create("file1") returns inode# 12 and stat("file1") observes it; after crash recovery, the replayed create("file1") returns inode# 15]
  • Only tolerate fail-stop failures
  • Not address-space based
    • Faults could corrupt other kernel components
  • FS restart may be visible to applications
    • e.g., inode numbers could change after restart

  27. Conclusions
  • Failures are inevitable in file systems
    • Learn to cope, not hope to avoid them
  • Generic recovery mechanism for FS failures
    • Improves FS reliability and availability of data
  • Users: install new FSes with confidence
  • Developers: ship FSes faster, as not all exception cases are now show-stoppers

  28. Thank You! Advanced Systems Lab (ADSL) University of Wisconsin-Madison http://www.cs.wisc.edu/adsl Questions and Comments
