SQCK: A Declarative File System Checker. Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin – Madison OSDI ’08 – December 9th, 2008. Corrupt file systems. File systems Store massive amounts of data Must be reliable
SQCK: A Declarative File System Checker Haryadi S. Gunawi, Abhishek Rajimwale, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau University of Wisconsin – Madison OSDI ’08 – December 9th, 2008
Corrupt file systems • File systems • Store massive amounts of data • Must be reliable • Corrupted file system images • Due to hardware errors, file system bugs, etc. • Need to be repaired a.s.a.p.
Who should repair? • Does journaling (write-ahead log) help? • No, only for crashes • Does file system repair itself online? • No, not enough machinery • Fsck: the last line of defense • It’s a “must have” utility • XFS: “no need fsck ever”, but deploys fsck at the end • Must be fully reliable
But … fsck is complex • Fsck has a big task • Turn any corrupt image into a consistent image • E.g. check if a data block is shared by two inodes • How is it implemented? • Written in C → hard to reason about • Large and complex • Ext2 fsck: 150 checks in 16 KLOC • XFS fsck: 340 checks in 22 KLOC • Hundreds of cluttered if-check statements • Bottom line: fsck code is “untouchable”
Two Questions • Are current checkers really reliable? • If not, how should we build robust checkers?
e2fsck is unreliable • Analyze e2fsck (ext2 file system checker) • Findings: • Inconsistent repair • The file system becomes unreadable • Consistent but not “correct” • Fsck deletes valid directory entries • Fsck loses a huge number of files
SQCK • Lesson: Complexity is the enemy of reliability • Big task + bad design → complexity → unreliability • Need a higher-level approach for simplicity • SQCK (SQL-based Fsck) • Use a declarative query language to write checks • Put simply: write fewer lines of code • Evaluation • Simple and reliable: e2fsck in 150 queries (vs. 16 KLOC of C) • More: Great flexibility and reasonable performance
Outline • Introduction • Analysis of e2fsck • SQCK Design • SQCK Evaluation • Conclusion
Methodology • E2fsck task: cross-check all ext2 metadata • An indirect pointer should not point to the superblock • A subdir should only be accessible from one directory • Inject a single corruption • Observe how e2fsck repairs a single corruption • Only corrupt on-disk pointers • Corrupt an indirect pointer to point to the superblock • Corrupt a directory entry to point to another directory • Usually, a corrupt pointer is simply cleared to zero
Inconsistent (Out-of-order) Repair • [Figure: an inode's indirect pointer (*ind) points to the superblock instead of an indirect block] • Ideal fsck: 1. Check bad indirect pointer 2. Check indirect content • e2fsck: runs the same two checks out of order (indirect content is checked before the bad indirect pointer)
Consistent but Incorrect Repair (1) • [Figure: directory tree / with directories a1, b1 and subdirectories a2, b2, shown as repaired by an ideal fsck and by e2fsck (LF = lost+found)] • Kidnapping problem! • E2fsck does not use all available information
Result Summary • Four problems • Inconsistent • Information-incomplete • Policy-inconsistent • Insecure • E2fsck does not handle all corruptions • “Warning: Programming bug in e2fsck! Or some bonehead (you) is checking a mounted (live) filesystem.” • Not simple implementation bugs • Difficult to combine available information • Difficult to ensure correct ordering
Outline • Introduction • Analysis • SQCK Design • SQCK Evaluation • Conclusion
Fsck Properties • Hundreds of checks • Complex cross-checks • Must be ordered correctly • [Figure: taxonomy of checks in e2fsck, illustrated with instances of structures A { x y } and B { m n }]
A Declarative Approach • Lesson: Complexity is the enemy of reliability • SQCK • Use a declarative query language (e.g. SQL), why? • It is declarative: high-level intent is clear • Fit for cross-checking massive information • Goals achieved • Simple: e2fsck in 150 queries (vs. 16 KLOC of C) • Reliable: Each check/query is easy to understand • Flexible: Plug in/out different queries
Using SQCK • Take a file system image • Load metadata into db tables • Temporary tables • Ex: InodeTable, GroupDescTable, DirEntryTable • Run checks and repairs (in the form of queries) • Flush any modification, and delete tables • [Figure: File system image → Scanner → Loader → Database tables → Checks + Repairs → Flush → File system image]
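The load, check/repair, and flush stages above can be sketched end to end. This is only an illustration under stated assumptions: it uses Python's sqlite3 instead of the MySQL backend mentioned later in the talk, the "image" is a hypothetical pre-parsed dictionary rather than a real ext2 image, the InodeTable columns are invented, and the single rule is a made-up placeholder, not an actual e2fsck check.

```python
import sqlite3

# Hypothetical, already-parsed metadata standing in for a real image scan.
image = {"inodes": [
    {"ino": 11, "mode": 0o040755, "links_count": 2},
    {"ino": 12, "mode": 0, "links_count": 1},   # made-up inconsistency
]}

# (check query, repair query) pairs; this single rule is illustrative only.
RULES = [(
    "SELECT ino FROM InodeTable WHERE mode = 0 AND links_count > 0",
    "UPDATE InodeTable SET links_count = 0, dirty = 1 "
    "WHERE mode = 0 AND links_count > 0",
)]

def load(image):
    """Scanner + loader: copy metadata into a temporary table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TEMP TABLE InodeTable (ino INTEGER PRIMARY KEY, "
               "mode INTEGER, links_count INTEGER, dirty INTEGER DEFAULT 0)")
    db.executemany("INSERT INTO InodeTable (ino, mode, links_count) "
                   "VALUES (:ino, :mode, :links_count)", image["inodes"])
    return db

def check_and_repair(db, rules):
    """Run each check; if it returns rows, run the paired repair."""
    flagged = []
    for check, repair in rules:
        hits = [row[0] for row in db.execute(check).fetchall()]
        if hits:
            db.execute(repair)
        flagged.extend(hits)
    return flagged

def flush(db, image):
    """Write dirty rows back to the image, then delete the table."""
    by_ino = {inode["ino"]: inode for inode in image["inodes"]}
    dirty = db.execute("SELECT ino, mode, links_count FROM InodeTable "
                       "WHERE dirty = 1").fetchall()
    for ino, mode, links in dirty:
        by_ino[ino].update(mode=mode, links_count=links)
    db.execute("DROP TABLE InodeTable")

db = load(image)
flagged = check_and_repair(db, RULES)
flush(db, image)
print(flagged, image["inodes"])
```

Keeping the rules as plain data is what makes checks pluggable in this sketch: adding or removing a check is a change to the RULES list, not to the driver code.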
Declarative check (example 1) • Cross-checking a single instance of a structure • “Find block bitmap that is not located within its block group”
C:
    first_block = sb->s_first_data_block;
    last_block = first_block + blocks_per_group;
    for (i = 0, gd = fs->group_desc; i < fs->group_desc_count; i++, gd++) {
        /* the last group ends at the end of the file system */
        if (i == fs->group_desc_count - 1)
            last_block = sb->s_blocks_count;
        if ((gd->bg_block_bitmap < first_block) ||
            (gd->bg_block_bitmap >= last_block)) {
            px.blk = gd->bg_block_bitmap;
            if (fix_problem(BB_NOT_GROUP, ...))
                gd->bg_block_bitmap = 0;
        }
        ...
    }
SQL:
    SELECT * FROM GroupDescTable G
    WHERE G.blockBitmap NOT BETWEEN G.start AND G.end
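To see the block-bitmap query run, here is a small sketch using Python's sqlite3 (an assumption; the talk's prototype uses MySQL). The table layout, the grp column, and the sample block numbers are hypothetical. The start and end columns are quoted because END is an SQL keyword, and because BETWEEN is inclusive, end is assumed here to hold the last block of the group.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per group descriptor.
db.execute('CREATE TABLE GroupDescTable (grp INTEGER PRIMARY KEY, '
           '"start" INTEGER, "end" INTEGER, blockBitmap INTEGER)')
db.executemany("INSERT INTO GroupDescTable VALUES (?, ?, ?, ?)", [
    (0, 1, 8192, 3),
    (1, 8193, 16384, 8195),
    (2, 16385, 24576, 40),   # corrupt: bitmap lies outside group 2
])

# The check from the slide, with the keyword-like column names quoted.
CHECK = ('SELECT * FROM GroupDescTable G '
         'WHERE G.blockBitmap NOT BETWEEN G."start" AND G."end"')

bad = db.execute(CHECK).fetchall()
print(bad)
```

Only the descriptor for group 2 is returned, since its bitmap block number falls outside the group's own block range.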
Declarative check (example 2) • Cross-checking multiple instances of the same structure • “Find false parents (i.e. directory entries that point to a subdirectory that already belongs to another directory)” • Must read all directory entries in dir data blocks • Wrong implementation in e2fsck (the kidnapping problem)
Declarative check (example 2)
    if ((dot_state > 1) &&
        (ext2fs_test_inode_bitmap(ctx->inode_dir_map, dirent->inode))) {
        // e2fsck_get_dir_info is 20 lines long
        subdir = e2fsck_get_dir_info(dirent->inode);
        ...
        if (subdir->parent) {
            // the subdirectory already has a parent: clear this entry
            if (fix_problem(LINK_DIR, ..)) {
                dirent->inode = 0;
                goto next;
            }
        } else {
            // first entry seen wins, whether or not it is the true parent
            subdir->parent = ino;
        }
    }
Declarative check (example 2)
    SELECT F.*                    -- returns the false parent(s)
    FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
    WHERE
      -- P says C is its child
      P.entry_num >= 3 AND P.entry_ino = C.ino AND
      -- and C says P is its parent
      C.entry_num = 2 AND C.entry_ino = P.ino AND
      -- F also says C is its child
      F.entry_num >= 3 AND F.entry_ino = C.ino AND
      F.ino <> P.ino
[Figure: directories F and P both pointing to child directory C]
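The false-parent query can be tried on a tiny directory tree. This is a sketch under assumptions: sqlite3 stands in for the talk's MySQL backend, the inode numbers are invented, and entry numbering is assumed to be 1 for ".", 2 for "..", and 3 or higher for regular entries, as the query's conditions suggest. Directory 20 truly belongs to directory 10, while a corrupt entry in directory 11 also points at it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE DirEntryTable "
           "(ino INTEGER, entry_num INTEGER, entry_ino INTEGER)")
db.executemany("INSERT INTO DirEntryTable VALUES (?, ?, ?)", [
    (2, 1, 2), (2, 2, 2), (2, 3, 10), (2, 4, 11),   # root: a1=10, b1=11
    (10, 1, 10), (10, 2, 2), (10, 3, 20),           # a1 holds a2=20
    (11, 1, 11), (11, 2, 2), (11, 3, 20),           # b1: corrupt entry -> 20
    (20, 1, 20), (20, 2, 10),                       # a2: ".." says parent is a1
])

FALSE_PARENTS = """
SELECT F.*
FROM DirEntryTable P, DirEntryTable C, DirEntryTable F
WHERE P.entry_num >= 3 AND P.entry_ino = C.ino   -- P says C is its child
  AND C.entry_num = 2  AND C.entry_ino = P.ino   -- C says P is its parent
  AND F.entry_num >= 3 AND F.entry_ino = C.ino   -- F also claims C
  AND F.ino <> P.ino
"""

false_parents = db.execute(FALSE_PARENTS).fetchall()
print(false_parents)
```

The query uses the child's ".." entry to decide which claimant is the true parent, so the result does not depend on the order in which directories are scanned; only the entry in directory 11 is reported.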
Declarative Repairs • Running declarative checks is only part of the problem • Must also perform declarative repairs • A repair = An update query • Some repairs simply update a few fields • ... SET T.field = newValue, T.dirty = 1 • A repair = A series of queries • Ex: Reconnect an orphan directory to the lost+found directory • Combine a series of queries with C code • All repairs are written in SQL • C code is only used for connecting them
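A multi-query repair such as reconnecting an orphan directory might look like the following sketch. The assumptions are significant: Python plays the role the slide assigns to C (glue between queries), sqlite3 replaces MySQL, the inode numbers for root and lost+found are hypothetical constants, and the definition of "orphan" used here (no regular entry points at the directory) is a simplification for illustration.

```python
import sqlite3

ROOT, LOST_FOUND = 2, 11   # hypothetical inode numbers

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE DirEntryTable (ino INTEGER, entry_num INTEGER, "
           "entry_ino INTEGER, dirty INTEGER DEFAULT 0)")
db.executemany(
    "INSERT INTO DirEntryTable (ino, entry_num, entry_ino) VALUES (?, ?, ?)", [
        (2, 1, 2), (2, 2, 2), (2, 3, 11),   # root: ".", "..", lost+found
        (11, 1, 11), (11, 2, 2),            # lost+found: ".", ".."
        (30, 1, 30), (30, 2, 2),            # directory 30: nobody points to it
    ])

def reconnect_orphans(db):
    # Query 1: directories that no regular entry (entry_num >= 3) points to.
    orphans = [row[0] for row in db.execute(
        "SELECT DISTINCT D.ino FROM DirEntryTable D "
        "WHERE D.ino <> ? AND NOT EXISTS ("
        "  SELECT 1 FROM DirEntryTable P "
        "  WHERE P.entry_num >= 3 AND P.entry_ino = D.ino) "
        "ORDER BY D.ino", (ROOT,)).fetchall()]
    for ino in orphans:
        # Query 2: next free entry slot in lost+found.
        (slot,) = db.execute(
            "SELECT MAX(entry_num) + 1 FROM DirEntryTable WHERE ino = ?",
            (LOST_FOUND,)).fetchone()
        # Query 3: add an entry in lost+found that points to the orphan.
        db.execute("INSERT INTO DirEntryTable VALUES (?, ?, ?, 1)",
                   (LOST_FOUND, slot, ino))
        # Query 4: make the orphan's ".." point back to lost+found.
        db.execute("UPDATE DirEntryTable SET entry_ino = ?, dirty = 1 "
                   "WHERE ino = ? AND entry_num = 2", (LOST_FOUND, ino))
    return orphans

fixed = reconnect_orphans(db)
print(fixed)
```

Every modified row is marked dirty, matching the slide's `T.dirty = 1` convention, so a later flush step knows which rows to write back. Running the repair a second time finds no orphans.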
Outline • Introduction • Analysis • SQCK Design • SQCK Evaluation • Conclusion
SQCK Evaluation • Complexity • 150 queries in 1100 lines of SQL statements • (compared to 16,000 lines of C in e2fsck) • Reliability • Pass hundreds of corruption scenarios • Flexibility • Add new checks/repairs • Enable different versions of e2fsck • Performance • Introduce some optimizations
SQCK vs. e2fsck • Reasonable • First generation of SQCK (with MySQL) • Within 1.5x of e2fsck • Future optimizations • Hierarchical checks • Concurrent queries
Conclusion • Complexity is the enemy of reliability • Recovery code is complex • SQCK: Build recovery tools with a higher-level approach
Thank you! Questions? ADvanced Systems Laboratory www.cs.wisc.edu/adsl