340 likes | 361 Views
Tolerating File-System Mistakes with EnvyFS. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison. Lakshmi N. Bairavasundaram NetApp, Inc. File Systems in Today’s World. Modern file systems are complex
E N D
Tolerating File-System Mistakes with EnvyFS Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison Lakshmi N. Bairavasundaram NetApp, Inc.
File Systems in Today’s World • Modern file systems are complex • Tens of thousands of lines of code (e.g., XFS 45K LOC) • Storage stack is also getting deeper • Hypervisor, network, logical volume manager • Need to handle a gamut of failures • Memory allocation, disk faults, bit flips, system crashes • Preserve integrity of its meta-data and user data Tolerating File-System Mistakes with EnvyFS
File System Bugs • Bug reports for Linux 2.6 series from Bugzilla • ext3: 64, JFS: 17, ReiserFS: 38 • Some are FS corruption causing permanent data loss • FS bugs broadly classified into two categories • “fail-stop”: System immediately crashes • Solutions: Nooks [Swift 04], CuriOS [David08] • “fail-silent”: Accidentally corrupt on-disk state • Many such bugs uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b] Tolerating File-System Mistakes with EnvyFS
Bugs are inevitable in file systems Challenge: how to cope with them? Tolerating File-System Mistakes with EnvyFS
N-Version File Systems • Based on N-version programming [Avizienis77] • NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06] Application • EnvyFS: Simple software layer • Store data in Nchild file systems • Operations performed on all children • Rely on a simple software layer • Challenge: reducing overheads while retaining reliability • SubSIST: Novel Single Instance Store EnvyFS layer … Child 1 Child N Child 2 SIS layer Disk driver Disk Tolerating File-System Mistakes with EnvyFS
Results • Robustness • Traditional file systems handle few corruptions (< 4%) • EnvyFS3 tolerates 98.9% of single file system mistakes • Performance • Desktop workloads: EnvyFS3 has comparable performance • I/O intensive workloads: • Normal mode: EnvyFS3 + SubSIST acceptable performance • Under memory pressure: EnvyFS3 + SubSIST large overheads • Potential as a debugging tool for FS developers • Pinpoint the source of “fail-silent” bug in ext3 Tolerating File-System Mistakes with EnvyFS
Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS
N-Version Systems Development process: • Producing the specification of software • Implementing N versions of the software • Creating N-version layer • Executes different versions • Determines the consensus result Tolerating File-System Mistakes with EnvyFS
1. Producing Specification • Our own specification ? • Impractical: Requires wide scale changes to file systems • Specifications take years to get accepted • Can we leverage existing specification ? • Yes, can leverage VFS, but there are some issues • VFSnot precise for N-versioning purpose • Needs to handle cases where specification is not precise • e.g., Ordering directory entries, inode number allocation Tolerating File-System Mistakes with EnvyFS
Imprecise VFS Specification File 1 File 2 File 3 Ordering directory entries • Issue: • No specified return order • Can’t blindly compare entries • Solution: • Read all entries from a directory (dir: test in our case) from all FSes • Match entries from FSes • Return majority results Dir: test EnvyFS layer File 1 File 2 File 3 No Entries Readdir: test File 2 File 3 File 1 File 3 File 1 File 2 File 1 File 2 File 3 FS X FS Z FS Y … File 1 File 2 File 3 Dir: test Dir: test Dir: test Tolerating File-System Mistakes with EnvyFS
Imprecise VFS Specification (cont) • Inode number allocation • Inode numbers returned through system calls • Each child file system issues different inode numbers • Possible solution: Force file systems to use same algorithm? • Our solution: Issue inode numbers at EnvyFS layer EnvyFS layer File 1 10 File 2 15 File 3 16 File 2 04 File 3 44 File 1 36 File 3 99 File 1 65 File 2 43 Stat: File 1 File 1 | ?? 15 Virt # FS 1 FS 2 FS 3 15 10 36 65 FS X FS Z File 1 | 10 File 1 | 36 File 1 |65 FS Y Inode Mapping Table Inode Mapping Table not persistently stored Dir: test Dir: test Dir: test Inode Numbers Tolerating File-System Mistakes with EnvyFS
2. Implementing N versions of FS • Painful process • High cost of development, long time delays • Lucky! Hard work already done for us • 30 different disk based file systems in Linux 2.6 • Which file systems to use? • ext3, JFS, ReiserFS in a three-version FS • Others should work without modifications Tolerating File-System Mistakes with EnvyFS
3. Creating N-Version Layer Application • N-Version layer (EnvyFS) • Inserted beneath VFS • Simple design to avoid bugs • Example: Reading a file • Allocate N data buffers • Read data block from the disk • Compare: data, return code, file position • Return: data, return code • Issues: • Allocate memory for each read operation • Extra copy from allocated buffer to application • Comparison overheads Read (file, 1 block) err , VFS layer Read (file, 1 block) err , EnvyFS Layer Inode Mapping Table Wrappers Comparators F err = Read (…) err = Read (…) err = Read (…) F F … pos: x JFS ext3 D D pos: x D D D D D D ReiserFS pos: x Disk Tolerating File-System Mistakes with EnvyFS
Reading a File in EnvyFS Application • Solution: • Same application buffer for all FS • TCP-like checksums for data comparison • Compare: checksums, return code, file position • Read data until majority Read (file, 1 block) err , VFS layer Read (file, 1 block) err , EnvyFS Layer Inode Mapping Table Wrappers Comparators F err = Read (…) err = Read (…) err = Read (…) F F … pos: x ext3 JFS D D pos: x D D D D D D ReiserFS pos: x Disk 435 435 436 FS 1 # FS 2 # … FS N # … Checksums Tolerating File-System Mistakes with EnvyFS
Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS
Case for Single Instance Storage (SIS) Application • Ideal: One disk per FS • Practical: One disk for all FS • Overheads • Effective storage space: 1/N • N times more I/O (Read/write) • Challenge:Maintain diversity while minimizing overheads VFS layer EnvyFS layer 1 1 1 2 2 N N … FS 1 FS N FS 2 … Part 1 Part 2 Part N … Disk Disk 1 Disk 2 Disk N Disk Req. Queue Tolerating File-System Mistakes with EnvyFS
SubSIST: Single Instance Store Application • Variant of an Single Instance Store • Selectively merges data blocks • Block addressable SIS • Exports virtual disks to FSes • Manages mapping, free space info. • Not persistently stored on disk • EnvyFS writes through N file systems • N data blocks merged to 1 data block • Content hashes not stored persistently • Meta-data blocksnot merged • Inter FS blocks and notintra FS VFS layer EnvyFS layer … FS 1 FS N FS 2 Vdisk 1 Vdisk 2 Vdisk N SubSIST CHash Layer Read Cache D D D D D D D Free Space Management M M M Disk Tolerating File-System Mistakes with EnvyFS
Handling Data Block Corruptions? Application • Corruption to data in a single FS • Due to bugs, bit flips, storage stack • Corrupt data blocks not merged • All other N-1 data blocks merged • Corrupt data block fixed at next read • Corruption to data block inside disk • Single copy of data • Different code paths • Different on-disk structures VFS layer EnvyFS layer … FS 1 FS N FS 2 D D D Vdisk 1 Vdisk 2 Vdisk N SubSIST CHash Layer Read Cache D D D D D D D D D Free Space Management Disk Tolerating File-System Mistakes with EnvyFS
Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Reliability • Performance • Conclusion Tolerating File-System Mistakes with EnvyFS
Reliability Evaluation: Fault Injection VFS Robustness of EnvyFS in recovering from a child file system’s mistake? • Corruption: bugs in FS / storage stack • Types of disk blocks • superblock, inode, block bitmap, file data, … • Perform different file ops • mount, stat, creat, unlink, read, … • Report user visible results • All results are applicable with SubSIST except corruption to data blocks EnvyFS layer ext 3 ReiserFS JFS Pseudo Device Driver Type-aware fault injection [Prabhakaran05] B Block Driver B B B B Disk B B B Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal Data loss Cannot mount e Ops fail Data corrupt Crash Read-only Depends N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Result Matrix Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Data loss Cannot mount N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Ext3 stores many superblock copies; but, does not handle superblock corruption Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Data loss Cannot mount Ops fail Crash N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC • In addition to operations failing, inode corruption leads to data loss • Unlink: system crash during unmount Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal Data loss Cannot mount e Ops fail Data corrupt Crash Read-only Depends N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal N/A EnvyFS3 EnvyFS Kernel panic in ext3 R E J INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC EnvyFS3 works in every scenario Tolerating File-System Mistakes with EnvyFS
Potential for Bug Isolation ext3 EnvyFS3 Unlink on corrupt inode: - ext3_lookup (bug) - ext3_unlink • Unlink on corrupt inode: • - ext3_lookup (bug) • ext3 inode does not match others • Further ops not issued Time Time Unmount (panic) In EnvyFS3, a problem is noticed the first time child file system returns wrong results In typical use, a problem is noticed only on panic Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount Normal Data loss Cannot mount a Ops fail Data corrupt Crash Read-only Depends N/A JFS J INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Tolerating File-System Mistakes with EnvyFS
path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount Normal Crash N/A EnvyFS3 EnvyFS R E J INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Kernel panic in EnvyFS3 Tolerating File-System Mistakes with EnvyFS
OpenSSH Benchmark Performance Evaluation 3 % overhead • Experimental setup • AMD Opteron 2.2 GHz Processor • 2GB RAM • 80 GB Hitachi Deskstar 7200-rpm SATA disk • Linux 2.6.12 • 4GB disk partition for each file system • CPU Intensive • OpenSSH 4.5 • -- Copy, untar and make Elapsed Time (in Seconds) Performance of EnvyFS3 is comparable to a single file system File Systems Tolerating File-System Mistakes with EnvyFS
Postmark Benchmark EnvyFS3: 8x + SubSIST: 4x • I/O Intensive • Mimics busy mail server workload • Transaction: creates, deletes, reads, appends, … • Postmark Configuration • 2500 files • File size: 4Kb – 40Kb • No. of transactions: 10K and 100K 851 EnvyFS3: 1.7x + SubSIST: 11.5% EnvyFS3: 3.3x + SubSIST: -32% Elapsed Time (in Seconds) 430 406 271 243 128 129.0 107 78 39.0 34 26.4 29 14.7 9.6 Tolerating File-System Mistakes with EnvyFS
Summary of Results • Robustness • Traditional file systems vulnerable to corruptions • EnvyFS3 tolerates almost all mistakes in one FS • Performance • Desktop workloads: EnvyFS3 has comparable performance • I/O intensive workloads: • Regular Operations: EnvyFS3 + SubSIST acceptable performance • Memory pressure:EnvyFS3 + SubSIST has large overhead Tolerating File-System Mistakes with EnvyFS
Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS
Conclusion • Bugs/mistakes are inevitable in any software • Must cope, not just hope to avoid • EnvyFS: N-version approach to tolerating FS bugs • Built using existing specification and file systems • SubSIST: single instance store • Decreases overheads while retaining reliability Tolerating File-System Mistakes with EnvyFS
Thank You! Advanced Systems Lab (ADSL) University of Wisconsin-Madison http://www.cs.wisc.edu/adsl Tolerating File-System Mistakes with EnvyFS