EXPLODE: A Lightweight, General Approach to Finding Serious Errors in Storage Systems

Presented by Pramodh Pochu • Paper authors: Junfeng Yang, Can Sar, and Dawson Engler

Agenda • Motivation • Introduction • Key Contributions • Checking a Storage System • Implementation • Exploring choices • Generating crash disks • Checking different storage systems • Conclusion • Evaluation

Motivation • Storage systems are important pieces of systems code • Storage systems are difficult to test • Verifying that a system recovers correctly from a crash is difficult • Previous approaches to finding data integrity bugs (traditional model checking and implementation-level model checking) are heavyweight

Introduction • Traditional model checking takes a model as its specification and checks it by starting from an initial state and repeatedly performing all possible actions on this state and its successors • Traditional model checking requires rewriting the system in an artificial modeling language (e.g., Promela)

Introduction • Implementation-level model checking uses the code as its own model • This eliminates the need to write a separate model • But it requires porting the entire OS to run as a user process on top of the model checker • Verisoft and CMC are examples of this type

Key Contributions • Explode checkers are effective • Explode checkers can do thorough checks • A series of new file system specific checks for catching bugs in the data sync facilities used by applications • Simple checkers find bugs

Principles • Explore all choices: when a program point can legally do one of N different actions, fork execution N times and do each • Exhaust states: do every possible action in a state before proceeding to the next state • Touch nothing: modifying a large checked system may introduce corner-case mistakes, so the checked system is left unmodified • Report only true errors, and report them deterministically

Checking a Storage System • Clients use Explode for two things • To systematically exhaust all possible choices • To check that the storage system correctly recovers from a crash • Clients provide three system-specific components • Checker • Storage component • Checking stack

Choose • Explode provides a “choose” function. At a program point that has N possible actions, clients insert a call “choose(N)”, which appears to fork execution N times, returning 0, 1, ..., N-1 in the respective child executions.

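Choose • In low-memory situations, kmalloc can return NULL when called without the __GFP_NOFAIL flag, so the checked kernel lets Explode explore both outcomes:
    void *kmalloc(size_t size, int flags) {
        /* the allocation may fail only if the caller did not pass __GFP_NOFAIL */
        if ((flags & __GFP_NOFAIL) == 0)
            if (choose(2) == 0)    /* one of the two forked executions fails */
                return NULL;
        ……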
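The forking behaviour of choose can be imitated in user space by re-running the code once per choice sequence. The following is only a sketch under that assumption: `Explorer`, `toy_kmalloc`, and `two_allocations` are hypothetical names invented for illustration, and Explode itself works inside the kernel rather than like this.

```python
class Explorer:
    """Toy stand-in for choose(): explores every outcome by re-running the code."""

    def __init__(self):
        self.prefix = []   # choices to replay on the current run
        self.trace = []    # (choice, arity) pairs actually taken on the current run

    def choose(self, n):
        i = len(self.trace)
        # Replay a recorded choice if there is one; a new choice point starts at 0.
        c = self.prefix[i] if i < len(self.prefix) else 0
        self.trace.append((c, n))
        return c

    def explore(self, run):
        """Call run(self) once per distinct choice sequence and collect the results."""
        results = []
        self.prefix = []
        while True:
            self.trace = []
            results.append(run(self))
            # Advance like an odometer: drop exhausted trailing choices,
            # then bump the last choice that still has untried values.
            trace = self.trace
            while trace and trace[-1][0] == trace[-1][1] - 1:
                trace.pop()
            if not trace:
                return results
            self.prefix = [c for c, _ in trace[:-1]] + [trace[-1][0] + 1]


def toy_kmalloc(ex, nofail):
    """Fail on one branch of choose(2) unless the caller forbids failure."""
    if not nofail and ex.choose(2) == 0:
        return None
    return bytearray(8)


def two_allocations(ex):
    a = toy_kmalloc(ex, nofail=False)
    if a is None:
        return "first failed"
    b = toy_kmalloc(ex, nofail=False)
    return "second failed" if b is None else "both ok"


outcomes = Explorer().explore(two_allocations)
```

With two fallible allocations there are three distinct executions, because the second choice point is only reached when the first allocation succeeds.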

Writing Checkers • The client provides a checker that Explode uses to drive and check a storage system • A checker implements five methods • mutate: performs system-specific operations and calls into Explode to explore choices and to do crash checking • check: checks for storage errors after a crash • get_sig: returns a byte-array signature representing the current state of the checked system

Writing Checkers • init and finish: set up and clear the checker's internal state • Example: a file system checker that checks whether a file synchronously written to disk persists after a crash
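A rough sketch of how the five checker methods might fit together for the synchronous-write example. Everything here is an assumption made for illustration: `ToyDisk` is a hypothetical in-memory model in which unsynced writes are lost on a crash, and `SyncWriteChecker` is not the interface of the real tool.

```python
import hashlib


class ToyDisk:
    """Hypothetical storage model: writes sit in a cache until synced."""

    def __init__(self):
        self.stable = {}   # survives a crash
        self.cache = {}    # current view; unsynced entries are lost on a crash

    def write(self, name, data, sync=False):
        self.cache[name] = data
        if sync:
            self.stable[name] = data

    def crash(self):
        # After a crash only the stable contents remain visible.
        self.cache = dict(self.stable)

    def read(self, name):
        return self.cache.get(name)


class SyncWriteChecker:
    def init(self):
        self.disk = ToyDisk()
        self.expected = {}   # files that were written synchronously

    def mutate(self):
        self.disk.write("/a", b"hello", sync=True)
        self.expected["/a"] = b"hello"
        self.disk.write("/b", b"scratch")   # not synced: no durability promise

    def check(self):
        """Return an error string, or None if every synced file survived."""
        for name, data in self.expected.items():
            if self.disk.read(name) != data:
                return f"synced file {name} lost after crash"
        return None

    def get_sig(self):
        # Hash the visible state so that already-seen states can be recognised.
        h = hashlib.sha256()
        for name in sorted(self.disk.cache):
            h.update(name.encode() + b"\0" + self.disk.cache[name] + b"\0")
        return h.digest()

    def finish(self):
        self.disk = None
        self.expected = {}
```

A driver would call init, then mutate, simulate a crash, and finally call check; the signature changes whenever the visible state changes.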

Storage Components • For each storage subsystem involved in checking, clients provide a storage component that implements five methods • Init: for initialization • Mount: setup storage system • Unmount: tear down storage system • Recover: repair the storage system • Threads: return thread ids for the storage system’s kernel threads

Checking stack • A checking stack is built by gluing storage components together and then attaching a single checker on top of them • Currently a stack can have only one checker • Storage components can fan out, such as setting up multiple RAM disks to make a RAID array
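The stacking and fan-out described above can be pictured as a small tree of components that is mounted bottom-up and torn down top-down. This is a sketch under assumptions: the `StorageComponent` class, the `log` parameter, and the component names are all hypothetical, and the mount order is an illustrative choice rather than something the slides specify.

```python
class StorageComponent:
    """Hypothetical base class mirroring the five methods named on the slide."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)   # components this one is built on

    def init(self, log):
        log.append(f"init {self.name}")

    def mount(self, log):
        log.append(f"mount {self.name}")

    def unmount(self, log):
        log.append(f"unmount {self.name}")

    def recover(self, log):
        log.append(f"recover {self.name}")

    def threads(self):
        return []   # a real component would return its kernel thread IDs


def mount_stack(top, log):
    """Mount the lower layers first, then the component that sits on them."""
    for child in top.children:
        mount_stack(child, log)
    top.mount(log)


def unmount_stack(top, log):
    """Tear down in the opposite order: the top layer first."""
    top.unmount(log)
    for child in reversed(top.children):
        unmount_stack(child, log)


# A file system on a RAID array that fans out to two RAM disks.
ram0 = StorageComponent("ramdisk0")
ram1 = StorageComponent("ramdisk1")
raid = StorageComponent("raid1", [ram0, ram1])
fs = StorageComponent("ext3", [raid])
```

A single checker would then be attached to `fs`, the top of the stack.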

Implementation • Create a clean initial state and invoke the client's mutate method on it • At every choose(N) call, fork N children • On client request, generate all crash disks and run the client's check method on them • When mutate returns, re-invoke it


State Checkpoint and Restore • Explode checkpoints a state S by recording the sequence of choices the checked code took to reach S • It restores S by starting from a clean state and replaying these choices • A checkpoint stores the sequence of n choices that produced S in an array

State Checkpoint and Restore • Unmounting clears in-memory state, removes buffer cache entries, and frees kernel data structures • To restore a state, the current disk is unmounted; then a copy of the pristine initial disk is mounted and all previously made choices are replayed
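Checkpointing by choice sequence can be sketched in a few lines: the checkpoint is just the list of choices, and restoring means replaying them on a fresh copy of the pristine disk. The operations, the dictionary used as a "disk", and the `restore` function are hypothetical stand-ins, assuming the checked code is deterministic given its choices.

```python
def create_file(disk, step):
    disk[f"file{step}"] = b""


def append_log(disk, step):
    disk["log"] = disk.get("log", b"") + bytes([step])


# At each step the checked code picks one of these actions via a choice.
OPS = [create_file, append_log]


def restore(pristine_disk, choices, operations):
    """Rebuild a state from a clean disk by replaying the recorded choices."""
    disk = dict(pristine_disk)            # mount a copy of the pristine disk
    for step, choice in enumerate(choices):
        operations[choice](disk, step)    # same choice, same effect
    return disk


checkpoint = (0, 1, 1)                    # the whole checkpoint: three small integers
state = restore({}, checkpoint, OPS)
```

The appeal of this design is that a checkpoint costs a handful of integers rather than a full copy of kernel and disk state; the price is the time spent replaying.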

Re-executing the code deterministically • Making the same choices: Explode discards any choose calls made from an interrupt or from a process whose ID is not associated with the checked system • Controlling threads: Explode uses priorities to control when storage system threads run • Requirements on the checked system: it must issue the same choose calls across replay runs. The checked systems are partly isolated during checking, and nothing besides the checker and their kernel threads modifies the RAM disks.

Generating crash disks • As the storage system executes, EKM (the Explode kernel module) logs operations that affect which blocks could be written to disk • Explode extracts this log using an EKM ioctl • It then applies the logged add/remove operations to the initial write set • Whenever the write set shrinks, it generates all possible crash disks
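A simplified model of the crash-disk step: a crash may leave any subset of the pending writes on disk, so the candidate images are the on-disk contents overlaid with each subset of the write set. This sketch assumes writes are independent of one another and that a "remove" entry means the block reached the disk; the log format and function names are hypothetical.

```python
from itertools import combinations


def crash_disks(disk, write_set):
    """Yield every image reachable if any subset of pending writes hits the disk."""
    blocks = sorted(write_set)
    for k in range(len(blocks) + 1):
        for subset in combinations(blocks, k):
            image = dict(disk)
            for block in subset:
                image[block] = write_set[block]
            yield image


def replay_log(disk, log):
    """Apply add/remove entries; enumerate crash disks whenever the write set shrinks."""
    disk = dict(disk)
    write_set = {}
    images = []
    for op, block, data in log:
        if op == "add":                        # block became dirty
            write_set[block] = data
        elif op == "remove":                   # block is leaving the write set
            images.extend(crash_disks(disk, write_set))
            disk[block] = write_set.pop(block)
    return disk, images


log = [("add", 0, "new0"), ("add", 1, "new1"), ("remove", 0, None)]
final_disk, images = replay_log({0: "old0", 1: "old1"}, log)
```

Each generated image would then be handed to the recovery code and to the client's check method. The number of images grows as 2^n in the size of the write set, which is why the write set needs to stay small for this to be practical.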


Checking different storage systems • Explode checks several types of storage systems, including file systems, version control systems, and databases • Bugs were found in every storage system that was checked using Explode

Checking File Systems • Explode checks ten different Linux file systems • A common checker core is used as the starting point • Three checkers, each focused on a different special case, are built on top of this common core

Generic checker core • Starts from an empty file system and systematically generates file system topologies • mutate applies eight system calls to each node (file, link, directory) in the current topology before exploring the next • For each operation invoked, mutate duplicates its effect on a fake, abstract file system

Check: failed system calls have no user-visible effect • The checker uses Explode to systematically fail calls to six kernel functions • If a system call succeeds, the checker updates the abstract file system; otherwise it leaves it unchanged • It then compares the abstract FS with the real FS • Two bugs were found, including one in ReiserFS ftruncate, which can fail with its job half done if memory allocation fails
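The comparison against the abstract file system can be sketched as follows. `ToyFS` is a hypothetical file system whose truncate, in buggy mode, leaves the job half done when an allocation fails; it only illustrates the class of error and makes no claim about how the real ReiserFS bug works.

```python
class ToyFS:
    """Hypothetical file system; buggy=True makes a failed truncate half-finish."""

    def __init__(self, buggy):
        self.files = {}
        self.buggy = buggy

    def truncate(self, name, size, alloc_ok):
        if not alloc_ok:
            if self.buggy:
                # The call fails, but part of the file has already been cut off.
                half = max(size, len(self.files[name]) // 2)
                self.files[name] = self.files[name][:half]
            return False
        self.files[name] = self.files[name][:size]
        return True


def check_failed_calls(buggy):
    """Run truncate under both allocation outcomes and compare with the abstract FS."""
    errors = []
    for alloc_ok in (True, False):           # the two outcomes of choose(2)
        real = ToyFS(buggy)
        real.files["/f"] = b"0123456789"
        abstract = {"/f": b"0123456789"}
        if real.truncate("/f", 2, alloc_ok):
            abstract["/f"] = abstract["/f"][:2]   # mirror only successful calls
        if real.files != abstract:
            errors.append(f"alloc_ok={alloc_ok}: real FS diverged from abstract FS")
    return errors
```

Because the abstract file system is updated only when a call reports success, any visible change left behind by a failed call shows up as a mismatch.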

Check: sync operations work • Applications use OS-provided methods to force data to disk so that a crash cannot destroy it • sync • fsync • Synchronous mount • O_SYNC


Check: a recovered FS is reasonable • After a crash, a file system should recover to a reasonable state • As mutate issues operations, it builds two sets • Stable set: operations known to be on disk • Volatile set: operations that may or may not be on disk

Check: a recovered FS is reasonable • check verifies that the recovered file system can be constructed from all of the stable operations legally combined with some of the operations in the volatile set • Two bugs (in JFS and Reiser4) were found • Crashed disks could not be recovered using fsck • In Reiser4, mounting caused a kernel panic
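One way to read the stable/volatile rule is as a search: the recovered file system is acceptable if some subset of the volatile operations, applied after all stable ones, reproduces it. The sketch below assumes that "legally combined" means an order-preserving subset, which is a simplification, and the operation tuples are a hypothetical encoding.

```python
from itertools import combinations


def apply_ops(ops):
    """Build a file system image (name -> contents) from a list of operations."""
    fs = {}
    for kind, name, data in ops:
        if kind == "write":
            fs[name] = data
        elif kind == "unlink":
            fs.pop(name, None)
    return fs


def is_reasonable(recovered, stable, volatile):
    """True if recovered == all stable ops plus some in-order subset of volatile ops."""
    for k in range(len(volatile) + 1):
        for idx in combinations(range(len(volatile)), k):
            candidate = stable + [volatile[i] for i in idx]
            if apply_ops(candidate) == recovered:
                return True
    return False


stable = [("write", "/a", b"1")]
volatile = [("write", "/b", b"2"), ("unlink", "/a", None)]
```

For example, a recovered image containing only `/a` is acceptable (no volatile operation reached the disk), while one in which `/a` has contents that no operation ever wrote is not.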

Checking Version Control • The checker's mutate method checks out a repository, makes a local modification, commits the change, and simulates a crash on the block device • It then calls check_crashes_now() • All three systems (CVS, Subversion, and ExpENSiv) made the same mistake • To update file A, they write a temporary file B and then atomically rename B to A. However, they forget to force B's contents to disk before the rename, so a crash can destroy the data.
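For contrast, here is a sketch of the update sequence with the missing step included: write the temporary file, fsync it, and only then rename it over the target. The function name `atomic_update` is hypothetical; the directory fsync at the end is a common POSIX practice added here as an assumption and is not mentioned on the slide.

```python
import os
import tempfile


def atomic_update(path, data):
    """Replace the file at `path` with `data` so a crash leaves old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)    # temporary file B next to A
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                 # force B's contents to disk first
        os.replace(tmp, path)                    # atomic rename of B onto A
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # POSIX only: also persist the directory entry created by the rename.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync on B, the rename can reach the disk before B's data blocks do, and a crash in between leaves A pointing at an empty or partial file.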

Checking Berkeley DB • The database checker checks that after a crash no committed transaction records are corrupted or missing • The mutate method is a simple loop that starts a transaction, adds several records to it, and then commits the transaction, recording which transactions were committed • On ext2, a database created inside a transaction may be corrupted if the system crashes

Checking Berkeley DB • Furthermore, even committed transactions may disappear • On ext3, a crash while adding a record can lead to an unrecoverable state • On all three file systems (ext2, ext3, and JFS), a record that was added but not committed can appear after a crash
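The database invariant lends itself to a very small check function. This is a sketch that models the recovered database as a plain dictionary; `check_db` and its arguments are hypothetical and do not use the Berkeley DB API.

```python
def check_db(recovered, committed, uncommitted):
    """Compare a recovered database against what the checker knows it committed."""
    errors = []
    for key, value in committed.items():
        if key not in recovered:
            errors.append(f"committed record {key!r} disappeared")
        elif recovered[key] != value:
            errors.append(f"committed record {key!r} corrupted")
    for key in uncommitted:
        if key in recovered and key not in committed:
            errors.append(f"uncommitted record {key!r} appeared")
    return errors


committed = {"k1": b"v1", "k2": b"v2"}
uncommitted = {"k3": b"v3"}
```

The three error messages correspond to the three failure modes listed above: lost commits, corrupted commits, and uncommitted records surviving the crash.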

Checking RAID • Two checks are done • A file system's behavior (crash and non-crash) on top of RAID should be the same as without it • Losing any single sector in a RAID should not cause data loss • Explode found that Linux RAID does not reconstruct an unreadable sector • If two sector read errors occur, all maintenance operations fail, and disk writes fail as well

Checking NFS • Setup: export a local FS as an NFS partition over the loopback interface • Tear down: unmount it • Recovery: run fsck for the local file system • Writing to a file and then reading the same file through a hard link in a different directory yields inconsistent data • A Linux NFS security feature called subtree checking caused this error • There are also additional bugs specific to individual file systems

Conclusion • A lightweight approach can be used to find crash recovery errors • Explode runs on a slightly modified Linux kernel on raw hardware • Explode was applied to a variety of storage systems and found serious bugs

Evaluation • Explode is more general, more lightweight, and easier to apply than FiSC • Efforts are being made to make Explode open source • Explode has been improved since the workshop paper on it presented in 2004

Questions ?

Thank you

Related Work • File system testing tools • Software Model Checking • Generic bug finding

File System Testing Tools • Non-deterministic • Less comprehensive • But lightweight, and they work “out of the box” • Complementary to Explode

Software Model Checking • Model checkers are used to find errors in the design and implementation of software systems • Verisoft does not store states at checkpoints; it relies heavily on partial order reduction techniques • Java PathFinder relies on a specialized virtual machine, which extracts the current state of a Java program

Generic Bug Finding • Static analysis is better at finding errors in surface properties visible in the source code • Model checking is more strenuous, as it requires running the code and checks only the paths executed • Because it executes code, model checking can check properties implied by the code • Static analysis is complementary to Explode
