Formally Verifying a File System: A Successful Failure

Formally Verifying a File System: A Successful Failure • CSCI-P515/P415 • Spring 2008 • Michael Adams (adamsmd@cs.indiana.edu) • Joseph Near (jnear@cs.indiana.edu) • Aaron Kahn (aakahn@cs.indiana.edu)

Overview • Motivation • High Level Design • Approach • Minor Difficulties (and their solutions) • Major (Fatal) Difficulty (and explanation) • The Proposed Solution • Recap/Summation

Motivation • Our goal for this project was to formally verify a file system • We were under the impression that this would be a straightforward task and that, as long as the abstraction was simple, there wouldn't be any major problems

Limitations • Are doing: • Can take a file number, and read/write to it • Create/Delete files • Not doing: • Directories • File Names • Permissions, Users, Groups, etc • The stuff we're not doing can be added as an abstraction on what we are doing

Design • Develop a B-tree Structure • The B-tree is actually serialized onto a disk • Disk represented as an array of bytes • Create the B-tree algorithms • insert, delete, lookup • Write the File System (read file, write file, create file, etc) algorithms in terms of the B-tree algorithms.

Process • Initially, we wrote the code in Scheme in order to have a fully working model of “live” code to test on, and then translated it into PVS • In PVS, the file system was modelled all the way down to a disk representation to allow for better simulation of the real problems of writing file systems • This turned out to be essential to our learning the difficulties of actually verifying a file system

Additional Structures • In addition to the B-tree, we found that these auxiliary structures were needed • A free list • Blocks that represent files themselves, but are not part of the B-tree • Single block that holds all of the pointers to the root of the free list and the root of the B-tree (similar to a meta-data block)

Accomplishments • B-tree in Scheme • Thoroughly tested • Were able to successfully translate our code into PVS. • Made a number of discoveries in terms of tricks for proving the algorithms in PVS • However, very late in the game, we discovered a fatal limitation of how we modelled things in PVS • Have ideas for overcoming the problems in the future

Minor Problems • In a large project, there are many minor problems that are surprisingly difficult to solve • These often require the development of a simple but non-obvious trick • We ran into and solved many of these; here is a sample of what we learned • More detail included in report

Search • search(array, start, stop, val) • Search through a sorted array for the first value greater than or equal to the argument; return the position of that value • If no element is greater than or equal to the argument, return the length of the array • Unexpectedly difficult to prove • Measured induction on stop – start • Ended up using max(0, stop – start) • Lesson: make sure the measure is well-founded; if it isn't, sometimes you can simply make it well-founded
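A minimal Python sketch of the search on this slide, written recursively so that the measure used in the induction is visible. This is an illustration, not the authors' Scheme or PVS code; the helper `measure` is a hypothetical name introduced here.

```python
def measure(start, stop):
    # The measure from the slide: max(0, stop - start).
    # Clamping at 0 keeps it a natural number even when stop < start.
    return max(0, stop - start)


def search(array, start, stop, val):
    """Return the index of the first element of array[start:stop] that is
    greater than or equal to val; return stop if there is none."""
    if start >= stop:
        # Empty range: nothing left to inspect.
        return stop
    if array[start] >= val:
        return start
    # measure(start + 1, stop) < measure(start, stop), so the recursion ends.
    return search(array, start + 1, stop, val)
```

Calling `search(xs, 0, len(xs), v)` returns `len(xs)` when no element qualifies, matching the slide's description.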

Well-formedness • Designed as part of our testing; believed to be an important part of the proof • Theory: algorithms are correct if they have the desired effect and the disk remains well-formed • Assuming a well-formed disk should give us a basis for proving correctness of our operations • Proved that a newly-formatted disk is well-formed • Partially proved that allocation preserves well-formedness

Well-formedness • Realization: well-formedness is irrelevant! • Well-formedness is defined by the observer (in this case, lookup) • lookup(key, insert(key, value, disk)) = value • If the observer can correctly interpret the data given to it, then that data is well-formed • Lesson: don't waste time proving things about well-formedness
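The observer law on this slide can be illustrated with a toy model. In this sketch the "disk" is just a tuple of key/value pairs, which is an assumption made for brevity and not the byte-level disk used in the project.

```python
def insert(key, value, disk):
    # State-passing style: return a new disk instead of mutating the old one.
    # The newest binding goes in front; older bindings for the key stay behind it.
    return ((key, value),) + disk


def lookup(key, disk):
    # The observer: the first binding found for the key wins.
    for k, v in disk:
        if k == key:
            return v
    return None


def observer_law_holds(key, value, disk):
    # lookup(key, insert(key, value, disk)) = value
    return lookup(key, insert(key, value, disk)) == value
```

Note that this toy disk may contain stale duplicate keys, which a stricter well-formedness predicate might reject, yet the law still holds because `lookup` interprets the data consistently.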

Proving insert • Many uses of let due to state-passing style • Exponential blowup of expression size • Sequents become pages long! • Side effects make proofs difficult • When an object is affected, the sequent clauses no longer apply, even if the change doesn't affect them • User has to prove that the sequent clauses still apply

Main Problem: Side Effects • State Passing Style • Good for modelling state • Easy to implement, familiar • Bad for Proving!
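For readers unfamiliar with state-passing style, here is a small Python sketch. The state layout (a free list paired with a block map) and the function names `alloc`, `write` and `create_file` are hypothetical and chosen only for illustration.

```python
def alloc(state):
    # Take the first block number off the free list.
    free, blocks = state
    block_no, rest = free[0], free[1:]
    return block_no, (rest, blocks)


def write(block_no, data, state):
    # Return a new state in which block_no holds data.
    free, blocks = state
    return (free, {**blocks, block_no: data})


def create_file(data, state):
    # Each step names a fresh intermediate state, like a chain of lets.
    block_no, state1 = alloc(state)
    state2 = write(block_no, data, state1)
    return block_no, state2
```

Every operation takes the current state and returns the next one, so a proof about `create_file` has to carry facts about `state` across `state1` and `state2` explicitly.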

The problem with side effects • Effects Invalidate Assumptions • Given a property about a disk, we need to prove the same property about a modified disk • Example: • If P(disk) then P(write_block(block, disk)) • Even if the effect does not affect P, we have to prove that P still holds • This makes sense: it does not hold automatically!
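A sketch of the example on this slide, assuming a disk modelled as a `bytes` value split into fixed-size blocks. `BLOCK_SIZE`, the three-argument form of `write_block`, and the particular property `P` are assumptions made for illustration.

```python
BLOCK_SIZE = 4  # hypothetical block size


def read_block(index, disk):
    start = index * BLOCK_SIZE
    return disk[start:start + BLOCK_SIZE]


def write_block(index, data, disk):
    # Return a new disk with one block replaced.
    start = index * BLOCK_SIZE
    assert len(data) == BLOCK_SIZE
    assert 0 <= start and start + BLOCK_SIZE <= len(disk)
    return disk[:start] + data + disk[start + BLOCK_SIZE:]


def P(disk):
    # Example property: block 0 holds a marker.
    return read_block(0, disk) == b"ROOT"
```

Writing to block 1 leaves `P` true and writing to block 0 can break it, but nothing in the type of `write_block` says which case applies, so each instance needs its own argument.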

Obvious solution: Hoare Logic • Substitution enforces separation of variables • So P(x) => P(x) automatically as long as x isn't affected • Red herring: this only helps if we use Abstract Data Types • We serialize our ADT into a single disk object • Side-effecting one part will side-effect all parts, even if we use Hoare Logic

Naive Solution • Prove that side-effecting one part of the B-tree, free list, etc. doesn't affect assumptions about other parts of the disk • Possible, but impractical • For every algorithm • For every effect • For every clause of the sequent • Must prove that the assumption still holds after the effect • A few such basic proofs were accomplished • But even they were long and easy to get lost in

What we want from a better solution • We want to write ADT style code • We want to write ADT style proofs • We want to push a button and have • Serialized style code • Serialized style proofs • Is it Possible???

What a solution would look like • Serialization Theorems • Example: deserialize(serialize(n)) = n • Fairly easy to prove • Already done • Even grind could do it • Proof that changing one value doesn't affect other values • Hmm...
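Both kinds of theorem can be stated concretely. The sketch below assumes fixed-width big-endian integers stored in slots of a `bytes` disk; `WIDTH`, `store` and `load` are hypothetical names, and the encoding is not necessarily the one used in the project.

```python
WIDTH = 4  # hypothetical slot width in bytes


def serialize(n):
    # Valid for 0 <= n < 2 ** (8 * WIDTH); raises OverflowError otherwise.
    return n.to_bytes(WIDTH, "big")


def deserialize(data):
    return int.from_bytes(data, "big")


def store(slot, n, disk):
    # Return a new disk with slot holding the serialized n.
    start = slot * WIDTH
    assert 0 <= start and start + WIDTH <= len(disk)
    return disk[:start] + serialize(n) + disk[start + WIDTH:]


def load(slot, disk):
    start = slot * WIDTH
    return deserialize(disk[start:start + WIDTH])
```

The round-trip theorem is `deserialize(serialize(n)) == n`; the independence theorem is `load(j, store(i, n, disk)) == load(j, disk)` whenever `i != j`. Testing such properties on samples is easy, which is rather different from proving them for every structure on the disk.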

Proof of effect independence • Language Run-Time for ADT is already doing this • Objects are serialized to memory • Language Run-Time Limitations • Language vs Programmer control of serialization • The Garbage collector • Known Hard Problem • Bad Idea on a Hard Disk

How to avoid GC • We don't need general GC • Side-effect view: • Values only “modified” if only reference • Or not reachable from values used in theorems • ADT view: • Values only “allocated” if we are “freeing” another value • Solution: ...

Linear Types!!!

What are Linear Types? • Objects must always have exactly one reference • No duplication • No erasure • No GC needed • Look Ma, No Garbage! • “Modifying” something is “de-alloc” plus “alloc” • Our algorithms already treat objects as linear • Just need to teach PVS to take advantage of that
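Python has no linear types, so the following is only a run-time emulation of the "no duplication" half of the discipline, meant to show what "modifying is de-alloc plus alloc" looks like. The `Linear` wrapper and `set_byte` are hypothetical; a real linear type system would reject the double use statically and would also forbid silently dropping a value, which this sketch does not check.

```python
class Linear:
    """Holds a value that may be taken exactly once (checked at run time)."""

    def __init__(self, value):
        self._value = value
        self._consumed = False

    def take(self):
        if self._consumed:
            raise RuntimeError("linear value used twice")
        self._consumed = True
        value, self._value = self._value, None
        return value


def set_byte(index, value, disk):
    # "Modify" = consume the old disk ("de-alloc") and wrap a new one ("alloc").
    old = disk.take()
    return Linear(old[:index] + bytes([value]) + old[index + 1:])
```

After `d1 = set_byte(1, 255, d0)`, any further use of `d0` fails, so no theorem can still be talking about the old disk.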

Linear Types vs. Monads • Lost the battle of representing state to monads • Maybe could win the war for formal proofs • Pros and Cons • Monads are more General • Non-determinism, environments, etc. • Linear Types provide more guarantees • A reference to a linearly typed object is guaranteed to be the only reference

Recap • File Systems are Full of Bugs • But it is critical that they be right • Verification could fix this • We designed and implemented a File System • B-Tree based • Modelled all the way to “disk” • Auxiliary structures needed • Free List • File Blocks • Root File System Block

Recap • We proved linear search • Lesson: Make sure measures are well-founded • Lesson: Make measures well-founded if they aren't • Well-formedness • Red herring • Actually defined by observers • Exponential blow-up due to let • Possible improvement in how PVS presents sequents

Recap • Side effects are hard in an unexpected way • Implementing side effects in PVS is easy • Use State Passing Style (e.g. State monad) • Proving side effects in a serialized common store is hard • Must prove that every effect keeps the theorems true • Number of Proofs exploded beyond our ability

Recap • Linear Types to the Rescue!! • User writes ADT style proofs • System converts them to serialized proofs • Better than Monads • Need Theory for Linear Types in PVS

Final Results • Ultimately had to declare failure • Code is fragmentary • But learned more from failure than we would have from success • Main deliverable is the report and a list of what not to do • We have good ideas for how to approach future attempts • ... and we don't feel too bad, because others have estimated that verifying a file system would take 2-3 years • A mini-challenge: build a verifiable filesystem. Rajeev Joshi, Gerard J. Holzmann
