
Formally Verifying a File System: A Successful Failure

Formally Verifying a File System: A Successful Failure. CSCI-P515/P415 Spring 2008 Michael Adams ( adamsmd@cs.indiana.edu ) Joseph Near ( jnear@cs.indiana.edu ) Aaron Kahn ( aakahn@cs.indiana.edu ). Overview. Motivation High Level Design Approach Minor Difficulties (and their solutions)


Presentation Transcript


  1. Formally Verifying a File System: A Successful Failure • CSCI-P515/P415 • Spring 2008 • Michael Adams (adamsmd@cs.indiana.edu) • Joseph Near (jnear@cs.indiana.edu) • Aaron Kahn (aakahn@cs.indiana.edu)

  2. Overview • Motivation • High Level Design • Approach • Minor Difficulties (and their solutions) • Major (Fatal) Difficulty (and explanation) • The Proposed Solution • Recap/Summation

  3. Motivation • Our goal for this project was to attempt to formally verify a file system • We were under the impression that this would be a straightforward task, and that as long as the abstraction was simple, there wouldn't be any major problems

  4. Limitations • Are doing: • Can take a file number, and read/write to it • Create/Delete files • Not doing: • Directories • File Names • Permissions, Users, Groups, etc. • The features we're not doing can be added as an abstraction layered on top of what we are doing

  5. Design • Develop a B-tree Structure • The B-tree is actually serialized onto a disk • Disk represented as an array of bytes • Create the B-tree algorithms • insert, delete, lookup • Write the File System (read file, write file, create file, etc) algorithms in terms of the B-tree algorithms.

  6. Process • Initially, we wrote the code in Scheme in order to have a fully working model of “live” code to test on, and then translated it into PVS • In PVS, the file system was modelled all the way down to a disk representation to allow for better simulation of the real problems of writing file systems • This turned out to be essential to our learning the difficulties of actually verifying a file system

  7. Additional Structures • In addition to the B-tree, we found that these auxiliary structures were needed • A free list • Blocks that represent files themselves, but are not part of the B-tree • Single block that holds all of the pointers to the root of the free list and the root of the B-tree (similar to a meta-data block)
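A minimal sketch of how a free list and a root block might sit on a byte-array disk, written in state-passing style (every operation returns a new disk). The layout is an assumption made for illustration, not the authors' format: `BLOCK_SIZE`, `NIL`, block 0 as the root block, and the names `format_disk`, `allocate`, and `release` are all hypothetical.

```python
import struct

BLOCK_SIZE = 8   # hypothetical block size in bytes
NIL = 0          # block 0 is the root block, so 0 can double as "no block"


def _get_u32(disk, offset):
    """Read a big-endian 32-bit pointer stored at a byte offset."""
    return struct.unpack_from(">I", disk, offset)[0]


def _put_u32(disk, offset, value):
    """Return a new disk with a 32-bit pointer stored at a byte offset."""
    return disk[:offset] + struct.pack(">I", value) + disk[offset + 4:]


def format_disk(num_blocks):
    """Build a fresh disk: block 0 holds the free-list head (bytes 0-3)
    and a B-tree root pointer (bytes 4-7, left as NIL here); every other
    block is chained into the free list through its first four bytes."""
    if num_blocks < 1:
        raise ValueError("need at least the root block")
    disk = bytes(num_blocks * BLOCK_SIZE)
    disk = _put_u32(disk, 0, 1 if num_blocks > 1 else NIL)
    for block_no in range(1, num_blocks):
        nxt = block_no + 1 if block_no + 1 < num_blocks else NIL
        disk = _put_u32(disk, block_no * BLOCK_SIZE, nxt)
    return disk


def allocate(disk):
    """Pop the head of the free list; return (block number, new disk)."""
    head = _get_u32(disk, 0)
    if head == NIL:
        raise RuntimeError("no free blocks")
    nxt = _get_u32(disk, head * BLOCK_SIZE)
    return head, _put_u32(disk, 0, nxt)


def release(block_no, disk):
    """Push a block back onto the free list; return the new disk."""
    head = _get_u32(disk, 0)
    disk = _put_u32(disk, block_no * BLOCK_SIZE, head)
    return _put_u32(disk, 0, block_no)
```

Because every pointer lives in the same byte string, a write to the root block and a write to a free block are both just edits to one shared `disk` value.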

  8. Accomplishments • B-tree in Scheme • Thoroughly tested • Were able to successfully translate our code into PVS. • Made a number of discoveries in terms of tricks for proving the algorithms in PVS • However, very late in the game, we discovered a fatal limitation of how we modelled things in PVS • Have ideas for overcoming the problems in the future

  9. Minor Problems • In a large project, there are many minor problems that are surprisingly difficult to solve • These often require the development of a simple but non-obvious trick • We ran into and solved many of these; here is a sample of what we learned • More detail included in report

  10. Search • search(array, start, stop, val) • Search through a sorted array for the first value greater than or equal to the argument; return the position of that value • If no element is greater than or equal to the argument, return the length of the array • Unexpectedly difficult to prove • Measured induction on stop – start • Ended up using max(0, stop – start) • Lesson: make sure measure is well-founded; sometimes making it well-founded works
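A minimal Python sketch of the search described on this slide, assuming a recursive linear scan (the authors' Scheme and PVS code is not shown in the slides, so the body here is a guess at the shape). The helper `measure` is a hypothetical name for the termination measure max(0, stop – start).

```python
def measure(start, stop):
    """Termination measure: never negative, even when start > stop."""
    return max(0, stop - start)


def search(array, start, stop, val):
    """Return the first index i in [start, stop) with array[i] >= val,
    or stop if there is no such index. Assumes array is sorted."""
    if start >= stop:
        return stop
    if array[start] >= val:
        return start
    # The measure strictly decreases on the recursive call, which is the
    # obligation a prover would ask for to accept the recursion.
    assert measure(start + 1, stop) < measure(start, stop)
    return search(array, start + 1, stop, val)
```

Called as `search(xs, 0, len(xs), v)`, the "not found" result is the length of the array. The plain difference stop – start can be negative when start > stop, so it is not a natural-number measure on its own; clamping with max(0, ...) makes it one.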

  11. Well-formedness • Designed as part of our testing; believed to be an important part of the proof • Theory: algorithms are correct if they have the desired effect and the disk remains well-formed • Assuming a well-formed disk should give us a basis for proving correctness of our operations • Proved that a newly-formatted disk is well-formed • Partially proved that allocation preserves well-formedness

  12. Well-formedness • Realization: well-formedness is irrelevant! • Well-formedness is defined by the observer (in this case, lookup) • lookup(key, insert(key, value, disk)) = value • If the observer can correctly interpret the data given to it, then that data is well-formed • Lesson: don't waste time proving things about well-formedness
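The observer property on this slide can be spot-checked on a toy model. The sketch below assumes a deliberately simple on-disk layout (fixed three-byte records rather than the B-tree from the talk), so `RECORD`, `empty_disk`, and the record format are hypothetical; only the shape `lookup(key, insert(key, value, disk)) = value` comes from the slide.

```python
RECORD = 3  # hypothetical record: used flag, key, value (one byte each)


def empty_disk(slots):
    """A disk of `slots` unused records, all bytes zero."""
    return bytes(slots * RECORD)


def lookup(key, disk):
    """Observer: return the value stored for key, or None if absent."""
    for off in range(0, len(disk), RECORD):
        used, k, v = disk[off:off + RECORD]
        if used and k == key:
            return v
    return None


def insert(key, value, disk):
    """Return a new disk in which key maps to value.
    Overwrites an existing record for key, else takes the first free slot."""
    target = None
    for off in range(0, len(disk), RECORD):
        used, k, _ = disk[off:off + RECORD]
        if used and k == key:
            target = off
            break
        if not used and target is None:
            target = off
    if target is None:
        raise ValueError("disk full")
    return disk[:target] + bytes([1, key, value]) + disk[target + RECORD:]


def observer_property_holds(key, value, disk):
    """The property from the slide, checked for one concrete instance."""
    return lookup(key, insert(key, value, disk)) == value
```

Nothing in `observer_property_holds` mentions well-formedness; the check is stated purely in terms of what `lookup` can see, which is the point the slide makes.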

  13. Proving insert • Many uses of let due to state-passing style • Exponential blowup of expression size • Sequents become pages long! • Side effects make proofs difficult • When an object is affected, the sequent clauses no longer apply, even if the change doesn't affect them • User has to prove that the sequent clauses still apply

  14. Main Problem: Side Effects • State Passing Style • Good for modelling state • Easy to implement, familiar • Bad for Proving!

  15. The problem with side effects • Effects Invalidate Assumptions • Given a property about a disk, we need to prove the same property about a modified disk • Example: • If P(disk) then P(write_block(block, disk)) • Even if the effect does not affect P, we have to prove that P still holds • This makes sense: it does not hold automatically!
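A small sketch of the obligation "if P(disk) then P(write_block(block, disk))" on a byte-array disk. The slide's `write_block(block, disk)` is given an explicit block number and data argument here, and `BLOCK_SIZE`, `root_is_zero`, and `preserved_by_write` are hypothetical names introduced for illustration.

```python
BLOCK_SIZE = 4  # hypothetical block size in bytes


def write_block(block_no, data, disk):
    """State-passing write: return a new disk with one block replaced."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("data must be exactly one block")
    start = block_no * BLOCK_SIZE
    if block_no < 0 or start + BLOCK_SIZE > len(disk):
        raise IndexError("block number out of range")
    return disk[:start] + data + disk[start + BLOCK_SIZE:]


def root_is_zero(disk):
    """An example property P that only inspects block 0."""
    return disk[:BLOCK_SIZE] == bytes(BLOCK_SIZE)


def preserved_by_write(p, disk, block_no, data):
    """One instance of: if p(disk) then p(write_block(block_no, data, disk))."""
    return (not p(disk)) or p(write_block(block_no, data, disk))
```

Writing block 1 leaves `root_is_zero` intact and writing block 0 breaks it, but nothing in the type of `disk` says so; the check has to be repeated for each property and each write, which is the proof burden the slide describes.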

  16. Obvious solution: Hoare Logic • Substitution enforces separation of variables • So P(x) => P(x) automatically as long as x isn't affected • Red herring: this only helps if we use Abstract Data Types • We serialize our ADT into a single disk object • Side-effecting one part will side-effect all parts, even if we use Hoare Logic

  17. Naive Solution • Prove that side-effecting one part of the B-Tree, Free list, etc. doesn't affect assumptions about other parts of the disk • Possible, but Impractical • For every algorithm • For every effect • For every clause of the sequent • Must prove that the assumption still holds after the effect • A few such basic proofs were accomplished • But even they were long and easy to get lost in

  18. What we want from a better solution • We want to write ADT style code • We want to write ADT style proofs • We want to push a button and have • Serialized style code • Serialized style proofs • Is it Possible???

  19. What a solution would look like • Serialization Theorems • Example: deserialize(serialize(n)) = n • Fairly easy to prove • Already done • Even grind could do it • Proof that changing one value doesn't affect other values • Hmm...
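Both kinds of theorem on this slide can be illustrated with a fixed-width integer encoding. This is a sketch under assumptions: the 4-byte big-endian format and the helpers `write_u32` and `read_u32` are hypothetical, and only `serialize`/`deserialize` are names taken from the slide.

```python
WIDTH = 4  # hypothetical fixed width of a serialized value, in bytes


def serialize(n):
    """Encode a non-negative integer below 2**32 as 4 big-endian bytes."""
    return n.to_bytes(WIDTH, "big")


def deserialize(data):
    """Decode bytes produced by serialize."""
    return int.from_bytes(data, "big")


def write_u32(disk, offset, n):
    """Return a new disk with n serialized at the given byte offset."""
    return disk[:offset] + serialize(n) + disk[offset + WIDTH:]


def read_u32(disk, offset):
    """Read the value serialized at the given byte offset."""
    return deserialize(disk[offset:offset + WIDTH])


def disjoint(offset_a, offset_b):
    """True when the two 4-byte fields do not overlap."""
    return abs(offset_a - offset_b) >= WIDTH
```

The round trip `deserialize(serialize(n)) == n` holds for every representable n. Independence only holds under a side condition: `read_u32(write_u32(disk, b, n), a) == read_u32(disk, a)` needs `disjoint(a, b)`, and that condition has to be discharged at every write.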

  20. Proof of effect independence • Language Run-Time for ADT is already doing this • Objects are serialized to memory • Language Run-Time Limitations • Language vs Programmer control of serialization • The Garbage collector • Known Hard Problem • Bad Idea on a Hard Disk

  21. How to avoid GC • We don't need general GC • Side-effect view: • Values only “modified” if only reference • Or not reachable from values used in theorems • ADT view: • Values only “allocated” if we are “freeing” another value • Solution: ...

  22. Linear Types!!!

  23. What are Linear Types? • Objects must always have exactly one reference • No duplication • No erasure • No GC needed • Look Ma, No Garbage! • “Modifying” something is “de-alloc” plus “alloc” • Our algorithms already treat objects as linear • Just need to teach PVS to take advantage of that

  24. Linear Types vs. Monads • Lost the battle of representing state to monads • Maybe could win the war for formal proofs • Pros and Cons • Monads are more General • Non-determinism, environments, etc. • Linear Types provide more guarantees • A reference to a linearly typed object is guaranteed to be the only reference

  25. Recap • File Systems are Full of Bugs • But it is critical that they be right • Verification could fix this • We designed and implemented a File System • B-Tree based • Modelled all the way to “disk” • Auxiliary structures needed • Free List • File Blocks • Root File System Block

  26. Recap • We proved linear search • Lesson: Make sure measures are well-founded • Lesson: Make measures well-founded if they aren't • Well-formedness • Red herring • Actually defined by observers • Exponential blow-up due to let • Possible Improvement in how PVS presents sequents

  27. Recap • Side effects are hard in an unexpected way • Implementing side effects in PVS is easy • Use State Passing Style (e.g. State monad) • Proving side effects in a serialized common store is hard • Must prove that every effect keeps the theorems true • Number of Proofs exploded beyond our ability

  28. Recap • Linear Types to the Rescue!! • User writes ADT style proofs • System converts them to serialized proofs • Better than Monads • Need Theory for Linear Types in PVS

  29. Final Results • Ultimately had to declare failure • Code is fragmentary • But learned more from failure than success • Main deliverable is report and what not to do • We have good ideas for how to make future attempts • ... and we don't feel too bad because others have estimated verifying a file system to take 2-3 years to accomplish. • A mini-challenge: build a verifiable filesystem. Rajeev Joshi, Gerard J. Holzmann
