This study delves into the static verification of memory safety for device drivers, aiming to eliminate low-level bugs by annotating code and proving true predicates. It explores challenges, automatic and custom semantic models, soundness, and describing heap invariants.
Static Verification of Memory Safety for Device Drivers • Scott McPeak, George Necula • UC Berkeley • OSQ Retreat, 5/15/03
Goal: Eliminate Low-level Bugs • What properties? • Memory safety, including dangling references • Calling order restrictions (open/read, locks) • Why? • Implicit spec • Shallow correctness proofs • Difficult to test and debug • Deallocation: an interesting heap assertion
Why Annotate? • Exploit programmer knowledge • The information is there. Issue is convenience. • Record design decisions • Would you program w/o typing declarations? • Matter of cost/benefit • Annotation only needs to be more efficient than testing and debugging (for these kinds of bugs)
Why is verification hard? • Constructing models of the code • Choice of abstraction is very important • Proving true predicates • Automatic vs. interactive theorem provers • Knowing how to describe the program's heap invariants • Hard concepts: reachability, lack of sharing, topmost in some set, acyclic, ...
Automatic Modeling • Translate C into a language without pointers • Simulate them with sequences, etc. • Example: uniform semantic model • M: int → int • [x = e] = x := [e] • [*p] = sel(M, p) • [*p = e] = M := upd(M, p, [e])
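The sel/upd translation on this slide can be imitated in plain C. This is only a rough sketch: the fixed-size address space, the `Mem` struct, `MEM_SIZE`, and the helper `store_succ` are assumptions made for illustration, and addresses are assumed to be below `MEM_SIZE`.

```c
#include <stddef.h>

/* Hypothetical tiny address space; a real model would be unbounded. */
#define MEM_SIZE 16

/* M : int -> int, held in a struct so that upd can return a new map. */
typedef struct { int cell[MEM_SIZE]; } Mem;

/* sel(M, p): read the value stored at address p. */
static int sel(Mem m, size_t p) { return m.cell[p]; }

/* upd(M, p, v): a copy of M that maps p to v and agrees with M elsewhere.
   The struct is passed by value, so the caller's map is left unchanged. */
static Mem upd(Mem m, size_t p, int v) { m.cell[p] = v; return m; }

/* [*p = *q + 1] becomes M := upd(M, p, sel(M, q) + 1). */
static Mem store_succ(Mem m, size_t p, size_t q) {
    return upd(m, p, sel(m, q) + 1);
}
```

Because `upd` returns a fresh map, every program state is an ordinary value, which is what lets the pointer-free target language reason about memory with simple equalities.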
Custom Semantic Models • User provides the mapping from C as a set of pattern rules • Can supply multiple mappings for the same syntax; choose by: type, name, module, etc. • Can map from languages other than C • Bridges gap between hand-constructed models and the real code
Example Models • class Foo { int x; int y; }; • Java-like model • [p->x] = sel(Foo_x, p) • [p->x = e] = Foo_x := upd(Foo_x, p, [e]) • Functional model • [cons(x,y)] = cons([x], [y]) • [cdr(p)] = cdr([p])
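A small C sketch of the Java-like model, with one map per field. The `FieldMap` and `State` types, the bound `MAX_OBJS`, and the helpers `write_x` and `write_y` are hypothetical names introduced here; only `Foo_x`, `sel`, and `upd` come from the slide.

```c
#include <stddef.h>

#define MAX_OBJS 8   /* hypothetical bound on the number of Foo objects */

/* One map per field, indexed by object address. */
typedef struct { int at[MAX_OBJS]; } FieldMap;
typedef struct { FieldMap Foo_x, Foo_y; } State;

static int sel(FieldMap f, size_t p) { return f.at[p]; }
static FieldMap upd(FieldMap f, size_t p, int v) { f.at[p] = v; return f; }

/* [p->x = e] becomes Foo_x := upd(Foo_x, p, e); Foo_y is untouched. */
static State write_x(State s, size_t p, int e) {
    s.Foo_x = upd(s.Foo_x, p, e);
    return s;
}

/* [p->y = e] becomes Foo_y := upd(Foo_y, p, e); Foo_x is untouched. */
static State write_y(State s, size_t p, int e) {
    s.Foo_y = upd(s.Foo_y, p, e);
    return s;
}
```

Splitting memory by field means a write to `p->x` can never alias a read of `q->y`, a fact the uniform model would have to prove separately.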
Integers • 1. Simplify/simplex: rationals with integer heuristics • Practical, but unsound upon overflow • 2. Same, but add no-overflow assertions • Sound, but perhaps little gain for effort • 3. Integers with arithmetic mod 2^32 • For specialized circumstances
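Options 2 and 3 can be illustrated in C. The function names `add_in_range` and `add_mod32` are made up for this sketch; the first expresses the side condition a verifier would need to prove before treating `a + b` as addition over mathematical integers, the second relies on the wraparound that C defines for unsigned types.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Option 2: the no-overflow assertion for a signed addition a + b.
   The comparisons are arranged so that they cannot overflow themselves. */
static bool add_in_range(int a, int b) {
    if (b > 0) return a <= INT_MAX - b;
    return a >= INT_MIN - b;
}

/* Option 3: arithmetic mod 2^32. The cast keeps the result in 32 bits
   even on a platform where int is wider than 32 bits. */
static uint32_t add_mod32(uint32_t a, uint32_t b) {
    return (uint32_t)(a + b);
}
```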
Semantic Model Soundness • semantics: program → abort / not abort • defines a language of non-aborting programs • Need a trusted model, e.g. uniform semantics derived from C99 standard • Custom model: uses higher-level concepts • Soundness: custom ⊆ trusted (every program that does not abort under the custom model also does not abort under the trusted model)
Proving True Predicates • Automatic theorem provers are incomplete • Our conclusion: not possible to avoid incompleteness through clever choice of model • Interactive provers are tedious • Combination proof system • Try with automatic prover • If it fails, prove with interactive prover • Yield result as a new lemma for automatic
Describing Heap Invariants • Label every object with a predicate name • At least one name for each type • Names for intermediate states, e.g. initialization • Break recursive invariants • For every pointer, make a back pointer • Powerful, natural, local • Tree structure: only one back pointer • "Threaded heap"
Example: Threaded Heap define global_inv() { forall(Scull_Dev *p). tag[p] == Scull_Dev_tag ==> p->next!=NULL ==> p->next->prev == p; /* ... */ }
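The back-pointer invariant can also be checked at run time, which makes its meaning concrete. This is a dynamic analogue, not the static proof the talk describes; the cut-down `Scull_Dev` struct and the function `global_inv_holds` are hypothetical, and the quantifier over all tagged objects is approximated by a loop over an array of nodes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down node; the real Scull_Dev has more fields. */
typedef struct Scull_Dev {
    struct Scull_Dev *next;
    struct Scull_Dev *prev;   /* back pointer to the node whose next is us */
} Scull_Dev;

/* Returns true when every node with a non-NULL next satisfies
   p->next->prev == p, mirroring the body of global_inv. */
static bool global_inv_holds(const Scull_Dev *devs, size_t n) {
    for (size_t i = 0; i < n; i++) {
        const Scull_Dev *p = &devs[i];
        if (p->next != NULL && p->next->prev != p)
            return false;
    }
    return true;
}
```

The check is local: each node is examined on its own, with no traversal of the list, which is the property that makes back pointers convenient for a theorem prover.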
Test case: "scull" driver • An example driver; ~500 lines of C • Extensive use of the heap, online allocation and deallocation • Polymorphic use of file.private_data • Array indices computed with div/mod • Reactive (state transitions)
"scull" results • Verified after two days of work: • Casts, array accesses, deallocations • Two bugs • Incorrect interpretation of return value • Security hole: read another proc's old data • As much annotation as code • Already have techniques to eliminate 75% of it
Future Work • Continued improvement in annotations • Aggregation, data hiding in "changes" clauses • Built-in back pointers • Left half / right half approach to array loops • Type qualifiers to distribute predicates • User-written annotation agents • Split memory into regions • Incorporate other heap shape formalisms
Conclusion • Reason for optimism in each problem area • modeling: user chooses the abstraction • proving: use interactive and automatic together • describing: predicate labels and back pointers are a start • The challenge is not one of technology, but of communication
Vision • Programmers know why their programs are (supposed to be) correct • Explanation will be in English, however • Offer a way to conveniently express these reasons, then check them • Make verification practical!
Our Approach • Symbolic execution and strongest postcondition, with non-uniform semantics • Explicit annotation at cutpoints: function boundaries, loop invariants • Sound • Linguistic innovation
Example Models • class Foo { int x; int y; }; • Java-like model • [p->x] = sel(Foo_x, p) • [p->x = e] = Foo_x := upd(Foo_x, p, [e]) • Restricted form of interior pointers • [y = f(&p->x)] = temp := f(sel(Foo_x, p)) y := first(temp); Foo_x := .. second(..) .. • Functional model • [cons(x,y)] = cons([x], [y])
Basic Annotations • Function pre- and postconditions • post can refer to pre-state values, return value • Loop invariants • void scribble_fives(int *p, int len) pre(0 < p < objct && 0 <= objsize[p] <= len) post(forall(int i). 0 <= i < len ==> mem[p,i]=5) changes(mem);
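A possible body for `scribble_fives`, with the loop invariant that a verifier would need at the cutpoint written as a comment. The slide gives only the signature and contract, so the body and the invariant are assumptions; the sketch also assumes the precondition is meant to guarantee that `p` has room for `len` elements.

```c
#include <stddef.h>

/* pre:  p points to an object with room for at least len ints (assumed)
   post: forall i. 0 <= i < len ==> p[i] == 5
   changes: only the memory reachable through p */
static void scribble_fives(int *p, int len) {
    int i = 0;
    /* loop invariant:
       0 <= i && (len < 0 || i <= len)
       && forall j. 0 <= j < i ==> p[j] == 5 */
    while (i < len) {
        p[i] = 5;
        i++;
    }
    /* On exit i >= len, so the invariant implies the postcondition. */
}
```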
Annotation Extensions • Global invariant: implicit in pre/post • Automatic invariant strengthening • Named predicates • Aggregation, hiding for 'changes' clauses • Left half / right half notation for arrays • Predicates associated with type qualifiers • User-written annotation assistants
Verification of scull • Linux device driver, implements random-access files backed by RAM; ~500 lines • [Figure: scull_devices]
Allocation • Key concept: allocation boundary • [Figure labels: 0, objct, Mem, Foo_x, tag, allocated]
Role/Type tags • tag: Addr → int • [p = (Foo *)malloc(sizeof Foo)] = p := objct; objct := objct + 1; tag := tag{objct := Foo_tag} • C types are effectively first class in model
Role/Type tags • Data structure invariants: ∀p. sel(tag, p) = Scull_Dev_tag ⇒ ... p->next ... • Subtyping: sel(tag, p) <: My_Subclass_tag • Type-based disequality • Deallocation [free(p)] = tag := upd(tag, p, 0) • Initialization
Threaded Heap • Heap has a central spanning tree • Invariant: for every tree pointer, the referent object names that pointer • using specification or ghost variables as needed • Example: ∀p. sel(tag, p) = Scull_Dev_tag ⇒ p->next ≠ NULL ⇒ p->next->prev = p
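The allocation boundary and the tag map can be simulated in a few lines of C. Everything here is a sketch under assumptions: the `Heap` struct, `MAX_ADDR`, `FREE_TAG`, and the functions `model_malloc`, `model_free`, and `safe_deref` are invented names, address 0 is reserved for NULL, the freshly returned address `p` is the one that gets tagged, and there is no check for running out of abstract addresses.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ADDR 32      /* hypothetical bound on abstract addresses */
enum { FREE_TAG = 0, Foo_tag = 1, Scull_Dev_tag = 2 };

typedef struct {
    size_t objct;        /* allocation boundary: addresses >= objct are unused */
    int tag[MAX_ADDR];   /* tag : Addr -> int */
} Heap;

/* [p = malloc(...)]: hand out the address at the boundary and tag it. */
static size_t model_malloc(Heap *h, int t) {
    size_t p = h->objct;
    h->objct = h->objct + 1;
    h->tag[p] = t;
    return p;
}

/* [free(p)] = tag := upd(tag, p, 0) */
static void model_free(Heap *h, size_t p) { h->tag[p] = FREE_TAG; }

/* Using p as an object of role t is safe only if p lies strictly between
   0 and the boundary and still carries tag t. */
static bool safe_deref(const Heap *h, size_t p, int t) {
    return p > 0 && p < h->objct && h->tag[p] == t;
}
```

Since `free` resets the tag to 0, a dangling reference shows up as an ordinary tag mismatch, so the same check covers bad casts and use after free.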
scull bugs • Wrong return code interpretation if (pipe_init() > 0) { /* recover from error */ } • Leak trusted kernel data p = kmalloc(4000); memcpy(p+i, src, len);
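Both bugs have simple repairs, sketched here in user-space C rather than kernel code. The slide does not show the fixes, so this is an assumption about what they would look like: `fake_init` stands in for `pipe_init()` and is assumed to follow the usual kernel convention of returning a negative value on error, and `alloc_and_fill` is a hypothetical helper that uses `malloc` in place of `kmalloc`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pipe_init(): 0 on success, negative on error. */
static int fake_init(int fail) { return fail ? -12 : 0; }

/* Correct interpretation: errors are negative, so test < 0, not > 0. */
static int init_failed(int fail) { return fake_init(fail) < 0; }

/* Zero the whole buffer before the partial copy, so bytes outside
   [i, i + len) never expose whatever the allocator left behind. */
static unsigned char *alloc_and_fill(size_t size, size_t i,
                                     const void *src, size_t len) {
    if (i > size || len > size - i)
        return NULL;                      /* the copy must fit */
    unsigned char *p = (unsigned char *)malloc(size);
    if (p == NULL)
        return NULL;
    memset(p, 0, size);
    memcpy(p + i, src, len);
    return p;
}
```

The caller owns the returned buffer and releases it with `free`.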
scull results • [Table: columns name, paths, preds, time(s), description; only the values 117, 1897, 125.0 survive]
Future Work • Annotate and verify more examples • Implement a variety of abstraction mechanisms for annotation language • Automated assistance for the edit-verify-diagnose cycle • Prove the lemmas that we give to Simplify
Conclusion • It is feasible to verify difficult properties (like lack of dangling refs) in real code • Annotation burden is merely a symptom of inadequate annotation abstractions • Prover's incompleteness can be overcome with lemmas proven with a more powerful prover