HAVOC: A precise and scalable verifier for systems software. Shaz Qadeer, Microsoft Research. Collaborators. Researchers: Jeremy Condit, Shuvendu Lahiri. Interns: Shaunak Chatterjee, Brian Hackett, Zvonimir Rakamaric, Ian Wehrman, Thomas Wies. HAVOC. Modular verifier for C programs
HAVOC: A precise and scalable verifier for systems software Shaz Qadeer Microsoft Research
Collaborators • Researchers • Jeremy Condit, Shuvendu Lahiri • Interns • Shaunak Chatterjee, Brian Hackett, Zvonimir Rakamaric, Ian Wehrman, Thomas Wies
HAVOC • Modular verifier for C programs • Verifies each procedure separately • Requires contracts: preconditions, postconditions, modifies clauses, loop invariants • Features • Accurate heap model • Expressive annotation language • Efficient checking using SMT solvers • Precise and efficient reasoning for loop-free and call-free code
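As a rough illustration of the kinds of contracts listed above, the sketch below attaches a precondition, a postcondition, a modifies clause, and a loop invariant to a small C procedure. This is not HAVOC's annotation syntax: the procedure `sum_prefix` is hypothetical, the contracts are written as comments, and they are approximated with runtime `assert` checks, whereas a modular verifier would prove them statically for each procedure in isolation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example procedure. Contracts are comments approximated
 * by runtime asserts; this is NOT HAVOC annotation syntax.
 *
 * requires: a != NULL && n >= 0
 * ensures:  result == a[0] + ... + a[n-1]
 * modifies: nothing (only local variables are written)
 */
int sum_prefix(const int *a, int n)
{
    assert(a != NULL && n >= 0);      /* precondition */

    int sum = 0;
    int i = 0;
    while (i < n) {
        /* loop invariant: 0 <= i <= n and sum == a[0] + ... + a[i-1] */
        assert(0 <= i && i <= n);
        sum += a[i];
        i++;
    }

    assert(i == n);                   /* invariant plus loop exit condition */
    return sum;
}
```

With the invariant in place, the loop body and the code after the loop can each be checked as loop-free, call-free fragments, which is the setting the slide describes as precise and efficient.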
HAVOC pipeline: Annotated C program → Visual C front end → control flow graph → CtoBoogiePL (using the memory model) → Boogie program → Boogie VC generator → verification condition → Z3 SMT solver → Verified or Warning
Challenges for HAVOC • Concise and precise expression of non-aliasing and disjointness of heap values • Properties of unbounded collections • Lists, Arrays, … • Enable such reasoning for low-level software • pointer arithmetic • interior pointers • nested structures and unions • …
But will programmers ever write contracts? • In some cases, they might • security properties: thousands of buffer annotations in Windows code • maintenance of critical legacy code: the Windows NT file system • Automatic annotation inference • precise and efficient checking of annotated programs is a crucial first step
Roadmap • Novel features of the specification language • Dealing with low-level features of C • Concluding remarks
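[Figure: a doubly linked list of LinkNode objects running from log_list.head to log_list.tail, linked by next and prev; each node's data field points to a struct _logentry with fields char *channel_name, file_name, and logtype] • Example from muh, an Internet Relay Chat (IRC) bouncer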
LinkNode *iter = log_list.head; while (iter != NULL) { struct _logentry *entry = iter->data; free (entry->channel_name); free (entry->file_name); free (entry); entry = NULL; iter = iter->next; } • Goal: ensure absence of double free • Data structure invariant, which needs a reachability predicate and universal quantification: for every node x in the list between log_list.head and null: x->data is a unique pointer, and x->data->channel_name is a unique pointer, and x->data->file_name is a unique pointer.
Limitations of SMT solvers • No support for precise reasoning with a reachability predicate • Incompleteness in Floyd-Hoare proofs for straight-line code • Brittle support for quantifiers • Complexity: NP-complete (ground) → undecidable (quantified) • Leads to unpredictable behavior of verifiers • Proof times, proof success rate • Requires user ingenuity to craft axioms/invariants with quantifiers
Contribution • Expressive and efficient logic for precise reasoning about reachability, unique pointers, and restricted quantification • A decision procedure for the logic built over an SMT solver
Simple Java-like memory model • Heap consists of a set of objects (obj) • Each field “f” is a mutable map • f: obj → obj • g: obj → int • h: obj → bool • The sort obj may be refined into a collection of sorts
Reachability predicate: Btwn_f • [Figure: a doubly linked list containing nodes x and y, linked by next and prev, each node with a data field] • Btwn_next(x, y) • Btwn_prev(y, x)
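To make the reachability predicate concrete, here is a minimal executable sketch of membership in Btwn_next(x, y), reading the set as "the nodes on the path from x to y along next, endpoints included, and empty if y is not reachable from x". The names `Node`, `in_btwn_next`, `btwn_example`, and the `max_steps` bound are assumptions made for illustration; the verifier reasons about this predicate symbolically rather than by walking a heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Node {
    struct Node *next;
    struct Node *prev;
} Node;

/* Hypothetical helper: true iff z is in Btwn_next(x, y), i.e. y is
 * reachable from x by following next zero or more times and z lies on
 * that path (both endpoints included). y may be NULL to denote the end
 * of a NULL-terminated list. max_steps bounds the walk so the check
 * also terminates on cyclic lists that never reach y. */
bool in_btwn_next(const Node *x, const Node *y, const Node *z, size_t max_steps)
{
    bool seen_z = false;
    const Node *cur = x;
    for (size_t i = 0; i <= max_steps; i++) {
        if (cur == z)
            seen_z = true;
        if (cur == y)
            return seen_z;   /* reached y: answer is whether z was on the path */
        if (cur == NULL)
            return false;    /* fell off the list before reaching y */
        cur = cur->next;
    }
    return false;            /* y not reached within the bound */
}

/* Small usage example on the list a -> b -> c -> NULL. */
int btwn_example(void)
{
    Node c = { NULL, NULL };
    Node b = { &c, NULL };
    Node a = { &b, NULL };
    assert(in_btwn_next(&a, &c, &b, 8));    /* b lies between a and c */
    assert(!in_btwn_next(&b, &c, &a, 8));   /* a is not between b and c */
    assert(!in_btwn_next(&c, &a, &c, 8));   /* a unreachable from c: empty set */
    return 1;
}
```

Note that under this reading NULL itself belongs to Btwn_next(x, NULL), which is why invariants over a whole list typically exclude it explicitly.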
Inverse of a function: f⁻¹ • [Figure: the same doubly linked list; the data fields of nodes x and y both point to w] • data⁻¹(w) = {x, y}
LinkNode *iter = log_list.head; while (iter != NULL) { struct _logentry *entry = iter->data; free (entry->channel_name); free (entry->file_name); free (entry); entry = NULL; iter = iter->next; } • Data structure invariant: for every node x in the list between log_list.head and null: x->data is a unique pointer, and … • In the logic (with f = next): ∀x ∈ Btwn_f(log_list.head, null) \ {null}. data⁻¹(data(x)) = {x} ∧ …
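The uniqueness part of this invariant can be sketched as a runtime check: for each node x, count how many nodes have the same data pointer and require that count to be exactly one. The helper names below (`data_inverse_size`, `data_is_unique`, `free_log_entries`) are hypothetical, the inverse is computed only over the nodes of one NULL-terminated list rather than over the whole heap, and only the data field is checked, not channel_name or file_name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct _logentry {
    char *channel_name;
    char *file_name;
    int logtype;
};

typedef struct LinkNode {
    struct LinkNode *next;
    struct LinkNode *prev;
    struct _logentry *data;
} LinkNode;

/* Size of data^{-1}(w), restricted to the nodes of the list at head. */
static size_t data_inverse_size(const LinkNode *head, const struct _logentry *w)
{
    size_t n = 0;
    for (const LinkNode *x = head; x != NULL; x = x->next)
        if (x->data == w)
            n++;
    return n;
}

/* For every node x in the list: data^{-1}(data(x)) == {x}. */
bool data_is_unique(const LinkNode *head)
{
    for (const LinkNode *x = head; x != NULL; x = x->next)
        if (data_inverse_size(head, x->data) != 1)
            return false;
    return true;
}

/* The freeing loop, guarded by the invariant: if two nodes shared an
 * entry, the second free(entry) would be a double free. */
void free_log_entries(LinkNode *head)
{
    assert(data_is_unique(head));
    for (LinkNode *iter = head; iter != NULL; iter = iter->next) {
        struct _logentry *entry = iter->data;
        if (entry == NULL)
            continue;
        free(entry->channel_name);
        free(entry->file_name);
        free(entry);
        iter->data = NULL;
    }
}
```

The check is quadratic in the list length, which is acceptable for an illustration; the point of the logic is that the same property can be stated once as an invariant and discharged by the solver.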
Expressive logic • Express properties of collections • ∀x ∈ Btwn_f(f(hd), hd). state(x) = LOCKED // cyclic list • Arithmetic reasoning on data (e.g. sortedness) • ∀x ∈ Btwn_f(hd, null) \ {null}. ∀y ∈ Btwn_f(x, null) \ {null}. d(x) ≤ d(y)
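The sortedness property above can be read directly as two nested loops over the list. The sketch below assumes the comparison in the formula is ≤ (non-decreasing order) and uses hypothetical names (`Cell`, `sorted_spec`, `sorted_adjacent`); it also shows the cheaper adjacent-pair check that is equivalent by transitivity of ≤.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Cell {
    struct Cell *f;   /* successor field, named f as in the formula */
    int d;            /* data field compared by the property */
} Cell;

/* Direct reading of the quantified formula: for every x in the list and
 * every y reachable from x (x included), d(x) <= d(y). */
bool sorted_spec(const Cell *hd)
{
    for (const Cell *x = hd; x != NULL; x = x->f)
        for (const Cell *y = x; y != NULL; y = y->f)
            if (x->d > y->d)
                return false;
    return true;
}

/* Adjacent-pair check; agrees with sorted_spec on NULL-terminated lists. */
bool sorted_adjacent(const Cell *hd)
{
    bool ok = true;
    for (const Cell *x = hd; x != NULL && x->f != NULL; x = x->f) {
        if (x->d > x->f->d) {
            ok = false;
            break;
        }
    }
    assert(ok == sorted_spec(hd));   /* the two formulations must agree */
    return ok;
}
```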
Precise • Annotations/abstractions are needed only at procedure and loop boundaries • Given the Floyd-Hoare triple X = {P} S {Q} • P and Q are expressed in our logic • S is a loop-free, call-free program • We can construct a formula Y in our logic • Y is linear in the size of X • X is valid iff Y is valid
Efficient • Decision problem is NP-complete • Can’t expect any better with propositional logic! • Retains the complexity of current SMT logics • Provide a decision procedure for the logic on top of state-of-the-art Z3 SMT solver • Leverages powerful ground-theory reasoning (arithmetic, arrays, uninterpreted functions…)
Ground logic • t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t) • G ∈ GFormula ::= t = t' | t < t' | t ∈ Btwn_f(t1, t2) | ¬G • Logic • S ∈ Set ::= f⁻¹(t) | Btwn_f(t1, t2) • F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F
Ground decision procedure • Provide a set of 10 rewrite rules for Btwn_f • Sound, complete and terminating • E.g. Transitivity3: from t1 ∈ Btwn_f(t0, t2) and t ∈ Btwn_f(t0, t1), derive t ∈ Btwn_f(t0, t2) and t1 ∈ Btwn_f(t, t2)
Logic • t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t) • G ∈ GFormula ::= t = t' | t < t' | t ∈ Btwn_f(t1, t2) | ¬G • Bounded quantification over interpreted sets • S ∈ Set ::= f⁻¹(t) | Btwn_f(t1, t2) • F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F
Lazy quantifier instantiation • Instantiation rule: from t ∈ S and ∀x ∈ S. F, derive F[t/x] • Lazy instantiation • Instantiate only when a term t belongs to the set S • Substantially reduces the number of terms used to instantiate a quantified fact • Terminates if ∀x ∈ S. F is sort-restricted • sort(x) is less than sort(t[x]) for any term t[x] in F
Experience • Compared with an earlier implementation • Unrestricted quantifiers, incomplete axiomatization of reachability, no f⁻¹ • Small to medium sized benchmarks • Greatly improved the predictability of HAVOC • Reduced runtimes (2X to 100X) • Eliminated the need for carefully crafted axioms and invariants • Can handle newer examples
Roadmap • Novel features of the specification language • Dealing with low-level features of C • Concluding remarks
struct list { list *next; list *prev; }; struct record { int data1; list node; int data2; }; • [Figure: two record objects, each laid out as data1, next, prev, data2; p points to the embedded node inside a record and q points to the start of that record] • q = CONTAINER(p, record, node) = (record *) ((int *) p - (int) (&(((record *)0)->node))) = (record *) ((int *) p - 1)
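For readers who want to try the container idiom in standard C, the sketch below expresses it with `offsetof` and byte arithmetic. The slide computes the offset in word-sized units (hence "p - 1" under an idealized layout where every field occupies one word); real compilers may use different field sizes and padding, so this version subtracts a byte offset instead. The wrapper `record_of` is a hypothetical name added for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list {
    struct list *next;
    struct list *prev;
} list;

typedef struct record {
    int data1;
    list node;
    int data2;
} record;

/* Recover a pointer to the enclosing object from a pointer to one of
 * its fields by subtracting the field's byte offset. */
#define CONTAINER(p, type, field) \
    ((type *)((char *)(p) - offsetof(type, field)))

/* Hypothetical wrapper: p must point to the node field of a record. */
record *record_of(list *p)
{
    assert(p != NULL);
    return CONTAINER(p, record, node);
}
```

Nothing in the C type system checks that `p` really points into a `record`; that obligation is exactly the kind of program-specific invariant the following slides discuss.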
void init_all_records(list *p) { while (p != NULL) { init_record(p); p = p->next; } } void init_record(list *p) { record *r = CONTAINER(p, record, node); r->data2 = 42; } • Type safety requires nontrivial reasoning • the container of every element in list has type record* • Use of memory model with field abstraction is unsound • Field abstraction is crucial to all property checkers • &a->data1 is not aliased to &b->data2 • init_all_records(p) preserves the assertion a->data1 == 0
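The following sketch runs the two procedures above on a two-element list and checks the behavior the slide describes: the write through the interior pointer lands in data2 of each enclosing record, while data1 stays 0. The `CONTAINER` macro is written with `offsetof`, and the driver `init_records_demo` is hypothetical; a runtime check like this only exercises one heap, whereas the verifier must establish the property for all heaps.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list {
    struct list *next;
    struct list *prev;
} list;

typedef struct record {
    int data1;
    list node;
    int data2;
} record;

#define CONTAINER(p, type, field) \
    ((type *)((char *)(p) - offsetof(type, field)))

void init_record(list *p)
{
    record *r = CONTAINER(p, record, node);
    r->data2 = 42;   /* writes outside the list object that p points to */
}

void init_all_records(list *p)
{
    while (p != NULL) {
        init_record(p);
        p = p->next;
    }
}

/* Hypothetical driver: two records linked through their node fields. */
int init_records_demo(void)
{
    record a = { 0, { NULL, NULL }, 0 };
    record b = { 0, { NULL, NULL }, 0 };
    a.node.next = &b.node;
    b.node.prev = &a.node;

    init_all_records(&a.node);

    assert(a.data1 == 0 && b.data1 == 0);     /* data1 is preserved */
    assert(a.data2 == 42 && b.data2 == 42);   /* data2 is initialized */
    return a.data2 + b.data2;
}
```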
Unify type checking and property checking • Harness the power of constraint solvers to enhance type checking • type safety often depends on program-specific invariants • Harness the strong guarantees provided by the type invariant to enhance property checking • non-aliasing, field abstraction
Mem: int → int (mutable) • Type: int → type (immutable) • [Figure: example Mem and Type maps over addresses 99 to 102, with types drawn from Int, List, Record, Ptr(Int), Ptr(List), Ptr(Record)] • Type invariant: ∀a:int. HasType(Mem(a), Type(a))
C source: void init_record(list *p) { record *r = CONTAINER(p, record, node); r->data2 = 42; } • struct list { list *next; list *prev; }; struct record { int data1; list node; int data2; }; • Contracts: requires ∀a:int. HasType(Mem(a), Type(a)) • requires HasType(p, Ptr(List)) • ensures ∀a:int. HasType(Mem(a), Type(a)) • Translation: void init_record(int p) { var r:int; r := p-1; assert HasType(r, Ptr(Record)); Mem(r+3) := 42; assert ∀a:int. HasType(Mem(a), Type(a)); }
struct list { list *next; list *prev; }; struct record { int data1; list node; int data2; }; • HasType(v, Int) ⇔ true • HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t)) • Match(a, Int) ⇔ Type(a) = Int • Match(a, Ptr(t)) ⇔ Type(a) = Ptr(t) • Match(a, List) ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List)) • Match(a, Record) ⇔ Match(a, Int) ∧ Match(a+1, List) ∧ Match(a+3, Int)
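The HasType and Match definitions can be exercised on a small executable model of the memory. The sketch below makes several assumptions for illustration: memory is a bounded array of `N` words rather than all integers, types are a fixed enumeration (`TypeTag`), the connectives in the definitions are read as the usual disjunction and conjunction, and all function names are hypothetical. It also replays the translated `init_record` from the earlier slide against this model.

```c
#include <assert.h>
#include <stdbool.h>

enum { N = 16 };   /* assumed memory size; the slides quantify over all ints */

typedef enum {
    T_INT, T_LIST, T_RECORD, T_PTR_INT, T_PTR_LIST, T_PTR_RECORD
} TypeTag;

int     Mem[N];    /* mutable map: address -> value */
TypeTag Type[N];   /* immutable map: address -> word type (all T_INT initially) */

/* Match(a, t): the words starting at address a are laid out as a t. */
bool match(int a, TypeTag t)
{
    switch (t) {
    case T_LIST:     /* struct list { list *next; list *prev; } */
        return match(a, T_PTR_LIST) && match(a + 1, T_PTR_LIST);
    case T_RECORD:   /* struct record { int data1; list node; int data2; } */
        return match(a, T_INT) && match(a + 1, T_LIST) && match(a + 3, T_INT);
    default:         /* Int and Ptr(t): Type(a) = t */
        return a >= 0 && a < N && Type[a] == t;
    }
}

static TypeTag pointee(TypeTag ptr)
{
    switch (ptr) {
    case T_PTR_INT:  return T_INT;
    case T_PTR_LIST: return T_LIST;
    default:         return T_RECORD;
    }
}

/* HasType(v, t): the value v is a legal inhabitant of word type t. */
bool has_type(int v, TypeTag t)
{
    if (t == T_INT)
        return true;
    if (t == T_LIST || t == T_RECORD)
        return false;   /* struct types are not word types */
    return v == 0 || (v > 0 && match(v, pointee(t)));
}

/* Type invariant, restricted to the modeled addresses. */
bool type_invariant(void)
{
    for (int a = 0; a < N; a++)
        if (!has_type(Mem[a], Type[a]))
            return false;
    return true;
}

/* The translated procedure: p is the address of the node field. */
void init_record(int p)
{
    int r = p - 1;
    assert(r != 0 && has_type(r, T_PTR_RECORD));
    Mem[r + 3] = 42;
    assert(type_invariant());
}
```

For example, marking addresses 5 to 8 as Int, Ptr(List), Ptr(List), Int makes `match(5, T_RECORD)` hold, so `init_record(6)` passes both assertions and stores 42 at address 8.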
C source: void init_record(list *p) { record *r = CONTAINER(p, record, node); r->data2 = 42; } • struct list { list *next; list *prev; }; struct record { int data1; list node; int data2; }; • Added precondition: requires HasType(p-1, Ptr(Record)) ∧ p - 1 ≠ 0 • requires ∀a:int. HasType(Mem(a), Type(a)) • requires HasType(p, Ptr(List)) • ensures ∀a:int. HasType(Mem(a), Type(a)) • Translation: void init_record(int p) { var r:int; r := p-1; assert HasType(r, Ptr(Record)); Mem(r+3) := 42; assert ∀a:int. HasType(Mem(a), Type(a)); }
struct list { list *next; list *prev; }; struct record { int data1; list node; int data2; }; • HasType(v, Int) ⇔ true • HasType(v, Data1) ⇔ true • HasType(v, Data2) ⇔ true • HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t)) • Match(a, Int) ⇔ Type(a) = Int • Match(a, Data1) ⇔ Type(a) = Data1 • Match(a, Data2) ⇔ Type(a) = Data2 • Match(a, Ptr(t)) ⇔ Type(a) = Ptr(t) • Match(a, List) ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List)) • Match(a, Record) ⇔ Match(a, Data1) ∧ Match(a+1, List) ∧ Match(a+3, Data2)
Other highlights • Decision procedure for type safety • suffices to instantiate the type invariant and the definitions of Match and HasType on a small number of terms • Extensions • unions • function pointers • parametric polymorphism • user-defined types • sub-word accesses (char, short)
Experience • Property checking on small benchmarks • list-manipulation: insertion, removal, multiple lists each with a different container type • sorting: bubble sort, merge sort, quick sort • intuitive and concise annotations • Type checking of four WDK drivers • cancel, event, kbfiltr, vserial • ~1 min to check each driver • ~5KLOC, ~225 annotations
Roadmap • Novel features of the specification language • Dealing with low-level features of C • Concluding remarks
Other case studies with HAVOC • Synchronization protocols protecting critical data structures in the NT file system (Brian Hackett) • ~300KLOC, 1500 procedures • reference count usage, lock usage, data races, teardown races • 45 confirmed bugs (out of 125 warnings) • most bugs fixed • Spin lock usage in Windows device drivers (Juan Pablo Galeotti, Thomas Wies) • flpydisk, kbdclass, daytona, serial (~50KLOC)
HAVOC is available • Download: • http://research.microsoft.com/projects/HAVOC
Future directions • Unified decision procedure for reachability, inverse, arrays, and types for the low-level memory model • Exploiting type invariant for property checking on device drivers • Annotation inference