HAVOC: A precise and scalable verifier for systems software

Shaz Qadeer, Microsoft Research

Collaborators • Researchers • Jeremy Condit, Shuvendu Lahiri • Interns • Shaunak Chatterjee, Brian Hackett, Zvonimir Rakamaric, Ian Wehrman, Thomas Wies

HAVOC • Modular verifier for C programs • Verifies each procedure separately • Requires contracts: preconditions, postconditions, modifies clauses, loop invariants • Features • Accurate heap model • Expressive annotation language • Efficient checking using SMT solvers • Precise and efficient reasoning for loop-free and call-free code

[Figure: HAVOC tool flow] Annotated C program → Visual C front end → Control flow graph → CtoBoogiePL (with memory model) → Boogie program → Boogie VC generator → Verification condition → Z3 SMT solver → Verified / Warning

Challenges for HAVOC • Concise and precise expression of non-aliasing and disjointness of heap values • Properties of unbounded collections • Lists, Arrays, … • Enable such reasoning for low-level software • pointer arithmetic • interior pointers • nested structures and unions • …

But will programmers ever write contracts? • In some cases, they might • security properties: thousands of buffer annotations in Windows code • maintenance of critical legacy code: the Windows NT file system • Automatic annotation inference • precise and efficient checking of annotated programs is a crucial first step

Roadmap • Novel features of the specification language • Dealing with low-level features of C • Concluding remarks

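[Figure: doubly linked list running from log_list.head to log_list.tail. Each LinkNode has next, prev and data fields; each data field points to a struct _logentry with fields channel_name (char *), file_name (char *) and logtype. Example taken from muh, an Internet Relay Chat (IRC) bouncer.]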

    LinkNode *iter = log_list.head;
    while (iter != NULL) {
        struct _logentry *entry = iter->data;
        free (entry->channel_name);
        free (entry->file_name);
        free (entry);
        entry = NULL;
        iter = iter->next;
    }
• Ensure absence of double free • Data structure invariant: for every node x (universal quantification) in the list between log_list.head and null (reachability predicate): x->data is a unique pointer, and x->data->channel_name is a unique pointer, and x->data->file_name is a unique pointer.

Limitations of SMT solvers • No support for precise reasoning with reachability predicate • Incompleteness in Floyd-Hoare proofs for straight line code • Brittle support for quantifiers • Complexity: NP-complete (ground) → undecidable • Leads to unpredictable behavior of verifiers • Proof times, proof success rate • Requires user ingenuity to craft axioms/invariants with quantifiers

Contribution • Expressive and efficient logic for precise reasoning about reachability, unique pointers, and restricted quantification • A decision procedure for the logic built over an SMT solver

Simple Java-like memory model • Heap consists of a set of objects (obj) • Each field “f” is a mutable map • f: obj → obj • g: obj → int • h: obj → bool • The sort obj may be refined into a collection of sorts

Reachability predicate: Btwn_f • [Figure: doubly linked list with next, prev and data fields; node x precedes node y along next] • Btwn_next(x, y) • Btwn_prev(y, x)
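As a rough illustration of what membership in Btwn_f means, the following C sketch checks t ∈ Btwn_f(x, y) on a small array-encoded heap. The encoding (objects as array indices, index 0 standing for null, the name in_btwn) is an assumption made for this example and is not HAVOC's implementation. It also assumes that Btwn_f(x, y) is the set of objects on the f-path from x up to the first occurrence of y, and that the set is empty when y is not reachable from x.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical heap encoding: objects are the indices 0..n-1, index 0 plays
 * the role of null, and f[i] is the value of field f in object i.
 * Every f[i] is assumed to lie in 0..n-1, with f[0] == 0.
 */
bool in_btwn(const int *f, size_t n, int x, int y, int t)
{
    assert(x >= 0 && (size_t)x < n);
    assert(y >= 0 && (size_t)y < n);
    assert(t >= 0 && (size_t)t < n);

    bool seen_t = false;
    int cur = x;

    /* A path without repetition visits at most n objects, so n + 1
       iterations are enough to either reach y or detect that we never will. */
    for (size_t steps = 0; steps <= n; steps++) {
        if (cur == t)
            seen_t = true;      /* t lies on the path walked so far */
        if (cur == y)
            return seen_t;      /* reached y: t is in the set iff it was seen */
        cur = f[cur];
    }
    return false;               /* y is not reachable from x: empty set */
}
```

With next = {0, 2, 3, 0} (object 1 points to 2, 2 to 3, 3 to null), object 2 is in Btwn_next(1, null) but object 1 is not in Btwn_next(2, null).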

Inverse of a function: f^-1 • [Figure: the same list; the data fields of nodes x and y both point to w] • data^-1(w) = {x, y}

    LinkNode *iter = log_list.head;
    while (iter != NULL) {
        struct _logentry *entry = iter->data;
        free (entry->channel_name);
        free (entry->file_name);
        free (entry);
        entry = NULL;
        iter = iter->next;
    }
• Data structure invariant: for every node x in the list between log_list.head and null: x->data is a unique pointer, and … • ∀x ∈ Btwn_f(log_list.head, null) \ {null}. data^-1(data(x)) = {x} ∧ …
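To make the first conjunct of this invariant concrete, here is a small C sketch that evaluates ∀x ∈ Btwn_next(head, null) \ {null}. data^-1(data(x)) = {x} on an array-encoded heap. The encoding (objects as indices 1..n-1, index 0 as null) and the function names are assumptions for illustration only. HAVOC proves such a formula statically with its decision procedure, whereas this sketch merely checks it on one concrete heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Size of data^-1(w): how many objects (indices 1..n-1) have data field w. */
static size_t inverse_count(const int *data, size_t n, int w)
{
    size_t count = 0;
    for (size_t i = 1; i < n; i++)
        if (data[i] == w)
            count++;
    return count;
}

/*
 * Hypothetical encoding: next[i] and data[i] are the fields of object i,
 * index 0 is null, and every next[i] lies in 0..n-1.
 */
bool data_unique_on_list(const int *next, const int *data, size_t n, int head)
{
    assert(head >= 0 && (size_t)head < n);

    /* Pass 1: is null reachable from head? If not, the set is empty and the
       quantified formula holds vacuously. */
    int cur = head;
    size_t steps = 0;
    while (cur != 0 && steps < n) {
        cur = next[cur];
        steps++;
    }
    if (cur != 0)
        return true;

    /* Pass 2: every non-null node on the path must be the only object whose
       data field has that value. */
    for (int x = head; x != 0; x = next[x])
        if (inverse_count(data, n, data[x]) != 1)
            return false;
    return true;
}
```

If two nodes share a data value, freeing through both would be a double free, and the check returns false for such a heap.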

Expressive logic • Express properties of collections • ∀x ∈ Btwn_f(f(hd), hd). state(x) = LOCKED // cyclic • Arithmetic reasoning on data (e.g. sortedness) • ∀x ∈ Btwn_f(hd, null) \ {null}. ∀y ∈ Btwn_f(x, null) \ {null}. d(x) ≤ d(y)

Precise • Need annotations/abstractions only at procedure/loop boundaries • Given the Floyd-Hoare triple X = {P} S {Q} • P and Q are expressed in our logic • S is a loop-free call-free program • We can construct a formula Y in our logic • Y is linear in the size of X • X is valid iff Y is valid

Efficient • Decision problem is NP-complete • Can’t expect any better with propositional logic! • Retains the complexity of current SMT logics • Provide a decision procedure for the logic on top of state-of-the-art Z3 SMT solver • Leverages powerful ground-theory reasoning (arithmetic, arrays, uninterpreted functions…)

Logic
Ground logic:
    t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
    G ∈ GFormula ::= t = t' | t < t' | t ∈ Btwn_f(t1, t2) | ¬G
Sets and formulas:
    S ∈ Set ::= f^-1(t) | Btwn_f(t1, t2)
    F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F

Ground decision procedure • Provide a set of 10 rewrite rules for Btwn_f • Sound, complete and terminating • E.g. Transitivity3: from t1 ∈ Btwn_f(t0, t2) and t ∈ Btwn_f(t0, t1), derive t ∈ Btwn_f(t0, t2) and t1 ∈ Btwn_f(t, t2)

Logic
    t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
    G ∈ GFormula ::= t = t' | t < t' | t ∈ Btwn_f(t1, t2) | ¬G
Bounded quantification over interpreted sets:
    S ∈ Set ::= f^-1(t) | Btwn_f(t1, t2)
    F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F

Lazy quantifier instantiation • Instantiation rule: from t ∈ S and ∀x ∈ S. F, derive F[t/x] • Lazy instantiation • Instantiate only when a term t belongs to the set S • Substantially reduces the number of terms to instantiate a quantified fact • Terminates if ∀x ∈ S. F is sort-restricted • sort(x) is less than sort(t[x]) for any term t[x] in F
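The lazy strategy can be pictured with a minimal C sketch. It assumes a hypothetical representation in which the membership facts asserted so far (t ∈ S) are available as an array; the struct membership type, the set_id field and the function lazy_instances are all invented for this illustration and do not correspond to Z3 or HAVOC interfaces. The function returns only the terms on which a fact ∀x ∈ S. F would be instantiated, rather than every ground term in the problem.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One asserted ground fact "term is a member of set set_id" (hypothetical). */
struct membership {
    int term;
    int set_id;
};

/*
 * Collect the terms t for which "t in S" has been asserted, where S is the
 * set identified by set_id. These are the only terms on which the quantified
 * fact "forall x in S. F" gets instantiated to F[t/x].
 * Writes at most cap distinct terms to out and returns how many were written.
 */
size_t lazy_instances(const struct membership *facts, size_t nfacts,
                      int set_id, int *out, size_t cap)
{
    assert(facts != NULL || nfacts == 0);
    assert(out != NULL || cap == 0);

    size_t k = 0;
    for (size_t i = 0; i < nfacts && k < cap; i++) {
        if (facts[i].set_id != set_id)
            continue;                   /* fact is about a different set */

        bool already = false;           /* instantiate each term only once */
        for (size_t j = 0; j < k; j++)
            if (out[j] == facts[i].term)
                already = true;

        if (!already)
            out[k++] = facts[i].term;
    }
    return k;
}
```

An eager scheme would instead instantiate F on every term of the right sort, whether or not its membership in S is known.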

Experience • Compared with an earlier implementation • Unrestricted quantifiers, incomplete axiomatization of reachability, no f^-1 • Small to medium sized benchmarks • Greatly improved the predictability of HAVOC • Reduced runtimes (2X to 100X) • Eliminated the need for carefully crafted axioms and invariants • Can handle newer examples

    struct list { list *next; list *prev; };
    struct record { int data1; list node; int data2; };
[Figure: two record objects, each laid out as data1, next, prev, data2; p points to the list node embedded in a record and q points to the start of that record]
    q = CONTAINER(p, record, node)
      = (record *) ((int *) p - (int) (&(((record *)0)->node)))
      = (record *) ((int *) p - 1)
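For reference, the container idiom can be written portably in standard C with offsetof. This is a sketch, not the definition HAVOC or the drivers use: the macro body below works in bytes via a char * cast, whereas the slide computes the offset in words, and the helper name record_of is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct list {
    struct list *next;
    struct list *prev;
} list;

typedef struct record {
    int data1;
    list node;      /* list node embedded inside the record */
    int data2;
} record;

/* Recover a pointer to the enclosing structure from a pointer to one of its
   fields by subtracting the field's byte offset. */
#define CONTAINER(ptr, type, field) \
    ((type *)((char *)(ptr) - offsetof(type, field)))

/* Hypothetical helper: the record that contains list node p.
   Only meaningful if p really points at the node field of a record. */
record *record_of(list *p)
{
    assert(p != NULL);
    return CONTAINER(p, record, node);
}
```

Nothing in the C type system checks that p actually points into a record, which is why the following slides treat type safety of this idiom as a verification problem.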

    void init_all_records(list *p) {
        while (p != NULL) {
            init_record(p);
            p = p->next;
        }
    }

    void init_record(list *p) {
        record *r = CONTAINER(p, record, node);
        r->data2 = 42;
    }
• Type safety requires nontrivial reasoning • the container of every element in list has type record* • Use of memory model with field abstraction is unsound • Field abstraction is crucial to all property checkers • &a->data1 is not aliased to &b->data2 • init_all_records(p) preserves the assertion a->data1 == 0

Unify type checking and property checking • Harness the power of constraint solvers to enhance type checking • type safety often depends on program-specific invariants • Harness the strong guarantees provided by the type invariant to enhance property checking • non-aliasing, field abstraction

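Mem: int → int (mutable) • Type: int → type (immutable) • [Figure: sample memory around addresses 99 to 102, showing for each address its value in Mem and its type in Type; the types shown include Int, List, Record, Ptr(Int), Ptr(List) and Ptr(Record)] • Type invariant: ∀a:int. HasType(Mem(a), Type(a))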

C source:
    struct list { list *next; list *prev; };
    struct record { int data1; list node; int data2; };

    void init_record(list *p) {
        record *r = CONTAINER(p, record, node);
        r->data2 = 42;
    }
Translation:
    requires ∀a:int. HasType(Mem(a), Type(a))
    requires HasType(p, Ptr(List))
    ensures ∀a:int. HasType(Mem(a), Type(a))
    void init_record(int p) {
        var r:int;
        r := p-1;
        assert HasType(r, Ptr(Record));
        Mem(r+3) := 42;
        assert ∀a:int. HasType(Mem(a), Type(a));
    }

    struct list { list *next; list *prev; };
    struct record { int data1; list node; int data2; };

    HasType(v, Int) ⇔ true
    HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t))

    Match(a, Int) ⇔ Type(a) = Int
    Match(a, Ptr(t)) ⇔ Type(a) = Ptr(t)
    Match(a, List) ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List))
    Match(a, Record) ⇔ Match(a, Int) ∧ Match(a+1, List) ∧ Match(a+3, Int)
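The HasType and Match definitions can be exercised on a tiny concrete memory. The C sketch below is an executable reading of them under several assumptions that are not in the slides: memory is a bounded array of MEM_WORDS words, addresses outside that range match no type, and all names (struct heap, enum ty, has_type, match, type_invariant) are invented for this example. HAVOC reasons about these predicates symbolically through the SMT solver rather than by enumeration.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_WORDS 16

enum ty { TY_INT, TY_LIST, TY_RECORD, TY_PTR_INT, TY_PTR_LIST, TY_PTR_RECORD };

/* Word-addressed model: mem plays the role of Mem, type the role of Type.
   type[a] is expected to hold a scalar type (TY_INT or a pointer type). */
struct heap {
    int mem[MEM_WORDS];
    enum ty type[MEM_WORDS];
};

static bool is_ptr(enum ty t) { return t >= TY_PTR_INT; }

static enum ty pointee(enum ty t)
{
    assert(is_ptr(t));
    return t == TY_PTR_INT ? TY_INT : t == TY_PTR_LIST ? TY_LIST : TY_RECORD;
}

/* Match(a, t): the words starting at address a are laid out as a value of type t. */
bool match(const struct heap *h, int a, enum ty t)
{
    switch (t) {
    case TY_LIST:       /* next, prev */
        return match(h, a, TY_PTR_LIST) && match(h, a + 1, TY_PTR_LIST);
    case TY_RECORD:     /* data1, node (two words), data2 */
        return match(h, a, TY_INT) && match(h, a + 1, TY_LIST)
            && match(h, a + 3, TY_INT);
    default:            /* scalar: Type(a) = t; out-of-range addresses never match */
        return a >= 0 && a < MEM_WORDS && h->type[a] == t;
    }
}

/* HasType(v, t) for a scalar type t. */
bool has_type(const struct heap *h, int v, enum ty t)
{
    assert(t == TY_INT || is_ptr(t));
    if (t == TY_INT)
        return true;                                 /* HasType(v, Int) <=> true */
    return v == 0 || (v > 0 && match(h, v, pointee(t)));
}

/* Type invariant: for all a, HasType(Mem(a), Type(a)). */
bool type_invariant(const struct heap *h)
{
    for (int a = 0; a < MEM_WORDS; a++)
        if (!has_type(h, h->mem[a], h->type[a]))
            return false;
    return true;
}
```

Placing a record at address 1 (types Int, Ptr(List), Ptr(List), Int at addresses 1 to 4) makes has_type(h, 1, TY_PTR_RECORD) true, and writing 42 to address 4 preserves the invariant, while writing 42 into the pointer cell at address 2 breaks it.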

C source:
    struct list { list *next; list *prev; };
    struct record { int data1; list node; int data2; };

    void init_record(list *p) {
        record *r = CONTAINER(p, record, node);
        r->data2 = 42;
    }
Translation, with the added precondition on p-1:
    requires HasType(p-1, Ptr(Record)) ∧ p - 1 ≠ 0
    requires ∀a:int. HasType(Mem(a), Type(a))
    requires HasType(p, Ptr(List))
    ensures ∀a:int. HasType(Mem(a), Type(a))
    void init_record(int p) {
        var r:int;
        r := p-1;
        assert HasType(r, Ptr(Record));
        Mem(r+3) := 42;
        assert ∀a:int. HasType(Mem(a), Type(a));
    }

    struct list { list *next; list *prev; };
    struct record { int data1; list node; int data2; };

    HasType(v, Int) ⇔ true
    HasType(v, Data1) ⇔ true
    HasType(v, Data2) ⇔ true
    HasType(v, Ptr(t)) ⇔ v = 0 ∨ (v > 0 ∧ Match(v, t))

    Match(a, Int) ⇔ Type(a) = Int
    Match(a, Data1) ⇔ Type(a) = Data1
    Match(a, Data2) ⇔ Type(a) = Data2
    Match(a, Ptr(t)) ⇔ Type(a) = Ptr(t)
    Match(a, List) ⇔ Match(a, Ptr(List)) ∧ Match(a+1, Ptr(List))
    Match(a, Record) ⇔ Match(a, Data1) ∧ Match(a+1, List) ∧ Match(a+3, Data2)

Other highlights • Decision procedure for type safety • suffices to instantiate the type invariant and definitions of Match and HasType on few terms • Extensions • unions • function pointers • parametric polymorphism • user-defined types • sub-word accesses (char, short)

Experience • Property checking on small benchmarks • list-manipulation: insertion, removal, multiple lists each with a different container type • sorting: bubble sort, merge sort, quick sort • intuitive and concise annotations • Type checking of four WDK drivers • cancel, event, kbfiltr, vserial • ~1 min to check each driver • ~5KLOC, ~225 annotations

Other case studies with HAVOC • Synchronization protocols protecting critical data structures in the NT file system (Brian Hackett) • ~300KLOC, 1500 procedures • reference count usage, lock usage, data races, teardown races • 45 confirmed bugs (out of 125 warnings) • most bugs fixed • Spin lock usage in Windows device drivers (Juan Pablo Galeotti, Thomas Wies) • flpydisk, kbdclass, daytona, serial (~50KLOC)

HAVOC is available • Download: • http://research.microsoft.com/projects/HAVOC

Future directions • Unified decision procedure for reachability, inverse, arrays, and types for the low-level memory model • Exploiting type invariant for property checking on device drivers • Annotation inference

Questions
