SMT based predictable analysis of systems code

Shuvendu Lahiri, Microsoft Research, Redmond • Joint work with: S. Qadeer (MSR), J. Condit, B. Hackett, Z. Rakamaric, T. Wies, J. Voung, J. Galeotti

Problem Modular property checking of C modules • Device drivers, file systems, kernel components,… • Double-free, lock usage, memory safety, user-provided assertions • Goal: Predictable analysis using SMT solvers • Efficiently decidable logics

HAVOC • Property checker for C programs • Active [’06-’09] • Found 100+ errors in various kernel components

HAVOC modular checker
[Figure: tool pipeline. C program + annotations → C → Boogie translation (memory model) → Boogie program → Boogie VC gen → SMT formula → SMT solver (Z3, with decision procedures for types, lists, arrays) → Verified / Warning]

Challenges imposed for analyzing C Additional challenges (over Java/C#) • Lack of type safety • Presence of low-level data structures • Explicit memory management (free) • Bit-wise operations • ……

Types

Example: Type Checking
  typedef struct _LIST_ENTRY {
    struct _LIST_ENTRY *Flink, *Blink;
  } LIST_ENTRY, *PLIST_ENTRY;

  typedef struct _IRP {
    ….
    LIST_ENTRY ListEntry;
    …
  } IRP, *PIRP;
[Figure: two IRP objects, each embedding a ListEntry with Flink and Blink pointers; p points at a ListEntry field]

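Example: Type Checking
  q = CONTAINING_RECORD(p, IRP, ListEntry)
    = (IRP*)((char*)p - (size_t)&((IRP*)0)->ListEntry)
Type checker: does variable q have type IRP*?
[Figure: two IRP objects, each embedding a ListEntry with Flink and Blink pointers; p points at a ListEntry field and q at the start of the enclosing IRP]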
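The pointer arithmetic behind CONTAINING_RECORD can be sketched in portable C with offsetof. This is only an illustration: the struct layouts are simplified stand-ins (Data1 and Data2 are borrowed from the property-checking example, and the real Windows IRP is far larger), and the names CONTAINING_RECORD_SKETCH and irp_of are hypothetical.

```c
#include <stddef.h>  /* offsetof */

typedef struct LIST_ENTRY_ {
    struct LIST_ENTRY_ *Flink, *Blink;
} LIST_ENTRY;

/* Simplified stand-in for the real IRP structure (assumption). */
typedef struct IRP_ {
    int Data1;
    LIST_ENTRY ListEntry;
    int Data2;
} IRP;

/* Step back from a field's address to the start of the enclosing struct. */
#define CONTAINING_RECORD_SKETCH(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))

/* Recover the IRP that embeds the list node p. */
IRP *irp_of(LIST_ENTRY *p) {
    return CONTAINING_RECORD_SKETCH(p, IRP, ListEntry);
}
```

The result is only meaningful when p really points at the ListEntry field of an IRP, which is exactly the fact a C compiler's type checker cannot establish.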

Example: Property Checking
  ...
  q->Data2 = 42;
Property checker: is r->Data1 unchanged?
[Figure: q and r point to IRP objects with fields Data1, ListEntry (Flink, Blink) and Data2]

Example: Property Checking • For all we know, Data1 and Data2 could be aliased!
[Figure: the objects pointed to by q and r drawn overlapping, so that a Data2 slot coincides with a Data1 slot]

Types in C programs • Types in C programs cannot be trusted • Unsafe type casts, pointer arithmetic • Typical type checking in C compilers cannot ensure memory safety • Lack of types hurts property checking • Disambiguation

Lists

Simple type-state property • Allocation type-state of DEV_OBJ • Device Objects (DEV_OBJ) allocated and freed • Property to check for a module • IoDeleteDevice() only called on MyDevObj
[Figure: type-state automaton with states ~MyDevObj and MyDevObj and transitions labeled IoCreateDevice() and IoDeleteDevice()]

Simple property  simple invariants
  NT_STATUS Unload(…) {
    ….
    iter = hd->First;
    while (iter != null) {
      RemoveEntryList(iter);
      iter = iter->Next;
      IoDeleteDevice(iter->Self);
    }
    ….
  }
• Pointers from the list point to distinct objects
[Figure: hd->First heads a list of DEV_EXT nodes linked by Next; each DEV_EXT has a Self pointer to a DEV_OBJ, whose DevExt pointer leads back to it]

Lists • Prevalent in most systems code • Manipulated by explicit pointer operations • Updates to next fields

This talk • Focus on two of these challenges • Lack of type-safety • Presence of low-level data structures • Solution • New efficient SMT theories for the above problems

Overview • Motivation • Background • Exploiting types [POPL’09] • Logic for lists [POPL’08] • Application [CAV’09]

Program Correctness: Floyd-Hoare Triple • Floyd-Hoare triple {P} S {Q} P, Q : predicates/property S : a program • From a state satisfying P, if S executes, • No assertion in S fails, and • Terminating executions end up in a state satisfying Q

Program verification → Formula
  { b.f = 5 } a.f = 5 { a.f + b.f = 10 } is valid
  iff
  Select(f1,b) = 5 ∧ f2 = Store(f1,a,5) ⇒ Select(f2,a) + Select(f2,b) = 10 is valid
• theory of equality: f, = • theory of arithmetic: 5, 10, + • theory of arrays: Select, Store • [Nelson & Oppen ’79]
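To make the Select/Store encoding concrete, here is a minimal C sketch that models a field map as a small array, implements Store as a functional update, and brute-forces the verification condition over a tiny domain. The three-address universe, the value range {4, 5, 6} and all function names are assumptions for illustration; a bounded check like this is not a proof, which is what the SMT solver supplies.

```c
#include <stdbool.h>

#define N 3  /* tiny universe of object addresses: 0, 1, 2 */

typedef struct { int cell[N]; } FieldMap;  /* one map per field, e.g. f */

/* Select(m, i): read the field of object i. */
int map_select(const FieldMap *m, int i) { return m->cell[i]; }

/* Store(m, i, v): return a NEW map equal to m except at index i. */
FieldMap map_store(FieldMap m, int i, int v) {
    m.cell[i] = v;  /* m is a by-value copy, so the caller's map is untouched */
    return m;
}

/* Select(f1,b) = 5 and f2 = Store(f1,a,5) imply Select(f2,a) + Select(f2,b) = 10,
   checked for one choice of a, b and f1. */
bool vc_holds(FieldMap f1, int a, int b) {
    if (map_select(&f1, b) != 5) return true;  /* precondition false: vacuous */
    FieldMap f2 = map_store(f1, a, 5);
    return map_select(&f2, a) + map_select(&f2, b) == 10;
}

/* Enumerate every a, b and every map with entries drawn from {4, 5, 6}. */
bool vc_valid_on_small_domain(void) {
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            for (int code = 0; code < 27; code++) {
                FieldMap f1;
                int c = code;
                for (int i = 0; i < N; i++) { f1.cell[i] = 4 + c % 3; c /= 3; }
                if (!vc_holds(f1, a, b)) return false;
            }
    return true;
}
```

The check covers both the aliased case (a == b) and the non-aliased case, which is why the formula needs the array theory rather than plain arithmetic.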

Satisfiability-Modulo-Theory (SMT) • Boolean satisfiability solving + theory reasoning • Ground theories • Equality, arithmetic, arrays, bit-vectors, …. • Powerful methods to combine decision procedures for theories • [Nelson & Oppen ’79] • Phenomenal progress in the past few years • Z3, Mathsat, Yices, …. • Works best for NP-complete theories

Overview • Motivation • Background • Exploiting types • Logic for lists • Case study

Memory model for C • Each pointer is an integer • Heap as a map
  // Mutable
  Mem: int → int
  Alloc: int → {UNALLOCATED, ALLOCATED, FREED}
  // Immutable
  Base: int → int   // base address of each pointer
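As a rough illustration of these three maps, the following C sketch models them as arrays over a small integer address space. The bump allocator, the heap size and the function names are assumptions made for the example; in particular, the slides treat Base as an immutable map, whereas this sketch fills it in as objects are allocated.

```c
#include <stdbool.h>

enum { HEAP = 64 };
typedef enum { UNALLOCATED, ALLOCATED, FREED } AllocState;

static int Mem[HEAP];           /* mutable: value stored at each address */
static AllocState Alloc[HEAP];  /* mutable: allocation state, keyed by base address */
static int Base[HEAP];          /* base address of the object containing each address */
static int next_free = 1;       /* address 0 is reserved for NULL */

/* Model of malloc: claim `size` consecutive addresses and return the base. */
int model_malloc(int size) {
    if (size <= 0 || next_free + size > HEAP) return 0;
    int base = next_free;
    for (int i = 0; i < size; i++) Base[base + i] = base;
    Alloc[base] = ALLOCATED;
    next_free += size;
    return base;
}

/* Model of free: legal only on the base of a live object, so a second
   free of the same pointer is reported instead of silently accepted. */
bool model_free(int p) {
    if (p <= 0 || p >= HEAP || Base[p] != p || Alloc[p] != ALLOCATED) return false;
    Alloc[p] = FREED;
    return true;
}
```

With the allocation state explicit, a double-free check becomes an ordinary assertion on Alloc.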

C → Boogie
  typedef struct { int g[10]; int f; } DATA;

  DATA *create() {
    int a;
    DATA *d = (DATA*) malloc(sizeof(DATA));
    init(d->g, 10, &a);
    d->f = a;
    d->g[1] = 2;
    return d;
  }

translates to

  function f_DATA: int -> int;
  forall u: int:: f_DATA(u) = u + 40;

  procedure create() returns d:int {
    var @a: int;
    @a := malloc(4);
    d := call malloc(44);
    call init(g_DATA(d), 10, @a);
    Mem[f_DATA(d)] := Mem[@a];
    Mem[g_DATA(d) + 1*4] := 2;
    free(@a);
    return;
  }
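The constants 40, 44 and 1*4 in the Boogie translation are byte offsets derived from the struct layout. A small C sketch, assuming 4-byte int and no padding inside DATA (true on common platforms but not guaranteed by the C standard), shows where they come from; the helper names are hypothetical.

```c
#include <stddef.h>  /* offsetof, size_t */

typedef struct { int g[10]; int f; } DATA;

/* Byte offset of field f inside DATA: the constant added by f_DATA. */
size_t f_offset(void) { return offsetof(DATA, f); }

/* Byte offset of element g[i]: mirrors g_DATA(d) + i*4 when int is 4 bytes. */
size_t g_elem_offset(size_t i) { return offsetof(DATA, g) + i * sizeof(int); }
```

With a 4-byte int, f_offset() is 40 and sizeof(DATA) is 44, matching f_DATA(u) = u + 40 and malloc(44).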

Missing part: Types? • Types in C programs can’t be trusted • Lack of types hurts property checking

Our Approach • [POPL’09] • Type checking → assertion checking • Provide formal semantics for C and its types • Use types to improve the property checker • Provide Java-style field disambiguation • Provide decision procedures for the assertion checking

Formalizing Type Safety • A C program is type safe if the run-time value of every variable and heap location corresponds to its compile-time type.
  Mem : addr -> value
  Type : addr -> type
  HasType : value x type -> bool
  for all a in addr, HasType(Mem(a), Type(a))

Modeling the Heap
  Mem : addr -> value • Gives value stored at each heap location • Values are integers
  Type : addr -> type • Gives declared type for each heap location • Types include Int, Ptr(Int), …

“Match” Predicate
  Match: addr x type -> bool
• Lifts the Type map to multi-word types • Match(a, t) holds iff Type[a … n] matches t

  C Type                                   HAVOC Axiom
  int                                      Match(a, Int) <==> Type[a] == Int
  int*                                     Match(a, Ptr(Int)) <==> Type[a] == Ptr(Int)
  struct foo { int n; int m; int *p; }     Match(a, Foo) <==> Match(a, Int) && Match(a+1, Int) && Match(a+2, Ptr(Int))

Example Type map:
  addr   99    100   101        102   103
  Type   Int   Int   Ptr(Int)   Int   Ptr(Foo)
Holds: Match(99, Foo), Match(99, Int), Match(101, Ptr(Int)). Does not hold: ¬Match(101, Foo).
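The Match axioms can be read as a recursive function over the Type map. Below is a minimal C sketch that hard-codes the five-cell example map from the slide; the enum names and the out-of-range behavior (no match) are assumptions for illustration.

```c
#include <stdbool.h>

typedef enum { T_INT, T_PTR_INT, T_PTR_FOO, T_FOO } Ty;

enum { BASE = 99, CELLS = 5 };

/* Type map for addresses 99..103, copied from the slide's example. */
static const Ty Type[CELLS] = { T_INT, T_INT, T_PTR_INT, T_INT, T_PTR_FOO };

/* Type[a] == t, treating addresses outside the modeled range as no match. */
static bool type_is(int a, Ty t) {
    return a >= BASE && a < BASE + CELLS && Type[a - BASE] == t;
}

/* Match(a, t): single-word types consult Type[a] directly; the struct type
   Foo unfolds into one Match per field at consecutive word addresses. */
bool match(int a, Ty t) {
    if (t == T_FOO)
        return match(a, T_INT) && match(a + 1, T_INT) && match(a + 2, T_PTR_INT);
    return type_is(a, t);
}
```

Here match(99, T_FOO) succeeds because cells 99, 100 and 101 hold Int, Int and Ptr(Int), while match(101, T_FOO) fails at its first field.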

“HasType” Predicate
  HasType: value x type -> bool
• Defines which values belong to each type • HasType(v, t) holds iff v is a value of type t

  C Type   HAVOC Axiom
  int      HasType(v, Int) <==> true
  t*       HasType(v, Ptr(t)) <==> v == 0 || (v > 0 && Match(v, t))

Example Type map:
  addr   99    100   101        102   103
  Type   Int   Int   Ptr(Int)   Int   Ptr(Foo)
Holds: HasType(99, Ptr(Foo)). Does not hold: ¬HasType(101, Ptr(Foo)).

Type Safety Invariant
  for all a in addr, HasType(Mem(a), Type(a))
• Part of preconditions, postconditions, loop invariants • Assert at every program point • Add similar assertions for locals (if desired)
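As a sketch of how HasType and the invariant fit together, the C fragment below checks the invariant by enumeration over a tiny heap. It is restricted to the types Int and Ptr(Int) so that Match stays a one-word lookup; the heap size and all names are assumptions, and a real checker discharges the quantifier symbolically rather than by looping.

```c
#include <stdbool.h>

typedef enum { T_INT, T_PTR_INT } Ty;
enum { HEAP = 8 };

static Ty  Type[HEAP];  /* declared type of each address (all T_INT initially) */
static int Mem[HEAP];   /* value stored at each address (all 0 initially) */

/* Match for single-word types only; out-of-range addresses never match. */
static bool match(int a, Ty t) { return a >= 0 && a < HEAP && Type[a] == t; }

/* HasType(v, Int) <==> true
   HasType(v, Ptr(Int)) <==> v == 0 || (v > 0 && Match(v, Int)) */
bool has_type(int v, Ty t) {
    if (t == T_INT) return true;
    return v == 0 || (v > 0 && match(v, T_INT));
}

/* for all a in addr: HasType(Mem(a), Type(a)) */
bool type_safe(void) {
    for (int a = 0; a < HEAP; a++)
        if (!has_type(Mem[a], Type[a])) return false;
    return true;
}
```

A cell declared Ptr(Int) may hold 0 or the address of an Int cell; storing the address of a pointer cell in it breaks the invariant.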

Decision Procedure • Verification conditions refer to Mem, Type, Match, HasType, Type-safety invariant • Decision problem: NP-complete • Provide decision procedure using an SMT solver • Suffices to instantiate the quantifiers in these axioms on a fixed set of terms

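Example: Type Checking
  q = CONTAINING_RECORD(p, IRP, ListEntry)
    = (IRP*)((char*)p - (size_t)&((IRP*)0)->ListEntry)
Type checker: does variable q have type IRP*?
[Figure: two IRP objects, each embedding a ListEntry with Flink and Blink pointers; p points at a ListEntry field and q at the start of the enclosing IRP]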

Solution: Add Preconditions
  #define ENCL(x) CONTAINING_RECORD(x, record, node)

  requires(HasType(ENCL(p), record*) && ENCL(p) != NULL)
  void init_record(list *p) {
    record *r = CONTAINING_RECORD(p, record, node);
    r->data2 = 42;
  }

Field Safety Invariant • Field safety • Refinement of type safety • Disambiguate two fields of same type • Change • HasType/Match are refined to distinguish different field names of same type

Adding Field Names (before: types only)
  struct list { list *prev; list *next; }
  struct record { int data1; list node; int data2; }

  Match(a, List) <==> Match(a, Ptr(List)) && Match(a+1, Ptr(List))
  Match(a, Record) <==> Match(a, int) && Match(a+1, List) && Match(a+3, int)
  Match(a, Ptr(List)) <==> Type[a] == Ptr(List)
  HasType(v, Ptr(List)) <==> v == 0 || (v > 0 && Match(v, List))
  Match(a, int) <==> Type[a] == int
  HasType(v, int) <==> true

Adding Field Names (after: one type per field name)
  struct list { list *prev; list *next; }
  struct record { int data1; list node; int data2; }

  Match(a, List) <==> Match(a, Prev) && Match(a+1, Next)
  Match(a, Record) <==> Match(a, Data1) && Match(a+1, List) && Match(a+3, Data2)
  Match(a, Prev) <==> Type[a] == Prev
  HasType(v, Prev) <==> v == 0 || (v > 0 && Match(v, List))
  Match(a, Data1) <==> Type[a] == Data1
  HasType(v, Data1) <==> true   (same definition as Int)
  … same for Next and Data2 …

Experiments • Implementation supports full C language • Supports polymorphism • Supports user-defined, dependent types • Annotated and checked four Windows drivers • Sample drivers provided with Windows DDK

Enables field splitting • Disambiguates writes to fields + faster checking • Can split the heap for “field-safe” programs • One heap map per word-type field and pointer type (almost!)
  Mem_f : addr → val
  Mem_g : addr → val
  Mem_T* : addr → val
• Simple example
  C code:       x->f = 1;
  Boogie code:  Mem_f[x + Offset(f)] := 1;
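The benefit of splitting can be illustrated with a small C sketch that contrasts a single heap map with one map per field. The word offsets and function names are hypothetical; the point is only that a write to Mem_f cannot affect a later read of Mem_g, whereas in the unified map the two accesses may hit the same cell.

```c
enum { HEAP = 16 };
enum { OFFSET_F = 0, OFFSET_G = 1 };  /* hypothetical word offsets of f and g */

/* Unified model: every field access goes through one map. */
static int Mem[HEAP];

/* Split model: one map per field name. */
static int Mem_f[HEAP];
static int Mem_g[HEAP];

void write_f_unified(int x, int v) { Mem[x + OFFSET_F] = v; }
int  read_g_unified(int y)         { return Mem[y + OFFSET_G]; }

void write_f_split(int x, int v)   { Mem_f[x + OFFSET_F] = v; }
int  read_g_split(int y)           { return Mem_g[y + OFFSET_G]; }
```

In the unified model, writing x->f with x == y + 1 changes y->g, so the checker must reason about address arithmetic to rule that out. In the split model the frame condition holds by construction, which is sound only for programs that satisfy field safety.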

Why almost?
  struct A { int a; int b; };
  struct B { int c; int d; int e; };

  void P(struct B *x) {
    struct A *y = (struct A*) x;
    y->a = 1;
    assert(x->c == 1);
  }
• Field safety assertion will fail • Have to merge {a, c} {b, d}
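The reason a and c (and likewise b and d) must share a heap map is that they sit at the same offsets. The sketch below checks this with offsetof and reproduces the effect of the cast by writing through a computed int pointer; the helper names are hypothetical, and the equal-offset claim for b and d assumes the usual layout without padding.

```c
#include <stdbool.h>
#include <stddef.h>  /* offsetof */

struct A { int a; int b; };
struct B { int c; int d; int e; };

/* Fields at equal offsets occupy the same word when a B* is viewed as an A*. */
bool a_overlaps_c(void) { return offsetof(struct A, a) == offsetof(struct B, c); }
bool b_overlaps_d(void) { return offsetof(struct A, b) == offsetof(struct B, d); }

/* Write the word where field a would live, then read field c of the same object. */
int write_a_read_c(struct B *x) {
    int *a_slot = (int *)((char *)x + offsetof(struct A, a));
    *a_slot = 1;
    return x->c;
}
```

Because the write lands on x->c, keeping separate maps Mem_a and Mem_c would make the assertion in P unprovable, hence the merge.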

Summary • Types as an additional part of the state • Type safety checking → assertion checking • Efficiently decidable (NP) logic • Separation of concerns for property checking • Can exploit field disambiguation for “field-safe” programs

Overview • Motivation • Background • Exploiting types • Logic for lists • Case study

Logic for lists • SMT theory with new predicate symbols

Reachability predicate: Btwn_f • Btwn_next(x, y) • Btwn_prev(y, x)
[Figure: doubly linked list whose nodes have next, prev and data fields; x and y are two nodes, with y reachable from x along next]
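One way to build intuition for Btwn_f is a membership test over a small heap where f is an array of successor indices. The semantics used here are an assumption for illustration: Btwn_f(x, y) is taken to be the set of nodes on the f-path from x up to and including the first occurrence of y, and to be empty when y is not reachable from x. The names and the fixed heap size are hypothetical.

```c
#include <stdbool.h>

enum { N = 6 };  /* node ids 0..5; id 0 plays the role of null, with f[0] == 0 */

/* Is t a member of Btwn_f(x, y)?  Walk from x along f, remember whether t
   was seen, and stop at the first occurrence of y. */
bool in_btwn(const int f[N], int x, int y, int t) {
    bool seen_t = false;
    int cur = x;
    for (int steps = 0; steps < N; steps++) {  /* N visits cover every reachable node */
        if (cur == t) seen_t = true;
        if (cur == y) return seen_t;
        cur = f[cur];
    }
    return false;  /* y is not reachable from x */
}
```

For the list 1 → 2 → 3 → null, node 2 lies in Btwn_f(1, 3) but node 1 does not lie in Btwn_f(2, 3), and Btwn_f(3, 1) is empty because 1 is not reachable from 3.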

Inverse of a function: f⁻¹ • data⁻¹(w) = {x, y}
[Figure: doubly linked list whose nodes have next, prev and data fields; the data fields of nodes x and y both point to w]
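The inverse image f⁻¹(w) is simply the set of all locations whose f field equals w. A minimal C sketch over an array-encoded field, with hypothetical names and the caller supplying the output buffer, looks like this:

```c
#include <stddef.h>  /* size_t */

/* Collect f^{-1}(w) = { x | f(x) == w } over nodes 0..n-1.
   Writes the matching node ids to out (capacity at least n) and returns the count. */
size_t inverse_image(const int *f, size_t n, int w, size_t *out) {
    size_t count = 0;
    for (size_t x = 0; x < n; x++)
        if (f[x] == w) out[count++] = x;
    return count;
}
```

In the logic the set is never enumerated like this; it is an interpreted set that a bounded quantifier can range over.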

Expressive logic
• Express properties of collections
  ∀x ∈ Btwn_f(f(hd), hd). state(x) = LOCKED   // cyclic
• Arithmetic reasoning on data (e.g. sortedness)
  ∀x ∈ Btwn_f(hd, null) \ {null}. ∀y ∈ Btwn_f(x, null) \ {null}. d(x) ≤ d(y)
• Type/object invariants
  ∀x ∈ Type⁻¹(“__logentry”). logtype(x) > 0 [connective lost in extraction] file_name(x) != null

Can express desired invariants
  NT_STATUS Unload(…) {
    ….
    iter = hd->First;
    while (iter != null) {
      RemoveEntryList(iter);
      iter = iter->Next;
      IoDeleteDevice(iter->Self);
    }
    ….
  }
• ∀x ∈ Btwn_Next(hd->First, NULL). x->Self->DevExt = x
OR
• ∀x ∈ Btwn_Next(hd->First, NULL). Self⁻¹(x->Self) = {&x->Self}
[Figure: hd->First heads a list of DEV_EXT nodes linked by Next; each DEV_EXT has a Self pointer to a DEV_OBJ, whose DevExt pointer leads back to it]
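The first invariant can be read as a back-pointer check on every list node. The following C sketch evaluates it at run time on a concrete, null-terminated, acyclic list; the struct definitions are simplified stand-ins for the driver types and the function name is hypothetical. The verifier proves the property for all executions instead of testing one heap.

```c
#include <stdbool.h>
#include <stddef.h>  /* NULL */

typedef struct DEV_OBJ DEV_OBJ;
typedef struct DEV_EXT {
    struct DEV_EXT *Next;
    DEV_OBJ *Self;
} DEV_EXT;
struct DEV_OBJ { DEV_EXT *DevExt; };

/* For every non-null x on the Next-path from first: x->Self->DevExt == x. */
bool back_pointers_ok(const DEV_EXT *first) {
    for (const DEV_EXT *x = first; x != NULL; x = x->Next)
        if (x->Self == NULL || x->Self->DevExt != x) return false;
    return true;
}
```

If two nodes shared the same Self object, its single DevExt pointer could match at most one of them, so the invariant implies that the list points to distinct device objects.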

Precise and efficient • [POPL ‘08] • Precision • Given a Floyd-Hoare triple {P} S {Q}, • P/Q are in the assertion logic, and S is a loop-free, call-free code fragment • There is a formula in the assertion logic • Linear in the size of the triple • Valid iff the triple holds • Efficiency • The decision problem is NP-complete

Logic
  t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
  G ∈ GFormula ::= t = t’ | t < t’ | t ∈ Btwn_f(t1, t2) | ¬G        (ground logic)
  S ∈ Set ::= f⁻¹(t) | Btwn_f(t1, t2)
  F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F

Ground decision procedure • Provide a set of 10 rewrite rules for Btwn_f • Sound, complete and terminating • E.g. Transitivity3:
  premises:     t1 ∈ Btwn_f(t0, t2)     t ∈ Btwn_f(t0, t1)
  conclusions:  t ∈ Btwn_f(t0, t2),  t1 ∈ Btwn_f(t, t2)

Logic
  t ∈ Term ::= c | x | t1 + t2 | t1 - t2 | f(t)
  G ∈ GFormula ::= t = t’ | t < t’ | t ∈ Btwn_f(t1, t2) | ¬G
  S ∈ Set ::= f⁻¹(t) | Btwn_f(t1, t2)                               (bounded quantification over interpreted sets)
  F ∈ Formula ::= G | F1 ∧ F2 | F1 ∨ F2 | ∀x ∈ S. F
