Verification of Java Programs using Symbolic Execution and Loop Invariant Generation

Corina Pasareanu (Kestrel Technology LLC), Willem Visser (RIACS/USRA). Automated Software Engineering Group, NASA Ames

Outline • Motivation and Overview • Examples • Symbolic Execution and Java PathFinder • Program Verification and Invariant Generation • Experiments • Related Work and Conclusions

Motivation • Mars Polar Lander, Ariane 501 and, more recently, Spirit • Software errors can be very costly. • Software verification is recognized as an important and difficult problem.

Java PathFinder with Symbolic Execution Previous work: • Java PathFinder (JPF) - explicit-state model checker for Java • Extended with symbolic execution [TACAS’03] • Motivation • Open systems, large input data domains • Complex data structures • Applications: test-input generation, error detection • Shortcoming: cannot prove properties of looping programs New: • Invariant generation to deal with loops

Verification Framework Overview • Uses symbolic execution • Requires annotations • method preconditions • loop invariants • Novel technique for invariant generation • uses invariant strengthening, approximation, and refinement • handles boolean and numeric constraints, dynamically allocated structures, arrays

Array Example 1
//precondition: a != null;
public static void set(int a[]) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert a[0] == 0;
}
Loop invariants: 0 ≤ i; ¬(a[0] ≠ 0 ∧ 0 < i)

Array Example 2
//precondition: a != null;
public static void set(int a[]) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert forall int j: a[j] == 0;
}
Loop invariant: ¬(a[j] ≠ 0 ∧ a.length ≤ i ∧ 0 ≤ j < a.length) ∧ ¬(a[j] ≠ 0 ∧ j < i ∧ 0 ≤ i,j < a.length)

Symbolic Execution • Execute a program on symbolic input values • For each path, build a path condition • condition on inputs in order for the execution to follow that path • check satisfiability of path condition • Symbolic state • symbolic values/expressions for variables • path condition • program counter • Various applications • test case generation • program verification • Traditionally: sequential programs with fixed number of integers

Swap Example
Code that swaps 2 integers:
int x, y;
if (x > y) {
    x = x + y;
    y = x - y;
    x = x - y;
    if (x > y)
        assert false;
}
Concrete Execution Path:
x = 1, y = 0
1 > 0 ? true
x = 1 + 0 = 1
y = 1 - 0 = 1
x = 1 - 1 = 0
0 > 1 ? false

Swap Example
Code that swaps 2 integers:
int x, y;
if (x > y) {
    x = x + y;
    y = x - y;
    x = x - y;
    if (x > y)
        assert false;
}
Symbolic Execution Tree (each node is labeled with its path condition, PC):
[PC: true] x = X, y = Y
[PC: true] X > Y ?
    false: [PC: X ≤ Y] END
    true:  [PC: X > Y] x = X + Y
           [PC: X > Y] y = X + Y - Y = X
           [PC: X > Y] x = X + Y - X = Y
           [PC: X > Y] Y > X ?
               false: [PC: X > Y ∧ Y ≤ X] END
               true:  [PC: X > Y ∧ Y > X] END
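The leaf with path condition X > Y ∧ Y > X is unsatisfiable, so the inner assert false can never be reached. As a cross-check, the following minimal Java sketch runs the swap code concretely on every input pair in a small range and counts how often the inner branch is taken. This is bounded exhaustive testing, not symbolic execution; the names SwapCheck, reachesAssertFalse, and countViolations, as well as the bound, are assumptions made for illustration.

```java
class SwapCheck {
    // Mirrors the slide's swap code; returns true if the inner "assert false" would be reached.
    static boolean reachesAssertFalse(int x, int y) {
        if (x > y) {
            x = x + y;
            y = x - y;
            x = x - y;
            if (x > y) {
                return true;
            }
        }
        return false;
    }

    // Counts the input pairs in [-bound, bound] x [-bound, bound] that reach the inner branch.
    static int countViolations(int bound) {
        int count = 0;
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (reachesAssertFalse(x, y)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("violations: " + countViolations(50));
    }
}
```

The count stays at zero for any bound, which matches the infeasible leaf of the tree. Symbolic execution reaches the same conclusion for all inputs at once by checking the path condition for satisfiability.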

Generalized Symbolic Execution • Handles • dynamically allocated data structures, arrays • preconditions, concurrency • Uses JPF • to generate and explore the symbolic execution tree • Implementation via instrumentation • programs instrumented to enable JPF to perform symbolic execution • Omega library used to check satisfiability of numeric path conditions (for linear integer constraints) • lazy initialization for arrays and structures

Instrumentation
Original code:
void set(int a[]) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert a[0] == 0;
}
Instrumented code:
void set() {
    IntArrayStruct a = new IntArrayStruct();
    Expression i = new IntConstant(0);
    while (i._LT(a.length)) {
        a._set(i, 0);
        i = i._plus(1);
    }
    assert a._get(0)._EQ(0);
}

Library classes
class ArrayCell {
    Expression elem;
    Expression idx;
}
class IntArrayStruct { …
    Vector _v;
    Expression length;
    public Expression _get(Expression idx) {
        Verify.ignoreIf(!inbounds); // assert inbounds
        ArrayCell cell = _ArrayCell(idx);
        return cell.elem;
    }
    ArrayCell _ArrayCell(Expression idx) {
        for (int i = 0; i < _v.size(); i++) {
            ArrayCell cell = (ArrayCell) _v.elementAt(i);
            if (cell.idx._EQ(idx))
                return cell;
        }
        ArrayCell ac = new ArrayCell(…);
        _v.add(ac);
        return ac;
    }
}
class Expression { …
    static PathCondition pc;
    Expression _plus(Expression e) {…}
    boolean _LT(Expression e) {
        return pc._update_LT(this, e);
    }
}
class PathCondition { …
    Constraints c;
    boolean _update_LT(Expression l, Expression r) {
        boolean result = Verify.choose_boolean();
        if (result)
            c.add_constraint_LT(l, r);
        else
            c.add_constraint_GE(l, r);
        Verify.ignoreIf(!c.is_satisfiable());
        return result;
    }
}

Proving Properties of Programs
Looping program:
X = init;
while (C(X))
    X = B(X);
assert P(X);
Program execution: while … true, while … true, … May be infinite. How to reason about infinite executions?
Find loop invariant Inv.
Non-looping program:
X = init;
assert Inv(X);              // Base Case
X = new symbolic values;
assume Inv(X);
if (C(X)) {
    X = B(X);
    assert Inv(X);          // Induction Step
} else
    assert P(X);
Has finite execution. Easy to reason about!
Problem: How do we come up with Inv? Requires great user ingenuity.
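The base case and induction step can be tried out concretely on Array Example 1. The sketch below is a bounded exhaustive check, not the symbolic JPF-based check described in these slides: it enumerates all arrays of length 1 to maxLen with cells in {0, 1} and every index i in [0, a.length], and counts violations of the base case, the induction step, and the final assertion. The class InvariantCheck, the interface Inv, and the restriction to non-empty arrays and in-bounds indices are assumptions made for illustration.

```java
class InvariantCheck {
    // A candidate loop invariant over the program state (a, i) of Array Example 1.
    interface Inv {
        boolean holds(int[] a, int i);
    }

    // Inv0 = not(not C and not P), the starting candidate.
    static final Inv INV0 = (a, i) -> !(i >= a.length && a[0] != 0);

    // The invariants listed for Array Example 1.
    static final Inv STRENGTHENED = (a, i) -> i >= 0 && !(a[0] != 0 && i > 0);

    // Returns {base case failures, induction step failures, property failures}.
    static int[] check(Inv inv, int maxLen) {
        int[] failures = new int[3];
        for (int n = 1; n <= maxLen; n++) {
            for (int bits = 0; bits < (1 << n); bits++) {
                int[] a = new int[n];
                for (int k = 0; k < n; k++) {
                    a[k] = (bits >> k) & 1;
                }
                // Base case: X = init; assert Inv(X);
                if (!inv.holds(a, 0)) {
                    failures[0]++;
                }
                // Induction step: X = arbitrary state; assume Inv(X);
                for (int i = 0; i <= n; i++) {
                    if (!inv.holds(a, i)) {
                        continue;
                    }
                    if (i < n) {
                        int[] b = a.clone();
                        b[i] = 0;                          // X = B(X)
                        if (!inv.holds(b, i + 1)) {
                            failures[1]++;                 // assert Inv(X) fails
                        }
                    } else if (a[0] != 0) {
                        failures[2]++;                     // assert P(X) fails
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(check(INV0, 3)));
        System.out.println(java.util.Arrays.toString(check(STRENGTHENED, 3)));
    }
}
```

Within these bounds, INV0 passes the base case but fails the induction step (for example a = [1, 0], i = 1), while STRENGTHENED passes all three checks. A bounded check like this can only refute a candidate invariant; it does not prove one.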

Iterative Invariant Strengthening
Start with Inv0 = ¬(¬C ∧ ¬P)
Model check the program:
X = init;
assert Inv(X);
X = new symbolic values;
assume Inv(X);
if (C(X)) {
    X = B(X);
    assert Inv(X);
} else
    assert P(X);
• No errors: done, found loop invariant!
• Base case violation: error in the program!
• Induction step violation: apply strengthening
  - counterexample path conditions: PC1, PC2 … PCn
  - strengthen invariant: Inv1 = Inv0 ∧ ⋀i ¬PCi
  - repeat

Iterative Strengthening • State space: Inv0, Inv1, …, Inv (successively more precise invariants) • May result in an infinite sequence of exact invariants: Inv0, Inv1, Inv2 … (we may get infinitely many generated constraints)
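To see why exact strengthening can produce an unbounded sequence, consider Array Example 1 with the array length fixed to a concrete n. The sketch below models an invariant as a finite set of abstract states (the index i plus a flag for a[0] ≠ 0), starts from Inv0 = ¬(¬C ∧ ¬P), and repeatedly removes exactly the states from which the induction step fails. The state encoding, the class name StrengtheningSketch, and the use of concrete state sets in place of symbolic path conditions are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class StrengtheningSketch {
    // Abstract state for a fixed length n: the index i and whether a[0] != 0.
    static int encode(int i, boolean a0NonZero) {
        return 2 * i + (a0NonZero ? 1 : 0);
    }

    // One loop iteration (a[i] = 0; i++): writing index 0 clears the "a[0] != 0" flag.
    static int step(int state) {
        int i = state / 2;
        boolean a0NonZero = (state % 2 == 1) && i != 0;
        return encode(i + 1, a0NonZero);
    }

    // Inv0 = not(i >= n and a[0] != 0), restricted to 0 <= i <= n.
    static Set<Integer> initialInvariant(int n) {
        Set<Integer> inv = new HashSet<>();
        for (int i = 0; i <= n; i++) {
            inv.add(encode(i, false));
            if (i < n) {
                inv.add(encode(i, true));
            }
        }
        return inv;
    }

    // Repeats exact strengthening until the induction step holds; returns the number of rounds.
    static int strengthen(int n, Set<Integer> inv) {
        int rounds = 0;
        while (true) {
            Set<Integer> counterexamples = new HashSet<>();
            for (int s : inv) {
                boolean loopCondition = s / 2 < n;           // C: i < a.length
                if (loopCondition && !inv.contains(step(s))) {
                    counterexamples.add(s);                  // induction step violated from s
                }
            }
            if (counterexamples.isEmpty()) {
                return rounds;
            }
            inv.removeAll(counterexamples);                  // Inv_{k+1} = Inv_k and not PC_i
            rounds++;
        }
    }

    static int roundsFor(int n) {
        return strengthen(n, initialInvariant(n));
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println("n = " + n + ", rounds = " + roundsFor(n));
        }
    }
}
```

In this model the number of rounds is n - 1, so it grows with the array length. With a symbolic length there is no bound on it, which is the situation the termination heuristic is meant to address.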

Iterative Approximation
Heuristic for Termination: at each step k, apply heuristic for current candidate Inv_k
Check the inductive step:
X = init;
assert Inv(X);
X = new symbolic values;
assume Inv_k(X);
if (C(X)) {
    X = B(X);
    assert Inv_k(X);
} else
    assert P(X);
• it is also iterative strengthening, but …
• use oldPC instead of PC
• oldPC is weaker than PC (PC → oldPC)
• obtains a stronger invariant: Inv_k^(j+1) = Inv_k^j ∧ ⋀i ¬oldPCi
Example: oldPC = q ∧ r; PC = q ∧ r ∧ v, where v is the new constraint (encodes the effect of the loop)

Iterative Approximation
Check the inductive step:
X = init;
assert Inv(X);
X = new symbolic values;
assume Inv_k(X);
if (C(X)) {
    X = B(X);
    assert Inv_k(X);
} else
    assert P(X);
• New constraints come from Inv(B(X))
• Symbolic execution results in finite universe of constraints Uk
• Results in finite sequence of approximate invariants: Inv_k^1, Inv_k^2 … Inv_k^m
• State space at step k (diagram): Inv, Inv_k^1, PC, oldPC
• Refinement: if base case fails for Inv_k^j (approximation too coarse)
  - backtrack
  - compute Inv_k+1
  - apply approximation

Invariant Generation Method • Iterative strengthening: Inv0, Inv1, …, Inv_k, Inv_k+1, … • Iterative approximation: Inv_k^1, Inv_k^2, …, Inv_k^m • Refinement: backtrack on base case violation • If there is an error in the program, the method is guaranteed to terminate • If the program is correct wrt. the property, the method might not terminate

Array Example 1
//precondition: a != null;
public static void set(int a[]) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert a[0] == 0;
}

Proof + tree
Inv0 = ¬(i ≥ a.length ∧ a[0] ≠ 0) = (i < a.length ∧ a[0] ≠ 0) ∨ (i < a.length ∧ a[0] = 0) ∨ (i ≥ a.length ∧ a[0] = 0)
public static void set(int[] a) {
    int i = 0;
    assert Inv;
    //i,a = new symbolic values;
    assume Inv;
    if (i < a.length) {
        a[i] = 0;
        i++;
        // oldPC
        if (!Inv) {
            // PC
            assert false;
        }
    } else
        assert a[0] == 0;
}
Symbolic execution tree:
[PC: I < a.length ∧ a[0] ≠ 0] i = I
    [PC: I < a.length ∧ a[0] ≠ 0 ∧ I = 0] …
    [PC: I < a.length ∧ a[0] ≠ 0 ∧ I ≠ 0 ∧ 0 ≤ I < a.length]
    oldPC: [PC: 0 < I < a.length ∧ a[0] ≠ 0] a[I] = 0
    [PC: 0 < I < a.length ∧ a[0] ≠ 0] i = I + 1
    [PC: 0 < I < a.length ∧ a[0] ≠ 0] I + 1 ≥ a.length ∧ a[0] ≠ 0 ? … true
    PC: [PC: 0 < I < a.length ∧ a[0] ≠ 0 ∧ I + 1 ≥ a.length] Error
oldPC: 0 < i < a.length ∧ a[0] ≠ 0
PC: 0 < i < a.length ∧ a[0] ≠ 0 ∧ (i + 1) ≥ a.length
Iterative approximation: Inv_0^1 = Inv0 ∧ ¬oldPC = ¬(i ≥ a.length ∧ a[0] ≠ 0) ∧ ¬(0 < i < a.length ∧ a[0] ≠ 0) = ¬(i > 0 ∧ a[0] ≠ 0)

Array Example 2
//precondition: a != null;
public static void set(int a[]) {
    int i = 0;
    while (i < a.length) {
        a[i] = 0;
        i++;
    }
    assert forall int j: a[j] == 0;
}

Proof
public static void set(int[] a) {
    int i = 0;
    assert Inv;
    //i,a = new symbolic values;
    //j = new symbolic value;
    assume Inv;
    if (i < a.length) {
        a[i] = 0;
        i++;
        // oldPC
        if (!Inv) {
            // PC
            assert false;
        }
    } else
        assert a[j] == 0;
}
Inv0 = ¬(i ≥ a.length ∧ a[j] ≠ 0 ∧ 0 ≤ j < a.length)
oldPC: a[j] ≠ 0 ∧ j < i ∧ 0 ≤ i,j < a.length
PC: a[j] ≠ 0 ∧ j < i ∧ 0 ≤ i,j < a.length ∧ (i + 1) ≥ a.length
Iterative approximation: Inv_0^1 = Inv0 ∧ ¬oldPC = ¬(i ≥ a.length ∧ a[j] ≠ 0 ∧ 0 ≤ j < a.length) ∧ ¬(a[j] ≠ 0 ∧ j < i ∧ 0 ≤ i,j < a.length)
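The universally quantified assertion is handled by treating j as one more fresh symbolic value, so the invariant only has to be checked for a single arbitrary j. The sketch below mirrors that idea with a bounded exhaustive check: for every small array, every in-bounds j, and every i in [0, a.length], it tests the base case, the induction step, and the final assertion a[j] == 0. The class name ForallInvariantCheck and the chosen bounds are assumptions made for illustration; this is concrete enumeration, not the symbolic check used in the slides.

```java
class ForallInvariantCheck {
    // Candidate invariant from the slide, for one fixed but arbitrary in-bounds index j.
    static boolean inv(int[] a, int i, int j) {
        int n = a.length;
        boolean first = !(a[j] != 0 && n <= i);
        boolean second = !(a[j] != 0 && j < i && 0 <= i && i < n);
        return first && second;
    }

    // Returns the number of failed checks (base case, induction step, or final assertion).
    static int countFailures(int maxLen) {
        int failures = 0;
        for (int n = 1; n <= maxLen; n++) {
            for (int bits = 0; bits < (1 << n); bits++) {
                int[] a = new int[n];
                for (int k = 0; k < n; k++) {
                    a[k] = (bits >> k) & 1;
                }
                for (int j = 0; j < n; j++) {
                    if (!inv(a, 0, j)) {
                        failures++;                        // base case: i = 0
                    }
                    for (int i = 0; i <= n; i++) {
                        if (!inv(a, i, j)) {
                            continue;                      // assume Inv
                        }
                        if (i < n) {
                            int[] b = a.clone();
                            b[i] = 0;                      // loop body: a[i] = 0; i++
                            if (!inv(b, i + 1, j)) {
                                failures++;                // induction step fails
                            }
                        } else if (a[j] != 0) {
                            failures++;                    // assert a[j] == 0 fails
                        }
                    }
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(4));
    }
}
```

No failures are found within these bounds, which is consistent with the invariant on the slide being inductive and strong enough for the property.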

Partition Example
class Cell {
    int val;
    Cell next;
    Cell partition(Cell l, int v) {
        Cell curr = l, prev = null;
        Cell nextCurr, newl = null;
        while (curr != null) {
            nextCurr = curr.next;
            if (curr.val > v) {
                if (prev != null)
                    prev.next = nextCurr;
                if (curr == l)
                    l = nextCurr;
                curr.next = newl;
                assert curr != prev;
                newl = curr;
            } else
                prev = curr;
            curr = nextCurr;
        }
        return newl;
    }
}
Loop invariant: ¬(curr = prev ∧ curr ≠ null ∧ curr.elem > v) ∧ ¬(curr ≠ prev ∧ prev ≠ null ∧ curr ≠ null ∧ prev.elem > v ∧ curr.elem > v ∧ prev ≠ curr.next)

Pathological Example
void m(int n) {
    int x = 0;
    int y = 0;
    while (x < n) { /* loop 1 */
        x++;
        y++;
    }
    /* hint: x == y */
    while (x != 0) { /* loop 2 */
        x--;
        y--;
    }
    assert y == 0;
}
• First, attempt computation of invariant for loop 2
• Iterative invariant generation does not terminate
• Constraint x = y is important, but not discovered
• Using x = y as a hint we get two invariants:
  Loop 2: ¬(y ≠ 0 ∧ x = 0) ∧ ¬(y ≤ 0 ∧ x > 0) ∧ ¬(y > 0 ∧ x ≠ y)
  Loop 1: ¬(x ≠ y ∧ x ≥ n) ∧ ¬(x < 0) ∧ ¬(x ≥ 0 ∧ x < n ∧ x ≠ y)
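The two invariants obtained with the hint can be sanity-checked by bounded enumeration. The sketch below treats the hint x == y as the postcondition of loop 1 and as the set of possible start states of loop 2, and counts violations of the base case, the induction step, and the exit condition for both loops over a small range of x, y, and n. The class name HintCheck, the method names, and the bounds are assumptions made for illustration; the check only tests the invariants and does not generate them.

```java
class HintCheck {
    // Loop 1 invariant from the slide.
    static boolean inv1(int x, int y, int n) {
        return !(x != y && x >= n) && !(x < 0) && !(x >= 0 && x < n && x != y);
    }

    // Loop 2 invariant from the slide.
    static boolean inv2(int x, int y) {
        return !(y != 0 && x == 0) && !(y <= 0 && x > 0) && !(y > 0 && x != y);
    }

    // Counts failed checks for both loops over [-bound, bound].
    static int countFailures(int bound) {
        int failures = 0;
        for (int n = -bound; n <= bound; n++) {
            if (!inv1(0, 0, n)) {
                failures++;                                // loop 1 base case: x = y = 0
            }
            for (int x = -bound; x <= bound; x++) {
                for (int y = -bound; y <= bound; y++) {
                    if (!inv1(x, y, n)) {
                        continue;
                    }
                    if (x < n) {
                        if (!inv1(x + 1, y + 1, n)) {
                            failures++;                    // loop 1 induction step
                        }
                    } else if (x != y) {
                        failures++;                        // hint must hold when loop 1 exits
                    }
                }
            }
        }
        for (int x = -bound; x <= bound; x++) {
            for (int y = -bound; y <= bound; y++) {
                if (x == y && !inv2(x, y)) {
                    failures++;                            // loop 2 base case: any hint state
                }
                if (!inv2(x, y)) {
                    continue;
                }
                if (x != 0) {
                    if (!inv2(x - 1, y - 1)) {
                        failures++;                        // loop 2 induction step
                    }
                } else if (y != 0) {
                    failures++;                            // assert y == 0 on exit
                }
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failures: " + countFailures(10));
    }
}
```

No failures are found within these bounds. The check concerns partial correctness only: loop 2 does not terminate when it starts from a negative x, and that case is not flagged here.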

Related Work • Invariant generation: • INVEST (Verimag), STEP (Stanford) • Graf & Saidi (CAV 1996), Havelund & Shankar (FME 1996), Tiwari et al. (TACAS 2001), Wegbreit (CACM 1974) • … (a lot of work) • Iterative forward/backward computations • Domain specific; focus on numeric invariants • Heuristics for termination, e.g. using auxiliary invariants • Abstract interpretation • Cousot & Cousot (CAV 2002), Cousot & Halbwachs (POPL 1978) • Widening operator to compute fixpoints systematically • Flanagan & Qadeer (POPL 2002) • Loop invariant generation for Java programs • Uses predicate abstraction • Predicates need to be provided by the user • Extended Static Checker (ESC) • Uses theorem proving to check partial correctness specifications of Java programs • Relies heavily on user-provided specifications, such as loop invariants

Conclusion and Future Work • Framework for verification of light-weight specifications of Java programs: new use of JPF • Iterative technique for discovering (some) loop invariants automatically • Uses invariant strengthening, approximation, and refinement • Handles different types of constraints • Allows checking universally quantified formulas • … Very preliminary work • Future work: • Instead of dropping newly generated constraints, replace them with an appropriate boolean combination of existing constraints from Uk • Similar to predicate abstraction • Use more powerful abstraction techniques in conjunction with our framework • Use heuristics/dynamic methods to discover useful constraints/hints (e.g. Daikon) • Study relationship to widening and predicate abstraction • Extend to multithreading and richer properties • …
