This case study explores the use of static analysis for program verification, specifically focusing on the challenges of handling pointers and dynamically allocated objects. The study presents a prototype implementation in TVLA (Three-Valued Logic Analyzer) and discusses the benefits and limitations of this approach.
Putting Static Analysis to Work for Verification: A Case Study
Tal Lev-Ami, Thomas Reps, Mooly Sagiv, Reinhard Wilhelm
Program Verification • Mathematically prove that the program is “partially” correct on all inputs • Example: Hoare style verification x n {x := x + 1} x n + 1
Why Use Program Verification? • Debugging programs is hard • Testing can only show the presence of errors - not their absence • Can provide counter examples • ...
Obstacles to Program Verification • Hard to specify software • Does not “scale” • Limited program size • Programmer needs to provide loop invariants • Pointers and dynamically allocated objects are not handled
Our Goals • Handle pointers and dynamically allocated objects (unbounded memory and/or multi-threading) • No loop invariants • Input: pre {Procedure} post • Output: • A safe approximation p to the strongest postcondition • Issue a warning if p ⇏ post • Conservative: • Never misses an error • May yield false warnings
#include <stddef.h>  /* NULL */

typedef struct node {
    int data;
    struct node *n;
} *L;

/* precondition: list(x) */
L insert_sort(L x) {
    L r, pr, rn, l, pl;
    r = x;
    pr = NULL;
    while (r != NULL) {
        l = x;
        rn = r->n;
        pl = NULL;
        while (l != r) {
            if (l->data > r->data) {
                /* unlink r and reinsert it in front of l */
                pr->n = rn;
                r->n = l;
                if (pl == NULL)
                    x = r;
                else
                    pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
/* postcondition: olist(x) */
int main() {
    L x, y, z, w;
    L create(), insert_sort(L);
    L merge(L, L), reverse(L);
    x = create();        /* list(x)   */
    x = insert_sort(x);  /* olist(x)  */
    y = create();        /* list(y)   */
    y = insert_sort(y);  /* olist(y)  */
    z = merge(x, y);     /* olist(z)  */
    w = reverse(z);      /* rolist(w) */
}
Conventional Verification • Formulae over program variables express pre- and post-conditions • The assignment rule is used to generate the strongest postcondition for non-destructive updates • Programmer provides loop invariants
Our Approach • A finite set of descriptors expresses pre- and post-conditions • Predicate-update formulae specify a safe set of descriptors (abstract semantics) • Iteratively explore all the descriptors at every program point (abstract interpretation) • The ADT designer can provide domain-specific information via instrumentation
Outline of the Rest of this Talk • Concentrate on sorting • Descriptors: compact representation of stores • State-space exploration via abstract interpretation • Prototype implementation in TVLA (Three-Valued Logic Analyzer) • Conclusions
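Logical Representation of Stores • Predicates: p[x](v), n(v1, v2), dle(v1, v2) • [Figure: the list x → 19 → 7 → 96 → null, each cell with data and n fields, drawn as a logical structure: p[x]=1 on the first node, p[x]=0 on the other two, n edges between consecutive nodes, and dle edges between nodes]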
Three-Valued Logic • 1 - True • 0 - False • ½ = {1, 0} - Unknown • A join semi-lattice: 0 ⊔ 1 = ½ • [Figure: information order, with 0 and 1 below ½]
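The three values and their join can be sketched in a few lines of C. This is a minimal illustration, not TVLA code: the type and function names are hypothetical, and the connectives are the standard Kleene ones (minimum, maximum, and complement over 0 < ½ < 1), which the slide does not spell out.

```c
#include <assert.h>

/* Truth values ordered 0 < 1/2 < 1, so min and max give Kleene "and"/"or". */
typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;

/* Join in the information order: agreeing values are kept,
   disagreeing values lose information and become 1/2. */
kleene k_join(kleene a, kleene b) { return a == b ? a : K_HALF; }

/* Kleene connectives (assumed here; not stated on the slide). */
kleene k_and(kleene a, kleene b) { return a < b ? a : b; }
kleene k_or(kleene a, kleene b)  { return a > b ? a : b; }
kleene k_not(kleene a)           { return (kleene)(K_TRUE - a); }
```

With these definitions, joining 0 and 1 yields ½, and a conjunction with a definite 0 stays 0 even when the other operand is unknown.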
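Blurred Representation of Stores • [Figure: the list x → 19 → 7 → 96 → null as a logical structure (p[x]=1 on the first node, p[x]=0 on the other two), next to its blurred form: one node with p[x]=1 and one summary node with p[x]=0, connected by n and dle edges]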
Parametric Abstraction (Blur) • Merge all the nodes with the same unary “abstraction” predicate values into a single summary node • Join predicate values • Convert a structure of arbitrary size into a 3-valued structure of bounded size
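The blur step can be sketched for the simplest case: a single abstraction predicate p[x] and the binary predicate n. This is an illustrative sketch under stated assumptions, not TVLA's implementation; the struct layout, the bound MAX_NODES, and the names concrete_store, blurred_store, and blur are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;
static kleene k_join(kleene a, kleene b) { return a == b ? a : K_HALF; }

#define MAX_NODES 8

/* A concrete (2-valued) store: px[v] is p[x](v), n[u][v] is n(u, v). */
typedef struct {
    size_t size;
    int px[MAX_NODES];             /* each entry must be 0 or 1 */
    int n[MAX_NODES][MAX_NODES];
} concrete_store;

/* The blurred store has at most two nodes, indexed by their p[x] value. */
typedef struct {
    int present[2];   /* at least one concrete node maps here */
    int summary[2];   /* more than one concrete node maps here */
    kleene n[2][2];   /* joined n values between abstract nodes */
} blurred_store;

blurred_store blur(const concrete_store *c) {
    blurred_store b = {{0, 0}, {0, 0}, {{K_FALSE, K_FALSE}, {K_FALSE, K_FALSE}}};
    int count[2] = {0, 0};
    int seen[2][2] = {{0, 0}, {0, 0}};

    /* Nodes with equal p[x] values are merged into one abstract node. */
    for (size_t v = 0; v < c->size; v++)
        count[c->px[v]]++;
    for (int a = 0; a < 2; a++) {
        b.present[a] = count[a] > 0;
        b.summary[a] = count[a] > 1;
    }

    /* Each abstract n value is the join of all concrete values it covers. */
    for (size_t u = 0; u < c->size; u++) {
        for (size_t v = 0; v < c->size; v++) {
            int a = c->px[u], d = c->px[v];
            kleene val = c->n[u][v] ? K_TRUE : K_FALSE;
            b.n[a][d] = seen[a][d] ? k_join(b.n[a][d], val) : val;
            seen[a][d] = 1;
        }
    }
    return b;
}
```

For a three-cell list whose first cell is pointed to by x, the two remaining cells collapse into one summary node, and the n values into and within that summary node become ½ because some of the merged pairs are linked and others are not.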
Instrumentation • Explicitly maintains information about distinctions among cells • Leads to less blurring when used as abstraction predicates • Unary predicates defined via a first-order formula with transitive closure • Example: “local order” • inOrder[n](v) = ∀v1: n(v, v1) → dle(v, v1) • inROrder[n](v) = ∀v1: n(v, v1) → dle(v1, v)
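On a concrete list the two "local order" predicates are easy to compute, which may help in reading the formulae. The sketch below assumes the quantified reading inOrder[n](v) = ∀v1: n(v, v1) → dle(v, v1); since a cell has at most one n-successor, the quantifier reduces to a check on v->n. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { int data; struct node *n; };

/* dle(v1, v2): the data of v1 is less than or equal to the data of v2. */
static bool dle(const struct node *v1, const struct node *v2) {
    return v1->data <= v2->data;
}

/* inOrder[n](v): v has no successor, or v <= its successor. */
bool in_order(const struct node *v) {
    return v->n == NULL || dle(v, v->n);
}

/* inROrder[n](v): v has no successor, or its successor <= v. */
bool in_r_order(const struct node *v) {
    return v->n == NULL || dle(v->n, v);
}

/* A list is sorted in ascending order exactly when every cell is in order. */
bool all_in_order(const struct node *x) {
    for (; x != NULL; x = x->n)
        if (!in_order(x))
            return false;
    return true;
}
```

For the list 19 → 7 → 96, the first cell is not in order (19 > 7) but is in reverse order, while the last two cells are in order.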
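Blurred Representation of Stores • Predicates: inOrder[n](v1), p[x](v), n(v1, v2), dle(v1, v2) • [Figure: a three-node structure (p[x]=1 on the first node, p[x]=0 on the other two, inOrder[n]=1 on all three), next to its blurred form: one node with p[x]=1, inOrder[n]=1 and one summary node with p[x]=0, inOrder[n]=1, connected by n and dle edges]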
[Figure: blurred list in which both the node pointed to by x (p[x]=1) and the summary node (p[x]=0) have inOrder[n]=1]
vs.
Arbitrary Lists • [Figure: the same two-node shape with inOrder[n]=½ on both nodes]
Abstract Interpretation • Iteratively compute a set of structures at every program location • Conservatively interpret statements (conditions) on blurred structures • Must terminate since the number of blurred structures is finite for a given program • Fully automatic • Guaranteed to be sound • But may be overly conservative
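The iteration scheme can be sketched independently of how structures are represented. In the sketch below each blurred structure is stood in for by a small integer id and a set of structures by a bitmask; these encodings, the three-location control-flow graph, and the transfer functions keep and advance are hypothetical stand-ins, not TVLA's data structures. Termination follows the argument on the slide: the sets only grow and there are finitely many ids.

```c
#include <assert.h>
#include <stdint.h>

#define N_LOCS 3

/* Bit i is set when abstract structure i can arise at a location. */
typedef uint32_t state_set;
typedef state_set (*transfer_fn)(state_set);

/* A control-flow edge with the abstract effect of its statement. */
typedef struct { int from; int to; transfer_fn transfer; } edge;

/* Propagate along all edges until no location's set changes. */
void fixpoint(const edge *edges, int n_edges, state_set at[N_LOCS]) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n_edges; i++) {
            state_set out = edges[i].transfer(at[edges[i].from]);
            state_set merged = at[edges[i].to] | out;
            if (merged != at[edges[i].to]) {
                at[edges[i].to] = merged;
                changed = 1;
            }
        }
    }
}

/* Hypothetical transfer functions over four structure ids 0..3. */
state_set keep(state_set s) { return s; }

/* Structure i becomes i+1; structure 3 stays at 3. */
state_set advance(state_set s) { return ((s << 1) | (s & 0x8u)) & 0xFu; }
```

With location 0 as entry, location 1 as a loop head with a self edge labelled advance, and location 2 as the exit, starting from structure 0 at the entry makes all four structures reachable at the loop head and at the exit.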
Abstract Interpretation of Insertion Sort • [Figure: a blurred list with inOrder[n]=½ on both the node pointed to by x (p[x]=1) and the summary node (p[x]=0), and a blurred list with inOrder[n]=1 on both nodes]
The Key Problem • How to interpret statements (conditions) on blurred structures? • Difficult to provide a conservative (and reasonably precise) interpretation • It is difficult to show that specific abstractions are conservative (Sagiv, Reps, Wilhelm, TOPLAS 98) • Long and intimidating proofs • Or no proofs (and bugs)
The 3-Valued Logic Approach • Automatically derives a conservative interpretation of statements and conditions from: • structural operational semantics • written using logical formulae • global properties • abstraction predicates • An experimental system (TVLA) • Correct by construction
[Figure: a three-node structure with p[x]=1, p[y]=0 on the first node, p[x]=0, p[y]=1 on the second, p[x]=0, p[y]=0 on the third, and inOrder[n]=½ on all three; and the structure on the true branch of the condition, in which the node pointed to by x has inOrder[n]=1 and the other two nodes keep inOrder[n]=½] • Condition x->d <= y->d, true branch: ∃v1, v2: p[x](v1) ∧ p[y](v2) ∧ dle(v1, v2)
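To see how a condition is interpreted on a 3-valued structure, the sketch below evaluates the formula for x->d <= y->d, taken here to be ∃v1, v2: p[x](v1) ∧ p[y](v2) ∧ dle(v1, v2), with Kleene connectives and the existential quantifier as a maximum over all nodes. The three-node layout and all names are hypothetical; a result of ½ means the analysis has to consider both branches.

```c
#include <assert.h>

typedef enum { K_FALSE = 0, K_HALF = 1, K_TRUE = 2 } kleene;
static kleene k_and(kleene a, kleene b) { return a < b ? a : b; }
static kleene k_or(kleene a, kleene b)  { return a > b ? a : b; }

#define N_NODES 3

/* A 3-valued structure restricted to the predicates the condition reads. */
typedef struct {
    kleene px[N_NODES];            /* p[x](v) */
    kleene py[N_NODES];            /* p[y](v) */
    kleene dle[N_NODES][N_NODES];  /* dle(v1, v2) */
} structure3;

/* exists v1, v2: p[x](v1) and p[y](v2) and dle(v1, v2) */
kleene eval_x_le_y(const structure3 *s) {
    kleene result = K_FALSE;
    for (int v1 = 0; v1 < N_NODES; v1++) {
        for (int v2 = 0; v2 < N_NODES; v2++) {
            kleene both = k_and(s->px[v1], s->py[v2]);
            result = k_or(result, k_and(both, s->dle[v1][v2]));
        }
    }
    return result;
}
```

When x and y each point to a definite node, the result is simply the dle value between those two nodes: 1, 0, or ½.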
From Local Outlook to Global Outlook • (Safety) Every time control reaches a given point: • there are no garbage memory cells • the list is acyclic • each cell is locally ordered • (History) The list is a permutation of the original list
Bugs Found • Pointer manipulations • null dereferences • memory leaks • Forget to sort the first element • Swap equal elements in bubble sort (non-termination)
L insert_sort_b2(L x) {
    L r, pr, rn, l, pl;
    if (x == NULL)
        return NULL;
    pr = x;
    r = x->n;
    while (r != NULL) {
        pl = x;
        rn = r->n;
        l = x->n;  /* the scan starts at the second cell */
        while (l != r) {
            if (l->d > r->d) {
                pr->n = rn;
                r->n = l;
                pl->n = r;
                r = pr;
                break;
            }
            pl = l;
            l = l->n;
        }
        pr = r;
        r = rn;
    }
    return x;
}
[Figure: resulting blurred structure: the node pointed to by x (p[x]=1) has inOrder[n]=½, and the summary node (p[x]=0) has inOrder[n]=1]
Properties Not Proved • (Liveness) Termination • Stability
Related Work • Temporal-logic model checking • Manually extracts finite-state machine • Does not handle dynamically allocated data • But proves stronger properties, e.g., liveness • Bourdoncle 93 • Handles integer arithmetic • Cannot handle pointers
Further Work • Recursive programs (Quicksort) • Experiment with other ADTs (AVL trees) • Automatically derive predicate-update formulae for instrumentation predicates • Scaling to larger programs • User annotations • Class-level analysis • Modular analysis • Space optimizations • Smart front-end that precomputes “cheap” information
Conclusions • It is possible to automatically verify non-trivial properties of complex C programs that manipulate dynamically allocated memory without providing loop invariants • The implementation is automatically generated from TVLA • But scaling is an issue
Other Applications of TVLA • Verifying “cleanness” properties of C programs (Dor, Rodeh, Sagiv 2000) • null dereferences • memory leaks • Verifying safety properties of Mobile Ambients (Nielson, Nielson, Sagiv 2000) • Verifying safety properties of multithreaded Java programs (Yahav 2000) • Deadlocks • Nested monitors • Read/Write interference
The Operational Semantics of x = t->n • x’(v2) = ∃v1: t(v1) ∧ n(v1, v2)
The Operational Semantics of x->n = NULL • n’(v1, v2) = n(v1, v2) ∧ ¬x(v1) • inOrder’[dle, n](v) = inOrder[dle, n](v) ∨ x(v) • inROrder’[dle, n](v) = inROrder[dle, n](v) ∨ x(v)
The Operational Semantics of x->n = t • n’(v1, v2) = n(v1, v2) ∨ (x(v1) ∧ t(v2)) • inOrder’[dle, n](v) = (x(v) ? ∀v1: t(v1) → dle(v, v1) : inOrder[dle, n](v)) • inROrder’[dle, n](v) = (x(v) ? ∀v1: t(v1) → dle(v1, v) : inROrder[dle, n](v))
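The predicate-update formulae for the core predicates can be tried out on a small concrete (2-valued) store. The sketch below reads them as x’(v2) = ∃v1: t(v1) ∧ n(v1, v2) for x = t->n, n’(v1, v2) = n(v1, v2) ∧ ¬x(v1) for x->n = NULL, and n’(v1, v2) = n(v1, v2) ∨ (x(v1) ∧ t(v2)) for x->n = t applied after x->n has been set to NULL. The fixed store size and all names are hypothetical, and the instrumentation predicates are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define N 3

/* x[v] and t[v]: the variable points to node v; n[v1][v2]: v1->n == v2. */
typedef struct {
    bool x[N];
    bool t[N];
    bool n[N][N];
} store;

/* x->n = NULL: n'(v1, v2) = n(v1, v2) && !x(v1) */
void assign_x_next_null(store *s) {
    for (int v1 = 0; v1 < N; v1++)
        for (int v2 = 0; v2 < N; v2++)
            s->n[v1][v2] = s->n[v1][v2] && !s->x[v1];
}

/* x->n = t, assuming x->n is already NULL:
   n'(v1, v2) = n(v1, v2) || (x(v1) && t(v2)) */
void assign_x_next_t(store *s) {
    for (int v1 = 0; v1 < N; v1++)
        for (int v2 = 0; v2 < N; v2++)
            s->n[v1][v2] = s->n[v1][v2] || (s->x[v1] && s->t[v2]);
}

/* x = t->n: x'(v2) = exists v1: t(v1) && n(v1, v2) */
void assign_x_t_next(store *s) {
    bool new_x[N];
    for (int v2 = 0; v2 < N; v2++) {
        new_x[v2] = false;
        for (int v1 = 0; v1 < N; v1++)
            if (s->t[v1] && s->n[v1][v2])
                new_x[v2] = true;
    }
    for (int v = 0; v < N; v++)
        s->x[v] = new_x[v];
}
```

Starting from the list 0 → 1 → 2 with x at node 0 and t at node 2, the first two updates redirect node 0's n field to node 2 and leave the edge 1 → 2 untouched.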