Modular Static Analysis with Sets and Relations for Verifying Data Structure Consistency. Viktor Kuncak, Computer Science and Artificial Intelligence Lab, MIT. Joint work with: Huu Hai Nguyen, Peter Schmitt, Suhabe Bugrara, Martin Rinard, Andreas Podelski, Daniel Jackson, Patrick Lam
Modular Static Analysis with Sets and Relations for Verifying Data Structure Consistency
Viktor Kuncak, Computer Science and Artificial Intelligence Lab, MIT
Joint work with: Huu Hai Nguyen, Peter Schmitt, Suhabe Bugrara, Martin Rinard, Andreas Podelski, Daniel Jackson, Patrick Lam, Thomas Wies, Karen Zee
Program analysis and verification
• Discover/verify properties of software systems
• Practical relevance: programmer productivity
  - performance: compiler optimizations
  - reliability: discovering and preventing errors
  - maintainability: understanding code
• Broader implications
  - automated analysis of formal artifacts (implications for XML documents, formal proofs)
Spectrum of analysis techniques
• Broad research area, many dimensions
  - bug finding versus bug prevention
  - control-intensive versus data-intensive systems
  - generic versus application-specific properties
• Original ideal: full program verification
• Reality: verify partial correctness properties
  - success story: type systems
  - active area: temporal properties (typestate)
  - trend: towards complex properties
Data structure consistency properties
• Unbounded number of objects, dynamically allocated
• Shape not given by types (class Node { Node f1, f2; }), but by structural properties; may change over time
• Example properties:
  - list with next and prev fields: x.next.prev = x, acyclicity of next
  - structure with left and right fields: graph is a tree
  - list with first, next and size fields: size field is consistent with the number of stored objects
Inconsistent data structures
• Can cause program crashes
• Unexpected outcome of operations
  - removing two instead of one element
• Looping
[Figure: lists linked by next and prev; label: internal consistency]
External data structure consistency
[Object model: Person [0..1] borrows [0..4] Book]
• If a person has borrowed a book, then
  - person is registered with library, and
  - book is in the catalog
• Two persons cannot borrow the same book
• A person can borrow at most 4 books at a time
External consistency properties:
• correlate different data structures (global)
• are meaningful to users of the system
• capture design constraints (object models)
• inconsistency can lead to policy violations
• rely on internal consistency to be even meaningful
Goal • Prove data structure consistency • for all program executions (sound) • with high level of automation • both internal and external consistency • both implementation and use of data structures
Using static analysis to enforce data structure consistency
• Inputs to the static analyzer:
  - source code of a program, e.g. proc remove(x : Node) { Node p=x.prev; n=x.next; if (p!=null) p.next = n; else root = n; if (n!=null) n.prev = p; }
  - consistency properties, e.g. x.next.prev = x
• Possible outcomes:
  - data structures are consistent
  - error in program
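To make the property on this slide concrete, the following is a minimal Python sketch of the remove procedure together with a dynamic check of x.next.prev = x. It only tests one concrete heap at run time, whereas the talk is about proving the property statically for all executions. The class names Node and DList and the helpers add_front, nodes and prev_is_inverse_of_next are hypothetical and not part of the original system.

```python
class Node:
    """List cell with the next/prev fields used on the slide."""
    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None


class DList:
    """Doubly-linked list with a root pointer (hypothetical wrapper class)."""
    def __init__(self):
        self.root = None

    def add_front(self, n):
        # Hypothetical helper, only used to build example lists.
        n.prev = None
        n.next = self.root
        if self.root is not None:
            self.root.prev = n
        self.root = n

    def remove(self, x):
        # Mirrors the slide's remove(x); assumes x is in the list.
        p, n = x.prev, x.next
        if p is not None:
            p.next = n
        else:
            self.root = n
        if n is not None:
            n.prev = p

    def nodes(self):
        # Cells reachable from root via next (assumes next is acyclic).
        out, cur = [], self.root
        while cur is not None:
            out.append(cur)
            cur = cur.next
        return out


def prev_is_inverse_of_next(lst):
    """Check x.next.prev == x for every reachable x whose next is not null."""
    return all(x.next is None or x.next.prev is x for x in lst.nodes())
```

A client would build a list with add_front, call remove, and then call prev_is_inverse_of_next to see whether this particular execution kept the list consistent.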
Challenges in verifying consistency
• complex, heterogeneous data structures, in the context of application; developer-defined properties
• precision
• scalability
• communication with developers
• no single approach will work
Outline • Goal: verify data structure consistency • Our approach through an example • Bohne: one of the analyses in our system • Current status and ongoing work • Future work
Example: Minesweeper Game (actual screenshot) Analyzed using our system (based on Java version)
Minesweeper game data structures
[Figure: Cell objects, each with boolean fields init and isExposed, linked by next and prev]
Minesweeper consistency properties
• prev is inverse of next
• next is acyclic
• object is in hidden cells list iff initialized and isExposed is false
[Figure: Cell objects with init and isExposed fields, linked by next and prev]
Complex consistency properties
• object is in hidden cells list iff its init flag is true and its isExposed flag is false
• Formalization as an invariant (an expression that is true whenever program reaches certain points):
  {x | next*(HiddenListRoot,x) } = {x | x.init & ! x.isExposed }
• Difficulties
  - need to track exact reachability properties
  - correlate linking information with stored data
• Need a way to deal with complexity
Towards factoring out complexity
• object is in hidden cells list iff its init flag is true and its isExposed flag is false
• Formalization as an invariant: {x | next*(HiddenListRoot,x) } = {x | x.init & ! x.isExposed }
• Abstract reasoning in terms of sets:
  - ListContent = {x | next*(HiddenListRoot,x) }
  - UnexposedCells = {x | x.init & ! x.isExposed }
  - invariant becomes ListContent = UnexposedCells
• How to enable such reasoning in our program?
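The two set comprehensions can be evaluated on a concrete heap, as in the Python sketch below. It is an illustration only: the Cell class, the snake_case names, and the choice to treat a null root as an empty list are assumptions made for the example, and a static analysis would establish the equality for all heaps rather than compute it for one.

```python
class Cell:
    """Minesweeper cell with the fields named on the slides."""
    def __init__(self, init=False, is_exposed=False):
        self.init = init
        self.is_exposed = is_exposed
        self.next = None


def list_content(root):
    """{x | next*(root, x)}: cells reachable from root via next.
    Assumption: a null root denotes the empty list."""
    seen = set()
    cur = root
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = cur.next
    return seen


def unexposed_cells(all_cells):
    """{x | x.init & !x.isExposed}, taken over an explicit universe of cells."""
    return {c for c in all_cells if c.init and not c.is_exposed}


def invariant_holds(hidden_list_root, all_cells):
    """ListContent = UnexposedCells on this one concrete heap."""
    return list_content(hidden_list_root) == unexposed_cells(all_cells)
```

Exposing a cell without unlinking it from the hidden list makes invariant_holds return False, which is the kind of inconsistency the invariant is meant to rule out.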
Encapsulating complexity in modules
• Minesweeper source code
  - record Cell { init : bool; isExposed : bool; next, prev : Cell; } is split into partial records, one per module
  - proc expose(c:Cell) { remove(c); setFlag(c); }
  - invariant: List.content = Board.UnexpCells
• List module
  - encapsulate state: partial record Cell { next, prev : Cell; }
  - encapsulate operations: proc remove(c : Cell) { Cell p=c.prev; n=c.next; if (p!=null) p.next = n; else root = n; if (n!=null) n.prev = p; }
• Board module
  - encapsulate state: partial record Cell { init : bool; isExposed : bool; }
  - encapsulate operations: proc setFlag(c : Cell) { c.isExposed = true; }
• Replace implementations (in the analysis only) with set specifications
Reasoning in terms of sets
• Minesweeper source code: proc expose(c:Cell) { remove(c); setFlag(c); }
• Set specifications stand in for the implementations:
  - List module, proc remove(c : Cell): content' = content - c
  - Board module, proc setFlag(c : Cell): UnexpCells' = UnexpCells - c
• Equality is preserved: List.content = Board.UnexpCells
• No need to reason about data structure details! Can use more scalable analyses
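The set-level argument can be replayed mechanically. The Python sketch below models the two specifications as functions on sets and checks, by brute force over a small universe, that expose preserves content = UnexpCells. The exhaustive enumeration is a stand-in chosen for this example, not the analysis used in the system, and all function names are hypothetical.

```python
from itertools import combinations


def spec_remove(content, c):
    """Set specification of List.remove: content' = content - c."""
    return content - {c}


def spec_set_flag(unexp_cells, c):
    """Set specification of Board.setFlag: UnexpCells' = UnexpCells - c."""
    return unexp_cells - {c}


def expose(content, unexp_cells, c):
    """expose(c) = remove(c); setFlag(c), executed on the abstract sets only."""
    return spec_remove(content, c), spec_set_flag(unexp_cells, c)


def powerset(universe):
    items = list(universe)
    return [frozenset(k)
            for r in range(len(items) + 1)
            for k in combinations(items, r)]


def equality_preserved(universe):
    """For every state with content = UnexpCells and every c in content
    (the precondition of remove), check the equality still holds afterwards."""
    for s in powerset(universe):
        for c in s:
            content2, unexp2 = expose(s, s, c)
            if content2 != unexp2:
                return False
    return True
```

No list pointers or flags appear anywhere in this check, which is the point of the slide: the client is analyzed against the set interfaces alone.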
Justifying reasoning in terms of sets
Three sections of a module (modularized the invariant!):
• specification section
  - proc remove(c : Cell): content' = content - c
  - proc setFlag(c : Cell): UnexpCells' = UnexpCells - c
• abstraction section
  - content = {x | next*(root, x) }
  - UnexpCells = {x | x.init & !x.isExposed}
• implementation section
  - proc remove(c : Cell) { ... if (p!=null) p.next = n; ... }
  - proc setFlag(c : Cell) { c.isExposed = true; }
Minesweeper source code: proc expose(c:Cell) { remove(c); setFlag(c); } with invariant List.content = Board.UnexpCells
List module
impl module List { partial record Cell { next, prev : Cell; } var root : Cell; proc remove(c : Cell) { Cell p=c.prev; n=c.next; if (p!=null) p.next = n; else root = n; if (n!=null) n.prev = p; } }
spec module List { specvar content : Cell set; proc remove(c : Cell) requires c in content & c != null modifies content ensures content' = content - c; }
abst module List { content = { x : Cell | next* root x}; invariant tree [next]; invariant ALL x y. prev x = y → (x ≠ null ∧ y ≠ null → next y = x); }
• Verification of List has dual benefits:
  - justify analysis of clients
  - prove partial correctness of List operations
• Reasoning about List invariants is confined to List module
• Showing conformance: use precise analyses but only inside the List module
Summary of our approach: two steps
• Reasoning about program in terms of simpler interfaces (scalable analyses)
  - uses of interfaces
  - global consistency
• Checking that interfaces reflect implementations and internal consistency is preserved (precise analyses)
[Figure: Application (Data Structure Client), A interface, B interface, A implementation, B implementation]
This approach addresses challenges
• Reasoning about program in terms of simpler structures; checking that abstract structures reflect implementations
• Used in manual verification, VDM, ESC/Java as data abstraction
• scalability
• heterogeneity: multiple analyses
• developers communicate with system via interfaces
• precision: within data structures
[Figure: Application (Data Structure Client), A interface, B interface, A implementation, B implementation, annotated with analysis1, analysis2, analysis3]
Key question in automating approach (while keeping it useful)
• How to choose interface language?
• Our solution: set algebra
[Figure: Application (Data Structure Client), A interface, B interface, A implementation, B implementation, annotated with analysis1, analysis2, analysis3]
Set algebra as interface language
• Useful: express key data structure properties
  - disjointness (A ∩ B = ∅), inclusion (A ⊆ B)
  - insertion (S’ = S ∪ x), removal (S’ = S \ x)
  - conceptual object state
  - initialization, sequencing of API operations
  - symbolic notations for hierarchical state charts
• Verifiable: on both sides of set abstraction
  - typestate techniques for interface uses
  - shape analyses for interface implementations
Two systems based on this insight
[Figure: architecture of the Hob data structure analysis system and the Jahob data structure analysis system. Components shown: verification condition generator; annotation inference algorithms; Bohne invariant inference; flag analysis for high-level properties; field constraint analysis; decision procedure dispatcher; decision procedures and theorem provers (Omega solver for linear arithmetic, Isabelle, MONA, CVC Lite, BAPA). Publication labels on the diagram: SVV'05, VMCAI'06, CADE'05, JAR, POPL’02, SAS’03, VMCAI’04, VMCAI'05, CC'05, AOSD'05, VSTTE’05]
Outline • Goal: verify data structure consistency • Our approach through an example • Bohne: one of the analyses in our system • Current status and ongoing work • Future work
Bohne analysis properties
• Analyzes linked data structures
• Precisely handles reachability properties
  - can define set of elements reachable from root: content = { x | next*(root,x) }
• Predictable: based on decision procedures
[Figure: a structure with left and right fields, and a list with root, next and prev]
Starting point
[Figure: the Hob and Jahob architecture diagram from the slide "Two systems based on this insight"]
basic verifier = vcgen + decision procedure
• Bohne analysis: from specification, abstraction and implementation, obtain (pre, body, post)
• Verification condition generator: syntactic translation (as in symbolic execution), producing VC: pre → wlp(body, post)
• Decision procedure:
  - valid: data structures are consistent
  - invalid: error in program
• Verification condition (VC): a logical formula saying: “If precondition holds at entry, then postcondition holds in the final state, invariants are preserved, and there are no run-time errors”
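The shape of the verification condition, pre → wlp(body, post), can be illustrated with a small semantic model in Python. This is a sketch under strong assumptions: statements are deterministic, terminating assignments modelled as state transformers, predicates are Python functions on states, and validity is checked by enumerating a finite set of states in place of a real decision procedure such as MONA. The names assign, wlp and vc_valid are hypothetical.

```python
def assign(var, expr):
    """Statement `var := expr(state)`, modelled as a state transformer."""
    def run(state):
        new_state = dict(state)
        new_state[var] = expr(state)
        return new_state
    return run


def wlp(body, post):
    """Weakest liberal precondition of a statement sequence.
    For deterministic, terminating statements this is simply
    `post` evaluated in the state reached by running `body`."""
    def precondition(state):
        for stmt in body:
            state = stmt(state)
        return post(state)
    return precondition


def vc_valid(pre, body, post, states):
    """Check pre -> wlp(body, post) on every state in `states`.
    Enumeration is an assumption of this sketch, standing in for a
    decision procedure that would decide validity symbolically."""
    weakest = wlp(body, post)
    return all((not pre(s)) or weakest(s) for s in states)
```

For example, with body x := x + 1, the condition is valid for precondition x >= 0 and postcondition x > 0, and invalid if the precondition is dropped.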
Decision procedure
• Goal: precise reasoning about reachability
• Reachability properties in trees are decidable
  - Monadic Second-Order Logic over Trees
  - existing MONA decision procedure
  - construct a tree automaton for each formula
  - check emptiness of the language of automaton
• Using this approach: we can analyze implementations of trees (fields left, right)
• But only trees. Even parent links would introduce cycles!
Beyond trees
[Figure: the Hob and Jahob architecture diagram from the slide "Two systems based on this insight"]
Field constraint analysis
• Enables reasoning about non-tree fields
• Can handle broader class of data structures
  - doubly-linked lists, trees with parent pointers
  - skip lists
• Example: tree backbone field next, constrained field nextSub
• Constrained fields satisfy constraint invariant: ALL x y. nextSub(x) = y → next+(x,y)
Elimination of constrained fields
• Field constraint analysis translates VC1(next,nextSub) into VC2(next), which is passed to MONA (VMCAI'06)
• Soundness: if VC2 is valid, then VC1 is valid
• Completeness: if VC2 is invalid, then VC1 is invalid (for useful class including preservation of field constraints)
• Example: tree backbone field next, constrained field nextSub
• Constrained fields satisfy constraint invariant: ALL x y. nextSub(x) = y → next+(x,y)
Elimination of constrained fields
• Previous approaches
  - constraining formula must be deterministic
• We allow arbitrary constraint formulas
  - fields need not be uniquely given by backbone
• Example: tree backbone field next, constrained field nextSub
• Constrained fields satisfy constraint invariant: ALL x y. nextSub(x) = y → next+(x,y)
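The skip-list field constraint, that nextSub(x) = y implies next+(x,y), can be checked on a concrete structure as sketched below in Python. This is a run-time check on one heap, not the elimination procedure from the talk. The Node class and helper names are hypothetical, and treating a null nextSub as unconstrained is an assumption, since the slide does not spell out the null case.

```python
class Node:
    """Skip-list node: `next` is the backbone field, `next_sub` the constrained field."""
    def __init__(self, key):
        self.key = key
        self.next = None
        self.next_sub = None


def next_plus(x):
    """Nodes y with next+(x, y): reachable from x by one or more next steps."""
    out, seen = [], set()
    cur = x.next
    while cur is not None and id(cur) not in seen:
        seen.add(id(cur))
        out.append(cur)
        cur = cur.next
    return out


def field_constraint_holds(nodes):
    """ALL x y. nextSub(x) = y -> next+(x, y), checked for non-null y only
    (assumption: a null next_sub imposes no requirement)."""
    for x in nodes:
        y = x.next_sub
        if y is not None and not any(y is z for z in next_plus(x)):
            return False
    return True
```

A forward shortcut such as n0.next_sub = n2 satisfies the constraint, while a backward or self-pointing next_sub violates it on an acyclic backbone.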
Inferring invariants
• would need loop invariants
[Figure: the Hob and Jahob architecture diagram from the slide "Two systems based on this insight"]
Loop invariant synthesis
• Possible states at entry to List.remove(c)
[Figure: several lists of different lengths, with root and c pointing into each list]
• Problem: unbounded number of objects
• Solution: partition objects into sets
Partitioning with reachability
• Partitioning properties of objects:
  - Proot: pointed to by root
  - Pc: pointed to by c
  - Rc: reachable from c
• Group nodes according to whether properties hold
• Abstract heap (represents unbounded number of concrete heaps), with node groups Proot; !Proot & !Pc & !Rc; Pc; !Pc & Rc:
  ∀x. (Proot(x) & !Pc(x) & !Rc(x)) | (!Proot(x) & !Pc(x) & !Rc(x)) | (!Proot(x) & Pc(x) & Rc(x)) | (!Proot(x) & !Pc(x) & Rc(x))
• Second abstract heap, in which one node is pointed to by both root and c, joined to the first by disjunction:
  | ∀x. (Proot(x) & Pc(x) & !Rc(x)) | (!Proot(x) & !Pc(x) & Rc(x))
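The partitioning idea can be sketched in Python by mapping every node of a concrete list to its vector of truth values for (Proot, Pc, Rc) and collecting the vectors that occur. The sketch assumes Rc is reflexive (c counts as reachable from c), which matches the disjunct with Pc and Rc both true; the helper names are hypothetical, and the real analysis works on formulas rather than on concrete heaps.

```python
class Node:
    def __init__(self):
        self.next = None


def make_list(n):
    """Hypothetical helper: build a singly linked list of n nodes."""
    nodes = [Node() for _ in range(n)]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes


def reachable_from(start):
    """next*(start, x), reflexive: includes start itself (assumption)."""
    seen = set()
    cur = start
    while cur is not None and cur not in seen:
        seen.add(cur)
        cur = cur.next
    return seen


def abstract_heap(nodes, root, c):
    """Set of (Proot, Pc, Rc) truth-value triples realized by some node."""
    rc = reachable_from(c)
    return {(x is root, x is c, x in rc) for x in nodes}
```

Lists of 4 and of 50 nodes, with c somewhere strictly inside, yield the same four triples, which is how a single abstract heap stands for an unbounded number of concrete heaps.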
Domain for inferring loop invariants
• Abstract heap: ∀x. ⋁b ⋀a Pa(a,b,c)(x)
  - each disjunct is a summary node (e.g. Proot; Pc; !Pc & Rc; !Proot & !Pc & !Rc)
  - each conjunct is a partitioning property or its negation (Rx, !Rx)
• Set of possible abstract heaps at a given program point (C1, C2, C3, C4, ...): ⋁c ∀x. ⋁b ⋀a Pa(a,b,c)(x)
• Related work: POPL’02: graph-based; SAS’03: undecidability; VMCAI’04: formulas; SAS’05 (Podelski, Wies)
Domain for inferring loop invariants
• Domain: ⋁c ∀x. ⋁b ⋀a Pa(a,b,c)(x)
• Compared to predicate abstraction (⋁b ⋀a Pa(a,b)):
  - predicates on object x and state, not just state
  - enables needed precision and efficiency
[Figure: abstract heap with summary nodes Proot; Pc; !Pc & Rc; !Proot & Rroot & !Px & !Rx]
Propagating abstract heaps
[Figure: initial heaps F1, statement n = c.next, heaps F2, statement p = c.prev, ...]
• Finite state space: explore using a worklist algorithm
• How to determine whether a heap is a successor? Use verification condition generator!
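A generic worklist exploration of a finite state space looks roughly like the Python sketch below. The successors function is a placeholder: in the analysis it would be answered by verification-condition queries, while here it is any caller-supplied function over hashable states. The name explore is hypothetical.

```python
from collections import deque


def explore(initial, successors):
    """Return every state reachable from `initial` under `successors`.
    Terminates whenever the reachable state space is finite."""
    initial = list(initial)
    seen = set(initial)
    work = deque(initial)
    while work:
        state = work.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                # Each state is enqueued at most once.
                seen.add(nxt)
                work.append(nxt)
    return seen
```

Because a state is added to the worklist only the first time it is seen, the number of successor computations is bounded by the number of reachable states.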
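Computing transitions
[Figure: during invariant synthesis, the Bohne analysis passes (F1, basic block, F2) to the verification condition generator, which produces a formula relating F1 and wlp(F2); the decision procedure's answer (valid or not) determines whether the transition F1 → F2 is possible]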
Making invariant synthesis feasible
• Naive algorithm: 2^2n queries
• Reducing number of queries
  - transform each summary node independently (Cartesian abstraction)
  - avoid recomputation
  - precompute abstractions of transitions (generalization of Boolean programs)
  - precompute unsatisfiable conjunctions
  - ‘semantic’ caching of queries
  - auxiliary analysis to propagate true conjuncts
• Improvements crucial for making analysis feasible
Analyses Developed in Hob
[Figure: the Hob and Jahob architecture diagram from the slide "Two systems based on this insight", with analysis speed annotations: "depends on graduate student"; "10 line / sec"; "100 lines / sec using MONA (but could use SAT)"; "1 line / sec"]
Outline • Goal: verify data structure consistency • Our approach through an example • Bohne: one of the analyses in our system • Current status • analyzed programs • ongoing work • Future work
Minesweeper experience
• prev is inverse of next
• next is acyclic
• object is in hidden cells list iff initialized and isExposed is false
[Figure: Cell objects with init and isExposed fields, linked by next and prev]
Verified properties meaningful to designers and end users
• disjoint(Hidden.content, Exposed.content)
  - “A cell is never both hidden and exposed”
  - consistency needed to understand the game
• ! disjoint(Mined.content,Exposed.content) => gameOver
  - “If a mined cell is exposed, the game is over”
  - defining property of the game
List with a cursor
spec module IterList { specvar Content : Node set; specvar Iter : Node set; invariant Iter in Content; proc remove(n : Node) requires n in Content & n != null ensures Content' = Content - n & Iter' = Iter - n }
impl module IterList { var root, current : Node; proc remove(n : Node) { if (n==root) { root = root.next; } Node prv, nxt; prv = n.prev; nxt = n.next; if (prv!=null) { prv.next = nxt; } if (nxt!=null) { nxt.prev = prv; } n.next = null; n.prev = null; } }
• Flagged on the slide as BUG, next to the implementation of remove: if (n==current) { current = current.next; }
[Figure: list with root and current pointers; Content is the set of all nodes in the list, Iter the subset starting at current]
Verifying use of cursors
spec module IterList { specvar Content, Iter : Node set; invariant Iter in Content; proc isLastIter() returns b : bool ensures b' <=> (Iter' = {}); proc nextIter() returns n : Node requires Iter != {} modifies Iter ensures (n != null) & (n in Iter) & (Iter' = Iter - n) & (n in Content); }
Client code:
  List.openIter();
  bool b = List.isLastIter();
  while (!b) {
    c = List.nextIter();
    View.drawCell(c);
    b = List.isLastIter();
  }
Verified properties of the client:
• iterator initialized before use
• no iteration past the end
• each cell visited exactly once
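The client loop can be exercised against a set-level model of the IterList interface, as in the Python sketch below. The model tracks only the Content and Iter sets. The effect of openIter (Iter' = Content) is an assumption, since the slide does not show its specification, and the class name IterListSpec, the function draw_all and the use of ValueError for a violated precondition are hypothetical.

```python
class IterListSpec:
    """Set-level model of IterList: only Content and Iter are tracked."""
    def __init__(self, content):
        self.content = set(content)
        self.iter_set = set()

    def open_iter(self):
        # Assumed specification (not shown on the slide): Iter' = Content.
        self.iter_set = set(self.content)

    def is_last_iter(self):
        # ensures b' <=> (Iter' = {})
        return not self.iter_set

    def next_iter(self):
        # requires Iter != {}
        if not self.iter_set:
            raise ValueError("precondition violated: Iter is empty")
        # ensures n in Iter & Iter' = Iter - n & n in Content
        n = self.iter_set.pop()
        if n not in self.content:
            raise ValueError("invariant violated: Iter must be a subset of Content")
        return n


def draw_all(lst):
    """The client loop from the slide; returns the cells in visiting order."""
    visited = []
    lst.open_iter()
    b = lst.is_last_iter()
    while not b:
        c = lst.next_iter()
        visited.append(c)  # stands in for View.drawCell(c)
        b = lst.is_last_iter()
    return visited
```

Running draw_all on a model with some content visits every element once and never calls next_iter on an empty Iter; a static typestate analysis would establish the same facts without running the loop.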
Further analyzed programs
• Water particle simulation: ordering of computation phases
• Web server: initialization, ordering, data structures
  - serving http://hob.csail.mit.edu
• High-level properties
  - relationships between different data structures
  - none of the individual analyses could handle them alone
• Individual data structures:
  - trees (w/ parents), doubly-linked lists (w/ cursors)
  - skip lists, lists with cross pointers, array, priority queue
• Ongoing work:
  - turn-based strategy game, collection classes
  - operating system data structures
Jahob system
[Figure: the Hob and Jahob architecture diagram from the slide "Two systems based on this insight"]
Jahob system
• Successor to Hob
• Goal: check data structures in more scenarios
  - richer interfaces and invariants
  - maps to specify association lists, hash tables
  - relations to specify unbounded number of instances
  - symbolic cardinality constraints on sets
  - future extension to other properties
• Implementation language: Java subset
• Specification language: Isabelle subset
• New specialized decision procedures