Modular Data Structure Verification
Viktor Kuncak
Supervisor: Martin Rinard
Committee members: Arvind, Daniel Jackson
Program analysis and verification
• Discover/verify properties of software systems
• Practical relevance:
  • programmer productivity
  • performance: compiler optimizations
  • reliability: discovering and preventing errors
  • maintainability: understanding code
• Ultimate impact:
  • make it easier to produce working software
  • create more sophisticated systems
Spectrum of analysis techniques
• Broad research area, many dimensions
  • bug finding versus bug prevention
  • control-intensive versus data-intensive systems
  • generic versus application-specific properties
• Original ideal was automated full verification
• Reality: verify partial correctness properties
  • success story: type systems
  • active area: temporal properties (typestate)
• Trend: towards more complex properties
My research: verifying properties of data structures
Data structure consistency properties
• unbounded number of objects, dynamically allocated
• shape not given by types, but by structural properties; may change over time
• class Node { Node f1, f2; }
[Figure: a doubly-linked list starting at root with next and prev fields, annotated "x.next.prev == x" and "acyclicity of next"; a binary tree with left and right fields, annotated "graph is a tree" and "elements are sorted"]
Data structure consistency properties
• dynamically allocated arrays
  • node is stored in the bucket given by the hash of node's key
  • instances do not share array
• numerical quantities
  • value of size field is the number of stored objects
• Examples of internal data structure consistency properties
[Figure: a hash table whose table array holds nodes with key and value fields, indexed by hashCode; a linked list with first and next fields and size = 3]
External data structure consistency
Simple Library System
• If a book is loaned to a person, then
  • book is in the catalog
  • person is registered with library
• Can loan a book to at most one person at a time
• A person can borrow at most 4 books at a time
[Figure: object model with Book and Person linked by loanedTo, with multiplicities [0..4] and [0..1]]
• correlate different data structures (global)
• meaningful to users of the system
• capture design constraints (object models)
• inconsistency can lead to policy violations
• relies on internal consistency to be even meaningful
Both static and dynamic properties
• Invariant properties: talk about single state
  • data structure invariants hold
• State change properties: correlate multiple states
  • operations have the expected effect
    • add operation inserts element into a set
    • removal removes all elements with a given key
  • operations have no unintended side effects
  • expected sequencing of operations
    • can remove only after adding elements to data structure
Goal
• Prove data structure properties
  • for all program executions (sound)
  • both internal and external consistency
  • both invariant and state change properties
  • both implementation and use of data structures
  • also absence of run-time errors
  • with high level of automation
Proving data structure properties
• Input: Java source code of a program
    . . .
    proc remove(x : Node) {
      Node p=x.prev; n=x.next;
      if (p!=null) p.next = n; else root = n;
      if (n!=null) n.prev = p;
    }
    . . .
• Input: data structure properties (Isabelle/HOL)
    (x,y) ∈ r → x ∈ A ∧ y ∈ B
• The automated verifier reports one of:
  • program satisfies the properties
  • error in program (or property)
Challenges in verifying consistency
• precision
• no single approach will work
• complex, heterogeneous data structures, in the context of application; developer-defined properties
• scalability
• communication with developers
Contributions: Jahob verification system
[Figure: architecture diagram that takes a program and a property as input, with the contributions numbered 1 to 5]
• modular verification methodology
• front end, verification condition generator: parsing, type checking, intermediate forms, variable dependencies
• decision procedure dispatcher, a method to deploy multiple reasoning techniques: splitting proof obligations, dispatching each result
• three complementary reasoning techniques:
  • Translation to first-order logic
  • Boolean Algebra with Presburger Arithmetic (BAPA)
  • Field constraint analysis
Library system example
• If a person has borrowed a book, then
  • book is in the catalog
  • person is registered with library
• Stated directly on the implementations (Book: TreeCollection implementation; Person: ListCollection implementation; loanedTo: Map implementation):
    ∀ b ∀ p. (∃ i < M. ∃ n. (loanedTable[i], n) ∈ next* ∧ n.key = b ∧ n.value = p) → (bookTreeRoot, b) ∈ (left ∪ right)* ∧ (personListRoot, p) ∈ next*
• Isolate data structure complexity into separate Java classes
• Then verify:
  1. properties hold for simplified system w/ sets and relations
  2. classes correctly implement sets and relations
• Stated on sets and relations:
    ∀ b ∀ p. (b, p) ∈ loanedTo → b ∈ Book ∧ p ∈ Person
1. Verifying high-level properties
  class LibrarySystem {
    ListCollection persons;
    TreeCollection books;
    Map loanedTo;
    public void decomissionBook(Book b) {
      books.remove(b);
      loanedTo.remove(b);
    }
  }
  class TreeCollection {
    specvar tcontent :: obj set;
    public void remove(Object x)
      ensures tcontent = old tcontent - {x}
  }
  class Map {
    specvar mcontent :: (obj*obj) set;
    public void remove(Object key)
      ensures mcontent = old mcontent - {(k,v). k=key}
  }
Property: ∀ b ∀ p. (b, p) ∈ loanedTo → b ∈ Book ∧ p ∈ Person
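A minimal sketch, in Python rather than Jahob's annotated Java, of why the high-level property is preserved by decomissionBook once the collections are viewed as sets and relations. The function names and sample data are illustrative assumptions; only the two ensures clauses and the invariant come from the slide. It checks a single execution, whereas the verifier reasons about all of them.

```python
def invariant(books, persons, loaned_to):
    # ∀ b p. (b, p) ∈ loanedTo → b ∈ Book ∧ p ∈ Person
    return all(b in books and p in persons for (b, p) in loaned_to)

def decommission_book(books, loaned_to, b):
    # Each call is modelled by its ensures clause (hypothetical helper, not Jahob code).
    new_books = books - {b}                                  # tcontent = old tcontent - {x}
    new_loaned = {(k, v) for (k, v) in loaned_to if k != b}  # mcontent = old mcontent - {(k,v). k=key}
    return new_books, new_loaned

books = {"b1", "b2"}
persons = {"alice"}
loaned_to = {("b1", "alice")}
books_after, loaned_after = decommission_book(books, loaned_to, "b1")
```

Removing the book from books alone would leave ("b1", "alice") in the relation and break the invariant, which is why the method updates both structures.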
2. Verifying Map implementation
  class Map { // Implemented as a hash table
    private AssocList[] table;
    public specvar mcontent :: "(obj*obj) set";
    invariant contentDef:
      mcontent = {(k,v). ∃ i ≤ M. (k,v) ∈ table[i].acontent}
    invariant correctBucket:
      ∀ k v. ∀ i ≤ M. (k,v) ∈ table[i].acontent → hash k M = i
    public void remove(Object key)
      requires key ≠ null
      ensures mcontent = old mcontent - {(k,v). k=key}
    {
      int hash = compute_hash(key);
      table[hash] = AssocList.removeAll(key, table[hash]);
      mcontent := old mcontent - {(k,v). k=key}
    }
    ...
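The two invariants can be read as executable checks on a concrete table. The following is a rough Python sketch under assumptions of my own: the class name HashMapSketch, Python lists as buckets, and Python's built-in hash standing in for the slide's hash function. It tests one run and proves nothing in general.

```python
class HashMapSketch:
    def __init__(self, num_buckets=4):
        # Each bucket is a list of (key, value) pairs, playing the role of table[i].acontent.
        self.table = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return hash(key) % len(self.table)

    def add(self, key, value):
        self.table[self._bucket(key)].append((key, value))

    def remove(self, key):
        i = self._bucket(key)
        self.table[i] = [(k, v) for (k, v) in self.table[i] if k != key]

    def mcontent(self):
        # contentDef: mcontent is the union of the contents of all buckets.
        return {pair for bucket in self.table for pair in bucket}

    def correct_bucket(self):
        # correctBucket: every stored pair sits in the bucket chosen by its key's hash.
        return all(self._bucket(k) == i
                   for i, bucket in enumerate(self.table)
                   for (k, _) in bucket)
```

Because mcontent is computed from the buckets here, the postcondition of remove can be compared against the set expression old mcontent - {(k,v). k=key} directly.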
2. Verifying association list implementation
  class AssocList { // Functional linked list
    private Object key, data;
    private AssocList next;
    specvar acontent :: (obj*obj) set;
    invariant contentDef2:
      this ≠ null → acontent = {(key,data)} ∪ next.acontent
    static AssocList removeAll(Object key, AssocList list)
      requires key ≠ null
      modifies content
      ensures result.acontent = list.acontent - {(k,v). k=key}
    {
      if (list==null) return null;
      if (key==list.key) return removeAll(key,list.next);
      else return cons(list.key,list.data, removeAll(key,list.next));
    }
  }
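As a sketch of how the abstraction acontent relates to the functional list, the following Python version mirrors removeAll. Representing a list as None or a (key, data, rest) tuple is an assumption made for brevity, as are the Python function names.

```python
def cons(key, data, rest):
    # A list is either None (null) or a triple (key, data, rest).
    return (key, data, rest)

def acontent(lst):
    # contentDef2: acontent = {(key, data)} ∪ next.acontent, and the empty set for null.
    pairs = set()
    while lst is not None:
        key, data, lst = lst
        pairs.add((key, data))
    return pairs

def remove_all(key, lst):
    # Same recursion as the slide's removeAll; the input list is never mutated.
    if lst is None:
        return None
    k, d, rest = lst
    if k == key:
        return remove_all(key, rest)
    return cons(k, d, remove_all(key, rest))
```

The ensures clause then becomes the equation acontent(remove_all(key, lst)) == {(k, v) for (k, v) in acontent(lst) if k != key}, which can be tested on sample lists.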
Modular verification summary
[Figure: the Library example built on the ListCollection, TreeCollection and Map implementations, with the Map implementation built on the Association list implementation]
Key benefits of modular verification
• each individual verification task simpler
• verification results for collections and maps are reusable (repositories of verified data structures)
Reducing verification to validity of formulas
• annotated code → front end, verification condition generator → verification condition → formula validity checker
  • valid: program satisfies properties
  • invalid: error in program or property
• Verification condition (VC): a logical formula saying "If precondition holds at entry, then postcondition holds in the final state, invariants are preserved, and there are no run-time errors"
Formula validity checking in Jahob
[Figure: the Jahob architecture diagram, with the dispatcher and the reasoning techniques grouped as the formula validity checker]
• modular verification methodology
• front end, verification condition generator: parsing, type checking, intermediate forms, variable dependencies
• decision procedure dispatcher, a method to deploy multiple reasoning techniques: splitting proof obligations, dispatching each result, approximating HOL formulas
• formula validity checker:
  • Translation to first-order logic
  • Boolean Algebra with Presburger Arithmetic (BAPA)
  • Field constraint analysis
What do verification conditions look like?
Annotated code:
  invariant ∀ b. b ∈ books → b ≠ null
  invariant ∀ b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons
  public void decomissionBook(Book b1)
    requires b1 ∈ books
  {
    books.remove(b1);
    loanedTo.remove(b1);
  }
Verification condition (an Isabelle formula):
  (∀ b. b ∈ books → b ≠ null) ∧
  (∀ b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧
  b1 ∈ books →
    (books1 = books - {b1} →
      b1 ≠ null ∧
      (loanedTo1 = loanedTo - {(b,p). b=b1} →
        (∀ b. b ∈ books1 → b ≠ null) ∧
        (∀ b p. (b,p) ∈ loanedTo1 → b ∈ books1 ∧ p ∈ persons)))
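To get a feel for what the verification condition says, one can evaluate it over every interpretation drawn from a tiny universe. This Python sketch does exactly that; it is a bounded search written for illustration, not how Jahob establishes validity, and all function names are assumptions. None plays the role of null.

```python
from itertools import combinations, product

def subsets(items):
    items = list(items)
    return (set(c) for r in range(len(items) + 1) for c in combinations(items, r))

def vc_holds(books, persons, loaned_to, b1):
    inv_books = all(b is not None for b in books)
    inv_loans = all(b in books and p in persons for (b, p) in loaned_to)
    if not (inv_books and inv_loans and b1 in books):
        return True  # antecedent is false, so the implication holds
    # books1 and loanedTo1 are fixed by the equations in the VC.
    books1 = books - {b1}
    loaned1 = {(b, p) for (b, p) in loaned_to if b != b1}
    return (b1 is not None
            and all(b is not None for b in books1)
            and all(b in books1 and p in persons for (b, p) in loaned1))

def find_counterexample(universe):
    pairs = list(product(universe, repeat=2))
    for books in subsets(universe):
        for persons in subsets(universe):
            for loaned_to in subsets(pairs):
                for b1 in universe:
                    if not vc_holds(books, persons, loaned_to, b1):
                        return books, persons, loaned_to, b1
    return None
```

Finding no counterexample in a small universe is only evidence; a proof has to cover universes of every size, which is what the reasoning techniques on the following slides provide.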
Interactively proving VCs in Isabelle
  lemma verificationCondition:
    "(∀ b. b ∈ books → b ≠ null) ∧
     (∀ b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧
     b1 ∈ books →
       (books1 = books - {b1} →
         b1 ≠ null ∧
         (loanedTo1 = loanedTo - {(b,p). b=b1} →
           (∀ b. b ∈ books1 → b ≠ null) ∧
           (∀ b p. (b,p) ∈ loanedTo1 → b ∈ books1 ∧ p ∈ persons)))"
  apply (rule_tac impI)
  apply (rule_tac impI)
  apply (rule_tac conjI)
  ...
  done
• Interactive = user supplies proof script
• Isabelle checks manually supplied proof
• Automation limited for larger formulas
Can we check VCs with more automation?
Verification condition (an Isabelle formula):
  (∀ b. b ∈ books → b ≠ null) ∧
  (∀ b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧
  b1 ∈ books →
    (books1 = books - {b1} →
      b1 ≠ null ∧
      (loanedTo1 = loanedTo - {(b,p). b=b1} →
        (∀ b. b ∈ books1 → b ≠ null) ∧
        (∀ b p. (b,p) ∈ loanedTo1 → b ∈ books1 ∧ p ∈ persons)))
• splitting into conjuncts yields sequents S1, ..., S4, each of the form A1 ∧ ... ∧ An → G
• Sequent 3:
    (∀ b. b ∈ books → b ≠ null) ∧
    (∀ b p. (b,p) ∈ loanedTo → b ∈ books ∧ p ∈ persons) ∧
    b1 ∈ books ∧
    books1 = books - {b1} ∧
    loanedTo1 = loanedTo - {(b,p). b=b1} ∧
    (b0,p0) ∈ loanedTo1 → b0 ∈ books1
• multiple reasoning techniques (RT 1, ..., RT 4) are tried on the sequents
[Figure: the VC split into sequents S1 to S4, each marked valid by one of the reasoning techniques]
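The splitting step can be sketched as a recursive walk over the formula: implications move their left side into the assumptions, and conjunctions in the goal produce separate sequents. The tuple encoding and the atom names below are assumptions for illustration, and this sketch ignores quantifiers, so it yields three sequents for the VC's skeleton rather than the four on the slide.

```python
def conjuncts(formula):
    # Flatten nested ("and", f, g) nodes into a list of conjuncts.
    if isinstance(formula, tuple) and formula[0] == "and":
        return conjuncts(formula[1]) + conjuncts(formula[2])
    return [formula]

def split(formula, assumptions=()):
    # Returns a list of sequents (assumptions, goal), read as A1 ∧ ... ∧ An → G.
    if isinstance(formula, tuple) and formula[0] == "and":
        return split(formula[1], assumptions) + split(formula[2], assumptions)
    if isinstance(formula, tuple) and formula[0] == "imp":
        return split(formula[2], assumptions + tuple(conjuncts(formula[1])))
    return [(assumptions, formula)]

# Skeleton of the VC, with hypothetical names standing for its subformulas.
vc = ("imp", ("and", "InvBooks", ("and", "InvLoans", "Pre")),
      ("imp", "Books1Def",
       ("and", "B1NotNull",
        ("imp", "Loaned1Def", ("and", "InvBooks1", "InvLoans1")))))
sequents = split(vc)
```

Each resulting sequent can be handed to a different prover, and the VC is established once every sequent has been proved.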
Constructing a reasoning technique
• sequent: an Isabelle formula, belongs to an undecidable class
    A1 ∧ ... ∧ An → G
• How can a specialized technique accept Isabelle formulas?
• A Jahob reasoning technique has two stages:
  • formula approximation: soundly approximates formula with a simpler formula, e.g. A1' ∧ A3 → G'
  • specialized algorithm: expects as input e.g. formula in a decidable class (or otherwise "easier" class)
• if the specialized algorithm reports the approximation valid, the sequent is valid
Range of sound approximations
• Worst: a(F) = False (useless)
• General idea of our approximations (the subscript p is the polarity of the position, ¬p the opposite polarity):
    a(F) = a_1(simplify(F))
    a_p(F1 ∧ F2) = a_p(F1) ∧ a_p(F2)
    a_p(F1 ∨ F2) = a_p(F1) ∨ a_p(F2)
    a_p(¬F) = ¬ a_¬p(F)
    a_p(goodF) = translation of goodF
    a_1(badF) = False
    a_0(badF) = True
• Best: a(F) = if "F is valid" then True else False (impossible)
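The polarity rules can be written down almost verbatim. In this Python sketch the tuple encoding, the "good"/"bad" tags and the evaluate helper are assumptions; "good" atoms are kept unchanged instead of being translated, and simplify is omitted.

```python
def approx(formula, p=1):
    # p = 1: the result must imply the original; p = 0: the original must imply the result.
    tag = formula[0]
    if tag == "and":
        return ("and", approx(formula[1], p), approx(formula[2], p))
    if tag == "or":
        return ("or", approx(formula[1], p), approx(formula[2], p))
    if tag == "not":
        return ("not", approx(formula[1], 1 - p))  # negation flips the polarity
    if tag == "good":
        return formula                              # supported atom: keep it
    if tag == "bad":
        return ("const", p == 0)                    # a_1(badF) = False, a_0(badF) = True
    raise ValueError(f"unknown formula tag: {tag!r}")

def evaluate(formula, env):
    tag = formula[0]
    if tag == "and":
        return evaluate(formula[1], env) and evaluate(formula[2], env)
    if tag == "or":
        return evaluate(formula[1], env) or evaluate(formula[2], env)
    if tag == "not":
        return not evaluate(formula[1], env)
    if tag == "const":
        return formula[1]
    return env[formula[1]]                          # "good" or "bad" atom

# (A ∧ X) → G written as ¬(A ∧ X) ∨ G, where X is outside the supported fragment.
sequent = ("or", ("not", ("and", ("good", "A"), ("bad", "X"))), ("good", "G"))
approximated = approx(sequent)
```

On this input the unsupported assumption X becomes True, so the approximation is A → G: an assumption is dropped, and validity of the result still implies validity of the original sequent.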
Jahob verification system
[Figure: the architecture diagram, extended with the provers behind each component]
• modular verification methodology
• front end, verification condition generator: parsing, type checking, intermediate forms, variable dependencies
• decision procedure dispatcher, a method to deploy multiple reasoning techniques: splitting proof obligations, dispatching each result
• provers reached from the dispatcher:
  • Isabelle
  • Translation to first-order logic, using a first-order theorem prover
  • Boolean Algebra with Presburger Arithmetic (BAPA), using a Presburger Arithmetic decision procedure
  • Field constraint analysis, using the MONA decision procedure
• three complementary reasoning techniques
• Collaborators named on the slide: Charles Bouillaguet, Thomas Wies
Translation to first-order logic
• Motivation: FOL provers effective, fully automated
  • decades of research in resolution, paramodulation
  • solved open problems (e.g. axiomatization of BAs)
• Approach: approximate HOL by FOL
  • substitute, beta-reduce definitions
  • sets and relations become predicates
  • flattening, function updates
  • eliminate tuples
  • linear arithmetic axioms
  • approximate otherwise: avoid full encoding (using combinators S, K, or encoding set theory)
Encoding types
• Translated formulas have two types: obj, int
• Input to resolution-based provers is untyped!
• Standard solution: types as unary predicates
  • makes formulas larger, provers much slower
• Faster solution: omit them!
  • not sound in general
• Theorem: Omitting types is sound if
  • sorts are disjoint, and
  • sorts have equal cardinality
• Orders of magnitude speedup
Results obtained using first-order provers
• Instantiable set and relation implementations:
  • Hash table (120 sec)
  • Association list (12 sec)
  • Functional sorted binary search tree (178 sec)
  • Imperative list (18 sec)
• Library example (20 sec)
Hash table insertion
  public void add(Object key, Object value) ...
  {
    int hash = compute_hash(key);
    table[hash] = AssocList.cons(key, value, table[hash]);
    mcontent := (old mcontent) ∪ {(key,value)}
    if (size > (4 * table.length)/5)
      rehash(table.length + table.length);
  }
  public void rehash(int m) ...
    ensures "mcontent = old mcontent"
  {
    AssocList[] t = table;
    init(m);
    rehash_aux(0, t);
  }
  private void rehash_aux(int i, AssocList[] t) ...
  {
    addAll(t[i]);
    if (j < t.length) rehash_aux(j, t);
  }
  public addAll(AssocList[] pairs) ...
  {
    AssocList lst = pairs;
    while inv "..." (!AssocList.is_nil(lst)) {
      Pair p = AssocList.getOne(lst);
      lst = AssocList.remove(p.key, p.value, lst);
      add(p.key, p.value);
    }
  }
Verifying imperative lists
  private Node first;
  private ghost specvar con :: obj set;
  public specvar lcontent :: obj set;
  vardefs lcontent = first.con;
  invariant this ≠ null → con = {data} ∪ next.con & data ∉ next.con;
  public void remove(Object x)
    modifies lcontent
    ensures lcontent = old lcontent - {x}
[Figure: a list of nodes 1, 2, 3, 4 linked by next from first, with con fields {1,2,3,4}, {2,3,4}, {3,4}, {4}; remove is called with x = 3]
• Loop searching for 3 must also remove 3 from preceding con fields
• During search, invariant defining con temporarily violated
• What we really want is something that can express reachability
Imperative list using reachability
  private static Node first;
  public static specvar content :: obj set
  vardefs content == {x. x ≠ null ∧ (first,x) ∈ {(a,b). b = next a}*}
  invariant tree [next]
  invariant ∀ x y. prev x = y → next y = x   (almost)
  public void remove(Node n)
    requires n ∈ content
    modifies content
    ensures content = old content - {n}
  {
    if (n==first) first = first.next;
    else n.prev.next = n.next;
    if (n.next != null) n.next.prev = n.prev;
    n.next = null;
    n.prev = null;
  }
• content is dependent variable: no need to update it in remove
• reachability expressed directly, not using induction
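A small Python sketch of the same idea: content is computed from first by following next, so remove only touches the pointers. The class names, the add_front helper and the sample nodes are assumptions; the body of remove follows the slide.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.prev = None

class DList:
    def __init__(self):
        self.first = None

    def add_front(self, n):
        n.next, n.prev = self.first, None
        if self.first is not None:
            self.first.prev = n
        self.first = n

    def content(self):
        # {x. x ≠ null ∧ (first, x) ∈ {(a, b). b = next a}*}
        reached, x = set(), self.first
        while x is not None and x not in reached:
            reached.add(x)
            x = x.next
        return reached

    def remove(self, n):
        # Requires n ∈ content; there is no assignment to content.
        if n is self.first:
            self.first = self.first.next
        else:
            n.prev.next = n.next
        if n.next is not None:
            n.next.prev = n.prev
        n.next = None
        n.prev = None
```

Since content is a derived value, the postcondition content = old content - {n} follows from the pointer updates alone, which is the point of treating it as a dependent variable.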
Proving formulas with reachability
• Reachability properties in trees are decidable
  • Monadic Second-Order Logic over Trees
  • existing MONA decision procedure
    • constructs a tree automaton for each formula
    • checks emptiness of the language of automaton
• Using simple MONA approximation:
  • Can analyze list, tree implementations
  • But not doubly-linked lists or trees with parent pointers
[Figure: binary tree with left and right fields]
Field constraint analysis
• Enables reasoning about non-tree fields
• Can handle broader class of data structures
  • doubly-linked lists, trees with parent pointers
  • skip lists
[Figure: doubly-linked list in which the next fields form the tree backbone and the prev fields are the constrained fields]
• Constrained fields satisfy constraint invariant: ∀ x y. prev y = x → next x = y
Elimination of constrained fields
• field constraint analysis turns VC1(next,prev) into VC2(next), which is passed to MONA
  • substitute (prev a = b) with (next b = a)
• soundness: if VC2 is valid, then VC1 is valid
• completeness: if VC2 is invalid, then VC1 is invalid (for useful class including preservation of field constraints)
• VMCAI'06
[Figure: doubly-linked list in which the next fields form the tree backbone and the prev fields are the constrained fields]
• Constrained fields satisfy constraint invariant: ∀ x y. prev y = x → next x = y
Field constraints: a comparison
• Previous approaches
  • constraining formula must be deterministic
• We allow arbitrary constraint formulas
  • fields need not be uniquely given by backbone
[Figure: skip list in which the next fields form the tree backbone and the nextSub fields are the constrained fields]
• Constrained fields satisfy constraint invariant: ∀ x y. nextSub x = y → next+ x y
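Both kinds of constraint invariant can be checked on a concrete heap. In this Python sketch, fields are dictionaries from node names to node names, which is an assumption, as are the function names. Entries whose relevant value is None (null) are skipped, a simplification of this sketch rather than something the slides state.

```python
def prev_constraint_holds(next_field, prev_field):
    # ∀ x y. prev y = x → next x = y, checked for non-null x.
    return all(next_field.get(x) == y
               for y, x in prev_field.items() if x is not None)

def reaches_plus(next_field, x, y):
    # next+ x y: y is reachable from x in one or more next steps.
    seen, cur = set(), next_field.get(x)
    while cur is not None and cur not in seen:
        if cur == y:
            return True
        seen.add(cur)
        cur = next_field.get(cur)
    return False

def next_sub_constraint_holds(next_field, next_sub):
    # ∀ x y. nextSub x = y → next+ x y, checked for non-null y.
    return all(reaches_plus(next_field, x, y)
               for x, y in next_sub.items() if y is not None)

next_field = {"a": "b", "b": "c", "c": "d", "d": None}
prev_field = {"a": None, "b": "a", "c": "b", "d": "c"}
next_sub = {"a": "c", "c": None}
```

The prev constraint pins down prev exactly, while the nextSub constraint admits several choices (a could also point to b or d), which illustrates the non-deterministic constraints the comparison refers to.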
Field constraint analysis results
• Results within Jahob
  • lists
  • trees with parent pointer (insertion)
  • two-level skip list
• Proved sound and complete*
• High automation level
  • no need for specification variable updates
• Symbolic shape analysis (Thomas Wies)
  • infers loop invariants
BAPA: Sets with cardinality bounds
• Imposing constraints on abstract content
  • card(content) = size
  • 2 card(circulatedBooks) ≤ card(books)
• size field is consistent with the number of stored objects
[Figure: linked list with first and next fields and size = 3]
Boolean Algebra with Presburger Arithmetic
  S ::= V | S1 ∪ S2 | S1 ∩ S2 | S1 \ S2
  T ::= k | C | T1 + T2 | T1 - T2 | C·T | card(S)
  A ::= S1 = S2 | S1 ⊆ S2 | T1 = T2 | T1 < T2
  F ::= A | F1 ∧ F2 | F1 ∨ F2 | ¬F | ∃S.F | ∃k.F
• Not widely known, but natural extension of BAs
• I gave first complexity bound (CADE'05, JAR)
• quantifier elimination algorithm (as in LICS'03)
From BAPA to PA
• If A, B are disjoint, then |A ∪ B| = |A| + |B|
• Make them disjoint: Venn diagram
• Reduce set vars to integer vars
• For quantifiers, use quantifier elimination
• Preserves alternations: complexity same as for PA
[Figure: Venn diagram of sets x, y, z with regions numbered 1 to 8; one region is labelled |x^c ∩ y ∩ z^c|]
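The reduction can be illustrated by giving each Venn region an integer size and computing every cardinality as a sum of region sizes. This Python sketch then searches bounded region sizes for a counterexample; the bounded search and all names are assumptions for illustration, whereas the actual algorithm hands the integer constraints to a Presburger arithmetic decision procedure.

```python
from itertools import product

def venn_regions(num_sets):
    # A region is a tuple of booleans: region[i] says whether it lies inside set i.
    return list(product([False, True], repeat=num_sets))

def card(set_expr, sizes):
    # |S| is the sum of the sizes of the Venn regions contained in S.
    return sum(size for region, size in sizes.items() if set_expr(region))

def valid_up_to(num_sets, bound, claim):
    # Try every assignment of 0..bound to the size of each region.
    regions = venn_regions(num_sets)
    for values in product(range(bound + 1), repeat=len(regions)):
        if not claim(dict(zip(regions, values))):
            return False
    return True

# Set expressions over two set variables A (index 0) and B (index 1).
A = lambda r: r[0]
B = lambda r: r[1]
union = lambda r: r[0] or r[1]
inter = lambda r: r[0] and r[1]

def disjoint_sum(sizes):
    # If A and B are disjoint, then |A ∪ B| = |A| + |B|.
    return card(inter, sizes) != 0 or card(union, sizes) == card(A, sizes) + card(B, sizes)

def plain_sum(sizes):
    # Without the disjointness assumption the equation can fail.
    return card(union, sizes) == card(A, sizes) + card(B, sizes)
```

With n set variables there are 2^n regions, which is where the exponentially many integer variables mentioned for the quantifier-free case come from.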
Quantifier-free BAPA
• Previous technique gives NEXPTIME
• We show it can be done in PSPACE:
  • analyze resulting integer linear equations
    • exponentially many variables
    • polynomially many equations
  • small model property: solutions singly exponential
  • guess sizes of sets
  • use alternating PTIME algorithm to check them
• Real-valued relaxation is NP-complete
  • |x| + |y| = |x\y| + |x\y| + 1 is satisfiable in relaxation
Summary of BAPA results
• Application within Jahob
  • verified updates to size field
  • library example: at most ½ of the books in circulation
• Observations
  • clarified that problem is not undecidable (!)
  • first formalization of algorithm
  • showed complexity identical to PA
  • QFBAPA bound from NEXPTIME to PSPACE
  • QFBAPA fragments in P (with Bruno Marnette)
  • real-value version of QFBAPA is NP-complete
There is more to the Jahob verification system
[Figure: the architecture diagram extended with further components, provers and collaborators]
• front end, verification condition generator
• symbolic shape analysis
• syntactic loop invariant inference
• decision procedure dispatcher
• provers and reasoning techniques: Isabelle, Coq, CVC Lite, Omega, Translation to first-order logic (first-order theorem prover), Boolean Algebra with Presburger Arithmetic (Presburger Arithmetic decision procedure), Field constraint analysis (MONA decision procedure)
• Collaborators named on the slide: Huu Hai Nguyen, Karen Zee, Thomas Wies, Charles Bouillaguet
Synergy of reasoning techniques
[Figure: the Library example, ListCollection implementation, Map implementation and Association list implementation, labelled with the reasoning techniques used on them: Field constraint analysis, Translation to first-order logic, BAPA]
How Jahob addresses challenges
• precision: reduce verification to formulas in logic
• no single approach will work: multiple reasoning techniques
• complex, heterogeneous data structures, in the context of application; developer-defined properties
• scalability: modular verification
• communication with developers: Isabelle as specification language
Verified data structures
• Lists implementing sets and relations
• Trees implementing sets and relations
• List with a cursor (simplified iterator)
• Hash table
• Two-level skip list
• Insertion sort
• Library benchmark
• In progress: small game; part of file system
Future work
• Case studies
• Methodology for encapsulation
• Inference of specifications, specialized analyses
• New specification annotations and their power
• Finer-grained combination techniques
• Executing and under-approximating formulas
  • counterexamples for formulas (FSE'05)
  • testing, run-time checking of specifications
  • efficient execution of declarative specifications
• Design appropriate specification language