Verification as Learning Geometric Concepts
Rahul Sharma, Saurabh Gupta, Bharath Hariharan, Alex Aiken, and Aditya Nori (Stanford, UC Berkeley, Microsoft Research India)
Invariants

assume x<0;
while ( x<0 ) {
  x = x+y;
  y = y+1;
}
assert y>0;

• Find loop invariants
• Quantifier-free arithmetic
• Disjunctive invariants
Disjunctive invariants

assume n > 0;
x = 0;
while ( x < n ) {
  x = x+1;
}
assert x == n;

• Disjunctions cause blow up
• Prior work restricts disjunctions: templates, heuristics
• This work: no templates, no blowup
Classification

Separate positive examples (+) from negative examples (−).
(Figure: positively and negatively labeled points in the plane)
From invariants to classifiers

• Safety properties define bad states
• Invariants separate reachable states from bad states
• Possible to obtain some examples of states
• Invariants -> classifiers
  • Examples of reachable/good states -> positive examples
  • Examples of bad states -> negative examples
• Use a classifier to separate ALL good and bad states
Sample, guess, and check

• Generate examples of good and bad concrete states
• Guess an invariant using the learner
• Check whether verification succeeds
  • If yes, then done
  • If no, then guess again with more examples
  • Use counterexamples from the failed check as new examples
Sample good states

assume x<0;
while ( x<0 ) {
  print(x,y);
  x = x+y;
  y = y+1;
}
assert y>0;

• Reachable states
• Run the program
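The instrumented loop above can be simulated directly to collect reachable states. A minimal Python sketch (the initial values of x and y are chosen for illustration):

```python
# Sketch: sample reachable ("good") states by instrumenting the loop
# from the slide. Initial values are an assumption for illustration.
def sample_good_states(x, y):
    states = []
    assert x < 0              # models "assume x<0"
    while x < 0:
        states.append((x, y))  # record the state at the loop head
        x, y = x + y, y + 1
    states.append((x, y))      # record the state on loop exit
    return states

good = sample_good_states(-3, 0)
```

Each run contributes a handful of positive examples; the more runs, the better the learner's data.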
Sample bad states

assume P;
while ( B ) { S }
assert Q;

• Bad states are unreachable in correct programs
• Obtain them by backward analysis from the negation of Q
From program to data

assume x<0;
while ( x<0 ) {
  x = x+y;
  y = y+1;
}
assert y>0;

• A good run: start from x = -1, y = 0
• Bad states satisfy x>=0 && y<=0
(Figure: good (+) and bad states plotted in the x-y plane)
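One predicate consistent with this data (hand-picked for illustration; the learner would derive a separator automatically) is x < 0 ∨ y > 0. A quick Python check:

```python
# A hand-picked separating predicate for the data above (illustrative;
# the learner derives such predicates automatically): x < 0 or y > 0.
def candidate(x, y):
    return x < 0 or y > 0

# Good states from running the loop with x = -1, y = 0.
good = [(-1, 0), (-1, 1), (0, 2)]
# A few states from the bad region x >= 0 and y <= 0.
bad = [(0, 0), (1, -1), (3, 0)]

all_good_ok = all(candidate(x, y) for (x, y) in good)
no_bad_ok = not any(candidate(x, y) for (x, y) in bad)
```

Note the separator is disjunctive, which is exactly the class of invariants this work targets.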
Learner

• Bshouty, Goldman, Mathias, Suri, Tamaki (STOC'96)
• Learn arbitrary boolean combinations of inequalities
• Create a large enough candidate set of planes
• Intelligently select from the candidates
  • Separate the given examples of good and bad states
  • Use only a few planes
Candidate planes

• Candidate set: one plane per partition of the samples
(Figure: candidate planes partitioning points in the x-y plane)
Example

• Candidate set: one plane per partition
• Connect every good state with every bad state
  • Bipartite graph
• Select candidates that cut all edges
• Minimize the number of selected candidates
(Figure: good (+) and bad states with candidate planes in the x-y plane)
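The selection step is a minimum set cover: pick the fewest candidate planes such that every edge between a good and a bad state is cut by some chosen plane. A greedy sketch in Python (points and candidate planes are made up for illustration):

```python
from itertools import product

good = [(-3, 0), (-2, 2), (0, 3)]
bad = [(1, -1), (2, 0), (3, -2)]

# Hypothetical candidate half-planes a*x + b*y <= c.
candidates = [(1, 0, -1), (0, 1, 0), (0, -1, -1)]

def side(p, plane):
    a, b, c = plane
    return a * p[0] + b * p[1] <= c

# Bipartite graph: one edge per (good, bad) pair of indices.
edges = set(product(range(len(good)), range(len(bad))))

def cuts(plane):
    # Edges whose endpoints lie on opposite sides of the plane.
    return {(i, j) for (i, j) in edges
            if side(good[i], plane) != side(bad[j], plane)}

# Greedy set cover: repeatedly take the plane cutting the most
# still-uncut edges, until every edge is cut.
chosen, uncovered = [], set(edges)
while uncovered:
    best = max(candidates, key=lambda pl: len(cuts(pl) & uncovered))
    chosen.append(best)
    uncovered -= cuts(best)
```

Greedy set cover is what yields the logarithmic-approximation guarantee quoted on the next slide.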
Guarantees

If the invariant has k planes in d dimensions, and the candidate set is adequate, then the learner produces an output whose size is:
• Independent of the number of samples!
• Only logarithmically larger than the invariant
From planes to predicates

• Planes tessellate the space
• Label regions
• Return the simplest predicate that:
  • Contains all good regions
  • Contains no bad regions
  • Treats the rest as don't cares
• Logic minimization
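Region labeling can be sketched with sign vectors: each point's region is determined by which side of each selected plane it lies on, and the predicate is a disjunction over the good regions. A minimal Python sketch (points and planes are illustrative):

```python
good = [(-3, 0), (-2, 2), (0, 3)]
bad = [(1, -1), (2, 0), (3, -2)]
# Hypothetical selected half-planes a*x + b*y <= c.
planes = [(1, 0, -1), (0, 1, 0)]

def region(p):
    # Sign vector: which side of each plane the point lies on.
    return tuple(a * p[0] + b * p[1] <= c for (a, b, c) in planes)

good_regions = {region(p) for p in good}
bad_regions = {region(p) for p in bad}

def predicate(p):
    # "Point lies in some region containing a good sample."
    return region(p) in good_regions
```

Here every sign vector except (False, True) is good, so logic minimization would simplify the disjunction of good regions to x <= -1 ∨ y > 0.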
Efficiency?

• Candidate planes are exponential in number
• Abstract interpretation over polyhedra is exponential
• Assume that inequalities are of a specific form
  • Intervals: ±x <= c, Octagons: ±x ± y <= c
• Restrict inequalities for efficient learners
• But still find arbitrary boolean combinations
Small candidate sets

• Invariants are arbitrary boolean combinations of intervals
• Need adequate candidate sets
• The intervals passing through every sampled state are sufficient
• Size of candidate set: at most 2dn for n states in d dimensions
• Octagons, TCMs, …: analogous constructions
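For intervals the construction is direct: through each sampled state, for each coordinate, take the two axis-aligned half-planes x_i <= c and x_i >= c. A sketch:

```python
def interval_candidates(states):
    # For each sample and each coordinate, the two interval
    # half-planes x_i <= c and x_i >= c through that sample.
    cands = set()
    for s in states:
        for i, c in enumerate(s):
            cands.add((i, '<=', c))
            cands.add((i, '>=', c))
    return cands

states = [(-3, 0), (-2, 2), (0, 3)]
cands = interval_candidates(states)
```

With n samples in d dimensions this yields at most 2dn candidates, versus the exponentially many arbitrary planes.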
Guarantees on generalization

• Programs have unbounded behaviors
• Analyze some finite behaviors and generalize
• SLAM/BLAST: ask for predicates to discard spurious counterexamples
• Impact: unwind loops and interpolate
• Abstract interpretation: iterate and widen
• Need a formal definition of generalization
• Need generalization guarantees for useful tools
A step: PAC

• Probably approximately correct
• Assume an oracle that knows the invariant
• The oracle draws samples from a distribution D and labels them using the invariant
• A PAC learner, given enough samples (polynomial in 1/ε and 1/δ):
  • With high probability (at least 1 − δ)
  • Outputs a classifier that misclassifies a new sample with low probability (at most ε)
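As a concrete illustration, the textbook sample bound for a consistent learner over a finite hypothesis class H (a standard PAC result, not this paper's specific bound) is m >= (1/ε)(ln|H| + ln(1/δ)):

```python
import math

def pac_sample_bound(h_size, eps, delta):
    # Standard PAC bound for a consistent learner over a finite
    # hypothesis class H: m >= (1/eps) * (ln|H| + ln(1/delta)).
    return math.ceil((1 / eps) * (math.log(h_size) + math.log(1 / delta)))

# Illustrative numbers: |H| = 2^12, accuracy 0.1, confidence 0.95.
m = pac_sample_bound(h_size=2**12, eps=0.1, delta=0.05)
```

The bound grows only logarithmically in |H| and 1/δ, which is why modest sample sizes can already give generalization guarantees.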
Summary of results

Given sufficient good and bad samples, with high probability, the learner generates a predicate that has high accuracy on unseen samples.
• The generated classifier is expressive
  • Arbitrary boolean combinations of linear inequalities
• #Planes in the classifier is independent of the number of samples
  • Worst case only logarithmically more than the invariant
Non-linear invariants

• Arbitrary boolean combinations of polynomial inequalities of a given degree
• Create a new variable for every monomial
  • E.g., degree 2, vars = {x, y}: introduce new variables for x², xy, y²
• The whole machinery carries over
  • With an increased number of variables (dimension)
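The monomial lifting can be sketched directly; for degree 2 and variables (x, y), a state maps to (x, y, x², xy, y²):

```python
from itertools import combinations_with_replacement

def lift(state, degree=2):
    # Map a state to all monomials of degree <= `degree`, e.g.
    # (x, y) -> (x, y, x*x, x*y, y*y) for degree 2.
    vals = list(state)
    out = list(vals)
    for d in range(2, degree + 1):
        for combo in combinations_with_replacement(range(len(vals)), d):
            prod = 1
            for i in combo:
                prod *= vals[i]
            out.append(prod)
    return tuple(out)

lifted = lift((2, 3))
```

A linear inequality over the lifted variables is a polynomial inequality over the originals, so the linear learner applies unchanged, just in a higher dimension.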
Implementation

• 100 lines of MATLAB for the learner
• Havoc+Boogie for checking
• Input: annotated C programs
• Boogie internally uses the SMT solver Z3
• Example invariants
Related work

• Invariant inference
  • Abstract interpretation: disjunctive completion
  • Constraint based (Sting, InvGen, GSV'08)
• Use tests to help static analysis: Yogi, InvGen, …
• Guess and check: Daikon, SAN'12, SGHALN'13
Conclusion

• Connections between verification and learning
  • Generalization is a fundamental problem for both
• Possible to obtain invariant generators with guarantees
• Handling disjunctions and non-linearities is easy
  • Difficult for symbolic approaches
• Need data, which is available
• Future work: beyond numerical invariants