SAT, Interpolants and Software Model Checking

Ken McMillan, Cadence Berkeley Labs

Applications of SAT solvers
• BMC of programs using SAT (e.g., CBMC)
• SAT solvers in decision procedures
  • Eager approach (e.g., UCLID)
  • Lazy approach (Verifun, ICS, many others)
• SAT-based image computation
  • Applied to predicate abstraction (Lahiri, et al.)
• ...
SAT solvers have been applied in many ways in software verification. We will consider instead the lessons learned from solving SAT that can be applied to software verification.

Outline
• SAT solvers
  • How do they work?
  • What general lessons can we learn from the experience?
• Software model checking survey
  • How various methods do or do not embody lessons from SAT
• A modest proposal
  • An attempt to apply the lessons of SAT to software verification

SAT solvers
• Solvers characterized by
  • Exhaustive BCP
  • Conflict-driven learning (resolution)
  • Deduction-based decision heuristics
[Figure: solver lineage. DP (variable elimination) and DLL (backtrack search) lead to DPLL, and then to SATO, GRASP, CHAFF, etc.]

Lesson #1: Be Lazy
• DP approach
  • Eliminate variables by exhaustive resolution
  • Extremely eager: deduces all facts about remaining variables
  • Essentially quantifier elimination -- explodes.
• DPLL approach
  • Lazy: only resolves clauses when model search fails
  • Resolution used as a form of failure generalization
  • Learns general facts from model search failure
• Implications:
  • Make expensive deductions only when their relevance can be justified.
  • Don't do quantifier elimination.
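To make the "explodes" point concrete, here is a minimal sketch of one DP-style elimination step. The representation (clauses as frozensets of signed integers) and the names `eliminate` and `clause` are assumptions made for this illustration, not taken from any particular solver.

```python
from itertools import product

def eliminate(clauses, var):
    """DP-style elimination of `var`: resolve every clause containing
    `var` against every clause containing `-var`, then drop both groups."""
    pos = [c for c in clauses if var in c]
    neg = [c for c in clauses if -var in c]
    kept = {c for c in clauses if var not in c and -var not in c}
    for p, n in product(pos, neg):
        resolvent = (p - {var}) | (n - {-var})
        # A resolvent containing a literal and its negation is a tautology.
        if not any(-lit in resolvent for lit in resolvent):
            kept.add(frozenset(resolvent))
    return kept

def clause(*lits):
    """Hypothetical helper: build a clause from signed integer literals."""
    return frozenset(lits)

# Two clauses with x1 and two with not-x1 yield 2 * 2 resolvents.
cnf = {clause(1, 2), clause(1, 3), clause(-1, 4), clause(-1, 5)}
after = eliminate(cnf, 1)
```

With m clauses containing a variable and n containing its negation, one step can produce up to m * n resolvents, which is the eager blow-up that DPLL avoids by resolving only after a failed model search.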

Lesson #2: Be Eager
• In a DPLL solver, we always close deduction under unit resolution (BCP) before making a decision.
  • Guides decision making in model search
  • Guides resolution steps in failure generalization
  • BCP updated after decision making and clause learning
• Implications:
  • Be eager with inexpensive deduction.
  • Deduce all the cheap facts before trying any expensive ones.
  • Let the expensive deduction drive the cheap deduction
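Closing deduction under unit resolution can be sketched as a simple fixed-point loop. This is an illustrative, unoptimized version (real solvers use watched literals); the function name `unit_propagate` and the clause encoding as tuples of signed integers are assumptions of the sketch.

```python
def unit_propagate(clauses, assignment):
    """Close `assignment` (a set of literals taken to be true) under unit
    resolution. Returns (assignment, conflict), where conflict is a clause
    with all literals false, or None if propagation finished cleanly."""
    assignment = set(assignment)
    changed = True
    while changed:
        changed = False
        for cl in clauses:
            if any(lit in assignment for lit in cl):
                continue  # clause already satisfied
            unassigned = [lit for lit in cl if -lit not in assignment]
            if not unassigned:
                return assignment, cl  # every literal is false: conflict
            if len(unassigned) == 1:
                assignment.add(unassigned[0])  # unit clause forces a literal
                changed = True
    return assignment, None

# Deciding x1 forces x2, then x3, then x4.
implied, conflict = unit_propagate([(-1, 2), (-2, 3), (-3, -1, 4)], {1})
```

A returned conflict clause is the starting point for the resolution steps that generalize the failure.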

Lesson #3: Learn from the Past
• Facts useful in one particular case are likely to be useful in other cases.
• This principle is embodied in
  • Clause learning
  • Deduction-based decision heuristics (e.g., VSIDS)
• Implication: Deduce facts that have been useful in the past.
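A rough sketch of an activity-based decision heuristic in the spirit of VSIDS follows. It is a simplification: the class name `Activity`, the decay value and the bump rule are assumptions for illustration, and actual solvers differ in details.

```python
class Activity:
    """Simplified VSIDS-like scores: variables in recently learned
    clauses get larger bumps, so recent conflicts dominate decisions."""

    def __init__(self, variables, decay=0.95):
        self.score = {v: 0.0 for v in variables}
        self.increment = 1.0
        self.decay = decay  # assumed value, chosen for illustration

    def bump(self, learned_clause):
        for lit in learned_clause:
            self.score[abs(lit)] += self.increment
        # Growing the increment is equivalent to decaying all old scores.
        self.increment /= self.decay

    def pick(self, unassigned):
        """Choose the unassigned variable with the highest activity."""
        return max(unassigned, key=lambda v: self.score[v])

act = Activity([1, 2, 3])
act.bump([1, -2])   # first learned clause
act.bump([2, 3])    # second, more recent learned clause
choice = act.pick([1, 2, 3])
```

Variable 2 appears in both learned clauses and is picked first, which is one way of preferring facts that were useful in the past.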

Static Analysis
• Compute the least fixed-point of an abstract transformer
  • This is the strongest invariant the analysis can provide
• Inexpensive analyses:
  • value set analysis
  • affine equalities, etc.
• These analyses lose information at a merge: joining x = y with x = z yields T (no information)
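The loss of information at a merge can be illustrated with a deliberately naive equality domain. In this sketch an abstract state is just the set of equalities known to hold, and the join keeps only facts true on both branches; the domain and the names `eq` and `join` are assumptions, and the sketch ignores equalities implied by transitivity.

```python
def eq(a, b):
    """Hypothetical helper: an unordered equality fact a = b."""
    return frozenset({a, b})

def join(state_a, state_b):
    """Join of two abstract states: keep only the equalities that hold
    on both incoming branches (set intersection)."""
    return state_a & state_b

then_branch = frozenset({eq("x", "y")})   # x = y holds on one branch
else_branch = frozenset({eq("x", "z")})   # x = z holds on the other
merged = join(then_branch, else_branch)   # empty set of facts, i.e. T
```

The merged state cannot express "x = y or x = z", so the join falls back to T.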

Predicate abstraction
• Abstract transformer:
  • strongest Boolean postcondition over given predicates
• Advantage: does not lose information at a merge
  • join is disjunction: x = y joined with x = z gives x=y ∨ x=z
• Disadvantage:
  • Abstract post is very expensive!
  • Computes information about predicates with no relevance justification

PA with CEGAR loop
• Choose predicates to refute counterexamples (cex's)
  • Generalizes failures
  • Some relevance justification
• Still performs expensive deduction without justification
  • strongest Boolean postcondition
• Fails to learn from past
  • Start fresh each iteration
  • Forgets expensive deductions
[Figure: CEGAR loop. Choose initial T#; model check abstraction T#; if true, done; otherwise a Cex is produced. Can the Cex be extended from T# to T? If yes, report the Cex; if no, add predicates to T# and repeat.]

Boolean Programs
• Abstract transformer
  • Weaker than predicate abstraction
  • Evaluates predicates independently -- loses correlations
• Example, for predicates x=0 and y=0:
  • Predicate abstraction: {T} x=y; {x=0 ⇔ y=0}
  • Boolean programs: {T} x=y; {T}
• Advantages
  • Computes less expensive information eagerly
• Disadvantages
  • Still computes expensive information without justification
  • Still uses CEGAR loop
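The difference between the two abstract transformers can be checked by brute force on the statement x=y; with predicates x=0 and y=0. This is only a sketch: enumerating a small integer range stands in for the theorem-prover calls a real tool would make, the Cartesian result is derived from the exact one purely for brevity, and all names are hypothetical.

```python
from itertools import product

PREDICATES = [lambda s: s["x"] == 0, lambda s: s["y"] == 0]
DOMAIN = range(-2, 3)  # assumption: a small range stands in for the integers

def assign_x_y(state):
    """Concrete semantics of the statement x = y."""
    return {"x": state["y"], "y": state["y"]}

def boolean_post(pre_states, stmt):
    """Strongest Boolean postcondition: the set of predicate valuations
    that are actually reachable after executing stmt."""
    return {tuple(p(stmt(s)) for p in PREDICATES) for s in pre_states}

def cartesian_post(pre_states, stmt):
    """Independent evaluation: track the possible values of each predicate
    separately, then allow every combination of them."""
    exact = boolean_post(pre_states, stmt)
    per_predicate = [{v[i] for v in exact} for i in range(len(PREDICATES))]
    return set(product(*per_predicate))

all_states = [{"x": x, "y": y} for x, y in product(DOMAIN, DOMAIN)]
exact = boolean_post(all_states, assign_x_y)        # encodes x=0 <=> y=0
independent = cartesian_post(all_states, assign_x_y)  # all four valuations
```

The exact post contains only the valuations where both predicates agree, while the independent evaluation admits all four combinations, which is the lost correlation.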

Lazy Predicate Abstraction
• Unwind the program CFG into a tree
• Refine paths as needed to refute errors
  • Add predicates along path to allow refutation of error
• Refinement is local to an error path
• Search continues after refinement
  • Do not start fresh -- no big CEGAR loop
• Previously useful predicates applied to new vertices
[Figure: tree unwinding with labels x=y, x=y, y=0 and an error vertex ERR!]

SAT-based BMC
• Inherits all the properties of SAT
• Deduction limited to propositional logic
  • Cannot directly infer facts like x ≤ y
  • Inexpensive deduction limited to BCP
[Figure: pipeline. Program → Loop Unwinding → Convert to Bit Level → SAT]

SAT-based with Static Analysis
• Allows richer class of inexpensive deductions
• Inexpensive deductions not updated after decisions and clause learning
  • Coupling could be tighter
  • Perhaps using lazy decision procedures?
[Figure: pipeline. Program → Static Analysis → Loop Unwinding → Convert to Bit Level → SAT. Example labels: decision; x=y; x=z; x=z]

Lazy abstraction and interpolants
• A way to apply the lessons of SAT to lazy abstraction
• Keep the advantages of lazy abstraction...
  • Local refinement (be lazy)
  • No "big loop" as in CEGAR (learn from the past)
• ...while avoiding the disadvantages of predicate abstraction...
  • no eager image computation
• ...and propagating inexpensive deductions eagerly
  • as in static analysis

Interpolation Lemma (Craig, 57)
• Notation: L(Σ) is the set of FO formulas over the symbols of Σ
• If A ∧ B = false, there exists an interpolant A' for (A,B) such that:
  • A ⇒ A'
  • A' ∧ B = false
  • A' ∈ L(A) ∩ L(B)
• Example:
  • A = p ∧ q, B = ¬q ∧ r, A' = q
• Interpolants from proofs
  • in certain quantifier-free theories, we can obtain an interpolant for a pair A,B from a refutation in linear time. [McMillan 05]
  • in particular, we can have linear arithmetic, uninterpreted functions, and restricted use of arrays
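The propositional example can be verified by exhaustive truth-table enumeration. The sketch below checks the three interpolant conditions for A = p ∧ q, B = ¬q ∧ r, A' = q; the variable names are chosen for this illustration, and the symbol sets are written out by hand rather than computed from the formulas.

```python
from itertools import product

A = lambda p, q, r: p and q
B = lambda p, q, r: (not q) and r
A_prime = lambda p, q, r: q   # candidate interpolant

valuations = list(product([False, True], repeat=3))

# A and B are inconsistent, so an interpolant must exist.
a_b_inconsistent = not any(A(*v) and B(*v) for v in valuations)
# Condition 1: A implies A'.
a_implies_interp = all((not A(*v)) or A_prime(*v) for v in valuations)
# Condition 2: A' and B are inconsistent.
interp_b_inconsistent = not any(A_prime(*v) and B(*v) for v in valuations)
# Condition 3: A' uses only symbols common to A and B.
symbols_A, symbols_B, symbols_interp = {"p", "q"}, {"q", "r"}, {"q"}
over_common_symbols = symbols_interp <= (symbols_A & symbols_B)
```

All four flags evaluate to True, so q summarizes exactly what A contributes to the contradiction with B.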

Interpolants for sequences
• Let A_1...A_n be a sequence of formulas
• A sequence A'_0...A'_n is an interpolant for A_1...A_n when
  • A'_0 = True
  • A'_{i-1} ∧ A_i ⇒ A'_i, for i = 1..n
  • A'_n = False
  • and finally, A'_i ∈ L(A_1...A_i) ∩ L(A_{i+1}...A_n)
• In other words, the interpolant is a structured refutation of A_1...A_n
[Figure: chain True ⇒ A'_1 ⇒ A'_2 ⇒ ... ⇒ False, where each step passes through the corresponding formula A_1, A_2, A_3, ...]

Interpolants as Floyd-Hoare proofs
• Path refinement procedure: SSA sequence → Prover → proof → Interpolation → structured proof
• Example path x=y; y++; [x=y]

  Statement   SSA constraint   Interpolant after the step
  (start)                      True
  x=y;        x1 = y0          x1 = y0
  y++;        y1 = y0+1        y1 > x1
  [x=y]       x1 = y1          False

• 1. Each formula implies the next
• 2. Each is over common symbols of prefix and suffix
• 3. Begins with true, ends with false
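The implication chain for the path x=y; y++; [x=y] can be sanity-checked by enumeration. This is a bounded check over a small integer range (an assumption of the sketch), not a proof, and it does not check the condition on common symbols; the helper name `is_sequence_interpolant` is hypothetical.

```python
from itertools import product

# SSA constraints A_1..A_3 for the path x=y; y++; [x=y]
steps = [
    lambda s: s["x1"] == s["y0"],        # x = y
    lambda s: s["y1"] == s["y0"] + 1,    # y++
    lambda s: s["x1"] == s["y1"],        # guard [x = y]
]
# Candidate interpolant sequence A'_0..A'_3
interp = [
    lambda s: True,
    lambda s: s["x1"] == s["y0"],
    lambda s: s["y1"] > s["x1"],
    lambda s: False,
]

RANGE = range(-3, 4)  # assumption: bounded values only
states = [dict(zip(("x1", "y0", "y1"), v)) for v in product(RANGE, repeat=3)]

def is_sequence_interpolant(steps, interp, states):
    """Check A'_{i-1} and A_i imply A'_i on every enumerated state."""
    return all(
        (not (interp[i](s) and steps[i](s))) or interp[i + 1](s)
        for i in range(len(steps))
        for s in states
    )

ok = is_sequence_interpolant(steps, interp, states)
```

Weakening the middle formula y1 > x1 to True breaks the last implication, since the guard x1 = y1 alone is satisfiable.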

Lazy abstraction -- an example
Program fragment:
    do {
        lock();
        old = new;
        if (*) {
            unlock();
            new++;
        }
    } while (new != old);
Control-flow graph (edge labels): L=0 | [L!=0] | L=1; old=new | L=0; new++ | [new!=old] | [new==old]

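Unwinding the CFG
• Label error state with false, by refining labels on path
[Figure: control-flow graph beside its unwinding. Vertex 0 (label T) has edge L=0 to vertex 1, and vertex 1 has edge [L!=0] to error vertex 2. Refinement changes the label of vertex 1 from T to L=0 and the label of vertex 2 from T to F.]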

Unwinding the CFG
• Covering: state 5 is subsumed by state 1.
[Figure: the unwinding extended with vertices 3 to 6 along the edges L=1; old=new, then L=0; new++, then [new!=old], then [L!=0]. After refinement, vertices 4 and 5 carry the label L=0 and error vertex 6 carries F.]

Unwinding the CFG
• Another cover. Unwinding is now complete.
[Figure: the finished unwinding, vertices 0 to 11, beside the control-flow graph.]

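Covering step
• If ψ(x) ⇒ ψ(y)...
  • add covering arc x ▷ y
  • remove all z ▷ w for w a descendant of x (x and its descendants are now covered, and a covered vertex may not cover others)
• We restrict covers to be descending in a suitable total order on vertices. This prevents covering from diverging.
[Figure: a vertex labeled x=y covered by a vertex labeled x ≤ y; an X marks a covering arc that is removed.]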
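A minimal sketch of the covering step follows, under several stated assumptions: vertices are integers numbered in creation order (used as the total order), labels are sets of conjuncts so that implication is approximated by set inclusion, and the class and location names are hypothetical. On adding x ▷ y the sketch removes arcs pointing into x or its descendants, which is the reading that keeps the property that a covering vertex is itself uncovered.

```python
class Unwinding:
    def __init__(self):
        self.parent = {}      # vertex -> parent vertex (None for the root)
        self.loc = {}         # vertex -> CFG location
        self.label = {}       # vertex -> frozenset of conjuncts (empty = T)
        self.covers = set()   # pairs (x, y) meaning x is covered by y

    def add(self, v, parent, loc, label):
        self.parent[v], self.loc[v] = parent, loc
        self.label[v] = frozenset(label)

    def ancestors_or_self(self, v):
        while v is not None:
            yield v
            v = self.parent[v]

    def is_covered(self, v):
        """Covered: v or one of its ancestors is the tail of a covering arc."""
        tails = {x for x, _ in self.covers}
        return any(a in tails for a in self.ancestors_or_self(v))

    def cover(self, x, y):
        # Covers must descend in the total order and stay at one location.
        if not (y < x and self.loc[x] == self.loc[y]):
            return False
        if self.is_covered(x) or self.is_covered(y):
            return False
        # Syntactic stand-in for psi(x) => psi(y): y's conjuncts all occur in x.
        if not self.label[y] <= self.label[x]:
            return False
        # Arcs into x or its descendants are dropped: those vertices are covered.
        self.covers = {(z, w) for z, w in self.covers
                       if x not in self.ancestors_or_self(w)}
        self.covers.add((x, y))
        return True

# Loosely modeled on the lock example; location names are made up.
u = Unwinding()
u.add(0, None, "entry", [])
u.add(1, 0, "head", ["L=0"])
u.add(3, 1, "locked", [])
u.add(4, 3, "unlocked", ["L=0"])
u.add(5, 4, "head", ["L=0"])
covered_5_by_1 = u.cover(5, 1)
```

The set-inclusion test is sound but weak (it would not see that x=y implies x ≤ y); a real implementation would call a decision procedure there.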

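Refinement step
• Label an error vertex False by refining the path to that vertex with an interpolant for that path.
• By refining with interpolants, we avoid predicate image computation.
• Refinement may remove covers
[Figure: unwinding with edges x = 0, [x≠y], [x=y], y++, y=2 and [y=0], all vertices initially labeled T. After refinement the error path carries the labels x=0, y=0, y≠0 and the error vertex is labeled F; an X marks a removed cover.]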

Forced cover
• Try to refine a sub-path to force a cover
  • show that path from nearest common ancestor of x,y proves ψ(x) at y
• Forced covers allow us to efficiently handle nested control structure
[Figure: the unwinding from the refinement step (edges x = 0, [x≠y], [x=y], y++, y=2, [y=0]) with the callout "refine this path" on the sub-path whose vertex is to receive the label y≠0.]

Incremental static analysis
• Update static analysis of unwinding incrementally
• Static analysis can prevent many interpolant-based refinements
• Interpolant-based refinements can refine static analysis
[Figure: unwinding example with branches [x≠z] and [x=z] and assignments y=1 and y=2. Callouts: "value set" (y ∈ {1,2}, from value set analysis), "refined" (y=2), and "refine this path".]

Applying the lessons from SAT
• Be lazy with expensive deductions
  • All path refinements justified
  • No eager predicate image computation
• Be eager with inexpensive deductions
  • Static analysis updated after all changes
  • Refinement and static analysis interact
• Learn from the past
  • Refinements incremental -- no "big CEGAR loop"
  • Re-use of historically useful facts by forced covering

Experiments
• Windows device driver benchmarks from BLAST benchmark suite
  • programs flattened to "simple goto programs"
• Compare performance against BLAST, a lazy predicate abstraction tool
  • No static analysis.
  • Almost all BLAST time spent in predicate image operation.

The Saga Continues
• After these results, Ranjit Jhala modified BLAST
  • vertices inherit predicates from their parents, reducing refinements
  • fewer refinements allows more predicate localization
• Impact also made more eager, using value set analysis

Conclusions
• Caveats
  • Comparing different implementations is dangerous
  • More and better software model checking benchmarks are needed
• Tentative conclusions
  • For control-dominated codes, predicate abstraction is too "eager"
    • better to be more lazy about expensive deductions
  • Propagating inexpensive deductions can produce substantial speedup
    • roughly one order of magnitude for Windows examples
  • Perhaps by applying the lessons of SAT, we can obtain the same kind of rapid performance improvements obtained in that area
    • Note 2-3 orders of magnitude speedup in lazy model checking in 6 months!

Future work
• Procedure summaries
  • Many similar subgraphs in unwinding due to procedure expansions
  • Cannot handle recursion
  • Can we use interpolants to compute approximate procedure summaries?
• Quantified interpolants
  • Can be used to generate program invariants with quantifiers
  • Works for simple examples, but need to prevent number of quantifiers from increasing without bound
• Richer theories
  • In this work, all program variables modeled by integers
  • Need an interpolating prover for bit vector theory
• Concurrency...

Unwinding the CFG
• An unwinding is a tree with an embedding in the CFG
[Figure: unwinding vertices 0, 1, 2, 3, 4, 8 mapped onto the control-flow graph; Mv maps unwinding vertices to CFG locations and Me maps unwinding edges to CFG edges.]

Expansion
• Every non-leaf vertex of the unwinding must be fully expanded...
  • Figure callouts: if this (an unwinding vertex) is not a leaf, and this (an outgoing CFG edge at its location, under Mv and Me) exists, then this (the corresponding unwinding edge) exists.
• ...but we allow unexpanded leaves (i.e., we are building a finite prefix of the infinite unwinding)

Labeled unwinding
• A labeled unwinding is equipped with...
  • a labeling function ψ : V → L(Σ)
  • a covering relation ▷ ⊆ V × V
• Figure callout: these two nodes are covered (they have an ancestor at the tail of a covering arc)
[Figure: the unwinding of the lock example, vertices 0 to 7, with labels T, L=0 and F.]

Well-labeled unwinding
• An unwinding is well-labeled when...
  • ψ(ε) = True, where ε is the root vertex
  • every edge is a valid Hoare triple
  • if x ▷ y then y not covered
[Figure: the unwinding of the lock example, vertices 0 to 7, with labels T, L=0 and F.]

Safe and complete
• An unwinding is
  • safe if every error vertex is labeled False
  • complete if every nonterminal leaf is covered
• Theorem: A CFG with a safe complete unwinding is safe.
[Figure: the finished unwinding of the lock example with its labels and covering arcs.]
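The two definitions translate directly into checks over a labeled unwinding. The sketch below assumes a simple dictionary encoding (parent map, label map, set of covering arcs) and takes the set of terminal leaves as given; the function names and the small example, a simplified fragment of the lock unwinding, are illustrative only.

```python
FALSE = "False"

def covered(v, parent, covers):
    """True if v or one of its ancestors is the tail of a covering arc."""
    tails = {x for x, _ in covers}
    while v is not None:
        if v in tails:
            return True
        v = parent[v]
    return False

def is_safe(label, error_vertices):
    """Safe: every error vertex is labeled False."""
    return all(label[v] == FALSE for v in error_vertices)

def is_complete(parent, covers, terminal):
    """Complete: every nonterminal leaf is covered."""
    inner = set(parent.values())
    leaves = [v for v in parent if v not in inner]
    return all(v in terminal or covered(v, parent, covers) for v in leaves)

# Simplified fragment: vertex 2 is an error leaf, vertex 5 is covered by 1.
parent = {0: None, 1: 0, 2: 1, 3: 1, 4: 3, 5: 4}
label = {0: "True", 1: "L=0", 2: FALSE, 3: "True", 4: "L=0", 5: "L=0"}
covers = {(5, 1)}
safe = is_safe(label, error_vertices={2})
complete = is_complete(parent, covers, terminal={2})
```

When both checks succeed, the theorem above says the labels form an inductive argument that no error location is reachable.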

Unwinding steps
• Three basic operations:
  • Expand a nonterminal leaf
  • Cover: add a covering arc
  • Refine: strengthen labels along a path so error vertex labeled False

Overall algorithm
• Do as much covering as possible
• If a leaf can't be covered, try forced covering
• If the leaf still can't be covered, expand it
• Label all error states False by refining with an interpolant
• Continue until unwinding is safe and complete
