Satisfiability modulo the Theory of Bit Vectors. Alessandro Cimatti, IRST, Trento, Italy, cimatti@irst.itc.it. Joint work with R. Bruttomesso, A. Franzen, A. Griggio, R. Sebastiani. We gratefully acknowledge support from the Academic Research Program of Intel.
Index of the talk • Satisfiability Modulo Theory • The theory of Bit Vectors • Satisfiability Modulo BV • Bit blasting • Eager encoding into Linear Integer Arithmetic • A lazy approach • Conclusions • ( A preview of QF_UFBV32 at SMT-COMP )
SMT in a nutshell • Satisfiability Modulo Theory • or: beyond boolean SAT • Decide the satisfiability of a first order formula with respect to a background theory • Examples of relevant theories • uninterpreted functions: x=y & f(x) != f(y) • difference logic: x – y < 7 • linear arithmetic: 3x + 2y < 12 • arrays: read(write(M, a0, v0), a1) • their combinations • bit vectors
Why SMT • From SAT-based to SMT-based verification • Representation of interesting problems • timed automata • hybrid automata • pipelines • software • Efficient solving • leverage availability of structural information • hopefully retaining efficiency of boolean SAT
Satisfiability Modulo Theory • Satisfiability: • is there a truth-assignment to boolean variables • and a valuation to individual variables • such that formula evaluates to true? • Standard semantics for FOL • Assignment to individual variables • Induces truth values to atoms • Truth assignment to boolean atoms • Induced value to whole formula
[Figure: propositional structure over boolean atoms (P) and theory atoms (TA) on variables x, y, z, w, annotated with +/- truth values]
Eager Approach to SMT • Main idea: compilation to SAT • STEP1: Theory part compiled to equisatisfiable pure SAT problem • STEP2: run propositional SAT solver
[Figure: propositional structure over boolean atoms (P) and theory atoms (TA) on variables x, y, z, w]
[Figure: lifted theory and propositional structure, with theory atoms (TA) and boolean atoms (P)]
The Lazy approach • Ingredients • a boolean SAT solver • a theory solver • The boolean solver is modified to enumerate boolean (partial) models • The theory solver is used to check for theory consistency
[Figure: propositional structure with theory atoms (TA) and boolean atoms (P) on variables x, y, z, w]
MathSAT: intuitions • Two ingredients: boolean search and theory reasoning • find boolean model • theory atoms treated as boolean atoms • truth values to boolean and theory atoms • model propositionally satisfies the formula • check consistency wrt theory • set of constraints induced by truth values to theory atoms • existence of values to theory variables • Example: (P v (x = 3)) & (Q v (x – y < 1)) & (y < 2) & (P xor Q) • Boolean model • !P, (x = 3), Q, (x – y < 1), (y < 2) • Check (x = 3), (x – y < 1), (y < 2) • Theory contradiction! • Another boolean model • P , !(x = 3) , !Q, (x – y < 1), (y < 2) • Check !(x = 3), (x – y < 1), (y < 2) • Consistent: e.g. x := 0, y := 0
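The two-step loop on this slide can be sketched on the slide's own example. This is a minimal illustration, not MathSAT's implementation: the boolean enumeration is brute force rather than DPLL, and `theory_check` is a hypothetical stand-in that searches a small bounded integer range (an assumption that is only adequate for this toy formula).

```python
from itertools import product

# Theory atoms of the slide's example, as predicates over integer x and y.
ATOMS = {
    "x=3": lambda x, y: x == 3,
    "x-y<1": lambda x, y: x - y < 1,
    "y<2": lambda x, y: y < 2,
}
BOOL_VARS = ["P", "Q"]


def formula(v):
    """Boolean abstraction: theory atoms are treated as plain boolean atoms."""
    return ((v["P"] or v["x=3"]) and (v["Q"] or v["x-y<1"])
            and v["y<2"] and (v["P"] != v["Q"]))


def theory_check(assignment, bound=5):
    """Toy theory solver (assumption): bounded search for x, y in [-bound, bound].

    Returns a witness (x, y) matching the truth values given to the theory
    atoms, or None if no such values exist within the bound.
    """
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if all(ATOMS[a](x, y) == assignment[a] for a in ATOMS):
            return (x, y)
    return None


def lazy_smt():
    """Enumerate boolean models, then check each one against the theory."""
    names = BOOL_VARS + list(ATOMS)
    for values in product([False, True], repeat=len(names)):
        v = dict(zip(names, values))
        if not formula(v):
            continue  # not a boolean model
        witness = theory_check(v)
        if witness is not None:
            return v, witness  # boolean model that is theory consistent
    return None
```

Calling `theory_check` on the first boolean model from the slide (all three theory atoms true) returns `None`, which corresponds to the theory contradiction; the second model yields a witness.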
Boolean SAT: search space • The DPLL procedure • Incremental construction of satisfying assignment • Backtrack/backjump on conflict • Learn reason for conflict • Splitting heuristics • [Figure: search tree over P, Q, R, S, T with a satisfying leaf marked SAT]
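A bare-bones version of the DPLL procedure might look as follows. It is a sketch only: it does unit propagation and chronological backtracking, and leaves out the backjumping, learning and splitting heuristics listed above. Clauses are lists of non-zero integers (a negative number is a negated variable), which is an encoding chosen here for illustration.

```python
def unit_propagate(clauses, assignment):
    """Assign literals forced by unit clauses; return None on conflict."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var, want = abs(lit), lit > 0
                if var in assignment:
                    if assignment[var] == want:
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                return None  # every literal is false: conflict
            if len(unassigned) == 1:
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0  # forced value
                changed = True
    return assignment


def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict var -> bool) or None."""
    assignment = unit_propagate(clauses, dict(assignment or {}))
    if assignment is None:
        return None
    variables = {abs(lit) for clause in clauses for lit in clause}
    free = sorted(variables - set(assignment))
    if not free:
        return assignment  # all variables assigned without conflict
    var = free[0]  # naive splitting choice
    for value in (True, False):
        result = dpll(clauses, {**assignment, var: value})
        if result is not None:
            return result
    return None  # both branches failed: backtrack
```

For instance, `dpll([[1, 2], [-1, 3], [-2, -3]])` returns a model, while `dpll([[1], [-1]])` returns `None`.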
MathSAT: approach • DPLL-based enumeration of boolean models • Retain all propositional optimizations • Conflict-directed backjumping, learning • No overhead if no theory reasoning • Tight integration between • boolean reasoning and • theory reasoning
MathSAT: search space • Many boolean models are not theory consistent! • [Figure: search tree over P, Q, R, S, T with leaves labelled Bool and Math, one of them marked SAT]
Early pruning • Check theory consistency of partial assignments • [Figure: search tree over P, Q, R, S, T with early-pruning theory checks (EP:Math) at internal nodes; some branches are pruned away in the EP step, and one leaf is marked SAT]
Bit Vectors: Example • input a, b, c, d : reg[N]; • Left data path: LTmp0 = a; LTmp1 = 2 * b; LTmp2 = LTmp0 + LTmp1; LTmp3 = 4 * c; LTmp4 = LTmp2 + LTmp3; LTmp5 = 8 * d; LOut = LTmp4 + LTmp5; • i.e. ((a + 2b) + 4c) + 8d • Right data path: RTmp0 = d; RTmp1 = RTmp0 << 1; RTmp2 = c + RTmp1; RTmp3 = RTmp2 << 1; RTmp4 = b + RTmp3; RTmp5 = RTmp4 << 1; ROut = a + RTmp5; • i.e. a + ((b + ((c + (d<<1)) <<1)) <<1) • Are they equivalent? I.e. LOut = ROut ?
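For very small N the question can be settled by exhaustive simulation, which gives a reference point for what a solver has to establish symbolically. The sketch below transcribes the two data paths, assuming every intermediate register wraps modulo 2^N; the function names are made up for illustration.

```python
from itertools import product


def left(a, b, c, d, n):
    """((a + 2b) + 4c) + 8d, every register truncated to n bits."""
    mask = (1 << n) - 1
    t1 = (2 * b) & mask
    t2 = (a + t1) & mask
    t3 = (4 * c) & mask
    t4 = (t2 + t3) & mask
    t5 = (8 * d) & mask
    return (t4 + t5) & mask


def right(a, b, c, d, n):
    """a + ((b + ((c + (d << 1)) << 1)) << 1), truncated to n bits."""
    mask = (1 << n) - 1
    t1 = (d << 1) & mask
    t2 = (c + t1) & mask
    t3 = (t2 << 1) & mask
    t4 = (b + t3) & mask
    t5 = (t4 << 1) & mask
    return (a + t5) & mask


def equivalent(n):
    """Compare both data paths on all 2^(4n) input combinations."""
    return all(left(*v, n) == right(*v, n)
               for v in product(range(1 << n), repeat=4))
```

The enumeration grows as 2^(4N), so it is only feasible for N of a few bits; it does not scale to the 16, 32 or 64 bit cases the talk is concerned with.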
Fixed Width Bit Vectors • Constants • 0b00001111, 0xFFFF, … • Variables • valued over BitVectors of corresponding width • implicit restriction to finite domain • Function symbols • selection: x[15:0] • concatenation: y :: z • bitwise operators: x && y, z || w, … • arithmetic operators: x + y, z * w, … • shifting: x << 2, y >> 3 • Predicate symbols • comparators: =, ≠ , > , < , ≥ , ≤
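The function symbols above can be mimicked on Python integers, which helps when checking small examples by hand. This is an illustrative sketch: values are assumed to be unsigned, widths are passed explicitly, and the helper names are not taken from any tool.

```python
def select(x, hi, lo):
    """Selection x[hi:lo]: bits lo..hi of x, as a (hi - lo + 1)-bit value."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def concat(x, y, y_width):
    """Concatenation x :: y, with x in the high bits and y in the low bits."""
    return (x << y_width) | y


def bv_add(x, y, width):
    """Addition modulo 2^width."""
    return (x + y) & ((1 << width) - 1)


def bv_shl(x, k, width):
    """Left shift by k, dropping bits that fall outside the width."""
    return (x << k) & ((1 << width) - 1)
```

For example, `concat(0xAB, 0xCD, 8)` gives `0xABCD`, and `bv_add(0xFF, 1, 8)` wraps around to `0`.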
Fragments of BV theory • Core • selection • concatenation • Bitwise operators • x && y, x || y, x ^ y • Arithmetic operators • x + y, x – y, c * x • Core + Bitwise + Arithmetic • Complexity of equality between BV terms • Core is in P • Core + B + A in NP • Variable width bit vectors: not covered here • core is in NP • small additions yield undecidability
Decision procedures for BV • Many approaches • Cyrluk, Moeller, Ruess • Moeller, Ruess • Bjørner, Pichora • Barrett, Dill, Levitt • Focus on deciding conjunctions of literals • Emphasis on proof obligations in ITP • some emphasis on variable width, generic wrt N • Shostak-style integration • canonization • solving
Satisfiability modulo Bit Vectors • Applications of interest • RTL hardware descriptions are essentially bit vectors • assembly-level programs • software with finite precision arithmetic • Key feature • combination of control flow and data flow • In principle, boolean logic can be encoded into BV • control (boolean logic) encoded into width 1 BVs • Likely inefficient in comparison to SAT • More natural to keep them separate at modeling • structural info can be exploited for verification
Approaches to SMT(BV) • Bit blasting • Eager Encoding into LA • Lazy approach
SMT(BV) via Bit Blasting • Boolean variables: untouched • Bit vector variables as collections of (unrelated) boolean variables • [x0, x1, …, x63] • Selection/concatenations are trivial • static detection • Equalities / Assignments: x = y • (x0 <-> y0) & (x1 <-> y1) & … & (x63 <-> y63) • Bitwise operators: x && y • [x0 & y0, x1 & y1, …, x63 & y63] • Arithmetic operators: x + y • BVADD([x0, …, x63], [y0, …, y63])
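The per-bit definitions can be made concrete with a ripple-carry adder standing in for BVADD. The sketch below evaluates the gate-level definitions on concrete bits instead of emitting clauses for a SAT solver, and the choice of a ripple-carry adder is an assumption; bits are stored least significant first.

```python
def to_bits(x, n):
    """Blast an n-bit value into a list of booleans, least significant first."""
    return [bool((x >> i) & 1) for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)


def blast_eq(xs, ys):
    """x = y as (x0 <-> y0) & (x1 <-> y1) & ..."""
    return all(x == y for x, y in zip(xs, ys))


def blast_and(xs, ys):
    """Bitwise and as [x0 & y0, x1 & y1, ...]."""
    return [x and y for x, y in zip(xs, ys)]


def blast_add(xs, ys):
    """Ripple-carry adder: one sum bit and one carry per position."""
    carry = False
    out = []
    for x, y in zip(xs, ys):
        out.append(x ^ y ^ carry)                # sum bit
        carry = (x and y) or (carry and (x ^ y))  # carry into next position
    return out  # final carry dropped: addition is modulo 2^n
```

The carry chain is where the additional variables of a real encoding come from: each position depends on all lower positions.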
Comparison of Data Paths • input a, b, c, d : reg[N]; • Left data path: LTmp0 = a; LTmp1 = 2 * b; LTmp2 = LTmp0 + LTmp1; LTmp3 = 4 * c; LTmp4 = LTmp2 + LTmp3; LTmp5 = 8 * d; LOut = LTmp4 + LTmp5; • i.e. ((a + 2b) + 4c) + 8d • Right data path: RTmp0 = d; RTmp1 = RTmp0 << 1; RTmp2 = c + RTmp1; RTmp3 = RTmp2 << 1; RTmp4 = b + RTmp3; RTmp5 = RTmp4 << 1; ROut = a + RTmp5; • i.e. a + ((b + ((c + (d<<1)) <<1)) <<1) • Are they equivalent? I.e. LOut = ROut ?
Bit Blasting Words • a,b,c,d,… • blasted to [a1,…,aN], [b1,…,bN], [c1,…,cN], [d1,…,dN], … • LOut != ROut • (LOut.1 != ROut.1) or … or (LOut.N != ROut.N) • LTmp1 = 2 * b • formula in 2N vars, conjunction of N iffs • LTmp2 = LTmp0 + LTmp1 • formula relating 3N vars • possibly additional vars required (e.g. carries) • N = 16 bits? • 13 secs • N = 32 bits? • 170 secs • “But obviously N = 64 bits!” • stopped after 2h CPU time • Scalability with respect to N???
Bit-Blasting: Pros and Cons • Bottlenecks • dependency on word width • “wrong” level of abstraction • boolean synthesis of arithmetic circuits • assignments are pervasive • conflicts are very fine grained • e.g. discover x < y • Advantages • let the SAT solver do all the work • and nowadays SAT solvers are tough nuts to crack • amalgamation of the decision process • no distinction between control and data • conflicts can be as fine grained as possible • built-in capability to generate “new atoms”
Enhancements to BitBlasting • Tuning SAT solver on structural information • e.g. splitting heuristic for adders • Preprocessing + SAT [GBD05] • rewrite and normalize bit vector terms • bit blasting to SAT
From BV to LIA • RTL-Datapath Verification using Integer Linear Programming [BD01] • BV constants as integers • 0b32_1111 as 15 • BV variables as integer valued variables, with range constraints • reg x [31:0] as x in range [0, 2^32) • Assignments treated as equality, e.g. x = y • Arithmetic, e.g. z = x + y • Linear arithmetic? not quite! BV Arithmetic is modulo 2^N • z = x + y - 2^N s, with z in [0, 2^N) and s in {0, 1} • Concatenation: x :: y as 2^n x + y, where n is the width of y • Selection: relational encoding (based on integrity) • x[23:16] as xm, where • x = 2^24 xh + 2^16 xm + xl, xh in [0, 2^8), xm in [0, 2^8), xl in [0, 2^16) • Bitwise operators • based on selection of individual bits • SOLVER • the omega test
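The integer constraints for addition and selection can be checked on concrete values. The sketch below only tests whether given integers satisfy the constraints; it does not solve them (the slide names the omega test for that), and the function names are hypothetical.

```python
def add_constraints_hold(x, y, z, n):
    """z = x + y - 2^n * s for some s in {0, 1}, with z in [0, 2^n)."""
    return 0 <= z < 2 ** n and any(z == x + y - 2 ** n * s for s in (0, 1))


def selection_constraints_hold(x, xh, xm, xl, hi, lo, width):
    """x = 2^(hi+1) * xh + 2^lo * xm + xl with range constraints on each part.

    When this holds, xm is the integer value of x[hi:lo].
    """
    return (x == 2 ** (hi + 1) * xh + 2 ** lo * xm + xl
            and 0 <= xl < 2 ** lo
            and 0 <= xm < 2 ** (hi - lo + 1)
            and 0 <= xh < 2 ** (width - hi - 1))
```

With the slide's 32-bit example, `x = 0x12345678` splits into `xh = 0x12`, `xm = 0x34` and `xl = 0x5678`, so `x[23:16]` is `0x34`. The range constraints are what make the decomposition unique.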
From SMT(BV) into SMT(LIA) • Generalizes [BD01] to deal with boolean structure • Eager encoding into SMT(LIA) • Unfortunately, not very efficient • More precisely, a failure
Retrospective Analysis • Crazy approach? • Arithmetic • Linear arithmetic? not quite! BV Arithmetic is modulo 2^N • Selection and Concatenation • an easy problem becomes expensive! • Bitwise operators • HARD!!! • Available solvers not adequate • integers with infinite precision • reasoning with integers may be hard (e.g. BnB within real relaxation) • Functional dependencies are lost! • A clear culprit: static encoding • depending on control flow, same signal is split in different parts • z = if P then x[7:0] :: y[3:0] else x[5:2] :: y[10:3] • x, y and also z are split more than needed • the notion of “maximal chunk” depends on P !!!
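The effect of static encoding on the if-then-else example can be illustrated by counting the chunks each signal is cut into. This is a simplified sketch: it only collects the cut points named directly by the selections, ignores cuts propagated through z, and assumes x and y are 16 bits wide, since the slide does not give their widths.

```python
WIDTH = 16  # assumed width of x and y (not stated on the slide)

# Selections (variable, hi, lo) used in each branch of
# z = if P then x[7:0] :: y[3:0] else x[5:2] :: y[10:3]
THEN_BRANCH = [("x", 7, 0), ("y", 3, 0)]
ELSE_BRANCH = [("x", 5, 2), ("y", 10, 3)]


def cut_points(selections, var, width):
    """Bit positions strictly inside var where a selection boundary falls."""
    cuts = set()
    for name, hi, lo in selections:
        if name == var:
            cuts.update({lo, hi + 1})
    return sorted(c for c in cuts if 0 < c < width)


def chunk_count(selections, var, width=WIDTH):
    """Number of chunks var is split into by the given selections."""
    return len(cut_points(selections, var, width)) + 1
```

Under these assumptions, x is split into 2 chunks when P is true and 3 when P is false, but a static encoding that must cover both branches cuts it into 4 (`chunk_count(THEN_BRANCH + ELSE_BRANCH, "x")`).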
A lazy approach • Based on standard MathSAT schema • DPLL-based model enumeration • Dedicated Solver for Bit vectors • The encoding leverages information resulting from decisions • Given values to control variables, the data path is easier to deal with (e.g. maximal chunks are bigger) • Layering in the theory solver • equality reasoning • limited simplification rules • full blown bit vector solver only at the end
The architecture • [Figure: architecture diagram with components Boolean enumeration, BV solver, EUF reasoning, BV rewriter, LIA encoding]
Rewriting rules • evaluation of constant terms • 0b8_01010101[4:2] becomes 0b3_101 • rules for equality • x = y and Phi(x) becomes Phi(y) • based on congruence closure • splitting concatenations • (x :: y) = z becomes x = z[h_n] && y = z[l_n]
Rewriting rules • pushing selections • (x && y)[7:0] becomes (x[7:0] && y[7:0]) • (x :: y)[23:8] becomes (x[7:0] :: y[15:8]) (for y of width 16) • “pigeon-hole” rules • from (x != 0 & x != 1 & x != 2 & x < 3) derive false
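Constant evaluation and selection pushing can be sketched as a one-step rewriter over a tiny term language. The tuple representation below is invented for illustration and is not how the BV rewriter in the talk is built (that one works on an EUF reasoner); equality and pigeon-hole rules are left out.

```python
# Terms (hypothetical representation):
#   ("var", name, width)   ("const", value, width)
#   ("and", t1, t2)        ("concat", t1, t2)      ("sel", t, hi, lo)


def width(t):
    kind = t[0]
    if kind in ("var", "const"):
        return t[2]
    if kind == "and":
        return width(t[1])
    if kind == "concat":
        return width(t[1]) + width(t[2])
    if kind == "sel":
        return t[2] - t[3] + 1
    raise ValueError(kind)


def evaluate(t, env):
    """Value of term t as an unsigned integer, given variable values in env."""
    kind = t[0]
    if kind == "var":
        return env[t[1]]
    if kind == "const":
        return t[1]
    if kind == "and":
        return evaluate(t[1], env) & evaluate(t[2], env)
    if kind == "concat":
        return (evaluate(t[1], env) << width(t[2])) | evaluate(t[2], env)
    if kind == "sel":
        _, sub, hi, lo = t
        return (evaluate(sub, env) >> lo) & ((1 << (hi - lo + 1)) - 1)
    raise ValueError(kind)


def push_selection(t):
    """Apply one rewriting step to a selection, if a rule matches."""
    if t[0] != "sel":
        return t
    _, sub, hi, lo = t
    if sub[0] == "const":  # evaluation of constant terms
        return ("const", evaluate(t, {}), hi - lo + 1)
    if sub[0] == "and":    # (x && y)[hi:lo] -> x[hi:lo] && y[hi:lo]
        return ("and", ("sel", sub[1], hi, lo), ("sel", sub[2], hi, lo))
    if sub[0] == "concat":
        w = width(sub[2])  # width of the low part
        if hi < w:         # selection falls entirely in the low part
            return ("sel", sub[2], hi, lo)
        if lo >= w:        # selection falls entirely in the high part
            return ("sel", sub[1], hi - w, lo - w)
        return ("concat", ("sel", sub[1], hi - w, 0), ("sel", sub[2], w - 1, lo))
    return t
```

With `x` of width 8 and `y` of width 16, rewriting `("sel", ("concat", x, y), 23, 8)` yields the concatenation of `x[7:0]` and `y[15:8]`, matching the rule on the slide.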
BV rewriter • Rules are applied until • fix point reached • contradiction found • Implementation based on EUF reasoner • rules as merges between eq classes • Open issues • incrementality/backtrackability • selective rule activation • conflict set reconstruction • When it fails …
LIA encoding (the last hope) • LIA encoding • identification of maximal slices • “purification”: separating out arithmetic and BW by introduction of additional variables • NB: on resulting problems • LIA encoding always superior to bit blasting!!! • cf. [BD01]
Status of Implementation • Implementation still in prototypical state • “Does a lot of stupid things” • conflict minimization by deletion filtering • checking that conflicts are in fact minimal • unnecessary calls to LA for SAT clusters • calling LA solver implemented as dump on file, and run external MathSAT • huge conflict sets
Competitors • Run against MiniSAT 1.14 • ~ winner of SAT competition in 2005 • KEY REMARK: • boolean methods are very mature • A good reason for giving up?
Test benches • 74 benchmarks from industrial partner • would have been ideal for SMT-COMP • QF_UFBV32 • Unfortunately • can not be disclosed • “will have to be destroyed after the collaboration” • hopefully our lives will be spared
Conclusions • A “market need” for SMT(BV) solvers • Bit Blasting: tough competitors • After a failure, … • Preliminary results are encouraging • Future challenges • optimize BV solver • better conflict sets • tackle some RTL verification cases • extension to memories
QF_UFBV[32] at SMT-COMP • the MathSAT you will see there IS NOT the one I described • We currently have no results for QF_UFBV • Easy benchmarks: • QF_UFBV[32] not particularly “SMT” • the boolean component is nearly missing • the BV part is “easily” solvable by bit blasting • We entered SMT-COMP QF_UFBV32 • MathSAT based on BIT BLASTING to SAT • NuSMV based on bit blasting to BDDs