SAT-Based Decision Procedures for Linear Arithmetic and Uninterpreted Functions. Randal E. Bryant, Carnegie Mellon University. http://www.cs.cmu.edu/~bryant
Decision Procedures in Formal Verification
[Figure: RTL / source code + specification → abstraction → formal model + specification → verification, using a decision procedure for a decidable fragment of first-order logic → OK / error]
• Applications: out-of-order, pipelined microprocessors; cache coherence protocols; device drivers; compiler validation; …
SAT-based Decision Procedures
• Eager encoding: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable
• Lazy encoding: input formula → approximate Boolean encoder → Boolean formula → SAT solver → unsatisfiable, or a satisfying assignment that goes to the first-order conjunctions SAT checker → satisfiable, or unsatisfiable with an additional clause returned to the SAT solver
Lazy Encoding Characteristics
[Figure: first-order conjunctions SAT checker built from a theory combiner over uninterpreted functions, linear arithmetic, bit vectors, …, theory N]
• Can be extended to handle wide variety of theories
• Clean & modular design
• Does not scale well
  • Number of calls to conjunction checker typically exponential in formula size
  • Each call independent: nothing learned in one call can be exploited by another
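A minimal sketch of the lazy loop for pure equality logic, to make the cost of repeated conjunction checks concrete. Everything here is an assumption for illustration: brute-force enumeration stands in for the SAT solver, a union-find check stands in for the conjunction checker, and the names `consistent`, `boolean_solve` and `lazy_check` are hypothetical.

```python
from itertools import product

def consistent(literals):
    """Conjunction checker: merge the equalities with union-find, then make
    sure no disequality relates two variables in the same class."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (a, b), value in literals.items():
        if value:
            parent[find(a)] = find(b)
    return all(find(a) != find(b)
               for (a, b), value in literals.items() if not value)

def boolean_solve(atoms, skeleton, blocked):
    """Stand-in for a SAT solver: first assignment that satisfies the Boolean
    skeleton and has not been blocked by an earlier refinement."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if skeleton(assignment) and assignment not in blocked:
            return assignment
    return None

def lazy_check(atoms, skeleton):
    """Returns (satisfiable, number of additional blocking clauses)."""
    blocked = []
    while True:
        model = boolean_solve(atoms, skeleton, blocked)
        if model is None:
            return False, len(blocked)
        if consistent(model):
            return True, len(blocked)
        blocked.append(model)  # the "additional clause" of the lazy loop
```

Each entry in `blocked` rules out only one Boolean assignment, which is one way to see why the number of calls can grow exponentially.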
Eager Encoding Characteristics
[Figure: input formula → satisfiability-preserving Boolean encoder → Boolean formula → SAT solver → satisfiable / unsatisfiable]
• Must encode all information about domain properties into Boolean formula
  • Some properties can give exponential blowup
• Lets SAT solver do all of the work
Good Approach for Some Domains
• Modern SAT solvers have remarkable capacity
  • Good at extracting relevant portions out of very large formulas
  • Learn about formula properties as search proceeds
• Focus of this talk
Data and Function Abstraction
• Bit-vectors to (unbounded) integers
  • Bits x0, x1, x2, …, xn-1 become a single integer x
• Functional units to uninterpreted functions
  • ALU becomes f, with a = x ∧ b = y ⇒ f(a,b) = f(x,y)
• Common operations
  • If-then-else: ITE(p, x, y) selects x when p is 1 and y when p is 0
  • Test for equality: x = y
Abstract Modeling of Microprocessor
[Figure: pipelined datapath with PC, instruction memory, register file, ALU and control logic, pipeline registers IF/ID, ID/EX, EX/WB, and data blocks replaced by functions F1, F2, F3]
• For any block that transforms or evaluates data:
  • Replace with generic, unspecified function
  • Also view instruction memory as function
EUF: Equality with Uninterpreted Functions
• Decidable fragment of first-order logic
• Formulas (F): Boolean expressions
  • ¬F, F1 ∧ F2, F1 ∨ F2: Boolean connectives
  • T1 = T2: equation
  • P(T1, …, Tk): predicate application
• Terms (T): integer expressions
  • ITE(F, T1, T2): if-then-else
  • Fun(T1, …, Tk): function application
• Functions (Fun): Integer → Integer
  • f: uninterpreted function symbol
  • Read, Write: memory operations
• Predicates (P): Integer → Boolean
  • p: uninterpreted predicate symbol
EUF Decision Problem
[Figure: circuit representation of an example formula over x0, d0 and the function f, with ITE, =, ∧, ∨ and ¬ nodes]
• Circuit Representation of Formula
  • Truth values
    • Dashed lines
    • Model control
    • Logical connectives
    • Equations
  • Integer values
    • Solid lines
    • Model data
    • Uninterpreted functions
    • If-then-else operation
• Task
  • Determine whether formula F is universally valid
    • True for all interpretations of variables and function symbols
  • Often expressed as (un)satisfiability problem
    • Prove that formula ¬F is not satisfiable
Finite Model Property for EUF
[Figure: the example circuit with its distinct terms x0, d0, f(x0), f(d0) marked]
• Observation
  • Any formula has limited number of distinct expressions
  • Only property that matters is whether or not different terms are equal
Boolean Encoding of Integer Values
• For Each Expression
  • Either equal to or distinct from each preceding expression
• Boolean Encoding
  • Use Boolean values to encode integers over small range
  • EUF formula can be translated into propositional logic
    • Logic circuit with multiplexors, comparators, logic gates
    • Tautology iff original formula valid
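A small sketch of what the finite model property buys: validity of a tiny EUF formula can be checked by enumerating every interpretation over a small range. The helper name `valid_over_domain` and the two example formulas are assumptions for illustration, as is the choice of a domain with as many values as there are distinct terms (x, y, f(x), f(y)).

```python
from itertools import product

def valid_over_domain(formula, num_vars, size):
    """True if formula(values, f) holds for every assignment of the variables
    and every unary function f over range(size)."""
    domain = range(size)
    for table in product(domain, repeat=size):   # every f: domain -> domain
        f = table.__getitem__
        for values in product(domain, repeat=num_vars):
            if not formula(values, f):
                return False
    return True

# x = y  =>  f(x) = f(y)   (functional consistency, expected to be valid)
congruence = lambda v, f: v[0] != v[1] or f(v[0]) == f(v[1])

# f(x) = f(y)  =>  x = y   (not valid: f need not be injective)
injective = lambda v, f: f(v[0]) != f(v[1]) or v[0] == v[1]
```

With four distinct terms the search covers 4^4 function tables times 4^2 variable assignments, which is the kind of bounded range a bit-level encoding would represent with two bits per term.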
Some History of EUF Decision Procedures
• Ackermann, 1954
  • Quantifier-free decision problem can be decided based on finite instantiations
• Burch & Dill, CAV '94
  • Automatic decision procedure
    • Davis-Putnam enumeration
    • Congruence closure to enforce functional consistency
• Boolean approaches
  • Goel, et al., CAV '98
    • Attempted with BDDs, but didn't get good results
  • Bryant, German, Velev, CAV '99
    • Could verify microprocessor using BDDs
  • Velev & Bryant, DAC 2001
    • Demonstrated power of modern SAT procedures
Exploiting Positive Equality
• Bryant, German, Velev, CAV '99
  • First successful use of Boolean methods for EUF
• Positive Equality
  • Equations that appear in unnegated form
• Exploiting
  • Can greatly reduce number of cases required to show validity
  • Only need to consider maximally diverse interpretations
  • Reduce number of Boolean variables in bit-level encoding
Diverse Interpretations: Illustration
• Task
  • Verify someone's obscure code for 4×4 array transpose

    void trans(int a[4][4]) {
        int t;
        for (t = 4; t < 15; t++)
            if (~t&2 || t&8 && ~t&1) {
                /* Only operations on array elements: swap a[r][c] and a[c][r] */
                int r = t&0x3;
                int c = t>>2;
                int val = a[r][c];
                a[r][c] = a[c][r];
                a[c][r] = val;
            }
    }

• Observation
  • Array elements altered only by copying one to another
  • Just need to make sure right set of copies performed
Verifying Array Code
[Figure: test for trans4, which maps input array a to output array a']
• Single Test Adequate
  • Unique value for each possible source element
  • "Maximally Diverse"
  • If a'[r][c] = a[c][r], then must have copied proper value
Characteristics of Array Verification
• Correctness Condition
  • a'[0][0] = a[0][0] ∧ a'[0][1] = a[1][0] ∧ a'[0][2] = a[2][0] ∧ … ∧ a'[3][2] = a[2][3] ∧ a'[3][3] = a[3][3]
• Properties
  • All equations are in positive form
  • Worst case test is one that tends to make things unequal
    • Maximally diverse interpretation: use as many different values as possible
  • All maximally diverse interpretations isomorphic
    • Only need to try one to prove all handled correctly
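The single-test argument can be tried out directly. The sketch below is a Python port of the C routine from the illustration slide, run on one maximally diverse input in which every element holds a unique value; the port and the helper name `diverse_test` are assumptions for illustration, not part of the original talk.

```python
def trans(a):
    """Python port of the C routine: swaps a[r][c] with a[c][r] in place."""
    for t in range(4, 15):
        # Same condition as the C code, with the C precedence made explicit.
        if (~t & 2) or ((t & 8) and (~t & 1)):
            r, c = t & 0x3, t >> 2
            a[r][c], a[c][r] = a[c][r], a[r][c]

def diverse_test():
    """One test with a unique value per source element (0..15)."""
    a = [[4 * r + c for c in range(4)] for r in range(4)]
    original = [row[:] for row in a]
    trans(a)
    # Correctness condition: a'[r][c] = a[c][r] for all r, c.
    return all(a[r][c] == original[c][r]
               for r in range(4) for c in range(4))
```

Because every source element is distinct, each equation a'[r][c] = a[c][r] can only hold if the right element was copied, so this one run covers all inputs for code that only copies elements.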
Equations in Processor Verification
[Figure: pipelined datapath with PC, instruction memory, register file, ALU and control logic, pipeline registers IF/ID, ID/EX, EX/WB]
• Data types and where their equations occur
  • Register IDs: control stalling & forwarding
  • Instruction address: only top-level verification condition
  • Program data: only top-level verification condition
Exploiting Equation Structure
• Positive Equations
  • In top-level verification condition
  • Can use maximally diverse interpretation
• Negative Equations
  • Pipeline control logic
    • Between register IDs
    • Operation depends on whether or not two IDs are equal
  • Must use general encoding
    • Encode with Boolean variables
    • All possibilities of IDs that match and/or don't match
Application of Positive Equality
[Figure: the example circuit evaluated with the distinct values 5, 6, 7, 8 assigned to the terms x0, d0, f(x0), f(d0)]
• Observation
  • All equations are positive in this formula
  • Can consider single, diverse interpretation for terms
Function Elimination: Ackermann's Method
[Figure: applications f(x1) and f(x2) replaced by variables vf1 and vf2, with the constraint x1 = x2 ⇒ vf1 = vf2]
• Replace All Function Applications by Integer Variables
  • Introduce new domain variable
  • Enforce functional consistency by global constraints
  • Unclear how to restrict evaluation to diverse interpretations
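A minimal sketch of the constraint generation for a single unary function symbol. The function name `ackermann_constraints`, the tuple representation, and the `vf1`, `vf2`, … naming are assumptions chosen to match the figure labels.

```python
from itertools import combinations

def ackermann_constraints(args):
    """args: argument names of the applications f(args[0]), f(args[1]), ...

    Returns (fresh, constraints). fresh[i] is the new domain variable that
    replaces f(args[i]). Each constraint (xi, xj, vi, vj) stands for the
    functional-consistency condition  xi = xj  =>  vi = vj.
    """
    fresh = [f"vf{i + 1}" for i in range(len(args))]
    constraints = [(args[i], args[j], fresh[i], fresh[j])
                   for i, j in combinations(range(len(args)), 2)]
    return fresh, constraints
```

For n applications this yields n(n-1)/2 global constraints, all of which must be conjoined with the rewritten formula.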
Function Elimination: ITE Method
[Figure: f(x1) → vf1; f(x2) → ITE(x2 = x1, vf1, vf2); f(x3) → ITE(x3 = x1, vf1, ITE(x3 = x2, vf2, vf3))]
• General Technique
  • Introduce new domain variable
  • Nested ITE structure maintains functional consistency
Generating Diverse Encoding
[Figure: the same nested ITE structure with the fixed values 5, 6, 7 in place of vf1, vf2, vf3]
• Replacing Application
  • Use fixed values rather than variables
  • Application results equal iff arguments equal
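The nested ITE structure of the last two slides can be sketched as an evaluator. The function name `ite_elimination` is hypothetical; passing symbolic placeholders as `fresh_values` models the general ITE method, while passing distinct constants such as 5, 6, 7 models the diverse encoding.

```python
def ite_elimination(arg_values, fresh_values):
    """Evaluate f(x1), f(x2), ... under the nested-ITE scheme.

    Application i yields the fresh value of the first earlier application
    whose argument equals its own, otherwise its own fresh value:
        f(x3) -> ITE(x3 = x1, vf1, ITE(x3 = x2, vf2, vf3))
    """
    results = []
    for i, x in enumerate(arg_values):
        result = fresh_values[i]
        for j in range(i):
            if x == arg_values[j]:
                result = fresh_values[j]
                break
        results.append(result)
    return results
```

With distinct fixed values, two results are equal exactly when the corresponding arguments are equal, which is the property stated on the slide.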
Benefits of Positive Equality
Velev & Bryant, JSC '02
• Microprocessor Benchmarks
  • 1xDLX: single-issue RISC processor
  • 2xDLX-EX-BP: dual-issue processor with exception handling & branch prediction
  • 9VLIW-BP: 9-way VLIW processor with branch prediction
• Measurements
  • Using BerkMin SAT solver
Revisiting Encoding Techniques
• Example: is x = y ∧ y = z ∧ z ≠ x satisfiable?
• Small Domain (SD)
  • Use bit-level encodings of bounded integers
  • Implicitly encode properties of equality logic
  • Example encoding: x1x0 = y1y0 ∧ y1y0 = z1z0 ∧ z1z0 ≠ x1x0
• Per-Constraint Encoding (EIJ)
  • Introduce explicit Boolean variable for each equation
  • Additional transitivity constraints to express properties of equality logic
  • Example encoding: exy ∧ eyz ∧ ¬exz
  • Transitivity constraints: exy ∧ eyz ⇒ exz, exy ∧ exz ⇒ eyz, eyz ∧ exz ⇒ exy
Per-Constraint Encoding
• Introduced by Goel et al., CAV '98
• Exploiting sparse structure by Bryant & Velev, CAV 2000
• Procedure
  • Initial formula F
    • Want to prove valid
    • Prove that ¬F is not satisfiable
  • Replace each equation x = y by Boolean variable exy
    • Gives formula Fsat
  • Generate formula expressing transitivity constraints
    • Gives formula Ftrans
  • Use SAT solver to show that Fsat ∧ Ftrans is not satisfiable
• Motivation
  • Provides SAT solver with more direct representation of underlying problem
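A small sketch of the procedure on the three-variable example x = y ∧ y = z ∧ z ≠ x. Brute-force enumeration stands in for the SAT solver, and the names `satisfiable`, `f_sat` and `f_trans` are assumptions for illustration.

```python
from itertools import product

def satisfiable(num_vars, constraints):
    """Brute-force stand-in for a SAT solver over num_vars Boolean variables."""
    return any(all(c(v) for c in constraints)
               for v in product([False, True], repeat=num_vars))

# v = (exy, eyz, exz): one Boolean variable per equation.
f_sat = [lambda v: v[0] and v[1] and not v[2]]   # exy AND eyz AND NOT exz

f_trans = [
    lambda v: not (v[0] and v[1]) or v[2],       # exy AND eyz => exz
    lambda v: not (v[0] and v[2]) or v[1],       # exy AND exz => eyz
    lambda v: not (v[1] and v[2]) or v[0],       # eyz AND exz => exy
]
```

Without the transitivity constraints `f_sat` has a Boolean model that no integer assignment can realize; conjoining `f_trans` removes it.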
Graph Interpretation of Transitivity
[Figure: cycle of equation edges]
• Transitivity Violation
  • Cycle in graph
  • Exactly one edge has ei,j = false
Exploiting Chords
• Chord
  • Edge connecting two non-adjacent vertices in cycle
• Property
  • Sufficient to enforce transitivity constraints for all chord-free cycles
  • If transitivity holds for all chord-free cycles, then holds for arbitrary cycles
Enumerating Chord-Free Cycles
• Strategy
  • Enumerate chord-free cycles in graph
  • Each cycle of length k yields k transitivity constraints
• Problem
  • Potentially exponential number of chord-free cycles
  • [Figure: example graph with faces 1, 2, …, k] 2^k + k chord-free cycles
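A sketch of how one cycle of length k yields k transitivity constraints, one per edge that might be the single false edge. The helper names `cycle_constraints` and `satisfies` and the edge labels in the usage are hypothetical.

```python
def cycle_constraints(edges):
    """For a cycle whose relational variables are `edges`, return one
    implication per edge: (all other edges true) => (this edge true).
    Each implication is a pair (premises, conclusion)."""
    return [([e for e in edges if e != target], target) for target in edges]

def satisfies(constraints, assignment):
    """True if the assignment (dict: edge -> bool) meets every implication."""
    return all(assignment[conclusion]
               or not all(assignment[p] for p in premises)
               for premises, conclusion in constraints)
```

An assignment violates these constraints exactly when one edge of the cycle is false and all others are true, matching the graph interpretation of a transitivity violation.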
Adding Chords
• Strategy
  • Add edges to graph to reduce number of chord-free cycles
  • [Figure: example graph with faces 1, 2, …, k] Before: 2^k + k chord-free cycles; after adding chords: 2k + 1 chord-free cycles
• Trade-Off
  • Reduces formula size
  • Increases number of relational variables
Chordal Graph
• Definition
  • Every cycle of length > 3 has a chord
• Goal
  • Add minimum number of edges to make graph chordal
• Relation to Sparse Gaussian Elimination
  • Choose pivot ordering that minimizes fill-in
  • Finding the minimum is NP-hard
  • Simple heuristics effective
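One simple heuristic of this kind is vertex elimination in minimum-degree order, sketched below. This is an illustrative choice and not necessarily the heuristic used in the talk; the function name `make_chordal` is hypothetical, and vertices are assumed to be comparable values such as integers.

```python
from itertools import combinations

def make_chordal(vertices, edges):
    """Eliminate vertices in min-degree order, connecting the remaining
    neighbours of each eliminated vertex. Returns the set of fill-in edges;
    the original edges plus the fill-in form a chordal graph."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    fill = set()
    remaining = set(vertices)
    while remaining:
        # Pick the vertex with the fewest remaining neighbours (ties by value).
        v = min(remaining, key=lambda u: (len(adj[u] & remaining), u))
        neighbours = adj[v] & remaining
        for a, b in combinations(sorted(neighbours), 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fill.add((a, b))
        remaining.remove(v)
    return fill
```

On a 4-cycle this adds a single chord, after which only triangles remain and each contributes three transitivity clauses.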
1xDLX-C Equation Structure
• Vertices
  • For each vi
  • 13 different register identifiers
• Edges
  • For each equation
  • Control stalling and forwarding logic
  • 27 relational variables
    • Out of 78 possible
Adding Chordal Edges to 1xDLX-C
             Relational variables   Cycles   Clauses
Original     27                     286      858
Augmented    33                     40       120
2xDLX-CCt Equation Structure
• Equations
  • Between 25 different register identifiers
  • 143 relational variables
    • Out of 300 possible
Adding Chordal Edges to 2xDLX-CCt
             Relational variables   Cycles   Clauses
Original     143                    2,136    8,364
Augmented    193                    858      2,574
Choosing Encoding Method
• Comparison
  • Formula length n with m integer variables & function applications
  • Worst-case complexity
• Per-Constraint Encoding Works Well in Practice
  • Generates slightly larger formulas than small domain
  • Better performance by SAT solver
Encoding Comparison
Velev & Bryant, JSC '02
• Benchmarks
  • Superscalar, out-of-order datapath
  • 2–6 instructions issued in parallel
• Measurements
  • Using BerkMin SAT solver
Extensions
• Difference logic
  • Predicates of form x ≤ y + C
  • Original logic of UCLID
    • Use integer variables to represent pointers into buffers
    • C = 1
• Linear constraints
  • Predicates of the form a1x1 + a2x2 + … + anxn ≤ b
  • Used in applying UCLID to software verification and software security problems
Difference Logic
• Predicates of form x ≤ y + C
  • C generally a small integer
• Encoding Methods
  • Small domain
    • Range bound n · max |C|
  • Per-constraint encoding
    • Variables of form e_{x,y}^C
    • Can have exponential blowup in number of variables
• Choosing Encoding Method
  • Per-constraint better, as long as it doesn't blow up
  • Predicting blowup
    • Successfully used classifier trained by machine learning (Seshia, Lahiri & Bryant, DAC '03)
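A rough sketch of the small-domain idea for a conjunction of difference constraints: search only values within the range bound n · max |C|. Brute-force enumeration stands in for the bit-level SAT encoding, the function name `small_domain_sat` is hypothetical, and the restriction to plain conjunctions is a simplifying assumption.

```python
from itertools import product

def small_domain_sat(variables, constraints):
    """constraints: list of (x, y, c), each meaning x <= y + c.
    Searches assignments with every variable in 0 .. n * max|c|.
    Returns a satisfying assignment as a dict, or None."""
    n = len(variables)
    bound = n * max((abs(c) for _, _, c in constraints), default=0)
    for values in product(range(bound + 1), repeat=n):
        env = dict(zip(variables, values))
        if all(env[x] <= env[y] + c for x, y, c in constraints):
            return env
    return None
```

For example, `small_domain_sat(["x", "y"], [("x", "y", -1), ("y", "x", -1)])` finds no solution, since x < y and y < x cannot both hold.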
Linear Constraints
• Predicates of the form a1x1 + a2x2 + … + anxn ≤ b
• Common Case
  • All but k predicates are difference predicates
    • ai = +1, aj = –1, rest = 0
  • Rest are sparse
    • At most w coefficients nonzero
    • Coefficient values small
Linear Constraints
• Small Domain Encoding (Seshia & Bryant, LICS '04)
  • Find value D such that only need to consider solutions with 0 ≤ xi < D, for all i
  • Bound on D: (n+2) · n · (bmax+1) · (w · amax)^k
  • Encode as SAT problem with log(D) bits / integer variable
  • Practical for real applications
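A quick calculation of what the bound means in bits per variable. The reading of the symbols is an assumption, since the slide does not define all of them: n is taken as the number of integer variables, k as the number of non-difference predicates, w as the maximum number of nonzero coefficients in those predicates, and amax and bmax as the largest coefficient and constant magnitudes. The sample numbers are made up for illustration.

```python
def small_domain_bound(n, k, w, a_max, b_max):
    """D = (n + 2) * n * (b_max + 1) * (w * a_max) ** k, as stated on the slide."""
    return (n + 2) * n * (b_max + 1) * (w * a_max) ** k

def bits_per_variable(d):
    """Number of bits needed to represent every value in 0 .. d - 1."""
    return (d - 1).bit_length()

# Hypothetical instance: 10 variables, 2 non-difference predicates,
# at most 3 nonzero coefficients of magnitude <= 2, constants <= 7.
d = small_domain_bound(n=10, k=2, w=3, a_max=2, b_max=7)
bits = bits_per_variable(d)
```

Because k appears only in the exponent, the bit width grows linearly in k and logarithmically in everything else, which fits the claim that the encoding is practical when most predicates are difference predicates.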
Some Lessons We've Learned
• Preserve Boolean Structure
  • Other approaches require collapsing to conjunctions of predicates
• Exploit Problem Characteristics
  • Sparseness
    • Tighten bounds and/or reduce number of constraints
  • Polarity structure
    • Positive equality
• Let SAT Solver Do the Work
  • Eager encoding: provide sufficient set of constraints to prove / disprove formula
  • SAT solvers are good at digesting large volumes of information