570 likes | 584 Views
Modeling Data in Formal Verification Bits, Bit Vectors, or Words. Randal E. Bryant Carnegie Mellon University. http://www.cs.cmu.edu/~bryant. Overview. Issue How should data be modeled in formal analysis? Verification, test generation, security analysis, … Approaches
E N D
Modeling Data in Formal VerificationBits, Bit Vectors, or Words Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant
Overview • Issue • How should data be modeled in formal analysis? • Verification, test generation, security analysis, … • Approaches • Bits: Every bit is represented individually • Basis for most CAD, model checking • Words: View each word as arbitrary value • E.g., unbounded integers • Historic program verification work • Bit Vectors: Finite precision words • Captures true semantics of hardware and software • More opportunities for abstraction than with bits
Data Path Com. Log. 1 Com.Log. 2 Bit-Level Modeling Control Logic • Represent Every Bit of State Individually • Behavior expressed as Boolean next-state over current state • Historic method for most CAD, testing, and verification tools • E.g., model checkers
Bit-Level Modeling in Practice • Strengths • Allows precise modeling of system • Well developed technology • BDDs & SAT for Boolean reasoning • Limitations • Every state bit introduces two Boolean variables • Current state & next state • Overly detailed modeling of system functions • Don’t want to capture full details of FPU • Making It Work • Use extensive abstraction to reduce bit count • Hard to abstract functionality
x Word-Level Abstraction #1: Bits → Integers x0 • View Data as Symbolic Words • Arbitrary integers • No assumptions about size or encoding • Classic model for reasoning about software • Can store in memories & registers x1 x2 xn-1
Data Path Data Path Com. Log. 1 Com. Log. 1 ? Com.Log. 2 Com. Log. 1 ? What do we do about logic functions? Abstracting Data Bits Control Logic
ALU Word-Level Abstraction #2: Uninterpreted Functions • For any Block that Transforms or Evaluates Data: • Replace with generic, unspecified function • Only assumed property is functional consistency: a = x b = y f(a, b) = f(x, y) f
F1 F2 Abstracting Functions Control Logic • For Any Block that Transforms Data: • Replace by uninterpreted function • Ignore detailed functionality • Conservative approximation of actual system Data Path Com. Log. 1 Com. Log. 1
Word-Level Modeling: History • Historic • Used by theorem provers • More Recently • Burch & Dill, CAV ’94 • Verify that pipelined processor has same behavior as unpipelined reference model • Use word-level abstractions of data paths and memories • Use decision procedure to determine equivalence • Bryant, Lahiri, Seshia, CAV ’02 • UCLID verifier • Tool for describing & verifying systems at word level
Pipeline Verification Example Pipelined Processor Reference Model
Abstracted Pipeline Verification Pipelined Processor Reference Model
Experience with Word-Level Modeling • Powerful Abstraction Tool • Allows focus on control of large-scale system • Can model systems with very large memories • Hard to Generate Abstract Model • Hand-generated: how to validate? • Automatic abstraction: limited success • Andraus & Sakallah, DAC 2004 • Realistic Features Break Abstraction • E.g., Set ALU function to A+0 to pass operand to output • Desire • Should be able to mix detailed bit-level representation with abstracted word-level representation
Bit Vectors: Motivating Example #1 • Do these functions produce identical results? • Strategy • Represent and reason about bit-level program behavior • Specific to machine word size, integer representations, and operations int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1; } int test_abs(int x) { return (x < 0) ? -x : x; }
Motivating Example #2 void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt); } • Is there an input string that causes value 234 to be written to address a4a3a2a1? • Answer • Yes: "a1a2a3a4%230g%n" • Depends on details of compilation • But no exploit for buffer size less than 8 • [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]
Motivating Example #3 bit[W] popSpec(bit[W] x) { int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt; } bit[W] popSketch(bit[W] x) { loop (??) { x = (x&??) + ((x>>??)&??); } return x; } • Is there a way to expand the program sketch to make it match the spec? • Answer • W=16: • [Solar-Lezama, et al., ASPLOS ‘06] x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);
Motivating Example #4 Sequential Reference Model Pipelined Microprocessor • Is pipelined microprocessor identical to sequential reference model? • Strategy • Represent machine instructions, data, and state as bit vectors • Compatible with hardware description language representation • Verifier finds abstractions automatically
Bit Vector Formulas • Fixed width data words • Arithmetic operations • Add/subtract/multiply/divide, … • Two’s complement, unsigned, … • Bit-wise logical operations • Bitwise and/or/xor, shift/extract, concatenate • Predicates • ==, <= • Task • Is formula satisfiable? • E.g., a > 0 && a*a < 0 50000 * 50000 = -1794967296 (on 32-bit machine)
Decision Procedure Satisfying solution Formula Unsatisfiable (+ proof) Decision Procedures • Core technology for formal reasoning • Boolean SAT • Pure Boolean formula • SAT Modulo Theories (SMT) • Support additional logic fragments • Example theories • Linear arithmetic over reals or integers • Functions with equality • Bit vectors • Combinations of theories
BV Decision Procedures:Some History • B.C. (Before Chaff) • String operations (concatenate, field extraction) • Linear arithmetic with bounds checking • Modular arithmetic • Limitations • Cannot handle full range of bit-vector operations
BV Decision Procedures:Using SAT • SAT-Based “Bit Blasting” • Generate Boolean circuit based on bit-level behavior of operations • Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker • Handles arbitrary operations • Effective in Many Applications • CBMC [Clarke, Kroening, Lerda, TACAS ’04] • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05] • CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]
Bit-Vector Challenge • Is there a better way than bit blasting? • Requirements • Provide same functionality as with bit blasting • Find abstractions based on word-level structure • Improve on performance of bit blasting • Observation • Must have bit blasting at core • Only approach that covers full functionality • Want to exploit special cases • Formula satisfied by small values • Simple algebraic properties imply unsatisfiability • Small unsatisfiable core • Solvable by modular arithmetic • …
Some Recent Ideas • Iterative Approximation • UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07 • Use bit blasting as core technique • Apply to simplified versions of formula • Successive approximations until solve or show unsatisfiable • Using Modular Arithmetic • STP: Ganesh & Dill, CAV ’07 • Algebraic techniques to solve special case forms • Layered Approach • MathSat: Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV ’07 • Use successively more detailed solvers
+ Overapproximation More solutions: If unsatisfiable, then so is + Fewer solutions: Satisfying solution also satisfies − Underapproximation − Iterative Approach Background: Approximating Formula • Example Approximation Techniques • Underapproximating • Restrict word-level variables to smaller ranges of values • Overapproximating • Replace subformula with Boolean variable Original Formula
Starting Iterations • Initial Underapproximation • (Greatly) restrict ranges of word-level variables • Intuition: Satisfiable formula often has small-domain solution 1−
1+ UNSAT proof: generate overapproximation If SAT, then done First Half of Iteration • SAT Result for 1− • Satisfiable • Then have found solution for • Unsatisfiable • Use UNSAT proof to generate overapproximation 1+ • (Described later) 1−
If UNSAT, then done SAT: Use solution to generate refined underapproximation 2− Second Half of Iteration • SAT Result for 1+ • Unsatisfiable • Then have shown unsatisfiable • Satisfiable • Solution indicates variable ranges that must be expanded • Generate refined underapproximation 1+ 1−
Iterative Behavior 2+ 1+ • Underapproximations • Successively more precise abstractions of • Allow wider variable ranges • Overapproximations • No predictable relation • UNSAT proof not unique k+ k− 2− 1−
2+ 1+ UNSAT k+ k− 2− 1− SAT Overall Effect • Soundness • Only terminate with solution on underapproximation • Only terminate as UNSAT on overapproximation • Completeness • Successive underapproximations approach • Finite variable ranges guarantee termination • In worst case, get k−
1+ UNSAT proof: generate overapproximation 1− 2− Generating Overapproximation • Given • Underapproximation 1− • Bit-blasted translation of 1− into Boolean formula • Proof that Boolean formula unsatisfiable • Generate • Overapproximation 1+ • If 1+ satisfiable, must lead to refined underapproximation • Generate 2− such that 1− 2−
x = y x + 2 z 1 a w & 0xFFFF = x x % 26 = v Bit-Vector Formula Structure • DAG representation to allow shared subformulas
x = y x + 2 z 1 w Range Constraints x y a z w & 0xFFFF = x x % 26 = v Æ Structure of Underapproximation • Linear complexity translation to CNF • Each word-level variable encoded as set of Boolean variables • Additional Boolean variables represent subformula values −
Range Constraints w x Encoding Range Constraints • Explicit • View as additional predicates in formula • Implicit • Reduce number of variables in encoding Constraint Encoding 0 w 8 0 0 0 ··· 0 w2w1w0 −4x 4 xsxsxs··· xsxsx1x0 • Yields smaller SAT encodings 0 w 8 −4x 4
w Range Constraints x y z Ç Æ : Æ Ç UNSAT Proof • Subset of clauses that is unsatisfiable • Clause variables define portion of DAG • Subgraph that cannot be satisfied with given range constraints x = y x + 2 z 1 a Æ w & 0xFFFF = x Ç x % 26 = v
w Range Constraints x y z Ç Æ : Æ Ç Extracting Circuit from UNSAT Proof • Subgraph that cannot be satisfied with given range constraints • Even when replace rest of graph with unconstrained variables x = y x + 2 z 1 UNSAT a Æ b1 b2
Ç Æ : Ç Generated Overapproximation • Remove range constraints on word-level variables • Creates overapproximation • Ignores correlations between values of subformulas x = y x + 2 z 1 a 1+ Æ b1 b2
w Range Constraints x y z Ç Æ : Æ Ç Refinement Property • Claim • 1+ has no solutions that satisfy 1−’s range constraints • Because 1+ contains portion of 1− that was shown to be unsatisfiable under range constraints x = y x + 2 z 1 UNSAT a Æ 1+ b1 b2
Ç Æ : Ç Refinement Property (Cont.) • Consequence • Solving 1+ will expand range of some variables • Leading to more exact underapproximation 2− x = y x + 2 z 1 a 1+ Æ b1 b2
SAT: Use solution to generate refined underapproximation 2− Effect of Iteration • Each Complete Iteration • Expands ranges of some word-level variables • Creates refined underapproximation 1+ UNSAT proof: generate overapproximation 1−
Approximation Methods • So Far • Range constraints • Underapproximate by constraining values of word-level variables • Subformula elimination • Overapproximate by assuming subformula value arbitrary • General Requirements • Systematic under- and over-approximations • Way to connect from one to another • Goal: Devise Additional Approximation Strategies
x * y Function Approximation Example • Motivation • Multiplication (and division) are difficult cases for SAT • §: Prohibit Via Additional Range Constraints • Gives underapproximation • Restricts values of (possibly intermediate) terms • §: Abstract as f(x,y) • Overapproximate as uninterpreted function f • Value constrained only by functional consistency
Results: UCLID BV vs. Bit-blasting • UCLID always better than bit blasting • Generally better than other available procedures • SAT time is the dominating factor [results on 2.8 GHz Xeon, 2 GB RAM]
Challenges with Iterative Approximation • Formulating Overall Strategy • Which abstractions to apply, when and where • How quickly to relax constraints in iterations • Which variables to expand and by how much? • Too conservative: Each call to SAT solver incurs cost • Too lenient: Devolves to complete bit blasting. • Predicting SAT Solver Performance • Hard to predict time required by call to SAT solver • Will particular abstraction simplify or complicate SAT? • Combination Especially Difficult • Multiple iterations with unpredictable inner loop
STP: Linear Equation Solving • Ganesh & Dill, CAV ’07 • Solve linear equations over integers mod 2w • Capture range of solutions with Boolean Variables • Example Problem • Variables: 3-bit unsigned integers x: [x2 x1 x0] y: [y2 y1 y0] z: [z2 z1 z0] • Linear equations: conjunction of linear constraints • General Form • A x = b mod 2w
Solution Method • Equations • Some Number Theory • Odd number has multiplicative inverse mod 2w • Mod 8: 3−1 = 3 • Additive inverse mod 2w: −x = 2w− x • Mod 8: −4 = 4 −2 = 6 • Solve first equation for x:
Solution Method (cont.) • Substitutions
What if All Coefficients Even? • Result of Substitutions • Even numbers do not have multiplicative inverses mod 8 • Observation • Can divide through and reduce modulus
General Solutions • Original variables: 3-bit unsigned integers x: [x2 x1 x0] y: [y2 y1 y0] z: [z2 z1 z0] • Solutions • Constrained variables y: [y2 1 1] z: [z2 1 0] • Back Substitution • Constrained variables x: [0 0 0]
Linear Equation Solutions • Equations • Encoding All Possible Solutions • x: [0 0 0] y: [y2 1 1] z: [z2 1 0] • y2, z2 arbitrary Boolean variables • 4 possible solutions (out of original 512) • General Principle • Form of LU decomposition • Polynomial time algorithm • Boolean variables in solution to express set of solutions • Only works when have conjunction of linear constraints
Layered Solver • Bruttomesso, et al, CAV ’07 • Part of MathSAT project • DPLL(T) Framework • SAT solver coupled with solver for mathematical theory T • BV theory solver works with conjunctions of constraints