1 / 57

Modeling Data in Formal Verification Bits, Bit Vectors, or Words

Modeling Data in Formal Verification Bits, Bit Vectors, or Words. Randal E. Bryant Carnegie Mellon University. http://www.cs.cmu.edu/~bryant. Overview. Issue How should data be modeled in formal analysis? Verification, test generation, security analysis, … Approaches

claree
Download Presentation

Modeling Data in Formal Verification Bits, Bit Vectors, or Words

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Modeling Data in Formal VerificationBits, Bit Vectors, or Words Randal E. Bryant Carnegie Mellon University http://www.cs.cmu.edu/~bryant

  2. Overview • Issue • How should data be modeled in formal analysis? • Verification, test generation, security analysis, … • Approaches • Bits: Every bit is represented individually • Basis for most CAD, model checking • Words: View each word as arbitrary value • E.g., unbounded integers • Historic program verification work • Bit Vectors: Finite precision words • Captures true semantics of hardware and software • More opportunities for abstraction than with bits

  3. Data Path Com. Log. 1 Com.Log. 2 Bit-Level Modeling Control Logic • Represent Every Bit of State Individually • Behavior expressed as Boolean next-state over current state • Historic method for most CAD, testing, and verification tools • E.g., model checkers

  4. Bit-Level Modeling in Practice • Strengths • Allows precise modeling of system • Well developed technology • BDDs & SAT for Boolean reasoning • Limitations • Every state bit introduces two Boolean variables • Current state & next state • Overly detailed modeling of system functions • Don’t want to capture full details of FPU • Making It Work • Use extensive abstraction to reduce bit count • Hard to abstract functionality

  5. x Word-Level Abstraction #1: Bits → Integers x0 • View Data as Symbolic Words • Arbitrary integers • No assumptions about size or encoding • Classic model for reasoning about software • Can store in memories & registers x1 x2 xn-1

  6. Data Path Data Path Com. Log. 1 Com. Log. 1 ? Com.Log. 2 Com. Log. 1 ? What do we do about logic functions? Abstracting Data Bits Control Logic

  7. ALU Word-Level Abstraction #2: Uninterpreted Functions • For any Block that Transforms or Evaluates Data: • Replace with generic, unspecified function • Only assumed property is functional consistency: a = x b = y f(a, b) = f(x, y) f

  8. F1 F2 Abstracting Functions Control Logic • For Any Block that Transforms Data: • Replace by uninterpreted function • Ignore detailed functionality • Conservative approximation of actual system Data Path Com. Log. 1 Com. Log. 1

  9. Word-Level Modeling: History • Historic • Used by theorem provers • More Recently • Burch & Dill, CAV ’94 • Verify that pipelined processor has same behavior as unpipelined reference model • Use word-level abstractions of data paths and memories • Use decision procedure to determine equivalence • Bryant, Lahiri, Seshia, CAV ’02 • UCLID verifier • Tool for describing & verifying systems at word level

  10. Pipeline Verification Example Pipelined Processor Reference Model

  11. Abstracted Pipeline Verification Pipelined Processor Reference Model

  12. Experience with Word-Level Modeling • Powerful Abstraction Tool • Allows focus on control of large-scale system • Can model systems with very large memories • Hard to Generate Abstract Model • Hand-generated: how to validate? • Automatic abstraction: limited success • Andraus & Sakallah, DAC 2004 • Realistic Features Break Abstraction • E.g., Set ALU function to A+0 to pass operand to output • Desire • Should be able to mix detailed bit-level representation with abstracted word-level representation

  13. Bit Vectors: Motivating Example #1 • Do these functions produce identical results? • Strategy • Represent and reason about bit-level program behavior • Specific to machine word size, integer representations, and operations int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1; } int test_abs(int x) { return (x < 0) ? -x : x; }

  14. Motivating Example #2 void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt); } • Is there an input string that causes value 234 to be written to address a4a3a2a1? • Answer • Yes: "a1a2a3a4%230g%n" • Depends on details of compilation • But no exploit for buffer size less than 8 • [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]

  15. Motivating Example #3 bit[W] popSpec(bit[W] x) { int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt; } bit[W] popSketch(bit[W] x) { loop (??) { x = (x&??) + ((x>>??)&??); } return x; } • Is there a way to expand the program sketch to make it match the spec? • Answer • W=16: • [Solar-Lezama, et al., ASPLOS ‘06] x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);

  16. Motivating Example #4 Sequential Reference Model Pipelined Microprocessor • Is pipelined microprocessor identical to sequential reference model? • Strategy • Represent machine instructions, data, and state as bit vectors • Compatible with hardware description language representation • Verifier finds abstractions automatically

  17. Bit Vector Formulas • Fixed width data words • Arithmetic operations • Add/subtract/multiply/divide, … • Two’s complement, unsigned, … • Bit-wise logical operations • Bitwise and/or/xor, shift/extract, concatenate • Predicates • ==, <= • Task • Is formula satisfiable? • E.g., a > 0 && a*a < 0 50000 * 50000 = -1794967296 (on 32-bit machine)

  18. Decision Procedure Satisfying solution Formula Unsatisfiable (+ proof) Decision Procedures • Core technology for formal reasoning • Boolean SAT • Pure Boolean formula • SAT Modulo Theories (SMT) • Support additional logic fragments • Example theories • Linear arithmetic over reals or integers • Functions with equality • Bit vectors • Combinations of theories

  19. Recent Progress in SAT Solving

  20. BV Decision Procedures:Some History • B.C. (Before Chaff) • String operations (concatenate, field extraction) • Linear arithmetic with bounds checking • Modular arithmetic • Limitations • Cannot handle full range of bit-vector operations

  21. BV Decision Procedures:Using SAT • SAT-Based “Bit Blasting” • Generate Boolean circuit based on bit-level behavior of operations • Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker • Handles arbitrary operations • Effective in Many Applications • CBMC [Clarke, Kroening, Lerda, TACAS ’04] • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05] • CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]

  22. Bit-Vector Challenge • Is there a better way than bit blasting? • Requirements • Provide same functionality as with bit blasting • Find abstractions based on word-level structure • Improve on performance of bit blasting • Observation • Must have bit blasting at core • Only approach that covers full functionality • Want to exploit special cases • Formula satisfied by small values • Simple algebraic properties imply unsatisfiability • Small unsatisfiable core • Solvable by modular arithmetic • …

  23. Some Recent Ideas • Iterative Approximation • UCLID: Bryant, Kroening, Ouaknine, Seshia, Strichman, Brady, TACAS ’07 • Use bit blasting as core technique • Apply to simplified versions of formula • Successive approximations until solve or show unsatisfiable • Using Modular Arithmetic • STP: Ganesh & Dill, CAV ’07 • Algebraic techniques to solve special case forms • Layered Approach • MathSat: Bruttomesso, Cimatti, Franzen, Griggio, Hanna, Nadel, Palti, Sebastiani, CAV ’07 • Use successively more detailed solvers

  24. + Overapproximation More solutions: If unsatisfiable, then so is    + Fewer solutions: Satisfying solution also satisfies  −   Underapproximation − Iterative Approach Background: Approximating Formula • Example Approximation Techniques • Underapproximating • Restrict word-level variables to smaller ranges of values • Overapproximating • Replace subformula with Boolean variable  Original Formula

  25. Starting Iterations • Initial Underapproximation • (Greatly) restrict ranges of word-level variables • Intuition: Satisfiable formula often has small-domain solution  1−

  26. 1+ UNSAT proof: generate overapproximation If SAT, then done First Half of Iteration • SAT Result for 1− • Satisfiable • Then have found solution for  • Unsatisfiable • Use UNSAT proof to generate overapproximation 1+ • (Described later)  1−

  27. If UNSAT, then done SAT: Use solution to generate refined underapproximation 2− Second Half of Iteration • SAT Result for 1+ • Unsatisfiable • Then have shown  unsatisfiable • Satisfiable • Solution indicates variable ranges that must be expanded • Generate refined underapproximation 1+  1−

  28. Iterative Behavior 2+ 1+ • Underapproximations • Successively more precise abstractions of  • Allow wider variable ranges • Overapproximations • No predictable relation • UNSAT proof not unique    k+  k−    2− 1−

  29. 2+ 1+    UNSAT k+  k−    2− 1− SAT Overall Effect • Soundness • Only terminate with solution on underapproximation • Only terminate as UNSAT on overapproximation • Completeness • Successive underapproximations approach  • Finite variable ranges guarantee termination • In worst case, get k−  

  30. 1+ UNSAT proof: generate overapproximation  1− 2− Generating Overapproximation • Given • Underapproximation 1− • Bit-blasted translation of 1− into Boolean formula • Proof that Boolean formula unsatisfiable • Generate • Overapproximation 1+ • If 1+ satisfiable, must lead to refined underapproximation • Generate 2− such that 1−  2−  

  31. x = y x + 2 z 1 a w & 0xFFFF = x x % 26 = v Bit-Vector Formula Structure • DAG representation to allow shared subformulas 

  32. x = y x + 2 z 1 w Range Constraints x y a z w & 0xFFFF = x x % 26 = v Æ Structure of Underapproximation • Linear complexity translation to CNF • Each word-level variable encoded as set of Boolean variables • Additional Boolean variables represent subformula values −

  33. Range Constraints w x Encoding Range Constraints • Explicit • View as additional predicates in formula • Implicit • Reduce number of variables in encoding Constraint Encoding 0 w 8 0 0 0 ··· 0 w2w1w0 −4x  4 xsxsxs··· xsxsx1x0 • Yields smaller SAT encodings 0 w 8  −4x  4

  34. w Range Constraints x y z Ç Æ : Æ Ç UNSAT Proof • Subset of clauses that is unsatisfiable • Clause variables define portion of DAG • Subgraph that cannot be satisfied with given range constraints x = y x + 2 z 1 a Æ w & 0xFFFF = x Ç x % 26 = v

  35. w Range Constraints x y z Ç Æ : Æ Ç Extracting Circuit from UNSAT Proof • Subgraph that cannot be satisfied with given range constraints • Even when replace rest of graph with unconstrained variables x = y x + 2 z 1 UNSAT a Æ b1 b2

  36. Ç Æ : Ç Generated Overapproximation • Remove range constraints on word-level variables • Creates overapproximation • Ignores correlations between values of subformulas x = y x + 2 z 1 a 1+ Æ b1 b2

  37. w Range Constraints x y z Ç Æ : Æ Ç Refinement Property • Claim • 1+ has no solutions that satisfy 1−’s range constraints • Because 1+ contains portion of 1− that was shown to be unsatisfiable under range constraints x = y x + 2 z 1 UNSAT a Æ 1+ b1 b2

  38. Ç Æ : Ç Refinement Property (Cont.) • Consequence • Solving 1+ will expand range of some variables • Leading to more exact underapproximation 2− x = y x + 2 z 1 a 1+ Æ b1 b2

  39. SAT: Use solution to generate refined underapproximation 2− Effect of Iteration • Each Complete Iteration • Expands ranges of some word-level variables • Creates refined underapproximation 1+ UNSAT proof: generate overapproximation  1−

  40. Approximation Methods • So Far • Range constraints • Underapproximate by constraining values of word-level variables • Subformula elimination • Overapproximate by assuming subformula value arbitrary • General Requirements • Systematic under- and over-approximations • Way to connect from one to another • Goal: Devise Additional Approximation Strategies

  41. x * y Function Approximation Example • Motivation • Multiplication (and division) are difficult cases for SAT • §: Prohibit Via Additional Range Constraints • Gives underapproximation • Restricts values of (possibly intermediate) terms • §: Abstract as f(x,y) • Overapproximate as uninterpreted function f • Value constrained only by functional consistency

  42. Results: UCLID BV vs. Bit-blasting • UCLID always better than bit blasting • Generally better than other available procedures • SAT time is the dominating factor [results on 2.8 GHz Xeon, 2 GB RAM]

  43. Challenges with Iterative Approximation • Formulating Overall Strategy • Which abstractions to apply, when and where • How quickly to relax constraints in iterations • Which variables to expand and by how much? • Too conservative: Each call to SAT solver incurs cost • Too lenient: Devolves to complete bit blasting. • Predicting SAT Solver Performance • Hard to predict time required by call to SAT solver • Will particular abstraction simplify or complicate SAT? • Combination Especially Difficult • Multiple iterations with unpredictable inner loop

  44. STP: Linear Equation Solving • Ganesh & Dill, CAV ’07 • Solve linear equations over integers mod 2w • Capture range of solutions with Boolean Variables • Example Problem • Variables: 3-bit unsigned integers x: [x2 x1 x0] y: [y2 y1 y0] z: [z2 z1 z0] • Linear equations: conjunction of linear constraints • General Form • A x = b mod 2w

  45. Solution Method • Equations • Some Number Theory • Odd number has multiplicative inverse mod 2w • Mod 8: 3−1 = 3 • Additive inverse mod 2w: −x = 2w− x • Mod 8: −4 = 4 −2 = 6 • Solve first equation for x:

  46. Solution Method (cont.) • Substitutions

  47. What if All Coefficients Even? • Result of Substitutions • Even numbers do not have multiplicative inverses mod 8 • Observation • Can divide through and reduce modulus

  48. General Solutions • Original variables: 3-bit unsigned integers x: [x2 x1 x0] y: [y2 y1 y0] z: [z2 z1 z0] • Solutions • Constrained variables y: [y2 1 1] z: [z2 1 0] • Back Substitution • Constrained variables x: [0 0 0]

  49. Linear Equation Solutions • Equations • Encoding All Possible Solutions • x: [0 0 0] y: [y2 1 1] z: [z2 1 0] • y2, z2 arbitrary Boolean variables • 4 possible solutions (out of original 512) • General Principle • Form of LU decomposition • Polynomial time algorithm • Boolean variables in solution to express set of solutions • Only works when have conjunction of linear constraints

  50. Layered Solver • Bruttomesso, et al, CAV ’07 • Part of MathSAT project • DPLL(T) Framework • SAT solver coupled with solver for mathematical theory T • BV theory solver works with conjunctions of constraints

More Related