930 likes | 1.09k Views
Modeling Data in Formal Verification Bits , Bit Vectors, or Words. Karam AbdElkader Based on: Presentations form Randal E. Bryant - Carnegie Mellon University Decision Procedures An Algorithmic Point of View D.Kroening – Oxsoford Unversity , O.Strichman - Technion. Agenda.
E N D
Modeling Data in Formal VerificationBits, Bit Vectors, or Words • Karam AbdElkader • Based on: Presentations form • Randal E. Bryant - Carnegie Mellon University • Decision Procedures An Algorithmic Point of View D.Kroening – OxsofordUnversity, O.Strichman - Technion
Agenda • Overview and Examples. • Introduction to Bit-Vector Logic • Syntax • Semantics • Decision procedures for Bit-Vector Logic • Flattening Bit-Vector Logic • Incremental Flattening • Bit-Vector Arithmetic With Abstraction 2
Over View • Issue • How should data be modeled in formal analysis? • Verification, test generation, security analysis, … • Approaches • Bits: Every bit is represented individually • Basis for most CAD, model checking • Words: View each word as arbitrary value • E.g., unbounded integers • Historic program verification work • Bit Vectors: Finite precision words • Captures true semantics of hardware and software • More opportunities for abstraction than with bits
Data Path Com. Log. 1 Com.Log. 2 Bit-Level Modeling Control Logic • Represent Every Bit of State Individually • Behavior expressed as Boolean next-state over current state • Historic method for most CAD, testing, and verification tools • E.g., model checkers
Bit-Level Modeling in Practice • Strengths • Allows precise modeling of system • Well developed technology • BDDs & SAT for Boolean reasoning • Limitations • Every state bit introduces two Boolean variables • Current state & next state • Overly detailed modeling of system functions • Don’t want to capture full details of FPU • Making It Work • Use extensive abstraction to reduce bit count • Hard to abstract functionality
x Word-Level Abstraction #1: Bits → Integers x0 • View Data as Symbolic Words • Arbitrary integers • No assumptions about size or encoding • Classic model for reasoning about software • Can store in memories & registers x1 x2 xn-1
Data Path Data Path Com. Log. 1 Com. Log. 1 ? Com.Log. 2 Com. Log. 1 ? What do we do about logic functions? Abstracting Data Bits Control Logic
ALU Word-Level Abstraction #2:Uninterpreted Functions • For any Block that Transforms or Evaluates Data: • Replace with generic, unspecified function • Only assumed property is functional consistency: a = x b = y f(a, b) = f(x, y) f
F1 F2 Abstracting Functions Control Logic • For Any Block that Transforms Data: • Replace by uninterpreted function • Ignore detailed functionality • Conservative approximation of actual system Data Path Com. Log. 1 Com. Log. 1
Word-Level Modeling: History • Historic • Used by theorem provers • More Recently • Burch & Dill, CAV ’94 • Verify that pipelined processor has same behavior as unpipelined reference model • Use word-level abstractions of data paths and memories • Use decision procedure to determine equivalence • Bryant, Lahiri, Seshia, CAV ’02 • UCLID verifier • Tool for describing & verifying systems at word level
Pipeline Verification Example Pipelined Processor Reference Model
Abstracted Pipeline Verification Pipelined Processor Reference Model
Experience with Word-Level Modeling • Powerful Abstraction Tool • Allows focus on control of large-scale system • Can model systems with very large memories • Hard to Generate Abstract Model • Hand-generated: how to validate? • Automatic abstraction: limited success • Andraus & Sakallah, DAC 2004 • Realistic Features Break Abstraction • E.g., Set ALU function to A+0 to pass operand to output • Desire • Should be able to mix detailed bit-level representation with abstracted word-level representation
Bit Vectors: Motivating Example #1 • Do these functions produce identical results? • Strategy • Represent and reason about bit-level program behavior • Specific to machine word size, integer representations, and operations int abs(int x) { int mask = x>>31; return (x ^ mask) + ~mask + 1; } int test_abs(int x) { return (x < 0) ? -x : x; }
Motivating Example #2 void fun() { char fmt[16]; fgets(fmt, 16, stdin); fmt[15] = '\0'; printf(fmt); } • Is there an input string that causes value 234 to be written to address a4a3a2a1? • Answer • Yes: "a1a2a3a4%230g%n" • Depends on details of compilation • But no exploit for buffer size less than 8 • [Ganapathy, Seshia, Jha, Reps, Bryant, ICSE ’05]
Motivating Example #3 bit[W] popSpec(bit[W] x) { int cnt = 0; for (int i=0; i<W; i++) { if (x[i]) cnt++; } return cnt; } bit[W] popSketch(bit[W] x) { loop (??) { x = (x&??) + ((x>>??)&??); } return x; } • Is there a way to expand the program sketch to make it match the spec? • Answer • W=16: • [Solar-Lezama, et al., ASPLOS ‘06] x = (x&0x5555) + ((x>>1)&0x5555); x = (x&0x3333) + ((x>>2)&0x3333); x = (x&0x0077) + ((x>>8)&0x0077); x = (x&0x000f) + ((x>>4)&0x000f);
Motivating Example #4 Sequential Reference Model Pipelined Microprocessor • Is pipelined microprocessor identical to sequential reference model? • Strategy • Represent machine instructions, data, and state as bit vectors • Compatible with hardware description language representation • Verifier finds abstractions automatically
Decision Procedures for System-Level Software • What kind of logic do we need for system-level software? • We need bit-vector logic - with bit-wise operators, arithmeticoverflow • We want to scale to large programs - must verify largeformulas
Decision Procedures for System-Level Software • What kind of logic do we need for system-level software? • We need bit-vector logic - with bit-wise operators, arithmeticoverflow • We want to scale to large programs - must verify largeformulas • Examples of program analysis tools that generate bit-vectorformulas: • CBMC • SATABS • SATURN (Stanford, Alex Aiken) • EXE (Stanford, Dawson Engler, David Dill) • Variants of those developed at IBM, Microsoft
Bit-Vector Logic: Syntax formula : formula ∨ formula | ¬formula | atom atom : term rel term | Boolean-Identifier | term[ constant ] : = | < rel ∼ term | constant | term : term op term | identifier | atom?term:term | term[ constant : constant ] | ext ( term ) << | >> | & | | | ⊕ | ◦ op : +| − | · |/| ∼ x: bit-wise negation of x ext (x): sign- or zero-extension of xx << d: left shift with distance dx ◦ y: concatenation of x and y
Semantics Danger! (x − y > 0) if and only if (x > y) Valid over R/N, but not over the bit-vectors. (Many compilers have this sort of bug)
Width and Encoding The meaning depends on the width and encoding of thevariables. 7
Width and Encoding The meaning depends on the width and encoding of thevariables. Typical encodings: Binary encoding Two’s complement But maybe also fixed-point, floating-point, . . . 7
Width and Encoding Notation to clarify width and encoding:
Bit-vectors Made Formal Definition (Bit-Vector) A bit-vector is a vector of Boolean values with a given length l: b : {0,...,l − 1} → {0,1}
Bit-vectors Made Formal Definition (Bit-Vector) Definition (Bit-Vector) A bit-vector is a vector of Boolean values with a given length l: b : {0,...,l − 1} → {0,1} The value of bit number i of x is x(i). We also writefor
Lambda-Notation for Bit-Vectors λ expressions are functions without a name
Lambda-Notation for Bit-Vectors Examples: λ expressions are functions without a name • The vector of length l that consists of zeros: • A function that inverts (flips all bits in) a bit-vector: • A bit-wise OR: ⇒ we now have semantics for the bit-wise operators.
Semantics for Arithmetic Expressions What is the output of the following program? unsigned char number = 200; number = number + 100; printf("Sum: %d\n", number);
Semantics for Arithmetic Expressions Semantics for Arithmetic Expressions What is the output of the following program? unsigned char number = 200; number = number + 100; printf("Sum: %d\n", number); On most architectures, this is 44! = 200 11001000 + = 100 01100100 = 00101100 = 44
Semantics for Arithmetic Expressions Semantics for Arithmetic Expressions What is the output of the following program? unsigned char number = 200; number = number + 100; printf("Sum: %d\n", number); On most architectures, this is 44! = 200 11001000 + = 100 01100100 = 00101100 = 44 ⇒ Bit-vector arithmetic uses modular arithmetic!
Semantics for Arithmetic Expressions Semantics for addition, subtraction:
Semantics for Arithmetic Expressions Semantics for addition, subtraction: We can even mix the encodings:
Semantics for Relational Operators Semantics for <, ≤, ≥, and so on: Mixed encodings: Note that most compilers don’t support comparisons with mixed encodings.
Complexity Complexity Satisfiability is undecidable for an unbounded width, evenwithout arithmetic.
Complexity Satisfiability is undecidable for an unbounded width, evenwithout arithmetic. It is NP-complete otherwise.
Decision Procedure Satisfying solution Formula Unsatisfiable (+ proof) Decision Procedures • Core technology for formal reasoning • Boolean SAT • Pure Boolean formula • SAT Modulo Theories (SMT) • Support additional logic fragments • Example theories • Linear arithmetic over reals or integers • Functions with equality • Bit vectors • Combinations of theories
BV Decision Procedures:Some History • B.C. (Before Chaff) • String operations (concatenate, field extraction) • Linear arithmetic with bounds checking • Modular arithmetic • Limitations • Cannot handle full range of bit-vector operations
BV Decision Procedures:Using SAT • SAT-Based “Bit Blasting” • Generate Boolean circuit based on bit-level behavior of operations • Convert to Conjunctive Normal Form (CNF) and check with best available SAT checker • Handles arbitrary operations • Effective in Many Applications • CBMC [Clarke, Kroening, Lerda, TACAS ’04] • Microsoft Cogent + SLAM [Cook, Kroening, Sharygina, CAV ’05] • CVC-Lite [Dill, Barrett, Ganesh], Yices [deMoura, et al]
A Simple Decision Procedure Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’
A Simple Decision Procedure A Simple Decision Procedure • Transform Bit-Vector Logic to Propositional LogicMost commonly used decision procedureAlso called ’bit-blasting’ • Bit-Vector Flattening 1. Convert propositional part as before 1 2. Add a Boolean variable for each bit of each sub-expression(term) 2 3. Add constraint for each sub-expression 3 We denote the new Boolean variable for biti of term t by. 17
Bit-vector Flattening What constraints do we generate for a given term?
Bit-vector Flattening What constraints do we generate for a given term? This is easy for the bit-wise operators. Example for (read x = y over bits as x y)
Bit-vector Flattening What constraints do we generate for a given term? This is easy for the bit-wise operators. Example for (read x = y over bits as x y) We can transform this into CNF using Tseitin’s method.
Flattening Bit-Vector Arithmetic How to flatten a + b?
Flattening Bit-Vector Arithmetic Flattening Bit-Vector Arithmetic How to flatten a + b? → we can build a circuit that adds them! The full adder in CNF: