Swerve: Semester in Review
Topics • Symbolic pointer analysis • Model checking • C programs • Abstract counterexamples • Symbolic simulation and execution • Cousot: the Galois connection
P.A. Terminology • Context-sensitivity: do we take calling context into account? • Doing so leads to very precise but very non-polynomial algorithms • Flow-sensitivity: sensitive to control flow • Equality = unification-based = Steensgaard • Almost linear, but not very precise • Subset = inclusion-based = Andersen • Polynomial, but more precise • Sensitive analyses even more expensive
P.A.: Problem Formulation • Phase one: find constraints in the code • Depends on sensitivities (context, flow) • Examine stores, loads, etc. • Phase two: solve system of constraints for the complete points-to relation • Explicit: Steensgaard using union-find • Implicit: Andersen-style using BDDs
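The "explicit" union-find route can be illustrated with a minimal sketch. It assumes a toy language with only `p = &x` and `p = q` statements; the class name, method names, and the `*`-prefixed placeholder locations are hypothetical and not taken from any of the papers discussed.

```python
class Steensgaard:
    """Toy unification-based points-to solver (sketch, not the published algorithm)."""

    def __init__(self):
        self.parent = {}   # union-find forest over location names
        self.target = {}   # class representative -> location its members point to

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def target_of(self, v):
        # Every class gets exactly one pointee class; create a placeholder if needed.
        rep = self.find(v)
        if rep not in self.target:
            self.target[rep] = "*" + rep
        return self.target[rep]

    def union(self, a, b):
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        ta, tb = self.target.get(a), self.target.pop(b, None)
        self.parent[b] = a
        if ta is None and tb is not None:
            self.target[a] = tb
        elif ta is not None and tb is not None:
            self.union(ta, tb)  # merging two classes also merges what they point to

    def address_of(self, p, x):   # models: p = &x
        self.union(self.target_of(p), x)

    def copy(self, p, q):         # models: p = q
        self.union(self.target_of(p), self.target_of(q))

    def points_to(self, p, locations):
        rep = self.find(p)
        if rep not in self.target:
            return set()
        t = self.find(self.target[rep])
        return {x for x in locations if self.find(x) == t}


s = Steensgaard()
s.address_of("p", "a")   # p = &a
s.address_of("q", "b")   # q = &b
s.copy("p", "q")         # p = q
pts_p = s.points_to("p", ["a", "b"])
pts_q = s.points_to("q", ["a", "b"])
```

Here both `pts_p` and `pts_q` come out as `{a, b}`. An inclusion-based solver would keep `q` pointing only to `b`, which is the precision gap between the two formulations.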
Pointer Analysis Example • Code: h1: v1 = new Object(); h2: v2 = new Object(); v1.f = v2; v3 = v1.f; • Input Relations: vPointsTo(v1,h1), vPointsTo(v2,h2), Store(v1,f,v2), Load(v1,f,v3) • Output Relations: hPointsTo(h1,f,h2), vPointsTo(v3,h2) • [Figure: points-to graph over v1, v2, v3, h1, h2 with field edge f]
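The example can be reproduced with a small fixpoint computation over explicit sets of tuples. This is only a sketch: the function name is hypothetical, and the load rule is inferred from the example's output rather than stated on the slide.

```python
def solve_points_to(v_points_to, stores, loads):
    """Iterate the store and load rules until no new tuples appear."""
    vpt = set(v_points_to)   # vPointsTo(v, h)
    hpt = set()              # hPointsTo(h1, f, h2)
    while True:
        # Store rule: v1.f = v2 makes h1.f point to h2.
        new_hpt = {(h1, f, h2)
                   for (v1, f, v2) in stores
                   for (a, h1) in vpt if a == v1
                   for (b, h2) in vpt if b == v2}
        hpt_next = hpt | new_hpt
        # Load rule (assumed): v3 = v1.f makes v3 point to whatever h1.f points to.
        new_vpt = {(v3, h2)
                   for (v1, f, v3) in loads
                   for (a, h1) in vpt if a == v1
                   for (h, g, h2) in hpt_next if h == h1 and g == f}
        vpt_next = vpt | new_vpt
        if hpt_next == hpt and vpt_next == vpt:
            return vpt, hpt
        vpt, hpt = vpt_next, hpt_next


vpt, hpt = solve_points_to(
    v_points_to={("v1", "h1"), ("v2", "h2")},
    stores={("v1", "f", "v2")},
    loads={("v1", "f", "v3")},
)
```

For the slide's input, `hpt` contains `("h1", "f", "h2")` and `vpt` gains `("v3", "h2")`.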
Zhu: Symbolic P.A. • Points-to relation can be huge, but BDDs are great at implicitly representing relations
Berndl et al: Symbolic P.A. • Subset-based formulation using BDDs • Variable ordering experiments • Sets of heap objects (“pointed to”) tend to be large and regular: putting them together at the end of the ordering helps • Interleaving the bits for sets of variables (“pointers”) helps a little • In general, important to partition the bits of the different sets in the relations
Whaley & Lam: Datalog, bddbddb • All these symbolic pointer analyses are devoting a lot of implementation time to get the BDD part correct and fast • Datalog: a declarative language for expressing (possibly recursive) relations • bddbddb: a tool to convert Datalog operations (join, project, rename, recursion) into BDD operations • Points-to analyses can now be described much more concisely in Datalog
Inference Rule in Datalog • Stores: hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2). • Example: v1.f = v2; • [Figure: v1 points to h1, v2 points to h2, and the store adds an f edge from h1 to h2]
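The store rule can be read as a chain of rename, join, and project operations. The sketch below evaluates it on explicit sets of tuples rather than BDDs; the helper functions, the `(attributes, rows)` representation, and the sample data (`x`, `y`, `hx`, `hy`) are illustrative assumptions, not bddbddb's API.

```python
def rename(rel, mapping):
    """Rename attributes; rows are unchanged."""
    attrs, rows = rel
    return (tuple(mapping.get(a, a) for a in attrs), rows)

def join(r, s):
    """Natural join on the attributes the two relations share."""
    (ra, rrows), (sa, srows) = r, s
    shared = [a for a in ra if a in sa]
    extra = [a for a in sa if a not in ra]
    out = set()
    for x in rrows:
        xd = dict(zip(ra, x))
        for y in srows:
            yd = dict(zip(sa, y))
            if all(xd[a] == yd[a] for a in shared):
                out.add(x + tuple(yd[a] for a in extra))
    return (ra + tuple(extra), out)

def project(rel, keep):
    """Keep only the listed attributes, in the listed order."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), {tuple(row[i] for i in idx) for row in rows})


# Sample data: x points to hx, y points to hy, and the program contains x.f = y.
v_points_to = (("v", "h"), {("x", "hx"), ("y", "hy")})
store = (("v1", "f", "v2"), {("x", "f", "y")})

# hPointsTo(h1, f, h2) :- Store(v1, f, v2), vPointsTo(v1, h1), vPointsTo(v2, h2).
step = join(store, rename(v_points_to, {"v": "v1", "h": "h1"}))
step = join(step, rename(v_points_to, {"v": "v2", "h": "h2"}))
h_points_to = project(step, ("h1", "f", "h2"))
```

The result is the single tuple `("hx", "f", "hy")` under the schema `(h1, f, h2)`.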
Whaley & Lam: With Context • Context sensitive analysis by cloning methods and doing a context insensitive analysis on the new call graph • Can use Datalog to express constraints necessary to determine the call graph • Cloned call graph is exponentially bigger, but clever encoding lets BDDs handle it well
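To see why the cloned call graph grows exponentially, one can count the distinct call paths reaching each method, since each path corresponds to one clone. The sketch below assumes an acyclic call graph given as a list of (caller, callee) edges, one edge per call site; the function name and the sample graph are hypothetical.

```python
from functools import lru_cache

def count_contexts(call_edges, root):
    """Number of distinct call paths from root to each method (acyclic graphs only)."""
    callers = {}
    for caller, callee in call_edges:
        callers.setdefault(callee, []).append(caller)

    @lru_cache(maxsize=None)
    def paths(method):
        if method == root:
            return 1
        # Each call site contributes one path per path that reaches its caller.
        return sum(paths(c) for c in callers.get(method, []))

    methods = {root} | {m for edge in call_edges for m in edge}
    return {m: paths(m) for m in methods}


# A chain of 10 methods where each method calls the next from two call sites.
edges = [(f"m{i}", f"m{i + 1}") for i in range(10) for _ in range(2)]
contexts = count_contexts(edges, "m0")
```

The last method in the chain ends up with 2^10 = 1024 contexts even though the original graph has only 11 methods.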
CBMC: Prototype Tool • [Figure: tool flow diagram; labels: ANSI-C Model, VHDL/Verilog Product, Parsing and type checking, convert, BV Logic (Tree), BV Logic Decision Problem, CNF, Chaff] • Equivalence reduced to bit vector logic decision problem • Tool requires decision procedure for large bit vector problems • BV problems are HUGE – directly passed to Chaff in CNF
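The reduction from equivalence checking to CNF can be sketched on a tiny scale. The code below is not CBMC's encoding: it handles only bitwise gates, uses a plain Tseitin-style translation, and replaces Chaff with brute-force enumeration. All class and function names are hypothetical.

```python
from itertools import product

class CNF:
    """Clause list with DIMACS-style signed integer literals."""

    def __init__(self):
        self.nvars = 0
        self.clauses = []

    def new_var(self):
        self.nvars += 1
        return self.nvars

    def AND(self, a, b):
        z = self.new_var()                      # z <-> (a and b)
        self.clauses += [[-z, a], [-z, b], [z, -a, -b]]
        return z

    def OR(self, a, b):
        return -self.AND(-a, -b)                # De Morgan

    def XOR(self, a, b):
        z = self.new_var()                      # z <-> (a xor b)
        self.clauses += [[-z, a, b], [-z, -a, -b], [z, -a, b], [z, a, -b]]
        return z

def satisfiable(cnf):
    """Stand-in for a SAT solver: try every assignment."""
    for bits in product((False, True), repeat=cnf.nvars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in clause)
               for clause in cnf.clauses):
            return True
    return False

def equivalent(spec, impl, width):
    """Build a miter over two width-bit inputs; UNSAT means the designs agree."""
    cnf = CNF()
    a = [cnf.new_var() for _ in range(width)]
    b = [cnf.new_var() for _ in range(width)]
    diffs = [cnf.XOR(spec(cnf, x, y), impl(cnf, x, y)) for x, y in zip(a, b)]
    miter = diffs[0]
    for d in diffs[1:]:
        miter = cnf.OR(miter, d)
    cnf.clauses.append([miter])                 # assert "some output bit differs"
    return not satisfiable(cnf)


spec = lambda c, x, y: c.XOR(x, y)                          # a ^ b
impl = lambda c, x, y: c.AND(c.OR(x, y), -c.AND(x, y))      # (a | b) & ~(a & b)
buggy = lambda c, x, y: c.OR(x, y)                          # a | b

same = equivalent(spec, impl, width=2)
different = equivalent(spec, buggy, width=2)
```

Even this two-bit example needs 15 variables and dozens of clauses, which hints at why realistic bit-vector problems become huge once flattened to CNF.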
Explaining Counterexamples • Counterexamples provided by model checkers are often difficult to understand and locate within the code • Previous work: find a concrete execution “close to” the counterexample by some distance metric • This work: find an abstract execution, which provides more meaningful explanations
Distance Metric • Execution = (state, action) sequence • State = (control location, predicate) • Metric: compare two executions a and b • Don’t just compare a_i to b_i since small changes in control flow can yield “misalignment” • Distance is defined as the number of changes (in predicates and actions) to convert a to b
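The misalignment problem can be illustrated with a standard edit distance over execution steps. This is a stand-in, not the metric defined in the work being reviewed: it counts whole-step insertions, deletions, and substitutions, and the sample executions are invented.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of (state, action) steps."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


# Each step is ((control location, predicate), action).
a = [(("L1", "x>0"), "inc"), (("L2", "x>1"), "dec"), (("L3", "x>0"), "ret")]
b = [(("L0", "true"), "init")] + a   # same run with one extra step in front

positional_mismatches = sum(x != y for x, y in zip(a, b))
distance = edit_distance(a, b)
```

Comparing position by position reports three mismatches, while the edit distance is 1: the executions differ only by the one extra leading step.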
Quasi-symbolic simulation • Symbolic simulation externally, scalar values internally • A simulation run requires constant memory • Key ideas • Don’t compute exact values unless necessary: large designs have many don’t cares • Trade time for memory: multiple runs to generate exact values • Reliability of directed testing with efficiency closer to that of symbolic methods
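A minimal sketch of the time-for-memory trade, assuming a three-valued (0, 1, X) gate model; the function names are hypothetical. A single run with an unknown input gives a conservative X, while two scalar runs recover the exact value.

```python
X = "X"   # unknown / don't-care value

def t_not(a):
    return X if a == X else 1 - a

def t_and(a, b):
    if a == 0 or b == 0:
        return 0          # a controlling 0 decides the output even if the other input is X
    if a == 1 and b == 1:
        return 1
    return X

def contradiction(a):
    """Circuit computing a AND (NOT a), which is 0 for every scalar input."""
    return t_and(a, t_not(a))


one_run = contradiction(X)                         # conservative: X
scalar_runs = {contradiction(v) for v in (0, 1)}   # exact: {0}
```

The X-valued run cannot tell that the two gate inputs are correlated, so it answers X. The two scalar runs each use constant memory and together show the output is always 0.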
Basic Algorithm • [Figure: network of AND gates simulated with a mix of symbolic variables (a, -a, b, c), don’t care variables, and “traditional” X values; annotations: conservative approximation, don’t care logic] • Obeys law of excluded middle!
[Figure: AND-gate example with values 0, 1, X, a, b and a case split on a=1 / a=0, feeding a decision procedure] • Decision Procedure • Test is Unsatisfiable! • Variable selection heuristic: pick relevant variable by propagating from inputs.
BDDs with Approximate Values • Generic Approximate BDD apply algorithm.
Approx_Apply(F,G)
  find top variable V
  compute L=left(F,G), R=right(F,G)
  if node(V,L,R) exists, return it
  else if (want_exact(V,L,R)) create node (V,L,R); return node
  else /* approximate */ return X
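One way to make the pseudocode concrete is the sketch below, specialized to AND. Several parts are assumptions: nodes are plain tuples, `want_exact` is a simple node budget, the usual "both children equal" reduction is added, and there is no apply cache. None of this is claimed to match the original implementation.

```python
X = "X"   # approximate terminal: "value unknown"

class ApproxBDD:
    def __init__(self, max_nodes):
        self.unique = {}            # unique table: (var, low, high) -> node
        self.max_nodes = max_nodes  # hypothetical budget used by want_exact

    def want_exact(self, v, low, high):
        # Assumed policy: stay exact while the node budget allows it.
        return len(self.unique) < self.max_nodes

    def mk(self, v, low, high, force=False):
        if low == high:
            return low              # standard BDD reduction
        key = (v, low, high)
        if key in self.unique:
            return self.unique[key] # node exists: return it
        if force or self.want_exact(v, low, high):
            self.unique[key] = key  # create node
            return key
        return X                    # approximate

    def var(self, v):
        return self.mk(v, 0, 1, force=True)

    def apply_and(self, f, g):
        if f == 0 or g == 0:
            return 0
        if f == 1:
            return g
        if g == 1:
            return f
        if f == X or g == X:
            return X                # conservative: unknown stays unknown
        v = min(f[0], g[0])         # top variable (smaller index = higher)
        f0, f1 = (f[1], f[2]) if f[0] == v else (f, f)
        g0, g1 = (g[1], g[2]) if g[0] == v else (g, g)
        return self.mk(v, self.apply_and(f0, g0), self.apply_and(f1, g1))


roomy = ApproxBDD(max_nodes=10)
exact = roomy.apply_and(roomy.var(0), roomy.var(1))

tight = ApproxBDD(max_nodes=2)      # budget already used up by the two variables
approx = tight.apply_and(tight.var(0), tight.var(1))
```

With room in the table the conjunction is built exactly. With the budget exhausted the same call returns X, which is sound but uninformative.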
Classification Algorithm • Simulator’s classification • Care • Don’t Care • Algorithm • Initially, all variables are Don’t Care. • Simulate using sub-domain values only. • Reclassify one variable as Care. • Repeat until sufficient variables are classified.
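The loop above can be sketched as follows, assuming a three-valued gate model and a naive "next variable in the list" selection instead of a real heuristic. The function names, the stopping test (no X left on the output), and the sample circuit are illustrative assumptions.

```python
from itertools import product

X = "X"   # value carried by don't-care variables

def t_not(a):
    return X if a == X else 1 - a

def t_and(a, b):
    if a == 0 or b == 0:
        return 0
    return 1 if a == 1 and b == 1 else X

def t_or(a, b):
    if a == 1 or b == 1:
        return 1
    return 0 if a == 0 and b == 0 else X

def classify(circuit, variables):
    """Promote variables from don't-care to care until every run is X-free."""
    care, dont_care = [], list(variables)
    while True:
        results = {}
        for values in product((0, 1), repeat=len(care)):
            env = {v: X for v in dont_care}     # don't cares stay unknown
            env.update(zip(care, values))       # care variables get scalar values
            results[values] = circuit(env)
        if X not in results.values() or not dont_care:
            return care, results
        care.append(dont_care.pop(0))           # reclassify one variable as care


# Output is (a AND NOT a) AND (b OR c): always 0, and only a matters.
circuit = lambda e: t_and(t_and(e["a"], t_not(e["a"])), t_or(e["b"], e["c"]))
care, results = classify(circuit, ["a", "b", "c"])
```

Only `a` is promoted to care. Both scalar runs give 0, so the output can never be 1, and `b` and `c` never need to be enumerated.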
Review • What we’ve done: • Symbolic pointer analysis • Symbolic simulations and executions • Model checking • C programs • Abstract explanations • Where do we go from here?