Spring 2014 Program Analysis and Verification. Lecture 10: Abstract Interpretation II. Roman Manevich, Ben-Gurion University.
Previously
• Semantic domains
• Preorders
• Partial orders (posets)
• Pointed posets
• Ascending/descending chains
• The height of a poset
• Join and Meet operators
• Complete lattices
• Constructing new lattices from old
• Abstract Interpretation package: domains
A taxonomy of semantic domain types
• Complete lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤): join/meet exist for every subset of D
• Lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤): join/meet exist for every finite subset of D (alternatively, binary join/meet)
• Join semilattice (D, ⊑, ⊔, ⊤), where ⊤ is the meet of the empty set
• Meet semilattice (D, ⊑, ⊓, ⊥), where ⊥ is the join of the empty set
• Complete partial order (CPO) (D, ⊑, ⊥): poset with LUB for all ascending chains
• Partial order (poset) (D, ⊑): reflexive, transitive, and anti-symmetric: d ⊑ d’ and d’ ⊑ d implies d = d’
• Preorder (D, ⊑)
  • reflexive: d ⊑ d
  • transitive: d ⊑ d’ and d’ ⊑ d’’ implies d ⊑ d’’
Cartesian product of complete lattices
• For two complete lattices L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1) and L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
• Define the poset Lcart = (D1 × D2, ⊑cart, ⊔cart, ⊓cart, ⊥cart, ⊤cart) as follows:
  • (x1, x2) ⊑cart (y1, y2) iff x1 ⊑1 y1 and x2 ⊑2 y2
  • ⊔cart = ? ⊓cart = ? ⊥cart = ? ⊤cart = ?
• Lemma: Lcart is a complete lattice
• Define the Cartesian constructor Lcart = Cart(L1, L2)
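One natural way to fill in the question marks is to define every operation componentwise. The Python sketch below does this under simplifying assumptions: the `Lattice` record, the two example lattices, and all names are hypothetical, and only binary join/meet are modeled (enough for finite lattices, not for arbitrary subsets).

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Lattice:
    """A lattice given by its order, binary join/meet, bottom and top."""
    leq: Callable[[Any, Any], bool]
    join: Callable[[Any, Any], Any]
    meet: Callable[[Any, Any], Any]
    bottom: Any
    top: Any


def cart(l1: Lattice, l2: Lattice) -> Lattice:
    """Componentwise product: every operation acts on each coordinate."""
    return Lattice(
        leq=lambda a, b: l1.leq(a[0], b[0]) and l2.leq(a[1], b[1]),
        join=lambda a, b: (l1.join(a[0], b[0]), l2.join(a[1], b[1])),
        meet=lambda a, b: (l1.meet(a[0], b[0]), l2.meet(a[1], b[1])),
        bottom=(l1.bottom, l2.bottom),
        top=(l1.top, l2.top),
    )


# Two small example lattices (illustrative choices, not from the slides).
bools = Lattice(lambda a, b: (not a) or b, lambda a, b: a or b,
                lambda a, b: a and b, False, True)
subsets = Lattice(lambda a, b: a <= b, lambda a, b: a | b,
                  lambda a, b: a & b, frozenset(), frozenset({0, 1}))

prod = cart(bools, subsets)
x = (False, frozenset({0}))
y = (True, frozenset({1}))
print(prod.join(x, y))
print(prod.leq(x, prod.join(x, y)))
```

The join of `x` and `y` is an upper bound of both, which is what the second `print` checks for `x`.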
Disjunctive completion
• For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• Define the powerset lattice L∨ = (2^D, ⊑∨, ⊔∨, ⊓∨, ⊥∨, ⊤∨)
  • ⊑∨ = ? ⊔∨ = ? ⊓∨ = ? ⊥∨ = ? ⊤∨ = ?
• Lemma: L∨ is a complete lattice
• L∨ contains all subsets of D, which can be thought of as disjunctions of the corresponding predicates
• Define the disjunctive completion constructor L∨ = Disj(L)
Relational product of lattices
• L1 = (D1, ⊑1, ⊔1, ⊓1, ⊥1, ⊤1) and L2 = (D2, ⊑2, ⊔2, ⊓2, ⊥2, ⊤2)
• Lrel = (2^(D1 × D2), ⊑rel, ⊔rel, ⊓rel, ⊥rel, ⊤rel), defined as follows:
  • Lrel = Disj(Cart(L1, L2))
• Lemma: Lrel is a complete lattice
Finite maps
• For a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤) and a finite set V
• Define the poset LV→L = (V → D, ⊑V→L, ⊔V→L, ⊓V→L, ⊥V→L, ⊤V→L) as follows:
  • f1 ⊑V→L f2 iff for all v ∈ V, f1(v) ⊑ f2(v)
  • ⊔V→L = ? ⊓V→L = ? ⊥V→L = ? ⊤V→L = ?
• Lemma: LV→L is a complete lattice
• Define the map constructor LV→L = Map(V, L)
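The pointwise order suggests pointwise operations for the map lattice as well. Here is a minimal Python sketch, assuming maps are dictionaries over the same key set V and that the underlying lattice is passed in as plain `leq`/`join` functions; all names are hypothetical.

```python
def map_bottom(keys, bottom):
    """The least map: every key is sent to the bottom of L."""
    return {v: bottom for v in keys}


def map_leq(f1, f2, leq):
    """f1 is below f2 iff f1(v) is below f2(v) for every key v."""
    return all(leq(f1[v], f2[v]) for v in f1)


def map_join(f1, f2, join):
    """Pointwise join; assumes f1 and f2 have the same keys."""
    return {v: join(f1[v], f2[v]) for v in f1}


# Example over the powerset lattice, with V = two CFG node names.
V = ["entry", "exit"]
union = lambda a, b: a | b
subset = lambda a, b: a <= b

f1 = {"entry": frozenset({1}), "exit": frozenset()}
f2 = {"entry": frozenset({2}), "exit": frozenset({3})}

joined = map_join(f1, f2, union)
print(joined)
print(map_leq(f1, joined, subset))
```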
The collecting lattice
• Lattice for a given control-flow node v: Lv = (2^State, ⊆, ∪, ∩, ∅, State)
• Lattice for entire control-flow graph with nodes V: LCFG = Map(V, Lv)
• We will use this lattice as a baseline for static analysis and define abstractions of its elements
Software package: paver142 • Built on top of the Soot compiler framework for Java • Download from web-site • Includes all necessary Soot jar files
[Figure: package layout, with parts labeled "Example analyses", "Soot-specific utilities", and "Infrastructure for implementing static analysis"]
Today
• Solving monotone systems
• Fixed-points
• Vanilla static analysis algorithm
• Chaotic iteration
Abstract interpretation via abstraction
• Generalizes axiomatic verification
• [Figure: the collecting semantics of statement S maps a set of states to a set of states; abstraction maps each set of states to an abstract representation of sets of states, and the abstract semantics of statement S maps abstract representations to abstract representations. The axiomatic counterpart is the triple {P} S {Q} with sp(S, P).]
Abstract interpretation via concretization
• [Figure: the abstract semantics of statement S maps abstract representations of sets of states to abstract representations; concretization maps each abstract representation to a set of states, on which the collecting semantics of statement S operates. For the triple {P} S {Q}, the corresponding sets of states are models(P), models(sp(S, P)), and models(Q).]
Missing knowledge
• Collecting semantics
• Abstract semantics
• Connection between collecting semantics and abstract semantics
• Algorithm to compute abstract semantics
The collecting lattice (sets of states)
• Lattice for a given control-flow node v: Lv = (2^State, ⊆, ∪, ∩, ∅, State)
• Lattice for entire control-flow graph with nodes V: LCFG = Map(V, Lv)
• We will use this lattice as a baseline for static analysis and define abstractions of its elements
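Collecting semantics as equation system
• [Figure: CFG of the running example. From entry, R[0] flows into the loop head R[1], which is tested by if x > 0. The true branch R[2] goes through x := x-1 to R[4] and back to R[1]; the false branch R[3] goes to exit.]
• A vector of variables R[0, 1, 2, 3, 4]
• A (recursive) system of equations:
  R[0] = {x ∈ Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = R[1] ∩ {s | s(x) > 0} // semantic function for assume x>0
  R[3] = R[1] ∩ {s | s(x) ≤ 0}
  R[4] = ⟦x:=x-1⟧ R[2] // semantic function for x:=x-1 lifted to sets of states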
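To make the equation system concrete, the right-hand sides can be written as one function F on the vector R. The Python sketch below makes two assumptions that are not part of the slides: a state is just the value of x, and x is restricted to the finite universe U = {-3, …, 3} so that sets of states can be enumerated. It then checks that a candidate vector solves R = F(R).

```python
# Assumption: x ranges over a small finite universe instead of all of Z.
U = frozenset(range(-3, 4))


def F(R):
    """Apply the right-hand side of every equation to the vector R[0..4]."""
    return (
        U,                                      # R[0] = {x in Z}, truncated to U
        R[0] | R[4],                            # R[1] = R[0] union R[4]
        frozenset(x for x in R[1] if x > 0),    # R[2] = assume x > 0
        frozenset(x for x in R[1] if x <= 0),   # R[3] = assume x <= 0
        frozenset(x - 1 for x in R[2]),         # R[4] = x := x-1 lifted to sets
    )


bottom = (frozenset(),) * 5
candidate = (
    U,
    U,
    frozenset({1, 2, 3}),
    frozenset({-3, -2, -1, 0}),
    frozenset({0, 1, 2}),
)

print(F(bottom) == bottom)        # the all-empty vector is not a solution
print(F(candidate) == candidate)  # the candidate satisfies R = F(R)
```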
General definition
• A vector of variables R[0, …, k], one per input/output of a node
• R[0] is for entry
• For node n with multiple predecessors add equation R[n] = ∪{R[k] | k is a predecessor of n}
• For an atomic operation node S with input R[m] and output R[n] add equation R[n] = ⟦S⟧ R[m]
• Transform if b then S1 else S2 to (assume b; S1) or (assume ¬b; S2)
Static analysis
• Given a system of equations for the collecting semantics:
  R[0] = {x ∈ Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = ⟦assume x>0⟧ R[1]
  R[3] = ⟦assume x≤0⟧ R[1]
  R[4] = ⟦x:=x-1⟧ R[2]
• A static analysis solves a corresponding system of equations over an abstract domain:
  R[0]# = {x ∈ Z}#
  R[1]# = R[0]# ⊔ R[4]#
  R[2]# = ⟦assume x>0⟧# R[1]#
  R[3]# = ⟦assume x≤0⟧# R[1]#
  R[4]# = ⟦x:=x-1⟧# R[2]#
• Questions:
  • What is the relation between the solutions? (Next lecture)
  • How do you solve the second system? (This lecture)
Equation systems in general
• Let L be a complete lattice (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• Let R be a vector of analysis variables R[0, …, n] ∈ D × … × D
• Let F be a vector of functions f[0], …, f[n], each of type f[i] : D × … × D → D
• A system of equations:
  R[0] = f[0](R[0], …, R[n])
  …
  R[n] = f[n](R[0], …, R[n])
• In vector notation: R = F(R). A solution, if one exists, is a fixed point of F
• For R[i] = f[i](R), usually f[i] reads only a small subset D[i] of R. We say that R[i] depends on D[i]
• Example:
  R[0] = {x ∈ Z} // established input
  R[1] = R[0] ∪ R[4]
  R[2] = R[1] ∩ {s | s(x) > 0}
  R[3] = R[1] ∩ {s | s(x) ≤ 0}
  R[4] = ⟦x:=x-1⟧ R[2]
• Questions:
  • Does a solution always exist?
  • If so, is it unique?
  • If so, is it computable?
Monotone functions
• Let L1 = (D1, ⊑) and L2 = (D2, ⊑) be two posets
• A function f : D1 → D2 is monotone if for every pair x, y ∈ D1, x ⊑ y implies f(x) ⊑ f(y)
• A special case: L1 = L2 = (D, ⊑) and f : D → D
Monotone function
• [Figure: a monotone f maps elements x ⊑ y of L1 to elements f(x) ⊑ f(y) of L2]
Important cases of monotonicity
• Join: f(X, Y) = X ⊔ Y is monotone in each operand
  • Prove it!
• Set lifting function: for a set X and any function g, F(X) = { g(x) | x ∈ X } is monotone w.r.t. ⊆
  • Prove it!
• Notice that the collecting semantics function is defined in terms of
  • Join (set union)
  • Semantic function for atomic statements lifted to sets of states
• Conclusion: collecting semantics function is monotone
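A brute-force check is no substitute for the proofs requested above, but it can make the claim tangible. The following Python sketch (all names hypothetical) tests monotonicity over the powerset of a three-element set: the lifting of x ↦ x - 1 passes, while set complement, which is not a lifting, fails.

```python
from itertools import combinations


def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def lift(g):
    """Lift a function on elements to a function on sets of elements."""
    return lambda X: frozenset(g(x) for x in X)


def is_monotone(f, elements, leq):
    """Check a <= b implies f(a) <= f(b) for all listed pairs."""
    return all(leq(f(a), f(b))
               for a in elements for b in elements if leq(a, b))


subset = lambda a, b: a <= b
P = powerset({0, 1, 2})

decrement = lift(lambda x: x - 1)
complement = lambda X: frozenset({0, 1, 2}) - X

print(is_monotone(decrement, P, subset))
print(is_monotone(complement, P, subset))
```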
Extensive/reductive functions
• Let L = (D, ⊑) be a poset
• A function f : D → D is extensive if for every x ∈ D, we have that x ⊑ f(x)
• A function f : D → D is reductive if for every x ∈ D, we have that f(x) ⊑ x
Fixed points
• L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• f : D → D monotone
• Fix(f) = { d | f(d) = d }
• Red(f) = { d | f(d) ⊑ d }
• Ext(f) = { d | d ⊑ f(d) }
• Theorem [Tarski 1955]
  • lfp(f) = ⊓Fix(f) = ⊓Red(f) ∈ Fix(f)
  • gfp(f) = ⊔Fix(f) = ⊔Ext(f) ∈ Fix(f)
• Does a solution always exist? Yes
• If so, is it unique? No, but it has least/greatest solutions
• If so, is it computable? Under some conditions…
• [Figure: the lattice with regions Red(f), Fix(f), and Ext(f); lfp and gfp are the least and greatest elements of Fix(f), and the chain fⁿ(⊥) approaches lfp from below]
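On a small finite lattice the sets Fix(f), Red(f), and Ext(f) can simply be enumerated, which gives a quick sanity check of Tarski's characterization. The sketch below uses the powerset of {0, 1, 2} and a monotone function f chosen purely for illustration; neither comes from the slides.

```python
from functools import reduce
from itertools import combinations

U = frozenset({0, 1, 2})
D = [frozenset(c) for r in range(len(U) + 1)
     for c in combinations(sorted(U), r)]


def f(X):
    """Illustrative monotone function: always yields 0, yields 1 once 0
    is present, and keeps 2 if it was already there."""
    out = {0} | (X & {2})
    if 0 in X:
        out.add(1)
    return frozenset(out)


fix = [d for d in D if f(d) == d]   # fixed points
red = [d for d in D if f(d) <= d]   # f(d) below d
ext = [d for d in D if d <= f(d)]   # d below f(d)


def meet_all(sets):
    return reduce(lambda a, b: a & b, sets, U)


def join_all(sets):
    return reduce(lambda a, b: a | b, sets, frozenset())


lfp = meet_all(red)
gfp = join_all(ext)
print(sorted(lfp), sorted(gfp))
print(lfp in fix, gfp in fix)
```

Here the least fixed point is also the meet of Fix(f), and the greatest fixed point is its join, matching both equalities in the theorem.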
Fixed point example
• [Figure: the CFG of the running example annotated twice, once with d and once with F(d), where d = F(d)]
• d = F(d): R[0] = {x ∈ Z}, R[1] = {x ∈ Z}, R[2] = {x > 0}, R[3] = {x ≤ 0}, R[4] = {x ≥ 0}
• Equation system:
  R[0] = {x ∈ Z}
  R[1] = R[0] ∪ R[4]
  R[2] = R[1] ∩ {s | s(x) > 0}
  R[3] = R[1] ∩ {s | s(x) ≤ 0}
  R[4] = ⟦x:=x-1⟧ R[2]
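Pre-fixed point example
• [Figure: the CFG of the running example annotated twice, once with d and once with F(d)]
• d: R[0] = {x ∈ Z}, R[1] = {x ∈ Z}, R[2] = {x > 0}, R[3] = {x < -5}, R[4] = {x ≥ 0}
• F(d): R[0] = {x ∈ Z}, R[1] = {x ∈ Z}, R[2] = {x > 0}, R[3] = {x ≤ 0}, R[4] = {x ≥ 0}
• Since {x < -5} ⊆ {x ≤ 0}, d ⊑ F(d), so d is a pre-fixed point
• Equation system: R[0] = {x ∈ Z}, R[1] = R[0] ∪ R[4], R[2] = R[1] ∩ {s | s(x) > 0}, R[3] = R[1] ∩ {s | s(x) ≤ 0}, R[4] = ⟦x:=x-1⟧ R[2]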
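Post-fixed point example
• [Figure: the CFG of the running example annotated twice, once with d and once with F(d)]
• d: R[0] = {x ∈ Z}, R[1] = {x ∈ Z}, R[2] = {x > 0}, R[3] = {x < 9}, R[4] = {x ≥ 0}
• F(d): R[0] = {x ∈ Z}, R[1] = {x ∈ Z}, R[2] = {x > 0}, R[3] = {x ≤ 0}, R[4] = {x ≥ 0}
• Since {x ≤ 0} ⊆ {x < 9}, F(d) ⊑ d, so d is a post-fixed point
• Equation system: R[0] = {x ∈ Z}, R[1] = R[0] ∪ R[4], R[2] = R[1] ∩ {s | s(x) > 0}, R[3] = R[1] ∩ {s | s(x) ≤ 0}, R[4] = ⟦x:=x-1⟧ R[2]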
Recap
• A system of equations of the form R = F(R), where R draws its elements from a complete lattice L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• Tarski’s fixed point theorem guarantees that there exists a least fixed point: lfp(f) = ⊓Fix(f)
• However, it is not an algorithm since D is often infinite
  • Inefficient even when D is finite
• We need a more constructive way of computing lfp(f)
Continuous functions
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order
  • Every ascending chain has a least upper bound
• A function f is continuous if for every ascending chain Y ⊆ D, f(⊔Y) = ⊔{ f(y) | y ∈ Y }
• Lemma: if f is continuous then f is monotone
  • Proof: assume x ⊑ y. Then {x, y} is an ascending chain with x ⊔ y = y. By continuity, f(y) = f(x ⊔ y) = f(x) ⊔ f(y), which means f(x) ⊑ f(y)
Kleene’s fixed point theorem
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order and f : D → D a continuous function. Then lfp(f) = ⊔{ fⁿ(⊥) | n ∈ N }
• That is, take the ascending chain ⊥ ⊑ f(⊥) ⊑ f(f(⊥)) ⊑ … ⊑ fⁿ(⊥) ⊑ … and return the supremum
• Why is this an ascending chain?
• But how do you know if a function f is continuous?
Continuity and ACC condition
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order
  • Every ascending chain has a least upper bound
• L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d0 ⊑ d1 ⊑ … ⊑ dn = dn+1 = dn+2 = …
• Lemma: Monotone functions on posets satisfying ACC are continuous
• Proof: We need to show that f(⊔Y) = ⊔{ f(y) | y ∈ Y }
  • Every ascending chain Y eventually stabilizes, d0 ⊑ d1 ⊑ … ⊑ dn = dn+1 = …, hence dn is the least upper bound of Y = {d0, d1, …, dn}, thus f(⊔Y) = f(dn)
  • From monotonicity of f we get that f(d0) ⊑ f(d1) ⊑ … ⊑ f(dn) = f(dn+1) = …. Hence f(dn) is the least upper bound of {f(d0), f(d1), …, f(dn)}, thus ⊔{ f(y) | y ∈ Y } = f(dn)
Resulting algorithm
• Kleene’s fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone
• Mathematical definition: lfp(f) = ⊔{ fⁿ(⊥) | n ∈ N }
• Algorithm:
  d := ⊥
  while f(d) ≠ d do
    d := f(d)
  return d
• [Figure: the ascending chain ⊥ ⊑ f(⊥) ⊑ f²(⊥) ⊑ … ⊑ fⁿ(⊥) climbing up to lfp]
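The loop above translates almost directly into Python. This is a minimal sketch: the function `step` is an illustrative monotone function on finite sets of integers (not from the slides), and termination relies on the caller supplying a monotone f over a domain satisfying ACC.

```python
def lfp(f, bottom):
    """Kleene iteration: climb bottom, f(bottom), f(f(bottom)), ...
    until the value stops changing. Terminates only if the chain
    stabilizes (e.g. f monotone on a lattice satisfying ACC)."""
    d = bottom
    while f(d) != d:
        d = f(d)
    return d


def step(X):
    """Illustrative monotone function on sets of integers:
    start from 0 and add successors up to 5."""
    return frozenset({0}) | frozenset(x + 1 for x in X if x < 5)


print(sorted(lfp(step, frozenset())))  # [0, 1, 2, 3, 4, 5]
```

Each iteration adds exactly one new element here, so the chain stabilizes after six steps.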
Vanilla algorithm
Definition:
• Lattice of properties L of finite height (ACC)
• For each statement define a monotone transformer
Preparation:
• Parse program into AST
• Convert AST into CFG
• Generate system of equations from CFG
Analysis:
• Initialize each analysis variable with ⊥
• Update all analysis variables of each equation until reaching a fixed point
Problem:
• Non-incremental: every round updates all variables, yet most variables don’t change
Chaotic iteration
• Input:
  • A cpo L = (D, ⊑, ⊔, ⊥) satisfying ACC
  • Lⁿ = L × L × … × L
  • A monotone function f : Dⁿ → Dⁿ
  • A system of equations { X[i] = F[i](X) | 1 ≤ i ≤ n }, where F[i] is the i-th component of f
• Output: lfp(f)
• A worklist-based algorithm:
  for i := 1 to n do X[i] := ⊥
  WL = {1, …, n}
  while WL ≠ ∅ do
    i := pop WL // choose index non-deterministically
    N := F[i](X)
    if N ≠ X[i] then
      X[i] := N
      add all the indexes that directly depend on i to WL
      (X[j] depends on X[i] if F[j] contains X[i])
  return X
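The worklist algorithm can be sketched in Python as follows. The encoding is an assumption for illustration: indexes are 0-based, each F[i] receives the whole vector X, and `deps[i]` lists the indexes j whose equation reads X[i]. The three-equation system at the bottom is a made-up example over finite sets.

```python
def chaotic_iteration(F, deps, bottom):
    """Worklist-based fixed point computation.

    F[i]    : function taking the whole vector X and returning X[i]'s new value
    deps[i] : indexes whose equations read X[i]
    """
    n = len(F)
    X = [bottom] * n
    worklist = set(range(n))
    while worklist:
        i = worklist.pop()            # arbitrary choice of index
        new = F[i](X)
        if new != X[i]:
            X[i] = new
            worklist.update(deps[i])  # re-evaluate everything that reads X[i]
    return X


# Illustrative system:
#   X[0] = {0}
#   X[1] = X[0] union X[2]
#   X[2] = { x + 1 | x in X[1], x < 3 }
F = [
    lambda X: frozenset({0}),
    lambda X: X[0] | X[2],
    lambda X: frozenset(x + 1 for x in X[1] if x < 3),
]
deps = {0: [1], 1: [2], 2: [1]}

solution = chaotic_iteration(F, deps, frozenset())
print([sorted(s) for s in solution])
```

Because only the dependents of a changed variable are re-queued, equations whose inputs are stable are not re-evaluated, which addresses the non-incremental behavior of round-robin updating.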
Chaotic iteration for static analysis
• Specialize chaotic iteration for programs
• Create a CFG for program
• Choose a cpo of properties for the static analysis to infer: L = (D, ⊑, ⊔, ⊥)
• Define variables R[0, …, n] for input/output of each CFG node such that R[i] ∈ D
• For each node v let vout be the variable at the output of that node: vout = F[v](⊔{ uout | (u, v) is a CFG edge })
• Make sure each F[v] is monotone
• Variable dependence determined by outgoing edges in CFG
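As an illustration of this specialization, the sketch below analyzes the running example (a loop decrementing x while x > 0) with a sign domain. The domain is an assumption, not something the slides fix: an abstract value is a set of signs drawn from '-', '0', '+', ordered by inclusion, with union as join. Node numbers and transformer definitions are likewise hypothetical.

```python
TOP = frozenset({'-', '0', '+'})
BOTTOM = frozenset()

# Abstract effect of x := x-1 on each sign.
DEC = {'-': {'-'}, '0': {'-'}, '+': {'0', '+'}}

# Nodes: 0 entry, 1 loop head, 2 assume x>0, 3 assume x<=0, 4 x := x-1.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (4, 1), (1, 2), (1, 3), (2, 4)]
transfer = {
    0: lambda d: TOP,                     # entry: x may be any integer
    1: lambda d: d,                       # loop head: just the join of inputs
    2: lambda d: d & {'+'},               # assume x > 0
    3: lambda d: d & {'-', '0'},          # assume x <= 0
    4: lambda d: frozenset(t for s in d for t in DEC[s]),
}


def analyze(nodes, edges, transfer, bottom, join):
    """Chaotic iteration where dependencies come from CFG edges."""
    preds = {v: [u for (u, w) in edges if w == v] for v in nodes}
    succs = {u: [w for (x, w) in edges if x == u] for u in nodes}
    out = {v: bottom for v in nodes}
    worklist = list(nodes)
    while worklist:
        v = worklist.pop()
        incoming = bottom
        for u in preds[v]:
            incoming = join(incoming, out[u])
        new = transfer[v](incoming)
        if new != out[v]:
            out[v] = new
            worklist.extend(succs[v])     # successors depend on out[v]
    return out


result = analyze(nodes, edges, transfer, BOTTOM, lambda a, b: a | b)
for v in nodes:
    print(v, sorted(result[v]))
```

In this sketch the result at the exit branch (node 3) contains only '-' and '0', mirroring the concrete solution {x ≤ 0}.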