Spring 2014 Program Analysis and Verification Lecture 11: Abstract Interpretation III

Roman Manevich, Ben-Gurion University

Syllabus

Previously
• Solving monotone systems
• Fixed-points
• Vanilla static analysis algorithm
• Chaotic iteration

Static analysis
• Given a system of equations for the collecting semantics:
  • $R[0] = \{x \in \mathbb{Z}\}$ // established input
  • $R[1] = R[0] \cup R[4]$
  • $R[2] = [\![\text{assume } x>0]\!]\ R[1]$
  • $R[3] = [\![\text{assume } x \le 0]\!]\ R[1]$
  • $R[4] = [\![x := x-1]\!]\ R[2]$
• A static analysis solves a corresponding system of equations over an abstract domain:
  • $R[0]^\# = \{x \in \mathbb{Z}\}^\#$
  • $R[1]^\# = R[0]^\# \sqcup R[4]^\#$
  • $R[2]^\# = [\![\text{assume } x>0]\!]^\#\ R[1]^\#$
  • $R[3]^\# = [\![\text{assume } x \le 0]\!]^\#\ R[1]^\#$
  • $R[4]^\# = [\![x := x-1]\!]^\#\ R[2]^\#$
• Questions:
  • How do you solve the second system? Chaotic iteration
  • What is the relation between the solutions? This lecture

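Monotone function
[Figure: a monotone function $f$ mapping lattice $L_1$ to lattice $L_2$; $x \sqsubseteq y$ in $L_1$ implies $f(x) \sqsubseteq f(y)$ in $L_2$]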

Important cases of monotonicity
• Join: $f(X, Y) = X \sqcup Y$ is monotone in each operand
  • Prove it!
• Set lifting function: for a set $X$ and any function $g$, $F(X) = \{ g(x) \mid x \in X \}$ is monotone w.r.t. $\subseteq$
  • Prove it!
• Notice that the collecting semantics function is defined in terms of
  • Join (set union)
  • Semantic function for atomic statements lifted to sets of states
• Conclusion: collecting semantics function is monotone

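Fixed points
• $L = (D, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ is a complete lattice
• $f : D \to D$ monotone
• $\mathrm{Fix}(f) = \{ d \mid f(d) = d \}$
• $\mathrm{Red}(f) = \{ d \mid f(d) \sqsubseteq d \}$
• $\mathrm{Ext}(f) = \{ d \mid d \sqsubseteq f(d) \}$
• Theorem [Tarski 1955]
  • $\mathrm{lfp}(f) = \sqcap\, \mathrm{Fix}(f) = \sqcap\, \mathrm{Red}(f) \in \mathrm{Fix}(f)$
  • $\mathrm{gfp}(f) = \sqcup\, \mathrm{Fix}(f) = \sqcup\, \mathrm{Ext}(f) \in \mathrm{Fix}(f)$
• Does a solution always exist? Yes
• If so, is it unique? No, but it has least/greatest solutions
• If so, is it computable? Under some conditions…
[Figure: lattice diagram between $\bot$ and $\top$ showing $\mathrm{Red}(f)$, $\mathrm{Ext}(f)$ and $\mathrm{Fix}(f)$, with lfp and gfp marked and the iterates $f^n(\bot)$]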
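Tarski's theorem can be checked by brute force on a small lattice. The sketch below is illustrative only: the universe {0,1,2,3}, the particular monotone function `f`, and all helper names are assumptions chosen for this example, not part of the lecture.

```python
from itertools import combinations

# Assumed toy lattice: the powerset of {0,1,2,3} ordered by set inclusion.
UNIVERSE = frozenset({0, 1, 2, 3})

def powerset(universe):
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def f(s):
    # A monotone function picked for illustration: always adds 0,
    # keeps 1 and 2 if present, drops 3.
    return frozenset({0}) | (s & frozenset({1, 2}))

def meet(sets):
    result = UNIVERSE
    for s in sets:
        result &= s
    return result

def join(sets):
    result = frozenset()
    for s in sets:
        result |= s
    return result

D = powerset(UNIVERSE)
fix = [d for d in D if f(d) == d]   # Fix(f)
red = [d for d in D if f(d) <= d]   # Red(f): f(d) below d
ext = [d for d in D if d <= f(d)]   # Ext(f): d below f(d)

lfp = meet(red)
gfp = join(ext)
```

Here `lfp` and `gfp` are computed from `Red(f)` and `Ext(f)` alone, yet both turn out to be members of `fix`, and they coincide with the meet and join of `fix`.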

Continuous functions
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
  • Every ascending chain has a least upper bound
• A function $f$ is continuous if for every increasing chain $Y \subseteq D^*$, $f(\sqcup Y) = \sqcup \{ f(y) \mid y \in Y \}$
• Lemma: if $f$ is continuous then $f$ is monotone
• Proof: assume $x \sqsubseteq y$. Therefore $x \sqcup y = y$. Then $f(y) = f(x \sqcup y) = f(x) \sqcup f(y)$, which means $f(x) \sqsubseteq f(y)$


Kleene's fixed point theorem
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order and $f : D \to D$ a continuous function; then $\mathrm{lfp}(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$
• That is, take the ascending chain $\bot \sqsubseteq f(\bot) \sqsubseteq f(f(\bot)) \sqsubseteq \dots \sqsubseteq f^n(\bot) \sqsubseteq \dots$ and return the supremum
• Why is this an ascending chain?
• But how do you know if a function $f$ is continuous?

Continuity and ACC condition
• Let $L = (D, \sqsubseteq, \sqcup, \bot)$ be a complete partial order
  • Every ascending chain has a least upper bound
• $L$ satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: $d_0 \sqsubseteq d_1 \sqsubseteq \dots \sqsubseteq d_n = d_{n+1} = d_{n+2} = \dots$
• Lemma: Monotone functions on posets satisfying ACC are continuous

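Resulting algorithm
• Mathematical definition: $\mathrm{lfp}(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$
• Algorithm:
    d := $\bot$
    while f(d) $\neq$ d do
      d := f(d)
    return d
• Kleene's fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone
[Figure: the ascending chain $\bot, f(\bot), f^2(\bot), \dots, f^n(\bot)$ climbing to lfp]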
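The loop above translates almost literally into code. The following is a minimal sketch; the graph `EDGES`, the function `step`, and the name `kleene_lfp` are assumptions made up for the example (graph reachability is used only because its transfer function is obviously monotone over a finite powerset).

```python
def kleene_lfp(f, bottom):
    """Iterate f from bottom until f(d) == d (assumes f monotone and ACC)."""
    d = bottom
    while f(d) != d:
        d = f(d)
    return d

# Hypothetical example: nodes reachable from node 0 in a small graph.
EDGES = {0: {1}, 1: {2}, 2: {1}, 3: {0}}

def step(reached):
    # Monotone in `reached`: node 0 plus all successors of reached nodes.
    succs = {m for n in reached for m in EDGES.get(n, set())}
    return frozenset({0}) | frozenset(succs)

reachable = kleene_lfp(step, frozenset())
```

The iterates form the chain ∅, {0}, {0,1}, {0,1,2}, after which `step` no longer changes the value; node 3 is never added because nothing reaches it.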

Vanilla algorithm
Problem definition:
• Lattice of properties L of finite height (ACC)
• For each statement define a monotone transformer
Preparation:
• Parse program into AST
• Convert AST into CFG
• Generate system of equations from CFG
Analysis:
• Initialize each analysis variable with $\bot$
• Update all analysis variables of each equation until reaching a fixed point
Drawback: the update step is non-incremental, although most variables don't change.

Chaotic iteration
• Input:
  • A cpo $L = (D, \sqsubseteq, \sqcup, \bot)$ satisfying ACC
  • $L^n = L \times L \times \dots \times L$
  • A monotone function $f : D^n \to D^n$
  • A system of equations $\{ X[i] = F[i](X) \mid 1 \le i \le n \}$, where $F[i]$ is the $i$-th component of $f$
• Output: lfp(f)
• A worklist-based algorithm:
    for i := 1 to n do X[i] := $\bot$
    WL := {1,…,n}
    while WL $\neq \emptyset$ do
      i := pop WL // choose index non-deterministically
      N := F[i](X)
      if N $\neq$ X[i] then
        X[i] := N
        add all the indexes that directly depend on i to WL
        (X[j] depends on X[i] if F[j] contains X[i])
    return X
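A worklist solver along these lines might look as follows. This is a sketch under stated assumptions: the abstract domain (sets of signs of x), the abstract transformers, and the names `chaotic_iteration`, `F`, and `DEPENDENTS` are all invented for illustration, and indices are 0-based rather than 1..n. The equations mirror the loop "while x > 0 do x := x - 1" used earlier in the lecture.

```python
# Assumed abstract domain: sets of signs of x, ordered by inclusion.
TOP = frozenset({"-", "0", "+"})
BOTTOM = frozenset()
POS = frozenset({"+"})
NONPOS = frozenset({"-", "0"})

def decrement(signs):
    """Abstract effect of x := x - 1 on a set of signs."""
    effect = {"+": {"0", "+"}, "0": {"-"}, "-": {"-"}}
    out = set()
    for s in signs:
        out |= effect[s]
    return frozenset(out)

# F[i] is the right-hand side of the equation for X[i].
F = [
    lambda X: TOP,               # R[0]: x may be any integer
    lambda X: X[0] | X[4],       # R[1] = R[0] join R[4]
    lambda X: X[1] & POS,        # R[2] = assume x > 0
    lambda X: X[1] & NONPOS,     # R[3] = assume x <= 0
    lambda X: decrement(X[2]),   # R[4] = x := x - 1
]
# DEPENDENTS[i] lists every j whose equation mentions X[i].
DEPENDENTS = {0: {1}, 1: {2, 3}, 2: {4}, 3: set(), 4: {1}}

def chaotic_iteration(F, dependents, bottom):
    X = [bottom] * len(F)
    worklist = set(range(len(F)))
    while worklist:
        i = worklist.pop()            # arbitrary choice of index
        new = F[i](X)
        if new != X[i]:
            X[i] = new
            worklist |= dependents[i]
    return X

R = chaotic_iteration(F, DEPENDENTS, BOTTOM)
```

Because every transformer here is monotone and the domain is finite, the result does not depend on the order in which `set.pop()` happens to return indices.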

Required knowledge
• Collecting semantics
• Abstract semantics (over lattices)
• Algorithm to compute abstract semantics (chaotic iteration)
• Connection between collecting semantics and abstract semantics
• Abstract transformers

Today
• Galois connections
• Abstract transformers
• Global soundness


Recap
• We defined a reference semantics: the collecting semantics
• We defined an abstract semantics for a given lattice and abstract transformers
• We defined an algorithm to compute the abstract least fixed-point when transformers are monotone and the lattice obeys ACC
• Questions:
  • Does the algorithm terminate?
  • What is the connection between the two least fixed-points?
  • Transformer monotonicity is required for termination. What should we require for correctness?

Handling non-monotone transformers
• Mathematical definition: $\mathrm{lfp}(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot)$
• Algorithm:
    d := $\bot$
    while f(d) $\neq$ d do
      d := f(d)
    return d
• Kleene's fixed point theorem gives a constructive method for computing lfp(f) over a poset with ACC when f is monotone
• Monotonicity ensures $\bot \sqsubseteq f(\bot) \sqsubseteq \dots \sqsubseteq f^n(\bot) \sqsubseteq \dots$ is an ascending chain
• What if f is not necessarily monotone? How can we ensure termination?

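Handling non-monotone transformers
• Mathematical definition: $\mathrm{lfp}(f) = \bigsqcup_{n \in \mathbb{N}} f^n(\bot) \sqsubseteq \bigsqcup_{n \in \mathbb{N}} f'^n(\bot)$
• Revised algorithm:
    d := $\bot$
    while f'(d) $\neq$ d do
      d := f'(d)
    return d
• Define $f'(d) = d \sqcup f(d)$
• Now $f'$ is extensive: $d \sqsubseteq d \sqcup f(d) = f'(d)$, and so $\bot \sqsubseteq f'(\bot) \sqsubseteq \dots \sqsubseteq f'^n(\bot) \sqsubseteq \dots$ is an ascending chain
• Result is not necessarily the least fixed point: we get a (post)fixed point in finite time (ACC)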
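The effect of replacing f by f'(d) = d ⊔ f(d) can be seen on a tiny example. In this sketch the lattice (the powerset of {0,1}), the lookup table defining a deliberately non-monotone `f`, and the name `extensive_iteration` are assumptions made for illustration.

```python
# A non-monotone f on the powerset of {0, 1}, given as a lookup table.
# {0} is below {0,1}, but f({0}) = {1} is not below f({0,1}) = {0}.
TABLE = {
    frozenset(): frozenset({0}),
    frozenset({0}): frozenset({1}),
    frozenset({1}): frozenset({0}),
    frozenset({0, 1}): frozenset({0}),
}

def f(s):
    return TABLE[s]

def extensive_iteration(f, bottom, join):
    """Iterate f'(d) = join(d, f(d)) until it stabilizes."""
    d = bottom
    while join(d, f(d)) != d:
        d = join(d, f(d))
    return d

# Plain iteration of f from bottom oscillates: {}, {0}, {1}, {0}, {1}, ...
first_iterates = [frozenset()]
for _ in range(4):
    first_iterates.append(f(first_iterates[-1]))

result = extensive_iteration(f, frozenset(), lambda a, b: a | b)
```

The extensive iteration climbs ∅, {0}, {0,1} and stops. The result satisfies f(result) ⊑ result but is not a fixed point of f itself, matching the remark that only a post-fixed point is guaranteed.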

Relating the concrete domain and the abstract domain

Galois Connection
• Given two complete lattices
  • $C = (D^C, \sqsubseteq^C, \sqcup^C, \sqcap^C, \bot^C, \top^C)$: concrete domain
  • $A = (D^A, \sqsubseteq^A, \sqcup^A, \sqcap^A, \bot^A, \top^A)$: abstract domain
• A Galois Connection (GC) is a quadruple $(C, \alpha, \gamma, A)$ that relates C and A via the monotone functions
  • The abstraction function $\alpha : D^C \to D^A$
  • The concretization function $\gamma : D^A \to D^C$
• For every concrete element $c \in D^C$ and abstract element $a \in D^A$: $\alpha(\gamma(a)) \sqsubseteq^A a$ and $c \sqsubseteq^C \gamma(\alpha(c))$
• Alternatively, $\alpha(c) \sqsubseteq^A a$ iff $c \sqsubseteq^C \gamma(a)$
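Both formulations of a Galois connection can be verified exhaustively when the domains are finite. The sketch below assumes a toy parity domain over the numbers 0..3; the domain, the names `alpha`, `gamma`, `leq_a`, and the three boolean checks are illustrative choices, not part of the lecture.

```python
from itertools import combinations

# Concrete domain: subsets of NUMS ordered by inclusion.
NUMS = (0, 1, 2, 3)
# Abstract domain: BOT below EVEN and ODD, which are below TOP.
GAMMA = {
    "BOT": frozenset(),
    "EVEN": frozenset({0, 2}),
    "ODD": frozenset({1, 3}),
    "TOP": frozenset(NUMS),
}

def leq_a(a1, a2):
    return a1 == a2 or a1 == "BOT" or a2 == "TOP"

def alpha(c):
    if not c:
        return "BOT"
    parities = {n % 2 for n in c}
    if parities == {0}:
        return "EVEN"
    if parities == {1}:
        return "ODD"
    return "TOP"

def gamma(a):
    return GAMMA[a]

CONCRETE = [frozenset(s) for r in range(len(NUMS) + 1)
            for s in combinations(NUMS, r)]

# alpha(c) below a  iff  c below gamma(a)
adjunction = all(leq_a(alpha(c), a) == (c <= gamma(a))
                 for c in CONCRETE for a in GAMMA)
# alpha(gamma(a)) below a
reductive = all(leq_a(alpha(gamma(a)), a) for a in GAMMA)
# c below gamma(alpha(c))
extensive = all(c <= gamma(alpha(c)) for c in CONCRETE)
```

All three checks evaluate to `True` for this pair, so (C, alpha, gamma, A) is a Galois connection in either formulation.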

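Galois Connection: $c \sqsubseteq^C \gamma(\alpha(c))$
[Figure: $\alpha$ maps $c$ in C to $\alpha(c)$, the most precise (least) element in A representing $c$; $\gamma$ maps it back to $\gamma(\alpha(c))$, which lies above $c$ in C]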

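Galois Connection: $\alpha(\gamma(a)) \sqsubseteq^A a$
[Figure: $\gamma$ maps $a$ in A to $\gamma(a)$, what $a$ represents in C (its meaning); $\alpha$ maps it back to $\alpha(\gamma(a))$, which lies below $a$ in A]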

Example: lattice of equalities
• Concrete lattice: $C = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
• Abstract lattice: $EQ = \{ x{=}y \mid x, y \in Var \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
  • Treat elements of A as both formulas and sets of constraints
• Useful for copy propagation, a compiler optimization
• $\alpha(X) = ?$  $\gamma(Y) = ?$

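Example: lattice of equalities
• Concrete lattice: $C = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
• Abstract lattice: $EQ = \{ x{=}y \mid x, y \in Var \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
  • Treat elements of A as both formulas and sets of constraints
• Useful for copy propagation, a compiler optimization
• $\beta(\sigma) = \alpha(\{\sigma\}) = \{ x{=}y \mid \sigma\,x = \sigma\,y \}$, that is, $\sigma \models x{=}y$
• $\alpha(X) = \bigcap \{ \beta(\sigma) \mid \sigma \in X \} = \sqcup^A \{ \beta(\sigma) \mid \sigma \in X \}$
• $\gamma(Y) = \{ \sigma \mid \sigma \models \bigwedge Y \} = \mathrm{models}(\bigwedge Y)$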
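The definitions of β, α and γ for the equality lattice can be executed directly once the state space is made finite. This sketch assumes three variables and restricts values to {0, 1} so that states can be enumerated; that restriction, and every name in the code, is an assumption for illustration (the State set in the lecture is not finite).

```python
from itertools import product

VARS = ("x", "y", "z")
VALUES = (0, 1)   # assumed finite value range, for enumeration only

# EQ: all equalities u=v, represented as ordered pairs (u, v).
EQ = frozenset((u, v) for u in VARS for v in VARS)
STATES = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]

def beta(state):
    """Equalities satisfied by a single state."""
    return frozenset((u, v) for (u, v) in EQ if state[u] == state[v])

def alpha(states):
    """Join in A is set intersection; the bottom element is EQ."""
    result = EQ
    for s in states:
        result &= beta(s)
    return result

def gamma(constraints):
    """All states that are models of the conjunction of the constraints."""
    return [s for s in STATES
            if all(s[u] == s[v] for (u, v) in constraints)]

c = [{"x": 1, "y": 1, "z": 1}]
a = frozenset({("x", "y"), ("y", "z")})
reduced = alpha(gamma(a))
```

With these definitions `alpha(c)` contains all nine equalities, and `gamma(alpha(c))` contains every state with x, y and z equal, which includes `c`. Applying `alpha(gamma(a))` to `a = {x=y, y=z}` adds the implied equalities, giving an element that means the same as `a` but is lower (more precise) in A.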

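Galois Connection: $c \sqsubseteq^C \gamma(\alpha(c))$
[Figure: in C, $c = \{[x \mapsto 5, y \mapsto 5, z \mapsto 5]\}$. In A, $\alpha(c) = \{x{=}x, y{=}y, z{=}z, x{=}y, y{=}x, x{=}z, z{=}x, y{=}z, z{=}y\}$ is the most precise (least) element representing $c$, shown below $\{x{=}x, y{=}y, z{=}z\}$. Back in C, $\gamma(\alpha(c)) = \{\dots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \dots\}$]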

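Most precise abstract representation
$\alpha(c) = \sqcap \{ c' \mid c \sqsubseteq \gamma(c') \}$
[Figure: several abstract elements $c'$ in A whose concretizations $\gamma(c')$ lie above $c$ in C; $\alpha(c)$ is their meet]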

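Most precise abstract representation
$\alpha(c) = \sqcap \{ c' \mid c \sqsubseteq \gamma(c') \}$
[Figure: for $c = \{[x \mapsto 5, y \mapsto 5, z \mapsto 5]\}$, the abstract elements $\{x{=}y\}$, $\{x{=}y, z{=}y\}$ and $\{x{=}y, y{=}z\}$ all have concretizations above $c$; their meet is $\alpha(c) = \{x{=}x, y{=}y, z{=}z, x{=}y, y{=}x, x{=}z, z{=}x, y{=}z, z{=}y\}$]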

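Galois Connection: $\alpha(\gamma(a)) \sqsubseteq^A a$
• $\alpha \circ \gamma$ is called a semantic reduction
[Figure: in A, $a = \{x{=}y, y{=}z\}$. In C, $\gamma(a) = \{\dots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \dots\}$ is what $a$ represents (its meaning). Back in A, $\alpha(\gamma(a)) = \{x{=}x, y{=}y, z{=}z, x{=}y, y{=}x, x{=}z, z{=}x, y{=}z, z{=}y\}$]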

Partial reduction
• The operator $\alpha \circ \gamma$ is called a semantic reduction since $\alpha(\gamma(a))$ means the same as $a$ but is a reduced (more precise) version of $a$
• An operator $reduce : D^A \to D^A$ is a partial reduction if
  • $reduce(a) \sqsubseteq^A a$ and
  • $\gamma(a) = \gamma(reduce(a))$

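Galois Insertion
$\forall a : \alpha(\gamma(a)) = a$
• All elements are reduced
• How can we obtain a Galois Insertion from a Galois Connection?
[Figure: in A, only the reduced element $\{x{=}x, y{=}y, z{=}z, x{=}y, y{=}x, x{=}z, z{=}x, y{=}z, z{=}y\}$ represents the concrete set $\{\dots, [x \mapsto 6, y \mapsto 6, z \mapsto 6], [x \mapsto 5, y \mapsto 5, z \mapsto 5], [x \mapsto 4, y \mapsto 4, z \mapsto 4], \dots\}$ in C]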

Special cases

Properties of a Galois Connection
• The abstraction and concretization functions uniquely determine each other:
  • $\gamma(a) = \sqcup \{ c \mid \alpha(c) \sqsubseteq a \}$
  • $\alpha(c) = \sqcap \{ a \mid c \sqsubseteq \gamma(a) \}$

Abstracting (disjunctive) sets
• It is usually convenient to first define the abstraction of single elements: $\beta(s) = \alpha(\{s\})$
• Then lift the abstraction to sets of elements: $\alpha(X) = \sqcup^A \{ \beta(s) \mid s \in X \}$

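The case of symbolic domains
• An important class of abstract domains are symbolic domains: domains of formulas
  • $C = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
  • $A = (D^A, \sqsubseteq^A, \sqcup^A, \sqcap^A, \bot^A, \top^A)$
• If $D^A$ is a set of formulas then the abstraction of a state is defined as $\beta(\sigma) = \alpha(\{\sigma\}) = \sqcap^A \{ \varphi \mid \sigma \models \varphi \}$, the least formula from $D^A$ that $\sigma$ satisfies
• The abstraction of a set of states is $\alpha(X) = \sqcup^A \{ \beta(\sigma) \mid \sigma \in X \}$
• The concretization is $\gamma(\varphi) = \{ \sigma \mid \sigma \models \varphi \} = \mathrm{models}(\varphi)$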

Composing Galois connections

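Inducing along the connections
• Assume the complete lattices
  • $C = (D^C, \sqsubseteq^C, \sqcup^C, \sqcap^C, \bot^C, \top^C)$
  • $A = (D^A, \sqsubseteq^A, \sqcup^A, \sqcap^A, \bot^A, \top^A)$
  • $M = (D^M, \sqsubseteq^M, \sqcup^M, \sqcap^M, \bot^M, \top^M)$
• and Galois connections $GC^{C,A} = (C, \alpha^{C,A}, \gamma^{A,C}, A)$ and $GC^{A,M} = (A, \alpha^{A,M}, \gamma^{M,A}, M)$
• Lemma: both connections induce $GC^{C,M} = (C, \alpha^{C,M}, \gamma^{M,C}, M)$, defined by applying the two abstractions (respectively concretizations) in sequence:
  • $\alpha^{C,M}(c) = \alpha^{A,M}(\alpha^{C,A}(c))$
  • $\gamma^{M,C}(m) = \gamma^{A,C}(\gamma^{M,A}(m))$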
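The lemma can be checked by brute force on finite domains. In the sketch below, the three domains (sets of small integers, sets of signs, and a four-element zero/non-zero domain) and all function names are assumptions chosen for illustration.

```python
from itertools import combinations

NUMS = (-2, -1, 0, 1, 2)          # C: subsets of NUMS
SIGNS = ("-", "0", "+")           # A: subsets of SIGNS
# M: four elements, each given by its meaning in A.
M_GAMMA = {
    "BOT": frozenset(),
    "ZERO": frozenset({"0"}),
    "NONZERO": frozenset({"-", "+"}),
    "TOP": frozenset(SIGNS),
}

def sign(n):
    return "-" if n < 0 else "0" if n == 0 else "+"

def alpha_ca(c):
    return frozenset(sign(n) for n in c)

def gamma_ac(a):
    return frozenset(n for n in NUMS if sign(n) in a)

def alpha_am(a):
    # Least element of M whose meaning covers a (smallest covering set).
    candidates = [m for m in M_GAMMA if a <= M_GAMMA[m]]
    return min(candidates, key=lambda m: len(M_GAMMA[m]))

def gamma_ma(m):
    return M_GAMMA[m]

def leq_m(m1, m2):
    return M_GAMMA[m1] <= M_GAMMA[m2]

# Induced connection between C and M.
def alpha_cm(c):
    return alpha_am(alpha_ca(c))

def gamma_mc(m):
    return gamma_ac(gamma_ma(m))

CONCRETE = [frozenset(s) for r in range(len(NUMS) + 1)
            for s in combinations(NUMS, r)]
induced_is_gc = all(leq_m(alpha_cm(c), m) == (c <= gamma_mc(m))
                    for c in CONCRETE for m in M_GAMMA)
```

`induced_is_gc` confirms the adjunction property for the composed pair on every concrete set and every element of M.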

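Inducing along the connections
[Figure: $c$ in C is mapped by $\alpha^{C,A}$ to $\alpha^{C,A}(c)$ in A, and then by $\alpha^{A,M}$ to $a' = \alpha^{A,M}(\alpha^{C,A}(c))$ in M; $\gamma^{M,A}$ and $\gamma^{A,C}$ map $a'$ back to $c'$ in C]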

Relating abstract transformers to concrete transformers

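Sound abstract transformer
• Given two lattices
  • $C = (D^C, \sqsubseteq^C, \sqcup^C, \sqcap^C, \bot^C, \top^C)$
  • $A = (D^A, \sqsubseteq^A, \sqcup^A, \sqcap^A, \bot^A, \top^A)$
• and $GC^{C,A} = (C, \alpha, \gamma, A)$ with
  • a concrete transformer $f : D^C \to D^C$
  • an abstract transformer $f^\# : D^A \to D^A$
• We say that $f^\#$ is a sound transformer (w.r.t. $f$) if
  • $\forall c : f(c) = c' \Rightarrow \alpha(c') \sqsubseteq^A f^\#(\alpha(c))$
  • For every $a$: $\alpha(f(\gamma(a))) \sqsubseteq^A f^\#(a)$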

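Transformer soundness condition 1
$\forall c : f(c) = c' \Rightarrow \alpha(c') \sqsubseteq^A f^\#(\alpha(c))$
[Figure: starting from $c$ in C, applying $f$ and then $\alpha$ yields an element of A that lies below the result of applying $\alpha$ and then $f^\#$]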

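Transformer soundness condition 2
$\forall a : f^\#(a) = a' \Rightarrow f(\gamma(a)) \sqsubseteq^C \gamma(a')$
[Figure: starting from $a$ in A, applying $\gamma$ and then $f$ yields an element of C that lies below the result of applying $f^\#$ and then $\gamma$]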

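Best (induced) transformer
$f^\#(a) = \alpha(f(\gamma(a)))$
• Problem: $\gamma$ is not directly computable
[Figure: $a$ in A is mapped by $\gamma$ into C, transformed by $f$, and mapped back by $\alpha$ to $f^\#(a)$]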

Best abstract transformer [CC'77]
• Best in terms of precision
  • Most precise abstract transformer
  • May be too expensive to compute
• Constructively defined as $f^\# = \alpha \circ f \circ \gamma$
  • Induced by the GC
• Not directly computable because the first step is concretization
• We often compromise for a "good enough" transformer
  • Useful tool: partial concretization
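When the concrete domain is small enough to enumerate, the induced transformer α ∘ f ∘ γ can be computed literally and compared against hand-written candidates. The sketch below assumes a five-element sign domain over the integers -2..2 and the concrete transformer "negate x"; the domain and the names `best_negate`, `coarse_negate`, and `is_sound` are hypothetical.

```python
NUMS = (-2, -1, 0, 1, 2)
# Abstract sign domain, each element given by its concretization.
GAMMA = {
    "BOT": frozenset(),
    "NEG": frozenset({-2, -1}),
    "ZERO": frozenset({0}),
    "POS": frozenset({1, 2}),
    "TOP": frozenset(NUMS),
}

def leq(a, b):
    return GAMMA[a] <= GAMMA[b]

def gamma(a):
    return GAMMA[a]

def alpha(c):
    # Least abstract element whose concretization covers c.
    candidates = [a for a in GAMMA if c <= GAMMA[a]]
    return min(candidates, key=lambda a: len(GAMMA[a]))

def negate(c):
    """Concrete transformer for x := -x, lifted to sets."""
    return frozenset(-n for n in c)

def best_negate(a):
    """Induced transformer: alpha after f after gamma."""
    return alpha(negate(gamma(a)))

def coarse_negate(a):
    """A cheaper candidate that gives up all information."""
    return "BOT" if a == "BOT" else "TOP"

def is_sound(f_abs, f):
    # Soundness: alpha(f(gamma(a))) is below f_abs(a) for every a.
    return all(leq(alpha(f(gamma(a))), f_abs(a)) for a in GAMMA)

identity_is_sound = is_sound(lambda a: a, negate)
```

Both `best_negate` and `coarse_negate` pass the soundness check, but only the former keeps sign information; the identity function fails it because negating a negative set does not stay negative.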

Developing a sound abstract transformer by example

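Transformer example
• $C = (2^{State}, \subseteq, \cup, \cap, \emptyset, State)$
• $EQ = \{ x{=}y \mid x, y \in Var \}$, $A = (2^{EQ}, \supseteq, \cap, \cup, EQ, \emptyset)$
• $\beta(\sigma) = \alpha(\{\sigma\}) = \{ x{=}y \mid \sigma\,x = \sigma\,y \}$, that is, $\sigma \models x{=}y$
• $\alpha(S) = \bigcap \{ \beta(\sigma) \mid \sigma \in S \} = \sqcup^A \{ \beta(\sigma) \mid \sigma \in S \}$
• $\gamma(\varphi) = \{ \sigma \mid \sigma \models \varphi \} = \mathrm{models}(\varphi)$
• Concrete: $[\![x := y]\!]\ S = \{ \sigma[x \mapsto \sigma\,y] \mid \sigma \in S \}$
• Abstract: $[\![x := y]\!]^\#\ S = ?$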

Developing a transformer for EQ - 1
• Input has the form $S = \bigwedge \{a{=}b\}$
• $sp(x := expr, \varphi) = \exists v.\ x = expr[v/x] \wedge \varphi[v/x]$
• $sp(x := y, S) = \exists v.\ x = y[v/x] \wedge \bigwedge S[v/x] = \dots$
• Let's define helper notations:
  • $Mod(x := y, S) = \{ x{=}a,\ b{=}x \in S \}$
    • Subset of equalities containing x (will be modified)
  • $Frame(x := y, S) = S \setminus Mod(x := y, S)$
    • Subset of equalities not containing x (i.e., the frame)

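Developing a transformer for EQ - 2
• $sp(x := y, S) = \exists v.\ x = y[v/x] \wedge \bigwedge \{a{=}b\}[v/x] = \dots$
• Two cases
  • x is y: $sp(x := x, S) = S$
  • x is different from y:
    $sp(x := y, S) = \exists v.\ x{=}y \wedge \bigwedge Mod(x := y, S)[v/x] \wedge \bigwedge Frame(x := y, S)[v/x]$
    $= x{=}y \wedge \bigwedge Frame(x := y, S) \wedge \exists v.\ \bigwedge Mod(x := y, S)[v/x]$
    $\Rightarrow x{=}y \wedge \bigwedge Frame(x := y, S)$
• Vanilla transformer: $[\![x := y]\!]^{\#1}\ S = x{=}y \wedge \bigwedge Frame(x := y, S)$
• Example: $[\![x := y]\!]^{\#1} \bigwedge\{x{=}p, q{=}x, m{=}n\} = \bigwedge\{x{=}y, m{=}n\}$. Is this the most precise result?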

Developing a transformer for EQ - 3
• $[\![x := y]\!]^{\#1} \bigwedge\{x{=}p, x{=}q, m{=}n\} = \bigwedge\{x{=}y, m{=}n\} \sqsupseteq \bigwedge\{x{=}y, m{=}n, p{=}q\}$
• Where does the information $p{=}q$ come from?
• $sp(x := y, S) = x{=}y \wedge \bigwedge Frame(x := y, S) \wedge \exists v.\ \bigwedge Mod(x := y, S)[v/x]$
• $\exists v.\ \bigwedge Mod(x := y, S)[v/x]$ holds possible equalities between different a's and b's. How can we account for that?
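The gap between the vanilla transformer and the best transformer can be observed by enumerating states. This sketch makes several assumptions for the sake of a runnable example: six variables, values restricted to {0, 1}, equalities represented as unordered pairs with reflexive ones omitted, and the function names `vanilla` and `best`.

```python
from itertools import product

VARS = ("x", "y", "p", "q", "m", "n")
VALUES = (0, 1)   # assumed finite value range, for enumeration only

def eq(a, b):
    """The equality a=b as an unordered pair."""
    return frozenset((a, b))

STATES = [dict(zip(VARS, vals)) for vals in product(VALUES, repeat=len(VARS))]
ALL_EQS = {eq(a, b) for a in VARS for b in VARS if a < b}

def holds(state, e):
    return len({state[v] for v in e}) == 1

def gamma(s):
    return [st for st in STATES if all(holds(st, e) for e in s)]

def alpha(states):
    return frozenset(e for e in ALL_EQS if all(holds(st, e) for st in states))

def assign(x, y, states):
    """Concrete semantics of x := y on a collection of states."""
    return [{**st, x: st[y]} for st in states]

def vanilla(x, y, s):
    """x=y together with Frame(x := y, S): drops every equality mentioning x."""
    if x == y:
        return frozenset(s)
    frame = {e for e in s if x not in e}
    return frozenset(frame | {eq(x, y)})

def best(x, y, s):
    """Induced transformer: alpha of the concrete assignment of gamma(s)."""
    return alpha(assign(x, y, gamma(s)))

S = frozenset({eq("x", "p"), eq("x", "q"), eq("m", "n")})
vanilla_result = vanilla("x", "y", S)
best_result = best("x", "y", S)
```

For this input the vanilla transformer yields {x=y, m=n}, while the enumerated best transformer also retains p=q, the equality that was implied by x=p and x=q before x was overwritten.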
