Introduction to Abstract Interpretation

Neil Kettle, Andy King and Axel Simon
a.m.king@kent.ac.uk
http://www.cs.kent.ac.uk/~amk
Acknowledgments: much of this material has been adapted from surveys by Patrick and Radia Cousot

Applications of abstract interpretation
• Verification: can a concurrent program deadlock? Is termination assured?
• Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function?
• Transformation: can a definition be unfolded? Will unfolding terminate?
• Implementation: can an operation be specialised with knowledge of its (global) calling context?
• Applications and “players” are incredibly diverse

House-keeping

Computing Lab Xmas Party
• Located in Origins, the “restaurant” in Darwin
• A buffet lunch will be served, courtesy of the department
• The department will supply some wine (which last year lasted 10 minutes)
• The bar will be open afterwards if some wine is not enough wine
• Send an e-mail to Deborah Sowrey [D.J.Sowery@kent.ac.uk] if you want to attend
• Come along and meet other post-grads

Casting out nines algorithm
• Which of the following multiplications is correct:
  • 2173 × 38 = 81574 or
  • 2173 × 38 = 82574
• Casting out nines is a checking technique that is really a form of abstract interpretation:
  • Sum the digits in the multiplicand n1, multiplier n2 and the product n to obtain s1, s2 and s.
  • Divide s1, s2 and s by 9 to compute the remainder, that is, r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9.
  • If (r1 × r2) mod 9 ≠ r then the multiplication is incorrect
• The algorithm returns “incorrect” or “don’t know”
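
A minimal Python sketch of the check follows; the names digit_sum and cast_out_nines are illustrative and not from the slides. Like the algorithm above, it can only answer “incorrect” or “don’t know”.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n; answers "incorrect" or "don't know"."""
    r1 = digit_sum(n1) % 9  # abstraction of the multiplicand
    r2 = digit_sum(n2) % 9  # abstraction of the multiplier
    r = digit_sum(n) % 9    # abstraction of the claimed product
    # Abstract multiplication: a disagreement with r refutes the claim.
    return "incorrect" if (r1 * r2) % 9 != r else "don't know"


print(cast_out_nines(2173, 38, 81574))  # incorrect
print(cast_out_nines(2173, 38, 82574))  # don't know
```

Note that “don’t know” for the second product is not a proof that it is correct; the check only ever refutes.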

Running the numbers for 2173 × 38 = 81574
• Compute r1 = (2+1+7+3) mod 9 = …
• Compute r2 = (3+8) mod 9 = …
• Calculate (r1 × r2) mod 9 = …
• Calculate r = (8+1+5+7+4) mod 9 = …
• Check ((r1 × r2) mod 9 = r) = …
• Deduce that 2173 × 38 = 81574 is …

Abstract interpretation is a theory of relationships
• The computational domain for multiplication (concrete domain):
  • N, the set of non-negative integers
• The computational domain of remainders used in the checking algorithm (abstract domain):
  • R = {0, 1, …, 8}
• The key question is: what is the relationship between an element n ∈ N, which is used in the real algorithm, and its analogue r ∈ R in the check?

What is the relationship?
• When the multiplicand is n1 = 456, say, the check uses r1 = (4+5+6) mod 9 = 6
• Observe that
  456 mod 9
  = (4*100 + 56) mod 9
  = (4*90 + 4*10 + 56) mod 9
  = (4*10 + 56) mod 9
  = ((4 + 5)*10 + 6) mod 9
  = ((4 + 5)*9 + (4 + 5) + 6) mod 9
  = (4 + 5 + 6) mod 9
• More generally, induction can show r1 = n1 mod 9 and r2 = n2 mod 9
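
The general claim, that the digit sum taken mod 9 equals n mod 9, can be spot-checked mechanically. This is a sanity test over a finite range, not the inductive proof; digit_sum is a hypothetical helper.

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(d) for d in str(n))


# The worked example: both sides give 6.
assert digit_sum(456) % 9 == 456 % 9 == 6

# Spot-check r = n mod 9 for every n below 100,000.
mismatches = [n for n in range(100_000) if digit_sum(n) % 9 != n % 9]
print(mismatches)  # []
```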

Correctness is the preservation of relationships
• The check simulates the concrete multiplication and, in effect, is an abstract multiplication
• Concrete multiplication is n = n1 × n2
• Abstract multiplication is r = (r1 × r2) mod 9
• Where r1 describes n1 and r2 describes n2
• For brevity, write r ∝ n iff r = n mod 9
• Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n

Correctness argument
• Suppose r1 ∝ n1 and r2 ∝ n2
• If n = n1 × n2 then
  n mod 9 = (n1 × n2) mod 9, hence
  n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9, whence
  n mod 9 = (r1 × r2) mod 9 = r, therefore
  r ∝ n
• Consequently if ¬(r ∝ n) then n ≠ n1 × n2
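
The preservation property can also be tested exhaustively over a small range. The sketch below uses alpha and abstract_mul as illustrative names for the abstraction n ↦ n mod 9 and for abstract multiplication; passing the test does not replace the argument above.

```python
def alpha(n: int) -> int:
    """The remainder r with r ∝ n, that is, r = n mod 9."""
    return n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on R = {0, ..., 8}."""
    return (r1 * r2) % 9


# Preservation: r1 ∝ n1 and r2 ∝ n2 imply abstract_mul(r1, r2) ∝ n1 * n2.
violations = [
    (n1, n2)
    for n1 in range(300)
    for n2 in range(300)
    if abstract_mul(alpha(n1), alpha(n2)) != alpha(n1 * n2)
]
assert violations == []
```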

Summary
• Formalise the relationship between the data
• Check that the relationship is preserved by the abstract analogues of the concrete operations
• The relational framework [Acta Informatica, 30(2):103-129, 1993] not only emphasises the theory of relations but is also very general

Numeric approximation and widening
Abstract interpretation does not require a domain to be finite

Interval approximation
• Consider the following Pascal-like program
• SYNTOX [PLDI’90] inferred the invariants scoped within {…}
• Invariants occur between consecutive lines in the program
• i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0

    begin
      i := 0;            {1: i ∈ [0,0]}
      while (i < 16) do  {2: i ∈ [0,15]}
        i := i + 1       {3: i ∈ [1,16]}
    end                  {4: i ∈ [16,16]}

Compilation versus (classic) interpretation
• Abstract compilation: compile the concrete program into an abstract program (equation system) and execute the abstract program:
  • good separation of concerns that aids debugging
  • the particulars of the domain can be exploited to reorder operations, specialise operations, etc.
• Abstract interpretation: run the concrete program but interpret its concrete operations on-the-fly as abstract operations:
  • ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins

Abstract domain that is used in interval analysis
• The domain of intervals includes:
  • [l,u] where l ≤ u and l,u ∈ Z for bounded sets, i.e. [0,5] ∝ {0,1,4} since {0,1,4} ⊆ [0,5]
  • ⊥ to represent the empty set of numbers, that is, ⊥ ∝ ∅
  • [l,∞] for sets which are bounded below such as {l, l+2, l+4, …}
  • [-∞,u] to represent sets which are bounded above such as {…, u-5, u-3, u}

Weakening intervals

    if … then …  {1: i ∈ [0,2]}
    else …       {2: i ∈ [3,5]}
    endif        {3: i ∈ [0,5]}

Join (path merge) is defined by putting
• d1 ⊔ d2 = d1 if d2 = ⊥
• d1 ⊔ d2 = d2 else if d1 = ⊥
• d1 ⊔ d2 = [min(l1,l2), max(u1,u2)] otherwise
• whenever d1 = [l1,u1] and d2 = [l2,u2]
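
A sketch of the interval join in Python, assuming an interval is a pair (l, u) with math.inf standing for ∞ and None standing for ⊥ (these representation choices belong to the sketch, not the slides).

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None plays the role of ⊥


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


print(join((0, 2), (3, 5)))          # (0, 5), as at program point (3)
print(join((0, 2), (5, math.inf)))   # (0, inf)
```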

Strengthening intervals
Meet is defined by putting
• d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥), or if max(l1,l2) > min(u1,u2)
• d1 ⊓ d2 = [max(l1,l2), min(u1,u2)] otherwise
• whenever d1 = [l1,u1] and d2 = [l2,u2]

    {3: i ∈ [0,5]}
    if (2 < i) then  {4: i ∈ [3,5]} …
    else             {5: i ∈ [0,2]} …
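
The meet can be sketched under the same assumed representation (pairs with math.inf, None for ⊥). The guard lo <= hi returns ⊥ when the two intervals do not overlap, so the result is always a well-formed interval.

```python
import math
from typing import Optional, Tuple

Interval = Optional[Tuple[float, float]]  # None plays the role of ⊥


def meet(d1: Interval, d2: Interval) -> Interval:
    """Intersection of two intervals; empty intersections become ⊥."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    lo, hi = max(l1, l2), min(u1, u2)
    return (lo, hi) if lo <= hi else None


# On integers, 2 < i means i ∈ [3, ∞] and its negation means i ∈ [-∞, 2].
print(meet((0, 5), (3, math.inf)))   # (3, 5), point (4)
print(meet((0, 5), (-math.inf, 2)))  # (0, 2), point (5)
```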

Meet and join are the basic primitives for compilation
• I1 = [0,0] since program point (1) immediately follows the i := 0
• I2 = (I1 ⊔ I3) ⊓ [-∞,15] since:
  • control from program points (1) and (3) flows into (2)
  • point (2) is reached only if i < 16 holds
• I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the increment
• I4 = (I1 ⊔ I3) ⊓ [16,∞] since:
  • control from (1) and (3) flows into (4)
  • point (4) is reached only if ¬(i < 16) holds
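
To see the equations at work, the following sketch evaluates them in order, each line reusing the values computed just before it, until nothing changes, starting from ⊥ everywhere. It assumes the tuple/None representation of intervals; join, meet and shift are illustrative helper names. The loop terminates here because the increment is cut off by the guard i < 16.

```python
import math


def join(a, b):
    """Interval join; None stands for ⊥."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Interval meet; an empty intersection collapses to ⊥."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a, k):
    """Abstract i := i + k, that is {n + k | n ∈ a}."""
    return None if a is None else (a[0] + k, a[1] + k)


def solve():
    I1 = I2 = I3 = I4 = None  # start from ⊥ at every program point
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-math.inf, 15))
        I3 = shift(I2, 1)
        I4 = meet(join(I1, I3), (16, math.inf))
        if (I1, I2, I3, I4) == old:
            return old


print(solve())  # ((0, 0), (0, 15), (1, 16), (16, 16))
```

The result agrees with the invariants that SYNTOX reported for the increment program.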

Interval iteration

Jacobi versus Gauss-Seidel iteration
• With Jacobi, the new vector ⟨I1’,I2’,I3’,I4’⟩ of intervals is calculated from the old ⟨I1,I2,I3,I4⟩
• With Gauss-Seidel iteration:
  • I1’ is calculated from I1,I2,I3,I4
  • I2’ is calculated from I1’,I2,I3,I4
  • I3’ is calculated from I1’,I2’,I3,I4
  • I4’ is calculated from I1’,I2’,I3’,I4
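
The difference between the two schemes can be observed on the same four equations. This sketch reuses the assumed tuple/None interval representation, counts the sweeps each scheme needs (including the final sweep that detects stability), and prints the counts rather than claiming particular values.

```python
import math


def join(a, b):
    """Interval join; None stands for ⊥."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Interval meet; an empty intersection collapses to ⊥."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a, k):
    """Abstract i := i + k."""
    return None if a is None else (a[0] + k, a[1] + k)


# Right-hand sides of the equations, each reading the vector I = [I1, I2, I3, I4].
RHS = [
    lambda I: (0, 0),
    lambda I: meet(join(I[0], I[2]), (-math.inf, 15)),
    lambda I: shift(I[1], 1),
    lambda I: meet(join(I[0], I[2]), (16, math.inf)),
]


def jacobi():
    I, sweeps = [None] * 4, 0
    while True:
        sweeps += 1
        new = [f(I) for f in RHS]  # every entry is computed from the old vector
        if new == I:
            return I, sweeps
        I = new


def gauss_seidel():
    I, sweeps = [None] * 4, 0
    while True:
        sweeps += 1
        changed = False
        for k, f in enumerate(RHS):
            v = f(I)  # sees the entries already updated in this sweep
            if v != I[k]:
                I[k], changed = v, True
        if not changed:
            return I, sweeps


(fix_j, n_j), (fix_g, n_g) = jacobi(), gauss_seidel()
assert fix_j == fix_g  # both schemes reach the same fixpoint
print("Jacobi sweeps:", n_j, "Gauss-Seidel sweeps:", n_g)
```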

Gauss-Seidel versus chaotic iteration
• Observe that I4 might change if either I1 or I3 changes, hence evaluate I4 after I1 and I3 stabilise
• This suggests waiting until stability is achieved at one level before starting on the next
[Figure: dependency graph of I1, I2, I3 and I4, grouped into the components {I1}, {I2, I3} and {I4}]

Gauss-Seidel versus chaotic iteration
• Chaotic iteration can postpone evaluating Ii for a bounded number of iterations:
  • I1’ is calculated from I1,-,-,-
  • I2’ and I3’ are calculated Gauss-Seidel style from I1,I2,I3,-
  • I4’ is calculated from I1’,I2’,I3’,I4
• Fast and (incremental) fixpoint solvers [TOPLAS 22(2):187-223, 2000] apply chaotic iteration
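
A component-by-component schedule in the spirit of the two slides above can be sketched as follows: solve {I1}, then iterate {I2, I3} until it is stable, then evaluate {I4} exactly once. This is a simplified illustration, not the solver from the cited TOPLAS paper; the helper names are hypothetical and intervals use the assumed tuple/None representation.

```python
import math


def join(a, b):
    """Interval join; None stands for ⊥."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Interval meet; an empty intersection collapses to ⊥."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a, k):
    """Abstract i := i + k."""
    return None if a is None else (a[0] + k, a[1] + k)


def solve_by_components():
    # Component {I1}: depends on nothing, so one evaluation suffices.
    I1 = (0, 0)
    # Component {I2, I3}: iterate until the pair stabilises.
    I2 = I3 = None
    while True:
        new_I2 = meet(join(I1, I3), (-math.inf, 15))
        new_I3 = shift(new_I2, 1)
        if (new_I2, new_I3) == (I2, I3):
            break
        I2, I3 = new_I2, new_I3
    # Component {I4}: its inputs I1 and I3 are now stable, so evaluate it once.
    I4 = meet(join(I1, I3), (16, math.inf))
    return I1, I2, I3, I4


print(solve_by_components())  # ((0, 0), (0, 15), (1, 16), (16, 16))
```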

Research challenge
• Compiling to equations and iteration is well-understood (albeit not well-known)
• The implicit assumption is that the source is available
• With the advent of component and multi-linguistic programming, the problem is how to generate the equations from:
  • a specification of the algorithm or the API;
  • the types of the algorithm or component
• In the interim, environments with support for modularity either:
  • equip the programmer with an equation language
  • or make worst-case assumptions about behaviour

Suppose i was decremented rather than incremented

    begin
      i := 0;            {1: i ∈ [0,0]}
      while (i < 16) do  {2: i ∈ [-∞,0]}
        i := i - 1       {3: i ∈ [-∞,-1]}
    end                  {4: i ∈ ⊥}

• I1 = [0,0]
• I2 = (I1 ⊔ I3) ⊓ [-∞,15]
• I3 = {n-1 | n ∈ I2}
• I4 = (I1 ⊔ I3) ⊓ [16,∞]

Ascending chain condition
• A domain D is ACC iff it does not contain an infinite strictly increasing chain d1 < d2 < d3 < … where d < d’ iff d ⊑ d’ and d ≠ d’ (see below)
• The interval domain D is ordered by:
  • ⊥ ⊑ d for all d ∈ D and
  • [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2
  and is not ACC since [0,0] < [-1,0] < [-2,0] < …
[Figure: Hasse diagram of the interval domain, from ⊥ through the singletons …, -4, -3, -2, -1, 0, 1, 2, 3, 4, … up to ⊤]
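
The failure of ACC can be seen concretely: under the ordering above, each interval in [0,0], [-1,0], [-2,0], … is strictly below the next. The sketch assumes pairs for intervals and None for ⊥; leq and lt are illustrative names.

```python
def leq(d1, d2):
    """⊥ (None) is least, and [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2."""
    if d1 is None:
        return True
    if d2 is None:
        return False
    (l1, u1), (l2, u2) = d1, d2
    return l2 <= l1 <= u1 <= u2


def lt(d1, d2):
    """Strict order: d < d' iff d ⊑ d' and d ≠ d'."""
    return leq(d1, d2) and d1 != d2


# A finite prefix of the chain [0,0] < [-1,0] < [-2,0] < ...
chain = [(-k, 0) for k in range(1000)]
assert all(lt(a, b) for a, b in zip(chain, chain[1:]))
# Every prefix extends by one more strict step, so the chain never has to stop.
assert lt(chain[-1], (chain[-1][0] - 1, 0))
```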

Some very expressive relational domains are ACC
• Common sub-expression elimination relies on detecting duplicated expression evaluation
• Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariant such as y = x/2 - 7 was key to this optimisation

    begin
      x := sin(a) * 2;
      y := sin(a) - 7;
    end

The affine domain
• The domain of affine equations over n variables is:
  • D = {⟨A,B⟩ | A is an m × n matrix and B is an m-dimensional column vector}
• D is ordered by:
  • ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)

Pre-orders versus posets
• A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that:
  • d ⊑ d for all d ∈ D
  • if d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3
• A poset is a pre-order ⟨D, ⊑⟩ such that:
  • if d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2

The affine domain is a pre-order (so it is not ACC)
• Observe that ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ and ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩, yet ⟨A1,B1⟩ ≠ ⟨A2,B2⟩, for A1 = …, B1 = …, A2 = …, B2 = …
• To build a poset from a pre-order:
  • define d ≡ d’ iff d ⊑ d’ and d’ ⊑ d
  • define [d] = {d’ ∈ D | d ≡ d’} and D≡ = {[d] | d ∈ D}
  • define [d] ⊑ [d’] iff d ⊑ d’
• The poset ⟨D≡, ⊑⟩ is ACC since chain length is bounded by the number of variables n
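
The quotient construction is generic. Rather than encode affine equations, the sketch below uses a hypothetical toy pre-order on strings (s ⊑ t iff every character of s occurs in t) to show the pre-order laws, the failure of antisymmetry, and the resulting equivalence classes [d].

```python
def leq(s: str, t: str) -> bool:
    """Toy pre-order (hypothetical): s ⊑ t iff every character of s occurs in t."""
    return set(s) <= set(t)


D = ["", "a", "b", "ab", "ba", "aab"]

# Reflexive and transitive, hence a pre-order ...
assert all(leq(d, d) for d in D)
assert all(leq(x, z) for x in D for y in D for z in D if leq(x, y) and leq(y, z))
# ... but not antisymmetric, hence not a poset.
assert leq("ab", "ba") and leq("ba", "ab") and "ab" != "ba"


def equiv(d: str, e: str) -> bool:
    """d ≡ e iff d ⊑ e and e ⊑ d."""
    return leq(d, e) and leq(e, d)


def cls(d: str) -> frozenset:
    """The equivalence class [d] = {e ∈ D | d ≡ e}."""
    return frozenset(e for e in D if equiv(d, e))


quotient = {cls(d) for d in D}
# [d] ⊑ [e] iff d ⊑ e is well defined: it does not depend on the representatives.
assert all(leq(d, e) == leq(d2, e2)
           for d in D for e in D for d2 in cls(d) for e2 in cls(e))
print(len(quotient))  # 4 classes: {""}, {"a"}, {"b"}, {"ab", "ba", "aab"}
```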

Inducing termination for non-ACC (and huge ACC) domains
• Enforce convergence for intervals with a widening operator ∇ : D × D → D
  • ⊥ ∇ d = d
  • d ∇ ⊥ = d
  • [l1,u1] ∇ [l2,u2] = [if l2 < l1 then -∞ else l1, if u1 < u2 then ∞ else u1]
• Examples
  • [1,2] ∇ [1,2] = [1,2]
  • [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3]
• Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
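
The widening operator translates directly into Python, assuming the same pair/None representation with math.inf for ∞. The assertions replay the examples above and test the safety condition on a handful of sample intervals.

```python
import math


def widen(d1, d2):
    """Interval widening d1 ∇ d2; None stands for ⊥."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lo = -math.inf if l2 < l1 else l1  # an unstable lower bound jumps to -∞
    hi = math.inf if u1 < u2 else u1   # an unstable upper bound jumps to ∞
    return (lo, hi)


def leq(d1, d2):
    """[l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 and u1 ≤ u2; ⊥ is below everything."""
    if d1 is None:
        return True
    if d2 is None:
        return False
    return d2[0] <= d1[0] and d1[1] <= d2[1]


# The examples above.
assert widen((1, 2), (1, 2)) == (1, 2)
assert widen((1, 2), (1, 3)) == (1, math.inf)
assert widen((1, 3), (1, 2)) == (1, 3)

# Safety: both arguments lie below the widened result.
samples = [None, (0, 0), (1, 2), (1, 3), (-5, 7), (2, 2)]
assert all(leq(a, widen(a, b)) and leq(b, widen(a, b))
           for a in samples for b in samples)
```

Note the asymmetry: widening is not commutative, which is why the slide stresses the order of its arguments.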

Chaotic iteration with widening
• To terminate it is necessary to traverse each loop a finite number of times
• It is sufficient to pass through I2 or I3 a finite number of times [Bourdoncle, 1990]
• Thus widen at I3 since it is simpler
[Figure: dependency graph of I1, I2, I3 and I4, with the loop passing through I2 and I3]

Termination for the decrement
• I1 = [0,0]
• I2 = (I1 ⊔ I3) ⊓ [-∞,15]
• I3 = I3 ∇ {n-1 | n ∈ I2}   (note the fix)
• I4 = (I1 ⊔ I3) ⊓ [16,∞]
• When I2 = [-1,0] and I3 = [-1,0], then I3 ∇ {n-1 | n ∈ I2} = [-1,0] ∇ [-2,-1] = [-∞,0]
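
The fixed equations can be run to confirm termination. In this sketch, assuming tuple intervals, math.inf for ∞ and None for ⊥, widening is applied only at I3; without it the lower bounds of I2 and I3 would decrease forever. The helper names are illustrative.

```python
import math


def join(a, b):
    """Interval join; None stands for ⊥."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def meet(a, b):
    """Interval meet; an empty intersection collapses to ⊥."""
    if a is None or b is None:
        return None
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def shift(a, k):
    """Abstract i := i + k; here k = -1 for the decrement."""
    return None if a is None else (a[0] + k, a[1] + k)


def widen(d1, d2):
    """Interval widening d1 ∇ d2."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    return (-math.inf if l2 < l1 else l1, math.inf if u1 < u2 else u1)


def solve_decrement():
    I1 = I2 = I3 = I4 = None
    while True:
        old = (I1, I2, I3, I4)
        I1 = (0, 0)
        I2 = meet(join(I1, I3), (-math.inf, 15))
        I3 = widen(I3, shift(I2, -1))  # the only widening point
        I4 = meet(join(I1, I3), (16, math.inf))
        if (I1, I2, I3, I4) == old:
            return old


print(solve_decrement())  # ((0, 0), (-inf, 0), (-inf, -1), None)
```

The result reproduces the invariants of the decrement program, including ⊥ at point (4): the loop exit is unreachable.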

Widening dynamic data-structures

    begin
      i := 0;
      p := nil;
      while (i < 16) do
        i := i + 1;
        p := new cons(i, p);  {1: p ∈ cons(i, …cons(0,nil))}
    end

[Figure: successive abstractions of p as term trees built from cons, or, 0, 1, 2 and nil nodes]

Depth-2 versus type-graph widening
[Figure: depth-2 abstraction of the list (cons, or, 0, 1, 2 and nil nodes) beside its type-graph abstraction (with any nodes)]
• Type-graph widening is more compact
• Type-graph widening becomes difficult when a list contains lists as its elements
• In constraint-based analysis, widening is dispensed with altogether

(Malicious) research challenge
• Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of O(2ⁿ)
• Construct a program with O(n) symbols that iterates through all O(2ⁿ) abstractions
• Publish the program in IPL

Not all numeric domains are convex
• A set S ⊆ Rⁿ is convex iff for all x,y ∈ S it follows that {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S
• The 2 leftmost sets in R² are convex but the 2 rightmost sets are not.

Are intervals or affine equations convex?
• Suppose the values of n variables are represented by n intervals [l1,u1],…,[ln,un]
• Suppose x = ⟨x1,…,xn⟩, y = ⟨y1,…,yn⟩ ∈ Rⁿ are described by the intervals
• Then each li ≤ xi ≤ ui and each li ≤ yi ≤ ui
• Let 0 ≤ λ ≤ 1 and observe z = λx + (1-λ)y = ⟨λx1 + (1-λ)y1, …, λxn + (1-λ)yn⟩
• Therefore li ≤ min(xi, yi) ≤ λxi + (1-λ)yi ≤ max(xi, yi) ≤ ui and convexity follows

Arithmetic congruences are not convex
• Elements of the arithmetic congruence (AC) domain take the form x - 2y = 1 (mod 3), which describes integral values of x and y
• More exactly, the AC domain consists of conjunctions of equations of the form c1x1+…+cmxm = (c mod n) where ci, c ∈ Z and n ∈ N
• Incredibly AC is ACC [IJCM, 30, 165--190, 1989]
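
A concrete witness of non-convexity: (1,0) and (4,0) both satisfy x - 2y = 1 (mod 3), yet the point at λ = 2/3 on the segment between them, (2,0), does not. The sketch uses exact fractions so the convex combination is computed without rounding; in_ac is a hypothetical helper.

```python
from fractions import Fraction


def in_ac(x, y) -> bool:
    """Membership in {(x, y) ∈ Z² | x - 2y = 1 (mod 3)}."""
    if x != int(x) or y != int(y):
        return False  # the congruence only describes integral values
    return (int(x) - 2 * int(y)) % 3 == 1


p, q = (1, 0), (4, 0)
assert in_ac(*p) and in_ac(*q)

lam = Fraction(2, 3)
z = tuple(lam * a + (1 - lam) * b for a, b in zip(p, q))  # a point on the segment
assert z == (2, 0)
assert not in_ac(*z)  # 2 - 0 = 2, which is not 1 (mod 3)
```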

Research challenge
• Søndergaard [FSTTCS, 95] introduced the concept of an immediate fixpoint
• Consider the following (groundness) dependency equations over the domain of Boolean functions ⟨Bool, ⊨, ∨⟩
  • f1 = x ↔ (y ∧ z)
  • f2 = ∃t(∃x(∃z(u ↔ (t ∧ x) ∧ v ↔ (t ∧ z) ∧ f4)))
  • f3 = ∃u(∃v(x ↔ u ∧ z ↔ v ∧ f2))
  • f4 = f1 ∨ f3
• Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false], thus ∃x(x ∨ y) = true and ∃x(x ∧ y) = y
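
The existential quantifier can be checked by brute force if a Boolean function is represented as a Python predicate over a truth assignment (a representation assumed here, not the one used in the cited work). The assertions confirm both identities stated above.

```python
from itertools import product
from typing import Callable, Dict, List

Env = Dict[str, bool]
BoolFn = Callable[[Env], bool]


def exists(x: str, f: BoolFn) -> BoolFn:
    """∃x(f) = f[x ↦ true] ∨ f[x ↦ false]."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f: BoolFn, g: BoolFn, variables: List[str]) -> bool:
    """Compare two Boolean functions on every truth assignment."""
    return all(
        f(dict(zip(variables, bits))) == g(dict(zip(variables, bits)))
        for bits in product([False, True], repeat=len(variables))
    )


def x_or_y(env: Env) -> bool:
    return env["x"] or env["y"]


def x_and_y(env: Env) -> bool:
    return env["x"] and env["y"]


assert equivalent(exists("x", x_or_y), lambda env: True, ["x", "y"])
assert equivalent(exists("x", x_and_y), lambda env: env["y"], ["x", "y"])
```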

The alternative tactic
• The standard tactic is to apply iteration:
• Søndergaard found that the system can be solved symbolically (like a quadratic)
• This would be very useful for infinite domains, for improved precision and predictability

Combining analyses
• Verifiers and optimisers are often multi-pass, built from several separate analyses
• Should the analyses be performed in parallel or in sequence?
• Analyses can interact to improve one another (the problem is the complexity of the interaction [Pratt])

Pruning combined domains
• Suppose that ∝1 ⊆ D1 × C and ∝2 ⊆ D2 × C; then how is D = D1 × D2 interpreted?
• Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c
• Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that is, ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c
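
Redundancy can be illustrated with a hypothetical product of intervals and parity (a pairing chosen for illustration; the slide does not fix D1 or D2). The pair ⟨[1,1], even⟩ describes no integer, so it can be pruned. A finite range stands in for C, so this is only a bounded check.

```python
def interval_describes(d, c):
    """d ∝1 c for an interval d = (l, u)."""
    lo, hi = d
    return lo <= c <= hi


def parity_describes(p, c):
    """p ∝2 c for p in {"even", "odd"}."""
    return c % 2 == (0 if p == "even" else 1)


def product_describes(pair, c):
    """⟨d1, d2⟩ ∝ c iff d1 ∝1 c and d2 ∝2 c."""
    d1, d2 = pair
    return interval_describes(d1, c) and parity_describes(d2, c)


def redundant(pair, universe):
    """True if no value in the (finite) universe is described by the pair."""
    return not any(product_describes(pair, c) for c in universe)


universe = range(-100, 101)  # a finite stand-in for the concrete domain C
assert redundant(((1, 1), "even"), universe)
assert not redundant(((1, 3), "even"), universe)
```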

Time versus precision (from TOPLAS 17(1):28--44, 1995)

The Galois framework
Abstract interpretation is often presented in terms of Galois connections

Lattices – a prelude to Galois connections
• Suppose ⟨S, ⊑⟩ is a poset
• A mapping ⊔ : S × S → S is a join (least upper bound) iff
  • a ⊔ b is an upper bound of a and b, that is, a ⊑ a ⊔ b and b ⊑ a ⊔ b for all a,b ∈ S
  • a ⊔ b is the least upper bound, that is, if c ∈ S is an upper bound of a and b, then a ⊔ b ⊑ c
• The definition of the meet ⊓ : S × S → S (the greatest lower bound) is analogous
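
A standard finite example, not taken from the slides: the divisors of 12 ordered by divisibility form a lattice in which join is the least common multiple and meet is the greatest common divisor. The loops check the upper-bound and least-upper-bound conditions, and their duals, exhaustively.

```python
from math import gcd

# Divisors of 12 ordered by divisibility: a ⊑ b iff a divides b.
S = [d for d in range(1, 13) if 12 % d == 0]


def leq(a, b):
    return b % a == 0


def join(a, b):
    """Least upper bound under divisibility: the least common multiple."""
    return a * b // gcd(a, b)


def meet(a, b):
    """Greatest lower bound under divisibility: the greatest common divisor."""
    return gcd(a, b)


for a in S:
    for b in S:
        j, m = join(a, b), meet(a, b)
        assert j in S and m in S
        assert leq(a, j) and leq(b, j)                               # upper bound
        assert all(leq(j, c) for c in S if leq(a, c) and leq(b, c))  # least
        assert leq(m, a) and leq(m, b)                               # lower bound
        assert all(leq(c, m) for c in S if leq(c, a) and leq(c, b))  # greatest
```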

Complete lattices
• A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a meet ⊓
• The join concept can often be lifted to sets by defining ⊔ : ℘(S) → S such that
  • t ⊑ ⊔(T) for all T ⊆ S and for all t ∈ T
  • if t ⊑ s for all t ∈ T then ⊔(T) ⊑ s
• If meet can also be lifted analogously, then the lattice is complete
• A lattice that contains a finite number of elements is always complete

A lattice that is not complete
• A hyperplane in 2-d space is a line and in 3-d space is a plane
• A hyperplane in Rⁿ is any space that can be defined by {x ∈ Rⁿ | c1x1+…+cnxn = c} where c1,…,cn,c ∈ R
• A halfspace in Rⁿ is any space that can be defined by {x ∈ Rⁿ | c1x1+…+cnxn ≤ c}
• A polyhedron is the intersection of a finite number of half-spaces

Examples and non-examples in planar space

Join for polyhedra
• The join of polyhedra P1 and P2 in Rⁿ coincides with the topological closure of the convex hull of P1 ∪ P2
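
For bounded polygons in R², the join can be sketched with Andrew's monotone-chain convex hull. This assumes each polygon is given by its vertices rather than by half-spaces, a representation choice of the sketch; for bounded polygons the hull is already closed, so no separate closure step is needed.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) × (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices counter-clockwise, no collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def join(P1: List[Point], P2: List[Point]) -> List[Point]:
    """Join of two bounded polygons: the convex hull of their union."""
    return convex_hull(P1 + P2)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(2, 0), (3, 0), (3, 1), (2, 1)]
print(join(square, shifted))  # [(0, 0), (3, 0), (3, 1), (0, 1)]
```

The join fills in the gap between the two squares, which is exactly the loss of precision that a convex domain incurs at a path merge.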

The “join” of an infinite set of polyhedra
• Consider the following infinite chain of regular polyhedra:
• The smallest convex space that contains all these polyhedra is a disc, yet this is not polyhedral
