Introduction to Abstract Interpretation Neil Kettle, Andy King and Axel Simon a.m.king@kent.ac.uk http://www.cs.kent.ac.uk/~amk Acknowledgments: much of this material has been adapted from surveys by Patrick and Radia Cousot
Applications of abstract interpretation • Verification: can a concurrent program deadlock? Is termination assured? • Parallelisation: are two or more tasks independent? What is the worst/best-case running time of a function? • Transformation: can a definition be unfolded? Will unfolding terminate? • Implementation: can an operation be specialised with knowledge of its (global) calling context? • Applications and “players” are incredibly diverse
Computing Lab Xmas Party • Located in Origins – the “restaurant” in Darwin • A buffet lunch will be served – courtesy of the department • Department will supply some wine (which last year lasted 10 minutes) • Bar will be open afterwards if some wine is not enough • Send an e-mail to Deborah Sowrey [D.J.Sowery@kent.ac.uk] if you want to attend • Come along and meet other post-grads
Casting out nines algorithm • Which of the following multiplications is correct: • 2173 × 38 = 81574 or • 2173 × 38 = 82574 • Casting out nines is a checking technique that is really a form of abstract interpretation: • Sum the digits in the multiplicand n1, multiplier n2 and the product n to obtain s1, s2 and s. • Divide s1, s2 and s by 9 to compute the remainder, that is, r1 = s1 mod 9, r2 = s2 mod 9 and r = s mod 9. • If (r1 × r2) mod 9 ≠ r then the multiplication is incorrect • The algorithm returns “incorrect” or “don’t know”
Running the numbers for 2173 × 38 = 81574 • Compute r1 = (2+1+7+3) mod 9 = … • Compute r2 = (3+8) mod 9 = … • Calculate (r1 × r2) mod 9 = … • Calculate r = (8+1+5+7+4) mod 9 = … • Check ((r1 × r2) mod 9 = r) = … • Deduce that 2173 × 38 = 81574 is …
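The check described above can be tried out with a minimal Python sketch. The function names `digit_sum` and `cast_out_nines` are illustrative assumptions, not part of the original slides; the code only follows the three steps listed (sum the digits, reduce modulo 9, compare).

```python
def digit_sum(n: int) -> int:
    """Sum of the decimal digits of a non-negative integer."""
    return sum(int(ch) for ch in str(n))


def cast_out_nines(n1: int, n2: int, n: int) -> str:
    """Check the claim n1 * n2 == n using only remainders modulo 9."""
    r1 = digit_sum(n1) % 9
    r2 = digit_sum(n2) % 9
    r = digit_sum(n) % 9
    # A mismatch proves the claim wrong; a match proves nothing.
    return "incorrect" if (r1 * r2) % 9 != r else "don't know"


for claimed in (81574, 82574):
    print(claimed, cast_out_nines(2173, 38, claimed))
```

Note that the check never answers "correct": it works entirely on remainders, so it can only refute a claim.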
Abstract interpretation is a theory of relationships • The computational domain for multiplication (concrete domain): • N – the set of non-negative integers • The computational domain of remainders used in the checking algorithm (abstract domain): • R = {0, 1, …, 8} • The key question is: what is the relationship between an element n ∈ N which is used in the real algorithm and its analogue r ∈ R in the check?
What is the relationship? • When multiplicand is n1 = 456, say, then the check uses r1 = (4+5+6) mod 9 = 6 • Observe that • 456 mod 9 = • (4*100 + 56) mod 9 = • (4*90 + 4*10 + 56) mod 9 = • (4*10 + 56) mod 9 = • ((4 + 5)*10 + 6) mod 9 = • ((4 + 5)*9 + (4 + 5) + 6) mod 9 = • (4 + 5 + 6) mod 9 • More generally, induction can show r1 = n1 mod 9 and r2 = n2 mod 9
Correctness is the preservation of relationships • The check simulates the concrete multiplication and, in effect, is an abstract multiplication • Concrete multiplication is n = n1 × n2 • Abstract multiplication is r = (r1 × r2) mod 9 • Where r1 describes n1 and r2 describes n2 • For brevity, write r ∝ n iff r = n mod 9 • Then abstract multiplication preserves ∝ iff whenever r1 ∝ n1 and r2 ∝ n2 it follows that r ∝ n
Correctness argument • Suppose r1 ∝ n1 and r2 ∝ n2 • If • n = n1 × n2 then • n mod 9 = (n1 × n2) mod 9 hence • n mod 9 = ((n1 mod 9) × (n2 mod 9)) mod 9 whence • n mod 9 = (r1 × r2) mod 9 = r therefore • r ∝ n • Consequently if ¬(r ∝ n) then n ≠ n1 × n2
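The preservation property argued above can also be checked exhaustively on a small range. The sketch below is an illustration only (the names `describes`, `abstract_mul` and `preserved` are assumptions); it tests that whenever r1 = n1 mod 9 and r2 = n2 mod 9, the abstract product describes the concrete product.

```python
def describes(r: int, n: int) -> bool:
    """The relation between an abstract remainder r and a concrete number n."""
    return r == n % 9


def abstract_mul(r1: int, r2: int) -> int:
    """Abstract multiplication on remainders."""
    return (r1 * r2) % 9


def preserved(limit: int) -> bool:
    """Brute-force check of preservation for all 0 <= n1, n2 < limit."""
    for n1 in range(limit):
        for n2 in range(limit):
            r1, r2 = n1 % 9, n2 % 9
            if not describes(abstract_mul(r1, r2), n1 * n2):
                return False
    return True


print(preserved(200))
# A wrong product can still pass the check, hence the answer "don't know":
print(describes(abstract_mul(2, 5), 19), 2 * 5 == 19)
```

The last line shows why the check is one-sided: 19 has the same remainder as 2 × 5 = 10, so the wrong claim 2 × 5 = 19 is not detected.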
Summary • Formalise the relationship between the data • Check that the relationship is preserved by the abstract analogues of the concrete operations • The relational framework [Acta Informatica, 30(2):103-129,1993] not only emphasises the theory of relations but is very general
Numeric approximation and widening Abstract interpretation does not require a domain to be finite
Interval approximation • Consider the following Pascal-like program • SYNTOX [PLDI’90] inferred the invariants scoped within {…} • Invariants occur between consecutive lines in the program • i ∈ [0,15] asserts 0 ≤ i ≤ 15 whereas i ∈ [0,0] means i = 0 • begin i := 0; {1: i ∈ [0,0]} while (i < 16) do {2: i ∈ [0,15]} i := i + 1 {3: i ∈ [1,16]} end {4: i ∈ [16,16]}
Compilation versus (classic) interpretation • Abstract compilation – compile the concrete program into an abstract program (equation system) and execute the abstract program: • good separation of concerns that aids debugging • the particulars of the domain can be exploited to reorder operations, specialise operations, etc • Abstract interpretation – run the concrete program but on-the-fly interpret its concrete operations as abstract operations: • ideal for a generic framework (toolkit) which is parameterised by abstract domain plugins
Abstract domain that is used in interval analysis • Domain of intervals includes: • [l,u] where l ≤ u and l,u ∈ Z for bounded sets, e.g. [0,5] describes {0,1,4} since {0,1,4} ⊆ [0,5] • ⊥ to represent the empty set of numbers, that is, ∅ • [l,∞] for sets which are bounded below such as {l,l+2,l+4,…} • [-∞,u] to represent sets which are bounded above such as {…,u-5,u-3,u}
Weakening intervals • if … then … {1: i ∈ [0,2]} else … {2: i ∈ [3,5]} endif {3: i ∈ [0,5]} • Join (path merge) is defined: • Put d1 ⊔ d2 = d1 if d2 = ⊥ • d2 else if d1 = ⊥ • [min(l1,l2), max(u1,u2)] otherwise • whenever d1 = [l1,u1] and d2 = [l2,u2]
Strengthening intervals • Meet is defined: • Put d1 ⊓ d2 = ⊥ if (d1 = ⊥) ∨ (d2 = ⊥) • [max(l1,l2), min(u1,u2)] otherwise • whenever d1 = [l1,u1] and d2 = [l2,u2] (the result is ⊥ if the bounds cross, that is, if max(l1,l2) > min(u1,u2)) • {3: i ∈ [0,5]} if (2 < i) then {4: i ∈ [3,5]} … else {5: i ∈ [0,2]} …
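Join and meet on intervals can be sketched in a few lines of Python. The representation is an assumption made for illustration: `None` stands for the bottom element, a pair `(l, u)` stands for [l,u], and `math.inf` stands for an infinite bound. The explicit test for crossed bounds in `meet` is also an addition, needed so that the meet of disjoint intervals is bottom.

```python
import math
from typing import Optional, Tuple

# None plays the role of bottom (the empty set of numbers).
Interval = Optional[Tuple[float, float]]


def join(d1: Interval, d2: Interval) -> Interval:
    """Path merge: the smallest interval containing both arguments."""
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    (l1, u1), (l2, u2) = d1, d2
    return (min(l1, l2), max(u1, u2))


def meet(d1: Interval, d2: Interval) -> Interval:
    """Strengthening: the largest interval contained in both arguments."""
    if d1 is None or d2 is None:
        return None
    (l1, u1), (l2, u2) = d1, d2
    l, u = max(l1, l2), min(u1, u2)
    return (l, u) if l <= u else None  # crossed bounds mean the empty set


print(join((0, 2), (3, 5)))           # merge of the two branches
print(meet((0, 5), (3, math.inf)))    # then-branch of "if (2 < i)"
print(meet((0, 5), (-math.inf, 2)))   # else-branch of "if (2 < i)"
```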
Meet and join are the basic primitives for compilation • I1 = [0,0] since program point (1) immediately follows the i := 0 • I2 = (I1 ⊔ I3) ⊓ [-∞,15] since: • control from program points (1) and (3) flows into (2) • point (2) is reached only if i < 16 holds • I3 = {n+1 | n ∈ I2} since (3) is only reachable from (2) via the increment • I4 = (I1 ⊔ I3) ⊓ [16,∞] since: • control from (1) and (3) flows into (4) • point (4) is reached only if ¬(i < 16) holds
Jacobi versus Gauss-Seidel iteration • With Jacobi, the new vector I1’,I2’,I3’,I4’ of intervals is calculated from the old I1,I2,I3,I4 • With Gauss-Seidel iteration: • I1’ is calculated from I1,I2,I3,I4 • I2’ is calculated from I1’,I2,I3,I4 • I3’ is calculated from I1’,I2’,I3,I4 • I4’ is calculated from I1’,I2’,I3’,I4
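Gauss-Seidel iteration can be illustrated on the four interval equations for the incrementing loop (I1 = [0,0], I2 = (I1 join I3) meet [-∞,15], I3 = I2 shifted by +1, I4 = (I1 join I3) meet [16,∞]). This is a sketch under stated assumptions: `None` stands for bottom, pairs stand for intervals, and the helper names `join`, `meet`, `shift` and `solve` are hypothetical. Each equation is re-evaluated using the newest values available, which is what distinguishes Gauss-Seidel from Jacobi.

```python
import math


def join(d1, d2):
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def shift(d, k):
    """Abstract version of i := i + k."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve():
    i1 = i2 = i3 = i4 = None  # start from bottom everywhere
    while True:
        old = (i1, i2, i3, i4)
        # Gauss-Seidel: each update sees the values computed just above it.
        i1 = (0, 0)
        i2 = meet(join(i1, i3), (-math.inf, 15))
        i3 = shift(i2, 1)
        i4 = meet(join(i1, i3), (16, math.inf))
        if (i1, i2, i3, i4) == old:
            return old


print(solve())
```

The loop terminates here without any extra machinery because the guard i < 16 bounds the growth of I2; the chain of intervals [0,0], [0,1], … stops at [0,15].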
Gauss-Seidel versus chaotic iteration • Observe that I4 might change if either I1 or I3 change, hence evaluate I4 after I1 and I3 stabilise • This suggests waiting until stability is achieved at one level before starting on the next • [Figure: dependency graph over I1, I2, I3 and I4, partitioned into the levels {I1}, {I2, I3} and {I4}]
Gauss-Seidel versus chaotic iteration • Chaotic iteration can postpone evaluating Ii for a bounded number of iterations: • I1’ is calculated from I1,-,-,- • I2’ and I3’ are calculated Gauss-Seidel style from I1,I2,I3,- • I4’ is calculated from I1’,I2’,I3’,I4 • Fast (and incremental) fixpoint solvers [TOPLAS 22(2):187-223,2000] apply chaotic iteration
Research challenge • Compiling to equations and iteration is well-understood (albeit not well-known) • The implicit assumption is that source is available • With the advent of component and multi-linguistic programming, the problem is how to generate the equations from: • A specification of the algorithm or the API; • The types of the algorithm or component • In the interim, environments with support for modularity either: • Equip the programmer with an equation language • Or make worst-case assumptions about behaviour
Suppose i was decremented rather than incremented • begin i := 0; {1: i ∈ [0,0]} while (i < 16) do {2: i ∈ [-∞,0]} i := i - 1 {3: i ∈ [-∞,-1]} end {4: i ∈ ⊥} • I1 = [0,0] • I2 = (I1 ⊔ I3) ⊓ [-∞,15] • I3 = {n-1 | n ∈ I2} • I4 = (I1 ⊔ I3) ⊓ [16,∞]
Ascending chain condition • A domain D is ACC iff it does not contain an infinite strictly increasing chain d1<d2<d3<… where d<d’ iff d ⊑ d’ and d ≠ d’ (see below) • The interval domain D is ordered by: • ⊥ ⊑ d for all d ∈ D and • [l1,u1] ⊑ [l2,u2] iff l2 ≤ l1 ≤ u1 ≤ u2 and is not ACC since [0,0]<[-1,0]<[-2,0]<… • [Figure: Hasse diagram with top element ⊤ above the integers …, -4, -3, -2, -1, 0, 1, 2, 3, 4, …]
Some very expressive relational domains are ACC • Common sub-expression elimination relies on detecting duplicated expression evaluation • Karr [Acta Informatica, 6, 133-151] noticed that detecting an invariant such as y = x/2 – 7 was key to this optimisation • begin x := sin(a) * 2; y := sin(a) – 7; end
The affine domain • The domain of affine equations over n variables is: • D = {⟨A,B⟩ | A is an m×n dimensional matrix and B is an m dimensional column vector} • D is ordered by: • ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ iff (if A1x = B1 then A2x = B2)
Pre-orders versus posets • A pre-order ⟨D, ⊑⟩ is a set D ordered by a binary relation ⊑ such that: • d ⊑ d for all d ∈ D • If d1 ⊑ d2 and d2 ⊑ d3 then d1 ⊑ d3 • A poset is a pre-order ⟨D, ⊑⟩ such that: • If d1 ⊑ d2 and d2 ⊑ d1 then d1 = d2
The affine domain is a pre-order (so it is not ACC) • Observe that ⟨A1,B1⟩ ⊑ ⟨A2,B2⟩ and ⟨A2,B2⟩ ⊑ ⟨A1,B1⟩ can both hold even though ⟨A1,B1⟩ ≠ ⟨A2,B2⟩ • [Figure: example matrices A1, B1, A2 and B2] • To build a poset from a pre-order • define d ≡ d’ iff d ⊑ d’ and d’ ⊑ d • define [d]≡ = {d’ ∈ D | d ≡ d’} and D≡ = {[d]≡ | d ∈ D} • define [d]≡ ⊑ [d’]≡ iff d ⊑ d’ • The poset ⟨D≡, ⊑⟩ is ACC since chain length is bounded by the number of variables n
Inducing termination for non-ACC (and huge ACC) domains • Enforce convergence for intervals with a widening operator ∇: D × D → D • ⊥ ∇ d = d • d ∇ ⊥ = d • [l1,u1] ∇ [l2,u2] = [if l2<l1 then -∞ else l1, if u1<u2 then ∞ else u1] • Examples • [1,2] ∇ [1,2] = [1,2] • [1,2] ∇ [1,3] = [1,∞] but [1,3] ∇ [1,2] = [1,3] • Safe since [li,ui] ⊑ ([l1,u1] ∇ [l2,u2]) for i ∈ {1,2}
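The interval widening defined above translates directly into code. In this sketch `None` stands for bottom and `math.inf` for an unbounded end; both are representation choices made for illustration, and the function name `widen` is hypothetical.

```python
import math


def widen(d1, d2):
    """Interval widening: any bound that is still moving jumps to infinity."""
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    (l1, u1), (l2, u2) = d1, d2
    lower = -math.inf if l2 < l1 else l1
    upper = math.inf if u1 < u2 else u1
    return (lower, upper)


print(widen((1, 2), (1, 2)))
print(widen((1, 2), (1, 3)))
print(widen((1, 3), (1, 2)))
```

The three calls reproduce the examples on the slide and show that widening is not commutative: the first argument is the previous iterate and the second is the new one.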
Chaotic iteration with widening • To terminate it is necessary to traverse each loop a finite number of times • It is sufficient to pass through I2 or I3 a finite number of times [Bourdoncle, 1990] • Thus widen at I3 since it is simpler • [Figure: dependency graph over I1, I2, I3 and I4]
Termination for the decrement • I1 = [0,0] • I2 = (I1 ⊔ I3) ⊓ [-∞,15] • I3 = I3 ∇ {n-1 | n ∈ I2} (note the fix) • I4 = (I1 ⊔ I3) ⊓ [16,∞] • When I2 = [-1,0] and I3 = [-1,0], then I3 ∇ {n-1 | n ∈ I2} = [-1,0] ∇ [-2,-1] = [-∞,0]
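To see the widened equations terminate, the following sketch iterates them Gauss-Seidel style starting from bottom. The representation (`None` for bottom, pairs with `math.inf` for intervals) and all helper names are assumptions for illustration; only the equation for I3 applies widening, as on the slide.

```python
import math


def join(d1, d2):
    if d2 is None:
        return d1
    if d1 is None:
        return d2
    return (min(d1[0], d2[0]), max(d1[1], d2[1]))


def meet(d1, d2):
    if d1 is None or d2 is None:
        return None
    l, u = max(d1[0], d2[0]), min(d1[1], d2[1])
    return (l, u) if l <= u else None


def widen(d1, d2):
    if d1 is None:
        return d2
    if d2 is None:
        return d1
    return (-math.inf if d2[0] < d1[0] else d1[0],
            math.inf if d1[1] < d2[1] else d1[1])


def shift(d, k):
    """Abstract version of i := i + k."""
    return None if d is None else (d[0] + k, d[1] + k)


def solve():
    i1 = i2 = i3 = i4 = None
    while True:
        old = (i1, i2, i3, i4)
        i1 = (0, 0)
        i2 = meet(join(i1, i3), (-math.inf, 15))
        i3 = widen(i3, shift(i2, -1))  # widening is applied at I3 only
        i4 = meet(join(i1, i3), (16, math.inf))
        if (i1, i2, i3, i4) == old:
            return old


print(solve())
```

Without the call to `widen` the lower bound of I3 would decrease by one on every round and the loop in `solve` would never exit.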
Widening dynamic data-structures • begin i := 0; p := nil; while (i < 16) do i := i + 1; p := new cons(i, p); {1: p ∈ cons(i, … cons(0,nil))} end • [Figure: type graphs built from cons, or, nil and the integers 0, 1, 2 that describe p on successive iterations]
Depth-2 versus type-graph widening • [Figure: a depth-2 abstraction and a type graph, both built from cons, or, 0, 1, 2, nil and any] • Type-graph widening is more compact • Type-graph widening becomes difficult when a list contains lists as its elements • In constraint-based analysis, widening is dispensed with altogether
(Malicious) research challenge • Read a survey paper to find an abstract domain that is ACC but has a maximal chain length of O(2^n) • Construct a program with O(n) symbols that iterates through all O(2^n) abstractions • Publish the program in IPL
Not all numeric domains are convex • A set S ⊆ R^n is convex iff for all x,y ∈ S it follows that {λx + (1-λ)y | 0 ≤ λ ≤ 1} ⊆ S • [Figure: four sets in R^2] • The 2 leftmost sets in R^2 are convex but the 2 rightmost sets are not.
Are intervals or affine equations convex? • Suppose the values of n variables are represented by n intervals [l1,u1],…,[ln,un] • Suppose x = ⟨x1,…,xn⟩, y = ⟨y1,…,yn⟩ ∈ R^n are described by the intervals • Then each li ≤ xi ≤ ui and each li ≤ yi ≤ ui • Let 0 ≤ λ ≤ 1 and observe z = λx + (1-λ)y = ⟨λx1 + (1-λ)y1, …, λxn + (1-λ)yn⟩ • Therefore li ≤ min(xi, yi) ≤ λxi + (1-λ)yi ≤ max(xi, yi) ≤ ui and convexity follows
Arithmetic congruences are not convex • Elements of the arithmetic congruence (AC) domain take the form x – 2y = 1 (mod 3) which describes integral values of x and y • More exactly, the AC domain consists of conjunctions of equations of the form c1x1+…+cmxm = c (mod n) where ci,c ∈ Z and n ∈ N • Incredibly AC is ACC [IJCM, 30, 165--190, 1989]
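A concrete counterexample makes the non-convexity visible. The sketch below uses the congruence x – 2y = 1 (mod 3) from the slide; the two points, the weight λ = 2/3 and the name `in_ac` are choices made here for illustration.

```python
from fractions import Fraction


def in_ac(x, y) -> bool:
    """Membership test for the congruence x - 2y = 1 (mod 3)."""
    return (x - 2 * y) % 3 == 1


p = (1, 0)  # 1 - 0 = 1, so p satisfies the congruence
q = (4, 0)  # 4 - 0 = 4 and 4 mod 3 = 1, so q satisfies it too
lam = Fraction(2, 3)

# A convex combination of p and q that happens to be an integral point.
z = tuple(lam * a + (1 - lam) * b for a, b in zip(p, q))

print(in_ac(*p), in_ac(*q))
print(z, in_ac(*z))
```

The point z = (2, 0) lies on the segment between p and q but does not satisfy the congruence, so the set of solutions is not convex.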
Research challenge • Søndergaard [FSTTCS,95] introduced the concept of an immediate fixpoint • Consider the following (groundness) dependency equations over the domain of Boolean functions ⟨Bool, ∨, ∧⟩ • f1 = x ↔ (y ∧ z) • f2 = ∃t(∃x(∃z(u ↔ (t ∧ x) ∧ v ↔ (t ∧ z) ∧ f4))) • f3 = ∃u(∃v(x ↔ u ∧ z ↔ v ∧ f2)) • f4 = f1 ∨ f3 • Where ∃x(f) = f[x ↦ true] ∨ f[x ↦ false] thus ∃x(x ∨ y) = true and ∃x(x ∧ y) = y
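Existential quantification over Boolean functions, defined as the disjunction of the two instantiations of x (x set to true and x set to false), can be sketched as follows. Representing a Boolean function as a Python callable over an environment dictionary is an assumption made for illustration, as are the names `exists` and `equivalent` and the two small example formulas.

```python
from itertools import product


def exists(x, f):
    """Eliminate variable x from f: f with x true, or f with x false."""
    return lambda env: f({**env, x: True}) or f({**env, x: False})


def equivalent(f, g, variables):
    """Compare two Boolean functions on every assignment to the variables."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if bool(f(env)) != bool(g(env)):
            return False
    return True


conj = lambda env: env["x"] and env["y"]  # x and y
disj = lambda env: env["x"] or env["y"]   # x or y
just_y = lambda env: env["y"]
true = lambda env: True

print(equivalent(exists("x", conj), just_y, ["y"]))
print(equivalent(exists("x", disj), true, ["y"]))
```

Eliminating x from the conjunction leaves y, whereas eliminating x from the disjunction leaves the constant true.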
The alternative tactic • The standard tactic is to apply iteration: • Søndergaard found that the system can be solved symbolically (like a quadratic) • This would be very useful for infinite domains for improved precision and predictability
Combining analyses • Verifiers and optimisers are often multi-pass, built from several separate analyses • Should the analyses be performed in parallel or in sequence? • Analyses can interact to improve one another (problem is in the complexity of the interaction [Pratt])
Pruning combined domains • Suppose that ∝1 ⊆ D1 × C and ∝2 ⊆ D2 × C, then how is D = D1 × D2 interpreted? • Then ⟨d1,d2⟩ ∝ c iff d1 ∝1 c ∧ d2 ∝2 c • Ideally, many ⟨d1,d2⟩ ∈ D will be redundant, that is, ¬∃c ∈ C . d1 ∝1 c ∧ d2 ∝2 c
The Galois framework Abstract interpretation is often presented in terms of Galois connections
Lattices – a prelude to Galois connections • Suppose ⟨S, ⊑⟩ is a poset • A mapping ⊔: S × S → S is a join (least upper bound) iff • a ⊔ b is an upper bound of a and b, that is, a ⊑ a ⊔ b and b ⊑ a ⊔ b for all a,b ∈ S • a ⊔ b is the least upper bound, that is, if c ∈ S is an upper bound of a and b, then a ⊔ b ⊑ c • The definition of the meet ⊓: S × S → S (the greatest lower bound) is analogous
Complete lattices • A lattice ⟨S, ⊑, ⊔, ⊓⟩ is a poset ⟨S, ⊑⟩ equipped with a join ⊔ and a meet ⊓ • The join concept can often be lifted to sets by defining ⊔: ℘(S) → S such that • t ⊑ ⊔(T) for all T ⊆ S and for all t ∈ T • if t ⊑ s for all t ∈ T then ⊔(T) ⊑ s • If the meet can be lifted analogously, then the lattice is complete • A lattice that contains a finite number of elements is always complete
A lattice that is not complete • A hyperplane in 2-d space is a line and in 3-d space is a plane • A hyperplane in R^n is any space that can be defined by {x ∈ R^n | c1x1+…+cnxn = c} where c1,…,cn,c ∈ R • A halfspace in R^n is any space that can be defined by {x ∈ R^n | c1x1+…+cnxn ≤ c} • A polyhedron is the intersection of a finite number of half-spaces
Join for polyhedra • Join of polyhedra P1 and P2 in R^n coincides with (the topological closure of) the convex hull of P1 ∪ P2
The “join” of an infinite set of polyhedra • Consider the following infinite chain of regular polyhedra: • [Figure: a chain of regular polygons with an increasing number of sides] • The smallest space that contains all these polyhedra is a circle, yet a circle is not polyhedral
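The effect can be observed numerically. The sketch below assumes a particular chain, namely regular polygons with 4, 8, 16, … sides inscribed in the unit circle (the original figure is not available, so this choice is an assumption); the function name is hypothetical. Each polygon in the chain contains the previous one, and the areas keep growing towards π without reaching it, so no polygon in the chain is an upper bound for all the others.

```python
import math


def inscribed_polygon_area(sides: int) -> float:
    """Area of a regular polygon with the given number of sides,
    inscribed in the unit circle."""
    return 0.5 * sides * math.sin(2 * math.pi / sides)


areas = [inscribed_polygon_area(2 ** k) for k in range(2, 8)]
for k, area in zip(range(2, 8), areas):
    print(2 ** k, area)
print(math.pi)
```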