Analysis of Software

Eric Feron, 6.242
From "Semantic Foundations of Program Analysis" by P. Cousot, in "Program Flow Analysis: Theory and Applications", Muchnick & Jones, Eds., Prentice Hall, 1981

Main message
• Traditional dynamical systems analysis tools can apply to certain aspects of software analysis, including run-time errors.
• Most characteristics (e.g. overflow errors) cannot be detected by working directly with the program and its variables: the computations are too many, or not even conceivable.
• Tractability can be achieved via use of abstractions.
• Tractability can be achieved via use of overbounding invariant sets.

Prototype program
  [1] while x ≥ 1000 do
  [2]   x := x + y;
  [3] od;
  [4]
x and y are integers in I = [-b-1; b]; b is the overflow limit.
• Program characteristics (x0, y0 are the initial values of x, y):
  • The program terminates without error iff (x0 < 1000) ∨ (y0 < 0).
  • Execution never terminates iff (1000 ≤ x0 ≤ b) ∧ (y0 = 0).
  • Execution leads to a run-time error (by overflow) iff (x0 ≥ 1000) ∧ (y0 > 0).
• These are the characteristics we would like an analysis to find.
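The three characteristics can be checked by brute force when the overflow limit is small. The sketch below assumes the loop test is x ≥ 1000 and uses a hypothetical limit b = 1010; the names `run_prototype` and `predicted` are illustrative and not part of the slides.

```python
def run_prototype(x0, y0, b):
    """Run the loop on machine integers in [-b-1, b] and report its outcome."""
    x, y = x0, y0
    visited = set()                 # values of x already seen at the loop test
    while x >= 1000:
        if x in visited:            # y never changes, so a repeated x is a repeated state
            return "loops"
        visited.add(x)
        if not (-b - 1 <= x + y <= b):
            return "overflow"       # x := x + y would leave the integer range
        x = x + y
    return "terminates"


def predicted(x0, y0):
    """Outcome according to the three characteristics stated on the slide."""
    if x0 < 1000 or y0 < 0:
        return "terminates"
    return "loops" if y0 == 0 else "overflow"


b = 1010                            # hypothetical, deliberately small overflow limit
mismatches = [(x0, y0)
              for x0 in range(985, b + 1)
              for y0 in range(-b - 1, b + 1)
              if run_prototype(x0, y0, b) != predicted(x0, y0)]
print(len(mismatches))
```

Exhaustive execution like this only works because b is tiny here; for a realistic b it is exactly the kind of computation the main message calls intractable.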

Graph representations of programs
• Programs are single-entry, single-exit directed graphs, with edges labeled with instructions.
• Program graph: <V,ε,ω,E>
  • V: finite set of vertices
  • E: finite set of edges
  • ε, ω: entry and exit vertices
• Variables live in a universe U.
• Ia(U) are assignments: maps v = f(v) from U to U.
• It(U) are tests, i.e. maps from U to B = {true, false}.
• A program is a triple <G,U,L>: G is the program graph, U is the universe, and L is the edge labeling with instructions.
• Graph of the prototype program (vertices 1 to 4, entry 1, exit 4):
  • 1 → 2: if x ≥ 1000
  • 1 → 4: if x < 1000
  • 2 → 3: <x,y> ← <x+y,y>
  • 3 → 2: if x ≥ 1000
  • 3 → 4: if x < 1000

Programs as dynamical systems
• States: the set S of states is the set of pairs <c,m>, where c ∈ V ∪ {ξ} is the control state and m ∈ U is the memory state. ξ is the error control state.
• State transition function: a program p = <G,U,L> with exit vertex ω defines the state transition function t as follows:
  • t(<ξ,m>) = <ξ,m> (can't recover from a run-time error)
  • t(<ω,m>) = <ω,m> (once done, we’re done)
  • If c1 ∈ V has out-degree 1, <c1,c2> ∈ E, and L(<c1,c2>) = f with f ∈ Ia(U), then t(<c1,m>) = <c2,f(m)> if m is in dom(f), else t(<c1,m>) = <ξ,m>.
  • If c1 ∈ V has out-degree 2, <c1,c2> ∈ E, <c1,c3> ∈ E, L(<c1,c2>) = q and L(<c1,c3>) = ¬q with q ∈ It(U), then t(<c1,m>) = <ξ,m> if m is not in dom(q), else t(<c1,m>) = <c2,m> if q(m) holds, else t(<c1,m>) = <c3,m>.
• State transition relation: the graph of the state transition function (a boolean function over S×S).
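The transition function can be written down directly for the prototype loop. This is a minimal sketch under stated assumptions: the overflow limit `B = 2000`, the string `"xi"` standing for the error control state, and the table names `ASSIGN` and `TEST` are all hypothetical choices made for illustration.

```python
B = 2000                 # hypothetical overflow limit b
ERR, EXIT = "xi", 4      # error control state and exit vertex


def assign(m):
    """Label of edge 2 -> 3: <x,y> <- <x+y,y>; None means m is outside dom(f)."""
    x, y = m
    nx = x + y
    return (nx, y) if -B - 1 <= nx <= B else None


def ge1000(m):
    """Test labeling the edges out of vertices 1 and 3 (total, so dom is all of U)."""
    return m[0] >= 1000


ASSIGN = {2: (3, assign)}                         # out-degree 1: vertex -> (c2, f)
TEST = {1: (ge1000, 2, 4), 3: (ge1000, 2, 4)}     # out-degree 2: vertex -> (q, c2, c3)


def t(state):
    """State transition function on pairs <c, m>."""
    c, m = state
    if c in (ERR, EXIT):          # error and exit states are fixed points
        return state
    if c in ASSIGN:
        c2, f = ASSIGN[c]
        m2 = f(m)
        return (c2, m2) if m2 is not None else (ERR, m)
    q, c2, c3 = TEST[c]
    return (c2, m) if q(m) else (c3, m)


def run(state, steps):
    """Apply t a fixed number of times."""
    for _ in range(steps):
        state = t(state)
    return state


print(run((1, (1500, -300)), 10))
print(run((1, (1500, 700)), 10))
```

Because both tests are total functions, the "m is not in dom(q)" branch never fires in this example; a test involving, say, a division would need it.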

Programs as dynamical systems (ct'd)
• Transitive closure of a binary relation: assume a, b ∈ (S×S → B) are two binary relations on S.
• Their product a∘b is defined as λ<s1,s2>.[∃ s3 ∈ S : a(s1,s3) ∧ b(s3,s2)].
• So we can talk about the n-extension a^n of a: a^0 is equality on S and a^(n+1) = a^n ∘ a.
• The transitive closure of a is then a* = λ<s1,s2>.[∃ n ≥ 0 : a^n(s1,s2)].
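On a finite state set, a relation can be stored as a set of pairs, and the product and closure computed by iteration. The sketch below assumes the closure includes the zeroth power (equality on S); the function names are illustrative.

```python
def product(a, b):
    """Pairs (s1, s2) such that a(s1, s3) and b(s3, s2) for some s3."""
    return {(s1, s2) for (s1, s3) in a for (t3, s2) in b if s3 == t3}


def closure(a, states):
    """Union of all powers a^n, n >= 0, where a^0 is equality on the state set."""
    result = {(s, s) for s in states}
    while True:
        bigger = result | product(result, a)
        if bigger == result:          # no new pair: every power is already included
            return result
        result = bigger


states = {0, 1, 2, 3}
a = {(0, 1), (1, 2)}
print(sorted(closure(a, states)))
```

The loop stops after at most |S|² rounds, since each round either adds a pair or terminates.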

Example of complete lattice
• Set L of subsets of states in a state-space S.
• The partial order is traditional inclusion.
• For H = {H1,H2} ⊂ L:
  • H1 ∪ H2 is the least upper bound for H.
  • H1 ∩ H2 is the greatest lower bound for H.
  • Obviously these exist for any H.
• L has an infimum: the empty set.
• L has a supremum: S.
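A small sketch of this lattice, using Python frozensets for subsets of a finite state-space. The helper names `lub` and `glb` are illustrative; the empty-family cases show why the infimum and supremum of L are the empty set and S.

```python
def lub(family):
    """Least upper bound of a family of subsets: their union."""
    return frozenset().union(*family)


def glb(family, universe):
    """Greatest lower bound of a family of subsets: their intersection."""
    result = frozenset(universe)
    for h in family:
        result &= h
    return result


S = frozenset(range(4))
H1, H2 = frozenset({0, 1}), frozenset({1, 2})

print(sorted(lub([H1, H2])))      # union of H1 and H2
print(sorted(glb([H1, H2], S)))   # intersection of H1 and H2
print(lub([]) == frozenset(), glb([], S) == S)
```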

Abstracting state spaces
• Concrete lattice: {set of all subsets of signed integer numbers between -b-1 and b}.
• Abstract lattice: the signs ⊥, 0, -, +, T, where an abstract value x means:
  • if x = T then x is any value
  • if x = + then 0 ≤ x ≤ b
  • if x = 0 then x = 0
  • if x = - then -b-1 ≤ x ≤ 0
  • if x = ⊥ then x takes no value (empty set)
• Rules: + + + = +; + + - = T; - - + = -; - * - = +; …
• Effect: go from huge state-space decompositions to a finite and simple state-space decomposition.
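The sign rules can be sketched as a few small functions. This assumes the ordering ⊥ ⊑ 0 ⊑ {-, +} ⊑ T with + read as "non-negative" and - as "non-positive"; the string encodings and function names are illustrative, and results of overflowing operations are deliberately ignored (they belong to the error state, not to a value).

```python
BOT, ZERO, NEG, POS, TOP = "bot", "0", "-", "+", "T"


def leq(a, b):
    """Order on signs: bot below everything, 0 below - and +, T above everything."""
    return a == b or a == BOT or b == TOP or (a == ZERO and b in (NEG, POS))


def join(a, b):
    """Least upper bound of two signs."""
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP


def alpha(values):
    """Abstraction: smallest sign covering a set of integers."""
    result = BOT
    for v in values:
        result = join(result, ZERO if v == 0 else POS if v > 0 else NEG)
    return result


def neg(a):
    return {POS: NEG, NEG: POS}.get(a, a)


def add(a, b):
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b and a != TOP else TOP


def sub(a, b):
    return add(a, neg(b))


def mul(a, b):
    if BOT in (a, b):
        return BOT
    if ZERO in (a, b):
        return ZERO
    if TOP in (a, b):
        return TOP
    return POS if a == b else NEG


print(add(POS, POS), add(POS, NEG), sub(NEG, POS), mul(NEG, NEG))
```

Each abstract operation costs a table lookup, whatever the size of b, which is the "finite and simple" decomposition the slide refers to.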

Abstracting state spaces
• Concrete lattice: {set of all subsets of R^n}.
  • Operations are traditional unions, intersections, sums and differences. What a mess…
• Abstract lattice: {set of all ellipsoids in R^n} ∪ {Ø, R^n}.
  • Operations are (conservative) unions of ellipsoids, intersections of ellipsoids, and sums of ellipsoids.
  • The job itself is most often nonconvex. It is usually relaxed based on convex optimization.

Lattices of ellipsoids
• Set Ell of ellipsoids centered around zero, for simplification.
• Partial order on ellipsoids: set inclusion (that's a classic), and volume.
• Ellipsoid theorems: let H be a finite set of p ellipsoids (E1, …, Ep) characterized by Ei = {x | x^T Pi x ≤ 1}.
  • The minimum volume ellipsoid h containing H exists and is computed as follows: if p = 0 then h = Ø; if p > 0 then h = {x | x^T P x ≤ 1}, where P = argmin log det(P^-1) subject to P ≤ Pi, i = 1, …, p.
  • The maximum volume ellipsoid contained in H also exists and is computable.
• Ell is then a complete lattice.

Rules of operations with ellipsoids (centered around zero)
An ellipsoid is given by {x | x^T P x ≤ 1}.
• Finding an ellipsoidal lowest upper bound Ell(K) on any set K of data in R^n: assume the set is described by a finite list of points (xi, i = 1, …, p). If p = 0 then Ell(K) = Ø. If p > 0 then Ell(K) = {x | x^T P x ≤ 1}, where P = argmin log det(P^-1) subject to xi^T P xi ≤ 1, i = 1, …, p.
• Finding an approximate ellipsoidal lowest upper bound E3 on the sum of two ellipsoids E1 and E2 (characterized by P1 and P2) is a convex, semidefinite program, in which matrix inequalities are to be understood in the sense of positive definite matrices.
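As a simpler stand-in for the semidefinite program, a standard one-parameter family of outer bounds on the sum can be written in closed form. This is not necessarily the formulation the slide has in mind; the shape matrices Q_i = P_i^{-1} and the parameter β are notation introduced here for illustration.

```latex
\begin{align*}
E_i &= \{x \mid x^T P_i x \le 1\}, \qquad Q_i = P_i^{-1}, \qquad i = 1, 2, \\
E_1 + E_2 &\subseteq E_3 = \{x \mid x^T Q_3^{-1} x \le 1\}, \qquad
Q_3 = (1 + \beta^{-1})\, Q_1 + (1 + \beta)\, Q_2, \qquad \beta > 0, \\
\text{because}\quad
\Big(\sqrt{l^T Q_1 l} + \sqrt{l^T Q_2 l}\Big)^2
&\le (1 + \beta^{-1})\, l^T Q_1 l + (1 + \beta)\, l^T Q_2 l
\quad \text{for every direction } l, \\
\text{and}\quad \operatorname{tr} Q_3 &\text{ is smallest for } \beta = \sqrt{\operatorname{tr} Q_1 / \operatorname{tr} Q_2}.
\end{align*}
```

The third line is the arithmetic-geometric mean inequality applied to the support functions of E1 and E2; the left-hand side is the squared support function of the exact sum, so every β gives a conservative bound.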

Reasoning with abstractions
  [1] while x ≥ 1000 do
  [2]   x := x + y;
  [3] od;
  [4]
x and y are integers in I = [-b-1; b].
Abstract equations, one pair of signs per program point and one for the error state ξ (⊔ is the least upper bound, ⊑ the order on signs, and smash maps any pair with a ⊥ component to <⊥,⊥>):
  <x1,y1> = φ
  <x2,y2> = smash(<if (x1 ⊔ x3) ⊑ - then ⊥ else + fi, y1 ⊔ y3>)
  <x3,y3> = smash(<x2 + y2, y2>)
  <x4,y4> = smash(<x1 ⊔ x3, y1 ⊔ y3>)
  <xξ,yξ> = if (x2 ⊑ 0) ∨ (y2 ⊑ 0) ∨ ((x2 = +) ∧ (y2 = -)) ∨ ((x2 = -) ∧ (y2 = +)) then <⊥,⊥> else <x2,y2> fi
• Start iterating with all states at ⊥ and φ = <+,->.
• The steady state is reached in a few iterations: <x1,y1> = φ, <x2,y2> = <+,->, <x3,y3> = <T,->, <x4,y4> = <T,->, <xξ,yξ> = <⊥,⊥>.
• Thus if x > 0 and y < 0 then no overflow can occur.
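The iteration can be reproduced mechanically. The sketch below is one reading of the abstract equations, assuming the sign ordering ⊥ ⊑ 0 ⊑ {-, +} ⊑ T and a `smash` that collapses any pair with a ⊥ component to <⊥,⊥>; the string encodings, the key `"err"` for the error state, and the function names are illustrative.

```python
BOT, ZERO, NEG, POS, TOP = "bot", "0", "-", "+", "T"


def leq(a, b):
    return a == b or a == BOT or b == TOP or (a == ZERO and b in (NEG, POS))


def join(a, b):
    if leq(a, b):
        return b
    if leq(b, a):
        return a
    return TOP


def add(a, b):
    if BOT in (a, b):
        return BOT
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b and a != TOP else TOP


def smash(x, y):
    """A pair with an unreachable component is unreachable as a whole."""
    return (BOT, BOT) if BOT in (x, y) else (x, y)


def step(s, phi):
    """One simultaneous application of the five abstract equations."""
    (x1, y1), (x2, y2), (x3, y3) = s[1], s[2], s[3]
    xj, yj = join(x1, x3), join(y1, y3)
    no_overflow = (leq(x2, ZERO) or leq(y2, ZERO)
                   or (x2, y2) in ((POS, NEG), (NEG, POS)))
    return {
        1: phi,
        2: smash(BOT if leq(xj, NEG) else POS, yj),   # loop test passed
        3: smash(add(x2, y2), y2),                    # after x := x + y
        4: smash(xj, yj),                             # loop exit
        "err": (BOT, BOT) if no_overflow else (x2, y2),
    }


def solve(phi):
    """Iterate from bottom until nothing changes (the lattice is finite)."""
    s = {k: (BOT, BOT) for k in (1, 2, 3, 4, "err")}
    while True:
        nxt = step(s, phi)
        if nxt == s:
            return s
        s = nxt


print(solve((POS, NEG)))
print(solve((POS, POS))["err"])
```

With the entry condition <+,-> the error state stays at ⊥, matching the steady state on the slide; with <+,+> the error state becomes <+,+>, which only says that an overflow cannot be ruled out.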

Ellipsoidal reachability analysis: one example, the "star norm"
Consider the program
  x := 0            % an integer vector
  n := 0
  while n < 1000,
    x := A x + B ue   % A is a matrix
    n := n + 1
  end;
  xω := x
• ue is an exogenous, bounded input that changes at each iteration.
• Question: for which values of ue does the state x not overflow?
• The exact answer involves computing the || . ||1 norm of the system (A,B,I). This norm is not easy to derive analytically.

Choice of abstractions
• New lattice for system abstractions: the set of ellipsoids centered around zero.
• Abstract interpretation:
  x := E0
  n := 0
  while n < 1000,
    x := (A Ell(x) + B Ell(ue))
    n := n + 1
  end;
  xω := x
• There remains only to check that the ellipsoid xω is within bounds.

Abstractions for other applications: abstracted constrained optimization (Williams)
• Consider the nonlinear optimization problem:
  • Minimize f(x)
  • Subject to gi(x) ≤ 0, x ∈ R^n
• Kuhn-Tucker conditions (assume differentiability of the function and constraints, and qualification of all these constraints):
  • ∃ λi ≥ 0 such that, at the optimum x*, d/dx (f(x) + Σi λi gi(x)) = 0.
• Approximate analysis of optimization problems: abstraction of y ∈ R: y ∈ {-, +, 0, }.

Abstractions for other applications: abstracted constrained optimization (Williams)
• Abstracted Kuhn-Tucker conditions:
  • ∃ λi ≥ 0 such that, at the optimum x*, d/dx (f(x) + Σi λi gi(x)) = 0.
• Approximate analysis of optimization problems: abstraction of y ∈ R: y ∈ {-, +, 0, }.
