Abstract interpretation. Giorgio Levi, Dipartimento di Informatica, Università di Pisa, levi@di.unipi.it, http://www.di.unipi.it/~levi/levi.html
The general idea • a semantics • any definition style, from a denotational definition to a detailed interpreter assigning meanings to programs on a suitable concrete domain (concrete computations domain) • an abstract domain modeling some properties of concrete computations and forgetting about the remaining information (abstract computations domain) • we derive an abstract semantics, which allows us to “execute” the program on the abstract domain to compute its abstract meaning, i.e., the modeled property
Concrete and Abstract Domains • two complete partial orders • the partial orders reflect precision • smaller is better • concrete domain $(\wp(C), \subseteq, \emptyset, C, \cup, \cap)$ • has the structure of a powerset • we will see later why • abstract domain $(A, \le, \mathrm{bottom}, \mathrm{top}, \mathrm{lub}, \mathrm{glb})$ • each abstract value is a description of “a set of” concrete values
Concretization • concrete domain $(\wp(C), \subseteq, \emptyset, C, \cup, \cap)$ • abstract domain $(A, \le, \mathrm{bottom}, \mathrm{top}, \mathrm{lub}, \mathrm{glb})$ • the meaning of abstract values is defined by a concretization function $\gamma : A \to \wp(C)$ • $\forall a \in A$, $\gamma(a)$ is the set of concrete computations described by $a$ • that’s why the concrete domain needs to be a powerset • the concretization function must be monotonic: $\forall a_1, a_2 \in A$, $a_1 \le a_2$ implies $\gamma(a_1) \subseteq \gamma(a_2)$ • concretization preserves relative precision
Abstraction • concrete domain $(\wp(C), \subseteq, \emptyset, C, \cup, \cap)$ • abstract domain $(A, \le, \mathrm{bottom}, \mathrm{top}, \mathrm{lub}, \mathrm{glb})$ • every element of $\wp(C)$ should have a unique “best” (most precise) description in $A$ • this is possible if and only if $A$ is a Moore family • closed under glb • in such a case, we can define an abstraction function $\alpha : \wp(C) \to A$ • $\forall c \in \wp(C)$, $\alpha(c)$ is the best abstract description of $c$ • the abstraction function must be monotonic: $\forall c_1, c_2 \in \wp(C)$, $c_1 \subseteq c_2$ implies $\alpha(c_1) \le \alpha(c_2)$ • abstraction preserves relative precision
Galois connection • $(\wp(C), \subseteq, \emptyset, C, \cup, \cap)$ • $(A, \le, \mathrm{bottom}, \mathrm{top}, \mathrm{lub}, \mathrm{glb})$ • $\gamma : A \to \wp(C)$ (concretization) • $\alpha : \wp(C) \to A$ (abstraction) • $\alpha$, $\gamma$ monotonic • there may be loss of information (approximation) in describing an element of $\wp(C)$ by an element of $A$ • Galois connection: $\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$ and $\forall y \in A.\ \alpha(\gamma(y)) \le y$ (Galois insertion: $\forall y \in A.\ \alpha(\gamma(y)) = y$) • $\alpha$, $\gamma$ mutually determine each other
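The remark that the two functions determine each other can be spelled out. For monotonic functions the two inequalities of a Galois connection are equivalent to the adjunction property in the first line below, from which each function can be recovered from the other. These are standard facts about Galois connections on complete lattices, stated here in the notation of the slides:

```latex
\forall x \in \wp(C),\ \forall y \in A:\quad
  \alpha(x) \le y \iff x \subseteq \gamma(y)

\alpha(x) = \mathrm{glb}\,\{\, y \in A \mid x \subseteq \gamma(y) \,\}
\qquad
\gamma(y) = \bigcup\,\{\, x \in \wp(C) \mid \alpha(x) \le y \,\}
```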
Concrete semantics • the concrete semantics is defined as the least (or greatest) fixpoint of a concrete semantic evaluation function $F$ defined on the domain $C$ • this does not necessarily mean that the semantic definition style is denotational! • $F$ is defined in terms of primitive semantic operations $f_i$ on $C$ • the abstract semantic evaluation function is obtained by replacing in $F$ each concrete operation $f_i$ by a suitable abstract operation • however, since the actual concrete domain is $\wp(C)$, we first need to lift the concrete semantics $\mathrm{lfp}\,F$ to a collecting semantics defined on $\wp(C)$
Collecting semantics • lifting $\mathrm{lfp}\,F$ to the powerset (to get the collecting semantics) is simply a conceptual operation • collecting semantics = $\{\mathrm{lfp}\,F\}$ • we don’t need to define a brand new collecting semantic evaluation function $F_c$ on $\wp(C)$ • we just need to reason in terms of liftings of all the primitive operations (and of $F$), while designing the abstract operations and establishing their properties • in the following, by abuse of notation, we will use the same notation for the standard and the collecting (“conceptually” lifted) operations
Abstract operations: local correctness • an abstract operator $f_i^\alpha$ defined on $A$ is locally correct wrt a concrete operator $f_i$ if $\forall x_1,\ldots,x_n \in \wp(C).\ f_i(x_1,\ldots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\ldots,\alpha(x_n)))$ • the concrete computation step is more precise than the concretization of the “corresponding” abstract computation step • a very weak requirement, which is satisfied, for example, by an abstract operator which always computes the worst abstract value top • the real issue in the design of abstract operations is therefore precision
Abstract operations: optimality and completeness • correctness: $\forall x_1,\ldots,x_n \in \wp(C).\ f_i(x_1,\ldots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\ldots,\alpha(x_n)))$ • optimality: $\forall y_1,\ldots,y_n \in A.\ f_i^\alpha(y_1,\ldots,y_n) = \alpha(f_i(\gamma(y_1),\ldots,\gamma(y_n)))$ • the most precise abstract operator $f_i^\alpha$ correct wrt $f_i$ • a theoretical bound and basis for the design, rather than an implementable definition • completeness (exactness or absolute precision): $\forall x_1,\ldots,x_n \in \wp(C).\ \alpha(f_i(x_1,\ldots,x_n)) = f_i^\alpha(\alpha(x_1),\ldots,\alpha(x_n))$ • no loss of information: the abstraction of the concrete computation step is exactly the same as the result of the corresponding abstract computation step
From local to global correctness • the composition of locally correct abstract operations is locally correct wrt the composition of concrete operations • composition does not preserve optimality, i.e., the composition of optimal operators may be less precise than the optimal abstract version of the composition • if we obtain $F^\alpha$ (the abstract semantic evaluation function) by replacing in $F$ every concrete semantic operation by a corresponding (locally correct) abstract operation, the local correctness property still holds: $\forall x \in \wp(C).\ F(x) \subseteq \gamma(F^\alpha(\alpha(x)))$ • local correctness implies global correctness, i.e., correctness of the abstract semantics wrt the concrete one: $\mathrm{lfp}\,F \subseteq \gamma(\mathrm{lfp}\,F^\alpha)$, $\mathrm{gfp}\,F \subseteq \gamma(\mathrm{gfp}\,F^\alpha)$, $\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha$, $\alpha(\mathrm{gfp}\,F) \le \mathrm{gfp}\,F^\alpha$ • the abstraction of the concrete semantics is more precise than the abstract semantics
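The difference between correctness and completeness can be checked on a small example. The sketch below is only an illustration: it assumes a sign domain over integers (constructor names borrowed from the Sign domain used later in the slides), a hypothetical abstract addition `plus` that is not part of the slides' language, and it tests the properties on singleton sets of a few sample integers rather than on all of $\wp(Z)$.

```ocaml
(* Sign lattice restricted to integers; smaller is more precise. *)
type sign = Bottom | Zero | P | M | Zerop | Zerom | Top

let leq a b =
  a = b || a = Bottom || b = Top
  || (a = Zero && (b = Zerop || b = Zerom))
  || (a = P && b = Zerop)
  || (a = M && b = Zerom)

(* Abstraction of a single integer (i.e. of a singleton set). *)
let alpha n = if n = 0 then Zero else if n > 0 then P else M

(* Hypothetical abstract addition, meant to be as precise as Sign allows. *)
let plus a b =
  match (a, b) with
  | Bottom, _ | _, Bottom -> Bottom
  | Zero, x | x, Zero -> x
  | P, P | P, Zerop | Zerop, P -> P
  | M, M | M, Zerom | Zerom, M -> M
  | Zerop, Zerop -> Zerop
  | Zerom, Zerom -> Zerom
  | _ -> Top

(* The trivially correct operator mentioned in the slides: always top. *)
let plus_top _ _ = Top

let samples = [ -2; -1; 0; 1; 2 ]

let for_all_pairs p =
  List.for_all (fun x -> List.for_all (fun y -> p x y) samples) samples

(* Local correctness on singletons: alpha(x + y) <= op(alpha x, alpha y). *)
let correct op = for_all_pairs (fun x y -> leq (alpha (x + y)) (op (alpha x) (alpha y)))

(* Completeness on singletons: alpha(x + y) = op(alpha x, alpha y). *)
let complete op = for_all_pairs (fun x y -> alpha (x + y) = op (alpha x) (alpha y))
```

On these samples both `plus` and `plus_top` pass the correctness check, but `plus` fails the completeness check: `alpha (1 + (-1))` is `Zero`, while `plus P M` can only answer `Top`.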
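The step from local to global correctness for least fixpoints can be sketched as follows, assuming that $F^\alpha$ and $\gamma$ are monotonic and writing $a$ for the abstract least fixpoint:

```latex
\text{Let } a = \mathrm{lfp}\,F^\alpha. \text{ Then}
\quad
F(\gamma(a))
  \;\subseteq\; \gamma\bigl(F^\alpha(\alpha(\gamma(a)))\bigr)
  \;\subseteq\; \gamma\bigl(F^\alpha(a)\bigr)
  \;=\; \gamma(a)

\text{(local correctness, then } \alpha(\gamma(a)) \le a \text{ with monotonicity of } F^\alpha \text{ and } \gamma\text{).}

\text{So } \gamma(a) \text{ is a pre-fixpoint of } F, \text{ hence }
\mathrm{lfp}\,F \subseteq \gamma(\mathrm{lfp}\,F^\alpha),
\text{ and by the Galois connection }
\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha .
```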
$\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha$: why compute $\mathrm{lfp}\,F^\alpha$? • $\mathrm{lfp}\,F$ cannot be computed in finitely many steps • $\omega$ steps are in general required • $\mathrm{lfp}\,F^\alpha$ can be computed in finitely many steps, if the abstract domain is finite or at least noetherian • does not contain infinite increasing chains • interesting for static program analysis, where the fixpoint computation must terminate • most program properties considered in static analysis are undecidable • we accept a loss of precision (safe approximation) in order to make the analysis feasible
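On a finite abstract domain the least fixpoint can be reached by plain iteration from bottom. The following is a minimal sketch of that idea, not the slides' own code: it assumes a sign domain over integers, a hypothetical abstract successor `succ`, and an abstract function `f` that models a counter initialised to 0 and repeatedly incremented.

```ocaml
type sign = Bottom | Zero | P | M | Zerop | Zerom | Top

let all = [ Bottom; Zero; P; M; Zerop; Zerom; Top ]

let leq a b =
  a = b || a = Bottom || b = Top
  || (a = Zero && (b = Zerop || b = Zerom))
  || (a = P && b = Zerop)
  || (a = M && b = Zerom)

(* Least upper bound, found by searching the (finite) lattice. *)
let lub a b =
  let ubs = List.filter (fun u -> leq a u && leq b u) all in
  List.find (fun u -> List.for_all (leq u) ubs) ubs

(* Hypothetical abstract successor: the sign of x + 1 given the sign of x. *)
let succ = function
  | Bottom -> Bottom
  | Zero | P | Zerop -> P
  | M -> Zerom
  | Zerom | Top -> Top

(* Abstract step for "x starts at 0 and is incremented in a loop". *)
let f x = lub Zero (succ x)

(* Kleene iteration: Bottom, f Bottom, f (f Bottom), ... until stable.
   Returns the fixpoint and the number of steps that changed the value. *)
let lfp f =
  let rec iterate x steps =
    let x' = f x in
    if x' = x then (x, steps) else iterate x' (steps + 1)
  in
  iterate Bottom 0

let result = lfp f
```

The chain here is `Bottom`, `Zero`, `Zerop`, and the iteration stops because the domain has no infinite increasing chains. The same loop over sets of integers would never stabilise.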
Applications • comparative semantics • a technique to reason about semantics at different levels of abstraction • non-noetherian abstract domain • abstraction without approximation (completeness): $\alpha(\mathrm{lfp}\,F) = \mathrm{lfp}\,F^\alpha$ • static analysis = effective computation of the abstract semantics • if the abstract domain is noetherian and the abstract operations are computationally feasible • if the abstract domain is non-noetherian or if the fixpoint computation is too complex • use widening operators • which effectively compute an (upper) approximation of $\mathrm{lfp}\,F^\alpha$ • one example later
The abstract interpretation framework • $(\wp(C), \subseteq, \emptyset, C, \cup, \cap)$ (concrete domain) • $(A, \le, \mathrm{bottom}, \mathrm{top}, \mathrm{lub}, \mathrm{glb})$ (abstract domain) • $\gamma : A \to \wp(C)$ monotonic (concretization function) • $\alpha : \wp(C) \to A$ monotonic (abstraction function) • $\forall x \in \wp(C).\ x \subseteq \gamma(\alpha(x))$, $\forall y \in A.\ \alpha(\gamma(y)) \le y$ (Galois connection) • $\forall f_i\ \exists f_i^\alpha \mid \forall x_1,\ldots,x_n \in \wp(C).\ f_i(x_1,\ldots,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),\ldots,\alpha(x_n)))$ (local correctness) • critical choices • the abstract domain to model the property • the (possibly optimal) correct abstract operations
Other approaches and extensions • there exist weaker versions of abstract interpretation • without Galois connections (e.g., concretization function only) • based on approximation operators (widening, narrowing) • without explicit abstract domain (closure operators) • the theory also provides several results on abstract domain design • how to combine domains • how to improve the precision of a domain • how to transform an abstract domain into a complete one • … • we will look at some of these results in the last lecture
A simple abstract interpreter computing Signs • concrete semantics • executable specification (in ML) of the denotational semantics of untyped λ-calculus without recursion • abstract semantics • abstract interpreter computing on the domain Sign
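To give a flavour of widening before the example promised in the slides, here is a minimal sketch on an interval domain. Everything in it is an assumption made for illustration: the interval domain itself, the use of floats so that `infinity` and `neg_infinity` can serve as unbounded ends, and the abstract step `step`, which models a counter initialised to 0 and incremented in a loop.

```ocaml
(* Intervals with possibly infinite bounds; Empty is the bottom element. *)
type itv = Empty | Itv of float * float

let join a b =
  match (a, b) with
  | Empty, x | x, Empty -> x
  | Itv (l1, h1), Itv (l2, h2) -> Itv (min l1 l2, max h1 h2)

(* Widening: any bound that is still moving is pushed to infinity. *)
let widen a b =
  match (a, b) with
  | Empty, x | x, Empty -> x
  | Itv (l1, h1), Itv (l2, h2) ->
      Itv ((if l2 < l1 then neg_infinity else l1),
           (if h2 > h1 then infinity else h1))

(* Abstract step: the counter is either 0 or a previous value plus 1. *)
let step x =
  let incremented =
    match x with
    | Empty -> Empty
    | Itv (l, h) -> Itv (l +. 1., h +. 1.)
  in
  join (Itv (0., 0.)) incremented

(* Iteration with widening: x_{n+1} = x_n widen (step x_n). *)
let rec iterate x =
  let x' = widen x (step x) in
  if x' = x then x else iterate x'

let result = iterate Empty
```

Without `widen` the iterates would be [0,0], [0,1], [0,2], ... and never stabilise, since the interval domain is not noetherian. With it the computation jumps to [0, +∞) after two steps, which is an upper approximation of the abstract least fixpoint (here it happens to coincide with it).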
The language: syntax
  type ide = Id of string
  type exp =
    | Eint of int
    | Var of ide
    | Times of exp * exp
    | Ifthenelse of exp * exp * exp
    | Fun of ide * exp
    | Appl of exp * exp
A program
  Fun(Id "x",
      Ifthenelse(Var (Id "x"),
                 Times (Var (Id "x"), Var (Id "x")),
                 Times (Var (Id "x"), Eint (-1))))
• the ML expression
  function x -> if x=0 then x * x else x * (-1)
Concrete semantics • denotational interpreter • eager semantics • separation from the main semantic evaluation function of the primitive operations • which will then be replaced by their abstract versions • abstraction of concrete values • identity function in the concrete semantics • symbolic “non-deterministic” semantics of the conditional
Semantic domains
  type eval =
    | Funval of (eval -> eval)
    | Int of int
    | Wrong
  let alfa x = x
  type env = ide -> eval
  let emptyenv (x: ide) = alfa(Wrong)
  let applyenv ((x: env), (y: ide)) = x y
  let bind ((r:env), (l:ide), (e:eval)) (lu:ide) = if lu = l then e else r(lu)
Semantic evaluation function
  let rec sem (e:exp) (r:env) = match e with
    | Eint(n) -> alfa(Int(n))
    | Var(i) -> applyenv(r,i)
    | Times(a,b) -> times ( (sem a r), (sem b r))
    | Ifthenelse(a,b,c) ->
        let a1 = sem a r in
        (if valid(a1) then sem b r
         else (if unsatisfiable(a1) then sem c r
               else merge(a1,sem b r,sem c r)))
    | Fun(ii,aa) -> makefun(ii,aa,r)
    | Appl(a,b) -> applyfun(sem a r, sem b r)
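Primitive operations
  let times (x,y) = match (x,y) with
    | (Int nx, Int ny) -> Int (nx * ny)
    | _ -> alfa(Wrong)
  (* the guard is valid when it evaluates to 0; non-integers are never valid *)
  let valid x = match x with
    | Int n -> n=0
    | _ -> false
  (* the guard is unsatisfiable when it evaluates to a non-zero integer *)
  let unsatisfiable x = match x with
    | Int n -> if n=0 then false else true
    | _ -> false
  let merge (a,b,c) = match a with
    | Int n -> if b=c then b else alfa(Wrong)
    | _ -> alfa(Wrong)
  let applyfun ((x:eval),(y:eval)) = match x with
    | Funval f -> f y
    | _ -> alfa(Wrong)
  (* makefun calls sem and sem calls makefun: in a single file the two
     must be defined together (let rec sem ... and makefun ...) *)
  let rec makefun(ii,aa,r) =
    Funval(function d ->
      if d = alfa(Wrong) then alfa(Wrong)
      else sem aa (bind(r,ii,d)))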
From the concrete to the collecting semantics • the concrete semantic evaluation function • sem: exp -> env -> eval • the collecting semantic evaluation function • semc: exp -> env -> $\wp$(eval) • semc e r = {sem e r} • all the concrete primitive operations have to be lifted to $\wp$(eval) in the design of the abstract operations
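The lifting is conceptual in the slides, but it can be made concrete on small finite sets. The sketch below is an illustration under assumptions: sets are represented as sorted duplicate-free lists, values are plain integers rather than `eval`, and the names `lift1`, `lift2` and `body` are hypothetical.

```ocaml
(* A set is a sorted list without duplicates. *)
let set_of_list xs = List.sort_uniq compare xs

(* Lift a unary operation pointwise to sets. *)
let lift1 f xs = set_of_list (List.map f xs)

(* Lift a binary operation (on pairs, as in the slides) to pairs of sets. *)
let lift2 f xs ys =
  set_of_list (List.concat (List.map (fun x -> List.map (fun y -> f (x, y)) ys) xs))

(* The body of the example program, on plain integers. *)
let body x = if x = 0 then x * x else x * (-1)

(* Concrete multiplication on pairs, mirroring the shape of times. *)
let times (x, y) = x * y
```

For instance `lift1 body [0; 1]` yields the set {-1, 0} and `lift2 times [1; 2] [-1; 0]` yields {-2, -1, 0}.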
Example of concrete evaluation
  # let esempio = sem(
      Fun (Id "x",
           Ifthenelse (Var (Id "x"),
                       Times (Var (Id "x"), Var (Id "x")),
                       Times (Var (Id "x"), Eint (-1)))) ) emptyenv;;
  val esempio : eval = Funval <fun>
  # applyfun(esempio,Int 0);;
  - : eval = Int 0
  # applyfun(esempio,Int 1);;
  - : eval = Int -1
  # applyfun(esempio,Int(-1));;
  - : eval = Int 1
• in the “virtual” collecting version
  applyfunc(esempio,{Int 0,Int 1}) = {Int 0, Int -1}
  applyfunc(esempio,{Int 0,Int -1}) = {Int 0, Int 1}
  applyfunc(esempio,{Int -1,Int 1}) = {Int 1, Int -1}
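The fragments of the concrete interpreter are shown on separate slides and in an order that does not compile as a single file. The following sketch assembles them into one self-contained program. It follows the slides' definitions, with three adjustments that are assumptions of this sketch: `sem` and `makefun` are tied together with `let rec ... and ...`, `valid` and `unsatisfiable` get a catch-all case, and `makefun` tests for `Wrong` by pattern matching instead of structural equality.

```ocaml
type ide = Id of string

type exp =
  | Eint of int
  | Var of ide
  | Times of exp * exp
  | Ifthenelse of exp * exp * exp
  | Fun of ide * exp
  | Appl of exp * exp

type eval = Funval of (eval -> eval) | Int of int | Wrong
type env = ide -> eval

(* In the concrete semantics the abstraction is the identity. *)
let alfa x = x
let emptyenv (_ : ide) : eval = alfa Wrong
let applyenv ((r : env), (y : ide)) = r y
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

let times (x, y) =
  match (x, y) with
  | Int nx, Int ny -> Int (nx * ny)
  | _ -> alfa Wrong

let valid x = match x with Int n -> n = 0 | _ -> false
let unsatisfiable x = match x with Int n -> n <> 0 | _ -> false

let merge (a, b, c) =
  match a with
  | Int _ -> if b = c then b else alfa Wrong
  | _ -> alfa Wrong

let applyfun ((x : eval), (y : eval)) =
  match x with
  | Funval f -> f y
  | _ -> alfa Wrong

let rec sem (e : exp) (r : env) : eval =
  match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
      let a1 = sem a r in
      if valid a1 then sem b r
      else if unsatisfiable a1 then sem c r
      else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)

(* Eager semantics: a Wrong argument makes the application Wrong. *)
and makefun (ii, aa, r) =
  Funval (fun d -> match d with Wrong -> Wrong | _ -> sem aa (bind (r, ii, d)))

let esempio =
  sem
    (Fun (Id "x",
          Ifthenelse (Var (Id "x"),
                      Times (Var (Id "x"), Var (Id "x")),
                      Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

Applying `esempio` to `Int 0`, `Int 1` and `Int (-1)` gives the three results of the session above.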
From the collecting to the abstract semantics • concrete domain: ($\wp$(ceval), $\subseteq$) • concrete (non-collecting) environment: • cenv = ide -> ceval • abstract domain: (eval, $\le$) • abstract environment: env = ide -> eval • the collecting semantic evaluation function • semc: exp -> env -> $\wp$(ceval) • the abstract semantic evaluation function • sem: exp -> env -> eval
The Sign Abstract Domain • concrete domain ($\wp(Z)$, $\subseteq$): sets of integers • abstract domain (Sign, $\le$)
Redefining eval for Sign
  type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
  type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M
  let alfa x = match x with
    | Wrong -> Top
    | Int n -> if n = 0 then Zero else if n > 0 then P else M
• the partial order relation $\le$ • the relation shown in the Sign lattice, extended with its lifting to functions • there exist no infinite increasing chains • we might add a recursive function construct and find a way to compute the abstract least fixpoint in a finite number of steps • lub and glb of eval are the obvious ones • concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$) • abstract domain: (eval, $\le$, Bottom, Top, lub, glb)
Concretization function • concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$) • abstract domain: (eval, $\le$, Bottom, Top, lub, glb) • gs(x) =
  {}, if x = Bottom
  {Int(y) | y > 0}, if x = P
  {Int(y) | y $\ge$ 0}, if x = Zerop
  {Int(0)}, if x = Zero
  {Int(y) | y $\le$ 0}, if x = Zerom
  {Int(y) | y < 0}, if x = M
  ceval, if x = Top
  {Funval(g) | $\forall$y $\in$ eval, $\forall$x $\in$ gs(y), g(x) $\in$ gs(f(y))}, if x = Afunval(f)
Abstraction function • concrete domain: ($\wp$(ceval), $\subseteq$, {}, ceval, $\cup$, $\cap$) • abstract domain: (eval, $\le$, Bottom, Top, lub, glb) • as(y) = glb{
  Bottom, if y = {}
  M, if y $\subseteq$ {Int(z) | z < 0}
  Zerom, if y $\subseteq$ {Int(z) | z $\le$ 0}
  Zero, if y = {Int(0)}
  Zerop, if y $\subseteq$ {Int(z) | z $\ge$ 0}
  P, if y $\subseteq$ {Int(z) | z > 0}
  Top, if y $\subseteq$ ceval
  lub{Afunval(f) | Funval(g) $\in$ gs(Afunval(f))}, if y $\subseteq$ {Funval(g)} & Funval(g) $\in$ y }
Galois connection • as and gs • are monotonic • define a Galois connection
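These properties can be checked mechanically on a finite stand-in for the integers. The sketch below is an illustration under assumptions: it covers only the integer part of Sign (no `Wrong` and no functions), it replaces Z by the universe {-2, ..., 2}, and it computes the abstraction as the least abstract value whose concretization contains the given set, in the spirit of the glb in the definition of as.

```ocaml
type sign = Bottom | Zero | P | M | Zerop | Zerom | Top

let all = [ Bottom; Zero; P; M; Zerop; Zerom; Top ]

let leq a b =
  a = b || a = Bottom || b = Top
  || (a = Zero && (b = Zerop || b = Zerom))
  || (a = P && b = Zerop)
  || (a = M && b = Zerom)

(* Finite universe standing in for Z. *)
let universe = [ -2; -1; 0; 1; 2 ]
let subset xs ys = List.for_all (fun x -> List.mem x ys) xs

(* Concretization, restricted to the universe. *)
let gamma = function
  | Bottom -> []
  | Zero -> [ 0 ]
  | P -> List.filter (fun n -> n > 0) universe
  | M -> List.filter (fun n -> n < 0) universe
  | Zerop -> List.filter (fun n -> n >= 0) universe
  | Zerom -> List.filter (fun n -> n <= 0) universe
  | Top -> universe

(* Abstraction: the least sign whose concretization contains xs. *)
let alpha xs =
  let above = List.filter (fun a -> subset xs (gamma a)) all in
  List.find (fun a -> List.for_all (leq a) above) above

(* All subsets of a list. *)
let rec subsets = function
  | [] -> [ [] ]
  | x :: rest ->
      let s = subsets rest in
      s @ List.map (fun t -> x :: t) s

let powerset = subsets universe

(* x is included in gamma(alpha(x)) for every set x. *)
let extensive = List.for_all (fun xs -> subset xs (gamma (alpha xs))) powerset

(* alpha(gamma(y)) = y for every sign y (Galois insertion). *)
let insertion = List.for_all (fun a -> alpha (gamma a) = a) all

let alpha_monotone =
  List.for_all
    (fun xs ->
      List.for_all (fun ys -> (not (subset xs ys)) || leq (alpha xs) (alpha ys)) powerset)
    powerset

let gamma_monotone =
  List.for_all
    (fun a -> List.for_all (fun b -> (not (leq a b)) || subset (gamma a) (gamma b)) all)
    all
```

All four checks succeed on this universe. The set {-1, 1} shows the loss of information: its abstraction is `Top`, whose concretization is the whole universe.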
Times Sign • optimal (hence correct) and complete (no approximation)
Abstract operations • in addition to times and lub
  let valid x = match x with
    | Zero -> true
    | _ -> false
  let unsatisfiable x = match x with
    | M -> true
    | P -> true
    | _ -> false
  let merge (a,b,c) = match a with
    | Afunval(_) -> Top
    | _ -> lub(b,c)
  let applyfun ((x:eval),(y:eval)) = match x with
    | Afunval f -> f y
    | _ -> alfa(Wrong)
  let rec makefun(ii,aa,r) =
    Afunval(function d ->
      if d = alfa(Wrong) then d
      else sem aa (bind(r,ii,d)))
• sem is left unchanged
An example of abstract evaluation
  # let esempio = sem(
      Fun (Id "x",
           Ifthenelse (Var (Id "x"),
                       Times (Var (Id "x"), Var (Id "x")),
                       Times (Var (Id "x"), Eint (-1)))) ) emptyenv;;
  val esempio : eval = Afunval <fun>
• the “virtual” collecting version, for comparison
  applyfunc(esempio,{Int 0,Int 1}) = {Int 0, Int -1}
  applyfunc(esempio,{Int 0,Int -1}) = {Int 0, Int 1}
  applyfunc(esempio,{Int -1,Int 1}) = {Int 1, Int -1}
• the abstract version
  # applyfun(esempio,P);;
  - : eval = M
  # applyfun(esempio,Zero);;
  - : eval = Zero
  # applyfun(esempio,M);;
  - : eval = P
  # applyfun(esempio,Zerop);;
  - : eval = Top
  # applyfun(esempio,Zerom);;
  - : eval = Zerop
  # applyfun(esempio,Top);;
  - : eval = Top
• wrt the abstraction of the concrete (collecting) semantics, approximation for Zerop • no abstract operations which “invent” the values Zerop and Zerom • which are the only ones on which the conditional takes both ways and can introduce approximation
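The abstract session can be reproduced with a self-contained program. The sketch below reuses the slides' definitions of `valid`, `unsatisfiable`, `merge`, `applyfun` and `sem`. The definitions of `times` and `lub` are assumptions of this sketch, since the slides do not show their code: `Top` absorbs every non-bottom argument of `times` because it also describes `Wrong`, `Bottom` absorbs everything, and `lub` sends any function that is joined with a non-bottom value to `Top`. The extra `Funval` case in `alfa` and the pattern match in `makefun` are likewise additions made here.

```ocaml
type ide = Id of string
type exp =
  | Eint of int | Var of ide | Times of exp * exp
  | Ifthenelse of exp * exp * exp | Fun of ide * exp | Appl of exp * exp

type ceval = Funval of (ceval -> ceval) | Int of int | Wrong
type eval = Afunval of (eval -> eval) | Top | Bottom | Zero | Zerop | Zerom | P | M
type env = ide -> eval

let alfa x =
  match x with
  | Wrong -> Top
  | Int n -> if n = 0 then Zero else if n > 0 then P else M
  | Funval _ -> Top (* assumption: concrete functions are not abstracted here *)

let emptyenv (_ : ide) : eval = alfa Wrong
let applyenv ((r : env), (y : ide)) = r y
let bind ((r : env), (l : ide), (e : eval)) (lu : ide) = if lu = l then e else r lu

(* Assumed least upper bound on the Sign lattice. *)
let lub (a, b) =
  match (a, b) with
  | Bottom, x | x, Bottom -> x
  | Afunval _, _ | _, Afunval _ -> Top
  | x, y when x = y -> x
  | Zero, P | P, Zero | Zero, Zerop | Zerop, Zero | P, Zerop | Zerop, P -> Zerop
  | Zero, M | M, Zero | Zero, Zerom | Zerom, Zero | M, Zerom | Zerom, M -> Zerom
  | _ -> Top

(* Assumed abstract multiplication (rule of signs). *)
let times (x, y) =
  match (x, y) with
  | Bottom, _ | _, Bottom -> Bottom
  | Afunval _, _ | _, Afunval _ | Top, _ | _, Top -> Top
  | Zero, _ | _, Zero -> Zero
  | P, s | s, P -> s
  | M, M -> P
  | M, Zerop | Zerop, M -> Zerom
  | M, Zerom | Zerom, M -> Zerop
  | Zerop, Zerop | Zerom, Zerom -> Zerop
  | Zerop, Zerom | Zerom, Zerop -> Zerom

let valid x = match x with Zero -> true | _ -> false
let unsatisfiable x = match x with M | P -> true | _ -> false
let merge (a, b, c) = match a with Afunval _ -> Top | _ -> lub (b, c)
let applyfun ((x : eval), (y : eval)) =
  match x with Afunval f -> f y | _ -> alfa Wrong

let rec sem (e : exp) (r : env) : eval =
  match e with
  | Eint n -> alfa (Int n)
  | Var i -> applyenv (r, i)
  | Times (a, b) -> times (sem a r, sem b r)
  | Ifthenelse (a, b, c) ->
      let a1 = sem a r in
      if valid a1 then sem b r
      else if unsatisfiable a1 then sem c r
      else merge (a1, sem b r, sem c r)
  | Fun (ii, aa) -> makefun (ii, aa, r)
  | Appl (a, b) -> applyfun (sem a r, sem b r)

and makefun (ii, aa, r) =
  Afunval (fun d -> match d with Top -> Top | _ -> sem aa (bind (r, ii, d)))

let esempio =
  sem
    (Fun (Id "x",
          Ifthenelse (Var (Id "x"),
                      Times (Var (Id "x"), Var (Id "x")),
                      Times (Var (Id "x"), Eint (-1)))))
    emptyenv
```

Under these assumptions the six applications of `esempio` give the same answers as the session above. For `Zerop` the two branches yield `Zerop` and `Zerom`, whose lub is `Top`, while the abstraction of the collecting result would be `Zerom`.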
Recursion • the language has no recursion • fixpoint computations are not needed • if (sets of) functions on the concrete domain are abstracted to functions on the abstract domain, we must be careful in the case of recursive definitions • a naïve solution might cause the application of a recursive abstract function to diverge, even if the domain is finite • we might never get rid of recursion because the guard in the conditional is neither valid nor unsatisfiable • we cannot explicitly compute the fixpoint, because equivalence on functions cannot be expressed • termination can only be obtained by a loop checking mechanism (finitely many different recursive calls) • we will see a different solution in a case where (sets of) functions are abstracted to non-functional values • the explicit fixpoint computation will then be possible