Abstraction and Approximation via Abstract Interpretation:

a systematic approach to program analysis and verification
Giorgio Levi
Dipartimento di Informatica, Università di Pisa
levi@di.unipi.it
http://www.di.unipi.it/~levi.html

Abstraction and approximation (Abstract interpretation)
• two relevant concepts in several areas of computer science (and engineering)
  • to reason about complex systems
  • to make reasoning computationally feasible

Abstract Interpretation (Cousot & Cousot, POPL 77 & 79)
• a 20-year-old technique to systematically handle abstraction and approximation
• born to describe (and prove correct) static analyses (for imperative programs)
• popular mainly in declarative paradigms
• viewed today as a general technique to reason about semantics at different levels of abstraction
• successfully applied to distributed and mobile systems and to model checking
• recently applied to program verification

Abstract Interpretation, Semantics, Analysis Algorithms (Abstract interpretation)
how abstract interpretation is often used in static program analysis
• a semantics
• an analysis algorithm developed by ad-hoc techniques
• the A.I. theory (definition of an abstract domain) is used to prove that the algorithm is correct, i.e., that its results are an approximation of the property to be analyzed

Abstract Interpretation, Semantics, Analysis Algorithms (Abstract interpretation)
the abstract interpretation I like
• a semantics
• an abstract domain designed to model the property to be analyzed
• the A.I. theory is used to systematically derive the abstract semantics
• the analysis algorithm is exactly the computation of the abstract semantics and is correct by construction

Abstract Interpretation Theory in 4 Steps (Abstract interpretation)
• concrete and abstract domain
• the Galois insertion
• abstract operations
• from the concrete to the abstract semantics

Concrete and Abstract Domains (Abstract interpretation)
• two complete partial orders
  • the partial orders reflect precision: smaller is better
• concrete domain $(C, \subseteq)$
  • C has the structure of a powerset
• abstract domain $(A, \le)$
  • each abstract value is a description of “a set of” concrete values

The Sign Abstract Domain (Abstract Domain)
• concrete domain $(P(\mathbb{Z}), \subseteq)$
  • sets of integers
• abstract domain $(Sign, \le)$

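Galois insertions (Galois insertion)
• $(C, \subseteq)$, $(A, \le)$
• $\gamma : A \to C$ (concretization)
• $\alpha : C \to A$ (abstraction)
• $\alpha$, $\gamma$ monotonic
• $\forall x \in C.\ x \subseteq \gamma(\alpha(x))$
• $\forall y \in A.\ \alpha(\gamma(y)) = y$
• $\alpha$, $\gamma$ mutually determine each other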

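The sign example (Galois insertion)
$\gamma_{sign}(x)$ =
• $\emptyset$, if x = bot
• $\{z \mid z > 0\}$, if x = +
• $\{z \mid z \ge 0\}$, if x = 0+
• $\{0\}$, if x = 0
• $\{z \mid z \le 0\}$, if x = 0-
• $\{z \mid z < 0\}$, if x = -
• $\mathbb{Z}$, if x = top
$\alpha_{sign}(y)$ = glb of
• bot, if $y = \emptyset$
• -, if $y \subseteq \{z \mid z < 0\}$
• 0-, if $y \subseteq \{z \mid z \le 0\}$
• 0, if $y = \{0\}$
• 0+, if $y \subseteq \{z \mid z \ge 0\}$
• +, if $y \subseteq \{z \mid z > 0\}$
• top, if $y \subseteq \mathbb{Z}$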
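The definitions of $\gamma_{sign}$ and $\alpha_{sign}$ can be tried out with a minimal Python sketch. The representation is an assumption made for illustration only: abstract values are strings, $\gamma_{sign}$ is given by membership predicates (the sets are infinite), and the names `SIGN`, `GAMMA`, `UP`, `leq`, `alpha` and `sample` are hypothetical, not from the slides.

```python
# Hypothetical encoding of the Sign domain; all names are illustrative.
SIGN = ["bot", "-", "0", "+", "0-", "0+", "top"]

# gamma_sign as membership predicates, since the concrete sets are infinite.
GAMMA = {
    "bot": lambda n: False,
    "-": lambda n: n < 0,
    "0": lambda n: n == 0,
    "+": lambda n: n > 0,
    "0-": lambda n: n <= 0,
    "0+": lambda n: n >= 0,
    "top": lambda n: True,
}

# UP[a] lists every abstract value b with a <= b (smaller is more precise).
UP = {
    "bot": set(SIGN),
    "-": {"-", "0-", "top"},
    "0": {"0", "0-", "0+", "top"},
    "+": {"+", "0+", "top"},
    "0-": {"0-", "top"},
    "0+": {"0+", "top"},
    "top": {"top"},
}


def leq(a, b):
    """Abstract partial order on Sign."""
    return b in UP[a]


def alpha(s):
    """alpha_sign of a finite set of integers: the glb (here the least
    element) of all abstract values whose concretization contains s."""
    candidates = [a for a in SIGN if all(GAMMA[a](n) for n in s)]
    return next(a for a in candidates if all(leq(a, b) for b in candidates))


def sample(a, window=range(-3, 4)):
    """Finite sample of gamma_sign(a); the window is an assumption."""
    return {n for n in window if GAMMA[a](n)}


print(alpha({2}), alpha({0, 5}), alpha({-1, 1}), alpha(set()))
```

On the sampled window, `alpha(sample(a)) == a` holds for every element of Sign, which mirrors the Galois insertion condition $\alpha(\gamma(y)) = y$.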

Abstract Operations (Abstract operations)
• the concrete semantic evaluation function is defined in terms of primitive semantic operations $f_i$ on C
• for each $f_i$ we need to provide a corresponding $f_i^\alpha$ defined on A
• $f_i^\alpha$ must be locally correct, i.e., $\forall x_1,..,x_n \in C.\ f_i(x_1,..,x_n) \subseteq \gamma(f_i^\alpha(\alpha(x_1),..,\alpha(x_n)))$
• the optimal (most precise) abstract operator is $f_i^\alpha(y_1,..,y_n) = \alpha(f_i(\gamma(y_1),..,\gamma(y_n)))$
• the operator is complete (precise) if $\forall x_1,..,x_n \in C.\ \alpha(f_i(x_1,..,x_n)) = f_i^\alpha(\alpha(x_1),..,\alpha(x_n))$

$Times_{sign}$ (Abstract operations)
[operation table for $Times_{sign}$]

$Plus_{sign}$ (Abstract operations)
[operation table for $Plus_{sign}$]

The Sign example (Abstract operations)
• Times and Plus are the usual operations lifted to $P(\mathbb{Z})$
• both $Times_{sign}$ and $Plus_{sign}$ are optimal (hence correct)
• $Times_{sign}$ is also complete (no approximation)
• $Plus_{sign}$ is necessarily incomplete
• $\alpha_{sign}(Times(\{2\},\{-3\})) = Times_{sign}(\alpha_{sign}(\{2\}), \alpha_{sign}(\{-3\}))$
• $\alpha_{sign}(Plus(\{2\},\{-3\})) < Plus_{sign}(\alpha_{sign}(\{2\}), \alpha_{sign}(\{-3\}))$
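The optimal operators and the completeness claims for Sign can be checked with a short Python sketch built on the formula $f^\alpha(y_1,y_2) = \alpha(f(\gamma(y_1),\gamma(y_2)))$. Two things here are assumptions for illustration: $\gamma_{sign}$ is replaced by a finite sample taken from the window -2..2 (enough to expose every sign combination, but not the real infinite set), and all function names are hypothetical.

```python
# Hypothetical encoding; gamma is sampled on a finite window (an assumption).
SIGN = ["bot", "-", "0", "+", "0-", "0+", "top"]
GAMMA = {
    "bot": lambda n: False,
    "-": lambda n: n < 0,
    "0": lambda n: n == 0,
    "+": lambda n: n > 0,
    "0-": lambda n: n <= 0,
    "0+": lambda n: n >= 0,
    "top": lambda n: True,
}
UP = {
    "bot": set(SIGN),
    "-": {"-", "0-", "top"},
    "0": {"0", "0-", "0+", "top"},
    "+": {"+", "0+", "top"},
    "0-": {"0-", "top"},
    "0+": {"0+", "top"},
    "top": {"top"},
}
WINDOW = range(-2, 3)


def leq(a, b):
    return b in UP[a]


def alpha(s):
    """Least abstract value whose concretization contains the finite set s."""
    candidates = [a for a in SIGN if all(GAMMA[a](n) for n in s)]
    return next(a for a in candidates if all(leq(a, b) for b in candidates))


def sample(a):
    """Finite stand-in for gamma_sign(a)."""
    return {n for n in WINDOW if GAMMA[a](n)}


def lift(op):
    """Lift a binary operation on integers to sets of integers."""
    return lambda xs, ys: {op(x, y) for x in xs for y in ys}


def best(f):
    """Optimal abstract operator alpha(f(gamma(y1), gamma(y2))), sampled."""
    return lambda a, b: alpha(f(sample(a), sample(b)))


times, plus = lift(lambda x, y: x * y), lift(lambda x, y: x + y)
times_sign, plus_sign = best(times), best(plus)

# Completeness of Times on the slide's example: both sides agree.
print(alpha(times({2}, {-3})), times_sign(alpha({2}), alpha({-3})))
# Incompleteness of Plus: the abstract result is strictly less precise.
print(alpha(plus({2}, {-3})), plus_sign(alpha({2}), alpha({-3})))
```

The second `print` shows the loss of precision: the abstraction of the concrete sum is `-`, while adding the abstract values `+` and `-` can only yield `top`.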

The Abstract Semantics (Abstract semantics)
• F = concrete semantic evaluation function
  • if we start from a standard semantic definition, the lifting to the powerset (collecting semantics) is simply a conceptual operation
• $\mathrm{lfp}\,F$ = concrete semantics
• $F^\alpha$ = abstract semantic evaluation function
  • obtained by replacing in F every concrete semantic operation by a corresponding (locally correct) abstract operation
• $\mathrm{lfp}\,F^\alpha$ = abstract semantics
• global correctness: $\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha$
  • the abstract semantics is less precise than the abstraction of the concrete semantics

Where does the approximation come from? (Abstract semantics)
• incomplete abstract operations
• more execution paths in the abstract control flow
  • the abstract state has not enough information to make deterministic choices
  • conditionals, pattern matching, etc.
  • the set of resulting abstract states is turned into a single abstract state, by performing an abstract lub operation

Approximation in abstract Sign computations (Abstract semantics)
• concrete computation
  concrete state [x={3}]
  if x>2 then y:=3 else y:=-5;
  concrete state [x={3}, y={3}]
• abstract computation
  abstract state [x=+]
  if x>2 then y:=3 else y:=-5;
  • the abstract guard “can be both true and false”
  • both paths need to be abstractly evaluated
  • the two resulting abstract states are merged by performing a lub in Sign
  abstract state [x=+,y=top]
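The abstract evaluation of this conditional can be replayed with a small Python sketch. It is hard-wired to the statement `if x>2 then y:=3 else y:=-5` and is only an illustration: the encoding of Sign and the helper names `guard_gt2`, `abs_const` and `abstract_if` are hypothetical.

```python
# Hypothetical encoding of Sign; UP[a] lists every b with a <= b.
SIGN = ["bot", "-", "0", "+", "0-", "0+", "top"]
UP = {
    "bot": set(SIGN),
    "-": {"-", "0-", "top"},
    "0": {"0", "0-", "0+", "top"},
    "+": {"+", "0+", "top"},
    "0-": {"0-", "top"},
    "0+": {"0+", "top"},
    "top": {"top"},
}


def leq(a, b):
    return b in UP[a]


def lub(a, b):
    """Least upper bound in Sign."""
    uppers = [c for c in SIGN if leq(a, c) and leq(b, c)]
    return next(c for c in uppers if all(leq(c, d) for d in uppers))


def guard_gt2(x):
    """Truth values that `x > 2` may take when only the sign of x is known."""
    if x == "bot":
        return set()
    if x in ("-", "0", "0-"):
        return {False}
    return {True, False}  # a positive x may or may not exceed 2


def abs_const(n):
    """Abstraction of an integer constant."""
    return "+" if n > 0 else "-" if n < 0 else "0"


def abstract_if(state):
    """Abstract semantics of: if x>2 then y:=3 else y:=-5."""
    outcomes = guard_gt2(state["x"])
    branches = []
    if True in outcomes:
        branches.append({**state, "y": abs_const(3)})
    if False in outcomes:
        branches.append({**state, "y": abs_const(-5)})
    if not branches:  # unreachable abstract state
        return {**state, "y": "bot"}
    merged = branches[0]
    for other in branches[1:]:  # merge the paths with a pointwise lub
        merged = {v: lub(merged[v], other[v]) for v in merged}
    return merged


print(abstract_if({"x": "+"}))
```

Starting from `x=+`, both branches are taken and the merge yields `y=top`, as on the slide; starting from `x=-`, only the else branch is taken and `y` stays `-`.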

$\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha$: why computing $\mathrm{lfp}\,F^\alpha$? (Abstract interpretation)
• $\mathrm{lfp}\,F$ cannot be computed in finitely many steps
  • $\omega$ steps are in general required
• $\mathrm{lfp}\,F^\alpha$ can be computed in finitely many steps, if the abstract domain is finite or at least noetherian
  • no infinite increasing chains
• static analysis 1
  • noetherian abstract domain
  • termination, approximation
• static analysis 2
  • non-noetherian domain
  • termination via widening
  • further approximation
• comparative semantics
  • non-noetherian domain
  • abstraction without approximation (completeness): $\alpha(\mathrm{lfp}\,F) = \mathrm{lfp}\,F^\alpha$
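The finite computability of $\mathrm{lfp}\,F^\alpha$ on a noetherian domain can be illustrated with a plain Kleene iteration in Python. The example is an assumption, not taken from the slides: the abstract function describes the loop head of a hypothetical program `x := 0; while ... do x := x + 1`, and `plus_one` is a hand-written abstract increment on Sign.

```python
# Hypothetical encoding of Sign; UP[a] lists every b with a <= b.
SIGN = ["bot", "-", "0", "+", "0-", "0+", "top"]
UP = {
    "bot": set(SIGN),
    "-": {"-", "0-", "top"},
    "0": {"0", "0-", "0+", "top"},
    "+": {"+", "0+", "top"},
    "0-": {"0-", "top"},
    "0+": {"0+", "top"},
    "top": {"top"},
}


def leq(a, b):
    return b in UP[a]


def lub(a, b):
    uppers = [c for c in SIGN if leq(a, c) and leq(b, c)]
    return next(c for c in uppers if all(leq(c, d) for d in uppers))


# Abstract version of x + 1 on Sign (written by hand for this sketch).
PLUS_ONE = {"bot": "bot", "-": "0-", "0": "+", "+": "+",
            "0-": "top", "0+": "+", "top": "top"}


def f_alpha(x):
    """Abstract value of x at the loop head: entry value 0 or a previous
    value incremented by one."""
    return lub("0", PLUS_ONE[x])


def lfp(f, bottom):
    """Kleene iteration from bottom; terminates because Sign is finite."""
    x, steps = bottom, 0
    while True:
        nxt = f(x)
        if nxt == x:
            return x, steps
        x, steps = nxt, steps + 1


print(lfp(f_alpha, "bot"))
```

The iterates are `bot`, `0`, `0+`, and the third application of `f_alpha` leaves `0+` unchanged, so the analysis concludes that `x` is non-negative at the loop head.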

Static Analysis
• abstract domain and Galois connection to model the property
• (possibly optimal) correct abstract operations
• $F^\alpha$
• the analysis is the computation of $\mathrm{lfp}\,F^\alpha$
• if the abstract domain is non-noetherian, or if the complexity of $\mathrm{lfp}\,F^\alpha$ is too high
  • use a widening operator, which effectively computes an (upper) approximation of $\mathrm{lfp}\,F^\alpha$
  • one example later
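How a widening forces termination on a non-noetherian domain can be shown with a short Python sketch. Everything specific in it is an assumption brought in for illustration and does not come from the slides: the domain is the interval domain, the widening is the usual "jump unstable bounds to infinity" operator, and the abstract function models the loop head of a hypothetical program `x := 0; while ... do x := x + 1`.

```python
import math

# Intervals are pairs (lo, hi); None plays the role of bottom.
# The interval domain has infinite increasing chains, e.g. [0,0] < [0,1] < ...


def join(a, b):
    """Least upper bound of two intervals."""
    if a is None:
        return b
    if b is None:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))


def widen(a, b):
    """Standard interval widening: a bound that is not stable is pushed
    to infinity, so every increasing chain stabilizes."""
    if a is None:
        return b
    if b is None:
        return a
    lo = a[0] if a[0] <= b[0] else -math.inf
    hi = a[1] if a[1] >= b[1] else math.inf
    return (lo, hi)


def f_alpha(x):
    """Loop head: the entry value [0,0] joined with the previous value + 1."""
    shifted = None if x is None else (x[0] + 1, x[1] + 1)
    return join((0, 0), shifted)


def lfp_widen(f, bottom):
    """Upward iteration with widening; returns an upper approximation of lfp f."""
    x = bottom
    while True:
        nxt = widen(x, f(x))
        if nxt == x:
            return x
        x = nxt


result = lfp_widen(f_alpha, None)
print(result)
```

Without `widen` the iterates `[0,0]`, `[0,1]`, `[0,2]`, ... would never stabilize; with it the iteration stops at `[0, +inf]`, which is above every iterate and therefore a correct (upper) approximation.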

Comparative Semantics: $\alpha(\mathrm{lfp}\,F) = \mathrm{lfp}\,F^\alpha$ (Semantics)
• neither of the two fixpoints is finitely computable
• useful to reason about different semantics and to systematically derive more abstract semantics
• choice of the most adequate reference semantics for analysis and verification
  • $F^\alpha$ is less expensive than $F$ in computing the observable property modeled by $\alpha$
  • no junk
• hierarchy of transition systems semantics (P. Cousot, MFPS 97)
  • trace, big-step operational, denotational, relational, predicate transformer, axiomatic, etc.
• systematic reconstruction of several fixpoint ($T_P$-like) semantics for (positive) logic programs (Comini, Levi & Meo, Info. & Comp. 00)
  • applied in Pisa also to finite failure & infinite computations, CLP, CCP, Prolog, $\lambda$-Prolog, sequent calculi

Polymorphic type inference in ML-like functional languages (Static analysis)
• the ad-hoc solution
  • Milner’s algorithm, specified by a set of inference rules
  • an elegant, well-understood, universally accepted semantic formalization
• the systematic derivation via abstract interpretation
  • provides a better insight
  • shows how to improve precision
• inference rules mimic the concrete semantics
  • in the structure of the semantic evaluation function
  • in the semantic domains (environment)
• giving semantics to well-typed programs only introduces approximation
  • if true then 2 else false
• the most general polymorphic type for recursive functions is not computable
  • the inferred type may not be the most general
  • some type-correct programs cannot be typed

Polymorphic type inference via Abstract Interpretation (Type inference)
• abstract values = pairs of
  • a term (with variables): type expression
  • a constraint (on variables): set of term equalities in solved form
• partial order (on terms only)
  • top is “no type”
  • bottom is “any type”
  • $t_1 \le t_2$, if $t_2$ is an instance of $t_1$
• the domain is non-noetherian
  • there exist infinite increasing chains
• an optimal abstract operation
  • $+^\alpha((t_1,c_1),(t_2,c_2)) = (int,\ c_1 \cup c_2 \cup \{t_1 = int,\ t_2 = int\})$
• abstracting functional values
  • the concrete semantics: $E[\![\lambda x.e]\!]\,\rho = \lambda v.\ E[\![e]\!]\,(bind\ \rho\ x\ v)$
  • the abstract value: let v1 = newvar() in let (v2,c2) = $E^\alpha[\![e]\!]$ (bind $\rho$ x (v1,{})) in (v1c2 -> v2, c2)

Recursion and Widening (Type inference)
• the abstraction of recursive functions is similar to the one of regular functions, but a fixpoint computation is required
  • the first approximation of the abstract value of the function is bottom
• since the abstract domain is non-noetherian, the fixpoint computation may diverge
• the solution in Milner’s algorithm
  • take the results of the first two iterations and compute their lub (most general common instantiation, computed through unification)
  • if the lub is top (unification fails), the program is not typable (type error)
• this is exactly a widening operator, which returns a (correct) upper approximation of the lfp (Furiesi, Master Thesis, Pisa 99)
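The lub used as a widening, i.e. the most general common instance of two type terms obtained by unification, can be sketched in Python as follows. This is not Milner’s algorithm itself and makes several assumptions for illustration: type variables are strings starting with `'`, base types are plain strings, constructed types are tuples such as `("->", t1, t2)`, the two arguments are renamed apart before unifying, and top is represented by `None`. All names are hypothetical.

```python
TOP = None  # "no type": unification failed


def is_var(t):
    return isinstance(t, str) and t.startswith("'")


def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t


def occurs(v, t, s):
    """Occurs check: does variable v appear in t under s?"""
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])


def unify(t1, t2, s):
    """Return an extension of s unifying t1 and t2, or None on failure."""
    t1, t2 = walk(t1, s), walk(t2, s)
    if t1 == t2:
        return s
    if is_var(t1):
        return None if occurs(t1, t2, s) else {**s, t1: t2}
    if is_var(t2):
        return unify(t2, t1, s)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and t1[0] == t2[0] and len(t1) == len(t2)):
        for a, b in zip(t1[1:], t2[1:]):
            s = unify(a, b, s)
            if s is None:
                return None
        return s
    return None


def resolve(t, s):
    """Apply substitution s to t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t


def rename(t, suffix):
    """Rename the variables of t apart by appending a suffix."""
    if is_var(t):
        return t + suffix
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, suffix) for a in t[1:])
    return t


def widen(t1, t2):
    """lub of two type terms: their most general common instance, or TOP."""
    a, b = rename(t1, "1"), rename(t2, "2")
    s = unify(a, b, {})
    return TOP if s is None else resolve(a, s)


print(widen(("->", "'a", "'b"), ("->", "int", "'c")))
```

A failing unification, for instance between `int` and a function type, yields `TOP`, which corresponds to the type error case on the slide.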

How to improve precision (Type inference)
• straightforward!
  • perform at most k iterations of the fixpoint computation
  • if we reach a fixpoint, it is the most general type
  • otherwise, we apply Milner’s widening to the last two results
• we succeed in typing more functions
• we get more precise types
• one example (due to Cousot)
  CaML
  # let rec f f1 g n x = if n=0 then (g x) else (((((f f1)(fun x -> (fun h -> (g(h x)))))(n - 1))(x))(f1));;
  This expression has type ('a -> 'a) -> 'b but is here used with type 'b
  our answer (the fixpoint is reached in 3 iterations)
  val f : ('a -> 'a) -> ('a -> 'b) -> int -> 'a -> 'b = <fun>

Abstract Interpretation vs. Type Systems (Abstract Interpretation)
• Patrick Cousot has reconstructed a hierarchy of type systems for ML-like languages by using abstract interpretation (Cousot, POPL 97)
• type systems have been proposed to cope with other static analyses (strictness, various properties related to security)
• type systems need to be proved correct wrt a semantics
• abstract semantics are systematically derived from the semantics and are correct by construction
• two related open interesting problems
  • comparison of the two approaches from the viewpoint of expressive power and analysis precision (and complexity)
  • definition of methods to automatically translate formalizations from one approach to the other

Static Analysis of Logic Programs (Static analysis)
• abstract interpretation is very popular in logic languages
  • the computational model has several opportunities for optimization, based on analysis results
  • it is (relatively) easy to define, because the standard semantics is collecting and the concrete domain (sets of substitutions) is quite simple
• several important properties (groundness, freeness, sharing, depth(k))
• for some properties (e.g., groundness and sharing) a lot of different abstract domains
  • techniques to compare the relative precision of abstract domains
  • important results on techniques for the systematic design of abstract domains, which can probably be applied to other paradigms as well
• abstract compilation in CLP (Giacobazzi, Debray & Levi, JLP 95)
  • the program is transformed by syntactically replacing concrete constraints by abstract constraints
  • the abstract computation is a standard CLP computation on a different constraint system

Groundness in Logic Programs (Groundness analysis)
• CLP version
• concrete domain $(P(Eqns), \subseteq)$, sets of sets of term equations in solved form
• concrete semantics
  • the CLP version of the s-semantics (answer constraints)
• 3 abstract domains
  • G: the property of being ground
  • DEF: functional groundness dependencies
  • POS: DEF + some disjunctive information
• lattices shown in the 2-variables case

An example (Groundness analysis)
the program
p(X,Y) :- X=a.
p(X,Y) :- Y=b.
q(X,Y) :- X=Y.
r(X,Y) :- p(X,Y),q(X,Y).
the concrete semantics
p(X,Y) -> {{X=a},{Y=b}}
q(X,Y) -> {{X=Y}}
r(X,Y) -> {{X=a,Y=a},{X=b,Y=b}}
• in the concrete semantics of r, both the arguments are bound to ground terms (in all the answer constraints)

The domain G (Groundness analysis)
$\gamma_G(v)$ =
• $\emptyset$, if v = bot
• $\{e \in Eqns \mid$ X is bound to a ground term in $e\}$, if v = X (X is always ground)
• $Eqns$, if v = true (no groundness information)
the abstraction of the concrete semantics
p(X,Y) -> true
q(X,Y) -> true
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> true
q(X,Y) -> true
r(X,Y) -> true
the abstract program
p(X,Y) :- lubG (X,Y).
q(X,Y) :- true.
r(X,Y) :- glbG (p(X,Y),q(X,Y)).

The domain Def (Groundness analysis)
$\gamma_{Def}(v)$ =
• $\{e \in Eqns \mid X = Y \in e\}$, if v = $X \leftrightarrow Y$ (X is ground if and only if Y is ground)
• $\{e \in Eqns \mid X = t \in e$ and Y occurs in $t\}$, if v = $X \rightarrow Y$ (if X is ground then Y is ground)
• …..
the abstraction of the concrete semantics
p(X,Y) -> true
q(X,Y) -> $X \leftrightarrow Y$
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> true
q(X,Y) -> $X \leftrightarrow Y$
r(X,Y) -> $X \leftrightarrow Y$
the abstract program
p(X,Y) :- lubDef (X,Y).
q(X,Y) :- $X \leftrightarrow Y$.
r(X,Y) :- glbDef (p(X,Y),q(X,Y)).

The domain Pos (Groundness analysis)
$\gamma_{pos}(v)$ =
• $\{e \in Eqns \mid$ either X or Y is bound to a ground term in $e\}$, if v = $X \lor Y$ (either X or Y is ground)
• ….
the abstraction of the concrete semantics
p(X,Y) -> $X \lor Y$
q(X,Y) -> $X \leftrightarrow Y$
r(X,Y) -> X & Y
the abstract semantics
p(X,Y) -> $X \lor Y$
q(X,Y) -> $X \leftrightarrow Y$
r(X,Y) -> X & Y
the abstract program
p(X,Y) :- lubpos (X,Y).
q(X,Y) :- $X \leftrightarrow Y$.
r(X,Y) :- glbpos(p(X,Y),q(X,Y)).
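The three abstract programs for p, q and r can be compared side by side with a Python sketch. The encoding is an assumption made for illustration: a groundness formula over X and Y is represented by its set of models (truth assignments), G is taken to contain the conjunctions of variables, Def the formulas whose models are closed under pointwise "and", and Pos every formula reachable here. The helper names `close_g`, `close_def`, `close_pos` and `analyse` are hypothetical.

```python
from itertools import product

# A formula over (X, Y) is the frozenset of truth assignments satisfying it.
ALL = frozenset(product((0, 1), repeat=2))


def fn(pred):
    return frozenset(m for m in ALL if pred(*m))


TRUE = ALL
X = fn(lambda x, y: x)
Y = fn(lambda x, y: y)
X_IFF_Y = fn(lambda x, y: x == y)
X_OR_Y = fn(lambda x, y: x or y)
X_AND_Y = fn(lambda x, y: x and y)


def close_pos(f):
    """Pos keeps the formula as it is."""
    return f


def close_def(f):
    """Least Def formula above f: close the models under pointwise 'and'."""
    models = set(f)
    while True:
        new = {tuple(a & b for a, b in zip(m1, m2))
               for m1 in models for m2 in models} - models
        if not new:
            return frozenset(models)
        models |= new


def close_g(f):
    """Least G formula above f: the conjunction of the variables that are
    true in every model of f."""
    if not f:
        return f
    low = tuple(min(col) for col in zip(*f))
    return frozenset(m for m in ALL if all(a >= b for a, b in zip(m, low)))


def analyse(close):
    """Abstract semantics of the example program in the domain given by close."""
    lub = lambda a, b: close(a | b)  # join of the two clauses of p
    glb = lambda a, b: a & b         # conjunction of the body atoms of r
    p = lub(close(X), close(Y))      # p(X,Y) :- X=a.   p(X,Y) :- Y=b.
    q = close(X_IFF_Y)               # q(X,Y) :- X=Y.
    r = glb(p, q)                    # r(X,Y) :- p(X,Y),q(X,Y).
    return p, q, r


for name, close in [("G", close_g), ("Def", close_def), ("Pos", close_pos)]:
    p, q, r = analyse(close)
    print(name, r == X_AND_Y)
```

Only Pos derives `X & Y` for r, because it is the only one of the three domains in which the lub of X and Y keeps the disjunctive information instead of collapsing to true.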

Program Verification by Abstract Interpretation (Verification)
• F = concrete semantic evaluation function
  • concrete enough to observe the property
• the property is modeled by an abstract domain $(A, \le)$ and a Galois insertion $\alpha$, $\gamma$
• $F^\alpha$ = abstract semantic evaluation function
• S = specification of the property, i.e., abstraction of the intended concrete semantics
• partial correctness: $\alpha(\mathrm{lfp}\,F) \le S$
• sufficient partial correctness condition: $F^\alpha(S) \le S$ (Comini, Levi, Meo & Vitiello, JLP 99)
  • if $F^\alpha(S) \le S$ then S is a prefixpoint of $F^\alpha$
  • hence $\alpha(\mathrm{lfp}\,F) \le \mathrm{lfp}\,F^\alpha \le S$

Analysis and Verification (Verification)
• F = concrete semantic evaluation function
• $F^\alpha$ = abstract semantic evaluation function
• analysis: compute $\mathrm{lfp}\,F^\alpha$
  • we need to compute a fixpoint
  • noetherian domain or widening
• S = specification of the property
• verification: prove $F^\alpha(S) \le S$
  • no fixpoint computation and no need for noetherian domains
  • finite representation of the specification
  • decidability of $\le$
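The verification condition $F^\alpha(S) \le S$ can be checked in one step, with no fixpoint iteration, as the following Python sketch suggests. It reuses the groundness program p, q, r in the domain Pos as a hypothetical test case; formulas are encoded as sets of models and $\le$ is set inclusion, both assumptions made for illustration, and the function names are not from the slides.

```python
from itertools import product

# A formula over (X, Y) is the frozenset of truth assignments satisfying it.
ALL = frozenset(product((0, 1), repeat=2))


def fn(pred):
    return frozenset(m for m in ALL if pred(*m))


TRUE = ALL
X = fn(lambda x, y: x)
X_IFF_Y = fn(lambda x, y: x == y)
X_OR_Y = fn(lambda x, y: x or y)
X_AND_Y = fn(lambda x, y: x and y)


def f_alpha(interp):
    """One application of the abstract semantic function of the program
    p(X,Y) :- X=a.  p(X,Y) :- Y=b.  q(X,Y) :- X=Y.  r(X,Y) :- p(X,Y),q(X,Y).
    in Pos."""
    return {
        "p": X_OR_Y,                     # lub of the two clauses of p
        "q": X_IFF_Y,                    # abstraction of X=Y
        "r": interp["p"] & interp["q"],  # glb of the body atoms
    }


def leq(i1, i2):
    """Pointwise order on abstract interpretations (model-set inclusion)."""
    return all(i1[k] <= i2[k] for k in i1)


def verified(spec):
    """Sufficient partial correctness condition: F^alpha(S) <= S."""
    return leq(f_alpha(spec), spec)


exact = {"p": X_OR_Y, "q": X_IFF_Y, "r": X_AND_Y}
weaker = {"p": TRUE, "q": X_IFF_Y, "r": X_IFF_Y}
wrong = {"p": X, "q": X_IFF_Y, "r": X_AND_Y}
too_weak_on_p = {"p": TRUE, "q": X_IFF_Y, "r": X_AND_Y}

print(verified(exact), verified(weaker), verified(wrong), verified(too_weak_on_p))
```

The last specification is satisfied by the program, yet the check fails because the specification of p is too weak to support the proof for r; this is what "sufficient" means in the condition.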

Completeness of the proof method (Verification)
• assume the program to be partially correct wrt the specification S, i.e., $\alpha(\mathrm{lfp}\,F) \le S$
• then there exists another specification T, stronger than S, such that the sufficient condition $F^\alpha(T) \le T$ holds
• we have shown that the proof method is complete if and only if the abstraction is complete (precise) (Levi & Volpe, PLILP 98)

Proof methods and the reference semantics (Verification)
• one can be interested in establishing different kinds of properties
  • of the final state
  • of the relation between initial and final state
  • of the relation between specific pairs of intermediate states, e.g., procedure calls
  • ….
• there exist different corresponding proof methods
• all the proof methods are instances of $F^\alpha(S) \le S$ for different choices of the concrete semantic evaluation function F
  • F can be derived by abstract interpretation (comparative semantics) from the most concrete semantics, i.e., a trace semantics
  • first step of abstraction = choice of the “right” semantics
• in (positive) logic programming, all the known verification methods have been reconstructed (Levi & Volpe, PLILP 98)

Making $F^\alpha(S) \le S$ effective (Verification)
• extensional specifications
  • typical analysis properties described by noetherian abstract domains
  • properties such as polymorphic types, which lead to finite abstract semantics, even with non-noetherian domains
• intensional specifications, specified by means of assertions
  • assertions are abstract domains
  • a formula describes the set of all the concrete states which “satisfy” it (concretization)
  • if the specification language is closed under conjunction, it is easy to define the abstraction function
  • we can derive an abstract function $F^\alpha$, which computes on the domain of assertions, and instantiate the verification condition (Comini, Gori & Levi, MFCSIT 00)
• the relation $\le$ on the domain of assertions must be decidable
• an open problem: completeness of the abstract semantics associated to a specific language of assertions

Specification Languages (Verification)
• decidable specification languages have been proposed for functional programming and logic programming
  • one example: a powerful language which allows one to express several properties of logic programs, including types, freeness and groundness (Volpe, SCP 00)
• experiments using Horn Clause Logic as specification language (Comini, Gori & Levi, AGP 00)
  • it is not decidable
  • most of the verification conditions can be proved without using a theorem prover
  • simple logic program transformation techniques, which can be partially supported by an automatic tool

Systematic abstract domain design (Domain design)
• once we have the abstract domain, the design of the abstract semantics is systematic
• abstract interpretation theory provides results which can be exploited
  • to make the design of abstract domains (more) systematic
  • to compare and combine domains
  • to refine domains so as to improve their precision
• reduced product (of domains A and B)
  • allows one to analyze (together) the properties modeled by A and B
  • often delivers better results than the separate analyses because of domain interaction
• lifting to the powerset (and disjunctive completion)
  • roughly speaking, transform A into P(A)
  • better precision
  • no loss of information in computing lub’s
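The gain in precision from lifting a domain to its powerset can be seen on the Sign conditional used earlier in the talk. The following Python sketch is a rough illustration under stated assumptions: an element of the lifted domain is simply a set of Sign values read as a disjunction, its lub is set union, redundant elements are not removed, and all names are hypothetical.

```python
# Hypothetical encoding of Sign; UP[a] lists every b with a <= b.
SIGN = ["bot", "-", "0", "+", "0-", "0+", "top"]
GAMMA = {
    "bot": lambda n: False,
    "-": lambda n: n < 0,
    "0": lambda n: n == 0,
    "+": lambda n: n > 0,
    "0-": lambda n: n <= 0,
    "0+": lambda n: n >= 0,
    "top": lambda n: True,
}
UP = {
    "bot": set(SIGN),
    "-": {"-", "0-", "top"},
    "0": {"0", "0-", "0+", "top"},
    "+": {"+", "0+", "top"},
    "0-": {"0-", "top"},
    "0+": {"0+", "top"},
    "top": {"top"},
}


def leq(a, b):
    return b in UP[a]


def lub_sign(a, b):
    """lub in Sign: may lose information (e.g. + and - give top)."""
    uppers = [c for c in SIGN if leq(a, c) and leq(b, c)]
    return next(c for c in uppers if all(leq(c, d) for d in uppers))


def lub_disj(xs, ys):
    """lub in the lifted domain P(Sign): plain union, no information lost."""
    return frozenset(xs) | frozenset(ys)


def member_sign(n, a):
    """Is integer n described by the Sign value a?"""
    return GAMMA[a](n)


def member_disj(n, xs):
    """Is integer n described by some disjunct in xs?"""
    return any(GAMMA[a](n) for a in xs)


# Merging the two branches y:=3 and y:=-5 of the earlier conditional.
merged_sign = lub_sign("+", "-")
merged_disj = lub_disj({"+"}, {"-"})
print(merged_sign, sorted(merged_disj))
print(member_sign(0, merged_sign), member_disj(0, merged_disj))
```

In Sign the merged value is `top`, which also admits 0; in the lifted domain the merged value is the disjunction of `+` and `-`, which still excludes 0.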

Operations on Abstract Domains (Domain design)
• several useful operators on abstract domains (refinements)
  • a survey in (Filé, Giacobazzi & Ranzato, ACM Comput. Surv. 96)
• linear completion (Giacobazzi, Ranzato & Scozzari, SAS 98)
  • functional dependencies modeled by linear implication
• reconstruction of all the known domains for groundness analysis (Scozzari, SAS 97)
  • DEF = G -> G
  • POS = DEF -> DEF
  • POS = POS -> POS: optimality of POS
• successfully applied to other domains for logic programs
  • types (Levi & Spoto, PLILP 98)
  • sharing and freeness (Levi & Spoto, PEPM 00)
• open problems
  • do the same refinements apply to other programming paradigms?
  • can refinements be extended to domains of assertions and to type systems?

Abstract Interpretation
• a mathematically simple and solid foundation for
  • comparative semantics
  • static analysis
  • verification
• a methodology for the systematic derivation of
  • abstract domains from the property
  • complexity issues? quantitative analyses?
  • abstract semantics from the concrete semantics and the abstract domain
