Program Analysis

Program Analysis. Mooly Sagiv http://www.math.tau.ac.il/~sagiv/courses/pa01.html Tel Aviv University 640-6706 Textbook: Principles of Program Analysis Chapter 1.5-8 (modified). Outline. Mathematical Background Abstract Interpretation Type systems Conclusions. Mathematical Background.

Presentation Transcript


  1. Program Analysis Mooly Sagiv http://www.math.tau.ac.il/~sagiv/courses/pa01.html Tel Aviv University 640-6706 Textbook: Principles of Program Analysis Chapter 1.5-8 (modified)

  2. Outline • Mathematical Background • Abstract Interpretation • Type systems • Conclusions

  3. Mathematical Background • Declaratively define • The result of the analysis • The exact solution • Allow comparison

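  4. Posets
     • A partial ordering is a binary relation $\sqsubseteq : L \times L \to \{\mathit{false}, \mathit{true}\}$ such that
       • for all $l \in L$: $l \sqsubseteq l$ (reflexive)
       • for all $l_1, l_2, l_3 \in L$: $l_1 \sqsubseteq l_2 \land l_2 \sqsubseteq l_3 \Rightarrow l_1 \sqsubseteq l_3$ (transitive)
       • for all $l_1, l_2 \in L$: $l_1 \sqsubseteq l_2 \land l_2 \sqsubseteq l_1 \Rightarrow l_1 = l_2$ (anti-symmetric)
     • A poset is denoted by $(L, \sqsubseteq)$
     • In program analysis: $l_1 \sqsubseteq l_2$ $\Leftrightarrow$ $l_1$ is more precise than $l_2$ $\Leftrightarrow$ $l_1$ represents fewer concrete states than $l_2$
     • Examples
       • Total orders $(\mathbb{N}, \leq)$
       • Powersets $(\mathcal{P}(S), \subseteq)$
       • Powersets $(\mathcal{P}(S), \supseteq)$
     • More notation
       • $l_1 \sqsupseteq l_2 \Leftrightarrow l_2 \sqsubseteq l_1$
       • $l_1 \sqsubset l_2 \Leftrightarrow l_1 \sqsubseteq l_2 \land l_1 \neq l_2$
       • $l_1 \sqsupset l_2 \Leftrightarrow l_2 \sqsubset l_1$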
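The three poset axioms can be checked mechanically on a small finite carrier. The sketch below does this for the powerset examples; the helper names `powerset` and `is_partial_order` are made up for this illustration and do not come from the slides or the textbook.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of `universe`, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_partial_order(elements, leq):
    """Check reflexivity, transitivity and anti-symmetry of `leq` on `elements`."""
    reflexive = all(leq(a, a) for a in elements)
    transitive = all(leq(a, c)
                     for a in elements for b in elements for c in elements
                     if leq(a, b) and leq(b, c))
    antisymmetric = all(a == b
                        for a in elements for b in elements
                        if leq(a, b) and leq(b, a))
    return reflexive and transitive and antisymmetric

subsets = powerset({"x", "y", "z"})
subset_order = lambda a, b: a <= b      # the poset (P(S), subset-or-equal)
superset_order = lambda a, b: a >= b    # the dual poset (P(S), superset-or-equal)
strict_subset = lambda a, b: a < b      # not reflexive, hence not a partial order
```

Both powerset orderings pass the check, while the strict ordering fails on reflexivity, which is why the slide introduces the strict relation only as derived notation.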

  5. Upper and Lower Bounds
     • Consider a poset $(L, \sqsubseteq)$
     • A subset $L' \subseteq L$ has a lower bound $l \in L$ if for all $l' \in L'$: $l \sqsubseteq l'$
     • A subset $L' \subseteq L$ has an upper bound $u \in L$ if for all $l' \in L'$: $l' \sqsubseteq u$
     • A greatest lower bound of a subset $L' \subseteq L$ is a lower bound $l_0 \in L$ such that $l \sqsubseteq l_0$ for any lower bound $l$ of $L'$
     • A least upper bound of a subset $L' \subseteq L$ is an upper bound $u_0 \in L$ such that $u_0 \sqsubseteq u$ for any upper bound $u$ of $L'$
     • For every subset $L' \subseteq L$:
       • The greatest lower bound of $L'$ is unique if it exists; it is written $\sqcap L'$ (meet), or $a \sqcap b$ for two elements
       • The least upper bound of $L'$ is unique if it exists; it is written $\sqcup L'$ (join), or $a \sqcup b$ for two elements

  6. Complete Lattices
     • A poset $(L, \sqsubseteq)$ is a complete lattice if every subset has a least upper bound and a greatest lower bound
     • $L = (L, \sqsubseteq) = (L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$
     • $\bot = \sqcup \emptyset = \sqcap L$
     • $\top = \sqcup L = \sqcap \emptyset$
     • Lemma: for every poset $(L, \sqsubseteq)$ the following conditions are equivalent
       • $L$ is a complete lattice
       • Every subset of $L$ has a least upper bound
       • Every subset of $L$ has a greatest lower bound

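  7. Cartesian Products
     • A complete lattice $(L_1, \sqsubseteq_1) = (L_1, \sqsubseteq_1, \sqcup_1, \sqcap_1, \bot_1, \top_1)$
     • A complete lattice $(L_2, \sqsubseteq_2) = (L_2, \sqsubseteq_2, \sqcup_2, \sqcap_2, \bot_2, \top_2)$
     • Define a poset $L = (L_1 \times L_2, \sqsubseteq)$ where $(x_1, x_2) \sqsubseteq (y_1, y_2)$ if
       • $x_1 \sqsubseteq_1 y_1$ and
       • $x_2 \sqsubseteq_2 y_2$
     • $L$ is a complete lattice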

  8. Chains
     • A subset $Y \subseteq L$ in a poset $(L, \sqsubseteq)$ is a chain if every two elements in $Y$ are ordered
       • For all $l_1, l_2 \in Y$: $l_1 \sqsubseteq l_2$ or $l_2 \sqsubseteq l_1$
     • An ascending chain is a sequence of values $l_1 \sqsubseteq l_2 \sqsubseteq l_3 \sqsubseteq \dots$
     • A strictly ascending chain is a sequence of values $l_1 \sqsubset l_2 \sqsubset l_3 \sqsubset \dots$
     • A descending chain is a sequence of values $l_1 \sqsupseteq l_2 \sqsupseteq l_3 \sqsupseteq \dots$
     • A strictly descending chain is a sequence of values $l_1 \sqsupset l_2 \sqsupset l_3 \sqsupset \dots$
     • $L$ has finite height if every chain in $L$ is finite
     • Lemma: a poset $(L, \sqsubseteq)$ has finite height if and only if every strictly descending chain and every strictly ascending chain is finite

  9. Monotone Functions
     • A poset $(L, \sqsubseteq)$
     • A function $f : L \to L$ is monotone if for every $l_1, l_2 \in L$: $l_1 \sqsubseteq l_2 \Rightarrow f(l_1) \sqsubseteq f(l_2)$

  10. f() f2() Fix(f) Red(f) f2() Ext(f) f()  Fixed Points • A monotone function f: L  L where (L, , , , , ) is a complete lattice • Fix(f) = { l: l  L, f(l) = l} • Red(f) = {l: l  L, f(l)  l} • Ext(f) = {l: l  L, l  f(l)} • l1  l2 f(l1 )  f(l2) • Tarski’s Theorem 1955: if f is monotone then: • lfp(f) =  Fix(f) =  Red(f)  Fix(f) • gfp(f) =  Fix(f) =  Ext(f)  Fix(f) gfp(f) lfp(f)

  11. Chaotic Iterations
      • A lattice $L = (L, \sqsubseteq, \sqcup, \sqcap, \bot, \top)$ with finite strictly increasing chains
      • $L^n = L \times L \times \dots \times L$
      • A monotone function $f : L^n \to L^n$
      • Compute $\mathrm{lfp}(f)$
      • The simultaneous least fixed point of the system $\{ x[i] = f_i(x) : 1 \leq i \leq n \}$
      Naive iteration:
          x := ($\bot$, $\bot$, …, $\bot$)
          while (f(x) $\neq$ x) do
              x := f(x)
      Worklist (chaotic) iteration:
          for i := 1 to n do x[i] := $\bot$
          WL := {1, 2, …, n}
          while (WL $\neq \emptyset$) do
              select and remove an element i $\in$ WL
              new := f_i(x)
              if (new $\neq$ x[i]) then
                  x[i] := new
                  add all the indexes that directly depend on i to WL
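Both iteration schemes can be written in a few lines. The sketch below is one possible rendering, not the course's implementation: indexes are 0-based rather than 1-based, the dependency information is passed explicitly as `dependents`, and the three-equation system over sets is invented for the example.

```python
def chaotic_iteration(fs, dependents, bottom):
    """Worklist solver for x[i] = fs[i](x); dependents[i] lists equations that read x[i]."""
    x = [bottom] * len(fs)
    worklist = set(range(len(fs)))
    while worklist:
        i = worklist.pop()              # select and remove an arbitrary index
        new = fs[i](x)
        if new != x[i]:
            x[i] = new
            worklist |= dependents[i]   # re-examine everything that depends on i
    return x

def naive_iteration(fs, bottom):
    """Recompute every component until the whole vector is stable."""
    x = [bottom] * len(fs)
    while True:
        nxt = [f(x) for f in fs]
        if nxt == x:
            return x
        x = nxt

# Hypothetical system: x0 = {a}, x1 = x0 U x2, x2 = x1 U {b}
EQUATIONS = [
    lambda x: frozenset({"a"}),
    lambda x: x[0] | x[2],
    lambda x: x[1] | frozenset({"b"}),
]
DEPENDENTS = [{1}, {2}, {1}]

worklist_solution = chaotic_iteration(EQUATIONS, DEPENDENTS, frozenset())
naive_solution = naive_iteration(EQUATIONS, frozenset())
```

The order in which indexes leave the worklist does not affect the result, only the number of evaluations; the worklist version avoids re-evaluating equations whose inputs have not changed.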

  12. The Abstract Interpretation Technique • The foundation of program analysis • Goals • Establish soundness of (find faults in) a given program analysis algorithm • Design new program analysis algorithms • The main ideas: • Relate each step in the algorithm to a step in a structural semantics • Establish global correctness using a general theorem • Not limited to a particular form of analysis

  13. Soundness in Reaching Definitions • Every reachable definition is detected • May include more definitions • Fewer constants may be identified • Not all the loop-invariant code will be identified • May warn against uninitialized variables that are in fact initialized • At every elementary block l, RDentry(l) includes all the definitions that may reach l • At every elementary block l, RDentry(l) “represents” all the possible concrete states arising when the structural operational semantics reaches l

  14. Proof of Soundness • Define an “appropriate” structural operational semantics • Define “collecting” structural operational semantics • Establish a Galois connection between collecting states and reaching definitions • (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics • (Global correctness) Conclude that the analysis is sound [Cousot & Cousot 1976]

  15. Structural Operational Semantics to justify Reaching Definitions • Normal states $[\mathrm{Var}_* \to \mathbb{Z}]$ are not enough • Instrumented states $[\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*]$ • For an instrumented state (s, def) and variable x, def(x) holds the last definition of x

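  16. Instrumented Structural Semantics for While
      Axioms:
      • $[\mathit{ass}_{sos}]$ $\langle [x := a]^l, (s, d) \rangle \to (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l])$
      • $[\mathit{skip}_{sos}]$ $\langle [\mathtt{skip}]^l, (s, d) \rangle \to (s, d)$
      Rules:
      • $[\mathit{comp}^1_{sos}]$ $\dfrac{\langle S_1, (s, d) \rangle \to \langle S_1', (s', d') \rangle}{\langle S_1; S_2, (s, d) \rangle \to \langle S_1'; S_2, (s', d') \rangle}$
      • $[\mathit{comp}^2_{sos}]$ $\dfrac{\langle S_1, (s, d) \rangle \to (s', d')}{\langle S_1; S_2, (s, d) \rangle \to \langle S_2, (s', d') \rangle}$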

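  17. Instrumented Structural Semantics: if construct
      • $[\mathit{if}^{tt}_{sos}]$ $\langle \mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2, (s, d) \rangle \to \langle S_1, (s, d) \rangle$ if $\mathcal{B}[\![b]\!]s = tt$
      • $[\mathit{if}^{ff}_{sos}]$ $\langle \mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2, (s, d) \rangle \to \langle S_2, (s, d) \rangle$ if $\mathcal{B}[\![b]\!]s = ff$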

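  18. Instrumented Structural Semantics: while construct
      • $[\mathit{while}_{sos}]$ $\langle \mathtt{while}\ [b]^l\ \mathtt{do}\ S, (s, d) \rangle \to \langle \mathtt{if}\ [b]^l\ \mathtt{then}\ (S;\ \mathtt{while}\ [b]^l\ \mathtt{do}\ S)\ \mathtt{else}\ \mathtt{skip}, (s, d) \rangle$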

  19. The Factorial Program
      $[y := x]^1$; $[z := 1]^2$; while $[y > 1]^3$ do ( $[z := z * y]^4$; $[y := y - 1]^5$ ); $[y := 0]^6$

  20. Code Instrumentation • Alternative instrumentation • Generate an equivalent program which maintains more information • Use standard structural operational semantics

  21. Other Consumers of Instrumentation • Specialized interpreters • Code Instrumentation • Performance analysis (qpt) • count the number of executions of basic blocks or the number of calls to a function • Profiling Tools • Find “hot” paths (paths that are executed often) by remembering which edge in the control flow graph was executed • Cleanness Tools (Purify, Insure) • identify uninitialized objects

  22. Collecting (Instrumented) Semantics • The input state is not known at compile-time • “Collect” all the (instrumented) states for all possible inputs to the program • No loss of precision

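  23. Flow Information for While
      • Associate labels with program statements describing when statements begin and end
      • $\mathit{init} : \mathrm{Stm} \to \mathrm{Lab}_*$
        • $\mathit{init}([x := a]^l) = l$
        • $\mathit{init}([\mathtt{skip}]^l) = l$
        • $\mathit{init}(S_1; S_2) = \mathit{init}(S_1)$
        • $\mathit{init}(\mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = l$
        • $\mathit{init}(\mathtt{while}\ [b]^l\ \mathtt{do}\ S) = l$
      • $\mathit{final} : \mathrm{Stm} \to \mathcal{P}(\mathrm{Lab}_*)$
        • $\mathit{final}([x := a]^l) = \{l\}$
        • $\mathit{final}([\mathtt{skip}]^l) = \{l\}$
        • $\mathit{final}(S_1; S_2) = \mathit{final}(S_2)$
        • $\mathit{final}(\mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2) = \mathit{final}(S_1) \cup \mathit{final}(S_2)$
        • $\mathit{final}(\mathtt{while}\ [b]^l\ \mathtt{do}\ S) = \{l\}$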
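The definitions of init and final are structural recursions, so they translate directly into code. The sketch below assumes a small AST for While built from dataclasses; the class and field names (`Assign`, `Seq`, `orelse`, and so on) are choices made for this example, with expressions and conditions kept as opaque strings.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assign:
    var: str
    expr: str
    label: int

@dataclass(frozen=True)
class Skip:
    label: int

@dataclass(frozen=True)
class Seq:
    first: object
    second: object

@dataclass(frozen=True)
class If:
    cond: str
    label: int
    then: object
    orelse: object

@dataclass(frozen=True)
class While:
    cond: str
    label: int
    body: object

def init(stm):
    """Label of the first elementary block executed by `stm`."""
    if isinstance(stm, (Assign, Skip, If, While)):
        return stm.label
    if isinstance(stm, Seq):
        return init(stm.first)
    raise TypeError(f"not a While statement: {stm!r}")

def final(stm):
    """Set of labels of the elementary blocks at which `stm` may terminate."""
    if isinstance(stm, (Assign, Skip, While)):
        return {stm.label}
    if isinstance(stm, Seq):
        return final(stm.second)
    if isinstance(stm, If):
        return final(stm.then) | final(stm.orelse)
    raise TypeError(f"not a While statement: {stm!r}")

# The factorial program from the slides.
LOOP = While("y > 1", 3, Seq(Assign("z", "z * y", 4), Assign("y", "y - 1", 5)))
FACTORIAL = Seq(Assign("y", "x", 1),
                Seq(Assign("z", "1", 2),
                    Seq(LOOP, Assign("y", "0", 6))))
```

Note that a while loop terminates at its own test, so `final(LOOP)` is the singleton containing the label of the condition rather than a label from the body.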

  24. Collecting (Instrumented) Semantics (Cont)
      • The input state is not known at compile-time
      • “Collect” all the (instrumented) states for all possible inputs to the program
      • Define $d_? : \mathrm{Var}_* \to \mathrm{Lab}_*$ by $d_?(x) = ?$
      • $CS_{entry}(l) = \{ (s', d') \mid \exists s_0 : \langle P, (s_0, d_?) \rangle \to^* \langle S', (s', d') \rangle,\ \mathit{init}(S') = l \}$
      • Soundness w.r.t. operational semantics: for all $(s', d') \in CS_{entry}(l)$ and for all variables $x$: $(x, d'(x)) \in RD_{entry}(l)$
      • Optimality w.r.t. operational semantics

  25. The Factorial Program
      $[y := x]^1$; $[z := 1]^2$; while $[y > 1]^3$ do ( $[z := z * y]^4$; $[y := y - 1]^5$ ); $[y := 0]^6$

  26. An “Iterative” Definition • Generate a system of monotonic equations • The least solution is well-defined • The least solution is the collecting interpretation

  27. Equations Generated for Collecting Interpretation
      • Equations for elementary statements
        • $[\mathtt{skip}]^l$: $CS_{exit}(l) = CS_{entry}(l)$
        • $[b]^l$: $CS_{exit}(l) = CS_{entry}(l)$
        • $[x := a]^l$: $CS_{exit}(l) = \{ (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l]) \mid (s, d) \in CS_{entry}(l) \}$
      • Equations for control flow constructs: $CS_{entry}(l) = \bigcup CS_{exit}(l')$, where $l'$ ranges over the labels that immediately precede $l$ in the control flow graph
      • An equation for the entry: $CS_{entry}(1) = \{ (s_0, d_?) \mid s_0 \in \mathrm{Var}_* \to \mathbb{Z} \}$

  28. The Least Solution
      • 12 sets of equations: $CS_{entry}(1), \dots, CS_{exit}(6)$
      • Can be written in vectorial form
      • The least solution $\mathrm{lfp}(F_{cs})$ is well-defined
      • Every component is minimal
      • Since $F_{cs}$ is monotonic such a solution always exists
      • $CS_{entry}(l) = \{ (s', d') \mid \exists s_0 : \langle P, (s_0, d_?) \rangle \to^* \langle S', (s', d') \rangle,\ \mathit{init}(S') = l \}$
      • Simplify the soundness criteria

  29. Abstract (Conservative) interpretation
      [Figure: the operational semantics of a statement s maps a set of states to a set of states; the abstract semantics of the same statement maps an abstract representation to an abstract representation; abstraction and concretization relate the two levels.]

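  30. The Abstraction Function
      • Map collecting states into reaching definitions
      • The abstraction of an individual state: $\beta : [\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*] \to \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$, $\beta(s, d) = \{ (x, d(x)) \mid x \in \mathrm{Var}_* \}$
      • The abstraction of a set of states: $\alpha : \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*]) \to \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$, $\alpha(CS) = \bigcup_{(s, d) \in CS} \beta(s, d) = \{ (x, d(x)) \mid (s, d) \in CS,\ x \in \mathrm{Var}_* \}$
      • Soundness: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$
      • Optimality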

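  31. The Concretization Function
      • Map reaching definitions into collecting states
      • The formal meaning of reaching definitions
      • The concretization: $\gamma : \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*) \to \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])$, $\gamma(RD) = \{ (s, d) \mid \forall x \in \mathrm{Var}_* : (x, d(x)) \in RD \} = \{ (s, d) \mid \beta(s, d) \subseteq RD \}$
      • Soundness: $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$
      • Optimality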

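  32. Galois Connections
      • The pair of functions $(\alpha, \gamma)$ forms a Galois connection if: $\forall CS \in \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])$, $\forall RD \in \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$: $\alpha(CS) \subseteq RD$ iff $CS \subseteq \gamma(RD)$
      • Alternatively: $\forall CS \in \mathcal{P}([\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*])$, $\forall RD \in \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$: $\alpha(\gamma(RD)) \subseteq RD$ and $CS \subseteq \gamma(\alpha(CS))$
      • $\alpha$ and $\gamma$ uniquely determine each other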
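The Galois connection conditions can be tested exhaustively once the state space is made finite. The sketch below is only an illustration under strong assumptions: one variable `x`, values restricted to {0, 1} instead of all integers, and three labels; states are encoded as pairs of tuples so that they are hashable. None of these encodings come from the slides.

```python
from itertools import combinations, product

VARS = ("x",)
VALUES = (0, 1)          # assumption: a finite stand-in for the integers
LABS = ("?", 1, 2)

def functions(domain, codomain):
    """All total functions domain -> codomain, each as a tuple of (argument, result) pairs."""
    return [tuple(zip(domain, image))
            for image in product(codomain, repeat=len(domain))]

# Instrumented states (s, d): s maps variables to values, d maps variables to labels.
STATES = [(s, d) for s in functions(VARS, VALUES) for d in functions(VARS, LABS)]

def subsets(items):
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def beta(state):
    """Abstraction of one state: the pairs (x, d(x))."""
    _s, d = state
    return frozenset(d)

def alpha(cs):
    """Abstraction of a set of states: union of the individual abstractions."""
    return frozenset(pair for state in cs for pair in beta(state))

def gamma(rd):
    """Concretization: every state whose definitions all appear in rd."""
    return frozenset(state for state in STATES if beta(state) <= rd)

ALL_CS = subsets(STATES)
ALL_RD = subsets(product(VARS, LABS))

adjunction = all((alpha(cs) <= rd) == (cs <= gamma(rd))
                 for cs in ALL_CS for rd in ALL_RD)
reductive = all(alpha(gamma(rd)) <= rd for rd in ALL_RD)
extensive = all(cs <= gamma(alpha(cs)) for cs in ALL_CS)
```

With 6 states there are 64 candidate sets CS and 8 candidate sets RD, so both formulations of the connection are checked over every pair.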

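  33. Local Concrete Semantics
      • For every atomic statement S
      • $[\![S]\!] : [\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*] \to [\mathrm{Var}_* \to \mathbb{Z}] \times [\mathrm{Var}_* \to \mathrm{Lab}_*]$
      • $[\![ [x := a]^l ]\!](s, d) = (s[x \mapsto \mathcal{A}[\![a]\!]s],\ d[x \mapsto l])$
      • $[\![ [\mathtt{skip}]^l ]\!](s, d) = (s, d)$
      • $[\![ [b]^l ]\!](s, d) = (s, d)$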

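  34. Local Abstract Semantics
      • For every atomic statement S
      • $[\![S]\!]^\# : \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*) \to \mathcal{P}(\mathrm{Var}_* \times \mathrm{Lab}_*)$
      • $[\![ [x := a]^l ]\!]^\#(RD) = (RD - \{ (x, l') \mid l' \in \mathrm{Lab}_* \}) \cup \{ (x, l) \}$
      • $[\![ [\mathtt{skip}]^l ]\!]^\#(RD) = RD$
      • $[\![ [b]^l ]\!]^\#(RD) = RD$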
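The abstract transfer functions are enough to run reaching definitions on the factorial program. The sketch below hard-codes the control flow graph of that program (the `PREDS` table is written by hand for this example) and uses a plain round-robin iteration from the empty set; it is one way to compute the least solution, not necessarily how the course implements it.

```python
def assign_transfer(x, label):
    """Abstract semantics of [x := a]^label: kill old definitions of x, add (x, label)."""
    def transfer(rd):
        return frozenset(p for p in rd if p[0] != x) | {(x, label)}
    return transfer

def identity(rd):
    """Abstract semantics of [skip]^l and of a test [b]^l."""
    return rd

# Elementary blocks of the factorial program, keyed by label.
BLOCKS = {
    1: assign_transfer("y", 1),
    2: assign_transfer("z", 2),
    3: identity,                 # the test [y > 1]^3
    4: assign_transfer("z", 4),
    5: assign_transfer("y", 5),
    6: assign_transfer("y", 6),
}
# Hand-written control flow: PREDS[l] lists the labels that immediately precede l.
PREDS = {1: [], 2: [1], 3: [2, 5], 4: [3], 5: [4], 6: [3]}
INITIAL = frozenset({("x", "?"), ("y", "?"), ("z", "?")})

def reaching_definitions():
    """Iterate the equations from bottom until nothing changes."""
    entry = {l: frozenset() for l in BLOCKS}
    exit_ = {l: frozenset() for l in BLOCKS}
    changed = True
    while changed:
        changed = False
        for l in BLOCKS:
            if l == 1:
                new_entry = INITIAL
            else:
                new_entry = frozenset().union(*(exit_[p] for p in PREDS[l]))
            new_exit = BLOCKS[l](new_entry)
            if (new_entry, new_exit) != (entry[l], exit_[l]):
                entry[l], exit_[l] = new_entry, new_exit
                changed = True
    return entry, exit_

RD_ENTRY, RD_EXIT = reaching_definitions()
```

At the loop test both the definitions made before the loop and those made in its body reach label 3, which is the imprecision that slide 13 describes as "may include more definitions".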

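  35. Local Soundness
      • For every atomic statement S show one of the following
        • $\alpha(\{ [\![S]\!](s, d) \mid (s, d) \in CS \}) \subseteq [\![S]\!]^\#(\alpha(CS))$
        • $\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \} \subseteq \gamma([\![S]\!]^\#(RD))$
        • $\alpha(\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \}) \subseteq [\![S]\!]^\#(RD)$
      • The above condition implies global soundness [Cousot & Cousot 1976]: $\alpha(CS_{entry}(l)) \subseteq RD_{entry}(l)$, equivalently $CS_{entry}(l) \subseteq \gamma(RD_{entry}(l))$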

  36. Proof of Soundness (Summary) • Define an “appropriate” structural operational semantics • Define “collecting” structural operational semantics • Establish a Galois connection between collecting states and reaching definitions • (Local correctness) Show that the abstract interpretation of every atomic statement is sound w.r.t. the collecting semantics • (Global correctness) Conclude that the analysis is sound

  37. Induced Analysis (Relatively Optimal)
      • It is sometimes possible to show that a given analysis is not only sound but also optimal w.r.t. the chosen abstraction (though not necessarily optimal in absolute terms)
      • Define $[\![S]\!]^\#(RD) = \alpha(\{ [\![S]\!](s, d) \mid (s, d) \in \gamma(RD) \})$
      • But this $[\![S]\!]^\#$ may not be computable
      • Derive (at compiler-generation time) an alternative form for $[\![S]\!]^\#$
      • A useful measure to decide if the abstraction must lead to overly imprecise results
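On a finite toy universe the induced transfer function can be computed literally as abstraction after concrete execution after concretization, and compared with the hand-written one. The sketch below assumes two variables, values restricted to {0, 1}, labels {?, 1}, and a single assignment with a constant right-hand side; every name and encoding in it is an assumption made for the illustration.

```python
from itertools import combinations, product

VARS = ("x", "y")
VALUES = (0, 1)      # assumption: a finite stand-in for the integers
LABS = ("?", 1)

def functions(domain, codomain):
    """All total functions domain -> codomain, each as a tuple of pairs."""
    return [tuple(zip(domain, image))
            for image in product(codomain, repeat=len(domain))]

STATES = [(s, d) for s in functions(VARS, VALUES) for d in functions(VARS, LABS)]
PAIRS = list(product(VARS, LABS))
ALL_RD = [frozenset(c) for r in range(len(PAIRS) + 1)
          for c in combinations(PAIRS, r)]

def alpha(cs):
    return frozenset(pair for _s, d in cs for pair in d)

def gamma(rd):
    return [(s, d) for s, d in STATES if set(d) <= rd]

def concrete_assign(state, x, value, label):
    """Concrete semantics of [x := value]^label on one instrumented state."""
    s, d = state
    s2 = tuple((v, value if v == x else n) for v, n in s)
    d2 = tuple((v, label if v == x else l) for v, l in d)
    return (s2, d2)

def abstract_assign(rd, x, label):
    """The hand-written abstract semantics from the slides."""
    return frozenset(p for p in rd if p[0] != x) | {(x, label)}

def induced_assign(rd, x, value, label):
    """alpha applied to the concrete images of all states in gamma(rd)."""
    return alpha(concrete_assign(st, x, value, label) for st in gamma(rd))

def covers_all_vars(rd):
    return {v for v, _ in rd} == set(VARS)

sound = all(induced_assign(rd, "x", 1, 1) <= abstract_assign(rd, "x", 1)
            for rd in ALL_RD)
exact_when_covered = all(induced_assign(rd, "x", 1, 1) == abstract_assign(rd, "x", 1)
                         for rd in ALL_RD if covers_all_vars(rd))
```

In this toy setting the hand-written function is always at least as large as the induced one, and the two coincide whenever RD contains a definition for every variable; when some variable has no definition in RD the concretization is empty, so the induced result is empty as well.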

  38. Type and Effect Systems • The type of a program expression at a given program point provides a conservative estimate of its value over all execution paths • A type system provides syntax-directed rules for annotating expressions with types • Simple type inference algorithms are linear • But in Ada, ML, ABC… • But types can also include implementation information such as reaching definitions

  39. Annotated Type Base for Reaching Definitions • $S : RD_1 \to RD_2$ means: if S is executed when the reaching definitions are $RD_1$, it produces the reaching definitions $RD_2$ • Similar to the constraint based approach

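  40. Annotated Type Base for Reaching Definitions
      Axioms:
      • [ass] $[x := a]^{l'} : RD \to (RD - \{ (x, l) \mid l \in \mathrm{Lab}_* \}) \cup \{ (x, l') \}$
      • [skip] $[\mathtt{skip}]^l : RD \to RD$
      Rules:
      • [seq] $\dfrac{S_1 : RD_1 \to RD_2 \qquad S_2 : RD_2 \to RD_3}{S_1; S_2 : RD_1 \to RD_3}$
      • [if] $\dfrac{S_1 : RD_1 \to RD_2 \qquad S_2 : RD_1 \to RD_2}{\mathtt{if}\ [b]^l\ \mathtt{then}\ S_1\ \mathtt{else}\ S_2 : RD_1 \to RD_2}$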

  41. [wh] S : RD RD while [b]l do S: RD RD Annotated Type Base For Whilewhile construct

  42. [sub] S : RD2RD3 S: RD1 RD4 if RD1RD2 and RD3RD4 Annotated Type Base For Whilesubsumption rule

  43. Not Covered • Effect Systems • Transformations

  44. Conclusions • Three similar techniques • Dataflow analysis • Constraint based approach (a generalization) • Type and effect system (directly deals with the syntax) • Abstract interpretation can be used to show soundness of these methods • But more convenient in the dataflow setting • We are ready for more sophisticated analyses
