Interprocedural Analysis

Noam Rinetzky, Mooly Sagiv • http://www.cs.tau.ac.il/~msagiv/courses/pa05.html • Tel Aviv University, 640-6706 • Textbook: Chapter 2.5

Outline • Challenges in interprocedural analysis • The trivial solution • Why it isn’t adequate • Simplifying assumptions • A naive solution • Join over valid paths • The call-string approach • The functional approach • A case study: linear constant propagation • Context-free reachability • Modularity issues • Other solutions

Challenges in Interprocedural Analysis • Respecting the call-return mechanism • Handling recursion • Local variables • Parameter-passing mechanisms: value, value-result, reference, by name • Procedure nesting • The called procedure is not always known • The source code of the called procedure is not always available: • separate compilation • vendor code • ...

A Trivial Treatment of Procedures • Analyze a single procedure at a time • After every call, continue with conservative information • Global variables and local variables that "may be modified by the call" are mapped to ⊤

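A Trivial Treatment of Procedures: Example
begin
  proc p() is^1 [x := 1]^2 end^3;
  [call p()]^4_5;
  [print x]^6
end
• In p: [a ↦ ⊤, x ↦ ⊤] at the entry, [a ↦ ⊤, x ↦ 1] after [x := 1]^2
• In the main program: [x ↦ 0] at the start and before the call, [x ↦ ⊤] after the call and at [print x]^6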
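The trivial treatment can be sketched for constant propagation in C as follows. The value encoding, the variable numbering, and the function names are assumptions made for illustration. The `modified` mask stands in for whatever a side-effect analysis reports about the callee.

```c
#include <stdbool.h>

/* Constant-propagation value: a known integer constant or TOP (unknown). */
typedef enum { CONST, TOP } Kind;
typedef struct { Kind kind; int value; } Val;

enum { VAR_A, VAR_X, NVARS };          /* hypothetical numbering of a and x */
typedef struct { Val v[NVARS]; } Env;

static Val constant(int c) { Val r = { CONST, c }; return r; }
static Val top(void)       { Val r = { TOP, 0 };   return r; }

/* Intraprocedural transfer function for [var := c]. */
static Env assign_const(Env e, int var, int c) {
    e.v[var] = constant(c);
    return e;
}

/* Trivial treatment of [call p()]: the body of p is never analyzed.
   Every variable that may be modified by the call (modified[i] == true)
   is mapped to TOP; all other facts survive the call unchanged. */
static Env call_trivial(Env e, const bool modified[NVARS]) {
    for (int i = 0; i < NVARS; i++)
        if (modified[i])
            e.v[i] = top();
    return e;
}
```

Suppose the mask contains only x, because p assigns x. Then the main program's [x ↦ 0] becomes [x ↦ ⊤] after [call p()]^4_5, even though p always sets x to 1.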

Advantages of the trivial solution • Can be easily implemented • Procedures can be written in different languages • Procedure inlining can help • Side-effect analysis can help

Disadvantages of the trivial solution • Modular (object-oriented and functional) programming encourages small, frequently called procedures • Optimization: • Modern machines allow the compiler to schedule many instructions in parallel • The compiler needs to optimize many instructions • Inlining can be a bad solution • Software engineering: • Many bugs result from interface misuse • Procedures define partial functions

Simplifying Assumptions • All the code is available • Simple parameter passing • The called procedure is syntactically known • No nesting • Procedure names are syntactically different from variables • Procedures are uniquely defined • Recursion is supported

Constant Example
begin
  proc p() is^1
    if [b]^2 then (
      [a := a - 1]^3;
      [call p()]^4_5;
      [a := a + 1]^6
    );
    [x := -2*a + 5]^7
  end^8;
  [a := 7]^9;
  [call p()]^10_11;
  [print(x)]^12
end
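A concrete C rendering of the Constant Example shows which answer a precise analysis should reach. The nondeterministic test [b]^2 is replaced by a hypothetical recursion budget `depth`, so each run follows one particular path. Whatever the depth, the decrement before each recursive call is undone by the increment after it. The outermost [x := -2*a + 5]^7 therefore always sees a = 7, giving x = -2 * 7 + 5 = -9.

```c
/* Globals of the Constant Example. */
static int a, x;

/* proc p(): the unknown test [b] is modelled by a recursion budget. */
static void p(int depth) {
    if (depth > 0) {          /* [b]^2 */
        a = a - 1;            /* [a := a - 1]^3 */
        p(depth - 1);         /* [call p()]^4_5 */
        a = a + 1;            /* [a := a + 1]^6 */
    }
    x = -2 * a + 5;           /* [x := -2*a + 5]^7 */
}

/* Main program for a given recursion depth; returns the value printed. */
static int run(int depth) {
    a = 7;                    /* [a := 7]^9 */
    p(depth);                 /* [call p()]^10_11 */
    return x;                 /* [print(x)]^12 */
}
```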

A naive Interprocedural solution • Treat procedure calls as gotos • Obtain a conservative solution • Find the least fixed point of the system
$DF_{entry}(s) = \iota$
$DF_{entry}(v) = \bigsqcup \{ f(e)(DF_{entry}(u)) : (u, v) \in E \}$
where s is the start node and ι the initial abstract value • Use chaotic iterations

Simple Example [x, a] begin proc p() is1 [x := a + 1]2 end3 [a=7]4 [call p()]56 [print x]7 [a=9]8 [call p()]910 [print a]11 end [x0, a0] a=7 [x8, a9] [x0, a7] [x0, a7] call p5 proc p [x0, a7] [x8, a7] x=a+1 call p6 [x8, a7] [x8, a7] print x end [x8, a7] a=9 [x8, a9] call p9 call p10 [x8, a7] print a

Simple Example [x, a] begin proc p() is1 [x := a + 1]2 end3 [a=7]4 [call p()]56 [print x]7 [a=9]8 [call p()]910 [print a]11 end [x0, a0] a=7 [x0, a7] call p5 proc p [x, a] [x, a] x=a+1 call p6 [x, a] [x, a] print x end [x, a] a=9 [x, a9] call p9 call p10 [x, a] print a

We want something better … • Let paths(v) denote the potentially infinite set of paths from start to v (written as sequences of labels) • For a sequence of edges [e1, e2, …, en] define $f_{[e_1, e_2, \ldots, e_n]} : L \to L$ by composing the effects of basic blocks: $f_{[e_1, e_2, \ldots, e_n]}(l) = f(e_n)(\ldots f(e_2)(f(e_1)(l)) \ldots)$ • $JOP[v] = \bigsqcup \{ f_{[e_1, e_2, \ldots, e_n]}(\iota) \mid [e_1, e_2, \ldots, e_n] \in paths(v) \}$
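Path composition and the join over paths can be sketched in C for an explicitly given, finite set of paths. Real path sets are usually infinite, which is why the JOP is a specification rather than an algorithm. Assumptions: a single-variable constant lattice BOT < constant < TOP, transfer functions as function pointers, and the hypothetical edge effects `set1`, `set2`, `incr`.

```c
#include <stddef.h>

/* Constant propagation for a single variable: BOT < CONST c < TOP. */
typedef enum { BOT, CONST, TOP } Kind;
typedef struct { Kind kind; int value; } Val;
typedef Val (*Transfer)(Val);               /* f(e) for one edge e */

static Val join(Val p, Val q) {
    if (p.kind == BOT) return q;
    if (q.kind == BOT) return p;
    if (p.kind == CONST && q.kind == CONST && p.value == q.value) return p;
    Val t = { TOP, 0 };
    return t;
}

/* f_[e1,...,en](l) = f(en)(... f(e2)(f(e1)(l)) ...) */
static Val apply_path(const Transfer *path, size_t n, Val l) {
    for (size_t i = 0; i < n; i++)
        l = path[i](l);
    return l;
}

/* Join, over the given paths, of each path's composed function applied to iota. */
static Val jop(const Transfer **paths, const size_t *lens,
               size_t npaths, Val iota) {
    Val acc = { BOT, 0 };
    for (size_t k = 0; k < npaths; k++)
        acc = join(acc, apply_path(paths[k], lens[k], iota));
    return acc;
}

/* Hypothetical edge effects for the usage example. */
static Val set1(Val v) { (void)v; Val r = { CONST, 1 }; return r; }
static Val set2(Val v) { (void)v; Val r = { CONST, 2 }; return r; }
static Val incr(Val v) { if (v.kind == CONST) v.value += 1; return v; }
```

Two paths [set1, incr] and [set2] both produce the constant 2, so their join is 2. Replacing the second path with [set1] makes the join ⊤.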

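Valid Paths • Figure: a path that enters procedure q through the edge call_q → enter_q and leaves through exit_q → ret. The call and return edges form a matching pair of parentheses "(" and ")", and the path's edges carry transfer functions f1, f2, …, fk.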

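Invalid Path
    #include <stdlib.h>

    int x;

    /* hypothetical stand-in for the unspecified condition (...) */
    static int nondet(void) { return rand() % 2; }

    void p(void) {
      if (nondet()) {
        x = x + 1;
        p(); // p_calls_p1
        x = x - 1;
      }
      return;
    }

    int main(void) {
      x = 5;
      p();
      return 0;
    }
An example of an invalid path is one that enters p from main's call and then leaves p's exit for the point after p_calls_p1. That return does not match the call through which p was entered.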

A More Precise Solution • Only considers matching calls and returns (valid paths) • Can be defined via a context-free grammar • Every call is a different letter • Matching calls and returns:
$Matched \to \epsilon \mid Matched\ Matched \mid (_c\ Matched\ )_c$ for all $[\mathit{call}\ p()]^{l_c}_{l_r}$ in P
$Valid \to Matched \mid (_c\ Valid$ for all $[\mathit{call}\ p()]^{l_c}_{l_r}$ in P

A More Precise Solution (grammar over edges) • Let $Lab_*$ be the set of all labels in the program and $Lab_{IP} = \{ l_c, l_r : [\mathit{call}\ p()]^{l_c}_{l_r} \text{ in the program} \}$
$Intra \to \epsilon \mid (l_i, l_j)\ Intra$ for all $l_i, l_j \in Lab_* \setminus Lab_{IP}$
$Matched \to \epsilon \mid Intra \mid Matched\ Matched \mid (l_c, l_n)\ Matched\ (l_x, l_r)$ for all $[\mathit{call}\ p()]^{l_c}_{l_r}$, where p is $\mathit{proc}\ p()\ \mathit{is}^{l_n}\ S\ \mathit{end}^{l_x}$
$Valid \to Matched \mid (l_c, l_n)\ Valid$ for all $[\mathit{call}\ p()]^{l_c}_{l_r}$, where p is $\mathit{proc}\ p()\ \mathit{is}^{l_n}\ S\ \mathit{end}^{l_x}$
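Membership in the Valid language can be checked with a stack, because the grammar only requires each return to close the most recent pending call. In this C sketch, a path is encoded as integers: 0 for an intraprocedural edge, +c for the call edge of call site c, and -c for the matching return edge. That encoding and the stack bound `MAX_DEPTH` are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Path letters:  0 = intraprocedural edge (l_i, l_j)
                 +c = call edge (l_c, l_n) of call site c   (c > 0)
                 -c = return edge (l_x, l_r) to call site c              */
#define MAX_DEPTH 64   /* assumption: bound on pending calls in this sketch */

/* True iff the path is Valid: every return matches the most recent
   pending call; calls may remain pending at the end. (Requiring an
   empty stack at the end would test Matched instead.) */
static bool is_valid_path(const int *letters, size_t n) {
    int stack[MAX_DEPTH];
    size_t top = 0;
    for (size_t i = 0; i < n; i++) {
        int l = letters[i];
        if (l > 0) {
            if (top == MAX_DEPTH) return false;   /* sketch limit exceeded */
            stack[top++] = l;
        } else if (l < 0) {
            if (top == 0 || stack[top - 1] != -l) return false;
            top--;
        }
    }
    return true;
}
```

For the Invalid Path program, number main's call site 1 and p_calls_p1 site 2 (hypothetical numbering). The path {0, 1, 0, -2} enters p from main and returns after p_calls_p1, and it is rejected.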

The Join-Over-Valid-Paths (JVP) • For a sequence of edges [e1, e2, …, en] define $f_{[e_1, e_2, \ldots, e_n]} : L \to L$ by composing the effects of basic statements: • $f_{[\,]}(s) = s$ • $f_{[e, p]}(s) = f_{[p]}(f_e(s))$ • $JVP_l = \bigsqcup \{ f_{[e_1, e_2, \ldots, e]}(\iota) \mid [e_1, e_2, \ldots, e] \in vpaths(l),\ e = (*, l) \}$ • Compute a safe approximation to the JVP • In some cases the JVP can be computed exactly, using: • distributivity of f • a functional representation

The Call String Approach for Approximating JVP • No assumptions • Record at every node pairs (l, c), where l ∈ L is the dataflow information and c is a suffix of the unmatched calls (the call string) • Use chaotic iterations • To guarantee termination, limit the length of c (typically to 1 or 2) • Emulates inlining (but without code growth) • Exponential in the bound C on the length of c • For a finite lattice, there exists a bound C that yields the join over all valid paths

Simple Example (call strings of length 1)
begin
  proc p() is^1 [x := a + 1]^2 end^3;
  [a := 7]^4; [call p()]^5_6; [print x]^7;
  [a := 9]^8; [call p()]^9_10; [print a]^11
end
Figure: facts inside p are tagged with the call site they came from. At p's entry: 5:[x ↦ 0, a ↦ 7] and 9:[x ↦ 8, a ↦ 9]. At p's exit: 5:[x ↦ 8, a ↦ 7] and 9:[x ↦ 10, a ↦ 9]. Each return site takes only its own context, so print x sees [x ↦ 8, a ↦ 7] and print a sees [x ↦ 10, a ↦ 9].
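The call-string idea with strings of length 1 can be sketched in C for the Simple Example: p's entry and exit facts are kept separately per calling context. Assumptions: contexts are indexed by the hypothetical `SITE5`/`SITE9` constants, and the initial state is [x ↦ 0, a ↦ 0]. Because the example has neither recursion nor loops, one pass in program order already reaches the fixed point, so the sketch omits the chaotic-iteration loop.

```c
#include <stdbool.h>

typedef enum { CONST, TOP } Kind;
typedef struct { Kind kind; int value; } Val;
typedef struct { bool reached; Val x, a; } Env;   /* reached == false is bottom */

static Val cst(int c) { Val v = { CONST, c }; return v; }
static Val join_val(Val p, Val q) {
    if (p.kind == CONST && q.kind == CONST && p.value == q.value) return p;
    Val t = { TOP, 0 };
    return t;
}
static Env join_env(Env p, Env q) {
    if (!p.reached) return q;
    if (!q.reached) return p;
    Env r = { true, join_val(p.x, q.x), join_val(p.a, q.a) };
    return r;
}

/* Body of p: [x := a + 1]^2 */
static Env p_body(Env e) {
    if (e.reached)
        e.x = e.a.kind == CONST ? cst(e.a.value + 1) : e.a;
    return e;
}

/* Call strings of length 1: one entry/exit fact per call site. */
enum { SITE5, SITE9, NSITES };
static Env entry_ctx[NSITES], exit_ctx[NSITES];   /* zero-initialized = bottom */

/* [call p()] at `site`: join the caller's fact into p's entry for this
   context, analyze p in that context, and return only that context's
   exit fact to the matching return site. */
static Env call_p(int site, Env caller) {
    entry_ctx[site] = join_env(entry_ctx[site], caller);
    exit_ctx[site] = p_body(entry_ctx[site]);
    return exit_ctx[site];
}

static void analyze_main(Env *at_print_x, Env *at_print_a) {
    Env e = { true, cst(0), cst(0) };     /* [x -> 0, a -> 0] */
    e.a = cst(7);                         /* [a := 7]^4 */
    e = call_p(SITE5, e);                 /* [call p()]^5_6 */
    *at_print_x = e;                      /* [print x]^7 */
    e.a = cst(9);                         /* [a := 9]^8 */
    e = call_p(SITE9, e);                 /* [call p()]^9_10 */
    *at_print_a = e;                      /* [print a]^11 */
}
```

Joining both contexts into a single entry fact would give the naive result [x ↦ ⊤, a ↦ ⊤]. Keeping them apart recovers x ↦ 8 at print x.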

10:[x0, a7] Recursive Example 4:[x0, a6] begin0 proc p() is1 if [b]2 then ( [a := a -1]3 [call p()]45 [a := a + 1]6 ) [x := -2* a + 5]7 end8 [a=7]9 ; [call p()]1011 ; [print(x)]12 end13 p [x0, a0] If( … ) 10:[x0, a7] a=7 [x0, a7] a=a-1 10:[x0, a6] Call p10 Call p4 10:[x-7, a6] Call p11 4:[x-7, a6] Call p5 print(x) 4:[x-7, a6] a=a+1 4:[x0, a6] 4:[x-7, a7] x=-2a+5 4:[x, a] 4:[x-7, a6] end

The Functional Approach • The meaning of a function is a mapping from states to states • The abstract meaning of a function is a function from abstract states to abstract states

Motivating Example (functional approach), for the program of the Constant Example • Summary of p: $\lambda e.\, [x \mapsto -2e(a) + 5,\ a \mapsto e(a)]$ • Figure: after [a := 7]^9 the state is [x ↦ 0, a ↦ 7]. Applying the summary at [call p()]^10_11 gives [x ↦ -9, a ↦ 7], so [print(x)]^12 prints the constant -9.

Motivating Example with [read(a)]^9 in place of [a := 7]^9 • Summary of p (unchanged): $\lambda e.\, [x \mapsto -2e(a) + 5,\ a \mapsto e(a)]$ • Figure: after [read(a)]^9 the state is [x ↦ 0, a ↦ ⊤]. Applying the summary gives [x ↦ ⊤, a ↦ ⊤] at [print(x)]^12.

The Functional Approach • Main idea: iterate on the abstract domain of functions from L to L • Two-phase algorithm: • Phase 1: compute the dataflow solution at the exit of a procedure as a function of the initial values at the procedure entry (functional values) • Phase 2: compute the dataflow values at every point using the functional values • Can compute the JVP
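The two phases can be sketched in C for the Motivating Example. A summary maps each variable to BOT, TOP, or a single linear term coef·e(var) + off of p's entry state e, where coef 0 encodes a constant. Two different terms are assumed to join to TOP. That is coarser than the representation on the Linear Constant Propagation slide, but it keeps every chain finite. Phase 1 iterates φ = T7 ⊔ T7 ∘ T6 ∘ φ ∘ T3 from λe.⊥, where T3, T6, T7 are the transfer functions of labels 3, 6, 7. Phase 2 applies φ to the fact at the call site. All names are hypothetical.

```c
#include <stdbool.h>

typedef enum { BOT, LIN, TOP } Kind;
/* LIN means coef * e(var) + off; coef == 0 encodes the constant off. */
typedef struct { Kind kind; int var; long coef, off; } Lin;

enum { X, A, NV };                   /* program variables x and a */
typedef struct { Lin out[NV]; } Fn;  /* lambda e.[x -> out[X], a -> out[A]] */

static Lin lin(int var, long coef, long off) {
    Lin r = { LIN, coef == 0 ? 0 : var, coef, off };
    return r;
}
static Lin top(void) { Lin r = { TOP, 0, 0, 0 }; return r; }
static Lin bot(void) { Lin r = { BOT, 0, 0, 0 }; return r; }

static bool lin_eq(Lin p, Lin q) {
    if (p.kind != q.kind) return false;
    return p.kind != LIN || (p.var == q.var && p.coef == q.coef && p.off == q.off);
}
/* Equal terms stay; different terms go to TOP (simplification). */
static Lin lin_join(Lin p, Lin q) {
    if (p.kind == BOT) return q;
    if (q.kind == BOT) return p;
    return lin_eq(p, q) ? p : top();
}
/* Substitute f's output for the variable g refers to. */
static Lin lin_subst(Lin g, const Fn *f) {
    if (g.kind != LIN || g.coef == 0) return g;
    Lin in = f->out[g.var];
    if (in.kind != LIN) return in;   /* BOT or TOP propagates */
    return lin(in.var, g.coef * in.coef, g.coef * in.off + g.off);
}

static Fn compose(Fn g, Fn f) {      /* (g o f)(e) = g(f(e)) */
    Fn r;
    for (int i = 0; i < NV; i++) r.out[i] = lin_subst(g.out[i], &f);
    return r;
}
static Fn fn_join(Fn f, Fn g) {
    Fn r;
    for (int i = 0; i < NV; i++) r.out[i] = lin_join(f.out[i], g.out[i]);
    return r;
}
static bool fn_eq(Fn f, Fn g) {
    for (int i = 0; i < NV; i++)
        if (!lin_eq(f.out[i], g.out[i])) return false;
    return true;
}

static Fn assign_a(long off) {       /* [a := a + off] (T3: -1, T6: +1) */
    Fn f = { { lin(X, 1, 0), lin(A, 1, off) } };
    return f;
}
static Fn assign_x(void) {           /* T7: [x := -2*a + 5]^7 */
    Fn f = { { lin(A, -2, 5), lin(A, 1, 0) } };
    return f;
}

/* Phase 1: least solution of phi = T7 |_| T7 o T6 o phi o T3. */
static Fn summarize_p(void) {
    Fn phi = { { bot(), bot() } };
    for (;;) {
        Fn skip = assign_x();                                  /* [b] false */
        Fn rec = compose(assign_x(),
                         compose(assign_a(1), compose(phi, assign_a(-1))));
        Fn next = fn_join(skip, rec);
        if (fn_eq(next, phi)) return phi;
        phi = next;
    }
}

/* Phase 2: apply the summary to a call-site fact (a constant function). */
static Fn apply(Fn summary, Fn state) { return compose(summary, state); }
```

Phase 1 stabilizes on the second round at λe.[x ↦ -2e(a) + 5, a ↦ e(a)], the summary on the Motivating Example slide. Applying it to [x ↦ 0, a ↦ 7] yields x ↦ -9. For the read(a) variant, [x ↦ 0, a ↦ ⊤] yields x ↦ ⊤.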

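Example: Constant propagation • $L = Var_* \to \mathbb{Z} \cup \{\bot, \top\}$ • Domain: $F : L \to L$ • $(f_1 \sqcup f_2)(x) = f_1(x) \sqcup f_2(x)$ • $Id = \lambda env \in L.\, env$ • Figure: for the statements x=7; y=x+1; x=y, the functional values accumulate by composition. After x=7: $\lambda env.\, env[x \mapsto 7] \circ \lambda env.\, env$. After y=x+1: $\lambda env.\, env[y \mapsto env(x)+1] \circ \lambda env.\, env[x \mapsto 7] \circ \lambda env.\, env$.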

Example: Constant propagation (continued) • Figure: starting from $Id = \lambda env.\, env$, each statement is labeled with its own transformer: x=7 with $\lambda env.\, env[x \mapsto 7]$ and y=x+1 with $\lambda env.\, env[y \mapsto env(x)+1]$
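Working out the composition for the statements x=7; y=x+1; x=y shows how the functional value simplifies to a closed form. The transformer $f_{x=y} = \lambda env.\, env[x \mapsto env(y)]$ for the last statement is written in the same style as the others; it is not taken from the slides.

```latex
\begin{aligned}
f_{y=x+1} \circ f_{x=7}
  &= (\lambda env.\, env[y \mapsto env(x)+1]) \circ (\lambda env.\, env[x \mapsto 7]) \\
  &= \lambda env.\, env[x \mapsto 7][y \mapsto env[x \mapsto 7](x) + 1] \\
  &= \lambda env.\, env[x \mapsto 7][y \mapsto 8] \\
f_{x=y} \circ f_{y=x+1} \circ f_{x=7}
  &= \lambda env.\, env[x \mapsto 7][y \mapsto 8][x \mapsto 8]
   = \lambda env.\, env[x \mapsto 8,\ y \mapsto 8]
\end{aligned}
```

After the three statements, x and y are both the constant 8, whatever the incoming env.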

Running Example (figures for the steps labeled init, 1 and 2) • The control-flow graph of the Motivating Example. Main's nodes are begin^0, [a := 7]^9, [call p()]^10_11, [print(x)]^12, end^13. p's nodes are is^1, if( … )^2, [a := a - 1]^3, [call p()]^4_5, [a := a + 1]^6, [x := -2*a + 5]^7, end^8.

Issues in Functional Approach • How can finite height be guaranteed for the functional lattice? • L may have finite height while the lattice of monotone functions from L to L does not • Efficiently representing functions: • functional join • functional composition • testing equality • Usually non-trivial • But can be done for distributive functions

Example: Linear Constant Propagation • Consider the constant propagation lattice • The value of every variable y at the program exit can be represented by $y = \bigsqcup \{ (a_x x + b_x) \mid x \in Var_* \} \sqcup c$, where $a_x, c \in \mathbb{Z} \cup \{\bot, \top\}$ and $b_x \in \mathbb{Z}$ • Supports efficient composition and "functional" join • [z := a * y + b] • What about [z := x + y]? • Computes the JVP
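Composition stays within this representation because substituting one linear term into another yields a linear term again. The derivation below assumes an earlier assignment [y := c * w + d] followed by [z := a * y + b], with integer coefficients a, b, c, d and distinct variables w, y, z (illustrative names, not from the slides).

```latex
\begin{aligned}
f_{z := a y + b} &= \lambda e.\, e[z \mapsto a \cdot e(y) + b], \qquad
f_{y := c w + d} = \lambda e.\, e[y \mapsto c \cdot e(w) + d] \\
f_{z := a y + b} \circ f_{y := c w + d}
  &= \lambda e.\, e[y \mapsto c \cdot e(w) + d][z \mapsto a \cdot (c \cdot e(w) + d) + b] \\
  &= \lambda e.\, e[y \mapsto c \cdot e(w) + d][z \mapsto (a c) \cdot e(w) + (a d + b)]
\end{aligned}
```

The rule relies on each right-hand side depending on a single variable, which is exactly what [z := x + y] violates.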

Functional Approach via Context-Free Reachability • The problem of computing reachability in a graph restricted by a context-free grammar can be solved in cubic time • Can be used to compute the JVP in arbitrary finite distributive dataflow problems (not just bit-vector problems) • Nodes in the graph correspond to individual facts • Efficient implementations exist (MOPED)
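The following C sketch shows context-free reachability for the valid-paths grammar (Matched → ε | Matched Matched | (_c Matched )_c and Valid → Matched | (_c Valid). It runs on the Simple Example's graph with program points as nodes. In the setting on the slide, nodes would instead be individual facts, but the closure rules are the same. Assumptions: node numbers are the example's labels, and the fixpoint loop is a naive one, simpler but slower than the cubic algorithm.

```c
#include <stdbool.h>
#include <string.h>

/* Simple Example graph (node number = label):
   main: 4 -> 5 (call) ... 6 (return) -> 7 -> 8 -> 9 (call) ... 10 (return) -> 11
   p:    1 (entry) -> 2 -> 3 (exit)                                          */
#define N 12
typedef struct { int from, to, site; } Edge;   /* site: call-site id of ( or ) */

static const int intra[][2] = { {4,5}, {1,2}, {2,3}, {6,7}, {7,8}, {8,9}, {10,11} };
static const Edge opens[]  = { {5, 1, 5}, {9, 1, 9} };    /* (_c : call -> entry  */
static const Edge closes[] = { {3, 6, 5}, {3, 10, 9} };   /* )_c : exit -> return */
#define COUNT(arr) (sizeof (arr) / sizeof (arr)[0])

/* matched[u][v]: some path u ->* v spells a word derived from Matched. */
static bool matched[N][N];

static void solve_matched(void) {
    memset(matched, 0, sizeof matched);
    for (int v = 0; v < N; v++) matched[v][v] = true;
    for (size_t i = 0; i < COUNT(intra); i++) matched[intra[i][0]][intra[i][1]] = true;
    bool changed = true;
    while (changed) {                 /* naive fixpoint over both rules */
        changed = false;
        for (size_t i = 0; i < COUNT(opens); i++)          /* (_c Matched )_c */
            for (size_t j = 0; j < COUNT(closes); j++) {
                const Edge *o = &opens[i], *c = &closes[j];
                if (o->site == c->site && matched[o->to][c->from] && !matched[o->from][c->to]) {
                    matched[o->from][c->to] = true;
                    changed = true;
                }
            }
        for (int u = 0; u < N; u++)                        /* Matched Matched */
            for (int v = 0; v < N; v++)
                if (matched[u][v])
                    for (int w = 0; w < N; w++)
                        if (matched[v][w] && !matched[u][w]) {
                            matched[u][w] = true;
                            changed = true;
                        }
    }
}

/* valid[v]: v is reachable from s by a path derived from Valid. */
static void solve_valid(int s, bool valid[N]) {
    for (int v = 0; v < N; v++) valid[v] = matched[s][v];
    bool changed = true;
    while (changed) {                 /* Valid -> (_c Valid */
        changed = false;
        for (size_t i = 0; i < COUNT(opens); i++)
            if (valid[opens[i].from])
                for (int v = 0; v < N; v++)
                    if (matched[opens[i].to][v] && !valid[v]) {
                        valid[v] = true;
                        changed = true;
                    }
    }
}
```

Here matched[5][6] holds because p's body connects the matching pair (_5 and )_5. Return site 6 is not Valid-reachable from p's entry 1, since that would need an unmatched return.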

Conclusion • Handling functions is crucial for abstract interpretation • Virtual functions and exceptions complicate things • But scalability is an issue • Assume-guarantee helps • But relies on specifications
