
Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability

Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability. Seth Hallem and Eric Watkins. Exhaustive Analysis Papers. “Precise Interprocedural Dataflow Analysis via Graph Reachability” Reps, Horwitz, Sagiv -- POPL 1995


Presentation Transcript


  1. Context-Sensitive, Interprocedural Dataflow Analysis as CFL Reachability Seth Hallem and Eric Watkins

  2. Exhaustive Analysis Papers • “Precise Interprocedural Dataflow Analysis via Graph Reachability” • Reps, Horwitz, Sagiv -- POPL 1995 • applies CFL reachability to context-sensitive, interprocedural dataflow analysis • “Program Analysis via Graph Reachability” • Reps -- ILPS 1997 • describes two additional applications: interprocedural program slicing and shape analysis

  3. The Reduction to CFL Reachability • Question 1: What problems can we solve? • Question 2: How do we set up the problem? • Question 3: How do we solve the problem? • Question 4: What is the complexity of this approach? • Running example: possibly uninitialized variables

  4. What problems can we solve? • IFDS problems (interprocedural, finite, distributive, subset problems) • Finite set of dataflow facts (D) • Mapping from edges in the CFG to dataflow functions $f : 2^D \to 2^D$ • Each $f$ is distributive with respect to the meet operator $\sqcap$ (set union or intersection): • $f(a \sqcap b) = f(a) \sqcap f(b)$ • Possibly uninitialized vars: • Each program variable corresponds to a dataflow fact. When that fact holds, the variable may be uninitialized. • Transfer functions: a variable is possibly uninitialized if it was just declared without an initializer or if it is assigned an expression containing possibly uninitialized variables.
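
The transfer functions described above can be sketched in Python. This is a minimal illustration, not code from the papers: the helper names (`assign`, `is_distributive`, `powerset`) are hypothetical, the meet is assumed to be set union, and distributivity is checked by brute force over the small fact set from the running example.

```python
from itertools import combinations

D = frozenset({"x", "y", "z"})  # one dataflow fact per program variable

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def assign(target, used):
    """Transfer function for `target = expr`; `used` = variables read by expr."""
    used = frozenset(used)
    def f(facts):
        facts = frozenset(facts)
        if facts & used:                 # expr reads a possibly uninitialized var
            return facts | {target}
        return facts - {target}          # target is now definitely initialized
    return f

def is_distributive(f, domain):
    """Brute-force check of f(a | b) == f(a) | f(b) for all subsets a, b."""
    subsets = powerset(domain)
    return all(f(a | b) == f(a) | f(b) for a in subsets for b in subsets)

y_plus_x = assign("y", {"y", "x"})       # y = y + x
z_zero = assign("z", set())              # z = 0

after_first = y_plus_x(frozenset({"x", "z"}))
after_second = z_zero(after_first)
```

Starting from {x, z}, the two assignments reproduce the fact sets annotated in the simple example on the next slide.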

  5. Simple Example
         int z;
         int main (void) {
             int x, y = 0;   /* {x, z} */
             y = y + x;      /* {x, y, z} */
             z = 0;          /* {x, y} */
         }
     • D = {x, y, z}; the domain and range of each transfer function is the power set of D ($2^D$)

  6. How do we set up and solve IFDS problems? • Inputs to the algorithm: • Exploded supergraph (next couple of slides) • Outputs from the algorithm: • meet-over-all-realizable-paths solution: • $MRP_n = \bigsqcap_{q \in RPaths(start_{main}, n)} pf_q(\top)$, where $pf_q$ is the composition of the transfer functions along path $q$

  7. The Supergraph

  8. Representation Relations • Each dataflow function, $f$, is converted to a representation relation, which is represented as a graph consisting of $2|D| + 2$ nodes • $|D|$ input nodes, one for each dataflow fact, plus the node Λ (or 0), which corresponds to the empty set. • $|D|$ output nodes plus the node Λ • There is an edge from input node $d_1$ to output node $d_2$ if $d_2 \in f(S)$ whenever $d_1 \in S$, and $d_2 \notin f(\emptyset)$
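
A small Python sketch of this construction follows. It assumes the published form of the definition, in which the relation also contains an edge from the empty-set node to itself and an edge from the empty-set node to every fact in f(∅); the names `LAMBDA`, `representation_relation`, and `apply_relation` are hypothetical, and the empty-set node is written "Λ".

```python
LAMBDA = "Λ"  # stands for the special empty-set node

def representation_relation(f, domain):
    """Build the edge set of the representation relation for f."""
    from_empty = f(frozenset())
    edges = {(LAMBDA, LAMBDA)}
    edges |= {(LAMBDA, d2) for d2 in from_empty}      # facts generated from nothing
    for d1 in domain:
        for d2 in f(frozenset({d1})):
            if d2 not in from_empty:                  # avoid duplicating Λ edges
                edges.add((d1, d2))
    return edges

def apply_relation(edges, facts):
    """Interpret a relation as a function on fact sets."""
    sources = set(facts) | {LAMBDA}
    return frozenset(d2 for (d1, d2) in edges
                     if d1 in sources and d2 != LAMBDA)

D = frozenset({"x", "y", "z"})
gen_x = lambda facts: frozenset(facts) | {"x"}        # e.g. `int x;`
rel = representation_relation(gen_x, D)
```

For this example `rel` has four edges: Λ to Λ, Λ to x, y to y, and z to z. Applying the relation to any subset of D gives the same result as calling `gen_x` directly.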

  9. More Representation Relations • (a) and (b) show representation relations for two functions (nodes $s_{main}$ and $n_1$) • (c) and (d) show two ways to compose these relations • (d) illustrates the need for the Λ in each relation
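
Since the figures are not reproduced in this transcript, here is a hedged Python sketch of why the empty-set node matters for composition. The two relations are made-up examples over D = {x, y} (a "kill x" function followed by a "gen x" function), not the ones from the slide, and `compose` and `apply_relation` are hypothetical helper names.

```python
LAMBDA = "Λ"  # stands for the special empty-set node

def compose(r1, r2):
    """Relational composition: follow an edge of r1, then an edge of r2."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

def apply_relation(rel, facts):
    """Interpret a relation as a function on fact sets."""
    sources = set(facts) | {LAMBDA}
    return frozenset(d2 for (d1, d2) in rel
                     if d1 in sources and d2 != LAMBDA)

# f1(S) = S - {x}: x is killed, y passes through.
r_kill_x = {(LAMBDA, LAMBDA), ("y", "y")}
# f2(S) = S | {x}: x is generated from the empty set, y passes through.
r_gen_x = {(LAMBDA, LAMBDA), (LAMBDA, "x"), ("y", "y")}

good = compose(r_kill_x, r_gen_x)
# Dropping the Λ-to-Λ edge from the first relation loses the generated fact.
broken = compose(r_kill_x - {(LAMBDA, LAMBDA)}, r_gen_x)
```

With the Λ-to-Λ edge in place, the composite still contains the edge from Λ to x, so it correctly represents f2 ∘ f1. Without it, the composite maps every input to a set that omits x.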

  10. Exploding the Supergraph

  11. CFL Reachability • Want to solve the dataflow problem with a reachability query on the exploded supergraph. • Not all paths in $G^{\#}$ are valid, though. Must match calls with returns. • Insight: context-sensitivity = matching parens; language of matching parens is a CFL

  12. Context-Sensitivity = CFL • Assign a unique index to each callsite, define a CFL of matching calls and returns. • Suppose we have two call-sites to function P(), which we label i and k • $(_i \; (_k \; )_k \; )_i$ is a valid path • $(_i \; (_k \; )_k$ is a valid path (the call at i has not returned yet) • $(_i \; (_k \; )_i$ is not (the return for i does not match the pending call at k)
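
The three example paths can be checked with a stack, as in this minimal Python sketch. The encoding of a path as a list of `("call", site)` and `("ret", site)` pairs is an assumption made for illustration; any other label is treated as an intraprocedural edge and ignored.

```python
def is_realizable(labels):
    """True if every return matches the most recent unreturned call.

    Calls that are still pending at the end of the path are allowed,
    since the path may stop in the middle of a callee.
    """
    pending = []
    for kind, site in labels:
        if kind == "call":
            pending.append(site)
        elif kind == "ret":
            if not pending or pending.pop() != site:
                return False
    return True

balanced = [("call", "i"), ("call", "k"), ("ret", "k"), ("ret", "i")]
still_in_callee = [("call", "i"), ("call", "k"), ("ret", "k")]
mismatched = [("call", "i"), ("call", "k"), ("ret", "i")]
```

The first two paths are accepted and the third is rejected, matching the slide.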

  13. Reachability Algorithm • Dynamic programming is the key • Start at the entry point to the program. Follow the edges in $G^{\#}$, recording what dataflow facts we can reach. • At a procedure call, follow the call. To avoid re-doing any work, though, maintain a cache of edges that summarize pieces of the computation. • Summary edges record the results of an entire procedure: they start at a callsite and end at the corresponding return-site. • Path edges record the suffix of a valid path.
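
The following Python sketch shows one way the path-edge and summary-edge bookkeeping described above could fit together. It is a simplified worklist loop in the spirit of the tabulation algorithm, not the papers' pseudocode: the graph encoding, the node names, and the helpers `tabulate` and `facts_at` are all assumptions, and the scan over `path` at exit nodes is deliberately naive. The example graph has two calls to a procedure p that leaves a global g untouched; g is possibly uninitialized before the first call and initialized before the second.

```python
from collections import defaultdict, deque

ZERO = "Λ"  # the special empty-set fact

def tabulate(edges, calls, exit_nodes, s_main):
    """edges: {(n, d): {(m, d2), ...}} for the exploded supergraph;
    calls: {call_node: (callee_start, return_site)}."""
    # Index call edges by target so an exit node can find its call sites.
    call_preds = defaultdict(set)
    for (n, d), targets in edges.items():
        if n in calls:
            for (m, d2) in targets:
                if m == calls[n][0]:
                    call_preds[(m, d2)].add((n, d))
    path, summary, work = set(), defaultdict(set), deque()

    def propagate(edge):
        if edge not in path:
            path.add(edge)
            work.append(edge)

    propagate(((s_main, ZERO), (s_main, ZERO)))
    while work:
        start, (n, d2) = work.popleft()
        succs = edges.get((n, d2), ())
        if n in calls:
            callee, ret = calls[n]
            for (m, d3) in succs:
                if m == callee:              # enter callee with a fresh path edge
                    propagate(((m, d3), (m, d3)))
                elif m == ret:               # call-to-return-site edge
                    propagate((start, (m, d3)))
            for target in list(summary[(n, d2)]):   # reuse cached summaries
                propagate((start, target))
        elif n in exit_nodes:
            for (c, d4) in list(call_preds[start]):
                ret = calls[c][1]
                for (m, d5) in succs:
                    if m == ret and (m, d5) not in summary[(c, d4)]:
                        summary[(c, d4)].add((m, d5))
                        for (src, tgt) in list(path):   # naive scan
                            if tgt == (c, d4):
                                propagate((src, (m, d5)))
        else:
            for target in succs:
                propagate((start, target))
    return path

def facts_at(path, node):
    return {d for (_, (n, d)) in path if n == node and d != ZERO}

edges = {
    ("s_main", ZERO): {("c1", ZERO), ("c1", "g")},  # g starts uninitialized
    ("c1", ZERO): {("s_p", ZERO), ("r1", ZERO)},
    ("c1", "g"): {("s_p", "g")},
    ("s_p", ZERO): {("e_p", ZERO)},
    ("s_p", "g"): {("e_p", "g")},                   # p leaves g untouched
    ("e_p", ZERO): {("r1", ZERO), ("r2", ZERO)},
    ("e_p", "g"): {("r1", "g"), ("r2", "g")},
    ("r1", ZERO): {("c2", ZERO)},                   # g = 0 kills g here
    ("c2", ZERO): {("s_p", ZERO), ("r2", ZERO)},
    ("c2", "g"): {("s_p", "g")},
    ("r2", ZERO): {("e_main", ZERO)},
    ("r2", "g"): {("e_main", "g")},
}
calls = {"c1": ("s_p", "r1"), "c2": ("s_p", "r2")}
path = tabulate(edges, calls, {"e_p", "e_main"}, "s_main")
```

In this example g is reported at return-site r1 but not at r2, even though plain graph reachability would reach (r2, g) through the mismatched path that enters p from c1 and leaves toward r2.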

  14. Dynamic Programming Details

  15. Complexity • Worst case for general CFL reachability is cubic in the number of nodes in the graph • Can do better for dataflow analysis: $O(ED^3)$ for any distributive problem, $O(Call \cdot D^3 + hED^2)$ for h-sparse problems • possibly uninitialized variables is 2-sparse when aliasing is ignored: a variable’s status as initialized or uninitialized can only affect itself and one other variable (if it is assigned to that variable)
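
To get a feel for the gap between the two bounds, the sketch below plugs in made-up sizes. The numbers are purely illustrative assumptions (E supergraph edges, D facts, CALL call sites, sparsity H), constant factors are ignored, and the function names are hypothetical.

```python
def dense_bound(E, D):
    """E * D^3: bound for an arbitrary distributive problem."""
    return E * D**3

def sparse_bound(call, E, D, h):
    """Call * D^3 + h * E * D^2: bound for an h-sparse problem."""
    return call * D**3 + h * E * D**2

# Hypothetical program: 10,000 edges, 100 facts, 500 call sites, 2-sparse.
E, D, CALL, H = 10_000, 100, 500, 2
dense = dense_bound(E, D)              # 10**10
sparse = sparse_bound(CALL, E, D, H)   # 5 * 10**8 + 2 * 10**8
```

Under these assumed sizes the h-sparse bound is roughly fourteen times smaller, because the cubic term is paid only at call sites.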

  16. Other Applications • Interprocedural slicing • identify all pieces of a program relevant to a particular statement • Shape Analysis • For any DAG data structure, determines a superset of the possible shapes for that data structure. • Each dataflow fact corresponds to a single possible shape. • Problem: infinite number of shapes. Solution is to define shape at program point q in terms of shape at previous program points. • The ILPS 1997 paper has an example of shape analysis of a linked list.

  17. The other papers • “Demand Interprocedural Dataflow Analysis” • Horwitz, Reps, Sagiv -- FSE 1995 • “Demand-driven Computation of Interprocedural Data Flow” • Duesterwald, Gupta, Soffa -- POPL 1995 • Provide two possible frameworks for transforming any IFDS analysis into a demand-driven analysis

  18. Steps to Demand-driven analysis • Define problem in the IFDS framework • Reverse the flow functions, or reverse the flow edges • Start with initial query $\langle d, n \rangle$ (does fact d hold at node n?) • Propagate the query backwards until solved

  19. Reversing dataflow • In Duesterwald et al., the dataflow problem is specified with flow functions • Reverse the functions • For CFL problems, the problem is represented as a set of edges • Just reverse the edges

  20. Example: CCP (copy constant propagation) Notation • $x$ – set of dataflow facts • $x_w$ – dataflow fact for variable w • $f_n(x)_w$ – transfer fn for variable w at node n • $[w = c]$ – set of dataflow facts, where the fact for variable w equals c

  21. Query Algorithm • Worklist holds the set of outstanding queries • While not empty, remove a query • Propagate backwards one node in the flowgraph • For a function call, create a backwards summary for that function and apply that
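
A minimal Python sketch of the edge-reversal flavor of this worklist loop is shown below. It is restricted to a single procedure, so the backwards function summaries mentioned above are deliberately left out; that restriction, the node names n0 to n3, and the helpers `reverse_edges` and `query` are assumptions for illustration. The exploded graph encodes the three statements of the simple example (declaration, `y = y + x`, `z = 0`).

```python
from collections import defaultdict, deque

ZERO = "Λ"  # the special empty-set fact

def reverse_edges(edges):
    """Flip every edge of the exploded graph."""
    rev = defaultdict(set)
    for src, targets in edges.items():
        for tgt in targets:
            rev[tgt].add(src)
    return rev

def query(rev, fact, node, start=("n0", ZERO)):
    """Answer the query <fact, node> by walking reversed edges."""
    seen = {(node, fact)}
    work = deque([(node, fact)])          # outstanding queries
    while work:
        current = work.popleft()
        if current == start:              # early termination: fact may hold
            return True
        for pred in rev.get(current, ()):
            if pred not in seen:
                seen.add(pred)
                work.append(pred)
    return False

edges = {
    ("n0", ZERO): {("n1", ZERO), ("n1", "x"), ("n1", "z")},  # x, z start uninitialized
    ("n1", ZERO): {("n2", ZERO)},
    ("n1", "x"): {("n2", "x"), ("n2", "y")},                 # y = y + x
    ("n1", "y"): {("n2", "y")},
    ("n1", "z"): {("n2", "z")},
    ("n2", ZERO): {("n3", ZERO)},
    ("n2", "x"): {("n3", "x")},                              # z = 0 kills z
    ("n2", "y"): {("n3", "y")},
}
rev = reverse_edges(edges)
```

Each query touches only the part of the graph that can influence the requested fact, which is the source of the savings over an exhaustive pass when only a few facts are needed.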

  22. Query Propagation More notation • $r_p$ – entry node for procedure p • $m$, $n$ – normal nodes • $f_m$ – reverse dataflow fn for node m • $N_{call}$ – all nodes that are callsites • $call(m)$ – the procedure called at node m • $f_{(r_p, e_p)}$ – summary fn for procedure p

  23. Backwards edge propagation

  24. Query Algorithm Efficiency • Optimizations: function summaries, early termination, query result cache • In the worst case, it’s the same as exhaustive analysis • Some problems work better than others for demand-driven analysis. • Depends how much information you need to answer queries, or how many queries need to be made.

  25. Conclusions • Demand-driven analysis is a powerful idea • Saves time and space, but in the worst case it’s no better than exhaustive analysis • Only works for distributive problems • Two approaches for demand-driven analysis are equivalent

  26. Discussion • Are these algorithms generally applicable? • Are they fast? • No evidence in the papers, but the answer is yes (see ESP in a couple of weeks) • Why are they efficient (beyond the complexity guarantee)? • Is it always cheap to compute the exploded supergraph? • How can an imprecise alias analysis influence this step and the overall performance of the algorithm?
