IDE Dataflow Analysis in the Presence of Large Object-Oriented Libraries. Atanas (Nasko) Rountev, Mariana Sharp, Guoqing (Harry) Xu. Ohio State University. Supported by NSF CAREER grant CCF-0546040 and an IBM Eclipse Innovation grant.
Interprocedural Analysis with Large Libraries • All programs are built with reusable components • Standard libraries in C++, Java, C# • Domain-specific libraries • Whole-program analysis: complete client program C, together with all libraries it uses • Solutions for all program points in C and in the libraries • Summary-based analysis: pre-analyze the library and record reusable library summary information • Solutions for all program points in C • Goal: reduce the cost without losing any precision • e.g., the solutions inside C should be the same • This may be low-hanging fruit
Talk Outline • Interprocedural distributive environment (IDE) dataflow analysis problems • Definition; precise whole-program analysis • Examples: dependence analysis and type analysis • Generation of library summaries for IDE problems • Intra/interprocedural analysis in the library • Handling the possible effects of unknown clients • Filtering away details that are irrelevant for clients • Experimental evaluation • Entire Java 1.4.2 libraries; 20 client programs
Interprocedural Distributive Environment (IDE) Problems • Defined by Sagiv, Reps, and Horwitz [TheorCompSci96] • Subsumes the interprocedural finite distributive subset (IFDS) problems from their [POPL95] work • Versions of constant propagation, slicing, alias analysis, side-effect analysis, reaching definitions, liveness, etc. • An environment is a map $e : D \to L$; $e \in \mathit{Env}(D,L)$ • $D$ is a set of symbols, $L$ is a meet semi-lattice • Environment meet: $(e_1 \sqcap e_2)(d) = e_1(d) \sqcap e_2(d)$ • Environment transformer $t : \mathit{Env}(D,L) \to \mathit{Env}(D,L)$ • Distributive: e.g. $t(e_1 \sqcap e_2) = t(e_1) \sqcap t(e_2)$
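The definitions above can be made concrete with a minimal sketch. It assumes, purely for illustration, that L is a powerset lattice whose meet is set union, and it uses a hypothetical copy transformer for `x := y`; the names `meet_envs` and `copy_y_to_x` are not from the talk.

```python
from typing import Dict, FrozenSet

Value = FrozenSet[str]      # one element of the lattice L (assumed: a set of names)
Env = Dict[str, Value]      # an environment e : D -> L

def meet_envs(e1: Env, e2: Env) -> Env:
    """Pointwise meet: (e1 meet e2)(d) = e1(d) meet e2(d), with meet = union here."""
    return {d: e1[d] | e2[d] for d in e1}

def copy_y_to_x(e: Env) -> Env:
    """Hypothetical environment transformer for the statement x := y."""
    out = dict(e)
    out["x"] = e["y"]
    return out

e1: Env = {"x": frozenset({"a"}), "y": frozenset({"b"})}
e2: Env = {"x": frozenset(), "y": frozenset({"c"})}

# Distributivity on this pair: t(e1 meet e2) equals t(e1) meet t(e2).
lhs = copy_y_to_x(meet_envs(e1, e2))
rhs = meet_envs(copy_y_to_x(e1), copy_y_to_x(e2))
```

Checking one pair of environments only illustrates the property; it does not prove that a transformer is distributive.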
Dependence Analysis and Type Analysis for Java • Dependencies: for a local variable v at CFG node n, which formal parameters of n’s method influence v? • Restricted form of dep. analysis; useful for SDG building • $D = \{ v_1, \ldots, v_k \}$: locals $v_i$ • $L$ = powerset of $\{ f_1, \ldots, f_m \}$: formals $f_j$; meet is $\cup$ • Transformer for v1:=f2: $t(e) = e[v_1 \mapsto \{f_2\}]$ • Transformer for v1:=v2+v3: $t(e) = e[v_1 \mapsto e(v_2) \cup e(v_3)]$ • Call v1:=meth(v2): composition of v2-to-formal, valid same-level paths in meth, return-to-v1 • 0-CFA type analysis: D = { v1, …, vk, fld1, …, fldm }: locals and fields; L = powerset of set of types
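As a rough sketch of the two assignment transformers above, the following applies them to a three-statement method body. The helper names `assign_formal` and `assign_binop` are hypothetical, and calls are left out entirely.

```python
from typing import Callable, Dict, FrozenSet

Env = Dict[str, FrozenSet[str]]        # local variable -> set of formals it depends on
Transformer = Callable[[Env], Env]

def assign_formal(v: str, f: str) -> Transformer:
    """Transformer for v := f, i.e. t(e) = e[v -> {f}]."""
    return lambda e: {**e, v: frozenset({f})}

def assign_binop(v: str, a: str, b: str) -> Transformer:
    """Transformer for v := a + b, i.e. t(e) = e[v -> e(a) union e(b)]."""
    return lambda e: {**e, v: e[a] | e[b]}

# Hypothetical body: v2 := f1; v3 := f2; v1 := v2 + v3
body = [
    assign_formal("v2", "f1"),
    assign_formal("v3", "f2"),
    assign_binop("v1", "v2", "v3"),
]

env: Env = {v: frozenset() for v in ("v1", "v2", "v3")}
for t in body:
    env = t(env)    # apply the transformers in program order
```

After the loop, `env["v1"]` holds both formals, reflecting that v1 is influenced by f1 and f2.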
Representation of Environment Transformers • Key issue for any summary-based analysis: how do we represent and manipulate dataflow functions? • For IDE: composition/meet of environment transformers • Sagiv et al.: a transformer can be represented by a bipartite directed graph with 2(|D|+1) nodes • Edges labeled with functions $L \to L$
Composition of Transformers • Graph reachability + composition of edge labels
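A simplified sketch of composition by graph reachability follows. It assumes the dependence-analysis lattice (sets of formals, meet is union) and only two kinds of edge label: the identity function on symbol-to-symbol edges and a constant set on edges leaving the extra node Λ. The `Transformer` class and both helpers are illustrative, not the representation used by the authors.

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

Env = Dict[str, FrozenSet[str]]

class Transformer(NamedTuple):
    edges: FrozenSet[Tuple[str, str]]    # src -> dst edges labeled with the identity
    consts: Dict[str, FrozenSet[str]]    # Λ -> dst edges labeled with a constant set

def apply(t: Transformer, env: Env) -> Env:
    """t(e)(d) = const on Λ->d, joined with e(s) for every edge s->d."""
    out = {}
    for d in env:
        val = t.consts.get(d, frozenset())
        for s, dst in t.edges:
            if dst == d:
                val = val | env[s]
        out[d] = val
    return out

def compose(t1: Transformer, t2: Transformer) -> Transformer:
    """Graph for 't1 then t2': two-step paths through the shared middle layer."""
    edges = frozenset((s, d) for s, m in t1.edges for m2, d in t2.edges if m == m2)
    consts = dict(t2.consts)
    for m, d in t2.edges:                # a path Λ -> m -> d carries t1's constant to d
        if m in t1.consts:
            consts[d] = consts.get(d, frozenset()) | t1.consts[m]
    return Transformer(edges, consts)

# v2 := f1 (v1 and v3 unchanged), then v1 := v2 + v3 (v2 and v3 unchanged)
t_a = Transformer(frozenset({("v1", "v1"), ("v3", "v3")}), {"v2": frozenset({"f1"})})
t_b = Transformer(frozenset({("v2", "v1"), ("v3", "v1"), ("v2", "v2"), ("v3", "v3")}), {})

env: Env = {"v1": frozenset(), "v2": frozenset(), "v3": frozenset({"f2"})}
composed = compose(t_a, t_b)
```

Applying `composed` once gives the same environment as applying `t_a` and then `t_b`, which is the property a summary relies on.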
Precise Whole-Program Analysis • Graph reachability along valid interprocedural paths • Phase 1: summary function $f_n$ for each CFG node n • Represents the solution at n as a function of the solution at the entry of the procedure containing n • Computed through composition and meet of transformers • Summary function at proc exit used at call sites to proc • Partial functions $f_n$: only for the subset of the domain that is relevant to callers of n’s procedure • Phase 2: Top-down propagation of actual environments (e.g., dependence sets, type sets) • Adapt to library summary generation?
Phase 1: Intraprocedural Summary Generation • Produce a set of summary functions $\psi_{n,m}$ • n is the entry or a call site • m is the exit or a call site • there exists a call-free path from n to m • Similar to the summary functions $f_n$ from the whole-program analysis, but • complete functions instead of partial functions • all possible compositions and meets of transformers (as graph operations), until a fixed point is reached • After this, some elements of D are filtered away • e.g., for dependence analysis: locals that are not actuals of calls and are not assigned the return values of calls
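The filtering step can be pictured with a small sketch. It assumes a summary is stored as a set of (source, target) edges, that a single set of relevant symbols applies at both ends, and that composition has already reached a fixed point, so any path through a dropped symbol is already present as a direct edge. The function `filter_summary` and the symbol names are hypothetical.

```python
from typing import Set, Tuple

LAMBDA = "Λ"                 # the extra node of the transformer graph
Edge = Tuple[str, str]       # (source symbol, target symbol)

def filter_summary(edges: Set[Edge], relevant: Set[str]) -> Set[Edge]:
    """Keep only edges whose endpoints are symbols that callers or callees can see."""
    keep_src = relevant | {LAMBDA}
    return {(s, d) for (s, d) in edges if s in keep_src and d in relevant}

# Hypothetical summary: 'tmp' is a purely local temporary
summary = {("p", "tmp"), ("p", "arg"), (LAMBDA, "ret"), ("tmp", "tmp")}
relevant = {"p", "arg", "ret"}
filtered = filter_summary(summary, relevant)
```

Here the two edges touching `tmp` are dropped, while the edges describing `arg` and `ret` survive.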
Example [figure not reproduced; node labels shown: entry, cs1, rs2, exit]
Phase 2: Interprocedural Summary Generation [figure not reproduced; labels shown: summary for toString, at cs2, rs1, exit]
Phase 2: Interprocedural Summary Generation • Fixed call site: has exactly one possible target • Cannot be a site that calls back client methods • Check type hierarchy for possible overriding in clients • Cannot have multiple target methods • Static calls; constructor calls; final classes/methods • Intraprocedural 0-CFA type analysis: in the summary function, the only edge reaching x should be $\Lambda \to x$ • Fixed method: has only fixed calls (or no calls), and this also holds for all methods reachable from it • Bottom-up traversal of the SCC-DAG of fixed methods; composition and filtering • In non-fixed methods: instantiate fixed calls to fixed methods; composition and filtering
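The definition of a fixed method can be sketched as a simple fixed-point computation over a call graph. The input format is an assumption: each method maps to a list of call sites, each call site is the set of its possible targets, and a possible client override is modeled by the placeholder target `"<client>"`. Apart from `format` and `toString`, the method names are made up.

```python
from typing import Dict, List, Set

def fixed_methods(calls: Dict[str, List[Set[str]]]) -> Set[str]:
    """Methods whose calls, and those of everything reachable from them, are all fixed."""
    # Start from methods in which every call site has exactly one possible target.
    fixed = {m for m, sites in calls.items() if all(len(t) == 1 for t in sites)}
    changed = True
    while changed:
        changed = False
        for m in list(fixed):
            # Drop m if any of its single-target calls reaches a non-fixed method.
            if any(next(iter(t)) not in fixed for t in calls[m]):
                fixed.remove(m)
                changed = True
    return fixed

calls = {
    "format": [{"toString"}, {"append"}],
    "toString": [],
    "append": [{"grow"}],
    "grow": [],
    "sort": [{"compare", "<client>"}],   # may call back into client code
    "helper": [{"sort"}],                # one target, but that target is not fixed
}
result = fixed_methods(calls)
```

Mutually recursive methods stay in the set as long as none of them reaches a non-fixed call, which is consistent with processing whole SCCs bottom-up.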
Example: Final Summary for format [figure not reproduced; node labels shown: entry, cs1, rs1, exit]
Summary Generation • Libraries: 10238 classes, 77190 methods • 0-CFA type analysis + dependence analysis [w/ Soot] • Both data and control dependencies • Simple optimizations: def-use chains, sparse graphs • Cost: 90 minutes, 1.2 GB memory • Includes all Soot-related costs and all I/O • Final summary on disk: 18 MB • Measurements: number of edges in the graph representation of transformers • [1]: before any composition or meet • [2]: after intraprocedural composition and meet • [3]: after [2] and intraprocedural filtering: remove elements that are irrelevant for callers and callees
Intraprocedural Propagation • Dependence analysis: reduction in # edges from [2] to [3]: 53% • Type analysis: reduction in # edges from [2] to [3]: 55%
Interprocedural Propagation for Dep. Analysis • Fixed methods: 25490 (33%); eliminate 7195 (9%) of them because their only callers are in the library • Summary functions for fixed methods • Instantiate at fixed calls within non-fixed methods: eliminates 21% of all library call sites • Additional intraprocedural propagation and filtering • Reduction in # edges from [3] to [4]: 32%
Summary-Based Analysis of Clients • Reduction in start-to-end time: IR building, type analysis + call graph, dependence analysis
Only Dependence Analysis • Reduction in analysis time: actual analysis and a hypothetical best case with no library dependencies
Overview of Results • Start-to-end cost: IR, type analysis, dep. analysis • Average time reduction 51% • Average memory reduction 33% • Only dependence analysis • Average time reduction 69% • Average memory reduction 90% • Very close to a conservative upper bound • Conclusions • Summary generation has reasonable cost • Summary size is small (# edges and total disk size) • Significant savings for analysis running time and memory usage, compared to whole-program analysis
Future Work • This is a very preliminary study • Promising initial results, but just the tip of the iceberg • More IDE analyses, with different characteristics • e.g. points-to analysis, side-effect analysis, constant propagation, typestate properties, etc. • Beyond IDE analyses • e.g. recent [POPL08] paper by Yorsh et al. • Better handling of callbacks and polymorphic calls • e.g. take advantage of behavioral subtyping • Reusable API for storing and retrieving summary information – generality for many different analyses • Open-source API implementation based on Soot