An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages. John Whaley Monica S. Lam Computer Systems Laboratory Stanford University September 18, 2002. Background. Andersen’s points-to analysis for C (1994) Flow-insensitive, context-insensitive
An Efficient Inclusion-Based Points-To Analysis for Strictly-Typed Languages. John Whaley, Monica S. Lam. Computer Systems Laboratory, Stanford University. September 18, 2002
Background • Andersen’s points-to analysis for C (1994) • Flow-insensitive, context-insensitive • Inclusion-based, more accurate than unification-based Steensgaard • O(n³), considered too slow to be practical • CLA optimization to Andersen’s analysis (Heintze & Tardieu, PLDI’01) • Online caching/cycle elimination • Field-independent: 1.3M lines of code in 137s SAS 2002
Doing it for Java • We want Andersen-level pointers for Java • Naïve port of CLA algorithm: • Spec “compress” benchmark: 2+ hours! • Call graph accuracy: same as RTA (terrible) • Our paper: how to do CLA for Java • Spec “compress” benchmark: 5 seconds! • JEdit (1371 classes): ~10 minutes! • Call graph accuracy: very good
Java vs. C: Virtual calls • Java has many virtual calls • Accuracy of analysis strongly affects number of call targets • More call targets lead to more code being analyzed and longer analysis times
Java vs. C: Treatment of Fields • Field-independent: in o.f, use only o • Most C pointer analyses • Sound even for non-type-safe languages • Field-based: in o.f, use only f • Very inaccurate, requires type safety • Field-sensitive: in o.f, use both o, f • Strictly more accurate than field-independent or field-based • Essential for Java
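The three treatments differ only in which key names the abstract location behind an access o.f. A minimal sketch of that difference follows; the class FieldKeys, the method keyFor, and the string encoding of locations are hypothetical names chosen for illustration and do not come from the talk.

```java
// Hypothetical illustration: how each field treatment names the location for "o.f".
final class FieldKeys {
    enum Mode { FIELD_INDEPENDENT, FIELD_BASED, FIELD_SENSITIVE }

    // Returns the abstract-location key used for an access to field `field` of `object`.
    static String keyFor(Mode mode, String object, String field) {
        switch (mode) {
            case FIELD_INDEPENDENT:
                return object;                 // use only o: all fields of o share one location
            case FIELD_BASED:
                return "*." + field;           // use only f: field f of every object shares one location
            default:
                return object + "." + field;   // use both: one location per (o, f) pair
        }
    }
}
```

Under this encoding, o1.f and o1.g collide in the field-independent mode, o1.f and o2.f collide in the field-based mode, and nothing collides in the field-sensitive mode.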
Java vs. C: Local variables • Local variables/stack locations are reused • Flow insensitivity causes many false aliases • Local flow sensitivity is necessary
Our Contribution • Andersen-style inclusion-based points-to analysis for Java, based on ideas from CLA • Field sensitivity • Tracks separate fields of separate objects • Uses “method summary graphs” • Sparse representation, uses local flow sensitivity • Optimizations • Caching across iterations, reducing redundant ops • Supports all features of Java
Algorithm Overview • Intraprocedural: generate a sparse, flow-insensitive summary graph for each method • Based on access paths, uses local flow sensitivity • Interprocedural: using the summary graphs, build an inclusion graph to obtain the whole-program result
Method Summaries • Sparse, flow-insensitive summary of the semantics of each method • Stores (writes) in method • Calls made by method and their parameters • Return values, thrown and caught exceptions • Use a flow-sensitive technique to generate method summaries • Precisely model updates to stack and locals
Method Summary: Example
Code for method foo:
  static void foo(C x, C y) {
    C t = x.f;
    t.g = y;
    x.g = x;
    t.bar(y);
  }
Summary for method foo:
  [Figure: nodes x, x.f, y; read edge f; write edges g; outgoing call bar(t, y). Legend: read edge, write edge, parameter map edge]
Node types • A node represents an object at run time. • Concrete type nodes • Objects that have a known concrete type • new statements and constant objects • Abstract nodes • Parameters, return values, dereferences • Interprocedural phase maps an abstract node to the set of concrete nodes it can represent
Edge types • Read edge: • Created by load statements • Represent dereferences (access paths) of known locations • Write edge: • Created by store statements • Represent references created by the method • [Figure: a read edge and a write edge, each labeled f]
Outgoing parameter map • Records which nodes are passed as which parameters • This is used in the interprocedural phase to match call sites to call targets • [Figure: summary graph for foo (nodes x, x.f, y; edges f, g) with the call t.bar(y)]
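The node types, edge types, and outgoing parameter map described above can be pictured as a small data structure. The sketch below hand-builds a summary for the method foo from the earlier example; all class and field names (SNode, CallSite, FooSummary) are hypothetical and are not claimed to match the joeq implementation, and the edge endpoints are inferred from the four statements of foo.

```java
import java.util.*;

// Hypothetical summary-graph node; names are illustrative only.
final class SNode {
    final String name;
    final boolean concrete;   // true for `new` sites and constants, false for abstract nodes
    final Map<String, Set<SNode>> reads = new HashMap<>();   // field -> targets of read edges
    final Map<String, Set<SNode>> writes = new HashMap<>();  // field -> targets of write edges

    SNode(String name, boolean concrete) { this.name = name; this.concrete = concrete; }

    void addRead(String field, SNode target) {
        reads.computeIfAbsent(field, k -> new HashSet<>()).add(target);
    }

    void addWrite(String field, SNode target) {
        writes.computeIfAbsent(field, k -> new HashSet<>()).add(target);
    }
}

// Outgoing parameter map for one call: args.get(i) is the node passed as parameter i.
final class CallSite {
    final String method;
    final List<SNode> args;
    CallSite(String method, List<SNode> args) { this.method = method; this.args = args; }
}

// Summary of: static void foo(C x, C y) { C t = x.f; t.g = y; x.g = x; t.bar(y); }
final class FooSummary {
    final SNode x = new SNode("x", false);      // parameter: abstract node
    final SNode y = new SNode("y", false);      // parameter: abstract node
    final SNode xf = new SNode("x.f", false);   // dereference: abstract node
    final List<CallSite> calls = new ArrayList<>();

    FooSummary() {
        x.addRead("f", xf);     // C t = x.f;  the local t now names the node x.f
        xf.addWrite("g", y);    // t.g = y;
        x.addWrite("g", x);     // x.g = x;
        calls.add(new CallSite("bar", Arrays.asList(xf, y)));  // t.bar(y): receiver first
    }
}
```

Note that the local t never becomes a node of its own: the flow-sensitive pass resolves it to the access path x.f, which is what keeps the summary sparse.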
Generating method summary • Worklist data flow solver (flow-sensitive) • Strong updates on locals, weak on others • Detect and close cycles in access paths • More detail in the paper
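The distinction between strong and weak updates is easy to state in code. The following is only a sketch of the two update kinds, not of the worklist solver itself; the class UpdateKinds and its string-keyed maps are assumptions made for illustration.

```java
import java.util.*;

// Hypothetical illustration of strong vs. weak updates during summary generation.
final class UpdateKinds {
    final Map<String, Set<String>> locals = new HashMap<>();  // local variable -> nodes
    final Map<String, Set<String>> heap = new HashMap<>();    // "node.field" -> nodes

    // Strong update: an assignment to a local discards the local's previous contents.
    void assignLocal(String local, String... nodes) {
        locals.put(local, new HashSet<>(Arrays.asList(nodes)));
    }

    // Weak update: a store to a heap location only adds to what was already there.
    void storeField(String location, String... nodes) {
        heap.computeIfAbsent(location, k -> new HashSet<>()).addAll(Arrays.asList(nodes));
    }
}
```

A strong update is what lets a reused local or stack slot forget its earlier value, which is how local flow sensitivity avoids the false aliases mentioned earlier.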
Review: Andersen’s Points-to • Points-to is encoded as inclusion relations • x = y implies x ⊇ y • x ⊇ y is also drawn as an edge between x and y in the inclusion graph
Review: Andersen’s Points-to • Store: if code contains x.f = e; apply x ⊇ new_y ⟹ new_y.f ⊇ e • Load: if code contains e = x.f; apply x ⊇ new_y ⟹ e ⊇ new_y.f • Copy: if code contains e1 = e2; apply e1 ⊇ e2 • Transitive closure: e1 ⊇ e2, e2 ⊇ e3 ⟹ e1 ⊇ e3
Andersen example • Code: t = x.f; t.g = y; x.g = x; • [Figure: summary graph (nodes x, x.f, y; edges f, g) alongside concrete nodes C, D, E with an f edge; g edges are added between concrete nodes as the rules are applied] • Step 1, Load rule for e = x.f; x ⊇ new_y ⟹ e ⊇ new_y.f • Step 2, Store rule for x.f = e; x ⊇ new_y ⟹ new_y.f ⊇ e, applied once for each store in the code
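The store, load, copy, and closure rules can be exercised with a naive fixed-point solver. This sketch is a textbook-style baseline, not the CLA-based algorithm of the talk; the class NaiveAndersen, its methods, and the choice of allocation sites C, D, E for x, x.f, and y are assumptions made for illustration.

```java
import java.util.*;

// Naive inclusion-based solver: re-applies every rule until nothing changes.
final class NaiveAndersen {
    // Maps a variable name, or "site.field" for a heap field, to allocation sites.
    private final Map<String, Set<String>> pts = new HashMap<>();
    private final List<String[]> stmts = new ArrayList<>();

    void alloc(String x, String site)        { stmts.add(new String[] {"new", x, site}); }
    void copy(String dst, String src)        { stmts.add(new String[] {"copy", dst, src}); }
    void load(String e, String x, String f)  { stmts.add(new String[] {"load", e, x, f}); }
    void store(String x, String f, String e) { stmts.add(new String[] {"store", x, f, e}); }

    Set<String> get(String loc) { return pts.computeIfAbsent(loc, k -> new HashSet<>()); }

    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "new":    // x = new site
                        changed |= get(s[1]).add(s[2]);
                        break;
                    case "copy":   // e1 = e2  gives  e1 ⊇ e2
                        changed |= get(s[1]).addAll(get(s[2]));
                        break;
                    case "load":   // e = x.f  gives  e ⊇ site.f for each site in x
                        for (String site : new ArrayList<>(get(s[2])))
                            changed |= get(s[1]).addAll(get(site + "." + s[3]));
                        break;
                    case "store":  // x.f = e  gives  site.f ⊇ e for each site in x
                        for (String site : new ArrayList<>(get(s[1])))
                            changed |= get(site + "." + s[2]).addAll(get(s[3]));
                        break;
                    default:
                        throw new IllegalStateException("unknown statement kind " + s[0]);
                }
            }
        }
    }
}
```

To replay the example, set up x pointing to C, y pointing to E, and C.f pointing to D (through a temporary), then add t = x.f, t.g = y, and x.g = x. After solve(), t holds D, D.g holds E, and C.g holds C. Repeating every rule on every pass is what makes this baseline slow.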
Mapping method calls • Call site: t.bar(y); in the code t = x.f; t.g = y; x.g = x; t.bar(y); • [Figure: the example graph (nodes x, x.f, y; concrete nodes C, D, E) with the arguments of t.bar(y) mapped to the parameter nodes Bar:this and Bar:p1]
Overall Picture • [Figure: the “abstract” world (nodes F, E) and the “concrete” world (nodes C, D)]
Graph-based Andersen • Computing full transitive closure is prohibitively expensive • Store the graph in pre-transitive form, and calculate reachable nodes on demand
Algorithm
  foreach write edge e1 → e2 do
    foreach n in getConcreteNodes(e1)
      add write edge n.f → e2
  foreach read edge e1 → e2 do
    foreach n in getConcreteNodes(e1)
      add inclusion edge e2 ⊇ n.f
  foreach method call e1.f()
    foreach n in getConcreteNodes(e1)
      add parameter mappings for target method
Caching reachability queries • getConcreteNodes(e): transitive closure query on the inclusion graph • The same queries are repeated many times • Store the result in a hash table • Cached result may be stale due to edges added since the last query • Iterate until convergence
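A cached reachability query, and the staleness it introduces, can be sketched as follows. The class InclusionGraph, the string node names, and the policy of clearing the whole cache between iterations are assumptions made for illustration; clearing is simply the most basic way to recover from stale entries.

```java
import java.util.*;

// Hypothetical inclusion graph kept in pre-transitive form, with a query cache.
final class InclusionGraph {
    private final Map<String, Set<String>> succ = new HashMap<>();   // e -> nodes that e includes
    private final Set<String> concrete = new HashSet<>();
    private final Map<String, Set<String>> cache = new HashMap<>();
    int cacheHits = 0;

    void addConcrete(String n) { concrete.add(n); }

    // Returns true if the edge is new, so a caller can tell whether another iteration is needed.
    boolean addEdge(String from, String to) {
        return succ.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // Transitive-closure query: all concrete nodes reachable from e.
    // A cached answer is returned as-is, even if edges were added after it was computed.
    Set<String> getConcreteNodes(String e) {
        Set<String> cached = cache.get(e);
        if (cached != null) {
            cacheHits++;
            return cached;
        }
        Set<String> result = new HashSet<>();
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        seen.add(e);
        work.push(e);
        while (!work.isEmpty()) {
            String n = work.pop();
            if (concrete.contains(n)) result.add(n);
            for (String s : succ.getOrDefault(n, Collections.<String>emptySet())) {
                if (seen.add(s)) work.push(s);
            }
        }
        cache.put(e, result);
        return result;
    }

    // Called between iterations in this sketch so that stale answers are recomputed.
    void clearCache() { cache.clear(); }
}
```

A driver would rerun the per-edge loops of the algorithm until a full pass adds no edge, which is the "iterate until convergence" step.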
Online cycle detection • Inclusion graph includes cycles • The algorithm collapses cycles as they are traversed • During traversal, keeps track of current path • If a node on current path is revisited, collapse all nodes in cycle • Each node has a “skip” pointer, which is set when collapsed and followed on all accesses
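The skip-pointer idea can be sketched with a depth-first traversal that remembers the current path. This is an illustrative reconstruction under stated assumptions, not the code from the paper: the names CNode, rep, and CycleCollapser are hypothetical, and merging only the successor sets is a simplification of what a full implementation would merge.

```java
import java.util.*;

// Hypothetical graph node with a "skip" pointer that is followed on every access.
final class CNode {
    final String name;
    CNode skip;                                    // set when this node is collapsed into another
    final Set<CNode> succ = new LinkedHashSet<>();

    CNode(String name) { this.name = name; }

    // Follow skip pointers to the node that currently represents this one.
    CNode rep() {
        CNode n = this;
        while (n.skip != null) n = n.skip;
        return n;
    }
}

final class CycleCollapser {
    // Depth-first traversal from start that collapses each cycle found on the current path.
    static void traverse(CNode start) {
        visit(start.rep(), new ArrayList<CNode>(), new HashSet<CNode>());
    }

    private static void visit(CNode n, List<CNode> path, Set<CNode> done) {
        path.add(n);
        for (CNode raw : new ArrayList<>(n.succ)) {
            CNode s = raw.rep();
            if (s == n.rep()) continue;            // self edge left over from a collapse
            int i = indexOnPath(path, s);
            if (i >= 0) {
                // s is on the current path: everything after it on the path is in the cycle.
                for (int j = i + 1; j < path.size(); j++) {
                    CNode c = path.get(j).rep();
                    if (c != s) {
                        s.succ.addAll(c.succ);
                        c.skip = s;
                    }
                }
            } else if (!done.contains(s)) {
                visit(s, path, done);
            }
        }
        path.remove(path.size() - 1);
        if (n.skip == null) done.add(n);           // collapsed nodes are reached via rep() instead
    }

    private static int indexOnPath(List<CNode> path, CNode rep) {
        for (int i = 0; i < path.size(); i++) {
            if (path.get(i).rep() == rep) return i;
        }
        return -1;
    }
}
```

For a graph a → b → c → a with an extra edge c → d, traversing from a leaves a, b, and c sharing one representative while d stays separate.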
Reusing caches • Concrete node cache values don’t change much between algorithm iterations • Reallocating and rebuilding them is expensive • Reuse caches from old iterations • Keep track of an iteration ‘version’ number for each cache entry
Minimizing set union operations • Many caches don’t change across iterations • Avoid set union operations for caches that haven’t changed since the last iteration • Keep a ‘changed’ flag for each cache entry that records whether the last computation changed the entry • If the input set hasn’t changed, the set union operation is redundant
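The version number and the changed flag can be combined in one cache entry. The sketch below is a guess at the bookkeeping, under two explicit assumptions: an entry's inputs are the same as in the previous iteration, and each input is recomputed before the entries that depend on it. The names CacheEntry, ReusableCache, and recompute are hypothetical.

```java
import java.util.*;

// Hypothetical cache entry carrying the two pieces of bookkeeping from the slides.
final class CacheEntry {
    final Set<String> nodes = new HashSet<>();
    int version = -1;         // iteration in which this entry was last recomputed
    boolean changed = true;   // whether that recomputation added anything
}

final class ReusableCache {
    private final Map<String, CacheEntry> entries = new HashMap<>();
    int unionsPerformed = 0;
    int unionsSkipped = 0;

    CacheEntry entry(String key) {
        return entries.computeIfAbsent(key, k -> new CacheEntry());
    }

    // Bring the entry for key up to date for the given iteration, reusing its old contents.
    // Assumes the inputs were already folded in during the previous iteration.
    CacheEntry recompute(String key, List<CacheEntry> inputs, int iteration) {
        CacheEntry e = entry(key);
        if (e.version == iteration) return e;      // already current for this iteration
        boolean firstTime = e.version < 0;
        boolean grew = false;
        for (CacheEntry in : inputs) {
            if (!firstTime && !in.changed) {
                unionsSkipped++;                   // input unchanged: the union would be redundant
                continue;
            }
            unionsPerformed++;
            grew |= e.nodes.addAll(in.nodes);
        }
        e.changed = grew;
        e.version = iteration;
        return e;
    }
}
```

Keeping the old set and only adding to it avoids reallocating the cache on every iteration, and the changed flag avoids unions whose result is already known.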
Experimental Results • Concrete type inference • Static call graph • Implemented in ~800 lines of Java • Freely available at: http://joeq.sourceforge.net
Programs • SpecJVM • Standard benchmark suite • J2EE – Java 2 Enterprise Edition v1.3 • Massive (1+ million lines) business framework • joeq • Compiler infrastructure, 75K lines • Cloudscape • Database shipped with J2EE, no source code • JEdit • Full-featured editor, 100K lines
Experimental Results • We analyzed the reachable code for each application • Results include code in class library • Analysis was very effective in reducing total program size • Pentium 4 2 GHz, 2 GB RAM, Red Hat 7.2 • Sun JDK 1.3.1_01 with 512 MB heap
Analysis Precision vs. RTA
Analysis time: Small benchmarks
Analysis time: Large benchmarks
Analysis time (speedup)
Analysis time (bytecodes/second)
Related Work • Original CLA paper • Heintze and Tardieu (PLDI 2001) • Andersen’s analysis for Java • Rountev, Milanova, Ryder (OOPSLA 2001) • Liang, Pennings, Harrold (PASTE 2001) • Many others… • Concrete type inference • CHA, RTA • Flow and context sensitivity, 0-CFA
Conclusion • Improved precision • Field sensitivity • Local flow sensitivity • Improved efficiency • Reuse reachability cache across iterations • Minimize set-union operations • Scales to the largest Java programs • A new baseline for Java pointers • No reason to use a less precise analysis