Context-sensitive points-to analysis: is it worth it? Article by Ondřej Lhoták & Laurie Hendren from McGill University. Presentation by Roza Pogalnikova
Abstract • Evaluate precision of subset-based points-to analysis • Compare different context-sensitivity approaches: • call site strings • object sensitivity • algorithm by Zhu and Calman, Whaley and Lam (ZCWL)
Subset-based PTA • Finding allocation sites that reach variable: • S: a = new A() // allocation statement • for variable x somewhere in the program: can it point to object allocated at S?
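A minimal Python sketch of the subset-based idea, under simplifying assumptions: only allocation statements and copy assignments are modeled (no fields, calls, or contexts). The function name `points_to` and the input format are hypothetical and not taken from the article.

```python
from collections import defaultdict

def points_to(allocs, assigns):
    """allocs: list of (variable, allocation site).
    assigns: list of (dst, src), meaning the statement dst = src."""
    pts = defaultdict(set)
    for var, site in allocs:
        pts[var].add(site)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for dst, src in assigns:
            # Subset constraint: pt(src) must be a subset of pt(dst)
            new = pts[src] - pts[dst]
            if new:
                pts[dst] |= new
                changed = True
    return dict(pts)

# S: a = new A(); x = a; y = x
result = points_to([("a", "S")], [("x", "a"), ("y", "x")])
print("S" in result["y"])   # can y point to the object allocated at S?
```

The question on the slide ("can x point to the object allocated at S?") becomes a membership test on the computed set.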
Context Sensitivity • Call site: the context is the program statement of the method invocation • Object sensitivity: the context is the receiving object of the method invocation • ZCWL: k-CFA, where k is the call graph depth after SCCs are merged; run a context-insensitive algorithm on the cloned, context-sensitive call graph • Example invocation: S: this->call_method()
Parameters • Variations include: • specializing only pointer variables • specializing the heap abstraction as well • Different lengths of context strings
Measurements • Measure to guide implementation: • number of contexts • number of distinct contexts • number of distinct points-to sets • Measure to evaluate: • size of the call graph (methods/edges) • devirtualizable call sites • casts statically provable to be safe
Results • Object sensitivity is the most precise and the most scalable approach • A context-sensitive heap abstraction improves the precision of the analysis • Precision is reduced when cycles in the call graph are analyzed without context sensitivity
What • Compare three kinds of context-sensitive points-to analysis: • call sites as context abstraction • object-sensitive analysis • ZCWL algorithm
How • Implemented with the Jedd system: • a language extension of Java • abstracts working with Binary Decision Diagrams (BDDs) • Analyses in the Soot framework written in Jedd: • points-to analysis • call graph construction • side-effect analysis in BDDs • virtual call resolution
BDDs • [Figure: binary decision tree and truth table for the function f(x1, x2, x3) = ¬x1 · ¬x2 · ¬x3 + x1 · x2 + x2 · x3] • [Figure: BDD for the function f] * credit: http://en.wikipedia.org/wiki/Binary_decision_diagram
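To make the difference between the decision tree and the BDD concrete, here is a small Python sketch that builds a reduced ordered BDD for the function f from the slide by exhaustive Shannon expansion. It assumes the variable order x1, x2, x3; the helper names `build` and `mk` are illustrative only, and real BDD libraries avoid enumerating all assignments.

```python
def f(x1, x2, x3):
    # f = (not x1)(not x2)(not x3) + x1 x2 + x2 x3
    return (not x1 and not x2 and not x3) or (x1 and x2) or (x2 and x3)

def build(fn, nvars):
    """Return (root id, unique table) of a reduced ordered BDD for fn.
    Ids 0 and 1 are the terminal nodes; internal nodes start at 2."""
    unique = {}  # (variable index, low child, high child) -> node id

    def mk(var, low, high):
        if low == high:              # the test on var is redundant: skip it
            return low
        key = (var, low, high)
        if key not in unique:        # share structurally identical nodes
            unique[key] = len(unique) + 2
        return unique[key]

    def rec(assignment):
        if len(assignment) == nvars:
            return int(bool(fn(*assignment)))
        low = rec(assignment + (0,))
        high = rec(assignment + (1,))
        return mk(len(assignment), low, high)

    return rec(()), unique

root, nodes = build(f, 3)
print(len(nodes))   # internal nodes in the reduced BDD
```

The full decision tree for three variables has 7 internal nodes; sharing and redundant-test elimination leave fewer in the BDD, which is the effect the figures illustrate.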
PTA using BDDs • Program: A: a = new O(); B: b = new O(); C: c = new O(); a = b; b = a; c = b • Points-to: (a, A), (b, B), (c, C), (a, B), (b, A), (c, A), (c, B)
PTA using BDDs • Binary encoding: • a and A as 00 • b and B as 01 • c and C as 10 • Points-to representation (variable bits followed by object bits): (a, A) as 0000, (a, B) as 0001, (b, A) as 0100, (b, B) as 0101, (c, A) as 1000, (c, B) as 1001, (c, C) as 1010
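The encoding on this slide can be reproduced with a few lines of Python. This is only an illustration of the bit-string encoding; the names `BITS`, `relation`, and `encoded` are hypothetical, and a BDD would store the set of bit strings as a Boolean function rather than as a list.

```python
# Two bits per variable and two bits per allocation site, as on the slide.
BITS = {"a": "00", "b": "01", "c": "10",
        "A": "00", "B": "01", "C": "10"}

relation = [("a", "A"), ("a", "B"),
            ("b", "A"), ("b", "B"),
            ("c", "A"), ("c", "B"), ("c", "C")]

# Each pair becomes a 4-bit string: variable bits followed by object bits.
encoded = [BITS[var] + BITS[obj] for var, obj in relation]

def may_point_to(var, obj):
    """Characteristic function of the relation: true for encoded pairs only."""
    return BITS[var] + BITS[obj] in encoded

print(encoded)
```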
PTA using BDDs • Compact way to represent points-to relations: * credit: [2] Points-to Analysis using BDDs
Determine • How many contexts are generated? • How does the number of contexts relate to the precision of the analysis? • How likely is a scalable solution to be feasible?
Background • O: abstraction of pointer targets (objects) • P: abstraction of pointers • I: abstraction of method invocations • p may point to o: O(o) ∈ pt(P(p))
Background • Oas: program statement where the object was allocated • Pvar: pointer held in a local variable • [O(o), f]: field f of object o • Pfs(o.f): pointer held in field f of object o
Background • Compare two families of invocation abstraction: • call site: Ics(i) is the program statement of the method call • receiver object: Iro(i) = O(o), where o is the object on which the method was invoked
Background • String of contexts given a base abstraction Ibase: Istring(i) = [Ibase(i), Ibase(i2), Ibase(i3), ...] • ij is the j'th topmost invocation on the stack during i (i = i1) • Two approaches to make it finite: • limit the length of the context string to k • ZCWL: exclude cycle edges from the call graph
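A small Python sketch of k-limited context strings, following the definition on this slide. The `Invocation` record, the function names, and the example stack are assumptions made for illustration; the base abstraction is passed in as a parameter so that the same code covers both the call-site and the receiver-object families.

```python
from typing import NamedTuple

class Invocation(NamedTuple):
    call_site: str   # program statement of the method call
    receiver: str    # abstract receiver object (allocation site)

def i_cs(inv):
    return inv.call_site      # call-site base abstraction

def i_ro(inv):
    return inv.receiver       # receiver-object base abstraction

def context_string(stack, base, k):
    """stack[0] is the current invocation i = i1, stack[1] is i2, and so on.
    Truncating to the first k entries keeps the set of contexts finite."""
    return tuple(base(inv) for inv in stack[:k])

stack = [Invocation("s3", "o1"), Invocation("s2", "o1"), Invocation("s1", "o2")]
print(context_string(stack, i_cs, 2))
print(context_string(stack, i_ro, 1))
```

With k = 2 and the call-site abstraction, two run-time stacks that agree on their two topmost call sites are mapped to the same context.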
Background • Another choice: which pointers and objects to model context-sensitively? • Given a context-insensitive abstraction Pci and a context abstraction I, model run-time pointer p: • context-sensitively by P(p) = [I(ip), Pci(p)] (ip is the method invocation containing p) • context-insensitively by P(p) = Pci(p)
Background • Given an allocation site abstraction Oas and a context abstraction I, model object o: • context-sensitively by O(o) = [I(io), Oas(o)] (io is the method invocation in which o was allocated) • context-insensitively by O(o) = Oas(o)
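The effect of the pointer abstraction choice can be sketched in Python with a hypothetical identity method `id(p) { return p; }` called from two call sites. The program, the variable names, and the `analyze` helper are assumptions for illustration, with the call site used as the context I.

```python
from collections import defaultdict

# Hypothetical program:
#   s1: x = id(new A())      s2: y = id(new B())
# Each entry: (call site, allocation site of the argument, result variable)
calls = [("s1", "A", "x"), ("s2", "B", "y")]

def analyze(context_sensitive):
    pt = defaultdict(set)

    def P(site):
        # P(p) = [I(ip), Pci(p)] when context-sensitive, else Pci(p)
        return (site, "p") if context_sensitive else "p"

    for site, alloc, _ in calls:        # argument flows into parameter p
        pt[P(site)].add(alloc)
    for site, _, result in calls:       # return value flows to the caller
        pt[result] |= pt[P(site)]
    return dict(pt)

print(analyze(False)["x"])   # parameter p is shared by both calls
print(analyze(True)["x"])    # one copy of p per call site
```

In the context-insensitive run both callers receive both objects, while the context-sensitive run keeps the two calls apart.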
Benchmarks • The study was performed on: • SpecJVM 98 benchmark suite • DaCapo benchmark suite (ver. beta050224) • Ashes benchmark suite • Polyglot extensible Java front-end • SUN standard library 1.3.1_01
Number of contexts • Context-sensitive analysis is often considered intractable: • contexts are propagated from each call site to the called method • the number of context strings grows exponentially in the length of call chains
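The exponential growth can be seen on a synthetic call graph. The Python sketch below counts full call strings (paths from the entry method) in an acyclic call graph; the chain of methods m0 to m10 with two call sites per step is a made-up example, and `count_contexts` is a hypothetical helper.

```python
from collections import defaultdict
from functools import lru_cache

def count_contexts(edges, root):
    """edges: list of (caller, call site, callee) in an acyclic call graph.
    Returns a function giving the number of distinct call strings per method."""
    incoming = defaultdict(list)
    for caller, _site, callee in edges:
        incoming[callee].append(caller)

    @lru_cache(maxsize=None)
    def count(method):
        if method == root:
            return 1
        # Every call string of a caller extends to one string per call edge.
        return sum(count(caller) for caller in incoming[method])

    return count

# Each method m_i calls m_(i+1) from two different call sites.
edges = [(f"m{i}", f"s{i}_{j}", f"m{i + 1}") for i in range(10) for j in range(2)]
count = count_contexts(edges, "m0")
total = count("m10")
print(total)
```

Ten call levels with two call sites each already give 2^10 contexts for the last method.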
Number of contexts • Questions to clarify: • how many of these contexts improve the analysis results? • why can BDDs represent so many contexts, and is there hope of representing them with traditional techniques?
Total number of contexts • Count method-context pairs • Empty entries: the analysis did not complete within the available memory • The BDD library could allocate 41 million BDD nodes (~820 MB)
Total number of contexts • An explicit representation of contexts does not scale well • The number of contexts grows slowly in the object-sensitive analysis (method invocations on the this pointer) • ZCWL • k is the maximum call depth in the call graph after merging SCCs • large variations because k differs for each benchmark
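How k is obtained for ZCWL can be sketched as follows: merge each strongly connected component (SCC) of the call graph into one node and take the longest path from the entry method. This Python sketch uses Kosaraju's algorithm for the SCCs; the function names and the four-method example graph are assumptions, and the recursive traversal is only suitable for small graphs.

```python
from collections import defaultdict
from functools import lru_cache

def sccs(nodes, edges):
    """Kosaraju's algorithm: map each node to a representative of its SCC."""
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    order, seen = [], set()

    def visit(u):                       # first pass: record finishing order
        seen.add(u)
        for v in succ[u]:
            if v not in seen:
                visit(v)
        order.append(u)

    for u in nodes:
        if u not in seen:
            visit(u)
    comp = {}

    def assign(u, rep):                 # second pass on the reversed graph
        comp[u] = rep
        for v in pred[u]:
            if v not in comp:
                assign(v, rep)

    for u in reversed(order):
        if u not in comp:
            assign(u, u)
    return comp

def max_call_depth(nodes, edges, entry):
    comp = sccs(nodes, edges)
    dag = defaultdict(set)
    for u, v in edges:
        if comp[u] != comp[v]:          # edges inside an SCC are dropped
            dag[comp[u]].add(comp[v])

    @lru_cache(maxsize=None)
    def depth(c):
        return max((1 + depth(d) for d in dag[c]), default=0)

    return depth(comp[entry])

nodes = ["main", "a", "b", "c"]
edges = [("main", "a"), ("a", "b"), ("b", "a"), ("b", "c")]   # a and b form a cycle
k = max_call_depth(nodes, edges, "main")
print(k)
```

Because a and b call each other, they collapse into one node and calls between them carry no context, which is the source of the imprecision discussed later in the deck.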
Equivalent contexts • Method-context pairs (m1, c1) and (m2, c2) are equivalent if: • m1 = m2 • ∀ local pointer p in the method, pt(P(p)) is the same for c1 and c2 • Equivalence classes reflect precision improvement due to context sensitivity
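The equivalence test on this slide amounts to grouping method-context pairs by their local points-to sets. A Python sketch, assuming the analysis result is available as a dictionary (the data layout and the example values are hypothetical):

```python
from collections import defaultdict

def equivalence_classes(pt):
    """pt maps (method, context) to {local pointer: frozenset of abstract objects}.
    Two pairs are equivalent if the method is the same and every local
    pointer has the same points-to set in both contexts."""
    classes = defaultdict(list)
    for (method, ctx), local_pts in pt.items():
        key = (method, frozenset(local_pts.items()))
        classes[key].append(ctx)
    return classes

pt = {
    ("m", "c1"): {"p": frozenset({"A"})},
    ("m", "c2"): {"p": frozenset({"A"})},   # same sets as c1: no added precision
    ("m", "c3"): {"p": frozenset({"B"})},
}
classes = equivalence_classes(pt)
print(len(pt), "method-context pairs,", len(classes), "equivalence classes")
```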
Equivalent contexts • BDDs "automatically" merge equal points-to relations, which is why the representation is effective • Object-sensitive analysis is more precise than call-site analysis • Context string length does not have a great impact • Surprisingly, ZCWL is less precise because of context-insensitivity within SCCs
Distinct points-to sets • Measures analysis cost • Approximates space requirements in a "traditional" representation, such as shared bit-vectors • Similar results for all context-sensitive variations • The number of distinct points-to sets increases with a context-sensitive heap abstraction
Call Graph • Compare context-insensitive projections of the context-sensitive call graphs • each node is a method (not a method-context pair) • reachable methods are preserved • ZCWL excluded (same as the input context-insensitive graph)
Reachable methods • Context sensitivity discovers more unreachable methods (bloat) • Context sensitivity for heap objects: • adds precision in the object-sensitive analysis (sablecc-j) • has no impact in the call-site analysis
Call edges • Compare the size of the call graph in call edges • Results are the same as for reachable methods, except for a large difference in sablecc-j (specific code pattern)
Virtual call resolution • Count virtual calls with more than one implementation • Object-sensitive analysis has a clear advantage over call-site analysis • Heap objects with context add precision (sablecc-j)
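A Python sketch of how points-to results feed virtual call resolution: a call site can be devirtualized when every possible receiver type dispatches to the same implementation. The class names, the `IMPLEMENTS` table, and the helper functions are made up for illustration.

```python
# Hypothetical dispatch table: (run-time type, method name) -> implementation.
# B inherits foo from A, C overrides it.
IMPLEMENTS = {
    ("A", "foo"): "A.foo",
    ("B", "foo"): "A.foo",
    ("C", "foo"): "C.foo",
}

def call_targets(receiver_types, method):
    """Implementations reachable from a call, given the types of the objects
    in the receiver's points-to set."""
    return {IMPLEMENTS[(t, method)] for t in receiver_types}

def is_devirtualizable(receiver_types, method):
    return len(call_targets(receiver_types, method)) == 1

print(is_devirtualizable({"A", "B", "C"}, "foo"))   # imprecise points-to set
print(is_devirtualizable({"A", "B"}, "foo"))        # more precise points-to set
```

A more precise analysis shrinks the receiver's points-to set, so fewer call sites remain with more than one target.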
Cast safety • A cast cannot fail if the pointer can point only to objects of the "right" type (a subtype of the type in the cast) • Count non-provable casts • Object sensitivity, especially with heap objects, is the best (polyglot, javac)
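The cast-safety check can be sketched in Python as a subtype test over the points-to set of the cast operand. The class hierarchy in `SUPER` and the function names are hypothetical, and only single inheritance is modeled.

```python
# Hypothetical single-inheritance hierarchy: class -> direct superclass.
SUPER = {"B": "A", "C": "A", "A": "Object", "D": "Object"}

def is_subtype(t, target):
    """Walk up the superclass chain from t looking for target."""
    while t is not None:
        if t == target:
            return True
        t = SUPER.get(t)
    return False

def cast_is_safe(pointee_types, target):
    """A cast (target) p cannot fail if every object p may point to
    has a type that is a subtype of target."""
    return all(is_subtype(t, target) for t in pointee_types)

print(cast_is_safe({"B", "D"}, "A"))   # D is not a subtype of A: not provable
print(cast_is_safe({"B", "C"}, "A"))   # both are subtypes of A: provably safe
```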
Conclusions • Evaluated effects: • generated contexts • distinct points-to sets • precision of call graph construction • virtual call resolution • cast safety analysis • Context-sensitive variations: • object-sensitive analysis • call sites as context abstraction • ZCWL algorithm
Conclusions • Context-sensitivity improvements: • small: call graph precision • medium: virtual call resolution • major: cast safety analysis • Object-sensitive analysis was the best: • analysis precision • potential scalability
Conclusions • Improvements from object-sensitive variations: • small: longer context strings • significant: heap objects modeled with context • implementable with other existing techniques
Conclusions • ZCWL algorithm: • disappointing results • caused by the context-insensitive treatment of calls within SCCs of the initial call graph • a large proportion of call edges lie within SCCs