Abstract Interpretation and Future Program Analysis Problems Martin Rinard Alexandru Salcianu Laboratory for Computer Science Massachusetts Institute of Technology
Abstract Interpretation: The Early Years • Formal Connection Between • Sound analysis of program • Execution of program • Broader Impact • Insight that analysis is execution • Reduced need to think of analysis as reasoning about all possible executions! • Good fit with analysis problems of that era • Properties of local variables • Within single procedure
How Is Abstract Interpretation Holding Up? • Technical result as relevant as ever • Moore’s Law effects • Much more computing power for analysis • More complex programs • Ambitious analyses • Heap properties • Multiple threads • Interprocedural partial program analyses • Stretch intuitive vision of analysis as execution
Outline • Combined pointer and escape analysis • Rationale behind design decisions • Alternative choices in design space • Challenges and Predictions • Bigger Picture
Goal of Pointer Analysis • Characterize objects to which pointers point • Synthesize finite set of object representatives • Derive representative(s) each pointer points to • Example: r = p.f; [Figure: points-to graph with labels p, f, r] • p.f points to an object with one particular representative, so after the execution of r = p.f, r may point to an object with that representative, but not to an object with any other representative
Our Pointer Analysis Goals • Accurate for multithreaded programs • Compositional, partial program analysis • Analyze each procedure once • Independently of callers • May skip analysis of invoked procedures • Why? • Parts of program unavailable (different language, not written yet) • Parts may be irrelevant for desired result
Analysis Abstraction • Basic abstraction is points-to graph • Nodes represent objects in heap • Edges represent references in heap [Figure: points-to graph with labels p, q, u and edges labeled f]
Two Kinds of Edges • Inside edges (solid) – represent references created inside analyzed part of program • Outside edges (dashed) – represent references created outside analyzed part of program [Figure: points-to graph with labels p, q, u and edges labeled f]
Two Kinds of Nodes • Inside nodes (solid) – represent objects created inside analyzed part of program • Outside nodes (dashed) – represent objects • Created outside analyzed part of program, or • Accessed via edges created outside analyzed part of program [Figure: points-to graph with labels p, q, u and edges labeled f]
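The abstraction on the last three slides can be written down as a small data structure. This is a minimal sketch, not the authors' implementation: the names Node, PointsToGraph, add_edge and targets are hypothetical, and program variables are identified with the nodes they refer to for brevity.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    name: str      # label such as "p" or "r" (hypothetical naming)
    inside: bool   # True: object created inside the analyzed code

@dataclass
class PointsToGraph:
    # Each edge is a (source node, field name, target node) triple.
    inside_edges: set = field(default_factory=set)
    outside_edges: set = field(default_factory=set)

    def add_edge(self, src, fld, dst, inside):
        """Record a reference src.fld -> dst as an inside or outside edge."""
        (self.inside_edges if inside else self.outside_edges).add((src, fld, dst))

    def targets(self, src, fld):
        """All nodes that src.fld may point to, over both edge kinds."""
        return {d for (s, f, d) in self.inside_edges | self.outside_edges
                if s == src and f == fld}

p = Node("p", inside=False)          # parameter object, created by the caller
r = Node("r", inside=True)           # object allocated by the analyzed code
g = PointsToGraph()
g.add_edge(p, "f", r, inside=True)   # models a store p.f = r in analyzed code
```

Keeping the two edge sets separate is what later lets the analysis tell references it created itself from references it only assumes to exist.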
Key Question What does the heap look like when the procedure begins its execution? • Previous algorithms analyzed callers before callees, so model of heap always available • Unfortunately, this approach requires analysis of entire program in top-down fashion • Our solution: use code to reconstruct what (accessed part of) heap must look like
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, q]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q and one edge labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s and one edge labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s and two edges labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s and three edges labeled f] • One option – continue to expand graph • But the analysis may never terminate…
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s and three edges labeled f] • Instead have one outside node per load statement • Represents all objects loaded at that statement • Bounds graph and guarantees termination
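The effect of using one outside node per load statement can be sketched as a transfer function for the loop body s = s.f. This is a simplified illustration under stated assumptions: the function name load, the edge-triple encoding and the statement label are hypothetical, and every source node is treated as escaped, so an outside edge is always added.

```python
def load(inside, outside, src_nodes, fld, stmt_id):
    """Sketch of the transfer function for `dst = src.fld` at statement stmt_id.

    inside / outside are sets of (source, field, target) triples.
    Returns the set of nodes dst may point to after the load.
    """
    result = {d for (s, f, d) in inside | outside
              if s in src_nodes and f == fld}
    load_node = ("load", stmt_id)        # the single outside node for this statement
    for s in src_nodes:
        outside.add((s, fld, load_node)) # reference assumed to exist outside
    result.add(load_node)
    return result

inside, outside = set(), set()
head = {"q"}                             # value of s at loop entry (s = q)
while True:
    before = len(outside)
    s = load(inside, outside, head, "f", stmt_id="s = s.f")
    new_head = head | s                  # join loop entry with the back edge
    if new_head == head and len(outside) == before:
        break                            # fixed point: nothing new was added
    head = new_head
```

Because the load statement always contributes the same node, the loop adds only the edges q.f and a self loop on the load node, then stabilizes instead of unrolling the list forever.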
Consequences of This Decision • Multiple objects represented by single node (load node in loop) • But can also have single object represented by multiple nodes in graph (!!) (object loaded at multiple statements) • Example: do a = q.f; until (a = null); do b = q.f; until (b = null); [Figure: points-to graph for q with edges labeled f]
Consequences of This Decision • Form of points-to graph depends on program • Programs with identical behavior but different graphs… • Program 1: do s = s.f; until (s = null); • Program 2: s = s.f; while (s != null) s = s.f [Figure: the two resulting points-to graphs, each with labels p, r, q, s and edges labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s and three edges labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s, t and three edges labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph so far, with labels p, r, q, s, t and four edges labeled f]
Analysis In Example • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: final points-to graph, with labels p, r, q, s, t, u and four edges labeled f]
What Does Result Tell Us? • Nodes (outside) • Created outside analyzed part of program • Incomplete information • Nodes (inside, escaped) • Created inside analyzed part of program • But reachable from unanalyzed part of program • Incomplete information • Nodes (inside, captured) • Created inside analyzed part of program • Unreachable from unanalyzed part of program • Complete information about referencing relationships! [Figure: points-to graph with labels p, r, q, s, t, u and edges labeled f]
Crucial Distinction • Escaped vs. Captured • Enables analysis to identify regions of heap where it has complete information • Crucial for both • Accuracy of analysis • Effective use of analysis results [Figure: points-to graph with labels p, r, q, s, t, u and edges labeled f]
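The escaped versus captured split amounts to a reachability computation over the points-to graph. The sketch below is an illustration, not the authors' code: the function name escaped_nodes is hypothetical, the edge set is a reading of the running example m(p, q) (with "L" standing for the load node of s = s.f), and only the parameters are taken as roots visible to unanalyzed code.

```python
def escaped_nodes(edges, roots):
    """Nodes reachable from `roots` by following (source, field, target) edges."""
    seen, work = set(roots), list(roots)
    while work:
        n = work.pop()
        for (s, _fld, d) in edges:
            if s == n and d not in seen:
                seen.add(d)
                work.append(d)
    return seen

# Assumed graph for m(p, q): p.f -> r, q.f -> L, L.f -> L, L.f -> t.
# The object u is allocated but never stored anywhere.
edges = {("p", "f", "r"), ("q", "f", "L"), ("L", "f", "L"), ("L", "f", "t")}
inside_nodes = {"r", "t", "u"}            # objects created by `new C()`

escaped = escaped_nodes(edges, roots={"p", "q"})
captured = inside_nodes - escaped         # inside nodes unreachable from outside
```

Under these assumptions r and t escape through the parameters, while u is captured, so the analysis has complete information about references to and from u.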
Multiple Calling Contexts • Two Key Assumptions • p and q refer to different objects • Parallel threads may access objects • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s, t and edges labeled f]
Multiple Calling Contexts • What if p and q refer to the same object? (i.e. p and q aliased) • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: two points-to graphs, each with labels p, r, q, s, t and edges labeled f]
Multiple Calling Contexts • What if p and q refer to the same object and there are no parallel threads? • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: two points-to graphs, each with labels p, r, q, s, t and edges labeled f]
Multiple Calling Contexts • What if p and q refer to the same object and there are no parallel threads? • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels r, p, q, s, t and two edges labeled f]
Issues • Substantially different results for different calling contexts • But caller is unavailable at analysis time… • New analysis for each possible context? • Lots of contexts… • Most of which probably won’t be needed…
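Our Solution • Analyze assuming • Distinct parameters • Parallel threads • Aliased parameters at caller? Merge nodes… • No parallel threads? Remove outside edges and nodes… [Figure: three points-to graphs with labels p, r, q, s, t and edges labeled f: the general result, the result with p and q merged, and the result with outside edges removed]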
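The first specialization step, merging nodes when the caller passes aliased parameters, can be sketched as a renaming over the edge sets. This is only an illustration: merge_nodes is a hypothetical helper, the edge sets are a reading of the running example with "L" as the load node, and the second step (removing outside edges and nodes when there are no parallel threads) is not modeled here.

```python
def merge_nodes(edges, keep, drop):
    """Return edges with node `drop` renamed to `keep` everywhere."""
    def rename(n):
        return keep if n == drop else n
    return {(rename(s), f, rename(d)) for (s, f, d) in edges}

# Result computed once, assuming distinct parameters and parallel threads.
inside = {("p", "f", "r"), ("L", "f", "t")}
outside = {("q", "f", "L"), ("L", "f", "L")}

# Caller passes the same object for p and q: fold q into p.
inside_aliased = merge_nodes(inside, keep="p", drop="q")
outside_aliased = merge_nodes(outside, keep="p", drop="q")
```

After the merge the single parameter node has both an inside edge to r and an outside edge to the load node, which is the situation in which the load node may stand for r.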
Solution Is Not Perfect • Specialization can lose precision – can have two procedures such that when analyzed with • Distinct parameters – same analysis result • Aliased parameters - different analysis result • Conceptually complex analysis • Think about all contexts during analysis • Start to lose intuition of analysis as execution • Difficult time applying abstract interpretation framework
Abstract Interpretation and Analysis • Abstract interpretation is a parameterized framework • V – concrete values • A – abstract values • α – abstraction function • γ – concretization function [Figure: diagram relating the concrete transition t_v from v1 to v2 to the abstract transition t_a from a1 to a2]
Applying Framework • A – points-to graphs • V – concrete heaps • α – points-to graph for a given heap • Points-to graph depends on program • Need to augment heap with access history • γ – all heaps that correspond to points-to graph • OK, I give up…
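One standard way to state what the framework asks of an analysis, written with the slide's symbols, is the soundness condition below. It assumes that γ maps each abstract value to the set of concrete values it stands for, and that t_v and t_a are the concrete and abstract transfer functions; the exact formulation used by the authors is not given on the slide.

```latex
\gamma : A \to \mathcal{P}(V), \qquad
t_v : V \to V, \qquad t_a : A \to A
\\[4pt]
\forall a_1 \in A,\ \forall v_1 \in \gamma(a_1):\quad
t_v(v_1) \in \gamma\bigl(t_a(a_1)\bigr)
```

In words: if v1 is one of the concrete values described by a1, then the concrete result v2 = t_v(v1) is described by the abstract result a2 = t_a(a1).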
Correctness Proof • Inductively construct a relation between • Objects in heap • Nodes that represent objects • Invariants that characterize the relation • Transfer function • Takes points-to graph and relation • Gives new points-to graph and relation • Prove that transfer functions preserve invariants
Threads and Abstract Interpretation • Philosophy of Abstract Interpretation • Come up with a decent abstraction • Execute program on that abstraction • Problem with threads • Execution usually modeled as interleaving • Too many interleavings!
Our Solution • Points-to graphs explicitly represent all possible interactions between parallel threads • Basic Analysis Approach • Analyze each thread in isolation • To compute combined effect of multiple threads • Retrieve result for each thread • Compute interactions that may occur • Outside edges: interactions in which one thread reads a reference created by parallel thread • Inside edges: interactions in which one thread creates a reference read by parallel thread
Interthread Analysis n(p,q) || m(p,q)
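Interthread Analysis • n(p,q) || m(p,q) • Retrieve points-to graph from analysis of each thread [Figure: one points-to graph per thread, each with parameter nodes p and q]
Interthread Analysis • n(p,q) || m(p,q) • Establish correspondence between nodes • Start with parameter nodes • Legend: arrow from A to B if A may represent same object as B [Figure: one points-to graph per thread, each with parameter nodes p and q]
Interthread Analysis • n(p,q) || m(p,q) • Compute Interactions Between Threads • Match inside and outside edges • For each outside node, compute nodes in other graph that it represents [Figure: one points-to graph per thread, each with parameter nodes p and q]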
Interthread Analysis • n(p,q) || m(p,q) • Use computed representation relationship to • combine graphs and • obtain single graph for the execution of both threads [Figure: the per-thread points-to graphs and the combined graph, with parameter nodes p and q]
Property of Analysis • Flow-sensitive within each thread (if reorder statements, get different result) • Flow-insensitive between threads • Assumes interactions can happen • Any number of times • In any order • Analysis models interactions that can’t actually happen in any interleaved execution
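Imprecision Due To Flow Insensitivity • n(a,b,c) { 1: p = b.f; p.f = a; 2: a.f = b } || m(a,c) { 3: q = a.f; 4: q.f = c } • Interthread Analysis Result [Figure: points-to graph with labels a, b, c] • Execution Order Required to Produce Blue Edge [Figure: ordering over statements 1, 2, 3, 4 and nodes a, b, c]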
Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • What is value of z?
Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • What is value of z? • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1
Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • What is value of z? • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1 • z can be 0 or 1
Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • What is value of z? • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1 • z can be 0 or 1 • INCORRECT REASONING!
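The interleaving enumeration on these slides can be reproduced mechanically. The sketch assumes the reading in which Thread 1 executes y=0 then x=1 and Thread 2 executes z = x+y; the helper names interleavings and run are hypothetical. It shows only what interleaving semantics predicts, which is exactly the reasoning the last slide flags as incorrect.

```python
def interleavings(a, b):
    """Yield every merge of lists a and b that preserves each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest

def run(schedule):
    """Execute one schedule from the initial state and return the final z."""
    env = {"x": 0, "y": 1}                 # initial values from the slide
    for _name, step in schedule:
        step(env)
    return env["z"]

# Each statement is (label, state update); labels are for display only.
thread1 = [("y=0", lambda e: e.update(y=0)),
           ("x=1", lambda e: e.update(x=1))]
thread2 = [("z=x+y", lambda e: e.update(z=e["x"] + e["y"]))]

results = {tuple(name for name, _ in s): run(s)
           for s in interleavings(thread1, thread2)}
z_values = set(results.values())
```

With two statements in one thread and one in the other there are only three schedules, but the count grows combinatorially with thread length, which is the "too many interleavings" problem raised earlier in the talk.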