Abstract Interpretation: Future Program Analysis Problems

Abstract Interpretation and Future Program Analysis Problems. Martin Rinard and Alexandru Salcianu, Laboratory for Computer Science, Massachusetts Institute of Technology

Abstract Interpretation: The Early Years • Formal Connection Between • Sound analysis of program • Execution of program • Broader Impact • Insight that analysis is execution • Reduced need to think of analysis as reasoning about all possible executions! • Good fit with analysis problems of that era • Properties of local variables • Within single procedure

How Is Abstract Interpretation Holding Up? • Technical result as relevant as ever • Moore’s Law effects • Much more computing power for analysis • More complex programs • Ambitious analyses • Heap properties • Multiple threads • Interprocedural partial program analyses • Stretch intuitive vision of analysis as execution

Outline • Combined pointer and escape analysis • Rationale behind design decisions • Alternative choices in design space • Challenges and Predictions • Bigger Picture

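Goal of Pointer Analysis • Characterize objects to which pointers point • Synthesize finite set of object representatives • Derive representative(s) each pointer points to • Example: r = p.f; “p.f points to a [...] object, so after the execution of r = p.f, r may point to a [...] object, but not to a [...], [...], or [...] object” [Figure: p, its field f, and r pointing to object representatives; each [...] stands for an object-representative icon from the slide]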

Our Pointer Analysis Goals • Accurate for multithreaded programs • Compositional, partial program analysis • Analyze each procedure once • Independently of callers • May skip analysis of invoked procedures • Why? • Parts of program unavailable (different language, not written yet) • Parts may be irrelevant for desired result

Analysis Abstraction: basic abstraction is the points-to graph • Nodes represent objects in heap • Edges represent references in heap [Figure: points-to graph over p, q, u with edges labeled f]

Two Kinds of Edges • Inside edges (solid) – represent references created inside analyzed part of program • Outside edges (dashed) – represent references created outside analyzed part of program [Figure: points-to graph over p, q, u with edges labeled f]

Two Kinds of Nodes • Inside nodes (solid) – represent objects created inside analyzed part of program • Outside nodes (dashed) – represent objects • Created outside analyzed part of program, or • Accessed via edges created outside analyzed part of program [Figure: points-to graph over p, q, u with edges labeled f]
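The two kinds of nodes and edges can be sketched as a small data structure. This is only an illustrative Python sketch: the names `Node`, `PointsToGraph`, `add_edge`, and `targets` are hypothetical and are not taken from the slides or from the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str
    inside: bool  # True: created inside the analyzed code; False: outside node


@dataclass
class PointsToGraph:
    # Each edge is a triple (source node, field name, target node).
    inside_edges: set = field(default_factory=set)
    outside_edges: set = field(default_factory=set)

    def add_edge(self, src, fld, dst, inside):
        """Record a reference created inside (True) or outside (False) the analyzed code."""
        (self.inside_edges if inside else self.outside_edges).add((src, fld, dst))

    def targets(self, src, fld):
        """All nodes that src.fld may point to, over both kinds of edges."""
        edges = self.inside_edges | self.outside_edges
        return {d for (s, f, d) in edges if s == src and f == fld}


# Hypothetical example: p is a parameter (outside node), r is allocated in the
# analyzed code (inside node), load stands for objects read through p.f.
p = Node("p", inside=False)
r = Node("r", inside=True)
load = Node("load", inside=False)

g = PointsToGraph()
g.add_edge(p, "f", r, inside=True)      # p.f = r, written by analyzed code
g.add_edge(p, "f", load, inside=False)  # p.f read; reference created elsewhere
```

Keeping the two edge sets separate is what later lets the analysis tell which references it created itself and which it only assumed to exist.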

Key Question: What does the heap look like when the procedure begins its execution? • Previous algorithms analyzed callers before callees, so a model of the heap was always available • Unfortunately, this approach requires analysis of the entire program in top-down fashion • Our solution: use code to reconstruct what (accessed part of) heap must look like

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with nodes p and q]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with nodes p, r, q]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with nodes p, r, q and one edge labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s and one edge labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s and two edges labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s and three edges labeled f] • One option – continue to expand graph • But the analysis may never terminate…

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s and three edges labeled f] • Instead have one outside node per load statement • Represents all objects loaded at that statement • Bounds graph and guarantees termination
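The termination argument can be illustrated with a toy fixed-point computation for the loop `do s = s.f; until (s = null)`. This is a minimal sketch, not the authors' algorithm: it tracks only the variable `s` and the outside edges, and the names `analyze_list_walk` and `LOAD` are hypothetical.

```python
LOAD = "load@s=s.f"  # the single outside node standing for every object this load returns


def analyze_list_walk():
    """Iterate the loop body to a fixed point.

    Returns (nodes s may point to after the loop body, outside edges, rounds).
    """
    head_s = frozenset({"q"})  # s = q on loop entry
    edges = frozenset()        # outside edges discovered so far
    rounds = 0
    while True:
        rounds += 1
        # Transfer function for `s = s.f`: every node s may point to gets an
        # outside f edge to the one load node, and s now points to that node.
        new_edges = edges | {(n, "f", LOAD) for n in head_s}
        body_s = frozenset({LOAD})
        # Join the state flowing around the back edge with the loop-entry state.
        next_head = head_s | body_s
        if next_head == head_s and new_edges == edges:
            return body_s, edges, rounds
        head_s, edges = next_head, new_edges


s_after, outside_edges, rounds = analyze_list_walk()
```

Because the load statement contributes at most one node, the set of possible edges is finite and the iteration stabilizes after a few rounds, with a self-loop on the load node summarizing the rest of the list.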

Consequences of This Decision • Multiple objects represented by single node (load node in loop) • But can also have single object represented by multiple nodes in graph (!!) (object loaded at multiple statements) • Example: do a = q.f; until (a = null); do b = q.f; until (b = null); [Figure: points-to graph rooted at q with edges labeled f]

Consequences of This Decision • Form of points-to graph depends on program • Programs with identical behavior but different graphs… • Program 1: do s = s.f; until (s = null); • Program 2: s = s.f; while (s != null) s = s.f [Figure: the two resulting points-to graphs, each over p, r, q, s with edges labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s and three edges labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s, t and three edges labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s, t and four edges labeled f]

Analysis In Example: m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s, t, u and four edges labeled f]

What Does Result Tell Us? • Nodes (outside) • Created outside analyzed part of program • Incomplete information • Nodes (inside, escaped) • Created inside analyzed part of program • But reachable from unanalyzed part of program • Incomplete information • Nodes (inside, captured) • Created inside analyzed part of program • Unreachable from unanalyzed part of program • Complete information about referencing relationships! [Figure: final points-to graph with labels p, r, q, s, t, u and four edges labeled f]

Crucial Distinction • Escaped vs. Captured • Enables analysis to identify regions of heap where it has complete information • Crucial for both • Accuracy of analysis • Effective use of analysis results [Figure: final points-to graph with labels p, r, q, s, t, u and four edges labeled f]
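The escaped/captured split amounts to a reachability computation from the nodes that unanalyzed code can see. The sketch below is illustrative only: the edge list is one reading of the final graph for m(p, q), only outside nodes are used as roots (other escape routes such as return values, static fields, or thread objects are ignored), and `escaped_nodes` is a hypothetical helper name.

```python
def escaped_nodes(edges, roots):
    """Return every node reachable from the root nodes along edges of any kind."""
    seen, work = set(roots), list(roots)
    while work:
        n = work.pop()
        for (src, _field, dst) in edges:
            if src == n and dst not in seen:
                seen.add(dst)
                work.append(dst)
    return seen


# Assumed reading of the example: parameters p and q, one load node for
# `s = s.f`, and inside nodes r, t, u for the three allocations.
edges = {
    ("p", "f", "r"),        # p.f = r
    ("q", "f", "load"),     # first load through q
    ("load", "f", "load"),  # loads inside the loop
    ("load", "f", "t"),     # s.f = t
}
outside = {"p", "q", "load"}
inside = {"r", "t", "u"}

escaped = escaped_nodes(edges, outside) & inside
captured = inside - escaped
```

Under these assumptions r and t escape because they are reachable from the parameters, while u is captured, so the analysis has complete information about references to and from u.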

Multiple Calling Contexts • Two Key Assumptions • p and q refer to different objects • Parallel threads may access objects • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, r, q, s, t and four edges labeled f]

Multiple Calling Contexts • What if p and q refer to the same object? (i.e. p and q aliased) • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: two points-to graphs over p, r, q, s, t, one for distinct p and q and one for aliased p and q, each with four edges labeled f]

Multiple Calling Contexts • What if p and q refer to the same object and there are no parallel threads? • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: two points-to graphs over p, r, q, s, t, one for distinct p and q and one for aliased p and q, each with four edges labeled f]

Multiple Calling Contexts • What if p and q refer to the same object and there are no parallel threads? • m(p, q) { r = new C(); p.f = r; s = q; do s = s.f; until (s = null); t = new C(); s.f = t; u = new C(); } [Figure: points-to graph with labels p, q, r, s, t and two edges labeled f]

Issues • Substantially different results for different calling contexts • But caller is unavailable at analysis time… • New analysis for each possible context? • Lots of contexts… • Most of which probably won’t be needed…

Our Solution • Analyze assuming • Distinct parameters • Parallel threads • Aliased parameters at caller? Merge nodes… • No parallel threads? Remove outside edges and nodes… [Figure: three points-to graphs over p, r, q, s, t: the base result and its two specializations]

Solution Is Not Perfect • Specialization can lose precision – can have two procedures such that when analyzed with • Distinct parameters – same analysis result • Aliased parameters - different analysis result • Conceptually complex analysis • Think about all contexts during analysis • Start to lose intuition of analysis as execution • Difficult time applying abstract interpretation framework

Abstract Interpretation and Analysis • Abstract interpretation is parameterized framework • V – concrete values • A – abstract values • α – abstraction function • γ – concretization function [Diagram: ta takes a1 to a2 in A; tv takes v1 to v2 in V; α and γ relate each vi to the corresponding ai]
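The relationship between the concrete step tv, the abstract step ta, and the abstraction function can be checked on a toy domain. The sign domain below is an assumption chosen for illustration and does not appear in the slides; `alpha`, `t_v`, and `t_a` are hypothetical names for the abstraction function, one concrete operation (negation), and its abstract counterpart.

```python
def alpha(v):
    """Abstraction function: map a concrete integer to its sign."""
    return "neg" if v < 0 else "zero" if v == 0 else "pos"


def t_v(v):
    """Concrete transfer function: negate the value."""
    return -v


def t_a(a):
    """Abstract transfer function: negate the sign."""
    return {"neg": "pos", "zero": "zero", "pos": "neg"}[a]


# Executing concretely and then abstracting agrees with abstracting first and
# then executing on abstract values, for every sampled concrete value.
commutes = all(alpha(t_v(v)) == t_a(alpha(v)) for v in range(-50, 51))
```

For this operation the diagram commutes exactly; in general a sound abstract transfer function is only required to over-approximate the abstraction of the concrete result.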

Applying Framework • A – points-to graphs • V – concrete heaps • α – points-to graph for a given heap • Points-to graph depends on program • Need to augment heap with access history • γ – all heaps that correspond to points-to graph • OK, I give up…

Correctness Proof • Inductively construct a relation between • Objects in heap • Nodes that represent objects • Invariants that characterize the relation • Transfer function • Takes points-to graph and the relation • Gives new points-to graph and relation • Prove that transfer functions preserve invariants

Threads and Abstract Interpretation • Philosophy of Abstract Interpretation • Come up with a decent abstraction • Execute program on that abstraction • Problem with threads • Execution usually modeled as interleaving • Too many interleavings!

Our Solution • Points-to graphs explicitly represent all possible interactions between parallel threads • Basic Analysis Approach • Analyze each thread in isolation • To compute combined effect of multiple threads • Retrieve result for each thread • Compute interactions that may occur • Outside edges: interactions in which one thread reads a reference created by parallel thread • Inside edges: interactions in which one thread creates a reference read by parallel thread

Interthread Analysis n(p,q) || m(p,q)

Interthread Analysis • n(p,q) || m(p,q) • Retrieve points-to graph from analysis of each thread [Figure: one points-to graph per thread, each with parameter nodes p and q]

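Interthread Analysis • n(p,q) || m(p,q) • Establish correspondence between nodes • Start with parameter nodes [Figure: the two per-thread graphs with parameter nodes p and q linked; legend: an arrow from A to B if A may represent the same object as B]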

Interthread Analysis • n(p,q) || m(p,q) • Compute Interactions Between Threads • Match inside and outside edges • For each outside node, compute nodes in other graph that it represents [Figure: the two per-thread graphs with parameter nodes p and q]

Interthread Analysis • n(p,q) || m(p,q) • Use computed representation relationship to • combine graphs and • obtain single graph for the execution of both threads [Figure: the two per-thread graphs and the combined graph, with parameter nodes p and q]
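The matching of outside edges in one thread's graph against inside edges in the other can be sketched as a small fixed-point computation. This is a simplified illustration under stated assumptions, not the authors' algorithm: it computes the representation relationship in one direction only and leaves out the graph-combination step. The function name `match_outside_nodes` and the node names `load1`, `x`, and `y` are hypothetical.

```python
def match_outside_nodes(outside_edges, inside_edges, initial):
    """Compute which nodes of the other graph each node may represent.

    outside_edges: (src, field, dst) triples from one thread's graph.
    inside_edges:  (src, field, dst) triples from the other thread's graph.
    initial:       starting correspondence, e.g. parameter nodes to themselves.
    """
    rep = {n: set(targets) for n, targets in initial.items()}
    changed = True
    while changed:
        changed = False
        for (o_src, o_field, o_dst) in outside_edges:
            for (i_src, i_field, i_dst) in inside_edges:
                # If o_src may represent i_src, then a read of o_src.field may
                # observe the reference i_src.field -> i_dst written by the
                # other thread, so o_dst may represent i_dst.
                if o_field == i_field and i_src in rep.get(o_src, set()):
                    targets = rep.setdefault(o_dst, set())
                    if i_dst not in targets:
                        targets.add(i_dst)
                        changed = True
    return rep


# Assumed example: thread m walks a list starting at p (one load node), while
# thread n builds p.f = x and x.f = y.
m_outside = {("p", "f", "load1"), ("load1", "f", "load1")}
n_inside = {("p", "f", "x"), ("x", "f", "y")}
rep = match_outside_nodes(m_outside, n_inside, {"p": {"p"}})
```

Starting from the parameter nodes, the load node in m ends up representing both x and y, which is the information needed to merge the two graphs into one.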

Property of Analysis • Flow-sensitive within each thread (if reorder statements, get different result) • Flow-insensitive between threads • Assumes interactions can happen • Any number of times • In any order • Analysis models interactions that can’t actually happen in any interleaved execution

Imprecision Due To Flow Insensitivity • n(a,b,c) { 1: p=b.f p.f=a 2: a.f=b } || m(a,c) { 3: q=a.f 4: q.f=c } [Figure: interthread analysis result over nodes a, b, c with one edge highlighted in blue, next to the execution order of statements 1 to 4 required to produce the blue edge]

Weak Memory Consistency Models

Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • What is value of z?

Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1 • What is value of z?

Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1 • What is value of z? • z can be 0 or 1

Initially: y=1, x=0 • Thread 1: y=0; x=1 • Thread 2: z = x+y • Three Interleavings • z = x+y; y=0; x=1 gives z = 1 • y=0; z = x+y; x=1 gives z = 0 • y=0; x=1; z = x+y gives z = 1 • What is value of z? • z can be 0 or 1 • INCORRECT REASONING!
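A short enumeration shows why the interleaving argument is not enough under a weak memory model. The sketch below is an assumption-laden illustration: it models weak consistency simply as permission for Thread 1's two writes to become visible in either order, which is one behavior such models may allow and is not a statement from the slides. The names `run`, `sc_results`, and `reordered_results` are hypothetical.

```python
from itertools import permutations


def run(order):
    """Execute the three statements in the given global order and return z."""
    mem = {"x": 0, "y": 1}  # initially y=1, x=0
    z = None
    for step in order:
        if step == "y=0":
            mem["y"] = 0
        elif step == "x=1":
            mem["x"] = 1
        else:  # "z=x+y", the only statement of Thread 2
            z = mem["x"] + mem["y"]
    return z


steps = ["y=0", "x=1", "z=x+y"]
all_orders = list(permutations(steps))

# Interleaving semantics: Thread 1's program order (y=0 before x=1) is preserved.
sc_results = {run(o) for o in all_orders if o.index("y=0") < o.index("x=1")}

# Assumed weak-memory behavior: the two writes may be observed in either order.
reordered_results = {run(o) for o in all_orders}
```

With program order preserved only z = 0 and z = 1 arise, matching the three interleavings above; once the writes may be reordered, Thread 2 can observe x=1 together with the old y=1, so z = 2 also becomes possible.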
