Pointer Analysis Lecture 2

Pointer AnalysisLecture 2 G. Ramalingam Microsoft Research, India

Andersen’s Analysis • A flow-insensitive analysis • computes a single points-to solution valid at all program points • ignores control-flow – treats program as a set of statements • equivalent to merging all vertices into one (and applying algorithm A) • equivalent to adding an edge between every pair of vertices (and applying algo. A) • a solution R such that R IdealMayPT(u) for every vertex u

Example(Flow-Sensitive Analysis) 1 x = &a; y = x; x = &b; z = x; x = &a 2 y = x 3 x = &b 4 z = x 5

Example:Andersen’s Analysis 1 x = &a; y = x; x = &b; z = x; x = &a 2 y = x 3 x = &b 4 z = x 5

Andersen’s Analysis • Strong updates? • Initial state?

Why Flow-Insensitive Analysis? • Reduced space requirements • a single points-to solution • Reduced time complexity • no copying • individual updates more efficient • no need for joins • number of iterations? • a cubic-time algorithm • Scales to millions of lines of code • most popular points-to analysis

Andersen’s AnalysisA Set-Constraints Formulation • Compute PTx for every variable x

Steensgaard’s Analysis • Unification-based analysis • Inspired by type inference • an assignment “lhs := rhs” is interpreted as a constraint that lhs and rhs have the same type • the type of a pointer variable is the set of variables it can point-to • “Assignment-direction-insensitive” • treats “lhs := rhs” as if it were both “lhs := rhs” and “rhs := lhs” • An almost-linear time algorithm • single-pass algorithm; no iteration required

Example:Andersen’s Analysis 1 x = &a; y = x; y = &b; b = &c; x = &a 2 y = x 3 y = &b 4 b = &c 5

Example:Steensgaard’s Analysis 1 x = &a; y = x; y = &b; b = &c; x = &a 2 y = x 3 y = &b 4 b = &c 5

Steensgaard’s Analysis • Can be implemented using Union-Find data-structure • Leads to an almost-linear time algorithm

Exercise x = &a; y = x; y = &b; b = &c; *x = &d;

May-Point-To Analyses Ideal-May-Point-To ??? Algorithm A more efficient / less precise Andersen’s more efficient / less precise Steensgaard’s

Ideal Points-To Analysis:Definition Recap • A sequence of states s1s2 … snis said to be an execution (of the program) iff • s1is the Initial-State • si| si+1for 1 <= I < n • A state s is said to be a reachable stateiff there exists some execution s1s2 … snis such that sn= s. • RS(u) = { s | (u,s) is reachable } • IdealMayPT (u) = { (p,x) | $ s Î RS(u). s(p) == x } • IdealMustPT (u) = { (p,x) | " s Î RS(u). s(p) == x }

Does Algorithm A Compute The Most Precise Solution?

Ideal <-> Algorithm A • Abstract away correlations between variables • relational analysis vs. • independent attribute x: &y y: &z x: &b y: &x g x: &b x: &y y: &x y: &x a x: {&y,&b} y: {&x,&z} x: &y y: &z x: &b y: &z

Does Algorithm A Compute The Most Precise Solution?

Is The Precise Solution Computable? • Claim: The set RS(u) of reachable concrete states (for our language) is computable. • Note: This is true for any collecting semantics with a finite state space.

Computing RS(u)

Precise Points-To Analysis:Decidability • Corollary: Precise may-point-to analysis is computable. • Corollary: Precise (demand) may-alias analysis is computable. • Given ptr-exp1, ptr-exp2, and a program point u, identify if there exists some reachable state at u where ptr-exp1 and ptr-exp2 are aliases. • Ditto for must-point-to and must-alias • … for our restricted language!

Precise Points-To Analysis:Computational Complexity • What’s the complexity of the least-fixed point computation using the collecting semantics? • The worst-case complexity of computing reachable states is exponential in the number of variables. • Can we do better? • Theorem: Computing precise may-point-to is PSPACE-hard even if we have only two-level pointers.

May-Point-To Analyses Ideal-May-Point-To more efficient / less precise Algorithm A more efficient / less precise Andersen’s more efficient / less precise Steensgaard’s

Precise Points-To Analysis: Caveats • Theorem: Precise may-alias analysis is undecidable in the presence of dynamic memory allocation. • Add “x = new/malloc ()” to language • State-space becomes infinite • Digression: Integer variables + conditional-branching also makes any precise analysis undecidable.

May-Point-To Analyses Ideal (with Int, with Malloc) Ideal (with Int) Ideal (with Malloc) Ideal (no Int, no Malloc) Algorithm A Andersen’s Steensgaard’s

Dynamic Memory Allocation • s: x = new () / malloc () • Assume, for now, that allocated object stores one pointer • s: x = malloc ( sizeof(void*) ) • Introduce a pseudo-variable Vs to represent objects allocated at statement s, and use previous algorithm • treat s as if it were “x = &Vs” • also track possible values of Vs • allocation-site based approach • Key aspect: Vs represents a set of objects (locations), not a single object • referred to as a summary object (node)

Dynamic Memory Allocation:Example 1 x = new; y = x; *y = &b; *y = &a; x = new 2 y = x 3 *y = &b 4 *y = &a 5

Dynamic Memory Allocation:Object Fields • Field-sensitive analysis class Foo { A* f; B* g; } s: x = new Foo() x->f = &b; x->g = &a;

Dynamic Memory Allocation:Object Fields • Field-insensitive analysis class Foo { A* f; B* g; } s: x = new Foo() x->f = &b; x->g = &a;

Other Aspects • Context-sensitivity • Indirect (virtual) function calls and call-graph construction • Pointer arithmetic • Object-sensitivity

Andersen’s Analysis:Further Optimizations and Extensions • Fahndrich et al., Partial online cycle elimination in inclusion constraint graphs, PLDI 1998. • Rountev and Chandra, Offline variable substitution for scaling points-to analysis, 2000. • Heintze and Tardieu, Ultra-fast aliasing analysis using CLA: a million lines of C code in a second, PLDI 2001. • M. Hind, Pointer analysis: Haven’t we solved this problem yet?, PASTE 2001. • Hardekopf and Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007. • Hardekopf and Lin, Exploiting pointer and location equivalence to optimize pointer analysis, SAS 2007. • Hardekopf and Lin, Semi-sparse flow-sensitive pointer analysis, POPL 2009.

Context-Sensitivity Etc. • Liang & Harrold, Efficient computation of parameterized pointer information for interprocedural analyses. SAS 2001. • Lattner et al., Making context-sensitive points-to analysis with heap cloning practical for the real world, PLDI 2007. • Zhu & Calman, Symbolic pointer analysis revisited. PLDI 2004. • Whaley & Lam, Cloning-based context-sensitive pointer alias analysis using BDD, PLDI 2004. • Rountev et al. Points-to analysis for Java using annotated constraints. OOPSLA 2001. • Milanova et al. Parameterized object sensitivity for points-to and side-effect analyses for Java. ISSTA 2002.

Applications • Compiler optimizations • Verification & Bug Finding • use in preliminary phases • use in verification itself

Dynamic Memory Allocation:Summary Object Update 4 *y = &a 5

Abstract Transformers:Weak/Strong Update AS[stmt] : AbsDataState -> AbsDataState AS[ *x = y ] s = s[z  s(y)] if s(x) = {z} s[z1 s(z1) s(y)] if s(x) = {z1, …, zk} [z2s(z2) s(y)] (where k > 1) … [zks(zk) s(y)]

Correctness & Precision • How can we formally reason about the correctness & precision of abstract transformers? • Can we systematically derive a correct abstract transformer?

Enter: The French Recipe(Abstract Interpretation) • Concrete Domain • Concrete states: C • Semantics: For every statement st, • CS[st] : C -> C • AbstractDomain • Asemi-lattice(A, ) • TransferFunctions • For every statement st, • AS[st] : A -> A • Concrete (Collecting) Domain • Asemi-lattice(2C, ) • TransferFunctions • For every statement st, • CS*[st] : 2C -> 2C a g 2Data-State 2Var x Var’

Points-To Analysis(Abstract Interpretation) a(Y) = { (p,x) | exists s in Y. s(p) == x } MayPT(u) a Í a RS(u) IdealMayPT(u) 2Data-State 2Var x Var’ IdealMayPT (u) = a ( RS(u) )

Approximating Transformers:Correctness Criterion c is said to be correctly approximated by a iff a(c) Ía c1 correctly approximated by a1 f c2 f# correctly approximated by a2 C A

Approximating Transformers:Correctness Criterion concretization g c1 a1 f abstraction a c2 f# a2 requirement: f#(a1) ≥a (f( g(a1)) C A

Concrete Transformers • CS[stmt] : Data-State -> Data-State • CS[ x = y ] s = s[x s(y)] • CS[ x = *y ] s = s[x s(s(y))] • CS[ *x = y ] s = s[s(x) s(y)] • CS[ x = null ] s = s[x null] • CS*[stmt] : 2Data-State -> 2Data-State • CS*[st] X = { CS[st]s | s Î X }

Abstract Transformers • AS[stmt] : AbsDataState -> AbsDataState • AS[ x = y ] s = s[x s(y)] • AS[ x = null ] s = s[x {null}] • AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn) • AS[ *x = y ] s = ???

Algorithm A: TranformersWeak/Strong Update x: &y y: &x z: &a g x: {&y} y: {&x,&z} z: {&a} x: &y y: &z z: &a f f# *y = &b; *y = &b; x: &b y: &x z: &a x: {&y,&b} y: {&x,&z} z: {&a,&b} x: &y y: &z z: &b a

Algorithm A: TranformersWeak/Strong Update x: &y y: &x z: &a g x: {&y} y: {&x,&z} z: {&a} x: &y y: &z z: &a f f# *x = &b; *x = &b; x: &y y: &b z: &a x: {&y} y: {&b} z: {&a} x: &y y: &b z: &a a

Dynamic Memory Allocation:Summary Object Update 4 *y = &a 5

Pointer Analysis Lecture 2

Pointer Analysis Lecture 2

Presentation Transcript

Pointer Lesson 2 Outline

Pointer Analysis

Pointer Analysis

Pointer Analysis

Pointer Analysis – Part I

2. Pointer

Pointer Analysis – Part II

Context-Sensitive Pointer Analysis

Context-Insensitive Pointer Analysis

Lecture 5: Pointer

Pointer Analysis – A Survey

Pointer Analysis

Client-Driven Pointer Analysis

Pointer analysis

Pointer Analysis.

Pointer Analysis Survey.

Pointer Analysis – Part I

Probabilistic Pointer Analysis [PPA]

Pointer Analysis

Next Section: Pointer Analysis

Pointer Analysis Survey.

Pointer Analysis – Part I