420 likes | 558 Views
Pointer Analysis. G. Ramalingam Microsoft Research, India. A Constant Propagation Example. x = 3; y = 4; z = x + 5;. x is always 3 here can replace x by 3 and replace x+5 by 8 and so on. A Constant Propagation Example With Pointers. x = 3; *p = 4; z = x + 5;.
E N D
Pointer Analysis G. Ramalingam Microsoft Research, India
A Constant Propagation Example x = 3; y = 4; z = x + 5; • x is always 3 here • can replace x by 3 • and replace x+5 by 8 • and so on
A Constant Propagation ExampleWith Pointers x = 3; *p = 4; z = x + 5; • Is x always 3 here?
A Constant Propagation ExampleWith Pointers p = &y; x = 3; *p = 4; z = x + 5; if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; p = &x; x = 3; *p = 4; z = x + 5; pointers affect most program analyses x is always 3 x is always 4 x may be 3 or 4 (i.e., x is unknown in our lattice)
A Constant Propagation ExampleWith Pointers p = &y; x = 3; *p = 4; z = x + 5; if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; p = &x; x = 3; *p = 4; z = x + 5; p always points-to y p always points-to x p may point-to x or y
Points-to Analysis • Determine the set of targets a pointer variable could point-to (at different points in the program) • “p points-to x” == “*p has l-value x” • targets could be variables or locations in the heap (dynamic memory allocation) • p = &x; • p = new Foo(); or p = malloc (…); • must-point-to vs. may-point-to
A Constant Propagation ExampleWith Pointers Can *p denote the same location as *q? *q = 3; *p = 4; z = *q + 5; what values can this take?
More Terminology • *p and *q are said to be aliases (in a given concrete state) if they represent the same location • Alias analysis • determine if a given pair of references could be aliases at a given program point • *pmay-alias *q • *pmust-alias*q
Pointer Analysis • Points-To Analysis • may-point-to • must-point-to • Alias Analysis • may-alias • must-alias
Points-To Analysis:A Simple Example p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Points-To Analysis:A Simple Example p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
Points-To Analysis x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c; x: a x: {a,c} x: a y: {b,c} y: b y: b p: {x,y} p: {x,y} p: {x,y} a: c a: c a: null How should we handle this statement? Weak update Weak update Strong update
Questions • When is it correct to use a strong update? A weak update? • Is this points-to analysis precise? • What does it mean to say • p must-point-to x at pgm point u • p may-point-to x at pgm point u • p must-not-point-to x at u • p may-not-point-to x at u
Points-To Analysis, Formally • We must formally define what we want to compute before we can answer many such questions
Static Program Analysis • A static program analysis computes approximateinformation about the runtime behavior of a given program • The set of valid programs is defined by the programming language syntax • The runtime behavior of a given program is defined by the programming language semantics • The analysis problem defines whatinformation is desired • The analysis algorithm determines what approximation to make
Programming Language: Syntax • A program consists of • a set of variables Var • a directed graph (V,E,entry) with a distinguished entry vertex, with every edge labelled by a primitive statement • A primitive statement is of the form • x = null • x = y • x = *y • x = &y; • *x = y • skip (where x and y are variables in Var) • Omitted (for now) • Dynamic memory allocation • Pointer arithmetic • Structures and fields • Procedures
Example Program 1 Vars = {x,y,p,a,b,c} x = &a; y = &b; if (?) { p = &x; } else { p = &y; } *x = &c; *p = &c; x = &a 2 y = &b 3 p = &x p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8
Programming Language:Operational Semantics • Operational semantics == an interpreter (defined mathematically) • State • Data-State ::= Var -> (Var U {null}) • PC ::= V (the vertex set of the CFG) • Program-State ::= PC x Data-State • Initial-state: • (entry, \x. null)
Example States Vars = {x,y,p,a,b,c} Initial data-state 1 x: N, y:N, p:N, a:N, b:N, c:N x = &a 2 Initial program-state y = &b x: N, y:N, p:N, a:N, b:N, c:N <1, > 3 p = &x p = &y 4 5 skip skip 6 *x = &c 7 *p = &c 8
Programming Language:Operational Semantics • Meaning of primitive statements • CS[stmt] : Data-State -> Data-State • CS[ x = y ] s = s[x s(y)] • CS[ x = *y ] s = s[x s(s(y))] • CS[ *x = y ] s = s[s(x) s(y)] • CS[ x = null ] s = s[x null] • CS[ x = &y ] s = s[x y] must say what happens if null is dereferenced
Programming Language:Operational Semantics • Meaning of program • a transition relation on program-states • Program-State X Program-State • state1state2means that the execution of some edge in the program can transform state1 into state2 • Defining • (u,s) (v,s’) iff the program contains a control-flow edge u->vlabelled with a statement stmt such that M[stmt]s = s’
Programming Language:Operational Semantics • A sequence of states s1s2 … snis said to be an execution (of the program) iff • s1is the Initial-State • sisi+1for 1 <= I < n • A state s is said to be a reachable stateiff there exists some execution s1s2 … snis such that sn= s. • Define RS(u) = { s | (u,s) is reachable }
Programming Language:Operational Semantics • A sequence of states s1s2 … snis said to be an execution (of the program) iff • s1 is the Initial-State • si| si+1 for 1 <= I < n • A state s is said to be a reachable state iff there exists some execution s1s2 … snis such that sn= s. • Define RS(u) = { s | (u,s) is reachable } All of this formalism for this one definition
Ideal Points-To Analysis:Formal Definition • Let u denote a vertex in the CFG • Define IdealMustPT (u) to be { (p,x) | forall s in RS(u). s(p) == x } • Define IdealMayPT (u) to be { (p,x) | exists s in RS(u). s(p) == x }
May-Point-To Analysis:Formal Requirement Specification May Point-To Analysis Compute R: V -> 2Vars’ such that R(u) IdealMayPT(u) (where Var’ = Var U {null}) For every vertex u in the CFG, compute a set R(u) such that R(u) { (p,x) | $sÎRS(u). s(p) == x }
May-Point-To Analysis:Formal Requirement Specification • An algorithm is said to be correct if the solution R it computes satisfies "uÎV. R(u) IdealMayPT(u) • An algorithm is said to be precise if the solution R it computes satisfies "uÎV. R(u)=IdealMayPT(u) • An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if "uÎV. R1(u) ÍR2(u) Compute R: V -> 2Vars’ such that R(u) IdealMayPT(u)
Back To OurMay-Point-To Algorithm p = &x; q = &y; if (?) { q = p; } x = &a; y = &b; z = *q;
(May-Point-To Analysis)Algorithm A • Is this algorithm correct? • Is this algorithm precise? • Let’s first completely and formally define the algorithm.
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe • Define semi-lattice of abstract-values • AbsDataState ::= Var -> 2Var’ • f1 f2 = \x. (f1 (x) È f2 (x)) • bottom = \x.{} • Define initial abstract-value • InitialAbsState = \x. {null} • Define transformers for primitive statements • AS[stmt] : AbsDataState -> AbsDataState
Algorithm A: A Formal DefinitionThe “Data Flow Analysis” Recipe • Let st(v,u) denote stmt on edge v->u • Compute the least-fixed-point of the following “dataflow equations” • x(entry) = InitialAbsState • x(u) = v->u AS(st(v,u)) x(v) x(v) x(w) v w st(v,u) st(w,u) u x(u)
Algorithm A:The Transformers • Abstract transformers for primitive statements • AS[stmt] : AbsDataState -> AbsDataState • AS[ x = y ] s = s[x s(y)] • AS[ x = null ] s = s[x {null}] • AS[ x = &y ] s = s[x {y}] • AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn) • AS[ *x = y ] s = ???
Correctness & Precision • We have a complete & formal definition of the problem. • We have a complete & formal definition of a proposed solution. • How do we reason about the correctness & precision of the proposed solution?
Enter: The French Recipe(Abstract Interpretation) • Concrete Domain • Concrete states: C • Semantics: For every statement st, • CS[st] : C -> C • AbstractDomain • Asemi-lattice(A, b) • TransferFunctions • For every statement st, • AS[st] : A -> A • Concrete (Collecting) Domain • Asemi-lattice(2C, b) • TransferFunctions • For every statement st, • CS*[st] : 2C -> 2C a g 2Data-State 2Var x Var’
Points-To Analysis(Abstract Interpretation) a(Y) = { (p,x) | exists s in Y. s(p) == x } MayPT(u) a Í a RS(u) IdealMayPT(u) 2Data-State 2Var x Var’ IdealMayPT (u) = a ( RS(u) )
Approximating Transformers:Correctness Criterion c is said to be correctly approximated by a iff a(c) Ía c1 correctly approximated by a1 f c2 f# correctly approximated by a2 C A
Approximating Transformers:Correctness Criterion concretization g c1 a1 f abstraction a c2 f# a2 requirement: f#(a1) ≥a (f( g(a1)) C A
Concrete Transformers • CS[stmt] : Data-State -> Data-State • CS[ x = y ] s = s[x s(y)] • CS[ x = *y ] s = s[x s(s(y))] • CS[ *x = y ] s = s[s(x) s(y)] • CS[ x = null ] s = s[x null] • CS*[stmt] : 2Data-State -> 2Data-State • CS*[st] X = { CS[st]s | s Î X }
Abstract Transformers • AS[stmt] : AbsDataState -> AbsDataState • AS[ x = y ] s = s[x s(y)] • AS[ x = null ] s = s[x {null}] • AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn) • AS[ *x = y ] s = ???
Algorithm A: TranformersWeak/Strong Update x: &y y: &x z: &a g x: {&y} y: {&x,&z} z: {&a} x: &y y: &z z: &a f f# *y = &b; *y = &b; x: &b y: &x z: &a x: {&y,&b} y: {&x,&z} z: {&a,&b} x: &y y: &z z: &b a
Algorithm A: TranformersWeak/Strong Update x: &y y: &x z: &a g x: {&y} y: {&x,&z} z: {&a} x: &y y: &z z: &a f f# *x = &b; *x = &b; x: &y y: &b z: &a x: {&y} y: {&b} z: {&a} x: &y y: &b z: &a a
Algorithm A: TransformersWeak/Strong Update • Transformer for “*p = q”