The PPA Algorithm Jeff Da Silva September 10th, 2004
The Pointer Alias Analysis Problem • Statically decide, for any pair of pointers at any point in the program, whether the two pointers point to the same memory location. • Example: *A = ~; ~ = *B;
Pointer Analysis Issues • Scalability vs. Accuracy • Generally, a ‘difficult’ tradeoff exists between: • the amount of computation and memory required vs. • the accuracy of the analysis. • Precision/efficiency tradeoff: where is the sweet spot? • Which metric should be used? • Direct metric • Report performance applied to an optimization • Dynamically measure false positives • Which benchmark suite? • Are the results reproducible?
Pointer Analysis Issues • Complications associated with pointer arithmetic, casting, function pointers, long jumps, and multithreaded applications. • Can these be ignored? • Different pointer analysis uses have different needs. • A universal pointer analysis probably doesn’t exist.
Pointer Analysis Design Choices • Flow sensitivity • Context sensitivity • Heap modeling • Aggregate modeling • Alias representation • Whole program • Incremental compilation
Address-taken, Steensgaard • Linear Time Complexity • Inaccurate – many false ‘maybe’ outputs • Memory Required: Negligible
Probabilistic Pointer Analysis (PPA) • Polynomial Time Complexity (guessing) • Inaccurate – many false ‘maybe’ outputs, but provides approximate probability metric • Does not require entire program • Memory Required: yet to be determined • Scalable Accuracy/Efficiency Tradeoff • Chen, et al: Only Other PPA
SPAN, BDD based • Doubly Exponential • Accurate – very few ‘maybe’ outputs (control deps/runtime) • Requires Entire Program Info • Memory Required: Oodles • Does not scale well
PPA Algorithm Objectives • An Interprocedural, Flow Sensitive, Context Sensitive/Merged approach that uses Transfer Functions. • Must be scalable and should require less space and time than any traditional analysis. • Provide an approximate probability for the ‘Maybe’ output.
Design Choices (tentative) • Flow sensitivity: flow sensitive • Context sensitivity: context merged • Heap modeling: allocation site • Aggregate modeling: arrays aggregated, structs separated • Alias representation: points-to • Whole program: not required • Incremental compilation: limited support
How is Probabilistic Pointer Analysis used? [Figure: block diagram with the labels Source Code; Speculative Parallelizing (TLS) Compiler; Probabilistic Pointer Analysis; Probabilistic Dependence Analysis; Dynamic Profiling; Speculative Parallelized Executable]
The Probabilistic Pointer Analysis (PPA) Problem • For any pair of pointers, at any point in the program, statically estimate the probability that the two pointers point to the same memory location. • Example: *A = ~; ~ = *B;
The Traditional Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c;
The Traditional Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) x=&s; s=&c;
The Traditional Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) x=&s; s=&c; r=&b; z=&r;
The Traditional Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) x=&s; s=&c; r=&b; z=&r; if(…) y = x;
The Traditional Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) x=&s; s=&c; r=&b; z=&r; if(…) y = x; *x = &a;
The Probabilistic Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; Each of the six edges has probability 1.0.
The Probabilistic Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) /*60% taken*/ x=&s; s=&c; Edge probabilities shown: 0.4, 0.6, 0.6, 0.4
The Probabilistic Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) /*60% taken*/ x=&s; s=&c; r=&b; z=&r; Edge probabilities shown: 0.6, 0.4, 0.6, 0.4
The Probabilistic Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) /*60% taken*/ x=&s; s=&c; r=&b; z=&r; if(…) /*10% taken*/ y = x; Edge probabilities shown: 0.04, 0.96, 0.6, 0.4, 0.6, 0.4
The Probabilistic Points-To Graph (nodes: x, y, z; r, s, t; a, b, c) int a, b, c; int *r, *s, *t; int **x, **y, **z; x=&r; y=&s; z=&t; r=&a; s=&b; t=&c; if(…) /*60% taken*/ x=&s; s=&c; r=&b; z=&r; if(…) /*10% taken*/ y = x; *x = &a; Edge probabilities shown leaving x, y, z: 0.04, 0.6, 0.96, 0.4; leaving r, s, t: 0.4, 0.16, 0.4, 0.24, 0.6, 0.6, 0.6 • What is the probability that **y points to a?
A Probabilistic Points-To Matrix [Figure: the final graph (nodes x, y, z; r, s, t; a, b, c; edge probabilities 0.04, 0.6, 0.96, 0.4 and 0.4, 0.16, 0.24, 0.6, 0.6) written as a matrix]
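The running example can be replayed with a small Python sketch. It is only an illustration under stated assumptions, not the author's implementation: the 60% branch is assumed to guard both `x=&s` and `s=&c`, the third initial assignment is taken to be `t=&c`, branch outcomes are treated as independent, and `*x = &a` is modeled as a weak update weighted by the probability of each target of `x`. The helper `mix` and the dictionary layout are hypothetical names.

```python
def mix(p, taken, not_taken):
    """Merge two points-to distributions: p * taken + (1 - p) * not_taken."""
    keys = set(taken) | set(not_taken)
    return {k: p * taken.get(k, 0.0) + (1 - p) * not_taken.get(k, 0.0)
            for k in keys}


# pointer name -> {target name: probability}
g = {}
for ptr, tgt in [("x", "r"), ("y", "s"), ("z", "t"),
                 ("r", "a"), ("s", "b"), ("t", "c")]:
    g[ptr] = {tgt: 1.0}

# if (...) /* 60% taken */ { x = &s; s = &c; }   (assumed scope of the branch)
g["x"] = mix(0.6, {"s": 1.0}, g["x"])
g["s"] = mix(0.6, {"c": 1.0}, g["s"])

# r = &b; z = &r;   (unconditional, so strong updates)
g["r"] = {"b": 1.0}
g["z"] = {"r": 1.0}

# if (...) /* 10% taken */ y = x;
g["y"] = mix(0.1, g["x"], g["y"])

# *x = &a;   each target of x is overwritten with the probability x points to it
for tgt, q in list(g["x"].items()):
    g[tgt] = mix(q, {"a": 1.0}, g[tgt])

# Probability that *y points to a, assuming the two levels are independent
p_y_a = sum(q * g[t].get("a", 0.0) for t, q in g["y"].items())

for ptr in ("x", "y", "r", "s"):
    print(ptr, {k: round(v, 2) for k, v in sorted(g[ptr].items()) if v})
print("P(*y -> a) ~", round(p_y_a, 3))
```

Under these assumptions the sketch yields the edge weights 0.04/0.96 for y, 0.4/0.6 for r, and 0.6/0.16/0.24 for s, and an estimate of about 0.592 for the closing question.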
My PPA Algorithm PPA Algorithm Goal: • For each program point generate a probabilistic points-to graph that specifies, for each pointer, the set of probabilities that it points to each location.
Definition: Probability Analysis • Let <p,v> denote a points-to relationship from a pointer p to a location v. • At every static program point s there exists a probability function P(s, <p,v>) that denotes the probability that p points to v during dynamic program execution. P(s, <p,v>) = D(s, <p,v>) / D(s) • Where D(s) is the number of times s is (expected to be) dynamically visited and D(s, <p,v>) is the number of times that the points-to relation <p,v> dynamically holds.
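To make the definition concrete, here is a minimal Python sketch that evaluates P(s, <p,v>) = D(s, <p,v>) / D(s) over a recorded execution. The trace format (a list of program point and pointer-state pairs) and the function name are assumptions made for illustration only.

```python
def points_to_probability(trace, s, p, v):
    """Return D(s, <p,v>) / D(s) for a recorded execution trace.

    trace: list of (program_point, {pointer: target}) observations
           (a hypothetical format, not taken from the slides).
    """
    visits = [state for point, state in trace if point == s]
    if not visits:
        raise ValueError("program point %r was never visited" % (s,))
    holds = sum(1 for state in visits if state.get(p) == v)
    return holds / len(visits)


# Four dynamic visits to s1; p points to v in three of them.
trace = [
    ("s1", {"p": "v"}),
    ("s1", {"p": "v"}),
    ("s1", {"p": "w"}),
    ("s2", {"p": "v"}),
    ("s1", {"p": "v"}),
]
print(points_to_probability(trace, "s1", "p", "v"))
```

A static analysis estimates this ratio without running the program; the sketch only shows what quantity is being approximated.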
Conservative Probability • My algorithm is a conservative may-alias analysis. • A probability of 0.0 [P(s,<p,v>) = 0.0] indicates that a points-to relation <p,v> will never hold. • The converse is not true. • A probability of 1.0 [P(s,<p,v>) = 1.0] indicates that a points-to relation <p,v> will always hold. • The converse is not necessarily true: a dynamic points-to relationship <p,v> that always exists may not be reported with a probability of 1.0.
Location Sets • Each node in the graph is implemented with a location set, which is a triple of the form <name, offset, stride> consisting of: a variable name that describes the memory block, an offset within that block, and a stride that characterizes the recurring structure of data vectors (in bytes). struct ds { int e,f,g; }; … int a; struct ds b; int c[100]; struct ds d[100]; Aggregate modeling: arrays aggregated, structs separated
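As an illustration of the triple, the following Python sketch builds location sets for the declarations on this slide. It assumes a 4-byte `int` with no padding (so `struct ds` occupies 12 bytes) and the common convention that a scalar has stride 0 while every element of an array shares one location set whose stride is the element size. The helper names are hypothetical.

```python
from collections import namedtuple

LocationSet = namedtuple("LocationSet", ["name", "offset", "stride"])

INT_SIZE = 4                      # assumption: 4-byte int, no padding
DS_FIELDS = {"e": 0, "f": 4, "g": 8}   # byte offsets inside struct ds
DS_SIZE = 3 * INT_SIZE


def scalar(name):
    """A plain variable: offset 0, no recurring structure."""
    return LocationSet(name, 0, 0)


def struct_field(name, field):
    """One field of a struct variable: structs are kept separated."""
    return LocationSet(name, DS_FIELDS[field], 0)


def array_element(name, elem_size):
    """Any element of an array: all elements share one location set."""
    return LocationSet(name, 0, elem_size)


def array_struct_field(name, field, elem_size):
    """Field 'field' of any element of an array of structs."""
    return LocationSet(name, DS_FIELDS[field], elem_size)


print(scalar("a"))                              # int a
print(struct_field("b", "f"))                   # b.f
print(array_element("c", INT_SIZE))             # c[i] for every i
print(array_struct_field("d", "g", DS_SIZE))    # d[i].g for every i
```

This matches the stated design choice: arrays are aggregated (one set per array, distinguished only by offset and stride) while struct fields get separate sets.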
Special Location Sets • Each dynamic memory allocation site has its own name. E.g., the location set that represents a field f in a structure dynamically allocated at site s is <s, f, 0>. • Additional Location Sets • UND: undefined • UNK: unknown • NULL: C null
Basic Pointer Assignment Transformations Ignoring pointer arithmetic and casting for now.
PPA • Let X_s represent the probabilistic points-to graph/matrix at a specific program point s. • X_IN → basic pointer assignment instruction → X_OUT • Claim: There exists a transformation function T_i(X) for every instruction i, such that X_OUT = T_i(X_IN).
Linear Transformations • A transformation T(X) is linear iff the following relationships hold for all points-to matrices U and V and every scalar c: • T(U+V) = T(U) + T(V) • T(cU) = cT(U) • If T_B and T_A are linear transformations represented by the matrices B and A respectively, then: • T_B(T_A(X)) = B A X
Linear Points-To Representation • A points-to matrix is used to represent the points-to graph. • Matrix row/column labeling: • Location sets are denoted with L<id> • Pointers are denoted with P<id> • Rules for linearity: • Pointers can only point to location sets • Location sets always point to themselves with probability 1.0 • All rows sum to 1.0
Linear Points-To Representation int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ … ~ = (int*)calloc(N, sizeof(int)) /*L6*/; [Figure: points-to graph with nodes a, b, tmp, x, y, allocL6, UND, NULL, UNK]
Linear Points-To Representation int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ … ~ = (int*)calloc(N, sizeof(int)) /*L6*/; [Figure: the same graph with nodes relabeled L1, L2, L5, P1, P2, P5, L3, L4, L6, UND, NULL, UNK]
Points-To Matrix int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ … ~ = (int*)calloc(N, sizeof(int)) /*L6*/;
The Transformation Matrix • X_IN → basic pointer assignment instruction → X_OUT • For every basic pointer assignment there exists a linear transformation matrix T such that: X_OUT = T X_IN
The Pointer Assignment Operation
MATLAB code:
% PPA_ptra: Probabilistic Pointer Analysis pointer assignment function
% Returns the PPA ptr assignment transformation matrix
function T = PPA_ptra(ptr, loc, N)
T = eye(N);
T(ptr,ptr) = 0.0;
T(ptr,loc) = 1.0;
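For readers without MATLAB, the same construction can be sketched in plain Python. This is a direct port of the three MATLAB statements with 0-based indices; the ordering of the 12 row/column labels below is an assumption, since the slides do not fix one.

```python
# Hypothetical label order for the 12 rows/columns of the matrix.
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "P1", "P2", "P5", "UND", "NULL", "UNK"]
IDX = {name: i for i, name in enumerate(LABELS)}


def eye(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def ppa_ptra(ptr, loc, n):
    """Transformation matrix for one basic pointer assignment.

    Mirrors the MATLAB PPA_ptra: start from the identity, clear the
    pointer's own diagonal entry, then route its row to 'loc'.
    Indices are 0-based here, unlike MATLAB.
    """
    t = eye(n)
    t[ptr][ptr] = 0.0
    t[ptr][loc] = 1.0
    return t


# a = x;  i.e. P1 -> L3
t_s2 = ppa_ptra(IDX["P1"], IDX["L3"], len(LABELS))
print([LABELS[j] for j, v in enumerate(t_s2[IDX["P1"]]) if v])
```

Every row of the result still sums to 1.0, so multiplying a valid points-to matrix by it keeps the row-sum rule intact.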
Pointer Assignment Example int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ tmp = a; S1: P5 -> P1; T(P5->P1) = T_S1 = eye(12); T_S1(P5,P5) = 0.0; T_S1(P5,P1) = 1.0;
Pointer Assignment Example int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ a = x; S2: P1 -> L3; T(P1->L3) = T_S2 = eye(12); T_S2(P1,P1) = 0.0; T_S2(P1,L3) = 1.0;
Combining Transformation Matrices • X_IN → T_1: basic pointer assignment instruction → T_2: basic pointer assignment instruction → X_OUT • X_OUT = T_2 T_1 X_IN
Combining Pointer Assignment Example int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ void swap() { tmp = a; a = b; b = tmp; } S1: P5 -> P1 S2: P1 -> P2 S3: P2 -> P5 T_swap = T_S3 T_S2 T_S1
Combining Pointer Assignment Example [Figure: T_swap drawn as a mapping between two copies of the labels L1, L2, L5, P1, P2, P5, L3, L4, L6, UND, NULL, UNK]
Combining Pointer Assignment Example [Figure: T_swap applied to an example points-to graph over the same labels; edge probabilities shown: 0.3, 0.9, 0.9, 0.1, 0.7, 0.7, 0.1, 0.3, 0.9, 0.1]
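The composition T_swap = T_S3 T_S2 T_S1 can be checked with a short Python sketch. The label order, the helper names, and the input probabilities are all made up for illustration; they are not the values from the figure.

```python
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "P1", "P2", "P5", "UND", "NULL", "UNK"]   # assumed order
IDX = {name: i for i, name in enumerate(LABELS)}
N = len(LABELS)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def assign(dst, src):
    """Matrix for 'dst = src', following PPA_ptra."""
    t = eye(N)
    t[IDX[dst]][IDX[dst]] = 0.0
    t[IDX[dst]][IDX[src]] = 1.0
    return t


def row(x, ptr):
    """Non-zero entries of one pointer's row, keyed by label."""
    return {LABELS[j]: v for j, v in enumerate(x[IDX[ptr]]) if v}


t_s1 = assign("P5", "P1")   # tmp = a
t_s2 = assign("P1", "P2")   # a = b
t_s3 = assign("P2", "P5")   # b = tmp
t_swap = matmul(t_s3, matmul(t_s2, t_s1))

# Illustrative input: a -> x (0.7) or y (0.3), b -> allocation site L6.
x_in = eye(N)
for ptr, dist in [("P1", {"L3": 0.7, "L4": 0.3}),
                  ("P2", {"L6": 1.0}),
                  ("P5", {"UND": 1.0})]:
    x_in[IDX[ptr]] = [dist.get(name, 0.0) for name in LABELS]

x_out = matmul(t_swap, x_in)
for ptr in ("P1", "P2", "P5"):
    print(ptr, row(x_out, ptr))
```

After the swap, a carries b's old distribution, and both b and tmp carry a's old distribution, which is what the three assignments do in sequence.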
Control flow and loops • Loops are found and back edges are labeled with their back edge count. [assume all loops have constant trip count for now] • Denoted with a capital letter • All other edges are labeled with their basic block fan-in probabilities, which sum to 1. • Denoted with a lowercase letter [Figure labels: p, q, r, A, N]
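The Effect of Control Flow • Single block A: T = T_A • A, then either B (probability p) or directly on (probability q), then C: T = T_C [p T_B T_A + q T_A]
The Effect of Control Flow • A, then either B (probability p) or C (probability q), then D: T = T_D [p T_B + q T_C] T_A
The Effect of Control Flow • A, then one of B_1, B_2, …, B_n (probabilities p_1, p_2, …, p_n), then C: T = T_C [p_1 T_{B_1} + p_2 T_{B_2} + … + p_n T_{B_n}] T_A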
Example int *a; /*L1, P1*/ int *b; /*L2, P2*/ int x[N]; /*L3*/ int y[N]; /*L4*/ int *tmp; /*L5, P5*/ void might_alias() { if(!RANDOM(10)) a = b; } BB1: if() /*0.1*/ BB2: S1: P1 -> P2 fi BB3: T_might_alias = T_BB3 [0.1 T_BB2 T_BB1 + 0.9 T_BB1]
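The might_alias formula can be evaluated numerically with the Python sketch below. It assumes BB1 and BB3 contain no pointer assignments (so their matrices are the identity), an arbitrary label order, and an input state in which a points to x and b points to y; none of these details come from the slides.

```python
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "P1", "P2", "P5", "UND", "NULL", "UNK"]   # assumed order
IDX = {name: i for i, name in enumerate(LABELS)}
N = len(LABELS)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * v for v in r] for r in a]


def assign(dst, src):
    """Matrix for 'dst = src', following PPA_ptra."""
    t = eye(N)
    t[IDX[dst]][IDX[dst]] = 0.0
    t[IDX[dst]][IDX[src]] = 1.0
    return t


t_bb1 = eye(N)               # BB1: the branch test only (assumed no pointer effect)
t_bb2 = assign("P1", "P2")   # BB2: a = b
t_bb3 = eye(N)               # BB3: join block (assumed no pointer effect)

# T_might_alias = T_BB3 [0.1 T_BB2 T_BB1 + 0.9 T_BB1]
t_might_alias = matmul(
    t_bb3,
    add(scale(0.1, matmul(t_bb2, t_bb1)), scale(0.9, t_bb1)),
)

# Illustrative input: a -> x (L3), b -> y (L4), tmp undefined.
x_in = eye(N)
for ptr, tgt in [("P1", "L3"), ("P2", "L4"), ("P5", "UND")]:
    x_in[IDX[ptr]] = [1.0 if name == tgt else 0.0 for name in LABELS]

x_out = matmul(t_might_alias, x_in)
a_row = {LABELS[j]: round(v, 3) for j, v in enumerate(x_out[IDX["P1"]]) if v}
print(a_row)
```

With this input, a ends up pointing to x with probability 0.9 and to y with probability 0.1, which is the weighted merge of the two paths.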
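Loops – Constant Trip Count • Loop over A with back edge count N: T = T_A [T_A]^N = [T_A]^{N+1} • Inner loop over A (back edge count N) followed by B, inside an outer loop with back edge count M: T = [ T_B [T_A]^{N+1} ]^{M+1}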
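A constant trip count turns the loop into a matrix power. The Python sketch below raises a loop body's matrix to the power N+1, using the swap body from the earlier example; the label order and helper names are assumptions.

```python
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "P1", "P2", "P5", "UND", "NULL", "UNK"]   # assumed order
IDX = {name: i for i, name in enumerate(LABELS)}
N_LABELS = len(LABELS)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def assign(dst, src):
    """Matrix for 'dst = src', following PPA_ptra."""
    t = eye(N_LABELS)
    t[IDX[dst]][IDX[dst]] = 0.0
    t[IDX[dst]][IDX[src]] = 1.0
    return t


def matpow(t, k):
    """t multiplied by itself k times (k = 0 gives the identity)."""
    result = eye(len(t))
    for _ in range(k):
        result = matmul(t, result)
    return result


def loop_transform(t_body, back_edge_count):
    """[T_A]^(N+1): the body runs once plus once per back edge taken."""
    return matpow(t_body, back_edge_count + 1)


# Loop body: swap()  ->  T_swap = T_S3 T_S2 T_S1
t_swap = matmul(assign("P2", "P5"),
                matmul(assign("P1", "P2"), assign("P5", "P1")))

odd = loop_transform(t_swap, 2)    # body executes 3 times
even = loop_transform(t_swap, 3)   # body executes 4 times
print(odd == t_swap, even == matpow(t_swap, 2))
```

For this body the result depends only on whether the number of executions is odd or even, so the loop's transformation is periodic rather than convergent.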
Loop Transformation types • Identity • Converges • Periodic • Converges and Periodic • Example loops: for(i=0;i<N;i++) { swap(); } for(i=0;i<N;i++) { if(RANDOM(10)) { a = b; swap(); } } for(i=0;i<N;i++) { if(!RANDOM(10)) { a = b; } } [Figure labels: If Odd, If Even]
Loops – Non-Constant Trip Count • Loop over A with back edge label N: T = 1/(N+1) [ [T_A]^0 + [T_A]^1 + … + [T_A]^N ] • Geometric Series Transform [gstr] operation: T = gstr(T_A, 0, N)
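The averaged sum can be sketched in Python as follows. The function name `gstr` and its three arguments follow the slide, but the generalization to an arbitrary lower bound `lo` (averaging the powers `lo` through `hi`) is an assumption, as are the label order and the loop body chosen for the demonstration.

```python
LABELS = ["L1", "L2", "L3", "L4", "L5", "L6",
          "P1", "P2", "P5", "UND", "NULL", "UNK"]   # assumed order
IDX = {name: i for i, name in enumerate(LABELS)}
N_LABELS = len(LABELS)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def add(a, b):
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * v for v in r] for r in a]


def gstr(t, lo, hi):
    """Average of t^lo, t^(lo+1), ..., t^hi.

    With lo = 0 and hi = N this is 1/(N+1) [t^0 + t^1 + ... + t^N].
    """
    n = len(t)
    power = eye(n)
    for _ in range(lo):
        power = matmul(t, power)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(lo, hi + 1):
        total = add(total, power)
        power = matmul(t, power)
    return scale(1.0 / (hi - lo + 1), total)


# Loop body: if (!RANDOM(10)) a = b;   ->   0.1 * T(a = b) + 0.9 * I
t_assign = eye(N_LABELS)
t_assign[IDX["P1"]][IDX["P1"]] = 0.0
t_assign[IDX["P1"]][IDX["P2"]] = 1.0
t_body = add(scale(0.1, t_assign), scale(0.9, eye(N_LABELS)))

t_loop = gstr(t_body, 0, 1)
print(round(t_loop[IDX["P1"]][IDX["P1"]], 3),
      round(t_loop[IDX["P1"]][IDX["P2"]], 3))
```

Because each power has rows summing to 1.0, their average does too, so the result is still a valid transformation under the row-sum rule.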