Title Page
Pointer Analysis for Programs with Structures and Casting
Suan Hsi Yong, Susan Horwitz, Thomas Reps
University of Wisconsin-Madison

Intro: pointer analysis
Pointer Analysis
• Finds locations to which a pointer may point
• Needed for static analyses, e.g. constant propagation, slicing
• Precision of pointer analysis affects precision of subsequent analyses
• smaller points-to set ⇒ more precise
• factors: flow-sensitivity, context-sensitivity, treatment of aggregate objects...

Our Approach
• Develop a pointer-analysis framework for distinguishing fields of structures
• task is complicated by the ability to type cast in C
• examine the tradeoffs between precision and portability
• ideas apply to both flow-sensitive and flow-insensitive analysis

No-structure rules 1
statement: x = &y;
  rule: x = &y; ∈ Prog ⇒ points-to(x,y)
  effect: x → y
statement: w = x;
  rule: w = x; ∈ Prog, points-to(x,y) ⇒ points-to(w,y)
  effect: w → y
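The two rules above can be read as a small fixed-point computation. The following is a minimal flow-insensitive sketch, not the authors' implementation: the names `Stmt`, `PointsTo`, `solve` and the bound `MAX_VARS` are assumptions made for illustration, and variables are represented by small integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VARS 8   /* illustrative bound, not from the slides */

/* pts[x][y] is true when points-to(x, y) has been derived. */
typedef struct {
    bool pts[MAX_VARS][MAX_VARS];
} PointsTo;

typedef enum { ADDR_OF, COPY } Kind;   /* x = &y;  or  w = x; */
typedef struct { Kind kind; int lhs, rhs; } Stmt;

/* Apply both rules repeatedly until no new points-to fact appears. */
void solve(const Stmt *prog, int n, PointsTo *g) {
    memset(g, 0, sizeof *g);
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            const Stmt *s = &prog[i];
            assert(s->lhs >= 0 && s->lhs < MAX_VARS);
            assert(s->rhs >= 0 && s->rhs < MAX_VARS);
            if (s->kind == ADDR_OF) {
                /* x = &y  =>  points-to(x, y) */
                if (!g->pts[s->lhs][s->rhs]) {
                    g->pts[s->lhs][s->rhs] = true;
                    changed = true;
                }
            } else {
                /* w = x, points-to(x, y)  =>  points-to(w, y) */
                for (int y = 0; y < MAX_VARS; y++) {
                    if (g->pts[s->rhs][y] && !g->pts[s->lhs][y]) {
                        g->pts[s->lhs][y] = true;
                        changed = true;
                    }
                }
            }
        }
    }
}
```

Because the loop runs to a fixed point, the order of statements does not matter: listing `w = x;` before `x = &y;` still yields points-to(w,y).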

Collapse Always example
“Collapse Always” Approach
struct { int * s1; int * s2; } s;
int i, j;
int * p;
s.s1 = &i; ⇒ s = &i; ⇒ points-to(s,i)
s.s2 = &j; ⇒ s = &j; ⇒ points-to(s,j)
p = s.s1; ⇒ p = s; ⇒ points-to(p,i), points-to(p,j)

No-cast rules 1
Handling Structures
statement: x = &s.a;
  rule: x = &s.a; ∈ Prog ⇒ points-to(x,s.a)
  effect: x → s.a
statement: x = &((*p).a);
  rule: x = &((*p).a); ∈ Prog, points-to(p,s) ⇒ points-to(x,s.a)
  effect: p → s, x → s.a

No-cast rules 2
Handling Structures
statement: s = *p;
  rule: s = *p; ∈ Prog, points-to(p,b), points-to(b.x,a) ⇒ points-to(s.x,a)
  effect: p → b; b and s both have fields x1, x2, x3; whenever b.x → a, also s.x → a

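s = *p with casting
What Happens With Casting?
statement: s = *p;
  rule: s = *p; ∈ Prog, points-to(p,b), points-to(b.y,a) ⇒ points-to(s.?,a)
  effect: p → b; s has fields x1, x2, x3 but b has fields y1, y2, y3, y4; it is unclear which field of s receives the target of b.y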

s = *p using offsets
One Approach: Use Field Offsets
statement: s = *p;
  rule: s = *p; ∈ Prog, points-to(p,b), points-to(b.n,a) ⇒ points-to(s.n,a), where n is a byte offset
  effect: p → b; b has offsets 0, 4, 8, 12 and s has offsets 0, 4, 8
• But:
• Offsets are compiler-specific
• May not be available

Abstract
Spectrum, from least precise to most precise: collapse always → collapse on cast → common initial sequence → offsets (not portable)
Contributions
• Identify problems specific to structures and casting in pointer analysis
• Introduce a pointer-analysis framework that handles structures and casting with different levels of precision, efficiency, and portability
• Present experimental results showing that
  i) distinguishing fields of structures is important
  ii) there is very little penalty for portability

C-specs on structures
Layout of structs in ANSI C
1) The first field of a structure is at offset 0, i.e. the address of the first field of a structure is the same as the address of the structure
2) The fields in the common initial sequence of two structures (the leading fields with compatible types) are guaranteed to line up
struct S { int s1; char s2; float s3; int s4; };
struct T { int t1; char t2; int * t3; int t4; };
Here the common initial sequence is (s1, s2) and (t1, t2).
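Both layout properties can be checked on a given compiler with `offsetof`. The sketch below uses the two structures from the slide; the helper names `first_field_at_zero`, `common_prefix_lines_up` and `check_layout` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct S { int s1; char s2; float s3; int s4; };
struct T { int t1; char t2; int *t3; int t4; };

/* Rule 1: the first field of each structure is at offset 0. */
int first_field_at_zero(void) {
    return offsetof(struct S, s1) == 0 && offsetof(struct T, t1) == 0;
}

/* Rule 2: the common initial sequence (int, char) lines up in S and T. */
int common_prefix_lines_up(void) {
    return offsetof(struct S, s1) == offsetof(struct T, t1)
        && offsetof(struct S, s2) == offsetof(struct T, t2);
}

/* Aborts if either property fails on the current compiler. */
void check_layout(void) {
    assert(first_field_at_zero());
    assert(common_prefix_lines_up());
}
```

The offsets of the remaining fields (s3, s4 against t3, t4) depend on the sizes and alignments chosen by the implementation, so the sketch deliberately makes no claim about them.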

Problem: first field
Problems Introduced by Casting
1. “aliasing problem” with the first field(s) of structures
struct S { struct T { int * t1; } t; } s;
void * p;
Equivalent assignments:
p = &s;
p = &s.t;
p = &s.t.t1;

Problem: first field 2
Problems Introduced by Casting
1. “aliasing problem” with the first field(s) of structures
struct S { struct T { int * t1; } t; } s;
void * p;
p = &s; ⇒ points-to(p,s)? points-to(p,s.t)? points-to(p,s.t.t1)?
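The ambiguity arises because the structure, its first member, and that member's first member all start at the same address. A minimal sketch that checks this for the slide's declaration follows; the function name `first_fields_alias` is an assumption for illustration.

```c
#include <assert.h>

struct S { struct T { int *t1; } t; };

/* Returns 1 when &s, &s.t and &s.t.t1 compare equal as void pointers. */
int first_fields_alias(const struct S *s) {
    assert(s != 0);
    const void *whole = s;
    const void *inner = &s->t;
    const void *field = &s->t.t1;
    return whole == inner && inner == field;
}
```

Since all three pointers are indistinguishable once stored in a `void *`, an analysis needs a single representative for them.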

Solution: normalize
Solution
Normalize each variable to its “innermost first field”
struct S { struct T { int * t1; } t; } s;
void * p;
p = &s; ⇒ points-to(p,normalize(s)), i.e. points-to(p,s.t.t1)

Normalize: maps objects with same address
In general, normalize can be any function that maps variables to some representative object
e.g. normalize(s.a) = s (where s is the outermost object containing s.a) ⇒ “Collapse Always” approach
e.g. normalize(s.a) = ‹s, offsetof(s,a)› ⇒ “Offsets” approach

Normalize: rule change example
Old rule: x = &s.a; ∈ Prog ⇒ points-to(x,s.a)
New rule: x = &s.a; ∈ Prog ⇒ points-to(normalize(x),normalize(s.a))
points-to(x, y) now means: there exist a, b such that normalize(a) = x and normalize(b) = y, and a points to b.

Problem: (*p).a
Problems Introduced by Casting
2. If p points to a type to which it isn’t declared to point, which field is accessed in the dereference (*p).a?
struct S { int s1; char s2; float s3; } s;
struct T { int t1; char t2; int * t3; } *p;
void * q;
p = (struct T *) &s;
q = &((*p).t2);
Diagram: p → s; the fields t1, t2, t3 of struct T are laid over the fields s1, s2, s3 of s.
Rule: q = &((*p).t2); ∈ Prog, points-to(p,s) ⇒ points-to(q,s.t2)? (s has no field t2)

Solution: lookup
Solution
Introduce a function to look up the corresponding field
lookup(type, field, target) = the set of fields in target that may correspond to field in type.
Diagram: p : type*; type has a field f; target has fields f1, f2, ..., fi.

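q=&(*p).a rule with lookup
Old rule: q = &((*p).t2); ∈ Prog, points-to(p,s) ⇒ points-to(q,s.t2)
New rule: q = &((*p).t2); ∈ Prog, points-to(p,s), s.f ∈ lookup(struct T,t2,s) ⇒ points-to(q,s.f)
Diagram: p → s; fields t1, t2, t3 of struct T against fields s1, s2, s3 of s; t2 corresponds to s2, so q → s.s2.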

Problem: assigning block
Problems Introduced by Casting
3. What happens when a block of memory of one type is copied into a block of memory of a different type?
struct S { int *y1; char *y2; float *y3; } s;
struct T { int *x1; char *x2; int *x3; } t;
void * p = &t;
s = *p;
Rule: s = *p; ∈ Prog, points-to(p,t), points-to(t.x,a) ⇒ points-to(s.x,a)
Diagram: p → t; s has fields y1, y2, y3 and t has fields x1, x2, x3; t.x → a, but s has no field x.

Solution: resolve
Solution
Introduce a function resolve to match corresponding fields in two structures
resolve(obj1, obj2, type) = the set of pairs ‹obj1.f, obj2.f’› where f is a field in obj1 and f’ is the corresponding field in obj2
Diagram: obj1 has fields f1, ..., fn and obj2 has fields f1’, ..., fm’, ..., fn’; example pairs are ‹obj1.f1, obj2.f1’›, ‹obj1.fn, obj2.fm’›, ‹obj1.fn, obj2.fn’›.

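s = *p rule with resolve
Old rule: s = *p; ∈ Prog, points-to(p,t), points-to(t.x,a) ⇒ points-to(s.x,a)
New rule: s = *p; ∈ Prog, points-to(p,t), ‹s.f, t.f’› ∈ resolve(s,t,τs), points-to(t.f’,a) ⇒ points-to(s.f,a)
Here τs is the declared type of s.
Diagram: p → t; t has fields x1, x2, x3 (f1’, f2’, f3’) and s has fields y1, y2, y3 (f1, f2, f3); a target a of a field of t becomes a target of the matching field of s.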

First 3 rules with normalize, resolve, lookup
1. x = &s.a; ∈ Prog ⇒ points-to(normalize(x), normalize(s.a))
2. q = &((*p).a); ∈ Prog, points-to(normalize(p), s), s.a’ ∈ lookup(τ*p, a, s) ⇒ points-to(normalize(q), s.a’)
3. s = *p; ∈ Prog, points-to(normalize(p), t.b), ‹s.a, t.a’› ∈ resolve(normalize(s), t.b, τs), points-to(t.a’, u.c) ⇒ points-to(s.a, u.c)
Here τ*p is the declared type of *p and τs is the declared type of s.

Approaches: Collapse Always
Spectrum, from least precise to most precise: collapse always → collapse on cast → common initial sequence → offsets (not portable)
1. Collapse Always: portable, least precise
normalize(s.a) = s (where s is an “outermost object”)
lookup(τ, a, s) = { s }
resolve(s, t, τ) = { ‹s, t› }
Example: q = &((*p).a); with p → s gives q → s (the whole object).

Approaches: Collapse On Cast
Spectrum, from least precise to most precise: collapse always → collapse on cast → common initial sequence → offsets (not portable)
2. Collapse On Cast: portable
normalize(s.a) = innermost first field of s.a
lookup(τ, a, s) = if τs = τ then { normalize(s.a) } else { normalize(s.c) | c is a field of s }
Example, case τs = τ: q = &((*p).a); with p → s gives q → s.a only.

Approaches: Collapse On Cast (continued)
2. Collapse On Cast: portable
normalize(s.a) = innermost first field of s.a
lookup(τ, a, s) = if τs = τ then { normalize(s.a) } else { normalize(s.c) | c is a field of s }
Example, case τs ≠ τ: q = &((*p).a); with p → s, where s has fields c1, c2, c3, gives q → s.c1, s.c2 and s.c3.
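The two cases of the Collapse On Cast lookup can be sketched as follows. This is a simplified model and not the authors' code: `Object` and `lookup_collapse_on_cast` are hypothetical names, types are compared by an integer id, and fields are assumed to be flat (no nested structures), so normalize(s.c) is just the field index c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an object has a declared type id and n flat fields. */
typedef struct { int type_id; size_t n_fields; } Object;

/* Writes into out (capacity target->n_fields) the field indices of target
   that may correspond to field a of the type `type_id`; returns the count. */
size_t lookup_collapse_on_cast(int type_id, size_t a,
                               const Object *target, size_t *out) {
    if (target->type_id == type_id) {
        assert(a < target->n_fields);
        out[0] = a;          /* types match: exactly the named field */
        return 1;
    }
    for (size_t c = 0; c < target->n_fields; c++)
        out[c] = c;          /* type mismatch: every field of target */
    return target->n_fields;
}
```

Precision is lost only when the declared type and the target's type differ, which is what lets this approach stay portable while still distinguishing fields in cast-free code.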

Approaches: CoC resolve
2. Collapse On Cast
resolve(s, t, τ) = { ‹a, a’› | d is a field of τ, a ∈ lookup(τ, d, s), a’ ∈ lookup(τ, d, t) }
Diagram: s has fields y1, y2, y3; τ has fields d1, d2, d3; t has fields x1, x2, x3.

Approaches: Common Initial Sequence
Spectrum, from least precise to most precise: collapse always → collapse on cast → common initial sequence → offsets (not portable)
3. Common Initial Sequence: most precise portable approach
τ has fields x1:int, x2:char, x3:int, x4:int
t has fields y1:int, y2:char, y3:int*, y4:int
The common initial sequence is (x1, x2) and (y1, y2).
q = &((*p).x2); with p → t
lookup(τ, x2, t) = { y2 }

Approaches: Common Initial Sequence (continued)
3. Common Initial Sequence: most precise portable approach
τ has fields x1:int, x2:char, x3:int, x4:int
t has fields y1:int, y2:char, y3:int*, y4:int
The common initial sequence is (x1, x2) and (y1, y2).
q = &((*p).x3); with p → t
lookup(τ, x3, t) = { y3, y4 }
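The two lookup results on these slides follow one pattern: a field inside the common initial sequence maps to its counterpart, and any later field maps to all fields after the sequence. A minimal sketch under simplifying assumptions: `StructType`, `common_initial_len` and `lookup_cis` are hypothetical names, and "compatible types" is approximated by equal type-name strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a struct type is an array of field type names. */
typedef struct { const char *const *field_types; size_t n; } StructType;

/* Number of leading fields whose type names match pairwise. */
size_t common_initial_len(const StructType *a, const StructType *b) {
    size_t k = 0;
    while (k < a->n && k < b->n &&
           strcmp(a->field_types[k], b->field_types[k]) == 0)
        k++;
    return k;
}

/* Writes into out (capacity target->n) the indices of the fields of target
   that may correspond to `field` of the declared type; returns the count. */
size_t lookup_cis(const StructType *declared, size_t field,
                  const StructType *target, size_t *out) {
    assert(field < declared->n);
    size_t k = common_initial_len(declared, target);
    if (field < k) {
        out[0] = field;      /* inside the common initial sequence */
        return 1;
    }
    size_t count = 0;
    for (size_t i = k; i < target->n; i++)
        out[count++] = i;    /* every field after the sequence */
    return count;
}
```

With the slide's types, field index 1 (x2) yields index 1 (y2), and field index 2 (x3) yields indices 2 and 3 (y3, y4).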

Approaches: Offsets
Spectrum, from least precise to most precise: collapse always → collapse on cast → common initial sequence → offsets (not portable)
4. “Offsets”: non-portable, most precise
normalize(s.a) = ‹s, offsetof(s,a)› (where s is an “outermost object”)
lookup(τ, a, ‹s, 0›) = { ‹s, offsetof(τ, a)› }
Example: q = &((*p).a); with p → s, where s has fields c1 at ‹s,0›, c2 at ‹s,4›, c3 at ‹s,8› and offsetof(τ,a) = 4, gives q → ‹s,4›.

Approaches: Offsets (continued)
4. “Offsets”: non-portable, most precise
resolve(‹s, 0›, ‹t, 0›, τ) = { ‹‹s, k›, ‹t, k›› | k is an integer in [0..sizeof(τ)-1] }
Diagram labels: ‹s,0›, ‹s,4›, ‹s,8›, ‹s,10› for s; ‹t,0›, ‹t,4›, ‹t,10› for t; pointed-to objects x and y.
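Under the “Offsets” approach both functions reduce to arithmetic on byte offsets. The sketch below is an illustration only: `Loc`, `lookup_offsets` and `resolve_offsets` are hypothetical names, objects are identified by integer ids, and `struct T` is borrowed from the earlier casting example.

```c
#include <assert.h>
#include <stddef.h>

struct T { int t1; char t2; int *t3; };

/* A location <obj, off>: object id plus byte offset within the object. */
typedef struct { int obj; size_t off; } Loc;

/* lookup(tau, a, <s,0>) = { <s, offsetof(tau, a)> } */
Loc lookup_offsets(Loc base, size_t field_offset) {
    assert(base.off == 0);
    Loc r = { base.obj, field_offset };
    return r;
}

/* resolve(<s,0>, <t,0>, tau): pairs <s,k>, <t,k> for k in [0..size-1].
   dst and src must each have room for `size` entries. */
size_t resolve_offsets(int s, int t, size_t size, Loc *dst, Loc *src) {
    for (size_t k = 0; k < size; k++) {
        dst[k].obj = s; dst[k].off = k;
        src[k].obj = t; src[k].off = k;
    }
    return size;
}
```

A call such as `lookup_offsets(base, offsetof(struct T, t2))` bakes in the layout chosen by one compiler, which is the portability cost noted on the slide.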

What have we done?
Experiments
1. Implemented the pointer-analysis framework in C++ using SUIF
2. Implemented the four algorithms on top of this framework
3. Ran the four algorithms on 20 C programs (600 to 30,000 lines), and measured
• size of points-to sets (precision)
• time (efficiency)

Results: Points-to set sizes per deref
[Bar chart: average size of points-to set per dereference. Series: collapse always, collapse on cast, common initial sequence, offsets. Benchmarks: bc, twig, 130.li, agrep, football, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, ispell-4.0.ispell.]

Results: Analysis time
[Bar chart: analysis times, normalized to “offsets” times; vertical axis 0 to 5, with three off-scale bars labeled 34.9, 9.6 and 114.3. Series: collapse always, collapse on cast, common initial sequence, offsets. Benchmarks: ft, ks, bc, twig, 130.li, agrep, yacr2, 099.go, triangle, football, anagram, ansitape, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, 129.compress, ispell-4.0.ispell.]

Results: Number of points-to edges
[Bar chart: number of points-to edges, normalized to “offsets”; vertical axis 0 to 5, with three off-scale bars labeled 22.9, 6.8 and 11.0. Series: collapse always, collapse on cast, common initial sequence, offsets. Benchmarks: bc, twig, 130.li, agrep, football, less-177, flex-2.4.7, simulator, gzip-1.2.4, bison-1.2.2, 124.m88ksim, ispell-4.0.ispell.]

Conclusions
• Precise points-to information requires distinguishing fields of structures
• Portability does not cost much in terms of time or precision

The End The End
