Speeding Up Dataflow Analysis Using Flow-Insensitive Pointer Analysis
Stephen Adams, Tom Ball, Manuvir Das (Microsoft Research)
Sorin Lerner, Mark Seigle (University of Washington)
Westley Weimer (UC Berkeley)

Motivation
• Static analysis for program verification
• Complex dataflow analyses are popular
  • SLAM, ESP, BLAST, CQual, …
  • Flow-sensitive
  • Interprocedural
  • Expensive!
• Cut down on “data flow facts”
  • Without losing anything important

General Idea
• If the complex analysis is worse than O(N)
• And you have a cheap analysis that
  • Is O(N)
  • Reduces N
• Then composing them saves time

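One way to make this claim concrete is a small cost comparison. The cost models here are assumptions for illustration only, not figures from the talk: the complex analysis is taken to cost N^k for some k > 1, the cheap analysis costs cN, and it shrinks the input to N' = αN with 0 < α < 1.

```latex
\begin{align*}
T_{\text{complex}}(N)  &= N^{k}, \qquad k > 1, \\
T_{\text{cheap}}(N)    &= cN, \qquad N' = \alpha N, \quad 0 < \alpha < 1, \\
T_{\text{composed}}(N) &= cN + (\alpha N)^{k}, \\
T_{\text{composed}}(N) < T_{\text{complex}}(N)
  &\iff c < \left(1 - \alpha^{k}\right) N^{k-1}.
\end{align*}
```

Since k > 1, the right-hand side grows with N, so under these assumptions the composition wins for every sufficiently large program.
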
Value Flow Graph (VFG)
• Variant of a points-to graph
• Encodes the flow of values in the program
• Conservative approximation
• Lightweight, fast to compute and query
• Early queries can safely reduce
  • data-flow facts considered
  • program points considered
• Like slicing a program with respect to value flow

Computing a VFG
• Use a subtyping-based pointer analysis
  • We used One-Level Flow [Das]
• Process all assignments
  • Not just those involving pointers
• Represent constant values explicitly
  • Put them in the graph
• Label graph with source locations
  • Encodes program slices

Example Points-To Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: points-to graph for the program above, with nodes x and a. Legend: points-to edge; source “address” node; expr node.]

One Level Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: one-level flow graph for the program above, with nodes x and a. Legend: flow edge; points-to edge; source “address” node; expr node.]

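Value Flow Graph
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: value flow graph for the program above, with nodes x, a, and the constant 7; edges are labeled with source lines (2, 3, and “2,3”). Legend: flow edge; points-to edge; source “address” node; expr node.]
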
VFG Properties
• Computed in almost-linear time
• Get points-to sets from VFG in linear time
  • Backwards reachability via flow edges
  • Gather up all variables
• Get value flow from VFG in linear time
  • Backwards reachability via flow edges
  • Follow points-to edges up one

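Both queries above rest on backwards reachability over flow edges. The following is a minimal sketch of that one step, assuming a simplified graph with flow edges only (no points-to edges, no source-line labels). The `FlowGraph` type, the function names, and the three-statement example program are hypothetical and are not taken from the VFG implementation described in the talk.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 16
#define MAX_EDGES 32

/* Hypothetical, simplified VFG: nodes are small integers and each
 * flow edge src -> dst means "a value may flow from src to dst". */
typedef struct {
    int n_nodes;
    int n_edges;
    int src[MAX_EDGES];
    int dst[MAX_EDGES];
} FlowGraph;

void add_flow_edge(FlowGraph *g, int src, int dst) {
    assert(g->n_edges < MAX_EDGES);
    assert(src >= 0 && src < g->n_nodes);
    assert(dst >= 0 && dst < g->n_nodes);
    g->src[g->n_edges] = src;
    g->dst[g->n_edges] = dst;
    g->n_edges++;
}

/* Walks flow edges backwards from target using an explicit worklist.
 * On return, reached[n] is true for target and for every node whose
 * value may flow into it. Returns the number of such nodes, not
 * counting target. This sketch rescans the edge array for each node;
 * an implementation aiming for linear time would index edges by
 * destination instead. */
int backward_reach(const FlowGraph *g, int target, bool reached[MAX_NODES]) {
    int worklist[MAX_NODES];
    int top = 0, count = 0;
    for (int i = 0; i < MAX_NODES; i++) reached[i] = false;
    reached[target] = true;
    worklist[top++] = target;
    while (top > 0) {
        int node = worklist[--top];
        for (int e = 0; e < g->n_edges; e++) {
            int from = g->src[e];
            if (g->dst[e] == node && !reached[from]) {
                reached[from] = true;
                worklist[top++] = from;
                count++;
            }
        }
    }
    return count;
}

/* Hypothetical program: a = 7; y = a; z = 5;
 * Node ids: 0 = constant 7, 1 = a, 2 = y, 3 = constant 5, 4 = z. */
int demo_values_into_y(bool reached[MAX_NODES]) {
    FlowGraph g = { .n_nodes = 5, .n_edges = 0 };
    add_flow_edge(&g, 0, 1);   /* a = 7 */
    add_flow_edge(&g, 1, 2);   /* y = a */
    add_flow_edge(&g, 3, 4);   /* z = 5 */
    return backward_reach(&g, 2, reached);
}
```

Calling `demo_values_into_y` marks the constant 7 and `a` as flowing into `y`, and leaves the unrelated `z = 5` assignment out of the result.
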
VFG Query: Points-To of x
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: the value flow graph for the program above (nodes x, a, and the constant 7; edge labels 2, 3, and “2,3”), used to answer the points-to query for x.]

VFG Query: Value Flow into a
    1: int a, *x;
    2: x = &a;
    3: *x = 7;
[Figure: the value flow graph for the program above (nodes x, a, and the constant 7; edge labels 2, 3, and “2,3”), used to answer the value-flow query for a.]

VFG Summary
• Computed in almost-linear time
• Queries complete in linear time
• Approximates flow of values in program
• Show two applications that benefit
  • ESP
  • SLAM

Application 1: ESP
• Verification tool for large C++ programs
• Tracks “typestate” of values
  • Encoded as Finite State Machine
  • Special Error state
• Core: interprocedural data-flow engine
  • Flow sensitive: state at every point
  • Performed bottom-up on call graph
  • Requires function summaries

ESP Function Summaries
• Consider stateful memory locations
• Summarize function behavior for each location
• Reducing the number of locations would be good!
  • But C has evil casts, so types cannot be used
• Worst-case set of locations:
  • All globals and formal parameters
  • Everything transitively reachable from there

Reduce Location Set
• Location L needs to be considered in F if
  • Some expression E has its state changed in F
  • Value held by L at entry to F can flow into E
• Assuming state-changing ops are known
  • Query VFG to find values that flow in

ESP Example
    FILE *e, *f, *g, *h;
    void foo() {
        FILE **p;
        int a = (int)h;
        if (…) p = &e;
        else p = &f;
        *p = fopen(…);
    }
Locations to consider for foo() summary: { e, *e, f, *f, g, *g, h, *h }

ESP Example
    FILE *e, *f, *g, *h;
    void foo() {
        FILE **p;
        int a = (int)h;
        if (…) p = &e;
        else p = &f;
        *p = fopen(…);
    }
• (1) Compute VFG
• (2) Query value flow on *p
• (3) Reduced locations to consider for foo() summary: { e, f }
• (4) Reduce lines to consider for dataflow

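The reduction in steps (2) and (3) can be sketched as a reachability filter over candidate locations. This is a rough illustration only: the node ids, the bitmask encoding, and the hand-written flow edges are assumptions made for the example, not ESP's data model, and the dereferenced locations (*e, *f, and so on) are left out for brevity.

```c
#include <assert.h>

/* Hypothetical node ids for the foo() example. */
enum { LOC_E, LOC_F, LOC_G, LOC_H, VAR_A, DEREF_P, N_NODES };

/* flows_in[n] is a bitmask of the nodes with a direct flow edge into n.
 * Returns the candidates whose entry value may flow, directly or
 * transitively, into the expression whose state changes. */
unsigned reduced_locations(const unsigned flows_in[N_NODES], int changed_expr,
                           unsigned candidate_mask) {
    assert(changed_expr >= 0 && changed_expr < N_NODES);
    unsigned reach = flows_in[changed_expr];
    unsigned prev;
    do {                                 /* iterate to a fixed point */
        prev = reach;
        for (int n = 0; n < N_NODES; n++)
            if (reach & (1u << n))
                reach |= flows_in[n];
    } while (reach != prev);
    return reach & candidate_mask;
}

unsigned foo_summary_locations(void) {
    unsigned flows_in[N_NODES] = { 0 };
    /* p may point to e or f, so their values may flow into *p. */
    flows_in[DEREF_P] = (1u << LOC_E) | (1u << LOC_F);
    /* int a = (int)h; h flows into a, which is never state-changed. */
    flows_in[VAR_A] = 1u << LOC_H;
    unsigned globals = (1u << LOC_E) | (1u << LOC_F) |
                       (1u << LOC_G) | (1u << LOC_H);
    /* The only state-changing operation is *p = fopen(…). */
    return reduced_locations(flows_in, DEREF_P, globals);
}
```

With these assumed edges, `foo_summary_locations` keeps only the bits for e and f, matching the reduced set on the slide; g has no flow edges at all, and h only reaches the integer a.
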
ESP Results
• FILE * output in GCC
  • 140 KLOC, 2149 functions, 66 files, 1068 globals
• VFG queries take 200 seconds
• Reduce average number of locations per function summary from 1100 to <1
  • Median of 15 for functions with >0
• Verification takes 15 minutes
  • Infeasible otherwise

Application 2: SLAM
• Validates temporal safety properties
  • Boolean abstraction
  • Interprocedural dataflow analysis
  • Counterexample-driven refinement
• Convert C program to Boolean program
• Exhaustive dataflow analysis
  • No errors? Program is safe.
  • Real error? Program has a bug.
  • False error? Add predicates, repeat.

Boolean Programs
    C Program          Boolean Program
    int x,y;           bool p,q;
    x = 5;             p = 1;
    y = 6;             q = 1;
    x = x * 2;         p = 0; q = 0;
    y = y * 2;         q = 1;
    assert(x<y)        assert(q)
Predicates (important!): p means “x == 5”; q means “x < y”

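To see why the Boolean program tracks the C program, the two can be stepped side by side while checking that p and q agree with the predicates they stand for. This is only a sanity check on this one straight-line example; the function name is hypothetical and it is not how SLAM constructs or checks abstractions.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs the C program and the Boolean program from the slide in
 * lockstep. Returns true if, after every statement, p equals
 * (x == 5) and q equals (x < y) wherever both sides are defined. */
bool abstraction_matches(void) {
    int x, y;
    bool p, q;
    bool ok = true;

    x = 5;      p = 1;
    ok = ok && (p == (x == 5));

    y = 6;      q = 1;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    x = x * 2;  p = 0; q = 0;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    y = y * 2;  q = 1;
    ok = ok && (p == (x == 5)) && (q == (x < y));

    assert(q);  /* Boolean-program counterpart of assert(x < y) */
    return ok;
}
```

The check passes at every step, which is the sense in which asserting q in the Boolean program stands in for asserting x < y in the C program.
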
SLAM Predicates
• Hard to come up with good predicates
• Counterexample-driven refinement
  • Picks good predicates
  • Is very slow
• Taking all possible predicates
  • Is even slower
• Want “all the useful” predicates

Speeding Up SLAM
• For a simple subset of C
  • Similar to “Copy Constants”
• Use VFG to find a sufficient set of predicates
  • Provably sufficient for this subset
• If this set fails to prove the real program
  • Fall back on counterexample-driven refinement

A Simple Language
    s ::= vi = n              // constants
        | vi = vj             // variable copy
        | if (*) s1 else s2   // condition ignored
        | vi = fun(vj, …)     // function call
        | return(vi)          // function return
        | assert(vi » vj)     // safety property

Predicate Discovery
• High-level idea
  • Each flow edge in the VFG means “values may flow from X to Y”
  • Add predicates to see if they do
• For each assert(vi » vj)
  • Consider the chain of values flowing to vi, vj
  • Add an equality predicate for each link
  • Use constants to resolve scoping

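The "one equality predicate per link" step can be sketched as a backward walk that emits `dst == src` for each flow edge it crosses. This is a minimal sketch under stated assumptions: the `FlowEdge` type, the function names, and the text output format are hypothetical, the flow edges are written by hand for the b side of the sel()/main() example used in this deck, and the final scope-resolution step with constants is not implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 8

typedef struct { int src, dst; } FlowEdge;   /* value may flow src -> dst */

/* Walks flow edges backwards from target and appends "dst == src; "
 * to out for every edge on a chain into target. Returns the number
 * of predicates written. Assumes at most MAX_NODES nodes. */
int discover_predicates(const char *names[], const FlowEdge edges[],
                        int n_edges, int target, char *out, size_t out_size) {
    int visited[MAX_NODES] = { 0 };
    int worklist[MAX_NODES];
    int top = 0, count = 0;
    size_t used = 0;

    assert(out_size > 0);
    out[0] = '\0';
    visited[target] = 1;
    worklist[top++] = target;
    while (top > 0) {
        int node = worklist[--top];
        for (int e = 0; e < n_edges; e++) {
            if (edges[e].dst != node) continue;
            int src = edges[e].src;
            int written = snprintf(out + used, out_size - used, "%s == %s; ",
                                   names[node], names[src]);
            assert(written > 0 && (size_t)written < out_size - used);
            used += (size_t)written;
            count++;
            if (!visited[src]) {
                visited[src] = 1;
                worklist[top++] = src;
            }
        }
    }
    return count;
}

/* Hand-built flow edges: 1 -> a, a -> f, f -> r, 3 -> r, r -> b. */
int demo_predicates_for_b(char *out, size_t out_size) {
    const char *names[] = { "b", "r", "f", "a", "1", "3" };
    const FlowEdge edges[] = { {4, 3}, {3, 2}, {2, 1}, {5, 1}, {1, 0} };
    return discover_predicates(names, edges, 5, 0, out, out_size);
}
```

For the example edges this produces the five per-link predicates (b == r, r == f, r == 3, f == a, a == 1) in worklist order. Note that f == a mixes variables from two different functions, which is the scoping problem the constants are meant to resolve.
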
SLAM Example
    int sel(int f) {
        int r;
        if (*) r = f;
        else r = 3;
        return(r);
    }
    void main() {
        int a,b,c;
        a = 1;
        b = sel(a);
        if (*) c = 2;
        else c = 4;
        assert(b > c);
    }
[Figure: value flow graph with nodes b, 3, r, f, a, 1 (values flowing to b) and 2, c, 4 (values flowing to c).]

Predicates For “b”
(Same sel()/main() program as on the SLAM Example slide.)
[Figure: value flow graph with nodes b, 3, r, f, a, 1.]
Predicates:
    b == r
    r == 3
    r == f
    f == a
    a == 1

Predicates For “b”
(Same sel()/main() program as on the SLAM Example slide.)
[Figure: value flow graph with nodes b, 3, r, f, a, 1.]
Predicates:
    b == r
    r == 3
    r == f
    f == a   // no scope!
    a == 1

Predicates For “b”
(Same sel()/main() program as on the SLAM Example slide.)
[Figure: value flow graph with nodes b, 3, r, f, a, 1.]
Predicates:
    Per-link predicates       After resolving scope with constants
    b == r                    b == r
    r == 3                    r == 3
    r == f                    r == f
    f == a   // no scope!     f == 1, f == 3
    a == 1                    a == 1, a == 3

Why does this work?
• Simple language
  • No arithmetic, etc.
  • Just copying around initial values
• Knowing final values of variables
  • Completely decides safety condition
• Still related to real life
  • Cannot do arithmetic on locks, FILE *s, device driver status codes, etc.

Some SLAM Results
• The generated predicates cover between two-thirds and all of the necessary predicates.
• However, since SLAM must iterate once to generate 3-7 missing predicates, the net performance increase is more than linear.
• Predicates can be specialized or simplified if the assert() condition is a common relational operator (e.g., x==y, x<y, x==5).

Conclusions
• Complex interprocedural analyses can benefit from inexpensive value-flow
• VFG encodes value flow
  • Constructed and queried quickly
• Prune the set of dataflow facts and program points considered
• Large net performance increase