Precise Interprocedural Analysis using Random Interpretation

Sumit Gulwani, George Necula (UC Berkeley)

Random Interpretation = Random Testing + Abstract Interpretation
• Almost as simple as random testing, but with better soundness guarantees.
• Almost as sound as abstract interpretation, but more precise, efficient, and simple.

Example
• Random testing needs to execute all 4 paths to verify the assertions.
• Abstract interpretation analyzes each statement once but uses complicated operations.
• Random interpretation simply executes the program once (and captures the effect of all paths).

Program (* denotes a non-deterministic conditional):
  if (*) { a := 0; b := i; } else { a := i - 2; b := 2; }
  if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
  assert(c + d = 0); assert(c = a + i)

Outline
• Framework for intraprocedural random interpretation
  • Advantages
    • Investigate all analyses using one framework
    • Design and proof of new analyses will be simpler
• A generic algorithm for interprocedural analysis

Outline
• Framework for intraprocedural random interpretation
  • Affine join function
  • Eval function
  • Example
• A generic algorithm for interprocedural analysis

Random Interpretation framework
Goal: Detect equivalences of expressions.
Generic Algorithm:
• Choose random values for input variables.
• Execute assignments.
  • Use the Eval function to evaluate expressions.
• Execute both branches of conditionals and combine the program states at join points.
  • Use the Affine Join function.
• Compare values of expressions to decide equality.

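Affine Join function
Used for combining program states at join points.
$\phi_w : \text{State} \times \text{State} \to \text{State}$
Let $\rho = \phi_w(\rho_1, \rho_2)$. Then $\rho(y) \stackrel{\text{def}}{=} w \times \rho_1(y) + (1-w) \times \rho_2(y)$.

Example: one branch executes a := 2; b := 3, giving $\rho_1 = [a=2, b=3]$; the other executes a := 4; b := 1, giving $\rho_2 = [a=4, b=1]$.
$\rho = \phi_7(\rho_1, \rho_2) = [a = 7 \cdot 2 + (1-7) \cdot 4,\ b = 7 \cdot 3 + (1-7) \cdot 1]$, i.e. $[a=-10, b=15]$.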
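The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming program states are plain dictionaries from variable names to integers; the name `affine_join` is hypothetical and not taken from the slides.

```python
def affine_join(w, state1, state2):
    """Combine two program states variable by variable:
    result(y) = w * state1(y) + (1 - w) * state2(y)."""
    return {var: w * state1[var] + (1 - w) * state2[var] for var in state1}

rho1 = {"a": 2, "b": 3}   # state after a := 2; b := 3
rho2 = {"a": 4, "b": 1}   # state after a := 4; b := 1

joined = affine_join(7, rho1, rho2)
print(joined)  # {'a': -10, 'b': 15}
```

With weight 7 this reproduces the state [a=-10, b=15] from the slide.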

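Properties of Affine Join
• Affine join preserves common linear relationships, e.g. a + b = 5.
• It does not introduce false relationships with high probability (w.h.p.).

Example (as before): $\rho_1 = [a=2, b=3]$, $\rho_2 = [a=4, b=1]$, and $\rho = \phi_7(\rho_1, \rho_2) = [a = 7 \cdot 2 + (1-7) \cdot 4,\ b = 7 \cdot 3 + (1-7) \cdot 1]$, i.e. $[a=-10, b=15]$. The relationship a + b = 5 holds in all three states.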
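Both properties can be checked on the example states. The sketch below is illustrative only: it tries every integer weight in a small range (an arbitrary choice made here) and counts how often a relationship that holds in just one of the two states, a - b = -1, survives the join.

```python
def affine_join(w, state1, state2):
    # result(y) = w * state1(y) + (1 - w) * state2(y)
    return {v: w * state1[v] + (1 - w) * state2[v] for v in state1}

rho1 = {"a": 2, "b": 3}
rho2 = {"a": 4, "b": 1}
weights = range(-50, 51)  # illustrative range of candidate weights

# a + b = 5 holds in both inputs, so it must hold for every weight.
preserved = all(
    affine_join(w, rho1, rho2)["a"] + affine_join(w, rho1, rho2)["b"] == 5
    for w in weights
)

# a - b = -1 holds only in rho1; list the weights for which the join still satisfies it.
spurious = [
    w for w in weights
    if affine_join(w, rho1, rho2)["a"] - affine_join(w, rho1, rho2)["b"] == -1
]

print(preserved)  # True
print(spurious)   # [1]
```

The false relationship survives only for w = 1 (where the join simply returns the first state), which is the sense in which a randomly chosen weight avoids it with high probability.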

Eval function
$\text{Eval} : \text{Expression} \times \text{State} \to \text{Value}$
• Used for executing expressions.
• Defined in terms of $\text{Poly} : \text{Expression} \to \text{Polynomial}$.
• Poly is abstraction-specific.
$\text{Eval}(e, \rho)$ = evaluation of Poly(e) using $\rho$ and random choices for non-program variables.
Poly must satisfy:
• Correctness: Poly(e1) = Poly(e2) iff e1 = e2.
• Linearity: Poly(e) is linear in program variables.

Example of Poly function
• Linear Arithmetic (POPL 2003)
  Expression $e := y \mid e_1 \pm e_2 \mid c \cdot e$
  Poly(e) = e
• Uninterpreted Functions (POPL 2004)
  Expression $e := y \mid F(e)$
  Poly(y) = y
  $\text{Poly}(F(e)) = a \times \text{Poly}(e) + b$
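As a rough sketch of how Eval might look for the uninterpreted-functions case shown on this slide, the Python below draws one random pair (a, b) per function symbol and evaluates Poly(F(e)) = a × Poly(e) + b on a given state. The class name `UFInterpreter`, the tuple encoding of expressions, the per-symbol caching of coefficients, and the prime modulus `P` are all assumptions made for illustration; the slide does not specify them.

```python
import random

P = 2**31 - 1  # assumed prime modulus; all values are taken mod P

class UFInterpreter:
    """Eval for expressions e := y | F(e), using Poly(F(e)) = a_F * Poly(e) + b_F."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.coeffs = {}  # function symbol -> (a_F, b_F), drawn lazily

    def _coeffs_for(self, f):
        if f not in self.coeffs:
            # a_F is nonzero so that distinct argument values stay distinct mod P
            self.coeffs[f] = (self.rng.randrange(1, P), self.rng.randrange(P))
        return self.coeffs[f]

    def eval(self, e, state):
        if isinstance(e, str):      # a program variable y
            return state[e] % P
        f, arg = e                  # an application F(arg), encoded as ("F", arg)
        a, b = self._coeffs_for(f)
        return (a * self.eval(arg, state) + b) % P

interp = UFInterpreter(seed=0)
state = {"x": 5, "y": 5, "z": 9}

# x and y hold equal values, so F(F(x)) and F(F(y)) must evaluate equally.
same = interp.eval(("F", ("F", "x")), state) == interp.eval(("F", ("F", "y")), state)
# x and z differ, and a_F is nonzero mod a prime, so F(x) and F(z) differ.
diff = interp.eval(("F", "x"), state) != interp.eval(("F", "z"), state)
print(same, diff)  # True True
```

For linear arithmetic no such machinery is needed, since Poly(e) = e and Eval is ordinary expression evaluation in the current state.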

Example: Random Interpretation for Linear Arithmetic
Random input: i = 3. Random weights: w1 = 5, w2 = 2.

First conditional (*):
  after a := 0; b := i:             i=3, a=0, b=3
  after a := i-2; b := 2:           i=3, a=1, b=2
  after join with w1 = 5:           i=3, a=-4, b=7

Second conditional (*):
  after c := b - a; d := i - 2b:    i=3, a=-4, b=7, c=11, d=-11
  after c := 2a + b; d := b - 2i:   i=3, a=-4, b=7, c=-1, d=1
  after join with w2 = 2:           i=3, a=-4, b=7, c=23, d=-23

assert (c + d = 0); assert (c = a + i)
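The trace on this slide can be replayed mechanically. The following is a minimal sketch for this one program only, with states as dictionaries; the function names `affine_join` and `run` are hypothetical.

```python
def affine_join(w, s1, s2):
    # result(y) = w * s1(y) + (1 - w) * s2(y)
    return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}

def run(i, w1, w2):
    """Execute both branches of each conditional and join the resulting states."""
    # First conditional: a := 0; b := i   versus   a := i-2; b := 2
    s1 = {"i": i, "a": 0, "b": i}
    s2 = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, s1, s2)

    # Second conditional: c := b-a; d := i-2b   versus   c := 2a+b; d := b-2i
    t1 = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    t2 = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, t1, t2)

final = run(i=3, w1=5, w2=2)
print(final)                                    # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(final["c"] + final["d"] == 0)             # True
print(final["c"] == final["a"] + final["i"])    # False
```

In this run the first assertion is validated and the second is not, since 23 differs from -4 + 3.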

Outline
• Framework for intraprocedural random interpretation
  • Affine join function
  • Eval function
  • Example
• A generic algorithm for interprocedural analysis
  • Random summary (Idea #1)
  • Issue of freshness (Idea #2)
  • Error probability and complexity
  • Experiments

Example
• The second assertion is true in the context i=2.
• We need two new ideas to make the analysis interprocedural.

The program and its trace are those of the previous example: with i=3, w1=5, and w2=2 the final state is i=3, a=-4, b=7, c=23, d=-23, followed by assert (c + d = 0); assert (c = a + i).

Idea #1: Keep input variables symbolic
• Do not choose random values for input variables (so that the result can later be instantiated in any context).
• The resulting program state at the end is a random summary.

First conditional (*):
  after a := 0; b := i:             a=0, b=i
  after a := i-2; b := 2:           a=i-2, b=2
  after join with w1 = 5:           a=8-4i, b=5i-8

Second conditional (*):
  after c := b - a; d := i - 2b:    a=8-4i, b=5i-8, c=9i-16, d=16-9i
  after c := 2a + b; d := b - 2i:   a=8-4i, b=5i-8, c=8-3i, d=3i-8
  after join with w2 = 2:           a=8-4i, b=5i-8, c=21i-40, d=40-21i

Instantiating the summary with i=2 gives a=0, b=2, c=2, d=-2.
assert (c + d = 0); assert (c = a + i)
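One way to see how such a summary could be computed is to run the same program over linear polynomials in i instead of numbers. The sketch below assumes a tiny hand-rolled `Lin` type (coefficient and constant); `Lin`, `const`, `join`, and `summarize` are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Lin:
    """Linear polynomial k*i + c in the symbolic input variable i."""
    k: int
    c: int
    def __add__(self, other): return Lin(self.k + other.k, self.c + other.c)
    def __sub__(self, other): return Lin(self.k - other.k, self.c - other.c)
    def scale(self, n): return Lin(n * self.k, n * self.c)
    def at(self, i): return self.k * i + self.c

I = Lin(1, 0)                      # the input variable i itself
def const(n): return Lin(0, n)     # a constant polynomial

def join(w, s1, s2):
    # affine join applied coefficient-wise: w * s1 + (1 - w) * s2
    return {v: s1[v].scale(w) + s2[v].scale(1 - w) for v in s1}

def summarize(w1, w2):
    s = join(w1, {"a": const(0), "b": I}, {"a": I - const(2), "b": const(2)})
    t1 = dict(s, c=s["b"] - s["a"], d=I - s["b"].scale(2))
    t2 = dict(s, c=s["a"].scale(2) + s["b"], d=s["b"] - I.scale(2))
    return join(w2, t1, t2)

summary = summarize(5, 2)
print(summary["c"], summary["d"])            # Lin(k=21, c=-40) Lin(k=-21, c=40)
print(summary["c"] + summary["d"])           # Lin(k=0, c=0): c + d = 0 in every context
print({v: p.at(2) for v, p in summary.items()})  # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
```

Instantiating at i=2 gives c = 2 = a + i, matching the slide, whereas at i=3 the same summary gives c = 23 and a + i = -1.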

Idea #2: Generate fresh summaries

Procedure P (Input: i):
  if (*) { x := i+1; } else { x := 3; }     branch states: x = i+1 and x = 3
  join with w = 5:                           x = 5i-7
  return x;

Procedure Q:
  u := P(2);     u = 5·2 - 7 = 3
  v := P(1);     v = 5·1 - 7 = -2
  w := P(1);     w = 5·1 - 7 = -2
  Assert (u = 3); Assert (v = w);

• Plugging in the same summary twice is unsound: here v = w appears to hold, although the two calls to P(1) may return different values (2 or 3).
• Fresh summaries can be generated by random affine combination of a few independent summaries!
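The unsoundness is easy to reproduce. The sketch below compares the single summary x = 5i - 7 with the concrete behaviours of P; `concrete_P` and `summary_P` are hypothetical helper names.

```python
def concrete_P(i, take_first_branch):
    """Actual behaviour of P: returns i+1 on one branch and 3 on the other."""
    return i + 1 if take_first_branch else 3

def summary_P(i, w=5):
    """Single random summary of P: affine join of the two branch results."""
    return w * (i + 1) + (1 - w) * 3   # equals 5*i - 7 for w = 5

u, v, w_val = summary_P(2), summary_P(1), summary_P(1)
possible_P1 = {concrete_P(1, True), concrete_P(1, False)}

print(u, v, w_val)          # 3 -2 -2
print(v == w_val)           # True: the reused summary "validates" v = w
print(sorted(possible_P1))  # [2, 3]: two real calls to P(1) can disagree
```

Reusing one summary makes both calls P(1) collapse to the same value, so the analysis would wrongly accept Assert (v = w).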

Generating 2 random summaries for P

Procedure P (Input: i):
  if (*) { x := i+1; } else { x := 3; }     branch states: x = [i+1, i+1] and x = [3, 3]
  join with w = [5, -2]:                     x = [5i-7, 7-2i]
  return x;

Fresh summaries obtained as affine combinations of the two:
  $x = \phi_7(5i-7,\ 7-2i) = 47i-91$
  $x = \phi_6(5i-7,\ 7-2i) = 40i-77$
  $x = \phi_3(5i-7,\ 7-2i) = 19i-35$
  $x = \phi_0(5i-7,\ 7-2i) = 7-2i$
  $x = \phi_5(5i-7,\ 7-2i) = 33i-63$
  $x = \phi_1(5i-7,\ 7-2i) = 5i-7$

Procedure Q calls P 3 times. Hence, generating 2 random summaries for Q requires 2×3 fresh summaries of P.

Generating 2 random summaries for Q

Fresh summaries of P:
  $x = \phi_7(5i-7,\ 7-2i) = 47i-91$
  $x = \phi_6(5i-7,\ 7-2i) = 40i-77$
  $x = \phi_3(5i-7,\ 7-2i) = 19i-35$
  $x = \phi_0(5i-7,\ 7-2i) = 7-2i$
  $x = \phi_5(5i-7,\ 7-2i) = 33i-63$
  $x = \phi_1(5i-7,\ 7-2i) = 5i-7$

Procedure Q:
  u := P(2);     u = [47·2-91, 40·2-77] = [3, 3]
  v := P(1);     v = [19·1-35, 7-2·1] = [-16, 5]
  w := P(1);     w = [33·1-63, 5·1-7] = [-30, -2]
  Assert (u = 3); Assert (v = w);
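The arithmetic on the last two slides can be replayed as follows. This is a sketch under stated assumptions: summaries are encoded as pairs (k, c) meaning k·i + c, and the weights 7, 6, 3, 0, 5, 1 are the ones that reproduce the six polynomials listed (weight 3 is the one that yields 19i-35). The names `fresh` and `apply_summary` are hypothetical.

```python
def fresh(w, base):
    """Affine combination w*s1 + (1-w)*s2 of two base summaries.
    Each summary is a pair (k, c) standing for the polynomial k*i + c."""
    (k1, c1), (k2, c2) = base
    return (w * k1 + (1 - w) * k2, w * c1 + (1 - w) * c2)

def apply_summary(summary, i):
    k, c = summary
    return k * i + c

base = [(5, -7), (-2, 7)]            # the two random summaries of P: 5i-7 and 7-2i
weights = [7, 6, 3, 0, 5, 1]         # one weight per fresh summary needed by Q
s = [fresh(w, base) for w in weights]
print(s)  # [(47, -91), (40, -77), (19, -35), (-2, 7), (33, -63), (5, -7)]

# Q uses a different fresh summary at every call site, in each of its 2 runs.
u = [apply_summary(s[0], 2), apply_summary(s[1], 2)]
v = [apply_summary(s[2], 1), apply_summary(s[3], 1)]
w = [apply_summary(s[4], 1), apply_summary(s[5], 1)]
print(u, v, w)               # [3, 3] [-16, 5] [-30, -2]
print(u == [3, 3], v == w)   # True False
```

With fresh summaries, Assert (u = 3) is still validated in both runs, while Assert (v = w) is no longer accepted.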

Loops and Fixed point computation
• In the presence of loops (in procedures and call graphs), fixed-point computation is required.
• The number of iterations required to reach a fixed point is $k_v(2k_I + 1) + 1$, where
  $k_v$: # of visible variables
  $k_I$: # of input variables

Error Probability and Complexity
Time complexity = $n\,k_V\,k_I^2\,t$
Error probability = $1/q^{t-m}$

n: size of program
$k_V$, $k_I$: # of visible and input variables
t: # of random summaries
q: size of the set from which random values are chosen
m: $k_I k_V$ (generic bound); $k_I + k_V$ (for linear arithmetic); 4 (for unary uninterpreted functions)
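To get a feel for these bounds, the sketch below plugs in some numbers. It reads the fixed-point bound as k_v(2k_I + 1) + 1 and the error probability as 1/q^(t-m); the parameter values (3 visible variables, 1 input variable, 6 summaries, q = 2^32) are made up for illustration, and `iteration_bound` and `error_bound` are hypothetical names.

```python
from fractions import Fraction

def iteration_bound(k_v, k_i):
    """Number of fixed-point iterations: k_v * (2*k_i + 1) + 1."""
    return k_v * (2 * k_i + 1) + 1

def error_bound(q, t, m):
    """Error probability 1 / q**(t - m); only informative when t > m."""
    if t <= m:
        raise ValueError("need t > m for a nontrivial bound")
    return Fraction(1, q ** (t - m))

k_v, k_i = 3, 1            # illustrative values, not from the slides
m_linear = k_i + k_v       # m for linear arithmetic: k_I + k_V = 4

print(iteration_bound(k_v, k_i))               # 10
print(error_bound(2**32, 6, m_linear))         # 1/18446744073709551616
```

The example shows why t must exceed m: each summary beyond m multiplies the error bound by another factor of 1/q.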

Related Work
• Intraprocedural random interpretation
  • Linear arithmetic (POPL 03)
  • Uninterpreted functions (POPL 04)
• Interprocedural dataflow analysis (POPL 95, TCS 96)
  • Sagiv, Reps, Horwitz
  • Cons: simpler properties, e.g. liveness, linear constants
  • Pro: better computational complexity
• Interprocedural linear arithmetic (POPL 04)
  • Muller-Olm, Seidl
  • Cons: $O(k^2)$ times slower
  • Pro: works for non-linear relationships too

Experiments
[Table comparing Det Inter (TCS 96), Random Inter (this paper), and Random Intra (POPL 2003)]
• Inp: # of input variables that were constants
• Var: # of local variables that were constants
• (Var): # of fewer local variable constants discovered
• Random Inter discovers 10-70% more facts; Random Intra is faster by 10-500 times; Det Inter is faster by 2 times.

Conclusion
• Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
• Combining randomized techniques with symbolic techniques is powerful.
