Precise Interprocedural Analysis using Random Interpretation • Sumit Gulwani, George Necula • UC-Berkeley
Random Interpretation = Random Testing + Abstract Interpretation • Almost as simple as random testing but with better soundness guarantees. • Almost as sound as abstract interpretation but more precise, efficient, and simple.
Example • Random testing needs to execute all 4 paths to verify the assertions. • Abstract interpretation analyzes each statement once but uses complicated operations. • Random interpretation simply executes the program once (and captures the effect of all paths).
Flow graph: branch on *: either { a := 0; b := i; } or { a := i-2; b := 2; } • branch on *: either { c := b - a; d := i - 2b; } or { c := 2a + b; d := b - 2i; } • assert(c+d = 0); assert(c = a+i)
Outline • Framework for intraprocedural random interpretation • Advantages • Investigate all analyses using one framework • Design and proof of new analyses will be simpler • A generic algorithm for interprocedural analysis
Outline • Framework for intraprocedural random interpretation • Affine join function • Eval function • Example • A generic algorithm for interprocedural analysis
Random Interpretation framework • Goal: Detect equivalences of expressions. • Generic algorithm: • Choose random values for input variables. • Execute assignments, using the Eval function to evaluate expressions. • Execute both branches of each conditional and combine the program states at join points, using the Affine Join function. • Compare values of expressions to decide equality.
Affine Join function • Used for combining program states at join points. • φ_w : State × State → State • Let ρ = φ_w(ρ1, ρ2). Then ρ(y) =def w × ρ1(y) + (1-w) × ρ2(y) for each variable y.
Example: the branch a := 2; b := 3 yields ρ1: [a=2, b=3]; the branch a := 4; b := 1 yields ρ2: [a=4, b=1]. • ρ = φ_7(ρ1, ρ2): [a = 7·2 + (1-7)·4, b = 7·3 + (1-7)·1], i.e. [a=-10, b=15]
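The affine join with weight w takes w times the first state plus (1-w) times the second, variable by variable. A minimal Python sketch of that definition follows; the dictionary representation of a state and the name `affine_join` are illustrative assumptions, not taken from the slides, and plain integers stand in for whatever value domain the analysis actually uses.

```python
def affine_join(w, state1, state2):
    """Combine two program states variable by variable: w*s1 + (1-w)*s2.

    States are modelled as dicts from variable name to integer
    (an assumption made for this sketch).
    """
    return {var: w * state1[var] + (1 - w) * state2[var] for var in state1}


rho1 = {"a": 2, "b": 3}   # state after a := 2; b := 3
rho2 = {"a": 4, "b": 1}   # state after a := 4; b := 1
rho = affine_join(7, rho1, rho2)
print(rho)  # {'a': -10, 'b': 15}
```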
Properties of Affine Join • Affine join preserves common linear relationships, e.g. a+b=5. • It does not introduce false relationships w.h.p. (with high probability).
Example: ρ1: [a=2, b=3], ρ2: [a=4, b=1] • ρ = φ_7(ρ1, ρ2): [a = 7·2 + (1-7)·4, b = 7·3 + (1-7)·1], i.e. [a=-10, b=15], which still satisfies a+b=5.
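Both properties can be checked on the two states [a=2, b=3] and [a=4, b=1]. The sketch below sweeps a small, arbitrarily chosen range of integer weights (the range is an assumption of this example; a real analysis draws one random weight from a large set). The relationship a+b=5 holds in both inputs and survives every weight, while a=2 holds in only one input and survives for a single unlucky weight.

```python
def affine_join(w, state1, state2):
    """Weighted combination w*s1 + (1-w)*s2 of two states (dicts)."""
    return {var: w * state1[var] + (1 - w) * state2[var] for var in state1}


rho1 = {"a": 2, "b": 3}
rho2 = {"a": 4, "b": 1}
weights = range(-50, 51)          # illustrative sweep, not a random draw
joined = [affine_join(w, rho1, rho2) for w in weights]

# a + b = 5 is common to both inputs, so every join preserves it.
preserved = all(s["a"] + s["b"] == 5 for s in joined)

# a = 2 is true only in rho1; it survives the join only when w = 1.
unlucky = [w for w, s in zip(weights, joined) if s["a"] == 2]

print(preserved, unlucky)  # True [1]
```

With 101 candidate weights, only one makes the false relationship appear to hold, which is the sense in which the join is sound with high probability.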
Eval function • Eval: Expression × State → Value • Used for executing expressions • Defined in terms of Poly: Expression → Polynomial • Poly is abstraction specific • Eval(e, ρ) = evaluation of Poly(e) using ρ and random choices for non-program variables • Poly must satisfy: • Correctness: Poly(e1) = Poly(e2) iff e1 = e2 • Linearity: Poly(e) is linear in program variables.
Example of Poly function • Linear Arithmetic (POPL 2003): Expression e := y | e1 ± e2 | c·e; Poly(e) = e • Uninterpreted Functions (POPL 2004): Expression e := y | F(e); Poly(y) = y; Poly(F(e)) = a × Poly(e) + b
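For the uninterpreted-function case, Eval can be sketched as a recursive walk that replaces each application F(e) by a × Eval(e) + b. The sketch below rests on several assumptions that the slide does not spell out: a and b are treated as non-program variables fixed once per function symbol, expressions are encoded as nested tuples, and the coefficients are passed in explicitly so the result is reproducible (an actual interpreter would choose them at random). The name `eval_uf` is hypothetical.

```python
def eval_uf(expr, state, coeffs):
    """Evaluate Poly(expr) in `state`.

    expr   : ("var", name) or ("app", function_symbol, argument_expr)
    state  : dict from program variable to value
    coeffs : dict from function symbol to its (a, b) pair; assumed here to
             be chosen once per symbol (randomly, in a real run)
    """
    if expr[0] == "var":
        return state[expr[1]]
    if expr[0] == "app":
        _, symbol, arg = expr
        a, b = coeffs[symbol]
        return a * eval_uf(arg, state, coeffs) + b
    raise ValueError(f"unknown expression kind: {expr[0]!r}")


state = {"x": 2, "y": 2, "z": 9}
coeffs = {"F": (3, 5)}                      # fixed for the example
f_x = ("app", "F", ("var", "x"))
f_y = ("app", "F", ("var", "y"))
f_z = ("app", "F", ("var", "z"))
f_f_x = ("app", "F", f_x)

print(eval_uf(f_x, state, coeffs))    # 11
print(eval_uf(f_f_x, state, coeffs))  # 38
```

Because x and y hold the same value, F(x) and F(y) evaluate to the same number, while F(z) does not; comparing values is how the framework decides equality.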
Example: Random Interpretation for Linear Arithmetic
Run with i=3, w1 = 5, w2 = 2: • First conditional (*): a := 0; b := i gives [i=3, a=0, b=3]; a := i-2; b := 2 gives [i=3, a=1, b=2]; join with w1 = 5 gives [i=3, a=-4, b=7]. • Second conditional (*): c := b - a; d := i - 2b gives [i=3, a=-4, b=7, c=11, d=-11]; c := 2a + b; d := b - 2i gives [i=3, a=-4, b=7, c=-1, d=1]; join with w2 = 2 gives [i=3, a=-4, b=7, c=23, d=-23]. • assert (c+d = 0); assert (c = a+i)
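The run above can be replayed in a few lines. This is a sketch under the assumption that states are dictionaries of integers and that the weight applies to the first-listed branch of each conditional, which is the reading that reproduces the slide's numbers; the function names are illustrative.

```python
def affine_join(w, s1, s2):
    """Weighted combination w*s1 + (1-w)*s2 of two states (dicts)."""
    return {v: w * s1[v] + (1 - w) * s2[v] for v in s1}


def interpret(i, w1, w2):
    """Execute both branches of each conditional and join the results."""
    # First conditional: a := 0; b := i   versus   a := i-2; b := 2
    first = {"i": i, "a": 0, "b": i}
    second = {"i": i, "a": i - 2, "b": 2}
    s = affine_join(w1, first, second)

    # Second conditional: c := b-a; d := i-2b   versus   c := 2a+b; d := b-2i
    first = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    second = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, first, second)


final = interpret(i=3, w1=5, w2=2)
print(final)                                   # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}
print(final["c"] + final["d"] == 0)            # True:  first assertion holds
print(final["c"] == final["a"] + final["i"])   # False: second assertion fails
```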
Outline • Framework for intraprocedural random interpretation • Affine join function • Eval function • Example • A generic algorithm for interprocedural analysis • Random summary (Idea #1) • Issue of freshness (Idea #2) • Error probability and complexity • Experiments
Example • The second assertion is true in the context i=2. • We need two new ideas to make the analysis interprocedural.
Flow graph: same program and run as in the previous example (i=3, w1 = 5, w2 = 2), ending in [i=3, a=-4, b=7, c=23, d=-23] before assert (c+d = 0); assert (c = a+i). With i=3 the second assertion fails (c=23 but a+i=-1).
Idea #1: Keep input variables symbolic • Do not choose random values for input variables (to later instantiate by any context). • Resulting program state at the end is a random summary.
Run with symbolic i, w1 = 5, w2 = 2: • First conditional (*): a := 0; b := i gives [a=0, b=i]; a := i-2; b := 2 gives [a=i-2, b=2]; join with w1 = 5 gives [a=8-4i, b=5i-8]. • Second conditional (*): c := b - a; d := i - 2b gives [c=9i-16, d=16-9i]; c := 2a + b; d := b - 2i gives [c=8-3i, d=3i-8]; join with w2 = 2 gives the random summary [a=8-4i, b=5i-8, c=21i-40, d=40-21i]. • Instantiating with i=2 gives [a=0, b=2, c=2, d=-2]. • assert (c+d = 0); assert (c = a+i)
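Keeping i symbolic means every value becomes a linear polynomial in i. The sketch below represents such a polynomial as a pair (coefficient of i, constant); that encoding and all helper names are assumptions made for this example. It rebuilds the summary with w1 = 5 and w2 = 2 and then instantiates it in the context i=2.

```python
# A linear polynomial p*i + q is stored as the tuple (p, q).
def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(k, p):
    return (k * p[0], k * p[1])

def sub(p, q):
    return add(p, scale(-1, q))

def join(w, p, q):
    """Affine join of two symbolic values: w*p + (1-w)*q."""
    return add(scale(w, p), scale(1 - w, q))

def at(p, i):
    """Instantiate the polynomial for a concrete input i."""
    return p[0] * i + p[1]


I = (1, 0)                                    # the input variable i itself

def summarize(w1, w2):
    a = join(w1, (0, 0), sub(I, (0, 2)))      # a := 0     | a := i-2
    b = join(w1, I, (0, 2))                   # b := i     | b := 2
    c = join(w2, sub(b, a), add(scale(2, a), b))          # b-a   | 2a+b
    d = join(w2, sub(I, scale(2, b)), sub(b, scale(2, I)))  # i-2b | b-2i
    return {"a": a, "b": b, "c": c, "d": d}


summary = summarize(5, 2)
print(summary)  # {'a': (-4, 8), 'b': (5, -8), 'c': (21, -40), 'd': (-21, 40)}

ctx = {v: at(p, 2) for v, p in summary.items()}
print(ctx)                      # {'a': 0, 'b': 2, 'c': 2, 'd': -2}
print(ctx["c"] == ctx["a"] + 2)  # True: c = a+i holds in the context i=2
```

The same summary instantiated at i=3 gives c=23 and a+i=-1, matching the concrete run.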
Idea #2: Generate fresh summaries
Procedure P • Input: i • Branch on *: x := i+1 gives x = i+1; x := 3 gives x = 3; join with weight 5 gives x = 5i-7 • return x;
Procedure Q • u := P(2); v := P(1); w := P(1); • Using the same summary for every call: u = 5·2 - 7 = 3, v = 5·1 - 7 = -2, w = 5·1 - 7 = -2 • Assert (u = 3); Assert (v = w);
• Plugging the same summary twice is unsound: v = w appears to hold, although P(1) may return either 2 or 3. • Fresh summaries can be generated by random affine combination of a few independent summaries!
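The unsoundness is easy to reproduce. In the sketch below, `p_returns` enumerates what procedure P can actually return and `summary` is the single random summary 5i-7 from the slide; both function names are illustrative assumptions.

```python
def p_returns(i):
    """All values P may return for input i: x := i+1 or x := 3."""
    return {i + 1, 3}


def summary(i):
    """The one random summary of P: join of i+1 and 3 with weight 5."""
    return 5 * (i + 1) + (1 - 5) * 3      # simplifies to 5i - 7


u, v, w = summary(2), summary(1), summary(1)
print(u, v, w)               # 3 -2 -2

# Assert (u = 3) is genuinely valid: P(2) can only return 3.
print(p_returns(2) == {3})   # True

# Assert (v = w) is NOT valid: the two calls P(1) may return 2 and 3.
# Reusing the same summary hides this, because v and w are computed
# from identical polynomials and therefore always agree.
print(v == w, len(p_returns(1)) > 1)   # True True
```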
Generating 2 random summaries for P
Procedure P • Input: i • Branch on *: x := i+1 gives x = [i+1, i+1]; x := 3 gives x = [3, 3]; join with weights w = [5, -2] gives x = [5i-7, 7-2i] • return x;
Fresh summaries of P (affine combinations of the 2 summaries): • x = φ_7(5i-7, 7-2i) = 47i-91 • x = φ_6(5i-7, 7-2i) = 40i-77 • x = φ_3(5i-7, 7-2i) = 19i-35 • x = φ_0(5i-7, 7-2i) = 7-2i • x = φ_5(5i-7, 7-2i) = 33i-63 • x = φ_1(5i-7, 7-2i) = 5i-7
• Procedure Q calls P 3 times. Hence, generating 2 random summaries for Q requires 2×3 fresh summaries of P.
Generating 2 random summaries for Q
Procedure Q • u := P(2); v := P(1); w := P(1);
Fresh summaries of P used: • x = φ_7(5i-7, 7-2i) = 47i-91 • x = φ_6(5i-7, 7-2i) = 40i-77 • x = φ_3(5i-7, 7-2i) = 19i-35 • x = φ_0(5i-7, 7-2i) = 7-2i • x = φ_5(5i-7, 7-2i) = 33i-63 • x = φ_1(5i-7, 7-2i) = 5i-7
• u = [47·2-91, 40·2-77] = [3, 3] • v = [19·1-35, 7-2·1] = [-16, 5] • w = [33·1-63, 5·1-7] = [-30, -2] • Assert (u = 3); Assert (v = w);
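The construction can be sketched as follows: start from the two base summaries of P, 5i-7 and 7-2i, and derive each fresh summary as an affine combination of them with its own weight. Polynomials are encoded as (coefficient of i, constant) pairs, and the helper names are assumptions of this example. The weights are fixed to the ones that reproduce the polynomials on the slide (weight 3 is the one that yields 19i-35); a real run would draw them at random.

```python
def fresh(w, s1, s2):
    """Affine combination w*s1 + (1-w)*s2 of two summaries (p, q) = p*i + q."""
    return (w * s1[0] + (1 - w) * s2[0], w * s1[1] + (1 - w) * s2[1])


def at(p, i):
    """Instantiate a summary for a concrete argument i."""
    return p[0] * i + p[1]


base1, base2 = (5, -7), (-2, 7)          # 5i-7 and 7-2i

# One pair of fresh summaries per call site in Q (2 summaries of Q x 3 calls).
for_u = [fresh(7, base1, base2), fresh(6, base1, base2)]   # 47i-91, 40i-77
for_v = [fresh(3, base1, base2), fresh(0, base1, base2)]   # 19i-35, 7-2i
for_w = [fresh(5, base1, base2), fresh(1, base1, base2)]   # 33i-63, 5i-7

u = [at(p, 2) for p in for_u]            # u := P(2)
v = [at(p, 1) for p in for_v]            # v := P(1)
w = [at(p, 1) for p in for_w]            # w := P(1)

print(u, v, w)                # [3, 3] [-16, 5] [-30, -2]
print(all(x == 3 for x in u))  # True:  Assert (u = 3) is confirmed
print(v == w)                  # False: Assert (v = w) is correctly refuted
```

Every fresh summary still returns 3 at i=2, because both base summaries do, so the valid assertion survives while the invalid one no longer slips through.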
Loops and Fixed point computation • In the presence of loops (in procedures and call-graphs), fixed point computation is required. • The number of iterations required to reach a fixed point is k_V(2k_I + 1) + 1 • k_V: # of visible variables • k_I: # of input variables
Error Probability and Complexity • Time complexity = n·k_V·k_I^2·t • Error probability = 1/q^(t-m) • n: size of program • k_V, k_I: # of visible and input variables • t: # of random summaries • q: size of set from which random values are chosen • m: k_I·k_V (generic bound), k_I + k_V (for linear arithmetic), 4 (for unary uninterpreted functions)
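To get a feel for the bounds, the sketch below plugs numbers into the error probability 1/q^(t-m) and into the fixed-point iteration count k_V(2k_I + 1) + 1. All concrete values (3 visible variables, 2 input variables, 7 summaries, a 32-bit value set) are hypothetical and chosen only for illustration, as are the function names.

```python
from fractions import Fraction


def fixed_point_iterations(k_v, k_i):
    """Iteration bound k_V * (2*k_I + 1) + 1 for reaching a fixed point."""
    return k_v * (2 * k_i + 1) + 1


def error_bound(q, t, m):
    """Error probability 1 / q**(t - m); only meaningful when t > m."""
    if t <= m:
        raise ValueError("need more random summaries than m (t > m)")
    return Fraction(1, q ** (t - m))


k_v, k_i = 3, 2                  # hypothetical procedure
m = k_i + k_v                    # bound on m for linear arithmetic
print(fixed_point_iterations(k_v, k_i))        # 16
print(error_bound(q=2**32, t=7, m=m) == Fraction(1, 2**64))  # True
```

Each additional random summary beyond m divides the error bound by q, so a handful of extra summaries makes the bound negligible when values are drawn from a large set.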
Related Work • Intraprocedural random interpretation • Linear arithmetic (POPL 03) • Uninterpreted functions (POPL 04) • Interprocedural dataflow analysis (POPL 95, TCS 96) • Sagiv, Reps, Horwitz • Cons: simpler properties, e.g. liveness, linear constants • Pro: better computational complexity • Interprocedural linear arithmetic (POPL 04) • Müller-Olm, Seidl • Cons: O(k^2) times slower • Pro: works for non-linear relationships too
Experiments • Analyses compared: Det Inter (TCS 96), Random Inter (this paper), Random Intra (POPL 2003) • Inp: # of input variables that were constants • Var: # of local variables that were constants • (Var): # of fewer local variable constants discovered • Random Inter discovers 10-70% more facts; Random Intra is faster by 10-500 times; Det Inter is faster by 2 times.
Conclusion • Randomization buys efficiency and simplicity at the cost of probabilistic soundness. • Combining randomized techniques with symbolic techniques is powerful.