Random Interpretation Sumit Gulwani UC-Berkeley
Program Analysis Applications in all aspects of software development, e.g. • Program correctness • Compiler optimizations • Translation validation Parameters • Completeness (precision, no false positives) • Computational complexity • Ease of implementation • What if we allow probabilistic soundness? • We obtain a new class of analyses: random interpretation
Random Interpretation = Random Testing + Abstract Interpretation
• Almost as simple as random testing, but with better soundness guarantees.
• Almost as sound as abstract interpretation, but more precise, efficient, and simple.
Example 1
Program (each * is a non-deterministic conditional):
  First branch (*):  True: a := 0; b := i;   False: a := i - 2; b := 2;
  Second branch (*): True: c := b - a; d := i - 2b;   False: c := 2a + b; d := b - 2i;
  assert (c + d = 0); assert (c = a + i)
• Random testing needs to execute all 4 paths to verify the assertions.
• Abstract interpretation analyzes statements once but uses complicated operations.
• Random interpretation executes the program once, but in a way that captures the effect of all paths.
Outline Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Problem: Linear relationships in linear programs
• Does not mean inapplicability to “real” programs
  • “Abstract” other program statements as non-deterministic assignments (standard practice in program analysis)
• Linear relationships are useful for
  • Program correctness (e.g. buffer overflows)
  • Compiler optimizations (e.g. constant propagation, copy propagation, common subexpression elimination, induction variable elimination)
Basic idea in random interpretation Generic algorithm: • Choose random values for input variables. • Execute both branches of a conditional. • Combine the values of variables at join points. • Test the assertion.
Idea #1: The Affine Join operation
• Affine join of v1 and v2 w.r.t. weight w: $\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$
• Affine join preserves common linear relationships (e.g. a + b = 5)
• It does not introduce false relationships w.h.p.
• Unfortunately, non-linear relationships are not preserved (e.g. a × (1 + b) = 8)
Example with w = 7: joining the states (a = 2, b = 3) and (a = 4, b = 1) gives a = $\phi_7(2, 4)$ = -10 and b = $\phi_7(3, 1)$ = 15.
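The affine join can be checked numerically with a few lines of Python. This is a minimal sketch, assuming program states are represented as dictionaries from variable names to integers; the function name `affine_join` and the seeded generator are illustrative choices, not part of the slides.

```python
import random

def affine_join(w, v1, v2):
    """Return the state w*v1 + (1-w)*v2, computed variable by variable."""
    return {name: w * v1[name] + (1 - w) * v2[name] for name in v1}

# The two states from the slide, joined with weight w = 7.
s1 = {"a": 2, "b": 3}
s2 = {"a": 4, "b": 1}
joined = affine_join(7, s1, s2)
print(joined)  # {'a': -10, 'b': 15}

# The common linear relationship a + b = 5 survives the join ...
print(joined["a"] + joined["b"] == 5)        # True
# ... but the common non-linear relationship a * (1 + b) = 8 does not.
print(joined["a"] * (1 + joined["b"]) == 8)  # False

# a + b = 5 holds for every weight, not only for w = 7.
rng = random.Random(0)
always_linear = all(
    sum(affine_join(rng.randint(-1000, 1000), s1, s2).values()) == 5
    for _ in range(100)
)
print(always_linear)  # True
```

Integer arithmetic is enough here because the weights are integers; an implementation over a finite field would reduce every result modulo a prime.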
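Geometric Interpretation of Affine Join
• The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5)
• Given any relationship that is not satisfied by any of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability
[Figure: the (a, b) plane showing the line a + b = 5 through the states (a = 2, b = 3) and (a = 4, b = 1) before the join, the line b = 2, and the state after the join.]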
Example 1
• Choose a random weight for each join independently.
• All choices of random weights verify the first assertion.
• Almost all choices contradict the second assertion.
Run with i = 3, w1 = 5, w2 = 2:
  First branch (*): a := 0; b := i; gives i=3, a=0, b=3. a := i-2; b := 2; gives i=3, a=1, b=2.
  Join with w1 = 5: i=3, a=-4, b=7
  Second branch (*): c := b - a; d := i - 2b; gives c=11, d=-11. c := 2a + b; d := b - 2i; gives c=-1, d=1.
  Join with w2 = 2: i=3, a=-4, b=7, c=23, d=-23
  assert (c + d = 0); assert (c = a + i)
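The run on this slide can be reproduced with a small script. The sketch below hard-codes the Example 1 program rather than interpreting arbitrary programs, and the names `join` and `run_example1` are hypothetical helpers introduced only for illustration.

```python
import random

def join(w, s1, s2):
    """Affine join of two states (dicts) with weight w."""
    return {k: w * s1[k] + (1 - w) * s2[k] for k in s1}

def run_example1(i, w1, w2):
    """Execute both sides of each conditional and join them with w1, then w2."""
    # First conditional: (a := 0; b := i) versus (a := i-2; b := 2).
    left = {"i": i, "a": 0, "b": i}
    right = {"i": i, "a": i - 2, "b": 2}
    s = join(w1, left, right)
    # Second conditional: (c := b-a; d := i-2b) versus (c := 2a+b; d := b-2i).
    left = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    right = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return join(w2, left, right)

# The slide's run: i = 3, w1 = 5, w2 = 2.
final = run_example1(3, 5, 2)
print(final)  # {'i': 3, 'a': -4, 'b': 7, 'c': 23, 'd': -23}

# Repeat with random inputs and weights, counting how often each assertion holds.
rng = random.Random(0)
trials = 1000
first_ok = second_ok = 0
for _ in range(trials):
    s = run_example1(*(rng.randint(-1000, 1000) for _ in range(3)))
    first_ok += s["c"] + s["d"] == 0
    second_ok += s["c"] == s["a"] + s["i"]
print(first_ok, second_ok)
```

For this program the second assertion's residual c - (a + i) works out to w2·(1 - w1)·(6 - 3i), so it only passes when w2 = 0, w1 = 1, or i = 2, which is why almost all random choices reject it.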
Example 2
Program:
  a := x + y
  x = y ?   True: b := a   False: b := 2x
  assert (b = 2x)
• We need to make use of the conditional x = y on the true branch to prove the assertion.
Idea #2: The Adjust Operation • Execute multiple runs of the program in parallel. • Sample = Collection of states at a program point • Combine states in the sample before a conditional s.t. • The equality conditional is satisfied. • Original relationships are preserved. • Use adjusted sample on the true branch.
Geometric Interpretation of Adjust
• Program states = points
• Adjust = projection onto the hyperplane
• S’ satisfies e = 0 and all relationships satisfied by S
Algorithm to obtain S’ = Adjust(S, e=0)
[Figure: sample points S1, S2, S3, S4 and the adjusted points S’1, S’2, S’3 lying on the hyperplane e = 0.]
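The slide's own algorithm for Adjust did not survive in this transcript, so the following is only one plausible sketch of the projection it describes. It assumes that one state of the sample is used as a pivot and that every other state is replaced by the affine combination of itself and the pivot that lands on the hyperplane e = 0. The pivot choice, the function names, and the sample values are assumptions made for illustration. The adjusted sample has one state fewer, matching the figure (S1..S4 become S’1..S’3).

```python
from fractions import Fraction

def affine(w, s1, s2):
    """Affine combination w*s1 + (1-w)*s2 of two states (dicts)."""
    return {k: w * s1[k] + (1 - w) * s2[k] for k in s1}

def adjust(sample, e):
    """Project a sample onto the hyperplane e = 0.

    `e` maps a state to the value of an affine expression. The last state is
    used as the pivot (an assumption of this sketch); e(pivot) must differ
    from e(s) for every other state s, which random samples satisfy w.h.p.
    """
    pivot = sample[-1]
    adjusted = []
    for s in sample[:-1]:
        # Solve w*e(s) + (1-w)*e(pivot) = 0 for w.
        w = Fraction(e(pivot), e(pivot) - e(s))
        adjusted.append(affine(w, s, pivot))
    return adjusted

# Example 2: after a := x + y, every state satisfies a = x + y.
sample = [{"x": x, "y": y, "a": x + y} for x, y in [(1, 4), (2, 7), (5, 3), (6, 2)]]
on_true_branch = adjust(sample, lambda s: s["x"] - s["y"])

for s in on_true_branch:
    # Each adjusted state satisfies x = y and still satisfies a = x + y,
    # so b := a makes the assertion b = 2x hold on the true branch.
    print(s["x"] == s["y"], s["a"] == s["x"] + s["y"], s["a"] == 2 * s["x"])
```

Because each adjusted state is an affine combination of two states from the original sample, any linear relationship that held on the whole sample still holds afterwards.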
Correctness of Random Interpreter R
• Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
  • assuming non-deterministic conditionals
• Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
  • error prob. ≤ [bound missing in source]
  • b: number of branches
  • j: number of joins
  • d: size of the field
  • k: number of points in the sample
• If j = b = 10, k = 15, $d \approx 2^{32}$, then error ≤ [value missing in source]
Outline Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Problem: Global value numbering • Goal: Detect expression equivalence in programs that have been abstracted using “uninterpreted functions” • Axiom of the theory of uninterpreted functions If x=y, then F(x)=F(y) • Applications • Compiler optimizations • Translation validation
Example
Program:
  (*)  True: x := a; y := a; z := F(a);   False: x := b; y := b; z := F(b);
  After the join: x = $\phi$(a,b), y = $\phi$(a,b), z = $\phi$(F(a),F(b)), F(y) = F($\phi$(a,b))
  assert(x = y); assert(z = F(y));
• Typical algorithms treat $\phi$ as uninterpreted
• Hence cannot verify the second assertion
• The randomized algorithm interprets $\phi$ as the affine join operation $\phi_w$
How to “execute” uninterpreted functions
e := y | F(e1,e2)
• Choose a random interpretation for F
• Non-linear interpretation
  • E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
  • Preserves all equivalences in straight-line code
  • But not across join points
• Let's try a linear interpretation
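Why the non-linear interpretation fails across join points can be seen by checking whether F commutes with the affine join: "apply F, then join" (the value of z) should equal "join, then apply F" (the value of F(y)). The sketch below uses fixed numbers in place of random ones so the result is reproducible; all names and values are illustrative assumptions.

```python
def join(w, v1, v2):
    """Affine join of two values with weight w."""
    return w * v1 + (1 - w) * v2

def f_quadratic(r1, r2, e1, e2):
    """The non-linear interpretation F(e1, e2) = r1*e1^2 + r2*e2^2."""
    return r1 * e1**2 + r2 * e2**2

def f_linear(r1, r2, e1, e2):
    """The linear interpretation F(e1, e2) = r1*e1 + r2*e2."""
    return r1 * e1 + r2 * e2

def commutes_with_join(f, r1, r2, w, a, b, c, d):
    """Compare F applied to joined arguments with the join of F's results."""
    f_of_join = f(r1, r2, join(w, a, b), join(w, c, d))
    join_of_f = join(w, f(r1, r2, a, c), f(r1, r2, b, d))
    return f_of_join == join_of_f

# r1 = 3, r2 = 5, weight w = 7; arguments (a, c) on one branch, (b, d) on the other.
print(commutes_with_join(f_linear, 3, 5, 7, 2, 4, 3, 1))     # True
print(commutes_with_join(f_quadratic, 3, 5, 7, 2, 4, 3, 1))  # False
```

The linear case is an identity, since r1(wa + (1-w)b) + r2(wc + (1-w)d) = w(r1a + r2c) + (1-w)(r1b + r2d) for every choice of values.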
Random Linear Interpretation
• Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
• Preserves all equivalences across a join point
• Introduces false equivalences in straight-line code. E.g. e and e’ have the same encodings even though e ≠ e’
Expressions: e = F(F(a,b), F(c,d)) and e’ = F(F(a,c), F(b,d))
Encodings:
  $e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$
  $e' = r_1(r_1 a + r_2 c) + r_2(r_1 b + r_2 d) = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$
• Problem: Scalar multiplication is commutative.
• Solution: Evaluate expressions to vectors and choose r1 and r2 to be random matrices
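The false equivalence, and how matrices avoid it, can be illustrated as follows. This is a rough sketch under several assumptions that do not come from the slides: expressions are nested tuples, arithmetic is modulo the prime 2^31 - 1, vectors have length 2, and matrices are 2×2. It is not the encoding scheme of the POPL 2004 paper, only a demonstration that matrix multiplication does not commute.

```python
import random

P = 2**31 - 1  # prime modulus (an assumption of this sketch)

def encode_scalar(expr, env, r1, r2):
    """Encode F(e1, e2) as r1*e1 + r2*e2 with scalar coefficients."""
    if isinstance(expr, str):
        return env[expr] % P
    _, left, right = expr
    return (r1 * encode_scalar(left, env, r1, r2)
            + r2 * encode_scalar(right, env, r1, r2)) % P

def mat_vec(m, v):
    """Matrix-vector product modulo P."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) % P
                 for i in range(len(m)))

def encode_matrix(expr, env, m1, m2):
    """Encode F(e1, e2) as M1*e1 + M2*e2 with matrix coefficients."""
    if isinstance(expr, str):
        return env[expr]
    _, left, right = expr
    u = mat_vec(m1, encode_matrix(left, env, m1, m2))
    v = mat_vec(m2, encode_matrix(right, env, m1, m2))
    return tuple((x + y) % P for x, y in zip(u, v))

E1 = ("F", ("F", "a", "b"), ("F", "c", "d"))  # e
E2 = ("F", ("F", "a", "c"), ("F", "b", "d"))  # e'

rng = random.Random(0)
scalars = {name: rng.randrange(P) for name in "abcd"}
r1, r2 = rng.randrange(P), rng.randrange(P)
# Scalars commute, so e and e' always collide.
print(encode_scalar(E1, scalars, r1, r2) == encode_scalar(E2, scalars, r1, r2))  # True

vectors = {name: (rng.randrange(P), rng.randrange(P)) for name in "abcd"}
m1 = [[rng.randrange(P) for _ in range(2)] for _ in range(2)]
m2 = [[rng.randrange(P) for _ in range(2)] for _ in range(2)]
# With random matrices the encodings differ with high probability.
print(encode_matrix(E1, vectors, m1, m2) == encode_matrix(E2, vectors, m1, m2))
```

In the matrix version the two encodings differ by (M1·M2 - M2·M1)(b - c), which is non-zero unless the commutator happens to annihilate b - c.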
Outline Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Example
Program (each * is a non-deterministic conditional):
  First branch (*):  True: a := 0; b := i;   False: a := i - 2; b := 2;
  Second branch (*): True: c := b - a; d := i - 2b;   False: c := 2a + b; d := b - 2i;
  assert (c + d = 0); assert (c = a + i)
• The second assertion is true in the context i = 2.
• Interprocedural analysis requires computing procedure summaries.
Idea #1: Keep input variables symbolic
• Do not choose random values for input variables (to later instantiate by any context).
• Resulting program state at the end is a random procedure summary.
Run with i kept symbolic, w1 = 5, w2 = 2:
  First branch (*): a := 0; b := i; gives a=0, b=i. a := i-2; b := 2; gives a=i-2, b=2.
  Join with w1 = 5: a=8-4i, b=5i-8
  Second branch (*): c := b - a; d := i - 2b; gives c=9i-16, d=16-9i. c := 2a + b; d := b - 2i; gives c=8-3i, d=3i-8.
  Join with w2 = 2: a=8-4i, b=5i-8, c=21i-40, d=40-21i
  Instantiating with i = 2: a=0, b=2, c=2, d=-2
  assert (c + d = 0); assert (c = a + i)
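The symbolic run can be reproduced by representing each value as a linear form in the input i. In this sketch a pair (const, coef) stands for const + coef·i; that representation and the helper names are assumptions chosen for brevity and are not taken from the slides.

```python
# A linear form const + coef*i is stored as the pair (const, coef).
I = (0, 1)  # the symbolic input i

def add(p, q):
    return (p[0] + q[0], p[1] + q[1])

def scale(k, p):
    return (k * p[0], k * p[1])

def sub(p, q):
    return add(p, scale(-1, q))

def join(w, p, q):
    """Affine join w*p + (1-w)*q of two linear forms."""
    return add(scale(w, p), scale(1 - w, q))

def at(p, i):
    """Instantiate a linear form for a concrete input value i."""
    return p[0] + p[1] * i

def summary(w1, w2):
    """Random summary of the example program for join weights w1 and w2."""
    a = join(w1, (0, 0), sub(I, (2, 0)))  # a := 0      | a := i-2
    b = join(w1, I, (2, 0))               # b := i      | b := 2
    c = join(w2, sub(b, a), add(scale(2, a), b))            # c := b-a  | c := 2a+b
    d = join(w2, sub(I, scale(2, b)), sub(b, scale(2, I)))  # d := i-2b | d := b-2i
    return {"a": a, "b": b, "c": c, "d": d}

s = summary(5, 2)
print(s)  # {'a': (8, -4), 'b': (-8, 5), 'c': (-40, 21), 'd': (40, -21)}

# First assertion: c + d is the zero form, so it holds in every context.
print(add(s["c"], s["d"]) == (0, 0))  # True

# Second assertion: c - (a + i) = 24i - 48, which vanishes only for i = 2.
residual = sub(s["c"], add(s["a"], I))
print(residual, at(residual, 2), at(residual, 3))  # (-48, 24) 0 24
```

Instantiating the summary at i = 2 gives a = 0, b = 2, c = 2, d = -2, the values shown on the slide.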
Idea #2: Generate fresh summaries
Procedure P (Input: i):
  (*)  True: x := i+1;   False: x := 3;
  Join with weight 5: x = 5i-7
  return x;
Procedure Q:
  u := P(2); v := P(1); w := P(1);
  Using the summary x = 5i-7: u = 5·2 - 7 = 3, v = 5·1 - 7 = -2, w = 5·1 - 7 = -2
  Assert (u = 3); Assert (v = w);
• Plugging the same summary twice is unsound.
• Fresh summaries can be generated by random affine combination of few independent summaries!
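The unsoundness and the fix can be sketched as follows. The sketch assumes two independent base summaries of P, obtained with join weights 5 and -3, and combines them with a per-call weight t. The number of base summaries, the weights, and all function names are illustrative assumptions, since the slide only says "few independent summaries".

```python
def summary_of_P(w):
    """Summary of P's result as (const, coef): w*(i+1) + (1-w)*3."""
    return (w + 3 * (1 - w), w)

def call(summary, arg):
    """Instantiate a summary for a concrete argument."""
    const, coef = summary
    return const + coef * arg

def fresh(t, s1, s2):
    """Fresh summary as the affine combination t*s1 + (1-t)*s2."""
    return tuple(t * x + (1 - t) * y for x, y in zip(s1, s2))

s1 = summary_of_P(5)   # (-7, 5), i.e. x = 5i - 7 as on the slide
s2 = summary_of_P(-3)  # (9, -3), i.e. x = 9 - 3i

# Reusing one summary for both calls makes v = w look true: both are -2.
print(call(s1, 1), call(s1, 1))

# Fresh summaries per call site: P(2) is still always 3 ...
print([call(fresh(t, s1, s2), 2) for t in (2, 3, -4)])  # [3, 3, 3]
# ... but two calls P(1) now get different values, so v = w is rejected.
print(call(fresh(2, s1, s2), 1), call(fresh(3, s1, s2), 1))  # -10 -18
```

Every fresh summary is itself an affine combination of the two branch results i+1 and 3, so relationships that hold on both branches, such as P(2) = 3, are preserved.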
Experiments: Randomized vs. Deterministic
• Randomized algorithm discovers 10-70% more facts.
• Randomized algorithm is slower by a factor of 2.
Experimental measure of error
The % of incorrect relationships decreases with increase in
• S = size of set from which random values are chosen.
• N = # of random summaries used.
[Plot: % of incorrect relationships against S and N.]
The experimental results are better than what is predicted by theory.
Outline Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Other applications of random interpretation
• Model Checking
  • Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)
• Theorem Proving
  • Randomized decision procedure for linear arithmetic and uninterpreted functions. This runs an order of magnitude faster than deterministic algorithms. (CADE 03)
• Ideas for deterministic algorithms
  • PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)
Future Work and Limitations Future Work • Random interpreters for other theories • E.g. data-structures • Combining random interpreters • E.g. random interpreter for the combined theory of linear arithmetic and uninterpreted functions. Limitations • Does not discover “never equal” information • Only detects “always equal” information
Summary
[Diagram labels: Random interpretation, Abstract interpretation]
• Lessons Learned
  • Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
  • Randomization suggests ideas for deterministic algorithms.
  • Combining randomized techniques with symbolic techniques is powerful.