Program Analysis using Random Interpretation. Sumit Gulwani. UC-Berkeley March 2005. Program Analysis. Applications in all aspects of software development, e.g. Program correctness Software bugs are expensive! Compiler optimizations
Program Analysis using Random Interpretation Sumit Gulwani UC-Berkeley March 2005
Program Analysis
Applications in all aspects of software development, e.g.
• Program correctness
    • Software bugs are expensive!
• Compiler optimizations
    • Give people the freedom to write code the way they want (leaving performance issues to compilers).
• Translation validation
    • Semantic equivalence of programs before and after compilation (it is difficult to trust the output of a compiler for safety-critical systems).
Design choices in Program Analysis
• Completeness (precision, number of false positives)
• Computational complexity
• Ease of implementation
• Soundness: if the analysis says “no bugs”, there are no bugs.
What if we allow “probabilistic soundness”?
• We get more precise, more efficient and even simpler algorithms.
• Probabilistic algorithms had been used earlier in other areas, such as networks, but not in program analysis.
• We obtain a new class of analyses: random interpretation.
Random Interpretation = Random Testing + Abstract Interpretation
Random Testing:
• Test the program on random inputs
• Simple and efficient, but unsound (cannot prove absence of bugs)
Abstract Interpretation:
• Class of deterministic program analyses
• Interpret (analyze) an abstraction (approximation) of the program
• Sound, but usually complicated and expensive
Random Interpretation:
• Class of randomized program analyses
• Almost as simple and efficient as random testing
• Almost as sound as abstract interpretation
Example 1
(* denotes a non-deterministic conditional.)
    if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert(c+d = 0); assert(c = a+i)
Example 1: Random Testing
• Need to test the blue path (a := i-2; b := 2, followed by c := b - a; d := i - 2b) to falsify the second assertion.
• Chances of choosing the blue path from the set of all 4 paths are small.
• Hence, random testing is unsound.
    if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert(c+d = 0); assert(c = a+i)
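A minimal sketch of the point above, assuming the program structure as read from the slide. The names run_path and failing_paths are illustrative and not from the talk. The code executes Example 1 along each of the four paths for a given input i and lists the paths on which an assertion fails.

```python
def run_path(i, first_true, second_true):
    """Execute Example 1 along one path.

    Returns a pair (c + d == 0, c == a + i), one entry per assertion.
    """
    if first_true:
        a, b = 0, i
    else:
        a, b = i - 2, 2
    if second_true:
        c, d = b - a, i - 2 * b
    else:
        c, d = 2 * a + b, b - 2 * i
    return c + d == 0, c == a + i


def failing_paths(i):
    """List the (first_true, second_true) choices that falsify an assertion."""
    return [(f, s)
            for f in (True, False)
            for s in (True, False)
            if not all(run_path(i, f, s))]
```

For i = 3 only one of the four paths falsifies the second assertion, so a test run that picks paths at random can easily miss it. For i = 2 no path fails at all.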
Example 1: Abstract Interpretation
• Computes an invariant at each program point.
• Operations are usually complicated and expensive.
Program, with the invariant computed at each point in brackets:
    if (*) { a := 0; b := i; }               [a=0, b=i]
    else   { a := i-2; b := 2; }             [a=i-2, b=2]
    after the join                           [a+b=i]
    if (*) { c := b - a; d := i - 2b; }      [a+b=i, c=b-a, d=i-2b]
    else   { c := 2a + b; d := b - 2i; }     [a+b=i, c=2a+b, d=b-2i]
    after the join                           [a+b=i, c=-d]
    assert(c+d = 0); assert(c = a+i)
Example 1: Random Interpretation
• Choose random values for input variables.
• Execute both branches of a conditional.
• Combine values of variables at join points.
• Test the assertion.
    if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert(c+d = 0); assert(c = a+i)
Outline • Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Linear relationships in programs with linear assignments
• Linear relationships (e.g., x=2y+5) are useful for
    • Program correctness (e.g., buffer overflows)
    • Compiler optimizations (e.g., constant and copy propagation, CSE, induction variable elimination)
• Restricting attention to “programs with linear assignments” does not make the analysis inapplicable to “real” programs
    • Other program statements are abstracted as non-deterministic assignments (standard practice in program analysis)
Basic idea in random interpretation
Generic algorithm:
• Choose random values for input variables.
• Execute both branches of a conditional.
• Combine the values of variables at join points.
• Test the assertion.
Idea #1: The Affine Join operation
• Affine join of v1 and v2 w.r.t. weight w: $\phi_w(v_1, v_2) \equiv w \cdot v_1 + (1-w) \cdot v_2$
• Affine join preserves common linear relationships (a+b=5).
• It does not introduce false relationships w.h.p.
• Unfortunately, non-linear relationships are not preserved (e.g. $a \times (1+b) = 8$).
Example: joining the states (a = 2, b = 3) and (a = 4, b = 1)
• w = 7: $a = \phi_7(2,4) = -10$, $b = \phi_7(3,1) = 15$
• w = 5: $a = \phi_5(2,4) = -6$, $b = \phi_5(3,1) = 11$
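To make "w.h.p." concrete, the sketch below tries every weight for the join of (a = 2, b = 3) and (a = 4, b = 1). It works modulo a small prime P = 101 so that exhaustive counting is cheap. Both the modular arithmetic and the value of P are assumptions of this sketch, as are the names affine_join and count_weights.

```python
P = 101  # small prime, chosen only so that all weights can be enumerated (assumption)


def affine_join(w, v1, v2, p=P):
    """Affine join w*v1 + (1-w)*v2, computed modulo p."""
    return (w * v1 + (1 - w) * v2) % p


def count_weights(pred, p=P):
    """Count the weights w in Z_p for which the joined state satisfies pred(a, b)."""
    return sum(
        1
        for w in range(p)
        if pred(affine_join(w, 2, 4, p), affine_join(w, 3, 1, p))
    )


# a + b = 5 holds in both input states, so every weight preserves it.
preserved = count_weights(lambda a, b: (a + b) % P == 5)

# b = 2 holds in neither input state; only a single weight satisfies it by accident.
accidental = count_weights(lambda a, b: b == 2)

# The non-linear fact a * (1 + b) = 8 holds in both inputs but survives
# only for the trivial weights w = 0 and w = 1.
nonlinear = count_weights(lambda a, b: (a * (1 + b)) % P == 8)
```

With a larger prime, the fraction of weights that introduce a false linear relationship shrinks accordingly.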
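Geometric Interpretation of Affine Join
• The state after the join satisfies all the affine relationships that are satisfied by both states before the join (e.g. a + b = 5).
• Given any relationship that is not satisfied by at least one of the states before the join (e.g. b = 2), the state after the join also does not satisfy it with high probability.
[Figure: plot with axes a and b showing the line a + b = 5, the line b = 2, the states before the join (a = 2, b = 3) and (a = 4, b = 1), and the state after the join on the line a + b = 5.]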
Example 1
• Choose a random weight for each join independently.
• All choices of random weights verify the first assertion.
• Almost all choices contradict the second assertion.
Run with i = 3, w1 = 5, w2 = 2 (state after each step in brackets):
    if (*) { a := 0; b := i; }               [i=3, a=0, b=3]
    else   { a := i-2; b := 2; }             [i=3, a=1, b=2]
    join with w1 = 5                         [i=3, a=-4, b=7]
    if (*) { c := b - a; d := i - 2b; }      [i=3, a=-4, b=7, c=11, d=-11]
    else   { c := 2a + b; d := b - 2i; }     [i=3, a=-4, b=7, c=-1, d=1]
    join with w2 = 2                         [i=3, a=-4, b=7, c=23, d=-23]
    assert (c+d = 0); assert (c = a+i)
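The run above can be replayed with a few lines of code. This is a sketch over plain integers rather than a full random interpreter. The names affine_join, interpret_example1 and assertions are illustrative, and the weights are passed in explicitly so the slide's numbers can be reproduced.

```python
def affine_join(w, v1, v2):
    """Affine join of two states (dicts of variable values) with weight w."""
    return {x: w * v1[x] + (1 - w) * v2[x] for x in v1}


def interpret_example1(i, w1, w2):
    """Execute both branches of each conditional and join the results."""
    t = {"i": i, "a": 0, "b": i}        # first conditional, one branch
    f = {"i": i, "a": i - 2, "b": 2}    # first conditional, other branch
    s = affine_join(w1, t, f)
    t = dict(s, c=s["b"] - s["a"], d=s["i"] - 2 * s["b"])
    f = dict(s, c=2 * s["a"] + s["b"], d=s["b"] - 2 * s["i"])
    return affine_join(w2, t, f)


def assertions(state):
    """Return (c + d == 0, c == a + i) for the final state."""
    return (state["c"] + state["d"] == 0,
            state["c"] == state["a"] + state["i"])
```

With i = 3, w1 = 5 and w2 = 2 this yields the final state shown on the slide. For i = 3 the second assertion is contradicted for every weight pair except the degenerate choices w1 = 1 or w2 = 0, where the join simply copies one branch.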
Example 2
    a := x + y
    if (x = y) { b := a } else { b := 2x }
    assert (b = 2x)
We need to make use of the conditional x=y on the true branch to prove the assertion.
Idea #2: The Adjust Operation
• Execute multiple runs of the program in parallel.
• Sample S = collection of states at a program point.
• Adjust(S, e=0) is the sample obtained by linear combination of states in S such that
    • The equality conditional is satisfied.
    • Note that original relationships are preserved.
• Use Adjust(S, e=0) on the true branch of the conditional e=0.
Geometric Interpretation of Adjust
• Program states = points
• Adjust = projection onto the hyperplane
• Adjust operation loses one point.
Algorithm to obtain S’ = Adjust(S, e=0)
[Figure: sample points S1, S2, S3, S4 and the adjusted points S’1, S’2, S’3 lying on the hyperplane e = 0.]
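One way to read the geometric picture: pick a pivot state and, for every other state, take the point where the line through it and the pivot meets the hyperplane e = 0. The sketch below follows that reading. It is an assumption about the construction, not necessarily the exact algorithm of the talk. The names evaluate and adjust, the (coefficients, constant) encoding of e, and the use of exact rationals are all illustrative.

```python
from fractions import Fraction


def evaluate(e, state):
    """Evaluate the affine expression e = (coeffs, const) in a state."""
    coeffs, const = e
    return sum(c * state[v] for v, c in coeffs.items()) + const


def adjust(sample, e):
    """Sketch of Adjust(S, e=0): returns len(sample) - 1 states with e = 0.

    Assumes some state has a non-zero e-value that differs from the e-value
    of every other state; otherwise next() raises StopIteration.
    """
    vals = [evaluate(e, s) for s in sample]
    pivot = next(
        j for j, vj in enumerate(vals)
        if vj != 0 and all(vj != vi for i, vi in enumerate(vals) if i != j)
    )
    sp, vp = sample[pivot], vals[pivot]
    adjusted = []
    for i, (s, v) in enumerate(zip(sample, vals)):
        if i == pivot:
            continue  # the pivot itself is the one point that is lost
        w = Fraction(vp, vp - v)  # solves w*v + (1-w)*vp = 0
        adjusted.append({x: w * s[x] + (1 - w) * sp[x] for x in s})
    return adjusted


# Three runs of Example 2 just after a := x + y.
sample = [
    {"x": 1, "y": 4, "a": 5},
    {"x": 3, "y": 2, "a": 5},
    {"x": 7, "y": 0, "a": 7},
]
# Adjust for the conditional x = y, i.e. e is x - y.
on_true_branch = adjust(sample, ({"x": 1, "y": -1}, 0))
```

Each adjusted state is an affine combination of two original states, so a = x + y still holds, and x = y now holds as well. Hence b := a gives b = 2x on the true branch.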
Correctness of Random Interpreter R
• Completeness: If $e_1 = e_2$, then $R \Rightarrow e_1 = e_2$
    • assuming non-deterministic conditionals
• Soundness: If $e_1 \neq e_2$, then $R \not\Rightarrow e_1 = e_2$
    • error probability $\leq$ [bound missing from the transcript]
    • b, j: number of branches and joins
    • d: size of the set from which random values are chosen
    • k: number of points in the sample
• If j = b = 10, k = 15, $d \approx 2^{32}$, then error $\leq$ [value missing from the transcript]
Proof Methodology
Proving correctness was the most complicated part of this work. We used the following methodology.
• Design an appropriate deterministic algorithm (need not be efficient).
• Prove (by induction) that the randomized algorithm simulates each step of the deterministic algorithm with high probability.
Outline • Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Problem: Global value numbering
Original program:
    a := 5; x := a*b; y := 5*b; z := b*a;
• x=y and x=z
• Reasoning about multiplication is undecidable
Abstraction:
    a := 5; x := F(a,b); y := F(5,b); z := F(b,a);
• only x=y
• Reasoning is decidable but tricky in presence of joins
• Axiom: If x1=y1 and x2=y2, then F(x1,x2)=F(y1,y2)
• Goal: Detect expression equivalence when program operators are abstracted using “uninterpreted functions”
• Application: Compiler optimizations, translation validation
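A small sketch of what the abstraction does to the straight-line program above. F is modelled as a constructor that just builds the term, so two values are equal exactly when their terms are identical. The function names F and run are illustrative.

```python
def F(p, q):
    """Uninterpreted function: the result is the term itself."""
    return ("F", p, q)


def run(op, b):
    """Execute the example program with op standing in for the operator."""
    a = 5
    x = op(a, b)
    y = op(5, b)
    z = op(b, a)
    return x, y, z


# Under the abstraction only x = y is detected.
x, y, z = run(F, "b")
abstract_facts = (x == y, x == z)

# With real multiplication (and a sample value for b) x = z holds as well.
x, y, z = run(lambda p, q: p * q, 3)
concrete_facts = (x == y, x == z)
```

The equality x = z is lost because it depends on commutativity of multiplication, which the single congruence axiom for F does not provide.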
Example
    if (*) { x := a; y := a; z := F(a); } else { x := b; y := b; z := F(b); }
    assert(x = y); assert(z = F(y));
After the join: $x = \phi(a,b)$, $y = \phi(a,b)$, $z = \phi(F(a),F(b))$, $F(y) = F(\phi(a,b))$
• Typical algorithms treat $\phi$ as uninterpreted
• Hence they cannot verify the second assertion
• The randomized algorithm interprets $\phi$ as the affine join operation $\phi_w$
How to “execute” uninterpreted functions?
Expression language: e := y | F(e1,e2)
• Choose a random interpretation for F
• Non-linear interpretation
    • E.g. $F(e_1, e_2) = r_1 e_1^2 + r_2 e_2^2$
    • Preserves all equivalences in straight-line code
    • But not across join points
• Let’s try linear interpretation
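A sketch of why a non-linear interpretation breaks at join points while a linear one does not. It uses a unary F to match the earlier example with assert(z = F(y)). The unary forms, the constants 3 and 4, the prime modulus P and the names join and holds_after_join are assumptions made for illustration.

```python
P = 2**31 - 1  # a prime modulus; the specific value is an assumption


def join(w, v1, v2):
    """Affine join w*v1 + (1-w)*v2, computed modulo P."""
    return (w * v1 + (1 - w) * v2) % P


def holds_after_join(f, a, b, w):
    """Does z = F(y) hold after joining the two branches of the example?"""
    y = join(w, a, b)        # y := a on one branch, y := b on the other
    z = join(w, f(a), f(b))  # z := F(a) on one branch, z := F(b) on the other
    return z == f(y)


def nonlinear(x):
    """Unary analogue of r1*e1^2 + r2*e2^2, with r = 3."""
    return (3 * x * x) % P


def linear(x):
    """Unary linear interpretation r*x + s, with r = 3 and s = 4."""
    return (3 * x + 4) % P
```

With a = 5, b = 2 and w = 7 the non-linear interpretation fails to verify the assertion, while the linear one verifies it, because an affine join commutes with any affine function.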
Random Linear Interpretation
• Encode $F(e_1, e_2) = r_1 e_1 + r_2 e_2$
• Preserves all equivalences across a join point
• Introduces false equivalences in straight-line code. E.g. e and e’ have the same encodings even though $e \neq e'$
Expressions: e = F(F(a,b), F(c,d)) and e’ = F(F(a,c), F(b,d))
Encodings:
    $e = r_1(r_1 a + r_2 b) + r_2(r_1 c + r_2 d) = r_1^2 a + r_1 r_2 b + r_2 r_1 c + r_2^2 d$
    $e' = r_1^2 a + r_1 r_2 c + r_2 r_1 b + r_2^2 d$
• Problem: Scalar multiplication is commutative.
• Solution: Choose r1 and r2 to be random matrices and evaluate expressions to vectors
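The false equivalence and its repair can be checked directly. In the sketch below the scalar encoding maps e and e’ to the same value, while a matrix encoding separates them. The 2x2 matrices are fixed, non-commuting examples rather than random ones so that the outcome is reproducible. They, the modulus P, and the names encode, scalar_f, matvec and matrix_f are assumptions of this sketch.

```python
P = 2**31 - 1  # a prime modulus; the specific value is an assumption


def encode(expr, env, apply_f):
    """Evaluate a variable name or a term ("F", e1, e2) under an interpretation of F."""
    if isinstance(expr, str):
        return env[expr]
    _, e1, e2 = expr
    return apply_f(encode(e1, env, apply_f), encode(e2, env, apply_f))


def scalar_f(r1, r2):
    """F(x, y) = r1*x + r2*y on scalars."""
    return lambda x, y: (r1 * x + r2 * y) % P


def matvec(m, v):
    """Matrix-vector product modulo P."""
    return tuple(sum(m[i][j] * v[j] for j in range(len(v))) % P
                 for i in range(len(m)))


def matrix_f(m1, m2):
    """F(x, y) = m1*x + m2*y on vectors."""
    def f(x, y):
        u, v = matvec(m1, x), matvec(m2, y)
        return tuple((p + q) % P for p, q in zip(u, v))
    return f


E1 = ("F", ("F", "a", "b"), ("F", "c", "d"))
E2 = ("F", ("F", "a", "c"), ("F", "b", "d"))

scalars = {"a": 1, "b": 3, "c": 5, "d": 7}
vectors = {"a": (1, 2), "b": (3, 4), "c": (5, 6), "d": (7, 8)}

M1 = ((1, 1), (0, 1))  # illustrative matrices with M1*M2 != M2*M1
M2 = ((1, 0), (1, 1))
```

The scalar encodings of E1 and E2 agree for any r1 and r2, since r1*r2 = r2*r1. With the matrices above the encodings differ because the two products M1*M2 and M2*M1 are not equal.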
Outline • Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Example
    if (*) { a := 0; b := i; } else { a := i-2; b := 2; }
    if (*) { c := b - a; d := i - 2b; } else { c := 2a + b; d := b - 2i; }
    assert (c + d = 0); assert (c = a + i)
• The second assertion is true in the context i=2.
• Interprocedural analysis requires computing procedure summaries.
Idea #1: Keep input variables symbolic
• Do not choose random values for input variables (so that the summary can later be instantiated in any context).
• The resulting program state at the end is a random procedure summary.
Run with w1 = 5, w2 = 2 (state after each step in brackets):
    if (*) { a := 0; b := i; }               [a=0, b=i]
    else   { a := i-2; b := 2; }             [a=i-2, b=2]
    join with w1 = 5                         [a=8-4i, b=5i-8]
    if (*) { c := b - a; d := i - 2b; }      [a=8-4i, b=5i-8, c=9i-16, d=16-9i]
    else   { c := 2a + b; d := b - 2i; }     [a=8-4i, b=5i-8, c=8-3i, d=3i-8]
    join with w2 = 2                         [a=8-4i, b=5i-8, c=21i-40, d=40-21i]
    assert (c+d = 0); assert (c = a+i)
Instantiating the summary with i=2 gives a=0, b=2, c=2, d=-2.
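The symbolic run can be reproduced by representing every value as an affine expression k*i + c in the input i. The class Lin and the functions join and summary are illustrative names for this sketch. The weights are passed in explicitly so the slide's numbers can be checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lin:
    """Affine expression k*i + c in the symbolic input i."""
    k: int
    c: int

    def __add__(self, other):
        return Lin(self.k + other.k, self.c + other.c)

    def __sub__(self, other):
        return Lin(self.k - other.k, self.c - other.c)

    def __rmul__(self, n):
        return Lin(n * self.k, n * self.c)

    def at(self, i):
        """Instantiate the expression for a concrete input i."""
        return self.k * i + self.c


def join(w, u, v):
    """Affine join of two symbolic values with integer weight w."""
    return w * u + (1 - w) * v


def summary(w1, w2):
    """Random procedure summary of the example for join weights w1 and w2."""
    i = Lin(1, 0)
    a = join(w1, Lin(0, 0), i - Lin(0, 2))
    b = join(w1, i, Lin(0, 2))
    c = join(w2, b - a, 2 * a + b)
    d = join(w2, i - 2 * b, b - 2 * i)
    return {"a": a, "b": b, "c": c, "d": d}
```

For w1 = 5 and w2 = 2 the summary has c = 21i - 40 and d = 40 - 21i, so c + d = 0 in every context, while c = a + i reduces to 24i - 48 = 0 and therefore holds only for i = 2.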
Experiments: Randomized vs. Deterministic
• The randomized algorithm discovers 10-70% more facts.
• The randomized algorithm is slower by a factor of 2.
Experimental measure of error
The percentage of incorrect relationships decreases as either of the following increases:
• S = size of the set from which random values are chosen.
• N = number of random summaries used.
[Figure or table indexed by S and N.]
The experimental results are better than what is predicted by theory.
Outline • Random Interpretation • Linear arithmetic (POPL 2003) • Uninterpreted functions (POPL 2004) • Inter-procedural analysis (POPL 2005) • Other applications
Other applications of random interpretation
• Model Checking
    • Randomized equivalence testing algorithm for FCEDs, which represent conditional linear expressions and are a generalization of BDDs. (SAS 04)
• Theorem Proving
    • Randomized decision procedure for linear arithmetic and uninterpreted functions. It runs an order of magnitude faster than the deterministic algorithm. (CADE 03)
• Ideas for deterministic algorithms
    • PTIME algorithm for global value numbering, thereby solving a 30-year-old open problem. (SAS 04)
Summary
• Lessons Learned
    • Randomization buys efficiency and simplicity at the cost of probabilistic soundness.
    • Randomization suggests ideas for deterministic algorithms.
    • Combining randomized and symbolic techniques is powerful.