1 / 44

On Abstraction Refinement for Program Analyses in Datalog

On Abstraction Refinement for Program Analyses in Datalog. Xin Zhang , Ravi Mangal , Mayur Naik Georgia Tech. Radu Grigore , Hongseok Yang Oxford University. Datalog for program a nalysis. Datalog. What is Datalog ?. Datalog. What is Datalog ?. Input relations:

Download Presentation

On Abstraction Refinement for Program Analyses in Datalog

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On Abstraction Refinement for Program Analyses in Datalog Xin Zhang, Ravi Mangal, MayurNaik Georgia Tech RaduGrigore, Hongseok Yang Oxford University

  2. Datalog for program analysis Datalog Programming Language Design and Implementation, 2014

  3. What is Datalog? Datalog Programming Language Design and Implementation, 2014

  4. What is Datalog? Input relations: Output relations: Rules: edge(i, j). path(i, j). (1) path(i, i). (2) path(i, k) :- path(i, j), edge(j, k). Least fixpoint computation: Input: edge(0, 1), edge(1, 2). path(0, 0). path(1, 1). path(2, 2). path(0, 1) :- path(0, 0), edge(0, 1). path(0, 2) :- path(0, 1), edge(1, 2). Datalog Programming Language Design and Implementation, 2014

  5. Why Datalog? If there exists a path from a to b, and there is an edge from b to c, then there exists a path from a to c: path(a, c) :- path(a, b), edge(b, c). Datalog Programming Language Design and Implementation, 2014

  6. Why Datalog? k-object-sensitivity, k = 2, ~100KLOC Programming Language Design and Implementation, 2014

  7. Limitation k-object-sensitivity, k = 10, ~500KLOC k-object-sensitivity, k = 2, ~100KLOC Programming Language Design and Implementation, 2014

  8. Program abstraction Abstraction Precision Scalability Programming Language Design and Implementation, 2014

  9. Parametric program abstraction Abstraction Precision Scalability Programming Language Design and Implementation, 2014

  10. Parametric program abstraction Abstraction Precision Scalability Programming Language Design and Implementation, 2014

  11. Parametric program abstraction: Example 1 Cloning depth K for each call site and allocation site Pointer Analysis Programming Language Design and Implementation, 2014

  12. Parametric program abstraction: Example 2 Predicates to use as abstraction predicates Shape Analysis Programming Language Design and Implementation, 2014

  13. Program abstraction Datalog Program Datalog Program alias(p, q)? alias(m, n)? Programming Language Design and Implementation, 2014

  14. Program abstraction Counterexample guided refinement (CEGAR) via MAXSAT Datalog Program Datalog Program alias(p, q)? alias(m, n)? Programming Language Design and Implementation, 2014

  15. Pointer analysis example f(){ v1= new ...; v2= id1(v1); v3= id2(v2); q2:assert(v3!= v1); } id1(v){return v;} g(){ v4= new ...; v5= id1(v4); v6= id2(v5); q1:assert(v6!= v1); } id2(v){return v;} Programming Language Design and Implementation, 2014

  16. Pointer analysis as graph reachability 0 3 a0 b0 b1 a1 6’ 6 6’’ a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 Programming Language Design and Implementation, 2014

  17. Graph reachability in Datalog 0 3 a0 b0 b1 a1 6’ 6 6’’ Input relations: edge(i, j, n), abs(n) Output relations: path(i, j) Rules: (1) path(i, i). (2) path(i, j) :- path(i, k), edge(k, j, n), abs(n). a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 Input tuples: edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), … abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). 16 possible abstractions in total Programming Language Design and Implementation, 2014

  18. Desired result 0 3 Input relations: edge(i, j, n), abs(n) Output relations: path(i, j) Rules: (1) path(i, i). (2) path(i, j) :- path(i, k), edge(k, j, n), abs(n). a0 b0 b1 a1 6’ 6 6’’ a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 Input tuples: edge(0, 6, a0), edge(0, 6’, a1), edge(3, 6, b0), … abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). Programming Language Design and Implementation, 2014

  19. Iteration 1 0 3 path(0, 0). path(0, 6) :- path(0, 0), edge(0, 6, a0), abs(a0). path(0, 1) :- path(0, 6), edge(6, 1, a0), abs(a0). path(0, 7) :- path(0, 1), edge(1, 7, c0), abs(c0). path(0, 2) :- path(0, 7), edge(7, 2, c0), abs(c0). path(0, 4) :- path(0, 6), edge(6, 4, b0), abs(b0). path(0, 7) :- path(0, 4), edge(4, 7, d0), abs(d0). path(0, 5) :- path(0, 7), edge(7, 5, d0), abs(d0). … a0 b0 b1 a1 6’ 6 6’’ a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). Programming Language Design and Implementation, 2014

  20. Iteration 1 - derivation graph 0 3 a0 b0 b1 a1 6’ 6 6’’ a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). Programming Language Design and Implementation, 2014

  21. Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) edge(6,4,b0) abs(a0) edge(6,1,a0) path(0,6) abs(d0) edge(4,7,d0) edge(1,7,c0) abs(c0) path(0,4) path(0,1) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) Programming Language Design and Implementation, 2014

  22. Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) edge(6,4,b0) abs(a0) edge(6,1,a0) path(0,6) abs(d0) edge(4,7,d0) edge(1,7,c0) abs(c0) path(0,4) path(0,1) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0c0 Programming Language Design and Implementation, 2014

  23. Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) edge(6,4,b0) abs(a0) edge(6,1,a0) path(0,6) abs(d0) edge(4,7,d0) edge(1,7,c0) abs(c0) path(0,4) path(0,1) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0c0 Programming Language Design and Implementation, 2014

  24. Iteration 1 - derivation graph abs(a0) path(0,0) edge(0,6,a0) abs(b0) edge(6,4,b0) abs(a0) edge(6,1,a0) path(0,6) abs(d0) edge(4,7,d0) edge(1,7,c0) abs(c0) path(0,4) path(0,1) abs(c0) edge(7,2,c0) edge(7,5,d0) abs(d0) path(0,7) path(0,2) path(0,5) a0b0d0 Programming Language Design and Implementation, 2014

  25. Iteration 1 - derivation graph 0 3 a0 b0 b1 a1 6’ 6 6’’ a0 b0 b1 a1 1 4 c1 d0 c0 d1 7’ 7 7’’ c0 d1 d0 c1 2 5 abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). Programming Language Design and Implementation, 2014

  26. Encoded as MAXSAT Hard Constraints Soft Constraints MAXSAT( Find that Maximize Subject to Programming Language Design and Implementation, 2014

  27. Encoded as MAXSAT Avoid all the counterexamples Hard constraints: … Minimize the abstraction cost Soft constraints: abs(a0)abs(a1), abs(b0)abs(b1), abs(c0)abs(c1), abs(d0)abs(d1). Programming Language Design and Implementation, 2014

  28. Encoded as MAXSAT Solution: Hard constraints: … Soft constraints: a1c0d0 Programming Language Design and Implementation, 2014

  29. Iteration 2 and beyond Iteration 1 Derivation a1c0d0 Programming Language Design and Implementation, 2014

  30. Iteration 2 and beyond Iteration 2 Derivation a1c0d0 Programming Language Design and Implementation, 2014

  31. Iteration 2 and beyond Iteration 2 Derivation a1c0d0 Programming Language Design and Implementation, 2014

  32. Iteration 2 and beyond Iteration 2 Derivation a1c1d0 Programming Language Design and Implementation, 2014

  33. Iteration 2 and beyond Iteration 3 Derivation q1 is proven. a1c1d0 a1c1d0 Programming Language Design and Implementation, 2014

  34. Iteration 2 and beyond Iteration 3 Derivation q2is impossible to prove. q1 is proven. a1c1d0 a1c1d0 Impossibility Programming Language Design and Implementation, 2014

  35. Mixing counterexamples Iteration 1 Iteration 3 Eliminated Abstractions: a0c0 a1c1 Programming Language Design and Implementation, 2014

  36. Mixing counterexamples Iteration 1 Mixed! Iteration 3 Eliminated Abstractions: a0c0 a0c1 a1c1 Programming Language Design and Implementation, 2014

  37. Experimental setup • Implemented in JChord using off-the-shelf solvers: • Datalog: bddbddb • MAXSAT: MiFuMaX • Applied to two analyses that are challenging to scale: • k-object-sensitivity pointer analysis: • flow-insensitive, weak updates, cloning-based • typestate analysis: • flow-sensitive, strong updates, summary-based • Evaluated on 8 Java programs from DaCapo and Ashes. Programming Language Design and Implementation, 2014

  38. Benchmark characteristics Programming Language Design and Implementation, 2014

  39. Results: pointer analysis 4-object-sensitivity < 50% < 3% of max Programming Language Design and Implementation, 2014

  40. Performance of Datalog: pointer analysis k = 4, 3h28m Baseline k = 3, 590s k = 2, 214s lusearch k = 1, 153s Programming Language Design and Implementation, 2014

  41. Performance of MAXSAT: pointer analysis lusearch Programming Language Design and Implementation, 2014

  42. Statistics of MAXSAT formulae Programming Language Design and Implementation, 2014

  43. Conclusion Datalog MAXSAT Abstraction Programming Language Design and Implementation, 2014

  44. Conclusion Datalog MAXSAT A(x, y):- B(x, z), C(z, y) Soundness Hard Constraints Tradeoffs Scalability vs. Precision Soft Constraints Sound vs. Complete … Programming Language Design and Implementation, 2014

More Related