
Bias-Variance Tradeoffs in Program Analysis



Presentation Transcript


  1. Rahul Sharma (Stanford), Aditya V. Nori (MSR India), Alex Aiken (Stanford). Bias-Variance Tradeoffs in Program Analysis

  2. Observation. int i = 1, j = 0; while (i <= 5) { j = j + i; i = i + 1; } • Invariant inference • Intervals • Octagons • Polyhedra • Increasing precision. D. Monniaux and J. L. Guen. Stratified static analysis based on variable dependencies. Electr. Notes Theor. Comput. Sci., 2012.
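
The precision ladder above can be made concrete with a small interval analysis of this loop. The sketch below is illustrative only, not code from the paper; the helpers join, widen, and transfer are named here just for exposition, and the analysis computes the usual interval loop-head invariant (i >= 1, j >= 0) once widening kicks in.

    # Minimal interval analysis of: int i = 1, j = 0; while (i <= 5) { j = j + i; i = i + 1; }
    # An interval is a pair (lo, hi); None denotes the empty (unreachable) state.
    INF = float("inf")

    def join(a, b):
        # Least upper bound of two intervals.
        if a is None: return b
        if b is None: return a
        return (min(a[0], b[0]), max(a[1], b[1]))

    def widen(old, new):
        # Classic interval widening: jump to +/- infinity on bounds that keep moving.
        if old is None: return new
        lo = old[0] if new[0] >= old[0] else -INF
        hi = old[1] if new[1] <= old[1] else INF
        return (lo, hi)

    def transfer(i, j):
        # One loop iteration under the guard i <= 5: j = j + i; i = i + 1.
        gi = (i[0], min(i[1], 5))
        if gi[0] > gi[1]:
            return None, None  # guard unsatisfiable
        return (gi[0] + 1, gi[1] + 1), (j[0] + gi[0], j[1] + gi[1])

    i, j = (1, 1), (0, 0)  # abstract state before the loop
    while True:            # iterate to a fixed point; widening guarantees termination
        ni, nj = transfer(i, j)
        wi, wj = widen(i, join(i, ni)), widen(j, join(j, nj))
        if (wi, wj) == (i, j):
            break
        i, j = wi, wj
    print("loop-head invariant:", "i in", i, ", j in", j)

Octagons additionally track bounds on sums and differences such as j - i, and polyhedra track arbitrary linear inequalities over i and j; each step admits strictly more candidate invariants, which is the sense in which precision increases down the list.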

  3. Another Example: Yogi A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. ICSE (1) 2010

  4. The Problem • Increased precision is causing worse results • Programs have unbounded behaviors • Program analysis • Analyze all behaviors • Run for a finite time • In finite time, observe only finite behaviors • Need to generalize

  5. Generalization • Generalization is ubiquitous • Abstract interpretation: widening • CEGAR: interpolants • Parameter tuning of tools • Lot of folk knowledge, heuristics, …

  6. Machine Learning • “It’s all about generalization” • Learn a function from observations • Hope that the function generalizes • Work on formalization of generalization

  7. Our Contributions • Model the generalization process • Probably Approximately Correct (PAC) model • Explain known observations by this model • Use this model to obtain better tools http://politicalcalculations.blogspot.com/2010/02/how-science-is-supposed-to-work.html

  8. Why Machine Learning? Interpolants as classifiers. (figure: + and - samples separated by an interpolant) Rahul Sharma, Aditya V. Nori, Alex Aiken: Interpolants as Classifiers. CAV 2012.
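
As a toy illustration of the "interpolants as classifiers" view (my sketch, not the construction from the CAV 2012 paper): positive points stand for reachable states, negative points for states that can reach an error, and a linear separator learned by a perceptron plays the role of a candidate interpolant. The sample points and the learner are invented for this example.

    # A halfspace separating (+) from (-) samples, in the role of a candidate interpolant.
    def perceptron(samples, epochs=100):
        # samples: list of ((x, y), label) with label +1 or -1.
        w, b = [0.0, 0.0], 0.0
        for _ in range(epochs):
            updated = False
            for (x, y), label in samples:
                if label * (w[0] * x + w[1] * y + b) <= 0:   # misclassified: update
                    w[0] += label * x
                    w[1] += label * y
                    b += label
                    updated = True
            if not updated:
                break   # all samples separated
        return w, b

    samples = [((1, 1), +1), ((2, 1), +1), ((1, 2), +1),    # reachable states
               ((4, 4), -1), ((5, 3), -1), ((4, 5), -1)]    # error states
    w, b = perceptron(samples)
    print(f"candidate interpolant: {w[0]:.0f}*x + {w[1]:.0f}*y + {b:.0f} > 0")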

  9. PAC Learning Framework • Assume an arbitrary but fixed distribution D • Given (iid) samples drawn from D • Each sample is an example with a label (+/-) given by the target concept c. (figure: labeled + and - samples)

  10. PAC Learning Framework • Empirical error of a hypothesis: the fraction of the observed samples that the hypothesis labels incorrectly. (figure: labeled + and - samples)

  11. PAC Learning Framework • Empirical risk minimization (ERM) • Given a set of possible hypotheses (the precision) • Select the hypothesis that minimizes empirical error. (figure: target concept c and labeled + and - samples)
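
A minimal sketch of ERM under these definitions (an illustration, not the paper's setup): the hypothesis class is one-dimensional intervals [a, b], empirical error is the fraction of mislabeled samples, and ERM simply enumerates intervals with endpoints at the samples and keeps the best one.

    # ERM over the hypothesis class of 1-D intervals [a, b]:
    # a sample x is classified + if a <= x <= b, and - otherwise.
    def empirical_error(a, b, samples):
        # Fraction of labeled samples that the interval hypothesis gets wrong.
        wrong = sum(1 for x, label in samples if (a <= x <= b) != (label == +1))
        return wrong / len(samples)

    def erm(samples):
        # Enumerate candidate intervals with endpoints at sample points; keep the best.
        points = sorted(x for x, _ in samples)
        best = min(((a, b) for a in points for b in points if a <= b),
                   key=lambda ab: empirical_error(ab[0], ab[1], samples))
        return best, empirical_error(best[0], best[1], samples)

    samples = [(1, +1), (2, +1), (3, +1), (2.5, -1), (5, -1), (7, -1)]
    (a, b), err = erm(samples)
    print(f"ERM hypothesis: [{a}, {b}], empirical error {err:.2f}")

With a richer (more precise) hypothesis class, say unions of several intervals, the empirical error can always be driven lower; whether that helps on new samples is exactly the question the following slides address.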

  12. PAC Learning Framework • Generalization error: the probability of mislabeling a new sample drawn from the distribution • Relate generalization error to empirical error and precision

  13. Precision • Capture precision by VC dimension (VC-d) • Higher precision -> more possible hypotheses • VC-d: the size of the largest set of samples for which the hypothesis class H can realize any arbitrary labeling. (figure: + and - points separated by hypotheses from H)

  14. VC-d Example (figure: a set of points labeled + and - in every possible way, each labeling realized by the hypothesis class)
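
A small check of shattering for the simplest class of all (my example, not from the slides): one-dimensional intervals realize every labeling of any 2 points, but not the labeling (+, -, +) of 3 points, so their VC dimension is 2. Richer classes shatter larger sets.

    from itertools import product

    def interval_realizes(points, labeling):
        # Is there an interval [a, b] containing exactly the points labeled '+'?
        positives = [x for x, lab in zip(points, labeling) if lab == +1]
        negatives = [x for x, lab in zip(points, labeling) if lab == -1]
        if not positives:
            return True   # pick an interval containing none of the points
        a, b = min(positives), max(positives)
        return all(not (a <= x <= b) for x in negatives)

    def shattered_by_intervals(points):
        # Shattered = every +/- labeling of the points is realized by some interval.
        return all(interval_realizes(points, lab)
                   for lab in product([+1, -1], repeat=len(points)))

    print(shattered_by_intervals([1, 5]))     # True: any labeling of two points works
    print(shattered_by_intervals([1, 3, 5]))  # False: the labeling (+, -, +) fails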

  15. Regression Example (figure: fits to data points, Y against X) • Precision is low: underfitting • Precision is high: overfitting • In between: good fit
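
The regression picture can be reproduced with a few lines of illustrative polynomial fitting: the polynomial degree plays the role of precision, a low degree underfits, and a degree high enough to interpolate the noisy samples typically overfits, with near-zero training error but worse error on held-out points.

    # Under- and overfitting with polynomial regression; degree ~ precision.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 10)
    y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(10)   # noisy training data

    x_new = np.linspace(0, 1, 200)
    y_new = np.sin(2 * np.pi * x_new)                           # the underlying function

    for degree in (1, 3, 9):
        coeffs = np.polyfit(x, y, degree)                       # ERM at this precision
        train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
        test_mse = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
    # Typically: degree 1 underfits (both errors high), degree 9 interpolates the
    # noise (train MSE near zero, test MSE worse than degree 3), degree 3 fits well.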

  16. Main Result of PAC Framework • Generalization error is bounded by the sum of • Bias: empirical error of the best available hypothesis • Variance: O(VC-d) (figure: as the set of possible hypotheses grows with increasing precision, bias falls, variance rises, and generalization error is their sum)
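
For reference, the textbook form of this bound (the paper may use different constants) is: with probability at least 1 - delta over n iid samples, every hypothesis h satisfies generalization_error(h) <= empirical_error(h) + sqrt((d * (ln(2n/d) + 1) + ln(4/delta)) / n), where d is the VC dimension. The snippet below just evaluates the second ("variance") term to show how it grows with VC-d.

    import math

    def vc_variance_term(n, vc_dim, delta=0.05):
        # The O(VC-d) term of the standard VC generalization bound.
        return math.sqrt((vc_dim * (math.log(2 * n / vc_dim) + 1)
                          + math.log(4 / delta)) / n)

    for d in (2, 10, 50):   # more precise domains -> larger VC dimension
        print(f"VC-d = {d:2d}, n = 1000: variance term = {vc_variance_term(1000, d):.3f}")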

  17. Example Revisited • Invariant inference • Intervals • Octagons • Polyhedra int i = 1, j = 0; while (i <= 5) { j = j + i; i = i + 1; }

  18. Intuition • What goes wrong with excess precision? • Fit polyhedra to program behaviors • Transfer functions, join, widening • Too many polyhedra, make a wrong choice int i = 1, j = 0; while (i <= 5) { j = j + i; i = i + 1; } Intervals: Polyhedra:

  19. Abstract Interpretation J. Henry, D. Monniaux, and M. Moy. Pagai: A path sensitive static analyser. Electr. Notes Theor. Comput. Sci. 2012.

  20. Yogi A. V. Nori and S. K. Rajamani. An empirical study of optimizations in YOGI. ICSE (1) 2010

  21. Case Study • Parameter tuning of program analyses • Overfitting? Generalization on new tasks? (figure: training on the benchmark set of 2490 verification tasks yields tuned parameters, e.g. test length = 500, …) P. Godefroid, A. V. Nori, S. K. Rajamani, and S. Tetali. Compositional may-must program analysis: unleashing the power of alternation. POPL 2010.

  22. Cross Validation • How to set the test length in Yogi (figure: the benchmark set of 2490 verification tasks is split into a training set of 1743 tasks and a test set of 747 tasks)
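
The protocol sketched on the slide looks roughly like the following. This is illustrative code only: run_yogi is a hypothetical stand-in for running the tool and returning its total cost on a set of tasks, and the cost model at the bottom is made up; only the 1743/747 split comes from the slide.

    import random

    def tune_test_length(tasks, candidates, run_yogi):
        # Split the benchmarks as on the slide, tune on the training set,
        # and report how the chosen value performs on held-out tasks.
        random.seed(0)
        random.shuffle(tasks)
        train, test = tasks[:1743], tasks[1743:]
        best = min(candidates, key=lambda tl: run_yogi(train, tl))
        return best, run_yogi(test, best)

    # Usage with a made-up cost model in place of the real analysis:
    tasks = list(range(2490))
    fake_cost = lambda ts, tl: len(ts) * (1 + abs(tl - 400) / 1000)
    best, held_out = tune_test_length(tasks, [100, 350, 500, 800], fake_cost)
    print("chosen test length:", best, "held-out cost:", held_out)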

  23. Cross Validation on Yogi • Performance on the test set of the tuned parameter values (figure: test-set performance for tuned test lengths 350 and 500)

  24. Comparison • On 2106 new verification tasks • 40% performance improvement! • Yogi in production suffers from overfitting

  25. Recommendations • Keep separate training and test sets • Design of the tools governed by training set • Test set as a check • SVCOMP: all benchmarks are public • Test tools on some new benchmarks too

  26. Increase Precision Incrementally • Suggests incrementally increasing precision • Find a sweet spot where generalization error is low R. Jhala and K. L. McMillan. A practical and complete approach to predicate refinement. TACAS 2006.
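
One way to read this recommendation as code (my sketch; analyze is a hypothetical stand-in that runs the analysis at a given precision level and returns its error on held-out tasks):

    def pick_precision(domains, validation_tasks, analyze):
        # domains are ordered from least to most precise, e.g.
        # ["intervals", "octagons", "polyhedra"].
        best_domain, best_error = None, float("inf")
        for domain in domains:
            error = analyze(validation_tasks, domain)   # held-out (generalization) error
            if error < best_error:
                best_domain, best_error = domain, error
            else:
                break   # error started rising: past the sweet spot, stop here
        return best_domain

    # Usage with a made-up error profile (bias falls, then variance dominates):
    fake_error = {"intervals": 0.30, "octagons": 0.18, "polyhedra": 0.25}
    print(pick_precision(["intervals", "octagons", "polyhedra"], [],
                         lambda tasks, d: fake_error[d]))   # prints "octagons"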

  27. More in the paper • VC-d of TCMs (template constraint matrix domains): intervals, octagons, etc. • Templates: • Arrays, separation logic • More expressive abstract domains -> higher VC-d • VC-d can help choose abstractions

  28. Inapplicability • No generalization -> no bias-variance tradeoff • Certain classes of type inference • Abstract interpretation without widening • Loop-free and recursion-free programs • Verify a particular program (e.g., seL4) • Overfit on the one important program

  29. Conclusion • A model to understand generalization • Bias-Variance tradeoffs • These tradeoffs do occur in program analysis • Understand these tradeoffs for better tools
