A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 each. Claire Le Goues. Michael Dewey-Vogt. Stephanie Forrest. Westley Weimer. “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005.
A Systematic Study of Automated Program Repair: Fixing 55 out of 105 bugs for $8 each • Claire Le Goues • Michael Dewey-Vogt • Stephanie Forrest • Westley Weimer http://genprog.cs.virginia.edu
Problem: Buggy Software • “Everyday, almost 300 bugs appear […] far too many for only the Mozilla programmers to handle.” – Mozilla Developer, 2005 • Annual cost of software errors in the US: $59.5 billion (0.6% of GDP). • Average time to fix a security-critical error: 28 days. • 90%: Maintenance • 10%: Everything Else
How bad is it?
…Really? • Tarsnap: • 125 spelling/style • 63 harmless • 11 minor • 1 major • 75/200 = 38% TP rate • $17 + 40 hours per TP
Solution: Pay Strangers
Solution: Automate
Automated Program Repair • GenProg: automatic [1], scalable, competitive bug repair. • [1] C. Le Goues, T. Nguyen, S. Forrest, and W. Weimer, “GenProg: A generic method for automated software repair,” IEEE Transactions on Software Engineering, vol. 38, no. 1, pp. 54–72, 2012; W. Weimer, T. Nguyen, C. Le Goues, and S. Forrest, “Automatically finding patches using genetic programming,” in International Conference on Software Engineering, 2009, pp. 364–367.
[Figure: GenProg search loop with stages INPUT, MUTATE, EVALUATE FITNESS, DISCARD, ACCEPT, OUTPUT.]
Bird’s Eye View • Search: random (GP) search through nearby patches. • Approach: compose small random edits. • Where to change? • How to change it?
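The search described above can be sketched as a small generate-and-validate loop. This is a toy illustration under stated assumptions, not GenProg's implementation: programs are plain lists of statement strings, and `repair_search`, `delete_one`, and `toy_fitness` are hypothetical names invented for this example.

```python
import random

def repair_search(program, mutate, fitness, max_fitness,
                  pop_size=8, generations=50, seed=0):
    """Hypothetical sketch: mutate candidates, keep the fittest, stop on a full pass."""
    rng = random.Random(seed)
    population = [list(program) for _ in range(pop_size)]
    for _ in range(generations):
        # MUTATE: derive one edited variant from every current candidate.
        mutants = [mutate(candidate, rng) for candidate in population]
        # EVALUATE FITNESS: rank parents and mutants together (stable sort).
        ranked = sorted(population + mutants, key=fitness, reverse=True)
        if fitness(ranked[0]) == max_fitness:
            return ranked[0]            # OUTPUT: passes every test
        # ACCEPT the best pop_size candidates, DISCARD the rest.
        population = ranked[:pop_size]
    return None

def delete_one(program, rng):
    """Toy mutation (an assumption): delete one randomly chosen statement."""
    if not program:
        return list(program)
    i = rng.randrange(len(program))
    return program[:i] + program[i + 1:]

def toy_fitness(program):
    """Toy test suite: one test for required behavior, one for the bug."""
    keeps_behavior = all(s in program for s in ("a", "b", "c"))
    bug_gone = "BUG" not in program
    return int(keeps_behavior) + int(bug_gone)

repaired = repair_search(["a", "b", "BUG", "c"], delete_one, toy_fitness,
                         max_fitness=2)
```

Keeping the parents in the ranking is a design choice of this sketch: a harmful mutation can never displace the unmodified program, so the toy search only terminates on a variant that passes both tests.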
Input: [Figure: program drawn as a graph of statements 1–12. Legend: high change probability; low change probability; not changed.]
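One way to read the legend is as a per-statement weight that biases where the next edit lands. The sketch below assumes numeric weights of 1.0, 0.1, and 0.0 for the three classes; the slide gives no numbers, so these values and the name `pick_statement` are illustrative assumptions only.

```python
import random

# Assumed weights: the legend only distinguishes high / low / not changed.
HIGH, LOW, NONE = 1.0, 0.1, 0.0

def pick_statement(weights, rng):
    """Return a statement index sampled in proportion to its change weight."""
    indices = list(range(len(weights)))
    return rng.choices(indices, weights=weights, k=1)[0]

# Six statements: two likely targets, one unlikely target, three never touched.
weights = [NONE, NONE, HIGH, LOW, NONE, HIGH]
rng = random.Random(0)
picks = [pick_statement(weights, rng) for _ in range(1000)]
```

Statements with weight 0.0 are never selected, which shrinks the set of places the search has to consider.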
An edit is: • Replace statement X with statement Y • Insert statement X after statement Y • Delete statement X [Figure: the 12-statement program graph; the build steps show a copy of statement 4 added and 4’ appearing in place of statement 8.]
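The three edit types can be sketched on a program modeled as a list of statements. This is a simplification for illustration: statements are strings, positions are 0-based list indices rather than the slide's node numbers, and the function names are hypothetical.

```python
def replace(program, x, y):
    """Replace the statement at index x with a copy of the statement at index y."""
    out = list(program)
    out[x] = program[y]
    return out

def insert_after(program, x, y):
    """Insert a copy of the statement at index x directly after index y."""
    return program[:y + 1] + [program[x]] + program[y + 1:]

def delete(program, x):
    """Remove the statement at index x."""
    return program[:x] + program[x + 1:]

program = ["s1", "s2", "s3", "s4", "s5"]
replaced = replace(program, 2, 4)       # s3 becomes a copy of s5
inserted = insert_after(program, 3, 0)  # copy of s4 placed after s1
deleted = delete(program, 2)            # s3 removed
```

Every edit reuses statements already present in the program, and none of the functions modifies its input, so edits can be composed freely.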
Automated Program Repair • GenProg: automatic, scalable, competitive bug repair.
Scalable: Search Space • [Figure: program drawn as a graph of statements 1–12.] • Fix localization: intelligently choose code to move.
Scalable: Representation • Input: [Figure: program with statements 1–5.] • Naïve: [Figure: complete modified copies of the program.] • New: a sequence of edits, e.g. Delete(3), Replace(3,5). • New fitness, crossover, and mutation operators to work with a variable-length genome.
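A variable-length genome of edits might look like the sketch below. It is an illustration under assumptions, not the authors' code: the program is a dictionary from statement id to text, edits are tuples such as `("delete", 3)` and `("replace", 3, 5)` mirroring the slide's Delete(3) and Replace(3,5), and `apply_patch` and `crossover` are hypothetical names.

```python
import random

def apply_patch(original, patch):
    """Apply a list of edits to {statement_id: text}; the original is untouched."""
    program = dict(original)
    for op, *args in patch:
        if op == "delete":
            program.pop(args[0], None)
        elif op == "replace":
            dst, src = args
            if dst in program:              # skip if dst was deleted earlier
                program[dst] = original[src]
    return [program[k] for k in sorted(program)]

def crossover(p, q, rng):
    """One-point crossover on two edit lists that may differ in length."""
    i = rng.randint(0, len(p))
    j = rng.randint(0, len(q))
    return p[:i] + q[j:], q[:j] + p[i:]

original = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}
after_delete = apply_patch(original, [("delete", 3)])
after_replace = apply_patch(original, [("replace", 3, 5)])
child1, child2 = crossover([("delete", 3)],
                           [("replace", 3, 5), ("delete", 1)],
                           random.Random(0))
```

A candidate here costs memory proportional to the number of edits rather than the size of the program, which is the practical appeal of storing edits instead of whole program copies.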
Scalable: Parallelism • Fitness: • Subsample test cases. • Evaluate in parallel. • Random runs: • Multiple simultaneous runs on different seeds.
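Both ideas can be sketched with the Python standard library. The code below is a hedged illustration: test cases are modeled as callables returning True or False, a thread pool stands in for whatever parallel infrastructure is actually used, and `sampled_fitness` and `run_seeds` are hypothetical names.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sampled_fitness(candidate, tests, sample_size, rng, workers=4):
    """Fraction of a random subsample of tests that the candidate passes."""
    if not tests:
        return 0.0
    sample = rng.sample(tests, min(sample_size, len(tests)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Evaluate the sampled tests concurrently.
        results = list(pool.map(lambda test: test(candidate), sample))
    return sum(results) / len(sample)

def run_seeds(search, seeds, workers=4):
    """Run independent searches, one per random seed, at the same time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(search, seeds))

# Toy tests over integer "candidates" (an assumption for the demo).
tests = [lambda c: c > 0, lambda c: c < 10, lambda c: c % 2 == 0]
score_even = sampled_fitness(4, tests, 3, random.Random(0))
score_odd = sampled_fitness(5, tests, 3, random.Random(0))
doubled = run_seeds(lambda seed: seed * 2, [1, 2, 3])
```

A subsampled score is only an estimate, so a candidate that looks like a repair on the sample would still need to be checked against the full test suite.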
Automated Program Repair • GenProg: automatic, scalable, competitive bug repair.
Competitive • How many bugs can GenProg fix? • How much does it cost?
Setup • Goal: systematically evaluate GenProg on a general, indicative bug set. • General approach: • Avoid overfitting: hold the algorithm fixed. • Systematically create a generalizable benchmark set. • Try to repair every bug in the benchmark set and establish grounded cost measurements.
Challenge: Indicative Bug Set
Systematic Benchmark Selection • Goal: a large set of important, reproducible bugs in non-trivial programs. • Approach: use historical data to approximate discovery and repair of bugs in the wild.
Systematic Benchmark Selection • Consider top programs from SourceForge, Google Code, Fedora SRPM, etc. • Find pairs of viable versions where test case behavior changes: • Take all tests from the most recent version. • Go back in time through the source control. • Each such pair corresponds to a human-written repair for the bug tested by the failing test case(s).
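The selection procedure above can be sketched as a backward walk over a version history. Everything here is assumed for illustration: `versions` is a list ordered newest first, `run_tests(v)` is a hypothetical callback that runs the newest version's test suite against version `v` and returns `{test_name: passed}` (or `None` if `v` is not viable, e.g. does not build), and comparing consecutive viable versions is a simplification.

```python
def find_bug_fix_pairs(versions, run_tests):
    """Return (buggy, fixed, tests) where tests fail in buggy but pass in fixed."""
    pairs = []
    newer, newer_results = None, None
    for version in versions:                 # go back in time
        results = run_tests(version)
        if results is None:
            continue                         # skip non-viable versions
        if newer is not None:
            changed = sorted(
                test for test, passed in newer_results.items()
                if passed and not results.get(test, False)
            )
            if changed:
                # The newer version is a human-written repair for these tests.
                pairs.append((version, newer, changed))
        newer, newer_results = version, results
    return pairs

# Fake history for the demo: r3 does not build.
history = {
    "r4": {"t1": True, "t2": True},
    "r3": None,
    "r2": {"t1": True, "t2": False},
    "r1": {"t1": False, "t2": False},
}
pairs = find_bug_fix_pairs(["r4", "r3", "r2", "r1"], history.get)
```

In this toy history the walk yields two bug/fix pairs: `r2` to `r4` for test `t2`, and `r1` to `r2` for test `t1`.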
Benchmarks
Challenge: Grounded Cost Measurements