Improved Fitness Functions for Automated Program Repair Zachary P. Fry
Improved Fitness Functions • Automatic program repair can fix bugs. • [Diagram labels: Bugs, GenProg, Fixes, Fitness Functions]
Improved Fitness Functions • The current fitness model is imprecise • Ideas: • Not all test cases are created equal • Test cases may not describe all relevant program behavior • Different types of bugs might benefit from different kinds of fixes • We propose to address the naivety of the current fitness representation.
Fitness Distance Correlation • “Quantifying the extent to which a GA fitness function approaches an ideal of heuristic search” [1] • Informally, does a given fitness function produce values that correlate with some grounded notion of “closeness to a fix”?
[1] T. Jones and S. Forrest. Fitness distance correlation as a measure of problem difficulty for genetic algorithms. In International Conference on Genetic Algorithms, pages 184–192, 1995.
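One way to make the informal question concrete is to compute a plain Pearson correlation between each mutant's fitness value and its closeness to a known fix. The sketch below is only an illustration under that assumption; the function name and the choice to return 0.0 for a constant sample are hypothetical, and the cited paper phrases the measure in terms of distance to the optimum, which flips the sign.

```python
import math


def fitness_closeness_correlation(fitness, closeness):
    """Pearson correlation between fitness values and closeness-to-fix values.

    Assumption: both sequences are aligned, one entry per mutant.
    """
    n = len(fitness)
    if n != len(closeness) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_f = sum(fitness) / n
    mean_c = sum(closeness) / n
    cov = sum((f - mean_f) * (c - mean_c) for f, c in zip(fitness, closeness))
    var_f = sum((f - mean_f) ** 2 for f in fitness)
    var_c = sum((c - mean_c) ** 2 for c in closeness)
    if var_f == 0 or var_c == 0:
        # Assumption: a constant sample carries no signal, so report 0.0.
        return 0.0
    return cov / math.sqrt(var_f * var_c)
```

A value near 1 would mean the fitness function ranks mutants in the same order as their closeness to a fix; a value near 0 would mean it gives the search little guidance.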
Improved Fitness Functions • Measuring proximity to a fix • Edits insert (i), delete (d), or swap (s) lines in the program • Fix: d(135) i(251,205) i(774,111) s(598,324) • M1: i(251,205) i(774,111) s(598,324) d(63), matching the fix on ✓ ✓ ✓ ✗ = 75% • M2: d(84) s(844,265) i(774,111) i(735,431), matching the fix on ✗ ✗ ✓ ✗ = 25% • [Diagram axis: FIX to NO FIX]
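The percentages on this slide are consistent with counting how many of a mutant's edits also appear in the known fix. A minimal sketch of that reading follows; the tuple encoding of edits and the function name are assumptions made for illustration, not GenProg's own representation.

```python
def closeness_to_fix(mutant_edits, fix_edits):
    """Fraction of a mutant's edits that also appear in the known fix."""
    if not mutant_edits:
        return 0.0
    fix = set(fix_edits)
    shared = sum(1 for edit in mutant_edits if edit in fix)
    return shared / len(mutant_edits)


# Edits from the slide, encoded as (operation, line, ...) tuples.
FIX = [("d", 135), ("i", 251, 205), ("i", 774, 111), ("s", 598, 324)]
M1 = [("i", 251, 205), ("i", 774, 111), ("s", 598, 324), ("d", 63)]
M2 = [("d", 84), ("s", 844, 265), ("i", 774, 111), ("i", 735, 431)]

print(closeness_to_fix(M1, FIX))  # 0.75
print(closeness_to_fix(M2, FIX))  # 0.25
```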
Improved Fitness Functions • The current model of fitness correlates only weakly with proximity to a fix (correlation of 0.145). • Hypothesis: By taking into account previously unused information about test cases, bugs, and fixes, we can better inform the evolutionary bug-fixing process so that it fixes bugs faster and more often.
Improved Fitness Functions • Approach: weight test cases based on known fixes • [Figure: mutants M1 to M4 placed between FIX and NO FIX under Test Case 1 and Test Case 2; weights 0.2 and 0.8 shown]
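A weighted fitness of this kind could be sketched as the weighted fraction of test cases a mutant passes. The function below is a hypothetical illustration, not GenProg's fitness code, and pairing 0.2 with Test Case 1 and 0.8 with Test Case 2 is an assumption read off the order of the labels in the figure.

```python
def weighted_fitness(results, weights):
    """Weighted fraction of passed tests.

    results: dict mapping test name -> True if the mutant passes it.
    weights: dict mapping test name -> non-negative weight.
    """
    total = sum(weights[test] for test in results)
    if total == 0:
        return 0.0
    passed = sum(weights[test] for test, ok in results.items() if ok)
    return passed / total


# Assumed pairing of the weights shown in the figure.
WEIGHTS = {"test_case_1": 0.2, "test_case_2": 0.8}

only_first = {"test_case_1": True, "test_case_2": False}
only_second = {"test_case_1": False, "test_case_2": True}
```

Under an unweighted scheme both mutants would score 0.5; with these weights the mutant passing only the second test case scores four times higher than the one passing only the first.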
Improved Fitness Functions • Evaluation: • How much can we speed up fixes? Measured in computational time and monetary cost (preliminary results) • How many more bugs can we fix? Measured as the fraction of previously unfixed bugs (future work)
Preliminary Results • For a sample of 15 bugs from one program, 31.3% of test cases show no correlation with actual fitness (closeness to a fix)
Preliminary Results • Some test cases are over 23x more correlated with actual fitness than others • Suggests an adequate weighting scheme using machine learning could fix more bugs, faster • This work and additional efforts to investigate other strategies for improving fitness functions are ongoing.
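As a rough illustration of how per-test correlations might feed a weighting scheme, the sketch below simply normalizes them into weights. This is a deliberately simple stand-in, not the machine-learning approach the slide has in mind; the function name, the clipping of negative correlations to zero, and the uniform fallback are all assumptions.

```python
def weights_from_correlations(correlations):
    """Turn per-test correlations with actual fitness into normalized weights.

    correlations: dict mapping test name -> correlation with closeness to a fix.
    Assumption: negative correlations are clipped to zero.
    """
    clipped = {test: max(r, 0.0) for test, r in correlations.items()}
    total = sum(clipped.values())
    if total == 0:
        # Assumption: with no usable signal, fall back to uniform weights.
        return {test: 1.0 / len(clipped) for test in clipped}
    return {test: r / total for test, r in clipped.items()}
```

A test case showing no correlation with actual fitness would receive weight 0 under this scheme, while more strongly correlated test cases would dominate the fitness signal.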
Applicability • When might this work? • Programs with expensive test suites, e.g. PHP (12,000+) • When there is heavy overlap between test cases • Test suites/cases that fail to specify the bug • Assumptions? • Presence of historical bug fix data to mine • Test suites do not evolve drastically from bug to bug • Bugs for a given program are related on some level
Goals • By providing GenProg a better signal for mutants’ fitness we hope to: • Better direct the search – arrive at fixes faster, lowering cost (up to 38%) • In the limit, find more fixes for previously unfixed bugs
Questions?