Relaxing the Testcase Assumptions in GenProg
Jonathan Dorn
[Figure: the GenProg repair loop. INPUT → MUTATE → EVALUATE FITNESS → ACCEPT or DISCARD → OUTPUT]
Testcases in GenProg
• Guide the search (EVALUATE FITNESS).
• Validate the repair.
Testcases in GenProg
• Negative testcases indicate which functionality to repair.
• Positive testcases indicate which functionality to retain.
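To make the two roles concrete, here is a minimal hypothetical sketch (my_max and both tests are illustrative, not drawn from GenProg's benchmarks): the positive test encodes behavior to retain, the negative test exposes the defect to repair, and GenProg's fitness is essentially a weighted count of how many such tests a candidate patch passes.

    #include <assert.h>

    /* Hypothetical buggy function: should return the larger argument,
       but always returns a. */
    int my_max(int a, int b) { return a > b ? a : a; }   /* bug: second arm should be b */

    int main(void) {
        assert(my_max(5, 3) == 5);   /* positive test: passes, functionality to retain          */
        assert(my_max(1, 2) == 2);   /* negative test: fails on the buggy code, needs repairing */
        return 0;
    }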
Testcase Assumptions
• Testcases exist.
• They are correct.
• They are comprehensive.
Testcase Difficulties
• Human-written testcases are expensive.
• Development and maintenance costs.
• Legacy code or remote deployments.
• “Complete coverage” is approached but rarely achieved.
How Can We Help?
• Machine-generated testcases!
Agenda
• Automatic Testcase Generation
• Evaluation
• Preliminary Results
Automatic Testcases
• “Competent Programmer” assumption: correct behavior is already encoded in the program.
• Extract and re-encode it as testcases.
The Oracle Comparator Model
[Figure: an Input is run through the program to produce an Output; the Oracle Comparator checks that Output and reports Pass / Fail]
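A minimal sketch of the model, with hypothetical f and oracle functions (neither is from the talk): the oracle supplies the expected output and the comparator reduces the run to a pass/fail verdict.

    #include <stdio.h>

    static int f(int x)      { return x < 10 ? 1 : 2; }   /* hypothetical function under test        */
    static int oracle(int x) { return x < 10 ? 1 : 2; }   /* hypothetical source of expected output  */

    static int comparator(int input) {
        return f(input) == oracle(input);                  /* Pass (1) / Fail (0) */
    }

    int main(void) {
        printf("%s\n", comparator(9) ? "PASS" : "FAIL");
        return 0;
    }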
Structure of a Testcase
• Test Setup
  • Network connections, opened files, etc.
  • Argument values
• Run Function Under Test
• Check Results
  • Return value
  • Side-effects
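As a concrete (and entirely hypothetical) illustration of these three parts, the sketch below tests a made-up append_char function, checking both its return value and its side effect on the buffer.

    #include <assert.h>
    #include <string.h>

    /* hypothetical function under test: append a character, return the new length */
    static int append_char(char *buf, char c) {
        size_t n = strlen(buf);
        buf[n] = c;
        buf[n + 1] = '\0';
        return (int)(n + 1);
    }

    int main(void) {
        char buf[8] = "ab";                /* 1. Test Setup: argument values        */
        int len = append_char(buf, 'c');   /* 2. Run Function Under Test            */
        assert(len == 3);                  /* 3. Check Results: return value        */
        assert(strcmp(buf, "abc") == 0);   /*    ... and side-effects on the buffer */
        return 0;
    }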
Automatic Testcases
• Test Setup (network connections, opened files, argument values) ← generate these first.
• Run Function Under Test.
• Check Results (return value, side-effects).
Test Input Generation
• A long-established research area: DART, CUTE, CREST, Symstra, Austin, Pex, …
Concolic Execution
• Generate initial input.
• Run test; record constraints at branches.
• Negate one constraint.
• Solve for new input.
• Repeat.
Concolic Execution
    int f(int x) {
        if (x < 10)
            return 1;
        else
            return 2;
    }
• Initial input: x = 456123, pred = {}.
• Run: the else branch is taken; record pred = {x ≥ 10}.
• Negate the constraint (giving x < 10) and solve: new input x = 9.
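The loop can be made concrete with a deliberately simplified sketch. Real concolic tools (DART, CUTE, CREST) instrument the program and hand the collected path condition to a constraint solver; the sketch below hard-codes a single integer input and a single branch constraint of the form x < k or x ≥ k, which is just enough to reproduce the example above.

    #include <stdio.h>

    typedef struct { int is_less_than; int k; } constraint;   /* x < k  or  x >= k        */
    static constraint recorded;                                /* last branch constraint   */

    /* function under test, instrumented by hand for this sketch */
    static int f(int x) {
        if (x < 10) { recorded = (constraint){ 1, 10 }; return 1; }
        else        { recorded = (constraint){ 0, 10 }; return 2; }
    }

    /* negate the recorded constraint and pick an input that satisfies the negation */
    static int solve_negation(constraint c) {
        return c.is_less_than ? c.k        /* negation of x < k  is x >= k: pick k     */
                              : c.k - 1;   /* negation of x >= k is x < k:  pick k - 1 */
    }

    int main(void) {
        int x = 456123;                      /* 1. arbitrary initial input            */
        printf("f(%d) = %d\n", x, f(x));     /* 2. run, recording branch constraints  */
        x = solve_negation(recorded);        /* 3, 4. negate and solve: x becomes 9   */
        printf("f(%d) = %d\n", x, f(x));     /* 5. repeat with the new input          */
        return 0;
    }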
Automatic Testcases
• Test Setup (network connections, opened files, argument values) ✓ generated by concolic execution.
• Run Function Under Test.
• Check Results (return value, side-effects) ← generate these next: the oracle.
μtest
• Oracle: check for invariants.
• What are the interesting invariants? Checking that 1 = 1 is not useful.
• Useful invariants are true for all runs of the program but violated for some runs of a not-quite-the-same program.
μtest
• Example: f = min(a, b).
• Identify predicates that are true for all inputs.
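For instance, running min over many inputs might yield candidate predicates like the ones below (an illustrative sketch; the specific invariants and the exhaustive driver are assumptions, not taken from the talk).

    #include <assert.h>

    static int min2(int a, int b) { return a < b ? a : b; }   /* the function under test */

    static void check_candidate_invariants(int a, int b) {
        int f = min2(a, b);
        assert(f <= a);             /* candidate invariant 1 */
        assert(f <= b);             /* candidate invariant 2 */
        assert(f == a || f == b);   /* candidate invariant 3 */
    }

    int main(void) {
        for (int a = -3; a <= 3; a++)
            for (int b = -3; b <= 3; b++)
                check_candidate_invariants(a, b);   /* all three hold on every input tried */
        return 0;
    }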
μtest
• Mutate the function.
• Identify predicates that fail in the mutants.
μtest
• Take the intersection (predicates true on the original and violated by some mutant) as the oracle invariants.
• May miss invariants if no mutant causes them to fail.
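A small sketch of that selection step, assuming the only mutant flips the comparison in min (the mutant and the driver are illustrative): predicates the mutant violates are kept for the oracle, while predicates no mutant violates are dropped, which is exactly how invariants can be missed.

    #include <stdio.h>

    static int min2(int a, int b)        { return a < b ? a : b; }   /* original          */
    static int min2_mutant(int a, int b) { return a > b ? a : b; }   /* mutant: now a max */

    int main(void) {
        int viol_le_a = 0, viol_le_b = 0, viol_eq = 0;
        for (int a = -3; a <= 3; a++) {
            for (int b = -3; b <= 3; b++) {
                int f = min2_mutant(a, b);
                if (!(f <= a)) viol_le_a = 1;
                if (!(f <= b)) viol_le_b = 1;
                if (!(f == a || f == b)) viol_eq = 1;
            }
        }
        /* invariants true on the original AND violated by some mutant become the oracle */
        printf("f <= a violated by mutant: %d\n", viol_le_a);   /* 1 -> kept    */
        printf("f <= b violated by mutant: %d\n", viol_le_b);   /* 1 -> kept    */
        printf("f == a || f == b violated: %d\n", viol_eq);     /* 0 -> dropped */
        return 0;
    }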
Automatic Testcases
• Test Setup: inputs generated by concolic execution.
• Run Function Under Test.
• Check Results: oracle invariants generated by μtest.
Research Questions
• Does augmenting the human-generated test suite enable more fixes?
• Do automatic testcases miss desired functionality?
Evaluation
• Only generated testcases: pretend human test cases do not exist.
• Generated testcases + X% of the original suite: how much is the human burden reduced?
Preliminary Results
[Results tables: * indicates an invalid repair; ** indicates no repair found.]
Testcase Assumptions
• Testcases exist. (concolic execution + μtest)
• They are correct. (competent programmer assumption)
• They are comprehensive. (may need a small number of human tests)
Conclusion
• We can create testcases automatically to augment human-created tests.
• Initial results suggest these tests permit repairs without comprehensive test suites.