This talk analyzes the effectiveness of selective mutation and random mutant selection in mutation testing. It also explains mutant subsumption and uses dominator mutants to measure test effectiveness more precisely than the mutation score does.
Analyzing the Validity of Selective Mutation with Dominator Mutants • FSE 2016, Seattle, Washington, USA • Bob Kurtz, Paul Ammann, Jeff Offutt, Marcio E. Delamaro, Mariet Kurtz, and Nida Gökçe • George Mason University, USA; Universidade de São Paulo, Brazil; The MITRE Corporation; Muğla Sıtkı Koçman University, Turkey
Mutation testing in 2016
Mutation testing in 2016 • Original: int max(int i, int j) { if (i > j) { return i; } else { return j; } } • The mutation operators ROR, LVR, and AOI produce mutants such as: int max(int i, int j) { if (i <= j) { return i; } else { return j; } } • int max(int i, int j) { if (i > j) { return j; } else { return j; } } • int max(int i, int j) { if (i > j) { return i; } else { return j++; } }
Mutation testing in 2016 • Original: int max(int i, int j) { if (i > j) { return i; } else { return j; } } • Mutants: int max(int i, int j) { if (i <= j) { return i; } else { return j; } } • int max(int i, int j) { if (i > i) { return i; } else { return j; } } • int max(int i, int j) { if (i >= j) { return i; } else { return j; } } • int max(int i, int j) { if (i > j) { return i; } else { return i; } } • int max(int i, int j) { if (i > j) { return i; } else { return j++; } } • int max(int i, int j) { if (i > j) { return j; } else { return j; } } • int max(int i, int j) { if (i > j) { return --i; } else { return j; } } • int max(int i, int j) { if (i) { return i; } else { return j; } } • A test such as assertEquals(2, max(2,1)) kills a mutant when the mutant's result differs (≠) from the original's.
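To make the killing relation concrete, here is a small Java sketch of the max() example. `max` and the test values come from the slides; the class name `MaxMutants`, the helper `killedBy`, and the mutant labels are hypothetical, and the JUnit assertion is simulated by comparing each mutant's result with the expected value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

class MaxMutants {
    // Original program under test, as shown on the slides.
    static int max(int i, int j) { if (i > j) { return i; } else { return j; } }

    // A few of the slides' mutants written as lambdas (labels are descriptive, not tool output).
    static Map<String, IntBinaryOperator> mutants() {
        Map<String, IntBinaryOperator> m = new LinkedHashMap<>();
        m.put("if (i <= j)", (i, j) -> { if (i <= j) { return i; } else { return j; } });
        m.put("if (i > i)", (i, j) -> { if (i > i) { return i; } else { return j; } });
        m.put("if (i >= j)", (i, j) -> { if (i >= j) { return i; } else { return j; } });
        m.put("then: return --i", (i, j) -> { if (i > j) { return --i; } else { return j; } });
        m.put("else: return j++", (i, j) -> { if (i > j) { return i; } else { return j++; } });
        return m;
    }

    // Simulates assertEquals(expected, mutant(i, j)): the mutant is killed if the assertion would fail.
    static boolean killedBy(IntBinaryOperator mutant, int i, int j, int expected) {
        return mutant.applyAsInt(i, j) != expected;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, IntBinaryOperator> e : mutants().entrySet()) {
            boolean killed = killedBy(e.getValue(), 2, 1, max(2, 1));
            System.out.println(e.getKey() + " -> " + (killed ? "killed" : "live"));
        }
    }
}
```

Running `main` reports the `i <= j`, `i > i`, and `return --i` mutants as killed and the `i >= j` and `return j++` mutants as live; those last two behave identically to the original on every input.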
Mutation testing in 2016 • A 2,000 SLOC program can yield 20,000 mutants of code like int max(int i, int j) { if (i > j) { return i; } else { return j; } }
Mutation testing in 2016 • int max(int i, int j) { if (i > j) { return i; } else { return j; } } = int max(int i, int j) { if (i >= j) { return i; } else { return j; } }
Mutation testing in 2016 • Equivalent mutants! You can’t kill ’em! • How many? 10-20% • Path too long, no end in sight!
What went wrong? • Equivalent mutants • Syntactically different from, but semantically identical to, the original program • Cannot be killed by tests; must be manually evaluated one by one • Requires unrealistic amounts of work! • Example: int max(int i, int j) { if (i > j) { return i; } else { return j; } } = int max(int i, int j) { if (i >= j) { return i; } else { return j; } } = int max(int i, int j) { if (i > j) { return i; } else { return j++; } }
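No tool can decide equivalence in general, but a bounded exhaustive check shows why these two mutants resist killing. The Java sketch below (class and method names are hypothetical) searches a small input range for an input on which a mutant and the original disagree; finding none is only evidence of equivalence, not a proof.

```java
import java.util.function.IntBinaryOperator;

class EquivalenceCheck {
    // Original program from the slides.
    static int max(int i, int j) { if (i > j) { return i; } else { return j; } }

    // Returns the first input {i, j} in [lo, hi] x [lo, hi] where the mutant differs from max,
    // or null if none exists in that range (which suggests, but does not prove, equivalence).
    static int[] findKillingInput(IntBinaryOperator mutant, int lo, int hi) {
        for (int i = lo; i <= hi; i++) {
            for (int j = lo; j <= hi; j++) {
                if (mutant.applyAsInt(i, j) != max(i, j)) {
                    return new int[] {i, j};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        IntBinaryOperator geMutant = (i, j) -> { if (i >= j) { return i; } else { return j; } };
        IntBinaryOperator postIncMutant = (i, j) -> { if (i > j) { return i; } else { return j++; } };
        IntBinaryOperator leMutant = (i, j) -> { if (i <= j) { return i; } else { return j; } };
        System.out.println(findKillingInput(geMutant, -50, 50) == null);      // true
        System.out.println(findKillingInput(postIncMutant, -50, 50) == null); // true: j++ returns the old j
        System.out.println(findKillingInput(leMutant, -50, 50) == null);      // false: killable mutant
    }
}
```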
What went wrong? • Redundant mutants • A mutant is redundant if it is always killed when some other mutant is killed • ≈98% of non-equivalent mutants are redundant • How far along is testing?
Mutant reduction strategies • Selective mutation • Use the “best” operators to produce fewer mutants • Program: int max(int i, int j) { if (i > j) { return i; } else { return j; } } • Operators: CCDL, ORAN, SRSR, …, VLSR, OAAN
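Selective mutation amounts to a filter on the operator that produced each mutant. This is a minimal Java sketch assuming each mutant carries its operator name; the `Mutant` class and `select` method are hypothetical, and a real tool such as Proteum would simply not generate mutants for operators that were not selected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SelectiveMutation {
    // Hypothetical mutant record: an id plus the name of the operator that produced it.
    static final class Mutant {
        final int id;
        final String operator;
        Mutant(int id, String operator) { this.id = id; this.operator = operator; }
    }

    // Keep only mutants produced by the chosen ("best") operators.
    static List<Mutant> select(List<Mutant> all, Set<String> chosenOperators) {
        List<Mutant> kept = new ArrayList<>();
        for (Mutant m : all) {
            if (chosenOperators.contains(m.operator)) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```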
Mutant reduction strategies • Random mutant selection • Typically select ≈5% of all mutants • Program: int max(int i, int j) { if (i > j) { return i; } else { return j; } } • Operators: CCDL, ORAN, SRSR, …, VLSR, OAAN
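Random mutant selection can be sketched as sampling a fixed fraction of mutant ids without replacement. The class name, the seed parameter, and the rounding rule below are assumptions for illustration; with the slides' figures of 20,000 mutants and ≈5%, it returns 1,000 mutants.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomSelection {
    // Picks round(fraction * totalMutants) distinct mutant ids (at least one, if any exist).
    static List<Integer> select(int totalMutants, double fraction, long seed) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < totalMutants; i++) {
            ids.add(i);
        }
        Collections.shuffle(ids, new Random(seed)); // a fixed seed makes a run reproducible
        int k = (int) Math.min(totalMutants, Math.max(1L, Math.round(fraction * totalMutants)));
        return new ArrayList<>(ids.subList(0, k));
    }
}
```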
Research questions • Selective mutation has generally been thought of as the best approach since 1993 • Random mutant selection has been used since 1980 • How effective are these techniques? • Can we improve on them?
Mutant subsumption • Given a set of mutants M, mutant mi subsumes mutant mj (mi → mj) iff: • Some test kills mi • All tests that kill mi also kill mj • [Figure: Venn diagram of all tests and the tests that kill mi, mj, and mk, illustrating mi → mj] [Ammann, et al., ICST 2014]
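The definition translates directly into set operations on kill sets. In this hypothetical Java sketch, each mutant's kill set is a `java.util.BitSet` indexed by test number; `subsumes` returns true exactly when the first kill set is non-empty and contained in the second.

```java
import java.util.BitSet;

class Subsumption {
    // killsI / killsJ: bit t is set when test t kills the mutant.
    // mi subsumes mj iff some test kills mi and every test that kills mi also kills mj.
    static boolean subsumes(BitSet killsI, BitSet killsJ) {
        if (killsI.isEmpty()) {
            return false;                 // no test kills mi
        }
        BitSet onlyI = (BitSet) killsI.clone();
        onlyI.andNot(killsJ);             // tests that kill mi but not mj
        return onlyI.isEmpty();
    }

    // Helper: build a kill set from test indices.
    static BitSet killedBy(int... tests) {
        BitSet b = new BitSet();
        for (int t : tests) {
            b.set(t);
        }
        return b;
    }
}
```

For example, if mi is killed by tests {0, 1} and mj by {0, 1, 2}, then mi → mj holds but mj → mi does not.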
Subsumption graph • Shows the subsumption relationship between mutants • Nodes that are not subsumed by any other mutant are “dominator mutants” • Tests that kill all these dominators will also kill all other mutants • All the other mutants are redundant! • [Figure: subsumption graph over mutants m1 to m8] [Kurtz, et al., Mutation 2014]
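One way to compute a dominator set from a kill matrix, sketched in Java under the same BitSet representation (names are hypothetical): keep every killable mutant that has no other mutant with a strictly smaller kill set, and keep only one mutant per distinct kill set, since mutants with identical kill sets share a node in the graph.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Dominators {
    // kills.get(m): bit t is set when test t kills mutant m; an empty set means no test kills m.
    static List<Integer> dominatorSet(List<BitSet> kills) {
        List<Integer> result = new ArrayList<>();
        Set<BitSet> seenKillSets = new HashSet<>();
        for (int m = 0; m < kills.size(); m++) {
            BitSet km = kills.get(m);
            if (km.isEmpty()) {
                continue;                  // unkilled mutants are excluded from the graph
            }
            boolean strictlySubsumed = false;
            for (BitSet ko : kills) {
                // another killable mutant with a strictly smaller kill set subsumes m, not vice versa
                if (!ko.isEmpty() && !ko.equals(km) && isSubset(ko, km)) {
                    strictlySubsumed = true;
                    break;
                }
            }
            if (!strictlySubsumed && seenKillSets.add(km)) {
                result.add(m);             // one representative per node
            }
        }
        return result;
    }

    static boolean isSubset(BitSet a, BitSet b) {
        BitSet rest = (BitSet) a.clone();
        rest.andNot(b);
        return rest.isEmpty();
    }

    static BitSet bits(int... tests) {
        BitSet b = new BitSet();
        for (int t : tests) {
            b.set(t);
        }
        return b;
    }
}
```

With kill sets {0}, {0, 1}, {2}, {2}, and {} (never killed), the result is mutants 0 and 2; tests 0 and 2 then kill every killable mutant.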
Testing model • Begin • Select some mutant set M from all mutants, using selective mutation, random mutants, or some other technique • Select a minimal test set T, from all tests, that kills M • Find all mutants killed by T • Compute mutation scores and dominator scores • End
Testing model • This is a nondeterministic process: Begin → select some mutant set M from all mutants → select a minimal test set T that kills M → find all mutants killed by T → mutation scores and dominator scores → End
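Finding a truly minimal test set that kills M is a set-cover problem, so this Java sketch (hypothetical names) substitutes the common greedy approximation: repeatedly pick the test that kills the most still-live mutants in M, then collect every mutant that the chosen tests kill. It assumes the same BitSet kill sets, indexed by test.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TestingModel {
    // kills.get(m): bit t is set when test t kills mutant m.
    // Greedy stand-in for "select a minimal test set T that kills M".
    static BitSet selectTests(List<BitSet> kills, Set<Integer> selected) {
        int numTests = 0;
        for (BitSet k : kills) {
            numTests = Math.max(numTests, k.length());
        }
        Set<Integer> live = new TreeSet<>();
        for (int m : selected) {
            if (!kills.get(m).isEmpty()) {
                live.add(m);               // mutants no test kills can never be covered
            }
        }
        BitSet tests = new BitSet();
        while (!live.isEmpty()) {
            int best = -1, bestCount = 0;
            for (int t = 0; t < numTests; t++) {
                int count = 0;
                for (int m : live) {
                    if (kills.get(m).get(t)) count++;
                }
                if (count > bestCount) { best = t; bestCount = count; } // lowest index wins ties
            }
            final int chosen = best;
            tests.set(chosen);
            live.removeIf(m -> kills.get(m).get(chosen));
        }
        return tests;
    }

    // "Find all mutants killed by T": every mutant killed by at least one test in T.
    static Set<Integer> killedBy(List<BitSet> kills, BitSet tests) {
        Set<Integer> killed = new TreeSet<>();
        for (int m = 0; m < kills.size(); m++) {
            if (kills.get(m).intersects(tests)) killed.add(m);
        }
        return killed;
    }
}
```

With kill sets {0, 1}, {1}, {2}, {} and M = {0, 2}, lowest-index tie-breaking picks T = {0, 2}, which kills mutants 0 and 2. Breaking the first tie toward test 1 instead gives T = {1, 2}, which also kills mutant 1, a small illustration of why the process is nondeterministic.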
Siemens suite scores • Mutation and dominator scores using the 5 E-Selective mutation operators from Mothra [Offutt, et al., 1993] [Ammann, et al., ICST 2014]
The Siemens suite? Really?? • [Figure: “Real software” vs. “Siemens suite”]
The Siemens suite? Really! • The Siemens suite is useful here because: • It has a thorough test suite that elicits rich subsumption relationships between mutants • The Proteum mutation tool for C has a very large set of mutation operators • If we can show that selective mutation is not a good solution for even one particular program, then it is not a good solution for programs in general
Mutation vs. Dominator Score • Mutation score for the Siemens replace program (1,000 iterations) [Kurtz, et al., Mutation 2016]
Mutation vs. Dominator Score • Mutation and dominator score for the Siemens replace program [Kurtz, et al., Mutation 2016]
1-4 Operators for print_tokens • Our Goal • We define work as the number of mutants the engineer must evaluate, either by writing a killing test or by showing that the mutant is equivalent
1-4 Operators for print_tokens • 231,525 operator combinations
1-4 Operators for print_tokens • Optimal selective operator combinations for this program • 231,525 operator combinations
1-4 Operators for Siemens suite • Operator combinations are far less optimal across all programs • 489,504 operator combinations • There is no set of up to 4 operators that is good for every program
Selective and random • Traditional mutant reduction approaches are not optimal, which is consistent with prior research showing that selective mutation and random mutant selection perform similarly
More than 4 operators • We’d like to expand to sets of more than 4 mutation operators • A brute-force approach is infeasible • 1-4 operators = 489,504 sets ≈ 24 hours • 1-59 operators ≈ 6×10^17 sets ≈ 3×10^9 years • Approximate test results based on subsumption • Spearman’s rank ρ = 0.966 • Recalculate best operator combinations in the usual way • [Figure: subsumption graph over mutants m1 to m8]
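The counts on these slides follow from binomial arithmetic. The sketch below assumes that the "1-59" range means 59 operators in total, so the number of non-empty combinations is 2^59 - 1 ≈ 5.8×10^17, and it treats "489,504 sets ≈ 24 hours" as a rate of 489,504 sets per day. It also assumes print_tokens has 49 applicable operators, the value for which sets of 1 to 4 operators number exactly 231,525; the slides do not state that figure. Class and method names are hypothetical.

```java
import java.math.BigInteger;

class OperatorCombinations {
    // Exact binomial coefficient C(n, k); each intermediate division is exact.
    static BigInteger choose(int n, int k) {
        BigInteger r = BigInteger.ONE;
        for (int i = 0; i < k; i++) {
            r = r.multiply(BigInteger.valueOf(n - i)).divide(BigInteger.valueOf(i + 1));
        }
        return r;
    }

    // Number of operator sets containing between 1 and maxSize of n operators.
    static BigInteger setsUpTo(int n, int maxSize) {
        BigInteger total = BigInteger.ZERO;
        for (int k = 1; k <= maxSize; k++) {
            total = total.add(choose(n, k));
        }
        return total;
    }

    // All non-empty subsets of n operators: 2^n - 1.
    static BigInteger allNonEmptySets(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    // Assumed rate from the slide: 489,504 sets evaluated per ~24 hours.
    static double yearsFor(BigInteger sets) {
        double setsPerDay = 489_504.0;
        return sets.doubleValue() / setsPerDay / 365.25;
    }

    public static void main(String[] args) {
        System.out.println(setsUpTo(49, 4));        // 231525 (assuming 49 operators)
        System.out.println(allNonEmptySets(59));    // 576460752303423487, about 6x10^17
        System.out.printf("%.2e years%n", yearsFor(allNonEmptySets(59)));
    }
}
```

The extrapolated run time comes out a little over 3×10^9 years, matching the slide's estimate.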
Optimal selective solutions • [Figure legend: best 1-program solutions; best 1-59 operator combinations] • Even the optimal operator combinations are not very good
Is mutation testing dead? I don’t think so, but we need to do something better!
Mutation testing in 2020? • Use machine learning to generate optimized mutants based on program features [Future work] • Use static analysis to determine partial subsumption & tests [Kurtz, et al., Mutation 2015] • Execute tests to refine subsumption and kill more mutants [Kurtz, et al., Mutation 2015] • Remove subsumed mutants and redundant tests [Kurtz, et al., Mutation 2014] • Output a set of tests and a FEW probable-high-value mutants for the engineer to kill, such as int max(int i, int j) { if (i > j) { return i; } else { return i; } } and int max(int i, int j) { if (i > j) { return --i; } else { return j; } }
Conclusions • Mutation score is an imprecise metric inflated by redundant mutants • Researchers should use dominator score instead • Current mutant selection techniques are not optimal • There are no mutation operator combinations that are effective for a range of programs • To optimize dominator score per unit of work, we need to customize mutants to the program under test!
Backup Slides
Future work • Develop a mutant selection approach that is customized for a program • Can’t tell if a mutation is good or bad without program context • Use machine learning • Correlate program features and mutation operators with mutant “goodness” • Select more dominator and near-dominator mutants • Select fewer highly-subsumed and equivalent mutants
Mutant Subsumption Graphs • Given the following score function: [Figure: score function and the resulting subsumption graph, with nodes {m1}, {m3, m6}, {m5}, {m2}, {m7, m9}, {m4}, {m8, m10}] • When we construct the mutant subsumption graph from the score function, we see three root nodes that are not subsumed by any other mutants. One mutant from each of these nodes forms a dominator set: {m1, m3, m5} or {m1, m6, m5} • All other mutants are redundant!
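Because a dominator set takes one mutant from each root node, the possible dominator sets are the Cartesian product of those nodes. A minimal Java sketch (hypothetical names), assuming each node is given as a list of mutant names:

```java
import java.util.ArrayList;
import java.util.List;

class DominatorSets {
    // Each inner list is one root node: mutants with identical kill sets.
    // Every dominator set picks exactly one mutant per node, so we build the Cartesian product.
    static List<List<String>> enumerate(List<List<String>> nodes) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>());
        for (List<String> node : nodes) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> partial : result) {
                for (String mutant : node) {
                    List<String> extended = new ArrayList<>(partial);
                    extended.add(mutant);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }
}
```

For the nodes {m1}, {m3, m6}, and {m5}, it returns [m1, m3, m5] and [m1, m6, m5], the two dominator sets listed on the slide.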
Dominator mutation score • Assume we execute test t1 • [Figure: subsumption graph with nodes {m1}, {m3, m6}, {m5}, {m2}, {m7, m9}, {m4}, {m8, m10}; killed mutants shown in gray] • Mutation score: 7 of 9 killable mutants = 0.78 • Dominator score: 1 of 3 mutants in a dominator set = 0.33
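Both scores are the fraction of some mutant set that the tests killed; they differ only in which set forms the denominator. This Java sketch uses hypothetical integer mutant ids, chosen only so the counts match the slide (7 of 9 killable mutants killed, 1 of 3 dominators killed); it does not reproduce the slide's actual graph.

```java
import java.util.Set;

class Scores {
    // Mutation score: fraction of killable mutants that the tests killed.
    static double mutationScore(Set<Integer> killed, Set<Integer> killable) {
        long hit = killable.stream().filter(killed::contains).count();
        return (double) hit / killable.size();
    }

    // Dominator score: fraction of a dominator set that the tests killed.
    static double dominatorScore(Set<Integer> killed, Set<Integer> dominatorSet) {
        long hit = dominatorSet.stream().filter(killed::contains).count();
        return (double) hit / dominatorSet.size();
    }
}
```

With killable mutants 1 to 9, killed mutants {1, 2, 4, 5, 7, 8, 9}, and dominator set {1, 3, 6}, the scores are 7/9 ≈ 0.78 and 1/3 ≈ 0.33, illustrating how redundant kills inflate the mutation score.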
Redundancy and equivalency • To investigate how accuracy changes as the numbers of redundant and equivalent mutants change, we need a way to measure redundancy and equivalency, preferably independently of each other
Work and Normalized Work • How much effort is required? • We use a simple definition: the number of mutants that a tester must examine • To effectively compare work between different programs with different numbers of mutants, we define equivalent work:
Redundancy and Work
Equivalency and Work
Average / worst-case correlation • Spearman’s rank correlation ρ = 0.966
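Spearman's ρ is the Pearson correlation of the two variables' ranks. The slides report ρ = 0.966 without describing tie handling, so this Java sketch (hypothetical names) assumes the common convention of giving tied values their average rank.

```java
import java.util.Arrays;

class Spearman {
    // 1-based ranks; tied values receive the average of the positions they occupy.
    static double[] ranks(double[] x) {
        int n = x.length;
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        Arrays.sort(idx, (a, b) -> Double.compare(x[a], x[b]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0; // average of positions i+1 .. j+1
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    // Spearman's rho: Pearson correlation of the rank vectors.
    static double rho(double[] x, double[] y) {
        double[] rx = ranks(x), ry = ranks(y);
        double mx = Arrays.stream(rx).average().orElse(0);
        double my = Arrays.stream(ry).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < rx.length; i++) {
            sxy += (rx[i] - mx) * (ry[i] - my);
            sxx += (rx[i] - mx) * (rx[i] - mx);
            syy += (ry[i] - my) * (ry[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

Any strictly increasing relationship gives ρ = 1 regardless of scale, which is why rank correlation suits comparing approximate and exact rankings of operator combinations.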
What’s an optimal solution? • We score non-optimal points using the Hausdorff distance (dH), the distance to the nearest optimal point
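A minimal Java sketch of both distances, assuming each solution is a 2-D point (for example, work against dominator score) and Euclidean distance; the slide specifies neither the axes nor the metric, and the names are hypothetical. `distanceToSet` is the per-point score the slide describes, while `hausdorff` is the full symmetric Hausdorff distance between two point sets, included for comparison.

```java
class Hausdorff {
    // Euclidean distance from point p to the nearest point of the given set.
    static double distanceToSet(double[] p, double[][] set) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] q : set) {
            best = Math.min(best, Math.hypot(p[0] - q[0], p[1] - q[1]));
        }
        return best;
    }

    // Symmetric Hausdorff distance: the largest nearest-point distance in either direction.
    static double hausdorff(double[][] a, double[][] b) {
        double d = 0;
        for (double[] p : a) d = Math.max(d, distanceToSet(p, b));
        for (double[] q : b) d = Math.max(d, distanceToSet(q, a));
        return d;
    }
}
```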
Today’s common techniques