Adaptive Random Test Case Prioritization

Adaptive Random Test Case Prioritization • Speaker: Bo Jiang* • Co-authors: Zhenyu Zhang*, W. K. Chan†, T. H. Tse* • *The University of Hong Kong, †City University of Hong Kong

Contents • Background • Motivation • Adaptive Random Test Case Prioritization • Experiments and Results Analysis • Related Work • Conclusion & Future Work

Regression Testing Techniques • Figure: techniques that derive a test suite T’ for a modified program P’ from the test suite T of the original program P • Obsolete Test Case Elimination • Test Case Reduction • Test Case Augmentation • Test Case Selection • Test Case Prioritization • Regression testing accounts for 50% of the cost of software maintenance.

Test Case Prioritization • Definition • Test case prioritization reorders the test cases of a test suite T for execution to meet a chosen testing goal. • Typical testing goals • Rate of code coverage • Rate of fault detection • Rate of requirement coverage • Merits • No impact on the fault detection ability of the suite, because test cases are only reordered, never discarded

Coverage-based Test Case Prioritization Techniques • Total-statement/function/branch • Highest code coverage first • Resolve ties randomly • Additional-statement/function/branch • Highest additional (not yet covered) code coverage first • Reset the covered set when no more coverage can be achieved • Resolve ties randomly • Disadvantages • Hard to scale to larger programs
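A minimal Python sketch of the two greedy strategies, assuming each test's coverage is given as a set of covered entity ids (statements, functions, or branches). The function names and the seeded random tie-breaking are illustrative assumptions, not the authors' implementation.

```python
import random

def total_prioritize(coverage, seed=0):
    """Total strategy: most covered entities first; ties broken randomly."""
    rng = random.Random(seed)
    tiebreak = {t: rng.random() for t in sorted(coverage)}
    return sorted(coverage, key=lambda t: (-len(coverage[t]), tiebreak[t]))

def additional_prioritize(coverage, seed=0):
    """Additional strategy: repeatedly pick the test adding the most uncovered entities."""
    rng = random.Random(seed)
    remaining = sorted(coverage)
    covered = set()
    order = []
    while remaining:
        gain = {t: len(coverage[t] - covered) for t in remaining}
        best_gain = max(gain.values())
        if best_gain == 0:
            if covered:
                covered = set()  # reset: no remaining test adds new coverage
                continue
            # The remaining tests cover nothing at all: append them in random order.
            rng.shuffle(remaining)
            order.extend(remaining)
            break
        best = rng.choice([t for t in remaining if gain[t] == best_gain])
        order.append(best)
        covered |= coverage[best]
        remaining.remove(best)
    return order

cov = {"t1": {1, 2, 3, 4}, "t2": {1, 2, 3}, "t3": {5}, "t4": set()}
print(total_prioritize(cov))       # ['t1', 't2', 't3', 't4']
print(additional_prioritize(cov))  # ['t1', 't3', 't2', 't4']
```

In this toy input, Total places t2 second because it covers the most on its own, while Additional prefers t3 because t2 adds nothing beyond t1. The repeated recomputation of gains is also why the Additional strategy costs more time than Total.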


Problem With Total (Greedy) Techniques • Figure: APFD on GREP and FLEX (Elbaum et al. @ TSE 2002) • The total strategy may NOT be effective for real-life programs.

Problems with Additional Techniques • Figure: time used for prioritization by the Random, Total, and Additional techniques on the Siemens and UNIX programs • Additional techniques may NOT be efficient for real-life programs. • Can we find a prioritization technique that is both effective and efficient for real-life programs?

Adaptive Random Testing (ART) • Adaptive Random Testing (ART) • A technique for test case generation • Spreads randomly generated test cases evenly across the input domain. • In empirical studies, ART detected failures using up to 50% fewer test cases than random testing.

Fixed-Sized-Candidate-Set ART Algorithm • Randomly generate a test case and execute it. • Randomly generate a set of candidate test cases. • For each candidate test case, find its nearest neighbor among the executed test cases. • Select the candidate with the longest distance to its nearest neighbor and execute it. • Repeat from the candidate-generation step until a failure is encountered.
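The loop above can be sketched in Python for a two-dimensional numeric input domain (the unit square) with Euclidean distance. The candidate set size k = 10, the max_tests cap, and the square failure region are hypothetical choices made for illustration.

```python
import math
import random

def fscs_art(is_failure, k=10, max_tests=2000, seed=0):
    """Run FSCS-ART on the unit square until a failure is hit.

    Returns the executed test cases in order; the last one is failing
    unless max_tests was reached first.
    """
    rng = random.Random(seed)

    def random_point():
        return (rng.random(), rng.random())

    executed = [random_point()]  # step 1: one purely random test
    if is_failure(executed[0]):
        return executed
    while len(executed) < max_tests:
        candidates = [random_point() for _ in range(k)]  # step 2: k random candidates
        # Steps 3-4: keep the candidate whose nearest executed test is farthest away.
        best = max(candidates, key=lambda c: min(math.dist(c, e) for e in executed))
        executed.append(best)
        if is_failure(best):  # step 5: stop at the first failure
            break
    return executed

# Hypothetical failure region: a square covering 1% of the input domain.
def in_failure_region(p):
    return 0.40 <= p[0] <= 0.50 and 0.60 <= p[1] <= 0.70

print(len(fscs_art(in_failure_region)))  # tests used to reveal the first failure
```

Running the same loop with k = 1 degenerates to plain random testing, which gives a simple baseline for comparing the number of tests needed.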

Adaptive Random Testing (ART) • ART is based on the observation that failure-causing inputs tend to cluster in the input domain. • Intuitively, spreading test cases evenly may expose the first fault sooner. • In test case prioritization, we also want to increase the rate of fault detection.


MDS Display of Distribution of Failures in Profile Space on LilyPond • Failures tend to cluster together. • William Dickinson et al. @ FSE, 2001.

MDS Display of Distribution of Failures in Profile Space on GCC • Failures tend to cluster together. • William Dickinson et al. @ FSE, 2001.

Use ART directly for test case prioritization? • The variety of black-box input formats makes it hard to define a uniform distance metric. • Video streams • Images • XML • … • The white-box coverage information of previously executed test cases is readily available. • Statement coverage • Branch coverage • Function coverage • … • Why NOT use such low-cost white-box information to spread test cases evenly across the code coverage space?

Adaptive Random Test Case Prioritization • Generate a candidate set • Randomly select a not-yet-prioritized test case into the candidate set • If the code coverage of the candidate set improves, continue; otherwise, stop. • Merit: no magic number (the candidate set size is non-parametric) • Select the candidate farthest from the prioritized set • Distance between two test cases • Distance between a candidate test case and the set of already prioritized test cases • Repeat until all test cases are prioritized
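A minimal Python sketch of this loop, assuming coverage is given as a set of covered entity ids per test, Jaccard distance between tests, and maxmin selection (both listed on the following slide). Further assumptions, not confirmed by the slides: the draw that fails to add coverage is discarded, the candidate set is rebuilt in every round, and the very first test is picked at random from the candidate set. All function names are hypothetical.

```python
import random

def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| over sets of covered entities."""
    union = a | b
    if not union:
        return 0.0  # assumption: two tests covering nothing are identical
    return 1.0 - len(a & b) / len(union)

def build_candidate_set(remaining, coverage, rng):
    """Randomly draw unprioritized tests while each draw grows the set's coverage."""
    pool = sorted(remaining)  # sort first so a seeded run is reproducible
    rng.shuffle(pool)
    candidates, covered = [], set()
    for t in pool:
        grown = covered | coverage[t]
        if candidates and grown == covered:
            break  # no coverage gain: stop; the first draw is always kept
        candidates.append(t)
        covered = grown
    return candidates

def art_prioritize(coverage, seed=0):
    """Return test ids in ART order (Jaccard distance, maxmin selection)."""
    rng = random.Random(seed)
    remaining = set(coverage)
    prioritized = []
    while remaining:
        candidates = build_candidate_set(remaining, coverage, rng)
        if not prioritized:
            chosen = rng.choice(candidates)  # nothing to measure distance against yet
        else:
            # maxmin: the candidate whose nearest prioritized test is farthest away
            chosen = max(candidates, key=lambda c: min(
                jaccard_distance(coverage[c], coverage[p]) for p in prioritized))
        prioritized.append(chosen)
        remaining.remove(chosen)
    return prioritized

cov = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5, 6}, "t4": {1, 2}}
print(art_prioritize(cov))  # some permutation of the four test ids
```

Because the candidate set grows only while coverage grows, its size adapts to the suite rather than being a fixed k, which is what "no magic number" refers to.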

Adaptive Random Test Case Prioritization • How to measure the distance between two test cases • Jaccard distance • A general distance metric for binary data • Other distance metrics can be substituted. • How to select the candidate test case that is farthest away from the already prioritized test cases? • Maximize the minimum distance (maxmin for short) • Chen et al. @ ASIAN '04, LNCS 2004 • Maximize the average distance (maxavg for short) • Ciupa et al. @ ICSE 2008 • Maximize the maximum distance (maxmax for short)


Research Questions • Do different levels of coverage information have a significant impact on ART techniques? • Do different definitions of test set distance have a significant impact on ART techniques? • Are ART techniques efficient?

Subject Programs

Techniques Studied in the Paper

Experiment Setup • Dynamic coverage information collection • gcov tool • Effectiveness metric • APFD: weighted average of the percentage of faults detected over the life of the suite • Process • For each of the 11 subject programs, randomly select 20 test suites, and repeat each ART technique 50 times.
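For a suite of n tests revealing m faults, APFD = 1 - (TF_1 + ... + TF_m) / (n·m) + 1/(2n), where TF_i is the 1-based position of the first test that reveals fault i. A minimal Python sketch, assuming a hypothetical `detects` map from test id to the faults it reveals, and counting only faults revealed by some test in the order:

```python
def apfd(order, detects):
    """APFD for a prioritized order.

    order: test ids in execution order (n tests).
    detects: test id -> set of fault ids that test reveals.
    Only faults revealed by at least one test in `order` are counted (m faults).
    """
    n = len(order)
    first_pos = {}  # fault id -> 1-based position of the first test revealing it
    for pos, t in enumerate(order, start=1):
        for f in detects.get(t, ()):
            first_pos.setdefault(f, pos)
    m = len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one detected fault")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)

detects = {"t1": {"f1"}, "t3": {"f2"}}
print(apfd(["t1", "t2", "t3", "t4"], detects))  # 0.625
print(apfd(["t4", "t3", "t2", "t1"], detects))  # 0.375
```

The same suite scores higher when fault-revealing tests run earlier, which is exactly what a prioritization technique tries to achieve.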


Do different levels of coverage information have a significant impact on ART techniques? • Fix the other variable: the definition of test set distance. • Perform multiple comparisons between each pair of coverage levels and gather the statistics. • Consistent with previous research: Branch > Statement > Function

Research Questions • Do different levels of coverage information have a significant impact on ART techniques? • Branch > Statement > Function • Do different definitions of test set distance have a significant impact on ART techniques? • Are ART techniques efficient?

The Impact of Test Set Distance • Fix the other variable: the level of coverage information. • Perform multiple comparisons between each pair of test set distances and gather the statistics. • Max-Min > Max-Avg ≈ Max-Max

Best ART Technique • ART-br-maxmin (branch coverage with the maxmin test set distance) is the best ART prioritization technique.

Research Questions • Do different levels of coverage information have a significant impact on ART techniques? • Branch > Statement > Function • Do different definitions of test set distance have a significant impact on ART techniques? • Max-Min > Max-Avg ≈ Max-Max • How does ART-br-maxmin compare with the greedy techniques? • Are ART techniques efficient?

Multiple Comparisons for ART-br-maxmin on Siemens • The difference between ART-br-maxmin and the traditional coverage-based techniques is only marginal and is not statistically significant.

Multiple Comparisons for ART-br-maxmin on UNIX
