
The Influence of Size and Coverage on Test Suite Effectiveness

International Symposium on Software Testing and Analysis, ISSTA 2009, Chicago, USA, July 2009. Outline: Motivation, Related work, Experimental procedure, Data analysis, Case studies, Discussion, Conclusion and research direction.


Presentation Transcript


  1. The Influence of Size and Coverage on Test Suite Effectiveness. International Symposium on Software Testing and Analysis, ISSTA 2009, Chicago, USA, July 2009

  2. Outline • Motivation • Related work • Experimental procedure • Data analysis • Case studies • Discussion • Conclusion and research direction

  3. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: (empty) • Faults detected in program P (test suite effectiveness): 0/3 • Coverage degree (line coverage): 0/20

  4. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: TC-1 • Faults detected in program P (test suite effectiveness): 1/3 • Coverage degree (line coverage): 9/20

  5. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: TC-1, TC-2 • Faults detected in program P (test suite effectiveness): 2/3 • Coverage degree (line coverage): 13/20

  6. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: TC-1, TC-2, TC-3 • Faults detected in program P (test suite effectiveness): 2/3 • Coverage degree (line coverage): 16/20

  7. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: TC-1, TC-2, TC-3, TC-4 • Faults detected in program P (test suite effectiveness): 2/3 • Coverage degree (line coverage): 18/20

  8. Motivation: A Familiar Procedure: Coverage-based Test Adequacy • Test suite: TC-1, TC-2, TC-3, TC-4, TC-5 • Faults detected in program P (test suite effectiveness): 3/3 • Coverage degree (line coverage): 20/20

  9. Motivation: (Known) Variables Involved in Coverage-based Test Adequacy • Test suite: TC-1 to TC-5 • Faults detected in program P (test suite effectiveness): 3/3 • Coverage degree (line coverage): 20/20

  10. Motivation: Coverage Degree: The Influencing Variable • Test suite: TC-1 to TC-5 • Faults detected in program P (test suite effectiveness): 3/3 • Coverage degree (line coverage): 20/20 • Does coverage degree influence test suite effectiveness?

  11. Motivation: Is Coverage Degree the Only Influencing Variable? • Test suite: TC-1 to TC-5 • Faults detected in program P (test suite effectiveness): 3/3 • Coverage degree (line coverage): 20/20 • Does coverage degree influence test suite effectiveness? • Does test suite size influence it as well?

  12. Motivation: The Size of the Test Suite Has Increased from 1 to 5 • Test suite: TC-1 to TC-5 • Faults detected in program P (test suite effectiveness): 3/3 • Coverage degree (line coverage): 20/20 • The purpose of this study: determine whether size, coverage degree, or both influence test suite effectiveness

  13. Motivation: The Research Question • Is the effectiveness of a test suite due to: • Its size? • Its structural coverage degree? • What are the impacts of size and coverage on the effectiveness of a test suite?

  14. Motivation: Visualizations – Relationships Between Pairs of Variables (2D)

  15. Motivation: Visualizations – Relationship Among All Three Variables (3D)

  16. Related work • [Frankl and Weiss, 1993; Frankl and Iakounenko, 1998] • Test suites that achieve higher coverage tend to be more effective (fault detection power) • Effectiveness is constant until high coverage levels are achieved, at which point it increases rapidly • [Andrews et al., 2005, 2006] • Mutants can act similarly to real faults • Study confirmed the linkage between coverage and effectiveness (mutant detection power) • Coverage-based test suites are more effective than random-based test suites of the same size

  17. Related work (cont’d) • [Rothermel et al., 2002] • Test suites reduced while preserving coverage are more effective than those reduced to the same size by randomly eliminating test cases • In all of the above: • The support is indirect, because the increase in effectiveness might be a result of the method of construction

  18. Experimental Procedure: Goal and Approach • Goal: Study relationships among coverage degree, size, and effectiveness • Approach • Generate a set of test suites of various sizes • Compute their coverage degree • Compute their mutant detection power • Apply appropriate statistical analysis to determine the relationship among: • Independent variables “size” and “coverage degree” • Dependent variable “mutant detection rate”

  19. Experimental Procedure: Test suite generation and coverage measurement • For each program • 100 randomly generated test suites of each size from 1 to 50 • Variable “SIZE”, 0 < SIZE < 51 • For each test suite: • Measured the block, decision, C-use, and P-use coverage degrees using ATAC • Variable “CovDeg”, with one instance for each of the four coverage criteria • Conducted four similar analyses, one per criterion

  20. Experimental Procedure: Test suite effectiveness • For each program • Re-used mutants from an earlier study [Siami Namin et al., 2008] • Generated using the Proteum mutant generator • Also, for each test suite • Measured mutant detection rate “AM” • Used as the measure of test suite effectiveness
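The procedure on slides 18 to 20 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' tooling: the function names, the dictionaries `covered_by_test` and `killed_by_test` (which would come from a coverage tool such as ATAC and from running the mutants), and the choice to sample test cases without replacement are all assumptions.

```python
import random

def build_random_suites(pool_size, max_size=50, per_size=100, seed=0):
    """Draw `per_size` random suites for every size 1..max_size.

    Test cases are indices into a pool of `pool_size` tests. Sampling
    without replacement is an assumption; the slides do not say.
    """
    rng = random.Random(seed)
    suites = []
    for size in range(1, max_size + 1):
        for _ in range(per_size):
            suites.append(rng.sample(range(pool_size), size))
    return suites

def coverage_degree(suite, covered_by_test, feasible_elements):
    """Fraction of feasible coverage elements hit by at least one test."""
    hit = set()
    for test in suite:
        hit |= covered_by_test[test]
    return len(hit & feasible_elements) / len(feasible_elements)

def mutant_detection_rate(suite, killed_by_test, num_mutants):
    """Fraction of mutants killed by at least one test (the AM variable)."""
    killed = set()
    for test in suite:
        killed |= killed_by_test[test]
    return len(killed) / num_mutants

# Hypothetical toy data: 3 tests, 4 feasible elements, 4 mutants.
toy_coverage = {0: {1, 2}, 1: {2, 3}, 2: {4}}
toy_kills = {0: {0}, 1: {0, 1}, 2: set()}
cov = coverage_degree([0, 1], toy_coverage, {1, 2, 3, 4})
am = mutant_detection_rate([0, 2], toy_kills, 4)
suites = build_random_suites(pool_size=60)
```

Each suite then contributes one observation (SIZE, CovDeg, AM) to the statistical analysis; repeating `coverage_degree` with a different coverage map per criterion would give the four CovDeg instances.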

  21. Experimental Procedure: Description of subject programs – The Siemens set

  22. Data Analysis: Proportions of feasible coverage for all criteria using ATAC

  23. Data Analysis: Statistical Techniques Applied • Visualizations (shown earlier in this talk) • ANCOVA • Principal component analysis • Correlation of coverage and effectiveness • Regression models

  24. Data Analysis: ANCOVA • Variables (factors) • Continuous dependent variable: Mutant detection rate • Continuous independent variable: Coverage degree • Discrete independent variable: Size • p-values • < 0.001 for the two independent variables (factors) • Both size and coverage degree strongly influence effectiveness • Often an interaction between the two variables

  25. Data Analysis: Principal Component Analysis

  26. Data Analysis: Correlation of Coverage and Effectiveness

  27. Data Analysis: Purposes of Generating Regression Models • To determine whether: • Including COVDEG in the models improves their goodness of fit • Transforming the data affects the goodness of fit

  28. Data Analysis: Regression Models Generated and Examined • AM | log(AM) ~ SIZE • AM | log(AM) ~ log(SIZE) • AM | log(AM) ~ COVDEG • AM | log(AM) ~ log(COVDEG) • AM | log(AM) ~ SIZE + COVDEG • AM | log(AM) ~ log(SIZE) + COVDEG • AM | log(AM) ~ SIZE + log(COVDEG) • AM | log(AM) ~ log(SIZE) + log(COVDEG)

  29. Data Analysis: A Summary Comparison of the Regression Models • AM | log(AM) ~ SIZE + COVDEG: Better than • AM | log(AM) ~ SIZE • AM | log(AM) ~ COVDEG • AM | log(AM) ~ log(SIZE): Better than • AM | log(AM) ~ SIZE • Important indication: • Information about SIZE or COVDEG alone does not yield as good a prediction of effectiveness as information about both SIZE and COVDEG

  30. Data Analysis: The Best Regression Model • AM | log(AM) ~ SIZE • AM | log(AM) ~ log(SIZE) • AM | log(AM) ~ COVDEG • AM | log(AM) ~ log(COVDEG) • AM | log(AM) ~ SIZE + COVDEG • AM | log(AM) ~ log(SIZE) + COVDEG • AM | log(AM) ~ SIZE + log(COVDEG) • AM | log(AM) ~ log(SIZE) + log(COVDEG)

  31. Data Analysis: A summary of the linear models AM = B1.log(SIZE) + B2.CovDeg
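As a rough illustration of how a model of the form AM = B1.log(SIZE) + B2.CovDeg could be fitted, the sketch below solves the two-variable least-squares normal equations directly. It is not the authors' analysis: the function name is hypothetical, the natural logarithm is assumed (the slides do not state the base), and the model is fitted without an intercept because the slide shows none.

```python
import math

def fit_log_size_model(sizes, covdegs, ams):
    """Least-squares fit of AM = b1*log(SIZE) + b2*COVDEG (no intercept).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    x1 = [math.log(s) for s in sizes]   # natural log: an assumption
    x2 = list(covdegs)
    s11 = sum(a * a for a in x1)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2)
    s1y = sum(a * y for a, y in zip(x1, ams))
    s2y = sum(b * y for b, y in zip(x2, ams))
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("log(SIZE) and COVDEG are collinear; cannot fit")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Synthetic, made-up observations generated from b1 = 0.1, b2 = 0.5.
demo_sizes = [1, 2, 5, 10, 50]
demo_cov = [0.3, 0.5, 0.6, 0.8, 0.9]
demo_am = [0.1 * math.log(s) + 0.5 * c for s, c in zip(demo_sizes, demo_cov)]
b1, b2 = fit_log_size_model(demo_sizes, demo_cov, demo_am)
```

On noise-free synthetic data the fit recovers the generating coefficients; on real measurements the predicted AM values would scatter around the actual ones, which is what the predicted-versus-actual plots compare.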

  32. Data Analysis: Predicted vs. actual AM for AM ~ B1.log(size) + B2.coverage

  33. Data Analysis: Predicted vs. actual AM for AM ~ B1.log(size) + B2.coverage

  34. Data Analysis: Predicted vs. actual AM for AM ~ B1.log(size) + B2.coverage

  35. Data Analysis: Predicted vs. actual AM for AM ~ B1.log(size) + B2.coverage

  36. Case Studies: Cross-checking With Other Programs • gzip.c (SIR Repository) • 5680 LOC; 214 test cases • concordance.c • Introduced for the first time as a subject program • Originally developed by Ralph L. Meyer • Jamie Andrews at UWO • Organized the code into a single file • 13 real faults identified • 372 test cases designed (black-box testing) • 1490 LOC

  37. Case Studies: Mutant Generation for the Subject Programs of the Case Studies • gzip.c • Used Proteum (Delamaro et al.) to generate mutants • 108 operators, 493402 mutants • Used the sufficient set of operators identified by Siami Namin et al. • 28 operators, 38621 mutants (7.8%) • For feasibility, selected 1% of the sufficient set • 28 operators, 317 mutants • concordance.c • 867 non-equivalent mutants generated using the mutant generator used by Andrews et al.

  38. Case Studies: Procedures for the Subject Programs of the Case Studies • Similar procedure for generating test suites • Coverage tool: gcov • Line coverage • Mutant detection rates also computed

  39. Case Studies: Goodness of fit of models measured by adjusted R²
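For reference, the usual textbook definition of adjusted R² for a model with an intercept is given below. The slides do not say which software or which variant of the statistic was used, so treat this as background rather than as the authors' exact computation.

\[
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]

Here n is the number of observations (test suites) and p is the number of predictors: p = 1 for a SIZE-only or COVDEG-only model and p = 2 for a model using both. If n were 5000, as in the main experiment (100 suites for each of 50 sizes), the correction factor for p = 2 would be 4999/4997 ≈ 1.0004, so adjusted R² and plain R² would be nearly identical and differences between models would reflect fit rather than the penalty for an extra predictor.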

  40. Case Studies: Predicted vs. actual AM for AM ~ B1.log(size) + B2.coverage

  41. Case Studies: Predicted vs. actual AF for AM ~ B1.log(size) + B2.coverage

  42. Discussion: A Non-Linear Relationship among SIZE, COVDEG, and AM • AM ~ B1.log(SIZE) + B2.COVDEG • Explaining the log(SIZE) part of the model: • Remaining faults become harder to find as the suite grows • Adding a test case to a test suite improves the effectiveness only if the added test case finds additional faults • The faults detected by the added test case are likely to be revealed already by the existing test suite • The added test case is unlikely to improve the effectiveness if the test suite is already big enough
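The diminishing return implied by the log(SIZE) term can be made explicit with a short worked calculation. It assumes COVDEG is held fixed and that the logarithm is natural; the slides do not state the base, but the ratio at the end does not depend on it.

\[
\Delta AM(n \to n+1) = B_1 \log(n+1) - B_1 \log(n) = B_1 \log\!\left(1 + \frac{1}{n}\right)
\]

\[
\Delta AM(1 \to 2) = B_1 \log 2 \approx 0.693\,B_1,
\qquad
\Delta AM(50 \to 51) = B_1 \log\frac{51}{50} \approx 0.0198\,B_1
\]

Under the model, the second test case therefore contributes about 35 times as much predicted effectiveness as the fifty-first, which matches the slide's point that an added test case matters little once the suite is already large.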

  43. Discussion: A Non-Linear Relationship among SIZE, COVDEG, and AM • AM ~ B1.log(SIZE) + B2.COVDEG • Explaining the COVDEG part of the model: • Faults are associated with particular elements in the code • A test case exercising some elements associated with the faults is more likely to force a failure than one that does not • Regardless of the size of a test suite, a fault is more likely to be exposed by a test case if it covers new elements

  44. Discussion: Implications for Software Testers • Achieving high coverage leads to higher effectiveness • Because of log(SIZE) + COVDEG • Achieving higher coverage becomes more important than size as size grows

  45. Conclusion & Research Directions: Influence of SIZE and COVDEG on Effectiveness of Test Suites • Conclusion • Both SIZE and COVDEG independently influence the effectiveness • The relationship is not linear • AM ~ B1.log(SIZE) + B2.COVDEG • concordance.c as a new subject program • Future work • More experimental studies are needed • To validate the results • To validate the generated models

  46. Thank You
