The Effect of Code Coverage on Fault Detection Capability: An Experimental Evaluation and Possible Directions Teresa Xia Cai Group Meeting Feb. 21, 2006
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Discussions and conclusions
Introduction • Test case selection and evaluation is a key issue in software testing • Testing strategies aim to select an effective test set that detects as many faults as possible • Black-box testing (functional testing) • White-box testing (structural testing)
White-box testing schemes: Control/data flow coverage • Code coverage - the fraction of program code that is executed at least once during the test • Block coverage - the portion of basic blocks executed • Decision coverage - the portion of decision outcomes executed • C-use coverage - the portion of computational uses of variables exercised • P-use coverage - the portion of predicate uses of variables exercised (a toy illustration of these measures follows)
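As a quick illustration of these measures (not part of the original slides), consider the hypothetical toy function below; the function, variable names, and test inputs are invented purely to show where blocks, decisions, c-uses, and p-uses occur.

```python
# Hypothetical toy function, used only to illustrate the coverage measures above.
def classify(x, y):
    total = x + y              # definition of `total`; computational uses (c-uses) of x and y
    if total > 10:             # a decision; `total` has a predicate use (p-use) here
        label = "high"         # basic block executed only on the true outcome
    else:
        label = "low"          # basic block executed only on the false outcome
    return label, total        # c-uses of `label` and `total`

# Block coverage   : fraction of basic blocks (e.g. the two branch bodies) executed
# Decision coverage: fraction of decision outcomes (total > 10 true/false) taken
# C-use coverage   : fraction of definition -> computational-use pairs exercised
# P-use coverage   : fraction of definition -> predicate-use pairs exercised
#
# classify(7, 8) alone covers only the "high" block and the true outcome;
# adding classify(1, 2) covers the remaining block and outcome.
print(classify(7, 8), classify(1, 2))
```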
Code coverage: an indicator for test effectiveness? • Supportive empirical studies • high code coverage brings high software reliability and a low fault rate • both code coverage and the number of faults detected grow as testing progresses • Weyuker et al. (1985, 1988, 1990) • Horgan, London & Lyu (1994) • Wong, Horgan, London & Mathur (1994) • Del Frate, Garg, Mathur & Pasquini (1995) • Opposing empirical studies • Can the growth be attributed to a causal dependency between code coverage and defect coverage? • Briand & Pfahl (2000)
Black-box testing schemes: testing profiles • Functional testing - based on specified functional requirements • Random testing - inputs drawn from the input domain according to a predefined distribution function • Normal operational testing - based on normal operational system status • Exceptional testing - based on exceptional system status (a sketch contrasting the first two profiles follows)
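A minimal sketch contrasting functional and random testing, under the assumption of a toy input model (the number of already-failed sensors); the distribution and the requirement-driven cases are invented for illustration only.

```python
import random

def functional_test_cases():
    # Functional testing: cases derived from specified functional requirements,
    # e.g. one case per required behaviour or boundary condition (hypothetical).
    return [0, 1, 2, 3]

def random_test_case(rng):
    # Random testing: inputs drawn from the input domain according to a
    # predefined distribution (here, fewer failed sensors are more likely).
    weights = [0.4, 0.3, 0.15, 0.07, 0.04, 0.02, 0.01, 0.005, 0.005]
    return rng.choices(range(9), weights=weights, k=1)[0]

rng = random.Random(2006)
print(functional_test_cases(), [random_test_case(rng) for _ in range(5)])
```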
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Discussions and conclusions
Research questions • Is code coverage a positive indicator for testing effectiveness? • Does this effect vary under various testing profiles? • Does this effect vary with different coverage measurements? • Is code coverage a good filter to reduce the size of an effective test set?
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Discussions and conclusions
Experimental setup • In spring 2002, 34 teams were formed to develop a critical industry application over a 12-week project in a software engineering course • Each team was composed of four senior-level undergraduate computer science majors from the Chinese University of Hong Kong
Experimental project description Redundant Strapped-Down Inertial Measurement Unit (RSDIMU) • Geometry (figure) • Data flow diagram (figure)
Software development procedure • Initial design document (3 weeks) • Final design document (3 weeks) • Initial code (1.5 weeks) • Code passing unit test (2 weeks) • Code passing integration test (1 week) • Code passing acceptance test (1.5 weeks)
Mutant creation • Revision control was applied in the project and code changes were analyzed • Faults found during each stage were also identified and injected into the final program of each version to create mutants • Each mutant contains one design or programming fault • 426 mutants were created for 21 program versions
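The sketch below illustrates, with an invented example, what a single-fault mutant looks like relative to the final program: one fault of the kind found during development is re-injected, and a test case kills the mutant when the two versions disagree.

```python
def in_range_correct(value, lo, hi):
    # Final (correct) version: accept readings inside the closed interval.
    return lo <= value <= hi

def in_range_mutant(value, lo, hi):
    # Mutant: exactly one injected programming fault -- the lower boundary is
    # handled with `<` instead of `<=` (a hypothetical off-by-one fault).
    return lo < value <= hi            # <- the single injected fault

# A test case "kills" the mutant when the two versions disagree on its output.
print(in_range_correct(0, 0, 10), in_range_mutant(0, 0, 10))   # True vs. False
```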
Setup of evaluation test • A test coverage tool was employed to measure and compare testing coverage • 1200 test cases were exercised on the 426 mutants • All resulting failures from each mutant were analyzed, their coverage was measured, and failure results were compared across mutants • 60 Sun machines running Solaris were involved in the test; one test cycle took 30 hours and generated about 1.6 million files totaling around 20 GB
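To make the evaluation procedure concrete, here is a minimal sketch of the test loop, with placeholder functions standing in for the real harness and coverage tool (none of the names below come from the study): each test case is run against each mutant, failures against the fault-free reference are recorded, and per-test coverage is collected for later analysis.

```python
# Minimal sketch of the evaluation loop; `run_mutant`, `golden_output` and
# `measure_coverage` are placeholders for the real harness and coverage tool.
def run_mutant(mutant, test_case):
    """Output of one mutant on one test case (stand-in behaviour)."""
    return (mutant + test_case) % 7

def golden_output(test_case):
    """Output of the fault-free reference version (stand-in behaviour)."""
    return test_case % 7

def measure_coverage(test_case):
    """Set of code blocks the test case executes (stand-in behaviour)."""
    return {test_case % 5, (test_case * 3) % 5}

def evaluate(mutants, test_cases):
    records = []
    for m in mutants:
        for t in test_cases:
            failed = run_mutant(m, t) != golden_output(t)   # mutant killed by t?
            records.append((m, t, failed, measure_coverage(t)))
    return records

records = evaluate(mutants=range(3), test_cases=range(4))
print(sum(r[2] for r in records), "failures recorded out of", len(records), "runs")
```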
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Effectiveness of code coverage • Under various testing profiles • With different coverage measurements • Effective test set • Discussions and conclusions
Relation between the number of mutants and the effective percentage of coverage
The correlation: various test regions • Test case contribution to mutant coverage (regions I–VI) • Test case contribution to block coverage (regions I–VI)
Test case descriptions for regions I–VI (table)
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Effectiveness of code coverage • Under various testing profiles • With different coverage measurements • Effective test set • Discussions and conclusions
In various test regions • Linear regression relationship between block coverage and defect coverage over the whole test set • Linear-model fitness in the various test case regions (a minimal regression sketch follows)
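A minimal sketch of the regression analysis, assuming invented per-test-case data points; the real study fits defect coverage against measured block coverage and reports the goodness of fit.

```python
import numpy as np

# Fit defect coverage against block coverage and report R^2.
# The data points below are invented for illustration only.
block_coverage  = np.array([0.42, 0.48, 0.55, 0.61, 0.67, 0.74, 0.80])
defect_coverage = np.array([0.10, 0.15, 0.22, 0.30, 0.35, 0.46, 0.55])

slope, intercept = np.polyfit(block_coverage, defect_coverage, deg=1)
predicted = slope * block_coverage + intercept
ss_res = np.sum((defect_coverage - predicted) ** 2)
ss_tot = np.sum((defect_coverage - defect_coverage.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"defect_coverage ~ {slope:.2f} * block_coverage + {intercept:.2f}, R^2 = {r_squared:.3f}")
```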
In various test regions (cont’) • Linear regression relationship between block coverage and defect coverage in region IV • Linear regression relationship between block coverage and defect coverage in region VI
In various test regions (cont’) Observations: • Code coverage: a moderate indicator • Reasons behind the large variance between regions IV and VI
With functional/random testing • Code coverage - a moderate indicator • Random testing - a necessary complement to functional testing • Similar code coverage • Both have high fault detection capability
With functional/random testing (cont’) • Number of mutants detected only by functional testing or only by random testing
Under normal operational / exceptional testing • The definitions of operational status and exceptional status • Defined by the specification • Application-dependent • For the RSDIMU application • Operational status: at most two sensors have failed in the input, and at most one more sensor fails during the test • Exceptional status: all other situations • The 1200 test cases are classified into operational and exceptional test cases according to their inputs and outputs (a sketch of this rule follows)
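A small sketch of this classification rule as stated on the slide; the function name and parameters are assumptions, and the real classification also inspects the test outputs.

```python
# Hypothetical encoding of the RSDIMU rule above: "operational" means at most
# two sensors failed in the input and at most one more fails during the test;
# everything else is "exceptional".
def classify_test_case(failed_in_input, failed_during_test):
    if failed_in_input <= 2 and failed_during_test <= 1:
        return "operational"
    return "exceptional"

print(classify_test_case(2, 1))   # operational
print(classify_test_case(3, 0))   # exceptional
```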
Under normal operational / exceptional testing (cont’) • Normal operational testing • very weak correlation • Exceptional testing • strong correlation
Under normal operational / exceptional testing (cont’) • Normal testing: small coverage range (48%-52%) • Exceptional testing: two main clusters
Under normal operational / exceptional testing (cont’) • Number of mutants detected only by normal operational testing or only by exceptional testing
Under testing profile combinations • Combinations of testing profiles • Observations: • Combinations containing exceptional testing indicate strong correlations • Combinations containing normal testing inherit weak correlations
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Effectiveness of code coverage • Under various testing profiles • With different coverage measurements • Effective test set • Discussions and conclusions
With different coverage measurements • Similar patterns to block coverage • Insignificant difference under normal testing • Decision and P-use coverage show slightly higher correlations, as they relate to changes of control flow
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Effectiveness of code coverage • Under various testing profiles • With different coverage measurements • Effective test set • Discussions and conclusions
The reduction of the test set size using coverage-increase information (a sketch of such a filter follows)
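A minimal sketch of such a coverage-increase filter, assuming hypothetical test cases and covered blocks: a test case is kept only if it adds coverage beyond what the already-kept cases achieve.

```python
def reduce_by_coverage_increase(test_cases):
    """test_cases: list of (name, set_of_covered_blocks) pairs."""
    kept, covered = [], set()
    for name, blocks in test_cases:
        if not blocks <= covered:       # this test increases cumulative coverage
            kept.append(name)
            covered |= blocks
    return kept, covered

# Hypothetical suite: only t1 and t3 add new coverage.
suite = [("t1", {1, 2, 3}), ("t2", {2, 3}), ("t3", {3, 4}), ("t4", {1, 4})]
kept, covered = reduce_by_coverage_increase(suite)
print(kept, covered)    # ['t1', 't3'] {1, 2, 3, 4}
```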
Outline • Testing coverage and testing strategies • Research questions • Experimental setup • Results and analysis • Discussions and conclusions
Answers to RQs • Is code coverage a positive indicator for testing effectiveness? • Our answer is supportive • In most situations (61.5%), there is a coverage increase when a test case detects additional faults • In some functional and exceptional testing regions, the correlation between code coverage and fault coverage is quite high • As more cumulative code coverage is achieved, more faults are detected
Answers to RQs (cont’) • Does this effect vary under various testing profiles? • A significant correlation exists for exceptional test cases, while no correlation is found for normal operational test cases • A higher correlation is revealed in functional testing than in random testing, but the difference is insignificant
Answers to RQs (cont’) • Does this effect vary with different coverage measurements? • No obvious difference among the four coverage measurements • Is code coverage a good filter to reduce the size of an effective test set? • Yes: the 203 test cases (17% of the original test set) that achieve any coverage increase detect 98% of the faults
Conclusion • Code coverage is a reasonably good indicator of fault detection capability • The strong correlation revealed in exceptional testing implies that coverage works predictably better in certain testing profiles than in others • Testing guidelines and strategies can be established for coverage-based testing: • For normal operational testing: specification-based, regardless of code coverage • For exceptional testing: code coverage is an important metric for testing capability • A quantifiable testing strategy may emerge by combining black-box and white-box testing strategies appropriately