PREDICTION = POWER Elaine Weyuker AT&T Labs - Research Florham Park, NJ TESTCOM 2003 MAY 23, 2003
Why Predict? Predicting how software will behave once it is released to the field allows us to allocate resources intelligently. This might include determining where testing effort should be concentrated or when to buy additional hardware.
Things To Predict • Whether a selective regression testing strategy is worth using. • Whether a software system is likely to be problematic based on architecture reviews. • Whether a software system will be able to continue to function acceptably when the workload has been increased significantly (predicting scalability).
Scalability
C(s) = 0 if the performance behavior in state s is acceptable;
C(s) = 1 if the performance behavior in state s is unacceptable.

PNL(P,Q) = Σ_s Pr(s) · C(s)

where Pr(s) is the steady-state probability of state s given operational distribution Q. For a given Q, PNL reflects the probability that the system's performance objective will not be met.
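The PNL computation can be sketched in a few lines. The state names, steady-state probabilities, and the choice of which states are unacceptable below are invented for illustration; only the formula PNL(P,Q) = Σ_s Pr(s)·C(s) comes from the slide.

```python
# Sketch of the PNL (probability of not meeting the performance objective)
# computation. States, probabilities, and the unacceptable set are assumptions.

def pnl(steady_state_probs, cost):
    """PNL(P,Q) = sum over states s of Pr(s) * C(s),
    where C(s) is 1 if performance in state s is unacceptable, else 0."""
    return sum(pr * cost(s) for s, pr in steady_state_probs.items())

# Hypothetical operational distribution Q: steady-state probability per state.
probs = {"light": 0.7, "moderate": 0.25, "overload": 0.05}

# C(s): performance is unacceptable only in the overload state (assumption).
unacceptable = {"overload"}
c = lambda s: 1 if s in unacceptable else 0

print(pnl(probs, c))  # 0.05
```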
Another Thing to Predict • What will be the risk (expected loss) associated with releasing a system in its current state.
Risk Assessment: Traditional Definitions Expected loss attributable to the failures caused by faults remaining in the software. This can be computed in different ways: • The product of the probability of failure and the loss due to the failure. • The sum, over all members of the input domain, of the product of the probability of an input occurring in the field, the probability of that input failing, and a non-zero cost of failure associated with every element of the domain. • The sum, over all members of the input domain, of the product of the probability of an input occurring in the field and the cost of that input failing.
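The second definition above can be sketched directly. The input domain, occurrence probabilities, failure probabilities, and costs below are invented for illustration.

```python
# Illustrative sketch of risk as the sum, over the input domain, of
# Pr(input occurs) * Pr(input fails) * cost(input). All data is hypothetical.

def risk(domain):
    return sum(p_occur * p_fail * cost for (p_occur, p_fail, cost) in domain)

# Each entry: (probability of occurring in the field,
#              probability of failing, cost of failure).
domain = [
    (0.5, 0.0, 100.0),   # common input that never fails
    (0.3, 0.1, 100.0),   # occasional failure, moderate cost
    (0.2, 0.5, 1000.0),  # rare input, likely to fail, expensive
]
print(risk(domain))  # 0.5*0*100 + 0.3*0.1*100 + 0.2*0.5*1000 = 103.0
```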
Risk Predictor Let P be a program, S its specification, M a test case selection method, and T a test suite selected by M for P. Then we define:

R(P,S,M,T) = Σ_{t ∈ T} Pr(t) · C(P,S,M,t)

where Pr(t) is the probability of t being selected using M, and C(P,S,M,t) denotes the cost of failure for test case t, provided P fails on t or t "should have been" run according to M but wasn't.
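A minimal sketch of R(P,S,M,T): a cost term contributes only when P fails on t, or when t was selected by M but never run. The selection probabilities and costs below are hypothetical.

```python
# Hedged sketch of R(P,S,M,T) = sum over t in T of Pr(t) * C(P,S,M,t).
# Test cases, selection probabilities, and costs are invented.

def release_risk(test_suite):
    """test_suite: list of (pr_selected, counts, cost) tuples, where `counts`
    is True when P fails on t or t should have been run under M but wasn't."""
    return sum(pr * cost for (pr, counts, cost) in test_suite if counts)

suite = [
    (0.4, False, 50.0),   # passed: contributes nothing
    (0.35, True, 50.0),   # P fails on this test case
    (0.25, True, 200.0),  # selected by M but never run
]
print(release_risk(suite))  # 0.35*50 + 0.25*200 = 67.5
```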
And Yet Another Thing to Predict • Which files of a software system are particularly fault-prone.
Determining Fault-Proneness: Questions • How are faults distributed over the different files? • How does the size of files affect their fault density? • Do files that contain large numbers of faults during early stages of development typically have large numbers of faults during later stages?
More Questions • Do files that contain large numbers of faults during early releases typically have large numbers of faults during later releases? • Are newly written files more fault-prone than ones that were written for earlier releases? • Are files that have been changed in earlier releases more fault-prone than those that have not changed? • Does the fault density drop as a file matures?
System Information The system examined was an inventory tracking system. We collected fault data from 13 successive releases, and during each of nine periods: requirements, design, development, unit testing, integration testing, system testing, beta release, controlled release, and general release. About ¾ of the files were written in Java, with smaller numbers written in shell script, makefiles, XML, HTML, Perl, C, SQL, and awk.
[Chart: Number of Files vs. Release Number]
[Chart: Size of System (KLOCs) vs. Release Number]
[Chart: Numbers of Files and Faults vs. Release Number]
[Chart: Distribution of Faults (percent faulty files) vs. Release Number]
Observations About Fault Density • Fault rates for a given release tend to be higher for bins 4 and 5, which contain the largest files, than for bins 1 and 2, which contain the smallest files, but the trend is not monotonic for any of the 13 releases. • Bin 1 had the lowest fault density in only 3 releases; bin 5 had the highest fault density in only 5 releases. • Aggregated over all releases, fault rates for the largest files are only about 20% higher than for the smallest files. • File size is not a strong predictor of fault-proneness.
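The binning analysis behind these observations can be sketched as follows: group files into five equal-population bins by size and compute faults per KLOC in each bin. The file data below is synthetic; the study's actual bin boundaries are not given on the slide.

```python
# Sketch of the fault-density-by-size-bin analysis. The (loc, faults)
# pairs below are synthetic; bin construction is an assumption.

def fault_density_by_bin(files, n_bins=5):
    """files: list of (loc, faults) pairs; returns faults/KLOC per size bin,
    bin 1 holding the smallest files and bin n_bins the largest."""
    ordered = sorted(files)              # ascending by lines of code
    size = len(ordered) // n_bins
    densities = []
    for i in range(n_bins):
        # Last bin absorbs any remainder so every file is counted once.
        chunk = ordered[i * size:(i + 1) * size] if i < n_bins - 1 else ordered[i * size:]
        loc = sum(f[0] for f in chunk)
        faults = sum(f[1] for f in chunk)
        densities.append(1000.0 * faults / loc)
    return densities

files = [(100 * (i + 1), i % 3) for i in range(25)]  # synthetic (loc, faults)
print(fault_density_by_bin(files))
```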
Do files that contain large numbers of faults during early stages of development, typically have large numbers of faults during later stages?
[Chart: Faults by Stage, sorted by decreasing number of unit-testing faults; number of faults vs. release number]
Do files that contain large numbers of faults during early releases, typically have large numbers of faults during later releases?
Once Faulty, Always Faulty? From Release to Release For each release, we selected the files containing the most faults (defined as the top 20% of files). On average, roughly 35% of these files were also high-fault files in the preceding and/or succeeding releases. For Release 12, more than 40% of its high-fault files were also high-fault files in Release 1.
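The release-to-release check above can be sketched as: take the top 20% of files by fault count in each release and measure how much the two top sets overlap. The file names and fault counts below are invented.

```python
# Hypothetical sketch of the "once faulty, always faulty" analysis.
# Fault counts per file are invented for illustration.

def high_fault_files(fault_counts, frac=0.2):
    """fault_counts: {filename: faults}; returns the top `frac` of files."""
    n = max(1, round(frac * len(fault_counts)))
    ranked = sorted(fault_counts, key=fault_counts.get, reverse=True)
    return set(ranked[:n])

def overlap(release_a, release_b):
    """Fraction of release_a's high-fault files that are also
    high-fault files in release_b."""
    top_a, top_b = high_fault_files(release_a), high_fault_files(release_b)
    return len(top_a & top_b) / len(top_a)

r1 = {"a.java": 9, "b.java": 7, "c.java": 1, "d.java": 0, "e.java": 0}
r2 = {"a.java": 8, "b.java": 0, "c.java": 6, "d.java": 1, "e.java": 0}
print(overlap(r1, r2))  # a.java stays in the top set: 1.0
```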
Are newly written files more fault-prone than ones that were written for earlier releases?
[Chart: Old Files vs. New Files, percent of files containing faults by release number]
[Chart: Fault Density for Old Files vs. New Files, faults/KLOC by release number]
[Chart: Fault Density by File Age]
Fault Study Conclusions • Newly written files have higher average fault densities than older files. • Unchanged files have the lowest average fault densities; changed files often have even higher average fault densities than new files. • Files that were particularly fault-prone in early releases tended to remain fault-prone in later releases. • The average fault density of files decreased as the system matured. • File size is not a strong predictor of fault-proneness. • The age of a file is not a strong predictor of fault-proneness.
Conclusions The potential payoff from good prediction is very large. It can help determine where to focus resources, when to deploy additional resources, when it is safe to release software, and similar issues.