Software Defect Survey CS598 YYZ James Newell Lin Tan
Outline • Motivation • Defect characteristics • Generally True • Controversial • Use of the results • Prediction • Classification • Conclusions
Motivation • Many papers study bug characteristics • 2-3 applications • Open source vs. closed source software • Various results • Which results are generally true? • What are the differences between open source and closed source software in terms of bug characteristics? • What has been done to use these results? • How well have we done in prediction?
Defect Characteristics • A small number of modules contains most of the bugs. • Support: Commercial [Basili84] [Compton90] [Munson92] [Ohlsson96] [Kaaniche96] [Fenton00] [Ostrand02] [Pighin03] [Ostrand05] • Weak support: Open Source [Chou01] (30% of files contain 55% of bugs)
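A figure such as "30% of files contain 55% of bugs" can be computed from per-file fault counts. The following is a minimal sketch of that calculation; the function name `fault_share_of_top_files` and the fault counts are made up for illustration and are not data from any cited study.

```python
def fault_share_of_top_files(fault_counts, top_fraction):
    """Return the share of all faults found in the most fault-heavy files."""
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    total = sum(fault_counts)
    if total == 0:
        return 0.0
    ranked = sorted(fault_counts, reverse=True)
    # Round half up so that 30% of 10 files is exactly 3 files.
    k = max(1, int(len(ranked) * top_fraction + 0.5))
    return sum(ranked[:k]) / total


# Hypothetical fault counts for ten files (illustrative only).
faults = [30, 15, 10, 10, 10, 10, 5, 5, 3, 2]
share = fault_share_of_top_files(faults, 0.30)
print(f"Top 30% of files hold {share:.0%} of faults")
```

Sorting by fault count first means the result is the best case for concentration: it measures how skewed the distribution is, not how well anyone could have predicted it in advance.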
Defect Characteristics • A small number of modules contains most of the bugs simply because these modules contain most of the code. • Support: Commercial [Compton90] • Reject/Weak Reject: Commercial [Kaaniche96] [Fenton00] [Ostrand02] [Ostrand05]
Defect Characteristics • Files with the largest numbers of faults in an early release seem more likely to have large numbers of faults in the next release and later releases. • Support: Commercial [Ostrand02] [Ostrand05] [Pighin03]
Defect Characteristics • Modules that are faultier pre-release are also faultier post-release. • Reject: Commercial [Fenton00] [Ostrand02]
Defect Characteristics • OSS development exhibits very rapid responses to customer problems. • 50% of problems resolved within 1 day, 75% within 42 days, 90% within 140 days. • The higher the priority, the faster bugs are fixed. • Priority: how many users the bug affects. [Mockus00] [Mockus02]
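Response-time figures of this kind are percentiles of the time taken to resolve each problem report. The sketch below shows one way to compute them, assuming the nearest-rank percentile definition (the slides do not say which definition the cited studies used). The `resolution_days` values are hypothetical.

```python
import math


def nearest_rank_percentile(values, percent):
    """Smallest value such that at least `percent`% of values are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(values)
    # Multiply before dividing so whole-number percentages stay exact.
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical days-to-resolution for ten problem reports (illustrative only).
resolution_days = [0, 0, 1, 1, 1, 3, 10, 42, 90, 140]
p50 = nearest_rank_percentile(resolution_days, 50)
p75 = nearest_rank_percentile(resolution_days, 75)
p90 = nearest_rank_percentile(resolution_days, 90)
```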
Defect Characteristics • Size is the best predictor of the number of faults, but not a good predictor of fault density. • Support: Commercial [Fenton00] [Ostrand05]
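The distinction between fault count and fault density can be made concrete by correlating module size against each measure separately. This is a minimal sketch using a hand-written Pearson correlation. The module data is invented to show the computation and says nothing about the cited systems.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical modules as (lines of code, fault count) pairs.
modules = [(200, 3), (500, 4), (1000, 9), (2000, 14), (4000, 33), (8000, 58)]
sizes = [loc for loc, _ in modules]
fault_counts = [faults for _, faults in modules]
# Fault density expressed as faults per thousand lines of code (KLOC).
densities = [faults / (loc / 1000) for loc, faults in modules]

r_count = pearson(sizes, fault_counts)
r_density = pearson(sizes, densities)
print(f"size vs. fault count:   r = {r_count:.2f}")
print(f"size vs. fault density: r = {r_density:.2f}")
```

In this invented data set, size tracks the fault count closely while its relationship to density is much weaker, which is the pattern the slide describes.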
Defect Characteristics • The smaller the average component size, the more satisfied the users. • Support: Open Source [Stamelos02] • User satisfaction is subjective
Defect Characteristics • Defect density in open source releases is lower than in commercial code that has received a comparable level of testing. • Support: [Mockus00] [Mockus02] [Paulson04] • Weak reject: [Stamelos02]
Outline • Motivation • Defect characteristics • Generally True • Controversial • Use of the results • Prediction • Classification • Conclusions
Defect Characteristics • Most bugs in released software are transient bugs. • Support: Commercial [Gray91] • Reject: • Commercial [Sullivan91] [Sullivan92] [Lee93] • Open Source [Chandra00]
Defect Characteristics • Small modules are more fault-prone. • No relationship: Commercial [Fenton00] • Support/Weak Support: Commercial [Basili84] [Moller95] [Ostrand02] • Reject: Open Source [Chou01]
Defect Characteristics • Newly written files are more likely to be faulty than old files. • Support: Commercial [Ostrand02], Open Source [Chou01] • No substantial difference: Commercial [Pighin03]
Generally True Characteristics • A small number of modules contains most of the bugs. • Files with the largest numbers of faults in an early release seem more likely to have large numbers of faults in the next release and later releases. • Modules that are faultier pre-release are less faulty post-release (faultier: higher number of faults). • Defect density in open source releases is lower than in commercial code that has received a comparable level of testing.
Outline • Motivation • Defect characteristics • Generally True • Controversial • Use of the results • Prediction • Classification • Conclusions
Prediction - Commercial • Top 20% of files: ~80% of faults [Ostrand04] [Ostrand05] • Negative Binomial Regression Model • Ideal: Top 20% of files: ~100% of faults • Top 20% of files: ~47% of faults [Ohlsson96] • Four models: equivalent performance • Ideal: 20% of files contain 60% of faults
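Results such as "top 20% of files: ~80% of faults" score a model by ranking files on predicted fault count and measuring how many actual faults fall in the top of that ranking. The sketch below assumes that reading, and treats "ideal" as the score of a perfect ranking. The function name `faults_captured` and all numbers are hypothetical.

```python
def faults_captured(predicted, actual, top_fraction=0.20):
    """Share of actual faults in the files ranked highest by the prediction."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    total = sum(actual)
    if total == 0:
        return 0.0
    k = max(1, int(len(actual) * top_fraction + 0.5))
    # Rank file indices by predicted fault count, highest first.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    return sum(actual[i] for i in order[:k]) / total


# Hypothetical data for ten files (not taken from any cited study).
actual = [40, 25, 10, 8, 6, 4, 3, 2, 1, 1]
predicted = [30, 5, 20, 9, 7, 3, 2, 2, 1, 0]  # output of some model

model_share = faults_captured(predicted, actual)
# Ranking by the actual counts gives the best score any model could reach.
ideal_share = faults_captured(actual, actual)
print(f"model: {model_share:.0%} of faults, ideal: {ideal_share:.0%}")
```

Reporting the ideal alongside the model score matters because the upper bound depends on how concentrated the faults are: a model that captures 47% looks different against an ideal of 60% than against an ideal of 100%.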
Classification – Open Source • Classify bugs according to root causes [Podgurski03] • Motivation • Accuracy: In 71-86% of clusters, the majority of members have the same root cause • Auto-assign bugs to developers [Cubranic04] • Motivation • Bayesian learning approach • Accuracy: 30%
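A Bayesian approach to assigning bugs can be sketched as a text classifier that learns which words appear in the reports each developer has handled. The following is a minimal multinomial naive Bayes with add-one smoothing, not a reconstruction of the method in [Cubranic04]. The class name `NaiveBayesTriage`, the developer names, and the report texts are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTriage:
    """Multinomial naive Bayes over bag-of-words bug report text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # developer -> word -> count
        self.report_counts = Counter()           # developer -> number of reports
        self.vocabulary = set()

    def train(self, reports):
        """Learn from (report text, developer) pairs."""
        for text, developer in reports:
            words = text.lower().split()
            self.word_counts[developer].update(words)
            self.report_counts[developer] += 1
            self.vocabulary.update(words)

    def predict(self, text):
        """Return the developer with the highest posterior log score."""
        words = text.lower().split()
        total_reports = sum(self.report_counts.values())
        vocab_size = len(self.vocabulary)
        best_developer, best_score = None, -math.inf
        for developer, n_reports in self.report_counts.items():
            counts = self.word_counts[developer]
            n_words = sum(counts.values())
            # Log prior plus log likelihoods with add-one (Laplace) smoothing.
            score = math.log(n_reports / total_reports)
            for word in words:
                score += math.log((counts[word] + 1) / (n_words + vocab_size))
            if score > best_score:
                best_developer, best_score = developer, score
        return best_developer


# Hypothetical history of resolved reports (illustrative only).
history = [
    ("crash in layout engine when resizing window", "alice"),
    ("layout engine renders table borders incorrectly", "alice"),
    ("network timeout when fetching large file", "bob"),
    ("proxy settings ignored by network stack", "bob"),
]
triage = NaiveBayesTriage()
triage.train(history)
assignee = triage.predict("table layout crash")
```

Working in log space avoids numeric underflow on long reports, and the smoothing keeps a single unseen word from ruling out a developer entirely.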
Classification • Open Source [Chou01] • Error rates of device drivers are 3-7 times higher than in the rest of the kernel. • Bug lifetime: 1.8 years • Commercial [Sullivan92] • Undefined state errors dominate the error type distribution.
Conclusions • Many controversial results • Not many studies on OSS • No general characteristics for OSS • Prediction accuracy is reasonably good • Different prediction models produce similar accuracy • Classification reveals interesting results
References • [Basili84] Software Errors and Complexity: An Empirical Investigation, Comm. ACM • [Chandra00] • [Chou01] An Empirical Study of Operating Systems Errors, SOSP • [Compton90] Prediction and Control of ADA Software Defects, J. Systems Software • [Cubranic04] Automatic Bug Triage Using Text Categorization, SEKE • [Fenton00] Quantitative Analysis of Faults and Failures in a Complex Software System, TSE • [Gray91] Why Do Computers Stop and What Can Be Done About It? Technical Report • [Kaaniche96] Software Reliability Analysis of Three Successive Generations of a Switching System, EDCC-1 • [Lee93] • [Mockus00] A Case Study of Open Source Software Development: The Apache Server, ICSE • [Mockus02] Two Case Studies of Open Source Software Development: Apache and Mozilla, TOSEM • [Moller95]
References • [Munson92] The Detection of Fault-Prone Programs, TSE • [Ohlsson96] Predicting Fault-Prone Software Modules in Telephone Switches, TSE • [Ostrand02] The Distribution of Faults in a Large Industrial Software System, ISSTA • [Ostrand04] • [Ostrand05] Predicting the Location and Number of Faults in Large Software Systems, TSE • [Paulson04] An Empirical Study of Open-Source and Closed-Source Software Products, TSE • [Pighin03] An Empirical Analysis of Fault Persistence through Software Releases, ISESS • [Podgurski03] Automated Support for Classifying Software Failure Reports, ICSE • [Stamelos02] Code Quality Analysis in Open Source Software Development, Info Systems J • [Sullivan91] • [Sullivan92]