Capers Jones & Associates LLC

SOFTWARE DEFECT REMOVAL: THE STATE OF THE ART IN 2011

Capers Jones, President, Capers Jones & Associates LLC
Chief Scientist Emeritus, SPR LLC

Quality Seminar: talk 1

http://www.spr.com
Capers.Jones3@gmail.com

June 11, 2011
BASIC DEFINITIONS

DEFECT: An error in a software deliverable that would cause the software to either stop or produce incorrect results if it is not removed.

DEFECT POTENTIAL: The predicted sum of errors in requirements, specifications, source code, documents, and bad-fix categories.

DEFECT REMOVAL: The set of static and dynamic methods applied to software to find and remove bugs or errors.

UNREPORTED DEFECTS: Defects excluded from formal counts of bugs. Usually desk checking, unit test, and other “private” forms of defect removal.
BASIC DEFINITIONS

DEFECT REMOVAL EFFICIENCY (DRE): The ratio of defects found and removed to total defects present at the time of the removal activity. Best overall quality metric; good process improvement metric.

DEFECT DETECTION EFFICIENCY (DDE): The percentage of defects found by inspections, static analysis, or testing. DDE < DRE by about 10%.

DEFECT SEVERITY: An arbitrary method for categorizing software defects into descending levels of importance. Normally “severity 1” implies total failure.

BAD FIX: A bug or defect in the repair of a previous defect (average is 7%).
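The DRE and DDE definitions reduce to simple ratios. The sketch below is a minimal Python illustration with invented sample numbers (not from the seminar data); the 10% offset for DDE is only the approximate relationship stated above.

```python
def defect_removal_efficiency(defects_removed, defects_delivered):
    """DRE: defects removed divided by all defects present at the time of removal."""
    return defects_removed / (defects_removed + defects_delivered)

# Illustrative example: 90 defects found and removed before release, 10 reach users.
dre = defect_removal_efficiency(90, 10)   # 0.90
dde = dre - 0.10                          # DDE typically runs about 10% below DRE
print(f"DRE = {dre:.0%}, approximate DDE = {dde:.0%}")
```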
BASIC DEFINITIONS

VALID DEFECT: An error in a software deliverable which is determined by analysis to produce incorrect results or violate standards.

INVALID DEFECT: A problem reported against a software application that is found not to be caused by the software. Hardware problems are the most common form of invalid defect reports.

DUPLICATE DEFECT REPORT: Any report of a software defect other than the initial report. (Some defects may be reported thousands of times.)

ABEYANT DEFECT: A reported defect that cannot be reproduced by the maintenance team. This situation may occur in complex hardware and software environments with many vendor packages.
BASIC DEFINITIONS

STATIC ANALYSIS: Defect removal methods that do not utilize execution of the software. Examples include design and code inspections, audits, and automated static analysis.

DESIGN INSPECTION: A formal, manual method in which a team of design personnel including a moderator and recorder examine specifications.

CODE INSPECTION: A formal, manual method in which a team of programmers and SQA personnel including a moderator and recorder examine source code.

AUDIT: A formal review of the process of software development, tools used, and records kept. Purpose is to ascertain good practices.
BASIC DEFINITIONS

DYNAMIC ANALYSIS: Methods of defect removal that involve executing software in order to determine its properties under usage conditions.

TESTING: Executing software in a controlled manner in order to judge its behavior against predetermined results.

TEST CASE: A set of formal initial conditions and expected results against which software execution patterns can be evaluated.

FALSE POSITIVE: A defect report made by mistake; not a real defect. (Common with automated tests and automated static analysis.)
IBM’S ORIGINAL DEFECT SEVERITY SCALE

IBM developed a four-level severity scale circa 1956. Many other companies adopted this scale with or without modifications.

SEVERITY 1: Application does not run at all. (< 5% of defects)
SEVERITY 2: Major features disabled or wrong. (> 25% of defects)
SEVERITY 3: Minor problem. (> 45% of defects)
SEVERITY 4: Cosmetic problem such as a spelling error; execution is not degraded. (< 25% of defects)
DEFECT SEVERITY DISTRIBUTION AND RESPONSE

SEVERITY LEVEL | FREQUENCY OF OCCURRENCE | RESPONSE TIME BY VENDOR
SEVERITY 1     | < 1%                    | < 24 HOURS
SEVERITY 2     | < 15%                   | < 1 WEEK
SEVERITY 3     | < 50%                   | < 1 MONTH
SEVERITY 4     | < 35%                   | NEXT RELEASE

NOTE: Because of quicker response times, clients prefer to have defects they care about classified as Severity 2.
CAUTIONS ABOUT UNREPORTED DEFECTS

(Private, unreported defect removal lowers measured removal efficiency!)

Defect Counts              Case A   Case B   Difference
Defect potential           100      100      0
Excluded (unit test)       50       0        -50
Included (total defects)   40       90       +50
Delivered defects          10       10       0
Removal efficiency         80%      90%      +10%

All defects should be counted; only serious defects need to be analyzed. Volunteers are needed to count “private” defects before formal testing.
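The Case A / Case B gap above is purely a counting effect. A minimal sketch of that arithmetic, assuming the counting rules shown in the table (the function and variable names are illustrative):

```python
def measured_dre(counted_defects_found, delivered_defects):
    """Removal efficiency as computed from formally counted defects only."""
    return counted_defects_found / (counted_defects_found + delivered_defects)

# Case A: 50 unit-test defects are removed privately and excluded from the count.
case_a = measured_dre(counted_defects_found=40, delivered_defects=10)   # 0.80
# Case B: the same 50 defects are counted, so 90 removed defects appear in the record.
case_b = measured_dre(counted_defects_found=90, delivered_defects=10)   # 0.90
print(f"Case A: {case_a:.0%}   Case B: {case_b:.0%}")
```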
ACHIEVING HIGH DEFECT REMOVAL EFFICIENCY

PRE-TEST DEFECT REMOVAL ACTIVITIES = > 85% defect removal efficiency before testing starts
• Desk checking
• Formal requirements, design, and code inspections
• Formal test plan, test script, and test case inspections
• Independent verification and validation (military software)
• Automated static analysis of code

FORMAL TESTING ACTIVITIES = > 85% defect removal efficiency from all test stages
• Unit test
• New function test
• Regression test
• Performance test
• Security test
• Usability test
• Subsystem test
• System test
• Beta test or acceptance test

CUMULATIVE DEFECT REMOVAL EFFICIENCY FROM PRE-TEST + TESTS = > 97%
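The cumulative figure follows from applying each stage's efficiency to whatever the earlier stages missed: if pre-test removal and testing each reach 85%, only 0.15 × 0.15 ≈ 2.25% of the original defects remain, or roughly 97.75% cumulative removal. A minimal sketch of that arithmetic, with the two 85% stage efficiencies as assumed inputs:

```python
def cumulative_dre(stage_efficiencies):
    """Fraction of original defects removed after a sequence of removal stages."""
    remaining = 1.0
    for efficiency in stage_efficiencies:
        remaining *= (1.0 - efficiency)   # each stage misses (1 - efficiency) of what reaches it
    return 1.0 - remaining

# Pre-test activities at 85% followed by the combined test stages at 85%.
print(f"{cumulative_dre([0.85, 0.85]):.2%}")   # 97.75%, consistent with "> 97%"
```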
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: REQUIREMENTS
• Defect potential: 1.0 per function point; 10.0 per KLOC
• Volume: 0.5 pages per function point
• Completeness: < 75% of final features
• Rate of change: 2% per month
• Defect types: Errors of omission; errors of clarity and ambiguity; errors of logical conflict; errors of judgement (Y2K problem)
• Defect severity: > 25% of severity 2 errors
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: DESIGN
• Defect potential: 1.25 per function point; 12.5 per KLOC
• Volume: 2.5 pages per function point (in total)
• Completeness: < 65% of final features
• Rate of change: 2% per month
• Defect types: Errors of omission; errors of clarity and ambiguity; errors of logical conflict; errors of architecture and structure
• Defect severity: > 25% of severity 2 errors
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: SOURCE CODE
• Defect potential: 1.75 per function point; 17.5 per KLOC
• Volume: Varies by programming language
• Completeness: 100% of final features
• Dead code: > 10%; grows larger over time
• Rate of change: 5% per month
• Defect types: Errors of control flow; errors of memory management; errors of complexity and structure
• Defect severity: > 50% of severity 1 errors
VARIATIONS IN CODE DEFECTS

RANGE OF CODE DEFECT POTENTIALS
• Small applications < 100 FP = 1.0 per function point
• Large applications > 10,000 FP = 3.0 per function point
• Low-level languages = > 2.0 per function point
• High-level languages = < 1.0 per function point
• Low cyclomatic complexity = < 1.0 per function point
• High cyclomatic complexity = > 3.5 per function point
• New development = 1.75 per function point
• Enhancements to complex code = 2.5 per function point
• Enhancements to well-structured code = 1.0 per function point
• Conversion to a new language = < 1.0 per function point
• Conversion to a new operating system = < 1.0 per function point
• Conversion to a new hardware platform = 1.5 per function point
EXAMPLES OF TYPICAL CODE DEFECTS

SOURCES: SANS INSTITUTE AND MITRE (www.SANS.org and www.CWE-MITRE.org)
• Errors in SQL queries
• Failure to validate inputs
• Failure to validate outputs
• Race conditions
• Leaks from error messages
• Unconstrained memory buffers
• Loss of state data
• Incorrect branches; hazardous paths
• Careless initialization and shutdown
• Errors in calculations and algorithms
• Hard coding of variable items
• Reusing code without validation or context checking
• Changing code without changing the comments that explain the code
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: USER DOCUMENTS
• Defect potential: 0.6 per function point; 6.0 per KLOC
• Volume: 2.5 pages per function point
• Completeness: < 75% of final features
• Rate of change: 1% per month (lags design and code)
• Defect types: Errors of omission; errors of clarity and ambiguity; errors of fact
• Defect severity: > 50% of severity 3 errors
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: BAD FIXES
• Defect potential: 0.4 per function point; 4.0 per KLOC
• Volume: 7% of defect repairs
• Completeness: Not applicable
• Rate of change: Not applicable
• Defect types: Errors of control flow; errors of memory management; errors of complexity and structure
• Defect severity: > 15% of severity 1 errors; > 20% of severity 2 errors
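Taken together, the five defect sources above (requirements 1.0, design 1.25, source code 1.75, user documents 0.6, bad fixes 0.4) sum to about 5.0 defects per function point. A minimal sketch of that summation follows; the rates come from the preceding slides, while the 1,000 function point application size is an assumed example.

```python
# Defect potentials per function point, taken from the preceding slides.
DEFECT_POTENTIAL_PER_FP = {
    "requirements":   1.00,
    "design":         1.25,
    "source code":    1.75,
    "user documents": 0.60,
    "bad fixes":      0.40,
}   # total: 5.00 defects per function point

def estimated_defect_potential(function_points):
    """Rough latent defect totals, by source, for an application of a given size."""
    return {source: rate * function_points
            for source, rate in DEFECT_POTENTIAL_PER_FP.items()}

# Hypothetical 1,000 function point application: about 5,000 total potential defects.
totals = estimated_defect_potential(1000)
print(totals, sum(totals.values()))
```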
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: BAD TEST CASES
• Defect potential: 1.0 per function point; 10.0 per KLOC
• Volume: 5.0 test cases per function point; 20% are bad
• Completeness: < 50% of paths through code
• Rate of change: Not applicable
• Defect types: Errors of control flow; errors of memory management; errors of complexity and structure; errors of accidental redundancy (> 25% of test cases are redundant)
• Defect severity: > 15% of severity 2 errors
TYPES OF SOFTWARE DEFECTS

DEFECT SOURCE: AUTOMATIC TEST CASES
• Defect potential: 0.5 per function point; 5.0 per KLOC
• Volume: 5.0 test cases per function point; 15% are bad
• Completeness: < 75% of paths through code
• Rate of change: 2% per month
• Defect types: Errors of control flow; errors of omission (not in design); errors of complexity and structure; errors of accidental redundancy (> 25% of test cases are redundant)
• Defect severity: > 15% of severity 2 errors
FORMS OF SOFTWARE DEFECT REMOVAL

STATIC ANALYSIS
• Requirement inspections
• Design inspections
• Code inspections
• Automated static analysis
• Informal peer reviews
• Personal desk checking
• Test plan inspections
• Test case inspections
• Document editing
• Independent verification and validation (IV&V)
• Independent audits
FORMS OF SOFTWARE DEFECT REMOVAL

DYNAMIC ANALYSIS
• General forms of testing
• Specialized forms of testing
• Automatic testing
• Testing involving users or clients
• Virtualization
FORMS OF SOFTWARE TESTING

GENERAL FORMS OF SOFTWARE TESTING
• Subroutine testing
• Extreme Programming (XP) testing
• Unit testing
• Component testing
• New function testing
• Regression testing
• Integration testing
• System testing
FORMS OF SOFTWARE TESTING

AUTOMATIC FORMS OF SOFTWARE TESTING

Test cases derived from parsing formal specifications:
• Unit testing
• Component testing
• New function testing
• Regression testing
• Integration testing
• System testing

Specifications are < 75% complete and contain 1.25 defects per function point.
FORMS OF SOFTWARE TESTING

SPECIALIZED FORMS OF SOFTWARE TESTING
• Virus testing
• Limits and capacity testing
• Performance testing
• Nationalization testing
• Security testing
• Platform testing
• Independent testing
• Clean-room statistical testing
• Supply chain testing
FORMS OF SOFTWARE TESTING

FORMS OF SOFTWARE TESTING WITH USERS
• Pre-purchase testing
• Acceptance testing
• External Beta testing
• Usability testing
• Laboratory testing
FORMS OF DEFECT REMOVAL

SOFTWARE STATIC ANALYSIS
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: REQUIREMENT INSPECTIONS
• Occurrence: < 5% of mission-critical software
• Performed by: Clients, designers, programmers, SQA
• Schedule: 75 function points per hour
• Purpose: Requirements error removal
• Limits: Late additions not covered
• Scope: Full requirement specifications
• Size of software: > 100,000 LOC or 1,000 function points
• Defect potential: 1.0 per function point; 10.0 per KLOC
• Removal efficiency: 65% to 85% of significant errors
• Bad fix injection: 2% to 5%
• Comment: Reduces creep by > 50%
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: DESIGN INSPECTIONS
• Occurrence: Systems software primarily
• Performed by: 3 to 8 designers, programmers, SQA
• Schedule: 100 function points per hour
• Purpose: Design error removal
• Limits: Late features not covered
• Scope: Initial and final specifications
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 1.25 per function point; 12.5 per KLOC
• Removal efficiency: 65% to 85% of all defect types
• Bad fix injection: 2% to 7%
• Comment: Raises test efficiency by > 10%
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: AUTOMATED STATIC ANALYSIS
• Occurrence: Systems, embedded, open-source software
• Performed by: Developers, testers, SQA
• Schedule: 500 function points or 25,000 LOC per hour
• Purpose: Coding error detection
• Limits: Only works for 25 languages out of 2,500
• Scope: Source code after clean compilation
• Size of software: Flexible: 1 to > 10,000 function points
• Defect potential: 1.75 per function point; 17.5 per KLOC
• Detection efficiency: > 85% except for performance
• Bad fix injection: 2% to 5%
• Caution: Performance and some security issues are not detected
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: CODE INSPECTIONS
• Occurrence: Systems software primarily
• Performed by: 3 to 6 programmers, testers, SQA
• Schedule: 2.5 function points or 250 LOC per hour
• Purpose: Coding error removal
• Limits: Late features not covered
• Scope: Source code after clean compilation
• Size of software: > 1,000 LOC or 10 function points
• Defect potential: 1.75 per function point; 17.5 per KLOC
• Removal efficiency: 65% to 85% except for performance
• Bad fix injection: 2% to 5%
• Caution: Hard if cyclomatic complexity > 10
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: INFORMAL PEER REVIEWS
• Occurrence: All forms of software
• Performed by: 2 to 5 programmers, testers, SQA
• Schedule: 10 function points or 1 KLOC per hour
• Purpose: Coding error removal
• Limits: Subtle problems not covered
• Scope: Source code after clean compilation
• Size of software: > 1,000 LOC or 10 function points
• Defect potential: 1.75 per function point; 17.5 per KLOC
• Removal efficiency: 25% to 45% except for performance
• Bad fix injection: 5% to 15%
• Caution: Hard if cyclomatic complexity > 10
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: PERSONAL DESK CHECKING
• Occurrence: All forms of software
• Performed by: 1 programmer
• Schedule: 8 function points or 0.8 KLOC per hour
• Purpose: Coding error removal
• Limits: Hard to see your own mistakes
• Scope: Source code after clean compilation
• Size of software: < 100 LOC to > 1,000 LOC
• Defect potential: 1.75 per function point; 17.5 per KLOC
• Removal efficiency: 25% to 55% except for performance bugs
• Bad fix injection: 3% to 10%
• Caution: Hard to do if cyclomatic complexity > 10
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: TEST PLAN INSPECTIONS
• Occurrence: Systems software primarily
• Performed by: 3 to 6 testers, programmers, SQA
• Schedule: 250 function points per hour
• Purpose: Testing error removal
• Limits: External tests not covered
• Scope: All formal test plans
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 0.15 per function point; 1.5 per KLOC
• Removal efficiency: 65% to 85% of test omissions
• Bad fix injection: 1% to 3%
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: TEST CASE INSPECTIONS
• Occurrence: Systems software primarily
• Performed by: 3 to 6 testers, programmers, SQA
• Schedule: 200 function points per hour
• Purpose: Test case error and duplicate removal
• Limits: Informal test cases not covered
• Scope: Formal test cases
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 1.0 per function point; 10.0 per KLOC
• Removal efficiency: 65% to 85% of test case problems
• Bad fix injection: 1% to 3%
• Comment: Test case errors are poorly researched
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: DOCUMENT EDITING
• Occurrence: Commercial software primarily
• Performed by: Editors, SQA
• Schedule: 100 function points per hour
• Purpose: Document error removal
• Limits: Late code changes not covered
• Scope: Paper documents and screens
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 0.6 per function point; 6.0 per KLOC
• Removal efficiency: 70% to 95% of errors of style
• Bad fix injection: 1% to 3%
• Comment: Document errors are often severity 3
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: INDEPENDENT VERIFICATION AND VALIDATION (IV&V)
• Occurrence: Military software primarily
• Performed by: 5 to 10 IV&V, SQA personnel
• Schedule: 50 function points per hour
• Purpose: Standards adherence
• Limits: Some technical problems not covered
• Scope: Initial and final specifications
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 1.25 per function point; 12.5 per KLOC
• Removal efficiency: 25% to 45% of standards errors
• Bad fix injection: 3% to 15%
• Comment: Required by DoD standards
FORMS OF STATIC ANALYSIS

REMOVAL STAGE: INDEPENDENT AUDITS
• Occurrence: Contract software primarily
• Performed by: 1 to 10 SQA or audit personnel
• Schedule: 150 function points per hour
• Purpose: Best practice adherence
• Limits: Some technical problems not covered
• Scope: Plans and specifications
• Size of software: > 10,000 LOC or 100 function points
• Defect potential: 5.0 per function point; 50.0 per KLOC
• Removal efficiency: 20% to 50% of standards errors
• Bad fix injection: 1% to 10%
• Comment: Defense against litigation
NORMAL DEFECT ORIGIN/DISCOVERY GAPS

[Chart: defect origins versus defect discovery across the requirements, design, coding, documentation, testing, and maintenance phases. Defects are normally discovered well downstream of the phase in which they originate; the resulting gap is labeled the “Zone of Chaos.”]
DEFECT ORIGINS/DISCOVERY WITH INSPECTIONS

[Chart: the same origin and discovery phases (requirements through maintenance), showing that with inspections most defects are discovered in or near the phase in which they originate.]
LOGISTICS OF SOFTWARE INSPECTIONS
• Inspections most often used on “mission critical” software
• Full inspections of 100% of design and code are best
• From 3 to 8 team members for each inspection
• Team includes SQA, testers, tech writers, software engineers
• Every inspection has a “moderator” and a “recorder”
• Preparation time starts 1 week prior to inspection session
• Many defects are found during preparation
• Inspections can be live, or remote using groupware tools
LOGISTICS OF SOFTWARE INSPECTIONS
• Inspection sessions limited to 2-hour duration
• No more than 2 inspection sessions per business day
• Inspections find problems: repairs take place off-line
• Moderator follows up to check status of repairs
• Team determines if a problem is a defect or an enhancement
• Inspections are peer reviews: no managers are present
• Inspection defect data should not be used for appraisals
• Remote on-line inspections are now very cost effective
LOGISTICS OF SOFTWARE AUDITS
• Audits most often used on software with legal liabilities
• Audits use standard questions: ISO, SEI, SPR, TickIt, etc.
• Audits concentrate on processes and record keeping
• Audits may also examine deliverables (specifications, code)
• Full audits of 100% of deliverables are best
• From 1 to 5 team members for each audit
LOGISTICS OF SOFTWARE AUDITS
• Audit team includes SQA and process specialists
• Preparation time starts 2 weeks prior to on-site audit
• Audit interviews average 60 minutes in length
• Audits find problems: repairs take place off-line
• Audits involve both managers and technical personnel
• Audit reports go to top executives
• Audit reports may be used in litigation
ECONOMICS OF STATIC ANALYSIS

EXAMPLE OF DESIGN INSPECTION
• Assignment scope: 200 function points
• Defect potential: 266 design errors
• Preparation: 50.0 function points per staff hour
• Execution: 25.0 function points per hour
• Repairs: 0.5 staff hours per defect
• Efficiency: > 75.0% of latent errors detected
ECONOMICS OF STATIC ANALYSIS

EXAMPLE OF DESIGN INSPECTION (continued)
• 1,000 function point (100 KLOC) application with 266 defects
• Team size = 5 staff members (1,000 / 200 function point assignment scope)
• Preparation time = 20 staff hours, or 4 hours per staff member
• Execution = 40 clock hours, or 200 staff hours
• Repairs = 100 hours (200 defects × 0.5 hours per defect)
• Remaining = 66 latent defects (75% removal efficiency rate)
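The figures above follow mechanically from the rates on the previous slide. The sketch below redoes that arithmetic; the function and parameter names are illustrative, and the inputs are the same assumed values (1,000 function points, 266 latent design defects, 75% removal efficiency).

```python
def design_inspection_economics(function_points, defect_potential,
                                assignment_scope_fp=200,        # function points per inspector
                                prep_fp_per_staff_hour=50.0,
                                execution_fp_per_hour=25.0,
                                repair_hours_per_defect=0.5,
                                removal_efficiency=0.75):
    """Rough effort estimate for a formal design inspection (illustrative only)."""
    team_size = function_points / assignment_scope_fp                 # 5 inspectors
    prep_staff_hours = function_points / prep_fp_per_staff_hour       # 20 staff hours
    execution_clock_hours = function_points / execution_fp_per_hour   # 40 clock hours
    execution_staff_hours = execution_clock_hours * team_size         # 200 staff hours
    defects_found = round(defect_potential * removal_efficiency)      # ~200 defects
    repair_hours = defects_found * repair_hours_per_defect            # ~100 hours
    latent_defects = defect_potential - defects_found                 # ~66 remain
    return team_size, prep_staff_hours, execution_staff_hours, repair_hours, latent_defects

print(design_inspection_economics(1000, 266))
```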
FORMS OF DEFECT REMOVAL

GENERAL SOFTWARE TESTING
GENERAL FORMS OF SOFTWARE TESTING

TEST: SUBROUTINE TEST
• Occurrence: All types of software
• Performed by: Programmers
• Schedule: 2.0 function points or 200 LOC per hour
• Purpose: Coding error removal
• Limits: Interfaces and control flow not tested
• Scope: Individual subroutines
• Size tested: 10 LOC or 0.1 function point
• Test cases: 0.25 per function point
• Test case errors: 20%
• Test runs: 3.0 per test case
• Removal efficiency: 50% to 70% of logic and coding errors
• Bad fix injection: 2% to 5%
GENERAL FORMS OF SOFTWARE TESTING

TEST: UNIT TEST
• Occurrence: All types of software
• Performed by: Programmers
• Schedule: 10 function points or 1 KLOC per hour
• Purpose: Coding error removal
• Limits: Interfaces and system errors not found
• Scope: Single modules
• Size tested: 100 LOC or 1 function point
• Test cases: 5.0 per function point
• Test case errors: 5% to 20% based on test skill
• Test runs: 10.0 per test case
• Removal efficiency: 30% to 50% of logic and coding errors
• Bad fix injection: 3% to 7%
GENERAL FORMS OF SOFTWARE TESTING

TEST: EXTREME PROGRAMMING (XP) TEST
• Occurrence: All types of software
• Performed by: Programmers
• Schedule: 10 function points or 1 KLOC per hour
• Purpose: Coding error removal
• Limits: Interfaces and system errors not found
• Scope: Single modules or new features
• Size tested: 100 LOC or 1 function point
• Test cases: 5.0 per function point
• Test case errors: 10%
• Test runs: 5.0 per test case
• Removal efficiency: 45% of logic and coding errors
• Bad fix injection: 3%
GENERAL FORMS OF SOFTWARE TESTING

TEST: COMPONENT TEST
• Occurrence: Software > 100 function points
• Performed by: Programmers or test specialists
• Schedule: 100 function points or 10 KLOC per hour
• Purpose: Interface errors between modules
• Limits: Internal problems not covered
• Scope: Multiple modules
• Size tested: 1,000 LOC or 10 function points and up
• Test cases: 2.0 per function point
• Test case errors: 5%
• Test runs: 5.0 per test case
• Removal efficiency: 35% of interface errors
• Bad fix injection: 7%
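Each of these test descriptions pairs a test-case density with a removal efficiency, and together they support a rough workload and yield estimate. The sketch below is a minimal illustration using the unit test rates from the earlier slide (5.0 test cases per function point, 10 runs per test case) and an assumed mid-range 40% removal efficiency; the 100 function point component and its 175 latent coding defects (1.75 per function point) are also assumptions.

```python
def test_stage_estimate(function_points, latent_defects,
                        test_cases_per_fp, runs_per_test_case, removal_efficiency):
    """Rough test-stage workload and yield from per-function-point rates."""
    test_cases = function_points * test_cases_per_fp
    test_runs = test_cases * runs_per_test_case
    defects_removed = latent_defects * removal_efficiency
    return test_cases, test_runs, latent_defects - defects_removed

# Hypothetical unit test of a 100 function point component with 175 latent coding defects.
cases, runs, remaining = test_stage_estimate(100, 175,
                                              test_cases_per_fp=5.0,
                                              runs_per_test_case=10.0,
                                              removal_efficiency=0.40)
print(cases, runs, remaining)   # 500 test cases, 5000 runs, 105 defects still latent
```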