Mutation Analysis with Coverage Discounting

Peter Lisherness, Nicole Lesperance, and Kwang-Ting (Tim) Cheng
University of California, Santa Barbara

Motivation
• Functional Coverage
  + Functionally meaningful
  + Evaluates design activation
  - Ignores propagation and checker quality
• Coverage Discounting
  + Functionally meaningful
  + Evaluates design activation, propagation, and checkers
• Mutation Analysis
  + Evaluates propagation and checker quality
  - Mutants hard to analyze

Coverage Discounting: The Main Idea
• Use fault insertion to discount (revise downward) coverage scores
[Figure: coverage-score axis from 0% to 100%. Labels: Functional Coverage (Coverage Score), Coverage Discounting (Discounted Coverage Score), Mutation Analysis (Detected + undetected mutants, Undetected Mutants)]

An Existing Solution: Mutation Analysis
• Benefits
  • Evaluates quality of testbench checkers
  • Indicates if tests fail to propagate potential errors to the checker or fail to activate mutants
[Figure: mutation analysis flow. Labels: Tests, DUT, Checker, Detected, Undetected, Fix, Add/Fix Tests]

Mutation Analysis Drawbacks
• Long runtime for simulation
• Many man-hours required to analyze results:
  2.1 Mutants are synthetic: it is difficult to identify testbench improvements that target mutant detection
  2.2 Redundant mutants are never detectable: some undetected faults are redundant
  Example (original, then mutant):
    if (a >= 0) b = 1+a; else b = 1-a;
    if (a > 0) b = 1+a; else b = 1-a;
  The two versions differ only at a = 0, where both yield b = 1, so no test can detect the mutant.
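
The redundant-mutant example above can be checked by brute force. The following is a minimal Python sketch, not part of the original slides; the function names are hypothetical, and it assumes plain unbounded integers rather than fixed-width hardware arithmetic.

```python
def original(a):
    # Original design: if (a >= 0) b = 1 + a; else b = 1 - a;
    return 1 + a if a >= 0 else 1 - a


def mutant(a):
    # Mutated design: the ">=" operator is replaced by ">"
    return 1 + a if a > 0 else 1 - a


def differing_inputs(lo, hi):
    """Return every input in [lo, hi] on which the mutant's output differs."""
    return [a for a in range(lo, hi + 1) if original(a) != mutant(a)]


# The only input where the two conditions disagree is a = 0,
# and there both branches compute b = 1.
print(differing_inputs(-1000, 1000))  # prints []
```

Because no input distinguishes the two versions, time spent trying to write a test that detects this mutant is wasted, which is the drawback the slide describes.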

Premise of Coverage Discounting
1) A mutant may change design functionality
2) A coverpoint covered by the testbench in the original design may be suppressed (i.e. no longer covered) in the mutated design
3) If the mutant suppressing the coverpoint is not detected, then the coverpoint is no longer considered covered in the original design
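
The three premises can be sketched as a small set computation. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the function name, the data layout, and the choice to aggregate coverage over the whole test suite (rather than per test) are all assumptions.

```python
def discount_coverage(original_cov, mutant_runs):
    """Apply the discounting premise to suite-level coverage.

    original_cov: set of coverpoint names covered on the unmodified design.
    mutant_runs:  iterable of (covered, detected) pairs, one per mutant, where
                  `covered` is the set of coverpoints hit with the mutant
                  inserted and `detected` says whether any checker fired.
    Returns (still_covered, discounted).
    """
    discounted = set()
    for covered, detected in mutant_runs:
        if detected:
            continue  # a detected mutant never discounts anything
        # Coverpoints hit originally but no longer hit under this mutant
        discounted |= original_cov - covered
    return original_cov - discounted, discounted


# Hypothetical coverpoint names and mutant results, for illustration only
original_cov = {"BURST_MODE", "SINGLE_MODE", "RESET"}
mutant_runs = [
    ({"SINGLE_MODE", "RESET"}, False),               # suppresses BURST_MODE, undetected
    ({"BURST_MODE", "RESET"}, True),                 # suppresses SINGLE_MODE, but detected
    ({"BURST_MODE", "SINGLE_MODE", "RESET"}, False), # undetected, suppresses nothing
]
kept, discounted = discount_coverage(original_cov, mutant_runs)
print(sorted(kept), sorted(discounted))  # prints ['RESET', 'SINGLE_MODE'] ['BURST_MODE']
```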

Coverage Discounting Flow
[Figure: flow diagram. Labels: Tests, DUT, Checker, Detected, Undetected, "Coverage Changed?" (Yes / No), Fix, Add more tests]

An Example: Original Design
  if (remaining_length >= 5) begin
    burst_en = true;
    length_next = remaining_length - 4;
  end else begin
    burst_en = false;
    length_next = remaining_length - 1;
  end
  ...
  BURST_MODE : coverpoint burst_en;
The bug: the number of packets sent in burst mode was increased from 4 to 5 in the condition, but the designers forgot to increase the packet number everywhere else.

An Example: Mutated Design
  if (false) begin
    burst_en = true;
    length_next = remaining_length - 4;
  end else begin
    burst_en = false;
    length_next = remaining_length - 1;
  end
  ...
  BURST_MODE : coverpoint burst_en;
* The mutant goes undetected because the checkers are only sensitive to bus content and ordering, not timing.

An Example
• Since the coverpoint BURST_MODE is suppressed by the mutant and the mutant is undetected, BURST_MODE is discounted
• It is now clear that the checker must check for timing in order to adequately cover burst mode functionality
• If timing is checked, the real bug is caught
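
The mutant scenario in this example can be reproduced with a toy transaction-level model. The Python sketch below is hypothetical throughout: the slides do not say how packets map to cycles, so it assumes a burst emits 4 packets in one cycle, and it uses the unmutated run as the reference that the checkers compare against. It models only the mutant, not the real bug.

```python
def run_dut(length, burst_condition):
    """Toy model of the burst logic. Returns (trace, burst_seen), where trace
    is a list of (cycle, packet_id) and burst_seen records whether the
    BURST_MODE coverpoint (burst_en true) was ever hit."""
    remaining, cycle, next_id = length, 0, 0
    trace, burst_seen = [], False
    while remaining > 0:
        burst_en = burst_condition(remaining)
        burst_seen = burst_seen or burst_en
        count = 4 if burst_en else 1  # assumption: a burst is 4 packets in one cycle
        for _ in range(count):
            trace.append((cycle, next_id))
            next_id += 1
        remaining -= count
        cycle += 1
    return trace, burst_seen


def content_checker(ref, obs):
    # Sensitive to packet content and ordering only; True means "mismatch detected"
    return [p for _, p in ref] != [p for _, p in obs]


def timing_checker(ref, obs):
    # Also sensitive to the cycle in which each packet appears
    return ref != obs


def burst_mode_result(checker, length=10):
    ref, cov_orig = run_dut(length, lambda r: r >= 5)  # original condition
    obs, cov_mut = run_dut(length, lambda r: False)    # mutant: if (false)
    detected = checker(ref, obs)
    suppressed = cov_orig and not cov_mut
    return detected, suppressed and not detected       # (detected, discounted)


print(burst_mode_result(content_checker))  # prints (False, True)
print(burst_mode_result(timing_checker))   # prints (True, False)
```

With the content-only checker the mutant escapes and BURST_MODE is discounted; once timing is compared, the mutant is detected and the coverpoint keeps its credit.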

Experimental Results

OpenRISC CPU
• Experiment #1: Measure test quality
  • Simulation Infrastructure:
    • Functional simulator (ISS)
    • Ad-hoc fault insertion
  • Tests: Instruction-Set vs. Random
  • Coverpoints: Opcodes
• Experiment #2: Identify checker weaknesses
  • Simulation Infrastructure:
    • SoC full-chip RTL simulation
    • Certitude fault insertion
  • Tests: SoC Regression Suite
  • Coverpoints: CPU top-level signals

Measure Test Quality
[Figure: original vs. discounted coverage, in two panels: Functional Tests and Random Tests]

Identify Checker Weaknesses
[Figure: axis label "Tests"]

Identify Checker Weaknesses

UART
• Illustrates success of coverage discounting applied to a sophisticated testbench
• Simulation Infrastructure:
  • Design RTL w/ OVM Testbench
  • SystemVerilog coverpoints (ModelSim)
  • Certitude fault insertion
• Tests: OVM Sequences
• Coverpoints: Hand-written, spec-based

UART Results
• Reduces debug effort from analysis of 146 mutants to examining 3 functional coverpoints
• 1588 Mutants:
  • 7 not activated
  • 106 not propagated
  • 33 not detected
  • Total 146 mutants demand attention
• 846 Coverpoints:
  • 4 uncovered
  • 3 discounted
  • All discounted relate to specific unchecked functions
    • Loopback, timeout interrupt identification register

Results Examined
• Coverage discounting:
  • Identified "good" tests (propagation)
  • Diagnosed checker problems
  • Identified coverpoints vacuously covered
• However:
  • Runtime of mutation analysis still an issue
  • Technique sensitive to the quantity and quality of faults

Confidence Metric
• Aims to answer:
  • Is a coverpoint adequately challenged by the set of mutants?
  • When have enough faults been inserted (when can simulation be stopped)?
  • What is the optimal simulation ordering of mutants for coverage discounting?

Detection Confidence (DECO) Score
• Discounting relies on coverpoints being suppressed by faults
• Point confidence: the number of times a coverpoint is suppressed
• DECO(n): computes the percentage of coverpoints with point confidence greater than n
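
The two definitions above translate directly into a short computation. The Python sketch below is illustrative only: the function names and data layout are hypothetical, it counts one suppression per mutant (the slides do not say whether the count is per mutant or per test/mutant pair), and it takes the percentage over all listed coverpoints.

```python
def point_confidence(coverpoints, suppressions):
    """Count how many simulated mutants suppressed each coverpoint.

    suppressions: iterable of sets, one per simulated mutant, each holding
    the coverpoints that mutant suppressed.
    """
    conf = {cp: 0 for cp in coverpoints}
    for suppressed in suppressions:
        for cp in suppressed:
            if cp in conf:
                conf[cp] += 1
    return conf


def deco(conf, n):
    """Percentage of coverpoints whose point confidence is greater than n."""
    if not conf:
        return 0.0
    return 100.0 * sum(1 for c in conf.values() if c > n) / len(conf)


# Hypothetical data: four coverpoints, three simulated mutants
conf = point_confidence(["A", "B", "C", "D"], [{"A", "B"}, {"A"}, {"A", "C"}])
print(conf)           # prints {'A': 3, 'B': 1, 'C': 1, 'D': 0}
print(deco(conf, 0))  # prints 75.0
print(deco(conf, 1))  # prints 25.0
```

Here coverpoint D has never been suppressed, so its "covered" status has not yet been challenged by any mutant.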

DECO-Directed Ordering
• Optimize test/mutant simulation order to increase DECO score
• Test Selection: Choose test covering the most low confidence points
• Mutant Selection: Select mutant activated by the fewest tests
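
The two selection rules can be sketched as greedy choices. This Python sketch is a hypothetical illustration, not the authors' heuristic as implemented: the function names, the "low confidence" threshold, the alphabetical tie-breaking, and the assumption that `pending` holds only mutants activated by at least one test are all choices made here.

```python
def select_test(test_cov, conf, threshold=0):
    """Pick the test covering the most low-confidence coverpoints.

    test_cov:  dict mapping test name -> set of coverpoints it covers.
    conf:      dict mapping coverpoint -> point confidence so far.
    threshold: a point counts as low confidence if its confidence is at or
               below this value (an assumed definition).
    """
    def low_conf_count(test):
        return sum(1 for cp in test_cov[test] if conf.get(cp, 0) <= threshold)
    # sorted() makes ties resolve to the alphabetically first test
    return max(sorted(test_cov), key=low_conf_count)


def select_mutant(activation, pending):
    """Pick the pending mutant activated by the fewest tests.

    activation: dict mapping mutant name -> set of tests that activate it.
    pending:    mutants not yet simulated.
    """
    return min(sorted(pending), key=lambda m: len(activation[m]))


# Hypothetical data for illustration
test_cov = {"t1": {"A", "B"}, "t2": {"B", "C", "D"}, "t3": {"A"}}
conf = {"A": 3, "B": 1, "C": 0, "D": 0}
activation = {"m1": {"t1", "t2"}, "m2": {"t2"}, "m3": {"t1", "t2", "t3"}}

print(select_test(test_cov, conf))                   # prints t2
print(select_mutant(activation, {"m1", "m2", "m3"}))  # prints m2
```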

Quicker Confidence
[Figure: two panels, CPU and UART]

Quicker Discounting
[Figure: two panels, CPU and UART]

Summary: DECO
• Provides feedback earlier
• Improves confidence (both over time and at termination)
• Enables early termination

Summary
• Analyzed information gained from coverage discounting for two designs
• Developed a confidence metric to gauge mutant effectiveness and an ordering heuristic to reduce runtime
