Software Testing Metrics: A Comprehensive Guide

Why Metrics in Software Testing?
• How would you answer questions such as:
  • Project-oriented questions
    • How long would it take to test?
    • How much will it cost to test?
  • Product-oriented questions
    • How good or bad is the product?
    • How many problems still remain in the software?
  • Test-activity-oriented questions
    • Will testing be completed on time?
    • Was the testing effective?
    • How much effort went into testing?
• All of these questions require some type of measurement and record keeping in order to be answered properly.

Some Basic Concepts on Measurement
• What do we need before we can measure something?
  • A clear understanding and definition of the attribute (characteristic) that we are trying to gauge
  • The metric that may be used to gauge that attribute
  • The methodology for performing the measurement (often forgotten once the first two are done, including by yours truly)

1. Clarifying & Defining the Attribute to be Measured
• Characterizing the attribute of interest
• Size attribute:
  • Physical height is a size sub-attribute of many items.
    • Height of a building, person, or tree: not a problem
    • Height of a ball or an ocean: not so comfortable. Why?
  • Physical weight is a size sub-attribute of many items.
  • What is the size attribute for software? What does it address?
    • The source statements? With screens? With DB tables?
    • The storage space that the object code occupies in memory?
• Quality attribute:
  • For a car: How fast it can accelerate? The number of times the car stalled? The number of times the lights don’t work?
  • For software: How many times we need to reboot? How good the screen looks? How many times we need to call the help line? Or the number of times it fails to meet customer requirements?

2. Metric for Gauging the Attribute
• Metric: a unit used for describing or measuring an attribute
  • Inches is a metric for measuring the length attribute (a simple metric).
  • Miles per hour is a metric for measuring the speed attribute (a complex metric that combines 2 metrics).
  • Lines of code is a metric for measuring the size attribute of software (not a very good one).
  • Problems found per thousand lines of source code is a metric for the defect discovery rate attribute of software. (Or is this for the software quality attribute?)

3. Conducting the Measurement
• Once the attribute and its associated metric are defined, the actual methodology for determining the extent of the attribute using that metric has to be spelled out.
  • How do you measure the height of a person using inches?
  • How do you measure the distance from the earth to the moon using inches?
  • How do you measure the size of a computer program using bytes?
  • How do you measure the defects in a program using problems found during program testing? (Note: problems found may be counted in many ways, such as unique ones, accepted ones, etc.)

Some General Test Measurements
• Time is used to measure the length of the period expended on testing.
  • Time to set up and conduct (run) a test or a set of tests
    • Unit of measurement: minutes or hours
  • Time to design and document test cases
    • Unit of measurement: minutes or hours
• Keeping track of time gives us one parameter to help plan future testing, but time must be balanced against the “size” of the test.
  • 2 seconds to run a simple query
  • 5 seconds to run a complete purchase transaction with confirmation
• The “size” of a test is needed to make “time of test” more meaningful. Or, conversely, can the amount of “test time” be used as a metric for the size-of-test attribute?

Size of Test
• The test size attribute may use different metrics:
  • Amount of time to run the test (a bit convoluted?):
    • Small: less than or equal to 3 seconds
    • Medium: more than 3 seconds and less than 1 minute
    • Large: 1 minute or more
  • Number of statements needed to document the test case:
    • Small: 3 statements or fewer
    • Medium: 4 to 7 statements
    • Large: 8 or more statements
• Any other suggestions? Number of test cases? Or type of test, such as unit test versus integration test?
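The two size scales above can be written down as small classification functions. This is only a sketch: the function names are hypothetical, and the thresholds are simply the ones listed on the slide.

```python
def size_by_run_time(seconds: float) -> str:
    """Classify a test by run time, using the slide's thresholds."""
    if seconds <= 3:
        return "small"
    if seconds < 60:
        return "medium"
    return "large"


def size_by_statements(statement_count: int) -> str:
    """Classify a test by the number of statements documenting it."""
    if statement_count <= 3:
        return "small"
    if statement_count <= 7:
        return "medium"
    return "large"


# The two timing examples from the earlier slide.
print(size_by_run_time(2))  # simple query
print(size_by_run_time(5))  # complete purchase transaction
```

Note that the two scales need not agree: a test documented in 3 statements may still take over a minute to run.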

Quality: # of Problems
• The quality attribute is often measured with the metric of number of problems found, but the number of problems alone does not tell the whole story. Consider:
  • Severity of problems
    • High
    • Medium
    • Low
  • Type of problems
    • UI
    • Database
    • Network outage
    • Etc.

Quality (cont.)
• Both severity and type are important:
  • # of problems found by severity
  • # of problems found by type
  • # of problems found by when (which phase during development)
  • # of problems found by when (how many months after release)
  • # of problems found by where (UI, DB, logic, network, etc.)
• Quality information is relevant to both:
  • Software providers
  • Customers/users
• Why is it important to users? What would they do with it?
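As a rough illustration of the breakdowns listed above, the same set of problem records can be tallied along each dimension. The records and the `count_by` helper below are hypothetical; a real project would pull these from its defect tracker.

```python
from collections import Counter

# Hypothetical problem records: (severity, type, when found).
problems = [
    ("high", "UI", "functional test"),
    ("low", "UI", "functional test"),
    ("medium", "database", "system test"),
    ("high", "network", "post-release"),
]


def count_by(records, field_index):
    """Tally records by the value in the given field position."""
    return Counter(record[field_index] for record in records)


by_severity = count_by(problems, 0)
by_type = count_by(problems, 1)
by_when = count_by(problems, 2)

print("By severity:", dict(by_severity))
print("By type:", dict(by_type))
print("By when found:", dict(by_when))
```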

Problem Find Rate
• The Weibull probability density curve: $f(t) = \frac{m}{t}\left(\frac{t}{c}\right)^{m} e^{-z}$, where $z = \left(\frac{t}{c}\right)^{m}$
  • For m = 1, the curve is the dotted line.
  • For m = 2, the curve is the solid line and is called the Rayleigh curve.
[Figure: Problem Find Rate. Vertical axis: # of problems found per hour. Horizontal axis: time, Day 1 through Day 5.]
• Does the severity of a problem matter here? (It should, but it is not considered here.)
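To get a feel for the two curve shapes, the density can be evaluated numerically. The sketch below uses the standard Weibull form, in which the scale parameter c appears in both the leading factor and the exponent. The function name and the value of c are assumptions chosen only for illustration.

```python
import math


def weibull_pdf(t: float, m: float, c: float) -> float:
    """Weibull density f(t) = (m/t) * (t/c)**m * exp(-(t/c)**m), for t > 0."""
    if t <= 0 or m <= 0 or c <= 0:
        raise ValueError("t, m, and c must all be positive")
    z = (t / c) ** m
    return (m / t) * z * math.exp(-z)


scale_c = 2.0  # assumed scale parameter, in days

for day in range(1, 6):
    m1 = weibull_pdf(day, 1, scale_c)  # m = 1: steadily decaying curve
    m2 = weibull_pdf(day, 2, scale_c)  # m = 2: Rayleigh curve, rises then falls
    print(f"Day {day}: m=1 -> {m1:.3f}, m=2 -> {m2:.3f}")
```

With m = 2 the curve peaks at t = c/√2, so the choice of c determines the day on which the find rate is expected to be highest.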

Problem Fix Rate
[Figure: Problem Fix Rate and Problem Find Rate During Functional Test. Vertical axis: # of problems fixed per hour. Horizontal axis: time, Day 1 through Day 5.]
• Would this fix rate present a problem?
• Would you also want to keep a backlog # by day?
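One way to keep a backlog count by day is to accumulate the daily difference between problems found and problems fixed. The daily counts below are made up for illustration and are not read from the chart; `backlog_by_day` is a hypothetical helper.

```python
from itertools import accumulate

# Hypothetical daily counts for Day 1 through Day 5.
found_per_day = [12, 18, 15, 9, 4]
fixed_per_day = [5, 10, 14, 12, 8]


def backlog_by_day(found, fixed):
    """Running total of problems found but not yet fixed."""
    daily_net = [f - x for f, x in zip(found, fixed)]
    return list(accumulate(daily_net))


backlog = backlog_by_day(found_per_day, fixed_per_day)
for day, open_count in enumerate(backlog, start=1):
    print(f"Day {day}: {open_count} problems open")
```

A backlog that keeps growing while the find rate is falling suggests that the fix rate, not the find rate, is the bottleneck.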

Problem Density
[Figure: Problem Density. Vertical axis: # of problems found per KLOC, scale 1 to 6. Horizontal axis: area, Module 1 through Module 4.]
• Note: The # of problems found by area alone is not a normalized measurement; we need the # per KLOC.
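The normalization in the note can be sketched as follows. The module sizes and problem counts are hypothetical, chosen to show that the same raw count can mean very different densities.

```python
def problems_per_kloc(problems_found: int, lines_of_code: int) -> float:
    """Problems found per thousand lines of code (KLOC)."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return problems_found / (lines_of_code / 1000)


# Hypothetical data: module -> (problems found, lines of code).
modules = {
    "Module 1": (12, 4000),
    "Module 2": (12, 2000),
}

for name, (found, loc) in modules.items():
    print(f"{name}: {problems_per_kloc(found, loc):.1f} problems per KLOC")
```

Both modules have 12 problems, but the smaller module has twice the density.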

Test Coverage Rate
• Not all of the planned test cases are actually run.
  • # of test cases executed / # of test cases planned
    • By functional area
    • By test phase
  • # of source statements executed / total # of source statements
    • By functional area
    • By module
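The first ratio above, broken down by functional area, might be computed like this. The area names and counts are hypothetical.

```python
def coverage_rate(executed: int, planned: int) -> float:
    """Fraction of planned test cases that were actually executed."""
    if planned <= 0:
        raise ValueError("planned must be positive")
    return executed / planned


# Hypothetical data: functional area -> (executed, planned).
by_area = {
    "login": (18, 20),
    "checkout": (30, 40),
}

for area, (executed, planned) in by_area.items():
    print(f"{area}: {coverage_rate(executed, planned):.0%}")

total_executed = sum(executed for executed, _ in by_area.values())
total_planned = sum(planned for _, planned in by_area.values())
overall = coverage_rate(total_executed, total_planned)
print(f"overall: {overall:.0%}")
```

The same function applies to the statement-based ratio by passing statements executed and total statements.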

Test Activity Effectiveness
• Defect discovery and eradication activities occur in all phases of development. To see which is more effective, one may use:
  • # of problems found / total # of problems found
    • By development phase (requirements review, design review, functional test, system test, etc.)
  • # of problems found / person-days of effort
    • By test activity (e.g., boundary value testing, branch testing, d-u (definition-use) testing, etc.)
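Both ratios can be computed from the same per-phase records. The phase data below is hypothetical, as is the `effectiveness` helper; this is a sketch of the bookkeeping, not a prescribed method.

```python
# Hypothetical data: phase -> (problems found, person-days of effort).
phases = {
    "requirements review": (10, 5.0),
    "design review": (15, 6.0),
    "functional test": (20, 10.0),
    "system test": (5, 4.0),
}


def effectiveness(phase_data):
    """Share of all problems found, and problems found per person-day, by phase."""
    total_found = sum(found for found, _ in phase_data.values())
    report = {}
    for phase, (found, person_days) in phase_data.items():
        report[phase] = {
            "share_of_total": found / total_found,
            "found_per_person_day": found / person_days,
        }
    return report


report = effectiveness(phases)
for phase, values in report.items():
    print(
        f"{phase}: {values['share_of_total']:.0%} of problems, "
        f"{values['found_per_person_day']:.2f} per person-day"
    )
```

The two ratios can rank activities differently: a phase may find the largest share of problems while being less productive per person-day.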

Fix Effectiveness
• Not all problem fixes resolve the problems.
  • # of fixes that worked the first time / total # of fixes
  • # of fixes that required more than 1 attempt / total # of fixes
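A minimal sketch of the two ratios, assuming that "total # of fixes" means the number of fixed problems and that the number of attempts each one needed is recorded. The data and the function name are hypothetical.

```python
# Hypothetical data: attempts needed to fix each problem.
attempts_per_fix = [1, 1, 2, 1, 3]


def fix_effectiveness(attempts):
    """Return (first-time success rate, rework rate) over all fixes."""
    total = len(attempts)
    if total == 0:
        raise ValueError("at least one fix is required")
    first_time = sum(1 for count in attempts if count == 1)
    reworked = total - first_time
    return first_time / total, reworked / total


first_time_rate, rework_rate = fix_effectiveness(attempts_per_fix)
print(f"Worked the first time: {first_time_rate:.0%}")
print(f"Needed more than 1 attempt: {rework_rate:.0%}")
```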

Fix Cost
• Fix cost is usually measured by the amount of effort expended.
  • # of person-hours expended / fix
    • By severity
    • By area
    • By phase type (including post-release)
• If the fix cost for post-release is higher than that of all of the pre-release phases, then that is one reason to invest in testing and reviews.
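Person-hours per fix can be averaged over any grouping key. The sketch below groups by severity; the records are hypothetical, and the same `hours_per_fix` helper would work with area or phase as the key.

```python
from collections import defaultdict

# Hypothetical fix records: (severity, person-hours expended on the fix).
fixes = [
    ("high", 6.0),
    ("high", 10.0),
    ("medium", 3.0),
    ("low", 1.0),
    ("low", 2.0),
]


def hours_per_fix(records):
    """Average person-hours per fix for each grouping key."""
    hours = defaultdict(float)
    counts = defaultdict(int)
    for key, person_hours in records:
        hours[key] += person_hours
        counts[key] += 1
    return {key: hours[key] / counts[key] for key in hours}


cost_by_severity = hours_per_fix(fixes)
for severity, average in cost_by_severity.items():
    print(f"{severity}: {average:.1f} person-hours per fix")
```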

Problem Cost Comparison
• The effort expended in discovering a problem plus the effort expended in fixing that problem is the “test” cost during pre-release.
• The effort expended in fixing a problem and releasing the fix to the customer is the “support” (problem resolution) cost during post-release.
• Compare (effort in person-hours):
  • Pre-release: effort expended / problem found and “fixed”
  • Post-release: effort expended / problem “resolved”
• Post-release resolution usually costs more.
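The comparison can be sketched with made-up numbers. All effort figures and problem counts below are hypothetical and serve only to show how the two per-problem costs are formed.

```python
def cost_per_problem(effort_hours: float, problem_count: int) -> float:
    """Person-hours expended per problem."""
    if problem_count <= 0:
        raise ValueError("problem_count must be positive")
    return effort_hours / problem_count


# Hypothetical pre-release figures: discovery effort plus fix effort.
discovery_hours = 120.0
fix_hours = 80.0
problems_found_and_fixed = 50

# Hypothetical post-release figures: fix-and-release effort.
support_hours = 90.0
problems_resolved = 10

pre_release = cost_per_problem(discovery_hours + fix_hours, problems_found_and_fixed)
post_release = cost_per_problem(support_hours, problems_resolved)

print(f"Pre-release: {pre_release:.1f} person-hours per problem")
print(f"Post-release: {post_release:.1f} person-hours per problem")
```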

How “Big” is it (testing w/o fix)?
• How would you answer this?
• Assume the # of test cases planned, by size (or complexity):
  • Large: 35 test cases
  • Medium: 200 test cases
  • Small: 40 test cases
• Assume the average effort required to design and run each test:
  • Large: 1 person-hour
  • Medium: 15 person-minutes
  • Small: 5 person-minutes
• Then “How big is testing?” may be answered:
  • (35 × 60) + (200 × 15) + (40 × 5) = 2,100 + 3,000 + 200 = 5,300 person-minutes, or 88.33 person-hours
• So, in this case, how big is testing?
  • It is 275 test cases.
  • It is 88.33 person-hours of effort.
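The arithmetic above is easy to re-check in a few lines. The counts and per-test efforts are the ones assumed on the slide; the function and variable names are hypothetical.

```python
# Size class -> (planned test cases, person-minutes per test case).
planned = {
    "large": (35, 60),
    "medium": (200, 15),
    "small": (40, 5),
}


def size_of_testing(plan):
    """Return (total test cases, total effort in person-minutes)."""
    total_cases = sum(count for count, _ in plan.values())
    total_minutes = sum(count * minutes for count, minutes in plan.values())
    return total_cases, total_minutes


total_cases, total_minutes = size_of_testing(planned)
total_hours = total_minutes / 60

print(f"{total_cases} test cases")
print(f"{total_minutes} person-minutes = {total_hours:.2f} person-hours")
```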

How Long Would it Take?
• Use the same example of 88.33 person-hours of test planning and execution effort.
• You need to make some assumptions:
  • Assume 2 testers of about equal ability.
  • Split the work effort evenly:
    • 88.33 person-hours / 2 people = 44.17 hours
  • Further assume that each person works 6 hours a day:
    • 44.17 hours / 6 hours per day = 7.36 days
• So this will take 2 testers, each working 6 hours a day, about 7.4 days.
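The schedule estimate follows directly from the three assumptions, so it can be recomputed whenever one of them changes. The function name below is hypothetical.

```python
import math


def duration_days(effort_hours: float, testers: int, hours_per_day: float) -> float:
    """Elapsed working days, assuming effort is split evenly among testers."""
    if testers <= 0 or hours_per_day <= 0:
        raise ValueError("testers and hours_per_day must be positive")
    return effort_hours / testers / hours_per_day


days = duration_days(88.33, testers=2, hours_per_day=6)
whole_days = math.ceil(days)

print(f"Estimated duration: {days:.2f} working days")
print(f"Whole working days to schedule: {whole_days}")
```

The even split is itself an assumption: adding testers shortens the estimate only if the test cases can really be run in parallel.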
