Some Thoughts on Metrics • Test Management Forum • Paul Gerrard, Systeme Evolutif Limited, 3rd Floor, 9 Cavendish Place, London W1G 0QD • email: paulg@evolutif.co.uk • http://www.evolutif.co.uk
A great book on metrics • “The Tyranny of Numbers: Why Counting Can’t Make Us Happy” by David Boyle • Nothing to do with software • More to do with government statistics • Written in the same spirit as “How to Lie with Statistics”, another counting classic • I’ve appeared to be fairly negative about metrics in the past • Not true: it’s blind faith in metrics I don’t like!
“What matters, we cannot count, so let’s count what does not matter” • A lovely quote from an economist called Chambers (having a dig at economists) • I’ve changed it to reflect a tester’s mantra: • “Testers have come to feel / What can’t be measured, isn’t real / The truth is always an amount / Count defects, only defects count.”
Some problems with metrics • Too often, numbers and counts, being incontrovertible, are regarded as ‘absolute truth’ • The numbers are incontrovertible, but the object being counted may be subjective • The person who collects the raw data for counting isn’t usually independent (and has an ‘agenda’) • There is a huge amount of filtering going on: • by individuals • but also by the processes we use, which are, by definition, selective.
When I were a lad (development team leader)… • I collected all sorts of metrics to do with code • To help manage my team and their activities, I counted lines of code, code delivered, module size, module rates of change, fault location and type, fan-in, fan-out and other statically detectable measures • I also tracked costs allocated to specific tasks, lifted from a simple time-recording system (which I wrote specifically to see where time went).
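Fan-in and fan-out, two of the statically detectable measures listed above, are simple to derive once a call graph exists. Here is a minimal Python sketch. It assumes a hypothetical `calls` mapping from each module name to the set of modules it calls, and it does not show how to build that graph from real source code. Fan-out counts the distinct modules a module calls. Fan-in counts the distinct modules that call it. Self-calls are ignored.

```python
from collections import defaultdict

def fan_in_out(calls):
    """Return {module: (fan_in, fan_out)} for a static call graph.

    calls maps each module name to the set of module names it calls.
    fan_out = distinct modules this module calls (excluding itself)
    fan_in  = distinct modules that call this module (excluding itself)
    """
    fan_in = defaultdict(int)
    modules = set(calls)
    for caller, callees in calls.items():
        modules |= callees
        for callee in callees - {caller}:  # ignore self-calls (recursion)
            fan_in[callee] += 1
    return {m: (fan_in[m], len(calls.get(m, set()) - {m})) for m in sorted(modules)}

# Hypothetical call graph: three modules plus a shared logging module.
graph = {"billing": {"db", "log"}, "reports": {"db", "log"}, "db": {"log"}}
for module, (fin, fout) in fan_in_out(graph).items():
    print(f"{module}: fan-in {fin}, fan-out {fout}")
```

In this made-up graph the logging module has the highest fan-in (3), so a change to it has the widest potential knock-on effect.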
When I were a lad (development team leader)…2 • I used them to justify: • buying tools • changing my team’s behaviour and attitude to standards and other development practices • my team’s existence, through their productivity • (Other teams didn’t have metrics, so we were, by definition, infinitely more productive) • Metrics are extremely useful as a political tool • Metrics (statistics) are probably the most useful tool of politics. Ask any politician! • I knew I was collecting ‘good stuff’, but some of it deserved to be taken more seriously than the rest.
My biggest objection… • Counting defects is misleading, by definition • A gruesome analogy (body count): • used to measure progress in a military campaign • but not a good measure of how successful your campaign has been • A body count gives you the following information: • the opponent’s forces have been diminished by a certain number • But what were they to start with? • How many more participants is the enemy recruiting? • No one knows • The count represents the number of participants who are no longer in the campaign. So, by definition, they don’t count anymore.
Body count • The body count could be used to measure our killing efficiency • But is killing efficiency a good way to measure progress in a campaign intended to capture territory, enemy assets, hearts and minds? • Hardly: dead people are a consequence, a tragic side issue, not the objective itself.
Defect/bug count • Defect count gives us the following information: • the number of defects has been diminished by a certain number • but what was it to start with? • How many more defects are the developers (the enemy? Hehe) injecting? • No one knows; they certainly don’t • Predictive metrics are unreliable because of programming languages, people, knock-on effects, coupling, etc.
Defect/bug count 2 • The defect count is a count of defects removed from the system • By definition… they don’t count anymore • Bugs left in don’t count because they are trivial • The count can be used to measure test “efficiency” • But is “defect detection efficiency” a good way to measure progress in a project intended to deliver functionality, business benefits and cost savings? • NO! Defects are a consequence, a tragic side issue and an inevitability, not the end itself • Need I go on?
Counting defects sends the wrong message to testers and management • If the only thing we count and take seriously is defects, we are telling testers that the only thing that counts is defects • All they ever do is look for defects • All management think testers do is find defects • But what do managers want?
Managers want… • To know the status of deliverables, what works, what doesn’t • They want… and want it NOW…: • demonstration that software works • confidence that software is usable • Defects are an inevitable consequence of development and testing, but not the prime objective • They are a tactical challenge • Defects are a practitioner issue all the time • But not a serious management issue unless defects block release and a decision is required to unblock the project • Most of the time, test metrics are irrelevant.
Myers has a lot to answer for • Myers advanced testing by a few years when he defined the purpose of testing in 1978 • But that flawed definition has held us back since 1983! • The defects we count aren’t representative • Typically, system and acceptance test defects are counted • We recommend that all defects are counted • But that’s hardly possible • Even if we tried, we couldn’t count them all • The vast majority are corrected as they are created • Finding bugs is a tactical objective, not a strategic one.
Most defects are corrected before they have an impact • When we write a document, code or a test plan, we correct the vast majority of our mistakes instantly • they never find their way into the review or test • the vast majority of defects are not found by “testing” at all • Testing only detects the most obscure faults • But we use metrics based on these obscure defects to generalise and steer our testing activities • Surely, this isn’t sensible? • Only if we consider defects in all their various manifestations can we promote general theories of how testing can be improved.
Our approach to testing undermines the data we collect • Textbooks promote the economic view: • finding defects early is better than finding them later • The logic is flawless • But this argument only holds if the “absence of defects” is our key objective • Surely, defects are symptoms, not the underlying problem • Absence of defects is a sign of good work; it’s not the deliverable • How can the “absence” of anything be a meaningful deliverable, though?
Economic argument for early testing is flawed • It is based on fixing symptoms, not on fixing the underlying problem or improving the end deliverable • The argument for using reviews and inspections has traditionally been ‘defect prevention’ • But this is nonsense • Inspections and reviews find defects like any other test activity • The economic argument is based on rework prevention, not defect prevention • Early defects are simply more expensive to correct if left in products.
Testing is a reactive activity, not proactive • Testing cannot prevent defects: it is reactive, never proactive • This is why we still have to convince management that testing is important • Testing actually corrupts the defect data we collect • If we structure our testing to detect defects before system and acceptance testing, design defects found in system test are A BAD THING • bad because of the way we approach dev and test • bad because we need to re-document, redesign and re-test at unit and integration levels, and so on • A self-fulfilling prophecy • Late testing makes the other guys look bad.
Compare that with RAD, DSDM or Agile methods • Little testing is done by developers • some might do test-first programming or good unit testing • but most don’t • Because there is only shallow documentation • and little developer testing • and developers respond instantly to system/user testing incidents • the cost of finding ‘serious defects’ in system/acceptance testing is remarkably low.
Economic argument for early testing is smashed • The whole basis for a structured testing discipline is undermined • Traditional metrics don’t support the Agile approach, so we say Agile teams are undisciplined, unprofessional and incompetent • (It’s hard to sell ISEB courses to these guys!) • Surely we are measuring the wrong things? • The data we collect is corrupted by the processes we follow!
Where now with test metrics? • Move away from defects as the principal object of measurement • Move towards ‘information assets’ as the testers’ deliverable • Defects are a part of that information asset • Defect analysis: • is a development task, not a tester’s • Only programmers know how to analyse the cause of defects and see trends • Defect analyses help developers improve, not testers (a white-box metric)
Where now with test metrics? 2 • Testing metrics should be aligned with business metrics (more black box) • Business results/objectives/goals • Intermediate deliverables/goals • Risk • Looking forward to software use, not back into software construction • Need to present metrics in more accessible, graphical ways.
Suppose you were asked to carry fruit • How many apples could you carry? • I can carry 100 • How many oranges? • I can carry 80 • How many watermelons? • I can carry 7 • Assuming you have carrier bags, could you carry 40 apples, 25 oranges and 4 watermelons? • How would you work it out?
Can I carry the fruit? • If my carrying capacity is C • Weight of an apple is C/100 • Weight of an orange is C/80 • Weight of a watermelon is C/7 • So the total weight of the load is: $\frac{40C}{100} + \frac{25C}{80} + \frac{4C}{7} \approx 1.28C$ • No, I obviously can’t carry that load
Acceptable load • I don’t know what C is precisely, but that doesn’t matter • If the load factor (the total weight divided by C) is greater than one, I can’t carry the load • Let’s ignore C, then, and just worry about the acceptable load factor, L • L must be less than one
Acceptable load • If L > 1, let’s try to reduce it • Removing 1 watermelon makes L ≈ 1.14 • Removing 2 watermelons makes L ≈ 0.998 • I can now carry the reduced load (just) • I have a measure of the load (L) and a threshold of acceptability (less than one) • I know that removing the heaviest items will give the biggest improvement.
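The load factor and the reduction step can be sketched in Python. The sketch follows the slides and reads each capacity figure as "how many of this fruit alone would use up C". One item therefore weighs C divided by that figure, and C cancels out. Removing the heaviest remaining item first is one reasonable reading of the reduction step, not the only one. `load_factor` and `lighten` are illustrative names. `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# How many of each fruit alone would use up the whole carrying capacity C.
CAPACITY = {"apple": 100, "orange": 80, "watermelon": 7}

def load_factor(counts, capacity=CAPACITY):
    """L = total weight / C = sum(count / capacity); C itself cancels out."""
    return sum((Fraction(n, capacity[k]) for k, n in counts.items()), Fraction(0))

def lighten(counts, capacity=CAPACITY):
    """Greedily remove the heaviest remaining item until L < 1.

    Returns the reduced counts and the items removed, in order.
    """
    counts = dict(counts)  # work on a copy; don't mutate the caller's dict
    removed = []
    while load_factor(counts, capacity) >= 1:
        present = [k for k, n in counts.items() if n > 0]
        heaviest = min(present, key=lambda k: capacity[k])  # smallest capacity = heaviest
        counts[heaviest] -= 1
        removed.append(heaviest)
    return counts, removed

load = {"apple": 40, "orange": 25, "watermelon": 4}
print(f"L = {float(load_factor(load)):.2f}")              # L = 1.28
reduced, removed = lighten(load)
print(removed, f"L = {float(load_factor(reduced)):.3f}")  # ['watermelon', 'watermelon'] L = 0.998
```

The loop always terminates: each removal lowers L by a positive amount, and an empty load has L = 0.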
Suppose you were asked to accept a system? • How many low severity bugs could you afford? • I can accept 100 • How many medium? • I can accept 80 • How many high? • I can accept 7 • Could you accept 40 low, 25 medium and 4 high? • Could you work it out?
Can I afford (accept) the system with bugs? • If my “bug budget” is B • Cost of a LOW is B/100 • Cost of a MEDIUM is B/80 • Cost of a HIGH is B/7 • So the total cost of the bugs is: $\frac{40B}{100} + \frac{25B}{80} + \frac{4B}{7} \approx 1.28B$ • No, I obviously can’t accept those bugs
Acceptable bug cost • I don’t know what B is, but that doesn’t matter • If the total cost of the bugs is greater than B (a cost factor greater than one), I can’t accept the system • Let’s ignore B, then, and just worry about the bug COST factor, C • C must be less than one
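The same reasoning can be written in general form. The notation is introduced here for illustration. Let $n_s$ be the number of open bugs of severity $s$, and let $a_s$ be the number of bugs of that severity alone that would use up the whole budget B (100, 80 and 7 in the example). Each such bug then costs $B/a_s$, and dividing the total by B makes B cancel:

$$
C = \frac{1}{B}\sum_{s} n_s \frac{B}{a_s} = \sum_{s} \frac{n_s}{a_s}
  = \frac{40}{100} + \frac{25}{80} + \frac{4}{7} = \frac{719}{560} \approx 1.28
$$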
Calculating cost of bugs • If C > 1, let’s try to reduce it • Removing 1 HIGH makes C ≈ 1.14 • Removing 2 HIGHs makes C ≈ 0.998 • I can now accept the improved system (just) • I have a measure of the cost (C) and a threshold of acceptability (less than one) • I know that removing the HIGH-severity bugs will give the biggest improvement.
A useful metric for developers • Now, developers have a numeric score to drive their rework efforts • They can model different change strategies and predict an outcome • They can normalise the cost of each correction against the reduction in bug cost it achieves.
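Modelling change strategies and normalising correction cost might look like the Python sketch below. The tolerances come from the worked example. The bug IDs and effort estimates (in days) are invented for illustration, and "reduction in C per day of effort" is only one assumed way to normalise. `predict` returns the cost factor that would remain if a chosen set of bugs were fixed. `rank_fixes` orders bugs by value for effort.

```python
from fractions import Fraction

# Acceptable counts per severity, taken from the worked example.
TOLERANCE = {"LOW": 100, "MEDIUM": 80, "HIGH": 7}

def bug_cost(severity):
    """Fraction of the bug budget B consumed by one open bug of this severity."""
    return Fraction(1, TOLERANCE[severity])

def cost_factor(open_bugs):
    """Bug cost factor C for a list of (bug_id, severity, effort_days) tuples."""
    return sum((bug_cost(sev) for _, sev, _ in open_bugs), Fraction(0))

def predict(open_bugs, fix_ids):
    """Predicted C if the bugs whose ids are in fix_ids were corrected."""
    return cost_factor([b for b in open_bugs if b[0] not in fix_ids])

def rank_fixes(open_bugs):
    """Order bugs by cost reduction per day of fix effort, best value first."""
    return sorted(open_bugs, key=lambda b: bug_cost(b[1]) / b[2], reverse=True)

# Hypothetical open bugs with hypothetical effort estimates (days).
bugs = [
    ("BUG-1", "HIGH", 20),
    ("BUG-2", "MEDIUM", 1),
    ("BUG-3", "LOW", 1),
    ("BUG-4", "HIGH", 1),
]

print([b[0] for b in rank_fixes(bugs)])  # ['BUG-4', 'BUG-2', 'BUG-3', 'BUG-1']
print(f"C now: {float(cost_factor(bugs)):.3f}, "
      f"after fixing BUG-4: {float(predict(bugs, {'BUG-4'})):.3f}")
```

With these made-up numbers, the expensive HIGH fix (BUG-1) drops to last place because 20 days of effort buy only a small reduction in C. Whether that trade-off is acceptable remains a judgment the score informs rather than makes.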
A useful metric for testers • Bugs get a score that is finer-grained than three- or five-level severities • No need to worry about borderline cases, as the user can adjust the acceptability factor for bugs • Testers should focus on high-COST bugs • But not to the exclusion of lower-cost bugs.
Proposal • Why not assign THREE classifications: • Priority • Severity • Bug Cost* • And plot the cost of open bugs over time as well as the number of bugs?
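Tracking the cost of open bugs alongside their number can be sketched as follows. The weekly snapshots are invented for illustration, and the tolerances are reused from the worked example. A real version would read the counts from the defect tracker and plot both series rather than print them.

```python
from fractions import Fraction

# Acceptable counts per severity, from the worked example.
TOLERANCE = {"LOW": 100, "MEDIUM": 80, "HIGH": 7}

def cost_factor(open_counts):
    """Bug cost factor C = sum(count / tolerance) over severities."""
    return sum((Fraction(n, TOLERANCE[s]) for s, n in open_counts.items()), Fraction(0))

# Hypothetical weekly snapshots of open bugs by severity.
snapshots = [
    ("week 1", {"LOW": 40, "MEDIUM": 25, "HIGH": 4}),
    ("week 2", {"LOW": 45, "MEDIUM": 25, "HIGH": 2}),
    ("week 3", {"LOW": 60, "MEDIUM": 20, "HIGH": 1}),
]

for label, counts in snapshots:
    total = sum(counts.values())
    print(f"{label}: {total} open bugs, C = {float(cost_factor(counts)):.2f}")
```

In these made-up snapshots, the number of open bugs rises from 69 to 81 while C falls from about 1.28 to about 0.99. A chart of counts alone would hide that divergence.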