390 likes | 719 Views
Release Readiness. Peter Farrell-Vinay CSC. What is release readiness?. The state of the system at the moment when a release can be made to the field. How do we define release readiness?. When we test we are trying to answer a big question: Can we release this (system, feature, code)?
E N D
Release Readiness Peter Farrell-Vinay CSC
What is release readiness? The state of the system at the moment when a release can be made to the field
How do we define release readiness? When we test we are trying to answer a big question: • Can we release this (system, feature, code)? This question can be broken down into many sub-questions: • Have we tested all the code? • Have we tested all the features? • Have we found all the bugs we expected to find? • Have we fixed all the bugs we need to fix?
Release readiness - the simple answer We can release when: • Coverage is OK • all the code has been exercised • every feature has been shown to work • every use case scenario has been exercised • all the tests have been run • All the interfaces with connected systems work • Bugs are OK • the number of bugs found is almost the number expected • the unfixed bugs are few enough • Non-functional features are OK • Code turmoil is OK • Feature stability is OK
Code coverage Code coverage is easy to define and hard to judge. Find out if it’s good enough early by reviewing the unit tests.
Feature coverage Question: what is a feature? • A1: anything stakeholders say it is • A2: something users can use to do a job • A3 some collection of screens and screen icons which users use to fulfill some unique task
Feature definitions Feature definitions vary from the ludicrous (verbal assertions) through user stories, to the highly-structured (UML). Somewhere in between is text. Any definition is good if it lets you: • Measure: • how many there are • how big each is • how important each is to the user • Test them in any combination such as to exhibit bugs • Identify inconsistencies • Know the limits of coverage
Problems (you knew we’d get here sometime) The existence and extent of the feature depends on: • Who is defining it (management, users, sales staff, business analysts, tech. writers, and, er, testers?) • How they are defining it (text, UML, memoranda, backs of envelopes, user stories) • How it is expected to be used, and by whom Lehman:the existence of the tool changes the nature of the job
Types of coverage • Structure • GUI icon • Feature • Scenario • Transition • Web script, web page, application, and component
Test coverage by structure (code) • You need to be sure the developers have exercised some minimum part of the code using several approaches. • Code coverage has been the subject of much academic interest: there are lots of measures: • Lines • Decision points • Declaration-Use paths • Linear Code Sequence and Jumps • Sometimes you need to instrument your code to prove it‘s been exercised. Great hammers for great nails
Test coverage by GUI icon • The user interface has a number of screens, buttons, pull-downs, tabs, menus etc. Do we have them all listed, and with tests which exercise every one? • Objections: • Q: d’you know how many such icons there are in the application? A: if it’s that big, you need a list to be sure you’ve hit them all • Q: just pulling down icons, hitting buttons and entering text in fields doesn’t exercise the system. A: true but if any of those don’t work you need to know asap.
Test coverage by feature • Do we have a test (set) for every feature plus installation, deinstallation, start-up and shut-down? • Can we decompose every feature into testable sub-bits? Has every one got a test? Does every test include at least one negative case? Objection: • No we don’t have any spec. worthy of the name nor any time to write it.
Test coverage by scenario Users have goals they want to achieve. • They achieve them using a number of (parts of) features. This sets up subtle feature interactions which no other coverage approach will mimic. • Problems: • P: The system is new and the users don’t know how they will use it. S: use a model office or a simulator. Build a process model and use that to identify fragments against which you can write test fragments. Then recombine those fragments into scenarios • P: The release adds new features - have we got to test all the old ones as well? S: What is the risk if they interact in unexpected ways and you haven’t tested all these ways? (aka “yes”) • P: User management doesn’t want us to talk to users. S: Have management define a manager as a model user and put him or her in the model office.
Subtle feature interactions • Window A has field A1 • Window B has fields B1 Enter data in Window A and it can affect or be affected by the data in Window B. Example: A1 = an angle whose tangent is used by field B1 thus: TAN(A1 * PI()/100) [in Excel notation] Which is fine until A1 = 270 degrees when it becomes undefined and is used as a divisor for B1.
Test coverage by transition • Web and conventional applications have “paths” a user may take through the system to achieve a goal. • Identify the paths in the form of a state transition diagram (typically from URL to URL in the case of a web test) such that a minimum number of paths can be identified and traversed. • Objections: • O: Far too many paths to do this. R: What’s the risk of something going wrong? Model your paths at whatever level you find useful to test against • O: The whole point of a web app. is to be agile. R:It doesn’t matter how agile your development is, if users click out because the app.’s unusable, because you haven’t tested all the paths. If the app is well-made there won’t be many paths.
More questions • Question: where does a feature start and end? • Question: how d’you measure the size of a feature? (See the IFPUG counting practices manual) • Question: the number of states the system can be in is astronomic - how do we decide what sets of variables to use? • Question: are there coverage questions in user interface testing?
Test coverage by web script, web page, application, and component Web sites are built from bits. • Testing each bit returns us to an older, component-based test model • Having identified the risk level of the web site, decide the level of coverage of each component. It doesn’t matter how stable the web site is if the user experience is vile
Conclusions Coverage must: • answer a question conclusively • have a baseline, • be user-, and business-relevant at some level Coverage is very definition-dependant: • Invent a new coverage type and every manager will want to know why you haven’t used it. • If the features are ill-defined, the coverage will be. Coverage remains a major question
Release readiness - the simple answer We can release when: • Coverage is OK • all the code has been exercised • every feature has been shown to work • every use case scenario has been exercised • all the tests have been run • Bugs are OK • the number of bugs found is almost the number expected • the unfixed bugs are few enough • Non-functional features are OK • Code turmoil is OK • Feature stability is OK
Release readiness - how many bugs should we find? Two possibilities: • You have never released anything before - you have no history, this is the first release the organisation has ever made. • You have made several releases
Release readiness - you have never released anything before Five possibilities: • Crude estimate No of bugs = Lines Of Executable Code/125 • Bug seeding (see the handout for details) • Use Weibull and match the curves - you’ll need the crude estimate for this (see the handout for details) • Ask the testers - when did they last find a showstopper? • Look at the bug curve - is it flattening out yet?
Release readiness - you have made several releases before • As before • Crude estimate • Bug seeding • Weibull • Ask the testers • Look at the bug curve • Bug density • Twin test team model (Remus and Zilles) • Multiple regression • Non-functional features are OK • Code turmoil is OK • Feature stability is OK • Build quality is OK
Bug density • Bug density as measured from your last n releases: density = bugs/KLOC • But beware: • include both testing and field bugs (you won’t always know them) • Bug densities vary as the developers get better or worse. • Big bugs may stay hidden a long time • Bug density is language-specific • Bug density may increase if code gets entropic because developers introduce more bugs than they fix (infinite defect loop)
Twin test team model • Remus and Zilles • Two test teams test the same application • Some of the bugs they find are duplicates • Exploit the duplication to discover the total number of bugs
Multiple regression - 1 • Assumes: • you have made a number of releases • you have access to the history files of code changes so for each release you can identify the number of inserts, updates and deletes to the code • the development environment and developers remain the same for all releases.
Multiple regression - 2 • Method • Get the number of inserts, updates and deletes to the code for each release • Use the multiple regression equations to derive an equation relating inserts, updates and deletes to the number of bugs found • Identify the degree of error the equation has • Plug in the numbers number of inserts, updates and deletes for the new release and get the number of bugs (within that degree of error).
Multiple regression - 3 • Caveats • Estimate the future growth of inserts, updates and deletes before the release. • Multiple Regression is sensitive to extremes in inputs. • User reported bugs may increase - recalibrate your model. • Use your history to identify the proportion bugs in each severity class. • An high number of deletes indicates a reduced number of bugs.
Code Turmoil The changes over time to inserts, deletes and updates. If fixing bugs continues to provoke much turmoil then you’re not ready to release. The example chart shows the progress of system testing of a system over 4 months
Feature stability • In release n some feature is bug-free in release n+1 it isn’t. • What’s going on? • This chart can warn you that the software is suffering from the infinite defect loop syndrome • Create it using a spreadsheet drawn from the test management tool
Build quality • As builds are made and tested track the severity of the total bugs on a spreadsheet. • This example says quite clearly Don’t release - we’re still finding severity 1 bugs (13) and severity 2 bugs (48)
What is going on? Are we creating more bugs than we find or are we just testing better?
If not now, when? • Are we on track? • Will we ever be on track?
Four ways to fail • Hitting the wall before release • 90% done • Endless testing (aka infinite defect loop) • Version 2 See [Mangione]
Four outcomes • Release now • Release later • Restructure project • Abandon project The decision to release can make or break a small company If you have a lot of pointers all saying No, then don’t do it. Saying No is part of our job. It’s what we’re paid and trusted to do. Keeping track of things tells us when we can say Yes.