Testing
Testing, the necessity
Test early, and test often!
• Cost of software failure.
• Effort needed to locate and fix bugs: a lot, in particular if they occur in the later stages of your development: 0.5 – 3.5 hours at the “unit” level, 6 – 10× more at the “system” level.
Terminology
• Test object = the thing you test. Also called the SUT (system under test), CUT (class or component under test), …
• Each “test” is also called a “test-case”: a single interaction, or a sequence of interactions, with the test object, during which we check whether the test object satisfies one or more expectations. (You will have to provide inputs for those interactions!)
• A test-suite = a set of test-cases.
Typical setups
• Test Suite <<use>> Test Object: if the TO can be interacted with directly by the TS.
• Test Suite <<use>> Test Interface → Test Object: otherwise we have to go through a test interface (TI).
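A minimal Java sketch of the second setup, under stated assumptions: the test object is a hypothetical `CommandLineCounter` that only accepts text commands, so the test suite goes through a hypothetical `CounterTestInterface` that offers typed operations and translates them into commands.

```java
// Hypothetical test object: it only understands text commands, so a
// test suite cannot call typed operations on it directly.
class CommandLineCounter {
    private int count = 0;

    String execute(String command) {
        switch (command) {
            case "inc":
                count++;
                return "ok";
            case "get":
                return Integer.toString(count);
            default:
                return "error: unknown command " + command;
        }
    }
}

// Test interface: offers typed operations and translates them into
// the command protocol of the test object.
class CounterTestInterface {
    private final CommandLineCounter testObject;

    CounterTestInterface(CommandLineCounter testObject) {
        this.testObject = testObject;
    }

    void increment() {
        String reply = testObject.execute("inc");
        if (!reply.equals("ok")) {
            throw new IllegalStateException("unexpected reply: " + reply);
        }
    }

    int value() {
        return Integer.parseInt(testObject.execute("get"));
    }
}

// Test suite: it only talks to the test interface.
class CounterTestSuite {
    static void testTwoIncrements() {
        CounterTestInterface ti = new CounterTestInterface(new CommandLineCounter());
        ti.increment();
        ti.increment();
        int v = ti.value();
        if (v != 2) {
            throw new AssertionError("expected 2, got " + v);
        }
    }
}
```

The test suite never sees the command strings; if the test object's protocol changes, only the test interface has to be adapted.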
Example, testing the function sqrt(x : double) : double
These could be our test-cases:
test1() { r = sqrt(4) ; assert r == 2 }   // the assert implements our expectation
test2() { r = sqrt(0) ; assert r == 0 }
(Exact equality is actually too strong when dealing with non-integral numbers. Weaken the expectations to allow a small degree of inaccuracy, e.g. assert |r - 2| < epsilon, for some pre-specified, small epsilon.)
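The weakened expectation can be sketched in Java as follows. `Math.sqrt` stands in for the function under test, and the tolerance `EPSILON = 1e-9` is an arbitrary choice; both are assumptions, not part of the slide.

```java
// Test-cases for a square-root function with a tolerance-based expectation.
class SqrtTests {
    // Pre-specified, small tolerance (the value is an arbitrary choice).
    static final double EPSILON = 1e-9;

    // Weakened expectation: |actual - expected| < EPSILON instead of ==.
    static void assertClose(double expected, double actual) {
        if (Math.abs(actual - expected) >= EPSILON) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    static void test1() { assertClose(2.0, Math.sqrt(4)); }

    static void test2() { assertClose(0.0, Math.sqrt(0)); }

    // sqrt(2) is irrational: an exact == against any decimal literal would be fragile.
    static void test3() { assertClose(1.41421356237, Math.sqrt(2)); }
}
```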
Testing objects
• Is a bit different, because an object has operations that may influence each other through the object’s state.
• An object is interacted with through a sequence of calls to its operations.
• So, it is natural to use such a sequence as a test-case to test an object.
• What do you want to verify in your test-case?
  • Post-conditions of the operations of Person.
  • Person’s class invariant.
(Class Person: - credit : int ; + buy(product) ; + getMoreCredit(e : Euro))
Testing objects is a bit different
(Class Person: credit ; buy(product) ; getMoreCredit(euro))
class MyTestClass {
  test1() {
    x = new Person("Bob")
    x.getMoreCredit(10)
    c0 = x.credit
    x.buy(apple)
    assert x.credit == c0 - apple.price()   // post-condition of buy
    assert x.credit >= 0                     // class invariant
  }
  test2() { … }
}
Issue-1: you may have to test with respect to different interaction sequences, e.g. how about getMoreCredit ; buy ; getMoreCredit, or getMoreCredit ; getMoreCredit ; buy?
Issue-2a: subclassing. We need to test that Person does not violate Liskov (the Liskov substitution principle).
Issue-2b: we can’t be sure that e.g. every subclass of Product respects Liskov; this may break your class Person. We can still test Person by calling buy with instances of subclasses of Product. Unfortunately this explodes the cost.
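A sketch of testing different interaction sequences on Person, checking the post-condition of each call and the class invariant after every step. The slides do not give Person's code: the classes below, the rate of 10 credits per euro and the apple price of 2 credits are assumptions (chosen so that getMoreCredit(10) followed by buying an apple leaves 98 credits, as in the JUnit example).

```java
import java.util.List;

// Hypothetical classes: the slides do not show the code of Person or Product.
class Product {
    private final int price;

    Product(int price) { this.price = price; }

    int price() { return price; }
}

class Apple extends Product {
    Apple() { super(2); } // assumption: an apple costs 2 credits
}

class Person {
    private final String name;
    private int credit = 0;

    Person(String name) { this.name = name; }

    int getCredit() { return credit; }

    // Assumption: 1 euro buys 10 credits.
    void getMoreCredit(int euro) { credit += 10 * euro; }

    void buy(Product p) {
        if (p.price() > credit) {
            throw new IllegalStateException("not enough credit");
        }
        credit -= p.price();
    }
}

class PersonSequenceTests {
    // Class invariant of Person.
    static void checkInvariant(Person x) {
        if (x.getCredit() < 0) throw new AssertionError("invariant: credit < 0");
    }

    // Runs one interaction sequence on a fresh Person, checking the
    // post-condition of every call and the class invariant after every step.
    static void runSequence(List<String> operations) {
        Person x = new Person("Bob");
        Product apple = new Apple();
        for (String op : operations) {
            int c0 = x.getCredit();
            if (op.equals("getMoreCredit")) {
                x.getMoreCredit(10);
                if (x.getCredit() != c0 + 100) throw new AssertionError("post of getMoreCredit");
            } else if (op.equals("buy")) {
                x.buy(apple);
                if (x.getCredit() != c0 - apple.price()) throw new AssertionError("post of buy");
            } else {
                throw new IllegalArgumentException("unknown operation: " + op);
            }
            checkInvariant(x);
        }
    }
}
```

Each sequence from Issue-1 becomes one call, e.g. `runSequence(Arrays.asList("getMoreCredit", "buy", "getMoreCredit"))`.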
Well, while we are talking about testing… writing a test-class using JUnit:
import org.junit.* ;
import static org.junit.Assert.* ;
public class PersonTest {
  @Test
  public void test0() {
    System.out.print("** testing if the initial credit is ok…") ;
    Person P = new Person("Bob") ;
    assertEquals(0, P.getCredit()) ;
    System.out.println("pass") ;
  }
  @Test
  public void test2() {
    ….
    Person P = new Person("Bob") ;
    P.getMoreCredit(10) ;
    Product a = new Apple() ;
    P.buy(a) ;
    assertEquals(98, P.getCredit()) ;
    …
  }
}
How to determine which test-cases to give?
We can try to just choose the inputs randomly. This is not very systematic, and too expensive if each test-case has to be hand-crafted (which is usually the case in practice).
Idea: to test systematically, divide the “input domain” of the SUT into (disjoint) “partitions”.
Hypothesis: the SUT behaves “equivalently” on inputs from the same partition. Therefore it is sufficient to cover each partition once.
This is also called partition-based testing. Easy to do, and quite effective in practice.
Example: isAdult(person) { … }
• Proposed partitions:
  • persons older than 17 yrs
  • persons 17 yrs or younger
  • invalid persons
Test cases, e.g.:
tc1: test with 1x person aged 40y
tc2: test with 1x person aged 4y
tc3: test with person = null
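The three partitions translate into one test-case each. In this sketch, a `Person` with an `age` field and the choice to reject a null person with `IllegalArgumentException` are assumptions; the slide leaves the body of isAdult open.

```java
// Hypothetical minimal Person: only the age matters for isAdult.
class Person {
    final int age;

    Person(int age) { this.age = age; }
}

class Adults {
    // Assumption: an invalid (null) person is rejected with an exception.
    static boolean isAdult(Person person) {
        if (person == null) throw new IllegalArgumentException("person is null");
        return person.age > 17;
    }
}

// One test-case per partition: older than 17, 17 or younger, invalid.
class IsAdultTests {
    static void tc1() { check(Adults.isAdult(new Person(40)), "40y should be adult"); }

    static void tc2() { check(!Adults.isAdult(new Person(4)), "4y should not be adult"); }

    static void tc3() {
        try {
            Adults.isAdult(null);
            throw new AssertionError("null should be rejected");
        } catch (IllegalArgumentException expected) {
            // expected: invalid persons are rejected
        }
    }

    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```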
Using a “classification tree”: hierarchical partitioning
Student.addGrade(grade) actually has two parameters: the student that receives the operation, and the grade you pass to it. Each becomes a “category” in the tree:
• Student: Invalid | Bachelor | Master
• Grade: NotSufficient (< 5.0 | 5.0 ≤ c < 5.5) | Sufficient | Invalid (< 0 | > 10)
When you divide a partition P into sub-partitions X, Y, Z, … the sub-partitions must be disjoint. The lowest-level partitions are called “classes”. You can now for example require that each class of each category is covered at least once.
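CTE (Classification Tree Editor): a graphical tool to manually specify the combinations you want
[Figure: the classification tree for Student (Invalid | Bachelor | Master) and Grade (Insufficient: < 5.0 | 5.0 ≤ c < 5.5; Sufficient: 5.5 ≤ c ≤ 6.0 | > 6.0; Invalid: < 0 | > 10), with one row per test-case TC1, TC2, TC3, … marking the classes it combines.]
These would fully cover the combinations of Invalid student and Invalid grade; the rest are minimally covered.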
Combinatoric testing
• We can try to test all possible combinations of the student’s and the grade’s classes.
• This is called the ‘full combinations set’.
• It can generate a lot of test-cases: N = #Student × #Grade, where #C is the number of “classes” in category C (for the tree above: 3 × 5 = 15).
• It explodes if you have more categories (imagine hundreds of thousands of test-cases!)
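A sketch of generating the full combinations set as a cartesian product over any number of categories. The class names are the lowest-level classes of the Student and Grade tree (Grade's Sufficient treated as a single class), giving 3 × 5 = 15 combinations; every extra category multiplies the count again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class Combinations {
    // Lowest-level classes of each category, read off the classification tree.
    static final List<String> STUDENT = Arrays.asList("Invalid", "Bachelor", "Master");
    static final List<String> GRADE =
        Arrays.asList("< 5.0", "5.0 <= c < 5.5", "Sufficient", "< 0", "> 10");

    // Full combinations set: the cartesian product of all categories.
    static List<List<String>> fullCombinations(List<List<String>> categories) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start from one empty combination
        for (List<String> category : categories) {
            List<List<String>> extended = new ArrayList<>();
            for (List<String> partial : result) {
                for (String cls : category) {
                    List<String> next = new ArrayList<>(partial);
                    next.add(cls);
                    extended.add(next);
                }
            }
            result = extended;
        }
        return result;
    }
}
```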
Combinatoric testing
• We can try the “minimal combinations set”: the smallest set that contains each class at least once.
• It generates few test-cases: N = max(#Student, #Grade) (for the tree above: max(3, 5) = 5).
• It could be too few!
• Otherwise, some approaches let you declaratively specify the combinations you want, e.g. something like:
  (Student.Invalid /\ Grade.Invalid)
  (Master /\ NotSufficient)
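A sketch of one way to build a minimal combinations set: pair the i-th class of each category and cycle through the shorter one. This pairing strategy is an assumption; any set of max(#Student, #Grade) combinations in which every class occurs is equally valid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MinimalCombinations {
    static final List<String> STUDENT = Arrays.asList("Invalid", "Bachelor", "Master");
    static final List<String> GRADE =
        Arrays.asList("< 5.0", "5.0 <= c < 5.5", "Sufficient", "< 0", "> 10");

    // Pairs the i-th class of each category, cycling through the shorter
    // one, so that max(#a, #b) combinations contain every class at least once.
    static List<String[]> minimalCombinations(List<String> a, List<String> b) {
        int n = Math.max(a.size(), b.size());
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            result.add(new String[] {a.get(i % a.size()), b.get(i % b.size())});
        }
        return result;
    }

    // Checks the defining property: every class of the given column occurs somewhere.
    static boolean coversAll(List<String[]> combos, List<String> classes, int column) {
        Set<String> seen = new HashSet<>();
        for (String[] combo : combos) seen.add(combo[column]);
        return seen.containsAll(classes);
    }
}
```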
Boundary testing
[Figure: the Grade.Insufficient partitions < 5.0 and 5.0 ≤ c < 5.5, first without specifically trying to pick boundary values, then with boundaries added: a low boundary (LB), a middle value (M) and a high boundary (HB) per partition.]
Concrete test inputs: 5.0, 5.2, 5.49 for 5.0 ≤ c < 5.5, and 0.0, 4.5, 4.99 for < 5.0.
Hypothesis: faults are often made at the boundaries of your input domains. Therefore, also test boundary values. More thorough… but it can explode your number of combinations.
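Picking LB, M and HB for a half-open partition low ≤ c < high can be sketched as below. The precision of 0.01 (grades with two decimals) is an assumption; it determines the highest value still inside the partition, e.g. 5.49 for 5.0 ≤ c < 5.5. The sketch takes the exact midpoint as M, whereas the slide picks 5.2; any interior value serves.

```java
class BoundaryValues {
    // For a half-open partition [low, high) with the given input precision,
    // returns {LB, M, HB}: the lowest value, a middle value, and the highest
    // value that still lies inside the partition.
    static double[] forHalfOpen(double low, double high, double precision) {
        double lb = low;
        double m = (low + high) / 2;
        double hb = high - precision;
        return new double[] {lb, m, hb};
    }
}
```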
Positive and negative test
• Positive test: test the test object with valid inputs.
• Negative test: test it with invalid inputs. This is usually done to check the test object’s error-handling mechanism. It may not be relevant when testing unit-level functions or classes, but you must do it when testing a system, e.g. to make sure that it does not crash on invalid inputs.
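A sketch of a positive and a negative test for a hypothetical `Student.addGrade` that accepts grades in [0, 10] and rejects anything else. The range follows the classification tree; the use of `IllegalArgumentException` is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Student whose addGrade rejects grades outside [0, 10].
class Student {
    private final List<Double> grades = new ArrayList<>();

    void addGrade(double grade) {
        if (grade < 0 || grade > 10) {
            throw new IllegalArgumentException("grade out of range: " + grade);
        }
        grades.add(grade);
    }

    List<Double> getGrades() { return grades; }
}

class AddGradeTests {
    // Positive test: a valid grade is accepted and recorded.
    static void positiveTest() {
        Student s = new Student();
        s.addGrade(7.5);
        if (!s.getGrades().contains(7.5)) throw new AssertionError("valid grade not recorded");
    }

    // Negative test: an invalid grade must be rejected cleanly,
    // without corrupting the student's state.
    static void negativeTest(double invalidGrade) {
        Student s = new Student();
        try {
            s.addGrade(invalidGrade);
            throw new AssertionError("invalid grade was accepted: " + invalidGrade);
        } catch (IllegalArgumentException expected) {
            if (!s.getGrades().isEmpty()) throw new AssertionError("state corrupted");
        }
    }
}
```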
Using concrete values as expectations
test2() {
  Person P = new Person("Bob") ;
  P.getMoreCredit(10) ;
  Product a = new Apple() ;
  P.buy(a) ;
  assertTrue(P.getCredit() == 98) ;
}
• Comparisons to concrete values are often used to express test expectations, but this has drawbacks:
  • You have to calculate the values by hand, and this can be non-trivial.
  • If the business logic changes, you have to recalculate them → maintenance issue.
Property-based testing, more robust
(Model: a Person, with operation addEmail(email), has * emails; each Email has 0..1 owner.)
test() {
  p = new Person("BOB")
  e = new Email("BOB@BOB.NET")
  p.addEmail(e)
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}
But sometimes your expectation can be generalized to a “property”, which is parametric:
prop_personEmail(p,e) {
  assert p.getEmails().contains(e)
  assert e.getOwner() == p
}
The two asserts in test() can then be replaced by the call prop_personEmail(p,e).
“Properties” are much more robust for maintenance. Furthermore, you can now use generators to generate your test-inputs, since you will re-use the same properties as expectations.
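A sketch of the property combined with a simple random input generator, using `java.util.Random` with a fixed seed so that failures can be reproduced. The `Person` and `Email` classes are minimal assumptions matching the model (addEmail also sets the owner); a dedicated property-based testing library would add features such as shrinking, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical classes matching the model: Person has * emails, Email has 0..1 owner.
class Email {
    private final String address;
    private Person owner;

    Email(String address) { this.address = address; }

    Person getOwner() { return owner; }

    void setOwner(Person p) { owner = p; }
}

class Person {
    private final String name;
    private final List<Email> emails = new ArrayList<>();

    Person(String name) { this.name = name; }

    List<Email> getEmails() { return emails; }

    void addEmail(Email e) {
        emails.add(e);
        e.setOwner(this);
    }
}

class PersonEmailProperties {
    // The property: after p.addEmail(e), p has e and e points back to p.
    static void propPersonEmail(Person p, Email e) {
        if (!p.getEmails().contains(e)) throw new AssertionError("email not added");
        if (e.getOwner() != p) throw new AssertionError("owner not set");
    }

    // Generates random inputs; the same property is the expectation for all of them.
    static void runRandomTests(int count, long seed) {
        Random rnd = new Random(seed);
        for (int i = 0; i < count; i++) {
            Person p = new Person("P" + rnd.nextInt(1000));
            int existing = rnd.nextInt(4); // 0..3 pre-existing emails
            for (int j = 0; j < existing; j++) {
                p.addEmail(new Email("old" + j + "@example.net"));
            }
            Email e = new Email("U" + rnd.nextInt(1000) + "@example.net");
            p.addEmail(e);
            propPersonEmail(p, e);
        }
    }
}
```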
Your OCL specs can be converted to “test properties”
(Model as before: a Person, with addEmail(email), has * emails; each Email has 0..1 owner.)
OCL specification:
context p : Person inv : p.emails->forAll(e | e.owner = p)
context p : Person :: addEmail(e : Email)
  pre : e <> null
  post : p.emails->includes(e)
Corresponding test properties:
Person_classinv(p) {
  for (Email e : p.getEmails()) assert e.getOwner() == p
}
addMail_spec(p,e) {
  if (e == null) throw IllegalArg
  r = p.addEmail(e)
  assert p.getEmails().contains(e)
  return r
}
A test case now looks like this:
test0() {
  p = …
  e = …
  addMail_spec(p,e)
  Person_classinv(p)
}
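The same translation in Java, as a sketch: the OCL invariant becomes a loop over the emails, the pre-condition an `IllegalArgumentException`, and the post-condition an assertion after the call. Fields are package-private for brevity, and addEmail returns nothing here (unlike r on the slide); both are simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model classes.
class Email {
    Person owner;
}

class Person {
    final List<Email> emails = new ArrayList<>();

    void addEmail(Email e) {
        emails.add(e);
        e.owner = this;
    }
}

class PersonSpecs {
    // inv: p.emails->forAll(e | e.owner = p)
    static void personClassInv(Person p) {
        for (Email e : p.emails) {
            if (e.owner != p) throw new AssertionError("class invariant violated");
        }
    }

    // pre: e <> null ; post: p.emails->includes(e)
    static void addEmailSpec(Person p, Email e) {
        if (e == null) throw new IllegalArgumentException("pre-condition violated: e is null");
        p.addEmail(e);
        if (!p.emails.contains(e)) throw new AssertionError("post-condition violated");
    }

    // A test-case: call the operation through its spec, then check the invariant.
    static void test0() {
        Person p = new Person();
        Email e = new Email();
        addEmailSpec(p, e);
        personClassInv(p);
    }
}
```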
Concrete expectations vs property-based
Property-based:
+ Makes automated testing much easier.
+ Properties are more robust.
- The completeness of your set of properties determines the strength of your test, but it is not always easy to write a complete set of properties.
Concrete expectations:
- Cannot be automated (every expected value has to be calculated by hand).
- Not robust.
+ You don’t need to formalize or implement any specification.
Coverage
Because your “resources” are limited, you can’t test all possible behavior; you have to decide when it is enough. A pragmatic approach is to define a quantitative goal, like:
• Every method of every class C must be exercised.
• Every line in every method of a class C must be exercised.
• Every partition in the classification tree of method m must be tried.
Coverage: how much (in %) of such a goal is accomplished. Too little coverage implies you are not done yet. Full coverage (100%) gives you ground to stop testing, but it does not imply correctness.
Code-based coverage
P(x,y) {
  if (even(x)) x = x/2
  else x--
  if (even(y)) y = y/2
  else y--
  return x+y
}
Abstractly: each if is a decision point with two decision branches: even(x) / not even(x), and even(y) / not even(y).
• Line coverage (previous slide)
• Decision coverage: all decision branches are exercised
• Path coverage: all possible “execution paths” are exercised
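A sketch that instruments P(x,y) by hand to record which branch each decision takes, and then computes decision and path coverage in %. Real projects would use a coverage tool instead; the manual recording is only for illustration. Two inputs with x and y both even, then both odd, already reach 100% decision coverage but only 50% path coverage.

```java
import java.util.HashSet;
import java.util.Set;

class CoverageDemo {
    // Branches and paths observed so far (manual instrumentation).
    static final Set<String> branches = new HashSet<>();
    static final Set<String> paths = new HashSet<>();

    static boolean even(int n) { return n % 2 == 0; }

    // P(x,y) from the slide; "T"/"F" records the outcome of each decision.
    static int p(int x, int y) {
        String path;
        if (even(x)) { x = x / 2; path = "Tx"; } else { x--; path = "Fx"; }
        if (even(y)) { y = y / 2; path += "Ty"; } else { y--; path += "Fy"; }
        branches.add(path.substring(0, 2)); // branch taken at the x-decision
        branches.add(path.substring(2));    // branch taken at the y-decision
        paths.add(path);
        return x + y;
    }

    static double decisionCoverage() { return 100.0 * branches.size() / 4; } // 4 branches
    static double pathCoverage() { return 100.0 * paths.size() / 4; }         // 4 paths

    static void reset() { branches.clear(); paths.clear(); }
}
```

Adding the inputs p(2, 3) and p(3, 2) brings path coverage to 100% as well, which illustrates why the stronger criterion costs more test-cases.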
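Coverage strength
Coverage criteria differ in “strength”: criterion A is stronger than criterion B if full coverage w.r.t. A implies full coverage w.r.t. B. Path coverage is stronger than decision coverage, and decision coverage is stronger than line coverage. But a stronger criterion typically means you need more test-cases → more cost. For the previous example: 2 test-cases suffice for full decision coverage (x and y both even, then both odd), while full path coverage needs 4 (all even/odd combinations of x and y).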
Path-based coverage
[Figure: a control-flow graph with nodes 0, 1, 2, 3 and edges a, b, c, d, e.]
Path coverage is strong, but unfortunately the number of paths to cover can explode exponentially, or even be infinite if you have a loop or recursion.
A more practical solution: pair-wise path coverage. The goal is to cover all execution subpaths of length 2. Example: the subpaths of length 2 in the graph are ae, bc, bd, ce, da, db.
Pair-wise path vs decision coverage
[Figure: the same graph as on the previous slide.]
2 test-cases can already give full decision coverage, e.g. the executions bce and bdae.
Subpaths of length 2: ae, bc, bd, ce, da, db. Fully covering them requires 3 test-cases (the 2 test-cases above still miss the subpath db): bce, bdae, bdbce.
Pair-wise path coverage can be generalized to k-wise path coverage; it is stronger for bigger k.
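A sketch that derives the required length-2 subpaths from an edge list and reports which are still missing. The edge list is one reading of the figure, an assumption: b: 0→1, d: 1→0, a: 0→2, c: 1→2, e: 2→3; it reproduces exactly the subpaths ae, bc, bd, ce, da, db listed above. For {bce, bdae} the only missing subpath is db; adding bdbce leaves nothing missing.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PairwisePathCoverage {
    // One reading of the slide's graph (assumption): edge name -> {from, to}.
    static final Map<Character, int[]> EDGES = new LinkedHashMap<>();
    static {
        EDGES.put('a', new int[] {0, 2});
        EDGES.put('b', new int[] {0, 1});
        EDGES.put('c', new int[] {1, 2});
        EDGES.put('d', new int[] {1, 0});
        EDGES.put('e', new int[] {2, 3});
    }

    // All subpaths of length 2: edge x followed by edge y where x ends where y starts.
    static Set<String> requiredPairs() {
        Set<String> pairs = new TreeSet<>();
        for (Map.Entry<Character, int[]> x : EDGES.entrySet()) {
            for (Map.Entry<Character, int[]> y : EDGES.entrySet()) {
                if (x.getValue()[1] == y.getValue()[0]) {
                    pairs.add("" + x.getKey() + y.getKey());
                }
            }
        }
        return pairs;
    }

    // Length-2 subpaths exercised by the executions (each a string of edge names).
    static Set<String> coveredPairs(List<String> executions) {
        Set<String> covered = new TreeSet<>();
        for (String run : executions) {
            for (int i = 0; i + 1 < run.length(); i++) {
                covered.add(run.substring(i, i + 2));
            }
        }
        return covered;
    }

    // Required subpaths that no execution has covered yet.
    static Set<String> missing(List<String> executions) {
        Set<String> result = new TreeSet<>(requiredPairs());
        result.removeAll(coveredPairs(executions));
        return result;
    }
}
```

Using a sliding window of length k instead of 2 in both methods gives k-wise path coverage.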
Testing your candy machine
(Class CandyMachine: insertCoin(c), turnCrank(), getCandy())
• In this case, the order in which we call the operations matters a lot.
• Most sequences are invalid; how do we make sure that we sufficiently cover the valid sequences?
• Use your “state machine model” as guidance.
The model of our candy machine
[Figure: state machine with states no coin, has coin, sold, bonus and empty; transitions labelled insert coin [1eur], insert coin [else] / [coin is ejected], turn crank / [counter is updated], turn crank [n>1] / [counter is updated], get candy [not empty] and get candy [empty].]
Model-based testing
[Figure: the candy machine model from the previous slide.]
• The model specifies which paths are valid paths. Use it as a guide in determining which paths to use as test-cases.
• You can even automatically generate the test-cases (without the expectations)!
• Expectations: what to check?
• Which coverage criterion do you take? E.g.:
  • k-wise path
  • all paths from the start up to depth k
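A sketch of generating test-case skeletons from the model: all label sequences from the start state up to depth k. The transition table is a simplified reading of the diagram, an assumption: the bonus state is left out because its transitions cannot be recovered reliably, and guards are kept only as part of the labels. The generated paths carry no expectations yet; for k = 3 this reading yields 10 paths.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CandyMachineModel {
    enum State { NO_COIN, HAS_COIN, SOLD, EMPTY }

    // One transition of the model: from --label--> to.
    static final class Transition {
        final State from;
        final String label;
        final State to;

        Transition(State from, String label, State to) {
            this.from = from;
            this.label = label;
            this.to = to;
        }
    }

    // Simplified reading of the diagram (assumption): the bonus state is omitted.
    static final List<Transition> TRANSITIONS = Arrays.asList(
        new Transition(State.NO_COIN, "insertCoin[1eur]", State.HAS_COIN),
        new Transition(State.NO_COIN, "insertCoin[else]", State.NO_COIN), // coin is ejected
        new Transition(State.HAS_COIN, "turnCrank", State.SOLD),
        new Transition(State.SOLD, "getCandy[not empty]", State.NO_COIN),
        new Transition(State.SOLD, "getCandy[empty]", State.EMPTY));

    // All paths (as label sequences) from the start state with length 1..k.
    static List<List<String>> pathsUpToDepth(int k) {
        List<List<String>> result = new ArrayList<>();
        extend(State.NO_COIN, new ArrayList<>(), k, result);
        return result;
    }

    // Depth-first extension of `prefix` with every transition leaving `current`.
    private static void extend(State current, List<String> prefix, int remaining,
                               List<List<String>> result) {
        if (remaining == 0) return;
        for (Transition t : TRANSITIONS) {
            if (t.from != current) continue;
            List<String> path = new ArrayList<>(prefix);
            path.add(t.label);
            result.add(path);
            extend(t.to, path, remaining - 1, result);
        }
    }
}
```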
Commonly used overall “testing strategy”: the V-model
[Figure: V-model. Left leg, from top to bottom: requirements (requirement documents + use cases) → prepare acceptance testing; detailed requirements → prepare system testing; design (analysis + design models) → prepare integration testing; detailed design → prepare unit testing; implementation at the bottom. The right leg climbs back up through the corresponding tests. Acceptance testing is done by the project owner, or delegated to a 3rd party; development testing is done by the developers.]
The “preparations” can be done in parallel with development. The actual testing must wait until we have an implementation.
Fitting the V in an iterative SDLC
E.g. the Unified Process (UP), with phases Inception, Elaboration, Construction, Transition, and UP’s core workflows: requirement, analysis, design, implementation, test.
[Figure: along the time axis, each iteration (1, 2, 3, 4, 5, ...) runs its own small V.]
Regression test
• During and after development (maintenance) you modify your software:
  • advancing to the next iteration
  • bug fixes, new features, refactoring
• Regression test: to test that the new version does not introduce any new error with respect to the unchanged part of the specification, by re-executing old test-cases (those that are still relevant).
• Problem: you may have accumulated a huge set of test-cases. Executing them all (TESTALL) may take a long time... Solution: apply a selection strategy; but you will need to invest in an infrastructure to facilitate this.
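A minimal sketch of one possible selection strategy, assuming the infrastructure records which modules each old test-case exercised (the map passed in is that hypothetical record): re-run only the test-cases that touch a changed module. This is a simplification; a change can also break test-cases that reach the changed module only indirectly, which such a record may not capture.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RegressionSelection {
    // Selects the old test-cases that exercised at least one changed module.
    // modulesPerTest: for each test-case, the modules it touched when it last ran.
    static List<String> select(Map<String, Set<String>> modulesPerTest, Set<String> changed) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : modulesPerTest.entrySet()) {
            if (!Collections.disjoint(entry.getValue(), changed)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```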
Performance testing
Goal: to see how your application reacts to increasing workload. Not to be underestimated!
Typical setup: a “virtual user” is a program that simulates a user interacting with the app. By creating more virtual users, you increase the load on the app. You can typically run multiple virtual users from a single “client machine”; if you need more load, you can add more client machines.
[Figure: client machines running virtual users that interact with the APP on the server, which uses the DB.]
Several standard forms of performance testing
[Figure: load (number of virtual users) over time, marking the normal load, the expected peaks, and the point where the app crashes.]
• Load test: to see the app’s response time under its typical load.
• Stress test: to see the maximum load the app can handle before breaking.
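A sketch of a tiny load driver: each virtual user is a task in a thread pool that sends requests to a hypothetical `App` interface and records response times. A load test would run it with the typical number of users; a stress test would repeat it with increasing numbers of users until the app fails or its response times become unacceptable. Real tools spread virtual users over several client machines; this single-JVM version does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical app interface; in a real setup this would send a request to the server.
interface App {
    void handleRequest() throws Exception;
}

class LoadDriver {
    // Runs `users` virtual users (users >= 1), each sending `requestsPerUser`
    // requests, and returns all observed response times in milliseconds.
    static List<Long> run(App app, int users, int requestsPerUser) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                Callable<List<Long>> virtualUser = () -> {
                    List<Long> times = new ArrayList<>();
                    for (int r = 0; r < requestsPerUser; r++) {
                        long start = System.nanoTime();
                        app.handleRequest();
                        times.add((System.nanoTime() - start) / 1_000_000);
                    }
                    return times;
                };
                futures.add(pool.submit(virtualUser));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) all.addAll(f.get());
            return all;
        } finally {
            pool.shutdown();
        }
    }

    static double average(List<Long> times) {
        return times.stream().mapToLong(Long::longValue).average().orElse(0);
    }
}
```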
Related issues that can get in the way • Persistence • Concurrency • GUI : testability, explosion • Privacy/data security • Bad management
Persistence
• DBs and files form an implicit part of your SUT’s state!
• Issues:
  • Interference: via the SUT, a test case may cause side effects on your persistent data. You need a mechanism to undo the effect before you start the next test case.
  • Interactions with persistence are slow.
  • How do you create a representative persistent state (for testing your SUT)?
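A sketch of the undo mechanism: snapshot the persistent state before running the test-cases and restore it after every test-case, so that no test sees another test's side effects. `FakeDb` is a hypothetical in-memory stand-in; with a real database accessed over JDBC, a common variant is to run each test-case in a transaction (`Connection.setAutoCommit(false)`) and call `Connection.rollback()` afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the persistent store.
class FakeDb {
    private Map<String, Integer> rows = new HashMap<>();

    void put(String key, int value) { rows.put(key, value); }

    Integer get(String key) { return rows.get(key); }

    Map<String, Integer> snapshot() { return new HashMap<>(rows); }

    void restore(Map<String, Integer> snap) { rows = new HashMap<>(snap); }
}

class IsolatedTestRunner {
    // Runs each test-case against the same DB, restoring the initial state
    // afterwards (even if the test failed) to undo its side effects.
    static void runAll(FakeDb db, Iterable<Runnable> testCases) {
        Map<String, Integer> initial = db.snapshot();
        for (Runnable tc : testCases) {
            try {
                tc.run();
            } finally {
                db.restore(initial);
            }
        }
    }
}
```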
Creating a representative DB
• Copy it from the production DB (if you already have one). Issues:
  + no effort is needed
  - usually big; slows down queries
  - contains real data → privacy and data security
• Construct it from scratch, e.g. apply the classification tree approach.
Suppose this is your model: Person (OID, name) buys Product (OID, name, price, currency). Every attribute and relation induces categories to partition, e.g. currency: Euro | USD | Other; price: ≤ 1.00 | > 1.00; number of products bought: 0 | > 0 (1 | > 1). Once you have your classification tree, you can proceed with determining the combinations you want, as explained before.
Concurrency
• Concurrent execution is timing-sensitive.
• Problems:
  • Some errors may be very difficult to trigger.
  • If you do find them, they can be hard to duplicate.
[Example: P thinks..., Q thinks...; then both try to grab fork F.]
Who actually gets the fork depends on the speed of P and Q. Consequently, when P causes an error when grabbing F, there are two issues: (1) for this error to surface, we must get the timing right so that P can grab the fork (instead of Q); and (2) to duplicate the error, we must be able to duplicate the timing. Ideally you need a separate infrastructure that lets you fully control the concurrency/timing; but such an infrastructure is hard to set up, and even then it cannot control the concurrency of components outside your system.
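A small sketch of controlling the timing in the fork example: a `CountDownLatch` forces the interleaving "P tries first, then Q", so the outcome becomes reproducible. `Fork` and `ControlledSchedule` are illustrative names; this only controls threads created inside the test, not the concurrency of external components.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

class Fork {
    // Holder of the fork, or null if nobody has grabbed it yet.
    private final AtomicReference<String> holder = new AtomicReference<>();

    boolean grab(String philosopher) { return holder.compareAndSet(null, philosopher); }

    String holder() { return holder.get(); }
}

class ControlledSchedule {
    // Runs P and Q concurrently, but Q waits on a latch until P has finished
    // its attempt: the interleaving "P grabs F, then Q tries" is forced.
    static String pFirst() throws InterruptedException {
        Fork f = new Fork();
        CountDownLatch pDone = new CountDownLatch(1);
        Thread p = new Thread(() -> {
            f.grab("P");
            pDone.countDown();
        });
        Thread q = new Thread(() -> {
            try {
                pDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            f.grab("Q");
        });
        q.start(); // Q is started first on purpose: it still has to wait for P
        p.start();
        p.join();
        q.join();
        return f.holder();
    }
}
```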
Risk-based test plan
Seems to be getting popular. You have limited resources → fall back to some prioritization strategy over the various modules of the SUT, which in turn determines the effort allocated to those modules. Typically you make calculated predictions of the risk and impact of those modules; their product, possibly weighted, determines the priority.
• Risk of module x: your estimated chance that x fails.
• Impact of x: your estimated damage when x fails.
Issue: how to predict risk and impact??
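A sketch of the prioritization: priority = weight × risk × impact per module, with the test effort divided in proportion to priority. The per-module weight and the proportional split are assumptions about how "possibly weighted" and "determines the effort" could be realized; the numbers in a real plan come from your own estimates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RiskBasedPlan {
    static final class ModuleEstimate {
        final String name;
        final double risk;   // estimated chance that the module fails (0..1)
        final double impact; // estimated damage when it fails (any consistent unit)
        final double weight; // optional extra weight; 1.0 = unweighted (assumption)

        ModuleEstimate(String name, double risk, double impact, double weight) {
            this.name = name;
            this.risk = risk;
            this.impact = impact;
            this.weight = weight;
        }

        double priority() { return weight * risk * impact; }
    }

    // Module names from highest to lowest priority.
    static List<String> byPriority(List<ModuleEstimate> modules) {
        List<ModuleEstimate> sorted = new ArrayList<>(modules);
        sorted.sort(Comparator.comparingDouble(ModuleEstimate::priority).reversed());
        List<String> names = new ArrayList<>();
        for (ModuleEstimate m : sorted) names.add(m.name);
        return names;
    }

    // Splits a test budget (e.g. hours) over the modules in proportion to priority.
    static Map<String, Double> allocate(List<ModuleEstimate> modules, double budget) {
        double total = 0;
        for (ModuleEstimate m : modules) total += m.priority();
        Map<String, Double> effort = new LinkedHashMap<>();
        for (ModuleEstimate m : modules) effort.put(m.name, budget * m.priority() / total);
        return effort;
    }
}
```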
An example of a discovery-handling procedure
The state machine in the figure describes the life of a bug report (class Report: create(), review(), ...), and also its handling procedure.
[Figure: states reported, opened, assigned, fixed, closed, rejected, deferred and re-opened; transitions include create, review [report ok], review [bad report], edit, approve for fixing, reject, fix, test [pass], test [fail], reopen and problem returns.]
From: Foundations of Software Testing, 2008.