Automated Test Data Generation Maili Markvardt
Outline • Introduction • Test data generation problem • Black-box approach • White-box approach
Introduction • The importance of testing is growing as more of the software in our everyday life becomes mission-critical • Software errors are more costly than ever • Testing can be automated • Test execution automation • Test generation automation • Test data generation automation
Problem: example • User inputs three sides of a triangle (a, b, c). Which type is the triangle? • Requirements: • IF a<=0 || b<=0 || c<=0 -> input incorrect • IF p*(p-a)*(p-b)*(p-c) < 0, where p = (a+b+c)/2 -> sides do not form a triangle • IF a==b || a==c || b==c -> isosceles • IF a==b && b==c -> equilateral • Other -> scalene • What strategy? -> what data?
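The requirements above can be read as a small program under test. The following is a minimal sketch, not the author's implementation: the function name `classify_triangle` and the returned labels are illustrative, and the equilateral check is placed before the isosceles check on the assumption that the more specific rule should win. Note that with the strict `<` from the requirements, a degenerate input such as (1, 2, 3) passes the triangle check.

```python
def classify_triangle(a, b, c):
    """Classify a triangle following the slide's requirements (sketch)."""
    # Input validation: every side must be strictly positive.
    if a <= 0 or b <= 0 or c <= 0:
        return "input incorrect"
    # p is the semi-perimeter; the product is Heron's expression under the root.
    p = (a + b + c) / 2
    if p * (p - a) * (p - b) * (p - c) < 0:
        return "not a triangle"
    # Assumption: test the more specific case first so it is reachable.
    if a == b and b == c:
        return "equilateral"
    if a == b or a == c or b == c:
        return "isosceles"
    return "scalene"


for sides in [(0, 1, 2), (1, 2, 10), (2, 2, 2), (2, 2, 3), (3, 4, 5)]:
    print(sides, "->", classify_triangle(*sides))
```

The second input shows why dependent inputs matter: each side of (1, 2, 10) is valid on its own, but the combination is rejected.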
Input validation automation • The concept “side of a triangle”: equivalence partitions and boundary values • Normal: ]0; ∞[ • Erroneous: ]-∞; 0[, missing values • Boundary values: {0} • To test the input validation functionality, pick a random value from the erroneous partition for each side in turn: • P(-1, 2, 3), P(1, -2, 3), P(1, 2, -3) • Same with boundary values • P(0, 1, 2), P(1, 0, 2), P(1, 2, 0) • Input validation with “normal” values • P(1, 2, 3)
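The partition-based selection above could be automated roughly as follows. This is a sketch under stated assumptions: the helper name `validation_cases` is hypothetical, the unbounded partitions are truncated to [-100, -1] and [1, 100] purely for sampling, and the "missing values" class is left out.

```python
import random


def validation_cases(rng, sides=3):
    """Build input-validation test data from equivalence partitions (sketch)."""
    def normal():
        return rng.randint(1, 100)      # assumed sample range for ]0; inf[

    def erroneous():
        return rng.randint(-100, -1)    # assumed sample range for ]-inf; 0[

    cases = []
    for kind, pick in (("erroneous", erroneous), ("boundary", lambda: 0)):
        # Put the special value at each position in turn; other sides stay normal.
        for position in range(sides):
            values = [normal() for _ in range(sides)]
            values[position] = pick()
            cases.append((kind, tuple(values)))
    cases.append(("normal", tuple(normal() for _ in range(sides))))
    return cases


for kind, values in validation_cases(random.Random(0)):
    print(kind, values)
```

Seven cases result: three erroneous, three boundary and one normal, mirroring the P(...) tuples listed above.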
What about other requirements? • If input values depend on each other and that dependency affects the output, random values cannot be used: random generation may never find the needed values • IF p*(p-a)*(p-b)*(p-c) < 0 -> sides do not form a triangle • We must use specification-based (black-box) or program-based (white-box) test data generation
Black-Box approach • Generating test data from formal specifications (e.g. Z notation) • Classification Tree Method (CTM) • ...
Classification Tree Method [4] • Based on the equivalence partitioning method: input and output properties are divided into equivalence partitions • Equivalence partitions are combined into test cases • The goal: a minimal but sufficient number of test cases • [4] Dai, Z. R., Deussen, P. H. Automatic Test Data Generation for TTCN-3 Using Classification Tree Method (2005)
CTM • Equivalence partitions form a tree structure • Input dependencies are not resolved
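Combining equivalence partitions into test cases can be sketched as below. The classification names and both combination rules are assumptions chosen for illustration (the slides do not prescribe one): `all_combinations` is the exhaustive upper bound, while `minimal_cover` uses each class at least once. Neither rule resolves dependencies between inputs, which matches the limitation noted above.

```python
from itertools import product

# Hypothetical classification tree for the triangle inputs:
# one classification per side, three classes each.
classifications = {
    "a": ["negative", "zero", "positive"],
    "b": ["negative", "zero", "positive"],
    "c": ["negative", "zero", "positive"],
}


def all_combinations(tree):
    """Every combination of one class per classification."""
    names = list(tree)
    return [dict(zip(names, combo)) for combo in product(*tree.values())]


def minimal_cover(tree):
    """Fewest test cases in which every class appears at least once."""
    width = max(len(classes) for classes in tree.values())
    return [
        {name: classes[i % len(classes)] for name, classes in tree.items()}
        for i in range(width)
    ]


print(len(all_combinations(classifications)), "exhaustive test cases")
print(len(minimal_cover(classifications)), "test cases in the minimal cover")
```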
White-Box approach • White-box test data generation is based on program structure • Test data generation problem: for program P and path u, find an input x ∈ S such that P(x) traverses path u, where S is the set of all input values • Remember: the white-box approach is based on a formalisation of the program (graph, FSM, …)
Test data generator structure [2] • [Figure: test data generator structure] • [2] Edvardsson, J. Contributions to Program- and Specification-based test data generation. (2002). www.ida.liu.se/~joned/papers/joned_lic.pdf
Possible strategies (adequacy criteria) • Statement coverage • Branch coverage • Condition coverage • Multiple-condition coverage • Path coverage • …
Numerous methods for Constraint generator & Constraint solver • Symbolic Execution • Actual Execution • Symbolic/Actual Execution hybrid • Simulated Annealing • Iterative Relaxation Technique • Chaining Approach • Genetic Algorithms • MEA-Graph Planning • ...
Symbolic Execution [2] • Popular static method for finding path constraints • Path constraints are rewritten using input variables • Not suitable for programs using pointers and arrays • Not suitable for programs using precompiled units • Example: read(a,b); c=a+b; d=a-b; e=c*d; if (e>5) {...} => path constraint a*a - b*b > 5
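The rewriting of a path constraint in terms of input variables can be sketched with a tiny symbolic value class. This is an illustration only: `Sym` is a hypothetical helper that records expressions as text, and it performs no algebraic simplification, so the constraint comes out as (a + b) * (a - b) > 5 rather than the expanded a*a - b*b > 5.

```python
class Sym:
    """A symbolic value: an expression over input variables, kept as text."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym(f"({self.expr} + {_text(other)})")

    def __sub__(self, other):
        return Sym(f"({self.expr} - {_text(other)})")

    def __mul__(self, other):
        return Sym(f"({self.expr} * {_text(other)})")

    def __gt__(self, other):
        # A comparison yields the branch condition itself, as text.
        return f"{self.expr} > {_text(other)}"


def _text(value):
    return value.expr if isinstance(value, Sym) else str(value)


def path_constraint():
    """Run the example program on symbols instead of concrete values."""
    a, b = Sym("a"), Sym("b")   # read(a, b)
    c = a + b
    d = a - b
    e = c * d
    return e > 5                # condition of the true branch


print(path_constraint())
```

Any input satisfying the printed constraint, for example a=3, b=1, drives execution into the true branch.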
Actual Execution [2] • Program is executed several times • On every execution: check whether the desired path was executed • If the desired path was not executed, the program is re-executed with slightly modified input values • Program is re-executed until the desired path is traversed or a user-defined limit (time, execution count) is exceeded • Solves some problems of symbolic execution, since the values of variables are available
Actual Execution • For each path condition b_i, an objective function F_i is found: • F_i(x) {<|<=|=} 0 • If F_i(x) {<|<=|=} 0 holds, the current path is executed • F(x) = Σ F_i(x), if a branch consists of several conditions • How to minimize the objective function until F_i(x) {<|<=|=} 0 holds? • In other words, what input values are needed to execute the desired path?
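The search loop described above can be sketched for the condition e > 5 from the symbolic execution slide, rewritten as F(x) = 5 - e < 0. The local search below (change one input by ±1, keep the change if F decreases) is an assumed, simplified strategy in the spirit of the method, and the names `branch_function` and `search_input` are hypothetical.

```python
def branch_function(a, b):
    """F(x) for the condition e > 5, rewritten as F(x) = 5 - e < 0."""
    e = (a + b) * (a - b)
    return 5 - e


def search_input(start, max_runs=1000):
    """Re-execute with slightly modified inputs until F(x) < 0 (sketch)."""
    current = list(start)
    best = branch_function(*current)
    runs = 1
    while best >= 0 and runs < max_runs:
        improved = False
        for i in range(len(current)):
            for delta in (1, -1):
                candidate = list(current)
                candidate[i] += delta
                value = branch_function(*candidate)   # one more execution
                runs += 1
                if value < best:
                    current, best, improved = candidate, value, True
                    break
            if improved:
                break
        if not improved:
            return None          # stuck in a local minimum
    return tuple(current) if best < 0 else None


print(search_input((0, 0)))
```

Returning `None` when no neighbour improves F illustrates the local-minimum weakness that motivates the probabilistic methods on the following slides.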
Simulated Annealing • Simulated annealing: a generic probabilistic meta-algorithm for finding a good approximation to the global optimum of a given function in a large search space • Analogy from metallurgy: annealing is used to reduce defects in a material • Metal is heated: atoms start to move • Metal is cooled down slowly: greater probability that atoms find a “suitable” place
Simulated Annealing • Goal: minimize the objective function -> desired path is executed • Find a “random” solution for objective function • Compare the solution with current solution of objective function • Decide, whether or not the “random” solution is better than current solution
Simulated Annealing • If the “random” solution gives a better value (closer to 0) for the objective function, it is always chosen (probability is 1) • If the “random” solution is not better than the current solution, it is “sometimes” chosen, depending on the “temperature” • The value of the “temperature” is decreased over time • In the beginning the “temperature” is high -> almost every solution is chosen • As the temperature is lowered, the probability of choosing a worse solution decreases towards 0
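The acceptance rule above can be sketched as follows. Several choices here are assumptions not stated in the slides: the acceptance probability exp(-delta / temperature) (the common Metropolis rule), the geometric cooling factor, the ±1 neighbourhood, and the objective, which is 0 exactly when integer inputs satisfy (a + b) * (a - b) > 5.

```python
import math
import random


def objective(x):
    """0 when the branch condition (a+b)*(a-b) > 5 holds for integers."""
    a, b = x
    return max(0, 6 - (a + b) * (a - b))


def anneal(start, rng, temperature=10.0, cooling=0.95, steps=500):
    """Minimise `objective` by simulated annealing (sketch)."""
    current = best = tuple(start)
    for _ in range(steps):
        # "Random" solution: change one input by +1 or -1.
        i = rng.randrange(len(current))
        candidate = list(current)
        candidate[i] += rng.choice((-1, 1))
        candidate = tuple(candidate)
        delta = objective(candidate) - objective(current)
        # Better solutions are always taken; worse ones only "sometimes".
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
        if objective(current) < objective(best):
            best = current
        if objective(best) == 0:
            break
        temperature *= cooling   # annealing schedule: geometric cooling
    return best


result = anneal((0, 0), random.Random(1))
print(result, objective(result))
```

The function returns the best input seen, which may still have a non-zero objective if the step budget runs out.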
Simulated Annealing: properties • Choosing worse solutions in the beginning lowers the probability of getting stuck in a local optimum (a drawback of gradient descent, hill climbing and greedy algorithms) • It can be shown that the probability of finding the global optimum approaches 1 as the annealing schedule is extended • This guarantee is of little use in practice, since reaching the global optimum with sufficient confidence by annealing takes more time than a full search of the whole search space
Simulated Annealing • Parameters for successful simulated annealing: Art rather than science • How to find a “random” solution – how to minimize the count of iterations finding the optimum? • How to determine, whether or not the “worse” solution is picked? • Annealing schedule – from what “temperature” to start and how the “temperature” is lowered?
Genetic algorithms • Imitates the process of natural selection • Evaluation • Choice • Recombination and mutation • Start with random set of solutions - population • Solutions are evaluated for their fitness – ability to generate good offspring • Chosen (good) solutions are recombined and mutated to generate a new generation of solutions
GA for test data generation [5] • Algorithm is driven by the control dependency graph of the program • Graph nodes = program statements • Graph edges = control dependencies between program statements • Goal: find data for executing a certain node (program statement) • Node X is post-dominated by node Y if every directed path from X to the end of the program includes node Y • [5] Pargas, R., Harrold, M. J., Peck, R. R. Test-Data Generation Using Genetic Algorithms (1999)
GA for test data generation • Node Y is control dependent on node X if and only if • there exists a directed path from X to Y and all nodes on this path (except X and Y) are post-dominated by Y, and • X is not post-dominated by Y • Control dependency predicate path (CDPP): the predicates that must be satisfied on an acyclic path from the initial node to some other node X
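Post-dominance and control dependence can be computed mechanically from a control flow graph. The sketch below uses the six-statement example program that appears later in the slides; the graph encoding, the explicit "exit" node and the function names are assumptions, and control dependence is computed with the common edge-based reformulation (Y depends on X if Y post-dominates some successor of X but does not strictly post-dominate X).

```python
def post_dominators(succ, exit_node):
    """pdom[n] = set of nodes that post-dominate n (n itself included)."""
    nodes = set(succ) | {exit_node}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in succ[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(succ, pdom):
    """deps[y] = set of nodes x on which y is control dependent."""
    deps = {n: set() for n in pdom}
    for x, successors in succ.items():
        for s in successors:
            for y in pdom[s]:
                if y not in pdom[x] - {x}:   # y does not strictly post-dominate x
                    deps[y].add(x)
    return deps


# Control flow graph of the example program (statement numbers as nodes).
cfg = {1: [2], 2: [3, 6], 3: [4, 5], 4: [6], 5: [6], 6: ["exit"]}
pdom = post_dominators(cfg, "exit")
deps = control_dependences(cfg, pdom)
print({node: deps[node] for node in (3, 4, 5)})
```

Statement 5 depends on 3, and 3 depends on 2, which is where the predicates 2T and 3F in the path to statement 5 come from.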
GA for TDG: algorithm • A solution is a set of test data • Start with a random set • Evaluate the fitness of the data • Execute the program with the data, mark the predicates on the executed path • Compare the found set of predicates with the CDPP of the desired node • The more predicates of the CDPP the data satisfies, the better the data is
GA for TDG • Best solutions are chosen, recombined and mutated to generate a new generation of solutions • Non-typical application of GA: several solutions may be needed, depending on the test goal • e.g. find data for executing nodes A and C: more than one test may be needed if A and C are mutually exclusive!
GA: example
int i, j, k;
1: read i, j, k;
2: if (i<j) {
3:     if (j<k) {
4:         i=k;
       } else {
5:         k=i;
       }
   }
6: print i, j, k;
Statement 5 is the test goal; its control dependency predicate path (CDGpath) is {ET, 2T, 3F}
GA: example • Random population of four solutions • Fitness f = {2, 2, 0, 0} • Probability of choosing solution i: p_i = f_i / Σ f_j, giving {0.5, 0.5, 0, 0} • One solution can be chosen more than once
GA: example • New population {(1, 6, 9), (0, 1, 4), (0, 1, 4), (0, 1, 4)} • Recombination (one-point crossover): the first n values come from one parent and the rest from the other parent • n=2: {(1, 6, 4), (0, 1, 9)} • Mutation: the value in a random position is replaced with a random number • (0, 6, 4), (5, 1, 4)
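The selection, crossover and mutation steps of the example can be sketched as below. This is not the implementation from [5]: the function names are hypothetical, and the scoring rule (number of CDPP predicates found on the executed path, with ET always counted) is an assumption. It gives fitness 2 to both (1, 6, 9) and (0, 1, 4), but the slides do not spell out how the remaining solutions were scored.

```python
import random

CDPP = ("ET", "2T", "3F")   # predicates leading to statement 5


def executed_predicates(i, j, k):
    """Instrumented version of the example program: record branch outcomes."""
    trace = ["ET"]
    if i < j:
        trace.append("2T")
        trace.append("3T" if j < k else "3F")
    else:
        trace.append("2F")
    return trace


def fitness(individual):
    trace = executed_predicates(*individual)
    return sum(1 for predicate in CDPP if predicate in trace)


def crossover(parent1, parent2, n):
    """One-point crossover: first n values from one parent, rest from the other."""
    return parent1[:n] + parent2[n:], parent2[:n] + parent1[n:]


def mutate(individual, rng, low=0, high=9):
    values = list(individual)
    values[rng.randrange(len(values))] = rng.randint(low, high)
    return tuple(values)


def evolve(population, rng, generations=50):
    """Return test data reaching statement 5, or None (even-sized population)."""
    for _ in range(generations):
        for individual in population:
            if fitness(individual) == len(CDPP):
                return individual
        weights = [fitness(individual) for individual in population]
        parents = rng.choices(population, weights=weights, k=len(population))
        children = []
        for p1, p2 in zip(parents[0::2], parents[1::2]):
            children.extend(crossover(p1, p2, rng.randrange(1, len(p1))))
        population = [mutate(child, rng) for child in children]
    return next((ind for ind in population if fitness(ind) == len(CDPP)), None)


print(crossover((1, 6, 9), (0, 1, 4), 2))
print(evolve([(1, 6, 9), (0, 1, 4), (7, 3, 2), (5, 5, 5)], random.Random(0)))
```

The crossover call reproduces the n=2 step from the slide; the child (1, 6, 4) already satisfies i<j and not j<k, so it reaches statement 5.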
Summary • Numerous methods • Black-Box • White-Box • Choice of methods depends on • knowledge and preference of tester, • Technology of SUT
References • [1] Edvardsson, J. A Survey on Automatic Test Data Generation. (1999). www.ida.liu.se/~joned/papers/class_atdg.pdf • [2] Edvardsson, J. Contributions to Program- and Specification-based Test Data Generation. (2002). www.ida.liu.se/~joned/papers/joned_lic.pdf • [3] Gupta, N., Mathur, A., Soffa, M. L. Automated Test Data Generation Using an Iterative Relaxation Method (1999)
References • [4] Dai, Z. R., Deussen, P. H. Automatic Test Data Generation for TTCN-3 Using Classification Tree Method (2005) • [5] Pargas, R., Harrold, M. J., Peck, R. R. Test-Data Generation Using Genetic Algorithms (1999)