Feedback-Directed Random Test Generation

Carlos Pacheco (1), Shuvendu K. Lahiri (2), Michael D. Ernst (1), and Thomas Ball (2). (1) MIT CSAIL, (2) Microsoft Research. Presented by: Aliasgar Kagalwala

Outline: 1. Concept 2. Technique 3. Evaluation 4. Related work 5. Conclusion

Concept • The paper presents a technique that improves random test generation by incorporating feedback obtained from executing test inputs as they are created. • Feedback from executing a sequence guides the search towards sequences that yield new and legal object states. • Inputs that create redundant states are never extended, which prunes the search space. • Each input is checked against a set of contracts and filters.

Concept… • The work addresses random generation of unit tests for object-oriented programs. The technique is implemented in a tool called RANDOOP. • RANDOOP is fully automatic: it requires no input from the user except the name of a binary (for .NET) or a class directory (for Java). • RANDOOP has found serious errors in widely deployed commercial software.

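Test Case: a test case for java.util

    import java.util.Collections;
    import java.util.LinkedList;
    import java.util.Set;
    import java.util.TreeSet;
    import org.junit.Assert;  // assumed here: JUnit 4 on the classpath

    public static void test1() {
        LinkedList l1 = new LinkedList();
        Object o1 = new Object();
        l1.addFirst(o1);
        TreeSet t1 = new TreeSet(l1);
        Set s1 = Collections.unmodifiableSet(t1);
        // This assertion fails: equals should be reflexive
        Assert.assertTrue(s1.equals(s1));
    }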

The test case shows a violation of the equals contract. • The set s1 returned by Collections.unmodifiableSet(Set) returns false for s1.equals(s1), which violates the reflexivity of equals as specified in Sun's API documentation. • The underlying error is in the TreeSet(Collection) constructor, which failed to throw a ClassCastException as required by its specification.

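An object-oriented unit test consists of a sequence of method calls that set up a state (such as creating and mutating objects). Extending a sequence: the operator extend(m, seqs, vals) takes • m, a method with formal parameters of types T1, ..., Tk; • seqs, a list of sequences; • vals, a list of values v1 : T1, ..., vk : Tk, where each value is either a primitive value or the return value s.i of the i-th method call of a sequence s in seqs. The result is a new sequence: the concatenation of the sequences in seqs, followed by the call m(v1, ..., vk).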
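The extension operator described above can be sketched in Java. This is only an illustration under stated assumptions, not RANDOOP's code: the class name `ExtendSketch` is hypothetical, a sequence is modeled as a list of source statements, values are passed as already-rendered variable names or literals, and the operator is assumed to concatenate the input sequences before appending the new call.

```java
import java.util.ArrayList;
import java.util.List;

final class ExtendSketch {
    /**
     * Builds a new sequence from existing sequences plus one more method call.
     * Modeling a sequence as a list of statements is an assumption of this sketch.
     */
    static List<String> extend(String method, List<List<String>> seqs, List<String> vals) {
        List<String> newSeq = new ArrayList<>();
        for (List<String> s : seqs) {
            newSeq.addAll(s);  // concatenate the input sequences in the order given
        }
        // Append the call m(v1, ..., vk) as the last statement.
        newSeq.add(method + "(" + String.join(", ", vals) + ");");
        return newSeq;
    }
}
```

For example, extending the two one-statement sequences that create `l1` and `o1` with the method `l1.addFirst` and the value `o1` yields a three-statement sequence ending in `l1.addFirst(o1);`.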

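Feedback-directed generation algorithm for sequences • It builds sequences incrementally, starting from an empty set of sequences. • It has four inputs: a list of classes for which to create sequences, a list of contracts, a list of filters, and a time limit. • It randomly selects a public method m of the given classes for the next call. • randomSeqsAndVals() builds the list of sequences and values to use as inputs to m. • newSeq is the result of applying the extension operator. • The execute method executes each method call in the sequence and checks the contracts.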
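The loop can be sketched in a heavily simplified form. The sketch below is an assumption-laden toy, not RANDOOP's algorithm as implemented: it targets a single class (`ArrayDeque`), models a sequence as a list of method indices, uses a fixed number of attempts instead of a time limit, and omits contract checking. All names (`FeedbackLoopSketch`, `METHODS`, `generate`) are hypothetical. What it does show is the feedback step: each new sequence is executed, and sequences that throw or that reach an already-seen state are never extended.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Consumer;

final class FeedbackLoopSketch {
    // Toy "methods under test": each mutates a Deque<Integer> and may throw.
    static final List<Consumer<Deque<Integer>>> METHODS = new ArrayList<>();
    static {
        METHODS.add(d -> d.addFirst(1));
        METHODS.add(d -> d.addLast(2));
        METHODS.add(d -> d.removeFirst());  // throws NoSuchElementException when empty
    }

    /** Runs a sequence on a fresh deque; returns the final state, or null if a call threw. */
    static List<Integer> execute(List<Integer> seq) {
        Deque<Integer> d = new ArrayDeque<>();
        try {
            for (int m : seq) {
                METHODS.get(m).accept(d);
            }
        } catch (RuntimeException e) {
            return null;
        }
        return new ArrayList<>(d);
    }

    /** Grows a pool of sequences, keeping only those that reach new, legal states. */
    static List<List<Integer>> generate(int attempts, long seed) {
        Random rnd = new Random(seed);
        List<List<Integer>> nonErrorSeqs = new ArrayList<>();
        Set<List<Integer>> seenStates = new HashSet<>();
        nonErrorSeqs.add(new ArrayList<>());  // start from the empty sequence
        seenStates.add(new ArrayList<>());    // whose state is the empty deque
        for (int i = 0; i < attempts; i++) {
            // Pick a previously kept sequence and extend it with one random call.
            List<Integer> base = nonErrorSeqs.get(rnd.nextInt(nonErrorSeqs.size()));
            List<Integer> newSeq = new ArrayList<>(base);
            newSeq.add(rnd.nextInt(METHODS.size()));
            List<Integer> state = execute(newSeq);  // feedback from execution
            if (state == null) {
                continue;  // threw an exception: never extend
            }
            if (!seenStates.add(state)) {
                continue;  // redundant state: never extend
            }
            nonErrorSeqs.add(newSeq);
        }
        return nonErrorSeqs;
    }
}
```

Every sequence returned by `generate` executes without an exception and reaches a state that no other returned sequence reaches, which is the pruning effect the feedback is meant to achieve.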

[Table: default contracts checked by RANDOOP.] RANDOOP outputs the two sets of sequences, nonErrorSeqs and errorSeqs, as JUnit/NUnit tests, along with assertions representing the contracts checked.
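A contract check of this kind can be sketched as below. The sketch is illustrative only: the reflexivity-of-equals check follows the java.util example earlier in the slides, while the two "throws no exception" checks for `hashCode` and `toString` are assumed here as plausible object contracts. `ContractSketch` and `NeverEqual` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class ContractSketch {
    /** Returns a description of each contract the object violates (empty if none). */
    static List<String> check(Object o) {
        List<String> violations = new ArrayList<>();
        try {
            if (!o.equals(o)) {
                violations.add("equals is not reflexive");
            }
        } catch (RuntimeException e) {
            violations.add("equals throws an exception");
        }
        try {
            o.hashCode();
        } catch (RuntimeException e) {
            violations.add("hashCode throws an exception");
        }
        try {
            o.toString();
        } catch (RuntimeException e) {
            violations.add("toString throws an exception");
        }
        return violations;
    }

    /** Hypothetical class with a broken equals, used only to exercise the checker. */
    static final class NeverEqual {
        @Override public boolean equals(Object other) { return false; }
        @Override public int hashCode() { return 0; }
    }
}
```

A sequence whose final object produces a non-empty list would be classified as error-revealing; one that produces an empty list would be a candidate for the non-error set.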

Filtering • A filter determines which values of a sequence are extensible, that is, which may be used as inputs to the next method call. • Applying a filter to a sequence s may set some of the s.i extensible flags to false, so that those values will not be used as inputs to a new method call. • RANDOOP uses three filters by default: EQUALITY, NULL, and EXCEPTION.

Equality: • This filter uses the equals() method to determine whether the resulting object has been created before. • The filter maintains a set allObjs of all extensible objects that have been created by the algorithm across all sequence executions. • This heuristic prunes any object with the same abstract value as a previously created object, even if their concrete representations differ. • This might cause RANDOOP to miss an error if method calls on the two objects behave differently.
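A minimal sketch of the equality filter, assuming the set of seen objects can be kept in a `HashSet` (which relies on `hashCode` as well as `equals`, and on the stored objects not being mutated afterwards). The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class EqualityFilterSketch {
    // All extensible objects created so far, across all executed sequences.
    private final Set<Object> allObjs = new HashSet<>();

    /** Returns true if the value is new, false if an equal object was already created. */
    boolean isExtensible(Object value) {
        return allObjs.add(value);  // add() returns false when an equal element exists
    }
}
```

With this sketch, an `ArrayList` holding 1 and 2 is accepted, but a `LinkedList` holding 1 and 2 created later is rejected, because the two lists are equal under `List.equals` even though their concrete representations differ.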

Null: • A null dereference exception that occurs when no null value appears in the input may indicate an internal problem with the method. • Null arguments are hard to detect statically, because the arguments in a sequence are themselves outputs of other sequences. • Instead, the null filter checks the values computed by executing a specific sequence and marks any null value as non-extensible.

Exception: • Exceptions frequently correspond to precondition violations for a method. • Extending a sequence that throws an exception would lead to the same exception before execution reaches the new call, so such sequences are not extended.
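The null and exception filters can be sketched together as a single step that runs one call and decides whether its result may feed later calls. This is a simplified illustration with hypothetical names (`NullExceptionFilterSketch`, `Outcome`, `run`); the call is represented as a `Callable` rather than a reflective method invocation.

```java
import java.util.concurrent.Callable;

final class NullExceptionFilterSketch {
    /** Result of executing one call: its value and whether it may be used as an input later. */
    static final class Outcome {
        final Object value;
        final boolean extensible;

        Outcome(Object value, boolean extensible) {
            this.value = value;
            this.extensible = extensible;
        }
    }

    /** Executes the call and applies both filters to its result. */
    static Outcome run(Callable<Object> call) {
        try {
            Object v = call.call();
            return new Outcome(v, v != null);  // null filter: null values are not extensible
        } catch (Exception e) {
            return new Outcome(null, false);   // exception filter: do not extend a throwing call
        }
    }
}
```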

Repetition • Repeated calls to add may be necessary to reach the code that increases the capacity of a container object. • Repeated calls may also be required to create two equivalent objects that cause a method like equals to go down certain branches. • Repetition is therefore built into the generator: when generating a new sequence, with probability N, instead of appending a single call of the chosen method m, the generator appends M calls, where M is chosen uniformly at random between 0 and some upper limit max. (max and N are user-settable; the default values are max = 100 and N = 0.1.)
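The repetition rule amounts to a small random choice, sketched below. The method name `callsToAppend` is hypothetical, and "between 0 and max" is assumed to be inclusive at both ends.

```java
import java.util.Random;

final class RepetitionSketch {
    /**
     * Decides how many times to append the chosen method call.
     * With probability n the count is drawn uniformly from [0, max]; otherwise it is 1.
     */
    static int callsToAppend(Random rnd, double n, int max) {
        if (rnd.nextDouble() < n) {
            return rnd.nextInt(max + 1);  // nextInt(bound) excludes bound, hence max + 1
        }
        return 1;
    }
}
```

With the defaults from the slide, a call such as `callsToAppend(new Random(), 0.1, 100)` returns 1 about nine times out of ten and a value between 0 and 100 otherwise.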

Evaluation.

Evaluation… • Container classes were used to evaluate the input generation technique. • Four container classes: a binary tree (BinTree, 154 LOC), a binomial heap (BHeap, 355 LOC), a Fibonacci heap (FibHeap, 286 LOC), and a red-black tree (TreeMap, 580 LOC). • The coverage achieved by six techniques was compared: (1) model checking, (2) model checking with state matching, (3) model checking with abstract state matching, (4) symbolic execution, (5) symbolic execution with abstract state matching, and (6) undirected random generation.

Evaluation… • For each <technique, container> pair, the experiment reports the maximum coverage achieved and the time at which that maximum was reached.

Checking API Contract:

Checking API Contract. • In this experiment, feedback-directed random generation, undirected random generation, and systematic generation were used to create test suites for 14 widely used libraries comprising a total of 780 KLOC. • To reduce the number of test cases that had to be inspected, the authors implemented a test runner called REDUCE. • REDUCE shows only a subset of the failing tests. • REDUCE partitions the failing tests into equivalence classes: two tests fall into the same class if their execution leads to a contract violation after the same method call.
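The partitioning step can be sketched as grouping failing tests by a key and keeping one test per group. The sketch assumes the key is the method call after which the contract violation occurs, and it keeps the first test seen in each class; how REDUCE actually picks its representative is not stated in these slides. `ReduceSketch` and `FailingTest` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ReduceSketch {
    /** A failing test and the method call after which its contract violation occurred. */
    static final class FailingTest {
        final String name;
        final String violatingMethod;

        FailingTest(String name, String violatingMethod) {
            this.name = name;
            this.violatingMethod = violatingMethod;
        }
    }

    /** Returns one representative per equivalence class, in first-seen order. */
    static List<FailingTest> representatives(List<FailingTest> failing) {
        Map<String, FailingTest> byMethod = new LinkedHashMap<>();
        for (FailingTest t : failing) {
            byMethod.putIfAbsent(t.violatingMethod, t);  // keep the first test of each class
        }
        return new ArrayList<>(byMethod.values());
    }
}
```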

They ran RANDOOP on each library, specifying all the public classes as targets for testing and using RANDOOP's default parameters. The following quantities were reported: • Test cases generated: the size of the test suite (number of unit tests) output by RANDOOP. • Violation-inducing test cases: the number of violation-inducing test cases output by RANDOOP. • REDUCE reported test cases: the number of violation-inducing test cases reported by REDUCE. • Errors: the number of distinct errors uncovered by the error-revealing test cases. Two errors are counted as distinct if fixing them would involve modifying different source code. • Errors per KLOC: the number of distinct errors divided by the KLOC count for the library.

Feedback-directed random generation

Errors discovered • RANDOOP created a total of 4200 distinct violation-inducing test cases. • Of those, REDUCE reported approximately 10%. • Out of the 424 tests that REDUCE reported, 254 were error-revealing. • The other 170 were illegal uses of the libraries.

Undirected Random Testing • RANDOOP was rerun with the same parameters, but with the use of filters and contracts to guide generation disabled. • Result: undirected generation did not find any errors in java.util or javax.xml, and was unable to create the sequence that uncovered the infinite loop in System.Xml.

Regression and compliance testing • Feedback-directed random testing can also be used to find inconsistencies between different implementations of the same API. • RANDOOP guesses observer methods using a simple strategy: a method is an observer if all of the following hold: (i) it has no parameters, (ii) it is public and non-static, (iii) it returns a value of primitive type (or String), and (iv) its name is size, count, length, toString, or begins with get or is.
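The four conditions translate directly into a reflection check. The sketch below is one reading of the rule, not RANDOOP's code: `ObserverSketch` and both `isObserver` overloads are hypothetical, and a `void` return type is treated as not primitive for this purpose (Java reflection reports `void.class.isPrimitive()` as true, so it is excluded explicitly).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

final class ObserverSketch {
    /** Applies the four-part observer heuristic to one method. */
    static boolean isObserver(Method m) {
        int mod = m.getModifiers();
        Class<?> ret = m.getReturnType();
        String name = m.getName();
        boolean noParams = m.getParameterCount() == 0;                        // (i)
        boolean publicInstance = Modifier.isPublic(mod) && !Modifier.isStatic(mod);  // (ii)
        boolean simpleReturn = (ret.isPrimitive() && ret != void.class)
                || ret == String.class;                                       // (iii)
        boolean nameMatches = name.equals("size") || name.equals("count")
                || name.equals("length") || name.equals("toString")
                || name.startsWith("get") || name.startsWith("is");           // (iv)
        return noParams && publicInstance && simpleReturn && nameMatches;
    }

    /** Convenience lookup: checks the zero-argument public method with the given name, if any. */
    static boolean isObserver(Class<?> type, String methodName) {
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName) && m.getParameterCount() == 0) {
                return isObserver(m);
            }
        }
        return false;
    }
}
```

Under this reading, `size`, `isEmpty`, and `toString` on `java.util.ArrayList` qualify as observers, while `clear` (returns void), `hashCode` (name does not match), and `getClass` (returns a non-primitive, non-String type) do not.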

Related Work: • Automatic test generation is an active research area. The focus here is on input generation techniques that create method sequences.

Related Work… • Random Testing: • JCrasher creates test inputs using a parameter graph to find method calls whose return values can serve as input parameters. RANDOOP instead uses a component set of previously created sequences. • Feedback-directed test generation was introduced by the Eclat tool. Eclat's performance is sensitive to the quality of the sample execution given as input to the tool. Since RANDOOP does not require a sample execution, it is not sensitive to this parameter. • An experimental comparison of Eclat and RANDOOP is an interesting avenue for future work.

Related Work… Systematic Testing: • Bounded exhaustive generation has been implemented in Rostra, JPF, and RANDOOP, with some differences. • An alternative to the bounded exhaustive approach is symbolic execution, implemented in tools such as Symstra. • Check 'n' Crash creates abstract constraints over inputs that cause exceptional behavior and uses a constraint solver to derive concrete test inputs.

Combining Random and Systematic Testing • DART is a symbolic execution approach that integrates random input generation. • RANDOOP sits closer to the random end of the random-systematic spectrum: it is primarily a random input generator, but it uses some systematization to be more effective.

Conclusion • Feedback-directed random testing scales to large systems and quickly finds errors in heavily tested applications. • Combining random testing and systematic testing gives the advantages of both. • The notions of exploration using a component set, and of state matching when there are many objects, can be carried over to the exhaustive testing domain.

References: • C. Pacheco, S. K. Lahiri, M. D. Ernst, and T. Ball. Feedback-directed random test generation. In Proc. 29th ACM/IEEE International Conference on Software Engineering (ICSE), pages 75-84. IEEE, May 2007. • W. Visser, C. S. Pasareanu, and R. Pelánek. Test input generation for Java containers using state matching. In ISSTA, pages 37-48, July 2006. • C. Pacheco and M. D. Ernst. Eclat: Automatic generation and classification of test inputs. In ECOOP, pages 504-527, July 2005. • T. Xie, D. Marinov, and D. Notkin. Rostra: A framework for detecting redundant object-oriented unit tests. In ASE, pages 196-205, Nov. 2004. • T. Xie, D. Marinov, W. Schulte, and D. Notkin. Symstra: A framework for generating object-oriented unit tests using symbolic execution. In TACAS, pages 365-381, Apr. 2005.

Questions? • Thank you.
