Automatic System Testing of Programs without Test Oracles Christian Murphy, Kuang Shen, Gail Kaiser Columbia University
Problem Statement • Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input • Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques • However, it is difficult to detect subtle (computational) errors for arbitrary inputs in such “non-testable programs”
Observation • If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output • However, it may be possible to know relationships between sets of inputs and the corresponding set of outputs • “Metamorphic Testing” [Chen et al. ’98] is such an approach
Metamorphic Testing • An approach for creating follow-up test cases based on previous test cases • If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x) • We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
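The relationship between x, t(x), f(x) and f(t(x)) can be sketched in a few lines of Python. This is only an illustration of the idea on the slide: the helper name `check_metamorphic` and the `predict` argument (which maps f(x) to the expected f(t(x))) are assumptions made for this example, and the built-in `sum` stands in for the function under test.

```python
def check_metamorphic(f, t, predict, x):
    """Return True if f(t(x)) equals the value predicted from f(x)."""
    original = f(x)          # output of the initial test case
    follow_up = f(t(x))      # output of the follow-up test case
    return follow_up == predict(original)


def reverse(xs):
    """Transformation t: reverse the order of the elements."""
    return list(reversed(xs))


def unchanged(y):
    """Prediction: the output should stay the same."""
    return y


# Property: the sum of a list does not depend on element order.
holds = check_metamorphic(sum, reverse, unchanged, [3, 1, 4, 1, 5])
```

Integer inputs are used so that exact equality is a fair comparison; floating-point outputs usually need a tolerance instead.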
Metamorphic Testing without an Oracle • When a test oracle exists, we can know whether f(t(x)) is correct • Because we have an oracle for f(x) • So if f(t(x)) is as expected, then it is correct • When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x)) • If f(t(x)) is as expected, it is not necessarily correct • However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong
Metamorphic Testing Example • Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages • If we permute the values in the text file, the results should stay the same • If we multiply each score by 10, the final results should all be multiplied by 10 as well • These metamorphic properties can be used to create a “pseudo-oracle” for the application
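The two properties in this example could be checked roughly as follows. The sketch assumes the scores are already parsed into one list per student, that "permuting" means reordering the students, and that the population standard deviation is wanted; the function name `grade_stats` and the sample data are made up for illustration.

```python
import math
import random
import statistics


def grade_stats(scores):
    """Return (per-student averages, standard deviation of those averages)."""
    averages = [statistics.mean(row) for row in scores]
    return averages, statistics.pstdev(averages)


scores = [[90, 85, 77], [60, 72, 81], [100, 95, 98]]  # hypothetical input
base_avgs, base_std = grade_stats(scores)

# Property 1: permuting the students leaves the results unchanged
# (averages are compared as a sorted collection, since their order moves).
shuffled = list(scores)
random.Random(0).shuffle(shuffled)
perm_avgs, perm_std = grade_stats(shuffled)
permute_ok = (
    all(math.isclose(a, b) for a, b in zip(sorted(perm_avgs), sorted(base_avgs)))
    and math.isclose(perm_std, base_std)
)

# Property 2: multiplying every score by 10 multiplies every result by 10.
scaled = [[10 * v for v in row] for row in scores]
scaled_avgs, scaled_std = grade_stats(scaled)
scale_ok = (
    all(math.isclose(s, 10 * a) for s, a in zip(scaled_avgs, base_avgs))
    and math.isclose(scaled_std, 10 * base_std)
)
```

If either flag is false, the original run or the follow-up run (or both) is wrong; if both are true, the program has passed these checks but is not thereby proven correct.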
Limitations of Metamorphic Testing • Manual transformation of the input data or comparison of outputs can be laborious and error-prone • Comparison of outputs is not always possible with tools like diff when the outputs are not expected to be “exactly” the same
Our Solution • Automated Metamorphic System Testing • Tester needs to: • Specify the application’s metamorphic properties • Configure the testing framework • Run the application with its test input • Framework takes care of automatically: • Transforming program input data • Executing multiple instances of the application with different transformed inputs in parallel • Comparing outputs of the executions
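The division of labor described above can be sketched as a small in-process harness. This is not the framework's implementation: the name `run_metamorphic_tests`, the shape of the `properties` dictionary, and the use of threads in place of separate sandboxed application instances are all assumptions made to keep the example self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def run_metamorphic_tests(program, test_input, properties):
    """Run `program` on the original and transformed inputs, then compare.

    `properties` maps a property name to a pair
    (transform_input, expected_output_from_original_output).
    """
    names = list(properties)
    inputs = [test_input] + [properties[n][0](test_input) for n in names]

    # Execute the original and all follow-up runs concurrently.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(program, inputs))

    original = outputs[0]
    return {
        name: out == properties[name][1](original)
        for name, out in zip(names, outputs[1:])
    }


# The tester only specifies the properties; here `max` is the "application".
properties = {
    "permute": (lambda xs: list(reversed(xs)), lambda y: y),
    "add_5": (lambda xs: [v + 5 for v in xs], lambda y: y + 5),
}
report = run_metamorphic_tests(max, [3, 1, 4, 1, 5], properties)
```

The tester supplies the properties and the test input, and the harness handles transformation, execution, and comparison, which mirrors the split between tester and framework on the slide.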
Amsterdam: Automated Metamorphic System Testing Framework • Metamorphic properties are specified in XML • Input transformation • Runtime options • Output comparison • Framework provides out-of-box support for numerous transformation and comparison functions but is extendable to support custom operations • Additional invocations are executed in parallel in separate sandboxes that have their own virtual execution environment [Osman et al. OSDI’02]
Empirical Studies • To measure the effectiveness of the approach, we selected three real-world applications from the domain of supervised machine learning • Support Vector Machines (SVM): vector-based classifier • C4.5: decision tree classifier • MartiRank: ranking application
Methodology (1) • Mutation testing was used to seed defects into each application • Comparison operators were reversed • Math operators were changed • Off-by-one errors were introduced • For each program, we created multiple variants, each with exactly one mutation • Weak mutants (that did not affect the final output) were discarded, as were those that caused outputs that were obviously wrong
Methodology (2) • Each variant (containing one mutation) acted as a pseudo-oracle for itself: • Program was run to produce an output with the original input dataset • Metamorphic properties applied to create new input datasets • Program run on new inputs to create new outputs • If outputs not as expected, the mutant had been killed (i.e. the defect had been detected)
Metamorphic Properties • Each application had four metamorphic properties specified, based on: • Permuting the order of the elements in the input data set • Multiplying the elements by a positive constant • Adding a constant to the elements • Negating the values of the elements in the input data • Testing was conducted using our implementation of the Amsterdam framework
SVM Results • Permuting the input was very effective at killing off-by-one mutants • Many functions in SVM perform calculations on a set of numbers • Off-by-one mutants caused some element of the set to be omitted • By permuting, a different number would be omitted • The results of the calculations would be different, revealing the defect
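The reasoning above can be reproduced on a toy function. The code below is not taken from SVM: `total` and `total_mutant` are hypothetical stand-ins for a calculation over a set of numbers, and a rotation is used as a simple deterministic permutation.

```python
def total(values):
    """Correct version: adds up every element."""
    s = 0
    for i in range(len(values)):
        s += values[i]
    return s


def total_mutant(values):
    """Off-by-one mutant: the loop bound skips the last element."""
    s = 0
    for i in range(len(values) - 1):
        s += values[i]
    return s


def killed_by_permutation(f, values):
    """True if rotating the input changes the output, violating the property."""
    rotated = values[1:] + values[:1]
    return f(values) != f(rotated)


data = [4, 8, 15, 16]
mutant_killed = killed_by_permutation(total_mutant, data)   # omits 16, then 4
original_killed = killed_by_permutation(total, data)        # property holds
```

The mutant would survive this particular check if the first and last elements happened to be equal, which is one reason to try more than one permutation or data set.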
C4.5 Results • Negating the input was very effective • C4.5 creates a decision tree in which nodes contain clauses like “if attrn > α then class = C” • If the data set is negated, those nodes should change to “if attrn ≤ -α then class = C”, i.e. both the operator and the sign of α should change • In most cases, only one of the two changes occurred, revealing the defect
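The negation property can be demonstrated on a one-attribute decision stump. This is a toy learner written for this example, not C4.5: `fit_stump` simply tries midpoints between adjacent attribute values as thresholds and keeps the rule with the fewest training errors.

```python
def fit_stump(xs, labels, positive):
    """Return (op, alpha) for the rule 'if x <op> alpha then class = positive'."""
    points = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    best = None
    for alpha in candidates:
        for op in (">", "<="):
            errors = sum(
                ((x > alpha) if op == ">" else (x <= alpha)) != (y == positive)
                for x, y in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, op, alpha)
    return best[1], best[2]


FLIP = {">": "<=", "<=": ">"}

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]          # hypothetical attribute values
labels = ["A", "A", "A", "C", "C", "C"]

op, alpha = fit_stump(xs, labels, "C")
neg_op, neg_alpha = fit_stump([-x for x in xs], labels, "C")

# Expected after negation: the operator flips and the threshold changes sign.
property_holds = (neg_op == FLIP[op]) and (neg_alpha == -alpha)
```

A mutant that changed only the operator, or only the sign of the threshold, would make `property_holds` false.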
MartiRank Results • Permuting and negating were effective at killing comparison operator mutants • MartiRank depends heavily on sorting • Permuting and negating change which numbers get compared and what the result should be, thus inducing the differences in the final sorted list
Summary of Results • 143 mutants killed out of 182 (78%) • Permuting or negating the inputs proved to be effective techniques for killing mutants because of the mathematical nature of the applications • Multiplying and adding were not effective, possibly because of the nature of the mutants we inserted
Benefits of Automation • For SVM, all of the metamorphic properties called for the outputs to be the same as the original • But in practice we knew they wouldn’t be exactly the same • Partly due to floating point calculations • Partly due to approximations in the implementation • We could use Heuristic Metamorphic Testing to allow for outputs that were considered “close enough” (either semantically or to within some tolerance)
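A tolerance-based comparison of the kind described here might look like the following. The function name `outputs_close` and the tolerance values are assumptions chosen for the example; an explicit loop is used for summation so that the order of floating-point additions is visible.

```python
import math


def naive_sum(values):
    """Add the values left to right with plain floating-point addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def outputs_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two output vectors element by element within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


forward = naive_sum([0.1, 0.2, 0.3])
backward = naive_sum([0.3, 0.2, 0.1])    # same values, permuted

exactly_equal = forward == backward                # fails on rounding alone
close_enough = outputs_close([forward], [backward])
```

An exact comparison would report a violation of the permutation property here even though the program is correct, which is the false positive the heuristic comparison is meant to avoid.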
Effect on Testing Time • Without parallelism, metamorphic testing introduces at least 100% overhead since the application must be run at least twice • In our experiments on a multi-core machine, the only overhead came from creating the “sandbox” and comparing the results • less than one second for a 10MB input file
Limitations and Future Work • Framework Implementation • The “sandbox” only includes in-process memory and the file system, but not anything external to the system • The framework does not yet address fault localization • Approach • Approach requires some knowledge of the application to determine the metamorphic properties in the first place • Need to investigate applicability to other domains • Further applicability of Heuristic Metamorphic Testing to non-deterministic applications
Contributions • A testing technique called Automated Metamorphic System Testing that facilitates testing of non-testable programs • An implementation called Amsterdam • Empirical studies demonstrating the effectiveness of the approach
Automatic System Testing of Programs without Test Oracles Chris Murphy cmurphy@cs.columbia.edu http://psl.cs.columbia.edu/metamorphic
Related Work • Pseudo-oracles [Davis & Weyuker ACM’81] • Testing non-testable programs [Weyuker TCJ’82] • Overview of approaches [Baresi and Young ’01] • Embedded assertion languages • Extrinsic interface contracts • Pure specification languages • Trace checking & log file analysis • Using metamorphic testing [Chen et al. JIST’02; others]
Related Work • Applying Metamorphic Testing to “non-testable programs” • Chen et al. ISSTA’02 (among others) • Automating metamorphic testing • Gotlieb & Botella COMPSAC’03
Categories of Metamorphic Properties • Additive: Increase (or decrease) numerical values by a constant • Multiplicative: Multiply numerical values by a constant • Permutative: Randomly permute the order of elements in a set • Invertive: Reverse the order of elements in a set • Inclusive: Add a new element to a set • Exclusive: Remove an element from a set • Others…. • ML apps such as ranking, classification, and anomaly detection exhibit these properties [Murphy SEKE’08]
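Each category corresponds to a simple input transformation. The functions below are illustrative only (their names and signatures are not from the framework), and the built-in `max` is used as a stand-in application to show what the expected change in output would be for each one.

```python
import random


def additive(xs, c):
    return [x + c for x in xs]


def multiplicative(xs, c):
    return [x * c for x in xs]


def permutative(xs, seed=0):
    shuffled = list(xs)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def invertive(xs):
    return list(reversed(xs))


def inclusive(xs, new):
    return list(xs) + [new]


def exclusive(xs, index):
    return [x for i, x in enumerate(xs) if i != index]


data = [3, 1, 4, 1, 5]
checks = {
    "additive": max(additive(data, 10)) == max(data) + 10,
    "multiplicative": max(multiplicative(data, 2)) == max(data) * 2,  # positive constant
    "permutative": max(permutative(data)) == max(data),
    "invertive": max(invertive(data)) == max(data),
    "inclusive": max(inclusive(data, 0)) == max(data),    # new element below the max
    "exclusive": max(exclusive(data, 1)) == max(data),    # removes a non-max element
}
```

Which prediction goes with which transformation depends on the application, so the expected outputs above apply to `max` and would differ for a classifier or ranker.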
Further Testing • For each app, additional data sets were used to see if more mutants could be killed • SVM: 18 of remaining 19 were killed • MartiRank: 6 of remaining 19 were killed • C4.5: one remaining mutant was killed
Heuristic Metamorphic Testing • Specify metamorphic properties in which the results may be “similar” to, but not necessarily exactly the same as, the predicted results • Reducing false positives by checking against a difference threshold when comparing floating point numbers • Addressing non-determinism by specifying heuristics for what is considered “close”