
Automatic System Testing of Programs without Test Oracles

Automatic System Testing of Programs without Test Oracles. Christian Murphy, Kuang Shen, Gail Kaiser, Columbia University. Problem Statement: Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input.


Presentation Transcript


  1. Automatic System Testing of Programs without Test Oracles Christian Murphy, Kuang Shen, Gail Kaiser Columbia University

  2. Problem Statement • Some applications (e.g. machine learning, simulation) do not have test oracles that indicate whether the output is correct for arbitrary input • Oracles may exist for a limited subset of the input domain, and gross errors (e.g. crashes) can be detected with certain inputs or techniques • However, it is difficult to detect subtle (computational) errors for arbitrary inputs in such “non-testable programs”

  3. Observation • If there is no oracle in the general case, we cannot know the expected relationship between a particular input and its output • However, it may be possible to know relationships between sets of inputs and the corresponding set of outputs • “Metamorphic Testing” [Chen et al. ’98] is such an approach

  4. Metamorphic Testing • An approach for creating follow-up test cases based on previous test cases • If input x produces output f(x), then the function’s “metamorphic properties” are used to guide a transformation function t, which is applied to produce a new test case input, t(x) • We can then predict the expected value of f(t(x)) based on the value of f(x) obtained from the actual execution
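
The definition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the program `f` (a simple sum), the transformation `t` (scaling by `k`), and the helper names `predicted` and `property_holds` are hypothetical and do not come from the presentation.

```python
import math

def f(xs):
    """Hypothetical program under test: sums a list of numbers."""
    return math.fsum(xs)

def t(xs, k=10):
    """Transformation guided by a multiplicative property: scale every input by k."""
    return [k * x for x in xs]

def predicted(fx, k=10):
    """Predict f(t(x)) from the observed f(x): the sum should scale by k as well."""
    return k * fx

def property_holds(xs, k=10):
    fx = f(xs)         # original test case
    ftx = f(t(xs, k))  # follow-up test case
    return math.isclose(ftx, predicted(fx, k), rel_tol=1e-9)

print(property_holds([1.0, 2.0, 3.5]))
```

No oracle for `f` is consulted: the check only relates the two observed outputs to each other.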

  5. Metamorphic Testing without an Oracle • When a test oracle exists, we can know whether f(t(x)) is correct • Because we have an oracle for f(x) • So if f(t(x)) is as expected, then it is correct • When there is no test oracle, f(x) acts as a “pseudo-oracle” for f(t(x)) • If f(t(x)) is as expected, it is not necessarily correct • However, if f(t(x)) is not as expected, either f(x) or f(t(x)) (or both) is wrong
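
The asymmetry described on this slide can be seen in a small sketch. It assumes a hypothetical defective function `buggy_mean` (not from the presentation) that divides by n + 1 instead of n; the property helpers are likewise illustrative.

```python
import math

def buggy_mean(xs):
    """Hypothetical defective program: divides by n + 1 instead of n."""
    return sum(xs) / (len(xs) + 1)

def permutation_holds(f, xs):
    # Reordering the input should leave the mean unchanged.
    return math.isclose(f(list(reversed(xs))), f(xs), rel_tol=1e-9)

def additive_holds(f, xs, c=4.0):
    # Adding c to every element should shift the mean by exactly c.
    return math.isclose(f([x + c for x in xs]), f(xs) + c, rel_tol=1e-9)

xs = [2.0, 4.0, 6.0]
print(permutation_holds(buggy_mean, xs))  # True: as expected, yet the output is wrong
print(additive_holds(buggy_mean, xs))     # False: the violation reveals a defect
```

The permutation property is satisfied even though every output is incorrect, while the additive property is violated, which shows that at least one of the two outputs must be wrong.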

  6. Metamorphic Testing Example • Consider a program that reads a text file of test scores for students in a class, and computes the averages and the standard deviation of the averages • If we permute the values in the text file, the results should stay the same • If we multiply each score by 10, the final results should all be multiplied by 10 as well • These metamorphic properties can be used to create a “pseudo-oracle” for the application
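
A minimal sketch of the test-scores example follows. It assumes the file has already been parsed into one list of scores per student; the function `analyze`, the sample data, and the choice of population standard deviation are assumptions made for illustration.

```python
import math
import random
import statistics

def analyze(rows):
    """rows: one list of scores per student.
    Returns (sorted per-student averages, std dev of those averages)."""
    averages = [statistics.mean(scores) for scores in rows]
    return sorted(averages), statistics.pstdev(averages)

def close_lists(a, b):
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=1e-9) for x, y in zip(a, b))

rows = [[90, 80, 70], [60, 75, 85], [100, 95, 90]]
base_avgs, base_sd = analyze(rows)

# Property 1: permuting the order of the students leaves the results unchanged.
permuted = rows[:]
random.Random(0).shuffle(permuted)
perm_avgs, perm_sd = analyze(permuted)
permutation_holds = (close_lists(perm_avgs, base_avgs)
                     and math.isclose(perm_sd, base_sd, rel_tol=1e-9))

# Property 2: multiplying every score by 10 multiplies all results by 10.
scaled = [[10 * s for s in scores] for scores in rows]
scaled_avgs, scaled_sd = analyze(scaled)
scaling_holds = (close_lists(scaled_avgs, [10 * a for a in base_avgs])
                 and math.isclose(scaled_sd, 10 * base_sd, rel_tol=1e-9))

print(permutation_holds, scaling_holds)
```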

  7. Limitations of Metamorphic Testing • Manual transformation of the input data or comparison of output can be laborious and error-prone • Comparison of outputs is not always possible with tools like diff when the outputs are not expected to be “exactly” the same

  8. Our Solution • Automated Metamorphic System Testing • Tester needs to: • Specify the application’s metamorphic properties • Configure the testing framework • Run the application with its test input • Framework takes care of automatically: • Transforming program input data • Executing multiple instances of the application with different transformed inputs in parallel • Comparing outputs of the executions
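
The division of labor on this slide might look roughly like the sketch below. It is not the framework's actual interface: the stand-in `application`, the `PROPERTIES` table, and `run_metamorphic_tests` are hypothetical, and a thread pool is used here only as a simple substitute for running separate instances of the application in parallel.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def application(data):
    """Hypothetical stand-in for the program under test: range of the data."""
    return max(data) - min(data)

# Tester-specified properties (assumed format):
# (name, input transformation, expected output as a function of the original output)
PROPERTIES = [
    ("permute", lambda d: list(reversed(d)), lambda out: out),
    ("multiply", lambda d: [3 * v for v in d], lambda out: 3 * out),
    ("add", lambda d: [v + 5 for v in d], lambda out: out),
]

def run_metamorphic_tests(data):
    # Transform the input once per property.
    inputs = [data] + [transform(data) for _, transform, _ in PROPERTIES]
    # Execute the original and the follow-up invocations in parallel.
    with ThreadPoolExecutor() as pool:
        outputs = list(pool.map(application, inputs))
    original, followups = outputs[0], outputs[1:]
    # Compare each follow-up output with the value predicted from the original.
    return {
        name: math.isclose(actual, expect(original), rel_tol=1e-9)
        for (name, _, expect), actual in zip(PROPERTIES, followups)
    }

print(run_metamorphic_tests([4.0, 1.0, 9.0]))
```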

  9. Model

  10. Amsterdam: Automated Metamorphic System Testing Framework • Metamorphic properties are specified in XML • Input transformation • Runtime options • Output comparison • Framework provides out-of-the-box support for numerous transformation and comparison functions but is extensible to support custom operations • Additional invocations are executed in parallel in separate sandboxes that have their own virtual execution environment [Osman et al. OSDI’02]

  11. Empirical Studies • To measure the effectiveness of the approach, we selected three real-world applications from the domain of supervised machine learning • Support Vector Machines (SVM): vector-based classifier • C4.5: decision tree classifier • MartiRank: ranking application

  12. Methodology (1) • Mutation testing was used to seed defects into each application • Comparison operators were reversed • Math operators were changed • Off-by-one errors were introduced • For each program, we created multiple variants, each with exactly one mutation • Weak mutants (that did not affect the final output) were discarded, as were those that caused outputs that were obviously wrong

  13. Methodology (2) • Each variant (containing one mutation) acted as a pseudo-oracle for itself: • Program was run to produce an output with the original input dataset • Metamorphic properties applied to create new input datasets • Program run on new inputs to create new outputs • If outputs not as expected, the mutant had been killed (i.e. the defect had been detected)
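
The procedure in which each variant acts as its own pseudo-oracle can be sketched as follows. The two mutants of a small `spread` (max minus min) function, the `PROPERTIES` table, and `killed_by` are hypothetical examples, not the mutants or properties used in the study.

```python
import math

def spread_math_mutant(xs):
    """Hypothetical variant: math operator changed (hi - lo became hi + lo)."""
    lo = hi = xs[0]
    for v in xs[1:]:
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return hi + lo  # seeded defect

def spread_cmp_mutant(xs):
    """Hypothetical variant: comparison reversed (v > hi became v < hi)."""
    lo = hi = xs[0]
    for v in xs[1:]:
        if v < lo:
            lo = v
        if v < hi:  # seeded defect
            hi = v
    return hi - lo

# name -> (input transformation, expected output given the original output)
PROPERTIES = {
    "permute": (lambda xs: list(reversed(xs)), lambda out: out),
    "multiply": (lambda xs: [2 * x for x in xs], lambda out: 2 * out),
    "add": (lambda xs: [x + 5 for x in xs], lambda out: out),
}

def killed_by(variant, xs):
    """Names of the properties whose violation kills this variant."""
    original = variant(xs)  # run on the original input
    return [
        name for name, (transform, expect) in PROPERTIES.items()
        if not math.isclose(variant(transform(xs)), expect(original), abs_tol=1e-9)
    ]

data = [3.0, 9.0, 1.0, 5.0]
print(killed_by(spread_math_mutant, data))
print(killed_by(spread_cmp_mutant, data))
```

In this toy setup the math-operator mutant is killed by the additive property, while the comparison mutant always returns 0 and so satisfies all three properties and survives.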

  14. Metamorphic Properties • Each application had four metamorphic properties specified, based on: • Permuting the order of the elements in the input data set • Multiplying the elements by a positive constant • Adding a constant to the elements • Negating the values of the elements in the input data • Testing was conducted using our implementation of the Amsterdam framework

  15. SVM Results • Permuting the input was very effective at killing off-by-one mutants • Many functions in SVM perform calculations on a set of numbers • Off-by-one mutants caused some element of the set to be omitted • By permuting, a different number would be omitted • The results of the calculations would be different, revealing the defect
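
The mechanism on this slide can be reproduced with a toy calculation. The function below is a hypothetical mean with a seeded off-by-one defect, not code from the SVM implementation, and `rotate` is just one convenient permutation.

```python
def mean_off_by_one(xs):
    """Hypothetical mutant: the loop stops one element early."""
    total = 0.0
    for i in range(len(xs) - 1):  # seeded defect: should be range(len(xs))
        total += xs[i]
    return total / len(xs)

def rotate(xs):
    """A permutation that moves the last element to the front."""
    return xs[-1:] + xs[:-1]

xs = [1.0, 2.0, 3.0, 10.0]
original = mean_off_by_one(xs)           # omits 10.0
follow_up = mean_off_by_one(rotate(xs))  # omits 3.0 instead
mutant_killed = original != follow_up
print(original, follow_up, mutant_killed)
```

The permutation would not reveal the defect if the two omitted elements happened to be equal, which is one reason a property may kill some mutants only on certain data sets.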

  16. C4.5 Results • Negating the input was very effective • C4.5 creates a decision tree in which nodes contain clauses like “if attr_n > α then class = C” • If the data set is negated, those nodes should change to “if attr_n ≤ -α then class = C”, i.e. both the operator and the sign of α should change • In most cases, only one of the changes occurred
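
The negation property can be illustrated with a one-node tree. The learner below is a hypothetical decision stump written for this sketch, far simpler than C4.5; it only shows that, for a correct learner, negating the data should flip both the operator and the sign of the threshold.

```python
def learn_stump(xs, labels):
    """Return (threshold, op) for the rule 'if x <op> threshold then class 1'.
    Assumes xs contains at least two distinct values."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in candidates:
        for op in (">", "<="):
            errors = sum(
                ((x > thr) if op == ">" else (x <= thr)) != bool(y)
                for x, y in zip(xs, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, thr, op)
    return best[1], best[2]

FLIP = {">": "<=", "<=": ">"}

def negation_property_holds(xs, labels):
    thr, op = learn_stump(xs, labels)
    neg_thr, neg_op = learn_stump([-x for x in xs], labels)
    # Both changes must occur: sign of the threshold and direction of the operator.
    return neg_thr == -thr and neg_op == FLIP[op]

xs = [1, 2, 3, 7, 8, 9]
labels = [0, 0, 0, 1, 1, 1]
print(learn_stump(xs, labels))
print(learn_stump([-x for x in xs], labels))
print(negation_property_holds(xs, labels))
```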

  17. MartiRank Results • Permuting and negating were effective at killing comparison operator mutants • MartiRank depends heavily on sorting • Permuting and negating change which numbers get compared and what the result should be, thus inducing the differences in the final sorted list
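
A toy sorting routine shows why these two properties expose comparison-operator mutants. The `rank` function below is a hypothetical sort-then-merge routine with an optional seeded defect; it is not MartiRank code.

```python
def rank(xs, mutated=False):
    """Sort ascending: sort each half, then merge the two halves."""
    mid = len(xs) // 2
    left, right = sorted(xs[:mid]), sorted(xs[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        # Seeded defect when mutated: the comparison <= is reversed to >=.
        take_left = left[i] >= right[j] if mutated else left[i] <= right[j]
        if take_left:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    return out + left[i:] + right[j:]

def permutation_holds(xs, mutated):
    # Permuting the input should not change the sorted result.
    rotated = xs[1:] + xs[:1]
    return rank(rotated, mutated) == rank(xs, mutated)

def negation_holds(xs, mutated):
    # Negating the input should reverse the order and negate each value.
    negated = [-x for x in xs]
    expected = [-x for x in reversed(rank(xs, mutated))]
    return rank(negated, mutated) == expected

xs = [4, 1, 3, 2]
print(permutation_holds(xs, False), negation_holds(xs, False))
print(permutation_holds(xs, True), negation_holds(xs, True))
```

With the reversed comparison, which elements end up in each half determines which pairs are compared, so the permuted and negated runs produce lists that no longer match the predictions.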

  18. Summary of Results • 143 mutants killed out of 182 (78%) • Permuting or negating the inputs proved to be effective techniques for killing mutants because of the mathematical nature of the applications • Multiplying and adding were not effective, possibly because of the nature of the mutants we inserted

  19. Benefits of Automation • For SVM, all of the metamorphic properties called for the outputs to be the same as the original • But in practice we knew they wouldn’t be exactly the same • Partly due to floating point calculations • Partly due to approximations in the implementation • We could use Heuristic Metamorphic Testing to allow for outputs that were considered “close enough” (either semantically or to within some tolerance)
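
A "close enough" comparison of the kind described here could be sketched as below. The function name `outputs_close`, the whitespace-separated output format, and the default tolerance are assumptions made for illustration, not the framework's comparison functions.

```python
import math

def outputs_close(expected, actual, rel_tol=1e-6):
    """Compare two whitespace-separated outputs token by token.
    Numeric tokens may differ within rel_tol; other tokens must match exactly."""
    exp_tokens, act_tokens = expected.split(), actual.split()
    if len(exp_tokens) != len(act_tokens):
        return False
    for e, a in zip(exp_tokens, act_tokens):
        try:
            if not math.isclose(float(e), float(a), rel_tol=rel_tol):
                return False
        except ValueError:
            # At least one token is not a number: require an exact match.
            if e != a:
                return False
    return True

print(outputs_close("w1 0.3333333 b -1.25", "w1 0.33333334 b -1.25"))
print(outputs_close("w1 0.33 b -1.25", "w1 0.34 b -1.25"))
```

A plain textual diff would flag the first pair as different, whereas the tolerance-based check accepts it and still rejects the second pair.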

  20. Effect on Testing Time • Without parallelism, metamorphic testing introduces at least 100% overhead since the application must be run at least twice • In our experiments on a multi-core machine, the only overhead came from creating the “sandbox” and comparing the results • less than one second for a 10MB input file

  21. Limitations and Future Work • Framework Implementation • The “sandbox” only includes in-process memory and the file system, but not anything external to the system • The framework does not yet address fault localization • Approach • Approach requires some knowledge of the application to determine the metamorphic properties in the first place • Need to investigate applicability to other domains • Further applicability of Heuristic Metamorphic Testing to non-deterministic applications

  22. Contributions • A testing technique called Automated Metamorphic System Testing that facilitates testing of non-testable programs • An implementation called Amsterdam • Empirical studies demonstrating the effectiveness of the approach

  23. Automatic System Testing of Programs without Test Oracles Chris Murphy cmurphy@cs.columbia.edu http://psl.cs.columbia.edu/metamorphic

  24. Related Work • Pseudo-oracles [Davis & Weyuker ACM’81] • Testing non-testable programs [Weyuker TCJ’82] • Overview of approaches [Baresi and Young ’01] • Embedded assertion languages • Extrinsic interface contracts • Pure specification languages • Trace checking & log file analysis • Using metamorphic testing [Chen et al. JIST’02; others]

  25. Related Work • Applying Metamorphic Testing to “non-testable programs” • Chen et al. ISSTA’02 (among others) • Automating metamorphic testing • Gotlieb & Botella COMPSAC’03

  26. Categories of Metamorphic Properties • Additive: Increase (or decrease) numerical values by a constant • Multiplicative: Multiply numerical values by a constant • Permutative: Randomly permute the order of elements in a set • Invertive: Reverse the order of elements in a set • Inclusive: Add a new element to a set • Exclusive: Remove an element from a set • Others…. • ML apps such as ranking, classification, and anomaly detection exhibit these properties [Murphy SEKE’08]
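
Each category on this slide corresponds to a simple input transformation. The sketch below treats the input as a Python list; the function names mirror the category names but are otherwise hypothetical.

```python
import random

def additive(xs, c):
    """Increase (or decrease) numerical values by a constant."""
    return [x + c for x in xs]

def multiplicative(xs, k):
    """Multiply numerical values by a constant."""
    return [k * x for x in xs]

def permutative(xs, seed=0):
    """Randomly permute the order of the elements (seeded for repeatability)."""
    shuffled = list(xs)
    random.Random(seed).shuffle(shuffled)
    return shuffled

def invertive(xs):
    """Reverse the order of the elements."""
    return list(reversed(xs))

def inclusive(xs, new_element):
    """Add a new element."""
    return list(xs) + [new_element]

def exclusive(xs, index):
    """Remove the element at the given position."""
    return [x for i, x in enumerate(xs) if i != index]

data = [3, 1, 2]
print(additive(data, 5), multiplicative(data, 2), invertive(data))
print(inclusive(data, 9), exclusive(data, 0))
```

Whether the output should stay the same, scale, or change in some other predictable way under each transformation depends on the application, which is what the tester specifies as a metamorphic property.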

  27. Specifying Metamorphic Properties

  28. Further Testing • For each app, additional data sets were used to see if more mutants could be killed • SVM: 18 of remaining 19 were killed • MartiRank: 6 of remaining 19 were killed • C4.5: one remaining mutant was killed

  29. Heuristic Metamorphic Testing • Specify metamorphic properties in which the results may be “similar” but not necessarily exactly the same as predicted • Reducing false positives by checking against a difference threshold when comparing floating point numbers • Addressing non-determinism by specifying heuristics for what is considered “close”
