Empirically Revisiting the Test Independence Assumption

Sai Zhang, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, David Notkin. University of Washington.


Presentation Transcript


  1. Empirically Revisiting the Test Independence Assumption. Sai Zhang, Darioush Jalali, Jochen Wuttke, Kıvanç Muşlu, Wing Lam, Michael D. Ernst, David Notkin. University of Washington

  2. Order dependence: dependent test. Two tests: readFile(“foo”) ... and createFile(“foo”) ... Executing them in the default order gives the intended test results; executing them in a different order can change the results.

  3. Dependent test: a test that yields a different test result than the default result in a reordered subsequence of the original test suite. • Use the default execution order as baseline • Visible test result rather than internal program state • Execute real tests rather than contrived ones. Example: readFile(“foo”) ... createFile(“foo”) ... Executing them in default order: (test results by design). Executing them in different orders: (different test results).
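The definition on slide 3 can be illustrated with a minimal Java sketch. Everything in it is hypothetical: the class name `OrderDependenceDemo`, the in-memory set standing in for the file system, and the assumption that the default order runs `createFile` before `readFile` (the slide does not say which test comes first).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class OrderDependenceDemo {
    // In-memory stand-in for the file system shared by both tests (hypothetical).
    private static final Set<String> files = new HashSet<>();

    // Runs one named test against the shared state; true means pass, false means fail.
    private static boolean runTest(String name) {
        switch (name) {
            case "createFile":
                files.add("foo");
                return true;
            case "readFile":
                return files.contains("foo"); // fails when "foo" was never created
            default:
                throw new IllegalArgumentException("unknown test: " + name);
        }
    }

    // Executes the tests in the given order from a clean state and records each visible result.
    static Map<String, Boolean> run(String... order) {
        files.clear();
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : order) {
            results.put(test, runTest(test));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("default order: " + run("createFile", "readFile"));
        System.out.println("reordered:     " + run("readFile", "createFile"));
    }
}
```

Under these assumptions `readFile` passes in the first run and fails in the second, so it is a dependent test in the sense of the definition: only the visible pass/fail result is compared, not the internal state.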

  4. Why should we care about test dependence? • Makes test behaviors inconsistent • Affects downstream testing techniques: test prioritization, test parallelization (e.g., across CPU 1 and CPU 2), test selection

  5. Conventional wisdom: test dependence is not a significant issue • Test independence is assumed by: test selection, test prioritization, test parallel execution, test factoring, test generation, … • 31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000 – 2013)

  6. Conventional wisdom: test dependence is not a significant issue • Test independence is assumed by: test selection, test prioritization, test parallel execution, test factoring, test generation, … • Of the 31 papers in ICSE, FSE, ISSTA, ASE, ICST, TSE, and TOSEM (2000 – 2013): 27 assume test independence without justification, 3 treat test dependence as a threat to validity, and 1 considers test dependence

  7. Is the test independence assumption valid? No! • Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites. • What repercussions does test dependence have? Inconsistent results (missed alarms and false alarms), and it affects downstream testing techniques. • How to detect test dependence? Proof: the general problem is NP-complete; approximate algorithms based on heuristics work well.

  8. Is the test independence assumption valid? No! • Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites. • What repercussions does test dependence have? Inconsistent results (missed alarms and false alarms), and it affects downstream testing techniques. • How to detect test dependence? Proof: the general problem is NP-complete; approximate algorithms based on heuristics work well. • Implications: test independence should no longer be assumed; new challenges in designing testing techniques.

  9. Is the test independence assumption valid? • Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites. • What repercussions does test dependence have? Inconsistent results (missed alarms and false alarms), and it affects downstream testing techniques. • How to detect test dependence? The general problem is NP-complete; approximate algorithms based on heuristics work well.

  10. Methodology • Reported dependent tests: 5 issue tracking systems • New dependent tests: 4 real-world projects

  11. Methodology: reported dependent tests (5 issue tracking systems) • Search for 4 key phrases: “dependent test”, “test dependence”, “test execution order”, “different test outcome” • Manually inspect 450 matched bug reports • Identify 96 distinct dependent tests • Characteristics: manifestation, root cause, developers’ action

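  12. Manifestation: number of tests involved to yield a different result • #Tests = 1: the result in the default order differs from the result when run in isolation • #Tests = 2: the result in the default order differs from the result when run after another test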

  13. Manifestation Number of tests involved to yield a different result 96 dependent tests

  14. Manifestation: number of tests involved to yield a different result (96 dependent tests) • #Tests = 1: 6 • #Tests = 2: 73 • #Tests = 3: 2 • Unknown: 15 • 82% can be revealed by no more than 2 tests

  15. Root cause 96 dependent tests

  16. Root cause (96 dependent tests) • Static variable: 59 • File system: 4 • Database: 10 • Unknown: 23 • At least 61% are due to side-effecting access to static variables.

  17. Developers’ action • 98% of the reported tests are marked as major or minor issues • 91% of the dependences have been fixed, by improving documentation or by fixing test code or source code

  18. Methodology: new dependent tests (4 real-world projects) • Human-written test suites: 4176 tests, 29 dependent tests • Automatically-generated test suites (using Randoop [Pacheco’07]): 6330 tests, 354 dependent tests • Ran dependent test detection algorithms (details later)

  19. Characteristics • Manifestation: number of tests to yield a different result 29 manual dependent tests

  20. Characteristics • Manifestation: number of tests to yield a different result • 29 manual dependent tests: 23 with #Tests = 1; the remaining 6 (chart segments of 4 and 2) with #Tests = 2 or #Tests = 3 • 354 auto-generated dependent tests

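  21. Characteristics • Manifestation: number of tests to yield a different result • 29 manual dependent tests: 23 with #Tests = 1; the remaining 6 (chart segments of 4 and 2) with #Tests = 2 or #Tests = 3 • 354 auto-generated dependent tests: 168 with #Tests = 1, 186 with #Tests ≥ 2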

  22. Characteristics • Manifestation: number of tests to yield a different result • 29 manual dependent tests: 23 with #Tests = 1; the remaining 6 (chart segments of 4 and 2) with #Tests = 2 or #Tests = 3 • 354 auto-generated dependent tests: 168 with #Tests = 1, 186 with #Tests ≥ 2 • Root cause: all because of side-effecting access of static variables

  23. Developers’ actions • Confirmed all manual dependent tests • Tests should always “stand alone”, that is “test engineering 101” • Merged two tests to remove the dependence • Opened a bug report to fix the dependent test • Won’t fix the dependence, since it is due to the library design

  24. Is the test independence assumption valid? • Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites. • What repercussions does test dependence have? Inconsistent results (missed alarms and false alarms), and it affects downstream testing techniques. • How to detect test dependence? The general problem is NP-complete; approximate algorithms based on heuristics work well.

  25. Reported dependent tests 96 dependent tests 5 issue tracking systems

  26. Reported dependent tests (5 issue tracking systems) • Of the 96 dependent tests: 94 false alarms, 2 missed alarms

  27. Example false alarm: void testDisplay() { /* create a Display object */ … /* dispose the Display object */ } void testShell() { /* create a Display object */ … } In Eclipse, only one Display object is allowed. In default order: testDisplay, then testShell. In a non-default order: testShell, then testDisplay. Led to a false bug report that took developers 3 months to resolve.

  28. Example missed alarm: public final class OptionBuilder { static String argName = null; /* needs to be set to "arg" before a client calls any method in the class */ static void reset() { … argName = "arg"; } } • BugTest.test13666 validates correct behavior. • This test should fail, but passes when running in the default order. • Another test calls reset() before this test. Hid a bug for 3 years.

  30. Example missed alarm (bug fix): public final class OptionBuilder { static String argName = null; /* needs to be set to "arg" before a client calls any method in the class */ static void reset() { …… } static { argName = "arg"; } /* bug fix */ } • BugTest.test13666 validates correct behavior. • This test should fail, but passes when running in the default order. • Another test calls reset() before this test. Hid a bug for 3 years.
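The masking effect described on the missed-alarm slides can be reproduced with a small stand-in. This is only a sketch: `MissedAlarmSketch` and its test methods are hypothetical names, not the real OptionBuilder or BugTest code, and assigning `null` at the start of each run is an assumption used to imitate a freshly loaded class.

```java
final class MissedAlarmSketch {
    // Hypothetical stand-in for a static field that lacks its required initial value (the bug).
    static String argName = null;

    static void reset() {
        argName = "arg";
    }

    // Validates correct behavior: argName must already be "arg".
    static boolean testArgNameIsSet() {
        return "arg".equals(argName);
    }

    // An unrelated test that happens to call reset() as part of its own work.
    static boolean testThatCallsReset() {
        reset();
        return true;
    }

    // Imitates a fresh class load, then runs the bug-revealing test alone.
    static boolean runInIsolation() {
        argName = null;
        return testArgNameIsSet();
    }

    // Imitates the default order: the other test runs first and masks the bug.
    static boolean runInDefaultOrder() {
        argName = null;
        testThatCallsReset();
        return testArgNameIsSet();
    }

    public static void main(String[] args) {
        System.out.println("default order passes: " + runInDefaultOrder());
        System.out.println("isolation passes:     " + runInIsolation());
    }
}
```

In the default order the check passes even though the field starts out wrong, which is the missed alarm; only the isolated run exposes the bug.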

  31. Test prioritization: turn a test execution order into a new test execution order to achieve coverage faster and improve the fault detection rate. Each test should yield the same result.

  32. Five test prioritization techniques [Elbaum et al. ISSTA 2000] • 4 real-world projects (total: 4176 manual tests) • Record the number of tests yielding different results

  33. Evaluating test prioritization techniques Total: 4176 manual tests • Implication: • Existing techniques are not aware of test dependence

  34. Is the test independence assumption valid? • Does test dependence arise in practice? Yes, in both human-written and automatically-generated suites. • What repercussions does test dependence have? Inconsistent results (missed alarms and false alarms), and it affects downstream testing techniques. • How to detect test dependence? The general problem is NP-complete; approximate algorithms based on heuristics work well.

  35. General problem of test dependence detection: given a test suite, find all dependent tests • NP-complete • Proof: reducing the Exact Cover problem to the dependent test detection problem

  36. Detecting dependent tests in a test suite (input: a test suite; output: all dependent tests) • Approximate algorithms: reversal algorithm, randomized execution, exhaustive bounded algorithm, dependence-aware bounded algorithm • All algorithms are sound but incomplete

  37. Approximate algorithms by heuristics • Reversal algorithm (intuition: changing the order of each pair may expose dependences) • Randomized execution • Exhaustive bounded algorithm • Dependence-aware bounded algorithm
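A minimal sketch of the reversal idea, assuming it means running the whole suite once in reverse order (which flips the relative order of every pair) and comparing each result with the default-order result. The class `ReversalSketch`, the `runner` function type, and the toy suite are hypothetical and are not the authors' tool.

```java
import java.util.*;
import java.util.function.Function;

final class ReversalSketch {
    // Returns the tests whose result in the reversed order differs from the default-order result.
    static Set<String> detect(List<String> defaultOrder,
                              Function<List<String>, Map<String, Boolean>> runner) {
        Map<String, Boolean> expected = runner.apply(defaultOrder);
        List<String> reversed = new ArrayList<>(defaultOrder);
        Collections.reverse(reversed);
        Map<String, Boolean> actual = runner.apply(reversed);

        Set<String> dependent = new TreeSet<>();
        for (String test : defaultOrder) {
            if (!expected.get(test).equals(actual.get(test))) {
                dependent.add(test);
            }
        }
        return dependent;
    }

    // Toy runner (assumption): "readFile" passes only if "createFile" ran earlier in the same run.
    static Map<String, Boolean> toyRunner(List<String> order) {
        Set<String> files = new HashSet<>();
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : order) {
            switch (test) {
                case "createFile": files.add("foo"); results.put(test, true); break;
                case "readFile": results.put(test, files.contains("foo")); break;
                default: results.put(test, true); // any other test is independent
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> suite = Arrays.asList("createFile", "independent", "readFile");
        System.out.println(detect(suite, ReversalSketch::toyRunner));
    }
}
```

The sketch costs only two suite executions, and every test it reports really did change its result. It can still miss dependences that need a different rearrangement than a full reversal.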

  38. Approximate algorithms by heuristics • Reversal algorithm • Randomized execution (shuffle the execution order multiple times) • Exhaustive bounded algorithm • Dependence-aware bounded algorithm
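Randomized execution can be sketched in the same style: shuffle the suite a number of times and collect every test whose result differs from the default-order result. The class name, the `rounds` and `seed` parameters, and the toy runner are assumptions made for illustration.

```java
import java.util.*;
import java.util.function.Function;

final class RandomizedSketch {
    // Shuffles the suite `rounds` times; a fixed seed keeps the sketch reproducible.
    static Set<String> detect(List<String> defaultOrder,
                              Function<List<String>, Map<String, Boolean>> runner,
                              int rounds, long seed) {
        Map<String, Boolean> expected = runner.apply(defaultOrder);
        Random random = new Random(seed);
        Set<String> dependent = new TreeSet<>();
        for (int i = 0; i < rounds; i++) {
            List<String> shuffled = new ArrayList<>(defaultOrder);
            Collections.shuffle(shuffled, random);
            Map<String, Boolean> actual = runner.apply(shuffled);
            for (String test : defaultOrder) {
                if (!expected.get(test).equals(actual.get(test))) {
                    dependent.add(test);
                }
            }
        }
        return dependent;
    }

    // Toy runner (assumption): "readFile" passes only if "createFile" ran earlier in the same run.
    static Map<String, Boolean> toyRunner(List<String> order) {
        Set<String> files = new HashSet<>();
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : order) {
            switch (test) {
                case "createFile": files.add("foo"); results.put(test, true); break;
                case "readFile": results.put(test, files.contains("foo")); break;
                default: results.put(test, true); // any other test is independent
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> suite = Arrays.asList("createFile", "independent", "readFile");
        System.out.println(detect(suite, RandomizedSketch::toyRunner, 1000, 42L));
    }
}
```

More rounds raise the chance of hitting a revealing order at the cost of more suite executions; the deck's evaluation slide mentions shuffling 1000 times.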

  39. Approximate algorithms by heuristics • Reversal algorithm • Randomized execution • Exhaustive bounded algorithm (executes all k-permutations for a bounding parameter k, e.g., k = 2; most dependent tests can be found by running short test subsequences: 82% of the dependent tests are revealed by no more than 2 tests) • Dependence-aware bounded algorithm
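The bounded search can be sketched as follows, assuming that "all k-permutations" means every ordered sequence of exactly k distinct tests, each run from a clean state and compared against the default-order results. The names `ExhaustiveBoundedSketch`, `permute`, and the toy runner are hypothetical.

```java
import java.util.*;
import java.util.function.Function;

final class ExhaustiveBoundedSketch {
    // Runs every ordered sequence of k distinct tests and reports tests whose result changes.
    static Set<String> detect(List<String> suite,
                              Function<List<String>, Map<String, Boolean>> runner, int k) {
        Map<String, Boolean> expected = runner.apply(suite);
        Set<String> dependent = new TreeSet<>();
        permute(suite, k, new ArrayList<>(), runner, expected, dependent);
        return dependent;
    }

    private static void permute(List<String> suite, int k, List<String> prefix,
                                Function<List<String>, Map<String, Boolean>> runner,
                                Map<String, Boolean> expected, Set<String> dependent) {
        if (prefix.size() == k) {
            Map<String, Boolean> actual = runner.apply(prefix);
            for (String test : prefix) {
                if (!expected.get(test).equals(actual.get(test))) {
                    dependent.add(test);
                }
            }
            return;
        }
        for (String test : suite) {
            if (prefix.contains(test)) {
                continue; // each test appears at most once per permutation
            }
            prefix.add(test);
            permute(suite, k, prefix, runner, expected, dependent);
            prefix.remove(prefix.size() - 1);
        }
    }

    // Toy runner (assumption): "readFile" passes only if "createFile" ran earlier in the same run.
    static Map<String, Boolean> toyRunner(List<String> order) {
        Set<String> files = new HashSet<>();
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : order) {
            switch (test) {
                case "createFile": files.add("foo"); results.put(test, true); break;
                case "readFile": results.put(test, files.contains("foo")); break;
                default: results.put(test, true); // any other test is independent
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> suite = Arrays.asList("createFile", "independent", "readFile");
        System.out.println(detect(suite, ExhaustiveBoundedSketch::toyRunner, 2));
    }
}
```

For a suite of n tests this runs n!/(n−k)! sequences, so even k = 2 needs n(n−1) runs, which is why the exhaustive variant becomes expensive on large suites.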

  40. Approximate algorithms by heuristics • Reversal algorithm • Randomized execution • Exhaustive bounded algorithm • Dependence-aware bounded algorithm (k = 2): record read/write info for each test (e.g., which tests read or write the shared variables x and y) and filter away unnecessary permutations
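One way the read/write filter could work for k = 2 is sketched below. It is a simplification under stated assumptions, not the authors' implementation: the read and write sets are supplied by hand in the hypothetical maps `READS` and `WRITES` instead of being recorded by instrumentation, and a pair (first, second) is run only when `first` writes a location that `second` reads, since otherwise `second` is assumed to behave as it does in isolation.

```java
import java.util.*;

final class DependenceAwareSketch {
    // Hypothetical recorded access info: shared locations each test reads or writes.
    static final Map<String, Set<String>> READS = new HashMap<>();
    static final Map<String, Set<String>> WRITES = new HashMap<>();
    static int executions = 0; // counts runs, to show how much the filter saves

    static {
        READS.put("readDefault", Collections.singleton("config"));
        WRITES.put("writeConfig", Collections.singleton("config"));
    }

    // Toy runner (assumption): "readDefault" passes only while "config" has not been written.
    static Map<String, Boolean> run(List<String> order) {
        executions++;
        Set<String> state = new HashSet<>();
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String test : order) {
            switch (test) {
                case "writeConfig": state.add("config"); results.put(test, true); break;
                case "readDefault": results.put(test, !state.contains("config")); break;
                default: results.put(test, true); // any other test is independent
            }
        }
        return results;
    }

    // True if `first` writes something that `second` reads.
    static boolean mayInteract(String first, String second) {
        Set<String> written = WRITES.getOrDefault(first, Collections.emptySet());
        Set<String> read = READS.getOrDefault(second, Collections.emptySet());
        return !Collections.disjoint(written, read);
    }

    static Set<String> detect(List<String> suite) {
        executions = 0;
        Map<String, Boolean> expected = run(suite);
        Set<String> dependent = new TreeSet<>();
        // k = 1: run each test in isolation.
        for (String test : suite) {
            if (!expected.get(test).equals(run(Collections.singletonList(test)).get(test))) {
                dependent.add(test);
            }
        }
        // k = 2: run only the ordered pairs that the read/write info cannot rule out.
        for (String first : suite) {
            for (String second : suite) {
                if (first.equals(second) || !mayInteract(first, second)) {
                    continue;
                }
                Map<String, Boolean> actual = run(Arrays.asList(first, second));
                if (!expected.get(second).equals(actual.get(second))) {
                    dependent.add(second);
                }
            }
        }
        return dependent;
    }

    public static void main(String[] args) {
        System.out.println(detect(Arrays.asList("readDefault", "pure", "writeConfig")));
        System.out.println("executions: " + executions);
    }
}
```

On this three-test toy suite the filter leaves a single pair to run instead of all six ordered pairs, while still reporting `readDefault`. The saving depends entirely on how complete and precise the recorded read/write sets are.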

  41. Evaluating approximate algorithms: finding new dependent tests (4 real-world projects) • Human-written test suites: 4176 tests, 29 dependent tests • Automatically-generated test suites (using Randoop [Pacheco’07]): 6330 tests, 354 dependent tests

  42. Evaluating approximate algorithms (chart: estimated cost vs. actual cost) • Randomized execution: shuffle 1000 times • Bounded algorithms: k = 2 (did not finish for some programs)

  43. Evaluating approximate algorithms Cheap and detects half of the dependent tests! Find all dependences within a bound, but computationally infeasible. Detects the most dependent tests.

  44. Related work • Existing definitions of test dependence: based on program state change [Kapfhammer’03]; informal definitions [Bergelson’06]. Our definition focuses on the concrete test execution result; a program state change may not affect the test execution result. • Flaky tests [Luo et al’14, Google testing blog]: tests revealing inconsistent results. A dependent test is a special type of flaky test. • Tools that support executing tests in different orders: JUnit 4.11 (executing tests in alphabetical order by name); DepUnit and TestNG (supporting specifying the test execution order). These tools do not support detecting test dependence.

  45. Contributions • Revisiting the test independence assumption • Test dependence arises in practice • Test dependence has non-trivial repercussions • Test dependence detection is NP-complete • Heuristic algorithms are effective in practice • Our tool implementation http://testisolation.googlecode.com Test independence should no longer be assumed!

  46. [Backup slides]

  47. Why not run each test in a separate process? • Implemented in JCrasher • Supported in Ant + JUnit • Unacceptably high overhead • 10 – 138 X slowdown • Recent work merges tests running in separate processes into a single one [Bell & Kaiser, ICSE 2014]

  48. Why more dependent tests in automatically-generated test suites? • Manual test suites: developers’ understanding of the code and their testing goals helps build well-structured tests; developers often try to initialize and destroy the shared objects each unit test may use • Auto test suites: most tools are not “state-aware”; the generated tests often “misuse” APIs, e.g., setting up the environment incorrectly; most tools cannot generate environment setup / destroy code

  49. What is the default test execution order? • The intended execution order as designed • Specified by developers, for example in a makefile, an Ant file, or TestAll.java • Leads to the intended results that developers want to see

  50. Dependent tests vs. Nondeterministic tests • Nondeterminism does not imply dependence • A program may execute non-deterministically, but its tests may deterministically succeed. • Test dependence does not imply nondeterminism • A program may have no sources of nondeterminism, but its tests can still be dependent on each other
