Delta Debugging

Politecnico di Milano
Delta Debugging
An advanced debugging technique
Authors: Carlo Curino, Alessandro Giusti

Motivations
• Reducing faults:
  • 50%-80% of total cost
• Debugging:
  • One of the hardest, yet least systematic activities of software engineering
  • Most time-consuming
• Locating faults:
  • Most difficult

Overview
• Which problems are solved by Delta Debugging
• Four solutions: a common approach
• Simplifying failure-inducing input
• Isolating failure-inducing thread schedules
• Identifying failure-inducing changes in the code
• Isolating cause-effect chains

Failure-inducing input
• This HTML input makes Mozilla crash (segmentation fault). Which portion is the failure-inducing one?

Thread scheduling
• The result of a multithreaded program seems nondeterministic. Why does this happen?

Code changes
• The old version of GDB works with DDD, the new one doesn't!
• 178,000 lines of code have been modified between the two versions: where's the bug?

Cause-effect chain
• Which part of the program state is involved in the failure?

Four solutions: a single approach
• The underlying problem is:
  • Find which part of something determines the failure
• So a common strategy can be applied:
  • Divide and conquer applied to the deltas between:
    • Working and failing inputs
    • Working and failing code versions
    • Working and failing thread schedules
    • Working and failing program states
• This allows:
  • An efficient and automatic debugging procedure

Common terminology
• A test case can either:
  • Fail (the failure shows up)
  • Pass (the program runs properly)
  • Be unresolved (different problems arise)
• Delta debugging algorithms iteratively:
  • Apply changes (to input, code, schedule or state)
  • Run tests

Common terminology (2)
• Concept of difference:
  • A very general notion: a delta between two test cases
• Examples:
  • Difference in the input: a different character (or bit) in the input stream
  • Difference in the thread schedule: a difference in the time at which a given thread switch is performed
  • Difference in the code: a different statement in two versions of a program
  • Difference in the program state: different values of the internal variables of a program

Simplifying failure-inducing input

Minimizing vs Isolating
• Minimizing (ddmin algorithm):
  • Slower
  • More human friendly
• Isolating (dd algorithm):
  • Generalization of the ddmin algorithm
  • Faster
  • Good for generating the input of the cause-effect chain DD

Minimizing: Mozilla bug
• Minimizing:
  • 57 tests to simplify the 896-line HTML input to the "<SELECT>" tag that causes the crash
  • Each character is relevant (as shown from line 20 to 26)
  • Only removes deltas from the failing test
  • Returns an n-minimal input that causes a failure (the global minimum is NP)

Minimizing: didactic example
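As a didactic illustration of minimizing, here is a minimal Python sketch of a ddmin-style loop. It is not the authors' implementation: the function name `ddmin`, the outcome constants, and the toy `crashes` test (which stands in for "Mozilla crashes on this input") are all assumptions made for this example. It only removes deltas from the failing input and stops when no single element can be removed without losing the failure.

```python
from typing import Callable, Sequence

PASS, FAIL, UNRESOLVED = "pass", "fail", "unresolved"


def ddmin(failing_input: Sequence, test: Callable[[list], str]) -> list:
    """Shrink failing_input until removing any single element stops the failure."""
    current = list(failing_input)
    assert test(current) == FAIL, "ddmin needs a failing starting point"
    n = 2  # granularity: roughly how many chunks the input is split into
    while len(current) >= 2:
        size = max(1, len(current) // n)
        chunks = [current[i:i + size] for i in range(0, len(current), size)]
        complements = [
            [x for j, chunk in enumerate(chunks) if j != i for x in chunk]
            for i in range(len(chunks))
        ]
        # Try every chunk alone first, then every complement.
        candidates = [(c, 2) for c in chunks]
        candidates += [(c, max(n - 1, 2)) for c in complements]
        reduced = False
        for candidate, next_n in candidates:
            if test(candidate) == FAIL:
                current, n, reduced = candidate, next_n, True
                break
        if not reduced:
            if n >= len(current):
                break  # single-element granularity reached, nothing removable
            n = min(len(current), n * 2)  # increase granularity
    return current


def crashes(chars: list) -> str:
    """Hypothetical test: the 'browser' crashes whenever the tag is present."""
    return FAIL if "<SELECT>" in "".join(chars) else PASS


minimal = ddmin("<html><SELECT>foo</html>", crashes)
print("".join(minimal))
```

The sketch treats the input as a list of characters; a real setup would run the program under test inside `test` and could also return `UNRESOLVED`, which this loop simply treats as "no progress".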

Isolating: Mozilla bug
• Isolating:
  • Only 7 tests (instead of 26)
  • Removes deltas from the failing test and adds deltas to the passing test
  • Isolates a single delta, "<", that makes the failure go away
  • Returns the two nearest inputs, one failing and the other passing

General DD Algorithm
[Figure sequence: the initial failing test, the initial passing test, and the set of differences between them]
• What if we remove these differences from the current failing test?
  • Failure disappears: "move up"
• What if we remove these differences?
  • Unresolved test: "increase granularity"
• What if we remove these differences from the current failing test?
  • Still fails: "move down"

Formally: the Algorithm

Efficiency considerations
• The worst case: k^2 + 3k tests (k = cardinality of the change set)
  • All test cases are unresolved except the last one
  • Very unlikely
• The best case: 2 * log2(k) tests
• Try to avoid unresolved test outcomes
  • Use lexical or syntactic knowledge about the input
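To get a feel for how far apart the two bounds are, the short Python sketch below evaluates both formulas for a change set of 1024 deltas. The function names are made up for this example, and the logarithm is assumed to be base 2.

```python
import math


def worst_case_tests(k: int) -> int:
    """Worst case from the slide: k^2 + 3k tests."""
    return k * k + 3 * k


def best_case_tests(k: int) -> float:
    """Best case from the slide: 2 * log2(k) tests (base 2 assumed)."""
    return 2 * math.log2(k)


k = 1024
print(worst_case_tests(k))  # 1051648
print(best_case_tests(k))   # 20.0
```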

DEMO
Eclipse Plugin Live Demo

Thread Scheduling
• The behavior of a multithreaded program may depend on the schedule.

DD applied to Thread Scheduling
• Debugging is even harder here:
  • Thread switches and schedules are nondeterministic
  • It is difficult to reproduce and isolate failures
• Goal:
  • Relate the failure to a small set of relevant differences between passing and failing schedules
• Again a "purely experimental approach": no need to understand the program

Purely experimental: Pros and Cons
• Pros:
  • Program treated as a black box: requires only executing the program
  • Failure: an arbitrary behaviour of the program. Requires only distinguishing failure from success.
• Cons:
  • (w.r.t. static analysis) Test-based: cannot determine properties for all runs of a program, such as the general absence of deadlocks
  • Requires an observable failure

Dejavu tool
• Tool: Dejavu (DEterministic JAVa replay Utility) by IBM
  • Reproduces schedules and the failures they induce
• Exploiting Dejavu:
  • The thread schedule becomes an input
  • We can generate schedules by mixing one passing schedule and one failing schedule

Differences in thread scheduling
• Starting point:
  • Passing run
  • Failing run
• Differences (for t1):
  • t1 occurs at time 254 (passing run)
  • t1 occurs at time 278 (failing run)
  • ∆1 = |278 − 254| induces a statement interval: the code executed between time 254 and 278

Differences in thread scheduling
• We can build further test cases by mixing the two schedules to isolate the relevant differences
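One way to picture the mixing is sketched below in Python. The representation is an assumption made purely for illustration: a schedule is a list of thread-switch times, and each delta shifts one switch by a single time unit. Dejavu's actual schedule format is not shown in the slides. The times 254 and 278 come from the example above; the second switch (400 and 398) is invented.

```python
from typing import List, Tuple

Delta = Tuple[int, int]  # (index of the thread switch, +1 or -1)


def schedule_deltas(passing: List[int], failing: List[int]) -> List[Delta]:
    """List the unit shifts that turn the passing schedule into the failing one."""
    deltas: List[Delta] = []
    for index, (p, f) in enumerate(zip(passing, failing)):
        step = 1 if f > p else -1
        deltas += [(index, step)] * abs(f - p)
    return deltas


def apply_deltas(passing: List[int], deltas: List[Delta]) -> List[int]:
    """Build a mixed schedule by applying only some of the deltas."""
    mixed = list(passing)
    for index, step in deltas:
        mixed[index] += step
    return mixed


passing_schedule = [254, 400]
failing_schedule = [278, 398]
deltas = schedule_deltas(passing_schedule, failing_schedule)
halfway = apply_deltas(passing_schedule, deltas[:12])
print(len(deltas), halfway)
```

Applying no deltas reproduces the passing schedule and applying all of them reproduces the failing one; every subset in between is a new test case that a replay tool could execute.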

Real life test: setting
• Test #205 of the SPEC JVM98 Java test suite
• Modification of the raytracer program to a multi-threaded version
• Introduction of a simple race condition
• Implementation of an automated test that checks failure/passing
• Generation of random schedules to find a passing schedule and a failing schedule
• Differences between the passing and failing schedule:
  • 3,842,577,240 differences
  • Each difference moves a thread switch time by +1 or -1

Real life test: results
• DD isolates one single difference after 50 tests (about 28 min)

Real life test: pinpointing the failure
• The failure occurs if and only if thread switch #33 occurs at yield point (a safe point, such as a function invocation) 59,772,127 (instead of 59,772,126)
  • At 59,772,127, line 91 is the first yield point after the initialization of OldScenesLoaded
  • At 59,772,126, line 82 is the yield point just before the initialization of OldScenesLoaded

Real life test: conclusion
• Delta Debugging is efficient
  • Even when applied to very large thread schedules (>3,000,000,000 differences)
• No analysis is required, as Delta Debugging relies on experiments alone
  • Only the schedule was observed and altered
  • The failure-inducing thread switch is easily associated with code
• Alternate runs are obtained automatically
  • By generating random schedules
  • Only one initial run (pass or fail) is required

Code changes
• A given revision of a program behaves correctly. The next one does not.
• Find which of the changes in the code causes the problem.
• Inconvenient when the difference amounts to thousands of lines of code

The manual solution
• Binary search through the revision history → regression containment
• Does not always work:
  • Multiple changes that cause the failure only when combined (interference)
  • A single change can amount to many code lines (granularity)
  • Mixing parallel development branches causes inconsistency problems
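The manual binary search can be sketched in a few lines of Python. This is an illustrative sketch only: `first_failing_revision` and the `passes` callback are hypothetical names, and it assumes a linear history in which every revision from the first bad one onwards fails. The interference, granularity and branching cases listed above are exactly where this assumption breaks.

```python
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")


def first_failing_revision(revisions: Sequence[R],
                           passes: Callable[[R], bool]) -> R:
    """Binary search for the first revision whose test fails."""
    lo, hi = 0, len(revisions) - 1
    assert passes(revisions[lo]), "oldest revision must pass"
    assert not passes(revisions[hi]), "newest revision must fail"
    # Invariant: revisions[lo] passes and revisions[hi] fails.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]


# Toy history: revisions 0..99, where the regression was introduced in revision 37.
culprit = first_failing_revision(range(100), lambda rev: rev < 37)
print(culprit)
```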

Procedure
• Developed in 1999: some differences from the current general DD algorithm.
• Consider the differences between the working and failing revisions.
• Ignore any knowledge about the temporal ordering of the changes.
• Goal: find a minimal failure-inducing change set.

Inconsistencies
• Mixing code changes regardless of their ordering produces many tests with an "Unresolved" outcome:
  • Integration failure
  • Construction failure
  • Execution failure
• They increase the complexity of the DD algorithm!

Future work
• Group related changes (partly done) → fewer inconsistent trials.
  • Common change dates/sources
  • Location criteria
  • Lexical criteria
  • Syntactic criteria (common functions/modules)
  • Semantic criteria

Cause-Effect Background
• A bit of background:
  • A program state is represented by variable values and references.

Background (2)
• While the program runs, the state evolves.
• We assume the program is
  • Deterministic
  • Not interactive
  → identical states at identical times have identical evolutions.

Idea: apply DD to program states.
• We need two distinct runs:
  • one failing
  • one passing
• We want the two runs to be (initially) as similar as possible.
• If we let the two runs evolve in parallel, their initial states will be similar.
• Isolating failure-inducing input can help.
• Apply DD to different "slices" of the program evolution. (A sort of CT scan for computer routines.)

Procedure
• Iteratively:
  • Build a new state by mixing the passing and failing states.
  • Let the program evolve and see if it passes, fails, or does unrelated weird things (unresolved outcome).
  • Isolate the smallest subset of the state relevant for the failure.
• No news so far. But:
  • This happens at a specific moment of the program evolution. It will be repeated (e.g. at important functions' entry points).
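The state-mixing step can be illustrated with a deliberately simplified Python sketch. It assumes a program state is just a flat dictionary from variable names to values and that both runs have the same variables. Real program states also contain references, as noted in the background, so this is only a toy model; all names and values are invented for the example.

```python
from typing import Any, Dict, Iterable, List

State = Dict[str, Any]


def state_deltas(passing: State, failing: State) -> List[str]:
    """Names of the variables whose values differ between the two states."""
    return sorted(name for name in failing if passing.get(name) != failing[name])


def mix_states(passing: State, failing: State, chosen: Iterable[str]) -> State:
    """Start from the passing state and copy the chosen variables from the failing one."""
    mixed = dict(passing)
    for name in chosen:
        mixed[name] = failing[name]
    return mixed


passing_state = {"argc": 4, "a0": 9, "size": 3}
failing_state = {"argc": 5, "a0": 11, "size": 3}
differing = state_deltas(passing_state, failing_state)
mixed_state = mix_states(passing_state, failing_state, ["argc"])
print(differing, mixed_state)
```

Each mixed state would then be injected into the running program, and the outcome (pass, fail, or unresolved) would drive the same narrowing loop used for inputs.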

The result
• A cause-effect chain that leads to a failure.

The cause-effect chain
• The initial states are absolutely legitimate: for example, a direct consequence of a specific input that the program should handle. → intended program states.
• The final effects are the failure. → faulty program states.
• The error lies somewhere in the middle, where an intended program state evolves into a faulty one.

Fascinating terminology
• A defect in the code originates an infection in the state.
• The infection usually propagates as the program evolves.

Limits
• No automatic discrimination between intended and faulty (infected) states!
• The human user can increase the resolution of slices and pinpoint the code that turns an INTENDED state into a FAULTY one. → Correct the error (== defect in the code) and break the cause-effect chain that leads to the failure.

Cause Transitions
• Sometimes, when an instruction executes:
  • a given variable ceases to be failure-inducing
  • others begin to be
  → the failure-inducing subset of the state changes (cause transition)
• An algorithm can efficiently find cause transitions in cause-effect chains by means of binary search (again).
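The binary search for cause transitions can be sketched as follows in Python. The helper `cause_at` is hypothetical: it stands for "run delta debugging on the program states at time t and return the failure-inducing variables", which is the expensive step. The variable names in the toy example are invented.

```python
from typing import Callable, Hashable, List


def cause_transitions(cause_at: Callable[[int], Hashable],
                      start: int, end: int) -> List[int]:
    """Return the times t at which the failure cause differs from the one at t - 1."""
    if cause_at(start) == cause_at(end):
        return []            # same cause at both ends: do not search inside
    if end - start == 1:
        return [end]         # adjacent moments with different causes
    mid = (start + end) // 2
    return (cause_transitions(cause_at, start, mid)
            + cause_transitions(cause_at, mid, end))


def toy_cause(t: int) -> str:
    """Hypothetical run: the cause moves from 'argc' to 'a[2]' to 'v'."""
    if t < 3:
        return "argc"
    if t < 7:
        return "a[2]"
    return "v"


transitions = cause_transitions(toy_cause, 0, 10)
print(transitions)
```

The search skips any interval whose two endpoints share the same cause, which is what keeps the number of state comparisons small; a real implementation would presumably cache the result of `cause_at`, since each call hides a full delta debugging run.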

Cause Transitions (2)
