

  1. CS590Z Statistical Debugging Xiangyu Zhang (part of the slides are from Chao Liu)

  2. A Very Important Principle • Traditional debugging techniques deal with a single execution (or very few). • With a large set of executions, including both passing and failing runs, statistical debugging is often highly effective. • Such executions come from: • Failure reporting • In-house testing

  3. Tarantula (ASE 2005, ISSTA 2007)
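  Tarantula assigns each statement a suspiciousness score from per-statement coverage counts: the fraction of failing runs covering the statement, normalized by the combined coverage fractions from passing and failing runs. A minimal C sketch of that formula (function and parameter names are mine; the counts in main are hypothetical):

    #include <stdio.h>

    /* Tarantula suspiciousness of one statement:
     *   (failed/total_failed) / (passed/total_passed + failed/total_failed)
     * passed/failed: runs that cover the statement;
     * total_passed/total_failed: suite-wide totals. Result is in [0, 1]. */
    double tarantula(int passed, int failed, int total_passed, int total_failed)
    {
        double pass_ratio = total_passed ? (double)passed / total_passed : 0.0;
        double fail_ratio = total_failed ? (double)failed / total_failed : 0.0;
        if (pass_ratio + fail_ratio == 0.0)
            return 0.0;  /* statement never executed in any run */
        return fail_ratio / (pass_ratio + fail_ratio);
    }

    int main(void)
    {
        /* Hypothetical statement covered by 10 of 5412 passing runs
         * and 100 of 130 failing runs. */
        printf("suspiciousness = %.3f\n", tarantula(10, 100, 5412, 130));
        return 0;
    }

  Statements covered mostly by failing runs score near 1 and are examined first.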

  4. Scalable Remote Bug Isolation (PLDI 2004, 2005) • Look at predicates • Branches • Function returns (<0, <=0, >0, >=0, ==0, !=0) • Scalar pairs • For each assignment x = …, find all in-scope variables y_i and constants c_j, and track each comparison x (==, <, <=, …) y_i or c_j • Sample the predicate evaluations (Bernoulli sampling; see the sketch below) • Investigate how the probability of a predicate being true relates to bug manifestation.
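  The sampling step can be pictured as an independent coin flip per predicate evaluation. A hypothetical C sketch (the counters and the sample_predicate helper are my names; the real CBI instrumentation amortizes the coin flips by counting down a geometrically distributed skip count instead of calling rand() on every evaluation):

    #include <stdlib.h>

    /* Counters for one instrumented predicate site. */
    static long n_true, n_false;

    /* Record one evaluation of predicate p with probability rate. */
    static void sample_predicate(int p, double rate)
    {
        if ((double)rand() / RAND_MAX < rate) {
            if (p) n_true++; else n_false++;
        }
    }

    /* At an instrumented assignment x = ..., pair x with another
     * in-scope variable y ("scalar pairs" predicates): */
    void instrumented_site(int x, int y)
    {
        sample_predicate(x <  y, 0.01);   /* sample roughly 1 in 100 */
        sample_predicate(x == y, 0.01);
        sample_predicate(x >  y, 0.01);
    }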

  5. Bug Isolation

  6. Bug Isolation How much does P being true increase the probability of failure over simply reaching the line where P is sampled?
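  In the PLDI 2004/2005 papers this is the Increase score: with Failure(P) the chance a run fails given P was observed true, and Context(P) the chance a run fails given P's site was merely reached, Increase(P) = Failure(P) - Context(P). A minimal C sketch (parameter names are mine):

    /* f_true/s_true: failing/successful runs in which P was observed true.
     * f_obs/s_obs:   failing/successful runs that reached P's site at all.
     * Increase(P) = Failure(P) - Context(P); a large positive value means
     * P being true raises the failure probability beyond mere reachability. */
    double increase(int f_true, int s_true, int f_obs, int s_obs)
    {
        if (f_true + s_true == 0 || f_obs + s_obs == 0)
            return 0.0;  /* P never observed: no evidence either way */
        double failure = (double)f_true / (f_true + s_true);
        double context = (double)f_obs / (f_obs + s_obs);
        return failure - context;
    }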

  7. An Example

  Buggy version:

    void subline(char *lin, char *pat, char *sub)
    {
      int i, lastm, m;
      lastm = -1;
      i = 0;
      while ((lin[i] != ENDSTR)) {
        m = amatch(lin, i, pat, 0);
        if (m >= 0) {
          putsub(lin, i, m, sub);
          lastm = m;
        }
        if ((m == -1) || (m == i)) {
          fputc(lin[i], stdout);
          i = i + 1;
        } else
          i = m;
      }
    }

  Fixed version (the first guard must also require lastm != m):

    ...
    if ((m >= 0) && (lastm != m)) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    ...

  • Symptoms • 563 lines of C code • 130 out of 5542 test cases fail to give correct outputs • No crashes • The key predicate evaluates to both true and false within a single execution, so recording merely whether it was ever true is not enough

  8. Consider the fixed check from the previous slide, with A ≡ (m >= 0) and B ≡ (lastm != m):

    ...
    if ((m >= 0) && (lastm != m)) {
      putsub(lin, i, m, sub);
      lastm = m;
    }
    ...

  The buggy version drops B, so a failure is triggered exactly when A && !B holds. The evaluation bias of A therefore differs between failing and passing runs:

    P_f(A) = P̃(A | A && !B)        (bias of A in failing runs)
    P_t(A) = P̃(A | !(A && !B))     (bias of A in passing runs)

  9. Program Predicates • A predicate is a proposition about some program property • e.g., idx < BUFSIZE, a + b == c, foo() > 0 … • Each can be evaluated multiple times during one execution • Every evaluation gives either true or false • Therefore, a predicate is simply a Boolean random variable that encodes program executions from a particular aspect.

  10. Evaluation Bias of Predicate P • Evaluation bias • Def’n: the probability of P being evaluated as true within one execution • Maximum likelihood estimate: X = n_true / (n_true + n_false), i.e., the number of true evaluations over the total number of evaluations in one run (see the sketch below) • Each run gives one observation of the evaluation bias of predicate P • Suppose we have n correct and m incorrect executions; for any predicate P, we end up with • An observation sequence for correct runs: S_p = (X’_1, X’_2, …, X’_n) • An observation sequence for incorrect runs: S_f = (X_1, X_2, …, X_m) • Can we infer whether P is suspicious based on S_p and S_f?
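  A minimal C sketch of the per-run observation (function name is mine):

    /* Evaluation bias of predicate P in one run: the MLE
     *   X = n_true / (n_true + n_false).
     * A run that never evaluates P yields no observation,
     * signalled here by -1. */
    double evaluation_bias(long n_true, long n_false)
    {
        long n = n_true + n_false;
        return n ? (double)n_true / n : -1.0;
    }

  Collecting this value once per correct run gives S_p, and once per incorrect run gives S_f.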

  11. Underlying Populations [figure: two probability distributions over evaluation bias in [0, 1], one for correct runs and one for incorrect runs] • Imagine the underlying distributions of evaluation bias for correct and incorrect executions are f(X|θ_p) and f(X|θ_f), respectively • S_p and S_f can be viewed as random samples from the respective underlying populations • One major heuristic (illustrated in the sketch below): • The larger the divergence between f(X|θ_p) and f(X|θ_f), the more relevant the predicate P is to the bug
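  SOBER's actual score s(P) is derived from a hypothesis test over S_p and S_f; purely to illustrate the divergence heuristic, here is a simple stand-in that standardizes the gap between the mean evaluation biases (explicitly not the paper's statistic):

    #include <math.h>

    /* Illustrative divergence score: how far the mean evaluation bias
     * in failing runs (sf, length m) sits from the mean in passing
     * runs (sp, length n), in units of the passing-run spread.
     * Assumes n > 0 and m > 0. */
    double divergence_score(const double *sp, int n, const double *sf, int m)
    {
        double mu_p = 0.0, var_p = 0.0, mu_f = 0.0;
        for (int i = 0; i < n; i++) mu_p += sp[i];
        mu_p /= n;
        for (int i = 0; i < n; i++) var_p += (sp[i] - mu_p) * (sp[i] - mu_p);
        var_p /= n;
        for (int j = 0; j < m; j++) mu_f += sf[j];
        mu_f /= m;
        return fabs(mu_f - mu_p) / (sqrt(var_p) + 1e-9);  /* avoid /0 */
    }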

  12. Major Challenges [figure: the same two evaluation-bias distributions] • No knowledge of the closed forms of either distribution • Usually, we do not have sufficient incorrect executions to estimate f(X|θ_f) reliably.

  13. Our Approach

  14. Algorithm Outputs • A ranked list of program predicates w.r.t. the bug-relevance score s(P) • Higher-ranked predicates are regarded as more relevant to the bug • What’s the use? • Top-ranked predicates suggest possible buggy regions • Several predicates may point to the same region • … (a minimal ranking sketch follows)
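  Producing the list is then just a sort by s(P). A minimal C sketch (the pred struct and its fields are hypothetical):

    #include <stdlib.h>

    /* One instrumented predicate with its bug-relevance score. */
    struct pred {
        const char *site;   /* source location, e.g. "subline.c:8" */
        double score;       /* s(P) */
    };

    /* Order predicates most-relevant first. */
    static int by_score_desc(const void *a, const void *b)
    {
        double d = ((const struct pred *)b)->score
                 - ((const struct pred *)a)->score;
        return (d > 0) - (d < 0);
    }

    void rank_predicates(struct pred *preds, size_t n)
    {
        qsort(preds, n, sizeof preds[0], by_score_desc);
    }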

  15. Outline • Program Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

  16. Experimental Results • Localization quality metric • Software bug benchmark • Quantitative metric • Related work • Cause Transition (CT) [CZ05] • Statistical Debugging [LN+05] • Performance comparisons

  17. Bug Benchmark • An ideal benchmark • A large number of known bugs in large-scale programs with an adequate test suite • Siemens Program Suite • 130 variants of 7 subject programs, each of 100-600 LOC • 130 known bugs in total • Mainly logic (or semantic) bugs • Advantages • Known bugs, so judgments are objective • A large number of bugs, so comparative studies are statistically significant • Disadvantages • Small-scale subject programs • State-of-the-art performance claimed in the literature so far: • The cause-transition approach [CZ05]

  18. Localization Quality Metric [RR03] • T-score: the percentage of code a developer would examine, searching breadth-first along the program dependence graph from the locations a tool reports, before reaching the actual fault • Lower is better (a simplified computation sketch follows the two examples below)

  19. 1st Example [figure: a ten-node program dependence graph illustrating the search] T-score = 70%

  20. 2nd Example [figure: a ten-node program dependence graph illustrating the search] T-score = 20%
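  A simplified C sketch of the metric (a plain breadth-first search over a small adjacency-matrix PDG; [RR03] defines the search over whole breadth-first “spheres”, so treat this as an approximation, with names of my choosing):

    #define N 10  /* nodes in the program dependence graph */

    /* Percentage of PDG nodes examined, breadth-first from the node
     * blamed by the tool, before the faulty node is reached.
     * Lower is better; 100 means the fault was never reached. */
    double t_score(int adj[N][N], int report, int fault)
    {
        int queue[N], seen[N] = {0};
        int head = 0, tail = 0, examined = 0;
        queue[tail++] = report;
        seen[report] = 1;
        while (head < tail) {
            int v = queue[head++];
            examined++;
            if (v == fault)
                return 100.0 * examined / N;
            for (int w = 0; w < N; w++)
                if (adj[v][w] && !seen[w]) {
                    seen[w] = 1;
                    queue[tail++] = w;
                }
        }
        return 100.0;
    }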

  21. Related Works • Cause Transition (CT) approach [CZ05] • A variant of delta debugging [Z02] • Previous state-of-the-art performance holder on the Siemens suite • Published in ICSE’05, May 15, 2005 • Cons: it relies on memory abnormalities, so its applicability is restricted • Statistical Debugging (Liblit05) [LN+05] • Predicate ranking based on discriminant analysis • Published in PLDI’05, June 12, 2005 • Cons: ignores evaluation patterns of predicates within each execution

  22. Localized bugs w.r.t. Examined Code

  23. Cumulative Effects w.r.t. Code Examination

  24. Top-k Selection • Regardless of the specific choice of k, both Liblit05 and SOBER are better than CT, the previous state-of-the-art holder • From k = 2 to 10, SOBER is consistently better than Liblit05

  25. Outline • Evaluation Bias of Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

  26. Case Study: bc 1.06 • bc 1.06 • 14288 LOC • An arbitrary-precision calculator shipped with most distributions of Unix/Linux • Two bugs were localized • One was reported by Liblit in [LN+05] • One was not reported previously • Sheds some light on scalability

  27. Outline • Evaluation Bias of Predicates • Predicate Rankings • Experimental Results • Case Study: bc-1.06 • Future Work • Conclusions

  28. Future Work • Further improve localization quality • Robustness to sampling • Stress-test on large-scale programs to confirm scalability to code size • …

  29. Conclusions • We devised a principled statistical method for bug localization • No parameter-setting hassles • It handles both crashing and noncrashing bugs • It achieves the best localization quality reported so far

  30. Discussion • Features • Easy implementation • Difficult experimentation • More advanced statistical techniques may not be necessary • Go wide, not deep… • Predicates are treated as independent random variables • Can execution indexing help? • Can statistical principles be combined with slicing or IWIH?
