Fault Diagnosis* of Software Systems Rui Maranhão Dept. of Informatics Engineering Faculty of Engineering of University of Porto FEUP, 13Jan10 * aka automatic debugging
Fault Diagnosis* of Software Systems Rui Abreu Software Technology Dept. Delft University of Technology, NL PARC, 18Jul09 * aka automatic debugging
About the speaker… PhD, TUD; ST, UU; Philips Research Labs; LESI, UM; Ass. prof., FEUP; Siemens, Porto
PhD defense (4Nov09) • picasaweb.google.com/rui.maranhao
Outline • Fault Diagnosis • Spectrum-Based Fault Localization • Statistics-based • Reasoning-based • Summary
Software Faults • Faults (bugs) have been around since the beginning of computer science • can have serious financial or life-threatening consequences
Fault Diagnosis

Identify the (SW) component(s) that are the root cause of a failure.

[Figure: a system y = f(x,h) composed of components f1..f5; f1, f2, f3 are healthy, f4 and f5 are faulty]

x, y: observation vectors
f: system function; fj: component functions
h: system health state vector; hj: component health variables

Diagnosing a failure = solving the inverse problem h = f^-1(x, y)
Diagnosis: h4 = fault state; or (h2 and h5) = fault state; or ...
Fault Diagnosis Approaches

[Figure: a piece of SW containing faults (x), diagnosed either by SFL or by MBD]
x: fault (bug, defect)
Model-Based Diagnosis

[Figure: the system y = f(x,h) with a component model Mi alongside each component fi; the model output y' = M(x,h') is compared ("=?") against y to obtain Pass/Fail]

• suppose we have component models (Mi) of all fi
• then we can infer the location(s) of failure (Model-Based Diagnosis)
• search for h' = h such that y' = M(x,h') is consistent with y
Spectrum-based Fault Localization

[Figure: the system y = f(x,h); each component fi leaves its involvement in an execution trace, and the output is compared against a reference y' = M(x) to obtain Pass/Fail]

• suppose we only have a trace of the involvement of each fi (plus a Pass/Fail test oracle M)
• then we can infer the location(s) of failure (Spectrum-Based Fault Localization)
• correlate the traces with the Pass/Fail test outcomes
MBD vs. SFL • MBD • Reasoning approach based on (propositional) models • High(est) diagnostic accuracy • Prohibitive (modeling and/or diagnosis) cost for [Emb] SW • SFL • Statistical approach based on spectra (traces) • Lower diagnostic accuracy: cannot reason over multiple faults • No modeling cost (except test oracle) + low diagnosis cost
Outline • Fault Diagnosis • Spectrum-based Fault Localization • Statistics-based • Reasoning-based • Summary
SFL: Principle (1)-(7)

[Animated figure: a system of 12 components is exercised by a sequence of runs; in each frame every component is marked 'Not touched', 'Touched, pass', or 'Touched, fail', and per-component counters of involvement in passing and failing runs are accumulated]

System components are ranked according to the likelihood of causing the detected errors.
SFL Example (1)

void RationalSort( int n, int *num, int *den ) {          // block c1
  int i, j;
  for ( i = n-1; i >= 0; i-- ) {                           // block c2
    for ( j = 0; j < i; j++ ) {                            // block c3
      if ( RationalGT( num[j], den[j],
                       num[j+1], den[j+1] ) ) {            // block c4
        swap( &num[j], &num[j+1] );
        /* swap( &den[j], &den[j+1] ); */                  // <== FAULT
      }
    }
  }
}
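To make the example executable, here is a hypothetical stand-alone harness (a sketch, not part of the original slides). RationalGT and swap are not shown on the slide, so plausible implementations are assumed: comparison by cross-multiplication, positive denominators. It produces one passing and one failing run, the raw material for the spectrum on the slides below.

/* Hypothetical test harness (not from the slides); RationalGT and swap
   are assumed implementations. */
#include <stdio.h>

static void swap( int *a, int *b ) { int t = *a; *a = *b; *b = t; }

/* true iff n1/d1 > n2/d2, assuming d1, d2 > 0 */
static int RationalGT( int n1, int d1, int n2, int d2 ) {
    return n1 * d2 > n2 * d1;
}

void RationalSort( int n, int *num, int *den ) {                     /* c1 */
    int i, j;
    for ( i = n-1; i >= 0; i-- )                                     /* c2 */
        for ( j = 0; j < i; j++ )                                    /* c3 */
            if ( RationalGT( num[j], den[j], num[j+1], den[j+1] ) ) {/* c4 */
                swap( &num[j], &num[j+1] );
                /* swap( &den[j], &den[j+1] ); */                    /* FAULT */
            }
}

int main( void ) {
    /* Run A: input already sorted, block c4 never swaps -> correct output (pass). */
    int na[] = { 1, 2 }, da[] = { 2, 1 };
    RationalSort( 2, na, da );
    printf( "run A: %d/%d %d/%d\n", na[0], da[0], na[1], da[1] );    /* 1/2 2/1 */

    /* Run B: a swap is needed, but only the numerators are swapped ->
       output 1/1 2/2 instead of 1/2 2/1 (fail). */
    int nb[] = { 2, 1 }, db[] = { 1, 2 };
    RationalSort( 2, nb, db );
    printf( "run B: %d/%d %d/%d\n", nb[0], db[0], nb[1], db[1] );
    return 0;
}

Run A touches blocks c1, c2, c3 (but not c4) and passes; run B touches all four blocks and fails, matching rows I6 and I5 of the hit spectrum shown further below.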
SFL Example (2): [Figure: intermediate error]
SFL Example (3): (hit) spectrum

        c1   c2   c3   c4    P/F
 (I1)    1    0    0    0    0 (P)
 (I2)    1    1    0    0    0 (P)
 (I3)    1    1    1    1    0 (P)
 (I4)    1    1    1    1    0 (P)
 (I5)    1    1    1    1    1 (F)
 (I6)    1    1    1    0    0 (P)

 n11:    1    1    1    1     n11: # hits where run is failing
 n10:    5    4    3    2     n10: # hits where run is passing
 n01:    0    0    0    0     n01: # misses where run is failing
 s:     1/6  1/5  1/4  1/3    s = n11 / (n11 + n10 + n01)  (Jaccard similarity coefficient)
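A minimal sketch (not the authors' tool) that recomputes these counters and the Jaccard coefficients; the activity matrix A and error vector e are copied from the spectrum above, with block indices 0..3 standing for c1..c4.

#include <stdio.h>

#define RUNS  6
#define COMPS 4

int main( void ) {
    int A[RUNS][COMPS] = {
        {1,0,0,0}, {1,1,0,0}, {1,1,1,1},
        {1,1,1,1}, {1,1,1,1}, {1,1,1,0}
    };
    int e[RUNS] = { 0, 0, 0, 0, 1, 0 };     /* 1 = failing run (I5) */

    for ( int j = 0; j < COMPS; j++ ) {
        int n11 = 0, n10 = 0, n01 = 0;
        for ( int i = 0; i < RUNS; i++ ) {
            if ( A[i][j] &&  e[i] ) n11++;  /* hit in a failing run  */
            if ( A[i][j] && !e[i] ) n10++;  /* hit in a passing run  */
            if ( !A[i][j] && e[i] ) n01++;  /* miss in a failing run */
        }
        double s = (double) n11 / ( n11 + n10 + n01 );   /* Jaccard */
        printf( "c%d: n11=%d n10=%d n01=%d s=%.3f\n", j+1, n11, n10, n01, s );
    }
    return 0;   /* c4 gets the highest s (1/3), matching the slide */
}

Block c4, which contains the fault, gets the highest similarity and would be inspected first.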
Shortcomings of using Statistics

 test suite:
   a  b  x   P/F
   1  0  1    F
   0  1  1    F
   1  0  0    F
   0  1  0    F
   1  1  1    P

 if (a == 1) y = f1(x);
 if (b == 1) y = f2(x);
 if (x == 0) y = f3(y);
 return y;

 f1(x) { /* c1 */ return x * 100; }
 f2(x) { /* c2 */ return x / 100; }
 f3(y) { /* c3 */ return y + 1; }
Similarity-based Approach

 c1 c2 c3   P/F
  1  0  0   1 (F)
  0  1  0   1 (F)
  1  0  1   1 (F)
  0  1  1   1 (F)
  1  1  0   0 (P)

 n11:   2    2    2
 n10:   1    1    0
 n01:   1    1    1
 s:    1/2  1/2  2/3

 c3 ranked highest instead of c1, c2!
Reasoning-based Approach (1)

 c1 c2 c3   P/F
  1  0  0   1 (F)   c1 must be faulty
  0  1  0   1 (F)   c2 cannot be the single fault
  1  0  1   1 (F)   c3 cannot be the single fault
  0  1  1   1 (F)   {c2, c3} cannot be the double fault
  1  1  0   0 (P)

Reasoning-based Approach (2)

 c1 c2 c3   P/F
  1  0  0   1 (F)
  0  1  0   1 (F)   c2 must be faulty
  1  0  1   1 (F)   c1 cannot be the single fault
  0  1  1   1 (F)   c3 cannot be the single fault
  1  1  0   0 (P)   {c1, c3} cannot be the double fault

Reasoning-based Approach (3)

 c1 c2 c3   P/F
  1  0  0   1 (F)   {c1, c2} can be a double fault
  0  1  0   1 (F)
  1  0  1   1 (F)
  0  1  1   1 (F)
  1  1  0   0 (P)

 Summary: c1 and c2 are faulty, but neither is a single fault;
 {c1, c2} can be a double fault, while neither {c1, c3} nor {c2, c3} can;
 so {c1, c2} is the only possible diagnosis (subsuming the triple fault {c1, c2, c3}).
Idea: Extend SFL with MBD • MBD • Reasoning approach based on generic model • High(er) diagnostic accuracy • Prohibitive (modeling and/or diagnosis) cost for [Emb] SW • SFL • Statistical approach based on spectra • Lower diagnostic accuracy: cannot reason over multiple faults • No modeling cost (except test oracle) + low diagnosis cost [DX’08, DX’09, SARA’09, IJCAI’09, ASE’09]
Spectrum-Based Reasoning

 Generic component model:  ∀j : hj => (inp_okj => out_okj)

 c1 c2 c3   P/F      conflict
  1  0  0   1 (F)    {h1}
  0  1  0   1 (F)    {h2}
  1  0  1   1 (F)    {h1, h3}
  0  1  1   1 (F)    {h2, h3}
  1  1  0   0 (P)

 Diagnosis candidates = minimal hitting sets of the conflicts.
 In this case there is only 1 diagnosis candidate, {c1, c2}  => correct diagnosis
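Candidate generation can be sketched as a brute-force minimal hitting set computation over the conflicts above (a toy sketch for three components; a real engine would use a scalable algorithm). Each failing run contributes a conflict, the set of components it touched; a candidate is a set of components that intersects every conflict and has no proper subset doing so.

#include <stdio.h>

#define COMPS 3
#define FAILS 4

int main( void ) {
    /* Conflicts from the failing rows above, as bitmasks over c1..c3
       (bit 0 = c1, bit 1 = c2, bit 2 = c3). */
    int conflict[FAILS] = { 0x1, 0x2, 0x1|0x4, 0x2|0x4 };
    int hits[1 << COMPS] = { 0 };

    /* Mark every subset of components that hits all conflicts. */
    for ( int d = 1; d < (1 << COMPS); d++ ) {
        int ok = 1;
        for ( int i = 0; i < FAILS; i++ )
            if ( ( d & conflict[i] ) == 0 ) ok = 0;
        hits[d] = ok;
    }

    /* Keep only the minimal ones: no proper subset is also a hitting set. */
    for ( int d = 1; d < (1 << COMPS); d++ ) {
        if ( !hits[d] ) continue;
        int minimal = 1;
        for ( int s = (d - 1) & d; s; s = (s - 1) & d )
            if ( hits[s] ) minimal = 0;
        if ( minimal ) {
            printf( "candidate: {" );
            for ( int j = 0; j < COMPS; j++ )
                if ( d & (1 << j) ) printf( " c%d", j + 1 );
            printf( " }\n" );               /* prints only { c1 c2 } */
        }
    }
    return 0;
}

For the spectrum above the only minimal hitting set is {c1, c2}, matching the slide.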
Bayesian Probability Ranking

 Many diagnostic candidates dk (e.g., d1 = {c3}, d2 = {c4, c5}, ...)
 => Diagnosis = candidates ordered in terms of probability

 Bayes' rule:  Pr(dk | obsi) = ( Pr(obsi | dk) Pr(dk | obsi-1) ) / Pr(obsi)
 A priori:     Pr(dk) = p^|dk| (1-p)^(M-|dk|), where p is the a priori fault probability of a component and M the number of components
 Intermittent component failure model: gj ∈ [0,1], the probability that faulty component j still behaves correctly
 System failure model: the system fails when at least 1 involved faulty component fails (each with Pr = 1 - gj)
 => Pr(obsi | dk) = 1 - ∏j gj for a failing run;  Pr(obsi | dk) = ∏j gj for a passing run
    (products over the components j of dk involved in obsi)
Example Diagnosis

 c1 c2 c3   P/F
  1  1  0   1 (F)      candidates: d1 = {h1, h2}, d2 = {h1, h3}
  0  1  1   1 (F)
  1  0  0   1 (F)
  1  0  1   0 (P)

 Pr(obs|d1) = (1 - g1 g2) (1 - g2) (1 - g1) g1
 MLE: maximize Pr(obs|d1) w.r.t. g1, g2:  g1 = .47, g2 = .19, Pr(d1) = 0.19

 Pr(obs|d2) = (1 - g1) (1 - g3) (1 - g1) g1 g3
 MLE: maximize Pr(obs|d2) w.r.t. g1, g3:  g1 = .41, g3 = .50, Pr(d2) = 0.04

 => Diagnosis = < {c1, c2}, {c1, c3} >  (i.e., start testing c1 and c2)
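A small sketch that evaluates the two candidate likelihoods above under the intermittency model of the previous slide. The goodness values g are taken from the slide (where they are obtained by MLE); the sketch only plugs them into Pr(obs|d) and compares the candidates, it does not re-estimate them.

#include <stdio.h>

#define RUNS  4
#define COMPS 3

/* Pr(obs | d): for each run, take the product of g_j over the candidate's
   components involved in that run; use it for a passing run, 1 minus it
   for a failing run, and multiply over all runs. */
static double likelihood( int A[RUNS][COMPS], int e[RUNS],
                          int cand[COMPS], double g[COMPS] ) {
    double pr = 1.0;
    for ( int i = 0; i < RUNS; i++ ) {
        double good = 1.0;
        for ( int j = 0; j < COMPS; j++ )
            if ( cand[j] && A[i][j] ) good *= g[j];
        pr *= e[i] ? ( 1.0 - good ) : good;
    }
    return pr;
}

int main( void ) {
    int A[RUNS][COMPS] = { {1,1,0}, {0,1,1}, {1,0,0}, {1,0,1} };
    int e[RUNS]        = { 1, 1, 1, 0 };        /* 1 = failing run */

    int    d1[COMPS] = { 1, 1, 0 };             /* {c1, c2} */
    double g1[COMPS] = { 0.47, 0.19, 0.0 };     /* goodness values from the slide */
    int    d2[COMPS] = { 1, 0, 1 };             /* {c1, c3} */
    double g2[COMPS] = { 0.41, 0.0, 0.50 };

    printf( "Pr(obs|d1) = %.2f\n", likelihood( A, e, d1, g1 ) );   /* ~0.18 */
    printf( "Pr(obs|d2) = %.2f\n", likelihood( A, e, d2, g2 ) );   /* ~0.04 */
    return 0;   /* d1 = {c1,c2} ranks first, as on the slide */
}

Since both candidates have cardinality 2, the prior p^|dk| (1-p)^(M-|dk|) is identical for both, so the likelihoods alone determine the ranking.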
Implementation (Zoltar)

[Figure: the system under analysis is instrumented to obtain spectra and pass/fail outcomes, which feed a diagnostic engine that produces the diagnosis; engines: Tarantula [GAT], Barinel [TUD]]
Diagnostic Work - Metric • Wasted effort W • Excess work to find the faulty components • Independent of the number of faults C (as opposed to measuring total work) • As an example, suppose an M=10-component program in which c1 and c2 are faulty, with diagnosis D = <{1,3}, {3,5}, {2,4}>; going down the ranking, only the healthy components c3 and c5 are inspected before both faults are found, so W = 2/10
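A sketch of one plausible reading of W, consistent with the example above: walk down the ranked diagnosis, inspect each component at most once, count the healthy components inspected before all faults have been found, and divide by M.

#include <stdio.h>

#define M 10

int main( void ) {
    int D[][2]      = { {1,3}, {3,5}, {2,4} };   /* ranked diagnosis candidates */
    int n_cand      = 3;
    int faulty[M+1] = { 0 };                     /* 1-indexed components */
    faulty[1] = faulty[2] = 1;
    int n_faults = 2;

    int seen[M+1] = { 0 };
    int found = 0, wasted = 0;
    for ( int k = 0; k < n_cand && found < n_faults; k++ )
        for ( int m = 0; m < 2 && found < n_faults; m++ ) {
            int c = D[k][m];
            if ( seen[c] ) continue;             /* c3 is inspected only once */
            seen[c] = 1;
            if ( faulty[c] ) found++; else wasted++;
        }

    printf( "W = %d/%d\n", wasted, M );          /* prints W = 2/10 */
    return 0;
}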
Diagnostic Work vs. # Obs (5 faults / 20)
Outline • Fault Diagnosis • Spectrum-based Fault Localization • Statistics-based • Reasoning-based • Summary
Summary • Fault diagnosis is a critical success factor in developing SW • SFL is a viable approach to SW fault diagnosis • The Bayesian reasoning extension yields more accuracy at a marginal increase in complexity • The work is appreciated by the scientific community: 28 research papers published and a best-demo award @ ASE’09 • On its way: Java + Eclipse plug-in + graphical display of results • Acknowledgments: Arjan van Gemund (TUD), Peter Zoeteweij (ex-TUD), Johan de Kleer (PARC), Wolfgang Mayer (UniSA), Markus Stumptner (UniSA), Rob Golsteijn (Philips/NXP Semiconductors), Hasan Sozer (UTwente), Mehmet Aksit (UTwente), …
Help me! I’ve got a problem. We might lose the picture for a while… No way, not now! They’re about to score! I’ll recover for you!
Assignments • Debugging functional languages • Would these concepts apply to, e.g., Scala? • Visualizations for spreadsheets’ diagnostic reports • How to automatically detect errors? • Mobile Apps: testing and debugging • Android; Windows Phone (2 different assignments) • SFL for software evolution