An Experimental Evaluation on Reliability Features of N-Version Programming Teresa Cai, Michael R. Lyu and Mladen A. Vouk ISSRE’2005 November 10, 2005
Outline • Introduction • Motivation • Experimental evaluation • Fault analysis • Failure probability • Fault density • Reliability improvement • Discussions • Conclusion and future work
Introduction • N-version programming (NVP) is one of the main techniques for software fault tolerance • It has been adopted in some mission-critical applications • Yet its effectiveness is still an open question • How much reliability enhancement does it provide? • How does fault correlation between multiple versions affect the final reliability?
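The NVP idea on this slide can be sketched as a majority vote over independently developed implementations of the same specification. The three toy versions below are hypothetical stand-ins (one carries a seeded fault), not code from the experiment:

```python
# Minimal sketch of N-version programming with majority voting.
# version_a/b/c are hypothetical stand-ins for independently developed
# implementations; version_c has a seeded fault for negative inputs.
from collections import Counter

def version_a(x):
    return x * x

def version_b(x):
    return x ** 2

def version_c(x):
    return x * x if x >= 0 else -x * x   # seeded fault

def nvp_vote(versions, x):
    """Run all versions on the same input and return the majority output."""
    outputs = [v(x) for v in versions]
    winner, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority: coincident disagreement")
    return winner

print(nvp_vote([version_a, version_b, version_c], -3))  # faulty version is outvoted
```

The scheme tolerates a single faulty version; it fails exactly when versions fail coincidentally on the same input, which is why fault correlation is the central question of the study.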
Introduction (cont'd) • Empirical and theoretical investigations have been conducted through experiments, modeling, and evaluation • Avizienis and Chen (1977), Knight and Leveson (1986), Kelly and Avizienis (1983), Avizienis, Lyu and Schuetz (1988), Eckhardt et al. (1991), Lyu and He (1993) • Eckhardt and Lee (1985), Littlewood and Miller (1989), Popov et al. (2003) • Belli and Jedrzejowicz (1990), Littlewood et al. (2001), Teng and Pham (2002) • No conclusive reliability estimate can be drawn, because these experiments differ in size, population, complexity, and comparability
Research questions • What is the reliability improvement of NVP? • Does fault correlation significantly affect the final reliability? • What kind of empirical data can be compared with previous investigations?
Motivation • To address the reliability and fault correlation issues in NVP • To conduct a comparable experiment with previous empirical studies • To investigate the “variant” and “invariant” features in NVP
Experimental background • Key features of the experiment • Complexity • Large population • Well-defined process • Statistical failure and fault records • Previous empirical studies • UCLA Six-Language project • NASA 4-University project • Knight and Leveson's experiment • Lyu-He study
Experimental setup • RSDIMU avionics application • 34 program versions, each developed by a team of 4 students • Comprehensive testing exercised • Acceptance testing: 800 functional test cases and 400 random test cases • Operational testing: 100,000 random test cases • Failures and faults collected and studied • Qualitative as well as quantitative comparisons with the NASA 4-University project performed
Experimental description • Geometry • Data flow diagram
Comparisons between the two projects • Qualitative comparisons • General features • Fault analysis in development phase & operational test • Quantitative comparisons • Failure probability • Fault density • Reliability improvement
Fault analysis in development phase • Common related faults • Display module (easiest part) • Calculation in wrong frame of reference • Initialization problems • Missing certain scaling computation • Faults in NASA project only • Division by zero • Incorrect conversion factor • Wrong coordinate system problem
Fault analysis in development phase (cont'd) • Both cause and effect of some related faults remain the same • Related faults occurred in both easy and difficult subdomains • Some common problems, e.g., initialization problems, exist across different programming languages • The most fault-prone module is the easiest part of the application
Faults in operational test (cont'd) • These faults are all related to the same module, i.e., sensor failure detection and isolation • Fault pair (34.2 & 22.1): 25 coincident failures • Fault pair (34.3 & 29.1): 32 coincident failures • Yet these two pairs are quite different in nature • Version 34 shows the lowest quality • Poor program logic and design organization • Hard coding • The overall NVP performance derived from our data would be even better if version 34 were excluded
Input/Output domain classification • Normal operations are classified as: Si,j = {i sensors previously failed and j of the remaining sensors fail | i = 0, 1, 2; j = 0, 1 } • Exceptional operations: Sothers
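A minimal sketch of this classification, assuming each test case can be summarized by the number of previously failed sensors and the number of newly failing sensors (the function and argument names are illustrative, not taken from the actual RSDIMU test harness):

```python
# Classify a test case into subdomain S_{i,j} (i previously failed sensors,
# j newly failing sensors, i in {0,1,2}, j in {0,1}) or S_others otherwise.
def classify(prev_failed, newly_failed):
    if prev_failed in (0, 1, 2) and newly_failed in (0, 1):
        return f"S{prev_failed},{newly_failed}"
    return "S_others"

print(classify(0, 1))  # a previously healthy system with one new sensor failure
```

Grouping the 100,000 operational test cases by this label is what lets the per-subdomain failure probabilities on the next slide be compared.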
Failures in operational test • States S0,0, S1,0 and S2,0 are more reliable than states S0,1, S1,1 and S2,1 • The exceptional state reveals most of the failures • The failure probability in S0,1 is the highest • The programs exhibit high reliability on average
Coincident failures • Two or more versions fail on the same test case, whether or not their outputs are identical • The percentage of coincident failures versus total failures is low: • Version 22: 25/618 = 4% • Version 29: 32/2760 = 1.2% • Version 34: (25+32)/1351 = 4.2%
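The percentages on this slide follow directly from the slide's own counts (coincident failures over total failures per version across the operational test):

```python
# Reproduce the coincident-failure percentages from the slide's counts.
data = {
    "Version 22": (25, 618),
    "Version 29": (32, 2760),
    "Version 34": (25 + 32, 1351),
}
for name, (coincident, total) in data.items():
    print(f"{name}: {coincident}/{total} = {100 * coincident / total:.1f}%")
```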
Fault density • Six faults identified in 4 out of 34 versions • The size of these versions varies from 1455 to 4512 source lines of code • Average fault density: • one fault per 10,000 lines • This is close to the industry standard for high-quality software systems
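The density figure is a simple ratio of faults to source lines. The total SLOC used below is a hypothetical placeholder chosen to match the quoted one-fault-per-10,000-lines average; the slide only gives the size range (1455-4512 SLOC) of the four faulty versions, not the project total:

```python
# Fault density expressed as faults per 10,000 source lines of code.
def fault_density_per_10k(faults, total_sloc):
    return faults * 10_000 / total_sloc

# 60,000 total SLOC is an assumed illustrative figure, not a measurement
# from the experiment.
print(fault_density_per_10k(6, 60_000))
```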
Failure bounds for 2-version system • Lower and upper bounds for the coincident failure probability under the Popov et al. model • DP1: normal test cases without sensor failures dominate all the test cases • DP3: test cases evenly distributed over all subdomains • DP2: between DP1 and DP3
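A hedged sketch in the spirit of the Popov et al. subdomain model: the demand space is partitioned into subdomains with probabilities p(D) and per-version conditional failure probabilities. The upper bound takes the worst case within each subdomain; the lower figure assumes conditional independence within each subdomain (often used as a reference point rather than a strict bound). All numbers below are illustrative, not the experiment's measurements:

```python
# Subdomain-based bounds on the coincident-failure probability of a
# 2-version system. Each entry is (p_D, theta_A, theta_B): the subdomain
# probability and each version's conditional failure probability there.
def coincident_failure_bounds(subdomains):
    upper = sum(p * min(ta, tb) for p, ta, tb in subdomains)  # worst case
    lower = sum(p * ta * tb for p, ta, tb in subdomains)      # cond. independence
    return lower, upper

# Illustrative demand profile: most mass on normal, no-sensor-failure cases
# (as in DP1 above); the figures are invented for demonstration.
demo = [
    (0.90, 1e-4, 2e-4),
    (0.09, 1e-3, 5e-4),
    (0.01, 1e-2, 2e-2),
]
lo, hi = coincident_failure_bounds(demo)
print(lo, hi)
```

Varying the subdomain weights is exactly what distinguishes the demand profiles DP1, DP2, and DP3 above.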
Quantitative comparison in operational test • NASA 4-University project: 7 out of 20 versions passed the operational testing • Coincident failures were found among 2 to 8 versions
Observations • The difference in fault number and fault density is not significant • In the NASA project: • The numbers of failures and coincident failures are much higher • Although there are coincident failures in 2- to 8-version combinations, the 3-version system still achieves an 80~330 times reliability improvement • In our project: • The average failure rate is 50 times better • The reliability improvement for the 3-version system is 30~60 times
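The 3-version improvement factors above can be put in context with the textbook independence model, under which a 2-out-of-3 majority voter fails only when at least two versions fail on the same input. This is an idealized calculation with an assumed per-version failure probability, not a measurement from either project; correlated faults pull the real improvement below it:

```python
# Failure probability of a 2-out-of-3 majority voter, assuming independent
# versions that each fail with probability p.
def triple_modular_failure(p):
    return 3 * p**2 * (1 - p) + p**3

p = 1e-3   # assumed, illustrative per-version failure probability
improvement = p / triple_modular_failure(p)
print(f"improvement factor: {improvement:.0f}x")   # roughly 330x under independence
```

The measured 30~60 and 80~330 times improvements sitting below this independence ceiling is consistent with a modest degree of fault correlation.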
Invariants • Reliable program versions with low failure probability • Similar number of faults and fault density • Distinguishable reliability improvement for NVP, with 10² to 10⁴ times enhancement • Related faults observed in both difficult and easy parts of the application
Variants • Compared with the NASA project, our project shows: • Some faults not observed • Fewer failures • Fewer coincident failures • Only 2-version coincident failures • An overall reliability improvement an order of magnitude larger
Discussions • The improvement in our project may be attributed to: • A stable specification • Better programming training • Experience from previous NVP experiments • A cleaner development protocol • Different programming languages & platforms
Discussions (cont'd) • The hard-to-detect faults are triggered only by rare input subdomains • New testing strategies are needed to detect such faults: • Code coverage? • Domain analysis?
Conclusions • An empirical investigation is performed to evaluate reliability features through a comprehensive comparison between two NVP projects • According to our empirical study, NVP can provide significant improvement in final reliability • The low number of coincident failures provides supporting evidence for NVP • Possible attributes that may affect the NVP reliability improvement are discussed
Future work • Apply more intensive testing on both Pascal and C programs • Conduct cross-comparison on these program versions developed by different programming languages • Investigate the reliability enhancement of NVP based on the combined set of program versions