An Experimental Evaluation on Reliability Features of N-Version Programming Presented by Onur TÜRKYILMAZ Authors Xia Cai, Michael R. Lyu and Mladen A. Vouk International Symposium on Software Reliability Engineering 2005 (ISSRE’05)
Outline • Introduction • Motivation • Experimental evaluation • Fault analysis • Failure probability • Fault density • Reliability improvement • Discussions • Conclusion
Introduction • N-version programming is one of the main techniques for software fault tolerance • It has been adopted in some mission-critical applications • Yet, its effectiveness is still an open question • What is the reliability enhancement? • Is the fault correlation between multiple versions a big issue that affects the final reliability?
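The core idea above, running N independently developed versions and voting on their outputs, can be sketched in a few lines. This is an illustrative sketch only; the function name and the simple majority rule are assumptions, not the paper's actual decision mechanism.

```python
# Minimal sketch of N-version majority voting (illustrative; the
# names and the plain-majority rule are assumptions, not the paper's).
from collections import Counter

def nvp_vote(outputs):
    """Return the majority output of N independently developed versions,
    or None when no strict majority exists (signalling system failure)."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None

# Three versions: two agree, one produces a faulty result.
print(nvp_vote([42, 42, 41]))  # -> 42
print(nvp_vote([1, 2, 3]))     # -> None (no majority)
```

Note that a real voter for the avionics application would compare floating-point sensor estimates within a tolerance rather than requiring exact equality.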
Research questions • What is the reliability improvement of NVP? • Is fault correlation a big issue that will affect the final reliability? • What kind of empirical data can be comparable with previous investigations?
Motivation • To address the reliability and fault correlation issues in NVP • To conduct a comparable experiment with previous empirical studies • To investigate the “variant” and “invariant” features in NVP
Experimental background • Some features about the experiment • Complexity • Large population • Well-defined • Statistical failure and fault records • Previous empirical studies • UCLA Six-Language project • NASA 4-University project • Knight and Leveson’s experiment • Lyu-He study
Experimental setup • RSDIMU avionics application • 34 program versions • A team of 4 students • Comprehensive testing exercised • Acceptance testing: 800 functional test cases and 400 random test cases • Operational testing: 100,000 random test cases • Failures and faults collected and studied • Qualitative as well as quantitative comparisons with NASA 4-University project performed
Experimental description • Geometry: estimating the vehicle acceleration using eight redundant accelerometers (sensors) • Sensors mounted on the four triangular faces of a semioctahedron
Comparisons between the two projects • Qualitative comparisons • General features • Fault analysis in development phase & operational test • Quantitative comparisons • Failure probability • Fault density • Reliability improvement
Fault analysis in development phase • Common related faults • Display module (easiest part) • Calculation in wrong frame of reference • Initialization problems • Missing certain scaling computation • Faults in NASA project only • Division by zero • Incorrect conversion factor • Wrong coordinate system problem
Fault analysis in development phase (cont’) • Both cause and effect of some related faults remain the same • Related faults occurred in both easy and difficult subdomains • Some common problems, e.g., initialization problem, exist for different programming languages • The most fault-prone module is the easiest part of the application
Input/Output domain classification • Normal operations are classified as: S_{i,j} = { i sensors previously failed and j of the remaining sensors fail }, for i = 0, 1, 2 and j = 0, 1 • Exceptional operations: S_others
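The subdomain partition above maps directly to a small classifier. A sketch (function name and string labels are illustrative assumptions):

```python
def classify(prev_failed, newly_failed):
    """Map a test case to subdomain S_{i,j}: i sensors previously
    failed, j of the remaining sensors fail now. Cases outside the
    normal ranges (i in 0..2, j in 0..1) fall into S_others."""
    if prev_failed in (0, 1, 2) and newly_failed in (0, 1):
        return f"S{prev_failed},{newly_failed}"
    return "S_others"

print(classify(0, 1))  # -> S0,1
print(classify(3, 0))  # -> S_others
```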
Failures in operational test • States S_{0,0}, S_{1,0} and S_{2,0} are more reliable than states S_{0,1}, S_{1,1} and S_{2,1} • The exceptional state reveals most of the failures • The failure probability in S_{0,1} is the highest • The programs exhibit high reliability on average
Coincident failures • Two or more versions fail on the same test case, whether or not their outputs are identical • The percentage of coincident failures versus total failures is low: • Version 22: 25/618=4% • Version 29: 32/2760=1.2% • Version 32: (25+32)/1351=4.2%
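The percentages quoted above follow directly from the reported counts:

```python
# Reproducing the coincident-failure percentages quoted on the slide:
# coincident failures divided by that version's total failures.
def pct(coincident, total):
    return 100 * coincident / total

print(round(pct(25, 618), 1))        # -> 4.0  (version 22)
print(round(pct(32, 2760), 1))       # -> 1.2  (version 29)
print(round(pct(25 + 32, 1351), 1))  # -> 4.2  (version 32)
```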
Failure bounds for 2-version system • Lower and upper bounds for the coincident failure probability under the Popov et al. model • DP1: normal test cases without sensor failures dominate all the test cases • DP3: test cases evenly distributed across all subdomains • DP2: between DP1 and DP3
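The flavour of such subdomain-based bounds can be sketched as follows. The exact formulas are an assumption here, not taken from the paper: the upper bound uses the fact that the joint failure probability in a subdomain cannot exceed the smaller of the two per-version probabilities, while the lower estimate assumes independence within each subdomain.

```python
# Hedged sketch of bounds on the coincident-failure probability of a
# 2-version system over a subdomain profile (formulas are assumptions
# in the spirit of the Popov et al. model, not the paper's own).
def coincident_bounds(profile, p_a, p_b):
    """profile: subdomain usage probabilities (sum to 1);
    p_a, p_b: per-subdomain failure probabilities of the two versions."""
    # Upper bound: P(A and B fail) <= min(p_a, p_b) in each subdomain.
    upper = sum(q * min(a, b) for q, a, b in zip(profile, p_a, p_b))
    # Lower estimate: independence within each subdomain.
    lower = sum(q * a * b for q, a, b in zip(profile, p_a, p_b))
    return lower, upper

# DP1-style profile: normal no-sensor-failure cases dominate.
lo, hi = coincident_bounds([0.9, 0.08, 0.02],
                           [1e-4, 1e-3, 1e-2],
                           [2e-4, 5e-3, 2e-2])
print(lo <= hi)  # -> True
```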
Quantitative comparison in operational test • NASA 4-university project: 7 out of 20 versions passed the operational testing • Coincident failures were found among 2 to 8 versions • 5 out of 7 faults were not observed in our project
Invariants • Reliable program versions with low failure probability • Similar number of faults and fault density • Distinguishable reliability improvement for NVP, with 10^2 to 10^4 times enhancement • Related faults observed in both difficult and easy parts of the application
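The 10^2 to 10^4 enhancement factor is simply the ratio of single-version to fault-tolerant-system failure probability. A back-of-envelope illustration with hypothetical numbers (not the paper's measurements):

```python
# Illustrative reliability-improvement factor; both probabilities
# below are assumed values, not figures reported in the study.
single_version_pfd = 1e-3   # assumed average single-version failure prob.
nvp_system_pfd = 1e-6       # assumed failure prob. after majority voting
print(round(single_version_pfd / nvp_system_pfd))  # -> 1000
```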
Variants • Compared with the NASA project, our project shows: • Some faults not observed • Fewer failures • Fewer coincident failures • Only 2-version coincident failures (rather than 2- to 8-version failures) • The overall reliability improvement is an order of magnitude larger
Discussions • The improvement of the project may be attributed to: • stable specification • better programming training • experience in NVP experiments • cleaner development protocol • different programming languages & platforms
Discussions (cont’) • The hard-to-detected faults are only hit by some rare input domains • New testing strategy is needed to detect such faults: • Code coverage? • Domain analysis?
Conclusion • An empirical investigation was performed to evaluate reliability features through a comprehensive comparison of two NVP projects • NVP can provide a distinguishable improvement in final reliability according to the empirical study conducted • The small number of coincident failures provides supporting evidence for NVP • Possible attributes that may affect the reliability improvement are discussed
Thank you ! Q & A