Reliability Modeling for Design Diversity: A Review and Some Empirical Studies

Teresa Cai • Group Meeting • April 11, 2006

Outline • Introduction • Reliability modeling on design diversity • Empirical Studies • Possible directions • Conclusion

Figure: Software reliability engineering techniques and their application domains • Fault lifecycle: ... -> Defects -> Faults -> Errors -> Failures -> ... • Techniques: fault avoidance, fault removal, fault tolerance, fault prediction (annotated in the figure with Defect, Failure, and Error) • Target attributes: reliability, availability, safety, security

Introduction • Software fault tolerance adopts two main groups of techniques on top of fault avoidance and fault removal • Single-version techniques: • Checkpointing and recovery • Exception handling • Data diversity • Multiple-version techniques: • Recovery blocks • N-version programming • N-version self-checking programming

Examples: N-Version Programming (NVP) • Figure: N-Version Programming (NVP) fault-tolerant software architecture (labels: Input, Version 1, Version 2, ..., Version N, A.T., Decision Function, Result, Output, Recovery, Fault)
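The NVP idea named on this slide can be sketched in a few lines: every version runs on the same input and a decision function picks the majority output. This is only an illustrative sketch, not the architecture from the figure; the function name `nvp_execute`, the strict-majority rule, and the toy versions are assumptions made for the example.

```python
from collections import Counter

def nvp_execute(versions, x):
    """Run every version on input x and return the strict-majority output.

    Returns None when no output is produced by more than half of the
    versions. Outputs must be hashable so they can be counted.
    """
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            # A version that crashes simply casts no vote.
            continue
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    # Majority is taken over all N versions, not only those that answered.
    return value if votes > len(versions) / 2 else None

# Hypothetical versions of "square x": two correct, one faulty.
good_pool = [lambda x: x * x, lambda x: x ** 2, lambda x: x + x]
# Hypothetical pool in which two versions fail in different ways.
bad_pool = [lambda x: x * x, lambda x: x + 1, lambda x: x + 2]

masked = nvp_execute(good_pool, 3)     # the single faulty output is outvoted
undecided = nvp_execute(bad_pool, 3)   # three different outputs, no majority
```

The voter masks a fault only when the faulty versions are in the minority or disagree among themselves, which is why coincident failures are the central concern of the models that follow.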

Introduction • The rationale is the expectation that software components built differently will fail differently • The probability of coincident failures in multiple versions remains the key issue in design diversity • Reliability models attempt to capture the reliability of, and the fault correlations in, diverse systems • Empirical data are in high demand for evaluating and cross-validating the usefulness and effectiveness of these models

Current Reliability Modeling • Conceptual models: • Eckhardt and Lee (1985): variation of difficulty on demand space; positive correlations between version failures • Littlewood and Miller (1989): forced design diversity; possibility of negative correlations • Structural models: • Dugan and Lyu (1995): Markov reward model • Tomek and Trivedi (1995): stochastic reward net • In between: • Popov, Strigini et al. (2003): subdomains on demand space; upper bounds and “likely” lower bounds for reliability

Eckhardt and Lee Model • Assumptions: • Failures of an individual program π are deterministic: a program version either fails or succeeds for each input value x • There is randomness due to the development process. P(π): probability that a particular version π will be produced from the set of all possible program versions Π • There is randomness due to the demands in operation. P(x): probability that a given input demand x is selected from the set of all possible demands X

Eckhardt and Lee Model • Score function ω(π, x) • ω(π, x) = 0: program π succeeds for input x • ω(π, x) = 1: program π fails for input x • Difficulty function θ(x): the average probability of a program version failing on a given demand x
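The slide's own formula did not survive extraction. In the usual statement of the Eckhardt and Lee model, writing the score function as ω(π, x) to make its dependence on the version explicit and the difficulty function as θ(x) (a symbol assumed here), the two definitions read:

```latex
\omega(\pi, x) =
\begin{cases}
0 & \text{if program } \pi \text{ succeeds on input } x,\\
1 & \text{if program } \pi \text{ fails on input } x,
\end{cases}
\qquad
\theta(x) = \sum_{\pi \in \Pi} \omega(\pi, x)\, P(\pi).
```

So θ(x) is the probability that a version drawn at random from the development process fails on the particular demand x.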

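Eckhardt and Lee Model • The average probability of failure per demand (pfd) of a randomly chosen single version is the expected value of the difficulty function over the demand profile P(x) • The average pfd of a randomly chosen pair of program versions is the expected value of the squared difficulty function over the same profile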
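A small numerical sketch may make these two averages concrete. It assumes a finite population of versions and a finite demand space; the score matrix, the uniform probabilities, and all function names are hypothetical and chosen only for illustration.

```python
def difficulty(scores, p_version):
    """theta(x): probability that a randomly produced version fails on x.

    scores[i][x] is 1 if version i fails on demand x, else 0.
    p_version[i] is the probability that version i is produced.
    """
    n_demands = len(scores[0])
    return [
        sum(p * row[x] for p, row in zip(p_version, scores))
        for x in range(n_demands)
    ]

def mean_single_pfd(theta, p_demand):
    """Average pfd of one randomly chosen version: E[theta]."""
    return sum(t * q for t, q in zip(theta, p_demand))

def mean_pair_pfd(theta, p_demand):
    """Average pfd of two independently chosen versions: E[theta^2]."""
    return sum(t * t * q for t, q in zip(theta, p_demand))

# Hypothetical data: 4 versions (rows) by 4 demands (columns).
scores = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
p_version = [0.25, 0.25, 0.25, 0.25]
p_demand = [0.25, 0.25, 0.25, 0.25]

theta = difficulty(scores, p_version)       # [0.0, 0.0, 0.25, 0.75]
single = mean_single_pfd(theta, p_demand)   # 0.25
pair = mean_pair_pfd(theta, p_demand)       # 0.15625
independent = single * single               # 0.0625
```

In this toy population the pair fails together on 0.15625 of demands on average, two and a half times the 0.0625 that an independence assumption would predict, because most failures are concentrated on one hard demand.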

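Eckhardt and Lee Model • If the difficulty functions for processes A and B are identical and constant, the versions fail independently: the average pfd of a pair equals the product of the single-version pfds • Otherwise (identical difficulty functions that vary over the demand space), it is always the case that the average pfd of a pair is greater than the independence assumption predicts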
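The equations on these slides were lost in extraction. The standard form of the Eckhardt and Lee result, with Θ denoting the difficulty θ(x) evaluated at a randomly selected demand (notation assumed here), is:

```latex
E[\Theta] = \sum_{x \in X} \theta(x)\, P(x),
\qquad
E[\Theta^{2}] = \sum_{x \in X} \theta(x)^{2}\, P(x)
             = \bigl(E[\Theta]\bigr)^{2} + \operatorname{Var}(\Theta)
             \;\ge\; \bigl(E[\Theta]\bigr)^{2}.
```

Equality holds only when Var(Θ) = 0, that is, when θ(x) takes the same value on every demand that can occur. Any variation in difficulty across the demand space makes coincident failures more likely than independence would suggest.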

Littlewood and Miller Model • Assumptions: the same as in the EL model • The LM model generalizes the EL model to take account of forced diversity by defining a different distribution over the population of all programs for each development process

Littlewood and Miller Model • Covariance between the difficulty functions θ_A and θ_B of processes A and B: • Independence: Cov(θ_A, θ_B) = 0 • Positively correlated: Cov(θ_A, θ_B) > 0 • Negatively correlated: Cov(θ_A, θ_B) < 0 • Basic intuition: • What you find difficult, I may find easy (or at least easier) • Forced diversity: diverse processes and techniques are employed to force the diversity of the final program versions
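The role of the covariance can be illustrated numerically. The sketch below assumes the usual Littlewood and Miller decomposition, in which the probability that a version from process A and a version from process B both fail equals E[θ_A]·E[θ_B] + Cov(θ_A, θ_B). The difficulty values and function names are hypothetical.

```python
def expectation(theta, p_demand):
    """E[theta] over the demand profile."""
    return sum(t * q for t, q in zip(theta, p_demand))

def joint_pfd(theta_a, theta_b, p_demand):
    """Probability that one version from A and one from B both fail."""
    return sum(a * b * q for a, b, q in zip(theta_a, theta_b, p_demand))

def covariance(theta_a, theta_b, p_demand):
    """Cov(theta_A, theta_B) = E[theta_A * theta_B] - E[theta_A] * E[theta_B]."""
    return joint_pfd(theta_a, theta_b, p_demand) - (
        expectation(theta_a, p_demand) * expectation(theta_b, p_demand)
    )

# Hypothetical two-demand example: A finds demand 0 hard, B finds demand 1 hard.
p_demand = [0.5, 0.5]
theta_a = [0.5, 0.0]
theta_b = [0.0, 0.5]

forced = joint_pfd(theta_a, theta_b, p_demand)        # 0.0
forced_cov = covariance(theta_a, theta_b, p_demand)   # -0.0625

# Same process twice (the EL special case): covariance becomes a variance.
same = joint_pfd(theta_a, theta_a, p_demand)          # 0.125
same_cov = covariance(theta_a, theta_a, p_demand)     # 0.0625
```

With complementary difficulty functions the pair never fails together, which is better than the 0.0625 predicted by independence; drawing both versions from the same process gives 0.125, which is worse.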

Popov and Strigini Model • Alternative estimates for the probability of failure on demand (pfd) of a 1-out-of-2 system

Popov and Strigini Model • Upper bound on the system pfd • “Likely” lower bound on the system pfd, under the assumption of conditional independence within each subdomain
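The formulas for the two bounds did not survive extraction. One common way of writing subdomain-based bounds of this kind, assumed here rather than recovered from the slide, partitions the demand space into subdomains S_i, takes the smaller of the two channels' conditional pfds in each subdomain for the upper bound, and takes their product for the "likely" lower bound. All names and numbers below are hypothetical.

```python
def upper_bound_pfd(p_sub, pfd_a, pfd_b):
    """Sum over subdomains of P(S_i) * min(pfd_A|S_i, pfd_B|S_i).

    A 1-out-of-2 system fails only when both channels fail, so within a
    subdomain it cannot fail more often than the better channel does.
    """
    return sum(p * min(a, b) for p, a, b in zip(p_sub, pfd_a, pfd_b))

def likely_lower_bound_pfd(p_sub, pfd_a, pfd_b):
    """Sum over subdomains of P(S_i) * pfd_A|S_i * pfd_B|S_i.

    Assumes the channels fail independently within each subdomain.
    """
    return sum(p * a * b for p, a, b in zip(p_sub, pfd_a, pfd_b))

# Hypothetical profile with three subdomains.
p_sub = [0.5, 0.25, 0.25]    # probability of each subdomain
pfd_a = [0.0, 0.5, 0.25]     # channel A pfd conditional on each subdomain
pfd_b = [0.25, 0.5, 0.0]     # channel B pfd conditional on each subdomain

upper = upper_bound_pfd(p_sub, pfd_a, pfd_b)           # 0.125
lower = likely_lower_bound_pfd(p_sub, pfd_a, pfd_b)    # 0.0625

# Bound obtained without subdomain information, for comparison.
marginal_a = sum(p * a for p, a in zip(p_sub, pfd_a))  # 0.1875
marginal_b = sum(p * b for p, b in zip(p_sub, pfd_b))  # 0.25
naive_upper = min(marginal_a, marginal_b)              # 0.1875
```

In this example the subdomain information tightens the upper bound from 0.1875 to 0.125, because each channel is strong exactly where the other is weak.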

Empirical studies • Various projects have been conducted to investigate and evaluate the effectiveness of design diversity • Evaluations of the effectiveness and cost of the final products of diverse systems • Avizienis and Chen (1977) • Knight and Leveson (1986) • NASA-4 University project (1990) • Experiments evaluating the design process of diverse systems • Avizienis (1995) • Adoption of design diversity in different aspects of software engineering practice • Popov & Strigini (2003)

Recent simple empirical studies on difficulty functions • From Center for Software Reliability, City University • First experiment: “Online Judge” • On-line spec / submission / testing • 3444 C program versions • Source code: about 20 • Input domain: 2500 input pairs • Second experiment: ACM programming contest • Contest Host problem • 2666 program versions fall into 34 equivalence classes, of which 5 classes account for 98% of the population • Input domain: 40401 input pairs

First experiment • Figures: difficulty functions; expected pfds • The dominant faults are spec faults

Second experiment • Difficulty functions with / without spec faults for initial release

Second experiment • Difficulty functions without spec faults

Second experiment • Comparison of expected probability of failure on demand

Second experiment • Observations • Difficulty functions are relatively flat • The increase in the pfd of a diverse pair relative to the independence assumption is small for all populations • Limitations • Simple programs • The input domain may not be representative of real-world applications

Our on-going work • Two program pools from the RSDIMU project • C vs. Pascal • 34 versions vs. 20 versions • 100 million test cases generated • Fault correlation analysis within and between: • 34 C program versions • 20 Pascal program versions • Between the two pools
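One simple way to look at fault correlation within and between pools is to compare, for every pair of versions, the observed rate of coincident failures with the rate expected if the two versions failed independently. The sketch below is only an illustration of that comparison, not the analysis used in this work; the version names and the eight-test-case failure vectors are made up.

```python
from itertools import combinations

def failure_rate(results):
    """Fraction of test cases on which a version fails (1 = fail, 0 = pass)."""
    return sum(results) / len(results)

def coincident_failure_rate(a, b):
    """Fraction of test cases on which both versions fail."""
    return sum(x and y for x, y in zip(a, b)) / len(a)

def pairwise_report(pool):
    """Map each version pair to (observed, expected-under-independence)."""
    report = {}
    for (name_a, a), (name_b, b) in combinations(pool.items(), 2):
        observed = coincident_failure_rate(a, b)
        expected = failure_rate(a) * failure_rate(b)
        report[(name_a, name_b)] = (observed, expected)
    return report

# Hypothetical failure vectors over the same 8 test cases.
pool = {
    "c_v1":      [1, 0, 0, 1, 0, 0, 0, 0],
    "c_v2":      [1, 0, 0, 0, 0, 0, 0, 0],
    "pascal_v1": [0, 0, 1, 0, 0, 0, 0, 0],
}

report = pairwise_report(pool)
```

Here the two C versions fail together on 0.125 of the test cases against 0.03125 expected under independence, while the C and Pascal pairs never fail together. Real data would of course need far more test cases before any such difference could be considered meaningful.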

Our objective • Is there any difference between the fault correlations in the two program pools? • Does diversity (different programming languages and processes) have any influence on the fault correlation of the final programs? • What benefit or loss from diversity does this empirical study reveal?
