Reliability Modeling for Design Diversity: A Review and Some Empirical Studies. Teresa Cai Group Meeting April 11, 2006. Outline . Introduction Reliability modeling on design diversity Empirical Studies Possible directions Conclusion.
Reliability Modeling for Design Diversity: A Review and Some Empirical Studies Teresa Cai Group Meeting April 11, 2006
Outline • Introduction • Reliability modeling on design diversity • Empirical Studies • Possible directions • Conclusion
[Figure: Software reliability engineering techniques and their application domains. Chain: ... -> Defects -> Faults -> Errors -> Failures -> ... Techniques: fault (defect) avoidance, fault removal, fault (error) tolerance, fault (failure) prediction. Attributes: reliability, availability, safety, security.]
Introduction • Software fault tolerance builds on fault avoidance and fault removal with two main families of techniques • Single-version techniques: • Checkpointing and recovery • Exception handling • Data diversity • Multiple-version techniques: • Recovery blocks • N-version programming • N-version self-checking programming
Examples: N-Version Programming (NVP) • [Figure: N-Version Programming (NVP) fault-tolerant software architecture. Labels: Input, Version 1, Version 2, ..., Version N, A.T., Decision Function, Recovery, Fault, Output, Result]
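A minimal sketch of the NVP idea, assuming a strict-majority decision function (the slides do not say which voting rule is used). The three "versions" and the function name `nvp_execute` are hypothetical, chosen only to show how one faulty version is outvoted.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def nvp_execute(versions: Sequence[Callable[[int], Hashable]], x: int) -> Optional[Hashable]:
    """Run every version on x; return the strict-majority output, or None."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            # A crashing version simply casts no vote.
            pass
    if not outputs:
        return None
    value, votes = Counter(outputs).most_common(1)[0]
    # Require agreement among more than half of ALL versions, not just survivors.
    return value if votes > len(versions) / 2 else None


# Three hypothetical versions of "square x"; version_c has a seeded fault at x == 3.
def version_a(x):
    return x * x


def version_b(x):
    return x ** 2


def version_c(x):
    return 0 if x == 3 else x * x


versions = [version_a, version_b, version_c]
result_masked = nvp_execute(versions, 3)                # two of three agree
result_no_majority = nvp_execute([version_a, version_c], 3)  # one vote each
```

With three versions the single fault at `x == 3` is masked; with only two versions the disagreement is detected but cannot be resolved, which is why coincident failures across versions matter so much for the reliability of the whole system.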
Introduction • The rationale is the expectation that software components built differently will fail differently • The probability of coincident failures in multiple versions remains the key issue in design diversity • Reliability models attempt to capture the reliability of diverse systems and the fault correlations among their versions • Empirical data are in high demand for evaluating and cross-validating the usefulness and effectiveness of these models
Current Reliability Modeling • Conceptual models: • Eckhardt and Lee (1985): variation of difficulty on demand space; positive correlations between version failures • Littlewood and Miller (1989): forced design diversity; possibility of negative correlations • Structural models: • Dugan and Lyu (1995): Markov reward model • Tomek and Trivedi (1995): stochastic reward net • In between: • Popov, Strigini et al. (2003): subdomains on demand space; upper bounds and “likely” lower bounds for reliability
Eckhardt and Lee Model • Assumptions: • Failures of an individual program π are deterministic: a program version either fails or succeeds for each input value x • There is randomness due to the development process: P(π) is the probability that a particular version π will be produced from the set of all possible program versions Π • There is randomness due to the demands in operation: P(x) is the probability that a given input demand x is selected from the set of all possible demands X
Eckhardt and Lee Model • Score function • ω(π, x) = 0: program π succeeds for input x • ω(π, x) = 1: program π fails for input x • Difficulty function: the probability that a randomly chosen program version fails on a given demand x: θ(x) = Σ_π ω(π, x) P(π)
Eckhardt and Lee Model • The average probability of failure per demand (pfd) of a randomly chosen single version: E[Θ] = Σ_x θ(x) P(x), where θ(x) is the difficulty function • The average pfd of a randomly chosen pair of program versions: E[Θ²] = Σ_x θ(x)² P(x)
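A small numerical sketch of these EL quantities. The score matrix, the uniform P(π) and the uniform P(x) are all made-up assumptions for illustration; exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction as F

# Hypothetical score matrix: score[pi][x] = 1 if version pi fails on demand x.
score = {
    "v1": [0, 0, 1, 0],
    "v2": [0, 1, 1, 0],
    "v3": [0, 0, 1, 0],
    "v4": [0, 0, 0, 0],
}
p_version = {name: F(1, 4) for name in score}  # assumed P(pi): uniform
p_demand = [F(1, 4)] * 4                        # assumed P(x): uniform

n_demands = len(p_demand)

# Difficulty function: theta(x) = sum over pi of score(pi, x) * P(pi)
theta = [sum(score[v][x] * p_version[v] for v in score) for x in range(n_demands)]

# Single-version average pfd: sum over x of theta(x) * P(x)
pfd_single = sum(t * p for t, p in zip(theta, p_demand))

# Average pfd of two versions drawn independently: sum over x of theta(x)^2 * P(x)
pfd_pair = sum(t * t * p for t, p in zip(theta, p_demand))

# What the pair pfd would be if the two versions failed independently.
pfd_if_independent = pfd_single ** 2
```

In this toy population the third demand is "difficult" (three of four versions fail on it), so the pair pfd of 5/32 is well above the 1/16 that independence would predict.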
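Eckhardt and Lee Model • If, for processes A and B, the difficulty functions are identical and constant, the versions fail independently: P(A and B both fail) = P(A fails) · P(B fails) • Otherwise, it is always the case that: P(A and B both fail) > P(A fails) · P(B fails)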
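The EL result that two versions fail together at least as often as independence predicts follows from the variance of the difficulty function. A short derivation, assuming the usual EL notation in which $\theta(x)$ is the difficulty function and $\Theta = \theta(X)$ for a demand $X$ drawn according to $P(x)$:

```latex
\begin{align*}
E[\Theta^2] - \bigl(E[\Theta]\bigr)^2
  &= \sum_{x} \theta(x)^2 P(x) - \Bigl(\sum_{x} \theta(x) P(x)\Bigr)^2 \\
  &= \operatorname{Var}(\Theta) \;\ge\; 0 .
\end{align*}
```

Equality holds only when $\theta(x)$ takes the same value on every demand with $P(x) > 0$, which is the "identical and constant" case.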
Littlewood and Miller Model • Assumptions: the same as in the EL model • The LM model generalizes the EL model to take account of forced diversity by defining a different probability distribution, one per development process, over the population of all programs
Littlewood and Miller Model • With Θ_A and Θ_B the difficulties of a random demand under development processes A and B: • Independence: Cov(Θ_A, Θ_B) = 0 • Positively correlated: Cov(Θ_A, Θ_B) > 0 • Negatively correlated: Cov(Θ_A, Θ_B) < 0 • Basic intuition: • What you find difficult, I may find easy (or at least easier) • Forced diversity: diverse processes and techniques are employed to force the diversity of final program versions
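A sketch of the negative-correlation case, assuming two hypothetical difficulty functions `theta_a` and `theta_b` (one per development process) and a uniform usage profile. It uses the LM identity P(both fail) = E[Θ_A] · E[Θ_B] + Cov(Θ_A, Θ_B); all numbers are illustrative only.

```python
from fractions import Fraction as F

# Hypothetical difficulty functions over four demands for processes A and B.
# Demands that are hard for A are easy for B, and vice versa.
theta_a = [F(1, 2), F(1, 10), F(0), F(1, 5)]
theta_b = [F(0), F(1, 10), F(1, 2), F(1, 5)]
p_demand = [F(1, 4)] * 4  # assumed uniform usage profile P(x)


def expectation(values, probs):
    """Expected value of a function of the demand under the usage profile."""
    return sum(v * p for v, p in zip(values, probs))


mean_a = expectation(theta_a, p_demand)
mean_b = expectation(theta_b, p_demand)

# P(both fail) = sum over x of theta_A(x) * theta_B(x) * P(x)
p_both_fail = expectation([a * b for a, b in zip(theta_a, theta_b)], p_demand)

# Cov(Theta_A, Theta_B) = E[Theta_A * Theta_B] - E[Theta_A] * E[Theta_B]
covariance = p_both_fail - mean_a * mean_b
```

Here the covariance is negative, so the pair fails together less often (1/80) than two independent versions with the same individual pfds would (1/25). This is the "what you find difficult, I may find easy" intuition in numbers.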
Popov and Strigini Model • Alternative estimates for the probability of failure on demand (pfd) of a 1-out-of-2 system
Popov and Strigini Model • Upper bound on the system pfd • “Likely” lower bound on the system pfd, under the assumption of conditional independence
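A sketch of how such bounds can be computed, assuming they take the usual subdomain form: the upper bound uses the better channel in each subdomain, and the "likely" lower bound assumes the two channels fail independently within each subdomain. The subdomain probabilities and per-subdomain pfds below are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical subdomains S_i of the demand space:
# (P(S_i), pfd of channel A on S_i, pfd of channel B on S_i)
subdomains = [
    (F(6, 10), F(1, 100), F(2, 100)),
    (F(3, 10), F(5, 100), F(1, 100)),
    (F(1, 10), F(10, 100), F(20, 100)),
]

# The subdomains are assumed to partition the demand space.
assert sum(p for p, _, _ in subdomains) == 1

# Upper bound: in each subdomain the 1-out-of-2 pair cannot fail more often
# than its better channel does.
upper_bound = sum(p * min(a, b) for p, a, b in subdomains)

# "Likely" lower bound: channels fail independently within each subdomain.
likely_lower_bound = sum(p * a * b for p, a, b in subdomains)

# For comparison: naive independence over the whole demand space.
pfd_a = sum(p * a for p, a, _ in subdomains)
pfd_b = sum(p * b for p, _, b in subdomains)
naive_independent = pfd_a * pfd_b
```

In this example the "likely" lower bound (0.00227) is about twice the naive independence estimate (0.001085), because both channels are weakest on the same subdomain.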
Empirical studies • Various projects have investigated and evaluated the effectiveness of design diversity: • Evaluations of the effectiveness and cost of the final product of diverse systems: Avizienis and Chen (1977); Knight and Leveson (1986); NASA-4 University project (1990) • Experiments evaluating the design process of diverse systems: Avizienis (1995) • Adoption of design diversity into different aspects of software engineering practice: Popov & Strigini (2003)
Recent simple empirical studies on difficulty functions • From the Center for Software Reliability, City University • First experiment: “Online Judge” • On-line spec / submission / testing • 3444 C program versions • Source code: about 20 • Input domain: 2500 input pairs • Second experiment: ACM programming contest • Contest Host problem • 2666 program versions fall into 34 equivalence classes, of which 5 classes account for 98% of the population • Input domain: 40401 input pairs
First experiment • [Figures: difficulty functions; expected pfds] • The dominant faults are spec faults
Second experiment • Difficulty functions with / without spec faults for initial release
Second experiment • Difficulty functions without spec faults
Second experiment • Comparison of expected probability of failure on demand
Second experiment • Observations • Difficulty functions are relatively flat • The increase in the pfd of a diverse pair relative to the independence assumption is small for all populations • Limitations • Simple programs • The input domain may not be representative of real-world applications
Our on-going work • Two program pools from the RSDIMU project • C vs. Pascal • 34 versions vs. 20 versions • 100 million test cases generated • Fault correlation analysis within and between: • 34 C program versions • 20 Pascal program versions • Between the two pools
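One way such a within-pool and between-pool comparison could be set up, sketched on toy data. The failure sets, the pool sizes and the helper names below are hypothetical and far smaller than the RSDIMU pools. A "coincident failure" here means only that both versions fail on the same test case, without checking whether their wrong outputs are identical.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical failure sets: for each version, the test-case ids on which it failed.
c_pool = {"c1": {3, 7, 9}, "c2": {7, 9}, "c3": {1}}
pascal_pool = {"p1": {7}, "p2": {2, 3}}
n_tests = 10  # assumed size of the (tiny) test suite


def coincident_rate(fail_a, fail_b, n):
    """Fraction of test cases on which both versions fail."""
    return Fraction(len(fail_a & fail_b), n)


def mean_rate(pairs, n):
    """Average coincident-failure rate over a collection of version pairs."""
    pairs = list(pairs)
    return sum(coincident_rate(a, b, n) for a, b in pairs) / len(pairs)


# Pairs drawn from the same pool versus pairs that mix the two pools.
within_c = mean_rate(combinations(c_pool.values(), 2), n_tests)
within_pascal = mean_rate(combinations(pascal_pool.values(), 2), n_tests)
between = mean_rate(product(c_pool.values(), pascal_pool.values()), n_tests)
```

Comparing `within_c`, `within_pascal` and `between` gives a first, crude indication of whether mixing languages lowers the rate of coincident failures; a real analysis would also need to weight test cases by the operational profile.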
Our objective • Is there any difference between the fault correlations in the two program pools? • Does diversity (different programming languages and processes) have any influence on the fault correlation of the final programs? • What benefit or loss from diversity does this empirical study show?