Software Reliability Corroboration (WVU UI: Integrating Formal Methods and Testing in a Quantitative Software Reliability Assessment Framework 2002) Bojan Cukic, Erdogan Gunel, Harshinder Singh, Lan Guo, West Virginia University; Carol Smidts, University of Maryland
Overview • Introduction and motivation. • Software reliability assessment and NASA IV&V. • Bayesian hypothesis testing approach. • A methodology for formulating priors. • Case study • Accounting for severities and risks. • Summary
Introduction • Improvement of software V&V practices, especially for high-assurance systems. • Quantification of the effects of V&V activities is always desirable. • Is software reliability quantification practical for safety/mission-critical systems? • Time and cost considerations may limit its appeal. • Reliability growth models apply only to integration testing, the tail end of V&V. • Estimation of operational usage profiles is rare.
Is SRE Impractical for NASA IV&V? • Most IV&V techniques are qualitative in nature. • Mature software reliability estimation methods are based exclusively on operational (system) testing. • This neglects the investment made in other IV&V techniques: • Requirements readings, inspections, problem reports and tracking, unit-level tests… [Figure: traditional software reliability assessment vs. life-cycle-long IV&V across the phases Req, Design, Code (Implementation), and Test (Verification & Validation: Unit, Integration, Acceptance)]
Regulatory Viewpoint • Regulatory view: DO-178B (Software Considerations in Airborne Systems and Equipment Certification): “… methods for estimating the post-verification probabilities of software errors were examined. The goal was to develop numerical requirements for such probabilities for software in computer-based airborne systems or equipment. The conclusion reached, however, was that currently available methods do not provide results in which the confidence can be placed to the level required for this purpose... If the applicant proposes to use software reliability models for certification credit, rationale for the model should be included in the plan for software aspects of certification, and agreed with by the certification authority.”
Contribution • Develop software reliability assessment methods that build on: • Stable and mature development environments. • Life-cycle-long IV&V activities. • Utilize all relevant available information. • Qualitative (formal and informal) IV&V methods? • Strengthening the case for IV&V across the NASA enterprise.
Assessment vs. Corroboration • Current thinking: • Software reliability is “tested into” the product through integration and acceptance testing. • Our thinking: • Why “waste” the results of all the qualitative IV&V activities? • Testing should corroborate that the life-cycle-long IV&V techniques are giving the “usual” results, i.e., that the project follows its usual quality patterns.
Reliability Assessment (No Prior Assumptions) • Goal: $P(\theta < \theta_0) \ge 0.99$. • Required testing effort N, from random sampling: number of failure-free test cases as a function of the required failure rate $\theta_0$, with confidence C = 0.99:
$\theta_0$      Number of tests N
$10^{-2}$       458
$10^{-3}$       4,602
$10^{-4}$       46,048
$10^{-5}$       460,514
$10^{-6}$       4,605,167
• The required testing effort is not realistic.
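The counts in this table appear to follow the usual zero-failure sampling bound: after N failure-free random tests, $(1-\theta_0)^N \le 1 - C$ must hold, so $N = \lceil \ln(1-C) / \ln(1-\theta_0) \rceil$. The sketch below computes that bound; the slide does not state its rounding convention, so treat agreement to within a couple of tests as expected rather than exact. The function name is illustrative only.

```python
import math


def required_failure_free_tests(theta0: float, confidence: float = 0.99) -> int:
    """Smallest N such that N failure-free random tests give P(theta < theta0) >= confidence.

    Based on the zero-failure bound (1 - theta0) ** N <= 1 - confidence.
    """
    # log1p keeps log(1 - theta0) accurate when theta0 is very small
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-theta0))


if __name__ == "__main__":
    for exponent in range(2, 7):
        theta0 = 10.0 ** -exponent
        print(f"theta0 = 1e-{exponent}: N = {required_failure_free_tests(theta0):,}")
```

Each extra order of magnitude in $\theta_0$ multiplies N by roughly ten, which is why the effort quickly becomes unrealistic.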
Bayesian Inference • Allows for the inclusion of a subjective probability of failure. • The subjective estimate is based on observed behavior and reflects beliefs. • A hypothesis on the probability of an event's occurrence is combined with new evidence, which may change the degree of belief.
Bayesian Estimation (Non-Ignorance Priors) • Needs the following assumption: • The system has achieved the desired reliability prior to acceptance testing. • This “guess” should be “reasonably accurate.” • Use random tests (operational profile) to corroborate the assumed system failure probability. • How many failure-free random tests should be performed?
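One way to see how a non-ignorance prior reduces the number of failure-free tests is a conjugate Beta prior. The sketch below assumes, purely for illustration, a Beta(1, b) prior on the per-test failure probability $\theta$; this is equivalent to a uniform prior updated by $b-1$ earlier failure-free tests. After N failure-free tests the posterior is Beta(1, b + N), whose CDF at $\theta_0$ is $1-(1-\theta_0)^{b+N}$. The function and parameter names are hypothetical.

```python
import math


def failure_free_tests_with_prior(theta0: float, confidence: float, prior_b: int) -> int:
    """Failure-free tests N so that the posterior P(theta <= theta0) >= confidence.

    Assumed (hypothetical) prior: Beta(1, prior_b) on the per-test failure
    probability theta. After N failure-free tests the posterior is
    Beta(1, prior_b + N), with CDF 1 - (1 - theta0) ** (prior_b + N) at theta0.
    """
    if prior_b < 1:
        raise ValueError("prior_b must be >= 1 for a proper Beta prior")
    # Total "failure-free evidence" needed, prior pseudo-tests included
    total = math.ceil(math.log(1.0 - confidence) / math.log1p(-theta0))
    # The prior already supplies prior_b of it
    return max(0, total - prior_b)


if __name__ == "__main__":
    for b in (1, 101, 301):
        print(b, failure_free_tests_with_prior(0.01, 0.99, b))
```

The stronger the prior (larger `prior_b`), the fewer corroboration tests remain. The catch is that the prior is simply taken as correct: an optimistic guess directly shortens testing.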
Benefits • What if corroboration testing is not failure-free? • Keep adjusting the target number of tests [Littlewood 97, 98].
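To illustrate "keep adjusting the target", the sketch below continues the Beta-prior assumption: after n tests with r failures, a Beta(a, b) prior becomes Beta(a + r, b + n − r). It then searches for the number of additional failure-free tests that restore $P(\theta \le \theta_0) \ge C$. For integer parameters it uses the identity $I_x(a,b) = P(\mathrm{Binomial}(a+b-1, x) \ge a)$, so no special-function library is needed. This is a hypothetical illustration, not the cited authors' procedure.

```python
import math


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Regularized incomplete beta I_x(a, b) for positive integers a, b.

    Uses I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    # Complement of P(Binomial(m, x) <= a - 1); a is small (1 + observed failures)
    lower = sum(math.comb(m, k) * x**k * (1.0 - x) ** (m - k) for k in range(a))
    return 1.0 - lower


def extra_tests_needed(theta0: float, confidence: float, prior_a: int,
                       prior_b: int, n_run: int, r_failed: int) -> int:
    """Additional failure-free tests so that P(theta <= theta0 | data) >= confidence.

    Hypothetical Beta(prior_a, prior_b) prior, updated with r_failed
    failures out of n_run tests already executed.
    """
    a = prior_a + r_failed
    b = prior_b + (n_run - r_failed)
    extra = 0
    # Each further failure-free test adds 1 to b; linear search is fine here
    while beta_cdf_int(theta0, a, b + extra) < confidence:
        extra += 1
    return extra


if __name__ == "__main__":
    # Uniform prior, target theta0 = 0.01, C = 0.99
    print(extra_tests_needed(0.01, 0.99, 1, 1, 0, 0))    # from scratch
    print(extra_tests_needed(0.01, 0.99, 1, 1, 458, 1))  # one failure seen in 458 tests
```

A single failure does not end corroboration. It raises the target number of tests instead.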
Bayesian Hypothesis Testing (BHT) • Problem of Bayesian estimation: • Categorical assumption that the program meets the required reliability. • BHT makes this a probability statement, $P(H_0)$. • Corroboration testing now looks for evidence in favor of the hypothesized reliability. • $H_0: \theta \le \theta_0$ (null hypothesis); $H_1: \theta > \theta_0$ (alternative hypothesis).
The number of corroboration tests according to BHT theory ($n_0$, $n_1$, $n_2$: tests required when 0, 1, or 2 failures are observed):
$\theta_0$    $P(H_0)$    $n_0$    $n_1$     $n_2$
0.01          0.01        457      476       497
0.001         0.01        2378     2671      2975
0.0001        0.01        6831     10648     14501
0.00001       0.01        9349     33176     63649
0.000001      0.01        9752     101273    282007
0.01          0.1         228      258       289
0.001         0.1         636      1017      1402
0.0001        0.1         853      3157      6150
0.00001       0.1         886      9646      27281
0.000001      0.1         890      30067     123725
0.01          0.4         90       128       167
0.001         0.4         138      411       739
0.0001        0.4         146      1251      3260
0.00001       0.4         147      3889      14724
0.000001      0.4         147      12222     67468
0.01          0.6         50       87        126
0.001         0.6         63       269       552
0.0001        0.6         65       827       2458
0.00001       0.6         65       2584      11173
0.000001      0.6         65       8139      51351
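A minimal sketch of how a failure-free count like $n_0$ could be computed, under stated assumptions. The slides do not give the priors or the confidence target, so this sketch assumes that, with probability $P(H_0)$, $\theta$ is uniform on $[0, \theta_0]$ and otherwise uniform on $(\theta_0, 1]$, with a target of $P(H_0 \mid \text{data}) \ge 0.99$. In spot checks it lands close to the $n_0$ column, but it is not claimed to be the authors' exact formulation. Names are hypothetical.

```python
import math


def posterior_h0(theta0: float, prior_h0: float, n: int) -> float:
    """Posterior P(H0 | n failure-free tests), H0: theta <= theta0.

    Assumed prior (hypothetical): with probability prior_h0,
    theta ~ Uniform(0, theta0); otherwise theta ~ Uniform(theta0, 1).
    """
    log_q = math.log1p(-theta0)  # log(1 - theta0)
    # Both marginal likelihoods share a 1/(n + 1) factor, dropped here.
    # Under H0: (1 - (1 - theta0)^(n + 1)) / theta0
    like_h0 = -math.expm1((n + 1) * log_q) / theta0
    # Under H1: (1 - theta0)^n
    like_h1 = math.exp(n * log_q)
    num = prior_h0 * like_h0
    return num / (num + (1.0 - prior_h0) * like_h1)


def bht_failure_free_tests(theta0: float, prior_h0: float,
                           confidence: float = 0.99) -> int:
    """Smallest n with P(H0 | n failure-free tests) >= confidence (linear search)."""
    n = 0
    while posterior_h0(theta0, prior_h0, n) < confidence:
        n += 1
    return n


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.4, 0.6):
        print(p, bht_failure_free_tests(0.01, p))
```

The same qualitative pattern as the table shows up: a higher prior belief $P(H_0)$ sharply reduces the corroboration effort, and for small $\theta_0$ the count levels off instead of growing tenfold per decade.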
Formulating Priors • Formulation of prior beliefs is the most important research issue. • Historical data on failure occurrences under the same V&V regime. • Historical data on the reduction in failure occurrence attributable to specific verification techniques (very few studies). • Process effectiveness measures [Smidts 98]. • Transforming fault density into failure intensity [Smidts 01]. • Represent the application of a specific verification method by an appropriate number of random tests [Miller et al. 94].
Can This Be Done? • Is it realistic to expect software developers to hypothesize on the operational reliability? • Experiment (Smidts et al.). • A panel of experts ranked 32 measures related to software reliability. • Ranks normalized to a [0, 1] range. • Highly ranked measures: • Failure rate (0.98), test coverage (0.90), fault density (0.73). • Low-ranked measures: • Mutation testing (0.48), function point analysis (0.00), bugs per line of code (Gaffney estimate, 0.00).
Controlled Experiment • A company was contracted to develop a program (a smart-card-based access control system, PACS). • Controlled requirements document (NSA specs). • Five software engineering measures monitored: • Defect density, test coverage, requirements traceability, function points, Gaffney. • Each measure can be used within a reliability prediction system (RPS). • Accurate RPS: defect density, test coverage, and requirements traceability. • Inaccurate RPS: function points and Gaffney.
Software Reliability Corroboration • Accurate predictors are adequate for the corroboration approach. • A weighted linear combination of the three measures (RPS) gives a very accurate reliability prediction. • Low levels of trust in the prediction accuracy. • No experience in repeatability. • Low value of P(H0) still requires substantial but realistic reliability corroboration effort.
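The combined prediction mentioned above is a weighted linear combination of the individual RPS outputs. The sketch below shows only the arithmetic. The weights and predictions in the usage lines are hypothetical placeholders, not values from the experiment, and normalizing the weights to sum to 1 is an assumption. The combined value is the kind of figure that could serve as the hypothesized $\theta_0$ for corroboration testing.

```python
from typing import Sequence


def combine_rps(predictions: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted linear combination of RPS failure-probability predictions.

    Weights are normalized to sum to 1 (an assumption of this sketch), so the
    result lies between the smallest and largest individual prediction.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(predictions, weights)) / total


if __name__ == "__main__":
    # Hypothetical outputs of three RPS (e.g. defect density, test coverage,
    # requirements traceability) and hypothetical weights
    print(combine_rps([2e-3, 1e-3, 4e-3], [0.5, 0.3, 0.2]))
```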
Accounting for Failure Severities • Not all failures encountered in corroboration testing are equally important. • Instead of counting generic failures, test failures are stratified according to their severity. • A high-severity failure encountered in corroboration testing is strong evidence in favor of the alternative hypothesis. • This allows for tolerance toward low-severity failures.
Approach Recap [Figure: software quality measures (SQM1, SQM2, …, SQMi) collected over the software development lifecycle feed reliability prediction systems (RPS1, RPS2, …, RPSk, …, RPSm); RPS combination techniques (experience, learning, Dempster-Shafer, …) yield a software reliability measure, which sets the null hypothesis H0 and alternative hypothesis Ha for BHT software reliability corroboration testing]
Status and Perspectives • Software reliability corroboration allows: • Inclusion of IV&V quality measures and activities into the reliability assessment. • A significant reduction in the number of (corroboration) tests. • Software reliability of safety/mission critical systems can be assessed with a reasonable effort. • Research directions. • Sound formulation of prior beliefs from IV&V. • Further experimentation (other measures, repetition). • Can prior beliefs be based on the “formality” of the IV&V methods (formal methods)?