This paper presents a framework for early reliability assessment in software development, integrating formal methods and testing. It discusses case studies and applies Dempster-Shafer inference to NASA datasets.
A Framework for Early Reliability Assessment (WVU UI: Integrating Formal Methods and Testing in a Quantitative Software Reliability Assessment Framework, 2003) • Bojan Cukic, Erdogan Gunel, Harshinder Singh, Lan Guo, Dejan Desovski (West Virginia University) • Carol Smidts, Ming Li (University of Maryland)
Overview • Introduction and Motivation. • Software Reliability Corroboration Approach. • Case Studies. • Applying Dempster-Shafer Inference to NASA datasets. • Summary and Further Work.
Introduction • Quantification of the effects of V&V activities is always desirable. • Is software reliability quantification practical for safety/mission critical systems? • Time and cost considerations may limit the appeal. • Reliability growth applicable only to integration testing, the tail end of V&V. • Estimation of operational usage profiles is rare.
Is SRE Impractical for NASA IV&V? • Most IV&V techniques are qualitative in nature. • Mature software reliability estimation methods are based exclusively on testing. • Can IV&V techniques be utilized for reliability? • Requirements readings, inspections, problem reports and tracking, unit level tests… • [Figure: development lifecycle (Req, Design, Code/Implementation, Test (Verification & Validation): Unit, Integration, Acceptance), contrasting traditional software reliability assessment techniques with life-cycle-long IV&V.]
Contribution • Develop software reliability assessment methods that build on: • Stable and mature development environments. • Lifecycle-long IV&V activities. • Utilize all relevant available information: • Static (SIAT), dynamic, requirements problems, severities. • Qualitative (formal and informal) IV&V methods. • Strengthening the case for IV&V across the NASA enterprise. • Accurate, stable reliability measurement and tracking. • Available throughout the development lifecycle.
Assessment vs. Corroboration • Current thinking • Software reliability is “tested into” the product through integration and acceptance testing. • Our thinking • Why “waste” the results of all the qualitative IV&V activities? • Testing should corroborate that the life-cycle-long IV&V techniques are giving the “usual” results, and that the project follows the usual quality patterns.
Approach • [Figure] The approach is layered across the software development lifecycle: software quality measures (SQM1 … SQMj) feed reliability prediction systems (RPS1 … RPSm); RPS combination techniques (experience, learning, Dempster-Shafer, …) produce the null hypothesis H0 and the alternative hypothesis Ha; BHT software reliability corroboration testing then yields a trustworthy software reliability measure.
Software Quality Measures (roots) • The following were used in the experiments: • Lines of code. • Defect density: the number of defects that remain unresolved after testing, divided by LOC. • Test coverage: LOCtested / LOCtotal. • Requirements traceability: RT = #requirements_implemented / #original_requirements. • Function points. • . . . • In principle, any available measure could, and should, be taken into account. • Defining appropriate Reliability Prediction Systems (RPS).
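The ratio-based root measures above can be sketched in a few lines of Python. This is only an illustration of the definitions as stated on the slide; the function names and the sample counts are hypothetical and do not come from the experiments.

```python
def defect_density(unresolved_defects: int, loc: int) -> float:
    """Defects still unresolved after testing, per line of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return unresolved_defects / loc


def statement_coverage(loc_tested: int, loc_total: int) -> float:
    """Test coverage as LOCtested / LOCtotal."""
    if loc_total <= 0:
        raise ValueError("loc_total must be positive")
    return loc_tested / loc_total


def requirements_traceability(implemented: int, original: int) -> float:
    """RT = implemented requirements / original requirements."""
    if original <= 0:
        raise ValueError("original must be positive")
    return implemented / original


if __name__ == "__main__":
    # Hypothetical project counts, chosen only to exercise the functions.
    print(defect_density(unresolved_defects=5, loc=1000))
    print(statement_coverage(loc_tested=800, loc_total=1000))
    print(requirements_traceability(implemented=45, original=50))
```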
Reliability Prediction Systems • An RPS is a complete set of measures from which software reliability can be predicted. • The bridge between an RPS and software reliability is a MODEL. • Therefore, select (and collect) those measures that have the highest relevance to reliability. • Relevance to reliability ranked from expert opinions [Smidts 2002].
RPS for Test Coverage • Root measure: test coverage. • Support measures: implemented LOC (LOCI), tested LOC (LOCT), the number of defects found by test (N0), missing function points (FPM), backfiring coefficient (k), defects found by test (DT), linear execution time (TL), execution time per demand (t), fault exposure ratio (K). • Notation: C0, defect coverage; C1, test coverage (statement coverage); a0, a1, a2, coefficients; N0, the number of defects found by test; N, the number of defects remaining; K, fault exposure ratio; TL, linear execution time; t, the average execution time per demand.
Approach • [Figure repeated] Software quality measures (SQM) → reliability prediction systems (RPS) → RPS combination (experience, learning, Dempster-Shafer, …) → H0 / Ha → BHT software reliability corroboration testing → software reliability measure.
Reliability “worthiness” of different RPS • 32 measures ranked by five experts.
Combining RPS • Weighted sums used in initial experiments. • RPS results weighted by the expert opinion index. • Removing inherent dependencies/correlations. • Dempster-Shafer (D-S) belief networks approach developed. • Network automatically built from datasets by the Induction Algorithm. • Existence of suitable NASA datasets? • Pursuing leads with several CMM level 5 companies.
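The weighted-sum combination used in the initial experiments can be sketched as follows. This is a minimal illustration, assuming each RPS yields a reliability estimate in [0, 1] and that the expert opinion index serves directly as a non-negative weight; normalizing by the sum of the weights is an assumption made here, and the sketch does nothing about the dependencies between RPSs noted above. The function name and sample values are hypothetical.

```python
from typing import Sequence


def combine_rps_weighted(predictions: Sequence[float],
                         expert_weights: Sequence[float]) -> float:
    """Weighted average of per-RPS reliability predictions.

    predictions    -- one reliability estimate per RPS
    expert_weights -- expert opinion index for each RPS (non-negative)
    """
    if not predictions or len(predictions) != len(expert_weights):
        raise ValueError("need one weight per prediction")
    if any(w < 0 for w in expert_weights):
        raise ValueError("weights must be non-negative")
    total = sum(expert_weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    # Normalizing keeps the combined value within the range of the inputs.
    return sum(p * w for p, w in zip(predictions, expert_weights)) / total


if __name__ == "__main__":
    # Hypothetical predictions from three RPSs and their expert rankings.
    print(combine_rps_weighted([0.90, 0.95, 0.85], [0.8, 0.6, 0.6]))
```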
Approach • [Figure repeated] Software quality measures (SQM) → reliability prediction systems (RPS) → RPS combination (experience, learning, Dempster-Shafer, …) → H0 / Ha → BHT software reliability corroboration testing → software reliability prediction.
Bayesian Inference • Allows for the inclusion of imprecise (subjective) probability of failure. • Subjective estimate reflects beliefs. • Hypothesis on the event occurrence probability is combined with new evidence, which may change the degree of belief.
Bayesian Hypothesis Testing (BHT) • The hypothesized reliability H0 comes as a result of RPS combination. • Based on the level of (in)experience, a degree of belief P(H0) is assigned. • Corroboration testing now looks for evidence in favor of the hypothesized reliability. • H0: q ≤ q0 (null hypothesis); Ha: q > q0 (alternative hypothesis).
The number of corroboration tests according to BHT theory
Controlled Experiments • Two independently developed versions of PACS (smart card based access control). • Controlled requirements document (NSA specs).
RPS Experimentation • RPS predictions of system failure rates: predicted failure rate 0.084; actual failure rate 0.09.
Reliability Corroboration • Accurate predictors appear adequate • Low levels of trust in the prediction accuracy. • No experience in repeatability at this point in time.
“Research Side Products” • Significant amount of time spent studying and developing Dempster-Shafer inference networks. • “No hope” of demonstrating this work within the scope of integrating RPS results. • Availability of suitable datasets. • But some datasets are available, so use them for a D-S demo! • Predicting fault-prone modules in two NASA projects (KC2, JM1). • KC2 contains over 3,000 modules, 520 of them of research interest. • 106 modules have errors, ranging from 1 to 13. • 414 modules are error free. • JM1 contains 10,883 modules. • 2,105 modules have errors, ranging from 1 to 26. • 8,778 modules are error free. • Each dataset contains 21 software metrics, mainly McCabe and Halstead.
How D-S Networks Work • Combining distinct sources of evidence by the D-S scheme. • Building D-S networks by prediction logic. • Nodes connected by implication rules. • Each implication rule assigned a specific weight. • Updating belief for the corresponding nodes. • Propagating the updated belief to the neighboring nodes, and throughout the entire network. • D-S networks can be tuned for a wide range of verification requirements.
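The first step above, combining distinct sources of evidence, is typically done with Dempster's rule of combination, sketched below. The sketch covers only that step: it does not build a network, apply weighted implication rules, or propagate belief. The frame of discernment and the two evidence sources are hypothetical examples, not values from the KC2 or JM1 experiments.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: combine two mass functions over the same frame."""
    combined: Mass = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on the empty set
    if not combined:
        raise ValueError("sources are in total conflict")
    # Renormalize so that the remaining masses sum to one.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for s, v in m.items() if s <= hypothesis)


FAULTY = frozenset({"fault-prone"})
CLEAN = frozenset({"not fault-prone"})
FRAME = FAULTY | CLEAN  # mass on FRAME expresses ignorance

if __name__ == "__main__":
    # Two hypothetical metric-based sources, each partly uncommitted.
    complexity_evidence = {FAULTY: 0.6, FRAME: 0.4}
    size_evidence = {FAULTY: 0.7, FRAME: 0.3}
    fused = combine(complexity_evidence, size_evidence)
    print(belief(fused, FAULTY))
```

Mass assigned to the whole frame lets a source stay uncommitted, which is what distinguishes this scheme from assigning a single probability to each outcome.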
D-S Networks vs. ROCKY (KC2 and JM1 datasets)
D-S Networks vs. See5 (KC2 and JM1 datasets)
D-S Networks vs. WEKA (KC2 dataset)
D-S Networks vs. WEKA (JM1 dataset)
Status and Perspectives • Software reliability corroboration allows: • Inclusion of IV&V quality measures and activities into the reliability assessment. • A significant reduction in the number of (corroboration) tests. • Software reliability of safety/mission critical systems can be assessed with a reasonable effort. • Research directions. • Further experimentation (data sets, measures, repeatability). • Defining RPS based on the “formality” of the IV&V methods.