NASA OSMA SAS '04
System and Software Reliability Technical Presentation
Naval Surface Warfare Center Dahlgren Division, Software Assurance Technology Center & University of Connecticut
Drs. William H. Farr & John R. Crigler (NSWCDD), Dolores Wallace (SATC), & Dr. Swapna Gokhale (UC)
Outline of the Presentation
• FY03 & FY04 Research Initiatives
• FY03 Research
  • Description of SMERFS^3
  • Description of the Models Implemented
  • Application of the Models to GSFC Data
  • Lessons Learned
• FY04 Research
  • Literature Search
  • System Model Taxonomy
  • Description of GSFC System Data
  • Plans for SMERFS^3
• Technology Readiness
• Barriers to Research or Application
SAS 04/ GSFC/SATC-NSWCDD
2003 & 2004 Research
• 2003 (Software Based)
  • Literature search
  • Selection of new models
  • Build new software models into SMERFS^3
  • Test new models with Goddard project data
  • Make the latest version of SMERFS^3 available
• 2004 (System Based)
  • Conduct a similar research effort for system reliability
  • Enhance and validate system models
SMERFS^3
• Current version features:
  • 6 software reliability models
  • 2D and 3D plots of the input data and of each model's fit to it
  • Various reliability estimates
  • User queries for predictions
• Update constraints:
  • Employ data from the integration, system test, or operational phase
  • Use the existing graphics of SMERFS^3
  • Integrate with the existing user interfaces, goodness-of-fit tests, and prediction capabilities
Hypergeometric Model Assumptions
• Test instance, t(i): a collection of input test data.
• N: total number of initial faults in the software.
• Faults detected by a test instance are removed before the next test instance is exercised.
• No new fault is inserted into the software when a detected fault is removed.
• A test instance t(i) senses w(i) initial faults. w(i) may vary with the conditions of the test instances over i; the model's authors sometimes refer to it as a "sensitivity" factor. w(i) can take many functional forms.
• The initial faults actually sensed by t(i) depend on t(i) itself; the w(i) sensed faults are drawn at random from the N initial faults.
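Under these assumptions, the number of new faults found by t(i) is hypergeometric. The Python sketch below assumes the usual formulation: if c_prev faults were already detected and removed before t(i), the probability that exactly n of the w(i) sensed faults are new is C(c_prev, w-n)·C(N-c_prev, n)/C(N, w). With a constant sensitivity w(i) = c, each instance finds on average a fraction c/N of the still-undetected faults, so the expected cumulative count after i instances is N[1 - (1 - c/N)^i]. The function names are hypothetical; this is an illustration, not SMERFS^3 code.

```python
from math import comb


def new_fault_pmf(n: int, N: int, c_prev: int, w: int) -> float:
    """P(test instance t(i) detects exactly n new faults).

    N      -- total number of initial faults
    c_prev -- faults already detected (and removed) before t(i)
    w      -- w(i), the number of initial faults sensed by t(i)
    The w sensed faults are drawn at random from all N initial faults:
    n of them are new, w - n were already detected.
    """
    if n < 0 or n > w or n > N - c_prev or w - n > c_prev:
        return 0.0
    return comb(c_prev, w - n) * comb(N - c_prev, n) / comb(N, w)


def expected_cumulative_faults(N: int, c: float, instances: int) -> list[float]:
    """Expected cumulative detected faults for constant sensitivity w(i) = c.

    E[new at i] = c * (N - E[C_{i-1}]) / N, which unrolls to
    E[C_i] = N * (1 - (1 - c/N) ** i).
    """
    return [N * (1 - (1 - c / N) ** i) for i in range(1, instances + 1)]


if __name__ == "__main__":
    # Hypothetical values: 100 initial faults, each test instance senses 10.
    print([round(x, 2) for x in expected_cumulative_faults(100, 10, 4)])
    # [10.0, 19.0, 27.1, 34.39]
```

The constant form shown here corresponds to the only variant implemented in SMERFS^3 in 2003; other w(i) forms would replace the closed-form product with a running product over the w(i) values.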
Schneidewind Model
• There are three versions:
  • Model 1: the fault counts from all testing periods are treated equally.
  • Model 2: ignore the first s-1 testing periods and their fault counts; use only the data from periods s to n.
  • Model 3: combine the fault counts of intervals 1 to s-1 into the first data point and use the individual counts for intervals s to n. Thus there are n-s+2 data points.
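The three treatments differ only in which data points are passed to the model. A minimal Python sketch of that selection step, assuming the counts for periods 1..n are held in a list (the function name is hypothetical):

```python
def schneidewind_treatment(counts: list[int], s: int, treatment: int) -> list[int]:
    """Return the data points used by Schneidewind Model 1, 2, or 3.

    counts -- fault counts for testing periods 1..n (counts[0] is period 1)
    s      -- first period considered representative, 1 <= s <= n
    """
    n = len(counts)
    if not 1 <= s <= n:
        raise ValueError("s must satisfy 1 <= s <= n")
    if treatment == 1:
        # Model 1: every period's count is used as-is.
        return list(counts)
    if treatment == 2:
        # Model 2: discard periods 1..s-1, keep periods s..n.
        return list(counts[s - 1:])
    if treatment == 3:
        # Model 3: lump periods 1..s-1 into one leading data point,
        # then keep periods s..n individually (n - s + 2 points in total).
        return [sum(counts[: s - 1])] + list(counts[s - 1:])
    raise ValueError("treatment must be 1, 2, or 3")


if __name__ == "__main__":
    monthly = [5, 3, 4, 2, 1]  # hypothetical fault counts
    print(schneidewind_treatment(monthly, 3, 2))  # [4, 2, 1]
    print(schneidewind_treatment(monthly, 3, 3))  # [8, 4, 2, 1]
```

Choosing s is a separate search (for example, trying each s and keeping the best fit); the sketch only shows how a given s reshapes the data.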
Schneidewind Assumptions
• The numbers of faults detected in the respective intervals are independent.
• The fault correction rate is proportional to the number of faults to be corrected.
• The intervals over which the software is tested all have the same length.
• The cumulative number of faults by time t, M(t), follows a Poisson process with mean value function μ(t). The mean value function is such that the expected number of fault occurrences in any time period is proportional to the expected number of undetected faults at that time.
• The failure intensity function λ(t) is an exponentially decreasing function of time; that is, λ(t) = α exp(-βt) for some α, β > 0.
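These assumptions give μ(t) = (α/β)(1 - exp(-βt)), and with unit-length intervals the expected count in interval k is (α/β)(exp(-β(k-1)) - exp(-βk)). Maximizing the Poisson likelihood for counts x_1..x_n (total F_n) leads to the estimating equation 1/(e^β - 1) - n/(e^(nβ) - 1) = Σ(k-1)x_k / F_n, with α = βF_n / (1 - e^(-nβ)). The Python sketch below solves this by bisection for Model 1 data. It is derived from the stated assumptions as an illustration, not taken from SMERFS^3, and it assumes the counts trend downward enough for a positive β to exist.

```python
import math


def _inv_expm1(x: float) -> float:
    """1 / (e^x - 1) for x > 0, treating e^x as infinite where it would overflow."""
    return 0.0 if x > 700.0 else 1.0 / math.expm1(x)


def fit_schneidewind(counts: list[float]) -> tuple[float, float]:
    """Maximum-likelihood (alpha, beta) from unit-interval fault counts x_1..x_n."""
    n = len(counts)
    total = sum(counts)
    if n < 2 or total == 0:
        raise ValueError("need at least two intervals and at least one fault")
    # Right-hand side: sum over k of (k - 1) * x_k, divided by F_n.
    target = sum(k * x for k, x in enumerate(counts)) / total

    def g(beta: float) -> float:
        # Mean of a truncated geometric distribution on 0..n-1; decreases
        # from (n - 1)/2 as beta -> 0+ toward 0 as beta grows.
        return _inv_expm1(beta) - n * _inv_expm1(n * beta)

    lo, hi = 1e-9, 50.0
    if not g(hi) < target < g(lo):
        raise ValueError("no positive beta: fault counts are not decreasing")
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) > target:
            lo = mid  # model still too "flat": need a larger beta
        else:
            hi = mid
    beta = (lo + hi) / 2
    alpha = beta * total / -math.expm1(-n * beta)  # -expm1(-x) = 1 - e^-x
    return alpha, beta


def mean_value(t: float, alpha: float, beta: float) -> float:
    """mu(t) = (alpha/beta)(1 - exp(-beta t)): expected cumulative faults by t."""
    return alpha / beta * -math.expm1(-beta * t)


if __name__ == "__main__":
    counts = [12, 10, 9, 7, 6, 5, 4, 4, 3, 2]  # hypothetical monthly fault counts
    alpha, beta = fit_schneidewind(counts)
    print(f"alpha={alpha:.3f} beta={beta:.3f} total faults={alpha / beta:.1f}")
```

For Model 2 or Model 3, the same routine would be applied to the transformed data, with the time origin shifted accordingly.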
Models Incorporated in 2003
• The Hypergeometric model and enhancements to the Schneidewind model were incorporated into SMERFS^3.
• Both models require error-count failure data.
• For the Hypergeometric model, only the constant form (w(i) = c, with c >= 0) was implemented.
• Only error-count data was captured in the GSFC project database. The available data included fault occurrence date, life-cycle phase, and severity level for three separate builds. No data was available on testing-intensity measures (number of tests, number of testing personnel, testing hours expended, etc.).
• For Schneidewind Model Type II, the globally optimal "s" was obtained, and the risk criterion measures were implemented for all interval-data models.
• The risk criterion measures address an important question: when can the software be released so as to minimize the number of remaining faults and maximize the chance that no fault manifests itself over a specified mission-critical time? Risk measures implemented during the last half of 2003 included:
  • Operational quality at time t
  • Risk criterion metric for the remaining faults at time t
  • Risk criterion metric for the time to next failure at time t
2003 Software Available Data
• Large GSFC project, but confidentiality required
• Several subsystems
• Data in flat files; much effort went into building a spreadsheet/database
• Operational failures only
• Removed specific faults and sorted others
• Three builds were used (called A, B, and C), consisting of faults aggregated by month for the activity phases Integration Testing, Operability Testing, System Testing, and Operations, and for severity levels 1, 2, and 3. This gave resulting data sets of size 201 for Build A, 249 for Build B, and 187 for Build C.
• Bottom line: organizing the data required substantial effort; this would be minimized if a project person prepared the data.
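Once the records are clean, turning them into the monthly interval counts that the interval-data models consume is mechanical. A minimal Python sketch, assuming a hypothetical record layout of (build, phase, severity, occurrence date); months with no faults are filled with zero so that every month in the span appears as a data point.

```python
from collections import Counter
from datetime import date

# Hypothetical record layout: (build, phase, severity, date the fault occurred).
FaultRecord = tuple[str, str, int, date]


def monthly_fault_counts(records: list[FaultRecord], build: str, phase: str,
                         severity: int) -> list[int]:
    """Faults per calendar month for one build/phase/severity, zero-filling gaps."""
    months = Counter((d.year, d.month) for b, p, s, d in records
                     if (b, p, s) == (build, phase, severity))
    if not months:
        return []
    (y, m), last = min(months), max(months)
    counts = []
    while (y, m) <= last:
        counts.append(months[(y, m)])  # Counter returns 0 for missing months
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return counts


if __name__ == "__main__":
    records = [  # hypothetical records
        ("A", "System Testing", 2, date(2001, 1, 15)),
        ("A", "System Testing", 2, date(2001, 1, 30)),
        ("A", "System Testing", 2, date(2001, 3, 2)),
        ("A", "Operations", 1, date(2001, 2, 9)),
    ]
    print(monthly_fault_counts(records, "A", "System Testing", 2))  # [2, 0, 1]
```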
GSFC Build A, B & C Faults per Month Data
[Figure: Build A, Build B, and Build C model results, Schneidewind Type II and Yamada model fits]
Build A Risk Assessment Criteria
• Desired goal: remaining faults = 5; mission duration = 2
• Achieving the desired remaining faults requires 27 months of testing; achieving the desired mission duration requires 48 months of testing.
[Figure: remaining faults and mission duration versus testing time, marking where each goal is achieved]
Lessons Learned
• There is a need to capture additional information on the faults:
  • Description of the particular activity that found the faults
  • Duration of the activity
  • Number of individuals involved, etc.
• Schneidewind's Treatment Type 2 and the Yamada S-shaped models consistently did the best job of fitting the data.
  • Early fault data did not reflect the current failure rate.
  • Both models tend to factor out the early behavior.
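One way to see why the Yamada S-shaped model discounts early data: its usual (delayed S-shaped) mean value function m(t) = a[1 - (1 + bt)e^(-bt)] has intensity λ(t) = ab²t e^(-bt), which starts at zero, peaks at t = 1/b, and only then decays. A Python sketch, with hypothetical parameters a (expected total faults) and b:

```python
import math


def yamada_mean(t: float, a: float, b: float) -> float:
    """Delayed S-shaped mean value function: a * (1 - (1 + b t) * e^(-b t))."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def yamada_intensity(t: float, a: float, b: float) -> float:
    """Derivative of yamada_mean: a * b^2 * t * e^(-b t); zero at t = 0, peak at t = 1/b."""
    return a * b * b * t * math.exp(-b * t)


if __name__ == "__main__":
    a, b = 100.0, 0.25  # hypothetical: 100 total faults, intensity peaks at month 4
    for month in range(0, 13, 2):
        print(month, round(yamada_mean(month, a, b), 1), round(yamada_intensity(month, a, b), 2))
```

Because the fitted intensity is small at the start, low early fault counts carry little weight, which is consistent with the observation that early GSFC data did not reflect the later failure rate.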
Literature Search
Limited to papers published from 1990 onward; few papers addressed total system reliability before then. The initial search revealed that the focus should be on system availability rather than reliability. Reviewed 72 journal and conference papers.
Availability = the proportion of a specified period of time during which the system is operating satisfactorily
Availability = uptime/total time = MTBF/(MTBF + MTTR)
Availability is the fundamental quantity of interest for repairable systems and, for measuring the effectiveness of maintained systems, a more appropriate measure than reliability. The literature also contains many special availability measures proposed for specific application systems.
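The two forms of the availability formula agree when MTBF and MTTR are the sample means of observed up and down periods (MTBF is used here, as on the slide, for the mean uptime between failures). A small Python sketch with hypothetical function names:

```python
def availability_from_means(mtbf: float, mttr: float) -> float:
    """Steady-state availability A = MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


def availability_from_intervals(uptimes: list[float], downtimes: list[float]) -> float:
    """A = uptime / total time over an observation period made of up and down intervals."""
    up, down = sum(uptimes), sum(downtimes)
    return up / (up + down)


if __name__ == "__main__":
    print(availability_from_means(990, 10))  # 0.99
    # Hypothetical log: three up periods (hours) and three repairs.
    print(availability_from_intervals([100, 200, 300], [2, 3, 1]))
```

With equal numbers of up and down periods, dividing both sums by that number recovers MTBF/(MTBF + MTTR), so the second function is the measurement-based estimate of the first.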
System Model Taxonomy
• System model criteria for incorporation into SMERFS^3:
  • Must use failure data (i.e., time between failures) from system testing or operation. Both hardware and software failures are needed.
  • Dates of failures and their closures must be included; the date on which each fault was corrected should be provided.
  • A candidate model must integrate well with the existing graphics and interface capabilities of SMERFS^3.
• Two types of availability and reliability modeling approaches:
  • Model-based analysis
  • Measurement-based analysis
Model-based analysis
• A model relates the failure and repair events of the components to the failure and repair events of the system, based on the system's structure.
• Model types:
  • Combinatorial (fault tree, reliability block diagram)
  • State space (Markov chain)
  • Hierarchical
• Advantages:
  • Can be performed early, before the system is available
  • Facilitates "what-if"/predictive and sensitivity analysis
• Disadvantages:
  • Complex models with many parameters
  • Availability of data to estimate the parameters is an issue
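As a small illustration of the combinatorial and state-space model types, assuming independent components with constant failure rate λ and repair rate μ: a two-state (up/down) Markov chain gives steady-state availability μ/(λ + μ), and a reliability block diagram combines block availabilities in series (product) or in parallel (one minus the product of unavailabilities). The rates and system layout below are hypothetical.

```python
from math import prod


def markov_two_state_availability(failure_rate: float, repair_rate: float) -> float:
    """Steady state of an up/down Markov chain: P(up) = mu / (lambda + mu)."""
    return repair_rate / (failure_rate + repair_rate)


def series(*avail: float) -> float:
    """Reliability block diagram, series: every block must be up (independent blocks)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Reliability block diagram, parallel (redundant): at least one block must be up."""
    return 1.0 - prod(1.0 - a for a in avail)


if __name__ == "__main__":
    # Hypothetical system: two redundant processors (lambda = 0.001/h, mu = 0.1/h)
    # in series with a bus whose availability is 0.9999.
    cpu = markov_two_state_availability(0.001, 0.1)
    print(series(parallel(cpu, cpu), 0.9999))
```

A hierarchical model would nest these pieces, for example by solving a Markov chain per subsystem and combining the results in a block diagram.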
Measurement-based analysis
• Collect data during system operation
• Obtain reliability and availability estimates directly from the data
• Advantages:
  • Provides true estimates of reliability and availability
  • Verifies assumptions underlying model-based analysis
  • Reveals model structure and helps build new models
  • Estimates the values of model parameters
• Disadvantages:
  • Requires an operational system, or at least a prototype
  • No consensus or uniformity in the type of data required or in how it is collected
  • Expensive to perform predictive and sensitivity analysis
Description of GSFC System Data
• Requirements for availability measurement:
  • Time of each failure
  • Time the system was restored to service after each failure
• GSFC system data:
  • Several spacecraft, with severity level specified
  • Acceptance testing through operation
  • Hardware (predominant) and software
  • After separating the data by spacecraft, severity, and activity, each data set has only a few data points
  • Date of each failure
  • Date the correction was officially accepted
  • Exact downtime not available
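Because only dates are recorded, downtime can at best be approximated. One possible workaround, sketched in Python under a loudly stated assumption: treat the date the correction was accepted as the restoration date. If the system was actually restored sooner, this overstates downtime, so the result is a pessimistic (lower-bound) availability. Overlapping outages are merged so they are not double-counted; the function name and example dates are hypothetical.

```python
from datetime import date


def availability_bound(events: list[tuple[date, date]], start: date, end: date) -> float:
    """Availability over [start, end) from (failure_date, correction_accepted_date) pairs.

    Hypothetical assumption: the acceptance date stands in for the restore time,
    so the result is a lower bound if service was actually restored earlier.
    """
    total = (end - start).days
    down = 0
    cur_start = cur_end = None  # current merged outage
    for fail, fixed in sorted(events):
        fail, fixed = max(fail, start), min(fixed, end)  # clip to the window
        if fixed <= fail:
            continue
        if cur_end is None or fail > cur_end:
            if cur_end is not None:
                down += (cur_end - cur_start).days
            cur_start, cur_end = fail, fixed
        else:
            cur_end = max(cur_end, fixed)  # overlapping outage: extend it
    if cur_end is not None:
        down += (cur_end - cur_start).days
    return (total - down) / total


if __name__ == "__main__":
    outages = [  # hypothetical (failure date, correction accepted date) pairs
        (date(2003, 1, 5), date(2003, 1, 8)),
        (date(2003, 1, 7), date(2003, 1, 10)),
        (date(2003, 1, 20), date(2003, 1, 21)),
    ]
    print(availability_bound(outages, date(2003, 1, 1), date(2003, 1, 31)))
```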
Technology Readiness
• The prototype SMERFS^3 tool, incorporating the software models and their updates, the existing hardware models, and the basic system modeling capability, will be completed by the end of this year.
• Additional needs for general availability of the tool and ease of use by the general practitioner include:
  • A User's Manual
  • A training package and supporting documentation
  • A way of making program managers, developers, etc. aware of the technology, the supporting tool, and the technology's strengths and weaknesses
  • A distribution medium, if the tool is desired
  • Demonstrated return on investment from using the technology
Plans for SMERFS^3 in 2004
• Identify 1-2 candidate models (August)
• Formulate models for coding (August - September)
• Code models in SMERFS^3 (October - November)
• Apply models to GSFC data (November - December)
• Write up final report & distribute SMERFS^3 (December)
Barriers to Research or Application
• Data availability. When data was available, we encountered:
  • Confidentiality concerns
  • The right kinds of data were not being collected (for example, measures of testing intensity)
  • Lack of consistency among and within data sets (definitions and quality were particularly troublesome)
• Complexity of the models. The more complex a model, the more parameters are needed to define it. This requires advanced computational algorithms and larger data sets, with more information to be recorded.
• Validation of these models on large systems. There are so many factors to consider (size, type, environment, etc.) that real validation is a serious concern.
• Management support for the use of this technology. To gain it, we must demonstrate real return on investment.