Software Reliability Techniques Applied to Constellation

Software Reliability Techniques Applied to Constellation. Technical Briefing, NASA OSMA Software Assurance Symposium, September 9-11, 2008. Allen P. Nikora, JPL/Caltech, PI; Sergio Guarro, ASCA, Inc., Co-I.

Presentation Transcript


  1. Software Reliability Techniques Applied to Constellation. Technical Briefing, NASA OSMA Software Assurance Symposium, September 9-11, 2008. Allen P. Nikora, JPL/Caltech, PI; Sergio Guarro, ASCA, Inc., Co-I. This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The work was sponsored by the NASA Office of Safety and Mission Assurance under the Software Assurance Research Program led by the NASA Software IV&V Facility. This activity is managed locally at JPL through the Assurance and Technology Program Office. SAS08_Classify_Defects_Nikora

  2. Agenda • Problem/Approach • Relevance to NASA • Accomplishments and/or Tech Transfer Potential • Technology Readiness Level • Data Availability • Impediments to Research or Application • Next Steps

  3. Problem/Approach • Software-related failures were responsible for more than half of NASA major space mission losses or malfunctions between 1996 and 2007 • The large majority were due to system conditions that had not been anticipated or fully understood in the system/software specification and design process • As NASA space missions are increasingly controlled by software, the probability of mission failure due to software may increase if no action is taken • Minimizing loss of crew/loss of mission requires appropriate techniques to evaluate the reliability of on-board and ground-based support software during all development phases.

  4. Problem/Approach (cont’d) • Modeling a software system in its anticipated operational context is an important aspect of assuring software reliability. • This is recognized in the concept of the “operational profile” and in software reliability model assumptions • Many techniques for modeling software reliability treat software in isolation from the hardware on which it runs and which it controls. • Goals: • Demonstrate the feasibility of applying the Context-based Software Risk Modeling (CSRM) technique to CxP applications/scenarios • Focus on mission-critical applications such as GN&C, Safety and Health Monitoring, and Launch Abort • Develop guidelines for the use of context-based techniques • Infuse context-based SW reliability modeling techniques into other NASA SW development efforts

  5. Relevance to NASA • The reliability of a software component depends on its operating environment; CSRM explicitly includes context in system/software models. • Unlike traditional software reliability modeling techniques, CSRM helps guide software testing • CSRM can be used to evaluate the risk of software failure during the specification and design phases as well as during implementation and test. • Identifying risk-prone areas earlier in development → fewer defects passed through to test and operations • Earlier identification of risk-prone areas → more effective management of development resources

  6. Accomplishments and/or Tech Transfer Potential • Selected PA-1 as the initial scenario to be modeled • Acquired relevant artifacts from Windchill and JSC contacts • Analysis of PA-1 software specifications/design in progress • Development of CSRM models of PA-1 software in progress • GNC is the initial software component selected for modeling

  7. Technology Readiness Level • CSRM is TRL 9 • Actual system has been thoroughly demonstrated and tested in its operational environment. • All documentation completed. • Successful operational experience. • Sustaining engineering support in place. • The goal of this effort is to apply CSRM to CxP rather than to develop new software reliability modeling techniques

  8. Data Availability • Access to the Windchill repository and CxP artifacts • Contact points at JSC and GSFC to: • Help with navigation through the repository • Obtain needed artifacts from contractors that aren’t in the repository

  9. Impediments to Research or Application • Large volume of data – difficult to navigate through repository and identify appropriate artifacts.

  10. Next steps • Complete development of PA-1 model(s) • Analyze models; evaluate software failure risk • Review models, results • Refine models • Select further applications to model

  11. Technical Detail

  12. CSRM Key Features (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • CSRM: Context-based Software Risk Model • A practical approach and framework for assurance of mission-critical, software-intensive systems for use in NASA programs • Oriented toward system and mission scenario analysis • Integrates traditional PRA event-tree / fault-tree models with Dynamic Flowgraph Methodology (DFM) models suited to handling software-intensive and human-in-the-loop systems (“dynamic PRA” environments) • Can be applied both for preliminary assessments of yet-to-be-written software and for in-depth assessment of existing, testable software • Produces software test guidance, as well as assurance and PRA-integrated risk models and metrics • Supported by implementation toolsets • Classical PRA and DFM software

  13. CSRM Technical Highlights (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • PRA-style development of mission and risk scenario models • Uses traditional event-tree / fault-tree logic models at the top modeling level to capture the basic aspects of mission scenarios • Uses Dynamic Flowgraph Methodology (DFM) models to capture dynamic and logically complex aspects of system/software/operator interactions • DFM analytical and quantitative results are fully compatible with, and can be integrated into, PRA tool binary models and results (SAPHIRE, CAFTA) • Can incorporate risk, reliability and assurance information from other tools and sources • SW-process-quality information and non-project-specific reliability data and assessments • SW reliability information collected in other projects and deemed applicable as first estimates of risk levels in current SW modules of interest • SW defect / reliability model output (e.g., Schneidewind’s model or other) • Traditional test results • Produces software test guidance, as well as assurance and PRA-integrated risk models and metrics

  14. CSRM Analysis Overview (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Step 1: Inspect / examine conventional PRA ET/FT models and identify SW-related system functions and events • Step 2: Quantify SW functions and events via process-quality assessment methods and/or generic SW data (as needed and applicable for preliminary assessment and prioritization purposes) [Figure: event-tree branch-point to be further modeled and analyzed]

  15. CSRM Analysis Overview (cont’d) (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Step 3: Develop a DFM model of high-priority SW-related functions, expanding the ET branch-point or FT events of interest accordingly [Figure: expanded event-tree branch-point with outcome probabilities P1 and 1-P1]

  16. CSRM Analysis Overview (cont’d) (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Step 4: Use DFM multi-valued logic / dynamic analysis of the higher-level ET or FT event to identify potential SW and HW/SW failure-mode sub-scenarios (e.g., a “cut-set” consisting of < HW-failure-X AND SW-faulty-response-Y >) • Step 5: Test HW/SW in an actual or simulated integrated system set-up, either to exclude the analytically identified potential cut-sets or to establish a risk upper bound for their existence • Step 6: Insert and integrate the Step 4 and 5 results into the overall PRA ET/FT models, to obtain a full system-level perspective on mission assurance, risk analysis and quantification
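
As a rough illustration of Step 6, the sketch below treats a DFM top-event probability as the failure probability of one event-tree branch point and enumerates every success/failure path. It assumes independent branch points and a full, unpruned binary tree, which real PRA tools do not require. The branch names and all numbers are hypothetical.

```python
from itertools import product


def sequence_probabilities(initiator_prob, branches):
    """Quantify every path through a small event tree.

    initiator_prob: probability of the initiating event / mission phase.
    branches: list of (name, p_failure) pairs in event-tree order; a
              p_failure may come from a DFM top-event quantification.
    Returns {outcome: probability}, where outcome is a tuple with one
    "S" (success) or "F" (failure) entry per branch point.
    Assumes independent branch points and an unpruned tree.
    """
    results = {}
    for outcome in product("SF", repeat=len(branches)):
        p = initiator_prob
        for (_name, p_fail), o in zip(branches, outcome):
            p *= p_fail if o == "F" else 1.0 - p_fail
        results[outcome] = p
    return results


# Hypothetical values and branch names, for illustration only.
P_HOLD_FAILURE_FROM_DFM = 1.0e-4   # branch-point probability supplied by a DFM analysis
P_RECOVERY_FAILURE = 1.0e-3        # hypothetical conventional PRA estimate
tree = sequence_probabilities(
    1.0, [("AutonomousHold", P_HOLD_FAILURE_FROM_DFM), ("Recovery", P_RECOVERY_FAILURE)]
)
p_both_fail = tree[("F", "F")]
```

Replacing the hypothetical branch value with the DFM result is the only change needed when the DFM model is refined. The rest of the tree quantification is untouched.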

  17. CSRM Data Needs (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Logic model(s) development and qualitative analysis • Logic model(s) development and qualitative (i.e., logic) analysis are iterative processes. • Logic model(s) for the software and the balance-of-system will evolve with the design of the system. • The fidelity of the model(s) and of the qualitative analytical results increases with this evolution process. • Data need for SW, by design maturity (Early Design Phase → System Integration Phase): Interface documents, preliminary SW design spec., Preliminary Hazard Analyses, FMECAs, classification of SW failure data for similar designs → Detailed SW design docs., pseudo code, preliminary module testing (qualitative results, e.g., types of contexts tested, types of errors encountered) → Executable code, module & integration testing (qualitative results) • Data need for balance-of-system, by design maturity (Early Design Phase → System Integration Phase): Conceptual design docs., high-level qualitative risk assessment models such as FMEAs and master logic diagrams → Detailed design docs., preliminary qualitative risk assessment models such as event sequence diagrams, event trees, fault trees, fishbone models, etc. → System integration docs., system PRA model

  18. CSRM Data Needs (cont’d) (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Quantitative Analysis • Quantitative analysis is also an iterative process: • Preliminary qualitative and quantitative results identify SW error-forcing contexts to be tested and establish the testing criteria for meeting the reliability threshold. • More detailed qualitative and quantitative results identify areas of refinement for risk management and risk reduction. • Final qualitative and quantitative results estimate the contribution of the SW to the overall system risk. • Data need for SW, by design maturity (Early Design Phase → System Integration Phase): Generic SW failure data or reliability / risk assessments for similar designs → Preliminary module testing (qualitative / quantitative results, e.g., type and number of contexts tested, number of tests executed, type and number of errors encountered) → Executable code, module & integration testing (quantitative results) • Data need for balance-of-system, by design maturity (Early Design Phase → System Integration Phase): High-level quantitative risk assessment models such as top-level event-tree / fault-tree quantifications → Preliminary quantitative risk assessment results, such as quantitative estimates for failure modes of sub-systems interacting with the SW → Quantitative risk assessment results

  19. Dynamic Flowgraph Methodology (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • DFM is a directed-graph modeling methodology that uses multi-valued logic and a discrete-event dynamic representation of system parameter and component states • Capable of handling, within the limits of the discrete state and time representations: • Cause-effect relationships • Time-dependent relationships • Feedback and logic loops • Cognitive models of human operator actions • A DFM system model, once constructed, can be analyzed in either deductive (e.g., “fault-tree like”) or inductive (e.g., “FMEA or event-tree like”) mode • Deductive analysis produces the “prime implicants” for any “top event” that can be defined in terms of combinations of possible system parameter and/or component states (even across time boundaries) • Inductive analysis tracks the evolution of parameter, component and system states over discrete time and logic steps, starting from any user-defined combination of states that represents a possible system state

  20. DFM and PRA/PSA Tools (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • DFM is not intended to be a substitute for any existing PRA tool (although in “binary mode” it can mimic both event-tree and fault-tree models) • DFM can be most useful as a PRA/PSA modeling supplement, for those special portions of a system or mission that call for the use of non-static, non-binary models • DFM can be integrated with an existing PRA/PSA framework by inserting its results into an existing ET / FT model framework • This can be automated if the ET / FT tool offers a data interchange utility and / or an “open API”

  21. DFM Constructs and Modeling Representations (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Nodes and discretized state-vectors represent key process parameters and/or components • Mapping between the discretized state-vectors is governed by multi-valued logic rules • Transfer-boxes (decision tables) • Transition-boxes (decision tables with built-in time transitions)
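
The following minimal sketch shows the two DFM constructs, using Python dictionaries as decision tables. The node names, states and rules are hypothetical and much simpler than a real DFM model. A transfer box maps its inputs to an output within the same clock tick. A transition box produces its output one tick later. Stepping the model forward from a chosen initial state is a toy version of DFM inductive analysis.

```python
from itertools import product

# Hypothetical discretized state spaces (not the actual Mini-AERCam model).
LEAK = ("None", "Small")
SW_DRIFT = ("No", "Yes")                # is a SW algorithmic drift fault present?
THRUST = ("Nominal", "SubNominal")
ATTITUDE = ("Correct", "Inaccurate")

# Transfer box: decision table whose output holds in the same clock tick.
TRANSFER_THRUST = {"None": "Nominal", "Small": "SubNominal"}

# Transition box: decision table whose output appears one clock tick later.
# Attitude(t) is itself an input, i.e. a feedback loop: in this toy model an
# inaccurate attitude stays inaccurate.
TRANSITION_ATTITUDE = {
    (thrust, drift, att): (
        "Inaccurate"
        if att == "Inaccurate" or (thrust, drift) == ("SubNominal", "Yes")
        else "Correct"
    )
    for thrust, drift, att in product(THRUST, SW_DRIFT, ATTITUDE)
}

# A decision table must cover every combination of its input states.
assert set(TRANSFER_THRUST) == set(LEAK)
assert len(TRANSITION_ATTITUDE) == len(THRUST) * len(SW_DRIFT) * len(ATTITUDE)


def step(state):
    """Advance one clock tick; Leak and SWDrift are held constant (exogenous)."""
    thrust = TRANSFER_THRUST[state["Leak"]]  # same-tick mapping
    next_attitude = TRANSITION_ATTITUDE[(thrust, state["SWDrift"], state["Attitude"])]
    return {**state, "Attitude": next_attitude}


def simulate(initial, ticks):
    """Inductive analysis: state trajectory over `ticks` clock steps."""
    trajectory = [dict(initial)]
    for _ in range(ticks):
        trajectory.append(step(trajectory[-1]))
    return trajectory
```

In this toy model, starting from `{"Leak": "Small", "SWDrift": "Yes", "Attitude": "Correct"}` the attitude becomes Inaccurate after one tick. Either condition alone leaves it Correct.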

  22. Steps in Typical DFM Analysis (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Step 1: Model Construction • Construct DFM model of system of interest • Representing the system behavior and flow of causality • (Model is a network of nodes, transfer-boxes, transition-boxes, and associated arc connections) • Step 2: System Analysis • Use DFM inductive and deductive engines to: • Verify specified behavior (can be done on system “design model”) • Identify system failure modes in terms of basic component failure modes (“Automated FMEA”) • Develop “Dynamic Scenario Trees” (similar to dynamic event trees) • Identify prime implicants for system failure (“Top-Events” of interest) • Define test sequences specifically suited to identify and isolate various classes of possible faults. (This feature is useful for generating input vectors for testing software-based systems)
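
Deductive analysis can be illustrated on a tiny model by brute force. Enumerate every partial assignment of node states, and keep those for which every completion makes the top event true (the implicants). Then keep only the minimal ones (the prime implicants). This sketch rests on simplifying assumptions: time indices are dropped, each literal fixes a single state, and the top-event function is a hypothetical stand-in loosely named after the Mini-AERCam nodes. DFM tools use far more efficient algorithms.

```python
from itertools import product


def prime_implicants(domains, top_event):
    """Minimal partial state assignments that force `top_event` to be true.

    domains: {node: tuple_of_states}; top_event: callable(dict) -> bool.
    Brute force, exponential in the number of nodes: tiny models only.
    """
    names = list(domains)

    def forces_top_event(term):
        free = [n for n in names if n not in term]
        for values in product(*(domains[n] for n in free)):
            if not top_event({**term, **dict(zip(free, values))}):
                return False
        return True

    implicants = []
    for choice in product(*((None,) + tuple(domains[n]) for n in names)):
        term = {n: s for n, s in zip(names, choice) if s is not None}  # None = don't care
        if forces_top_event(term):
            implicants.append(term)

    # Prime: no strictly smaller implicant is contained in it.
    return [t for t in implicants if not any(o.items() < t.items() for o in implicants)]


# Hypothetical toy model; the logic is an assumption, not the real DFM model.
DOMAINS = {
    "IsoValveCond": ("OK", "StuckClosed"),
    "TargetAtt": ("Correct", "Inaccurate"),
    "PropLineLeak": ("None", "Small"),
    "RotThrusterComm": ("Correct", "SlightlyInaccurate"),
}


def autonomous_hold_fails(s):
    return (s["IsoValveCond"] == "StuckClosed"
            or s["TargetAtt"] == "Inaccurate"
            or (s["PropLineLeak"] == "Small" and s["RotThrusterComm"] == "SlightlyInaccurate"))


pis = prime_implicants(DOMAINS, autonomous_hold_fails)
```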

  23. Steps in Typical DFM Analysis (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Step 3: Quantification of System Analysis • DFM model results usually identify subevents that contribute probability to the branch-points of a system / mission event tree • DFM analysis is equivalent in concept and results to the fault-tree analyses carried out in traditional PRA to provide further definition and quantification to system sequences initially defined via event-tree models • DFM “top events” are quantified in a fashion similar to fault-tree “top events” • To quantify a DFM Top Event, the set of associated n prime implicants (PIs) is first converted into a set of m mutually exclusive implicants (MEIs) • $\text{Top Event} = \mathrm{MEI}_1 \lor \mathrm{MEI}_2 \lor \cdots \lor \mathrm{MEI}_m$ • The sum of probabilities for the MEIs yields the probability of the Top Event • $P(\text{Top Event}) = P(\mathrm{MEI}_1) + P(\mathrm{MEI}_2) + \cdots + P(\mathrm{MEI}_m)$ • The above is in essence the multi-valued logic equivalent of the BDD (Binary Decision Diagram) quantification process for fault-trees
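
The PI-to-MEI conversion and the probability sum can be sketched with a simple cube-subtraction scheme. Each PI is made disjoint from all earlier PIs, and the probabilities of the resulting disjoint pieces are summed. This sketch assumes statistically independent nodes and uses hypothetical per-state probabilities. It is not necessarily the algorithm inside DFM tools, which the slide compares to BDD quantification.

```python
from math import prod


def _allowed(cube, var, domains):
    """States a cube allows for `var` (all states if it does not constrain it)."""
    return frozenset(cube.get(var, domains[var]))


def subtract(a, b, domains):
    """Disjoint cubes whose union covers the states in cube a but not in cube b."""
    if any(not (_allowed(a, v, domains) & _allowed(b, v, domains)) for v in b):
        return [a]  # a and b do not overlap
    pieces, rest = [], dict(a)
    for v in b:
        inside = _allowed(rest, v, domains) & _allowed(b, v, domains)
        outside = _allowed(rest, v, domains) - inside
        if outside:
            pieces.append({**rest, v: outside})
        rest = {**rest, v: inside}
    return pieces  # whatever is left in `rest` lies inside b and is dropped


def to_meis(prime_implicants, domains):
    """MEI pieces: PI_k AND NOT(PI_1 OR ... OR PI_{k-1}), split into disjoint cubes."""
    cubes = [{v: frozenset([s]) for v, s in pi.items()} for pi in prime_implicants]
    meis = []
    for k, cube in enumerate(cubes):
        pieces = [cube]
        for earlier in cubes[:k]:
            pieces = [p for piece in pieces for p in subtract(piece, earlier, domains)]
        meis.extend(pieces)
    return meis


def top_event_probability(prime_implicants, domains, probs):
    """P(Top Event) = sum of P(MEI), assuming independent nodes."""
    return sum(
        prod(sum(probs[v][s] for s in states) for v, states in mei.items())
        for mei in to_meis(prime_implicants, domains)
    )


DOMAINS = {
    "IsoValveCond": ("OK", "StuckClosed"),
    "TargetAtt": ("Correct", "Inaccurate"),
    "PropLineLeak": ("None", "Small"),
    "RotThrusterComm": ("Correct", "SlightlyInaccurate"),
}
PROBS = {  # hypothetical per-state probabilities, for illustration only
    "IsoValveCond": {"OK": 0.999, "StuckClosed": 0.001},
    "TargetAtt": {"Correct": 0.998, "Inaccurate": 0.002},
    "PropLineLeak": {"None": 1 - 3.0e-5, "Small": 3.0e-5},
    "RotThrusterComm": {"Correct": 0.99, "SlightlyInaccurate": 0.01},
}
PIS = [
    {"IsoValveCond": "StuckClosed"},
    {"TargetAtt": "Inaccurate"},
    {"PropLineLeak": "Small", "RotThrusterComm": "SlightlyInaccurate"},
]
p_top = top_event_probability(PIS, DOMAINS, PROBS)
```

Because the MEIs are disjoint, their probabilities add without any inclusion-exclusion correction. The result equals 1 − 0.999 × 0.998 × (1 − 3.0E-05 × 0.01) for these illustrative values.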

  24. Use of DFM in CSRM Framework (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • CSRM (Context-based Software Risk Model) is a framework to address and guide the integration of functional models of software-related risk into “classical” PRA / PSA frameworks • CSRM is the modeling approach for software-intensive space systems recommended and illustrated in the NASA PRA Procedures Guide • CSRM can be implemented for simpler systems using only standard ET / FT PRA models • For more complex systems, use of methods with more advanced and dynamic features (such as DFM or “colored Markov”) is recommended, at least for part of the modeling and analytical effort

  25. Example: Top Level DFM Model of Mini-AERCam System (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • This node represents the actual attitude of the Mini-AERCam. It is discretized into 3 states: 1. Correct (Error < 3˚) 2. Slightly Inaccurate (Error of 3˚ to 10˚) 3. Inaccurate (Error > 10˚) • This is the sub-model for the GN&C Software. It is expanded in the next slide. • 1 clk = 1 sec.
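
The 3-state discretization of the attitude node, together with the 1 clk = 1 sec time step, can be expressed as a small mapping function. Two choices here are assumptions not stated on the slide. An error of exactly 3˚ or 10˚ is counted as Slightly Inaccurate, and each clock tick takes the worst (largest-error) sample observed within it.

```python
def attitude_state(error_deg: float) -> str:
    """Map an attitude error magnitude (degrees) onto the 3-state attitude node.

    Assumption: the 3 deg and 10 deg boundaries both fall in "Slightly Inaccurate".
    """
    error = abs(error_deg)
    if error < 3.0:
        return "Correct"
    if error <= 10.0:
        return "Slightly Inaccurate"
    return "Inaccurate"


def discretize_trace(samples, clk_seconds=1.0):
    """Convert (time_s, error_deg) samples into {clock_tick: state}.

    Assumption: each tick is represented by its worst (largest-error) sample.
    """
    worst = {}
    for t, err in samples:
        tick = int(t // clk_seconds)
        worst[tick] = max(worst.get(tick, 0.0), abs(err))
    return {tick: attitude_state(err) for tick, err in sorted(worst.items())}
```

Taking the worst sample per tick is a conservative choice. Using the last or mean sample instead would change which ticks are flagged.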

  26. Example: DFM Model of Mini-AERCam GN&C Sub-Model (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • This sub-model includes the GPS hardware and the translational navigation software. • This sub-model includes the angular rate gyro hardware and the rotational navigation software. • 1 clk = 1 sec.

  27. Example: DFM Model of Mini-AERCam Propulsion Subsystem (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • This node represents a leak in the propulsion system fuel lines after the isovalve but before the thruster solenoids. • It is discretized into 4 states: • 1. None • 2. Small (1 – 40%) • A small leak produces thrust and torque of less than 40% of the total thrust and torque the Mini-AERCam can produce to counteract it. A leak of this magnitude should not significantly affect the performance of the Mini-AERCam. • 3. Large (41-80%) • A large leak produces thrust or torque of up to 80% of what the Mini-AERCam can produce to counteract it. The Mini-AERCam can compensate and should be recoverable, but its performance is inadequate for performing its mission safely. • 4. Critical (> 81%) • The Mini-AERCam is expected to be uncontrollable.

  28. Analysis of Mini-AERCam DFM Model (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Analysis of the Autonomous Hold Failure Top Event yields n prime implicants (PIs) • $\text{Top Event} = \mathrm{PI}_1 \lor \mathrm{PI}_2 \lor \cdots \lor \mathrm{PI}_n$ • DFM prime implicants identify: • HW-only fault conditions • SW-only fault conditions • Combinations of HW & SW fault conditions • For example: • Prime Implicant 1 is IsoValveCond = Stuck Closed at time-1 (HW-only fault) • Prime Implicant 2 is TargetAtt = Inaccurate at time-1 (SW-only error) • (The TargetAtt node in the GN&C sub-model represents the accuracy of the target attitude determined by the rotational guidance software function. The PI identifies the possibility that a programmer introduced an error when coding the module, resulting in severely inaccurate output when the module is used.)

  29. Mini-AERCam Model Analysis (cont’d) (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Prime Implicant 3 is PropLineLeak = Small Leak at time-2 AND RotThrusterComm = Slightly Inaccurate at time-1 • This Prime Implicant corresponds to a combination of hardware and software conditions. (The hardware condition is a small leak in one of the propellant lines. The software condition is an algorithmic fault that causes drifting of the attitude control given a sub-nominal thrust caused by a line leak.) • If only one of the two conditions exists, the Mini-AERCam does not fail: • The GN&C software works properly when no leak exists. • If a small leak occurs but there is no drift error in the attitude control, the GN&C is able to compensate for the leak by using the thrusters. • This PI example shows how DFM analysis can identify an off-nominal entry condition for which the SW may have to be tested, a condition that: • does not correspond to a normal state of the system; • would not usually be identified and tested for in a standard SW V&V process addressing the SW operational profile.
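
The HW-only / SW-only / combined classification of prime implicants reduces to tagging each model node. In this sketch, the HW/SW tags and the encoding of time-1 and time-2 as -1 and -2 are assumptions made for illustration. The PI literals themselves are the ones listed on slides 28 and 29.

```python
# Hypothetical tagging of model nodes as hardware (HW) or software (SW) conditions.
NODE_KIND = {
    "IsoValveCond": "HW",
    "PropLineLeak": "HW",
    "TargetAtt": "SW",
    "RotThrusterComm": "SW",
}


def classify(prime_implicant):
    """prime_implicant: list of (node, state, time_step) literals."""
    kinds = {NODE_KIND[node] for node, _state, _time in prime_implicant}
    if not kinds:
        raise ValueError("empty prime implicant")
    if kinds == {"HW"}:
        return "HW only"
    if kinds == {"SW"}:
        return "SW only"
    return "HW & SW combination"


# Literals from slides 28-29; time-1 / time-2 encoded as -1 / -2 (assumption).
PI_1 = [("IsoValveCond", "Stuck Closed", -1)]
PI_2 = [("TargetAtt", "Inaccurate", -1)]
PI_3 = [("PropLineLeak", "Small Leak", -2), ("RotThrusterComm", "Slightly Inaccurate", -1)]
```

PIs classified as combinations are the ones pointing to off-nominal entry conditions, like PI 3, that an operational-profile-based test campaign might miss.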

  30. Risk-Informed Testing of Potential SW Risk Scenario and Quantification of DFM Prime Implicant (from “Risk-Informed Software Assurance for NASA Space Missions”, Sergio Guarro, ASCA Inc., November 2007) • Prime Implicant 3 is one of the mutually exclusive implicants. It can be quantified by considering: • The “entry condition” (i.e., small propellant line leak) • The conditional probability that the software causes an attitude shift under this triggering condition • From a HW failure rate database (e.g., NPRD), the entry condition can be determined to occur with a failure rate of 6.00E-06/hr. For a 5-hour mission duration, the associated probability is P(C3) = 3.00E-05. • The SW attitude control function can then be tested in the (real or simulated) presence of the system (HW fault) entry condition to determine whether or not it performs correctly • Without the specific identification of the HW fault condition, random sampling of the SW normal operational input space may never cover the actual system condition! • In the case discussed, the risk quantification process was completed via a simulated “hardware in the loop” test process • Sampling was conducted across the possible range of initial states (i.e., Mini-AERCam spatial and rotational positions, compatible thruster command settings, etc.) in which the system could be at the onset of the leak condition. • With the aid of the CSRM/DFM analysis, a normalized sampling set of 450 tests was sufficient to “demonstrate” a risk contribution on the order of 1E-6 from this scenario, if no erroneous GN&C SW response was observed in the tests • This was obtained via a direct Bayesian estimation, starting from a uniform, non-informative prior
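
Part of the arithmetic on this slide can be reproduced. P(C3) follows from a constant failure rate over the mission. A uniform Beta(1, 1) prior updated with zero failures in N tests gives a Beta(1, N + 1) posterior for the conditional SW failure probability. The slide does not describe how the 450-test set was normalized or which posterior statistic was reported, so this sketch is not claimed to reproduce the 1E-6 figure. The 95% bound is an illustrative choice.

```python
import math


def entry_condition_probability(rate_per_hr: float, hours: float) -> float:
    """P(entry condition occurs at least once) under a constant-rate (exponential) model."""
    return 1.0 - math.exp(-rate_per_hr * hours)


def posterior_beta(n_tests: int, failures: int = 0):
    """Beta(alpha, beta) posterior for P(SW fails | entry condition), uniform Beta(1, 1) prior."""
    return 1 + failures, 1 + n_tests - failures


def upper_bound_zero_failures(n_tests: int, confidence: float) -> float:
    """Upper credible bound when no failures were observed.

    The posterior is Beta(1, n + 1), whose CDF is 1 - (1 - p)**(n + 1),
    so the bound can be inverted in closed form.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / (n_tests + 1))


p_c3 = entry_condition_probability(6.00e-6, 5.0)   # about 3.0E-05, as on the slide
alpha, beta = posterior_beta(450)                   # no erroneous responses in 450 tests
p_sw_mean = alpha / (alpha + beta)                  # posterior mean
p_sw_95 = upper_bound_zero_failures(450, 0.95)      # illustrative 95% upper bound
scenario_mean = p_c3 * p_sw_mean                    # scenario contribution, mean
scenario_95 = p_c3 * p_sw_95                        # scenario contribution, 95% bound
```

For such a small λt, 1 − exp(−λt) and λt agree closely, which is why the slide can quote P(C3) = 3.00E-05 directly.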
