Software Anomaly Trends in JPL Missions
Assurance Technology Program Office
Presented by Allen P. Nikora and Nelson W. Green
Jet Propulsion Laboratory/California Institute of Technology

Agenda
• Overview
• Software Failure Intensities
• Software Failures vs. All Failures
• Software Failures by Criticality
• Discussion and Future Work
• Backup Material

Overview
• Presentation based on work performed for the Ultra-Reliability (UR) program
  • UR objective: Achieve NASA-wide reliability one order of magnitude better than today
• Definitions
  • Ultra-reliability
    • Given a specific time frame – reliability one order of magnitude better than the current standard
  • Long Life
    • Missions with a design lifetime of 20 years or more
• UR Program Elements
  • Integrated Systems Health Management with feedback for extremely long term reliability
  • Reliability Roadmap
  • Software reliability
  • Reliability for extended missions
  • Workshop on Lunar and Mars mission reliability
• Ultra-Reliability Integration is a multi-center task funded by NASA OSMA
  • Phil Napala – NASA Headquarters S&MA Sponsor
  • Charles Barnes – ATPO Program Manager
  • Andrew Shapiro – Program Element Manager

Overview: Ultra Reliability Phases
[Diagram. Mission phases: Pre-Launch/Launch, Transit, Orbit/Descent, Surface. Program activities, in the order shown: Program Planning – Area Identification; Reliability Issue Identification, Mitigation Strategies; Initial Task Execution; Re-evaluation; New Task Identification; Re-evaluation; Infrastructure Development; Strategies for New Missions; Ultra-Reliability by Design.]

Overview
• UR program is NASA-wide:
  • to address different ultra-reliability needs in different NASA Enterprises
  • to leverage the wide variety of expertise across all of NASA
  • to get buy-in and make this a successful program
  • to develop a NASA-wide infrastructure (paramount)
  • to leverage overlapping issues
  • to take advantage of related on-going NASA tasks
• There is a lead center for each major area, but many centers should participate and be funded in each area
  • Metric for leveraging of internal S&MA research
• The development of reliability assessment is key to success
  • Intelligent, consistent use of existing NASA methods and an opportunity to develop novel ways of assessing reliability

Overview
• Results reported are based on work performed for the Software Reliability element of the UR program
  • UR program overall goal: improve the reliability of NASA systems by an order of magnitude
  • Reliability improvement goal includes software components
  • Achieving goal requires knowledge of software reliability for current and historical missions
• Analyzed space mission software failures observed during mission operations to determine if and how software failure behavior changes from mission to mission.
  • How does software failure intensity change from mission to mission?
  • Does the proportion of anomalies due to software change from mission to mission?
  • Does the proportion of software anomalies associated with a specific criticality level change from mission to mission?

Overview: Flight and Ground Software Anomalies by Mission, Date
[Figure: flight and ground software anomalies plotted by mission and date]
Legend
• Flight Software Anomaly
• Ground Software Anomaly
• The number of points on a given date represents the number of anomalies observed on that date
• Left box edge represents launch date
• Right box edge represents
  • End of mission
  • Anomaly collection date (for current missions)

Software Failure Intensities
• Observed increased software failure intensity during mission operations from mission to mission
• Computing software failure intensity
  • Collect ISAs for planetary missions
  • Identify software anomalies for a given project using code in “Cause” field
  • Compute failure intensity = number of failures/mission length
    • Completed missions length: (mission end date) – (mission launch date)
    • Current missions length: (ISA data collection date) – (mission launch date)
• Flight and ground software failure intensities computed separately
  • Flight and ground software may be of different mission criticality
  • Different structural characteristics
  • Different development practices
• Applied T4253H smoother to remove noise in anomaly data
  • More thorough recording of failures for one mission than for another
  • Different skill, experience levels in different operations teams
  • Incorrect identification of anomaly cause (e.g., SW failure labeled as non-SW)

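The failure intensity computation above can be sketched in a few lines of Python. This is an illustration only: the function name, the choice of days as the time unit, and the example counts and dates are assumptions, not values from the ISA data.

```python
from datetime import date
from typing import Optional


def failure_intensity(n_failures: int, launch: date,
                      end: Optional[date], collected: date) -> float:
    """Failures per day of mission operations (the time unit is an assumption).

    Completed missions use (mission end date) - (launch date);
    current missions (end is None) use (ISA collection date) - (launch date).
    """
    stop = end if end is not None else collected
    length_days = (stop - launch).days
    if length_days <= 0:
        raise ValueError("mission length must be positive")
    return n_failures / length_days


# Invented example values, for illustration only.
completed = failure_intensity(30, date(2000, 1, 1), date(2000, 12, 31),
                              collected=date(2006, 1, 1))
current = failure_intensity(12, date(2005, 1, 1), None,
                            collected=date(2005, 12, 27))
```

Flight and ground software anomalies would each go through the same function with their own counts, since the slides compute the two intensities separately.
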
Software Failure Intensities: Smoothed Data
[Figure: software failure intensity by mission name (in launch order), raw data and T4253H smoothed. Missions: Mars Global Surveyor, Mars Pathfinder, Cassini, Mars Climate Orbiter, Mars Polar Lander, Stardust, Mars Odyssey, Genesis, Mars Exploration Rover, Deep Impact, Mars Reconnaissance Orbiter.]
Backup material: Additional FSW Failure Intensities

Software Failure Intensities: Smoothed Data
[Figure: software failure intensity by mission name (in launch order), raw data and T4253H smoothed. Missions: Mars Global Surveyor, Mars Pathfinder, Cassini, Mars Climate Orbiter, Mars Polar Lander, Stardust, Mars Odyssey, Genesis, Mars Exploration Rover, Deep Impact, Mars Reconnaissance Orbiter.]
Backup material: Additional GSW Failure Intensities

Software Failure Intensities
• Analysis indicates that failure intensities are increasing at a greater than linear rate from mission to mission.
  • New techniques to achieve UR program reliability goal may need to be developed
• Estimated failure intensity may be low. Detailed analysis of a small sample of ISAs from one project indicates that the number of SW ISAs may be undercounted by at least a factor of 2.
• Work underway to identify software/mission/development process characteristics associated with increasing failure intensity
  • Budget
  • Schedule
  • Mission complexity
  • Staffing/effort
  • In-house vs. subcontracted
  • Avionics complexity
  • Executable image size

Software Failures vs. All Failures
• Analyzed SW ISAs for projects identified on slides 9 and 10 to determine trends in the proportion of SW anomalies to all anomalies.
• Results
  • Software anomalies represent an increasing proportion of mission anomalies
    • Increase in the proportion of anomalies due to SW (next slide) between 1996 and 2003 (especially ground software)
    • Overall increase in proportion of anomalies due to SW for Mars missions (slide 14), rising to nearly 70%.
  • No trend apparent in proportion of SW anomalies from 2003 to present (next slide)
• Discrepancy between proportions in slides 13, 14
  • Different techniques used to identify SW anomalies – “Cause” field vs. detailed analysis of “Description” and “Corrective Action” fields.
    • Inconsistent representation may indicate issues with problem reporting practices
  • Partial, but not complete, overlap between missions analyzed for slide 13 and slide 14.
  • Different computation of proportions – cumulative for slide 13, mission-by-mission for slide 14.

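The last point, cumulative versus mission-by-mission proportions, is easy to see with a small Python sketch. The function names and the (SW anomalies, all anomalies) counts below are invented for illustration and are not taken from the ISA data.

```python
def per_mission_proportions(counts):
    """Proportion of SW anomalies for each mission on its own."""
    return [sw / total for sw, total in counts]


def cumulative_proportions(counts):
    """Running proportion: SW anomalies so far / all anomalies so far."""
    out, sw_sum, total_sum = [], 0, 0
    for sw, total in counts:
        sw_sum += sw
        total_sum += total
        out.append(sw_sum / total_sum)
    return out


# Hypothetical (SW anomalies, all anomalies) per mission, in launch order.
counts = [(10, 100), (30, 100), (60, 100)]
by_mission = per_mission_proportions(counts)   # 0.1, 0.3, 0.6
running = cumulative_proportions(counts)       # 0.1, 0.2, 0.333...
```

With the same underlying counts, the running proportion lags well behind the per-mission proportion, so the two plots can legitimately show different levels even before differences in anomaly identification are considered.
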
Software Failures vs. All Failures
[Figure: Proportion of SW to Non-SW ISAs – Running Average (Planetary Missions - Post Mars Observer)]

Software Failures vs. All Failures: Smoothed Data
[Figure: Proportion of All Anomalies due to SW for Selected Missions (raw data and smoothed). Missions: Mars Observer, Mars Global Surveyor, Mars Pathfinder, Mars Climate Orbiter, Mars Polar Lander, Mars Odyssey, Mars Exploration Rover.]
• Adapted from results presented in “Anomaly Trends for Robotic Missions to Mars: Implications for Mission Reliability”, N. Green, A. Hoffman, T. Schow and H. Garrett, 44th AIAA Aerospace Sciences Meeting and Exhibit, Reno, Nevada, Jan. 9-12, 2006
• Only anomalies after launch and before MOI are included in this plot

Software Failures by Criticality
• Analyzed SW ISAs for projects identified on slides 9 and 10 to determine trends in the proportions of different criticality levels for SW anomalies.
• Results
  • FSW
    • Small decrease in Criticality 2 anomalies
    • Increase in Criticality 1 anomalies from ~5% to ~10%
    • Small increase in Criticality 3 anomalies
  • GSW
    • Significant decrease in proportion of Criticality 2 anomalies
    • No trend in Criticality 1 anomalies
    • Significant increase in proportion of Criticality 3 anomalies

Software Failures by Criticality
[Figure: Running Proportion of FSW ISAs by Criticality (Planetary Missions - Post Mars Observer)]

Software Failures by Criticality
[Figure: Running Proportion of GSW ISAs by Criticality (Planetary Missions - Post Mars Observer)]

Discussion and Future Work
• Apparent increase in SW Failure Intensities, Proportion of SW Anomalies
  • Potential to affect future mission operations
    • Reduced science return
    • Missed observation opportunities
    • Damage to instruments
  • Increased effort required for
    • Contingency planning
    • Recovering from anomalies
• Additional analysis in progress to verify trends, check accuracy of estimated failure intensities
  • Detailed analysis of anomaly descriptions, anomaly verification, and corrective action descriptions from ISAs

Discussion and Future Work
• Future Work
  • Identify relationships between observed increase in failure intensities/proportion of failures due to SW and measurable characteristics of software/mission/development process
    • Budget
    • Schedule
    • Mission complexity
    • Staffing/effort
    • In-house vs. subcontracted
    • Avionics complexity
    • Executable image size
  • Determine whether there are relationships between numbers and types of SW failures observed during development testing and SW failures observed during launch
  • Identify trends in effort required to deal with SW anomalies
  • Monitor current/future missions to determine whether trends continue
  • Resolve discrepancies between results reported on slide 13 and slide 14.
    • Different techniques used to identify software anomalies
      • Detailed analysis of description, verification, and corrective action vs.
      • Identification via “Cause” field in ISA.
    • Indicates that problem reporting procedures may need to be modified to accurately identify SW anomalies.

T4253H Smoothing
• Description from help system for SPSS 13.0
  • The smoother starts with a running median of 4, which is centered by a running median of 2. It then resmoothes these values by applying a running median of 5, a running median of 3, and hanning (running weighted averages). Residuals are computed by subtracting the smoothed series from the original series. This whole process is then repeated on the computed residuals. Finally, the smoothed residuals are computed by subtracting the smoothed values obtained the first time through the process.
• References
  • P. F. Velleman, “Definition and Comparison of Robust Nonlinear Data Smoothing Algorithms,” Journal of the American Statistical Association, vol. 75, September 1980, pp. 609-615.
  • P. F. Velleman and D. C. Hoaglin, Applications, Basics, and Computing of Exploratory Data Analysis, Boston: Duxbury Press, 1981.

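The sequence of steps described above can be sketched in plain Python. This is a simplified illustration, not the SPSS implementation: the function names are hypothetical, end points are handled naively (windows shrink near the ends and the first and last values are left in place by the span-4 and hanning steps), and the final step adds the smoothed residuals back to the first-pass smooth, which is how the "twice" step of compound smoothers is usually defined. Treat that reading of the help text as an assumption; results near the ends of a series will not match SPSS output.

```python
from statistics import median


def _running_median(x, k):
    """Odd-span running median; the window shrinks symmetrically near the ends."""
    n, h = len(x), k // 2
    out = []
    for i in range(n):
        r = min(h, i, n - 1 - i)
        out.append(median(x[i - r:i + r + 1]))
    return out


def _median4_then_2(x):
    """Running median of 4, re-centred by averaging adjacent pairs (span 2)."""
    n = len(x)
    if n < 3:
        return list(x)
    # z[k] is the median of up to four points around the gap x[k] | x[k+1].
    z = [median(x[max(0, k - 1):min(n, k + 3)]) for k in range(n - 1)]
    # Averaging the two gaps on either side of point i re-centres on i.
    return [x[0]] + [(z[i - 1] + z[i]) / 2 for i in range(1, n - 1)] + [x[-1]]


def _hanning(x):
    """Running weighted average with weights 1/4, 1/2, 1/4; ends unchanged."""
    out = list(x)
    for i in range(1, len(x) - 1):
        out[i] = 0.25 * x[i - 1] + 0.5 * x[i] + 0.25 * x[i + 1]
    return out


def _smooth_once(x):
    s = _median4_then_2(x)
    s = _running_median(s, 5)
    s = _running_median(s, 3)
    return _hanning(s)


def t4253h_sketch(series):
    """Smooth, smooth the residuals the same way, then recombine."""
    x = [float(v) for v in series]
    first = _smooth_once(x)
    residuals = [a - b for a, b in zip(x, first)]
    smoothed_residuals = _smooth_once(residuals)
    return [a + b for a, b in zip(first, smoothed_residuals)]


# Invented series with one outlier; the median steps suppress the spike.
spiky = [1, 1, 1, 1, 100, 1, 1, 1, 1, 1]
smoothed = t4253h_sketch(spiky)
```

Because the early steps are medians rather than means, a single extreme value is discarded rather than spread across its neighbours, which suits noisy per-mission anomaly counts.
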
Software Failure Intensities: Planetary Missions Flight Software From Voyager to MRO
[Figure: flight software failure intensity by mission name (in launch order), raw data and T4253H smoothed. Missions: Voyager, Galileo, Ulysses, Mars Global Surveyor, Mars Pathfinder, Cassini, Mars Climate Orbiter, Mars Polar Lander, Stardust, Mars Odyssey, Genesis, Mars Exploration Rover, Deep Impact, Mars Reconnaissance Orbiter.]

Software Failure Intensities: Planetary Missions Ground Software From Voyager to MRO
[Figure: ground software failure intensity by mission name (in launch order), raw data and T4253H smoothed. Missions: Voyager, Galileo, Ulysses, Mars Global Surveyor, Mars Pathfinder, Cassini, Mars Climate Orbiter, Mars Polar Lander, Stardust, Mars Odyssey, Genesis, Mars Exploration Rover, Deep Impact, Mars Reconnaissance Orbiter.]

Software Failure Intensities: Analysis Summary
• Conducted Curve Fit Analysis with SPSS 13.0 to determine whether failure intensities were increasing, decreasing, or showed no trends.
• Best-fit curve for all data sets indicates super-linear growth in failure intensities.
  • Cubic curve with adjusted R² 0.7
  • 11 curves fitted to data

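As a rough stand-in for the curve-fit comparison, the sketch below fits a line and a cubic by least squares and compares their adjusted R². It is not the SPSS procedure: the helper names are hypothetical, the fit uses plain normal equations, and the data points are invented (a cubic trend with a small alternating perturbation indexed by launch order).

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0] + c[1]*x + ... via the normal equations."""
    m = degree + 1
    a = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    b = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for r in range(m - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, m))
        coeffs[r] = (b[r] - s) / a[r][r]
    return coeffs


def adjusted_r2(xs, ys, coeffs):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1), p = polynomial degree."""
    n, p = len(ys), len(coeffs) - 1
    mean_y = sum(ys) / n
    pred = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, pred))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Invented data: 11 missions in launch order, not the JPL measurements.
xs = list(range(1, 12))
ys = [0.02 * x ** 3 + 0.1 * (-1) ** x for x in xs]
linear_fit = adjusted_r2(xs, ys, polyfit(xs, ys, 1))
cubic_fit = adjusted_r2(xs, ys, polyfit(xs, ys, 3))
```

Adjusted R² penalizes the extra parameters of the cubic, so a cubic that still scores higher than the line is some evidence of super-linear growth rather than an artifact of the added terms.
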