
Aerospace Mishaps and Lessons Learned

Aerospace Mishaps and Lessons Learned. 2004 MAPLD International Conference, Washington, D.C., September 7, 2004. "... most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices."

Presentation Transcript


  1. Aerospace Mishaps and Lessons Learned 2004 MAPLD International Conference Washington, D.C. September 7, 2004

  2. "... most accidents are not the result of unknown scientific principles but rather of a failure to apply well-known, standard engineering practices." Nancy Leveson in Safeware, 1995.

  3. Seminar Program

  4. Training vs. Education • The NASA Office of Logic Design works to educate design engineers, not train them. • Training promotes rote responses • Education promotes thinking and the ability to adapt to and cope with new situations. • Hence, MAPLD hosts seminars and not training sessions.

  5. Design Seminars • These case studies are real, not contrived examples. Many of the seminar leaders have firsthand knowledge of these mishaps. • Contribute: Discuss the topics presented, disagree with them, present interesting cases you wish to share, additional lessons, or alternative viewpoints. • Do not sit there quietly and expect to be treated like a cocker spaniel being trained and drilled to emit Pavlovian responses to stimuli (bell for dogs, donuts for engineers).

  6. Material • Material will be made available on • CD-ROM • Hardcopy • klabs.org • All public domain, you may use the material as you wish.

  7. I Was Reading AW&ST … Aviation Week & Space Technology, August 23/30, 2004, pp. 29-30

  8. Barto's Law: Every circuit is considered guilty until proven innocent.

  9. A Recent Mishap (that gave me the idea for this seminar)

  10. Background • Popular single board computer • Everything was working fine • Ran vibration test • Unpowered and unmonitored • Subsequently failed to boot intermittently • Testing at the manufacturer also showed intermittent failures, although at a lower rate than observed at the contractor.

  11. Project’s Corrective Action • Unit (S/N 031) pulled from the flight instrument • New unit (S/N 034) installed in the flight instrument • Repeated testing with the new unit was successful • Signed off, ready for launch

  12. Risk Reduction Effort • Reviewed problem/failure report • No root cause or failure mechanism identified • Conclusion of the “Verification and Analysis” section stated: “… Each time there was a failure to boot, the power was cycled and the computer subsequently rebooted. The result of the testing at XXXXXX was that the most probable cause of the boot failure was a workmanship issue specific to SN034 and is not endemic to the XXXXXXXX computer and therefore does not affect SN031.” • No direct or indirect evidence given in the “Verification and Analysis” section to support a workmanship issue. • No analysis given to show that the workmanship problem was not systemic to all units. Since the unit is clearly marginal and it is difficult to make fail, it is not shown that other units have sufficient margin to support operation in all operating environments over the design life of the unit.
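
The margin concern on this slide can be made concrete with a little arithmetic. The sketch below assumes each boot attempt fails independently with a fixed probability (an idealization; the transcript gives no failure rates, and the numbers used are purely illustrative). It shows how likely an intermittent unit is to pass a short run of boots, and how many consecutive clean boots are needed before a given failure rate can reasonably be ruled out.

```python
import math


def prob_all_pass(p_fail: float, n_boots: int) -> float:
    """Probability that n_boots consecutive boots all succeed when each
    boot fails independently with probability p_fail."""
    return (1.0 - p_fail) ** n_boots


def boots_needed(p_fail: float, confidence: float) -> int:
    """Smallest number of consecutive boots such that a unit with per-boot
    failure probability p_fail would show at least one failure with
    probability >= confidence."""
    if not (0.0 < p_fail < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("p_fail and confidence must be in (0, 1)")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the mishap.
    p = 0.02  # hypothetical 2% per-boot failure rate
    print(f"P(20 clean boots) = {prob_all_pass(p, 20):.2f}")
    print(f"Boots needed for 95% confidence: {boots_needed(p, 0.95)}")
```

Under these assumptions, a handful of successful boots on a replacement unit says little about whether that unit shares a low-rate intermittent fault.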

  13. Risk Reduction Effort • Note: the “analyst” consistently remarks that after a failed boot the next power cycle results in correct operation of the board. Yet the board fails multiple times. This is evidence of the “PC mentality” seen in many Projects where, when there is a problem, the solution is to switch the power off and back on to “correct it.” • Contractor and Project claimed repeatedly that the unit was troubleshot and nothing more could be done.

  14. Let’s Take a Closer Look • Examination of failures at manufacturer • The failures reported were a result of test equipment; there were zero failures detected at the manufacturer • Intermittent operation of the computer could not be supported. Suspicion of the electrical environment grows • “What if” analysis results in a large number of possible failure mechanisms

  15. Let’s Take a Closer Look • Examination of troubleshooting at contractor • Previously claimed fully troubleshot • Examination shows that no oscilloscope probe ever touched the board • Examined at interface points only • Throughout the organization, “failures to boot” were routine • Many failure reports written across many units • Contractor did not use the available diagnostic signals and port to ascertain the status of the CPU and computer

  16. Troubleshooting Again • Contractor fought hard to prevent it • Stalled effort for many months • Initial examination showed that the protection signals for the EEPROM memories did not behave as predicted by the analysis • Contractor would not show the analysis • Examination of diagnostic signals quickly showed that the CPU had halted

  17. Troubleshooting Results • Cause of failure determined • Known issue with pipeline timing • Software service routines not installed to handle all conditions • Project previously had assured the independent review that software was installed to handle all conditions • Did not fail at manufacturer since test software installed properly handled the interrupt from the pipelining issue • No support for “a workmanship issue specific to SN034 …” • Flight software rewritten
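
The root cause on this slide, a service routine missing for one condition, is a general hazard in interrupt-driven software. The sketch below is a hypothetical Python model, not the flight code or the actual processor's vector layout: the vector names, the `VectorTable` class, and the handlers are all invented for illustration. It shows one defensive pattern, which is to populate every vector with a default handler that records the event and returns, so that an unanticipated interrupt is logged rather than halting the processor.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]


class CpuHalted(Exception):
    """Raised by the model when an interrupt has no service routine."""


class VectorTable:
    """Toy model of an interrupt vector table (all names hypothetical)."""

    def __init__(self, vectors: List[str], install_defaults: bool) -> None:
        self.log: List[str] = []
        self._handlers: Dict[str, Handler] = {}
        if install_defaults:
            # Give every vector a handler that records the event and returns.
            for name in vectors:
                self._handlers[name] = self._default_handler

    def _default_handler(self, name: str) -> None:
        self.log.append(f"unexpected interrupt: {name}")

    def install(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler

    def raise_interrupt(self, name: str) -> None:
        handler = self._handlers.get(name)
        if handler is None:
            # Models the observed symptom: the CPU halts.
            raise CpuHalted(name)
        handler(name)


VECTORS = ["timer", "uart", "pipeline_fault"]  # hypothetical vector names


def boot(table: VectorTable) -> str:
    """Install the routines the software provides, then fire every vector."""
    table.install("timer", lambda name: None)
    table.install("uart", lambda name: None)
    try:
        for name in VECTORS:  # "pipeline_fault" never gets its own routine
            table.raise_interrupt(name)
    except CpuHalted:
        return "halted"
    return "booted"
```

In this model, `boot(VectorTable(VECTORS, install_defaults=False))` halts on the unserviced vector, while the same call with `install_defaults=True` completes and leaves a log entry that points directly at the missing routine.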

  18. Lessons and Suggestions • Problem/Failure Reports • Examine original documents. • Request and examine all related P/FRs from all units • Provide direct evidence (at a minimum!) for determination of the cause of failure • Intermittent failures after the vibration test led to the conclusion of a workmanship error; the “bad solder joint” was never identified • “Failures” at the manufacturer reinforced the false conclusion, as those “failures” were not examined in detail and were the result of a testing error. • Do not conduct reviews in a board room with PowerPoint slides • Pack up your oscilloscope and go into the lab
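
The advice to gather all related P/FRs from all units lends itself to a simple tally. The sketch below assumes a hypothetical record format (unit serial number plus symptom) and made-up sample records; it is not the project's reporting system. It groups reports by symptom and flags any symptom seen on more than one unit, since such a symptom argues against a single-unit workmanship explanation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Report = Tuple[str, str]  # (unit serial number, symptom); hypothetical format


def units_by_symptom(reports: Iterable[Report]) -> Dict[str, Set[str]]:
    """Map each symptom to the set of units on which it was reported."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for serial, symptom in reports:
        grouped[symptom].add(serial)
    return dict(grouped)


def possibly_systemic(reports: Iterable[Report]) -> List[str]:
    """Symptoms reported on two or more distinct units, sorted by name."""
    grouped = units_by_symptom(reports)
    return sorted(s for s, units in grouped.items() if len(units) >= 2)


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        ("UNIT-A", "failure to boot"),
        ("UNIT-B", "failure to boot"),
        ("UNIT-A", "connector damage"),
    ]
    print(possibly_systemic(sample))
```

A tally like this is only a screening step; it tells the reviewer which reports to read in the original, not what the failure mechanism is.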

  19. Enjoy your seminar!
