Learning from Disaster Jack Ganssle
The Tacoma Narrows Bridge • 4 months after opening, Nov 7, 1940
Forgotten Failures • Dryburgh Abbey Bridge, Scotland, 1818 • Montrose Bridge, Scotland, 1838 • Menai Strait Bridge, Wales, 1839 • Basse-Chaine Bridge, 1850 • Roche-Bernard Bridge, France, 1852 • Wheeling Suspension Bridge, 1854 • Niagara-Lewiston Bridge, 1864 • Niagara-Clifton Bridge, 1889 • Deer Isle Bridge, 1939 • Bronx-Whitestone, 1939
Costs
Bridge              Completed   Span      Cost
George Washington   1935        3500 ft   $59.5m
Golden Gate         1937        4200 ft   $35m
Bronx-Whitestone    1939        2300 ft   $19.7m
Tacoma Narrows      1940        2800 ft   $6.4m
Lessons • Cheaper is often more expensive • Management decisions do not repeal the laws of physics • Not learning from the past means repeating the past – endlessly • Codes are a powerful way to ensure projects are done correctly
Clementine • Lessons learned: • Schedules can’t rule • Never sacrifice testing • Tired people make mistakes • Error handlers save systems
NEAR • Lessons Learned: • Tired people make mistakes • Use the VCS • Test everything! • Engineers rock! • We must learn from disaster
Mars Polar Lander/Deep Space 2 • Lessons learned: • Tired people make mistakes • Test everything! • Test like you fly; fly what you test
Pathfinder • Lessons learned: • There’s no such thing as a glitch – believe your tests! • Error handlers save systems
Mars Exploration Rover • Lessons learned: • Test like you fly; fly what you test • Poor error handler • We must learn from disaster
Titan IVb Centaur • Lessons Learned: • Test like you fly; fly what you test • Use the VCS
Ariane 5 • Lessons Learned: • Improve error handling • Assume software can fail • Test everything! • Be careful with ported code
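The Ariane 5 failure is commonly traced to an unprotected conversion of a 64-bit floating-point value to a 16-bit signed integer in code carried over from Ariane 4. As a minimal sketch of the "improve error handling" and "be careful with ported code" lessons, the C function below refuses a conversion that would not fit instead of letting it overflow. The name `checked_to_int16` is hypothetical and is not taken from the flight software; in C, converting an out-of-range floating-point value to an integer type is undefined behavior, so the range test has to come first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (an illustration, not the Ariane code):
 * store value in *out only if it fits in an int16_t.
 * Returns false, leaving *out untouched, for NaN, out-of-range
 * values, or a NULL destination. */
bool checked_to_int16(double value, int16_t *out)
{
    if (out == NULL) {
        return false;
    }
    if (value != value) {            /* true only for NaN */
        return false;
    }
    if (value < (double)INT16_MIN || value > (double)INT16_MAX) {
        return false;                /* would overflow: report, don't convert */
    }
    *out = (int16_t)value;           /* in range: truncates toward zero */
    return true;
}
```

The caller is forced to decide what happens on failure (saturate, substitute a safe default, or flag the sensor as invalid) rather than discovering the overflow in flight.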
Chinook • Lessons Learned: • Do reviews… before shipping! • Test like you fly; fly what you test
Therac 25 • Lessons Learned: • Use tested components • Use accepted practices • Use peer reviews
Radiation Deaths in Panama • May ‘01: Over 20 dead patients • Possible to enter data in such a way as to confuse the machine; the unit prints a safe treatment plan but overexposes the patient • Lessons Learned: • Test carefully • Better requirements • Use a defined process & peer reviews
Pacemakers • Lessons Learned: • Test everything! • Flash is not a schedule enhancer
Near Meltdown • Lessons Learned: • Test everything! • Improve error handling
Lessons Learned: • Be careful with ported code • Blame the engineers Uwatec dive computer (1995) The Challenger
A Hot Day • Lessons Learned: • Test everything!
Lessons Learned: • Choose your IP carefully
Forgotten Failures • 2000 – Ford Explorer recall • 2004 – Grand Prix leap-year glitch • 1992 – Crash of only F-22 prototype • 2003 – BMW traps Thai politician • 2003 – BMW recalls 15000 745is • 747, 767, A340 avionics lockups • 2003 – Slammer worm attacks nuke • 1991 – Patriot missile failure • 1974 – Loss of a job for 7 years
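The slide gives no detail of the 2004 leap-year defect, so the following is only a sketch of the kind of date logic such glitches usually involve: the full Gregorian leap-year rule, written out in C. The function names are hypothetical. The point is that the rule has three clauses, and code that implements only "divisible by 4" (or that assumes every year has 365 days) passes most tests and then fails on the boundary years.

```c
#include <stdbool.h>

/* Gregorian rule: every 4th year is a leap year, except century
 * years, unless the century year is divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/* Number of days in the given Gregorian year. */
int days_in_year(int year)
{
    return is_leap_year(year) ? 366 : 365;
}
```

A test suite for date code should include at least one ordinary leap year, one ordinary non-leap year, and the century cases 1900 and 2000.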
Our Criminal Behavior • No Peer Reviews • Implicated in the Chinook helicopter, Multidata radiotherapy device, and Therac 25 • Average uninspected code contains 50-100 bugs per 1000 LOC. Inspections find most of these. Cheaply.
Our Criminal Behavior • Inadequate testing • Implicated in the Clementine, NEAR, Mars Polar Lander, Pathfinder, Mars Exploration Rover, Titan IVb, Ariane, Sea Launch, Chinook, Therac 25, Multidata, pacemakers, Los Alamos incident, huge digital thermometer • Ignoring or cheating the VCS • Implicated in the NEAR, Pathfinder, Titan IVb, EFF, and FAA incidents
Our Criminal Behavior • Lousy error handlers • Implicated in the Ariane, Los Alamos incident, Clementine, Yorktown, Mars Exploration Rover, and many others • This means adopting a culture of anticipating and planning for failures! • And for FPGA users it means adopting a philosophy that things do fail!
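One way to picture "anticipating and planning for failures" is a command path that validates its input, reports a specific status, and drives the output to a known safe value when something is wrong. The C sketch below assumes a hypothetical actuator whose valid duty cycle is 0 to 100 percent and whose safe state is 0 (off); the names `cmd_status` and `set_duty` are illustrative only and do not come from any of the systems named on the slide.

```c
#include <stddef.h>

/* Status codes for the hypothetical actuator command. */
typedef enum {
    CMD_OK,
    CMD_ERR_NULL,   /* no place to store the applied value */
    CMD_ERR_RANGE   /* request outside 0..100 percent */
} cmd_status;

/* Apply a duty-cycle request. On a range error the output is forced
 * to the assumed safe state (0 = off) and the error is reported, so
 * the caller cannot silently continue with a bad value. */
cmd_status set_duty(int percent, int *applied)
{
    if (applied == NULL) {
        return CMD_ERR_NULL;
    }
    if (percent < 0 || percent > 100) {
        *applied = 0;            /* fail safe: turn the output off */
        return CMD_ERR_RANGE;
    }
    *applied = percent;
    return CMD_OK;
}
```

The design choice worth noting is that the error path is written and tested as deliberately as the normal path, rather than left as an unreachable "can't happen" branch.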
Our Criminal Behavior • The use of dangerous tools! • C (worst) 500 bugs/KLOC • C (average) 167-26 • Ada (worst) 50 • Ada (average) 25 • SPARK (average) 4
The Boss’s Criminal Behavior • Schedules can’t rule • Corollary: Tired people make mistakes • Implicated in the Clementine, NEAR, Mars Polar Lander and many others
The Boss’s Criminal Behavior • Be wary of financial shortcuts! • Implicated in the Tacoma Narrows Bridge, Ariane, MGM fire, and many others • Reuse is not a panacea • Implicated in the Ariane, Uwatec and many others. Reuse is extremely difficult. See “Confessions of a Used Program Salesman” by Will Tracz
Are we criminals? Or are we still in the dark ages? But there’s a lot we do know, so we’re negligent – and will be culpable – if we don’t consistently use best practices.