Software Engineering Disasters Georges Hatem
Software in our lives, then and now
“I think there is a world market for maybe five computers.” - IBM Chairman Thomas Watson, 1943
• Medical (processing and analysis, Computer Aided Surgery, other various equipment)
• Financial and business (banking, trading)
• Transportation (trains, cars, planes, auto-pilot)
• Home (security / fire)
• Leisure
• Military
Murphy’s law “Anything that can go wrong, will go wrong.”
Previously in CS 577
• Mars 2 Rover crash-landing (1971)
  • Dust storm caused incorrect landing angle computations?
• Ariane 5 self-destruct (1996)
  • Data conversion from 64-bit floating point to 16-bit signed integer: overflow
  • Cost: $370,000,000
• Therac-25
  • Beta radiation overdose (10,000%)
  • Replacing hardware interlocks with software interlock mechanisms
  • Frequent overflow in a one-byte counter. Operator input to the machine during overflow causes interlock mechanism to fail due to race condition
  • 3 deaths, 3 injured
  • Unrealistic risk assessment, inadequate testing
• AMR / Budget Rent-A-Car / Hilton Hotels / Marriott International “Confirm”
• Bank of America “MasterNet”
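Two of the failure modes listed above, the 64-bit float to 16-bit integer conversion and the one-byte counter overflow, can be illustrated with a minimal Python sketch. This is not the flight or device code: the function names and the sample value 40000.0 are hypothetical, and the wrap-around function only shows what the low 16 bits of an out-of-range value would look like if the conversion were left unprotected.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def checked_to_int16(value: float) -> int:
    """Convert to a 16-bit signed integer, refusing values that do not fit."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return int(value)


def wrapped_to_int16(value: float) -> int:
    """Illustration only: keep the low 16 bits, as an unprotected conversion might."""
    return (int(value) + 32768) % 65536 - 32768


def increment_one_byte_counter(counter: int) -> int:
    """A one-byte counter silently wraps from 255 back to 0."""
    return (counter + 1) & 0xFF


if __name__ == "__main__":
    sample = 40000.0  # hypothetical reading, larger than INT16_MAX
    print(wrapped_to_int16(sample))  # -25536: the sign and magnitude are both wrong
    try:
        checked_to_int16(sample)
    except OverflowError as err:
        print("rejected:", err)
    print(increment_one_byte_counter(255))  # 0
```

The checked version makes the failure explicit, but an explicit error only helps if something handles it; a counter that wraps to 0 is dangerous when 0 has a special meaning elsewhere in the program.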
Disasters at the people (not company) level
• Panama Radiation Therapy Overdose (2000)
  • 18 deaths, 10 injured
  • Double counting, overreliance on automation
• Various military vehicle crashes
  • Chinook Helicopter Crash, 29 deaths (1994): uncommanded run up and run down of the engines (analysis shows 486 anomalies in 18% of the code)
  • V-22 Osprey Crash, 4 deaths (2000): software causes aircraft to decelerate when pilot attempts to reset software
• Failed missile interception, 28 deaths, 94 injured (1991): system clock
• Y2K (2000)
  • Abbreviating year with 2 digits
  • $300,000,000,000 cost
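The Y2K item above comes down to date arithmetic on two-digit years. The following is a minimal sketch of the problem and of one common mitigation (a "windowing" rule); the function names and the pivot value of 50 are assumptions chosen for illustration, not taken from any particular system.

```python
def years_elapsed_two_digit(start_yy: int, end_yy: int) -> int:
    """Naive arithmetic on two-digit years, as many legacy programs did."""
    return end_yy - start_yy


def expand_two_digit_year(yy: int, pivot: int = 50) -> int:
    """Windowing: treat yy >= pivot as 19yy and yy < pivot as 20yy (pivot is hypothetical)."""
    return 1900 + yy if yy >= pivot else 2000 + yy


if __name__ == "__main__":
    # Someone born in 1965 ("65"), with the current year 2000 stored as "00".
    print(years_elapsed_two_digit(65, 0))  # -65 instead of 35
    print(expand_two_digit_year(0) - expand_two_digit_year(65))  # 35
```

Windowing only moves the ambiguity to a different year range; storing the full four-digit year removes it.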
Toyota Anti-Lock Brake recalls (2010)
• ~150,000 vehicles recalled
• Reason: 1 second lag
  • At 60 mph (96.5 km/h), the car travels ~90 feet (27.5 m) during the lag
  • Enough to cause accidents
• Bad PR
  • $1.1 billion in repairs
  • $770-880 million in lost sales
  • Endangering people’s lives
Toyota "Moving forward"... even when you don't want to.
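The distance figure above is a unit conversion: speed times lag. A small sketch, with a hypothetical function name, that reproduces the arithmetic for a 1-second lag at 60 mph:

```python
FEET_PER_MILE = 5280
SECONDS_PER_HOUR = 3600
METERS_PER_FOOT = 0.3048


def distance_during_lag(speed_mph: float, lag_seconds: float) -> tuple[float, float]:
    """Return the distance (feet, meters) covered at constant speed during the lag."""
    feet_per_second = speed_mph * FEET_PER_MILE / SECONDS_PER_HOUR
    feet = feet_per_second * lag_seconds
    return feet, feet * METERS_PER_FOOT


if __name__ == "__main__":
    feet, meters = distance_during_lag(60, 1.0)
    print(f"{feet:.1f} ft, {meters:.1f} m")  # 88.0 ft, 26.8 m
```

The exact result is 88 feet (about 26.8 m), consistent with the slide's rounded figure of ~90 feet.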
Stock Market Flash Crash (2010)
• Dow Jones Industrial Average: a very closely watched U.S. benchmark index of stock market activity
• Biggest one-day market decline: 998.5 points
• Cost: $1,000,000,000,000
• Procter & Gamble, Accenture: share prices down to a penny, or up to $100,000
• The market recovered a large part of the point drop
Cold War Nuclear Missile False Alarm
• Very sensitive period
• Strategy was an immediate nuclear counter-attack to guarantee “Mutually Assured Destruction”
• How it was mitigated: a soldier judged the alarm to be a computer error
• The bug: false alarm created by a rare alignment of sunlight on high-altitude clouds and the satellites’ orbits
• Potential cost: nuclear World War 3
What’s next?
Just as Thomas Watson couldn’t guess what was coming in the next 40 years, it is hard for us to estimate how computers and technology will evolve in the near future. However, we know for sure that software systems will get much larger and more complex, more tasks will be automated, and reliance on software will greatly increase.
Do more testing?
Testing will only catch ~80% of the bugs.
“Program testing can be used to show the presence of bugs, but never to show their absence!” - Edsger Dijkstra
Conclusion: our role
• Our responsibility increases as the need for reliability in our systems increases
• Proper process / practices in architecting, managing risks, developing and testing
  • As we were taught in various SE classes (577, 578…)
• Good communication between stakeholders
  • To ensure all sides are talking about the same thing