Errors, Failures, and Risks
Items to Discuss • Failures and Errors in Computer Systems • Case Study: The Therac-25 • Increasing Reliability and Safety • Dependence, Risk, and Progress
Failures and Errors in Computer Systems • Most computer applications are so complex that it is virtually impossible to produce programs with no errors • Failures often have more than one cause • Computer professionals must study failures to learn how to avoid them • Computer professionals must study failures to understand the impacts of poor work
Failures and Errors in Computer Systems Individual Problems • Billing errors • A 101-year-old man's auto insurance rate suddenly tripled. • Chicago cat owners were billed for failing to register dachshunds, which they did not own. • Inaccurate and misinterpreted data in databases
Failures and Errors in Computer Systems Individual Problems • Inaccurate and misinterpreted data in databases • In the 2000 election, Florida used lists that barred some people from voting because their names were similar to those of convicted felons. • People can be misrepresented in sex offender databases. • The TSA has indicated that more than 30,000 people have been mistakenly matched to terrorist watch lists at airports and border crossings.
Failures and Errors in Computer Systems Individual Problems • Inaccurate and misinterpreted data in databases • Large population where people may share names • Automated processing may not be able to recognize special cases • Overconfidence in the accuracy of data • Errors in data entry • Lack of accountability for errors
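To illustrate why a large population with shared names produces false matches, here is a minimal Python sketch. The watch list, the travelers, and the two matching rules (`match_by_name`, `match_by_name_and_dob`) are all hypothetical; real screening systems are far more complex and are not described by this code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    birth_date: date


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so 'John  SMITH' equals 'john smith'."""
    return " ".join(name.lower().split())


def match_by_name(traveler: Person, watch_list: list[Person]) -> bool:
    # Naive rule (hypothetical): any shared normalized name counts as a hit.
    return any(normalize(traveler.name) == normalize(w.name) for w in watch_list)


def match_by_name_and_dob(traveler: Person, watch_list: list[Person]) -> bool:
    # Stricter rule (hypothetical): the name AND the date of birth must agree.
    return any(
        normalize(traveler.name) == normalize(w.name)
        and traveler.birth_date == w.birth_date
        for w in watch_list
    )


watch_list = [Person("John Smith", date(1970, 3, 14))]
travelers = [
    Person("John Smith", date(1985, 7, 2)),    # a different person with the same name
    Person("john  smith", date(1970, 3, 14)),  # the listed person
    Person("Jane Doe", date(1990, 1, 1)),
]

name_hits = [t for t in travelers if match_by_name(t, watch_list)]
strict_hits = [t for t in travelers if match_by_name_and_dob(t, watch_list)]
print(len(name_hits), len(strict_hits))
```

Adding a second identifier removes the false hit in this toy case, but it only helps if that extra data is itself accurate, which is exactly the weakness the slide lists.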
Failures and Errors in Computer Systems System Failures • AT&T • Lost voice and data phone service for nine hours because of an error in a three-line change to a four-million-line program. • Tokyo Stock Exchange, NASDAQ, London Stock Exchange • A software upgrade in Tokyo shut down the trading floor. • An update at Charles Schwab Corp. virtually shut down NASDAQ for two hours. • A software glitch froze the London system for eight hours on the last day of the tax year. • Businesses have gone bankrupt after spending huge amounts on computer systems that failed.
Failures and Errors in Computer Systems System Failures • Voting systems: problems in the 2000 presidential election were blamed on outdated voting machines. • Denver Airport • The airport is extremely large, and its baggage system, with 22 miles of underground track, failed to work correctly.
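Failures and Errors in Computer Systems System Failures • Mars Climate Orbiter: one piece of ground software produced thruster data in English units (pound-force seconds) while another expected metric units (newton-seconds); the resulting navigation error brought the orbiter too close to Mars, and it was lost.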
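The Mars Climate Orbiter loss is generally attributed to a unit mismatch: ground software reported thruster impulse in pound-force seconds where newton-seconds were expected. A minimal Python sketch, assuming a hypothetical `Impulse` type, shows how tagging a value with its unit where it is created keeps a bare number from being read in the wrong unit.

```python
from dataclasses import dataclass

# 1 pound-force = 4.4482216152605 newtons, so 1 lbf·s = 4.4482216152605 N·s.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical unit-safe impulse, always stored in SI units (newton-seconds)."""
    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        return cls(value * LBF_S_TO_N_S)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)


# A bare float carries no unit, so a consumer expecting N·s silently
# accepts a number that was actually computed in lbf·s.
raw_reading = 10.0   # hypothetical value produced in lbf·s
misread = raw_reading  # consumed as if it were N·s
correct = Impulse.from_lbf_s(raw_reading).newton_seconds

print(f"misread: {misread:.2f} N·s, correct: {correct:.2f} N·s")
```

The design choice is to force every producer to state its unit through a named constructor; a consumer then never sees a raw number whose unit it has to guess.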
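Failures and Errors in Computer Systems System Failures • Ariane 5 rocket: software reused from the Ariane 4 failed on a value that was larger on the new, faster rocket; the guidance system shut down and sent incorrect data to the rocket's control system, which lost control and the rocket was destroyed less than a minute after launch.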
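Accounts of the 1996 Ariane 5 failure describe reused Ariane 4 code that converted a 64-bit floating-point value, related to horizontal velocity, into a 16-bit signed integer. On the faster Ariane 5 the value no longer fit, the conversion raised an exception that was not handled, and the inertial reference systems shut down. The minimal Python sketch below, with hypothetical function names and readings, contrasts a conversion that only detects the overflow with one that also handles it. Clamping is just one possible policy, chosen here for illustration; it is not what the Ariane software did, and in a real system it can also hide a genuine fault.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_checked(x: float) -> int:
    """Convert to a 16-bit signed integer, raising if the value does not fit."""
    n = int(x)
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x} does not fit in a 16-bit signed integer")
    return n


def to_int16_saturating(x: float) -> tuple[int, bool]:
    """Handle the out-of-range case explicitly: clamp and report it.

    Returns (value, in_range) so the caller can flag the reading as suspect
    instead of letting an unhandled error take the whole unit down.
    """
    try:
        return to_int16_checked(x), True
    except OverflowError:
        return (INT16_MAX if x > 0 else INT16_MIN), False


# Hypothetical readings: the first fits in 16 bits, the second does not.
for reading in (20_000.0, 40_000.0):
    value, ok = to_int16_saturating(reading)
    print(reading, value, ok)
```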
Failures and Errors in Computer Systems Reasons Denver Airport: • Baggage system failed due to real-world problems, problems in other systems, and software errors • Main causes: • Time allowed for development was insufficient • Denver made significant changes in specifications after the project began
Failures and Errors in Computer Systems Reasons High-level Causes of Computer-System Failures: • Lack of clear, well thought out goals and specifications • Poor management and poor communication among customers, designers, programmers, etc. • Pressures that encourage unrealistically low bids, low budget requests, and underestimates of time requirements • Use of very new technology, with unknown reliability and problems • Refusal to recognize or admit a project is in trouble
Failures and Errors in Computer Systems Reasons Safety-Critical Applications: • A-320: "fly-by-wire" airplanes (many systems are controlled by computers and not directly by the pilots) • Between 1988 and 1992, four A-320s crashed • Air traffic control is extremely complex and includes computers on the ground at airports, devices in thousands of airplanes, radar, databases, communications, and so on; all of these must work in real time, tracking airplanes that move very fast • In spite of problems, computers and other technologies have made air travel safer
Case Study: The Therac-25 Therac-25 Radiation Overdoses: • Massive overdoses of radiation were given; the machine said no dose had been administered at all • Caused severe and painful injuries and the death of three patients • Important to study to avoid repeating errors • Manufacturer, computer programmer, and hospitals/clinics all have some responsibility
Case Study: The Therac-25 Software and Design problems: • Reused software from older models without recognizing that their hardware safety interlocks had masked bugs in it • Weaknesses in design of operator interface • Inadequate test plan • Bugs in software • Allowed the beam to fire when the turntable was not in the proper position • Ignored changes and corrections operators made at the console
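Later investigations of the Therac-25 traced one accident to a one-byte variable that the software incremented, rather than set to a fixed value, to signal that the turntable position had to be checked. Every 256th increment wrapped it to zero, and on that pass the check was skipped. The toy Python model below (the class and function names are hypothetical, and the real software was written in PDP-11 assembly language) counts how often the buggy and fixed versions would allow the beam with the turntable out of place.

```python
class SetupTest:
    """Toy model of a setup loop that re-checks turntable position.

    A nonzero flag means "position must be verified". The buggy version
    increments a one-byte counter to set the flag, so every 256th pass it
    wraps to 0 and the check is silently skipped.
    """

    def __init__(self, buggy: bool) -> None:
        self.buggy = buggy
        self.flag = 0  # a single byte in the original; modeled here with % 256

    def one_pass(self, turntable_ok: bool) -> bool:
        """Run one pass; return True if the beam would be allowed."""
        if self.buggy:
            self.flag = (self.flag + 1) % 256  # increment: wraps to 0
        else:
            self.flag = 1  # fix: set to a fixed nonzero value
        if self.flag != 0:
            return turntable_ok  # check performed
        return True  # check skipped: beam allowed regardless of position


def unsafe_passes(buggy: bool, passes: int = 1024) -> int:
    """Count passes that would allow the beam with the turntable out of place."""
    test = SetupTest(buggy)
    return sum(test.one_pass(turntable_ok=False) for _ in range(passes))


print(unsafe_passes(buggy=True), unsafe_passes(buggy=False))
```

The fix is a one-line change, which is part of why such bugs are hard to find by testing: the unsafe pass occurs only when an operator action happens to coincide with the rare wrapped value.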
Case Study: The Therac-25 Why So Many Incidents? • Hospitals had never seen such massive overdoses before, were unsure of the cause • Manufacturer said the machine could not have caused the overdoses and no other incidents had been reported (which was untrue) • The manufacturer made changes to the turntable and claimed they had improved safety after the second accident. The changes did not correct any of the causes identified later
Case Study: The Therac-25 (cont.) Why So Many Incidents? (cont.) • Recommendations were made for further changes to enhance safety; the manufacturer did not implement them • The FDA declared the machine defective after the fifth accident • The sixth accident occurred while the FDA was negotiating with the manufacturer on what changes were needed
Increasing Reliability and Safety What Goes Wrong? • Design and development problems • Management and use problems • Misrepresentation, hiding problems and inadequate response to reported problems • Insufficient market or legal incentives to do a better job • Re-use of software without sufficiently understanding the code and testing it • Failure to update or maintain a database
Increasing Reliability and Safety Professional Techniques • Importance of good software engineering and professional responsibility • User interfaces and human factors • Feedback • Should behave as an experienced user expects • Workload that is too low can lead to mistakes • Redundancy and self-checking • Testing • Include real-world testing with real users
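One common form of redundancy with self-checking is to run the same computation on several units and vote on the result. A minimal Python sketch of majority voting follows, assuming hypothetical unit outputs. Voting only helps if the units are genuinely independent (for example, different hardware or separately written software), because identical copies share the same bugs.

```python
from collections import Counter
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of redundant units.

    Raises ValueError when no strict majority exists, so the disagreement is
    reported instead of silently picking one answer (self-checking).
    """
    if not outputs:
        raise ValueError("no outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError(f"no majority among {list(outputs)}")
    return value


# Three hypothetical, independently computed results; one unit is faulty.
print(majority_vote([42, 42, 17]))
```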
Increasing Reliability and Safety Law, Regulation and Markets • Criminal and civil penalties • Provide incentives to produce good systems, but shouldn't inhibit innovation • Warranties for consumer software • Most are sold ‘as-is’ • Regulation for safety-critical applications • Professional licensing • Arguments for and against • Taking responsibility
Dependence, Risk, and Progress Risk and Progress • Many new technologies were not very safe when they were first developed • We develop and improve new technologies in response to accidents and disasters • We should compare the risks of using computers with the risks of other methods and the benefits to be gained
Information, Knowledge, and Judgment Evaluating Information on the Web • Expert information or ‘wisdom of the crowd’? • There is a daunting amount of information on the web, and much of it is not correct • Search engines are replacing librarians, but websites are ranked by popularity, not by expert evaluation • Wisdom of the crowd: ratings of websites by the public • When millions participate, the results can be useful
Information, Knowledge, and Judgment Evaluating Information on the Web • Wikipedia: • Written by volunteers; some articles are biased or inaccurate • Although anyone can write, most people do not • Those who do are typically educated, and often experts
Information, Knowledge, and Judgment Evaluating Information on the Web • Wisdom of the crowd • Problems of unreliable information are not new • The Web magnifies the problems • Rating systems are easy to manipulate • Vulnerable viewers • Less educated individuals • Children • Responsibilities of site operators • Should identify user-supplied content • Make clear which information has been verified
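To make "rating systems are easy to manipulate" concrete, the minimal Python sketch below compares a plain average with the median when a small burst of fake ratings is added. All ratings are hypothetical. The median is harder to move with a few fake votes, but a large enough coordinated burst still shifts it, so this reduces the problem rather than removing it.

```python
from statistics import mean, median

honest = [2, 2, 3, 2, 2, 3, 2, 2, 3, 2]  # hypothetical genuine 1-5 star ratings
fake = [5] * 3                           # a small burst of hypothetical fake 5-star ratings

for label, ratings in (("honest only", honest), ("with fakes", honest + fake)):
    print(f"{label}: mean={mean(ratings):.2f} median={median(ratings):.1f}")
```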
Information, Knowledge, and Judgment Writing, Thinking, and Deciding • New tools have displaced skills that were once important • Abdicating responsibility • People willing to let computers do their thinking • Reliance on computer systems over human judgment may become institutionalized • Fear of having to defend your own judgment if something goes wrong
Information, Knowledge, and Judgment Computer Models • Evaluating Models • How well do the modelers understand the underlying science or theory? • Models necessarily involve assumptions and simplifications of reality • How closely do the results or predictions correspond with the results from physical experiments or real experience?
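The last question on this slide, how closely a model's results match physical experiments or real experience, can be made quantitative. One common measure, used here only as an example, is the root-mean-square error (RMSE) between predictions and observations. The measurements and both models below are hypothetical.

```python
import math


def rmse(predicted: list[float], observed: list[float]) -> float:
    """Root-mean-square error between model predictions and measurements."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))


# Hypothetical measurements and two hypothetical models' predictions.
observed = [10.0, 12.0, 15.0, 19.0]
model_a = [10.5, 12.5, 14.0, 18.0]
model_b = [9.0, 14.0, 18.0, 23.0]

print(f"model A RMSE: {rmse(model_a, observed):.2f}")
print(f"model B RMSE: {rmse(model_b, observed):.2f}")
```

A low error on past data is evidence, not proof: if the model's assumptions and simplifications stop holding, its predictions for new conditions can still be far off.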
Information, Knowledge, and Judgment Computer Models • Why models may not be accurate • We might not have complete knowledge of the system we are modeling • The data describing current conditions or characteristics may be incomplete or inaccurate • Computing power may be inadequate for the complexity of the model • It is difficult, if not impossible, to numerically quantify variables that represent human values and choices
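As a small illustration of how inaccurate input data affects a model, the Python sketch below runs a simple compound-growth projection twice: once with a hypothetical "true" annual growth rate of 2.2% and once with a slightly wrong input of 2.0%. The model and numbers are assumptions chosen only to show that a small error in an input can grow into a much larger error in a long-range prediction.

```python
def project(initial: float, annual_rate: float, years: int) -> float:
    """Compound-growth projection: initial * (1 + rate) ** years."""
    return initial * (1.0 + annual_rate) ** years


# Hypothetical: the true growth rate is 2.2%, but the input data says 2.0%.
true_value = project(1000.0, 0.022, 50)
estimated = project(1000.0, 0.020, 50)

print(f"relative error after 50 years: {true_value / estimated - 1:.1%}")
```

A 0.2 percentage-point error in one input becomes roughly a 10% error in the 50-year result, and real models with many interacting inputs can amplify errors far more.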
The "Digital Divide" Trends in Computer Access • New technologies are at first available only to the wealthy • The time it takes for new technology to make its way into common use is decreasing • Cost is not the only factor; ease of use plays a role • Entrepreneurs provide low-cost options for people who cannot otherwise afford something • Government funds technology in schools • As technology becomes more prevalent, the issues shift from the haves and have-nots to level of service
The "Digital Divide" The Global Divide and the Next Billion Users • Approximately one billion people worldwide have access to the Web; approximately five billion do not • Non-profit organizations and huge computer companies are spreading computer access to people in developing countries • Bringing new technology to poor countries is not just a matter of money to buy equipment; PCs and laptops must work in extreme environments • Some people actively working to shrink the digital divide emphasize the need to provide access in ways appropriate to the local culture
Evaluations of the Impact of Computer Technology The Neo-Luddite View of Computers • Computers cause massive unemployment • No real need (We use technologies because they are there, not because they satisfy real needs) • Computers cause social inequity • Benefit big business and the government • Do little or nothing to solve real problems • Computers separate humans from nature and destroy the environment
Evaluations of the Impact of Computer Technology (cont.) Accomplishments of Technology • Prices of food are down and raw materials are abundant • Real buying power is up • Food supplies and GDP are growing faster than the population • Dramatic impact on life expectancy • Assistive technologies benefit those with disabilities
Making Decisions About Technology The Difficulty of Prediction • Each new technology finds new and unexpected uses • The history of technology is full of wildly wrong predictions