Developing Medical Software: Pitfalls and Prophylactics. Elliot Jaffe Seminar in Computer Assisted-Surgery, Medical Robots and Medical Imaging Fall 2002. Outline. Why should you be worried? Case Study: Therac-25 US Government Guidelines. What? Me worry?.
Developing Medical Software: Pitfalls and Prophylactics Elliot Jaffe Seminar in Computer Assisted-Surgery, Medical Robots and Medical Imaging Fall 2002
Outline • Why should you be worried? • Case Study: Therac-25 • US Government Guidelines
What? Me worry? • Software is used in medical devices • Monitoring • Planning • Surgery • Visualization • Software fails
Case Study: Therac-25 • 1983 – 1987 • AECL: Atomic Energy of Canada Ltd. • 6 reported “accidents” • Changed the way software is developed and verified as part of a medical device
Medical Linear Accelerators Linac: North Oakland Medical Center
Therac-25 genesis • Therac-6: 6MeV X-ray accelerator • Therac-20: 20MeV Dual Mode (Electron/X-ray) accelerator • Upgraded with DEC PDP-11 minicomputer for ease of use • Could be operated without computer
Therac-25 • Dual Mode 25MeV accelerator • Electron/X-Ray • Can be operated ONLY through the computer • Computer controls and monitors system • Some hardware safety mechanisms and interlocks were replaced with software • First working prototype: 1976 • First commercial product: 1982
Treatment Goals • Deliver high energy radiation for the treatment of cancer • Radiation needs to be focused and controlled • Multiple energy levels • X-Ray • Electron
Therac-25 Operation • Turntable to select from three modes • Visual • Electron • X-Ray • Turntable is moved mechanically • Software monitors position of turntable
Operator Interface Cursor should be here during operation
Therac-25: Error States • Treatment Suspend • Requires complete machine restart • Treatment Pause • Operator types “P” to proceed
Therac-25: Error Messages • HTILT, VTILT, etc. • MALFUNCTION <n> • 1 <= n <= 64 • No documentation • No indication of severity • Occurred on average 40 times a day!
Therac-25: Event #1 • June 1985: 10MeV electron treatment • Patient reported: “tremendous force of heat … this red-hot sensation” • Technician replied that it was impossible • AECL claimed it was impossible • Never reported to FDA
Therac-25: Event #1 • Patient received severe radiation burn • Patient’s breast was removed • Shoulder and arm were paralyzed • AECL refused to believe that it was caused by Therac-25 • Lawsuit settled out-of-court
Therac-25: Event #2 • July 26, 1985 • HTILT message, Treatment Pause • Operators resumed treatment • Repeated 5 times until machine stopped • Patient reported “electric tingling shock”
Therac-25: Event #2 • Patient died of cancer • Autopsy revealed that a total-hip replacement would have been required due to radiation exposure • Reported to AECL, FDA • AECL believed it to be a hardware problem
Therac-25: Event #2 • AECL could not reproduce the reported behavior • AECL modified turntable • “Fixed” potential error in 3-bit turntable location identifier
Therac-25: Event #2 • AECL claimed: • “analysis of the hazard rate of the new solution indicates an improvement over the old system by at least 5 orders of magnitude”
Therac-25: Event #3 • December 1985 • After upgrade from event #2 • Patient developed parallel striped pattern in treatment area • AECL reported: “Could not have been produced by any malfunction of the Therac-25 or by any operator error.” • Not reported to FDA • Patient required surgery to repair tissue damage
Therac-25: Event #4 • March 21, 1986 • Operator entered “x” instead of “e” • Moved cursor and corrected error • Began treatment • MALFUNCTION 54 • Continued treatment • MALFUNCTION 54 • Machine shut down
Therac-25: Event #4 • Patient monitors: video and audio were broken • Patient received electric shock, started to get up and was then shocked in the arm • Patient pounded on treatment door • Patient sent home • Machine checked out ok
Therac-25: Event #4 • Patient died of overdose 5 months later • AECL suggested an electrical problem in the area • Independent engineering firm checked and found no problem
Therac-25: Event #5 • April 11, 1986 • Same operator • Same editing • MALFUNCTION 54 • Audio monitor (now working) reported a loud sound from machine • Patient died May 1, 1986 (three weeks later) of acute high-dose radiation to his brain
Therac-25: Event #5 • Physicist took machine out of service • Reported to AECL • Operator and physicist were able to reproduce the failure • AECL still could not reproduce the failure • FDA declared system “defective”
Therac-25: Event #5 - cause • Operating system was a hand-coded real-time system developed by one programmer in the 1970s • Problem was traced to a race condition in the main loop • Result: the X-ray beam could be delivered while the machine was configured for electron treatment
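The race described on this slide can be illustrated with a small, deterministic sketch. It is not the Therac-25 code: the `Machine` class, the mode names and the function names are all hypothetical, and the interleaving is written out by hand rather than produced by real concurrent tasks. It shows a check-then-act race in which a slow setup step reads shared state before the operator's edit arrives, so the screen and the hardware disagree at the moment of firing.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    requested_mode: str = "xray"   # what the operator typed (shared state)
    hardware_mode: str = "none"    # what the hardware was actually set up for


def setup_hardware(m: Machine) -> None:
    """Slow setup task: configures hardware from the mode it sees right now."""
    m.hardware_mode = m.requested_mode


def operator_edit(m: Machine, new_mode: str) -> None:
    """Keyboard task: the operator corrects the prescription."""
    m.requested_mode = new_mode


def fire_unchecked(m: Machine) -> str:
    """Unsafe: trusts that setup already reflects the latest edit."""
    return f"beam on: screen={m.requested_mode}, hardware={m.hardware_mode}"


def fire_checked(m: Machine) -> str:
    """Defensive: re-verify the hardware state at the moment of firing."""
    if m.hardware_mode != m.requested_mode:
        raise RuntimeError("mode mismatch: setup is stale, refusing to fire")
    return fire_unchecked(m)


# Hand-written interleaving: the edit lands after setup has read the mode.
m = Machine()
setup_hardware(m)
operator_edit(m, "electron")
print(fire_unchecked(m))  # beam on: screen=electron, hardware=xray
```

Calling `fire_checked(m)` on the same state raises instead of firing. Re-checking at the point of action is one common mitigation; an independent hardware interlock is another.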
Therac-25: Event #6 • January 17, 1987 • Operator set turntable to field light position • Gave command to system to “set” turntable to x-ray • Ran treatment • System reported “no dose or dose rate” • Re-ran treatment • Patient died in April, 1987 of problems related to overdose • AECL and FDA notified
Therac-25: Event #6 - cause • Software bug • Register overflow • 8-bit register used for multiple purposes • Once or twice in each setup phase, the register overflowed, letting the system assume that the turntable had been reset
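A minimal sketch of how an 8-bit value used as a flag can defeat a safety check. The convention that "nonzero means the turntable check is required", the variable names and the pass count are assumptions for illustration, not the original source. Incrementing the flag on every pass makes it wrap to zero every 256th pass, and on those passes the check is silently skipped.

```python
def run_setup_passes(passes: int, use_increment: bool) -> int:
    """Return how many passes skipped the (hypothetical) turntable check."""
    flag = 0      # one-byte shared variable; nonzero means "check required"
    skipped = 0
    for _ in range(passes):
        if use_increment:
            flag = (flag + 1) & 0xFF   # 8-bit wraparound: 255 + 1 -> 0
        else:
            flag = 1                   # fix: assign a fixed nonzero value
        if flag == 0:
            skipped += 1               # check bypassed on this pass
    return skipped


print(run_setup_passes(1000, use_increment=True))   # 3
print(run_setup_passes(1000, use_increment=False))  # 0
```

The window is short, which is why such a bug can survive testing and years of field use before an operator action happens to fall inside it.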
Lessons Learned • Studies reported 12 lessons learned • We will cover five of them
Overconfidence in Software • First safety analysis did not include software, even though it was responsible for safety of the system • When problems did occur, it was assumed to be a hardware failure
Reliability vs. Safety • Therac-25 ran for three years in production without a problem • Tens of thousands of patients were treated before the first known overdose • Reliability leads to complacency • Reliability != Safety
Lack of Defensive Design • Software was designed for small memory footprint • Self-checks, error detection, error handling and auditing were left out
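As a rough illustration of the self-checks and auditing this slide says were missing, the sketch below runs every precondition before treatment and records each result. The dose limit, the mode names and the `AuditTrail` class are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import List

MAX_DOSE_CGY = 200.0  # hypothetical per-treatment limit, for illustration only


@dataclass
class AuditTrail:
    entries: List[str] = field(default_factory=list)

    def record(self, msg: str) -> None:
        self.entries.append(msg)


def safe_to_treat(mode: str, turntable: str, dose_cgy: float,
                  audit: AuditTrail) -> bool:
    """Run every self-check, log each outcome, and allow treatment only if all pass."""
    checks = {
        "mode is known": mode in ("electron", "xray"),
        "turntable matches mode": turntable == mode,
        "dose within limit": 0.0 < dose_cgy <= MAX_DOSE_CGY,
    }
    for name, ok in checks.items():
        audit.record(f"{'PASS' if ok else 'FAIL'}: {name}")
    return all(checks.values())


audit = AuditTrail()
print(safe_to_treat("electron", "xray", 180.0, audit))  # False
print(audit.entries)
```

Logging every check, not just the failures, leaves a record that investigators can use when a reported event cannot be reproduced.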
Unrealistic risk assessment • First Risk Assessment did not include software • AECL claimed 5 orders of magnitude improvement from changing one microswitch • Software is harder to assess for failures than hardware
Inadequate Software Engineering Practices • Software specification was after-the-fact • Dangerous design/coding practices could have been avoided • Audit trails should be built into the production software • Software should be tested at the unit, module and system level • Regression testing on all changes • GUI should be designed, not just implemented
Software Reuse • Therac-25 used software from T-20 • Reliability != Safety • Assumptions and Preconditions may have changed • Sometimes it’s better to rewrite from scratch
US Government Guidelines • Significantly reduce the risk of death or injury • Impose standards and best practices to raise the overall level of the industry • Define minimum requirements for • New products • Derivative products
Level of Concern • Major: device directly affects the patient or operator and failure could result in death or serious injury • Moderate: device directly affects the patient and failure could result in non-serious injury • Minor: failures will not result in injury
Levels of Concern • Does the software • Control life support device? • Control delivery of harmful energy? • Control treatment delivery? • Provide diagnosis as basis for treatment? • Monitor vital signs? • If no to all these questions, then concern is minor
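The screening questions can be sketched as a small function. How the answers combine with the worst-case injury is an assumption made here to connect the two Level of Concern slides; it is not quoted from the guidelines. The question strings and the `level_of_concern` name are hypothetical.

```python
QUESTIONS = (
    "controls life support device",
    "controls delivery of harmful energy",
    "controls treatment delivery",
    "provides diagnosis as basis for treatment",
    "monitors vital signs",
)

# Assumed mapping from worst-case failure outcome to level of concern.
SEVERITY_TO_LEVEL = {"none": "minor", "non-serious": "moderate", "serious": "major"}


def level_of_concern(answers: dict, worst_case_injury: str = "none") -> str:
    """Classify software as 'minor', 'moderate' or 'major' concern."""
    if worst_case_injury not in SEVERITY_TO_LEVEL:
        raise ValueError(f"unknown severity: {worst_case_injury!r}")
    # An unanswered question is treated as "yes" to stay on the safe side.
    if not any(answers.get(q, True) for q in QUESTIONS):
        return "minor"
    return SEVERITY_TO_LEVEL[worst_case_injury]


no_to_all = {q: False for q in QUESTIONS}
linac = {**no_to_all, "controls delivery of harmful energy": True}
print(level_of_concern(no_to_all))         # minor
print(level_of_concern(linac, "serious"))  # major
```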
Requirements for minor concern • Software Description • Device Hazard analysis • Software functional Requirements Specification • Architecture Design chart • Validation, Verification and Testing • Release Version Number
Requirements for Moderate/Major concern • Full Software Requirements Spec. • Design Specification • Traceability analysis • Development lifecycle documentation • Configuration management • Maintenance activities • Revision Level History • Unresolved Anomalies (bugs)
Software Requirements Spec • Hardware requirements • Programming languages • Interface requirements • Software functional requirements • Software performance requirements
Software Requirements Spec • Algorithms for therapy, diagnosis, monitoring, alarms, analysis, interpretation (with supporting clinical data) • Device limitation due to software • Internal software tests and checks • Error and interrupt handling
Software Requirements Spec • Fault detection, tolerance and recovery characteristics • Safety requirements • Timing and memory requirements • Use of off-the-shelf software
Risk/Hazard Analysis Tools • Fault Tree Analysis (FTA) • Used in initial design phase • Failure Modes, Effects and Criticality Analysis (FMECA) • Used in design and development phase • Failure Reporting and Corrective Action System (FRACAS) • Used during product lifecycle
Fault Tree Analysis • Identify a failure or safety hazard, then attempt to identify all possible ways to create that hazard • Answers the question: • How can event X occur? • Used in the military and nuclear industries since the 1970s
Fault Tree Analysis: Example Simplified fault tree diagram for an infusion pump
Fault Tree Analysis • Demonstrates that the system will not reach an unsafe state • Identifies areas for improvement • Provides a systematic hazard review
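A fault tree can also be evaluated numerically. The sketch below is one possible illustration, assuming independent basic events; the infusion pump events and their probabilities are invented for the example and are not taken from the diagram on the earlier slide. An AND gate multiplies probabilities, and an OR gate is one minus the probability that none of its inputs occur.

```python
from math import prod


def p_and(*ps: float) -> float:
    """AND gate: all inputs must occur (independence assumed)."""
    return prod(ps)


def p_or(*ps: float) -> float:
    """OR gate: at least one input occurs (independence assumed)."""
    return 1.0 - prod(1.0 - p for p in ps)


# Hypothetical basic-event probabilities per treatment.
P_PUMP_RUNAWAY = 1e-3
P_ALARM_FAILS = 1e-2
P_WRONG_RATE_ENTERED = 5e-4

# Top event: over-infusion occurs if a runaway goes unalarmed,
# or if the wrong rate was programmed in the first place.
p_unalarmed_runaway = p_and(P_PUMP_RUNAWAY, P_ALARM_FAILS)
p_over_infusion = p_or(p_unalarmed_runaway, P_WRONG_RATE_ENTERED)
print(f"{p_over_infusion:.3e}")
```

In this made-up tree the data-entry branch dominates the top event, which is the kind of "area for improvement" the analysis is meant to surface.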
FMEA • Assume a basic defect at the component level, assess the effect and identify potential solutions • Answer the question: • What happens if event X occurs? • Used in Automobile manufacturing
FMEA: Example • Failure Mode and Effects Analysis (FMEA) • Subsystem/Name: DC motor • Model Year/Vehicle(s): 2000/DC motor • P = Probability (chance) of Occurrence • S = Seriousness of Failure to the Vehicle • D = Likelihood that the Defect will Reach the Customer • R = Risk Priority Measure (P x S x D) • Rating scale: 1 = very low or none, 2 = low or minor, 3 = moderate or significant, 4 = high, 5 = very high or catastrophic
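The risk priority measure on this form is a simple product, R = P x S x D, with each factor rated from 1 to 5. A minimal sketch of computing and ranking it follows; the failure modes and their ratings are invented for illustration and do not come from the original worksheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    p: int  # probability of occurrence, 1..5
    s: int  # seriousness of failure, 1..5
    d: int  # likelihood the defect reaches the customer, 1..5

    def __post_init__(self) -> None:
        for label, value in (("P", self.p), ("S", self.s), ("D", self.d)):
            if not 1 <= value <= 5:
                raise ValueError(f"{label} must be between 1 and 5, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority measure R = P x S x D."""
        return self.p * self.s * self.d


# Hypothetical failure modes for a DC motor.
modes = [
    FailureMode("brush wear", p=4, s=2, d=2),
    FailureMode("bearing seizure", p=2, s=4, d=3),
    FailureMode("winding short", p=1, s=5, d=4),
]

# Highest risk first, so corrective action can be prioritized.
for fm in sorted(modes, key=lambda m: m.rpn, reverse=True):
    print(f"{fm.name}: R = {fm.rpn}")
```

With these made-up ratings the order is bearing seizure (24), winding short (20), brush wear (16). The most frequent failure is not necessarily the highest priority.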