Computational Methods for Finding Patterns of Human and System ‘Failure’ in Mishap Reports. Chris Johnson, University of Glasgow, Scotland. http://www.dcs.gla.ac.uk/~johnson UCD: 12th December 2003.
Johnson, Le Galo and Blaize, European Incident Reporting Requirements in Air Traffic Management, EUROCONTROL, 2000.
“Centers and contractors used Problem Reporting and Corrective Action database differently, preventing comparisons across the database. NASA safety managers complain that the Web Program Compliance Assurance and Status System is too cumbersome. Personnel use Lessons Learned Information System only on an ad hoc basis. Hazard reports rarely communicated effectively, nor are databases used by engineers and managers capable of translating operational experiences into effective risk management practices.” (CAIB, p. 189)
• Probabilistic information retrieval:
  • avoids the problem of codification;
  • but raises issues of precision and recall.
• Conversational case-based reasoning:
  • extended form of the US Navy’s NACODAE system;
  • flexible precision & recall.
• Word sense disambiguation etc.
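The precision/recall trade-off can be illustrated with a small sketch. It ranks a few hypothetical report snippets with Okapi BM25, one common probabilistic ranking function (an assumption: the slides do not say which retrieval model was used), then measures precision and recall at two cut-offs against a hypothetical relevance judgement. `bm25_scores` and `precision_recall` are illustrative names, and `k1=1.2`, `b=0.75` are conventional defaults, not tuned values.

```python
import math
from collections import Counter

PUNCT = '.,;:!?"'

def tokenize(text: str) -> list[str]:
    # Lower-case, whitespace-split tokens with surrounding punctuation removed.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    # Okapi BM25 score of every document for the query.
    toks = [tokenize(d) for d in docs]
    n = len(toks)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter()  # number of documents containing each term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Smoothed IDF variant that stays positive even for common terms.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(t) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

def precision_recall(retrieved: list[int], relevant: set[int]) -> tuple[float, float]:
    # Precision: share of retrieved reports that are relevant.
    # Recall: share of relevant reports that were retrieved.
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical report snippets and relevance judgement (not real data).
reports = [
    "First Officer programmed FMC with wrong waypoints",
    "Weather poor but approach continued",
    "FMC route did not match ATC clearance",
    "Maintenance failure on hydraulic pump",
]
relevant = {0, 2}
scores = bm25_scores("FMC clearance", reports)
ranked = sorted(range(len(reports)), key=lambda i: scores[i], reverse=True)
for k in (2, 3):
    print(k, precision_recall(ranked[:k], relevant))
```

Here the top two results are both relevant; widening the cut-off to three keeps recall at 1.0 but lowers precision to 2/3, which is the tension the slide flags.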
Meta-Level Concerns for Aerospace
• FAA GAIN (Global Aviation Information Network) lacks computational support.
• Someone must address this opportunity…
Linda, JavaSpaces and Middleware for Incident Reporting
• Concurrency and distribution
  [Figure: incident-report tuples held in separate tuple spaces in Australia, the UK and the US, e.g. <B777, 1/12/2003, “On final approach…”>, <B737, “Maintenance failure on …”>, … <A320, 12/12/2003, “ATC came through…”>, <A320, “No clearance…”>, … <“Weather poor but …”>]
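The concurrency side of a Linda-style tuple space can be sketched in a few lines. In the minimal, single-process stand-in below, `out` adds a tuple, `rd` reads a matching tuple without removing it, and `in_` removes it; both block until a match arrives or an optional timeout expires. Using `None` for the “?” field, keeping one `TupleSpace` per country, and all class and method names are assumptions for illustration. This is not the JavaSpaces API, and a real deployment would place each space on a separate host.

```python
import threading
from typing import Optional

ANY = None  # assumption: None stands in for Linda's "?" formal field

class TupleSpace:
    """Minimal Linda-style tuple space: out / rd / in (a sketch, not JavaSpaces)."""

    def __init__(self) -> None:
        self._tuples: list[tuple] = []
        self._cond = threading.Condition()

    def out(self, tup: tuple) -> None:
        # Add a tuple and wake any readers blocked waiting for a match.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    @staticmethod
    def _matches(template: tuple, tup: tuple) -> bool:
        # Same arity, and every non-wildcard field equal.
        return len(template) == len(tup) and all(
            p is ANY or p == v for p, v in zip(template, tup))

    def _find(self, template: tuple, remove: bool, timeout: Optional[float]) -> Optional[tuple]:
        with self._cond:
            while True:
                for i, tup in enumerate(self._tuples):
                    if self._matches(template, tup):
                        return self._tuples.pop(i) if remove else tup
                # Block until another thread calls out(); give up on timeout.
                if not self._cond.wait(timeout):
                    return None

    def rd(self, template: tuple, timeout: Optional[float] = None) -> Optional[tuple]:
        """Non-destructive read; blocks until a match arrives (or timeout)."""
        return self._find(template, remove=False, timeout=timeout)

    def in_(self, template: tuple, timeout: Optional[float] = None) -> Optional[tuple]:
        """Destructive read ('in' is a Python keyword, hence the underscore)."""
        return self._find(template, remove=True, timeout=timeout)

# Hypothetical distribution: one space per reporting country.
spaces = {name: TupleSpace() for name in ("Australia", "UK", "US")}
spaces["Australia"].out(("B777", "1/12/2003", "On final approach…"))

def late_report() -> None:
    # Another thread files a report after the reader has started waiting.
    spaces["UK"].out(("A320", "12/12/2003", "ATC came through…"))

timer = threading.Timer(0.1, late_report)
timer.start()
found = spaces["UK"].rd(("A320", ANY, ANY), timeout=5.0)  # blocks until the report arrives
timer.join()
# Non-blocking probe of every space (timeout=0 returns None when nothing matches).
hits = {name: s.rd(("B777", ANY, ANY), timeout=0) for name, s in spaces.items()}
print(found)
print(hits)
```

Reader and writer never coordinate directly; they only share the space, which is what lets separately administered reporting systems be queried without a common schema.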
Linda, JavaSpaces and Middleware for Incident Reporting
• Overloading of matching operators, e.g. templates <A320, ?, ?> and <?, ?, match(CRM)>
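Overloaded matching can be sketched by letting a template field be a wildcard, a literal, or a predicate. In this sketch `match(keyword)` is a hypothetical operator implemented as a case-insensitive substring test; the slides do not define its semantics, and a real system might plug in a retrieval score instead. The `ANY` sentinel, the helper names and the `B767` report are invented for illustration.

```python
from typing import Any, Callable

ANY = object()  # sentinel standing in for Linda's "?" formal field

def match(keyword: str) -> Callable[[Any], bool]:
    """Hypothetical overloaded operator: true when keyword occurs in a text field."""
    kw = keyword.lower()
    return lambda value: isinstance(value, str) and kw in value.lower()

def field_matches(pattern: Any, value: Any) -> bool:
    if pattern is ANY:
        return True              # formal field: matches anything
    if callable(pattern):
        return pattern(value)    # overloaded operator decides
    return pattern == value      # actual field: exact equality

def matches(template: tuple, tup: tuple) -> bool:
    # Linda convention: a template only matches tuples of the same arity.
    return len(template) == len(tup) and all(
        field_matches(p, v) for p, v in zip(template, tup))

def rd_all(template: tuple, space: list[tuple]) -> list[tuple]:
    return [t for t in space if matches(template, t)]

# Report tuples; the B767 entry is a hypothetical example containing "CRM".
space = [
    ("B777", "1/12/2003", "On final approach…"),
    ("A320", "12/12/2003", "ATC came through…"),
    ("A320", "No clearance…"),
    ("B767", "3/12/2003", "Hypothetical report: CRM broke down during descent"),
]

by_type = rd_all(("A320", ANY, ANY), space)
by_topic = rd_all((ANY, ANY, match("CRM")), space)
print(by_type)
print(by_topic)
```

Note that <A320, ?, ?> does not retrieve the two-field A320 tuple: arity is part of the match, so reports with different field layouts need different templates.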
Linda, JavaSpaces and Middleware for Incident Reporting
• Leases and persistence, e.g. templates <A320, ?, ?> and <?, ?, match(CRM)>
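Leases can be sketched in the spirit of JavaSpaces, where an entry is written with a lease duration and disappears unless the lease is renewed. `LeasedSpace`, `renew`, `snapshot` and `restore` are hypothetical names, not the JavaSpaces `Lease` interface; persistence here is simply a JSON string of remaining lease times, and the clock is injectable so the example is deterministic.

```python
import json
import time
from typing import Callable

class LeasedSpace:
    """Tuple space whose entries expire unless their lease is renewed (a sketch)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock              # injectable so expiry can be tested
        self._entries: list[list] = []   # [tuple, absolute expiry time]

    def out(self, tup: tuple, lease: float) -> None:
        self._entries.append([tup, self._clock() + lease])

    def renew(self, tup: tuple, lease: float) -> bool:
        # Reset a live entry's lease to `lease` seconds from now.
        self._purge()
        for entry in self._entries:
            if entry[0] == tup:
                entry[1] = self._clock() + lease
                return True
        return False

    def _purge(self) -> None:
        now = self._clock()
        self._entries = [e for e in self._entries if e[1] > now]

    def rd_all(self, template: tuple) -> list[tuple]:
        # None acts as the "?" wildcard; expired entries are never returned.
        self._purge()
        return [t for t, _ in self._entries
                if len(t) == len(template)
                and all(p is None or p == v for p, v in zip(template, t))]

    def snapshot(self) -> str:
        # Persist remaining lease time, not absolute times, so a restore
        # under a different clock (e.g. after a restart) stays meaningful.
        self._purge()
        now = self._clock()
        return json.dumps([[list(t), exp - now] for t, exp in self._entries])

    @classmethod
    def restore(cls, data: str, clock: Callable[[], float] = time.monotonic) -> "LeasedSpace":
        space = cls(clock)
        for fields, remaining in json.loads(data):
            space.out(tuple(fields), remaining)
        return space

# Usage with a fake clock (seconds) so the behaviour is deterministic.
now = [0.0]
space = LeasedSpace(clock=lambda: now[0])
space.out(("A320", "12/12/2003", "ATC came through…"), lease=10)
space.out(("B777", "1/12/2003", "On final approach…"), lease=100)
now[0] = 50.0  # the A320 lease has expired; the B777 lease has 50 s left
live = space.rd_all((None, None, None))
saved = space.snapshot()
restored = LeasedSpace.restore(saved, clock=lambda: 0.0)
print(live)
print(restored.rd_all((None, None, None)))
```

Leases stop stale reports from accumulating in a shared space, while the snapshot gives the persistence needed to survive a restart.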
Case Study 1: FDA Telemedicine
• Medical errors lead to:
  • 45,000-100,000 deaths (US);
  • for comparison, road traffic accidents (RTA) = 43,000, AIDS = 16,000.
• Additional care costs $15 billion:
  • 45% have some mishap;
  • 17% prolonged hospital stay.
Cartoon caption: “Look, I’m not blaming you, I’m just suing you…”
• SE Virginia medical centres:
  • 1 nurse monitors the system;
  • 49 remote patients;
  • 5 ICUs at 3 centres.
• Staff costs are 50-80% of the ICU budget.
Image credits: NASA Telemedicine Instrumentation Pack project; Univ. of Virginia, Office of Telemedicine.
Master Event Data File Format (identifier: section)
• A: MDR Report Identifier
• B: Event Information
• E: Professional Information
• F: Distributor Information
• G: Manufacturer Information
• H: Device Information
Master Event Data File, Section A: MDR Report Identifier
• MDR Report Key; MDR Event Key; Report Number; Source Code; Number of devices; Number of patients; Date received
Master Event Data File, Section G: Manufacturer Information
• MDR Report Key; Manufacturer’s Name; Manufacturer’s Address; Source Type; Date manufacturer received report
Master Event Data File, Section H: Device Information
• MDR Report Key; Made when?; Single use device?; Remedial Action; Use code; Correction number; Event type
Device Data File
• MDR Report Key; Device Event Key; Device Seq. Number; Device available for examination?; Brand Name; Generic Name; Age?; …
Patient Data File
• MDR Report Key; Patient Seq. Number; Date report received; Sequence and treatment; Patient Outcome
Text Data File
• MDR Report Key; Text key; Text type; Patient Seq. number; Report date; Text
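Because every file in this layout carries the MDR Report Key, narratives can be gathered per report with a simple key join before any text mining. This sketch assumes pipe-delimited extracts with a header row; the column names (`MDR_REPORT_KEY`, `REPORT_NUMBER`, `EVENT_TYPE`, `TEXT_KEY`, `TEXT`) and every value are hypothetical stand-ins for the fields listed above, not real MDR data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature extracts (not real MDR records).
MASTER = """MDR_REPORT_KEY|REPORT_NUMBER|EVENT_TYPE
1001|R-0001|Malfunction
1002|R-0002|Injury
"""
TEXT = """MDR_REPORT_KEY|TEXT_KEY|TEXT
1001|9001|Hypothetical narrative: transmitter misidentified
1001|9002|Hypothetical narrative: customer retrained
1002|9003|Hypothetical narrative: monitor resetting
"""

def read_pipe_delimited(data: str) -> list[dict[str, str]]:
    # One dict per row, keyed by the header line's column names.
    return list(csv.DictReader(io.StringIO(data), delimiter="|"))

def join_on_report_key(master: list[dict[str, str]],
                       texts: list[dict[str, str]]) -> dict[str, dict]:
    # Group narrative rows by report key, then attach them to each master record.
    narratives: dict[str, list[str]] = defaultdict(list)
    for row in texts:
        narratives[row["MDR_REPORT_KEY"]].append(row["TEXT"])
    return {
        m["MDR_REPORT_KEY"]: {**m, "NARRATIVES": narratives.get(m["MDR_REPORT_KEY"], [])}
        for m in master
    }

joined = join_on_report_key(read_pipe_delimited(MASTER), read_pipe_delimited(TEXT))
for key, report in joined.items():
    print(key, report["EVENT_TYPE"], report["NARRATIVES"])
```

Real narrative text can contain stray quote characters, so passing `quoting=csv.QUOTE_NONE` to the reader may be needed on actual extracts.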
Findings from MAUDE: Safety Culture and Telemedical Mishaps
• Introduction of telemedicine implies:
  • fewer clinical staff, more technical staff;
  • technical staff don’t understand devices/procedures?
• Increasing reliance on vendors’ guidance:
  • vendors in turn rely on manufacturers;
  • communication often breaks down or is too slow.
• No common ‘safety culture’:
  • many incidents stem from poor communication.
• Strong parallels with NASA (CAIB Chapter 7).
Cluster 1: Configuration
• EASI™ software provides 12-lead ECG data from 5 leads attached to the patient.
  TECH NOTED EASI 12-LEAD DISPLAY ON CENTRAL STATION FROM TRANSMITTER THAT WASNT EASI CAPABLE. CUSTOMER REPLACED TRANSMITTER, RELOADED CENTRAL STATION SOFTWARE, CONFIRMED ALL SIGNALS WERE CORRECTLY TRANSMITTED AND LABELED. CUSTOMER DID NOT UNDERSTAND DIFFERENCE BETWEEN STANDARD ECG AND EASI. CUSTOMER WAS RETRAINED TO FURTHER THEIR UNDERSTANDING OF DIFFERENCE. (MDR TEXT KEY: 1379795)
• Fewer electrodes reduce work for nurses and improve patient comfort.
Cluster 1: Configuration
• Social implications: clinicians and support staff rely on suppliers’ explanations.
• Symptomatic of system safety problems:
  • manufacturers gain insights that should have been caught earlier in development.
• Retraining is proposed, but with no idea of the systemic causes of human ‘error’?
  DURING INVESTIGATION, ENGINEERS CONFIGURED A SYSTEM IN SAME SETUP AS CUSTOMER. FOUND MAINFRAME RECEIVERS CAN RECEIVE INCORRECT BIT TO MISIDENTIFY TRANSMITTER AS EASI CAPABLE…
• Report doesn’t state how to prevent mis-configuration.
Cluster 2: Subcontractors
• End-user frustrated by device unreliability and manufacturers’ response:
  SEVERAL UNITS RETURNED FOR REPAIR HAD FAN UPGRADES TO ALLEVIATE TEMP PROBLEMS. HOWEVER, THEY FAILED IN USE AGAIN AND WERE RETURNED FOR REPAIR… AGAIN SALESMAN STATED ITS NOT A THERMAL PROBLEM ITS A PROBLEM WITH X’s Circuit Board. X ENGINEER STATED Device HAS ALWAYS BEEN HOT INSIDE, RUNNING AT 68°C AND THEIR product ONLY RATED AT 70°C… ANOTHER TRANSPONDER STARTED TO BURN…SENT FOR REPAIR. SHORTLY AFTER MONITOR BEGAN RESETTING FOR NO REASON…(MDR TEXT KEY: 1370547)
• Manufacturers felt reports were not safety-related:
  • “reports relate to end-user frustration regarding product reliability (not safety)”.
Cluster 2: Subcontractors
• Telemedicine applications developed by groups of suppliers:
  • flexibility and cost savings during development, manufacture, marketing;
  • problems if incidents stem from sub-components not manufactured by suppliers;
  • incident reports must be propagated back along the supply chain.
• Manufacturer states problems stem from subcontractor’s circuit board:
  • more problems after faulty board replaced; customer returns unit again;
  • connectors to PCB not properly seated but unit still passes acceptance test?
  • connector not seated completely during initial repair and gradually loosens over time?
Cluster 2: Subcontractors
• “Fly-fix-fly” approach undermines attempts to improve patient safety.
• Confused dialogue between clinician, vendor, manufacturer…
• End-user may see technical issues as a form of excuse (e.g. PCB connectors)…
• Device repairs not only rectify problems, they introduce new ones:
  • this compounds end-user uncertainty and distrust of device reliability;
  • communication fails and shared safety culture erodes over time.
Cluster 3: Modification-Induced Bugs
IN SOFTWARE RELEASE VF2, IF PATIENT IN "AUTOADMIT" MODE, PARAMETER DATA AUTOMATICALLY COLLECTED AND STORED IN THE SYSTEMS DATABASE, IF THE PATIENT LATER REMOVED (BUT NOT DISCHARGED) FROM ORIGINAL BED/NETWORK LOCATION, DATA COLLECTION TEMPORARILY DEACTIVATED (EG DURING MOVE FOR TREATMENT). PROBLEM OCCURS WHEN NEW PATIENT ADMITTED TO SAME BED/NETWORK LOCATION BUT ORIGINAL PATIENT NOT DISCHARGED WHILE CONNECTED TO THAT LOCATION. NEW PATIENT ADMISSION STORES DATA IN DATABASE CORRECTLY. HOWEVER, IN PARALLEL, INCORRECTLY APPENDS NEW PATIENT DATA ON TOP OF OLD PATIENT'S RECORD… (MDR TEXT KEY: 1340560)
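The symptom in this report can be reproduced in a toy model. Assume, hypothetically (the report does not describe the vendor’s design), that auto-admit collection links each bed/network location to a list of patients, and that removing a patient without discharge only deactivates that link. A collector that ignores the deactivation then appends the new patient’s data to the old patient’s record as well as storing it correctly for the new patient. `Monitor`, `Link`, `collect_fixed` and the bed and patient names are invented for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    patient: str
    active: bool = True  # False once the patient is removed but not discharged

@dataclass
class Monitor:
    links: dict[str, list[Link]] = field(default_factory=dict)     # bed -> linked patients
    records: dict[str, list[float]] = field(default_factory=dict)  # patient -> stored data

    def admit(self, patient: str, bed: str) -> None:
        self.links.setdefault(bed, []).append(Link(patient))
        self.records.setdefault(patient, [])

    def remove_without_discharge(self, patient: str, bed: str) -> None:
        # Collection is only suspended; the stale link stays attached to the bed.
        for link in self.links.get(bed, []):
            if link.patient == patient:
                link.active = False

    def collect_buggy(self, bed: str, value: float) -> None:
        # Fault: writes to every patient linked to the bed, ignoring `active`.
        for link in self.links.get(bed, []):
            self.records[link.patient].append(value)

    def collect_fixed(self, bed: str, value: float) -> None:
        # Only patients with an active link receive data from this bed.
        for link in self.links.get(bed, []):
            if link.active:
                self.records[link.patient].append(value)

def scenario(fixed: bool) -> dict[str, list[float]]:
    m = Monitor()
    collect = m.collect_fixed if fixed else m.collect_buggy
    m.admit("P1", "bed3")
    collect("bed3", 80.0)                    # P1's own reading
    m.remove_without_discharge("P1", "bed3")
    m.admit("P2", "bed3")                    # new patient, same location
    collect("bed3", 95.0)                    # P2's reading
    return m.records

print(scenario(fixed=False))
print(scenario(fixed=True))
```

In the faulty run P1’s record ends up holding P2’s reading, while P2’s own record is stored correctly, which mirrors the “stores correctly… in parallel appends” symptom. Refusing a new admission to a location that still has an undischarged patient would be an alternative guard.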
Safety Culture and Telemedical Mishaps
• Software identifies 40-50% more US telemedical mishaps in 6 months.
• Analysis of reports suggests no ‘quick fixes’, but:
  • regulators need to focus on dialogue between manufacturers and users;
  • consider detailed training requirements for telemedicine before approval;
  • especially look at end-user maintenance and configuration issues;
  • introduce training in safety and risk management for support staff?
• Joint US/UK AHRQ presentation in Washington.
• Things are only going to get worse…
The da Vinci system, the first robotic aid approved by the FDA:
• New York Presbyterian Hospital uses it to repair atrial septal defects.
Cluster 1: Programming Errors
• Pilot didn’t check First Officer’s FMC programming.
• “ATC informed us we were off course ... it took minutes to figure out what happened. ATC vectored us back onto departure and gave us a climb clearance. ATC also pointed out traffic, but we never saw it. We aren’t sure if our error caused a conflict.
• First Officer programmed FMC. I checked the Route Page to see if it matched our clearance. It showed correct departure and transition. I did not check Legs Pages to see if all fixes were there. I will next time!
• We made an error programming the FMC, then became complacent… I should have done a more complete check of the First Officer's programming.”
Cluster 1: Programming Errors
• Computer flight plan was route ABC.
• ATC clearance was via route D-E-F.
• Original flight plan should have been destroyed, so as not to accidentally revert to the old route.
• “First Officer very experienced and I had complete trust that he was capable of loading correct waypoints, but both he and I failed to use a visible method of marking the computer flight plan.
• 99% of time, cleared route is same as computer flight plan, but not always, as I found out the hard way. ATC caught my error”.
Cluster 1: Programming Errors
• Container ship grounds on a route it sails every week.
• 4 deck officers, good visibility, 2 radars and GPS.
• Charts had courses in black ink that couldn’t be erased.
• At 0243 altered course to 237°, position plotted.
• 45 minutes later, ship grounds at full speed.
• Watch officer set auto steering to wrong course.
• 237 next to reciprocal 157 for return voyage.
Cluster 2: Warnings as Safety Nets
• During the descent, we were doing some HF radio checks, and forgot to arm the altitude select mode on the flight director. As a result, we descended through our altitude...
• We promptly returned to FL280. As a crew, we are very diligent and disciplined about altitude assignments.
• But in this case, because our attention was diverted from the task at hand, we flew through our assigned altitude. It was that classic trap: both crew members distracted by something and nobody flying the airplane.
Cluster 2: Warnings as Safety Nets
• 3 crew on fishing vessel; 2 cook, pump bilges, maintain watch.
• Skipper asleep on the deck of the wheelhouse.
• Vessel’s planned track 0.35 miles from a rig.
• Automated radar alarm system set to 0.3 miles.
• VHF off; skipper said there was too much distracting traffic.
• Rig asks stand-by safety vessel for help; it comes alongside the boat.
• Nobody on bridge or deck, even after horns sounded.
• Rig goes to ‘abandon platform stations’ as a precautionary measure.
• Skipper protests on being woken that the situation is “under control”.
• Radar warning system is a safety net or final safeguard.
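The radar-alarm detail reduces to simple geometry: if the alarm zone is a 0.3-mile circle around the vessel and the planned track passes 0.35 miles from the rig, the rig never enters the zone while the vessel holds its track, so this safety net could not fire. The sketch below checks that on a flat local grid. Treating the distances as nautical miles, the straight 10-mile track, and the function names are assumptions; real radar alarms will differ in detail.

```python
import math

Point = tuple[float, float]

def closest_approach(start: Point, end: Point, target: Point) -> float:
    """Minimum distance from target to the straight track segment start→end."""
    (x1, y1), (x2, y2), (px, py) = start, end, target
    dx, dy = x2 - x1, y2 - y1
    seg_len2 = dx * dx + dy * dy
    # Parameter of the perpendicular foot, clamped to the segment ends.
    t = 0.0 if seg_len2 == 0 else ((px - x1) * dx + (py - y1) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (x1 + t * dx), py - (y1 + t * dy))

def alarm_sounds(start: Point, end: Point, target: Point, alarm_radius: float) -> bool:
    # The alarm fires only if the target comes within the guard radius.
    return closest_approach(start, end, target) <= alarm_radius

rig = (0.0, 0.0)
planned = ((-5.0, 0.35), (5.0, 0.35))  # hypothetical track passing 0.35 nm from the rig
drifted = ((-5.0, 0.25), (5.0, 0.25))  # hypothetical track after drifting 0.1 nm closer
print(closest_approach(*planned, rig))
print(alarm_sounds(*planned, rig, alarm_radius=0.3))
print(alarm_sounds(*drifted, rig, alarm_radius=0.3))
```

Only a deviation of more than 0.05 miles towards the rig would trigger the alarm, which is why the slide questions whether it can serve as the final safeguard.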
Conclusions
• Must make better use of lessons learned systems.
• Use Tuple Space and IR to search for key issues:
  • distributed and persistent architectures for retrieval;
  • avoids need for standardised formats;
  • can be used within and between industries.
• Caveats:
  • does it tell us anything new?
  • how valid are inter-industry comparisons?
  • how do we get from clusters to recommendations?