Data Mining of Blood Handling Incident Databases
Costas Tsatsoulis
Information and Telecommunication Technology Center
Dept. of Electrical Engineering and Computer Science
University of Kansas
tsatsoul@ittc.ku.edu
Background
• Incident reports collected for the handling of blood products
• An initial database was assembled to allow experimentation
• Goals:
  • Allow the generation of intelligence from data
    • Unique events
    • Event clusters
    • Event trends
    • Frequencies
  • Simplify the job of the QA staff
    • Similar reports
    • Less need for in-depth causal analysis
  • Allow cross-institutional analysis
Institute of Medicine Recommendations (November 1999)
• Establish a national focus of research to enhance the knowledge base about patient safety
• Identify and learn from errors through both mandatory and voluntary reporting systems
• Raise standards and expectations through oversight organizations
• Create safety systems through implementation of safe practices at the delivery level
Near-Miss Event Reporting
• Useful database for studying a system's failure points
• Many more near misses than actual adverse events
• Source of data for studying human recovery
• Dynamic means of understanding system operations
The Iceberg Model of Near-Miss Events
• 1 in 2,000,000: fatalities
• 1 in 38,000: ABO-incompatible transfusions
• 1 in 14,000: incorrect units transfused
• [Iceberg diagram: the reported events above are the visible tip; near-miss events form the much larger submerged base]
Intelligent Systems
• Developed two separate systems:
  • Case-Based Reasoning (CBR)
  • Information Retrieval (IR)
• Goal was to address most of the needs of the users:
  • Allow the generation of intelligence from data
    • Unique events
    • Event clusters
    • Event trends
    • Frequencies
  • Simplify the job of the QA staff
    • Similar reports
    • Less need for in-depth causal analysis
  • Allow cross-institutional analysis
Case-Based Reasoning
• A technique from Artificial Intelligence that solves problems based on previous experiences
• Of significance to us:
  • CBR must identify a similar situation/problem to know what to do and how to solve the problem
  • Use CBR's concept of "similarity" to identify:
    • similar reports
    • report clusters
    • frequencies
What Is a Case and How Do We Represent It?
• An incident report is a "case"
• Cases are represented by:
  • Indexes
    • descriptive features of a situation
    • surface or in-depth, or both
  • Their values
    • symbolic: "Technician"
    • numerical: "103 rpm"
    • sets: "{Monday, Tuesday, Wednesday}"
    • other (text, images, ...)
  • Weights
    • indicate the descriptive significance of the index
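To make this representation concrete, here is a minimal sketch in Python of a case as a set of indexed attributes, each carrying a value and a weight. This is an illustration only, not the project's actual implementation; the class names, field names, and the example attribute values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class CaseAttribute:
    """One index of a case: its value plus a weight giving its descriptive significance."""
    value: Any            # symbolic ("Technician"), numerical (103), a set, or free text
    weight: float = 1.0   # descriptive significance of this index

@dataclass
class IncidentCase:
    """An incident report represented as a CBR case."""
    case_id: str
    attributes: Dict[str, CaseAttribute] = field(default_factory=dict)

# Hypothetical blood-handling incident represented as a case
report = IncidentCase(
    case_id="R-0042",
    attributes={
        "reporter_role": CaseAttribute("MLT", weight=0.8),
        "location":      CaseAttribute("OR", weight=0.6),
        "shift":         CaseAttribute("12-4am", weight=0.4),
        "causal_code":   CaseAttribute("labeling error", weight=1.0),
    },
)
```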
Finding Similarity
• Define degrees of matching between attributes of an event report. For example:
  • "Resident" and "MD" are similar
  • "MLT," "MT," and "QA/QC" are similar
• A value may match perfectly or partially:
  • "MLT" to "MLT" (perfect)
  • "MLT" to "MT" (partial)
• Different attributes of the event report are weighted
• The sum of the matching attributes, scaled by their degree of match and their weights, defines similarity
• Cases matching above some predefined degree of similarity are retrieved and considered similar
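Continuing the sketch above, a hedged illustration of this weighted similarity computation: each attribute contributes its degree of match scaled by its weight, and the normalized sum is compared against a retrieval threshold. The partial-match scores and the 0.7 threshold below are made-up values, not the ones used in the system.

```python
# Hypothetical partial-match table: 1.0 = perfect match, 0.0 = no match
PARTIAL_MATCH = {
    ("MLT", "MT"): 0.7,
    ("MT", "QA/QC"): 0.7,
    ("Resident", "MD"): 0.8,
}

def attribute_match(a, b) -> float:
    """Degree of match between two attribute values."""
    if a == b:
        return 1.0
    return PARTIAL_MATCH.get((a, b), PARTIAL_MATCH.get((b, a), 0.0))

def case_similarity(query: IncidentCase, stored: IncidentCase) -> float:
    """Weighted, normalized sum of attribute matches (0..1)."""
    total, weight_sum = 0.0, 0.0
    for name, q_attr in query.attributes.items():
        s_attr = stored.attributes.get(name)
        if s_attr is None:
            continue
        total += q_attr.weight * attribute_match(q_attr.value, s_attr.value)
        weight_sum += q_attr.weight
    return total / weight_sum if weight_sum else 0.0

def retrieve(query, case_base, threshold=0.7):
    """Return cases whose similarity to the query meets the threshold, best first."""
    scored = [(case_similarity(query, c), c) for c in case_base]
    scored = [sc for sc in scored if sc[0] >= threshold]
    return sorted(scored, key=lambda sc: sc[0], reverse=True)
```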
Information Retrieval
• Index, search, and recall text without any domain information
• Preprocess each document:
  • remove stop words
  • stemming
• Use some representation for documents:
  • vector-space model: a vector of terms, each with weight = tf × idf
    • tf (term frequency) = (frequency of the word) / (frequency of the most frequent word in the document)
    • idf (inverse document frequency) = log10((total number of documents) / (number of documents containing the term))
• Use some similarity metric between documents:
  • vector algebra to find the cosine of the angle between the document vectors
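A bare-bones sketch of the vector-space model with the tf × idf weighting and cosine measure described above, written directly from the slide's definitions rather than from the project's code; the sample documents are invented.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build tf*idf vectors for a list of tokenized documents."""
    n_docs = len(docs)
    df = Counter()                      # number of documents containing each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        max_freq = max(counts.values())  # frequency of the most frequent word
        vec = {
            term: (freq / max_freq) * math.log10(n_docs / df[term])
            for term, freq in counts.items()
        }
        vectors.append(vec)
    return vectors

def cosine(v1, v2):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Hypothetical usage with already-tokenized incident descriptions
docs = [["unit", "mislabeled", "in", "or"], ["wrong", "unit", "issued"], ["unit", "mislabeled"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]))
```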
CBR for
• From the incident report features, selected a subset as indexes
• Semantic similarity defined, e.g.:
  • (OR, ER, ICU, L&D)
  • (12-4am, 4-8am), (8am-12pm, 12-4pm), (4-8pm, 8pm-12am)
• Domain-specific details defined
• Weights assigned:
  • fixed
  • conditional: e.g., the weight of some causal codes depends on whether they were established using a rough or an in-depth analysis
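One plausible way to encode these semantic similarity groups and the conditional causal-code weights, extending the earlier similarity sketch; the group memberships come from the slide, but the 0.6 in-group match score and the 0.5 weight discount are illustrative assumptions.

```python
# Values within a group are treated as partial matches (illustrative in-group score)
SEMANTIC_GROUPS = [
    {"OR", "ER", "ICU", "L&D"},     # locations considered similar
    {"12-4am", "4-8am"},            # overnight shifts
    {"8am-12pm", "12-4pm"},         # day shifts
    {"4-8pm", "8pm-12am"},          # evening shifts
]

def semantic_match(a, b, in_group_score=0.6):
    """1.0 for identical values, a partial score for values in the same semantic group."""
    if a == b:
        return 1.0
    for group in SEMANTIC_GROUPS:
        if a in group and b in group:
            return in_group_score
    return 0.0

def causal_code_weight(base_weight, analysis_depth):
    """Conditional weight: trust a causal code more when it came from in-depth analysis."""
    return base_weight if analysis_depth == "in-depth" else base_weight * 0.5
```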
IR for
• No deletion of stop words (e.g., the conjunction "or" vs. "OR" the operating room)
• No stemming
• Use the vector-space model and the cosine comparison measure
Experiments
• Database of approx. 600 cases
• Selected 24 reports to match against the case base
• Experiment 1: CBR retrieval (CBR_match_value)
• Experiment 2: IR retrieval (IR_match_value)
• Experiments 3-11: combined retrieval
  • score = W_CBR * CBR_match_value + W_IR * IR_match_value
  • weights range from 0.9 to 0.1 in increments of 0.1: (0.9, 0.1), (0.8, 0.2), (0.7, 0.3), ..., (0.2, 0.8), (0.1, 0.9)
• Experiment 12: CBR retrieval with all weights set to 1
• No retrieval threshold set
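A small sketch of the combined retrieval score and the weight sweep used in experiments 3-11, assuming CBR and IR match values are normalized to [0, 1]; the function names and the top-5 cutoff mirroring the evaluation are hypothetical scaffolding.

```python
def combined_score(cbr_match, ir_match, w_cbr, w_ir):
    """Linear combination of the CBR and IR match values."""
    return w_cbr * cbr_match + w_ir * ir_match

# Weight pairs for experiments 3-11: (0.9, 0.1), (0.8, 0.2), ..., (0.1, 0.9)
weight_pairs = [(round(w, 1), round(1.0 - w, 1))
                for w in (0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1)]

def rank_reports(query_scores, w_cbr, w_ir, top_k=5):
    """query_scores: list of (report_id, cbr_match, ir_match). Returns the top-k reports."""
    ranked = sorted(
        query_scores,
        key=lambda r: combined_score(r[1], r[2], w_cbr, w_ir),
        reverse=True,
    )
    return ranked[:top_k]
```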
Evaluation
• Collected the top 5 cases for each report in each experiment
• Because of duplication, each report had 10-20 distinct cases retrieved across all 12 experiments
• A random case was added to each set
• Results were sent to experts, who rated each retrieved case as:
  • Almost Identical
  • Similar
  • Not Very Similar
  • Not Similar At All
Preliminary Analysis
• Determine agreement/disagreement with the expert's analysis:
  • Is a case similar?
  • Is a case dissimilar?
• Establish accuracy (recall is more difficult to measure)
• False positives vs. false negatives
• What is the influence of the IR component?
• Are the weights appropriate?
• What is the influence of varying selection thresholds?
Combined Results
[Chart: results of the combined CBR/IR retrieval, plotted against an increasing selection threshold]
Some Preliminary Conclusions
• The weights used in CBR seem to be appropriate and definitely improve retrieval
• In CBR, increasing the acceptance threshold improves selection of retrievable cases but also increases the number of false positives
• IR does an excellent job of identifying non-retrievable cases
• Even a 10% contribution of IR to the combined score greatly helps in identifying non-retrievable cases
Future Work
• Plot performance versus acceptance threshold to identify the best case-selection threshold
• Integrate the analysis of the second expert
• Examine how CBR and IR can be combined to exploit each one's strengths:
  • CBR performs the initial retrieval
  • IR eliminates bad cases that were retrieved
• Look into the temporal distribution of retrieved reports and adjust their matching accordingly
• Examine a natural language understanding (NLU) system for incident reports that have longer textual descriptions
• Re-run the experiments on different datasets
• Obtain large datasets and perform other types of data mining (rule induction, predictive models, probability networks, supervised and unsupervised clustering, etc.)