Discovery of Temporal Patterns in Course-of-Disease Medical Data Jorge C. G. Ramirez Ph.D. Candidate Lynn L. Peterson and Diane J. Cook Supervising Professors
Overview • Objective • Contributions • Approach • TEMPADIS • Summary and Conclusions
Objective • Discover patterns that represent groups of patients that had a similar course of disease for a catastrophic or chronic illness • Motivation • Medical • AI
Contributions • Data Preprocessing • Normalization • Learning Missing Data • Learning Implicit Knowledge • Exploratory Analysis • Event Set Sequence Approach
Contributions • Domain Understanding • New perspective on mass of data • Identify groups of patients for further medical study
Approach • Example Events • Laboratory Results • 461 L WBC 2.70 • 461 L HCT 40.10 • 461 L PLT 239.00 • 461 L CD4% 19.00 • 461 L CD4A 188.00
Approach • Example Events • Visits • 468 C CV • Diagnoses • 468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED • Pharmacy • 469 P CTM 60 CO-TRIMOXAZOLE DS • 469 P AZT 200 ZIDOVUDINE 100MG
Approach • Event Set Sequences • Events • Value Event: laboratory test result, visit • Duration Event: pharmacy, diagnosis • An Event Set is all Events that occur within a window of time • An Event Set Sequence is all Event Sets that occur over a long period of time
Approach • Example Event Set • 461 L WBC 2.70 • 461 L HCT 40.10 • 461 L PLT 239.00 • 461 L CD4% 19.00 • 461 L CD4A 188.00 • 468 C CV • 468 D 043.9 AIDS-RELATED COMPLEX, UNSPECIFIED • 469 P CTM 60 CO-TRIMOXAZOLE DS • 469 P AZT 200 ZIDOVUDINE 100MG
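The grouping of events into Event Sets can be sketched roughly as follows. This is a minimal illustration, not the TEMPADIS implementation: the names `Event`, `parse_event`, and `build_event_sets` are hypothetical, the leading number of each record is assumed to be a day count, and the 14-day window is an assumed value (the slides do not state the window size).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    day: int      # assumed: day number within the patient's follow-up
    kind: str     # L = laboratory, C = visit, D = diagnosis, P = pharmacy
    detail: str   # remainder of the record, e.g. "WBC 2.70"


def parse_event(line: str) -> Event:
    """Split a raw record such as '461 L WBC 2.70' into its parts."""
    day, kind, detail = line.split(maxsplit=2)
    return Event(int(day), kind, detail)


def build_event_sets(events: List[Event], window_days: int = 14) -> List[List[Event]]:
    """Group events into Event Sets.

    Assumption: a new set starts once an event falls window_days or more
    after the first event of the current set.
    """
    event_sets: List[List[Event]] = []
    for ev in sorted(events, key=lambda e: e.day):
        if event_sets and ev.day - event_sets[-1][0].day < window_days:
            event_sets[-1].append(ev)
        else:
            event_sets.append([ev])
    return event_sets


raw = [
    "461 L WBC 2.70",
    "468 C CV",
    "469 P AZT 200 ZIDOVUDINE 100MG",
    "500 L WBC 3.10",  # made-up later record, to show a second Event Set
]
sequence = build_event_sets([parse_event(r) for r in raw])
```

The resulting list of lists plays the role of an Event Set Sequence: the first three records fall into one Event Set, and the later record starts a new one.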
Approach • Normalization • Normal for each patient is different • Especially when affected by a catastrophic or chronic illness • Example: CD4A • General Population Normal: 416 - 1751 • Well HIV-positive patient: 200 - 350 • Severely immune-compromised patient: 0 - 50
Approach • Normalization (continued) • Scale to -4…0…+4 • 0 is normal • Each number represents a deviation from normal • 1 and 2 are noticeable but not severe • 3 is severe • 4 is very severe
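One way to picture the scaling is the sketch below. It assumes that a value inside the patient-specific normal range maps to 0 and that each step of deviation equals one width of that range, capped at 4. The slides do not give the actual step size, so `normalize` and its unit of deviation are illustrative assumptions only.

```python
import math


def normalize(value: float, normal_low: float, normal_high: float) -> int:
    """Map a lab value to the -4..+4 scale for one patient's normal range.

    Assumption: one unit of deviation is the width of the normal range.
    """
    if normal_low <= value <= normal_high:
        return 0
    width = normal_high - normal_low
    if value < normal_low:
        steps = math.ceil((normal_low - value) / width)
        return -min(steps, 4)
    steps = math.ceil((value - normal_high) / width)
    return min(steps, 4)


# CD4A of 188 for a well HIV-positive patient (normal range 200 - 350)
slightly_low = normalize(188.0, 200.0, 350.0)
```

Under these assumptions, the CD4A value of 188 from the example events is a mild negative deviation for a well HIV-positive patient, even though it would be far below normal for the general population.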
Approach • Replace Missing Data • Diagnosis data very incomplete • Learn severity of condition from pharmacy data • Induce decision tree to classify conditions
Approach • Create Health Status Categories • = HIV-positive asymptomatic • = Asymptomatic, on anti-HIV therapy • = Immune-compromised, on prophylactic therapy • = Active illness • = Severe active illness
Approach • Learn Implicit Knowledge • Need to augment explicit knowledge • Recovery time is expert’s implicit knowledge • Use neural network to learn recovery time function • 0 = Nothing to recover from • 1-4 = weeks to recover • 5 = 5 or more weeks to recover
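The six recovery-time classes listed above could be derived from a duration as in this small sketch. The function name `encode_recovery_time` is hypothetical, and rounding partial weeks up is an assumption; the slides only give the class meanings, not how the training targets were prepared.

```python
import math


def encode_recovery_time(weeks: float) -> int:
    """Map a recovery duration in weeks to the 0..5 recovery-time class."""
    if weeks <= 0:
        return 0                       # nothing to recover from
    return min(math.ceil(weeks), 5)    # 1-4 weeks, or 5 for five or more


classes = [encode_recovery_time(w) for w in (0, 2.5, 4, 9)]
```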
Approach • Categorize Pharmacy Data • A myriad of drugs prescribed • Need to understand significance • Categorize by use
Approach • Categories • Nucleoside Analogs • Protease Inhibitors • Prophylaxis Therapies • Intravenous antibiotics • Anti-virals • Anti-PCP/Toxoplasmosis • Anti-mycobacterials
Approach • Categories (continued) • Anti-wasting syndrome • Anti-fungals • Chemotherapies
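Categorizing by use can be illustrated with a lookup from drug code to category, collapsed into one flag per category. This is a sketch under stated assumptions: the flag order simply follows the order in which the ten categories are listed, the `DRUG_CATEGORY` table is hypothetical (zidovudine is a nucleoside analog, but filing co-trimoxazole under Prophylaxis Therapies is a guess), and `on_drug_flags` is not a name from the original system.

```python
CATEGORIES = [
    "Nucleoside Analogs",
    "Protease Inhibitors",
    "Prophylaxis Therapies",
    "Intravenous antibiotics",
    "Anti-virals",
    "Anti-PCP/Toxoplasmosis",
    "Anti-mycobacterials",
    "Anti-wasting syndrome",
    "Anti-fungals",
    "Chemotherapies",
]

# Hypothetical lookup table; a real one would cover every prescribed drug.
DRUG_CATEGORY = {
    "AZT": "Nucleoside Analogs",
    "CTM": "Prophylaxis Therapies",  # assumed category for co-trimoxazole
}


def on_drug_flags(drug_codes) -> str:
    """Return one '0'/'1' flag per category for the drugs a patient is on."""
    flags = ["0"] * len(CATEGORIES)
    for code in drug_codes:
        category = DRUG_CATEGORY.get(code)
        if category is not None:      # unknown codes are ignored in this sketch
            flags[CATEGORIES.index(category)] = "1"
    return "".join(flags)


flags = on_drug_flags(["CTM", "AZT"])
```

Reducing many individual drugs to ten flags keeps the representation compact enough to compare patients by kind of therapy rather than by specific prescription.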
Approach • Result: Understandable representation of patient data • 861 C 1.1 26.1 167 0.0 0 16 0 • 862 0.0 0.0 0 0.0 0 0 2 24: 30 38: 50 • 867 H 4.3 19.2 144 0.0 0 11 3 0: 3 22: 1 35: 2 • 868 H 2.2 26.2 144 0.0 0 5 3 0: 3 22: 1 35: 2 • 869 0.0 0.0 0 0.0 0 0 1 35: 60 • 874 C 1.3 32.4 0 0.0 0 17 0 • 889 C 1.1 30.4 154 0.0 0 36 0 • 890 0.0 0.0 0 0.0 0 0 3 22: 30 38: 50 39:480 • 923 0.0 0.0 0 0.0 0 0 1 39:480 • 933 H 3.6 20.4 182 0.0 0 11 3 0: 2 22: 1 39: 12
Approach • Result: Understandable representation of patient data • 861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0 • 867 H 4 4 0 -4 -1 -9 -9 -2 0 0 2 0 0 0 1 1 0 0 • 868 H 4 1 -2 -3 -1 -9 -9 -4 0 0 2 0 0 0 1 1 0 0 • 874 C 4 3 -4 -1 -9 -9 -9 0 0 0 2 0 0 0 1 1 0 0 • 889 C 4 2 -4 -2 -1 -9 -9 2 0 0 2 0 0 0 1 1 0 0 • 933 H 4 4 0 -4 0 -9 -9 -2 0 0 1 0 0 0 0 2 0 0
Approach • Result: Understandable representation of patient data • < { (EV C)(HS 3)(RT 1)(WBC -4)(HCT -3)(PLT 0) • (LMPH -1)(onD 0010000000) } • { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT -1) • (LMPH -2)(onD 0010001100) } • { (EV H)(HS 4)(RT 1)(WBC -2)(HCT -3)(PLT -1) • (LMPH -4)(onD 0010001100) } • { (EV C)(HS 4)(RT 3)(WBC -4)(HCT -1) • (onD 00010001100) } • { (EV C)(HS 4)(RT 2)(WBC -4)(HCT -2)(PLT -1) • (LMPH 2)(onD 0010001100) } • { (EV H)(HS 4)(RT 4)(WBC 0)(HCT -4)(PLT 0) • (LMPH -2)(onD 0010000100) } >
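The step from a normalized row to the bracketed Event Set notation can be sketched as below. The column order (HS, RT, WBC, HCT, PLT, CD4P, CD4A, LMPH, then ten drug columns), the reading of -9 as "missing", and the reduction of non-zero drug counts to a 1 flag are all inferred from the rows and Event Sets shown on these slides, not stated by them; `format_event_set` is a hypothetical name.

```python
FIELDS = ["HS", "RT", "WBC", "HCT", "PLT", "CD4P", "CD4A", "LMPH"]
MISSING = -9  # assumed marker for a value that was not measured


def format_event_set(row: str) -> str:
    """Render one normalized row as '{ (EV ..)(HS ..)...(onD ..) }'."""
    # Some transcriptions use an en dash for the minus sign; normalize it.
    tokens = row.replace("\u2013", "-").split()
    event_type = tokens[1]                       # tokens[0] is the day number
    values = [int(t) for t in tokens[2:2 + len(FIELDS)]]
    drugs = tokens[2 + len(FIELDS):]

    parts = [f"(EV {event_type})"]
    parts += [f"({name} {v})" for name, v in zip(FIELDS, values) if v != MISSING]
    on_d = "".join("1" if int(d) else "0" for d in drugs)
    parts.append(f"(onD {on_d})")
    return "{ " + "".join(parts) + " }"


first = format_event_set("861 C 3 1 -4 -3 0 -9 -9 -1 0 0 2 0 0 0 0 0 0 0")
```

Dropping missing features instead of printing -9 keeps each Event Set limited to what was actually observed, which matters later when two Event Sets are compared feature by feature.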
Approach • Inexact Match • Use set difference • Partial match, feature by feature • Assumes default partial match for missing data • Use weakest-link/average-link • Require minimum degree of match • Require average degree of match
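A feature-by-feature inexact match with weakest-link and average-link thresholds might look like the following sketch. Everything numeric here is an assumption: the per-feature score (1 minus the distance on the -4..+4 scale divided by 8), the default score of 0.5 for missing data, and both thresholds. The names `feature_match` and `event_sets_match` are hypothetical.

```python
from typing import Dict, Optional

SCALE_SPAN = 8          # normalized values run from -4 to +4
DEFAULT_MISSING = 0.5   # assumed default partial match for missing data


def feature_match(a: Optional[int], b: Optional[int]) -> float:
    """Degree of match in [0, 1] for one feature."""
    if a is None or b is None:
        return DEFAULT_MISSING
    return 1.0 - abs(a - b) / SCALE_SPAN


def event_sets_match(x: Dict[str, int], y: Dict[str, int],
                     min_degree: float = 0.5, avg_degree: float = 0.8) -> bool:
    """Accept when the weakest feature and the average both clear a threshold."""
    features = set(x) | set(y)
    if not features:
        return True
    degrees = [feature_match(x.get(f), y.get(f)) for f in features]
    weakest = min(degrees)                  # weakest link
    average = sum(degrees) / len(degrees)   # average link
    return weakest >= min_degree and average >= avg_degree


a = {"WBC": -4, "HCT": -3, "PLT": 0}
b = {"WBC": -4, "HCT": -2}          # PLT missing
strict = event_sets_match(a, b)
relaxed = event_sets_match(a, b, avg_degree=0.75)
```

The weakest-link test rejects pairs with one badly mismatched feature, while the average-link test rejects pairs that are mediocre across the board; requiring both is what lets similar, but not identical, patient histories fall into the same pattern.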
TEMPADIS • Raw Target Data → Data Cleaning → Data Normalization → Normalized Database
TEMPADIS • Normalized Database → Decision Tree and Neural Net → Reduced, Knowledge-Added Data
TEMPADIS • Knowledge-Added Database → Sequence Builder → Temporal Patterns
Results • Validation • The results are temporal patterns demonstrating that groups of patients had similar experiences during the course of disease • Only medical experts can assess the validity of discovered patterns • These results have been validated by the experts in the HIV Clinical Research Group
Results • Given a database of patients followed for 4 to 9 years • Discovered interesting patterns • Interestingness has multiple dimensions • Length • Data that appears in the patterns • Data that does not appear in the patterns
Results • Advanced patients, subject to various OIs • < { (EV C)(HS 3)(RT 0)(WBC 0)(HCT -1)(PLT 0)(LMPH -3) • (onD 0000000000) } • { (EV E)(HS 3)(RT 2)(WBC 3)(HCT -1)(PLT 1)(LMPH 4) • (onD 0000000000) } • { (EV C)(HS 3)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P -3) • (CD4A -1)(LMPH 0)(onD 1010000000) } • { (EV C)(HS 3)(RT 1)(WBC -1)(HCT -1)(PLT 1)(LMPH 2) • (onD 1010000000) } • { (EV E)(HS 3)(RT 1)(WBC 2)(HCT -1)(PLT 1)(LMPH 4) • (onD 0000000000) } • { (EV C)(HS 3)(RT 1)(WBC 1)(HCT 0)(PLT 0)(CD4P -3) • (CD4A -2)(LMPH 0)(onD 1010000000) } >
Advanced patients, fairly stable • < { (EV C)(HS 3)(RT 0)(WBC -1)(HCT -1)(PLT 1)(CD4P -4) • (CD4A -4)(LMPH 0)(onD 0010000000) } • { (EV C)(HS 3)(RT 0)(WBC 0)(HCT 0)(PLT -1)(CD4P -4) • (CD4A -4)(LMPH 0)(onD 1010000000) } • { (EV C)(HS 3)(RT 0)(onD 1010000000) } • { (EV C)(HS 3)(RT 0)(WBC -2)(HCT 0)(PLT -1)(CD4P -4) • (CD4A -4)(LMPH 0)(onD 0010000000) } • { (EV C)(HS 4)(RT 1)(WBC 1)(HCT -4)(PLT 0)(CD4P -4) • (CD4A -4)(LMPH -4)(onD 0011001000) } • { (EV C)(HS 3)(RT 3)(onD 0010000000) } • { (EV )(HS 3)(RT 1)(WBC 0)(HCT 0)(PLT 0)(LMPH 0) • (onD 0000000000) } • { (EV C)(HS 3)(RT 0)(CD4A -4)(onD 0010000000) } >
Asymptomatic period • < { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 1)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV E)(HS 1)(RT 0)(WBC -1)(HCT 0)(PLT 1)(CD4P -1) • (CD4A -2)(LMPH 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) } • { (EV C)(HS 1)(RT 0)(CD4A 0)(onD 0010000000) } • { (EV E)(HS 1)(RT 0)(WBC 1)(HCT 0)(PLT 0)(CD4P 0) • (CD4A 0)(LMPH 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } • { (EV C)(HS 1)(RT 0)(onD 0000000000) } >
Summary • Nine Steps of KDD • Identify goal • Identify target data set • Data cleaning and preprocessing • Data reduction and projection • Identify data mining method
Summary • Nine Steps of KDD • Exploratory Analysis • Data Mining • Interpretation of Mined Patterns • Acting on Discovered Knowledge
Conclusions • Objective Met with Contributions • Patterns discovered representing groups of patients with similar experience in course of disease • This perspective on the data has not previously been produced • This kind of computation on this kind of data has not previously been performed
Future Work • Improve discovery algorithm • Backtracking is a barrier to overcome • Improve search control • Develop heuristic for measuring interestingness • Add ability to identify clinically identical/similar patterns
Future Work • Move database to new Intelligent Systems in Medicine and Biology Lab • Bring database up to date • Include more domain data in Event Sets • Explore impact of new developments in HIV treatment