Carol Friedman, PhD Department of Biomedical Informatics Columbia University Discovering Novel Adverse Drug Events Using Natural Language Processing and Mining of Electronic Health Records July 21 - AIME 2009
Motivation: Severity of Problem • Clinical trials do not test a broad population • Adverse drug events (ADEs) are a worldwide problem • *Expense from ADEs is $5.6 billion annually • *Over 2 million patients are estimated to be hospitalized due to ADEs • *ADEs are the fourth leading cause of death (*in the US alone)
Motivation: Limitations of Approaches • Manual review of case reports (Venulet J 1988) • Spontaneous reporting to designated agency (Evans JM 2001; Eland IA 1999; Wysowski DK 2005) • Serious ADEs are reported less than 1-10% of the time • Reporting is voluntary for physicians/patients • Recognition of ADEs is highly subjective • Difficult to determine cause of ADE • Biased by length of time on market and other factors • Cannot determine number of patients on drug or percent at risk • Drug prescribing/claims data (Hershman D 2007; Ray WA 2009)
Severity of Underreporting • A study showed that physicians ignored patient reports of known ADEs 87% of the time (Golomb et al. Physician response to patient reports of adverse drug effects. Drug Safety 2007)
Related Work • Automated methods are mainly based on spontaneous reporting databases • Most methods (Evans SJ 2001; Szarfman A 2002) use surrogate observed-to-expected ratios • Incidence of drug-event reporting compared to background reporting across all drugs and events • Some research is aimed at improving the effectiveness of spontaneous reporting (SPR) databases • Create ontology of higher order adverse events • MedDRA • Avoid fragmentation of signal
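The observed-to-expected idea on this slide can be sketched as follows. This is a minimal illustration that assumes simple report counts; the function name and the toy counts are hypothetical, and it is not claimed to be the specific measure used in the cited work.

```python
def observed_to_expected(n_drug_event, n_drug, n_event, n_total):
    """Ratio of observed drug-event reports to the count expected if the
    drug and the event were reported independently of each other."""
    if min(n_drug, n_event, n_total) <= 0:
        raise ValueError("marginal counts and total must be positive")
    # Expected co-reports under independence: background rate of the event
    # across all reports, scaled by the number of reports for the drug.
    expected = n_drug * n_event / n_total
    return n_drug_event / expected


# Hypothetical counts: 1,000 reports in total, 50 mention the drug,
# 100 mention the event, and 20 mention both.
ratio = observed_to_expected(n_drug_event=20, n_drug=50, n_event=100, n_total=1000)
print(ratio)
```

A ratio well above 1 means the pair is reported more often than the background rate would suggest; it does not by itself say the drug causes the event.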
Related Work • Pharmacoepidemiology databases used to confirm suspicions • General Practice Research Database (GPRD) (Wood & Martinez 2004) • New Zealand Intensive Medicines Monitoring Programme (IMMP) (Coulter 1998) • Medicines Monitoring Unit (MEMO) (Evans et al. 2001) • EHR databases used to find signals (Brown JS et al. 2007; Berlowitz DR et al. 2006; Wang X et al. 2009) • Mainly coded data used • Has potential for active real-time surveillance • Should reduce biased reporting
Related Work • Consortia involving multiple EHRs • EU-ADR project (http://www.alert-project.org/) • eHealth initiative (http://www.ehealthinitiative.org/drugSafety/) • Related work using the EHR to detect known ADEs, not aimed at discovering novel ADEs (Bates DW 2003; Honigman B 2001)
Exploiting the Electronic Health Record • [Diagram elements: text notes (primary care, specialties, admit history, inpatient progress); labs (BUN 83, INR 1.3, HCT 22, …); orders (Lasix …, Pepcid …); NLP + integration; centralized data; executable data] • Applications • Decision support • Patient safety • Acquire knowledge • Discovery • Guidelines • Surveillance • Patient management • Clinical trial recruitment • Improved documentation • Quality assurance
The Electronic Health Record (EHR) • Rich source of patient information • Mostly untapped • Primary uses of the EHR • Documenting care in multi-provider environment • Manual review by providers • More complete than ICD-9 coded data • Symptoms • Clinical conditions not beneficial for billing • Fragmented • Heterogeneous • Noisy
Research Opportunities: NLP Issues • Occurrence of clinical events in natural language • Drugs, diseases, symptoms • Temporal information is critical • Irregularity of reports • Section headings important but abbreviated/missing • Use of indentation, lists, run-on sentences • Tables & semi-structured data in reports • Abbreviations • "2/2" meaning "secondary to" • "co" meaning "cardiac output" or "complaining of" • Mapping terms in text to an ontology/controlled vocabulary • "infiltrate" in a chest x-ray report means "chest infiltrate" • Ontology terms are more limited than language
Research Opportunities: Statistical Issues • Find associations between drugs, symptoms, and diseases • Not explicit in EHR • Large volumes of data • Statistical significance vs. clinical significance • Statistical associations, not relationships • Drug treats condition / Drug causes condition • Integrating time sequences is important • For treats: condition must precede drug event • For causes: drug event must precede condition
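The two time-order rules on this slide can be written as a small check. This is only a sketch: it assumes each event already has a reliable date, which the later slides note is often not the case, and the function name and labels are hypothetical.

```python
from datetime import date


def candidate_relation(drug_date, condition_date):
    """Return the relation that the time order is compatible with.

    'treats' requires the condition to precede the drug event;
    'causes' requires the drug event to precede the condition.
    """
    if condition_date < drug_date:
        return "treats"
    if drug_date < condition_date:
        return "causes"
    return "undetermined"  # same date: the order cannot be resolved


# Hypothetical dates: the condition is recorded five days before the drug.
print(candidate_relation(date(2004, 1, 10), date(2004, 1, 5)))
```

Time order only rules a relation in or out as a candidate; a compatible order is not evidence that the relation actually holds.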
Research Opportunities: Statistical Issues • Confounding (indirect associations) • Metolazone treats heart failure (HF) • HF is manifested by shortness of breath (SOB) • Metolazone and SOB are indirectly related • Higher order associations • Drug interactions: drug1, drug2, condition • Drug contraindications: drug, disease, condition • Rare ADEs
Other Research Opportunities: Knowledge Acquisition • Structured knowledge bases • UMLS relations (may_be_treated_by) • Proprietary ones, usually unavailable • Text/semi-structured knowledge (needs NLP) • Spontaneous reporting databases: indications, drugs, adverse events • Literature (Medline) • Web sites (WebMD, Micromedex) • Online medical textbooks • Claims data (health IT payors)
Text Mining for Knowledge Acquisition • Statistical methods: co-occurrences • Discovered associations between diseases and diets from literature (Weeber M 2002) • Identified disease candidate genes (Hristovski D 2005) • NLP systems • Trends in medications based on the literature and narrative clinical reports (Chen ES 2007, 2008) • Semantic relations in the literature (Hristovski D 2006)
Overview of Our NLP-EHR Based Pharmacovigilance System • [Pipeline diagram. Inputs: EHR narrative records and coded data, plus medical knowledge. Stages: MedLEE NLP; standardize & integrate; selecting & filtering; detect associations; eliminate confounding. Output: ADE signals]
Natural Language Processing of EHR • [Pipeline diagram repeated: narrative records, coded data, MedLEE NLP, standardize & integrate, selecting & filtering, detect associations, eliminate confounding, medical knowledge, ADE signals]
Meds: Tegretol xr Zocor All: Several sz meds PMHx: sz d/o - well controlled on tegretol high chol - on zocor CAD - 60% lesion in LADM by cath MR - secondary to mitral prolapse PSHx: rib fx in 2001, shoulder fx secondary to trauma Vitals: 130/80 12 80 A/P: 54 y/o m with mult med problems, all relatively well controlled. Pt sz free, not anemic as of 2/2003. Concerned of MR and its possible long term effects.
Coded Output from NLP • med:tegretol xr sectname>> report medication item code>> UMLS:C0592163_Tegretol XR • med:zocor sectname>> report medication item code>> UMLS:C0678181_Zocor • … • problem:mitral valve regurgitation sectname>> report past history item code>> UMLS:C0026266_Mitral Valve Insufficiency • … • problem:rib fracture date>> 2001 sectname>> report past history item
Coding Issues • Not all conditions have codes • Non-communicative • Some conditions are combinations of codes • Difficulty sleeping • Vascular injury • Granularity of coding system • Many different codes for a concept • Asthma: asthma exacerbation, asthma disturbing sleep, moderate asthma, suspected asthma, …
Standardizing Coded Data • [Pipeline diagram repeated, with an example at the standardize & integrate stage: coded lab value HCT:20 is mapped to C0744727: low hematocrit]
Standardizing Coded EHR Data: Laboratory Tests and Medications • Lab values denoting normal/abnormal vary • Abnormal range may depend on age, sex, ethnicity, weight • Change in lab values and duration must be considered • Standardizing medications is complex and requires additional knowledge • Trade name to generic (Avandia → rosiglitazone) • Handling of combination medications • 1.5% Lidocaine with 1:200,000 Epinephrine • Handling of dose and route • Diazepam 2 MG Oral Tablet
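A rough sketch of the medication side of this step, assuming medication strings of the simple form "name dose unit route form". The lookup table, the pattern, and the function name are all hypothetical illustrations, not part of the system described in the slides; a real system would draw on a drug knowledge source.

```python
import re

# Hypothetical trade-name-to-generic lookup (illustration only).
TRADE_TO_GENERIC = {
    "avandia": "rosiglitazone",
    "zocor": "simvastatin",
    "tegretol": "carbamazepine",
}

# Matches single-ingredient strings such as "Diazepam 2 MG Oral Tablet".
MED_PATTERN = re.compile(
    r"^(?P<name>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>MG|MCG|G|ML)"
    r"\s+(?P<route>\w+)\s+(?P<form>\w+)$",
    re.IGNORECASE,
)


def standardize_medication(text):
    """Split a medication string into generic name, dose, unit, route, form.

    Returns None when the string does not fit the simple pattern, for
    example a combination product, which needs separate handling.
    """
    match = MED_PATTERN.match(text.strip())
    if match is None:
        return None
    name = match.group("name").lower()
    return {
        "generic": TRADE_TO_GENERIC.get(name, name),
        "dose": float(match.group("dose")),
        "unit": match.group("unit").lower(),
        "route": match.group("route").lower(),
        "form": match.group("form").lower(),
    }


print(standardize_medication("Diazepam 2 MG Oral Tablet"))
print(standardize_medication("1.5% Lidocaine with 1:200,000 Epinephrine"))
```

The combination product from the slide deliberately falls through to None here, which mirrors the point that such medications need additional knowledge rather than a single pattern.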
Selecting and Filtering • [Pipeline diagram repeated: narrative records, coded data, MedLEE NLP, standardize & integrate, selecting & filtering, detect associations, eliminate confounding, medical knowledge, ADE signals] • Select using UMLS classes (diseases, medications) • Filter out: negations, past info, …, wrong time order
Selecting and Filtering • Dependence on accuracy of semantic classification • UMLS classification errors: the Finding class mixes unwanted terms (−: birth history, cardiac output, divorce) with wanted ones (+: cardiomegaly, fever) • Temporal information is difficult to obtain • An adverse drug event should only follow a drug event • Processing of explicit time information is complex, and the expressions are vague • Yesterday, last admission, 2/5 • Information typically occurs in reports without dates
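The select-then-filter step might look like the sketch below. It assumes the NLP output has already been reduced to records carrying a semantic class plus negation and past-history flags; the `Finding` structure, the class labels, and the example records are hypothetical, and the time-order filter is left out.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    term: str
    semantic_class: str  # hypothetical labels, e.g. "disease", "medication"
    negated: bool = False
    past: bool = False


# Classes kept by the selection step (assumed labels).
SELECTED_CLASSES = {"disease", "medication"}


def select_and_filter(findings):
    """Keep findings of the selected classes that are neither negated
    nor part of the past history."""
    return [
        f for f in findings
        if f.semantic_class in SELECTED_CLASSES and not f.negated and not f.past
    ]


# Hypothetical records loosely modeled on the sample note in the slides.
findings = [
    Finding("tegretol xr", "medication"),
    Finding("mitral valve regurgitation", "disease"),
    Finding("anemia", "disease", negated=True),   # "not anemic"
    Finding("rib fracture", "disease", past=True),
    Finding("divorce", "finding"),                # unwanted class
]
kept = select_and_filter(findings)
print([f.term for f in kept])
```

Because the selection relies on the semantic class, any misclassification upstream passes straight through, which is the dependence the slide warns about.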
Detect Associations • [Pipeline diagram repeated: narrative records, coded data, MedLEE NLP, standardize & integrate, selecting & filtering, detect associations, eliminate confounding, medical knowledge, ADE signals] • Obtain event frequencies • Co-occurrence frequencies • Form 2x2 tables • Calculate associations
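The counting steps listed on this slide can be sketched as below. The slides do not say which association statistic was used, so the Pearson chi-square statistic here is an assumption chosen for illustration, and the record sets are hypothetical.

```python
def two_by_two(records, drug, event):
    """Build a 2x2 table from an iterable of term sets, one per record.

    Returns (a, b, c, d): both present, drug only, event only, neither.
    """
    a = b = c = d = 0
    for terms in records:
        has_drug, has_event = drug in terms, event in terms
        if has_drug and has_event:
            a += 1
        elif has_drug:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column: no association can be measured
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical records: each set holds the terms found in one discharge summary.
records = (
    [{"warfarin", "bleeding"}] * 3
    + [{"warfarin"}]
    + [{"bleeding"}]
    + [set()] * 5
)
table = two_by_two(records, "warfarin", "bleeding")
print(table, chi_square(*table))
```

The statistic measures co-occurrence only; it cannot tell a drug-causes-event pair from a drug-treats-condition pair, which is why the later steps are needed.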
Detect Associations • Correct temporal sequence is critical • Drug event should precede adverse event • Dates are not usually stated along with events • Report section is a helpful surrogate • Statistical associations correspond to different clinical relations • For pharmacovigilance: want "drug causes adverse event" • Confounding caused by dependencies in data
Confounding Interdependencies • [Diagram: nodes Drug, Disease, Adverse Event; edges Drug-Treats-Disease, Disease-Manifested by-Adverse Event, Drug-Cause_ADE-Adverse Event]
Confounding Interdependencies • [Diagram: nodes ML, HD, SOB] • ML: Metolazone; HD: Hypertensive Disease; SOB: Shortness of Breath
Drug Associations Network • [Network diagram: nodes Rx, Rx1-n, Sx, Sx1-n, Dx, Dx1-n; edges labeled association, treatment, process, and ADE]
Reduce Confounding • [Pipeline diagram repeated: narrative records, coded data, MedLEE NLP, standardize & integrate, selecting & filtering, detect associations, eliminate confounding, medical knowledge, ADE signals]
Reduce Confounding • Collect knowledge from external sources and associations • Drug-treats-disease • Disease-manifested by-symptom • Drug-interacts with-drug • Use information theory • Mutual information (MI) • Data processing inequality: MI3 < min(MI1, MI2) • [Diagram: triangle of Drug, Disease, and Adverse Event; the Drug-Adverse Event edge is labeled MI3 and the two edges through Disease are labeled MI1 and MI2]
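A sketch of how mutual information and the data processing inequality could be combined to flag an indirect drug-event association. The slides name the two ingredients but not the exact decision rule, so the rule below is an assumption, and the function names are hypothetical.

```python
from math import log2


def mutual_information(a, b, c, d):
    """MI in bits between two binary variables from 2x2 counts:
    a = both present, b = first only, c = second only, d = neither."""
    n = a + b + c + d
    mi = 0.0
    # Each tuple is (joint count, row total, column total) for one cell.
    for joint, row, col in (
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ):
        if joint > 0:
            mi += (joint / n) * log2(joint * n / (row * col))
    return mi


def likely_indirect(mi_drug_disease, mi_disease_event, mi_drug_event):
    """Assumed rule: if the drug-event MI is smaller than both links that
    pass through the disease, treat the drug-event link as indirect."""
    return mi_drug_event < min(mi_drug_disease, mi_disease_event)


print(mutual_information(25, 25, 25, 25))  # independent variables
print(mutual_information(50, 0, 0, 50))    # perfectly dependent variables
print(likely_indirect(0.4, 0.3, 0.1))
```

The inequality guarantees that information cannot increase along a chain drug, disease, event; a drug-event MI that exceeds one of the links through the disease is therefore a hint that the association is not explained by that disease alone.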
Initial Study: Methods • 6 drugs chosen • Ibuprofen, Morphine, Warfarin: long time on market with known ADEs • Bupropion, Paroxetine, Rosiglitazone: ADEs discovered after 2004 • 1 drug class: ACE inhibitors • 25,074 textual discharge summaries from 2004 at NYPH processed using MedLEE NLP • Reference standard created using expert knowledge sources • Drug-potential ADE pairs determined • Recall/precision calculated • Qualitative analysis performed to classify drug-potential ADE pairs detected
Initial Study: Results • Quantitative • Recall (.75), precision (.30) • Qualitative analysis: potential drug-ADE pairs • Known drug-ADEs: 30% • Drug-indication pairs: 30% • Remote drug-indication pairs: 33% • Unknown clinical associations: 6%
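For reference, recall and precision against a reference standard of drug-ADE pairs can be computed as in this sketch. The pairs below are hypothetical placeholders, not data from the study.

```python
def precision_recall(detected, reference):
    """Precision and recall of detected pairs against a reference standard."""
    detected, reference = set(detected), set(reference)
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (drug, event) pairs for illustration only.
detected = {("drug_a", "event_1"), ("drug_a", "event_2"), ("drug_b", "event_3")}
reference = {
    ("drug_a", "event_1"),
    ("drug_b", "event_3"),
    ("drug_b", "event_4"),
    ("drug_c", "event_5"),
}
precision, recall = precision_recall(detected, reference)
print(precision, recall)
```

Low precision with higher recall, as reported on this slide, corresponds to a detected set that covers most reference pairs but also contains many pairs outside the reference standard.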
Confounding Interdependencies • [Diagram: nodes Drug, Disease, Disease2, Adverse Event; edges labeled Treats, Manifested by, Cause_ADE]
Study 2: Reduction of Confounding • Evaluation set • 14 associations related to 2 drugs from Study 1 • Reference standard • Drug-ADE associations determined, and mutual information (MI) and the data processing inequality (DPI) used to classify them automatically
Results • Precision • 0.86 when handling confounding • 0.31 without handling confounding
Discussion: Limitations & Future Directions • Mutual information was the only strategy used to handle confounding • More complex MI strategy will be explored • Other statistical/knowledge-based methods will be explored • Inpatient data only/sicker patient population • The same methods could be used for outpatient data as well, which is possibly noisier • Drug dosage, drug-drug, and more complex interactions should be explored
Discussion: Limitations & Future Directions • Small evaluation data set • More comprehensive evaluation needed • Limitations inherent in NLP, coding, association detection • Limitations due to fragmented/incomplete patient data
Summary • Need for more pharmacovigilance research • Based on the EHR • Using available databases and text • Studies demonstrated promising results • Many interesting research opportunities • Natural language processing • Statistical methods • Integrating different sources of data • Gathering knowledge from different sources • Automated knowledge acquisition for evidence based medicine
Acknowledgement • NLP Data Mining group at DBMI at Columbia • George Hripcsak • Marianthi Markatou • Herb Chase • Xiaoyan Wang • David Albers • Jung-wei Fan • Lyudmila Shagina • Noemie Elhadad • Grants • R01 LM007659 from NLM • R01 LM008635 from NLM • R01 LM06910 from NLM • 5T15LM007079 from NLM training grant
QUESTIONS THANK YOU!