A Review of Natural Language Processing for Biosurveillance. Wendy W. Chapman, PhD. Biomedical Language Understanding, Dept of Biomedical Informatics, University of Pittsburgh
Current Surveillance • New strain of H5N1 avian influenza • Cough (咳嗽, Chinese for “cough”) • Respiratory
Leverage More Data for Surveillance: Respiratory Patients
Biosurveillance • Bioterrorist threats: detect attacks • Natural disease outbreaks: detect outbreaks • Disaster management: understand the situation. Much of the useful data is in textual format, so we need natural language processing.
Textual Data Sources • Non-clinical • Clinical
Internet Mapping of Outbreaks • HealthMap • Global Health Monitor • Global Public Health Intelligence Network (GPHIN)—Public Health Agency of Canada
Clinical Data for Biosurveillance: textual clinical data → text processor → structured data (Pneumonia: yes, history • Cough: yes, 3 days • Fever: yes, 3 days)
Clinical Data for Biosurveillance What types of data are available? How do we transform the data? How well can we process the data?
What types of data are available? There is a trade-off between timeliness and content.
Chief Complaints. Content: • Patient's reason for seeking care • 1–2 symptoms, e.g., “Cough/headache”, “n/v/d”, “Motor vehicle accident”. Useful for early detection of larger outbreaks. Timeliness: earliest, available at registration (timeline: Registration → Physician Exam → Discharge → Home).
Ambulatory & Inpatient Notes. Content: • Clinical: symptoms, findings, medications, allergies, diagnoses, chronic conditions • Epidemiological: risk factors, travel history, homelessness, duration of illness, exposure to contacts. Useful for targeted case detection, disease surveillance, and situational awareness. Timeliness: available after the physician exam (timeline: Registration → Physician Exam → Discharge → Home).
Discharge Reports. Content: • Clinical: reason for hospitalization, summary of care, findings, procedures performed, plan for follow-up • Death: cause, time of death. Most detailed but least timely; potentially useful for situational awareness. Timeliness: available at discharge (timeline: Registration → Physician Exam → Discharge → Home).
How do we transform the data? Text processing • Chief complaints → classify → syndrome category, e.g., “cough/sob” → Respiratory • Textual notes → extract → clinical conditions, e.g., “No past history of pneumonia; presents with two-day history of cough.” → Pneumonia: historical, absent; Cough: recent, present
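The two kinds of output on this slide can be pictured as simple data structures. The Python sketch below assumes a minimal model with two attributes per condition (status and temporality); the class names and the `active_conditions` helper are hypothetical, and the example values are filled in by hand from the slide's sentence rather than produced by any real text processor.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PRESENT = "present"
    ABSENT = "absent"


class Temporality(Enum):
    RECENT = "recent"
    HISTORICAL = "historical"


@dataclass(frozen=True)
class ClinicalCondition:
    """One condition extracted from a note, with its contextual attributes."""
    name: str
    temporality: Temporality
    status: Status


# "classify": a chief complaint maps to a syndrome category (hand-filled example).
chief_complaint_output = {"cough/sob": "Respiratory"}

# "extract": hand-filled output for the slide's example sentence
# "No past history of pneumonia; presents with two day history of cough."
note_output = [
    ClinicalCondition("Pneumonia", Temporality.HISTORICAL, Status.ABSENT),
    ClinicalCondition("Cough", Temporality.RECENT, Status.PRESENT),
]


def active_conditions(conditions):
    """Hypothetical helper: keep only conditions that are both present and recent."""
    return [
        c.name
        for c in conditions
        if c.status is Status.PRESENT and c.temporality is Temporality.RECENT
    ]


print(active_conditions(note_output))  # ['Cough']
```

Keeping status and temporality as separate fields is what lets a surveillance system count the cough while ignoring the pneumonia; a bare keyword hit on “pneumonia” would have counted the wrong finding.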
Three Methods for Interpreting Text • Keyword-based: NYC syndromic macros, e.g., if “cough*” or “wheez*” → Respiratory • Symbolic: semantics, syntax, discourse, e.g., stomach cramp is a type of abdominal pain • Statistical: e.g., P(localized infiltrate | anatomic location = lower lobe, finding = hazy opacity) = 0.96
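The keyword-based approach can be sketched in a few lines of Python. Only the Respiratory patterns “cough*” and “wheez*” come from the slide; the remaining patterns and the `SYNDROME_PATTERNS` table are hypothetical stand-ins, not the actual NYC macros. Shell-style wildcards are matched with the standard library's `fnmatch.fnmatchcase`.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical macro table. Only "cough*" and "wheez*" come from the slide;
# the other patterns are illustrative placeholders.
SYNDROME_PATTERNS = {
    "Respiratory": ["cough*", "wheez*", "sob"],
    "Gastrointestinal": ["vomit*", "diarrh*", "nause*"],
}


def classify(chief_complaint):
    """Return the sorted list of syndromes whose patterns match any token."""
    # Lowercase and split on anything that is not a letter ("cough/sob" -> ["cough", "sob"]).
    tokens = [t for t in re.split(r"[^a-z]+", chief_complaint.lower()) if t]
    matches = {
        syndrome
        for syndrome, patterns in SYNDROME_PATTERNS.items()
        if any(fnmatchcase(token, pattern) for token in tokens for pattern in patterns)
    }
    return sorted(matches)


print(classify("Cough/headache"))      # ['Respiratory']
print(classify("wheezing, vomiting"))  # ['Gastrointestinal', 'Respiratory']
print(classify("n/v/d"))               # []
```

The last call shows the weakness of pure keyword matching: the abbreviation “n/v/d” (nausea/vomiting/diarrhea) matches nothing unless the patterns anticipate it, which is one reason chief complaints usually need normalization before classification.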
Processing Chief Complaints: Challenges. Substantial word variation: • Synonyms: short of breath → dyspnea; coughing → cough; coughs → cough • Abbreviations: ha → headache; abd → abdominal; gx → ground transportation • Acronyms: n/v → nausea/vomiting; sob → shortness of breath • Truncations: diar → diarrhea; poss → possible • Concatenations: blurredvision → blurred vision; flus sxs → flu symptoms • Misspellings & typographic errors: nausa → nausea; diahrea → diarrhea
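A normalization step for these variations might combine a lookup table, a concatenation splitter, and fuzzy spelling correction. The Python sketch below is one minimal way to do that, under stated assumptions: the `EXPANSIONS` entries come from the slide's examples, but the `VOCABULARY` set and the 0.75 similarity cutoff are hypothetical choices, and the standard library's `difflib.get_close_matches` stands in for a real clinical spelling corrector.

```python
import difflib
import re

# Lookup table built from the slide's examples (synonyms, abbreviations,
# acronyms, truncations); a real system would need a much larger,
# locally curated dictionary.
EXPANSIONS = {
    "coughing": "cough", "coughs": "cough",
    "ha": "headache", "abd": "abdominal", "gx": "ground transportation",
    "n/v": "nausea/vomiting", "sob": "shortness of breath",
    "diar": "diarrhea", "poss": "possible",
    "flus": "flu", "sxs": "symptoms",
}

# Hypothetical vocabulary of correctly spelled terms, used to split
# concatenations and to correct misspellings.
VOCABULARY = {"blurred", "vision", "nausea", "diarrhea", "cough",
              "fever", "headache", "vomiting", "pain"}


def split_concatenation(token):
    """Split a run-together token into two vocabulary words, if possible."""
    for i in range(1, len(token)):
        left, right = token[:i], token[i:]
        if left in VOCABULARY and right in VOCABULARY:
            return f"{left} {right}"
    return None


def normalize_token(token):
    if token in EXPANSIONS:
        return EXPANSIONS[token]
    if "/" in token:
        # Normalize each side of a slash-separated complaint such as "cough/sob".
        return "/".join(normalize_token(part) for part in token.split("/"))
    if token in VOCABULARY:
        return token
    joined = split_concatenation(token)
    if joined:
        return joined
    # Fall back to fuzzy matching for misspellings (cutoff chosen arbitrarily).
    close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.75)
    return close[0] if close else token


def normalize_chief_complaint(text):
    # Keep letters and slashes; drop other punctuation such as commas.
    return " ".join(normalize_token(t) for t in re.findall(r"[a-z/]+", text.lower()))


print(normalize_chief_complaint("Nausa, diahrea"))  # nausea diarrhea
print(normalize_chief_complaint("blurredvision"))   # blurred vision
print(normalize_chief_complaint("poss flus sxs"))   # possible flu symptoms
```

Order matters in `normalize_token`: exact dictionary lookups run first so that short abbreviations such as “ha” are never “corrected” by the fuzzy matcher into an unrelated word.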
Processing Notes: Challenges. Notes contain linguistically complex narration: • Linguistic variation • Polysemy • Negation • Contextual information • Implication • Coreference
Negation: approximately half of all clinical concepts in dictated reports are negated. • Explicit absence: “The mediastinum is not widened” → mediastinal widening: absent • Implied absence: “Lungs are clear upon auscultation” → rales/crackles: absent; rhonchi: absent; wheezing: absent • Uncertainty
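Explicit negation is often handled with trigger phrases and a limited scope, as in NegEx. The Python sketch below is only NegEx-inspired, not NegEx itself: the short trigger list, the five-token window, the termination terms, and the `IMPLIED_ABSENCE` table are hypothetical simplifications, and uncertainty is not handled at all.

```python
import re

# Hypothetical, abbreviated trigger lists; NegEx uses a much larger curated lexicon.
PRE_NEGATION = ["no evidence of", "negative for", "denies", "without", "no", "not"]
TERMINATION = {"but", "however", "although", "presents"}
WINDOW = 5  # max tokens between trigger and concept (an assumption for this sketch)

# Hypothetical table mapping a normal-exam phrase to the findings it rules out.
IMPLIED_ABSENCE = {"lungs are clear": ["rales/crackles", "rhonchi", "wheezing"]}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def is_negated(sentence, concept):
    """True if a negation trigger precedes the concept within the window."""
    tokens, target = tokenize(sentence), tokenize(concept)
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] != target:
            continue
        # Walk backward from the concept; a termination term closes the scope.
        for back in range(start - 1, max(-1, start - 1 - WINDOW), -1):
            if tokens[back] in TERMINATION:
                break
            for trigger in PRE_NEGATION:
                words = trigger.split()
                if tokens[back:back + len(words)] == words:
                    return True
    return False


def implied_absent_findings(sentence):
    """Findings implied absent by a normal-exam phrase in the sentence."""
    text = " ".join(tokenize(sentence))
    found = []
    for phrase, findings in IMPLIED_ABSENCE.items():
        if phrase in text:
            found.extend(findings)
    return found


print(is_negated("The mediastinum is not widened", "widened"))        # True
print(is_negated("Denies fever but reports cough", "cough"))          # False
print(implied_absent_findings("Lungs are clear upon auscultation"))
```

The second call shows why termination terms matter: without “but” closing the scope, “denies” would wrongly negate the cough.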
Contextual Information • Temporality: “Three-day history of cough”, “Past history of pneumonia” • Finding validation: “She received her influenza vaccine”, “His temperature was taken in the ED” • Hypothetical conditions: “He should return for fever”
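Temporality and hypothetical conditions can be approached with the same trigger-and-window idea used for negation, in the spirit of trigger-based context algorithms such as ConText. The Python sketch below is a loose illustration under stated assumptions: the trigger lists, the six-token window, and the default label “recent” are all hypothetical, and finding validation (for example, a vaccine mention versus an actual influenza case) is not handled.

```python
import re

# Hypothetical trigger lists; a real system would use a curated lexicon.
TRIGGERS = {
    "historical": ["past history of", "previous", "prior"],
    "hypothetical": ["return for", "should return", "if"],
}
WINDOW = 6  # tokens to look back from the concept (arbitrary choice)


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def context_label(sentence, concept):
    """Label the first mention of concept as historical, hypothetical, or recent.

    Returns None if the concept does not occur in the sentence.
    """
    tokens, target = tokenize(sentence), tokenize(concept)
    for start in range(len(tokens) - len(target) + 1):
        if tokens[start:start + len(target)] != target:
            continue
        # Pad with spaces so triggers only match whole words.
        window = " " + " ".join(tokens[max(0, start - WINDOW):start]) + " "
        for label, phrases in TRIGGERS.items():
            if any(f" {phrase} " in window for phrase in phrases):
                return label
        return "recent"  # default when no trigger precedes the concept
    return None


print(context_label("Three-day history of cough", "cough"))     # recent
print(context_label("Past history of pneumonia", "pneumonia"))  # historical
print(context_label("He should return for fever", "fever"))     # hypothetical
```

Note that “history of” alone is deliberately not a historical trigger here, because “three-day history of cough” describes a recent complaint; this ambiguity is one reason temporality is listed as an open research need.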
Performance Using This Data. Chief Complaints: Identifying Syndromic Cases • Seven studies (Beitel, Chapman, Espino, Gesteland, Ivanov), one of them on a pediatric population • Reference standards: ICD-9 discharge diagnoses; physician review of ED reports • Eight syndromic definitions • Five febrile syndromic definitions
Febrile syndromes: sensitivity 0%–12% (Chapman and Dowling, J ISDS, 2007)
Ambulatory Notes • Triage notes (NC-Detect): EMT-P + NegEx; performs well at identifying clinical conditions • ED reports: better case detection than chief complaints; Topaz (Chapman), MCVS (Elkin), MedLEE (Friedman, South)
Inpatient Notes: Chest Radiograph Reports • Pneumonia (> 90% sensitivity and specificity): SymText (Fiszman and Chapman), MedLEE (Friedman and Hripcsak), MCVS (Elkin) • Widened mediastinum: IPS System (Chapman) • Tuberculosis (Hripcsak): MedLEE (Hripcsak)
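The sensitivity and specificity figures on these slides follow the standard definitions against a reference standard (such as ICD-9 discharge diagnoses or physician review): sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The short Python sketch below computes both; the counts are invented for illustration and are not results from any of the cited studies.

```python
def sensitivity(tp, fn):
    """Proportion of true cases the system flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-cases the system correctly leaves unflagged: TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts against a reference standard; made-up numbers,
# not results from any study.
tp, fn, tn, fp = 45, 5, 180, 20
print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # sensitivity = 0.90
print(f"specificity = {specificity(tn, fp):.2f}")  # specificity = 0.90
```

Read this way, a febrile-syndrome sensitivity of 0%–12% means the chief-complaint classifiers caught at most roughly one true case in eight.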
Identifying Syndromic Cases from Textual Notes: chief complaints (CC) vs. full-text record for influenza-like illness (South et al.)
Identifying Epidemiological Factors from Clinical Notes (Gundlapalli et al.)
Chief complaints • Moderate performance at identifying syndromic cases • Poor performance at identifying specific syndromes. Textual notes • Good performance at identifying syndromic cases • Ability to identify specific conditions • Ability to identify epidemiological factors
Where do we go from here? Identifying cases: • Most work has been on chief complaints • Current emphasis is on reports • Need better algorithms and more research, e.g., on temporality and other contextual information. Conveying information: • Little, if any, applied work on characterizing outbreaks and conveying information to public health
Conclusion • Data in clinical texts are useful for biosurveillance • Chief complaints: the most frequently used data source, but with poor to moderate performance • Clinical notes: promise better performance, but the text is more complicated, timeliness depends on the institution, and applications are in the early stages of development and evaluation • Need to develop more applications that apply NLP to outbreak characterization
Thank You. Wendy W. Chapman: wec6@pitt.edu • Biomedical Language Understanding Lab: www.dbmi.pitt.edu/blulab • Chapter on NLP for biosurveillance to appear in Infectious Disease Informatics and Biosurveillance: Research, Systems, and Case Studies