
A Review of Natural Language Processing for Biosurveillance



Presentation Transcript


  1. A Review of Natural Language Processing for Biosurveillance Wendy W. Chapman, PhD Biomedical Language Understanding University of Pittsburgh Dept of Biomedical Informatics

  2. Current Surveillance New strain of H5N1 Avian influenza Cough 咳嗽 Respiratory

  3. Leverage More Data for Surveillance Respiratory Patients

  4. Biosurveillance • Bioterrorist threats: detect attacks • Natural disease outbreaks: detect outbreaks • Disaster management: understand situation Much of the useful data is in textual format, so natural language processing is needed

  5. Textual Data Sources Non-clinical Clinical

  6. Internet Mapping of Outbreaks • HealthMap • Global Health Monitor • Global Public Health Intelligence Network (GPHIN)—Public Health Agency of Canada

  7. Clinical Data for Biosurveillance Textual clinical data → Text processor → structured output • Pneumonia: yes, history • Cough: yes, 3 days • Fever: yes, 3 days

  8. Clinical Data for Biosurveillance What types of data are available? How do we transform the data? How well can we process the data?

  9. What types of data are available? Trade-off

  10. Chief Complaints Content • Patient's reason for seeking care • 1-2 symptoms “Cough/headache” “n/v/d” “Motor vehicle accident” Useful for early detection of larger outbreaks Timeliness X Registration Physician Exam Discharge Home

  11. Ambulatory & Inpatient Notes Content • Clinical: symptoms, findings, medications, allergies, diagnoses, chronic conditions • Epidemiological: risk factors, travel history, homelessness, duration of illness, exposure to contacts Useful for targeted case detection, disease surveillance, and situational awareness Timeliness X Registration Physician Exam Discharge Home

  12. Discharge Reports Content • Clinical: reason for hospitalization, summary of care, findings, procedures performed, plan for follow-up • Death: cause, time of death Most detailed but least timely; potentially useful for situational awareness Timeliness X Registration Physician Exam Discharge Home

  13. How do we transform the data? Text Processing • Chief complaints → text processor → classify → syndrome category (“cough/sob” → Respiratory) • Textual notes → text processor → extract → clinical conditions (“No past history of pneumonia - presents with two day history of cough.” → Pneumonia: historical, absent; Cough: recent, present)

  14. Three Methods for Interpreting Text • Keyword-based • NYC Syndromic Macros • If “cough*” or “wheez*” → Respiratory • Symbolic • Semantics, syntax, discourse • stomach cramp is a type of abdominal pain • Statistical • P(localized infiltrate | anatomic location = lower lobe, finding = hazy opacity) = 0.96
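The keyword-based method on the slide above can be sketched in a few lines of Python. This is only an illustration of the "cough*" or "wheez*" → Respiratory rule; it is not the NYC Syndromic Macros themselves, and the `RULES` table and `classify` function are hypothetical names introduced here.

```python
import re

# Hypothetical rule table. Only the Respiratory rule ("cough*" or "wheez*")
# comes from the slide; a real system would carry many more syndromes.
RULES = {
    "Respiratory": [r"\bcough\w*", r"\bwheez\w*"],
}


def classify(chief_complaint: str) -> list[str]:
    """Return every syndrome whose keyword patterns match the complaint."""
    text = chief_complaint.lower()
    return [
        syndrome
        for syndrome, patterns in RULES.items()
        if any(re.search(pattern, text) for pattern in patterns)
    ]


if __name__ == "__main__":
    for complaint in ["Cough/headache", "wheezing x 2 days", "Motor vehicle accident"]:
        print(complaint, "->", classify(complaint))
```

A rule table like this is easy to audit, but it only matches the spellings it lists, which is why the word variation discussed on the next slide matters.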

  15. Processing Chief Complaints: Challenges Substantial word variation • Synonyms • short of breath → dyspnea • coughing → cough • coughs → cough • Abbreviations • ha → headache • abd → abdominal • gx → ground transportation • Acronyms • n/v → nausea/vomiting • sob → shortness of breath • Truncations • diar → diarrhea • poss → possible • Concatenations • blurredvision → blurred vision • flus sxs → flu symptoms • Misspellings & typographic errors • nausa → nausea • diahrea → diarrhea
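One simple way to handle the word variation listed above is a lookup table applied before classification. The sketch below is an assumption about how such a step might look, not a description of any system named in the talk; the `VARIANTS` table reuses the slide's examples, and `normalize` is a hypothetical helper.

```python
import re

# Variant -> normalized form, taken from the examples on the slide.
# A production table would be far larger and locally maintained.
VARIANTS = {
    "ha": "headache",
    "abd": "abdominal",
    "n/v": "nausea/vomiting",
    "sob": "shortness of breath",
    "diar": "diarrhea",
    "poss": "possible",
    "blurredvision": "blurred vision",
    "flus sxs": "flu symptoms",
    "nausa": "nausea",
    "diahrea": "diarrhea",
    "coughing": "cough",
    "coughs": "cough",
}

# Longest variants first so "flus sxs" is tried before shorter entries.
# The lookarounds stop "ha" from matching inside a longer word.
_PATTERN = re.compile(
    r"(?<![a-z])(?:"
    + "|".join(re.escape(v) for v in sorted(VARIANTS, key=len, reverse=True))
    + r")(?![a-z])"
)


def normalize(chief_complaint: str) -> str:
    """Lower-case the complaint and expand every known variant."""
    text = chief_complaint.lower().strip()
    return _PATTERN.sub(lambda match: VARIANTS[match.group(0)], text)


if __name__ == "__main__":
    for raw in ["HA/diar", "poss sob", "n/v and nausa"]:
        print(raw, "->", normalize(raw))
```

A fixed table cannot anticipate new misspellings, so in practice it would likely be paired with some form of approximate string matching.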

  16. Processing Notes—Challenges Contain linguistically complex narrations • Linguistic variation • Polysemy • Negation • Contextual information • Implication • Coreference

  17. Negation Approximately half of all clinical concepts in dictated reports are negated • Explicit absence “The mediastinum is not widened” • Mediastinal widening: absent • Implied absence “Lungs are clear upon auscultation” • Rales/crackles: absent • Rhonchi: absent • Wheezing: absent • Uncertainty

  18. Contextual Information • Temporality • Three-day history of cough • Past history of pneumonia • Finding Validation • She received her influenza vaccine • His temperature was taken in the ED • Hypothetical conditions • He should return for fever
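Explicit negation and past-history cues can be approximated by looking for trigger words shortly before a concept. The sketch below is loosely in the spirit of trigger-based approaches; it is not NegEx or any other system named in this talk, and the trigger lists, the five-token window, and the `annotate` function are all assumptions made for illustration. It handles single-word concepts only and cannot detect implied absence such as "Lungs are clear upon auscultation".

```python
import re

# Hypothetical trigger lists; real systems use much larger, curated lists.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}
HISTORICAL_TRIGGERS = {"past"}
TERMINATORS = {"but", "however", "presents"}  # words that end a trigger's scope
WINDOW = 5  # number of preceding tokens inspected (an arbitrary choice)


def annotate(sentence: str, concepts: list[str]) -> dict[str, dict[str, bool]]:
    """Flag each single-word concept found in the sentence as negated and/or historical."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    results = {}
    for concept in concepts:
        if concept not in tokens:
            continue
        idx = tokens.index(concept)
        scope = tokens[max(0, idx - WINDOW):idx]
        # Drop everything up to and including the last terminator in the window.
        for i in range(len(scope) - 1, -1, -1):
            if scope[i] in TERMINATORS:
                scope = scope[i + 1:]
                break
        results[concept] = {
            "negated": any(token in NEGATION_TRIGGERS for token in scope),
            "historical": any(token in HISTORICAL_TRIGGERS for token in scope),
        }
    return results


if __name__ == "__main__":
    text = "No past history of pneumonia; presents with two day history of cough."
    print(annotate(text, ["pneumonia", "cough"]))
```

On the example sentence from the text-processing slide, pneumonia comes out negated and historical while cough comes out as neither, which matches the interpretation given there.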

  19. Performance using this data Chief Complaints: Identifying Syndromic Cases • Seven studies • One on pediatric population • Beitel, Chapman, Espino, Gesteland, Ivanov • Reference standards • ICD-9 discharge diagnoses • Physician review of ED reports • Eight syndromic definitions • Five febrile syndromic definitions

  20. [Figure; only the data labels survive: 77, 75, 74, 72, 60, 46, 34, 31, 31, 39, 30, 22, 27, 10]

  21. Febrile Syndromes Sensitivity 0% – 12% Chapman and Dowling, J ISDS, 2007

  22. Ambulatory Notes Triage Notes NC-Detect • EMT-P + NegEx • Performs well at identifying clinical conditions ED Reports Better case detection than chief complaints • Topaz (Chapman) • MCVS (Elkin) • MedLEE (Friedman, South)

  23. Inpatient Notes Chest radiograph reports Pneumonia: > 90% sensitivity and specificity • SymText (Fiszman and Chapman) • MedLEE (Friedman and Hripcsak) • MCVS (Elkin) Widened mediastinum • IPS System (Chapman) Tuberculosis (Hripcsak) • MedLEE (Hripcsak)
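The sensitivity and specificity figures quoted on these slides compare a system's output against a reference standard such as physician review. A minimal sketch of that calculation follows; the function name and the example labels are made up for illustration and do not reproduce any study's data.

```python
def sensitivity_specificity(predicted: list[bool], reference: list[bool]) -> tuple[float, float]:
    """Compare system output with a reference standard, case by case."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)          # true positives
    fn = sum(1 for p, r in pairs if not p and r)      # missed cases
    tn = sum(1 for p, r in pairs if not p and not r)  # true negatives
    fp = sum(1 for p, r in pairs if p and not r)      # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("reference standard needs both positive and negative cases")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Made-up labels for five reports: True means "pneumonia present".
    system = [True, True, False, False, True]
    gold = [True, True, True, False, False]
    sens, spec = sensitivity_specificity(system, gold)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity is the share of true cases the system finds, and specificity is the share of non-cases it correctly leaves alone.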

  24. Identifying Syndromic Cases from Textual Notes CC vs full text record for Influenza-like Illness South et al.

  25. Identifying Epidemiological Factors from Clinical Notes Gundlapalli et al.

  26. Chief complaints • Moderate performance at identifying syndromic cases • Poor performance at identifying specific syndromes Textual notes • Good performance at identifying syndromic cases • Ability to identify specific conditions • Ability to identify epidemiological factors

  27. Where do we go from here? Identifying cases • Most work on chief complaints • Current emphasis on reports • Need better algorithms and more research • Temporality and other contextual information Conveying information • Little if any applied work on characterizing outbreaks and conveying information to public health

  28. Conclusion • Data in clinical texts are useful for biosurveillance • Chief complaints most frequently used data source • Poor to moderate performance • Clinical notes promise better performance • More complicated text • Timeliness dependent on institution • Early stages of development and evaluation • Need to develop more applications applying NLP to characterization

  29. Thank You Wendy W. Chapman: wec6@pitt.edu Biomedical Language Understanding Lab www.dbmi.pitt.edu/blulab Chapter on NLP for Biosurveillance to appear in Infectious Disease Informatics and Biosurveillance: Research, Systems, and Case Studies
