
Selen Bozkurt, PhD Stanford University, Biomedical Data Science, Biomedical Informatics Research

An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. S63: NLP for Phenotyping.

Presentation Transcript


  1. An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. S63: NLP for Phenotyping. Selen Bozkurt, PhD, Stanford University, Biomedical Data Science, Biomedical Informatics Research

  2. Disclosure
     • We have NO relevant relationships with commercial interests to disclose.
     • Acknowledgement: Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA183962. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
     • Authors: Selen Bozkurt, PhD; Jung In Park, PhD, RN; Kathleen Mary Kan, MD; Michelle Ferrari, RN; Daniel L Rubin, MD, MS; James D Brooks, MD; Tina Hernandez Boussard, PhD
     AMIA 2018 | amia.org

  3. Learning Objectives
     • After participating in this session, the learner should better understand:
     • An NLP framework for automatic identification of Prostate Cancer Quality Metrics
     • How a rule-based approach, enriched with terms learned from the corpus using distributional semantics algorithms, might be used to obtain patient-centered outcomes
     • Prostate Cancer Quality Metrics
     • Patient-Centered Outcome Phenotypes

  4. Introduction
     • Prostate Cancer Quality Metrics
       • Patient-reported outcomes (e.g., urinary incontinence, erectile dysfunction)
       • Quality of life measures (global mental and physical health)
       • Digital rectal exam (DRE) as a pre-treatment assessment
         • Pretreatment process quality measure
         • Documentation within 6 months prior to initial treatment
     • DRE as a quality metric for prostate cancer treatment
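
The "documentation within 6 months prior to initial treatment" criterion can be sketched as a simple date check. This is only an illustration: the function name, the inclusive boundaries, and the approximation of 6 months as 183 days are assumptions, since the slides do not define the window precisely.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as 183 days; the slides give no exact definition.
WINDOW = timedelta(days=183)


def dre_documented_in_window(dre_note_dates, treatment_date, window=WINDOW):
    """Return True if any DRE note is dated within `window` before treatment.

    Both boundaries are treated as inclusive (an assumption for illustration).
    """
    start = treatment_date - window
    return any(start <= note_date <= treatment_date for note_date in dre_note_dates)


# Hypothetical patient: one DRE note about four months before initial treatment.
print(dre_documented_in_window([date(2016, 1, 10)], date(2016, 5, 1)))
```

Notes dated after the treatment date, or more than 183 days before it, do not count toward the metric in this sketch.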

  5. Motivation
     • DRE
       • Not recorded systematically or included in billing or claims datasets
       • Assessing documentation is limited to labor-intensive approaches such as manual chart review
       • Well documented in the patient record
     • Report 1: "DRE: Moderately enlarged, smooth, symmetric prostate without any induration or nodularity"
     • Report 2: "If he does decide to undergo active surveillance, he will need frequent PSA checks (up to every 3 months), frequent rectal examinations and possibly rebiopsy in the future."
     • Report 3: "At the time of his diagnosis his rectal examination showed no abnormalities. Rectal: Normal perianal skin, good sphincter tone and normal rectal mucosa. Prostate is moderately enlarged and without nodularity or induration."

  6. Assessing Prostate Cancer QM from EHRs
     • Problem: Documented in clinical narratives. Solution: Natural Language Processing
     • Problem: Lack of labeled data. Solution: Rule-based approaches
     • Problem: Terminology. Solution: Ontologies, lexicons, dictionaries
     • Problem: Dictionary development. Solution: Domain knowledge, manual development, or distributional semantics

  7. Research Questions and Goals
     • Can we extract DRE documentation from clinical notes?
       • Develop an NLP solution
     • Can we follow up DRE documentation across the entire database?
       • Integrate the NLP solution into the research database and follow up documentation
     • [Diagram. Aim 1: EHR unstructured data, NLP pipeline, evaluation. Aim 2: EHR structured data plus NLP output.]

  8. Method: Data Source
     • The Stanford prostate cancer research database
     • Data were linked to the California Cancer Registry
     • From 2005 to February 9, 2018
     • ICD diagnostic codes: ICD-9-CM 185 and ICD-10-CM C61
     Reference: Seneviratne, M. G., Seto, T., Blayney, D. W., Brooks, J. D., & Hernandez-Boussard, T. (2018). Architecture and Implementation of a Clinical Research Data Warehouse for Prostate Cancer. eGEMs (Generating Evidence & Methods to improve patient outcomes), 6(1).
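
As a rough illustration of selecting patients by the diagnostic codes on this slide, the sketch below checks a list of (coding system, code) pairs. The data layout and function name are hypothetical; the slides do not describe how the research database stores diagnoses.

```python
# The two codes listed on the slide: ICD-9-CM 185 and ICD-10-CM C61.
PROSTATE_CANCER_CODES = {("ICD-9-CM", "185"), ("ICD-10-CM", "C61")}


def has_prostate_cancer_code(diagnoses):
    """Return True if any (system, code) pair matches a prostate cancer code.

    `diagnoses` is a hypothetical list of (coding_system, code) tuples.
    Codes are normalized for whitespace and letter case before comparison.
    """
    return any(
        (system, code.strip().upper()) in PROSTATE_CANCER_CODES
        for system, code in diagnoses
    )


print(has_prostate_cancer_code([("ICD-10-CM", "c61"), ("ICD-10-CM", "I10")]))
```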

  9. Method: Data Set
     • All database, ICD-9-CM 185 or ICD-10-CM C61, from Jan 1, 2005 to Mar 30, 2017: N = 15,834
     • 7,443 excluded for not receiving initial treatment for prostate cancer at our hospital: N = 8,391; 458,339 notes
     • 1,038 with missing notes and note dates: N = 7,353
     • Development + test set, dictionary creation: N = 301
       • Development set: N = 101, 101 notes
       • Test set: N = 200, 200 notes
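
The cohort counts on this slide can be checked with a few lines of arithmetic. The variable names are illustrative; the numbers are taken directly from the slide.

```python
# Counts as reported on the slide.
all_patients = 15_834
excluded_no_initial_treatment = 7_443
missing_notes_or_dates = 1_038
development_set, test_set = 101, 200

# Patients who received initial treatment at the hospital.
treated = all_patients - excluded_no_initial_treatment
# Patients remaining after dropping those with missing notes and note dates.
with_notes = treated - missing_notes_or_dates
# Patients in the manually annotated sample.
annotated = development_set + test_set

print(treated, with_notes, annotated)
```

The results agree with the N values shown in the flow (8,391, 7,353, and 301).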

  10. Method: Development and Test Set
     • Development + test set: N = 301
       • Development set: N = 101, 101 notes
       • Test set: N = 200, 200 notes
     • Labels: Hypothetical, Deferred, Refused, Examined, Historical
     • Manually annotated at the sentence level by two domain experts; inter-rater reliability: Cohen’s κ = 0.97
     • Development set: development process, error analysis, and adjustments
     • Test set: precision, recall, and F-score
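
For readers unfamiliar with the evaluation measures named on this slide, here is a minimal sketch of Cohen's κ for two annotators and of precision, recall, and F-score for one label. The toy annotations are invented for illustration and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall, and F1 for a single label treated as positive."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
    fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
    fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented sentence-level labels from two hypothetical annotators.
annotator_1 = ["Examined", "Examined", "Deferred", "Refused"]
annotator_2 = ["Examined", "Examined", "Deferred", "Examined"]
print(cohens_kappa(annotator_1, annotator_2))
print(precision_recall_f1(annotator_1, annotator_2, positive="Examined"))
```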

  11. Method: The Proposed Pipeline
     • Terms list creation (Output 1: proposed terms list for DRE findings)
       • Ontologies from NCBO
       • Experts’ domain knowledge
       • Learning vector space representations of words and phrases in clinical notes
     • Pre-processing
       • Sentence splitter
       • Tokenization
       • Named entity tagging
       • Stop word, numbers, punctuation removal
     • Revised-CONTEXT tagger
       • Key term mapping
       • Negation
       • Temporality
     • Words and phrases
     • Rule-based information extraction (Output 2)
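
The general shape of such a pipeline (split sentences, map key terms, then apply context modifiers) can be sketched as follows. This is a simplified stand-in and not the Revised-CONTEXT tagger from the talk: the regular expression, the cue lists, and the fallback to "Examined" are all assumptions made for illustration, and cues are matched by plain substring search rather than by scoped rules.

```python
import re

# Hypothetical key-term pattern for DRE mentions (not the study's dictionary).
DRE_PATTERN = re.compile(
    r"\b(?:digital\s+)?rectal\s+exam(?:ination)?s?\b|\bDRE\b", re.IGNORECASE
)

# Hypothetical modifier cues, checked in this order; the first match wins.
MODIFIER_CUES = {
    "Refused": ["refused", "declined"],
    "Deferred": ["deferred"],
    "Hypothetical": ["will need", "recommend"],
    "Historical": ["at the time of", "history of", "previous"],
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def classify_sentence(sentence):
    """Return a label for a sentence that mentions a DRE, else None."""
    if not DRE_PATTERN.search(sentence):
        return None
    lowered = sentence.lower()
    for label, cues in MODIFIER_CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "Examined"  # assumption: an unmodified mention counts as examined


def extract_dre_mentions(note):
    """Return (sentence, label) pairs for every DRE-bearing sentence."""
    results = []
    for sentence in split_sentences(note):
        label = classify_sentence(sentence)
        if label is not None:
            results.append((sentence, label))
    return results


note = (
    "DRE: Moderately enlarged, smooth, symmetric prostate without any induration. "
    "He will need frequent PSA checks and frequent rectal examinations. "
    "At the time of his diagnosis his rectal examination showed no abnormalities. "
    "Patient refused rectal exam today. "
    "Follow up in 3 months."
)
for sentence, label in extract_dre_mentions(note):
    print(label, "|", sentence)
```

A production system would typically restrict each cue to a window around the key term and handle negation explicitly; the substring matching here is only meant to show the control flow.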

  12. Method: Dictionary Creation
     • An initial list of terms was generated based on domain knowledge
     • Matched with existing ontologies from NCBO
     • Candidate terms were identified using distributional information on words and phrases
       1) Bigrams and trigrams: rectal_exam, digital_rectal_exam
       2) Word2vec: skip-gram model, vector length 100, context window width of 5
     • The final list of terms was reviewed by the domain experts
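
To make the two distributional steps concrete, the sketch below joins frequent adjacent tokens into phrases such as rectal_exam and then generates the (center, context) pairs that a skip-gram model is trained on. It is a simplified illustration, not the authors' implementation: the raw-count threshold stands in for whatever phrase-scoring method was actually used, the toy corpus is invented, and no embedding model is trained here.

```python
from collections import Counter


def merge_frequent_bigrams(sentences, min_count=2):
    """Join adjacent token pairs seen at least `min_count` times with '_'.

    Assumption: a raw frequency threshold, scanned greedily left to right.
    """
    counts = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    merged = []
    for sent in sentences:
        out, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and counts[(sent[i], sent[i + 1])] >= min_count:
                out.append(sent[i] + "_" + sent[i + 1])
                i += 2
            else:
                out.append(sent[i])
                i += 1
        merged.append(out)
    return merged


def skipgram_pairs(tokens, window=5):
    """Return (center, context) pairs within `window` tokens on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


# Invented toy corpus of tokenized sentences.
corpus = [
    ["digital", "rectal", "exam", "normal"],
    ["rectal", "exam", "deferred"],
    ["rectal", "exam", "refused"],
]
phrased = merge_frequent_bigrams(corpus)
print(phrased)
print(skipgram_pairs(phrased[0], window=1))
```

Running the merge step a second time on its own output is one way to obtain trigrams such as digital_rectal_exam, provided the pair is frequent enough. In practice a library such as gensim offers a `Word2Vec` class whose `sg`, `vector_size`, and `window` parameters correspond to the settings listed on the slide; whether the authors used it is not stated.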

  13. Method: Terms added to the ConText Modifier List

  14. Results: Accuracy Metrics

  15. Results: DRE Documentation Stats

  16. Conclusion
     • We built a rule-based NLP pipeline to follow DRE documentation
       • As a quality metric
       • As a patient-centered outcome
     • Could be expanded to other quality metrics and PCOs
     • With NLP techniques, it is feasible to accurately and efficiently identify and extract features associated with quality metrics

  17. Future Work
     • Testing our algorithms in another healthcare system to ensure their generalizability
     • Expanding to other quality metrics
     • The clinical terms used in our algorithms will be disseminated through a national repository (pheKB.org)

  18. Thank you!
     • Selen Bozkurt - selenb@stanford.edu
     • Tina Hernandez-Boussard - boussard@stanford.edu
     • Boussard Lab is looking for new postdocs.
