
SAS-Based NLP for Identifying Incident and Recurrent Malignancies

This research study explores the use of SAS-based Natural Language Processing (NLP) to identify incident and recurrent malignancies from pathology reports. The study uses the SCENT tool, which demonstrates favorable results in identifying and extracting information about primary and recurrent malignancies. SCENT has the potential to expedite chart reviews, improve measurement accuracy, and extract standardized disease scores and other clinical information.

wquimby




Presentation Transcript


  1. May 1, 2012 • 2012 HMORN Conference • Seattle, Washington
     Use of SAS-Based Natural Language Processing to Identify Incident and Recurrent Malignancies
     Justin A. Strauss, MA • Research Associate III • Kaiser Permanente Southern California

  2. Co-Authors
     • Chun R. Chao, PhD
     • Marilyn L. Kwan, PhD
     • Syed A. Ahmed, MD
     • Joanne E. Schottinger, MD
     • Virginia P. Quinn, PhD

  3. Acknowledgements & Funding
     • Mayra Martinez, Michelle McGuire, Melissa Preciado, Nirupa Ghai, and Jeff Slezak (KPSC); Lawrence Kushi (KPNC); Debra Ritzwoller (KPCO); Joan Warren (NCI); Jianyu Rao and Jiaoti Huang (UCLA).
     • Funding was provided by KPSC Community Benefit and the Cancer Research Network.

  4. Malignancy Identification
     • Malignancy identification is important for clinical and epidemiologic cancer research.
     • Incident and recurrent malignancy data within health plans are limited in quality and availability.
     • Incident malignancy data from cancer registries become available only after a delay.
     • Few registries track cancer recurrences.
     • Manual chart abstraction is slow and expensive.
     • Previous research has shown electronic diagnosis codes (e.g., ICD-9) to be unreliable.

  5. Natural Language Processing
     • Natural language processing (NLP) can be used to identify and extract information from electronic clinical text, including incident and recurrent malignancy data.
     • Increasing opportunity for NLP with adoption of electronic clinical systems in patient care delivery.
     • Despite its potential value in clinical and research settings, NLP usage has been relatively sparse. Contributing factors may include:
       • Technical complexity
       • Systems integration requirements
       • Habitual use of existing methods

  6. SCENT Overview
     • A SAS-based coding, extraction, and nomenclature tool (SCENT) was developed to identify incident and recurrent malignancies using text from pathology reports.
     • SCENT is currently being implemented in two research studies at Kaiser Permanente Southern California (KPSC):
       • Intervention to improve medication adherence among breast cancer patients.
       • Differences in the prognosis of prostate cancer patients according to their genetic factors.
     • Use of SAS programming minimizes implementation barriers and increases availability for multisite research.

  7. Description of Methods
     • SCENT identifies non-negated clinical concepts within pathology report text.
     • Built using SAS Base (does not require the Text Miner add-on).
     • Makes extensive use of SAS hash objects and regular expressions.
     • Includes components for preprocessing, matching, negation and uncertainty detection, extracting diagnostic information (e.g., staging and Gleason score), and classifying report malignancy status.
     • Flexibility to assign codes using a variety of coding systems.
     • Validation used a subset of SNOMED 3.x (~1000 concepts).
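The components listed on this slide can be illustrated with a minimal sketch. SCENT itself is written in SAS and its source is not shown in this transcript, so the Python below is only an illustration of the general technique (dictionary lookup, cue-based negation and uncertainty detection, regex score extraction), not the authors' implementation. The lexicon, the codes, the cue lists, the five-word lookback window, and all function names are hypothetical; a Python `dict` stands in for the role a SAS hash object might play.

```python
import re

# Hypothetical mini-lexicon: concept phrase -> illustrative code (NOT real SNOMED codes).
LEXICON = {
    "adenocarcinoma": "CONCEPT_ADENOCARCINOMA",
    "invasive ductal carcinoma": "CONCEPT_IDC",
}

# Hypothetical cue phrases; a real system would need far longer lists.
NEGATION_CUES = ("no evidence of", "negative for", "no", "without")
UNCERTAINTY_CUES = ("suspicious for", "cannot rule out", "possible")

# Matches patterns such as "Gleason score 3 + 4" or "Gleason 4+3".
GLEASON_RE = re.compile(r"gleason\s+(?:score\s+)?(\d)\s*\+\s*(\d)", re.IGNORECASE)


def preprocess(text):
    """Lowercase the report and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def split_sentences(text):
    """Naive sentence split on periods and semicolons (an assumption)."""
    return [s.strip() for s in re.split(r"[.;]+", text) if s.strip()]


def cue_before(prefix, cues):
    """True if any cue occurs within the last five words before the concept."""
    window = " ".join(prefix.split()[-5:])
    return any(re.search(r"\b" + re.escape(cue) + r"\b", window) for cue in cues)


def code_sentence(sentence):
    """Return (code, status) pairs for every lexicon phrase found in the sentence."""
    findings = []
    for phrase, code in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(phrase) + r"\b", sentence):
            prefix = sentence[:match.start()]
            if cue_before(prefix, NEGATION_CUES):
                status = "negated"
            elif cue_before(prefix, UNCERTAINTY_CUES):
                status = "uncertain"
            else:
                status = "affirmed"
            findings.append((code, status))
    return findings


def code_report(text):
    """Code a report and flag it as malignant if any concept is affirmed."""
    clean = preprocess(text)
    findings = []
    for sentence in split_sentences(clean):
        findings.extend(code_sentence(sentence))
    gleason = [(int(a), int(b), int(a) + int(b)) for a, b in GLEASON_RE.findall(clean)]
    malignant = any(status == "affirmed" for _, status in findings)
    return {"findings": findings, "gleason": gleason, "malignant": malignant}


if __name__ == "__main__":
    sample = ("Prostate, biopsy: Adenocarcinoma, Gleason score 3 + 4 = 7. "
              "No evidence of invasive ductal carcinoma.")
    print(code_report(sample))
```

Restricting the cue search to a short window inside the same sentence keeps a negation such as "no evidence of" from spilling over onto concepts mentioned elsewhere in the report. The window size is a tunable assumption.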

  8. SCENT Process Diagram

  9. Sample Report Coding
     [Figure panels: Preprocessed Text; Coded Text]

  10. Validation Study
     • To validate SCENT, trained chart abstractors reviewed electronic pathology reports.
     • Random samples of breast (n=400) and prostate (n=400) cancer patients.
     • Patients diagnosed at KPSC between 2000 and 2007.
     • Reports included from six months post-diagnosis through the end of 2008.
     • In total, 206 breast and 186 prostate cancer patients contributed 490 and 425 eligible reports, respectively.
     • SCENT classifications were compared with those of abstractors.

  11. Classification Concordance
     Note: incident contralateral breast malignancies were considered to be recurrences.
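The concordance table from this slide is not preserved in the transcript. As a rough sketch of how such a comparison could be tabulated, the Python below cross-tabulates abstractor labels against tool labels and computes simple percent agreement. The label names and the example data are made up for illustration and are not the study's results; the function names are hypothetical.

```python
from collections import Counter


def crosstab(reference_labels, predicted_labels):
    """Count (reference, predicted) label pairs, one pair per report."""
    if len(reference_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    return Counter(zip(reference_labels, predicted_labels))


def percent_agreement(table):
    """Fraction of reports on which both classifications agree."""
    total = sum(table.values())
    if total == 0:
        raise ValueError("empty table")
    agree = sum(n for (ref, pred), n in table.items() if ref == pred)
    return agree / total


if __name__ == "__main__":
    # Toy labels only; NOT data from the validation study.
    abstractor = ["recurrent", "none", "none", "primary"]
    tool = ["recurrent", "none", "recurrent", "primary"]
    table = crosstab(abstractor, tool)
    for (ref, pred), n in sorted(table.items()):
        print(f"abstractor={ref:<10} tool={pred:<10} n={n}")
    print("agreement:", percent_agreement(table))
```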

  12. SCENT Performance Metrics
     * Shown with Wilson's 95% confidence interval.
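The metric values from this slide are not preserved in the transcript, but the Wilson score interval named in the footnote is a standard method for a binomial proportion and can be sketched as follows. The function name and the example counts are hypothetical, and applying the interval to a proportion such as sensitivity is an assumption about how it was used here.

```python
import math


def wilson_interval(successes, n, z=1.959963984540054):
    """Wilson score interval for a binomial proportion.

    The default z is the standard normal quantile for a two-sided 95% interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Toy counts only; NOT results from the validation study.
    low, high = wilson_interval(45, 50)
    print(f"proportion = {45 / 50:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] and behaves reasonably when the observed proportion is close to 0 or 1, which is common for high-performing classifiers.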

  13. Conclusions
     • Favorable results suggest SCENT can identify and extract information about primary and recurrent malignancies from pathology reports.
     • Rapid cancer case identification.
     • Improved measurement accuracy of a common study endpoint.
     • SCENT has the potential to expedite chart reviews by narrowing the search and highlighting relevant concepts.
     • Generalized utility for extracting standardized disease scores and other clinical information.
     • SCENT is a proof of concept for SAS-based NLP that can be easily shared between institutions to support research.

  14. Limitations & Next Steps
     • SCENT has a number of limitations, including:
       • Unable to disambiguate and contextualize identified clinical concepts without part-of-speech (POS) tagging.
       • More susceptible to changes in text structure and increased linguistic variability than statistical NLP approaches.
       • General purpose NLP (e.g., cTAKES) likely to perform better outside of pathology.
     • Next steps include:
       • Release SCENT source code and requisite support files.
       • Optimize current functionality and assess feasibility of adding methods (e.g., POS tagging, n-grams, statistical classifiers).
       • Attempt to identify non-pathologically diagnosed malignancies using radiology reports and clinical progress notes.
       • Quantify cost savings associated with SCENT-assisted chart reviews.

  15. Questions?
