Using natural language processing to identify contralateral breast cancer events in patient data sets from EHR notes and surgical reports for improved surveillance and outcome measurement.
Contralateral Breast Cancer Event Detection Using Natural Language Processing
Session Title: NLP for Population Health Surveillance
Session Number: S05
Speaker: Zexian Zeng
Mentor: Yuan Luo
Northwestern University
Motivation – Identify Breast Cancer Outcome Measurement
A contralateral event is an outcome measure in breast cancer studies
• Contralateral breast cancer is defined as a solid tumor that develops in the opposite breast after detection of the first primary breast cancer
• A woman with a first primary breast cancer has a two- to six-fold increased risk of developing contralateral breast cancer compared with the general population
• Efforts have been devoted to studying the risk factors shared between the first and second primary breast cancers
Zexian Zeng (Northwestern), AMIA 2017, 11/05/2017
Motivation – Problems with Chart Review
Manual chart review is widely used
• Researchers still rely heavily on manual chart review to identify sub-cohorts with contralateral breast cancer
• The review process is error-prone, labor-intensive, and time-consuming, making it difficult to scale to large cohort studies
7,000 × 10 = 70,000 minutes; 70,000 / 60 / 7 ≈ 166.7 days
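The workload estimate can be checked with a few lines of arithmetic. This sketch reads the slide's figures as 7,000 charts at 10 minutes each and assumes a 7-hour review day, which is the value that reproduces 166.7 days; neither the units nor the day length are stated on the slide, and the variable names are illustrative.

```python
# Assumed reading of the slide: 7,000 charts, 10 minutes of review each.
charts = 7000
minutes_per_chart = 10
# Assumption: 7 hours of review per working day (not stated on the slide).
hours_per_day = 7

total_minutes = charts * minutes_per_chart
total_hours = total_minutes / 60
working_days = total_hours / hours_per_day

print(f"{total_minutes} minutes = {total_hours:.1f} hours = {working_days:.1f} working days")
```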
Motivation – Information in EHR
Electronic health records (EHR) contain abundant information
• The abundance of information available in EHRs makes phenotyping in large cohort studies achievable
• Because much of that information is in free text, natural language processing (NLP) is an indispensable tool for text mining
Objectives
• Develop a model using natural language processing and machine learning to identify contralateral events in a data set of breast cancer patients
Data Sources – Progress Notes
A patient's progress and clinical status are well recorded in the progress notes
• Progress notes serve to communicate opinions, findings, and plans between healthcare professionals
• Progress notes are readily and widely available
Data Sources – Surgical Pathology Report
The diagnostic procedure for breast cancer generates at least one pathology report
• Pathology reports contain anatomic site information
Methods – Workflow
Obtain a set of positive CUIs (UMLS Concept Unique Identifiers)
• Progress notes from 15 women with contralateral breast cancer were extracted and reviewed
• Sentences or partial sentences indicating the occurrence of contralateral breast cancer or events related to a cancer diagnosis were retrieved
• Sentences were annotated using MetaMap
• 42 CUIs were generated
Methods – Workflow
Generate features from progress notes
• Preprocess
  • Remove duplicate copies
  • Divide the notes into sentences
  • Remove non-English symbols
• Filter out terms that are
  • Negated
  • Not in the positive concept set
• Concept combination
  • Power sets
  • Combine any two or three CUIs extracted from the same sentence
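The steps on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the CUIs are made-up placeholders, MetaMap output is assumed to arrive as `(cui, negated)` pairs per sentence, "non-English symbols" is read as non-ASCII characters, and keeping single CUIs alongside the pairs and triples is an assumption. The function names are hypothetical.

```python
import re
from itertools import combinations

# Placeholder positive concept set; the study derived 42 CUIs with MetaMap.
POSITIVE_CUIS = {"C0000001", "C0000002", "C0000003"}

def preprocess(notes):
    """Drop duplicate notes, strip non-ASCII symbols, split into sentences."""
    sentences = []
    for note in dict.fromkeys(notes):  # removes exact duplicates, keeps order
        ascii_only = re.sub(r"[^\x00-\x7F]+", " ", note)
        # Naive splitter on ., ! and ? (a stand-in for a real sentence tokenizer).
        sentences.extend(s.strip() for s in re.split(r"[.!?]+", ascii_only) if s.strip())
    return sentences

def sentence_features(concepts, positive=POSITIVE_CUIS):
    """Build features from one sentence's (cui, negated) pairs."""
    # Filter: discard negated concepts and concepts outside the positive set.
    kept = sorted({cui for cui, negated in concepts if not negated and cui in positive})
    features = set(kept)  # assumption: single CUIs are kept as features too
    # Combine any two or three CUIs found in the same sentence.
    for size in (2, 3):
        features.update("+".join(combo) for combo in combinations(kept, size))
    return features

# One negated concept and one concept outside the positive set are filtered out.
example = [("C0000001", False), ("C0000002", False),
           ("C0000003", True), ("C0000009", False)]
print(sorted(sentence_features(example)))
```

Sorting the CUIs before combining makes each combination's name independent of the order in which concepts appear in the sentence, so the same pair always maps to the same feature.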
Methods – Workflow
An example illustrating how features are generated from progress notes
Methods – Workflow
Generate features from surgical pathology reports
Algorithm:
    New_feature = '0'
    If 'left' in at least one pathology report:
        If 'right' in at least one pathology report:
            New_feature = '1'
One new binary feature was derived, indicating whether the patient has pathology reports for both sides
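The pseudocode above translates almost directly into Python. In this sketch the function name is hypothetical, a patient's reports are assumed to be plain strings, and the case-insensitive substring match is an assumption about how 'left' and 'right' were detected.

```python
def has_bilateral_pathology(reports):
    """Return 1 if 'left' and 'right' each appear in at least one report, else 0."""
    texts = [report.lower() for report in reports]
    has_left = any("left" in text for text in texts)
    has_right = any("right" in text for text in texts)
    return int(has_left and has_right)

print(has_bilateral_pathology(["Left breast, core biopsy", "RIGHT breast, mastectomy"]))
print(has_bilateral_pathology(["Left breast, core biopsy"]))
```

As in the pseudocode, a single report that mentions both sides also sets the feature to 1.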
Methods – Workflow
Train model & evaluation
• Support Vector Machine (SVM)
  • Grid search was performed
• Evaluation
  • Baseline studies were performed
    • Combined MetaMap
    • Pathology Report Count
    • Positive Dictionary without Combination
    • Bag of Words
  • Five-fold cross validation
  • Held-out test
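A training and evaluation loop of this shape can be sketched with scikit-learn. Everything specific here is an assumption: the data are synthetic, the parameter grid and the 80/20 split are illustrative, and F1 is chosen as the scoring metric only for the example, since the slides do not state the values searched or the metric used.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the patient-by-feature matrix (not the study's data).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hypothetical grid; the slides do not list the values that were searched.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# Each parameter combination is scored with five-fold cross validation.
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
print("five-fold CV F1:", round(search.best_score_, 3))
# score() reuses the scoring metric above on the refit best model.
print("held-out F1:", round(search.score(X_test, y_test), 3))
```

Running the grid search on the training split only keeps the held-out score free of any influence from parameter selection.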
Results – Cross Validation
Five-fold cross validation results
Results – Cross Validation
Held-out test results
Results – Feature Study
Top-ranked features
Motivation – Discussions and Conclusions
Discussions and conclusions
• Patients with contralateral events usually have pathology reports for both breasts
• Progress notes do not mention every contralateral event
• Combining these two kinds of features improves performance
• The method can be replicated because feature generation is simple
Thank You!