180 likes | 188 Views
DeepPhe project focuses on advancing cancer phenotyping methods by combining information extraction, representation, and visualization from medical records. It enables high-throughput processing to annotate patient data comprehensively.
E N D
University of Pittsburgh Vanderbilt University Dana Farber Cancer Institute BCH Harry Hochheiser Olga Medvedeva Zhou Yuan Mike Davis Past: Rebecca Jacobson, MPI Girish Chavan Eugene Tseytlin Melissa Castine Adrian Lee John Kirkwood FrancesmaryModugno Guergana Savova, PI Sean Finan Timothy Miller Chen Lin David Harris James Masanz Elizabeth Buchbinder Jeremy Warner Alicia Beeghly-Fadiel DeepPhe NCI U24 CA132672 Cancer Deep Phenotyping from Electronic Medical Records (Savova, PI)
EHR Notes Background: eMERGE, PGRN, i2b2, SHARP Manual chart review HPI: 48 woman with right breast mammogram. Doc1, 0, 2, 3… Final diagnosis: Breast, Left, Needle Biopsy: Invasive Duct Carcinoma.. Doc2, 0, 0, 1… Patient currently on neoadjuvant therapy with Taxol. Due to cardiomyopathy, patient not candidate for Transtuzumab… Doc3, 1,4,0… Phenotype Label • positive/negative Automated chart review EHR Notes HPI: 48 woman with right breast mammogram. Final diagnosis: Breast, Left, Needle Biopsy: Invasive Duct Carcinoma.. Phenotype Label Patient currently on neoadjuvant therapy with Taxol. Due to cardiomyopathy, patient not candidate for Transtuzumab… • positive/negative Classifier
DeepPhe Project deepphe.healthnlp.org • Goal is to develop next generation cancer deep phenotyping methods • No longer dichotomization for a particular phenotype of interest • Rather, all phenotypes associated with a patient • Addresses information extraction but also representation and visualization • Support high throughput approach • process and annotate all data at multiple levels (from mention to phenotype) and across time • Combine IE with structured data (cancer registry) • Driven by translational research scientific goals as well as surveillance (SEER supplement)
NEOPLASM 1 Subject: Patient Infilatrating Ductal Carcinoma (+3d) Location: Right Breast Nuclear grade: 2 (+3d) ER: negative (+3d) PR: negative (+3d) Her2/Neu: positive (+3d) Size: 2.9 x 2.5 x 2.0 (+3d) Stage: T2N0M0 (interval, +3d, +119d) T2N0M1 (+656d) Lymphadenopathy: Negative, Clinical exam (+0d) Negative, MRI (+1d) Metastasis: Brain (+656d) TREATMENT 1: Neoadjuvant Chemotherapy Agents: Pacitaxel(interval, >+3d, >+119d) TREATMENT 2: Her2Neu MAB DISEASE 1: Subject: patient Cardiomyopathy NEOPLASM 2: Subject: mother Cancer ……. ……. DeepPhe HPI: 48 woman with right breast mammogram. Final diagnosis: Breast, Left, Needle Biopsy: Invasive Duct Carcinoma.. Patient currently on neoadjuvant therapy with Taxol. Due to cardiomyopathy, patient not candidate for Transtuzumab…
Architecture • Portability • Extensibility • Modularity • Ontology driven
DeepPhe Information Model Hochheiser, H., Castine, M., Harris, D., Savova G, Jacobson R. An information model for computable cancer phenotypes BMC Med Inform DecisMak. 2016 Sep 15;16(1):121. doi: 10.1186/s12911-016-0358-4
Invasive Ductal Carcinoma. 4.4 cm 58 yo F presents to the ER with slurred speech. DeepPhe NLP Pipeline Patient has triple negative breast cancer. Path ER Tumor is ER-, PR-, Her2-. deepPHE Tumor Phenotype Tumor Phenotype Tumor Phenotype
DeepPheGold Set (>500 documents) Cancer Registry ML/Annotators Evaluation OVCA BRCA MEL • Final contact < 12/31/13 • “Analytic” – all care within system <Ann/> <Ann/> <Ann/> ~165 Breast Cancer Notes ~165 Ovarian Cancer Notes ~165 Melanoma Notes All patients with known Cancer e.g. TNM, Stage, Metastasis Cancer Specific Annotations (DeepPhe) e.g. identifying temp rel of two events BRCA MEL OVCA • Select patients within 2 SD of mean # reports • Random Sample Text reports • Collect DS, RAD, PATH, PGN • Filter reports to specific windows of interest Temporal Relations (THYME) e.g. identifying chains relating tumor, mass and cancer Filter and Sample Clinical Coreference (ODIE) e.g. LocationOf UMLS Entity and Relation (SHARP) Deidentify and Preprocess
Discussion • Mention level • Word sense disambiguation, e.g. mass • Phenotype level • Error propagation, e.g. locations incorrectly linked to a tumor • Granularity levels, e.g. system extracts right nipple while gold standard has right breast • Multiple primaries, e.g. incorrect merging of tumors in left and right breast into one cancer (coreference)
Visualization • Display extracted and summarized information • At all levels of information model • Link summary items across levels • Provenance • Support verification and in-context interpretation of results • From high-level summary to individual mentions • Future – cohort searching and creation
Next • Application of the system to other types of cancers, including melanoma, lung cancer, and ovarian neoplasms • Episode classification • Treatment regimen extraction • Expansion of extracted information to include clinical genomic observations, such as somatic variants (e.g. BRAF status), or gene rearrangements (e.g. EML4/ALK) • Inclusion of structured EMR data (e.g. synoptic pathology reports)
Contact: Guergana.Savova@childrens.harvard.edu • Project Wiki http://deepphe.healthnlp.org • Code Repository https://github.com/DeepPhe/DeepPhe-Release • Publication in the Cancer Research Journal Savova, G., Tseytlin, E., Finan, S., Castine, M., Miller, T., Medvedeva, O., Haris, D., Hochheiser, H., Lin, C., Chavan, G., Jacobson R. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records Cancer Research 77(21), November 2017 DOI: 10.1158/0008-5472.CAN-17-0615.