This study focuses on developing a model to detect metastases in lung cancer pathology reports using a combination of dictionary-based and probabilistic methods, achieving high recall and precision rates. The research aims to improve cancer metastasis identification and prognosis prediction for better patient outcomes.
Identifying Metastases from Pathology Reports in Lung Cancer Patients • Ergin Soysal, MD, PhD; Jeremy L Warner, MD, MS; Joshua C Denny, MD, MS; Hua Xu, PhD • SBMI, Univ of Texas Health Sci Ctr at Houston • Department of Biomedical Informatics, Vanderbilt Univ
Background • Cancer research data requirements • Cancer Metastasis • Prognostic factor • Site-specific results • lung to brain = bad • breast to bone = better • Tumor registries • Only record first recurrence and major site • Generally not mandated • Pathology reports
Aims • Detection of • Metastasis status • Metastasis site • Detect specimen site • Lung, bone, liver, brain, bone marrow, pleura, peritoneum, adrenal gland, skin, lymph nodes • Existence of tumor • Histologic types, metastatic status
Model • Entities: Specimen, diagnosis, location, procedure, grade, histological_type, metastatic_status • Relations: has_location, has_procedure, has_finding, has_grade, has_met_stat
Non-standard terms • Variety of terms referring to a single class
Tools • Annotation and machine learning • Encode term use hierarchy to classify the site • Use available tools for encoding • Dictionary based • Probabilistic methods
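As a rough illustration of encoding a term hierarchy to classify the site, the sketch below walks a child-to-parent map until it reaches a site code. The terms, the map itself, and the function name `classify_site` are hypothetical; the slides do not show the actual lexicon, and the code-to-organ pairing (PUL for lung, OSS for bone, BRA for brain) is an assumption based on the site codes listed in the evaluation slide.

```python
# Hypothetical mini-hierarchy: child term -> parent term.
# The top-level values are site codes (assumed: PUL = lung, OSS = bone, BRA = brain).
PARENT = {
    "right upper lobe": "lung",
    "bronchus": "lung",
    "lung": "PUL",
    "vertebra": "bone",
    "rib": "bone",
    "bone": "OSS",
    "cerebellum": "brain",
    "brain": "BRA",
}
SITE_CODES = {"PUL", "OSS", "BRA"}


def classify_site(term):
    """Walk up the hierarchy until a site code is reached; return 'OTH' otherwise."""
    node = term.lower().strip()
    seen = set()  # guards against accidental cycles in the map
    while node in PARENT and node not in seen:
        seen.add(node)
        node = PARENT[node]
    return node if node in SITE_CODES else "OTH"


if __name__ == "__main__":
    for term in ["Right upper lobe", "rib", "kidney"]:
        print(term, "->", classify_site(term))
```

A real system would draw the parent links from a resource such as SNOMED CT or the UMLS rather than a hand-written dictionary.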
Overview • Input: report • Components: Section Identifier (Diagnosis section), Chunker (terms and phrases), Rule Extraction, Entity Classification, Phrase Lookup • Output: metastasis status, site • Resources: lexicon with semantic links, LVG and SPECIALIST, SNOMED CT, UMLS
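The first two pipeline stages (section identification and chunking) might look something like the sketch below. It assumes section headers are upper-case labels followed by a colon and uses a naive punctuation-based chunker; both are illustrative assumptions, not the format or chunker used in the study, and the function names are hypothetical.

```python
import re

# Assumed header style: an upper-case label followed by a colon at line start.
HEADER = re.compile(r"^([A-Z][A-Z ]+):\s*", re.MULTILINE)


def split_sections(report):
    """Return a dict mapping each section header to its body text."""
    sections = {}
    matches = list(HEADER.finditer(report))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).strip()] = report[m.end():end].strip()
    return sections


def chunk(text):
    """Naive chunker: split on periods, commas, and semicolons."""
    return [c.strip() for c in re.split(r"[.,;]", text) if c.strip()]


if __name__ == "__main__":
    report = (
        "CLINICAL HISTORY: Lung mass.\n"
        "FINAL DIAGNOSIS: Bone, rib, biopsy; metastatic adenocarcinoma.\n"
    )
    diagnosis = split_sections(report)["FINAL DIAGNOSIS"]
    print(chunk(diagnosis))
```

The resulting chunks are what later stages (phrase lookup, entity classification) would consume.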
Entity Recognition/Classification • Combination of dictionary based + probabilistic methods • p(lung_site | L1, ..., Ln) • For a single term: p(lung_site | L1) = p(L1 | lung_site) * p(lung_site) / p(L1)
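To make the Bayes-rule scoring concrete, here is a minimal sketch that estimates p(site | L1, ..., Ln) from labelled term lists. It assumes the terms are conditionally independent given the site (a naive Bayes simplification) and uses add-one smoothing; neither choice is stated on the slide, and the training examples are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (tokens, site). Collect the counts Bayes' rule needs."""
    prior = Counter()
    likelihood = defaultdict(Counter)
    vocab = set()
    for tokens, site in examples:
        prior[site] += 1
        for t in tokens:
            likelihood[site][t] += 1
            vocab.add(t)
    return prior, likelihood, vocab


def posterior(tokens, prior, likelihood, vocab):
    """Return p(site | tokens) for every site, assuming independent tokens."""
    total = sum(prior.values())
    log_scores = {}
    for site, n in prior.items():
        denom = sum(likelihood[site].values()) + len(vocab)  # add-one smoothing
        score = math.log(n / total)                          # log p(site)
        for t in tokens:
            score += math.log((likelihood[site][t] + 1) / denom)  # log p(Li | site)
        log_scores[site] = score
    # Normalising over sites plays the role of dividing by p(L1, ..., Ln).
    top = max(log_scores.values())
    unnorm = {s: math.exp(v - top) for s, v in log_scores.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


if __name__ == "__main__":
    # Invented toy data, not taken from the study.
    data = [(["lobe", "bronchus"], "PUL"), (["rib", "vertebra"], "OSS")]
    model = train(data)
    print(posterior(["lobe"], *model))
```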
Phrase Lookup • Grading • <degree> differentiated • <degree> grade • <degree> = {poor(ly), poor to intermediate, low…} • Metastatic status • metastatic, primary
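The grading and metastatic-status patterns above can be approximated with regular expressions, as in the sketch below. The degree vocabulary contains only the values visible on the slide (the slide's list is truncated, so this set is incomplete), and the function name `lookup` is hypothetical.

```python
import re

# Degree values copied from the slide; the slide's list is truncated, so this
# alternation is incomplete. Longer alternatives come first so they win.
DEGREE = r"\b(?P<degree>poor to intermediate|poorly|poor|low)"
GRADE_RE = re.compile(DEGREE + r"\s+(?:differentiated|grade)\b", re.IGNORECASE)
STATUS_RE = re.compile(r"\b(metastatic|primary)\b", re.IGNORECASE)


def lookup(phrase):
    """Return the grade degree and metastatic status found in a phrase, if any."""
    grade = GRADE_RE.search(phrase)
    status = STATUS_RE.search(phrase)
    return {
        "grade": grade.group("degree").lower() if grade else None,
        "metastatic_status": status.group(1).lower() if status else None,
    }


if __name__ == "__main__":
    print(lookup("Metastatic poorly differentiated adenocarcinoma"))
    print(lookup("Low grade primary tumor"))
```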
Dataset • 540 pathology reports from 262 patients with lung cancer from the Vanderbilt tumor registry • with metastatic lesions at an unknown site • 217 reports (of 100 patients) were set aside for evaluation
Evaluation - Annotation • 216 reports of 100 metastatic patients • Annotation • Metastasis: yes/no • Metastasis site detection • PUL, OSS, HEP, BRA, MAR, PLE, PER, ADR, SKI, LYM, OTH • Failed to assign: FN, Wrong class: FP, Correct Class: TP
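Under the counting scheme above (no assignment counts as FN, a wrong class as FP, a correct class as TP), precision and recall can be computed as in this sketch. Representing "failed to assign" as `None` is an assumption made for illustration, and the example labels are invented.

```python
def score(gold, predicted):
    """Compute (precision, recall) using the slide's TP/FP/FN definitions."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        if p is None:      # system failed to assign a class
            fn += 1
        elif p == g:       # correct class
            tp += 1
        else:              # wrong class
            fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented labels, not results from the study.
    gold = ["PUL", "OSS", "BRA", "HEP"]
    pred = ["PUL", "OSS", "LYM", None]
    print(score(gold, pred))
```

Note that in this scheme a wrong class lowers precision only, while a missing assignment lowers recall only.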
Results • Metastatic status detection (4 suspicious omitted) • Recall: 82.69% • Precision: 87.76% • Metastasis site detection • Recall: 89.62% • Precision: 93.13%
Future Work • Better utilize • Procedure • Finding Location relationships • Histologic type • SVM, CRF for classification • Generalizable?
Acknowledgement • Supported by • NCI - U24 CA194215 • CPRIT - R1307