Text Mining for Data Quality Analysis of Melanoma Tumor Depth
2019 NAACCR/IACR Combined Annual Conference, Vancouver, BC
Outline
• Background
• Objectives
• Case selection criteria
• Source documents
• Algorithm development & testing
• Preliminary analyses
• Next steps
Acknowledgements
• Cancer Registries: Louisiana, Detroit, New Jersey, New York, Utah
• IMS: Glenn Abastillas, Ariel Brest, Linda Coyle, Jennifer Stevens
• NCI: Peggy Adamo, Clara Lam, Serban Negoita, Valentina Petkov
Melanoma tumor depth
• Most important determinant of prognosis for melanomas
• Pre-2018 diagnoses: recorded in CS SSF1 (Collaborative Stage Site-Specific Factor 1)
• Greatest measured thickness from any procedure is recorded
• Recorded in hundredths of millimeters (mm)
• Three-digit field with an implied decimal point between the 1st & 2nd digits
• Example: a measurement of 2.0 mm is coded as 200
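The coding rule on this slide can be illustrated with a small sketch. The function names `depth_mm_to_code` and `code_to_depth_mm` are hypothetical, and rounding half up to the nearest hundredth is an assumption. The real field may reserve some codes for special meanings such as unknown depth; that is not handled here.

```python
from decimal import Decimal, ROUND_HALF_UP


def depth_mm_to_code(depth_mm: str) -> str:
    """Convert a depth in mm (e.g. "2.0") to a three-digit code in hundredths of a mm."""
    hundredths = (Decimal(depth_mm) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    if not 0 <= hundredths <= 999:
        # A three-digit field cannot hold this value; special handling is out of scope.
        raise ValueError(f"depth {depth_mm} mm does not fit a three-digit field")
    return f"{int(hundredths):03d}"


def code_to_depth_mm(code: str) -> Decimal:
    """Read a three-digit code back as mm, using the implied decimal point."""
    return Decimal(int(code)) / 100


print(depth_mm_to_code("2.0"))   # the slide's example: 2.0 mm
print(code_to_depth_mm("045"))   # a code read back as millimeters
```

Using `Decimal` rather than `float` avoids binary rounding surprises when multiplying by 100.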
Coding concerns
• Decimal point errors
• Transcription errors
• Miscoding of tumor size for tumor depth
• Incomplete information
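As an illustration of the first concern, a decimal point error shows up as a recorded code that differs from the value found in text by a factor of 10 or 100. The sketch below is not the project's algorithm; the function name `classify_discrepancy` and its labels are hypothetical. It only separates likely decimal shifts from other discrepancies and cannot tell transcription errors from size-for-depth miscoding.

```python
def classify_discrepancy(recorded_code: int, text_code: int) -> str:
    """Compare a recorded depth code with one derived from text (both in hundredths of a mm)."""
    if recorded_code == text_code:
        return "match"
    smaller, larger = sorted((recorded_code, text_code))
    # A misplaced decimal point shifts the value by one or two decimal places.
    if smaller > 0 and larger in (smaller * 10, smaller * 100):
        return "possible decimal point error"
    return "other discrepancy"


print(classify_discrepancy(200, 20))
print(classify_discrepancy(200, 210))
```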
Objectives
• Develop and test an algorithm to identify accurate melanoma depth measurement values from unstructured text
• Assess the distribution of errors & their effect on stage & survival estimates
• Provide registries with a set of flagged cases with a high likelihood of inaccurate depth measurement values for review
• Provide registries with a method for automatic error correction
• Disseminate algorithm logic & query algorithm files to the cancer surveillance & clinical research community
• Provide evidence-based input for registrar training materials
Melanoma cases
Must meet all criteria:
• Diagnosed between 2010 and 2014
• Behavior Code = 3 (invasive cancers)
• Primary Site = C44_
• Histology Codes = 8720-8790
• Reportable to SEER
Melanoma cases, cont.
Exclude:
• Death-certificate-only diagnoses (Reporting Source = 7)
• Cases with scanned images
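The inclusion and exclusion criteria on the two slides above can be sketched as a single filter. The `Case` record and its field names are hypothetical and do not reflect the actual registry data layout. The sketch also assumes the 2010-2014 range is inclusive and that `C44_` means any four-character site code starting with C44.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical field names, chosen for this illustration only.
    diagnosis_year: int
    behavior_code: str
    primary_site: str
    histology: int
    seer_reportable: bool
    reporting_source: str
    has_scanned_images: bool


def is_eligible(case: Case) -> bool:
    """Return True when a case meets all inclusion criteria and no exclusion."""
    site = case.primary_site.upper()
    meets_all = (
        2010 <= case.diagnosis_year <= 2014          # assumed inclusive
        and case.behavior_code == "3"                # invasive
        and len(site) == 4 and site.startswith("C44")
        and 8720 <= case.histology <= 8790
        and case.seer_reportable
    )
    excluded = case.reporting_source == "7" or case.has_scanned_images
    return meets_all and not excluded


example = Case(2012, "3", "C445", 8720, True, "1", False)
print(is_eligible(example))
```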
Source documents
Include:
• All NAACCR source abstracts
• E-path reports
Exclude:
• Pathology reports dated before diagnosis date
Source documents, cont.
• E-path reports contain up to 8 different regions, known as segments
• 3 of the 8 segments were included in the source documentation used to develop the algorithm:
  • Final diagnosis
  • Microscopic diagnosis/description
  • Synoptic report
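The source document rules above could be applied along these lines. This is a sketch under assumptions: the dictionary layout (`type`, `text`, `report_date`, `segments`) and the function name `usable_text` are hypothetical, and segment names are matched case-insensitively against the three names listed on the slide.

```python
from datetime import date

# The three e-path segments named on the slide, lower-cased for matching.
INCLUDED_SEGMENTS = {
    "final diagnosis",
    "microscopic diagnosis/description",
    "synoptic report",
}


def usable_text(doc: dict, diagnosis_date: date) -> list:
    """Return the text blocks of one source document that are eligible for mining."""
    if doc["type"] == "abstract":
        return [doc["text"]]
    if doc["type"] == "epath":
        # Pathology reports dated before the diagnosis date are excluded.
        if doc["report_date"] < diagnosis_date:
            return []
        return [
            text
            for name, text in doc["segments"].items()
            if name.lower() in INCLUDED_SEGMENTS
        ]
    return []
```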
Algorithm development & testing
• What are we trying to capture?
  • Any numerical values relevant to melanoma tumor depth
  • Qualifier words that might indicate a measurement (e.g. "at least")
  • Key words (e.g. Breslow’s, depth, thickness)
• Process, process, process
  • Checks and verifications put in place to confirm that captured measurements are relevant
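The capture targets listed above (key words, qualifiers, numeric values) can be illustrated with a regular expression. This is a minimal sketch, not the project's algorithm: the pattern, the 40-character window between key word and number, the qualifiers other than "at least", and the function name `extract_depths` are all assumptions.

```python
import re
from decimal import Decimal

KEYWORDS = r"(?:breslow(?:['’]s)?|depth|thickness)"
# Only "at least" comes from the slide; the other qualifiers are illustrative.
QUALIFIERS = r"(?:at least|approximately|greater than|less than)"

MEASUREMENT = re.compile(
    rf"(?P<keyword>{KEYWORDS})\D{{0,40}}?"
    rf"(?P<qualifier>{QUALIFIERS})?\s*"
    r"(?P<value>\d+(?:\.\d+)?|\.\d+)\s*(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


def extract_depths(text: str) -> list:
    """Find numeric measurements that follow a depth-related key word."""
    hits = []
    for m in MEASUREMENT.finditer(text):
        value = Decimal(m.group("value"))
        if m.group("unit").lower() == "cm":
            value *= 10  # standardize to millimeters
        qualifier = m.group("qualifier")
        hits.append({
            "keyword": m.group("keyword").lower(),
            "qualifier": qualifier.lower() if qualifier else None,
            "depth_mm": value,
        })
    return hits


for hit in extract_depths("Breslow thickness: at least 1.25 mm. Tumor size 1.2 cm."):
    print(hit)
```

Requiring a key word before the number is one simple relevance check: in the usage line, the tumor size is not captured because no depth-related key word precedes it.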
Algorithm development & testing, cont.
Building Consolidated Results Data Set
• Process raw measurement data to obtain standardized tumor depth measurement values
• Select best standardized measurement value at source document level
• Select best source document
• Transform standardized measurement from best source document into NAACCR standard code value
• Add new machine-generated code values to original CTC record (from SEER*DMS) to create analytic data set
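The consolidation steps above can be sketched as follows. The slides do not define "best"; this sketch assumes it means the greatest measured depth, in line with the coding rule that the greatest measured thickness is recorded. The function name `consolidate` and the input layout (document id mapped to standardized depths in mm) are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Dict, List, Optional


def consolidate(case_documents: Dict[str, List[Decimal]]) -> Optional[dict]:
    """Pick one depth per document, then one document per case, then code the value."""
    # Step 1: best value per source document (assumption: the greatest depth).
    best_by_doc = {doc: max(vals) for doc, vals in case_documents.items() if vals}
    if not best_by_doc:
        return None  # no usable measurement in any document
    # Step 2: best source document (assumption: the one with the greatest depth).
    best_doc = max(best_by_doc, key=best_by_doc.get)
    depth = best_by_doc[best_doc]
    # Step 3: transform to a three-digit code in hundredths of a mm.
    hundredths = int((depth * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    if hundredths > 999:
        raise ValueError("depth does not fit a three-digit field")
    return {"best_document": best_doc, "depth_mm": depth, "mg_code": f"{hundredths:03d}"}


docs = {
    "abstract-1": [Decimal("0.8")],
    "epath-7": [Decimal("1.25"), Decimal("0.9")],
    "epath-9": [],
}
print(consolidate(docs))
```

The resulting `mg_code` is the machine-generated value that would be attached to the original record in the final step.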
Algorithm development & testing, cont.
Building Gold Standard
• Two experienced CTRs code melanoma depth and reconcile discrepancies
• CTRs use all available data sources to determine the measurement value for each consolidated case (CTC)
• CTR-reported value is the "gold standard" (GS) value
• Once CTR review of the random sample is complete, the algorithm/machine-generated (MG) values and the GS values are compared to the originally reported values (OCTC)
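The comparison described in the last bullet could be summarized as simple agreement rates. This is a sketch only: the function name `agreement_rates`, the record layout, and the sample codes are made up for illustration and are not study results. The MG-versus-GS rate is an added assumption, since it is a natural measure of algorithm accuracy.

```python
def agreement_rates(cases: list) -> dict:
    """Share of cases where two sources report the same depth code."""
    n = len(cases)
    if n == 0:
        raise ValueError("no cases to compare")
    return {
        "mg_vs_octc": sum(c["mg"] == c["octc"] for c in cases) / n,
        "gs_vs_octc": sum(c["gs"] == c["octc"] for c in cases) / n,
        "mg_vs_gs": sum(c["mg"] == c["gs"] for c in cases) / n,
    }


# Made-up codes for four cases, purely illustrative.
sample = [
    {"octc": "200", "mg": "200", "gs": "200"},
    {"octc": "020", "mg": "200", "gs": "200"},
    {"octc": "125", "mg": "125", "gs": "125"},
    {"octc": "999", "mg": "045", "gs": "050"},
]
print(agreement_rates(sample))
```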
Next steps
• Continue refining algorithm
• Develop GS for remaining registries
• Increase from 240 to 480 GS cases
• Run algorithm on all of a registry’s invasive melanoma of skin cases
• Provide each registry with reports to use in determining which cases to review
Thank you!! Questions?