
Text Mining for Data Quality Analysis of Melanoma Tumor Depth

Text Mining for Data Quality Analysis of Melanoma Tumor Depth. 2019 NAACCR/IACR Combined Annual Conference, Vancouver, BC. Outline: Background, Objectives, Case selection criteria, Source documents, Algorithm development & testing, Preliminary analyses, Next steps. Acknowledgements.



Presentation Transcript


  1. Text Mining for Data Quality Analysis of Melanoma Tumor Depth • 2019 NAACCR/IACR Combined Annual Conference, Vancouver, BC

  2. Outline • Background • Objectives • Case selection criteria • Source documents • Algorithm development & testing • Preliminary analyses • Next steps

  3. Acknowledgements • Cancer Registries: Louisiana, Detroit, New Jersey, New York, Utah • IMS: Glenn Abastillas, Ariel Brest, Linda Coyle, Jennifer Stevens • NCI: Peggy Adamo, Clara Lam, Serban Negoita, Valentina Petkov

  4. Background

  5. Melanoma tumor depth • Most important determinant of prognosis for melanomas • Pre-2018 diagnoses: CS SSF1 • Greatest measured thickness from any procedure recorded • Recorded in hundredths of millimeters (mm) • Three-digit field with implied decimal point between 1st & 2nd digits • Measurement of 2.0 mm coded as 200
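
The coding rule on this slide can be sketched in a few lines of Python. This is only an illustration of the implied-decimal arithmetic: the function names are hypothetical, and the accepted range of 001 to 999 is an assumption, since the real field reserves some values for special codes that are not modeled here.

```python
from decimal import Decimal, ROUND_HALF_UP


def depth_mm_to_code(depth_mm: str) -> str:
    """Encode a depth given in mm (as text, e.g. "2.0") as a three-digit code."""
    # The field stores hundredths of a millimeter, so multiply by 100.
    hundredths = (Decimal(depth_mm) * 100).quantize(
        Decimal("1"), rounding=ROUND_HALF_UP
    )
    # Assumption: only plain measurements are handled; special codes are ignored.
    if not 1 <= hundredths <= 999:
        raise ValueError(f"{depth_mm} mm does not fit a three-digit field")
    return f"{int(hundredths):03d}"


def code_to_depth_mm(code: str) -> Decimal:
    """Decode a three-digit code by restoring the implied decimal point."""
    return Decimal(int(code)) / 100


print(depth_mm_to_code("2.0"))  # 200
print(code_to_depth_mm("045"))  # 0.45
```

Passing the measurement as text and using `Decimal` avoids binary floating-point surprises when a value such as 0.45 is multiplied by 100.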

  6. Coding concerns • Decimal point errors • Transcription errors • Miscoding of tumor size for tumor depth • Incomplete information
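
Two of these concerns lend themselves to a simple automated check once a text-derived value is available for comparison. The sketch below is an assumption about how such a check might look, not the project's actual logic: the function name and the category labels are hypothetical, and tumor size miscoding and incomplete information are not modeled.

```python
def classify_discrepancy(reported_code: int, text_code: int) -> str:
    """Label the difference between a reported code and a text-derived code."""
    if reported_code == text_code:
        return "agree"
    small, big = sorted((reported_code, text_code))
    # A misplaced decimal point shifts the value by a factor of 10 or 100.
    if small > 0 and big in (small * 10, small * 100):
        return "possible decimal point error"
    # The same three digits in a different order suggest a transposition.
    if sorted(f"{reported_code:03d}") == sorted(f"{text_code:03d}"):
        return "possible transcription error"
    return "other discrepancy"


print(classify_discrepancy(200, 20))   # possible decimal point error
print(classify_discrepancy(120, 210))  # possible transcription error
```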

  7. Objectives

  8. Objectives • Develop and test an algorithm to identify accurate melanoma depth measurement values from unstructured text • Assess the distribution of errors and their effect on stage and survival estimates • Provide registries with a set of flagged cases with a high likelihood of inaccurate depth measurement values for review • Provide registries with a method for automatic error correction • Disseminate algorithm logic and query algorithm files to the cancer surveillance and clinical research community • Provide evidence-based input for registrar training materials

  9. Case selection criteria

  10. Melanoma cases • Must meet all criteria • Diagnosed 2010-2014 • Behavior Code = 3 (invasive cancers) • Primary Site = C44_ • Histology Codes = 8720-8790 • Reportable to SEER

  11. Melanoma cases, cont. • Exclude • Death-certificate-only diagnoses (Reporting Source = 7) • Cases with scanned images
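
The inclusion and exclusion criteria on the two slides above amount to a single predicate over a case record. A minimal sketch follows, assuming a hypothetical `Case` record; the field names are illustrative and are not NAACCR item names.

```python
from dataclasses import dataclass


@dataclass
class Case:
    # Hypothetical field names chosen for illustration only.
    dx_year: int
    behavior: str
    primary_site: str
    histology: int
    seer_reportable: bool
    reporting_source: str
    has_scanned_images: bool


def is_eligible(case: Case) -> bool:
    """Return True when a case meets every inclusion rule and no exclusion."""
    included = (
        2010 <= case.dx_year <= 2014
        and case.behavior == "3"                  # invasive
        and len(case.primary_site) == 4
        and case.primary_site.startswith("C44")   # C44_
        and 8720 <= case.histology <= 8790
        and case.seer_reportable
    )
    excluded = case.reporting_source == "7" or case.has_scanned_images
    return included and not excluded


example = Case(2012, "3", "C445", 8743, True, "1", False)
print(is_eligible(example))  # True
```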

  12. Source documents

  13. Source documents Include • All NAACCR source abstracts • E-path reports Exclude • Pathology reports dated before diagnosis date

  14. Source documents, cont. • E-path reports contain up to 8 different regions known as segments • 3 of 8 regions included in our source documentation to develop algorithm • Final diagnosis • Microscopic diagnosis/description • Synoptic report
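
The e-path rules above (drop reports dated before diagnosis, keep three of the eight segments) can be sketched as a small filter. The segment keys and the function name are hypothetical, and a report is represented here as a plain mapping from segment name to text.

```python
from datetime import date

# Assumption: segment names are normalized to these lowercase keys.
INCLUDED_SEGMENTS = {
    "final diagnosis",
    "microscopic diagnosis/description",
    "synoptic report",
}


def select_segments(
    report_date: date, dx_date: date, segments: dict[str, str]
) -> dict[str, str]:
    """Keep only the included segments of a report dated on or after diagnosis."""
    if report_date < dx_date:
        return {}
    return {
        name: text
        for name, text in segments.items()
        if name.lower() in INCLUDED_SEGMENTS
    }


report = {
    "Final Diagnosis": "Malignant melanoma, Breslow thickness 1.2 mm.",
    "Clinical History": "Changing pigmented lesion.",
}
print(select_segments(date(2012, 3, 9), date(2012, 3, 5), report))
```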

  15. Algorithm development & testing

  16. Algorithm development & testing • What are we trying to capture? • Any numerical values relevant to melanoma tumor depth • Qualifier words that might indicate a measurement (e.g., "at least") • Key words (e.g., Breslow’s, depth, thickness) • Process, Process, Process • Checks and verifications put in place to confirm that captured measurements are relevant
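
One way to capture a key word, an optional qualifier, and a numeric value with its unit is a single regular expression. The sketch below is an assumption about what such a pattern might look like and is not the algorithm described in the presentation; the qualifier list, the 40-character gap, and all names are illustrative.

```python
import re
from decimal import Decimal

DEPTH_PATTERN = re.compile(
    r"(?P<keyword>breslow(?:['’]s)?|depth|thickness)"   # key word
    r"[^.\d]{0,40}?"                                    # short gap within a sentence
    r"(?P<qualifier>at least|approximately|less than|greater than)?\s*"
    r"(?P<value>\d+(?:\.\d+)?|\.\d+)\s*(?P<unit>mm|cm)\b",
    re.IGNORECASE,
)


def extract_depths(text: str) -> list[dict]:
    """Return keyword, qualifier, and depth in mm for each match in the text."""
    hits = []
    for match in DEPTH_PATTERN.finditer(text):
        factor = 10 if match.group("unit").lower() == "cm" else 1
        hits.append(
            {
                "keyword": match.group("keyword").lower(),
                "qualifier": (match.group("qualifier") or "").lower(),
                "depth_mm": Decimal(match.group("value")) * factor,
            }
        )
    return hits


sample = "Malignant melanoma, Breslow thickness: at least 1.2 mm. Tumor size 1.5 cm."
print(extract_depths(sample))
```

Requiring a key word before the number is what keeps the tumor size in the sample from being captured as a depth, which mirrors the relevance checks mentioned on the slide.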

  17. Algorithm development & testing, cont. Building Consolidated Results Data Set • Process raw measurement data to obtain standardized tumor depth measurement values • Select best standardized measurement value at source document level • Select best source document • Transform standardized measurement from best source document into NAACCR standard code value • Add new machine-generated code values to original CTC record (from SEER*DMS) to create analytic data set
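
The selection steps on this slide can be sketched as follows. The slide does not say how the "best" value or the "best" document is chosen, so this sketch assumes the greatest measurement wins at both levels, echoing the "greatest measured thickness" rule for the field; the function names and document identifiers are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def best_in_document(values_mm: list[Decimal]) -> Optional[Decimal]:
    """Assumed rule: the best value in a document is the greatest one."""
    return max(values_mm) if values_mm else None


def consolidate(docs: dict[str, list[Decimal]]) -> Optional[tuple[str, str]]:
    """Pick the best document and return its id with a three-digit code."""
    best = {doc_id: best_in_document(values) for doc_id, values in docs.items()}
    best = {doc_id: value for doc_id, value in best.items() if value is not None}
    if not best:
        return None
    # Assumed rule: the best document is the one with the greatest value.
    doc_id, depth_mm = max(best.items(), key=lambda item: item[1])
    # Convert mm to hundredths of a millimeter, zero-padded to three digits.
    code = f"{int((depth_mm * 100).to_integral_value()):03d}"
    return doc_id, code


docs = {
    "abstract-1": [Decimal("0.8"), Decimal("1.2")],
    "epath-7": [Decimal("2.0")],
    "epath-9": [],
}
print(consolidate(docs))  # ('epath-7', '200')
```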

  18. Algorithm development & testing, cont. Building Gold Standard • Two experienced CTRs code melanoma depth and reconcile discrepancies • CTRs use all available data sources to determine the measurement value for each consolidated case (CTC) • CTR-reported value is the “gold standard” (GS) value • Once CTR review of the random sample is complete, algorithm/machine-generated (MG) values and originally reported values (OCTC) are each compared with the GS values
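
The comparison step can be illustrated with a simple percent-agreement calculation. The case identifiers and code values below are made-up toy data for illustration only and are not results from the study; the function name is hypothetical.

```python
def percent_agreement(candidate: dict[str, str], gold: dict[str, str]) -> float:
    """Percentage of gold standard cases whose candidate code matches exactly."""
    shared = [case_id for case_id in gold if case_id in candidate]
    if not shared:
        return 0.0
    matches = sum(candidate[case_id] == gold[case_id] for case_id in shared)
    return 100.0 * matches / len(shared)


# Toy values only: GS = gold standard, MG = machine generated, OCTC = original.
gs = {"A": "200", "B": "045", "C": "120", "D": "310"}
mg = {"A": "200", "B": "045", "C": "120", "D": "031"}
octc = {"A": "020", "B": "045", "C": "210", "D": "310"}

print(percent_agreement(mg, gs))    # 75.0
print(percent_agreement(octc, gs))  # 50.0
```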

  19. Preliminary analyses

  20. MG & OCTC Code Values Agreement with GS by Tumor Thickness

  21. MG & OCTC Code Values Agreement with GS by T category
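
Grouping by T category requires mapping each depth code to a category. The slides do not state the cut points, so the sketch below assumes the AJCC 7th edition thickness thresholds (1.00, 2.00, and 4.00 mm), which apply to the 2010-2014 diagnosis years; it ignores special codes and the a/b subcategories, and the function name is hypothetical.

```python
def t_category(code: str) -> str:
    """Map a three-digit depth code (hundredths of mm) to a T category."""
    hundredths = int(code)
    # Assumed AJCC 7th edition cut points: <=1.00, <=2.00, <=4.00, >4.00 mm.
    if hundredths <= 100:
        return "T1"
    if hundredths <= 200:
        return "T2"
    if hundredths <= 400:
        return "T3"
    return "T4"


print(t_category("200"))  # T2
print(t_category("201"))  # T3
```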

  22. MG & OCTC Code Values Agreement with GS by Source Documents

  23. Error Analysis – I2e & GS

  24. Error Analysis – SAS & GS

  25. Error Analysis – OCTC & GS

  26. Next steps

  27. Next steps • Continue refining algorithm • Develop GS for remaining registries • Increase from 240 to 480 GS cases • Run algorithm on all of registry’s invasive melanoma of skin cases • Provide registry with reports to use to determine which cases to review

  28. Thank you!! Questions?
