Automatic extraction of BI-RADS breast tissue composition classes from mammography reports

Automatic extraction of BI-RADS breast tissue composition classes from mammography reports • Bethany Percha (Stanford) • Houssam Nassif (U. Wisconsin) • Jafi Lipson (Stanford) • Elizabeth Burnside (U. Wisconsin) • Daniel Rubin (Stanford)

Breast density is an important aspect of the radiological evaluation of the breast. • Dense fibroglandular tissue is a risk factor for breast cancer. • Dense tissue decreases mammographic sensitivity. • Breast tissue composition is partially genetic.

The BI-RADS system divides breast composition into four categories. • Fatty • Scattered fibroglandular • Heterogeneously dense • Dense

These standardized categories... • Help stratify patients at time of screening. • Enable radiologists to qualify observations with discussion of how mammographic sensitivity may limit them. • Minimize ambiguity.

Limitations of the BI-RADS system • Breast composition information typically reported as part of narrative text. • No one textual pattern can extract composition information with 100% accuracy. • Large research studies require information from thousands of reports.

Our approach... • Borrow techniques from text mining to extract breast tissue composition information quickly and easily. • Use pattern matching and regular expressions to identify and extract descriptions.
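
The idea of identifying and extracting descriptions with regular expressions can be illustrated with a minimal sketch. The cue words, the sentence splitter, and the function name `extract_descriptions` are assumptions made for illustration; the slides do not list the actual patterns used.

```python
import re

# Hypothetical cue words for breast composition; not the authors' pattern set.
DENSITY_CUE = re.compile(
    r"\b(?:dense|densit(?:y|ies)|fibroglandular|fatty|fat)\b",
    re.IGNORECASE,
)


def extract_descriptions(report):
    """Return the sentences of a report that mention a composition cue word."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    return [s for s in sentences if DENSITY_CUE.search(s)]


if __name__ == "__main__":
    example = (
        "Bilateral screening mammogram. "
        "The breasts are heterogeneously dense. "
        "No suspicious mass is seen."
    )
    print(extract_descriptions(example))
```

Isolating candidate sentences first keeps later classification patterns short, at the cost of missing descriptions that span sentence boundaries.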

Methods: Data • Three distinct mammography corpora from Stanford, UCSF, and the Marshfield Clinic. • Used 34,489 reports from Stanford’s radTF database and 146,972 reports from UCSF Medical Center to construct a set of textual patterns indicative of each breast composition class. • Independent test set composed of 500 annotated reports from Stanford and 100 from the Marshfield Clinic.

Methods: Evaluation • Reports independently annotated by radiologists. • Annotators blinded to automatically-extracted composition classes when assessing reports.
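
Comparing automatically extracted classes against radiologist annotations could be sketched as below. The function names and the toy label lists are hypothetical and are not results from the study; the slides do not state which agreement metric was used, so plain accuracy and a confusion count are assumed here.

```python
from collections import Counter


def accuracy(predicted, annotated):
    """Fraction of reports where the extracted class equals the annotation."""
    if len(predicted) != len(annotated):
        raise ValueError("predicted and annotated must have the same length")
    if not annotated:
        raise ValueError("no reports to evaluate")
    correct = sum(p == a for p, a in zip(predicted, annotated))
    return correct / len(annotated)


def confusion_counts(predicted, annotated):
    """Count (annotated_class, predicted_class) pairs to show where errors fall."""
    return Counter(zip(annotated, predicted))


if __name__ == "__main__":
    # Toy labels for illustration only (classes 1-5), not data from the study.
    annotated = [1, 2, 3, 4, 5, 2]
    predicted = [1, 2, 3, 4, 5, 3]
    print(accuracy(predicted, annotated))
    print(confusion_counts(predicted, annotated))
```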

Methods: Rule Construction • Using free-text mammography reports as its input, our algorithm classifies each report into one of five classes: predominantly fat (class 1), scattered densities (class 2), heterogeneously dense (class 3), dense (class 4), or no descriptors present (class 5).
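
A rule-based classifier of this shape might look like the following sketch. The regular expressions, their ordering, and the name `classify_report` are illustrative assumptions; the actual rules were derived from the Stanford and UCSF corpora and are not reproduced in these slides.

```python
import re

# Hypothetical patterns, checked in order; the first match decides the class.
# "heterogeneously dense" is tested before the plain "dense" patterns so that
# a class 3 report is not mistaken for class 4.
PATTERNS = [
    (3, re.compile(r"\bheterogeneously dense\b", re.IGNORECASE)),
    (4, re.compile(r"\bextremely dense\b|\bbreasts? (?:is|are) dense\b",
                   re.IGNORECASE)),
    (2, re.compile(r"\bscattered (?:fibroglandular )?(?:densities|tissue|elements)\b",
                   re.IGNORECASE)),
    (1, re.compile(r"\b(?:almost entirely|predominantly|entirely) (?:fat|fatty)\b",
                   re.IGNORECASE)),
]

NO_DESCRIPTOR = 5  # class 5: no composition descriptor found


def classify_report(text):
    """Map a free-text report to composition class 1-4, or 5 if none matches."""
    for composition_class, pattern in PATTERNS:
        if pattern.search(text):
            return composition_class
    return NO_DESCRIPTOR


if __name__ == "__main__":
    print(classify_report("The breasts are heterogeneously dense."))
    print(classify_report("No suspicious mass or calcification."))
```

Returning an explicit class 5 rather than raising an error keeps reports without descriptors in the output, so they can be counted and reviewed separately.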

Results
