Automatic extraction of BI-RADS breast tissue composition classes from mammography reports • Bethany Percha (Stanford) • Houssam Nassif (U. Wisconsin) • Jafi Lipson (Stanford) • Elizabeth Burnside (U. Wisconsin) • Daniel Rubin (Stanford)
Breast density is an important aspect of the radiological evaluation of the breast. • Dense fibroglandular tissue is a risk factor for breast cancer. • Dense tissue decreases mammographic sensitivity. • Breast tissue composition is partially genetic.
The BI-RADS system divides breast composition into four categories. • Fatty • Scattered fibroglandular • Heterogeneously dense • Dense
These standardized categories... • Help stratify patients at the time of screening. • Enable radiologists to qualify their observations by noting how reduced mammographic sensitivity may limit them. • Minimize ambiguity.
Limitations of the BI-RADS system • Breast composition information is typically reported as part of the narrative text. • No single textual pattern can extract composition information with 100% accuracy. • Large research studies require information from thousands of reports.
Our approach... • Borrow techniques from text mining to extract breast tissue composition information quickly and easily. • Use pattern matching and regular expressions to identify and extract composition descriptions.
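A minimal sketch of regular-expression extraction, assuming a small hand-written pattern set. The patterns and the function name `extract_density_phrases` are hypothetical illustrations; the slides do not list the actual expressions used.

```python
import re

# Hypothetical composition phrases, for illustration only.
DENSITY_PHRASE = re.compile(
    r"\b(?:almost\s+entirely\s+fat(?:ty)?|predominantly\s+fat(?:ty)?|"
    r"scattered\s+(?:fibroglandular\s+)?densities|"
    r"heterogeneously\s+dense|extremely\s+dense)\b",
    re.IGNORECASE,
)

def extract_density_phrases(report: str) -> list[str]:
    """Return every composition description found in a free-text report,
    lower-cased and with internal whitespace (including line breaks) collapsed."""
    return [" ".join(m.group(0).lower().split())
            for m in DENSITY_PHRASE.finditer(report)]

report = ("FINDINGS: The breast tissue is heterogeneously dense, "
          "which may obscure small masses. No suspicious calcifications.")
print(extract_density_phrases(report))  # ['heterogeneously dense']
```

Collapsing whitespace lets the same pattern match phrases that were wrapped across lines in the original report text.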
Methods: Data • Three distinct mammography corpora from Stanford, UCSF, and the Marshfield Clinic. • Used 34,489 reports from Stanford’s radTF database and 146,972 reports from UCSF Medical Center to construct a set of textual patterns indicative of each breast composition class. • Independent test set composed of 500 annotated reports from Stanford and 100 from the Marshfield Clinic.
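The slides do not say how candidate patterns were found in the development corpora. One plausible aid, sketched here as an assumption, is to count short word sequences that end in a density-related seed term so a human can review the most frequent phrasings. The seed vocabulary and `candidate_phrases` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical seed vocabulary for density-related terms.
SEED_TERMS = {"dense", "density", "densities", "fat", "fatty", "fibroglandular"}

def candidate_phrases(reports, window=1):
    """Count word n-grams ending in a seed term, keeping up to `window`
    preceding words of context, across a corpus of free-text reports."""
    counts = Counter()
    for report in reports:
        words = re.findall(r"[a-z]+", report.lower())
        for i, word in enumerate(words):
            if word in SEED_TERMS:
                start = max(0, i - window)
                counts[" ".join(words[start:i + 1])] += 1
    return counts

corpus = [
    "The breasts are heterogeneously dense.",
    "Breast tissue is heterogeneously dense, which could obscure small masses.",
    "There are scattered fibroglandular densities.",
]
for phrase, n in candidate_phrases(corpus).most_common(3):
    print(n, phrase)
```

Frequent phrasings surfaced this way would still need manual review before being turned into class-specific patterns.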
Methods: Evaluation • Reports independently annotated by radiologists. • Annotators blinded to automatically extracted composition classes when assessing reports.
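A minimal sketch of how extracted classes might be compared with the radiologists' annotations. The slides do not name the metrics; this assumes raw accuracy plus Cohen's kappa, which corrects for chance agreement. The `agreement` function and the example labels are hypothetical.

```python
from collections import Counter

def agreement(annotated, extracted):
    """Return (raw accuracy, Cohen's kappa) between radiologist labels and
    automatically extracted labels for the same reports."""
    if len(annotated) != len(extracted) or not annotated:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(annotated)
    observed = sum(a == e for a, e in zip(annotated, extracted)) / n
    # Chance agreement: sum over classes of the product of both raters' marginals.
    ca, ce = Counter(annotated), Counter(extracted)
    expected = sum(ca[k] * ce[k] for k in ca) / (n * n)
    # If both raters used one identical class throughout, kappa is undefined; report 1.0.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return observed, kappa

annotated = [1, 2, 3, 3, 4, 5]   # hypothetical radiologist labels
extracted = [1, 2, 3, 4, 4, 5]   # hypothetical extracted labels
acc, kappa = agreement(annotated, extracted)
print(round(acc, 3), round(kappa, 3))  # 0.833 0.793
```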
Methods: Rule Construction • Using free-text mammography reports as its input, our algorithm classifies each into one of five classes: • predominantly fat (class 1) • scattered densities (class 2) • heterogeneously dense (class 3) • dense (class 4) • no descriptors present (class 5)
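The five-class scheme can be sketched as an ordered list of rules with a fallback to class 5, assuming the first matching rule wins. The specific patterns and their ordering are hypothetical, not the rules from the study; the point is that more specific phrases must be tested before the bare word "dense".

```python
import re

# Hypothetical ordered rules: (class, pattern). More specific phrases come
# first so "heterogeneously dense" is not captured by the generic "dense" rule.
RULES = [
    (3, re.compile(r"\bheterogeneous(?:ly)?\s+dense\b", re.IGNORECASE)),
    (4, re.compile(r"\b(?:extremely|very)\s+dense\b", re.IGNORECASE)),
    (2, re.compile(r"\bscattered\s+(?:fibroglandular\s+)?(?:densities|tissue)\b", re.IGNORECASE)),
    (1, re.compile(r"\b(?:almost\s+entirely|predominantly)\s+fat(?:ty)?\b", re.IGNORECASE)),
    (4, re.compile(r"\bdense\b", re.IGNORECASE)),
]
NO_DESCRIPTOR = 5  # class 5: no composition descriptor present

def classify(report: str) -> int:
    """Return the class of the first rule that matches, else class 5."""
    for label, pattern in RULES:
        if pattern.search(report):
            return label
    return NO_DESCRIPTOR

print(classify("The breast tissue is heterogeneously dense."))  # 3
print(classify("No change since prior exam."))                   # 5
```

This sketch ignores negation and boilerplate sentences (for example, a general statement that dense tissue lowers sensitivity), which is one reason a single pattern cannot reach perfect accuracy on real reports.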