A Mammography Image Set for Observer Training and Assessment in BI-RADS Density Classification

Claire E. Mercer(1), Peter Hogg(2), Judith Kelly(3), Rita Borgen(4), David Enion(4), Beverley Hilton(4), Sara Millington(3), Patsy Whelehan(5)

1) The Nightingale Centre, University Hospital of South Manchester
2) University of Salford
3) Countess of Chester Hospital NHS Foundation Trust
4) East Lancashire Breast Screening Unit, Burnley General Hospital
5) University of Dundee

Background
Consistency in breast density categorisation is important when performing research where density is a relevant variable, and minimisation of inter- and intra-observer variability is essential if findings are to be meaningful. This research aimed to validate a set of mammography images for visual breast density estimation, to help achieve consistency in future research projects, and to determine observer performance (inter- and intra-observer agreement)1-3.

Method
• To assess inter- and intra-observer agreement, 50 film-screen mammograms were scored twice by eight observers using the American College of Radiology BI-RADS (Breast Imaging Reporting and Data System)4 four-category density scale.
• Agreement within and between observers was assessed.
• Limiting factors:
• Images used were analogue rather than digital, and therefore do not represent current clinical/research practice. However, a large comparison study of mammographic density5 comparing BI-RADS density in film-screen versus digitally acquired images concluded that reported BI-RADS breast density categories were similar regardless of image acquisition type.
• The BI-RADS scale is itself a limitation; future research in this area would evaluate 3D volumetric data.

Outcomes
• Six of eight observers achieved near-complete intra-observer agreement (Cohen's Kappa >0.81) [Figure 1]; one showed strong agreement and one moderate agreement.
• Strong agreement between paired observers was demonstrated in 10 of 28 pairs on the first scoring round and in 12 of 28 pairs on the second. Fleiss' Kappa was used to evaluate concordance between all observers on the first and second scoring rounds, giving values of 0.64 and 0.56 respectively (Figure 2).
• Delta variance: none of the observers demonstrated a delta variance above 1 (a difference between scoring sets of one BI-RADS grade). On this basis we propose that our image set is suitable for determining whether an observer can participate in a research study which involves scoring BI-RADS density. (A computational sketch of these agreement statistics appears after the Relevance/Impact section.)

Conclusion
We confirmed that the 50 images were suitable for observer training and assessment for research purposes. Some variability existed between observers, but density classification agreement was strong overall. Further work includes repeating this study with digitally acquired images.

Relevance/Impact
This exercise has set a gold-standard score for the test set and enabled the observers' scoring consistency to be evaluated. This will facilitate rigour in future research where BI-RADS mammographic density scores are relevant.
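Sketch of the agreement statistics (Python)
The agreement statistics reported above can be computed with standard statistical libraries. The sketch below is illustrative only and is not the authors' analysis code: the score array is hypothetical placeholder data, and the treatment of "delta variance" as the per-image difference between an observer's two scoring rounds is our reading of the description above.

    # Illustrative sketch: agreement statistics for hypothetical BI-RADS scores.
    # scores[image, observer, round] holds density grades 1-4 for 50 images,
    # 8 observers and 2 scoring rounds (placeholder data, not study data).
    import numpy as np
    from sklearn.metrics import cohen_kappa_score
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

    rng = np.random.default_rng(0)
    scores = rng.integers(1, 5, size=(50, 8, 2))

    # Intra-observer agreement: Cohen's kappa between each observer's two rounds.
    for obs in range(scores.shape[1]):
        kappa = cohen_kappa_score(scores[:, obs, 0], scores[:, obs, 1])
        print(f"Observer {obs + 1}: Cohen's kappa = {kappa:.2f}")

    # Inter-observer agreement: Fleiss' kappa across all eight observers, per round.
    for rnd in range(scores.shape[2]):
        table, _ = aggregate_raters(scores[:, :, rnd])  # image x category counts
        print(f"Round {rnd + 1}: Fleiss' kappa = {fleiss_kappa(table):.2f}")

    # Delta variance (assumed meaning): the largest per-image difference in
    # BI-RADS grade between an observer's two scoring rounds.
    delta = np.abs(scores[:, :, 0] - scores[:, :, 1])
    for obs in range(scores.shape[1]):
        print(f"Observer {obs + 1}: maximum grade difference = {delta[:, obs].max()}")

Because BI-RADS density is an ordinal scale, a weighted kappa (for example cohen_kappa_score(..., weights='linear')) may also be appropriate; the poster does not state which variant was used.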
References
1. Assi V, Warwick J, Cuzick J, Duffy SW. Clinical and epidemiological issues in mammographic density. Clinical Oncology 2012; 9:33-40.
2. Gao J, Warren R, Warren-Forward H, Forbes JF. Reproducibility of visual assessment on mammographic density. Breast Cancer Research and Treatment 2008; 108(1):121-7.
3. Ciatto S, et al. Categorizing breast mammographic density: intra- and interobserver reproducibility of BI-RADS density categories. Breast 2005; 14:269-275.
4. D'Orsi CJ, Bassett LW, Berg WA. Mammography, 4th Edition. Breast Imaging Reporting and Data System: ACR BI-RADS®. Reston, VA: American College of Radiology, 2003.
5. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977; 33:159-74.