Validation and Evaluation of Algorithms Vincent A. Magnotta The University of Iowa June 30, 2008
Software Development • In many areas of medical imaging, developing an algorithm is the “easy” aspect of the project • Now that I have an algorithm, what is the next step? • Validate the algorithm • Evaluate reliability • Evaluate biological relevance • These are very different activities, and each gives the developer information that is useful for improving the algorithm
Validation • The degree of accuracy of a measuring device • Validation of medical image analysis is a major challenge • What are our standards? • The actual structure of interest • Another technique • Manual raters • Comparison with the literature
Validation Based on Actual Specimens [Figure: laser-scanned surface, traced surface, and the resulting surface distance map]
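One way to quantify agreement with a physical standard such as a laser-scanned specimen is a surface distance map. Below is a minimal sketch, not the original pipeline: it computes nearest-neighbor distances from a traced surface to a reference surface, using hypothetical vertex arrays in place of real mesh data.

```python
# Minimal sketch: nearest-neighbor surface distances between a traced surface
# and a laser-scanned reference surface. Point arrays here are stand-ins for
# the vertex coordinates (in mm) of each surface mesh.
import numpy as np
from scipy.spatial import cKDTree

def surface_distances(traced_pts, reference_pts):
    """Distance from each traced vertex to the closest reference vertex."""
    tree = cKDTree(reference_pts)      # spatial index over the reference surface
    dists, _ = tree.query(traced_pts)  # nearest-neighbor distance per traced vertex
    return dists

# Illustrative random stand-in data; real use would load mesh vertices.
traced = np.random.rand(1000, 3) * 100.0
reference = np.random.rand(5000, 3) * 100.0
d = surface_distances(traced, reference)
print(f"mean {d.mean():.2f} mm, 95th pct {np.percentile(d, 95):.2f} mm, max {d.max():.2f} mm")
```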
Doppler US and Phase Contrast MR [Figure from Ho et al., Am. J. Roentgenol. 178(3): 551, 2002]
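When the standard is another technique, a common way to summarize agreement is a Bland-Altman analysis. The sketch below is illustrative only and is not from the cited study; the measurement values are made up.

```python
# Illustrative Bland-Altman style agreement between two techniques measuring
# the same quantity (e.g., flow from Doppler US vs. phase contrast MR).
import numpy as np

us = np.array([10.2, 12.5, 9.8, 14.1, 11.0])   # technique A measurements (hypothetical)
mr = np.array([10.8, 12.1, 10.3, 13.6, 11.4])  # technique B measurements (hypothetical)

diff = us - mr
bias = diff.mean()                  # systematic offset between the techniques
loa = 1.96 * diff.std(ddof=1)       # half-width of the limits of agreement
print(f"bias = {bias:.2f}, limits of agreement = [{bias - loa:.2f}, {bias + loa:.2f}]")
```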
Manual Raters • Often we are left with manual raters in medical imaging to serve as a standard • Need to evaluate rater reliability • May be subject to rater drift and bias • Algorithms such as STAPLE have been developed to estimate the probability of a voxel being in a region-of-interest • Several metrics to evaluate reliability • Percent difference • Intraclass correlation • Border distance • Overlap metrics: Dice, Jaccard, Spatial Overlap • Sensitivity and Specificity
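STAPLE itself estimates the consensus segmentation and each rater's performance with an EM algorithm; as a much simpler stand-in (not STAPLE), the sketch below builds a voxel-wise agreement map from several hypothetical rater masks by majority vote.

```python
# Simple stand-in for a consensus segmentation: per-voxel agreement across
# raters, thresholded at 50%. Rater masks are hypothetical binary arrays of
# identical shape; this is not the STAPLE algorithm.
import numpy as np

def majority_consensus(rater_masks):
    """Return the per-voxel agreement fraction and a majority-vote mask."""
    stack = np.stack(rater_masks).astype(float)  # shape: (n_raters, x, y, z)
    agreement = stack.mean(axis=0)               # fraction of raters labeling each voxel
    return agreement, agreement >= 0.5

masks = [np.random.rand(32, 32, 32) > 0.5 for _ in range(3)]  # stand-in rater masks
agreement, consensus = majority_consensus(masks)
```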
Metrics • Intraclass Correlation Coefficient: R² = σ²_subject / (σ²_subject + σ²_method + σ²_error) • Jaccard Metric = Volume(A∩B) / Volume(A∪B) • Dice Metric = 2·Volume(A∩B) / [Volume(A) + Volume(B)] • Spatial Overlap = Volume(A∩B) / Volume(A)
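The overlap metrics above are straightforward to compute from two binary segmentations. A minimal sketch using numpy boolean arrays as stand-ins for rater or algorithm masks (the ICC additionally needs subject, method, and error variance components, typically estimated from a repeated-measures ANOVA, so it is omitted here):

```python
# Overlap metrics for two binary segmentations A and B (boolean numpy arrays).
import numpy as np

def overlap_metrics(A, B):
    inter = np.logical_and(A, B).sum()
    union = np.logical_or(A, B).sum()
    return {
        "jaccard": inter / union,                   # |A∩B| / |A∪B|
        "dice": 2.0 * inter / (A.sum() + B.sum()),  # 2|A∩B| / (|A| + |B|)
        "spatial_overlap": inter / A.sum(),         # |A∩B| / |A|, relative to mask A
    }

# Illustrative example: two partially overlapping cubes.
A = np.zeros((64, 64, 64), dtype=bool); A[20:40, 20:40, 20:40] = True
B = np.zeros_like(A); B[25:45, 20:40, 20:40] = True
print(overlap_metrics(A, B))
```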
Intraclass Correlation [Figure: example plots for Data Set 1 and Data Set 2]
Performance of Overlap Metrics [Figure: Jaccard metric and Dice metric]
Reliability • Ability to reproduce measurements within a subject across trials • Most algorithms will give the same results when run on the same image data • Typically evaluated on a scan/rescan basis • Provides an estimate of the noise introduced by the algorithm • Helps to determine the sample size required to measure a known effect size
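Because the noise introduced by the algorithm adds to biological variability, scan/rescan reliability feeds directly into sample-size planning. A rough sketch using a standard two-group normal approximation; all values are illustrative and not from the talk.

```python
# Approximate per-group sample size to detect a given effect, with the total
# variability inflated by the measurement error estimated from a reliability study.
from scipy.stats import norm

def n_per_group(effect, sigma_bio, sigma_meas, alpha=0.05, power=0.8):
    sigma = (sigma_bio**2 + sigma_meas**2) ** 0.5      # total per-measurement SD
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)      # critical values for alpha and power
    return 2 * (z * sigma / effect) ** 2               # two-sample comparison of means

# Hypothetical volumes in mm^3: 50 mm^3 effect, 120 mm^3 biological SD, 40 mm^3 scan/rescan SD.
print(n_per_group(effect=50.0, sigma_bio=120.0, sigma_meas=40.0))
```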
Scan/Rescan of DTI Fiber Tracts [Figure: Scan 1 vs. Scan 2, FA and distance (mm) along the tract]
Evaluation • Use of digital phantoms • Easily define cases of interest • Can readily adjust SNR • Usually a simplification of biological structure • Lacks physiological noise • Often does not model the PSF and partial-volume artifacts • Does the method replicate findings in the literature or findings known from observation?
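A minimal digital-phantom sketch along these lines: a sphere in a 3D volume with adjustable SNR via additive Gaussian noise, plus optional smoothing to crudely mimic a PSF and partial-volume blur. All parameters are illustrative, not from the talk.

```python
# Simple digital phantom: a uniform-intensity sphere with adjustable SNR and
# an optional Gaussian blur standing in for the point spread function.
import numpy as np
from scipy.ndimage import gaussian_filter

def sphere_phantom(size=64, radius=12, intensity=100.0, snr=20.0, psf_sigma=0.0):
    grid = np.indices((size, size, size)) - size // 2
    phantom = (np.sqrt((grid**2).sum(axis=0)) <= radius) * intensity
    if psf_sigma > 0:
        phantom = gaussian_filter(phantom.astype(float), psf_sigma)  # crude PSF / partial-volume blur
    noise_sigma = intensity / snr                                    # SNR defined as signal / noise SD
    return phantom + np.random.normal(0.0, noise_sigma, phantom.shape)

img = sphere_phantom(snr=10.0, psf_sigma=1.0)
```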
Conclusions • Validation and evaluation of tools can be the most difficult part of a neuroimaging project • Several methods exist for evaluating algorithms, each with its own strengths and weaknesses • Validation determines how close we are to the actual process of interest • Reliability determines, in part, our ability to measure changes • In general, neuroimaging provides an index of brain volumes and function, not absolute measurements
Acknowledgements • Department of Psychiatry • Hans Johnson • Department of Radiology • Stephanie Powell • Peng Cheng • MIMX Lab • Nicole Grosland • Nicole DeVries • Ester Gassman