Validation and Evaluation of Algorithms Vincent A. Magnotta The University of Iowa June 30, 2008
Software Development • In many areas of medical imaging, developing an algorithm is the “easy” aspect of the project • Now that I have an algorithm, what is the next step? • Validate the algorithm • Evaluate reliability • Evaluate biological relevance • These are very different activities, and each gives the developer information that is useful for improving the algorithm
Validation • The degree of accuracy of a measuring device • Validation of medical image analysis is a major challenge • What are our standards? • The actual structure of interest • Another technique • Manual raters • Comparison with the literature
Validation Based on Actual Specimens [Figure: laser-scanned surface, traced surface, and the resulting surface distance map]
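One way to quantify agreement with a physical standard such as a laser-scanned specimen is a surface distance map. Below is a minimal sketch, not the original pipeline: it computes nearest-neighbor distances from a traced surface to a reference surface, using hypothetical vertex arrays in place of real mesh data.

```python
# Minimal sketch: nearest-neighbor surface distances between a traced surface
# and a laser-scanned reference surface. Point arrays here are stand-ins for
# the vertex coordinates (in mm) of each surface mesh.
import numpy as np
from scipy.spatial import cKDTree

def surface_distances(traced_pts, reference_pts):
    """Distance from each traced vertex to the closest reference vertex."""
    tree = cKDTree(reference_pts)      # spatial index over the reference surface
    dists, _ = tree.query(traced_pts)  # nearest-neighbor distance per traced vertex
    return dists

# Illustrative random stand-in data; real use would load mesh vertices.
traced = np.random.rand(1000, 3) * 100.0
reference = np.random.rand(5000, 3) * 100.0
d = surface_distances(traced, reference)
print(f"mean {d.mean():.2f} mm, 95th pct {np.percentile(d, 95):.2f} mm, max {d.max():.2f} mm")
```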
Doppler US and Phase Contrast MR [Figure from Ho et al., Am. J. Roentgenol. 178(3): 551, 2002]
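When the standard is another technique, a common way to summarize agreement is a Bland-Altman analysis. The sketch below is illustrative only and is not from the cited study; the measurement values are made up.

```python
# Illustrative Bland-Altman style agreement between two techniques measuring
# the same quantity (e.g., flow from Doppler US vs. phase contrast MR).
import numpy as np

us = np.array([10.2, 12.5, 9.8, 14.1, 11.0])   # technique A measurements (hypothetical)
mr = np.array([10.8, 12.1, 10.3, 13.6, 11.4])  # technique B measurements (hypothetical)

diff = us - mr
bias = diff.mean()                  # systematic offset between the techniques
loa = 1.96 * diff.std(ddof=1)       # half-width of the limits of agreement
print(f"bias = {bias:.2f}, limits of agreement = [{bias - loa:.2f}, {bias + loa:.2f}]")
```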
Manual Raters • Often we are left with manual raters in medical imaging to serve as a standard • Need to evaluate rater reliability • May be subject to rater drift and bias • Algorithms such as STAPLE have been developed to estimate the probability of a voxel being in a region-of-interest • Several metrics to evaluate reliability • Percent difference • Intraclass correlation • Border distance • Overlap metrics: Dice, Jaccard, Spatial Overlap • Sensitivity and Specificity
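STAPLE itself estimates the consensus segmentation and each rater's performance with an EM algorithm; as a much simpler stand-in (not STAPLE), the sketch below builds a voxel-wise agreement map from several hypothetical rater masks by majority vote.

```python
# Simple stand-in for a consensus segmentation: per-voxel agreement across
# raters, thresholded at 50%. Rater masks are hypothetical binary arrays of
# identical shape; this is not the STAPLE algorithm.
import numpy as np

def majority_consensus(rater_masks):
    """Return the per-voxel agreement fraction and a majority-vote mask."""
    stack = np.stack(rater_masks).astype(float)  # shape: (n_raters, x, y, z)
    agreement = stack.mean(axis=0)               # fraction of raters labeling each voxel
    return agreement, agreement >= 0.5

masks = [np.random.rand(32, 32, 32) > 0.5 for _ in range(3)]  # stand-in rater masks
agreement, consensus = majority_consensus(masks)
```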
Metrics • Intraclass Correlation Coefficient: R² = σ²_subject / (σ²_subject + σ²_method + σ²_error) • Jaccard Metric = Volume(A∩B) / Volume(A∪B) • Dice Metric = 2·Volume(A∩B) / [Volume(A) + Volume(B)] • Spatial Overlap = Volume(A∩B) / Volume(A)
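The overlap metrics above are straightforward to compute from two binary segmentations. A minimal sketch using numpy boolean arrays as stand-ins for rater or algorithm masks (the ICC additionally needs subject, method, and error variance components, typically estimated from a repeated-measures ANOVA, so it is omitted here):

```python
# Overlap metrics for two binary segmentations A and B (boolean numpy arrays).
import numpy as np

def overlap_metrics(A, B):
    inter = np.logical_and(A, B).sum()
    union = np.logical_or(A, B).sum()
    return {
        "jaccard": inter / union,                   # |A∩B| / |A∪B|
        "dice": 2.0 * inter / (A.sum() + B.sum()),  # 2|A∩B| / (|A| + |B|)
        "spatial_overlap": inter / A.sum(),         # |A∩B| / |A|, relative to mask A
    }

# Illustrative example: two partially overlapping cubes.
A = np.zeros((64, 64, 64), dtype=bool); A[20:40, 20:40, 20:40] = True
B = np.zeros_like(A); B[25:45, 20:40, 20:40] = True
print(overlap_metrics(A, B))
```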
Intraclass Correlation [Figure: example plots for Data Set 1 and Data Set 2]
Performance of Overlap Metrics [Figure: Jaccard metric and Dice metric]
Reliability • Ability to reproduce measurements within a subject across trials • Most algorithms will give the same results when run on the same image data • Typically evaluated on a scan/rescan basis • Provides an estimate of the noise introduced by the algorithm • Helps to determine the sample size required to measure a known effect size
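Because the noise introduced by the algorithm adds to biological variability, scan/rescan reliability feeds directly into sample-size planning. A rough sketch using a standard two-group normal approximation; all values are illustrative and not from the talk.

```python
# Approximate per-group sample size to detect a given effect, with the total
# variability inflated by the measurement error estimated from a reliability study.
from scipy.stats import norm

def n_per_group(effect, sigma_bio, sigma_meas, alpha=0.05, power=0.8):
    sigma = (sigma_bio**2 + sigma_meas**2) ** 0.5      # total per-measurement SD
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)      # critical values for alpha and power
    return 2 * (z * sigma / effect) ** 2               # two-sample comparison of means

# Hypothetical volumes in mm^3: 50 mm^3 effect, 120 mm^3 biological SD, 40 mm^3 scan/rescan SD.
print(n_per_group(effect=50.0, sigma_bio=120.0, sigma_meas=40.0))
```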
Scan/Rescan of DTI Fiber Tracts [Figure: Scan 1 vs. Scan 2, FA and distance (mm) along the tract]
Evaluation • Use of digital phantoms • Easily define cases of interest • Can readily adjust SNR • Usually a simplification of biological structure • Lacks physiological noise • Often does not model the PSF and partial-volume artifacts • Does the method replicate findings in the literature or findings known from observation?
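A minimal digital-phantom sketch along these lines: a sphere in a 3D volume with adjustable SNR via additive Gaussian noise, plus optional smoothing to crudely mimic a PSF and partial-volume blur. All parameters are illustrative, not from the talk.

```python
# Simple digital phantom: a uniform-intensity sphere with adjustable SNR and
# an optional Gaussian blur standing in for the point spread function.
import numpy as np
from scipy.ndimage import gaussian_filter

def sphere_phantom(size=64, radius=12, intensity=100.0, snr=20.0, psf_sigma=0.0):
    grid = np.indices((size, size, size)) - size // 2
    phantom = (np.sqrt((grid**2).sum(axis=0)) <= radius) * intensity
    if psf_sigma > 0:
        phantom = gaussian_filter(phantom.astype(float), psf_sigma)  # crude PSF / partial-volume blur
    noise_sigma = intensity / snr                                    # SNR defined as signal / noise SD
    return phantom + np.random.normal(0.0, noise_sigma, phantom.shape)

img = sphere_phantom(snr=10.0, psf_sigma=1.0)
```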
Conclusions • Validation and evaluation of tools can be the most difficult part of a neuroimaging project • Several methods exist for evaluating algorithms, each with its own strengths and weaknesses • Validation determines how close we are to the actual process of interest • Reliability determines, in part, our ability to measure changes • In general, neuroimaging provides an index of brain volumes and function, not absolute measurements
Acknowledgements • Department of Psychiatry • Hans Johnson • Department of Radiology • Stephanie Powell • Peng Cheng • MIMX Lab • Nicole Grosland • Nicole DeVries • Ester Gassman