Comprehensive Introduction to the Evaluation of Neural Networks and other Computational Intelligence Decision Functions: Receiver Operating Characteristic, Jackknife, Bootstrap and other Statistical Methodologies David G. Brown and Frank Samuelson Center for Devices and Radiological Health, FDA 6 July 2014
Course Outline
• Performance measures for Computational Intelligence (CI) observers
  • Accuracy
  • Prevalence dependent measures
  • Prevalence independent measures
  • Maximization of performance: utility analysis / cost functions
• Receiver Operating Characteristic (ROC) analysis
  • Sensitivity and specificity
  • Construction of the ROC curve
  • Area under the ROC curve (AUC)
• Error analysis for CI observers
  • Sources of error
  • Parametric methods
  • Nonparametric methods
  • Standard deviations and confidence intervals
• Bootstrap methods
  • Theoretical foundation
  • Practical use
• References
What’s the problem?
• Emphasis on algorithm innovation to the exclusion of performance assessment
• Use of subjective measures of performance – “beauty contest”
• Use of “accuracy” as a measure of success
• Lack of error bars—my CIO is .01 better than yours (+/- ?)
• Flawed methodology—training and testing on the same data
• Lack of appreciation for the many different sources of error that can be taken into account
Original image: Lena. Courtesy of the Signal and Image Processing Institute at the University of Southern California.
CI improved image: Baboon. Courtesy of the Signal and Image Processing Institute at the University of Southern California.
I. Performance measures for computational intelligence (CI) observers
• Task based: (binary) discrimination task
• Two populations involved: “normal” and “abnormal”
• Accuracy – intuitive but incomplete
• Different consequences for success or failure in each population
• Some measures depend on the prevalence, Pr = (number of abnormal cases) / (total number of cases), and some do not
  • Prevalence dependent: accuracy, positive predictive value, negative predictive value
  • Prevalence independent: sensitivity, specificity, ROC, AUC
• True optimization of performance requires knowledge of cost functions or utilities for successes and failures in both populations
How to make a CIO with >99% accuracy
• Medical problem: screening mammography (“screening” means testing in an asymptomatic population)
• Prevalence of breast cancer in the screening population: Pr = 0.5%
• My CIO always says “normal”
• Accuracy (Acc) is 99.5% (accuracy of accepted present-day systems ~75%)
• Accuracy in a diagnostic setting (Pr ~ 20%) is 80%, since Acc = 1 - Pr for this CIO
CIO operates on two different populations. [Figure: the CIO output t is distributed as p(t|0) for normal cases and p(t|1) for abnormal cases, with the decision threshold t = T marked on the t-axis.]
Must consider effects on normal and abnormal populations separately
• CIO output t
• p(t|0): probability distribution of t for the population of normals
• p(t|1): probability distribution of t for the population of abnormals
• Threshold T: everything to the right of T is called abnormal, and everything to the left of T is called normal
• The area of p(t|0) to the left of T is the true negative fraction (TNF = specificity), and the area to the right is the false positive fraction (FPF = type I error)
  • TNF + FPF = 1
• The area of p(t|1) to the left of T is the false negative fraction (FNF = type II error), and the area to the right is the true positive fraction (TPF = sensitivity)
  • FNF + TPF = 1
• TNF, FPF, FNF, TPF are all prevalence independent, since each is some fraction of one of our two probability distributions
• Accuracy = Pr x TPF + (1-Pr) x TNF
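These fractions are straightforward to compute once the CIO outputs for the two populations are in hand. The sketch below is not from the original slides; it assumes the scores are held in two NumPy arrays, t0 for normals and t1 for abnormals, and simply counts what falls on each side of a threshold T.

```python
import numpy as np

# Sketch only: estimate TNF, FPF, FNF, TPF from CIO output scores.
# t0 and t1 are hypothetical arrays of CIO outputs for the normal and
# abnormal populations; T is the decision threshold.
def decision_fractions(t0, t1, T):
    t0, t1 = np.asarray(t0), np.asarray(t1)
    tnf = np.mean(t0 < T)    # normals called normal (specificity)
    fpf = np.mean(t0 >= T)   # normals called abnormal (type I error)
    fnf = np.mean(t1 < T)    # abnormals called normal (type II error)
    tpf = np.mean(t1 >= T)   # abnormals called abnormal (sensitivity)
    return tnf, fpf, fnf, tpf

# Illustrative synthetic data: two Gaussian score distributions.
rng = np.random.default_rng(0)
t0 = rng.normal(0.0, 1.0, 10_000)   # p(t|0)
t1 = rng.normal(2.0, 1.0, 10_000)   # p(t|1)
tnf, fpf, fnf, tpf = decision_fractions(t0, t1, T=0.0)
print(f"TNF={tnf:.2f}  FPF={fpf:.2f}  FNF={fnf:.2f}  TPF={tpf:.2f}")
# Note that TNF + FPF = 1 and FNF + TPF = 1, as stated above.
```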
[Figure: with threshold T on the t-axis, the normal-case distribution splits into TNF (.5) and FPF (.5), and the abnormal-case distribution into TPF (.95) and FNF (.05).]
Prevalence dependent measures
• Accuracy (Acc)
  • Acc = Pr x TPF + (1-Pr) x TNF
• Positive predictive value (PPV): fraction of positives that are true positives
  • PPV = TPF x Pr / (TPF x Pr + FPF x (1-Pr))
• Negative predictive value (NPV): fraction of negatives that are true negatives
  • NPV = TNF x (1-Pr) / (TNF x (1-Pr) + FNF x Pr)
• Using Pr = .05 with the previous values TPF = .95, TNF = 0.5, FNF = .05, FPF = 0.5:
  • Acc = .05 x .95 + .95 x .5 = .52
  • PPV = .95 x .05 / (.95 x .05 + .5 x .95) = .09
  • NPV = .5 x .95 / (.5 x .95 + .05 x .05) = .995
Prevalence dependent measures (continued)
• Same Acc, PPV, NPV formulas, now using the mammography screening prevalence Pr = .005 with TPF = .95, TNF = 0.5, FNF = .05, FPF = 0.5:
  • Acc = .005 x .95 + .995 x .5 = .50
  • PPV = .95 x .005 / (.95 x .005 + .5 x .995) = .01
  • NPV = .5 x .995 / (.5 x .995 + .05 x .005) = .9995
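As a check on the two slides above, the prevalence-dependent measures can be computed directly from the formulas. A minimal sketch, assuming the TPF, TNF, FNF, FPF values of the mammography example; the function name acc_ppv_npv is just illustrative.

```python
# Sketch only: prevalence-dependent measures from the formulas above,
# evaluated at the two prevalences used on the slides.
def acc_ppv_npv(pr, tpf=0.95, tnf=0.50, fnf=0.05, fpf=0.50):
    acc = pr * tpf + (1 - pr) * tnf
    ppv = tpf * pr / (tpf * pr + fpf * (1 - pr))
    npv = tnf * (1 - pr) / (tnf * (1 - pr) + fnf * pr)
    return acc, ppv, npv

for pr in (0.05, 0.005):
    acc, ppv, npv = acc_ppv_npv(pr)
    print(f"Pr={pr}: Acc={acc:.3f}  PPV={ppv:.3f}  NPV={npv:.4f}")
# Pr = 0.05 : Acc ≈ 0.52, PPV ≈ 0.09, NPV ≈ 0.995
# Pr = 0.005: Acc ≈ 0.50, PPV ≈ 0.01, NPV ≈ 0.9995
```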
Acc, PPV, NPV as functions of prevalence (screening mammography). [Figure: curves computed with TPF = .95, FNF = .05, TNF = 0.5, FPF = 0.5.]
Acc = NPV as a function of prevalence (forced “normal”-response CIO). [Figure.]
Prevalence independent measures
• Sensitivity = TPF
• Specificity = TNF = 1 - FPF
• Receiver Operating Characteristic (ROC) curve = TPF as a function of FPF (sensitivity as a function of 1 - specificity)
• Area under the ROC curve (AUC) = sensitivity averaged over all values of specificity
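One way to see how the ROC curve is built is to sweep the threshold over all observed output values and record the resulting (FPF, TPF) pairs. The sketch below is not from the slides; it assumes two arrays of CIO scores, t0 (normals) and t1 (abnormals), and estimates AUC with a simple trapezoidal sum.

```python
import numpy as np

# Sketch only: trace the empirical ROC curve by sweeping the threshold over
# all observed score values, then estimate AUC with the trapezoid rule.
def empirical_roc(t0, t1):
    thresholds = np.sort(np.unique(np.concatenate([t0, t1])))[::-1]
    fpf = np.array([np.mean(t0 >= T) for T in thresholds])   # 1 - specificity
    tpf = np.array([np.mean(t1 >= T) for T in thresholds])   # sensitivity
    # add the (0, 0) and (1, 1) corners so the curve spans the full range
    return np.r_[0.0, fpf, 1.0], np.r_[0.0, tpf, 1.0]

rng = np.random.default_rng(1)
t0 = rng.normal(0.0, 1.0, 2_000)   # synthetic normal-case scores
t1 = rng.normal(1.5, 1.0, 2_000)   # synthetic abnormal-case scores
fpf, tpf = empirical_roc(t0, t1)
auc = np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2.0)   # trapezoidal sum
print(f"AUC ~ {auc:.3f}")
```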
[Figure: the normal / class 0 and abnormal / class 1 score distributions with a threshold; sweeping the threshold traces out the entire ROC curve, TPF (sensitivity) versus FPF (1-specificity), and the ROC slope at a point corresponds to the threshold setting.]
[Figure: empirical ROC data for mammography screening in the US (Craig Beam et al.).]
Maximization of performance
• Need to know utilities or costs of each type of decision outcome – but these are very hard to estimate accurately. You don’t just maximize accuracy.
• Need prevalence
• For the mammography example:
  • TPF: prolongation of life minus treatment cost
  • FPF: diagnostic work-up cost, anxiety
  • TNF: peace of mind
  • FNF: delay in treatment => shortened life
• Hypothetical assignment of utilities for some decision threshold T:
  • Utility_T = U(TPF) x TPF x Pr + U(FPF) x FPF x (1-Pr) + U(TNF) x TNF x (1-Pr) + U(FNF) x FNF x Pr
  • U(TPF) = 100, U(FPF) = -10, U(TNF) = 4, U(FNF) = -20
  • With Pr = .05: Utility_T = 100 x .95 x .05 - 10 x .50 x .95 + 4 x .50 x .95 - 20 x .05 x .05 = 1.85
• Now if we only knew how to trade off TPF versus FPF, we could optimize (?) medical performance.
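The expected-utility expression on this slide is easy to evaluate for any operating point. A minimal sketch, using the hypothetical utility values and the Pr = .05, TPF = .95, FPF = .50 operating point quoted above:

```python
# Sketch only: expected utility at a given operating point, with the
# hypothetical utility values from the slide as defaults.
def expected_utility(pr, tpf, fpf, u_tp=100, u_fp=-10, u_tn=4, u_fn=-20):
    tnf, fnf = 1 - fpf, 1 - tpf
    return (u_tp * tpf * pr + u_fp * fpf * (1 - pr)
            + u_tn * tnf * (1 - pr) + u_fn * fnf * pr)

print(f"{expected_utility(pr=0.05, tpf=0.95, fpf=0.50):.2f}")   # 1.85, as above
```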
Choice of ROC operating point through utility analysis—screening mammography
Utility maximization calculation
u = (U_TPF x TPF + U_FNF x FNF) x Pr + (U_TNF x TNF + U_FPF x FPF) x (1-Pr)
  = (U_TPF x TPF + U_FNF x (1-TPF)) x Pr + (U_TNF x (1-FPF) + U_FPF x FPF) x (1-Pr)
du/dFPF = (U_FPF - U_TNF) x (1-Pr) + (U_TPF - U_FNF) x Pr x dTPF/dFPF = 0
dTPF/dFPF = (U_TNF - U_FPF) x (1-Pr) / ((U_TPF - U_FNF) x Pr)
With U_TPF = 100, U_FNF = -20, U_TNF = 4, U_FPF = -10 (the values from the utility slide):
  Pr = .005: dTPF/dFPF = 23
  Pr = .05:  dTPF/dFPF = 2.2
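The optimal-slope formula can be evaluated directly; the sketch below reproduces the two slopes quoted above, assuming the utility values from the earlier slide (including U_FPF = -10).

```python
# Sketch only: optimal ROC slope from the formula derived above,
# dTPF/dFPF = (U_TNF - U_FPF)(1 - Pr) / ((U_TPF - U_FNF) Pr).
def optimal_roc_slope(pr, u_tp=100, u_fn=-20, u_tn=4, u_fp=-10):
    return (u_tn - u_fp) * (1 - pr) / ((u_tp - u_fn) * pr)

print(f"{optimal_roc_slope(0.005):.1f}")   # ~23 at the screening prevalence
print(f"{optimal_roc_slope(0.05):.1f}")    # ~2.2 at Pr = .05
```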
[Figure (repeated): threshold, normal and abnormal case distributions, and the entire ROC curve; the slope of the ROC curve at the operating point corresponds to the chosen threshold.]
Estimators
• TPF, FPF, TNF, FNF, accuracy, the ROC curve, and AUC are all fractions or probabilities.
• Normally we have a finite sample of subjects on which to test our CIO. From this finite sample we try to estimate the above fractions.
• These estimates will vary depending upon the sample selected (statistical variation).
• Estimates can be nonparametric or parametric.
Estimators
• Population TPF = (number of abnormals that would be selected by the CIO in the population) / (number of abnormals in the population)
• Sample estimate of TPF = (number of abnormals that were selected by the CIO in the sample) / (number of abnormals in the sample)
• Number in the sample << number in the population (at least in theory)
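Because the sample is much smaller than the population, the sample estimate of TPF fluctuates from one sample to the next. A small sketch (with an arbitrary, illustrative sample size of 100 abnormal cases, not a number from the slides) shows this case-sampling variation, which is the subject of the error-analysis part of the course.

```python
import numpy as np

# Sketch only: spread of the sample TPF estimate around a "population"
# TPF of 0.95, for repeated samples of 100 abnormal cases.
rng = np.random.default_rng(2)
true_tpf, n_abnormal, n_repeats = 0.95, 100, 1000
estimates = rng.binomial(n_abnormal, true_tpf, size=n_repeats) / n_abnormal
print(f"mean of TPF estimates = {estimates.mean():.3f}")
print(f"std  of TPF estimates = {estimates.std(ddof=1):.3f}")
# The spread is roughly sqrt(TPF * (1 - TPF) / n) ~ 0.022 here: this is the
# statistical variation that the error bars must reflect.
```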
II. Receiver Operating Characteristic (ROC)
• Binary classification
• The test result is compared to a threshold
[Figures: the distribution of CIO output for all subjects, then separated into the distribution for normal / class 0 subjects, p(t|0), and for abnormal / class 1 subjects, p(t|1), with a decision threshold on the t-axis.]
[Figures: for a fixed threshold, the area of p(t|0) to the left of the threshold is the specificity = true negative fraction (TNF), and the area of p(t|1) to the right is the sensitivity = true positive fraction (TPF).]
[Figure/table: at this threshold, the decision (D0 or D1) versus truth (H0 or H1) table gives specificity TNF = 0.50 for the normal / class 0 subjects and sensitivity TPF = 0.95 for the abnormal / class 1 subjects.]
[Figures: the complementary areas give 1 - specificity = false positive fraction (FPF) for the normal / class 0 subjects and 1 - sensitivity = false negative fraction (FNF) for the abnormal / class 1 subjects, completing the decision-versus-truth table:]

              Decision D0    Decision D1
  Truth H0    TNF = 0.50     FPF = 0.50
  Truth H1    FNF = 0.05     TPF = 0.95
[Figures: each threshold setting is one operating point on the ROC curve (TPF/sensitivity versus FPF/1-specificity); a low threshold gives high sensitivity, an intermediate threshold gives sensitivity = specificity, and a high threshold gives high specificity.]
Which CIO is best? [Figure: three observers, CIO #1, CIO #2, and CIO #3, shown as operating points at different thresholds in the TPF-versus-FPF plane.]
Do not compare rates of one class, e.g. TPF, at different rates of the other class (FPF). [Figure: the same three CIO operating points.]
[Figure: sweeping the threshold over all values traces out the entire ROC curve for the normal / class 0 and abnormal / class 1 distributions, TPF (sensitivity) versus FPF (1-specificity).]
[Figure: ROC curves with AUC = 0.98 and AUC = 0.85, and the chance line with AUC = 0.5; AUC measures discriminability, i.e., CIO performance.]
AUC (Area under the ROC Curve)
• AUC is a separation probability
• AUC = probability that
  • the CIO output for an abnormal subject > the CIO output for a normal subject
  • the CIO correctly tells which of 2 subjects is normal
• Estimating AUC from a finite sample:
  • Select an abnormal subject score x_i
  • Select a normal subject score y_k
  • Is x_i > y_k?
  • Average over all pairs (x_i, y_k): AUC-hat = (1 / (N_abnormal x N_normal)) x sum over i,k of s(x_i, y_k), where s = 1 if x_i > y_k, 1/2 if x_i = y_k, and 0 otherwise
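The estimator described on this slide is the two-sample Mann-Whitney (Wilcoxon) statistic. A minimal sketch, assuming the abnormal and normal scores are in hypothetical arrays x and y, and counting ties as 1/2:

```python
import numpy as np

# Sketch only: AUC as the average, over all abnormal/normal pairs, of the
# indicator that the abnormal score exceeds the normal score (ties = 1/2).
def auc_mann_whitney(x_abnormal, y_normal):
    x = np.asarray(x_abnormal)[:, None]   # abnormal scores x_i, as a column
    y = np.asarray(y_normal)[None, :]     # normal scores y_k, as a row
    return np.mean((x > y) + 0.5 * (x == y))

rng = np.random.default_rng(3)
y = rng.normal(0.0, 1.0, 500)   # synthetic normal-case scores
x = rng.normal(1.5, 1.0, 500)   # synthetic abnormal-case scores
print(f"AUC ~ {auc_mann_whitney(x, y):.3f}")
# For two unit-variance Gaussians separated by 1.5, the true AUC is
# Phi(1.5 / sqrt(2)) ~ 0.86, so the estimate should land near there.
```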