ROC Statistics for the Lazy Machine Learner in All of Us
Bradley Malin
Lecture for COS Lab, School of Computer Science, Carnegie Mellon University
9/22/2005
Why Should I Care? • Imagine you have 2 different probabilistic classification models • e.g. logistic regression vs. neural network • How do you know which one is better? • How do you communicate your belief? • Can you provide quantitative evidence beyond a gut feeling and subjective interpretation?
Accuracy • What does this mean? • What is the difference between “accuracy” and an “accurate prediction”? • Contingency Table Interpretation: Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives) • Is this a good measure? (Why or Why Not?)
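As a quick illustration (the counts below are made up, not from the lecture data), accuracy is just the diagonal of the contingency table divided by the total:

    # Accuracy from a 2x2 contingency table (illustrative counts only)
    tp, tn, fp, fn = 40, 35, 10, 15
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    print(accuracy)  # 0.75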
Note on Discrete Classes • TRADITION … show a contingency table when reporting a model’s predictions. • BUT … probabilistic models do not provide discrete counts for the matrix cells!!! • IN OTHER WORDS … regression does not directly report the number of individuals predicted positive (e.g. has a heart attack) … well, not really • INSTEAD … it reports the probability that the output takes a certain value (e.g. 1 or 0)
Visual Perspective (slide figures not recoverable from this transcript)
ROC Curves • Originated from signal detection theory • Binary signal corrupted by Gaussian noise • What is the optimal threshold (i.e. operating point)? • Dependence on 3 factors • Signal Strength • Noise Variance • Personal tolerance in Hit / False Alarm Rate
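To make the signal-detection picture concrete, here is a minimal simulation sketch; the signal strength of 1.5 and the threshold of 0.75 are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    noise  = rng.normal(0.0, 1.0, 10_000)   # noise-only observations
    signal = rng.normal(1.5, 1.0, 10_000)   # signal corrupted by unit-variance Gaussian noise

    threshold = 0.75                         # one possible operating point
    hit_rate         = np.mean(signal > threshold)   # P(detect | signal present)
    false_alarm_rate = np.mean(noise  > threshold)   # P(detect | noise only)
    print(hit_rate, false_alarm_rate)        # raising the threshold trades hits for fewer false alarms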
ROC Curves • Receiver operating characteristic • Summarizes & presents the performance of any binary classification model • Captures the model’s ability to distinguish true positives from false positives
Use Multiple Contingency Tables • Sample contingency tables across a range of probability thresholds. • TRUE POSITIVE RATE (also called SENSITIVITY) = True Positives / (True Positives + False Negatives) • FALSE POSITIVE RATE (also called 1 - SPECIFICITY) = False Positives / (False Positives + True Negatives) • Plot Sensitivity vs. (1 – Specificity) for each sampled threshold and you are done
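A minimal sketch of that threshold sweep (the labels and predicted probabilities below are hypothetical, and the 101 evenly spaced thresholds are an arbitrary choice):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical model outputs: true 0/1 labels and predicted probabilities
    y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.9, 0.6, 0.3])

    tpr, fpr = [], []
    for t in np.linspace(1.0, 0.0, 101):            # sweep the probability threshold
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        tn = np.sum(~pred & (y_true == 0))
        tpr.append(tp / (tp + fn))                   # sensitivity
        fpr.append(fp / (fp + tn))                   # 1 - specificity

    plt.plot(fpr, tpr)
    plt.xlabel("False positive rate (1 - specificity)")
    plt.ylabel("True positive rate (sensitivity)")
    plt.show()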
ROC Plot (figure: ROC curves for the logistic regression and neural network models)
Sidebar: Use More Samples (figure: ROC plots from a much larger dataset – see Malin 2005)
ROC Quantification • Area Under the ROC Curve • Use quadrature to calculate the area • e.g. the trapz (trapezoidal rule) function in Matlab will work • Example – the “Neural Network” model appears better (larger area under its curve)
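The same quadrature outside Matlab is a one-liner; a sketch with made-up ROC points (in practice these come from the threshold sweep above):

    import numpy as np

    # Illustrative (FPR, TPR) points along an ROC curve, already sorted by FPR
    fpr = np.array([0.0, 0.1, 0.3, 0.5, 1.0])
    tpr = np.array([0.0, 0.6, 0.8, 0.9, 1.0])

    # Trapezoidal rule, the same quadrature Matlab's trapz performs
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0)
    print("Area under ROC curve:", auc)   # 0.815 for these points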
Theory: Model Optimality • Classifiers on the convex hull are always “optimal” • e.g. Net & Tree • Classifiers below the convex hull are always “suboptimal” • e.g. Naïve Bayes • (figure: ROC convex hull over Decision Tree, Neural Net, and Naïve Bayes)
Building Better Classifiers • Classifiers on the convex hull can be combined to form a strictly dominant hybrid classifier • ordered sequence of classifiers • can be converted into a “ranker” (see the sketch below) • (figure: convex-hull combination of the Decision Tree and Neural Net)
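One way to read the hull combination, sketched with hypothetical operating points: randomly following one classifier with probability p and the other with probability 1 - p realizes any point on the segment between them.

    # Hypothetical operating points (FPR, TPR) for two classifiers on the hull
    tree_fpr, tree_tpr = 0.10, 0.55   # decision tree
    net_fpr,  net_tpr  = 0.40, 0.85   # neural net

    # With probability p use the tree's decision, otherwise the net's;
    # the resulting operating point is the convex combination of the two.
    p = 0.5
    combo_fpr = p * tree_fpr + (1 - p) * net_fpr
    combo_tpr = p * tree_tpr + (1 - p) * net_tpr
    print(combo_fpr, combo_tpr)       # (0.25, 0.70): midway along the hull segment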
Some Statistical Insight • Curve Area: • Take a random healthy patient with score X • Take a random heart attack patient with score Y • Area is an estimate of P[Y > X] • Slope of the curve equals the likelihood ratio: P(score | Signal) / P(score | Noise) • ROC graph captures all information in the contingency table • False negative & true negative rates are complements of the true positive & false positive rates, respectively
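A minimal sketch of the P[Y > X] reading of the area (the scores below are hypothetical; ties count half, the Mann-Whitney convention):

    import numpy as np

    x = np.array([0.1, 0.4, 0.35, 0.2, 0.55, 0.3])   # scores for healthy patients
    y = np.array([0.8, 0.7, 0.9, 0.5, 0.65])          # scores for heart attack patients

    # Fraction of (healthy, heart attack) pairs where the heart attack patient scores higher
    diffs = y[:, None] - x[None, :]
    auc_estimate = np.mean(diffs > 0) + 0.5 * np.mean(diffs == 0)
    print(auc_estimate)   # about 0.967 for these scores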
Can Always Quantify Best Operating Point • When misclassification costs are equal, the best operating point is … • where a 45° line is tangent to the curve, i.e. the point closest to the (0,1) corner • Verify this mathematically (economic interpretation) • Why?
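A sketch of locating that point numerically, reusing the made-up ROC points from above; under equal costs the 45° tangent condition amounts to maximizing TPR - FPR, which is typically close to the point nearest the (0,1) corner:

    import numpy as np

    # Illustrative (FPR, TPR) points and the thresholds that produced them
    fpr = np.array([0.0, 0.1, 0.3, 0.5, 1.0])
    tpr = np.array([0.0, 0.6, 0.8, 0.9, 1.0])
    thresholds = np.array([1.0, 0.8, 0.6, 0.4, 0.0])

    best_slope  = np.argmax(tpr - fpr)                 # 45-degree tangent / maximum TPR - FPR
    best_corner = np.argmin(np.hypot(fpr, 1 - tpr))    # point closest to the (0, 1) corner

    print("max(TPR - FPR) at threshold", thresholds[best_slope])
    print("closest to (0, 1) at threshold", thresholds[best_corner])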
Quick Question • Are ROC curves always appropriate? • Subjective operating points? • Must weigh the tradeoffs between false positives and false negatives • The ROC curve plot is independent of the class distribution and error costs • This leads into utility theory (not touching this today)
Much, Much More on ROC • Oh, if only I had more time. • You should also look up and learn about: • Iso-accuracy lines • Skewed distributions and why the 45° line isn’t always “best” • Convexity vs. non-convexity vs. concavity • Mann-Whitney-Wilcoxon sum of ranks • Gini coefficient • Calibrated thresholds • Averaging ROC curves • Precision-Recall (THIS IS VERY IMPORTANT) • Cost Curves
Some References
Good Bibliography: http://splweb.bwh.harvard.edu:8000/pages/ppl/zou/roc.html
• Drummond C and Holte R. What ROC curves can and can't do (and cost curves can). In Proceedings of the Workshop on ROC Analysis in AI, in conjunction with the European Conference on AI. Valencia, Spain. 2004.
• Malin B. Probabilistic prediction of myocardial infarction: logistic regression versus simple neural networks. Data Privacy Lab Working Paper WP-25, School of Computer Science, Carnegie Mellon University. Sept 2005.
• McNeil BJ and Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Medical Decision Making. 1984; 4: 137-150.
• Provost F and Fawcett T. The case against accuracy estimation for comparing induction algorithms. In Proceedings of the 15th International Conference on Machine Learning. Madison, Wisconsin. 1998: 445-453.
• Swets J. Measuring the accuracy of diagnostic systems. Science. 1988; 240(4857): 1285-1293. (based on his 1967 book Information Retrieval Systems)