F. Provost and T. Fawcett. Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions. Ramazan Bitirgen, CSL - ECE
Confusion Matrix Bitirgen - CS678
Introduction
• Data mining requires:
  • Experiments with a wide variety of learning algorithms
  • Using different algorithm parameters
  • Varying output threshold values
  • Using different training regimens
• Using accuracy alone is inadequate because:
  • Class distributions are skewed
  • Misclassification (FP, FN) costs are not uniform
Class Distributions - Problems with Accuracy
• Accuracy assumes that the class distribution among examples is constant and relatively balanced, which is rarely the case in real life
• Classifiers are generally used to scan a large number of normal entities to find a small number of unusual ones
  • Looking for defrauded customers
  • Checking an assembly line
• Skews of 10^6 have been reported (Clearwater & Stern 1991)
Misclassification Costs - Problems with Accuracy
• The assumption of equal error costs does not hold in real-life problems
  • Disease tests, fraud detection…
• Instead of maximizing accuracy, we need to minimize the error cost, where c(Y,n) is the cost of a false positive and c(N,p) is the cost of a false negative:
  Cost = FP•c(Y,n) + FN•c(N,p)
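The contrast between accuracy and error cost can be sketched in a few lines. This is only an illustration: the confusion-matrix counts and the 100:1 cost ratio below are hypothetical values chosen for the example, and the function names are not from the slides.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of examples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def error_cost(fp, fn, cost_fp, cost_fn):
    """Cost = FP * c(Y,n) + FN * c(N,p), with FP and FN as counts."""
    return fp * cost_fp + fn * cost_fn


# Hypothetical data: 1000 examples, 10 of them positive.
# Assumed costs: a false negative costs 100 times a false positive.
COST_FP, COST_FN = 1.0, 100.0

# Classifier 1 labels everything negative.
acc_1 = accuracy(tp=0, fp=0, tn=990, fn=10)
cost_1 = error_cost(fp=0, fn=10, cost_fp=COST_FP, cost_fn=COST_FN)

# Classifier 2 catches 9 of 10 positives but raises 50 false alarms.
acc_2 = accuracy(tp=9, fp=50, tn=940, fn=1)
cost_2 = error_cost(fp=50, fn=1, cost_fp=COST_FP, cost_fn=COST_FN)

print("classifier 1:", acc_1, cost_1)
print("classifier 2:", acc_2, cost_2)
```

Under these assumed numbers the first classifier has the higher accuracy but the much higher cost, which is the situation the slide warns about.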
ROC Plot and ROC Area
• Receiver Operating Characteristic
• Developed in WWII to statistically model “false positive” and “false negative” detections of radar operators
• Becoming more popular in ML; already a standard measure in medicine and biology
• However, it does a poor job of guiding the choice among classifiers
ROC graph of four classifiers
• Informally, a point in ROC space is better than another if it lies to the northwest of it (higher TP rate, lower FP rate).
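Iso-performance Lines
• Expected cost of a classification by a classifier at the ROC point (FP, TP), where FP and TP are rates:
  Expected cost = p(p)•(1 − TP)•c(N,p) + p(n)•FP•c(Y,n)
• Therefore, two points (FP1, TP1) and (FP2, TP2) have the same performance if
  (TP2 − TP1) / (FP2 − FP1) = [p(n)•c(Y,n)] / [p(p)•c(N,p)] = m(iso_perf)
• Iso-performance line: all classifiers corresponding to points on a line with this slope have the same expected cost.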
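A minimal sketch of the iso-performance idea, assuming the expected cost takes the form p(p)•(1 − TP)•c(N,p) + p(n)•FP•c(Y,n), so that the slope of an iso-performance line is p(n)•c(Y,n) / (p(p)•c(N,p)). The function names and the two ROC points are hypothetical; the points were picked so that they lie on a line of slope 10.

```python
import math


def expected_cost(fp_rate, tp_rate, p_pos, cost_fp, cost_fn):
    """Expected cost per example for a classifier at ROC point (fp_rate, tp_rate)."""
    p_neg = 1.0 - p_pos
    return p_pos * (1.0 - tp_rate) * cost_fn + p_neg * fp_rate * cost_fp


def iso_performance_slope(p_pos, cost_fp, cost_fn):
    """Slope of the lines in ROC space along which expected cost is constant."""
    p_neg = 1.0 - p_pos
    return (p_neg * cost_fp) / (p_pos * cost_fn)


# Assumed setting: negatives outnumber positives 10:1, equal error costs.
P_POS = 1.0 / 11.0
slope = iso_performance_slope(P_POS, cost_fp=1.0, cost_fn=1.0)

# Two hypothetical ROC points on a line of slope 10: (0.01, 0.5) and (0.02, 0.6).
cost_a = expected_cost(0.01, 0.5, P_POS, cost_fp=1.0, cost_fn=1.0)
cost_b = expected_cost(0.02, 0.6, P_POS, cost_fp=1.0, cost_fn=1.0)

print(slope, cost_a, cost_b)
print(math.isclose(cost_a, cost_b))
```

Because the two points differ by 0.01 in FP rate and 0.1 in TP rate, they sit on the same iso-performance line and the two expected costs should agree up to floating-point rounding.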
ROC Convex Hull
• If a point is not on the convex hull, the classifier represented by that point cannot be optimal.
• In this example B and D cannot be optimal because none of their points are on the convex hull.
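One way to compute such a hull is a monotone-chain scan over the points sorted by FP rate, keeping only the upper boundary. The sketch below is an assumption about how this could be done, not the method used to produce the slide's figure, and the coordinates for A, B, C and D are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative means a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Upper convex hull of (fp_rate, tp_rate) points, from (0, 0) to (1, 1).

    The trivial classifiers (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # joining the point before it to the new point p.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Hypothetical ROC points for four classifiers.
classifiers = {
    "A": (0.1, 0.5),
    "B": (0.3, 0.55),
    "C": (0.4, 0.9),
    "D": (0.6, 0.8),
}

hull = roc_convex_hull(classifiers.values())
on_hull = sorted(name for name, pt in classifiers.items() if pt in hull)
print(hull)
print(on_hull)
```

With these made-up coordinates only A and C end up on the hull, so B and D could not be optimal for any class or cost distribution.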
How to use the ROC Convex Hull
• p(n):p(p) = 10:1
• Scenario A:
  • c(N,p) = c(Y,n)
  • m(iso_perf) = 10
• Scenario B:
  • c(N,p) = 100•c(Y,n)
  • m(iso_perf) = 0.1
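Given a slope, the best hull point is the one touched by the iso-performance line with the largest TP intercept, which is the point maximizing TP − m•FP. The sketch below assumes that selection rule; the hull coordinates are hypothetical and are not read off the slide's figure.

```python
def best_operating_point(hull, slope):
    """Return the hull point on the iso-performance line with the highest intercept."""
    return max(hull, key=lambda pt: pt[1] - slope * pt[0])


# Hypothetical convex hull as (fp_rate, tp_rate) points, including the
# trivial classifiers (0, 0) "never alarm" and (1, 1) "always alarm".
hull = [(0.0, 0.0), (0.02, 0.4), (0.3, 0.97), (1.0, 1.0)]

best_a = best_operating_point(hull, slope=10.0)  # Scenario A: m(iso_perf) = 10
best_b = best_operating_point(hull, slope=0.1)   # Scenario B: m(iso_perf) = 0.1

print(best_a)
print(best_b)
```

A steep slope favors points near the lower left (few false positives), while a shallow slope favors points near the upper right (few false negatives).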
Adding New Classifiers
• Adding new classifiers may or may not extend the existing hull.
• E may be optimal under some circumstances since it extends the hull
• F and G cannot be optimal
What if distributions & costs are unknown?
• With no information, the ROC convex hull identifies all classifiers that may be optimal under any conditions.
• With complete information, the method identifies the optimal classifiers.
• What about the cases in between?
Sensitivity Analysis
• Imprecise distribution and cost information defines a range of slopes for iso-performance lines.
• p(n):p(p) = 10:1
• Scenario C:
  • $5 < c(Y,n) < $10
  • $500 < c(N,p) < $1000
  • 0.05 < m(iso_perf) < 0.2
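The slope range in Scenario C follows from taking the extremes of the cost intervals, assuming the slope is p(n)•c(Y,n) / (p(p)•c(N,p)). The sketch below also filters a hull down to the vertices that could be optimal for some slope in the range; the function names and hull coordinates are hypothetical.

```python
import math


def slope_range(neg_to_pos_ratio, cost_fp_range, cost_fn_range):
    """Smallest and largest iso-performance slope for the given cost intervals."""
    low = neg_to_pos_ratio * cost_fp_range[0] / cost_fn_range[1]
    high = neg_to_pos_ratio * cost_fp_range[1] / cost_fn_range[0]
    return low, high


def _edge_slope(a, b):
    """Slope of the hull edge from a to b; vertical edges count as infinite."""
    dx = b[0] - a[0]
    return math.inf if dx == 0 else (b[1] - a[1]) / dx


def potentially_optimal(hull, m_low, m_high):
    """Hull vertices that are optimal for at least one slope in [m_low, m_high].

    A vertex is optimal for slopes between the slope of the edge leaving it
    (to its right) and the slope of the edge entering it (from its left).
    """
    keep = []
    for i, pt in enumerate(hull):
        left = _edge_slope(hull[i - 1], pt) if i > 0 else math.inf
        right = _edge_slope(pt, hull[i + 1]) if i < len(hull) - 1 else 0.0
        if right <= m_high and left >= m_low:
            keep.append(pt)
    return keep


# Scenario C: p(n):p(p) = 10:1, $5 < c(Y,n) < $10, $500 < c(N,p) < $1000.
m_low, m_high = slope_range(10, (5, 10), (500, 1000))

# Hypothetical hull, ordered by increasing FP rate.
hull = [(0.0, 0.0), (0.02, 0.4), (0.3, 0.97), (1.0, 1.0)]
candidates = potentially_optimal(hull, m_low, m_high)

print(m_low, m_high)
print(candidates)
```

For this made-up hull, a single vertex covers the whole slope range, so the remaining imprecision in the costs would not change the choice of classifier.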
Sensitivity Analysis - 2
• Imprecise distribution and cost information defines a range of slopes for iso-performance lines.
• p(n):p(p) = 10:1
• Scenario D:
  • 0.2 < m(iso_perf) < 2
Sensitivity Analysis - 3
• Can the “do nothing” strategy be better than any of the available classifiers?
Conclusion
• Accuracy alone is an inadequate performance metric, because class distributions are skewed and misclassification costs are not uniform
• ROC plots give more complete information about the performance of classifiers
• The ROC convex hull method:
  • Is an efficient solution to the problem of comparing multiple classifiers in imprecise environments
  • Allows us to incorporate new classifiers easily
  • Allows us to select the classifiers that are potentially optimal