A Bootstrap Interval Estimator for Bayes’ Classification Error
Chad M. Hawes a,b, Carey E. Priebe a
a The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
b The Johns Hopkins University Applied Physics Laboratory

Abstract
• Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes’-optimal classification error L* by:
  • Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
  • Constructing bootstrap-sampled training sets of varying size
  • Evaluating the kNN classifier on the bootstrap training sets to estimate the unconditional error rate
  • Fitting the resulting kNN error rate decay, as a function of training set size, to the assumed power-law form
• The standard kNN rule provides an upper bound on L*; Hellman’s (k,k’) nearest neighbor rule with reject option provides a lower bound on L*
• The result is an asymptotic interval estimate of L* computed from a finite sample
• We apply this L* interval estimator to two classification datasets

Motivation
• Knowledge of the Bayes’-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
  • The difference between your classifier’s error rate Ln and L* indicates how much improvement is possible by changing your classifier, for a fixed feature set
  • If L* is small and |Ln − L*| is large, it is worth spending time & money to improve your classifier
• Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
  • If L* is large and |Ln − L*| is small, it is better to spend time & money finding better features (changing FXY) than improving your classifier
• An estimate of Bayes’ error L* is therefore useful for guiding where to invest time & money: classifier improvement versus feature development

Theory
• No approach to estimating Bayes’ error can work for all joint distributions FXY:
  • Devroye 1982: for any (fixed) integer n, any e > 0, and any classification rule gn, there exists a distribution FXY with Bayes’ error L* = 0 for which the expected error rate of gn exceeds 1/2 − e
  • There do, however, exist conditions on FXY under which our technique applies
• Asymptotic kNN-rule error rates form an interval bound on L*:
  • Devijver 1979: for fixed k, the asymptotic error rate of the kNN rule is an upper bound on L*, and the asymptotic error rate of the kNN rule with reject option (Hellman 1970) is a lower bound
  • If we can estimate these asymptotic rates from a finite sample, we obtain an interval estimate of L*
• The kNN rule’s unconditional error follows a known form for a class of distributions FXY:
  • Snapp & Venkatesh 1998: under regularity conditions on FXY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows a known asymptotic expansion
  • Hence a known parametric form exists for the decay of the kNN rule’s error rate

Approach: Part 1
1. Construct B bootstrap-sampled training datasets of size nj from Dn, sampling from the empirical distribution (which puts mass 1/n on each of the n training samples)
2. For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set Tm; then estimate the mean & variance of these estimates for training sample size nj:
  • The mean provides an estimate of the unconditional error rate at nj
  • The variance is used for weighted fitting of the error rate decay curve
3. Repeat steps 1 and 2 for the desired training sample sizes, yielding an unconditional error rate estimate at each size
4. Construct the estimated unconditional error rate decay curve versus training sample size n (a minimal sketch of this procedure follows below)
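The bootstrap resampling and error-rate estimation steps above can be illustrated with a short sketch. This is a minimal illustration under assumptions, not the authors’ code: the choice of k, the number of bootstrap replicates B, and the helper name bootstrap_error_estimates are hypothetical, and scikit-learn’s KNeighborsClassifier stands in for whatever kNN implementation was actually used.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bootstrap_error_estimates(X_train, y_train, X_test, y_test,
                              sample_sizes, k=5, B=50, seed=0):
    """For each training sample size n_j, draw B bootstrap training sets from
    the empirical distribution (mass 1/n on each training point), evaluate a
    kNN rule on the fixed test set, and return the mean and variance of the
    resulting conditional error rate estimates."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    means, variances = [], []
    for n_j in sample_sizes:
        errors = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n_j)             # bootstrap sample of size n_j
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[idx], y_train[idx])
            errors[b] = np.mean(clf.predict(X_test) != y_test)  # conditional error rate
        means.append(errors.mean())           # estimate of the unconditional error rate at n_j
        variances.append(errors.var(ddof=1))  # later used as weights in the decay-curve fit
    return np.array(means), np.array(variances)
```

Calling this once per candidate training size produces the decay curve that Part 2 fits; the inputs are assumed to be NumPy arrays.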
Approach: Part 2
• Assume the kNN-rule error rates decay according to a simple power-law form in the training sample size
• Perform a weighted nonlinear least-squares fit of this form to the constructed error rate decay curve:
  • Use the variance of the bootstrapped conditional error rate estimates as the weights
• The resulting fitted asymptotic error rate forms the upper bound estimate for L*
• The strong assumption on the form of the error rate decay is what enables estimating an asymptotic error rate from only a finite sample
• Repeat the entire procedure using Hellman’s (k,k’) nearest neighbor rule with reject option to form the lower bound estimate for L*
• Together these yield the interval estimate for Bayes’ classification error (a minimal sketch of the fitting step appears after the references)

Model & Notation
• Training data Dn: n labeled samples (feature vector, class label) drawn from the joint distribution FXY
• Testing data Tm: m labeled samples held out for error rate estimation
• Feature vector X takes values in R^d; class label Y takes one of two values
• From Dn we build the k-nearest neighbor (kNN) classification rule, denoted gn
• Conditional probability of error for the kNN rule: the finite-sample error rate Ln, conditioned on the training set Dn, and its asymptotic limit as n grows
• Unconditional probability of error for the kNN rule: the expectation of Ln over training sets, and its asymptotic limit as n grows
• The empirical distribution puts mass 1/n on the n training samples and is used for bootstrap resampling

PMH Distribution
• The Priebe, Marchette, Healy (PMH) distribution has known L* = 0.0653, d = 6
• Training size n = 2000, test set size m = 2000
• [Figure: bootstrap estimates of the unconditional error rate versus training sample size (symbols), with fitted decay curves and the resulting interval estimate of L*]

Pima Indians
• The UCI Pima Indian Diabetes distribution has unknown L*, d = 8
• Training size n = 500, test set size m = 268
• [Figure: bootstrap estimates of the unconditional error rate versus training sample size (symbols), with fitted decay curves and the resulting interval estimate of L*]

References
[1] Devijver, P. “New error bounds with the nearest neighbor rule,” IEEE Trans. on Information Theory, 25, 1979.
[2] Devroye, L. “Any discrimination rule can have an arbitrarily bad probability of error for finite sample size,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. “The nearest neighbor classification rule with a reject option,” IEEE Trans. on Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. “Integrated sensing and processing decision trees,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. “Asymptotic expansions of the k nearest neighbor risk,” Annals of Statistics, 26, 1998.
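As a supplement to Approach: Part 2, below is a minimal sketch of the weighted fit. The poster’s exact parametric form and fitting settings are not reproduced here, so the model error(n) ≈ L_inf + a·n^(−b), the helper names power_law and fit_asymptotic_error, and the starting values are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, a, b):
    # Assumed decay model for the unconditional error rate: error(n) ~ L_inf + a * n**(-b)
    return L_inf + a * np.power(n, -b)

def fit_asymptotic_error(sample_sizes, mean_errors, variances):
    """Weighted nonlinear least-squares fit of the error rate decay curve.
    The fitted intercept L_inf estimates the asymptotic error rate: an upper
    bound on L* for the standard kNN rule, and a lower bound when the
    procedure is repeated with the (k,k') reject-option rule."""
    sigma = np.sqrt(np.asarray(variances))    # bootstrap variances supply the weights
    p0 = (float(mean_errors[-1]), 1.0, 0.5)   # rough starting values (illustrative)
    params, _ = curve_fit(power_law,
                          np.asarray(sample_sizes, dtype=float),
                          np.asarray(mean_errors),
                          p0=p0, sigma=sigma,
                          bounds=([0.0, 0.0, 0.0], [1.0, np.inf, np.inf]))
    return params[0]                          # estimated asymptotic error rate
```

Running this on the means and variances produced by the Part 1 sketch, once for the standard kNN rule and once for the reject-option rule, would give the two endpoints of the interval estimate of L*.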