A Bootstrap Interval Estimator for Bayes’ Classification Error
Chad M. Hawes a,b, Carey E. Priebe a
a The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
b The Johns Hopkins University Applied Physics Laboratory

Abstract
• Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes’-optimal classification error L* by:
  • Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
  • Constructing bootstrap-sampled training sets of varying size
  • Evaluating the kNN classifier on the bootstrap training sets to estimate the unconditional error rate
  • Fitting the resulting kNN error rate decay, as a function of training set size, to the assumed power-law form
• The standard kNN rule provides an upper bound on L*; Hellman’s (k,k’) nearest neighbor rule with reject option provides a lower bound on L*
• The result is an asymptotic interval estimate of L* computed from a finite sample
• We apply this L* interval estimator to two classification datasets

Motivation
• Knowledge of the Bayes’-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
  • The difference between your classifier’s error rate Ln and L* indicates how much improvement is possible by changing your classifier, for a fixed feature set
  • If L* is small and |Ln − L*| is large, it is worth spending time & money to improve your classifier
• Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
  • If L* is large and |Ln − L*| is small, it is better to spend time & money finding better features (changing FXY) than improving your classifier
• An estimate of Bayes’ error L* is therefore useful for guiding where to invest time & money: classifier improvement versus feature development

Theory
• No approach to estimating Bayes’ error can work for all joint distributions FXY:
  • Devroye 1982: for any (fixed) integer n, any e > 0, and any classification rule gn, there exists a distribution FXY with Bayes’ error L* = 0 for which the expected error rate of gn exceeds 1/2 − e
  • There do, however, exist conditions on FXY under which our technique applies
• Asymptotic kNN-rule error rates form an interval bound on L*:
  • Devijver 1979: for fixed k, the asymptotic error rate of the kNN rule is an upper bound on L*, and the asymptotic error rate of the kNN rule with reject option (Hellman 1970) is a lower bound
  • If we can estimate these asymptotic rates from a finite sample, we obtain an interval estimate of L*
• The kNN rule’s unconditional error follows a known form for a class of distributions FXY:
  • Snapp & Venkatesh 1998: under regularity conditions on FXY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows a known asymptotic expansion
  • Hence a known parametric form exists for the decay of the kNN rule’s error rate

Approach: Part 1
1. Construct B bootstrap-sampled training datasets of size nj from Dn, sampling from the empirical distribution (which puts mass 1/n on each of the n training samples)
2. For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set Tm; then estimate the mean & variance of these estimates for training sample size nj:
  • The mean provides an estimate of the unconditional error rate at nj
  • The variance is used for weighted fitting of the error rate decay curve
3. Repeat steps 1 and 2 for the desired training sample sizes, yielding an unconditional error rate estimate at each size
4. Construct the estimated unconditional error rate decay curve versus training sample size n (a minimal sketch of this procedure follows below)
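The bootstrap resampling and error-rate estimation steps above can be illustrated with a short sketch. This is a minimal illustration under assumptions, not the authors’ code: the choice of k, the number of bootstrap replicates B, and the helper name bootstrap_error_estimates are hypothetical, and scikit-learn’s KNeighborsClassifier stands in for whatever kNN implementation was actually used.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bootstrap_error_estimates(X_train, y_train, X_test, y_test,
                              sample_sizes, k=5, B=50, seed=0):
    """For each training sample size n_j, draw B bootstrap training sets from
    the empirical distribution (mass 1/n on each training point), evaluate a
    kNN rule on the fixed test set, and return the mean and variance of the
    resulting conditional error rate estimates."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    means, variances = [], []
    for n_j in sample_sizes:
        errors = np.empty(B)
        for b in range(B):
            idx = rng.integers(0, n, size=n_j)             # bootstrap sample of size n_j
            clf = KNeighborsClassifier(n_neighbors=k)
            clf.fit(X_train[idx], y_train[idx])
            errors[b] = np.mean(clf.predict(X_test) != y_test)  # conditional error rate
        means.append(errors.mean())           # estimate of the unconditional error rate at n_j
        variances.append(errors.var(ddof=1))  # later used as weights in the decay-curve fit
    return np.array(means), np.array(variances)
```

Calling this once per candidate training size produces the decay curve that Part 2 fits; the inputs are assumed to be NumPy arrays.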
Approach: Part 2
• Assume the kNN-rule error rates decay according to a simple power-law form in the training sample size
• Perform a weighted nonlinear least-squares fit of this form to the constructed error rate decay curve:
  • Use the variance of the bootstrapped conditional error rate estimates as the weights
• The resulting fitted asymptotic error rate forms the upper bound estimate for L*
• The strong assumption on the form of the error rate decay is what enables estimating an asymptotic error rate from only a finite sample
• Repeat the entire procedure using Hellman’s (k,k’) nearest neighbor rule with reject option to form the lower bound estimate for L*
• Together these yield the interval estimate for Bayes’ classification error (a minimal sketch of the fitting step appears after the references)

Model & Notation
• Training data Dn: n labeled samples (feature vector, class label) drawn from the joint distribution FXY
• Testing data Tm: m labeled samples held out for error rate estimation
• Feature vector X takes values in R^d; class label Y takes one of two values
• From Dn we build the k-nearest neighbor (kNN) classification rule, denoted gn
• Conditional probability of error for the kNN rule: the finite-sample error rate Ln, conditioned on the training set Dn, and its asymptotic limit as n grows
• Unconditional probability of error for the kNN rule: the expectation of Ln over training sets, and its asymptotic limit as n grows
• The empirical distribution puts mass 1/n on the n training samples and is used for bootstrap resampling

PMH Distribution
• The Priebe, Marchette, Healy (PMH) distribution has known L* = 0.0653, d = 6
• Training size n = 2000, test set size m = 2000
• [Figure: bootstrap estimates of the unconditional error rate versus training sample size (symbols), with fitted decay curves and the resulting interval estimate of L*]

Pima Indians
• The UCI Pima Indian Diabetes distribution has unknown L*, d = 8
• Training size n = 500, test set size m = 268
• [Figure: bootstrap estimates of the unconditional error rate versus training sample size (symbols), with fitted decay curves and the resulting interval estimate of L*]

References
[1] Devijver, P. “New error bounds with the nearest neighbor rule,” IEEE Trans. on Information Theory, 25, 1979.
[2] Devroye, L. “Any discrimination rule can have an arbitrarily bad probability of error for finite sample size,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. “The nearest neighbor classification rule with a reject option,” IEEE Trans. on Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. “Integrated sensing and processing decision trees,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. “Asymptotic expansions of the k nearest neighbor risk,” Annals of Statistics, 26, 1998.
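As a supplement to Approach: Part 2, below is a minimal sketch of the weighted fit. The poster’s exact parametric form and fitting settings are not reproduced here, so the model error(n) ≈ L_inf + a·n^(−b), the helper names power_law and fit_asymptotic_error, and the starting values are illustrative assumptions only.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, L_inf, a, b):
    # Assumed decay model for the unconditional error rate: error(n) ~ L_inf + a * n**(-b)
    return L_inf + a * np.power(n, -b)

def fit_asymptotic_error(sample_sizes, mean_errors, variances):
    """Weighted nonlinear least-squares fit of the error rate decay curve.
    The fitted intercept L_inf estimates the asymptotic error rate: an upper
    bound on L* for the standard kNN rule, and a lower bound when the
    procedure is repeated with the (k,k') reject-option rule."""
    sigma = np.sqrt(np.asarray(variances))    # bootstrap variances supply the weights
    p0 = (float(mean_errors[-1]), 1.0, 0.5)   # rough starting values (illustrative)
    params, _ = curve_fit(power_law,
                          np.asarray(sample_sizes, dtype=float),
                          np.asarray(mean_errors),
                          p0=p0, sigma=sigma,
                          bounds=([0.0, 0.0, 0.0], [1.0, np.inf, np.inf]))
    return params[0]                          # estimated asymptotic error rate
```

Running this on the means and variances produced by the Part 1 sketch, once for the standard kNN rule and once for the reject-option rule, would give the two endpoints of the interval estimate of L*.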