Machine Learning Performance Evaluation • Performance Evaluation: Estimation of Recognition Rates • J.-S. Roger Jang (張智星) • CSIE Dept., National Taiwan Univ. • http://mirlab.org/jang • jang@mirlab.org
Outline • Performance index of a given model/classifier • Accuracy (recognition rate) • Computation load • Methods to estimate the recognition rate • Inside test • One-sided holdout test • Two-sided holdout test • M-fold cross validation • Leave-one-out test
Performance Index • Performance indices of a model • Recognition rate • Requires an objective procedure to derive it • Computation load • Design-time computation • Run-time computation • Our focus • Recognition rate and the procedures to derive it
Methods for Deriving Recognition Rates • Methods to derive the recognition rate • Inside test (resubstitution recognition rate) • One-sided holdout test • Two-sided holdout test • M-fold cross validation • Leave-one-out test (jackknife procedure) • Data partitioning options • Training set only • Training and test sets • Training, validation, and test sets
Inside Test • Dataset partitioning • Use the whole dataset for training & evaluation • Recognition rate • Inside-test recognition rate • Resubstitution accuracy
Inside Test (2) • Characteristics • Too optimistic, since the RR tends to be higher than the true one • For instance, 1-NNC always has an inside-test RR of 100% (see the sketch below)! • Can serve as an upper bound on the true RR • Potential reasons for a low inside-test RR: • Bad features in the dataset • Bad method for model construction, such as • Bad results from neural network training • Bad results from k-means clustering
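To make the inside test concrete, here is a minimal sketch in Python; the iris dataset and the 1-NNC from scikit-learn are convenient stand-ins, not part of the original slides:

```python
# Inside (resubstitution) test: train and evaluate on the SAME data.
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # any labeled dataset would do

model = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print("Inside-test RR:", model.score(X, y))  # 1.0: 1-NNC memorizes the data
```

Each training point is its own nearest neighbor, which is exactly why the resubstitution RR of 1-NNC is always 100%.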
One-sided Holdout Test • Dataset partitioning • Training set for model construction • Test set for performance evaluation • Recognition rate • Inside-test RR (on the training set) • Outside-test RR (on the test set)
One-sided Holdout Test (2) • Characteristics • Highly affected by how the data is partitioned (see the sketch below) • Usually adopted when the design-time computation load is high
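A minimal sketch of the one-sided holdout test, reusing the same iris/1-NNC stand-ins; `test_size` and `random_state` are arbitrary choices:

```python
# One-sided holdout test: a single train/test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("Inside-test RR: ", model.score(X_tr, y_tr))  # on the training set
print("Outside-test RR:", model.score(X_te, y_te))  # on the held-out test set
# Re-running with a different random_state changes both rates, which is
# the "highly affected by data partitioning" caveat noted above.
```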
Two-sided Holdout Test • Dataset partitioning • Training set for model construction • Test set for performance evaluation • Role reversal: the two sets then swap roles, and the results are averaged
Two-sided Holdout Test (2) • Two-sided holdout test (used in GMDH) • [Figure: data set A → model A, evaluated on data set B → RR_A; data set B → model B, evaluated on data set A → RR_B] • Outside-test RR = (RR_A + RR_B)/2
Two-sided Holdout Test (3) • Characteristics • Better use of the dataset than the one-sided test • Still highly affected by the partitioning • Suitable for models/classifiers with a high design-time computation load (see the sketch below)
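A minimal sketch of the two-sided holdout test under the same iris/1-NNC assumptions: split the data in half, train on each half, evaluate on the other, and average:

```python
# Two-sided holdout test: each half serves once as training set
# and once as test set; the two outside-test rates are averaged.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, random_state=0)

rr_a = KNeighborsClassifier(n_neighbors=1).fit(X_a, y_a).score(X_b, y_b)
rr_b = KNeighborsClassifier(n_neighbors=1).fit(X_b, y_b).score(X_a, y_a)
print("Two-sided outside-test RR:", (rr_a + rr_b) / 2)
```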
M-fold Cross Validation • Data partitioning • Partition the dataset into m disjoint folds • Use one fold for test and the other m−1 folds for training • Repeat m times, once per fold
M-fold Cross Validation (2) • [Figure: the dataset is split into m disjoint sets; for each k, model k is constructed from the other folds and evaluated on fold k to give RR_k] • CV RR = (Σ_k RR_k)/m
M-fold Cross Validation (3) • Characteristics • When m = 2, it reduces to the two-sided holdout test • When m = n (the dataset size), it becomes the leave-one-out test • The value of m depends on the computation load imposed by the selected model/classifier (see the sketch below)
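A minimal sketch of m-fold cross validation with the same stand-ins; m = 5 is an arbitrary choice:

```python
# M-fold cross validation: m models, each evaluated on its held-out fold.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

rr = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=5)
print("Per-fold RR:", rr)
print("CV RR:", rr.mean())  # (sum of RR_k over k) / m
```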
Leave-one-out Cross Validation • Data partitioning • The special case m = n, where each fold S_i is a single input/output pair (x_i, y_i)
Leave-one-out Cross Validation (2) • Leave-one-out CV (a.k.a. jackknife procedure) • [Figure: n input/output pairs; for each k, model k is constructed from the other n−1 pairs and evaluated on the held-out pair]
Leave-one-out Cross Validation (3) • General method for LOOCV • Perform model construction (as a blackbox) n times → slow! (see the sketch below) • To speed up LOOCV computation • Precompute a common part that can be reused across all n constructions, such as • The global mean and covariance for a quadratic classifier (QC) • More info on cross-validation is available on Wikipedia
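A minimal sketch of the blackbox route for LOOCV under the same assumptions; scikit-learn's LeaveOneOut splitter performs the n constructions:

```python
# Leave-one-out CV: n model constructions, each tested on one held-out pair.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Exact but slow for expensive models: the model is rebuilt n times.
rr = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y,
                     cv=LeaveOneOut())
print("LOOCV RR:", rr.mean())  # mean of n 0/1 outcomes
```

The speed-up mentioned above replaces this blackbox loop with model-specific shortcuts, e.g. reusing a precomputed global mean and covariance rather than refitting a quadratic classifier from scratch each time.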
Applications and Misuse of CV • Applications of CV • Input (feature) selection • Model complexity determination • Performance comparison among different models (see the sketch below) • Misuse of CV • Do not try to boost the validation RR too much, or you run the risk of indirectly training on the left-out data!
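A minimal sketch of one listed application, comparing models by their CV RR; the candidate k values for k-NN are arbitrary choices:

```python
# Model comparison via CV: pick among candidate models by CV RR.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5):
    rr = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"k={k}: CV RR = {rr.mean():.3f}")
# Misuse caveat from above: searching many candidates to maximize this
# score amounts to indirectly training on the left-out folds.
```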