A Posteriori Corrections to Classification Methods Włodzisław Duch & Łukasz Itert Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.phys.uni.torun.pl/kmk
Motivation
So you’ve got your model… but that’s not the end. Try to derive as much information from it as possible.
• In pathological cases NN, FS, RS and other systems give results below the base rates. How can this be avoided?
• In a controlled experiment the class split was 50%-50%; in real life it is 5%-95%. How should this be handled?
• So your model is accurate; that doesn’t impress me much. What about the costs? Confidence in the results? Sensitivity? Specificity? Can it be improved quickly?
A posteriori corrections may help, and they are (almost) free.
Corrections increasing accuracy
NN, kNN, FIS, RS and other systems do not estimate probabilities rigorously, but some estimates of p(Ci|X) are obtained. Many systems do not optimize an error function.
Idea: linear scaling of probabilities. With K classes and CK the majority class, rescale p'(Ci|X) = ki p(Ci|X) for i < K, with p'(CK|X) fixed by normalization. Setting all ki = 0 gives the majority classifier, ki = 1 gives the original one. Optimize the coefficients ki.
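A minimal Python sketch of this linear rescaling, assuming the convention that the last column holds the majority class CK and that its probability is set by normalization (the function and variable names are illustrative only):

```python
import numpy as np

def linear_rescale(probs, k):
    """Linear scaling of class probabilities; the last column is the majority class CK.

    k holds the K-1 coefficients for the non-majority classes:
    k = 0 recovers the majority classifier, k = 1 the original model.
    """
    probs = np.asarray(probs, dtype=float)
    scaled = probs.copy()
    scaled[:, :-1] = probs[:, :-1] * k                 # k_i * p(C_i|X) for i < K
    scaled[:, -1] = 1.0 - scaled[:, :-1].sum(axis=1)   # majority class takes the rest
    return scaled

# Hypothetical two-class example; the second class is the majority class.
p = np.array([[0.6, 0.4], [0.3, 0.7]])
print(linear_rescale(p, np.array([0.0])))   # k=0 -> always the majority class
print(linear_rescale(p, np.array([1.0])))   # k=1 -> unchanged
```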
Softmax
If ki ∈ [0,1] then the probability of the majority class may only grow. Solution: assume ki ∈ [0,∞), fix kK = 1, and use a softmax rescaling. This flattens the probabilities; for 2 classes P(C|X) ∈ [1/(1+e), 1/(1+e⁻¹)] ≈ [0.27, 0.73].
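A sketch of this softmax correction, assuming the rescaled probabilities take the form p'(Ci|X) = exp(ki p(Ci|X)) / Σj exp(kj p(Cj|X)), which reproduces the two-class bound quoted above when all ki = 1:

```python
import numpy as np

def softmax_rescale(probs, k):
    """Softmax rescaling of class probabilities (k_K is kept fixed at 1)."""
    z = np.asarray(probs, dtype=float) * k          # k_i * p(C_i|X)
    z = np.exp(z - z.max(axis=1, keepdims=True))    # numerically stable softmax
    return z / z.sum(axis=1, keepdims=True)

# With all k_i = 1 a two-class output is squashed into roughly [0.27, 0.73]:
p = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(softmax_rescale(p, np.array([1.0, 1.0])))
```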
Cost function
Pi(X) are the “true” probabilities, if given; otherwise Pi(X) = 1 if the label of the training vector X is Ci, and Pi(X) = 0 otherwise. The cost is the mean-square error E = ΣX Σi (p(Ci|X) − Pi(X))². kNN, Kohonen networks, decision trees, and many fuzzy and rough systems do not minimize such a cost function.
Alternative: stacking with a linear perceptron.
Cost function with linear rescaling
Inserting the rescaled probabilities ki p(Ci|X) into E gives a cost function E(k); due to normalization, the majority-class term becomes 1 − Σi<K ki p(Ci|X), so E(k) is quadratic in the coefficients ki.
Minimum of E(k): solution
Since E(k) is quadratic in k, an elegant closed-form solution is found in the LMS (least-mean-square) sense.
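Because the rescaled probabilities are linear in the coefficients ki, the minimum of E(k) can be found by ordinary least squares. A sketch under the assumptions used above (last column = majority class, one-hot targets Pi(X)):

```python
import numpy as np

def fit_scaling_coefficients(probs, labels):
    """Least-squares estimate of k_1..k_{K-1} minimizing sum_X sum_i (p'_i(X) - P_i(X))^2.

    probs  : (n, K) model outputs p(C_i|X); the last column is the majority class C_K.
    labels : (n,) integer class labels, turned into 0/1 targets P_i(X).
    p'_i = k_i p_i for i < K, and p'_K = 1 - sum_{j<K} k_j p_j by normalization.
    """
    probs = np.asarray(probs, dtype=float)
    n, K = probs.shape
    targets = np.eye(K)[labels]                       # one-hot P_i(X)

    rows, rhs = [], []
    for x in range(n):
        for i in range(K - 1):                        # equation k_i p_i(X) ~ P_i(X)
            row = np.zeros(K - 1)
            row[i] = probs[x, i]
            rows.append(row)
            rhs.append(targets[x, i])
        rows.append(probs[x, :K - 1])                 # normalization equation for C_K
        rhs.append(1.0 - targets[x, K - 1])
    k, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return k
```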
Numerical example
The primate splice-junction DNA gene sequences: 60 nucleotides; distinguish whether there is an intron => exon boundary, an exon => intron boundary, or neither. 3190 vectors (2000 training + 1190 test). kNN (k = 11, Manhattan distance) gave the initial probabilities.
Before correction: 85.8% (train), 85.7% (test)
After correction: 86.4% (train), 86.9% (test)
k1 = 1.0282, k2 = 0.8785
MSE improvement: better probabilities, even if not always correct answers.
Changes in the a priori class distribution
The a priori class distribution may differ between training and test data. If the data come from the same process the class-conditional densities p(X|Ci) stay the same, but the posteriors p(Ci|X) change. Bayes’ theorem relates the training posteriors pt(Ci|X) to the test posteriors p(Ci|X):
p(Ci|X) ∝ pt(Ci|X) p(Ci)/pt(Ci), renormalized over the classes.
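A sketch of this correction; the weights p(Ci)/pt(Ci) followed by renormalization are the standard form of the result, and the numbers below are hypothetical:

```python
import numpy as np

def adjust_for_new_priors(train_posteriors, train_priors, new_priors):
    """Re-weight posteriors when class priors change but p(X|C_i) stay the same:
    p(C_i|X) proportional to p_t(C_i|X) * p(C_i) / p_t(C_i), renormalized."""
    w = np.asarray(new_priors, dtype=float) / np.asarray(train_priors, dtype=float)
    adjusted = np.asarray(train_posteriors, dtype=float) * w
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Trained on a 50/50 split, deployed where the real priors are 5%/95%:
post = np.array([[0.6, 0.4]])
print(adjust_for_new_priors(post, [0.5, 0.5], [0.05, 0.95]))   # class 2 now dominates
```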
Estimation of a priori probabilities
How do we estimate the new p(Ci)? Estimate the confusion matrix pt(Ci|Cj) on the training set (McLachlan and Basford 1988), estimate the distribution of predicted classes by applying the classifier to the test data, and solve the linear equations ptest(Ci) = Σj pt(Ci|Cj) p(Cj) for the unknown test priors p(Cj).
Experiment: use an MLP trained on a small 50-50 training sample.
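A sketch of solving these equations with hypothetical numbers, where conf[i, j] = pt(Ci|Cj) is estimated on the training set and pred_dist is the distribution of predicted labels observed on the test data:

```python
import numpy as np

# pt(C_i|C_j): probability that the classifier predicts C_i when the true class is C_j,
# estimated from the training confusion matrix (columns = true class).
conf = np.array([[0.9, 0.2],
                 [0.1, 0.8]])

# Distribution of predicted labels observed on the test data (hypothetical).
pred_dist = np.array([0.25, 0.75])

# Solve pred_dist = conf @ p_new for the unknown test priors p_new.
p_new = np.linalg.solve(conf, pred_dist)
p_new = np.clip(p_new, 0, None)
p_new /= p_new.sum()            # keep it a valid probability distribution
print(p_new)                    # -> roughly [0.07, 0.93]
```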
What to optimize?
Overall accuracy is not always the most important thing to optimize. Given a model M, the confusion matrix for a class C+ versus all other classes (rows = true class, columns = predicted by M) contains the counts of true positives, false negatives, false positives and true negatives.
Quantities derived from p(Ci|Cj)
Several quantities are used to evaluate classification models M created to distinguish the C+ class: accuracy, sensitivity S+ (the fraction of true C+ cases recognized), specificity S− (the fraction of true C− cases recognized), precision and recall.
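A small sketch of these quantities computed from a 2x2 confusion matrix (the counts are made up for illustration):

```python
def evaluate_binary(tp, fn, fp, tn):
    """Quantities derived from the confusion matrix for the C+ class."""
    return {
        "accuracy":        (tp + tn) / (tp + fn + fp + tn),
        "sensitivity S+":  tp / (tp + fn),     # recall for C+
        "specificity S-":  tn / (tn + fp),
        "precision":       tp / (tp + fp),
    }

print(evaluate_binary(tp=40, fn=10, fp=5, tn=45))
```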
Error functions
The best classifier may be selected using Recall vs. Precision curves or ROC curves, which plot Sensitivity against (1 − Specificity), i.e. S+ against (1 − S−).
Confidence in M may be increased by rejecting some cases.
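A sketch of the rejection idea: predict only when the model is confident enough, otherwise reject the case (the threshold value and the -1 marker are arbitrary choices for illustration):

```python
import numpy as np

def classify_with_rejection(probs, threshold=0.7):
    """Return the predicted class when max p(C_i|X) >= threshold, else -1 (rejected).

    Rejecting uncertain cases raises the confidence (accuracy on the accepted cases)
    at the price of covering fewer vectors.
    """
    probs = np.asarray(probs, dtype=float)
    preds = probs.argmax(axis=1)
    return np.where(probs.max(axis=1) >= threshold, preds, -1)

p = np.array([[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]])
print(classify_with_rejection(p))    # -> [ 0 -1  1]
```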
Errors and costs
Optimization with explicit costs introduces a parameter a that weights the different error terms: for a = 0 the criterion reduces to maximization of accuracy, while for large a the cost-weighted term dominates the optimization.
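The exact cost function from the slide is not reproduced here; the sketch below only illustrates the generic idea of a cost parameter a trading one type of error against another by scanning a decision threshold (the scores, labels and the form FN + a·FP are all assumptions for illustration):

```python
import numpy as np

def best_threshold(p_plus, y_true, a=1.0, grid=np.linspace(0.05, 0.95, 19)):
    """Pick the decision threshold minimizing E(t; a) = FN(t) + a * FP(t)."""
    costs = []
    for t in grid:
        pred = (p_plus >= t).astype(int)
        fn = np.sum((pred == 0) & (y_true == 1))    # missed C+ cases
        fp = np.sum((pred == 1) & (y_true == 0))    # false alarms, weighted by a
        costs.append(fn + a * fp)
    return grid[int(np.argmin(costs))]

# Hypothetical scores for C+ and true labels; a large a pushes the threshold up.
scores = np.array([0.92, 0.81, 0.62, 0.43, 0.33, 0.18])
labels = np.array([1,    1,    0,    1,    0,    0])
print(best_threshold(scores, labels, a=0.5), best_threshold(scores, labels, a=5.0))
```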
Conclusions
Applying a trained model in a real-world application does not end with classification; it may be only the beginning. Three types of corrections to optimize the final model have been considered:
• a posteriori corrections improving accuracy by scaling probabilities
• restoring the balance between the training and test class distributions
• improving the confidence, sensitivity or specificity of the results.
They are especially useful for optimization of logical rules. They may be combined; for example, a posteriori corrections may be applied to the accuracy for a chosen class (sensitivity), confidence, cost optimization, etc.