
A Posteriori Corrections to Classification Methods

Włodzisław Duch & Łukasz Itert, Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.phys.uni.torun.pl/kmk




  1. A Posteriori Corrections to Classification Methods Włodzisław Duch & Łukasz Itert Department of Informatics, Nicholas Copernicus University, Torun, Poland. http://www.phys.uni.torun.pl/kmk

  2. Motivation So you’ve got your model … but that’s not the end. Try to derive as much information from it as possible.
     • In pathological cases neural networks (NN), fuzzy systems (FS), rough set systems (RS) and other systems lead to results below the base rates. How to avoid it?
     • In a controlled experiment the class split was 50%-50%; in real life it is 5%-95%. How to deal with it?
     • So your model is accurate; that doesn’t impress me much. How about the costs? Confidence in results? Sensitivity? Specificity? Can you improve it quickly?
     A posteriori corrections may help and are (almost) for free.

  3. Corrections increasing accuracy NN, kNN, FIS, RS and other systems do not estimate probabilities rigorously, but they do yield some estimates of p(Ci|X). Many systems do not optimize error functions. Idea: linear scaling of probabilities for K classes, where CK is the majority class. ki = 0 gives the majority classifier, ki = 1 gives the original one. Optimize the ki.
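
A minimal sketch of the linear scaling idea, assuming the form that matches the two limits named on the slide: each non-majority probability is multiplied by its ki and the majority class CK receives the remainder. The function name `rescale_linear` and the convention that the last column holds the majority class are illustrative assumptions.

```python
import numpy as np

def rescale_linear(p, k):
    """Linearly rescale class probabilities (assumed form, see lead-in).

    p: array (n_samples, K) of classifier estimates p(C_i|X); the last
       column is assumed to hold the majority class C_K.
    k: array (K-1,) of scaling factors for the non-majority classes.
    """
    p = np.asarray(p, dtype=float)
    k = np.asarray(k, dtype=float)
    scaled = p[:, :-1] * k                 # P(C_i|X) = k_i * p(C_i|X), i < K
    majority = 1.0 - scaled.sum(axis=1)    # the remainder goes to C_K
    return np.column_stack([scaled, majority])

p = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.3, 0.5]])
same = rescale_linear(p, [1.0, 1.0])    # k_i = 1: original estimates
major = rescale_linear(p, [0.0, 0.0])   # k_i = 0: always the majority class
```

With every ki between 0 and 1 the remainder assigned to CK can only increase, which is the limitation the softmax variant addresses.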

  4. Softmax If ki ∈ [0,1] then the probability of the majority class may only grow. Solution: assume ki ∈ [0,∞) and kK = 1, and use softmax. This will flatten the probabilities; for 2 classes: P(C|X) ∈ [(1+e^{+1})^{-1}, (1+e^{-1})^{-1}] ≈ [0.27, 0.73].
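
The flattening effect can be checked numerically. The sketch below assumes the softmax is applied to the scaled probabilities, P(Ci|X) = exp(ki p(Ci|X)) / Σj exp(kj p(Cj|X)); this exact form and the name `rescale_softmax` are assumptions, not taken from the slides.

```python
import numpy as np

def rescale_softmax(p, k):
    """Softmax over scaled probabilities k_i * p(C_i|X) (assumed form).

    The caller is expected to pass k_K = 1 for the majority class.
    """
    z = np.asarray(p, dtype=float) * np.asarray(k, dtype=float)
    z = z - z.max(axis=1, keepdims=True)   # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Two classes, all k_i = 1, fully confident inputs.
extreme = rescale_softmax([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0])
# Even these inputs are flattened to roughly 0.73 versus 0.27.
```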

  5. Cost function Pi(X) are the “true” probabilities, if given; otherwise Pi(X) = 1 if the label of the training vector X is Ci and Pi(X) = 0 otherwise. kNNs, Kohonen nets, decision trees, and many fuzzy and rough systems do not minimize such a cost function. Alternative: stacking with a linear perceptron.
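
The cost function itself is not legible in this transcript. A quadratic error, which would be consistent with the later mentions of an LMS solution and an MSE improvement, is assumed here purely for illustration; P(Ci|X; k) denotes the rescaled probability and Pi(X) the target defined on this slide.

```latex
E(k) = \sum_{X} \sum_{i=1}^{K} \bigl( P(C_i|X;k) - P_i(X) \bigr)^2
```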

  6. Cost function with linear rescaling Due to normalization:
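
A sketch of how normalization simplifies the error, assuming a quadratic cost and a linear rescaling in which the majority class CK receives the remaining probability mass (both are assumptions; the slide's own equation is not preserved):

```latex
P(C_i|X) = k_i\, p(C_i|X) \quad (i < K), \qquad
P(C_K|X) = 1 - \sum_{i<K} k_i\, p(C_i|X)

% Because \sum_{i=1}^{K} P_i(X) = 1, the residual of the majority class is
P(C_K|X) - P_K(X) = -\sum_{i<K} \bigl( k_i\, p(C_i|X) - P_i(X) \bigr)

% so the error depends only on the K-1 free coefficients:
E(k) = \sum_{X} \Bigl[ \sum_{i<K} \bigl( k_i\, p(C_i|X) - P_i(X) \bigr)^2
     + \Bigl( \sum_{i<K} \bigl( k_i\, p(C_i|X) - P_i(X) \bigr) \Bigr)^2 \Bigr]
```

Under these assumptions E(k) is quadratic in the ki, so its minimum follows from a set of linear equations.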

  7. Minimum of E() - solution Elegant solution is found in the LMS sense.

  8. Numerical example The primate splice-junction DNA gene sequences: 60 nucleotides; the task is to distinguish an intron => exon boundary, an exon => intron boundary, or neither. 3190 vectors (2000 training + 1190 test). kNN (k=11, Manhattan distance) gave the initial probabilities. Before correction: 85.8% (train), 85.7% (test). After correction: 86.4% (train), 86.9% (test), with k1 = 1.0282, k2 = 0.8785. MSE improvement: better probabilities, even if not always correct answers.

  9. Changes in the a priori class distribution The a priori class distribution may differ between training and test data. If the data come from the same process, the densities p(X|Ci) stay constant while the posteriors p(Ci|X) change. Bayes theorem for training pt(Ci|X) and test p(Ci|X):
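
Since p(X|Ci) is the same in both settings, dividing the two Bayes expressions gives p(Ci|X) ∝ pt(Ci|X) · p(Ci) / pt(Ci), followed by normalization over the classes. A minimal sketch of this correction; the function name `correct_for_priors` and the example numbers are illustrative.

```python
import numpy as np

def correct_for_priors(post_train, prior_train, prior_new):
    """Re-weight posteriors p_t(C_i|X) for new class priors p(C_i).

    p(C_i|X) is proportional to p_t(C_i|X) * p(C_i) / p_t(C_i).
    """
    w = np.asarray(prior_new, dtype=float) / np.asarray(prior_train, dtype=float)
    unnorm = np.asarray(post_train, dtype=float) * w
    return unnorm / unnorm.sum(axis=1, keepdims=True)

# A vector judged 50/50 by a model trained on balanced data,
# re-evaluated for a 5%-95% class distribution.
post = correct_for_priors([[0.5, 0.5]], [0.5, 0.5], [0.05, 0.95])
```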

  10. Estimation of a priori probabilities How to estimate the new p(Ci)? Estimate the confusion matrix pt(Ci|Cj) on the training set (McLachlan and Basford 1988); estimate ptest(C) by applying the classifier to the test data. Then solve the resulting linear equations for p(Ci). Experiment: use an MLP on a small 50-50 training sample.
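
A sketch of the estimation step, assuming the linear system is ptest(Ci) = Σj pt(Ci|Cj) p(Cj), where pt(Ci|Cj) is the probability of predicting Ci when the true class is Cj. The function name `estimate_priors` and the numbers are made up for illustration.

```python
import numpy as np

def estimate_priors(conf, predicted_freq):
    """Solve conf @ p = predicted_freq for the unknown priors p.

    conf[i, j] = p_t(predicted C_i | true C_j); each column sums to 1.
    predicted_freq[i] = fraction of test vectors classified as C_i.
    """
    p = np.linalg.solve(np.asarray(conf, dtype=float),
                        np.asarray(predicted_freq, dtype=float))
    p = np.clip(p, 0.0, None)      # guard against small negative solutions
    return p / p.sum()

conf = np.array([[0.9, 0.2],
                 [0.1, 0.8]])
true_priors = np.array([0.05, 0.95])
observed = conf @ true_priors      # class frequencies the classifier would report
estimated = estimate_priors(conf, observed)
```

With noisy estimates of the confusion matrix the solution can leave the [0,1] range, which is why the sketch clips and renormalizes.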

  11. What to optimize? Overall accuracy is not always the most important thing to optimize. Given a model M, confusion matrix for a class + and all other classes is (rows=true, columns=predicted by M):

  12. Quantities derived from p(Ci|Cj) Several quantities are used to evaluate classification models M created to distinguish the C+ class:
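
The list of quantities is not preserved in the transcript. The sketch below computes the ones named elsewhere in the slides (accuracy, sensitivity, specificity, precision, recall) from a two-class confusion matrix laid out with rows = true and columns = predicted; the function and argument names are illustrative.

```python
def binary_metrics(tp, fn, fp, tn):
    """Metrics for class + from the confusion matrix [[tp, fn], [fp, tn]].

    Rows are the true class (+, -), columns the predicted class (+, -).
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # S+, also called recall
        "specificity": tn / (tn + fp),   # S-
        "precision": tp / (tp + fp),
    }

m = binary_metrics(tp=40, fn=10, fp=5, tn=45)
```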

  13. Error functions The best classifier is selected using Recall(Precision) curves or ROC curves, Sensitivity(1-Specificity), i.e. S+(1-S-). Confidence in M may be increased by rejecting some cases.
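
A ROC curve can be traced by sweeping a decision threshold over the model's scores for the C+ class and recording S+ against 1-S- at each step. A small sketch; `scores` and `labels` are hypothetical inputs, and both classes are assumed to be present.

```python
import numpy as np

def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    points = [(0.0, 0.0)]                      # threshold above every score
    for thr in np.unique(scores)[::-1]:        # thresholds from high to low
        pred = scores >= thr
        sens = (pred & labels).sum() / labels.sum()
        fpr = (pred & ~labels).sum() / (~labels).sum()
        points.append((float(fpr), float(sens)))
    return points

pts = roc_points([0.9, 0.8, 0.4, 0.2], [1, 0, 1, 0])
```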

  14. Errors and costs Optimization with explicit costs: For a = 0 this is equivalent to maximization of and for large a to the maximization of
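
The cost expression and the two maximized quantities are not preserved in the transcript. One form consistent with the two limits described is sketched below as an assumption: a weights the two error types, and p_{+-} is the joint probability of true class + being predicted as - (rows = true, columns = predicted).

```latex
E(M; a) = p_{+-}(M) + a\, p_{-+}(M)

% The class priors p_+ = p_{++} + p_{+-} and p_- = p_{-+} + p_{--} are fixed, hence
p_{+-}(M) = p_+ - p_{++}(M), \qquad p_{-+}(M) = p_- - p_{--}(M)

% a = 0:    minimizing E is the same as maximizing p_{++}(M), i.e. the sensitivity p_{++}/p_+
% a large:  minimizing E is the same as maximizing p_{--}(M), i.e. the specificity p_{--}/p_-
```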

  15. Conclusions Applying a trained model in a real-world application does not end with classification; that may be only the beginning. Three types of corrections to optimize the final model have been considered:
     • a posteriori, improving accuracy by scaling probabilities
     • restoring the balance between the training/test distributions
     • improving confidence, selectivity or specificity of results.
     They are especially useful for optimization of logical rules. They may be combined; for example, a posteriori corrections may be applied to the accuracy for a chosen class (sensitivity), confidence, cost optimization, etc.
