
Improving the Reliability of Decision Trees and Naïve Bayes Learners




  1. ICDM Nov 2004 Improving the Reliability of Decision Trees and Naïve Bayes Learners David Lindsay and Siân Cox Computer Learning Research Centre, Royal Holloway University of London, Egham, Surrey, UK PhD Supervisors: Alex Gammerman and Volodya Vovk

  2. Motivation and Outline Aim: to assess how much the VPM meta-learner improves the quality of probability forecasts when applied to Naïve Bayes and C4.5 base learners. • Define the problem of probability forecasting. • Introduce resolution and reliability. • Detail the methods we used for assessing reliability, focusing on ERC plots. • Summarise results. • Discuss conclusions and future work.

  3. Reliable Probability Forecasts • Pattern recognition → Probability forecasting • Good-quality forecasts should be: • Resolute: forecasts are useful for ranking labels in order of their likelihood of occurring • Reliable (a.k.a. calibrated): forecasts do not lie; labels assigned forecast ≈ p should occur with frequency ≈ p • We focus on reliability! [Plot: empirical frequency vs. predicted probability]
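The reliability notion above can be checked empirically by binning forecasts and comparing each bin's mean forecast to its observed label frequency. A minimal sketch (function name, bin count, and the toy data are illustrative, not from the talk):

```python
# Sketch: checking reliability (calibration) of binary probability forecasts.
# Labels assigned forecast ~p should occur with empirical frequency ~p.

def empirical_reliability(forecasts, outcomes, n_bins=10):
    """For each forecast bin, compare mean forecast to observed frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)   # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    rows = []
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            freq = sum(y for _, y in b) / len(b)
            rows.append((mean_p, freq, len(b)))
    return rows

# Perfectly calibrated toy data: forecast-0.2 events occur 20% of the time.
forecasts = [0.2] * 10 + [0.8] * 10
outcomes = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0] + [1] * 8 + [0] * 2
for mean_p, freq, n in empirical_reliability(forecasts, outcomes):
    print(f"mean forecast {mean_p:.2f} -> empirical frequency {freq:.2f} (n={n})")
```

For a reliable forecaster the two columns track each other; large gaps in any bin indicate miscalibration.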

  4. Methods for Assessing Quality of Probability Forecasts
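One classical method in this area, and the source of the reliability/resolution terminology used throughout (Murphy 1973, cited in the bibliography), is the partition of the Brier score into reliability, resolution, and uncertainty components. A sketch for the binary case, binning by forecast value (the function name, bin count, and toy data are illustrative):

```python
# Sketch of Murphy's (1973) partition of the Brier score:
# Brier = reliability - resolution + uncertainty.
# Exact when each bin contains a single distinct forecast value.

def brier_partition(forecasts, outcomes, n_bins=10):
    N = len(forecasts)
    bins = {}
    for p, y in zip(forecasts, outcomes):
        k = min(int(p * n_bins), n_bins - 1)
        bins.setdefault(k, []).append((p, y))
    obar = sum(outcomes) / N                      # base rate of the event
    rel = res = 0.0
    for pairs in bins.values():
        n_k = len(pairs)
        p_k = sum(p for p, _ in pairs) / n_k      # mean forecast in bin
        o_k = sum(y for _, y in pairs) / n_k      # observed frequency in bin
        rel += n_k * (p_k - o_k) ** 2             # calibration gap
        res += n_k * (o_k - obar) ** 2            # spread of bin frequencies
    unc = obar * (1 - obar)
    return rel / N, res / N, unc

forecasts = [0.1, 0.1, 0.9, 0.9, 0.5, 0.5]
outcomes  = [0,   0,   1,   1,   1,   0]
rel, res, unc = brier_partition(forecasts, outcomes)
print(f"REL={rel:.3f} RES={res:.3f} UNC={unc:.3f}")  # REL=0.007 RES=0.167 UNC=0.250
brier = sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)
assert abs(brier - (rel - res + unc)) < 1e-9
```

Lower reliability (REL) and higher resolution (RES) both lower the overall loss, which is why the talk tracks the two quantities separately.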

  5. Empirical Reliability Curves (ERC) • ERC plots generated from Naïve Bayes learners’ forecasts on the Abdominal Pain medical dataset • Naïve Bayes: ERC Dev. Area = 0.153 • VPM Naïve Bayes: ERC Dev. Area = 0.006
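One plausible reading of the "ERC deviation area" is the area between the empirical reliability curve and the ideal diagonal (a perfectly reliable learner sits on the diagonal, giving area 0). The exact construction is in Lindsay's technical report (CLRC-TR-04-01) and may differ; the sketch below, with a hypothetical function name and toy curves, only illustrates the idea:

```python
# Hedged sketch: deviation area as the trapezoidal area between a
# reliability curve and the diagonal y = x. Not the paper's exact definition.

def deviation_area(points):
    """points: (predicted probability, empirical frequency) pairs,
    sorted by predicted probability and spanning [0, 1]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = abs(y0 - x0)                 # gap from the diagonal at x0
        d1 = abs(y1 - x1)                 # gap from the diagonal at x1
        area += 0.5 * (d0 + d1) * (x1 - x0)
    return area

ideal  = [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]   # perfectly reliable: area 0
skewed = [(0.0, 0.0), (0.5, 0.8), (1.0, 1.0)]   # over-forecasts mid-range: ~0.15
print(deviation_area(ideal), deviation_area(skewed))
```

Under this reading, the drop from 0.153 to 0.006 on the abdominal-pain data means the VPM forecasts hug the diagonal far more closely than the raw Naïve Bayes forecasts.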

  6. Learners Tested • Naïve Bayes and C4.5 Decision Tree • Meta-learners applied: • Binning • Venn Probability Machine (VPM) • Laplace (applied to C4.5 only) • Tested on multi-class and binary datasets from the UCI repository
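Of the meta-learners listed, the Laplace correction is the simplest: instead of the raw leaf frequency n_c / n, which is often an over-confident 0 or 1 at a pure C4.5 leaf, it shrinks the estimate toward the uniform distribution. A minimal sketch (the function name is illustrative):

```python
# Sketch of the Laplace correction on decision-tree leaf counts:
# p_c = (n_c + 1) / (n + K) for K classes, instead of the raw n_c / n.

def laplace_leaf_probs(class_counts):
    """class_counts: per-class training-example counts at one leaf."""
    n = sum(class_counts)
    k = len(class_counts)
    return [(c + 1) / (n + k) for c in class_counts]

# A pure leaf: the raw estimate would be [1.0, 0.0], far too confident.
print(laplace_leaf_probs([8, 0]))   # [0.9, 0.1]
```

Binning and the VPM are more involved: binning recalibrates by pooling held-out forecasts, and the VPM derives its forecasts from Venn taxonomies (Vovk, Shafer and Nouretdinov, cited below).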

  7. Results • The VPM C4.5 error rate is slightly higher (by 0.1%) than Binning C4.5 • But VPM performs best at improving reliability! • And VPM improves resolution → overall loss is improved

  8. Conclusions • VPM is slow, but gives a good improvement in both reliability and resolution. • ERC gives a nice visualisation and a measure of reliability on its own. • Reliability should not be overlooked: classification accuracy is not always the most useful thing to look at!

  9. Current and Future Work • Have tested much larger set of meta-learners: • ERC re-calibration • Bagging • Boosting • Find Best Weights (FBW) • Tested other base learners: • SVM • Neural Networks • K-Nearest Neighbours • Bayesian Belief Networks • Developed extension to WEKA

  10. Bibliography • A. P. Dawid. Calibration-based empirical probability (with discussion). Annals of Statistics, 13:1251–1285, 1985. • V. Vovk, G. Shafer, and I. Nouretdinov. Self-calibrating probability forecasting. In Advances in Neural Information Processing Systems 16, 2003. • D. Lindsay. Visualising and improving reliability – a machine learning perspective. Technical Report CLRC-TR-04-01, Royal Holloway University of London, England, 2004. • A. H. Murphy. A new vector partition of the probability score. Journal of Applied Meteorology, 12:595–600, 1973.

  11. VPM Compared With Underlying Naïve Bayes Learner Key: Predicted = underlined, Actual class =
