Learning from observations (b)

This text discusses the importance of experimentation protocols and performance measures in machine learning, including the distinction between training and test sets, the use of k-fold evaluation, and the leave-one-out method. It also covers performance evaluation measures such as precision, recall, and confusion matrices, as well as learning curves and the prevention of overfitting. The text explores ensemble methods, learning methods, memory-based learning, and unsupervised learning techniques.


Presentation Transcript


  1. Learning from observations (b) • How good is a machine learner? • Experimentation protocols • Performance measures • Academic benchmarks vs Real Life KI1 / L. Schomaker - 2007

  2. Experimentation protocols • Fooling yourself: training a decision tree on 100 example instances from Earth and sending the robot to Mars • → training set / test set distinction • both must be of sufficient size: • large training set for reliable ‘h’ (coefficients etc.) • large test set for reliable prediction of real-life performance

  3. Experimentation protocols • one training set / one test set, four-year PhD project: still fooling yourself! • Solution: • training set • test set • final evaluation set with real-life data • k-Fold evaluation: k subsets from a large database, measuring the standard deviation of performance over experiments
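
The k-fold protocol on this slide can be sketched roughly as follows. This is a minimal illustration, not the lecturer's code: the function names, the toy majority-label learner, and the data are all assumptions made for the example.

```python
import random
import statistics
from collections import Counter


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint subsets
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def k_fold_scores(xs, ys, k, train_fn, score_fn):
    """Return mean and standard deviation of the score over the k experiments."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k):
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score_fn(model,
                               [xs[i] for i in test_idx],
                               [ys[i] for i in test_idx]))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical toy learner: always predicts the most frequent training label.
def train_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]


def percent_correct(model, xs, ys):
    return 100.0 * sum(1 for y in ys if y == model) / len(ys)


xs = list(range(20))
ys = ["yes"] * 14 + ["no"] * 6
mean_p, sd_p = k_fold_scores(xs, ys, k=5, train_fn=train_majority,
                             score_fn=percent_correct)
print(f"P = {mean_p:.1f}% +/- {sd_p:.1f}")
```

Reporting the standard deviation next to the mean shows how much the measured performance depends on which subset happened to be held out.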

  4. Experimentation protocols • What to do if you don’t have enough data? • Solution: • Leave-one-out: use N-1 samples for training, • use the Nth sample for testing • repeat for all samples • compute the average performance
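
A minimal sketch of the leave-one-out procedure listed on this slide. The 1-nearest-neighbour learner and the scalar toy data are assumptions chosen only to keep the example short; any train/predict pair could be plugged in.

```python
def leave_one_out(xs, ys, train_fn, predict_fn):
    """Average % correct over N runs, each holding out one sample for testing."""
    n = len(xs)
    correct = 0
    for i in range(n):
        train_x = xs[:i] + xs[i + 1:]   # the other N-1 samples
        train_y = ys[:i] + ys[i + 1:]
        model = train_fn(train_x, train_y)
        if predict_fn(model, xs[i]) == ys[i]:
            correct += 1
    return 100.0 * correct / n


# Hypothetical learner: 1-nearest-neighbour on scalar inputs.
def train_nn(xs, ys):
    return list(zip(xs, ys))            # "training" just stores the samples


def predict_nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


xs = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
ys = ["a", "a", "a", "b", "b", "b"]
print(leave_one_out(xs, ys, train_nn, predict_nn))  # 100.0
```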

  5. Performance • Example: % correctly classified samples (P) • Ptrain • Ptest • Preal Ptest

  6. Performance, two-class

  7. Performance, two-class

  8. Performance, two-class • Precision = 100 * #correct_hits / #says_Yes [%] • Recall = 100 * #correct_hits / #is_Yes [%]
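
The two formulas on this slide translate directly into code. In this sketch the function name and the boolean-list input format are assumptions; returning 0.0 when a denominator is zero is also a choice made here, not something the slide specifies.

```python
def precision_recall(says_yes, is_yes):
    """Precision and recall in percent for a two-class (Yes/No) decision.

    says_yes[i] is the system's answer, is_yes[i] the true class of sample i.
    """
    correct_hits = sum(1 for s, t in zip(says_yes, is_yes) if s and t)
    n_says_yes = sum(1 for s in says_yes if s)
    n_is_yes = sum(1 for t in is_yes if t)
    # Assumption: report 0.0 instead of dividing by zero.
    precision = 100.0 * correct_hits / n_says_yes if n_says_yes else 0.0
    recall = 100.0 * correct_hits / n_is_yes if n_is_yes else 0.0
    return precision, recall


says_yes = [True, True, False, False, False]
is_yes = [True, True, True, True, False]
print(precision_recall(says_yes, is_yes))  # (100.0, 50.0)
```

The example shows a cautious system: everything it accepts is correct (precision 100%), but it finds only half of the true Yes cases (recall 50%).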

  9. Performance, multi-class

  10. Performance, multi-class • Confusion matrix
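
The slide's figure is not preserved in this transcript, so here is a small sketch of how a confusion matrix can be tallied. The row/column convention (rows = true class, columns = predicted class) and the toy labels are assumptions.

```python
def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] = number of samples of true class i classified as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def percent_correct(matrix):
    """Correct classifications lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return 100.0 * diagonal / total


actual = ["a", "a", "b", "b", "c"]
predicted = ["a", "b", "b", "b", "a"]
m = confusion_matrix(actual, predicted, labels=["a", "b", "c"])
print(m)                   # [[1, 1, 0], [0, 2, 0], [1, 0, 0]]
print(percent_correct(m))  # 60.0
```

Off-diagonal cells show which classes are confused with which, information that a single percent-correct figure hides.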

  11. Rankings / hit lists • Given a query Q, the system returns a hit list of matches M: an ordered set, with instances i in decreasing likelihood of correctness • Precision: proportion of correct instances in the hit list M • Recall: proportion of correct instances retrieved, out of the total number of target samples in the database
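
For hit lists, the same two measures can be computed as proportions. This is an illustrative sketch: the document identifiers and the set of target samples are made up for the example.

```python
def hitlist_precision_recall(hit_list, targets):
    """Precision and recall (as proportions) of a ranked hit list.

    hit_list: instances returned for the query, best match first.
    targets:  set of all correct instances for the query in the database.
    """
    correct = sum(1 for instance in hit_list if instance in targets)
    precision = correct / len(hit_list) if hit_list else 0.0
    recall = correct / len(targets) if targets else 0.0
    return precision, recall


hit_list = ["d3", "d7", "d1", "d9"]   # hypothetical identifiers
targets = {"d1", "d3", "d5"}
precision, recall = hitlist_precision_recall(hit_list, targets)
print(precision, recall)  # 0.5 and 2/3
```

Truncating the hit list at different lengths trades the two measures against each other: a shorter list tends to raise precision and lower recall.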

  12. Function approximation • For, e.g., regression models, learning an ‘analog’ output • Example: target function t(x) • Obtained output function o(x) • For performance evaluation, compute the root-mean-square error (RMS error): • $\mathrm{RMS\ error} = \sqrt{\sum_{x} (o(x) - t(x))^2 / N}$
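
The RMS error over N sample points can be computed as below. The function name and the toy values are assumptions; `outputs` and `targets` stand for o(x) and t(x) evaluated at the same N inputs.

```python
import math


def rms_error(outputs, targets):
    """Root-mean-square difference between obtained and target values."""
    n = len(targets)
    return math.sqrt(sum((o - t) ** 2 for o, t in zip(outputs, targets)) / n)


outputs = [1.0, 2.0, 3.0, 6.0]   # o(x)
targets = [1.0, 2.0, 3.0, 4.0]   # t(x)
print(rms_error(outputs, targets))  # 1.0
```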

  13. Learning curves • [Figure: P [% OK] plotted against #epochs (presentations of training set)]

  14. Learning curves • [Figure: P [% OK] plotted against #epochs (presentations of training set); curves: performance on training set, performance on test set]

  15. Learning curves • [Figure: P [% OK], with the 100% level marked, plotted against #epochs (presentations of training set); curves: performance on training set, performance on test set]

  16. Learning curves • [Figure: P [% OK], with the 100% level marked, plotted against #epochs (presentations of training set); curves: performance on training set, performance on test set; annotations: no generalization, overfit; Stop!]
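
The "Stop!" point in the learning-curve figure can be located programmatically. The sketch below assumes one particular stopping rule, a `patience` parameter that tolerates a few epochs without improvement on the test set; the slide itself does not specify a rule, and the scores are invented.

```python
def early_stopping_epoch(test_scores, patience=2):
    """Return the epoch with the best test-set score seen before stopping.

    Training stops once the test score has not improved for `patience`
    consecutive epochs (an assumed rule, not taken from the slides).
    """
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(test_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break  # test performance keeps falling: overfitting has set in
    return best_epoch


# Hypothetical test-set performance P [% OK] per epoch.
test_curve = [60.0, 70.0, 78.0, 82.0, 81.0, 80.0, 79.0]
print(early_stopping_epoch(test_curve))  # 3
```

Training-set performance is deliberately ignored here: it keeps rising towards 100% and says nothing about generalization.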

  17. Overfitting • The learner learns the training set • Even perfectly, like a lookup table (LUT) • → memorizing training instances • without correctly handling unseen data • Usual cause: more parameters in the learner than in the data

  18. Preventing Overfit • For good generalization: • number of training examples must be much larger than the number of attributes (features): Nsamples / Nattr >> 1

  19. Preventing Overfit • For good generalization: • also: Nsamples >> Ncoefficients • e.g.: solving a linear equation: 2 coefficients, needing 2 data points in 2D • Coefficients: model parameters, weights etc.

  20. Preventing Overfit • For good generalization: • Ndatavalues >> Ncoefficients • Coefficients: model parameters, weights etc. • Ndatavalues = Nsamples * Nattributes • e.g.: use Ndatavalues/Ncoefficients for system comparison
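
As a worked example of the Ndatavalues/Ncoefficients ratio, the sketch below counts the weights of a fully connected network. The layer sizes (20-10-3), the inclusion of one bias per unit, and the sample count are assumptions picked for illustration only.

```python
def mlp_coefficients(layer_sizes):
    """Weights plus one bias per unit for a fully connected network."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def data_to_coefficient_ratio(n_samples, n_attributes, n_coefficients):
    """Ndatavalues / Ncoefficients, with Ndatavalues = Nsamples * Nattributes."""
    return (n_samples * n_attributes) / n_coefficients


n_coeff = mlp_coefficients([20, 10, 3])   # (20+1)*10 + (10+1)*3 = 243
ratio = data_to_coefficient_ratio(n_samples=1000, n_attributes=20,
                                  n_coefficients=n_coeff)
print(n_coeff, round(ratio, 1))  # 243 82.3
```

A ratio far above 1 is what the slide asks for; a ratio near or below 1 means the model has enough free parameters to memorize the data.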

  21. Example: machine-print OCR • Very accurate, today, but: • Needs 5000 examples of each character • Printed on ink-jet, laser printers, matrix printers, fax copies • of many brands of printers • on many paper types • → for 1 font & point size!

  22. Ensemble methods • Boosting: • train a learner h[m] • weigh each of the instances • weigh the method m • train a new learner h[m+1] • perform majority voting on ensemble opinions
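
The slide does not name a specific boosting algorithm. As one concrete instance of the listed steps, the sketch below uses AdaBoost with one-dimensional decision stumps; that choice, the +1/-1 label coding, and the toy data are assumptions. Note that AdaBoost combines the learners by a weighted vote rather than a plain majority vote.

```python
import math


def train_stump(xs, ys, weights):
    """Find the threshold/polarity with the lowest weighted error (labels +1/-1)."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (+1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def stump_predict(threshold, polarity, x):
    return polarity if x >= threshold else -polarity


def adaboost(xs, ys, rounds=5):
    n = len(xs)
    weights = [1.0 / n] * n                      # instance weights
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)  # learner h[m]
        err = min(max(err, 1e-10), 1 - 1e-10)    # avoid log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)  # weight of method m
        ensemble.append((alpha, threshold, polarity))
        # Re-weigh instances: misclassified ones gain weight for h[m+1].
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, polarity, x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted vote over all learners in the ensemble."""
    vote = sum(alpha * stump_predict(t, p, x) for alpha, t, p in ensemble)
    return 1 if vote >= 0 else -1


# Toy data that no single threshold can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(xs, ys, rounds=5)
print([ensemble_predict(ensemble, x) for x in xs])
```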

  23. The advantage of democracy: partly intelligent, independent deciders

  24. Learning methods • Gradient descent, parameter finding (multi-layer perceptron, regression) • Expectation Maximization (smart Monte Carlo search for best model, given the data) • Knowledge-based, symbolic learning (Version Spaces) • Reinforcement learning • Bayesian learning

  25. Memory-based ‘learning’ • Lookup-table (LUT) • Nearest neighbour: argmin(dist) • k-Nearest neighbour: majority(Nargmin(dist,k))
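
A minimal sketch of the nearest-neighbour rules on this slide, assuming Euclidean distance and a stored list of (point, label) pairs; both are choices made for the example. With k=1 it reduces to argmin(dist). It relies on `math.dist`, available from Python 3.8.

```python
import math
from collections import Counter


def k_nearest_neighbour(samples, query, k=1):
    """Majority label among the k stored samples closest to the query."""
    nearest = sorted(samples, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# The "memory": stored training instances, no model is fitted.
samples = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
           ((5, 5), "b"), ((5, 6), "b")]
print(k_nearest_neighbour(samples, (0.5, 0.5), k=3))  # a
print(k_nearest_neighbour(samples, (5, 5.5), k=3))    # b
```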

  26. Unsupervised learning • K-means clustering • Kohonen self-organizing maps (SOM) • Hierarchical clustering
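
Of the three techniques listed, K-means is the shortest to sketch. The version below is the standard assign-and-update iteration with initial centroids supplied by the caller; the function signature, the fixed starting centroids, and the toy points are assumptions. It relies on `math.dist`, available from Python 3.8.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(centroids)
```

No labels are used anywhere: the grouping follows from the distances alone, which is what makes the method unsupervised.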

  27. Summary (1) • Learning needed for unknown environments (and/or) lazy designers • Learning agent = performance element + learning element • Learning method depends on type of performance element, available feedback, type of component to be improved, and its representation

  28. Summary (2) • For supervised learning, the aim is to find a simple hypothesis that is approximately consistent with training examples • Decision tree learning using information gain: entropy-based • Learning performance = prediction accuracy measured on test set(s)
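
The entropy-based information gain mentioned in the summary can be sketched as follows. The attribute names and the four-row toy data set are invented for illustration; entropy is measured in bits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute_values):
        subset = [y for a, y in zip(attribute_values, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


# Hypothetical toy data: which attribute should the tree test first?
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes"]
print(information_gain(outlook, play))  # 1 bit: outlook decides the class
print(information_gain(windy, play))    # 0 bits: windy tells nothing
```

A decision-tree learner would split on the attribute with the highest gain, here `outlook`, and recurse on the resulting subsets.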
