
Taming the Learning Zoo



  1. Taming the Learning Zoo

  2. Supervised Learning Zoo • Bayesian learning (find parameters of a probabilistic model) • Maximum likelihood • Maximum a posteriori • Classification • Decision trees (discrete attributes, few relevant) • Support vector machines (continuous attributes) • Regression • Least squares (known structure, easy to interpret) • Neural nets (unknown structure, hard to interpret) • Nonparametric approaches • k-Nearest-Neighbors • Locally-weighted averaging / regression

  3. Agenda • Quantifying learner performance • Cross validation • Error vs. loss • Confusion matrix • Precision & recall • Computational learning theory

  4. Cross-Validation

  5. Assessing Performance of a Learning Algorithm • Fresh samples from the underlying distribution over X are typically unavailable • Take out some of the training set • Train on the remaining training set • Test on the excluded instances • Cross-validation

  6. Cross-Validation • Split the original set of examples D and train on one part [figure: labeled +/− examples D, training set, hypothesis space H]

  7. Cross-Validation • Evaluate the hypothesis on the testing set [figure: held-out testing set of labeled +/− examples, hypothesis space H]

  8. Cross-Validation • Evaluate the hypothesis on the testing set [figure: hypothesis predictions on the testing set, hypothesis space H]

  9. Cross-Validation • Compare the true concept against the prediction: 9/13 correct on the testing set [figure: true labels vs. predicted labels on the testing set, hypothesis space H]

  10. Common Splitting Strategies • k-fold cross-validation • Leave-one-out (n-fold cross-validation) [figure: dataset partitioned into train/test folds]

  11. Computational complexity • k-fold cross validation requires • k training steps on n(k-1)/k datapoints • k testing steps on n/k datapoints • (There are efficient ways of computing L.O.O. estimates for some nonparametric techniques, e.g. Nearest Neighbors) • Average results reported
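As a rough illustration (not from the slides), here is a minimal Python sketch of k-fold cross-validation; the train and error arguments are hypothetical placeholders for any learner and error measure:

import random

def k_fold_cross_validation(examples, train, error, k=10, seed=0):
    # Estimate generalization error by averaging over k train/test splits.
    data = list(examples)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]            # k roughly equal folds
    errors = []
    for i in range(k):
        test_set = folds[i]
        train_set = [x for j, fold in enumerate(folds) if j != i for x in fold]
        h = train(train_set)                          # k training steps on ~n(k-1)/k points
        errors.append(error(h, test_set))             # k testing steps on ~n/k points
    return sum(errors) / k                            # average result reported

With k = len(examples) this reduces to leave-one-out cross-validation.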

  12. Bootstrapping • Similar technique for estimating confidence in the model parameters θ • Procedure: • Draw k hypothetical datasets from the original data, either via cross-validation or by sampling with replacement • Fit the model to each dataset, obtaining k parameter estimates θ1,…,θk • Return the standard deviation of θ1,…,θk (or a confidence interval) • Can also estimate confidence in a prediction y = f(x)

  13. Example: average of N numbers • Data D = {x(1),…,x(N)}, model is a constant θ • Learning: minimize E(θ) = Σi (x(i) − θ)² ⇒ compute the average • Repeat for j = 1,…,k: • Randomly sample x(1)′,…,x(N)′ from D • Learn θj = 1/N Σi x(i)′ • Return a histogram of θ1,…,θk
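A minimal Python sketch of this example, using the sampling-with-replacement variant from the previous slide (the function name and the sample data are illustrative, not from the slides):

import random, statistics

def bootstrap_mean(D, k=1000, seed=0):
    # Draw k resampled datasets from D (with replacement) and fit theta_j to each.
    rng = random.Random(seed)
    thetas = []
    for _ in range(k):
        resample = [rng.choice(D) for _ in range(len(D))]   # x(1)',...,x(N)'
        thetas.append(sum(resample) / len(resample))        # theta_j = (1/N) * sum_i x(i)'
    return statistics.mean(thetas), statistics.stdev(thetas)

# Example: bootstrap_mean([2.0, 4.5, 3.1, 5.2, 4.0]) returns the bootstrap mean of
# theta and its standard deviation (a measure of confidence in the parameter).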

  14. Beyond Error Rates

  15. Beyond Error Rate • Predicting security risk • Predicting “low risk” for a terrorist is far worse than predicting “high risk” for an innocent bystander (but maybe not 5 million of them) • Searching for images • Returning irrelevant images is worse than omitting relevant ones

  16. Biased Sample Sets • Often there are orders of magnitude more negative examples than positive • E.g., all images of Mark Wilson on Facebook • If I classify all images as “not Mark” I’ll have >99.99% accuracy • Examples of Mark should count much more than non-Mark!

  17. False Positives [figure: true concept vs. learned concept in the (x1, x2) plane]

  18. False Positives • A false positive is an example incorrectly predicted to be positive [figure: new query inside the learned concept but outside the true concept]

  19. False Negatives • A false negative is an example incorrectly predicted to be negative [figure: new query outside the learned concept but inside the true concept]

  20. Precision vs. Recall • Precision • # of relevant documents retrieved / # of total documents retrieved • Recall • # of relevant documents retrieved / # of total relevant documents • Numbers between 0 and 1

  21. Precision vs. Recall • Precision • # of true positives / (# true positives + # false positives) • Recall • # of true positives / (# true positives + # false negatives) • A precise classifier is selective • A classifier with high recall is inclusive
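A small sketch of these formulas in Python, assuming parallel lists of predicted and actual boolean labels (names are illustrative):

def precision_recall(predicted, actual):
    # predicted, actual: parallel lists of booleans (True = positive class).
    tp = sum(p and a for p, a in zip(predicted, actual))          # true positives
    fp = sum(p and not a for p, a in zip(predicted, actual))      # false positives
    fn = sum((not p) and a for p, a in zip(predicted, actual))    # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 1.0   # selective: few false positives
    recall = tp / (tp + fn) if (tp + fn) else 1.0      # inclusive: few false negatives
    return precision, recall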

  22. Option 1: Classification Thresholds • Many learning algorithms (e.g., linear models, NNets, BNs, SVMs) give a real-valued output v(x) that needs thresholding for classification: v(x) > t ⇒ positive label given to x; v(x) < t ⇒ negative label given to x • May want to tune the threshold to get fewer false positives or false negatives
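A minimal sketch of this thresholding step, assuming a hypothetical real-valued scoring function v:

def threshold_classify(v, xs, t):
    # Assign a positive label to x exactly when the real-valued score v(x) exceeds t.
    return [v(x) > t for x in xs]

# Raising t typically trades false negatives for fewer false positives (and vice
# versa); sweeping t and recording precision/recall at each value traces out a
# precision-recall curve like those on the later slides.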

  23. Reducing the False Positive Rate [figure: true concept vs. learned concept in the (x1, x2) plane]

  24. Reducing the False Negative Rate [figure: true concept vs. learned concept in the (x1, x2) plane]

  25. Loss functions & Weighted datasets • General learning problem: “Given data D and loss function L, find the best hypothesis in hypothesis class H” • Loss functions: L contains weights to favor accuracy on positive or negative examples • E.g., L = 10·E+ + 1·E−, where E+ and E− are the errors on positive and negative examples • Weighted datasets: attach a weight w to each example to indicate how important it is • Or construct a resampled dataset D′ in which each example is duplicated proportionally to its w
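A minimal Python sketch of both options; the 10/1 weights follow the example above, while the function names and the (hypothesis, examples, weights) interface are illustrative:

def weighted_error(h, examples, w_pos=10.0, w_neg=1.0):
    # L = w_pos * (mistakes on positive examples) + w_neg * (mistakes on negative examples)
    loss = 0.0
    for x, label in examples:
        if h(x) != label:
            loss += w_pos if label else w_neg
    return loss

def resample_by_weight(examples, weights):
    # Build D' by duplicating each example proportionally to its (integer) weight w.
    return [ex for ex, w in zip(examples, weights) for _ in range(int(w))]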

  26. Precision-Recall Curves • Measure precision vs. recall as the tolerance (or weighting) is tuned [figure: precision-recall plot contrasting a perfect classifier with actual performance]

  27. Precision-Recall Curves • Measure precision vs. recall as the tolerance (or weighting) is tuned [figure: points on the curve obtained by penalizing false negatives, weighting equally, or penalizing false positives]

  28. Precision-Recall Curves • Measure precision vs. recall as the tolerance (or weighting) is tuned [figure: precision-recall curve]

  29. Precision-Recall Curves • Measure precision vs. recall as the tolerance (or weighting) is tuned [figure: curves shifted toward higher precision and recall indicate better learning performance]

  30. Model Selection

  31. Complexity Vs. Goodness of Fit • More complex models can fit the data better, but can overfit • Model selection: enumerate several possible hypothesis classes of increasing complexity, stop when cross-validated error levels off • Regularization: explicitly define a metric of complexity and penalize it in addition to loss

  32. Model Selection with k-fold Cross-Validation • Parameterize learner by a complexity level C • Model selection pseudocode: • For increasing levels of complexity C: • errT[C],errV[C] = Cross-Validate(Learner,C,examples) • If errT has converged, • Find value Cbest that minimizes errV[C] • Return Learner(Cbest,examples)
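A Python sketch of this pseudocode, assuming a hypothetical cross_validate routine that returns (training error, validation error) for a given complexity C:

def model_select(learner, examples, complexities, cross_validate, tol=1e-3):
    # Increase complexity C until training error levels off, then return the
    # learner trained at the complexity with the lowest cross-validated error.
    err_train, err_val = {}, {}
    prev = None
    for C in complexities:                           # increasing levels of complexity
        err_train[C], err_val[C] = cross_validate(learner, C, examples)
        if prev is not None and abs(prev - err_train[C]) < tol:
            break                                    # errT has converged
        prev = err_train[C]
    C_best = min(err_val, key=err_val.get)           # complexity minimizing errV
    return learner(C_best, examples)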

  33. Regularization • Minimize: • Cost(h) = Loss(h) + λ·Complexity(h) • Example with linear models y = θᵀx: • L2 error: Loss(θ) = Σi (y(i) − θᵀx(i))² • Lq regularization: Complexity(θ) = Σj |θj|^q • L2 and L1 are the most popular in linear regularization • L2 regularization leads to simple computation of the optimal θ • L1 is more complex to optimize, but produces sparse models in which many coefficients are 0!
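A small NumPy sketch of this regularized cost for a linear model (the function name and the default λ are illustrative):

import numpy as np

def regularized_cost(theta, X, y, lam=0.1, q=2):
    # Cost(theta) = sum_i (y(i) - theta^T x(i))^2 + lam * sum_j |theta_j|^q
    residuals = y - X @ theta                 # squared (L2) error term
    loss = np.sum(residuals ** 2)
    complexity = np.sum(np.abs(theta) ** q)   # q = 2: ridge; q = 1: lasso (sparse theta)
    return loss + lam * complexity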

  34. Other topics in Machine Learning • Unsupervised learning • Dimensionality reduction • Clustering • Reinforcement learning • Agent that acts and learns how to act in an environment by observing rewards • Learning from demonstration • Agent that acts and learns how to act in an environment by observing demonstrations from an expert

  35. Issues in Practice • The distinctions between learning algorithms diminish when you have a lot of data • The web has made it much easier to gather large-scale datasets than in the early days of ML • Understanding data with many more attributes than examples is still a major challenge! • Do humans just have really great priors?

  36. Project Midterm Report • Due 11/10 • ~1 page description of current progress, challenges, changes in direction

  37. Next Lectures • Intelligent agents (R&N 2) • Decision-theoretic planning • Reinforcement learning • Applications of AI
