
Learning From Examples


Presentation Transcript


  1. Learning From Examples AIMA Chapter 18

  2. Outline • Motivation for Learning • Supervised Learning • Expert Systems

  3. Learning – What and Why? • An agent is said to be learning if it improves its performance on a task based on experience/observations/data • The task must be fixed, the performance must be measurable, and the experience must exist • e.g., image recognition, game playing, driving in urban environments… Reasons for learning: • Unknown, dynamically changing task environments • Hard to encode all knowledge by hand (e.g., face recognition) • Sometimes easier to let the system learn than to program the solution directly

  4. ML is Everywhere! Computational Biology Natural Language Processing Targeted Advertising Face Recognition

  5. Designing Learning Elements

  6. Supervised Learning Key idea: Learn an unknown function from examples

  7. Supervised Learning – a Probabilistic Take Key idea: Learn an unknown function from examples

  8. Probably Approximately Correct Learning Key idea: Learn an unknown function from examples

  9. Learning a Classifier • Our goal is to find a function that fits the data: what is the likeliest function to have generated our dataset?

  10. Simple Explanations – “Occam’s Razor” vs. Low Error Rate

  11. Choosing Simple Hypotheses • Generalize better in the presence of noisy data • Faster to search through simple hypothesis space • Easier and faster to use simple hypothesis

  12. Linear Classifiers Assumption: the data was generated by some linear function. A function of the form f(x) = sign(w · x + b) is called a linear classifier.

  13. Linear Classifiers Question: given a dataset, how would we determine which linear classifier is “good”?

  14. Least Squared Error Let h(x) = sign(w · x + b). Empirical error: given a data point x labelled y, our classifier is wrong if sign(w · x + b) ≠ y. Objective: “find w, b that minimize the total number of mistakes on the dataset”. Other loss functions are possible!
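The empirical-error count above can be sketched in a few lines of NumPy. The weight vector and toy dataset below are made-up values for illustration:

```python
import numpy as np

def predict(w, b, X):
    """Linear classifier: sign(w . x + b) for each row x of X."""
    return np.sign(X @ w + b)

def empirical_error(w, b, X, y):
    """Number of points where the predicted sign disagrees with the label."""
    return int(np.sum(predict(w, b, X) != y))

# Toy dataset with labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -1.0], [-2.0, 0.5]])
y = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])  # hypothetical weight vector
b = 0.0
print(empirical_error(w, b, X, y))  # 0: this w, b separates the toy set
```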

  15. Support Vector Machines Many candidates for a linear function minimizing LSE; which one should we pick?

  16. Support Vector Machines Question: why is the middle line “good”?

  17. Support Vector Machines One possible approach: find a hyperplane that is “perturbation resistant”

  18. Support Vector Machines • The distance between the hyperplane w · x + b = 0 and some point x is |w · x + b| / ‖w‖ • Given a dataset labeled (x_1, y_1), …, (x_n, y_n), the margin of (w, b) with respect to the dataset is the minimal distance between the hyperplane defined by (w, b) and the datapoints.
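The distance and margin definitions translate directly to code; the hyperplane and points below are made-up values for illustration:

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane w . x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

def margin(w, b, X):
    """Margin of (w, b): minimal distance from the hyperplane to any datapoint."""
    return min(distance_to_hyperplane(w, b, x) for x in X)

w = np.array([0.0, 1.0])  # the hyperplane is the horizontal axis
b = 0.0
X = np.array([[3.0, 2.0], [1.0, -1.0], [5.0, 4.0]])
print(margin(w, b, X))  # 1.0: the closest point is (1, -1)
```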

  19. Support Vector Machines Assume that the data is linearly separable. To find the best hyperplane, solve: max 1 / ‖w‖ Subject to y_i (w · x_i + b) ≥ 1 for all i. “Maximize the margin, but do not misclassify!”

  20. Support Vector Machines max 1 / ‖w‖ Subject to y_i (w · x_i + b) ≥ 1 for all i. Suppose that w*, b* are an optimal solution. Then there is at least one i for which y_i (w* · x_i + b*) = 1 (otherwise we could rescale w*, b* downward and increase 1 / ‖w‖). Thus the target can be replaced with min ½‖w‖² and the optimal solution doesn’t change
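The slides leave the solver unspecified; as a sketch, the resulting quadratic program (minimize ½‖w‖² subject to y_i(w · x_i + b) ≥ 1) can be handed to a generic constrained optimizer such as scipy.optimize.minimize. The tiny separable dataset is made up for illustration:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable dataset, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(params):
    w = params[:2]
    return 0.5 * np.dot(w, w)        # minimize ||w||^2 / 2

def constraint(params):
    w, b = params[:2], params[2]
    return y * (X @ w + b) - 1.0     # y_i (w . x_i + b) - 1 >= 0, per point

res = minimize(objective, x0=np.zeros(3), method="SLSQP",
               constraints=[{"type": "ineq", "fun": constraint}])
w, b = res.x[:2], res.x[2]
print(w, b)  # separating hyperplane; the margin achieved is 1 / ||w||
```

Dedicated QP or SVM solvers would be used in practice; this only illustrates the optimization problem itself.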

  21. Regret Minimization

  22. Regret Minimization A group of N experts; at each round, choosing one results in a loss of either 0 or 1

  23. Regret We want to do well – benchmark against something! • Do at least as well as the best algorithm? • Do at least as well as the best expert?

  24. Best algorithm: pick expert 1 in rounds 1 & 2, and expert 2 in rounds 3 & 4 Best expert: expert 2 did the best in hindsight!

  25. Regret Minimization

  26. Round 1 – Greedy algorithm

  27. The greedy algorithm has total loss at most N · OPT + (N − 1), where OPT is the loss of the best expert in hindsight; its regret can therefore grow linearly with OPT.

  28. Theorem: L_greedy ≤ N · OPT + (N − 1), where N is the number of actions. Proof: let S_t be the set of best actions (those with lowest cumulative loss) at time t. Whenever greedy incurs a loss and the best action does not, S_t must lose at least one action (the one that greedy chose). Since |S_t| ≤ N, this can happen at most N − 1 times before the best action is chosen.
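The slides give no code; a minimal sketch of the greedy algorithm, checking the bound above against the best expert in hindsight on random {0, 1} losses:

```python
import numpy as np

def greedy(losses):
    """Deterministic greedy: each round, follow the expert with the lowest
    cumulative loss so far (ties broken by lowest index), then observe losses."""
    T, N = losses.shape
    cumulative = np.zeros(N)
    total = 0.0
    for t in range(T):
        choice = int(np.argmin(cumulative))  # current leader
        total += losses[t, choice]
        cumulative += losses[t]
    return total, cumulative.min()  # greedy's loss, best expert's loss

rng = np.random.default_rng(0)
losses = rng.integers(0, 2, size=(200, 5)).astype(float)  # losses in {0, 1}
alg_loss, best_loss = greedy(losses)
N = 5
print(alg_loss, best_loss)  # the theorem guarantees alg_loss <= N * best_loss + (N - 1)
```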

  29. Theorem: for any deterministic algorithm A, there is a sequence of losses for which the best expert’s loss is at most T / N, but A’s loss is T. Proof: since A is deterministic, the adversary can simulate it; at each round, it assigns loss 1 to the expert A is about to choose and loss 0 to all others. Then A incurs a loss on every round, for a total of T, while the T units of loss are spread across N experts, so by pigeonhole some expert accumulates at most T / N.
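The proof's adversary can be simulated directly. Here it is run against the deterministic greedy algorithm from the previous slide (the same construction works against any deterministic algorithm, since the adversary can predict its pick):

```python
import numpy as np

N, T = 4, 20
cumulative = np.zeros(N)
alg_loss = 0
for t in range(T):
    choice = int(np.argmin(cumulative))  # greedy's (predictable) pick
    # Adversary, knowing the pick, charges loss 1 to it and 0 to the rest
    alg_loss += 1
    cumulative[choice] += 1

print(alg_loss, cumulative.min())  # 20 vs 5: the algorithm loses every round,
                                   # while the best expert loses only T / N rounds
```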

  37. Round 2 – Randomized Greedy

  38. The randomized greedy algorithm (pick uniformly at random among the current best experts) has expected loss at most (1 + ln N) · OPT + ln N, replacing greedy’s factor of N with a factor of roughly ln N.
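A sketch of randomized greedy. Because the loss sequence is fixed in advance, the leader set evolves deterministically, so the algorithm's exact expected loss can be computed by averaging over the uniform pick each round:

```python
import numpy as np

def randomized_greedy_expected_loss(losses):
    """Expected loss of randomized greedy: each round, pick uniformly at
    random among the experts with the lowest cumulative loss so far."""
    T, N = losses.shape
    cumulative = np.zeros(N)
    expected = 0.0
    for t in range(T):
        leaders = np.flatnonzero(cumulative == cumulative.min())
        expected += losses[t, leaders].mean()  # uniform pick among leaders
        cumulative += losses[t]
    return expected, cumulative.min()

rng = np.random.default_rng(1)
losses = rng.integers(0, 2, size=(200, 5)).astype(float)  # losses in {0, 1}
exp_loss, best = randomized_greedy_expected_loss(losses)
N = 5
print(exp_loss, best)  # bound: exp_loss <= (1 + ln N) * best + ln N
```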

  39. Multiplicative Weights Updates “The multiplicative weights algorithm was such a great idea, that it was discovered three times” – C. Papadimitriou [Seminar Talk]

  40. The MWU algorithm has expected loss at most (1 + ε) · OPT + ln N / ε; tuning ε appropriately gives regret O(√(T ln N)), which is sublinear in T.
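A minimal sketch of multiplicative weights with the (1 − ε)^loss update, computing the algorithm's exact expected (fractional) loss and checking it against the bound above:

```python
import numpy as np

def mwu_expected_loss(losses, eps):
    """Multiplicative weights: keep a weight per expert, play each expert with
    probability proportional to its weight, and multiply an expert's weight
    by (1 - eps) ** loss after each round."""
    T, N = losses.shape
    weights = np.ones(N)
    expected = 0.0
    for t in range(T):
        probs = weights / weights.sum()
        expected += probs @ losses[t]        # expected loss this round
        weights *= (1.0 - eps) ** losses[t]  # punish lossy experts
    return expected

rng = np.random.default_rng(2)
losses = rng.integers(0, 2, size=(400, 6)).astype(float)  # losses in {0, 1}
N, eps = 6, 0.1
best = losses.sum(axis=0).min()
exp_loss = mwu_expected_loss(losses, eps)
print(exp_loss, best)  # bound: exp_loss <= (1 + eps) * best + ln(N) / eps
```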

  41. Discussion Points • This idea can easily be translated to gains instead of losses • Losses don’t have to be in [0, 1]; they can be any real numbers in a range [−ρ, ρ], but if ρ is very large, it affects the learning rate • A (beautiful) first step into the world of expert learning!
