Learning From Examples AIMA Chapter 18
Outline • Motivation for Learning • Supervised Learning • Expert Learning
Learning – What and Why? • An agent is said to be learning if it improves its performance on a task based on experience/observations/data • The task must be fixed, the performance must be measurable, and the experience/data must exist • e.g., image recognition, game playing, driving in urban environments… Reasons for learning: • Unknown, dynamically changing task environments • Hard to encode all knowledge (e.g., face recognition) • Often easier than programming the behavior explicitly
ML is Everywhere! • Computational Biology • Natural Language Processing • Targeted Advertising • Face Recognition
Supervised Learning Key idea: Learn an unknown function from examples
Supervised Learning – a Probabilistic Take Key idea: Learn an unknown function from examples
Probably Approximately Correct Learning Key idea: Learn an unknown function from examples
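As a reference point (my notation, not necessarily the slide's): a hypothesis h is probably approximately correct if, with probability at least 1 − δ over the training sample, its error with respect to the data distribution D is at most ε:

```latex
\Pr\big[\, \mathrm{err}_{\mathcal{D}}(h) \le \varepsilon \,\big] \;\ge\; 1 - \delta,
\qquad \text{where } \mathrm{err}_{\mathcal{D}}(h) = \Pr_{x \sim \mathcal{D}}\big[\, h(x) \ne f(x) \,\big]
```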
Learning a Classifier • Our goal is to find a function that fits the data: what is the likeliest function to have generated our dataset?
[Figure: three candidate hypotheses I, II, III fitting the same data] Simple explanations ("Occam's Razor") vs. low error rate
Choosing Simple Hypotheses • Generalize better in the presence of noisy data • Faster to search through a simple hypothesis space • Easier and faster to use a simple hypothesis
Linear Classifiers Assumption: the data was generated by some linear function; a function of the form h(x) = sign(w · x + b) is called a linear classifier.
Linear Classifiers Question: given a dataset, how would we determine which linear classifier is “good”?
Least Squared Error Let h(x) = sign(w · x + b). Empirical error: given a data point x labelled y, our classifier is wrong if h(x) ≠ y. Objective: "find w, b that minimize the total number of mistakes on the dataset". Other loss functions are possible!
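A minimal sketch of a linear classifier and its empirical (0-1) error; the toy dataset and the names predict and empirical_error are illustrative, not from the slides.

```python
import numpy as np

# Made-up toy dataset: points x_i with labels y_i in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

def predict(w, b, X):
    """Linear classifier h(x) = sign(w . x + b)."""
    return np.sign(X @ w + b)

def empirical_error(w, b, X, y):
    """Fraction of points on which the classifier is wrong (0-1 loss)."""
    return np.mean(predict(w, b, X) != y)

w, b = np.array([1.0, 1.0]), 0.0    # a candidate linear classifier
print(empirical_error(w, b, X, y))  # 0.0 on this toy dataset
```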
Support Vector Machines Many candidates for a linear function minimizing LSE; which one should we pick?
Support Vector Machines Question: why is the middle line “good”?
Support Vector Machines One possible approach: find a hyperplane that is “perturbation resistant”
Support Vector Machines • The distance between the hyperplane w · x + b = 0 and a point x is |w · x + b| / ‖w‖ • Given a dataset x_1, …, x_n labelled y_1, …, y_n, the margin of (w, b) with respect to the dataset is the minimal distance between the hyperplane defined by (w, b) and the datapoints.
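A small numeric illustration of these two definitions, with a made-up dataset and candidate hyperplane.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance between the point x and the hyperplane w . x + b = 0."""
    return abs(w @ x + b) / np.linalg.norm(w)

def margin(w, b, X):
    """Margin of (w, b): the minimal distance from the hyperplane to any datapoint."""
    return min(distance_to_hyperplane(w, b, x) for x in X)

# Made-up data and a candidate hyperplane.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
w, b = np.array([1.0, 1.0]), 0.0
print(margin(w, b, X))   # 3 / sqrt(2) ≈ 2.12
```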
Support Vector Machines Assume that y_i ∈ {−1, +1} and that the data is linearly separable. To find the best hyperplane, solve: maximize the margin min_i y_i (w · x_i + b) / ‖w‖ over (w, b), subject to y_i (w · x_i + b) ≥ 1 for all i. "Maximize the margin, but do not misclassify!"
Support Vector Machines maximize min_i y_i (w · x_i + b) / ‖w‖ subject to y_i (w · x_i + b) ≥ 1 for all i. Suppose that (w*, b*) is an optimal solution. Then there is at least one i for which y_i (w* · x_i + b*) = 1: the margin is invariant to rescaling (w, b), so an optimal solution can always be rescaled until the smallest constraint holds with equality. Thus the target can be replaced with 1/‖w‖ (equivalently, minimize ‖w‖²) and the optimal solution doesn't change.
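A hedged sketch of computing a maximum-margin separator with scikit-learn (my choice of tool, not the slides'); a very large penalty parameter C approximates the hard-margin problem, and the toy data is made up.

```python
import numpy as np
from sklearn.svm import SVC

# Made-up, linearly separable data with labels in {-1, +1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

# A very large C approximates the hard-margin SVM ("do not misclassify").
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
margin = 1.0 / np.linalg.norm(w)   # under the canonical scaling, margin ≈ 1 / ||w||
print(w, b, margin)
```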
Regret Minimization A group of N experts; at each round we choose one expert, which results in a loss of either 0 or 1.
Regret We want to do well – benchmark against something! • Do at least as well as the best algorithm? • Do at least as well as the best expert?
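For concreteness, the standard external-regret definition (my notation): if ℓ_i^t is the loss of expert i at round t and the algorithm picks expert a_t, then

```latex
\text{Regret}(T) \;=\; \sum_{t=1}^{T} \ell^{\,t}_{a_t} \;-\; \min_{i} \sum_{t=1}^{T} \ell^{\,t}_{i}
```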
Example (loss table over 4 rounds, not shown): Best algorithm: pick expert 1 in rounds 1 & 2, and expert 2 in rounds 3 & 4. Best expert: expert 2 did the best in hindsight!
Question: what regret does the greedy algorithm incur on the example? [loss table omitted]
Theorem: the greedy algorithm's loss satisfies L_greedy(T) ≤ N · L_best(T) + (N − 1), where N is the number of experts. Proof: let S_t be the set of best actions (those with minimal cumulative loss) at time t. Whenever greedy incurs a loss and the best action does not, S_t must lose at least one action (the one that greedy chose). Since |S_t| ≤ N, this can happen at most N − 1 times before the best action is chosen.
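For concreteness, a minimal sketch of the greedy (follow-the-leader) strategy on a made-up 0/1 loss matrix; the function names and data are illustrative, not from the slides.

```python
import numpy as np

# Made-up loss matrix: losses[t][i] is the 0/1 loss of expert i at round t.
losses = np.array([
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
])

def greedy(losses):
    """Follow the leader: at each round pick the expert with the lowest
    cumulative loss so far (ties broken by lowest index)."""
    T, N = losses.shape
    cumulative = np.zeros(N)
    total = 0
    for t in range(T):
        choice = int(np.argmin(cumulative))   # current "best" expert
        total += losses[t, choice]            # incur its loss this round
        cumulative += losses[t]               # then observe all losses
    return total

best_expert_loss = losses.sum(axis=0).min()
print(greedy(losses), best_expert_loss)       # regret = difference
```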
Theorem: for any deterministic algorithm A, there is a sequence of losses for which the best expert incurs a loss of at most T/N, but A incurs a loss of T. Proof: since A is deterministic, an adversary can predict which expert A will choose at each round; it assigns that expert a loss of 1 and every other expert a loss of 0. Then A incurs a loss in every round, for a total of T. The total loss handed out to all N experts is also T, so by the pigeonhole principle some expert incurs a loss of at most T/N.
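A small sketch of the adversary from this proof, assuming the algorithm is a function that takes the revealed loss history and returns an expert index; the greedy_choice victim and the loss values are illustrative.

```python
import numpy as np

def adversarial_losses(algorithm, N, T):
    """Build the loss sequence from the lower-bound proof: at each round,
    give loss 1 to whichever expert the (deterministic) algorithm picks."""
    history = []                           # losses revealed so far
    losses = []
    for t in range(T):
        choice = algorithm(history)        # deterministic => predictable
        round_loss = [0] * N
        round_loss[choice] = 1             # punish the chosen expert
        losses.append(round_loss)
        history.append(round_loss)
    return np.array(losses)

# Example victim: the greedy (follow-the-leader) algorithm.
def greedy_choice(history, N=3):
    cumulative = np.sum(history, axis=0) if history else np.zeros(N)
    return int(np.argmin(cumulative))

L = adversarial_losses(greedy_choice, N=3, T=9)
print(L.sum())               # the algorithm's loss is T = 9: it is punished every round
print(L.sum(axis=0).min())   # the best expert's loss is at most T/N = 3
```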
Question: what regret does the randomized greedy algorithm incur on the example? [loss table omitted]
Multiplicative Weights Updates "The multiplicative weights algorithm was such a great idea that it was discovered three times" – C. Papadimitriou [Seminar Talk]
Question: what regret does the MWU algorithm incur on the example? [loss table omitted]
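A minimal sketch of the multiplicative weights update, assuming 0/1 losses and the common rule w_i ← w_i · (1 − ε)^{ℓ_i}; with ε ≈ √(ln N / T), the expected regret is O(√(T ln N)). The loss matrix is randomly generated for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up loss matrix: losses[t][i] is the 0/1 loss of expert i at round t.
losses = rng.integers(0, 2, size=(100, 5))
T, N = losses.shape

eps = np.sqrt(np.log(N) / T)      # learning rate
weights = np.ones(N)
total_loss = 0.0

for t in range(T):
    probs = weights / weights.sum()       # pick an expert with prob. proportional to weight
    choice = rng.choice(N, p=probs)
    total_loss += losses[t, choice]
    weights *= (1 - eps) ** losses[t]     # shrink the weights of experts that lost

best = losses.sum(axis=0).min()
print(total_loss - best)                  # regret, expected to be O(sqrt(T log N))
```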
Discussion Points • This idea can easily be translated to gains instead of losses • Losses don't have to be in [0, 1]; they can be any real numbers in a range [−ρ, ρ], but if ρ is very large it affects the learning rate (see the rescaling note below) • A (beautiful) first step into the world of expert learning!
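A brief worked rescaling, under the assumption that losses lie in a range [−ρ, ρ]: dividing by ρ maps them into [−1, 1], and the regret bound scales by ρ.

```latex
\tilde{\ell}^{\,t}_i = \frac{\ell^{\,t}_i}{\rho} \in [-1, 1]
\quad\Longrightarrow\quad
\text{Regret}(T) \;\le\; \rho \cdot O\!\left(\sqrt{T \ln N}\right)
```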