This guide explores the fundamentals of machine learning, focusing on supervised learning methods such as linear classifiers and support vector machines, and on learning from expert advice via regret minimization, with pointers to real-world applications.
Learning From Examples AIMA Chapter 18
Outline • Motivation for Learning • Supervised Learning • Learning from Experts (Regret Minimization)
Learning – What and Why? • An agent is said to be learning if it improves its performance on a task based on experience/observations/data • the task must be fixed, the performance must be measurable, and the experience must exist • e.g., image recognition, game playing, driving in urban environments… Reasons for learning: • Unknown, dynamically changing task environments • Hard to encode all knowledge by hand (e.g., face recognition) • Often easier to learn than to program explicitly
ML is Everywhere! • Computational Biology • Natural Language Processing • Targeted Advertising • Face Recognition
Supervised Learning Key idea: learn an unknown function from examples • Given a training set of N example pairs (x1, y1), …, (xN, yN), where each label was generated by an unknown function f, i.e., yi = f(xi) • Goal: discover a hypothesis h that approximates the true function f
Supervised Learning – a Probabilistic Take Key idea: learn an unknown function from examples • Assume the examples are drawn independently from some distribution D over inputs • Goal: find a hypothesis h with small error err(h) = Pr_{x ~ D}[h(x) ≠ f(x)]
Probably Approximately Correct Learning Key idea: learn an unknown function from examples • A hypothesis h is approximately correct if err(h) ≤ ε • A learner is probably approximately correct (PAC) if, with probability at least 1 − δ over the random training sample, it outputs an approximately correct hypothesis
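To make the PAC guarantee concrete, here is a minimal sketch (Python is not part of the original slides; the function name is illustrative) of the standard sample-complexity bound for a finite hypothesis class with a consistent learner: m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    # Standard bound for a finite hypothesis class H and a consistent
    # learner: m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice
    # for the output hypothesis to have error <= epsilon with
    # probability >= 1 - delta.
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# e.g., 10,000 hypotheses, 5% error, 99% confidence:
print(pac_sample_bound(10_000, epsilon=0.05, delta=0.01))  # -> 277
```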
Learning a Classifier • Our goal is to find a function that fits the data: what is the likeliest function to have generated our dataset?
[Figure: three candidate hypotheses (I, II, III) fitting the same data] Simple Explanations – “Occam’s Razor” vs. Low error rate
Choosing Simple Hypotheses • Simple hypotheses generalize better in the presence of noisy data • A simple hypothesis space is faster to search through • A simple hypothesis is easier and faster to use
Linear Classifiers Assumption: the data was generated by some linear function. A classifier of the form h(x) = sign(w · x + b) is called a linear classifier.
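As an illustration, a minimal sketch of the decision rule above (numpy assumed; the names are illustrative, not from the slides):

```python
import numpy as np

def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    # Label x by which side of the hyperplane w . x + b = 0 it falls on.
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([1.0, -1.0]), 0.5
print(predict(w, b, np.array([2.0, 0.0])))  # +1
print(predict(w, b, np.array([0.0, 2.0])))  # -1
```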
Linear Classifiers Question: given a dataset, how would we determine which linear classifier is “good”?
Least Squared Error Let h(x) = sign(w · x + b). Empirical error: given a data point x labelled y, our classifier is wrong if h(x) ≠ y. Objective: “find w, b that minimize the total number of mistakes on the dataset”; in least-squares form, minimize Σi (w · xi + b − yi)². Other loss functions are possible!
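A sketch contrasting the 0–1 empirical error with the squared-loss surrogate, for a data matrix X and labels y in {−1, +1} (illustrative helper names, not from the slides):

```python
import numpy as np

def zero_one_error(w, b, X, y):
    # Number of mistakes: points where sign(w . x + b) != label.
    return int(np.sum(np.sign(X @ w + b) != y))

def squared_error(w, b, X, y):
    # Least-squares surrogate: sum of (w . x + b - y)^2 over the dataset.
    return float(np.sum((X @ w + b - y) ** 2))
```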
Support Vector Machines Many candidates for a linear function minimizing LSE; which one should we pick?
Support Vector Machines Question: why is the middle line “good”?
Support Vector Machines One possible approach: find a hyperplane that is “perturbation resistant”
Support Vector Machines • The distance between the hyperplane w · x + b = 0 and some point x is |w · x + b| / ‖w‖ • Given a dataset of points xi labeled yi, the margin of (w, b) with respect to the dataset is the minimal distance between the hyperplane defined by (w, b) and the datapoints.
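A small sketch of the margin computation under the convention just stated (numpy assumed; function name is illustrative):

```python
import numpy as np

def margin(w, b, X):
    # Minimal distance |w . x_i + b| / ||w|| from the hyperplane
    # w . x + b = 0 to the datapoints in X.
    return float(np.min(np.abs(X @ w + b)) / np.linalg.norm(w))
```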
Support Vector Machines Assume that the data is linearly separable. To find the best hyperplane, solve: maximize 1/‖w‖ subject to yi(w · xi + b) ≥ 1 for all i. “Maximize the margin, but do not misclassify!”
Support Vector Machines maximize 1/‖w‖ subject to yi(w · xi + b) ≥ 1 for all i. Suppose that w*, b* are an optimal solution. Then there is at least one i for which yi(w* · xi + b*) = 1 (otherwise w*, b* could be rescaled to increase the margin). Thus the target can be replaced with minimizing ½‖w‖², and the optimal solution doesn’t change.
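The sketch below approximates the hard-margin program with scikit-learn’s soft-margin SVC by setting a very large penalty C; this is an illustrative shortcut (scikit-learn assumed available), not the slides’ own derivation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable dataset, labels in {-1, +1}.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C makes the soft-margin SVM behave like the
# hard-margin program (essentially no misclassification tolerated).
clf = SVC(kernel="linear", C=1e10)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin =", 1.0 / np.linalg.norm(w))  # margin of the learned hyperplane
```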
Regret Minimization A group of N experts gives advice over T rounds; in each round, choosing an expert results in a loss of either 0 or 1.
Regret We want to do well – benchmark against something! • Do at least as well as the best algorithm? • Do at least as well as the best expert?
[Example loss table omitted in this export] Best algorithm: pick expert 1 in rounds 1 & 2, and expert 2 in rounds 3 & 4. Best expert: expert 2 did the best in hindsight!
The greedy algorithm has regret of…? [worked loss-table example omitted in this export]
Theorem: L_greedy ≤ N · L_best + (N − 1). Proof: let B_t be the set of best actions (those with minimal cumulative loss) at time t. Whenever greedy incurs a loss and the best action does not, B_t must lose at least one action (the one that greedy chose). Since |B_t| ≤ N, this can happen at most N − 1 times before the best action is chosen.
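A minimal sketch of the greedy (follow-the-leader) strategy over a T × N matrix of expert losses (illustrative names, numpy assumed):

```python
import numpy as np

def greedy(losses: np.ndarray) -> float:
    # Follow-the-leader: each round, pick the expert with the smallest
    # cumulative loss so far (ties broken by lowest index).
    # losses[t, i] is expert i's loss (0 or 1) in round t.
    n_rounds, n_experts = losses.shape
    cumulative = np.zeros(n_experts)
    total = 0.0
    for t in range(n_rounds):
        choice = int(np.argmin(cumulative))  # current best expert
        total += losses[t, choice]
        cumulative += losses[t]
    return total
```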
Theorem: for any deterministic algorithm A, there is a sequence of losses for which L_A = T but L_best ≤ T / N. Proof: since A is deterministic, an adversary can simulate it and, in each round, assign a loss of 1 to the expert A is about to choose and 0 to all others. A then incurs a loss in every round, for a total of T. These T units of loss are split among N experts, so by the pigeonhole principle the best expert loses at most T / N.
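The adversary argument can be simulated directly; the sketch below (a hypothetical harness, not from the slides) feeds any deterministic choice rule a loss of 1 on whatever expert it is about to pick.

```python
import numpy as np

def adversary_vs_deterministic(alg_choice, n_experts: int, n_rounds: int):
    # The adversary simulates the deterministic algorithm, then assigns
    # loss 1 to the expert it is about to pick and 0 to the rest.
    # The algorithm loses every round (total T), while by pigeonhole
    # the best expert loses at most T / N.
    history = []  # past loss vectors, visible to the algorithm
    cumulative = np.zeros(n_experts)
    alg_loss = 0
    for t in range(n_rounds):
        choice = alg_choice(history)  # deterministic => predictable
        loss = np.zeros(n_experts)
        loss[choice] = 1.0
        alg_loss += 1
        cumulative += loss
        history.append(loss)
    return alg_loss, cumulative.min()

# Follow-the-leader as the deterministic algorithm under attack:
ftl = lambda hist: int(np.argmin(np.sum(hist, axis=0))) if hist else 0
print(adversary_vs_deterministic(ftl, n_experts=5, n_rounds=100))  # (100, <= 20)
```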
The randomized greedy algorithm has regret of…? [worked loss-table example omitted in this export]
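A sketch of randomized greedy, which breaks ties uniformly at random among the currently best experts (this definition is assumed to match the slides’ greedy setup; names are illustrative):

```python
import numpy as np

def randomized_greedy(losses: np.ndarray, seed: int = 0) -> float:
    # Like greedy, but picks uniformly at random among all experts
    # currently tied for the smallest cumulative loss, so an adversary
    # cannot predict the exact choice.
    rng = np.random.default_rng(seed)
    n_rounds, n_experts = losses.shape
    cumulative = np.zeros(n_experts)
    total = 0.0
    for t in range(n_rounds):
        best = np.flatnonzero(cumulative == cumulative.min())
        choice = rng.choice(best)
        total += losses[t, choice]
        cumulative += losses[t]
    return total
```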
Multiplicative Weights Updates “The multiplicative weights algorithm was such a great idea that it was discovered three times” – C. Papadimitriou [Seminar Talk]
The MWU algorithm has regret of…? [worked loss-table example omitted in this export]
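A minimal MWU sketch with the standard (1 − η)^loss update; the guarantee quoted in the comment, E[L_MWU] ≤ (1 + η) · L_best + ln(N)/η, is the textbook bound, giving regret O(√(T ln N)) for a tuned η ≈ √(ln(N)/T). Names are illustrative.

```python
import numpy as np

def multiplicative_weights(losses: np.ndarray, eta: float, seed: int = 0) -> float:
    # Keep a weight per expert; pick an expert with probability
    # proportional to its weight; then shrink every expert i's weight
    # by (1 - eta) ** loss_i.  For losses in [0, 1] and eta <= 1/2,
    # expected total loss <= (1 + eta) * L_best + ln(N) / eta.
    rng = np.random.default_rng(seed)
    n_rounds, n_experts = losses.shape
    weights = np.ones(n_experts)
    total = 0.0
    for t in range(n_rounds):
        probs = weights / weights.sum()
        choice = rng.choice(n_experts, p=probs)
        total += losses[t, choice]
        weights *= (1.0 - eta) ** losses[t]
    return total
```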
Discussion Points • This idea translates easily to gains instead of losses • Losses don’t have to be in [0, 1]; they can be any real numbers in a range [−ρ, ρ], but if ρ is very large, it affects the learning rate • A (beautiful) first step into the world of expert learning! (See the normalization sketch below.)
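One way to see the ρ dependence: rescale losses from [−ρ, ρ] into [0, 1] before running MWU (a standard normalization trick, sketched here under that assumption); the width 2ρ then shows up as an effective shrinking of the learning rate.

```python
import numpy as np

def rescale(losses: np.ndarray, rho: float) -> np.ndarray:
    # Map losses from [-rho, rho] into [0, 1] so the MWU analysis applies;
    # the factor 2 * rho effectively scales down the learning rate.
    return (losses + rho) / (2.0 * rho)
```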