This guide explores the fundamentals of machine learning, focusing on supervised learning methods such as linear classifiers and support vector machines, and on learning from expert advice via regret minimization, with pointers to real-world applications.
Learning From Examples AIMA Chapter 18
Outline • Motivation for Learning • Supervised Learning • Learning from Experts (Regret Minimization)
Learning – What and Why? • An agent is said to be learning if it improves its performance on a task based on experience/observations/data • the task must be fixed, the performance must be measurable, and the experience must exist • e.g., image recognition, game playing, driving in urban environments… Reasons for learning: • Unknown, dynamically changing task environments • Hard to encode all knowledge by hand (e.g., face recognition) • Often easier to learn than to program explicitly
ML is Everywhere! • Computational Biology • Natural Language Processing • Targeted Advertising • Face Recognition
Supervised Learning Key idea: learn an unknown function from examples • Given a training set of N example pairs (x1, y1), …, (xN, yN), where each label was generated by an unknown function f, i.e., yi = f(xi) • Goal: discover a hypothesis h that approximates the true function f
Supervised Learning – a Probabilistic Take Key idea: learn an unknown function from examples • Assume the examples are drawn independently from some distribution D over inputs • Goal: find a hypothesis h with small error err(h) = Pr_{x ~ D}[h(x) ≠ f(x)]
Probably Approximately Correct Learning Key idea: learn an unknown function from examples • A hypothesis h is approximately correct if err(h) ≤ ε • A learner is probably approximately correct (PAC) if, with probability at least 1 − δ over the random training sample, it outputs an approximately correct hypothesis
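To make the PAC guarantee concrete, here is a minimal sketch (Python is not part of the original slides; the function name is illustrative) of the standard sample-complexity bound for a finite hypothesis class with a consistent learner: m ≥ (1/ε)(ln|H| + ln(1/δ)) examples suffice.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    # Standard bound for a finite hypothesis class H and a consistent
    # learner: m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice
    # for the output hypothesis to have error <= epsilon with
    # probability >= 1 - delta.
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# e.g., 10,000 hypotheses, 5% error, 99% confidence:
print(pac_sample_bound(10_000, epsilon=0.05, delta=0.01))  # -> 277
```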
Learning a Classifier • Our goal is to find a function that fits the data: what is the likeliest function to have generated our dataset?
[Figure: three candidate hypotheses (I, II, III) fitting the same data] Simple Explanations – “Occam’s Razor” vs. Low error rate
Choosing Simple Hypotheses • Simple hypotheses generalize better in the presence of noisy data • A simple hypothesis space is faster to search through • A simple hypothesis is easier and faster to use
Linear Classifiers Assumption: the data was generated by some linear function. A classifier of the form h(x) = sign(w · x + b) is called a linear classifier.
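As an illustration, a minimal sketch of the decision rule above (numpy assumed; the names are illustrative, not from the slides):

```python
import numpy as np

def predict(w: np.ndarray, b: float, x: np.ndarray) -> int:
    # Label x by which side of the hyperplane w . x + b = 0 it falls on.
    return 1 if np.dot(w, x) + b >= 0 else -1

w, b = np.array([1.0, -1.0]), 0.5
print(predict(w, b, np.array([2.0, 0.0])))  # +1
print(predict(w, b, np.array([0.0, 2.0])))  # -1
```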
Linear Classifiers Question: given a dataset, how would we determine which linear classifier is “good”?
Least Squared Error Let h(x) = sign(w · x + b). Empirical error: given a data point x labelled y, our classifier is wrong if h(x) ≠ y. Objective: “find w, b that minimize the total number of mistakes on the dataset”; in least-squares form, minimize Σi (w · xi + b − yi)². Other loss functions are possible!
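A sketch contrasting the 0–1 empirical error with the squared-loss surrogate, for a data matrix X and labels y in {−1, +1} (illustrative helper names, not from the slides):

```python
import numpy as np

def zero_one_error(w, b, X, y):
    # Number of mistakes: points where sign(w . x + b) != label.
    return int(np.sum(np.sign(X @ w + b) != y))

def squared_error(w, b, X, y):
    # Least-squares surrogate: sum of (w . x + b - y)^2 over the dataset.
    return float(np.sum((X @ w + b - y) ** 2))
```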
Support Vector Machines Many candidates for a linear function minimizing LSE; which one should we pick?
Support Vector Machines Question: why is the middle line “good”?
Support Vector Machines One possible approach: find a hyperplane that is “perturbation resistant”
Support Vector Machines • The distance between the hyperplane w · x + b = 0 and some point x is |w · x + b| / ‖w‖ • Given a dataset of points xi labeled yi, the margin of (w, b) with respect to the dataset is the minimal distance between the hyperplane defined by (w, b) and the datapoints.
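A small sketch of the margin computation under the convention just stated (numpy assumed; function name is illustrative):

```python
import numpy as np

def margin(w, b, X):
    # Minimal distance |w . x_i + b| / ||w|| from the hyperplane
    # w . x + b = 0 to the datapoints in X.
    return float(np.min(np.abs(X @ w + b)) / np.linalg.norm(w))
```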
Support Vector Machines Assume that the data is linearly separable. To find the best hyperplane, solve: maximize 1/‖w‖ subject to yi(w · xi + b) ≥ 1 for all i. “Maximize the margin, but do not misclassify!”
Support Vector Machines maximize 1/‖w‖ subject to yi(w · xi + b) ≥ 1 for all i. Suppose that w*, b* are an optimal solution. Then there is at least one i for which yi(w* · xi + b*) = 1 (otherwise w*, b* could be rescaled to increase the margin). Thus the target can be replaced with minimizing ½‖w‖², and the optimal solution doesn’t change.
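The sketch below approximates the hard-margin program with scikit-learn’s soft-margin SVC by setting a very large penalty C; this is an illustrative shortcut (scikit-learn assumed available), not the slides’ own derivation.

```python
import numpy as np
from sklearn.svm import SVC

# Tiny linearly separable dataset, labels in {-1, +1}.
X = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 3.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C makes the soft-margin SVM behave like the
# hard-margin program (essentially no misclassification tolerated).
clf = SVC(kernel="linear", C=1e10)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("margin =", 1.0 / np.linalg.norm(w))  # margin of the learned hyperplane
```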
Regret Minimization A group of N experts gives advice over T rounds; in each round, choosing an expert results in a loss of either 0 or 1.
Regret We want to do well – benchmark against something! • Do at least as well as the best algorithm? • Do at least as well as the best expert?
[Example loss table omitted in this export] Best algorithm: pick expert 1 in rounds 1 & 2, and expert 2 in rounds 3 & 4. Best expert: expert 2 did the best in hindsight!
The greedy algorithm has regret of…? [worked loss-table example omitted in this export]
Theorem: L_greedy ≤ N · L_best + (N − 1). Proof: let B_t be the set of best actions (those with minimal cumulative loss) at time t. Whenever greedy incurs a loss and the best action does not, B_t must lose at least one action (the one that greedy chose). Since |B_t| ≤ N, this can happen at most N − 1 times before the best action is chosen.
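A minimal sketch of the greedy (follow-the-leader) strategy over a T × N matrix of expert losses (illustrative names, numpy assumed):

```python
import numpy as np

def greedy(losses: np.ndarray) -> float:
    # Follow-the-leader: each round, pick the expert with the smallest
    # cumulative loss so far (ties broken by lowest index).
    # losses[t, i] is expert i's loss (0 or 1) in round t.
    n_rounds, n_experts = losses.shape
    cumulative = np.zeros(n_experts)
    total = 0.0
    for t in range(n_rounds):
        choice = int(np.argmin(cumulative))  # current best expert
        total += losses[t, choice]
        cumulative += losses[t]
    return total
```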
Theorem: for any deterministic algorithm A, there is a sequence of losses for which L_A = T but L_best ≤ T / N. Proof: since A is deterministic, an adversary can simulate it and, in each round, assign a loss of 1 to the expert A is about to choose and 0 to all others. A then incurs a loss in every round, for a total of T. These T units of loss are split among N experts, so by the pigeonhole principle the best expert loses at most T / N.
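The adversary argument can be simulated directly; the sketch below (a hypothetical harness, not from the slides) feeds any deterministic choice rule a loss of 1 on whatever expert it is about to pick.

```python
import numpy as np

def adversary_vs_deterministic(alg_choice, n_experts: int, n_rounds: int):
    # The adversary simulates the deterministic algorithm, then assigns
    # loss 1 to the expert it is about to pick and 0 to the rest.
    # The algorithm loses every round (total T), while by pigeonhole
    # the best expert loses at most T / N.
    history = []  # past loss vectors, visible to the algorithm
    cumulative = np.zeros(n_experts)
    alg_loss = 0
    for t in range(n_rounds):
        choice = alg_choice(history)  # deterministic => predictable
        loss = np.zeros(n_experts)
        loss[choice] = 1.0
        alg_loss += 1
        cumulative += loss
        history.append(loss)
    return alg_loss, cumulative.min()

# Follow-the-leader as the deterministic algorithm under attack:
ftl = lambda hist: int(np.argmin(np.sum(hist, axis=0))) if hist else 0
print(adversary_vs_deterministic(ftl, n_experts=5, n_rounds=100))  # (100, <= 20)
```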
The randomized greedy algorithm has regret of…? [worked loss-table example omitted in this export]
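A sketch of randomized greedy, which breaks ties uniformly at random among the currently best experts (this definition is assumed to match the slides’ greedy setup; names are illustrative):

```python
import numpy as np

def randomized_greedy(losses: np.ndarray, seed: int = 0) -> float:
    # Like greedy, but picks uniformly at random among all experts
    # currently tied for the smallest cumulative loss, so an adversary
    # cannot predict the exact choice.
    rng = np.random.default_rng(seed)
    n_rounds, n_experts = losses.shape
    cumulative = np.zeros(n_experts)
    total = 0.0
    for t in range(n_rounds):
        best = np.flatnonzero(cumulative == cumulative.min())
        choice = rng.choice(best)
        total += losses[t, choice]
        cumulative += losses[t]
    return total
```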
Multiplicative Weights Updates “The multiplicative weights algorithm was such a great idea that it was discovered three times” – C. Papadimitriou [Seminar Talk]
The MWU algorithm has regret of…? [worked loss-table example omitted in this export]
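A minimal MWU sketch with the standard (1 − η)^loss update; the guarantee quoted in the comment, E[L_MWU] ≤ (1 + η) · L_best + ln(N)/η, is the textbook bound, giving regret O(√(T ln N)) for a tuned η ≈ √(ln(N)/T). Names are illustrative.

```python
import numpy as np

def multiplicative_weights(losses: np.ndarray, eta: float, seed: int = 0) -> float:
    # Keep a weight per expert; pick an expert with probability
    # proportional to its weight; then shrink every expert i's weight
    # by (1 - eta) ** loss_i.  For losses in [0, 1] and eta <= 1/2,
    # expected total loss <= (1 + eta) * L_best + ln(N) / eta.
    rng = np.random.default_rng(seed)
    n_rounds, n_experts = losses.shape
    weights = np.ones(n_experts)
    total = 0.0
    for t in range(n_rounds):
        probs = weights / weights.sum()
        choice = rng.choice(n_experts, p=probs)
        total += losses[t, choice]
        weights *= (1.0 - eta) ** losses[t]
    return total
```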
Discussion Points • This idea translates easily to gains instead of losses • Losses don’t have to be in [0, 1]; they can be any real numbers in a range [−ρ, ρ], but if ρ is very large, it affects the learning rate • A (beautiful) first step into the world of expert learning! (See the normalization sketch below.)
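One way to see the ρ dependence: rescale losses from [−ρ, ρ] into [0, 1] before running MWU (a standard normalization trick, sketched here under that assumption); the width 2ρ then shows up as an effective shrinking of the learning rate.

```python
import numpy as np

def rescale(losses: np.ndarray, rho: float) -> np.ndarray:
    # Map losses from [-rho, rho] into [0, 1] so the MWU analysis applies;
    # the factor 2 * rho effectively scales down the learning rate.
    return (losses + rho) / (2.0 * rho)
```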