Machine Learning in Practice Lecture 3

Machine Learning in Practice Lecture 3 Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute

Plan for Today • Announcements • Assignment 2 • Quiz 1 • Weka helpful hints • Topic of the day: Input and Output • More on cross-validation • ARFF format

Weka Helpful Hints

Increase Heap Size

Weka Helpful Hint: Documentation! Click the More button.

Output Predictions Option

Important note: because Weka randomizes the data for cross-validation, you can match the instance numbers in the output to positions in your data only if you use separate train and test sets, so that the order is preserved.
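A minimal Python sketch of why this happens, assuming a simple shuffle-then-split scheme. It is not Weka's actual randomization code, and `cv_folds` is a hypothetical helper. Once the rows are shuffled, an instance's position inside a fold says nothing about its row in the file unless you keep the permutation.

```python
import random


def cv_folds(n_instances, k, seed=1):
    """Shuffle row positions, then deal them round-robin into k folds.

    Each fold lists ORIGINAL row positions, so the j-th test instance of a
    fold can be traced back to the row it came from in the data file.
    """
    order = list(range(n_instances))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


for fold_no, test_rows in enumerate(cv_folds(10, k=5), start=1):
    # Index j within test_rows is the instance's position inside this fold;
    # test_rows[j] is its position in the original data.
    print(f"fold {fold_no}: test instances are original rows {test_rows}")
```

With a separate test file there is no shuffle, so the j-th prediction is simply the j-th row.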

View Classifier Errors

Input and Output

Representations • Concept: the rule you want to learn • Instance: one data point from your training or testing data (row in table) • Attribute: one of the features that an instance is composed of (column in table)
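ARFF (on today's plan) stores exactly this view: attribute (column) declarations, then one comma-separated line per instance (row). A minimal sketch that renders a small table as ARFF text; `to_arff` is a hypothetical helper, the rows are made-up values for the math-time example, and it does not quote values containing spaces or commas.

```python
def to_arff(relation, attributes, rows):
    """Render a table as ARFF text.

    attributes: list of (name, kind) pairs; kind is "numeric" or a list of
    allowed nominal values. rows: one list of values per instance.
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: the allowed values go inside braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Hypothetical instances: two rows, three attributes (columns).
attributes = [("age", "numeric"), ("gender", ["female", "male"]), ("math_minutes", "numeric")]
rows = [[17, "female", 45], [34, "male", 10]]
print(to_arff("math_time", attributes, rows))
```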

Numeric versus Nominal Attributes • What kind of reasoning does your representation enable? • Numeric attributes allow instances to be ordered • Numeric attributes allow you to measure distance between instances • Sometimes numeric attributes make too fine-grained a distinction, e.g., .2 .25 .28 .31 .35 .45 .47 .52 .6 .63
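The two kinds of reasoning that numeric attributes enable, ordering and distance, can be sketched in a few lines of Python. Euclidean distance is one common choice of distance, assumed here for illustration.

```python
import math


def euclidean(a, b):
    """Distance between two instances described by numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


values = [.2, .25, .28, .31, .35, .45, .47, .52, .6, .63]
# Ordering: numeric values can be sorted, e.g. to find the three largest.
print(sorted(values, reverse=True)[:3])
# Distance: .2 is closer to .25 than to .63.
print(euclidean([.2], [.25]) < euclidean([.2], [.63]))
```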

Numeric versus Nominal Attributes • Numeric attributes can be discretized into nominal values • Then you lose ordering and distance • Another option is applying a function that maps each range of values onto a single numeric value, e.g., .2 .25 .28 .31 .35 .45 .47 .52 .6 .63 collapsed to .2 .3 .5 .6 • Nominal attributes can be mapped into numbers • e.g., decide that blue=1 and green=2 • But are inferences made based on this valid?
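A sketch of both directions, assuming equal-width binning (one common discretization; the representative values it produces need not match the ones on the slide). `discretize` is a hypothetical helper, and "red" is added to the colour example purely for illustration.

```python
def discretize(values, n_bins):
    """Equal-width binning: returns nominal labels and one numeric value per range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    nominal, numeric = [], []
    for v in values:
        # Clamp so the maximum value lands in the last bin rather than bin n_bins.
        idx = min(int((v - lo) / width), n_bins - 1)
        nominal.append(f"bin{idx}")                 # unordered label: ordering and distance are lost
        numeric.append(round(lo + idx * width, 2))  # lower edge of the range: ordering is kept
    return nominal, numeric


values = [.2, .25, .28, .31, .35, .45, .47, .52, .6, .63]
labels, representatives = discretize(values, n_bins=4)
print(labels)
print(representatives)

# Mapping nominal values to numbers (blue=1, green=2, hypothetical red=3)
# silently adds an order and distances the data never stated.
colour_code = {"blue": 1, "green": 2, "red": 3}
print(abs(colour_code["red"] - colour_code["blue"]))  # 2: "red is twice as far from blue as green is"
```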

Example! • Problem: Learn a rule that predicts how much time a person spends doing math problems each day • Attributes: you know gender, age, parents' socio-economic status, and chosen field (if any) • How would you represent age, and why? What would you expect the target rule to look like?

Styles of Learning • Classification – learn rules from labeled instances that allow you to assign new instances to a class • Association – look for relationships between features, not just rules that predict a class from an instance (more general) • Clustering – look for instances that are similar (involves comparisons across multiple features) • Numeric Prediction (regression models) – predict a numeric quantity rather than a class

Food Web http://www.cas.psu.edu/DOCS/WEBCOURSE/WETLAND/WET1/identify.html

What else would be affected if wheat were to disappear?
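One way to answer this question mechanically is graph reachability: follow "is eaten by" links outward from wheat. The food web below is a hypothetical fragment, not the one in the linked figure, and `affected_by_loss` is an illustrative helper using breadth-first search.

```python
from collections import deque

# Hypothetical food web fragment: each organism -> organisms that eat it.
eaten_by = {
    "wheat": ["mouse", "grasshopper"],
    "grasshopper": ["frog"],
    "mouse": ["owl", "snake"],
    "frog": ["snake"],
    "snake": ["hawk"],
    "clover": ["rabbit"],
    "rabbit": ["hawk"],
}


def affected_by_loss(food_web, organism):
    """Every organism that depends on `organism`, directly or through a chain."""
    affected, queue = set(), deque([organism])
    while queue:
        current = queue.popleft()
        for consumer in food_web.get(current, []):
            if consumer not in affected:
                affected.add(consumer)
                queue.append(consumer)
    return affected


print(sorted(affected_by_loss(eaten_by, "wheat")))
```

Rabbit is not affected here: it depends only on clover, even though it shares a predator (hawk) with wheat's consumers.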

How would you represent this data?

What would the learned rule look like?


What if you wanted a more general rule, e.g., Affects(Entity1, Entity2)?

122 rows altogether! Now let’s look at the learned rule…

Does it have to be this complicated?

What would your representation for Affects(Entity1, Entity2) look like?
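One possible representation, sketched under assumptions: each instance is an ordered pair (Entity1, Entity2), and the class says whether Entity1 affects Entity2. The three-link web is hypothetical, and a realistic table would probably add attributes describing each entity. With n entities this yields n(n-1) rows, one reason relational targets produce large tables quickly.

```python
from itertools import permutations

# Hypothetical mini food web (not the one on the slide): organism -> what eats it.
eaten_by = {"wheat": ["mouse"], "mouse": ["owl"], "clover": ["rabbit"]}


def reachable(web, start):
    """All organisms that depend on `start`, directly or through a chain."""
    seen, stack = set(), [start]
    while stack:
        for consumer in web.get(stack.pop(), []):
            if consumer not in seen:
                seen.add(consumer)
                stack.append(consumer)
    return seen


def affects_table(web):
    """One instance per ordered pair (Entity1, Entity2), labeled Affects yes/no."""
    entities = sorted(set(web) | {c for eaters in web.values() for c in eaters})
    reach = {e: reachable(web, e) for e in entities}
    return [(e1, e2, "yes" if e2 in reach[e1] else "no")
            for e1, e2 in permutations(entities, 2)]


for row in affects_table(eaten_by):
    print(row)
```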

More on Cross-Validation

Cross-Validation Exercise (fold models 1 to 5): What is the same? What is different? What surprises you?
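To see why the five fold models can differ from each other and from the model trained on the whole set, here is a minimal 5-fold cross-validation sketch. A decision stump (a one-split tree) stands in for the trees on the slides, and the data set, `learn_stump`, and `cross_validate` are all hypothetical. Each fold's model sees a different subset of the data, so its split point can move.

```python
import random


def learn_stump(train):
    """Depth-1 tree: find the threshold t that best splits 'low' from 'high'."""
    best = None
    for t in sorted({x for x, _ in train}):
        for below, above in (("low", "high"), ("high", "low")):
            correct = sum((y == below) if x <= t else (y == above) for x, y in train)
            if best is None or correct > best[0]:
                best = (correct, t, below, above)
    return best[1:]  # (threshold, label if x <= t, label if x > t)


def cross_validate(data, k=5, seed=1):
    """Shuffle once, hold out every k-th row as the test fold, train on the rest."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    results = []
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        t, below, above = learn_stump(train)
        hits = sum((below if x <= t else above) == y for x, y in test)
        results.append((t, hits / len(test)))
    return results


# Hypothetical data: one numeric attribute, class "low" below .4.
values = [.2, .25, .28, .31, .35, .45, .47, .52, .6, .63]
data = [(v, "low" if v < .4 else "high") for v in values]
for fold_no, (t, acc) in enumerate(cross_validate(data), start=1):
    print(f"fold {fold_no}: split at x <= {t}, test accuracy {acc:.2f}")
print("whole set: split at x <=", learn_stump(data)[0])
```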

Compare the Fold Models (1 to 5) with the Tree Trained on the Whole Set

Train Versus Test: Performance on Training Data vs. Performance on Testing Data
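Performance on training data is usually an optimistic estimate. The extreme case is a learner that memorizes: a 1-nearest-neighbour classifier (swapped in here for illustration) scores 100% on its own training data whenever no two training points coincide, while its test score can be lower. The data points, including the deliberately mislabeled .35, are hypothetical.

```python
def nearest_neighbor(train, x):
    """1-NN: return the label of the closest training instance."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, eval_set):
    """Fraction of eval_set instances whose 1-NN prediction matches the label."""
    return sum(nearest_neighbor(train, x) == y for x, y in eval_set) / len(eval_set)


train_set = [(.2, "low"), (.28, "low"), (.35, "high"), (.47, "high"), (.6, "high")]  # .35 is a noisy label
test_set = [(.25, "low"), (.33, "low"), (.52, "high")]
print("training accuracy:", accuracy(train_set, train_set))  # 1.0: each point is its own nearest neighbour
print("testing accuracy:", accuracy(train_set, test_set))
```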

Which Model Do You Think Will Perform Best on the Test Set?

Fold 1

Fold 2

Fold 3

Fold 4

Fold 5

Total Performance: What do you notice?

Total Performance: Average Kappa = .5
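Kappa (Cohen's kappa) corrects raw agreement for the agreement expected by chance: kappa = (observed − expected) / (1 − expected), where "expected" comes from the row and column totals of the confusion matrix. A minimal sketch; the example matrix is made up.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Chance agreement: probability that actual and predicted classes match at random.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical 2-class fold: 70% raw agreement, 50% expected by chance.
print(round(cohens_kappa([[20, 5], [10, 15]]), 2))  # 0.4
```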

Starting to think about Error Analyses • Step 1: Look at the confusion matrix • Where are most of the errors occurring? • What are possible explanations for systematic errors you see? • Are the instances in the confusable classes too similar to each other? If so, how can we distinguish them? • Are we paying attention to the wrong features? • Are we missing features that would reveal commonalities within classes?
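Step 1 can be sketched as counting (actual, predicted) pairs and ranking the off-diagonal cells, which shows where most errors occur. The helper names and the label lists are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count (actual, predicted) pairs; off-diagonal pairs are errors."""
    return Counter(zip(actual, predicted))


def top_errors(actual, predicted, n=3):
    """The n most frequent confusions, largest first."""
    counts = confusion_counts(actual, predicted)
    errors = [(pair, c) for pair, c in counts.items() if pair[0] != pair[1]]
    return sorted(errors, key=lambda item: item[1], reverse=True)[:n]


actual = ["a", "a", "b", "b", "b", "c", "c", "c"]
predicted = ["a", "b", "b", "c", "c", "c", "b", "c"]
print(top_errors(actual, predicted))  # b mistaken for c is the most common error
```

The next questions on the slide (similar classes, wrong or missing features) then start from the top cells of this list.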

What went wrong on Fold 3?

What went wrong on Fold 3? Training Set Performance vs. Testing Set Performance. Hypotheses?

What’s the difference?

Hypothesis: Problem with first cut

Some Examples

What do you conclude?

What do you conclude? The problem with Fold 3 was probably just a sampling fluke: the distribution of classes differed between the training and testing sets.
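A quick check for this kind of fluke is to compare class proportions in the training and testing sets of a fold. The label counts below are hypothetical. Stratified cross-validation, which keeps class proportions similar in every fold, is the usual guard against it; that is a named technique, not something stated on the slide.

```python
from collections import Counter


def class_distribution(labels):
    """Proportion of each class in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: counts[label] / total for label in sorted(counts)}


train_labels = ["yes"] * 30 + ["no"] * 30
test_labels = ["yes"] * 12 + ["no"] * 3
print(class_distribution(train_labels))  # {'no': 0.5, 'yes': 0.5}
print(class_distribution(test_labels))   # {'no': 0.2, 'yes': 0.8}
```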
