Machine Learning in Practice, Lecture 23
Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute
In the home stretch…
• Announcements
• Questions?
• Quiz (second to last!)
• Homework (last!)
• Discretization and Time Series Transformations
• Data Cleansing
About the Quiz…
[Figure: the data set partitioned into five folds, labeled 1–5; on each iteration one fold serves as the test set]
For fold i in {1..5}:
    The test set is test_i
    Select one of the remaining sets to be validation_i
    Concatenate the remaining sets into part_train_i
    For each algorithm in {X1-1, X1-2, X2-1, X2-2}:
        Train the algorithm on part_train_i and test on validation_i
    Concatenate all but test_i into train_i
    Train the best algorithm on train_i and test on test_i to get the performance for fold i
Average the performance over the 5 folds
(a runnable sketch of this procedure follows below)
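A minimal Python sketch of this procedure, assuming the caller supplies a `train_and_score(algorithm, train, test)` function that returns an accuracy; the shuffle, the choice of the first remaining set as the validation set, and all helper names are illustrative assumptions, not part of the quiz.

```python
# Nested 5-fold cross-validation: an inner validation set picks the best
# algorithm, and the outer fold measures its performance on held-out data.
import random
from statistics import mean

def nested_cv(instances, algorithms, train_and_score, k=5):
    """instances: labeled examples; algorithms: candidate configurations;
    train_and_score(algo, train, test) -> score of algo trained on train."""
    random.shuffle(instances)
    folds = [instances[i::k] for i in range(k)]      # five disjoint sets
    fold_scores = []
    for i in range(k):
        test_i = folds[i]
        rest = [folds[j] for j in range(k) if j != i]
        validation_i = rest[0]                       # one remaining set
        part_train_i = [x for f in rest[1:] for x in f]
        # Inner loop: select the best algorithm on the validation set
        best = max(algorithms,
                   key=lambda a: train_and_score(a, part_train_i, validation_i))
        # Retrain the winner on everything except test_i, then test
        train_i = [x for f in rest for x in f]
        fold_scores.append(train_and_score(best, train_i, test_i))
    return mean(fold_scores)                         # average over the 5 folds
```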
Discretization
• Connection between discretization and clustering: finding natural breaks in your data
• Connection between discretization and feature selection: you can think of each interval as a feature or a feature value
• Discretizing before classification limits the classifier's options for breaks
• If discretization fails to find a split that would have been useful, the effect is to eliminate that feature
Discretization and Feature Selection
• Adding breaks is like creating new attribute values
• Each attribute value is potentially a new binary attribute
• Inserting boundaries is like a forward selection approach to attribute selection
• Removing boundaries is like a backwards elimination approach to attribute selection
Discretization
• Discretization sometimes improves performance even if you don't strictly need nominal attributes
• Breaks in good places bias the classifier toward learning a good model
• Decision tree learners do discretization locally when they select an attribute to branch on
• Local discretization has both advantages and disadvantages
Layers
• Think of building a model in layers
• You can build a complex shape by combining lots of simple shapes
• We'll come back to this idea when we talk about ensemble methods in the next lecture!
• You could build a complex model all at once, or you could build it in a series of simple stages: discretization, feature selection, model building
Unsupervised Discretization
• Equal intervals (equal-width binning)
  • E.g., for temperature: breaks every 10 degrees
  • E.g., for weight: breaks every 5 pounds
• Equal frequencies (equal-frequency binning)
  • E.g., groupings of about 10 instances
  • E.g., groupings of about 100 instances
(both strategies are sketched below)
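A minimal sketch of the two unsupervised binning strategies in plain Python; the bin counts and the temperature data are illustrative, not from the lecture.

```python
def equal_width_bins(values, n_bins):
    # Equal-interval binning: cut the range into n_bins intervals of equal width
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def equal_frequency_bins(values, n_bins):
    # Equal-frequency binning: each bin gets roughly the same number of instances
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    per_bin = len(values) / n_bins
    for rank, i in enumerate(order):
        bins[i] = min(int(rank / per_bin), n_bins - 1)
    return bins

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 80, 81, 83, 85]
print(equal_width_bins(temps, 3))       # intervals of equal width (~7 degrees)
print(equal_frequency_bins(temps, 3))   # roughly 4-5 instances per bin
```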
Supervised Discretization
• Supervised splitting: find the best split point by generating all possible splits and using attribute selection to pick one
• Keep splitting until it stops adding value
• It's a little like building a decision tree and then throwing the tree away, but keeping the grouping of instances at the leaf nodes
• Entropy based: rank splits using information gain (see the sketch below)
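A minimal sketch of the entropy-based idea: score every candidate cut point on one numeric attribute by information gain and keep the best. The full method would recurse on each side with a stopping criterion (e.g., MDL); this shows a single split, and the data is illustrative.

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Shannon entropy of a class distribution
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    # Try every cut point between distinct sorted values; rank by information gain
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                     # can't cut between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_gain = base - remainder
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_cut, best_gain

temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 83]
play  = ['y', 'n', 'y', 'y', 'y', 'n', 'n', 'y', 'n', 'n']
print(best_split(temps, play))   # cut point with the highest information gain
```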
Built-In Supervised Discretization
• NaiveBayes can be used with or without supervised discretization
• The SpeakerID data set has numeric attributes that are not normally distributed
  • Without discretization, kappa = .16
  • With discretization, kappa = .34
Doing Discretization in Weka
• Note: there is also an unsupervised discretization filter
• attributeIndices: which attributes you want to discretize
• The target class is set inside the classifier
Doing Discretization in Weka
• The last two options control the stopping criterion
• It is not clear how the filter evaluates the goodness of each split
• This is not well documented
Example for Time Series Transforms
• The amount of CO2 in a room is related to how many people were in the room N minutes ago
• Suppose you take a measurement every N/2 minutes
• Before you apply a numeric prediction model to predict CO2 from number of people, first copy number of people forward 2 instances:

  Instance   NumPeople (original)   NumPeople (shifted)   AmountCO2
  1          NumPeople_1            ?                     AmountCO2_1
  2          NumPeople_2            ?                     AmountCO2_2
  3          NumPeople_3            NumPeople_1           AmountCO2_3
  4          NumPeople_4            NumPeople_2           AmountCO2_4
Time Series Transforms
• Fill in with the delta, or fill in with a previous value
• instanceRange: you specify how many instances backward or forward to look (negative means backwards)
• fillWithMissing: by default the first and last instances are ignored; if true, missing is used as the value for those attributes
(a sketch of the shift appears below)
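A minimal sketch of the translate-style transform described above: shift one attribute by instanceRange rows, filling the vacated rows with a missing marker when fillWithMissing is set. The dict-per-row layout and function name are assumptions for illustration, not Weka's internals.

```python
def time_series_translate(rows, attr, instance_range, fill_with_missing=True):
    """rows: list of dicts; attr: attribute to shift.
    Positive instance_range copies values forward, negative backwards."""
    shifted = []
    n = len(rows)
    for i in range(n):
        src = i - instance_range          # a forward shift pulls from earlier rows
        row = dict(rows[i])
        if 0 <= src < n:
            row[attr] = rows[src][attr]
        elif fill_with_missing:
            row[attr] = None              # Weka would write a '?' here
        else:
            continue                      # otherwise drop rows with no source value
        shifted.append(row)
    return shifted

data = [{'NumPeople': p, 'AmountCO2': c}
        for p, c in [(3, 410), (5, 430), (2, 455), (4, 450)]]
print(time_series_translate(data, 'NumPeople', 2))   # first two rows get None
```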
Data Cleansing: Removing Outliers
• Noticing outliers is easier when you look at the overall distribution of your data
• Especially when using human judgment: you know what doesn't look right
• It's harder to tell automatically whether the problem is that your data doesn't fit the model or that you have outliers
Eliminating Noise with Decision Tree Learning
• Train a tree
• Eliminate misclassified examples
• Train on the clean subset of the data
• You will get a simpler tree that generalizes better
• You can do this iteratively (see the sketch below)
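A minimal sketch of this loop using scikit-learn's DecisionTreeClassifier as the tree learner; the lecture doesn't name a specific implementation, and the synthetic data, depth limit, and iteration count are assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
y[rng.choice(200, 10, replace=False)] ^= 1       # inject 10 noisy labels

for _ in range(3):                               # repeat iteratively
    # Capacity-limited tree, so noisy points get misclassified, not memorized
    tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
    keep = tree.predict(X) == y                  # drop misclassified examples
    if keep.all():
        break
    X, y = X[keep], y[keep]

clean_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(len(y), "instances kept;", clean_tree.get_n_leaves(), "leaves")
```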
Data Cleansing: Removing Outliers
• One way of identifying outliers is to look for examples that several algorithms misclassify (see the sketch below)
• Algorithms moving down different optimization paths are unlikely to get trapped in the same local minima
• You can also compensate for outliers by adjusting the learning algorithm, e.g., using absolute distance rather than squared distance for a regression problem
• This doesn't remove outliers, but it reduces their effect
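A minimal sketch of the multi-algorithm idea: flag instances that every learner in a diverse set misclassifies. The three scikit-learn learners and the synthetic data are illustrative choices, not prescribed by the lecture.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
y[rng.choice(300, 15, replace=False)] ^= 1            # inject label noise

learners = [DecisionTreeClassifier(max_depth=4, random_state=0),
            GaussianNB(),
            KNeighborsClassifier(n_neighbors=5)]
# Count, per instance, how many learners get it wrong
wrong_counts = sum((clf.fit(X, y).predict(X) != y).astype(int)
                   for clf in learners)
outliers = np.where(wrong_counts == len(learners))[0]  # missed by all three
print(f"{len(outliers)} suspected outliers")
```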
Take Home Message
• Discretization is related to feature selection and clustering: similar alternative search strategies, and it gets back to the idea of natural breaks in your data
• Think about learning a model in stages
• It is difficult to tell with only one model whether a data point is noisy or the model is overly simplistic