Machine Learning in Practice Lecture 23

Machine Learning in Practice Lecture 23. Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute. In the home stretch…: Announcements, Questions, Quiz (second to last!), Homework (last!), Discretization and Time Series Transformations, Data Cleansing.

Presentation Transcript


  1. Machine Learning in Practice Lecture 23 Carolyn Penstein Rosé Language Technologies Institute / Human-Computer Interaction Institute

  2. In the home stretch…. • Announcements • Questions? • Quiz (second to last!) • Homework (last!) • Discretization and Time Series Transformations • Data Cleansing http://keijiro.typepad.com/journey/images/finish_01.jpg

  3. About the Quiz…. [Figure: the 5 folds, with fold 1 highlighted] • For fold i in {1..5}: the test set is test_i • Select one of the remaining sets to be validation_i • Concatenate the remaining sets into part_train_i • For each algorithm in {X1-1, X1-2, X2-1, X2-2}: train the algorithm on part_train_i and test it on validation_i • Now concatenate all but test_i into train_i • Train the best algorithm on train_i and test on test_i to get the performance for fold i • Average the performance over the 5 folds
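
The fold loop above can be sketched in scikit-learn terms. This is a minimal illustration, not the quiz answer: the four candidate classifiers and the synthetic dataset are stand-ins for the slide's X1-1 … X2-2 and the real data.

```python
# Sketch of the nested cross-validation procedure from the slide.
# The candidate algorithms and the dataset are hypothetical stand-ins.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

candidates = {
    "X1-1": GaussianNB(),
    "X1-2": DecisionTreeClassifier(max_depth=2, random_state=0),
    "X2-1": DecisionTreeClassifier(max_depth=5, random_state=0),
    "X2-2": DecisionTreeClassifier(random_state=0),
}

outer = KFold(n_splits=5, shuffle=True, random_state=0)
fold_scores = []
for train_val_idx, test_idx in outer.split(X):
    # Hold out one of the remaining sets as validation_i;
    # the rest becomes part_train_i
    val_idx = train_val_idx[: len(test_idx)]
    part_train_idx = train_val_idx[len(test_idx):]

    # Pick the algorithm that does best on validation_i
    best_name, best_score = None, -1.0
    for name, algo in candidates.items():
        model = clone(algo).fit(X[part_train_idx], y[part_train_idx])
        score = model.score(X[val_idx], y[val_idx])
        if score > best_score:
            best_name, best_score = name, score

    # Retrain the winner on everything except test_i, then score on test_i
    model = clone(candidates[best_name]).fit(X[train_val_idx], y[train_val_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))

print(np.mean(fold_scores))  # average performance over the 5 folds
```

Note that the validation set is chosen from within the training portion, so test_i never influences model selection — that is the point of the nested setup.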

  7. Discretization and Time Series Transforms

  8. Discretization • Connection between discretization and clustering • Finding natural breaks in your data • Connection between discretization and feature selection • You can think of each interval as a feature or a feature value • Discretizing before classification limits options for breaks • If you attempt to discretize and it fails to find a split that would have been useful, it has the effect of eliminating a feature

  9. Discretization and Feature Selection • Adding breaks is like creating new attribute values • Each attribute value is potentially a new binary attribute • Inserting boundaries is like a forward selection approach to attribute selection

  14. Discretization and Feature Selection • Removing boundaries is like a backwards elimination approach to attribute selection

  17. Discretization • Discretization sometimes improves performance even if you don’t strictly need nominal attributes • Breaks in good places biases classifier to learn a good model • Decision tree learners do discretization locally when they are selecting an attribute to branch on • Advantages and disadvantages to local discretization

  18. Layers • Think of building a model in layers • You can build a complex shape by combining lots of simple shapes • We’ll come back to this idea when we talk about ensemble methods in the next lecture! • You could build a complex model all at once • Or you could build a complex model in a series of simple stages • Discretization, feature selection, model building

  19. Unsupervised Discretization • Equal intervals (equal-interval binning) • E.g., for temperature: breaks every 10 degrees • E.g., for weight: breaks every 5 pounds • Equal frequencies (equal-frequency binning) • E.g., groupings of about 10 instances • E.g., groupings of about 100 instances
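
The two unsupervised schemes can be sketched with numpy; the temperature values here are made up for illustration.

```python
# Equal-interval vs. equal-frequency binning (minimal numpy sketch;
# the temperature readings are hypothetical).
import numpy as np

temps = np.array([41, 48, 52, 55, 58, 63, 64, 70, 71, 72, 75, 81, 85, 90])

# Equal-interval binning: a break every 10 degrees
width_edges = np.arange(40, 101, 10)
width_bins = np.digitize(temps, width_edges)

# Equal-frequency binning: cut points at quantiles, so each bin
# holds roughly the same number of instances
n_bins = 4
freq_edges = np.quantile(temps, np.linspace(0, 1, n_bins + 1))
freq_bins = np.digitize(temps, freq_edges[1:-1], right=True)

print(width_bins)            # bin index per temperature
print(np.bincount(freq_bins))  # roughly equal counts per bin
```

Equal-interval bins can end up badly unbalanced when the attribute is skewed, which is one reason equal-frequency binning is sometimes preferred.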

  20. Supervised Discretization • Supervised splitting: find the best split point by generating all possible splits and using attribute selection to pick one • Keep splitting until splits stop adding value • It’s a little like building a decision tree and then throwing the tree away, but keeping the grouping of instances at the leaf nodes • Entropy based: rank splits using information gain
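
The entropy-based ranking of candidate splits can be sketched as follows; the toy attribute and labels are hypothetical, and a real discretizer would recurse on each side and apply a stopping criterion.

```python
# Minimal sketch of entropy-based split selection: try every candidate
# cut point and keep the one with the highest information gain.
# The toy temperature/play data are hypothetical.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def best_split(values, labels):
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(values)):
        if values[i] == values[i - 1]:
            continue  # no boundary between identical values
        cut = (values[i] + values[i - 1]) / 2
        left, right = labels[:i], labels[i:]
        # Weighted entropy of the two sides after the split
        remainder = (len(left) * entropy(left)
                     + len(right) * entropy(right)) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

temps = np.array([60, 62, 65, 70, 72, 80, 85, 90])
play = np.array(["no", "no", "no", "yes", "yes", "yes", "no", "no"])
print(best_split(temps, play))  # best cut point and its information gain
```

This single-attribute loop is exactly what a decision tree learner does when choosing a numeric split, which is why the slide describes supervised discretization as building a tree and keeping only the leaf groupings.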

  21. Built-In Supervised Discretization • NaiveBayes can be used with or without supervised discretization • SpeakerID data set has numeric attributes • Not normally distributed • Without discretization kappa = .16 • With discretization kappa = .34

  22. Doing Discretization in Weka • Note: there is also an unsupervised discretization filter • attributeIndices: which attributes do you want to discretize • Target class set inside the classifier

  23. Doing Discretization in Weka • The last two options are for the stopping criterion • Not clear how it evaluates the goodness of each split • Not well documented

  24. Example for Time Series Transforms • Amount of CO2 in a room is related to how many people were in the room N minutes ago • Let’s say you take a measurement every N/2 minutes • Before you apply a numeric prediction model to predict CO2 from number of people, first copy number of people forward 2 instances [Figure: two (NumPeople, AmountCO2) tables; after the transform, each AmountCO2 row is paired with the NumPeople value from two instances earlier, leaving missing NumPeople values in the first two rows]

  26. Time Series Transforms • Fill in with the delta or fill in with a previous value • instanceRange: You specify how many instances backward or forward to look (negative means backwards) • fillWithMissing: default is to ignore first and last instance. If true, use missing as the value for the attributes
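
A pandas analogue of the CO2 example may make the transform concrete; the column names and values here are illustrative, not from a real dataset.

```python
# Sketch of the time-series transform: copy NumPeople forward two rows so
# each CO2 reading lines up with the occupancy from N minutes earlier.
# Values are illustrative.
import pandas as pd

df = pd.DataFrame({
    "NumPeople": [1, 2, 3, 4],
    "AmountCO2": [400, 410, 450, 500],
})

# Roughly Weka's instanceRange=2 behavior: shift() leaves the first two
# rows with missing values, like fillWithMissing=true
df["NumPeopleLagged"] = df["NumPeople"].shift(2)
print(df)
```

A model predicting AmountCO2 from NumPeopleLagged then sees the occupancy that actually caused each reading, rather than the current occupancy.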

  27. Data Cleansing

  28. Data Cleansing: Removing Outliers • Noticing outliers is easier when you look at the overall distribution of your data • Especially when using human judgment • You know what doesn’t look right • It’s harder to tell automatically whether the problem is that your data doesn’t fit the model or you have outliers

  29. Eliminating Noise with Decision Tree Learning • Train a tree • Eliminate misclassified examples • Train on the clean subset of the data • You will get a simpler tree that generalizes better • You can do this iteratively
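
The train/eliminate/retrain loop can be sketched as follows. The dataset is synthetic with deliberately flipped labels, and min_samples_leaf stands in for pruning so the tree does not simply memorize the noise.

```python
# Sketch of the noise-elimination loop from the slide: train a tree,
# drop the training examples it misclassifies, retrain on the cleaned
# subset, and repeat. Dataset and iteration count are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.1 injects ~10% label noise
X, y = make_classification(n_samples=200, flip_y=0.1, random_state=0)

for _ in range(3):  # the slide notes this can be done iteratively
    tree = DecisionTreeClassifier(min_samples_leaf=5, random_state=0).fit(X, y)
    keep = tree.predict(X) == y
    if keep.all():
        break  # nothing left to eliminate
    X, y = X[keep], y[keep]

print(len(y))  # size of the cleaned training set
```

The final tree trained on the cleaned subset tends to be simpler, since it no longer grows branches just to account for mislabeled points.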

  30. Data Cleansing: Removing Outliers • One way of identifying outliers is to look for examples that several algorithms misclassify • Algorithms moving down different optimization paths are unlikely to get trapped in the same local minima • You can compensate for outliers by adjusting the learning algorithm • Using absolute distance rather than squared distance for a regression problem • Doesn’t remove outliers, but reduces the effect of outliers

  31. Take Home Message • Discretization is related to feature selection and clustering • Similar alternative search strategies • Think about learning a model in stages • Getting back to the idea of natural breaks in your data • Difficult to tell with only one model whether a data point is noisy or a model is overly simplistic
