
CS 461: Machine Learning Lecture 3


Presentation Transcript


  1. CS 461: Machine Learning, Lecture 3 Dr. Kiri Wagstaff wkiri@wkiri.com CS 461, Winter 2009

  2. Questions? • Homework 2 • Project Proposal • Weka • Other questions from Lecture 2 CS 461, Winter 2009

  3. Review from Lecture 2 • Representation, feature types • numeric, discrete, ordinal • Decision trees: nodes, leaves, greedy, hierarchical, recursive, non-parametric • Impurity: misclassification error, entropy • Evaluation: confusion matrix, cross-validation CS 461, Winter 2009

  4. Plan for Today • Decision trees • Regression trees, pruning, rules • Benefits of decision trees • Evaluation • Comparing two classifiers • Support Vector Machines • Classification • Linear discriminants, maximum margin • Learning (optimization) • Non-separable classes • Regression CS 461, Winter 2009

  5. Remember Decision Trees? CS 461, Winter 2009

  6. Algorithm: Build a Decision Tree CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]

  7. Building a Regression Tree • Same algorithm… different criterion • Instead of impurity, use Mean Squared Error (in local region) • Predict mean output for node • Compute training error • (Same as computing the variance for the node) • Keep splitting until node error is acceptable; then it becomes a leaf • Acceptable: error < threshold CS 461, Winter 2009
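A minimal sketch of that node criterion in Python; the function names and the threshold value are illustrative assumptions, not code from the lecture:

```python
import numpy as np

def node_error(y):
    """MSE of predicting the node's mean output (equals the node's variance)."""
    return np.mean((y - y.mean()) ** 2)

def is_leaf(y, threshold=0.1):
    """Stop splitting once the node's error is acceptable (below the threshold)."""
    return node_error(y) < threshold
```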

  8. Turning Trees into Rules CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]

  9. Comparing Two Algorithms Chapter 14 CS 461, Winter 2009

  10. Machine Learning Showdown! • McNemar’s Test • Under H0, we expect $e_{01} = e_{10} = (e_{01} + e_{10})/2$ • Test statistic: $\frac{(|e_{01} - e_{10}| - 1)^2}{e_{01} + e_{10}} \sim \chi^2_1$ • Accept H0 if the statistic is $< \chi^2_{\alpha,1}$ CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]
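A small sketch of the test in Python (the function name and example counts are illustrative); e01 and e10 count examples misclassified by only the first or only the second classifier, and the continuity-corrected statistic is compared against the chi-squared critical value with 1 degree of freedom:

```python
from scipy.stats import chi2

def mcnemar_accepts_h0(e01, e10, alpha=0.05):
    """True if the two classifiers are not significantly different at level alpha."""
    stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)  # continuity-corrected McNemar statistic
    critical = chi2.ppf(1 - alpha, df=1)            # chi^2_{alpha,1}, ~3.84 for alpha = 0.05
    return stat < critical                          # accept H0 if statistic is below the threshold

print(mcnemar_accepts_h0(e01=12, e10=25))
```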

  11. Support Vector Machines Chapter 10 CS 461, Winter 2009

  12. Linear Discrimination • Model class boundaries (not data distribution) • Learning: maximize accuracy on labeled data • Inductive bias: form of discriminant used CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]

  13. How to find best w, b? • E(w|X) is error with parameters w on sample X • $w^* = \arg\min_w E(w \mid X)$ • Gradient: $\nabla_w E = \left[\partial E/\partial w_1, \ldots, \partial E/\partial w_d\right]^T$ • Gradient descent: start from random w and update w iteratively in the negative direction of the gradient, $w_{t+1} = w_t - \eta \nabla_w E(w_t)$ CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]
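A minimal gradient-descent sketch in Python; the function, its defaults, and the toy quadratic example are illustrative assumptions, not course code:

```python
import numpy as np

def gradient_descent(grad_E, w0, eta=0.1, n_iters=100):
    """Start from an initial w and repeatedly step against the gradient:
    w_{t+1} = w_t - eta * grad E(w_t)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iters):
        w = w - eta * grad_E(w)
    return w

# Toy example: E(w) = ||w||^2 has gradient 2w, so the minimizer is w = 0.
w_star = gradient_descent(lambda w: 2 * w, w0=[3.0, -1.5])
```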

  14. Gradient Descent [Figure: error curve E(w) with one step of size η from $w_t$ to $w_{t+1}$, reducing $E(w_t)$ to $E(w_{t+1})$] CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]

  15. Support Vector Machines • Maximum-margin linear classifiers • Imagine: Army Ceasefire • How to find best w, b? • Quadratic programming (see the formulation below) CS 461, Winter 2009
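The quadratic program itself does not survive in the transcript; assuming the standard hard-margin formulation (as in Alpaydin), it maximizes the margin $2/\|w\|$ while classifying every training example on the correct side:

$$
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y^t (w^T x^t + b) \ge +1, \ \forall t
$$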

  16. Optimization (primal formulation) • Must get training data right! • N + d + 1 parameters CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]
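The primal formula is missing from the transcript; a reconstruction of the standard primal Lagrangian, which has the N + d + 1 parameters the slide mentions (w with d components, b, and N multipliers $\alpha^t$):

$$
L_p = \tfrac{1}{2}\|w\|^2 - \sum_{t} \alpha^t \left[\, y^t (w^T x^t + b) - 1 \,\right], \qquad \alpha^t \ge 0
$$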

  17. Optimization (dual formulation) • We know: [conditions from the primal] • So re-write: [the dual] • N parameters. Where did w and b go? • $\alpha^t > 0$ are the SVs, capped by C CS 461, Winter 2009 [Alpaydin 2004  The MIT Press]
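The formulas here were figures; in the standard derivation, setting $\partial L_p / \partial w = 0$ and $\partial L_p / \partial b = 0$ gives $w = \sum_t \alpha^t y^t x^t$ and $\sum_t \alpha^t y^t = 0$ ("we know"), and substituting back yields the dual in the N parameters $\alpha^t$ only:

$$
\max_{\alpha}\ \sum_t \alpha^t - \tfrac{1}{2}\sum_t \sum_s \alpha^t \alpha^s y^t y^s (x^t)^T x^s
\quad \text{subject to} \quad \sum_t \alpha^t y^t = 0, \quad 0 \le \alpha^t \le C
$$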

  18. What if Data isn’t Linearly Separable? • Embed data in higher-dimensional space • Explicit: Basis functions (new features) • Visualization of 2D -> 3D • Implicit: Kernel functions (new dot product/similarity) • Polynomial • RBF/Gaussian • Sigmoid • SVM applet • Still need to find a linear hyperplane • Add “slack” variables to permit some errors CS 461, Winter 2009
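Sketches of the three kernels named above, in Python; the parameter names and default values are illustrative choices, not the lecture's settings:

```python
import numpy as np

def poly_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel: (x . z + c)^degree."""
    return (np.dot(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """RBF/Gaussian kernel: exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid_kernel(x, z, a=1.0, b=-1.0):
    """Sigmoid kernel: tanh(a * (x . z) + b)."""
    return np.tanh(a * np.dot(x, z) + b)
```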

  19. Example: Orbital Classification • Linear SVM flying on the EO-1 Earth Orbiter since Dec. 2004 • Classify every pixel • Four classes • 12 features (of 256 collected) [Figure: Hyperion image and classified map; classes: ice, water, land, snow] CS 461, Winter 2009 [Castano et al., 2005]

  20. SVM in Weka • SMO: Sequential Minimal Optimization • Faster than QP-based versions • Try linear, RBF kernels CS 461, Winter 2009
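The lab itself uses Weka's SMO; as an analogous sketch outside Weka (assuming scikit-learn and the iris dataset purely for illustration), comparing linear and RBF kernels with 10-fold cross-validation looks like this:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for kernel in ("linear", "rbf"):
    # 10-fold cross-validated accuracy for each kernel choice
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=10)
    print(f"{kernel}: mean accuracy {scores.mean():.3f}")
```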

  21. Summary: What You Should Know • Decision trees • Regression trees, pruning, rules • Benefits of decision trees • Evaluation • Comparing two classifiers (McNemar’s test) • Support Vector Machines • Classification • Linear discriminants, maximum margin • Learning (optimization) • Non-separable classes CS 461, Winter 2009

  22. Next Time • Reading • Evaluation (read Ch. 14.7) • Support Vector Machines (read Ch. 10.1-10.4, 10.6, 10.9) • Questions to answer from the reading • Posted on the website • Three volunteers: Sen, Jimmy, and Irvin CS 461, Winter 2009
