Support Vector Machines

Presentation Transcript


  1. Support Vector Machines. Refer to Andrew’s tutorials: http://www.cs.cmu.edu/~awm/tutorials

  2. Linear Classifiers: x → f (with parameters α) → y_est, where f(x, w, b) = sign(w · x - b). [Figure: datapoints in two classes; one marker denotes +1, the other denotes -1.] How would you classify this data?
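A minimal Python sketch of the decision rule on this slide, f(x, w, b) = sign(w · x - b). The helper names `dot` and `classify`, the example weights, and the choice to label points lying exactly on the boundary as +1 are assumptions made for illustration; the slide does not specify them.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def classify(x, w, b):
    """Return +1 or -1 according to sign(w . x - b).

    Points exactly on the boundary are assigned +1 here; that tie-break
    is an assumption, not something the slide defines.
    """
    return 1 if dot(w, x) - b >= 0 else -1


# Hypothetical 2-D example: the boundary is the line x1 + x2 = 1.
w, b = (1.0, 1.0), 1.0
above = classify((2.0, 2.0), w, b)  # a point on the positive side
below = classify((0.0, 0.0), w, b)  # a point on the negative side
```

Any (w, b) pair that puts every +1 point on one side and every -1 point on the other is a valid classifier in this sense, which is why the question of picking the best one arises.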

  6. Linear Classifiers: any of these would be fine... but which is best?

  7. Classifier Margin: define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
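The margin definition can be sketched in Python as follows. This assumes the boundary is widened symmetrically on both sides, so the width is twice the distance to the nearest datapoint; the function name `margin_width` and the toy points are hypothetical.

```python
import math


def margin_width(points, w, b):
    """Width of the widest band centred on w . x - b = 0 that touches no datapoint.

    Assumes symmetric widening: the band grows equally on both sides
    until it first hits a point, so the width is twice the smallest
    point-to-boundary distance.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        abs(sum(wi * xi for wi, xi in zip(w, x)) - b) / norm_w
        for x in points
    ]
    return 2.0 * min(distances)


# Two hypothetical points on the x1 axis, with vertical boundaries x1 = b.
points = [(0.0, 0.0), (3.0, 0.0)]
off_centre = margin_width(points, (1.0, 0.0), 1.0)  # boundary at x1 = 1
centred = margin_width(points, (1.0, 0.0), 1.5)     # boundary at x1 = 1.5
```

In this toy case the centred boundary has the larger margin, which is the intuition behind preferring the maximum margin classifier.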

  8. Maximum Margin: the maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM).

  9. Maximum Margin: the maximum margin linear classifier is the linear classifier with the, um, maximum margin. This is the simplest kind of SVM, called a linear SVM (LSVM). Support vectors are those datapoints that the margin pushes up against.

  10. Why Maximum Margin?
      • Intuitively this feels safest.
      • If we’ve made a small error in the location of the boundary (it’s been jolted in its perpendicular direction), this gives us the least chance of causing a misclassification.
      • Leave-one-out cross-validation (LOOCV) is easy, since the model is immune to removal of any non-support-vector datapoints.
      • There’s some theory (using VC dimension) that is related to (but not the same as) the proposition that this is a good thing.
      • Empirically it works very, very well.

  11. Estimate the Margin: what is the expression for the distance from a point x to the line w · x + b = 0?

  12. Estimate the Margin: what is the expression for the margin of the boundary w · x + b = 0?
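The formulas that answer these two questions did not survive in the transcript. The standard expressions are given below as a hedged reconstruction, not as the slide's own notation: the symbols d(x), the dataset D, and the choice to define the margin as the smallest distance (rather than the full band width, which is twice that) are assumptions.

```latex
% Distance from a point x to the hyperplane w . x + b = 0
d(\mathbf{x}) = \frac{\lvert \mathbf{w} \cdot \mathbf{x} + b \rvert}{\lVert \mathbf{w} \rVert}
             = \frac{\lvert \mathbf{w} \cdot \mathbf{x} + b \rvert}{\sqrt{\sum_{j} w_j^2}}

% Margin of the boundary with respect to a dataset D:
% the distance to the closest datapoint
\operatorname{margin}(\mathbf{w}, b, D) = \min_{\mathbf{x}_i \in D} d(\mathbf{x}_i)
  = \min_{\mathbf{x}_i \in D} \frac{\lvert \mathbf{w} \cdot \mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert}
```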

  13. Maximize Margin: choose w and b so that the margin of the boundary w · x + b = 0 is as large as possible.

  14. Maximize Margin: this is a min-max problem, i.e. a game problem.

  15. Maximize Margin: strategy. [Equations not preserved in the transcript.]
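The equations for these slides are missing from the transcript. The usual textbook derivation is sketched below on the assumption that it matches the intended strategy; the labels y_i ∈ {+1, -1} and the rescaling step are the standard ones and are not recovered from the slides themselves.

```latex
% Min-max form: maximise over (w, b) the minimum distance over datapoints
\max_{\mathbf{w}, b} \; \min_{i} \;
  \frac{\lvert \mathbf{w} \cdot \mathbf{x}_i + b \rvert}{\lVert \mathbf{w} \rVert}
\quad \text{subject to} \quad y_i (\mathbf{w} \cdot \mathbf{x}_i + b) > 0 \;\; \forall i

% Strategy: (w, b) can be rescaled freely without moving the boundary,
% so fix the scale by requiring the closest point to satisfy
\min_{i} \lvert \mathbf{w} \cdot \mathbf{x}_i + b \rvert = 1

% The margin is then 1 / ||w||, and maximising it is equivalent to
\min_{\mathbf{w}, b} \; \mathbf{w} \cdot \mathbf{w}
\quad \text{subject to} \quad y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \;\; \forall i
```

The final problem has a quadratic objective and linear constraints, which is what makes it a candidate for quadratic programming.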

  16. Maximum Margin Linear Classifier: how do we solve it?

  17. Learning via Quadratic Programming: QP is a well-studied class of optimization algorithms for maximizing a quadratic function of some real-valued variables subject to linear constraints.

  18. Quadratic Programming: find the values of the variables that optimize a quadratic criterion, subject to n linear inequality constraints and subject to e additional linear equality constraints. [Equations not preserved in the transcript.]

  19. Quadratic Programming: there exist algorithms for finding such constrained quadratic optima much more efficiently and reliably than gradient ascent. (But they are very fiddly... you probably don’t want to write one yourself.)
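Since the equations on the quadratic programming slides were lost, here is one common way to write a general QP, offered as a sketch. The symbols u, c, d, R, A, p, E and q are assumptions chosen for illustration and may differ from the original slide's notation.

```latex
% Quadratic criterion in the unknown vector u
\arg\max_{\mathbf{u}} \; c + \mathbf{d}^{\top} \mathbf{u}
  + \tfrac{1}{2}\, \mathbf{u}^{\top} R \, \mathbf{u}

% n linear inequality constraints (A has n rows)
\text{subject to} \quad A \mathbf{u} \le \mathbf{p}

% e linear equality constraints (E has e rows)
\text{and} \quad E \mathbf{u} = \mathbf{q}
```

Minimizing w · w is the same as maximizing -w · w, so a problem with that objective and linear constraints on (w, b) fits this template with u collecting the entries of w and b.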

  20. Uh-oh! This is going to be a problem! What should we do?

  21. Uh-oh! Idea 1: find the minimum w · w while also minimizing the number of training set errors. Problemette: two things to minimize makes for an ill-defined optimization.

  22. Uh-oh! Idea 1.1: minimize w · w + C (#train errors), where C is a tradeoff parameter. There’s a serious practical problem that’s about to make us reject this approach. Can you guess what it is?

  23. Uh-oh! Idea 1.1, continued: it can’t be expressed as a quadratic programming problem, so solving it may be too slow. (Also, it doesn’t distinguish between disastrous errors and near misses.) So... any other ideas?

  24. Uh-oh! Idea 2.0: minimize w · w + C (distance of error points to their correct place).
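The difference between Idea 1.1 and Idea 2.0 can be sketched in Python by evaluating both objectives for a fixed, hand-picked boundary. The sketch assumes the w · x + b convention, and it measures "distance to the correct place" as the slack max(0, 1 - y (w · x + b)), which is in units scaled by the norm of w; the function names and the toy data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def slacks(points, labels, w, b):
    """Slack of each point: how far it falls short of y (w . x + b) >= 1."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(points, labels)]


def objective_idea_1_1(points, labels, w, b, C):
    """w . w + C * (number of misclassified training points)."""
    errors = sum(1 for x, y in zip(points, labels) if y * (dot(w, x) + b) <= 0)
    return dot(w, w) + C * errors


def objective_idea_2_0(points, labels, w, b, C):
    """w . w + C * (total slack), so far-off errors cost more than near misses."""
    return dot(w, w) + C * sum(slacks(points, labels, w, b))


# Hypothetical 1-D data: two clean points, one near miss, one disastrous error.
points = [(2.0,), (-2.0,), (-0.5,), (-3.0,)]
labels = [1, -1, 1, 1]
w, b, C = (1.0,), 0.0, 1.0

per_point = slacks(points, labels, w, b)
count_based = objective_idea_1_1(points, labels, w, b, C)
slack_based = objective_idea_2_0(points, labels, w, b, C)
```

The count-based objective charges the near miss and the disastrous error the same amount, while the slack-based one charges them 1.5 and 4.0 respectively. The slack-based penalty is also piecewise linear in (w, b), which is what lets it be written with linear constraints.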

  25. Support Vector Machine (SVM) for Noisy Data: is there any problem with the above formulation?

  26. Support Vector Machine (SVM) for Noisy Data: balance the tradeoff between margin and classification errors.
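To make the margin-versus-errors tradeoff concrete, here is a small Python sketch that minimizes w · w + C · (total slack). Note that it uses plain subgradient descent rather than the quadratic programming approach the slides recommend, purely because that fits in a few lines. The function name `train_soft_margin`, the step size, the step count, and the toy data are all assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_soft_margin(points, labels, C=1.0, steps=2000, lr=0.01):
    """Minimise w . w + C * sum(max(0, 1 - y (w . x + b))) by subgradient descent.

    This is a swapped-in illustrative optimiser, not a QP solver.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(steps):
        grad_w = [2.0 * wi for wi in w]  # gradient of the w . w term
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * (dot(w, x) + b) < 1.0:  # point violates the margin
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical separable 2-D data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (0.0, -1.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_soft_margin(points, labels, C=1.0)
predictions = [1 if dot(w, x) + b >= 0 else -1 for x in points]
```

A larger C penalizes slack more heavily and pushes toward fewer margin violations; a smaller C favors a wider margin at the cost of more violations.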

  27. Support Vector Machine for Noisy Data: how do we determine the appropriate value for C?

  28. An Equivalent QP: determining b. If we fix w, finding b is a linear programming problem!

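  29. Suppose we’re in 1-dimension: what would SVMs do with this data? [Figure: datapoints on a line, with x = 0 marked.]

  30. Suppose we’re in 1-dimension: not a big surprise. [Figure: datapoints on a line with x = 0 marked, showing the positive “plane” and the negative “plane”.]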
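In one dimension the maximum margin boundary for separable data is simply the midpoint between the closest pair of opposite-class points. A minimal Python sketch, assuming every -1 point lies to the left of every +1 point; the function name and the data are hypothetical.

```python
def max_margin_threshold_1d(xs, labels):
    """Return (threshold, margin) for 1-D data with all -1 points left of all +1 points.

    The two points that define the threshold play the role of the
    support vectors; every other point could be removed without
    changing the result.
    """
    largest_negative = max(x for x, y in zip(xs, labels) if y == -1)
    smallest_positive = min(x for x, y in zip(xs, labels) if y == 1)
    if largest_negative >= smallest_positive:
        raise ValueError("data are not separable with -1 on the left")
    threshold = (largest_negative + smallest_positive) / 2.0
    margin = (smallest_positive - largest_negative) / 2.0
    return threshold, margin


# Hypothetical 1-D dataset.
xs = [-3.0, -2.0, -1.0, 1.5, 2.0, 4.0]
labels = [-1, -1, -1, 1, 1, 1]
threshold, margin = max_margin_threshold_1d(xs, labels)
```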

  31. Harder 1-dimensional dataset: that’s wiped the smirk off SVM’s face. What can be done about this? [Figure: datapoints on a line, with x = 0 marked.]

  32. Harder 1-dimensional dataset: remember how permitting non-linear basis functions made linear regression so much nicer? Let’s permit them here too.
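One way to see the effect of a non-linear basis function is to lift each scalar x to the pair (x, x²), after which a straight line can separate data that no single threshold on x could. The sketch below assumes a dataset where the positive points sit between the negative ones; the data, the `lift` helper, and the hand-picked boundary are all hypothetical, and the boundary is just one separating line rather than a fitted SVM.

```python
def lift(x):
    """Map a scalar x to the 2-D feature vector (x, x^2)."""
    return (x, x * x)


def classify(z, w, b):
    """sign(w . z - b), with boundary points labelled +1 (an assumed tie-break)."""
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) - b >= 0 else -1


# Hypothetical "harder" 1-D data: no single threshold on x separates the classes.
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [-1, -1, 1, 1, 1, -1, -1]

# Hand-picked linear boundary in the lifted space: predict +1 when x^2 <= 2.125.
w, b = (0.0, -1.0), -2.125
predictions = [classify(lift(x), w, b) for x in xs]
```

The classifier is still linear in the lifted features, so the maximum margin machinery applies unchanged once the data have been mapped.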
