Support Vector Machine (SVM) Classification

Greg Grudic. Today’s lecture goals: Support Vector Machine (SVM) classification, another algorithm for finding linear separating hyperplanes. A good text on SVMs: Bernhard Schölkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

  1. Support Vector Machine (SVM) Classification Greg Grudic Intro AI

  2. Today’s Lecture Goals
     • Support Vector Machine (SVM) classification
     • Another algorithm for linear separating hyperplanes
     A good text on SVMs: Bernhard Schölkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

  3. Support Vector Machine (SVM) Classification
     • Classification as a problem of finding optimal (canonical) linear hyperplanes.
     • Optimal linear separating hyperplanes:
       • In input space
       • In kernel space (can be non-linear)

  4. Linear Separating Hyper-Planes
     How many lines can separate these points? Which line should we use?

  5. Initial Assumption: Linearly Separable Data

  6. Linear Separating Hyper-Planes

  7. Linear Separating Hyper-Planes
     • Given data:
     • Finding a separating hyperplane can be posed as a constraint satisfaction problem (CSP):
     • Or, equivalently:
     • If the data are linearly separable, infinitely many hyperplanes satisfy this CSP
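
The constraint satisfaction view can be checked directly. The sketch below assumes the usual notation, which may differ from the slides: a hyperplane is a weight vector `w` with offset `b`, labels are +1 or -1, and the CSP holds when `y_i * (w . x_i + b) > 0` for every training point. The toy data and the helper name `satisfies_csp` are made up for illustration.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def satisfies_csp(w, b, X, y):
    """True if (w, b) puts every point strictly on the side of its label."""
    return all(yi * (dot(w, xi) + b) > 0 for xi, yi in zip(X, y))

# Hypothetical, linearly separable toy data (labels are +1 / -1).
X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -2.0), (-2.0, -1.0)]
y = [+1, +1, -1, -1]

# Two different hyperplanes that both satisfy the CSP.
ok_a = satisfies_csp((1.0, 1.0), 0.0, X, y)
ok_b = satisfies_csp((1.0, 0.2), 0.1, X, y)
# A hyperplane that does not separate the data.
bad = satisfies_csp((-1.0, 1.0), 0.0, X, y)
print(ok_a, ok_b, bad)
```

Because more than one hyperplane passes the check, the CSP alone does not say which one to use; that is the question the margin answers.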

  8. The Margin of a Classifier
     • Take any hyper-plane (P0) that separates the data
     • Put a parallel hyper-plane (P1) on a point in class 1 closest to P0
     • Put a second parallel hyper-plane (P2) on a point in class -1 closest to P0
     • The margin (M) is the perpendicular distance between P1 and P2

  9. Calculating the Margin of a Classifier
     • P0: Any separating hyperplane
     • P1: Parallel to P0, passing through the closest point in one class
     • P2: Parallel to P0, passing through the closest point in the opposite class
     Margin (M): distance measured along a line perpendicular to P1 and P2

  10. SVM Constraints on the Model Parameters
     Model parameters must be chosen such that, for points on P1 and for points on P2:
     For any P0, these constraints are always attainable.
     Given the above, the linear separating boundary lies halfway between P1 and P2 and is given by:
     Resulting classifier:
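
In the standard (canonical) formulation these statements can be written out as below. The notation is an assumption and may differ from the slides: $\mathbf{w}$ is the weight vector and $b$ the offset.

```latex
\begin{align*}
\mathbf{w}\cdot\mathbf{x} + b &= +1 && \text{for } \mathbf{x} \text{ on } P_1,\\
\mathbf{w}\cdot\mathbf{x} + b &= -1 && \text{for } \mathbf{x} \text{ on } P_2,\\
\mathbf{w}\cdot\mathbf{x} + b &= 0  && \text{(boundary, halfway between } P_1 \text{ and } P_2\text{)},\\
f(\mathbf{x}) &= \operatorname{sign}(\mathbf{w}\cdot\mathbf{x} + b) && \text{(resulting classifier)}.
\end{align*}
```

The constraints are attainable for any separating P0 because shifting $b$ and rescaling $\mathbf{w}$ together does not change the direction of the hyperplane.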

  11. Remember: signed distance from a point to a hyperplane:
     Hyperplane defined by:
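
A minimal sketch of the signed distance, assuming the hyperplane is written as `w . x + b = 0` (the symbols `w` and `b` are assumed notation) so that the signed distance of a point `x` is `(w . x + b) / ||w||`. The example hyperplane is hypothetical.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("w must be non-zero to define a hyperplane")
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical hyperplane 3*x1 + 4*x2 - 5 = 0, so ||w|| = 5.
w, b = (3.0, 4.0), -5.0
d_pos = signed_distance(w, b, (3.0, 4.0))   # (9 + 16 - 5) / 5 = 4.0
d_neg = signed_distance(w, b, (0.0, 0.0))   # -5 / 5 = -1.0
print(d_pos, d_neg)
```

The sign tells which side of the hyperplane the point lies on; the magnitude is the perpendicular distance.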

  12. Calculating the Margin (1)

  13. Calculating the Margin (2)
     Signed distance
     Take the absolute value to get the unsigned margin:
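
One way to carry out this calculation, under the assumed canonical form in which $P_1$ satisfies $\mathbf{w}\cdot\mathbf{x}+b=+1$ and $P_2$ satisfies $\mathbf{w}\cdot\mathbf{x}+b=-1$ (assumed notation, not necessarily the slides'): take a point $\mathbf{x}_2$ on $P_2$ and compute its signed distance to $P_1$, which is the hyperplane $\mathbf{w}\cdot\mathbf{x} + (b-1) = 0$.

```latex
\begin{align*}
d(\mathbf{x}_2, P_1)
  &= \frac{\mathbf{w}\cdot\mathbf{x}_2 + (b - 1)}{\lVert\mathbf{w}\rVert}
   = \frac{(\mathbf{w}\cdot\mathbf{x}_2 + b) - 1}{\lVert\mathbf{w}\rVert}
   = \frac{-1 - 1}{\lVert\mathbf{w}\rVert}
   = -\frac{2}{\lVert\mathbf{w}\rVert},\\[4pt]
M &= \lvert d(\mathbf{x}_2, P_1) \rvert = \frac{2}{\lVert\mathbf{w}\rVert}.
\end{align*}
```

A larger margin therefore corresponds to a smaller $\lVert\mathbf{w}\rVert$.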

  14. Different P0’s have Different Margins
     • P0: Any separating hyperplane
     • P1: Parallel to P0, passing through the closest point in one class
     • P2: Parallel to P0, passing through the closest point in the opposite class
     Margin (M): distance measured along a line perpendicular to P1 and P2

  17. How Do SVMs Choose the Optimal Separating Hyperplane (boundary)?
     • Find the hyperplane that maximizes the margin!
     Margin (M): distance measured along a line perpendicular to P1 and P2

  18. SVM: Constraint Optimization Problem
     • Given data:
     • Minimize subject to:
     The Lagrange function formulation is used to solve this minimization problem
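
In the standard hard-margin formulation the problem reads as follows. The notation is an assumption: $N$ training pairs, inputs $\mathbf{x}_i \in \mathbb{R}^d$, labels $y_i \in \{-1,+1\}$, weight vector $\mathbf{w}$, offset $b$.

```latex
\begin{align*}
\text{Given data: } & \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}, \quad
  \mathbf{x}_i \in \mathbb{R}^d, \; y_i \in \{-1, +1\},\\[4pt]
\text{minimize } & \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
  \quad \text{over } \mathbf{w},\, b\\
\text{subject to } & y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,
  \qquad i = 1, \dots, N.
\end{align*}
```

Minimizing $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$ is equivalent to maximizing a margin of the form $2/\lVert\mathbf{w}\rVert$, and the squared norm makes the objective a convex quadratic.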

  19. The Lagrange Function Formulation
     For every constraint we introduce a Lagrange multiplier:
     The Lagrangian is then defined by:
     Where
     - the primal variables are
     - the dual variables are
     Goal: minimize the Lagrangian w.r.t. the primal variables, and maximize it w.r.t. the dual variables

  20. Derivation of the Dual Problem
     • At the saddle point (extremum w.r.t. primal)
     • This gives the conditions
     • Substitute into the Lagrangian to get the dual problem
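
A worked version of these steps for the hard-margin case, in assumed standard notation: primal variables $\mathbf{w}, b$, and one multiplier $\alpha_i \ge 0$ per constraint $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$.

```latex
\begin{align*}
L(\mathbf{w}, b, \boldsymbol{\alpha})
  &= \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
   - \sum_{i=1}^{N} \alpha_i \bigl[\, y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \,\bigr],
   \qquad \alpha_i \ge 0,\\[4pt]
\frac{\partial L}{\partial b} = 0
  &\;\Longrightarrow\; \sum_{i=1}^{N} \alpha_i y_i = 0,\\
\frac{\partial L}{\partial \mathbf{w}} = 0
  &\;\Longrightarrow\; \mathbf{w} = \sum_{i=1}^{N} \alpha_i y_i \mathbf{x}_i,\\[4pt]
W(\boldsymbol{\alpha})
  &= \sum_{i=1}^{N} \alpha_i
   - \tfrac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
     \alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j).
\end{align*}
```

The last line follows by substituting both conditions into $L$: the term in $b$ vanishes, and the quadratic terms combine to $-\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$.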

  21. Using the Lagrange Function Formulation, we get the Dual Problem
     • Maximize
     • Subject to
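
To make the dual concrete, the sketch below evaluates the standard hard-margin dual objective, `W(alpha) = sum_i alpha_i - 1/2 * sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)` subject to `alpha_i >= 0` and `sum_i alpha_i y_i = 0`, on a hypothetical two-point data set. The notation and the data are assumptions; the code only evaluates the objective at a known solution rather than running a quadratic programming solver.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_objective(alpha, X, y):
    """W(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def is_dual_feasible(alpha, y, tol=1e-12):
    """Check alpha_i >= 0 for all i and sum_i alpha_i y_i = 0."""
    return all(a >= 0 for a in alpha) and abs(dot(alpha, y)) <= tol

# Hypothetical data: one point per class, symmetric about the origin.
X = [(1.0, 1.0), (-1.0, -1.0)]
y = [+1, -1]

# With alpha_1 = alpha_2 = a the objective is 2a - 4a^2, maximized at a = 0.25.
alpha = [0.25, 0.25]
w = (0.5, 0.5)  # w = sum_i alpha_i y_i x_i for this alpha

feasible = is_dual_feasible(alpha, y)
dual_value = dual_objective(alpha, X, y)
primal_value = 0.5 * dot(w, w)
print(feasible, dual_value, primal_value)
```

At the optimum the dual value equals the primal value `1/2 ||w||^2`, which is what lets the dual stand in for the original problem.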

  22. Properties of the Dual Problem
     • Solving the dual gives a solution to the original constraint optimization problem
     • For SVMs, the dual problem is a quadratic optimization problem, which has a globally optimal solution
     • Gives insights into the non-linear formulation for SVMs

  23. Support Vector Expansion (1)
     OR
     The offset is also computed from the optimal dual variables

  24. Support Vector Expansion (2)
     Substitute
     OR
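
A minimal sketch of the support vector expansion, assuming the standard relations `w = sum_i alpha_i y_i x_i` and, for any support vector `s` (a point with `alpha_s > 0`), `b = y_s - w . x_s`. The data, the dual values, and all function names are hypothetical; the `alpha` values are taken as given rather than computed by a solver.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def weights_from_dual(alpha, X, y):
    """w = sum_i alpha_i y_i x_i; points with alpha_i = 0 contribute nothing."""
    dim = len(X[0])
    return tuple(sum(a * yi * xi[d] for a, yi, xi in zip(alpha, y, X))
                 for d in range(dim))

def offset_from_support_vector(alpha, X, y, w):
    """A support vector s satisfies y_s (w . x_s + b) = 1, so b = y_s - w . x_s."""
    s = next(i for i, a in enumerate(alpha) if a > 0)
    return y[s] - dot(w, X[s])

def classify(x, alpha, X, y, b):
    """f(x) = sign(sum_i alpha_i y_i (x_i . x) + b), using dot products only."""
    score = sum(a * yi * dot(xi, x) for a, yi, xi in zip(alpha, y, X)) + b
    return 1 if score >= 0 else -1

# Hypothetical data: the third point lies outside the margin, so its alpha is 0.
X = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0)]
y = [+1, -1, +1]
alpha = [0.25, 0.25, 0.0]

w = weights_from_dual(alpha, X, y)
b = offset_from_support_vector(alpha, X, y, w)
label_a = classify((2.0, 0.5), alpha, X, y, b)
label_b = classify((-0.5, -2.0), alpha, X, y, b)
print(w, b, label_a, label_b)
```

Dropping the third point would leave `w` and `b` unchanged, since only the points with non-zero `alpha` enter the sums.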

  25. What are the Support Vectors?
     Maximized margin

  26. Why do we want a model with only a few SVs?
     • Leaving out an example that does not become an SV gives the same solution!
     • Theorem (Vapnik and Chervonenkis, 1974): Let #SV(N) be the number of SVs obtained by training on N examples randomly drawn from P(X,Y), and let E denote expectation. Then E[Prob(test error)] ≤ E[#SV(N)] / N

  27. What Happens When Data is Not Separable: Soft Margin SVM
     Add a slack variable

  28. Soft Margin SVM: Constraint Optimization Problem
     • Given data:
     • Minimize subject to:
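
The sketch below illustrates the slack variables, assuming the standard soft-margin form: minimize `1/2 ||w||^2 + C * sum_i xi_i` subject to `y_i (w . x_i + b) >= 1 - xi_i` and `xi_i >= 0`. The trade-off constant `C`, the data, and the function names are assumptions. The code evaluates the objective for a fixed hyperplane and does not optimize it.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slacks(w, b, X, y):
    """Smallest feasible slack per point: xi_i = max(0, 1 - y_i (w . x_i + b))."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    """1/2 ||w||^2 + C * sum_i xi_i, using the smallest feasible slacks."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

# Hypothetical, non-separable data: the last point sits inside the other class.
X = [(2.0, 2.0), (-2.0, -2.0), (0.5, 0.5), (1.0, 1.0)]
y = [+1, -1, +1, -1]
w, b = (0.5, 0.5), 0.0

xi = slacks(w, b, X, y)                               # [0.0, 0.0, 0.5, 2.0]
objective = soft_margin_objective(w, b, X, y, C=1.0)  # 0.25 + 1.0 * 2.5 = 2.75
print(xi, objective)
```

A slack between 0 and 1 marks a point inside the margin but still on the correct side; a slack above 1 marks a misclassified point. Larger `C` penalizes both more heavily.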

  29. Dual Problem (Non-separable data)
     • Maximize
     • Subject to
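
In the standard soft-margin formulation with a linear penalty $C\sum_i \xi_i$ on the slacks (an assumption about the form used in the lecture, with assumed symbols $\alpha_i$ and $C$), the dual keeps the same objective as the separable case and only gains an upper bound on each multiplier:

```latex
\begin{align*}
\text{maximize } \quad & W(\boldsymbol{\alpha})
  = \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha_i \alpha_j\, y_i y_j\, (\mathbf{x}_i \cdot \mathbf{x}_j)\\
\text{subject to } \quad & 0 \le \alpha_i \le C, \qquad i = 1, \dots, N,\\
& \sum_{i=1}^{N} \alpha_i y_i = 0.
\end{align*}
```

The slack variables do not appear in the dual; their effect is captured entirely by the box constraint $\alpha_i \le C$.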

  30. Same Decision Boundary

  31. Mapping into Nonlinear Space
     Goal: Data is linearly separable (or almost) in the nonlinear space.

  32. Nonlinear SVMs
     • KEY IDEA: Note that both the decision boundary and the dual optimization formulation use dot products in input space only!

  33. Kernel Trick
     Replace the input-space dot product with an inner product in the nonlinear (kernel) space
     Can use the same algorithms in nonlinear kernel space!
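
A small check of the idea, using a textbook example that is not taken from the slides: for 2-D inputs, the degree-2 polynomial kernel `k(x, z) = (x . z)^2` equals an ordinary dot product after the explicit feature map `phi(x) = (x1^2, sqrt(2) x1 x2, x2^2)`. The function names and test points are made up for illustration.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit feature map for 2-D inputs: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, z)        # computed in the 2-D input space
via_features = dot(phi(x), phi(z))     # computed in the 3-D feature space
print(via_kernel, via_features)        # equal up to floating-point rounding
```

The kernel gives the feature-space inner product without ever constructing `phi`, which matters when the feature space is very high dimensional.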

  34. Nonlinear SVMs
     Maximize:
     Boundary:
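
With the dot products replaced by a kernel $k$, the standard soft-margin dual and decision function take the form below. The notation ($\alpha_i$, $C$, $b$, $k$) is an assumption and may differ from the slides.

```latex
\begin{align*}
\text{Maximize: } \quad & W(\boldsymbol{\alpha})
  = \sum_{i=1}^{N} \alpha_i
  - \tfrac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N}
    \alpha_i \alpha_j\, y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)\\
\text{subject to } \quad & 0 \le \alpha_i \le C, \qquad
  \sum_{i=1}^{N} \alpha_i y_i = 0,\\[4pt]
\text{Boundary: } \quad & f(\mathbf{x})
  = \operatorname{sign}\!\Bigl( \sum_{i=1}^{N} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b \Bigr).
\end{align*}
```

The boundary is linear in the kernel space but generally non-linear in the input space.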

  35. Need Mercer Kernels

  36. Gram (Kernel) Matrix
     Training data:
     • Properties:
       • Positive definite matrix
       • Symmetric
       • Positive on diagonal
       • N by N
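
A minimal sketch of building a Gram matrix and checking the listed properties, assuming the usual definition `K[i][j] = k(x_i, x_j)` and using a Gaussian kernel with a hypothetical width `sigma`. The data and helper names are made up; the positive-definiteness check only tests a few vectors `c` for `c^T K c >= 0` and is not a proof.

```python
import math

def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    """N-by-N matrix K with K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def quadratic_form(K, c):
    """c^T K c, which is non-negative for every c when K comes from a Mercer kernel."""
    n = len(K)
    return sum(c[i] * c[j] * K[i][j] for i in range(n) for j in range(n))

# Hypothetical training inputs.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(X, gaussian_kernel)
n = len(X)

is_n_by_n = len(K) == n and all(len(row) == n for row in K)
is_symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))
diag_positive = all(K[i][i] > 0 for i in range(n))
spot_checks = [quadratic_form(K, c) >= 0
               for c in [(1.0, -1.0, 0.0), (1.0, 1.0, 1.0), (-2.0, 0.5, 1.0)]]
print(is_n_by_n, is_symmetric, diag_positive, all(spot_checks))
```

For the Gaussian kernel every diagonal entry is exactly 1, since the distance from a point to itself is zero.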

  37. Commonly Used Mercer Kernels
     • Polynomial
     • Sigmoid
     • Gaussian
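
The usual textbook forms of these kernels are given below. The parameter names ($c$, $p$, $\kappa$, $\theta$, $\sigma$) are assumptions and may differ from the slides.

```latex
\begin{align*}
\text{Polynomial: } \quad & k(\mathbf{x}, \mathbf{z}) = (\mathbf{x}\cdot\mathbf{z} + c)^{p},
  \qquad c \ge 0,\; p \in \{1, 2, \dots\},\\
\text{Sigmoid: } \quad & k(\mathbf{x}, \mathbf{z}) = \tanh(\kappa\, \mathbf{x}\cdot\mathbf{z} + \theta),\\
\text{Gaussian: } \quad & k(\mathbf{x}, \mathbf{z})
  = \exp\!\Bigl( -\frac{\lVert \mathbf{x} - \mathbf{z} \rVert^{2}}{2\sigma^{2}} \Bigr),
  \qquad \sigma > 0.
\end{align*}
```

The sigmoid form satisfies Mercer's condition only for some choices of $\kappa$ and $\theta$, so it is worth checking the resulting Gram matrix when using it.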

  38. Why these kernels?
     • There are infinitely many kernels
     • The best kernel is data set dependent
     • We can only know which kernels are good by trying them and estimating error rates on future data
     • Definition: a universal approximator is a mapping that can model any surface arbitrarily well (i.e. many to one mapping)
     • Motivation for the most commonly used kernels
       • Polynomials (given enough terms) are universal approximators
       • However, polynomial kernels are not universal approximators because they cannot represent all polynomial interactions
       • Sigmoid functions (given enough training examples) are universal approximators
       • Gaussian kernels (given enough training examples) are universal approximators
     • These kernels have been shown to produce good models in practice

  39. Picking a Model (A Kernel for SVMs)?
     • How do you pick the kernels?
       • Kernel parameters
       • These are called learning parameters or hyperparameters
     • Two approaches to choosing learning parameters
       • Bayesian
         • Learning parameters must maximize probability of correct classification on future data based on prior biases
       • Frequentist
         • Use the training data to learn the model parameters
         • Use validation data to pick the best hyperparameters
     • More on learning parameter selection later
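
The validation-based (frequentist) recipe can be sketched as follows. To stay short and dependency-free, the sketch does not train an SVM: it uses a stand-in classifier that sums label-weighted Gaussian kernel values over the training points, with the kernel width `sigma` as the hyperparameter. The data, candidate values, and all names are hypothetical.

```python
import math

def gaussian_kernel(x, z, sigma):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq / (2.0 * sigma ** 2))

def fit(train, sigma):
    """Stand-in 'training': keep the data and the kernel width (not an SVM solver)."""
    return (train, sigma)

def predict(model, x):
    """Sign of the label-weighted sum of kernel values (ties go to +1)."""
    train, sigma = model
    score = sum(label * gaussian_kernel(point, x, sigma) for point, label in train)
    return 1 if score >= 0 else -1

def error_rate(model, data):
    """Fraction of examples in data that the model misclassifies."""
    wrong = sum(1 for x, label in data if predict(model, x) != label)
    return wrong / len(data)

def select_sigma(candidates, train, validation):
    """Fit on the training split; keep the sigma with the lowest validation error."""
    scored = [(error_rate(fit(train, s), validation), s) for s in candidates]
    err, sigma = min(scored)  # ties are broken in favor of the smaller sigma
    return sigma, err

# Hypothetical 1-D data: negatives on the left, positives on the right.
train = [((0.0,), -1), ((1.0,), -1), ((3.0,), 1), ((4.0,), 1)]
validation = [((0.5,), -1), ((1.5,), -1), ((2.5,), 1), ((3.5,), 1)]
candidates = [0.01, 0.5, 5.0]

best_sigma, best_err = select_sigma(candidates, train, validation)
print(best_sigma, best_err)
```

The validation split plays the role of "future data": the hyperparameter is never chosen by its error on the examples used for fitting.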

  43. Some SVM Software
     • LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm/
     • SVM Light: http://svmlight.joachims.org/
     • TinySVM: http://chasen.org/~taku/software/TinySVM/
     • WEKA: http://www.cs.waikato.ac.nz/ml/weka/ (has many ML algorithm implementations in Java)

  44. MNIST: An SVM Success Story
     • Handwritten character benchmark
     • 60,000 training and 10,000 test examples
     • Dimension d = 28 x 28

  45. Results on Test Data
     SVM used a polynomial kernel of degree 9.

  46. SVM (Kernel) Model Structure