
Support Vector Machines and Kernel Methods (a very short introduction)






  1. Support Vector Machines and Kernel Methods (a very short introduction). Laurent Orseau, AgroParisTech, laurent.orseau@agroparistech.fr, based on slides by Antoine Cornuéjols.

  2. Introduction
  • Linear separation is well understood
    • Efficient algorithms (quadratic optimization problem)
  • Non-linear separation
    • More difficult
    • Neural networks: mostly heuristic, prone to local minima
  • Support Vector Machines
    • Use linear separation methods for non-linear separation, in an optimal way

  3. SVM: How it works
  • We want a non-linear separation in the input space
    • Called the "primal" representation of the problem
    • Difficult to do as is
  • Feature space
    • Idea: project the input space into a higher-dimensional space where the separation is linear
    • = non-linear separation in the input space
    • Still difficult: complexity depends on the number of dimensions
  • Kernel trick
    • Make complexity depend on the number of examples instead
    • Use a "dual" representation of the problem (see the sketch below)
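To make the "complexity depends on the number of examples" point concrete, here is a minimal sketch that is not part of the original slides (it assumes NumPy and uses a degree-2 polynomial kernel purely as an example): the dual problem only ever needs the n × n matrix of kernel values between training examples, however large the implicit feature space is.

```python
import numpy as np

# Hypothetical toy data: n examples in a d-dimensional input space.
rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))

# Degree-2 polynomial kernel K(x, x') = (1 + x.x')^2, evaluated directly
# in the input space. The dual ("kernelized") problem only needs this
# n x n Gram matrix, regardless of the dimension of the implicit feature space.
K = (1.0 + X @ X.T) ** 2
print(K.shape)  # (5, 5): grows with the number of examples, not with dimensions
```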

  4. SVM: Properties
  • Global optimum!
    • No local minimum
  • Fast!
    • Quadratic optimization
  • Can safely replace 3-layer perceptrons
  • Kernels
    • Generic kernels exist for solving a wide class of problems
    • Kernels can be combined to create new kernels (see the sketch below)
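As an illustration of the last point, a small sketch of my own (assuming scikit-learn is available): sums and products of valid kernels are again valid kernels, and a precomputed Gram matrix can be handed directly to an SVM.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel
from sklearn.svm import SVC

# Hypothetical toy data: two classes defined by the sign of x1 * x2.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# Sums (and products) of valid kernels are again valid kernels,
# so new kernels can be assembled from generic building blocks.
K = rbf_kernel(X, gamma=0.5) + polynomial_kernel(X, degree=2, coef0=1)

# An SVM can be trained directly on the precomputed Gram matrix.
clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))
```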

  5. Illustration: polynomial separation of class 1 and class 2 on a one-dimensional axis; the points {x=2, x=5, x=6} are support vectors.

  6. Projection in higher dimension
  • Higher dimension means that the problem is reformulated in another description space, so as to express a solution more succinctly
  • Can turn a combinatorial explosion into a polynomial expression
    • Ex: parity, majority, …
  • More succinct solutions generalize better!

  7. Linear separation in the feature space

  8. Kernels
  • Kernels are scalar product functions of two input examples in the feature space
    • K(xi, xj) = Φ(xi) · Φ(xj)
  • But no need to actually enter the full feature space!
    • The scalar product is sufficient
    • = Kernel trick (see the sketch below)
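A minimal sketch of that idea (my addition, plain NumPy): the Gaussian (RBF) kernel corresponds to a scalar product in an infinite-dimensional feature space that can never be constructed explicitly; the kernel value is the only quantity the learning algorithm ever needs.

```python
import numpy as np

def rbf_kernel_value(x, x_prime, gamma=1.0):
    """Gaussian (RBF) kernel: the scalar product of the images of x and x'
    in an (infinite-dimensional) feature space that is never built explicitly."""
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

xi = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])
print(rbf_kernel_value(xi, xj))  # all the algorithm needs: K(xi, xj)
```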

  9. SVM: What it does
  • Optimal linear separation
    • Maximizes the margin between the positive/negative example sets
  • Margin defined by the closest points/examples
    • Only a small number of examples
    • Examples "supporting" the margin are called Support Vectors
  • Kernel: scalar product between an example to classify and the support vectors (see the sketch below)
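To make the last point concrete, here is a small sketch of my own (assuming scikit-learn and synthetic data): after training, the decision function only involves kernel values between the new example and the support vectors, D(x) = Σᵢ αᵢ yᵢ K(x, xᵢ) + b.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Hypothetical two-class toy data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# The decision function D(x) = sum_i alpha_i y_i K(x, x_i) + b runs only
# over the support vectors; dual_coef_ stores the products alpha_i * y_i.
x_new = np.array([[0.5, 0.3]])
K = rbf_kernel(x_new, clf.support_vectors_, gamma=0.5)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_
print(np.allclose(manual, clf.decision_function(x_new)))  # True
```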

  10. Illustration: the XOR case

  11. Hyperplane of widest margin

  12. Illustration: the XOR case
  • Polynomial kernel function of degree 2: K(x, x') = (1 + xᵀx')²
  • Expanded: K(x, xᵢ) = 1 + x₁²xᵢ₁² + 2x₁x₂xᵢ₁xᵢ₂ + x₂²xᵢ₂² + 2x₁xᵢ₁ + 2x₂xᵢ₂
  • Corresponding to the projection into the feature space F: Φ(x) = [1, x₁², √2·x₁x₂, x₂², √2·x₁, √2·x₂]ᵀ

  13. Illustration: the XOR case
  • Separation in input space: D(x) = −x₁x₂
  • Separation in feature space Φ(x): linear, in the 6-dimensional space (see the sketch below)
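A short sketch tying slides 12 and 13 together (my own verification code, assuming NumPy and scikit-learn): the scalar product of the explicit 6-dimensional feature vectors equals the degree-2 polynomial kernel evaluated in the 2-D input space, and an SVM with that kernel separates the four XOR points.

```python
import numpy as np
from sklearn.svm import SVC

# The four XOR points and their labels.
X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([-1, 1, 1, -1])

def phi(x):
    """Explicit degree-2 feature map: [1, x1^2, sqrt(2) x1 x2, x2^2, sqrt(2) x1, sqrt(2) x2]."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

# Kernel trick: the scalar product in the 6-D feature space equals the
# polynomial kernel (1 + x.x')^2 computed directly in the input space.
for xi in X:
    for xj in X:
        assert np.isclose(phi(xi) @ phi(xj), (1.0 + xi @ xj) ** 2)

# With this kernel, XOR becomes linearly separable in the feature space;
# in scikit-learn, kernel="poly" with gamma=1, coef0=1, degree=2 is (1 + x.x')^2.
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)
print(clf.predict(X))     # [-1  1  1 -1]
print(len(clf.support_))  # 4: all four points support the margin
```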

  14. Applications
  • Text categorization
  • Recognition of handwritten characters
  • Face detection
  • Breast cancer diagnosis
  • Protein classification
  • Electricity consumption forecasting
  • …
  • Trained SVM classifiers for pedestrian and face object detection (Papageorgiou, Oren, Osuna and Poggio, 1998)

  15. SVM: Limits
  • Can only separate 2 classes
    • For multi-class, must combine several binary classifiers, e.g. one per pair of classes (see the sketch below)
  • Must choose the kernel carefully
    • Not always easy
    • Overfitting problem
  • Limits to compactness
    • There is a limit to the redescription capacity of SVMs
    • Only 1 projection phase
  • Deep Belief Networks
    • Can represent some solutions more compactly than SVMs
    • Similar to an MLP with more than one hidden layer
    • Each layer is a projection of the previous feature space into a higher space
    • Can represent even more compact solutions and find interesting intermediate representations
    • Learning?
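For the first limitation, a minimal sketch of my own (assuming scikit-learn and the Iris dataset): a single SVM is binary, so multi-class problems are handled by combining several binary SVMs, either one per pair of classes (one-vs-one) or one per class against all the others (one-vs-rest).

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # 3 classes, so one binary SVM is not enough

# One-vs-one: one binary SVM per pair of classes (3 classifiers here).
ovo = OneVsOneClassifier(SVC(kernel="rbf", gamma=0.5)).fit(X, y)

# One-vs-rest: one binary SVM per class against all the others (3 classifiers).
ovr = OneVsRestClassifier(SVC(kernel="rbf", gamma=0.5)).fit(X, y)

print(ovo.score(X, y), ovr.score(X, y))
```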
