
Pattern Recognition and Machine Learning


Presentation Transcript


  1. Pattern Recognition and Machine Learning, Chapter 7: Sparse Kernel Machines

  2. Outline The problem: finding a sparse decision (and regression) machine that uses kernels. The solution: Support Vector Machines (SVMs) and Relevance Vector Machines (RVMs). The core ideas behind the solutions. The mathematical details.

  3. The problem (1) Methods introduced in Chapters 3 and 4 take into account all data points in the training set -> cumbersome; they also do not take advantage of kernel methods -> the basis functions have to be explicit. Examples: least squares and logistic regression.

  4. The problem (2) Kernel methods require evaluation of the kernel function for all pairs of training points -> cumbersome (see the sketch below).
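
As a rough illustration of this cost, the sketch below (assuming a Gaussian kernel and synthetic data, neither of which is specified in the slides) builds the full N x N Gram matrix; both time and memory grow quadratically with the number of training points.

```python
# Minimal sketch: naive kernel methods need one kernel evaluation per pair
# of training points, i.e. N^2 evaluations and O(N^2) memory for the Gram matrix.
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))          # N = 500 synthetic training points
N = X.shape[0]

# Full N x N Gram matrix: every pair (x_n, x_m) must be evaluated.
K = np.array([[rbf_kernel(X[n], X[m]) for m in range(N)] for n in range(N)])
print(K.shape)                         # (500, 500) -> grows quadratically with N
```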

  5. The solution (1) Support vector machines (SVMs) are kernel machines that compute a decision boundary making sparse use of data points
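
A minimal sketch of this sparsity, using scikit-learn's SVC on synthetic data (neither appears in the slides): after training, the decision boundary depends only on the support vectors, a small subset of the training set.

```python
# Sketch: an SVM's decision boundary is determined by a sparse subset of the
# training data (the support vectors), not by all data points.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2, size=(200, 2)),
               rng.normal(loc=+2, size=(200, 2))])
t = np.array([0] * 200 + [1] * 200)

clf = SVC(kernel="rbf", C=1.0).fit(X, t)
print(len(X), "training points,",
      clf.support_vectors_.shape[0], "support vectors")
# Predictions only require kernel evaluations against the support vectors.
```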

  6. The solution (2) Relevance vector machines (RVMs) are kernel machines that compute a posterior class probability, making sparse use of data points

  7. The solution (3) SVMs as well as RVMs can also be used for regression; the RVM solution is typically even sparser than the SVM solution.

  8. SVM: The core idea (1) The class separator that maximizes the margin between itself and the nearest data points will have the smallest generalization error.
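
A small worked example of the margin being maximized, assuming a linearly separable toy set (made up here) and scikit-learn's linear SVC: the margin boundaries lie at w^T x + b = +-1, so the margin width is 2 / ||w||.

```python
# Sketch: read off the maximized margin width 2/||w|| from a trained linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
t = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, t)   # very large C ~ hard margin
w = clf.coef_[0]
print("margin width =", 2.0 / np.linalg.norm(w))
```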

  9. SVM: The core idea (2) In input space: (figure not reproduced in the transcript)

  10. SVM: The core idea (3) For regression: (figure not reproduced in the transcript)

  11. RVM: The core idea (1) Exclude basis vectors whose presence reduces the probability of the observed data

  12. RVM: The core idea (2) The same sparse kernel model is used for classification and regression. Regression: y(x) = Σ_n w_n k(x, x_n) + b. Classification: the same linear-in-the-weights model passed through a logistic sigmoid, y(x) = σ( Σ_n w_n k(x, x_n) + b ).
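
The RVM itself is not implemented below. As a rough stand-in for its evidence maximization, the sketch treats RVM regression as Bayesian linear regression with an ARD prior over kernel basis functions centred on the training points, using scikit-learn's ARDRegression; basis functions whose weight precision grows large are pruned, leaving a few "relevance vectors". The data, kernel width, and thresholds are illustrative assumptions.

```python
# Hedged sketch of the RVM idea for regression: kernel basis functions centred
# on the training points, with an ARD prior that drives most weights to zero.
# ARDRegression is a stand-in for the RVM's type-II maximum likelihood.
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
X = np.linspace(-5, 5, 100)[:, None]
t = np.sinc(X).ravel() + 0.1 * rng.normal(size=100)

# Design matrix Phi[n, m] = k(x_n, x_m) with a Gaussian kernel.
gamma = 0.5
Phi = np.exp(-gamma * (X - X.T) ** 2)

rvm = ARDRegression(threshold_lambda=1e4).fit(Phi, t)
relevant = np.flatnonzero(np.abs(rvm.coef_) > 1e-6)
print(len(X), "basis functions,", len(relevant), "relevance vectors kept")
```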

  13. SVM: The details (1) Equation of the decision surface: y(x) = w^T φ(x) + b = 0. Distance of a point x from the decision surface: |y(x)| / ||w||.

  14. SVM: The details (2) Distance of a correctly classified point from the decision surface: t_n y(x_n) / ||w|| = t_n ( w^T φ(x_n) + b ) / ||w||. Maximum margin solution: arg max_{w,b} { (1 / ||w||) min_n [ t_n ( w^T φ(x_n) + b ) ] }.

  15. SVM: The details (3) The distance t_n ( w^T φ(x_n) + b ) / ||w|| is unchanged if w and b are rescaled by the same factor. We therefore may rescale w and b such that t_n ( w^T φ(x_n) + b ) = 1 for the point closest to the surface.

  16. SVM: The details (4) Therefore, we can reduce the maximum-margin problem to minimizing (1/2) ||w||^2 under the constraints t_n ( w^T φ(x_n) + b ) ≥ 1 for all n.

  17. SVM: The details (5) To solve this, we introduce Lagrange multipliers a_n ≥ 0 and minimize (with respect to w and b) L(w, b, a) = (1/2) ||w||^2 − Σ_n a_n { t_n ( w^T φ(x_n) + b ) − 1 }. Equivalently, we can maximize the dual representation L~(a) = Σ_n a_n − (1/2) Σ_n Σ_m a_n a_m t_n t_m k(x_n, x_m), subject to a_n ≥ 0 and Σ_n a_n t_n = 0, where the kernel function k(x, x') = φ(x)^T φ(x') can be chosen without specifying φ explicitly.

  18. SVM: The details (6) Because of the constraint a_n { t_n y(x_n) − 1 } = 0, only those a_n survive (a_n > 0) for which x_n is on the margin, i.e. t_n y(x_n) = 1. This leads to sparsity.

  19. SVM: The details (7) Based on numerical optimization of the parameters a_n and b, predictions on new data points x can be made by evaluating the sign of y(x) = Σ_n a_n t_n k(x, x_n) + b (see the sketch below).
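
A minimal sketch of these two steps on a tiny separable toy set: the hard-margin dual is maximized with scipy's general-purpose SLSQP solver (a stand-in for a dedicated QP/SMO routine), b is recovered from the points on the margin, and new points are classified by the sign of y(x). The data and linear kernel are illustrative assumptions.

```python
# Hedged sketch: maximize the hard-margin dual, then predict with
# y(x) = sum_n a_n t_n k(x, x_n) + b.
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]])
t = np.array([-1.0, -1.0, 1.0, 1.0])           # targets in {-1, +1}
K = X @ X.T                                    # linear kernel k(x, x') = x^T x'
N = len(t)

def neg_dual(a):                               # maximize the dual <=> minimize its negative
    return -(a.sum() - 0.5 * (a * t) @ K @ (a * t))

res = minimize(neg_dual, np.zeros(N), method="SLSQP",
               bounds=[(0, None)] * N,                         # a_n >= 0
               constraints=[{"type": "eq", "fun": lambda a: a @ t}])  # sum_n a_n t_n = 0
a = res.x
sv = a > 1e-6                                  # support vectors: a_n > 0
b = np.mean(t[sv] - (a * t) @ K[:, sv])        # from t_n y(x_n) = 1 on the margin

def predict(x_new):
    return np.sign((a * t) @ (X @ x_new) + b)

print(predict(np.array([0.5, 0.5])), predict(np.array([3.0, 3.5])))
```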

  20. SVM: The details (8) In cases where the data points are not separable in feature space, we need a soft margin, i.e. a (limited) tolerance for misclassified points. To achieve this, we introduce slack variables ξ_n ≥ 0, with ξ_n = 0 for points on or inside the correct margin boundary and ξ_n = |t_n − y(x_n)| otherwise, and relax the constraints to t_n y(x_n) ≥ 1 − ξ_n.

  21. SVM: The details (9) Graphically: (figure not reproduced in the transcript)

  22. SVM: The details (10) The same procedure as before (with additional Lagrange multipliers and the corresponding box constraints 0 ≤ a_n ≤ C) again yields a sparse kernel-based solution y(x) = Σ_n a_n t_n k(x, x_n) + b.
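
A small sketch of the soft-margin trade-off, using scikit-learn's SVC on overlapping synthetic classes (both assumptions, not from the slides): smaller C tolerates more slack, which typically leaves more points on or inside the margin and hence more support vectors.

```python
# Sketch: the soft-margin parameter C controls how much slack is tolerated,
# which changes how many support vectors survive.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-1, size=(200, 2)),
               rng.normal(loc=+1, size=(200, 2))])   # overlapping classes
t = np.array([0] * 200 + [1] * 200)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, t)
    print(f"C={C:>6}: {clf.support_.size} support vectors")
```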

  23. SVM: The details (11) The soft-margin approach can be formulated as minimizing the regularized error function C Σ_n ξ_n + (1/2) ||w||^2. This formulation can be extended to use SVMs for regression: minimize C Σ_n ( ξ_n + ξ̂_n ) + (1/2) ||w||^2, where ξ_n ≥ 0 and ξ̂_n ≥ 0 are slack variables describing the position of a data point above or below a tube of width 2ϵ around the estimate y.

  24. SVM: The details (12) Graphically: (figure not reproduced in the transcript)

  25. SVM: The details (13) Again, optimization using Lagrange multipliers yields a sparse kernel-based solution y(x) = Σ_n ( a_n − â_n ) k(x, x_n) + b, in which only points on or outside the ϵ-tube have nonzero multipliers (see the sketch below).
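
A minimal sketch of this sparsity for regression, using scikit-learn's SVR on synthetic data (both assumptions): only training points on or outside the ϵ-tube become support vectors.

```python
# Sketch: support vector regression keeps only the points on or outside the
# epsilon-tube as support vectors, so the solution is sparse in the data.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)[:, None]
t = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.2).fit(X, t)
print(len(X), "training points,", svr.support_.size, "support vectors")
```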

  26. SVM: Limitations The output is a decision, not a posterior probability. Extension of classification to more than two classes is problematic. The parameters C and ϵ have to be found by methods such as cross-validation (see the sketch below). Kernel functions are required to be positive definite.
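
A short sketch of that last practical point, using scikit-learn's GridSearchCV with SVR on synthetic data; the grid values and data are illustrative, not prescribed by the slides.

```python
# Sketch: C and epsilon are not learned by the SVM itself; they are usually
# chosen by cross-validation, e.g. with a grid search.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)[:, None]
t = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

grid = GridSearchCV(SVR(kernel="rbf"),
                    param_grid={"C": [0.1, 1.0, 10.0],
                                "epsilon": [0.05, 0.1, 0.2]},
                    cv=5)
grid.fit(X, t)
print(grid.best_params_)
```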
