CS 59000 Statistical Machine Learning, Lecture 18

  1. CS 59000 Statistical Machine Learning, Lecture 18. Yuan (Alan) Qi, Purdue CS, Oct. 30, 2008

  2. Outline: Review of Support Vector Machines for the Linearly Separable Case; Support Vector Machines for Overlapping Class Distributions; Support Vector Machines for Regression

  3. Support Vector Machines. SVMs are maximum-margin classifiers motivated by statistical learning theory. Margin: the smallest distance between the decision boundary and any of the samples.
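
A sketch of the maximum-margin objective, assuming the linear model y(x) = w^T φ(x) + b and targets t_n ∈ {-1, +1} (notation as in Bishop's PRML Ch. 7, which these slides appear to follow):

\[
\arg\max_{w,\,b} \left\{ \frac{1}{\|w\|} \min_n \big[ t_n \big( w^\top \phi(x_n) + b \big) \big] \right\}
\]

The inner minimum is the distance from the boundary to the closest point; the outer maximization makes that distance as large as possible.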

  4. Maximizing the Margin. Since rescaling w and b together does not change the distance ratio above, we can fix the scale as shown below. Data points for which the equality holds are said to have active constraints, whereas for the remainder the constraints are said to be inactive.
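
Fixing the scale so that the point(s) nearest the boundary satisfy the equality gives the canonical representation:

\[
t_n \big( w^\top \phi(x_n) + b \big) = 1 \ \text{ for the closest point(s)},
\qquad
t_n \big( w^\top \phi(x_n) + b \big) \ge 1 \ \text{ for all } n.
\]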

  5. Optimization Problem. Maximizing the margin is equivalent to a quadratic program: minimize a quadratic objective subject to linear inequality constraints, as stated below.
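
The standard hard-margin quadratic program:

\[
\min_{w,\,b} \ \frac{1}{2} \|w\|^2
\quad \text{subject to} \quad
t_n \big( w^\top \phi(x_n) + b \big) \ge 1, \quad n = 1, \dots, N.
\]

Maximizing 1/\|w\| is equivalent to minimizing \|w\|^2, which makes the problem a QP with linear constraints.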

  6. Lagrange Multipliers. To maximize a function subject to an equality constraint, introduce a multiplier and require that the gradient of the objective be parallel to the gradient of the constraint, as sketched below.
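
The generic equality-constrained problem and its stationarity condition:

\[
\max_x f(x) \ \text{ subject to } \ g(x) = 0,
\qquad
L(x, \lambda) = f(x) + \lambda\, g(x),
\qquad
\nabla f(x) + \lambda\, \nabla g(x) = 0.
\]

Because \nabla g is orthogonal to the constraint surface, at a constrained optimum \nabla f must be (anti)parallel to \nabla g.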

  7. Geometrical Illustration of Lagrange Multiplier

  8. Lagrange Multiplier with Inequality Constraints
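
For an inequality constraint the multiplier must be nonnegative, and two cases arise:

\[
\max_x f(x) \ \text{ subject to } \ g(x) \ge 0:
\qquad
\begin{cases}
g(x) > 0 \ \text{(inactive)}: & \lambda = 0, \ \nabla f(x) = 0, \\
g(x) = 0 \ \text{(active)}: & \lambda > 0, \ \nabla f(x) = -\lambda\, \nabla g(x).
\end{cases}
\]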

  9. Karush-Kuhn-Tucker (KKT) conditions
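
The three KKT conditions for maximizing f(x) subject to g(x) ≥ 0:

\[
g(x) \ge 0, \qquad \lambda \ge 0, \qquad \lambda\, g(x) = 0.
\]

The last condition (complementary slackness) says the multiplier can be nonzero only where the constraint is active.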

  10. Lagrange Function for the SVM. Introducing one multiplier a_n ≥ 0 for each constraint of the quadratic program gives the Lagrange function below.
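
The resulting Lagrange function, minimized over (w, b) and maximized over the multipliers a_n ≥ 0:

\[
L(w, b, a) = \frac{1}{2} \|w\|^2
- \sum_{n=1}^{N} a_n \big\{ t_n \big( w^\top \phi(x_n) + b \big) - 1 \big\}.
\]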

  11. Dual Variables. Setting the derivatives of L with respect to w and b to zero yields the conditions below.
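
The two stationarity conditions:

\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n=1}^{N} a_n t_n \phi(x_n),
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0.
\]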

  12. Dual Problem
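
Substituting the stationarity conditions back into L eliminates w and b, giving the dual in terms of the kernel k(x, x') = φ(x)^T φ(x'):

\[
\tilde{L}(a) = \sum_{n=1}^{N} a_n
- \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m\, k(x_n, x_m),
\]

maximized subject to a_n ≥ 0 and \sum_n a_n t_n = 0.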

  13. Prediction
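
A new point is classified by the sign of the dual expansion:

\[
y(x) = \sum_{n=1}^{N} a_n t_n\, k(x, x_n) + b.
\]

Only the support vectors (a_n > 0) contribute, so prediction is sparse in the training data.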

  14. KKT Conditions, Support Vectors, and Bias. By the KKT conditions, each data point satisfies either a_n = 0 or t_n y(x_n) = 1; the data points in the latter case are known as support vectors. We can then solve for the bias term as shown below.
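
Averaging over the set S of support vectors gives a numerically stable estimate:

\[
b = \frac{1}{N_S} \sum_{n \in S} \Big( t_n - \sum_{m \in S} a_m t_m\, k(x_n, x_m) \Big).
\]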

  15. Computational Complexity. Quadratic programming: the primal has one variable per feature dimension, while the dual has one variable per data point, so when the dimension is smaller than the number of data points, solving the dual problem is more costly. The dual representation, however, allows the use of kernels.

  16. Example: SVM Classification

  17. Classification for Overlapping Classes. When the class distributions overlap, the hard-margin constraints are relaxed to a soft margin by introducing slack variables ξ_n ≥ 0, defined below.
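
A standard definition (one slack per data point, with y(x) = w^T φ(x) + b):

\[
\xi_n =
\begin{cases}
0, & t_n\, y(x_n) \ge 1, \\
|t_n - y(x_n)|, & \text{otherwise},
\end{cases}
\qquad \text{so that} \qquad
t_n\, y(x_n) \ge 1 - \xi_n.
\]

0 < ξ_n ≤ 1 means the point is inside the margin but correctly classified; ξ_n > 1 means it is misclassified.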

  18. New Cost Function. To maximize the margin while softly penalizing points that lie on the wrong side of the margin (not decision) boundary, we minimize the objective below.
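
The soft-margin objective, where the constant C > 0 trades margin width against slack:

\[
\min_{w,\,b,\,\xi} \ C \sum_{n=1}^{N} \xi_n + \frac{1}{2} \|w\|^2
\quad \text{subject to} \quad
t_n\, y(x_n) \ge 1 - \xi_n, \qquad \xi_n \ge 0.
\]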

  19. Lagrange Function. Introducing Lagrange multipliers a_n ≥ 0 for the margin constraints and μ_n ≥ 0 for the constraints ξ_n ≥ 0 gives the Lagrange function below.
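
\[
L = \frac{1}{2} \|w\|^2 + C \sum_{n=1}^{N} \xi_n
- \sum_{n=1}^{N} a_n \big\{ t_n\, y(x_n) - 1 + \xi_n \big\}
- \sum_{n=1}^{N} \mu_n \xi_n,
\qquad a_n \ge 0, \ \mu_n \ge 0.
\]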

  20. KKT Conditions
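
The KKT conditions, one set per data point:

\[
a_n \ge 0, \qquad
t_n\, y(x_n) - 1 + \xi_n \ge 0, \qquad
a_n \big( t_n\, y(x_n) - 1 + \xi_n \big) = 0,
\]
\[
\mu_n \ge 0, \qquad \xi_n \ge 0, \qquad \mu_n \xi_n = 0.
\]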

  21. Gradients
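
Setting the derivatives of L with respect to w, b, and ξ_n to zero:

\[
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{n=1}^{N} a_n t_n \phi(x_n),
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0,
\qquad
\frac{\partial L}{\partial \xi_n} = 0 \;\Rightarrow\; a_n = C - \mu_n.
\]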

  22. Dual Lagrangian. Substituting the stationarity conditions for w and a_n = C − μ_n into L eliminates w, b, and the slack variables; since μ_n ≥ 0, it also implies a_n ≤ C. The resulting dual is shown below.
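
The slacks and the μ_n drop out, leaving a dual identical in form to the separable case:

\[
\tilde{L}(a) = \sum_{n=1}^{N} a_n
- \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a_n a_m t_n t_m\, k(x_n, x_m).
\]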

  23. Dual Lagrangian with Constraints. The dual is maximized subject to the box and summation constraints below.
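
\[
0 \le a_n \le C, \qquad \sum_{n=1}^{N} a_n t_n = 0.
\]

The box constraint a_n ≤ C follows from a_n = C − μ_n with μ_n ≥ 0.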

  24. Support Vectors. Two cases of support vectors (points with a_n > 0, which therefore satisfy t_n y(x_n) = 1 − ξ_n): if a_n < C, then μ_n > 0 forces ξ_n = 0, so the point lies exactly on the margin; if a_n = C, then ξ_n may be positive, so the point lies inside the margin and is misclassified when ξ_n > 1.

  25. Solving for the Bias Term. Support vectors with 0 < a_n < C have ξ_n = 0 and hence lie exactly on the margin; averaging over them gives the estimate of b below.
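
Averaging over the set M = { n : 0 < a_n < C }:

\[
b = \frac{1}{N_M} \sum_{n \in M} \Big( t_n - \sum_{m \in S} a_m t_m\, k(x_n, x_m) \Big),
\]

where S is the full set of support vectors.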

  26. Interpretation from Regularization Framework
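
The soft-margin SVM is equivalent to regularized risk minimization with the hinge loss E_SV:

\[
\min_{w,\,b} \ \sum_{n=1}^{N} E_{\mathrm{SV}}\big( y(x_n)\, t_n \big) + \lambda \|w\|^2,
\qquad
E_{\mathrm{SV}}(z) = [\,1 - z\,]_+, \qquad \lambda = \frac{1}{2C},
\]

where [z]_+ denotes max(0, z).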

  27. Regularized Logistic Regression. For logistic regression, the per-point error can be written as a function of y_n t_n (below), which permits a direct comparison with the SVM hinge loss.
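
Written as a function of z = y t, the logistic error and its regularized objective:

\[
E_{\mathrm{LR}}(z) = \ln\big( 1 + e^{-z} \big),
\qquad
\min_{w,\,b} \ \sum_{n=1}^{N} E_{\mathrm{LR}}\big( y(x_n)\, t_n \big) + \lambda \|w\|^2.
\]

Unlike the hinge loss, E_LR is never exactly zero, so every data point contributes and the solution is not sparse.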

  28. Visualization of Hinge Error Function

  29. SVM for Regression. Using a sum-of-squares error with a quadratic regularizer gives ridge regression (below); however, the ridge solution is not sparse.
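
The regularized sum-of-squares (ridge) objective:

\[
\min_w \ \frac{1}{2} \sum_{n=1}^{N} \big( y(x_n) - t_n \big)^2 + \frac{\lambda}{2} \|w\|^2.
\]

The quadratic error is nonzero for every residual, so all training points enter the solution.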

  30. ε-insensitive Error Function. To obtain a sparse solution, replace the squared error with the ε-insensitive error, which is zero for residuals smaller than ε; we then minimize the regularized objective below.
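
The ε-insensitive error and the corresponding objective:

\[
E_\epsilon\big( y(x) - t \big) =
\begin{cases}
0, & |y(x) - t| < \epsilon, \\
|y(x) - t| - \epsilon, & \text{otherwise},
\end{cases}
\qquad
\min \ C \sum_{n=1}^{N} E_\epsilon\big( y(x_n) - t_n \big) + \frac{1}{2} \|w\|^2.
\]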

  31. Slack Variables. How many slack variables do we need? Two per data point: one for targets above the ε-tube and one for targets below it. We then minimize the objective below.
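
With ξ_n ≥ 0 for targets above the tube and \hat{ξ}_n ≥ 0 for targets below it:

\[
\min \ C \sum_{n=1}^{N} \big( \xi_n + \hat{\xi}_n \big) + \frac{1}{2} \|w\|^2
\quad \text{subject to} \quad
t_n \le y(x_n) + \epsilon + \xi_n,
\qquad
t_n \ge y(x_n) - \epsilon - \hat{\xi}_n.
\]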

  32. Visualization of SVM Regression

  33. Support Vectors for Regression. Which points will be support vectors for regression? Those lying on or outside the ε-tube: only there is a constraint active, so only those points have a nonzero Lagrange multiplier and contribute to the prediction.

  34. Sparsity Revisited. Discussion: sparsity can come from the error function (as with the hinge and ε-insensitive losses) or from the regularizer, as in the lasso (below).
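
The lasso places sparsity in the regularizer rather than the error function:

\[
\min_w \ \sum_{n=1}^{N} \big( t_n - w^\top \phi(x_n) \big)^2 + \lambda \sum_j |w_j|,
\]

where the L1 penalty drives some weights exactly to zero, analogous to how the hinge and ε-insensitive losses zero out Lagrange multipliers.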
