
Outline



Presentation Transcript


  1. Outline • Support Vector Machines • Linear SVM • Maximal Margin • Non-linear Case • Soft Margin • Kernel Tricks • Summary

  2. Just in Case • w is a vector orthogonal to the hyperplane • <w,x> is the length of the projection of x onto the direction of w, scaled by ||w||

  3. Linear Classification • Binary classification problem • The data above the red line belong to class ‘x’ • The data below the red line belong to class ‘o’ [Figure: scatter of ‘x’ and ‘o’ samples separated by a red line]

  4. Separating hyperplane • Samples are assumed to be linearly separable • Which one of the two candidate hyperplanes would you choose as the classifier? • The one that leaves more room to both classes, because it can be trusted more for unknown data

  5. Goal of SVM: Find Maximum Margin • Definition of margin • the minimum distance between a separating hyperplane and the sets of samples of either class

  6. Goal of SVM: Find Maximum Margin • Goal • Find a separating hyperplane with maximum margin

  7. SVM – Support Vector Machines [Figure: two panels, "Small Margin" and "Large Margin", with the support vectors marked]

  8. Why Maximum Margin?

  9. More • Minimize the risk of overfitting by choosing the maximal margin • Classification is less sensitive to the exact location of the training points • The generalization error of the hyperplane can be bounded by an expression depending on 1/margin^{2} • Related to injecting noise into the inputs for neural network learning • Robustness

  10. Hyperplanes

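  11. Calculate margin • A separating hyperplane: <w,x>+b = 0 • w and b are not uniquely determined, since scaling both by the same positive constant gives the same hyperplane • under the constraint min|<w,x>+b|=1 over the training samples, they are uniquely determined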

  12. Calculate margin • the distance between a point x and the hyperplane <w,x>+b = 0 is given by |<w,x>+b|/||w|| • thus, under the constraint min|<w,x>+b|=1, the margin is given by 1/||w||
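
The distance formula on this slide can be checked numerically. The following is a minimal sketch in plain Python; the function names and the sample values of w, b and x are made up for illustration and do not come from the slides.

```python
import math


def distance_to_hyperplane(w, b, x):
    """Distance from point x to the hyperplane <w, x> + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm


def canonical_margin(w):
    """Margin 1/||w||, valid only when min |<w, x> + b| = 1 over the samples."""
    return 1.0 / math.sqrt(sum(wi * wi for wi in w))


# Illustrative values (assumed, not from the slides).
w = [3.0, 4.0]
b = -5.0
d = distance_to_hyperplane(w, b, [3.0, 4.0])
margin = canonical_margin(w)
print(d, margin)
```

Note that `canonical_margin` is only meaningful once w and b have been rescaled to satisfy the normalization constraint; for an arbitrary scaling, the margin has to be computed as the minimum of `distance_to_hyperplane` over the training samples.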

  13. Optimization of margin • maximization of margin

  14. Optimization of margin • a separating hyperplane with maximal margin 1/||w|| is a separating hyperplane with minimum ||w|| • Therefore, we want to minimize ||w|| subject to the separation constraints • Don’t forget that we want to know w and b
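
Written out, the problem described on this slide is commonly stated as below. The class labels y_i in {+1, -1}, the sample index i and the sample count N are notation assumed here, since the slide's own formula is not in the transcript; squaring the norm and adding the factor 1/2 is a conventional choice that does not change the minimizer.

```latex
\min_{w,\,b} \; \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1,
\qquad i = 1, \dots, N
```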

  15. Lagrange Multiplier • An optimization problem under constraints can be solved by the method of Lagrange multipliers • The Lagrangian is obtained as follows: • for equality constraints • for inequality constraints

  16. Lagrange Multiplier • In our case • Inequality constraints

  17. Convex Optimization • an optimization problem is said to be convex iff the target (or cost) function as well as the constraints are convex • the optimization problem for SVM is convex • for a convex problem, every local optimum is a global optimum; if the cost function is strictly convex, the solution, if it exists, is unique • for a convex optimization problem with linear constraints, as in the SVM, the KKT (Karush-Kuhn-Tucker) conditions are necessary and sufficient for the solution

  18. KKT (Karush-Kuhn-Tucker) condition • KKT conditions • (1) The gradient of the Lagrangian with respect to the original variables is 0 • (2) The original constraints are satisfied • (3) The multipliers for inequality constraints are non-negative • (4) (Complementary slackness) the product of each multiplier and its constraint is equal to 0 • For convex optimization problems, conditions 1-4 are necessary and sufficient for the solution

  19. KKT condition for the optimization of margin • Recall (3.66) • KKT conditions: (3.62), (3.63), (3.64), (3.65)

  20. KKT condition for the optimization of margin • Combining (3.66) with (3.62) yields (3.67) and (3.68)
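
For reference, a sketch of the standard Lagrangian and KKT conditions for the maximum-margin problem. The multiplier symbol \lambda_i, the labels y_i in {+1, -1} and the offset symbol b are assumed notation. The formulas are deliberately left unnumbered, because the transcript does not show which formula carries which of the numbers cited on the slides.

```latex
% Lagrangian of the maximum-margin problem
L(w, b, \lambda) = \frac{1}{2}\lVert w \rVert^{2}
  - \sum_{i=1}^{N} \lambda_i \bigl[ y_i ( \langle w, x_i \rangle + b ) - 1 \bigr]

% KKT conditions
\begin{aligned}
\frac{\partial L}{\partial w} = 0
  \;&\Longrightarrow\; w = \sum_{i=1}^{N} \lambda_i y_i x_i \\
\frac{\partial L}{\partial b} = 0
  \;&\Longrightarrow\; \sum_{i=1}^{N} \lambda_i y_i = 0 \\
y_i ( \langle w, x_i \rangle + b ) - 1 \;&\ge\; 0, \qquad i = 1, \dots, N \\
\lambda_i \;&\ge\; 0, \qquad i = 1, \dots, N \\
\lambda_i \bigl[ y_i ( \langle w, x_i \rangle + b ) - 1 \bigr] \;&=\; 0, \qquad i = 1, \dots, N
\end{aligned}
```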

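  21. Remarks-support vector • The optimal solution w is a linear combination of the feature vectors which are associated with non-zero Lagrange multipliers • These feature vectors are the support vectors; by complementary slackness they lie on the margin, where |<w,x>+b| = 1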

  22. Remarks-support vector • The resulting hyperplane classifier is insensitive to the number and position of the non-support vectors

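  23. Remark-computation w0 • w0 (the offset, written b on the earlier slides) can be implicitly obtained from any of the complementary slackness conditions satisfying strict complementarity (i.e. with a non-zero multiplier) • In practice, w0 is computed as an average value obtained using all conditions of this type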
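
The two remarks above (w as a combination of support vectors, w0 as an average over the active conditions) can be sketched in plain Python. The toy data set and its multipliers are hand-derived assumptions for illustration, and the function name is hypothetical. Labels are taken to be +1 or -1, so each active condition y_i(<w,x_i> + w0) = 1 gives w0 = y_i - <w,x_i>.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def hyperplane_from_multipliers(X, y, lam, tol=1e-12):
    """Recover (w, w0) from Lagrange multipliers of the hard-margin problem."""
    dim = len(X[0])
    # w is the multiplier-weighted sum of the labelled feature vectors.
    w = [0.0] * dim
    for xi, yi, li in zip(X, y, lam):
        for k in range(dim):
            w[k] += li * yi * xi[k]
    # Support vectors are the samples with a non-zero multiplier.
    support = [i for i, li in enumerate(lam) if li > tol]
    # Average w0 over all active conditions y_i (<w, x_i> + w0) = 1.
    w0 = sum(y[i] - dot(w, X[i]) for i in support) / len(support)
    return w, w0, support


# Toy data (assumed): two support vectors and one point far from the margin.
X = [[1.0, 1.0], [-1.0, -1.0], [2.0, 2.0]]
y = [1, -1, 1]
lam = [0.25, 0.25, 0.0]  # hand-derived multipliers for this toy set

w, w0, support = hyperplane_from_multipliers(X, y, lam)
predictions = [1 if dot(w, x) + w0 >= 0 else -1 for x in X]
print(w, w0, support, predictions)
```

The third sample has a zero multiplier, so removing it or moving it further from the margin leaves w and w0 unchanged, which is the insensitivity to non-support vectors noted on the previous slide.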

  24. Remarks: Dual Representation

  25. Remark-optimal hyperplane is unique • the optimal hyperplane classifier of a support vector machine is unique, and this is guaranteed by two conditions • the cost function is strictly convex • the inequality constraints consist of linear functions • Recall: an optimization problem is said to be convex iff the target (or cost) function as well as the constraints are convex (the optimization problem for SVM is convex) • for a convex problem, every local optimum is a global optimum; with a strictly convex cost function the solution, if it exists, is unique

  26. Computation of the optimal Lagrange multipliers • The problem belongs to the convex programming family of problems • It can be solved by considering the so-called Lagrangian duality and can be stated equivalently in its Wolfe dual representation form (3.71) (3.72) (3.73) (3.74)

  27. Computation of the optimal Lagrange multipliers • once the optimal Lagrange multipliers have been computed, the optimal hyperplane is obtained (3.75) (3.76)

  28. Remarks • the cost function does not depend explicitly on the dimensionality of the input space • this allows for efficient generalizations in the case of nonlinearly separable classes
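
The first remark can be read off the commonly stated dual form of the maximum-margin problem, sketched below with assumed notation (\lambda_i for the multipliers, y_i in {+1, -1} for the labels). The training vectors enter only through the inner products <x_i, x_j>, so the number of unknowns equals the number of training samples N rather than the input dimension.

```latex
\max_{\lambda} \; \sum_{i=1}^{N} \lambda_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \lambda_i \lambda_j \, y_i y_j \, \langle x_i, x_j \rangle
\quad \text{subject to} \quad
\sum_{i=1}^{N} \lambda_i y_i = 0,
\qquad \lambda_i \ge 0, \; i = 1, \dots, N
```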

  29. Today’s Lecture • Support Vector Machines • Linear SVM • Maximal Margin • Non-linear Case • Soft Margin • Kernel Tricks • Summary • Other Classification Method • Combining Classifiers

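  30. SVM for Non-separable Classes • in the non-separable case, each training feature vector belongs to one of the following three categories • outside the margin band and correctly classified • inside the margin band and correctly classified • misclassified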

  31. Two Approaches • Allow soft margins • Allowing soft margins means that if a training point is on the wrong side of the hyperplane, a cost is applied to that point • Increase Dimensionality • By increasing the dimensionality of the data, the likelihood of the data becoming linearly separable increases dramatically

  32. Today’s Lecture • Support Vector Machines • Linear SVM • Maximal Margin • Non-linear Case • Soft Margin • Kernel Tricks • Summary • Other Classification Method • Combining Classifiers

  33. SVM for Non-separable Classes • All three cases can be treated under a single type of constraint by introducing non-negative slack variables

  34. SVM for Non-separable Classes • The goal is to • make the margin as large as possible • keep the number of points with non-zero slack as small as possible • The cost function (3.79) is intractable because it involves a discontinuous function

  35. SVM for Non-separable Classes • as is common in such cases, we choose to optimize a closely related cost function
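
The slide's formula is missing from the transcript. A closely related cost that is widely used is (1/2)||w||^2 + C * (sum of slacks), where the slack of a sample is the smallest non-negative value satisfying y_i(<w,x_i> + b) >= 1 - slack_i. The sketch below assumes that form; the function names, the toy samples and the value of C are illustrative only.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def soft_margin_cost(w, b, X, y, C):
    """Return (cost, slacks) for the assumed cost 0.5*||w||^2 + C*sum(slacks)."""
    # Smallest slack that satisfies y_i (<w, x_i> + b) >= 1 - slack_i.
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    cost = 0.5 * dot(w, w) + C * sum(slacks)
    return cost, slacks


def category(slack):
    """Map a slack value to one of the three cases of the non-separable setting."""
    if slack == 0.0:
        return "outside the margin band, correctly classified"
    if slack <= 1.0:
        return "inside the margin band, correctly classified"
    return "misclassified"


# Toy values (assumed): one sample per category, all with label +1.
w, b, C = [0.5, 0.5], 0.0, 1.0
X = [[1.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
y = [1, 1, 1]

cost, slacks = soft_margin_cost(w, b, X, y, C)
for s in slacks:
    print(s, category(s))
print(cost)
```

A larger C penalizes slack more heavily and pushes the solution towards the hard-margin one; a smaller C tolerates more margin violations in exchange for a wider margin.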

  36. SVM for Non-separable Classes • Converting to the Lagrangian

  37. SVM for Non-separable Classes • The corresponding KKT conditions: (3.85), (3.86), (3.87), (3.88), (3.89), (3.90)

  38. SVM for Non-separable Classes • The associated Wolfe dual representation now becomes

  39. SVM for Non-separable Classes • equivalent to

  40. Remarks-difference with the linearly separable case • The Lagrange multipliers need to be bounded above by C • The slack variables and their associated Lagrange multipliers do not enter into the problem explicitly • Their presence is reflected indirectly through C
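
These remarks are visible in the commonly stated soft-margin dual, sketched here with assumed notation (\lambda_i for the multipliers, y_i in {+1, -1} for the labels). The objective has the same form as in the separable case, and the only change is the upper bound C on each multiplier.

```latex
\max_{\lambda} \; \sum_{i=1}^{N} \lambda_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \lambda_i \lambda_j \, y_i y_j \, \langle x_i, x_j \rangle
\quad \text{subject to} \quad
\sum_{i=1}^{N} \lambda_i y_i = 0,
\qquad 0 \le \lambda_i \le C, \; i = 1, \dots, N
```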

  41. Today’s Lecture • Support Vector Machines • Linear SVM • Maximal Margin • Non-linear Case • Soft Margin • Kernel Tricks • Summary • Other Classification Method • Combining Classifiers

  42. General SVM • This classification problem clearly does not have a good linear classifier • Can we do better? • A non-linear boundary, as shown, will do fine

  43. Remember the XOR problem?

  44. Remember the XOR problem?
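
The XOR problem is the usual illustration of the "increase dimensionality" approach: no line separates the four XOR points in two dimensions, but adding one extra feature makes them linearly separable. The sketch below is in plain Python; the lifting map (x1, x2) -> (x1, x2, x1*x2) and the hand-picked weights are assumptions chosen for illustration, not taken from the slides.

```python
def lift(x):
    """Map a 2-D point to 3-D by appending the product of its coordinates."""
    x1, x2 = x
    return (x1, x2, x1 * x2)


# XOR truth table with labels in {+1, -1}.
XOR = {(0, 0): -1, (0, 1): 1, (1, 0): 1, (1, 1): -1}

# Hand-picked hyperplane in the lifted space (assumed, not learned).
w = (1.0, 1.0, -2.0)
b = -0.5


def classify(x):
    z = lift(x)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1


results = {x: classify(x) for x in XOR}
print(results)
```

In the lifted space the decision function is linear in (x1, x2, x1*x2), while in the original two coordinates the same boundary is non-linear.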
