
Support Vector Machines

Understand SVM's big idea, primal vs. dual versions, parameter implications, finding alpha, predicting, and support vectors. Learn more about kernels and kernel methods.

tariq

Presentation Transcript


  1. Support Vector Machines. Joseph Gonzalez

  2. From a linear classifier to ... *One of the most famous slides you will see, ever!

  3. The Big Idea [figure: scatter of X and O training points]

  4. Maximum margin: the maximum possible separation between positive and negative training examples

  5. Geometric Intuition [figure: X and O training points, with the SUPPORT VECTORS labeled]

  6. Geometric Intuition [figure: X and O training points, with the SUPPORT VECTORS labeled]

  7. Primal Version
     $\min_{w,b,\xi} \; \|w\|^2 + C \sum_i \xi_i$
     s.t. $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
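
The primal objective can be evaluated directly for any candidate (w, b). The following is a minimal sketch, assuming each slack takes its smallest feasible value, ξ_i = max(0, 1 - y_i(w·x_i + b)). The helper names `dot` and `primal_objective` are hypothetical, and the two toy points and the values of w and b are borrowed from the worked example on slide 11.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def primal_objective(w, b, X, y, C):
    """Return (||w||^2 + C * sum of slacks, list of slacks).

    Follows the slide's form of the objective (no 1/2 factor on ||w||^2).
    Each slack is the hinge term max(0, 1 - y_i (w.x_i + b)).
    """
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    return dot(w, w) + C * sum(slacks), slacks


# Toy data: O at (0, 1) labeled +1, X at (2, 2) labeled -1.
X = [(0.0, 1.0), (2.0, 2.0)]
y = [+1, -1]

objective, slacks = primal_objective((-0.8, -0.4), 1.4, X, y, C=1.0)
print(objective, slacks)
```

Both points sit on the margin for this (w, b), so the slacks are zero up to rounding and the objective reduces to ||w||^2.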

  8. Dual Version
     • Where did this come from?
     • Remember Lagrange multipliers
     • Let us “incorporate” the constraints into the objective
     • Then solve the problem in the “dual” space of Lagrange multipliers
     $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
     s.t. $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
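
To answer "where did this come from?", here is a sketch of the standard Lagrangian derivation. It assumes the conventional factor of 1/2 on ||w||^2 (the slides write the primal without it, which amounts to rescaling C) and introduces multipliers μ_i for the constraints ξ_i ≥ 0; the symbol μ does not appear in the slides.

```latex
\begin{aligned}
L(w, b, \xi, \alpha, \mu)
  &= \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
     - \sum_i \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 + \xi_i \bigr]
     - \sum_i \mu_i \xi_i,
     \qquad \alpha_i \ge 0,\ \mu_i \ge 0 \\[4pt]
\frac{\partial L}{\partial w} = 0
  &\;\Rightarrow\; w = \sum_i \alpha_i y_i x_i \\
\frac{\partial L}{\partial b} = 0
  &\;\Rightarrow\; \sum_i \alpha_i y_i = 0 \\
\frac{\partial L}{\partial \xi_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
   \;\Rightarrow\; 0 \le \alpha_i \le C \\[4pt]
\text{Substituting back:}\quad
  &\max_{\alpha} \; \sum_i \alpha_i
     - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j
\end{aligned}
```

The first stationarity condition is also the formula used to recover w from α, and the third is where the box constraint on α_i comes from.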

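  9. Primal vs Dual
     Primal: $\min_{w,b,\xi} \; \|w\|^2 + C \sum_i \xi_i$ s.t. $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
     Dual: $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$ s.t. $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
     • Number of parameters?
     • Large # features?
     • Large # examples?
     • For large # features, the DUAL is preferred
     • Many $\alpha_i$ can go to zero!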

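  10. DUAL: the “Support vector” version
      $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
      s.t. $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
      • How do we find $\alpha$? Quadratic programming
      • How do we find C? Cross-validation!
      Wait... how do we predict y for a new point x? How do we find w? How do we find b?
      $y = \operatorname{sign}(w \cdot x + b)$ with $w = \sum_i \alpha_i y_i x_i$, so
      $y = \operatorname{sign}\left(\sum_i \alpha_i y_i \, x_i \cdot x + b\right)$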
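
Once α is known, prediction follows directly from the formulas on this slide. The sketch below assumes α has already been found by some quadratic programming solver (here it is simply hard-coded from the worked example on slide 11) and that at least one α_i lies strictly between 0 and C, so that its point sits exactly on the margin. The function names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def weight_vector(alpha, X, y):
    """w = sum_i alpha_i * y_i * x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, xi, yi in zip(alpha, X, y))
            for k in range(dim)]


def bias(alpha, X, y, C, tol=1e-9):
    """b = y_i - w.x_i for a support vector with 0 < alpha_i < C."""
    w = weight_vector(alpha, X, y)
    for a, xi, yi in zip(alpha, X, y):
        if tol < a < C - tol:
            return yi - dot(w, xi)
    raise ValueError("no support vector strictly inside (0, C)")


def predict(x_new, alpha, X, y, b):
    """sign(sum_i alpha_i y_i x_i.x_new + b); only alpha_i > 0 contribute."""
    score = sum(a * yi * dot(xi, x_new)
                for a, xi, yi in zip(alpha, X, y) if a > 0) + b
    return 1 if score >= 0 else -1


# Assumed inputs: two training points and their multipliers.
X = [(0.0, 1.0), (2.0, 2.0)]
y = [+1, -1]
alpha = [0.4, 0.4]

b = bias(alpha, X, y, C=1.0)
print(predict((0.0, 2.0), alpha, X, y, b))  # point on the O side
print(predict((3.0, 3.0), alpha, X, y, b))  # point on the X side
```

Note that `predict` never forms w explicitly; it only needs inner products between the new point and the training points with nonzero α_i.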

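  11. “Support Vector”s?
      $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
      s.t. $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
      Worked example with two training points: O at $x_1 = (0,1)$ with $y_1 = +1$ and multiplier $\alpha_1$, and X at $x_2 = (2,2)$ with $y_2 = -1$ and multiplier $\alpha_2$.
      $\max \; \alpha_1 + \alpha_2 - \alpha_1 \alpha_2 (-1)(0+2) - \frac{1}{2} \alpha_1^2 (1)(0+1) - \frac{1}{2} \alpha_2^2 (1)(4+4)$
      $= \max \; \alpha_1 + \alpha_2 + 2 \alpha_1 \alpha_2 - \frac{1}{2} \alpha_1^2 - 4 \alpha_2^2$
      s.t. $\alpha_1 - \alpha_2 = 0$, $0 \le \alpha_i \le C$
      With $\alpha_1 = \alpha_2 = \alpha$: $\max \; 2\alpha - \frac{5}{2} \alpha^2 = \max \; \frac{5}{2} \alpha \left( \frac{4}{5} - \alpha \right)$, a parabola with zeros at $0$ and $4/5$ and its maximum at $\alpha_1 = \alpha_2 = 2/5$.
      $w = \sum_i \alpha_i y_i x_i = 0.4 \, ([0\ 1] - [2\ 2]) = 0.4 \, [-2\ -1]$
      From $y = w \cdot x + b$ at a support vector, $b = y - w \cdot x$. Using $x_1$: $b = 1 - 0.4 \, [-2\ -1] \cdot [0\ 1] = 1 + 0.4 = 1.4$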
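
The arithmetic of this two-point example can be checked numerically. This is a small sketch under two assumptions: that C is at least 2/5, so the box constraint is inactive, and that the equality constraint has already been used to set α_1 = α_2 = α, which reduces the dual to the one-variable quadratic 2α - (q/2)α^2 with q = Σ_ij y_i y_j x_i·x_j. All names are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


# O at (0, 1) labeled +1, X at (2, 2) labeled -1.
X = [(0.0, 1.0), (2.0, 2.0)]
y = [+1, -1]
n = len(X)


def dual_objective(alpha):
    """sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i.x_j."""
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


# With alpha_1 = alpha_2 = alpha the objective is 2*alpha - (q/2)*alpha^2.
q = sum(y[i] * y[j] * dot(X[i], X[j]) for i in range(n) for j in range(n))
alpha_star = 2.0 / q  # stationary point of the one-variable quadratic

alpha = [alpha_star, alpha_star]
w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n)) for k in range(2)]
b = y[0] - dot(w, X[0])  # x_1 is a support vector on the margin

print(alpha_star, w, b)
# Both points should satisfy y_i (w.x_i + b) = 1 up to rounding.
print([y[i] * (dot(w, X[i]) + b) for i in range(n)])
```

Here q equals the squared distance between the two points, which is why α shrinks as the points move apart and the margin widens.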

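  12. “Support Vector”s?
      $\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
      s.t. $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
      [figure: X at (2,2) with $\alpha_2$, O at (0,1) with $\alpha_1$, and a third O point with $\alpha_3$]
      What is $\alpha_3$? Try this at home.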

  13. Playing with SVMs
      • http://www.csie.ntu.edu.tw/~cjlin/libsvm/

  14. More on Kernels
      • Kernels represent inner products
      • $K(a,b) = a \cdot b$
      • $K(a,b) = \phi(a) \cdot \phi(b)$
      • The kernel trick allows an extremely complex $\phi(\cdot)$ while keeping $K(a,b)$ simple
      • Goal: avoid having to directly construct $\phi(\cdot)$ at any point in the algorithm
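
As one concrete instance of K(a,b) = φ(a)·φ(b), consider the homogeneous degree-2 polynomial kernel on 2-D inputs. This kernel and its feature map are a standard textbook example chosen here for illustration, not something taken from the slides; the function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poly2_kernel(a, b):
    """K(a, b) = (a.b)^2, computed entirely in the 2-D input space."""
    return dot(a, b) ** 2


def phi(a):
    """Explicit feature map for poly2_kernel on 2-D inputs (3-D output)."""
    a1, a2 = a
    return (a1 * a1, math.sqrt(2.0) * a1 * a2, a2 * a2)


a = (1.0, 2.0)
b = (3.0, -1.0)

via_kernel = poly2_kernel(a, b)       # never builds phi
via_features = dot(phi(a), phi(b))    # builds phi explicitly
print(via_kernel, via_features)
```

The two values agree up to rounding, but the kernel route costs one 2-D inner product and a square, while the explicit route has to build the higher-dimensional vectors first. For kernels whose feature space is very large or infinite, only the kernel route is practical.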

  15. Kernels: the complexity of the optimization problem depends only on the dimensionality of the input space, not on that of the feature space!

  16. Can We Use Kernels to Measure Distances?
      • Can we measure the distance between $\phi(a)$ and $\phi(b)$ using $K(a,b)$?

  17. Continued:
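
The content of this slide did not survive in the transcript. One standard answer to the question on slide 16 is to expand the squared norm: ||φ(a) - φ(b)||^2 = K(a,a) - 2K(a,b) + K(b,b). The sketch below applies that identity; the Gaussian (RBF) kernel and its `gamma` parameter are assumptions added for illustration, as are all function names.

```python
import math


def linear_kernel(a, b):
    """K(a, b) = a.b, so phi is the identity map."""
    return sum(ai * bi for ai, bi in zip(a, b))


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian kernel exp(-gamma * ||a - b||^2); gamma is an assumed value."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kernel_distance(K, a, b):
    """Distance between phi(a) and phi(b) using kernel evaluations only."""
    squared = K(a, a) - 2.0 * K(a, b) + K(b, b)
    return math.sqrt(max(squared, 0.0))  # clamp tiny negative rounding error


a = (0.0, 1.0)
b = (2.0, 2.0)

d_linear = kernel_distance(linear_kernel, a, b)  # ordinary Euclidean distance
d_rbf = kernel_distance(rbf_kernel, a, b)        # distance in RBF feature space
print(d_linear, d_rbf)
```

With the linear kernel this recovers the usual Euclidean distance. With the Gaussian kernel K(a,a) = 1 for every a, so the squared feature-space distance simplifies to 2 - 2K(a,b), even though φ itself is never constructed.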

  18. Popular Kernel Methods
      • Gaussian Processes
      • Kernel Regression (Smoothing)
      • Nadaraya-Watson Kernel Regression
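
To make the last item concrete, here is a minimal sketch of Nadaraya-Watson kernel regression on 1-D inputs: the prediction at a query point is a kernel-weighted average of the training targets. The Gaussian weighting function, the `bandwidth` parameter, and the toy data are assumptions chosen for illustration.

```python
import math


def gaussian_weight(u, v, bandwidth):
    """Unnormalized Gaussian weight between two scalars."""
    return math.exp(-((u - v) ** 2) / (2.0 * bandwidth ** 2))


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Prediction = sum_i K(x, x_i) y_i / sum_i K(x, x_i)."""
    weights = [gaussian_weight(x_query, xi, bandwidth) for xi in xs]
    total = sum(weights)
    return sum(w * yi for w, yi in zip(weights, ys)) / total


# Toy 1-D training set (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]

for x_query in (0.5, 1.5, 2.5):
    print(x_query, nadaraya_watson(x_query, xs, ys, bandwidth=0.5))
```

A smaller bandwidth makes the estimate follow nearby points more closely, while a larger one smooths toward the global mean of the targets.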
