Support Vector Machine


  1. Support Vector Machine 20005047 김성호 20013167 김태윤

  2. SVM concept • A new implementation of structural risk minimization (SRM) • Maps the input into a high-dimensional feature space using nonlinear functions • Uses a linear function in that space for approximation • Relies on duality theory

  3. Implementation of SVM [Diagram: input x → nonlinear mapping g(x) → feature vector z → linear function w · z → output ŷ]

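  4. Optimal separating hyperplane • Hyperplane decision function: $D(x) = (w \cdot x) + w_0$, e.g. $a x_1 + b x_2 + c = (a, b) \cdot (x_1, x_2) + c$, which at the point $(2, 3)$ is $(a, b) \cdot (2, 3) + c$ • Separation constraints: $(w \cdot x_i) + w_0 \ge +1$ if $y_i = +1$; $(w \cdot x_i) + w_0 \le -1$ if $y_i = -1$, $i = 1, \dots, n$ • Compact equation: $y_i[(w \cdot x_i) + w_0] \ge 1$, $i = 1, \dots, n$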
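
The compact constraint can be checked numerically. The sketch below assumes the usual linear decision function D(x) = w · x + w0 and the constraint y_i D(x_i) ≥ 1; the toy samples and the weights (a = b = 1, c = -3) are hypothetical values chosen only for illustration.

```python
def decision(w, w0, x):
    """Linear decision function D(x) = w . x + w0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0

def satisfies_margin(w, w0, samples):
    """True if every (x_i, y_i) obeys the compact constraint y_i * D(x_i) >= 1."""
    return all(y * decision(w, w0, x) >= 1 for x, y in samples)

# Hypothetical toy data: two samples per class in the plane.
samples = [((2.0, 3.0), +1), ((3.0, 3.0), +1),
           ((0.0, 0.0), -1), ((1.0, 0.0), -1)]
w, w0 = (1.0, 1.0), -3.0  # illustrative weights: a = b = 1, c = -3

print(decision(w, w0, (2.0, 3.0)))       # value of (a, b) . (2, 3) + c
print(satisfies_margin(w, w0, samples))  # whether all samples lie outside the margin
```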

  5. Maximum margin [Figure: a separating hyperplane and the optimal hyperplane]

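  6. Optimal condition (1) • Margin: $\tau$ • The hyperplane is optimal if its margin has the maximum size • For all training patterns, $\frac{y_i D(x_i)}{\|w\|} \ge \tau$, $i = 1, \dots, n$, where $y_i \in \{-1, +1\}$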

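  7. Optimal condition (2) • There is an infinite number of solutions, because w can be rescaled • Fix the scale between $\tau$ and the norm of w: $\tau \|w\| = 1$ • Maximizing $\tau$ ⇔ minimizing $\|w\|$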

  8. [Figure: decision boundary D(x) = 0 and margin border D(x) = +1, with the regions D(x) > +1 and D(x) < -1]

  9. Optimal condition (3) • Larger margin, more separation • Optimal conditions: 1. $y_i[(w \cdot x_i) + w_0] \ge 1$, $i = 1, \dots, n$ 2. minimize $\frac{1}{2}\|w\|^2$
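
The link between the margin and the norm of w can be made concrete. This is a minimal sketch, assuming the canonical scaling in which the closest samples satisfy |D(x)| = 1, so that the margin equals 1 / ||w||; the weight vectors are hypothetical.

```python
import math

def norm(w):
    """Euclidean norm ||w||."""
    return math.sqrt(sum(wi * wi for wi in w))

def distance_to_hyperplane(w, w0, x):
    """Geometric distance |D(x)| / ||w|| from x to the hyperplane D(x) = 0."""
    d = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return abs(d) / norm(w)

def canonical_margin(w):
    """Margin 1 / ||w|| under the assumed scaling min_i |D(x_i)| = 1."""
    return 1.0 / norm(w)

# A smaller norm gives a larger margin.
print(canonical_margin((3.0, 4.0)))   # ||w|| = 5
print(canonical_margin((0.6, 0.8)))   # ||w|| = 1
# A point with D(x) = 1 lies exactly one margin away from the boundary.
print(distance_to_hyperplane((3.0, 4.0), -5.0, (2.0, 0.0)))
```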

  10. What is a support vector (S.V.)? • Support vectors are the training samples that lie on the margin • They are the most difficult samples to classify • They define the decision surface

  11. Bound of error rate • The number of support vectors (SV) provides a bound on the expected error rate for test samples (Vapnik, 1995): $E_n[\text{error rate}] \le \frac{E_n[\text{number of support vectors}]}{n}$ • The bound is independent of the dimensionality of the space

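  12. VC-dimension • For hyperplane functions in a d-dimensional space with $\|w\|^2 \le c$, the VC-dimension h is bounded by $h \le \min(r^2 c, d) + 1$, where r is the radius of the smallest sphere containing the input vectors (Vapnik, 1995) • Complexity can therefore be controlled independently of the dimensionality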
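
A small numeric illustration of why the complexity does not depend on the dimension. The sketch assumes the bound in the form h ≤ min(r²c, d) + 1 for hyperplanes with ||w||² ≤ c (some texts round r²c to an integer, which is omitted here); the values of r, c, and d are hypothetical.

```python
def vc_dim_bound(r, c, d):
    """Assumed bound h <= min(r^2 * c, d) + 1 for hyperplanes with ||w||^2 <= c.

    r: radius of the smallest sphere containing the input vectors
    c: bound on the squared norm of w
    d: dimensionality of the space
    """
    return min(r * r * c, d) + 1

# Once d exceeds r^2 * c, the bound no longer grows with the dimension.
print(vc_dim_bound(2.0, 1.5, 100))
print(vc_dim_bound(2.0, 1.5, 10**6))
# In a low-dimensional space the dimension term is the active one.
print(vc_dim_bound(2.0, 1.5, 3))
```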

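  13. SVM with SRM • min $\|w\|^2$ → min VC-dimension → min confidence interval → min guaranteed risk • Thus, find the w and $w_0$ that minimize the function $\frac{1}{2}\|w\|^2$ subject to the constraint $y_i[(w \cdot x_i) + w_0] \ge 1$, $i = 1, \dots, n$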

  14. [Figure: classification error versus h (VC-dimension), with curves for empirical risk, confidence interval, and true risk; the underfitting and overfitting regions and SRM are marked]

  15. Solving in high dimensions • The problem is not practical to solve directly in a high-dimensional space • Use duality theory, then apply the Kuhn-Tucker theorem • This handles very high dimensions with moderate sample sizes (10,000)

  16. Duality theory • Every linear problem has a dual problem [Diagram: original problem ↔ dual problem; constraint i ↔ variable i; objective function ↔ right-hand side]
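
The correspondence between a linear problem and its dual can be checked on a small instance. The sketch below assumes the standard pair "maximize c · x s.t. A x ≤ b, x ≥ 0" and "minimize b · u s.t. Aᵀ u ≥ c, u ≥ 0"; the numbers and the stated optimal points are an illustrative example, not taken from the slides.

```python
# Primal: maximize c . x  subject to  A x <= b,   x >= 0
# Dual:   minimize b . u  subject to  A^T u >= c, u >= 0
# Each primal constraint i gets a dual variable u_i, and the primal
# right-hand side b becomes the dual objective.
A = [[1, 0], [0, 2], [3, 2]]
b = [4, 12, 18]
c = [3, 5]

def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))

def primal_feasible(x):
    return all(xi >= 0 for xi in x) and all(dot(row, x) <= bi for row, bi in zip(A, b))

def dual_feasible(u):
    columns = list(zip(*A))  # columns of A are the rows of A^T
    return all(ui >= 0 for ui in u) and all(dot(col, u) >= cj for col, cj in zip(columns, c))

x_star = [2, 6]       # assumed primal optimum of this example
u_star = [0, 1.5, 1]  # assumed dual optimum of this example

print(primal_feasible(x_star), dual_feasible(u_star))
print(dot(c, x_star), dot(b, u_star))  # equal objective values at the optimum
```

Any other feasible pair only satisfies the weaker inequality c · x ≤ b · u, which is why matching objective values certify optimality.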

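  17. Kuhn-Tucker condition (1) • Objective function: f(x) • Constraint functions: $g_i(x)$ • f(x) is convex and differentiable • The $g_i(x)$ are concave and differentiable ⇒ A necessary condition for x* to be an optimal solution of the nonlinear problem is that there exist m numbers $u_1, u_2, \dots, u_m$ that satisfy all of the KKT conditions (continued)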

  18. KT condition (2) 1. 2. 3. i=1,2,…,n 4. j=1,2,…,n 5. 6. i=1,2,…,n

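  19. Lagrange multiplier • Max (Min) $f(x_1, \dots, x_n)$ s.t. $g_1(x_1, \dots, x_n) = b_1$, …, $g_n(x_1, \dots, x_n) = b_n$ • Using Lagrange multipliers $(\lambda_1, \dots, \lambda_n)$, the problem becomes Max (Min) $L(x, \lambda) = f(x) + \sum_{i=1}^{n} \lambda_i [b_i - g_i(x_1, \dots, x_n)]$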

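  20. Solving problem (1) • Two steps 1. Lagrange multipliers: $Q(w, w_0, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \{ y_i[(w \cdot x_i) + w_0] - 1 \}$, where $\alpha_i \ge 0$ is the Lagrange multiplier of constraint i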

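  21. Solving problem (2) 2. Kuhn-Tucker conditions • Maximization w.r.t. $\alpha_i$ • The solution $w^*$, $w_0^*$, $\alpha^*$ should satisfy $\sum_{i=1}^{n} \alpha_i^* y_i = 0$, $w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i$, and $\alpha_i^* \{ y_i[(w^* \cdot x_i) + w_0^*] - 1 \} = 0$, $i = 1, \dots, n$ (continued)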

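  22. Solving problem (3) • Final equation: maximize $Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$, $\alpha_i \ge 0$, $i = 1, \dots, n$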
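
The dual problem can be explored on a two-sample case. This is a sketch under the standard hard-margin dual, Q(α) = Σ α_i - ½ Σ Σ α_i α_j y_i y_j (x_i · x_j) with Σ α_i y_i = 0 and α_i ≥ 0, and the standard recovery w = Σ α_i y_i x_i; the data and the helper names are hypothetical.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def dual_objective(alpha, X, y):
    """Q(alpha) = sum_i alpha_i - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)."""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

def recover_w(alpha, X, y):
    """Weight vector w = sum_i alpha_i y_i x_i."""
    return [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X))
            for k in range(len(X[0]))]

# Two samples, one per class, placed symmetrically about the origin.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]

# The constraint sum_i alpha_i y_i = 0 forces alpha_1 = alpha_2 = a here,
# so Q reduces to 2a - 2a^2, which peaks at a = 0.5.
for a in (0.4, 0.5, 0.6):
    print(a, dual_objective([a, a], X, y))

print(recover_w([0.5, 0.5], X, y))  # weight vector at the maximizer
```

Both samples have nonzero multipliers, so both are support vectors, and the recovered w places them exactly on the margin.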

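  23. Non-separable (1) • A sample is nonseparable if it does not satisfy the margin condition, i.e. it lies inside the margin or on the wrong side of the decision boundary • Positive slack variables $\xi_i$: $y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i$ • $\xi_i$ is greater than zero if the sample is nonseparable • $\xi_i$ is greater than one if the sample is misclassified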

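  24. [Figure: slack variables for nonseparable samples. $\xi_1 = 1 - D(x_1)$; $\xi_3 = 1 + D(x_3)$; $y \cdot D(x_2) = 1 - \xi_2 > 0$; $y \cdot D(x_4) = 1 - \xi_4 < 0$. Boundaries D(x) = -1, D(x) = 0, D(x) = +1, with the regions D(x) > +1 and D(x) < -1]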

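  25. Non-separable (2) • Number of nonseparable samples: $Q(w) = \sum_{i=1}^{n} I(\xi_i > 0)$ • Min Q(w) is replaced by Min $Q(\xi) = \sum_{i=1}^{n} \xi_i^p$, where p is a small positive constant • When p = 1 → approximation $\sum_{i=1}^{n} \xi_i$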

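  26. Soft margin hyperplane (1) • Soft margin hyperplane: min $\sum_{i=1}^{n} \xi_i$ s.t. $y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i$, $\xi_i \ge 0$, $\|w\|^2 \le c$ → min $C \sum_{i=1}^{n} \xi_i + \frac{1}{2}\|w\|^2$ s.t. $y_i[(w \cdot x_i) + w_0] \ge 1 - \xi_i$, $\xi_i \ge 0$, using a sufficiently large (fixed) C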

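  27. Soft margin hyperplane (2) • C affects the trade-off between complexity and the number of nonseparable samples • By duality theory: Min $\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{n} \alpha_i$ s.t. $\sum_{i=1}^{n} \alpha_i y_i = 0$, $0 \le \alpha_i \le C$, $i = 1, \dots, n$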
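
The role of the slack variables and of C can be seen by evaluating the soft margin objective for a fixed hyperplane. The sketch assumes the common form C Σ ξ_i + ½ ||w||² with ξ_i = max(0, 1 - y_i D(x_i)); the samples, weights, and values of C are hypothetical.

```python
def slack(w, w0, x, y):
    """Slack xi = max(0, 1 - y * D(x)): 0 outside the margin, > 1 if misclassified."""
    d = sum(wi * xi for wi, xi in zip(w, x)) + w0
    return max(0.0, 1.0 - y * d)

def soft_margin_objective(w, w0, samples, C):
    """Assumed objective C * sum(xi_i) + 1/2 ||w||^2; also returns the slacks."""
    slacks = [slack(w, w0, x, y) for x, y in samples]
    value = C * sum(slacks) + 0.5 * sum(wi * wi for wi in w)
    return value, slacks

samples = [((2.0, 0.0), +1),   # outside the margin
           ((0.5, 0.0), +1),   # inside the margin, correctly classified
           ((0.5, 0.0), -1),   # on the wrong side: misclassified
           ((-2.0, 0.0), -1)]  # outside the margin
w, w0 = (1.0, 0.0), 0.0

value, slacks = soft_margin_objective(w, w0, samples, C=10.0)
print(slacks)
print(value)
# A smaller C penalizes the same violations less heavily.
print(soft_margin_objective(w, w0, samples, C=1.0)[0])
```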

  28. High-dimensional mapping and inner product • Nonlinear transformation: input space → feature space via $g_j(x)$ • Ex) Linear boundary → polynomial boundary: $x_1, x_2$ → third-order polynomial: $g_1(x_1, x_2) = 1$, $g_2(x_1, x_2) = x_1$, $g_3(x_1, x_2) = x_2$, $g_4(x_1, x_2) = x_1^2$, $g_5(x_1, x_2) = x_2^2$, $g_6(x_1, x_2) = x_1^3$, $g_7(x_1, x_2) = x_2^3$, $g_8(x_1, x_2) = x_1 x_2$, …, $g_{15}(x_1, x_2) = x_1^2 x_2^2$, $g_{16}(x_1, x_2) = x_1^3 x_2^3$
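
The 16 basis functions can be generated programmatically. This sketch assumes they are all products x1^i · x2^j with i and j from 0 to 3, which matches the listed terms; the ordering produced here is a convenience and does not follow the numbering g1, …, g16 on the slide.

```python
def feature_map(x1, x2, order=3):
    """Map (x1, x2) to all monomials x1**i * x2**j with 0 <= i, j <= order."""
    return [x1 ** i * x2 ** j
            for i in range(order + 1)
            for j in range(order + 1)]

z = feature_map(2.0, 3.0)
print(len(z))   # number of features for a third-order mapping
print(z[0])     # the constant term
print(z[-1])    # the highest-order term x1^3 * x2^3

# A decision function that is linear in z is a polynomial boundary in (x1, x2).
weights = [0.0] * 16
weights[5] = 1.0   # index 5 is i = 1, j = 1, i.e. the x1 * x2 term
print(sum(wj * zj for wj, zj in zip(weights, z)))
```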

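  29. Decision function with inner product • Decision function: $D(x) = \sum_{j=1}^{m} w_j g_j(x)$ • Dual form: $D(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x)$, with the inner product kernel $K(x, x') = \sum_{j=1}^{m} g_j(x) g_j(x')$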

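  30. Inner product kernel • Polynomials of degree q: $K(x, x') = [(x \cdot x') + 1]^q$ • Radial basis functions: $K(x, x') = \exp(-\|x - x'\|^2 / \sigma^2)$ • Neural network: $K(x, x') = \tanh(v (x \cdot x') + a)$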
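
The three kernel families can be written as short functions. This is a sketch using common textbook forms; the parameter names q, sigma, v, and a, and the exact scaling of the radial basis function, are assumptions and may differ from the forms on the slide.

```python
import math

def dot(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def polynomial_kernel(x, z, q=2):
    """Polynomial of degree q: ((x . z) + 1) ** q."""
    return (dot(x, z) + 1) ** q

def rbf_kernel(x, z, sigma=1.0):
    """Radial basis function: exp(-||x - z||^2 / sigma^2) (assumed scaling)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)

def neural_network_kernel(x, z, v=1.0, a=0.0):
    """Sigmoid (two-layer network) kernel: tanh(v * (x . z) + a)."""
    return math.tanh(v * dot(x, z) + a)

x, z = (1.0, 2.0), (3.0, 4.0)
print(polynomial_kernel(x, z, q=2))
print(rbf_kernel(x, x), rbf_kernel(x, z))   # largest when both arguments coincide
print(neural_network_kernel((0.0, 0.0), z))
```

Each function returns the inner product in a feature space without constructing that space explicitly, so the dual problem only ever needs the kernel values between pairs of samples.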

  31. Kernel function • Properties of a kernel function $K(x, x')$, $x \in R^p$ • $K(x, x')$ takes on its maximum value when $x' = x$ • $|K(x, x')|$ decreases with $|x - x'|$ • $K(x, x')$ is, in general, a function of 2p variables

  32. Example (1) • SVM for XOR (the exclusive-or problem)

  33. Example(2)

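  34. Example (3) • Kernel function: $K(x, x') = [(x \cdot x') + 1]^2$ • Decision function: $D(x) = \sum_{i} \alpha_i y_i K(x_i, x)$, where the $\alpha_i$ maximize the dual problem s.t. $\sum_{i} \alpha_i y_i = 0$, $\alpha_i \ge 0$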

  35. Example (4) • $D(x_1, x_2) = x_1 x_2$
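
The XOR result can be verified numerically. The sketch below makes several assumptions that are not stated on the slides: inputs coded as ±1, labels y = x1 · x2, the degree-2 polynomial kernel ((x · x') + 1)², and the candidate multipliers α_i = 1/8. It does not run a general solver; it only checks that this candidate satisfies the optimality conditions of the dual and yields D(x) = x1 · x2.

```python
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [x1 * x2 for x1, x2 in X]   # assumed labels, chosen so that D(x) = x1 * x2

def K(u, v):
    """Assumed kernel: polynomial of degree 2."""
    return (1 + u[0] * v[0] + u[1] * v[1]) ** 2

alpha = [0.125] * 4             # candidate solution alpha_i = 1/8

# Gradient of Q(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j).
grad = [1 - y[i] * sum(alpha[j] * y[j] * K(X[i], X[j]) for j in range(4))
        for i in range(4)]
constraint = sum(a * yi for a, yi in zip(alpha, y))

def D(x):
    """Dual-form decision function (the bias term is zero in this example)."""
    return sum(a * yi * K(xi, x) for a, yi, xi in zip(alpha, y, X))

print(grad, constraint)               # zero gradient and a satisfied equality constraint
print([D(x) for x in X])              # every training point lies on the margin
print(D((0.5, -2.0)), 0.5 * -2.0)     # D(x) agrees with x1 * x2 off the training set
```

Because all four multipliers are positive, every training sample is a support vector in this example.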

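  36. SVM FOR REGRESSION (1) • Estimation function: $f(x, w) = \sum_{j=1}^{m} w_j g_j(x) + w_0$ • Vapnik's ($\varepsilon$-insensitive) loss function: $L_\varepsilon(y, f(x, w)) = 0$ if $|y - f(x, w)| \le \varepsilon$, and $|y - f(x, w)| - \varepsilon$ otherwise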
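
Vapnik's loss ignores errors smaller than a tolerance ε and grows linearly beyond it. A minimal sketch, assuming the usual ε-insensitive form max(0, |y - f| - ε); the function name and sample values are hypothetical.

```python
def eps_insensitive_loss(y, f, eps):
    """Zero inside the eps-tube around y, linear in the excess error outside it."""
    return max(0.0, abs(y - f) - eps)

eps = 0.1
print(eps_insensitive_loss(1.0, 1.05, eps))  # prediction inside the tube
print(eps_insensitive_loss(1.0, 1.5, eps))   # prediction above the tube
print(eps_insensitive_loss(1.0, 0.5, eps))   # prediction below the tube (symmetric)
```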

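  37. SVM FOR REGRESSION (2) • Quadratic problem: minimize $\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)$ s.t. $y_i - f(x_i, w) \le \varepsilon + \xi_i$, $f(x_i, w) - y_i \le \varepsilon + \xi_i^*$, $\xi_i \ge 0$, $\xi_i^* \ge 0$, $i = 1, \dots, n$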

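  38. SVM FOR REGRESSION (3) • Dual problem: maximize $-\varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*) - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)$ s.t. $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$, $0 \le \alpha_i \le C$, $0 \le \alpha_i^* \le C$, $i = 1, \dots, n$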

  39. SVM FOR REGRESSION (4) • Resulting estimation function: $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) K(x_i, x) + w_0$

  40. SVM FOR REGRESSION(5)

  41. SVM FOR REGRESSION(6)
