Support Vector Machine 20005047 김성호 20013167 김태윤
SVM concept • A new implementation of Structural Risk Minimization (SRM) • Inputs are mapped to a high-dimensional feature space by a nonlinear function • A linear function in that space approximates the decision boundary • The problem is solved via duality theory
Implementation of SVM [Figure: the input vector x is mapped by g(x) to a feature vector z; the output is ŷ = w·z]
Optimal separating hyperplane • Hyperplane: D(x) = w·x + w0 (e.g., ax1 + bx2 + c = (a,b)·(x1,x2) + c, evaluated at the point (2,3) as (a,b)·(2,3) + c) • Decision function: w·xi + w0 ≥ +1 if yi = +1, and w·xi + w0 ≤ -1 if yi = -1, i = 1, …, n • Compact equation: yi(w·xi + w0) ≥ 1, i = 1, …, n
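To make the compact equation concrete, here is a minimal sketch (not from the slides; the hyperplane and the two points are hypothetical) that evaluates D(x) = w·x + w0 and checks the constraint yi·D(xi) ≥ 1:

```python
# Minimal sketch: evaluate the hyperplane decision function D(x) = w.x + w0
# and the compact margin constraint y_i * D(x_i) >= 1 (hypothetical data).
import numpy as np

w, w0 = np.array([1.0, 1.0]), -3.0    # hypothetical hyperplane x1 + x2 - 3 = 0
X = np.array([[2.0, 3.0], [1.0, 1.0]])
y = np.array([+1, -1])

D = X @ w + w0                        # D(x_i) for each training pattern
print(D)                              # [ 2. -1.]
print(y * D >= 1)                     # [ True  True] -> both satisfy the margin
```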
Maximum margin [Figure: several candidate separating hyperplanes vs. the optimal (maximum-margin) hyperplane]
Optimal condition (1) • Margin τ: the distance from the hyperplane to the closest training pattern • Optimal: a hyperplane is optimal if its margin is of maximum size • For all training patterns: yi·D(xi) / ||w|| ≥ τ, where D(x) = w·x + w0
Optimal condition (2) • Infinite number of solutions: (w, w0) can be rescaled without changing the hyperplane • Fix the scale between τ and the norm of w by setting τ·||w|| = 1 • Maximizing the margin τ ⇔ minimizing ||w||
[Figure: the hyperplane D(x) = 0 with margin boundaries D(x) = +1 and D(x) = -1, separating the regions D(x) > +1 and D(x) < -1]
Optimal condition (3) • Optimal condition: the larger the margin, the better the separation 1. yi(w·xi + w0) ≥ 1, i = 1, …, n 2. minimize Q(w) = (1/2)||w||²
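The link between these two conditions follows from a standard one-step derivation that the slide leaves implicit:

```latex
% A pattern x on the margin boundary satisfies y D(x) = 1, so its distance
% to the hyperplane D(x) = 0 is |D(x)| / ||w|| = 1 / ||w||, and the full
% margin between the two boundaries is 2 / ||w||. Hence:
\[
  \max_{w}\; \frac{2}{\lVert w \rVert}
  \;\Longleftrightarrow\;
  \min_{w}\; \tfrac{1}{2}\lVert w \rVert^{2}
\]
```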
What is an S.V.? • Support vectors: the training patterns that lie exactly on the margin; they are the most difficult patterns to classify, and they alone define the decision surface
Bound on error rate • The number of support vectors provides a bound on the expected error rate for a test sample – Vapnik (1995): En[error rate] ≤ En[# of support vectors] / n • The bound is independent of the dimensionality of the space
VC-dim. • For hyperplane functions in d dimensions with ||w|| ≤ A, the VC-dimension h satisfies h ≤ min(r²A², d) + 1, where r is the radius of the smallest sphere containing the input vectors – Vapnik (1995) • This controls complexity independently of the dimension
SVM with SRM • min ||w|| ⇒ min VC-dim ⇒ min confidence interval ⇒ min guaranteed risk • Thus, find w and w0 that minimize the function Q(w) = (1/2)||w||², subject to the constraint yi(w·xi + w0) ≥ 1, i = 1, …, n
[Figure: classification error vs. h (VC-dim.) — the empirical risk decreases while the confidence interval grows; the true risk is their sum, with underfitting on the left and overfitting on the right; SRM selects the h that minimizes the true risk]
Solving in High Dim. • The primal problem is not directly solvable in very high dimensions • Using "duality theory", and then applying the "Kuhn-Tucker theorem", the problem is recast in terms of the sample size rather than the dimensionality – very high dimensions, moderate sample size (~10,000)
Duality Theory • Every linear problem has a dual problem • Correspondence between the original problem and the dual problem: constraint i ↔ variable i; objective function ↔ right-hand side
Kuhn-Tucker Condition (1) Objective function: f(x); constraint functions: gi(x) • f(x) is convex and differentiable • gi(x) are concave and differentiable ⇒ A necessary condition for x* to be an optimal solution of the nonlinear problem is that there exist m numbers u1, u2, …, um satisfying all of the KKT conditions. Cont'
KT condition (2) 1. ∂f/∂xj − Σi ui·∂gi/∂xj ≤ 0, j = 1, 2, …, n 2. xj·(∂f/∂xj − Σi ui·∂gi/∂xj) = 0, j = 1, 2, …, n 3. gi(x*) ≤ bi, i = 1, 2, …, m 4. ui·(gi(x*) − bi) = 0, i = 1, 2, …, m 5. xj ≥ 0, j = 1, 2, …, n 6. ui ≥ 0, i = 1, 2, …, m
Lagrange multipliers Max (Min) f(x1, …, xn) s.t. g1(x1, …, xn) = b1, …, gn(x1, …, xn) = bn. Using Lagrange multipliers (λ1, …, λn), then Max (Min) L = f(x1, …, xn) − Σi λi·(gi(x1, …, xn) − bi)
Solving problem (1) • Two steps 1. Lagrangian: L(w, w0, α) = (1/2)||w||² − Σi αi·[yi(w·xi + w0) − 1], where αi ≥ 0 is a Lagrange multiplier
Solving problem (2) 2. Kuhn-Tucker conditions: maximization w.r.t. αi – the solution w*, w0*, α* should satisfy ∂L/∂w = 0 ⇒ w* = Σi αi·yi·xi; ∂L/∂w0 = 0 ⇒ Σi αi·yi = 0; αi·[yi(w*·xi + w0*) − 1] = 0; αi ≥ 0. Cont'
Solving problem (3) • Final equation: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·(xi·xj) s.t. Σi αi·yi = 0, αi ≥ 0, i = 1, …, n
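A sketch of solving this final equation numerically — the toy data, SciPy's SLSQP solver, and the 1e-6 support-vector threshold are illustrative assumptions, not part of the slides:

```python
# Solve the hard-margin dual max W(alpha) on a tiny separable set by
# minimizing -W(alpha) under sum_i alpha_i y_i = 0 and alpha_i >= 0.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                             # -W(alpha)
    return 0.5 * a @ G @ a - a.sum()

n = len(y)
res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=[(0, None)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X                          # w* = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                            # support vectors: alpha_i > 0
w0 = np.mean(y[sv] - X[sv] @ w)              # from y_i (w.x_i + w0) = 1 on SVs
print(w, w0, alpha.round(3))
```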
Non-separable (1) • A sample is non-separable if it does not satisfy the margin constraint or lies on the wrong side of the boundary • Introduce positive slack variables ξi: yi(w·xi + w0) ≥ 1 − ξi, ξi ≥ 0 • ξi is greater than zero if the sample is non-separable, and greater than one if it is misclassified
[Figure: slack variables around the margin — ξ1 = 1 − D(x1) and ξ3 = 1 + D(x3); y·D(x2) = 1 − ξ2 > 0 (inside the margin but correctly classified), y·D(x4) = 1 − ξ4 < 0 (misclassified); margin lines D(x) = +1 and D(x) = -1 around the hyperplane D(x) = 0]
Non-separable (2) • Number of non-separable samples: Σi θ(ξi), where θ is the step function • Min Q(ξ) = Σi ξi^p, where p is a small positive constant • When p = 1, this gives a linear (convex) approximation of the count
Soft margin hyperplane (1) • Soft margin hyperplane: min Σi ξi s.t. yi(w·xi + w0) ≥ 1 − ξi, ξi ≥ 0 • Using a sufficiently large (fixed) C: min Q(w, ξ) = (1/2)||w||² + C·Σi ξi s.t. the same constraints
Soft margin hyperplane (2) • C affects the trade-off between complexity and the number of non-separable samples • By duality theory: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·(xi·xj) s.t. Σi αi·yi = 0, 0 ≤ αi ≤ C
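A small illustration of the C trade-off, assuming scikit-learn's SVC is available (not part of the original slides); smaller C tolerates more margin violations and typically keeps more support vectors:

```python
# How C trades margin width against violations in the soft-margin dual,
# where multipliers are box-constrained to 0 <= alpha_i <= C.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(+1, 1.2, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: ||w||={np.linalg.norm(clf.coef_):.2f}, "
          f"#SV={clf.n_support_.sum()}")   # small C -> wider margin, more SVs
```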
High-dimensional mapping and inner product • Non-linear transformation: input space → gj(x) → feature space. Ex) a linear boundary in the feature space is a polynomial boundary in the input space. x1, x2 → third-order polynomial: g1(x1,x2) = 1, g2(x1,x2) = x1, g3(x1,x2) = x2, g4(x1,x2) = x1², g5(x1,x2) = x2², g6(x1,x2) = x1³, g7(x1,x2) = x2³, g8(x1,x2) = x1·x2, …, g15(x1,x2) = x1²·x2², g16(x1,x2) = x1³·x2³
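A sketch of this feature map; the slide elides the middle monomials with "……", so the assumption below is the full tensor-product basis {x1^i · x2^j : 0 ≤ i, j ≤ 3}, which matches the 16 listed endpoints (the ordering may differ from the slide):

```python
# Map (x1, x2) into the 16-dimensional third-order polynomial feature
# space, so a linear boundary there is a polynomial boundary in input space.
from itertools import product

def g(x1, x2):
    # assumed basis: all 16 monomials x1^i * x2^j with 0 <= i, j <= 3
    return [x1**i * x2**j for i, j in product(range(4), range(4))]

print(len(g(2.0, 3.0)))   # 16 features, matching g1 .. g16
```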
Decision function with inner product • Decision function: D(x) = Σj wj·gj(x) + w0 • Dual form: D(x) = Σi αi·yi·K(xi, x) + w0, where K(x, x') = Σj gj(x)·gj(x') is an inner-product kernel
Inner product kernel • Polynomials of degree q: K(x, x') = (x·x' + 1)^q • Radial basis functions: K(x, x') = exp(−||x − x'||² / (2σ²)) • Neural network: K(x, x') = tanh(κ·(x·x') + θ)
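The three kernels as a runnable sketch; the parameter values (q, σ, κ, θ) are illustrative assumptions:

```python
# The three inner-product kernels listed above, with illustrative parameters.
import numpy as np

def poly_kernel(x, xp, q=2):
    return (np.dot(x, xp) + 1) ** q

def rbf_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(xp)) ** 2) / (2 * sigma**2))

def nn_kernel(x, xp, kappa=1.0, theta=-1.0):
    # the tanh kernel is a valid kernel only for some (kappa, theta)
    return np.tanh(kappa * np.dot(x, xp) + theta)

x, xp = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(poly_kernel(x, xp), rbf_kernel(x, xp), nn_kernel(x, xp))
```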
Kernel function • Properties of a kernel function K(x, x'), x ∈ R^p • K(x, x') takes on its maximum value when x' = x • |K(x, x')| decreases with |x − x'| • K(x, x') is generally a function of 2p variables
Example (1) • SVM for XOR (the exclusive-or problem)
Example (3) • Kernel function: K(x, x') = (1 + x·x')² • Dual problem: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·K(xi, xj) s.t. Σi αi·yi = 0, αi ≥ 0
Example (4) • Resulting decision function: D(x1, x2) = x1·x2
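A quick check of this XOR result, assuming scikit-learn is available; gamma=1 makes the kernel exactly (1 + x·x')², and a large C approximates the hard-margin solution of the example:

```python
# Verify D(x1, x2) = x1 * x2 on the four XOR corners with a degree-2
# polynomial kernel SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([1, -1, -1, 1])               # y = x1 * x2 for each corner
clf = SVC(kernel="poly", degree=2, gamma=1, coef0=1, C=1e6).fit(X, y)
print(clf.predict(X))                      # [ 1 -1 -1  1], i.e. sign(x1*x2)
```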
SVM FOR REGRESSION (1) • Estimation function: f(x) = Σj wj·gj(x) + w0 • Vapnik's ε-insensitive loss function: L(y, f(x)) = 0 if |y − f(x)| ≤ ε, and |y − f(x)| − ε otherwise
SVM FOR REGRESSION (2) • Quadratic problem: min (1/2)||w||² + C·Σi (ξi + ξi*) s.t. yi − f(xi) ≤ ε + ξi, f(xi) − yi ≤ ε + ξi*, ξi, ξi* ≥ 0
SVM FOR REGRESSION (3) • Dual problem: max −(1/2)·Σi Σj (αi − αi*)·(αj − αj*)·K(xi, xj) − ε·Σi (αi + αi*) + Σi yi·(αi − αi*) s.t. Σi (αi − αi*) = 0, 0 ≤ αi, αi* ≤ C
SVM FOR REGRESSION (4) • Resulting estimation function: f(x) = Σi (αi − αi*)·K(xi, x) + w0
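A sketch of ε-insensitive SV regression using scikit-learn's SVR (the data and parameter choices are mine, not from the slides); samples strictly inside the ε-tube get αi = αi* = 0 and drop out of the estimation function:

```python
# Fit an RBF-kernel SVR and count how many samples remain as support
# vectors, i.e. contribute nonzero (alpha_i - alpha_i*) terms.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, 40))[:, None]
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)
print(svr.support_.size, "of", len(X), "samples are support vectors")
```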