Support Vector Machine 20005047 김성호 20013167 김태윤
SVM concept • A new implementation of Structural Risk Minimization (SRM) • Inputs are mapped to a high-dimensional feature space by a nonlinear function • A linear function in that space approximates the decision boundary • The problem is solved via duality theory
Implementation of SVM [Figure: the input vector x is mapped by g(x) to a feature vector z; the output is ŷ = w·z]
Optimal separating hyperplane • Hyperplane: D(x) = w·x + w0 (e.g., ax1 + bx2 + c = (a,b)·(x1,x2) + c, evaluated at the point (2,3) as (a,b)·(2,3) + c) • Decision function: w·xi + w0 ≥ +1 if yi = +1, and w·xi + w0 ≤ -1 if yi = -1, i = 1, …, n • Compact equation: yi(w·xi + w0) ≥ 1, i = 1, …, n
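To make the compact equation concrete, here is a minimal sketch (not from the slides; the hyperplane and the two points are hypothetical) that evaluates D(x) = w·x + w0 and checks the constraint yi·D(xi) ≥ 1:

```python
# Minimal sketch: evaluate the hyperplane decision function D(x) = w.x + w0
# and the compact margin constraint y_i * D(x_i) >= 1 (hypothetical data).
import numpy as np

w, w0 = np.array([1.0, 1.0]), -3.0    # hypothetical hyperplane x1 + x2 - 3 = 0
X = np.array([[2.0, 3.0], [1.0, 1.0]])
y = np.array([+1, -1])

D = X @ w + w0                        # D(x_i) for each training pattern
print(D)                              # [ 2. -1.]
print(y * D >= 1)                     # [ True  True] -> both satisfy the margin
```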
Maximum margin [Figure: several candidate separating hyperplanes vs. the optimal (maximum-margin) hyperplane]
Optimal condition (1) • Margin τ: the distance from the hyperplane to the closest training pattern • Optimal: a hyperplane is optimal if its margin is of maximum size • For all training patterns: yi·D(xi) / ||w|| ≥ τ, where D(x) = w·x + w0
Optimal condition (2) • Infinite number of solutions: (w, w0) can be rescaled without changing the hyperplane • Fix the scale between τ and the norm of w by setting τ·||w|| = 1 • Maximizing the margin τ ⇔ minimizing ||w||
[Figure: the hyperplane D(x) = 0 with margin boundaries D(x) = +1 and D(x) = -1, separating the regions D(x) > +1 and D(x) < -1]
Optimal condition (3) • Optimal condition: the larger the margin, the better the separation 1. yi(w·xi + w0) ≥ 1, i = 1, …, n 2. minimize Q(w) = (1/2)||w||²
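The link between these two conditions follows from a standard one-step derivation that the slide leaves implicit:

```latex
% A pattern x on the margin boundary satisfies y D(x) = 1, so its distance
% to the hyperplane D(x) = 0 is |D(x)| / ||w|| = 1 / ||w||, and the full
% margin between the two boundaries is 2 / ||w||. Hence:
\[
  \max_{w}\; \frac{2}{\lVert w \rVert}
  \;\Longleftrightarrow\;
  \min_{w}\; \tfrac{1}{2}\lVert w \rVert^{2}
\]
```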
What is an S.V.? • Support vectors: the training patterns that lie exactly on the margin; they are the most difficult patterns to classify, and they alone define the decision surface
Bound on error rate • The number of support vectors provides a bound on the expected error rate for a test sample – Vapnik (1995): En[error rate] ≤ En[# of support vectors] / n • The bound is independent of the dimensionality of the space
VC-dim. • For hyperplane functions in d dimensions with ||w|| ≤ A, the VC-dimension h satisfies h ≤ min(r²A², d) + 1, where r is the radius of the smallest sphere containing the input vectors – Vapnik (1995) • This controls complexity independently of the dimension
SVM with SRM • min ||w|| ⇒ min VC-dim ⇒ min confidence interval ⇒ min guaranteed risk • Thus, find w and w0 that minimize the function Q(w) = (1/2)||w||², subject to the constraint yi(w·xi + w0) ≥ 1, i = 1, …, n
[Figure: classification error vs. h (VC-dim.) — the empirical risk decreases while the confidence interval grows; the true risk is their sum, with underfitting on the left and overfitting on the right; SRM selects the h that minimizes the true risk]
Solving in High Dim. • The primal problem is not directly solvable in very high dimensions • Using "duality theory", and then applying the "Kuhn-Tucker theorem", the problem is recast in terms of the sample size rather than the dimensionality – very high dimensions, moderate sample size (~10,000)
Duality Theory • Every linear problem has a dual problem • Correspondence between the original problem and the dual problem: constraint i ↔ variable i; objective function ↔ right-hand side
Kuhn-Tucker Condition (1) Objective function: f(x); constraint functions: gi(x) • f(x) is convex and differentiable • gi(x) are concave and differentiable ⇒ A necessary condition for x* to be an optimal solution of the nonlinear problem is that there exist m numbers u1, u2, …, um satisfying all of the KKT conditions. Cont'
KT condition (2) 1. ∂f/∂xj − Σi ui·∂gi/∂xj ≤ 0, j = 1, 2, …, n 2. xj·(∂f/∂xj − Σi ui·∂gi/∂xj) = 0, j = 1, 2, …, n 3. gi(x*) ≤ bi, i = 1, 2, …, m 4. ui·(gi(x*) − bi) = 0, i = 1, 2, …, m 5. xj ≥ 0, j = 1, 2, …, n 6. ui ≥ 0, i = 1, 2, …, m
Lagrange multipliers Max (Min) f(x1, …, xn) s.t. g1(x1, …, xn) = b1, …, gn(x1, …, xn) = bn. Using Lagrange multipliers (λ1, …, λn), then Max (Min) L = f(x1, …, xn) − Σi λi·(gi(x1, …, xn) − bi)
Solving problem (1) • Two steps 1. Lagrangian: L(w, w0, α) = (1/2)||w||² − Σi αi·[yi(w·xi + w0) − 1], where αi ≥ 0 is a Lagrange multiplier
Solving problem (2) 2. Kuhn-Tucker conditions: maximization w.r.t. αi – the solution w*, w0*, α* should satisfy ∂L/∂w = 0 ⇒ w* = Σi αi·yi·xi; ∂L/∂w0 = 0 ⇒ Σi αi·yi = 0; αi·[yi(w*·xi + w0*) − 1] = 0; αi ≥ 0. Cont'
Solving problem (3) • Final equation: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·(xi·xj) s.t. Σi αi·yi = 0, αi ≥ 0, i = 1, …, n
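A sketch of solving this final equation numerically — the toy data, SciPy's SLSQP solver, and the 1e-6 support-vector threshold are illustrative assumptions, not part of the slides:

```python
# Solve the hard-margin dual max W(alpha) on a tiny separable set by
# minimizing -W(alpha) under sum_i alpha_i y_i = 0 and alpha_i >= 0.
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):                             # -W(alpha)
    return 0.5 * a @ G @ a - a.sum()

n = len(y)
res = minimize(neg_dual, np.zeros(n), method="SLSQP",
               bounds=[(0, None)] * n,
               constraints={"type": "eq", "fun": lambda a: a @ y})
alpha = res.x
w = (alpha * y) @ X                          # w* = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                            # support vectors: alpha_i > 0
w0 = np.mean(y[sv] - X[sv] @ w)              # from y_i (w.x_i + w0) = 1 on SVs
print(w, w0, alpha.round(3))
```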
Non-separable (1) • A sample is non-separable if it does not satisfy the margin constraint or lies on the wrong side of the boundary • Introduce positive slack variables ξi: yi(w·xi + w0) ≥ 1 − ξi, ξi ≥ 0 • ξi is greater than zero if the sample is non-separable, and greater than one if it is misclassified
[Figure: slack variables around the margin — ξ1 = 1 − D(x1) and ξ3 = 1 + D(x3); y·D(x2) = 1 − ξ2 > 0 (inside the margin but correctly classified), y·D(x4) = 1 − ξ4 < 0 (misclassified); margin lines D(x) = +1 and D(x) = -1 around the hyperplane D(x) = 0]
Non-separable (2) • Number of non-separable samples: Σi θ(ξi), where θ is the step function • Min Q(ξ) = Σi ξi^p, where p is a small positive constant • When p = 1, this gives a linear (convex) approximation of the count
Soft margin hyperplane (1) • Soft margin hyperplane: min Σi ξi s.t. yi(w·xi + w0) ≥ 1 − ξi, ξi ≥ 0 • Using a sufficiently large (fixed) C: min Q(w, ξ) = (1/2)||w||² + C·Σi ξi s.t. the same constraints
Soft margin hyperplane (2) • C affects the trade-off between complexity and the number of non-separable samples • By duality theory: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·(xi·xj) s.t. Σi αi·yi = 0, 0 ≤ αi ≤ C
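A small illustration of the C trade-off, assuming scikit-learn's SVC is available (not part of the original slides); smaller C tolerates more margin violations and typically keeps more support vectors:

```python
# How C trades margin width against violations in the soft-margin dual,
# where multipliers are box-constrained to 0 <= alpha_i <= C.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(+1, 1.2, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: ||w||={np.linalg.norm(clf.coef_):.2f}, "
          f"#SV={clf.n_support_.sum()}")   # small C -> wider margin, more SVs
```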
High-dimensional mapping and inner product • Non-linear transformation: input space → gj(x) → feature space. Ex) a linear boundary in the feature space is a polynomial boundary in the input space. x1, x2 → third-order polynomial: g1(x1,x2) = 1, g2(x1,x2) = x1, g3(x1,x2) = x2, g4(x1,x2) = x1², g5(x1,x2) = x2², g6(x1,x2) = x1³, g7(x1,x2) = x2³, g8(x1,x2) = x1·x2, …, g15(x1,x2) = x1²·x2², g16(x1,x2) = x1³·x2³
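A sketch of this feature map; the slide elides the middle monomials with "……", so the assumption below is the full tensor-product basis {x1^i · x2^j : 0 ≤ i, j ≤ 3}, which matches the 16 listed endpoints (the ordering may differ from the slide):

```python
# Map (x1, x2) into the 16-dimensional third-order polynomial feature
# space, so a linear boundary there is a polynomial boundary in input space.
from itertools import product

def g(x1, x2):
    # assumed basis: all 16 monomials x1^i * x2^j with 0 <= i, j <= 3
    return [x1**i * x2**j for i, j in product(range(4), range(4))]

print(len(g(2.0, 3.0)))   # 16 features, matching g1 .. g16
```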
Decision function with inner product • Decision function: D(x) = Σj wj·gj(x) + w0 • Dual form: D(x) = Σi αi·yi·K(xi, x) + w0, where K(x, x') = Σj gj(x)·gj(x') is an inner-product kernel
Inner product kernel • Polynomials of degree q: K(x, x') = (x·x' + 1)^q • Radial basis functions: K(x, x') = exp(−||x − x'||² / (2σ²)) • Neural network: K(x, x') = tanh(κ·(x·x') + θ)
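The three kernels as a runnable sketch; the parameter values (q, σ, κ, θ) are illustrative assumptions:

```python
# The three inner-product kernels listed above, with illustrative parameters.
import numpy as np

def poly_kernel(x, xp, q=2):
    return (np.dot(x, xp) + 1) ** q

def rbf_kernel(x, xp, sigma=1.0):
    return np.exp(-np.sum((np.asarray(x) - np.asarray(xp)) ** 2) / (2 * sigma**2))

def nn_kernel(x, xp, kappa=1.0, theta=-1.0):
    # the tanh kernel is a valid kernel only for some (kappa, theta)
    return np.tanh(kappa * np.dot(x, xp) + theta)

x, xp = np.array([1.0, 2.0]), np.array([2.0, 1.0])
print(poly_kernel(x, xp), rbf_kernel(x, xp), nn_kernel(x, xp))
```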
Kernel function • Properties of a kernel function K(x, x'), x ∈ R^p • K(x, x') takes on its maximum value when x' = x • |K(x, x')| decreases with |x − x'| • K(x, x') is generally a function of 2p variables
Example (1) • SVM for XOR (the exclusive-or problem)
Example (3) • Kernel function: K(x, x') = (1 + x·x')² • Dual problem: max W(α) = Σi αi − (1/2)·Σi Σj αi·αj·yi·yj·K(xi, xj) s.t. Σi αi·yi = 0, αi ≥ 0
Example (4) • Resulting decision function: D(x1, x2) = x1·x2
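A quick check of this XOR result, assuming scikit-learn is available; gamma=1 makes the kernel exactly (1 + x·x')², and a large C approximates the hard-margin solution of the example:

```python
# Verify D(x1, x2) = x1 * x2 on the four XOR corners with a degree-2
# polynomial kernel SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
y = np.array([1, -1, -1, 1])               # y = x1 * x2 for each corner
clf = SVC(kernel="poly", degree=2, gamma=1, coef0=1, C=1e6).fit(X, y)
print(clf.predict(X))                      # [ 1 -1 -1  1], i.e. sign(x1*x2)
```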
SVM FOR REGRESSION (1) • Estimation function: f(x) = Σj wj·gj(x) + w0 • Vapnik's ε-insensitive loss function: L(y, f(x)) = 0 if |y − f(x)| ≤ ε, and |y − f(x)| − ε otherwise
SVM FOR REGRESSION (2) • Quadratic problem: min (1/2)||w||² + C·Σi (ξi + ξi*) s.t. yi − f(xi) ≤ ε + ξi, f(xi) − yi ≤ ε + ξi*, ξi, ξi* ≥ 0
SVM FOR REGRESSION (3) • Dual problem: max −(1/2)·Σi Σj (αi − αi*)·(αj − αj*)·K(xi, xj) − ε·Σi (αi + αi*) + Σi yi·(αi − αi*) s.t. Σi (αi − αi*) = 0, 0 ≤ αi, αi* ≤ C
SVM FOR REGRESSION (4) • Resulting estimation function: f(x) = Σi (αi − αi*)·K(xi, x) + w0
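A sketch of ε-insensitive SV regression using scikit-learn's SVR (the data and parameter choices are mine, not from the slides); samples strictly inside the ε-tube get αi = αi* = 0 and drop out of the estimation function:

```python
# Fit an RBF-kernel SVR and count how many samples remain as support
# vectors, i.e. contribute nonzero (alpha_i - alpha_i*) terms.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 5, 40))[:, None]
y = np.sin(X).ravel() + rng.normal(0, 0.1, 40)

svr = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X, y)
print(svr.support_.size, "of", len(X), "samples are support vectors")
```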