An introduction to support vector machine (SVM) Advisor: Dr. Hsu Graduate: Ching-Wen Hong
Outline • 1. SVM: A brief overview • 2. Simplest SVM: Linear classifier for separable data • 3. Simplest SVM: Linear classifier for non-separable data • 4. Conclusion
SVM: A brief overview • 1-1 What is an SVM? • A family of learning algorithms for classifying objects into two classes. • Input: a training set {(x1,y1),…,(xl,yl)} of objects xi ∈ ℝⁿ (an n-dimensional vector space) and their known classes yi ∈ {-1,+1}. • Output: a classifier f : ℝⁿ → {-1,+1}, which predicts the class f(x) of any (new) object x ∈ ℝⁿ.
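A minimal sketch of this input/output contract, using scikit-learn's SVC with a linear kernel (the library choice, the toy training set, and the test point are illustrative assumptions; the slides do not prescribe an implementation):

```python
# Input: training set {(x1,y1),...,(xl,yl)} with xi in R^n, yi in {-1,+1}
# Output: a classifier f that predicts -1 or +1 for a new object x
import numpy as np
from sklearn.svm import SVC

X_train = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]])  # assumed toy data
y_train = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear")   # linear SVM
clf.fit(X_train, y_train)    # learn f from the training set

x_new = np.array([[1.5, 2.5]])   # a new object x
print(clf.predict(x_new))        # predicted class f(x) in {-1, +1}
```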
1-3 Examples of classification tasks • Optical character recognition: x is an image, y is a character. • Text classification: x is a text, y is a category. • Medical diagnosis: x is a set of features (age, sex, blood type, genome, …), y indicates the risk.
1-4 Are there other methods for classification? • Bayesian classifier (based on the maximum a posteriori probability) • Fisher linear discriminant • Neural networks • Expert systems (rule-based) • Decision trees • …
1-5 Why is the SVM gaining popularity? • Good performance in real-world applications. • Computational efficiency. • Robustness in high dimensions. • No strong hypothesis on the data-generation process (contrary to the Bayesian approach).
2. Simplest SVM: Linear SVM for separable training sets • A training set S = {(x1,y1),…,(xl,yl)}, xi ∈ ℝⁿ, yi ∈ {-1,+1}. • 2-1 Linearly separable training set
2-4 How to find the optimal hyperplane? • xi·w + b ≥ +1 for yi = +1 (1) • xi·w + b ≤ -1 for yi = -1 (2) • (1) and (2) combine into yi(xi·w + b) - 1 ≥ 0, i = 1,…,l. • w is the normal vector of the hyperplanes H1: xi·w + b = +1 and H2: xi·w + b = -1. • Margin = 2/║w║; ◎ marks a support vector (in the slide's figure).
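A small numerical check of these formulas; the candidate hyperplane (w, b) and the toy points are assumed values, not taken from the slides:

```python
# Check yi(xi·w + b) - 1 >= 0 for every point and compute the margin 2/||w||
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]])  # assumed data
y = np.array([-1, -1, +1, +1])
w = np.array([1.0, 1.0])   # assumed normal vector of H1, H2
b = -4.0                   # assumed offset

feasible = np.all(y * (X @ w + b) - 1 >= 0)   # constraints (1)-(2) combined
margin = 2.0 / np.linalg.norm(w)              # distance between H1 and H2
print(feasible, margin)
```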
2-5 Finding the optimal hyperplane • The optimal hyperplane is defined by the pair (w,b). • Solve the constrained optimization problem • Min ½║w║² • s.t. yi(xi·w + b) - 1 ≥ 0, i = 1,…,l • This is a classic quadratic (convex) programming problem.
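A minimal sketch of solving this primal problem directly with a generic solver (SciPy's SLSQP); the toy data are assumed, and in practice the problem is solved through the dual formulation discussed on the following slides:

```python
# Primal hard-margin SVM: minimize (1/2)||w||^2 subject to yi(xi·w + b) - 1 >= 0
import numpy as np
from scipy.optimize import minimize

X = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]])  # assumed separable data
y = np.array([-1, -1, +1, +1])
n = X.shape[1]

def objective(v):                      # v = [w_1, ..., w_n, b]
    return 0.5 * np.dot(v[:n], v[:n])  # (1/2)||w||^2

constraints = [{"type": "ineq",        # "ineq" means fun(v) >= 0
                "fun": lambda v, xi=xi, yi=yi: yi * (xi @ v[:n] + v[n]) - 1}
               for xi, yi in zip(X, y)]

res = minimize(objective, x0=np.zeros(n + 1), constraints=constraints, method="SLSQP")
w, b = res.x[:n], res.x[n]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))
```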
2-7 Recovering the optimal hyperplane • Once the multipliers αi, i = 1,…,l are found, we recover the pair (w,b) defining the optimal hyperplane: w is given by w = ∑ αi yi xi, b follows from yi(w·xi + b) = 1 for any support vector, and the decision function is f(x) = w·x + b (the predicted class is the sign of f(x)).
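A hedged illustration of this recovery step: scikit-learn's SVC stores αi·yi for the support vectors in dual_coef_, so w = ∑ αi yi xi can be reassembled from a fitted model (the dataset and the very large C used to mimic the hard margin are assumptions):

```python
# Recover w = sum_i alpha_i y_i x_i from the dual variables of a fitted linear SVM
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]])  # assumed data
y = np.array([-1, -1, +1, +1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates the hard margin
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only (all other alpha_i are 0)
w = clf.dual_coef_[0] @ clf.support_vectors_   # w = sum alpha_i y_i x_i
b = clf.intercept_[0]
print("w =", w, "b =", b)
print("decision function f(x) = w·x + b on the training data:", X @ w + b)
```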
2-9 The Karush-Kuhn-Tucker conditions • The KKT conditions are necessary and sufficient for w, b, α to be a solution; thus solving the SVM problem is equivalent to finding a solution to the KKT conditions. • From the KKT conditions we can draw the following conclusion: if αi > 0, then yi(w·xi + b) = 1 and xi is a support vector. • If all other training points (those with αi = 0) were removed and training were repeated, the same separating hyperplane would be found.
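A quick sketch of that last claim: retrain on the support vectors alone and check that the hyperplane is numerically unchanged (the toy data and the choice of C are assumptions):

```python
# Retraining on the support vectors only should recover the same (w, b)
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 3.0], [3.0, 3.5]])  # assumed data
y = np.array([-1, -1, +1, +1])

full = SVC(kernel="linear", C=1e6).fit(X, y)
sv_idx = full.support_                                   # indices with alpha_i > 0
reduced = SVC(kernel="linear", C=1e6).fit(X[sv_idx], y[sv_idx])

print(np.allclose(full.coef_, reduced.coef_),
      np.allclose(full.intercept_, reduced.intercept_))  # should print: True True
```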
3. Simplest SVM: Linear classifier for non-separable data • 3-1 Finding the optimal hyperplane • Introduce slack variables εi and solve the quadratic programming problem • Min ½║w║² + C(∑εi), where C is a user-chosen penalty on the slack (a very large C approximates the separable case) • s.t. yi(xi·w + b) - 1 + εi ≥ 0, εi ≥ 0, i = 1,…,l • In the dual, the only change is that the multipliers are bounded: 0 ≤ αi ≤ C.
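A hedged sketch of the role of C on data that is not linearly separable (the synthetic overlapping classes and the two C values are illustrative assumptions):

```python
# Soft-margin trade-off: small C tolerates more slack, large C penalizes errors heavily
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])  # overlapping classes
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.support_vectors_.shape[0]} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```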
Simplest SVM: Conclusion • Finds the optimal hyperplane, which corresponds to the largest margin • Can be solved easily using a dual formulation • The solution is sparse: the number of support vectors can be very small compared to the size of the training set • Only support vectors are important for prediction of future points; all other points can be forgotten.