Support Vector Machines Presented by: Yasmin Anwar
Outline • Introduction • Support Vector Machines • SVM Tools • SVM Applications • Text Classification • Conclusion
Introduction • Proposed by Boser, Guyon, and Vapnik in 1992. • SVMs are among the most effective classification methods. • Supervised learning • Non-probabilistic binary linear classifier
Support Vector Machines • Find a linear hyperplane (decision boundary) that separates the data (figure)
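In standard notation (not shown on the slide; w is the weight vector and b the bias), the separating hyperplane and the resulting decision rule are:

$$\mathbf{w} \cdot \mathbf{x} + b = 0, \qquad f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b)$$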
Support Vector Machines • One possible solution (figure)
Support Vector Machines • Another possible solution (figure)
Support Vector Machines • Other possible solutions (figure)
Support Vector Machines • Which hyperplane is better, B1 or B2? How do you define "better"? (figure)
Support Vector Machines • Find the hyperplane that maximizes the margin => B1 is better than B2 (figure)
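As a sketch in standard notation (not reproduced from the slides), maximizing the margin amounts to solving

$$\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \; i = 1, \dots, n$$

since the margin equals $2 / \lVert \mathbf{w} \rVert$, a smaller $\lVert \mathbf{w} \rVert$ means a larger margin.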
Support Vector Machines • Support vectors: the training points that lie closest to the decision boundary and determine it (figure)
Support Vector Machines • Noisy data, outliers, etc. • Slack variables ξi allow some points to violate the margin (figure)
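In the same standard notation, the slack variables enter the soft-margin objective as follows (C, the usual trade-off parameter, is part of this sketch rather than slide content):

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \; \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \; \xi_i \ge 0$$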
Nonlinear Support Vector Machines • What if the problem is not linearly separable?
Nonlinear Support Vector Machines • A kernel function implicitly maps the data into a higher-dimensional space in which the separation becomes possible.
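As an illustration (not from the slides), here is a minimal sketch using scikit-learn's SVC, which wraps LIBSVM; the concentric-circles dataset is an illustrative choice of a problem no straight line can separate:

# A minimal sketch, assuming scikit-learn is installed; sklearn.svm.SVC wraps LIBSVM.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: no straight line in 2-D separates the classes.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly works in a higher-dimensional space where a
# separating hyperplane exists, without computing the mapping explicitly.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))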
SVM Tools • SVM software: • LibSVM (C++) • SVMlight (C) • Complete machine-learning toolboxes that include SVMs: • Weka (Java) • RapidMiner
SVM Applications • SVMs are currently among the best performers for a number of classification tasks: • Text classification • Image classification • Microarray Gene Expression Data • Database Marketing
Text Classification • Step 1—text pre-processing • to pre-process text and represent each document as a feature vector • Step 2—training • to train a classifier using a classification tool (e.g. Weka, SVM-light) • Step 3—classification • to apply the classifier to new documents
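As a hedged sketch of steps 1-3 in one place, using scikit-learn rather than the tools named above; the 20-newsgroups corpus and the two category names are illustrative choices, not slide content:

# A minimal end-to-end sketch, assuming scikit-learn is installed.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Step 1: pre-process text and represent each document as a TF*IDF vector.
train = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
test = fetch_20newsgroups(subset="test", categories=["sci.space", "rec.autos"])
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Step 2: train a linear SVM classifier.
clf = LinearSVC()
clf.fit(X_train, train.target)

# Step 3: apply the classifier to new documents.
print("test accuracy:", clf.score(X_test, test.target))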
Text Pre-Processing • Tokenization • Stop word removal: "a", "the", "I", "he", "she", "is", "are", etc. • Stemming: teacher => teach • Feature extraction: • To use TF (term frequency) as feature value • To use TF*IDF (TF weighted by inverse document frequency) as feature value • IDF(t) = log (total-number-of-documents / number-of-documents-containing-t)
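A small worked example with illustrative counts (not from the slides): if the collection contains 1,000 documents and term t appears in 10 of them, then

$$\mathrm{IDF}(t) = \log\frac{1000}{10} = \log 100 \approx 4.6 \;(\text{natural log; } 2 \text{ with base-10 log})$$

so rare terms receive large weights, while a term occurring in every document gets IDF = log 1 = 0.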
Text Pre-Processing • Tokenization example (figure)
SVM-light • SVM-light: a command-line C program that implements the SVM learning algorithm • Classification, regression, ranking • Download at http://svmlight.joachims.org/ • Documentation on the same page • Two programs: • svm_learn for training • svm_classify for classification
SVM-light Examples • Input format (one example per line: a label followed by sparse feature:value pairs):
1 1:0.5 3:1 5:0.4
-1 2:0.9 3:0.1 4:2
• Output format: • Positive score => positive class • Negative score => negative class
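A minimal sketch of producing this format (the helper below is an illustrative assumption, not part of SVM-light):

# Illustrative helper: write labeled dense vectors as SVM-light sparse lines,
# skipping zero-valued features; feature indices are 1-based and ascending.
def to_svmlight_line(label, values):
    pairs = ["%d:%g" % (i, v) for i, v in enumerate(values, start=1) if v != 0]
    return " ".join([str(label)] + pairs)

examples = [(1, [0.5, 0, 1, 0, 0.4]), (-1, [0, 0.9, 0.1, 2, 0])]
with open("train.dat", "w") as f:
    for label, values in examples:
        f.write(to_svmlight_line(label, values) + "\n")

# Training and classification then use the two programs named above, e.g.
# "svm_learn train.dat model" and "svm_classify test.dat model predictions".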
Conclusion • SVM • A large margin gives better generalization ability and less over-fitting. • Non-linear SVM • The kernel function maps the data points into a higher-dimensional space in order to make them linearly separable.
References [1] P. H. Chen, C. J. Lin, and B. Schölkopf, "A tutorial on ν-support vector machines," Applied Stochastic Models in Business and Industry, vol. 21, pp. 111-136, 2005. [2] http://svmlight.joachims.org/ [3] http://www-users.cs.umn.edu/~kumar/dmbook/index.php [4] https://agora.cs.illinois.edu/download/attachments/9642083/Text+Categorization+using+SVM-light.ppt?version=1&modificationDate=1212615247000 [5] http://www.cs.waikato.ac.nz/ml/weka/ [6] http://www.clopinet.com/isabelle/Projects/SVM/applist.html [7] http://www.dtreg.com/svm.htm