Support Vector Machine (SVM)

Presented by Robert Chen

Introduction • High level explanation of SVM • SVM is a way to classify data • We are interested in text classification

What is an SVM? • “In essence, an SVM is a mathematical entity, an algorithm (or recipe) for maximizing a particular mathematical function with respect to a given collection of data.” William S. Noble

What is an SVM? • It is a computer algorithm that learns from the training data we provide so that it can categorize new data in the future. • An SVM can’t cluster data; it can only classify it. We use SVD to cluster the data.

SVM hyperplanes • 1) Separating hyperplane • 1-D (a point), 2-D (a line), 3-D (a plane) • 2) Maximum-margin hyperplane • Separates the classes while maintaining the maximal distance from any one of the given expression profiles • 3) Soft-margin hyperplane • Generalized optimal hyperplane (the name used in Vapnik’s book)

Soft Margin Hyperplane • Allows some outlier data points to push their way through the margin of the separating hyperplane without affecting the final result • “Soft margin parameter specifies a trade-off between hyperplane violations and the size of the margin.” W. Noble
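The trade-off in Noble's quote is commonly written as an objective with a penalty weight, usually called C. The name C, the slack formula, and the toy data below are assumptions taken from the standard formulation, not from these slides. A minimal sketch in plain Python:

```python
def soft_margin_objective(w, b, points, labels, C):
    """Return 0.5*||w||^2 + C * (sum of margin violations) for a linear classifier."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    slack_total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Slack is zero for points on the correct side of the margin
        # and grows linearly with the size of the violation.
        slack_total += max(0.0, 1.0 - y * score)
    return margin_term + C * slack_total

# Hypothetical 2-D data: the last point sits inside the margin.
points = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [+1, -1, +1]
w, b = (1.0, 0.0), 0.0

low_C = soft_margin_objective(w, b, points, labels, C=0.1)    # violation is cheap
high_C = soft_margin_objective(w, b, points, labels, C=10.0)  # violation is expensive
```

With a small C the outlier barely changes the objective, so a wide margin is preferred; with a large C the same violation dominates, which pushes the optimizer toward a narrower margin that fits every point.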

Soft Margin Hyperplane • Suggested by Corinna Cortes and Vladimir Vapnik in 1995 • Won the 2008 ACM Paris Kanellakis Award

Kernel function • A mathematical way to determine the hyperplane when: • 1) There is no clear boundary • 2) A soft margin doesn’t help

Kernel Function • Projects data from a low-dimensional space to a high-dimensional space • We then project the SVM hyperplane found in that space back to a lower-dimensional, drawable space such as 2-D. • Kernels that map to a very high-dimensional space can cause the SVM to overfit the data.

Types of Kernels • linear: K(x_i, x_j) = x_i^T x_j • polynomial: K(x_i, x_j) = (γ x_i^T x_j + r)^d, γ > 0 • radial basis function (RBF): K(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), γ > 0 • sigmoid: K(x_i, x_j) = tanh(γ x_i^T x_j + r)
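The four kernels listed above translate almost directly into code. The sketch below uses plain Python tuples as vectors; the default values for gamma, r, and d are arbitrary placeholders chosen for illustration, not recommendations from the slides.

```python
import math

def dot(u, v):
    """Inner product x_i^T x_j of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(xi, xj):
    return dot(xi, xj)

def polynomial_kernel(xi, xj, gamma=1.0, r=0.0, d=2):
    return (gamma * dot(xi, xj) + r) ** d

def rbf_kernel(xi, xj, gamma=1.0):
    # Squared Euclidean distance between the two vectors.
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(xi, xj, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(xi, xj) + r)

xi, xj = (1.0, 2.0), (3.0, 4.0)
k_lin = linear_kernel(xi, xj)                # 1*3 + 2*4
k_poly = polynomial_kernel(xi, xj, r=1.0)    # (1*11 + 1)^2
k_rbf_same = rbf_kernel(xi, xi)              # zero distance gives exp(0)
```

Each function takes two vectors and returns a single number, which is all an SVM needs from a kernel: the high-dimensional mapping itself is never computed.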

Notes • radial basis function (RBF): K(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), γ > 0 • An RBF kernel is equivalent to mapping the data into an infinite-dimensional Hilbert space
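One way to see the infinite-dimensional claim is to expand the kernel for scalar inputs x and z. This is the standard Taylor-series argument, sketched here for the 1-D case only; the symbol φ_k for the k-th feature is introduced for illustration and does not appear in the slides.

```latex
\begin{aligned}
K(x, z) &= \exp\!\left(-\gamma (x - z)^2\right)
         = e^{-\gamma x^2}\, e^{-\gamma z^2}\, e^{2\gamma x z} \\
        &= e^{-\gamma x^2}\, e^{-\gamma z^2} \sum_{k=0}^{\infty} \frac{(2\gamma)^k}{k!}\, x^k z^k
         = \sum_{k=0}^{\infty} \phi_k(x)\, \phi_k(z),
\qquad
\phi_k(x) = e^{-\gamma x^2} \sqrt{\frac{(2\gamma)^k}{k!}}\; x^k .
\end{aligned}
```

The kernel is therefore an inner product between feature vectors with one coordinate for every k = 0, 1, 2, ..., which is why no finite-dimensional map reproduces it exactly.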

Example • Data set: 1-dimensional • Class, X1 • +1, 0 • -1, 1 • -1, 2 • +1, 3 • Φ(X1) = (X1, X1^2)
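The example can be reproduced in a few lines. The sketch assumes the feature map is Φ(x) = (x, x^2), which is what the equations on the following slides use (the points (3, 9) and (2, 4) appear there); the helper separable_by_threshold is a hypothetical name added for illustration.

```python
def phi(x):
    """Map a 1-D input to 2-D, assuming phi(x) = (x, x^2)."""
    return (x, x * x)

# (class, X1) pairs from the example data set.
data = [(+1, 0), (-1, 1), (-1, 2), (+1, 3)]

def separable_by_threshold(samples):
    """Check whether a single cut point on X1 splits the two classes."""
    pos = [x for label, x in samples if label == +1]
    neg = [x for label, x in samples if label == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

mapped = [(label, phi(x)) for label, x in data]
one_d_separable = separable_by_threshold(data)
```

In one dimension the positive points sit on both sides of the negative ones, so no single threshold works. After the mapping the points become (0, 0), (1, 1), (2, 4), and (3, 9), and a straight line can separate the classes.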

Support Vectors • <w · x> + b = +1 (positive labels) (1) • <w · x> + b = -1 (negative labels) (2) • <w · x> + b = 0 (hyperplane) (3) • Any vectors that satisfy expression (1) or (2) are support vectors.

Importance of Support Vectors in Support Vector Machines • The complexity of an SVM depends on the number of support vectors rather than on the dimensionality of the feature space
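The point about complexity shows up in the kernel form of the decision function, f(x) = Σ α_i y_i K(x_i, x) + b, where the sum runs only over the support vectors. Below is a minimal sketch using the four-point 1-D example from this deck. Two things are assumptions: the feature map Φ(x) = (x, x^2), read off the deck's worked equations, and the α values, which are one set chosen here so that the sum reproduces w = (-3, 1) and b = 1. The α values do not come from the slides.

```python
def kernel(x, z):
    # Kernel induced by the assumed feature map phi(x) = (x, x^2):
    # K(x, z) = phi(x) . phi(z) = x*z + x^2 * z^2
    return x * z + (x * x) * (z * z)

# (x_i, y_i, alpha_i) for each support vector; the alphas are an assumption.
support_vectors = [(0, +1, 4.0), (3, +1, 1.0), (1, -1, 4.0), (2, -1, 1.0)]
b = 1.0

def decision(x):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, summed over support vectors only."""
    return sum(alpha * y * kernel(xi, x) for xi, y, alpha in support_vectors) + b

def classify(x):
    return +1 if decision(x) >= 0 else -1

scores = [decision(x) for x in (0, 1, 2, 3)]
```

Each prediction costs one kernel evaluation per support vector, regardless of how many dimensions the feature space has. The mapped points themselves are never built.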

Positive label • w1·x1 + w2·x2 + b = +1 • w1·0 + w2·0 + b = +1 • w1·3 + w2·9 + b = +1

Negative label • w1·1 + w2·1 + b = -1 • w1·2 + w2·4 + b = -1 • Solution: w1 = -3, w2 = 1, b = 1
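The values of w1, w2, and b can be checked by solving the equations directly. The sketch reads the equations as the mapped points (0, 0) and (3, 9) for the positive class and (1, 1) and (2, 4) for the negative class. It solves three of the four equations with Cramer's rule and then checks the fourth; det3 and solve3 are hypothetical helper names.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(A, rhs):
    """Solve a 3x3 linear system with Cramer's rule, using exact fractions."""
    D = det3(A)
    solution = []
    for col in range(3):
        # Replace one column of A with the right-hand side.
        Ak = [[rhs[r] if c == col else A[r][c] for c in range(3)] for r in range(3)]
        solution.append(Fraction(det3(Ak), D))
    return solution

# Each row is (x1, x2, 1), so row . (w1, w2, b) must equal the label.
A = [[3, 9, 1],   # positive point (3, 9)
     [1, 1, 1],   # negative point (1, 1)
     [2, 4, 1]]   # negative point (2, 4)
rhs = [+1, -1, -1]

w1, w2, b = solve3(A, rhs)
# The remaining positive point (0, 0) was not used above; check it separately.
check_origin = w1 * 0 + w2 * 0 + b
```

Three equations pin down the three unknowns. The fourth point also lands exactly on its margin, so all four points are support vectors in this example.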

Hyperplane • w1x1 + w2x2 + b = 0 • -3x1 + 1x2 + 1 = 0 • x2 = -1 + 3x1 • X1, X2 • 0, -1 • 1, 2 • 2, 5 • 3, 8
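The hyperplane can be used in two ways: to draw the separating line in the mapped 2-D space and to classify the original 1-D inputs. The sketch below does both, assuming the mapping (x, x^2) for the original inputs; the function names are illustrative.

```python
w1, w2, b = -3, 1, 1

def boundary_x2(x1):
    """Height of the separating line x2 = -1 + 3*x1 at a given x1."""
    return -1 + 3 * x1

def decision(x):
    """Evaluate w1*x1 + w2*x2 + b at the mapped point (x, x^2)."""
    x1, x2 = x, x * x
    return w1 * x1 + w2 * x2 + b

# Points on the line, matching the X1, X2 table on the slide.
line_points = [(x1, boundary_x2(x1)) for x1 in range(4)]

# Predicted class for each original 1-D input.
predictions = [(x, +1 if decision(x) > 0 else -1) for x in range(4)]
```

Back in the original space the decision function is the quadratic x^2 - 3x + 1, which is positive at 0 and 3 and negative at 1 and 2, so a straight line in 2-D acts as a curved boundary in 1-D.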

Maximum-Margin Hyperplane • margin = 2/sqrt(w · w) • 2/sqrt((-3)^2 + 1^2) = 2/sqrt(10) • margin = 0.632456
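The margin figure can be verified numerically. The sketch uses w = (-3, 1) and b = 1 from the worked example, and also measures the distance from one mapped point, (3, 9), to the hyperplane; distance_to_hyperplane is a hypothetical helper name.

```python
import math

w = (-3, 1)
b = 1

norm_w = math.sqrt(sum(wi * wi for wi in w))   # sqrt((-3)^2 + 1^2) = sqrt(10)
margin = 2 / norm_w

def distance_to_hyperplane(point, w, b):
    """Perpendicular distance |w . x + b| / ||w|| from a point to the hyperplane."""
    score = sum(wi * xi for wi, xi in zip(w, point)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))

d_support = distance_to_hyperplane((3, 9), w, b)
```

A support vector lies at distance 1/||w|| from the hyperplane, so the full margin between the two classes is twice that distance.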

Recommended Article • What is a support vector machine? • By William S Noble

Recommended Article • Support Vector Machines for Text Categorization • A. Basu, C. Watters, and M. Shepherd • Faculty of Computer Science • Dalhousie University • Halifax, Nova Scotia, Canada B3H 1W5 • {basu | watters | shepherd@cs.dal.ca}

Recommended Book • The Nature of Statistical Learning Theory • By Vladimir N. Vapnik

Library doesn’t have this book • Author: Thorsten Joachims

Thank you • Questions? • Comments?

Multiclass SVM • Multiclass ranking SVMs, in which one SVM decision function attempts to classify all classes. • One-against-all classification, in which there is one binary SVM for each class to separate members of that class from members of other classes. • Pairwise classification, in which there is one binary SVM for each pair of classes to separate members of one class from members of the other.
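The one-against-all and pairwise schemes differ mainly in how many binary classifiers they need and how the outputs are combined. In the sketch below, simple distance-based functions stand in for trained binary SVMs. The class names, centers, and both helper functions are hypothetical and serve only to show the combination logic.

```python
from itertools import combinations

def one_vs_all_predict(x, scorers):
    """scorers maps class -> function giving a real-valued score; highest score wins."""
    return max(scorers, key=lambda c: scorers[c](x))

def pairwise_predict(x, pair_classifiers):
    """pair_classifiers maps (a, b) -> function returning a or b; most votes wins."""
    votes = {}
    for clf in pair_classifiers.values():
        winner = clf(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Stand-ins for trained SVMs: each class is represented by a 1-D center.
centers = {"low": 0.0, "mid": 5.0, "high": 10.0, "top": 15.0}

# One binary scorer per class (k classifiers).
scorers = {c: (lambda x, m=m: -abs(x - m)) for c, m in centers.items()}

# One binary classifier per pair of classes (k*(k-1)/2 classifiers).
pair_classifiers = {
    (a, b): (lambda x, a=a, b=b: a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(centers, 2)
}

ova_label = one_vs_all_predict(4.0, scorers)
pairwise_label = pairwise_predict(9.0, pair_classifiers)
```

With four classes, one-against-all trains 4 classifiers and pairwise trains 6. The pairwise count grows quadratically with the number of classes, but each pairwise classifier sees only the data for its two classes.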
