Support Vector Machines
Graphic generated with the Lucent Technologies Demonstration 2-D Pattern Recognition Applet at http://svm.research.bell-labs.com/SVT/SVMsvt.html
Separating Line (or hyperplane)
[Figure: points of class -1 and class 1 separated by a line]
• Goal: Find the best line (or hyperplane) to separate the training data. How do we formalize this?
• In two dimensions, the equation of the line is given by: $w_1 x_1 + w_2 x_2 = b$
• Better notation for $n$ dimensions: $w^T x = b$, where $w$ is the normal vector to the plane and $b$ is its offset
Simple Classifier
[Figure: points of class -1 and class 1 on either side of the plane]
• The simple classifier:
• Points that fall on the right are classified as “1”
• Points that fall on the left are classified as “-1”
• Therefore: using the training set, find a hyperplane (line) so that all class 1 points satisfy $w^T x > b$ and all class -1 points satisfy $w^T x < b$
• This is a perceptron!
• How can we improve on the perceptron?
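A minimal sketch of this simple classifier (not from the original slides; the weight vector w, offset b, and example points are assumed for illustration):

```python
import numpy as np

def classify(x, w, b):
    """Simple linear classifier: +1 if x lies on the positive side
    of the hyperplane w.x = b, and -1 otherwise."""
    return 1 if np.dot(w, x) > b else -1

# Example: the line x1 + x2 = 1 in two dimensions
w = np.array([1.0, 1.0])
b = 1.0
print(classify(np.array([2.0, 2.0]), w, b))    # +1 (right of the line)
print(classify(np.array([-1.0, -1.0]), w, b))  # -1 (left of the line)
```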
Finding the Best Plane
• Not all planes are equal. Which of the two planes shown is better?
[Figure: points of class -1 and class 1 with two candidate separating planes]
• Both planes accurately classify the training set.
• The green plane is the better choice, since it is more likely to do well on future test data.
• The green plane is further away from the data.
Separating the Planes
• Construct the bounding planes:
• Draw two planes parallel to the classification plane.
• Push them as far apart as possible, until they hit data points.
• The classification plane whose bounding planes are furthest apart is the best one.
[Figure: classification plane with two parallel bounding planes touching the nearest points of class -1 and class 1]
Recap: Finding the Best Plane
[Figure: bounding planes $w^T x = b + 1$ and $w^T x = b - 1$ on either side of the separating plane $w^T x = b$]
• Details:
• All points in class 1 should be to the right of bounding plane 1: $w^T x_i \ge b + 1$
• All points in class -1 should be to the left of bounding plane -1: $w^T x_i \le b - 1$
• Pick $y_i$ to be +1 or -1 depending on the classification. Then the above two inequalities can be written as one: $y_i (w^T x_i - b) \ge 1$
• The distance between the bounding planes should be maximized.
• The distance between the bounding planes is given by: $\dfrac{2}{\|w\|}$
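A one-line derivation of that distance (a standard step, filled in here since the intermediate formula was not on the slide): the distance between two parallel planes $w^T x = c_1$ and $w^T x = c_2$ is $|c_1 - c_2| / \|w\|$, so with $c_1 = b + 1$ and $c_2 = b - 1$:

$$\text{distance} = \frac{(b+1) - (b-1)}{\|w\|} = \frac{2}{\|w\|}$$

Maximizing this distance is therefore the same as minimizing $\|w\|$.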
The Optimization Problem
• The previous slide can be rewritten as:
$$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{such that} \quad y_i (w^T x_i - b) \ge 1 \ \text{ for all } i$$
• This is a mathematical program:
• An optimization problem subject to constraints
• More specifically, this is a quadratic program
• There are high-powered software tools for solving this kind of problem (both commercial and academic)
• No special algorithms are necessary (in theory...)
• Just enter this problem and the associated data into a quadratic programming solver (like CPLEX), and let it find an answer.
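A minimal sketch of handing this QP to a solver, here using the open-source cvxpy package rather than CPLEX (the toy data X, y is assumed for illustration):

```python
import cvxpy as cp
import numpy as np

# Toy linearly separable data: rows of X are points, y holds +/-1 labels.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

n = X.shape[1]
w = cp.Variable(n)
b = cp.Variable()

# Maximize the margin 2/||w|| by minimizing ||w||^2 / 2,
# subject to every point lying on or outside its bounding plane.
constraints = [cp.multiply(y, X @ w - b) >= 1]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), constraints)
prob.solve()

print("w =", w.value, "b =", b.value)
```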
Data Which is Not Linearly Separable
• What if a separating plane does not exist?
[Figure: overlapping classes -1 and 1, with a misclassified point marked as an error]
• Find the plane that maximizes the margin and minimizes the errors on the training points.
• Take the original inequality and add a slack variable $\xi_i \ge 0$ to measure the error: $y_i (w^T x_i - b) \ge 1 - \xi_i$
The Support Vector Machine
• Push the planes apart and minimize the error at the same time:
$$\min_{w,\, b,\, \xi} \ \frac{1}{2}\|w\|^2 + C \sum_i \xi_i \quad \text{such that} \quad y_i (w^T x_i - b) \ge 1 - \xi_i, \ \ \xi_i \ge 0$$
• C is a positive number that is chosen to balance these two goals.
• This problem is called a Support Vector Machine, or SVM.
• The SVM is one of many techniques for doing supervised machine learning.
• Others: neural networks, decision trees, k-nearest neighbor
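In practice this soft-margin problem is rarely typed into a QP solver by hand. A sketch using scikit-learn's SVC (the toy data is assumed; C plays exactly the balancing role described above):

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with one mislabeled point, so no separating plane exists.
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5],
              [-1.0, -1.0], [-2.0, 0.0], [2.5, 2.5]])
y = np.array([1, 1, 1, -1, -1, -1])  # last point lies among class 1

# Larger C penalizes training errors more; smaller C favors a wide margin.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# scikit-learn's decision function is w.x + intercept, so the slides'
# offset b corresponds to -intercept_.
print("w =", clf.coef_[0], "b =", -clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
```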
Terminology
• Those points that touch a bounding plane, or lie on the wrong side of it, are called support vectors.
• If all the data points except the support vectors were removed, the solution would turn out the same.
• The SVM solution is mathematically equivalent to a mechanical system in force and torque equilibrium, in which the support vectors exert forces that "support" the separating plane (hence the name).
Research: Solving Massive SVMs
• The standard SVM is solved using a canned quadratic programming (QP) solver. Problem:
• Standard tools bring all the data into memory. If the dataset is bigger than memory, you are out of luck.
• How do other supervised learning techniques handle data that does not fit in memory?
• Why not use virtual memory? Let the operating system manage which data the QP solver is using.
• Answer: The QP solver accesses data in a random, not sequential, fashion. The cost of paging data in and out of memory is enormous.
What about nonlinear surfaces?
• Some datasets may not be best separated by a plane.
• How can we find nonlinear separating surfaces?
• Simple method: map the data into a higher-dimensional space, and do the same thing we have already done.
[Figure generated with the Lucent Technologies Demonstration 2-D Pattern Recognition Applet at http://svm.research.bell-labs.com/SVT/SVMsvt.html]
Finding Nonlinear Surfaces
• How do we modify the algorithm to find nonlinear surfaces?
• First idea (simple and effective): map each data point into a higher-dimensional space, and find a linear fit there.
• Example: to find a quadratic surface for $x = (x_1, x_2)$, map each point to the coordinates $(x_1, x_2, x_1^2, x_2^2, x_1 x_2)$.
• Use the new coordinates in the regular linear SVM, as in the sketch below.
• A plane in this quadratic space is equivalent to a quadratic surface in our original space.
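A sketch of this mapping idea (the circle-shaped toy problem and the particular quadratic feature map are assumptions for illustration):

```python
import numpy as np
from sklearn.svm import LinearSVC

def phi(X):
    """Map 2-D points (x1, x2) to the 5-D quadratic feature space
    (x1, x2, x1^2, x2^2, x1*x2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

# A circle-shaped problem: class 1 inside radius 1, class -1 outside.
# No line separates these classes in the original 2-D space.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.where((X**2).sum(axis=1) < 1.0, 1, -1)

# A linear SVM in the mapped space = a quadratic surface in the original space.
clf = LinearSVC(C=1.0, max_iter=10_000).fit(phi(X), y)
print("training accuracy:", clf.score(phi(X), y))
```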
Problem & Solution • If dimensionality of space is high, lots of calculations • For a high polynomial space, combinations of coordinates explodes • Need to do all these calculations for all training points, and for each testing point • Infinite dimensional spaces impossible • Nonlinear surfaces can be used without these problems through the use of a kernel function. • Demonstration: http://svm.cs.rhul.ac.uk/pagesnew/GPat.shtml