Support Vector Machines Andrew Horwitz MUMT621
A brief history… • Developed by Vladimir Vapnik at AT&T in the 1990s • Standard algorithm developed by Vapnik and Corinna Cortes a few years later • Used for image analysis: identification of characters and human form recognition
How would you classify this data? • Black dots = +1 • White dots = 0/-1 (technically Boolean values) (graph adapted from Moore 2003)
Optimal Hyperplane – "Maximum Margin Linear Classifier" • Goal is to find the line farthest from the closest points • The line is placed equidistant from those points • "Margin" refers to the distance from the closest black point(s) to the closest white point(s) (graph taken from Moore 2003)
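As a concrete illustration of fitting a maximum-margin linear classifier, here is a minimal sketch assuming scikit-learn and NumPy (neither appears in the slides); the toy points and labels are invented.

```python
# Minimal sketch of a maximum-margin linear classifier.
# Assumes scikit-learn and NumPy; the toy points and labels are invented for illustration.
import numpy as np
from sklearn.svm import SVC

# Black dots (+1) in one cluster, white dots (-1) in another
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
y = np.array([+1, +1, +1, -1, -1, -1])

# A very large C keeps the margin effectively "hard" for this separable toy set
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w = clf.coef_[0]        # normal vector of the separating line
b = clf.intercept_[0]   # offset
print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)
```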
Plus Planes and Minus Planes (graphic taken from Moore 2003) • For the Boolean interpretation: • Black dot support vectors lie on the "plus plane", white on the "minus plane" • These are lines on a Cartesian plane of the form w·x + b = constant • They are parallel lines… • Rescale w and b so the two planes sit at w·x + b = ±1…
Plus Zones and Minus Zones (graphic taken from Moore 2003) • For a value w and a value b: • Plus-plane = { x : w·x + b = +1 }, plus zone: w·x + b > +1 • Minus-plane = { x : w·x + b = -1 }, minus zone: w·x + b < -1
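A tiny sketch of how a point's zone follows from the sign and size of w·x + b; the values of w, b, and the test points below are hypothetical, and NumPy is assumed.

```python
# Which zone does a point fall in? w, b, and the test points are hypothetical values.
import numpy as np

w = np.array([-1.0, -1.0])   # hypothetical normal vector
b = 6.0                      # hypothetical offset

def zone(x):
    score = np.dot(w, x) + b
    if score >= 1:
        return "plus zone (classified +1)"
    if score <= -1:
        return "minus zone (classified -1)"
    return "inside the margin"

print(zone(np.array([1.0, 1.0])))   # score =  4.0 -> plus zone
print(zone(np.array([5.0, 5.0])))   # score = -4.0 -> minus zone
print(zone(np.array([3.0, 2.5])))   # score =  0.5 -> inside the margin
```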
Calculating the Margin • Keep in mind, this set is linearly separable… • Let's take a point x- on the minus plane and the closest point x+ on the plus plane. • w is perpendicular to these planes, so we can say that x+ = x- + λw for some value λ. (graphic taken from Moore 2003)
Calculating the Margin • We know: • w·x+ + b = +1 • w·x- + b = -1 • x+ = x- + λw • Using the first and third: • w·(x- + λw) + b = 1 • w·x- + b + λ(w·w) = 1 • -1 + λ(w·w) = 1 • λ = 2/(w·w) (graphic taken from Moore 2003)
Calculating the Margin • Given: • λ = 2/(w·w) • Margin width = |λw| • |λw| = λ·sqrt(w·w) = (2/(w·w))·sqrt(w·w) = 2/sqrt(w·w) (graphic taken from Moore 2003)
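The algebra can be checked numerically by fitting a linear SVM and comparing the distance between the plus- and minus-planes with 2/sqrt(w·w); scikit-learn and the toy points below are assumptions, not part of the slides.

```python
# Numeric check that the margin width equals 2/sqrt(w·w).
# scikit-learn is an assumption; the toy points are invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w = clf.coef_[0]

# Width from the formula derived on the slide
width_formula = 2.0 / np.sqrt(np.dot(w, w))

# Width measured geometrically: project (x+ - x-) onto the unit normal w/|w|
sv = clf.support_vectors_
sv_labels = y[clf.support_]
x_plus = sv[sv_labels == +1][0]
x_minus = sv[sv_labels == -1][0]
width_measured = abs(np.dot(w, x_plus - x_minus)) / np.linalg.norm(w)

print(width_formula, width_measured)   # the two numbers should agree
```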
Conclusions about the 2D perfect case • Width of the margin is 2/sqrt(w·w) • This is great! But it is also reverse-engineering: we assumed we already knew w and b • How do we get them from the data? And what happens if a point is misclassified?
Perceptron algorithm • Cycle through the (x, y) pairs and adjust w and b each time a point is misclassified • If the dataset is separable by a hyperplane, w will eventually converge, though this can take a long time; if it isn't, w will never stabilize
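A minimal perceptron sketch, under the assumptions that labels are ±1 and the data arrive as NumPy arrays (none of this code comes from the slides).

```python
# Minimal perceptron sketch: cycle through (x, y) pairs, nudging w and b on each mistake.
# Assumes labels are +1 / -1 and NumPy arrays; this code is illustrative, not from the slides.
import numpy as np

def perceptron(X, y, max_epochs=100):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:   # point is misclassified (or on the line)
                w += yi * xi                    # push w toward the correct side
                b += yi
                mistakes += 1
        if mistakes == 0:                       # a full clean pass: w has converged
            break
    return w, b

# Converges only if the data are linearly separable; otherwise it cycles forever
# (hence the max_epochs cap).
```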
Lagrange Multipliers • Looking at a 1D example on the right: it is not linearly separable. • Add a new variable (the Lagrange multiplier) and go through every (x, y) datapoint, adjusting w accordingly. • Going from 2D into 3D this way can produce a non-linear classifier, but it can also turn out overly complicated. (above taken from Moore 2003) (above taken from Dogan 2008)
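In practice the mapping into a higher-dimensional space is usually handled implicitly through a kernel; below is a minimal sketch assuming scikit-learn's RBF kernel (the slides do not name a specific kernel), with invented 1D data that no single threshold can separate.

```python
# Sketch of a non-linear classifier obtained by implicitly mapping into a higher
# dimension via a kernel. The RBF kernel, scikit-learn, and the 1D toy data are all
# assumptions: the +1 points sit in the middle of the line, so no single threshold works.
import numpy as np
from sklearn.svm import SVC

X = np.array([[-3.0], [-2.0], [-1.0], [0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, +1, +1, +1, -1, -1])

clf = SVC(kernel="rbf", gamma=1.0, C=10.0).fit(X, y)
print(clf.predict([[0.5], [2.5]]))   # should come out roughly [+1, -1]
```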
Soft Margin (Graphic taken from Zisserman 2014)
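A minimal soft-margin sketch, assuming scikit-learn (the slide itself only shows a graphic): the regularization parameter C controls how strongly margin violations are penalized, so the small-C margin should come out wider; the toy data, including the stray -1 point, are invented.

```python
# Soft-margin sketch: C trades margin width against tolerance for margin violations.
# scikit-learn is an assumption; the toy data, including the stray -1 point near the
# +1 cluster, are invented for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [4.0, 4.0], [4.5, 5.0], [1.8, 1.8]])
y = np.array([+1, +1, +1, -1, -1, -1])   # the last -1 sits right next to the +1 cluster

stiff = SVC(kernel="linear", C=1e6).fit(X, y)   # large C: squeezes the margin around the stray point
loose = SVC(kernel="linear", C=0.1).fit(X, y)   # small C: wide margin, stray point tolerated

print("margin with large C:", 2 / np.linalg.norm(stiff.coef_))
print("margin with small C:", 2 / np.linalg.norm(loose.coef_))
```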
Uses in MIR • Playlist creation via relevance comparison (Mandel et al.) • The system trains an SVM on user-labeled examples • The SVM considers six characteristics of the songs • Based on a seed song from their dataset, it presents the user with songs that are farthest from the "decision boundary" • Compared what users thought of style, artist, and mood similarities based on the ordering of the SVM results
Uses in MIR (cont.) • Emotion detection (Zhou et al.) • Handles "tough" queries such as a user searching "I feel happy today and I need a song" • Compares lyric searches to tag searches to combined lyric-and-tag searches • Found that SVM models were the most effective, and that lyric-and-tag searches outperformed lyric-only or tag-only searches
My use • Computer vision! • Examples are from human form recognition • Positive weights: likely that the segment belongs to a human • Negative: the opposite • 16×8 sections of the image × 8 possible orientations = a descriptor in R^1024 • OpenCV (images from Ramanan through Zisserman)
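A hedged sketch of the OpenCV route for HOG + linear SVM person detection; "people.jpg" is a placeholder path, and this uses OpenCV's bundled pre-trained detector rather than training the 1024-dimensional descriptor SVM from scratch.

```python
# Sketch of HOG + a pre-trained linear SVM people detector in OpenCV.
# "people.jpg" is a placeholder path; cv2 (opencv-python) is assumed to be installed,
# and OpenCV's bundled detector is used instead of training a descriptor SVM from scratch.
import cv2

img = cv2.imread("people.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Each detection is a bounding box (x, y, width, height); weights are the SVM scores
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8))
for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("people_detected.jpg", img)
```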
SVM Resources • http://www.autonlab.org/tutorials/svm15.pdf • http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html • http://www.kernel-machines.org/ Thanks!
Conclusions About the 2D Perfect Case • Keep in mind, this set is linearly separable… • Imagine two points on the hyperplane, xa and xb • w·xa + b = w·xb + b = 0 • Then w·(xa - xb) = 0, so w must be perpendicular to the optimal hyperplane • Let's calculate the margin…
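A quick numeric illustration of the perpendicularity argument; the specific w, b, and the two on-plane points below are hypothetical, and NumPy is assumed.

```python
# Quick numeric check that w is perpendicular to the hyperplane w·x + b = 0.
# The values of w, b, and the two on-plane points are hypothetical, chosen for illustration.
import numpy as np

w = np.array([2.0, 1.0])
b = -4.0

xa = np.array([2.0, 0.0])   # w·xa + b = 0, so xa lies on the hyperplane
xb = np.array([0.0, 4.0])   # w·xb + b = 0, so xb does too

print(np.dot(w, xa) + b, np.dot(w, xb) + b)   # both print 0.0
print(np.dot(w, xa - xb))                     # 0.0: w is orthogonal to any in-plane direction
```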