Support Vector Machine II

Support Vector Machine II Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019

Administrative • Please use Piazza. No emails. • HW 2 released.

Support Vector Machine • Cost function • Large margin classification • Kernels • Using an SVM

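Support vector machine • Cost function: $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \text{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)})\, \text{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ • If “y = 1”, we want $\theta^\top x \ge 1$ (not just $\ge 0$) • If “y = 0”, we want $\theta^\top x \le -1$ (not just $< 0$) Slide credit: Andrew Ng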

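SVM decision boundary • Let’s say we have a very large $C$ • Whenever $y^{(i)} = 1$: $\theta^\top x^{(i)} \ge 1$ • Whenever $y^{(i)} = 0$: $\theta^\top x^{(i)} \le -1$ • The data-fit term is then zero, leaving $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ subject to these constraints Slide credit: Andrew Ng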

SVM decision boundary: Linearly separable case Slide credit: Andrew Ng

SVM decision boundary: Linearly separable case margin Slide credit: Andrew Ng

Why large margin classifiers? margin

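Vector inner product • $u = [u_1, u_2]^\top$, $v = [v_1, v_2]^\top$ • $\|u\| = \sqrt{u_1^2 + u_2^2}$: length of vector $u$ • $p$: (signed) length of projection of $v$ onto $u$ • $u^\top v = p \cdot \|u\| = u_1 v_1 + u_2 v_2$ Slide credit: Andrew Ng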
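The identity $u^\top v = p \cdot \|u\|$ can be checked numerically. The following is a minimal sketch in plain Python; the vectors `u` and `v` are made-up values, not taken from the slides.

```python
import math

def dot(u, v):
    """Inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length ||u||."""
    return math.sqrt(dot(u, u))

def projection_length(v, u):
    """Signed length p of the projection of v onto u."""
    return dot(u, v) / norm(u)

# Hypothetical example vectors
u = [4.0, 3.0]
v = [2.0, 1.0]

p = projection_length(v, u)
# Both sides of u^T v = p * ||u||
lhs = dot(u, v)
rhs = p * norm(u)
```

Note that `p` is negative when the angle between `u` and `v` exceeds 90 degrees, which is what lets the same identity cover examples on both sides of the decision boundary.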

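SVM decision boundary • $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2 = \frac{1}{2} \|\theta\|^2$ s.t. $\theta^\top x^{(i)} \ge 1$ if $y^{(i)} = 1$, $\theta^\top x^{(i)} \le -1$ if $y^{(i)} = 0$ • Simplification: $\theta_0 = 0$, $n = 2$ • What’s $\theta^\top x^{(i)}$? $\theta^\top x^{(i)} = p^{(i)} \cdot \|\theta\|$, where $p^{(i)}$ is the projection of $x^{(i)}$ onto $\theta$ Slide credit: Andrew Ng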

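SVM decision boundary • Simplification: $\theta_0 = 0$, $n = 2$ • Constraints: $p^{(i)} \cdot \|\theta\| \ge 1$ if $y^{(i)} = 1$, $p^{(i)} \cdot \|\theta\| \le -1$ if $y^{(i)} = 0$ • $|p^{(i)}|$ small $\Rightarrow$ $\|\theta\|$ must be large • $|p^{(i)}|$ large $\Rightarrow$ $\|\theta\|$ can be small Slide credit: Andrew Ng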

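Rewrite the formulation • Let $y^{(i)} \in \{-1, +1\}$ • $\min_\theta \frac{1}{2} \|\theta\|^2$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1$ for all $i$, with margin $1/\|\theta\|$ on each side of the boundary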
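To see why a larger margin goes with a smaller $\|\theta\|$, the sketch below compares two candidate directions for $\theta$ on a toy dataset. It assumes labels in $\{-1, +1\}$ and the simplification $\theta_0 = 0$; the data points and function names are hypothetical.

```python
import math

def min_signed_projection(direction, X, y):
    """Smallest y_i * p_i, where p_i is the projection of x_i onto direction."""
    length = math.sqrt(sum(d * d for d in direction))
    return min(
        yi * sum(d * xj for d, xj in zip(direction, xi)) / length
        for xi, yi in zip(X, y)
    )

def smallest_feasible_norm(direction, X, y):
    """Smallest ||theta|| along this direction satisfying y_i * theta^T x_i >= 1."""
    p_min = min_signed_projection(direction, X, y)
    if p_min <= 0:
        raise ValueError("this direction does not separate the data")
    return 1.0 / p_min

# Hypothetical, linearly separable toy data
X = [(2.0, 0.0), (2.0, 2.0), (-2.0, 0.0), (-2.0, -2.0)]
y = [1, 1, -1, -1]

norm_a = smallest_feasible_norm((1.0, 0.0), X, y)  # projections are all 2
norm_b = smallest_feasible_norm((1.0, 1.0), X, y)  # smallest projection is sqrt(2)
```

The direction with the larger minimum projection admits the smaller norm, so minimizing $\frac{1}{2}\|\theta\|^2$ prefers it.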

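Data not linearly separable? • $\min_\theta \frac{1}{2} \|\theta\|^2 + C \cdot (\text{number of misclassifications})$ • Minimizing the number of misclassifications directly is NP-hard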

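Convex relaxation • NP-hard: $\min_\theta \frac{1}{2} \|\theta\|^2 + C \cdot (\text{number of misclassifications})$ • Relaxed: $\min_{\theta, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \xi^{(i)}$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$ • $\xi^{(i)}$: slack variables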

Hinge loss • $\ell(y, \theta^\top x) = \max(0,\; 1 - y\, \theta^\top x)$ for $y \in \{-1, +1\}$ Image credit: https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss

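Hard-margin SVM formulation: $\min_\theta \frac{1}{2} \|\theta\|^2$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1$ • Soft-margin SVM formulation: $\min_{\theta, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \xi^{(i)}$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$; equivalently $\min_\theta \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \max(0,\; 1 - y^{(i)}\, \theta^\top x^{(i)})$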
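At the optimum each slack variable equals the hinge loss of its example, so the soft-margin objective can be evaluated directly for any candidate $\theta$. A minimal sketch, assuming labels in $\{-1, +1\}$; the data, $\theta$, and $C$ below are illustrative values only.

```python
def hinge_loss(y, score):
    """max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(theta, X, y, C):
    """Return (objective, slacks) for 0.5 * ||theta||^2 + C * sum of slacks."""
    scores = [sum(t * xj for t, xj in zip(theta, xi)) for xi in X]
    slacks = [hinge_loss(yi, s) for yi, s in zip(y, scores)]
    regularizer = 0.5 * sum(t * t for t in theta)
    return regularizer + C * sum(slacks), slacks

# Hypothetical data: the last point lies on the wrong side of the boundary
X = [(2.0, 0.0), (0.5, 1.0), (-2.0, 0.0), (0.5, 0.0)]
y = [1, 1, -1, -1]
theta = (1.0, 0.0)

objective, slacks = soft_margin_objective(theta, X, y, C=1.0)
```

A slack of 0 means the example is outside the margin, a value between 0 and 1 means it is inside the margin but correctly classified, and a value above 1 means it is misclassified.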

Non-linear classification • How do we separate the two classes using a hyperplane?

Non-linear classification

Kernel • $K(\cdot, \cdot)$ is a legal definition of inner product if $\exists\, \phi: X \to \mathbb{R}^N$ s.t. $K(x, z) = \phi(x)^\top \phi(z)$

Why do kernels matter? • Many algorithms interact with data only via dot-products • Replace $x^\top z$ with $K(x, z) = \phi(x)^\top \phi(z)$ • Act implicitly as if data was in the higher-dimensional $\phi$-space

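Example • For $x, z \in \mathbb{R}^2$, $K(x, z) = (x^\top z)^2$ corresponds to $\phi(x) = (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2)$ • Check: $\phi(x)^\top \phi(z) = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 = (x_1 z_1 + x_2 z_2)^2$ Slide credit: Maria-Florina Balcan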
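The correspondence between a kernel and its feature map can be verified numerically. The sketch below assumes the degree-2 polynomial kernel $K(x, z) = (x^\top z)^2$ on two-dimensional inputs with feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$; the input points are arbitrary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly2_kernel(x, z):
    """K(x, z) = (x^T z)^2, computed in the original 2-D space."""
    return dot(x, z) ** 2

def phi(x):
    """Explicit 3-D feature map matching poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, x2 * x2, math.sqrt(2) * x1 * x2)

# Arbitrary example points
x = (1.0, 2.0)
z = (3.0, 4.0)

implicit = poly2_kernel(x, z)    # one 2-D dot product, then squared
explicit = dot(phi(x), phi(z))   # dot product in the 3-D phi-space
```

Both routes give the same number (up to floating-point rounding), but the kernel never forms the higher-dimensional vectors.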

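Example kernels • Linear kernel: $K(x, z) = x^\top z$ • Gaussian (Radial basis function) kernel: $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$ • Sigmoid kernel: $K(x, z) = \tanh(a\, x^\top z + b)$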
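These three kernels are short enough to write out directly. A sketch in plain Python, assuming the standard forms $x^\top z$, $\exp(-\|x - z\|^2 / (2\sigma^2))$ and $\tanh(a\, x^\top z + b)$; the parameter names `sigma`, `a` and `b` and their defaults are choices made here for illustration.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x^T z."""
    return sum(xi * zi for xi, zi in zip(x, z))

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    squared_distance = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """K(x, z) = tanh(a * x^T z + b)."""
    return math.tanh(a * linear_kernel(x, z) + b)

# The Gaussian kernel is 1 for identical points and decays with distance
same = gaussian_kernel((1.0, 2.0), (1.0, 2.0))
near = gaussian_kernel((0.0, 0.0), (1.0, 0.0))
far = gaussian_kernel((0.0, 0.0), (5.0, 0.0))
```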

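Constructing new kernels (given valid kernels $k$, $k_1$, $k_2$) • Positive scaling: $a\, k(x, z)$, $a > 0$ • Exponentiation: $\exp(k(x, z))$ • Addition: $k_1(x, z) + k_2(x, z)$ • Multiplication with function: $f(x)\, k(x, z)\, f(z)$ • Multiplication: $k_1(x, z)\, k_2(x, z)$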
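The construction rules compose naturally as higher-order functions: each takes one or two kernels and returns a new kernel. A minimal sketch, using the linear kernel as the base; all function names here are hypothetical.

```python
import math

def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def scale(k, a):
    """Positive scaling: a * k(x, z), with a > 0."""
    if a <= 0:
        raise ValueError("scaling factor must be positive")
    return lambda x, z: a * k(x, z)

def exponentiate(k):
    """Exponentiation: exp(k(x, z))."""
    return lambda x, z: math.exp(k(x, z))

def add(k1, k2):
    """Addition: k1(x, z) + k2(x, z)."""
    return lambda x, z: k1(x, z) + k2(x, z)

def multiply(k1, k2):
    """Multiplication: k1(x, z) * k2(x, z)."""
    return lambda x, z: k1(x, z) * k2(x, z)

def with_function(k, f):
    """Multiplication with a function: f(x) * k(x, z) * f(z)."""
    return lambda x, z: f(x) * k(x, z) * f(z)

x, z = (1.0, 2.0), (3.0, 4.0)
quadratic = multiply(linear_kernel, linear_kernel)   # (x^T z)^2
combined = add(scale(linear_kernel, 2.0), quadratic)
```

Here `quadratic` reproduces the degree-2 polynomial kernel from products of linear kernels.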

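Non-linear decision boundary • Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots \ge 0$ • Written as $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 + \cdots$ with $f_1 = x_1$, $f_2 = x_2$, $f_3 = x_1 x_2$, ... • Is there a different/better choice of the features $f_1, f_2, f_3, \ldots$? Slide credit: Andrew Ng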

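Kernel • Given $x$, compute new features depending on proximity to landmarks $l^{(1)}$, $l^{(2)}$, $l^{(3)}$ • $f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, $i = 1, 2, 3$ (Gaussian kernel) Slide credit: Andrew Ng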

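Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$ • Ex: $\theta_0 = -0.5$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 0$: points close to $l^{(1)}$ or $l^{(2)}$ are predicted $y = 1$; points far from both are predicted $y = 0$ Slide credit: Andrew Ng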
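The prediction rule with landmark features can be sketched as follows. The landmark positions, $\sigma$, and the parameter values $\theta = (-0.5, 1, 1, 0)$ are assumptions chosen for illustration; with these values only the first two landmarks contribute to a positive prediction.

```python
import math

def similarity(x, landmark, sigma=1.0):
    """Gaussian similarity exp(-||x - l||^2 / (2 * sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def predict(x, landmarks, theta, sigma=1.0):
    """Return 1 if theta_0 + sum_i theta_i * f_i >= 0, else 0."""
    features = [similarity(x, l, sigma) for l in landmarks]
    score = theta[0] + sum(t * f for t, f in zip(theta[1:], features))
    return 1 if score >= 0 else 0

# Hypothetical landmarks and assumed example parameters
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
theta = [-0.5, 1.0, 1.0, 0.0]

near_l1 = predict((0.1, 0.0), landmarks, theta)    # f_1 close to 1
at_l3 = predict((0.0, 4.0), landmarks, theta)      # only f_3 is large, theta_3 = 0
far_away = predict((10.0, 10.0), landmarks, theta) # all features close to 0
```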

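Choosing the landmarks • Given $x$: $f_i = \text{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$ • Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$ • Where to get $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$? Slide credit: Andrew Ng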

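SVM with kernels • Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ • Choose $l^{(1)} = x^{(1)}$, $l^{(2)} = x^{(2)}$, ..., $l^{(m)} = x^{(m)}$ • Given example $x$: $f_1 = \text{similarity}(x, l^{(1)})$, $f_2 = \text{similarity}(x, l^{(2)})$, ... • For training example $(x^{(i)}, y^{(i)})$: $x^{(i)} \to f^{(i)}$, with $f^{(i)}_i = \text{similarity}(x^{(i)}, l^{(i)}) = 1$ Slide credit: Andrew Ng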
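Using every training example as a landmark turns each input into a feature vector with one similarity per training example. A minimal sketch that builds these vectors, assuming a Gaussian similarity and a leading constant feature $f_0 = 1$; the training points and $\sigma$ are made up.

```python
import math

def similarity(x, landmark, sigma):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma):
    """Map x to [1, f_1, ..., f_m], where f_i = similarity(x, l_i)."""
    return [1.0] + [similarity(x, l, sigma) for l in landmarks]

# Hypothetical training inputs; the landmarks are the training inputs themselves
X_train = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
landmarks = list(X_train)
sigma = 5.0

F = [kernel_features(x, landmarks, sigma) for x in X_train]
```

Each row of `F` has $m + 1$ entries, and the entry that compares a training example with itself is exactly 1.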

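SVM with kernels • Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$ • Predict $y = 1$ if $\theta^\top f \ge 0$ • Training (original): $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \text{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)})\, \text{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ • Training (with kernel): $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \text{cost}_1(\theta^\top f^{(i)}) + (1 - y^{(i)})\, \text{cost}_0(\theta^\top f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$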

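Support vector machines (Primal/Dual), with $y^{(i)} \in \{-1, +1\}$ and the bias $\theta_0$ written separately • Primal form: $\min_{\theta, \theta_0, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i=1}^{m} \xi^{(i)}$ s.t. $y^{(i)} (\theta^\top x^{(i)} + \theta_0) \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$ • Lagrangian dual form: $\max_\alpha \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y^{(i)} y^{(j)} x^{(i)\top} x^{(j)}$ s.t. $0 \le \alpha_i \le C$, $\sum_{i=1}^{m} \alpha_i y^{(i)} = 0$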

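SVM (Lagrangian dual) • Classifier: $\theta = \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)}$, so $h(x) = \text{sign}\left( \sum_{i=1}^{m} \alpha_i y^{(i)} x^{(i)\top} x + \theta_0 \right)$ • The points for which $\alpha_i > 0$: Support Vectors • Replace $x^{(i)\top} x$ with $K(x^{(i)}, x)$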
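Because the dual classifier touches the data only through inner products, swapping in a kernel is a one-argument change. The sketch below evaluates the decision function from given dual variables; it does not solve the dual problem. The support vectors, `alphas` and `bias` are hand-picked for a two-point toy problem and are assumptions of this example.

```python
def linear_kernel(x, z):
    return sum(xi * zi for xi, zi in zip(x, z))

def decision_value(x, support_vectors, labels, alphas, bias, kernel):
    """sum_i alpha_i * y_i * K(x_i, x) + bias, over the support vectors."""
    return sum(
        a * yi * kernel(sv, x)
        for sv, yi, a in zip(support_vectors, labels, alphas)
    ) + bias

def predict(x, support_vectors, labels, alphas, bias, kernel):
    """Return +1 or -1 according to the sign of the decision value."""
    value = decision_value(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1

# Toy problem: one positive and one negative point, both support vectors.
# With alpha = 0.5 each, theta = sum_i alpha_i * y_i * x_i = (1, 0).
support_vectors = [(1.0, 0.0), (-1.0, 0.0)]
labels = [1, -1]
alphas = [0.5, 0.5]
bias = 0.0

value = decision_value((2.0, 3.0), support_vectors, labels, alphas, bias, linear_kernel)
label = predict((-1.0, 5.0), support_vectors, labels, alphas, bias, linear_kernel)
```

Passing a different `kernel` function (for example a Gaussian kernel) changes the decision boundary without changing any other line.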

SVM parameters • $C$ ($= 1/\lambda$): Large $C$: lower bias, higher variance. Small $C$: higher bias, lower variance. • $\sigma^2$: Large $\sigma^2$: features $f_i$ vary more smoothly; higher bias, lower variance. Small $\sigma^2$: features $f_i$ vary less smoothly; lower bias, higher variance. Slide credit: Andrew Ng
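The effect of $\sigma^2$ on smoothness is easy to see by evaluating the Gaussian feature at a fixed distance from a landmark. A small sketch; the distance and the two $\sigma^2$ values are arbitrary.

```python
import math

def gaussian_feature(distance, sigma_squared):
    """f = exp(-distance^2 / (2 * sigma^2)) for a point at the given distance."""
    return math.exp(-distance ** 2 / (2.0 * sigma_squared))

distance = 2.0
narrow = gaussian_feature(distance, sigma_squared=0.5)   # exp(-4)
wide = gaussian_feature(distance, sigma_squared=10.0)    # exp(-0.2)
```

With a small $\sigma^2$ the feature has already dropped to nearly zero at this distance, so each landmark influences only its immediate neighborhood; with a large $\sigma^2$ the feature is still close to 1 and the resulting decision function changes slowly.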

SVM Demo • https://cs.stanford.edu/people/karpathy/svmjs/demo/

SVM song • https://www.youtube.com/watch?v=g15bqtyidZs

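Using SVM • Use an SVM software package (e.g., liblinear, libsvm) to solve for parameters $\theta$ • Need to specify: • Choice of parameter $C$ • Choice of kernel (similarity function): • Linear kernel: Predict $y = 1$ if $\theta^\top x \ge 0$ • Gaussian kernel: $f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$ • Need to choose $\sigma^2$. Need proper feature scaling Slide credit: Andrew Ng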
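Feature scaling matters for the Gaussian kernel because $\|x - l\|^2$ sums squared differences across features, so a feature with a large numeric range dominates the distance. The sketch below standardizes each column to zero mean and unit variance. Standardization is one common choice, not necessarily the one intended on the slide, and the two-feature data is hypothetical.

```python
import statistics

def standardize(X):
    """Scale each column to zero mean and unit (population) standard deviation.

    Assumes no column is constant, otherwise the division would fail.
    """
    columns = list(zip(*X))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in X
    ]

def squared_distance(x, z):
    return sum((a - b) ** 2 for a, b in zip(x, z))

# Hypothetical features on very different scales
X = [[1000.0, 1.0], [2000.0, 3.0], [3000.0, 5.0]]
X_scaled = standardize(X)

raw = squared_distance(X[0], X[1])                   # dominated by the first feature
scaled = squared_distance(X_scaled[0], X_scaled[1])  # both features contribute equally
```

The same means and standard deviations computed on the training set should be reused when scaling new examples.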

Kernel (similarity) functions • Note: not all similarity functions make valid kernels. • Many off-the-shelf kernels available: • Polynomial kernel • String kernel • Chi-square kernel • Histogram intersection kernel Slide credit: Andrew Ng

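Multi-class classification • Use one-vs.-all method. Train $K$ SVMs, one to distinguish $y = i$ from the rest (for $i = 1, 2, \ldots, K$), get $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)}$ • Pick class $i$ with the largest $(\theta^{(i)})^\top x$ Slide credit: Andrew Ng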
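The one-vs.-all prediction step reduces to an argmax over per-class scores. A minimal sketch, assuming the $K$ parameter vectors have already been trained; the values in `thetas` are invented for illustration, and `x` includes the constant feature $x_0 = 1$.

```python
def one_vs_all_predict(x, thetas):
    """Return (class index in 1..K, scores), picking the largest theta^T x."""
    scores = [sum(t * xj for t, xj in zip(theta, x)) for theta in thetas]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best + 1, scores

# Hypothetical parameters for K = 3 classes; each row is [theta_0, theta_1, theta_2]
thetas = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, -1.0, -1.0],
]

label_a, scores_a = one_vs_all_predict([1.0, 2.0, 0.5], thetas)
label_b, scores_b = one_vs_all_predict([1.0, 0.1, 3.0], thetas)
```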

Logistic regression vs. SVMs • $n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples • If $n$ is large (relative to $m$): Use logistic regression or SVM without a kernel (“linear kernel”) • If $n$ is small, $m$ is intermediate: Use SVM with Gaussian kernel • If $n$ is small, $m$ is large: Create/add more features, then use logistic regression or linear SVM • Neural network likely to work well for most of these cases, but slower to train Slide credit: Andrew Ng

Things to remember • Cost function • Large margin classification • Kernels • Using an SVM
