Support Vector Machine II Jia-Bin Huang Virginia Tech ECE-5424G / CS-5824 Spring 2019
Administrative • Please use Piazza. No emails. • HW 2 released.
Support Vector Machine • Cost function • Large margin classification • Kernels • Using an SVM
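Support vector machine • Objective: $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \mathrm{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)})\, \mathrm{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ • If “y = 1”, we want $\theta^\top x \ge 1$ (not just $\ge 0$) • If “y = 0”, we want $\theta^\top x \le -1$ (not just $< 0$) Slide credit: Andrew Ng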
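SVM decision boundary • Let’s say we have a very large $C$, so the optimizer drives the cost term to zero • Whenever $y^{(i)} = 1$: $\theta^\top x^{(i)} \ge 1$ • Whenever $y^{(i)} = 0$: $\theta^\top x^{(i)} \le -1$ • What remains: $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ subject to these constraints Slide credit: Andrew Ng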
SVM decision boundary: Linearly separable case Slide credit: Andrew Ng
SVM decision boundary: Linearly separable case (figure label: margin) Slide credit: Andrew Ng
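Vector inner product • $u = (u_1, u_2)$, $v = (v_1, v_2)$ • $\|u\| = \sqrt{u_1^2 + u_2^2}$ = length of vector $u$ • $p$ = (signed) length of projection of $v$ onto $u$ • $u^\top v = p \cdot \|u\| = u_1 v_1 + u_2 v_2$ Slide credit: Andrew Ng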
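The relation between the inner product and the projection length can be checked numerically. This is a minimal sketch in plain Python; the helper names (`dot`, `norm`, `signed_projection_length`) and the example vectors are illustrative assumptions, not part of the slides.

```python
import math

def dot(u, v):
    """Inner product u^T v of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Euclidean length ||u||."""
    return math.sqrt(dot(u, u))

def signed_projection_length(v, u):
    """Signed length p of the projection of v onto u (negative if the angle exceeds 90 degrees)."""
    return dot(u, v) / norm(u)

# Illustrative vectors (assumed values)
u = [3.0, 4.0]
v = [2.0, 1.0]
p = signed_projection_length(v, u)

# u^T v equals p * ||u||
print(dot(u, v), p * norm(u))
```

The sign of `p` is what matters for the SVM argument: examples on the negative side of the boundary project onto $\theta$ with negative length.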
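SVM decision boundary • $\min_\theta \frac{1}{2} \sum_{j=1}^{n} \theta_j^2 = \frac{1}{2} \|\theta\|^2$ s.t. $\theta^\top x^{(i)} \ge 1$ if $y^{(i)} = 1$, $\theta^\top x^{(i)} \le -1$ if $y^{(i)} = 0$ • Simplification: $\theta_0 = 0$, $n = 2$ • What’s $\theta^\top x^{(i)}$? It equals $p^{(i)} \cdot \|\theta\|$, where $p^{(i)}$ is the signed length of the projection of $x^{(i)}$ onto $\theta$ Slide credit: Andrew Ng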
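SVM decision boundary • Constraints: $p^{(i)} \cdot \|\theta\| \ge 1$ if $y^{(i)} = 1$, $p^{(i)} \cdot \|\theta\| \le -1$ if $y^{(i)} = 0$ • Simplification: $\theta_0 = 0$ • $p^{(i)}$ small: $\|\theta\|$ must be large • $p^{(i)}$ large: $\|\theta\|$ can be small, so minimizing $\|\theta\|$ favors the large-margin boundary Slide credit: Andrew Ng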
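Rewrite the formulation • Let $y^{(i)} \in \{-1, +1\}$ • $\min_\theta \frac{1}{2} \|\theta\|^2$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1$ for all $i$ • Every training point then lies at distance at least $1 / \|\theta\|$ from the boundary, so a small $\|\theta\|$ means a large margin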
Data not linearly separable? • $\min_\theta \frac{1}{2} \|\theta\|^2 + C \cdot (\text{number of misclassified examples})$ • NP-hard
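Convex relaxation • NP-hard: $\min_\theta \frac{1}{2} \|\theta\|^2 + C \cdot (\text{number of misclassified examples})$ • Relaxed: $\min_{\theta, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i} \xi^{(i)}$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$ • $\xi^{(i)}$: slack variables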
Hinge loss Image credit: https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss
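The hinge loss is commonly written as $\max(0,\, 1 - y\,\theta^\top x)$ for labels $y \in \{-1, +1\}$. The sketch below evaluates it for a few scores; the function names and the toy numbers are assumptions chosen for illustration.

```python
def hinge_loss(y, score):
    """Hinge loss for a label y in {-1, +1} and a raw score theta^T x."""
    return max(0.0, 1.0 - y * score)

def soft_margin_objective(theta, X, Y, C):
    """0.5 * ||theta||^2 + C * sum of hinge losses (no intercept term)."""
    reg = 0.5 * sum(t * t for t in theta)
    scores = [sum(t * xi for t, xi in zip(theta, x)) for x in X]
    return reg + C * sum(hinge_loss(y, s) for y, s in zip(Y, scores))

# Zero loss once the example is on the correct side with margin >= 1
print(hinge_loss(+1, 2.0))   # correct, outside the margin
print(hinge_loss(+1, 0.5))   # correct, but inside the margin
print(hinge_loss(-1, 0.5))   # wrong side of the boundary
```

Unlike the 0/1 misclassification count, this loss is convex in the score, which is what makes the relaxed problem tractable.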
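Hard-margin SVM formulation: $\min_\theta \frac{1}{2} \|\theta\|^2$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1$ for all $i$ • Soft-margin SVM formulation: $\min_{\theta, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i} \xi^{(i)}$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$ for all $i$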
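One simple way to minimize the soft-margin objective is subgradient descent on its equivalent unconstrained form, $\frac{1}{2}\|\theta\|^2 + C \sum_i \max(0, 1 - y^{(i)} \theta^\top x^{(i)})$. The sketch below is an illustration under assumptions (no intercept, fixed step size, toy data); it is not the solver used by packages such as libsvm, and `train_linear_svm` is a hypothetical name.

```python
def train_linear_svm(X, Y, C=1.0, lr=0.01, epochs=200):
    """Batch subgradient descent on 0.5*||theta||^2 + C * sum(hinge). Labels in {-1, +1}."""
    n = len(X[0])
    theta = [0.0] * n
    for _ in range(epochs):
        grad = list(theta)  # gradient of the regularizer 0.5 * ||theta||^2
        for x, y in zip(X, Y):
            margin = y * sum(t * xi for t, xi in zip(theta, x))
            if margin < 1.0:  # hinge is active: subgradient is -C * y * x
                for j in range(n):
                    grad[j] -= C * y * x[j]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Toy linearly separable data (assumed values), no intercept
X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
Y = [1, 1, -1, -1]
theta = train_linear_svm(X, Y)
predictions = [1 if sum(t * xi for t, xi in zip(theta, x)) >= 0 else -1 for x in X]
print(theta, predictions)
```

A larger `C` penalizes margin violations more heavily, pushing the solution toward the hard-margin one when the data are separable.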
Support Vector Machine • Cost function • Large margin classification • Kernels • Using an SVM
Non-linear classification • How do we separate the two classes using a hyperplane?
Kernel • $K(\cdot, \cdot)$ is a legal definition of inner product: $\exists\, \phi: X \to \mathbb{R}^N$ s.t. $K(x, z) = \phi(x) \cdot \phi(z)$
Why do kernels matter? • Many algorithms interact with data only via dot-products • Replace $x \cdot z$ with $K(x, z) = \phi(x) \cdot \phi(z)$ • Act implicitly as if data was in the higher-dimensional $\phi$-space
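Example • For $x, z \in \mathbb{R}^2$, $K(x, z) = (x \cdot z)^2$ corresponds to $\phi(x) = (x_1^2,\; x_2^2,\; \sqrt{2}\, x_1 x_2)$
Example • Check: $\phi(x) \cdot \phi(z) = x_1^2 z_1^2 + x_2^2 z_2^2 + 2 x_1 x_2 z_1 z_2 = (x_1 z_1 + x_2 z_2)^2 = K(x, z)$ Slide credit: Maria-Florina Balcan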
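A kernel and its feature map can be compared numerically. The sketch below assumes the quadratic kernel $K(x, z) = (x \cdot z)^2$ in two dimensions and the feature map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\, x_1 x_2)$; the function names and test points are illustrative.

```python
import math

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2 for 2-D inputs, computed without leaving the input space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map into 3-D space for the same kernel."""
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2.0) * x[0] * x[1])

x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2_kernel(x, z)                            # one dot product in 2-D
explicit = sum(a * b for a, b in zip(phi(x), phi(z)))    # dot product in 3-D
print(implicit, explicit)
```

Both routes give the same value, but the kernel never builds the higher-dimensional vectors, which matters when the feature space is very large.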
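Example kernels • Linear kernel: $K(x, z) = x \cdot z$ • Gaussian (Radial basis function) kernel: $K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$ • Sigmoid kernel: $K(x, z) = \tanh(a\, x \cdot z + b)$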
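The three kernels listed above can be written in a few lines. This is a sketch using their common textbook forms; the parameter names `sigma`, `a`, and `b` and their default values are assumptions.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def sigmoid_kernel(x, z, a=1.0, b=0.0):
    """K(x, z) = tanh(a * (x . z) + b)"""
    return math.tanh(a * linear_kernel(x, z) + b)

x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), gaussian_kernel(x, z), sigmoid_kernel(x, z))
```

The Gaussian kernel equals 1 when the two points coincide and decays toward 0 as they move apart, with `sigma` controlling how fast.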
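Constructing new kernels • Positive scaling: $a K(x, z)$, $a > 0$ • Exponentiation: $\exp(K(x, z))$ • Addition: $K_1(x, z) + K_2(x, z)$ • Multiplication with function: $f(x)\, K(x, z)\, f(z)$ • Multiplication: $K_1(x, z)\, K_2(x, z)$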
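Non-linear decision boundary • Predict $y = 1$ if $\theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1 x_2 + \theta_4 x_1^2 + \theta_5 x_2^2 + \cdots \ge 0$ • With $f_1 = x_1$, $f_2 = x_2$, $f_3 = x_1 x_2$, ...: is there a different/better choice of the features $f_1, f_2, f_3, \ldots$? Slide credit: Andrew Ng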
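Kernel • Given $x$, compute new features depending on proximity to landmarks $l^{(1)}, l^{(2)}, l^{(3)}$ • $f_i = \mathrm{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$ (Gaussian kernel) Slide credit: Andrew Ng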
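Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$ • Ex: $\theta_0 = -0.5$, $\theta_1 = 1$, $\theta_2 = 1$, $\theta_3 = 0$ Slide credit: Andrew Ng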
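Choosing the landmarks • Given $x$: $f_i = \mathrm{similarity}(x, l^{(i)}) = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$ • Predict $y = 1$ if $\theta_0 + \theta_1 f_1 + \theta_2 f_2 + \theta_3 f_3 \ge 0$ • Where to get $l^{(1)}, l^{(2)}, l^{(3)}, \ldots$? Slide credit: Andrew Ng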
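SVM with kernels • Given $(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})$ • Choose $l^{(1)} = x^{(1)}$, $l^{(2)} = x^{(2)}$, ..., $l^{(m)} = x^{(m)}$ • Given example $x$: $f_1 = \mathrm{similarity}(x, l^{(1)})$, $f_2 = \mathrm{similarity}(x, l^{(2)})$, ... • For training example $(x^{(i)}, y^{(i)})$: $x^{(i)} \to f^{(i)}$ Slide credit: Andrew Ng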
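The landmark construction can be sketched as follows: every training example becomes a landmark, and each input is mapped to its vector of Gaussian similarities. The function names, the leading constant feature $f_0 = 1$, the default `sigma`, and the toy data are assumptions for illustration.

```python
import math

def gaussian_similarity(x, landmark, sigma=1.0):
    """exp(-||x - l||^2 / (2 * sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma=1.0):
    """Map x to [f_0 = 1, f_1, ..., f_m], one similarity per landmark."""
    return [1.0] + [gaussian_similarity(x, l, sigma) for l in landmarks]

# Toy training inputs (assumed values); landmarks are the training inputs themselves
X_train = [[1.0, 2.0], [3.0, 0.5], [-1.0, -1.0]]
landmarks = X_train
F = [kernel_features(x, landmarks) for x in X_train]
for row in F:
    print([round(f, 4) for f in row])
```

Each training example has similarity exactly 1 to its own landmark, so the feature matrix has ones on the shifted diagonal.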
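SVM with kernels • Hypothesis: Given $x$, compute features $f \in \mathbb{R}^{m+1}$ • Predict $y = 1$ if $\theta^\top f \ge 0$ • Training (original): $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \mathrm{cost}_1(\theta^\top x^{(i)}) + (1 - y^{(i)})\, \mathrm{cost}_0(\theta^\top x^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{n} \theta_j^2$ • Training (with kernel): $\min_\theta\; C \sum_{i=1}^{m} \left[ y^{(i)}\, \mathrm{cost}_1(\theta^\top f^{(i)}) + (1 - y^{(i)})\, \mathrm{cost}_0(\theta^\top f^{(i)}) \right] + \frac{1}{2} \sum_{j=1}^{m} \theta_j^2$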
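Support vector machines (Primal/Dual) • Primal form: $\min_{\theta, \xi} \frac{1}{2} \|\theta\|^2 + C \sum_{i} \xi^{(i)}$ s.t. $y^{(i)}\, \theta^\top x^{(i)} \ge 1 - \xi^{(i)}$, $\xi^{(i)} \ge 0$ • Lagrangian dual form: $\max_{\alpha} \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y^{(i)} y^{(j)}\, x^{(i)} \cdot x^{(j)}$ s.t. $0 \le \alpha_i \le C$ (with an unregularized intercept, also $\sum_i \alpha_i y^{(i)} = 0$)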
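SVM (Lagrangian dual) • Classifier: $\theta = \sum_{i} \alpha_i y^{(i)} x^{(i)}$, so $\theta^\top x = \sum_{i} \alpha_i y^{(i)}\, x^{(i)} \cdot x$ • The points $x^{(i)}$ for which $\alpha_i \ne 0$: Support Vectors • Replace $x^{(i)} \cdot x^{(j)}$ with $K(x^{(i)}, x^{(j)})$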
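In the dual view, prediction only needs the support vectors, their labels, their multipliers, and a kernel. The sketch below evaluates $\sum_i \alpha_i y^{(i)} K(x^{(i)}, x)$; the multiplier values are made-up placeholders (a real solver would supply them), and `decision_value` is a hypothetical name.

```python
def linear_kernel(x, z):
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))

def decision_value(x, support_vectors, labels, alphas, kernel):
    """sum_i alpha_i * y_i * K(x_i, x); labels in {-1, +1}, no intercept."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas))

# Hypothetical support vectors and multipliers, for illustration only
support_vectors = [[1.0, 0.0], [-1.0, 0.0]]
labels = [1, -1]
alphas = [0.5, 0.5]

x_new = [2.0, 0.0]
value = decision_value(x_new, support_vectors, labels, alphas, linear_kernel)
prediction = 1 if value >= 0 else -1
print(value, prediction)
```

Swapping `linear_kernel` for any other valid kernel changes the decision boundary without changing this prediction code.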
SVM parameters • $C$ ($= \frac{1}{\lambda}$) • Large $C$: Lower bias, high variance. • Small $C$: Higher bias, low variance. • $\sigma^2$ • Large $\sigma^2$: features $f_i$ vary more smoothly. Higher bias, lower variance. • Small $\sigma^2$: features $f_i$ vary less smoothly. Lower bias, higher variance. Slide credit: Andrew Ng
SVM Demo • https://cs.stanford.edu/people/karpathy/svmjs/demo/
SVM song • https://www.youtube.com/watch?v=g15bqtyidZs
Support Vector Machine • Cost function • Large margin classification • Kernels • Using an SVM
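Using SVM • Use an SVM software package (e.g., liblinear, libsvm) to solve for $\theta$ • Need to specify: • Choice of parameter $C$ • Choice of kernel (similarity function): • Linear kernel (no kernel): Predict $y = 1$ if $\theta^\top x \ge 0$ • Gaussian kernel: $f_i = \exp\left(-\frac{\|x - l^{(i)}\|^2}{2\sigma^2}\right)$, where $l^{(i)} = x^{(i)}$ • Need to choose $\sigma^2$. Need proper feature scaling Slide credit: Andrew Ng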
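Feature scaling matters for the Gaussian kernel because $\|x - l\|^2$ is dominated by whichever feature has the largest numeric range. The sketch below standardizes each feature to zero mean and unit variance; this is one common scaling choice, and the function name and toy data are assumptions.

```python
import math

def standardize(X):
    """Scale each column to zero mean and unit (population) standard deviation."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    # Fall back to 1.0 for constant columns to avoid dividing by zero
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / m) or 1.0
            for j in range(n)]
    scaled = [[(row[j] - means[j]) / stds[j] for j in range(n)] for row in X]
    return scaled, means, stds

# Toy data (assumed values): feature 1 is on a much larger scale than feature 2
X = [[1000.0, 1.0], [2000.0, 2.0], [3000.0, 3.0]]
X_scaled, means, stds = standardize(X)
print(X_scaled)
```

The `means` and `stds` computed on the training set should be reused to scale any new example before computing its kernel features.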
Kernel (similarity) functions • Note: not all similarity functions make valid kernels. • Many off-the-shelf kernels available: • Polynomial kernel • String kernel • Chi-square kernel • Histogram intersection kernel Slide credit: Andrew Ng
Multi-class classification • Use one-vs.-all method. Train $K$ SVMs, one to distinguish $y = i$ from the rest, get $\theta^{(1)}, \theta^{(2)}, \ldots, \theta^{(K)}$ • Pick class $i$ with the largest $(\theta^{(i)})^\top x$ Slide credit: Andrew Ng
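The one-vs.-all prediction step reduces to an argmax over per-class scores. The sketch below assumes the $K$ parameter vectors are already trained; the values in `thetas` are made-up placeholders, and classes are indexed from 0 in the code rather than from 1.

```python
def predict_one_vs_all(x, thetas):
    """Return (index of the class with the largest theta^T x, list of all scores)."""
    scores = [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical parameter vectors for K = 3 classes, for illustration only
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [0.2, 0.9]
predicted_class, scores = predict_one_vs_all(x, thetas)
print(predicted_class, scores)
```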
Logistic regression vs. SVMs • $n$ = number of features ($x \in \mathbb{R}^{n+1}$), $m$ = number of training examples • If $n$ is large (relative to $m$): Use logistic regression or SVM without a kernel (“linear kernel”) • If $n$ is small, $m$ is intermediate: Use SVM with Gaussian kernel • If $n$ is small, $m$ is large: Create/add more features, then use logistic regression or linear SVM • Neural network likely to work well for most of these cases, but slower to train Slide credit: Andrew Ng
Things to remember • Cost function • Large margin classification • Kernels • Using an SVM