Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview • Support Vector Machines
Linear Classifier
[Figure: a hyperplane with normal vector w splits the space into the half-space w·x + b > 0 on one side and w·x + b < 0 on the other, with w·x + b = 0 on the boundary itself.]
Selection of Classifiers • Which classifier is the best? • All have the same training error. • How about generalization?
Unknown Samples
[Figure: candidate classifiers A and B applied to unseen samples.] • Classifier B divides the space more consistently (unbiased).
Margins
[Figure: a maximum-margin separator; the points lying exactly on the margin boundaries are the support vectors.]
Margins • The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point. • Intuitively, it is safer to choose a classifier with a larger margin. • Wider buffer zone for mistakes • The hyperplane is decided by only a few data points. • Support Vectors! • Others can be discarded! • Select the classifier with the maximum margin. • Linear Support Vector Machines (LSVM) • Works very well in practice. • How to specify the margin formally?
Margins
[Figure: the “Predict Class = +1” zone (w·x + b ≥ 1) and the “Predict Class = -1” zone (w·x + b ≤ -1), separated by the hyperplane w·x + b = 0, with support vectors x+ and x- on the two margin boundaries; M = margin width.] • The margin boundaries are the parallel hyperplanes w·x + b = 1 and w·x + b = -1. • Projecting x+ - x- onto the unit normal w/||w|| gives the margin width M = 2/||w||.
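This geometry is easy to check numerically. Below is a minimal numpy sketch (the weight vector, bias, and test point are made-up values for illustration): the distance from a point x to the hyperplane is |w·x + b|/||w||, and the margin width is 2/||w||.

```python
import numpy as np

w = np.array([1.0, 1.0])   # hypothetical weight vector
b = -1.0                   # hypothetical bias

def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# The margin is the gap between w.x + b = +1 and w.x + b = -1.
margin_width = 2.0 / np.linalg.norm(w)

print(distance_to_hyperplane(np.array([1.0, 1.0]), w, b))  # 0.707... = 1/sqrt(2)
print(margin_width)                                        # 1.414... = sqrt(2)
```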
Objective Function • Correctly classify all data points: yi(w·xi + b) ≥ 1 for every training point (xi, yi) • Maximize the margin: maximize 2/||w|| • Quadratic Optimization Problem • Minimize ½||w||² = ½ w·w • Subject to yi(w·xi + b) ≥ 1 for all i
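For intuition, here is a sketch that hands this primal problem to SciPy's general-purpose constrained optimizer; the two toy points are an assumption for illustration, not data from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [3.0, 3.0]])  # toy points (assumed)
y = np.array([-1.0, 1.0])               # their labels

def objective(v):
    w = v[:2]                      # v packs (w1, w2, b)
    return 0.5 * np.dot(w, w)      # minimize (1/2)||w||^2

constraints = [
    # y_i (w . x_i + b) - 1 >= 0 for every training point
    {"type": "ineq",
     "fun": lambda v, i=i: y[i] * (np.dot(v[:2], X[i]) + v[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)   # approximately w = (0.5, 0.5), b = -2
```

Dedicated QP solvers are used in practice; this only shows that the problem is an ordinary quadratic program with linear inequality constraints.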
Lagrange Multipliers • Attaching a multiplier αi ≥ 0 to each constraint yi(w·xi + b) ≥ 1 and eliminating w and b yields the Dual Problem: • Maximize Σ αi - ½ ΣΣ αi αj yi yj (xi·xj) • Subject to αi ≥ 0 for all i and Σ αi yi = 0 • Quadratic problem again!
Solutions of w & b • w = Σ αi yi xi (only the support vectors have αi > 0) • b = yk - w·xk for any support vector xk • Classifying a new point x: f(x) = sign(Σ αi yi (xi·x) + b) • Both training and prediction access the data only through inner products, which is the key to the kernel trick.
An Example • Two training points: x1 = (0, 0) with y1 = -1 and x2 = (1, 1) with y2 = +1. • Both points must be support vectors, so w·x2 + b = 1 and w·x1 + b = -1, with w proportional to x2 - x1 = (1, 1). • Solving gives w = (1, 1) and b = -1; the margin width is 2/||w|| = √2.
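The hand calculation can be checked with scikit-learn (an assumption; the lecture does not prescribe a library). A very large C approximates the hard-margin SVM.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1])

clf = SVC(kernel="linear", C=1e10)   # huge C: effectively hard margin
clf.fit(X, y)

print(clf.coef_)             # close to [[1. 1.]]
print(clf.intercept_)        # close to [-1.]
print(clf.support_vectors_)  # both points are support vectors
```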
Soft Margin
[Figure: a boundary wx+b=0 with margin hyperplanes wx+b=1 and wx+b=-1; a few points violate the margin, each with a slack value ei (e.g., e2, e7, e11) measuring how far it lies on the wrong side of its margin boundary.] • Introduce slack variables ei ≥ 0 to tolerate noisy or overlapping data. • Minimize ½||w||² + C Σ ei subject to yi(w·xi + b) ≥ 1 - ei. • The parameter C trades margin width against violations; see the sketch after this slide.
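A small scikit-learn sketch of that trade-off (the overlapping toy data is an assumption for illustration): a small C widens the margin and tolerates violations, a large C penalizes them heavily.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clouds, 20 points per class
X = np.vstack([rng.randn(20, 2) - 1.0, rng.randn(20, 2) + 1.0])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)   # margin width 2/||w||
    print(f"C={C}: margin width {margin:.2f}, "
          f"{len(clf.support_)} support vectors")
```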
Non-linear SVMs
[Figure: a 1-D data set on the x-axis that no single threshold can separate; after adding the feature x², the same points become linearly separable.]
Feature Space
[Figure: the mapping Φ: x→ φ(x) takes the input space (x1, x2) to a feature space, e.g., (x1², x2²), where a non-linear boundary in the input space becomes a linear one.]
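A minimal numpy sketch of this idea for the 1-D case: the points below (an assumed toy data set) cannot be split by any single threshold on the line, but after the mapping φ(x) = (x, x²) a horizontal line separates them.

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])   # outer points vs. inner points

phi = np.column_stack([x, x ** 2])       # lift each point to 2-D

# In the lifted space, the line x^2 = 1 separates the two classes.
predictions = np.where(phi[:, 1] > 1.0, 1, -1)
print(np.array_equal(predictions, y))    # True
```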
Quadratic Basis Functions • For m-dimensional input: φ(x) = (1, √2·x1, …, √2·xm, x1², …, xm², √2·x1x2, …, √2·xm-1xm) • Constant term: 1 • Linear terms: m • Pure quadratic terms: m • Quadratic cross-terms: m(m-1)/2 • Number of terms: 1 + 2m + m(m-1)/2, roughly m²/2 for large m • The √2 coefficients are chosen so that φ(x)·φ(z) = (1 + x·z)², which the kernel trick exploits.
Kernel Trick • The linear classifier relies on dot products between vectors: xi·xj • If every data point is mapped into a high-dimensional space via some transformation Φ: x→ φ(x), the dot product becomes: φ(xi)·φ(xj) • A kernel function is a function that computes this inner product directly in the original space: K(xi, xj) = φ(xi)·φ(xj) • Example: for x = [x1, x2], K(xi, xj) = (1 + xi·xj)² equals φ(xi)·φ(xj) for the quadratic basis functions above (verified in the sketch below).
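A short check of that example in numpy: evaluating the kernel (1 + x·y)² gives exactly the same number as mapping both points through the explicit quadratic feature map and taking the dot product, without ever forming the 6-dimensional vectors during classification.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def K(x, y):
    """Quadratic kernel: inner product in feature space, computed directly."""
    return (1.0 + np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print(K(x, y))                 # 144.0
print(np.dot(phi(x), phi(y)))  # 144.0 -- identical, no explicit mapping needed
```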
String Kernel • Kernels can also be defined on non-vector data, e.g., the similarity between text strings: “Car” vs. “Custard”, which share the (gapped) subsequence c-a-r.
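As a flavor of the idea, here is a toy p-spectrum kernel, one simple member of the string-kernel family (an illustrative stand-in: the gapped-subsequence kernel behind the "car" vs. "custard" example is more involved). It counts matching substrings of length p.

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """Count matching length-p substring pairs between strings s and t."""
    grams = lambda u: Counter(u[i:i + p] for i in range(len(u) - p + 1))
    a, b = grams(s), grams(t)
    return sum(a[g] * b[g] for g in a)

print(spectrum_kernel("car", "custard"))  # shared 2-gram "ar" -> 1
```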
More Maths … • Quadratic Optimization • Lagrange Duality • Karush–Kuhn–Tucker (KKT) Conditions
Reading Materials • Text Book • Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000. • Online Resources • http://www.kernel-machines.org/ • http://www.support-vector-machines.org/ • http://www.tristanfletcher.co.uk/SVM%20Explained.pdf • http://www.csie.ntu.edu.tw/~cjlin/libsvm/ • A list of papers uploaded to the web learning portal • Wikipedia & Google
Review • What is the definition of margin in a linear classifier? • Why do we want to maximize the margin? • What is the mathematical expression of the margin? • How do we solve the objective function of the SVM? • What are support vectors? • What is the soft margin? • How does the SVM solve non-linear problems? • What is the so-called “kernel trick”? • What are the commonly used kernels?
Next Week’s Class Talk • Volunteers are needed for next week’s class talk. • Topic: SVM in Practice • Hints: • Applications • Demos • Multi-Class Problems • Software • A very popular toolbox: LIBSVM • Any other interesting topics beyond this lecture • Length: 20 minutes plus question time