Classification IV
Lecturer: Dr. Bo Yuan
E-mail: yuanb@sz.tsinghua.edu.cn
Overview • Support Vector Machines
Linear Classifier
[Figure: a hyperplane with normal vector w splits the space into the half-space w·x + b > 0 on one side and w·x + b < 0 on the other, with w·x + b = 0 on the boundary itself.]
Selection of Classifiers • Which classifier is the best? • All have the same training error. • How about generalization?
Unknown Samples
[Figure: candidate classifiers A and B applied to unseen samples.] • Classifier B divides the space more consistently (unbiased).
Margins
[Figure: a maximum-margin separator; the points lying exactly on the margin boundaries are the support vectors.]
Margins • The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point. • Intuitively, it is safer to choose a classifier with a larger margin. • Wider buffer zone for mistakes • The hyperplane is decided by only a few data points. • Support Vectors! • Others can be discarded! • Select the classifier with the maximum margin. • Linear Support Vector Machines (LSVM) • Works very well in practice. • How to specify the margin formally?
Margins
[Figure: the “Predict Class = +1” zone (w·x + b ≥ 1) and the “Predict Class = -1” zone (w·x + b ≤ -1), separated by the hyperplane w·x + b = 0, with support vectors x+ and x- on the two margin boundaries; M = margin width.] • The margin boundaries are the parallel hyperplanes w·x + b = 1 and w·x + b = -1. • Projecting x+ - x- onto the unit normal w/||w|| gives the margin width M = 2/||w||.
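This geometry is easy to check numerically. Below is a minimal numpy sketch (the weight vector, bias, and test point are made-up values for illustration): the distance from a point x to the hyperplane is |w·x + b|/||w||, and the margin width is 2/||w||.

```python
import numpy as np

w = np.array([1.0, 1.0])   # hypothetical weight vector
b = -1.0                   # hypothetical bias

def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# The margin is the gap between w.x + b = +1 and w.x + b = -1.
margin_width = 2.0 / np.linalg.norm(w)

print(distance_to_hyperplane(np.array([1.0, 1.0]), w, b))  # 0.707... = 1/sqrt(2)
print(margin_width)                                        # 1.414... = sqrt(2)
```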
Objective Function • Correctly classify all data points: yi(w·xi + b) ≥ 1 for every training point (xi, yi) • Maximize the margin: maximize 2/||w|| • Quadratic Optimization Problem • Minimize ½||w||² = ½ w·w • Subject to yi(w·xi + b) ≥ 1 for all i
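For intuition, here is a sketch that hands this primal problem to SciPy's general-purpose constrained optimizer; the two toy points are an assumption for illustration, not data from the lecture.

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [3.0, 3.0]])  # toy points (assumed)
y = np.array([-1.0, 1.0])               # their labels

def objective(v):
    w = v[:2]                      # v packs (w1, w2, b)
    return 0.5 * np.dot(w, w)      # minimize (1/2)||w||^2

constraints = [
    # y_i (w . x_i + b) - 1 >= 0 for every training point
    {"type": "ineq",
     "fun": lambda v, i=i: y[i] * (np.dot(v[:2], X[i]) + v[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(3), constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)   # approximately w = (0.5, 0.5), b = -2
```

Dedicated QP solvers are used in practice; this only shows that the problem is an ordinary quadratic program with linear inequality constraints.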
Lagrange Multipliers • Attaching a multiplier αi ≥ 0 to each constraint yi(w·xi + b) ≥ 1 and eliminating w and b yields the Dual Problem: • Maximize Σ αi - ½ ΣΣ αi αj yi yj (xi·xj) • Subject to αi ≥ 0 for all i and Σ αi yi = 0 • Quadratic problem again!
Solutions of w & b • w = Σ αi yi xi (only the support vectors have αi > 0) • b = yk - w·xk for any support vector xk • Classifying a new point x: f(x) = sign(Σ αi yi (xi·x) + b) • Both training and prediction access the data only through inner products, which is the key to the kernel trick.
An Example • Two training points: x1 = (0, 0) with y1 = -1 and x2 = (1, 1) with y2 = +1. • Both points must be support vectors, so w·x2 + b = 1 and w·x1 + b = -1, with w proportional to x2 - x1 = (1, 1). • Solving gives w = (1, 1) and b = -1; the margin width is 2/||w|| = √2.
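The hand calculation can be checked with scikit-learn (an assumption; the lecture does not prescribe a library). A very large C approximates the hard-margin SVM.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([-1, 1])

clf = SVC(kernel="linear", C=1e10)   # huge C: effectively hard margin
clf.fit(X, y)

print(clf.coef_)             # close to [[1. 1.]]
print(clf.intercept_)        # close to [-1.]
print(clf.support_vectors_)  # both points are support vectors
```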
Soft Margin
[Figure: a boundary wx+b=0 with margin hyperplanes wx+b=1 and wx+b=-1; a few points violate the margin, each with a slack value ei (e.g., e2, e7, e11) measuring how far it lies on the wrong side of its margin boundary.] • Introduce slack variables ei ≥ 0 to tolerate noisy or overlapping data. • Minimize ½||w||² + C Σ ei subject to yi(w·xi + b) ≥ 1 - ei. • The parameter C trades margin width against violations; see the sketch after this slide.
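A small scikit-learn sketch of that trade-off (the overlapping toy data is an assumption for illustration): a small C widens the margin and tolerates violations, a large C penalizes them heavily.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# Two overlapping Gaussian clouds, 20 points per class
X = np.vstack([rng.randn(20, 2) - 1.0, rng.randn(20, 2) + 1.0])
y = np.array([-1] * 20 + [1] * 20)

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_)   # margin width 2/||w||
    print(f"C={C}: margin width {margin:.2f}, "
          f"{len(clf.support_)} support vectors")
```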
Non-linear SVMs
[Figure: a 1-D data set on the x-axis that no single threshold can separate; after adding the feature x², the same points become linearly separable.]
Feature Space
[Figure: the mapping Φ: x→ φ(x) takes the input space (x1, x2) to a feature space, e.g., (x1², x2²), where a non-linear boundary in the input space becomes a linear one.]
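A minimal numpy sketch of this idea for the 1-D case: the points below (an assumed toy data set) cannot be split by any single threshold on the line, but after the mapping φ(x) = (x, x²) a horizontal line separates them.

```python
import numpy as np

x = np.array([-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0])
y = np.array([1, 1, -1, -1, -1, 1, 1])   # outer points vs. inner points

phi = np.column_stack([x, x ** 2])       # lift each point to 2-D

# In the lifted space, the line x^2 = 1 separates the two classes.
predictions = np.where(phi[:, 1] > 1.0, 1, -1)
print(np.array_equal(predictions, y))    # True
```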
Quadratic Basis Functions • For m-dimensional input: φ(x) = (1, √2·x1, …, √2·xm, x1², …, xm², √2·x1x2, …, √2·xm-1xm) • Constant term: 1 • Linear terms: m • Pure quadratic terms: m • Quadratic cross-terms: m(m-1)/2 • Number of terms: 1 + 2m + m(m-1)/2, roughly m²/2 for large m • The √2 coefficients are chosen so that φ(x)·φ(z) = (1 + x·z)², which the kernel trick exploits.
Kernel Trick • The linear classifier relies on dot products between vectors: xi·xj • If every data point is mapped into a high-dimensional space via some transformation Φ: x→ φ(x), the dot product becomes: φ(xi)·φ(xj) • A kernel function is a function that computes this inner product directly in the original space: K(xi, xj) = φ(xi)·φ(xj) • Example: for x = [x1, x2], K(xi, xj) = (1 + xi·xj)² equals φ(xi)·φ(xj) for the quadratic basis functions above (verified in the sketch below).
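A short check of that example in numpy: evaluating the kernel (1 + x·y)² gives exactly the same number as mapping both points through the explicit quadratic feature map and taking the dot product, without ever forming the 6-dimensional vectors during classification.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D input."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def K(x, y):
    """Quadratic kernel: inner product in feature space, computed directly."""
    return (1.0 + np.dot(x, y)) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

print(K(x, y))                 # 144.0
print(np.dot(phi(x), phi(y)))  # 144.0 -- identical, no explicit mapping needed
```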
String Kernel • Kernels can also be defined on non-vector data, e.g., the similarity between text strings: “Car” vs. “Custard”, which share the (gapped) subsequence c-a-r.
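As a flavor of the idea, here is a toy p-spectrum kernel, one simple member of the string-kernel family (an illustrative stand-in: the gapped-subsequence kernel behind the "car" vs. "custard" example is more involved). It counts matching substrings of length p.

```python
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """Count matching length-p substring pairs between strings s and t."""
    grams = lambda u: Counter(u[i:i + p] for i in range(len(u) - p + 1))
    a, b = grams(s), grams(t)
    return sum(a[g] * b[g] for g in a)

print(spectrum_kernel("car", "custard"))  # shared 2-gram "ar" -> 1
```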
More Maths … • Quadratic Optimization • Lagrange Duality • Karush–Kuhn–Tucker (KKT) Conditions
Reading Materials • Text Book • Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000. • Online Resources • http://www.kernel-machines.org/ • http://www.support-vector-machines.org/ • http://www.tristanfletcher.co.uk/SVM%20Explained.pdf • http://www.csie.ntu.edu.tw/~cjlin/libsvm/ • A list of papers uploaded to the web learning portal • Wikipedia & Google
Review • What is the definition of margin in a linear classifier? • Why do we want to maximize the margin? • What is the mathematical expression of the margin? • How do we solve the objective function of the SVM? • What are support vectors? • What is the soft margin? • How does the SVM solve non-linear problems? • What is the so-called “kernel trick”? • What are the commonly used kernels?
Next Week’s Class Talk • Volunteers are needed for next week’s class talk. • Topic: SVM in Practice • Hints: • Applications • Demos • Multi-Class Problems • Software • A very popular toolbox: LIBSVM • Any other interesting topics beyond this lecture • Length: 20 minutes plus question time