CH. 13: Kernel Machines
(A) Support Vector Machine (SVM)
-- classifier, feedforward neural network, supervised learning
Difficulties with SVM: i) binary classifier, ii) linearly separable patterns
SVM finds the optimal separating hyperplane (OSH) with the maximal margin between two support hyperplanes, which are formed by support vectors.
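Data points: $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, $y_i \in \{+1, -1\}$
Let the equation of the OSH be $\mathbf{w}^T\mathbf{x} + b = 0$
  $\mathbf{w}$: normal vector, points toward the positive data
  $|b| / \|\mathbf{w}\|$: distance from the OSH to the origin
Let $\mathbf{w}^T\mathbf{x} + b = +1$ and $\mathbf{w}^T\mathbf{x} + b = -1$: support hyperplanes ($\mathbf{w}$ and $b$ are scaled so that the closest points on each side give $\pm 1$)
Distance between them: $2 / \|\mathbf{w}\|$
Then $\mathbf{w}^T\mathbf{x}_i + b \ge +1$ for $y_i = +1$
Likewise, $\mathbf{w}^T\mathbf{x}_i + b \le -1$ for $y_i = -1$
Rewrite both as $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$, $i = 1, \dots, n$
Margin: $2 / \|\mathbf{w}\|$
Maximizing the margin is the same as minimizing $\|\mathbf{w}\|$. Replace $\|\mathbf{w}\|$ by $\frac{1}{2}\|\mathbf{w}\|^2$:
  minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$, $i = 1, \dots, n$
The $(\mathbf{x}_i, y_i)$ satisfying $y_i(\mathbf{w}^T\mathbf{x}_i + b) = 1$ are called support vectors.
Lagrange multiplier method: converts a constrained problem to an unconstrained one.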
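As a quick numerical check of the hard-margin formulation, the sketch below evaluates the constraint $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$ and the margin $2/\|\mathbf{w}\|$ on a toy 2-D data set. The data points and the hyperplane ($\mathbf{w} = (1, 0)$, $b = -1$) are made up for illustration and do not come from the slides.

```python
import math

# Toy 2-D data (hypothetical): two positive and two negative points.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [+1, +1, -1, -1]

# Candidate separating hyperplane w^T x + b = 0 (here the line x1 = 1).
w = (1.0, 0.0)
b = -1.0

def decision(w, b, x):
    """Return w^T x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def margin(w):
    """Distance between the two support hyperplanes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Left-hand side of the constraint y_i (w^T x_i + b) >= 1 for every point.
functional = [yi * decision(w, b, xi) for xi, yi in zip(X, y)]
feasible = all(v >= 1.0 for v in functional)

# Points that meet the constraint with equality lie on a support hyperplane.
support = [xi for xi, v in zip(X, functional) if abs(v - 1.0) < 1e-12]

print(feasible, margin(w), support)
```

Here the points (2, 0) and (0, 0) meet the constraint with equality, so they play the role of support vectors for this hand-picked hyperplane.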
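The objective function:
  $L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^{n} \alpha_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right]$, $\alpha_i \ge 0$: Lagrange multipliers
The optimal solution is given by the saddle point of $L$, which is minimized w.r.t. $\mathbf{w}$ and $b$, while maximized w.r.t. $\boldsymbol{\alpha}$, i.e., $\max_{\boldsymbol{\alpha}} \min_{\mathbf{w}, b} L(\mathbf{w}, b, \boldsymbol{\alpha})$.
Through Karush–Kuhn–Tucker (KKT) conditions, $L$ defined in the primal space of $\mathbf{w}$, $b$ is translated to the dual space of $\boldsymbol{\alpha}$:
  $\partial L / \partial \mathbf{w} = 0 \Rightarrow \mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$ --- (A)
  $\partial L / \partial b = 0 \Rightarrow \sum_i \alpha_i y_i = 0$ --- (B)
  $\alpha_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 \right] = 0$, $\alpha_i \ge 0$ --- (C)
From (B), the term $b \sum_i \alpha_i y_i$ vanishes.
From (A), $\|\mathbf{w}\|^2 = \sum_i \alpha_i y_i \mathbf{w}^T\mathbf{x}_i = \sum_i \sum_j \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$
The problem becomes
  maximize $L_D(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$
  subject to $\sum_i \alpha_i y_i = 0$, $\alpha_i \ge 0$
After solving the dual for $\boldsymbol{\alpha}$ (a quadratic programming problem), find $\mathbf{w}$ by (A).
For non-support vectors, $y_i(\mathbf{w}^T\mathbf{x}_i + b) > 1$. From (C), $\alpha_i = 0$.
Support vectors are those whose $\alpha_i > 0$.
Determine $b$ using any support vector. Consider any support vector $(\mathbf{x}_s, y_s)$:
  $y_s(\mathbf{w}^T\mathbf{x}_s + b) = 1 \Rightarrow b = y_s - \mathbf{w}^T\mathbf{x}_s$ (since $y_s^2 = 1$)
For numerical stability, average over all support vectors:
  $b = \frac{1}{N_s} \sum_{s} (y_s - \mathbf{w}^T\mathbf{x}_s)$, $N_s$: # support vectors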
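The recovery of $\mathbf{w}$ and $b$ from the multipliers can be sketched in a few lines. The toy data and the values of `alpha` are assumptions: they were derived by hand for this small example and are not taken from the slides. The code also checks conditions (B) and (C) and compares the dual objective with $\frac{1}{2}\|\mathbf{w}\|^2$.

```python
# Toy data (hypothetical) and hand-derived multipliers for it.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [+1, +1, -1, -1]
alpha = [0.5, 0.0, 0.5, 0.0]  # assumed dual solution for this toy set

def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

# (A): w = sum_i alpha_i y_i x_i
dim = len(X[0])
w = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(dim)]

# (B): sum_i alpha_i y_i should be zero.
balance = sum(a * yi for a, yi in zip(alpha, y))

# Support vectors are the points with alpha_i > 0.
sv = [i for i, a in enumerate(alpha) if a > 0.0]

# b averaged over all support vectors: b = mean(y_s - w^T x_s).
b = sum(y[s] - dot(w, X[s]) for s in sv) / len(sv)

# (C): alpha_i [y_i (w^T x_i + b) - 1] should be zero for every i.
slackness = [a * (yi * (dot(w, xi) + b) - 1.0) for a, yi, xi in zip(alpha, y, X)]

# Dual objective: sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j x_i^T x_j.
n = len(X)
dual = sum(alpha) - 0.5 * sum(
    alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
    for i in range(n) for j in range(n)
)
primal = 0.5 * dot(w, w)

print(w, b, sv, dual, primal)
```

For this example the dual and primal objectives coincide, which is what the saddle-point argument predicts at the optimum.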
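Overlapping patterns: the patterns that violate $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1$
Define the constraint as $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$ (soft margin)
  $\xi_i \ge 0$: slack variables
Two ways of violation:
  i) $0 < \xi_i < 1$: correctly classified, but inside the margin
  ii) $\xi_i \ge 1$: misclassified
Problem: Find a separating hyperplane for which
  (i) $\|\mathbf{w}\|$ is minimal
  (ii) $y_i(\mathbf{w}^T\mathbf{x}_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$
  (iii) $\sum_i \xi_i$ is minimal (soft error)
Lagrange objective function in the primal space:
  $L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left[ y_i(\mathbf{w}^T\mathbf{x}_i + b) - 1 + \xi_i \right] - \sum_i \mu_i \xi_i$
  $C$: penalty factor; $\alpha_i \ge 0$, $\mu_i \ge 0$: Lagrange multipliers
Through KKT conditions, $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, $\sum_i \alpha_i y_i = 0$, $C - \alpha_i - \mu_i = 0$
Dual space: the space of $\boldsymbol{\alpha}$
  maximize $L_D(\boldsymbol{\alpha}) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j$
  subject to $\sum_i \alpha_i y_i = 0$, $0 \le \alpha_i \le C$
Different from the separable case in that $\alpha_i$ is bounded above by $C$.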
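To make the slack variables concrete, the sketch below computes $\xi_i = \max(0, 1 - y_i(\mathbf{w}^T\mathbf{x}_i + b))$ for a fixed hyperplane, labels each point by its kind of violation, and evaluates the soft-margin objective $\frac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \xi_i$. The data, the hyperplane and the value of `C` are hypothetical; nothing is optimized here.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slack(w, b, x, y):
    """xi = max(0, 1 - y (w^T x + b)); zero when the margin constraint holds."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def violation(xi):
    """Classify a point by its slack value."""
    if xi == 0.0:
        return "none"
    return "inside margin" if xi < 1.0 else "misclassified"

def soft_margin_objective(w, b, X, y, C):
    """Primal objective 1/2 ||w||^2 + C * sum_i xi_i for a given (w, b)."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, xi, yi) for xi, yi in zip(X, y))

# Hypothetical hyperplane, penalty factor and overlapping data.
w, b, C = (1.0, 0.0), -1.0, 10.0
X = [(3.0, 1.0), (1.5, 0.0), (0.5, 0.0), (0.0, 0.0)]
y = [+1, +1, +1, -1]

slacks = [slack(w, b, xi, yi) for xi, yi in zip(X, y)]
kinds = [violation(s) for s in slacks]
objective = soft_margin_objective(w, b, X, y, C)

print(slacks, kinds, objective)
```

A larger `C` makes each unit of slack more expensive, so the optimizer trades margin width for fewer violations.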
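(B) Kernel Machines
13.5 Kernel Trick
Cover's theorem: Make nonlinearly separable data linearly separable by mapping them from a low- to a high-dimensional space, e.g., 2D → 3D
$\mathbf{x}$: a vector in the original N-D space
$\{\varphi_j(\mathbf{x})\}_{j=1}^{\infty}$: a set of functions that transform $\mathbf{x}$ to a space of infinite dimensionality
Let $\boldsymbol{\varphi}(\mathbf{x}) = [\varphi_1(\mathbf{x}), \varphi_2(\mathbf{x}), \dots]^T$
The OSH in the new space: $\mathbf{w}^T \boldsymbol{\varphi}(\mathbf{x}) + b = 0$ --- (1)
where $\mathbf{w} = \sum_i \alpha_i y_i \boldsymbol{\varphi}(\mathbf{x}_i)$ --- (2)
Substitute (2) into (1):
  $\sum_i \alpha_i y_i \boldsymbol{\varphi}(\mathbf{x}_i)^T \boldsymbol{\varphi}(\mathbf{x}) + b = 0$
$K(\mathbf{x}_i, \mathbf{x}) = \boldsymbol{\varphi}(\mathbf{x}_i)^T \boldsymbol{\varphi}(\mathbf{x})$: kernel function
The OSH becomes $\sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b = 0$, so $\boldsymbol{\varphi}$ never has to be computed explicitly.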
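The kernel form of the decision function, $f(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$, can be sketched as below. The function names, the toy data and the multipliers are assumptions for illustration; the linear kernel is used so that the result can be checked against $\mathbf{w}^T\mathbf{x} + b$ with $\mathbf{w} = (1, 0)$, $b = -1$.

```python
def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel_decision(x, X, y, alpha, b, kernel):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; only alpha_i > 0 contribute."""
    return sum(
        a * yi * kernel(xi, x)
        for a, yi, xi in zip(alpha, y, X)
        if a > 0.0
    ) + b

# Hypothetical training set with hand-derived multipliers and bias.
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 0.0), (-1.0, 1.0)]
y = [+1, +1, -1, -1]
alpha = [0.5, 0.0, 0.5, 0.0]
b = -1.0

scores = [kernel_decision(x, X, y, alpha, b, linear_kernel) for x in X]
labels = [1 if s >= 0.0 else -1 for s in scores]

print(scores, labels)
```

Swapping `linear_kernel` for another kernel changes the decision surface without touching `kernel_decision`, which is the practical point of the kernel trick.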
Mercer conditions: requirements of a kernel function
A kernel function can be considered as a measure of similarity between data points.
1. Symmetric: $K(\mathbf{x}, \mathbf{y}) = K(\mathbf{y}, \mathbf{x})$
2.
3.
4.
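13.6 Examples of Kernel Functions
i) Linear kernel: $K(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T\mathbf{y}$
ii) Polynomial kernel with degree d: $K(\mathbf{x}, \mathbf{y}) = (\mathbf{x}^T\mathbf{y} + 1)^d$
  e.g., d = 2, $\mathbf{x} = [x_1, x_2]^T$, $\mathbf{y} = [y_1, y_2]^T$:
  $K(\mathbf{x}, \mathbf{y}) = (x_1 y_1 + x_2 y_2 + 1)^2 = 1 + 2x_1y_1 + 2x_2y_2 + 2x_1x_2y_1y_2 + x_1^2y_1^2 + x_2^2y_2^2 = \boldsymbol{\varphi}(\mathbf{x})^T\boldsymbol{\varphi}(\mathbf{y})$
  with $\boldsymbol{\varphi}(\mathbf{x}) = [1, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1x_2, x_1^2, x_2^2]^T$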
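For 2-D inputs, a degree-2 polynomial kernel of the common form $(\mathbf{x}^T\mathbf{y} + 1)^2$ (assumed here) equals an inner product in a six-dimensional feature space. The sketch below checks this numerically; the feature map `phi` and the test points are illustrative choices.

```python
import math

def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def poly_kernel(x, y, d=2):
    """Polynomial kernel (x^T y + 1)^d."""
    return (dot(x, y) + 1.0) ** d

def phi(x):
    """Explicit feature map for d = 2 and 2-D input."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, r2 * x1 * x2, x1 * x1, x2 * x2]

x = (1.0, 2.0)
y = (3.0, -1.0)

k_implicit = poly_kernel(x, y)        # computed in the original 2-D space
k_explicit = dot(phi(x), phi(y))      # computed in the 6-D feature space

print(k_implicit, k_explicit)
```

The kernel needs one 2-D inner product, while the explicit route builds two 6-D vectors first; the gap grows quickly with the degree and the input dimension.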
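iii) Perceptron kernel:
iv) Sigmoidal kernel: $K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x}^T\mathbf{y} + \theta)$
v) Radial basis function kernel: $K(\mathbf{x}, \mathbf{y}) = \exp\left(-\|\mathbf{x} - \mathbf{y}\|^2 / (2\sigma^2)\right)$
13.8 Multiple Kernel Learning
A new kernel can be constructed by combining simpler kernels, e.g.,
  $K(\mathbf{x}, \mathbf{y}) = c\,K_1(\mathbf{x}, \mathbf{y})$ ($c > 0$), $K_1(\mathbf{x}, \mathbf{y}) + K_2(\mathbf{x}, \mathbf{y})$, $K_1(\mathbf{x}, \mathbf{y}) \cdot K_2(\mathbf{x}, \mathbf{y})$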
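Combining kernels can be sketched as follows, assuming a linear kernel and an RBF kernel as the two building blocks. The code builds a scaled, a summed and a product kernel, then checks that each Gram matrix is symmetric and that the quadratic form $\mathbf{v}^T\mathbf{K}\mathbf{v}$ stays non-negative for random $\mathbf{v}$. The sample points, `sigma` and `c` are hypothetical, and the random test is a sanity check rather than a proof of positive semidefiniteness.

```python
import math
import random

def linear(u, v):
    """Linear kernel u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf(u, v, sigma=1.0):
    """RBF kernel exp(-||u - v||^2 / (2 sigma^2)); sigma is a free width."""
    sq = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def combine(k1, k2, c=1.0):
    """Return the kernels c*k1, k1 + k2 and k1 * k2."""
    scaled = lambda u, v: c * k1(u, v)
    summed = lambda u, v: k1(u, v) + k2(u, v)
    product = lambda u, v: k1(u, v) * k2(u, v)
    return scaled, summed, product

def gram(kernel, X):
    """Gram matrix K[i][j] = kernel(x_i, x_j)."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def quad_form(K, v):
    """v^T K v."""
    n = len(v)
    return sum(v[i] * K[i][j] * v[j] for i in range(n) for j in range(n))

X = [(0.0, 0.0), (1.0, 0.5), (-1.0, 2.0), (2.0, -1.0)]  # hypothetical points
rng = random.Random(0)  # fixed seed so the check is repeatable

results = {}
for name, k in zip(("scaled", "summed", "product"), combine(linear, rbf, c=3.0)):
    K = gram(k, X)
    n = len(X)
    symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))
    smallest = min(
        quad_form(K, [rng.uniform(-1.0, 1.0) for _ in X]) for _ in range(200)
    )
    results[name] = (symmetric, smallest)

print(results)
```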
13.9 Multiclass Kernel Machines (K > 2 classes)
1. Train K 2-class classifiers $g_i(\mathbf{x})$, $i = 1, \dots, K$, each one distinguishing one class from all other classes combined. During testing, choose the class with the maximum $g_i(\mathbf{x})$.
2. Train K(K-1)/2 pairwise classifiers
3. Train a single multiclass classifier
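The one-vs-all decision rule and the number of pairwise classifiers can be illustrated with a short sketch. The three linear discriminants below are hypothetical stand-ins for trained 2-class kernel machines; only the test-time rule (pick the largest discriminant value) is shown.

```python
def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def one_vs_all_predict(x, discriminants):
    """Pick the class whose discriminant g_i(x) = w_i^T x + b_i is largest."""
    scores = [dot(w, x) + b for w, b in discriminants]
    return max(range(len(scores)), key=scores.__getitem__)

def n_pairwise(K):
    """Number of classifiers needed for the pairwise scheme, K(K-1)/2."""
    return K * (K - 1) // 2

# Hypothetical (w_i, b_i) pairs for K = 3 classes.
discriminants = [
    ((1.0, 0.0), 0.0),    # class 0
    ((0.0, 1.0), 0.0),    # class 1
    ((-1.0, -1.0), 0.0),  # class 2
]

predictions = [
    one_vs_all_predict(x, discriminants)
    for x in [(2.0, 1.0), (0.5, 3.0), (-2.0, -1.0)]
]

print(predictions, n_pairwise(3), n_pairwise(10))
```

One-vs-all trains K classifiers on all the data, while the pairwise scheme trains more classifiers, each on the data of only two classes.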
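13.10 Kernel Machines for Regression
• Consider a linear model $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b$
Define constraints:
  $y_i - (\mathbf{w}^T\mathbf{x}_i + b) \le \epsilon + \xi_i^+$
  $(\mathbf{w}^T\mathbf{x}_i + b) - y_i \le \epsilon + \xi_i^-$
  $\xi_i^+, \xi_i^- \ge 0$: slack variables
Problem: minimize $\frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i (\xi_i^+ + \xi_i^-)$ subject to constraints
The Lagrangian:
  $L_p = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_i (\xi_i^+ + \xi_i^-) - \sum_i \alpha_i^+ \left[ \epsilon + \xi_i^+ - y_i + (\mathbf{w}^T\mathbf{x}_i + b) \right] - \sum_i \alpha_i^- \left[ \epsilon + \xi_i^- + y_i - (\mathbf{w}^T\mathbf{x}_i + b) \right] - \sum_i (\mu_i^+ \xi_i^+ + \mu_i^- \xi_i^-)$
Through KKT conditions:
  $\partial L_p / \partial \mathbf{w} = 0 \Rightarrow \mathbf{w} = \sum_i (\alpha_i^+ - \alpha_i^-) \mathbf{x}_i$
  $\partial L_p / \partial b = 0 \Rightarrow \sum_i (\alpha_i^+ - \alpha_i^-) = 0$
  $\partial L_p / \partial \xi_i^+ = 0 \Rightarrow C - \alpha_i^+ - \mu_i^+ = 0$
  $\partial L_p / \partial \xi_i^- = 0 \Rightarrow C - \alpha_i^- - \mu_i^- = 0$
The dual:
  maximize $L_D = -\frac{1}{2} \sum_i \sum_j (\alpha_i^+ - \alpha_i^-)(\alpha_j^+ - \alpha_j^-) \mathbf{x}_i^T\mathbf{x}_j - \epsilon \sum_i (\alpha_i^+ + \alpha_i^-) + \sum_i y_i (\alpha_i^+ - \alpha_i^-)$
  subject to $0 \le \alpha_i^+ \le C$, $0 \le \alpha_i^- \le C$, $\sum_i (\alpha_i^+ - \alpha_i^-) = 0$
(a) The examples that fall in the $\epsilon$-tube have $\alpha_i^+ = \alpha_i^- = 0$
(b) The support vectors satisfy $\alpha_i^+ > 0$ or $\alpha_i^- > 0$
The fitted line: $f(\mathbf{x}) = \mathbf{w}^T\mathbf{x} + b = \sum_i (\alpha_i^+ - \alpha_i^-) \mathbf{x}_i^T\mathbf{x} + b$
With a kernel function: $f(\mathbf{x}) = \sum_i (\alpha_i^+ - \alpha_i^-) K(\mathbf{x}_i, \mathbf{x}) + b$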
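The two ingredients of kernel regression, the $\epsilon$-insensitive error and the prediction $f(\mathbf{x}) = \sum_i (\alpha_i^+ - \alpha_i^-) K(\mathbf{x}_i, \mathbf{x}) + b$, can be sketched as below. The training inputs and the coefficients `coef` (standing for $\alpha_i^+ - \alpha_i^-$) are made-up values that satisfy $\sum_i (\alpha_i^+ - \alpha_i^-) = 0$; they are not the solution of an actual dual problem.

```python
def linear_kernel(u, v):
    """K(u, v) = u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def eps_insensitive_loss(y, f, eps):
    """Zero inside the tube |y - f| <= eps, linear outside it."""
    return max(0.0, abs(y - f) - eps)

def svr_predict(x, X, coef, b, kernel):
    """f(x) = sum_i coef_i K(x_i, x) + b, with coef_i = alpha_i^+ - alpha_i^-."""
    return sum(c * kernel(xi, x) for c, xi in zip(coef, X) if c != 0.0) + b

# Hypothetical 1-D training inputs and dual coefficients (they sum to zero).
X = [(0.0,), (1.0,), (2.0,)]
coef = [-0.5, 0.0, 0.5]
b = 0.0

# With the linear kernel, the same model written in the primal form.
w = [sum(c * xi[0] for c, xi in zip(coef, X))]

prediction = svr_predict((3.0,), X, coef, b, linear_kernel)
inside = eps_insensitive_loss(1.0, 1.05, eps=0.1)
outside = eps_insensitive_loss(1.0, 1.5, eps=0.1)

print(w, prediction, inside, outside)
```

The point with a zero coefficient does not enter the prediction, which mirrors case (a): examples inside the tube have no influence on the fitted function.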