Kernel Methods and SVM's
Predictive Modeling
Goal: learn a mapping y = f(x; θ).
Need: 1. A model structure 2. A score function 3. An optimization strategy
Categorical y ∈ {c1, …, cm}: classification. Real-valued y: regression.
Note: usually assume {c1, …, cm} are mutually exclusive and exhaustive.
Simple Two-Class Perceptron
Initialize weight vector
Repeat one or more times (indexed by k):
  For each training data point xi:
    If … endIf
"gradient descent" (a sketch of the standard mistake-driven update follows)
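A minimal sketch of the primal update, assuming the usual conventions (labels yi ∈ {−1, +1}, learning rate η, update w ← w + η yi xi on a mistake); the stopping rule and names here are illustrative, not taken from the slide:

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=10):
    """Primal two-class perceptron: X is (n, d), y holds +/-1 labels."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):                # repeat one or more times (indexed by k)
        for xi, yi in zip(X, y):           # for each training data point xi
            if yi * (w @ xi + b) <= 0:     # mistake: xi is on the wrong side of the boundary
                w += eta * yi * xi         # nudge the boundary toward the misclassified point
                b += eta * yi
    return w, b
```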
Perceptron Dual Form
Notice that w ends up as a linear combination of the yj xj:
  w = Σj αj yj xj
where each αj is non-negative (+ve) and is bigger for "harder" examples (points that trigger more updates). This leads to a dual form of the learning algorithm.
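Substituting this expansion into the decision rule shows that only inner products between data points are needed; a standard way to write it (the bias b is an assumption, not spelled out on the slide):

```latex
f(\mathbf{x})
  = \operatorname{sign}\big(\langle \mathbf{w}, \mathbf{x}\rangle + b\big)
  = \operatorname{sign}\Big(\sum_{j} \alpha_j\, y_j\, \langle \mathbf{x}_j, \mathbf{x}\rangle + b\Big).
```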
Perceptron Dual Form
Initialize the coefficient vector α to zero (one αi per training point).
Repeat until no more mistakes:
  For each training data point xi:
    If … endIf
Note: the training data only enter the algorithm via inner products ⟨xi, xj⟩ (the Gram matrix). This is generally true for linear models (e.g. linear regression, ridge regression).
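A minimal sketch of the dual form under the same conventions as above; the data are touched only through the Gram matrix G[i, j] = ⟨xi, xj⟩, which is exactly what the kernel trick later replaces with K(xi, xj):

```python
import numpy as np

def dual_perceptron(X, y, epochs=10):
    """Dual perceptron: data enter only via the Gram matrix G[i, j] = <xi, xj>."""
    n = X.shape[0]
    G = X @ X.T                  # all pairwise inner products
    alpha = np.zeros(n)          # alpha[i] counts mistakes made on point i
    b = 0.0
    for _ in range(epochs):      # fixed pass count stands in for "repeat until no more mistakes"
        for i in range(n):
            # implicit w = sum_j alpha[j]*y[j]*x_j, so <w, xi> = sum_j alpha[j]*y[j]*G[j, i]
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1.0
                b += y[i]
    return alpha, b
```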
Learning in Feature Space
We have already seen the idea of changing the representation of the predictors:
  x = (x1, …, xd) → φ(x) = (φ1(x), …, φN(x))
The space F that φ maps into is called the feature space.
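As a concrete illustration (an example, not from the slide), a quadratic feature map on two predictors whose inner product reduces to a simple function of the original inputs:

```latex
\phi(x_1, x_2) = \big(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\big),
\qquad
\langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle = (\mathbf{x}^\top \mathbf{z})^2 .
```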
Linear Feature Space Models
Now consider models that are linear in the feature space, f(x) = ⟨w, φ(x)⟩ + b; equivalently, in dual form, f(x) = Σj αj yj ⟨φ(xj), φ(x)⟩ + b.
A kernel is a function K such that for all x, z ∈ X
  K(x, z) = ⟨φ(x), φ(z)⟩
where φ is a mapping from X to an inner product feature space F. We just need to know K, not φ!
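A small numeric check of this identity, using the quadratic feature map sketched above (illustrative names and values):

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map for 2-d inputs."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = <x, z>^2."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Both routes give the same value: the kernel computes the feature-space
# inner product without ever forming phi explicitly.
assert np.isclose(phi(x) @ phi(z), poly_kernel(x, z))
```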
Making Kernels
What properties must K satisfy to be a kernel?
1. Symmetry: K(x, z) = K(z, x)
2. Cauchy–Schwarz: K(x, z)² ≤ K(x, x) K(z, z)
plus other conditions (positive semi-definiteness; see Mercer's Theorem below).
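A quick empirical check (illustrative only): a valid kernel must produce Gram matrices that are symmetric and positive semi-definite for any set of inputs. Here with the Gaussian (RBF) kernel:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
G = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

assert np.allclose(G, G.T)                         # symmetry
assert np.linalg.eigvalsh(G).min() >= -1e-10       # positive semi-definite (up to round-off)
```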
Mercer's Theorem
Mercer's Theorem gives necessary and sufficient conditions (essentially, K positive semi-definite) for a continuous symmetric function K to admit the representation
  K(x, z) = Σi γi φi(x) φi(z), with γi ≥ 0
Such K are called "Mercer kernels". This kernel defines a set of functions HK, elements of which have an expansion
  f(x) = Σi ci φi(x)
So, some kernels correspond to infinite numbers of transformed predictor variables.
Reproducing Kernel Hilbert Space
Define an inner product in this function space: for f = Σi ci φi and g = Σi di φi, let
  ⟨f, g⟩_HK = Σi ci di / γi
Note then that ⟨K(·, x), f⟩_HK = f(x). This is the reproducing property of HK.
Also note, for a Mercer kernel, ⟨K(·, x), K(·, z)⟩_HK = K(x, z) and ||f||²_HK = Σi ci²/γi < ∞.
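One line of verification of the reproducing property, using the expansions above (K(·, x) has coefficients γi φi(x)):

```latex
\langle K(\cdot, x), f \rangle_{\mathcal{H}_K}
  = \sum_i \frac{\gamma_i \varphi_i(x)\, c_i}{\gamma_i}
  = \sum_i c_i\, \varphi_i(x)
  = f(x).
```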
Regularization and RKHS
A general class of regularization problems has the form
  min over f ∈ H of  Σi L(yi, f(xi)) + λ J(f)
where L is some loss function (e.g. squared loss) and J(f) penalizes complex f.
Suppose f lives in an RKHS HK and let J(f) = ||f||²_HK. Then we only need to solve an "easy" finite-dimensional problem (see below).
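The reduction to a finite-dimensional problem follows the usual representer-theorem argument (assumed here; the slide does not spell it out): the minimizer can be written f(x) = Σj αj K(x, xj), so the criterion becomes a problem in the n coefficients α:

```latex
\min_{\boldsymbol{\alpha}}\;
  \sum_{i=1}^{n} L\!\big(y_i,\ (\mathbf{K}\boldsymbol{\alpha})_i\big)
  \;+\; \lambda\, \boldsymbol{\alpha}^{\top} \mathbf{K}\, \boldsymbol{\alpha},
\qquad \mathbf{K}_{ij} = K(x_i, x_j).
```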
RKHS Examples
For regression with squared error loss, the criterion becomes (y − Kα)ᵀ(y − Kα) + λ αᵀKα, so that
  α̂ = (K + λI)⁻¹ y
This generalizes smoothing splines… Choosing K(x, z) = ||x − z||² log||x − z|| leads to the thin-plate spline models.
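A minimal sketch of this squared-error case (kernel ridge regression); the RBF kernel, λ, and the toy data are illustrative assumptions, not from the slide:

```python
import numpy as np

def fit_kernel_ridge(X, y, kernel, lam=0.1):
    """Solve alpha = (K + lam * I)^{-1} y for the squared-error RKHS problem."""
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(Xtrain, alpha, kernel, x_new):
    """f(x) = sum_i alpha_i K(x, x_i)."""
    return sum(a * kernel(x_new, xi) for a, xi in zip(alpha, Xtrain))

rbf = lambda x, z, gamma=0.5: np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=30)
alpha = fit_kernel_ridge(X, y, rbf)
print(predict(X, alpha, rbf, np.array([0.5])))   # fitted value at a new point
```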
Support Vector Machine
A two-class classifier of the form f(x) = sign( Σi αi yi K(x, xi) + b ), with parameters chosen to minimize
  Σi [1 − yi f(xi)]₊ + (λ/2) ||f||²_HK
Many of the fitted αi's are usually zero; the x's corresponding to the non-zero αi's are the support vectors.
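A minimal sketch of evaluating a fitted SVM and reading off its support vectors; the fitted αi and b are assumed to come from some solver (the optimization itself is not shown here):

```python
import numpy as np

def svm_decision(x_new, X, y, alpha, b, kernel):
    """f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b); only non-zero alphas contribute."""
    score = sum(a * yi * kernel(x_new, xi)
                for a, yi, xi in zip(alpha, y, X) if a > 0)
    return np.sign(score + b)

def support_vectors(X, alpha, tol=1e-8):
    """Training points with non-zero alpha_i are the support vectors."""
    return X[np.abs(alpha) > tol]
```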