Kernel Methods and SVM's
Predictive Modeling
Goal: learn a mapping y = f(x; θ).
Need: 1. A model structure 2. A score function 3. An optimization strategy
Categorical y ∈ {c1, …, cm}: classification. Real-valued y: regression.
Note: usually assume {c1, …, cm} are mutually exclusive and exhaustive.
Simple Two-Class Perceptron
Initialize weight vector
Repeat one or more times (indexed by k):
  For each training data point xi:
    If … endIf
"gradient descent" (a sketch of the standard mistake-driven update follows)
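A minimal sketch of the primal update, assuming the usual conventions (labels yi ∈ {−1, +1}, learning rate η, update w ← w + η yi xi on a mistake); the stopping rule and names here are illustrative, not taken from the slide:

```python
import numpy as np

def perceptron(X, y, eta=1.0, epochs=10):
    """Primal two-class perceptron: X is (n, d), y holds +/-1 labels."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):                # repeat one or more times (indexed by k)
        for xi, yi in zip(X, y):           # for each training data point xi
            if yi * (w @ xi + b) <= 0:     # mistake: xi is on the wrong side of the boundary
                w += eta * yi * xi         # nudge the boundary toward the misclassified point
                b += eta * yi
    return w, b
```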
Perceptron Dual Form
Notice that w ends up as a linear combination of the yj xj:
  w = Σj αj yj xj
where each αj is non-negative (+ve) and is bigger for "harder" examples (points that trigger more updates). This leads to a dual form of the learning algorithm.
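Substituting this expansion into the decision rule shows that only inner products between data points are needed; a standard way to write it (the bias b is an assumption, not spelled out on the slide):

```latex
f(\mathbf{x})
  = \operatorname{sign}\big(\langle \mathbf{w}, \mathbf{x}\rangle + b\big)
  = \operatorname{sign}\Big(\sum_{j} \alpha_j\, y_j\, \langle \mathbf{x}_j, \mathbf{x}\rangle + b\Big).
```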
Perceptron Dual Form
Initialize the coefficient vector α to zero (one αi per training point).
Repeat until no more mistakes:
  For each training data point xi:
    If … endIf
Note: the training data only enter the algorithm via inner products ⟨xi, xj⟩ (the Gram matrix). This is generally true for linear models (e.g. linear regression, ridge regression).
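A minimal sketch of the dual form under the same conventions as above; the data are touched only through the Gram matrix G[i, j] = ⟨xi, xj⟩, which is exactly what the kernel trick later replaces with K(xi, xj):

```python
import numpy as np

def dual_perceptron(X, y, epochs=10):
    """Dual perceptron: data enter only via the Gram matrix G[i, j] = <xi, xj>."""
    n = X.shape[0]
    G = X @ X.T                  # all pairwise inner products
    alpha = np.zeros(n)          # alpha[i] counts mistakes made on point i
    b = 0.0
    for _ in range(epochs):      # fixed pass count stands in for "repeat until no more mistakes"
        for i in range(n):
            # implicit w = sum_j alpha[j]*y[j]*x_j, so <w, xi> = sum_j alpha[j]*y[j]*G[j, i]
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1.0
                b += y[i]
    return alpha, b
```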
Learning in Feature Space
We have already seen the idea of changing the representation of the predictors:
  x = (x1, …, xd) → φ(x) = (φ1(x), …, φN(x))
The space F that φ maps into is called the feature space.
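As a concrete illustration (an example, not from the slide), a quadratic feature map on two predictors whose inner product reduces to a simple function of the original inputs:

```latex
\phi(x_1, x_2) = \big(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\big),
\qquad
\langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle = (\mathbf{x}^\top \mathbf{z})^2 .
```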
Linear Feature Space Models
Now consider models that are linear in the feature space, f(x) = ⟨w, φ(x)⟩ + b; equivalently, in dual form, f(x) = Σj αj yj ⟨φ(xj), φ(x)⟩ + b.
A kernel is a function K such that for all x, z ∈ X
  K(x, z) = ⟨φ(x), φ(z)⟩
where φ is a mapping from X to an inner product feature space F. We just need to know K, not φ!
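A small numeric check of this identity, using the quadratic feature map sketched above (illustrative names and values):

```python
import numpy as np

def phi(v):
    """Explicit quadratic feature map for 2-d inputs."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = <x, z>^2."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Both routes give the same value: the kernel computes the feature-space
# inner product without ever forming phi explicitly.
assert np.isclose(phi(x) @ phi(z), poly_kernel(x, z))
```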
Making Kernels
What properties must K satisfy to be a kernel?
1. Symmetry: K(x, z) = K(z, x)
2. Cauchy–Schwarz: K(x, z)² ≤ K(x, x) K(z, z)
plus other conditions (positive semi-definiteness; see Mercer's Theorem below).
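A quick empirical check (illustrative only): a valid kernel must produce Gram matrices that are symmetric and positive semi-definite for any set of inputs. Here with the Gaussian (RBF) kernel:

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
G = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

assert np.allclose(G, G.T)                         # symmetry
assert np.linalg.eigvalsh(G).min() >= -1e-10       # positive semi-definite (up to round-off)
```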
Mercer's Theorem
Mercer's Theorem gives necessary and sufficient conditions (essentially, K positive semi-definite) for a continuous symmetric function K to admit the representation
  K(x, z) = Σi γi φi(x) φi(z), with γi ≥ 0
Such K are called "Mercer kernels". This kernel defines a set of functions HK, elements of which have an expansion
  f(x) = Σi ci φi(x)
So, some kernels correspond to infinite numbers of transformed predictor variables.
Reproducing Kernel Hilbert Space
Define an inner product in this function space: for f = Σi ci φi and g = Σi di φi, let
  ⟨f, g⟩_HK = Σi ci di / γi
Note then that ⟨K(·, x), f⟩_HK = f(x). This is the reproducing property of HK.
Also note, for a Mercer kernel, ⟨K(·, x), K(·, z)⟩_HK = K(x, z) and ||f||²_HK = Σi ci²/γi < ∞.
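One line of verification of the reproducing property, using the expansions above (K(·, x) has coefficients γi φi(x)):

```latex
\langle K(\cdot, x), f \rangle_{\mathcal{H}_K}
  = \sum_i \frac{\gamma_i \varphi_i(x)\, c_i}{\gamma_i}
  = \sum_i c_i\, \varphi_i(x)
  = f(x).
```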
Regularization and RKHS
A general class of regularization problems has the form
  min over f ∈ H of  Σi L(yi, f(xi)) + λ J(f)
where L is some loss function (e.g. squared loss) and J(f) penalizes complex f.
Suppose f lives in an RKHS HK and let J(f) = ||f||²_HK. Then we only need to solve an "easy" finite-dimensional problem (see below).
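The reduction to a finite-dimensional problem follows the usual representer-theorem argument (assumed here; the slide does not spell it out): the minimizer can be written f(x) = Σj αj K(x, xj), so the criterion becomes a problem in the n coefficients α:

```latex
\min_{\boldsymbol{\alpha}}\;
  \sum_{i=1}^{n} L\!\big(y_i,\ (\mathbf{K}\boldsymbol{\alpha})_i\big)
  \;+\; \lambda\, \boldsymbol{\alpha}^{\top} \mathbf{K}\, \boldsymbol{\alpha},
\qquad \mathbf{K}_{ij} = K(x_i, x_j).
```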
RKHS Examples
For regression with squared error loss, the criterion becomes (y − Kα)ᵀ(y − Kα) + λ αᵀKα, so that
  α̂ = (K + λI)⁻¹ y
This generalizes smoothing splines… Choosing K(x, z) = ||x − z||² log||x − z|| leads to the thin-plate spline models.
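A minimal sketch of this squared-error case (kernel ridge regression); the RBF kernel, λ, and the toy data are illustrative assumptions, not from the slide:

```python
import numpy as np

def fit_kernel_ridge(X, y, kernel, lam=0.1):
    """Solve alpha = (K + lam * I)^{-1} y for the squared-error RKHS problem."""
    K = np.array([[kernel(xi, xj) for xj in X] for xi in X])
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(Xtrain, alpha, kernel, x_new):
    """f(x) = sum_i alpha_i K(x, x_i)."""
    return sum(a * kernel(x_new, xi) for a, xi in zip(alpha, Xtrain))

rbf = lambda x, z, gamma=0.5: np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(30, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.normal(size=30)
alpha = fit_kernel_ridge(X, y, rbf)
print(predict(X, alpha, rbf, np.array([0.5])))   # fitted value at a new point
```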
Support Vector Machine
A two-class classifier of the form f(x) = sign( Σi αi yi K(x, xi) + b ), with parameters chosen to minimize
  Σi [1 − yi f(xi)]₊ + (λ/2) ||f||²_HK
Many of the fitted αi's are usually zero; the x's corresponding to the non-zero αi's are the support vectors.
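A minimal sketch of evaluating a fitted SVM and reading off its support vectors; the fitted αi and b are assumed to come from some solver (the optimization itself is not shown here):

```python
import numpy as np

def svm_decision(x_new, X, y, alpha, b, kernel):
    """f(x) = sign(sum_i alpha_i y_i K(x, x_i) + b); only non-zero alphas contribute."""
    score = sum(a * yi * kernel(x_new, xi)
                for a, yi, xi in zip(alpha, y, X) if a > 0)
    return np.sign(score + b)

def support_vectors(X, alpha, tol=1e-8):
    """Training points with non-zero alpha_i are the support vectors."""
    return X[np.abs(alpha) > tol]
```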