
Pattern Recognition and Machine Learning: Kernel Methods


Presentation Transcript


  1. Pattern Recognition and Machine Learning: Kernel Methods

  2. Overview
  • Many linear parametric models can be recast into an equivalent dual representation in which the predictions are based on linear combinations of a kernel function evaluated at the training data points
  • Kernel: k(x, x') = φ(x)ᵀ φ(x')
  • φ(x) is a fixed nonlinear feature space mapping
  • The kernel is symmetric in its arguments, i.e. k(x, x') = k(x', x)

  3. Overview
  • The kernel trick, or kernel substitution, is the general idea that, if we have an algorithm formulated in such a way that the input vector x enters only in the form of scalar products, then we can replace those scalar products with some other choice of kernel
  • Stationary kernels are invariant to translations in input space: k(x, x') = k(x − x')
  • Homogeneous kernels (radial basis functions) depend only on the magnitude of the distance between their arguments: k(x, x') = k(||x − x'||)
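As an illustration (not part of the original slides), the sketch below uses a Gaussian kernel, which is a function of ||x − x'|| only and is therefore both stationary and homogeneous; the bandwidth σ is an assumed example parameter:

```python
import numpy as np

def gaussian_kernel(x, x_prime, sigma=1.0):
    """Gaussian kernel: a function of ||x - x'|| only, hence both
    stationary (translation invariant) and homogeneous."""
    d = np.linalg.norm(x - x_prime)
    return np.exp(-d**2 / (2.0 * sigma**2))

x = np.array([1.0, 2.0])
xp = np.array([2.0, 0.5])
shift = np.array([5.0, -3.0])

# Translation invariance: shifting both arguments leaves the value unchanged.
assert np.isclose(gaussian_kernel(x, xp),
                  gaussian_kernel(x + shift, xp + shift))
```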

  4. Dual Representations
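As a sketch of the standard dual representation of regularized least squares (PRML §6.1), with a linear kernel assumed purely for illustration: minimizing J(w) = (1/2) Σ_n (wᵀφ(x_n) − t_n)² + (λ/2) wᵀw and substituting w = Φᵀa yields a = (K + λ I_N)⁻¹ t, and predictions y(x) = k(x)ᵀ a, where K_nm = k(x_n, x_m) and k(x)_n = k(x, x_n):

```python
import numpy as np

def dual_ridge(X_train, t_train, X_test, lam=0.1):
    """Dual representation of regularized least squares:
    a = (K + lam * I_N)^(-1) t, then y(x) = k(x)^T a, where
    K[n, m] = k(x_n, x_m). A linear kernel is used for illustration."""
    K = X_train @ X_train.T                          # Gram matrix
    a = np.linalg.solve(K + lam * np.eye(len(X_train)), t_train)
    k_test = X_test @ X_train.T                      # k(x)_n = k(x, x_n)
    return k_test @ a

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
t = X @ np.array([1.5, -0.5]) + 0.01 * rng.normal(size=50)
print(dual_ridge(X, t, X[:3]))    # predictions close to t[:3]
```

Because the training data enter only through kernel evaluations, the kernel trick applies directly here: swapping the linear kernel for any other valid kernel changes the model without changing the algorithm.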

  5. Constructing Kernels • Approach 1: Choose a feature space mapping and then use this to find the kernel
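A standard instance of this approach from the book: for two-dimensional inputs, the quadratic kernel k(x, x') = (xᵀx')² corresponds to the explicit feature map φ(x) = (x₁², √2 x₁x₂, x₂²)ᵀ. A minimal numerical check:

```python
import numpy as np

def phi(x):
    """Explicit feature map for 2-D input; its inner products
    reproduce the quadratic kernel (x^T x')^2."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])

x = np.array([1.0, 2.0])
xp = np.array([3.0, -1.0])

# The feature-space scalar product equals the kernel evaluated directly.
assert np.isclose(phi(x) @ phi(xp), (x @ xp) ** 2)
```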

  6. Constructing Kernels • Approach 2: Construct kernel functions directly, such that each corresponds to a scalar product in some (possibly infinite-dimensional) feature space

  7. Constructing Kernels
  • A simpler way to test, without having to construct φ(x) explicitly:
  • Use the necessary and sufficient condition that, for a function k(x, x') to be a valid kernel, the Gram matrix K, whose elements are given by k(x_n, x_m), must be positive semidefinite for all possible choices of the set {x_n}
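Since the condition quantifies over all possible sets {x_n}, in practice it can only be spot-checked on finite samples. A sketch of such a check (an eigenvalue test, one assumed way of testing positive semidefiniteness):

```python
import numpy as np

def gram_is_psd(K, tol=1e-10):
    """Eigenvalue test for positive semidefiniteness of a symmetric
    Gram matrix. Passing on one sampled set {x_n} is necessary but,
    on its own, not a proof of validity for all choices of {x_n}."""
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
print(gram_is_psd(X @ X.T))      # True: linear-kernel Gram matrices are PSD
print(gram_is_psd(-(X @ X.T)))   # False: -x^T x' is not a valid kernel
```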

  8. Constructing Kernels • Another powerful technique is to build new kernels out of simpler kernels, exploiting the fact that valid kernels are closed under operations such as addition, multiplication by a positive constant, and products
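For example (a sketch; this particular combination is chosen arbitrarily for illustration), combining a linear and a Gaussian kernel via these closure properties yields another valid kernel:

```python
import numpy as np

def k_linear(x, xp):
    return x @ xp

def k_rbf(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2.0 * sigma**2))

def k_combined(x, xp):
    # Positive scaling, sums, and products of valid kernels are valid,
    # so this combination is again a valid kernel.
    return 2.0 * k_linear(x, xp) + k_linear(x, xp) * k_rbf(x, xp)

x = np.array([1.0, 0.0])
xp = np.array([0.5, 0.5])
print(k_combined(x, xp))
```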

  9. Radial Basis Functions
  • Historically introduced for the purpose of exact function interpolation, with one basis function centred on each data point: f(x) = Σ_n w_n h(||x − x_n||)
  • The values of the coefficients w_n are found by least squares
  • Since there are as many constraints as coefficients, the result is a function that fits every target value exactly
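A minimal sketch, assuming Gaussian basis functions and one-dimensional inputs:

```python
import numpy as np

# Exact interpolation with one radial basis function per data point:
# f(x) = sum_n w_n * h(|x - x_n|). With N coefficients and N constraints
# f(x_n) = t_n, the square linear system pins down w so that every target
# is fit exactly (provided the interpolation matrix is nonsingular).
X = np.array([0.0, 1.0, 2.0, 3.0])          # 1-D inputs, one centre each
t = np.array([1.0, -1.0, 0.5, 2.0])
sigma = 0.5

H = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2.0 * sigma**2))
w = np.linalg.solve(H, t)                   # solve H w = t
print(np.allclose(H @ w, t))                # True: exact fit at the data
```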

  10. Radial Basis Functions
  • If we imagine noise on the input variable x, described by a variable ξ having a distribution ν(ξ), the sum-of-squares error function becomes E = (1/2) Σ_n ∫ {y(x_n + ξ) − t_n}² ν(ξ) dξ
  • Optimizing with respect to y(x) gives y(x) = Σ_n t_n h(x − x_n), with a basis function h(x − x_n) = ν(x − x_n) / Σ_m ν(x − x_m) centred on every data point
  • The result is known as the Nadaraya-Watson model


  13. Nadaraya-Watson model
  • Can also be derived from kernel density estimation, starting from the estimate p(x, t) = (1/N) Σ_n f(x − x_n, t − t_n)
  • where f(x, t) is the component density function and there is one such component centred on each data point
  • We now find an expression for the regression function y(x), corresponding to the conditional average of the target variable conditioned on the input variable

  14. Nadaraya-Watson model
  • Assuming components f with zero mean in t, the conditional average works out to y(x) = Σ_n k(x, x_n) t_n, with kernel k(x, x_n) = g(x − x_n) / Σ_m g(x − x_m), where g(x) = ∫ f(x, t) dt

  15. Nadaraya-Watson model
  • This model is also known as kernel regression
  • For a localized kernel function, it has the property of giving more weight to the data points that are close to x
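A minimal sketch of kernel regression, assuming a Gaussian component density (the particular choice of g is an illustrative assumption):

```python
import numpy as np

def nadaraya_watson(X_train, t_train, x, sigma=1.0):
    """Kernel regression: y(x) = sum_n k(x, x_n) t_n with weights
    k(x, x_n) = g(x - x_n) / sum_m g(x - x_m) that sum to one. A
    Gaussian g is assumed here; being localized, it gives more
    weight to training points close to x."""
    g = np.exp(-((x - X_train) ** 2) / (2.0 * sigma**2))
    return (g / g.sum()) @ t_train

X = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, -1.0, 0.5, 2.0])
print(nadaraya_watson(X, t, 2.8))  # dominated by the targets near x = 2.8
```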
