An Overview of Kernel-Based Learning Methods

Yan Liu, Nov 18, 2003

Outline
• Introduction
• Theory basis: reproducing kernel Hilbert space (RKHS), Mercer’s theorem, representer theorem, regularization
• Kernel-based learning algorithms
  • Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD)
  • Unsupervised learning: one-class SVM, kernel PCA
• Kernel design
  • Standard kernels
  • Making kernels from kernels
  • Application-oriented kernels: Fisher kernel

Introduction
Example idea: map the problem into a higher-dimensional space.
Let F be a potentially much higher-dimensional feature space.
Let f : X -> F, x -> f(x).
The learning problem now works with samples (f(x_1), y_1), . . . , (f(x_N), y_N) in F × Y.
Key question: can this mapped problem be classified in a “simple” way?
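The mapping idea can be sketched on a toy problem. The example below is only an illustration under stated assumptions: the degree-2 feature map `phi`, the four sample points, and the weights `w`, `b` are all chosen by hand and do not come from the slides. It shows that a circular decision rule in the input space becomes linear in F, and that the inner product in F can be computed from the inputs alone.

```python
import math

def phi(x):
    """Explicit degree-2 feature map for a 2-D input (an illustrative choice)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Toy data: label +1 inside the unit circle, -1 outside.
# No straight line in the input space separates the two classes.
samples = [((0.2, 0.1), 1), ((-0.5, 0.3), 1), ((1.5, 0.2), -1), ((-1.0, -1.2), -1)]

# In F the rule "x1^2 + x2^2 < 1" is linear: sign(w . phi(x) + b).
w, b = (-1.0, 0.0, -1.0), 1.0
predictions = [1 if dot(w, phi(x)) + b > 0 else -1 for x, _ in samples]

# Kernel trick: <phi(x), phi(z)> equals (x . z)^2, so phi never has to be formed.
x, z = samples[0][0], samples[2][0]
lhs = dot(phi(x), phi(z))
rhs = dot(x, z) ** 2
```

Here `lhs` and `rhs` agree up to rounding, which is the property that lets kernel algorithms work in F while only ever evaluating a kernel on the original inputs.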

Exploring Theory: Roadmap

Reproducing Kernel Hilbert Space -1
• Inner product space:
• Hilbert space:
  • Hilbert space is a complete inner product space, obeying the following:

Reproducing Kernel Hilbert Space -2
• Reproducing Kernel Hilbert Space (RKHS)
• Gram matrix
  • Given a kernel k(x, y), define the Gram matrix by K_ij = k(x_i, x_j)
  • We say the kernel is positive definite when the corresponding Gram matrix is positive definite for every finite set of points x_1, . . . , x_N
• Definition of RKHS
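The Gram matrix definition can be made concrete with a small sketch. Everything named here is an assumption for illustration: the Gaussian kernel with `gamma=0.5`, the three sample points, and the use of a Cholesky factorization as the positive-definiteness test. The counterexample `neg_sq_distance` is a symmetric function that is not a kernel.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def neg_sq_distance(x, z):
    """Symmetric, but not a positive definite kernel."""
    return -sum((a - b) ** 2 for a, b in zip(x, z))

def gram_matrix(kernel, points):
    """K[i][j] = k(x_i, x_j) for every pair of points."""
    return [[kernel(xi, xj) for xj in points] for xi in points]

def is_positive_definite(K, tol=1e-12):
    """Try a Cholesky factorization K = L L^T; a non-positive pivot means failure."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
K = gram_matrix(rbf_kernel, points)
```

Calling `is_positive_definite(K)` succeeds for the Gaussian kernel, while `is_positive_definite(gram_matrix(neg_sq_distance, points))` fails at the first pivot because that matrix has a zero diagonal.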

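Reproducing Kernel Hilbert Space -3
• Reproducing properties:
• Comment
  • RKHS is a “bounded” Hilbert space: evaluating a function at a point is a bounded (continuous) linear functional
  • RKHS is a “smoothed” Hilbert space: for common kernels, a small RKHS norm corresponds to a smooth function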

Mercer’s Theorem-1
• Mercer’s Theorem
• For the discrete case, assume A is the Gram matrix. If A is positive definite, then

Mercer’s Theorem-2
• Comment
  • Mercer’s theorem provides a concrete way to construct a basis for an RKHS
  • Mercer’s condition is the only constraint on a kernel: the corresponding Gram matrix must be positive definite for the function to be a kernel

Representer Theorem-1

Representer Theorem-2
• Comment
  • The representer theorem is a powerful result. It shows that although we search for the optimal solution in an infinite-dimensional feature space, adding the regularization term reduces the problem to a finite-dimensional one: the solution is an expansion over the training examples
  • In practice, regularization and the RKHS view go hand in hand: the regularization term is the RKHS norm of the solution.
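One way to see the representer theorem at work is kernel ridge regression, where the regularized solution has the form f(x) = sum_i a_i k(x_i, x) and the coefficients solve (K + lam I) a = y. The sketch below is an assumption-laden illustration, not a method taken from the slides: the squared loss, the Gaussian kernel, the value of `lam`, the toy sine data, and the hand-written linear solver are all choices made here.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    return math.exp(-gamma * (x - z) ** 2)

def solve(A, b):
    """Solve A a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))) / M[i][i]
    return a

def fit_kernel_ridge(xs, ys, lam=1e-3):
    """Coefficients a of f(x) = sum_i a_i k(x_i, x), from (K + lam I) a = y."""
    n = len(xs)
    A = [[rbf_kernel(xs[i], xs[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return solve(A, ys)

def predict(xs, a, x):
    """Evaluate the finite kernel expansion at a new point x."""
    return sum(ai * rbf_kernel(xi, x) for ai, xi in zip(a, xs))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
a = fit_kernel_ridge(xs, ys)
fitted = [predict(xs, a, x) for x in xs]
```

The search space is a function space, yet the fitted model is fully described by one coefficient per training example, which is the finite-dimensional reduction the comment refers to.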

Exploring Theory: Roadmap

Support Vector Machines-1: quick overview

Support Vector Machines-3
• Parameter sparsity
  • Most a_i are zero
• C: regularization constant
• ξ_i: slack variables
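The roles of C and the slack variables can be sketched by evaluating the soft-margin primal objective 0.5 * ||w||^2 + C * sum_i ξ_i for a fixed linear classifier. This is an illustration only: the function name `soft_margin_objective`, the four data points, and the values of `w`, `b`, and `C` are assumptions, and no optimization is performed.

```python
def soft_margin_objective(w, b, data, C):
    """Soft-margin objective 0.5*||w||^2 + C * sum_i xi_i for a linear classifier."""
    slacks = []
    for x, y in data:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # Slack is zero when the margin is at least 1, otherwise the shortfall.
        slacks.append(max(0.0, 1.0 - margin))
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks

data = [((2.0, 0.0), 1), ((0.5, 0.0), 1), ((-2.0, 0.0), -1), ((0.5, 1.0), -1)]
w, b = (1.0, 0.0), 0.0
objective, slacks = soft_margin_objective(w, b, data, C=1.0)
```

At the optimum, the KKT conditions give a_i = 0 for every point whose margin is strictly greater than 1, which is where the sparsity of the a_i comes from; a larger C penalizes slack more heavily.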

Support Vector Machines-4: Optimization techniques
• Chunking
  • Each step solves the problem containing all non-zero a_i plus some of the a_i violating the KKT conditions
• Decomposition methods: SVM_light
  • The size of the subproblem is fixed; one sample is added and one removed in each iteration
• Sequential minimal optimization (SMO)
  • Each iteration solves a quadratic problem of size two

Kernel Fisher Discriminant-1: Overview of LDA
• Fisher’s discriminant (or LDA): find the linear projection with the most discriminative direction
• Maximizing the Rayleigh coefficient J(w) = (w^T S_B w) / (w^T S_W w), where S_W is the within-class variance and S_B is the between-class variance
• Comparison with PCA
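For two classes the Rayleigh coefficient has a closed-form maximizer, w proportional to S_W^{-1} (m_a - m_b), where m_a and m_b are the class means. The sketch below works this out in two dimensions; the data points and helper names are hypothetical, and the scatter matrices are left unnormalized, which does not change the maximizing direction.

```python
def mean(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def scatter(points, m):
    """2x2 scatter matrix: sum over points of (x - m)(x - m)^T."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                S[i][j] += d[i] * d[j]
    return S

def quad(w, S):
    return sum(w[i] * S[i][j] * w[j] for i in range(2) for j in range(2))

def rayleigh(w, S_B, S_W):
    """Rayleigh coefficient J(w) = (w^T S_B w) / (w^T S_W w)."""
    return quad(w, S_B) / quad(w, S_W)

class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 5.0)]
class_b = [(1.0, 0.0), (2.0, 1.0), (3.0, 1.0), (4.0, 3.0)]
m_a, m_b = mean(class_a), mean(class_b)
S_a, S_b = scatter(class_a, m_a), scatter(class_b, m_b)
S_W = [[S_a[i][j] + S_b[i][j] for j in range(2)] for i in range(2)]
diff = [m_a[0] - m_b[0], m_a[1] - m_b[1]]
S_B = [[diff[i] * diff[j] for j in range(2)] for i in range(2)]

# Closed-form maximizer: w = S_W^{-1} (m_a - m_b), using the 2x2 inverse.
det = S_W[0][0] * S_W[1][1] - S_W[0][1] * S_W[1][0]
w = [(S_W[1][1] * diff[0] - S_W[0][1] * diff[1]) / det,
     (-S_W[1][0] * diff[0] + S_W[0][0] * diff[1]) / det]
```

On this data `rayleigh(w, S_B, S_W)` is larger than `rayleigh(diff, S_B, S_W)`: projecting onto the line that joins the class means is not the most discriminative choice once the within-class spread is taken into account.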

Kernel Fisher Discriminant-2
• KFD: solves Fisher’s linear discriminant problem in the feature space F to get a nonlinear discriminant in input space.
• One can express w in terms of the mapped training patterns: w = sum_i a_i f(x_i)
• The optimization problem for the KFD can be written as:

Kernel PCA -1
• The basic idea of PCA: find a set of orthogonal directions that capture most of the variance in the data.
• However, sometimes there are more clusters than N (N is the number of dimensions)
• Kernel PCA maps the data into a higher-dimensional space and performs standard PCA there. Using the kernel trick, all calculations are done through kernel evaluations on the input data, without ever forming the high-dimensional vectors.

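Kernel PCA -2
• Covariance matrix (N mapped training samples, assumed centered): C = (1/N) sum_j f(x_j) f(x_j)^T
• By definition, an eigenvector v of C satisfies: lambda v = C v
• Then we have (v lies in the span of the mapped samples): v = sum_i a_i f(x_i)
• Define the Gram matrix: K_ij = <f(x_i), f(x_j)> = k(x_i, x_j)
• At last we have: N lambda a = K a
• Therefore we simply have to solve an eigenvalue problem on the Gram matrix.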
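The eigenvalue problem on the Gram matrix can be sketched for the first principal component. The assumptions made here are the Gaussian kernel, the six toy points forming two clusters, explicit centering of the Gram matrix in feature space, and power iteration as a deliberately simple stand-in for a full eigensolver.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def centered_gram(points, kernel):
    """Gram matrix of the mapped data after centering it in feature space."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    row = [sum(K[i]) / n for i in range(n)]   # row means (equal to column means)
    total = sum(row) / n                      # overall mean
    return [[K[i][j] - row[i] - row[j] + total for j in range(n)] for i in range(n)]

def top_eigenpair(M, iters=500):
    """Leading eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    n = len(M)
    v = [1.0 + i for i in range(n)]           # arbitrary non-constant start vector
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(M[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v

points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.2), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
Kc = centered_gram(points, rbf_kernel)
lam, alpha = top_eigenpair(Kc)

# Projection of each training point onto the first kernel principal component,
# with the feature-space eigenvector scaled to unit length.
projections = [math.sqrt(lam) * a for a in alpha]
```

The first three projections share one sign and the last three the other, so the leading component separates the two clusters without the feature map ever being written down.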

Standard Kernels

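Making kernels out of kernels
• Theorem: if K1, K2, K3 are kernels, a > 0, and f is a real-valued function, then the following are kernels:
  • K(x, z) = K1(x, z) + K2(x, z)
  • K(x, z) = a K1(x, z)
  • K(x, z) = K1(x, z) * K2(x, z)
  • K(x, z) = f(x) f(z)
  • K(x, z) = K3(Φ(x), Φ(z))
• Kernel selection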
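The closure rules translate directly into higher-order functions. In the sketch below the base kernels (`linear`, `rbf`), the combinator names, the random points, and the particular f and Φ are all hypothetical choices. The check evaluates the quadratic form c^T K c on random coefficient vectors, which is only a necessary condition for positive semi-definiteness, not a proof.

```python
import math
import random

def linear(x, z):
    return sum(a * b for a, b in zip(x, z))

def rbf(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

# One combinator per closure rule; each returns a new kernel function.
def k_sum(k1, k2):
    return lambda x, z: k1(x, z) + k2(x, z)

def k_scale(a, k1):
    return lambda x, z: a * k1(x, z)          # assumes a > 0

def k_product(k1, k2):
    return lambda x, z: k1(x, z) * k2(x, z)

def k_from_function(f):
    return lambda x, z: f(x) * f(z)

def k_compose(k3, phi):
    return lambda x, z: k3(phi(x), phi(z))

def min_quadratic_form(kernel, points, trials):
    """Smallest value of c^T K c over the given coefficient vectors c."""
    n = len(points)
    K = [[kernel(p, q) for q in points] for p in points]
    return min(sum(c[i] * K[i][j] * c[j] for i in range(n) for j in range(n))
               for c in trials)

rng = random.Random(0)
points = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(5)]
trials = [[rng.uniform(-1, 1) for _ in points] for _ in range(50)]

candidates = {
    "sum": k_sum(linear, rbf),
    "scale": k_scale(2.0, rbf),
    "product": k_product(linear, rbf),
    "f(x)f(z)": k_from_function(lambda x: x[0] + 1.0),
    "compose": k_compose(rbf, lambda x: (x[0] ** 2, x[1] ** 2)),
}
minima = {name: min_quadratic_form(k, points, trials) for name, k in candidates.items()}
```

Every entry of `minima` is non-negative up to rounding, consistent with each combined function being a valid kernel.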

Fisher kernel
• Jaakkola and Haussler proposed using a generative model as a kernel in a discriminative (non-probabilistic) kernel classifier.
• Build an HMM for each family
• Compute the Fisher scores for each parameter in the HMM
• Use the scores as features and predict with an SVM with an RBF kernel
• Good performance for protein family classification
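The Fisher score construction can be sketched on a much simpler generative model than an HMM. The example below swaps in a one-dimensional Gaussian N(mu, sigma^2) purely for illustration: the Fisher score is the gradient of the log-likelihood with respect to (mu, sigma), and the Fisher kernel is K(x, z) = U_x^T I^{-1} U_z with I the Fisher information matrix. The parameter values and sample points are assumptions.

```python
def fisher_score(x, mu, sigma):
    """Gradient of log N(x | mu, sigma^2) with respect to (mu, sigma)."""
    d = x - mu
    return (d / sigma ** 2, (d * d - sigma ** 2) / sigma ** 3)

def fisher_kernel(x, z, mu, sigma):
    """K(x, z) = U_x^T I^{-1} U_z for the Gaussian model."""
    ux, uz = fisher_score(x, mu, sigma), fisher_score(z, mu, sigma)
    # Fisher information of N(mu, sigma^2) in (mu, sigma) is
    # diag(1/sigma^2, 2/sigma^2), so its inverse is diag(sigma^2, sigma^2/2).
    inv_info = (sigma ** 2, sigma ** 2 / 2.0)
    return sum(a * w * b for a, w, b in zip(ux, inv_info, uz))

mu, sigma = 0.0, 1.0                  # stands in for a fitted generative model
xs = [-1.5, -0.2, 0.3, 2.0]

# Fisher scores as fixed-length feature vectors, one per example.
features = [fisher_score(x, mu, sigma) for x in xs]

# Gram matrix of the Fisher kernel on the same examples.
K = [[fisher_kernel(x, z, mu, sigma) for z in xs] for x in xs]
```

In the protein setting described above, the `features` vectors would come from the HMM parameters instead and would be passed to an SVM with an RBF kernel; the Gaussian here only shows the mechanics of turning a generative model into features.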
