Sparse Kernel Methods
Steve Gunn
Overview
• Part I: Introduction to Kernel Methods
• Part II: Sparse Kernel Methods
Part I
Introduction to Kernel Methods
Classification
• Consider the two-class problem
Optimal Separating Hyperplane
Separate the data with a hyperplane, such that the data is separated without error and the distance from the hyperplane to the closest vector is maximal.
Solution
The optimal hyperplane minimises
$\frac{1}{2}\|w\|^2$
subject to the constraints
$y_i(\langle w, x_i\rangle + b) \ge 1, \quad i = 1, \dots, l,$
and is obtained by finding the saddle point of the Lagrange functional
$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \alpha_i\left[y_i(\langle w, x_i\rangle + b) - 1\right].$
Finding the OSH
Quadratic Programming Problem: maximise
$W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \langle x_i, x_j\rangle$
subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{l} \alpha_i y_i = 0$.
• Size is dependent upon training set size
• Unique global minimum
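This QP can be handed to any standard solver. As a quick illustration (not from the slides), scikit-learn's SVC with a very large C approximates the hard-margin OSH; the synthetic data and all parameter values are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters (synthetic illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])

# A very large C approximates the hard-margin optimal separating hyperplane.
svm = SVC(kernel="linear", C=1e6).fit(X, y)

print("w =", svm.coef_[0], " b =", svm.intercept_[0])
print("number of support vectors:", len(svm.support_))
```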
Support Vectors
• Information is contained in the support vectors
• The rest of the training data can be discarded
• SVs have non-zero Lagrange multipliers
Non-Separable Case
• Introduce slack variables $\xi_i \ge 0$
• Minimise $\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i$ subject to $y_i(\langle w, x_i\rangle + b) \ge 1 - \xi_i$
• C is chosen a priori and determines the trade-off between margin maximisation and training error
Finding the GSH
Quadratic Programming Problem: the same dual as the separable case, with box constraints $0 \le \alpha_i \le C$.
• Size is dependent upon training set size
• Unique global minimum
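A sketch of the trade-off that C controls, again with scikit-learn on synthetic overlapping classes (all values are illustrative assumptions): smaller C tolerates more margin violations and typically retains more support vectors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Overlapping classes: no hyperplane separates them without error.
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors={len(svm.support_):3d}, "
          f"training accuracy={svm.score(X, y):.2f}")
```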
Non-Linear SVM
• Map the input space to a high-dimensional feature space: $x \mapsto \phi(x)$
• Find the OSH or GSH in the feature space
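The point of the kernel trick is that the feature map never has to be computed explicitly. A small check (illustrative, not from the slides) that the degree-2 polynomial kernel equals an inner product under an explicit feature map:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel (x.y + 1)^2."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

x, y = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(phi(x) @ phi(y))        # inner product in feature space -> 25.0
print((x @ y + 1.0) ** 2)     # same value via the kernel, no explicit map
```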
Kernel Functions
Hilbert-Schmidt theory: $K(x, x')$ is a symmetric function that can be written as an inner product in some feature space, $K(x, x') = \langle \phi(x), \phi(x')\rangle$, provided it satisfies Mercer's conditions:
$\iint K(x, x')\, g(x)\, g(x')\, dx\, dx' \ge 0 \quad \text{for all } g \text{ with } \int g(x)^2\, dx < \infty.$
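Mercer's condition can be probed on a finite sample: the Gram matrix of a valid kernel must be positive semi-definite. A minimal sketch under that assumed setup (not from the slides):

```python
import numpy as np

def gram(kernel, X):
    """Gram matrix K[i, j] = kernel(X[i], X[j])."""
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

rbf = lambda x, y, s=1.0: np.exp(-np.sum((x - y) ** 2) / (2 * s**2))

X = np.random.default_rng(2).normal(size=(30, 3))
eig = np.linalg.eigvalsh(gram(rbf, X))
print("smallest eigenvalue:", eig.min())  # >= 0 (up to rounding) for a Mercer kernel
```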
Acceptable Kernel Functions
• Polynomial: $K(x, x') = (\langle x, x'\rangle + 1)^d$
• Radial basis functions: $K(x, x') = \exp\left(-\|x - x'\|^2 / (2\sigma^2)\right)$
• Multi-layer perceptrons: $K(x, x') = \tanh(\kappa\langle x, x'\rangle - \delta)$ (satisfies Mercer's conditions only for certain $\kappa$, $\delta$)
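These three kernels are easy to write down directly; a short sketch (the parameter values are assumptions):

```python
import numpy as np

def polynomial(x, y, d=3):
    return (x @ y + 1.0) ** d          # Mercer kernel for any integer d >= 1

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))  # Mercer for sigma > 0

def mlp(x, y, kappa=1.0, delta=1.0):
    # tanh "MLP" kernel: a valid Mercer kernel only for some kappa, delta
    return np.tanh(kappa * (x @ y) - delta)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial(x, y), rbf(x, y), mlp(x, y))
```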
Generalisation
[Figure: generalisation error decomposed into estimation error and approximation error, plotted against model size.]
Regression
Approximate the data with a hyperplane, using a loss function, e.g. Vapnik's ε-insensitive loss
$L_\epsilon(y, f(x)) = \max\left(0,\; |y - f(x)| - \epsilon\right),$
and the SRM principle.
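The ε-insensitive loss is straightforward to implement; a minimal sketch (the ε value is an assumption):

```python
import numpy as np

def eps_insensitive(y, f, eps=0.1):
    """Vapnik's epsilon-insensitive loss: zero inside the eps-tube, linear outside."""
    return np.maximum(0.0, np.abs(y - f) - eps)

print(eps_insensitive(np.array([1.0, 1.5, 3.0]), np.array([1.05, 1.5, 2.0]), eps=0.1))
# -> [0.   0.   0.9]
```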
Solution
Introduce slack variables $\xi_i, \xi_i^*$ and minimise
$\frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*)$
subject to the constraints
$y_i - \langle w, x_i\rangle - b \le \epsilon + \xi_i, \quad \langle w, x_i\rangle + b - y_i \le \epsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0.$
Finding the Solution
Quadratic Programming Problem
• Size is dependent upon training set size
• Unique global minimum
where the regression estimate takes the form
$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) K(x_i, x) + b.$
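A quick illustration of support vector regression with scikit-learn's SVR (the RBF kernel, C, ε, and synthetic data are illustrative assumptions); only a subset of the training points end up as support vectors:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 2 * np.pi, (80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

# RBF support vector regression; epsilon sets the tube width, C the regularisation.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
print("support vectors:", len(svr.support_), "of", len(X), "training points")
```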
Part I : Summary • Unique Global Minimum • Addresses Curse of Dimensionality • Complexity dependent upon data set size • Information contained in Support Vectors
Part II
Sparse Kernel Methods
Cyclic Nature of Empirical Modelling
Design → Induce → Interpret → Validate → (and back to Design)
Induction
• SVMs have a strong theoretical foundation
• Good empirical performance
• Solution of the form
$f(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + b$
Interpretation
• Input selection
• Transparency
Additive Representation
• Additive structure: $f(x) = \sum_i f_i(x_i) + \sum_{i<j} f_{ij}(x_i, x_j) + \dots$
• Transparent
• Rejection of redundant inputs
• Unique decomposition
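One way to obtain an additive model with kernel machinery is to use a kernel that is itself a sum of univariate kernels, one per input. A sketch under that assumption, using scikit-learn's precomputed-kernel interface (data, target, and parameters are all illustrative):

```python
import numpy as np
from sklearn.svm import SVR

def additive_rbf_gram(A, B, sigma=1.0):
    """Sum of univariate RBF kernels, one per input: an additive (transparent) model."""
    K = np.zeros((len(A), len(B)))
    for d in range(A.shape[1]):
        diff = A[:, d:d+1] - B[:, d:d+1].T
        K += np.exp(-diff**2 / (2 * sigma**2))
    return K

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, (100, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2       # truly additive target; x3 is redundant

svr = SVR(kernel="precomputed", C=10.0, epsilon=0.05)
svr.fit(additive_rbf_gram(X, X), y)
print("training R^2:", svr.score(additive_rbf_gram(X, X), y))
```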
Sparse Kernel Regression
Previously: a single fixed kernel,
$f(x) = \sum_{i=1}^{l} \alpha_i K(x_i, x) + b.$
Now: a weighted linear sum of sub-kernels,
$K(x, x') = \sum_k c_k K_k(x, x'), \quad c_k \ge 0.$
The Priors
• "Different priors for different parameters"
• Smoothness: controls overfitting
• Sparseness: enables input selection and controls overfitting
Sparse Kernel Model
Replace the kernel with a weighted linear sum of sub-kernels,
$K(x, x') = \sum_k c_k K_k(x, x'), \quad c_k \ge 0,$
and minimise the number of non-zero multipliers $c_k$, along with the standard support vector optimisation. The choice of penalty on the multipliers matters:
• $\ell_0$ (count of non-zeros): solution sparse, but optimisation hard
• $\ell_1$: optimisation easier (convex), solution sparse
• $\ell_2$: optimisation easier, but solution NOT sparse
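Relaxing the ℓ0 count to an ℓ1 penalty is what makes the optimisation convex while keeping the solution sparse. A minimal sketch of that idea (not the slides' own optimisation): a Lasso fit over Gram-matrix columns, so each column plays the role of one kernel basis function; the data and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
X = np.sort(rng.uniform(0, 1, (60, 1)), axis=0)
y = np.sin(8 * X).ravel() + rng.normal(0, 0.05, 60)

# Each column of the Gram matrix is one kernel basis function K(x_i, .).
sigma = 0.1
K = np.exp(-(X - X.T) ** 2 / (2 * sigma**2))

# The l1 penalty stands in for counting non-zero multipliers (the l0 "norm"):
# the problem becomes convex and the solution stays sparse.
model = Lasso(alpha=1e-3, max_iter=50_000).fit(K, y)
print("non-zero multipliers:", np.sum(model.coef_ != 0), "of", len(y))
```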
Choosing the Sub-Kernels • Avoid additional parameters if possible • Sub-models should be flexible
Tensor Product Splines
The univariate spline which passes through the origin has a kernel of the form
$k(x, x') = xx' + xx'\min(x, x') - \frac{x + x'}{2}\min(x, x')^2 + \frac{\min(x, x')^3}{3},$
and the multivariate ANOVA kernel is given by
$K(\mathbf{x}, \mathbf{x}') = \prod_{i=1}^{n}\left(1 + k(x_i, x_i')\right).$
E.g. for a two-input problem the ANOVA kernel is given by
$K(\mathbf{x}, \mathbf{x}') = 1 + k(x_1, x_1') + k(x_2, x_2') + k(x_1, x_1')\,k(x_2, x_2').$
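A direct implementation of the two formulas above (the inputs are assumed to lie in [0, 1], as is usual for this spline kernel; the test points are illustrative):

```python
import numpy as np

def spline_origin(x, y):
    """First-order spline kernel through the origin (univariate, inputs in [0, 1])."""
    m = np.minimum(x, y)
    return x * y + x * y * m - (x + y) / 2 * m**2 + m**3 / 3

def anova_two_input(x, z):
    """Two-input ANOVA kernel: 1 + k(x1,z1) + k(x2,z2) + k(x1,z1)*k(x2,z2)."""
    k1 = spline_origin(x[0], z[0])
    k2 = spline_origin(x[1], z[1])
    return 1.0 + k1 + k2 + k1 * k2

print(anova_two_input(np.array([0.2, 0.7]), np.array([0.5, 0.3])))
```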
Sparse ANOVA Kernel
Introduce multipliers for each ANOVA term,
$K(\mathbf{x}, \mathbf{x}') = \sum_k c_k K_k(\mathbf{x}, \mathbf{x}'), \quad c_k \ge 0,$
and minimise the number of non-zero multipliers, along with the standard support vector optimisation.
Algorithm
Data → ANOVA Basis Selection → Sparse ANOVA Selection → Parameter Selection → Model
• 3+ stage technique with auto-selection of parameters
• Each stage consists of solving a convex, constrained optimisation problem (QP or LP)
• Capacity control parameter: chosen by cross-validation
• Sparseness parameter: chosen by validation error
Stage I: Sparse Basis Solution
• Quadratic loss function → Quadratic Program (QP)
• ε-insensitive loss function → Linear Program (LP)
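The ε-insensitive case really is a linear program. A minimal sketch of that LP flavour with scipy.optimize.linprog, using Gram-matrix columns as stand-in basis functions; the variable split c = c_plus - c_minus, the tiny data set, and all parameter values are assumptions for illustration, not the slides' implementation:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
X = np.sort(rng.uniform(0, 1, (40, 1)), axis=0)
y = np.sin(6 * X).ravel() + rng.normal(0, 0.05, 40)

# Candidate basis: Gram-matrix columns (stand-ins for ANOVA terms).
sigma = 0.15
Phi = np.exp(-(X - X.T) ** 2 / (2 * sigma**2))
n, p = Phi.shape
eps, lam = 0.05, 0.1      # tube width and sparseness weight (illustrative)

# Variables z = [c_plus (p), c_minus (p), xi (n)], all >= 0.
# Minimise lam * ||c||_1 + sum(xi)  s.t.  |y - Phi c| <= eps + xi.
cost = np.concatenate([lam * np.ones(p), lam * np.ones(p), np.ones(n)])
A_ub = np.block([[ Phi, -Phi, -np.eye(n)],
                 [-Phi,  Phi, -np.eye(n)]])
b_ub = np.concatenate([y + eps, eps - y])

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
c = res.x[:p] - res.x[p:2*p]
print("LP status:", res.message)
print("non-zero coefficients:", np.sum(np.abs(c) > 1e-8), "of", p)
```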
AMPG Problem
• Predict automobile MPG (392 samples)
• Inputs: number of cylinders, displacement, horsepower, weight, acceleration, year
• Output: MPG
[Figure: estimated ANOVA terms plotted against horsepower, 50 to 230.]
Network transparency through the ANOVA representation.
SUPANOVA AMPG Results (ε = 2.5)

Estimated generalisation error (mean / variance):

Stage I loss     Stage III loss    Training        Testing         Linear Model
Quadratic        Quadratic         6.97 / 7.39     7.08 / 6.19     11.4 / 11.0
ε-insensitive    ε-insensitive     0.48 / 0.04     0.49 / 0.03     1.80 / 0.11
ε-insensitive    Quadratic         1.10 / 0.07     1.37 / 0.10     n/a
Quadratic        ε-insensitive     7.07 / 6.52     7.13 / 6.04     11.72 / 10.94
Summary • SUPANOVA is a global approach • Strong Basis (Kernel Methods) • Can control loss function and sparseness • Can impose limit on maximum variate terms • Generalisation + Transparency
Further Information
• http://www.isis.ecs.soton.ac.uk/isystems/kernel/
• SVM Technical Report
• MATLAB SVM Toolbox
• Sparse Kernel Paper
• These Slides