[Slide art: paintings by Kazuya Akimoto, Piet Mondriaan and Salvador Dalí]
Non-parametric non-linear classifiers
Willem Melssen (W.Melssen@science.ru.nl)
Radboud University Nijmegen
Institute for Molecules and Materials, Analytical Chemistry & Chemometrics
www.cac.science.ru.nl
Non-parametric non-linear classifiers
• no assumptions regarding the mean, the variance/covariance, or the normality of the distribution of the input data;
• non-linear relationship between the input data and the corresponding output (class membership);
• supervised techniques (input and output based).
Parametric and linear… LDA assumes equal (co-)variance.
Parametric and linear… LDA assumes linearly separable classes.
Some powerful classifiers • K Nearest Neighbours; • Artificial Neural Networks; • Support Vector Machines.
K Nearest Neighbours (KNN)
• non-parametric classifier (no assumptions regarding normality);
• similarity based (Euclidean distance, 1 − correlation);
• matching to a set of classified objects (decision based on a consensus criterion).
KNN modelling procedure
• use appropriate scaling of the selected training set;
• select a similarity measure (e.g., Euclidean distance);
• set the number of neighbours (K);
• construct the similarity matrix between a new object (unknown class) and the objects in the training set;
• rank all similarity values in ascending order;
• generate the class membership list;
• let the consensus criterion determine the class (e.g., the majority takes all);
• validate the value of K (cross-validation, test set).
A minimal code sketch of this procedure follows below.
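As a concrete illustration, here is a minimal sketch in Python, assuming NumPy, Euclidean distance and a majority-takes-all consensus; the names knn_classify, X_train, y_train and x_new are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Assign x_new to a class by majority vote among its k nearest neighbours."""
    # similarity measure: Euclidean distance to every (scaled) training object
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # rank distances in ascending order and keep the K nearest neighbours
    nearest = np.argsort(dists)[:k]
    # consensus criterion: the majority takes all
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For a two-class problem, choosing an odd K avoids the undecided votes illustrated on the next slides.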
Label the data points (supervised): [scatter plot of class A and class B objects in the (X1, X2) plane]
Classify a new object: [a new, unlabelled object placed among the labelled points]
One neighbour (K = 1): the nearest neighbour belongs to class B, so the new object is assigned to class B.
K = 3: two of the three nearest neighbours belong to class A, so the object is assigned to class A.
K = 2: one neighbour of each class: class A or B, undecided.
K = 11: 5 A’s and 6 B’s: class B wins the vote, but with what confidence?
Classification of brain tumours
• collaboration with the Department of Radiology, UMCN, Nijmegen; EC project eTumour;
• magnetic resonance imaging;
• voxel-wise in-vivo NMR spectroscopy;
• goal of the project: determination of the type and grading of various brain tumours.
Magnetic Resonance Imaging: [four image panels: T1-weighted, T2-weighted, proton density, gadolinium-enhanced; labelled structures: ventricles (CSF), tumour, grey + white matter, skull]
MRI combined with MRS: [image variables from the MR images combined with quantitated values from the spectra]
Average spectrum per tissue type: [average spectra per tissue type and PCA score plot, PC1 (42.2%) vs PC2 (19.5%)]
Results
10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3): 10 different models.
• LDA: 90.0% ± 2.0 [87.0 – 92.8], 0.1 sec
• KNN: 95.4% ± 1.0 [92.2 – 97.2], 1.4 sec
Artificial Neural Networks (ANN)
• non-parametric, non-linear, adaptive;
• weights trained by an iterative learning procedure.
Neuron or ‘unit’: [diagram: weighted input (dendrites, synapses) → summation (net) in the neuron (soma) → transfer function f(net) → distribution of the output (axon)]
Transfer functions: [plots of linear, exponential and compressive transfer functions]
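As a sketch, a single unit can be written out in a few lines of Python (illustrative code; the logistic function stands in here for the compressive transfer function):

```python
import numpy as np

def logistic(net):
    """A compressive transfer function: squashes net into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def unit_output(x, weights, threshold):
    """Weighted input -> summation (net) -> transfer function f(net)."""
    net = np.dot(weights, x) - threshold  # summation of the weighted inputs
    return logistic(net)                  # output distributed to the next layer
```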
An easy one: the ‘and’ problem. [Figure: the four input patterns with a single decision line separating output 1 from output 0]
Two-layer network (perceptron)
sign(x1*w1 + x2*w2 – t) < 0 : class 0
sign(x1*w1 + x2*w2 – t) > 0 : class 1
Hey, this looks like LDA…
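For instance, with w1 = w2 = 1 and t = 1.5 (one possible choice, not necessarily the slide’s values), this rule solves the ‘and’ problem: the inputs (0,0), (0,1) and (1,0) give net values of –1.5, –0.5 and –0.5 (all class 0), while (1,1) gives +0.5 (class 1).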
How to get the weights: by learning
1. set the network parameters (learning rate, number of hidden layers / units, transfer functions, etc.);
2. initialise the network weights randomly;
3. present an object;
4. calculate the ANN output;
5. adapt the network weights to minimise the output error;
6. repeat 3 – 5 for all training objects;
7. iterate until the network converges / a stop criterion is met;
8. evaluate the network performance with an independent test set or a cross-validation procedure.
A minimal sketch of this recipe follows below.
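Here is a minimal Python sketch of the recipe, applied to the ‘and’ problem from the previous slides and using the classical perceptron update; the learning rate, random seed and epoch limit are illustrative choices:

```python
import numpy as np

# Perceptron learning on the 'and' problem (steps 2-7 of the recipe above)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                        # 'and' targets

rng = np.random.default_rng(0)
w = rng.normal(size=2)                            # initialise weights randomly
t = rng.normal()                                  # threshold
eta = 0.1                                         # learning rate

for epoch in range(100):                          # iterate until convergence
    errors = 0
    for x_i, y_i in zip(X, y):                    # present an object
        out = 1 if np.dot(w, x_i) - t > 0 else 0  # calculate the output
        delta = y_i - out                         # output error
        if delta != 0:                            # adapt weights to minimise it
            w = w + eta * delta * x_i
            t = t - eta * delta
            errors += 1
    if errors == 0:                               # stop criterion: all correct
        break
```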
Adapting the weights: [error surface with a global and a local minimum]
• adapt the weights to minimise the output error E;
• weight changes are controlled by the learning rate;
• error back-propagation (from the output to the input layer);
• Newton-Raphson, Levenberg-Marquardt, etc.
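In its simplest, gradient-descent form the update of each weight is Δw = −η · ∂E/∂w, with learning rate η: a small η gives slow but stable learning, while a large η learns faster but may overshoot or oscillate around a minimum.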
Function of the hidden layer: [input: (x, y) points on the [0, 1] x [0, 1] grid; target: the specified output for the grid (white: 0, black: 1)]
Output of the hidden layer units: [unit outputs over the [0, 1] x [0, 1] grid] Combining linear sub-solutions yields a non-linear classifier…
When to stop training? [Figure: error versus iteration number; the training-set error keeps decreasing while the test-set error rises again: not converged / over-fitting] An external validation set is required to estimate the accuracy.
Classification of brain tumours
10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3): 10 different models.
• LDA: 90.0% ± 2.0 [87.0 – 92.8], 0.1 sec
• KNN: 95.4% ± 1.0 [92.2 – 97.2], 1.4 sec
• ANN: 93.2% ± 3.5 [86.4 – 97.7], 316 sec
Support Vector Machines (SVMs)
• kernel-based classifier;
• transforms the input space to a high-dimensional feature space;
• exploits the Lagrange formalism to find the best solution;
• binary (two-class) classifier.
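The kernel supplies inner products in the feature space without ever constructing that space explicitly; a Gaussian (RBF) kernel is one common choice (the function and the gamma parameter below are illustrative, not taken from the slides):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: the inner product of x and z in an
    implicit, very high-dimensional feature space."""
    return np.exp(-gamma * np.sum((x - z) ** 2))
```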
A linearly separable problem: [scatter plot of class A and class B in the (X1, X2) plane] Goal: to find the optimal separating hyperplane.
Optimal hyperplane: [the separating hyperplane with parallel boundaries touching each class] No objects are allowed between the boundaries; maximisation of the distance between them gives a unique solution!
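Why is the solution unique? Placing the boundaries at w·x + b = +1 and w·x + b = −1 makes the distance between them 2/||w||, so maximising the margin is equivalent to minimising ½ ||w||², a convex problem with a single optimum.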
Support vectors: [the objects lying exactly on the class boundaries are the support vectors]
Crossing the borderlines… [some objects fall between or beyond the class boundaries] Solution: penalise these objects.
Target: minimise ½ ||w||² + C Σi ξi (maximal margin, minimal penalty for border-crossing objects)
Constraints: yi (w·xi + b) ≥ 1 − ξi and ξi ≥ 0 for every training object (xi, yi), with yi = ±1
Lagrange equation: L = ½ ||w||² + C Σi ξi − Σi αi [ yi (w·xi + b) − 1 + ξi ] − Σi μi ξi, with Lagrange multipliers αi ≥ 0 and μi ≥ 0
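Setting the derivatives of L with respect to w, b and ξi to zero leads to the standard dual problem (not shown on the slide, but this is the usual route): maximise Σi αi − ½ Σi Σj αi αj yi yj (xi · xj) subject to 0 ≤ αi ≤ C and Σi αi yi = 0. Only the support vectors end up with αi > 0, and replacing the inner product xi · xj by a kernel K(xi, xj) turns the linear machine into a non-linear classifier.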