[Slide art: paintings by Kazuya Akimoto, Piet Mondriaan and Salvador Dalí]
Non-parametric non-linear classifiers
Willem Melssen (W.Melssen@science.ru.nl)
Radboud University Nijmegen
Institute for Molecules and Materials, Analytical Chemistry & Chemometrics
www.cac.science.ru.nl
Non-parametric non-linear classifiers
• no assumptions regarding the mean, the variance/covariance, or the normality of the distribution of the input data;
• non-linear relationship between the input data and the corresponding output (class membership);
• supervised techniques (input and output based).
Parametric and linear… LDA assumes equal (co-)variance.
Parametric and linear… LDA assumes linearly separable classes.
Some powerful classifiers • K Nearest Neighbours; • Artificial Neural Networks; • Support Vector Machines.
K Nearest Neighbours (KNN)
• non-parametric classifier (no assumptions regarding normality);
• similarity based (Euclidean distance, 1 − correlation);
• matching to a set of classified objects (decision based on a consensus criterion).
KNN modelling procedure
• use appropriate scaling of the selected training set;
• select a similarity measure (e.g., Euclidean distance);
• set the number of neighbours (K);
• construct the similarity matrix between a new object (unknown class) and the objects in the training set;
• rank all similarity values in ascending order;
• generate the class membership list;
• let the consensus criterion determine the class (e.g., the majority takes all);
• validate the value of K (cross-validation, test set).
A minimal code sketch of this procedure follows below.
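As a concrete illustration, here is a minimal sketch in Python, assuming NumPy, Euclidean distance and a majority-takes-all consensus; the names knn_classify, X_train, y_train and x_new are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    """Assign x_new to a class by majority vote among its k nearest neighbours."""
    # similarity measure: Euclidean distance to every (scaled) training object
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # rank distances in ascending order and keep the K nearest neighbours
    nearest = np.argsort(dists)[:k]
    # consensus criterion: the majority takes all
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For a two-class problem, choosing an odd K avoids the undecided votes illustrated on the next slides.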
Label the data points (supervised): [scatter plot of class A and class B objects in the (X1, X2) plane]
Classify a new object: [a new, unlabelled object placed among the labelled points]
One neighbour (K = 1): the nearest neighbour belongs to class B, so the new object is assigned to class B.
K = 3: two of the three nearest neighbours belong to class A, so the object is assigned to class A.
K = 2: one neighbour of each class: class A or B, undecided.
K = 11: 5 A’s and 6 B’s: class B wins the vote, but with what confidence?
Classification of brain tumours
• collaboration with the Department of Radiology, UMCN, Nijmegen; EC project eTumour;
• magnetic resonance imaging;
• voxel-wise in-vivo NMR spectroscopy;
• goal of the project: determination of the type and grading of various brain tumours.
Magnetic Resonance Imaging: [four image panels: T1-weighted, T2-weighted, proton density, gadolinium-enhanced; labelled structures: ventricles (CSF), tumour, grey + white matter, skull]
MRI combined with MRS: [image variables from the MR images combined with quantitated values from the spectra]
Average spectrum per tissue type: [average spectra per tissue type and PCA score plot, PC1 (42.2%) vs PC2 (19.5%)]
Results
10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3): 10 different models.
• LDA: 90.0% ± 2.0 [87.0 – 92.8], 0.1 sec
• KNN: 95.4% ± 1.0 [92.2 – 97.2], 1.4 sec
Artificial Neural Networks (ANN)
• non-parametric, non-linear, adaptive;
• weights trained by an iterative learning procedure.
Neuron or ‘unit’: [diagram: weighted input (dendrites, synapses) → summation (net) in the neuron (soma) → transfer function f(net) → distribution of the output (axon)]
Transfer functions: [plots of linear, exponential and compressive transfer functions]
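As a sketch, a single unit can be written out in a few lines of Python (illustrative code; the logistic function stands in here for the compressive transfer function):

```python
import numpy as np

def logistic(net):
    """A compressive transfer function: squashes net into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def unit_output(x, weights, threshold):
    """Weighted input -> summation (net) -> transfer function f(net)."""
    net = np.dot(weights, x) - threshold  # summation of the weighted inputs
    return logistic(net)                  # output distributed to the next layer
```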
An easy one: the ‘and’ problem. [Figure: the four input patterns with a single decision line separating output 1 from output 0]
Two-layer network (perceptron)
sign(x1*w1 + x2*w2 – t) < 0 : class 0
sign(x1*w1 + x2*w2 – t) > 0 : class 1
Hey, this looks like LDA…
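For instance, with w1 = w2 = 1 and t = 1.5 (one possible choice, not necessarily the slide’s values), this rule solves the ‘and’ problem: the inputs (0,0), (0,1) and (1,0) give net values of –1.5, –0.5 and –0.5 (all class 0), while (1,1) gives +0.5 (class 1).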
How to get the weights: by learning
1. set the network parameters (learning rate, number of hidden layers / units, transfer functions, etc.);
2. initialise the network weights randomly;
3. present an object;
4. calculate the ANN output;
5. adapt the network weights to minimise the output error;
6. repeat 3 – 5 for all training objects;
7. iterate until the network converges / a stop criterion is met;
8. evaluate the network performance with an independent test set or a cross-validation procedure.
A minimal sketch of this recipe follows below.
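Here is a minimal Python sketch of the recipe, applied to the ‘and’ problem from the previous slides and using the classical perceptron update; the learning rate, random seed and epoch limit are illustrative choices:

```python
import numpy as np

# Perceptron learning on the 'and' problem (steps 2-7 of the recipe above)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])                        # 'and' targets

rng = np.random.default_rng(0)
w = rng.normal(size=2)                            # initialise weights randomly
t = rng.normal()                                  # threshold
eta = 0.1                                         # learning rate

for epoch in range(100):                          # iterate until convergence
    errors = 0
    for x_i, y_i in zip(X, y):                    # present an object
        out = 1 if np.dot(w, x_i) - t > 0 else 0  # calculate the output
        delta = y_i - out                         # output error
        if delta != 0:                            # adapt weights to minimise it
            w = w + eta * delta * x_i
            t = t - eta * delta
            errors += 1
    if errors == 0:                               # stop criterion: all correct
        break
```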
Adapting the weights: [error surface with a global and a local minimum]
• adapt the weights to minimise the output error E;
• weight changes are controlled by the learning rate;
• error back-propagation (from the output to the input layer);
• Newton-Raphson, Levenberg-Marquardt, etc.
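In its simplest, gradient-descent form the update of each weight is Δw = −η · ∂E/∂w, with learning rate η: a small η gives slow but stable learning, while a large η learns faster but may overshoot or oscillate around a minimum.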
Function of the hidden layer: [input: (x, y) points on the [0, 1] x [0, 1] grid; target: the specified output for the grid (white: 0, black: 1)]
Output of the hidden layer units: [unit outputs over the [0, 1] x [0, 1] grid] Combining linear sub-solutions yields a non-linear classifier…
When to stop training? [Figure: error versus iteration number; the training-set error keeps decreasing while the test-set error rises again: not converged / over-fitting] An external validation set is required to estimate the accuracy.
Classification of brain tumours
10 random divisions of the data in a balanced way, training set (2/3) and test set (1/3): 10 different models.
• LDA: 90.0% ± 2.0 [87.0 – 92.8], 0.1 sec
• KNN: 95.4% ± 1.0 [92.2 – 97.2], 1.4 sec
• ANN: 93.2% ± 3.5 [86.4 – 97.7], 316 sec
Support Vector Machines (SVMs)
• kernel-based classifier;
• transforms the input space to a high-dimensional feature space;
• exploits the Lagrange formalism to find the best solution;
• binary (two-class) classifier.
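The kernel supplies inner products in the feature space without ever constructing that space explicitly; a Gaussian (RBF) kernel is one common choice (the function and the gamma parameter below are illustrative, not taken from the slides):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel: the inner product of x and z in an
    implicit, very high-dimensional feature space."""
    return np.exp(-gamma * np.sum((x - z) ** 2))
```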
A linearly separable problem: [scatter plot of class A and class B in the (X1, X2) plane] Goal: to find the optimal separating hyperplane.
Optimal hyperplane: [the separating hyperplane with parallel boundaries touching each class] No objects are allowed between the boundaries; maximisation of the distance between them gives a unique solution!
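Why is the solution unique? Placing the boundaries at w·x + b = +1 and w·x + b = −1 makes the distance between them 2/||w||, so maximising the margin is equivalent to minimising ½ ||w||², a convex problem with a single optimum.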
Support vectors: [the objects lying exactly on the class boundaries are the support vectors]
Crossing the borderlines… [some objects fall between or beyond the class boundaries] Solution: penalise these objects.
Target: minimise ½ ||w||² + C Σi ξi (maximal margin, minimal penalty for border-crossing objects)
Constraints: yi (w·xi + b) ≥ 1 − ξi and ξi ≥ 0 for every training object (xi, yi), with yi = ±1
Lagrange equation: L = ½ ||w||² + C Σi ξi − Σi αi [ yi (w·xi + b) − 1 + ξi ] − Σi μi ξi, with Lagrange multipliers αi ≥ 0 and μi ≥ 0
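Setting the derivatives of L with respect to w, b and ξi to zero leads to the standard dual problem (not shown on the slide, but this is the usual route): maximise Σi αi − ½ Σi Σj αi αj yi yj (xi · xj) subject to 0 ≤ αi ≤ C and Σi αi yi = 0. Only the support vectors end up with αi > 0, and replacing the inner product xi · xj by a kernel K(xi, xj) turns the linear machine into a non-linear classifier.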