Kazuya Akimoto

Non-parametric non-linear classifiers. Willem Melssen (W.Melssen@science.ru.nl), Radboud University Nijmegen, Institute for Molecules and Materials, Analytical Chemistry & Chemometrics, www.cac.science.ru.nl. Opening slides: Kazuya Akimoto, Piet Mondriaan, Salvador Dalí.

Presentation Transcript


  1. Kazuya Akimoto Piet Mondriaan

  2. Salvador Dalí

  3. Non-parametric non-linear classifiers. Willem Melssen (W.Melssen@science.ru.nl), Radboud University Nijmegen, Institute for Molecules and Materials, Analytical Chemistry & Chemometrics, www.cac.science.ru.nl

  4. Non-parametric non-linear classifiers
     • no assumptions regarding the mean, the variance / covariance, or the normality of the distribution of the input data;
     • non-linear relationship between input data and the corresponding output (class membership);
     • supervised techniques (input and output based).

  5. Parametric and linear… [plot: classes with equal (co-)variance, separated by LDA]

  6. Parametric and linear… [plot: linearly separable classes, separated by LDA]

  7. …versus non-parametric, non-linear [plot: LDA ???]

  8. Some powerful classifiers
     • K Nearest Neighbours;
     • Artificial Neural Networks;
     • Support Vector Machines.

  9. K Nearest Neighbours (KNN)
     • non-parametric classifier (no assumptions regarding normality);
     • similarity based (Euclidean distance, 1 - correlation);
     • matching to a set of classified objects (decision based on a consensus criterion).
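
The two similarity measures named on this slide could be computed as in the sketch below. It is only an illustration: the function names are made up here, and "1 - correlation" is assumed to mean one minus the Pearson correlation coefficient.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation: near 0 for correlated profiles, near 2 for anti-correlated ones.

    Assumption: neither vector is constant (otherwise the denominator is zero).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)
```

Euclidean distance compares absolute values, so scaling matters; the correlation-based measure compares only the shape of two profiles.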

  10. KNN modelling procedure
     • use appropriate scaling of the selected training set;
     • select a similarity measure (Euclidean distance);
     • set the number of neighbours (K);
     • construct the similarity matrix for a new object (unknown class) and the objects in the training set;
     • rank all similarity values in ascending order;
     • generate the class membership list;
     • the consensus criterion determines the class (e.g., the majority takes all);
     • validate the K value (cross-validation, test set).
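
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the results in this talk: it assumes the data are already scaled, uses Euclidean distance, and `knn_classify` is a name chosen here.

```python
import math
from collections import Counter

def knn_classify(train_x, train_y, new_x, k):
    """Assign new_x to the majority class among its k nearest training objects."""
    # Distance from the new object to every object in the training set.
    distances = [
        (math.dist(x, new_x), label) for x, label in zip(train_x, train_y)
    ]
    # Rank in ascending order, so the most similar objects come first.
    distances.sort(key=lambda pair: pair[0])
    # Class membership list of the k nearest neighbours.
    neighbours = [label for _, label in distances[:k]]
    # Consensus criterion: the majority takes all (ties are not treated specially here).
    return Counter(neighbours).most_common(1)[0][0]

train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.8, 3.1)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(train_x, train_y, (1.1, 1.0), k=3))
```

There is no training step: the "model" is the labelled training set itself, and K is the only parameter left to validate.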

  11. Select a representative training set [scatter plot; axes X1 and X2]

  12. Label the data points (supervised) [scatter plot; axes X1 and X2; class A, class B]

  13. Classify a new object [scatter plot; axes X1 and X2; class A, class B]

  14. One neighbour: K = 1 [same plot; new object: class B]

  15. K = 3 [same plot; new object: class A]

  16. K = 2 [same plot; new object: class A or B, undecided]

  17. K = 11 [same plot; 5 A’s and 6 B’s: confidence?]
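
The K = 2 and K = 11 cases suggest reporting more than a bare class label. One possible sketch, with a hypothetical `consensus` helper, returns no decision on a tie and reports the vote fraction as a rough indication of confidence:

```python
from collections import Counter

def consensus(neighbour_labels):
    """Return (winning class, or None on a tie; fraction of votes for the top class)."""
    counts = Counter(neighbour_labels).most_common()
    top_label, top_votes = counts[0]
    fraction = top_votes / len(neighbour_labels)
    # A tie between the two best classes leaves the object undecided.
    if len(counts) > 1 and counts[1][1] == top_votes:
        return None, fraction
    return top_label, fraction

print(consensus(["A", "B"]))             # K = 2, one vote each
print(consensus(["A"] * 5 + ["B"] * 6))  # K = 11, 5 A's and 6 B's
```

A 6-to-5 vote gives class B with barely more than half of the votes, which is why the slide questions the confidence of that decision.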

  18. Classification of brain tumours
     • collaboration with the department of radiology UMCN, Nijmegen; EC project eTumour;
     • magnetic resonance imaging;
     • voxel-wise in-vivo NMR spectroscopy;
     • goal of the project: determination of type and grading of various brain tumours.

  19. Magnetic Resonance Imaging [images: T1 weighted, T2 weighted, proton density, gadolinium; labelled regions: ventricles (CSF), tumour, grey + white matter, skull]

  20. Construction of data set

  21. MRI combined with MRS: image variables and quantitated values

  22. Average spectrum per tissue type [score plot: PC1 (42.2%) versus PC2 (19.5%)]

  23. Results: 10 random, balanced divisions of the data into a training set (2/3) and a test set (1/3), giving 10 different models
     • LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
     • KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
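
A repeated 2/3 to 1/3 division could be generated as in the sketch below. It assumes that "balanced" means each class is split in the same proportion (a stratified split); the slide does not say exactly how the divisions were made, and `balanced_split` and the example labels are illustrative only.

```python
import random
from collections import defaultdict

def balanced_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_indices, test_indices), splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Ten random divisions, one per seed; a separate model would be fitted on each.
labels = ["A"] * 30 + ["B"] * 15
splits = [balanced_split(labels, seed=s) for s in range(10)]
```

Fitting one model per division and collecting the ten test-set accuracies gives a mean and a spread like the figures reported on this slide.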

  24. Artificial Neural Networks (ANN)
     • non-parametric, non-linear, adaptive;
     • weights trained by an iterative learning procedure.

  25. ANN architecture

  26. ANN architecture

  27. Neuron or ‘unit’: weighted input (dendrites, synapses); summation (net) and transfer function f(net) in the neuron (soma); distribution of output (axon)

  28. Transfer functions: exponential, linear, compressive
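
A single unit can be sketched as a weighted sum followed by a transfer function. The exact functions plotted on the slide are not recoverable from the transcript, so the identity and the logistic sigmoid are used below as assumed examples of a linear and a compressive function; the bias argument plays the role of a negative threshold.

```python
import math

def linear(net):
    """Linear transfer function: the output equals the net input."""
    return net

def sigmoid(net):
    """Compressive (squashing) function: maps any net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))

def unit_output(inputs, weights, bias, transfer):
    """Weighted summation (net) followed by the transfer function f(net)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return transfer(net)

print(unit_output([1.0, 2.0], [0.5, -0.25], 0.0, linear))
print(unit_output([0.0, 0.0], [1.0, 1.0], 0.0, sigmoid))
```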

  29. An easy one: the ‘and’ problem [plot with a single decision line]

  30. Two layer network (perceptron)
     sign(x1*w1 + x2*w2 - t) < 0 : class 0
     sign(x1*w1 + x2*w2 - t) > 0 : class 1
     Hey, this looks like LDA…
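
For the ‘and’ problem, one set of weights that works is w1 = w2 = 1 with threshold t = 1.5. These values are chosen here for illustration and are not taken from the slide; any line that separates (1, 1) from the other three corners would do.

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, t=1.5):
    """Return class 1 if the weighted sum exceeds the threshold t, else class 0."""
    net = x1 * w1 + x2 * w2 - t
    return 1 if net > 0 else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, perceptron(x1, x2))
```

The decision boundary x1*w1 + x2*w2 = t is a straight line, which is why the perceptron resembles a linear discriminant.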

  31. Logical ‘exclusive-or’ problem

  32. No single decision line possible…

  33. … but two lines will do

  34. Multi-layer feed-forward ANN

  35. Upper decision line

  36. Lower decision line

  37. Solution
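
The two-line solution to the exclusive-or problem can be written out by hand. In this sketch the weights and thresholds are picked for illustration (the values in the slides are not in the transcript): two hidden units implement the lower and upper decision lines, and the output unit fires only for points between them.

```python
def step(net):
    """Hard threshold: 1 for a positive net input, otherwise 0."""
    return 1 if net > 0 else 0

def xor_network(x1, x2):
    lower = step(x1 + x2 - 0.5)  # 1 if the point lies above the lower decision line
    upper = step(x1 + x2 - 1.5)  # 1 if the point lies above the upper decision line
    # The output unit fires only between the two lines.
    return step(lower - upper - 0.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_network(x1, x2))
```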

  38. How to get the weights: by learning
     1. set network parameters (learning rate, number of hidden layers / units, transfer functions, etc.);
     2. initialise network weights randomly;
     3. present an object;
     4. calculate the ANN output;
     5. adapt network weights to minimise the output error;
     6. repeat steps 3 to 5 for all training objects;
     7. iterate until the network converges or a stop criterion is met;
     8. evaluate network performance with an independent test set or a cross-validation procedure.
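
The learning loop can be illustrated on the smallest possible network: a single sigmoid unit trained on the ‘and’ problem. This is a simplified sketch, not the procedure used for the tumour data. It has no hidden layer, the update rule (error times input) is an assumption made here for simplicity, and the learning rate, epoch count and seed are arbitrary choices.

```python
import math
import random

def train_unit(data, learning_rate=0.5, epochs=2000, seed=0):
    """Train one sigmoid unit with two inputs; returns (w1, w2, bias)."""
    rng = random.Random(seed)
    # Step 2: initialise the weights randomly.
    w1, w2, bias = (rng.uniform(-0.5, 0.5) for _ in range(3))
    for _ in range(epochs):                # step 7: iterate (fixed number of passes)
        for (x1, x2), target in data:      # steps 3 and 6: present every training object
            net = x1 * w1 + x2 * w2 + bias
            output = 1.0 / (1.0 + math.exp(-net))  # step 4: calculate the output
            error = target - output
            # Step 5: adapt the weights in the direction that reduces the error.
            w1 += learning_rate * error * x1
            w2 += learning_rate * error * x2
            bias += learning_rate * error
    return w1, w2, bias

def predict(weights, x1, x2):
    """Class 1 if the net input is positive, which means a sigmoid output above 0.5."""
    w1, w2, bias = weights
    return 1 if x1 * w1 + x2 * w2 + bias > 0 else 0

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights = train_unit(and_data)
print([predict(weights, x1, x2) for (x1, x2), _ in and_data])
```

A network with hidden layers follows the same loop; only the weight update in step 5 becomes more involved, because the output error has to be propagated back through the layers.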

  39. Adapting the weights [plot: error surface with a global minimum and a local minimum]
     • adapt weights to minimise the output error E;
     • weight changes controlled by the learning rate;
     • error back propagation (from output to input layer);
     • Newton-Raphson, Levenberg-Marquardt, etc.
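
In its simplest form, the weight adaptation is a gradient descent step. The notation below is the common textbook form and is an assumption here, since the slide's own formulas are not in the transcript: $t_k$ is the target and $o_k$ the network output for output unit $k$, $w_{ij}$ is a single weight, and $\eta$ is the learning rate.

```latex
E = \frac{1}{2} \sum_{k} \left( t_k - o_k \right)^2 ,
\qquad
\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}
```

Back propagation is the bookkeeping that evaluates $\partial E / \partial w_{ij}$ layer by layer, from the output back to the input. Because each step only follows the local slope, the procedure can end in a local minimum rather than the global one.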

  40. Function of the hidden layer [input: (x, y) points on a [0, 1] x [0, 1] grid; specified output for the grid (white: 0, black: 1); hidden layer: ???]

  41. Output of hidden layer units [unit output for the [0, 1] x [0, 1] grid] Combining linear sub-solutions yields a non-linear classifier…

  42. When to stop training? [plot: error versus iteration number for the training set and the test set; regions: not converged, over-fitting] External validation set required to estimate the accuracy
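
One common way to act on these curves is early stopping: keep the iteration at which the error on the monitoring (test) set was lowest, and give up once it has stopped improving. The sketch below assumes the monitoring errors have been recorded per iteration; the `patience` parameter and the function name are illustrative, not part of the slides.

```python
def best_stopping_point(monitor_errors, patience=3):
    """Return the iteration with the lowest monitoring-set error.

    Scanning stops once the error has not improved for `patience` iterations.
    """
    best_iteration, best_error = 0, float("inf")
    for iteration, error in enumerate(monitor_errors):
        if error < best_error:
            best_iteration, best_error = iteration, error
        elif iteration - best_iteration >= patience:
            break
    return best_iteration

errors = [0.9, 0.6, 0.4, 0.35, 0.38, 0.45, 0.5]
print(best_stopping_point(errors))
```

Since the monitoring set was used to choose the stopping point, its error is an optimistic estimate, which is why an external validation set is needed for the accuracy.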

  43. Many solutions possible: not unique

  44. Classification of brain tumours: 10 random, balanced divisions of the data into a training set (2/3) and a test set (1/3), giving 10 different models
     • LDA: 90.0% ± 2.0 [87.0 - 92.8], 0.1 sec
     • KNN: 95.4% ± 1.0 [92.2 - 97.2], 1.4 sec
     • ANN: 93.2% ± 3.5 [86.4 - 97.7], 316 sec

  45. Support Vector Machines (SVMs)
     • kernel-based classifier;
     • transforms input space to a high-dimensional feature space;
     • exploits Lagrange formalism for the best solution;
     • binary (two-class) classifier.

  46. A linearly separable problem [plot: axes X1 and X2; class A, class B] Goal: to find the optimal separating hyperplane

  47. Optimal hyperplane [plot: axes X1 and X2; class A, class B] No objects are allowed between the boundaries; maximisation of the distance: unique solution!
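
The quantity being maximised can be made concrete with a small sketch. It assumes the hyperplane is written as w·x + b = 0 and the classes are coded as +1 and -1; the function name and the example points are made up for illustration.

```python
import math

def margin(points, labels, w, b):
    """Smallest signed distance from any object to the hyperplane w.x + b = 0.

    Labels are +1 / -1; a negative result means some object is misclassified.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

points = [(0, 0), (0, 1), (2, 2), (3, 2)]
labels = [-1, -1, 1, 1]
print(margin(points, labels, w=(1, 0), b=-1.0))  # vertical line x1 = 1
print(margin(points, labels, w=(1, 1), b=-2.5))  # diagonal line x1 + x2 = 2.5
```

Both lines separate the classes, but the diagonal one keeps every object further away. The SVM looks for the hyperplane for which this smallest distance is as large as possible.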

  48. Support vectors [plot: axes X1 and X2; class A, class B; support vectors marked]

  49. Crossing the borderlines… [plot: axes X1 and X2; class A, class B] Solution: penalise these objects

  50. Target, constraints, Lagrange equation
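
The formulas on this slide are not in the transcript. For orientation, the standard textbook formulation is given below; it may differ in notation or detail from what the slide showed. Here $\mathbf{w}$ and $b$ define the hyperplane, $y_i \in \{-1, +1\}$ is the class of object $\mathbf{x}_i$, and $\alpha_i \ge 0$ are the Lagrange multipliers.

```latex
\begin{aligned}
\text{Target:} \quad & \min_{\mathbf{w},\, b} \; \tfrac{1}{2} \lVert \mathbf{w} \rVert^2 \\
\text{Constraints:} \quad & y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) \ge 1
  \quad \text{for all } i \\
\text{Lagrange equation:} \quad & L(\mathbf{w}, b, \boldsymbol{\alpha}) =
  \tfrac{1}{2} \lVert \mathbf{w} \rVert^2
  - \sum_i \alpha_i \left[ y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) - 1 \right]
\end{aligned}
```

Penalising objects that cross the borderlines is usually done with slack variables $\xi_i \ge 0$: the target becomes $\tfrac{1}{2} \lVert \mathbf{w} \rVert^2 + C \sum_i \xi_i$ and the constraints are relaxed to $y_i (\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$, where $C$ sets the size of the penalty. Objects with $\alpha_i > 0$ in the solution are the support vectors.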
