Information Driven Healthcare: Data Visualization & Classification
Lecture 6: Neural Networks (continued): Training, Stopping, Validation and Testing
Centre for Doctoral Training in Healthcare Innovation
Dr. Gari D. Clifford, University Lecturer & Associate Director, Centre for Doctoral Training in Healthcare Innovation, Institute of Biomedical Engineering, University of Oxford
ANN training continued
• Pseudo code for training and reporting classification performance:
  1. Partition data into training, validation & test sets
  2. for(j=1; j <= jmax; j++){
        for(run=1; run <= 10; run++){
           initialise weights;
           train MLP until stopping criterion reached (min(E_val): the minimum error on the validation set, not the training set);
           save min(E_val);
        }
     }
  3. Select the optimal network on the basis of the lowest E_val
  4. Test the optimal network on the test data and report the results.
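Below is a minimal Python sketch of this selection loop, using scikit-learn's MLPClassifier on synthetic data. The 60/20/20 split, the candidate hidden-layer sizes, the ten random restarts and the use of (1 − accuracy) as E_val are illustrative assumptions, not the lab's actual code; the per-epoch stopping rule itself is sketched separately after the next slide.

# Sketch only: select an MLP architecture by its validation error E_val,
# then report performance once on the held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# 1. Partition the data into training, validation and test sets (60/20/20 here).
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

best_e_val, best_net = np.inf, None
for n_hidden in (2, 4, 8, 16):                   # 2. loop over candidate architectures (j)
    for run in range(10):                        #    ten random weight initialisations each
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                            random_state=run).fit(X_train, y_train)
        e_val = 1.0 - net.score(X_val, y_val)    # error on the validation set, not the training set
        if e_val < best_e_val:                   # 3. keep the network with the lowest E_val
            best_e_val, best_net = e_val, net

# 4. Test the selected network on the test data and report the result (once only).
print("Selected hidden units:", best_net.hidden_layer_sizes)
print("Test-set accuracy:", best_net.score(X_test, y_test))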
How to terminate training … • Your cost function doesn’t tell you when to end training …
Use validation set to avoid overfitting
• At some point the error on your training set will continue to drop, but the error on an independent (validation) set, E_val, will start to rise …
• Now you are overfitting on the training data
• The learned function fits the training data very closely, but it does not generalise well: it cannot model unseen data from the same task sufficiently well
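One possible stopping rule is sketched below, assuming scikit-learn's MLPClassifier trained one epoch at a time with partial_fit; the patience of 10 epochs and the epoch budget are illustrative choices, not taken from the lecture.

# Sketch of validation-based early stopping: train one epoch at a time and
# stop once E_val has not improved for `patience` epochs, then roll back to
# the weights that gave the lowest E_val.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              n_hidden=8, patience=10, max_epochs=500):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), random_state=0)
    classes = np.unique(y_train)
    best_e_val, best_weights, epochs_since_best = np.inf, None, 0
    for epoch in range(max_epochs):
        net.partial_fit(X_train, y_train, classes=classes)  # one pass over the training data
        e_val = 1.0 - net.score(X_val, y_val)                # error on the independent validation set
        if e_val < best_e_val:                               # new minimum of E_val: remember this state
            best_e_val, epochs_since_best = e_val, 0
            best_weights = ([w.copy() for w in net.coefs_],
                            [b.copy() for b in net.intercepts_])
        else:
            epochs_since_best += 1
            if epochs_since_best >= patience:                # E_val has stopped improving: terminate
                break
    net.coefs_, net.intercepts_ = best_weights               # restore the best validation point
    return net, best_e_val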
Warning: local minima and overtraining
[Figure: accuracy vs. training epochs for the training and testing curves, annotated with the best test score (0.695) and the random 'coin toss' level (0.5).]
Examples of over-fitting • Imagine a regression problem: y=f(x) + noise
Examples of over-fitting
• Three types of fit to the same data: linear regression, quadratic regression, and piecewise linear non-parametric regression
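The sketch below reproduces the three fits on synthetic data, assuming f(x) = sin(x) with Gaussian noise and using simple "join-the-dots" interpolation for the piecewise linear non-parametric fit; these choices are illustrative, not taken from the lecture.

# Sketch: three fits to the same noisy regression data y = f(x) + noise.
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 6, 20))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)      # y = f(x) + noise

# Linear and quadratic regression by least squares.
lin = np.polyfit(x, y, deg=1)
quad = np.polyfit(x, y, deg=2)

# Piecewise linear "connect the dots" fit: it passes through every training
# point, so its training error is zero -- the classic over-fitting candidate.
def piecewise(x_new):
    return np.interp(x_new, x, y)

x_new = np.linspace(0, 6, 200)
fits = {"linear": np.polyval(lin, x_new),
        "quadratic": np.polyval(quad, x_new),
        "piecewise": piecewise(x_new)}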
Which is the ‘best’ fit? • Why not choose the technique which most closely fits the data? • Well – you need to answer the question: “How well are you going to predict future data drawn from the same distribution?”
The train / test approach
• The test set method: randomly hold out a portion of the data (here around 30%) as a test set, fit each candidate model on the remaining training data, and estimate its future performance from its error on the held-out test set.
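A sketch of the test set method under the same synthetic-data assumptions as before: hold out 30% of the points, fit each candidate polynomial on the remaining 70%, and compare their errors on the held-out set. The candidate degrees are illustrative.

# Sketch of the test set method: random 70/30 split, fit on the training
# portion, score on the held-out test portion, pick the best test-set score.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
x = rng.uniform(0, 6, 60)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3, random_state=1)

def test_mse(degree):
    """Fit a polynomial of the given degree on the training split and return
    its mean squared error on the test split."""
    coeffs = np.polyfit(x_tr, y_tr, deg=degree)
    return np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)

scores = {d: test_mse(d) for d in (1, 2, 6)}   # linear, quadratic, high-order
best_degree = min(scores, key=scores.get)      # choose the best test-set score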
The train / test approach • Good news: • Very simple • Can then simply choose the method with the best test set score • Bad News: • Wastes data: we get an estimate of the best method to apply to 30% less data • Test set estimator of performance has high variance – i.e. if we don’t have enough data, our test set might be unrepresentative
Cross Validation • We can improve this by dropping out some of the data at random, and repeating. Averaging reduces variance … • LOOCV - Leave One Out Cross Validation:
LOOCV • Leave One Out Cross Validation for linear regression:
LOOCV • Leave One Out Cross Validation for quadratic regression:
LOOCV • Leave One Out Cross Validation for piecewise linear regression:
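A sketch of LOOCV for the same three regression fits: each point is held out in turn, the model is refitted on the remaining points, and the squared prediction errors are averaged. The data generation and model choices are the same illustrative assumptions as in the earlier sketches.

# Sketch of Leave One Out Cross Validation for the three fits.
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 6, 20))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

def loocv_poly(degree):
    """Leave-one-out error for polynomial regression of a given degree."""
    errors = []
    for i in range(x.size):
        mask = np.arange(x.size) != i                    # drop point i
        coeffs = np.polyfit(x[mask], y[mask], deg=degree)
        errors.append((np.polyval(coeffs, x[i]) - y[i]) ** 2)
    return np.mean(errors)

def loocv_piecewise():
    """Leave-one-out error for the join-the-dots piecewise linear fit."""
    errors = []
    for i in range(x.size):
        mask = np.arange(x.size) != i
        pred = np.interp(x[i], x[mask], y[mask])         # interpolate without point i
        errors.append((pred - y[i]) ** 2)
    return np.mean(errors)

print("linear   :", loocv_poly(1))
print("quadratic:", loocv_poly(2))
print("piecewise:", loocv_piecewise())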
Let’s look at the ANN train/test procedure again
• Pseudo code for training and reporting classification performance:
  1. Partition data into training, validation & test sets
  2. for(j=1; j <= jmax; j++){
        for(run=1; run <= 10; run++){
           initialise weights;
           train MLP until stopping criterion reached (min(E_val): the error on the validation set, not the training set);
           save min(E_val);
        }
     }
  3. Select the optimal network on the basis of the lowest E_val
  4. Test the optimal network on the test data and report the results.
Pruning network nodes
• Step 3 is used to identify the optimal I-J-K configuration on the basis of the lowest values of E_val
• Network nodes can be pruned during this stage
• Also, an outer loop can be added to create a different partition of the data on each pass: this is cross-validation! (A sketch follows below.)
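The outer loop might look like the following sketch: K-fold cross-validation in which each fold serves once as the test set, while the remaining data are re-split into training and validation sets for architecture selection. The fold count, candidate hidden-layer sizes and synthetic data are illustrative assumptions.

# Sketch of cross-validated architecture selection: outer K-fold loop over
# data partitions, inner loop choosing the hidden-layer size by lowest E_val.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
test_scores = []

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    # A new data partition on every pass of the outer loop.
    X_dev, y_dev = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]
    X_tr, X_val, y_tr, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

    # Inner loop: pick the hidden-layer size with the lowest validation error E_val.
    best_e_val, best_net = np.inf, None
    for n_hidden in (2, 4, 8):
        net = MLPClassifier(hidden_layer_sizes=(n_hidden,),
                            max_iter=2000, random_state=0).fit(X_tr, y_tr)
        e_val = 1.0 - net.score(X_val, y_val)
        if e_val < best_e_val:
            best_e_val, best_net = e_val, net

    test_scores.append(best_net.score(X_test, y_test))

print("Mean cross-validated test accuracy:", np.mean(test_scores))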
Common problems in training and testing
• Poor training performance
  • Incorrect choice of problem – no/weak relationship between input and output
  • Wrong set of features – incorrect pre-filtering applied
  • Stuck units (local minima) – initialization problem or normalization problem
• Poor generalization performance
  • Insufficient number of training patterns – you only learn the patterns, not the relationship between the patterns and the class
  • Over-fitting – model order/architecture problem; the noise is also learned
  • Over-training – the model order is correct, but eventually the noise is learned
  • Test examples of one class consistently wrong – unbalanced database
  • Attempting to extrapolate rather than interpolate – the network is trained on data under one set of conditions (or for one population) and used to predict on another population that exhibits a different set of conditions
Now to the lab …. www.devbio.uga.edu/gallery/index.html
Acknowledgements • Overfitting, cross-validation and bootstrapping slides adapted from notes by Andrew W. Moore, School of Computer Science, Carnegie Mellon University (www.cs.cmu.edu/~awm), including "Cross-validation for detecting and preventing overfitting", http://www.autonlab.org/tutorials/index.html