Optimizing number of hidden neurons in neural networks IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, February 2007 Janusz A. Starzyk, School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, U.S.A.
Outline • Neural networks – multi-layer perceptron • Overfitting problem • Signal-to-noise ratio figure (SNRF) • Optimization using signal-to-noise ratio figure • Experimental results • Conclusions
Neural networks – multi-layer perceptron (MLP) [diagram: MLP mapping inputs x to outputs z]
Neural networks – multi-layer perceptron (MLP)
• Efficient mapping from inputs to outputs
• Powerful universal function approximator
• Number of inputs and outputs determined by the data
• Number of hidden neurons determines the fitting accuracy – a critical design choice
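A minimal sketch of such an MLP regressor, using scikit-learn's MLPRegressor purely as an illustration (the talk does not prescribe a particular library); the hidden-layer size is the free parameter the rest of the talk optimizes:

```python
from sklearn.neural_network import MLPRegressor

# Inputs and outputs are fixed by the data; the hidden-layer size
# (10 here is an arbitrary illustrative value) is the parameter to optimize.
mlp = MLPRegressor(hidden_layer_sizes=(10,), activation='tanh', max_iter=5000)
# mlp.fit(x, y)           # train on sampled data
# z = mlp.predict(x_new)  # map new inputs to outputs
```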
Overfitting problem
[diagram: training data (x, y) → MLP training → model; new data x' → model → prediction y']
• Generalization: the trained model should predict well on new data x'
• Overfitting: overestimates the function complexity, degrades generalization capability
• Bias/variance dilemma
• Excessive hidden neurons lead to overfitting
Overfitting problem
• Avoid overfitting: cross-validation & early stopping
[diagram: all available training data (x, y) split into training data (x, y) → MLP training → training error e_train, and testing data (x', y') → MLP testing → testing error e_test]
[plot: fitting error (e_train, e_test) vs. number of hidden neurons, with the optimum number marked]
• Stopping criterion: e_test starts to increase, or e_train and e_test start to diverge
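For reference, a sketch of this conventional hold-out procedure (the helper name and the 70/30 split are illustrative assumptions, not taken from the talk):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

def holdout_sweep(x, y, max_hidden=30):
    """Sweep the hidden-layer size and keep the one with the lowest test error."""
    x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.3)
    results = []
    for n_hidden in range(1, max_hidden + 1):
        mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000)
        mlp.fit(x_tr, y_tr)
        e_train = np.mean((y_tr - mlp.predict(x_tr)) ** 2)
        e_test = np.mean((y_te - mlp.predict(x_te)) ** 2)
        results.append((n_hidden, e_train, e_test))
    return min(results, key=lambda r: r[2])   # optimum where e_test is smallest
```

The 30% of data held out for testing never contributes to training, which is the waste the next slide points out.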
Overfitting problem
• How to divide the available data? (the testing data (x', y') held out from all available training data (x, y) is wasted for training)
• When to stop?
[plot: fitting error (e_train, e_test) vs. number of hidden neurons, with the optimum number marked]
• Can the test error capture the generalization error?
Overfitting problem • Desired: • Quantitative measure of unlearned useful information from e_train • Automatic recognition of overfitting
Signal-to-noise ratio figure (SNRF)
• Sampled data: function value + noise
• Error signal: approximation error component (useful signal, should be reduced) + noise component (should not be learned)
• Assumption: continuous function & white Gaussian noise (WGN) as noise
• Signal-to-noise ratio figure (SNRF): signal energy / noise energy
• Compare SNRF_e and SNRF_WGN to decide whether learning should stop: if there is useful signal left unlearned, continue; if noise dominates in the error signal, stop
Signal-to-noise ratio figure (SNRF) – one-dimensional case
[figure: training data and approximating function; error signal = approximation error component + noise component]
• How can the level of these two components be measured?
Signal-to-noise ratio figure (SNRF) – one-dimensional case
• Neighboring samples of the useful signal are highly correlated, whereas neighboring samples of WGN are uncorrelated, so the correlation separates the two components
Signal-to-noise ratio figure (SNRF) – one-dimensional case
• Hypothesis test: the threshold on SNRF_WGN is set at the 5% significance level
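A minimal sketch of a 1-D SNRF estimator of this kind, assuming the error samples are ordered by input value and that the signal energy is estimated from neighboring-sample correlation; the exact formulas and the 5%-level constant below are illustrative assumptions, not copied from the paper:

```python
import numpy as np

def snrf_1d(e):
    """SNRF of a 1-D error signal e, sampled on inputs sorted in increasing order.

    The smooth, still-unlearned signal component is highly correlated between
    neighboring samples, while the WGN component is not, so the neighboring
    correlation is attributed to signal and the remainder to noise.
    """
    e = np.asarray(e, dtype=float)
    signal_energy = np.sum(e[:-1] * e[1:])         # neighbor correlation -> signal
    noise_energy = np.sum(e ** 2) - signal_energy  # remainder -> noise
    return signal_energy / noise_energy

def snrf_wgn_threshold(n):
    """Approximate 5%-significance threshold on the SNRF of pure WGN of length n.

    For WGN the SNRF concentrates around zero with spread on the order of
    1/sqrt(n); the 1.7 factor is an illustrative one-sided 5% cut-off.
    """
    return 1.7 / np.sqrt(n)
```

Learning stops once snrf_1d(e_train) drops below snrf_wgn_threshold(len(e_train)).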
Signal-to-noise ratio figure (SNRF) – multi-dimensional case
• Signal and noise levels: estimated within the neighborhood of each sample p, using its M nearest neighbors
Signal-to-noise ratio figure (SNRF) – multi-dimensional case
• The neighborhood estimates are then combined over all samples
Signal-to-noise ratio figure (SNRF) – multi-dimensional case
• With M = 1 (single nearest neighbor), the multi-dimensional threshold ≈ the one-dimensional threshold
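A sketch of one way to extend the estimator to multi-dimensional inputs, taking each sample's signal level from the correlation with its M nearest neighbors in input space (weighting and normalization simplified; with M = 1 the 1-D threshold above is reused, as the slide states):

```python
import numpy as np
from scipy.spatial import cKDTree

def snrf_multidim(x, e, M=1):
    """SNRF of the error e over multi-dimensional inputs x (shape (n, d))."""
    x = np.asarray(x, dtype=float)
    e = np.asarray(e, dtype=float)
    tree = cKDTree(x)
    _, idx = tree.query(x, k=M + 1)          # k=M+1: the nearest point is the sample itself
    neighbor_err = e[idx[:, 1:]]             # errors of the M nearest neighbors
    signal_energy = np.sum(e[:, None] * neighbor_err) / M   # averaged neighbor correlation
    noise_energy = np.sum(e ** 2) - signal_energy
    return signal_energy / noise_energy
```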
Optimization using SNRF
• Start with a small network
• Train the MLP → training error e_train
• Compare SNRF_e & SNRF_WGN: if useful signal is left unlearned, add hidden neurons and retrain
• Stopping criterion: SNRF_e < threshold SNRF_WGN (noise dominates the error signal, little information is left unlearned, learning should stop) – see the sketch below
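A sketch of this growing loop for a one-dimensional fitting problem, reusing the hypothetical snrf_1d / snrf_wgn_threshold helpers from above and scikit-learn's MLPRegressor as a stand-in trainer (not the original implementation):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def optimize_hidden_neurons(x, y, max_hidden=50):
    """Grow the hidden layer until the training error behaves like pure noise."""
    order = np.argsort(x[:, 0])                # sort so input neighbors are adjacent
    threshold = snrf_wgn_threshold(len(y))
    for n_hidden in range(1, max_hidden + 1):
        mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000)
        mlp.fit(x, y)
        e_train = (y - mlp.predict(x))[order]  # training-error signal
        if snrf_1d(e_train) < threshold:       # noise dominates: stop adding neurons
            return n_hidden, mlp
    return max_hidden, mlp
```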
Optimization using SNRF
• The same criterion is applied to optimizing the number of iterations in back-propagation training, to avoid overfitting (overtraining)
• Set the structure of the MLP
• Train the MLP for a number of back-propagation iterations → training error e_train
• Compare SNRF_e & SNRF_WGN
• Keep training with more iterations until SNRF_e falls below the threshold – see the sketch below
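The same stopping rule applied to the iteration count, sketched with warm-started training so the network continues from its previous weights between SNRF checks (again an illustration, not the original training code):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def optimize_iterations(x, y, n_hidden, max_epochs=2000, check_every=10):
    """Stop back-propagation once the training error looks like WGN."""
    order = np.argsort(x[:, 0])
    threshold = snrf_wgn_threshold(len(y))
    mlp = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                       max_iter=check_every, warm_start=True)
    for _ in range(0, max_epochs, check_every):
        mlp.fit(x, y)                          # runs check_every more iterations
        e_train = (y - mlp.predict(x))[order]
        if snrf_1d(e_train) < threshold:
            break                              # further training would fit noise
    return mlp
```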
Experimental results • Optimizing the number of iterations: noise-corrupted 0.4 sin(x) + 0.5
Optimization using SNRF • Optimizing the order of a polynomial – see the sketch below
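A sketch of the same criterion applied to choosing a polynomial order with numpy.polyfit, again relying on the hypothetical SNRF helpers introduced above:

```python
import numpy as np

def optimize_polynomial_order(x, y, max_order=20):
    """Increase the polynomial order until the residual behaves like noise."""
    idx = np.argsort(x)
    threshold = snrf_wgn_threshold(len(y))
    for order in range(1, max_order + 1):
        coeffs = np.polyfit(x, y, order)
        residual = (y - np.polyval(coeffs, x))[idx]
        if snrf_1d(residual) < threshold:
            return order
    return max_order
```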
Experimental results • Optimizing the number of hidden neurons: a two-dimensional function
Experimental results • Mackey-Glass database: an MLP predicts the following sample from every 7 consecutive samples
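A sketch of how the series is turned into a supervised data set as described on the slide (every 7 consecutive samples predict the following one); the series itself would come from the Mackey-Glass equation or a published data file:

```python
import numpy as np

def make_windows(series, window=7):
    """Build (x, y) pairs: each window of 7 samples predicts the next sample."""
    series = np.asarray(series, dtype=float)
    x = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return x, y
```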
Experimental results • WGN characteristic
Experimental results • Puma robot arm dynamics database: an MLP maps 8 inputs (positions, velocities, torques) to angular acceleration
Conclusions • Quantitative criterion based on SNRF to optimize the number of hidden neurons in an MLP • Overfitting detected from the training error alone • No separate test set required • Criterion: simple, easy to apply, efficient and effective • Applicable to optimizing other parameters of neural networks or other fitting problems