Model Calibration and Validation Dr. Dawei HAN Department of Civil Engineering University of Bristol, UK
A mathematical model is used to represent the real system. 'All models are wrong, some are useful.' George Box (ARMA model), 1919-, Princeton University (graduated from University College London)
White box model (glass box): all necessary information is available. --- Black box model: no a priori information; calibration and validation are needed.
Deterministic model: the same input always produces the same output. --- Stochastic model: the same input produces different outputs; randomness is described by pdfs for the input, parameters and output.
All computer models are deterministic: the same input gives the same output.
Ensemble simulation: the input is given as a pdf, uncertainty in the models and their parameters is sampled, and the output is obtained as a pdf. Randomness can be introduced 1) for each model (pdf for input, parameters, output) and 2) through committee models (combining models), as sketched below.
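Below is a minimal sketch (not from the original slides) of a single-model ensemble: a hypothetical one-parameter rainfall-runoff relation is run many times with the input and the parameter drawn from assumed distributions, so the output becomes a distribution rather than a single value.

```python
import numpy as np

rng = np.random.default_rng(42)
n_members = 1000                      # ensemble size

# Assumed (hypothetical) distributions for the input and a model parameter
rain = rng.normal(loc=20.0, scale=3.0, size=n_members)   # rainfall input (mm)
a = rng.normal(loc=0.6, scale=0.05, size=n_members)      # runoff coefficient

runoff = a * rain                     # run the deterministic model once per member

# The ensemble output is itself a distribution (pdf), not a single number
print(f"mean runoff = {runoff.mean():.2f} mm")
print(f"5-95% range = {np.percentile(runoff, [5, 95])}")
```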
Ensemble weather simulation Does it represent the real probability distribution?
Is the natural system really stochastic? Einstein (Germany/USA, 1879-1955, Nobel Prize 1921) and Bohr (Danish physicist, 1885-1962, Nobel Prize 1922)
Quantum mechanics. Einstein: "God does not play dice." Bohr: "Einstein, stop telling God what to do." http://www.aip.org/history/einstein/ae63.htm
"There is nothing random about this world" --- Prof. Persi Diaconis, Stanford University http://www-stat.stanford.edu/~cgates/PERSI/cv.html
Do hydrological systems appear stochastic simply because the information available to us is insufficient?
"Hydraulic modelling in a data-rich world", Professor Paul Bates, University of Bristol. More information is now available (e.g., from remote sensing). http://www.ggy.bris.ac.uk/staff/staff_bates.html
Not all information is useful. Useful information? Matlab user guide: Fuzzy toolbox
Questions for a modeller: How complicated should the model be? What input data should be used? How long a record should be used for model development?
The data: a model that is too simple underfits; a model that is too complicated overfits.
Occam's razor (Ockham's razor): one should not increase, beyond what is necessary, the number of entities required to explain anything. William of Ockham, 1288-1348, Ockham village, Surrey, England
"Make everything as simple as possible, but not simpler." Einstein
Model selection methods: cross validation, Akaike information criterion, Bayesian information criterion, …
Model calibration (training, learning): the aim is to predict future data drawn from the same distribution. http://www.cs.cmu.edu/~awm/
Holdout validation 1) Randomly choose 30% of the data as a test set 2) The remainder is a training set 3) Perform regression on the training set 4) Estimate future performance with the test set http://www.cs.cmu.edu/~awm/
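A minimal NumPy sketch (not from the original slides) of these four holdout steps, using synthetic data as a hypothetical stand-in for the slide's example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data standing in for the slide's example
x = np.linspace(0, 5, 50)
y = 1.5 * x + rng.normal(scale=1.0, size=x.size)

# 1) Randomly choose 30% of the data as a test set
idx = rng.permutation(x.size)
n_test = int(0.3 * x.size)
test, train = idx[:n_test], idx[n_test:]   # 2) the remainder is the training set

# 3) Perform regression on the training set
coeffs = np.polyfit(x[train], y[train], deg=1)

# 4) Estimate future performance with the test set
pred = np.polyval(coeffs, x[test])
mse = np.mean((y[test] - pred) ** 2)
print(f"test-set MSE = {mse:.3f}")
```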
Model parameter estimation (fitting to the data) Least squares method Maximum likelihood Maximum a posteriori Nonlinear optimisation Genetic algorithms …
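Of the estimation methods listed, the sketch below shows only ordinary least squares for a straight line (via numpy.linalg.lstsq); for Gaussian noise this coincides with the maximum likelihood estimate. The model and data are hypothetical.

```python
import numpy as np

# Least squares for y ≈ a*x + b: minimise the sum of squared residuals.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

X = np.column_stack([x, np.ones_like(x)])       # design matrix [x, 1]
params, *_ = np.linalg.lstsq(X, y, rcond=None)  # SVD-based least squares
a_hat, b_hat = params
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```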
Estimate future performance with the test set Linear regression Mean Squared Error = 2.4 http://www.cs.cmu.edu/~awm/
Estimate future performance with the test set Quadratic regression Mean Squared Error = 0.9 http://www.cs.cmu.edu/~awm/
Estimate future performance with the test set Join the dots Mean Squared Error = 2.2 http://www.cs.cmu.edu/~awm/
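The three fits above can be reproduced in spirit with the sketch below; the data are synthetic, so the MSE values will not match the 2.4 / 0.9 / 2.2 shown on the slides. "Join the dots" is represented here by piecewise-linear interpolation.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 4, 30))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # noisy target

# Simple holdout split (indices kept sorted so interpolation is valid)
test = np.arange(0, x.size, 3)                       # every third point as test
train = np.setdiff1d(np.arange(x.size), test)

def mse(pred):
    return np.mean((y[test] - pred) ** 2)

linear    = np.polyval(np.polyfit(x[train], y[train], 1), x[test])
quadratic = np.polyval(np.polyfit(x[train], y[train], 2), x[test])
join_dots = np.interp(x[test], x[train], y[train])   # piecewise-linear "join the dots"

print(f"linear    MSE = {mse(linear):.3f}")
print(f"quadratic MSE = {mse(quadratic):.3f}")
print(f"join-dots MSE = {mse(join_dots):.3f}")
```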
The test set method. Positive: • very simple. Negative: • wastes data: 30% less data for model calibration • if you don't have much data, the test set might just be lucky or unlucky
Cross Validation: repeatedly partitioning a sample of data into training and testing subsets. Seymour Geisser, 1929-2004, University of Minnesota http://en.wikipedia.org/wiki/Seymour_Geisser
Leave-one-out Cross Validation Mean Squared Error of 9 sets = 2.2 (single test 2.4)
Leave-one-out Cross Validation Mean Squared Error of 9 sets = 0.962 (single test 0.9 )
Leave-one-out Cross Validation Mean Squared Error of 9 sets = 3.33 (single test 2.2 )
Leave-one-out Cross Validation. Positive: • wastes only one data point per fit. Negative: • more computation • a single test point may be too small. A sketch follows.
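A minimal leave-one-out sketch on 9 synthetic points (so the LOOCV error is the mean over 9 single-point tests, as on the slides); the data are hypothetical, so the numbers will not reproduce those above. The three polynomial degrees stand in for the linear, quadratic and "join the dots" models.

```python
import numpy as np

def loocv_mse(x, y, degree):
    """Leave-one-out cross validation for a polynomial of a given degree."""
    errors = []
    for i in range(x.size):
        train = np.delete(np.arange(x.size), i)        # leave point i out
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[i])                 # test on the left-out point
        errors.append((y[i] - pred) ** 2)
    return np.mean(errors)

rng = np.random.default_rng(3)
x = np.linspace(0, 4, 9)                                # 9 points, as on the slides
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

for deg in (1, 2, 7):                                   # linear, quadratic, near-interpolating
    print(f"degree {deg}: LOOCV MSE = {loocv_mse(x, y, deg):.3f}")
```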
k-fold Cross Validation (k = 3): randomly break the dataset into k partitions (in our example we'll have k = 3 partitions coloured Red, Green and Blue)
3-fold Cross Validation: for the red partition, train on all the points not in the red partition and find the test-set sum of errors on the red points. Repeat with the other two colours and use the mean error of the three sets.
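A minimal k-fold sketch, assuming synthetic data and a quadratic model as hypothetical stand-ins for the coloured example above:

```python
import numpy as np

def kfold_mse(x, y, degree, k=3, seed=0):
    """k-fold cross validation: mean test MSE over the k partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(x.size)
    folds = np.array_split(idx, k)                 # the k "coloured" partitions
    errors = []
    for fold in folds:
        train = np.setdiff1d(idx, fold)            # train on everything else
        coeffs = np.polyfit(x[train], y[train], degree)
        pred = np.polyval(coeffs, x[fold])
        errors.append(np.mean((y[fold] - pred) ** 2))
    return np.mean(errors)

rng = np.random.default_rng(4)
x = np.linspace(0, 4, 30)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
print(f"3-fold CV MSE (quadratic) = {kfold_mse(x, y, degree=2):.3f}")
```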
Other model selection methods: Akaike information criterion, Bayesian information criterion, … AIC and BIC need only the training error.
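A sketch of how AIC and BIC can be computed from the training error alone, assuming Gaussian residuals and the common least-squares forms AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n); the data and candidate models are hypothetical.

```python
import numpy as np

def aic_bic(y, pred, n_params):
    """AIC and BIC for a least-squares fit, assuming Gaussian residuals."""
    n = y.size
    rss = np.sum((y - pred) ** 2)
    aic = n * np.log(rss / n) + 2 * n_params
    bic = n * np.log(rss / n) + n_params * np.log(n)
    return aic, bic

rng = np.random.default_rng(5)
x = np.linspace(0, 4, 40)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

for deg in (1, 2, 6):
    coeffs = np.polyfit(x, y, deg)                 # fitted on the training data only
    pred = np.polyval(coeffs, x)
    aic, bic = aic_bic(y, pred, n_params=deg + 1)
    print(f"degree {deg}: AIC = {aic:6.1f}, BIC = {bic:6.1f}")
```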
Model input data selection methods: the Gamma test (model free), information theory (model free), cross validation, …
Four example data sources Sine Line Logistic function Mackey-Glass
Fit models to the data (cross validation) Underfitting Overfitting
The Gamma Test: it estimates what proportion of the variance of the target value is explained by the (unknown) underlying function of the inputs and what proportion is caused by random noise. G is an estimate of the noise variance, i.e., the error that remains even with the best possible model.
500 points generated from the function with added noise of 0.075 The Gamma estimated noise variance is 0.073
If G is small, the output value is largely determined by the input variables. If G is large, possible reasons include: some important input variables are missing; there is too much measurement noise; the data record is too short; there are gaps in the data record. A sketch of the test follows.
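A minimal, model-free sketch of the Gamma test, assuming the usual nearest-neighbour formulation (regress half the mean squared output difference on the mean squared input distance over the k = 1..p nearest neighbours; the intercept is G). The test function and noise level are hypothetical stand-ins for the 500-point example above.

```python
import numpy as np

def gamma_test(X, y, p=10):
    """Estimate the noise variance G via the delta/gamma nearest-neighbour regression."""
    X = np.atleast_2d(X).reshape(len(y), -1)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)   # squared input distances
    np.fill_diagonal(d2, np.inf)                                # exclude each point itself
    order = np.argsort(d2, axis=1)                              # neighbours by distance
    delta, gamma = [], []
    for k in range(p):
        nn = order[:, k]                                        # index of the k-th nearest neighbour
        delta.append(np.mean(d2[np.arange(len(y)), nn]))        # mean squared input distance
        gamma.append(0.5 * np.mean((y[nn] - y) ** 2))           # half mean squared output difference
    slope, intercept = np.polyfit(delta, gamma, 1)
    return intercept                                            # the Gamma statistic G

rng = np.random.default_rng(6)
x = rng.uniform(0, 1, (500, 1))
noise = rng.normal(scale=np.sqrt(0.075), size=500)              # added noise variance 0.075
y = np.sin(4 * np.pi * x[:, 0]) + noise
print(f"Gamma estimate of noise variance = {gamma_test(x, y):.3f}")  # typically close to 0.075
```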
Gamma Archive http://users.cs.cf.ac.uk/Antonia.J.Jones/GammaArchive/IndexPage.htm Gamma Test, Computer Science, Cardiff University, Prof. Antonia Jones
Information Theory: "A Mathematical Theory of Communication" (1948), Claude Shannon, MIT, 1916-2001
Information entropy: a measure of the uncertainty associated with a random variable.
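A minimal sketch of Shannon entropy for a discrete distribution, H = -sum p_i log2 p_i (in bits); the example probabilities are hypothetical.

```python
import numpy as np

def shannon_entropy(probs):
    """Shannon entropy H = -sum p_i * log2(p_i), in bits."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                       # 0 * log(0) is taken as 0
    return -np.sum(p * np.log2(p))

print(shannon_entropy([0.5, 0.5]))     # fair coin: 1 bit (maximum uncertainty)
print(shannon_entropy([0.9, 0.1]))     # biased coin: about 0.47 bits
print(shannon_entropy([1.0]))          # certain outcome: 0 bits
```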