Over-parameterisation and model reduction

Over-parameterisation and model reduction Neil Crout Environmental Science School of Biosciences University of Nottingham

Co-workers • Davide Tarsitano • Glen Cox • James Gibbons • Andy Wood • Jim Craigon • Steve Ramsden • Yan Jiao • Tim Reid • with thanks to BBSRC and Leverhulme

Motivation

Over-parameterisation • Increase the complexity of the model to improve agreement with observations • Can continue this until ALL the variation in the data is accounted for • Including the noise! • Models can become overparameterised

Over-parameterisation • Judgements have to be made about the level of complexity in a model (parsimony) • In a statistical context methods for model selection exist • Key feature - comparing models • Key feature - comparing models

So what! • Our models are not statistical • Mechanistic • No fitting involved • Predicting lots of different things (multi-variate) • Have to account for lots of different cases • Did we mention they’re mechanistic • Newton • Thermodynamics • etc all on our side

Overparameterisation and mechanistic models? • Mechanistically inspired – but empirically implemented • So normal statistical rules of business apply • Model parameters are often not formally fitted • worse they may be ‘tuned’ • might be measured (but model structure may not be consistent with real value) • Often (maybe always) models are implicitly fitted • comparisons are made with the real world • model changes • development iterates • But normal rules of business apply – greater complexity may not equal greater realism

Overparameterisation and mechanistic models? • Mechanistic models have the potential for over-parameterisation • This is a source of uncertainty • Perhaps of equal importance • Mechanistic models are cited as tools to test understanding • For this we do need to know whether a model’s formulation really can be supported by observations

Model Reduction – Why? • Judgements have to be made about the level of complexity in a model • Mostly this is subjective • supported by tools such as sensitivity analysis etc • sometimes by consideration of alternative model formulations • but this is generally difficult…. • Want some statistical support for model selection • But that requires a set of alternative models to compare

Model Reduction – Why? • Want to assess whether a model has the most appropriate level of detail • We can get some idea on this by comparing models of the same system which have different levels of detail • But, we don’t have lots of different models • So we consider reducing the model we have to create alternative model formulations which can be compared • Comparison is restricted • reduced models are drawn from the same source • but hopefully better than no comparison at all

Some illustrations

Example: radioacaesium uptake

‘Absalom’ radiocaesium uptake model % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil

Model Reduction – How? • Start with a ‘full’ model of a system • Reduce it (systematically, automatically) • Producing a ‘set’ of alternative model formulations for the same system • Reduction? • Inter-connected nature of typical models mean can’t simply leave things out • We replace variables with a constant • e.g. the mean or median that the variable attains in the full model

Reduction by variable replacement • Suppose the full model has a variable Vi • Which is a function of the model variables, inputs, and parameters • Arrange so that the model has • Where  is either 0 (no replacement) or 1 (replaced) • Systematically reducing the model then involves setting the values of the vector 

Search the model replacement space • Search the combination of possible models (i.e. varying ) • exhaustive if computationally feasible • stochastic search (e.g. MCMC) can be more efficient • discrete space • Calculate a posterior probability for each model configuration • This is based on comparison with observations • So this is a data-led approach • Data has to be representative for analysis to be representative • Certainly need to be aware of the data’s limitations

10 candidate variables – 210 combinations % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil

Posterior probability for top 10 models Full Model

Full model (as published) % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil

this one works better… % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil

Better? • Reduced uncertainty, better predicting models • Tested with independent data • Requiring fewer inputs • Specifically soil pH • Model is used spatially, so this is a good thing operationally • Greater confidence that the model is not over-parameterised….? • Diagnostic to guide/support model development

Methane Emissions from Wetlands

Methane emission from wetlands

Top ‘performing’ models Full Model

Mechanistic Interpretation • Model diagnostics – present results as probability of replacing specific variables • Summing over the model combinations explored • seasonal transfer of plant matter to soil organic matter (100%) • reducing methane oxidation from mechalis-menten to first order (39%) • temperature dependency of methane oxidation (15%) • seasonal dependency of plant growth (99%).

C & N dynamics in the arctic • N15 applied to field plots • 3 years subsequent measurement • used to parameterise a model based on MBL-GEM

C & N dynamics in the arctic

14 candidate variables (16k combinations)

Ensemble prediction over set of models

Sirius – Wheat Crop Growth Model

Roots Vernalisation EVAPO CANOPY SOIL.AW[1..30] TempMax TempMin exw[1..30] Rootlen Potlfno TTOPT TTFIX TTOUT SLOSL ECUM TTOUT Amnlfno Soilmax aw[1..30 AVWATER WATCON SHOUT CanTemp PrimordNo Vprog TTROOT SoilMin Def Leafnumber HSLOP(Tmean) TTCAN LAIStage PHENO TTOUT DrFacLAI Rootlength Areamax PTSOIL RN Rad PTAY WINTER Leafarea Alpha Reduc CROPUP PENMAN TAU GAKLIR Areaopt Lastleaf FleafNo SF Anthesis PotTrans Deadleaves Maxgaklir leafarea IPHASE Anthesis TTOUT Transp AreaANTH BIOANTH EVSOIL GRAIN Therm Biomass Reduc EVAP ENAV Grainstart CropN CONDUC Hcrop BiomassBGF BiomassBGF Demand Hsoil Grainend BIOANTH TCMax TTOUT TCmin Biomass TDeepsoil DrFacLai Inputs SF DrFacgrowth Soilmin GrainDM Temp Min Temp Max GrainNoverGrainDm tadj Tmean Soilmax LeafNc TTOUT TDeepsoil MinGrainDemand Temp Max Temp Max Water balance Rain GrainDemand UptakeN AvN VARIETY Si Soil leafarea Radiation AreaANTH leafarea Dry matter SoilN DeadLeafN GrainN PoolN Lastleaf EarWt Therm RUE ExStemNc StemNc ExLeafNc Anthesis SF Biomass Tau PAR drfacgrowth rad CropN Equilibrium leafarea uw[1..30] exw[1..30] aw[1..30] Naw[1..30] Nex[1..30] biomass dn stembiomass MaxStemdemand AvN Rootlen Naw[1..30] Nuw[1..30] Nex[1..30] Soil moisture SF Percolation StemN N’Pulses anforP Wateruptake Rain+Irr exw[1] SumP Fert minstemDemand UptakeN Rootlen Norganic Tmean aw[1] Q Nf Q exw[1..30] StemNc AW[1..30] Waterbalance.Si Ta Nm Ts Ni aw[1..30] Ndp uw[1] CropN si Naw[1] Qwp LeafNc WP(Pottrans) RZexw uw[1..8] Soil evap aw[1..30] Qr x[1..8] WP (Rain +irrigation) x Leafdemand StemExN si aw[1..8] Qwp Qfc Q EVAPO.evsoil exw[1..30] x AW[1] Qfc TransP EVAPO.pottrans leafarea leafarea Nuw[1..8] Naw[1..8] aw[1..5] SOIL.EXW[1..30] exw[1..5] evsoil Nex[1..30] SF Results

Top 75% of the distribution/ensemble

Interpretation.... Vernalisation 99% N-mineralisation Temp/moisture adjustment~99% Crop N physiology 50% Canopy Temperature 96% Nitrogen Leaching 50% Diurnal adjustment of thermal time 99% 40 other variables <<50%

Other applications… • Applying the same/similar approach to other models, e.g. • Marine Ecology Models (carbon cycling) • JULES – land surface scheme for GCM • FARM-ADAPT – very large farm management model • Linear programming based model • V. large number of variables (10 000s) • More challenging

Some conclusions • Whether a model’s formulation is appropriate is uncertain • Model reduction provides a way to test (challenge) model formulation (perhaps a brutal test) • May give us some insurance against over-parameterisation (reduce uncertainty) • Does test the understanding built into our models • All models investigated so far can be reduced by variable replacement • Most reduced models have some advantage

Useful references? • NMJ Crout, D Tarsitano, AT Wood. Is my model too complex? Evaluating model formulation using model reduction.Environmental Modelling & Software, in press. • JM Gibbons, GM Cox, ATA Wood, J Craigon, SJ Ramsden, D Tarsitano, NMJ Crout (2008). Applying Bayesian Model Averaging to mechanistic models: an example and comparison of methods. Environmental Modelling & Software, 23:973-985. • Cox GM, Gibbons JM, Wood ATA, Craigon J, Ramsden SJ, Crout NMJ (2006). Towards the systematic simplification of mechanistic models. Ecological Modelling 198:240-246. • Bernhardt, K. 2008. Finding alternatives and reduced formulations for process based models. Evolutionary Computation 16:1-16. Alternative approach. • Asgharbeygi, N., Langley, P., Bay, S., Arrigo, 2006. Inductive revision of quantitative process models. Ecol. Mod., 194:70-79. Alternative approach to model reduction, albeit without a statistical framework • Brooks RJ, Tobias AM (1996) Choosing the best model: level of detail, complexity, & model... Mathl. Comput. Mod. 24:1-14. Interesting article on model utility • Anderson, T.R., 2005. Plankton functional type modelling: running before we can walk? J. Plankton Research 27:1073-1081. Nice discussion of model complexity in context of marine systems. Quite controversial, replies to the replies to the replies. • Jakeman, A.J, Letcher, R.A., Norton, J.P., 2006 Ten iterative steps in development and evaluation of environmental models. Environmental Modelling & Software 21, 602-614. Very nice discussion on what is good practice in model development • Myung, J., Pitt, M. A., 2002. When a good fit can be bad. Trends. Cogn. Sci., 6:421-425. Good introduction to model selection criteria • Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretical Approach. 2nd ed. New York: Springer-Verlag. Ultimate reference, but not without critics.

Over-parameterisation and model reduction

Over-parameterisation and model reduction

Presentation Transcript

Course on Model Order Reduction

Model Order Reduction

Introduction to Model Order Reduction

Decision model for behavior reduction

Introduction to Model Order Reduction

Course on Model Order Reduction

Model Order Reduction using POD

Data quality and model parameterisation

Model Reduction for Parameter Estimation

Parameterisation of Urban Sprawl

Parameterisation of particle fluxes

FLake - A Lake Parameterisation Scheme for the COSMO Model: Implementation and Testing

Scalable Symbolic Model Order Reduction

Model Order Reduction

Introduction to Model Order Reduction

Introduction to Model Order Reduction

Introduction to Model Order Reduction

Introduction to Model Order Reduction

Status of shower parameterisation

On Model Parameterisation

Data quality and model parameterisation

Model Pressure Reduction