360 likes | 436 Views
Over-parameterisation and model reduction. Neil Crout Environmental Science School of Biosciences University of Nottingham. Co-workers. Davide Tarsitano Glen Cox James Gibbons Andy Wood Jim Craigon Steve Ramsden Yan Jiao Tim Reid with thanks to BBSRC and Leverhulme. Motivation.
E N D
Over-parameterisation and model reduction Neil Crout Environmental Science School of Biosciences University of Nottingham
Co-workers • Davide Tarsitano • Glen Cox • James Gibbons • Andy Wood • Jim Craigon • Steve Ramsden • Yan Jiao • Tim Reid • with thanks to BBSRC and Leverhulme
Over-parameterisation • Increase the complexity of the model to improve agreement with observations • Can continue this until ALL the variation in the data is accounted for • Including the noise! • Models can become overparameterised
Over-parameterisation • Judgements have to be made about the level of complexity in a model (parsimony) • In a statistical context methods for model selection exist • Key feature - comparing models • Key feature - comparing models
So what! • Our models are not statistical • Mechanistic • No fitting involved • Predicting lots of different things (multi-variate) • Have to account for lots of different cases • Did we mention they’re mechanistic • Newton • Thermodynamics • etc all on our side
Overparameterisation and mechanistic models? • Mechanistically inspired – but empirically implemented • So normal statistical rules of business apply • Model parameters are often not formally fitted • worse they may be ‘tuned’ • might be measured (but model structure may not be consistent with real value) • Often (maybe always) models are implicitly fitted • comparisons are made with the real world • model changes • development iterates • But normal rules of business apply – greater complexity may not equal greater realism
Overparameterisation and mechanistic models? • Mechanistic models have the potential for over-parameterisation • This is a source of uncertainty • Perhaps of equal importance • Mechanistic models are cited as tools to test understanding • For this we do need to know whether a model’s formulation really can be supported by observations
Model Reduction – Why? • Judgements have to be made about the level of complexity in a model • Mostly this is subjective • supported by tools such as sensitivity analysis etc • sometimes by consideration of alternative model formulations • but this is generally difficult…. • Want some statistical support for model selection • But that requires a set of alternative models to compare
Model Reduction – Why? • Want to assess whether a model has the most appropriate level of detail • We can get some idea on this by comparing models of the same system which have different levels of detail • But, we don’t have lots of different models • So we consider reducing the model we have to create alternative model formulations which can be compared • Comparison is restricted • reduced models are drawn from the same source • but hopefully better than no comparison at all
‘Absalom’ radiocaesium uptake model % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil
Model Reduction – How? • Start with a ‘full’ model of a system • Reduce it (systematically, automatically) • Producing a ‘set’ of alternative model formulations for the same system • Reduction? • Inter-connected nature of typical models mean can’t simply leave things out • We replace variables with a constant • e.g. the mean or median that the variable attains in the full model
Reduction by variable replacement • Suppose the full model has a variable Vi • Which is a function of the model variables, inputs, and parameters • Arrange so that the model has • Where is either 0 (no replacement) or 1 (replaced) • Systematically reducing the model then involves setting the values of the vector
Search the model replacement space • Search the combination of possible models (i.e. varying ) • exhaustive if computationally feasible • stochastic search (e.g. MCMC) can be more efficient • discrete space • Calculate a posterior probability for each model configuration • This is based on comparison with observations • So this is a data-led approach • Data has to be representative for analysis to be representative • Certainly need to be aware of the data’s limitations
10 candidate variables – 210 combinations % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil
Posterior probability for top 10 models Full Model
Full model (as published) % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil
this one works better… % clay K+ pH % C θclay RIPclay(2) θhumus Kx soil CEChumus M_camg CECclay Kx humus(2) Kdhumus NH4 Kdclay(1) mK(1) Kdr Kdl CF(2) D factor Cssol Csplant Cssoil
Better? • Reduced uncertainty, better predicting models • Tested with independent data • Requiring fewer inputs • Specifically soil pH • Model is used spatially, so this is a good thing operationally • Greater confidence that the model is not over-parameterised….? • Diagnostic to guide/support model development
Top ‘performing’ models Full Model
Mechanistic Interpretation • Model diagnostics – present results as probability of replacing specific variables • Summing over the model combinations explored • seasonal transfer of plant matter to soil organic matter (100%) • reducing methane oxidation from mechalis-menten to first order (39%) • temperature dependency of methane oxidation (15%) • seasonal dependency of plant growth (99%).
C & N dynamics in the arctic • N15 applied to field plots • 3 years subsequent measurement • used to parameterise a model based on MBL-GEM
Roots Vernalisation EVAPO CANOPY SOIL.AW[1..30] TempMax TempMin exw[1..30] Rootlen Potlfno TTOPT TTFIX TTOUT SLOSL ECUM TTOUT Amnlfno Soilmax aw[1..30 AVWATER WATCON SHOUT CanTemp PrimordNo Vprog TTROOT SoilMin Def Leafnumber HSLOP(Tmean) TTCAN LAIStage PHENO TTOUT DrFacLAI Rootlength Areamax PTSOIL RN Rad PTAY WINTER Leafarea Alpha Reduc CROPUP PENMAN TAU GAKLIR Areaopt Lastleaf FleafNo SF Anthesis PotTrans Deadleaves Maxgaklir leafarea IPHASE Anthesis TTOUT Transp AreaANTH BIOANTH EVSOIL GRAIN Therm Biomass Reduc EVAP ENAV Grainstart CropN CONDUC Hcrop BiomassBGF BiomassBGF Demand Hsoil Grainend BIOANTH TCMax TTOUT TCmin Biomass TDeepsoil DrFacLai Inputs SF DrFacgrowth Soilmin GrainDM Temp Min Temp Max GrainNoverGrainDm tadj Tmean Soilmax LeafNc TTOUT TDeepsoil MinGrainDemand Temp Max Temp Max Water balance Rain GrainDemand UptakeN AvN VARIETY Si Soil leafarea Radiation AreaANTH leafarea Dry matter SoilN DeadLeafN GrainN PoolN Lastleaf EarWt Therm RUE ExStemNc StemNc ExLeafNc Anthesis SF Biomass Tau PAR drfacgrowth rad CropN Equilibrium leafarea uw[1..30] exw[1..30] aw[1..30] Naw[1..30] Nex[1..30] biomass dn stembiomass MaxStemdemand AvN Rootlen Naw[1..30] Nuw[1..30] Nex[1..30] Soil moisture SF Percolation StemN N’Pulses anforP Wateruptake Rain+Irr exw[1] SumP Fert minstemDemand UptakeN Rootlen Norganic Tmean aw[1] Q Nf Q exw[1..30] StemNc AW[1..30] Waterbalance.Si Ta Nm Ts Ni aw[1..30] Ndp uw[1] CropN si Naw[1] Qwp LeafNc WP(Pottrans) RZexw uw[1..8] Soil evap aw[1..30] Qr x[1..8] WP (Rain +irrigation) x Leafdemand StemExN si aw[1..8] Qwp Qfc Q EVAPO.evsoil exw[1..30] x AW[1] Qfc TransP EVAPO.pottrans leafarea leafarea Nuw[1..8] Naw[1..8] aw[1..5] SOIL.EXW[1..30] exw[1..5] evsoil Nex[1..30] SF Results
Interpretation.... Vernalisation 99% N-mineralisation Temp/moisture adjustment~99% Crop N physiology 50% Canopy Temperature 96% Nitrogen Leaching 50% Diurnal adjustment of thermal time 99% 40 other variables <<50%
Other applications… • Applying the same/similar approach to other models, e.g. • Marine Ecology Models (carbon cycling) • JULES – land surface scheme for GCM • FARM-ADAPT – very large farm management model • Linear programming based model • V. large number of variables (10 000s) • More challenging
Some conclusions • Whether a model’s formulation is appropriate is uncertain • Model reduction provides a way to test (challenge) model formulation (perhaps a brutal test) • May give us some insurance against over-parameterisation (reduce uncertainty) • Does test the understanding built into our models • All models investigated so far can be reduced by variable replacement • Most reduced models have some advantage
Useful references? • NMJ Crout, D Tarsitano, AT Wood. Is my model too complex? Evaluating model formulation using model reduction.Environmental Modelling & Software, in press. • JM Gibbons, GM Cox, ATA Wood, J Craigon, SJ Ramsden, D Tarsitano, NMJ Crout (2008). Applying Bayesian Model Averaging to mechanistic models: an example and comparison of methods. Environmental Modelling & Software, 23:973-985. • Cox GM, Gibbons JM, Wood ATA, Craigon J, Ramsden SJ, Crout NMJ (2006). Towards the systematic simplification of mechanistic models. Ecological Modelling 198:240-246. • Bernhardt, K. 2008. Finding alternatives and reduced formulations for process based models. Evolutionary Computation 16:1-16. Alternative approach. • Asgharbeygi, N., Langley, P., Bay, S., Arrigo, 2006. Inductive revision of quantitative process models. Ecol. Mod., 194:70-79. Alternative approach to model reduction, albeit without a statistical framework • Brooks RJ, Tobias AM (1996) Choosing the best model: level of detail, complexity, & model... Mathl. Comput. Mod. 24:1-14. Interesting article on model utility • Anderson, T.R., 2005. Plankton functional type modelling: running before we can walk? J. Plankton Research 27:1073-1081. Nice discussion of model complexity in context of marine systems. Quite controversial, replies to the replies to the replies. • Jakeman, A.J, Letcher, R.A., Norton, J.P., 2006 Ten iterative steps in development and evaluation of environmental models. Environmental Modelling & Software 21, 602-614. Very nice discussion on what is good practice in model development • Myung, J., Pitt, M. A., 2002. When a good fit can be bad. Trends. Cogn. Sci., 6:421-425. Good introduction to model selection criteria • Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretical Approach. 2nd ed. New York: Springer-Verlag. Ultimate reference, but not without critics.