Process-based modelling of vegetation and uncertainty quantification
Marcel van Oijen (CEH-Edinburgh)
Statistics for Environmental Evaluation, Glasgow, 2010-09-01
Contents • Process-based modelling • The Bayesian approach • Bayesian Calibration (BC) of process-based models • Bayesian Model Comparison (BMC) • Examples of BC & BMC in other sciences • The future of BC & BMC? • References, Summary, Discussion
1.1 Ecosystem PBMs simulate biogeochemistry [Diagram: fluxes of C, N and H2O between atmosphere, tree, soil and subsoil compartments]
1.2 I/O of PBMs [Diagram. Inputs: atmospheric drivers, management & land use, parameters & initial constants for vegetation and soil. Model output: simulated time series of plant and soil variables]
1.3 I/O of empirical models: Y = P1 + P2 * t. Two parameters: P1 = intercept, P2 = slope.
1.4 Environmental evaluation: increasing use of PBMs [Map panels: C-sequestration (model output for 1920-2000) and uncertainty of C-sequestration]
1.5 Forest models and uncertainty [Model figure from Levy et al., 2004]
1.6 Forest models and uncertainty [Figure: NdepUE (kg C kg-1 N) simulated by the BGC, CENTURY and HYBRID models; Levy et al., 2004]
1.7 Many models! Status: 680 models (21.05.10), http://ecobas.org/www-server/index.html
Free-text search for the words "soil, carbon" (logical operator: and) returns 96 models, e.g.:
• ANIMO: Agricultural NItrogen MOdel
• BETHY: Biosphere Energy-Transfer Hydrology scheme
• BIOMASS: Forest canopy carbon and water balance model
• BIOME-BGC: Biome model - BioGeochemical Cycles
• BIOME3: Biome model
• BLUEGRAMA: BLUE GRAMA
• CANDY: Carbon and Nitrogen Dynamics in soils
• CARBON: Wageningen Carbon Cycle Model
• CARBON_IN_SOILS: TURNOVER OF CARBON IN SOIL
• CARDYN: CARbon DYNamics
• CASA: Carnegie-Ames-Stanford Approach (CASA) Biosphere model
• CENTURY: grassland and agroecosystem dynamics model
• CERES_CANOLA: CERES-Canola 3.0
• CHEMRANK: Interactive Model for Ranking the Potential of Organic Chemicals to Contaminate Groundwater
• COUPMODEL: Coupled heat and mass transfer model for soil-plant-atmosphere system
(…)
1.8 Reality check! In every study using systems analysis and simulation, model parameters, inputs and structure are uncertain. How reliable are these model studies? • Sufficient data for model parameterization? • Sufficient data for model input? • Would other models have given different results? How to deal with uncertainties optimally?
Probability Theory. Uncertainties are everywhere: models (environmental inputs, parameters, structure) and data. Uncertainties can be expressed as probability distributions (pdf’s). We need methods that: • Quantify all uncertainties • Show how to reduce them • Efficiently transfer information: data → models → model application. Calculating with uncertainties (pdf’s) = Probability Theory
The Bayesian approach: reasoning using probability theory
2.1 Dealing with uncertainty: Medical diagnostics
A flu epidemic occurs: one percent of people are ill, so P(dis) = 0.01.
A diagnostic test is 99% reliable: P(pos|dis) = 0.99, P(pos|hlth) = 0.01.
Your test result is positive (bad news!). What is P(diseased|test positive)? 0.50, 0.98 or 0.99?
Bayes’ Theorem: P(dis|pos) = P(pos|dis) P(dis) / P(pos)
= P(pos|dis) P(dis) / [ P(pos|dis) P(dis) + P(pos|hlth) P(hlth) ]
= (0.99 × 0.01) / (0.99 × 0.01 + 0.01 × 0.99) = 0.50
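A minimal R sketch of this calculation, using only the numbers on the slide above:

# Bayes' Theorem applied to the diagnostic-test example
p_dis  <- 0.01                      # prior probability of disease
p_hlth <- 1 - p_dis                 # prior probability of being healthy
p_pos_given_dis  <- 0.99            # test sensitivity
p_pos_given_hlth <- 0.01            # false-positive rate
p_pos <- p_pos_given_dis*p_dis + p_pos_given_hlth*p_hlth   # evidence P(pos)
p_dis_given_pos <- p_pos_given_dis * p_dis / p_pos         # posterior
print(p_dis_given_pos)              # 0.5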
2.2 Bayesian updating of probabilities Bayes’ Theorem: Prior probability → Posterior prob. Medical diagnostics: P(disease) → P(disease|test result) Model parameterization: P(params) → P(params|data) Model selection: P(models) → P(model|data) SPAM-killer: P(SPAM) → P(SPAM|E-mail header) Weather forecasting: … Climate change prediction: … Oil field discovery: … GHG-emission estimation: … Jurisprudence: … …
2.3 What and why? • We want to use data and models to explain and predict ecosystem behaviour • Data as well as model inputs, parameters and outputs are uncertain • No prediction is complete without quantifying the uncertainty. No explanation is complete without analysing the uncertainty • Uncertainties can be expressed as probability density functions (pdf’s) • Probability theory tells us how to work with pdf’s: Bayes’ Theorem (BT) tells us how a pdf changes when new information arrives • BT: Prior pdf → Posterior pdf • BT: Posterior = Prior x Likelihood / Evidence • BT: P(θ|D) = P(θ) P(D|θ) / P(D) • BT: P(θ|D) ∝ P(θ) P(D|θ) (see the R sketch below)
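A minimal, hypothetical illustration of the prior-to-posterior update for a single parameter θ, using a grid approximation (not part of the original slides; the prior, the observation D = 6.2 and the measurement sd are invented for illustration):

# Grid approximation of P(theta|D) ∝ P(theta) P(D|theta) for one parameter
theta <- seq(0, 10, by = 0.01)                  # grid of parameter values
prior <- dnorm(theta, mean = 5, sd = 2)         # prior pdf (assumed)
D     <- 6.2                                    # one observation (hypothetical)
likelihood <- dnorm(D, mean = theta, sd = 1)    # P(D|theta), measurement sd = 1
posterior  <- prior * likelihood
posterior  <- posterior / sum(posterior * 0.01) # normalise so it integrates to 1
# The posterior is narrower than the prior and shifted towards the data
plot(theta, posterior, type = "l"); lines(theta, prior, lty = 2)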
Bayesian updating of probabilities for process-based models Bayes’ Theorem: Prior probability → Posterior prob. Model parameterization: P(params) → P(params|data) Model selection: P(models) → P(model|data)
3.1 Process-based forest models [Diagram. Inputs: environmental scenarios, initial values, parameters. Model outputs: height, NPP, soil C]
3.2 Process-based forest model BASFOR: 40+ parameters, 12+ output variables
3.3 BASFOR: outputs [Plots: carbon in trees (standing + thinned), volume (standing), carbon in soil]
3.5 BASFOR: prior output uncertainty [Plots: carbon in trees (standing + thinned), volume (standing), carbon in soil]
3.6 Data from Dodd Wood (R. Matthews, Forest Research) [Plots: carbon in trees (standing + thinned), volume (standing), carbon in soil]
3.7 Using data in Bayesian calibration of BASFOR: prior pdf + data → Bayesian calibration → posterior pdf
3.8 Bayesian calibration: posterior uncertainty [Plots: carbon in trees (standing + thinned), volume (standing), carbon in soil]
3.9 Calculating the posterior using MCMC (Metropolis et al., 1953): P(θ|D) ∝ P(θ) P(D|f(θ))
1. Start anywhere in parameter space: p1..39(i=0), the vector of all 39 parameters
2. Randomly choose p(i+1) = p(i) + δ
3. IF [ P(p(i+1)) P(D|f(p(i+1))) ] / [ P(p(i)) P(D|f(p(i))) ] > Random[0,1] THEN accept p(i+1) ELSE reject p(i+1); i = i+1
4. IF i < 10^4 GOTO 2
This yields a sample of 10^4–10^5 parameter vectors from the posterior distribution P(θ|D) for the parameters (MCMC trace plots shown on the slide; see the R code under "MCMC in R" below).
3.10 BC using MCMC: an example in Excel (BC_MCMC1.xls)
MCMC in R

# Metropolis sampling of the posterior for a 2-parameter linear model (intercept, slope)
# install.packages("mvtnorm")   # run once if the package is not yet installed
require(mvtnorm)

chainLength <- 10000

# Data: three observations, columns = time, measured value, measurement sd
data <- matrix(c(10,  6.09, 1.83,
                 20,  8.81, 2.64,
                 30, 10.66, 3.27), nrow=3, ncol=3, byrow=TRUE)

# Parameters: rows = (intercept, slope), columns = (minimum, initial value, maximum)
param   <- matrix(c(0, 5,   10,
                    0, 0.5,  1), nrow=2, ncol=3, byrow=TRUE)
pMinima <- c(param[1,1], param[2,1])
pMaxima <- c(param[1,3], param[2,3])
pValues <- c(param[1,2], param[2,2])               # start the chain at the initial values

vcovProposal <- diag((0.05*(pMaxima-pMinima))^2)   # proposal covariance matrix
pChain <- matrix(0, nrow=chainLength, ncol=length(pValues)+1)  # parameter values + log-likelihood
logli  <- matrix(NA, nrow=3, ncol=1)

# Uniform priors for both parameters
logPrior0 <- sum(log(dunif(pValues, min=pMinima, max=pMaxima)))

# The model: a straight line y = intercept + slope * t
model <- function(times, intercept, slope) { intercept + slope*times }

# Gaussian log-likelihood of the three data points for the current parameters
for (i in 1:3) { logli[i] <- -0.5*((model(data[i,1],pValues[1],pValues[2]) -
                                    data[i,2])/data[i,3])^2 - log(data[i,3]) }
logL0 <- sum(logli)
pChain[1,] <- c(pValues, logL0)                    # keep first values

for (c in 2:chainLength) {
  # Propose a candidate parameter vector
  candidatepValues <- rmvnorm(n=1, mean=pValues, sigma=vcovProposal)
  if (all(candidatepValues > pMinima) && all(candidatepValues < pMaxima)) {
    Prior1 <- prod(dunif(candidatepValues, pMinima, pMaxima))
  } else { Prior1 <- 0 }
  if (Prior1 > 0) {
    for (i in 1:3) { logli[i] <- -0.5*((model(data[i,1],candidatepValues[1],candidatepValues[2]) -
                                        data[i,2])/data[i,3])^2 - log(data[i,3]) }
    logL1 <- sum(logli)
    # Metropolis acceptance test on the log scale
    logalpha <- (log(Prior1)+logL1) - (logPrior0+logL0)
    if (log(runif(1, min=0, max=1)) < logalpha) {
      pValues   <- candidatepValues
      logPrior0 <- log(Prior1)
      logL0     <- logL1
    }
  }
  pChain[c,1:2] <- pValues
  pChain[c,3]   <- logL0
}

# Acceptance rate and posterior summaries
nAccepted  <- length(unique(pChain[,1]))
acceptance <- paste(nAccepted, "out of ", chainLength, "candidates accepted ( = ",
                    round(100*nAccepted/chainLength), "%)")
print(acceptance)
mp <- apply(pChain, 2, mean)       # posterior means (and mean log-likelihood)
print(mp)
pCovMatrix <- cov(pChain)          # posterior covariance matrix
print(pCovMatrix)
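As a follow-up (not in the original code), the chain stored in pChain can be inspected with simple trace plots and histograms, matching the "MCMC trace plots" mentioned under 3.9; the burn-in of 1000 iterations used here is an arbitrary choice.

# Trace plots and marginal posterior histograms for the two parameters (sketch)
burnin <- 1000                         # discard early, non-converged samples (arbitrary choice)
keep   <- (burnin+1):chainLength
par(mfrow=c(2,2))
plot(pChain[keep,1], type="l", ylab="intercept", xlab="iteration")   # trace plot
plot(pChain[keep,2], type="l", ylab="slope",     xlab="iteration")   # trace plot
hist(pChain[keep,1], main="posterior intercept", xlab="intercept")
hist(pChain[keep,2], main="posterior slope",     xlab="slope")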
3.12 Using data in Bayesian calibration of BASFOR: prior pdf + data → Bayesian calibration → posterior pdf
3.13 Parameter correlations [Matrix plot: posterior correlations among the 39 parameters]
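A sketch of how such a correlation matrix is obtained from an MCMC sample, reusing the 2-parameter pChain from the R example above (for BASFOR the matrix would be 39 × 39):

# Posterior parameter correlation matrix from an MCMC sample (sketch)
postSample <- pChain[1001:chainLength, 1:2]            # columns = parameters, first 1000 discarded as burn-in
corMatrix  <- cor(postSample)
print(round(corMatrix, 2))
image(corMatrix, main="Parameter correlation matrix")  # simple heat-map style display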
3.14 Continued calibration when new data become available: the posterior pdf from the previous calibration becomes the prior pdf; prior pdf + new data → Bayesian calibration → new posterior pdf
3.15 Bayesian projects at CEH-Edinburgh
• Uncertainty in earth system resilience (Clare Britton & David Cameron)
• Parameterization and uncertainty quantification of the 3-PG model of forest growth & C-stock (Genevieve Patenaude, Ronnie Milne, M. van Oijen)
• Selection of forest models (NitroEurope team)
• Data assimilation of forest EC data (David Cameron, Mat Williams)
• Risk of frost damage in grassland (Stig Morten Thorsen, Anne-Grete Roer, MvO)
• Uncertainty in agricultural soil models (Lehuger, Reinds, MvO)
• Uncertainty in UK C-sequestration (MvO, Jonty Rougier, Ron Smith, Tommy Brown, Amanda Thomson)
3.16 BASFOR: forest C-sequestration 1920-2000 [Map panels: soil N-content, C-sequestration, uncertainty of C-sequestration] • Uncertainty due to model parameters only, NOT uncertainty in inputs / upscaling
3.18 What kind of measurements would have reduced uncertainty the most?
3.19 Prior predictive uncertainty & height data [Plots of biomass and height: prior predictive uncertainty, with height data from Skogaby]
3.20 Prior & posterior uncertainty: use of height data [Plots of biomass and height: prior predictive uncertainty, Skogaby height data, and posterior uncertainty after using the height data]
3.20 Prior & posterior uncertainty: use of height data [Plots of biomass and height: prior predictive uncertainty, hypothetical height data, and posterior uncertainty after using the height data]
3.20 Prior & posterior uncertainty: use of height data [Plots of biomass and height: prior predictive uncertainty, posterior uncertainty using height data, and posterior uncertainty using high-precision height data]
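The effect of data precision on the posterior can be illustrated with a minimal grid-approximation sketch (hypothetical numbers, not from the slides): the same observation with a smaller measurement sd gives a narrower posterior.

# Sketch: how measurement precision affects posterior width (hypothetical numbers)
theta <- seq(0, 10, by = 0.01)
prior <- dnorm(theta, mean = 5, sd = 2)
D     <- 6.2
post  <- function(sdData) {                 # posterior for a given measurement sd
  p <- prior * dnorm(D, mean = theta, sd = sdData)
  p / sum(p * 0.01)
}
postCoarse  <- post(2.0)                    # imprecise measurement
postPrecise <- post(0.5)                    # precise measurement -> narrower posterior
plot(theta, postPrecise, type="l", ylab="density")
lines(theta, postCoarse, lty=2); lines(theta, prior, lty=3)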
3.22 Summary: BC vs tuning
Model tuning: 1. Define parameter ranges (permitted values). 2. Select the parameter values that give model output closest (r2, RMSE, …) to the data. 3. Do the model study with the tuned parameters (i.e. no model output uncertainty).
Bayesian calibration: 1. Define parameter pdf’s. 2. Define data pdf’s (probable measurement errors). 3. Use Bayes’ Theorem to calculate the posterior parameter pdf. 4. Do all future model runs with samples from the posterior parameter pdf (i.e. quantify uncertainty of model results; see the sketch below).
BC can use data to reduce parameter uncertainty for any process-based model.
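A sketch of step 4 of Bayesian calibration, reusing pChain and model from the R example above: posterior parameter samples are run through the model to give a credible interval on the output. The time grid, sample size and burn-in are arbitrary choices for illustration.

# Propagating posterior parameter uncertainty to model output (sketch)
times      <- seq(0, 40, by = 1)
sampleRows <- sample(1001:chainLength, 500)            # 500 posterior parameter vectors (after burn-in)
predictions <- sapply(sampleRows, function(r) model(times, pChain[r,1], pChain[r,2]))
# 90% credible interval of the model output at each time
ciLow  <- apply(predictions, 1, quantile, probs = 0.05)
ciHigh <- apply(predictions, 1, quantile, probs = 0.95)
plot(times, apply(predictions, 1, median), type="l", ylab="model output")
lines(times, ciLow, lty=2); lines(times, ciHigh, lty=2)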
4.1 Multiple models → structural uncertainty [Figure: NdepUE (kg C kg-1 N) simulated by the BGC, CENTURY and HYBRID models; Levy et al., 2004]
4.2 Bayesian comparison of two models (Model 1 vs Model 2). Bayes’ Theorem for model probabilities: P(M|D) = P(M) P(D|M) / P(D). The “integrated likelihood” P(D|Mi) can be approximated from the MCMC sample of outputs for model Mi (*). With equal prior model probabilities, P(M1) = P(M2) = ½, this gives P(M2|D) / P(M1|D) = P(D|M2) / P(D|M1). The “Bayes Factor” P(D|M2) / P(D|M1) quantifies how the data D change the odds of M2 over M1. (*) Harmonic mean of the likelihoods in the MCMC sample (Kass & Raftery, 1995); a sketch of this estimator follows below.
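A sketch of the harmonic-mean approximation named above, assuming that vectors of log-likelihoods (here called logLik1 and logLik2, hypothetical names) were stored during the MCMC runs of the two models, e.g. logLik1 <- pChain[1001:chainLength, 3] from the example above. The harmonic-mean estimator is known to be noisy, so this is only an illustration of the slide's approximation.

# Integrated likelihood P(D|M) via the harmonic mean of MCMC likelihoods (Kass & Raftery, 1995)
harmonicMeanLogLik <- function(logLik) {
  # log of [ mean( 1 / L_i ) ]^-1, computed stably on the log scale
  n <- length(logLik)
  m <- max(-logLik)                       # for numerical stability of log-sum-exp
  -(m + log(sum(exp(-logLik - m))) - log(n))
}
# logBayesFactor21 <- harmonicMeanLogLik(logLik2) - harmonicMeanLogLik(logLik1)
# exp(logBayesFactor21)   # Bayes Factor: odds of model 2 over model 1 given the data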