USING BAYESIAN HIERARCHICAL MODELLING TO PRODUCE HIGH RESOLUTION MAPS OF AIR POLLUTION IN THE EU. Gavin Shaddick, University of Bath. RSS Avon Local Group, October 2006.
APMoSPHERE: Air Pollution Modelling for Support to Policy on Health, Environment and Risk Management in Europe. APMoSPHERE is a thematic project funded under the Global Monitoring for Environment and Security initiative, as part of the European Union’s Fifth Research Framework Programme. Its aim is to compile high resolution maps of air pollution across the EU as a basis for scientific research and policy support.
APMoSPHERE Partners • Department of Epidemiology and Public Health, Imperial College • Institute for Risk Assessment Sciences, University of Utrecht • Institute for Environmental Research and Sustainable Development, National Observatory of Athens • Centre for International Climate and Environmental Research, Oslo • Department of Mathematical Sciences, University of Bath • AEA Technology Netcen
What APMoSPHERE will do • Key objectives of APMoSPHERE are: • to produce a detailed (1 km) inventory of atmospheric emissions by major sector for the EU • to develop and test a range of methods for mapping air pollution from these emissions estimates, combined with other routinely available data sets (including air pollution monitoring data) • to use these methods and data sets to generate detailed (1 km), updatable maps of air pollution, together with a set of policy-related indicators of potential ecological and health risks • based on these results, to assess the air pollution situation in the EU and the implications for future air quality monitoring and policy • The pollutants: • Particulates (PM10 and black smoke) • Nitrogen oxides (NOx and NO2) • Carbon monoxide • Sulphur dioxide • Ozone
Geographic Information System • Study area: EU15 + Norway • Concentration data: AIRBASE & EMEP* • 1 km predictors: topography, meteorology, roads*, land cover*, light intensity*, modelled emissions* • Population data: 1 km modelled population*
Aims • Provide modelled exposures (with measures of uncertainty) • Impute missing values and predict at unmeasured locations • Combine information from multiple sources • Investigate the spatio-temporal modelling of pollutants • Assess the contributions of spatial, temporal and random variability
Data dependencies • Relationship with covariates • Climate, e.g. temperature • Local emissions, e.g. land cover • Topography, e.g. altitude • Temporal dependencies. • Spatial dependencies. • Distance between monitoring sites. • Site type (e.g. background, traffic).
Model framework • Bayesian hierarchical model • (Log) pollutant concentrations modelled as a function of the ‘true’ underlying level plus unstructured error • Covariate information incorporated • True underlying level is a function of the previous year’s level • Missing values treated as unknown parameters within the Bayesian framework, so they can be estimated
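The last point, treating missing values as unknown parameters, can be seen in a toy Gibbs sampler. At each iteration every missing observation is drawn from its full conditional, and the model parameters are then updated as if the data were complete. This is a deliberately simplified stand-in, not the APMoSPHERE model. It uses independent normal observations with a known variance and a normal prior on the mean. The function name and data values are hypothetical.

```python
import random
import statistics


def gibbs_with_missing(y, sigma2=1.0, prior_var=100.0, n_iter=5000, seed=1):
    """Gibbs sampler for y_i ~ N(mu, sigma2) with prior mu ~ N(0, prior_var).

    Entries of y given as None are treated as unknown parameters and are
    sampled alongside mu at every iteration (data augmentation).
    """
    rng = random.Random(seed)
    y = list(y)
    missing = [i for i, v in enumerate(y) if v is None]
    for i in missing:
        y[i] = 0.0  # arbitrary starting value for each missing observation
    n = len(y)
    mu = 0.0
    mu_draws, imputed_draws = [], []
    for _ in range(n_iter):
        # Step 1: draw each missing value from its full conditional N(mu, sigma2).
        for i in missing:
            y[i] = rng.gauss(mu, sigma2 ** 0.5)
        # Step 2: draw mu from its conjugate normal full conditional given all y.
        prec = 1.0 / prior_var + n / sigma2
        mean = (sum(y) / sigma2) / prec
        mu = rng.gauss(mean, prec ** -0.5)
        mu_draws.append(mu)
        imputed_draws.append([y[i] for i in missing])
    return mu_draws, imputed_draws


# Hypothetical log-concentrations with two missing years.
y = [1.2, 0.8, None, 1.5, None, 1.1]
mu_draws, imputed = gibbs_with_missing(y)
print(round(statistics.mean(mu_draws[500:]), 2))
```

The posterior for `mu` matches the one obtained from the four observed values alone, because the missing values add no information beyond the model. The imputed draws give a predictive distribution for each gap.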
Priors • Information from previous studies or years • Expert opinion • Physical science • ‘vague’
Posteriors and parameter estimates • In simple cases, e.g. where the prior is conjugate to the likelihood, the posterior distributions can be found in closed form • In more complex cases, the posterior may be analytically intractable • Simulation can then be used to ‘build up’ the posterior • MCMC (WinBUGS)
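The contrast between closed-form and simulated posteriors can be illustrated on a case where both are available. The example is a normal mean with known variance and a normal (conjugate) prior. A random-walk Metropolis sampler, one simple MCMC algorithm, "builds up" the same posterior that the conjugate formula gives exactly. This is only an illustration: WinBUGS uses its own samplers, and the data, prior settings and step size below are hypothetical.

```python
import math
import random
import statistics


def conjugate_posterior(y, sigma2, prior_mean, prior_var):
    """Exact posterior (mean, variance) for a normal mean with known variance."""
    prec = 1.0 / prior_var + len(y) / sigma2
    mean = (prior_mean / prior_var + sum(y) / sigma2) / prec
    return mean, 1.0 / prec


def log_posterior(mu, y, sigma2, prior_mean, prior_var):
    """Unnormalised log posterior: normal prior plus normal log-likelihood."""
    log_prior = -0.5 * (mu - prior_mean) ** 2 / prior_var
    log_lik = -0.5 * sum((yi - mu) ** 2 for yi in y) / sigma2
    return log_prior + log_lik


def metropolis(y, sigma2, prior_mean, prior_var, n_iter=20000, step=0.5, seed=2):
    """Random-walk Metropolis sampler targeting the same posterior."""
    rng = random.Random(seed)
    mu = prior_mean
    lp = log_posterior(mu, y, sigma2, prior_mean, prior_var)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        lp_prop = log_posterior(proposal, y, sigma2, prior_mean, prior_var)
        # Accept with probability min(1, exp(lp_prop - lp)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            mu, lp = proposal, lp_prop
        draws.append(mu)
    return draws


y = [2.1, 1.7, 2.4, 1.9, 2.2]  # hypothetical log-concentrations
exact_mean, exact_var = conjugate_posterior(y, 0.25, 0.0, 10.0)
draws = metropolis(y, 0.25, 0.0, 10.0)[1000:]  # discard burn-in
print(exact_mean, statistics.mean(draws))
print(math.sqrt(exact_var), statistics.stdev(draws))
```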
Model stages • Level 1: Observed data stage • $Y_t = \theta_t + \text{covariates} + \text{site effect} + v_t$, $v_t \sim N(0, \sigma^2_v)$ • Level 2(a): Temporal/system stage • $\theta_t = \alpha\,\theta_{t-1} + w_t$, $w_t \sim N(0, \sigma^2_w)$ • Level 2(b): Spatial stage • Site random effects modelled as multivariate normal, with correlation decaying with the distance $d$ between sites: $f(d) = \exp(-\phi d)$ • Site effects can be estimated at unmeasured locations conditional on the measured values • Level 3: Hyperparameters • Assign prior distributions to covariate effects and variances
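Level 2(b) estimates site effects at unmeasured locations conditional on the measured ones. For a multivariate normal with exponential correlation this is the standard conditional-normal (simple kriging) calculation. The sketch makes three assumptions: the site effects have mean zero and a common variance $\sigma^2$, $\phi$ and $\sigma^2$ are known, and a small hand-written linear solver is enough for a handful of sites. In the full Bayesian model $\phi$ and $\sigma^2$ have priors, so a calculation of this kind would be repeated across MCMC draws. Function names, coordinates and values are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict_site_effect(coords, effects, new_point, phi, sigma2):
    """Conditional mean and variance of a zero-mean site effect at new_point,
    given the effects at the measured sites, under correlation exp(-phi * d)."""
    def cov(p, q):
        return sigma2 * math.exp(-phi * math.dist(p, q))

    big_k = [[cov(p, q) for q in coords] for p in coords]  # site-site covariance
    k = [cov(p, new_point) for p in coords]  # covariance with the new point
    weights = solve(big_k, k)  # K^{-1} k
    mean = sum(w * e for w, e in zip(weights, effects))
    var = sigma2 - sum(w * ki for w, ki in zip(weights, k))
    return mean, var


sites = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
effects = [0.3, -0.1, 0.2]  # hypothetical site effects at measured sites
print(predict_site_effect(sites, effects, (0.5, 0.5), phi=1.0, sigma2=0.5))
```

At a measured site the prediction reproduces the observed effect with zero variance. Far from all sites it falls back to the prior mean (zero) and variance $\sigma^2$.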
Prior information • For the spatial effect • $\phi$ given a Uniform(1.3, 4) prior • Corresponds to correlations of between 0.13 and 0.52 at a distance of 50 km • Normal distributions for covariate effects • Gamma distributions for the inverses of the variances (precisions)
Results • UK data for SO2, 1997-2001
Components of variation • Random (unstructured) error: 26% • Temporal: 13% • Spatial: 61%
Spatial effects • Posterior median for $\phi$: 3.79, 95% CrI (2.95, 4.00)
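The units of $d$ are not stated. However, the prior statement ($\phi$ uniform on 1.3 to 4, giving correlations of 0.13 to 0.52 at 50 km) only works with $f(d) = \exp(-\phi d)$ if $d$ is measured in units of 100 km. This sketch assumes that and translates both the prior range and the posterior summary into correlations at 50 km.

```python
import math


def exp_correlation(d, phi):
    """Exponential correlation function f(d) = exp(-phi * d)."""
    return math.exp(-phi * d)


D_50KM = 0.5  # assumption: distances measured in units of 100 km

prior_bounds = {phi: exp_correlation(D_50KM, phi) for phi in (1.3, 4.0)}
posterior = {
    "phi 2.5%": exp_correlation(D_50KM, 2.95),
    "phi median": exp_correlation(D_50KM, 3.79),
    "phi 97.5%": exp_correlation(D_50KM, 4.00),
}
print(prior_bounds)
for label, rho in posterior.items():
    print(f"{label}: correlation at 50 km = {rho:.3f}")
```

Under this assumption, the posterior median $\phi = 3.79$ corresponds to a correlation of about 0.150 at 50 km. The 95% CrI for $\phi$ maps to correlations of about 0.135 to 0.229.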
Predictions for UK: overall mean + temporal (2001) effect + covariate effect + spatial effect
Extending methodology to EU level • The much larger number of sites brings a large computational burden • The following analysis was performed on NO2 in 2001 • 75% of sites used to build the models • 25% held out for validation
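The deck does not say how the 75% of sites were chosen, so this sketch assumes a simple random split at site level. Splitting by site rather than by individual record keeps each validation site unseen by the model. The function name and site identifiers are hypothetical.

```python
import random


def split_sites(site_ids, train_frac=0.75, seed=0):
    """Randomly split monitoring sites (not individual records) into a
    model-building set and a validation set."""
    ids = sorted(set(site_ids))  # de-duplicate so each site lands in one set
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_frac * len(ids))
    return ids[:n_train], ids[n_train:]


sites = [f"SITE{i:04d}" for i in range(200)]  # hypothetical site identifiers
train, validation = split_sites(sites)
print(len(train), len(validation))
```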
Modelling at different scales • Based on theoretical and empirical environmental models • Variograms • Scales defined by site type and associated covariates • Global (climate and topography) • Rural (transport, population density, agriculture) • Urban (transport, population density, urban greenery) • Eases the computational burden
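Variograms are listed as one basis for choosing the scales. Below is a minimal sketch of the classical (Matheron) empirical semivariogram. It bins pairs of sites by separation distance and averages half the squared difference in concentration within each bin. The deck does not state which estimator or binning was used, so the function name, bin settings and data are illustrative only.

```python
import math
from collections import defaultdict


def empirical_variogram(coords, values, bin_width, max_dist):
    """Classical (Matheron) estimator: gamma(h) = mean of 0.5 * (z_i - z_j)^2
    over site pairs whose separation falls in the distance bin around h.

    Returns a list of (bin centre, semivariance, number of pairs).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            if d >= max_dist:
                continue
            b = int(d // bin_width)  # index of the distance bin
            sums[b] += 0.5 * (values[i] - values[j]) ** 2
            counts[b] += 1
    return [((b + 0.5) * bin_width, sums[b] / counts[b], counts[b]) for b in sorted(counts)]


# Illustrative: four sites on a line with alternating values.
for centre, gamma, npairs in empirical_variogram(
        [(0, 0), (1, 0), (2, 0), (3, 0)], [0.0, 1.0, 0.0, 1.0], bin_width=1.5, max_dist=10.0):
    print(centre, gamma, npairs)
```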
Model stages • Global model • $Y_{Gs} = \mu_G + \text{global covariates}_s + \text{site effects} + v_{Gs}$, $v_{Gs} \sim N(0, \sigma^2_G)$ • Rural model • $Y_{Rs} - \widehat{Y}_{Rs} = \mu_R + \text{rural covariates}_s + v_{Rs}$, $v_{Rs} \sim N(0, \sigma^2_R)$ • Urban model • $Y_{Us} - \widehat{Y}_{Us} = \mu_U + \text{urban covariates}_s + v_{Us}$, $v_{Us} \sim N(0, \sigma^2_U)$, where $\widehat{Y}$ is the prediction from the global model • Predictions were made with the global model for every one of the 1 km × 1 km cells (2,854,116) • The additional rural (2,788,454 cells) and urban (65,662 cells) effects were used to create a further two sets of predictions, which were then combined into a composite map
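The combination step can be sketched as follows. Each cell gets its global-model prediction plus the prediction from whichever residual model (rural or urban) applies to it. Every cell is either rural or urban, since 2,788,454 + 65,662 = 2,854,116. The sketch assumes that everything is on the modelling scale (any back-transformation, e.g. from logs, would come afterwards) and that cell types are known in advance. The function and argument names are hypothetical.

```python
def composite_prediction(global_pred, residual_pred, cell_type):
    """Composite map: global-model prediction plus the rural or urban
    residual-model prediction, chosen by each cell's type.

    global_pred:   {cell_id: prediction from the global model}
    residual_pred: {"rural": {cell_id: ...}, "urban": {cell_id: ...}}
    cell_type:     {cell_id: "rural" or "urban"}
    """
    composite = {}
    for cell, g in global_pred.items():
        kind = cell_type[cell]
        if kind not in residual_pred:
            raise ValueError(f"unknown cell type {kind!r} for cell {cell!r}")
        composite[cell] = g + residual_pred[kind][cell]
    return composite


# Two hypothetical cells, one rural and one urban.
out = composite_prediction(
    {"a": 2.0, "b": 3.0},
    {"rural": {"a": 0.1}, "urban": {"b": 0.5}},
    {"a": "rural", "b": "urban"},
)
print(out)
```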
Results – global model • NO2 increases with distance from the sea and for climate variables 2 & 5 (areas with warm or hot summers) • NO2 decreases with altitude • The posterior median for $\phi$, 0.037, corresponds to a fall in correlation to 0.024 at 100 km • Without any geographical covariates, $\phi$ is much smaller (by a factor of ten), indicating much more ‘spatial’ residual error
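With $f(d) = \exp(-\phi d)$ and $d$ in km (an assumption, since the units are not stated), $\phi = 0.037$ reproduces the quoted figure: $\exp(-3.7) \approx 0.0247$. The sketch also shows what "smaller by a factor of ten" would mean, taking the factor literally.

```python
import math


def exp_correlation(d_km, phi):
    """Exponential correlation exp(-phi * d), with d in km (assumed)."""
    return math.exp(-phi * d_km)


phi_with_covariates = 0.037  # posterior median quoted for the global model
phi_without = phi_with_covariates / 10  # "smaller by a factor of ten", taken literally

for d in (25, 50, 100):
    print(d, round(exp_correlation(d, phi_with_covariates), 3),
          round(exp_correlation(d, phi_without), 3))
```

Without covariates, the residual correlation at 100 km would still be about 0.69, compared with about 0.025 with covariates. This is the sense in which far more of the variation is left as ‘spatial’ residual.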
Results – rural and urban • Rural - significant effect of major roads • Urban - clear overall increase (intercept term) • transport (major, minor roads) • population density • negative association with altitude
Model predictions • Pollutant: NO2 • Scale: composite of global, rural and urban background • Time period: 2001, annual average • Geographic extent: excludes Norway and Sweden • Statistics (µg/m³): min 0.45, max 139.06, mean 12.47, std dev 5.64 • Modelling method: Bayesian hierarchical modelling
Length of 95% credible interval • Pollutant: NO2 • Scale: composite of global, rural and urban background • Time period: 2001, annual average • Geographic extent: excludes Norway and Sweden • Statistics (µg/m³): min 1.66, max 287.36, mean 19.19, std dev 9.04
Validation • Performed at each scale (global, rural, urban) • Root mean square error (RMSE), mean absolute error (MAE), R², etc. • Best results for NO2, PM10 and O3 • Best results at the urban scale (owing to relationships with covariates), with the exception of O3
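The metrics listed above can be computed on the held-out sites as in this sketch. Here R² is taken as $1 - SS_{res}/SS_{tot}$; the deck does not say whether it was this or the squared correlation between observed and predicted values. The function name and numbers are hypothetical.

```python
import math


def validation_metrics(observed, predicted):
    """RMSE, MAE and R^2 (as 1 - SS_res / SS_tot) for held-out sites."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_obs = sum(observed) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    return {"RMSE": rmse, "MAE": mae, "R2": r2}


# Hypothetical observed vs predicted annual means at four validation sites.
print(validation_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
```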
Summary • Applied a spatio-temporal model to ca. 200 sites measuring SO2 in the UK (1997–2001) • Assessed the proportions of spatial, temporal and random variation • Applied a spatial model to the entire EU • Produced predicted levels at 1 km resolution at different scales • Produced composite maps with measures of uncertainty
Future work/considerations • Combined spatial models: modelling different site types simultaneously • Computational aspects: estimation and (joint) prediction • Sensitivity analysis (to priors) • Conditional modelling: neighbouring sites • Other pollutants: multi-pollutant models
More information on APMoSPHERE: http://www.apmosphere.org
Alternative approach – conditional modelling • Problems handling large spatial matrices at such a high resolution • Define sites as having ‘neighbours’ (possibly with a distance cut-off) • Allows the feasibility of different resolutions to be examined • Can be much, much faster! • Prediction and estimation may be performed together during the MCMC
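Conditional model • $Y_s \sim N(S_s, \sigma^2_v)$ • $S_s = \beta + W_s$ • $W_s \mid W_{-s} \sim N\left(\rho \sum_{i \in \delta_s} W_i / n_s,\ 1/(n_s \tau)\right)$ • where $\sum_{i \in \delta_s} W_i / n_s$ is the average of the $W_i$ over the $n_s$ neighbours $\delta_s$ of point $s$, and $n_s \tau$ is the conditional precision • The number of points that constitute the neighbourhood can be varied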
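Here is a minimal single-site Gibbs sampler for this conditional model. Observed points and prediction points are updated in the same sweep, so estimation and prediction happen together. Several choices are assumptions, not the deck's implementation (which used WinBUGS):

- Neighbours are defined by a distance cut-off. A fixed number of nearest neighbours is not symmetric, and a valid joint model of this kind needs symmetric neighbour relations, so it would have to be symmetrised first.
- $\tau$ is treated as a precision.
- $\beta$, $\rho$, $\tau$ and $\sigma^2_v$ are held fixed instead of being sampled.

Function names and data are hypothetical.

```python
import math
import random


def neighbours_within(points, cutoff):
    """Symmetric neighbour lists: j is a neighbour of i if 0 < dist(i, j) <= cutoff."""
    return [
        [j for j, q in enumerate(points) if j != i and math.dist(p, q) <= cutoff]
        for i, p in enumerate(points)
    ]


def car_gibbs(y, nbrs, beta, rho, tau, sigma2_v, n_iter=3000, burn_in=500, seed=3):
    """Gibbs sampler for W in the conditional model; returns the posterior
    mean of S_s = beta + W_s at every point (y[s] is None at prediction points).
    Hyperparameters are held fixed here for brevity."""
    if any(len(nb) == 0 for nb in nbrs):
        raise ValueError("every point needs at least one neighbour")
    rng = random.Random(seed)
    n = len(y)
    w = [0.0] * n
    totals = [0.0] * n
    for it in range(n_iter):
        for s in range(n):
            n_s = len(nbrs[s])
            prior_mean = rho * sum(w[i] for i in nbrs[s]) / n_s
            prec = n_s * tau  # conditional precision from the neighbourhood prior
            mean = prior_mean
            if y[s] is not None:
                # Combine with the observation y_s ~ N(beta + W_s, sigma2_v).
                data_prec = 1.0 / sigma2_v
                mean = (prec * prior_mean + data_prec * (y[s] - beta)) / (prec + data_prec)
                prec += data_prec
            w[s] = rng.gauss(mean, prec ** -0.5)
        if it >= burn_in:
            for s in range(n):
                totals[s] += beta + w[s]
    kept = n_iter - burn_in
    return [t / kept for t in totals]


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
y = [1.0, 1.2, None, 1.4, 1.5]  # hypothetical; point 2 is a prediction point
nbrs = neighbours_within(points, cutoff=1.0)
print(car_gibbs(y, nbrs, beta=1.0, rho=0.9, tau=4.0, sigma2_v=0.01))
```

Each update touches only a point's neighbours. No spatial covariance matrix is ever formed or inverted, which is where the speed-up over the joint model comes from.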
A 100 km resolution structure with 10 neighbours: 372 unknown points
Higher resolutions • Example at 50 km resolution: 418 known and 1469 unknown points
Computational aspects • 100,000 iterations with ca. 400 sites • Joint model: 5 days • Conditional model: 30 minutes • Using a 2.5 GHz PC with 1 GB RAM • Conditional model with observed and prediction points together at 20 km resolution: 1 day (1000 iterations take 15 minutes) • Higher resolutions are computationally feasible (but there are problems writing the file!)
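The timings above reduce to simple arithmetic. The 20 km figure assumes the "1 day" run was also 100,000 iterations and that run time scales linearly with the number of iterations; neither is stated.

```python
# Timings quoted for 100,000 iterations with ca. 400 sites.
joint_minutes = 5 * 24 * 60  # joint model: 5 days
conditional_minutes = 30  # conditional model: 30 minutes
speedup = joint_minutes / conditional_minutes

# 20 km run: 1000 iterations take 15 minutes. Assumption: the "1 day" run is
# 100,000 iterations and run time scales linearly with iterations.
minutes_per_1000_iterations = 15
full_run_hours = (100_000 / 1_000) * minutes_per_1000_iterations / 60

print(f"speed-up: {speedup:.0f}x; 20 km run: {full_run_hours:.0f} hours")
```

The conditional model is about 240 times faster here. At 15 minutes per 1000 iterations, 100,000 iterations take about 25 hours, consistent with the quoted 1 day.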