560 likes | 652 Views
FOUR METHODS OF ESTIMATING PM 2.5 ANNUAL AVERAGES. Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality, Planning, and Standards Emissions Monitoring, and Analysis Division. Project Objectives.
E N D
FOUR METHODS OF ESTIMATING PM2.5 ANNUAL AVERAGES Yan Liu and Amy Nail Department of Statistics North Carolina State University EPA Office of Air Quality, Planning, and Standards Emissions Monitoring, and Analysis Division
Project Objectives • Estimation of annual average of PM2.5 concentration • Estimation of standard errors associated with annual average estimates • Estimation of the probability that a site’s annual average exceeds 15 mg/m3 • At 2400 lattice points for 2000, 2001 • Comparisons of 4 different methodologies: 1. Quarter-based analysis (Yan) 2. Annual-based analysis (Yan) Daily-based analyses: 3. Doug Nychka’s method (Bill) 4. Generalized least squares in SAS Proc Mixed (Amy)
Why are Standard Errors Important? • We may estimate that the annual average for lattice point 329 is 16 mg/m3, which exceeds the standard of 15. But since our estimate has some uncertainty or standard error, we’d like to take this uncertainty into account in order to determine the probability that lattice point 329 exceeds 15.
…we also want maps like this. Note: This Map is WRONG--so don’t show it to anyone! We haven’t figured out the correct way to determine errors, so we cannot correctly draw a probability map yet.
Data Description • Concentrations of PM2.5 measured during 2000, 2001 • The domain analyzed: the portion of the U.S. east of –100o longitude • Concentrations measured every third day
Method 1 – Quarterly Analysis • 3 months in each quarter • Q1(Jan. - Mar.) Q2(Apr. - Jun.) • Q3(Jul. - Sep.) Q4(Oct. - Dec.) • Within quarters, 75% completeness • Found quarter mean conc. at each site • For each quarter, kriged mean conc. over lattice • Averaged the quarter predictions to get annual average estimate
Method 2 – Annual Analysis • Used sites common to all 4 quarters in quarterly analysis • Found annual mean conc. at each site • Kriged annual mean conc. over lattice
Model for Quarterly and Annual Analyses • Predicted value = quadratic surface prediction (SP) + error prediction (KP)
Estimating Quadratic Surface • Model: Conc = 0 + 1lat + 2lon + 3lat2 + 4lon2 + 5lat * lon + Assume: 1) E() = 0, Var() = 2 I 2) The betas are estimated by SAS assuming errors iid • Fit parameters using ordinary least squares in SAS proc reg • Obtained surface predictions (SP) and their standard errors (SEsp) and the ’s
Kriging the Error Surface • Model: • {(s):s R2} E((s)) = 0 Var((s)-(s’)) = 0 if s=s’ 2n + 2(1- e-dist/) if ss’ • Estimated variogram parameters using nonlinear least squares in Splus • Obtained kriging predictions (KP) and their standard errors (SEkp)
sill (h) Spherical model Exponentialmodel Gaussian model range h Variogram Models • 3 commonly used variogram models: • Exponential • (h)=1 – exp (-3h/a) • Spherical • (h)=1.5 • (h/a) - 0.5 • (h/a)3 if h a • (h)=1 otherwise • Gaussian • (h)=1 - exp (-3h2 /a2) a: range h: distance
Cross Validation to Select Variogram Model • Idea: temporarily remove the sample value at a particular location one at a time, estimate this value from remaining data using the different variogram models. • Prediction error = observed - predicted • MSE = 1/(n-1) (prediction error)2
Cross Validation MSE for Three Variogram Models • Exponential model has the least MSE. • Conclusion: use Exponential model
Calculating Predicted Annual Averages • Quarter averages: PQi = SPQi + KPQi • Annual average from quarterly analysis: Pannual = ( PQi) / 4 • Annual average from annual analysis: Pannual = SPannual + KPannual 4 i=1
Calculation of Standard Error for Annual Averages • Standard errors of quarterly averages: SEQi = (SEspi)2 + (SEkpi)2 • Standard errors of annual averages from quarterly analysis: SEannual = 1/16 (SEQi)2 • Standard errors of annual averages from annual analysis: SEannual = (SEsp)2 + (SEkp)2
Sources of Error • Less than 5% of total errors is coming from fitting a quadratic surface. • Kriging prediction error dominates.
Problems With Quarterly & Annual Analysis • The surface prediction and kriging prediction are not independent. • Var (SP + KP) Var (SP) + Var (KP)
More Problems With Quarterly and Annual Analysis • Not using all available data • When kriging residuals, estimated variogram is biased low (Kim and Boos 2002) (This problem could be solved by using generalized least squares.) • Ignored standard deviation of annual and/or quarterly averages in calculation of kriging prediction error • Quarterly averages may not be independent
Methods 3 & 4 - Daily-Based • Used every third day data (122 days per year) • Kriged each day to obtain predictions at 2400 lattice points • At each lattice point fit a timeseries to the 122 days’ estimates to estimate annual average • Calculated timeseries error for annual average using proc arima
Method 3 - “Doug’s Method” • Fit a quadratic surface using the Krig function in Splus • Used an algorithm that minimizes generalized cross validation error in order to estimate all parameters--including both quadratic surface parameters and covariance parameters • Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure • Specified an exponential covariance structure with a nugget • Provided the fixed value of 200 km for range parameter for all 122 days
Method 4 - “Amy’s Method” • Fit a quadratic surface using Generalized Least Squares in SAS Proc Mixed • Restricted (or residual) Maximum Likelihood used to estimate all parameters • Did not assume errors iid when fitting quad surf, so coefficients in quad surf estimated based on cov structure • Specified an exponential covariance structure with a nugget • Estimated each parameter each day
Problems with Doug’s Method • Using the same value for range parameter every day requires assumption that the range parameter is constant over time. Not a valid assumption. Amy’s method does not make this assumption. • Ignored kriging prediction error in calculation of timeseries error for annual average.
Problems with Amy’s Method • REML assumes data for each day is normally distributed. It isn’t. Can fix by using a transformation, but must be careful not to introduce bias in back-transform. There is an unbiased back-transform predictor and an associated estimate of error in Cressie section 3.2.2. Also must decide whether to transform each day using the same function. Doug’s method does not require normality assumption. • Ignored kriging prediction error in calculation of timeseries error for annual average.
What if we “propagate” errors? • At a given lattice point we have 122 days’ worth of predictions, each with a kriging prediction error. What if we treat the 122 days as independent observations (they aren’t, they are AR1) and combine the errors accordingly? And we do this for each of our 2400 lattice points.
The Big Problem • None of our standard error estimates are correct! • They are all underestimates! • We need to learn how to put spatial error components together with temporal error components.
Model for one day • Yij = o + 1i + 2i2 + 3j + 4j2 + 5ij + ij • Where i = lattitude j = longitude • E(ij) = 0 • Cov(ij, I’j’) = 2n + 2e-dist/ i=i’and j=j’ 2e-dist/ ii’ or jj’
Model for one site • Yk= + (Yk-1- ) + ek k = 1,…,122 • Where E(ek) = 0 • Var (ek) = 2 • Note: this is an AR1 model. The errors are iid (0, 2) because the temporal correlation is accounted for using the (Yk-1- ) term.
Model for all sites and days? • Yijk = o,k + 1,ki + 2,ki2 + 3,kj + 4,kj2 + 5,kij + ijk + eijk • Where E(ijk ) = 0, E(eijk) = 0 • We’ve assumed isotropy and stationarity for simplicity. • But how do we model Cov(ijk, I’j’k’), Cov(eijk, ei’j’k’), and Cov (ijk, ei’j’k’)?