Local Enhancement of Global Estimation. Molly Leecaster, Ph.D. Kerry Ritter, Ph.D. DAMARS and STARMAP 2nd Annual Conference, Oregon State University, Corvallis, OR, August 11, 2003.
Local Enhancement of Global Estimation • Molly Leecaster, Ph.D. • Kerry Ritter, Ph.D. • DAMARS and STARMAP 2nd Annual Conference, Oregon State University, Corvallis, OR • August 11, 2003
Acknowledgement PROJECT FUNDING • The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation.
Outline of Presentation • Introduction • Two-stage sample design • Spatial modeling of binary EMAP data • Indicator kriging • Conditional autoregressive model • Simulation Example • Future work
Introduction • EMAP was developed for estimation of areal extent of resources • Sample locations are spatially separated • EMAP participants are interested in global estimation but also have local concerns • Spatial modeling • EMAP data do not provide the information on local spatial structure required for good spatial models • Therefore, augment the EMAP design to improve spatial modeling
Goals • Present enhancement to EMAP design • Use of enhanced sample in spatial models of indicator data • Indicator kriging • Conditional autoregressive model
Two-stage: Systematic Grid Plus Star Cluster Sample Design • Two-stage because two goals • Systematic (EMAP) grid for global structure • Star cluster sample for variogram estimation • Enhance EMAP design with additional sample locations • Ideal for areal extent and prediction • Ideal for variogram estimation
Two-Stage Design [Figure: map of sample sites. Legend: pink = absence, blue = presence, black = systematic, green = star clusters 1, orange = star clusters 2]
Stage One: Systematic Component (EMAP) • Based on global estimation requirements • e.g., 30 spatially separated locations per stratum
Stage Two: Star Cluster Component • Star clusters of sample sites around stage-one locations • Star clusters provide an estimate of small-scale pairwise variance • Star clusters also provide many added pairs of samples at various distance lags • Star clusters provide directional information at small scale • How to specify star clusters?
Stage Two: Star Cluster Component • Location of star clusters • Adaptive, locate at specified observed response • Does this bias the variogram estimation? • Random stage-one locations • Systematic subset of stage-one locations • Size of star clusters • Diameter of star = variogram range • Diameter of star > variogram range • Number of star clusters • At least two, but how many more?
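The slides do not give the geometry of a star cluster, so the following is only a sketch of one possible layout. The sample sizes reported in the simulation (30, 54, and 78 sites for zero, two, and four clusters) imply 12 added sites per cluster. The choice of six arms with two sites each, the radii, and the function name `star_cluster` are assumptions made for illustration.

```python
import math

def star_cluster(center, n_arms=6, radii=(1.0, 2.0)):
    """Return extra sample sites on arms radiating from a stage-one site.

    n_arms and radii are illustrative assumptions; the slides only imply
    that each cluster adds 12 sites.
    """
    cx, cy = center
    sites = []
    for k in range(n_arms):
        angle = 2.0 * math.pi * k / n_arms  # evenly spaced directions
        for r in radii:                     # several lags along each arm
            sites.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    return sites

cluster = star_cluster((10.0, 10.0))
print(len(cluster))  # 6 arms x 2 radii = 12 added sites
```

Placing several sites along each arm yields sample pairs at short lags in several directions, which is the role the slides assign to the star clusters.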
Spatial Models for Binary Data • Indicator kriging for geo-referenced data • Conditional autoregressive model for binary lattice data
Indicator Kriging • Binary geo-referenced data • Spatial correlation structure modeled from data • Precision of predictions depends on sample spacing and variogram parameters
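Modeling the spatial correlation from binary data usually starts with an empirical indicator semivariogram. The sketch below is a minimal version, assuming equal-width distance bins and the classical half-mean-squared-difference estimator; the function name, bin width, and toy data are illustrative and not taken from the presentation.

```python
import math

def indicator_semivariogram(coords, z, lag_width, n_lags):
    """Empirical semivariogram of 0/1 indicator data, binned by distance.

    Returns (bin midpoint, semivariance, pair count) for each non-empty bin.
    """
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            h = math.dist(coords[i], coords[j])
            b = int(h // lag_width)          # index of the distance bin
            if b < n_lags:
                sums[b] += 0.5 * (z[i] - z[j]) ** 2
                counts[b] += 1
    return [((b + 0.5) * lag_width, sums[b] / counts[b], counts[b])
            for b in range(n_lags) if counts[b] > 0]

# Toy transect: two presences followed by two absences.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
z = [1, 1, 0, 0]
gamma_hat = indicator_semivariogram(coords, z, lag_width=1.0, n_lags=3)
for midpoint, gamma, n_pairs in gamma_hat:
    print(midpoint, round(gamma, 4), n_pairs)
```

With widely spaced sites, the short-lag bins hold few or no pairs, so the behavior of the variogram near the origin is poorly determined.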
Ordinary Indicator Kriging • Estimate the local indicator mean at each location • Apply the simple IK estimator using the estimated mean
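The two steps can be written out as follows. The notation is an assumption, since the symbols on the original slide did not survive: $I(s_i)$ is the presence indicator at sample site $s_i$, $\hat{m}(s_0)$ is the estimated local indicator mean, $C_I$ is the indicator covariance implied by the fitted variogram, and $\lambda_i$ are the simple kriging weights.

```latex
\begin{align}
I(s_i) &=
  \begin{cases}
    1 & \text{presence at } s_i \\
    0 & \text{absence at } s_i
  \end{cases} \\
\hat{p}(s_0) &= \hat{m}(s_0)
  + \sum_{i=1}^{n} \lambda_i \bigl[ I(s_i) - \hat{m}(s_0) \bigr] \\
\sum_{j=1}^{n} \lambda_j \, C_I(s_i - s_j) &= C_I(s_i - s_0),
  \qquad i = 1, \dots, n
\end{align}
```

Here $\hat{p}(s_0)$ is read as the predicted probability of presence at the unsampled location $s_0$.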
Conditional Autoregressive Model for Binary Data • Binary lattice data • Spatial correlation structure assumed: locally (neighborhood) dependent Markov random field • Neighborhood defined as fixed pattern of surrounding grid points • Precision of predictions depends on neighborhood structure, grid size, and variance of response
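A neighborhood defined as a fixed pattern of surrounding grid points can be sketched for a hexagonal lattice as below. This is an illustration under stated assumptions, not the authors' code: cells are stored in row-major order with odd rows shifted half a cell to the right, only first-order (touching) neighbors are used, and the function name is hypothetical.

```python
def hex_neighbors(n_rows, n_cols):
    """Map each cell index to the indices of its first-order hexagonal neighbors.

    Assumes row-major indexing with odd rows shifted half a cell to the right.
    """
    even = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, -1), (1, 0)]
    odd = [(0, -1), (0, 1), (-1, 0), (-1, 1), (1, 0), (1, 1)]
    nbrs = {}
    for r in range(n_rows):
        for c in range(n_cols):
            offsets = odd if r % 2 else even
            nbrs[r * n_cols + c] = [
                (r + dr) * n_cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
    return nbrs

nbrs = hex_neighbors(50, 50)
num = [len(nbrs[i]) for i in range(50 * 50)]               # neighbors per cell
adj = [j + 1 for i in range(50 * 50) for j in nbrs[i]]     # 1-based, flattened
print(num[0], num[10 * 50 + 10], len(adj))
```

The `num` and `adj` vectors follow the layout that the WinBUGS `car.normal` distribution takes for its adjacency input; whether the authors' model uses that distribution is not stated in the slides.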
Comparison of Models • Ordinary Indicator Kriging • Advantages • Knowledge of spatial relationship improves prediction • Assumed spatial relationship based on data • Disadvantages • Not robust to variogram mis-specification • Requires strong stationarity assumption • Conditional autoregressive • Advantages • No need to estimate or model variogram • Can be used without geo-referenced data • Disadvantages • Assumed spatial relationship based on a grid size that could be inaccurate
Simulation Example • Used simulation so that the spatial structure was known • Simulated response from a specified variogram model onto a 50x50 hexagonal grid of points • Specified presence/absence cutoff • Applied two-stage sample design (2 realizations) • Estimated and modeled variogram from sample data • For some, made two manual fits and one automatic fit • Predicted probability of presence using indicator kriging and conditional autoregressive model
Simulation Methods • Simulated data from Gaussian random field (S-Plus) • Spherical variogram, range = 22, sill = 0.4, nugget = 0 • Simulated value > 2 => presence • Sample Designs • Systematic sample (n=30) • Systematic sample plus 2 star clusters (n=54) • Systematic sample plus 4 star clusters (n=78) • Models • Indicator kriging • Conditional autoregressive model
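For reference, the spherical variogram with the stated parameters can be evaluated as in the sketch below. The function name is illustrative, and `sill` is treated as the partial sill added on top of the nugget, which makes no difference here because the nugget is 0.

```python
def spherical_variogram(h, range_=22.0, sill=0.4, nugget=0.0):
    """Spherical semivariogram; defaults are the simulation parameters."""
    if h <= 0.0:
        return 0.0                      # semivariance is zero at zero distance
    if h >= range_:
        return nugget + sill            # flat beyond the range
    ratio = h / range_
    return nugget + sill * (1.5 * ratio - 0.5 * ratio ** 3)

for h in (0.0, 11.0, 22.0, 30.0):
    print(h, round(spherical_variogram(h), 4))
```

Halfway to the range (h = 11) the semivariance is already 0.275, about 69% of the sill, so most of the model's shape is decided at lags shorter than a widely spaced systematic grid may sample.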
Data Simulation with Sample Sites [Figure: simulated field with sample sites. Legend: pink = absence, blue = presence, black = systematic, green = star clusters 1, orange = star clusters 2]
Variogram for Sample Designs [Figure: empirical variograms in three panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars]
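Three Fits: Systematic + 2 Stars [Figure: three panels: Automatic Fit; Manual Fit #1; Manual Fit #2] • Fitted parameters (columns: Range, Sill, Nugget; rows in slide order) • Row 1: 17, 0.3, 0 • Row 2: 0.4, 0 (one value missing in source) • Row 3: 0.27, 0 (one value missing in source) • All use correct model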
Predictions from 3 Variogram Fits [Figure: prediction maps in three panels: Automatic Fit; Manual Fit #1; Manual Fit #2]
Comparison of Prediction Errors • Sensitivity • Proportion of true presence sites predicted to be present • Specificity • Proportion of true absence sites predicted to be absent • True Positive Rate (as defined here; often called positive predictive value) • Proportion of predicted presence sites that truly are present • True Negative Rate (as defined here; often called negative predictive value) • Proportion of predicted absence sites that truly are absent
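The four summaries can be computed from predicted probabilities and known presence/absence as sketched below. The measures are expressed as proportions, the labels follow this slide's definitions, and the function name and toy data are illustrative rather than results from the simulation.

```python
def prediction_summary(truth, prob, cutoff=0.5):
    """Classify sites as present when prob > cutoff, then summarize errors."""
    pred = [1 if p > cutoff else 0 for p in prob]
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),         # true presences found
        "specificity": ratio(tn, tn + fp),         # true absences found
        "true_positive_rate": ratio(tp, tp + fp),  # predicted presences that are real
        "true_negative_rate": ratio(tn, tn + fn),  # predicted absences that are real
    }

# Made-up example with 3 presence sites and 5 absence sites.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
prob = [0.8, 0.4, 0.35, 0.6, 0.2, 0.1, 0.45, 0.05]
at_half = prediction_summary(truth, prob, cutoff=0.5)
at_third = prediction_summary(truth, prob, cutoff=0.3)
print(at_half)
print(at_third)
```

In this toy example, lowering the cutoff from 0.5 to 0.3 raises sensitivity and lowers specificity, which is the trade-off behind reporting results at both cutoffs.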
Comparison of Predictions (Data1F) (positive if probability > 0.5) (Auto, Manual #2) [Table not recovered]
Comparison of Predictions (Data1F) (positive if probability > 0.3) (Auto, Manual #2) [Table not recovered]
Data Simulation with Sample Sites [Figure: simulated field with sample sites. Legend: pink = absence, blue = presence, black = systematic, green = star clusters 1, orange = star clusters 2]
Variograms for Sample Designs [Figure: empirical variograms in three panels: Systematic; Systematic + 2 Stars; Systematic + 4 Stars]
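Three Fits: Systematic [Figure: three panels: Automatic Fit; Manual Fit #1; Manual Fit #2] • Fitted parameters (columns: Range, Sill, Nugget; rows in slide order) • Row 1: 30, 0.25, 0.21 • Row 2: 15, 0.27, 0 • Row 3: 0.22, 0 (one value missing in source) • All use correct model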
Predictions from 3 Variogram Fits [Figure: prediction maps in three panels: Automatic Fit; Manual Fit #1; Manual Fit #2]
Comparison of Predictions (Data3F) (positive if probability > 0.5) (Auto, Manual #2) [Table not recovered]
Comparison of Predictions (Data3F) (positive if probability > 0.3) (Auto, Manual #2) [Table not recovered]
Simulation Conclusions - Design • Two star clusters improved small-scale features of variogram • Two star clusters improved prediction accuracy • Four star clusters offered little improvement over two stars
Simulation Conclusions - Models • Variogram model affects predictions • Kriging tends toward overall mean probability of presence, i.e. it smooths • Kriging builds patches whose diameter is approximately the range of the variogram • Conditional autoregressive model attempts to connect observed presence • Neither model had consistently higher sensitivity or specificity
Future Work • Further simulation studies on the two-stage design • Effect of sample size • Number of star clusters necessary to improve variogram estimation • Effect of size of star clusters • Bias from adaptive second-stage sampling • Advantages of indicator kriging and conditional autoregressive model • Sensitivity of conditional autoregressive model to initial values, prior distributions, and grid size • Sensitivity of kriging to variogram model specification
Future Work • Apply two-stage sample design to real data • DDT data from Santa Monica Bay, CA • EMAP data and local monitoring data • Freely distribute functions for applying the conditional autoregressive model on a hexagon lattice • Functions in R to produce hexagon lattice input for WinBUGS • File in WinBUGS to apply model • Investigate optimal grid size to achieve EMAP and spatial modeling goals
Systematic (EMAP) Grid Based on Variogram Model • Kriging variance • Analog for conditional autoregressive model
Systematic (EMAP) Grid Based on Variogram Model • Prediction variance is minimized by large covariance between prediction location and sample locations • For kriging, grid refers to sample locations • For conditional autoregressive, grid refers to sample locations and prediction locations • Want sample locations “close” together • Samples too far apart => • Kriging -> correctly uses no spatial relationship • Conditional autoregressive -> incorrectly uses assumed spatial relationship • Samples too close together => waste of resources
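The link between covariance and prediction variance can be made explicit with the simple kriging variance, shown here as a sketch in assumed notation: $C(\cdot)$ is the covariance function implied by the variogram, $\mathbf{C}$ is the matrix of covariances among the $n$ sample locations, and $\mathbf{c}_0$ is the vector of covariances between the sample locations and the prediction location $s_0$.

```latex
\begin{align}
\sigma^2_{SK}(s_0) &= C(0) - \mathbf{c}_0^{\top} \mathbf{C}^{-1} \mathbf{c}_0, \\
\mathbf{c}_0 &= \bigl[ C(s_1 - s_0), \dots, C(s_n - s_0) \bigr]^{\top}, \qquad
\mathbf{C} = \bigl[ C(s_i - s_j) \bigr]_{i,j = 1}^{n}
\end{align}
```

Because $\mathbf{C}$ is positive definite, the subtracted term is non-negative and grows as sample sites move within the variogram range of $s_0$. When every sample lies beyond the range, $\mathbf{c}_0 = \mathbf{0}$, the variance equals $C(0)$, and the prediction reverts to the mean.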