Designs and Models for Aquatic Resource Surveys (DAMARS), R82-9096-01. Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys. Breda Munoz, Virginia Lesser.
This presentation was supported under STAR Research Assistance Agreement No. CR82-9096-01 awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.
Outline • Missing data in environmental surveys • Nonignorable missing data mechanism • Model-based approach for nonignorable missing data • Design-based estimation and nonignorable missing data • Illustration • Summary
Missing Data in Environmental Surveys • Researchers in environmental studies must obtain access to selected sites to gather field data • Denial of access: • common problem in environmental surveys • unit non-response • affects the results of data analysis
Response Disposition, 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)

Result                        1995    1996
Private landowners
  Agreed to access             43%     40%
  Refused access               36%     37%
  Undeliverable                 2%      2%
  Not returned/no contact      16%     14%
Public land                     3%      7%
Total                         100%    100%
Introduction • The 1995-1997 Maryland Biological Stream Survey reported an overall access denial rate of 10% (Boward et al., 1999). • Overall access denial rates in ODFW habitat surveys (Flitcroft et al., 2002): • 1998: 10.0% • 1999: 6.0% • 2000: 12.5%
Assumptions • A probability sampling design is used to collect outcomes of a spatial random process Y • $\{s_1, \ldots, s_n\}$ is a collection of sampling sites selected using the probability sampling design • $X_1, X_2$: auxiliary variables
Missing Mechanism: Missing Completely at Random (MCAR) [Diagram relating the auxiliary variables X1 and X2, the outcome Y, and the response indicator R] Smith, Skinner and Clark (1999), Little and Rubin (2002)
Missing Mechanism: Missing at Random (MAR) [Diagram relating the auxiliary variables X1 and X2, the outcome Y, and the response indicator R] Smith, Skinner and Clark (1999), Little and Rubin (2002)
Missing Mechanism: Nonignorable [Diagram relating the auxiliary variables X1 and X2, the outcome Y, and the response indicator R] Smith, Skinner and Clark (1999), Little and Rubin (2002)
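In the notation of these slides (auxiliary variables X, outcome Y, response indicator R), the three mechanisms are commonly distinguished by what the response probability is allowed to depend on. The statement below is the standard textbook formulation for unit nonresponse with fully observed X; it is not a transcription of the slide diagrams, whose arrows were not recoverable.

```latex
\begin{align*}
\text{MCAR:}         &\quad P(R = 1 \mid Y, X) = P(R = 1) \\
\text{MAR:}          &\quad P(R = 1 \mid Y, X) = P(R = 1 \mid X) \\
\text{Nonignorable:} &\quad P(R = 1 \mid Y, X) \text{ depends on } Y \text{ even after conditioning on } X
\end{align*}
```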
Model-based Approach • Under a nonignorable mechanism, we model the joint probability of the data and the missing mechanism indicator (the “response” indicator) • The joint model combines a data model and a missing mechanism model, with covariates • R(s_i) ~ Bernoulli(p_i)
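One common way to write such a joint model is the selection-model factorization, in which a data model is multiplied by a missing mechanism model. The slide's own equation was not recoverable, so the parameter names $\theta$ and $\psi$ and the logistic form for $p_i$ below are assumptions made purely for illustration.

```latex
\begin{align*}
f(y, r \mid x; \theta, \psi)
  &= \underbrace{f(y \mid x; \theta)}_{\text{data model}}
     \,\underbrace{f(r \mid y, x; \psi)}_{\text{missing mechanism model}}, \\
R(s_i) \mid y(s_i), x(s_i) &\sim \mathrm{Bernoulli}(p_i), \qquad
\operatorname{logit}(p_i) = \psi_0 + \psi_1\, y(s_i) + \psi_2^{\top} x(s_i).
\end{align*}
```

Under this assumed form, the mechanism is nonignorable whenever $\psi_1 \neq 0$, because the response probability then depends on the possibly unobserved outcome.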
Model-assisted estimation and nonignorable missing data • Assume the parameter of interest is the total of the response Y
Model-assisted estimation and nonignorable missing data • Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993): • Let be a collection of fixed values
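In its usual form, the continuous Horvitz-Thompson estimator of a total sums each observed response divided by the inclusion density at its site, $\hat{T} = \sum_i y(s_i)/\pi(s_i)$. A minimal sketch follows; the function name `ht_total` and the example values are hypothetical, and the slide's own notation was not recoverable.

```python
import numpy as np

def ht_total(y, inclusion_density):
    """Horvitz-Thompson estimate of a total: sum of y(s_i) / pi(s_i).

    `inclusion_density` holds the (assumed known) inclusion density
    evaluated at each sampled site.
    """
    y = np.asarray(y, dtype=float)
    pi = np.asarray(inclusion_density, dtype=float)
    if y.shape != pi.shape:
        raise ValueError("y and inclusion_density must have the same shape")
    if np.any(pi <= 0):
        raise ValueError("inclusion densities must be strictly positive")
    return float(np.sum(y / pi))

# Hypothetical example: two sites with responses 2 and 4.
estimate = ht_total([2.0, 4.0], [0.5, 0.25])
```

With missing data, only the responding sites contribute to this sum, which is why the weights need adjusting.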
Model-assisted estimation (cont.) • Sample size n: n* observed, n-n* missing (nonignorable)
Model-assisted estimation (cont.) denotes the
Model-assisted estimation (cont.) • Likelihood:
Model-assisted estimation (cont.) • Reparameterize the model parameters in terms of expected cell counts (Baker and Laird, 1988)
Model-assisted estimation (cont.) • Use the EM algorithm to estimate the expected counts of the missing cells, M_ij. • E-step:
Model-assisted estimation (cont.) • M-step: iterative proportional fitting (IPF) (Bishop et al., 1975) • IPF fits the expected counts by repeatedly matching the marginal totals. • The EM algorithm always converges to a solution when IPF is used in the M-step (Baker and Laird, 1988)
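The E-step and M-step can be sketched on a small table. The slides' formulas were not recoverable, so this sketch rests on stated assumptions: rows index categories of the auxiliary variable X, columns index categories of Y, nonrespondents are known only by row, and the fitted loglinear model is [XY][YR] (response depends on the Y category). The function names and all counts are hypothetical.

```python
import numpy as np

def ipf_xy_yr(z, max_iter=100, tol=1e-10):
    """Fit the loglinear model [XY][YR] to a table z of shape (I, J, 2) by IPF."""
    mu = np.ones_like(z, dtype=float)
    for _ in range(max_iter):
        previous = mu.copy()
        # Scale the fitted counts to match the X-by-Y margin of z.
        mu *= (z.sum(axis=2) / mu.sum(axis=2))[:, :, None]
        # Scale the fitted counts to match the Y-by-R margin of z.
        mu *= (z.sum(axis=0) / mu.sum(axis=0))[None, :, :]
        if np.max(np.abs(mu - previous)) < tol:
            break
    return mu

def em_missing_counts(n_obs, m_row, max_iter=1000, tol=1e-8):
    """Estimate missing-cell counts M_ij from observed counts and row totals."""
    n_obs = np.asarray(n_obs, dtype=float)
    m_row = np.asarray(m_row, dtype=float)
    n_cols = n_obs.shape[1]
    # Start by spreading each row's missing count evenly over the Y categories.
    M = np.repeat(m_row[:, None] / n_cols, n_cols, axis=1)
    for _ in range(max_iter):
        # M-step: fit expected counts to the completed table
        # (last axis: index 0 = observed, index 1 = missing).
        mu = ipf_xy_yr(np.stack([n_obs, M], axis=2))
        # E-step: split each row's missing count in proportion
        # to the fitted counts of its missing cells.
        mu_miss = mu[:, :, 1]
        M_new = m_row[:, None] * mu_miss / mu_miss.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(M_new - M)) < tol
        M = M_new
        if converged:
            break
    return M

n_obs = np.array([[20.0, 10.0], [10.0, 20.0]])  # hypothetical observed counts
m_row = np.array([9.0, 12.0])                   # hypothetical missing counts per X category
M_hat = em_missing_counts(n_obs, m_row)
```

By construction, each row of `M_hat` sums to that row's missing count. For these hypothetical counts, the table [[4, 5], [2, 10]] is a fixed point of the updates, corresponding to missing-to-observed odds of 0.2 and 0.5 in the two Y categories.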
Model-assisted estimation (cont.) • Possible estimators for the total of Y: • Cell adjustment: adjustment weight (Little and Rubin, 2002)
Model-assisted estimation (cont.) • Column adjustment:
Model-assisted estimation (cont.) • Row adjustment:
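The adjustment formulas on these slides were not recoverable. One plausible reading, in the spirit of weighting-class adjustments, inflates each respondent's design weight by the ratio of estimated total count (observed plus estimated missing) to observed count, taken over the respondent's cell, column, or row. The sketch below is written under that assumption; the function name and the counts are hypothetical.

```python
import numpy as np

def adjustment_weights(n_obs, m_hat):
    """Return (cell, column, row) weight adjustments.

    ASSUMPTION: each adjustment is (observed + estimated missing) / observed,
    computed over the cell, the Y-category column, or the X-category row.
    """
    n_obs = np.asarray(n_obs, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    total = n_obs + m_hat
    cell = total / n_obs
    column = total.sum(axis=0) / n_obs.sum(axis=0)
    row = total.sum(axis=1) / n_obs.sum(axis=1)
    return cell, column, row

n_obs = np.array([[20.0, 10.0], [10.0, 20.0]])  # hypothetical observed counts
m_hat = np.array([[4.0, 5.0], [2.0, 10.0]])     # hypothetical estimated missing counts
cell_w, col_w, row_w = adjustment_weights(n_obs, m_hat)
```

Under this reading, all three adjustments scale the respondents up to the same overall count and differ only in how that inflation is distributed across respondents.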
Model-assisted estimation (cont.) • Variance estimators are obtained using the bootstrap • The bootstrap produces asymptotically valid variance estimates (Efron, 1994)
Illustration • We simulate a continuous multivariate normal spatial random process for y • Population: John Day Middle Fork stream reaches • 143 stream reaches divided into survey segments (~1 mile) • 6536 survey segments • Area of 785 mi²
Illustration • The population of stream reaches was stratified into 6 strata based on the number of survey segments: “<10”, “10-20”, “20-30”, “30-50”, “50-100”, “>100” • Nonignorable missing data were generated as: • Missing rates of 15%, 30% and 50% were created.
Illustration • Sample size n = 100 • Allocation proportional to the number of survey segments in each stratum • Q1 = first sample quantile
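Proportional allocation of n = 100 across the 6 strata can be sketched as follows. The per-stratum segment counts are not given in the slides, so the counts below are hypothetical (chosen only to sum to 6536), and the largest-remainder rounding rule is an assumption about how fractional allocations might be resolved.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units to strata in proportion to stratum size.

    Fractional allocations are rounded with the largest-remainder rule
    so that the allocations sum exactly to n.
    """
    total = sum(stratum_sizes)
    exact = [n * size / total for size in stratum_sizes]
    allocation = [int(value) for value in exact]  # floor of each exact share
    leftover = n - sum(allocation)
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(
        range(len(exact)), key=lambda h: exact[h] - allocation[h], reverse=True
    )
    for h in by_remainder[:leftover]:
        allocation[h] += 1
    return allocation

# HYPOTHETICAL segment counts per stratum (sum to 6536).
segments_per_stratum = [300, 900, 1200, 1500, 1636, 1000]
allocation = proportional_allocation(segments_per_stratum, 100)
```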
Modified Bootstrap • We draw 1000 random samples of size 100 from the observed sample: • Independently across strata • Maintaining proportional allocation • Maintaining the row totals by the auxiliary variable • For each of the 1000 samples, we estimate the total of Y • We obtain a standard error and MSE for each estimate • We repeat this process 1000 times
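The resampling step can be sketched as a stratified bootstrap that resamples independently within each stratum and keeps each stratum's sample size fixed, which preserves the proportional allocation. This is a simplified illustration, not the authors' procedure: it does not enforce the row totals by the auxiliary variable, and the function name, the weighted-sum estimator, and the example data are all assumptions.

```python
import numpy as np

def stratified_bootstrap_se(y, weights, strata, n_boot=1000, seed=0):
    """Bootstrap standard error of the weighted total sum(w_i * y_i).

    Units are resampled with replacement independently within each stratum,
    keeping every stratum's sample size unchanged.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    weights = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    groups = [np.flatnonzero(strata == h) for h in np.unique(strata)]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # One replicate: concatenate the resampled indices from all strata.
        idx = np.concatenate(
            [rng.choice(g, size=g.size, replace=True) for g in groups]
        )
        estimates[b] = np.sum(weights[idx] * y[idx])
    return float(estimates.std(ddof=1))

# Hypothetical data: four units in two strata, unit weights.
se = stratified_bootstrap_se([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], [0, 0, 1, 1], n_boot=200)
```

Repeating this over many simulated samples, as the slide describes, would give the distribution of the standard error and MSE for each estimator.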