

Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design
Mark Delorey, F. Jay Breidt, Colorado State University


Presentation Transcript


Funding/Disclaimer

The work reported here was developed under the STAR Research Assistance Agreements CR-829095 and CR-829096 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and of STARMAP, the program he represents. EPA does not endorse any products or commercial services mentioned in this poster.

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreements # CR-829095 and # CR-829096.

Abstract

In aquatic resources, a two-stage sampling design can be employed to make the best use of what are often limited time and financial resources. Even with the ability to focus such resources, the sample sizes are often not large enough to support model-free inferences. The presence of auxiliary information for the regions of interest suggests employing a model in our inferences. Breidt, Claeskens, and Opsomer (2003) propose incorporating this auxiliary information through a class of model-assisted estimators based on penalized spline regression in single-stage sampling. Zheng and Little (2003) also use penalized spline regression, in a model-based approach for finite population estimation in a two-stage sample. In a survey context, weights computed from a set of auxiliary information are often applied to many study variables. With this approach, model-assisted estimators should fare better than model-based estimators. We compare the two through a series of simulations.

Two-Stage Sampling

• The population of elements $U = \{1, \dots, k, \dots, N\}$ is partitioned into clusters, or primary sampling units (PSUs), $U_1, \dots, U_i, \dots, U_{N_I}$, so that $N = \sum_{i=1}^{N_I} N_i$, where $N_i$ is the number of elements, or secondary sampling units (SSUs), in $U_i$.
• First stage: A sample of clusters, $s_I$, is selected according to a design $p_I(\cdot)$ with inclusion probabilities $\pi_{Ii}$ and $\pi_{Iij}$.
• $\pi_{Ii}$ and $\pi_{Iij}$ are the first- and second-order inclusion probabilities, respectively.
• Second stage: For every $i \in s_I$, a sample $s_i$ is drawn from $U_i$ according to the design $p_i(\cdot \mid s_I)$.
• The second-stage design is typically required to be invariant and independent of the first stage.

Two-Stage Sampling with Aquatic Resources

• Time and expense constraints may make two-stage sampling more efficient
• Auxiliary information may be available on different scales

The Estimators (for population totals)

• Horvitz-Thompson (HT) [formula not preserved in the transcript]
• Model-assisted [formula not preserved in the transcript], which uses the PSU total predicted by the model
• Model-based [formula not preserved in the transcript], which uses the ith cluster mean predicted by the model

Notes on the Models and Model Parameters

• Three different models are used:
  • Linear
  • Penalized spline with random effect for PSU
  • Penalized spline with no random effect for PSU
• In a survey context, such as those found in environmental monitoring, it is often desirable to obtain a single set of survey weights that can be used to predict any study variable.
To accommodate this:
• The smoothing parameter for the spline is selected by fixing the degrees of freedom for the smooth rather than by using a data-driven approach
• The variance component for the PSU effect is computed for the linear model, and the resulting covariance matrix and corresponding survey weights are applied to samples from other data sets
• In this kind of survey context, model-assisted estimators have good efficiency properties and should be superior to model-based estimators, which rely on correct specification of variance components

Case A: Cluster Level Auxiliaries (Our focus)

• The auxiliary information is available for all clusters in the population
• Leads to regression modeling of quantities associated with the clusters, such as cluster totals
• Cluster quantities can be computed for all clusters
• Population quantities can be computed from cluster estimates
• Example: Lake represents a cluster; auxiliary information is elevation

Generating Responses

• 500 PSUs; the number of SSUs per cluster ~ Uniform(50, 400)
• $\mu_{PSU} = m(\pi_I) + \varepsilon$, where $m(\cdot)$ is one of the eight functions below and $\varepsilon \sim N(0, \sigma^2 I)$
• We use first-order inclusion probabilities proportional to size (pps)
• Auxiliary data is often proportional to the size of the cluster
• Response of interest: $y_{ij} = \mu_i + \varepsilon_{ij}$, where $y_{ij}$ is the jth element in the ith cluster and $\varepsilon_{ij} \sim \text{iid } N(0, \sigma^2)$

Comments on Simulation Results

• 500 samples were drawn from each of the populations
• Legend:
  H-T = Horvitz-Thompson estimator
  M-A: lin = Model-assisted estimator using a linear model
  M-B: pmmra = Model-based estimator using a penalized spline and including a random effect for PSU
  M-A: pmm = Model-assisted estimator using a penalized spline with no random effect for PSU
• Each point represents the ratio MSE(estimator) : MSE(model-assisted estimator with random effect for PSU)
• Vertical black bars represent approximate 95% confidence intervals
• The model-assisted estimator with random effect for PSU is as efficient as, or more efficient than, the model-based estimator; we do not appear to lose efficiency (with respect to MSE) by using model-assisted nonparametric methods

Case B: Complete Element Level Auxiliaries

• The auxiliary information is available for all elements in the population
• Leads to regression modeling of quantities associated with the elements
• Cluster and population quantities can then be computed from element estimates and observations
• Example: EMAP hexagon is cluster; lake is element; auxiliary information is elevation

Case C: Limited Element Level Auxiliaries

• The auxiliary information is available for all elements in selected clusters only
• Leads to regression modeling of quantities associated with the elements
• Regression estimators can be used for cluster-level quantities only for the clusters selected in the first-stage sample
• Example: Aerial photography of selected sites (clusters); for each point (element) in a site, we have percent forested, urban, industrial

Case D: Limited Cluster Level Auxiliaries

• The auxiliary information is available for all clusters in the first-stage sample
• Not a very interesting case
• Design-based estimator can be used for population quantities
• In some cases, good estimators for population quantities are not available
• Example: Cluster is lake; auxiliary information is a measure of size which is not available until the site is visited
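
For reference, the three estimators of a population total named under "The Estimators" are commonly written as below in the two-stage sampling literature. This is a sketch in standard textbook notation and is not necessarily the exact set of expressions shown on the poster. The symbols $\hat{t}_i$ (design-based estimate of the ith PSU total), $\pi_{k|i}$ (second-stage inclusion probability), $\hat{m}_i$ (model-predicted PSU total), $\hat{\mu}_i$ (model-predicted cluster mean) and $n_i$ (second-stage sample size) are assumptions introduced here for illustration.

```latex
\begin{align*}
% Horvitz-Thompson: weight each estimated PSU total by its inverse inclusion probability
\hat{t}_{HT} &= \sum_{i \in s_I} \frac{\hat{t}_i}{\pi_{Ii}},
  \qquad \hat{t}_i = \sum_{k \in s_i} \frac{y_k}{\pi_{k \mid i}} \\
% Model-assisted (difference form): model predictions plus a design-weighted residual correction
\hat{t}_{MA} &= \sum_{i=1}^{N_I} \hat{m}_i
  + \sum_{i \in s_I} \frac{\hat{t}_i - \hat{m}_i}{\pi_{Ii}} \\
% Model-based (prediction form): observed values plus predictions for everything unobserved
\hat{t}_{MB} &= \sum_{i \in s_I} \Bigl[ \sum_{k \in s_i} y_k + (N_i - n_i)\,\hat{\mu}_i \Bigr]
  + \sum_{i \notin s_I} N_i\,\hat{\mu}_i
\end{align*}
```

In the model-assisted form, the second sum is a Horvitz-Thompson estimate of the total model residual, which is why that estimator stays approximately design-unbiased even when the model is misspecified. The model-based form has no such correction and relies on the model being right.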

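
The population described under "Generating Responses" and the Horvitz-Thompson estimator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code. The function names (`make_population`, `ht_two_stage`), the mean function, the variance values, the sample sizes `n_first` and `n_second`, the use of Poisson sampling at the first stage, and simple random sampling at the second stage are all hypothetical choices made here. The poster's eight mean functions are not available in the transcript.

```python
import numpy as np

def make_population(rng, n_psu=500, n_first=50, sigma_psu=0.1, sigma=1.0):
    """Generate cluster sizes, pps inclusion probabilities, and responses."""
    # Number of SSUs per PSU, Uniform(50, 400) on the integers
    sizes = rng.integers(50, 401, size=n_psu)
    # First-order inclusion probabilities proportional to cluster size
    pi = n_first * sizes / sizes.sum()
    # Hypothetical mean function m(.); stands in for the poster's eight functions
    mu = 1.0 + 5.0 * pi + rng.normal(0.0, sigma_psu, size=n_psu)
    # Element responses y_ij = mu_i + eps_ij
    y = [mu_i + rng.normal(0.0, sigma, size=n_i) for mu_i, n_i in zip(mu, sizes)]
    return sizes, pi, y

def ht_two_stage(rng, sizes, pi, y, n_second=10):
    """Draw one two-stage sample and return the HT estimate of the total."""
    # First stage (assumption): Poisson sampling, so PSU i is included
    # independently with probability pi[i]
    selected = np.flatnonzero(rng.random(len(sizes)) < pi)
    estimate = 0.0
    for i in selected:
        # Second stage (assumption): simple random sample without replacement
        sample = rng.choice(y[i], size=n_second, replace=False)
        t_hat_i = sizes[i] * sample.mean()   # estimated PSU total
        estimate += t_hat_i / pi[i]          # weight by inverse inclusion prob.
    return estimate

rng = np.random.default_rng(0)
sizes, pi, y = make_population(rng)
true_total = sum(y_i.sum() for y_i in y)
estimates = np.array([ht_two_stage(rng, sizes, pi, y) for _ in range(500)])
print("true total:", true_total)
print("mean of HT estimates:", estimates.mean())
print("relative RMSE:", np.sqrt(np.mean((estimates - true_total) ** 2)) / true_total)
```

Poisson sampling is used only because it delivers the stated first-order inclusion probabilities with very little code. It yields a random number of sampled PSUs, so a fixed-size pps design would typically show a smaller variance than this sketch does.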