
Variance Estimation in Complex Surveys

Variance Estimation in Complex Surveys. Drew Hardin, Kinfemichael Gedif. So far: variance for estimated means and totals under SRS, stratified, and cluster (single- and multi-stage) sampling; variance for estimating a ratio of two means under SRS (using the linearization method). What about other cases?


Presentation Transcript


  1. Variance Estimation in Complex Surveys • Drew Hardin • Kinfemichael Gedif

  2. So far • Variance for estimated means and totals under SRS, stratified, and cluster (single- and multi-stage) sampling, etc. • Variance for estimating a ratio of two means under SRS (we used the linearization method)

  3. What about other cases? • Variance for estimators that are not linear combinations of means and totals (e.g., ratios) • Variance for other statistics estimated from complex surveys (median, quantiles, functions of EMF, etc.) • Other approaches are necessary

  4. Outline • Variance estimation methods: linearization; random group methods; balanced repeated replication (BRR); resampling techniques (jackknife, bootstrap) • Adapting to complex surveys • ‘Hot’ research areas • References

  5. Linearization (Taylor Series Methods) • We have seen this before (ratio estimator and other courses). • Suppose our statistic is non-linear. It can often be approximated using Taylor’s Theorem. • We know how to calculate variances of linear functions of means and totals.
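
A hedged sketch of the approximation referred to here, in notation that is an assumption and not taken from the slides: for a smooth function h of k estimated totals, a first-order Taylor expansion about the population totals t_1, ..., t_k gives a linear approximation whose variance can be written with the usual variance and covariance formulas.

```latex
\hat{\theta} = h(\hat{t}_1,\dots,\hat{t}_k)
  \approx h(t_1,\dots,t_k)
  + \sum_{j=1}^{k} \left.\frac{\partial h}{\partial t_j}\right|_{t} (\hat{t}_j - t_j),
\qquad
V(\hat{\theta}) \approx \sum_{j=1}^{k}\sum_{l=1}^{k}
  \frac{\partial h}{\partial t_j}\,\frac{\partial h}{\partial t_l}\,
  \operatorname{Cov}(\hat{t}_j,\hat{t}_l).
```

In practice the partial derivatives are evaluated at the estimates, and the covariances are estimated under the actual sampling design.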

  6. Linearization (Taylor Series Methods) • Linearize • Calculate Variance
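
The formulas for these two steps did not survive in the transcript. A minimal sketch for one concrete case, the ratio of two means under SRS, assuming the finite population correction is ignored; the function name and toy data are illustrative and not from the slides.

```python
from statistics import mean

def ratio_linearization_variance(y, x):
    """Linearization variance of B_hat = ybar / xbar under SRS (fpc ignored)."""
    n = len(y)
    xbar, ybar = mean(x), mean(y)
    b_hat = ybar / xbar
    # Step 1 (linearize): B_hat - B is approximately mean(e) / xbar,
    # where e_i = y_i - B_hat * x_i are the linearized residuals.
    residuals = [yi - b_hat * xi for yi, xi in zip(y, x)]
    # Step 2 (calculate variance): apply the variance formula for a mean
    # to the residuals, then divide by xbar squared.
    s2_e = sum(e * e for e in residuals) / (n - 1)
    return b_hat, s2_e / (n * xbar ** 2)

# Toy data (hypothetical)
b_hat, v_hat = ratio_linearization_variance([1, 2, 3, 6], [1, 2, 3, 4])
```

The same two-step pattern applies to other smooth statistics, but the residuals have to be re-derived for each one.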

  7. Linearization (Taylor Series Methods) • Pro: • Can be applied in general sampling designs • Theory is well developed • Software is available • Con: • Finding partial derivatives may be difficult • A different derivation is needed for each statistic • The function of interest may not be expressible as a smooth function of population totals or means • Accuracy depends on the quality of the linearization approximation

  8. Random Group Methods • Based on the concept of replicating the survey design • It is usually not possible to simply repeat the whole survey • However, the sample can often be divided into R groups so that each group forms a miniature version of the survey

  9. Random Group Methods • [Figure: Strata 1 to 5, each with sampled units numbered 1 to 8; caption: "Treat as miniature sample"]

  10. Unbiased Estimator (Average of Samples) • Slightly Biased Estimator (All Data)
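
The two formulas on this slide are missing from the transcript. A minimal sketch of the standard random group variance estimators, assuming R group estimates that are treated as independent and identically distributed; the function name and arguments are hypothetical.

```python
from statistics import mean

def random_group_variance(group_estimates, full_sample_estimate=None):
    """Random group variance estimate from R replicate estimates.

    Centering at the average of the group estimates gives the unbiased
    version; centering at the estimate computed from all the data gives
    the slightly biased version.
    """
    r = len(group_estimates)
    if full_sample_estimate is None:
        center = mean(group_estimates)
    else:
        center = full_sample_estimate
    return sum((t - center) ** 2 for t in group_estimates) / (r * (r - 1))

# Toy group estimates (hypothetical), e.g. a median computed in each group
v_avg = random_group_variance([1.0, 2.0, 3.0])
v_all = random_group_variance([1.0, 2.0, 3.0], full_sample_estimate=2.5)
```

Centering at the full-sample estimate can only make the sum of squares larger, so the second version is never smaller than the first.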

  11. Random Group Methods • Pro: • Easy to calculate • General method (can also be used for non-smooth functions) • Con: • Assumes independent groups (a problem when N is small) • Small number of groups (particularly if one stratum is sampled only a few times) • The survey design must be replicated in each random group (the strata and cluster structure must remain the same)

  12. Resampling and Replication Methods • Balanced Repeated Replication (BRR): special case when n_h = 2 • Jackknife (Quenouille 1949; Tukey 1958) • Bootstrap (Efron 1979; Shao and Tu 1995) • These methods: • Extend the idea of the random group method • Allow replicate groups to overlap • Are all-purpose methods • Asymptotic properties?

  13. Balanced Repeated Replication • Suppose we had sampled 2 units per stratum • There are 2^H ways to pick 1 from each of the H strata • Each combination could be treated as a sample • Pick R samples

  14. Balanced Repeated Replication • Which samples should we include? • Assign each of the two units in a stratum the value +1 or -1 • Select samples that are orthogonal to one another to create balance • You can use the design matrix of a fractional factorial • Specify a vector a_r of +1/-1 values, one entry per stratum • Estimator
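
The estimator formula is missing from the transcript. A minimal sketch of one common form, the average squared deviation of the half-sample estimates from the full-sample estimate, assuming a stratified mean with exactly two units per stratum and a Hadamard matrix from the Sylvester construction as the balanced design. All names are hypothetical.

```python
def hadamard(order):
    """Sylvester construction; order must be a power of two."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def brr_variance(pairs, weights):
    """BRR variance of a stratified mean with two units per stratum.

    pairs[h]   -- the two sampled values in stratum h
    weights[h] -- stratum weight W_h (weights sum to 1)
    """
    n_strata = len(pairs)
    order = 1
    while order <= n_strata:          # smallest power of two above H
        order *= 2
    # Skip column 0 (all +1); the remaining columns are mutually orthogonal.
    signs = [row[1:n_strata + 1] for row in hadamard(order)]
    full = sum(w * (a + b) / 2 for w, (a, b) in zip(weights, pairs))
    replicates = []
    for a_r in signs:
        # +1 selects the first unit of the stratum, -1 the second.
        est = sum(w * (p[0] if s == 1 else p[1])
                  for w, p, s in zip(weights, pairs, a_r))
        replicates.append(est)
    return sum((e - full) ** 2 for e in replicates) / len(replicates)

# Toy data (hypothetical): three strata, two units each
v_brr = brr_variance([(1, 3), (2, 6), (5, 5)], [0.5, 0.3, 0.2])
```

For this linear estimator the orthogonality of the columns cancels the cross-stratum terms, so the result matches the usual stratified variance formula with the finite population correction ignored.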

  15. Balanced Repeated Replication • Pro: • Relatively few computations • Asymptotically equivalent to linearization methods for smooth functions of population totals and quantiles • Can be extended to use weights • Con: • Requires 2 PSUs per stratum (can be extended with more complex schemes)

  16. The Jackknife: SRS with replacement • Quenouille (1949); Tukey (1958); Shao and Tu (1995) • Let θ̂_(i) be the estimator of θ after omitting the ith observation • Jackknife estimate • Jackknife estimator of the variance • For stratified SRS without replacement: Jones (1974)
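
The jackknife formulas are missing from the transcript. A minimal sketch of the standard delete-one versions, assuming an SRS with replacement and an estimator supplied as a function of the data list; the names are hypothetical.

```python
from statistics import mean

def jackknife(data, estimator):
    """Delete-one jackknife: bias-corrected estimate and variance estimate."""
    n = len(data)
    theta_hat = estimator(data)
    # theta_(i): the estimator recomputed with observation i left out.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = mean(leave_one_out)
    # Jackknife (bias-corrected) estimate.
    theta_jack = n * theta_hat - (n - 1) * theta_dot
    # Jackknife variance estimate.
    variance = (n - 1) / n * sum((t - theta_dot) ** 2 for t in leave_one_out)
    return theta_jack, variance

# Toy data (hypothetical); for the sample mean the result equals s^2 / n
theta_jack, v_jack = jackknife([1, 2, 3, 4], mean)
```

For the sample mean the jackknife reproduces the textbook variance estimate, which is a convenient check before applying it to nonlinear statistics.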

  17. The Jackknife: stratified multistage design • In stratum h, delete one PSU at a time • Let θ̂_(hi) be the estimator of the same form as θ̂ when PSU i of stratum h is omitted • Jackknife estimate: • Or using pseudovalues

  18. The Jackknife: stratified multistage design • Different formulae for • Where • Using the pseudovalues
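
The variance formulas on these slides are missing from the transcript. A minimal sketch of one common variant, which centers each delete-one-PSU replicate at the full-sample estimate, assuming with-replacement sampling of PSUs within strata. Each PSU is represented by a tuple of weighted totals, and the remaining PSUs in the affected stratum are reweighted by n_h / (n_h - 1). All names are hypothetical.

```python
def stratified_jackknife_variance(strata, estimator):
    """Delete-one-PSU jackknife for a stratified multistage design.

    strata    -- list of strata; each stratum is a list of PSUs, and each
                 PSU is a tuple of weighted totals
    estimator -- function mapping such a structure to a number
    """
    theta_hat = estimator(strata)
    variance = 0.0
    for h, psus in enumerate(strata):
        n_h = len(psus)
        scale = n_h / (n_h - 1)
        for i in range(n_h):
            replicate = list(strata)
            # Drop PSU i of stratum h and reweight the rest of the stratum.
            replicate[h] = [tuple(scale * v for v in psu)
                            for j, psu in enumerate(psus) if j != i]
            variance += (n_h - 1) / n_h * (estimator(replicate) - theta_hat) ** 2
    return variance

def total(strata):
    """Estimated population total: sum of the first weighted total of every PSU."""
    return sum(psu[0] for psus in strata for psu in psus)

# Toy data (hypothetical): two strata with 2 and 3 PSUs
design = [[(1.0,), (3.0,)], [(2.0,), (4.0,), (9.0,)]]
v_total = stratified_jackknife_variance(design, total)
```

For a total this reduces to the usual between-PSU variance formula; a ratio or other nonlinear statistic only needs a different `estimator` function.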

  19. The Jackknife: Asymptotics • Krewski and Rao (1981) • Based on the concept of a sequence of finite populations with L strata in • Under conditions C1-C6 given in the paper • where "method" is the variance estimator used (linearization, BRR, jackknife)
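
The displayed result is missing from the transcript. As an assumption about what the slide showed, the main result of Krewski and Rao (1981) is usually stated as asymptotic normality of the studentized statistic as the number of strata L grows:

```latex
T_{\text{method}}
  = \frac{\hat{\theta} - \theta}{\sqrt{v_{\text{method}}(\hat{\theta})}}
  \;\xrightarrow{d}\; N(0,1)
  \quad \text{as } L \to \infty .
```

Here v_method denotes the variance estimate produced by the chosen method; the exact regularity conditions are those of the paper and are not reproduced here.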

  20. The Bootstrap: Naïve bootstrap • Efron (1979); Rao and Wu (1988); Shao and Tu (1995) • Resample with replacement in stratum h • Estimate: • Variance: • Or approximate by • The estimator is not a consistent estimator of the variance of a general nonlinear statistic

  21. The Bootstrap: Naïve bootstrap • For • Comparing with • The ratio does not converge to 1 for a bounded n_h
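
The two expressions being compared are missing from the transcript. A hedged reconstruction for the linear case, assuming a stratified mean with stratum weights W_h, a with-replacement resample of size n_h in each stratum, and no finite population correction:

```latex
\operatorname{Var}_*(\bar{y}^*)
  = \sum_{h} W_h^2 \,\frac{n_h - 1}{n_h}\,\frac{s_h^2}{n_h}
\quad\text{versus}\quad
v(\bar{y}_{st}) = \sum_{h} W_h^2 \,\frac{s_h^2}{n_h}.
```

Each stratum's contribution is shrunk by the factor (n_h - 1)/n_h, which is 1/2 when n_h = 2, so the naïve bootstrap underestimates the variance whenever the stratum sample sizes stay bounded.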

  22. The Bootstrap: Modified bootstrap • Resample with replacement in stratum h • Calculate: • Variance: • Can be approximated with Monte Carlo • For the linear case, it reduces to the customary unbiased variance estimator • m_h < n_h
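
The rescaling formula is missing from the transcript. A minimal Monte Carlo sketch of the rescaled bootstrap in the spirit of Rao and Wu (1988), assuming a function g of a stratified mean, with-replacement resamples of size m_h, and the rescaling factor sqrt(m_h / (n_h - 1)). The function name, arguments, and defaults are hypothetical.

```python
import random
from statistics import mean

def rescaled_bootstrap_variance(strata, weights, g=lambda t: t,
                                m=None, B=1000, seed=0):
    """Monte Carlo rescaled bootstrap variance of g(stratified mean)."""
    rng = random.Random(seed)
    if m is None:
        m = [len(y) - 1 for y in strata]      # common choice m_h = n_h - 1
    replicates = []
    for _ in range(B):
        total = 0.0
        for y, w, m_h in zip(strata, weights, m):
            n_h = len(y)
            ybar = mean(y)
            # Resample m_h values with replacement within the stratum.
            resample_mean = mean(rng.choices(y, k=m_h))
            # Rescale the deviation so the linear case gives s_h^2 / n_h.
            factor = (m_h / (n_h - 1)) ** 0.5
            total += w * (ybar + factor * (resample_mean - ybar))
        replicates.append(g(total))
    center = mean(replicates)
    return sum((t - center) ** 2 for t in replicates) / (B - 1)

# Toy data (hypothetical): two strata
v_boot = rescaled_bootstrap_variance([[1, 3, 5, 7], [2, 2, 4, 8, 9]],
                                     [0.4, 0.6], B=4000)
```

With g left as the identity the result should be close to the customary stratified variance estimate, up to Monte Carlo error; passing a nonlinear g gives the bootstrap variance of that statistic.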

  23. More on bootstrap • The method can be extended to stratified SRS without replacement by simply changing • For m_h = n_h - 1, this method reduces to the naïve bootstrap • For n_h = 2, m_h = 1, the method reduces to the random half-sample replication method • For n_h > 3, choice of m_h: see Rao and Wu (1988)

  24. Simulation: Rao and Wu (1988) • Jackknife and linearization intervals showed substantial bias for nonlinear statistics in one-sided intervals • The bootstrap performs best for one-sided intervals (especially when m_h = n_h - 1) • For two-sided intervals, the three methods perform similarly in coverage probability • The jackknife and linearization methods are more stable than the bootstrap • B = 200 is sufficient

  25. ‘Hot’ topics • Jackknife with non-smooth functions (Rao and Sitter 1996) • Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005) • Estimating Function (EF) bootstrap method (Rao and Tausi 2004)

  26. Software • OSIRIS – BRR, Jackknife • SAS – Linearization • Stata – Linearization • SUDAAN – Linearization, Bootstrap, Jackknife • WesVar – BRR, Jackknife, Bootstrap

  27. References • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26. • Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73-96. • Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of linearization, jackknife, and balanced replication methods. Annals of Statistics, 9, 1010-1019. • Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355-375. • Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inferences with complex survey data. JASA, 83, 231-241. • Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087-2095. • Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao’s paper. Statistics, 27, 246-247. • Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Annals of Statistics (to appear). • Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag. • Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614. • Not cited in the presentation: • Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag. • Shao, J. (1996). Resampling methods in sample surveys. Invited paper, Statistics, 27, 203-237, with discussion, 237-254.
