

Optimal Number of Replicates for Variance Estimation
Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
Third International Conference on Establishment Surveys (ICES-III), June 21, 2007





Presentation Transcript


  1. Optimal Number of Replicates for Variance Estimation
     Mansour Fahimi, Darryl Creel, Peter Siegel, Matt Westlake, Ruby Johnson, and Jim Chromy
     Third International Conference on Establishment Surveys (ICES-III), June 21, 2007

  2. Variance Estimation
     • Two general approaches to variance estimation for weighted data obtained under complex designs:
        • Linearization
        • Replication

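  3. Linearization
     • Approximate a complex statistic $\hat{\theta} = f(\hat{Y}_1, \ldots, \hat{Y}_L)$ by a first-order Taylor expansion in terms of L linear statistics $\hat{Y}_1, \ldots, \hat{Y}_L$
     • Estimate the variance of $\hat{\theta}$ from:
       $v(\hat{\theta}) \approx \sum_{l=1}^{L} \sum_{m=1}^{L} \frac{\partial f}{\partial Y_l} \frac{\partial f}{\partial Y_m} \, \widehat{\mathrm{cov}}(\hat{Y}_l, \hat{Y}_m)$, with the partial derivatives evaluated at the estimates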
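As an illustration of slide 3, the sketch below applies first-order Taylor linearization to a ratio estimator $\hat{R} = \sum w y / \sum w x$; a weighted percent is the special case $x = 1$. It assumes a stratified design whose PSUs are treated as sampled with replacement and applies no finite population correction. The function name and arguments are hypothetical, and this is not a reproduction of SUDAAN's computations.

```python
import numpy as np


def ratio_linearization_se(y, x, w, stratum, psu):
    """Taylor-linearization SE of the ratio R = sum(w*y) / sum(w*x).

    Assumes a stratified design with PSUs treated as sampled with
    replacement; no finite population correction is applied.
    """
    y, x, w = (np.asarray(a, dtype=float) for a in (y, x, w))
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    x_total = np.sum(w * x)
    r_hat = np.sum(w * y) / x_total
    # Weighted linearized values from the first-order Taylor expansion of R
    z = w * (y - r_hat * x) / x_total
    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        # Total the linearized values within each PSU of stratum h
        totals = np.array([z[in_h & (psu == p)].sum() for p in np.unique(psu[in_h])])
        n_h = totals.size
        if n_h > 1:  # single-PSU strata contribute nothing in this sketch
            var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return r_hat, np.sqrt(var)


# Weighted percent (x = 1) for a toy sample: 2 strata, 2 PSUs each
r, se = ratio_linearization_se(
    y=[1, 0, 1, 1, 0, 0], x=[1] * 6, w=[2, 2, 1, 1, 3, 3],
    stratum=[1, 1, 1, 2, 2, 2], psu=[1, 1, 2, 1, 2, 2],
)
```

The with-replacement approximation is a common simplification for first-stage selection; designs that reflect a first-stage finite population correction would scale each stratum's contribution accordingly.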

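  4. Replication
     • Partition the full sample into R subsamples (replicates)
     • Obtain a separate estimate $\hat{\theta}_r$ of $\theta$ from each replicate, $r = 1, \ldots, R$
     • Estimate the variance of $\hat{\theta}$ by:
       $v(\hat{\theta}) = c \sum_{r=1}^{R} (\hat{\theta}_r - \hat{\theta})^2$, where the constant $c$ depends on the replication method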
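A minimal sketch of the replication estimator on slide 4, assuming the replicate weights are already available as the columns of an array. The helper names are hypothetical, and the constant `c` must match the replication method (for example $1/R$ for a simple bootstrap, $(R-1)/R$ for the delete-one jackknife).

```python
import numpy as np


def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)


def replicate_se(y, full_w, rep_w, c):
    """SE of a weighted mean from an (n x R) array of replicate weights.

    v(theta) = c * sum_r (theta_r - theta)^2, with c set by the method.
    """
    y, full_w, rep_w = (np.asarray(a, dtype=float) for a in (y, full_w, rep_w))
    theta = weighted_mean(y, full_w)
    theta_r = np.array([weighted_mean(y, rep_w[:, r]) for r in range(rep_w.shape[1])])
    return theta, np.sqrt(c * np.sum((theta_r - theta) ** 2))


# Delete-one jackknife on an unweighted mean: R = n and c = (R - 1) / R
y = np.array([1.0, 2.0, 3.0, 4.0])
n = y.size
jk_weights = (np.ones((n, n)) - np.eye(n)) * n / (n - 1)
theta, se = replicate_se(y, np.ones(n), jk_weights, c=(n - 1) / n)
```

With delete-one jackknife weights on an unweighted mean, the result matches the familiar $s/\sqrt{n}$ standard error, which makes a quick sanity check before applying the same function to bootstrap weights.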

  5. How Many Replicates?
     • Recommendations regarding the optimal number of replicates for variance estimation differ:
        • The computational resources required can be substantial
        • For certain statistics, a larger number of replicates might be needed to produce stable variance estimates
     • What is the point of diminishing returns?
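One way to think about the point of diminishing returns is the precision of the SE estimate itself. Under an idealized assumption (not from the study) that the R replicate deviations $\hat{\theta}_r - \hat{\theta}$ are independent normal draws, $R\,v/\sigma^2$ follows a $\chi^2_R$ distribution, so the coefficient of variation (CV) of $v$ is $\sqrt{2/R}$ and, by the delta method, that of the SE is roughly $1/\sqrt{2R}$. The sketch computes this approximation and checks it by simulation; the function names are hypothetical.

```python
import numpy as np


def approx_cv_of_se(R):
    """Delta-method CV of a replicate SE based on R replicates,
    assuming iid normal replicate deviations (idealized)."""
    return 1.0 / np.sqrt(2.0 * R)


def simulated_cv_of_se(R, n_sims=20000, seed=0):
    """Monte Carlo CV of the SE under the same idealized assumption."""
    rng = np.random.default_rng(seed)
    # Each row holds R replicate deviations (theta_r - theta) with true SE = 1
    dev = rng.standard_normal((n_sims, R))
    se = np.sqrt(np.mean(dev ** 2, axis=1))  # bootstrap-style c = 1/R
    return se.std() / se.mean()


for R in (10, 65, 100, 200):
    print(R, round(approx_cv_of_se(R), 3), round(simulated_cv_of_se(R), 3))
```

Under these assumptions the approximate CV of the SE falls from about 8.8% at 65 replicates to about 7.1% at 100 and 5% at 200, so each additional replicate buys progressively less stability. Replicate estimates for complex statistics need not behave this well, which is why an empirical check is useful.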

  6. Research Methodology
     • Drawing on two complex establishment surveys, this work presents a set of empirical results on the number of bootstrap replicates needed for variance estimation:
        • National Study of Postsecondary Faculty (NSOPF:04)
        • National Postsecondary Student Aid Study (NPSAS:04)

  7. General Design Specifications: National Study of Postsecondary Faculty (NSOPF:04)
     • Survey of about 35,000 faculty and instructional staff
     • Across a sample of 1,080 institutions
     • In the 50 states and the District of Columbia

  8. Sampling Methodology
     • Institutions were selected with probability proportional to a measure of size designed to over-represent the following faculty groups:
        • Hispanic
        • Non-Hispanic Black
        • Asian and Pacific Islander
        • Full-time other female
     • RTI’s cost/variance optimization procedure was used for sample allocation
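Slide 8 does not describe the selection algorithm itself. As a generic illustration only (not RTI's procedure), the sketch below draws a systematic PPS sample with a random start and returns the inclusion probabilities $\pi_i = n\,\mathrm{MOS}_i / \sum_j \mathrm{MOS}_j$. It assumes no unit is large enough to be a certainty selection; the measure-of-size values and the function name are hypothetical. Oversampling would enter through how the measure of size is constructed, for example by giving more weight to faculty in the groups to be over-represented.

```python
import numpy as np


def systematic_pps(mos, n, seed=None):
    """Systematic PPS selection of n units with a random start.

    Returns the selected unit indices and every unit's inclusion
    probability pi_i = n * mos_i / sum(mos). Assumes no certainty units.
    """
    mos = np.asarray(mos, dtype=float)
    total = mos.sum()
    interval = total / n
    if np.any(mos >= interval):
        raise ValueError("unit(s) at or above the skip interval: handle as certainties")
    cum = np.cumsum(mos)
    rng = np.random.default_rng(seed)
    points = rng.uniform(0, interval) + interval * np.arange(n)
    # Unit i is hit when cum[i-1] <= point < cum[i]; clip guards float overshoot
    selected = np.minimum(np.searchsorted(cum, points, side="right"), mos.size - 1)
    return selected, n * mos / total


# Hypothetical measures of size for eight institutions
sel, pi = systematic_pps([120, 40, 75, 300, 60, 90, 210, 55], n=3, seed=42)
base_weights = 1.0 / pi[sel]
```

Because every unit's measure of size is below the skip interval, no unit can be hit twice, so the design weights are simply $1/\pi_i$ for the selected units.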

  9. Institution Sampling Frame

  10. Institution Sample

  11. Expected Faculty Counts from Sampled Institutions by Strata

  12. Target Number of Respondents by Institution and Faculty Strata

  13. Distribution of Respondents (by Institution and Faculty Strata)

  14. Variance Estimation Methodology (NSOPF:04)
     • Used the methodology developed by Kaufman (2004) to create bootstrap replicate weights:
        • A finite population correction adjustment was reflected for the first-stage (institution) selection
        • Second-stage (faculty) finite population correction factors were close to one and were not reflected
     • Produced 65 bootstrap replicates to meet the Data Analysis System (DAS) requirements of NCES
     • Calculated standard errors of several statistics using both these bootstrap replicates and the Taylor linearization method in SUDAAN
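Kaufman's (2004) procedure is not spelled out on slide 14. As a plainly named stand-in, the sketch below generates rescaled bootstrap replicate weights in the style of Rao and Wu: in each stratum it draws $n_h - 1$ PSUs with replacement and multiplies each PSU's weights by $m_{hi}\,n_h/(n_h - 1)$, where $m_{hi}$ is the number of times PSU $i$ was drawn. Unlike the NSOPF:04 weights, it applies no finite population correction at either stage; it assumes at least two PSUs per stratum, and the function name is hypothetical.

```python
import numpy as np


def rao_wu_bootstrap_weights(w, stratum, psu, R=200, seed=0):
    """Rescaled bootstrap replicate weights (Rao-Wu style, m_h = n_h - 1).

    Returns an (n x R) array. No finite population correction is applied.
    """
    w = np.asarray(w, dtype=float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    rng = np.random.default_rng(seed)
    rep = np.empty((w.size, R))
    for r in range(R):
        adj = np.empty(w.size)
        for h in np.unique(stratum):
            in_h = stratum == h
            psus = np.unique(psu[in_h])
            n_h = psus.size
            if n_h < 2:
                raise ValueError("each stratum needs at least two PSUs")
            # Draw n_h - 1 PSUs with replacement and count how often each is drawn
            draws = rng.choice(psus, size=n_h - 1, replace=True)
            for p in psus:
                m_hi = np.sum(draws == p)
                adj[in_h & (psu == p)] = m_hi * n_h / (n_h - 1)
        rep[:, r] = w * adj
    return rep


rep_w = rao_wu_bootstrap_weights(
    w=np.ones(6), stratum=[1, 1, 1, 2, 2, 2], psu=[1, 2, 3, 1, 2, 3], R=50
)
```

With replicate estimates $\hat{\theta}_r$ computed under each column, the variance is commonly estimated as $v(\hat{\theta}) = \frac{1}{R}\sum_{r} (\hat{\theta}_r - \hat{\theta})^2$, that is, $c = 1/R$.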

  15. Comparisons of Variance Estimates: SE of Percent Teaching as Principal Activity by Rank (Bootstrap vs. Linearization)

  16. Comparisons of Variance Estimates: SE of Percent Research as Principal Activity by Rank (Bootstrap vs. Linearization)

  17. Comparisons of Variance Estimates: SE of Percent Administration as Principal Activity by Rank (Bootstrap vs. Linearization)

  18. Comparisons of Variance Estimates: SE of Percent Full-time by Institution Type (Bootstrap vs. Linearization)

  19. Revised Variance Estimation Methodology (NSOPF:04)
     • Used the methodology developed by Kaufman (2004) to create 200 bootstrap replicate weights
     • Used 10, 11, …, 200 replicates to estimate the relative standard error (RSE) of different statistics
     • Repeated the above using 9 random permutations of the replicates to estimate the RSE of the same statistics
     • Used Taylor linearization via SUDAAN to estimate the RSE of the same estimates
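The procedure on slide 19 can be sketched as follows: accumulate the bootstrap variance over the first k replicates for each k, convert it to an RSE, and repeat for several random orderings of the same replicates. The sketch assumes c = 1/k, that the replicate estimates are already computed, and RSE = SE / |estimate|; the names are hypothetical, and the simulated inputs are not NSOPF:04 data.

```python
import numpy as np


def rse_curves(theta_full, theta_reps, k_min=10, n_perm=9, seed=0):
    """RSE of theta_full using the first k replicates, k = k_min..R.

    Row 0 uses the original replicate order; rows 1..n_perm use random
    permutations of the same replicates.
    """
    reps = np.asarray(theta_reps, dtype=float)
    R = reps.size
    rng = np.random.default_rng(seed)
    orders = [np.arange(R)] + [rng.permutation(R) for _ in range(n_perm)]
    ks = np.arange(k_min, R + 1)
    curves = np.empty((len(orders), ks.size))
    for i, order in enumerate(orders):
        sq = (reps[order] - theta_full) ** 2
        # Running bootstrap variance with c = 1/k over the first k replicates
        var_k = np.cumsum(sq)[ks - 1] / ks
        curves[i] = np.sqrt(var_k) / abs(theta_full)
    return ks, curves


# Simulated replicate estimates around a hypothetical percent of 25%
sim_reps = 0.25 + 0.01 * np.random.default_rng(1).standard_normal(200)
ks, curves = rse_curves(0.25, sim_reps)
```

All ten curves end at the same value, because at k = 200 every ordering uses the full set of replicates; the spread between curves at smaller k shows how sensitive the RSE is to which replicates happen to be used.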

  20. RSE of Percent Asians by Number of Replicates

  21. RSE of Percent Asians by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  22. RSE of Percent Age < 35 by Number of Replicates

  23. RSE of Percent Age < 35 by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  24. RSE of Percent Citizen by Number of Replicates

  25. RSE of Percent Citizen by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  26. RSE of Percent Full-time by Number of Replicates

  27. RSE of Percent Full-time by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  28. RSE of Percent Master’s by Number of Replicates

  29. RSE of Percent Master’s by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  30. RSE of Percent Teaching as Principal Activity by Number of Replicates

  31. RSE of Percent Teaching as Principal Activity by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  32. RSE of Mean Income by Number of Replicates

  33. RSE of Mean Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  34. RSE of Median Income by Number of Replicates

  35. RSE of Median Income by Number of Replicates (Taylor Linearization and Permutations of Replicates)

  36. RSE of Regression Intercept: Income = Hours + Race + Hours × Race

  37. RSE of Regression Intercept: Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

  38. RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race

  39. RSE of Regression Slope (Hours): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

  40. RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race

  41. RSE of Regression Slope (Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

  42. RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race

  43. RSE of Regression Slope (Hours × Race): Income = Hours + Race + Hours × Race (Taylor Linearization and Permutations of Replicates)

  44. Conclusions (Rough & Interim)
     • Complex statistics do require more replicates for stable variance estimation
     • It seems that:
        • 64 replicates might be inadequate
        • 200 replicates seem to be overkill
        • Somewhere between 100 and 200 replicates might be sufficient
