Variance Estimation: Drawing Statistical Inferences from IPUMS-International Census Data

Lara L. Cleveland, IPUMS-International. November 14, 2010, Havana, Cuba.

Presentation Transcript


  1. Variance Estimation: Drawing Statistical Inferences from IPUMS-International Census Data
     Lara L. Cleveland, IPUMS-International, November 14, 2010, Havana, Cuba

  2. Overview
     • Characteristics of Complex Samples
     • Public Use Census Data
     • IPUMS-International Census Samples
     • Adjusting for Sampling Error
     • Assessment Strategy
     • Results
     • Recommendations and Future Work
     The IPUMS projects are funded by the National Science Foundation and the National Institutes of Health.

  3. Overview

  4. Public Use Census Microdata
     Publicly available census microdata often derive from complex samples. However, social science researchers commonly apply methods designed for simple random samples.

  5. Public Use Census Data: Complex Samples
     • Clustering
        • By household (households, rather than individuals, are sampled)
        • Some samples are also geographically clustered
        • If ignored, can result in underestimated standard errors
     • Differential weighting
        • Oversampling of selected populations
        • If ignored, also leads to underestimated standard errors
     • Stratification
        • Explicitly by person or household characteristics
        • Implicitly by geographic area
        • If ignored, can result in overestimated standard errors
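The clustering effect above can be seen in a small simulation. This is a minimal sketch on a synthetic population (not IPUMS data), assuming persons in the same household share a random effect; household sizes and effect sizes are arbitrary illustrative values. The clustered estimate uses the standard Taylor-linearized, with-replacement ("ultimate cluster") variance for a mean with a single stratum.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population (not IPUMS data): 5,000 households of 1-8 persons.
n_hh = 5_000
sizes = rng.integers(1, 9, size=n_hh)
hh_id = np.repeat(np.arange(n_hh), sizes)

# Outcome = shared household effect + individual noise, so members of the
# same household are positively correlated (intra-class correlation ~0.5).
hh_effect = rng.normal(0.0, 1.0, size=n_hh)
y = hh_effect[hh_id] + rng.normal(0.0, 1.0, size=hh_id.size)


def srs_se(y):
    """SE of the mean if every person were an independent SRS draw."""
    return y.std(ddof=1) / np.sqrt(y.size)


def cluster_se(y, cluster):
    """Linearized SE of the mean with households as clusters
    (single stratum, with-replacement approximation)."""
    n = y.size
    z = (y - y.mean()) / n                       # linearized values for the mean
    _, inv = np.unique(cluster, return_inverse=True)
    totals = np.bincount(inv, weights=z)         # one total per household
    m = totals.size
    return np.sqrt(m / (m - 1) * np.sum((totals - totals.mean()) ** 2))


print(f"SRS SE:     {srs_se(y):.5f}")
print(f"Cluster SE: {cluster_se(y, hh_id):.5f}")
print(f"SRS / cluster ratio: {srs_se(y) / cluster_se(y, hh_id):.2f}")
```

With this setup the SRS/cluster ratio falls well below 1, the pattern the slide describes as underestimated standard errors. When every cluster holds a single person, `cluster_se` reduces exactly to `srs_se`.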

  6. IPUMS-I Data Processing
     • Data received vary in quality, detail, and extent of documentation
     • 3 sampling processes:
        • Country-produced public use sample
        • Sample drawn by partner country to IPUMS-I specifications
        • Full count data sampled by IPUMS-I

  7. Samples Drawn by IPUMS-I
     • High density (typically 10% samples)
     • Household samples
     • Clustered by household
     • Systematic sample (every nth household)
     • Typically geographic sorting (presumed here)
     • Implicit geographic stratification
     • Uniformly weighted (self-weighting)
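A systematic 1-in-10 household sample from a geographically sorted file can be sketched as follows. This is an illustration of the general technique, not IPUMS-I production code; the frame, the 20 area codes, and the function name `systematic_sample` are hypothetical.

```python
import numpy as np


def systematic_sample(n_units, interval, rng):
    """Row indices of a 1-in-`interval` systematic sample with a random start."""
    start = int(rng.integers(0, interval))     # random start in [0, interval)
    return np.arange(start, n_units, interval)


rng = np.random.default_rng(7)

# Hypothetical frame: 1,000 households already sorted by a geographic code
# (20 areas). Sorting first is what makes the sample implicitly stratified.
geo = np.sort(rng.integers(0, 20, size=1_000))
idx = systematic_sample(geo.size, 10, rng)    # 1-in-10 = 10% sample

# Every household has inclusion probability 1/10, so the sample is
# self-weighting (weight 10 for each sampled household).
weights = np.full(idx.size, 10.0)

# Each area's sample count is within 1 of its population count / 10.
pop_counts = np.bincount(geo, minlength=20)
sample_counts = np.bincount(geo[idx], minlength=20)
print(np.column_stack([pop_counts, sample_counts]))
```

Because each area occupies a contiguous block of the sorted file, it receives either the floor or the ceiling of one tenth of its households, which is the sense in which geographic sorting acts as implicit stratification.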

  8. Variance Estimation: Data Quality Assessment/Improvement
     • As researchers and data users
        • Assess accuracy of the data
        • Calculate precise estimates
     • As data custodians and disseminators
        • Distribute quality data samples
        • Create tools to facilitate research

  9. Overview

  10. Assessment Strategy
     • Create or specify variables that account for sampling error, for use in current statistical packages:
        • Cluster (household identifier)
        • Strata (pseudo-strata)
     • Compare estimates from full count data with estimates from sample data using 3 methods:
        • Subsample replicate
        • Taylor series linearization
        • Simple random sample (SRS)
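The cluster and strata variables can be built from a household file sorted by geography. This sketch follows the pseudo-strata rule given in slide 12 (10 adjacent households within a geographic unit, incomplete strata pooled with the preceding stratum). Pooling a short run across a unit boundary when its unit has no earlier stratum is our reading of that rule, not a confirmed detail, and `pseudo_strata` and the example data are hypothetical.

```python
import numpy as np


def pseudo_strata(geo, size=10):
    """Pseudo-stratum code for each household, in file (sort) order.

    Runs of `size` adjacent households within a geographic unit form a
    stratum; an incomplete run is pooled with the stratum just before it
    (our reading of the rule; the first run in the file has no predecessor)."""
    geo = np.asarray(geo)
    strata = np.empty(geo.size, dtype=int)
    current = -1
    i = 0
    while i < geo.size:
        j = i
        while j < geo.size and geo[j] == geo[i]:
            j += 1                               # households i..j-1 share a unit
        for start in range(i, j, size):
            stop = min(start + size, j)
            if stop - start < size and current >= 0:
                strata[start:stop] = current     # pool short run with predecessor
            else:
                current += 1                     # open a new pseudo-stratum
                strata[start:stop] = current
        i = j
    return strata


# Hypothetical household file sorted by geography: unit 0 has 23
# households, unit 1 has 5.
hh_geo = np.array([0] * 23 + [1] * 5)
hh_strata = pseudo_strata(hh_geo)
print(hh_strata)  # ten 0s followed by eighteen 1s

# Person-level design variables: every member inherits its household's
# identifier (the cluster) and pseudo-stratum.
hh_size = np.full(hh_geo.size, 3)                # hypothetical sizes
cluster = np.repeat(np.arange(hh_geo.size), hh_size)
strata = np.repeat(hh_strata, hh_size)
```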

  11. Assessing Accuracy: Full Count Data
     "True" or "gold standard" estimates
     • Full count census data
     • Simulate sample design: 100 10% replicates
     • Estimate the mean and standard error of the mean for several household- and person-level variables
     Recent census data from 4 countries: Bolivia 2001, Ghana 2000, Mongolia 2000, Rwanda 2002
     • Full count, clean, well-formatted data requiring no special corrections
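The full-count "gold standard" SE can be sketched as the spread of means across repeated 10% household samples. This assumes each replicate is a simple random sample of households with all members kept; the study's replicates were meant to mimic the IPUMS-I design and may have been drawn differently. The function name and synthetic data are hypothetical.

```python
import numpy as np


def replicate_se(y, hh_id, n_reps=100, frac=0.10, seed=0):
    """Mean and empirical SE of the person-level mean over repeated
    household samples drawn from full-count data.

    Assumption: each replicate is a simple random sample of `frac` of the
    households with all members kept."""
    rng = np.random.default_rng(seed)
    households = np.unique(hh_id)
    n_take = int(round(frac * households.size))
    means = np.empty(n_reps)
    for r in range(n_reps):
        chosen = rng.choice(households, size=n_take, replace=False)
        means[r] = y[np.isin(hh_id, chosen)].mean()
    # The spread of the replicate means serves as the "gold standard" SE.
    return means.mean(), means.std(ddof=1)


# Hypothetical full-count file: 20,000 households of 1-8 persons.
rng = np.random.default_rng(1)
sizes = rng.integers(1, 9, size=20_000)
hh_id = np.repeat(np.arange(sizes.size), sizes)
y = rng.normal(size=sizes.size)[hh_id] + rng.normal(size=hh_id.size)

full_mean = y.mean()
rep_mean, rep_se = replicate_se(y, hh_id)
print(f"full-count mean {full_mean:.4f}, replicate SE {rep_se:.4f}")
```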

  12. Assessing Accuracy: Sample Data
     • Subsample replicate
        • Mimic sample design: 100 10% subsamples
        • Labor- and resource-intensive
     • Taylor series linearization
        • Clustering: household identifier
        • Stratification: pseudo-strata variable
           • 10 adjacent households within a geographic unit
           • Incomplete strata pooled with the preceding stratum
        • Available in most statistical packages
     • Simple random sample as control/comparison
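Taylor series linearization with household clusters and strata can be sketched for a self-weighting sample as below. It uses the usual with-replacement approximation within strata. For brevity the pseudo-strata here are simplified to runs of 10 consecutive households in sort order, ignoring geographic-unit boundaries, and single-PSU strata are skipped (one of several conventions). All data are synthetic and `taylor_se` is a hypothetical name.

```python
import numpy as np


def taylor_se(y, cluster, strata=None):
    """Taylor-linearized SE of an unweighted (self-weighting) mean.

    Clusters are treated as PSUs sampled with replacement within strata;
    strata=None means a single stratum. Strata with only one PSU add
    nothing to the variance."""
    y = np.asarray(y, dtype=float)
    cluster = np.asarray(cluster)
    z = (y - y.mean()) / y.size                      # linearized values for the mean
    if strata is None:
        strata = np.zeros(y.size, dtype=int)
    strata = np.asarray(strata)
    var = 0.0
    for h in np.unique(strata):
        in_h = strata == h
        _, inv = np.unique(cluster[in_h], return_inverse=True)
        totals = np.bincount(inv, weights=z[in_h])   # PSU totals in stratum h
        m = totals.size
        if m > 1:
            var += m / (m - 1) * np.sum((totals - totals.mean()) ** 2)
    return np.sqrt(var)


rng = np.random.default_rng(3)

# Hypothetical household sample sorted by area (50 areas).
n_hh = 2_000
hh_geo = np.sort(rng.integers(0, 50, size=n_hh))
sizes = rng.integers(1, 7, size=n_hh)
person_hh = np.repeat(np.arange(n_hh), sizes)

# Outcome varies by area and by household, plus individual noise.
hh_level = hh_geo / 10 + rng.normal(size=n_hh)
y = hh_level[person_hh] + rng.normal(size=person_hh.size)

# Simplified pseudo-strata: 10 consecutive households in sort order.
person_strata = (np.arange(n_hh) // 10)[person_hh]

srs = y.std(ddof=1) / np.sqrt(y.size)
cl = taylor_se(y, person_hh)
cl_st = taylor_se(y, person_hh, person_strata)
print(f"SRS {srs:.4f} | cluster {cl:.4f} | cluster + strata {cl_st:.4f}")
```

Comparing the cluster-only and cluster-plus-strata estimates separates the two design effects, the same kind of comparison Table 4 makes for Bolivia. In Stata the equivalent design is declared with `svyset` (PSU plus a `strata()` option); in R's survey package, with `svydesign()` and its `ids` and `strata` arguments.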

  13. Overview

  14. Table Format
     From full count data ("gold standard"):
     • Full count mean
     • S.E. of mean from full count replicate
     From sample data, ratios of standard errors:
     • SE(Sub-sample Replicate) / SE(Full Count Replicate)
     • SE(Sample Taylor Series) / SE(Full Count Replicate)
     • SE(SRS) / SE(Full Count Replicate)
     Interpreting the ratios:
     • ~1.0: sample estimate resembles the "true" value
     • >1.0: sample estimate overestimates the SE
     • <1.0: sample estimate underestimates the SE
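The ratio rule above amounts to a small calculation. The slides do not state how close a ratio must be to count as "~1.0", so the tolerance below is a hypothetical choice, as are the example SE values (they are not taken from the tables).

```python
def se_ratio(se_sample, se_full_count_replicate, tol=0.10):
    """Ratio of a sample-based SE to the full-count replicate SE, with a verdict.

    `tol` (how close counts as "~1.0") is a hypothetical cutoff."""
    ratio = se_sample / se_full_count_replicate
    if abs(ratio - 1.0) <= tol:
        verdict = "resembles true SE"
    elif ratio > 1.0:
        verdict = "overestimates SE"
    else:
        verdict = "underestimates SE"
    return ratio, verdict


# Hypothetical SEs for one variable, against a full-count replicate SE of 0.0100.
for method, se in [("Sub-sample replicate", 0.0102),
                   ("Taylor series", 0.0097),
                   ("SRS", 0.0061)]:
    ratio, verdict = se_ratio(se, 0.0100)
    print(f"{method:22s} {ratio:5.2f}  {verdict}")
```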

  15. Table 1. Rwanda 2002: Comparing Complete Count and Sample Standard Error Estimates
     (Table not reproduced; slide callouts mark four entries as ~1.0.)

  16. Table 2. Mongolia 2000: Comparing Complete Count and Sample Standard Error Estimates
     (Table not reproduced; slide callouts mark four entries as ~1.0.)

  17. Table 3. Bolivia 2001: Comparing Complete Count and Sample Standard Error Estimates
     (Table not reproduced; slide callouts mark four entries as ~1.0 and three with "?".)

  18. Table 4. Bolivia 2001: Decomposition of Clustering and Stratification Effects on Taylor Series Standard Error Estimates from the 10% Sample
     (Table not reproduced; a slide callout marks one entry with "?".)

  19. Table 5. Ghana 2000: Comparing Complete Count and Sample Standard Error Estimates
     (Table not reproduced; slide callouts mark five entries with "?" and two as ~1.0.)

  20. Overview

  21. Recommendations: Clustering
     • Many research projects need not worry:
        • Studies of subpopulations that rely on only one person per household (e.g., fertility, aging, some work-related studies)
     • Design research to select a single person from each household
     • Use the household identifier as the cluster variable in statistical packages
     • Future: a variable that identifies geographic clusters, as needed
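Selecting a single person per household can be sketched by shuffling rows and keeping the first row seen for each household identifier. With one person per household every cluster has a single member, so household clustering no longer inflates the variance. Each selected person then stands in for all members of the household, so person-level totals or population shares would need a weight proportional to household size. The function name and IDs are hypothetical.

```python
import numpy as np


def one_person_per_household(hh_id, rng):
    """Row indices that keep exactly one randomly chosen person per household."""
    hh_id = np.asarray(hh_id)
    order = rng.permutation(hh_id.size)                 # random row order
    _, first = np.unique(hh_id[order], return_index=True)
    return np.sort(order[first])                        # back to original rows


rng = np.random.default_rng(11)
hh_id = np.array([101, 101, 101, 102, 103, 103])       # hypothetical IDs
keep = one_person_per_household(hh_id, rng)
print(hh_id[keep])  # → [101 102 103]
```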

  22. Recommendations: Stratification
     • Most researchers need no modification
        • Stratification increases precision
        • Estimates that ignore it are conservative
     • If concerned, use pseudo-strata, e.g., for:
        • Investigations of weak relationships
        • Some sub-population studies
     • Future: a pseudo-strata variable that carries information about implicit stratification

  23. Recommendations: Web Guidance

  24. Recommendations: Web Guidance

  25. Current and Future Work
     • Determine optimal pseudo-strata size
     • Investigate Ghana data distribution
     • Seek more geographic detail in the data
     • Compare estimates to published population counts
     • Additional data quality tests

  26. Thank you! Questions?
     Lara L. Cleveland, IPUMS-International, Minnesota Population Center, University of Minnesota
     50 Willey Hall, 225 19th Avenue South, Minneapolis, MN 55455

  27. Table 1. Rwanda 2002: Comparing Complete Count and Sample Standard Error Estimates

  28. Table 2. Mongolia 2000: Comparing Complete Count and Sample Standard Error Estimates

  29. Table 3. Bolivia 2001: Comparing Complete Count and Sample Standard Error Estimates

  30. Table 4. Bolivia 2001: Decomposition of Clustering and Stratification Effects on Taylor Series Standard Error Estimates from the 10% Sample

  31. Table 5. Ghana 2000: Comparing Complete Count and Sample Standard Error Estimates