
Variance Estimation When Donor Imputation is Used to Fill in Missing Values

Jean-François Beaumont and Cynthia Bocci, Statistics Canada. Third International Conference on Establishment Surveys, Montréal, June 18-21, 2007.


Presentation Transcript


  1. Variance Estimation When Donor Imputation is Used to Fill in Missing Values Jean-François Beaumont and Cynthia Bocci Statistics Canada Third International Conference on Establishment Surveys Montréal, June 18-21, 2007

  2. Overview • Context • Donor imputation • Variance estimation • Simulation study • Conclusion

  3. Context • Population parameter to be estimated: • Domain total: • Estimator in the case of full response: • Calibration estimator • Horvitz-Thompson estimator

  4. Donor Imputation • Imputed estimator: • With donor imputation, the imputed value is • A variety of methods can be considered in order to find a donor l(k) for the recipient k with
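
The formulas on this slide did not survive in the transcript. As a hedged sketch only, a donor-imputed estimator of a domain total is commonly written as below. The notation is an assumption, not taken from the slides: $s_r$ and $s_m$ are the sets of responding and nonresponding sample units, $w_k$ is the estimation weight, $d_k$ is the domain indicator, and $y_k^*$ is the imputed value.

```latex
% Assumed notation (not from the slides): s_r respondents, s_m nonrespondents,
% w_k weight, d_k domain indicator, l(k) donor chosen for recipient k.
\hat{\theta}_I = \sum_{k \in s_r} w_k d_k y_k + \sum_{k \in s_m} w_k d_k y_k^{*},
\qquad y_k^{*} = y_{l(k)}, \quad l(k) \in s_r .
```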

  5. Donor Imputation • Two simple examples: • Random hot-deck imputation within classes • Nearest-neighbour imputation • Practical considerations that add some complexity to the imputation process: • Post-imputation edit rules • Hierarchical imputation classes
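
The two simple examples named on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: missing values are coded as `None`, there is a single matching variable `x`, and the function names `random_hot_deck` and `nearest_neighbour` are hypothetical. Edit rules and hierarchical classes are left out.

```python
import random

def random_hot_deck(y, classes, rng):
    """Random hot-deck within classes (illustrative sketch).

    Each missing y (None) is replaced by the y-value of a respondent
    drawn at random from the same imputation class.
    """
    donors_by_class = {}
    for k, (yk, c) in enumerate(zip(y, classes)):
        if yk is not None:
            donors_by_class.setdefault(c, []).append(k)
    imputed, donor_of = list(y), {}
    for k, (yk, c) in enumerate(zip(y, classes)):
        if yk is None:
            l = rng.choice(donors_by_class[c])  # KeyError if the class has no respondent
            imputed[k] = y[l]
            donor_of[k] = l                     # records l(k) for recipient k
    return imputed, donor_of

def nearest_neighbour(y, x):
    """Nearest-neighbour imputation on one matching variable x (illustrative sketch)."""
    respondents = [k for k, yk in enumerate(y) if yk is not None]
    imputed, donor_of = list(y), {}
    for k, yk in enumerate(y):
        if yk is None:
            # Donor is the respondent whose x is closest to the recipient's x.
            l = min(respondents, key=lambda j: abs(x[j] - x[k]))
            imputed[k] = y[l]
            donor_of[k] = l
    return imputed, donor_of

# Toy data, invented for illustration only.
y = [10.0, None, 12.0, None, 30.0]
x = [1.0, 1.2, 1.9, 4.5, 5.0]
classes = ["a", "a", "a", "b", "b"]
hd_values, hd_donors = random_hot_deck(y, classes, random.Random(1))
nn_values, nn_donors = nearest_neighbour(y, x)
```

Both functions return the donor map l(k) alongside the completed data, because variance estimation methods that condition on the selected donors need to know which respondent was used for each recipient.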

  6. Imputation Model • Most imputation methods can be justified by an imputation model: • The donor imputed estimator is assumed to be approximately unbiased under the model:

  7. Current Variance Estimation Methods • Assuming negligible sampling fractions • Chen and Shao (2000, JOS) for NN imputation • Resampling methods • Our method is closely related to: • Rancourt, Särndal and Lee (1994, proc. SRMS): Assumes a ratio model holds • Brick, Kalton and Kim (2004, SM): Condition on the selected donors

  8. Imputation Model Approach • Variance decomposition of Särndal (1992, SM): • For any donor imputation method, we have:
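
The decomposition itself is missing from the transcript. A hedged reconstruction of the standard three-term form follows from writing the total error as the sum of the sampling error and the nonresponse error. The symbols are assumptions: $\theta$ is the population parameter, $\hat{\theta}$ the full-response estimator, $\hat{\theta}_I$ the imputed estimator, and the expectation is taken jointly over the model, the sampling design and the response mechanism.

```latex
% Identity obtained from (\hat\theta_I - \theta) = (\hat\theta - \theta) + (\hat\theta_I - \hat\theta).
E\big(\hat{\theta}_I - \theta\big)^2
  = \underbrace{E\big(\hat{\theta} - \theta\big)^2}_{\text{sampling variance}}
  + \underbrace{E\big(\hat{\theta}_I - \hat{\theta}\big)^2}_{\text{nonresponse variance}}
  + \underbrace{2\,E\big[(\hat{\theta}_I - \hat{\theta})(\hat{\theta} - \theta)\big]}_{\text{mixed component}} .
```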

  9. Estimation of the nonresponse variance • The estimation of the nonresponse variance is achieved by estimating • Noting that the nonresponse error is: • Then, the nonresponse variance estimator is:
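
The expressions on this slide were lost. The following is a sketch under assumptions that may differ from the authors' formulas: an imputation model with $E_m(y_k) = \mu_k$, $V_m(y_k) = \sigma_k^2$ and independent errors, weights and donor selection that do not depend on the $y$-values, and conditioning on the selected donors. The symbol $W_l$, the total weight carried by donor $l$, is introduced here for illustration.

```latex
% Nonresponse error written as a linear combination of y-values (assumed notation).
\hat{\theta}_I - \hat{\theta}
  = \sum_{k \in s_m} w_k d_k \big(y_{l(k)} - y_k\big)
  = \sum_{l \in s_r} W_l\, y_l - \sum_{k \in s_m} w_k d_k\, y_k ,
\qquad W_l = \sum_{k \in s_m:\, l(k) = l} w_k d_k .

% Model variance of this error given the donors, with plug-in estimates of sigma_k^2.
\hat{V}_{NR} = \sum_{l \in s_r} W_l^2\, \hat{\sigma}_l^2
             + \sum_{k \in s_m} (w_k d_k)^2\, \hat{\sigma}_k^2 .
```

This form treats the model bias of the nonresponse error as negligible, in line with the approximate unbiasedness assumed for the imputed estimator on the Imputation Model slide.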

  10. Estimation of the mixed component • Similarly, the estimation of the mixed component is achieved by estimating • The mixed component estimator is: • This component can be either positive or negative and may not always be negligible

  11. Estimation of the sampling variance • Let be the full-response variance estimator • The strategy consists of • Estimating • Replacing the unknown by their estimates • This leads to the sampling variance estimator:

  12. Estimation of the sampling variance • This strategy is essentially equivalent to • Randomly imputing the missing values using the imputation model • Computing the full-response sampling variance estimator by treating these imputed values as true values • Repeating this process a large number of times and taking the average of the sampling variance estimates • Similar to the multiple-imputation sampling variance estimator
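
The three steps listed on this slide can be sketched as a small Monte Carlo loop. Everything specific here is an assumption made for illustration: simple random sampling without replacement, an expansion estimator of the total, normally distributed model errors, and estimated model means and standard deviations (`mu_hat`, `sigma_hat`) supplied by the caller. The function names are hypothetical.

```python
import random
import statistics

def srs_total_variance(y, N):
    """Full-response variance estimator of the total N * ybar under SRS without replacement."""
    n = len(y)
    return N * N * (1 - n / N) * statistics.variance(y) / n

def averaged_sampling_variance(y, mu_hat, sigma_hat, N, n_rep, rng):
    """Average the full-response variance estimator over random model-based imputations.

    y         : sample values, None where missing
    mu_hat    : estimated model mean for each sample unit (assumed given)
    sigma_hat : estimated model standard deviation for each unit (assumed given)
    """
    estimates = []
    for _ in range(n_rep):
        # Step 1: randomly impute the missing values from the (assumed normal) model.
        completed = [
            yk if yk is not None else rng.gauss(m, s)
            for yk, m, s in zip(y, mu_hat, sigma_hat)
        ]
        # Step 2: treat the imputed values as true values.
        estimates.append(srs_total_variance(completed, N))
    # Step 3: average over the repetitions.
    return statistics.fmean(estimates)

# Toy call with invented numbers.
y = [4.0, None, 6.5, None, 5.0, 7.0]
mu_hat = [4.2, 5.0, 6.1, 5.5, 5.2, 6.8]
sigma_hat = [1.0] * 6
v_sam = averaged_sampling_variance(y, mu_hat, sigma_hat, N=100, n_rep=500, rng=random.Random(0))
```

The slide describes the proposed strategy as essentially equivalent to this procedure, so the loop is a way to read the estimator rather than a required computation.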

  13. Simulation study • Generated a population of size 1000 • Two y-variables: • LIN: Linear relationship between y and x • NLIN: Nonlinear relationship between y and x • Two different sample sizes: • Small sampling fraction: n=50 • Large sampling fraction: n=500 • Response probability depends on x with an average of 0.5
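
A set-up of this kind can be sketched as follows. The slide gives only the population size, the two sample sizes and the average response probability. The distribution of x, the two y-models, the form of the response probability and the seed are all assumptions chosen for illustration, not the authors' settings.

```python
import math
import random

rng = random.Random(2007)  # arbitrary seed

N = 1000  # population size, from the slide
x = [rng.uniform(0.0, 1.0) for _ in range(N)]  # assumed distribution of x

# LIN: linear relationship between y and x (coefficients are assumptions).
y_lin = [10.0 + 5.0 * xk + rng.gauss(0.0, 1.0) for xk in x]
# NLIN: nonlinear relationship between y and x (functional form is an assumption).
y_nlin = [10.0 + 5.0 * math.exp(2.0 * xk) + rng.gauss(0.0, 1.0) for xk in x]

# Response probability increasing in x and centred so that its
# population average is 0.5 (the linear form is an assumption).
x_bar = sum(x) / N
p = [0.5 + 0.6 * (xk - x_bar) for xk in x]

def draw_sample(n):
    """Draw a simple random sample without replacement and simulate response."""
    s = rng.sample(range(N), n)
    responded = {k: rng.random() < p[k] for k in s}
    return s, responded

small_sample, small_resp = draw_sample(50)   # small sampling fraction, n = 50
large_sample, large_resp = draw_sample(500)  # large sampling fraction, n = 500
```

With N = 1000, the two sample sizes correspond to sampling fractions of 5% and 50%, which is what makes the second case a test of methods that assume negligible sampling fractions.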

  14. Simulation study • Imputation: Nearest-Neighbour imputation using x as the matching variable • Estimation of • LIN: Linear model in perfect agreement with the LIN y-variable • NPAR: Nonparametric estimation using the procedure TPSPLINE of SAS

  15. Simulation study • Two objectives: • Compare the two ways of estimating • LIN and NPAR • Compare three nonparametric methods: • NPAR • NPAR_Naïve: NPAR with the sampling variance being estimated by the naïve sampling variance (Brick, Kalton and Kim, 2004) • CS: method of Chen and Shao (2000)

  16. Results: Large sampling fraction

  17. Results: Small sampling fraction

  18. Results: Large sampling fraction

  19. Conclusion • Nonparametric estimation of seems beneficial (robust) with Nearest-Neighbour imputation • Our proposed method is valid even for large sampling fractions • It seems to be slightly better to use our sampling variance estimator instead of the naïve sampling variance estimator

  20. Conclusion • Work done in the context of developing a variance estimation system (SEVANI) • Methodology implemented in the next version 2.0 of SEVANI • Estimation of : • Linear model • Nonparametric estimation

  21. Thanks - Merci Jean-François Beaumont Jean-Francois.Beaumont@statcan.ca Cynthia Bocci Cynthia.Bocci@statcan.ca
