Variance Estimation When Donor Imputation is Used to Fill in Missing Values. Jean-François Beaumont and Cynthia Bocci, Statistics Canada. Third International Conference on Establishment Surveys, Montréal, June 18-21, 2007.
Overview
• Context
• Donor imputation
• Variance estimation
• Simulation study
• Conclusion
Context
• Population parameter to be estimated: a domain total
• Estimator in the case of full response:
  • Calibration estimator
  • Horvitz-Thompson estimator
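In notation assumed here for illustration (the symbols U, s, d_k, w_k and π_k are not taken from the slides), a domain total and its full-response estimator could be written as follows, where d_k indicates domain membership and w_k is either a calibration weight or the Horvitz-Thompson weight 1/π_k.

```latex
\theta = \sum_{k \in U} d_k\, y_k ,
\qquad
\hat{\theta} = \sum_{k \in s} w_k\, d_k\, y_k ,
\qquad
w_k = \frac{1}{\pi_k} \ \text{(Horvitz-Thompson case)} .
```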
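Donor Imputation
• Imputed estimator: observed values are used for respondents and imputed values for nonrespondents
• With donor imputation, the imputed value for a recipient k is the observed y-value of its donor l(k)
• A variety of methods can be considered in order to find a donor l(k) for the recipient k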
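A possible way to write the imputed estimator, in assumed notation (s_r for the respondents, s_m for the nonrespondents, w_k for the survey weight, d_k for the domain indicator and y*_k for the imputed value; none of these symbols is taken from the slides):

```latex
\hat{\theta}_I
= \sum_{k \in s_r} w_k\, d_k\, y_k + \sum_{k \in s_m} w_k\, d_k\, y_k^{*} ,
\qquad
y_k^{*} = y_{l(k)} , \quad l(k) \in s_r .
```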
Donor Imputation
• Two simple examples:
  • Random hot-deck imputation within classes
  • Nearest-neighbour imputation
• Practical considerations that add some complexity to the imputation process:
  • Post-imputation edit rules
  • Hierarchical imputation classes
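The two simple donor-selection rules named on this slide can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a single numeric matching variable x, absolute distance, ties broken by the first respondent encountered, and hypothetical function names. Each function returns the donor index l(k) for every unit, with respondents acting as their own donor.

```python
import random

def nearest_neighbour_donors(x, responded):
    """Donor index for every unit; nonrespondent k gets the respondent closest in x."""
    resp = [i for i, r in enumerate(responded) if r]
    if not resp:
        raise ValueError("at least one respondent is needed")
    donors = list(range(len(x)))  # respondents keep their own value
    for k, r in enumerate(responded):
        if not r:
            # min() returns the first respondent attaining the smallest distance
            donors[k] = min(resp, key=lambda i: abs(x[i] - x[k]))
    return donors

def random_hot_deck_donors(classes, responded, rng):
    """Donor index for every unit; nonrespondent k gets a random respondent of its class."""
    donors = list(range(len(classes)))
    for k, r in enumerate(responded):
        if not r:
            pool = [i for i, ri in enumerate(responded)
                    if ri and classes[i] == classes[k]]
            if not pool:
                raise ValueError(f"no respondent in class {classes[k]!r}")
            donors[k] = rng.choice(pool)
    return donors

def impute(y, donors):
    """Completed data: each unit takes the y-value of its donor."""
    return [y[l] for l in donors]

# Usage: units 1 and 3 are nonrespondents (their y is unknown, shown as None).
x = [1.0, 1.4, 3.0, 10.0]
responded = [True, False, True, False]
y = [5.0, None, 7.0, None]
donors = nearest_neighbour_donors(x, responded)
completed = impute(y, donors)
```

Post-imputation edit rules and hierarchical classes are deliberately left out of this sketch; they would change how the donor pool is built, not the idea of copying a donor's value.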
Imputation Model
• Most imputation methods can be justified by an imputation model, that is, a model for the variable requiring imputation
• The donor-imputed estimator is assumed to be approximately unbiased under the model
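One common way to formalise such a model and the unbiasedness assumption is sketched below. The symbols (μ_k and σ_k² for the model mean and variance of y_k, the subscript m for moments under the model, s_r for the respondents, and the hats for the full-response and imputed estimators) are assumptions made for illustration and are not taken from the slides.

```latex
E_m(y_k) = \mu_k , \qquad
V_m(y_k) = \sigma_k^2 , \qquad
\mathrm{Cov}_m(y_k, y_j) = 0 \ (k \neq j) ,
\qquad
E_m\!\left(\hat{\theta}_I - \hat{\theta} \mid s, s_r\right) \approx 0 .
```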
Current Variance Estimation Methods
• Assuming negligible sampling fractions:
  • Chen and Shao (2000, JOS) for NN imputation
  • Resampling methods
• Our method is closely related to:
  • Rancourt, Särndal and Lee (1994, Proc. SRMS): assumes a ratio model holds
  • Brick, Kalton and Kim (2004, SM): conditions on the selected donors
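Imputation Model Approach
• Variance decomposition of Särndal (1992, SM): total variance = sampling variance + nonresponse variance + mixed component
• For any donor imputation method, the imputed estimator is a weighted sum of the respondents' y-values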
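A sketch of this decomposition in assumed notation (subscripts m, p and q for the imputation model, the sampling design and the response mechanism; none of the symbols is taken from the slides). It relies on three assumptions: the full-response estimator is design-unbiased, the nonresponse error has model expectation close to zero, and sampling and nonresponse do not depend on y given the auxiliary information.

```latex
E_m E_p E_q \left(\hat{\theta}_I - \theta\right)^2
\approx
\underbrace{E_m V_p\!\left(\hat{\theta}\right)}_{\text{sampling variance}}
+ \underbrace{E_p E_q V_m\!\left(\hat{\theta}_I - \hat{\theta} \mid s, s_r\right)}_{\text{nonresponse variance}}
+ \underbrace{2\, E_p E_q\, \mathrm{Cov}_m\!\left(\hat{\theta}_I - \hat{\theta},\ \hat{\theta} - \theta \mid s, s_r\right)}_{\text{mixed component}} .
```

The three terms follow from writing the total error as the sum of the sampling error and the nonresponse error, then squaring.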
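Estimation of the nonresponse variance
• The estimation of the nonresponse variance is achieved by estimating the model variance of the nonresponse error
• The nonresponse error is the difference between the imputed estimator and the full-response estimator
• The nonresponse variance estimator is then obtained by replacing the unknown model variances by their estimates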
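Under assumptions made here for illustration (weights w_k and domain indicators d_k that do not depend on y, donors selected without reference to y, uncorrelated y-values with model variance σ_k², and W_k denoting the total weight carried by respondent k after imputation), the nonresponse error and a plug-in variance estimator could be written as follows. The notation is not taken from the slides.

```latex
\begin{aligned}
\hat{\theta}_I - \hat{\theta}
&= \sum_{k \in s_r} \left(W_k - w_k d_k\right) y_k - \sum_{k \in s_m} w_k d_k\, y_k ,
\qquad
W_k = w_k d_k + \sum_{j \in s_m :\, l(j) = k} w_j d_j , \\
\hat{V}_{\mathrm{NR}}
&= \sum_{k \in s_r} \left(W_k - w_k d_k\right)^2 \hat{\sigma}_k^2
 + \sum_{k \in s_m} \left(w_k d_k\right)^2 \hat{\sigma}_k^2 .
\end{aligned}
```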
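Estimation of the mixed component
• Similarly, the estimation of the mixed component is achieved by estimating the model covariance between the nonresponse error and the sampling error
• The mixed component estimator is obtained by replacing the unknown model variances by their estimates
• This component can be either positive or negative and may not always be negligible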
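A minimal sketch of how plug-in estimators of the nonresponse variance and the mixed component might be computed once donors are known. It assumes weights and domain indicators that do not depend on y, uncorrelated y-values under the model, and estimated model variances passed in as `sigma2`; the function name and arguments are hypothetical and this is not the SEVANI implementation. The coefficient `a[k]` is the multiplier of y_k in the nonresponse error (imputed estimator minus full-response estimator).

```python
def nonresponse_and_mixed(w, d, donors, responded, sigma2):
    """Return (nonresponse variance estimate, mixed component estimate)."""
    n = len(w)
    for k in range(n):
        if not responded[donors[k]]:
            raise ValueError("every donor must be a respondent")
        if responded[k] and donors[k] != k:
            raise ValueError("respondents must be their own donor")
    wd = [w[k] * d[k] for k in range(n)]
    # W[k]: total weight attached to y_k in the imputed estimator
    W = [0.0] * n
    for k in range(n):
        W[donors[k]] += wd[k]
    # Coefficient of y_k in (imputed estimator - full-response estimator);
    # for nonrespondents W[k] is 0, so the coefficient is -w_k d_k.
    a = [W[k] - wd[k] for k in range(n)]
    v_nr = sum(a[k] ** 2 * sigma2[k] for k in range(n))
    # Coefficient of y_k (k in sample) in the sampling error is (w_k - 1) d_k
    v_mix = 2 * sum(a[k] * (w[k] - 1) * d[k] * sigma2[k] for k in range(n))
    return v_nr, v_mix

# Usage: units 1 and 3 are nonrespondents imputed from donors 0 and 2.
v_nr, v_mix = nonresponse_and_mixed(
    w=[2, 4, 2, 4], d=[1, 1, 1, 1],
    donors=[0, 0, 2, 2], responded=[True, False, True, False],
    sigma2=[1, 1, 1, 1],
)
```

In this toy example the mixed component comes out negative, which illustrates the slide's remark that its sign is not fixed.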
Estimation of the sampling variance
• Start from the full-response variance estimator
• The strategy consists of:
  • Estimating the model expectation of the full-response variance estimator
  • Replacing the unknown model mean and variance by their estimates
• This leads to the sampling variance estimator
Estimation of the sampling variance
• This strategy is essentially equivalent to:
  • Randomly imputing the missing values using the imputation model
  • Computing the full-response sampling variance estimator by treating these imputed values as true values
  • Repeating this process a large number of times and taking the average of the sampling variance estimates
• Similar to the multiple-imputation sampling variance estimator
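The three steps above can be sketched as a small Monte Carlo loop. Everything specific here is an assumption for illustration: simple random sampling without replacement from a population of size N, a population total as the parameter, normally distributed model draws, and hypothetical function names. The estimated model means `mu` and variances `sigma2` are taken as given.

```python
import math
import random
import statistics

def srs_variance_of_total(y, N):
    """Full-response variance estimator of an estimated total under SRS without replacement."""
    n = len(y)
    return N ** 2 * (1 - n / N) * statistics.variance(y) / n

def averaged_sampling_variance(y, responded, mu, sigma2, N, reps=1000, rng=None):
    """Average the full-response variance estimator over random model-based imputations."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(reps):
        # Keep observed values; draw each missing value from the (assumed normal) model
        y_star = [
            y[k] if responded[k] else rng.gauss(mu[k], math.sqrt(sigma2[k]))
            for k in range(len(y))
        ]
        total += srs_variance_of_total(y_star, N)
    return total / reps

# Usage: with zero model variance the draws equal the model means exactly.
v_sam = averaged_sampling_variance(
    y=[1.0, None, 3.0, None], responded=[True, False, True, False],
    mu=[0.0, 2.0, 0.0, 4.0], sigma2=[0.0, 0.0, 0.0, 0.0],
    N=8, reps=5, rng=random.Random(1),
)
```

The loop is only a numerical stand-in for the model expectation of the full-response variance estimator; when that expectation has a closed form, no simulation is needed.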
Simulation study
• Generated a population of size 1000
• Two y-variables:
  • LIN: Linear relationship between y and x
  • NLIN: Nonlinear relationship between y and x
• Two different sample sizes:
  • Small sampling fraction: n=50
  • Large sampling fraction: n=500
• Response probability depends on x with an average of 0.5
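A setup of this kind could be generated along the following lines. Only the population size, the two sample sizes and the average response probability of 0.5 come from the slide; the distribution of x, the coefficients, the error variance, the form of the nonlinear relationship, the sampling design and the response function are all assumptions chosen for illustration.

```python
import random

def simulate(n, N=1000, seed=0):
    """Generate one population, one sample of size n and response indicators."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(N)]             # assumed x ~ U(0, 1)
    y_lin = [10 + 20 * xi + rng.gauss(0, 2) for xi in x]       # assumed linear model
    y_nlin = [10 + 20 * xi ** 2 + rng.gauss(0, 2) for xi in x]  # assumed nonlinear model
    sample = rng.sample(range(N), n)                           # assumed SRS without replacement
    # Assumed response probability, linear in x; averages 0.5 when x ~ U(0, 1)
    p = [0.2 + 0.6 * x[k] for k in sample]
    responded = [rng.random() < pk for pk in p]
    return {"x": x, "y_lin": y_lin, "y_nlin": y_nlin,
            "sample": sample, "p": p, "responded": responded}

small = simulate(n=50)    # small sampling fraction
large = simulate(n=500)   # large sampling fraction
```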
Simulation study
• Imputation: Nearest-neighbour imputation using x as the matching variable
• Estimation of the imputation model mean and variance:
  • LIN: Linear model in perfect agreement with the LIN y-variable
  • NPAR: Nonparametric estimation using the TPSPLINE procedure of SAS
Simulation study
• Two objectives:
  • Compare the two ways of estimating the imputation model mean and variance: LIN and NPAR
  • Compare three nonparametric methods:
    • NPAR
    • NPAR_Naïve: NPAR with the sampling variance being estimated by the naïve sampling variance estimator (Brick, Kalton and Kim, 2004)
    • CS: method of Chen and Shao (2000)
Conclusion
• Nonparametric estimation of the imputation model mean and variance seems beneficial (robust) with nearest-neighbour imputation
• Our proposed method is valid even for large sampling fractions
• It seems to be slightly better to use our sampling variance estimator instead of the naïve sampling variance estimator
Conclusion
• Work done in the context of developing a variance estimation system (SEVANI)
• Methodology implemented in the next version (2.0) of SEVANI
• Estimation of the imputation model mean and variance:
  • Linear model
  • Nonparametric estimation
Thanks - Merci Jean-François Beaumont Jean-Francois.Beaumont@statcan.ca Cynthia Bocci Cynthia.Bocci@statcan.ca