Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP) María García, Chandra Erdman, and Ben Klemens
Outline • Background on the Survey of Income and Program Participation (SIPP) • Methods for missing data imputation - Randomized hot deck - SRMI • Simulation study • Evaluation • Concluding remarks
Background on the SIPP • Longitudinal survey, data collected in panels with interviews at set frequencies (2-4 years) • Demographic characteristics, assets, liabilities, labor force participation, earnings, etc. • Provide comprehensive information about income and program participation • Evaluate federal, state, and local programs and provide measures of economic well-being
Background on the SIPP • Hot deck for most missing data imputation • Recent major redesign • Research ways to improve data processing. • Explore alternative imputation methods • Focus on missing monthly job-level earnings (twelve variables) • Sequential Regression Multivariate Imputation (SRMI, Raghunathan et al., 2001)
Sequential Regression Multivariate Imputation (SRMI) • Data matrix $Y = (Y_1, \dots, Y_k)$ • Each column $Y_j$ is one variable • Imputations are based on univariate distributions • Instead of drawing from a joint distribution for $k$ variables, draw $k$ times from the univariate conditional distribution for each variable, $f(Y_j \mid Y_{-j}, \theta_j)$
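SRMI • Impute missing values sequentially, conditioning on observed and imputed variables • Regression model for each variable $Y_j$ given the other variables $Y_{-j}$, with parameters $\theta_j$ • Impute sequentially for each variable: 1. Draw $\theta_j$ from its posterior distribution given the observed values of $Y_j$ and the current $Y_{-j}$ 2. Draw the missing values of $Y_j$ from $f(Y_j \mid Y_{-j}; \theta_j)$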
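The two-step draw above can be sketched for the simplest possible case: two continuous variables, each imputed from a normal linear regression on the other. This is a minimal illustration under stated assumptions, not the implementation in the R package mi. The function names (`redraw_missing`, `srmi_two_variables`), the flat prior, the mean-fill starting values, and the fixed number of cycles are all assumptions made for the example.

```python
import random
import statistics


def redraw_missing(target, predictor, is_missing, rng):
    """Redraw the missing entries of `target` from a normal linear regression
    on `predictor`, fitted to the rows where `target` was observed."""
    obs = [(x, y) for x, y, miss in zip(predictor, target, is_missing) if not miss]
    n = len(obs)  # needs at least 3 observed rows
    x_bar = statistics.fmean(x for x, _ in obs)
    y_bar = statistics.fmean(y for _, y in obs)
    sxx = sum((x - x_bar) ** 2 for x, _ in obs)
    slope_hat = sum((x - x_bar) * (y - y_bar) for x, y in obs) / sxx
    sse = sum((y - y_bar - slope_hat * (x - x_bar)) ** 2 for x, y in obs)

    # Step 1: draw the regression parameters from their posterior (flat prior).
    # Centering x makes the intercept and slope draws independent given sigma2.
    sigma2 = sse / rng.gammavariate((n - 2) / 2, 2.0)  # sse / chi-square(n - 2)
    intercept = rng.gauss(y_bar, (sigma2 / n) ** 0.5)
    slope = rng.gauss(slope_hat, (sigma2 / sxx) ** 0.5)

    # Step 2: draw each missing value given the predictor and the drawn parameters.
    return [
        rng.gauss(intercept + slope * (x - x_bar), sigma2 ** 0.5) if miss else y
        for x, y, miss in zip(predictor, target, is_missing)
    ]


def srmi_two_variables(y1, y2, n_cycles=10, seed=0):
    """Cycle through both variables, conditioning on observed and imputed values.
    Missing entries are marked with None (an assumption of this sketch)."""
    rng = random.Random(seed)
    miss1 = [v is None for v in y1]
    miss2 = [v is None for v in y2]
    # Starting values: fill each column with its observed mean.
    fill1 = statistics.fmean(v for v in y1 if v is not None)
    fill2 = statistics.fmean(v for v in y2 if v is not None)
    cur1 = [fill1 if m else v for v, m in zip(y1, miss1)]
    cur2 = [fill2 if m else v for v, m in zip(y2, miss2)]
    for _ in range(n_cycles):
        cur1 = redraw_missing(cur1, cur2, miss1, rng)
        cur2 = redraw_missing(cur2, cur1, miss2, rng)
    return cur1, cur2
```

Calling `srmi_two_variables` with several different seeds yields several completed data sets, which is the multiple imputation setting used in the rest of the talk.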
Simulation Study • SRMI - R package mi (Su et al., 2011) - Job-level earnings indicator – logistic regression - Where the monthly earnings indicator is imputed as positive, impute the corresponding missing earnings using SRMI • Hot deck - TEA's randomized hot deck (Klemens, 2012) • Multiple imputation
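For readers unfamiliar with the hot deck side of the comparison, a randomized hot deck can be sketched as follows: records are grouped into cells by a few categorical variables, and each missing value is filled with the value of a donor drawn at random from the same cell. This is a generic illustration, not TEA's implementation; the function name, the record layout (a list of dicts with `None` for missing), and the choice to leave a value missing when its cell has no donors are assumptions.

```python
import random
from collections import defaultdict


def randomized_hot_deck(records, cell_keys, target, rng):
    """Fill missing `target` values with a randomly chosen donor value
    from the same cell, where cells are defined by `cell_keys`."""
    # Build the donor pool for each cell from records with an observed value.
    donors = defaultdict(list)
    for rec in records:
        if rec[target] is not None:
            donors[tuple(rec[k] for k in cell_keys)].append(rec[target])

    imputed = []
    for rec in records:
        new_rec = dict(rec)  # copy, so the input records are not modified
        if new_rec[target] is None:
            pool = donors.get(tuple(rec[k] for k in cell_keys))
            if pool:  # an empty cell has no donors; the value stays missing
                new_rec[target] = rng.choice(pool)
        imputed.append(new_rec)
    return imputed
```

The empty-pool branch shows why adding more cell variables is costly for a hot deck: each extra variable splits the cells further and shrinks the donor pools.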
Simulation Study • Simulation data - Complete 2004 SIPP panel data – “true” - Randomly select multiple sets of 10% of observations for which the job-level earnings are to be set to missing (100 repetitions) • Explanatory variables - Age, sex, race, education, occupation, industry, firm size, job-type, hours, lead, lag, etc.
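The masking step of the simulation design can be sketched as below: starting from complete data treated as the truth, each repetition sets a fresh random 10% of the earnings values to missing. The function name, the use of `None` as the missing marker, and the rounding of the 10% count are assumptions for illustration; the slides do not describe how the selection was implemented.

```python
import random


def make_missing_replicates(earnings, fraction=0.10, n_reps=100, seed=0):
    """Return `n_reps` copies of `earnings`, each with a different random
    `fraction` of its values replaced by None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(earnings))
    replicates = []
    for _ in range(n_reps):
        # Sample row positions without replacement for this repetition.
        masked_rows = set(rng.sample(range(len(earnings)), n_missing))
        replicates.append(
            [None if i in masked_rows else v for i, v in enumerate(earnings)]
        )
    return replicates
```

Each replicate can then be passed to both imputation methods, and the imputed results compared against the original complete values.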
Average Difference in RMSE (SRMI – Hot Deck)
Between-Imputation, Within-Imputation, and Total Variance of Mean Monthly Earnings for Some Months
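The three variance components named in this slide are conventionally computed with Rubin's combining rules for multiple imputation. The slides do not spell out the formulas, so the notation here is assumed: $m$ completed data sets, with $\hat{Q}_i$ the estimate (for example, mean monthly earnings) from data set $i$ and $U_i$ its estimated variance.

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
W = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i - \bar{Q}\bigr)^2, \qquad
T = W + \Bigl(1 + \frac{1}{m}\Bigr) B
```

Here $W$ is the within-imputation variance, $B$ the between-imputation variance, and $T$ the total variance of the combined estimate $\bar{Q}$.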
RMSE of Mean Monthly Earnings
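One common way to compute an RMSE of this kind is to compare the estimate from each simulation repetition with the value from the complete data. The sketch below assumes that definition; the slides do not state the exact formula used, and the function name is hypothetical.

```python
import math


def rmse_of_estimates(estimates, true_value):
    """Root mean squared error of per-repetition estimates around the
    value computed from the complete ("true") data."""
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Computing this once per method and subtracting (SRMI minus hot deck) gives a difference that is negative when SRMI is closer to the truth.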
Concluding Remarks • Results show the model-based approach to imputation is a feasible alternative to hot deck for imputing missing values in the SIPP and should be further explored. • Model can incorporate more information than the hot deck without depleting the donor pool. • Possibility to use any available auxiliary information (e.g., administrative data). • Set up the model in a multiple imputation environment so we can estimate variances. • Disadvantage of using package mi for SRMI: computationally intensive.
Thank you! maria.m.garcia@census.gov