
María García, Chandra Erdman, and Ben Klemens

Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP). María García, Chandra Erdman, and Ben Klemens. Outline. Background on the Survey of Income and Program Participation (SIPP) Methods for missing data imputation - Randomized Hot deck


Presentation Transcript


  1. Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP) María García, Chandra Erdman, and Ben Klemens

  2. Outline • Background on the Survey of Income and Program Participation (SIPP) • Methods for missing data imputation - Randomized Hot deck - SRMI • Simulation study • Evaluation • Concluding remarks

  3. Background on the SIPP • Longitudinal survey, data collected in panels with interviews at set frequencies (2-4 years) • Demographic characteristics, assets, liabilities, labor force participation, earnings, etc. • Provide comprehensive information about income and program participation • Evaluate federal, state, and local programs and provide measures of economic well-being

  4. Background on the SIPP • Hot deck for most missing data imputation • Recent major redesign • Research ways to improve data processing. • Explore alternative imputation methods • Focus on missing monthly job-level earnings (twelve variables) • Sequential Regression Multivariate Imputation (SRMI, Raghunathan et al., 2001)

  5. Sequential Regression Multivariate Imputation (SRMI) • Data matrix $Y$ ($n \times p$) • Each column $Y_j$, $j = 1, \dots, p$ • Imputations are based on univariate distributions • Instead of drawing from a joint distribution for $p$ variables, draw $p$ times from the univariate conditional distribution for each variable, $f(Y_j \mid Y_{-j})$

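  6. SRMI • Impute missing values sequentially, conditioning on observed and imputed variables • Regression model for each variable $Y_j$ given the other variables $Y_{-j}$, with parameters $\theta_j$ • Impute sequentially for each variable: 1. Draw $\theta_j$ from $p(\theta_j \mid Y_j^{\text{obs}}, Y_{-j})$ 2. Draw $Y_j^{\text{mis}}$ from $f(Y_j^{\text{mis}} \mid Y_{-j}; \theta_j)$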
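The two-step draw on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the R package mi and not the authors' code: it assumes every variable is continuous and modeled by a normal linear regression with a flat prior, and the function name `srmi_impute` and its parameters are hypothetical. Real SRMI implementations choose a regression type per variable (e.g., logistic for indicators).

```python
import numpy as np

def srmi_impute(Y, n_iter=10, seed=0):
    """Return one completed copy of Y (n x p array, NaN = missing).

    Assumption: each column is imputed with a normal linear regression
    on all other columns; this is a simplification for illustration.
    """
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    miss = np.isnan(Y)
    filled = Y.copy()

    # Initialize missing entries with the observed column means.
    col_means = np.nanmean(Y, axis=0)
    for j in range(Y.shape[1]):
        filled[miss[:, j], j] = col_means[j]

    for _ in range(n_iter):
        for j in range(Y.shape[1]):
            if not miss[:, j].any():
                continue
            obs = ~miss[:, j]
            # Predictors: intercept plus all other columns (observed or imputed).
            X = np.column_stack([np.ones(len(Y)), np.delete(filled, j, axis=1)])
            Xo, yo = X[obs], filled[obs, j]
            XtX_inv = np.linalg.inv(Xo.T @ Xo)
            beta_hat = XtX_inv @ Xo.T @ yo
            resid = yo - Xo @ beta_hat
            df = len(yo) - X.shape[1]

            # Step 1: draw the regression parameters from their posterior.
            sigma2 = (resid @ resid) / rng.chisquare(df)
            cov = sigma2 * XtX_inv
            cov = (cov + cov.T) / 2  # guard against round-off asymmetry
            beta = rng.multivariate_normal(beta_hat, cov)

            # Step 2: draw the missing values given the drawn parameters.
            n_mis = int(miss[:, j].sum())
            noise = rng.normal(0.0, np.sqrt(sigma2), n_mis)
            filled[miss[:, j], j] = X[miss[:, j]] @ beta + noise
    return filled
```

Calling the function several times with different seeds yields several completed data sets, which is what a multiple-imputation analysis needs.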

  7. Simulation Study • SRMI - R package mi (Su et al., 2011) - Job-level earnings indicator – logistic regression - Monthly earnings indicator imputed to positive – impute corresponding missing earnings using SRMI • Hot deck - TEA’s randomized hot deck (Klemens, 2012) • Multiple imputation
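For comparison, the general idea of a randomized hot deck can be sketched as follows. This is not TEA's implementation; the function name `randomized_hot_deck`, the record layout, and the choice of cell variables are assumptions made for illustration. Each missing value is replaced by the value of a donor drawn at random from records in the same cell.

```python
import random
from collections import defaultdict

def randomized_hot_deck(records, target, cell_vars, seed=0):
    """Fill missing `target` values (None) with a random donor from the same cell."""
    rng = random.Random(seed)

    # Build the donor pool: observed target values grouped by cell.
    donors = defaultdict(list)
    for r in records:
        if r[target] is not None:
            donors[tuple(r[v] for v in cell_vars)].append(r[target])

    completed = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r[target] is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # value stays missing when the cell has no donors
                r[target] = rng.choice(pool)
        completed.append(r)
    return completed
```

The empty-pool branch shows the practical limit of the approach: every variable added to `cell_vars` splits the cells further, so fewer donors remain in each one.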

  8. Simulation Study • Simulation data - Complete 2004 SIPP panel data – “true” - Randomly select multiple sets of 10% of observations for which the job-level earnings are to be set to missing (100 repetitions) • Explanatory variables - Age, sex, race, education, occupation, industry, firm size, job-type, hours, lead, lag, etc.
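The masking step of the simulation might look like the sketch below. The slide does not say how the 10% samples were drawn, so simple random sampling without replacement is assumed here, and `mask_replicates` is a hypothetical helper name.

```python
import random

def mask_replicates(n_obs, frac=0.10, reps=100, seed=0):
    """Return `reps` lists of row indices whose earnings are set to missing.

    Assumption: each replicate is an independent simple random sample
    (without replacement) of about `frac` of the observations.
    """
    rng = random.Random(seed)
    k = round(frac * n_obs)
    return [sorted(rng.sample(range(n_obs), k)) for _ in range(reps)]
```

Each index list is applied to the complete ("true") data, the masked values are imputed by both methods, and the imputations are compared with the values that were removed.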

  9. Average Difference in RMSE (SRMI – Hot Deck) (no content)

  10. Between-Imputation, Within-Imputation, and Total Variance of Mean Monthly Earnings for Some Months (no content)
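The three variance components named in this slide title are usually obtained with Rubin's combining rules for multiple imputation. The transcript does not give the formulas used, so the sketch below assumes the standard ones: with m completed data sets, W is the average within-imputation variance, B is the sample variance of the m point estimates, and the total variance is T = W + (1 + 1/m) B. The function name `rubin_combine` is hypothetical.

```python
from statistics import mean, variance

def rubin_combine(estimates, within_variances):
    """Combine m point estimates and their variances with Rubin's rules.

    Returns (pooled estimate, within variance W, between variance B, total T).
    """
    m = len(estimates)
    q_bar = mean(estimates)       # pooled point estimate
    w = mean(within_variances)    # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b       # total variance
    return q_bar, w, b, t
```

For example, three estimates of a monthly mean (10, 12, 14), each with variance 1, give W = 1, B = 4, and T = 1 + (4/3)(4) = 19/3.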

  11. RMSE of Mean Monthly Earnings (no content)
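The transcript does not define the RMSE used. One plausible reading, assumed here, is the root mean squared error of the estimated mean monthly earnings across the simulation repetitions, measured against the mean computed from the complete ("true") data. The helper name `rmse` is hypothetical.

```python
from math import sqrt

def rmse(estimates, truth):
    """Root mean squared error of repeated estimates around a known true value."""
    return sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))
```

Under this reading, a difference such as `rmse(srmi_means, true_mean) - rmse(hot_deck_means, true_mean)` is negative when SRMI is closer to the truth; `srmi_means` and `hot_deck_means` stand for the per-repetition estimates from each method.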

  12. Concluding Remarks • Results show the model-based approach to imputation is a feasible alternative to hot deck for imputing missing values in the SIPP and should be further explored. • Model can incorporate more information than the hot deck without depleting the donor pool. • Possibility to use any available auxiliary information (e.g., administrative data). • Set up the model in a multiple imputation environment so we can estimate variances. • Disadvantage of using package mi for SRMI: computationally intensive

  13. Thank you! maria.m.garcia@census.gov
