María García, Chandra Erdman, and Ben Klemens

Multiple Imputation Methods for Imputing Earnings in the Survey of Income and Program Participation (SIPP)

Outline • Background on the Survey of Income and Program Participation (SIPP) • Methods for missing data imputation - Randomized hot deck - SRMI • Simulation study • Evaluation • Concluding remarks

Background on the SIPP • Longitudinal survey, data collected in panels with interviews at set frequencies (2-4 years) • Demographic characteristics, assets, liabilities, labor force participation, earnings, etc. • Provides comprehensive information about income and program participation • Used to evaluate federal, state, and local programs and to provide measures of economic well-being

Background on the SIPP • Hot deck used for most missing data imputation • Recent major redesign • Research ways to improve data processing • Explore alternative imputation methods • Focus on missing monthly job-level earnings (twelve variables) • Sequential Regression Multivariate Imputation (SRMI, Raghunathan et al., 2001)

Sequential Regression Multivariate Imputation (SRMI) • Data matrix $Y = (Y_1, \dots, Y_p)$ • Each column $Y_j$ is one variable • Imputations are based on univariate distributions • Instead of drawing from a joint distribution for the $p$ variables, draw $p$ times from the univariate conditional distribution for each variable, $f(Y_j \mid Y_{-j})$

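SRMI • Impute missing values sequentially, conditioning on observed and imputed variables • Regression model for each variable $Y_j$ given the other variables $Y_{-j}$, with parameters $\theta_j$ • Impute sequentially for each variable: 1. Draw $\theta_j$ from its posterior given the observed data, $p(\theta_j \mid Y_j^{obs}, Y_{-j})$ 2. Draw the missing values from $p(Y_j^{mis} \mid Y_{-j}; \theta_j)$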
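The two draws on this slide can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation and not the `mi` package: it assumes every variable is continuous and modeled by a normal linear regression with a flat prior, and the names `draw_imputations`, `srmi`, and `n_cycles` are hypothetical.

```python
import numpy as np

def draw_imputations(X, y, X_mis, rng):
    """One regression-based draw: parameters first, then missing values."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    # Step 1: draw the parameters from their posterior (flat-prior assumption).
    sigma2 = resid @ resid / rng.chisquare(n - k)
    chol = np.linalg.cholesky(sigma2 * XtX_inv)
    beta = beta_hat + chol @ rng.standard_normal(k)
    # Step 2: draw the missing values given the drawn parameters.
    noise = rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])
    return X_mis @ beta + noise

def srmi(data, n_cycles=5, seed=0):
    """Cycle over the columns, imputing each from all the others."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Start from column means so every regression has complete predictors.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    n, p = filled.shape
    for _ in range(n_cycles):
        for j in range(p):
            mis_j = missing[:, j]
            if not mis_j.any():
                continue
            # Predictors are the other columns, observed or currently imputed.
            design = np.column_stack([np.ones(n), np.delete(filled, j, axis=1)])
            filled[mis_j, j] = draw_imputations(
                design[~mis_j], filled[~mis_j, j], design[mis_j], rng
            )
    return filled
```

Calling `srmi` several times with different seeds would give several completed data sets, which is what a multiple imputation analysis needs.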

Simulation Study • SRMI - R package mi (Su et al., 2011) - Job-level earnings indicator: logistic regression - When the monthly earnings indicator is imputed as positive, impute the corresponding missing earnings using SRMI • Hot deck - TEA's randomized hot deck (Klemens, 2012) • Multiple imputation
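For comparison, the idea behind a randomized hot deck can be sketched as follows. This is an illustrative sketch, not TEA's implementation: it assumes records are grouped into imputation cells by some categorical key and that each missing value is filled by a donor drawn at random, with replacement, from the same cell. The function name `random_hot_deck` and the `cells` argument are hypothetical.

```python
import numpy as np

def random_hot_deck(values, cells, rng):
    """Fill each missing value with a random observed donor from its cell."""
    values = np.asarray(values, dtype=float).copy()
    cells = np.asarray(cells)
    for cell in np.unique(cells):
        in_cell = cells == cell
        donors = values[in_cell & ~np.isnan(values)]
        recipients = in_cell & np.isnan(values)
        # A cell with no donors is left unimputed in this sketch.
        if donors.size and recipients.any():
            values[recipients] = rng.choice(
                donors, size=int(recipients.sum()), replace=True
            )
    return values
```

The empty-cell branch shows why finer cells are costly for a hot deck: each extra classifying variable splits the donors into smaller pools.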

Simulation Study • Simulation data - Complete 2004 SIPP panel data – “true” - Randomly select multiple sets of 10% of observations for which the job-level earnings are to be set to missing (100 repetitions) • Explanatory variables - Age, sex, race, education, occupation, industry, firm size, job-type, hours, lead, lag, etc.
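The masking design on this slide can be sketched as a small loop. This is a hedged illustration under stated assumptions, not the study's code: it works on a single earnings vector, treats the complete data as truth, masks 10% of records at random in each of 100 repetitions, and summarizes the error of the estimated mean as an RMSE across repetitions. The names `simulate_rmse_of_mean`, `impute`, and `mean_fill` are hypothetical, and `mean_fill` is only a stand-in for a real imputation method.

```python
import numpy as np

def simulate_rmse_of_mean(true_values, impute, n_reps=100, frac=0.10, seed=0):
    """RMSE of the imputed-data mean against the true mean over repetitions."""
    rng = np.random.default_rng(seed)
    true_values = np.asarray(true_values, dtype=float)
    n = true_values.size
    n_mask = int(round(frac * n))
    true_mean = true_values.mean()
    errors = np.empty(n_reps)
    for r in range(n_reps):
        masked = true_values.copy()
        # Randomly choose the records whose earnings are set to missing.
        idx = rng.choice(n, size=n_mask, replace=False)
        masked[idx] = np.nan
        completed = impute(masked, rng)
        errors[r] = completed.mean() - true_mean
    return float(np.sqrt(np.mean(errors ** 2)))

def mean_fill(masked, rng):
    """Stand-in imputer: replace missing values with the observed mean."""
    out = masked.copy()
    out[np.isnan(out)] = np.nanmean(masked)
    return out
```

Running the same loop with two imputers and the same seed applies identical missingness patterns to both, so the difference in RMSE reflects the methods rather than the masks.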

Average Difference in RMSE (SRMI – Hot Deck)

Between-Imputation, Within-Imputation, and Total Variance of Mean Monthly Earnings for Some Months
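The three variance components named on this slide are conventionally obtained with Rubin's combining rules. The sketch below assumes that convention; the function name `combine_rubin` and its arguments are hypothetical. Each of the m completed data sets supplies a point estimate (for example, mean monthly earnings) and that estimate's sampling variance.

```python
import numpy as np

def combine_rubin(estimates, variances):
    """Combine m per-imputation estimates and variances with Rubin's rules."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()            # combined point estimate
    within = u.mean()           # within-imputation variance
    between = q.var(ddof=1)     # between-imputation variance
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, within, between, total
```

For example, `combine_rubin([10, 12, 14], [1, 1, 1])` gives a combined estimate of 12, within-imputation variance 1, between-imputation variance 4, and total variance 1 + (4/3)(4), about 6.33.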

RMSE of Mean Monthly Earnings

Concluding Remarks • Results show the model-based approach to imputation is a feasible alternative to hot deck for imputing missing values in the SIPP and should be further explored. • The model can incorporate more information than the hot deck without depleting the donor pool. • Any available auxiliary information (e.g., administrative data) can be used. • Setting up the model in a multiple imputation environment makes it possible to estimate variances. • Disadvantage of using package mi for SRMI: computationally intensive

Thank you! maria.m.garcia@census.gov
