Testing New Imputation Methods for Earnings Collected by the Survey of Income and Program Participation (SIPP)

1. Testing New Imputation Methods for Earnings Collected by the Survey of Income and Program Participation (SIPP). Presentation to the ASA/SRM SIPP Working Group, November 17, 2009. Gary Benedetto and Martha Stinson.

2. Census Imputation Research Plan. Few changes have been made to the actual production imputation methods in many years. The redesign of the SIPP is an opportunity to consider what changes might be made. Goal of this paper: test a new method on an important income variable, job-level earnings.

3. Proposed Improvements. (1) A model-based approach. (2) Use of administrative data to mitigate problems caused when survey data are not "missing at random". (3) Multiple imputation.

4. Model-based Approach: Advantages. The current method (hot-deck) depends on a donor matrix with reasonable cell sizes. Problem: a large number of stratification variables produces cells with no donors. Solution: cold-deck values are used. Result: imputations take account of fewer job or person characteristics.

5. Model-based Approach: Implementation. We employed an imputation method that used linear regressions to impute missing values. We stratified the sample by a set of characteristics and ran regressions for each sub-group that was large enough. Sub-groups that were too small were combined. Variables that were dropped from the stratification list were added as explanatory variables in the regression.
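The collapsing step can be sketched as follows. This is a minimal illustration, not the authors' production code: the slide does not say in what order variables were dropped or what counts as "large enough", so the rule used here (drop the last stratification variable first, with a hypothetical `min_size` threshold) and the field names `age_cat` and `der_pos` are assumptions.

```python
from collections import defaultdict

def assign_cells(records, byvars, min_size):
    """Assign each record to a stratification cell, collapsing small cells.

    Cells with fewer than min_size records are combined by dropping the last
    remaining stratification variable (an assumed rule). Returns a dict:
    record index -> (variables kept, cell key, variables dropped).
    The dropped variables are the ones that would then enter the cell's
    regression as explanatory variables.
    """
    assignment = {}
    pending = list(range(len(records)))
    keep = list(byvars)
    while pending:
        groups = defaultdict(list)
        for i in pending:
            groups[tuple(records[i][v] for v in keep)].append(i)
        dropped = tuple(v for v in byvars if v not in keep)
        pending = []
        for key, members in groups.items():
            # Once no stratifiers remain, the single leftover cell is accepted
            # whatever its size.
            if len(members) >= min_size or not keep:
                for i in members:
                    assignment[i] = (tuple(keep), key, dropped)
            else:
                pending.extend(members)
        if keep:
            keep.pop()
    return assignment

# Hypothetical records: an age category and a DER positive-earnings flag.
records = (
    [{"age_cat": "25-44", "der_pos": 1}] * 3
    + [{"age_cat": "25-44", "der_pos": 0}]
    + [{"age_cat": "45-64", "der_pos": 1}]
    + [{"age_cat": "45-64", "der_pos": 0}]
)
assignment = assign_cells(records, ["age_cat", "der_pos"], min_size=2)
for i in sorted(assignment):
    print(i, assignment[i])
```

In this toy run the first three records keep both stratifiers, the two "45-64" records are combined into one cell with `der_pos` moved to the regressor list, and the remaining record falls into a catch-all cell.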

6. Data not "Missing at Random": The Problem. All imputation methods that use survey data exclusively are built on the assumption that the relationships between survey variables are the same for everyone, regardless of missing data. Assume the relationship between X1, X2, X3 and Y can be estimated. Assume that if Y is missing, X1, X2, and X3 are good predictors. However, if the relationship between Y and X1, X2, X3 is different when Y is missing, the imputation will be flawed.
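A small deterministic example may make the problem concrete. All numbers below are invented for illustration: respondents follow Y = 10 + 2X, while people with missing Y follow Y = 5 + 2X. A regression fitted on respondents alone recovers the respondents' relationship exactly and therefore overstates the imputed values.

```python
def ols_fit(xs, ys):
    """One-regressor least squares; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical respondents: Y observed, relationship Y = 10 + 2X.
x_resp = [1.0, 2.0, 3.0, 4.0, 5.0]
y_resp = [10.0 + 2.0 * x for x in x_resp]

# Hypothetical nonrespondents: same X values, but a different (unobserved)
# relationship Y = 5 + 2X, so the data are not missing at random.
x_miss = [1.0, 2.0, 3.0, 4.0, 5.0]
y_true_miss = [5.0 + 2.0 * x for x in x_miss]

# Impute using the model estimated from respondents only.
intercept, slope = ols_fit(x_resp, y_resp)
y_imputed = [intercept + slope * x for x in x_miss]

bias = sum(y_imputed) / len(y_imputed) - sum(y_true_miss) / len(y_true_miss)
print("mean imputed minus mean true:", round(bias, 6))
```

Here the imputed mean exceeds the true mean by the full 5-unit gap between the two intercepts, and nothing in the survey data alone would reveal it.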

7. Data not "Missing at Random": Lessening the Impact. Information from an outside source can help account for differences between people that are unobservable in the survey. We used administrative earnings data in the imputation model for this purpose.

8. Multiple Imputation: Motivation. Since the 1970s, Donald Rubin has argued that imputation adds variability to user-calculated statistics. Traditional methods impute only once, so the user has no way to account for this variability. Multiple imputation allows the user to calculate a variance that includes a component due to imputation.

9. Multiple Imputation: Our Approach. We impute earnings 4 times by estimating the posterior predictive distribution and taking draws from it. This creates 4 separate data sets, or implicates. For people with non-imputed values, earnings are identical across implicates. For people with imputed values, earnings vary across implicates. We use these 4 data sets in the analysis that follows and combine results from the 4 implicates using simple formulae published by Rubin.
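The combining formulae are not reproduced on the slide; the sketch below uses the standard Rubin rules for a scalar estimate (average the point estimates, then add the average within-implicate variance to the between-implicate variance inflated by 1 + 1/m). The four estimates and variances are hypothetical numbers chosen only to show the arithmetic.

```python
def combine_implicates(estimates, variances):
    """Combine per-implicate estimates with Rubin's rules.

    estimates: point estimate from each implicate
    variances: sampling variance of that estimate in each implicate
    Returns (combined estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total

# Hypothetical mean monthly earnings and variances from 4 implicates.
estimates = [2500.0, 2540.0, 2480.0, 2520.0]
variances = [400.0, 420.0, 380.0, 400.0]

q_bar, within, between, total = combine_implicates(estimates, variances)
print("combined estimate:", q_bar)
print("within, between, total variance:", within, between, total)
```

The between-implicate term is the piece of variance due to imputation that a single-imputation data set cannot supply.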

10. Project Specifics: SIPP Data. SIPP collects information on 2 jobs per wave. Earnings in the public-use data are given for each job in each month. Imputation flags indicate when a hot-deck imputation was performed. We create a person-job level data set with person characteristics, job characteristics, and reported earnings but no original imputations. We merge on data from the Detailed Earnings Record (DER) extract from the Social Security Administration's Master Earnings File. DER data are earnings reported on W-2 forms: gross, uncapped, and employer-specific.

11. Project Specifics: Sample. We impute earnings for people who matched to the administrative data and were 15+ years old at the time of the job. We impute earnings for jobs that were not unpaid family jobs and were not originally a type Z imputation. We impute earnings for months when the respondent was actually interviewed (i.e., we don't do missing-wave imputation) and the job was ongoing. Summary: we impute missing earnings reports during the time period of a reported job.

12. Project Specifics: Process. Step 1: Use a Bayesian bootstrap to impute whether a missing month had positive or zero earnings. Find a donor based on the stratification variables, but take account of sample uncertainty. We chose this because this is a relatively rare event and the sample size wasn't big enough to do model-based imputation.
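As a rough sketch of Step 1, the code below applies Rubin's Bayesian bootstrap within one stratification cell: instead of giving every donor equal probability, it first draws a random probability vector over the donors (a flat Dirichlet, generated from the gaps between sorted uniform draws) and then picks donors with those probabilities. The slide does not give implementation details, so the function name, the donor values, and the cell itself are hypothetical.

```python
import random

def bayesian_bootstrap_impute(donor_values, n_missing, rng):
    """Impute n_missing values from donors in one cell via the Bayesian bootstrap.

    Returns (donor probabilities, imputed values).
    """
    n = len(donor_values)
    # Gaps between n-1 sorted uniforms on [0, 1] form a Dirichlet(1, ..., 1) draw.
    cuts = sorted(rng.random() for _ in range(n - 1))
    edges = [0.0] + cuts + [1.0]
    probs = [edges[i + 1] - edges[i] for i in range(n)]
    # Select a donor for each missing case using the drawn probabilities.
    draws = rng.choices(donor_values, weights=probs, k=n_missing)
    return probs, draws

# Hypothetical cell: 1 = donor month had positive earnings, 0 = zero earnings.
donors = [1, 1, 1, 0, 1, 1, 1, 1]
rng = random.Random(2009)

# One independent draw of probabilities and donors per implicate.
implicates = [bayesian_bootstrap_impute(donors, n_missing=3, rng=rng)[1]
              for _ in range(4)]
probs, draws = bayesian_bootstrap_impute(donors, n_missing=3, rng=rng)
print(implicates)
```

Redrawing the donor probabilities for each implicate is what carries the uncertainty about the cell's true share of positive-earnings months into the imputations.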

13. Project Specifics: Process (cont.). If the respondent was imputed to have positive earnings, use a linear regression model to impute earnings. Imputed monthly earnings is a random variable. Its distribution has two sources of variation: variation in the error term of the regression model, and variation in the estimated parameters, the β's and σ². Take draws from the distributions of the β's and σ² and of the error term. Use the draws to calculate a predicted value based on the observed X variables. The predicted value is the new impute. Take four separate draws to create four implicates.
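The draw described above can be sketched for a regression with an intercept and a single X variable. Several things here are assumptions, not statements about the authors' model: the standard noninformative prior (under which the error variance is drawn as SSE divided by a chi-square draw and the coefficients are normal around the least-squares estimates), the single hypothetical predictor, and the invented data. The regression is written in centred form so that the intercept and slope can be drawn independently.

```python
import math
import random

def fit_centred_ols(xs, ys):
    """Least squares for y = c + b * (x - mean_x); returns (c, b, sse, mean_x, sxx)."""
    n = len(xs)
    mx = sum(xs) / n
    c_hat = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b_hat = sum((x - mx) * (y - c_hat) for x, y in zip(xs, ys)) / sxx
    sse = sum((y - c_hat - b_hat * (x - mx)) ** 2 for x, y in zip(xs, ys))
    return c_hat, b_hat, sse, mx, sxx

def draw_impute(xs, ys, x_new, rng):
    """One draw from the posterior predictive distribution of y at x_new."""
    n = len(xs)
    c_hat, b_hat, sse, mx, sxx = fit_centred_ols(xs, ys)
    # Draw the error variance: SSE / chi-square with n - 2 degrees of freedom
    # (a chi-square(k) draw is a Gamma draw with shape k/2 and scale 2).
    sigma2 = sse / rng.gammavariate((n - 2) / 2.0, 2.0)
    sigma = math.sqrt(sigma2)
    # Draw the coefficients given sigma; in centred form they are independent.
    c = c_hat + rng.gauss(0.0, sigma / math.sqrt(n))
    b = b_hat + rng.gauss(0.0, sigma / math.sqrt(sxx))
    # Add a draw of the error term to the predicted value.
    return c + b * (x_new - mx) + rng.gauss(0.0, sigma)

# Hypothetical complete cases: x = prior-month earnings, y = current-month earnings.
xs = [1800.0, 2100.0, 2500.0, 2900.0, 3200.0, 3600.0, 4100.0, 4500.0]
ys = [1900.0, 2050.0, 2600.0, 2800.0, 3300.0, 3500.0, 4200.0, 4400.0]

rng = random.Random(17)
implicates = [draw_impute(xs, ys, x_new=3000.0, rng=rng) for _ in range(4)]
print([round(v, 2) for v in implicates])
```

Each call redraws the error variance, the coefficients, and the error term, so the four imputed values differ by more than regression noise alone would produce.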

14. Project Specifics: Modeling. We use the following variables to stratify the sample (byvars): age categories, number of jobs in SIPP by number of jobs in DER, positive earnings from DER, month in SIPP sample, and positive earnings in SIPP in the prior and post month. We use the following variables as control variables (xvars) in the linear regressions: age, male, race, education, leads and lags of positive-earnings indicators from DER and SIPP, and leads and lags of earnings from DER and SIPP.

15. Results: Distributions. Job-level earnings for January 2004.

16. Results: Distributions. Job-level earnings for January 2004, by imputation group.

17. Results: Distributions. Person-level earnings for 2004.

18. Results: Sub-Sample Means.

19. Results: Earnings Volatility.

20. Results: Correlations.

21. Conclusion. The research phase of new imputation methods takes time. Next steps: try imputing for those without administrative data; iterate several times, which may smooth out volatility; try with the new EHC instrument.
