ITEM-NON-RESPONSE AND IMPUTATION OF LABOR INCOME IN PANEL SURVEYS: A CROSS-NATIONAL COMPARISON

Joachim R. Frick (DIW Berlin and IZA Bonn) and Markus M. Grabka (DIW Berlin)
Presentation at the IARIW 29th General Conference, Joensuu, Finland, 22 August 2006
Presented by: Professor Ian Plewis
Centre for Longitudinal Studies, Bedford Group for Lifecourse and Statistical Studies, Institute of Education, University of London

Main features of the paper:
1. Item non-response for income (as for Hawkes and Plewis)
2. Panel data
3. Cross-national comparisons (SOEP, Germany; HILDA, Australia; BHPS, GB)
4. Imputation as used in the three studies

Prevalence of income non-response:
1. HILDA <10%
2. SOEP 14%
3. BHPS 15%
These appear to be average prevalences: do they change with the age of the panel? Are there separate figures for ‘don’t know’ and ‘refusals’?

Income non-response at time t predicts income non-response at time t+1 (supported by Hawkes and Plewis). Income non-response at time t predicts attrition at time t+1 (also supported by Hawkes and Plewis). More generally, the literature suggests that the more item non-response there is at time t in any longitudinal study, the more likely attrition is at time t+1. This suggests that it might be worth directing more resources at these ‘frail’ respondents.

Predictors of income non-response (combining waves using (?) probits or logits): there is a very strong effect of being self-employed. The self-employed are very much less likely to report their income (supported by Hawkes and Plewis), although less so in Germany than in GB and Australia. Is change in employment status associated with change in response behaviour?

Two kinds of imputation methods are used:
1. Predictive mean matching from a regression model, in BHPS.
2. ‘Row and column’ imputation as set out by Little and Su (1989), in HILDA and SOEP.
The authors argue, on the basis of previous research, that the second method is the better of the two.
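As a rough illustration of the second method, the sketch below follows the usual description of the Little and Su procedure: a wave (column) effect taken from the complete cases, a person (row) effect taken from each person's observed waves, and a residual borrowed from the complete case with the nearest row effect. The function name little_su_impute and the toy incomes are assumptions made for illustration only. This is not the code used by HILDA or SOEP, and it assumes positive incomes and at least one observed wave per person.

```python
def little_su_impute(panel):
    """Row-and-column imputation sketch.

    panel: list of persons, each a list of incomes per wave (None = missing).
    Returns a new list with missing values filled in.
    """
    n_waves = len(panel[0])
    complete = [row for row in panel if None not in row]

    # Column (wave) effects: wave mean over complete cases, scaled by the grand mean.
    wave_means = [sum(r[j] for r in complete) / len(complete) for j in range(n_waves)]
    grand_mean = sum(wave_means) / n_waves
    col_effect = [wm / grand_mean for wm in wave_means]

    def row_effect(row):
        # Person-level mean of observed incomes after removing the wave effect.
        adjusted = [row[j] / col_effect[j] for j in range(n_waves) if row[j] is not None]
        return sum(adjusted) / len(adjusted)

    filled = []
    for row in panel:
        if None not in row:
            filled.append(list(row))
            continue
        r_i = row_effect(row)
        # Donor: the complete case whose row effect is closest to r_i.
        donor = min(complete, key=lambda d: abs(row_effect(d) - r_i))
        r_donor = row_effect(donor)
        # Imputed value = row effect * column effect * donor residual,
        # which simplifies to donor income scaled by the ratio of row effects.
        filled.append([
            row[j] if row[j] is not None else donor[j] * r_i / r_donor
            for j in range(n_waves)
        ])
    return filled


# Toy panel: three persons, three waves, one missing income.
toy_panel = [
    [100.0, 110.0, 120.0],
    [200.0, 220.0, 240.0],
    [120.0, None, 144.0],
]
print(little_su_impute(toy_panel))
```

Because the imputed value inherits the donor's wave-to-wave pattern, the method uses longitudinal information that a purely cross-sectional regression imputation does not.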

Both are single imputation methods, presumably devised to fill in holes in public release datasets. However, most of the statistical literature now favours multiple imputation in order properly to represent the sampling variability induced by imputation.
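To make the contrast with single imputation concrete, the sketch below pools an estimate across several imputed datasets using Rubin's combining rules: the pooled variance adds a between-imputation component that a single imputed dataset cannot supply. The function name pool_rubin and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine results from m imputed datasets with Rubin's rules.

    estimates: point estimates, one per imputed dataset.
    variances: their squared standard errors, one per imputed dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Invented example: a coefficient estimated on three imputed datasets.
pooled, total = pool_rubin([10.0, 12.0, 14.0], [1.0, 1.0, 1.0])
print(pooled, total)
```

With a single imputation the between-imputation term is not estimable, so standard errors computed from the filled-in data treat imputed values as if they had been observed.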

The authors consider the effects of the imputation methods used for three issues:
1. Cross-sectional measures of inequality.
2. Longitudinal measures of income mobility.
3. Fixed effects wage regressions: are the fixed effects individuals or sweeps?

We collect panel data to measure and model change and so we should perhaps focus on the effects of imputation on change and on dynamic models.

The authors show that income mobility across quintiles is considerably higher when imputed cases are combined with observed or complete cases than it is when using only the observed cases. However, this difference emerges because there is considerable mobility for the imputed cases and some of this must be due to measurement error generated by the imputations.

A difficulty here is that the authors use cross-sectional imputation, i.e. imputing an income value for each sweep, whereas the real interest is in imputing mobility or change across sweeps.

Suppose we have a panel study with just two sweeps, with income measured in quintiles at each sweep and with item non-response at each sweep. We have three sets of information:
1. Cases with measured income at each sweep, located in the internal 25 cells of a five by five contingency table.
2. The marginal distribution for cases measured at sweep one but not at sweep two.
3. The marginal distribution for cases measured at sweep two but not at sweep one.

Little and Rubin (2002, Ch. 13) show how to use the EM algorithm to estimate the contingency table for all cases, both fully and partially classified, and this approach (or a variant of it that accounts for the ordering of the quintiles) might be more appropriate for this particular question.
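A minimal sketch of the EM idea for a two-sweep table follows, assuming a multinomial model for the cells and ignorable (missing at random) non-response. The function name em_two_way and the example counts are hypothetical, the ordering of the quintiles is not exploited, and every row and column is assumed to contain at least one fully classified case.

```python
def em_two_way(full, row_only, col_only, max_iter=500, tol=1e-10):
    """Estimate cell probabilities of a k-by-k table from partially classified data.

    full:     k-by-k counts for cases observed at both sweeps.
    row_only: counts by sweep-one category for cases missing at sweep two.
    col_only: counts by sweep-two category for cases missing at sweep one.
    """
    k = len(full)
    n_total = sum(sum(r) for r in full) + sum(row_only) + sum(col_only)
    p = [[1.0 / (k * k)] * k for _ in range(k)]  # uniform starting values

    for _ in range(max_iter):
        row_p = [sum(p[i]) for i in range(k)]
        col_p = [sum(p[i][j] for i in range(k)) for j in range(k)]
        new_p = []
        for i in range(k):
            new_row = []
            for j in range(k):
                # E-step: spread partially classified cases over cells in
                # proportion to the current conditional probabilities.
                share_row = p[i][j] / row_p[i] if row_p[i] > 0 else 0.0
                share_col = p[i][j] / col_p[j] if col_p[j] > 0 else 0.0
                expected = full[i][j] + row_only[i] * share_row + col_only[j] * share_col
                # M-step: cell probability = expected count / total cases.
                new_row.append(expected / n_total)
            new_p.append(new_row)
        change = max(abs(new_p[i][j] - p[i][j]) for i in range(k) for j in range(k))
        p = new_p
        if change < tol:
            break
    return p


# Invented 2-by-2 example for brevity; the quintile case would use 5-by-5 counts.
table = em_two_way([[40, 10], [10, 40]], row_only=[10, 10], col_only=[10, 10])
print(table)
```

The estimated table gives mobility (off-diagonal) probabilities for all cases directly, without first imputing an income level at each sweep.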

One of the interesting findings from the estimated wage equations is that the effect of being self-employed on wages is, for all three studies, more positive once the imputed cases are introduced into the analysis.

Concluding remarks
1. This is a very interesting and thought-provoking paper.
2. It shows that imputation for missing income responses can alter substantive conclusions about, for example, income mobility.
3. BUT the single imputation methods currently used by these panel studies are not those most favoured in the statistical literature.
4. AND imputing levels and taking differences might not be the best way of imputing for change.
5. ALSO income non-response is just one facet of missing data and ideally needs to be considered along with unit non-response at the outset of the panel and attrition as the panel ages.
6. AS ALWAYS, SENSITIVITY ANALYSES ARE CRUCIAL.
