Incorporating information about non-response into analyses of NCDS data. Ian Plewis Centre for Longitudinal Studies Bedford Group for Lifecourse and Statistical Studies Institute of Education, University of London 29 June 2006. www.ioe.ac.uk/bedfordgroup.
NCDS longitudinal target and observed samples, sweeps 0 to 6
The substantive question of interest is whether, and how well, we can predict whether or not someone has any educational qualifications at age 23 (i.e. at sweep 4) from circumstances in early childhood (up to age 7, i.e. sweep 1).
Target sample at age 23 = 15885
• Attrition: 1837
• Wave non-response: 2001
• Status not known: 3
So, observed sample at age 23 = 12044
• Item non-response: 1765
So, analysis sample = 10279
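The sample accounting above can be checked with a few lines of arithmetic. This is only a sketch that restates the counts given on the slide; the variable names are illustrative and not part of the original analysis.

```python
# Sample accounting for the age-23 (sweep 4) analysis, using the slide's counts.
target = 15885             # longitudinal target sample at age 23
attrition = 1837           # cases lost to attrition
wave_nonresponse = 2001    # cases missing at this sweep only
status_unknown = 3         # cases whose status is not known
item_nonresponse = 1765    # observed cases with missing items

# Observed sample: target minus every form of unit non-response.
observed = target - attrition - wave_nonresponse - status_unknown

# Analysis sample: observed cases with complete items.
analysis = observed - item_nonresponse

print(observed)   # 12044
print(analysis)   # 10279
print(f"{analysis / target:.1%} of the target sample is analysed")
```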
SUBSTANTIVE MODEL Estimates from probit model for no qualifications at age 23 (n = 10279):
RESPONSE MODEL (1): Estimates from multivariate logistic model for response at age 23 (n = 12853):
RESPONSE MODEL (1): This model generates an estimate of the probability of a response at age 23 and we can use the inverse of this probability as a weight. The application of inverse probability weights assumes that data are ‘missing at random’ or that missingness is ignorable.
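A minimal sketch of inverse probability weighting, under loudly stated assumptions: the data are invented, and the response model is replaced by simple response rates within two hypothetical groups (equivalent to a logistic model with a single binary predictor), not the multivariate logistic model used in the talk. Within each group, response is unrelated to the outcome, so the 'missing at random' assumption holds by construction.

```python
from collections import defaultdict

# Hypothetical population: (group, size, responders, responders with no quals).
# Group A: 600 people, 50% respond, 30% of responders have no qualifications.
# Group B: 400 people, 90% respond, 10% of responders have no qualifications.
DESIGN = [("A", 600, 300, 90), ("B", 400, 360, 36)]

rows = []  # (group, responded, no_quals); outcome is None for non-respondents
for group, size, n_resp, n_resp_noqual in DESIGN:
    for i in range(size):
        responded = i < n_resp
        no_quals = (i < n_resp_noqual) if responded else None
        rows.append((group, responded, no_quals))

# "Response model": estimated probability of response within each group.
counts = defaultdict(lambda: [0, 0])  # group -> [responders, total]
for group, responded, _ in rows:
    counts[group][0] += responded
    counts[group][1] += 1
p_response = {g: r / n for g, (r, n) in counts.items()}

# Respondents only, each weighted by the inverse of their response probability.
respondents = [(g, y) for g, responded, y in rows if responded]
weights = [1.0 / p_response[g] for g, _ in respondents]

unweighted = sum(y for _, y in respondents) / len(respondents)
weighted = sum(w * y for w, (_, y) in zip(weights, respondents)) / sum(weights)

print(round(unweighted, 3))  # 0.191: over-represents the high-response group B
print(round(weighted, 3))    # 0.22: group shares restored to 600:400
```

In this toy case the weighted estimate matches the prevalence implied for the full population (0.6 × 0.30 + 0.4 × 0.10 = 0.22) only because non-response depends on group membership alone; if response also depended on the outcome itself, the weights would not remove the bias.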
SUBSTANTIVE MODEL WEIGHTED FOR NON-RESPONSE FROM (1) N.B. n = 10279 for ‘no weights’; 9767 for ‘response weights’
RESPONSE MODEL (2): Estimates from multivariate logistic model for response at age 23 (n = 8072): From Hawkes and Plewis, JRSS(A), 2006, 3, 479-492.
SUBSTANTIVE MODEL WEIGHTED FOR NON-RESPONSE FROM (2) N.B. n = 10279 for ‘no weights’; 5996 for ‘response weights’
HECKMAN SELECTION MODEL Jointly modelling: (i) the probability of no qualifications at age 23 (probit) and (ii) the probability of being included in the sample at age 23 (probit). Need ‘instruments’ for the selection model – use ‘sex’ and ‘number of family moves, birth to 7’.
HECKMAN SELECTION MODEL Model allows for correlated residuals, i.e. for non-ignorable or informative non-response. Obtain ML estimates from ‘heckprob’ in Stata.
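For reference, the usual textbook formulation of a probit model with sample selection is sketched below. The notation is an assumption introduced here, not taken from the slides: x_i denotes the early-childhood predictors, z_i denotes those predictors plus the instruments, and ρ is the residual correlation.

```latex
\begin{align*}
y_i^{*} &= x_i'\beta + u_{1i}, & y_i &= \mathbf{1}[\,y_i^{*} > 0\,]
  && \text{(no qualifications at age 23)} \\
s_i^{*} &= z_i'\gamma + u_{2i}, & s_i &= \mathbf{1}[\,s_i^{*} > 0\,]
  && \text{(included in the sample at age 23)}
\end{align*}
\[
\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right),
\qquad y_i \text{ is observed only when } s_i = 1 .
\]
\[
\ln L = \sum_{s_i = 1,\; y_i = 1} \ln \Phi_2\!\left(x_i'\beta,\; z_i'\gamma;\; \rho\right)
      + \sum_{s_i = 1,\; y_i = 0} \ln \Phi_2\!\left(-x_i'\beta,\; z_i'\gamma;\; -\rho\right)
      + \sum_{s_i = 0} \ln \Phi\!\left(-z_i'\gamma\right)
\]
```

Here Φ is the standard normal distribution function and Φ₂ the standard bivariate normal distribution function. When ρ = 0 the likelihood factorises into two separate probits and non-response is ignorable for the substantive model; a non-zero ρ corresponds to informative non-response.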
SUBSTANTIVE MODEL ALLOWING FOR SELECTION N.B. n = 10279 for ‘no weights’; 10150 for ‘selection’ Residual correlation = -0.58
CONCLUSIONS
• Applying these corrections for non-response has little effect on the substantive conclusions for this particular model.
• Methodological issues:
Inverse probability weighting:
(1) Standard errors of the estimates should be adjusted to allow for the fact that the weights are themselves estimated.
(2) Might a better adjustment take account of the differences between the attrition cases and the wave non-respondents?
(3) Missing weights – assume they are one (rather than zero)?
Selection models:
(1) Vulnerable to mis-specification.
(2) Depend on the validity of the instruments.
Other approaches:
(1) Imputation, especially multiple imputation.
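The question about missing weights can be made concrete with a small sketch. The outcomes and weights below are invented for illustration, and `weighted_prevalence` is a hypothetical helper, not a function from any package. A case with a missing weight is presumably what lowers n in the weighted analyses; treating the missing weight as zero drops the case, whereas treating it as one keeps the case at its unadjusted contribution.

```python
def weighted_prevalence(outcomes, weights, missing_weight):
    """Weighted proportion, with missing (None) weights set to `missing_weight`.

    Returns the estimate and the number of cases that still contribute.
    """
    filled = [missing_weight if w is None else w for w in weights]
    estimate = sum(w * y for w, y in zip(filled, outcomes)) / sum(filled)
    n_contributing = sum(1 for w in filled if w > 0)
    return estimate, n_contributing

# Hypothetical data: 1 = no qualifications; None = no weight could be estimated.
outcomes = [1, 0, 0, 1, 0, 1]
weights = [2.0, 1.25, None, 1.0, 2.0, None]

est_zero, n_zero = weighted_prevalence(outcomes, weights, missing_weight=0.0)
est_one, n_one = weighted_prevalence(outcomes, weights, missing_weight=1.0)

print(n_zero, round(est_zero, 3))  # 4 0.48  (cases without weights are dropped)
print(n_one, round(est_one, 3))    # 6 0.485 (all cases are retained)
```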