Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol
What’s the problem? • Loss of individuals from a survey over time reduces the sample size • By age 42, about 70% of the original NCDS (National Child Development Study) cohort gave information • Non-random loss can lead to biased estimates • This is especially important when the chance of loss is related to the values of variables that are then not available
Fixing the losses • Preventing loss is another topic; this talk looks at how to compensate for it. • A brief look at traditional weighting procedures • Multiple imputation (MI) – a simple introduction and its application to attrition • Combining MI with weighting
Traditional approach to handling attrition and missing data • Sets of weights • The sample design and any initial non-response provide the basic weights for wave 1 • Over several waves, ‘typical’ response pathways can be defined, each with its own set of weights; e.g. LSYPE (Longitudinal Study of Young People in England) may require 12 or more, depending on the ‘components’ selected • For item non-response, ‘hot deck’ single imputation (possibly weighted) is often used
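A minimal sketch of how wave 1 weights and a pathway weight might be built, assuming a simple weighting-class (cell) adjustment for non-response. The helper `adjust_for_nonresponse`, the class labels and the numbers are hypothetical; actual survey weighting schemes may be more elaborate.

```python
from collections import defaultdict

def adjust_for_nonresponse(weights, classes, responded):
    """Weighting-class non-response adjustment (hypothetical helper).

    Within each class, respondents' weights are scaled up by
    (total weight in class) / (total weight of respondents in class), so that
    respondents also stand in for the non-respondents of their class.
    Non-respondents receive weight 0. Assumes every class with a respondent
    has positive respondent weight.
    """
    total = defaultdict(float)
    responding = defaultdict(float)
    for w, c, r in zip(weights, classes, responded):
        total[c] += w
        if r:
            responding[c] += w
    return [w * total[c] / responding[c] if r else 0.0
            for w, c, r in zip(weights, classes, responded)]

# Hypothetical sample of four members in two weighting classes.
design = [10.0, 10.0, 20.0, 20.0]        # inverse selection probabilities
classes = ["a", "a", "b", "b"]
wave1 = [True, False, True, True]        # initial response
wave3 = [True, True, False, True]        # response at a later wave

base = adjust_for_nonresponse(design, classes, wave1)
# Weights for the pathway "responded at waves 1 and 3", built on the wave 1 weights.
pathway = adjust_for_nonresponse(base, classes,
                                 [a and b for a, b in zip(wave1, wave3)])
print(base)
print(pathway)
```

Each further pathway needs its own call, which is why the number of weight sets grows with the number of waves and components.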
Problems with weighting procedures • Inefficient – can only use the data available for each combination of variables analysed • Restrictive, since weights are only provided for chosen ‘pathways’ • Results may be inconsistent because different analyses use different weights • Not very transparent for users
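The inefficiency point can be illustrated by counting how many cases survive when an analysis needs data from several waves at once. The response pattern below is hypothetical; `usable_cases` is an illustrative helper.

```python
# Hypothetical response indicators for six cohort members over four waves
# (1 = gave information at that wave).
responses = [
    (1, 1, 1, 1),
    (1, 1, 0, 1),
    (1, 0, 1, 1),
    (1, 1, 1, 0),
    (1, 1, 0, 0),
    (1, 0, 0, 1),
]

def usable_cases(responses, waves):
    """Number of individuals observed at every wave in `waves` (0-based indices),
    i.e. the cases a weighted analysis of those waves can actually use."""
    return sum(all(r[w] for w in waves) for r in responses)

for waves in [(0,), (0, 1), (0, 1, 2), (0, 1, 2, 3), (0, 3)]:
    print(waves, usable_cases(responses, waves))
```

Every member contributes at wave 1, but only one is usable for an analysis spanning all four waves, even though most of the others supplied some later information.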
Problems with hot deck imputation • Not based on an explicit statistical model • Suitable ‘matched’ donor cases may not always be available – especially in multilevel data • Single imputation treats imputed values as if they were observed, so correct standard errors are not easily computed (naive ones are understated)
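A minimal random hot-deck sketch, assuming adjustment cells are already defined and missing values are coded as `None`; the helper `hot_deck` and the data are hypothetical. It shows the matching problem directly: a cell with no observed donor cannot be imputed, which becomes more likely as cells are defined more finely, for example within level 2 units.

```python
import random

def hot_deck(values, cells, rng):
    """Random hot-deck single imputation within adjustment cells.

    Each missing value (None) is replaced by the value of a donor drawn at
    random from the observed cases in the same cell. Raises ValueError when a
    cell has no observed donor.
    """
    donors = {}
    for v, c in zip(values, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)
    imputed = []
    for v, c in zip(values, cells):
        if v is None:
            if c not in donors:
                raise ValueError(f"no donor available in cell {c!r}")
            imputed.append(rng.choice(donors[c]))
        else:
            imputed.append(v)
    return imputed

rng = random.Random(0)
print(hot_deck([1.0, None, 3.0, None], ["a", "a", "b", "b"], rng))
```

The returned list looks like complete data, so any standard error computed from it ignores the uncertainty about the imputed values.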
Multiple imputation – very briefly • Consider the model of interest (MOI), e.g. a regression of y on x, assuming x and y are normally distributed • We turn this into a multivariate normal response model, treating both x and y as responses • Run an MCMC chain for this model and obtain residual estimates (draws) for the units where x or y is missing. Use these to ‘fill in’ the missing values and produce a complete data set. Do this independently n times (e.g. n = 20). Fit the MOI to each data set and combine the results using Rubin’s rules to get estimates and standard errors. • Note that other methods (listwise deletion, mean imputation, hot deck etc.) are generally either inefficient or biased.
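One standard way to write the two models and Rubin’s combining rules is sketched below; the notation (intercepts \alpha, residual covariance matrix \Omega_u, per-imputation estimates \hat\theta_k with variances U_k) is assumed for illustration rather than taken from the original slide.

```latex
% Model of interest (MOI): regression of y on x
y_i = \beta_0 + \beta_1 x_i + e_i, \qquad e_i \sim N(0, \sigma_e^2)

% Imputation model: both y and x treated as normal responses
y_i = \alpha_y + u_{yi}, \qquad x_i = \alpha_x + u_{xi}, \qquad
\begin{pmatrix} u_{yi} \\ u_{xi} \end{pmatrix} \sim N_2(\mathbf{0}, \Omega_u)

% Rubin's rules for n imputed data sets
\bar\theta = \frac{1}{n}\sum_{k=1}^{n} \hat\theta_k, \qquad
\bar U = \frac{1}{n}\sum_{k=1}^{n} U_k, \qquad
B = \frac{1}{n-1}\sum_{k=1}^{n} \left(\hat\theta_k - \bar\theta\right)^2

T = \bar U + \left(1 + \frac{1}{n}\right) B, \qquad
\mathrm{SE}(\bar\theta) = \sqrt{T}
```

The between-imputation term B is what single imputation leaves out, which is why its standard errors are too small.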
Attrition treated as missing data • A missing record at a follow-up wave gives an individual with many known and many missing values. • Even where no data at all are collected directly at a wave, ‘auxiliary’ data (interviewer observations etc.) may be available. • Together with ‘item missingness’, we can use MI to ‘fill in’ all the missing data.
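A minimal NumPy sketch of the whole procedure for the two-variable case. It assumes a hypothetical two-wave study with x measured at baseline and y at follow-up, drop-out at follow-up depending on x, and a little item non-response in x; all data are simulated. Missing values are drawn by a data-augmentation Gibbs sampler (alternating draws of the bivariate normal mean and covariance with draws of the missing values). Completed data sets are kept every `thin` iterations after a burn-in so that they are approximately independent, the MOI (regression of y on x) is fitted to each, and the results are combined with Rubin’s rules. The function names, the noninformative prior and the tuning values are assumptions for illustration. This is not the REALCOM-IMPUTE algorithm, which also handles multilevel and discrete data.

```python
import numpy as np

def _sym(a):
    """Symmetrise a matrix to remove floating-point asymmetry."""
    return (a + a.T) / 2

def draw_parameters(data, rng):
    """Posterior draw of bivariate normal (mu, Sigma) given a completed n x 2 data set,
    assuming the usual noninformative prior:
    Sigma ~ Inv-Wishart(n - 1, S), mu | Sigma ~ N(column means, Sigma / n)."""
    n = data.shape[0]
    mean = data.mean(axis=0)
    centred = data - mean
    scatter = centred.T @ centred
    # A Wishart(n - 1, S^-1) draw is a sum of n - 1 outer products of N(0, S^-1)
    # vectors; its inverse is an Inv-Wishart(n - 1, S) draw.
    z = rng.multivariate_normal(np.zeros(2), _sym(np.linalg.inv(scatter)), size=n - 1)
    sigma = _sym(np.linalg.inv(z.T @ z))
    mu = rng.multivariate_normal(mean, sigma / n)
    return mu, sigma

def impute(data, miss, mu, sigma, rng):
    """Draw each missing entry from its normal distribution conditional on the
    observed entry in the same row. Assumes at most one missing entry per row."""
    out = data.copy()
    for j in (0, 1):                      # column being imputed
        k = 1 - j                         # column observed in those rows
        rows = miss[:, j]
        slope = sigma[j, k] / sigma[k, k]
        cond_mean = mu[j] + slope * (data[rows, k] - mu[k])
        cond_sd = np.sqrt(sigma[j, j] - slope * sigma[j, k])
        out[rows, j] = cond_mean + cond_sd * rng.standard_normal(rows.sum())
    return out

def ols_slope(data):
    """MOI: OLS regression of y (column 0) on x (column 1); returns slope and its variance."""
    y, x = data[:, 0], data[:, 1]
    X = np.column_stack([np.ones_like(x), x])
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    return beta[1], s2 * xtx_inv[1, 1]

def multiple_imputation(data, miss, n_imp=20, burn_in=200, thin=50, seed=1):
    """Run the Gibbs sampler, fit the MOI to n_imp thinned completed data sets,
    and combine with Rubin's rules. Returns (estimate, SE, number of imputations)."""
    rng = np.random.default_rng(seed)
    current = data.copy()
    for j in (0, 1):                      # crude start: observed column means
        current[miss[:, j], j] = data[~miss[:, j], j].mean()
    estimates, variances = [], []
    for it in range(1, burn_in + n_imp * thin + 1):
        mu, sigma = draw_parameters(current, rng)
        current = impute(data, miss, mu, sigma, rng)
        if it > burn_in and (it - burn_in) % thin == 0:
            b, v = ols_slope(current)
            estimates.append(b)
            variances.append(v)
    q, u = np.array(estimates), np.array(variances)
    within, between = u.mean(), q.var(ddof=1)
    total = within + (1 + 1 / n_imp) * between
    return q.mean(), np.sqrt(total), len(q)

# Hypothetical two-wave example: x at baseline, y at follow-up (true slope 0.5).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.8, size=n)
data = np.column_stack([y, x])
miss = np.zeros((n, 2), dtype=bool)
miss[:, 0] = rng.random(n) < np.where(x > 0, 0.4, 0.1)   # drop-out depends on baseline x
miss[:, 1] = ~miss[:, 0] & (rng.random(n) < 0.05)        # some item non-response in x
data[miss] = np.nan
estimate, se, n_used = multiple_imputation(data, miss)
print(f"slope {estimate:.3f} (SE {se:.3f}) from {n_used} imputed data sets")
```

Only one missing entry per record is handled here; a record missing both values would need a draw from the full bivariate normal, and real attrition typically leaves many variables missing at once.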
Distributional issues • Many existing MI methods assume multivariate normality. We would like to handle multilevel data, and mixtures of normal and discrete variables, with missing data. • The ESRC REALCOM project developed an MCMC algorithm and software for these cases. • REALCOM-IMPUTE links REALCOM with MLwiN and can handle variables at level 2 as well as discrete variables. • It works by transforming discrete variables to normality using a ‘latent variable’ model, so that all response variables have a joint multivariate normal distribution, and then applies MI theory.
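For a binary variable, the ‘latent variable’ idea can be sketched as a probit-style step: the observed 0/1 value is treated as the sign of an underlying normal variable, and within an MCMC iteration that latent value is drawn from a normal distribution truncated to agree with the observed category. The sketch assumes unit variance, a threshold at 0 and a given mean (in a full sampler the mean would come from the current multivariate normal model). `draw_latent` is a hypothetical helper, not REALCOM code, and ordered or unordered categorical variables need further machinery not shown here.

```python
import random
from statistics import NormalDist

def draw_latent(y, mean, rng):
    """Draw z ~ N(mean, 1) truncated to agree with binary y:
    z > 0 if y == 1, z <= 0 if y == 0 (probit threshold at 0).
    Uses inverse-CDF sampling; assumes |mean| is moderate (say below 6)."""
    dist = NormalDist(mean, 1.0)
    p0 = dist.cdf(0.0)                      # P(z <= 0)
    lo, hi = (p0, 1.0) if y == 1 else (0.0, p0)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1 - 1e-12)       # keep inv_cdf's argument inside (0, 1)
    return dist.inv_cdf(u)

rng = random.Random(42)
observed = [1, 0, 1]                        # hypothetical binary responses
means = [0.3, -0.2, 0.8]                    # hypothetical current linear predictors
latent = [draw_latent(y, m, rng) for y, m in zip(observed, means)]
print(latent)
```

The latent values can then be treated as normal responses alongside the continuous variables, which is what allows a single multivariate normal imputation model.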
Putting weights into MI • Consider a 2-level model, with level 1 units i nested within level 2 units j • Write the level 2 weights as $w_j$ • the level 1 weights for the j-th level 2 unit as $w_{i|j}$ • and the final level 1 weights as $w_{ij} = w_j w_{i|j}$ • These weights can be used for the MOI and also for imputation. This involves MCMC estimation using weighted likelihoods, where variances are inversely proportional to the weights.
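A single-level sketch of the two ideas above: forming the final level 1 weights as a product, and fitting a normal linear model in which each unit’s residual variance is divided by its weight. For a normal model, weighting each unit’s log-likelihood contribution by $w_{ij}$ is equivalent, for the fixed coefficients, to using residual variance $\sigma^2 / w_{ij}$, which gives weighted least squares. The helper names and data are hypothetical, and the level 2 variance and the MCMC machinery are not shown.

```python
import numpy as np

def final_level1_weights(level2_w, cond_level1_w, cluster):
    """w_ij = w_j * w_{i|j}: each level 1 unit's conditional weight times the
    level 2 weight of its cluster (cluster holds 0-based level 2 indices)."""
    return (np.asarray(level2_w, dtype=float)[cluster]
            * np.asarray(cond_level1_w, dtype=float))

def weighted_fit(X, y, w):
    """Maximise a normal likelihood with var(e_ij) = sigma^2 / w_ij, i.e. weighted
    least squares, by rescaling each row by sqrt(w_ij) and solving ordinary LS."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Hypothetical data: 4 pupils in 2 schools.
cluster = np.array([0, 0, 1, 1])
w = final_level1_weights([2.0, 3.0], [1.0, 0.5, 1.0, 2.0], cluster)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
X = np.column_stack([np.ones_like(x), x])
print(w, weighted_fit(X, y, w))
```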
References • Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling (to appear). Gives the methodological background. • Goldstein, H. (2009). Handling attrition and non-response in longitudinal data. International Journal of Longitudinal and Life Course Studies, April 2009 (http://www.journal.longviewuk.com/index.php/llcs). Discusses the issues for longitudinal studies in detail. • Website for the software: http://www.cmm.bristol.ac.uk/research/Realcom