Handling Missing Data on ALSPAC
Paul Clarke (CMPO, University of Bristol)
ALSPAC Social Science User Group meeting, 21 May 2008
Outline
• What causes missing data?
• ‘Types’ of missing data
• Methods for missing data: quick overview
• ALSPAC ‘Blitz’ on non-respondents
• Investigating MNAR data in ALSPAC
Example ‘ALSPAC’ analysis
• At age 11
• Outcome: Mood (ordinal, 3 categories)
  • Depressive symptoms, maternally rated
• Main exposure: Physical activity (score)
  • Measured on actigraph, 3 days
• Adjustment:
  • BMI (score)
  • Sex, Age at screening
• Ordinal logistic regression
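The slide names the analysis model only as "ordinal logistic regression". A common form is the proportional-odds model, written out below as a sketch. The proportional-odds assumption, the sign convention, and the symbols α_k, β_1 to β_4 and PA (physical activity score) are illustrative assumptions, not taken from the slides.

```latex
\operatorname{logit} \Pr(\mathrm{Mood}_i \le k)
  = \alpha_k - \left( \beta_1\,\mathrm{PA}_i + \beta_2\,\mathrm{BMI}_i
      + \beta_3\,\mathrm{Sex}_i + \beta_4\,\mathrm{Age}_i \right),
  \qquad k = 1, 2
```

With 3 ordered categories there are two cut-points (α_1 < α_2), and under this assumed form the same coefficients apply at both.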
Missing Value (MV) pattern¹
[Table of MV patterns not reproduced]
¹ All MV patterns with < 200 cases ignored
What causes missing data?
[Diagram labels: Incomplete questionnaire; Refusal; Follow-up; Fail to attend clinic; Parent characteristics; Parent & child characteristics; Interviewer effectiveness; Incentive for participant; Loyalty; Letter; Telephone calls; Interviewer visits; Non-contact]
Result of processes leading to:
• Refusal to answer questions (item non-response)
• Refusal to participate (unit non-response)
• No contact (unit non-response)
• Longitudinal-specific: attrition & drop-out
• Non-response mechanism(s) (NRM)
Rubin’s definitions¹
• Missing Completely At Random (MCAR)
  • NRM independent of both observed and missing variables
• Missing At Random (MAR)
  • NRM depends only on observed variables
• Missing Not At Random (MNAR)
  • NRM depends on missing variables too
¹ Little & Rubin (2002) Statistical Analysis with Missing Data
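The three definitions can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an ALSPAC analysis: the variables x and y, the response probabilities, and the function name `simulate` are all hypothetical choices made for illustration.

```python
import math
import random

def simulate(mechanism, n=20000, seed=1):
    """Simulate one non-response mechanism and summarise the complete cases.

    Returns (complete-case mean of y, complete-case OLS slope of y on x).
    In the full data the mean of y is 0 and the slope is 1.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)          # covariate, always observed
        y = x + rng.gauss(0, 1)      # outcome, may be missing
        if mechanism == "MCAR":
            p_respond = 0.6                        # ignores x and y
        elif mechanism == "MAR":
            p_respond = 1 / (1 + math.exp(-x))     # depends on observed x only
        elif mechanism == "MNAR":
            p_respond = 1 / (1 + math.exp(-y))     # depends on y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p_respond:               # R = 1: case is complete
            xs.append(x)
            ys.append(y)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return my, sxy / sxx

if __name__ == "__main__":
    for mech in ("MCAR", "MAR", "MNAR"):
        cc_mean, cc_slope = simulate(mech)
        print(f"{mech}: complete-case mean = {cc_mean:.2f}, slope = {cc_slope:.2f}")
```

Under these assumptions, MCAR should leave both summaries close to their full-data values. MAR should shift the unadjusted mean of y but leave the slope (which conditions on the observed x) close to 1. MNAR should distort both.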
Directed Acyclic Graph (DAG): MCAR data
• R independent of the data
[DAG nodes: Y (outcome), X (exposure), R (response), C (adjustment variables)]
MAR data
• R indirectly related to Y through X and C
[DAG nodes: Y, X, R, C]
Methods for MAR data
• Complete-case analysis/Listwise deletion
• Weighting
  • Weighting classes, post-stratification
• (Single) imputation methods
  • e.g. regression, hot-deck/nearest-neighbour
• Multiple imputation methods
  • e.g. Norm, MICE
• Semiparametric estimators
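As one concrete instance of the weighting methods listed above, the sketch below computes a weighting-class estimate of a mean. It assumes that non-response is MAR given the class and that every class has at least one respondent. The function name `weighting_class_mean` and the toy records are hypothetical.

```python
def weighting_class_mean(records):
    """records: list of (weighting_class, y), with y = None for non-respondents.

    Returns (complete-case mean, weighting-class mean).
    Assumes every class contains at least one respondent.
    """
    totals = {}  # class -> [number sampled, number responding, sum of observed y]
    for cls, y in records:
        t = totals.setdefault(cls, [0, 0, 0.0])
        t[0] += 1
        if y is not None:
            t[1] += 1
            t[2] += y
    n_all = sum(t[0] for t in totals.values())
    n_resp = sum(t[1] for t in totals.values())
    cc_mean = sum(t[2] for t in totals.values()) / n_resp
    # Each respondent in a class is weighted by (sampled / responding), so the
    # class respondent mean stands in for the whole class.
    weighted_mean = sum(t[0] * (t[2] / t[1]) for t in totals.values()) / n_all
    return cc_mean, weighted_mean

# Toy data: class "A" has a 50% response rate, class "B" responds in full.
toy = [("A", 10.0), ("A", 10.0), ("A", None), ("A", None),
       ("B", 20.0), ("B", 20.0), ("B", 20.0), ("B", 20.0)]
cc, wt = weighting_class_mean(toy)
```

In the toy data the complete-case mean is pulled towards class "B" because it responds more often; the weighted mean restores each class to its sampled share.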
Imputation in practice: pitfalls¹
• Omitting the outcome
• Imputing non-normal variables
• MAR completely implausible
• Convergence of iterative procedures
¹ Sterne et al. (2008) British Medical Journal
Complex methods
• Analysis model
  • e.g. Ordinal logistic regression
• Imputation model: Missing given Observed
• ALL assume MAR data
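The slides do not spell out how analysis-model results from multiply imputed data sets are combined. The standard approach is Rubin's rules, sketched below. The function name `pool_rubin` and the example numbers are hypothetical, and the sketch pools a single scalar estimate only.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one scalar estimate across m imputed data sets (Rubin's rules).

    estimates: the point estimate from each imputed data set
    variances: the squared standard error from each imputed data set
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical coefficients and squared standard errors from m = 3 imputations.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what single imputation leaves out, which is why single imputation tends to understate uncertainty.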
MAR data in reality
• Unknown factors drive non-response
• …correlated with model predictors
• …but not with Y
[DAG nodes: ? (unknown factors), Y, X, R, C]
Why is this important?
• Weakness of MAR: How do we know?
• Central problem: missing data is missing!
• MAR is a “leap of faith”
MNAR data
• Unknowns directly correlated with Y?
[DAG nodes: ? (unknown factors), Y, X, R, C]
Physical activity example
• NRM is mother-driven (child age 11)
• Child must wear actigraph for 3 days
• Mother must assess her child’s mood
[DAG nodes: ? (unknown factors), Mood, Phys Act, R, BMI, Sex, Age]
ALSPAC ‘Blitz’
• Co-ordinated by Family Liaison Unit
• 4 tranches: Nov 2007-May 2008
• Target: 5000 teenagers not in last 2 waves
• Mini-clinic for those difficult to persuade
Proposed analysis
• MAR is context dependent
• Risky behaviours (Glyn Lewis et al.)
  • Outcomes: Cannabis use, sexual practices, etc.
  • Risk factors: mental health, sensation seeking, etc.
• Basic analysis:
  • Compare follow-up with main sample
  • Still differences after adjustment?
Unit non-response
• 100% follow-up rate unlikely!
• Directly model NRM
• Continuum of non-response
  • Hard-to-contact cases are less like the main sample
• Weighting scheme (Alho 1990; Wood et al. 2006)
• Lower bound for MNAR bias
Item non-response
• Parallel qualitative post
• Items: questions on risky behaviours
• What mechanisms drive non-response?
• Test hypotheses from this project