
Handling Missing Data on ALSPAC

Paul Clarke (CMPO, University of Bristol). ALSPAC Social Science User Group meeting, 21 May 2008.



  1. Handling Missing Data on ALSPAC Paul Clarke (CMPO, University of Bristol) ALSPAC Social Science User Group meeting 21 May 2008

  2. Outline • What causes missing data? • ‘Types’ of missing data • Methods for missing data: quick overview • ALSPAC ‘Blitz’ on non-respondents • Investigating MNAR data in ALSPAC

  3. Example ‘ALSPAC’ analysis • At age 11 • Outcome: Mood (ordinal, 3 categories) • Depressive symptoms, maternally rated • Main exposure: Physical activity (score) • Measured on actigraph, 3 days • Adjustment: • BMI (score) • Sex, Age at screening • Ordinal logistic regression
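
The ordinal logistic regression named on slide 3 is commonly written as a proportional-odds (cumulative logit) model. The following is only a sketch under that assumption: the symbols \alpha_k and \beta_j and the variable names are illustrative and do not appear in the slides.

```latex
\[
  \operatorname{logit}\,\Pr(\text{Mood}_i \le k \mid \text{PhysAct}_i, \text{BMI}_i, \text{Sex}_i, \text{Age}_i)
  = \alpha_k - \bigl(\beta_1\,\text{PhysAct}_i + \beta_2\,\text{BMI}_i
      + \beta_3\,\text{Sex}_i + \beta_4\,\text{Age}_i\bigr),
  \qquad k = 1, 2 .
\]
```

With three ordered mood categories there are two cut-points, \alpha_1 < \alpha_2, and a single set of slopes shared across them.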

  4. Missing Value (MV) pattern [table not preserved in this transcript] • Note: all MV patterns with < 200 cases ignored

  5. What causes missing data? [Diagram not preserved; labels: Non-contact, Refusal, Incomplete questionnaire, Fail to attend clinic, Follow-up, Loyalty, Letter, Telephone calls, Interviewer visits, Interviewer effectiveness, Incentive for participant, Parent characteristics, Parent & child characteristics]

  6. Result of processes leading to: • Refusal to answer questions (item) • Refusal to participate (unit) • No contact (unit) • Longitudinal-specific: attrition & drop-out • Non-response mechanism(s) - NRM

  7. Rubin’s definitions (Little & Rubin (2002) Statistical Analysis with Missing Data) • Missing Completely At Random (MCAR) • NRM independent of observed and missing variables • Missing At Random (MAR) • NRM depends only on observed variables • Missing Not At Random (MNAR) • NRM depends on missing variables too
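
The three definitions on slide 7 can be illustrated with a small simulation. This is a minimal sketch, not taken from the talk: the variable names, the logistic response probabilities and the sample size are all assumptions chosen for illustration. It reports the bias of the complete-case mean of y and the intercept of a complete-case regression of y on x (true value 0).

```python
import math
import random


def simulate(mechanism, n=20000, seed=1):
    """Simulate one non-response mechanism (illustrative assumptions only).

    x is always observed; y = x + noise is observed with a probability
    that depends on nothing (MCAR), on x (MAR) or on y itself (MNAR).
    """
    rng = random.Random(seed)
    all_y, obs_x, obs_y = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)
        if mechanism == "MCAR":
            p_obs = 0.5
        elif mechanism == "MAR":
            p_obs = 1 / (1 + math.exp(-x))
        elif mechanism == "MNAR":
            p_obs = 1 / (1 + math.exp(-y))
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() < p_obs:
            obs_x.append(x)
            obs_y.append(y)

    # Complete-case mean versus the mean over everyone.
    true_mean = sum(all_y) / len(all_y)
    cc_mean = sum(obs_y) / len(obs_y)

    # Complete-case least-squares fit of y on x.
    mx = sum(obs_x) / len(obs_x)
    sxx = sum((x - mx) ** 2 for x in obs_x)
    sxy = sum((x - mx) * (y - cc_mean) for x, y in zip(obs_x, obs_y))
    slope = sxy / sxx
    intercept = cc_mean - slope * mx
    return {"mean_bias": cc_mean - true_mean, "cc_intercept": intercept}


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, simulate(mechanism))
```

In this set-up the complete-case mean is roughly unbiased only under MCAR. Under MAR the mean is biased but the regression of y on the observed x is still recovered, whereas under MNAR even the regression intercept is shifted.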

  8. Directed Acyclic Graph (DAG) • R independent ⇒ data MCAR [DAG with nodes Y, X, R, C; figure not preserved]

  9. MAR data • R indirectly related to Y through X and C [DAG with nodes Y, X, R, C; figure not preserved]

  10. Methods for MAR data • Complete cases analysis/Listwise deletion • Weighting • Weighting classes, post-stratification • (Single) imputation methods • e.g. regression, hot-deck/nearest-neighbour • Multiple imputation methods • e.g. Norm, MICE • Semiparametric estimators
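
As an illustration of the weighting-class approach listed on slide 10, the sketch below weights each respondent by the inverse of the response rate in their class. It assumes that classes are defined by a fully observed categorical variable and that data are MAR within classes; the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def weighting_class_mean(records):
    """Weighted mean of y using weighting classes.

    records: iterable of (class_label, y) pairs, with y set to None
    when the value is missing. Every class needs at least one respondent.
    """
    records = list(records)
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for label, y in records:
        n_total[label] += 1
        if y is not None:
            n_resp[label] += 1

    numerator = denominator = 0.0
    for label, y in records:
        if y is None:
            continue
        # Inverse of the class response rate.
        weight = n_total[label] / n_resp[label]
        numerator += weight * y
        denominator += weight
    return numerator / denominator


# Toy data: class "A" has a 50% response rate, class "B" 100%.
toy = [("A", 10), ("A", 12), ("A", None), ("A", None),
       ("B", 20), ("B", 22), ("B", 24), ("B", 26)]
observed = [y for _, y in toy if y is not None]
print(sum(observed) / len(observed))  # complete-case mean: 19.0
print(weighting_class_mean(toy))      # weighting-class mean: 17.0
```

The complete-case mean over-represents class "B"; the weights restore each class to its share of the full sample.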

  11. Imputation in practice: pitfalls (Sterne et al. (2008) British Medical Journal) • Omitting the outcome • Imputing non-normal variables • MAR completely implausible • Convergence of iterative procedures

  12. Complex methods • Analysis model • e.g. Ordinal logistic regression • Imputation model: Missing given Observed • ALL assume MAR data
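
The split between an imputation model and an analysis model on slide 12 can be sketched as follows. This is a simplified illustration, not the procedure used in the talk or in packages such as Norm or MICE: it assumes a single fully observed predictor x, a linear imputation model for y, a bootstrap of the complete cases to reflect parameter uncertainty, and the mean of y as the analysis. Estimates are pooled with Rubin's rules; all function names are hypothetical.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(rss / (len(xs) - 2))


def pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    within = sum(variances) / m
    between = statistics.variance(estimates)
    return q_bar, within + (1 + 1 / m) * between


def impute_and_pool(data, m=20, seed=0):
    """data: list of (x, y) pairs with y = None when missing (assumed MAR given x)."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in data if y is not None]
    estimates, variances = [], []
    for _ in range(m):
        # Imputation model: refit on a bootstrap sample of complete cases.
        boot = [rng.choice(complete) for _ in complete]
        a, b, sd = fit_line([x for x, _ in boot], [y for _, y in boot])
        filled = [y if y is not None else a + b * x + rng.gauss(0, sd)
                  for x, y in data]
        # Analysis model: here simply the mean of y and its variance.
        estimates.append(sum(filled) / len(filled))
        variances.append(statistics.variance(filled) / len(filled))
    return pool(estimates, variances)


def demo_data(n=2000, seed=42):
    """Toy MAR data: y = 2 + x + noise, observed more often when x is large."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = 2 + x + rng.gauss(0, 1)
        observed = rng.random() < 1 / (1 + math.exp(-x))
        data.append((x, y if observed else None))
    return data


estimate, total_variance = impute_and_pool(demo_data())
print(estimate, math.sqrt(total_variance))
```

In the toy data the true mean of y is 2. The pooled estimate should land close to it because the imputation model conditions on x, which is exactly the MAR assumption the slide warns about.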

  13. MAR data in reality • Unknown factors drive non-response • …correlated with model predictors • …but not with Y [DAG with nodes ?, Y, X, R, C; figure not preserved]

  14. Why is this important? • Weakness of MAR: How do we know? • Central problem: missing data is missing! • MAR is a “leap of faith”

  15. MNAR data • Unknowns directly correlated with Y? [DAG with nodes ?, Y, X, R, C; figure not preserved]

  16. Physical activity example • NRM is mother-driven (child age 11) • Child must wear actigraph for 3 days • Mother must assess her child’s mood [DAG with nodes ?, Mood, Phys Act, R, and BMI, Sex, Age; figure not preserved]

  17. ALSPAC ‘Blitz’ • Co-ordinated by Family Liaison Unit • 4 tranches: Nov 2007-May 2008 • Target 5000 teenagers not in last 2 waves • Mini-clinic for difficult to persuade

  18. Proposed analysis • MAR is context dependent • Risky behaviours (Glyn Lewis, et al) • Outcomes: Cannabis use, sexual practices, etc • Risk factors: mental health, sensation seeking, etc • Basic analysis: • Compare follow-up with main sample • Still differences after adjustment?

  19. Unit non-response • 100% follow-up rate unlikely! • Directly model NRM • Continuum of non-response • Hard to contact less like main sample • Weighting scheme (Alho 1990; Wood et al. 2006) • Lower bound for MNAR bias

  20. Item non-response • Parallel qualitative post • Items: questions on risky behaviours • What mechanisms drive non-response? • Test hypotheses from this project
