Multiple Imputation: A Powerful Tool with Limitations

Multiple imputation: the universal panacea, and its limitations Ian White <ian.white@ucl.ac.uk> MRC Clinical Trials Unit at UCL RSS: “Multiple imputation 40 years on, where are we now?” 4 December 2018

40 years of multiple imputation • MI is a complex technique that was proposed 40 years ago and has been widely used in approximately the last 10 years • How long does it take for a new statistical technique to become widely used? • “Biometrika to BMJ” • What methods that are being dreamed up today will be in use in 2058? • Is the pace of change increasing?

Plan • Introduction to multiple imputation (MI) • A personal history of MI • MI as the solution to all missing data problems • Alternatives to MI in randomised trials • Difficulties of MI in multilevel data • Difficulties of MI in missing-not-at-random data • Conclusions

1. Introduction to multiple imputation

Why missing data are a problem • Lose power • Analysis needs an assumption about why data are missing • potentially biased estimates if we make the wrong assumption • Wrong analysis can also lead to • biased standard errors • inefficient estimates • Right analysis is typically complex • Multiple imputation is the most flexible solution to missing data

Multiple imputation: the idea • Multiple imputed data sets capture all the uncertainty attributable to the missing data. • Analyse each one separately (easy). • Combine results using Rubin’s rules → valid estimates and standard errors
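The combining step can be sketched in a few lines of Python. This is a minimal illustration of Rubin’s rules for a single scalar estimate, not code from the talk; the function name `rubins_rules` and the example numbers are made up for illustration.

```python
import math

def rubins_rules(estimates, variances):
    """Pool m point estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, with one variance per estimate")
    q_bar = sum(estimates) / m                              # pooled point estimate
    w = sum(variances) / m                                  # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                                 # total variance
    return q_bar, math.sqrt(t)

# Hypothetical results from analysing three imputed data sets.
pooled_estimate, pooled_se = rubins_rules(
    estimates=[1.0, 1.2, 0.8], variances=[0.04, 0.05, 0.03]
)
```

The between-imputation term is what carries the extra uncertainty due to the missing data; with a single imputed data set it cannot be estimated.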

Multiple imputation: the assumption • Most implementations of MI assume the data are missing at random • “missing values do not differ systematically from observed values of the same variable, conditional on observed values of other variables” • We’ll talk later about missing not at random (MNAR, = not MAR) • Question: why didn’t we celebrate 40 years of MAR in 2016? • Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.

2. A personal history of MI

Multiple imputation: origins • 1978. Rubin DB. Multiple imputations in sample surveys: a phenomenological Bayesian approach to nonresponse. Proceedings of the Survey Research Methods Section 20–28. • 1987. Rubin DB. Multiple Imputation for Nonresponse in Surveys. • Rubin’s rules • 1997. Schafer JL. Analysis of incomplete multivariate data. • Multivariate normal and other model-based imputation • argued this is OK even for dummy variables • Me: MSc project, late 1990s, proposed by Bob Carpenter

Chained equations: origins • 1999. van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681–694. • “Regression switching”, later “Multivariate Imputation by Chained Equations” (MICE) • Idea of MICE: • each incomplete variable has its own imputation model (i.e. set of other variables used to predict it) • can be linear / logistic / ordered logistic regression, tailored to variable type • start by filling in missing values somehow • update imputations one variable at a time • repeat until initial values are forgotten • e.g. for 10 “cycles”
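The chained-equations loop described above can be sketched as follows. This is a simplified illustration, assuming all variables are continuous and imputed by linear regression; it is not the ice or MICE implementation. The function name `chained_equations` is hypothetical, and the sketch adds residual noise but does not perturb the regression coefficients.

```python
import numpy as np

def chained_equations(data, n_cycles=10, seed=0):
    """Return one imputed copy of a 2-D float array in which NaN marks missing."""
    rng = np.random.default_rng(seed)
    missing = np.isnan(data)
    filled = data.copy()
    # Start by filling in missing values somehow: here, observed column means.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])
    for _ in range(n_cycles):                   # repeat until initial values are forgotten
        for j in range(data.shape[1]):          # update one variable at a time
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Imputation model for variable j: linear regression on all other variables.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            obs = ~miss_j
            coef, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - design[obs] @ coef
            dof = max(obs.sum() - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Simplification: residual noise only, coefficients are not perturbed.
            filled[miss_j, j] = design[miss_j] @ coef + rng.normal(0.0, sigma, miss_j.sum())
    return filled
```

Calling the function with several different seeds would give several imputed data sets; a binary or ordered variable would need a logistic or ordered logistic imputation model in place of the linear one.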

Stata code for doing MI • I decided (2003-ish) to write my own Stata code • Then heard Patrick Royston (MRC CTU) was writing his • We exchanged code and Patrick’s was clearly better • he incorporated just one idea of mine (see next): code to “draw” imputed regression coefficients • Patrick’s imputation code came in 2 parts: • uvis (univariate imputation sampling): impute a single variable, using other complete variables • took us a lot of work • mvis (later ice): impute multiple incomplete variables, using chained equations • carefully written housekeeping, repeatedly calling uvis • And to analyse multiply imputed data: micombine

ice: early debates • How to implement “proper” multiple imputation? • To be valid, multiple imputation must be “proper” by allowing for 2 sources of uncertainty: • in the parameters of the imputation model, by “perturbing” them • “draw” method • “bootstrap” method • in the error of the imputed data • “draw” method • “match” method

Perturbing the slope: “draw” method • Suppose we need to impute some missing values of y using observed values of x • [Figure: scatter plot of the observed data, y against x, with fitted line] • fitted slope = 0.997 (se 0.178) • draw slope from $N(0.997, 0.178^2)$ → 1.142

Perturbing the slope: “bootstrap” method • [Figure: two scatter plots of y against x, one of the observed data and one of a bootstrap sample of the observed data] • Fitted slope is already perturbed
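The two ways of perturbing the imputation-model parameters can be sketched for a simple linear regression of y on x. This is an illustrative sketch with hypothetical function names, not the uvis or ice code; the “draw” version uses a normal approximation for the coefficients and, as a simplification, does not also draw the residual variance.

```python
import numpy as np

def fit_line(x, y):
    """Least squares for y = a + b*x; returns coefficients, their covariance, residual variance."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef, cov, sigma2

def perturb_draw(x, y, rng):
    """'Draw' method: sample (intercept, slope) from a normal centred on the fitted values."""
    coef, cov, _ = fit_line(x, y)
    return rng.multivariate_normal(coef, cov)

def perturb_bootstrap(x, y, rng):
    """'Bootstrap' method: refit on a resample of the observed pairs."""
    idx = rng.integers(0, len(x), size=len(x))
    coef, _, _ = fit_line(x[idx], y[idx])
    return coef
```

Either function returns one perturbed (intercept, slope) pair; a fresh pair would be generated for each imputed data set.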

Imputing values: “draw” method • [Figure: scatter plot of y against x, with labels “imputed value” and “normal distribution”]

Imputing values: “match” method • [Figure: scatter plot of y against x, with labels “imputed value” and “near(est) neighbour”] • “Predictive mean matching”
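The two ways of generating an imputed value from a predicted mean can be sketched as below. The function names are hypothetical, and the choice of k = 3 candidate neighbours in the “match” version is an assumption made for illustration rather than a setting taken from the talk.

```python
import numpy as np

def impute_draw(pred_missing, sigma, rng):
    """'Draw' method: predicted mean plus a normally distributed error."""
    return pred_missing + rng.normal(0.0, sigma, size=len(pred_missing))

def impute_match(pred_missing, pred_observed, y_observed, rng, k=3):
    """'Match' method: borrow the observed y of one of the k nearest predicted means."""
    imputed = np.empty(len(pred_missing))
    for i, p in enumerate(pred_missing):
        nearest = np.argsort(np.abs(pred_observed - p))[:k]
        imputed[i] = y_observed[rng.choice(nearest)]
    return imputed
```

Because the “match” version only ever returns values that were actually observed, imputed values stay within the observed range and keep the observed distribution’s shape.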

Data formats • One beauty of ice was how it stored the imputed data

Publications on ice (1)

Increasing popularity, 2008-2013 Number of articles in the Lancet and New England Journal of Medicine that used MI: overall and by study type. Hayati Rezvan et al, BMCMRM 2015.

So how did multiple imputation hit the mainstream? • Sound theory • User-friendly software • Exchange of ideas with users leading to improvements in functionality and ease of use • e.g. conditional imputation (Royston, SJ 2009) • Methodological development to fix difficulties • perfect prediction (White et al, CSDA 2010) • imputing covariates for the Cox model (White & Royston, Stat Med 2009) • Dissemination • Re-writing of ice/mim in “official” Stata, 2011 • mi suite • Similar developments in other packages

MI: the universal panacea? • MI is probably a solution in almost all missing data problems • With missing exposures/confounders in observational studies, MI is probably the best solution • I’ll discuss randomised trials next • Increasingly, reviewers say “you have missing data → you must use MI” • But often it is not the best • And it still faces challenges

3. Alternatives to MI in randomised trials

• What’s special about trials: • missing data are usually in the outcome • if missing data are in covariates then randomisation can be exploited

Missing outcomes in trials • MI is a valid way to handle missing outcome data • Alternative is a regression model adjusting for baseline covariates (or mixed model, if repeated measures) • Often: MI estimates = regression model estimates + noise • Sullivan et al. Should multiple imputation be the method of choice for handling missing data in randomized trials? Statistical Methods in Medical Research, 2016, eprint.

Imputing missing outcomes

A subtlety: estimands (1) • Do we want to estimate • (1) average treatment effect in all randomised? • (2) average treatment effect in all with outcomes? • If treatment effect is heterogeneous then they may be unequal • Regression model and simple MI both estimate (2) • MI by arm estimates (1) • e.g. • 50 men, 25 observed; 50 women, all observed • treatment effect +2 in men, +8 in women • regression / simple MI → average 25:50 → +6 • MI by arm → average 50:50 → +5
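The arithmetic in the example is a weighted average of the sex-specific treatment effects, with weights given either by the numbers observed or by the numbers randomised. A short check, with a hypothetical helper `weighted_effect`:

```python
def weighted_effect(effects, weights):
    """Weighted average of subgroup treatment effects."""
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)

effects = [2.0, 8.0]       # treatment effect in men, women
observed = [25, 50]        # numbers with outcomes observed
randomised = [50, 50]      # numbers randomised

effect_weighted_by_observed = weighted_effect(effects, observed)      # 25:50 weighting
effect_weighted_by_randomised = weighted_effect(effects, randomised)  # 50:50 weighting
```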

A subtlety: estimands (2)

Missing baselines in trials • Because of randomisation, it is adequate to use very simple methods • mean imputation • missing indicator method • These are at least as good as MI! • also in Sullivan paper • “Multiple imputation should not be seen as the only acceptable way to handle missing data in randomized trials” (Sullivan et al)
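The two simple methods for an incomplete baseline covariate can be sketched as follows. The function names are hypothetical; the sketch assumes the mean is taken over all randomised participants pooled across arms, and that the missing indicator method fills missing values with 0 and adds the indicator as an extra covariate in the analysis model.

```python
import numpy as np

def mean_impute(baseline):
    """Replace missing baseline values (NaN) by the mean of the observed values."""
    filled = baseline.copy()
    filled[np.isnan(baseline)] = np.nanmean(baseline)
    return filled

def missing_indicator(baseline):
    """Return the covariate with missing values set to 0, plus a 0/1 missingness indicator."""
    indicator = np.isnan(baseline).astype(float)
    filled = np.where(np.isnan(baseline), 0.0, baseline)
    return filled, indicator

baseline = np.array([1.0, np.nan, 3.0])
baseline_mean_imputed = mean_impute(baseline)
baseline_filled, baseline_missing = missing_indicator(baseline)
```

Both outputs would then be used as adjustment covariates in the usual regression of outcome on randomised arm.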

4. Difficulties of MI in multilevel data

• Multilevel = • multiple studies (e.g. in individual participant data meta-analysis) • or patients within wards within hospitals etc. • Why not just impute by cluster? • sometimes works! • 2 main problems • small clusters • systematically missing data (next)

Sporadically missing data

Systematically missing data

PROG-IMT study • Explore association between carotid intima-media thickness (IMT) and risk of vascular death • 8 cohort studies comprising 27557 patients • Eleven potential confounders to be included in the final analysis model • hemoglobinemia (Hb), serum creatinine (Creat), total cholesterol (Chol), body mass index (BMI), systolic blood pressure (SBP), arterial hypertension (AHT), smoking status (Smoke), diabetes mellitus (Diab), treatment for dyslipidemia (DTreat), Age, Sex • Ideal analysis: Cox model $h_{ij}(t) = h_{0j}(t) \exp(\beta_x x_{ij} + \beta_z^\top z_{ij})$ • for the $i$th individual in the $j$th study • $x_{ij}$ = IMT, $z_{ij}$ = confounders • $\beta_x$ = “log HR for IMT”

PROG-IMT: the problem

PROG-IMT: our approach • Multilevel imputation model for a single variable • Use this in a MICE routine • Resche-Rigon et al. Multiple imputation for handling systematically missing confounders in meta-analysis of individual participant data. Stat Med 2013. • The alternative is joint modelling as implemented by Quartagno & Carpenter (Stat Med 2016) in the R package jomo • categorical variables modelled via latent variable with Normal distribution

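Multilevel imputation: our model • To impute incomplete $y_j$ (vector of outcomes for cluster $j$) within a MICE procedure • Frequentist approach • Standard linear mixed model: $y_j = X_j\beta + Z_j u_j + e_j$, with $u_j \sim N(0, \Psi)$ and $e_j \sim N(0, \sigma^2 I)$ • Estimate parameters $\theta = (\beta, \Psi, \sigma^2)$ • Draw $\theta^*$ from the approximate sampling distribution of $\hat\theta$ • Impute missing values in $y_j$ • e.g. for systematically missing $y_j$, draw $u_j^* \sim N(0, \Psi^*)$; $e_j^* \sim N(0, \sigma^{*2} I)$; $y_j^* = X_j\beta^* + Z_j u_j^* + e_j^*$ • also works for sporadically missing values in $y_j$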
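The imputation step for a cluster in which the variable is entirely missing can be sketched as below. The notation is an assumption, namely the usual linear mixed model with fixed-effect design X, random-effect design Z, random-effect covariance Psi and residual variance sigma2; the function name is hypothetical. The starred arguments stand for parameter values that have already been drawn, and how those draws are obtained is not shown.

```python
import numpy as np

def impute_systematically_missing(X, Z, beta_star, psi_star, sigma2_star, rng):
    """Impute a whole cluster's outcome vector from drawn mixed-model parameters."""
    # New cluster-level random effect, since no data in this cluster inform it.
    u_star = rng.multivariate_normal(np.zeros(Z.shape[1]), psi_star)
    # Individual-level residual errors.
    e_star = rng.normal(0.0, np.sqrt(sigma2_star), size=X.shape[0])
    return X @ beta_star + Z @ u_star + e_star
```

For sporadically missing values the random effect would instead be drawn conditionally on the observed values in that cluster, which this sketch does not cover.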

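Multilevel imputation: 1 stage or 2 stage? • Standard linear mixed model: $y_j = X_j\beta + Z_j u_j + e_j$, with $u_j \sim N(0, \Psi)$ and $e_j \sim N(0, \sigma^2 I)$ • Two ways to estimate parameters $\theta = (\beta, \Psi, \sigma^2)$: • Fit model in 1 stage by REML • easier to constrain • Fit model in 2 stages • set $Z_j = X_j$ and allow different $\sigma_j^2$ • fit model to each study • combine the $\hat\beta_j$ and their estimated variances by multivariate meta-analysis • REML or (faster) multivariate method of moments (MM): Jackson et al, Biom J 2013;55:231–245. • Hence draw $\theta^*$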

Multilevel MICE • MICE = multivariate imputation by chained equations • impute iteratively using conditional models • With single-level data, sensible joint models yield simple conditional models • e.g. multivariate normal conditional models are simple linear regressions • With multi-level data, this is less so • e.g. we show: simple two-level multivariate normal conditional models are multilevel linear regressions including cluster mean and level 2 heterogeneity (in general, & unless cluster sizes are equal) • Nevertheless we include our multilevel imputation procedure in a MICE algorithm

PROG-IMT: results Resche-Rigon M et al. Multiple imputation for handling systematically missing confounders in meta-analysis of individual participant data. Stat Med 2013.

Method comparison The method of Quartagno and Carpenter (2016a) appears generally accurate for binary variables, the method of Resche-Rigon and White (2016) with large clusters, and the approach of Jolani et al. (2015) with small clusters.

5. Difficulties of MI in missing-not-at-random data

• MAR often seems implausible e.g. when variables include • self-reported physical or mental health • self-reported behaviour • Sensitivity analysis is recommended if we doubt MAR • preferably by varying one or more sensitivity parameters that govern departure from MAR • We need • a MNAR model • sensitivity parameters that are easily interpreted • plausible ranges of these parameters • though this can be left to the reader in a “tipping point” analysis
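A tipping-point scan over a single sensitivity parameter can be sketched as follows. This is a deliberately simplified, deterministic illustration for a difference in means in a two-arm trial, not a multiple imputation procedure. It assumes missing outcomes in the treated arm have mean equal to the observed treated mean plus delta, while missing outcomes in the control arm follow the observed control mean. The function name and data are hypothetical.

```python
import numpy as np

def tipping_point_scan(y_ctrl, y_trt, deltas):
    """Estimated treatment effect on the mean for each value of delta (NaN = missing)."""
    mean_ctrl = np.nanmean(y_ctrl)
    obs_mean_trt = np.nanmean(y_trt)
    frac_missing_trt = np.isnan(y_trt).mean()
    # Arm mean = observed mean + (fraction missing) * delta.
    return {d: (obs_mean_trt + frac_missing_trt * d) - mean_ctrl for d in deltas}

y_ctrl = np.array([1.0, 2.0, 3.0, np.nan])
y_trt = np.array([4.0, 6.0, np.nan, np.nan])
effects_by_delta = tipping_point_scan(y_ctrl, y_trt, deltas=[0.0, -2.0, -6.0])
```

The reader can then judge whether the value of delta at which the conclusion changes is plausible, rather than the analyst having to fix a range in advance.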

NARMICE / NARFCS method • Changes MICE: • Add missingness indicators $M_1, M_2, \ldots$ to the procedure • Modify imputation models, e.g. for $y_1$: $y_1 = \beta_0 + \beta_2 y_2 + \ldots + \gamma_2 M_2 + \ldots + \delta_1 M_1 + e$, etc. • Elicit values of sensitivity parameters $\delta_1$ (and $\delta_2$, etc.) • Fit model to individuals with observed $y_1$, using observed and currently imputed values of $y_2, \ldots$ • this estimates all parameters except $\delta_1$ • Use fitted model plus $\delta_1 M_1$ term to impute missing values of $y_1$ • Leacy, F. P. (2016). Multiple imputation under missing not at random assumptions via fully conditional specification. University of Cambridge, PhD thesis. • BUT sensitivity parameter $\delta_1$ is very hard to interpret! It’s the expected difference between missing and observed values, conditional on ALL OTHER VARIABLES, including future outcomes

Eliciting the sensitivity parameters • We suggest that users are much happier to specify the sensitivity parameters in a marginal model • e.g. the difference between missing and observed values of an outcome, conditional only on age and sex • e.g. the coefficient of the missingness indicator in a regression of the outcome on that indicator, age and sex • we call this a “marginal sensitivity parameter” (MSP) • cf. the “conditional sensitivity parameter” (CSP) used by NARFCS • MSPs are typically smaller than CSPs • We can link CSPs to MSPs • Tompsett DM et al. On the use of the not-at-random fully conditional specification (NARFCS) procedure in practice. Stat Med 2018

Elicitation in smoking cessation trial: iQuit in practice • “Please use the sliders below to show your beliefs about the probability of quitting for NON RESPONDERS, receiving INTERVENTION, at 8 weeks. Consider all the reasons why a NON RESPONDER would not return the questionnaire.” • Step 1: Use this slider to set the position of YOUR best estimate of the most likely (modal) probability of quitting for a NON RESPONDER. • Step 2: Use this slider to show how certain you are about this value.

6. Conclusions / overview

Some other unsolved problems • Imputation: • How to do MI in “big data” • Using multiply imputed data: • How to combine MI with other complex procedures • bootstrap • cross-validation

Key acknowledgements • MI everything – Patrick Royston, Angela Wood, Tim Morris • Missing data everything – James Carpenter, Kate Lee • MI in trials – Tom Sullivan, Kate Lee • Multilevel – Matthieu Resche-Rigon, Vincent Audigier • NARMICE/NARFCS – Finbarr Leacy, Daniel Tompsett, Margarita Moreno-Betancur

Conclusions • MI is brilliant • now that we have theory + robust methods + software • It’s usually the best available method in observational data • It’s often not the best method in randomised trials • Challenges remain, including • multilevel data • missing not at random • Question • Drugs are only licensed when their indications and contra-indications are clear • Do we make the indications and contra-indications for a statistical method (like MI) clear enough?
