Comparing two strategies for primary analysis of longitudinal trials with missing data
Peter Lane
Research Statistics Unit

Acknowledgements
• Missing data working group (2001– )
  • Fiona Holland (Stats & Prog, Harlow)
  • Byron Jones (RSU Harlow)
  • Mike Kenward (LSHTM)
• MNLM vs LOCF working group (2004– )
  • Paul McSorley (Psychiatry area leader, RTP)
  • Suzanne Edwards & Wen-Jene Ko (S&P, RTP)
  • Kath Davy, Claire Blackburn, Andrea Machin (S&P, Harlow)
FDA/Industry Workshop 23 September 2004

Contents
• Outline of the problem
• Methods of analysis
• Six clinical trials in GSK
• Simulation study
  • parameters estimated from trials
  • range of drop-out mechanisms
  • comparison of two methods of analysis
• Conclusions

Outline of the problem
• Missing values in longitudinal trials are a big issue
  • First aim should be to reduce proportion
  • Ethics dictate that it can’t be avoided
  • Information lost can’t be conjured up
  • There is no magic method to fix it
• Magnitude of problem varies across areas
  • 8-week depression trial: 25%–50% may drop out by final visit
  • 12-week asthma trial: maybe only 5%–10%
  • Most serious when efficacy evaluated at end

Methods of analysis
• Ignore drop-out
  • CC (complete-case analysis)
• Single imputation of missing values
  • LOCF (last observation carried forward)
• Generate small samples from estimated distributions
  • MI (multiple imputation)
• Fit model for response at all time-points
  • GEE (generalized estimating equations)
  • MNLM (multivariate normal linear model; also referred to as MMRM, or mixed-model repeated measures)
• Model drop-out as well as response
  • SM (selection models)
  • PMM (pattern-mixture models)

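As a minimal sketch of what single imputation by LOCF does to one patient's visit series: the function name `locf`, the use of `None` for a missed visit and the example values are illustrative assumptions, not taken from the trials.

```python
from typing import List, Optional


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Carry the last observed value forward over missing visits (None).

    Visits before the first observation stay None: there is nothing to carry.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical patient observed at weeks 1-3 who then drops out of an 8-week trial.
patient = [24.0, 21.0, 19.0, None, None, None, None, None]
print(locf(patient))
```

The week-3 value stands in for every later visit, so any change the patient would have shown after drop-out is ignored.
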
Properties of methods
• MCAR (missing completely at random): drop-out independent of response
  • CC is valid, though it ignores information
  • LOCF is valid if there are no trends with time
• MAR (missing at random): drop-out depends only on observed values
  • CC, LOCF, GEE invalid
  • MI, MNLM, weighted GEE valid
• MNAR (missing not at random): drop-out depends also on unobserved values
  • CC, LOCF, GEE, MI, MNLM invalid
  • SM, PMM valid if (uncheckable) assumptions true

Usage of methods
• In the past, LOCF has been used widely
  • seen as conservative: not necessarily true
  • gives envelope together with CC: not necessarily true
  • conditional inference: not often interpretable
• MI was developed to improve imputation
  • concern with repeatability & assumptions
• MNLM is being increasingly used
  • software available, but lack of understanding
• SM, PMM recommended for sensitivity analysis
  • looks at some types of MNAR, requiring assumptions

Compare LOCF and MNLM
• Simulation study, based on experience from trials
• Six trials from a range of psychiatry areas
  • Pattern of treatment means over time
  • Covariance matrix between repeated obs
  • Drop-out rates
• Set up a range of drop-out mechanisms
• Generate many datasets and analyse both ways
  • Look at bias of treatment diff. at final time-point
  • Look at power to detect diff.

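The generate-and-analyse loop can be sketched on a toy scale. This is not the study's code: every constant below is hypothetical, the data model (a patient effect plus visit noise, with the treatment effect growing linearly) is an assumption, and MNLM is not fitted because it needs mixed-model software. Complete-case analysis, which is valid under MCAR, stands in as the comparator for LOCF.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple

N_VISITS = 6                   # hypothetical trial length
N_PER_ARM = 100                # hypothetical sample size
SLOPE = 1.0                    # hypothetical: treated arm gains 1 unit per visit over control
TRUE_DIFF = SLOPE * N_VISITS   # true treatment difference at the final visit
DROP_RATE = 0.10               # hypothetical MCAR drop-out probability per visit


def simulate_patient(treated: bool, rng: random.Random) -> List[Optional[float]]:
    """One patient's visits; None marks visits after MCAR drop-out."""
    level = rng.gauss(0.0, 3.0)  # patient effect, induces correlation between visits
    visits: List[Optional[float]] = []
    dropped = False
    for t in range(1, N_VISITS + 1):
        # The first visit is always observed; afterwards drop out completely at random.
        if t > 1 and not dropped and rng.random() < DROP_RATE:
            dropped = True
        if dropped:
            visits.append(None)
        else:
            effect = SLOPE * t if treated else 0.0
            visits.append(level + effect + rng.gauss(0.0, 3.0))
    return visits


def final_locf(visits: List[Optional[float]]) -> float:
    """Last observed value, carried forward to the final time-point."""
    return [v for v in visits if v is not None][-1]


def run(n_sims: int = 200, seed: int = 1) -> Tuple[float, float]:
    """Mean estimated final-visit difference over simulations: (CC, LOCF)."""
    rng = random.Random(seed)
    cc_estimates, locf_estimates = [], []
    for _ in range(n_sims):
        treated = [simulate_patient(True, rng) for _ in range(N_PER_ARM)]
        control = [simulate_patient(False, rng) for _ in range(N_PER_ARM)]
        # CC: only patients still observed at the final visit contribute.
        cc_t = [p[-1] for p in treated if p[-1] is not None]
        cc_c = [p[-1] for p in control if p[-1] is not None]
        cc_estimates.append(mean(cc_t) - mean(cc_c))
        locf_t = [final_locf(p) for p in treated]
        locf_c = [final_locf(p) for p in control]
        locf_estimates.append(mean(locf_t) - mean(locf_c))
    return mean(cc_estimates), mean(locf_estimates)


cc_mean, locf_mean = run()
print(f"true diff {TRUE_DIFF:.1f}; CC mean {cc_mean:.2f}; LOCF mean {locf_mean:.2f}")
```

Because the treatment effect in this toy model keeps growing over time, carrying early values forward pulls the LOCF estimate below the true difference even though drop-out is completely at random.
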
• Trial 2: pick two comparisons
• Trials 3, 4, 6: pick one comparison
• Gives seven two-arm scenarios

Covariance matrix from Trial 4

Week    Correlation with week                     SD
         1     2     3     4     5     6     7
1                                                 4.6
2       .68                                       6.3
3       .57   .72                                 7.2
4       .52   .64   .83                           7.3
5       .43   .53   .70   .82                     7.2
6       .39   .50   .64   .75   .85               7.4
7       .33   .43   .60   .71   .78   .89         7.6
8       .32   .44   .59   .67   .74   .84   .88   7.7

• Used estimates from each trial in simulation

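A simulation needs covariances rather than correlations and SDs. The usual conversion is cov[i][j] = r[i][j] × sd[i] × sd[j]. A minimal sketch using only weeks 1 to 3 of the Trial 4 table follows; the function and variable names are hypothetical.

```python
from typing import List

# Weeks 1-3 of the Trial 4 table: SDs and lower-triangle correlations.
sds = [4.6, 6.3, 7.2]
lower_corr = [
    [],            # week 1
    [0.68],        # week 2 with week 1
    [0.57, 0.72],  # week 3 with weeks 1 and 2
]


def covariance_from_corr(lower: List[List[float]], sd: List[float]) -> List[List[float]]:
    """Build the full symmetric covariance matrix from correlations and SDs."""
    n = len(sd)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = sd[i] ** 2  # a visit has correlation 1 with itself
        for j in range(i):
            cov[i][j] = cov[j][i] = lower[i][j] * sd[i] * sd[j]
    return cov


cov = covariance_from_corr(lower_corr, sds)
for row in cov:
    print([round(x, 2) for x in row])
```
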
% drop-out rates from Trials 2 & 6

Weeks 1, 2, 3, 4, 5, 6
Treat 1: 17, 11, 15, 5, 11; total 58
Treat 2: 10, 13, 14, 10, 1; total 49
Treat 3: 6, 15, 8, 8, 3; total 40

Weeks 1, 2, 3, 4, 6, 8
Treat 1: 3, 9, 5, 6, 7; total 30
Treat 2: 7, 7, 5, 7, 9; total 36
Treat 3: 6, 3, 2, 3, 9; total 22

• Used average rate over times and treatments from each trial

Drop-out mechanisms
• MCAR – generate drop-out at random
• MAR – classify responses at Time k by size, and simulate drop-out at Time k+1 with varying probabilities for each class
• MNAR – as for MAR, but simulate drop-out at Time k, so actual response that influences drop-out is “not observed”
• Divide all responses at any visit into 9 quantiles, and investigate 3 probability patterns (next slide) for drop-out

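The three mechanisms differ only in which response, if any, drives the drop-out probability. A minimal sketch for a single patient follows. The function names, the three-class probability pattern and the example series are hypothetical; the study used 9 quantile classes and its own probability patterns.

```python
import random
from typing import Callable, List, Optional


def simulate_dropout(
    responses: List[float],
    mechanism: str,
    drop_prob: Callable[[float], float],
    rng: random.Random,
    mcar_rate: float = 0.04,
) -> List[Optional[float]]:
    """Return the series with every visit from the drop-out time onward set to None.

    MCAR: drop-out at each visit has a fixed probability.
    MAR:  drop-out at visit k+1 depends on the observed response at visit k.
    MNAR: drop-out at visit k depends on the response at visit k itself,
          which is therefore never observed.
    The first visit is always kept in this sketch.
    """
    for k in range(1, len(responses)):
        if mechanism == "MCAR":
            p = mcar_rate
        elif mechanism == "MAR":
            p = drop_prob(responses[k - 1])
        elif mechanism == "MNAR":
            p = drop_prob(responses[k])
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        if rng.random() < p:
            return responses[:k] + [None] * (len(responses) - k)
    return list(responses)


def example_drop_prob(value: float) -> float:
    """Hypothetical pattern: drop-out probability rises with the response."""
    if value < 10:
        return 0.02
    if value < 20:
        return 0.04
    return 0.06


rng = random.Random(2004)
series = [25.0, 22.0, 18.0, 15.0, 12.0, 9.0]
for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, simulate_dropout(series, mechanism, example_drop_prob, rng))
```
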
Drop-out probabilities
Drop-out probability increases as response increases
These patterns give an average 4% drop-out rate per visit

Trial 1, simulation results
• Large treatment difference: 19
  • average obs. SD: 19
  • patients per arm: 93
• Example of simulation results
  • MCAR drop-out
  • 1000 simulations

Method   % power   % bias
MNLM     99.90       0.32
CC       99.90       0.29
LOCF     99.90     -12.17

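A hedged sketch of how the two summary measures might be computed from simulation output. It assumes % bias is the mean error relative to the true difference and % power is the share of simulations significant at the 5% level; the slides do not state either definition, and the function name and inputs are made up.

```python
from statistics import mean
from typing import Sequence, Tuple


def summarise(
    estimates: Sequence[float],
    p_values: Sequence[float],
    true_diff: float,
    alpha: float = 0.05,
) -> Tuple[float, float]:
    """Return (% bias, % power) over a set of simulated analyses."""
    # Assumed definition: mean error as a percentage of the true difference.
    pct_bias = 100.0 * (mean(estimates) - true_diff) / true_diff
    # Assumed definition: percentage of simulations with p below alpha.
    pct_power = 100.0 * sum(p < alpha for p in p_values) / len(p_values)
    return pct_bias, pct_power


# Four made-up simulation runs with a true difference of 19.
bias, power = summarise([17.0, 16.0, 17.5, 16.5], [0.001, 0.03, 0.20, 0.04], true_diff=19.0)
print(f"%bias = {bias:.2f}, %power = {power:.1f}")
```
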
Trial 1, summary
• Bias uniformly greater for LOCF
  • average 18% vs 4% for MNLM
  • all negative bias except one for LOCF (MAR extreme)
  • e.g. MNAR linear: 13% bias for LOCF, i.e. treat diff 15 rather than 19; 2% bias for MNLM
  • e.g. MNAR extreme: 24% for LOCF, 18% for MNLM
• Power nearly all 100%

Trial 2, first comparison
• Medium treatment difference: 13
  • average obs. SD: 19; patients per arm: 75
• Bias greater for LOCF than MNLM except one (MNAR extreme) with 27% for LOCF, 28% for MNLM
  • average 23% for LOCF, 7% for MNLM
  • all negative bias except one for LOCF (+39% for MAR extreme)
• Power uniformly higher for LOCF: average 92% vs 67% for MNLM

Trial 3
• Medium treatment difference: 3
  • average obs. SD: 8.7; patients per arm: 116
• Similar results to Trial 2, first comparison, except
  • smaller power difference: 76% for LOCF, 60% for MNLM

Trial 4
• Small treatment difference: 2
  • average obs. SD: 6.9; patients per arm: 142
• Bias uniformly greater for LOCF (but small in magnitude as treatment difference is small)
  • average 44% for LOCF vs 4% for MNLM
  • all negative bias except three for MNLM (+2, 0, 0 for MCAR, MAR light and MAR medium)
• Power uniformly lower for LOCF
  • average 21% vs 36% for MNLM

Trial 5
• Small treatment difference: 2
  • average obs. SD: 8.9; patients per arm: 121
• Similar results to Trial 4, except
  • smaller bias difference: 12% for LOCF, 4% for MNLM
  • little power difference: 26% for LOCF, 22% for MNLM

Trial 6
• Almost no treatment difference: 1
  • average obs. SD: 10.3; patients per arm: 115
• Bias uniformly greater for LOCF
  • average 28% vs 9% for MNLM
  • negative bias except five for MNLM (+12, +9, +5, +2, +4 for MCAR, MAR and MNAR light)
• Power virtually the same
  • average 7% for LOCF vs 9% for MNLM

Trial 2, second comparison
• Almost no treatment difference: 1
  • average obs. SD: 19; patients per arm: 75
• Similar results to Trial 6, except
  • little bias difference: 23% for both

Conclusions
1. MNLM is nearly always superior in terms of reduced bias
  • LOCF is biased even for MCAR with these patterns
  • MNLM has virtually no bias for MCAR and MAR
  • MNLM has less bias than LOCF for moderate MNAR
  • extreme MNAR gives problems for both
2. Bias is usually negative
  • underestimates the effect of a drug
  • is this contributing to the attrition rate of late-phase drugs?

Conclusions (continued)
3. LOCF sometimes has more power than MNLM, sometimes less
  • reduced treatment effect can be more than counteracted by artificially increased sample-size
  • against statistical and ethical principles to augment data with invented values
4. MNLM gives very similar results to CC
  • MNLM adjusts CC for non-MCAR effects
  • LOCF adjusts CC in unacceptable ways
  • other methods must be used to investigate non-MAR effects: neither LOCF nor MNLM can address these problems

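The trade-off in point 3 can be illustrated with a rough normal-approximation power calculation. This is only a sketch under strong assumptions: a two-sample z-test with known SD, hypothetical drop-out and attenuation figures, and no allowance for the way LOCF changes the variance or for the partial data that MNLM uses. The function name is made up.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(diff: float, sd: float, n_per_arm: int, alpha: float = 0.05) -> float:
    """Two-sided power of a two-sample z-test with equal arms and known SD."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_effect = diff / (sd * sqrt(2.0 / n_per_arm))
    # Probability of rejecting in either tail.
    return std_normal.cdf(z_effect - z_crit) + std_normal.cdf(-z_effect - z_crit)


# Hypothetical scenario: SD 8.7, 116 patients per arm, true difference 3.
completers = approx_power(diff=3.0, sd=8.7, n_per_arm=81)  # about 30% drop-out, full effect
carried = approx_power(diff=2.7, sd=8.7, n_per_arm=116)    # all patients, effect shrunk by 10%
print(f"completers only: {completers:.2f}; carried forward: {carried:.2f}")
```

With these made-up inputs the larger apparent sample size outweighs the attenuated effect; a stronger attenuation would reverse the ordering, which matches the "sometimes more, sometimes less" pattern.
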
Actions within GSK
• Continue to propose MNLM for primary analysis of longitudinal trials
• Prepare clear guides for statisticians, reviewers and clinicians about MNLM
• Continue to investigate methods for sensitivity analysis to handle MNAR drop-out

Selected references
• Mallinckrodt et al. (2003). Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biological Psychiatry 53, 754–760.
• Gueorguieva & Krystal (2004). Move over ANOVA. Archives of General Psychiatry 61, 310–317.
• Mallinckrodt et al. (2004). Choice of the primary analysis in longitudinal clinical trials. Pharmaceutical Statistics 3, 161–169.
• Molenberghs et al. (2004). Analyzing incomplete longitudinal clinical trial data (with discussion). Biostatistics 5, 445–464.
• Cook, Zeng & Yi (2004). Marginal analysis of incomplete longitudinal binary data: a cautionary note on LOCF imputation. Biometrics 60, 820–828.