
Handling Missing Data in the Analysis of CTN Trials: Pitfalls and Possible Solutions

CTN Design & Analysis Workshop. Neal Oden, PhD, DSC2-EMMES; Gaurav Sharma, PhD, DSC2-EMMES; Paul Van Veldhuisen, PhD, DSC2-EMMES; Paul Wakim, PhD, CCTN, NIDA. 15 March 2011.



  1. CTN Design & Analysis Workshop • Handling Missing Data in the Analysis of CTN Trials: Pitfalls and Possible Solutions • Neal Oden, PhD, DSC2-EMMES; Gaurav Sharma, PhD, DSC2-EMMES; Paul Van Veldhuisen, PhD, DSC2-EMMES; Paul Wakim, PhD, CCTN, NIDA • 15 March 2011

  2. Today’s Workshop • The problem • Prevention • Types of missing data • Analysis methods • Case study • Open discussion

  3. Missing Data • Information within a trial that is meaningful for analysis but not collected • Focus here mostly on primary outcome data, but relevant to missing secondary outcomes and covariates too

  4. Missing Data • Randomization • Balances treatment groups for known and unknown factors • These benefits are lost if there is drop-out, because the participants remaining at outcome may no longer be comparable across groups at baseline • Intention-to-treat principle • Violated if not all randomized participants contribute to the primary analysis

  5. Missing Data • If missingness is unrelated to assigned treatment • Reduces statistical power • If missingness is related to assigned treatment or to outcome • Biases the estimate of the treatment effect

  6. Causes of Missing Data • Due to discontinuation of study treatment • Outcomes undefined for some participants • Quality-of-life (QOL) measures after death • Quantitative hair analysis of drug use in individuals without hair • Test failure/specimen lost • Attrition • Related to health status/drug use • Unrelated to health status/drug use (e.g., moved)

  7. Continuing Data Collection for “Drop-Outs” • Distinction between Premature end of treatment AND End of study • Does collecting data after premature end of treatment make sense?

  8. Rationale • Preserves intention-to-treat approach • Many CTN trials are pragmatic trials • NOT “Does treatment work if perfectly delivered?” • but RATHER • “Is this a good treatment strategy or policy?” OR • “What happens once treatment starts or is recommended?”

  9. Rationale • Delivery of medicine deals with people in the real world • A 100% efficacious cure for stimulant use is useless for public health if nobody can stand it. • Strive to collect complete data for the primary outcome on ALL participants, even those who do not complete the intervention • Too much missing data → no way the result will be believable, no matter how sophisticated the statistical method

  10. Why Do We Like It? Weight loss diet • People on the effective arm lose weight and stay in the study • Some on the ineffective arm get discouraged and quit • If we analyzed only the people who stayed in the trial, the ineffective arm would look too good
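A small simulation makes the weight-loss example concrete. This is a minimal sketch under invented assumptions: the arm effects (5 kg vs 0 kg lost), the spread (SD 4 kg) and the drop-out rule (anyone losing less than 2 kg quits with probability 0.6) are hypothetical numbers chosen only to show the direction of the bias, and `simulate_completers_bias` is an illustrative name.

```python
import random
from statistics import mean


def simulate_completers_bias(n=5000, seed=1):
    """Compare all-randomized vs completers-only means in a hypothetical diet trial.

    Assumptions (illustrative only): weight loss ~ Normal(true_loss, 4 kg);
    anyone losing < 2 kg gets discouraged and drops out with probability 0.6.
    """
    rng = random.Random(seed)
    true_loss = {"effective": 5.0, "ineffective": 0.0}  # hypothetical kg lost
    results = {}
    for arm, mu in true_loss.items():
        everyone, completers = [], []
        for _ in range(n):
            loss = rng.gauss(mu, 4.0)
            everyone.append(loss)
            discouraged = loss < 2.0 and rng.random() < 0.6
            if not discouraged:
                completers.append(loss)
        results[arm] = {"all": mean(everyone), "completers": mean(completers)}
    return results


res = simulate_completers_bias()
for arm, r in res.items():
    print(f"{arm:12s} all randomized: {r['all']:.2f} kg   completers only: {r['completers']:.2f} kg")
```

Under these assumptions the ineffective arm's completers-only mean sits well above its true value of zero, so the between-arm difference looks smaller than it really is.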

  11. Approaches to Missing Data • Design and conduct of clinical trial that minimizes missing data • May require trade-offs with generalizability • Apply analysis methods that use information in observed data to help analyze primary outcome data in the presence of missing data

  12. B. Franklin: “An ounce of prevention is worth a pound of cure.”

  13. Minimize Missing Data in… Trial Design • Flexible dose • Target population • Allow rescue therapy for poor responders • Define primary outcomes that are highly ascertainable • Minimize participant burden/reduce follow-up • Number of visits/assessments

  14. Minimize Missing Data in… Trial Conduct • Explain the importance of trial participation during the consent process • Emphasize to staff the importance of maintaining follow-up even when treatment is refused • Incentives • For participants, ensure the incentive level is not viewed as coercive

  15. Minimize Missing Data in… Trial Conduct • Expression of thanks • Written/verbal • Assistance with travel • Reminders before visits • Welcoming staff/friendly environment • Keep locator information current • Monitor the extent of missing data and report it to investigators

  16. [Figure: Availability of Primary Outcome: Percent of Measures with Values (N = 29 trials)]

  17. What’s the big deal? • We need N = 400 (based on power analysis) • But we expect 20% missing • So we set the initial N = 500 • So that the final (analyzed) N = 400 National Institute on Drug Abuse ─ National Institutes of Health ─ U.S. Department of Health and Human Services
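The slide's arithmetic is N_initial = N_analyzed / (1 − p_missing) = 400 / 0.8 = 500. A minimal sketch of that calculation follows; the function name `inflate_for_missing` is hypothetical, and the comment spells out the limitation that motivates the rest of the workshop.

```python
import math
from fractions import Fraction


def inflate_for_missing(n_analyzed: int, missing_fraction: Fraction) -> int:
    """Initial enrolment so that the expected analyzed N equals n_analyzed.

    This only restores the expected sample size (and hence power). It does
    nothing about bias if the missing participants differ from those observed.
    """
    if not 0 <= missing_fraction < 1:
        raise ValueError("missing_fraction must be in [0, 1)")
    # Exact rational arithmetic avoids floating-point rounding before ceil().
    return math.ceil(Fraction(n_analyzed) / (1 - missing_fraction))


print(inflate_for_missing(400, Fraction(20, 100)))  # 500
```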

  18. Technical terms that we can’t escape… • Missing at random (MAR) • Missing completely at random (MCAR) • Missing not at random (MNAR) • Ignorable • Non-ignorable … but what do they mean?

  19. Missing Completely at Random (MCAR) (Non-technical) • Definition: The fact that Y is missing has nothing to do with the unobserved value of Y, or with other variables • Therefore: The set of participants with complete data can be regarded as a simple random (or representative) sample of all participants • What to do? Ignore the missing data and analyze the available data
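A minimal simulation sketch of MCAR, assuming an invented data-generating model (a covariate x ~ N(0, 1) and an outcome y = 2 + x + noise, so the true mean of y is 2) and a 40% missingness rate that ignores both x and y. The complete-case mean should land close to the all-data mean, only with fewer observations.

```python
import random
from statistics import mean


def mcar_demo(n=20000, p_missing=0.4, seed=7):
    """Complete-case mean of y when y is missing completely at random."""
    rng = random.Random(seed)
    y_all, y_observed = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # a measured covariate (hypothetical)
        y = 2.0 + x + rng.gauss(0.0, 1.0)    # outcome; its true mean is 2.0
        y_all.append(y)
        # MCAR: whether y is seen depends on neither x nor y.
        if rng.random() >= p_missing:
            y_observed.append(y)
    return mean(y_all), mean(y_observed), len(y_observed) / n


full_mean, cc_mean, frac_seen = mcar_demo()
print(f"all data: {full_mean:.3f}  complete cases: {cc_mean:.3f}  fraction seen: {frac_seen:.2f}")
```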

  20. Missing at Random (MAR) (Non-technical) • Definition: The fact that Y is missing can be explained by other observed values of Y, or by other measured variables • Therefore: The observed data can be used to account for the missing data • What to do? Use a Maximum Likelihood or Multiple Imputation approach, and include in the model the other measured variables that explain missingness
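A sketch of MAR under the same invented model: now y is missing 70% of the time when the observed covariate x is positive and 10% of the time otherwise. The complete-case mean is pulled down, but because missingness depends only on something we observe, re-weighting each x-stratum's observed mean by that stratum's full-sample share recovers the all-data mean. This simple stratification is a stand-in chosen for brevity, not the Maximum Likelihood or Multiple Imputation methods the slide recommends.

```python
import random
from statistics import mean


def mar_demo(n=20000, seed=7):
    rng = random.Random(seed)
    y_all = []
    stratum_size = {True: 0, False: 0}        # key: is x > 0?
    observed_y = {True: [], False: []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + x + rng.gauss(0.0, 1.0)      # true mean of y is 2.0
        y_all.append(y)
        high = x > 0
        stratum_size[high] += 1
        # MAR: the chance that y is missing depends only on the observed x.
        p_missing = 0.7 if high else 0.1
        if rng.random() >= p_missing:
            observed_y[high].append(y)
    complete_case = mean(observed_y[True] + observed_y[False])
    # Re-weight each stratum's observed mean by that stratum's full-sample share.
    adjusted = sum(stratum_size[s] / n * mean(observed_y[s]) for s in (True, False))
    return mean(y_all), complete_case, adjusted


full, cc, adjusted = mar_demo()
print(f"all data: {full:.3f}  complete cases: {cc:.3f}  x-adjusted: {adjusted:.3f}")
```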

  21. Missing Not at Random (MNAR) (Non-technical) • Definition: The fact that Y is missing cannot be explained by other observed values of Y, or by other measured variables • Therefore: The observed data cannot be used to account for the missing data, and outside information is needed • In simple English: We have a problem
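A sketch of MNAR with the same invented model, except that the chance y is missing now depends on y itself (70% if y > 2, else 10%). The same x-stratified adjustment that rescues an MAR setting now stays biased, because within each stratum the high values of y are still preferentially missing and nothing observed can reveal that.

```python
import random
from statistics import mean


def mnar_demo(n=20000, seed=7):
    rng = random.Random(seed)
    y_all = []
    stratum_size = {True: 0, False: 0}        # key: is x > 0?
    observed_y = {True: [], False: []}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 2.0 + x + rng.gauss(0.0, 1.0)      # true mean of y is 2.0
        y_all.append(y)
        high = x > 0
        stratum_size[high] += 1
        # MNAR: the chance that y is missing depends on the unobserved y itself.
        p_missing = 0.7 if y > 2.0 else 0.1
        if rng.random() >= p_missing:
            observed_y[high].append(y)
    complete_case = mean(observed_y[True] + observed_y[False])
    adjusted = sum(stratum_size[s] / n * mean(observed_y[s]) for s in (True, False))
    return mean(y_all), complete_case, adjusted


full, cc, adjusted = mnar_demo()
print(f"all data: {full:.3f}  complete cases: {cc:.3f}  x-adjusted: {adjusted:.3f}")
```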

  22. In Summary… [Table not reproduced; based on Graham 2009]

  23. Bottom Line • MCAR: No big deal • MAR: Use available collected data to “explain” the missingness mechanism, and use existing statistical methods • MNAR: Need outside information to “explain” the missingness mechanism

  24. Ignorable & Non-Ignorable (roughly speaking) • Ignorable (available data are sufficient): • Missing Completely At Random (MCAR) • Missing At Random (MAR) • Non-Ignorable (need outside information): • Missing Not At Random (MNAR)

  25. Missing Data Analysis Methods

  26. Complete Case and Pairwise Deletion • [Illustration: pattern of observed (X) and missing (-) values of Y1, Y2, Y3 under complete case (CC) and pairwise deletion (PD) when computing correlations] • Simple; default in statistical software • Potential loss of information and precision • Biased when the data are not MCAR
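The difference between the two deletion rules can be sketched in a few lines. The data below are hypothetical, and the helper names (`complete_case_corr`, `pairwise_corr`) are illustrative: complete case drops any row with a missing value anywhere, while pairwise deletion keeps every row in which the two variables being correlated are both observed.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


def complete_case_corr(rows: List[Row], i: int, j: int) -> Tuple[float, int]:
    """Correlation of columns i and j using only rows with NO missing values."""
    kept = [r for r in rows if all(v is not None for v in r)]
    return pearson([r[i] for r in kept], [r[j] for r in kept]), len(kept)


def pairwise_corr(rows: List[Row], i: int, j: int) -> Tuple[float, int]:
    """Correlation of columns i and j using every row where both are observed."""
    kept = [r for r in rows if r[i] is not None and r[j] is not None]
    return pearson([r[i] for r in kept], [r[j] for r in kept]), len(kept)


# Hypothetical data: columns are Y1, Y2, Y3; None marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, 2.5, 4.0),
    (3.0, 4.0, None),
    (4.0, 4.5, None),
    (5.0, 6.5, 7.0),
]
print(complete_case_corr(rows, 0, 1))  # uses 3 rows
print(pairwise_corr(rows, 0, 1))       # uses 5 rows
```

Pairwise deletion uses more data per correlation, but each correlation may come from a different subset of participants.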

  27. Single Imputation • Impute a single value, e.g., the mean, baseline observation carried forward (BOCF), last observation carried forward (LOCF), or imputing missing as positive… • Simple, but artificially increases the sample size • Underestimates SEs and gives incorrect p-values • Most SI methods require the MCAR assumption to hold, while some, such as LOCF, require even stronger and often unrealistic assumptions
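A minimal sketch of why single imputation understates uncertainty, using hypothetical values: filling five missing outcomes with the observed mean doubles the apparent sample size and shrinks the standard error of the mean, although no new information was collected. An LOCF helper is included for comparison; all function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Optional


def standard_error(values: List[float]) -> float:
    return stdev(values) / sqrt(len(values))


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing value by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def locf(visits: List[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward within one participant's visit sequence.

    Missing values before the first observation stay missing.
    """
    out, last = [], None
    for v in visits:
        if v is not None:
            last = v
        out.append(last)
    return out


# Hypothetical outcome: 5 observed values and 5 missing.
y = [2.0, 4.0, 6.0, 8.0, 10.0, None, None, None, None, None]
observed = [v for v in y if v is not None]
print(standard_error(observed))        # about 1.414 (n = 5)
print(standard_error(mean_impute(y)))  # about 0.667 (n = 10, but no new information)
print(locf([3.0, None, 5.0, None, None]))  # [3.0, 3.0, 5.0, 5.0, 5.0]
```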

  28. Multiple Imputation (MI) • A simulation-based approach to missing data • [Figure: observed data with missing values (?) filled in m times, giving imputed data sets 1, 2, …, m]

  29. The General Idea • Incomplete Data → (1) IMPUTATION → Imputed Data → (2) ANALYSIS → Analysis Results → (3) POOLING → Final Results

  30. (1) IMPUTATION Models • The imputation model should include primary predictive variables and other variables associated with missingness • Multiple Imputation method is robust even with approximate imputation models
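One simple imputation model is stochastic regression imputation: regress y on a predictor x among the observed cases, then fill each missing y with its predicted value plus a random residual, repeating m times. The sketch below assumes a single hypothetical predictor x and invented data, and it simplifies by treating the fitted intercept, slope and residual SD as known; a "proper" MI procedure would also draw those parameters from their posterior for each imputed data set.

```python
import random
from statistics import mean
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = a + b*x; returns (a, b, residual SD)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_m_times(x: List[float], y: List[Optional[float]],
                   m: int = 5, seed: int = 0) -> List[List[float]]:
    """Fill each missing y with a regression prediction from x plus random noise.

    Simplification: a, b and sigma are estimated once and treated as known,
    so this sketch somewhat understates the between-imputation variance.
    """
    rng = random.Random(seed)
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sigma = fit_line([p[0] for p in pairs], [p[1] for p in pairs])
    return [
        [yi if yi is not None else a + b * xi + rng.gauss(0.0, sigma)
         for xi, yi in zip(x, y)]
        for _ in range(m)
    ]


# Hypothetical data: x is fully observed, y has three missing values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 2.1, None, 3.9, 5.2, None, 6.8, None]
imputed = impute_m_times(x, y)
for data_set in imputed:
    print([round(v, 2) for v in data_set])
```

Observed values are identical across the m data sets; only the filled-in values vary, and that variation is what the pooling step later turns into extra (between-imputation) variance.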

  31. (2) ANALYSIS Models • Regression Model • General Linear Model • Generalized Linear Model (Logistic Regression, Poisson Regression)

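  32. (3) Rules for POOLING … • Each of the m imputed data sets gives an estimate $\hat{Q}_i$ and its variance $W_i$ (Estimate 1, Variance 1; …; Estimate m, Variance m) • Mean of Estimates: $\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i$ • Within Variance: $\bar{W} = \frac{1}{m}\sum_{i=1}^{m} W_i$ • Between Variance: $B = \frac{1}{m-1}\sum_{i=1}^{m}(\hat{Q}_i - \bar{Q})^2$ • Total Variance: $T = \bar{W} + \left(1 + \frac{1}{m}\right)B$ • Confidence Interval for the Parameter of Interest: $\bar{Q} \pm t_{df}\sqrt{T}$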
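The pooling step (Rubin's rules) is short enough to write out. In this sketch `pool_rubin` is an illustrative name; it takes the m estimates and their squared standard errors, averages the estimates, forms the within-imputation variance W̄ and between-imputation variance B, combines them as T = W̄ + (1 + 1/m)B, and computes Rubin's degrees of freedom for the t reference distribution.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float, float]:
    """Combine m completed-data results with Rubin's rules.

    Returns (pooled estimate, total variance T, degrees of freedom).
    Assumes m >= 2 and strictly positive within-imputation variances.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates and one variance per estimate")
    q_bar = mean(estimates)              # mean of estimates
    w_bar = mean(variances)              # within-imputation variance
    b = variance(estimates)              # between-imputation variance (divides by m - 1)
    total = w_bar + (1 + 1 / m) * b      # total variance T
    if b == 0:
        df = math.inf                    # imputations agree exactly: normal reference
    else:
        r = (1 + 1 / m) * b / w_bar      # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2  # Rubin's degrees of freedom
    return q_bar, total, df


q, t_var, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, t_var, df)  # 2.0, about 1.833, about 3.78
```

The confidence interval is then q ± t·√T with t the 0.975 quantile of a t distribution on df degrees of freedom. Python's standard library has no t quantile function; `scipy.stats.t.ppf(0.975, df)` is one option.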

  33. Desirable Features • MI gives approximately unbiased estimates of all parameters • MI provides good estimates of the standard errors • MI can be used with many kinds of data and analyses without specialized software • Requires the MAR assumption

  34. Maximum likelihood • Basic idea • Given some data, • Try to guess the parameter(s) of the probability distribution that generated the data • MLE of a parameter is the value that maximizes the probability of the data you already have

  35. Example: • Flip a coin, get 45 heads, 36 tails • We don’t know p, but whatever it is: • $\Pr(45\text{ H in }81\text{ tosses}) = K\,p^{45}(1-p)^{36}$ • How to guess p? • Pick the value of p that maximizes the probability of what already happened • Pick p to maximize $L = p^{45}(1-p)^{36}$ • Best guess turns out to be $\hat{p} = 45/81$
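Setting the derivative of the log-likelihood to zero shows where 45/81 comes from: d/dp [45 log p + 36 log(1 − p)] = 45/p − 36/(1 − p) = 0, so p̂ = 45/81 ≈ 0.556. A brute-force numerical check, sketched below over a grid of candidate values of p, lands on the same answer.

```python
import math


def log_likelihood(p: float, heads: int = 45, tails: int = 36) -> float:
    # log L = heads*log(p) + tails*log(1 - p); the constant K is dropped
    # because it does not depend on p.
    return heads * math.log(p) + tails * math.log(1.0 - p)


grid = [i / 10_000 for i in range(1, 10_000)]   # p in (0, 1), endpoints excluded
p_hat = max(grid, key=log_likelihood)
print(round(p_hat, 4), round(45 / 81, 4))  # 0.5556 0.5556
```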

  36. Maximum likelihood estimates have nice properties • Consistent • Asymptotically: normal, unbiased, minimum variance • etc.

  37. New problem • H = 45 • T = 36 • ? = 19 • Now how to guess p? • If we knew how many missing were H and how many T, we would know what to do. • But we don’t. • What to do?

  38. A solution • If data are MAR, • you can get MLEs by • maximizing the (conditional) likelihood for the nonmissing data • ignoring the missing-data mechanism.
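One way to see this for the coin (the EM algorithm, swapped in here as an illustration since the slides do not name an algorithm): assuming the 19 "?" flips are missing for reasons unrelated to how they landed, treat each as heads with probability p, fill in the expected number of heads, re-estimate p, and repeat. The update is p ← (45 + 19p)/100, whose fixed point is 81p = 45, i.e. the observed-data MLE 45/81. The missing flips add no information, exactly as the slide says.

```python
def em_coin(heads: int = 45, tails: int = 36, missing: int = 19,
            p: float = 0.5, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """EM estimate of P(heads) when some flips were recorded as '?'.

    Assumes the unrecorded flips are missing for reasons unrelated to their
    outcome, so each one is heads with probability p.
    """
    n = heads + tails + missing
    for _ in range(max_iter):
        expected_heads = heads + missing * p   # E-step: expected heads, filling in the '?'s
        p_new = expected_heads / n             # M-step: complete-data MLE
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p


print(em_coin(), 45 / 81)  # both about 0.5556
```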

  39. Important Application • Longitudinal analysis • Participant 1, visits 1, 2, 3, … • Participant 2, visits 1, 2, 3, … • For each visit, $y = a + b_1 x_1 + b_2 x_2 + \cdots$ • First approach: • Treat all visits as independent • Do the regression on all visits together • Wrong, because visits from a single participant are related, not independent

  40. Important Application (cont’d) • Second approach • The visits from a single participant are correlated (have covariance) • Use a mixed model • It used to be that all visits had to be nonmissing for this analysis • But modern software (SAS PROC MIXED, PROC GLIMMIX) ignores the missing-data mechanism and gets MLEs from the nonmissing data, even if some visits are missing • If data are MAR, this is fine!

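  41. Modern longitudinal ML software uses more data • [Figure: grid of participants 1–5 by visits 1–4, each visit marked complete or incomplete. Older complete-case (CC) analysis would use only the participants with every visit complete; modern longitudinal ML software also uses the complete visits of participants who have some incomplete visits; neither the old nor the new method can use an incomplete visit itself.]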
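The figure's point can be counted directly. The 5 × 4 grid below is hypothetical (None marks a missing visit). An older complete-case analysis keeps only participants with every visit present, while a likelihood-based mixed model, such as the SAS PROC MIXED analysis mentioned on the previous slide, can use every non-missing measurement.

```python
from typing import Dict, List, Optional

# Hypothetical outcomes for participants 1-5 at visits 1-4 (None = missing visit).
visits: Dict[int, List[Optional[float]]] = {
    1: [5.0, 4.0, 3.5, 3.0],
    2: [6.0, None, 4.0, 3.5],
    3: [5.5, 5.0, 4.5, None],
    4: [4.0, 3.0, 3.0, 2.5],
    5: [6.5, None, None, None],
}


def complete_case_participants(data: Dict[int, List[Optional[float]]]) -> List[int]:
    """Participants with no missing visits (what an older CC analysis keeps)."""
    return [pid for pid, ys in data.items() if all(y is not None for y in ys)]


def observations_used(data: Dict[int, List[Optional[float]]], participants: List[int]) -> int:
    """Number of non-missing measurements contributed by the given participants."""
    return sum(1 for pid in participants for y in data[pid] if y is not None)


cc = complete_case_participants(visits)
print("complete-case participants:", cc, "observations:", observations_used(visits, cc))
print("all available observations:", observations_used(visits, list(visits)))
```

In this made-up grid, the complete-case analysis uses 8 measurements from participants 1 and 4, while the all-available-data analysis uses 15.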

  42. Another application • Survival analysis • Example: time to relapse • For some people, you have the time • For others, you don’t because • Study ended • People died • People dropped out • etc. • People without relapse times are said to be CENSORED

  43. Another application (cont’d) • For censored people, you don’t know the relapse time, but you know it is after the censor time • Survival analysis handles censored data, but • You have to make the assumption that censoring is noninformative. • If people drop out because they know they are going to relapse the next day, the censoring is informative. • Informative censoring gives biased survival time estimates • The “noninformative censoring” assumption is basically an MAR assumption.
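The standard nonparametric way survival analysis uses censored observations is the Kaplan–Meier estimator (named here for illustration; the slides do not specify a method). A censored participant stays in the risk set up to the censoring time and then leaves it without counting as a relapse. That is only valid if censoring is noninformative. A minimal sketch with hypothetical relapse times:

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 relapsed: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier curve: list of (time, S(t)) at each observed relapse time.

    relapsed[i] is False for a censored participant (study ended, dropped out, ...).
    A participant censored at time t is still counted as at risk at t.
    """
    data = sorted(zip(times, relapsed))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = censored = 0
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                events += 1
            else:
                censored += 1
            i += 1
        if events:
            surv *= 1.0 - events / at_risk
            curve.append((t, surv))
        at_risk -= events + censored   # everyone seen at t leaves the risk set afterwards
    return curve


# Hypothetical relapse data (weeks); False = censored.
times = [2, 3, 3, 5, 8, 9]
relapsed = [True, False, True, True, False, True]
print(kaplan_meier(times, relapsed))  # S(t) about 0.833, 0.667, 0.444, 0.0 at t = 2, 3, 5, 9
```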

  44. What if data are not MAR? • When the missing data are nonignorable (i.e., MNAR), standard statistical models can yield badly biased results • MAR cannot be tested against MNAR using the observed data alone

  45. Sensitivity Analysis • The missing data mechanism is not identifiable from observed data • We don’t know what we don’t know • One or more analyses can be performed using different assumptions • Example: Worst Case Analysis • (won’t work with a lot of missing data)
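For a binary outcome (success/failure), a worst-case analysis can be sketched as follows: count every missing treatment-arm participant as a failure and every missing control participant as a success, then do the reverse for the best case. The counts below are hypothetical. The second example shows why the slide warns that this won't work with a lot of missing data: with 30% missing, the bounds span zero and are too wide to be informative.

```python
from fractions import Fraction
from typing import Tuple

Arm = Tuple[int, int, int]  # (successes, failures, missing)


def success_rate(arm: Arm, missing_as_success: bool) -> Fraction:
    successes, failures, missing = arm
    n = successes + failures + missing
    return Fraction(successes + (missing if missing_as_success else 0), n)


def worst_best_difference(treatment: Arm, control: Arm) -> Tuple[Fraction, Fraction]:
    """Bounds on the treatment-minus-control success-rate difference."""
    worst = success_rate(treatment, False) - success_rate(control, True)
    best = success_rate(treatment, True) - success_rate(control, False)
    return worst, best


print(worst_best_difference((60, 30, 10), (45, 45, 10)))  # (Fraction(1, 20), Fraction(1, 4))
print(worst_best_difference((50, 20, 30), (35, 35, 30)))  # (Fraction(-3, 20), Fraction(9, 20))
```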

  46. Goals of Sensitivity Analysis • Consider a range of potential associations between missingness and response • Assess the degree to which conclusions can be influenced by the missingness mechanism • If the conclusion is largely unchanged, the result may be considered robust • Otherwise, the conclusion should be interpreted cautiously, as it may be misleading

  47. MNAR models • Use of non-ignorable models can be helpful in conducting a sensitivity analysis • Not necessarily a good idea to rely on a single MNAR model, because the assumptions about the missing data are impossible to assess with the observed data • One should use MNAR models sensibly, possibly examining several types of such models for a given dataset

  48. Two general classes of MNAR models • Selection Models – model the full-data response together with a selection (missingness) mechanism • Pattern-Mixture Models – model the response within each missing-data pattern and combine (mix) the patterns
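A common pattern-mixture-style sensitivity analysis is delta adjustment (a "tipping point" analysis): assume the missing participants in the treatment arm did worse than the observed ones by some amount delta, and find how large delta must be before the treatment advantage disappears. The sketch below is a simplification for the point estimate only, using hypothetical data, an outcome where higher is better, and a grid of delta values in steps of 0.5; a full analysis would build delta into multiple imputation and pool with Rubin's rules.

```python
from typing import Optional, Sequence


def delta_adjusted_mean(observed: Sequence[float], n_missing: int, delta: float) -> float:
    """Arm mean when each missing value is set to (observed mean + delta).

    delta = 0 assumes the missing participants resemble the observed ones;
    delta < 0 assumes they did worse by |delta| units.
    """
    obs_mean = sum(observed) / len(observed)
    total = sum(observed) + n_missing * (obs_mean + delta)
    return total / (len(observed) + n_missing)


def tipping_point(trt_obs: Sequence[float], trt_missing: int,
                  ctl_obs: Sequence[float], ctl_missing: int,
                  step: float = 0.5, max_steps: int = 1000) -> Optional[float]:
    """Smallest downward shift (multiple of step) for missing treatment-arm values
    that erases the treatment advantage; None if not reached within max_steps."""
    ctl_mean = delta_adjusted_mean(ctl_obs, ctl_missing, 0.0)
    for k in range(max_steps + 1):
        delta = -step * k
        if delta_adjusted_mean(trt_obs, trt_missing, delta) - ctl_mean <= 0:
            return delta
    return None


# Hypothetical data: higher = better; 3 treatment and 2 control values missing.
trt_obs = [6, 7, 8, 5, 9, 7]
ctl_obs = [5, 6, 4, 5, 6, 4]
print(delta_adjusted_mean(trt_obs, 3, 0.0) - delta_adjusted_mean(ctl_obs, 2, 0.0))  # 2.0
print(tipping_point(trt_obs, 3, ctl_obs, 2))  # -6.0
```

If a shift as large as the tipping point is judged implausible, the conclusion can be considered robust to this kind of departure from MAR.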

  49. Case Study: CTN0010 - BUP for Adolescents • Two groups: Bup/Nal detoxification over 2 weeks vs. Bup/Nal maintenance over 12 weeks • N (analyzed) = 152 at 6 community treatment programs • Main outcome measure: Opioid-positive urine test result at weeks 4, 8 & 12 • Evaluation: weekly for 12 weeks, comprehensive at 4, 8, 12, 24, 36 & 52 weeks

  50. Woody, JAMA 2008
