Statistical Issues in Interpreting Clinical Trials D. L. DeMets Journal of Internal Medicine 255: 529-537. 2004 “Lies, Damn Lies, and Clinical Statistics” Justin L. Grobe September 1, 2004
Drug Development Paradigm • Medicinal Chemistry • Targeted development of new compounds • Animal Testing • Test efficacy, potency, safety • Human Clinical Trials • Multiple phases to test efficacy, potency, safety, and to compare new intervention to standard
Clinical Trials – Design Paradigm • Randomization • Assignment to treatment group • Order effects • Placebo • “Control” • (Ethical considerations)
Input = Return • “No clever analysis can rescue a flawed design or poorly conducted trial.” • Compliance issues
Five Major Statistical Issues • Intention-to-treat principle • Surrogate outcome measures • Subgroup analyses • Missing data • Noninferiority trials
Statistical Issue 1: Intention-to-treat principle • “… all patients are accounted for in the primary analysis, and primary events observed during the follow-up period are to be accounted for as well.” • Results can be biased if either of these requirements is not adhered to
Myths and examples • Myth: Large trials are free of these concerns • Increasing the number of patients decreases the variability of the response variable, thereby making detection of differences easier; • EXCEPT, this also amplifies any bias in the outcome measurement • WHICH MAY cause detection of differences that do not actually exist
To include or not to include… • Two common reasons to drop patient data • Post hoc ineligibility assessment • Lack of patient compliance
TABLE 1: Post hoc ineligibility assessment, Anturane Reinfarction Trial • 1629 patients who had survived a heart attack • 813 patients received Anturane • 816 patients received placebo • 71 patients deemed “ineligible” for analysis by protocol
Table 1: 1980 Anturane mortality results
                                          Anturane (%)    Placebo (%)     P-value
Randomized                                74/813 (9.1)    89/816 (10.9)   0.20
‘Eligible’                                64/775 (8.3)    85/783 (10.9)   0.07
‘Ineligible’                              10/38 (26.3)    4/33 (12.1)     0.12
P-values for eligible versus ineligible   0.0001          0.92
The comparison changes strikingly depending on which patients are included in or excluded from each group: thus the results are biased by post hoc exclusions
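As a quick check of Table 1, the mortality percentages can be recomputed from the counts shown on the slide. This is only an illustrative sketch: the names `TABLE_1` and `mortality_pct` are made up for this example, and it reproduces the rates only, not the published p-values (the slide does not say which test produced them).

```python
# Deaths and patients per arm, copied from Table 1 (Anturane Reinfarction Trial).
TABLE_1 = {
    "Randomized": {"Anturane": (74, 813), "Placebo": (89, 816)},
    "Eligible":   {"Anturane": (64, 775), "Placebo": (85, 783)},
    "Ineligible": {"Anturane": (10, 38),  "Placebo": (4, 33)},
}


def mortality_pct(deaths: int, patients: int) -> float:
    """Return mortality as a percentage, rounded to one decimal place."""
    return round(100.0 * deaths / patients, 1)


for group, arms in TABLE_1.items():
    rates = {arm: mortality_pct(*counts) for arm, counts in arms.items()}
    print(f"{group:<11} Anturane {rates['Anturane']:>5}%  Placebo {rates['Placebo']:>5}%")
```

Dropping the ‘ineligible’ patients removes 10 of the 74 Anturane deaths but only 4 of the 89 placebo deaths, which is why the Anturane rate falls from 9.1% to 8.3% while the placebo rate stays at 10.9%.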
TABLE 2: Patient compliance, Coronary Drug Project • 3885 post-heart attack men were given clofibrate or placebo • 708 clofibrate and 1813 placebo patients were at least 80% compliant
Table 2: Coronary Drug Project 5-year mortality
                       Clofibrate            Placebo
                       n       % Deaths      n       % Deaths
Total (as reported)    1103    20.0          2782    20.9
By compliance          1065    18.2          2695    19.4
  <80%                 357     24.6          882     28.2
  ≥80%                 708     15.0          1813    15.1
Compliance is itself an outcome: thus basing the interpretation of the ‘drug outcome’ on the ‘compliance outcome’ confounds the two
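The confounding can be made concrete with the Table 2 figures alone. The sketch below (names such as `STRATA` and `pooled_mortality` are illustrative, not from the source) checks that the "By compliance" row is the patient-weighted average of the two compliance strata, then compares the mortality gap between good and poor compliers on placebo with the drug-versus-placebo gap among good compliers.

```python
# (n, % deaths) per compliance stratum, copied from Table 2 (Coronary Drug Project).
STRATA = {
    "clofibrate": {"<80%": (357, 24.6), ">=80%": (708, 15.0)},
    "placebo":    {"<80%": (882, 28.2), ">=80%": (1813, 15.1)},
}


def pooled_mortality(arm: dict) -> float:
    """Patient-weighted mortality (%) across the compliance strata of one arm."""
    total_patients = sum(n for n, _ in arm.values())
    weighted_sum = sum(n * pct for n, pct in arm.values())
    return round(weighted_sum / total_patients, 1)


# Gap between poor and good compliers who received only placebo.
placebo_compliance_gap = round(
    STRATA["placebo"]["<80%"][1] - STRATA["placebo"][">=80%"][1], 1
)
# Gap between placebo and clofibrate among good compliers.
drug_gap_in_compliers = round(
    STRATA["placebo"][">=80%"][1] - STRATA["clofibrate"][">=80%"][1], 1
)

print("Pooled clofibrate mortality:", pooled_mortality(STRATA["clofibrate"]))
print("Pooled placebo mortality:   ", pooled_mortality(STRATA["placebo"]))
print("Placebo, poor vs good compliers (points):", placebo_compliance_gap)
print("Good compliers, placebo vs drug (points):", drug_gap_in_compliers)
```

Good compliance "lowers" mortality by about 13 percentage points even on placebo, far more than the 0.1-point difference between drug and placebo among good compliers, so a compliers-only analysis says more about who complies than about the drug.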
Dealing with noncompliance • Larger sample sizes are required to compensate for the dilution effect of noncompliance • 10% noncompliance requires 23% increase in sample size • 20% noncompliance requires a 56% increase in sample size
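The 23% and 56% figures are consistent with a common rule of thumb: if a fraction d of patients do not comply, the observable treatment effect is diluted by a factor of (1 - d), so the required sample size grows by 1/(1 - d)^2. The slide does not state which formula was used, so this is an assumption, and the function name is illustrative.

```python
def sample_size_inflation(noncompliance_rate: float) -> float:
    """Factor by which sample size must grow to offset effect dilution.

    Assumes the treatment effect shrinks by (1 - noncompliance_rate) and that
    required sample size scales with 1 / effect**2.
    """
    if not 0.0 <= noncompliance_rate < 1.0:
        raise ValueError("noncompliance_rate must be in [0, 1)")
    return 1.0 / (1.0 - noncompliance_rate) ** 2


for rate in (0.10, 0.20):
    increase_pct = (sample_size_inflation(rate) - 1.0) * 100
    print(f"{rate:.0%} noncompliance -> about {increase_pct:.0f}% more patients")
```

Under this assumption the cost of noncompliance grows quickly: doubling noncompliance from 10% to 20% more than doubles the extra patients needed.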
Statistical Issue 2: Surrogate outcome measures • Outcome measures of primary question must be: • Clinically relevant • Sensitive to intervention • Ascertainable in all patients • Resistant to bias • Result: Large, time-consuming, costly studies • Alternative approach: surrogate outcome measures
Surrogate outcome measure: Assumption • If the intervention will modify surrogate outcome, it will modify the primary clinical outcome
Surrogate outcome measure: Requirements • Surrogate outcome must be predictive of clinical outcome • Surrogate outcome must fully capture the total effect of the intervention on the clinical outcome • “Necessary and sufficient”
Surrogate outcome measures: Difficult to obtain and validate • Intervention may modify the surrogate and have no or only partial effect on the clinical outcome • Intervention may modify the clinical outcome without affecting the surrogate • (Note: NOT surprisingly, track record for use of surrogate outcome measures is very bad)
Surrogate outcome measures: Example: Cardiac Arrhythmia Suppression Trial (CAST) • Three drugs tested for suppression of cardiac arrhythmias • All three drugs had been shown to suppress premature cardiac ventricular contractions (surrogate) • Two drug arms terminated early (10-15% into study) because both drugs dramatically increased cause-specific sudden death and total mortality
Table 3: Cardiac Arrhythmia Suppression Trial, early termination in two drug arms
                   Drugs    Placebo
Sudden death       33       9
Total mortality    56       22
Clearly the interventions (drugs) had differential effects on the surrogate measure (premature cardiac ventricular contractions) and the clinical outcome (mortality)
Statistical Issue 3: Subgroup analyses • Clinical trials usually try to include as many (diverse) patients as possible for multiple reasons: • Large sample size • Reasonable recruitment time • Assess internal consistency of results • Seemingly logical use of the large data set is to do many post hoc analyses on subgroups
Subgroup analysis: Mathematical problems • Introducing subgroups increases the probability of false positives • Testing 5 subgroups at p=0.05 yields a greater than 20% chance of at least one statistically significant difference BY CHANCE
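The "greater than 20%" figure follows from the standard familywise error calculation, sketched below. It assumes the subgroup tests are independent, which real subgroups rarely are exactly; the function name is illustrative.

```python
def familywise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n independent tests.

    Each test has probability (1 - alpha) of correctly showing no difference,
    so all n_tests stay negative with probability (1 - alpha) ** n_tests.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroups -> {familywise_error(k):.1%} chance of a spurious finding")
```

With 5 subgroups the chance is about 22.6%, matching the slide's "greater than 20%".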
Subgroup analysis: MERIT trial • Beta-blocker (metoprolol) treatment for patients with congestive heart failure • Showed a 34% reduction in mortality overall
Subgroup analysis: MERIT trial • Subgroup analysis found mortality results consistent across many subgroups
Subgroup analysis: MERIT trial • In the USA, total mortality is not reduced, yet total mortality plus any hospitalization is…?
Subgroup analysis: MERIT trial • Two other similar heart failure trials evaluating other beta-blockers showed no regional difference; • THUS, it is likely that the MERIT finding is due to chance alone.
Subgroup analysis: PRAISE-I and PRAISE-II trials • PRAISE-I performed to evaluate amlodipine for the treatment of congestive heart failure • Subgroups: • Ischemia • Nonischemia • Analysis of subgroups separately showed a significant (p<0.001) effect of amlodipine on heart failure in nonischemic patients, but no effect on ischemic patients • Researchers decided to perform PRAISE-II trial on nonischemic patients only
Subgroup analysis: PRAISE-I and PRAISE-II trials • PRAISE-II showed remarkably similar mortality results in the drug and placebo groups • PRAISE-II directly opposed the exciting results of PRAISE-I’s subgroup analysis
Statistical Issue 4: Missing data • Missing data are often simply “dropped” • This violates two rules: • Intention-to-treat rule: all patients must be accounted for in the primary outcome analysis • Common sense rule: if a patient is too sick to complete the trial, this may be informative!
Missing data • In “time to event” trials (like mortality), data can be missing because the study ends before the event happens • Patients are then “censored” (dropped) • This can introduce serious mathematical bias • (Mortality studies in the USA have no excuse: death indices allow follow-up without help from the patient)
Statistical Issue 5: Noninferiority trials • “New intervention is not worse than the standard” • New intervention may be: • Easier to administer • Better tolerated • Less toxic • Less expensive • Any given study may be a superiority and/or noninferiority trial, depending on results
Noninferiority trials • Three challenges must be met: • Noninferiority trial must be of highest quality to detect clinically meaningful differences • Noninferiority trial must have a strong, effective control intervention (state-of-the-art care) • Margin of indifference is arbitrary, depending on medical importance of treatment and risk-to-benefit tradeoffs
Noninferiority trials: OPTIMAAL Trial • Losartan (angiotensin II receptor blocker) vs captopril (ACE inhibitor) in heart failure patient population • Losartan has fewer (and less severe) side effects than captopril • OPTIMAAL • Designed to detect 20% reduction in relative risk, with 95% power • Margin of indifference set at 1.1 • Thus the 95% confidence interval needed to exclude a relative risk of 1.1 to declare losartan “noninferior” to captopril
Noninferiority trials: OPTIMAAL Trial • Mortality results for OPTIMAAL • Relative risk of 1.126, with an upper 95% confidence limit of 1.28 • NEITHER superiority nor noninferiority was achieved • Researchers computed from historical data that captopril had a relative risk of 0.806 vs. placebo, and thus calculated that losartan must therefore have a relative risk of 0.906 vs. placebo… • The statistically appropriate conclusion at this point is: • NO ACCEPTABLE CONCLUSIONS POSSIBLE FROM THESE DATA
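The decision rule behind the OPTIMAAL conclusion can be sketched as a comparison of the upper confidence limit with 1.0 and with the margin of indifference. This is a simplified illustration, assuming a relative risk of new versus standard treatment where values below 1 favor the new treatment; the function name and return labels are made up for this example.

```python
def classify_trial(ci_upper: float, margin: float) -> str:
    """Classify a trial from the upper 95% confidence limit of the relative risk.

    Assumes relative risk = new / standard (lower is better) and margin > 1.
    """
    if margin <= 1.0:
        raise ValueError("margin of indifference must exceed 1.0")
    if ci_upper < 1.0:
        # Whole interval lies below 1: the new treatment is better.
        return "superior"
    if ci_upper < margin:
        # Interval excludes the margin: not unacceptably worse.
        return "noninferior"
    return "neither superior nor noninferior"


# OPTIMAAL mortality result as given on the slide: upper limit 1.28, margin 1.1.
print(classify_trial(ci_upper=1.28, margin=1.1))
```

Because the upper limit of 1.28 lies above the 1.1 margin, the interval fails to exclude the margin and the trial cannot claim noninferiority.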
CONCLUSIONS • Statistics cannot make up for bad design • Statistics cannot make up for poor execution of design • Statistics is very limited in being able to compensate for • Ineligible patients being enrolled • Noncompliance • Unreliable outcome measures • Missing data • Underpowered trials