Understand the goal, methodology issues, and possible solutions in clinical studies to obtain reliable treatment effect estimates. Explore endpoint definitions, disease burden measures, and examples in cardiovascular and rare disease studies.
Translational Data Science L.J. Wei, Harvard University
Many thanks to • Lu Tian, Stanford • Tianxi Cai, Harvard • Brian Claggett, Harvard • Hajime Uno, Harvard • Takahiro Hasegawa, Shionogi, Japan • Scott Evans, Harvard • Lihui Zhao, Northwestern • Danyu Lin, UNC • Zhiliang Ying, Columbia • Zhezhen Jin, Columbia • Colleagues in the pharmaceutical industry
What is the goal of a clinical study? • To obtain robust, clinically interpretable treatment effect estimate with respect to risk-benefit perspectives at the patient’s level via efficient and reliable quantitative procedures
What are the issues? • The conventional way to conduct trials gives us fragmentary information • Lack of clinically meaningful totality evidence • Difficult to use the trial results for future patient’s management
A Few Methodology Issues 1. Estimation vs. testing • P-value provides little clinical information about treatment effect/risk • The size of the effect matters • Goodness of fit test? Using the prediction to assess model fit
TREAT study for EPO CV safety • If we follow the patients up to 48 months, the control arm's average stroke-free time is 46.9 months and the Darb arm's is 46.0 months. The difference is 0.9 months, with 0.95 CI (0.4, 1.4) months and p<0.001 (very significant). • The p-value can exaggerate the treatment difference: a small increase in the Z-value may drastically decrease the p-value. The confidence interval estimate is much more stable and interpretable.
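The point about Z-values and p-values can be checked numerically. The sketch below is illustrative only: it assumes a normal approximation and a symmetric 95% interval (neither is stated on the slide), and the function names are hypothetical. It backs out an approximate Z-value from the reported difference and interval, then shows how quickly the two-sided p-value shrinks as Z grows.

```python
import math

def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def z_from_ci(estimate: float, lower: float, upper: float,
              z_crit: float = 1.96) -> float:
    """Approximate Z-value implied by an estimate and a symmetric 95% CI
    (assumption: the interval is estimate +/- z_crit * SE)."""
    se = (upper - lower) / (2.0 * z_crit)
    return estimate / se

# Difference and interval as reported on the slide: 0.9 (0.4, 1.4) months.
z_treat = z_from_ci(0.9, 0.4, 1.4)
p_treat = two_sided_p(z_treat)
print(f"implied z = {z_treat:.2f}, two-sided p = {p_treat:.5f}")

# Modest steps in Z produce large relative drops in the p-value.
for z in (2.0, 2.5, 3.0, 3.5):
    print(f"z = {z:.1f}  two-sided p = {two_sided_p(z):.5f}")
```

The interval width, by contrast, does not change as the estimate moves, which is one way to read the claim that the interval is the more stable summary.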
What is a clinically meaningful treatment effect via estimation? • Reimbursement issue beyond getting the medical product approved by regulatory agencies. • What is the “estimand?” • If the overall treatment effect is not “clinically impressive,” we may identify a “high value” subgroup via a pre-specified procedure
2. How do we define a primary endpoint with multiple outcomes? • What is current practice? • Define primary endpoints and secondary endpoints • Efficacy and toxicity (how to connect them together?) • Disease burden measure? • The conventional component-specific analysis – informative missing, censoring or competing risks
What is the general clinical practice for treating a patient with cardiovascular diseases? • Following the patient over time • Having periodic clinical/lab exams/tests • Recording the time to multiple clinical/lab outcomes (heart attack, stroke, CV hosp, CV death…BP, HbA1C, toxicity..) • Assessing the disease burden/progression over time via totality of multiple outcomes • Making decision of treatment selections
A typical cardiovascular (CV) study • Comparing a new therapy with standard care • The question is whether the new treatment would prevent bad CV outcomes/toxicity • Following each patient over time • Times to multiple clinical events are collected
Conventional approaches for clinical trials • Choosing a single outcome (e.g., time to clinical event) as the primary endpoint • Applying univariate analysis for the treatment difference • Figuring out how to handle informative censoring (competing risks) • Considering other outcomes (risk, benefit) as secondary endpoints • Not sure how to treat future patients from study results via those separate summary measures for efficacy/safety
Example : Beta-Blocker Evaluation of Survival (BEST) Trial (NEJM, 2001) • Study • Bucindolol vs. placebo • patients with advanced chronic heart failure -- n = 2707 • Average follow-up: 2 years • Primary endpoint: overall survival • Hazard ratio for death = 0.90 (p-value = 0.1)
Possible solutions? • Using the patient’s disease burden or progression information during the entire follow-up to define the “responder” • Creating more than one response category: ordinal categorical response • Brian Claggett’s thesis paper (published in Biostatistics)
BEST Example: 8 Categories • 1: No events • 2: Alive, non-HF hospitalization only • 3: Alive, 1 HF hosp. • 4: Alive, >1 HF hosp. • 5: Late non-CV death (>12 months) • 6: Late CV death (>12 months) • 7: Early non-CV death (<12 months) • 8: Early CV death (<12 months)
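As a rough illustration of how the eight ordinal categories could be assigned from a patient's follow-up record, here is a minimal sketch. The function name and input fields are hypothetical, not taken from the BEST analysis. The slide does not say how a death at exactly 12 months is classified; the sketch treats it as late, which is an assumption.

```python
from typing import Optional

def best_category(died: bool, cv_death: bool,
                  months_to_death: Optional[float],
                  hf_hosp: int, non_hf_hosp: int) -> int:
    """Map one patient's follow-up summary to the 1-8 ordinal scale.

    Hypothetical inputs: death status, whether the death was CV-related,
    time of death in months, and counts of HF / non-HF hospitalizations.
    """
    if died:
        # Assumption: a death at exactly 12 months counts as "late".
        early = months_to_death is not None and months_to_death < 12
        if early:
            return 8 if cv_death else 7
        return 6 if cv_death else 5
    if hf_hosp > 1:
        return 4
    if hf_hosp == 1:
        return 3
    if non_hf_hosp > 0:
        return 2
    return 1

# Example: alive at end of follow-up with two HF hospitalizations.
print(best_category(False, False, None, hf_hosp=2, non_hf_hosp=1))
```

Once each patient has a category, the two arms can be compared on the distribution of this single ordinal response rather than on separate component endpoints.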
Example: Treatment for HIV infected children • Primary endpoint: viral load reduction • Major secondary endpoint: growth profile over 48 weeks
Example: DMD rare disease • Nonsense mutation Duchenne muscular dystrophy (nmDMD) is a rare, X-linked, neuromuscular, childhood disorder.
Ambulatory Boys with Nonsense Mutation Muscular Dystrophy • Outcomes for quantifying muscle function • 6 MWD • 10-meter walk/run • 4-stair climb • 4-stair descend
Comparative studies for DMD • Two trials done by PTC • The primary endpoint is 6MWD • Various secondary endpoints • Each study was a 48-week, multicenter, randomized, double-blind, placebo-controlled trial comparing the efficacy and safety of ataluren vs. placebo in ambulatory boys with nmDMD.
Graphical display for patient level data [Figure: Treatment and Placebo panels, with columns labeled No., 1, 2, 3, 4, 5]
How to analyze multiple outcome data? • For each column (specific outcome), obtaining the treatment difference D • Combining the D’s linearly (weighted average) • Evaluating how “unlikely” it is to get the observed combined statistic • Wei-Lachin (JASA, 1984) and Wei-Johnson (Biometrika, 1985) • Powerful if all the test statistics are in the “right direction”
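The linear combination step can be sketched as follows. This is a generic illustration of a weighted average of endpoint-specific differences standardized by its variance, not a reproduction of the cited procedures. The numbers are made up, and the covariance matrix of the D's is assumed to be available.

```python
import math
from typing import Sequence

def combined_z(diffs: Sequence[float],
               cov: Sequence[Sequence[float]],
               weights: Sequence[float]) -> float:
    """Z-statistic for the weighted sum w'D, using Var(w'D) = w' Sigma w."""
    k = len(diffs)
    numerator = sum(w * d for w, d in zip(weights, diffs))
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(k) for j in range(k))
    return numerator / math.sqrt(variance)

# Hypothetical standardized differences for four endpoints, all favoring
# treatment, with unit variances and pairwise correlation 0.5.
diffs = [1.5, 1.2, 1.0, 1.4]
cov = [[1.0 if i == j else 0.5 for j in range(4)] for i in range(4)]
weights = [0.25, 0.25, 0.25, 0.25]

z = combined_z(diffs, cov, weights)
one_sided_p = 0.5 * math.erfc(z / math.sqrt(2.0))
print(f"combined z = {z:.3f}, one-sided p = {one_sided_p:.4f}")
```

If some differences pointed the other way, the numerator would shrink while the variance stayed the same, which is why this kind of test is powerful mainly when the endpoint effects agree in direction.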
How unlikely to observe this pattern under null hypothesis? [Figure: forest plots of the treatment difference (∆ change at Week 48, LS mean, 95% CI) for Study 007 (ITT; ataluren 10, 10, 20 mg/kg, N=57; placebo, N=57) and Study 020 (ITT; ataluren, N=114; placebo, N=114). Endpoints shown: 6MWD, 10-meter walk/run, 4-stair climb, 4-stair descend. Axes run from “Favors Placebo” to “Favors Ataluren”.]
Another way to combine • For each outcome, we rank the observations over all patients (both treatment groups pooled) • Add the ranks across each row (for each patient) so each patient has a rank score • Conducting a two-sample test using those scores • (O’Brien test)
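A minimal sketch of the rank-sum idea is below. It assumes every outcome is oriented so that larger is better (timed tests would need a sign flip first). Ranks are computed over the pooled sample, and a pooled-variance t statistic on the per-patient scores stands in for the final test. The data are invented for illustration.

```python
import math
from statistics import mean, variance
from typing import List, Sequence

def midranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with ties given the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_scores(rows: Sequence[Sequence[float]]) -> List[float]:
    """Rank each outcome (column) over all patients, then sum per patient."""
    scores = [0.0] * len(rows)
    for k in range(len(rows[0])):
        for i, r in enumerate(midranks([row[k] for row in rows])):
            scores[i] += r
    return scores

def pooled_t(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1.0 / na + 1.0 / nb))

# Hypothetical data: two outcomes per patient, larger is better.
treated = [(5, 6), (7, 8), (6, 9)]
placebo = [(1, 2), (2, 1), (3, 4)]

scores = rank_sum_scores(treated + placebo)
t_stat = pooled_t(scores[:len(treated)], scores[len(treated):])
print(scores, round(t_stat, 3))
```

Because the scores are sums of ranks, outcomes on different scales contribute equally, but the resulting statistic has no direct clinical unit.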
Limitation of this combination approach • Different outcomes have different scales, so it may be only useful as a powerful test procedure • How to get an overall estimate for treatment effect?
3. Identifying a high value subgroup of patients? • A negative trial does not mean the treatment is no good for anyone • A positive trial does not mean it works for everyone • The usual subgroup analysis is not adequate to address this issue • Need a built-in pre-specified procedure for identifying patients who benefit from treatment • FDA’s guidance on predictive enrichment (2012)
4. How to monitor trials “quantitatively” via prediction? • The usual practice is to use p-values (O-B stopping, etc.). • Use conditional power? • Use prediction confidence interval estimate (EAST new version)
5. How to monitor safety? • What is the conventional way? • Component-wise tabulation or analysis? • No information about multiple AE events at the patient level • Graphical method to show the temporal toxicity profile?
6. Quantifying treatment contrast (difference)? • Should be model-free parameter • Using difference of means, median, etc. • For censored data, using a constant hazard ratio (heavily model-based)? • Model-based measure is difficult to interpret or validate
Issues for the hazard ratio estimate • Hazard ratio estimate is routinely used for designing, monitoring and analyzing clinical studies in survival analysis
Model Free Parameter for Treatment Difference * Considering a two-treatment comparison study in “survival analysis” * How do we quantify the treatment difference? • Median failure time (may not be estimable); • t-year survival rate (not an overall measure)? • A constant hazard ratio over time with the log-rank test
Eastern Cooperative Oncology Group • E4A03 trial to compare low- and high-dose dexamethasone for naïve patients with multiple myeloma • The primary endpoint is the survival time • n=445 • The trial stopped early at the second interim analysis; the low dose was superior. • Patients on the high-dose arm then received the low dose, and follow-up for overall survival continued.
A Cancer Study Example [Figure comparing Group 1 and Group 2]
The proportional hazards assumption is not valid • The PH estimator is estimating a quantity which cannot be interpreted and, worse, depends on the study-specific censoring distributions • Any model-based treatment contrast has such issues (need a model-free parameter) • The logrank test is not powerful
Conventional analysis: • Log-rank test: p=0.47 • Hazard Ratio: HR=0.87 (0.60, 1.27)
What is the alternative way for survival analysis? • Using the area under the curve of Kaplan-Meier estimate up to a fixed time point • Restricted mean survival time • Model-free and a global measure of efficacy • Can be estimated even under heavy censoring
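The restricted mean survival time can be computed directly as the area under the Kaplan-Meier step function up to the truncation time. The sketch below is a bare-bones illustration with hypothetical inputs (event indicator 1 = event, 0 = censored) and no variance estimate. If the last observation before the truncation time is censored, it carries the last Kaplan-Meier value forward, which is an assumption.

```python
from typing import Sequence

def km_rmst(times: Sequence[float], events: Sequence[int], tau: float) -> float:
    """Area under the Kaplan-Meier curve on [0, tau]."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0       # current value of the KM estimate
    area = 0.0
    prev_t = 0.0
    i = 0
    while i < len(data) and data[i][0] <= tau:
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        area += surv * (t - prev_t)          # rectangle up to time t
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk  # KM step at an event time
        n_at_risk -= removed
        prev_t = t
    area += surv * (tau - prev_t)            # last piece up to tau
    return area

# Toy data: events at 2 and 5 months, one patient censored at 3 months.
print(km_rmst([2, 3, 5], [1, 0, 1], tau=6))
```

With no censoring the result equals the sample mean of min(T, tau), which is a quick sanity check on any implementation.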
The area under Kaplan-Meier as a summary of survival distribution [Figure: Kaplan-Meier curves with the area under the curve shaded; one panel labeled “Treated”; RMST: 33.3 months in one group and 35.4 months in the other]
Cancer Study Example Restricted Mean (up to 40 months): • 35.4 months vs. 33.3 months • Δ = 2.1 (0.1, 4.2) months; p=0.04 • Ratio of Survival time = 35.4/33.3 = 1.06 (1.00, 1.13) • Ratio of time lost = 6.7/4.6 = 1.46 (1.02, 2.13)
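The point estimates on this slide follow from the two restricted means and the 40-month truncation time. The short check below reproduces that arithmetic only. The confidence intervals and p-value need patient-level data and are not recomputed, and the variable names are placeholders because the slide does not say which arm has which value.

```python
tau = 40.0            # truncation time in months
rmst_longer = 35.4    # arm with the larger restricted mean
rmst_shorter = 33.3   # arm with the smaller restricted mean

difference = rmst_longer - rmst_shorter        # months gained
ratio_survival = rmst_longer / rmst_shorter    # ratio of survival time

# "Time lost" is the part of the 40-month window not survived on average.
lost_longer = tau - rmst_longer
lost_shorter = tau - rmst_shorter
ratio_time_lost = lost_shorter / lost_longer

print(f"difference      = {difference:.1f} months")
print(f"ratio of RMST   = {ratio_survival:.2f}")
print(f"time lost       = {lost_shorter:.1f} vs {lost_longer:.1f} months")
print(f"ratio time lost = {ratio_time_lost:.2f}")
```

The same 2.1-month difference looks small as a ratio of survival time and much larger as a ratio of time lost, so the choice of contrast affects how the effect is perceived.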
7. Post-marketing/safety studies? • It is not appropriate to use an event-driven procedure to conduct a safety study. • The event rate is low, so the exposure time matters • Requires a lot of resources (large or long-term study)
CV safety study for anti-diabetes drugs • Event driven studies, that is, we need to have a pre-specified # of events so the resulting confidence interval for the treatment difference is “narrow” • For example, the upper bound of 95% confidence interval is less than 1.3
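To see why such studies are resource-intensive, the number of events needed to rule out a hazard ratio of 1.3 can be approximated. The sketch uses the common large-sample approximation Var(log HR) ≈ 1 / (p(1 − p)d) for d events and allocation fraction p. The 90% power, the one-sided 0.025 level, and a true hazard ratio of 1 are assumptions for illustration and are not given on the slide.

```python
import math
from statistics import NormalDist

def events_to_rule_out(margin_hr: float,
                       alpha_one_sided: float = 0.025,
                       power: float = 0.90,
                       true_hr: float = 1.0,
                       alloc: float = 0.5) -> int:
    """Approximate events needed so the upper confidence bound for the
    hazard ratio falls below margin_hr with the stated power."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    effect = math.log(margin_hr) - math.log(true_hr)
    d = (z_alpha + z_beta) ** 2 / (alloc * (1.0 - alloc) * effect ** 2)
    return math.ceil(d)

events_needed = events_to_rule_out(1.3)
print(events_needed)
```

With a low event rate, accumulating this many events requires either a very large cohort or long follow-up.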
The EXAMINE trial (alogliptin) NEJM, October 3, 2013
RMST (24 months): • Placebo: 21.9 (21.7, 22.2) • Alogliptin: 22.0 (21.8, 22.3) • Difference: -0.08 (-0.39, 0.24) • Ratio: 1.00 (0.98, 1.01)
RMST (30 months): • Placebo: 27.1 (26.7, 27.4) • Alogliptin: 27.2 (26.9, 27.5) • Difference: -0.12 (-0.56, 0.33) • Ratio: 1.00 (0.98, 1.01)
What if a smaller study? 95% confidence intervals for various measures
8. Evaluating new treatments for rare diseases • Utilizing the registry data or natural history data • Single arm trial? • Multiple outcomes? • It is not at all clear how to quantify disease burden over time
How to make treatments comparable across studies? • Which patient population are we referring to? • It is not clear using the propensity score procedure. • Using a model relating outcome to covariates with registry data, then move the fitted model to the clinical trial population?
Nissen and Wolski (2007) performed a meta analysis to examine whether Rosiglitazone (Avandia, GSK), a drug for treating type 2 diabetes mellitus, significantly increases the risk of MI or CVD related death.
Example: Effect of Rosiglitazone on MI or CVD Deaths • Avandia was introduced in 1999 and is widely used as monotherapy or in fixed-dose combinations with either metformin (Avandamet) or glimepiride (Avandaryl). • The original approval of Avandia was based on its ability to reduce blood glucose and glycated hemoglobin levels. • Initial studies were not adequately powered to determine the effects of this agent on micro- or macro-vascular complications of diabetes, including cardiovascular morbidity and mortality.