The Perils of Subgroups – Concerns, Examples, Alternatives

FDA/Industry Statistics Workshop 2006
Andreas Sashegyi, PhD, Eli Lilly and Company

Introduction • A well-powered study to test a specific scientific hypothesis defines clear limits on the information to be gathered • Subgroup analyses stretch these limits (to a greater or lesser extent) and conclusions from such analyses are prone to increased Type I and II error rates Things are not always as they seem…

European Carotid Surgery Trial • Carotid endarterectomy vs medical intervention in patients with recently symptomatic carotid stenosis • Consider subgroup analysis in patients with ≥ 70% stenosis according to month of birth…

ECST Subgroup Analysis • Figure 3, Rothwell PM, Lancet 2005; 365:176-186 • Treatment-Birthmonth interaction p<.001!

Some Common Pitfalls… • Lack of power to show subgroup effects • Failure to adjust for multiplicity • Potential imbalances in subgroup baseline characteristics • Inability to confirm effects • Information distribution • E.g. constant therapeutic effect on a relative scale implies decreasing absolute risk reduction with decreasing disease severity – limits information in subgroups of lower disease severity
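The multiplicity pitfall can be made concrete with a short calculation. This is a minimal sketch, assuming k independent subgroup tests at the same level with no true effect anywhere. Real subgroup tests are correlated, so the figures are illustrative only, and the function names are ours. The values of k echo counts that appear in this talk (12 birth months, 70 PROWESS subgroups).

```python
def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_level(k: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the familywise error at or below alpha."""
    return alpha / k


for k in (1, 5, 12, 70):
    print(
        f"k = {k:2d}: P(at least one false positive) = {familywise_error(k):.2f}, "
        f"Bonferroni per-test level = {bonferroni_level(k):.4f}"
    )
```

Bonferroni is only one of several possible adjustments and is shown here because it is the simplest to state.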

The Conflict • Risk of applying overall results of large trials to individual patients without considering determinants of individual response • Danger of using subgroup analysis to target treatment

Example: Xigris • Activated Protein C for the treatment of adults with severe sepsis • Therapeutic area with a history of failed trials • PROWESS – global pivotal registration trial: • Randomized, double-blind, placebo-controlled • 24 µg/kg/hr Xigris vs placebo for 96 hours • Planned sample size – 2280 patients • Primary endpoint – 28-day all-cause mortality • Trial stopped at 2nd interim analysis

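Subgroup Analysis of APACHE II Score
(mortality by arm: Trt = Xigris, Plc = placebo)

                            N      Trt      Plc
Primary                  1690    24.7%    30.8%
APACHE II 1st quartile    433    15.1%    12.1%
APACHE II 2nd quartile    440    22.5%    25.7%
APACHE II 3rd quartile    366    23.5%    35.8%
APACHE II 4th quartile    451    38.1%    49.0%

[Forest plot: Relative Risk of Death for each row, axis marked 0.5, 1.0, 2.0]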
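The point estimates behind the forest plot can be recovered from the percentages on this slide. The sketch below only divides treatment mortality by placebo mortality. It cannot reproduce the confidence intervals, because the per-arm patient counts are not given here. Variable names are ours.

```python
# (N, Xigris mortality %, placebo mortality %) as reported on the slide
rows = {
    "Primary": (1690, 24.7, 30.8),
    "APACHE II 1st quartile": (433, 15.1, 12.1),
    "APACHE II 2nd quartile": (440, 22.5, 25.7),
    "APACHE II 3rd quartile": (366, 23.5, 35.8),
    "APACHE II 4th quartile": (451, 38.1, 49.0),
}


def relative_risk(trt_pct: float, plc_pct: float) -> float:
    """Relative risk of death: treatment mortality divided by placebo mortality."""
    return trt_pct / plc_pct


rr = {label: relative_risk(trt, plc) for label, (_, trt, plc) in rows.items()}

for label, value in rr.items():
    print(f"{label:<24s} RR = {value:.2f}")
```

Only the first quartile has a point estimate above 1, which is the observation the following slides debate.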

Observations… • Lower mortality observed for 68 of 70 subgroups • Observed effect in lowest APACHE quartile consistent in that the 95% CI included the overall point estimate • Analysis of other disease severity measures showed consistent survival benefit in less severe patients • Even within the first APACHE quartile… Was there evidence to support a differential drug effect by disease severity?

Some Reasons for Restricted Label • Indication restricted to patients with greater disease severity as assessed, e.g., by APACHE II • Pre-specification of APACHE II score as an important analysis • If relative risk reduction is constant, absolute risk reduction (i.e. benefit) is greatest in highest-risk patients → Xigris is associated with increased risk of serious bleeding • APACHE II score was the best discriminator of mortality risk • APACHE II score can be used at the bedside • Acknowledgment – hypothesis that benefit is limited to high-risk patients is not proven… With no proof that Xigris works in all subgroups, nor that it is ineffective in some, indication focused on practicality and risk/benefit considerations
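The constant-relative-effect argument is simple arithmetic: absolute risk reduction equals baseline risk times (1 − RR). The sketch below assumes a single relative risk of 0.80 at every severity level, which is close to the overall PROWESS ratio of 24.7%/30.8% but is exactly the unproven assumption under discussion. Baseline risks are the placebo mortality rates by APACHE II quartile reported earlier in the talk. Function names are ours.

```python
ASSUMED_RR = 0.80  # assumption: one relative risk applies at every severity level


def absolute_risk_reduction(baseline_risk: float, rr: float) -> float:
    """ARR = p0 - p0 * RR = p0 * (1 - RR)."""
    return baseline_risk * (1.0 - rr)


def number_needed_to_treat(baseline_risk: float, rr: float) -> float:
    """Patients treated per death averted, 1 / ARR."""
    return 1.0 / absolute_risk_reduction(baseline_risk, rr)


# Placebo 28-day mortality by APACHE II quartile, from the PROWESS subgroup slide
placebo_mortality = {
    "1st quartile": 0.121,
    "2nd quartile": 0.257,
    "3rd quartile": 0.358,
    "4th quartile": 0.490,
}

for label, p0 in placebo_mortality.items():
    arr = absolute_risk_reduction(p0, ASSUMED_RR)
    nnt = number_needed_to_treat(p0, ASSUMED_RR)
    print(f"{label}: baseline {p0:.1%}, ARR {arr:.1%}, NNT {nnt:.0f}")
```

Because a bleeding risk does not necessarily shrink with baseline mortality, the same relative effect buys less absolute benefit per unit of harm in the lower quartiles.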

Example: Tarceva • HER1/EGFR tyrosine kinase inhibitor, indicated for • patients with locally advanced or metastatic NSCLC, after failure of ≥1 prior chemotherapy regimen • (in combination with gemcitabine) first-line treatment of patients with locally advanced, unresectable or metastatic pancreatic cancer • NSCLC – global registration trial: • Randomized, double-blind, placebo-controlled • Tarceva 150 mg/day vs placebo, 2:1 randomization (488 vs 243) • Primary endpoint – survival

Subgroup Analysis Reported in PI • Potential differential effects according to epidermal growth factor receptor (EGFR) type and smoking status?

Comments • [An apparently larger effect observed in two subsets] • [Survival prolonged in EGFR +ve subgroup and unmeasured subgroup but did not appear to have an effect in the EGFR –ve subgroup; however CIs are wide and overlap so that a survival benefit in the EGFR –ve subgroup cannot be ruled out] • Would argue for consistent effect • Interpretation of “EGFR unmeasured” subgroup problematic

Comments • [For the subgroup who never smoked, EGFR status also appeared to be predictive of Tarceva survival benefit. EGFR +ve patients who never smoked had a large survival benefit; there were too few EGFR –ve patients who never smoked to reach a conclusion] • Implicitly assumes that EGFR subgroup finding is real • Multi-layered subgroup analysis compounds problems… • Conclusion of a large survival benefit in EGFR +ve patients who never smoked would imply a more moderate effect in EGFR –ve non-smokers (or rather, too few EGFR –ve non-smokers imply non-definitive findings in both subgroups of non-smokers…)

Incorrect conclusions from subgroup analyses… Table 1, Rothwell PM, Lancet 2005; 365:176-186

So… What are we to believe? And how should we proceed?

Some Safeguards • Pre-specify limited numbers of analyses, supported by scientific hypotheses • Consider expected effects and implied power conditions • Stratify randomization by subgroup factors • Focus primarily on treatment-subgroup interaction rather than effects within subgroups • Consider multiplicity adjustments • Minimize post-hoc analysis and interpretation • Consider alternatives directly addressing risk/benefit question…

Beyond Standard Subgroup Analysis… Two alternatives may offer additional insight: • Meta-analysis (of trials or trial subgroups) • Patient-level analyses (e.g., GLMs) → Exploring the first may be helpful if used appropriately → Exploring the latter is essential

Example: Xigris (revisited) • Consider post-marketing commitment ADDRESS • Xigris vs placebo in severe sepsis patients with lower disease severity • Global, randomized, double-blind, placebo-controlled trial • Planned sample size ~11,000 patients • Primary endpoint – 28-day all-cause mortality • Due to label differences in EU vs US, ADDRESS also enrolled some “high risk” patients

ADDRESS • Study stopped at an interim analysis, due to futility • Heterogeneous patient population • Several patient-level factors complicated interpretation • Diagnosis question for surgical patients with a single organ failure • Learning curve – site enrolment sequence effect • Does ADDRESS still leave doubt about the effect in lower-risk patients? • Academically: Perhaps • Practically: No

What About High-Risk Patients? • Can we learn more about the effect in high-risk patients? Consider the recent meta-analysis by Friedrich et al. (Critical Care 2006, 10:145) • If subgroup effect in PROWESS is real, should not expect same results in PROWESS and ADDRESS • Examine same subgroups in both trials for which effect is expected to be the same and proceed with meta-analysis…

Seeking Confirmation… • RR (95% CI) in patients with APACHE II ≥ 25: • 0.71 (0.59, 0.85) PROWESS – 817 patients • 1.19 (0.83, 1.71) ADDRESS – 321 patients • Considerable evidence of heterogeneity… • Meta-analysis shows no mortality benefit overall… But is this an appropriate use of meta-analysis?
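The heterogeneity claim can be checked roughly from the two intervals on this slide. The sketch below back-calculates standard errors on the log scale from the reported 95% CIs, computes Cochran's Q, and pools the two estimates with a DerSimonian-Laird random-effects model. This is an approximation from rounded summary figures. It is not necessarily the method used in the cited meta-analysis, and all function names are ours.

```python
import math

Z95 = 1.959964  # two-sided 95% normal quantile


def log_rr_and_se(rr, lo, hi):
    """Recover log RR and its standard error from a reported 95% CI."""
    return math.log(rr), (math.log(hi) - math.log(lo)) / (2 * Z95)


def fixed_effect(estimates):
    """Inverse-variance weighted mean of the log RRs."""
    weights = [1 / se**2 for _, se in estimates]
    return sum(w * y for w, (y, _) in zip(weights, estimates)) / sum(weights)


def cochran_q(estimates):
    """Weighted squared deviations from the fixed-effect mean."""
    pooled = fixed_effect(estimates)
    return sum((y - pooled) ** 2 / se**2 for y, se in estimates)


def dersimonian_laird(estimates):
    """Random-effects pooled log RR, its SE, and the between-trial variance tau^2."""
    w = [1 / se**2 for _, se in estimates]
    q = cochran_q(estimates)
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1 / (se**2 + tau2) for _, se in estimates]
    pooled = sum(wi * y for wi, (y, _) in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


trials = [
    log_rr_and_se(0.71, 0.59, 0.85),  # PROWESS, APACHE II >= 25
    log_rr_and_se(1.19, 0.83, 1.71),  # ADDRESS, APACHE II >= 25
]

q = cochran_q(trials)
p_het = math.erfc(math.sqrt(q / 2))  # chi-square tail with 1 df (two trials only)
pooled, se, tau2 = dersimonian_laird(trials)
ci = (math.exp(pooled - Z95 * se), math.exp(pooled + Z95 * se))

print(f"Q = {q:.2f}, heterogeneity p = {p_het:.3f}, tau^2 = {tau2:.3f}")
print(f"Random-effects RR = {math.exp(pooled):.2f} (95% CI {ci[0]:.2f}, {ci[1]:.2f})")
```

With only two trials the between-trial variance is very poorly estimated, which is one more reason to read the pooled figure with caution.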

Caution Advised! • Subgroups are not necessarily comparable across trials • Disease severity characteristics suggested “high-risk” patients receiving Xigris in ADDRESS were significantly sicker than placebo patients • Mortality in the ADDRESS subgroup overall much lower than in PROWESS • This analysis was not helpful in providing more insight • The attempt was well-motivated, but in general subgroup analyses of any kind provide an insufficient basis for facilitating decisions for individual patients

Subgroups vs Benefit/Risk Analysis • The “single-objective/hypothesis” paradigm in clinical trials is important for establishing drug effect in a population • Targeting therapy is the next step – best accomplished by comprehensive risk-benefit analysis • Patient treatment decisions revolve around individuals – analyses to inform these decisions should accommodate characteristics of individuals

Subgroups vs Benefit/Risk Analysis • Elements of a unified framework for benefit/risk analysis: • Patient outcomes (efficacy) • Patient outcomes (adverse events) • Patient characteristics • Trade-off considerations – patient-specific → Should lead to systematic decision analysis accounting for uncertainty • Regression or other model-based approaches are well-suited for this effort • Finally, in the effort to match the right patient with the right drug, no single analysis can suffice…
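A toy version of the patient-specific trade-off may help fix ideas. Every number below is hypothetical and chosen only for illustration: a constant relative risk of death, a fixed excess risk of a serious adverse event, and a weight expressing how heavily that event counts relative to a death. None of these values comes from PROWESS, ADDRESS, or any label, and the function names are ours. A real decision analysis would also carry the uncertainty in each input.

```python
def net_benefit(baseline_risk: float, rr: float, excess_harm: float, harm_weight: float) -> float:
    """Deaths averted minus weighted excess harms, per patient treated."""
    return baseline_risk * (1.0 - rr) - harm_weight * excess_harm


def threshold_baseline_risk(rr: float, excess_harm: float, harm_weight: float) -> float:
    """Baseline risk at which weighted harm exactly offsets benefit."""
    return harm_weight * excess_harm / (1.0 - rr)


# Hypothetical inputs, for illustration only
RR = 0.80           # assumed constant relative risk of death
EXCESS_HARM = 0.02  # assumed absolute excess risk of a serious adverse event
HARM_WEIGHT = 0.5   # assumed weight of that event relative to a death

print(f"Break-even baseline risk: {threshold_baseline_risk(RR, EXCESS_HARM, HARM_WEIGHT):.1%}")
for p0 in (0.02, 0.10, 0.30, 0.50):
    nb = net_benefit(p0, RR, EXCESS_HARM, HARM_WEIGHT)
    print(f"baseline risk {p0:.0%}: net benefit {nb:+.3f}")
```

The break-even point moves with each patient's own weight on harm, which is why a fixed subgroup cut-off can only approximate the decision.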
