Subgroup Analyses: Can We ‘Smooth’ out the Rough Edges?

Daniel Sargent, PhD • Mayo Clinic • Sept 28, 2006

Outline • Motivation • Subgroups ARE medicine (especially its future) • Examples • Good and bad conduct • Strategies • Hierarchical models • Smoothing approaches • Conclusion

Subgroup analysis: My Definition & My Bias • Definition: An effort to draw inference on the effect of an intervention in a set of patients smaller than the entire experimental cohort • Bias: Such inferences will be more robust when based on a model using all patients than when based on an analysis restricted to just the cohort of interest

Subgroups are medicine • If all patients were the same, wouldn’t need physicians • Human Genome Project massively expanding knowledge base • Technology, biology, chemistry, etc. allowing manufacture of highly specific, targeted compounds • Patients seek ‘tailored’ treatment recommendations

Example: Colon Cancer: Model-Derived Estimates of 5 year DFS (%) with Surgery plus Adjuvant Therapy Gill, JCO 2004; http://www.mayoclinic.com/calcs

Example: Breast Cancer • Most common cancer in women in the US • The HER-2 gene is overexpressed in 25-30% of breast cancers; overexpression is associated with worse prognosis • Trastuzumab, a humanized monoclonal antibody, targets the HER-2 receptor; previous trials have demonstrated activity in the treatment of HER-2 overexpressing late stage breast cancer • A clinical trial was performed testing trastuzumab in the subset of HER-2 positive women with early stage breast cancer

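Disease-Free Survival and Survival • [Figure: two Kaplan-Meier panels comparing AC→T+H→H with AC→T; x-axis: Years] • Disease-free survival: AC→T+H→H 134 events vs AC→T 261 events; HR=0.48, 2P=$3 \times 10^{-12}$ • Survival: AC→T+H→H 62 events vs AC→T 92 events; HR=0.67, 2P=0.015 • Percentages annotated on the curves: 94%, 91%, 87%, 92%, 85%, 87%, 75%, 67% Romond et al, NEJM 2005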

Avoiding subgroup analysis: Targeted Phase II/III Trials Patient Selection for targeted therapies • Test the recommended dose on patients who are most likely to respond based on their molecular expression levels • May result in a large savings of patients (Simon & Maitournam, Clinical Cancer Research 2004)

Trials in targeted populations • Gains in efficiency depend on marker prevalence and relative efficacy in marker + and marker – patients • Details: Session #13 tomorrow (Simon & Maitournam, CCR 2004)
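The dependence on marker prevalence and relative efficacy can be illustrated with a rough sketch. It assumes a normally distributed endpoint with equal variance in both marker groups, so that the required sample size scales with 1/(effect)^2, and it assumes the effect in an unselected population is the prevalence-weighted average of the marker + and marker – effects. The function names and input values are hypothetical, and this is a simplification rather than the exact calculation in Simon & Maitournam.

```python
def randomized_ratio(prevalence, effect_pos, effect_neg):
    """Approximate ratio: patients randomized in an untargeted trial
    divided by patients randomized in a marker-positive-only trial."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    # Effect seen in an unselected population: diluted by marker-negative patients.
    diluted = prevalence * effect_pos + (1 - prevalence) * effect_neg
    # Sample size is proportional to 1 / effect^2 under the stated assumptions.
    return (effect_pos / diluted) ** 2


def screened_ratio(prevalence, effect_pos, effect_neg):
    """Same comparison, but counting patients screened for the targeted trial
    (1 / prevalence patients must be screened per patient randomized)."""
    return prevalence * randomized_ratio(prevalence, effect_pos, effect_neg)


if __name__ == "__main__":
    # Hypothetical scenarios: no benefit vs. half the benefit in marker-negative patients.
    for prev in (0.25, 0.50, 0.75):
        for neg in (0.0, 0.5):
            print(prev, neg,
                  round(randomized_ratio(prev, 1.0, neg), 2),
                  round(screened_ratio(prev, 1.0, neg), 2))
```

A ratio above 1 favors the targeted design; when the treatment works equally well in both groups the ratio is 1 for randomized patients and below 1 for screened patients, so targeting only adds screening cost.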

Case Study: Stage II colon cancer • Colon cancer: Prognosis defined by stage • Prior trials generally enrolled patients with both stage II and III disease • Previous randomized trials uniformly demonstrate benefit of chemotherapy in stage III patients (node positive) • Previous trials & pooled analyses mixed regarding benefit in stage II patients • No single trial powered for modest effect seen in stage II ( ↑ 2-3% in 5 year survival)
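To see why no single trial was powered for a 2-3% improvement, a back-of-the-envelope calculation helps. The sketch below uses the Schoenfeld approximation for a 1:1 randomized log-rank comparison and assumes proportional hazards; the 80% baseline 5-year survival is an illustrative assumption, not a figure from the trials above.

```python
from math import ceil, log
from statistics import NormalDist


def hazard_ratio_from_survival(s_control, s_treated):
    """Hazard ratio implied by two survival proportions at the same time point,
    assuming proportional hazards."""
    return log(s_treated) / log(s_control)


def required_events(hr, alpha=0.05, power=0.80):
    """Schoenfeld approximation: total events needed for a two-sided
    log-rank test with 1:1 allocation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(hr) ** 2)


# Hypothetical scenario: 5-year survival improves from 80% to 82.5%.
hr = hazard_ratio_from_survival(0.80, 0.825)
events = required_events(hr)
print(f"implied HR = {hr:.3f}, events needed = {events}")
```

Because only about one patient in five has an event by 5 years in this scenario, the number of patients required is several times the number of events.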

Meta-analysis Stage II Adjuvant Therapy N=2,732 RR=0.88 P=0.08 Benson et al. J Clin Oncol. 2004

American Society of Clinical Oncology Guidelines 2004 • Direct evidence from randomized trials does not support routine use of chemotherapy for patients with stage II colon cancer. • Those who accept the relative benefit in stage III disease as adequate indirect evidence of benefit for stage II disease are justified in considering chemotherapy, particularly for patients with high-risk stage II disease. • Ultimate clinical decision should be based on discussions with the patient. Benson et al. J Clin Oncol. 2004

New therapy: FOLFOX • Randomized (R), N=2246 • Stage II: 40%, Stage III: 60% • Arms: FOLFOX4 (LV5FU2 + oxaliplatin 85 mg/m²) vs LV5FU2 • Primary end-point: disease-free survival (DFS) de Gramont et al., ASCO 2005

Disease-free Survival (ITT) • [Figure: Kaplan-Meier curves; y-axis: DFS probability, x-axis: Months (0 to 66)] • Events: FOLFOX4 279/1123 (24.8%), LV5FU2 345/1123 (30.7%) • HR [95% CI]: 0.77 [0.65–0.90], p<0.001 • Difference annotated between the curves: 6.6% de Gramont et al., ASCO 2005

Disease-free Survival (ITT): Stage II and Stage III Patients • [Figure: Kaplan-Meier curves by stage and arm; y-axis: DFS probability, x-axis: Months (0 to 66)] • Stage II: FOLFOX4 451 patients, LV5FU2 448 patients • Stage III: FOLFOX4 672 patients, LV5FU2 675 patients • HR [95% CI]: 0.82 [0.60–1.13] Stage II; 0.75 [0.62–0.89] Stage III • Differences annotated between the curves: 3.5% and 8.6% • Data cut-off: January 16, 2005 de Gramont et al., ASCO 2005

DFS in high-risk* stage II patients • [Figure: Kaplan-Meier curves; y-axis: Probability, x-axis: DFS (months), 0 to 48] • High-risk stage II: FOLFOX4 286 patients, LV5FU2 290 patients • HR 0.76 • Difference annotated between the curves: 5.4% • *T4 and/or bowel obstruction and/or tumor perforation and/or poorly differentiated tumor and/or venous invasion and/or <10 examined LNs • Data cut-off: January 16, 2005 de Gramont et al., ASCO 2005

FDA Action • Approval of FOLFOX therapy only in stage III patients, even though trial designed for stage II and III patients • Possible rationale • Standard chemotherapy vs control not shown beneficial in stage II patients • This trial not significant for experimental vs standard chemotherapy

Stage II trial: QUASAR • Eligibility (n = 3239): colon or rectal cancer; stage I-III; complete resection with no evidence of residual disease; no clear indication for chemotherapy • Randomized to observation (n = 1617) or chemotherapy (n = 1622)* Gray et al. ASCO 2004. Abstract 3501. At: http://www.asco.org/ac/1,1003,_12-002511-00_18-0026-00_19-0010698,00.asp. Accessed November 2004.

QUASAR: Overall Survival • [Figure: survival curves; y-axis: % of Patients, x-axis: Years (0 to 10)] • Observation (n=1617) vs Chemotherapy (n=1622) • 5-year OS: Observation = 77.4% vs Chemotherapy = 80.3%; P = .02 • Relative risk = 0.83 (95% CI, 0.71-0.97) Gray et al. ASCO 2004. Abstract 3501. At: http://www.asco.org/ac/1,1003,_12-002511-00_18-0026-00_19-0010698,00.asp. Accessed November 2004.

Implication: Stage II patients • Compared to control, 5-FU provides 2-3% ↑ in OS, statistically significant in a single trial • Debate over clinical relevance • In a large trial, FOLFOX provides 3-4% ↑ in DFS compared to 5-FU, not statistically significant for stage II alone • No hint of interaction between rx and stage, p = 0.77 • On its own, debatable benefit compared to 5-FU • Cross trial comparison: FOLFOX may result in 5-7% improvement vs control, but not approved • No debate about clinical relevance Grothey & Sargent, JCO 2005
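A crude check of the treatment-by-stage interaction can be made from the published subgroup hazard ratios alone. The sketch below backs out standard errors from the 95% confidence intervals reported for stage II and stage III and compares the two log hazard ratios with a z-test, treating the subgroups as independent. This is an approximation from summary statistics, not the model-based test behind the p = 0.77 quoted on the slide, so the two p-values need not agree.

```python
from math import log, sqrt
from statistics import NormalDist


def se_from_ci(lower, upper, level=0.95):
    """Standard error of a log hazard ratio recovered from its confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (log(upper) - log(lower)) / (2 * z)


def interaction_test(hr_a, ci_a, hr_b, ci_b):
    """z-test for the difference between two independent log hazard ratios."""
    diff = log(hr_a) - log(hr_b)
    se = sqrt(se_from_ci(*ci_a) ** 2 + se_from_ci(*ci_b) ** 2)
    z = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Subgroup results reported for FOLFOX4 vs LV5FU2 (de Gramont et al., ASCO 2005).
z, p = interaction_test(0.82, (0.60, 1.13), 0.75, (0.62, 0.89))
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

The point of the exercise is that the stage II estimate is compatible with the stage III estimate; the lack of significance within stage II reflects fewer events, not evidence of a different effect.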

Stage II Colon Cancer: Lessons Learned • Decisions based on subgroups may seem rational at the time, but lead to unintended consequences • Results may make further trials impossible (FOLFOX vs control) • Need better approaches to analyze subgroups using modeling (or meta-analyses), not individual trial results

Potential solution for prospectively defined subgroups: Hierarchical models • Goal: Test a treatment in a number of populations • Hypothesis: Effect may vary between populations • Example: Targeted cancer therapy • Mechanism of action based therapy • Multiple tumor types express ‘target’, to varying degrees

Basic statistical formulation • Suppose $N$ subgroups, with mean response $\mu_i$, $i = 1, \ldots, N$ • Assume $\mu_i \sim N(\mu, \sigma^2)$ • If Bayesian, put a prior on $\sigma^2$ • Depending on estimate of $\sigma^2$, allows heterogeneity between subgroups • Easily extends to non-normal models
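The formulation above can be sketched with a simple empirical-Bayes stand-in for the full Bayesian fit. The sketch assumes each subgroup supplies an estimate with a known sampling variance, estimates the between-subgroup variance by a crude method of moments, and then pulls each subgroup estimate toward the overall mean. The function name and the input values are hypothetical; a real analysis would put a prior on the between-subgroup variance as the slide suggests.

```python
from statistics import mean, variance


def shrink_subgroups(estimates, variances):
    """Normal-normal shrinkage of subgroup estimates toward a common mean.

    estimates: observed subgroup means
    variances: sampling variance of each observed mean (assumed known)
    Returns (overall mean, between-subgroup variance, shrunken estimates).
    """
    # Method-of-moments estimate of the between-subgroup variance, floored at 0.
    tau2 = max(0.0, variance(estimates) - mean(variances))
    # Precision-weighted overall mean.
    weights = [1.0 / (v + tau2) for v in variances]
    mu_hat = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    # Each subgroup keeps a fraction tau2 / (tau2 + v) of its deviation from the mean.
    shrunk = [mu_hat + tau2 / (tau2 + v) * (y - mu_hat)
              for y, v in zip(estimates, variances)]
    return mu_hat, tau2, shrunk


# Hypothetical subgroup estimates with equal sampling variance.
mu_hat, tau2, shrunk = shrink_subgroups([0.10, 0.30, 0.20, 0.40], [0.01] * 4)
print(mu_hat, tau2, [round(s, 3) for s in shrunk])
```

When the estimated between-subgroup variance is zero, every subgroup collapses to the overall mean; when it is large relative to the sampling variance, the subgroup estimates are left nearly untouched.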

Hierarchical Model: Example • Phase II clinical trial of a new agent specifically targeted at patients with a methylated MGMT promoter • Prevalence from 10% to 60% across various cancer types • High prevalence seen in Head and Neck, Esophageal, Colorectal, and Non Small-Cell Lung Cancer • Goal: Determine if overall efficacy > 10%, but efficacy may depend on tumor type

Logistic regression Example • Hierarchical logistic model for tumor response • Stopping rules for each tumor site $i$ • $P(\text{response rate}_i > 10\%) < 10\%$, OR • $P(\text{response rate}_i > 10\%) < 25\%$ and $P(\text{response rate}_{\text{overall}} > 10\%) < 10\%$ • Simulation for operating characteristics • Benefits • Single trial (as opposed to 4) • Use all data formally but flexibly
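The two-part stopping rule can be written down directly once posterior probabilities are available. The sketch below assumes those probabilities come from posterior draws of the site-specific and overall response rates (for example from an MCMC fit of the hierarchical logistic model); the function names and the example inputs are hypothetical, and the model fit itself is not shown.

```python
def prob_exceeds(draws, threshold=0.10):
    """Posterior probability that a response rate exceeds the threshold,
    estimated as the fraction of posterior draws above it."""
    return sum(d > threshold for d in draws) / len(draws)


def stop_site(p_site, p_overall,
              site_cut=0.10, joint_site_cut=0.25, overall_cut=0.10):
    """Apply the slide's stopping rule to one tumor site.

    p_site:    posterior P(response rate at this site > 10%)
    p_overall: posterior P(overall response rate > 10%)
    """
    rule_site_alone = p_site < site_cut
    rule_joint = p_site < joint_site_cut and p_overall < overall_cut
    return rule_site_alone or rule_joint


# Hypothetical posterior draws for one site and for the overall rate.
site_draws = [0.04, 0.06, 0.08, 0.09, 0.12, 0.05, 0.07, 0.11, 0.03, 0.06]
overall_draws = [0.09, 0.12, 0.14, 0.11, 0.08, 0.13, 0.15, 0.10, 0.12, 0.16]
p_site = prob_exceeds(site_draws)
p_overall = prob_exceeds(overall_draws)
print(p_site, p_overall, stop_site(p_site, p_overall))
```

The second clause is where borrowing across sites shows up: a site with a middling posterior probability is stopped only if the overall evidence is also poor.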

Survival Example • Survival following chemotherapy for colon cancer • Pooled analysis of 5 trials, suggestion of a study-specific treatment effect (a different type of subgroup) • Fit a random effect Cox model • $\lambda(t; x) = \lambda_{0i}(t) \exp(x \mu_i)$ • $\mu_i \sim N(\mu, \sigma^2)$ • Can either model $\lambda_0$ parametrically, or use Cox model

Model Results • Prior mean for precision ($1/\sigma^2$) = 50, posterior mean 106 • Little evidence of heterogeneity Sargent et al, 2000
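To get a feel for what these precision values mean, they can be converted to a standard deviation on the log hazard ratio scale. The sketch below assumes the covariate in the random effect Cox model is a treatment indicator, so each study-specific effect is a log hazard ratio, and it plugs the mean precision in as if it were known, which ignores posterior uncertainty. The function name is hypothetical.

```python
from math import exp, sqrt


def precision_to_hr_spread(precision, z=1.96):
    """Convert a precision on the log hazard ratio scale into a standard
    deviation and the multiplicative range covering about 95% of studies."""
    sd = 1.0 / sqrt(precision)
    return sd, exp(-z * sd), exp(z * sd)


for label, precision in (("prior mean", 50), ("posterior mean", 106)):
    sd, low, high = precision_to_hr_spread(precision)
    print(f"{label}: sd = {sd:.3f}, study HR / overall HR in [{low:.2f}, {high:.2f}]")
```

A larger posterior precision than the prior mean corresponds to a narrower spread of study-specific effects around the overall effect, in line with the slide's reading of little heterogeneity.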

Another approach: Modeling Interactions using Shrinkage • Subgroup analyses are fundamentally looking at interactions • In multi-factor experiment, the number of interactions can explode • Well known that shrinkage (or model averaging) provides much better performance than all or nothing approach (stepwise) • Idea: Include interactions in model, but shrink them away if they are not strongly supported by the data

Another approach: Modeling Interactions using shrinkage • Dental Experiment • Dentures are often made with a soft liner between the gums and the hard denture base • Polishing the liner can cause a gap between the liner and the base • Such gaps harbor pathogens like Candida • The experiment • Main interest: new vs. standard soft liner material • Factor M: 2 materials • Factor P: 4 polishing methods • Factor F: 8 finishing methods • Fully crossed design, no replication • Outcome measure: gap btwn liner & base, in log10 mm Pesun, Hodges & Lai (2002) J. Prosthetic Dentistry

Smoothing interactions: Smoothed ANOVA • Fit full ANOVA model (include all interactions) • $y = X\Theta + \epsilon$ • $y$ is 64 x 1, contains $\log_{10}$ gap • $\epsilon$ is 64 x 1, normal with mean 0, precision $\eta_0 I_{64}$ • $X$ is 64 x 64 • $\Theta$ is 64 x 1; we will smooth/shrink its elements • 12 main effects, 52 interactions • Model interactions • $\theta_k \sim N(0, 1/\phi_k)$, $k = 13, \ldots, 64$ • Large $\phi_k$ implies $\theta_k$ shrunk toward 0
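The shrinkage step can be sketched for a 2 x 4 x 8 layout like the dental experiment. The sketch assumes an orthonormal design matrix, a known error precision, and a single known smoothing precision shared by all interactions; under those assumptions the posterior mean of each interaction coefficient is its least-squares estimate multiplied by (error precision) / (error precision + smoothing precision). The data are simulated, the function names are hypothetical, and the full method places priors on the precisions rather than fixing them.

```python
import numpy as np


def orthonormal_basis(levels):
    """Orthonormal basis for one factor: column 0 is constant, the rest are contrasts."""
    raw = np.eye(levels)
    raw[:, 0] = 1.0  # constant column first; QR orthogonalizes the others against it
    q, _ = np.linalg.qr(raw)
    return q


def build_design(levels=(2, 4, 8)):
    """Return the orthonormal design X and a mask marking interaction columns."""
    X = np.ones((1, 1))
    contrast_count = np.zeros(1, dtype=int)
    for n in levels:
        X = np.kron(X, orthonormal_basis(n))
        is_contrast = (np.arange(n) > 0).astype(int)
        # Track how many factors contribute a contrast to each column.
        contrast_count = np.add.outer(contrast_count, is_contrast).ravel()
    return X, contrast_count >= 2  # two or more factors involved: an interaction


def smoothed_anova(y, eta0, phi):
    """Posterior means when interactions have prior N(0, 1/phi) and the
    grand mean and main effects are left unshrunk."""
    X, is_interaction = build_design()
    theta_hat = X.T @ y  # least-squares estimates (X is orthonormal)
    shrink = np.where(is_interaction, eta0 / (eta0 + phi), 1.0)
    theta_post = shrink * theta_hat
    return X @ theta_post, theta_post, theta_hat, is_interaction


rng = np.random.default_rng(0)
y = rng.normal(size=64)  # simulated stand-in for the 64 log10 gap measurements
fitted, theta_post, theta_hat, is_interaction = smoothed_anova(y, eta0=1.0, phi=9.0)
print(int(is_interaction.sum()), "interaction columns shrunk by a factor of 0.1")
```

With the smoothing precision set to zero the fit reproduces the saturated model, and as it grows the fit approaches the additive main-effects model, so the all-or-nothing choice becomes a continuum.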

Smoothed ANOVA: The model/prior for the $\phi_k$ • How to model the interactions • Each interaction smoothed by its own $\phi_k$ • Each effect's $\phi_k$ are all the same, $\phi_{\text{effect}}$ • All two-way interactions are smoothed by a single $\phi$ • Mix the above options • Use priors on $\phi_k$ to specify desired operating characteristics for interactions

Use Degrees of Freedom to set priors for the $\phi_k$ • Hodges & Sargent (2001 Biometrika) extended methods for computing DF in standard ANOVA to linear hierarchical models • Hodges et al (Technometrics, 2006) present methodology to use DF to set priors • Example: I want the 51 2-way interactions to share 5 degrees of freedom • See references for technical details • Ongoing work: extending to non-linear (Cox) models
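The link between degrees of freedom and the smoothing precision can be illustrated in the simplest setting. The sketch assumes an orthonormal design, a known error precision, and one smoothing precision shared by a block of interaction contrasts, so each contrast contributes (error precision) / (error precision + smoothing precision) degrees of freedom. It only shows this deterministic relationship; the cited papers use DF to set priors rather than fixing a single value, and the function names here are hypothetical.

```python
def shared_df(n_contrasts, eta0, phi):
    """Degrees of freedom used by n_contrasts coefficients that share one
    smoothing precision phi, given error precision eta0."""
    return n_contrasts * eta0 / (eta0 + phi)


def phi_for_target_df(n_contrasts, target_df, eta0):
    """Smoothing precision at which the block uses exactly target_df degrees of freedom."""
    if not 0 < target_df <= n_contrasts:
        raise ValueError("target_df must be in (0, n_contrasts]")
    return eta0 * (n_contrasts / target_df - 1.0)


# The slide's example: 51 interaction contrasts sharing 5 degrees of freedom,
# with the error precision set to 1 for illustration.
phi = phi_for_target_df(51, 5, eta0=1.0)
print(phi, shared_df(51, 1.0, phi))
```

Thinking in DF rather than in precisions gives a scale that is easier to reason about: 0 DF removes the interactions entirely, and the full count leaves them unsmoothed.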

Summary: Smoothed ANOVA • Subgroup analyses are fundamentally looking at interactions • A priori have low probability of a significant interaction, but don’t want to exclude the possibility • Idea: Include interactions in model, but shrink them

Summary • Subgroup analysis is essential to clinical research • People usually perform such analyses with best of intentions • Up-front thought can allow us to • Carefully define population under study • Pre-specify sub-populations to be examined • Hierarchical/Shrinkage models offer attractive possibilities for addressing subgroups, if defined prospectively

Thank You • Acknowledgements • Smoothed ANOVA: Jim Hodges • Colon Cancer: Axel Grothey, Aimery deGramont, Sharlene Gill
