Personalized Predictive Medicine and Genomic Clinical Trials

Richard Simon, D.Sc., Chief, Biometric Research Branch, National Cancer Institute, http://brb.nci.nih.gov

Biometric Research Branch Website: brb.nci.nih.gov • PowerPoint presentations • Reprints • BRB-ArrayTools software • Web-based Sample Size Planning

Personalized Oncology is Here Today and Rapidly Advancing • Key information is in tumor genome, not in inherited genetics • Personalization is based on limited stratification of traditional diagnostic categories, not on individual genomes

Personalized Oncology is Here Today • Estrogen receptor over-expression in breast cancer • Anti-estrogens, aromatase inhibitors • HER2 amplification in breast cancer • Trastuzumab, Lapatinib • OncotypeDx in breast cancer • Low score for ER+, node-negative = hormonal rx • KRAS in colorectal cancer • WT KRAS = cetuximab or panitumumab • EGFR mutation or amplification in NSCLC • EGFR inhibitor

These Diagnostics Have Medical Utility • They inform therapeutic decision-making leading to improved patient outcome • Tests with medical utility help patients and may reduce medical costs • Tests correlated with outcome that are not actionable may increase medical costs without helping patients

Developing a test and demonstrating medical utility for it is a complex multi-step process that generally requires prospective randomized clinical trials

Although the randomized clinical trial (RCT) remains of fundamental importance for predictive genomic medicine, some of the conventional wisdom about how to design and analyze RCTs requires re-examination • E.g. the concept of doing an RCT of thousands of patients to answer a single question about average treatment effect for a heterogeneous target population no longer has an adequate scientific basis in oncology

Standard Approach is Based on Assumptions • Qualitative treatment by subset interactions are unlikely • i.e. if new treatment T is better than control C on average, it is better for all subsets of patients • “Costs” of over-treatment are less than “costs” of under-treatment

Cancers of a primary site often represent a heterogeneous group of diverse molecular diseases which vary fundamentally with regard to • the oncogenic mutations that cause them, • their responsiveness to specific drugs

How Can We Develop New Drugs in a Manner More Consistent With Modern Tumor Biology and Obtain Reliable Information About What Regimens Work for What Kinds of Patients?

Predictive biomarkers • Measured before treatment to identify who will benefit from a particular treatment • Prognostic biomarkers • Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment

Prognostic and Predictive Biomarkers in Oncology • Single gene or protein measurement • ER protein expression • HER2 amplification • KRAS mutation • Scalar index or classifier that summarizes expression levels of multiple genes

Prospective Co-Development of Drugs and Companion Diagnostics • Develop a completely specified genomic classifier of the patients likely to benefit from a new drug • Establish analytical validity of the classifier • Use the completely specified classifier to design and analyze a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the candidate biomarker

Targeted (Enrichment) Design • Restrict entry to the phase III trial based on the binary predictive classifier

Targeted (Enrichment) Design [flow diagram] • Using phase II data, develop predictor of response to new drug • Patient predicted responsive: randomize to New Drug or Control • Patient predicted non-responsive: off study

Applicability of Targeted Design • Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug • e.g. trastuzumab

Evaluating the Efficiency of Targeted Design • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006 • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005 • Reprints and interactive sample size calculations at http://linus.nci.nih.gov

Relative efficiency of targeted design depends on • proportion of patients who are test positive • effectiveness of new drug (compared to control) for test negative patients • Specificity of treatment • Sensitivity of test • When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
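The last bullet can be illustrated with a rough calculation. The sketch below is not the full method of the cited papers. It assumes a perfectly accurate test, equal outcome variance in all groups, a treatment effect in unselected patients equal to the prevalence-weighted average of the effects in test positive and test negative patients, and a required number of randomized patients proportional to 1/effect². The function names are hypothetical.

```python
def randomized_ratio(prevalence, effect_pos, effect_neg):
    """Approximate (untargeted randomized patients) / (targeted randomized patients).

    Assumption: the untargeted trial sees a diluted effect equal to the
    prevalence-weighted average of the two subgroup effects, and the sample
    size scales with 1 / effect**2.
    """
    diluted = prevalence * effect_pos + (1 - prevalence) * effect_neg
    return (effect_pos / diluted) ** 2


def screened_ratio(prevalence, effect_pos, effect_neg):
    """Approximate (untargeted randomized patients) / (targeted screened patients).

    The targeted design must screen about 1 / prevalence patients for each
    patient it randomizes.
    """
    return prevalence * randomized_ratio(prevalence, effect_pos, effect_neg)


if __name__ == "__main__":
    # 25% test positive, no benefit in test negative patients
    print(randomized_ratio(0.25, 1.0, 0.0))
    print(screened_ratio(0.25, 1.0, 0.0))
    # 50% test positive, half the benefit in test negative patients
    print(randomized_ratio(0.50, 1.0, 0.5))
```

Under these assumptions, with 25% of patients test positive and no benefit for test negative patients, the untargeted design needs about 16 times as many randomized patients, and about 4 times as many as the targeted design needs to screen. The advantage shrinks quickly as the benefit in test negative patients grows.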

Stratification Design [flow diagram] • Develop predictor of response to new Rx • Predicted responsive to new Rx: randomize to New Rx or Control • Predicted non-responsive to new Rx: randomize to New Rx or Control

Do not use the test to restrict eligibility, but to structure a prospective analysis plan • Having a prospective analysis plan is essential • “Stratifying” (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan • Size the study for adequate evaluation of T vs C separately by marker status • The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier • The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier

R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008 • R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion in Medical Diagnostics 2:721-29, 2008

Analysis Plan B (Limited confidence in test) • Compare the new drug to the control overall for all patients, ignoring the classifier • If p_overall ≤ 0.03, claim effectiveness for the eligible population as a whole • Otherwise perform a single subset analysis evaluating the new drug in the classifier-positive patients • If p_subset ≤ 0.02, claim effectiveness for the classifier-positive patients
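The decision rule of Analysis Plan B can be written out as a short sketch. The function name and return labels are hypothetical. The thresholds 0.03 and 0.02 come from the slide, and they sum to the conventional 0.05 overall type I error.

```python
def analysis_plan_b(p_overall, p_subset, alpha_overall=0.03, alpha_subset=0.02):
    """Return which claim, if any, Analysis Plan B supports.

    p_overall: two-sided p-value for new drug vs control in all patients.
    p_subset:  two-sided p-value in classifier-positive patients only.
    The subset test is looked at only when the overall test fails.
    """
    if p_overall <= alpha_overall:
        return "effective in eligible population as a whole"
    if p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"


if __name__ == "__main__":
    print(analysis_plan_b(p_overall=0.02, p_subset=0.50))
    print(analysis_plan_b(p_overall=0.04, p_subset=0.01))
    print(analysis_plan_b(p_overall=0.04, p_subset=0.03))
```

Because the subset test is run only after the overall test fails, and the two levels add to 0.05, the chance of any false claim stays at or below 5%.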

Sample size for Analysis Plan B • To have 90% power for detecting uniform 33% reduction in overall hazard at 3% two-sided level requires 297 events (instead of 263 for similar power at 5% level) • If 25% of patients are positive, then when there are 297 total events there will be approximately 75 events in positive patients • 75 events provides 75% power for detecting 50% reduction in hazard at 2% two-sided significance level • By delaying evaluation in test positive patients, 80% power is achieved with 84 events and 90% power with 109 events
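The event counts above can be approximately reproduced with the standard Schoenfeld approximation for the log-rank test under 1:1 randomization. The slide does not state which formula was used, so that choice is an assumption here, as is reading "33% reduction" as a hazard ratio of 0.67 and "50% reduction" as a hazard ratio of 0.5. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def events_required(hazard_ratio, alpha_two_sided, power):
    """Approximate total events needed (Schoenfeld formula, 1:1 randomization)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_two_sided / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    return 4 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2


def power_for_events(events, hazard_ratio, alpha_two_sided):
    """Approximate power given a fixed number of events (same approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha_two_sided / 2)
    return _STD_NORMAL.cdf(sqrt(events) * abs(log(hazard_ratio)) / 2 - z_alpha)


if __name__ == "__main__":
    print(events_required(0.67, 0.03, 0.90))   # overall test at the 3% level
    print(events_required(0.67, 0.05, 0.90))   # overall test at the 5% level
    print(power_for_events(75, 0.50, 0.02))    # subset test with 75 events
    print(events_required(0.50, 0.02, 0.80))   # subset test, 80% power
    print(events_required(0.50, 0.02, 0.90))   # subset test, 90% power
```

Under these assumptions the results agree with the slide's figures (297, 263, 75%, 84 and 109) to within about one event, the small differences being a matter of rounding.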

Analysis Plan C • Test for difference (interaction) between treatment effect in test positive patients and treatment effect in test negative patients at an elevated level (e.g. .10) • If interaction is significant at that level then compare treatments separately for test positive patients and test negative patients • Otherwise, compare treatments overall
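One way to carry out the interaction step is a Wald-type test on the difference between the two subgroup log hazard ratios. The slide does not say which interaction test is intended, so this is an illustrative assumption, as is the approximation that the standard error of a log hazard ratio is about 2 / sqrt(events) under 1:1 randomization. The function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist


def interaction_p_value(log_hr_pos, se_pos, log_hr_neg, se_neg):
    """Two-sided p-value for equal treatment effect in both marker subsets."""
    z = (log_hr_pos - log_hr_neg) / sqrt(se_pos ** 2 + se_neg ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def analysis_plan_c(log_hr_pos, se_pos, log_hr_neg, se_neg, interaction_alpha=0.10):
    """Choose the analysis: separate subset comparisons or one overall comparison."""
    p = interaction_p_value(log_hr_pos, se_pos, log_hr_neg, se_neg)
    if p <= interaction_alpha:
        return "compare treatments separately by test status"
    return "compare treatments overall"


if __name__ == "__main__":
    # Hypothetical estimates: 88 events in test positive, 264 in test negative
    se_pos, se_neg = 2 / sqrt(88), 2 / sqrt(264)
    print(analysis_plan_c(log(0.5), se_pos, log(1.0), se_neg))
    print(analysis_plan_c(log(0.67), se_pos, log(0.67), se_neg))
```

The elevated level (0.10 rather than 0.05) reflects that interaction tests have low power, so a stricter threshold would rarely trigger the separate analyses.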

Sample Size Planning for Analysis Plan C • 88 events in test + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power • If 25% of patients are positive, when there are 88 events in positive patients there will be about 264 events in negative patients • 264 events provides 90% power for detecting 33% reduction in hazard at 5% two-sided significance level

Does the RCT Need to Be Significant Overall for the T vs C Treatment Comparison? • No • That requirement has been traditionally used to protect against data dredging. It is inappropriate for focused trials of a treatment with a companion test.

Web Based Software for Planning Clinical Trials of Treatments with a Candidate Predictive Biomarker • http://brb.nci.nih.gov

It is difficult to have the right single completely defined predictive biomarker identified and analytically validated by the time the pivotal trial of a new drug is ready to start accrual • Changes in the way we do phase II trials • Adaptive methods for the refinement and evaluation of predictive biomarkers in the pivotal trials in a non-exploratory manner • Use of archived tissues in focused “prospective-retrospective” designs based on randomized pivotal trials

Multiple Biomarker Design • Have identified K candidate binary classifiers B1 , …, BK thought to be predictive of patients likely to benefit from T relative to C • Eligibility not restricted by candidate classifiers • For notation let B0 denote the classifier with all patients positive

Test T vs C restricted to patients positive for Bk for k=0,1,…,K • Let S(Bk) be the log partial likelihood ratio statistic for treatment effect in patients positive for Bk (k=0,1,…,K) • Let S* = max{S(Bk)} , k* = argmax{S(Bk)} • For a global test of significance • Compute null distribution of S* by permuting treatment labels • If the data value of S* is significant at 0.05 level, then claim effectiveness of T for patients positive for Bk*
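The global test can be sketched as follows. To keep the example self-contained, the statistic here is a squared standardized difference in mean outcome between arms, which stands in for the log partial likelihood ratio statistic named on the slide. That substitution, the data layout and all function names are assumptions for illustration.

```python
import random
from statistics import mean, variance


def arm_statistic(outcomes, treated):
    """Squared standardized T-minus-C mean difference (stand-in for S(Bk))."""
    t = [y for y, a in zip(outcomes, treated) if a]
    c = [y for y, a in zip(outcomes, treated) if not a]
    if len(t) < 2 or len(c) < 2:
        return 0.0
    se2 = variance(t) / len(t) + variance(c) / len(c)
    return (mean(t) - mean(c)) ** 2 / se2 if se2 > 0 else 0.0


def max_statistic(outcomes, treated, classifiers):
    """Return (S*, k*). classifiers[k][i] is True if patient i is positive for Bk;
    classifiers[0] should be all True (B0, every patient positive)."""
    stats = []
    for positive in classifiers:
        ys = [y for y, pos in zip(outcomes, positive) if pos]
        arms = [a for a, pos in zip(treated, positive) if pos]
        stats.append(arm_statistic(ys, arms))
    k_star = max(range(len(stats)), key=stats.__getitem__)
    return stats[k_star], k_star


def permutation_global_test(outcomes, treated, classifiers, n_perm=1000, seed=0):
    """Permutation p-value for S*, plus the selected k* and observed S*."""
    rng = random.Random(seed)
    s_obs, k_star = max_statistic(outcomes, treated, classifiers)
    labels = list(treated)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between treatment and outcome
        s_perm, _k = max_statistic(outcomes, labels, classifiers)
        if s_perm >= s_obs:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive
    return (exceed + 1) / (n_perm + 1), k_star, s_obs
```

Taking the maximum inside every permutation is what accounts for having searched over K+1 candidate subsets: the null distribution is that of the best subset, not of a pre-specified one.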

Let S* = max{S(Bk)} , k* = argmax{S(Bk)} in actual data • The new treatment is superior to control for the population defined by k* • Repeating the analysis for bootstrap samples of cases provides • an estimate of the stability of k* (the indication) • an interval estimate of S* (the size of treatment effect in the target population)

Adaptive Signature Design Boris Freidlin and Richard Simon Clinical Cancer Research 11:7872-8, 2005

Adaptive Signature Design: End of Trial Analysis • Compare the new treatment T to C for all patients at significance level α0 (e.g. 0.04) • If overall H0 is rejected, then claim effectiveness of T for eligible patients • Otherwise

Otherwise: • Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment T compared to control C • Compare T to C for patients accrued in the second half who are predicted responsive to T based on the classifier • Perform test at significance level 0.05 - α0 (e.g. 0.01) • If H0 is rejected, claim effectiveness of T for subset defined by classifier

Simulation setting: treatment effect restricted to subset • 10% of patients sensitive • 10 sensitivity genes • 10,000 genes • 400 patients

Cross-Validated Adaptive Signature Design Freidlin B, Jiang W, Simon R Clinical Cancer Research 16(2) 2010

Prediction Based Analysis of Clinical Trials • Using cross-validation we can evaluate our methods for analysis of clinical trials, including complex subset analysis algorithms, in terms of their effect on improving patient outcome via informing therapeutic decision making • This approach can be used with any set of candidate predictor variables

Define an algorithm A for developing a classifier of whether patients benefit preferentially from a new treatment T relative to C • For patients with covariate vector x, the algorithm predicts preferred treatment • Applying A to a training dataset D provides a classifier model M(A, D) • The model gives a recommendation R(x | M(A, D)) = T or R(x | M(A, D)) = C

At the conclusion of the trial randomly partition the patients into K approximately equally sized sets P1, …, PK • Let D-i denote the full dataset minus data for patients in Pi • Using K-fold complete cross-validation, omit patients in Pi • Apply the defined algorithm to analyze the data in D-i to obtain a classifier M-i • For each patient j in Pi record the treatment recommendation, i.e. Rj=T or Rj=C

Repeat the above for all K loops of the cross-validation • All patients have been classified as what their optimal treatment is predicted to be

Let ST denote the set of patients for whom treatment T is predicted optimal, i.e. ST = {j : Rj=T} • Compare outcomes for patients in ST who actually received T to those in ST who actually received C • Let zT = standardized log-rank statistic • Let SC denote the set of patients for whom treatment C is predicted optimal, i.e. SC = {j : Rj=C} • Compare outcomes for patients in SC who actually received T to those in SC who actually received C • Let zC = standardized log-rank statistic

Test of Significance for Effectiveness of T vs C • Compute statistical significance of zT and zC by randomly permuting treatment labels and repeating the entire procedure • Do this 1000 or more times to generate the permutation null distribution of treatment effect for the patients in each subset
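The cross-validated procedure and its permutation test can be sketched end to end. This is a toy version under stated assumptions: one numeric covariate, a continuous outcome where larger is better, a standardized mean difference standing in for the log-rank statistic, and a deliberately simple classifier-building algorithm. None of these choices comes from the slides, and all names are hypothetical.

```python
import random
from statistics import mean, median, variance


def fit_classifier(train):
    """Hypothetical algorithm A: split at the training median of x and recommend T
    on each side where the observed T-minus-C mean difference is positive."""
    cut = median(p["x"] for p in train)

    def effect(group):
        t = [p["y"] for p in group if p["trt"] == "T"]
        c = [p["y"] for p in group if p["trt"] == "C"]
        return mean(t) - mean(c) if t and c else 0.0

    high_benefits = effect([p for p in train if p["x"] > cut]) > 0
    low_benefits = effect([p for p in train if p["x"] <= cut]) > 0
    return lambda x: "T" if (high_benefits if x > cut else low_benefits) else "C"


def cross_validated_recommendations(patients, k=10, seed=0):
    """Rj for every patient j, each from a classifier that never saw patient j."""
    order = list(range(len(patients)))
    random.Random(seed).shuffle(order)
    recommendations = [None] * len(patients)
    for i in range(k):
        fold = order[i::k]                     # the set Pi
        held_out = set(fold)
        train = [p for j, p in enumerate(patients) if j not in held_out]  # D-i
        model = fit_classifier(train)          # M-i
        for j in fold:
            recommendations[j] = model(patients[j]["x"])
    return recommendations


def z_statistic(group):
    """Standardized T-minus-C mean difference (stand-in for the log-rank z)."""
    t = [p["y"] for p in group if p["trt"] == "T"]
    c = [p["y"] for p in group if p["trt"] == "C"]
    if len(t) < 2 or len(c) < 2:
        return 0.0
    se = (variance(t) / len(t) + variance(c) / len(c)) ** 0.5
    return (mean(t) - mean(c)) / se if se > 0 else 0.0


def subset_statistics(patients, k=10, seed=0):
    """Return (zT, zC) computed within ST and SC."""
    rec = cross_validated_recommendations(patients, k, seed)
    s_t = [p for p, r in zip(patients, rec) if r == "T"]
    s_c = [p for p, r in zip(patients, rec) if r == "C"]
    return z_statistic(s_t), z_statistic(s_c)


def permutation_p_value(patients, n_perm=1000, k=10, seed=0):
    """One-sided permutation p-value for zT, repeating the entire procedure."""
    rng = random.Random(seed)
    z_obs = subset_statistics(patients, k, seed)[0]
    labels = [p["trt"] for p in patients]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted = [dict(p, trt=t) for p, t in zip(patients, labels)]
        if subset_statistics(permuted, k, seed)[0] >= z_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

The classifier is rebuilt inside every fold and inside every permutation. That is what "repeating the entire procedure" requires, and it is what keeps the subset test valid even though the subset was defined from the same trial.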

The significance test based on comparing T vs C for the adaptively defined subset is the basis for demonstrating that T is more effective than C for some patients • Although there is less certainty about which patients actually benefit, classification accuracy may be substantially greater than for the standard clinical trial, in which all patients are classified based on the result of testing the single overall null hypothesis

Simulation setting: 70% response to T in sensitive patients • 25% response to T otherwise • 25% response to C • 20% of patients sensitive
