Design & Analysis of Phase III Trials for Predictive Oncology

Richard Simon, Chief, Biometric Research Branch, National Cancer Institute, http://brb.nci.nih.gov

How can therapeutics development be successful if tumors contain dozens, hundreds or thousands of mutations, with substantial intra-tumor heterogeneity? • How should we modify our paradigms for clinical development in light of inter- and intra-tumor genomic heterogeneity? • Closing comments on translational research

Long History of Multi-hit Models of Oncogenesis • Armitage & Doll • Knudson • Moolgavkar • Loeb • Tomlinson & Bodmer • Simon & Zheng • Novack & Michor • others

Data • Age-incidence curves of human tumors by primary site • Steps of oncogenesis in model systems • Sequencing of human tumors

My Synthesis of the Models • A small number (e.g. 2-4) of rate-limiting events, occurring approximately at normal mammalian mutation rates (10^-9 per base pair per cell division), establish a tumor of 10^7 or more cells • Models based on 2-4 events occurring at normal mammalian mutation rates account for the observed age-incidence curves of carcinomas at the many primary sites where they have been evaluated • The initiated tumor then accumulates additional mutations, some of which may be important to the tumor phenotype, but which are not rate-limiting to the development of an invasive, metastatic tumor • i.e. the initial mutations put in place a process which inevitably leads, over time, to cancers containing numerous additional mutations

Even at normal mutation rates, by the time that there are 10^9 clonogenic tumor cells, every possible base mutation will occur in some cell with each round of cell division • Mutator phenotypes can accelerate the process and are presumably important in some cases, since genes with key functions for ensuring DNA and chromosome fidelity are commonly mutated
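
As a rough order-of-magnitude check of this claim (a sketch, assuming a uniform per-site rate and that all clonogenic cells divide once per round; the symbols N for the number of clonogenic cells and \mu for the mutation rate are introduced here for illustration):

```latex
\[
\mathbb{E}\left[\text{cells newly mutated at a given base pair per round}\right]
  = N \mu = 10^{9} \times 10^{-9} = 1 .
\]
```

Under these assumptions, on the order of one cell per round of division is expected to acquire a change at any given position.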

Mutational complexity of the tumor at diagnosis is influenced by tumor age (number of generations of replication) • “Old tumors” are more mutationally complex • Treatment effectiveness depends on mutational age at time of treatment • “High growth fraction” tumors such as pediatric ALL, DLBCL, Burkitt’s lymphoma and germ cell tumors are relatively young tumors

Success, Where Possible, Likely Requires • Inhibiting pathways deregulated by early oncogenic mutations • Using combinations of molecularly targeted drugs • Treating early • Before mutational meltdown • Treating the right tumors with the right drugs

House of Cards Model (P. Workman) • “The tumor requires each of the initial oncogenic mutations to power up malignancy; remove any one of the molecular batteries and the cancer cell collapses like a house of cards.”

Oncogene Addiction Model (B. Weinstein) • “Subsequent mutations are viable only in the context of the initial oncogenic mutations. The initial mutations lead to the ‘hard-wiring’ of mission critical oncogenic pathways and the loss of alternative or redundant signal transduction pathways.”

Barn Door Model • The initial oncogenic mutations facilitate the acquisition of numerous additional mutations. Once the additional mutations occur, the protein products of the initial oncogenic mutations are no longer key molecular targets because alternative pathways to expansion and invasion have been activated.

Phase II Trials • Find or evaluate predictive biomarkers for identifying patients whose tumors are sensitive to the regimen • Patients’ tumors should be molecularly characterized • If it works at all, it’s likely to work only in some patients but may work very well for them • Combinations of molecularly targeted agents • Screening multiple combinations of targeted agents in molecularly defined subsets of patients

Phase III Trials • Transition from a culture of broad eligibility phase III trials followed by exploratory subset analysis to targeted phase III trials or trials incorporating focused prospectively defined subset analysis in the primary analysis plan

Roadmap for Co-Development of New Drugs with Companion Diagnostics • Develop during phase II a completely specified genomic classifier of the patients likely to benefit from a new drug • Single gene/protein or composite gene expression classifier • Develop an analytically validated assay (reproducible and robust) for the classifier • Use the completely specified classifier to design and analyze a phase III clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan

Targeted (Enrichment) Design • Restrict entry to the phase III trial based on the binary predictive classifier

[Diagram: Targeted (Enrichment) Design. Using phase II data, develop predictor of response to new drug. Patient predicted responsive: randomize to new drug or control. Patient predicted non-responsive: off study.]

Applicability of Targeted Design • Primarily for settings where the drug effect is specific, the biology of the target is well understood, and an accurate assay is available • Advantage of design is that the target population is clear and trial clearly must be sized for the test+ patients • With a strong biological basis for the test and a drug with potentially serious toxicity, it may be unacceptable to expose test negative patients to the drug • Analytical validation, biological rationale and phase II data provide basis for regulatory approval of the test, if needed

Relative efficiency of targeted design depends on • proportion of patients test positive • effectiveness of new drug (compared to control) for test negative patients • When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients
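
A rough way to quantify this comparison is sketched below. It assumes that the required number of randomized patients scales as the inverse square of the treatment effect and that outcome variance is the same in test positive and test negative patients; these approximations, and the names `gamma`, `delta_pos` and `delta_neg`, are introduced here for illustration and are not taken from the slides.

```python
def randomized_ratio(gamma: float, delta_pos: float, delta_neg: float) -> float:
    """Approximate (untargeted / targeted) ratio of randomized patients.

    gamma:     proportion of patients who are test positive (hypothetical name)
    delta_pos: treatment effect in test positive patients
    delta_neg: treatment effect in test negative patients
    Assumes sample size is proportional to 1 / (treatment effect)^2.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    # In an untargeted trial the overall effect is diluted by test negative patients.
    diluted = gamma * delta_pos + (1.0 - gamma) * delta_neg
    if delta_pos <= 0 or diluted <= 0:
        raise ValueError("effects must be positive for the ratio to be meaningful")
    return (delta_pos / diluted) ** 2


def screened_ratio(gamma: float, delta_pos: float, delta_neg: float) -> float:
    """Same comparison, but counting patients screened for the targeted trial.

    A targeted trial that randomizes n patients must screen about n / gamma.
    """
    return gamma * randomized_ratio(gamma, delta_pos, delta_neg)


if __name__ == "__main__":
    for g in (0.25, 0.5, 0.75):
        print(g, randomized_ratio(g, 1.0, 0.0), screened_ratio(g, 1.0, 0.0))
```

When the drug has no benefit for test negative patients, the ratio of randomized patients reduces to 1/gamma^2 under these assumptions, which is why a low proportion of test positive patients favors the targeted design so strongly.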

[Diagram: Biomarker Stratified Design. Develop predictor of response to new Rx. Patients predicted responsive to new Rx: randomize to new Rx or control. Patients predicted non-responsive to new Rx: randomize to new Rx or control.]

Biomarker Stratified Design • Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan • Having a prospective analysis plan for how the test will be used in the analysis and having the trial appropriately sized are essential • “Stratifying” (balancing) the randomization ensures that all randomized patients have tissue available but is not a substitute for a prospective analysis plan • Delaying assay performance provides additional time for assay development but inhibits early termination of accrual of assay negative patients • The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets

R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008

Fallback Analysis Plan (Limited confidence in test) • Compare the new drug to the control overall for all patients, ignoring the classifier • If p_overall ≤ 0.03, claim effectiveness for the eligible population as a whole • Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients • If p_subset ≤ 0.02, claim effectiveness for the classifier + patients
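
The decision rule of the fallback plan can be sketched as a small function. The function name, argument names and return strings are hypothetical; only the 0.03 and 0.02 thresholds come from the slide.

```python
def fallback_analysis(p_overall: float, p_subset: float,
                      alpha_overall: float = 0.03,
                      alpha_subset: float = 0.02) -> str:
    """Apply the two-step fallback rule and return the resulting claim."""
    # Step 1: overall comparison in all patients, ignoring the classifier.
    if p_overall <= alpha_overall:
        return "effective in eligible population as a whole"
    # Step 2: a single pre-specified subset analysis in classifier + patients.
    if p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"


if __name__ == "__main__":
    print(fallback_analysis(0.10, 0.01))
```

Splitting the significance level as 0.03 + 0.02 keeps the chance of any false positive claim at or below 0.05, by the union bound.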

Analysis Plan with K Binary Classifiers • Test T vs C restricted to patients positive for B_k, for k = 1, …, K • Let p_k be the p value for the treatment effect in patients positive for B_k (k = 1, …, K) • Let p* = min {p_1, …, p_K} • Compute the null distribution of p* by permuting treatment labels • If the observed value of p* is significant at the 0.02 level, then claim effectiveness of T for patients positive for B_k*, the classifier attaining the minimum
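
The minimum p value procedure can be sketched as follows for a binary response endpoint. This is an illustrative sketch only: the pooled two-proportion z-test is a stand-in for whatever test the trial would pre-specify, and all function and variable names are assumptions.

```python
import math
import random


def two_proportion_p(succ_t, n_t, succ_c, n_c):
    """Two-sided p value from a pooled two-proportion z-test (normal approximation)."""
    if n_t == 0 or n_c == 0:
        return 1.0
    pooled = (succ_t + succ_c) / (n_t + n_c)
    var = pooled * (1 - pooled) * (1 / n_t + 1 / n_c)
    if var == 0:
        return 1.0
    z = (succ_t / n_t - succ_c / n_c) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def min_p(treat, outcome, markers):
    """Return (p*, k*): the smallest subset p value and the index attaining it.

    treat[i] is 1 for T and 0 for C, outcome[i] is 1 for response,
    markers[i][k] is 1 if patient i is positive for classifier B_k.
    """
    best_p, best_k = float("inf"), -1
    for k in range(len(markers[0])):
        idx = [i for i in range(len(treat)) if markers[i][k] == 1]
        n_t = sum(treat[i] for i in idx)
        n_c = len(idx) - n_t
        s_t = sum(outcome[i] for i in idx if treat[i] == 1)
        s_c = sum(outcome[i] for i in idx if treat[i] == 0)
        p = two_proportion_p(s_t, n_t, s_c, n_c)
        if p < best_p:
            best_p, best_k = p, k
    return best_p, best_k


def min_p_permutation_test(treat, outcome, markers, n_perm=2000, seed=0):
    """Permutation p value for p*, obtained by shuffling treatment labels."""
    rng = random.Random(seed)
    observed, k_star = min_p(treat, outcome, markers)
    shuffled = list(treat)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if min_p(shuffled, outcome, markers)[0] <= observed:
            hits += 1
    # Add-one correction so the estimated p value is never exactly zero.
    return (hits + 1) / (n_perm + 1), k_star
```

Because the minimum is recomputed over all K classifiers in every permutation, the resulting p value accounts for having searched over the classifiers.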

[Diagram: Marker Strategy Design. Randomize between (a) perform test and employ test-determined treatment and (b) standard of care treatment.]

[Diagram: Marker Strategy Design, second version. Randomize between (a) perform test and employ test-determined rx and (b) a further randomization between T and C.]

Phase III RCT of new regimen T vs control C • Multiple candidate predictive biomarkers or whole genome expression profiling • Prospectively specified classifier development algorithm

Partition the patients into K (e.g. 5 or 10) groups V_1, V_2, …, V_K • Form a training set by omitting one of the K parts: T_1 = {1, 2, …, N} − V_1. The omitted part V_1 is the validation set • Using the training set, apply the prospectively defined classifier development algorithm to develop a model that classifies patients (based on their measured covariates and biomarkers) as either • Sensitive: likely to benefit from T more than control C • Not sensitive: not likely to benefit from T more than C • Using this model, classify the patients in the validation set

Repeat this procedure K times, leaving out a different part each time • After this is completed, all patients in the full dataset are classified as sensitive or insensitive • All patients have been classified using a classifier developed on a training set that did not include them
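
The leave-one-part-out loop above can be sketched as below. The helper `develop_classifier` stands for the prospectively defined classifier development algorithm; its interface (training records in, a per-patient prediction function out) is an assumption made for this sketch.

```python
import random


def cross_validated_labels(patients, develop_classifier, k=10, seed=0):
    """Classify every patient with a model developed without that patient.

    patients:           list of per-patient records (any structure)
    develop_classifier: callable(training_records) returning a
                        callable(record) -> bool (True means "sensitive")
    """
    n = len(patients)
    order = list(range(n))
    random.Random(seed).shuffle(order)
    folds = [order[j::k] for j in range(k)]  # V_1, ..., V_K
    sensitive = [None] * n
    for held_out in folds:
        held = set(held_out)
        # The training set is everyone except the held-out part.
        training = [patients[i] for i in range(n) if i not in held]
        model = develop_classifier(training)
        for i in held_out:
            sensitive[i] = bool(model(patients[i]))
    return sensitive
```

Each patient appears in exactly one held-out part, so the returned labels cover the full dataset while no label depends on a model that saw that patient.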

Identify the “sensitive” subset, i.e. those predicted as likely to benefit more from T than from C. Also identify the remaining “insensitive” subset. • Sensitive subset analysis • Compare outcomes of patients who received T to outcomes of patients who received C • Compute Kaplan-Meier curves of T vs C and log-rank test statistic L_S • Insensitive subset analysis • Compare outcomes of patients who received T to outcomes of patients who received C • Compute Kaplan-Meier curves of T vs C and log-rank test statistic L_IS
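
For reference, a two-group log-rank statistic of the kind used within each subset can be computed as in the sketch below. It uses the standard hypergeometric variance and returns the chi-square form of the statistic; the function and argument names are assumptions, and a validated statistics package would normally be used in practice.

```python
def logrank_statistic(time, event, group):
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    time:  follow-up time for each patient
    event: 1 if the event was observed, 0 if censored
    group: 1 for treatment T, 0 for control C
    """
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(time, event) if e == 1})
    for t in event_times:
        at_risk = [i for i in range(len(time)) if time[i] >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        dying = [i for i in at_risk if time[i] == t and event[i] == 1]
        d = len(dying)
        d1 = sum(group[i] for i in dying)
        # Observed minus expected events in group T at this event time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance
```

Applied separately to the patients labeled sensitive and to those labeled insensitive, this gives the two statistics referred to as L_S and L_IS.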

Generate the null distributions of LS and LIS by permuting the treatment labels and repeating the entire K-fold cross-validation • If significant, claim effectiveness of T for subset defined by classifier

Two-Treatment Classifier Development Algorithm for Binary Endpoint • Develop models in the training set of the probability of “success” for a patient based on the covariate vector X • Separate models for treatment group T and control group C • P(X | T) and P(X | C) • Many kinds of model development algorithms can be used • If P(X | T) – P(X | C) > delta • Classify a patient in the validation set with covariate vector X as likely to benefit more from T than C • Otherwise, classify the patient as not likely to benefit more from T
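
A deliberately minimal instance of this algorithm is sketched below, assuming a single binary covariate and using smoothed response rates per arm as the two "models". Real applications would use richer models; the function name, record layout and default `delta` are hypothetical.

```python
def develop_two_arm_classifier(training, delta=0.2):
    """Build a classifier from (x, arm, success) training records.

    x is 0 or 1, arm is "T" or "C", success is 0 or 1.
    Returns a function mapping x to True when the estimated success
    probability under T exceeds that under C by more than delta.
    """
    def rate(arm, x):
        outcomes = [s for (xi, a, s) in training if a == arm and xi == x]
        # Add-one smoothing keeps the estimate defined for empty or tiny cells.
        return (sum(outcomes) + 1) / (len(outcomes) + 2)

    # Separate estimates for the T arm and the C arm at each covariate value.
    benefit = {x: rate("T", x) - rate("C", x) for x in (0, 1)}
    return lambda x: benefit[x] > delta
```

The threshold `delta` controls how large the predicted advantage of T must be before a patient is labeled sensitive, so it would need to be fixed in advance as part of the prospectively defined algorithm.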

70% response to T in sensitive patients • 25% response to T otherwise • 25% response to C • 20% of patients sensitive
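
To see why an unselected comparison dilutes the effect in this scenario (20% of patients sensitive, 70% response to T in sensitive patients, 25% response otherwise and on C), the overall response rates work out as follows, assuming the control response rate is 25% for all patients:

```latex
\[
P(\text{response} \mid T) = 0.20 \times 0.70 + 0.80 \times 0.25 = 0.34,
\qquad
P(\text{response} \mid C) = 0.25 .
\]
```

The overall difference is therefore 9 percentage points, compared with 45 percentage points within the sensitive subset.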

Classifier for future use is determined by applying the classifier development algorithm to the full dataset

Prediction Based Clinical Trials • Using cross-validation we can perform prospective “subset analysis” as part of the primary analysis • Using a prospectively defined model building algorithm we can internally validate the treatment comparison predictions of the model • Using cross-validation we can evaluate new predictive tools • Based on predictive accuracy • With regard to their intended use which is informing therapeutic decision making

Final Comments on Translational Research

“Translational research” is in many cases a misnomer • Many basic research findings do not go far enough to be “translated” • They do not provide key druggable molecular targets, e.g. p53, Rb, APC

When the gap is relatively narrow, effective translation takes place, often by industry (large or small) • Broad gaps are in many cases too difficult and high risk to bridge by industry or by investigator initiated research

Breakthrough Around the Corner

Bridging broad gaps may in some cases be accomplished by prioritization and resource mobilization • Penicillin development languished for over a decade until it was stimulated by targeted funding from Rockefeller Foundation and a major project commitment by US govt with over 1000 chemists involved • The atomic bomb would not have been developed without a Manhattan project • Major focused initiatives involving academic investigators, industry, and government may be needed for bridging key roadblocks to progress.

Acknowledgements • Boris Freidlin • Wenyu Jian • Xinan Zhang
