Predictive Analysis of Clinical Trials
Richard Simon, D.Sc.
Chief, Biometric Research Branch, National Cancer Institute
http://linus.nci.nih.gov/brb
Although the randomized clinical trial remains of fundamental importance for 21st-century genomics-based medicine, some of the conventional wisdom about how to design and analyze RCTs requires re-examination
In most positive phase III clinical trials comparing a new treatment to a control, most of the patients treated with the new treatment do not benefit. • Adjuvant breast cancer example: • 70% long-term disease-free survival on control • 80% disease-free survival on the new treatment • 70% of patients don't need the new treatment; of the remaining 30%, only one third benefit • Treat 10 to benefit 1
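The arithmetic behind "treat 10 to benefit 1" can be checked directly. A minimal sketch using the illustrative numbers from the slide (not data from any specific trial):

```python
# Illustrative number-needed-to-treat arithmetic for the adjuvant
# breast cancer example (slide numbers, not trial data).
control_dfs = 0.70    # long-term disease-free survival on control
treatment_dfs = 0.80  # disease-free survival on the new treatment

absolute_benefit = treatment_dfs - control_dfs        # ~0.10
would_relapse_on_control = 1 - control_dfs            # ~0.30
fraction_of_relapsers_helped = absolute_benefit / would_relapse_on_control

number_needed_to_treat = 1 / absolute_benefit

print(f"Absolute benefit: {absolute_benefit:.0%}")
print(f"Of the {would_relapse_on_control:.0%} who would relapse on control, "
      f"{fraction_of_relapsers_helped:.0%} benefit from the new treatment")
print(f"Number needed to treat: {number_needed_to_treat:.0f}")
```

So 70% would have done well anyway, 20% relapse despite the new treatment, and only the remaining 10% are actually helped — one in ten treated.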
For most broad-eligibility clinical trials in oncology, the primary intention-to-treat (ITT) analysis has very low statistical power for the kinds of alternative hypotheses that are realistic to expect for molecularly targeted drugs, given the heterogeneity of histologic diagnoses
Cancers of a primary site and histologic type often represent a heterogeneous group of molecular diseases which vary fundamentally with regard to • the oncogenic mutations that cause them • their responsiveness to specific drugs
The standard approach to designing phase III clinical trials is based on two assumptions • Qualitative treatment by subset interactions are unlikely • “Costs” of over-treatment are less than “costs” of under-treatment
These assumptions were derived for studies of inexpensive treatments like aspirin for diseases that were presumed to be biologically homogeneous
Oncology therapeutics development is now focused on molecularly targeted drugs that can only be expected to be effective in a restricted set of patients whose tumors are driven by the molecular targets • Most new cancer drugs are very expensive
Standard Clinical Trial Approach • Has led to widespread over-treatment of patients with drugs from which few benefit • Is neither scientifically well founded nor economically sustainable for future cancer therapeutics
Keys to developing effective drugs in oncology • The target of the drug must be central to disease invasion and progression for some sub-type of cases • Drug should be selective for the target so that it can be administered at a concentration that totally shuts down the de-regulated pathway • Need a test that identifies the patients who have disease driven by de-regulation of the target
Co-development of a new drug and test increases the complexity of development and presents new challenges for companies and regulators • To avoid unnecessary roadblocks to progress, oncologists and statisticians must • Discard the components of the "aspirin" paradigm that are not science based • Avoid treating all problems as hypothesis-testing problems
Traditional Subset Analysis • In the past, subsets were often examined in unfocused post-hoc analyses • Multiple tests were performed with no control of the type I error rate • To protect the type I error rate, some require that the overall treatment effect be significant before examining subsets • Others, based on the implicit assumption that qualitative interactions are unlikely, evaluate treatment within a subset only if there is a significant treatment-by-subset interaction
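A small simulation (hypothetical numbers, not from the talk) shows why unfocused subset testing loses control of the type I error rate: with no true treatment effect anywhere, testing 10 subsets at the nominal 0.05 level yields a "significant" subset by chance far more often than 5% of the time.

```python
# Simulate trials with NO treatment effect and count how often at least
# one of 10 subset-specific z-tests is nominally significant.
import random
import statistics

random.seed(0)

def false_positive_anywhere(n_subsets=10, n_per_arm=50, z_crit=1.96):
    """One null trial; True if any subset test is 'significant' by chance."""
    for _ in range(n_subsets):
        control = [random.gauss(0, 1) for _ in range(n_per_arm)]
        treated = [random.gauss(0, 1) for _ in range(n_per_arm)]
        diff = statistics.mean(treated) - statistics.mean(control)
        se = (2 / n_per_arm) ** 0.5  # known unit variance in each arm
        if abs(diff / se) > z_crit:
            return True
    return False

n_trials = 1000
rate = sum(false_positive_anywhere() for _ in range(n_trials)) / n_trials
print(f"Chance of a 'significant' subset under the null: {rate:.2f}")
# For 10 independent subsets, theory gives roughly 1 - 0.95**10, i.e. about 0.40
```

The simulated rate lands near 40%, not 5% — exactly the inflation that unfocused post-hoc subset analysis invites.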
Neither current practices of subset analysis nor current practices of ignoring subset analysis are effective for evaluating treatments in biologically heterogeneous diseases
How can we develop new drugs in a manner more consistent with modern tumor biology and obtain reliable information about what regimens work for what kinds of patients?
When the Biology is Clear • Develop a classifier that identifies the patients likely (or unlikely) to benefit from the new drug • Develop an analytically validated test • Measures what it should accurately and reproducibly • Design a focused clinical trial to evaluate effectiveness of the new treatment in test + patients
Companion Test • Single gene or protein measurement • ER protein expression • BCR-ABL translocation • HER2 amplification • EGFR mutation • KRAS mutation • BRAF V600E mutation • ALK translocation • CD20 expression • Index or classifier that summarizes expression levels of multiple genes
Targeted (Enrichment) Design • Using phase II data, develop a predictor of response to the new drug • Patients predicted responsive: randomize between the new drug and control • Patients predicted non-responsive: off study
Evaluating the Efficiency of Targeted Design • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006 • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
Relative efficiency of targeted design depends on • proportion of patients test positive • specificity of treatment effect for test positive patients • When less than half of patients are test positive and the drug has minimal benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
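The efficiency comparison can be sketched with a simple normal-approximation sample-size formula. The prevalence and effect sizes below are illustrative assumptions, not figures from the Simon and Maitournam papers:

```python
# Randomized sample sizes for a targeted (enrichment) design vs a
# standard all-comers design, under a normal approximation.
from math import ceil

def n_per_arm(delta, sigma=1.0, z_alpha=1.96, z_beta=0.84):
    """Patients per arm to detect mean difference `delta`
    (two-sided alpha = 0.05, power = 0.80, common SD `sigma`)."""
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

gamma = 0.25      # assumed proportion of patients who are test positive
delta_pos = 0.5   # assumed treatment effect in test-positive patients
delta_neg = 0.0   # assumed (no) effect in test-negative patients

# Targeted design: enroll and randomize only test-positive patients.
n_targeted = n_per_arm(delta_pos)

# Standard design: everyone is randomized, so the effect is diluted.
delta_overall = gamma * delta_pos + (1 - gamma) * delta_neg
n_standard = n_per_arm(delta_overall)

print(f"Targeted design: {n_targeted} randomized per arm")
print(f"Standard design: {n_standard} randomized per arm")
print(f"Ratio: {n_standard / n_targeted:.0f}x")
```

With 25% of patients test positive and no benefit in test-negative patients, the overall effect is diluted fourfold, so the standard design needs on the order of sixteen times as many randomized patients — the "dramatically fewer" of the slide.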
The intended use of the companion test is to identify patients who have the disease subtype for which the drug has been proven effective
The Biology is Often Not So Clear • Cancer biology is complex and it is not always possible to have the right single predictive classifier identified with an appropriate cut-point by the time the phase 3 trial of a new drug is ready to start accrual
Has the sponsor selected the right biomarker? • Can we adequately pre-define a cut-point to make the biomarker binary? • What if we have several candidate markers? • E.g., is de-regulation of the pathway best measured by protein expression of a receptor, or by amplification or mutation of the corresponding gene? • Does the sponsor have to establish effectiveness of the drug for every mutation of a specified gene?
"Stratification Design" / "Interaction Design" / "Can of Worms Design" • Develop a predictor of response to the new Rx • Patients predicted responsive to the new Rx: randomize between new Rx and control • Patients predicted non-responsive to the new Rx: also randomize between new Rx and control
Can of Worms Design • Invites poor statistical analysis based on standard paradigm • Requiring that overall analysis be significant before evaluating test + subset • Requiring that a significant interaction be demonstrated before evaluating test + subset • Requiring that the randomization be stratified by the marker before evaluating test + subset • Ethically problematic in some cases
What is Important • The study-wise type I error rate must be protected • The trial must be sized to have adequate power for the comparisons of a-priori interest • E.g., overall analysis at the 0.01 level • Analysis in the test + subset at the 0.04 level • The marker must be measured on all (or almost all) patients using an analytically validated test • Sample size and futility analyses should protect test − patients
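The alpha-splitting idea above can be made concrete. This is a conservative Bonferroni split; the overall and subset tests are positively correlated, so the true study-wise error rate is somewhat below the bound:

```python
# Splitting the study-wise alpha between the overall ITT comparison
# and the test-positive subset comparison (Bonferroni bound).
from statistics import NormalDist

overall_alpha = 0.01  # spent on the overall (all randomized) analysis
subset_alpha = 0.04   # spent on the test-positive subset analysis

# Bonferroni: study-wise type I error rate <= sum of the two levels.
studywise_bound = overall_alpha + subset_alpha

# Two-sided critical z-values for each comparison.
z_overall = NormalDist().inv_cdf(1 - overall_alpha / 2)  # ~2.58
z_subset = NormalDist().inv_cdf(1 - subset_alpha / 2)    # ~2.05

print(f"Study-wise type I error bounded by {studywise_bound:.2f}")
print(f"Critical z: overall {z_overall:.2f}, subset {z_subset:.2f}")
```

Either rejection supports a claim: overall effectiveness at 0.01, or effectiveness in the test-positive subset at 0.04, with the total held at 0.05.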
Principle • If a drug is found safe and effective in a defined patient population, approval should not depend on finding the drug ineffective in some other population
• Strong confidence in test: small r2 and large r1 • Weak confidence in test: small r2 and small r1 • p00 selected to control the type I error rate
Biomarker Adaptive Threshold Design • Randomized trial of E vs C • Single candidate predictive biomarker score B • No threshold for biomarker determined • Candidate thresholds b1, …, bK • Eligibility not restricted by biomarker
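The adaptive-threshold idea can be sketched as follows: compute a treatment-effect statistic within each candidate subset {B > b_k}, take the maximum over thresholds, and calibrate that maximum by permuting treatment labels. (The published design of Jiang, Freidlin, and Simon uses likelihood-based statistics for survival endpoints; the plain z-statistics and simulated continuous outcome below are purely for illustration.)

```python
# Sketch: max-over-thresholds test statistic with a permutation null.
import random
import statistics

random.seed(1)

def z_stat(y_treated, y_control):
    """Two-sample z-statistic; 0 if either group is too small."""
    if len(y_treated) < 2 or len(y_control) < 2:
        return 0.0
    diff = statistics.mean(y_treated) - statistics.mean(y_control)
    se = (statistics.variance(y_treated) / len(y_treated)
          + statistics.variance(y_control) / len(y_control)) ** 0.5
    return diff / se if se > 0 else 0.0

def max_stat(biomarker, treat, outcome, thresholds):
    """Largest treatment-effect z over the candidate subsets {B > b_k}."""
    best = 0.0
    for b in thresholds:
        yt = [y for m, t, y in zip(biomarker, treat, outcome) if m > b and t]
        yc = [y for m, t, y in zip(biomarker, treat, outcome) if m > b and not t]
        best = max(best, z_stat(yt, yc))
    return best

# Simulated trial: E helps only patients with biomarker score > 0.6.
n = 200
biomarker = [random.random() for _ in range(n)]
treat = [i % 2 == 0 for i in range(n)]
outcome = [random.gauss(1.0 if (t and m > 0.6) else 0.0, 1.0)
           for m, t in zip(biomarker, treat)]

thresholds = [0.0, 0.2, 0.4, 0.6, 0.8]
observed = max_stat(biomarker, treat, outcome, thresholds)

# Permutation null: reshuffling treatment labels controls type I error
# despite the data-driven choice of threshold.
n_perm = 200
labels = treat[:]
perm_ge = 0
for _ in range(n_perm):
    random.shuffle(labels)
    if max_stat(biomarker, labels, outcome, thresholds) >= observed:
        perm_ge += 1
p_value = (perm_ge + 1) / (n_perm + 1)
print(f"Max statistic {observed:.2f}, permutation p = {p_value:.3f}")
```

Because the maximum is taken inside every permutation replicate, the threshold search itself is accounted for in the reference distribution, so no cut-point needs to be fixed in advance.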
The confidence interval for the cut-point can be used to inform treatment decisions for future patients
Key Points • It can be beneficial not to define a cut-point for the biomarker prior to conducting the phase III clinical trial • The phase II database may be inadequate with regard to the number of cases, the lack of a control group, or a different endpoint • The only thing that stands in the way of a more informative phase III trial is the "aspirin" paradigm that the ITT analysis of the eligible population is required to serve as the basis for approval
Adaptive Signature Design • Randomized trial comparing E to C • End-of-trial analysis: • Compare E to C for all patients at a reduced significance level p0 (e.g. 0.01) • If the overall H0 is rejected, then claim effectiveness of E for eligible patients • Otherwise, develop and validate an indication classifier as described next • Can be used with any set of candidate predictive variables, not just high-dimensional genomic measurements
Using an unbiasedly selected subset of patients of pre-specified size (e.g. 1/3) as a training set T, develop a binary "indication classifier" M of whether a patient is likely to benefit from E relative to C • The classifier may use a single marker selected from the candidates, or multiple markers • The classifier classifies patients into only 2 subsets: those predicted to benefit from E and those not predicted to benefit from E
Apply the classifier M to classify patients in the validation set V = D − T, where D denotes the full trial data set • Compare E vs C in the subset of V who are predicted to benefit from E, using a significance threshold of 0.05 − p0
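A minimal sketch of the split-sample step, with hypothetical simulated data and a deliberately simple stand-in "classifier" (real applications would use genuine variable selection on many candidate markers):

```python
# Adaptive signature sketch: train an indication classifier on a
# pre-specified 1/3 of patients, then test E vs C only among
# validation-set patients predicted to benefit.
import random
import statistics

random.seed(2)

# Simulated trial D: treatment E helps only marker-positive patients.
n = 300
data = []
for i in range(n):
    marker = random.random() > 0.5
    arm_E = i % 2 == 0
    y = random.gauss(1.0 if (arm_E and marker) else 0.0, 1.0)
    data.append((marker, arm_E, y))

random.shuffle(data)
T, V = data[:n // 3], data[n // 3:]  # training set T, validation set V

def mean_y(rows, marker, arm_E):
    ys = [y for m, a, y in rows if m == marker and a == arm_E]
    return statistics.mean(ys) if ys else 0.0

# "Classifier" M, fit using ONLY the training set: predict benefit for
# whichever marker level shows the larger apparent treatment effect.
interaction = ((mean_y(T, True, True) - mean_y(T, True, False))
               - (mean_y(T, False, True) - mean_y(T, False, False)))
predict_benefit = (lambda m: m) if interaction > 0 else (lambda m: not m)

# Held-out comparison of E vs C in the predicted-benefit subset of V,
# which would be tested at the remaining alpha of 0.05 - p0 = 0.04.
yE = [y for m, a, y in V if predict_benefit(m) and a]
yC = [y for m, a, y in V if predict_benefit(m) and not a]
diff = statistics.mean(yE) - statistics.mean(yC)
se = (statistics.variance(yE) / len(yE)
      + statistics.variance(yC) / len(yC)) ** 0.5
print(f"Effect in predicted-benefit subset: {diff:.2f} (z = {diff / se:.1f})")
```

Because the classifier never sees the validation set, the subset comparison in V is an honest test despite the data-driven marker selection.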
The indication classifier is not a binary classifier of whether a patient has good prognosis or poor prognosis • It is a “two sample classifier” of whether the prognosis of a patient on E is better than the prognosis of the patient on C
The indication classifier can be a binary classifier that maps the vector of candidate covariates into {E, C}, indicating which treatment is predicted superior for that patient • The classifier need not use all the covariates, but variable selection must be determined using only the training set • Variable selection may be based on selecting variables with apparent interactions with treatment, with the cut-off for variable selection determined by cross-validation within the training set for optimal classification • The indication classifier can also be a probabilistic classifier
Key Idea • Replace multiple significance testing with the development of a single indication classifier, and obtain unbiased estimates of the properties of that classifier when it is used on future patients
This approach can also be used to identify the subset of patients who don’t benefit from E in cases where E is superior to C overall