Clinical Trial Designs for Prognostic & Predictive Classifiers

Clinical Trial Designs for the Evaluation of Prognostic & Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov

Validation = Fitness for Intended Use

Intended Uses • Prognostic biomarkers • Measured before treatment to indicate long-term outcome for patients untreated or not receiving chemotherapy • Used to determine who doesn’t need more treatment • Predictive biomarkers • Measured before treatment to identify who will benefit from a particular treatment • Early detection biomarkers • Disease progression biomarkers

Prognostic Biomarkers in Oncology • Most gene expression signatures are developed as prognostic biomarkers. • Like numerous previously developed prognostic markers, most will never be used because they have not been demonstrated to be therapeutically relevant • Most prognostic marker studies are not conducted with an intended use clearly in mind • Most use a convenience sample of heterogeneous patients for whom tissue is available rather than patients selected for evaluating an intended use

Prognostic Markers in Oncology • There is rarely attention to analytical validation • There is rarely a separate validation study that addresses medical utility • Without a defined intended use, validation is meaningless and impossible

Prognostic Biomarkers Can Have Medical Utility: Node-Negative, ER-Positive Breast Cancer • Intended use is to identify patients who are likely to be cured by surgery/radiotherapy and hormonal therapy and therefore are unlikely to benefit from adjuvant chemotherapy • Oncotype Dx recurrence score • MammaPrint

Types of Validation • Analytical validation • Accuracy in measurement of analyte • Robustness and reproducibility • Clinical validation • Correlation of score/classifier with clinical state or outcome • Medical utility • Actionable • Use results in patient benefit

Medical Utility • Benefits patient by improving treatment decisions • Depends on context of use of the biomarker • Treatment options and practice guidelines • Other prognostic factors

Clinical validity vs medical utility • A prognostic signature for patients with breast cancer may correlate with outcome, but does it identify a set of patients who have such good outcome without chemotherapy that they do not require treatment? • A prognostic signature for patients with early NSCLC may correlate with outcome, but does it identify a set of patients who have poor outcome untreated and benefit from chemotherapy?

Developmental vs Validation Studies • Developmental studies screen candidate markers to develop biomarker scores or classifiers • Train classifiers, optimize tuning parameters, set cut-off values for classification • Developmental studies often use cross-validation or split-sample validation to provide a preliminary estimate of the accuracy of the marker/classifier for predicting a clinical outcome • Developmental studies generally address clinical validity (i.e., prediction accuracy), not medical utility

Developmental vs Validation Studies • Validation studies use previously developed, completely specified classifiers/scores • Validation studies should use analytically validated tests and focus on medical utility, not predictive accuracy • This often requires a prospective clinical trial

Marker Strategy Design

Marker Strategy Design (SOC is chemotherapy)

Marker Strategy Design • Generally very inefficient because many patients in both randomization groups receive the same treatment • So inefficient as to be an insurmountable roadblock to validation of potentially valuable classifiers
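The inefficiency can be illustrated with a rough sample-size sketch. It assumes a continuous outcome with a common standard deviation, a standard two-arm normal-approximation formula, and a simple dilution model in which only a fraction of the marker-guided arm receives a treatment different from SOC, so the detectable difference between arms shrinks by that fraction. The function names, the parameter `frac_changed`, and the numeric inputs are hypothetical choices for illustration, not values from the slides.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sigma=1.0, alpha=0.05, power=0.90):
    """Patients per arm for a two-sided, two-arm comparison of means
    (normal approximation, equal allocation, common standard deviation)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)


def n_per_arm_marker_strategy(delta, frac_changed, **kwargs):
    """Same calculation when only `frac_changed` of the marker-guided arm
    receives a treatment different from SOC (assumed dilution model):
    the between-arm difference becomes delta * frac_changed."""
    return n_per_arm(delta * frac_changed, **kwargs)


# Hypothetical inputs: effect of 0.5 SD in patients whose treatment changes,
# and 25% of the marker-guided arm treated differently from SOC.
direct = n_per_arm(delta=0.5)
strategy = n_per_arm_marker_strategy(delta=0.5, frac_changed=0.25)
print(direct, strategy, round(strategy / direct, 1))
```

Under these assumptions the required sample size grows roughly with the inverse square of the fraction of patients whose treatment actually differs between the two randomization groups.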

Marker Strategy Design • Sometimes poorly informative • Not measuring marker in control group means that merits of complex marker treatment strategies cannot be dissected • Requires a marker/signature to be used for determining treatment decisions which may result in inferior outcome to the SOC

Marker Strategy Design • Data is not useful for evaluation of other markers or tests • Provides no information not provided by the test-all design

Test-All Design (SOC is chemotherapy)

Targeted (Enrichment) Design • Using phase II data, develop predictor of response to new drug • Patient predicted responsive: randomized to new drug vs. control • Patient predicted non-responsive: off study

Evaluating the Efficiency of Targeted Design • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006 • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

Relative efficiency of targeted design depends on • Proportion of patients who are test positive • Effectiveness of new drug (compared to control) for test-negative patients • When less than half of patients are test positive and the drug has little or no benefit for test-negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
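A back-of-the-envelope sketch of this dependence follows. It assumes the number of randomized patients scales inversely with the square of the treatment effect being detected, and that the effect seen in an unselected (standard) trial is the prevalence-weighted average of the effects in test-positive and test-negative patients. The function and parameter names are hypothetical, and the approximation is a simplification for illustration rather than the exact calculation in the cited papers.

```python
def randomized_ratio(prev_pos, effect_pos, effect_neg):
    """Approximate ratio of randomized patients: standard design / targeted design.

    prev_pos   -- proportion of patients who are test positive
    effect_pos -- treatment effect in test-positive patients
    effect_neg -- treatment effect in test-negative patients
    """
    # Effect seen in an unselected trial: prevalence-weighted average.
    avg_effect = prev_pos * effect_pos + (1 - prev_pos) * effect_neg
    return (effect_pos / avg_effect) ** 2


def screened_ratio(prev_pos, effect_pos, effect_neg):
    """Same comparison, counting patients screened for the targeted design.

    To randomize n test-positive patients, about n / prev_pos must be screened.
    """
    return prev_pos * randomized_ratio(prev_pos, effect_pos, effect_neg)


# Hypothetical scenarios: (prevalence, effect in positives, effect in negatives)
for scenario in [(0.25, 1.0, 0.0), (0.5, 1.0, 0.0), (0.5, 1.0, 0.5)]:
    print(scenario, randomized_ratio(*scenario), screened_ratio(*scenario))
```

When the drug has no effect in test-negative patients, the ratio reduces to one over the squared prevalence under this approximation; as the effect in test-negative patients approaches the effect in test-positive patients, the advantage of the targeted design disappears.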

Stratification Design for New Drug Development with Companion Diagnostic • Develop predictor of response to new Rx • Predicted responsive to new Rx: randomized to new Rx vs. control • Predicted non-responsive to new Rx: randomized to new Rx vs. control

Develop prospective analysis plan for evaluation of treatment effect and how it relates to biomarker • Type I error should be protected for multiple comparisons • Trial sized for evaluating treatment effect overall and in subsets defined by test • “Stratifying” (balancing) the randomization may be useful but is not a substitute for a prospective analysis plan.

Fallback Analysis Plan • Compare the new drug to the control overall for all patients, ignoring the classifier. • If p_overall ≤ 0.01, claim effectiveness for the eligible population as a whole • Otherwise perform a single subset analysis evaluating the new drug in the classifier-positive patients • If p_subset ≤ 0.04, claim effectiveness for the classifier-positive patients.
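The decision rule in the fallback plan can be written as a short function. This is a minimal sketch: the function name, argument names, and returned strings are hypothetical, and the two p-values are assumed to come from pre-specified tests computed elsewhere. The default thresholds are the ones stated in the plan; they sum to 0.05, so the overall type I error is bounded by 0.05 via a Bonferroni-style split.

```python
def fallback_analysis(p_overall, p_subset=None,
                      alpha_overall=0.01, alpha_subset=0.04):
    """Apply the two-step fallback rule and return the claim it supports.

    p_overall -- p-value for new drug vs. control in all eligible patients
    p_subset  -- p-value for new drug vs. control in classifier-positive
                 patients (only consulted if the overall test fails)
    """
    # Step 1: overall comparison, ignoring the classifier.
    if p_overall <= alpha_overall:
        return "effective in eligible population as a whole"
    # Step 2: single pre-specified subset analysis in classifier-positive patients.
    if p_subset is not None and p_subset <= alpha_subset:
        return "effective in classifier-positive patients"
    return "no claim"


print(fallback_analysis(0.005))          # overall test succeeds
print(fallback_analysis(0.20, 0.03))     # falls back to the subset test
print(fallback_analysis(0.20, 0.045))    # neither threshold met
```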

In some cases a trial with optimal structure for evaluating a new biomarker will have been previously performed and will have pre-treatment tumor specimens archived • Under certain conditions, a focused analysis based on specimens from the previously conducted clinical trial can provide highly reliable evidence for the medical utility of a prognostic or predictive biomarker • In some cases, it may be the only way of obtaining high-level evidence

Prospective-Retrospective Study

Guidelines Proposed by Simon, Paik, Hayes: Prospective-Retrospective Design • Adequate archived tissue from an appropriately designed phase III clinical trial must be available on a sufficiently large number of patients that the appropriate biomarker analyses have adequate statistical power and that the patients included in the evaluation are clearly representative of the patients in the trial. • The test should be analytically and pre-analytically validated for use with archived tissue. Testing should be performed blinded to the clinical data. • The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier. • The results should be validated using specimens from a similar, but separate study involving archived tissues.
