Development and Use of Predictive Biomarkers Dr. Richard Simon


Potential Conflict of Interest: None

Development and Use of Predictive Biomarkers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov

Biometric Research Branch Website http://brb.nci.nih.gov • PowerPoint presentations • Reprints • BRB-ArrayTools software • Web-based sample size planning

Why are Metastatic Tumors Resistant? • Poor intracellular drug access in bulky tumors • Large tumors have low growth fractions • Old tumors have undergone many generations of replication, harbor many mutations and are mutationally heterogeneous • Metastatic tumors have survived many selection pressures, activated detoxification pathways and de-activated control pathways like apoptosis • …

How Can We Treat More Effectively? • Treat early • Treat intensively • Treat with combinations • Treat with drugs that target the key oncogenic mutations that occurred early, are present in all cells of the tumor, drive the invasion of the tumor and to which the tumor is addicted • Characterize the key oncogenic mutations in individual tumors and select the right drugs for that tumor

Prognostic & Predictive Biomarkers • Most cancer treatments currently benefit only a minority of patients to whom they are administered • Being able to predict which patients are likely to benefit would • Save patients from unnecessary toxicity, and enhance their chance of receiving a drug that helps them • Help control medical costs • Improve the success rate of clinical drug development

Personalized Oncology is Here Today • Estrogen receptor over-expression in breast cancer • Anti-estrogens, aromatase inhibitors • HER2 amplification in breast cancer • Trastuzumab, lapatinib • OncotypeDx in breast cancer • Low score in ER+, node-negative disease = hormonal therapy alone • KRAS in colorectal cancer • Wild-type KRAS = cetuximab or panitumumab • EGFR mutation or amplification in NSCLC • EGFR inhibitor

Different Kinds of Biomarkers • Endpoint Biomarkers • A measurement made on a patient before, during and after treatment to determine whether the treatment is working • Predictive biomarkers • Measured before treatment to identify who will benefit from a particular treatment • Prognostic biomarkers • Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment

Endpoint Biomarkers • Surrogate Endpoints • It is very difficult to properly validate a biomarker as a surrogate of clinical benefit for use as an alternative endpoint in phase III trials • Partial Surrogate Endpoints • Necessary but not sufficient for clinical benefit • Pharmacodynamic biomarkers can be useful in phase I/II studies as measures of treatment effect • They need not be validated as surrogates for clinical benefit

Types of Validation for Prognostic and Predictive Biomarkers • Analytical validation • Measures accurately what it is supposed to measure • Clinical validation/correlation • Does the biomarker predict the clinical endpoint that it’s supposed to predict for independent data • Medical utility • Does use of the biomarker result in patient benefit • Depends on medical context, other prognostic factors, therapeutic options

Prognostic and Predictive Biomarkers in Oncology • Single gene or protein measurement • ER protein expression • HER2 amplification • KRAS mutation • Scalar index or classifier that summarizes expression levels of multiple genes

Prognostic Markers in Oncology • Most prognostic markers are not used because they are not therapeutically relevant • Most studies do not address medical utility • They use a convenience sample of patients for whom tissue is available • Most prognostic marker studies are not reliable because they are exploratory and not prospectively focused on a single marker

Prognostic Biomarkers Can be Therapeutically Relevant • <10% of node negative ER+ breast cancer patients require or benefit from the cytotoxic chemotherapy that they receive

B-14 Results: Relapse-Free Survival (figure: groups of 338, 149 and 181 patients; p<0.0001). Paik et al, SABCS 2003

Predictive Biomarkers • In the past, often studied as unfocused post-hoc subset analyses of RCTs • Numerous subsets examined • No pre-specified hypotheses • No control of type I error from multiple testing
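To see why unfocused subset analyses are unreliable, the short sketch below computes the chance that at least one of k subset tests is nominally significant when the treatment has no effect in any subset. It assumes the k tests are independent, which overlapping subsets are not, so the numbers are only a rough illustration; the Bonferroni threshold alpha/k is shown as one simple way to control the type I error. The function names are hypothetical.

```python
def prob_any_false_positive(k: int, alpha: float = 0.05) -> float:
    """P(at least one of k independent null tests has p < alpha)."""
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_threshold(k: int, alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error <= alpha."""
    return alpha / k


for k in (1, 5, 10, 20):
    print(k, round(prob_any_false_positive(k), 3), bonferroni_threshold(k))
```

With 10 independent subsets the chance of at least one spurious "significant" subset is about 40%, even though the treatment does nothing.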

Prospective Co-Development of Drugs and Companion Diagnostics • Develop a completely specified genomic classifier of the patients likely to benefit from a new drug • Establish analytical validity of the classifier • Use the completely specified classifier to design and analyze a focused clinical trial to evaluate effectiveness of the new treatment and how it relates to the candidate biomarker

Guiding Principle • The data used to develop the classifier should be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier • Developmental studies can be exploratory • Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

Enrichment Design • Using phase II data, develop a predictor of response to the new drug • Patients predicted responsive: randomize to New Drug vs Control • Patients predicted non-responsive: off study

Enrichment Design • Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug, and where there is compelling biological evidence that the new treatment is not effective in marker-negative patients • e.g. Herceptin

Trastuzumab (Herceptin) • Metastatic breast cancer • 234 randomized patients per arm • 90% power for 13.5% improvement in 1-year survival over 67% baseline at 2-sided .05 level • If benefit were limited to the 25% of patients who are assay-positive, the overall improvement in survival would have been 3.375% • 4025 patients/arm would have been required
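The dilution effect on the trastuzumab example can be checked with the usual normal-approximation sample size formula for comparing two proportions, with the Fleiss continuity correction. This is a sketch under that assumption; we do not know which exact formula produced the slide's figures. With these inputs it gives about 236 per arm for the full 13.5% effect and about 4,025 per arm for the diluted 3.375% effect, close to the slide's 234 and 4025. The function name `n_per_arm` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float, alpha: float = 0.05,
              power: float = 0.90, continuity: bool = True) -> int:
    """Patients per arm for a two-sided comparison of two proportions
    (normal approximation, optional Fleiss continuity correction)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    d = abs(p_treated - p_control)
    p_bar = (p_control + p_treated) / 2
    n = (z_a * sqrt(2 * p_bar * (1 - p_bar))
         + z_b * sqrt(p_control * (1 - p_control)
                      + p_treated * (1 - p_treated))) ** 2 / d ** 2
    if continuity:
        n = n / 4 * (1 + sqrt(1 + 4 / (n * d))) ** 2
    return ceil(n)


baseline, gain, prevalence = 0.67, 0.135, 0.25
full = n_per_arm(baseline, baseline + gain)                  # benefit in every patient
diluted = n_per_arm(baseline, baseline + prevalence * gain)  # benefit only in assay-positive 25%
print(full, diluted)
```

Because sample size scales roughly with 1/(effect)^2, cutting the overall effect to a quarter multiplies the required number of randomized patients by roughly 16 or more.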

Evaluating the Efficiency of Enrichment Design • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004. • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005. • Reprints and interactive sample size calculations at http://linus.nci.nih.gov/brb

Efficiency of Enrichment Design • Depends on • proportion of patients who test positive • effectiveness of the new drug for test-negative patients • When less than half of patients are test positive and the drug has little or no benefit for test-negative patients, the enrichment design requires dramatically fewer randomized patients
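The dependence on these two quantities can be sketched with a simple approximation: if the outcome variance is the same in both marker groups, the untargeted trial sees a diluted effect equal to prevalence + (1 - prevalence) * (relative effect in test-negatives), and the required number of randomized patients scales with 1/(effect)^2. This is an illustrative simplification, not the full treatment in the cited papers; the function names are hypothetical.

```python
def randomized_ratio(prevalence: float, rel_effect_negative: float) -> float:
    """Approximate ratio of randomized patients needed:
    untargeted (all-comers) design / enrichment design.

    prevalence: proportion of patients who test positive.
    rel_effect_negative: treatment effect in test-negative patients as a
        fraction of the effect in test-positive patients (0 = no benefit).
    Assumes equal outcome variance in both marker groups.
    """
    diluted_effect = prevalence + (1.0 - prevalence) * rel_effect_negative
    return 1.0 / diluted_effect ** 2


def screened_ratio(prevalence: float, rel_effect_negative: float) -> float:
    """Untargeted randomized patients / patients screened by the enrichment
    design (which must screen n_enriched / prevalence patients)."""
    return randomized_ratio(prevalence, rel_effect_negative) * prevalence


for prev in (0.25, 0.5, 0.75):
    for rel in (0.0, 0.5, 1.0):
        print(prev, rel, round(randomized_ratio(prev, rel), 2),
              round(screened_ratio(prev, rel), 2))
```

With 25% test-positive patients and no benefit in test-negatives, the all-comers design needs about 16 times as many randomized patients; when the drug works equally well in both groups the ratio is 1 and enrichment only adds screening.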

Stratification Design • Develop a predictor of response to the new Rx • Patients predicted responsive to the new Rx: randomize to New Rx vs Control • Patients predicted non-responsive to the new Rx: randomize to New Rx vs Control

• Does not use the test to restrict eligibility, but to structure a prospective analysis plan • Having a prospective analysis plan is essential • "Stratifying" (balancing) the randomization is useful to ensure that all randomized patients have tissue available but is not a substitute for a prospective analysis plan • Size the study for adequate evaluation of T vs C separately by marker status • The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier • The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier
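Sizing for separate evaluation by marker status means the rarer stratum usually drives the total. The sketch below assumes patients are accrued regardless of marker status with a known marker-positive prevalence, and that per-arm targets for each stratum have already been computed; the targets used are hypothetical and random variation in stratum counts is ignored.

```python
from math import ceil


def stratified_trial_size(prevalence: float, n_pos_per_arm: int, n_neg_per_arm: int):
    """Per-arm randomized sample size so that each marker stratum is expected
    to reach its own per-arm target when eligibility ignores the marker."""
    per_arm = max(ceil(n_pos_per_arm / prevalence),
                  ceil(n_neg_per_arm / (1.0 - prevalence)))
    expected_pos = per_arm * prevalence
    expected_neg = per_arm * (1.0 - prevalence)
    return per_arm, expected_pos, expected_neg


# Hypothetical targets: 100 per arm in marker-positive, 150 per arm in
# marker-negative, with 25% of patients marker-positive.
per_arm, pos, neg = stratified_trial_size(0.25, 100, 150)
print(per_arm, pos, neg)  # 400 100.0 300.0
```

Here the marker-positive target forces 400 patients per arm, which leaves the marker-negative stratum with twice its own target.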

• R Simon. Using genomics in clinical trial design, Clinical Cancer Research 14:5984-93, 2008 • R Simon. Designs and adaptive analysis plans for pivotal clinical trials of therapeutics and companion diagnostics, Expert Opinion on Medical Diagnostics 2:721-29, 2008

Web Based Software for Planning Clinical Trials of Treatments with a Candidate Predictive Biomarker • http://brb.nci.nih.gov

Use of Archived Specimens in Evaluation of Prognostic and Predictive Biomarkers, Richard M. Simon, Soonmyung Paik and Daniel F. Hayes • Claims of medical utility for prognostic and predictive biomarkers based on analysis of archived tissues can be considered to have either a high or low level of evidence depending on several key factors. • Studies using archived tissues, when conducted under ideal conditions and independently confirmed, can provide the highest level of evidence. • Traditional analyses of prognostic or predictive factors, using assays that have not been analytically validated on a convenience sample of tissues and conducted in an exploratory and unfocused manner, provide a very low level of evidence for clinical utility.

For Level I Evidence • Archived tissue adequate for a successful assay must be available on a sufficiently large number of patients from a phase III trial with a design that enables the appropriate analyses • Adequate statistical power • The patients included in the evaluation are clearly representative of the patients in the trial. • The test should be analytically and pre-analytically validated for use with archived tissue. • The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier. • The results of the analysis should be validated using specimens from a similar, but separate, study

Development of Prognostic & Predictive Classifiers using Gene Expression Profiles

Major Flaws Found in 40 Studies Published in 2004 • Inadequate control of multiple comparisons in gene finding • 9/23 studies had unclear or inadequate methods to deal with false positives • 10,000 genes x .05 significance level = 500 expected false positives • Misleading report of prediction accuracy • 12/28 reports based on evaluating accuracy in the training set or using incomplete cross-validation • Misleading use of cluster analysis • 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes • 50% of studies contained one or more major flaws
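The 500 figure is the expected number of null genes passing an unadjusted .05 test. One standard way to control false positives in gene finding is the Benjamini-Hochberg procedure, which controls the false discovery rate; it is sketched below as an illustration, not as the method the reviewed studies should have used in every case. The p-values are made up.

```python
def expected_false_positives(n_genes: int, alpha: float) -> float:
    """Expected number of null genes declared significant at level alpha
    if every gene were tested separately and none were truly related."""
    return n_genes * alpha


def benjamini_hochberg(p_values, q=0.05):
    """Indices of hypotheses rejected by the Benjamini-Hochberg step-up
    procedure, which controls the false discovery rate at level q
    (for independent or positively dependent tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            n_reject = rank  # keep the largest rank meeting the criterion
    return sorted(order[:n_reject])


print(expected_false_positives(10_000, 0.05))  # about 500
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))  # [0, 1]
```

Five of these ten made-up p-values are below .05, but only two survive the false discovery rate criterion.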

Recent Literature Review of Expression Profiling in Early Lung Cancer (Simon & Subramanian) • Most studies relating gene expression profiles to outcome of cancer patients do not address medical utility • The patients included are too heterogeneous with regard to stage • Studies emphasize statistical significance rather than predictive accuracy beyond that of existing prognostic factors • Most publications feature highly misleading claims based on failure to separate the data used for model development from the data used for model evaluation • Sample splitting • Do not evaluate results in the training set! • Complete cross-validation
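The difference between incomplete and complete cross-validation can be shown with a small simulation on pure-noise data, where no classifier can truly do better than chance. The sketch uses NumPy, a t-statistic gene filter and a nearest-centroid classifier; these are illustrative choices, not a recommended method. Incomplete cross-validation selects genes once using all samples; complete cross-validation repeats gene selection inside every leave-one-out fold.

```python
import numpy as np


def top_genes(X, y, k):
    """Indices of the k genes with the largest |Welch t-statistic|."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.argsort(np.abs(diff / se))[-k:]


def nearest_centroid(X_train, y_train, x):
    """Predict the class whose training centroid is closest to x."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    return int(np.sum((x - c1) ** 2) < np.sum((x - c0) ** 2))


def loocv_accuracy(X, y, k, complete):
    n = len(y)
    # Incomplete CV: genes chosen once using every sample, including each test sample.
    genes_all = None if complete else top_genes(X, y, k)
    correct = 0
    for i in range(n):
        train = np.arange(n) != i
        # Complete CV: gene selection repeated on the training fold only.
        genes = top_genes(X[train], y[train], k) if complete else genes_all
        pred = nearest_centroid(X[train][:, genes], y[train], X[i, genes])
        correct += int(pred == y[i])
    return correct / n


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 2000))  # 40 samples, 2000 noise "genes"
y = np.array([0] * 20 + [1] * 20)    # class labels unrelated to X
acc_incomplete = loocv_accuracy(X, y, k=10, complete=False)
acc_complete = loocv_accuracy(X, y, k=10, complete=True)
print(f"incomplete CV accuracy: {acc_incomplete:.2f}")
print(f"complete CV accuracy:   {acc_complete:.2f}")
```

On data like this the incomplete estimate typically lands far above 50% while the complete estimate stays near chance, which is the bias that arises when model development and model evaluation share data.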

New Challenges in Phase II Trial Design • Evaluating new drugs in molecularly heterogeneous diseases • Treating a sufficient number of patients whose tumors are thought to be good candidates for the drug • Developing a predictive biomarker for identifying target population and a robust test for use in the phase III trial • Development of effective combinations • Reliable use of endpoints other than objective response

Selecting Patients for Phase II Trial • If the phase II trial for a particular primary site is not enriched for patients thought responsive to the drug, an initial stage of 10-15 patients may contain very few appropriate patients • If the drug target is thought to be known, accrual of a separate cohort of 25-30 patients whose tumors are thought to be driven by the target gives the best chance to evaluate the drug • Small phase II trials are generally not adequate for developing or even refining predictive biomarkers
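How few appropriate patients an unselected first stage may contain follows from the binomial distribution. The sketch assumes, hypothetically, that 10% of unselected patients have tumors driven by the target and computes the probability that a first stage of n1 patients contains at most one such patient.

```python
from math import comb


def prob_at_most(k: int, n: int, frac: float) -> float:
    """P(at most k appropriate patients among n), Binomial(n, frac)."""
    return sum(comb(n, j) * frac**j * (1 - frac) ** (n - j) for j in range(k + 1))


# Hypothetical: only 10% of unselected patients have tumors driven by the target.
for n1 in (10, 12, 15):
    print(n1, round(prob_at_most(1, n1, 0.10), 3))
```

Under this assumption, a first stage of 10 to 15 unselected patients has roughly a 55% to 74% chance of including at most one patient whose tumor is driven by the target.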

Phase II Designs

Evaluating a New Drug in Combination with Active Agents • For a new drug in combination with active agents, p0 represents the response probability of the active agents without the new drug in the same type of patients being selected for the phase II study of the combination regimen • The effectiveness of the single-arm design is limited by the availability of a large number of comparable patients who have been treated with the active agents alone • For combination regimens, unless p0 is based on a large number of patients, the Makuch-Simon or Bayesian Thall-Simon designs should be used instead of the optimal two-stage design. • The Makuch-Simon and Thall-Simon designs require individual patient data for historical controls. This increases the focus on comparability, and these designs take into account the actual number of historical controls and the resulting uncertainty in p0
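For comparison, the sketch below computes the operating characteristics of a generic two-stage single-arm design: probability of early termination (PET), probability of declaring the regimen promising, and expected sample size. The example design (stop if at most 1 response in the first 10 patients; promising if more than 5 responses in 29) is used for p0 = 0.10 vs p1 = 0.30; we believe it corresponds to the published optimal design for alpha = 0.05, beta = 0.20, but it should be checked against the tables before use. Note that the calculation treats p0 as known exactly, which is the limitation raised above for combination regimens.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def two_stage_oc(r1: int, n1: int, r: int, n: int, p: float):
    """Stop after stage 1 if <= r1 responses among n1 patients; otherwise
    treat n - n1 more and call the regimen promising if total responses > r.
    Returns (PET, P(promising), expected sample size) at true rate p."""
    n2 = n - n1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_promising = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        need = max(r - x1 + 1, 0)  # stage-2 responses needed to exceed r
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_promising += binom_pmf(x1, n1, p) * tail
    expected_n = n1 + (1 - pet) * n2
    return pet, p_promising, expected_n


pet0, alpha, en0 = two_stage_oc(1, 10, 5, 29, p=0.10)  # under p0
_, power, _ = two_stage_oc(1, 10, 5, 29, p=0.30)       # under p1
print(f"PET(p0)={pet0:.3f} alpha={alpha:.3f} power={power:.3f} EN(p0)={en0:.1f}")
```

The probability of declaring the regimen promising under p0 is the type I error and under p1 is the power; both depend on p0 being the right benchmark.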

Using Time to Progression or Stable Disease as Endpoint • Requires comparison to progression times for control patients not receiving drug • Proportion of patients with “stable disease” also requires a control group for evaluation to be meaningful • It is difficult to reliably evaluate time to progression endpoint without a randomized control group • With historical controls, specific controls should be used for whom comparability of prognosis and surveillance for progression can be established

Thall-Simon Bayesian Single Arm Phase II Designs Using a Specific Set of Historical Control Patients • Makuch, RW, and Simon, RM.: Sample size considerations for non-randomized comparative studies. J. Chron. Dis. 33: 175-181, 1980. • Dixon, DO, and Simon, R. Sample size considerations for studies comparing survival curves using historical controls. J. Clin. Epidemiology 41: 1209-1214, 1988. • Thall, P F and Simon R. A Bayesian approach to establishing sample size and monitoring criteria for phase II clinical trials. Controlled Clinical Trials 15:463-481, 1994. • Thall PF, Simon R, Estey E: A new statistical strategy for monitoring safety and efficacy in single-arm clinical trials. Journal of Clinical Oncology 14:296-303, 1996.
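A simplified illustration of the Bayesian idea behind using a specific set of historical controls: with independent Beta posteriors for the response probability on the new regimen (p_E) and in the historical controls (p_S), the posterior probability that p_E exceeds p_S by at least delta is computed by numerical integration. This is not the full Thall-Simon design, which also specifies priors, monitoring boundaries and sample size; the uniform Beta(1, 1) priors and the data below are hypothetical. It shows how the same observed historical rate gives less certainty when it comes from fewer control patients.

```python
from math import exp, lgamma, log


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density at 0 < x < 1."""
    return exp(lgamma(a + b) - lgamma(a) - lgamma(b)
               + (a - 1) * log(x) + (b - 1) * log(1 - x))


def prob_exceeds(a_e, b_e, a_s, b_s, delta=0.0, grid=2000):
    """P(p_E > p_S + delta), delta >= 0, for independent p_E ~ Beta(a_e, b_e)
    and p_S ~ Beta(a_s, b_s), using a midpoint rule on `grid` cells."""
    h = 1.0 / grid
    mids = [(i + 0.5) * h for i in range(grid)]
    mass_e = [beta_pdf(x, a_e, b_e) * h for x in mids]
    surv = [0.0] * (grid + 1)  # surv[i] approximates P(p_E > i * h)
    for i in range(grid - 1, -1, -1):
        surv[i] = surv[i + 1] + mass_e[i]
    total = 0.0
    for s in mids:
        t = s + delta
        if t >= 1.0:
            continue  # p_E cannot exceed 1
        j = int(t / h)
        # cells above j, plus the fraction of cell j lying above t
        tail = surv[j + 1] + mass_e[j] * ((j + 1) * h - t) / h
        total += beta_pdf(s, a_s, b_s) * h * tail
    return total


def posterior_params(responses: int, n: int, a0: float = 1.0, b0: float = 1.0):
    """Beta posterior parameters after `responses` of `n` (prior Beta(a0, b0))."""
    return a0 + responses, b0 + n - responses


new = posterior_params(12, 30)          # hypothetical: 12/30 on new regimen
small_hist = posterior_params(4, 20)    # 20% response in 20 controls
large_hist = posterior_params(40, 200)  # 20% response in 200 controls
p_small = prob_exceeds(*new, *small_hist)
p_large = prob_exceeds(*new, *large_hist)
print(round(p_small, 3), round(p_large, 3))
```

The posterior probability of superiority is lower with 20 historical controls than with 200, even though both show a 20% response rate.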

Randomized Phase II Screening Designs (Simon, Ellenberg, Wittes, Cancer Treatment Reports 69:1375, 1985) • For evaluating multiple new drugs, regimens or combinations to select the most promising • Arm with greatest observed response rate is selected regardless of how small the difference is • Not for comparing a new drug/regimen to control • Randomization ensures uniform patient selection and evaluation • Can be used with time to progression endpoint
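Because the arm with the greatest observed response rate is selected however small the difference, the design question is the probability of selecting the truly better arm. The sketch computes it exactly for two arms of n patients each; breaking ties with a fair coin is an assumption, and the response rates are hypothetical.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def prob_correct_selection(n: int, p_better: float, p_worse: float) -> float:
    """Two arms with n patients each: P(the truly better arm is selected)
    when the arm with more observed responses wins and ties are broken by
    a fair coin (the tie-breaking rule is an assumption)."""
    pb = [binom_pmf(x, n, p_better) for x in range(n + 1)]
    pw = [binom_pmf(x, n, p_worse) for x in range(n + 1)]
    total = 0.0
    for xb in range(n + 1):
        for xw in range(n + 1):
            if xb > xw:
                total += pb[xb] * pw[xw]
            elif xb == xw:
                total += 0.5 * pb[xb] * pw[xw]
    return total


# Hypothetical response rates: 35% vs 20%
for n in (10, 20, 30, 40):
    print(n, round(prob_correct_selection(n, 0.35, 0.20), 3))
```

No type I error is controlled here: when the arms are truly equal, either one is selected with probability 0.5, which is why the design is for screening among new regimens rather than for comparison with a control.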

Phase 2.5 Trial Design • Randomization to new regimen vs control • E.g. C+X vs C • Endpoint is progression free survival regardless of whether it is an accepted phase III endpoint • Threshold of significance can exceed .05 for sample size planning • Simon R et al. Clinical trial designs for the early clinical development of therapeutic cancer vaccines. Journal of Clinical Oncology 19:1848-54, 2001 • Korn EL et al. Clinical trial designs for cytostatic agents: Are new approaches needed? Journal of Clinical Oncology 19:265-272, 2001

Total Sample Size, Randomized Phase 2.5 Trial (2 years accrual, 1.5 years follow-up)
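A sketch of how such a trial might be sized with a progression-free survival endpoint: Schoenfeld's approximation gives the number of events for a 1:1 log-rank comparison, and events are converted to patients for 2 years of uniform accrual plus 1.5 years of additional follow-up. Exponential PFS, proportional hazards, and the hazard ratio and control median below are assumptions for illustration; the output is not the table from the original slide.

```python
from math import ceil, exp, log
from statistics import NormalDist


def required_events(hr: float, alpha_one_sided: float, power: float) -> int:
    """Schoenfeld approximation: events for a 1:1 log-rank comparison."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha_one_sided) + z(power)) ** 2 / log(hr) ** 2)


def event_probability(median: float, accrual: float, followup: float) -> float:
    """P(event observed) for exponential times with uniform accrual over
    `accrual` years followed by `followup` years of additional follow-up."""
    lam = log(2) / median
    return 1 - (exp(-lam * followup) - exp(-lam * (accrual + followup))) / (lam * accrual)


def total_patients(hr, alpha_one_sided, power, median_control,
                   accrual=2.0, followup=1.5):
    events = required_events(hr, alpha_one_sided, power)
    median_new = median_control / hr  # exponential PFS, proportional hazards
    p_event = 0.5 * (event_probability(median_control, accrual, followup)
                     + event_probability(median_new, accrual, followup))
    return events, ceil(events / p_event)


# Hypothetical: control median PFS 0.5 years, target HR 0.67, 90% power
for alpha in (0.025, 0.10, 0.20):
    print(alpha, total_patients(0.67, alpha, 0.90, 0.5))
```

Relaxing the one-sided significance threshold from 0.025 to 0.10 or 0.20 is what allows the phase 2.5 trial to be much smaller than a conventional phase III trial.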

Acknowledgements • NCI Biometric Research Branch • Boris Freidlin • Yingdong Zhao • Alain Dupuy • Wenyu Jiang • Aboubakar Maitournam • Jyothi Subramanian • Soonmyung Paik, NSABP • Daniel Hayes, U. Michigan

Questions?
