
Development of Prognostic and Predictive Biomarkers and their Use in Clinical Trial Design



Presentation Transcript


  1. Development of Prognostic and Predictive Biomarkers and their Use in Clinical Trial Design Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov

  2. BRB Website: http://brb.nci.nih.gov • Powerpoint presentations and audio files • Reprints & Technical Reports • Sample Size Planning for Targeted Clinical Trials • BRB-ArrayTools software • BRB-ArrayTools Data Archive

  3. Many cancer treatments benefit only a small proportion of the patients to whom they are administered • Advantages of targeting treatment to the right patients • Treated patients benefit • Greater chance of success in clinical development • Societal benefit through lower health care costs

  4. Biomarkers • Prognostic classifiers • Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment • Predictive classifiers • Measured before treatment to select good patient candidates for a particular treatment

  5. Biomarkers • Pharmacodynamic endpoint • Measured before and after treatment to monitor effect of treatment • Surrogate endpoints • Measured before and after treatment to determine whether the treatment is providing clinical benefit

  6. Surrogate Endpoints • It is extremely difficult to properly validate a biomarker as a surrogate for clinical outcome. • It requires a series of randomized trials with both the candidate biomarker and clinical outcome measured • Pharmacodynamic endpoints need not be validated surrogates of clinical benefit for use in phase I or phase II studies

  7. Most prognostic factors are not used because they are not therapeutically relevant • Most prognostic factor studies use a convenience sample of patients for whom tissue is available. Often the patients are too heterogeneous to support therapeutically relevant conclusions • Prognostic factors in a focused population can be therapeutically useful • Oncotype DX

  8. Validation = Fit for Purpose • The FDA terms “valid biomarker” and “probable valid biomarker” are not appropriate for prognostic or predictive classifiers • “Validation” should mean fitness for the intended purpose, and the purposes of prognostic/predictive classifiers are completely different from those of surrogate endpoints

  9. Types of Validation for Prognostic and Predictive Biomarkers • Analytical validation • When there is a gold standard • Sensitivity, specificity, PPV, NPV, ROC • Pre-analytical and post-analytical robustness • Clinical validation • Does the biomarker predict what it’s supposed to predict on independent data • Clinical utility • Does use of the biomarker result in patient benefit
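Where a gold standard exists, the accuracy metrics above come directly from a 2x2 comparison of assay calls against the gold standard. A minimal Python sketch with hypothetical counts (illustrative only, not from the talk):

    # Hypothetical 2x2 counts: assay call vs. gold standard (illustrative only)
    tp, fp, fn, tn = 90, 10, 15, 185

    sensitivity = tp / (tp + fn)   # fraction of true positives the assay detects
    specificity = tn / (tn + fp)   # fraction of true negatives the assay detects
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value

    print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
          f"PPV={ppv:.2f} NPV={npv:.2f}")

PPV and NPV depend on the prevalence of positives in the sampled population; an ROC curve is obtained by recomputing sensitivity and specificity as the assay's decision threshold is varied.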

  10. Clinical Utility • Benefits the patient by improving treatment decisions • Depends on the context of use of the biomarker • Treatment options and practice guidelines • Other prognostic factors • With a new drug or for guiding use of standard treatments

  11. The Roadmap • Develop a completely specified genomic classifier of the patients likely to benefit from a new drug • Pre-clinical, phase I, II studies • Establish analytical validity of the classifier • Use the completely specified classifier to design and analyze a new clinical trial to evaluate effectiveness of the new treatment with a pre-defined analysis plan.

  12. Guiding Principle • The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier • Developmental studies are exploratory • Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

  13. Development of Genomic Classifiers • Single gene or protein based on knowledge of the therapeutic target • Empirically determined based on evaluation of a set of candidate classifiers • e.g. EGFR assays • Empirically determined by genome-wide correlation of gene expression or genotype with patient outcome after treatment

  14. New Drug Developmental Strategy (I) • Develop a predictive classifier that identifies the patients likely to benefit from the new drug • Technical validation of the test • Use the test to restrict eligibility to a randomized evaluation of the new drug

  15. Develop Predictor of Response to New Drug • Using phase II data, develop a predictor of response to the new drug • [Schematic] Patients predicted responsive are randomized between the new drug and control; patients predicted non-responsive go off study

  16. Applicability of Design I • Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug • e.g. Herceptin • Without a strong biological rationale or adequate phase II data on unselected patients, the FDA may have difficulty approving the test

  17. Evaluating the Efficiency of Strategy (I) • Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004. • Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005. • Reprints and interactive sample size calculations at http://linus.nci.nih.gov/brb

  18. Two Clinical Trial Designs • Un-targeted design • Randomized comparison of T to C without screening for expression of molecular target • Targeted design • Assay patients for expression of target • Randomize only patients expressing target

  19. Efficiency relative to a trial of unselected patients depends on the proportion of patients who test positive and on the effectiveness of the drug (compared to control) for test negative patients • When fewer than half of patients test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients

  20. [Figure] Ratio of required sample sizes, untargeted vs. targeted design (nstd / ntargeted), when there is no treatment benefit for test negative patients

  21. [Figure] Ratio of required sample sizes, untargeted vs. targeted design (nstd / ntargeted), when the treatment benefit for test negative patients is half that for test positive patients

  22. Trastuzumab • Metastatic breast cancer • 234 randomized patients per arm • 90% power for a 13.5% improvement in 1-year survival over a 67% baseline at the 2-sided .05 level • If the benefit were limited to the 25% of patients who are assay +, the overall improvement in survival would have been 3.375% • 4025 patients/arm would have been required • If assay – patients benefited half as much, 627 patients per arm would have been required
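The sample sizes quoted above can be roughly reproduced with a standard normal-approximation formula for comparing two proportions (1-year survival). A minimal sketch; the slide's exact figures come from a slightly different calculation, so treat the output as approximate:

    from scipy.stats import norm

    def n_per_arm(p_control, p_treat, alpha=0.05, power=0.90):
        """Approximate patients per arm for a two-sided two-proportion comparison."""
        z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
        p_bar = (p_control + p_treat) / 2
        num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5 +
               z_b * (p_control * (1 - p_control) + p_treat * (1 - p_treat)) ** 0.5)
        return num ** 2 / (p_treat - p_control) ** 2

    base, delta, prev = 0.67, 0.135, 0.25   # baseline survival, effect in assay+, fraction assay+

    print(n_per_arm(base, base + delta))                  # targeted design: roughly 220-235 per arm
    print(n_per_arm(base, base + prev * delta))           # untargeted, no benefit in assay-: ~4000 per arm
    print(n_per_arm(base, base + prev * delta + (1 - prev) * delta / 2))  # half benefit in assay-: ~600 per arm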

  23. Web Based Software for Comparing Sample Size Requirements • http://brb.nci.nih.gov

  24. Developmental Strategy (II) • Develop a predictor of response to the new Rx • [Schematic] All patients are randomized between the new Rx and control; randomization is stratified by predicted responsive vs. predicted non-responsive to the new Rx

  25. Developmental Strategy (II) • Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan • Having a prospective analysis plan is essential; “stratifying” (balancing) the randomization ensures that all randomized patients will have tissue available • The purpose of the study is to evaluate the new treatment overall and for the pre-defined subsets; not to modify or refine the classifier • The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier

  26. Analysis Plan A • Compare the new drug to the control for classifier positive patients • If p+ > 0.05, make no claim of effectiveness • If p+ ≤ 0.05, claim effectiveness for the classifier positive patients and • Compare new drug to control for classifier negative patients using a 0.05 threshold of significance

  27. Sample size for Analysis Plan A • 88 events in test + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power • If 25% of patients are positive, then when there are 88 events in positive patients there will be about 264 events in test negative patients • 264 events provides 90% power for detecting 33% reduction in hazard at 5% two-sided significance level • Sequential futility monitoring may enable early cessation of accrual of test negative patients
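The event counts for Analysis Plan A follow from the standard approximation D ≈ 4(zα/2 + zβ)² / (ln HR)² for a logrank comparison with 1:1 randomization. A minimal sketch:

    from math import ceil, log
    from scipy.stats import norm

    def events_required(hazard_ratio, alpha=0.05, power=0.90):
        """Approximate events for a two-sided logrank test, 1:1 randomization."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return ceil(4 * z ** 2 / log(hazard_ratio) ** 2)

    print(events_required(0.50))   # ~88 events for a 50% hazard reduction in test-positive patients
    print(events_required(0.67))   # ~263 events for a 33% hazard reduction (the slide rounds to 264)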

  28. Study-wise false positivity rate is limited to 5% with analysis plan A • It is not necessary or appropriate to require that the treatment vs control difference be significant overall before doing the analysis within subsets

  29. Analysis Plan B • Compare the new drug to the control overall for all patients, ignoring the classifier • If poverall ≤ 0.03, claim effectiveness for the eligible population as a whole • Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients • If psubset ≤ 0.02, claim effectiveness for the classifier + patients

  30. This analysis strategy is designed to not penalize sponsors for having developed a classifier • It provides sponsors with an incentive to develop genomic classifiers

  31. Sample size for Analysis Plan B • To have 90% power for detecting a uniform 33% reduction in overall hazard at the 3% two-sided level requires 297 events (instead of 263 for similar power at the 5% level) • If 25% of patients are test positive, then when there are 297 total events there will be approximately 75 events in positive patients • 75 events provides 75% power for detecting a 50% reduction in hazard at the 2% two-sided significance level • By delaying evaluation in test positive patients, 80% power is achieved with 84 events and 90% power with 109 events
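The Plan B figures follow from the same logrank approximation, solved in both directions: events needed for the overall test at the 0.03 level, and power from a fixed number of events in the subset at the 0.02 level. A minimal sketch:

    from math import log
    from scipy.stats import norm

    def events_required(hazard_ratio, alpha, power):
        """Approximate events for a two-sided logrank test, 1:1 randomization."""
        z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
        return 4 * z ** 2 / log(hazard_ratio) ** 2

    def power_from_events(events, hazard_ratio, alpha):
        """Approximate power of a two-sided logrank test given the number of events."""
        z_effect = abs(log(hazard_ratio)) * (events / 4) ** 0.5
        return norm.cdf(z_effect - norm.ppf(1 - alpha / 2))

    print(events_required(0.67, alpha=0.03, power=0.90))   # ~297 events for the overall test
    print(power_from_events(75, 0.50, alpha=0.02))         # ~75% power with 75 events in test-positive patients
    print(events_required(0.50, alpha=0.02, power=0.90))   # ~109 events for 90% power at the 0.02 level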

  32. Analysis Plan C • Test for interaction between the treatment effect in test positive patients and the treatment effect in test negative patients • If the interaction is significant at level αint, then compare treatments separately for test positive patients and test negative patients • Otherwise, compare treatments overall

  33. Sample Size Planning for Analysis Plan C • 88 events in test + patients needed to detect 50% reduction in hazard at 5% two-sided significance level with 90% power • If 25% of patients are positive, then when there are 88 events in positive patients there will be about 264 events in negative patients • 264 events provides 90% power for detecting 33% reduction in hazard at 5% two-sided significance level

  34. Simulation Results for Analysis Plan C • Using αint = 0.10, the interaction test has power 93.7% when there is a 50% reduction in hazard in test positive patients and no treatment effect in test negative patients • A significant interaction and significant treatment effect in test positive patients is obtained in 88% of cases under the above conditions • If the treatment reduces hazard by 33% uniformly, the interaction test is negative and the overall test is significant in 87% of cases
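The simulation figures above can be roughly reproduced by modeling the estimated log hazard ratio in each subset as normal with variance 4/events and applying the Plan C decision rule. The sketch below assumes a one-sided interaction test at αint = 0.10, which matches the quoted 93.7%; the original simulation may differ in its details:

    import numpy as np
    from math import log
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    d_pos, d_neg = 88, 264          # events in test-positive / test-negative patients
    alpha_int, alpha = 0.10, 0.05   # interaction level (assumed one-sided), treatment level

    def simulate_plan_c(hr_pos, hr_neg, n_sim=200_000):
        """Approximate Analysis Plan C with normal estimates of the log hazard ratios."""
        se_pos, se_neg = (4 / d_pos) ** 0.5, (4 / d_neg) ** 0.5
        b_pos = rng.normal(log(hr_pos), se_pos, n_sim)   # estimated log HR, test +
        b_neg = rng.normal(log(hr_neg), se_neg, n_sim)   # estimated log HR, test -
        # interaction: larger benefit (more negative log HR) in test-positive patients
        z_int = (b_pos - b_neg) / (se_pos ** 2 + se_neg ** 2) ** 0.5
        interaction = z_int < -norm.ppf(1 - alpha_int)
        crit = norm.ppf(1 - alpha / 2)
        z_pos = b_pos / se_pos
        w_pos, w_neg = 1 / se_pos ** 2, 1 / se_neg ** 2  # inverse-variance weights
        z_all = (w_pos * b_pos + w_neg * b_neg) / (w_pos + w_neg) ** 0.5
        subset_claim = interaction & (z_pos < -crit)
        overall_claim = ~interaction & (z_all < -crit)
        return interaction.mean(), subset_claim.mean(), overall_claim.mean()

    print(simulate_plan_c(0.50, 1.00))  # ~0.94 interaction power, ~0.88 subset claim
    print(simulate_plan_c(0.67, 0.67))  # overall claim in ~0.87 of simulations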

  35. Development of Genomic Classifiers • During phase II development, or • After a failed phase III trial, using archived specimens, or • Adaptively, during the early portion of a phase III trial

  36. Adaptive Signature Design An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients Boris Freidlin and Richard Simon Clinical Cancer Research 11:7872-8, 2005

  37. Adaptive Signature Design: End-of-Trial Analysis • Compare E to C for all patients at significance level 0.03 • If the overall H0 is rejected, then claim effectiveness of E for eligible patients • Otherwise

  38. Otherwise: • Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C • Compare E to C for patients accrued in second stage who are predicted responsive to E based on classifier • Perform test at significance level 0.02 • If H0 is rejected, claim effectiveness of E for subset defined by classifier
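A sketch of this end-of-trial analysis for a binary response endpoint. The published design develops the signature from gene-wise treatment-by-expression interaction models with a tuned voting threshold; the median-split voting classifier, the choice of 10 genes, and the two-sample z statistic below are simplifying assumptions for illustration only:

    import numpy as np
    from scipy.stats import norm

    def z_stat(resp, arm):
        """Two-sample z statistic for response rates: new treatment (arm==1) vs control (arm==0)."""
        p1, p0 = resp[arm == 1].mean(), resp[arm == 0].mean()
        n1, n0 = (arm == 1).sum(), (arm == 0).sum()
        p = resp.mean()
        return (p1 - p0) / (p * (1 - p) * (1 / n1 + 1 / n0)) ** 0.5

    def adaptive_signature(expr, resp, arm, n_genes=10):
        """Simplified adaptive signature analysis.
        expr: patients x genes (in accrual order), resp: 0/1 response, arm: 0=C, 1=E."""
        # Step 1: overall E vs C comparison at the two-sided 0.03 level
        if z_stat(resp, arm) > norm.ppf(1 - 0.03 / 2):
            return "effective for eligible patients overall"
        # Step 2: on the first half of accrued patients, score each gene by how much
        # the treatment effect differs between high and low expressers (median split)
        half = len(resp) // 2
        tr, te = slice(0, half), slice(half, len(resp))
        med = np.median(expr[tr], axis=0)
        scores = []
        for g in range(expr.shape[1]):
            hi = expr[tr, g] > med[g]
            scores.append(z_stat(resp[tr][hi], arm[tr][hi]) -
                          z_stat(resp[tr][~hi], arm[tr][~hi]))
        top = np.argsort(np.abs(scores))[-n_genes:]
        # Step 3: classify second-half patients as sensitive by majority vote of the top genes
        votes = np.zeros(len(resp) - half)
        for g in top:
            side = expr[te, g] > med[g]
            votes += side if scores[g] > 0 else ~side
        sensitive = votes > n_genes / 2
        # Step 4: E vs C among predicted-sensitive second-half patients at the 0.02 level
        if z_stat(resp[te][sensitive], arm[te][sensitive]) > norm.ppf(1 - 0.02 / 2):
            return "effective for the classifier-positive subset"
        return "no claim of effectiveness"

The 0.03 + 0.02 split keeps the study-wise false positive rate at 0.05, mirroring Analysis Plan B.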

  39. Biomarker Adaptive Threshold Design Wenyu Jiang, Boris Freidlin & Richard Simon JNCI 99:1036-43, 2007

  40. Biomarker Adaptive Threshold Design • Randomized phase III trial comparing new treatment E to control C • Survival or DFS endpoint

  41. Biomarker Adaptive Threshold Design • A predictive index B has been identified that is thought to identify the patients likely to benefit from E relative to C • Eligibility is not restricted by the biomarker • No threshold for the biomarker is determined in advance
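The transcript stops before the analysis details; in the cited paper the treatment effect is evaluated at a series of candidate cutpoints of B and the maximized statistic is referred to a permutation null distribution (one published variant instead splits the significance level between an overall test and the threshold search). A simplified sketch of that idea, substituting a binary response and a response-rate z statistic for the survival endpoint actually used:

    import numpy as np

    def rate_z(resp, arm):
        """z statistic for the response-rate difference, E (arm==1) minus C (arm==0)."""
        p1, p0 = resp[arm == 1].mean(), resp[arm == 0].mean()
        p, n1, n0 = resp.mean(), (arm == 1).sum(), (arm == 0).sum()
        return (p1 - p0) / (p * (1 - p) * (1 / n1 + 1 / n0)) ** 0.5

    def adaptive_threshold_test(biomarker, resp, arm, n_perm=2000, seed=0):
        """Maximize the treatment-effect statistic over candidate cutpoints of the
        biomarker (including 'no cutpoint') and assess it by permuting treatment labels."""
        rng = np.random.default_rng(seed)
        cuts = np.quantile(biomarker, np.linspace(0.0, 0.75, 7))  # candidate thresholds
        def max_stat(labels):
            return max(rate_z(resp[biomarker >= c], labels[biomarker >= c]) for c in cuts)
        observed = max_stat(arm)
        perm = [max_stat(rng.permutation(arm)) for _ in range(n_perm)]
        p_value = (1 + sum(s >= observed for s in perm)) / (n_perm + 1)
        return observed, p_value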

  42. Good Microarray Studies Have Clear Objectives • Class Comparison (gene finding) • Find genes whose expression differs among predetermined classes • Class Prediction • Prediction of response using gene expression profile • Class Discovery • Discover clusters of specimens having similar expression profiles • Discover clusters of genes having similar expression profiles

  43. Class Comparison and Class Prediction • Not clustering problems • Supervised methods

  44. Class prediction methods usually have gene selection as a component • The criteria for gene selection for class prediction and for class comparison are different • For class comparison false discovery rate is important • For class prediction, predictive accuracy is important

  45. Gene Selection • Genes which are univariately most correlated with outcome • Small subset of genes which together give most accurate predictions • Little evidence that complex feature selection methods are useful in microarray problems
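A minimal sketch of the first approach listed above: ranking genes by a univariate two-sample t statistic and keeping the top k. For class prediction, this selection step must be repeated inside every resampling loop (for example, within each cross-validation fold) rather than performed once on the full data set, or the estimated predictive accuracy will be optimistically biased:

    import numpy as np
    from scipy.stats import ttest_ind

    def select_top_genes(expr, labels, k=50):
        """Rank genes by a univariate two-sample t-test and return the indices of the
        k smallest p-values. expr: samples x genes matrix, labels: binary class labels."""
        _, p_values = ttest_ind(expr[labels == 0], expr[labels == 1], axis=0)
        return np.argsort(p_values)[:k]

    # Hypothetical usage on simulated data (dimensions chosen for illustration only)
    rng = np.random.default_rng(1)
    expr = rng.normal(size=(60, 5000))        # 60 specimens, 5000 genes
    labels = rng.integers(0, 2, size=60)
    print(select_top_genes(expr, labels, k=20))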
