On the Road to Genomic Predictive Medicine: An Interim Analysis

Richard Simon, Chief, Biometric Research Branch, National Cancer Institute

How I got involved in genomics • In the late 1990s, genomic data was for me the most exciting scientific data of our generation • Analysis of that data shouldn’t be left to amateurs • We had a great cadre of statisticians involved in clinical trials and we know how to do reliable clinical trials, but the drugs are often disappointing • Statisticians should be involved in basic research, pre-clinical target discovery and policy

Biomedical leaders were looking to computer scientists and physicists for help, not to statisticians • Statisticians were viewed as useful for testing hypotheses and computing p values, not for discovery

Many statisticians tend to see themselves as methods developers, not as scientists focused on a subject-matter area

Imatinib chronology • 1960 – Philadelphia chromosome described (P Nowell) • 1973 – Ph characterized as translocation of ABL on chromosome 9 with BCR on chromosome 22 (J Rowley) • 1986 – BCR-ABL fusion gene characterized as constitutively activated kinase (D Baltimore)

Imatinib chronology • 1988–1995 – CIBA-GEIGY develops kinase inhibitors (A Matter, N Lydon, J Zimmermann, E Buchdunger) • 1996 – B Druker (Dana Farber -> Oregon) screens in ex-vivo tumors and normal lymphocytes against compounds provided by Novartis and convinces the company to sponsor clinical trials in CML in spite of only 5000 cases/yr in the US

Success depended on collaboration between industry and academia • Delayed development resulted from reluctance of field to accept hypothesis that kinases can be selectively inhibited or that inhibiting a single gene could be very effective • Industry involvement dependent on vision of a small leadership group in one company • Clinical translation dependent on vision of one oncologist

Success depends on serendipity • Academic medicine (NIH) is a bottom-up system, not optimized for risk taking, for exploiting scientific leads to translate basic research into clinical products, or for mounting large cooperative programs to overcome bottlenecks in translation • Academic medicine is very dependent on industry, but industry has its own constraints

Predictive Medicine • Germline genetics • GWAS • 23andMe • Tumor genomics • The Cancer Genome Atlas

Ioannidis et al. JNCI 102:846 (2010) • 56 GWAS • 92 statistically significant associations between cancer phenotype and genetic variant • Median OR = 1.22 • IQR OR = 1.15 – 1.36
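To put odds ratios of this size in absolute terms, here is a minimal Python sketch that converts an odds ratio into a carrier's risk. The function name `risk_from_odds_ratio` and the 5% baseline risk are illustrative assumptions, not values from Ioannidis et al.

```python
def risk_from_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Absolute risk for variant carriers, given non-carrier risk and the odds ratio."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    # Multiply the baseline odds by the odds ratio, then convert odds back to risk.
    carrier_odds = odds_ratio * baseline_risk / (1.0 - baseline_risk)
    return carrier_odds / (1.0 + carrier_odds)


# Hypothetical 5% baseline risk; 1.15, 1.22 and 1.36 are the quartiles and median above.
for odds_ratio in (1.15, 1.22, 1.36):
    print(odds_ratio, round(risk_from_odds_ratio(0.05, odds_ratio), 4))
```

Under that assumed baseline, a variant at the median odds ratio moves a carrier's risk from 5% to about 6%, which illustrates why single germline variants of this size carry little predictive information on their own.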

Cancers of a given histologic diagnosis are genomically heterogeneous • Cancers are mostly caused by somatic mutations not genetic polymorphisms • Most of the information about the disease is in the tumor genome, not the germ-line genome

Biomarkers for Early Detection • Because of the long time between first mutation and clinical diagnosis of human solid tumors, there would seem to be great opportunity for early detection

Phase II trials of early detection have used samples from patients at diagnosis • Effective detection must have long lead time and high specificity for tumors which will evolve to be life threatening

Biomarkers for Informing Treatment Selection • Prognostic biomarkers • Measured before treatment to indicate long-term outcome for patients untreated or receiving standard treatment • To identify which patients have excellent prognosis on conservative treatment • Predictive biomarkers • Measured before treatment to identify who is likely or unlikely to benefit from a particular treatment

Prognostic Markers • Vast literature on prognostic markers • Very few used in practice • Most studies motivated by desire to learn about disease biology • Broad selection of cases • Little focus on intended use • Little focus on analytical validation of assay

Validation of Biomarkers • Analytical validity • Measures what it is supposed to • Reproducible • Clinical validity • Correlates with a clinical outcome • Clinical utility • Is actionable • Measuring marker leads to action that benefits patient • Requires clarity on intended use

“If you don’t know where you are going, you might not get there” (Yogi Berra)

Prognostic Markers • OncotypeDx: Which patients with node-negative, ER-positive breast cancer who are receiving tamoxifen will have such good prognosis that they do not need cytotoxic chemotherapy? • Analysis focused on whether marker identifies such a subset, not on statistical significance

B-14 Results: Relapse-Free Survival • [Figure: relapse-free survival curves for three groups of 338, 149 and 181 pts; p<0.0001] • Paik et al, SABCS 2003

Major problems with prognostic studies of gene expression signatures • Inadequate focus on intended use • Cases selected based on availability of specimens rather than for relevance to intended use • Heterogeneous samples of patients with mixed stages and treatments, with attempts to disentangle effects using regression modeling • Overemphasis on statistical significance and hazard ratios • Over-fitting data

For p>n problems (more candidate predictors p than cases n) • The fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data

Validation of Prognostic Model • Completely independent validation dataset • Splitting dataset into training and testing sets • Cross-validation

Partition data set D into K equal parts D1, D2, ..., DK • First training set T1 = D - D1 • Develop completely specified prognostic model M1 using only data T1 • Compute prognostic score for cases in D1 • Develop model M2 using only T2 = D - D2 and then score cases in D2

Repeat for ... TK -> MK -> DK • Group patients into risk groups (e.g. 2 or more) based on their cross-validated scores • Calculate Kaplan-Meier survival curve for each risk-group
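The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the case layout (dicts with `x`, `time`, `event`), the median split into two risk groups, and the stand-in `toy_develop_model` are all hypothetical and not taken from the slides; any completely specified model-development procedure could be passed in its place.

```python
import random


def k_fold_parts(n_cases, k, seed=0):
    """Randomly partition case indices 0..n_cases-1 into K nearly equal parts."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    return [indices[j::k] for j in range(k)]


def cross_validated_scores(cases, k, develop_model, seed=0):
    """Score every case with a model developed without that case."""
    scores = [None] * len(cases)
    for part in k_fold_parts(len(cases), k, seed):
        held_out = set(part)
        training = [c for i, c in enumerate(cases) if i not in held_out]  # T_k = D - D_k
        model = develop_model(training)  # M_k is developed from T_k only
        for i in part:
            scores[i] = model(cases[i])  # score the cases in D_k
    return scores


def risk_groups(scores):
    """Two risk groups, split at the median cross-validated score."""
    cut = sorted(scores)[len(scores) // 2]
    return ["high" if s >= cut else "low" for s in scores]


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event truthy, censored falsy)."""
    at_risk, survival, curve = len(times), 1.0, []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def toy_develop_model(training):
    """Hypothetical model: weight each covariate by its mean difference
    between training cases with and without an event."""
    n_cov = len(training[0]["x"])
    with_event = [c["x"] for c in training if c["event"]]
    without = [c["x"] for c in training if not c["event"]]

    def col_mean(rows, j):
        return sum(r[j] for r in rows) / max(len(rows), 1)

    weights = [col_mean(with_event, j) - col_mean(without, j) for j in range(n_cov)]
    return lambda case: sum(w * x for w, x in zip(weights, case["x"]))


# Synthetic cases, purely for illustration.
rng = random.Random(1)
cases = [{"x": [rng.gauss(0, 1) for _ in range(5)],
          "time": rng.expovariate(1.0),
          "event": rng.random() < 0.7} for _ in range(60)]
scores = cross_validated_scores(cases, k=5, develop_model=toy_develop_model)
groups = risk_groups(scores)
curves = {g: kaplan_meier([c["time"] for c, gi in zip(cases, groups) if gi == g],
                          [c["event"] for c, gi in zip(cases, groups) if gi == g])
          for g in ("low", "high")}
```

The point of the design is that `develop_model` only ever receives the training part, so each case's score comes from a model that never saw that case.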

Complete Cross-Validation • Cross-validation simulates the process of separately developing a model on one set of data and predicting for a test set of data not used in developing the model • All aspects of the model development process must be repeated for each loop of the cross-validation, including: • Feature selection • Tuning parameter optimization

Prediction on Simulated Null Data (Simon et al. J Nat Cancer Inst 95:14, 2003) • Generation of Gene Expression Profiles • 20 specimens (P_i is the expression profile for specimen i) • Log-ratio measurements on 6000 genes • P_i ~ MVN(0, I_6000) • Can we distinguish between the first 10 specimens (Class 1) and the last 10 (Class 2)? • Prediction Method • Compound covariate predictor built from the log-ratios of the 10 most differentially expressed genes.
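A minimal pure-Python sketch of this null experiment, for one simulated dataset, is given below. Several details are assumptions made for illustration rather than taken from the slide: leave-one-out is used as the cross-validation scheme, genes are ranked by a pooled-variance t-statistic, the compound covariate is the t-weighted sum of the selected genes with a midpoint threshold, and all function names are hypothetical. It counts misclassifications three ways: on the data used to build the predictor, with gene selection done once outside the cross-validation loop, and with gene selection repeated inside every loop.

```python
import random


def mean(values):
    return sum(values) / len(values)


def t_statistic(a, b):
    """Two-sample t-statistic with pooled variance."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled = ss / (len(a) + len(b) - 2)
    return (ma - mb) / (pooled * (1 / len(a) + 1 / len(b))) ** 0.5


def gene_t(X, y, idx, gene):
    """t-statistic for one gene, Class 1 vs Class 2, using specimens idx only."""
    a = [X[i][gene] for i in idx if y[i] == 1]
    b = [X[i][gene] for i in idx if y[i] == 2]
    return t_statistic(a, b)


def select_genes(X, y, idx, n_top):
    """The n_top genes with the largest |t| among specimens idx."""
    genes = range(len(X[0]))
    return sorted(genes, key=lambda g: abs(gene_t(X, y, idx, g)), reverse=True)[:n_top]


def fit_predictor(X, y, idx, genes):
    """Compound covariate predictor: t-weighted sum of the selected genes,
    classified by the midpoint between the two class means of that sum."""
    weights = [(g, gene_t(X, y, idx, g)) for g in genes]

    def covariate(i):
        return sum(w * X[i][g] for g, w in weights)

    m1 = mean([covariate(i) for i in idx if y[i] == 1])
    m2 = mean([covariate(i) for i in idx if y[i] == 2])
    midpoint = (m1 + m2) / 2
    return lambda i: 1 if (covariate(i) > midpoint) == (m1 > m2) else 2


def null_simulation(n_specimens=20, n_genes=6000, n_top=10, seed=0):
    """Misclassification counts on pure-noise data under three evaluation schemes."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_specimens)]
    y = [1] * (n_specimens // 2) + [2] * (n_specimens - n_specimens // 2)
    everyone = list(range(n_specimens))

    genes_all = select_genes(X, y, everyone, n_top)
    predict = fit_predictor(X, y, everyone, genes_all)
    errors = {"resubstitution": sum(predict(i) != y[i] for i in everyone),
              "selection_outside_cv": 0, "complete_cv": 0}
    for left_out in everyone:
        training = [i for i in everyone if i != left_out]
        # Incomplete: the genes were chosen using the left-out specimen too.
        partial = fit_predictor(X, y, training, genes_all)
        errors["selection_outside_cv"] += partial(left_out) != y[left_out]
        # Complete: gene selection is repeated inside the loop.
        full = fit_predictor(X, y, training, select_genes(X, y, training, n_top))
        errors["complete_cv"] += full(left_out) != y[left_out]
    return errors
```

Calling `null_simulation()` with the defaults runs the setup above once. Because the two classes are drawn from the same distribution, an honest error estimate should average about 50% over repeated simulations; comparing the three counts shows how much optimism each shortcut introduces.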

Cross-Validation • The cross-validated estimate of misclassification error is an estimate of the prediction error for the model obtained by applying the specified algorithm to the full dataset

Statistical significance of the difference in survival among risk groups is usually not the point • But to evaluate significance, the log-rank test cannot be used for cross-validated Kaplan-Meier curves because the survival times are not independent
