Statistical Evaluation of Surrogate Markers: Validity, Efficiency and Sensitivity

University of Pennsylvania Annual Conference on Statistical Issues in Clinical Trials
Yongming Qu, PhD
Eli Lilly and Company, Indianapolis, Indiana
April 18, 2012
This talk is based on previous and ongoing research in collaboration with Michael Case, Somnath Sarkar, Wen Li, and Pandurang M. Kulkarni.

Outline
• Introduction
  • Biomarker, surrogate marker and surrogate endpoint
  • Validity and efficiency of surrogate markers
• Quantities used in statistical validation
  • Proportion of Treatment Effect (PTE)
  • General association
  • Likelihood reduction factor (LRF)
  • Proportion of Information Gain (PIG)
• Effect of measurement error and adjustment for it
• Summary

Biomarker and Surrogate Endpoint (SE)
• Biomarker: “a characteristic that is objectively measured and evaluated as an indicator of normal biologic processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention” (Clin Pharmacol Ther 2001;69:89–95)
• Surrogate endpoint: “a laboratory measurement or a physical sign used as a substitute for a clinically meaningful endpoint that measures directly how a patient feels, functions or survives. Changes induced by a therapy on a surrogate endpoint are expected to reflect changes in a clinically meaningful endpoint” (Temple 1995)

Validation of Surrogate Endpoint (SE)
• A surrogate endpoint is intended to replace the clinical outcome for any therapy
• A surrogate endpoint is therefore independent of any particular therapy
• The traditional way of validating a surrogate endpoint using a treatment is not feasible:
  • The surrogate endpoint needs to be validated
  • To evaluate it, large confirmatory clinical trials must be conducted with both surrogate and clinical endpoints
  • If such trials are conducted, the drug's efficacy will already have been established
  • There is then no need for a surrogate endpoint for this drug
  • The conclusion for this drug cannot be extrapolated to other drugs, because different drugs may work through different pathways

Validation of SE – New Thinking
• Validation of an SE should be based on the disease mechanism, not on the effect of a treatment
• Hemoglobin A1c (HbA1c) is a widely used SE for average blood glucose
  • Its validation is not based on clinical studies of different treatments
  • It is based on biochemistry and physiology
• Progression-free survival (PFS) is widely used as an SE for survival in cancer
  • Its validation should be based on the biology of the disease and the tumor, not on individual drugs

Surrogate Marker (SM)
• A surrogate marker for a drug is a marker that can be used to predict the drug's efficacy or safety
• Example of the usefulness of a surrogate marker:
  • Suppose bone mineral density (BMD) is a surrogate marker for osteoporotic fracture
  • The long-term effect of an osteoporosis drug on fracture is difficult and too costly to determine
  • A woman takes an osteoporosis drug
  • Clinicians measure her BMD after 6 months of use
  • If her BMD has increased, the drug works for her and she should continue to use it
  • If her BMD has not increased, the drug does not work for her, and she should switch to a different drug
• An SM is very useful for monitoring patients and for identifying, early in the disease, which drug works best for a patient
SM can NOT be used to replace the clinical outcome for drug approval!

SE, SM and Biomarker
[Figure: diagram relating SE, SM and biomarker]
Question: Is a particular biomarker an SM?

SE Validation (Prentice)
• Prentice (Stat Med 1989;8:431–440) proposed a necessary and sufficient condition for a surrogate endpoint: $f(S \mid Z) = f(S) \iff f(T \mid Z) = f(T)$
• This definition is too stringent: it essentially requires the surrogate endpoint to be “equivalent” to the clinical outcome
• Prentice's key operational criterion, $f(T \mid S, Z) = f(T \mid S)$, does not guarantee this condition
• The condition can be weakened: a marker is said to be an SE if $f(S \mid Z) = f(S) \quad f(T \mid Z) = f(T)$ for any $Z$
• In practice, this condition cannot be validated through clinical trials testing drug effects
  • One can NOT prove a mathematical theorem by enumeration!
  • One can, however, invalidate an SE if the above relationship fails for even one treatment $Z$

Surrogate Marker - Concepts
• Validity: a marker $S$ is said to be a valid surrogate marker for a clinical outcome $T$ for a particular treatment if $f(T \mid Z) \neq f(T)$ and $f(T \mid S, Z) = f(T \mid S)$, where $Z$ is the treatment indicator ($Z = 1$ for treatment, $Z = 0$ for control)
• Efficiency: for two surrogate markers $S_1$ and $S_2$, $S_1$ is said to be more efficient than $S_2$ if $\mathrm{Var}[T \mid Z, S_1] < \mathrm{Var}[T \mid Z, S_2]$
• In practice, validity is a much higher hurdle than efficiency
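To make the efficiency definition concrete, here is a minimal sketch (Python, standard library only) that estimates Var[T | Z, S] as the residual variance of an ordinary least-squares fit of T on Z and S, then compares two candidate markers. It assumes a linear Gaussian model; the data-generating values and the helper names (`_cov`, `residual_variance`) are hypothetical and chosen only for illustration.

```python
import random

def _cov(x, y):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def residual_variance(t, z, s):
    """Estimate Var[T | Z, S] from the OLS fit T = b0 + bZ*Z + bS*S + error.

    Assumes a linear model; returns the residual sum of squares divided by n - 3.
    """
    szz, sss, szs = _cov(z, z), _cov(s, s), _cov(z, s)
    stz, sts = _cov(t, z), _cov(t, s)
    det = szz * sss - szs ** 2
    b_z = (stz * sss - sts * szs) / det
    b_s = (sts * szz - stz * szs) / det
    n = len(t)
    b0 = sum(t) / n - b_z * sum(z) / n - b_s * sum(s) / n
    rss = sum((ti - b0 - b_z * zi - b_s * si) ** 2 for ti, zi, si in zip(t, z, s))
    return rss / (n - 3)

# Hypothetical data: S1 is measured precisely, S2 is S1 plus extra noise.
rng = random.Random(2012)
n = 2000
z = [i % 2 for i in range(n)]                      # 0 = control, 1 = treatment
s1 = [zi + rng.gauss(0.0, 1.0) for zi in z]
s2 = [v + rng.gauss(0.0, 1.0) for v in s1]
t = [1.5 * v + rng.gauss(0.0, 0.5) for v in s1]    # T depends on Z only through S1

v1 = residual_variance(t, z, s1)
v2 = residual_variance(t, z, s2)
print(f"Var[T|Z,S1] ~ {v1:.3f}, Var[T|Z,S2] ~ {v2:.3f}")
print("S1 is more efficient" if v1 < v2 else "S2 is more efficient")
```

Under these assumptions Var[T | Z, S1] should be near the noise variance 0.25, while Var[T | Z, S2] should be near 1.5² × 0.5 + 0.25 = 1.375, so S1 is the more efficient marker.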

Proportion of Treatment Effect
• Consider two models: $T \mid Z = a_0 + a_Z Z$ and $T \mid S, Z = b_0 + b_Z Z + b_S S$
• The PTE (Freedman et al., Stat Med 1992;11:167–178) is $\mathrm{PTE} = 1 - b_Z / a_Z$
• Drawbacks of PTE:
  • It is not bounded within [0, 1]
  • Its large sampling variability often makes the estimate uninformative
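A minimal sketch of how the PTE is computed from the two fits, assuming continuous T and linear models fitted by ordinary least squares (the slide does not fix the model family). The data generator and its parameters (`simulate`, `direct_effect`) are hypothetical. The last part repeats the calculation in many small trials to illustrate the variability drawback.

```python
import random

def _cov(x, y):
    """Sample covariance (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pte(t, z, s):
    """PTE = 1 - bZ/aZ from two OLS fits: T ~ Z and T ~ Z + S."""
    a_z = _cov(t, z) / _cov(z, z)                    # coefficient of Z in T | Z
    szz, sss, szs = _cov(z, z), _cov(s, s), _cov(z, s)
    b_z = (_cov(t, z) * sss - _cov(t, s) * szs) / (szz * sss - szs ** 2)
    return 1.0 - b_z / a_z

def simulate(n, direct_effect, rng):
    """Hypothetical linear model: S = Z + u, T = S + direct_effect*Z + e."""
    z = [i % 2 for i in range(n)]
    s = [zi + rng.gauss(0.0, 1.0) for zi in z]
    t = [si + direct_effect * zi + rng.gauss(0.0, 1.0) for si, zi in zip(s, z)]
    return t, z, s

rng = random.Random(1992)
large_full = pte(*simulate(20000, 0.0, rng))      # whole effect acts through S
large_partial = pte(*simulate(20000, 0.5, rng))   # true PTE = 1 - 0.5/1.5 = 2/3

# Drawback: spread of the PTE estimate across small trials (n = 100 each).
small = [pte(*simulate(100, 0.5, rng)) for _ in range(200)]
mean_small = sum(small) / len(small)
sd_small = (sum((p - mean_small) ** 2 for p in small) / (len(small) - 1)) ** 0.5
print(f"PTE (full mediation)    ~ {large_full:.3f}")
print(f"PTE (partial mediation) ~ {large_partial:.3f}")
print(f"n = 100: mean {mean_small:.3f}, SD {sd_small:.3f}, "
      f"range [{min(small):.2f}, {max(small):.2f}]")
```

With large samples the estimates settle near 1 and 2/3; with 100 patients per trial the spread is wide, which is the practical problem noted on the slide.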

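General Association
• Consider two models: $Y_{S,j} = \mu_S + \alpha Z_j + e_{S,j}$ and $Y_{T,j} = \mu_T + \beta Z_j + e_{T,j}$, where the errors $(e_{S,j}, e_{T,j})$ have variances $\sigma_{SS}$, $\sigma_{TT}$ and covariance $\sigma_{ST}$
• Buyse and Molenberghs (Biometrics 1998;54:1014–1029) suggested using the coefficient of determination $R^2 = \sigma_{ST}^2 / (\sigma_{SS}\,\sigma_{TT})$ to evaluate the surrogate marker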
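A minimal sketch of estimating this coefficient of determination, R² = σ_ST² / (σ_SS σ_TT), as the squared correlation between the marker and outcome residuals after removing each treatment group's mean (the OLS fit for a binary Z). The coefficients in the shared-error case are hypothetical, chosen in the spirit of Artificial Example 1: the errors coincide, yet the S–T relationship differs between groups.

```python
import random

def adjusted_association_r2(y_s, y_t, z):
    """R^2 = s_ST^2 / (s_SS * s_TT) from residuals of Y_S and Y_T given binary Z."""
    def residuals(y):
        means = {}
        for g in set(z):
            group = [yi for yi, zi in zip(y, z) if zi == g]
            means[g] = sum(group) / len(group)
        return [yi - means[zi] for yi, zi in zip(y, z)]

    r_s, r_t = residuals(y_s), residuals(y_t)
    s_st = sum(a * b for a, b in zip(r_s, r_t))
    s_ss = sum(a * a for a in r_s)
    s_tt = sum(b * b for b in r_t)
    return s_st ** 2 / (s_ss * s_tt)

rng = random.Random(1998)
n = 1000
z = [i % 2 for i in range(n)]
e = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Shared errors (e_S = e_T): R^2 = 1, although Y_T - Y_S = 2 - 3Z depends on group.
y_s = [1.0 + 2.0 * zi + ei for zi, ei in zip(z, e)]
y_t = [3.0 - 1.0 * zi + ei for zi, ei in zip(z, e)]
r2_shared = adjusted_association_r2(y_s, y_t, z)

# Independent errors: R^2 close to 0.
y_t_indep = [3.0 - 1.0 * zi + rng.gauss(0.0, 1.0) for zi in z]
r2_indep = adjusted_association_r2(y_s, y_t_indep, z)
print(f"shared errors: R^2 = {r2_shared:.6f}; independent errors: R^2 = {r2_indep:.4f}")
```

The shared-error case shows why a perfect R² alone does not make a good surrogate: it measures residual association, not whether the treatment effect on T acts through S.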

Artificial Example 1
• Let $e_{S,j} = e_{T,j}$; then $R^2 = 1$
• The relationship between clinical outcome and marker depends on treatment group
• $Y_{S,j}$ is not a good surrogate marker!

Artificial Example 2
• Depending on the parameters, $R^2$ can take any value in [0, 1]
• The effect of treatment on the clinical outcome acts solely through the marker $Y_{S,j}$
• $Y_{S,j}$ is a perfect surrogate marker!

Likelihood Reduction Factor (LRF)
• Consider two models: $T \mid Z = a_0 + a_Z Z$ (1) and $T \mid S, Z = b_0 + b_Z Z + b_S S$ (2)
• Alonso et al. (Biometrics 2004;60:724–728) defined the likelihood reduction factor as $\mathrm{LRF} = 1 - \exp\{-\mathrm{LRT}(Z,S{:}Z)/n\}$, where LRT(Z,S:Z) is the likelihood ratio test statistic comparing models (2) and (1), and $n$ is the sample size
• LRF lies in [0, 1], but for some models (for example, with a binary outcome) it cannot reach 1
• An adjusted LRF (LRFa) was therefore proposed
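A minimal sketch of the LRF for Gaussian linear models, assuming the single-trial form LRF = 1 − exp(−LRT(Z,S:Z)/n). With maximum-likelihood variance estimates the Gaussian LRT is n·log(RSS₁/RSS₂), so the LRF reduces to 1 − RSS₂/RSS₁, the partial R² of S given Z. The data generator is hypothetical; for a binary T one would use logistic likelihoods instead, where the ceiling below 1 appears.

```python
import math
import random

def _cov(x, y):
    """Sample covariance (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def rss_t_given_z(t, z):
    """RSS of model (1), T | Z: with a binary Z the OLS fit is the two group means."""
    rss = 0.0
    for g in set(z):
        group = [ti for ti, zi in zip(t, z) if zi == g]
        m = sum(group) / len(group)
        rss += sum((ti - m) ** 2 for ti in group)
    return rss

def rss_t_given_z_s(t, z, s):
    """RSS of model (2), T | S, Z = b0 + bZ*Z + bS*S, from centered cross-products."""
    n = len(t)
    s_zz, s_ss, s_zs = _cov(z, z), _cov(s, s), _cov(z, s)
    s_tz, s_ts = _cov(t, z), _cov(t, s)
    det = s_zz * s_ss - s_zs ** 2
    b_z = (s_tz * s_ss - s_ts * s_zs) / det
    b_s = (s_ts * s_zz - s_tz * s_zs) / det
    return (n - 1) * (_cov(t, t) - b_z * s_tz - b_s * s_ts)

def lrf_gaussian(t, z, s):
    """LRF = 1 - exp(-LRT(Z,S:Z)/n) for Gaussian linear models (ML variances)."""
    n = len(t)
    lrt = n * math.log(rss_t_given_z(t, z) / rss_t_given_z_s(t, z, s))
    return 1.0 - math.exp(-lrt / n)

rng = random.Random(2004)
n = 20000
z = [i % 2 for i in range(n)]
s = [zi + rng.gauss(0.0, 1.0) for zi in z]
t = [si + rng.gauss(0.0, 1.0) for si in s]    # T depends on Z only through S
lrf = lrf_gaussian(t, z, s)
print(f"LRF ~ {lrf:.3f}")                     # Var(T|Z) = 2, Var(T|Z,S) = 1
```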

A Different Approach
• Instead of comparing $T \mid Z = a_0 + a_Z Z$ with $T \mid S, Z = b_0 + b_Z Z + b_S S$ (the basis of LRFa(Z,S:Z), Alonso et al.)
• We compare $T \mid S = g_0 + g_S S$ with $T \mid S, Z = b_0 + b_Z Z + b_S S$ (the basis of the new quantity)

Proportion of Information Gain (PIG)
• Consider three models: $T = c_0$ (1), $T \mid S = g_0 + g_S S$ (2), and $T \mid S, Z = b_0 + b_Z Z + b_S S$ (3)
• Qu and Case (Biometrics 2007;63:958–963) defined the proportion of information gain (PIG) in terms of LRT(Z,S:1), the likelihood ratio test statistic comparing models (3) and (1), and LRT(S:1), the likelihood ratio test statistic comparing models (2) and (1)
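A minimal sketch that computes the two ingredients, LRT(S:1) and LRT(Z,S:1), for Gaussian linear models fitted by OLS. As a purely illustrative summary, it also reports the ratio LRT(S:1)/LRT(Z,S:1); this ratio is an assumption made here and is not necessarily the published PIG formula, so consult Qu and Case (2007) for the exact definition. The data generator and its parameters are hypothetical.

```python
import math
import random

def _cov(x, y):
    """Sample covariance (divisor n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def rss_models(t, z, s):
    """OLS residual sums of squares for models (1) T = c0, (2) T | S, (3) T | S, Z."""
    n = len(t)
    s_tt, s_ss, s_zz = _cov(t, t), _cov(s, s), _cov(z, z)
    s_ts, s_tz, s_zs = _cov(t, s), _cov(t, z), _cov(z, s)
    rss1 = (n - 1) * s_tt
    rss2 = (n - 1) * (s_tt - s_ts ** 2 / s_ss)
    det = s_zz * s_ss - s_zs ** 2
    b_z = (s_tz * s_ss - s_ts * s_zs) / det
    b_s = (s_ts * s_zz - s_tz * s_zs) / det
    rss3 = (n - 1) * (s_tt - b_z * s_tz - b_s * s_ts)
    return rss1, rss2, rss3

def lrt_statistics(t, z, s):
    """Gaussian LRTs with ML variances: returns (LRT(S:1), LRT(Z,S:1))."""
    n = len(t)
    rss1, rss2, rss3 = rss_models(t, z, s)
    return n * math.log(rss1 / rss2), n * math.log(rss1 / rss3)

def simulate(n, direct_effect, rng):
    """Hypothetical model: S = Z + u, T = S + direct_effect*Z + e."""
    z = [i % 2 for i in range(n)]
    s = [zi + rng.gauss(0.0, 1.0) for zi in z]
    t = [si + direct_effect * zi + rng.gauss(0.0, 1.0) for si, zi in zip(s, z)]
    return t, z, s

rng = random.Random(2007)
full = lrt_statistics(*simulate(20000, 0.0, rng))      # effect acts only through S
partial = lrt_statistics(*simulate(20000, 1.0, rng))   # part of the effect bypasses S
for label, (lrt_s, lrt_zs) in [("full mediation", full), ("partial mediation", partial)]:
    # Illustrative ratio only (an assumption, see lead-in).
    print(f"{label}: LRT(S:1) = {lrt_s:.0f}, LRT(Z,S:1) = {lrt_zs:.0f}, "
          f"ratio = {lrt_s / lrt_zs:.3f}")
```

When the whole treatment effect acts through S, adding Z to model (2) gains almost nothing, so the two statistics nearly coincide; when part of the effect bypasses S, LRT(S:1) captures only a share of LRT(Z,S:1).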

A Simple Simulation
• $\mathrm{logit}\,\Pr(T = 1 \mid S, Z) = -S$, with $S = Z + u$, $u \sim N(0, \sigma^2)$
• The validity condition is met ($T$ depends on $Z$ only through $S$)
• Compare the performance of PTE, LRFa and PIG for various values of $\sigma^2$
• Sample size = 1,000 (n = 500 per group)
• 1,000 simulated samples
Qu and Case (Biometrics 2007;63:958–963)
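A minimal sketch of one simulated data set from this design, with σ² = 1 as one arbitrary choice. It fits the logistic models by Newton–Raphson (a standard-library stand-in for a GLM routine) and computes the PTE and the likelihood ratio statistics LRT(S:1), LRT(Z,S:1) and LRT(Z,S:Z) from which the compared quantities are built. This is not the authors' simulation code; the function names and seed are hypothetical, and a full study would loop over 1,000 replicates and several σ² values.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_logistic(y, covariates, iters=25):
    """Fit logit P(y = 1) = beta0 + sum_k beta_k * covariates[k] by Newton-Raphson.

    Returns (coefficients, maximized log-likelihood).
    """
    X = [[1.0] + [c[i] for c in covariates] for i in range(len(y))]
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]   # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        beta = [b + d for b, d in zip(beta, solve(info, grad))]
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik

def simulate(sigma2, n_per_group=500, seed=1):
    """One data set: logit P(T=1 | S, Z) = -S, S = Z + u, u ~ N(0, sigma2)."""
    rng = random.Random(seed)
    z = [0] * n_per_group + [1] * n_per_group
    s = [zi + rng.gauss(0.0, math.sqrt(sigma2)) for zi in z]
    t = [1 if rng.random() < 1.0 / (1.0 + math.exp(si)) else 0 for si in s]
    return z, s, t

z, s, t = simulate(sigma2=1.0)
_, ll_null = fit_logistic(t, [])
(a0, a_z), ll_z = fit_logistic(t, [z])
_, ll_s = fit_logistic(t, [s])
(b0, b_z, b_s), ll_zs = fit_logistic(t, [z, s])

pte = 1.0 - b_z / a_z
lrt_s_1 = 2.0 * (ll_s - ll_null)      # LRT(S:1)
lrt_zs_1 = 2.0 * (ll_zs - ll_null)    # LRT(Z,S:1)
lrt_zs_z = 2.0 * (ll_zs - ll_z)       # LRT(Z,S:Z)
print(f"b_S = {b_s:.2f} (true -1), b_Z = {b_z:.2f} (true 0), PTE = {pte:.2f}")
print(f"LRT(S:1) = {lrt_s_1:.1f}, LRT(Z,S:1) = {lrt_zs_1:.1f}, LRT(Z,S:Z) = {lrt_zs_z:.1f}")
```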

Simulation Results: Mean (SD)
Qu and Case (Biometrics 2007;63:958–963)

Effect of Measurement Error on the Evaluation of Biomarkers

Measurement Error in Biomarkers
• A biomarker may be measured with error
• W = S + U, where S is the true value of the marker, U is the measurement error, and W is the observed value
• The magnitude of measurement error is generally described by:
  • The proportion of variation due to measurement error, Var(U)/Var(W):
    • < 30% is considered small
    • 30–50% is considered moderate
    • > 50% is considered large
  • Reliability: Var(S)/Var(W)
  • When U is independent of S, Var(W) = Var(S) + Var(U), so the two measures sum to 1
• Measurement error can attenuate the estimate of PIG (and of PTE and similar quantities)
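A minimal sketch of these summaries under the classical error model (U independent of S), together with a small simulation of attenuation. For simplicity the attenuation is shown on a regression slope, where the naive slope on W is shrunk by the reliability factor; it is not a PIG calculation. The variance values are hypothetical, chosen so that 70% of the observed variation is error.

```python
import random

def measurement_error_summary(var_s, var_u):
    """Classical error model W = S + U with U independent of S.

    Returns (error share Var(U)/Var(W), reliability Var(S)/Var(W), size label).
    """
    var_w = var_s + var_u
    proportion = var_u / var_w
    reliability = var_s / var_w
    if proportion < 0.30:
        label = "small"
    elif proportion <= 0.50:
        label = "moderate"
    else:
        label = "large"
    return proportion, reliability, label

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

var_s, var_u = 1.0, 7.0 / 3.0             # hypothetical values: 70% error share
prop, rel, label = measurement_error_summary(var_s, var_u)

rng = random.Random(7)
n = 20000
s = [rng.gauss(0.0, var_s ** 0.5) for _ in range(n)]
w = [si + rng.gauss(0.0, var_u ** 0.5) for si in s]
t = [si + rng.gauss(0.0, 1.0) for si in s]          # true slope of T on S is 1
print(f"error share {prop:.2f} ({label}), reliability {rel:.2f}")
print(f"slope on S ~ {slope(s, t):.3f}, slope on W ~ {slope(w, t):.3f}")
```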

Simulation extrapolation (SIMEX)
• PIG(X), based on the true marker values, is the quantity we want
• PIG(W) is the estimate affected by measurement error
• A quantity constructed from W and simulated errors U*, where U* and U are i.i.d., has the same expectation as PIG(X)
• This quantity is generally hard to estimate. SIMEX uses simulation to estimate how the bias changes as additional measurement error is added (often assuming a quadratic trend) and then extrapolates back to zero measurement error to obtain a less biased estimator.
Cook and Stefanski, JASA 1994;89:1314–1328. Li and Qu, Stat Med 2010: 2338–2346
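A minimal SIMEX sketch following the steps above. To keep it short it is applied to a simple regression slope rather than to PIG, and the error SD σ_U is assumed known. The λ grid, the number of replicates, and the helper names are hypothetical choices. Extra noise with variance λσ_U² is added to W, the estimator is averaged at each λ, a quadratic in λ is fitted, and the fit is evaluated at λ = −1 (no measurement error).

```python
import random

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def _det3(m):
    """Determinant of a 3 x 3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def quadratic_fit(xs, ys):
    """Least-squares (c0, c1, c2) for y = c0 + c1*x + c2*x^2, via Cramer's rule."""
    a = [[sum(x ** (i + j) for x in xs) for j in range(3)] for i in range(3)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(3)]
    d = _det3(a)
    coefs = []
    for k in range(3):
        ak = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        coefs.append(_det3(ak) / d)
    return coefs

def simex(estimator, w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), reps=20, seed=0):
    """Add error with variance lambda * sigma_u^2, average, fit a quadratic in lambda,
    and extrapolate to lambda = -1."""
    rng = random.Random(seed)
    grid, means = [0.0], [estimator(w, y)]
    for lam in lambdas:
        sd = (lam ** 0.5) * sigma_u
        vals = [estimator([wi + rng.gauss(0.0, sd) for wi in w], y) for _ in range(reps)]
        grid.append(lam)
        means.append(sum(vals) / reps)
    c0, c1, c2 = quadratic_fit(grid, means)
    return c0 - c1 + c2

rng = random.Random(2010)
n = 4000
s = [rng.gauss(0.0, 1.0) for _ in range(n)]      # true marker
w = [si + rng.gauss(0.0, 1.0) for si in s]       # observed marker, reliability 0.5
y = [si + rng.gauss(0.0, 0.5) for si in s]       # true slope on S is 1
naive = slope(w, y)
corrected = simex(slope, w, y, sigma_u=1.0)
print(f"naive {naive:.3f}, SIMEX {corrected:.3f}, true 1.0")
```

The quadratic extrapolant is only an approximation to the true bias curve, so in this setting SIMEX typically moves the estimate well toward the truth without removing the bias completely.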

Bone Mineral Density (BMD) and Fracture
[Figure: healthy spine vs. kyphotic spine; BMD measured by dual-energy x-ray absorptiometry (DEXA); vertebral fracture]

Multiple Outcomes of Raloxifene Evaluation (MORE)
• The MORE study was a 3-year randomized, double-blind, placebo-controlled clinical trial evaluating the effect of raloxifene on vertebral fracture
• Vertebral fracture was assessed at years 2 and 3, or when a patient had symptomatic back pain
• BMD was measured at baseline and at years 1, 2 and 3
Sarkar et al., J Bone Miner Res 2002;17:1–10

Adjustment for Measurement Error in PIG Estimation
• Objective: to evaluate whether the change in femoral neck BMD is a good surrogate marker for vertebral fracture
• Femoral neck BMD was measured twice at baseline
• The estimated standard deviation of the measurement error was 0.023 g/cm²
• About 70% of the variability in the observed BMD change was due to measurement error (Qu et al., Stat Med 2007;26:197–211)
• Even after adjusting for measurement error, the change in femoral neck BMD is still not a good surrogate marker
Li and Qu, Stat Med 2010: 2338–23
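The error SD on this slide comes from duplicate baseline measurements. A standard estimator, sketched below, uses the fact that for W₁ = S + U₁ and W₂ = S + U₂ with U₁, U₂ i.i.d., Var(W₁ − W₂) = 2Var(U). The duplicate values are hypothetical toy numbers, not MORE data. The change-score line further assumes one measurement per visit with independent errors, which may differ from how Qu et al. (2007) used the duplicates.

```python
import math

def error_sd_from_duplicates(first, second):
    """Estimate the measurement-error SD from paired duplicate measurements.

    Assumes W1 = S + U1, W2 = S + U2 with U1, U2 i.i.d. and independent of S,
    and no systematic difference between the two readings, so that
    Var(U) = mean((W1 - W2)^2) / 2.
    """
    sq = [(a - b) ** 2 for a, b in zip(first, second)]
    return math.sqrt(sum(sq) / (2 * len(sq)))

# Hypothetical duplicate baseline BMD readings (g/cm^2), for illustration only.
baseline_1 = [0.652, 0.701, 0.588, 0.745, 0.630]
baseline_2 = [0.671, 0.690, 0.610, 0.720, 0.641]
sigma_u = error_sd_from_duplicates(baseline_1, baseline_2)

# Assumed: a change score uses one reading at each visit, errors independent,
# so its error variance is 2 * sigma_u^2 (here with the reported SD of 0.023).
change_error_var = 2 * 0.023 ** 2
print(f"error SD from toy duplicates: {sigma_u:.4f} g/cm^2")
print(f"error variance of a change score (SD 0.023): {change_error_var:.6f}")
```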

Summary
• New concepts of surrogate marker and surrogate endpoint
• Definitions of validity and efficiency of a surrogate marker
• So far, PIG is a very reasonable quantity for evaluating a surrogate marker
• Measurement error in the marker can attenuate the estimate of PIG
• SIMEX is a general method for correcting bias due to measurement error


Abstract
Surrogate markers are important in drug development because, compared with actual clinical outcomes, they may dramatically reduce development cost and cycle time. Statistical evaluation of surrogate markers dates back thirty years. So far, however, little progress has been made in identifying new surrogate endpoints, and demonstrating a treatment effect on clinical outcomes remains a mandatory requirement for clinical drug development in many disease areas. For example, “the FDA approved Avastin for advanced breast cancer in February 2008, after one clinical trial showed that combining Avastin with another drug, paclitaxel, delayed the median time before tumors worsened by 5.5 months, compared with using paclitaxel alone. But the women who got Avastin did not live significantly longer than those who got only paclitaxel, which is also known by its brand name Taxol” (http://www.nytimes.com/2011/06/27/health/27drug.html). In this research, we discuss validity, efficiency and sensitivity in the statistical evaluation of surrogate markers. New definitions are presented, along with simulations and examples.
