Analysis of Stratified Trials – Challenging the “Standard” Methods
Devan V. Mehrotra, Clinical Biostatistics
Department Seminar, Merck Research Laboratories, Jan 10, 2008

Outline
• Part I: binary response variable
> Mantel-Haenszel test
> Minimum risk weights
> Simulation results
> Conclusions
• Part II: continuous non-normal response variable
> Motivating example
> Technical details
> Simulation results
> Conclusions

Part I: Analysis of Binary Data

Stratified Trials with Binary Endpoints
• 2 treatments (A and B), number of strata = s
• Binary response (responder/non-responder)
• $p_{ij}$ = true (population) proportion for stratum i, treatment j
• $\delta_i = p_{iA} - p_{iB}$ = true difference for stratum i
• $f_i$ = true (population) relative frequency for stratum i
• $\delta = \sum_i f_i \delta_i$ = true overall difference
• $\hat{p}_{ij}$ = observed proportion for stratum i, treatment j
• $n_{ij}$ = observed number of subjects in stratum i, treatment j

Hypothesis Testing: General Framework (Superiority or Non-Inferiority Trials)

Mantel-Haenszel Test (1959): Superiority Trials
Note: The MH test is optimal if the odds ratio is constant across strata.

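Choice of Variance
• Null variance [Miettinen & Nurminen, 1985; Farrington & Manning, 1990]: $V_i = \tilde{p}_{iA}(1-\tilde{p}_{iA})/n_{iA} + \tilde{p}_{iB}(1-\tilde{p}_{iB})/n_{iB}$, where $\tilde{p}_{ij}$ = m.l.e. of $p_{ij}$ under the restriction imposed by the null hypothesis. Note: The MH test uses the null variance.
• Observed (OBS) variance: $V_i = \hat{p}_{iA}(1-\hat{p}_{iA})/n_{iA} + \hat{p}_{iB}(1-\hat{p}_{iB})/n_{iB}$, where $\hat{p}_{ij}$ is the observed proportion.
• Note: With 1:1 randomization, the null variance is at least as large as the observed variance for superiority trials, and usually so (but not always) for non-inferiority trials.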
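The two variance choices can be compared numerically for a single stratum. The sketch below is illustrative only: the function names and the argument `d0` (the difference pA - pB assumed under the null, 0 for superiority) are hypothetical, and the restricted m.l.e. is found by a simple ternary search on the concave log-likelihood rather than by the closed-form solutions in the cited papers. It assumes no arm has zero or all responders.

```python
import math

def restricted_mle(x_a, n_a, x_b, n_b, d0, iters=200):
    """MLE of (pA, pB) subject to pA - pB = d0, via ternary search on pB.

    x_a, x_b are responder counts; n_a, n_b are arm sizes (all hypothetical names).
    """
    eps = 1e-9
    lo = max(0.0, -d0) + eps           # keep both pB and pA = pB + d0 inside (0, 1)
    hi = min(1.0, 1.0 - d0) - eps

    def loglik(p_b):
        p_a = p_b + d0
        return (x_a * math.log(p_a) + (n_a - x_a) * math.log(1.0 - p_a)
                + x_b * math.log(p_b) + (n_b - x_b) * math.log(1.0 - p_b))

    for _ in range(iters):             # log-likelihood is concave in pB
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loglik(m1) < loglik(m2):
            lo = m1
        else:
            hi = m2
    p_b = (lo + hi) / 2.0
    return p_b + d0, p_b

def null_variance(x_a, n_a, x_b, n_b, d0=0.0):
    """Variance of the observed difference, evaluated at the restricted MLE."""
    p_a, p_b = restricted_mle(x_a, n_a, x_b, n_b, d0)
    return p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b

def observed_variance(x_a, n_a, x_b, n_b):
    """Variance of the observed difference, evaluated at the observed proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    return p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b
```

With made-up counts of 30/50 responders on A and 20/50 on B, `null_variance(30, 50, 20, 50)` is about 0.0100 and `observed_variance(30, 50, 20, 50)` is 0.0096, consistent with the 1:1 superiority note on the slide.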

(pA, pB) pairs where Null or Observed Variance is “Better”: Non-Inferiority Margin = 15%

(pA, pB) pairs where Null or Observed Variance is “Better”: Non-Inferiority Margin = 5%

Choice of Weights
• Cochran-Mantel-Haenszel (CMH) weights: $w_i \propto n_{iA} n_{iB} / (n_{iA} + n_{iB})$
>> Estimator of $\delta$ is approximately unbiased.
• Minimum Risk (MR) weights [Mehrotra & Railkar, 2000]
>> Estimator of $\delta$ has smallest mean squared error.
>> If (optimal weights!)
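As a rough illustration of how weights and variances combine into a test statistic, the sketch below computes a weighted-difference Z statistic for a superiority test with CMH weights and either the null (pooled) or observed variance. The function names are hypothetical, the finite sample term and continuity correction are omitted for brevity, and MR weights are not implemented here (see Mehrotra & Railkar, 2000 for their definition).

```python
import math

def cmh_weights(n_a, n_b):
    """Normalized CMH weights, proportional to nA*nB/(nA+nB) per stratum."""
    raw = [a * b / (a + b) for a, b in zip(n_a, n_b)]
    total = sum(raw)
    return [r / total for r in raw]

def stratified_z(x_a, n_a, x_b, n_b, variance="null"):
    """Z = sum(w_i * d_i) / sqrt(sum(w_i^2 * V_i)) for H0: no difference.

    x_a, x_b: responder counts per stratum; n_a, n_b: arm sizes per stratum.
    variance: "null" (pooled proportion within stratum) or "obs".
    """
    weights = cmh_weights(n_a, n_b)
    num, var = 0.0, 0.0
    for w, xa, na, xb, nb in zip(weights, x_a, n_a, x_b, n_b):
        pa, pb = xa / na, xb / nb
        if variance == "null":
            pooled = (xa + xb) / (na + nb)      # MLE under pA = pB
            v = pooled * (1 - pooled) * (1 / na + 1 / nb)
        else:
            v = pa * (1 - pa) / na + pb * (1 - pb) / nb
        num += w * (pa - pb)
        var += w ** 2 * v
    return num / math.sqrt(var)
```

For example, with arm sizes of 70 and 30 per stratum, `cmh_weights([70, 30], [70, 30])` returns weights of 0.7 and 0.3.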

Choice of Finite Sample Term • With CMH weights (i.e., with MH test): is used. • With MR weights: is recommended.

Choice of Continuity Correction • With CMH weights: is used by original MH test. However, is a less conservative choice. • With MR weights: is recommended. See Mehrotra & Railkar, Stats in Med, 2000

Motivating Example Revisited: Test for Superiority

Simulation Results: Test for Superiority (2 strata) (f1 = .7, f2 = .3); No TxS interaction on (a) logit, (b) proportion, (c) square root, and (d) log scales; 100,000 simulations.

Illustrative Example #2: Test for Non-Inferiority

Simulation Results: Test for Non-Inferiority (2 strata)

Simulation Results: Power, Test for Non-Inferiority (2 strata)

Summary (Part I)
For stratified trials with binary responses:
• The popular Mantel-Haenszel test uses sample size (CMH) weights with null variances. It has good power properties if and only if the odds ratio is constant across strata.
• Using minimum risk (MR) weights with observed (OBS) variances will usually provide notably more power than CMH weights with null variances for both superiority and non-inferiority trials.
• Recommendation: consider MR_OBS as a default, but use simulations to quantify power differences between methods when planning a new trial.
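The recommendation to use simulations at the planning stage can be sketched as follows. This is a minimal Monte Carlo illustration, not the simulation code behind the slides: the response rates, arm sizes, seed, and function names are all hypothetical, only CMH weights are shown (MR weights would be plugged in the same way), and the one-sided 2.5% critical value 1.96 is assumed.

```python
import math
import random

def binom(rng, n, p):
    """One binomial(n, p) draw using only the standard library."""
    return sum(rng.random() < p for _ in range(n))

def z_stat(x_a, n_a, x_b, n_b, variance):
    """CMH-weighted Z statistic with "null" (pooled) or "obs" variance."""
    raw = [a * b / (a + b) for a, b in zip(n_a, n_b)]
    total = sum(raw)
    num, var = 0.0, 0.0
    for r, xa, na, xb, nb in zip(raw, x_a, n_a, x_b, n_b):
        w = r / total
        pa, pb = xa / na, xb / nb
        if variance == "null":
            pooled = (xa + xb) / (na + nb)
            v = pooled * (1 - pooled) * (1 / na + 1 / nb)
        else:
            v = pa * (1 - pa) / na + pb * (1 - pb) / nb
        num += w * (pa - pb)
        var += w ** 2 * v
    return num / math.sqrt(var) if var > 0 else 0.0

def rejection_rate(p_a, p_b, n_per_arm, variance, n_sims=2000, seed=1):
    """Fraction of simulated trials with Z > 1.96 (one-sided 2.5% test)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x_a = [binom(rng, n, p) for n, p in zip(n_per_arm, p_a)]
        x_b = [binom(rng, n, p) for n, p in zip(n_per_arm, p_b)]
        if z_stat(x_a, n_per_arm, x_b, n_per_arm, variance) > 1.96:
            rejections += 1
    return rejections / n_sims

if __name__ == "__main__":
    n = [70, 30]                                   # hypothetical arm sizes per stratum
    for variance in ("null", "obs"):
        alpha = rejection_rate([0.3, 0.3], [0.3, 0.3], n, variance)
        power = rejection_rate([0.5, 0.5], [0.3, 0.3], n, variance)
        print(variance, "type I error:", alpha, "power:", power)
```

Reusing the same seed for both variance choices means both methods are evaluated on identical simulated trials, so differences in the rejection rates reflect the methods rather than simulation noise.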

Part II: Analysis of Continuous Data Using Ranks

Motivating Example: Hypothetical viral loads of HIV+ subjects (log10 copies/ml)

Motivating Example (continued) • Observed viral load summaries (log10 copies/ml): • Compared to placebo, the VLs for vaccine appear to be “shifted” to the left (i.e., are numerically smaller). Is the shift statistically significant?

Motivating Example (continued)
Stratified rank-based analysis: SAS implementation
• PROC FREQ; TABLES gender * trt * vload / CMH SCORES=RANK; RUN;
• PROC FREQ; TABLES gender * trt * vload / CMH SCORES=MODRIDIT; RUN;
• PROC TWOSAMPL; [Part of PROC StatXact module]
  WI/AS; PO trt; RE vload; ST gender; RUN;

Motivating Example (continued)
• 2-tailed p-values using the three “methods”:
Different conclusions at α = .05 … why?
• PROC FREQ
> Ranks based on pooled sample within each stratum (“stratum-specific” ranks)
> SCORES = RANK → equal stratum weights
> SCORES = MODRIDIT → unequal stratum weights
• PROC TWOSAMPL: Ranks based on overall pooled sample, ignoring strata (“stratum-invariant” ranks), with equal stratum weights.
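The difference between the two ranking schemes can be seen on a toy data set. The sketch below uses made-up viral loads and hypothetical function names; it only illustrates how the ranks are formed, not how the SAS procedures compute their test statistics.

```python
def midranks(values):
    """Ranks 1..n, with tied values sharing the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def stratum_specific_ranks(values, strata):
    """Rank each observation within its own stratum (pooled over treatments)."""
    ranks = [0.0] * len(values)
    for s in set(strata):
        idx = [k for k, g in enumerate(strata) if g == s]
        for k, r in zip(idx, midranks([values[k] for k in idx])):
            ranks[k] = r
    return ranks

def stratum_invariant_ranks(values, strata):
    """Rank all observations together, ignoring the stratum labels."""
    return midranks(values)

# Made-up log10 viral loads for three females and three males
vload = [3.1, 4.0, 5.2, 2.5, 4.4, 3.8]
gender = ["F", "F", "F", "M", "M", "M"]
print(stratum_specific_ranks(vload, gender))    # [1.0, 2.0, 3.0, 1.0, 3.0, 2.0]
print(stratum_invariant_ranks(vload, gender))   # [2.0, 4.0, 6.0, 1.0, 5.0, 3.0]
```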

Technical Details

Technical Details (continued)

Technical Details (continued) Three Popular Rank-Based Tests

Technical Details (continued)
• If there is no true treatment by stratum interaction, i.e., the treatment effect is the same in every stratum ($\Delta_i = \Delta$ for all i), the van Elteren test is optimal among all the stratified tests, i.e., $w_i = 1/(n_i + 1)$ are optimal weights.
• However, if interaction exists, the van Elteren test can suffer from a power loss.
• In general, is there an asymptotically optimal test (with optimal weights) that allows for interaction? YES … we derived it, based on stratum-specific ranks.
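For reference, the van Elteren statistic with weights 1/(n_i + 1) can be sketched as a weighted sum of within-stratum Wilcoxon rank sums, standardized by its permutation variance. This is a minimal illustration under the usual normal approximation; the function names and input layout (one list of responses per stratum and arm) are assumptions, not taken from the slides.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def van_elteren_z(y_a, y_b):
    """y_a[i], y_b[i]: responses in stratum i for treatments A and B."""
    num, var = 0.0, 0.0
    for a, b in zip(y_a, y_b):
        n_a, n_b = len(a), len(b)
        n = n_a + n_b
        r = midranks(list(a) + list(b))          # stratum-specific ranks
        w = 1.0 / (n + 1)                        # van Elteren weight
        mean_r = (n + 1) / 2.0
        ss = sum((x - mean_r) ** 2 for x in r)   # handles ties via midranks
        num += w * (sum(r[:n_a]) - n_a * mean_r)
        var += w ** 2 * n_a * n_b * ss / (n * (n - 1))
    return num / math.sqrt(var)

def two_sided_p(z):
    """Two-sided p-value from the standard normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

A positive statistic indicates that treatment A tends to have larger responses than treatment B within strata.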

Technical Details (continued)

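Technical Details (continued)
Estimate and 100(1-α)% CI for $\Delta$ Obtained by Inverting the Given Test
• Let p(c) = 1-tailed p-value for the test applied to the data after shifting one treatment group's responses by c.
• The point estimate and confidence limits are obtained via a numerical search.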
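One way such a numerical search could look is sketched below. It assumes the test being inverted is the van Elteren test, that p(c) is computed after subtracting c from every treatment A response, that the point estimate is where p(c) crosses 0.5, and that the confidence limits are where p(c) crosses α/2 and 1 - α/2. These conventions and all function names are assumptions for illustration; the slides do not spell them out. Because p(c) is a step function, the bisection returns the left edge of any flat stretch, and with very small samples the target p-values may not be attainable.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def van_elteren_z(y_a, y_b):
    """Weighted sum of within-stratum rank sums for A, standardized."""
    num, var = 0.0, 0.0
    for a, b in zip(y_a, y_b):
        n_a, n_b = len(a), len(b)
        n = n_a + n_b
        r = midranks(list(a) + list(b))
        w = 1.0 / (n + 1)
        mean_r = (n + 1) / 2.0
        ss = sum((x - mean_r) ** 2 for x in r)
        num += w * (sum(r[:n_a]) - n_a * mean_r)
        var += w ** 2 * n_a * n_b * ss / (n * (n - 1))
    return num / math.sqrt(var)

def one_sided_p(y_a, y_b, c):
    """p(c): upper-tail p-value after subtracting c from every A response."""
    shifted = [[v - c for v in a] for a in y_a]
    return 0.5 * math.erfc(van_elteren_z(shifted, y_b) / math.sqrt(2.0))

def crossing(y_a, y_b, target, tol=1e-6):
    """Smallest c (to within tol) with p(c) >= target; p(c) is non-decreasing in c."""
    diffs = [u - v for a, b in zip(y_a, y_b) for u in a for v in b]
    lo, hi = min(diffs) - 1.0, max(diffs) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if one_sided_p(y_a, y_b, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def shift_estimate_and_ci(y_a, y_b, alpha=0.05):
    """Point estimate and 100(1 - alpha)% limits for the shift of A relative to B."""
    estimate = crossing(y_a, y_b, 0.5)
    lower = crossing(y_a, y_b, alpha / 2.0)
    upper = crossing(y_a, y_b, 1.0 - alpha / 2.0)
    return estimate, lower, upper
```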

Motivating Example Revisited: 2-tailed p-values
Note: All methods except use stratum-specific ranks

Motivating Example Revisited: Estimates and 95% CIs for $\Delta$ (selected methods)

Simulation Study
• 2 treatments, 1:1 randomization per stratum
• Number of strata = 2, 4, 6, 8, 10, and 12
• Stratum size ($n_i$): 10*i for stratum i
• Different choices of $\Delta_i$:
> constant for each stratum (no TxS interaction)
> positively or negatively associated with stratum size (TxS interaction, with 50% power to detect it)
• Four different distributions for Y:
> Normal
> Log Normal
> Mixture of Normals: 0.9N(m,v) + 0.1N(m*,v*)
> t3
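A data generator for a design of this shape might look like the sketch below. It is an illustration under stated assumptions: the stratum size 10*i is split evenly between the two arms, the shift is added to treatment A, the distribution parameters (including the mixture values standing in for m, v, m*, v*) are hypothetical, and the function names are not from the slides.

```python
import math
import random

def draw(rng, dist):
    """One error term from the named distribution (parameters are hypothetical)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "lognormal":
        return rng.lognormvariate(0.0, 1.0)
    if dist == "mixture":
        # 0.9 N(0, 1) + 0.1 N(0, 3^2): stand-in values for m, v, m*, v*
        return rng.gauss(0.0, 1.0) if rng.random() < 0.9 else rng.gauss(0.0, 3.0)
    if dist == "t3":
        chi2 = rng.gammavariate(1.5, 2.0)        # chi-square with 3 df
        return rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
    raise ValueError("unknown distribution: " + dist)

def simulate_trial(n_strata, shifts, dist, seed=None):
    """Return (y_a, y_b): per-stratum response lists for treatments A and B.

    shifts[i - 1] is the treatment effect in stratum i; a constant list
    corresponds to no treatment-by-stratum interaction.
    """
    rng = random.Random(seed)
    y_a, y_b = [], []
    for i in range(1, n_strata + 1):
        per_arm = (10 * i) // 2                  # stratum size 10*i, 1:1 split
        y_a.append([draw(rng, dist) + shifts[i - 1] for _ in range(per_arm)])
        y_b.append([draw(rng, dist) for _ in range(per_arm)])
    return y_a, y_b
```

For instance, `simulate_trial(4, [0.5] * 4, "t3", seed=1)` produces four strata with 5, 10, 15, and 20 subjects per arm and a constant shift of 0.5.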

Simulation Results: Type I Error Rate (nominal α = 5%), Normal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

Simulation Results: Type I Error Rate (nominal α = 5%), Lognormal Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

Simulation Results: Type I Error Rate (nominal α = 5%), Mixture of Normals Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

Simulation Results: Type I Error Rate (nominal α = 5%), t3 Distribution
Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

Simulation Results: Power (%), No T x S interaction (constant $\Delta_i$)

Simulation Results: Power (%), No T x S interaction

Simulation Results: Power (%), Normal Distribution

Simulation Results: Power (%), Lognormal Distribution

Simulation Results: Power (%), Mixture of Normals

Simulation Results: Power (%), t3 Distribution

Conclusions (Part II)
For rank-based analyses of stratified trials:
> No single method is uniformly the best.
• Recommendation: use the aligned rank test (Talign) or either of the proposed adaptive tests (Tadap1 or Tadap2). Both tests were more powerful than the van Elteren test (TvE) in every case studied, notably so when there was a true (but hard to detect) T x S interaction.
> It is time to retire the popular van Elteren test!

References
• Brunner, E., Puri, M. L., and Sun, S. (1995). Nonparametric Methods for Stratified Two-Sample Designs with Application to Multiclinic Trials. Journal of the American Statistical Association, 90, 1004-1014.
• Hodges, J. L. and Lehmann, E. L. (1962). Rank Methods for Combination of Independent Experiments in the Analysis of Variance. Annals of Mathematical Statistics, 33, 482-497.
• Mehrotra, D. V. and Railkar, R. (2000). Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials. Statistics in Medicine, 19, 811-825.
• Wang, W., Mehrotra, D. V., Chan, I. S. F., and Heyse, J. F. (2006). Non-Inferiority/Equivalence Trials in Vaccine Development. Journal of Biopharmaceutical Statistics, 16, 429-441.
• Öhrvik, J. (1999). Aligned Ranks: A Method of Gaining Efficiency in Rank Tests. http://www.stat.fi/isi99/proceedings/arkisto/varasto/hrvi0423.pdf
• van Elteren, P. H. (1960). On the Combination of Independent Two Sample Tests of Wilcoxon. Bulletin of the International Statistical Institute, 37, 351-361.