
Analysis of Stratified Trials – Challenging the “Standard” Methods. Devan V. Mehrotra, Clinical Biostatistics



Presentation Transcript


  1. Analysis of Stratified Trials – Challenging the “Standard” Methods. Devan V. Mehrotra, Clinical Biostatistics. Department Seminar, Merck Research Laboratories, Jan 10, 2008

  2. Outline
     • Part I: binary response variable
       > Mantel-Haenszel test
       > Minimum risk weights
       > Simulation results
       > Conclusions
     • Part II: continuous non-normal response variable
       > Motivating example
       > Technical details
       > Simulation results
       > Conclusions

  3. Part I: Analysis of Binary Data

  4. Stratified Trials with Binary Endpoints
     • 2 treatments (A and B); number of strata = s
     • Binary response (responder/non-responder)
     • $p_{ij}$ = true (population) proportion for stratum i, treatment j
     • $\delta_i = p_{iA} - p_{iB}$ = true difference for stratum i
     • $f_i$ = true (population) relative frequency for stratum i
     • $\delta = \sum_{i=1}^{s} f_i \delta_i$ = true overall difference
     • $\hat{p}_{ij}$ = observed proportion for stratum i, treatment j
     • $n_{ij}$ = observed number of subjects in stratum i, treatment j
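To make the notation concrete, here is a minimal Python sketch, assuming per-stratum counts of responders and subjects in each arm (the `Stratum` class and the counts are hypothetical). It computes $\hat p_{ij}$ and $\hat\delta_i$, plus a plug-in estimate of $\delta$ that substitutes the observed stratum fractions $\hat f_i = n_i/n$ for the unknown $f_i$; that substitution is an illustrative choice only.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    x_a: int  # responders on treatment A
    n_a: int  # subjects on treatment A (n_iA)
    x_b: int  # responders on treatment B
    n_b: int  # subjects on treatment B (n_iB)


def stratum_differences(strata):
    """Return (p_hat_iA, p_hat_iB, delta_hat_i) for each stratum."""
    out = []
    for s in strata:
        p_a = s.x_a / s.n_a
        p_b = s.x_b / s.n_b
        out.append((p_a, p_b, p_a - p_b))
    return out


def plug_in_delta(strata):
    """Estimate delta = sum_i f_i * delta_i, with f_i replaced by n_i / n."""
    n = sum(s.n_a + s.n_b for s in strata)
    diffs = stratum_differences(strata)
    return sum((s.n_a + s.n_b) / n * d for s, (_, _, d) in zip(strata, diffs))


# Hypothetical counts for two strata
strata = [Stratum(30, 50, 20, 50), Stratum(12, 40, 8, 40)]
print(stratum_differences(strata))
print(plug_in_delta(strata))
```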

  5. Hypothesis Testing: General Framework (Superiority or Non-Inferiority Trials)

  6. Mantel-Haenszel Test (1959): Superiority Trials. Note: the MH test is optimal if the odds ratio is constant across strata.
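For reference, a minimal Python sketch of the textbook CMH chi-square statistic (1 df) for s stratified 2×2 tables, assuming each stratum is given as (responders on A, $n_{iA}$, responders on B, $n_{iB}$). Its numerator equals $\sum_i \frac{n_{iA} n_{iB}}{n_{iA}+n_{iB}}(\hat p_{iA}-\hat p_{iB})$, a CMH-weighted sum of observed differences, and its hypergeometric variance is the pooled (null) binomial variance multiplied by $n_i/(n_i-1)$. The optional 0.5 continuity correction follows the original MH test; the function name and counts are hypothetical.

```python
import math


def cmh_test(tables, continuity=False):
    """Cochran-Mantel-Haenszel chi-square test (1 df) for stratified 2x2 tables.

    Each table is (x_a, n_a, x_b, n_b): responders and subjects on A and on B.
    Returns (chi-square statistic, two-sided p-value).
    """
    num = 0.0  # sum over strata of observed minus expected responders on A
    var = 0.0  # sum over strata of hypergeometric variances
    for x_a, n_a, x_b, n_b in tables:
        n = n_a + n_b
        m1 = x_a + x_b  # responders in the stratum
        m0 = n - m1     # non-responders in the stratum
        num += x_a - n_a * m1 / n
        var += n_a * n_b * m1 * m0 / (n * n * (n - 1))
    dev = abs(num)
    if continuity:
        dev = max(dev - 0.5, 0.0)  # original MH continuity correction
    chi2 = dev * dev / var
    # P(chi-square_1 > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


tables = [(30, 50, 20, 50), (12, 40, 8, 40)]
print(cmh_test(tables))
print(cmh_test(tables, continuity=True))
```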

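  7. Choice of Variance
     • Null variance [Miettinen & Nurminen 1985; Farrington & Manning 1990]: $V_i^{\text{null}} = \tilde p_{iA}(1-\tilde p_{iA})/n_{iA} + \tilde p_{iB}(1-\tilde p_{iB})/n_{iB}$, where $\tilde p_{ij}$ = m.l.e. of $p_{ij}$ under the restriction $p_{iA} - p_{iB} = \delta_0$ (the difference specified by the null hypothesis). Note: the MH test uses the null variance.
     • Observed (OBS) variance: $V_i^{\text{obs}} = \hat p_{iA}(1-\hat p_{iA})/n_{iA} + \hat p_{iB}(1-\hat p_{iB})/n_{iB}$.
     • Note: with 1:1 randomization, $V_i^{\text{null}} \ge V_i^{\text{obs}}$ for superiority trials, and usually so (but not always) for non-inferiority trials.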
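A minimal Python sketch of the two variance choices for a single stratum, assuming $\delta_0$ is the difference $p_A - p_B$ specified by the null hypothesis ($\delta_0 = 0$ for superiority, a negative margin for non-inferiority). Rather than the closed-form solution used in the cited papers, the restricted MLE is found numerically: along the line $p_A = p_B + \delta_0$ the binomial log-likelihood is concave in $p_B$, so a ternary search suffices. The Miettinen & Nurminen finite-sample factor is not applied, and all function names are hypothetical.

```python
import math


def _loglik(p_b, delta0, x_a, n_a, x_b, n_b):
    """Binomial log-likelihood with p_A constrained to p_B + delta0."""
    p_a = p_b + delta0
    return (x_a * math.log(p_a) + (n_a - x_a) * math.log(1 - p_a)
            + x_b * math.log(p_b) + (n_b - x_b) * math.log(1 - p_b))


def restricted_mle(x_a, n_a, x_b, n_b, delta0, eps=1e-9, iters=200):
    """MLE of (p_A, p_B) under p_A - p_B = delta0, via ternary search on p_B."""
    lo = max(0.0, -delta0) + eps        # keeps p_A and p_B strictly above 0
    hi = min(1.0, 1.0 - delta0) - eps   # keeps p_A and p_B strictly below 1
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if _loglik(m1, delta0, x_a, n_a, x_b, n_b) < _loglik(m2, delta0, x_a, n_a, x_b, n_b):
            lo = m1  # the maximum lies to the right of m1
        else:
            hi = m2  # the maximum lies to the left of m2
    p_b = (lo + hi) / 2
    return p_b + delta0, p_b


def null_variance(x_a, n_a, x_b, n_b, delta0):
    """Var(p_hat_A - p_hat_B) evaluated at the restricted MLE."""
    p_a, p_b = restricted_mle(x_a, n_a, x_b, n_b, delta0)
    return p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b


def observed_variance(x_a, n_a, x_b, n_b):
    """Var(p_hat_A - p_hat_B) evaluated at the observed proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    return p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b


# Superiority (delta0 = 0) and a 15% non-inferiority margin (delta0 = -0.15)
for d0 in (0.0, -0.15):
    print(d0, null_variance(30, 50, 20, 50, d0), observed_variance(30, 50, 20, 50))
```

With equal arm sizes n and $\delta_0 = 0$ the restricted MLE is the pooled proportion, and the null variance exceeds the observed one by exactly $(\hat p_A - \hat p_B)^2/(2n)$, which is the superiority case of the note above.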

  8. (pA, pB) Pairs where Null or Observed Variance Is “Better” (Non-Inferiority Margin = 15%)

  9. (pA, pB) Pairs where Null or Observed Variance Is “Better” (Non-Inferiority Margin = 5%)

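  10. Choice of Weights
     • Cochran-Mantel-Haenszel (CMH) weights, $w_i \propto n_{iA} n_{iB}/(n_{iA} + n_{iB})$
       >> Estimator of $\delta$ is approximately unbiased.
     • Minimum Risk (MR) weights [Mehrotra & Railkar, 2000]
       >> Estimator of $\delta$ has the smallest mean squared error.
       >> If $\delta_i = \delta$ for all i, the MR weights reduce to inverse-variance weights, $w_i \propto 1/\mathrm{Var}(\hat\delta_i)$ (optimal weights!)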
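A minimal Python sketch comparing CMH weights with inverse-variance weights for the combined estimate $\hat\delta = \sum_i w_i \hat\delta_i$, assuming strata given as $(x_{iA}, n_{iA}, x_{iB}, n_{iB})$ tuples. The MR weights of Mehrotra & Railkar (2000) are not reproduced because their formula is not in this transcript; inverse-variance weights stand in only as the no-interaction optimum. Counts are hypothetical, and a stratum with zero observed variance (all or no responders in both arms) would need special handling.

```python
def cmh_weights(strata):
    """w_i proportional to n_iA * n_iB / (n_iA + n_iB), normalized to sum to 1."""
    raw = [n_a * n_b / (n_a + n_b) for _, n_a, _, n_b in strata]
    total = sum(raw)
    return [r / total for r in raw]


def inverse_variance_weights(strata):
    """w_i proportional to 1 / Var(delta_hat_i), using observed variances.

    Raises ZeroDivisionError if a stratum has zero observed variance.
    """
    raw = []
    for x_a, n_a, x_b, n_b in strata:
        p_a, p_b = x_a / n_a, x_b / n_b
        raw.append(1.0 / (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b))
    total = sum(raw)
    return [r / total for r in raw]


def weighted_delta(strata, weights):
    """Combined estimate sum_i w_i * (p_hat_iA - p_hat_iB)."""
    return sum(w * (x_a / n_a - x_b / n_b)
               for w, (x_a, n_a, x_b, n_b) in zip(weights, strata))


strata = [(30, 50, 20, 50), (12, 40, 8, 40)]
for name, w in [("CMH", cmh_weights(strata)),
                ("inverse variance", inverse_variance_weights(strata))]:
    print(name, [round(x, 3) for x in w], round(weighted_delta(strata, w), 4))
```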

  11. Choice of Finite Sample Term • With CMH weights (i.e., with MH test): is used. • With MR weights: is recommended.

  12. Choice of Continuity Correction • With CMH weights: is used by original MH test. However, is a less conservative choice. • With MR weights: is recommended. See Mehrotra & Railkar, Stats in Med, 2000

  13. Motivating Example Revisited: Test for Superiority

  14. Simulation Results: Test for Superiority (2 strata; f1 = .7, f2 = .3). No TxS interaction on the (a) logit, (b) proportion, (c) square root, and (d) log scales; 100,000 simulations.
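A minimal Monte Carlo sketch of how such a simulation can be run, assuming fixed stratum sizes of round(f_i × n) per arm with n = 200 per arm (using the slide's f1 = .7, f2 = .3), hypothetical stratum response rates, and the uncorrected CMH test as the method under study. The slide used 100,000 simulations; the usage below uses far fewer to stay fast, and the Bernoulli-sum binomial sampler is simple rather than efficient.

```python
import math
import random


def binomial(rng, n, p):
    """Draw a Binomial(n, p) count by summing Bernoulli trials (simple, not fast)."""
    return sum(rng.random() < p for _ in range(n))


def cmh_pvalue(tables):
    """Two-sided p-value of the uncorrected CMH test for (x_a, n_a, x_b, n_b) tables."""
    num = var = 0.0
    for x_a, n_a, x_b, n_b in tables:
        n = n_a + n_b
        m1 = x_a + x_b
        num += x_a - n_a * m1 / n
        var += n_a * n_b * m1 * (n - m1) / (n * n * (n - 1))
    if var == 0:
        return 1.0  # no information, e.g. every subject responded
    return math.erfc(abs(num) / math.sqrt(2 * var))


def simulate_rejection_rate(p_a, p_b, n_per_arm, f, n_sims, alpha=0.05, seed=1):
    """Monte Carlo rejection rate; p_a, p_b are per-stratum true response rates."""
    rng = random.Random(seed)
    sizes = [round(fi * n_per_arm) for fi in f]  # subjects per arm in each stratum
    rejections = 0
    for _ in range(n_sims):
        tables = [(binomial(rng, n, pa), n, binomial(rng, n, pb), n)
                  for n, pa, pb in zip(sizes, p_a, p_b)]
        if cmh_pvalue(tables) < alpha:
            rejections += 1
    return rejections / n_sims


# Hypothetical scenarios: no difference (type I error) and a 15-point difference (power)
null_rate = simulate_rejection_rate([0.3, 0.5], [0.3, 0.5], 200, [0.7, 0.3], n_sims=2000)
power = simulate_rejection_rate([0.45, 0.65], [0.3, 0.5], 200, [0.7, 0.3], n_sims=500)
print(null_rate, power)
```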

  15. Illustrative Example #2: Test for Non-Inferiority

  16. Simulation Results: Test for Non-Inferiority (2 strata)

  17. Simulation Results: Power, Test for Non-Inferiority (2 strata)

  18. Summary (Part I). For stratified trials with binary responses:
     • The popular Mantel-Haenszel test uses sample-size (CMH) weights with null variances. It has good power properties if and only if the odds ratio is constant across strata.
     • Using minimum risk (MR) weights with observed (OBS) variances will usually provide notably more power than CMH weights with null variances, for both superiority and non-inferiority trials.
     • Recommendation: consider MR_OBS as a default, but use simulations to quantify power differences between methods when planning a new trial.

  19. Part II: Analysis of Continuous Data Using Ranks

  20. Motivating Example: Hypothetical viral loads of HIV+ subjects (log10 copies/ml)

  21. Motivating Example (continued) • Observed viral load summaries (log10 copies/ml): • Compared to placebo, the VLs for vaccine appear to be “shifted” to the left (i.e., are numerically smaller). Is the shift statistically significant?

  22. Motivating Example (continued): Stratified rank-based analysis, SAS implementation
     • PROC FREQ; TABLES gender*trt*vload / CMH SCORES=RANK; RUN;
     • PROC FREQ; TABLES gender*trt*vload / CMH SCORES=MODRIDIT; RUN;
     • PROC TWOSAMPL; WI/AS; PO trt; RE vload; ST gender; RUN; [part of the PROC StatXact module]

  23. Motivating Example (continued)
     • 2-tailed p-values using the three “methods”: different conclusions at α = .05 … why?
     • PROC FREQ: ranks based on the pooled sample within each stratum (“stratum-specific” ranks)
       > SCORES=RANK → equal stratum weights
       > SCORES=MODRIDIT → unequal stratum weights
     • PROC TWOSAMPL: ranks based on the overall pooled sample, ignoring strata (“stratum-invariant” ranks), with equal stratum weights.
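A minimal Python sketch of the ranking distinction, using hypothetical values with gender as the stratum: stratum-specific ranks restart at 1 within each stratum (the PROC FREQ behaviour described above), while stratum-invariant ranks come from one ranking of the pooled sample (the PROC TWOSAMPL behaviour). Ties receive midranks. This illustrates the ranking step only and does not reproduce the SAS or StatXact statistics.

```python
def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j are 0-based
        i = j + 1
    return ranks


def stratum_specific_ranks(y, stratum):
    """Rank the pooled sample separately within each stratum."""
    ranks = [0.0] * len(y)
    for s in set(stratum):
        idx = [k for k, g in enumerate(stratum) if g == s]
        for k, r in zip(idx, midranks([y[k] for k in idx])):
            ranks[k] = r
    return ranks


def stratum_invariant_ranks(y):
    """Rank the overall pooled sample once, ignoring strata."""
    return midranks(y)


y = [2.0, 3.5, 1.0, 4.0, 5.0, 4.0]          # hypothetical log10 viral loads
gender = ["F", "F", "F", "M", "M", "M"]
print(stratum_specific_ranks(y, gender))
print(stratum_invariant_ranks(y))
```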

  24. Technical Details

  25. Technical Details (continued)

  26. Technical Details (continued): Three Popular Rank-Based Tests

  27. Technical Details (continued)
     • If there is no true treatment-by-stratum interaction ($\Delta_i = \Delta$ for all i), the van Elteren test is optimal among all such stratified tests, i.e., $w_i = 1/(n_i + 1)$ are the optimal weights.
     • However, if an interaction exists, the van Elteren test can suffer a loss of power.
     • In general, is there an asymptotically optimal test (with optimal weights) that allows for interaction? Yes: we derived it, based on stratum-specific ranks.
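A minimal Python sketch of a weighted stratified Wilcoxon test with stratum-specific ranks, using the van Elteren weights $w_i = 1/(n_i + 1)$ by default. The optimal interaction-allowing weights mentioned above are not given in this transcript, so the function simply accepts any weights (for example equal weights). The variance is the within-stratum permutation variance, which remains valid with midranks; the function name and data are hypothetical, and the normal approximation is assumed.

```python
import math


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def weighted_stratified_rank_test(strata, weights=None):
    """Z = sum_i w_i (W_i - E W_i) / sqrt(sum_i w_i^2 Var W_i).

    strata: list of (treatment_values, control_values), one pair per stratum.
    weights=None uses van Elteren weights w_i = 1 / (n_i + 1).
    Returns (z, one-sided p-value for treatment > control).
    """
    num = var = 0.0
    for i, (trt, ctl) in enumerate(strata):
        m, k = len(trt), len(ctl)
        n = m + k
        r = midranks(list(trt) + list(ctl))  # stratum-specific ranks
        w = 1.0 / (n + 1) if weights is None else weights[i]
        rbar = sum(r) / n
        num += w * (sum(r[:m]) - m * rbar)  # W_i minus its null expectation
        ss = sum((x - rbar) ** 2 for x in r)
        var += w * w * m * k * ss / (n * (n - 1))  # permutation variance of W_i
    z = num / math.sqrt(var)
    return z, 0.5 * math.erfc(z / math.sqrt(2))


strata = [([4.1, 5.0, 6.2], [1.0, 2.5, 3.3]),
          ([2.2, 3.9, 4.4, 5.1], [1.7, 2.0, 3.0, 4.0])]
print(weighted_stratified_rank_test(strata))              # van Elteren weights
print(weighted_stratified_rank_test(strata, [1.0, 1.0]))  # equal stratum weights
```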

  28. Technical Details (continued)

  29. Technical Details (continued)

  30. Technical Details (continued)

  31. Technical Details (continued)

  32. Technical Details (continued)

  33. Technical Details (continued)

  34. Technical Details (continued) Estimate and 100(1-)% CI for  Obtained by Inverting the Given Test • Let Let p(c) = 1-tailed p-value for test applied to Obtained via a numerical search.

  35. Motivating Example Revisited: 2-tailed p-values. Note: All methods except  use stratum-specific ranks

  36. Motivating Example Revisited: Estimates and 95% CIs for Δ (selected methods)

  37. Simulation Study
     • 2 treatments, 1:1 randomization per stratum
     • Number of strata = 2, 4, 6, 8, 10, and 12
     • Stratum size: $n_i = 10i$ for stratum i
     • Different choices of $\Delta_i$:
       > constant for each stratum (no TxS interaction)
       > positively or negatively associated with stratum size (TxS interaction, with 50% power to detect it)
     • Four different distributions for Y:
       > Normal
       > Lognormal
       > Mixture of Normals: 0.9N(m, v) + 0.1N(m*, v*)
       > t3
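A minimal Python sketch of this data-generating design, assuming the treatment effect enters as an additive shift $\Delta_i$ on treated responses. The slide does not give m, v, m*, v* or the $\Delta_i$ values, so the mixture components N(0, 1) and N(0, 3²) and the standard parameters of the other distributions are hypothetical choices.

```python
import math
import random


def draw(rng, dist):
    """One error term from the named distribution (parameters are illustrative)."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    if dist == "lognormal":
        return rng.lognormvariate(0.0, 1.0)
    if dist == "mixture":
        # Hypothetical stand-in for 0.9 N(m, v) + 0.1 N(m*, v*)
        return rng.gauss(0.0, 1.0) if rng.random() < 0.9 else rng.gauss(0.0, 3.0)
    if dist == "t3":
        z = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))  # chi-square, 3 df
        return z / math.sqrt(chi2 / 3)
    raise ValueError(f"unknown distribution: {dist}")


def simulate_trial(n_strata, shifts, dist, seed=0):
    """Stratum i (1-based) has n_i = 10 * i subjects, split 1:1 between arms.

    shifts[i - 1] is the treatment effect Delta_i added to treated responses.
    Returns a list of (treatment_values, control_values), one pair per stratum.
    """
    rng = random.Random(seed)
    data = []
    for i in range(1, n_strata + 1):
        half = (10 * i) // 2
        trt = [draw(rng, dist) + shifts[i - 1] for _ in range(half)]
        ctl = [draw(rng, dist) for _ in range(half)]
        data.append((trt, ctl))
    return data


# Shifts positively associated with stratum size (a TxS interaction scenario)
data = simulate_trial(6, [0.1 * i for i in range(1, 7)], "mixture")
print([len(t) + len(c) for t, c in data])
```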

  38. Simulation ResultsType I Error Rate (nominal  = 5%)Normal Distribution Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)
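The 5.92% threshold used on slides 38 to 41 is the nominal rate plus three binomial standard errors of a rejection rate estimated from 5000 simulations: $0.05 + 3\sqrt{0.05 \times 0.95/5000} \approx 0.05 + 3(0.00308) \approx 0.0592$. A short Python check of that arithmetic:

```python
import math

alpha = 0.05     # nominal type I error rate
n_sims = 5000    # simulations per scenario
se = math.sqrt(alpha * (1 - alpha) / n_sims)  # binomial standard error of an estimated rate
upper = alpha + 3 * se
print(f"{100 * upper:.2f}%")  # prints 5.92%
```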

  39. Simulation ResultsType I Error Rate (nominal  = 5%)Lognormal Distribution Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

  40. Simulation ResultsType I Error Rate (nominal  = 5%)Mixture of Normals Distribution Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

  41. Simulation ResultsType I Error Rate (nominal  = 5%)t3 Distribution Note: 5.00% + 3 std. errors = 5.92% (5000 simulations)

  42. Simulation Results: Power (%), No T x S interaction (constant $\Delta_i$)

  43. Simulation Results: Power (%), No T x S interaction

  44. Simulation Results: Power (%), Normal Distribution

  45. Simulation Results: Power (%), Lognormal Distribution

  46. Simulation Results: Power (%), Mixture of Normals

  47. Simulation Results: Power (%), t3 Distribution

  48. Conclusions (Part II). For rank-based analyses of stratified trials:
     > No single method is uniformly the best.
     • Recommendation: use the aligned rank test (Talign) or either of the proposed adaptive tests (Tadap1 or Tadap2). These tests were more powerful than the van Elteren test (TvE) in every case studied, notably so when there was a true (but hard-to-detect) T x S interaction.
     > It is time to retire the popular van Elteren test!
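The transcript does not spell out how Talign is computed. As a hedged illustration only, here is one common aligned-rank construction in the spirit of Hodges & Lehmann (1962) and Öhrvik (1999): align each stratum by subtracting a location estimate of its pooled data (the median here, which is an assumption), rank all aligned values together, and refer the treatment rank sum to its within-stratum permutation distribution via a normal approximation. The slides' Talign, and the adaptive tests, may differ; the function name and data are hypothetical.

```python
import math
import statistics


def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def aligned_rank_test(strata, align=statistics.median):
    """strata: list of (treatment_values, control_values). Returns (z, one-sided p)."""
    aligned, labels = [], []
    for s, (trt, ctl) in enumerate(strata):
        loc = align(list(trt) + list(ctl))  # removes the stratum's location
        aligned += [v - loc for v in trt] + [v - loc for v in ctl]
        labels += [(s, True)] * len(trt) + [(s, False)] * len(ctl)
    ranks = midranks(aligned)  # one ranking of all aligned observations
    num = var = 0.0
    for s in range(len(strata)):
        r_all = [r for r, (g, _) in zip(ranks, labels) if g == s]
        r_trt = [r for r, (g, t) in zip(ranks, labels) if g == s and t]
        n, m = len(r_all), len(r_trt)
        rbar = sum(r_all) / n
        num += sum(r_trt) - m * rbar  # treatment rank sum minus its expectation
        # variance of a sum of m ranks drawn without replacement within the stratum
        var += m * (n - m) * sum((x - rbar) ** 2 for x in r_all) / (n * (n - 1))
    z = num / math.sqrt(var)
    return z, 0.5 * math.erfc(z / math.sqrt(2))


ctl1 = [0.0, 1.0, 2.0, 3.0, 4.0]
ctl2 = [v + 100 for v in ctl1]  # a stratum with a very different location
strata = [([v + 1.5 for v in ctl1], ctl1), ([v + 1.5 for v in ctl2], ctl2)]
print(aligned_rank_test(strata))
```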

  49. References
     • Brunner, E., Puri, M. L., and Sun, S. (1995). Nonparametric Methods for Stratified Two-Sample Designs with Application to Multiclinic Trials. Journal of the American Statistical Association, 90, 1004-1014.
     • Hodges, J. L. and Lehmann, E. L. (1962). Rank Methods for Combination of Independent Experiments in the Analysis of Variance. Annals of Mathematical Statistics, 33, 482-497.
     • Mehrotra, D. V. and Railkar, R. (2000). Minimum Risk Weights for Comparing Treatments in Stratified Binomial Trials. Statistics in Medicine, 19, 811-825.
     • Wang, W., Mehrotra, D. V., Chan, I. S. F. and Heyse, J. F. (2006). Non-Inferiority/Equivalence Trials in Vaccine Development. Journal of Biopharmaceutical Statistics, 16, 429-441.
     • Öhrvik, J. (1999). Aligned Ranks: A Method of Gaining Efficiency in Rank Tests. http://www.stat.fi/isi99/proceedings/arkisto/varasto/hrvi0423.pdf
     • van Elteren, P. H. (1960). On the Combination of Independent Two Sample Tests of Wilcoxon. Bulletin of the International Statistical Institute, 37, 351-361.
