
Controlling the Experimentwise Type I Error Rate When Survival Analyses Are Planned for Subsets of the Sample.

Controlling the Experimentwise Type I Error Rate When Survival Analyses Are Planned for Subsets of the Sample. Greg Yothers, MA National Surgical Adjuvant Breast and Bowel Project (NSABP) University of Pittsburgh, Department of Statistics Joint Work With John Bryant, PhD





Presentation Transcript


  1. Controlling the Experimentwise Type I Error Rate When Survival Analyses Are Planned for Subsets of the Sample. Greg Yothers, MA; National Surgical Adjuvant Breast and Bowel Project (NSABP); University of Pittsburgh, Department of Statistics. Joint work with John Bryant, PhD; Director of the NSABP; University of Pittsburgh, Departments of Statistics and Biostatistics.

  2. This work concerns the design and analysis of clinical trials to compare treatment to control where we wish to test the primary hypothesis on several subgroups in addition to the global test. • Unless steps are taken to control for multiple comparisons, the type I error rate will be inflated in this situation. • Controlling for multiple comparisons generally leads to a loss of power so that subgroup analyses are often avoided. However, subgroup analyses often serve a legitimate scientific purpose, and should not be entirely avoided.

  3. To address this problem, we propose a method whereby a pre-specified experimentwise alpha is “spent” or allocated among the global (stratified) test and the constituent subset (stratum-level) tests. • We find the method to be efficient in terms of experimentwise power when the treatment effect in each stratum is in the same direction and the range of treatment effects across strata is not too great. • The procedure can be used to make the design of a clinical trial robust against the presence of a treatment-by-strata interaction when a significant interaction is not anticipated.

  4. Outline • Motivating Example: NSABP Protocol B-29. • Define Experimentwise Type I Error Rate. • Common methods of dealing with subgroup testing: How do they control the Type I error rate? • Multiple testing approach: Perform all tests at reduced nominal levels of significance so that the experimentwise Type I error rate is controlled. • Exploration of how to spend alpha on the individual tests to achieve ‘good’ operating characteristics for the overall experiment.

  5. NSABP B-29 Schema • Eligibility: T1, T2, or T3; pN0; M0; ER-positive. • Decision to use chemotherapy*: No Chemotherapy or Chemotherapy. • No Chemotherapy: stratification by age and pathologic tumor size, then Tamoxifen (Group 1) or Tamoxifen + Octreotide (Group 2). • Chemotherapy: stratification by age and pathologic tumor size, then AC + Tamoxifen (Group 3) or AC + Tamoxifen + Octreotide (Group 4). * The decision to use AC chemotherapy must be made prior to randomization.

  6. Design Considerations • H0: Relative Risk = 1. • Power of at least .8 to detect a relative risk of .75, using a .05-level two-sided stratified log-rank test. • Power requirements and assumptions about rates of accrual dictate the following: i) Accrual of 3,000 patients over 5 years with 3 years of additional follow-up. ii) Final analysis following the 400th event.
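As a rough plausibility check on these design figures, the power of a two-sided log-rank test can be approximated from the number of events alone. The sketch below uses Schoenfeld's normal approximation, which is an assumption brought in here rather than something stated on the slide, and further assumes 1:1 allocation and proportional hazards. The function name `logrank_power` is hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def logrank_power(events, relative_risk, alpha=0.05):
    """Approximate power of a two-sided log-rank test.

    Assumes 1:1 allocation and proportional hazards, so the standardized
    statistic is roughly N(sqrt(events) / 2 * ln(RR), 1).
    """
    z_crit = STD.inv_cdf(1.0 - alpha / 2.0)
    shift = math.sqrt(events) / 2.0 * abs(math.log(relative_risk))
    # Probability of landing in either rejection tail.
    return STD.cdf(shift - z_crit) + STD.cdf(-shift - z_crit)

# Design point from the slide: 400 events, relative risk .75, alpha = .05.
print(round(logrank_power(400, 0.75), 3))
```

Under these assumptions the result comes out a little above .8, which agrees with a final analysis at the 400th event.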

  7. Physicians involved in the design of the trial thought the effect of Octreotide would be unlikely to materially interact with chemotherapy status. • In planning the trial it was felt to be important to provide for individual tests for the effect of Octreotide in the presence of chemotherapy as well as in its absence. • It was considered unacceptable to treat these subgroup analyses as post hoc, or exploratory, so it was necessary to design an analysis plan that controlled the experimentwise error rate.

  8. Definition: Experimentwise Type I Error Rate. The probability of finding a significant difference between treatment and control on either the overall stratified test or any of the stratum-specific tests, given that no difference exists.

  9. Common approaches to controlling experimentwise Type I error rate • Unprotected Subgroup Tests – Perform the overall stratified test at level α; follow up with stratum-specific α-level tests. • Protected Subgroup Tests – Perform the overall stratified test at level α; follow up with stratum-specific α-level tests only if the treatment-by-strata interaction is significant. • Protected Subgroup Tests – Test for treatment-by-strata interaction at level α. If the interaction is significant, test for treatment effect individually in each stratum at level α. If the interaction is not significant, test for overall treatment effect at level α.

  10. Equivalence of Protection Schemes • The two alternatives for protecting the stratum-specific tests are actually quite similar in operating characteristics, since if both the interaction test and the overall stratified test are significant, it is almost certain that at least one stratum-level test will also be significant. • It can be shown that this is true with probability one in the case of k = 2 strata.

  11. Experimentwise Level of Significance

  12. Range of experimentwise type I error rate for protected and unprotected schemes. All tests performed at α = .05.

  13. Multiple testing approach • We now consider a multiple testing approach where one performs an overall test for treatment effect based on the stratified log-rank statistic followed by tests within each stratum. • All tests are carried out at reduced levels of significance so that the experimentwise level of significance is maintained at a specified rate.

  14. Let RRi represent the relative risk in the ith stratum.

  15. Definition: Experimentwise Power. The probability of detecting at least one significant difference during the multiple testing procedure, given the true RR in each stratum. When the true RR in each stratum is 1, we refer to the power as the Type I error rate.

  16. The experimentwise power against a specific alternative hypothesis can be written as: Where φ denotes the standard normal density, and the integral is taken over the acceptance region defined by:
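For k = 2 strata, one way to write a power expression of this kind is sketched below. It rests on assumptions that are not stated on the slide: the stratum log-rank statistics Z_1 and Z_2 are treated as independent normals with unit variance, allocation is 1:1, d_i is the number of events in stratum i, a = d_1 / (d_1 + d_2), and c_0, c_1, c_2 are the two-sided critical values for the overall and stratum-level tests.

```latex
\[
  \text{Power} \;=\; 1 - \iint_{A} \phi(z_1 - \mu_1)\,\phi(z_2 - \mu_2)\,dz_1\,dz_2 ,
\]
\[
  A \;=\; \bigl\{ (z_1, z_2) : |z_1| < c_1,\; |z_2| < c_2,\;
          \bigl|\sqrt{a}\,z_1 + \sqrt{1-a}\,z_2\bigr| < c_0 \bigr\},
\]
\[
  \mu_i \;=\; \tfrac{1}{2}\sqrt{d_i}\,\ln RR_i ,
  \qquad
  c_j \;=\; \Phi^{-1}\!\bigl(1 - \alpha_j / 2\bigr), \quad j = 0, 1, 2 .
\]
```

Setting RR_1 = RR_2 = 1 gives μ_1 = μ_2 = 0, and the same expression then yields the experimentwise type I error rate.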

  17. Using the simplified region of integration, we can rewrite the power as follows: Where Φ is the CDF of the standard normal distribution.
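A single integral of this kind is straightforward to evaluate numerically. The following is a minimal sketch for k = 2 strata, not the authors' code. It assumes independent stratum statistics Z1 ~ N(mu1, 1) and Z2 ~ N(mu2, 1) and an overall statistic Z0 = sqrt(a) Z1 + sqrt(1 - a) Z2, where a is the proportion of events in the first stratum. The inner integral over z2 is taken in closed form with Φ. The function name and the Schoenfeld-style means in the usage lines are illustrative assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def experimentwise_power(c0, c1, c2, mu1, mu2, a, n=4000):
    """P(the overall test or at least one stratum test rejects), k = 2.

    c0, c1, c2: two-sided critical values (overall, stratum 1, stratum 2).
    a: proportion of events in stratum 1, with 0 < a < 1.
    """
    wa, wb = math.sqrt(a), math.sqrt(1.0 - a)
    h = 2.0 * c1 / n
    accept = 0.0
    for i in range(n):
        z1 = -c1 + (i + 0.5) * h  # midpoint rule over |z1| < c1
        # z2 must satisfy |z2| < c2 and |wa*z1 + wb*z2| < c0.
        lo = max(-c2, (-c0 - wa * z1) / wb)
        hi = min(c2, (c0 - wa * z1) / wb)
        if hi > lo:
            accept += STD.pdf(z1 - mu1) * (STD.cdf(hi - mu2) - STD.cdf(lo - mu2)) * h
    return 1.0 - accept

c = STD.inv_cdf(0.975)
# All three tests at the .05 level with RR = 1 in both strata (unprotected scheme).
print(experimentwise_power(c, c, c, 0.0, 0.0, a=0.5))
# Hypothetical alternative: 200 events per stratum, RR = .75 in each.
mu = math.sqrt(200) / 2.0 * math.log(0.75)
print(experimentwise_power(c, c, c, mu, mu, a=0.5))
```

With both means set to zero the function returns the experimentwise type I error rate, which for the unprotected scheme exceeds the nominal .05.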

  18. These results generalize to k strata as follows:

  19. The multiple integral in the previous equation can be difficult to evaluate when the number of strata goes beyond about 3 or 4. • Fortunately there is a recursive representation of the power function that facilitates computation when there are many strata.
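One way to organize such a recursion is sketched below. This is an assumed formulation, not the authors' recursion and not their S-Plus function. Conditional on the partial weighted sum s of the strata processed so far, let G_j(s) be the probability that all remaining strata and the overall test accept; then G_j(s) is a one-dimensional integral of G_{j+1}. Each G_j is tabulated on a grid and linearly interpolated, so the cost grows linearly with the number of strata. The model (independent Z_i ~ N(means[i], 1), overall statistic equal to the sum of sqrt(d_i / D) Z_i) and all names are assumptions.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def experimentwise_power_k(c0, crits, means, events, grid=401, quad=400):
    """P(overall or any stratum test rejects) for k = len(events) strata.

    crits must be finite and positive; c0 may be math.inf (no overall test).
    """
    total = float(sum(events))
    w = [math.sqrt(d / total) for d in events]
    k = len(w)
    # bound[j]: largest |partial sum| reachable while the first j strata accept.
    bound = [0.0]
    for wi, ci in zip(w, crits):
        bound.append(bound[-1] + wi * ci)

    def last(s):
        # Final stratum in closed form, given partial sum s of the others.
        lo = max(-crits[-1], (-c0 - s) / w[-1])
        hi = min(crits[-1], (c0 - s) / w[-1])
        return max(0.0, STD.cdf(hi - means[-1]) - STD.cdf(lo - means[-1]))

    g = last
    for j in range(k - 2, -1, -1):
        b = bound[j + 1]
        step = 2.0 * b / (grid - 1)
        table = [g(-b + i * step) for i in range(grid)]  # tabulate G_{j+1}
        h = 2.0 * crits[j] / quad
        nodes = []
        for i in range(quad):
            z = -crits[j] + (i + 0.5) * h  # midpoint rule over |z| < crits[j]
            nodes.append((w[j] * z, STD.pdf(z - means[j]) * h))

        def g(s, b=b, step=step, table=table, nodes=nodes):
            # G_j(s): integrate the interpolated G_{j+1} against the density of Z_j.
            acc = 0.0
            for shift, weight in nodes:
                t = (s + shift + b) / step
                i = min(max(int(t), 0), grid - 2)
                acc += weight * (table[i] + (t - i) * (table[i + 1] - table[i]))
            return acc

    return 1.0 - g(0.0)

c = STD.inv_cdf(0.975)
# Three equally sized strata, every test at the .05 level, RR = 1 everywhere.
print(experimentwise_power_k(c, [c, c, c], [0.0, 0.0, 0.0], [100, 100, 100]))
```

The grid and quadrature sizes trade accuracy for speed; the defaults are arbitrary choices for illustration.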

  20. An S-Plus function implementing the recursive method of calculating power is available.

  21. How should we spend alpha? • The question arises as to how the type I error rate should be divided between the overall and the stratum-specific tests, that is, how much alpha should be spent on the stratum-specific tests. • For k = 2 strata and αexper = 0.05, the table and figure which follow show a variety of combinations of the nominal size of the overall test (α0) and the nominal size of the within-stratum tests (α1 and α2). • For simplicity, we only consider the case where α1 = α2. The possibilities form a continuum from (.05, 0) (no stratum-specific tests) to (0, .0253) (no overall test). • Given αexper, α0, and the constraint α1 = α2, the common value of α1 and α2 is a function of a (the proportion of events in the first stratum); however, the effect of varying a is weak.
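The dependence of the common stratum-level size on α0 can be explored numerically. The sketch below is illustrative only: it assumes that under the null hypothesis the two stratum statistics are independent standard normals and that the overall statistic is sqrt(a) Z1 + sqrt(1 - a) Z2, then solves for α1 = α2 by bisection. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal

def crit(alpha):
    """Two-sided critical value; alpha = 0 means the test is never performed."""
    return math.inf if alpha <= 0.0 else STD.inv_cdf(1.0 - alpha / 2.0)

def null_error(alpha0, alpha1, a=0.5, n=2000):
    """Experimentwise type I error for k = 2 strata with alpha1 = alpha2."""
    c0, c1 = crit(alpha0), crit(alpha1)
    if math.isinf(c1):
        return alpha0  # only the overall test is performed
    wa, wb = math.sqrt(a), math.sqrt(1.0 - a)
    h = 2.0 * c1 / n
    accept = 0.0
    for i in range(n):
        z1 = -c1 + (i + 0.5) * h  # midpoint rule over |z1| < c1
        lo = max(-c1, (-c0 - wa * z1) / wb)
        hi = min(c1, (c0 - wa * z1) / wb)
        if hi > lo:
            accept += STD.pdf(z1) * (STD.cdf(hi) - STD.cdf(lo)) * h
    return 1.0 - accept

def stratum_alpha(alpha0, alpha_exper=0.05, a=0.5):
    """Bisection for the common alpha1 = alpha2 that uses up alpha_exper."""
    lo, hi = 0.0, alpha_exper
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if null_error(alpha0, mid, a) < alpha_exper:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(stratum_alpha(0.0))    # no overall test: close to 1 - sqrt(0.95), about .0253
print(stratum_alpha(0.045))  # most of alpha kept for the overall test
```

Calling `stratum_alpha` with different values of `a` gives a quick way to look at how strongly the result depends on the split of events between strata.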

  22. Conclusion • The alpha spending approach described here is efficient when the treatment effect is in the same direction in each stratum, whether or not there are small to moderate differences in the size of the effect between strata. • The method is also sensitive to the balance of allocation of patients (events) to the two strata. When the sizes of the stratum-level tests are equal, the approach seems to be quite effective when the balance is no worse than about 3 to 1. We suggest spending more alpha on the stratum with more patients (events) when the number of patients is out of balance.

  23. Spending between ½ and 1 percentage point of alpha on the stratum-level tests (setting α0 between .045 and .04) would seem to be a prudent choice for k = 2 strata and the range of circumstances explored in this paper when substantial interaction is thought to be unlikely a priori. • When there is no overall effect but there may be offsetting effects between the strata, the alpha spending approach is not very powerful. Designing the trial for a test for interaction would be much more effective in this situation. If one were to use the multiple testing procedure in this situation, most of the alpha should be spent on the within-strata tests.

  24. In the design of NSABP protocol B-29, we expected little or no interaction and nearly equal accrual to the two stratum levels. Given our design assumptions in B-29, we spent about ½ percentage point of alpha on stratum-level tests and set the sizes of the stratum-level tests equal. If we had anticipated unequal accrual to the strata or significant interaction, we likely would have altered our choices. Our choice of alpha spending (α0 ≈ 0.045, α1 = α2 ≈ 0.006) proved to preserve power in the presence of mild perturbations of design assumptions.

  25. The tools described in this paper can be adapted to the design of other potential trials. Given prior beliefs regarding the likelihood of significant treatment-strata interaction, balance of accrual to the stratum levels, and other factors, one can explore the sensitivity of power to design assumptions and parameters much as we have in the latter part of this paper.
