Learn how Factorial Trials are used to simultaneously test multiple interventions, yielding valuable insights into treatment interactions and balancing effects to prevent CNS relapse in ALL patients.
Case Study 7
Multi-tasking via Factorial Trials (1): Preventing CNS relapse in ALL. See Tubergen and colleagues in the (former) Children’s Cancer Group; JCO 11: 520-26, 1993.
Multi-tasking via Factorial Trials (2): The Linxian Cataract Prevention Trial. See the General Population Trial in Sperduto et al.; Arch. Ophthalmol. 111: 1246-1253, 1993.
Materials
• Preventing CNS Relapse in ALL: Tubergen paper
• Cataract Prevention: Sperduto paper
• Background on Factorials: Encyclopedia of Biostatistics article
Key Words: Factorial trials; design; power calculations; odds ratios; subgroups; multiple comparisons
Rick Chappell, Ph.D., Professor, Department of Biostatistics and Medical Informatics, University of Wisconsin Medical School
Stat 542 – Spring 2018, Week ??
A. Background
Definition of Factorial Trial: A Factorial Design is a Parallel Arm trial (each patient receives a single combination of treatments, as opposed to Crossover trials, in which each patient receives 2 or more) that simultaneously examines the effects of 2 or more interventions (Factors). Each factor has two or more possible values (Levels). Factorial designs are usually Complete (“fully-crossed”), meaning every combination of factor levels is present, and usually at least approximately Balanced, meaning all combinations have about the same sample size. Why might they not be exactly balanced?
Our examples:
• The Linxian General Population Trial has 4 dietary supplement factors, each with 2 levels, abbreviated as a “2 x 2 x 2 x 2 factorial”, with potentially 16 treatment combinations but actually only 8. See below.
• The CCG ALL trial has 2 CNS and 4 systemic regimens, making it a “2 x 4 factorial” with 8 combinations.
Factors are often abbreviated “A, B, ...”; Levels are often abbreviated “- / +” or “1 / 2 / ...”.
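The combination counts quoted above can be checked by enumerating a fully-crossed design. The following is only a minimal sketch: the function name `full_factorial` and the coding of levels as 1, 2, ... are illustrative assumptions, not anything taken from the two papers.

```python
from itertools import product

def full_factorial(levels_per_factor):
    """Return every combination of levels for a complete (fully-crossed) design.

    levels_per_factor: one entry per factor, giving its number of levels.
    Levels are coded 1, 2, ... (an illustrative convention).
    """
    level_ranges = [range(1, k + 1) for k in levels_per_factor]
    return list(product(*level_ranges))

# A complete 2 x 2 x 2 x 2 design (what the Linxian trial would have been if complete)
linxian_complete = full_factorial([2, 2, 2, 2])
# The 2 x 4 design of the CCG ALL trial (2 CNS regimens x 4 systemic regimens)
ccg_all = full_factorial([2, 4])

print(len(linxian_complete))  # 16
print(len(ccg_all))           # 8
```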
Advantages of Factorial Trials:
• We get 2 (or more) trials for the price of one; that is, two or more interventions are simultaneously tested on the same patients.
• We get important information on interactions or synergies between the treatments. An important (perhaps the most important) result of the Tubergen ALL study is the clinical consequence of simultaneous CNS prophylaxis and intensive systemic therapy.
• Each level of each factor is balanced with respect to the other factors, so treatment effects can be computed simply. E.g., for a 2 x 2 trial, each level of Treatment A has the same 50% - 50% composition of Treatment B’s levels. There is no confounding.
Disadvantages of Factorial Trials:
• Logistically complex. In the Linxian study, “A factorial trial of each of the nine nutrients was impractical” [p. 1247].
• Interpretation may be artificial. In a 2 x 2 study with factors A and B, we will estimate the overall effect of A “in a population which has approximately 50% in each of the two levels of B”. Subgroup analyses of the effect of A in patients receiving Level 1 of B (etc.) are possible, but they use only half the patients and have lower power.
• If there is a negative interaction (negative synergy) then power can be greatly reduced.
If there is a negative interaction (negative synergy), power can be greatly reduced. To see why, first suppose we have a 2 x 2 trial with no interaction; the effect of A is the same regardless of the level of B, and conversely:
• A Level 1, B Level 1: 50% response
• A Level 2, B Level 1: 60%
• A Level 1, B Level 2: 55%
• A Level 2, B Level 2: 65%
Then the average effect of A is [(60% - 50%) + (65% - 55%)]/2 = 10%. The effect of B, averaged over levels of A, is [(55% - 50%) + (65% - 60%)]/2 = 5%.
If there is a positive interaction (the effect of combining two treatments is greater than the sum of the individual effects) then we have no problem:
• A Level 1, B Level 1: 50% response
• A Level 2, B Level 1: 60%
• A Level 1, B Level 2: 55%
• A Level 2, B Level 2: 75%
Then the average effect of A is [(60% - 50%) + (75% - 55%)]/2 = 15%. The effect of B, averaged over levels of A, is [(55% - 50%) + (75% - 60%)]/2 = 10%. Power is higher than in the no-interaction case.
If there is a negative interaction (the effect of combining two treatments is less than the sum of the individual effects) then our power can be greatly reduced:
• A Level 1, B Level 1: 50% response
• A Level 2, B Level 1: 60%
• A Level 1, B Level 2: 55%
• A Level 2, B Level 2: 60%
Then the average effect of A is [(60% - 50%) + (60% - 55%)]/2 = 7.5%. The effect of B, averaged over levels of A, is [(55% - 50%) + (60% - 60%)]/2 = 2.5%. Since sample size is proportional to 1/(effect squared), and the effect of B has fallen from 5% (no interaction) to 2.5%, we now need a (5/2.5)^2 = 4 times larger n to detect the effect of B.
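The arithmetic in the three scenarios above can be reproduced with a short sketch. The function names and the sign convention for the interaction term (effect of A at B Level 2 minus effect of A at B Level 1) are illustrative assumptions; the response percentages are the ones given on the slides.

```python
def main_effects(p11, p21, p12, p22):
    """Average effects in a 2 x 2 factorial, in percentage points.

    p{a}{b} is the response percentage at A Level a, B Level b.
    Returns (effect of A, effect of B, interaction).
    """
    effect_a = ((p21 - p11) + (p22 - p12)) / 2  # A averaged over levels of B
    effect_b = ((p12 - p11) + (p22 - p21)) / 2  # B averaged over levels of A
    interaction = (p22 - p12) - (p21 - p11)     # change in A's effect across B
    return effect_a, effect_b, interaction

def sample_size_ratio(reference_effect, observed_effect):
    """Relative n needed, using n proportional to 1 / effect**2."""
    return (reference_effect / observed_effect) ** 2

no_interaction = main_effects(50, 60, 55, 65)
positive = main_effects(50, 60, 55, 75)
negative = main_effects(50, 60, 55, 60)

print(no_interaction)  # (10.0, 5.0, 0)
print(positive)        # (15.0, 10.0, 10)
print(negative)        # (7.5, 2.5, -5)

# Effect of B drops from 5 to 2.5 points under the negative interaction
print(sample_size_ratio(no_interaction[1], negative[1]))  # 4.0
```

The same ratio applied to factor A gives (10/7.5)^2, roughly 1.8, so both main effects lose power under this negative interaction, B more severely.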
B. The Linxian General Population Trial
The Linxian General Population Trial has 4 dietary supplement factors, each with 2 levels, abbreviated as a “2 x 2 x 2 x 2 factorial”.
From the Abstract:
• Objective: “To determine whether the vitamin/mineral supplements used in two cancer intervention trials affected the risk of developing age-related cataracts.”
• Participants: “... 3,249 ... 45 to 74 years. ... [a] nutritionally deprived population ... ”.
• Main Outcome Measures: “[5-6 year] prevalence of nuclear, cortical, and posterior subcapsular cataracts in treatment groups at end of trials.”
From the Abstract: Interventions: “... factorial design to test the effect of four different vitamin/mineral combinations in trial 2 ...” This implies 2 x 2 x 2 x 2 = 16 combinations. However [see Table 2 for treatment definitions], they administered only 8 combinations: Placebo, AB, AC, AD, BC, BD, CD, and ABCD. Thus it is an incomplete factorial. Why didn’t they administer any single or triplet factor combinations? Why did they combine nutrients at all? Remember the “double-dummy” logistical problems in the Binkley Vitamin K case study!
What information is lost by the absence of the single and triplet factor combinations?
Remember the disadvantage: If there is a negative interaction (negative synergy) then power can be greatly reduced. How does this apply to the Linxian trial? Suppose that any supplement prevents cataracts to the same extent, but that the effects aren’t cumulative: more than one supplement doesn’t help beyond the effect of a single one (negative interaction; negative synergy).
Suppose that we want to examine the effect of nutrition factor A. We compare the first group of patients (with “A”) to the second group (without “A”):
With A:
• AB (retinol, riboflavin, niacin)
• AC (retinol, C, Mo)
• AD (retinol, Se, E, β-carotene)
• ABCD (retinol, riboflavin, niacin, C, Mo, Se, E, β-carotene)
Without A:
• Placebo
• BC (riboflavin, niacin, C, Mo)
• BD (riboflavin, niacin, Se, E, β-carotene)
• CD (C, Mo, Se, E, β-carotene)
Under the assumption of “any nutrient helps, but not additively”, 7 of the 8 treatments (all except Placebo) have the same effect. Thus the effect of A is greatly diluted by the others.
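To put a number on the dilution, here is a minimal sketch under loudly hypothetical assumptions: the cataract prevalences (30% on placebo, 20% on any supplemented arm) are invented for illustration and are not results from the Linxian trial, and the eight arms are assumed to be of equal size.

```python
from statistics import mean

# The 8 administered arms, each mapped to the set of factors it contains.
ARMS = {
    "Placebo": set(),
    "AB": {"A", "B"},
    "AC": {"A", "C"},
    "AD": {"A", "D"},
    "BC": {"B", "C"},
    "BD": {"B", "D"},
    "CD": {"C", "D"},
    "ABCD": {"A", "B", "C", "D"},
}

# HYPOTHETICAL prevalences (percent), chosen only to illustrate the argument.
PLACEBO_RATE = 30.0
SUPPLEMENTED_RATE = 20.0  # "any nutrient helps, but not additively"

def arm_prevalence(factors):
    """Prevalence in an arm: any supplement at all gives the full benefit."""
    return SUPPLEMENTED_RATE if factors else PLACEBO_RATE

def marginal_effect(factor):
    """Mean prevalence in arms with the factor minus arms without it (equal arm sizes assumed)."""
    with_f = [arm_prevalence(f) for f in ARMS.values() if factor in f]
    without_f = [arm_prevalence(f) for f in ARMS.values() if factor not in f]
    return mean(with_f) - mean(without_f)

true_single_effect = SUPPLEMENTED_RATE - PLACEBO_RATE  # -10 points
observed_effect_a = marginal_effect("A")               # -2.5 points

print(observed_effect_a)                                # -2.5
print(true_single_effect / observed_effect_a)           # 4.0
print((true_single_effect / observed_effect_a) ** 2)    # 16.0
```

Under these made-up numbers the factorial comparison sees only a quarter of the benefit of a single supplement versus placebo, which by the 1/(effect squared) rule would call for roughly 16 times the sample size.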
Results of the Linxian Trial: Tables 7 and 8 give general results for Factors A, B, C and D on the 3 types of cataract, broken down by age where significant. Table 6 gives subgroup analyses by age group. The authors used logistic regression, a method to simultaneously adjust for important factors which are more common in one group than another (“confounding”). But the factorial is balanced with respect to Treatments B, C & D (which, by design, have equal frequencies in the two levels of Treatment A), and randomization approximately balances all other factors, so we don't need it.
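The balance claim can be checked directly from the list of eight administered arms. This is a minimal sketch that assumes equal arm sizes (so that counting arms stands in for counting patients); the function name `level_composition` is an illustrative choice.

```python
# The 8 administered arms, each mapped to the set of factors it contains.
ARMS = {
    "Placebo": set(),
    "AB": {"A", "B"},
    "AC": {"A", "C"},
    "AD": {"A", "D"},
    "BC": {"B", "C"},
    "BD": {"B", "D"},
    "CD": {"C", "D"},
    "ABCD": {"A", "B", "C", "D"},
}

def level_composition(target, other):
    """Fraction of arms that include `other`, among arms with and without `target`."""
    with_target = [f for f in ARMS.values() if target in f]
    without_target = [f for f in ARMS.values() if target not in f]

    def fraction(group):
        return sum(other in f for f in group) / len(group)

    return fraction(with_target), fraction(without_target)

# Each of B, C, D appears in half the arms whether or not A is given.
for other in "BCD":
    print(other, level_composition("A", other))
# B (0.5, 0.5)
# C (0.5, 0.5)
# D (0.5, 0.5)
```

So even though the factorial is incomplete, comparing the "with A" and "without A" groups is not confounded by the other three factors, provided the arms are about the same size.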
Results of the Linxian Trial – Odds Ratios: Look at Table 7. It has percentages, which of course are easily interpretable (good!). Odds ratios are often reported because they are convenient for case-control studies and also because, unlike percentages, their logarithms range from -infinity to +infinity and so can be modeled. They are easily calculated: Odds = proportion/(1 - proportion). They are interpreted, e.g., as follows: "The odds ratio of nuclear cataract with respect to regimen B (riboflavin/niacin supplementation) vs. no B" = [.120 / (1 - .120)] / [.151 / (1 - .151)] = .77
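The odds-ratio calculation above is easy to reproduce. A minimal sketch follows; the function names are illustrative, and the two proportions are the ones quoted on the slide.

```python
import math

def odds(proportion):
    """Odds = proportion / (1 - proportion)."""
    return proportion / (1 - proportion)

def odds_ratio(p_treated, p_control):
    """Odds of the outcome under treatment divided by odds without it."""
    return odds(p_treated) / odds(p_control)

or_nuclear = odds_ratio(0.120, 0.151)
print(round(or_nuclear, 2))            # 0.77
# The log odds ratio is unbounded in both directions, which is what makes it convenient to model.
print(math.log(or_nuclear) < 0)        # True: an odds ratio below 1 has a negative log
```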
Results of the Linxian Trial – Multiple Comparisons: To avoid accusations of “fishing for a low p-value”, thereby inflating the type-I error (α), “a single outcome in a single group” is standard for a trial. This is sometimes impractical. A common (and the simplest) way to adjust for the problem is the Bonferroni Correction, where p-values are multiplied by the number of comparisons. That is rarely done across the factors of a factorial study, which may be considered several separate trials.
Results of the Linxian Trial – Multiple Comparisons: Here we have (4 age subgroups plus total = 5) x (3 pathologies plus overall = 4) x (3 interventions) = 60 p-values, or 20 for each intervention. Under the "grand null hypothesis" of no effect (anywhere) tested at the .05 level, we would expect on average approximately 60 x .05 = 3 false positives, or 20 x .05 = 1 false positive per intervention. They didn’t use any corrections, even for the subgroup analyses.
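The counting above, together with the Bonferroni correction mentioned earlier, can be sketched as follows. The function names are illustrative, and the probability of at least one false positive is computed under an added assumption of independent tests, which the slides do not make (subgroup and overall analyses of the same patients are in fact correlated).

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives under the grand null hypothesis."""
    return n_tests * alpha

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value: multiply by the number of comparisons, cap at 1."""
    return min(1.0, p_value * n_tests)

def prob_any_false_positive(n_tests, alpha=0.05):
    """P(at least one false positive), ASSUMING the tests are independent."""
    return 1 - (1 - alpha) ** n_tests

n_age_groups = 4 + 1      # 4 age subgroups plus total
n_pathologies = 3 + 1     # 3 pathologies plus overall
n_interventions = 3       # as counted on the slide
n_total = n_age_groups * n_pathologies * n_interventions

print(n_total)                                        # 60
print(round(expected_false_positives(n_total), 2))    # 3.0
print(round(expected_false_positives(20), 2))         # 1.0
print(round(prob_any_false_positive(n_total), 2))     # about 0.95 under independence
print(bonferroni(0.01, n_total))                      # a nominal p = .01 no longer clears .05
```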
Results of the Linxian Trial – Multiple Comparisons: Moral: Beware subgroup analyses, especially those which weren’t pre-specified. Beware even those which were pre-specified, unless some sort of formal adjustment is made (as in biomarker-specific trials).
• Do we adjust for multiple comparisons in a trial of an estrogen agonist in women with & without ER positive breast Ca?
• Do we adjust for multiple comparisons when we look at both efficacy and toxicity of a treatment?
C. The (Former) Children’s Cancer Group Trial of CNS Relapse in ALL:
Background: The brain is a common site of relapse in leukemia patients; however, whole-brain radiation causes learning deficits (~10 points IQ), especially in young patients.
Design: 2 x 4 factorial. Randomized 1,388 children with intermediate-risk ALL to intrathecal methotrexate (IT MTX) vs. cranial radiation (CXRT). Also randomized to 4 systemic therapies (3 “intensified” BFM and 1 CCG standard). See Fig. 1.
From the Abstract: Purpose: “This study (CCG 105) was designed in part to determine whether ... IT MTX could provide protection equivalent to ... CXRT ...”
Results: There were no overall differences between CXRT and IT MTX; see Table 2. However [p. 523], “There was a highly significant difference (p < .0001) caused by the increased proportion of CNS events on standard systemic therapy, especially for those who received IT MTX.” See Figure 4 and Table 5. There also may be an interaction with age.
From the Abstract: Conclusion: “IT MTX ... provides protection from CNS relapse in patients with intermediate-risk ALL equivalent to that provided by CXRT if more intensive systemic therapy is given.”
Questions on the CCG ALL Trial
• This study showed a negative interaction between CXRT (vs. IT MTX) and intensified (vs. standard) systemic therapy. Why is that useful?
• What clinical practice would result from separate trials of the two factors?
• What kind of Bonferroni correction would make “p < .0001” greater than .05?
• Why is age important here?