Explore the concepts, effects, and misconceptions of interaction for binary outcomes in logistic regression. Learn about epistasis, effect modification, risk difference, and biologic interaction. Dive into logistic regression advantages and statistical interpretations.
What is Interaction for a Binary Outcome? Chun Li, Department of Biostatistics, Center for Human Genetics Research. September 19, 2007
What We Have Learned • Little. • Generic. • In linear regression: y = β0 + β1x1 + β2x2 + β3x1x2 • In any other regression, the right-hand side is likewise β0 + β1x1 + β2x2 + β3x1x2 • For a binary outcome, we often use logistic regression. For example, for the log-odds of cancer: log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking, where β1×sex and β2×smoking are the “main effect” terms and β3×sex×smoking is the “interaction effect” term.
Interaction • Introduced by R. A. Fisher to generalize the concept “epistasis” in genetics. • The concept is ubiquitous. • The word sounds easy to understand, and is charismatic in some circles. • Ambiguous without model context. • Hard to interpret and translate to reality for some models, such as logistic regression.
Epistasis • Example: Genotype BB masks the effect of gene A. • It is a very special type of interaction. • Such a phenomenon can be seen in other contexts, e.g. gene-environment interaction.
“No Interaction” ≠ Independence • Interaction is about the joint effect of input variables on an outcome, or how the effect of one input variable changes as the values of the other input variables change. • Independence is about the statistical relationship between input variables, irrespective of the outcome or the effect on the outcome. • Using “independent effect” to describe “no interaction” may be confusing.
Interaction = Effect Modification • Effect modification: The effect of one variable on the outcome is modified depending on the values of other variables. • It depends on how “effect” is measured and on what scale (Kenneth Rothman, Sander Greenland). • For a binary outcome, “effect” can be measured as • risk difference • risk ratio • odds ratio
Measuring Effect: Risk Difference
“Effect” of smoking: R01 – R00 (in males); R11 – R10 (in females)
If gender doesn’t modify the “effect” of smoking, then
R01 – R00 = R11 – R10 = R•1 – R•0 (!)
Equivalent forms:
• R11 – R00 = (R10 – R00) + (R01 – R00) = (R1• – R0•) + (R•1 – R•0)
• RR11 – 1 = (RR10 – 1) + (RR01 – 1), where RRij = Rij / R00
• Additive decomposition of risk: Rij = ai + bj
Measuring Effect: Risk Ratio
“Effect” of smoking: R01 / R00 (in males); R11 / R10 (in females)
If gender doesn’t modify the “effect” of smoking, then
R01 / R00 = R11 / R10 = R•1 / R•0 (!)
Equivalent forms:
• RR11 = RR10 × RR01
• RR11 = (R1• / R0•) × (R•1 / R•0)
• Multiplicative decomposition of risk: Rij = ci × dj
Measuring Effect: Odds Ratio
“Effect” of smoking: O01 / O00 (in males); O11 / O10 (in females), where O** = R** / (1 – R**)
If gender doesn’t modify the “effect” of smoking, then
O01 / O00 = O11 / O10 ≠ O•1 / O•0 in general (?!?)
Equivalent forms:
• OR11 = OR10 × OR01, where ORij = Oij / O00
• Additive decomposition of log-odds ln(Oij)
Even if gender doesn’t modify the effect of smoking, smoking’s marginal effect may be different from its gender-specific effect (!?!).
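A small numerical sketch of the last point may help. The risks below are hypothetical, and the code assumes that half of smokers and half of non-smokers are female, so sex is not a confounder. The gender-specific odds ratios are equal, yet the marginal odds ratio is smaller, while the risk difference stays the same after pooling.

```python
def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1.0 - r)

# Hypothetical risks R[(sex, smoking)]; sex 0 = male, 1 = female.
R = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.8}

# Assumption: the same proportion of females among smokers and non-smokers.
p_female = 0.5

# Gender-specific odds ratios of smoking.
or_males = odds(R[(0, 1)]) / odds(R[(0, 0)])
or_females = odds(R[(1, 1)]) / odds(R[(1, 0)])

# Marginal risks R•0 and R•1, averaging over sex.
marg = {s: (1 - p_female) * R[(0, s)] + p_female * R[(1, s)] for s in (0, 1)}
or_marginal = odds(marg[1]) / odds(marg[0])

# Risk differences, for comparison.
rd_males = R[(0, 1)] - R[(0, 0)]
rd_females = R[(1, 1)] - R[(1, 0)]
rd_marginal = marg[1] - marg[0]

print(f"odds ratio: males={or_males:.3f} females={or_females:.3f} marginal={or_marginal:.3f}")
print(f"risk diff : males={rd_males:.3f} females={rd_females:.3f} marginal={rd_marginal:.3f}")
```

In this example the odds ratio is about 4 in each sex but about 3.45 marginally, which is the non-collapsibility flagged by “(?!?)” on the slide.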
Interaction = Effect Measure Modification • “No interaction” under one definition often means interaction under another definition. • Results from interaction analysis should always be reported with the scale that was used to measure effect. • Some effect measures are intuitive; others are not, and some are not even intrinsically consistent.
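To illustrate the scale dependence, here is a minimal sketch with hypothetical risks chosen so that the risk difference for smoking is the same in both sexes. The same four risks show effect modification on the risk-ratio and odds-ratio scales.

```python
def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1.0 - r)

# Hypothetical risks R[(sex, smoking)]; sex 0 = male, 1 = female.
R = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.40}

def effects(R, sex):
    """Effect of smoking within one sex, on three different scales."""
    r0, r1 = R[(sex, 0)], R[(sex, 1)]
    return {
        "risk_difference": r1 - r0,
        "risk_ratio": r1 / r0,
        "odds_ratio": odds(r1) / odds(r0),
    }

males = effects(R, 0)
females = effects(R, 1)
for name in males:
    print(f"{name:16s} males={males[name]:.3f} females={females[name]:.3f}")
```

The risk differences agree (no interaction on the additive scale), but the risk ratios and odds ratios differ between males and females.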
Biologic Interaction • Biologic interaction = biologically causal interaction. • Greenland and Rothman argued that “biologic interaction” is reflected by departure from additive risks. • Counterfactual arguments • Causal pie arguments • Additive definition is difficult to test directly in case-control studies.
Advantages of Logistic Regression • For retrospective studies (e.g., case-control studies), risk difference and risk ratio cannot be estimated and analyzed. But odds ratio can! • Odds ratio doesn’t have boundary effect. Both risk difference and risk ratio do: • Interaction effect must exist under some circumstances. • May cause problems computationally. • Odds ratio ≈ risk ratio, when risks are very small.
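The last two bullets can be checked numerically. The sketch below uses made-up baseline risks and helper names of my own choosing. It compares the odds ratio with the risk ratio when the risk doubles, then shows the boundary effect: a risk ratio of 2 applied to a baseline risk of 0.6 gives an impossible risk, while an odds ratio of 2 does not.

```python
def odds(r):
    """Convert a risk (probability) to odds."""
    return r / (1.0 - r)

def risk_from_odds(o):
    """Convert odds back to a risk."""
    return o / (1.0 + o)

def apply_risk_ratio(r0, rr):
    """Risk implied by a risk ratio; may exceed 1."""
    return r0 * rr

def apply_odds_ratio(r0, odds_ratio):
    """Risk implied by an odds ratio; always stays between 0 and 1."""
    return risk_from_odds(odds(r0) * odds_ratio)

# Odds ratio versus risk ratio when the risk doubles from r0 to 2 * r0.
comparison = []
for r0 in (0.001, 0.01, 0.1, 0.3):
    r1 = 2.0 * r0
    comparison.append((r0, r1 / r0, odds(r1) / odds(r0)))

for r0, rr, odds_ratio in comparison:
    print(f"baseline risk={r0:<6} risk ratio={rr:.3f} odds ratio={odds_ratio:.3f}")

# Boundary effect at a high baseline risk.
boundary_rr = apply_risk_ratio(0.6, 2.0)
boundary_or = apply_odds_ratio(0.6, 2.0)
print(f"RR = 2 at baseline 0.6 gives risk {boundary_rr:.2f}")
print(f"OR = 2 at baseline 0.6 gives risk {boundary_or:.2f}")
```

The odds ratio stays close to 2 only for the small baseline risks and drifts well above it as the baseline risk grows.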
Misconception 1 Interaction terms are treated the same way as main-effect terms: • Numerical comparison between an interaction coefficient and a main-effect coefficient. • (logistic regression) Power to detect interaction when “interaction explains half of the total effect.” • (logistic regression) “Odds ratio” of the interaction. • Fact: They are apples and oranges.
Misconception Reinforced by Software • Stata output:

. logistic case v1 v2 v12

Logistic regression                               Number of obs   =       1530
                                                  LR chi2(3)      =      12.93
                                                  Prob > chi2     =     0.0048
Log likelihood = -878.77373                       Pseudo R2       =     0.0073

------------------------------------------------------------------------------
        case | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          v1 |    1.52674   .8978875     0.72   0.472     .4821329     4.83463
          v2 |   .7779552   .4651644    -0.42   0.675     .2409871    2.511397
         v12 |   1.004005   .3277949     0.01   0.990     .5294554    1.903893
------------------------------------------------------------------------------
Interaction in Logistic Regression
μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking
μ00 = β0
μ01 = β0 + β2
μ10 = β0 + β1
μ11 = β0 + β1 + β2 + β3

Coefficient | β                         | exp(β)                  | Interpretation
β1          | μ10 – μ00                 | O10 / O00               | Baseline OR
β2          | μ01 – μ00                 | O01 / O00               | Baseline OR
β3          | (μ11 – μ10) – (μ01 – μ00) | (O11 / O10) / (O01 / O00) | Ratio of odds ratios
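The table can be verified on a toy example. The four odds below are hypothetical; the code recovers β0 to β3 from the cell log-odds exactly as defined on the slide and confirms that exp(β3) is a ratio of two odds ratios, not an odds ratio itself.

```python
import math

# Hypothetical odds O[(sex, smoking)]; 0 is the baseline level of each variable.
O = {(0, 0): 0.10, (0, 1): 0.30, (1, 0): 0.20, (1, 1): 0.90}
mu = {cell: math.log(o) for cell, o in O.items()}

# Coefficients of the saturated model, from the cell log-odds.
b0 = mu[(0, 0)]
b1 = mu[(1, 0)] - mu[(0, 0)]
b2 = mu[(0, 1)] - mu[(0, 0)]
b3 = (mu[(1, 1)] - mu[(1, 0)]) - (mu[(0, 1)] - mu[(0, 0)])

# Odds ratios of smoking within each level of sex.
or_smoking_sex0 = O[(0, 1)] / O[(0, 0)]
or_smoking_sex1 = O[(1, 1)] / O[(1, 0)]
ratio_of_ors = or_smoking_sex1 / or_smoking_sex0

print(f"exp(b1) = {math.exp(b1):.3f}  (baseline OR of sex, among non-smokers)")
print(f"exp(b2) = {math.exp(b2):.3f}  (baseline OR of smoking, at sex = 0)")
print(f"exp(b3) = {math.exp(b3):.3f}  (ratio of odds ratios = {ratio_of_ors:.3f})")
```

Here exp(β2) is 3 and exp(β3) is 1.5, so the odds ratio of smoking at sex = 1 is 3 × 1.5 = 4.5. Reading 1.5 as “the odds ratio of the interaction” is the misreading the software output encourages.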
Misconception 2 Interpret main-effect terms when interaction terms are included in the model: • Evaluation of statistical significance of “main-effect”. • Fact: Main-effect term should always be included in the model as long as it is involved in some interaction terms. • A main-effect coefficient is interpreted as the magnitude of “main effect” or “marginal effect”. • Fact: Main-effect coefficient of variable X represents its “baseline effect” when all variables “interacting” with X are zero (i.e. at baseline). • Its interpretation depends on how other variables are coded (i.e. where the baselines are).
Significance of a Main-Effect Term in Logistic Regression
μij = log(Oij) = β0 + β1×sex + β2×smoking + β3×sex×smoking
μ00 = β0
μ01 = β0 + β2
μ10 = β0 + β1
μ11 = β0 + β1 + β2 + β3
Statistical significance of a term ≡ whether it can be removed from the model.
What would happen if β2 = 0? Then μ01 = μ00: smoking has no effect in the group coded sex = 0, while its effect in the group coded sex = 1 is β3. This means different things depending on how sex is coded.
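A minimal sketch of the coding dependence, using hypothetical odds in which smoking has no effect in males and triples the odds in females. The function name and the string labels are illustrative only. The same data give β2 = 0 when males are the baseline and β2 = ln 3 when females are the baseline.

```python
import math

# Hypothetical odds O[(sex, smoking)].
O = {("male", 0): 0.2, ("male", 1): 0.2, ("female", 0): 0.2, ("female", 1): 0.6}

def smoking_coefficients(O, baseline_sex):
    """Return (b2, b3) of the saturated model when `baseline_sex` is coded 0."""
    other = "female" if baseline_sex == "male" else "male"
    # b2: log odds ratio of smoking in the baseline sex.
    b2 = math.log(O[(baseline_sex, 1)] / O[(baseline_sex, 0)])
    # b3: log odds ratio of smoking in the other sex, minus b2.
    b3 = math.log(O[(other, 1)] / O[(other, 0)]) - b2
    return b2, b3

b2_male_base, b3_male_base = smoking_coefficients(O, "male")
b2_female_base, b3_female_base = smoking_coefficients(O, "female")

print(f"male = 0:   b2 = {b2_male_base:.3f}, b3 = {b3_male_base:.3f}")
print(f"female = 0: b2 = {b2_female_base:.3f}, b3 = {b3_female_base:.3f}")
```

A test of β2 = 0 would therefore reach opposite conclusions under the two codings, even though the fitted cell odds are identical.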
One Input Variable is Continuous
[Figure: y against x, with one line for G = 0 (group A) and one for G = 1 (group B)]
Y = β0 + β1G + β2X + β3G×X
A: YA = β0 + β2X
B: YB = (β0 + β1) + (β2 + β3)X
• β1 = YB – YA when X = 0 (often extrapolative and meaningless)
• β2 = slope for group A
• β3 = difference in slopes (B – A)
These are not marginal effects:
• β1 = 0 → same Y when X = 0.
• β2 = 0 → group A is flat.
• β3 = 0 → equal slopes.
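Because β1 is the group difference at X = 0, it changes when X is re-centered, while β3 does not. The sketch below uses hypothetical coefficients and a hypothetical centering value of 40; `recenter` is an illustrative helper that rewrites the same model in terms of X' = X – c.

```python
def predict(beta, g, x):
    """Y = b0 + b1*G + b2*X + b3*G*X."""
    b0, b1, b2, b3 = beta
    return b0 + b1 * g + b2 * x + b3 * g * x

def recenter(beta, c):
    """Coefficients of the same model written in terms of X' = X - c."""
    b0, b1, b2, b3 = beta
    return (b0 + b2 * c, b1 + b3 * c, b2, b3)

# Hypothetical coefficients on the original scale of X.
beta = (1.0, -2.0, 0.5, 0.1)
beta_centered = recenter(beta, 40.0)

print("original:", beta)
print("centered:", beta_centered)

# Both parameterizations give the same fitted values.
for g in (0, 1):
    for x in (0.0, 40.0, 50.0):
        assert abs(predict(beta, g, x) - predict(beta_centered, g, x - 40.0)) < 1e-9
```

In this example β1 changes from –2 to +2 after centering, although the two fitted lines are unchanged. Only the point at which the group difference is evaluated has moved.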
Misconception 3 • If a set of variables/genes together with all possible combinations among them (i.e. allowing full interactions) significantly predict the outcome, then we have found interaction among these variables. • Fact: Interaction is about departure from additive effects. The variables may just have additive effects without interaction.
Do We Want Generic Interaction? A gene is identified to metabolize a carcinogen. Allele A is the putative susceptibility allele. Goal: Is the risk elevated for those who have carcinogen exposure and carry the risk allele? Data from Piegorsch et al. (1994). • Generic interaction: H0: 4 parameters; Ha: 6 parameters; DF = 2, p = 0.19
Do We Want Generic Interaction? • Approach 2: H0: 2 groups; Ha: 4 groups; DF = 2, p = 0.037 • Approach 3: H0: 1 group; Ha: 3 groups; DF = 2, p = 0.017 • Approach 4: H0: 1 group; Ha: 2 groups; DF = 1, p = 0.0043
Testing for Interaction While Adjusting for Other Covariates
μage,ij = log(Oage,ij) = β0 + β4×age + β1×sex + β2×smoking + β3×sex×smoking
μage,00 = (β0 + β4×age)
μage,01 = (β0 + β4×age) + β2
μage,10 = (β0 + β4×age) + β1
μage,11 = (β0 + β4×age) + β1 + β2 + β3
We are testing for interaction under the assumption that the effects of sex, smoking, and sex×smoking are the same over the whole range of each covariate.