Workshop Moderated Regression Analysis EASP summer school 2008, Cardiff Wilhelm Hofmann
Overview of the workshop • Introduction to moderator effects • Case 1: continuous × continuous variable • Case 2: continuous × categorical variable • Higher-order interactions • Statistical power • Outlook 1: dichotomous DVs • Outlook 2: moderated mediation analysis
Main resources • The primer: Aiken & West (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage. • Cohen, Cohen, West, & Aiken (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.) [Chapters 7 and 9] • West, Aiken, & Krull (1996). Experimental personality designs: Analyzing categorical by continuous variable interactions. Journal of Personality, 64, 1-48. • Whisman & McClelland (2005). Designing, testing, and interpreting interactions and moderator effects in family research. Journal of Family Psychology, 19, 111-120. • This presentation, dataset, syntaxes, and Excel sheets are available on the summer school webpage!
What is a moderator effect? [path diagram: X → Y, with M moderating the path] • Effect of a predictor variable (X) on a criterion (Y) depends on a third variable (M), the moderator • Synonymous term: interaction effect
Examples from social psychology • Social facilitation: Effect of presence of others on performance depends on the dominance of responses (Zajonc, 1965) • Effects of stress on health dependent on social support (Cohen & Wills, 1985) • Effect of provocation on aggression depends on trait aggressiveness (Marshall & Brown, 2006)
Simple regression analysis [figure: regression line of Y on X with intercept b0 and slope b1]
Multiple regression with additive predictor effects [figure: parallel regression lines of Y on X for low, medium, and high values of M; intercepts differ by b2] • The intercept of the regression of Y on X depends upon the specific value of M • The slope of the regression of Y on X (b1) stays constant
Multiple regression including interaction among predictors [path diagram: X, M, and the product term XM all predicting Y]
Multiple regression including interaction among predictors [figure: non-parallel regression lines of Y on X for high, medium, and low values of M] • The slope and intercept of the regression of Y on X depend upon the specific value of M • Hence, there is a different line for every individual value of M (simple regression line)
Regression model with interaction: quick facts • The interaction is carried by the XM term, the product of X and M • The b3 coefficient reflects the interaction between X and M only if the lower-order terms b1X and b2M are included in the equation! • Leaving out these terms confounds the additive and multiplicative effects, producing misleading results • Each individual has a score on X and M. To form the XM term, multiply the individual's scores on X and M together.
Regression model with interaction • There are two equivalent ways to evaluate whether an interaction is present: • Test whether the increment in the squared multiple correlation (R²) given by the interaction is significantly greater than zero • Test whether the coefficient b3 differs significantly from zero • Interactions work both with continuous and categorical predictor variables. In the latter case, we have to agree on a coding scheme (dummy vs. effects coding) • Workshop Case I: continuous × continuous var interaction • Workshop Case II: continuous × categorical var interaction
Case 1: both predictors (and the criterion) are continuous • X: height • M: age • Y: life satisfaction • Does the effect of height on life satisfaction depend on age? [path diagram: height, age, and height×age predicting life satisfaction]
Advanced organizer for Case 1 • I) Why median splits are not an option • II) Estimating, plotting, and interpreting the interaction • Unstandardized solution • Standardized solution • III) Inclusion of control variables • IV) Computation of effect size for interaction term
I) Why we all despise median splits: the costs of dichotomization (for more details, see Cohen, 1983; Maxwell & Delaney, 1993; West, Aiken, & Krull, 1996) • So why not simply split both X and M into two groups each and conduct an ordinary ANOVA to test for the interaction? • Disadvantage #1: median splits are highly sample-dependent • Disadvantage #2: drastically reduced power to detect (interaction) effects by willfully throwing away useful information • Disadvantage #3: in moderated regression, median splits can strongly bias results
II) Estimating the unstandardized solution • Unstandardized = original metrics of variables are preserved • Recipe • Center both X and M around the respective sample means • Compute crossproduct of cX and cM • Regress Y on cX, cM, and cX*cM
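The recipe above can be sketched end-to-end in Python (this is not part of the workshop materials): generate synthetic data, center X and M, form the product term, and fit Y = b0 + b1·cX + b2·cM + b3·cX·cM by least squares via the normal equations. All names and data-generating values are invented for illustration.

```python
# Sketch of the Case 1 recipe: center, form the product, regress.
# Synthetic data; the true interaction coefficient is set to -0.008.
import random

random.seed(1)
n = 200
X = [random.gauss(173, 10) for _ in range(n)]   # e.g. height
M = [random.gauss(30, 5) for _ in range(n)]     # e.g. age
Y = [5 + 0.03*(x - 173) + 0.02*(m - 30) - 0.008*(x - 173)*(m - 30)
     + random.gauss(0, 0.5) for x, m in zip(X, M)]

def center(v):
    mu = sum(v) / len(v)
    return [vi - mu for vi in v]

cX, cM = center(X), center(M)
XM = [x*m for x, m in zip(cX, cM)]              # product of the centered scores

def ols(cols, y):
    """Least squares via the normal equations (Gaussian elimination)."""
    design = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(design[0])
    A = [[sum(r[i]*r[j] for r in design) for j in range(k)] for i in range(k)]
    b = [sum(r[i]*yi for r, yi in zip(design, y)) for i in range(k)]
    for i in range(k):                          # forward elimination with pivoting
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, k):
            f = A[r][i] / A[i][i]
            A[r] = [ar - f*ai for ar, ai in zip(A[r], A[i])]
            b[r] -= f * b[i]
    coef = [0.0] * k
    for i in reversed(range(k)):                # back substitution
        coef[i] = (b[i] - sum(A[i][j]*coef[j] for j in range(i + 1, k))) / A[i][i]
    return coef

b0, b1, b2, b3 = ols([cX, cM, XM], Y)
print(round(b3, 3))   # should be close to the true interaction of -0.008
```

The same coefficients come out of the SPSS syntax on the next slide; this sketch only makes the arithmetic behind REGRESSION explicit.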
Why centering the continuous predictors is important • Centering provides a meaningful zero-point for X and M (gives you effects at the mean of X and M, respectively) • Having clearly interpretable zero-points is important because, in moderated regression, we estimate conditional effects of one variable when the other variable is fixed at 0, e.g.: • Thus, b1 is not a main effect, it is a conditional effect at M=0! • Same applies when viewing effect of M on Y as a function of X. • Centering predictors does not affect the interaction term, but all of the other coefficients (b0, b1, b2) in the model • Other transformations may be useful in certain cases, but mean centering is usually the best choice
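A minimal numeric sketch of the "conditional effect" point, using the b1 and b3 values that appear later in the workshop example (.034 and -.008, SD of age = 4.9625): with centered predictors, b1 is the slope of X at the mean of M, and the slope at any other moderator value follows from b1 + b3·M.

```python
# Conditional (simple) slope of height as a function of centered age.
# Coefficients are the workshop example's values, not computed here.
b1, b3 = 0.034, -0.008
sd_age = 4.9625

def slope_of_height(m_centered):
    # slope of X at a given centered value of the moderator M
    return b1 + b3 * m_centered

print(slope_of_height(0))                    # at mean age: 0.034 (= b1)
print(round(slope_of_height(sd_age), 4))     # at +1 SD of age: about -0.0057
```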
SPSS Syntax *unstandardized. *center height and age (on grand mean) and compute interaction term. DESC var=height age. COMPUTE heightc = height - 173 . COMPUTE agec = age - 29.8625. COMPUTE heightc.agec = heightc*agec. REGRESSION /STATISTICS = R CHA COEFF /DEPENDENT lifesat /METHOD=ENTER heightc agec /METHOD=ENTER heightc.agec.
SPSS output [coefficient table: b0, b1, b2, b3; the row for the interaction term gives the test of significance of the interaction] • Do not interpret the betas as given by SPSS; they are wrong!
Plotting the interaction • SPSS does not provide a straightforward module for plotting interactions… • There is an infinite number of slopes we could compute for different combinations of X and M • Minimum: We need to calculate values for high (+1 SD) and low (-1 SD) X as a function of high (+1 SD) and low (-1 SD) values on the moderator M
Unstandardized plot • Compute values for the plot either by hand… Effect of height on life satisfaction: • 1 SD below the mean of age (M): at -1 SD of height; at +1 SD of height • 1 SD above the mean of age (M): at -1 SD of height; at +1 SD of height
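The "by hand" computation can be sketched as follows. The slopes b1 = .034, b2 = .017, b3 = -.008 and the SDs (height 9.547, age 4.9625) are the workshop example's values; the intercept b0 = 5.0 is a made-up placeholder, since only the relative positions of the four points matter for the plot's shape.

```python
# Predicted life satisfaction at the four corners of the plot
# (centered predictors, so +/- 1 SD means +/- one SD from the mean).
b0, b1, b2, b3 = 5.0, 0.034, 0.017, -0.008   # b0 is a placeholder
sd_x, sd_m = 9.547, 4.9625                    # SDs of height and age

def yhat(x, m):
    # regression equation with the interaction term
    return b0 + b1*x + b2*m + b3*x*m

for m in (-sd_m, +sd_m):            # low vs. high age
    for x in (-sd_x, +sd_x):        # low vs. high height
        print(f"age {m:+.2f}, height {x:+.2f}: {yhat(x, m):.3f}")
```

Connecting the two points within each age level reproduces the two simple regression lines of the plot: a clearly positive height slope at low age, and a near-flat one at high age.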
… or let Excel do the job! Adapted from Dawson, 2006
Interpreting the unstandardized plot: effect of height moderated by age [figure: regression lines plotted over height = 163, 173, 183 (mean ± 1 SD)] • b0: intercept; LS at the mean of height and age (when both are centered) • b1 = .034: simple slope of height at mean age • b3 = -.008: change in the slope of height for each one-unit increase in age • Simple slope of height at +1 SD of age: b = .034 + (-.008 × 4.9625) = -.0057 • b2: simple slope of age at mean height (difficult to illustrate)
Interpreting the unstandardized plot: effect of age moderated by height • b0: intercept; LS at the mean of age and height (when centered) • b2 = .017: simple slope of age at mean height • b3 = -.008: change in the slope of age for each one-unit increase in height • Simple slope of age at +1 SD of height: b = .017 + (-.008 × 9.547) = -.059 • b1: simple slope of height at mean age (difficult to illustrate)
Estimating the proper standardized solution • Standardized solution (to get the beta-weights) • z-standardize X, M, and Y • Compute the product of the z-standardized scores for X and M • Regress zY on zX, zM, and zX*zM • The unstandardized coefficients from this output are the correct standardized solution (Friedrich, 1982)!
Why the standardized betas given by SPSS are false • SPSS standardizes the product term itself, i.e., it uses z(XM) when calculating the standardized solution • Except in unusual circumstances, z(XM) is different from zX*zM, the product of the two z-scores we are interested in • Solution (Friedrich, 1982): regress zY on zX, zM, and zX*zM in an ordinary regression; the unstandardized Bs from the output are the correct standardized coefficients
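The difference between z(XM) and zX·zM can be shown in a few lines of pure Python; the numbers are made up.

```python
# z(X*M) standardizes the product; zX*zM multiplies the standardized scores.
# These are different variables, which is why the SPSS betas are wrong.
import math

X = [160, 165, 170, 175, 190]
M = [22, 25, 31, 36, 40]

def z(v):
    mu = sum(v) / len(v)
    sd = math.sqrt(sum((vi - mu)**2 for vi in v) / (len(v) - 1))
    return [(vi - mu) / sd for vi in v]

z_of_product = z([x*m for x, m in zip(X, M)])       # what SPSS standardizes
product_of_z = [a*b for a, b in zip(z(X), z(M))]    # what we actually want

print(any(abs(a - b) > 1e-6
          for a, b in zip(z_of_product, product_of_z)))  # True
```

z(XM) has mean 0 and SD 1 by construction; the product of two z-scores generally has neither, so the two sets of scores (and hence the resulting betas) disagree.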
SPSS Syntax *standardized. *let spss z-standardize height, age, and lifesat. DESC var=height age lifesat/save. *compute interaction term from z-standardized scores. COMPUTE zheight.zage = zheight*zage. REGRESSION /DEPENDENT zlifesat /METHOD=ENTER zheight zage /METHOD=ENTER zheight.zage.
SPSS output • Side note: What happens if we do not standardize Y? →Then we get so-called half-standardized regression coefficients (i.e., How does one SD on X/M affect Y in terms of original units?)
Standardized plot • β = .240: simple slope of height at mean age • Change in the beta of height for a 1 SD increase in age: -.270 • Simple slope of height at +1 SD of age: β = .240 + (-.270 × 1) = -.030
Simple slope testing • Test of interaction term: Does the relationship between X and Y reliably depend upon M? • Simple slope testing: Is the regression weight for high (+1 SD) or low (-1 SD) values on M significantly different from zero?
Simple slope testing [figure: shifting the zero point of the moderator scale by ±1 SD] • Best done for the standardized solution • Simple slope test for low (-1 SD) values of M: add +1 (sic!) to M • Simple slope test for high (+1 SD) values of M: subtract 1 (sic!) from M • Then run a separate regression analysis with each transformed score
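The arithmetic behind the shifting trick, using the standardized betas from this example (β1 = .240, β3 = -.270): adding 1 to zM moves its zero point to -1 SD, so the first-order coefficient of zX in the refitted model equals β1 + β3·(-1); subtracting 1 gives β1 + β3·(+1).

```python
# Simple slopes of zX at -1 SD and +1 SD of the (standardized) moderator.
# Betas are taken from the workshop's standardized solution.
beta1, beta3 = 0.240, -0.270

def simple_slope(z_m):
    # slope of zX when the moderator is fixed z_m SDs from its mean
    return beta1 + beta3 * z_m

low  = simple_slope(-1)   # after adding +1 to zM: .240 + .270 = .510
high = simple_slope(+1)   # after subtracting 1:   .240 - .270 = -.030
print(round(low, 3), round(high, 3))
```

These match the refitted regressions on the next slide (the slide's .509 reflects unrounded coefficients).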
SPSS Syntax ***simple slope testing in standardized solution. *regression at -1 SD of M: add 1 to zage in order to shift the new zero point one sd below the mean. compute zagebelow=zage+1. compute zheight.zagebelow=zheight*zagebelow. REGRESSION /DEPENDENT zlifesat /METHOD=ENTER zheight zagebelow /METHOD=ENTER zheight.zagebelow. *regression at +1 SD of M: subtract 1 from zage in order to shift the new zero point one sd above the mean. compute zageabove=zage-1. compute zheight.zageabove=zheight*zageabove. REGRESSION /DEPENDENT zlifesat /METHOD=ENTER zheight zageabove /METHOD=ENTER zheight.zageabove.
Illustration [plot of the two simple slopes] • Simple slope of height at -1 SD of age: β = .509, p = .003 • Simple slope of height at +1 SD of age: β = -.030, p = .844
III) Inclusion of control variables • Often, you want to control for other variables (covariates) • Simply add centered/z-standardized continuous covariates as predictors to the regression equation • In case of categorical control variables, effects coding is recommended • Example: Depression, measured on 5-point scale (1-5) with Beck Depression Inventory (continuous)
SPSS Syntax *center depression (on grand mean). COMPUTE deprc = depr - 3.02 . REGRESSION /DEPENDENT lifesat /METHOD=ENTER heightc agec deprc /METHOD=ENTER heightc.agec.
A note on centering the control variable(s) • If you do not center the control variable, the intercept will be affected, since you will be estimating the regression at the true zero-point (instead of the mean) of the control variable [output comparison: depression centered vs. depression uncentered, where the intercept is estimated at the meaningless value of 0 on the depression scale]
IV) Effect size calculation • The beta-weight (β) is already an effect size statistic, though not a perfect one • f² (see Aiken & West, 1991, p. 157)
Calculating f² • f² = (R²AI - R²A) / (1 - R²AI) • R²AI: squared multiple correlation resulting from combined prediction of Y by the additive set of predictors (A) and their interaction (I) (= full model) • R²A: squared multiple correlation resulting from prediction by set A only (= model without interaction term) • In words: f² gives you the proportion of systematic variance accounted for by the interaction relative to the unexplained variance in the criterion • Conventions by Cohen (1988): f² = .02: small effect; f² = .15: medium effect; f² = .35: large effect
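The f² formula above is a one-liner in code. The two R² values here are invented for illustration, not taken from the workshop output.

```python
# Effect size f² for the interaction term, from the full-model R²
# (additive set A plus interaction I) and the reduced-model R² (set A only).
def f_squared(r2_full, r2_reduced):
    # variance due to the interaction, relative to the variance
    # the full model leaves unexplained
    return (r2_full - r2_reduced) / (1 - r2_full)

print(round(f_squared(0.20, 0.15), 4))  # 0.0625: a small-to-medium effect
```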
Example: a small-to-medium effect [worked f² computation shown on slide]
Case 2: continuous × categorical variable interaction (on a continuous DV) • Fictitious example • X: body height (continuous) • Y: life satisfaction (continuous) • M: gender (categorical: male vs. female) • Does the effect of body height on life satisfaction depend on gender? Our hypothesis: body height is more important for life satisfaction in males
Advanced organizer for Case 2 • I) Coding issues • II) Estimating the solution using dummy coding • Unstandardized solution • Standardized solution • III) Estimating the solution using unweighted effects coding • (Unstandardized solution) • Standardized solution • IV) What if there are more than two levels on categorical scale? • V) Inclusion of control variables • VI) Effect size calculation
I) Coding options • Dummy coding (0;1): allows comparing the effects of X on Y between the reference group (d=0) and the other group(s) (d=1); definitely preferred if you are interested in the specific regression weights for each group • Unweighted effects coding (-1;+1): yields the unweighted mean effect of X on Y across groups; preferred if you are interested in the overall mean effect (e.g., when inserting M as a nonfocal variable); all groups are viewed in comparison to the unweighted mean effect across groups; results are directly comparable with ANOVA results when you have 2 or more categorical variables • Weighted effects coding: similar to unweighted effects coding except that the size of each group is also taken into account; useful for representative panel analyses
II) Estimating the unstandardized solution using dummy coding • Unstandardized solution • Dummy-code M (0=reference group; 1=comparison group) • Center X cX • Compute product of cX and M • Regress Y on cX, M, and cX*M
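What this dummy-coded solution means, in a small sketch: with M coded 0/1 and X centered, b1 is the slope of height in the reference group (M = 0) and b1 + b3 is the slope in the comparison group (M = 1). The coefficient values below are made up.

```python
# Group-specific simple slopes under dummy coding (0 = reference group).
b1, b3 = 0.010, 0.045   # hypothetical: height matters more in the d=1 group

def group_slope(d):
    # simple slope of centered height within the group coded d (0 or 1)
    return b1 + b3 * d

print(group_slope(0), round(group_slope(1), 3))  # reference vs. comparison group
```

Under unweighted effects coding (-1;+1) the same logic applies, but b1 then becomes the unweighted mean slope across the two groups and the group slopes are b1 ± b3.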
SPSS Syntax *Create dummy coding. IF (gender=0) genderd = 0 . IF (gender=1) genderd = 1 . *center height (on grand mean) and compute interaction term. DESC var=height. COMPUTE heightc =height - 173 . *Compute product term. COMPUTE genderd.heightc = genderd*heightc. *Regress lifesat on heightc and genderd, adding the interaction term. REGRESSION /DEPENDENT lifesat /METHOD=ENTER heightc genderd /METHOD=ENTER genderd.heightc.
SPSS output [coefficient table: b0, b1, b2, b3]