Workshop Moderated Regression Analysis

EASP Summer School 2008, Cardiff. Wilhelm Hofmann

Overview of the workshop • Introduction to moderator effects • Case 1: continuous × continuous variable • Case 2: continuous × categorical variable • Higher-order interactions • Statistical power • Outlook 1: dichotomous DVs • Outlook 2: moderated mediation analysis

Main resources • The primer: Aiken & West (1991). Multiple regression: Testing and interpreting interactions. Newbury Park, CA: Sage. • Cohen, Aiken, & West (2004). Regression analysis for the behavioral sciences [Chapters 7 and 9]. • West, Aiken, & Krull (1996). Experimental personality designs: Analyzing categorical by continuous variable interactions. Journal of Personality, 64, 1-48. • Whisman & McClelland (2005). Designing, testing, and interpreting interactions and moderator effects in family research. Journal of Family Psychology, 19, 111-120. • This presentation, the dataset, the syntax files, and the Excel sheets are available on the Summer School webpage!

What is a moderator effect? (Diagram: X → Y, with M acting on the X → Y path) • The effect of a predictor variable (X) on a criterion (Y) depends on a third variable (M), the moderator • Synonymous term: interaction effect

Examples from social psychology • Social facilitation: the effect of the presence of others on performance depends on the dominance of responses (Zajonc, 1965) • Effects of stress on health depend on social support (Cohen & Wills, 1985) • The effect of provocation on aggression depends on trait aggressiveness (Marshall & Brown, 2006)

Simple regression analysis (diagram: X → Y)

Simple regression analysis: $\hat{Y} = b_0 + b_1X$ (plot: regression line of Y on X with intercept b0 and slope b1)

Multiple regression with additive predictor effects (diagram: X → Y and M → Y)

Multiple regression with additive predictor effects: $\hat{Y} = b_0 + b_1X + b_2M = (b_0 + b_2M) + b_1X$ (plot: parallel regression lines of Y on X for low, medium, and high M) • The intercept of the regression of Y on X, $b_0 + b_2M$, depends on the specific value of M • The slope of the regression of Y on X (b1) stays constant

Multiple regression including interaction among predictors (diagram: X → Y, M → Y, and XM → Y)

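Multiple regression including interaction among predictors: $\hat{Y} = b_0 + b_1X + b_2M + b_3XM = (b_0 + b_2M) + (b_1 + b_3M)X$, where $b_0 + b_2M$ is the intercept and $b_1 + b_3M$ is the slope of the regression of Y on X (plot: non-parallel regression lines of Y on X for low, medium, and high M) • The slope and intercept of the regression of Y on X depend on the specific value of M • Hence, there is a different line (simple regression line) for every value of M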

Regression model with interaction: quick facts • The interaction is carried by the XM term, the product of X and M • The b3 coefficient reflects the interaction between X and M only if the lower-order terms b1X and b2M are included in the equation! • Leaving out these terms confounds the additive and multiplicative effects, producing misleading results • Each individual has a score on X and on M. To form the XM term, multiply each individual's score on X by that individual's score on M.
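The quick facts above can be checked numerically. Below is a minimal Python/NumPy sketch on synthetic, noise-free data; all values are hypothetical and not from the workshop dataset. It forms the XM term person by person and fits two models. The full model recovers the true b3. A model that contains only the product term, without the lower-order terms, returns a different "interaction" coefficient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(5.0, 1.0, n)  # hypothetical predictor with a nonzero mean
m = rng.normal(3.0, 1.0, n)  # hypothetical moderator with a nonzero mean
y = 1.0 + 0.5 * x + 0.3 * m - 0.2 * x * m  # noise-free: true b0, b1, b2, b3

# Each person's XM score is their X score times their M score
xm = x * m

# Full model: intercept, both lower-order terms, and the product term
full = np.column_stack([np.ones(n), x, m, xm])
b_full, *_ = np.linalg.lstsq(full, y, rcond=None)

# Misspecified model: product term without the lower-order terms
product_only = np.column_stack([np.ones(n), xm])
b_wrong, *_ = np.linalg.lstsq(product_only, y, rcond=None)

print("full model b0..b3:", np.round(b_full, 3))
print("product-only 'b3':", round(b_wrong[1], 3))
```

Because the predictors have nonzero means, XM is strongly correlated with X and M. The product-only model therefore absorbs part of the additive effects into its single coefficient.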

Regression model with interaction • There are two equivalent ways to test whether an interaction is present: • Test whether the increment in the squared multiple correlation (R²) due to the interaction is significantly greater than zero • Test whether the coefficient b3 differs significantly from zero • Interactions work with both continuous and categorical predictor variables. In the latter case, we have to choose a coding scheme (dummy vs. effects coding) • Workshop Case I: continuous × continuous variable interaction • Workshop Case II: continuous × categorical variable interaction
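Why are the two routes equivalent? For a single product term, the F test of the R² increment equals the square of the t test of b3. A minimal Python/NumPy sketch on simulated data (hypothetical coefficients) computes both by hand.

```python
import numpy as np

def ols(design, y):
    """OLS fit: coefficients, residual sum of squares, and inverse of X'X."""
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return b, resid @ resid, np.linalg.inv(design.T @ design)

rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)  # hypothetical centered predictor
m = rng.normal(size=n)  # hypothetical centered moderator
y = 0.4 * x + 0.2 * m + 0.25 * x * m + rng.normal(size=n)

ones = np.ones(n)
additive = np.column_stack([ones, x, m])     # step 1: X and M only
full = np.column_stack([ones, x, m, x * m])  # step 2: add the XM term

_, rss_add, _ = ols(additive, y)
b, rss_full, xtx_inv = ols(full, y)

tss = np.sum((y - y.mean()) ** 2)
r2_add = 1 - rss_add / tss
r2_full = 1 - rss_full / tss
df_resid = n - full.shape[1]

# Route 1: F test of the R-squared increment (1 numerator df)
f_change = (r2_full - r2_add) / ((1 - r2_full) / df_resid)

# Route 2: t test of b3
se_b3 = np.sqrt(rss_full / df_resid * xtx_inv[3, 3])
t_b3 = b[3] / se_b3

print(f"F change = {f_change:.4f}, t^2 for b3 = {t_b3 ** 2:.4f}")
```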

Case 1: both predictors (and the criterion) are continuous • X: height • M: age • Y: life satisfaction • Does the effect of height on life satisfaction depend on age? (Diagram: height, age, and height × age → life satisfaction)

The Data (available at the summer school homepage)

Descriptives

Advance organizer for Case 1 • I) Why median splits are not an option • II) Estimating, plotting, and interpreting the interaction • Unstandardized solution • Standardized solution • III) Inclusion of control variables • IV) Computation of the effect size for the interaction term

I) Why we all despise median splits: the costs of dichotomization (for more details, see Cohen, 1983; Maxwell & Delaney, 1993; West, Aiken, & Krull, 1996) • So why not simply split both X and M into two groups each and conduct an ordinary ANOVA to test for the interaction? • Disadvantage #1: Median splits are highly sample dependent • Disadvantage #2: Power to detect (interaction) effects is drastically reduced, because useful information is willfully thrown away • Disadvantage #3: In moderated regression, median splits can strongly bias results
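The information loss behind Disadvantage #2 is easiest to see in the simplest case: one normally distributed predictor split at its median. In that case the correlation with a continuous Y shrinks to about sqrt(2/π) ≈ .80 of its original size. A minimal Python/NumPy simulation follows; the population correlation of .5 is a hypothetical choice. It shows only this single-predictor case, not the full interaction design.

```python
import numpy as np

rng = np.random.default_rng(2008)
n = 100_000
rho = 0.5  # hypothetical population correlation between X and Y

x = rng.normal(size=n)
y = rho * x + np.sqrt(1 - rho ** 2) * rng.normal(size=n)

r_continuous = np.corrcoef(x, y)[0, 1]

# Median split: 1 = above the sample median, 0 = at or below it
x_split = (x > np.median(x)).astype(float)
r_split = np.corrcoef(x_split, y)[0, 1]

ratio = r_split / r_continuous
print(f"r = {r_continuous:.3f}, r after split = {r_split:.3f}, ratio = {ratio:.3f}")
print(f"theoretical ratio sqrt(2/pi) = {np.sqrt(2 / np.pi):.3f}")
```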

II) Estimating the unstandardized solution • Unstandardized = the original metrics of the variables are preserved • Recipe • Center both X and M around their respective sample means • Compute the cross-product of cX and cM • Regress Y on cX, cM, and cX*cM

Why centering the continuous predictors is important • Centering provides a meaningful zero point for X and M (it gives you effects at the mean of X and M, respectively) • Having clearly interpretable zero points is important because, in moderated regression, we estimate conditional effects of one variable when the other variable is fixed at 0. For example, the slope of Y on X is $b_1 + b_3M$, which reduces to $b_1$ when M = 0 • Thus, b1 is not a main effect; it is a conditional effect at M = 0! • The same applies when viewing the effect of M on Y as a function of X • Centering the predictors does not affect the coefficient of the interaction term (b3), but it does change all the other coefficients (b0, b1, b2) in the model • Other transformations may be useful in certain cases, but mean centering is usually the best choice
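The centering claims can be verified with a minimal Python/NumPy sketch on simulated height, age, and life-satisfaction scores (all hypothetical). It fits the same model with raw and with mean-centered predictors. b3 is unchanged. The "effect" of height shifts from a conditional effect at age = 0 to the effect at mean age, via b1(centered) = b1(raw) + b3 × mean(age).

```python
import numpy as np

def fit_interaction(x, m, y):
    """OLS coefficients b0, b1, b2, b3 for y = b0 + b1*x + b2*m + b3*x*m."""
    design = np.column_stack([np.ones_like(x), x, m, x * m])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

rng = np.random.default_rng(3)
n = 300
height = rng.normal(173, 9.5, n)  # hypothetical scores
age = rng.normal(30, 5, n)        # hypothetical scores
lifesat = (5 + 0.034 * (height - 173) + 0.017 * (age - 30)
           - 0.008 * (height - 173) * (age - 30) + rng.normal(0, 1, n))

b_raw = fit_interaction(height, age, lifesat)
b_cen = fit_interaction(height - height.mean(), age - age.mean(), lifesat)

print("raw:     ", np.round(b_raw, 4))
print("centered:", np.round(b_cen, 4))
```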

SPSS syntax:
*unstandardized.
*center height and age (on the grand mean) and compute the interaction term.
*173 and 29.8625 are the sample means reported by DESC.
DESC var=height age.
COMPUTE heightc = height - 173.
COMPUTE agec = age - 29.8625.
COMPUTE heightc.agec = heightc*agec.
REGRESSION
  /STATISTICS = R CHA COEFF
  /DEPENDENT lifesat
  /METHOD=ENTER heightc agec
  /METHOD=ENTER heightc.agec.

SPSS output (annotated: coefficients b0, b1, b2, b3 and the test of significance of the interaction) • Do not interpret the standardized betas reported by SPSS for this model; they are wrong!

Plotting the interaction • SPSS does not provide a straightforward module for plotting interactions… • There is an infinite number of simple slopes we could compute for different values of M • Minimum: compute predicted Y values for high (+1 SD) and low (-1 SD) values of X at high (+1 SD) and low (-1 SD) values of the moderator M

Unstandardized plot: compute the values for the plot either by hand… Effect of height on life satisfaction, using $\hat{Y} = b_0 + b_1X + b_2M + b_3XM$ with centered X and M: • 1 SD below the mean of age (M): predicted Y at -1 SD of height and at +1 SD of height • 1 SD above the mean of age (M): predicted Y at -1 SD of height and at +1 SD of height
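The hand computation above can be sketched in Python as a small helper. It returns the four predicted values (±1 SD of X crossed with ±1 SD of M) from the centered-model coefficients. The inputs below are b1 = .034, b2 = .017, b3 = -.008, SD(height) = 9.547, and SD(age) = 4.9625, as quoted in the interpretation of the unstandardized plots. The intercept is not quoted there, so b0 = 0.0 is only a placeholder, and the printed values are relative to it.

```python
def plot_points(b0, b1, b2, b3, sd_x, sd_m):
    """Predicted Y at -1/+1 SD of centered X, for -1/+1 SD of centered M."""
    points = {}
    for m_label, m in (("M -1 SD", -sd_m), ("M +1 SD", sd_m)):
        for x_label, x in (("X -1 SD", -sd_x), ("X +1 SD", sd_x)):
            points[(m_label, x_label)] = b0 + b1 * x + b2 * m + b3 * x * m
    return points

# X = height, M = age; b0 = 0.0 is a placeholder, not the workshop estimate
points = plot_points(b0=0.0, b1=0.034, b2=0.017, b3=-0.008, sd_x=9.547, sd_m=4.9625)
for (m_label, x_label), y_hat in points.items():
    print(f"{m_label}, {x_label}: {y_hat:.3f}")
```

Each pair of points at one level of M lies on the simple regression line for that level. Connecting them gives the two lines of the plot.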

… or let Excel do the job! Adapted from Dawson, 2006

Interpreting the unstandardized plot: effect of height moderated by age (x-axis: height, mean = 173) • Intercept (b0): life satisfaction (LS) at the mean of height and age (when both are centered) • Simple slope of height at mean age: b1 = .034 • Change in the slope of height for each one-unit increase in age: b3 = -.008 • Slope of height after a 1 SD increase in age (SD = 4.9625): b = .034 + (-.008 × 4.9625) = -.0057 • Simple slope of age at mean height: b2 (difficult to illustrate)

Interpreting the unstandardized plot: effect of age moderated by height • Intercept (b0): LS at the mean of age and height (when centered) • Simple slope of age at mean height: b2 = .017 • Change in the slope of age for each one-unit increase in height: b3 = -.008 • Slope of age after a 1 SD increase in height (SD = 9.547): b = .017 + (-.008 × 9.547) = -.059 • Simple slope of height at mean age: b1 (difficult to illustrate)
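Both slides use the same rule: the slope of the focal predictor at a given (centered) moderator value is b_focal + b3 × moderator value. A minimal Python sketch reproduces the two numbers quoted above.

```python
def simple_slope(b_focal, b_interaction, moderator_value):
    """Slope of the focal predictor when the centered moderator equals moderator_value."""
    return b_focal + b_interaction * moderator_value

# Coefficients and SDs as quoted in the interpretation slides
b_height, b_age, b_inter = 0.034, 0.017, -0.008
sd_height, sd_age = 9.547, 4.9625

slope_height_high_age = simple_slope(b_height, b_inter, sd_age)
slope_age_high_height = simple_slope(b_age, b_inter, sd_height)

print(round(slope_height_high_age, 4))  # -0.0057
print(round(slope_age_high_height, 3))  # -0.059
```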

Estimating the proper standardized solution (to get the beta weights) • Z-standardize X, M, and Y • Compute the product of the z-standardized scores of X and M • Regress zY on zX, zM, and zX*zM • The unstandardized coefficients (B) from this output are the correct standardized solution (Friedrich, 1982)!

Why the standardized betas given by SPSS are wrong • SPSS uses the z-score of the product (zXM) when calculating the standardized coefficients • Except in unusual circumstances, zXM is different from zX·zM, the product of the two z-scores we are interested in • Solution (Friedrich, 1982): enter zX, zM, and zX·zM into an ordinary regression predicting zY. The unstandardized coefficients (B) from the output are the correct standardized coefficients.
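Friedrich's point can be sketched in Python/NumPy on simulated, correlated height and age scores (hypothetical). The script computes two betas for the interaction:

- the correct standardized coefficient, from regressing zY on zX, zM, and zX·zM;
- the value SPSS reports for the product term, B × SD(product) / SD(Y).

The correct beta equals b3 × SD(X) × SD(M) / SD(Y). The SPSS-style beta uses the SD of the product instead, which generally differs from SD(X) × SD(M).

```python
import numpy as np

def zscore(v):
    """z-standardize using the sample SD (n - 1)."""
    return (v - v.mean()) / v.std(ddof=1)

def sd(v):
    return v.std(ddof=1)

def ols(cols, y):
    design = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

rng = np.random.default_rng(11)
n = 400
z1, z2 = rng.normal(size=(2, n))
height = 173 + 9.5 * z1                # hypothetical scores
age = 30 + 5 * (0.6 * z1 + 0.8 * z2)   # hypothetical, correlated with height
lifesat = (5 + 0.034 * (height - 173) + 0.017 * (age - 30)
           - 0.008 * (height - 173) * (age - 30) + rng.normal(0, 1, n))

# Correct: regress zY on zX, zM, and the product of the z-scores
zx, zm, zy = zscore(height), zscore(age), zscore(lifesat)
beta_correct = ols([zx, zm, zx * zm], zy)

# SPSS-style beta for the product term: B * SD(product) / SD(Y)
hc, ac = height - height.mean(), age - age.mean()
b_unstd = ols([hc, ac, hc * ac], lifesat)
beta_spss_product = b_unstd[3] * sd(hc * ac) / sd(lifesat)

print("correct beta for the interaction:    ", round(beta_correct[3], 3))
print("SPSS-style beta for the interaction: ", round(beta_spss_product, 3))
```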

SPSS syntax:
*standardized.
*let SPSS z-standardize height, age, and lifesat (/SAVE creates zheight, zage, and zlifesat).
DESC var=height age lifesat /SAVE.
*compute the interaction term from the z-standardized scores.
COMPUTE zheight.zage = zheight*zage.
REGRESSION
  /DEPENDENT zlifesat
  /METHOD=ENTER zheight zage
  /METHOD=ENTER zheight.zage.

SPSS output • Side note: What happens if we do not standardize Y? → We get so-called half-standardized regression coefficients (i.e., how does a 1 SD change in X or M affect Y in its original units?)

Standardized plot • Simple slope of height at mean age: β = .240 • The beta of height changes by -.270 for each 1 SD increase in age, so at 1 SD above the mean of age: β = .240 + (-.270 × 1) = -.030

Simple slope testing • Test of the interaction term: Does the relationship between X and Y reliably depend on M? • Simple slope testing: Is the simple slope of X at high (+1 SD) or low (-1 SD) values of M significantly different from zero?

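Simple slope testing (figure: three aligned scales marked -1 SD, 0, and +1 SD: the original centered scale, the scale after adding 1 SD, and the scale after subtracting 1 SD) • Best done for the standardized solution • Simple slope test for low (-1 SD) values of M: add 1 (sic!) to zM, which moves the zero point to 1 SD below the mean • Simple slope test for high (+1 SD) values of M: subtract 1 (sic!) from zM, which moves the zero point to 1 SD above the mean • Now run a separate regression analysis with each transformed score (recomputing the product term each time); the coefficient of X is then the simple slope at the new zero point of M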

SPSS syntax:
***simple slope testing in the standardized solution.
*regression at -1 SD of M: add 1 to zage in order to shift the new zero point one SD below the mean.
compute zagebelow=zage+1.
compute zheight.zagebelow=zheight*zagebelow.
REGRESSION
  /DEPENDENT zlifesat
  /METHOD=ENTER zheight zagebelow
  /METHOD=ENTER zheight.zagebelow.
*regression at +1 SD of M: subtract 1 from zage in order to shift the new zero point one SD above the mean.
compute zageabove=zage-1.
compute zheight.zageabove=zheight*zageabove.
REGRESSION
  /DEPENDENT zlifesat
  /METHOD=ENTER zheight zageabove
  /METHOD=ENTER zheight.zageabove.
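The recentering trick can be mirrored in Python/NumPy on simulated z-scores (hypothetical data). After shifting zM and refitting, the coefficient of zX is the simple slope at the new zero point, with its t value. The same slope and standard error follow directly from the original model: b1 + b3·M, with variance var(b1) + 2M·cov(b1, b3) + M²·var(b3). The t values have n - 4 degrees of freedom. p values are not computed here, to keep the sketch NumPy-only.

```python
import numpy as np

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

def ols_with_cov(cols, y):
    """OLS coefficients and their estimated covariance matrix."""
    design = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    return b, sigma2 * np.linalg.inv(design.T @ design)

rng = np.random.default_rng(21)
n = 160
zx = zscore(rng.normal(size=n))  # hypothetical zheight
zm = zscore(rng.normal(size=n))  # hypothetical zage
zy = zscore(0.24 * zx - 0.27 * zx * zm + rng.normal(size=n))  # hypothetical zlifesat

def slope_at(shift):
    """Simple slope of zx and its t value after re-centering zm + shift."""
    zm_new = zm + shift
    b_new, cov_new = ols_with_cov([zx, zm_new, zx * zm_new], zy)
    return b_new[1], b_new[1] / np.sqrt(cov_new[1, 1])

low_slope, low_t = slope_at(+1)    # zero point of M moved to 1 SD below the mean
high_slope, high_t = slope_at(-1)  # zero point of M moved to 1 SD above the mean

# The same simple slopes from the original model: b1 + b3 * M
b, cov = ols_with_cov([zx, zm, zx * zm], zy)
for m_value, shifted_slope in ((-1, low_slope), (1, high_slope)):
    print(m_value, round(shifted_slope, 4), round(b[1] + b[3] * m_value, 4))
print("t at -1 SD:", round(low_t, 2), " t at +1 SD:", round(high_t, 2))
```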

Simple slope testing: Results

Illustration • Simple slope of height at low age (-1 SD): β = .509, p = .003 • Simple slope of height at high age (+1 SD): β = -.030, p = .844

III) Inclusion of control variables • Often, you want to control for other variables (covariates) • Simply add centered or z-standardized continuous covariates as predictors to the regression equation • For categorical control variables, effects coding is recommended • Example: depression (continuous), measured with the Beck Depression Inventory on a 5-point scale (1-5)

SPSS syntax:
*center depression on its sample mean (3.02).
COMPUTE deprc = depr - 3.02.
REGRESSION
  /DEPENDENT lifesat
  /METHOD=ENTER heightc agec deprc
  /METHOD=ENTER heightc.agec.

A note on centering the control variable(s) • If you do not center the control variable, the intercept will be affected, because the regression is then estimated at the true zero point (instead of the mean) of the control variable (output comparison: depression centered vs. depression uncentered; in the uncentered version, the intercept is estimated at the meaningless value of 0 on the depression scale, which runs from 1 to 5)
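A minimal Python/NumPy sketch (simulated, hypothetical data) shows that centering the covariate changes only the intercept. Every slope, including the interaction b3, stays the same. The intercept moves by b_depr × mean(depr).

```python
import numpy as np

def ols(cols, y):
    design = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b

rng = np.random.default_rng(4)
n = 250
heightc = rng.normal(0, 9.5, n)             # hypothetical centered height
agec = rng.normal(0, 5, n)                  # hypothetical centered age
depr = rng.integers(1, 6, n).astype(float)  # hypothetical depression scores, 1-5
lifesat = (5 + 0.03 * heightc + 0.02 * agec - 0.008 * heightc * agec
           - 0.4 * depr + rng.normal(0, 1, n))

# Column order: intercept, heightc, agec, depression, heightc*agec
b_uncentered = ols([heightc, agec, depr, heightc * agec], lifesat)
b_centered = ols([heightc, agec, depr - depr.mean(), heightc * agec], lifesat)

print("intercept, depression uncentered:", round(b_uncentered[0], 3))
print("intercept, depression centered:  ", round(b_centered[0], 3))
```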

IV) Effect size calculation • The beta weight (β) is already an effect size statistic, though not a perfect one • f² (see Aiken & West, 1991, p. 157)

Calculating f² • $f^2 = \frac{R^2_{AI} - R^2_{A}}{1 - R^2_{AI}}$, where $R^2_{AI}$ is the squared multiple correlation resulting from the combined prediction of Y by the additive set of predictors (A) and their interaction (I) (= full model), and $R^2_{A}$ is the squared multiple correlation resulting from prediction by set A only (= model without the interaction term) • In words: f² gives the proportion of systematic variance accounted for by the interaction relative to the unexplained variance in the criterion • Conventions by Cohen (1988): • f² = .02: small effect • f² = .15: medium effect • f² = .35: large effect
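The formula and Cohen's benchmarks translate into a small Python helper. The R² values in the usage line are hypothetical and not taken from the workshop output.

```python
def cohens_f2(r2_full, r2_additive):
    """f^2 for the interaction: (R2_AI - R2_A) / (1 - R2_AI)."""
    if not 0 <= r2_additive <= r2_full < 1:
        raise ValueError("expected 0 <= R2_A <= R2_AI < 1")
    return (r2_full - r2_additive) / (1 - r2_full)

def f2_label(f2):
    """Cohen's (1988) benchmarks for f^2."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"

# Hypothetical R-squared values, not taken from the workshop data
f2 = cohens_f2(r2_full=0.20, r2_additive=0.17)
print(round(f2, 4), f2_label(f2))  # 0.0375 small
```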

Example → small to medium effect

Case 2: continuous × categorical variable interaction (on a continuous DV) • Fictitious example • X: body height (continuous) • Y: life satisfaction (continuous) • M: gender (categorical: male vs. female) • Does the effect of body height on life satisfaction depend on gender? Our hypothesis: body height is more important for life satisfaction in males

Advance organizer for Case 2 • I) Coding issues • II) Estimating the solution using dummy coding • Unstandardized solution • Standardized solution • III) Estimating the solution using unweighted effects coding • (Unstandardized solution) • Standardized solution • IV) What if there are more than two levels on the categorical scale? • V) Inclusion of control variables • VI) Effect size calculation

Descriptives

I) Coding options • Dummy coding (0;1): • Allows comparing the effect of X on Y between the reference group (d=0) and the other group(s) (d=1) • Definitely preferred if you are interested in the specific regression weights for each group • Unweighted effects coding (-1;+1): yields the unweighted mean effect of X on Y across groups • Preferred if you are interested in the overall mean effect (e.g., when entering M as a nonfocal variable); all groups are compared with the unweighted mean effect across groups • Results are directly comparable with ANOVA results when you have 2 or more categorical variables • Weighted effects coding: also takes the sample sizes of the groups into account • Similar to unweighted effects coding, except that the size of each group is taken into consideration • Useful for representative panel analyses
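The three coding options for a two-group moderator, and what b1 means under each, are sketched below in Python/NumPy on simulated data with unequal group sizes (all hypothetical). The weighted effects code shown is one common form: 1 for the comparison group and -n1/n0 for the reference group, so the codes average to zero. Under each scheme, b1 relates to the separate within-group slopes as follows:

- dummy coding: b1 equals the slope in the reference group;
- unweighted effects coding: b1 equals the unweighted mean of the two group slopes;
- weighted effects coding: b1 equals the sample-size-weighted mean of the two group slopes.

```python
import numpy as np

def code_two_groups(group, scheme):
    """Code a 0/1 group vector (0 = reference group) for regression."""
    group = np.asarray(group)
    if scheme == "dummy":
        return group.astype(float)
    if scheme == "unweighted_effects":
        return np.where(group == 1, 1.0, -1.0)
    if scheme == "weighted_effects":
        # Reference group gets -n1/n0, so the codes average to zero
        n1, n0 = np.sum(group == 1), np.sum(group == 0)
        return np.where(group == 1, 1.0, -n1 / n0)
    raise ValueError(f"unknown scheme: {scheme}")

def b1_for(scheme, xc, group, y):
    """Coefficient of the centered predictor in y ~ xc + d + xc*d."""
    d = code_two_groups(group, scheme)
    design = np.column_stack([np.ones(len(y)), xc, d, xc * d])
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    return b[1]

rng = np.random.default_rng(8)
n = 300
group = (rng.random(n) < 0.3).astype(int)  # hypothetical, unequal group sizes
xc = rng.normal(size=n)
y = np.where(group == 1, 0.6, 0.2) * xc + rng.normal(size=n)

# Separate within-group slopes for comparison
slope0 = np.polyfit(xc[group == 0], y[group == 0], 1)[0]
slope1 = np.polyfit(xc[group == 1], y[group == 1], 1)[0]

for scheme in ("dummy", "unweighted_effects", "weighted_effects"):
    print(scheme, round(b1_for(scheme, xc, group, y), 3))
print("group slopes:", round(slope0, 3), round(slope1, 3))
```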

II) Estimating the unstandardized solution using dummy coding • Unstandardized solution • Dummy-code M (0 = reference group; 1 = comparison group) • Center X → cX • Compute the product of cX and M • Regress Y on cX, M, and cX*M

SPSS syntax:
*create dummy coding.
IF (gender=0) genderd = 0.
IF (gender=1) genderd = 1.
*center height (on the grand mean) and compute the interaction term.
DESC var=height.
COMPUTE heightc = height - 173.
*compute the product term.
COMPUTE genderd.heightc = genderd*heightc.
*regress lifesat on heightc and genderd, then add the interaction term.
REGRESSION
  /DEPENDENT lifesat
  /METHOD=ENTER heightc genderd
  /METHOD=ENTER genderd.heightc.
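A Python/NumPy analogue of this syntax, run on simulated data (all hypothetical), makes the meaning of the dummy-coded coefficients concrete. Height is centered on its sample mean rather than a typed-in constant. The script then compares the coefficients with separate regressions in each group:

- b0 is the predicted LS in the reference group at mean height;
- b1 is the slope of height in the reference group;
- b1 + b3 is the slope of height in the comparison group;
- b2 is the group difference at mean height.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 240
gender = rng.integers(0, 2, n)    # hypothetical 0/1 codes, 0 = reference group
height = rng.normal(173, 9.5, n)  # hypothetical heights
lifesat = (5 + 0.3 * gender + np.where(gender == 1, 0.05, 0.01) * (height - 173)
           + rng.normal(0, 1, n))  # hypothetical life satisfaction

# Mirrors the SPSS steps: dummy code, center height, form the product, regress
genderd = (gender == 1).astype(float)
heightc = height - height.mean()
design = np.column_stack([np.ones(n), heightc, genderd, genderd * heightc])
b0, b1, b2, b3 = np.linalg.lstsq(design, lifesat, rcond=None)[0]

# Separate simple regressions within each group, for comparison
slope_ref, icept_ref = np.polyfit(heightc[gender == 0], lifesat[gender == 0], 1)
slope_cmp, icept_cmp = np.polyfit(heightc[gender == 1], lifesat[gender == 1], 1)

print("b1 =", round(b1, 4), "vs. reference-group slope", round(slope_ref, 4))
print("b1 + b3 =", round(b1 + b3, 4), "vs. comparison-group slope", round(slope_cmp, 4))
print("b2 =", round(b2, 4), "vs. group difference at mean height", round(icept_cmp - icept_ref, 4))
```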

SPSS output (annotated coefficients: b0, b1, b2, b3)
