Lecture 8: Multiple Linear Regression. Interpretation with different types of predictors. BMTRY 701 Biostatistical Methods II
Interaction • AKA effect modification • Allows there to be a different association between two variables for differing levels of a third variable. • Example: In the model with length of stay as an outcome, is there an interaction between medschool and nurse? • Note that ‘adjustment’ is a rather weak form of accounting for a variable. • Allowing an interaction allows much greater flexibility in the model
Interactions • Interactions can be formed between • two continuous variables • a binary and a continuous variable • two binary variables • a binary variable and a categorical variable with >2 categories • etc. • Three-way interaction: interaction between 3 variables • Four-way, etc.
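As a minimal sketch of how such a term enters the design matrix in R: an interaction between a continuous and a binary predictor is just their product column. The data frame `toy` and its columns `x` and `g` are hypothetical and are not part of the lecture data.

```r
# Hypothetical toy data: x is continuous, g is a 0/1 indicator
toy <- data.frame(x = c(1, 2, 3, 4), g = c(0, 1, 0, 1))

# In a formula, x * g is shorthand for x + g + x:g
# (both main effects plus the product term)
X_star  <- model.matrix(~ x * g, data = toy)
X_colon <- model.matrix(~ x + g + x:g, data = toy)

# Both specifications produce the same columns
colnames(X_star)
same_design <- all(X_star == X_colon)

# The interaction column is the elementwise product of x and g
product_col <- unname(X_star[, "x:g"])
```

The same `*` and `:` operators apply to factors, in which case R creates one product column per non-reference level.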
How does this differ from the model without the interaction? Without the adjustment? • Model 1: • Model 2: • Model 3:
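The model equations on this slide did not survive extraction. Judging from the lm() calls on the following slides, the three models are presumably of the form below. The coefficient symbols and the indicator name MS (1 = medical school affiliation) are assumptions made here for illustration.

```latex
\begin{align*}
\text{Model 1:}\quad & \log(\text{LOS}_i) = \beta_0 + \beta_1\,\text{INFRISK}_i + e_i \\
\text{Model 2:}\quad & \log(\text{LOS}_i) = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{MS}_i + e_i \\
\text{Model 3:}\quad & \log(\text{LOS}_i) = \beta_0 + \beta_1\,\text{INFRISK}_i + \beta_2\,\text{MS}_i
                       + \beta_3\,\text{INFRISK}_i \times \text{MS}_i + e_i
\end{align*}
```

Under this reading, Model 2 forces the two MS groups to share one INFRISK slope (parallel lines), while Model 3 lets the slope differ by the amount of the product-term coefficient.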
Model 1
> plot(data$INFRISK, data$logLOS, xlab="Infection Risk, %", ylab="Length of Stay, days", pch=16, cex=1.5)
>
> # Model 1:
> reg1 <- lm(logLOS ~ INFRISK, data=data)
> abline(reg1, lwd=2)
> summary(reg1)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.93250    0.04794  40.310  < 2e-16 ***
INFRISK      0.07293    0.01053   6.929 2.92e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1494 on 111 degrees of freedom
Multiple R-Squared: 0.302, Adjusted R-squared: 0.2957
F-statistic: 48.02 on 1 and 111 DF, p-value: 2.918e-10
Model 2
> reg2 <- lm(logLOS ~ INFRISK + ms, data=data)
> infriski <- seq(1,8,0.1)
> beta <- reg2$coefficients
> yhat0 <- beta[1] + beta[2]*infriski
> yhat1 <- beta[1] + beta[2]*infriski + beta[3]
> lines(infriski, yhat0, lwd=2, col=2)
> lines(infriski, yhat1, lwd=2, col=2)
> summary(reg2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.94449    0.04709  41.295  < 2e-16 ***
INFRISK      0.06677    0.01058   6.313 5.91e-09 ***
ms           0.09882    0.03949   2.503   0.0138 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1459 on 110 degrees of freedom
Multiple R-Squared: 0.3396, Adjusted R-squared: 0.3276
F-statistic: 28.28 on 2 and 110 DF, p-value: 1.232e-10
Model 3
> # Model 3:
> reg3 <- lm(logLOS ~ INFRISK + ms + ms:INFRISK, data=data)
> infriski <- seq(1,8,0.1)
> beta <- reg3$coefficients
> yhat0 <- beta[1] + beta[2]*infriski
> yhat1 <- beta[1] + beta[3] + (beta[2]+beta[4])*infriski
> lines(infriski, yhat0, lwd=2, col=4)
> lines(infriski, yhat1, lwd=2, col=4)
> summary(reg3)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.947942   0.049698  39.195  < 2e-16 ***
INFRISK     0.065950   0.011220   5.878  4.6e-08 ***
ms          0.059514   0.178622   0.333    0.740
INFRISK:ms  0.007856   0.034807   0.226    0.822
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1466 on 109 degrees of freedom
Multiple R-Squared: 0.3399, Adjusted R-squared: 0.3217
F-statistic: 18.71 on 3 and 109 DF, p-value: 7.35e-10
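To make the Model 3 output easier to read, the fitted line for each ms group can be written out from the four coefficients, mirroring the yhat0 and yhat1 expressions in the session. This is a small sketch that hard-codes the estimates printed above. The variable names are chosen here for illustration.

```r
# Coefficients copied from the summary(reg3) output above
b <- c(intercept = 1.947942, infrisk = 0.065950,
       ms = 0.059514, infrisk_ms = 0.007856)

# ms = 0: the main-effect intercept and slope apply unchanged
int0   <- b[["intercept"]]
slope0 <- b[["infrisk"]]

# ms = 1: the ms coefficient shifts the intercept,
# and the interaction coefficient shifts the slope
int1   <- b[["intercept"]] + b[["ms"]]
slope1 <- b[["infrisk"]] + b[["infrisk_ms"]]

group_lines <- data.frame(ms = c(0, 1),
                          intercept = c(int0, int1),
                          slope = c(slope0, slope1))
group_lines
```

The two slopes differ only by the interaction estimate (0.007856), which is small relative to its standard error (0.034807), so the two lines are close to parallel.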
Conclusions • There does not appear to be an interaction between MEDSCHOOL and INFRISK (interaction p = 0.82 in Model 3) • Both MEDSCHOOL and INFRISK are associated with log(LOS) in the presence of each other • The association between INFRISK and log(LOS) is positive: for a 1 percentage point increase in infection risk, logLOS is expected to increase by 0.07, adjusting for Med School affiliation • Hospitals with Med School affiliation tend to have longer average length of stay, adjusting for infection risk
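Because the outcome is on the log scale, these conclusions can also be restated as multiplicative changes in length of stay. The sketch below uses the Model 2 estimates and assumes that logLOS is the natural log of LOS, which the slides do not state explicitly.

```r
# Coefficients copied from summary(reg2) on the Model 2 slide
b_infrisk <- 0.06677   # change in logLOS per 1-point increase in INFRISK
b_ms      <- 0.09882   # difference in logLOS, ms = 1 versus ms = 0

# Assumption: logLOS is the natural log of LOS, so exp() of a coefficient
# is a ratio of (geometric mean) lengths of stay
ratio_infrisk <- exp(b_infrisk)
ratio_ms      <- exp(b_ms)

# Express the ratios as percent changes in LOS
pct_change <- 100 * (c(INFRISK = ratio_infrisk, ms = ratio_ms) - 1)
round(pct_change, 1)
```

Under that assumption, a 1-point higher infection risk corresponds to roughly a 7% longer stay, and medical school affiliation to roughly a 10% longer stay, each adjusted for the other.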
Interactions with continuous variables • How to interpret with continuous variables? • Example: Difference between two hospitals with a 1% difference in INFRISK
Interaction with continuous variables
> reg4 <- lm(logLOS ~ INFRISK*NURSE, data=data)
> summary(reg4)
Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)    2.067e+00  6.642e-02  31.120  < 2e-16 ***
INFRISK        3.164e-02  1.586e-02   1.995  0.04853 *
NURSE         -1.025e-03  4.657e-04  -2.201  0.02986 *
INFRISK:NURSE  2.696e-04  9.727e-05   2.771  0.00657 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1427 on 109 degrees of freedom
Multiple R-Squared: 0.3739, Adjusted R-squared: 0.3567
F-statistic: 21.7 on 3 and 109 DF, p-value: 4.284e-11
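With two continuous predictors, the INFRISK "slope" is no longer a single number: it depends on NURSE. The sketch below evaluates the expected difference in logLOS between two hospitals that differ by 1 point in INFRISK and have the same number of nurses. The NURSE values are hypothetical, picked only for illustration, and the helper `infrisk_slope` is not part of the lecture code.

```r
# Coefficients copied from the summary(reg4) output above
b_infrisk       <- 3.164e-02
b_infrisk_nurse <- 2.696e-04

# Expected difference in logLOS for a 1-point difference in INFRISK,
# holding NURSE fixed: b_infrisk + b_infrisk_nurse * NURSE
infrisk_slope <- function(nurse) b_infrisk + b_infrisk_nurse * nurse

# Hypothetical NURSE values, chosen only for illustration
nurse_values <- c(50, 100, 300)
slopes <- data.frame(NURSE = nurse_values,
                     slope = infrisk_slope(nurse_values))
slopes
```

The positive interaction coefficient means the association between INFRISK and logLOS is estimated to be stronger in hospitals with more nurses.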
Interactions between categorical variables • Simple with two binary variables • More complicated to keep track of when one or more of the variables has more than two categories • Example: REGION and MEDSCHOOL • Question: Is there an interaction between REGION and MEDSCHOOL with regard to logLOS? • That is: does the association between MEDSCHOOL and logLOS differ by REGION?
Regression Results
> reg5 <- lm(logLOS ~ factor(REGION)*ms, data=data)
>
> summary(reg5)
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         2.359927   0.030715  76.833  < 2e-16 ***
factor(REGION)2    -0.122926   0.042560  -2.888   0.0047 **
factor(REGION)3    -0.163065   0.039769  -4.100 8.16e-05 ***
factor(REGION)4    -0.299316   0.049933  -5.994 2.92e-08 ***
ms                  0.125486   0.072685   1.726   0.0872 .
factor(REGION)2:ms -0.007176   0.096181  -0.075   0.9407
factor(REGION)3:ms  0.033145   0.114691   0.289   0.7732
factor(REGION)4:ms  0.082734   0.132974   0.622   0.5352
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1473 on 105 degrees of freedom
Multiple R-Squared: 0.3578, Adjusted R-squared: 0.3149
F-statistic: 8.356 on 7 and 105 DF, p-value: 4.356e-08
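One way to keep track of the many coefficients is to assemble the estimated ms effect separately for each region. The sketch below hard-codes the estimates printed above. The vector names are chosen here for illustration.

```r
# Coefficients copied from the summary(reg5) output above
b_ms <- 0.125486
b_region_ms <- c(region2 = -0.007176,
                 region3 =  0.033145,
                 region4 =  0.082734)

# Region 1 is the reference level, so its ms effect is the ms coefficient
# alone; every other region adds its own interaction coefficient
ms_effect <- c(region1 = b_ms, b_ms + b_region_ms)
round(ms_effect, 3)
```

None of the three interaction terms is individually significant. A joint test of all three would compare this model with the additive one, for example `anova(lm(logLOS ~ factor(REGION) + ms, data=data), reg5)`, assuming the same `data` object used on the slides.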
Association between MS and REGION
> table(data$REGION, data$ms)
     0  1
  1 23  5   = 18%
  2 25  7   = 22%
  3 34  3   = 8%
  4 14  2   = 13%
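In this table the rows are REGION, the columns are ms, and the percentages are the share of hospitals in each region with ms = 1. As a sketch, the same shares can be computed with `prop.table()` from the counts shown above. The matrix is entered by hand here because the lecture data set is not included.

```r
# Counts copied from the table above (rows: REGION 1-4, columns: ms 0/1)
counts <- matrix(c(23, 5,
                   25, 7,
                   34, 3,
                   14, 2),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(REGION = c("1", "2", "3", "4"),
                                 ms = c("0", "1")))

# Row proportions: within each region, the fraction of hospitals with ms = 1
pct_ms <- round(100 * prop.table(counts, margin = 1)[, "1"], 1)
pct_ms

# Total number of hospitals in the table
n_total <- sum(counts)
```

The small counts in the ms = 1 column (only 2 or 3 hospitals in some regions) help explain the large standard errors of the REGION-by-ms interaction terms.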