
Lecture 8: Multiple Linear Regression Interpretation with different types of predictors


ezra-flores

Presentation Transcript


  1. Lecture 8: Multiple Linear Regression Interpretation with different types of predictors. BMTRY 701 Biostatistical Methods II

  2. Interaction • AKA effect modification • Allows the association between two variables to differ across levels of a third variable. • Example: In the model with length of stay as the outcome, is there an interaction between MEDSCHOOL and NURSE? • Note that ‘adjustment’ is a rather weak way of accounting for a variable: it assumes the association between the other predictor and the outcome is the same at every level of the adjusted variable. • Allowing an interaction gives the model much greater flexibility.

  3. Interactions • Interactions can be formed between • two continuous variables • a binary and a continuous variable • two binary variables • a binary variable and a categorical variable with more than 2 categories • etc. • Three-way interaction: interaction among 3 variables • Four-way, etc.
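In an R model formula, `a*b` expands to `a + b + a:b`, and each interaction column is the elementwise product of the columns it combines. The sketch below shows this with `model.matrix()` for several of the predictor types listed above. The data frame `toy` and its columns `x`, `z`, `b`, `g` are hypothetical, made up only to display the design-matrix columns.

```r
# Hypothetical toy data for illustration only (not the lecture's hospital data)
toy <- data.frame(
  x = c(1.5, 3.0, 4.2, 5.1, 6.3, 2.2),   # continuous
  z = c(10, 20, 15, 30, 25, 12),         # continuous
  b = c(0, 1, 0, 1, 1, 0),               # binary, coded 0/1
  g = factor(c(1, 2, 3, 1, 2, 3))        # categorical, 3 levels
)

# In an R formula, a*b is shorthand for a + b + a:b
mm_cc <- model.matrix(~ x * z, data = toy)      # continuous x continuous
mm_bc <- model.matrix(~ x * b, data = toy)      # continuous x binary
mm_gb <- model.matrix(~ g * b, data = toy)      # 3-level factor x binary
mm_3w <- model.matrix(~ x * z * b, data = toy)  # three-way interaction

print(colnames(mm_cc))
print(colnames(mm_bc))
print(colnames(mm_gb))
print(colnames(mm_3w))

# The interaction column is the elementwise product of its components
print(all.equal(mm_cc[, "x:z"], toy$x * toy$z, check.attributes = FALSE))
```

A factor with k levels contributes k − 1 dummy columns, so its interaction with a binary variable adds k − 1 product columns; for a 4-level REGION crossed with ms that gives 3 interaction terms.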

  4. Example: log(LOS) ~ INFRISK*MS

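  5. How does this differ from the model without the interaction? Without the adjustment? • Model 1 (no adjustment): log(LOS) = β0 + β1·INFRISK + e • Model 2 (adjusted, no interaction): log(LOS) = β0 + β1·INFRISK + β2·MS + e • Model 3 (interaction): log(LOS) = β0 + β1·INFRISK + β2·MS + β3·INFRISK·MS + e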
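Substituting the two values of MS into the interaction model log(LOS) = β0 + β1·INFRISK + β2·MS + β3·INFRISK·MS + e (the model fitted as reg3 in slide 10) shows why the interaction lets the two groups have different slopes:

```latex
\begin{aligned}
\mathrm{MS}=0:\quad E[\log(\mathrm{LOS})] &= \beta_0 + \beta_1\,\mathrm{INFRISK}\\
\mathrm{MS}=1:\quad E[\log(\mathrm{LOS})] &= (\beta_0+\beta_2) + (\beta_1+\beta_3)\,\mathrm{INFRISK}
\end{aligned}
```

Setting β3 = 0 gives the no-interaction model: two parallel lines separated vertically by β2. Setting β2 = β3 = 0 as well gives the unadjusted model with a single line.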

  6. Model 1
      > plot(data$INFRISK, data$logLOS, xlab="Infection Risk, %", ylab="log(Length of Stay, days)", pch=16, cex=1.5)
      >
      > # Model 1:
      > reg1 <- lm(logLOS ~ INFRISK, data=data)
      > abline(reg1, lwd=2)
      > summary(reg1)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  1.93250    0.04794  40.310  < 2e-16 ***
      INFRISK      0.07293    0.01053   6.929 2.92e-10 ***
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 0.1494 on 111 degrees of freedom
      Multiple R-Squared: 0.302,  Adjusted R-squared: 0.2957
      F-statistic: 48.02 on 1 and 111 DF,  p-value: 2.918e-10
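The t value and p-value columns of the Model 1 table follow directly from the estimate and its standard error. This sketch reproduces them, plus a 95% confidence interval for the slope, from the rounded printed numbers, so the last digit can differ slightly from the R output.

```r
# Values copied from the summary(reg1) output (Model 1)
est <- 0.07293    # INFRISK estimate
se  <- 0.01053    # its standard error
df_resid <- 111   # residual degrees of freedom

t_value <- est / se                               # t = Estimate / Std. Error
p_value <- 2 * pt(-abs(t_value), df = df_resid)   # two-sided p-value

# 95% confidence interval for the slope (log scale)
t_crit <- qt(0.975, df = df_resid)
ci <- est + c(-1, 1) * t_crit * se

print(round(t_value, 3))
print(signif(p_value, 3))
print(round(ci, 4))
```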

  7. Model 1

  8. Model 2
      > reg2 <- lm(logLOS ~ INFRISK + ms, data=data)
      > infriski <- seq(1,8,0.1)
      > beta <- reg2$coefficients
      > yhat0 <- beta[1] + beta[2]*infriski
      > yhat1 <- beta[1] + beta[2]*infriski + beta[3]
      > lines(infriski, yhat0, lwd=2, col=2)
      > lines(infriski, yhat1, lwd=2, col=2)
      > summary(reg2)
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)  1.94449    0.04709  41.295  < 2e-16 ***
      INFRISK      0.06677    0.01058   6.313 5.91e-09 ***
      ms           0.09882    0.03949   2.503   0.0138 *
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 0.1459 on 110 degrees of freedom
      Multiple R-Squared: 0.3396,  Adjusted R-squared: 0.3276
      F-statistic: 28.28 on 2 and 110 DF,  p-value: 1.232e-10
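Model 2 adds one term (ms) to Model 1, so the change in R² between the two fits gives a partial F-test for ms. With a single added term, this F statistic should equal the square of the ms t value. The sketch below checks that using only numbers printed in the two summaries (rounded R² values, so expect agreement to about two decimals).

```r
# R-squared values and residual df copied from the summary() outputs
r2_model1 <- 0.302    # Model 1: logLOS ~ INFRISK
r2_model2 <- 0.3396   # Model 2: logLOS ~ INFRISK + ms
df_resid2 <- 110      # residual df of Model 2 (the larger model)
q <- 1                # number of terms added (ms)

# Partial F-test for adding ms, computed from the change in R-squared
f_partial <- ((r2_model2 - r2_model1) / q) / ((1 - r2_model2) / df_resid2)
p_partial <- pf(f_partial, q, df_resid2, lower.tail = FALSE)

# With one added term, F should match the squared t value of that term
t_ms <- 2.503
print(round(c(F = f_partial, t_squared = t_ms^2), 3))
print(round(p_partial, 4))
```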

  9. Model 2

  10. Model 3
      > # Model 3:
      > reg3 <- lm(logLOS ~ INFRISK + ms + ms:INFRISK, data=data)
      > infriski <- seq(1,8,0.1)
      > beta <- reg3$coefficients
      > yhat0 <- beta[1] + beta[2]*infriski
      > yhat1 <- beta[1] + beta[3] + (beta[2]+beta[4])*infriski
      > lines(infriski, yhat0, lwd=2, col=4)
      > lines(infriski, yhat1, lwd=2, col=4)
      > summary(reg3)
      Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
      (Intercept)  1.947942   0.049698  39.195  < 2e-16 ***
      INFRISK      0.065950   0.011220   5.878  4.6e-08 ***
      ms           0.059514   0.178622   0.333    0.740
      INFRISK:ms   0.007856   0.034807   0.226    0.822
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 0.1466 on 109 degrees of freedom
      Multiple R-Squared: 0.3399,  Adjusted R-squared: 0.3217
      F-statistic: 18.71 on 3 and 109 DF,  p-value: 7.35e-10
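The yhat0 and yhat1 lines in the Model 3 code are two separate regression lines, one per medical-school group. This sketch computes their intercepts and slopes from the printed reg3 coefficients, and shows how the vertical gap between them now changes with INFRISK (evaluated at a few points in the 1 to 8 range used for plotting).

```r
# Coefficients copied from summary(reg3) (Model 3)
b <- c(intercept = 1.947942, INFRISK = 0.065950,
       ms = 0.059514, INFRISK_ms = 0.007856)

# ms = 0: intercept b0, slope b1
line_ms0 <- c(intercept = unname(b["intercept"]), slope = unname(b["INFRISK"]))
# ms = 1: intercept b0 + b2, slope b1 + b3 (matches yhat1 above)
line_ms1 <- c(intercept = unname(b["intercept"] + b["ms"]),
              slope     = unname(b["INFRISK"] + b["INFRISK_ms"]))

print(rbind(ms0 = line_ms0, ms1 = line_ms1))

# Gap between the lines depends on INFRISK: b2 + b3 * INFRISK
infriski <- c(1, 4, 8)
gap <- unname(b["ms"] + b["INFRISK_ms"] * infriski)
print(round(gap, 4))
```

The slopes differ by only 0.0079 per percentage point, small relative to the interaction's standard error of 0.0348, which is consistent with the large p-value (0.822) for INFRISK:ms.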

  11. Model 3

  12. Conclusions • There does not appear to be an interaction between MEDSCHOOL and INFRISK (p = 0.82 for INFRISK:ms in Model 3). • Both MEDSCHOOL and INFRISK are associated with log(LOS) in the presence of each other (Model 2). • The association between INFRISK and log(LOS) is positive: for a 1 percentage-point increase in infection risk, logLOS is expected to increase by about 0.07, adjusting for medical school affiliation. • Hospitals with medical school affiliation tend to have a longer average length of stay (logLOS about 0.10 higher), adjusting for infection risk.
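Because the outcome is log(LOS), exponentiating a coefficient turns it into a multiplicative effect on length of stay (roughly, a ratio of geometric-mean stays). This sketch applies that to the Model 2 estimates, with 95% intervals built from the printed standard errors and the model's 110 residual df.

```r
# Model 2 estimates and standard errors copied from summary(reg2)
est <- c(INFRISK = 0.06677, ms = 0.09882)
se  <- c(INFRISK = 0.01058, ms = 0.03949)
t_crit <- qt(0.975, df = 110)   # residual df of Model 2

# exp(coefficient): multiplicative change in LOS per unit change in predictor
ratio <- exp(est)
lower <- exp(est - t_crit * se)
upper <- exp(est + t_crit * se)

print(round(cbind(ratio, lower, upper), 3))
```

Read this way, each additional percentage point of infection risk goes with roughly 7% longer stays, and medical school affiliation with roughly 10% longer stays, each adjusted for the other.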

  13. Interactions with continuous variables • How do we interpret an interaction between continuous variables? • Example: the difference in expected logLOS between two hospitals whose INFRISK differs by 1 percentage point
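For the model logLOS ~ INFRISK*NURSE (fitted as reg4 in slide 14), write β2 for the NURSE coefficient and β3 for the INFRISK:NURSE coefficient. Comparing two hospitals with the same NURSE value n whose INFRISK values are x + 1 and x:

```latex
\begin{aligned}
E[\log(\mathrm{LOS}) \mid x, n] &= \beta_0 + \beta_1 x + \beta_2 n + \beta_3 x n\\
E[\log(\mathrm{LOS}) \mid x+1, n] - E[\log(\mathrm{LOS}) \mid x, n]
  &= \bigl(\beta_0 + \beta_1 (x+1) + \beta_2 n + \beta_3 (x+1) n\bigr) - \bigl(\beta_0 + \beta_1 x + \beta_2 n + \beta_3 x n\bigr)\\
  &= \beta_1 + \beta_3 n
\end{aligned}
```

So the INFRISK effect depends on NURSE; by the same argument, the effect of one additional unit of NURSE at fixed INFRISK x is β2 + β3 x.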

  14. Interaction with continuous variables
      > reg4 <- lm(logLOS ~ INFRISK*NURSE, data=data)
      > summary(reg4)
      Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
      (Intercept)    2.067e+00  6.642e-02  31.120  < 2e-16 ***
      INFRISK        3.164e-02  1.586e-02   1.995  0.04853 *
      NURSE         -1.025e-03  4.657e-04  -2.201  0.02986 *
      INFRISK:NURSE  2.696e-04  9.727e-05   2.771  0.00657 **
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 0.1427 on 109 degrees of freedom
      Multiple R-Squared: 0.3739,  Adjusted R-squared: 0.3567
      F-statistic: 21.7 on 3 and 109 DF,  p-value: 4.284e-11
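Plugging the reg4 estimates into the conditional slopes makes the interaction concrete. The NURSE and INFRISK values below are hypothetical points chosen only for illustration; check them against the observed range of the data before interpreting.

```r
# Coefficients copied from summary(reg4): logLOS ~ INFRISK * NURSE
b_infrisk <- 3.164e-02
b_nurse   <- -1.025e-03
b_int     <- 2.696e-04

# Hypothetical NURSE values for illustration
nurse <- c(0, 100, 200, 300)
# Change in logLOS per 1-point increase in INFRISK, holding NURSE fixed
infrisk_slope <- b_infrisk + b_int * nurse
print(round(infrisk_slope, 4))

# Hypothetical INFRISK values; change in logLOS per additional nurse
infrisk <- c(2, 4, 6)
nurse_slope <- b_nurse + b_int * infrisk
print(signif(nurse_slope, 3))

# INFRISK value at which the NURSE slope changes sign
print(round(-b_nurse / b_int, 2))
```

Under these estimates the NURSE association is negative below an infection risk of about 3.8% and positive above it, which is why the NURSE main effect alone (−0.001) should not be read as "the" NURSE effect.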

  15. Interaction interpretation

  16. Interactions between categorical variables • Simple with two binary variables • More complicated to keep track of when one or more of the variables has more than two categories • Example: REGION and MEDSCHOOL • Question: Is there an interaction between REGION and MEDSCHOOL with regard to logLOS? • That is: does the association between MEDSCHOOL and logLOS differ by REGION?
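With a 4-level REGION, "is there an interaction?" is a question about three interaction coefficients at once, so one common approach is a nested-model F-test with `anova()`. The sketch below uses simulated data (made-up effect sizes and an arbitrary seed, not the lecture's hospitals) with the same design size: 113 hospitals, which gives 105 residual df for the interaction model.

```r
set.seed(701)  # arbitrary seed; data are simulated, not the lecture's data
n <- 113
sim <- data.frame(
  REGION = factor(rep(1:4, length.out = n)),
  ms     = rep(c(0, 0, 0, 0, 1), length.out = n)  # each region gets some ms = 1
)
# Simulated outcome with region and ms effects but no true interaction
sim$logLOS <- 2.36 - 0.1 * (as.integer(sim$REGION) - 1) + 0.1 * sim$ms +
  rnorm(n, sd = 0.15)

fit_additive <- lm(logLOS ~ factor(REGION) + ms, data = sim)
fit_interact <- lm(logLOS ~ factor(REGION) * ms, data = sim)

# Joint F-test of all three REGION:ms terms (3 numerator df)
tab <- anova(fit_additive, fit_interact)
print(tab)
```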

  17. Interpreting coefficients

  18. Regression Results
      > reg5 <- lm(logLOS ~ factor(REGION)*ms, data=data)
      >
      > summary(reg5)
      Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
      (Intercept)         2.359927   0.030715  76.833  < 2e-16 ***
      factor(REGION)2    -0.122926   0.042560  -2.888   0.0047 **
      factor(REGION)3    -0.163065   0.039769  -4.100 8.16e-05 ***
      factor(REGION)4    -0.299316   0.049933  -5.994 2.92e-08 ***
      ms                  0.125486   0.072685   1.726   0.0872 .
      factor(REGION)2:ms -0.007176   0.096181  -0.075   0.9407
      factor(REGION)3:ms  0.033145   0.114691   0.289   0.7732
      factor(REGION)4:ms  0.082734   0.132974   0.622   0.5352
      ---
      Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

      Residual standard error: 0.1473 on 105 degrees of freedom
      Multiple R-Squared: 0.3578,  Adjusted R-squared: 0.3149
      F-statistic: 8.356 on 7 and 105 DF,  p-value: 4.356e-08
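With region 1 as the reference level, the ms coefficient is the medical-school effect in region 1, and each REGION:ms term is how much that effect changes in another region. This sketch rebuilds the fitted mean logLOS in all eight REGION × ms cells from the printed reg5 coefficients and takes the within-region MS differences.

```r
# Coefficients copied from summary(reg5): logLOS ~ factor(REGION) * ms
intercept <- 2.359927
region    <- c(`1` = 0, `2` = -0.122926, `3` = -0.163065, `4` = -0.299316)
ms_main   <- 0.125486
region_ms <- c(`1` = 0, `2` = -0.007176, `3` = 0.033145, `4` = 0.082734)

# Fitted mean logLOS in each REGION x ms cell (region 1 is the reference)
mean_ms0 <- intercept + region
mean_ms1 <- intercept + region + ms_main + region_ms
# MS effect within each region: ms main effect + region-specific interaction
ms_effect <- mean_ms1 - mean_ms0

print(round(cbind(mean_ms0, mean_ms1, ms_effect), 4))
```

Because this model has one parameter per cell (8 coefficients, 8 cells), these fitted values are simply the observed cell means of logLOS.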

  19. Association between MS and REGION
      > table(data$REGION, data$ms)
             0   1   (% with ms = 1)
        1   23   5   18%
        2   25   7   22%
        3   34   3    8%
        4   14   2   13%
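The percentages in this table are the share of hospitals in each region with medical-school affiliation. This sketch recomputes them from the counts and, as one possible test of association (an analysis choice not made in the lecture), runs Fisher's exact test, since several expected counts for ms = 1 are small.

```r
# Counts copied from table(data$REGION, data$ms)
counts <- matrix(c(23, 5,
                   25, 7,
                   34, 3,
                   14, 2),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(REGION = as.character(1:4), ms = c("0", "1")))

# Percentage of hospitals in each region with medical-school affiliation
pct_ms <- 100 * counts[, "1"] / rowSums(counts)
print(round(pct_ms, 1))

# Fisher's exact test of REGION vs ms (small expected counts for ms = 1)
p_fisher <- fisher.test(counts)$p.value
print(p_fisher)
```

The slide's 13% for region 4 is 12.5% rounded up. The small ms = 1 counts in regions 3 and 4 (3 and 2 hospitals) are likely one reason the REGION:ms interaction estimates for those regions have the largest standard errors in the reg5 output.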
