
Lecture 6: Multiple Linear Regression Adjusted Variable Plots

Lecture 6: Multiple Linear Regression Adjusted Variable Plots. BMTRY 701 Biostatistical Methods II. Graphical Displays in MLR. No more one simple scatterplot: need to look at multiple pairs of variables “pairs” in R.

kail


Presentation Transcript


  1. Lecture 6: Multiple Linear Regression Adjusted Variable Plots. BMTRY 701 Biostatistical Methods II

  2. Graphical Displays in MLR • No longer a single simple scatterplot: we need to look at multiple pairs of variables • pairs() in R • but pairwise plots cannot show each covariate the way it enters the model, that is, adjusted for the other covariates • solution: adjusted variable plot • aka: partial regression plot

  3. Adjusted Variable Plots • Adjusted variable plots are useful in linear regression for outlier detection and for qualitative evaluation of the fit of a model. • With two covariates: Shows the association between X and Y adjusted for another variable, Z. • With more than two covariates: Shows the association between X and Y adjusted for many other covariates • In our example, association between logLOS and number of nurses, adjusted for number of beds

  4. Approach • Assume we want to look at the association of Y and X, adjusted for Z • Step 1: Regress Y on Z and save the residuals (named res.xy in the R code) • Step 2: Regress X on Z and save the residuals (named res.xz in the R code) • Step 3: plot res.xy (vertical axis) versus res.xz (horizontal axis) • Optional step 4: perform a regression of res.xy on res.xz and compare its slope to the coefficient of X in the MLR of Y on X and Z • MPV: section 4.2.4
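A minimal sketch of an adjusted variable plot for Y and X adjusted for Z, where Y and X are each regressed on Z and the two sets of residuals are plotted against each other. The data are simulated, and every variable name and coefficient below is a hypothetical choice for illustration, not taken from the SENIC data. The last lines carry out the optional slope comparison.

```r
# Simulated data (hypothetical; NOT the SENIC data used in the lecture)
set.seed(701)
n <- 200
z <- rnorm(n)                                     # adjustment variable Z
x <- 0.6 * z + rnorm(n)                           # covariate of interest X, correlated with Z
y <- 1 + 0.5 * x + 0.8 * z + rnorm(n, sd = 0.5)   # outcome Y

# Remove from Y and from X the part explained by Z
res.y <- residuals(lm(y ~ z))
res.x <- residuals(lm(x ~ z))

# Adjusted variable plot: residuals of Y against residuals of X
plot(res.x, res.y, pch = 16)
reg.res <- lm(res.y ~ res.x)
abline(reg.res, lwd = 2)

# Optional check: the slope of the residual regression should match
# the coefficient of x in the multiple regression of y on x and z
slope.avp <- unname(coef(reg.res)["res.x"])
slope.mlr <- unname(coef(lm(y ~ x + z))["x"])
print(c(avp = slope.avp, mlr = slope.mlr))
```

The two slopes should agree up to numerical precision, although the standard errors from the residual regression differ slightly from the MLR ones because the residual regression does not account for the degree of freedom used by Z.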

  5. SENIC

  6. R

         pairs(~INFRISK+BEDS+logLOS, data=data, pch=16)

         # adjusted variable plot approach
         # look at the association between INFRISK and logLOS,
         # adjusting for BEDS
         reg.xy <- lm(logLOS ~ BEDS, data=data)
         res.xy <- reg.xy$residuals
         reg.xz <- lm(INFRISK ~ BEDS, data=data)
         res.xz <- reg.xz$residuals
         plot(res.xz, res.xy, pch=16)
         reg.res <- lm(res.xy ~ res.xz)
         abline(reg.res, lwd=2)

         reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)

  7. Why is this important or interesting? • It shows us the ‘adjusted’ relationship • It can help us determine whether: it is an important variable (at all); another form of X is more appropriate; the correlation is high vs. low after adjustment; we need to/want to adjust for this variable • It also informs us about why a variable ‘loses’ significance • Most important: check for non-linearity • Example: logLOS ~ NURSE

  8. What about BEDS and NURSE?

         # why NURSE is not associated, after adjustment for BEDS?
         reg.nurse <- lm(logLOS ~ NURSE, data=data)
         reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)

         reg.xy <- lm(logLOS ~ BEDS, data=data)
         res.xy <- reg.xy$residuals
         reg.xz <- lm(NURSE ~ BEDS, data=data)
         res.xz <- reg.xz$residuals
         plot(res.xz, res.xy, pch=16)
         reg.res <- lm(res.xy ~ res.xz)
         abline(reg.res, lwd=2)

  9. What about the other way around?

         #######################
         # what about the other way? what about why BEDS is
         # assoc after adjustment for NURSE?
         reg.xy <- lm(logLOS ~ NURSE, data=data)
         res.xy <- reg.xy$residuals
         reg.xz <- lm(BEDS ~ NURSE, data=data)
         res.xz <- reg.xz$residuals
         plot(res.xz, res.xy, pch=16)
         reg.res <- lm(res.xy ~ res.xz)
         abline(reg.res, lwd=2)

         reg.nurse.beds <- lm(logLOS ~ NURSE + BEDS, data=data)

  10. Interpretation in MLR • “Adjusted for” • “Controlled for” • “Holding all else constant” • In MLR, you need to include one of these phrases (or something like one of them) when interpreting a regression coefficient

  11. LOS ~ INFRISK + BEDS

          > reg.infrisk.beds <- lm(LOS ~ BEDS + INFRISK, data=data)
          > summary(reg.infrisk.beds)

          Call:
          lm(formula = LOS ~ BEDS + INFRISK, data = data)

          Residuals:
              Min      1Q  Median      3Q     Max
          -2.8624 -0.9904 -0.1996  0.6671  8.4219

          Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
          (Intercept) 6.2703521  0.5038751  12.444  < 2e-16 ***
          BEDS        0.0024747  0.0008236   3.005  0.00329 **
          INFRISK     0.6323812  0.1184476   5.339 5.08e-07 ***
          ---

  12. Hard to interpret with so many decimal places!

          > data$beds100 <- data$BEDS/100
          > reg.infrisk.beds <- lm(LOS ~ beds100 + INFRISK, data=data)
          > summary(reg.infrisk.beds)

          Call:
          lm(formula = LOS ~ beds100 + INFRISK, data = data)

          Residuals:
              Min      1Q  Median      3Q     Max
          -2.8624 -0.9904 -0.1996  0.6671  8.4219

          Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
          (Intercept)  6.27035    0.50388  12.444  < 2e-16 ***
          beds100      0.24747    0.08236   3.005  0.00329 **
          INFRISK      0.63238    0.11845   5.339 5.08e-07 ***
          ---

  13. logLOS ~ INFRISK + BEDS

          > reg.infrisk.beds <- lm(logLOS ~ BEDS + INFRISK, data=data)
          > summary(reg.infrisk.beds)

          Call:
          lm(formula = logLOS ~ BEDS + INFRISK, data = data)

          Residuals:
                Min        1Q    Median        3Q       Max
          -0.314377 -0.079979 -0.008026  0.072108  0.580675

          Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
          (Intercept) 1.926e+00  4.611e-02  41.767  < 2e-16 ***
          BEDS        2.407e-04  7.538e-05   3.194  0.00183 **
          INFRISK     6.048e-02  1.084e-02   5.579 1.75e-07 ***
          ---

  14. Hard to interpret with so many decimal places!

          > data$beds100 <- data$BEDS/100
          > reg.infrisk.beds100 <- lm(logLOS ~ beds100 + INFRISK, data=data)
          > summary(reg.infrisk.beds100)

          Call:
          lm(formula = logLOS ~ beds100 + INFRISK, data = data)

          Residuals:
                Min        1Q    Median        3Q       Max
          -0.314377 -0.079979 -0.008026  0.072108  0.580675

          Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
          (Intercept) 1.926040   0.046114  41.767  < 2e-16 ***
          beds100     0.024075   0.007538   3.194  0.00183 **
          INFRISK     0.060477   0.010840   5.579 1.75e-07 ***
          ---
          Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

          Residual standard error: 0.1435 on 110 degrees of freedom
          Multiple R-squared:  0.3612,    Adjusted R-squared:  0.3496
          F-statistic:  31.1 on 2 and 110 DF,  p-value: 1.971e-11

  15. How to interpret? • Pick two values of BEDS • e.g. 100 to 200 • e.g. 400 to 500 • Estimate the difference in logLOS between the two values in each pair • What do we plug in for INFRISK?

  16. How to interpret? • Remember that our inferences are “holding all else constant” • To compare two hospitals with the same INFRISK, it doesn’t matter what you put in (as long as it is the same)
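A small sketch of why the plugged-in INFRISK value does not matter, using the coefficients reported in the logLOS ~ beds100 + INFRISK output. The helper `pred.logLOS` and the INFRISK values 3 and 5 are hypothetical choices made only for this illustration.

```r
# Coefficients copied from the logLOS ~ beds100 + INFRISK summary in the slides
b0        <- 1.926040
b.beds100 <- 0.024075
b.infrisk <- 0.060477

# Hypothetical helper: predicted logLOS for a given number of beds and INFRISK
pred.logLOS <- function(beds, infrisk) {
  b0 + b.beds100 * (beds / 100) + b.infrisk * infrisk
}

# 200 vs. 100 beds, at two arbitrary (but matched) INFRISK values
d.at.3 <- pred.logLOS(200, 3) - pred.logLOS(100, 3)
d.at.5 <- pred.logLOS(200, 5) - pred.logLOS(100, 5)

# 500 vs. 400 beds, again with INFRISK held at the same value in both hospitals
d.high <- pred.logLOS(500, 3) - pred.logLOS(400, 3)

# All three differences equal the beds100 coefficient
print(c(d.at.3, d.at.5, d.high))
```

Because the model is additive on the log scale, the INFRISK term cancels in every difference as long as both hospitals share the same value.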

  17. How to interpret? Comparing two hospitals whose numbers of beds differ by 100, and assuming the infection risk is the same in both hospitals, the ratio of average LOS in the two hospitals is 1.02, with the hospital with more beds having the longer stay.

  18. difference of 400 beds?
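As a hedged sketch of the 400-bed comparison, the same reasoning can be applied to the reported beds100 coefficient, assuming logLOS is the natural log of LOS. A 400-bed difference is four units of beds100, so the coefficient is multiplied by 4 before exponentiating. The variable names below are illustrative only.

```r
# beds100 coefficient copied from the logLOS ~ beds100 + INFRISK summary in the slides
b.beds100 <- 0.024075

# Ratio of LOS for a 100-bed difference (one unit of beds100), INFRISK held fixed
ratio.100 <- exp(b.beds100)

# Ratio of LOS for a 400-bed difference (four units of beds100), INFRISK held fixed
ratio.400 <- exp(4 * b.beds100)

print(round(c(ratio.100 = ratio.100, ratio.400 = ratio.400), 2))
```

Under these assumptions the 400-bed ratio is the 100-bed ratio raised to the fourth power, which comes to roughly 1.10.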

  19. When outcome is log transformed • coefficients must be interpreted as RATIOS instead of DIFFERENCES • Need to exponentiate the coefficient • the exponentiated coefficient is the ratio of the outcome (on its original scale) for a one-unit difference in the predictor, holding the other predictors constant

  20. Why differences do not work • Consider comparing two hospitals with 400 and 300 beds:

  21. Why differences do not work • Consider comparing two hospitals with 800 and 700 beds:
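A rough sketch of the two comparisons, using the coefficients reported in the logLOS ~ beds100 + INFRISK output and back-transforming the predicted logLOS with exp(). The INFRISK value of 4 and the helper `pred.LOS` are assumptions made only for illustration, and the back-transformed value is treated here as a typical LOS rather than an exact mean.

```r
# Coefficients copied from the logLOS ~ beds100 + INFRISK summary in the slides
b0        <- 1.926040
b.beds100 <- 0.024075
b.infrisk <- 0.060477

infrisk <- 4   # arbitrary value, held the same for every hospital compared

# Hypothetical helper: back-transformed predicted LOS for a given number of beds
pred.LOS <- function(beds) {
  exp(b0 + b.beds100 * (beds / 100) + b.infrisk * infrisk)
}

# Differences on the original scale depend on where the comparison is made
diff.low  <- pred.LOS(400) - pred.LOS(300)
diff.high <- pred.LOS(800) - pred.LOS(700)

# Ratios are the same for any two hospitals that differ by 100 beds
ratio.low  <- pred.LOS(400) / pred.LOS(300)
ratio.high <- pred.LOS(800) / pred.LOS(700)

print(c(diff.low = diff.low, diff.high = diff.high))
print(c(ratio.low = ratio.low, ratio.high = ratio.high))
```

The difference is larger for the 800 vs. 700 comparison than for 400 vs. 300, while both ratios equal exp(b.beds100), which is why a single ratio summarizes the effect and a single difference does not.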

  22. Results in the log scale

  23. Results on the “linear” scale: not huge differences

  24. Differences can be seen on a larger scale plot
