Tutorial 6 Thursday February 21 MBP 1010 Kevin Brown
Linear regression: requires you to define? • Y – dependent variable (the outcome) • X – independent variable(s) (the predictors)
Allows you to answer what questions? • Is there an association (same question as the Pearson correlation coefficient) • What is the association? Measured as the slope.
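In simple linear regression, the test on the slope and the test on the Pearson correlation coefficient give the same p-value, and the slope equals r scaled by sd(y)/sd(x). A minimal sketch on simulated data (the variables `x` and `y` are made up for illustration and are not the tutorial's data):

```r
set.seed(10)
x <- rnorm(30)                 # simulated predictor (assumption)
y <- 1 + 0.5 * x + rnorm(30)   # simulated outcome (assumption)

# "What is the association?" -> the slope of the fitted line
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
p_slope <- coef(summary(fit))["x", "Pr(>|t|)"]

# "Is there an association?" -> Pearson correlation test
ct <- cor.test(x, y)
r <- unname(ct$estimate)
p_cor <- ct$p.value

print(c(slope = slope, r = r))
print(c(p_slope = p_slope, p_cor = p_cor))   # the two p-values agree

# The slope is the correlation rescaled by the two standard deviations
print(all.equal(slope, r * sd(y) / sd(x)))
```

The correlation tells you whether and how tightly the points follow a line; the slope tells you how much Y changes per unit of X.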
Assumes • Linearity • Constant residual variance (homoscedasticity) / residuals normal • Errors are independent (i.e. not clustered)
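One common way to look at the first two assumptions is through residual plots. The sketch below uses simulated data (`x`, `y` are hypothetical) and base R graphics; it is an illustration, not the tutorial's own procedure:

```r
set.seed(3)
x <- runif(50, 0, 10)                    # simulated predictor (assumption)
y <- 2 + 1.5 * x + rnorm(50, sd = 2)     # simulated outcome (assumption)

fit <- lm(y ~ x)
res <- residuals(fit)

# Residuals vs fitted values: curvature suggests non-linearity,
# a funnel shape suggests non-constant residual variance
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points far from the line suggest non-normal residuals
qqnorm(res)
qqline(res)
```

Independence of the errors usually cannot be read off a plot like this; it depends on how the data were collected (for example, repeated measurements on the same subject are clustered).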
Outputs “estimates” • intercept • slope • standard errors • t values • p-values • residual standard error (SSE – what is this?) • R²
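Each of these outputs can be pulled from a fitted model object. SSE is the sum of squared errors (the sum of the squared residuals), and the residual standard error is the square root of SSE divided by the residual degrees of freedom. A minimal sketch on simulated height and weight values (hypothetical numbers, not the tutorial's HW data):

```r
set.seed(1)
height <- rnorm(50, mean = 1.7, sd = 0.1)        # hypothetical heights
weight <- 5 + 40 * height + rnorm(50, sd = 15)   # hypothetical weights

fit <- lm(weight ~ height)
s <- summary(fit)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(s)
print(coefs)

# SSE = sum of squared residuals
sse <- sum(residuals(fit)^2)

# Residual standard error = sqrt(SSE / residual degrees of freedom)
rse <- sqrt(sse / df.residual(fit))
print(c(SSE = sse, RSE = rse, sigma_from_summary = s$sigma))

# R-squared: proportion of the variance in weight explained by height
print(s$r.squared)
```

With 50 observations and two estimated coefficients (intercept and slope), the residual degrees of freedom are 50 - 2 = 48.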
Linear regression example: height vs. weight
Extract information:
> summary(lm(HW[,2] ~ HW[,1]))

Call:
lm(formula = HW[, 2] ~ HW[, 1])

Residuals:
    Min      1Q  Median      3Q     Max
-36.490 -10.297   3.426   9.156  37.385

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -2.860     18.304  -0.156    0.876
HW[, 1]       42.090      9.449   4.454 5.02e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 16.12 on 48 degrees of freedom
Multiple R-squared:  0.2925,    Adjusted R-squared:  0.2777
F-statistic: 19.84 on 1 and 48 DF,  p-value: 5.022e-05
Example • Televisions, Physicians and Life Expectancy (World Almanac Factbook 1993) • Residuals & outliers • High leverage points & influential observations • Dummy variable coding • Transformations
Take home messages • Regression is a very flexible tool • Correlation ≠ causation
Dummy coding • Creates an alternate variable that’s used for analysis • For 2 categories you set values of … • reference level to 0 • level of interest to 1
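The 0/1 coding described above can be sketched in R as follows. The variable `group` and its levels "control" and "treated" are hypothetical; "control" is taken as the reference level:

```r
# Hypothetical two-category variable; the first level is the reference level
group <- factor(c("control", "treated", "control", "treated", "treated"),
                levels = c("control", "treated"))

# Manual dummy coding: reference level -> 0, level of interest -> 1
dummy <- as.integer(group == "treated")
print(dummy)

# R's default treatment contrasts build the same 0/1 column automatically
X <- model.matrix(~ group)
print(X)
```

When a factor is used directly in `lm()`, R applies this coding for you, so the coefficient for the dummy column is the difference in mean outcome between the level of interest and the reference level.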
Do these treatments interact? • Standard approach: ANOVA with an interaction term [Figure: Treatment #1 vs. Treatment #2, showing their interaction]
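A two-way ANOVA with an interaction term can be fitted in R with `lm()` and `anova()`. The sketch below uses a simulated 2 x 2 design; the names `trt1`, `trt2`, `y` and all effect sizes are assumptions made for illustration:

```r
set.seed(2)

# Hypothetical 2 x 2 design with 10 observations per cell
d <- expand.grid(rep = 1:10, trt1 = c("no", "yes"), trt2 = c("no", "yes"))

# Simulated response: two main effects plus an extra effect when both
# treatments are given together (the interaction)
d$y <- 10 + 2 * (d$trt1 == "yes") + 3 * (d$trt2 == "yes") +
  4 * (d$trt1 == "yes" & d$trt2 == "yes") + rnorm(nrow(d))

# trt1 * trt2 expands to trt1 + trt2 + trt1:trt2
fit <- lm(y ~ trt1 * trt2, data = d)
tab <- anova(fit)
print(tab)
```

The `trt1:trt2` row of the table tests whether the effect of one treatment depends on the level of the other.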