
Lecture 9: ANOVA tables, F-tests

kenley

Presentation Transcript


  1. Lecture 9: ANOVA tables, F-tests. BMTRY 701 Biostatistical Methods II

  2. ANOVA • Analysis of Variance • Similar in derivation to the ANOVA that generalizes the two-sample t-test • Partitioning of variance into several parts • the part due to the ‘model’: SSR • the part due to ‘error’: SSE • The sum of the two parts is the total sum of squares: SST

  3. Total Deviations:

  4. Regression Deviations:

  5. Error Deviations:

  6. Definitions
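
A sketch of the definitions in standard notation, assuming $Y_i$ is the observed response, $\hat{Y}_i$ the fitted value from the regression, and $\bar{Y}$ the sample mean of the response. These are the same quantities the R example on the next slide computes as `sst`, `ssr` and `sse`; the symbols themselves are an assumption, since only the slide titles are given here.

```latex
\begin{align*}
\text{Total deviation: } & Y_i - \bar{Y}
  & SST &= \sum_{i=1}^{n} \left(Y_i - \bar{Y}\right)^2 \\
\text{Regression deviation: } & \hat{Y}_i - \bar{Y}
  & SSR &= \sum_{i=1}^{n} \left(\hat{Y}_i - \bar{Y}\right)^2 \\
\text{Error deviation: } & Y_i - \hat{Y}_i
  & SSE &= \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2
\end{align*}
\[
  Y_i - \bar{Y} = \left(\hat{Y}_i - \bar{Y}\right) + \left(Y_i - \hat{Y}_i\right),
  \qquad SST = SSR + SSE
\]
```

The deviations add for each observation by construction. That the sums of squares also add relies on the least-squares fit, for which the cross-product term sums to zero.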

  7. Example: logLOS ~ BEDS
      > ybar <- mean(data$logLOS)
      > yhati <- reg$fitted.values
      > sst <- sum((data$logLOS - ybar)^2)
      > ssr <- sum((yhati - ybar)^2)
      > sse <- sum((data$logLOS - yhati)^2)
      > sst
      [1] 3.547454
      > ssr
      [1] 0.6401715
      > sse
      [1] 2.907282
      > sse+ssr
      [1] 3.547454

  8. Degrees of Freedom • Degrees of freedom for SST: n - 1 • one df is lost because it is used to estimate mean Y • Degrees of freedom for SSR: 1 • only one df because all estimates are based on same fitted regression line • Degrees of freedom for SSE: n - 2 • two lost due to estimating regression line (slope and intercept)

  9. Mean Squares • “Scaled” version of Sum of Squares • Mean Square = SS/df • MSR = SSR/1 • MSE = SSE/(n-2) • Notes: • mean squares are not additive! That is, MSR + MSE ≠ SST/(n-1) • MSE is the same estimate of σ² that we saw previously

  10. Standard ANOVA Table
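
A sketch of the usual layout for simple linear regression, assembled from the degrees of freedom and mean squares defined on the two preceding slides. The column order and the F column are assumptions, chosen to mirror the columns that R's `anova()` prints.

```latex
\begin{tabular}{lcccc}
\hline
Source     & SS    & df      & MS                  & F           \\
\hline
Regression & $SSR$ & $1$     & $MSR = SSR/1$       & $MSR / MSE$ \\
Error      & $SSE$ & $n - 2$ & $MSE = SSE/(n - 2)$ &             \\
Total      & $SST$ & $n - 1$ &                     &             \\
\hline
\end{tabular}
```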

  11. ANOVA for logLOS ~ BEDS
      > anova(reg)
      Analysis of Variance Table
      Response: logLOS
                 Df  Sum Sq Mean Sq F value    Pr(>F)
      BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
      Residuals 111 2.90728 0.02619
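
The Mean Sq and F value columns of this table can be reproduced by hand from the sums of squares computed earlier. The sketch below assumes n = 113, which is inferred from the 111 residual degrees of freedom (n - 2 = 111); the variable names are illustrative.

```r
# Sums of squares reported in the logLOS ~ BEDS example.
ssr <- 0.6401715
sse <- 2.907282
n   <- 113              # assumption: inferred from residual df = n - 2 = 111

msr <- ssr / 1                  # regression mean square, 1 df
mse <- sse / (n - 2)            # error mean square, n - 2 df
mst <- (ssr + sse) / (n - 1)    # total mean square, n - 1 df

f_value <- msr / mse            # ratio shown in the "F value" column

print(c(MSR = msr, MSE = mse, F = f_value))

# Mean squares are not additive: MSR + MSE differs from SST/(n - 1).
print(c(MSR_plus_MSE = msr + mse, SST_over_df = mst))
```

The printed MSE and F should agree with the `anova(reg)` output up to rounding.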

  12. Inference? • What is of interest and how do we interpret? • We’d like to know if BEDS is related to logLOS. • How do we do that using the ANOVA table? • We need to know the expected value of the MSR and MSE:
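
For the simple linear regression model with fixed predictor values $X_i$ and independent errors of variance $\sigma^2$, the standard results are sketched below. The notation is assumed, but it is consistent with the implications listed on the next slide.

```latex
\begin{align*}
E(MSE) &= \sigma^2 \\
E(MSR) &= \sigma^2 + \beta_1^2 \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2
\end{align*}
```

The second term of $E(MSR)$ is zero when $\beta_1 = 0$ and positive otherwise, which is why MSR tends to exceed MSE when the predictor matters.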

  13. Implications • mean of sampling distribution of MSE is σ² regardless of whether or not β1 = 0 • If β1 = 0, E(MSE) = E(MSR) • If β1 ≠ 0, E(MSE) < E(MSR) • To test significance of β1, we can test if MSR and MSE are of the same magnitude.

  14. F-test • Derived naturally from the arguments just made • Hypotheses: • H0: β1 = 0 • H1: β1 ≠ 0 • Test statistic: F* = MSR/MSE • Based on the earlier argument, we expect F* > 1 if H1 is true • This implies a one-sided (upper-tail) test.

  15. F-test • The distribution of F* under the null has two sets of degrees of freedom (df) • numerator degrees of freedom • denominator degrees of freedom • These correspond to the df shown in the ANOVA table • numerator df = 1 • denominator df = n-2 • Test is based on F* ~ F(1, n-2) under H0

  16. Implementing the F-test • The decision rule • If F* > F(1-α; 1, n-2), then reject H0 • If F* ≤ F(1-α; 1, n-2), then fail to reject H0

  17. F-distributions
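
A minimal sketch for exploring F distributions in R, using the base functions `qf()` for quantiles and `df()` for densities. The particular degrees of freedom are illustrative choices, not taken from the original figure.

```r
# Upper 5% critical values of F(1, df2) for several denominator df.
den_df <- c(5, 20, 111, 1000)
crit   <- qf(0.95, df1 = 1, df2 = den_df)
print(data.frame(den_df = den_df, critical_value = round(crit, 3)))

# Density curves for a few (numerator, denominator) df pairs.
curve(df(x, 1, 111), from = 0.01, to = 5, ylim = c(0, 1.2),
      xlab = "F", ylab = "density", main = "F densities")
curve(df(x, 4, 108), add = TRUE, lty = 2)
curve(df(x, 10, 100), add = TRUE, lty = 3)
legend("topright", lty = 1:3,
       legend = c("F(1, 111)", "F(4, 108)", "F(10, 100)"))
```

The critical value shrinks as the denominator df grows, so the same F* is stronger evidence against H0 in a larger sample.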

  18. ANOVA for logLOS ~ BEDS
      > anova(reg)
      Analysis of Variance Table
      Response: logLOS
                 Df  Sum Sq Mean Sq F value    Pr(>F)
      BEDS        1 0.64017 0.64017  24.442 2.737e-06 ***
      Residuals 111 2.90728 0.02619
      > qf(0.95, 1, 111)
      [1] 3.926607
      > 1-pf(24.44,1,111)
      [1] 2.739016e-06

  19. More interesting: MLR (multiple linear regression) • You can test that several coefficients are zero at the same time • Otherwise, the F-test gives the same result as a t-test • That is: for testing the significance of ONE covariate in a linear regression model, an F-test and a t-test give the same result: • H0: β1 = 0 • H1: β1 ≠ 0
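
The equivalence can be checked numerically: with one covariate, the F statistic equals the square of the t statistic and the two p-values coincide. The sketch below uses simulated data (the seed, sample size and coefficients are arbitrary assumptions), not the hospital data from the lecture.

```r
set.seed(701)                    # arbitrary seed for reproducibility
n <- 50
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)      # simulated response with a true slope of 0.5
fit <- lm(y ~ x)

coefs <- summary(fit)$coefficients
t_val <- coefs["x", "t value"]
p_t   <- coefs["x", "Pr(>|t|)"]

aov_tab <- anova(fit)
f_val <- aov_tab["x", "F value"]
p_f   <- aov_tab["x", "Pr(>F)"]

print(c(t_squared = t_val^2, F = f_val))
print(c(p_from_t = p_t, p_from_F = p_f))
```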

  20. General F-testing approach • The previous test seems simple • It is in this case, but it can be generalized to be more useful • Imagine a more general test: • H0: small model • Ha: large model • Constraint: the small model must be ‘nested’ in the large model • That is, the small model must be a ‘subset’ of the large model
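
The usual test statistic for this comparison can be sketched as follows. The subscripts are assumed notation: $S$ for the small (reduced) model and $L$ for the large (full) model, with $SSE$ and $df$ denoting each model's residual sum of squares and residual degrees of freedom.

```latex
\[
  F^* = \frac{\left(SSE_S - SSE_L\right) / \left(df_S - df_L\right)}
             {SSE_L / df_L}
  \;\sim\; F\!\left(df_S - df_L,\; df_L\right) \quad \text{under } H_0
\]
```

With a single predictor, the small model is the intercept-only model, so $SSE_S = SST$, $df_S = n - 1$, and the statistic reduces to $MSR/MSE$.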

  21. Example of ‘nested’ models • Model 1: LOS ~ INFRISK + ms + NURSE + nurse2 • Model 2: LOS ~ INFRISK + NURSE + nurse2 • Model 3: LOS ~ INFRISK + ms • Models 2 and 3 are nested in Model 1 • Model 2 is not nested in Model 3 • Model 3 is not nested in Model 2

  22. Testing: Models must be nested! • To test Model 1 vs. Model 2 • we are testing that β2 = 0 • H0: β2 = 0 vs. Ha: β2 ≠ 0 • If β2 = 0, then we conclude that Model 2 is superior to Model 1 • That is, if we fail to reject the null hypothesis • Model 1: LOS ~ INFRISK + ms + NURSE + nurse2 (β2 is the coefficient of ms) • Model 2: LOS ~ INFRISK + NURSE + nurse2

  23. R
      reg1 <- lm(LOS ~ INFRISK + ms + NURSE + nurse2, data=data)
      reg2 <- lm(LOS ~ INFRISK + NURSE + nurse2, data=data)
      reg3 <- lm(LOS ~ INFRISK + ms, data=data)
      > anova(reg1)
      Analysis of Variance Table
      Response: LOS
                 Df  Sum Sq Mean Sq F value    Pr(>F)
      INFRISK     1 116.446 116.446 45.4043 8.115e-10 ***
      ms          1  12.897  12.897  5.0288   0.02697 *
      NURSE       1   1.097   1.097  0.4277   0.51449
      nurse2      1   1.789   1.789  0.6976   0.40543
      Residuals 108 276.981   2.565
      ---

  24. R
      > anova(reg2)
      Analysis of Variance Table
      Response: LOS
                 Df  Sum Sq Mean Sq F value    Pr(>F)
      INFRISK     1 116.446 116.446 44.8865 9.507e-10 ***
      NURSE       1   8.212   8.212  3.1653     0.078 .
      nurse2      1   1.782   1.782  0.6870     0.409
      Residuals 109 282.771   2.594
      ---
      > anova(reg1, reg2)
      Analysis of Variance Table
      Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
      Model 2: LOS ~ INFRISK + NURSE + nurse2
        Res.Df     RSS Df Sum of Sq      F Pr(>F)
      1    108 276.981
      2    109 282.771 -1    -5.789 2.2574 0.1359

  25. R
      > summary(reg1)
      Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
      (Intercept)  6.355e+00  5.266e-01  12.068  < 2e-16 ***
      INFRISK      6.289e-01  1.339e-01   4.696 7.86e-06 ***
      ms           7.829e-01  5.211e-01   1.502    0.136
      NURSE        4.136e-03  4.093e-03   1.010    0.315
      nurse2      -5.676e-06  6.796e-06  -0.835    0.405
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 1.601 on 108 degrees of freedom
      Multiple R-squared: 0.3231, Adjusted R-squared: 0.2981
      F-statistic: 12.89 on 4 and 108 DF, p-value: 1.298e-08

  26. Testing more than one covariate • To test Model 1 vs. Model 3 • we are testing that β3 = 0 AND β4 = 0 • H0: β3 = β4 = 0 vs. Ha: β3 ≠ 0 or β4 ≠ 0 • If β3 = β4 = 0, then we conclude that Model 3 is superior to Model 1 • That is, if we fail to reject the null hypothesis • Model 1: LOS ~ INFRISK + ms + NURSE + nurse2 (β3 and β4 are the coefficients of NURSE and nurse2) • Model 3: LOS ~ INFRISK + ms

  27. R
      > anova(reg3)
      Analysis of Variance Table
      Response: LOS
                 Df  Sum Sq Mean Sq F value    Pr(>F)
      INFRISK     1 116.446 116.446 45.7683 6.724e-10 ***
      ms          1  12.897  12.897  5.0691   0.02634 *
      Residuals 110 279.867   2.544
      ---
      > anova(reg1, reg3)
      Analysis of Variance Table
      Model 1: LOS ~ INFRISK + ms + NURSE + nurse2
      Model 2: LOS ~ INFRISK + ms
        Res.Df     RSS Df Sum of Sq      F Pr(>F)
      1    108 276.981
      2    110 279.867 -2    -2.886 0.5627 0.5713
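
The F and p-values in the two model comparisons can be recovered from the residual sums of squares (RSS) and residual df alone. The helper `partial_f` below is a hypothetical name introduced for illustration; the inputs are the rounded values printed by `anova(reg1, reg2)` and `anova(reg1, reg3)`, so results match only up to rounding.

```r
# F test for nested models, computed from each model's RSS and residual df.
partial_f <- function(rss_small, df_small, rss_large, df_large) {
  num_df <- df_small - df_large                  # number of coefficients tested
  f_stat <- ((rss_small - rss_large) / num_df) / (rss_large / df_large)
  p_val  <- pf(f_stat, num_df, df_large, lower.tail = FALSE)
  c(F = f_stat, p = p_val)
}

# Model 1 (large) vs. Model 2 (drops ms): one coefficient tested.
m1_vs_m2 <- partial_f(282.771, 109, 276.981, 108)

# Model 1 (large) vs. Model 3 (drops NURSE and nurse2): two coefficients tested.
m1_vs_m3 <- partial_f(279.867, 110, 276.981, 108)

print(round(rbind(m1_vs_m2, m1_vs_m3), 4))
```

For the single-coefficient comparison, the p-value also agrees with the t-test for `ms` in `summary(reg1)` (0.136), as expected when only one coefficient is tested.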

  28. R
      > summary(reg3)
      Call:
      lm(formula = LOS ~ INFRISK + ms, data = data)
      Residuals:
          Min      1Q  Median      3Q     Max
      -2.9037 -0.8739 -0.1142  0.5965  8.5568
      Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
      (Intercept)   6.4547     0.5146  12.542   <2e-16 ***
      INFRISK       0.6998     0.1156   6.054    2e-08 ***
      ms            0.9717     0.4316   2.251   0.0263 *
      ---
      Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
      Residual standard error: 1.595 on 110 degrees of freedom
      Multiple R-squared: 0.3161, Adjusted R-squared: 0.3036
      F-statistic: 25.42 on 2 and 110 DF, p-value: 8.42e-10

  29. Testing multiple coefficients simultaneously • Region: it is a ‘factor’ variable with 4 categories
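
A factor with 4 categories enters the model as 3 indicator variables, so testing the factor as a whole means testing 3 coefficients at once with a nested-model F-test. The sketch below uses simulated data; the region labels, seed, and effect sizes are hypothetical and are not taken from the lecture's data set.

```r
set.seed(9)                                   # arbitrary seed
n <- 120
region <- factor(sample(c("NE", "NC", "S", "W"), n, replace = TRUE))  # hypothetical labels
x <- rnorm(n)
y <- 10 + 0.5 * x + rnorm(n)                  # region has no true effect here

small <- lm(y ~ x)                            # model without the factor
large <- lm(y ~ x + region)                   # adds 3 indicator coefficients

cmp <- anova(small, large)                    # F-test of all 3 at once
print(cmp)
```

The Df column of the comparison shows 3, one for each non-reference category.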
