Topic 7: Analysis of Variance


  1. Topic 7: Analysis of Variance

  2. Outline • Partitioning sums of squares • Breakdown of degrees of freedom • Expected mean squares (EMS) • F test • ANOVA table • General linear test • Pearson correlation / R²

  3. Analysis of Variance • Organize results arithmetically • Total sum of squares in Y is SSTO = Σ(Yi - Ȳ)² • Partition this into two sources • Model (explained by regression) • Error (unexplained / residual)
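     For reference, the partition written out in the usual notation (with Ŷi = b0 + b1Xi the fitted values) is
     \[
     SSTO = \sum_i (Y_i - \bar{Y})^2, \quad
     SSR  = \sum_i (\hat{Y}_i - \bar{Y})^2, \quad
     SSE  = \sum_i (Y_i - \hat{Y}_i)^2, \quad
     SSTO = SSR + SSE
     \]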

  4. Total Sum of Squares • MST = SSTO/(n-1) is the usual estimate of the variance of Y if there are no explanatory variables • SAS uses the term Corrected Total for this source • The uncorrected total is ΣYi² • “Corrected” means that we subtract off the mean before squaring
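     The corrected and uncorrected totals are related by a standard identity, stated here for reference:
     \[
     SSTO = \sum_i (Y_i - \bar{Y})^2 = \sum_i Y_i^2 - n\bar{Y}^2
     \]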

  5. Model Sum of Squares • dfR = 1 (due to the addition of the slope) • MSR = SSR/dfR • KNNL uses the term Regression for what SAS calls Model • So SSR (KNNL) is the same as SS Model in SAS
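     A useful equivalent form of the model sum of squares in simple linear regression (standard algebra, since Ŷi - Ȳ = b1(Xi - X̄); it reappears when the F and t tests are compared below) is
     \[
     SSR = \sum_i (\hat{Y}_i - \bar{Y})^2 = b_1^2 \sum_i (X_i - \bar{X})^2
     \]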

  6. Error Sum of Squares • dfE = n-2 (we estimate both the slope and the intercept) • MSE = SSE/dfE • MSE is an estimate of the variance of Y taking into account (or conditioning on) the explanatory variable(s) • MSE = s²
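     The degrees of freedom break down the same way the sums of squares do:
     \[
     df_T = n-1 = \underbrace{1}_{df_R} + \underbrace{(n-2)}_{df_E}
     \]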

  7. ANOVA Table
     Source       df     SS      MS
     Regression   1      SSR     MSR = SSR/dfR
     Error        n-2    SSE     MSE = SSE/dfE
     ________________________________________
     Total        n-1    SSTO    SSTO/dfT

  8. Expected Mean Squares • MSR and MSE are random variables • When H0: β1 = 0 is true, E(MSR) = E(MSE)
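     The expected mean squares for simple linear regression (the standard results from KNNL Chapter 2) are
     \[
     E(MSE) = \sigma^2, \qquad E(MSR) = \sigma^2 + \beta_1^2 \sum_i (X_i - \bar{X})^2
     \]
     so the two expectations are equal exactly when β1 = 0, and E(MSR) exceeds E(MSE) otherwise.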

  9. F test • F* = MSR/MSE ~ F(dfR, dfE) = F(1, n-2) when H0 holds • See KNNL pgs 69-71 • When H0: β1 = 0 is false, MSR tends to be larger than MSE • We reject H0 when F* is large, i.e., if F* ≥ F(1-α; dfR, dfE) = F(0.95; 1, n-2) • In practice we use P-values

  10. F test • When H0: β1 = 0 is false, F* has a noncentral F distribution • This can be used to calculate power • Recall t* = b1/s(b1) tests H0: β1 = 0 • It can be shown that (t*)² = F* (pg 71) • The two approaches give the same P-value
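     A sketch of why (t*)² = F*, using the standard results s²(b1) = MSE / Σ(Xi - X̄)² and SSR = b1² Σ(Xi - X̄)²:
     \[
     (t^*)^2 = \frac{b_1^2}{s^2(b_1)}
             = \frac{b_1^2 \sum_i (X_i - \bar{X})^2}{MSE}
             = \frac{SSR}{MSE}
             = \frac{MSR}{MSE} = F^*
     \]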

  11. ANOVA Table
      Source   df     SS      MS     F          P
      Model    1      SSM     MSM    MSM/MSE    0.##
      Error    n-2    SSE     MSE
      ________________________________________
      Total    n-1    SSTO
      Note: “Model” is used here instead of “Regression”, which is more similar to SAS output.

  12. Examples • Tower of Pisa study (n = 13 cases)
       proc reg data=a1;
         model lean=year;
       run;
     • Toluca lot size study (n = 25 cases)
       proc reg data=toluca;
         model hours=lotsize;
       run;

  13. Pisa Output

  14. Pisa Output • Check of (t*)² = F*: (30.07)² ≈ 904.2, which matches the F statistic up to rounding error

  15. Toluca Output

  16. Toluca Output • Check of (t*)² = F*: (10.29)² ≈ 105.88, the F statistic

  17. General Linear Test • A different view of the same problem • We want to compare two models • Yi = β0 + β1Xi + ei (full model) • Yi = β0 + ei (reduced model) • Compare the two models using their error sums of squares; the better model will have a “smaller” mean square error

  18. General Linear Test • Let SSE(F) = SSE for the full model and SSE(R) = SSE for the reduced model • Compute the test statistic F* shown below and compare it with F(1-α; dfR-dfF, dfF), where dfR and dfF are the error degrees of freedom of the reduced and full models
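     The general linear test statistic, in its standard KNNL form, is
     \[
     F^* = \frac{\bigl(SSE(R) - SSE(F)\bigr)/(df_R - df_F)}{SSE(F)/df_F}
     \]
     Large values of F* favor the full model.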

  19. Simple Linear Regression • dfR = n-1, dfF = n-2, so dfR - dfF = 1 • For the reduced model Yi = β0 + ei the least squares fit is Ȳ, so SSE(R) = SSTO and SSE(F) = SSE • F* = (SSTO - SSE)/MSE = SSR/MSE • Same test as before • This approach is more general
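     As an aside (not part of the original example code), this single-degree-of-freedom comparison can also be requested explicitly in SAS; a sketch using PROC REG's TEST statement on the Toluca data:
       proc reg data=toluca;
         model hours=lotsize;
         test lotsize=0;  /* general linear test of the reduced model without lotsize */
       run;
     The F value this produces should match MSR/MSE from the ANOVA table.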

  20. Pearson Correlation • r is the usual correlation coefficient • It is a number between –1 and +1 and measures the strength of the linear relationship between two variables
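     For reference, the sample correlation coefficient is
     \[
     r = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}
              {\sqrt{\sum_i (X_i - \bar{X})^2 \, \sum_i (Y_i - \bar{Y})^2}}
     \]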

  21. Pearson Correlation • Notice that b1 = r (sY/sX), where sY and sX are the sample standard deviations, so b1 = 0 exactly when r = 0 • Hence the test of H0: β1 = 0 is similar to the test of H0: ρ = 0
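     One standard form of the test of H0: ρ = 0 (numerically identical to the t test for the slope) is
     \[
     t^* = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \quad \text{under } H_0
     \]
     A sketch of how r and this test could be obtained in SAS with PROC CORR, using the Pisa data set a1 (not part of the original slides):
       proc corr data=a1;
         var lean year;  /* prints r and the p-value for H0: rho = 0 */
       run;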

  22. R² and r² • Ratio of explained to total variation: R² = SSR/SSTO = 1 - SSE/SSTO

  23. R² and r² • We use R² when the number of explanatory variables is arbitrary (simple and multiple regression) • r² = R² only for simple regression • R² is often multiplied by 100 and thereby expressed as a percent

  24. R² and r² • R² always increases when additional explanatory variables are added to the model • Adjusted R² “penalizes” larger models • Adjusted R² does not necessarily get larger when variables are added
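     The usual adjustment, with p the number of regression parameters (and, with an intercept in the model, the quantity SAS reports as Adj R-Sq), is
     \[
     R_a^2 = 1 - \frac{SSE/(n-p)}{SSTO/(n-1)}
           = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO}
     \]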

  25. Pisa Output • R-Square 0.9880 (SAS) = SSM/SSTO = 15804/15997 = 0.9879 (the small difference from the printed value is due to rounding of the sums of squares)

  26. Toluca Output R-Square 0.8215 (SAS) = SSM/SSTO = 252378/307203 = 0.8215

  27. Background Reading • You may find Sections 2.10 and 2.11 interesting • 2.10 provides cautionary remarks • We will discuss these as they arise • 2.11 discusses the bivariate Normal distribution • Similarities and differences • Confidence interval for ρ • Program topic7.sas has the code to generate the ANOVA output • Read Chapter 3
