F73DA2 INTRODUCTORY DATA ANALYSIS: ANALYSIS OF VARIANCE

Presentation Transcript


  1. F73DA2 INTRODUCTORY DATA ANALYSIS: ANALYSIS OF VARIANCE

  2. regression: x is a quantitative explanatory variable

  3. type is a qualitative variable (a factor)

  4. Illustration
     Company 1: 36 28 32 43 30 21 33 37 26 34
     Company 2: 26 21 31 29 27 35 23 33
     Company 3: 39 28 45 37 21 49 34 38 44

  5. The explanatory variable is qualitative (i.e. categorical): a factor • Analysis of variance • Linear models for comparative experiments

  6. Using Factor Commands • The display is different if “type” is declared as a factor.
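
A minimal sketch of how the data might be entered and "type" declared as a factor. The names `company` and `type` follow the commands quoted later in the transcript; `type_numeric` and the exact construction are assumptions made for illustration.

```r
# Responses for the three companies, stacked into one vector
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,   # Company 1
             26, 21, 31, 29, 27, 35, 23, 33,           # Company 2
             39, 28, 45, 37, 21, 49, 34, 38, 44)       # Company 3

# Numeric labels 1, 2, 3 identifying the company of each observation
type_numeric <- c(rep(1, 10), rep(2, 8), rep(3, 9))

# Declaring the labels as a factor makes R treat them as categories
type <- factor(type_numeric)

summary(type_numeric)  # numeric summary (min, quartiles, mean, max)
summary(type)          # counts per level instead
levels(type)           # the category labels "1", "2", "3"
```

The two `summary()` calls illustrate the point of the slide: the same labels are displayed as numbers in one case and as category counts in the other.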

  7. We could check for significant differences between two companies using t tests. • t.test(company1,company2) • This calculates a 95% confidence interval for the difference between the two means

  8. The interval includes 0, so there is no significant difference between the two means
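
The comparison of Companies 1 and 2 can be sketched as follows. The vectors `company1` and `company2` are assumed to hold the data from the illustration, and `t.test()` is used with its defaults (a Welch test at the 95% level).

```r
company1 <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34)
company2 <- c(26, 21, 31, 29, 27, 35, 23, 33)

# Two-sample t test; by default R does not assume equal variances (Welch)
result <- t.test(company1, company2)

# 95% confidence interval for mean(company1) - mean(company2)
result$conf.int

# TRUE when the interval contains 0, i.e. no significant difference at 5%
contains_zero <- result$conf.int[1] < 0 && result$conf.int[2] > 0
contains_zero
```

Repeating this for every pair of companies means running several tests on the same data, which is one reason for preferring a single analysis of variance.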

  9. Instead use an analysis of variance technique

  10. Taking all the results together

  11. Taking all the results together, we calculate the total variation in the data: the sum of squares of (individual value − 32.59259), where 32.59259 is the overall mean of the 27 observations. This total is 1470.52

  12. We can also work out the sum of squares within each company, taking deviations from that company's own mean. These sum to 1114.431

  13. The total sum of squares must be made up of a contribution from variation WITHIN the companies and a contribution from variation BETWEEN the companies. • This means that the variation between the companies (total minus within) equals 356.0884
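
The three sums of squares can be reproduced with a short sketch. The variable names (`sst`, `ssres`, `ssb`) are chosen here for illustration and are not taken from the slides.

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

# Total variation: squared deviations from the overall mean
grand_mean <- mean(company)
sst <- sum((company - grand_mean)^2)

# Within-company variation: squared deviations from each company's own mean
# (ave() returns, for every observation, the mean of its own group)
ssres <- sum((company - ave(company, type))^2)

# Between-company variation is what is left over
ssb <- sst - ssres

round(c(SST = sst, SSRES = ssres, SSB = ssb), 2)
```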

  14. This can all be shown in an analysis of variance table which has the format:

  15. Using the R package, the command is similar to that for linear regression
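
The slide's own command is not preserved in the transcript. One plausible form, assuming the data are held in `company` and the factor `type`, is `anova()` applied to an `lm()` fit:

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

# Same model formula as a linear regression, but with a factor on the right
fit <- lm(company ~ type)

# Analysis of variance table: Df, Sum Sq, Mean Sq, F value, Pr(>F)
tab <- anova(fit)
tab
```

Because `type` is a factor, `lm()` fits a separate mean for each company rather than a straight line in `type`.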

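  16. Theory. Data: $y_{ij}$ is the $j$th observation using treatment $i$. Model: $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, where the errors $\varepsilon_{ij}$ are i.i.d. $N(0, \sigma^2)$

  17. The response variables $Y_{ij}$ are independent, with $Y_{ij} \sim N(\mu + \tau_i, \sigma^2)$. Constraint: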

  18. Derivation of least-squares estimators

  19. The fitted values are the treatment means
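
The derivation itself is not preserved in the transcript. A sketch of the usual argument, assuming the model $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ with $k$ treatments and $n_i$ observations on treatment $i$ (the symbols $k$, $n_i$ and $S$ are introduced here for illustration):

```latex
\begin{align*}
S(\mu, \tau_1, \dots, \tau_k)
  &= \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left( y_{ij} - \mu - \tau_i \right)^2 , \\
\frac{\partial S}{\partial \tau_i}
  &= -2 \sum_{j=1}^{n_i} \left( y_{ij} - \mu - \tau_i \right) = 0
  \quad\Longrightarrow\quad
  \hat{\mu} + \hat{\tau}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij} = \bar{y}_{i\cdot} .
\end{align*}
```

The fitted value for every observation on treatment $i$ is therefore $\bar{y}_{i\cdot}$, whichever constraint is used to separate $\hat{\mu}$ from $\hat{\tau}_i$.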

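  20. Partitioning the observed total variation: SST = SSB + SSRES, where SST is the total sum of squares, SSB the between-treatments sum of squares and SSRES the residual (within-treatments) sum of squares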
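
Written out in full, the partition is the standard identity below. The notation $\bar{y}_{i\cdot}$ for the mean of treatment $i$, $\bar{y}_{\cdot\cdot}$ for the overall mean and $n_i$ for the number of observations on treatment $i$ is assumed here, since the slide's own formula did not survive.

```latex
\begin{align*}
\underbrace{\sum_{i} \sum_{j} \left( y_{ij} - \bar{y}_{\cdot\cdot} \right)^2}_{SST}
 \;=\;
\underbrace{\sum_{i} n_i \left( \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot} \right)^2}_{SSB}
 \;+\;
\underbrace{\sum_{i} \sum_{j} \left( y_{ij} - \bar{y}_{i\cdot} \right)^2}_{SSRES}
\end{align*}
```

The cross-product term vanishes because $\sum_{j} (y_{ij} - \bar{y}_{i\cdot}) = 0$ within each treatment.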

  21. The following results hold

  22. Back to the example

  23. Fitted values: Company 1: 320/10 = 32; Company 2: 225/8 = 28.125; Company 3: 335/9 = 37.222. Residuals: Company 1: $\hat{\varepsilon}_{1j} = y_{1j} - 32$; Company 2: $\hat{\varepsilon}_{2j} = y_{2j} - 28.125$; Company 3: $\hat{\varepsilon}_{3j} = y_{3j} - 37.222$
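
As a check, the fitted values and residuals can be obtained in R. This is a sketch that assumes the model is fitted with `lm()` and `type` declared as a factor; the object names are illustrative.

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

fit <- lm(company ~ type)

# Treatment means, computed directly from the data
group_means <- tapply(company, type, mean)
group_means

# The fitted values take only three distinct values: the treatment means
unique(round(fitted(fit), 3))

# Residuals: response minus fitted value; they sum to zero within each company
res <- residuals(fit)
tapply(res, type, sum)
```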

  24. SST = $30152 - 880^2/27 = 1470.52$; SSB = $(320^2/10 + 225^2/8 + 335^2/9) - 880^2/27 = 356.09$; SSRES = $1470.52 - 356.09 = 1114.43$

  25. ANOVA table

      Source of variation   Degrees of freedom   Sum of squares   Mean squares   F
      Between treatments     2                    356.09          178.04         3.83
      Residual              24                   1114.43           46.43
      Total                 26                   1470.52

  26. Testing $H_0: \tau_i = 0$, $i = 1, 2, 3$ v $H_1$: not $H_0$ (i.e. $\tau_i \neq 0$ for at least one $i$). Under $H_0$, $F = 3.83$ on 2, 24 df. P-value = $P(F_{2,24} > 3.83) = 0.036$, so we can reject $H_0$ at levels of testing down to 3.6%.

  27. Conclusion: results differ among the three companies (P-value 3.6%)
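
The F statistic and P-value can be recomputed from the entries of the ANOVA table. In this sketch the variable names are illustrative; `pf()` is R's distribution function for the F distribution.

```r
# Sums of squares and degrees of freedom from the ANOVA table
ssb <- 356.09
ssres <- 1114.43
df_between <- 2
df_resid <- 24

# Mean squares and their ratio
msb <- ssb / df_between
msres <- ssres / df_resid
f_stat <- msb / msres

# Upper-tail probability P(F_{2,24} > f_stat)
p_value <- pf(f_stat, df_between, df_resid, lower.tail = FALSE)

round(c(F = f_stat, P = p_value), 3)
```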

  28. The fit of the model can be investigated by examining the residuals. The residual for response $y_{ij}$ is $\hat{\varepsilon}_{ij} = y_{ij} - \bar{y}_{i\cdot}$; this is just the difference between the response and its fitted value (the appropriate sample mean).

  29. Plotting the residuals in various ways may reveal:
      ● a pattern (e.g. lack of randomness, suggesting that an additional, uncontrolled factor is present)
      ● non-normality (a transformation may help)
      ● heteroscedasticity (the error variance differs among treatments; for example, it may increase with the treatment mean, in which case a transformation, perhaps log, may again be required)

  30. In this example, samples are small, but one might question the validity of the assumptions of normality (Company 2) and homoscedasticity (equality of variances, Company 2 v Companies 1/3).
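
One simple numerical companion to the residual plots is to compare the sample variances of the three companies. This is an illustrative check rather than a procedure taken from the slides, and with samples this small the variances are themselves imprecise.

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)
type <- factor(rep(1:3, times = c(10, 8, 9)))

# Sample variance of the responses within each company
group_vars <- tapply(company, type, var)
round(group_vars, 2)

# Ratio of the largest to the smallest variance as a rough indicator
var_ratio <- max(group_vars) / min(group_vars)
var_ratio
```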

  31. plot(residuals(lm(company~type))~ fitted.values(lm(company~type)),pch=8)

  32. plot(residuals(lm(company~type))~ fitted.values(lm(company~type)),pch=8) • abline(h=0,lty=2)

  33. It is also possible to compare with an analysis using “type” as a quantitative explanatory variable • type=c(rep(1,10),rep(2,8),rep(3,9)) • No “factor” command

  34. Note the low R². The fitted equation is company = 27.666 + 2.510 × type
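
The straight-line fit can be reproduced with the sketch below, in which `type` is deliberately left numeric. The object name `fit_numeric` is illustrative.

```r
company <- c(36, 28, 32, 43, 30, 21, 33, 37, 26, 34,
             26, 21, 31, 29, 27, 35, 23, 33,
             39, 28, 45, 37, 21, 49, 34, 38, 44)

# Numeric labels: without factor(), lm() treats type as a quantitative x
type <- c(rep(1, 10), rep(2, 8), rep(3, 9))

fit_numeric <- lm(company ~ type)

# Intercept and slope of the fitted line
coef(fit_numeric)

# Proportion of the total variation explained by the line
r_squared <- summary(fit_numeric)$r.squared
r_squared
```

This model forces the three company means onto a straight line in the arbitrary labels 1, 2, 3, which is rarely meaningful for a qualitative variable.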
