Analysis of Variance with two factors

slone

Presentation Transcript


  1. Analysis of Variance with two factors (Session 13)

  2. Learning Objectives
     At the end of this session, you will be able to
     • understand and interpret the components of a linear model with two categorical factors
     • fit a model involving two factors, interpret the output and present the results
     • understand the difference between raw means and adjusted means
     • appreciate that a residual analysis is the same with more complex models

  3. Using Paddy again!
     In the paddy example, there were two categorical factors, variety and village. Here we will look at a model including both factors and the corresponding output. We will also discuss assumptions associated with anova models with categorical factors and procedures to check these assumptions.

  4. A model using two factors
     The objective here is to compare paddy yields across the 3 varieties and also across the 4 villages. A linear model for this takes the form:
     $$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$
     Here $\beta_0$ represents a constant, and the $g_j$ ($j = 1, 2, 3$) represent the variety effect as before. We also have the term $v_i$ ($i = 1, 2, 3, 4$) to represent the village effect, and $\varepsilon_{ij}$ is the random error term.
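To make the two-factor model concrete, here is a minimal Python sketch of how one observation could be coded as a row of indicator (dummy) variables. It assumes the first level of each factor is the base level, with the levels listed in the order of the sample-size table in this session. The names `design_row` and `column_names` are illustrative only and do not come from any package.

```python
# Factor levels in the order they appear in the session's sample-size table.
VILLAGES = ["KESEN", "NANDA", "NIKO", "SABEY"]
VARIETIES = ["New", "Old", "Trad"]


def design_row(village, variety):
    """Return one design-matrix row for y = const + village effect + variety effect + error.

    Assumption: the first level of each factor is the base level
    (effect fixed at zero), so it gets no indicator column.
    """
    row = [1]  # column for the constant
    row += [1 if village == level else 0 for level in VILLAGES[1:]]
    row += [1 if variety == level else 0 for level in VARIETIES[1:]]
    return row


def column_names():
    """Labels for the columns produced by design_row."""
    return (["const"]
            + ["village=" + v for v in VILLAGES[1:]]
            + ["variety=" + g for g in VARIETIES[1:]])


print(column_names())
print(design_row("KESEN", "New"))   # both base levels: only the constant is switched on
print(design_row("SABEY", "Trad"))  # indicators for SABEY and Trad are switched on
```

Under this coding the model has 1 + 3 + 2 = 6 parameters: the constant, three village effects and two variety effects.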

  5. Anova results
     Above is a two-way anova, since there are two factors explaining the variability in paddy yields. Again, the Residual M.S. ($s^2$) = 0.3318 describes the variation not explained by village and variety.

  6. Sample sizes

     --------+-----------------------+-------
             |        Variety        |
     Village |   New    Old    Trad  | Total
     --------+-----------------------+-------
     KESEN   |     0      3      4   |     7
     NANDA   |     2      7      5   |    14
     NIKO    |     0      2      3   |     5
     SABEY   |     2      5      3   |    10
     --------+-----------------------+-------
     Total   |     4     17     15   |    36
     --------+-----------------------+-------

     The table above shows that the data are not balanced, so we need to worry about the order in which terms are fitted. How then should we interpret the sequential S.S. shown in the anova on slide 5?
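The balance check described above can be sketched in a few lines of Python. The cell counts are copied from the sample-size table. The helper names are illustrative, and "balanced" is taken here in the simple sense of equal counts in every village-by-variety cell.

```python
# Cell counts from the sample-size table (rows: villages, columns: varieties).
VARIETIES = ["New", "Old", "Trad"]
COUNTS = {
    "KESEN": [0, 3, 4],
    "NANDA": [2, 7, 5],
    "NIKO":  [0, 2, 3],
    "SABEY": [2, 5, 3],
}


def row_totals(counts):
    """Number of observations per village."""
    return {village: sum(cells) for village, cells in counts.items()}


def column_totals(counts):
    """Number of observations per variety, in the order of VARIETIES."""
    return [sum(column) for column in zip(*counts.values())]


def is_balanced(counts):
    """True only if every village-by-variety cell has the same count."""
    cells = [n for row in counts.values() for n in row]
    return len(set(cells)) == 1


def empty_cells(counts):
    """Village/variety combinations with no observations at all."""
    return [(village, VARIETIES[j])
            for village, row in counts.items()
            for j, n in enumerate(row) if n == 0]


print(row_totals(COUNTS))
print(column_totals(COUNTS))
print("balanced:", is_balanced(COUNTS))
print("empty cells:", empty_cells(COUNTS))
```

The empty cells are worth noting as well: the new variety was not grown at all in two of the four villages.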

  7. Anova with adjusted SS and MS
     How may the above results be interpreted? What are your conclusions?

  8. Model estimates
     What do these results tell us?

  9. Relating estimates to means
     This is similar to the case with one categorical factor – can make comparisons easily with the “base” level using model estimates. But when sample sizes are unequal across the two categorical factors, results should be reported in terms of adjusted means!
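One way to see why unequal sample sizes matter is to look at how much each village contributes to a variety's raw mean. The sketch below uses only the cell counts from the sample-size table in slide 6. The function name `village_weights` is illustrative. A raw variety mean weights the villages by these shares, whereas an adjusted mean gives each of the 4 villages the same weight of 1/4.

```python
from fractions import Fraction

# Cell counts from the sample-size table (rows: villages, columns: varieties).
VILLAGES = ["KESEN", "NANDA", "NIKO", "SABEY"]
VARIETIES = ["New", "Old", "Trad"]
COUNTS = {
    "KESEN": [0, 3, 4],
    "NANDA": [2, 7, 5],
    "NIKO":  [0, 2, 3],
    "SABEY": [2, 5, 3],
}


def village_weights(variety):
    """Share of a variety's observations that come from each village."""
    j = VARIETIES.index(variety)
    column = [COUNTS[village][j] for village in VILLAGES]
    total = sum(column)
    return [Fraction(n, total) for n in column]


for variety in VARIETIES:
    shares = ", ".join(f"{v}: {w}" for v, w in zip(VILLAGES, village_weights(variety)))
    print(f"{variety:5s} {shares}")
```

Because the shares differ from variety to variety, differences between raw variety means are partly village differences, which is the reason for reporting adjusted means.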

  10. Raw means and adjusted means
      Model-based summaries (adjusted means):

  11. Computing adjusted means
      The model equation
      $$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$
      can be used to find the variety adjusted means, e.g. the adjusted mean for the traditional variety is:
      $$5.284 + 0.25\,[0 + 0.718 - 0.179 + 0.633] - 2.614 = 2.963$$
      Thus the variety adjusted mean is an average over the 4 villages.
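The arithmetic above can be reproduced with a short Python sketch. The numbers are those quoted in the slide: a constant of 5.284, four village effects (the base village has effect 0) and an effect of -2.614 for the traditional variety. The function name `adjusted_mean` is illustrative.

```python
# Estimates quoted in the slide.
CONSTANT = 5.284
VILLAGE_EFFECTS = [0.0, 0.718, -0.179, 0.633]  # base village has effect 0
TRAD_EFFECT = -2.614                            # traditional variety relative to the base variety


def adjusted_mean(constant, village_effects, variety_effect):
    """Variety adjusted mean: each village effect gets the same weight, 1 / (number of villages)."""
    average_village_effect = sum(village_effects) / len(village_effects)
    return constant + average_village_effect + variety_effect


trad = adjusted_mean(CONSTANT, VILLAGE_EFFECTS, TRAD_EFFECT)
print(round(trad, 3))  # 2.963

# The base variety has effect 0, so its adjusted mean is the constant
# plus the average village effect.
base = adjusted_mean(CONSTANT, VILLAGE_EFFECTS, 0.0)
print(round(base, 3))  # 5.577
```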

  12. Checking model assumptions
      The anova model with two categorical factors is:
      $$y_{ij} = \beta_0 + v_i + g_j + \varepsilon_{ij}$$
      Model assumptions are associated with the $\varepsilon_{ij}$. These are checked in exactly the same way as before. A residual analysis is done, looking at plots of residuals in various ways. We give below a residual analysis for the model fitted above.

  13. Histogram to check normality
      Histogram of standardised residuals after fitting a model of yield on village and variety.

  14. A normal probability plot…
      Another check on the normality assumption. Do you think the points follow a straight line?
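For readers who want to see what goes into a normal probability plot, here is a minimal Python sketch of the points it displays: each sorted residual is paired with the standard normal quantile expected for its rank. The plotting positions (i - 0.5)/n are one common convention, and statistical packages differ on this choice. The function name and the example residuals are hypothetical; they are not the paddy residuals.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its expected standard normal quantile.

    Assumption: plotting positions (i - 0.5) / n, one of several conventions in use.
    """
    n = len(residuals)
    ordered = sorted(residuals)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


# Hypothetical standardised residuals, for illustration only.
example = [-1.8, -0.9, -0.4, -0.1, 0.2, 0.5, 1.1, 1.6]
for quantile, residual in normal_plot_points(example):
    print(f"{quantile:7.3f} {residual:7.3f}")
```

If the normality assumption is reasonable, plotting the second column against the first should give points lying close to a straight line.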

  15. Std. residuals versus fitted values
      Checking the assumption of variance homogeneity, and identification of outliers. What can you say here about the variance homogeneity assumption?

  16. Finally… know your software
      Different software packages impose different constraints on model parameters, so you need to be aware of which constraint is in use. For example, Stata and Genstat set the first level of the factor to zero. SPSS and SAS set the last level to zero. Minitab imposes a constraint that sets the sum of the parameter estimates to zero! Check also whether the software produces sequential or adjusted or some other form of sums of squares. The correct interpretation of anova results depends on this.
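The effect of these constraints can be illustrated with a small Python sketch. It takes the constant (5.284) and the four village effects quoted in slide 11, which use a first-level-zero constraint, and re-expresses them under a last-level-zero and a sum-to-zero constraint. Only the village factor is reparameterised here, and the function names are illustrative. The individual estimates change, but the fitted values and the differences between villages do not.

```python
# First-level-zero estimates quoted in slide 11 (village factor only).
CONSTANT = 5.284
VILLAGE_EFFECTS = [0.0, 0.718, -0.179, 0.633]


def reparameterise(constant, effects, shift):
    """Move `shift` from every effect into the constant; fitted values are unchanged."""
    return constant + shift, [effect - shift for effect in effects]


def fitted(constant, effects):
    """Fitted value for each village (at the base level of the other factor)."""
    return [constant + effect for effect in effects]


first_zero = (CONSTANT, VILLAGE_EFFECTS)
last_zero = reparameterise(CONSTANT, VILLAGE_EFFECTS, VILLAGE_EFFECTS[-1])
sum_zero = reparameterise(CONSTANT, VILLAGE_EFFECTS,
                          sum(VILLAGE_EFFECTS) / len(VILLAGE_EFFECTS))

for label, (constant, effects) in [("first level zero", first_zero),
                                   ("last level zero", last_zero),
                                   ("sum to zero", sum_zero)]:
    print(f"{label:17s} constant={constant:.3f} "
          f"effects={[round(e, 3) for e in effects]} "
          f"fitted={[round(f, 3) for f in fitted(constant, effects)]}")
```

This is why an estimate should always be read as a comparison defined by the package's constraint, not as a stand-alone quantity.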

  17. Practical work follows to ensure learning objectives are achieved…
