BIOL 582 Lecture Set 8 Two-factor Models
Disclaimer
• We learned that ANOVA can be thought of as the comparison of errors between two models (one of which is the “Null” model)
• This is a useful way of thinking about ANOVA
• In fact, from this point on, this course departs sharply from how different ANOVA models are typically presented
• Most courses/texts use ever-changing definitions of how to calculate sums of squares
• Here is a link showing how one person (like many others) approaches multi-factor ANOVA: link
• We will not worry about formulas; rather, we will worry about defining models and “sub-models”, and calculating sums of squares from comparisons of SSE
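Two-factor Model Set-up
• Biological research is often concerned with multiple factors that might explain the variation in a response variable
• Consider the pupfish data. There are two factors: sex and population
• A two-factor ANOVA allows us to compare the relative strengths of the factors in explaining response variation
• As we will see, these factors are additive: they are decomposed parts of a larger factor
• First, consider this linear equation: $y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$, where $i$ indexes the levels of factor A (e.g., sex), $j$ indexes the levels of factor B (e.g., population), and $k$ indexes the observations
• This equation has the model $\hat{y}_{ijk} = \hat{\mu} + \hat{\alpha}_i + \hat{\beta}_j$
• The model produces the error $\hat{\varepsilon}_{ijk} = y_{ijk} - \hat{y}_{ijk}$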
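Here is a minimal R sketch of the equation, model, and error for a two-factor model. It rests on two assumptions. First, the data are simulated and hypothetical: the objects `sex`, `pop`, and `y`, the cell sizes, and the effect sizes are illustrative and are not the pupfish data. Second, `lm()` estimates the parameters by least squares.

```r
# Hypothetical simulated stand-in for the pupfish data (NOT the real data):
# factor A = sex (F/M), factor B = population (1/2), unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

# Linear equation: y = mu + alpha_i + beta_j + error
fit.AB <- lm(y ~ sex + pop)

y.hat <- fitted(fit.AB)   # the model: predictions from the estimated parameters
e <- y - y.hat            # the error: what the model leaves unexplained
SSE.AB <- sum(e^2)        # sum of squared errors of the two-factor model

coef(fit.AB)              # intercept, one sex coefficient, one population coefficient
SSE.AB
```

Calling `deviance()` on an `lm` fit returns this same residual sum of squares.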
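Two-factor Model Set-up
• All possible “sub-models” (reduced models) of the full model are easy to list. In order of decreasing complexity:
  $y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$ (both factors; the full model)
  $y_{ijk} = \mu + \alpha_i + \varepsilon_{ijk}$ (factor A only)
  $y_{ijk} = \mu + \beta_j + \varepsilon_{ijk}$ (factor B only)
  $y_{ijk} = \mu + \varepsilon_{ijk}$ (intercept only; the null model)
• Assume that the SSE can be obtained easily for every model (from the residuals of predictions made with the estimated model parameters). The four models therefore give four SSE values:
  $SSE_{AB}$ from the model containing both factors
  $SSE_A$ from the model containing factor A only
  $SSE_B$ from the model containing factor B only
  $SSE_0$ from the model containing only the intercept
• All models contain an intercept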
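The sketch below fits the four models and collects their SSE values in R. The data are hypothetical and simulated (not the pupfish data), and the object names are illustrative only.

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

# The full model and its three reduced models (all include an intercept)
fits <- list(
  AB   = lm(y ~ sex + pop),  # both factors
  A    = lm(y ~ sex),        # factor A (sex) only
  B    = lm(y ~ pop),        # factor B (population) only
  null = lm(y ~ 1)           # intercept only
)

# deviance() of an lm fit is its SSE (sum of squared residuals)
SSE <- sapply(fits, deviance)
SSE
```

Each reduced model is nested in the full model, and adding terms can never increase the SSE. The full model therefore has the smallest SSE and the intercept-only model the largest.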
Two-factor Model Set-up
• This is where understanding linear models makes this a whole lot easier! Let’s concern ourselves with factor A first. What is the sum of squares between levels of factor A (with respect to factor B)?
• There are two ways to do this!
• Type I SS (Sequential)
• Consider the null hypothesis $H_0: SS_A = 0$*, which means factor A has no effect
• Then $SSE_0 - SSE_A = 0$ (the SSE of the intercept-only model minus the SSE of the factor-A-only model) is another way to say the same thing: if factor A is meaningless, including it gives no improvement over the null model. Thus, $SS_A = SSE_0 - SSE_A$ is a measure of model improvement due to factor A.
• Likewise, $SS_B = SSE_A - SSE_{AB}$ (the improvement from adding factor B to the factor-A-only model) is a measure of improvement due to factor B
• * Strictly, it is more appropriate to state the null hypothesis as the effect, α, being equal to 0, because α is a parameter with some real value that is estimated from the observed data. $SS_A$ is not a parameter (although it contributes to a population parameter, σ², the variance among population means).
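Type I SS can be computed as differences in SSE between nested models and then checked against `anova()`, which reports sequential SS in the order the terms appear in the formula. This is a minimal sketch. The data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.0  <- lm(y ~ 1)          # null model
fit.A  <- lm(y ~ sex)        # factor A only
fit.AB <- lm(y ~ sex + pop)  # both factors

# Type I (sequential) SS as improvements in SSE
SS.A <- deviance(fit.0) - deviance(fit.A)    # adding A to the null model
SS.B <- deviance(fit.A) - deviance(fit.AB)   # adding B after A

tab <- anova(fit.AB)                         # sequential SS in formula order
c(SS.A = SS.A, SS.B = SS.B)
tab[c("sex", "pop"), "Sum Sq"]
```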
Two-factor Model Set-up
• Type III SS (Weighted)
• Consider the null hypothesis $H_0: SS_A = 0$, which means factor A has no effect
• Then $SSE_B - SSE_{AB} = 0$ (the SSE of the factor-B-only model minus the SSE of the full model) is another way to say the same thing: if factor A is meaningless, excluding it does not change the model error. Thus, $SS_A = SSE_B - SSE_{AB}$ measures the detriment to the model from excluding factor A, and this detriment is therefore the effect of factor A.
• Likewise, $SS_B = SSE_A - SSE_{AB}$ measures the effect of factor B
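Type III SS can be computed by dropping each factor from the full model, and `drop1()` reports exactly these single-term deletions. This is a minimal sketch. The data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.A  <- lm(y ~ sex)        # full model without B
fit.B  <- lm(y ~ pop)        # full model without A
fit.AB <- lm(y ~ sex + pop)  # full model

# Type III SS: error added by excluding each factor from the full model
SS.A <- deviance(fit.B) - deviance(fit.AB)
SS.B <- deviance(fit.A) - deviance(fit.AB)

d <- drop1(fit.AB, test = "F")   # single term deletions
c(SS.A = SS.A, SS.B = SS.B)
d[c("sex", "pop"), "Sum of Sq"]
```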
Two-factor Model Set-up
• What, no type II?
• There are actually about six types of SS. Some (IV-VI) concern missing data. Type II will be explained later: its distinction from types I and III only becomes apparent once factor interactions are included.
• Types IV-VI will not be discussed (they are beyond the scope of this class)
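Two-factor Model Hypotheses
• Model: $H_0: \sigma^2_M = 0$; $H_1: \sigma^2_M > 0$
• Factor A: $H_0: \sigma^2_A = 0$; $H_1: \sigma^2_A > 0$
• Factor B: $H_0: \sigma^2_B = 0$; $H_1: \sigma^2_B > 0$
• M means “model”: it refers to all effects of the model
• In general, it is better to state the null hypothesis in terms of variance, because an effect might contain several parameters. In addition, the alternative hypothesis stated with variance is always one-tailed, because a variance cannot be negative.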
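The hypothesis for M (all effects together) is tested by comparing the full model with the intercept-only model. In this minimal sketch, the F ratio is built from the two SSE values and checked against the overall F statistic that `summary()` reports for an `lm` fit. The upper-tail P-value reflects the one-tailed alternative. The data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.0  <- lm(y ~ 1)
fit.AB <- lm(y ~ sex + pop)

k.M  <- length(coef(fit.AB)) - 1                 # parameters for all effects in M
df.E <- df.residual(fit.AB)                      # n - k.M - 1
SS.M <- deviance(fit.0) - deviance(fit.AB)       # improvement over the null model
F.M  <- (SS.M / k.M) / (deviance(fit.AB) / df.E)
P.M  <- pf(F.M, k.M, df.E, lower.tail = FALSE)   # one-tailed: only large F is evidence

c(F = F.M, P = P.M)
summary(fit.AB)$fstatistic                       # value, numdf, dendf
```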
Two-factor Model Uses and Assumptions
• A two-factor ANOVA without interactions has little practical use beyond introducing how ANOVA works with multiple factors. The next step (next lecture) will be to understand factor interactions; many research designs use multiple factors with interactions.
• Assumptions include
  • Normally distributed residuals (not data)
  • Homoscedasticity (equal residual variance among groups)
  • Independent observations (i.e., a sample does not contain multiple measurements of the same subjects, and different samples or treatments do not contain the same subjects)
• These are the assumptions of linear models!
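One way to check the first two assumptions is sketched below. It applies the Shapiro-Wilk test to the residuals of the fitted model rather than to the response itself. Bartlett's test then compares residual variances among the four sex-by-population groups. Both functions are in R's stats package. These particular tests are an illustrative choice, not necessarily the checks used in this course, and the data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.AB <- lm(y ~ sex + pop)
res <- residuals(fit.AB)                          # assumptions concern residuals, not y

sw <- shapiro.test(res)                           # H0: residuals are normally distributed
bt <- bartlett.test(res, interaction(sex, pop))   # H0: equal residual variances among groups
sw
bt
# A visual check of normality: qqnorm(res); qqline(res)
```

Independence of observations cannot be tested this way. It depends on how the study was designed and sampled.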
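Two-factor Model Evaluation
• Summary of ANOVA for two factors (excluding interactions of factors)
• Type I (Sequential)
  Source | df | SS | MS | F
  A | $k_A$ | $SSE_0 - SSE_A$ | $SS_A / k_A$ | $MS_A / MS_E$
  B | $k_B$ | $SSE_A - SSE_{AB}$ | $SS_B / k_B$ | $MS_B / MS_E$
  Error | $n - k_A - k_B - 1$ | $SSE_{AB}$ | $MS_E = SSE_{AB} / (n - k_A - k_B - 1)$
  Total | $n - 1$ | $SSE_0$
• Type III (Weighted)
  Source | df | SS | MS | F
  A | $k_A$ | $SSE_B - SSE_{AB}$ | $SS_A / k_A$ | $MS_A / MS_E$
  B | $k_B$ | $SSE_A - SSE_{AB}$ | $SS_B / k_B$ | $MS_B / MS_E$
  Error | $n - k_A - k_B - 1$ | $SSE_{AB}$ | $MS_E = SSE_{AB} / (n - k_A - k_B - 1)$
  Total | $n - 1$ | $SSE_0$
• $SSE_{AB}$, $SSE_A$, $SSE_B$, and $SSE_0$ are the SSE of the models with both factors, factor A only, factor B only, and the intercept only
• The df, MS, and F values are needed only to determine P-values from the F distribution
• k is the number of parameters (coefficients) needed for the effect (for a factor, its number of levels minus 1); n is the total number of observations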
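Both summary tables can be assembled by hand from the four SSE values. This sketch does that and checks the result against `anova()` (Type I) and `drop1()` (Type III). The helper `ss.table()` is a hypothetical name introduced here for illustration, and the data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.0  <- lm(y ~ 1)
fit.A  <- lm(y ~ sex)
fit.B  <- lm(y ~ pop)
fit.AB <- lm(y ~ sex + pop)

n    <- length(y)
k    <- c(A = nlevels(sex) - 1, B = nlevels(pop) - 1)  # parameters needed per effect
df.E <- n - sum(k) - 1                                 # error df
MS.E <- deviance(fit.AB) / df.E                        # error mean square

# Hypothetical helper: fills in df, MS, F, and P for a vector of effect SS
ss.table <- function(SS) {
  MS <- SS / k
  F.val <- MS / MS.E
  data.frame(df = k, SS = SS, MS = MS, F = F.val,
             P = pf(F.val, k, df.E, lower.tail = FALSE),
             row.names = names(k))
}

typeI <- ss.table(c(A = deviance(fit.0) - deviance(fit.A),
                    B = deviance(fit.A) - deviance(fit.AB)))
typeIII <- ss.table(c(A = deviance(fit.B) - deviance(fit.AB),
                      B = deviance(fit.A) - deviance(fit.AB)))
typeI
typeIII
```

The Type I SS and $SSE_{AB}$ add up to $SSE_0$. The Type III SS generally do not add up this way when the design is unbalanced.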
Two-factor Model Evaluation
• Example from pupfish-parasite data in R (ignore AIC values for now)
> lm.sex.pop<-lm(log.grubs~SEX+POPULATION)
> anova(lm.sex.pop) # Type I SS
Analysis of Variance Table

Response: log.grubs
            Df  Sum Sq Mean Sq F value   Pr(>F)
SEX          1  15.554 15.5543  9.4775 0.002685 **
POPULATION   1   1.176  1.1762  0.7167 0.399264
Residuals  100 164.119  1.6412
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> drop1(lm.sex.pop,test="F") # Type III SS
Single term deletions

Model:
log.grubs ~ SEX + POPULATION
           Df Sum of Sq    RSS    AIC F value   Pr(>F)
<none>                  164.12 53.984
SEX         1   16.6425 180.76 61.932 10.1405 0.001934 **
POPULATION  1    1.1762 165.29 52.719  0.7167 0.399264
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
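In the transcript above, POPULATION has the same SS (1.1762) under both types, but SEX does not (15.554 vs. 16.6425). In the drop1() table, the SEX SS is the RSS of the model without SEX minus the full-model RSS: 180.76 − 164.12 ≈ 16.64. POPULATION is entered last, so its sequential SS is already adjusted for SEX. SEX is entered first and is adjusted for nothing. When the two factors are not orthogonal (e.g., with unequal sample sizes across sex-by-population combinations), these adjustments differ. The sketch below uses hypothetical simulated data (not the pupfish data) to show that reversing the formula order makes the Type I SS for sex match its drop1() SS.

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

fit.sp <- lm(y ~ sex + pop)   # sex entered first
fit.ps <- lm(y ~ pop + sex)   # sex entered last

ss.first <- anova(fit.sp)["sex", "Sum Sq"]                 # sex adjusted for nothing
ss.last  <- anova(fit.ps)["sex", "Sum Sq"]                 # sex adjusted for pop
ss.drop1 <- drop1(fit.sp, test = "F")["sex", "Sum of Sq"]  # sex dropped from full model
c(first = ss.first, last = ss.last, drop1 = ss.drop1)
```

Both fits have the same residual SSE. Only the sequential split of the explained SS changes with term order.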
Multiple Comparisons
• For multiple comparisons, it does not matter whether one uses type I or type III sums of squares, because the SSE of the full model (called RSS in R) is the same either way
• Multiple comparison tests such as Tukey’s HSD use the SSE of the full model to calculate standard errors
• However, multiple comparisons with a two-factor model are generally unenlightening
• Example from pupfish-parasite data in R
> pop<-factor(POPULATION)
> sex<-factor(SEX)
> aov.two.factor<-aov(log.grubs~sex+pop)
> aov.two.factor
Call:
   aov(formula = log.grubs ~ sex + pop)

Terms:
                      sex       pop Residuals
Sum of Squares   15.55428   1.17617 164.11883
Deg. of Freedom         1         1       100

Residual standard error: 1.281089
Estimated effects may be unbalanced
> TukeyHSD(aov.two.factor)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = log.grubs ~ sex + pop)

$sex
         diff       lwr      upr     p adj
M-F 0.7938817 0.2822641 1.305499 0.002685

$pop
         diff        lwr       upr     p adj
2-1 0.2101225 -0.2919093 0.7121544 0.4083022
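The claim that Tukey’s HSD uses the full-model SSE can be checked directly. In this minimal sketch, the half-width of the Tukey interval for sex is recomputed from the full-model mean squared error, MSE = SSE / (residual df), as q(0.95; 2 means, residual df) × sqrt(MSE/2 × (1/n_F + 1/n_M)) with `qtukey()`. That value is then compared with the interval `TukeyHSD()` reports. The data are hypothetical and simulated (not the pupfish data).

```r
# Hypothetical simulated data (NOT the pupfish data); unbalanced, n = 103
set.seed(582)
sex <- factor(rep(c("F", "M", "F", "M"), times = c(30, 20, 15, 38)))
pop <- factor(rep(c("1", "1", "2", "2"), times = c(30, 20, 15, 38)))
y <- 1 + 0.8 * (sex == "M") + 0.2 * (pop == "2") + rnorm(length(sex), sd = 1.3)

aov.AB <- aov(y ~ sex + pop)
hsd <- TukeyHSD(aov.AB)                           # default conf.level = 0.95

MSE   <- deviance(aov.AB) / df.residual(aov.AB)   # full-model SSE / residual df
n.sex <- as.vector(table(sex))                    # sample size per sex
half.width <- qtukey(0.95, nmeans = 2, df = df.residual(aov.AB)) *
  sqrt(MSE / 2 * sum(1 / n.sex))

hsd$sex
(hsd$sex[, "upr"] - hsd$sex[, "lwr"]) / 2         # half-width reported by TukeyHSD
half.width                                        # half-width rebuilt from the full-model SSE
```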