ANOVA: Analysis of Variance Xuhua Xia xxia@uottawa.ca http://dambe.bio.uottawa.ca
Review of t-test
• Parametric
  • Paired-sample t-test: t.test(x1, x2, paired=TRUE)
  • Unpaired two-sample t-test
    • assuming equal variances: t.test(x1, x2, var.equal=TRUE)
    • when the two variances are not equal (Welch's test, the default in R): t.test(x1, x2). Always run a nonparametric test as well and use the result of the more sensitive test.
  • Consequence of violating the assumption
• Nonparametric
  • Mann-Whitney-Wilcoxon test (ensure that x is a factor): wilcox.test(y~x, data=myDat, paired=TRUE), or paired=FALSE for independent samples
  • Alternative: rank the variables and perform a regular t-test
• Test of equality of variances: var.test(x1, x2), or by hand: p <- 2*pf(Varsmall/Varlarge, DFsmall, DFlarge)
• Equivalent methods in EXCEL
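The calls reviewed above can be tried together in one short R session. This is a minimal sketch on simulated data: the values of x1 and x2, the group labels g1 and g2, and the seed are assumptions made for illustration only. It also checks that the hand-computed F-test p-value agrees with var.test.

```r
set.seed(1)                                # hypothetical data, illustration only
x1 <- rnorm(20, mean = 10, sd = 2)
x2 <- rnorm(20, mean = 12, sd = 5)

# F test of equal variances: built-in and by hand
vt <- var.test(x1, x2)
v  <- c(var(x1), var(x2))
df <- c(length(x1), length(x2)) - 1
s  <- which.min(v)                         # index of the smaller variance
l  <- which.max(v)                         # index of the larger variance
p_manual <- 2 * pf(v[s] / v[l], df[s], df[l])

# Welch (unequal variances) and pooled (equal variances) t-tests
welch  <- t.test(x1, x2)
pooled <- t.test(x1, x2, var.equal = TRUE)

# Nonparametric test on long-format data; the grouping variable is a factor
myDat <- data.frame(y = c(x1, x2),
                    x = factor(rep(c("g1", "g2"), each = 20)))
mww <- wilcox.test(y ~ x, data = myDat)

print(c(var_test = unname(vt$p.value), manual = p_manual))
print(c(welch = welch$p.value, pooled = pooled$p.value,
        wilcoxon = mww$p.value))
```

The equal sample sizes are deliberate: with very unequal degrees of freedom and a variance ratio close to 1, twice the lower tail can exceed 1, so the hand formula is best read as a shortcut for clearly unequal variances.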
Review of Standard Error (SE)
• SE is the standard deviation of sample means
• Illustration:
  • Generate 50 random variables (50 columns of data, with 200 values each) in EXCEL (normal distribution, mean = 10, Std = 5)
  • Compute the mean, variance and SE for each of the 50 columns: SE1 = sqrt(var1/200), SE2 = sqrt(var2/200), …
  • Compute the standard deviation of the 50 means. This is expected to be about the same as SE1, SE2, …
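The EXCEL illustration can also be sketched in R. The seed and the variable names below are arbitrary choices, not part of the original exercise.

```r
set.seed(42)
n_rows <- 200                              # values per column
n_cols <- 50                               # number of columns
dat <- matrix(rnorm(n_rows * n_cols, mean = 10, sd = 5), nrow = n_rows)

col_means <- colMeans(dat)
col_vars  <- apply(dat, 2, var)
col_se    <- sqrt(col_vars / n_rows)       # SE1, SE2, ..., one per column

sd_of_means <- sd(col_means)               # SD computed from the 50 means
theoretical <- 5 / sqrt(n_rows)            # sigma / sqrt(n)

print(round(c(mean_SE = mean(col_se),
              sd_of_means = sd_of_means,
              theoretical = theoretical), 4))
```

The three printed numbers should be close to one another, though not identical, because each is an estimate from a finite sample.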
Ronald A. Fisher (1890-1962)
• Head of the Statistics Division at the Rothamsted Experimental Station in Hertfordshire.
• One of the three founders of theoretical population genetics.
• Developer of statistical methods, especially the likelihood methods.
• Published The Genetical Theory of Natural Selection in 1930, in which he proposed the fundamental theorem of natural selection.
• ANOVA was mainly developed by Ronald A. Fisher. The F statistic was named after him.
“To call in a statistician after the experiment is done may be no more than asking him to perform a postmortem examination; he may be able to say what the experiment died of.”
One-way ANOVA
Model: $x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ vs. $x_{ij} = \mu + \varepsilon_{ij}$
Is the effect $\alpha_i$ zero?
This is the same model as for the t-test, except that the subscript $i$ takes the values 1 and 2 in the t-test, but 1, 2, ..., n in one-way ANOVA.
ANOVA Rationale
• The essence of ANOVA is to partition the total variation into its components.
• Suppose we have three groups (e.g., a control plus two treatments), each with N1 = N2 = N3 = 200 test animals. Given the null hypothesis that the three groups do not differ from each other, i.e., they all represent random samples from the same underlying population, we can estimate the population variance in three ways:
  • From all 600 animals: Var = Total SS/DF
  • From the individual groups: SS1/DF1, SS2/DF2, SS3/DF3, pooled as
    VarwithinGroup = (SS1+SS2+SS3)/(DF1+DF2+DF3)
  • From the three group means M1, M2, M3 and the grand mean M:
    SE = sqrt{[(M1-M)^2 + (M2-M)^2 + (M3-M)^2]/2}
    VarbetweenGroup = SE^2*200 = [N1*(M1-M)^2 + N2*(M2-M)^2 + N3*(M3-M)^2]/2
• Under the null hypothesis, VarwithinGroup and VarbetweenGroup estimate the same population variance, so their ratio is expected to be close to 1. ANOVA is therefore an F-test of the two variances: F = VarbetweenGroup/VarwithinGroup.
• In ANOVA terminology, VarwithinGroup is MSError and VarbetweenGroup is MSModel.
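The three variance estimates can be computed directly. This is a sketch on simulated data in which the null hypothesis holds by construction. The group labels, the population mean of 10, the SD of 5 and the seed are assumptions made for illustration.

```r
set.seed(7)
n <- 200                                   # animals per group
g <- factor(rep(c("Control", "Treat1", "Treat2"), each = n))
x <- rnorm(3 * n, mean = 10, sd = 5)       # all groups from one population

# (1) From all 600 animals: Total SS / DF
var_total <- var(x)

# (2) Pooled within-group variance: (SS1+SS2+SS3)/(DF1+DF2+DF3)
ss_within  <- sum(tapply(x, g, function(v) sum((v - mean(v))^2)))
var_within <- ss_within / (3 * (n - 1))

# (3) From the group means: n * squared SE of the means
m <- tapply(x, g, mean)
var_between <- n * sum((m - mean(x))^2) / (3 - 1)

F_manual <- var_between / var_within       # MSModel / MSError
F_aov    <- summary(aov(x ~ g))[[1]][["F value"]][1]

print(round(c(total = var_total, within = var_within,
              between = var_between), 2))
print(c(F_manual = F_manual, F_aov = F_aov))
```

All three estimates target the population variance of 25. The between-group estimate rests on only 2 degrees of freedom, so it varies far more from run to run than the other two.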
One-way experimental design
               Low-fat food   Medium-fat food   High-fat food
Weight gain         0                4                 8
                    2                6                10
Numerical Illustration of One-Way ANOVA Assignment: Repeat the ANOVA computation by first replacing 10 in the High-fat food group by two values 9 and 20. Submit this slide with all updated values. Name: ID:
ANOVA Table
Dependent variable: Weight Gain
Source   DF   SS     MS     F      p
Model    2    64.0   32.0   16.0   0.0251
Error    3     6.0    2.0
Total    5    70.0
The null hypothesis H0: μ1 = μ2 = μ3 is rejected. The three kinds of food differ significantly in their effect on weight gain of rabbits. In particular, Medium-fat and High-fat foods are significantly better than Low-fat food. However, Medium-fat and High-fat foods do not differ in their effect on rabbit weight gain.
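The entries of the ANOVA table can be reproduced step by step in R. The sketch reads the weight-gain data as Low-fat = 0, 2; Medium-fat = 4, 6; High-fat = 8, 10. The variable names are illustrative.

```r
gain <- c(0, 2, 4, 6, 8, 10)
food <- factor(rep(c("Low", "Medium", "High"), each = 2),
               levels = c("Low", "Medium", "High"))

grand <- mean(gain)                        # grand mean
means <- tapply(gain, food, mean)          # group means
n_per <- tapply(gain, food, length)        # group sizes

ss_model <- sum(n_per * (means - grand)^2) # between-group SS
ss_error <- sum((gain - ave(gain, food))^2)  # within-group SS
ss_total <- sum((gain - grand)^2)

df_model <- nlevels(food) - 1
df_error <- length(gain) - nlevels(food)

ms_model <- ss_model / df_model
ms_error <- ss_error / df_error
F_stat   <- ms_model / ms_error
p_value  <- pf(F_stat, df_model, df_error, lower.tail = FALSE)

print(c(SS_model = ss_model, SS_error = ss_error, SS_total = ss_total))
print(c(F = F_stat, p = round(p_value, 4)))

# The same table from aov()
print(summary(aov(gain ~ food)))
```

For the assignment, only the gain and food vectors need to change. The group sizes are computed from the data, so an unbalanced design is handled as well.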
ANOVA and t-test
• Parametric:
  • aov(DV~IV1+IV2+…)
  • aov(DV~IV1+IV2+IV1:IV2) or aov(DV~IV1*IV2)
• Contrast ANOVA and t-test by using
  • Mercury2Gr_A.txt and Mercury2Gr_B.txt (same data in two different formats, one for t.test and one for aov)
  • DarwinPlantBreeding_A.txt and DarwinPlantBreeding_B.txt (ensure that the variable Species is a factor)
• Nonparametric:
  • One-way ANOVA: kruskal.test(DV~IV)
  • Randomized block design: friedman.test(y~A|B), where A is the treatment and B is the block
• Others:
  • summary(fit)
  • print(model.tables(fit,"means"),digits=3)
  • boxplot(DV~IV)
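With two groups, one-way ANOVA and the equal-variance t-test give the same answer, with F = t^2. The data files named above are not reproduced here, so this sketch uses simulated values. The vectors a and b, the group labels and the seed are assumptions for illustration.

```r
set.seed(3)
a <- rnorm(12, mean = 50, sd = 8)          # hypothetical group A
b <- rnorm(12, mean = 58, sd = 8)          # hypothetical group B

# "Wide" format: one vector per group, as used by t.test
tt <- t.test(a, b, var.equal = TRUE)

# "Long" format: one response column and one factor, as used by aov
long <- data.frame(DV = c(a, b),
                   IV = factor(rep(c("A", "B"), each = 12)))
fit   <- aov(DV ~ IV, data = long)
tab   <- summary(fit)[[1]]
F_val <- tab[["F value"]][1]
p_aov <- tab[["Pr(>F)"]][1]

# Nonparametric counterpart of the one-way ANOVA
kw <- kruskal.test(DV ~ IV, data = long)

print(c(t_squared = unname(tt$statistic)^2, F = F_val))
print(c(p_ttest = tt$p.value, p_aov = p_aov, p_kruskal = kw$p.value))
print(model.tables(fit, "means"), digits = 3)
```

The equivalence holds only for the pooled-variance t-test. The Welch test, which is the default of t.test, generally gives a slightly different p-value.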
Randomized complete blocks
Which of the six strains of clover has the highest protein content? The experimenter divided his field into 5 relatively homogeneous blocks, each with 6 plots, and randomly assigned his 6 strains to the 6 plots within each block. After harvesting, he determined the nitrogen content for each strain in each plot.
Strain assigned to each plot: 1 5 4 2 3 6 6 2 3 1 4 5 5 4 6 2 2 1 3 3 5 6 3 4 1 1 4 5 2 6
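The randomization step of this design can be sketched in R: each block receives its own random permutation of the six strains. The seed and the row and column labels are illustrative assumptions, so the layout produced differs from the one used in the experiment.

```r
set.seed(2024)
n_blocks  <- 5
n_strains <- 6

# One independent random permutation of the strains per block (one row each)
layout <- t(replicate(n_blocks, sample(seq_len(n_strains))))
dimnames(layout) <- list(paste0("Block", seq_len(n_blocks)),
                         paste0("Plot", seq_len(n_strains)))
print(layout)
```

Every strain appears exactly once in every block. That is what makes the blocks "complete" and lets block differences be separated from strain differences.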
R functions
md<-read.table("RandCompleteBlock.txt",header=T)
attach(md)
fit<-aov(Yield~Block+Variety)
summary(fit)
anova(fit)
TukeyHSD(fit)

$Block
           diff         lwr         upr      p adj
B2-B1 -1.216667   -4.773553   2.3402194  0.8415528
B3-B1 -1.666667   -5.223553   1.8902194  0.6333566
B4-B1 -4.233333   -7.790219  -0.6764472  0.0149154
B5-B1 -7.200000  -10.756886  -3.6431139  0.0000569
B3-B2 -0.450000   -4.006886   3.1068861  0.9952717
...

$Variety
                diff           lwr           upr      p adj
3dok13-3dok1  -15.56  -19.65283761  -11.46716239  0.0000000
3dok4-3dok1   -14.18  -18.27283761  -10.08716239  0.0000000
3dok5-3dok1    -4.84   -8.93283761   -0.74716239  0.0148040
3dok7-3dok1    -8.90  -12.99283761   -4.80716239  0.0000160
compos-3dok1  -10.12  -14.21283761   -6.02716239  0.0000024
3dok4-3dok13    1.38   -2.71283761    5.47283761  0.8913398
3dok5-3dok13   10.72    6.62716239   14.81283761  0.0000010
3dok7-3dok13    6.66    2.56716239   10.75283761  0.0006551
...
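RandCompleteBlock.txt is not included here, so the following sketch runs the same analysis on simulated data. The block and variety effects, the labels B1 to B5 and V1 to V6, and the seed are made up for illustration and do not reproduce the output above. It passes the data frame through the data argument instead of attach(), and restricts the Tukey comparisons to Variety, the factor of interest.

```r
set.seed(11)
md <- expand.grid(Block   = factor(paste0("B", 1:5)),
                  Variety = factor(paste0("V", 1:6)))

# Hypothetical effects, for illustration only
block_eff   <- c(0, -1, -2, -4, -7)
variety_eff <- c(0, -15, -14, -5, -9, -10)
md$Yield <- 30 + block_eff[as.integer(md$Block)] +
  variety_eff[as.integer(md$Variety)] + rnorm(nrow(md), sd = 2)

fit <- aov(Yield ~ Block + Variety, data = md)
tab <- summary(fit)[[1]]
print(tab)

# Pairwise comparisons among varieties only
tk <- TukeyHSD(fit, which = "Variety")
print(head(tk$Variety))
```

With 5 blocks and 6 varieties the degrees of freedom are 4, 5 and 20 for Block, Variety and residuals, and there are 15 pairwise variety comparisons.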
Example
• A researcher needs to assess the effect of 3 drugs on reducing appetite. Appetite reduction is measured by the inter-meal interval (in minutes). The half-life of the drugs is about 3 days. Seven human subjects differ in age, gender, appetite, degree of obesity and potentially in many other ways. If the researcher randomly allocates these seven subjects to three groups, then some groups may contain younger subjects or more males than others, etc., so that any group differences would be confounded by potentially many other factors.
• He decided to use a randomized complete block design and administer the drugs on Mondays in three consecutive weeks. For each subject, he randomized the three drugs over the three Mondays (top right), took an index of appetite, and obtained the data table (bottom right).
• Using test subjects as blocks is also called repeated measures ANOVA or within-subject ANOVA.
• Assignment A: analyze the data and report the effect size and the result of the significance test (in short, what you would include in a manuscript).
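The data table for this example is not reproduced in the text, so the following is only a sketch of how such an analysis might be set up, with subjects as blocks. All values are simulated. The drug names, the assumed effects, the seed, and the choice of partial eta squared as the effect size are assumptions, not the assignment's expected answer.

```r
set.seed(5)
subj <- factor(paste0("S", 1:7))
drug <- factor(c("DrugA", "DrugB", "DrugC"))

# Randomize the order of the three drugs over three Mondays for each subject
schedule <- t(replicate(length(subj), sample(levels(drug))))
dimnames(schedule) <- list(levels(subj), paste0("Week", 1:3))

# Hypothetical inter-meal intervals (minutes): subject effect + drug effect
dat <- expand.grid(Subject = subj, Drug = drug)
subject_eff <- rnorm(length(subj), sd = 30)
drug_eff    <- c(0, 20, 45)
dat$Interval <- 240 + subject_eff[as.integer(dat$Subject)] +
  drug_eff[as.integer(dat$Drug)] + rnorm(nrow(dat), sd = 10)

# Randomized complete block ANOVA with subjects as blocks
fit <- aov(Interval ~ Subject + Drug, data = dat)
tab <- summary(fit)[[1]]
print(tab)

# Partial eta squared for Drug: SS_Drug / (SS_Drug + SS_Error)
ss <- tab[["Sum Sq"]]                      # order: Subject, Drug, Residuals
eta_p <- ss[2] / (ss[2] + ss[3])
print(round(c(partial_eta_sq = eta_p), 3))

# Drug means and pairwise differences for reporting
print(model.tables(fit, "means"), digits = 3)
print(TukeyHSD(fit, which = "Drug"))
```

A manuscript sentence would typically report the drug means, the F statistic with its degrees of freedom (2 and 12 here), the p-value, and the chosen effect size.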