One-Factor Experiments & ANCOVA

Group 3: Jesse Colton; Junyan Song; Kan He; Lijuan Kang; Minqin Chen; Xiaotong Li; Xin Li; Yaqi Xue

Outline: • History and Introduction • Model and Overall F Test • Theoretical Background • Do ANCOVA by Hand • ANCOVA • ANOVA • Pairwise Test for Group Means • Check Assumptions • ANOVA Linear Model and Tests • Do ANCOVA by SAS

What Is ANCOVA?

Definition • ANOVA stands for Analysis of Variance. • ANCOVA stands for Analysis of Covariance. • ANCOVA combines aspects of ANOVA and linear regression to compare samples to each other when outside variables (covariates) are involved. • “One-Factor Experiment” means the experiment has a single treatment factor.

History Like many of the important topics in statistical analysis, elements of ANOVA/ANCOVA come from the works of R.A. Fisher, and some from Francis Galton.

History Ronald Aylmer Fisher 1890-1962 • British statistician, eugenicist, evolutionary biologist & geneticist • Fisher “pioneered the principles of the design of experiments and elaborated his studies of analysis of variance.” (Wikipedia) • He also developed the method of maximum likelihood, and is known for “Fisher’s exact test”.

History Sir Francis Galton 1822-1911 • Established the concept of correlation • He “invented the use of the regression line and was the first to describe and explain the common phenomenon of regression toward the mean.” (Wikipedia)

Uses • ANOVA is used to compare the means of two or more groups. • ANCOVA is used in situations where another variable affects the experiment. • While we normally use the t-test for two group means, there are many situations where it is not applicable or not as useful: • More than 2 samples • Samples with additional variables • Other factors leading to skewed experimental results

Uses • When conducting an experiment, there is often an initial difference between test groups. • ANCOVA “provides a way of measuring and removing the effects of such initial systematic differences between the samples.” (http://vassarstats.net/textbook/ch17pt2.html) • If you only compare the means, you are not taking into account any previous advantages one group may have

Uses Example: Two methods of teaching a topic are tested on two different groups (A and B). However, in the preliminary data collected, group A is shown to have a higher IQ than group B. The fact that group A had a higher score after learning by one method does not prove that the method is better. ANCOVA seeks to eliminate the difference between the groups before the experiment in order to test which method is better.

Uses • By merging ANOVA with linear regression, ANCOVA controls for the effects that covariates we are not studying may have on the outcomes. • ANOVA + Linear Regression = ANCOVA
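One way to see how ANCOVA removes an initial difference between groups is to compute covariate-adjusted group means. The sketch below is a minimal illustration, not the procedure from the slides: it assumes a single covariate and a common within-group slope, and the function name `adjusted_means` and the IQ/score numbers are made up for the teaching example.

```python
from statistics import mean

def adjusted_means(groups):
    """groups maps a label to a list of (covariate, response) pairs.

    Returns covariate-adjusted means, assuming one common
    within-group regression slope (an ANCOVA assumption).
    """
    grand_x = mean(x for pairs in groups.values() for x, _ in pairs)
    sxy = sxx = 0.0
    centers = {}
    for label, pairs in groups.items():
        x_bar = mean(x for x, _ in pairs)
        y_bar = mean(y for _, y in pairs)
        centers[label] = (x_bar, y_bar)
        # Pool cross-products and squares within each group.
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in pairs)
        sxx += sum((x - x_bar) ** 2 for x, _ in pairs)
    slope = sxy / sxx  # pooled within-group slope
    # Shift each group mean to the value expected at the grand covariate mean.
    return {label: y_bar - slope * (x_bar - grand_x)
            for label, (x_bar, y_bar) in centers.items()}

# Hypothetical (IQ, test score) pairs for teaching methods A and B.
data = {
    "A": [(100, 70), (110, 75), (120, 80)],
    "B": [(90, 62), (100, 67), (110, 72)],
}
adj = adjusted_means(data)
print(adj)
```

With these made-up numbers the raw means differ by 8 points (75 vs. 67), but the adjusted means differ by only 3 (72.5 vs. 69.5): part of group A's advantage is attributed to its higher IQ.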

Aims of ‘ANOVA’ Models • Linear models with a continuous response and one or more categorical predictors • Description: the relation between the response variable (Y) and the predictor variable(s) (X) • Explanation: how much of the variation in Y is explained by different sources of variation (factors or combinations of factors)

Completely Randomized Designs • Experimental designs where there is no restriction on random allocation of experimental/sampling units to groups or treatments - single factor and factorial designs

Single factor model Completely randomized design

Terminology • Factor (categorical predictor variable): usually designated factor A • Number of observations within each group: $n_i$ • Each observation: $y_{ij}$
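A common way to write the single-factor model for a completely randomized design is shown below. The symbols $\mu$, $\alpha_i$, $\varepsilon_{ij}$, $a$ and $\sigma^2$ are assumed here as standard notation, and the sum-to-zero constraint is only one possible choice.

```latex
y_{ij} = \mu + \alpha_i + \varepsilon_{ij},
\qquad i = 1, \dots, a, \quad j = 1, \dots, n_i,
\qquad \varepsilon_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2),
\qquad \sum_{i=1}^{a} \alpha_i = 0
```

Here $y_{ij}$ is the $j$-th observation in group $i$ of factor A, $\mu$ is the overall mean, and $\alpha_i$ is the effect of group $i$.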

Data layout

Estimating Model Parameters • Least Squares (LS) Estimate
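As a sketch of what the least squares estimates look like, assume the one-way model $y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$ with $\sum_i \alpha_i = 0$ and equal group sizes (this notation is an assumption, not taken from the slides). Minimizing the sum of squared errors gives:

```latex
Q(\mu, \alpha_1, \dots, \alpha_a) = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \mu - \alpha_i)^2
\;\Longrightarrow\;
\hat{\mu} = \bar{y}, \qquad
\hat{\alpha}_i = \bar{y}_i - \bar{y}, \qquad
\hat{y}_{ij} = \hat{\mu} + \hat{\alpha}_i = \bar{y}_i
```

where $\bar{y}_i$ is the sample mean of group $i$ and $\bar{y}$ is the grand mean. The usual estimate of the error variance is $s^2 = MSE = \sum_i \sum_j (y_{ij} - \bar{y}_i)^2 / (N - a)$ with $N = \sum_i n_i$.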

Analysis of Variance • Test the hypothesis $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$ vs. $H_1$: not all means are equal • Test statistic
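The overall F test can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the $n_i$-weighted form of the between-group sum of squares (which also covers unequal group sizes), the function name `one_way_anova` is hypothetical, and the data are made up. It returns the F statistic only; the critical value or p-value would come from an F table or statistical software.

```python
from statistics import mean

def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for a one-way layout."""
    a = len(groups)                       # number of groups
    n_total = sum(len(g) for g in groups)
    grand = mean([y for g in groups for y in g])
    # Between-group (treatment) sum of squares.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group (error) sum of squares.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between, df_within = a - 1, n_total - a
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {"SSA": ss_between, "SSE": ss_within,
            "dfA": df_between, "dfE": df_within, "F": f_stat}

# Made-up data: three groups of three observations each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(result)
```

For these numbers SSA = 42 on 2 degrees of freedom and SSE = 6 on 6 degrees of freedom, so F = 21.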

Unequal sample sizes • The sums of squares equations provided only work for equal sample sizes - they can be modified for unequal sample sizes, but the result is very clumsy - the model comparison approach is simpler (and is what statistical software uses)

Unequal sample sizes • F-ratio tests are less reliable if sample sizes differ, especially if the variances also differ - the bigger the difference in sample sizes, the less reliable the tests become • Use equal or similar sample sizes if possible • But don’t omit data to balance sample sizes!

ANOVA: Multiple Comparisons of Means • Reject $H_0: \mu_1 = \mu_2 = \cdots = \mu_a$, where $a$ is the number of groups. • Not all means are equal. But which means are significantly different from each other? • We need a more detailed comparison: making multiple tests.

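ANOVA: Multiple Comparisons of Means Making multiple tests • Test all pairwise equality hypotheses: $H_{0ij}: \mu_i = \mu_j$ vs. $H_{1ij}: \mu_i \neq \mu_j$ for all $i < j$ • Number of pairs: $\binom{a}{2} = \frac{a(a-1)}{2}$ • Using a two-sided t-test at level $\alpha$: reject $H_{0ij}$ if $|t_{ij}| = \frac{|\bar{y}_i - \bar{y}_j|}{s\sqrt{1/n_i + 1/n_j}} > t_{N-a,\,\alpha/2}$, where $N = \sum_i n_i$, $n_i$ is the number of observations in group $i$, $\bar{y}_i$ is the mean of the observed values of group $i$, and $s^2 = MSE$.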

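ANOVA: Multiple Comparisons of Means Least Significant Difference (LSD): the critical value that the difference $|\bar{y}_i - \bar{y}_j|$ must exceed in order to be significant at level $\alpha$: $LSD = t_{N-a,\,\alpha/2}\, s\sqrt{1/n_i + 1/n_j}$, where $s^2 = MSE$ and $N$ is the total number of observations.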
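The pairwise t statistics and the LSD can be computed directly from the group data. The sketch below is illustrative only: the function name `pairwise_lsd` and the data are made up, and the critical value is passed in by hand (2.447 is the tabulated upper 0.025 point of the t distribution with 6 degrees of freedom) rather than computed.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def pairwise_lsd(groups, t_crit):
    """Return (i, j, t_ij, LSD_ij, significant) for every pair of groups.

    t_crit is the t critical value with N - a degrees of freedom,
    looked up by the caller (an assumption of this sketch).
    """
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    sse = sum((y - mean(g)) ** 2 for g in groups for y in g)
    s = sqrt(sse / (n_total - a))         # s = sqrt(MSE)
    rows = []
    for i, j in combinations(range(a), 2):
        se = s * sqrt(1 / len(groups[i]) + 1 / len(groups[j]))
        diff = mean(groups[i]) - mean(groups[j])
        lsd = t_crit * se
        rows.append((i + 1, j + 1, diff / se, lsd, abs(diff) > lsd))
    return rows

# Made-up data: a = 3 groups, N = 9, so N - a = 6 degrees of freedom.
rows = pairwise_lsd([[1, 2, 3], [2, 3, 4], [6, 7, 8]], t_crit=2.447)
for row in rows:
    print(row)
```

In this example the LSD is about 2.0, so groups 1 and 2 (difference 1) are not declared different, while group 3 differs from both.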

ANOVA: Multiple Comparisons of Means Familywise Error Rate (FWE): the Type I error probability of falsely declaring at least one pairwise difference significant. FWE = P{Reject at least one true null hypothesis} If each test is done at level $\alpha$, then the FWE will exceed $\alpha$. Why?

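ANOVA: Multiple Comparisons of Means Let $A_i$ denote the event of rejecting a true null hypothesis in the $i$-th test, where the total number of tests is $k = \binom{a}{2}$. $P(A_i) = \alpha$ = type I error. FWE = P(at least one $A_i$ occurs) = $P(A_1 \cup A_2 \cup \cdots \cup A_k)$. If the $A_i$ are independent of each other, FWE = $1 - (1 - \alpha)^k$, which exceeds $\alpha$ whenever $k > 1$; in general, FWE $\le k \cdot P(A_i) = k\alpha$. Our goal is to control FWE $\le \alpha$.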
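A short numerical sketch shows how quickly the familywise error rate grows with the number of tests. It assumes the tests are independent (pairwise comparisons that share groups are not exactly independent, so this is only an approximation), and the function names are hypothetical.

```python
from math import comb

def fwe_independent(alpha, k):
    """FWE for k independent tests, each at level alpha: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

def fwe_upper_bound(alpha, k):
    """Bonferroni upper bound k * alpha, capped at 1."""
    return min(1.0, k * alpha)

a = 5                      # number of groups (illustrative)
k = comb(a, 2)             # number of pairwise tests: 10
print(k, fwe_independent(0.05, k), fwe_upper_bound(0.05, k))
```

With 5 groups there are 10 pairwise tests; at level 0.05 each, the FWE under independence is about 0.40, and it can never exceed 0.50.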

ANOVA: Multiple Comparisons of Means Two Methods: • Bonferroni Method. • Tukey Method.

ANOVA: Multiple Comparisons of Means Bonferroni Method • Idea: To perform k tests simultaneously, divide the FWE α among the k tests. If the error rate is allocated equally among the k tests, then each test is done at level α/k. • For example, with α = 0.05 and k = 10, each test is done at level 0.05/10 = 0.005.

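ANOVA: Multiple Comparisons of Means Bonferroni Method • Test: At FWE = $\alpha$, we reject $H_{0ij}: \mu_i = \mu_j$ if $|t_{ij}| > t_{N-a,\,\alpha/(2k)}$, where $t_{ij}$ is the pairwise t statistic, $N$ is the total number of observations, $a$ is the number of groups, and $k = \binom{a}{2}$ is the number of tests.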
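In terms of p-values, the Bonferroni rule amounts to comparing each two-sided p-value with α/k. The sketch below assumes the pairwise p-values have already been computed elsewhere; the function name and the p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the per-test level alpha/k and a reject/keep decision per test."""
    k = len(p_values)
    per_test_level = alpha / k
    decisions = [p < per_test_level for p in p_values]
    return per_test_level, decisions

# Hypothetical two-sided p-values for k = 3 pairwise comparisons.
level, decisions = bonferroni_reject([0.001, 0.02, 0.004], alpha=0.05)
print(level, decisions)
```

Here the second comparison (p = 0.02) would be significant at the unadjusted 0.05 level but not at the Bonferroni level 0.05/3 ≈ 0.0167.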

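ANOVA: Multiple Comparisons of Means Tukey Method At FWE = $\alpha$, we reject $H_{0ij}: \mu_i = \mu_j$ if $|t_{ij}| > \frac{q_{a,\,N-a,\,\alpha}}{\sqrt{2}}$, where $t_{ij}$ is the pairwise t statistic and $q_{a,\,N-a,\,\alpha}$ is the upper $\alpha$ critical point of the Studentized range distribution for $a$ groups with $N - a$ degrees of freedom.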

Dummy Variable: A dummy variable is an artificial variable created to represent an attribute with two or more distinct categories/levels. How to create a dummy variable: the number of dummy variables necessary to represent a single attribute variable equals the number of levels (categories) k in that variable minus one, i.e. k - 1.

Examples • Gender: Male & Female (2 levels, so 1 dummy variable) • Rank: Assistant & Associate & Full (3 levels, so 2 dummy variables)
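The k - 1 rule can be illustrated with the Rank example. This is a minimal sketch: the function name `dummy_code` is hypothetical, and choosing "Full" as the reference level is an arbitrary assumption.

```python
def dummy_code(values, levels, reference):
    """Code a categorical variable with k levels as k - 1 zero/one columns.

    The reference level gets all zeros; every other level gets a single 1.
    """
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows

ranks = ["Assistant", "Associate", "Full", "Associate"]
columns, coded = dummy_code(
    ranks, levels=["Assistant", "Associate", "Full"], reference="Full")
print(columns)
for rank, row in zip(ranks, coded):
    print(rank, row)
```

Three levels produce two columns; an observation with rank "Full" is coded as (0, 0).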

ANOVA Models (a multiple regression with all categorical predictors) • General linear model • Dummy variables

Relationship between these models: the constraint

Note: $\mu$ is the grand mean here, but in the previous case it is the mean of Group 3. $\alpha_1$ and $\alpha_2$ also differ from those in the previous case.

The interpretation differs depending on which constraint we apply. With Group 3 as the reference group ($\alpha_3 = 0$): • $\alpha_1$: Group 1 mean - Group 3 mean • $\alpha_2$: Group 2 mean - Group 3 mean • $\alpha_1 - \alpha_2$: Group 1 mean - Group 2 mean
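To make the effect of the constraint concrete, the sketch below computes the parameters from the same three group means under two constraints: effects that sum to zero (assuming equal group sizes) and a reference group whose effect is fixed at zero. The function names and the group means are made up.

```python
from statistics import mean

def effects_sum_to_zero(group_means):
    """mu is the grand mean (equal group sizes assumed); effects sum to zero."""
    mu = mean(group_means)
    return mu, [m - mu for m in group_means]

def effects_reference(group_means, ref_index):
    """mu is the reference group's mean; that group's effect is zero."""
    mu = group_means[ref_index]
    return mu, [m - mu for m in group_means]

means = [2, 3, 7]                          # hypothetical group means
mu_s, alpha_s = effects_sum_to_zero(means)
mu_r, alpha_r = effects_reference(means, ref_index=2)   # Group 3 as reference
print(mu_s, alpha_s)
print(mu_r, alpha_r)
```

Both parameterizations reproduce the same group means (mu plus the effect), and differences between effects, such as the Group 1 vs. Group 2 difference, are identical under either constraint.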

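How do we test ANOVA in terms of the general linear model? 1. Overall F-test • ANOVA model $H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0$ • General linear model $H_0: \beta_1 = \beta_2 = \cdots = \beta_{a-1} = 0$ • Recall the test for multiple regression coefficients: Reduced model: $y = \beta_0 + \varepsilon$; Full model: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{a-1} x_{a-1} + \varepsilon$ • $p$: number of parameters in $H_0$; here $p = a - 1$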

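Recall the test for ANOVA in terms of the ANOVA model and of the general linear model: we reject $H_0$ when $F = \frac{(SSE_{reduced} - SSE_{full})/(a-1)}{SSE_{full}/(N-a)} > f_{a-1,\,N-a,\,\alpha}$. Because $SSE_{reduced} - SSE_{full}$ is the between-group sum of squares and $SSE_{full}$ is the within-group sum of squares, this is the same ratio $MSA/MSE$, so the overall ANOVA test is consistent under both models.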
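The agreement between the two views can be checked numerically with the model comparison (extra sum of squares) form of the F statistic. The sketch relies on two standard facts, stated here as assumptions of the example: the intercept-only model fits the grand mean, and the full dummy-variable model fits each group's own mean. The function name and data are made up.

```python
from statistics import mean

def model_comparison_f(groups):
    """F statistic comparing the intercept-only model with the dummy-variable model."""
    y = [value for g in groups for value in g]
    a, n_total = len(groups), len(y)
    grand = mean(y)
    # Reduced model (intercept only): every fitted value is the grand mean.
    sse_reduced = sum((value - grand) ** 2 for value in y)
    # Full model (a - 1 dummies): every fitted value is its group mean.
    sse_full = sum((value - mean(g)) ** 2 for g in groups for value in g)
    p = a - 1                              # parameters set to zero under H0
    return ((sse_reduced - sse_full) / p) / (sse_full / (n_total - a))

f_stat = model_comparison_f([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(f_stat)
```

For this data the reduced model has SSE 48 and the full model has SSE 6, giving F = (42/2)/(6/6) = 21, the same value as the between/within ANOVA ratio.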

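2. Test for an individual regression coefficient (pairwise test for group means) • $H_0$ differs depending on the coding of the dummy variables. • For example, with Group 3 as the reference level and $a = 3$: $H_0: \beta_1 = 0$ (Group 1 mean = Group 3 mean) • t test: $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$ • F test: Full model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$; Reduced model: $y = \beta_0 + \beta_2 x_2 + \varepsilon$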

ANCOVA Models (a multiple regression with continuous predictors and dummy-coded factors) • Continuous variables • Dummy variables

Overall Test for ANCOVA in terms of Linear Model: H0: H0:
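As a sketch of the ANCOVA linear model, assume one factor with $a$ levels coded by dummy variables $x_1, \dots, x_{a-1}$ and a single continuous covariate $z$ with slope $\gamma$ (these symbols are assumptions of this example, not the slides' notation):

```latex
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_{a-1} x_{a-1} + \gamma z + \varepsilon,
\qquad \varepsilon \sim N(0, \sigma^2)
```

One natural hypothesis in this model is that there is no treatment effect after adjusting for the covariate, $H_0: \beta_1 = \cdots = \beta_{a-1} = 0$. It can be tested by comparing the full model with the reduced model $y = \beta_0 + \gamma z + \varepsilon$:

```latex
F = \frac{(SSE_{reduced} - SSE_{full}) / (a - 1)}{SSE_{full} / (N - a - 1)}
\;\sim\; F_{a-1,\; N-a-1} \quad \text{under } H_0
```

The error degrees of freedom are $N - a - 1$ because the full model has $a + 1$ parameters: the intercept, $a - 1$ dummy coefficients, and the covariate slope.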
