Analysis of Covariance The idea of analysis of covariance or ANCOVA is very similar to that of ANOVA. We wish to look for possible differences among group means BUT in ANCOVA we have one added consideration: we have some additional variable we want to “control for”, hold constant, or account for in our analysis. This additional variable is known as the covariate, and we will denote it as X. In ANCOVA we have a covariate X that we’d like to remove from having an influence on our outcome. If we could hold constant the values of X for our subjects, we would have a clearer picture of the differences on our outcome Y.
Analysis of Covariance Suppose we are considering again our motor-skill study with three types of practice. It may be that our motor skill outcome is related to the level of experience of the athletes. We’d like to “hold constant” the level of experience to see whether the differences among practice types are enhanced (or perhaps reduced) when all athletes effectively have the same level of experience. One way to exert such control (before collecting data) is to really hold the covariate constant by forcing all subjects to have the same value of the covariate.
Analysis of Covariance This is often called a “design control” because we are controlling something by the way we design our study. If we think we’d like to control for experience of athletes, we could select only athletes with, say, 5 years of participation in the sport of interest. Another related design control is blocking -- in blocking we can select athletes with different levels of experience, but we would have to assign equal numbers of them to each treatment group. However, both of these can be very hard to accomplish – both require that we know a lot of information about our potential subjects BEFORE we select them.
Analysis of Covariance ANCOVA allows us to control the covariate statistically. This analysis technique gives us results that allow us to estimate what the group means on our outcome WOULD HAVE BEEN if the groups had the same means (or “were equivalent”) on the covariate. These so-called “adjusted means” remove the effects of X from the Y scores of all of our subjects. By now you must realize that the language we are using here is very similar to the words used to talk about regression results – indeed we use a form of regression to do the ANCOVA.
Analysis of Covariance: Selecting X There are several considerations to keep in mind when selecting a potential covariate. 1) First, the covariate X should be linearly related to the outcome Y, and we sometimes hope (or expect) that the groups of interest will show mean differences on the covariate (though that is not a requirement). 2) If there is a treatment involved, we also have to know that the treatment did not affect X and similarly that X did not affect the treatment. So, for instance, if subjects are assigned to treatment groups on the basis of a variable, that variable would not be a good covariate.
Analysis of Covariance: Selecting X 3) A third key assumption of the ANCOVA is that X relates to Y in exactly the same way for all of the groups in our analysis. Specifically we can say that there should be “no covariate × group interaction”. Some books will say there should be no “Treatment × Covariate” interaction, but ANCOVA is often used when the grouping variable or factor is not a treatment at all, but rather some kind of status variable like gender or location, e.g., urbanicity in the NELS data.
Analysis of Covariance: Selecting X
• We have spent a good deal of time discussing interactions and how to detect them.
• We have visual techniques: specifically, we can use “Set markers by” in SPSS to plot regression lines for several groups at once.
• We have also learned to compute interaction variables using dummy variables and to test their significance statistically.
• Now we will learn one more (and quicker) way to test interactions, using the SPSS General Linear Models menu.
Analysis of Covariance: Selecting X Another way of saying that there is no covariate × group interaction is to say that the slopes of X (as a predictor of Y) must be equal in all of the groups or levels of the factor. We may write, for k groups, that $\beta_1 = \beta_2 = \dots = \beta_k = \beta_w$, where $\beta_w$ is the common “within-groups slope” of the Y on X regression. This is the slope we will estimate in the ANCOVA model. So we need to look at that model.
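The equal-slopes idea can be explored numerically before any formal test. The following is a minimal Python sketch, using made-up toy data (the group labels and scores are assumptions for illustration, not NELS values). It computes each group's own slope of Y on X, and the pooled within-group slope as the sum of the within-group cross-products divided by the sum of the within-group sums of squares of X.

```python
def slope_parts(xs, ys):
    """Return (cross-products, sum of squares of X), both about the group means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy, sxx

# Hypothetical toy data: (covariate X scores, outcome Y scores) for three groups.
groups = {
    "A": ([1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.0]),
    "B": ([2.0, 3.0, 4.0, 5.0], [5.0, 7.2, 8.8, 11.0]),
    "C": ([0.0, 1.0, 2.0, 3.0], [1.0, 2.9, 5.1, 7.0]),
}

parts = {g: slope_parts(xs, ys) for g, (xs, ys) in groups.items()}

# Separate slope of Y on X within each group.
group_slopes = {g: sxy / sxx for g, (sxy, sxx) in parts.items()}

# Pooled within-group slope: pooled cross-products over pooled sums of squares.
pooled_within_slope = (
    sum(sxy for sxy, _ in parts.values()) / sum(sxx for _, sxx in parts.values())
)

for g, b in group_slopes.items():
    print(f"slope in group {g}: {b:.3f}")
print(f"pooled within-group slope: {pooled_within_slope:.3f}")
```

In this toy example the three group slopes are nearly identical, so the pooled slope summarizes them well. When the group slopes differ widely, a single within-group slope is a poor summary, which is what the interaction test is meant to detect.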
Analysis of Covariance: The Model The ANCOVA model is $Y_{ij} = \mu + \alpha_j + \beta_w (X_{ij} - \mu_X) + \varepsilon_{ij}$, where
$Y_{ij}$ is the outcome score of the ith person in the jth group,
$X_{ij}$ is the covariate score of the ith person in the jth group,
$\mu$ is the grand mean of Y in the population, and
$\alpha_j$ is the jth treatment effect in the population, $\alpha_j = \mu_j - \mu$, with the covariate X held constant.
Analysis of Covariance: The Model In the same model, $Y_{ij} = \mu + \alpha_j + \beta_w (X_{ij} - \mu_X) + \varepsilon_{ij}$,
$\beta_w$ is the slope of the covariate in the population, which is the predicted change in Y given a one-unit change in X, with group membership held constant, and
$\varepsilon_{ij}$ is the residual, the part of the score for person i in group j that the model leaves unexplained.
The w label on $\beta_w$ represents the Within-group slope; that is, all groups must share the same population slope for X predicting Y. Also note that $\beta_w$ is multiplied not by the raw X score but by the deviation of that score from the population mean of X, $\mu_X$.
Analysis of Covariance: The Model In the ANCOVA we use several sample quantities:
$n_j$ Number of subjects in group j of k groups
$n$ Total number of subjects in the study (the total sample size, $n = \sum_j n_j$)
$\bar{Y}_j$ Mean outcome score for the jth group
$\bar{X}_j$ Mean covariate score for the jth group
$\bar{Y}$ Mean outcome across the entire sample
$\bar{X}$ Mean covariate across the entire sample
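Analysis of Covariance: The Model Also we use
$\bar{Y}'_j$ Adjusted mean Y score for the jth group (adjusted for the covariate X): $\bar{Y}'_j = \bar{Y}_j - b_w (\bar{X}_j - \bar{X})$, where $b_w$ is the sample estimate of the within-group slope. Note that our book uses a different symbol for the adjusted mean.
$a_j$ The jth sample treatment effect, with X held constant: $a_j = \bar{Y}'_j - \bar{Y}$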
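The adjustment can be illustrated with a few numbers. The sketch below assumes the usual adjusted-mean formula (group mean of Y, minus the within-group slope times the difference between the group mean of X and the overall mean of X). All values and the function name `adjusted_means` are hypothetical, chosen only so that the arithmetic is easy to follow.

```python
def adjusted_means(y_means, x_means, grand_x_mean, b_w):
    """Adjusted mean for each group: Ybar_j - b_w * (Xbar_j - Xbar)."""
    return {
        g: y_means[g] - b_w * (x_means[g] - grand_x_mean)
        for g in y_means
    }

# Hypothetical group means (equal group sizes assumed, so the overall
# covariate mean is the simple average of the group covariate means).
y_means = {"A": 10.0, "B": 12.0, "C": 14.0}   # raw outcome means
x_means = {"A": 4.0, "B": 5.0, "C": 6.0}      # covariate means
grand_x_mean = 5.0
b_w = 1.5                                     # assumed within-group slope

adj = adjusted_means(y_means, x_means, grand_x_mean, b_w)
for g in y_means:
    print(f"group {g}: raw mean {y_means[g]:.1f}, adjusted mean {adj[g]:.1f}")
```

Here group A sits below the overall covariate mean, so its mean is adjusted upward, and group C is adjusted downward. The raw difference of 4 points between A and C shrinks to 1 point once the groups are treated as equivalent on the covariate.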
Analysis of Covariance: The Assumptions
• We have already discussed three assumptions: X relates linearly to Y, there is no covariate by groups interaction (all $\beta_j = \beta_w$, where $\beta_j$ is the slope in group j), and the factor does not influence X. However, as usual we also have assumptions about the errors.
• We will assume that the errors
    • are independent,
    • are normally distributed, and
    • have homogeneous variances across groups.
• Thus we will want to use Levene's test for the variance equality assumption, and we can check normality using the residuals histogram as we did in the ANOVA.
Analysis of Covariance: The Analysis In the ANCOVA the first step is to test whether the covariate has the same slope for each group in the factor (or factors in a multi-way ANCOVA). One of the key assumptions of ANCOVA is that this interaction does not exist, so we must first run a model that is NOT the ANCOVA model to be sure the model will be appropriate. We may begin by plotting the slopes for the different groups in a scatterplot. Then we will work through an example using the SPSS GLM menu.
Analysis of Covariance: The Analysis Suppose we are studying principal leadership using the NELS school data set. We are investigating the presence of location effects (g10urban is our factor) with SES as a covariate. Here are the slopes for the 3 locations in a scatterplot. They are not parallel but it is hard to tell if they are really that different by simple examination of the plot.
Analysis of Covariance: The Analysis Next we will run the model with the factor g10urban, the covariate, and their interaction. To do this we cannot let SPSS think for us! We will use the GLM menu and click g10urban in as the fixed factor and f1ses into the covariate box. We MUST also click the Model button to specify that we want to test the interaction. If we do not, SPSS will run the ANCOVA model (with no group × covariate interaction). We will first check the assumption of equal variances.
Analysis of Covariance: The Analysis We are happy to see that for this model Levene’s test tells us that the error variances appear to be equal across groups. Next we will inspect the full source table to see if the interaction of f1ses and g10urban is significant.
Analysis of Covariance: The Analysis Now we can see that the interaction (g10urban * f1ses) is not significant. We can remove the interaction and run the ANCOVA model.
Analysis of Covariance: The Analysis The general form of the source table for this analysis is

Source              df          SS       MS                 F
_________________________________________________________________
Between groups      k-1         SSB      SSB/(k-1)          MSB/MSW
Covariate           c           SSX      SSX/c              MSX/MSW
Group x Covariate   c(k-1)      SSInt    SSInt/(c(k-1))     MSInt/MSW
Within groups       n-k(c+1)    SSW      SSW/(n-k(c+1))
_________________________________________________________________
Total               n-1         SST

Here c is the number of covariates, each mean square is its sum of squares divided by its df, and SST = (n-1) $S^2_Y$ = SSB + SSX + SSInt + SSW, where $S^2_Y$ is the usual variance of Y for the whole data set.
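As a quick check on the degrees of freedom in the table, here is a small Python sketch. The function name `interaction_model_df` and the example values of n, k, and c are assumptions for illustration only. It confirms that the four sources use up exactly the n-1 total degrees of freedom.

```python
def interaction_model_df(n, k, c=1):
    """Degrees of freedom for the model with a group x covariate interaction.

    n: total sample size, k: number of groups, c: number of covariates.
    """
    return {
        "between_groups": k - 1,
        "covariate": c,
        "group_x_covariate": c * (k - 1),
        "within_groups": n - k * (c + 1),
    }

# Hypothetical study: 60 subjects, 3 groups, 1 covariate.
df = interaction_model_df(n=60, k=3, c=1)
total_df = sum(df.values())

for source, value in df.items():
    print(f"{source}: {value}")
print(f"total: {total_df}")
```

With one covariate and three groups, the interaction uses 2 degrees of freedom, one extra slope for each group beyond the first.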
Analysis of Covariance: The Analysis Rather than explore this model further, we will return to SPSS and eliminate the interaction. We can either select the “full factorial” option under the Model button or keep using the custom model but omit the interaction term. Again here is Levene's test, now for the ANCOVA model:
Analysis of Covariance: The Analysis Here is the source table for the ANCOVA model. Both the factor and the covariate are significant.
Analysis of Covariance: The Analysis The general form of the source table for the ANCOVA is

Source              df          SS       MS                 F
_________________________________________________________________
Between groups      k-1         SSB      SSB/(k-1)          MSB/MSW
Covariate           c           SSX      SSX/c              MSX/MSW
Within groups       n-k-c       SSW      SSW/(n-k-c)
_________________________________________________________________
Total               n-1         SST

Here c is the number of covariates and SST = (n-1) $S^2_Y$ = SSB + SSX + SSW, where $S^2_Y$ is the usual variance of Y for the whole data set. There is no SSInt term because the ANCOVA model contains no interaction.
Analysis of Covariance: The Analysis In running the ANCOVA we need to get some other statistics. If we get both descriptive statistics (via the options button) and estimated means (using the windows at the TOP of the options button window), we will see the difference between the raw and adjusted means. Also to get the slope of the covariate we need to select “parameter estimates” on the options button. The estimated means will have a footnote telling us that they are adjusted, and also telling us what the mean SES value is for our sample. We will only look at the marginal means for the g10urban factor (not overall).
Analysis of Covariance: The Analysis The urban and suburban prinlead adjusted means are slightly lower but the rural adjusted mean is higher than the raw mean.
Analysis of Covariance: The Analysis Here we can draw the urban and suburban prinlead adjusted means on the plot of the raw means (from an ANOVA run). (Figure: plot of the raw group means, with the adjusted means marked by + symbols.)
Analysis of Covariance: The Analysis Next we need the slope and the estimated model. SPSS gives us: The slope for SES is clear, but what are the other values? These are the slopes we would get if we ran a regression with dummy variables for "urban" and "suburban" schools.
Analysis of Covariance: The Analysis So the estimated model from SPSS is predicted prinlead = -.933 + 1.436 (f1ses) + 2.007 (urban) + .571 (suburban). Later we will see how we can use this model to get the adjusted means for the three groups.
Analysis of Covariance: The Analysis Here's proof. Top is from regression and below is ANCOVA
Analysis of Covariance: The Analysis Here are the slopes (regression above and ANCOVA below).
Analysis of Covariance: The Analysis Finally recall that the estimated model is predicted prinlead = -.933 + 1.436 (f1ses) + 2.007 (urban) + .571 (suburban). We will use this model to do a check on the adjusted means. To do so we substitute the mean SES score, which is .049, for f1ses in the equation. For rural schools both dummy variables equal 0, so the equation gives adjusted mean for rural = -.933 + 1.436 (.049) = -.933 + .07 = -.863. This mean is shown in the estimated means on slide 24.
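The same substitution can be carried out for all three locations. The Python sketch below uses only the coefficients and the mean SES value quoted above; the variable and function names are hypothetical. Only the rural value is checked against SPSS output in these slides, so the urban and suburban figures are simply what the estimated equation implies.

```python
# Coefficients from the estimated model quoted in the slides.
INTERCEPT = -0.933
SLOPE_SES = 1.436
# Dummy-variable coefficients; rural is the reference group (both dummies 0).
GROUP_OFFSETS = {"rural": 0.0, "urban": 2.007, "suburban": 0.571}
MEAN_SES = 0.049  # sample mean of f1ses reported in the slides

def adjusted_mean(group):
    """Predicted prinlead for a group, evaluated at the sample mean of SES."""
    return INTERCEPT + SLOPE_SES * MEAN_SES + GROUP_OFFSETS[group]

adjusted = {g: round(adjusted_mean(g), 3) for g in GROUP_OFFSETS}
for g, value in adjusted.items():
    print(f"{g}: {value}")
```

Because every group is evaluated at the same SES value, the differences among these adjusted means equal the dummy-variable coefficients: the urban and suburban adjusted means exceed the rural one by 2.007 and .571, respectively.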
Analysis of Covariance: The Analysis Last we had better check the residuals for normality. We have to use the Save button and save residuals to get this plot. These residuals look pretty normal. Of course we are also assuming that we have the correct model here. From our earlier analyses we may guess that some predictors are missing (e.g., tchcomm), so we may have more work to do to get the best model!