Analysis of Covariance & the Randomized Blocks Design

PSY 4603 Research Methods: Analysis of Covariance & the Randomized Blocks Design (Designs With a Control or Concomitant Variable)

Introduction • Completely randomized designs share a common problem: they are relatively deficient in power. • Fortunately, there are other ways to increase the sensitivity of an experiment. • Also, there are many situations in which it is neither advisable nor feasible to test subjects more than once. • For these situations, the procedures I will describe here are quite reasonable alternatives. • Both procedures we will consider depend on information that is usually available or collected before the start of the experiment. • This information, which we will call a control variable or a concomitant variable, measures some characteristic of individuals that is reasonably correlated with the dependent variable.

The difference between the two procedures I will discuss is when and how this information is incorporated into the experimental design and the statistical analysis. • One procedure uses this information to assign subjects to the experimental conditions. • Subjects are segregated into homogeneous blocks, based on their scores on the concomitant variable. • If the variable is grade-point average (GPA), there will be blocks of subjects with high GPAs, moderate GPAs, and low GPAs, with the exact definition depending on the number of subjects available, the number of treatment conditions, and the relationship between the concomitant variable and the dependent variable. • The finer the grouping, the more closely matched the subjects within a block will be. • Once the blocks are formed, subjects within any given block are randomly assigned to the treatment conditions. • As a consequence of these procedures, the design becomes a factorial design, in which one factor is the manipulated variable and the other is blocks. • For these reasons, the design is variously referred to as a blocking design, a treatments x blocks design, or a randomized block design; I prefer to use the term RBD. • The increased sensitivity usually associated with this design is achieved by basing the error term on subsets of data that are more homogeneous (and less variable) than the undifferentiated groups created by randomly assigning subjects to treatments without regard for the concomitant variable. • The final result of this procedure generally is a smaller error term and an increase in power over the single-factor alternative.

The other procedure achieves its increased sensitivity by statistical means rather than by transforming the experiment from a single-factor to a two-factor design. • Although measures on the concomitant variable are obtained before the start of the experiment, they are not used in the assignment of subjects to the treatment conditions, as they are in the blocking design. • Subjects are simply randomly assigned to the treatment groups in the usual manner, and the experiment remains a completely randomized single-factor design. • The statistical analysis does change, however. • More specifically, we use information about the relationship between the concomitant variable and the dependent variable to adjust for chance differences among the treatment groups and to refine our estimate of error variance. • The result of these operations, known as the analysis of covariance, is a smaller error term and a more sensitive and powerful experiment.

The Randomized Blocks Design • The blocking design provides a solution to problems of generalization by including more than one block of homogeneous subjects in the experiment. • Rather than drawing a single group of subjects from one ability level, the blocking design includes groups of homogeneous subjects drawn from two or more ability levels.

The Design and Analysis • Suppose we had a pool of 60 subjects available for an experiment and that there are k = 4 levels of the treatment factor (factor A). • If we were conducting a completely randomized single-factor experiment, we would randomly assign n = 15 subjects to each of the four treatment conditions. • On the basis of information available to us before the start of the experiment, let's assume that we can classify the 60 subjects into three blocks, each containing 20 subjects who are relatively homogeneous on the classification factor. • The blocking design is formed by assigning the subjects within each block to the four experimental conditions, as diagrammed in the upper portion of the table. (The blocking design appears on the right, and the corresponding single-factor design appears on the left.) • We can view the blocking design as consisting of three independent experiments, one containing subjects of high ability, say, a second containing subjects of medium ability, and a third containing subjects of low ability. • In each case, the subjects within each of these blocks are randomly assigned to the four treatment conditions.
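The two-step construction described above (group into blocks, then randomize within each block) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject labels, the GPA-like scores, the seed values, and the function name `randomized_block_assignment` are all hypothetical, and the pool size is assumed to divide evenly into blocks and cells.

```python
import random

def randomized_block_assignment(scores, n_blocks, conditions, seed=0):
    """Rank subjects on the concomitant variable, cut the ranking into
    equal-sized blocks, then randomly assign within each block."""
    rng = random.Random(seed)
    # Subject ids ordered from highest to lowest concomitant score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    block_size = len(ranked) // n_blocks
    per_cell = block_size // len(conditions)
    assignment = {}
    for b in range(n_blocks):
        members = ranked[b * block_size:(b + 1) * block_size]
        rng.shuffle(members)  # randomization happens only inside the block
        for i, subject in enumerate(members):
            assignment[subject] = (b + 1, conditions[i // per_cell])
    return assignment

# Hypothetical pool: 60 subjects with made-up GPA-like scores.
pool_rng = random.Random(1)
gpa = {f"s{i:02d}": round(pool_rng.uniform(1.5, 4.0), 2) for i in range(60)}

design = randomized_block_assignment(
    gpa, n_blocks=3, conditions=["a1", "a2", "a3", "a4"]
)

# Count subjects in each block-by-treatment cell (3 blocks x 4 treatments).
cell_counts = {}
for block, condition in design.values():
    cell_counts[(block, condition)] = cell_counts.get((block, condition), 0) + 1
```

With 60 subjects, three blocks, and four treatments, every cell of the resulting treatments x blocks layout holds five subjects, matching the design described on this slide.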

Note that the design was constructed in two steps: first an initial grouping of subjects into blocks and then the random assignment of subjects within each block to the different conditions. • In essence, the original single-factor experiment has become a two-factor design, with factor A completely crossed with the blocking factor (factor B in this example). • The error term in the completely randomized experiment, MS_S/A, reflects the variability of subjects from populations in which the blocking factor was allowed to vary without control. • In contrast, the error term for the blocking design, MS_S/AB, reflects the variability of subjects from populations in which variation within the blocking factor was greatly restricted. • Additionally, any treatments x blocks interaction, which remains undetected in the completely randomized experiment, is isolated and removed from the error term in the blocking design.
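To make the contrast between the two error terms concrete, here is a minimal simulation sketch. Everything numeric in it is an assumption chosen for illustration (the block effects, treatment effects, noise level, and seed); it simply computes the pooled within-treatment mean square (MS_S/A) and the pooled within-cell mean square (MS_S/AB) from the same scores.

```python
import random
from statistics import mean

def ss_within(groups):
    """Sum of squared deviations of each score from its own group mean."""
    total = 0.0
    for scores in groups:
        m = mean(scores)
        total += sum((y - m) ** 2 for y in scores)
    return total

rng = random.Random(42)
block_effect = {"high": 10.0, "medium": 0.0, "low": -10.0}   # hypothetical
treat_effect = {"a1": 0.0, "a2": 2.0, "a3": 4.0, "a4": 6.0}  # hypothetical
n = 5  # subjects per treatment-by-block cell

cells = {
    (a, b): [50 + treat_effect[a] + block_effect[b] + rng.gauss(0, 3)
             for _ in range(n)]
    for a in treat_effect
    for b in block_effect
}

# Completely randomized analysis: ignore blocks, pool within each treatment.
by_treatment = [
    [y for (a, b), ys in cells.items() if a == t for y in ys]
    for t in treat_effect
]
df_s_a = sum(len(g) - 1 for g in by_treatment)    # 4 * (15 - 1) = 56
ms_s_a = ss_within(by_treatment) / df_s_a

# Blocking analysis: pool within each treatment-by-block cell.
df_s_ab = sum(len(g) - 1 for g in cells.values())  # 12 * (5 - 1) = 48
ms_s_ab = ss_within(cells.values()) / df_s_ab

print(f"MS_S/A  = {ms_s_a:.2f} on {df_s_a} df")
print(f"MS_S/AB = {ms_s_ab:.2f} on {df_s_ab} df")
```

Because the simulated block effects are large relative to the noise, the within-cell error term comes out much smaller than the within-treatment one, at the cost of eight error degrees of freedom. With a weakly related blocking variable the reduction would be small and the lost degrees of freedom could outweigh it.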

The advantages of the blocking design are considerable. • Blocking helps to equate the treatment groups before the start of the experiment more effectively than is accomplished in the completely randomized design. • Moreover, the power is greater because of the smaller error term usually associated with the blocking design. • Additionally, the design allows an assessment of possible interactions between treatment effects and blocks. • If such an interaction is significant, we will know that the effects of the treatments do not generalize across the abilities or classification of subjects represented in the experiment. • If these interactions are not significant, we have achieved a certain degree of generalizability of the results. • I must also mention certain disadvantages of this type of design. • First, there is the cost of introducing the blocking factor. • Second, it may be difficult to find blocking factors that are highly correlated with the dependent variable used in the experiment. • Finally, we must be concerned with the possible loss of power when the blocking factor is poorly correlated with the dependent variable.

An Example …

Correlation Between the Blocking Variable & the Dependent Variable

An Example of the RBD … • It is important for salespeople to be knowledgeable about how people shop for certain products. Suppose that a new car salesperson believes that the sex of a car shopper affects the way he or she makes an offer on a car. He records the initial offers made on a $24,000 Ford Escape. However, he worries about the confounding influence of age. So, he records the age of the customer as well. What can he conclude?

With Blocking Variable - Age

Analysis of Covariance: An Overview • The primary function of an experimental design is to create a setting in which observations can be related to variations in treatments in an unambiguous and unequivocal manner. • The major problem, of course, is the unavoidable presence of error variance, which introduces uncertainty into the outcome of any experiment. • The completely randomized design minimizes systematic bias through the random assignment of subjects to treatments, but it does so at the expense of a relatively insensitive experiment, that is, one with a large error term. • The blocking design also relies on random assignment to minimize systematic bias, but it introduces the blocking factor, which permits a reduction in the size of the error term and an increase in the sensitivity of the experiment. • The analysis of covariance reduces experimental error by statistical, rather than by experimental, means. • Subjects are first measured on the concomitant variable, usually called the covariate in the context of the analysis of covariance, which consists of some relevant ability or characteristic. • Subjects are then randomly assigned to the treatment groups without regard for their scores on the covariate. • Only at the time of the statistical analysis does this information come into play, when it is used to accomplish two important adjustments: (1) to refine estimates of experimental error and (2) to adjust treatment effects for any differences between the treatment groups that existed before the experimental treatments were administered. • Because subjects were randomly assigned to the treatment conditions, we would expect to find relatively small differences among the treatments on the covariate and considerably larger differences on the covariate among the subjects within the different treatment conditions.
• Thus, the analysis of covariance is expected to achieve its greatest benefits by reducing the size of the error term; any correction for preexisting differences produced through random assignment will be small by comparison.
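The two adjustments named above can be sketched for a single-factor design using the standard textbook formulas: the pooled within-groups slope is SP_S/A / SS_S/A(X), the adjusted error sum of squares is SS_S/A(Y) minus SP_S/A squared over SS_S/A(X) (with one fewer degree of freedom), and each adjusted mean is the group mean on Y minus the slope times that group's deviation from the grand mean on X. The data below are simulated, and the effect sizes, slope, noise level, and seed are assumptions made purely for illustration.

```python
import random
from statistics import mean

rng = random.Random(7)
groups = {}
for name, effect in {"a1": 0.0, "a2": 3.0, "a3": 6.0}.items():  # hypothetical
    x = [rng.gauss(100, 15) for _ in range(20)]                 # covariate X
    y = [20 + 0.4 * xi + effect + rng.gauss(0, 4) for xi in x]  # dependent Y
    groups[name] = (x, y)

def within_group_sums(groups):
    """Pooled within-groups SS for X, sum of products, and SS for Y."""
    ss_x = sp = ss_y = 0.0
    for x, y in groups.values():
        mx, my = mean(x), mean(y)
        ss_x += sum((xi - mx) ** 2 for xi in x)
        ss_y += sum((yi - my) ** 2 for yi in y)
        sp += sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return ss_x, sp, ss_y

ss_x, sp, ss_y = within_group_sums(groups)
n_total = sum(len(x) for x, _ in groups.values())
a = len(groups)

b_within = sp / ss_x                             # pooled within-groups slope
ms_error_anova = ss_y / (n_total - a)            # unadjusted error term
ms_error_ancova = (ss_y - sp ** 2 / ss_x) / (n_total - a - 1)  # adjusted

# Adjust each treatment mean for its chance deviation on the covariate.
grand_x = mean(xi for x, _ in groups.values() for xi in x)
raw_means = {g: mean(y) for g, (_, y) in groups.items()}
adjusted_means = {
    g: mean(y) - b_within * (mean(x) - grand_x)
    for g, (x, y) in groups.items()
}
```

In this simulation the covariate accounts for most of the within-group variability, so the adjusted error term is far smaller than the unadjusted one, while the adjusted means differ from the raw means only by the small chance differences among the groups on the covariate.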

The Basis For Statistical Control: Linear Regression • Linear regression is a statistical technique for establishing a linear function (a straight line) relating two variables. • Let's see how the linear function is established for a single set of scores (for example, the scores of subjects in one of the treatment conditions of an experiment) and then how it is used to reduce the variability of the subjects on the dependent variable. • The Linear Regression Equation Revisited • Suppose we let X be a score on the covariate and Y the corresponding score on the dependent variable. • The sign of the slope indicates the direction of the linear relationship. • A positive slope means that the two variables change in the same direction (Y increases as X increases, or Y decreases as X decreases), whereas a negative slope means that the two variables change in opposite directions (Y increases as X decreases, or Y decreases as X increases).
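The regression equation itself does not appear in the slide text, so the standard least-squares form is written out here for reference. The symbols Y' (predicted score), a (intercept), b (slope), SP_XY (sum of products), and r_XY (correlation) are conventional notation assumed for this sketch rather than taken from the slides.

```latex
\begin{align*}
Y' &= a + bX \\
b  &= \frac{SP_{XY}}{SS_X}
    = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \\
a  &= \bar{Y} - b\bar{X} \\
SS_{\text{residual}} &= \sum (Y - Y')^2 = SS_Y \left(1 - r_{XY}^2\right)
\end{align*}
```

The last line shows how the regression reduces variability: the stronger the linear correlation between X and Y, the smaller the residual variation that remains in Y after the prediction from X is removed.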

The Covariate (X) • The main criterion for a covariate is a substantial linear correlation with the dependent variable (Y). • In most cases, the scores on the covariate are obtained before the initiation of the experimental treatment. • There may be a formal pretest of some sort administered to all potential participants in the experiment, or the scores may be available from records of the subjects. • Achievement scores, IQ determinations, and grade-point averages are common examples. • Occasionally, the scores are gathered after the experiment is completed. • Such a procedure is defensible only when it is certain that the experimental treatment did not influence the covariate. • The analysis of covariance is predicated on the assumption that the covariate is independent of the experimental treatments. • Therefore, we should carefully scrutinize any covariate that is obtained following the end of the experiment.

Assumptions Underlying the Analysis of Covariance • The assumptions underlying the analysis of variance continue to apply in the corresponding analysis of covariance. • Several additional assumptions, which apply to the analysis of covariance in particular, are concerned with the nature of the regression between the covariate and the dependent variable. • The Assumption of Linear Regression. One of the assumptions associated with the analysis of covariance is that of linear regression. • The assumption is that the deviations from regression (that is, the residual scores) are normally and independently distributed in the population, with means of zero and homogeneous variances. Since these assumptions concerning the distribution of the residuals will generally not hold if the true regression is not linear, many refer to them as an assumption of linear regression. • What this means is that if linear regression is used in the analysis, whereas the true regression is of another form (for example, curvilinear), adjustments will not be of great benefit. • More important, however, we could question the meaning of the adjusted treatment means, which are also adjusted on the assumption of linear regression.

The Assumption of Homogeneous Group Regression Coefficients. The other assumption specifies homogeneity of regression coefficients for the different treatment populations. Although not obvious from the general formulas, the within-groups regression coefficient (b_S/A), which is central to the analysis of covariance, is actually an average of the regression coefficients for each treatment group. • The current wisdom concerning this assumption suggests that the analysis of covariance is robust with regard to the homogeneity issue. • On the other hand, significant differences among the group regression coefficients mean that the effects of the independent variable must be interpreted with caution. • More specifically, what these differences mean is that an interaction is present between the subject characteristic chosen for the covariate and factor A and that we should determine how the magnitude of the effects of factor A depends on the different "levels" of this subject characteristic. It is important to note that this same interaction would be revealed in a corresponding blocking design by the treatments x blocks interaction. • If that interaction were significant, we would probably concentrate our efforts on analyzing the simple effects of the independent variable for the different levels of the blocking factor.
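The claim that the within-groups coefficient is an average of the group coefficients can be checked numerically: it is a weighted average, with each group's slope weighted by that group's sum of squares on X. The sketch below also computes the usual F ratio for homogeneity of slopes (variation among the group slopes over the residual variation around the separate regression lines). The scores are made up for illustration, and the F value would still need to be compared against a critical value with (a - 1, N - 2a) degrees of freedom.

```python
from statistics import mean

# Hypothetical (covariate X, dependent variable Y) scores for three groups.
data = {
    "a1": ([1, 2, 3, 4, 5], [2, 4, 5, 7, 8]),
    "a2": ([2, 3, 4, 5, 6], [4, 5, 7, 8, 11]),
    "a3": ([1, 3, 4, 6, 7], [5, 7, 9, 10, 13]),
}

def sums(x, y):
    """Return SS_X, SP_XY, and SS_Y for one group."""
    mx, my = mean(x), mean(y)
    ss_x = sum((xi - mx) ** 2 for xi in x)
    ss_y = sum((yi - my) ** 2 for yi in y)
    sp = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return ss_x, sp, ss_y

parts = {g: sums(x, y) for g, (x, y) in data.items()}
group_slopes = {g: sp / ss_x for g, (ss_x, sp, _) in parts.items()}

total_ss_x = sum(ss_x for ss_x, _, _ in parts.values())
total_sp = sum(sp for _, sp, _ in parts.values())
total_ss_y = sum(ss_y for _, _, ss_y in parts.values())

# Pooled within-groups slope, and the same value as a weighted average.
b_within = total_sp / total_ss_x
weighted_avg = sum(parts[g][0] * group_slopes[g] for g in parts) / total_ss_x

# Homogeneity-of-slopes F ratio.
a = len(data)
n_total = sum(len(x) for x, _ in data.values())
ss_reg_separate = sum(sp ** 2 / ss_x for ss_x, sp, _ in parts.values())
ss_between_slopes = ss_reg_separate - total_sp ** 2 / total_ss_x
ss_residual = total_ss_y - ss_reg_separate
f_slopes = (ss_between_slopes / (a - 1)) / (ss_residual / (n_total - 2 * a))

print(group_slopes, b_within, f_slopes)
```

A large F here would signal the covariate-by-treatment interaction discussed above, which is the same information a treatments x blocks interaction provides in the blocking design.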

Choosing Between Blocking and the Analysis of Covariance • I began this discussion by drawing a distinction between two general methods for increasing the precision of an experiment, namely, direct (or experimental) methods and purely statistical procedures. • As a reminder, the direct method achieves control of error variance by isolating sources normally included in the error term of the completely randomized design and removing them in the statistical analysis. • In the blocking design, we introduce a blocking factor into an experiment (for example, IQ, reading-proficiency scores, socioeconomic status, or gender), and its inclusion permits us to remove from the error term the effects of the blocking factor and its interaction with the treatment variables. • Statistical control, as exemplified by the analysis of covariance, uses regression analysis to achieve the increase in precision. • Both procedures begin at the same point, namely, with measures on the covariate, but they differ on how this information is utilized in designing the experiment and then analyzing it. • What can we say about these two options? • The blocking design is simpler conceptually and requires fewer assumptions. • On the other hand, the analysis of covariance is easier to administer because the total pool of subjects does not need to be identified and measured on the X variable before the study can begin. This may prove to be a major consideration for many researchers who schedule their subjects for individual sessions and cannot measure them on the X variable until they appear in the laboratory to serve in the experiment; for these researchers, then, the blocking design is simply not feasible. • Both methods increase precision and sensitivity, although they achieve this desirable goal differently. • In addition, both provide information on the interaction of the subject characteristic used as the X variable.
• This information is obtained directly from the analysis of the blocking design in the form of a treatments x blocks interaction. • The analogous information in the analysis of covariance is reflected in a comparison of the slopes of the individual regression lines obtained separately for each group.

Suppose your main interest in using either approach is to increase the sensitivity of your experiment and that you can use either method to accomplish this goal. • An early study by Feldt (1958) suggested that the size of the correlation between the X and Y variables was the critical factor. • In a later investigation, Maxwell, Delaney, and Dill (1984) argued that what is most crucial is the form of the relationship between the two variables. • If the regression is not linear, blocking is the preferable procedure. • In most other cases, analysis of covariance is the method of choice.

Examples …

Correlation Between Covariate and Dependent Variable

Testing the Homogeneity of Slopes (Parallelism) Assumption

A Factorial ANCOVA Example … • As an example, consider a study on performance as a function of cigarette smoking. In that study subjects performed either a Pattern Recognition task, a Cognitive task, or a Driving Simulation task. The subjects were divided into three groups. One group (Active Smoking) smoked during or just before the task. A second group (Delayed Smoking) consisted of smokers who had not smoked for three hours, and a third group (NonSmoking) was composed of NonSmokers. The dependent variable was the number of errors on the task. To make this suitable for an analysis of covariance I have added an additional (hypothetical) variable, which is the subject’s measured level of distractibility. (Higher distractibility scores indicate that a subject is more easily distracted.) • The data are presented in the next table and represent a 3 × 3 factorial design with one covariate (Distractibility).

A Factorial ANCOVA Example … Dataset

Correlation between Covariate & Dependent Variable

Notice that in this analysis we have a significant effect due to Task, which is uninteresting because the tasks were quite different and we would expect that some tasks lead to more errors than others. • We also have a Task*Group interaction, which was what we were seeking because it tells us that smoking makes a difference in certain kinds of situations (which require a lot of cognitive processing) but not in others.

Without the Covariate, notice that we did not have an overall effect due to Smoking Group. • Also, compare the MS_error values: 107.834 (factorial without the covariate) vs. 71.539 (factorial with the covariate). When we look at our analysis of covariance, one of the first things we see is that MS_error (71.539) is about one-third smaller than it was in the analysis of variance. This is because the covariate (Distractibility [DS]) was able to explain much of the variability in Errors that had been left unexplained in the analysis of variance.

We could also have calculated partial eta-squared (η²) for the effects. • Each of these effect-size measures can be calculated as the difference between two R² values (for the models with and without the effect), divided by (1 - R²_reduced).
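The R²-based calculation just described can be written as a small helper. The function name `partial_eta_squared` and the sums of squares below are hypothetical; they are not the values from the smoking example, whose full ANOVA table is not reproduced in this text. The sketch also checks that the R² form agrees with the more familiar SS_effect / (SS_effect + SS_error) form.

```python
def partial_eta_squared(r2_full, r2_reduced):
    """Share of the variance left unexplained by the reduced model
    that is accounted for by adding the effect of interest."""
    if not 0.0 <= r2_reduced <= r2_full <= 1.0 or r2_reduced == 1.0:
        raise ValueError("need 0 <= R2_reduced <= R2_full <= 1, R2_reduced < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_reduced)

# Hypothetical sums of squares, chosen only to illustrate the arithmetic.
ss_total, ss_effect, ss_error = 1000.0, 100.0, 400.0

r2_full = 1.0 - ss_error / ss_total                   # model with the effect
r2_reduced = 1.0 - (ss_error + ss_effect) / ss_total  # model without it

eta_p2 = partial_eta_squared(r2_full, r2_reduced)
eta_p2_from_ss = ss_effect / (ss_effect + ss_error)
print(round(eta_p2, 3), round(eta_p2_from_ss, 3))
```

Here the reduced model is the full model with the effect of interest removed, so its residual sum of squares is the full model's error plus the sum of squares for that effect.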

Another experimenter might be interested in examining the effects of Smoking Group only for the Cognitive task. If we want to examine these simple effects, we would again modify our error term in some way. • This is necessary because we will be looking at Smoking Groups for only some of the data, and the covariate mean of the Cognitive task subjects may differ from the covariate mean for all subjects. • Probably the safest route here would be to run a separate analysis of covariance for only those subjects performing the cognitive task.
