Comprehensive Exam Review

Research and Program Evaluation Part 5

Analyses of Differences

Recall that for purposes here, an analysis of difference involves at least one continuous variable and at least one discrete variable. In this context, the variable that is continuous is sometimes called the “dependent” variable, and the variable that is discrete is sometimes called the “independent” variable.

The purpose of the analysis is to investigate differences in the continuous variable as a function of the categories in the discrete variable. Think about this possibility for a while. Imagine that the same test was given to a group of people on many occasions, but on each occasion the test was administered, they had not taken the test previously.

Then, imagine that the mean for the test was computed for each occasion it was administered. If a graph was made of the various means for the group against the frequency of occurrence of the respective means, the result would be a normal distribution of the means (because the various factors affecting test performance would come together in different ways on different occasions).

This very special distribution is known as the theoretical distribution of sampling means. [Figure: frequency (f) plotted against the values of the means.]

This distribution represents 100% of the possible means that the group might achieve on any occasion. In other words, there is a 100% probability that the mean would fall between the outer limits of the curve.

Because this is a “normal distribution,” all of its mathematical properties are known. For example, it is symmetric about the mean and the standard area percentages under the curve are known: approximately 34%, 14%, and 2% for each successive standard deviation on either side of the mean.
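
The area percentages can be checked with a short sketch. It uses Python's standard-library `statistics.NormalDist`; the helper name `area_between` is only illustrative.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
z = NormalDist(mu=0.0, sigma=1.0)

def area_between(lower, upper):
    """Proportion of the curve lying between two z values."""
    return z.cdf(upper) - z.cdf(lower)

# Areas for each successive standard deviation on one side of the mean.
bands = [(0, 1), (1, 2), (2, 3)]
areas = [area_between(lo, hi) for lo, hi in bands]

for (lo, hi), area in zip(bands, areas):
    print(f"{lo} to {hi} SD: {area:.1%}")
```

The printed values (34.1%, 13.6%, 2.1%) round to the 34%, 14%, and 2% shown on the curve.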

This theoretical distribution can be grounded in reality if one assumption is accepted: That an observed mean (i.e., one from an actually administered test or measurement) is the mean of the theoretical distribution of sampling means. Generally, this assumption is presumed valid unless there is specific information that the assessment situation was something other than “normal.”

Once the test is given, the areas under the curve can be related to any mean if the distance between the observed mean and these points is known. [Figure: normal curve centered on the observed mean, with areas of 34%, 14%, and 2% on each side.]

These distances are known as the “Standard Error of the Mean.” A “standard error” is a standard deviation of a theoretical distribution. [Figure: normal curve centered on the observed mean, marked from -3 to +3 standard errors of the mean.]
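
As a minimal sketch, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the number of scores. The slides do not state this formula, and the scores below are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(scores):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(n)

# Hypothetical test scores for one group on one occasion.
scores = [72, 75, 78, 80, 81, 83, 85, 88, 90]
observed_mean = statistics.mean(scores)
se = standard_error_of_mean(scores)

print(f"observed mean = {observed_mean:.2f}, SE of the mean = {se:.2f}")
```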

There is approximately a two-thirds chance (68%) that the mean for the group will fall between +/- one (1) standard error of the mean on any occasion.

Similarly, the probability, or likelihood, of the mean falling between +/- two (2) standard errors of the mean on any occasion is approximately 95% (96% if the rounded area percentages are summed), and so on.

Two of the more useful statements that can be made are: There is a 95% probability that the mean will fall between +/- 1.96 standard errors of the mean on any occasion. There is a 99% probability that the mean will fall between +/- 2.58 standard errors of the mean on any occasion.

These “confidence limits” look like this on the theoretical distribution of sampling means. [Figure: 95% of the area lies within +/- 1.96 standard errors of the mean, and 99% within +/- 2.58 standard errors of the mean.]
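
The 1.96 and 2.58 multipliers can be recovered from the normal curve, as in this sketch. The function names and the observed mean and standard error are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-tailed z multiplier for a given confidence level (e.g., 0.95)."""
    tail = (1.0 - confidence) / 2.0
    return NormalDist().inv_cdf(1.0 - tail)

def confidence_limits(observed_mean, se, confidence=0.95):
    """Lower and upper confidence limits around an observed mean."""
    z = z_multiplier(confidence)
    return observed_mean - z * se, observed_mean + z * se

# Hypothetical observed mean (100) and standard error of the mean (2).
low95, high95 = confidence_limits(100.0, 2.0, 0.95)
low99, high99 = confidence_limits(100.0, 2.0, 0.99)

print(f"95% limits: {low95:.2f} to {high95:.2f}")
print(f"99% limits: {low99:.2f} to {high99:.2f}")
```

The 99% limits are wider than the 95% limits because more of the curve must be enclosed.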

Now assume a situation in which the same thing is measured (i.e., using the same test or measure) on two different occasions for the same group (a la pre-post testing). If the group did not have exactly the same mean for each testing occasion, there was a difference between the means.

That difference happened either because something caused the difference or by chance. The important question is, “What is the likelihood (i.e., probability) that the difference happened simply by chance?”

Graphically (at the .05 level), it looks like this: [Figure: 95% confidence limits at +/- 1.96 standard errors of the mean around the pre mean.] The question is, “Is the post mean inside or outside of the 95% confidence limits for the pre mean?”

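What a statistically significant difference looks like graphically: [Figure: confidence limits at +/- 1.96 standard errors of the mean around the pre mean. A post mean inside the limits is non-significant; a post mean outside the limits on either side is significant.]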
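
The graphical question can be sketched as a simple check. This follows the simplified picture above rather than a full significance test, which would work from the standard error of the difference. The function name and values are hypothetical.

```python
def differs_at_05(pre_mean, se_pre, post_mean, z=1.96):
    """True if the post mean lies outside the pre mean's 95% limits."""
    lower = pre_mean - z * se_pre
    upper = pre_mean + z * se_pre
    return post_mean < lower or post_mean > upper

# Hypothetical values: pre mean 50 with a standard error of 2,
# so the 95% limits are 46.08 and 53.92.
print(differs_at_05(50.0, 2.0, 52.0))  # False: inside the limits
print(differs_at_05(50.0, 2.0, 55.0))  # True: above the upper limit
print(differs_at_05(50.0, 2.0, 45.0))  # True: below the lower limit
```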

The t-test is a statistical significance test that covers this situation. The t-test is used to determine if there is a statistically significant difference between only two means. The t-test is traditionally recommended when data from 30 or fewer subjects are being analyzed, although it remains valid with larger samples. The t-test is sometimes referred to as the “Student’s t-test.”

There are two types of t-tests. A dependent, or correlated, t-test is used when the difference between the means of the same group assessed on two occasions is being evaluated (e.g., pre-post). An independent, or uncorrelated, t-test is used when the difference between the means of two separate groups is being evaluated (e.g., males and females).
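
The two types can be sketched with the standard textbook formulas, which the slides do not give. The independent version assumes a pooled variance, and all data are hypothetical.

```python
import math
import statistics

def dependent_t(first, second):
    """t value for the same group measured twice (paired scores)."""
    diffs = [b - a for a, b in zip(first, second)]
    se_diff = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se_diff

def independent_t(group1, group2):
    """t value for two separate groups, using a pooled variance."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group1)
        + (n2 - 1) * statistics.variance(group2)
    ) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group1) - statistics.mean(group2)) / se_diff

# Hypothetical pre-post scores for one group of five people.
pre = [10, 12, 9, 14, 11]
post = [12, 15, 10, 17, 13]
t_dep = dependent_t(pre, post)

# Hypothetical scores for two separate groups.
group_a = [1, 2, 3, 4, 5]
group_b = [3, 4, 5, 6, 7]
t_ind = independent_t(group_a, group_b)

print(f"dependent t = {t_dep:.2f}, independent t = {t_ind:.2f}")
```

The sketch stops at the t value; a statistics package would also report its probability.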

A t-test yields a statistic called a t value. Computer programs generating the t value also present the (exact) probability of obtaining a t value of that magnitude. The (exact) probability calculated for the t value is compared to the (pre-determined) alpha level for the analysis.

For the t-test, it was noted that the discrete variable (i.e., the one that has categories) is sometimes called the “independent” variable. The discrete variable is also known as a “factor.” It is important to remember that this is a different and distinct use from “factor analysis,” which was a type of analysis of relationships.

In the context of analyses of differences, a factor is a variable that is discrete (i.e., has categories) and is sometimes called the independent variable. In the context of analyses of differences, the categories of a factor are called “levels.” In some ways, again this is a poor choice of words because “levels” implies some type of hierarchy - but that’s not really what it means in this context.

Suppose “gender” as a (discrete or independent) variable is included in a study. In the study, gender would be a “factor” having two “levels” (i.e., male and female). Remember that levels = categories; no hierarchy is necessarily applicable. A t-test would be the appropriate analysis for a study having only one factor that has two levels.

The levels (categories) of the factor may be “uncorrelated” (e.g., gender) or “correlated” (e.g., pre-post). Instead of “correlated,” the phrase “repeated measures” is used to indicate that the levels of a factor are actually two or more measurements on the same group of people as part of a single research study.

Suppose that instead of viewing gender as either male or female, it was considered “sex-role orientation.” The possible categories might then be male, female, and androgynous, which would be the three levels of the “sex-role orientation” factor. Then suppose a measure of “counseling effectiveness” could be obtained for everyone in each of the three groups.

One question might then be, “Are there statistically significant differences among the counseling effectiveness means of the three groups?” Graphically, the possibilities would be: [Figure: five possible patterns of the A (androgynous), F (female), and M (male) group means.]

The appropriate analysis for this situation is a one-way analysis of variance. It’s called “one-way” because there is only one factor involved. This is one of several types of analyses of variance, all of which are abbreviated “ANOVA.”

A one-way ANOVA is appropriate when there is one factor in the study. The factor may have three or more levels. The levels may be either uncorrelated (e.g., three categories of sex-role orientation) or correlated (e.g., pre-post-follow-up for an experimental study). A one-way ANOVA yields an F statistic (or as it is more commonly known, an F value).

Theoretically, a one-way ANOVA works with a factor with as many levels as are relevant and/or desired. Computer programs generate an exact probability for the F value, which can then be compared to the alpha level. A statistically significant F value means that there is at least one statistically significant difference among the means.
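
A minimal sketch of the F value for one factor with uncorrelated levels, using the usual between-groups and within-groups mean squares. The formula is not given in the slides, a repeated-measures factor would need a different computation, and the scores are hypothetical.

```python
import statistics

def one_way_f(groups):
    """F value for one factor with uncorrelated levels (one list per level)."""
    all_scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_scores)
    k = len(groups)
    n_total = len(all_scores)

    # Variation of the level means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own level mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical counseling effectiveness scores for three levels.
male = [1, 2, 3]
female = [2, 3, 4]
androgynous = [5, 6, 7]
f_value = one_way_f([male, female, androgynous])

print(f"F = {f_value:.2f}")
```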

However, a statistically significant F value does not indicate which means are significantly different from one another. A “multiple comparison” is a statistical procedure that allows determination of which means are statistically significantly different from one another. A multiple comparison is appropriate only following a statistically significant F value.

A multiple comparison allows determination of which of these patterns exists (and more than one may apply): [Figure: five possible patterns of the A, F, and M group means.]

Multiple comparison procedures range on a continuum of “liberal” to “conservative.” The more liberal the procedure, the smaller the difference needed to be considered statistically significantly different. More conservative procedures reduce the chance for Type I error, but make it more difficult to achieve a statistically significant difference.
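
One way to see why conservative procedures exist is to count the pairwise comparisons and estimate the chance of at least one Type I error across them. This sketch assumes the comparisons are independent, which is a simplification and not something the slides state.

```python
from itertools import combinations

def pairwise_comparisons(levels):
    """Every pair of levels that could be compared."""
    return list(combinations(levels, 2))

def familywise_error(alpha, n_comparisons):
    """Chance of at least one Type I error, assuming independent comparisons."""
    return 1.0 - (1.0 - alpha) ** n_comparisons

levels = ["male", "female", "androgynous"]
pairs = pairwise_comparisons(levels)
rate = familywise_error(0.05, len(pairs))

print(f"{len(pairs)} comparisons, familywise error rate = {rate:.3f}")
```

With three comparisons at the .05 level, the overall rate is about .14 under this assumption.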

Some of the multiple comparison methods include (from liberal to conservative):
Pairwise Comparisons (t-tests)
Fisher LSD
Duncan Multiple Range Test
(Student) Newman-Keuls
Tukey HSD
Scheffe

A factorial analysis of variance (ANOVA) is appropriate when there are two or more factors, each of which has at least two levels. (Again, remember that a factorial ANOVA is not the same as factor analysis). Suppose the research question was, “What are the differences in graduate-level academic aptitude as a function of gender and race?”

The variables might be as follows: The “dependent” variable is GRE Total Score. One factor is “gender,” and it has two levels: male (M) and female (F). Another factor is “race,” and it has three levels: African-American (AA), Hispanic-American (HA), and Caucasian-American (CA).

Graphically, the research could be shown as: [Figure: table of GRE-T means with gender (M, F) as rows and race (HA, AA, CA) as columns.]

One F value would be obtained for each factor: F(gender) and F(race). These are known as the “main effects” F values. These F values are independent; the statistical significance of one is unrelated to the statistical significance of the other.

An interaction F value also would be obtained: F(gender by race). An interaction F value allows evaluation of whether the effects of one variable are consistent for all levels of the other variable. The interaction F value is independent of the other two.
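
A sketch of the three F values for a balanced two-factor design (the same number of scores in every cell), using the standard sums-of-squares partition. The formulas are not given in the slides, and the scores are small hypothetical numbers rather than realistic GRE values.

```python
import statistics

def two_way_f(cells):
    """Main-effect and interaction F values for a balanced two-factor design.

    cells[i][j] is the list of scores for level i of factor A and
    level j of factor B; every cell must hold the same number of scores.
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    mean = statistics.mean

    grand = mean([x for row in cells for cell in row for x in cell])
    a_means = [mean([x for cell in row for x in cell]) for row in cells]
    b_means = [mean([x for row in cells for x in row[j]]) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_means)
    ss_cells = n * sum((mean(cell) - grand) ** 2 for row in cells for cell in row)
    ss_ab = ss_cells - ss_a - ss_b  # what the main effects leave unexplained
    ss_within = sum((x - mean(cell)) ** 2 for row in cells for cell in row for x in cell)

    ms_within = ss_within / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_within,
        "B": (ss_b / (b - 1)) / ms_within,
        "A by B": (ss_ab / ((a - 1) * (b - 1))) / ms_within,
    }

# Hypothetical scores: rows are gender (M, F), columns are race (AA, HA, CA).
cells = [
    [[1, 3], [2, 4], [3, 5]],
    [[3, 5], [4, 6], [5, 7]],
]
f_values = two_way_f(cells)
print(f_values)
```

Here factor A stands for gender and factor B for race. The interaction F value is zero because the gender difference is the same at every level of race.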

Graphically, it all looks like this: [Figure: table of means with gender (M, F) as rows and race (HA, AA, CA) as columns, labeled with F(gender), F(race), and F(gender by race).]

Now suppose another factor is added, such as “academic degree” (Master’s, Specialist, or Doctorate). [Figure: gender (F, M) by race (AA, HA, CA) by degree (M, S, D) layout, giving 18 cells.]

There will be one F value for each factor (aka “main effects”): F(gender), F(race), and F(degree). These F values are all independent of one another. If F(race), F(degree), or both are statistically significant, a multiple comparison would be needed to determine the pattern of significant differences.

There also would be three “two-way interactions”: F(gender by race), F(race by degree), and F(degree by gender). There also would be one “three-way interaction,” which represents the combination of variables three at a time: F(gender by race by degree). These F values also are independent of all the others.
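
The full set of F values for any number of factors can be enumerated with a short sketch. The function name is hypothetical.

```python
from itertools import combinations

def anova_effects(factors):
    """List every main effect and interaction for the given factors."""
    effects = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            effects.append(" by ".join(combo))
    return effects

effects = anova_effects(["gender", "race", "degree"])
for effect in effects:
    print(f"F({effect})")
```

For three factors this lists seven F values: three main effects, three two-way interactions, and one three-way interaction.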

The t-test, one-way ANOVA, and factorial ANOVA are known as “univariate” analyses, because only one dependent variable (e.g., GRE Total score) is involved. If a second (or more) dependent variable is added, the appropriate analysis is a multivariate analysis of variance (MANOVA).

A MANOVA also yields an F value. If the F(multivariate) is NOT significant, it means that there are no significant differences anywhere among the sets of means. If the F(multivariate) is statistically significant, appropriate univariate analyses must be computed to determine which means are significantly different from one another.

Graphically, analyses of differences can be summarized as follows:

Analysis of Difference   Dependent Variables   Factors   Levels   Uncorrelated   Repeated Measures
(Student’s) t-test       1                     1         2        Yes            Yes
One-way ANOVA            1                     1         3        Yes            Yes
Factorial ANOVA          1                     2         2        Yes            Yes
MANOVA                   2                     2         2        Yes            Yes

The counts are minimums where the definitions above allow more (e.g., three or more levels for a one-way ANOVA).
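
The summary can also be read as a decision rule, sketched below. The function name is hypothetical, and the rule treats any design with two or more dependent variables as a MANOVA, following the definition given earlier.

```python
def choose_analysis(dependent_variables, factors, levels):
    """Pick an analysis of differences from the summary counts.

    levels is the number of levels of the factor with the most levels.
    """
    if dependent_variables < 1 or factors < 1 or levels < 2:
        raise ValueError("need at least 1 dependent variable, 1 factor, 2 levels")
    if dependent_variables >= 2:
        return "MANOVA"
    if factors >= 2:
        return "Factorial ANOVA"
    if levels >= 3:
        return "One-way ANOVA"
    return "t-test"

print(choose_analysis(1, 1, 2))  # t-test
print(choose_analysis(1, 1, 3))  # One-way ANOVA
print(choose_analysis(1, 2, 2))  # Factorial ANOVA
print(choose_analysis(2, 2, 2))  # MANOVA
```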

Nonparametric Statistics

So-called nonparametric statistics are used when the data are nominal or ordinal, or when the data are interval but the assumption of a normal distribution of the variable cannot be met. In general, there are nonparametric statistical analyses that “parallel” most parametric statistical analyses.
