
Research Questions, Variables, and Hypotheses



  1. Research Questions, Variables, and Hypotheses 2/21/06

  2. Review • What are research questions? • What are variables? • Definition • Function • Measurement Scale

  3. Hypotheses • OK, now that we know how to set up a research project by posing research questions and labeling variables, it is time to move on to a more formal way of structuring and interpreting research.

  4. Hypotheses Definitions • Hypotheses are predictions about the relationship among two or more variables or groups based on a theory or previous research (Pittenger, 2003) • Hypotheses are assumptions or theories that a researcher makes and tests. • Why are hypotheses important?

  5. Importance of Hypotheses • Hypotheses: • Direct our observations • Identify the variables to be examined and the data to be collected • Describe a relationship among variables • Can state that as one variable increases, the other will decrease; as one variable increases, the other will increase; and so on • Refer to populations • Hypotheses help researchers infer that the results of a sample will generalize to a population

  6. 4 Functions of Hypotheses • Hypotheses can: • Estimate Population Characteristics • Correlate Variables • Display Differences among Two or More Populations • Show Possible Cause and Effect • What research designs relate to each of these 4 functions?

  7. Symbols used in Hypotheses • M = mean • µ (mu, pronounced "mew") = population mean • Roman letters (e.g., A, B, C, D) are used to represent statistics • Greek letters (e.g., α, β) are used to represent parameters • α = significance level; probability of committing a Type I Error (α = .05) • p = probability value (p = .05) • Null Hypothesis: H0: µ1 - µ2 = 0 or H0: µ1 = µ2 • Alternative Hypothesis: H1: µ1 - µ2 ≠ 0 or H1: µ1 ≠ µ2 • Sometimes you may see it noted as HA

  8. Types of Hypotheses • Research Hypotheses • Statistical Hypotheses

  9. Research Hypotheses • Research Hypothesis: A statement of the relationship among two or more variables or groups. • It is accepted or not accepted based on resolving it against a logical alternative, the null hypothesis. • Example: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.

  10. Research Hypotheses Cont. • Research hypotheses can be stated as Directional or Non-directional. • Directional hypotheses predict the specific relationship among two or more variables or groups: • Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text. H0: µ1 ≤ µ2  H1: µ1 > µ2 • IQ scores will correlate in a positive manner with Self Esteem scores • Cats will bark less frequently than dogs

  11. Research Hypotheses Cont. • Non-Directional Hypotheses predict that there will be differences among two or more groups, but do not specify the direction of the differences • Men and Women will differ in their recall of phone numbers • The scores on the Geriatric Depression Scale will differ between people with Stroke and people with Alzheimer’s disease • IQ scores will correlate with Self Esteem scores H0:µ1 = µ2 H1:µ1 ≠ µ2

  12. Research Hypotheses Cont. • Your Turn! • Come up with some directional and non-directional research hypotheses • Share with the class

  13. Statistical Hypotheses • Statistical Hypotheses are mathematical or logical statements that help researchers interpret the results of research • Statistical hypotheses consist of the Null Hypothesis (H0), the hypothesis of no difference, and the Alternative Hypothesis (H1 or HA), which is similar in form to the research hypothesis. • Null: H0: µ1 - µ2 = 0 • Alternative: H1: µ1 - µ2 ≠ 0

  14. Written out examples: • Null: There will be no difference in the comprehensive test scores of graduate students who read the text in research methods and those who did not read their research methods text. • Alternative: Graduate students who read the text in research methods will score higher on their comprehensive exams than graduate students who did not read their research methods text.
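A minimal sketch of how this pair of hypotheses could be tested with a two-sample t-test; the scores are hypothetical and the lecture names no software, so Python and scipy are assumptions:

```python
from scipy import stats

# Hypothetical comprehensive exam scores, matching the example above
readers = [88, 92, 79, 85, 90, 94, 83, 87]      # read the research methods text
non_readers = [75, 82, 70, 78, 80, 74, 77, 81]  # did not read the text

# alternative='greater' makes this a one-tailed (directional) test;
# the default 'two-sided' would suit a non-directional hypothesis.
t_stat, p_value = stats.ttest_ind(readers, non_readers, alternative='greater')
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # reject H0 at alpha = .05 if p < .05
```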

  15. Statistical Hypotheses Cont. • Remember, and this is important: • The null hypothesis always implies that there is no relation or statistical difference between variables or groups • The alternative hypothesis implies that there is a meaningful relationship among variables or groups

  16. Testing Hypotheses • Is it possible that our observations occurred by chance?

  17. A story…(borrowed from Gonick & Smith, 1993) In the southern US, several legal cases examined racial bias in jury selection. In the 1950s and '60s, panels of jurors were drawn from a list of eligible citizens, and very few African Americans (roughly 4 out of 80) were found on the jury panels despite the fact that approximately 50% of the eligible citizens were African American. Could this be a result of pure chance?

  18. A story - continued If the selection of potential jurors was random, then the number of African Americans on an 80-person panel would be X, a binomial random variable with n = 80 trials and p = .5. This means the chance of getting a panel of 80 jurors with only four African Americans is .0000000000000000014, or 1.4 × 10⁻¹⁸. These odds are similar to getting three consecutive royal flushes in poker.
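The same probability can be checked with the binomial cumulative distribution function; a one-line sketch (Python and scipy are assumptions, not part of the original story):

```python
from scipy.stats import binom

# P(X <= 4) for a binomial random variable with n = 80 trials and p = .5
print(binom.cdf(4, n=80, p=0.5))  # ~1.4e-18
```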

  19. Testing Hypotheses Cont. • We use a variety of statistical procedures to test null hypotheses. The choice of which procedure we use depends on a variety of factors including: • the research hypothesis, • the data, • the sampling strategy, • and what we want to be able to say as a result of our testing.

  20. Types of Tests • Statistical procedures that are commonly used for hypothesis testing include: correlation, analysis of variance (ANOVA), analysis of covariance (ANCOVA), regression, multivariate analysis of variance (MANOVA), t-tests, and Chi-Square. Each of these procedures has an associated test statistic, which is used to determine significance. For example, ANOVA, ANCOVA, and regression use F statistics and their associated p-values. • Multivariate procedures, like MANOVA, use a variety of test statistics with interesting names, like Wilks' lambda. These are then related to a more common test statistic, like F. • The secret here, for the layperson, is that all test statistics are eventually related to a probability distribution and a p-value. These p-values mean the same thing across test statistics.
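A brief sketch of one of these procedures, a one-way ANOVA, producing the F statistic and p-value just described; the data and the use of scipy are illustrative assumptions:

```python
from scipy.stats import f_oneway

# Three hypothetical groups measured on the same outcome
group_a = [4, 5, 6, 5, 4]
group_b = [7, 8, 6, 9, 7]
group_c = [5, 6, 5, 7, 6]

f_stat, p_value = f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```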

  21. Error Types • In hypothesis testing, we must contend with two types of errors: Type I and Type II. • Errors are mistakes that we can make when judging the null hypothesis • Type I error is what happens when the tested hypothesis is falsely rejected. (It is when you say you found something, but that something is really an error.) A Type I error is a false positive. • Type II error is what happens when a false tested hypothesis is not rejected (Hays, 1986). (It is when you don't find something that is, in fact, there.) A Type II error is a false negative.

  22. Error Types Cont. • Alpha is the level of probability (pre-set by the researcher) that the tested hypothesis will be falsely rejected. Alpha is the pre-set risk of a Type I error. In other words, alpha is the degree of risk that you accept, in advance of conducting the study, that what you find will be an error. • Beta is the probability (often neglected by the researcher) that a false null hypothesis will not be rejected. Beta is the probability that you won’t find what you are looking for if, in fact, it is really there.
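A simulation sketch of both error rates; the group means, sample size, and the use of Python/scipy are illustrative assumptions, not figures from the lecture:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha, n, trials = 0.05, 30, 10_000

# Type I: both groups share the same true mean, so every rejection is an error.
false_pos = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0, 1, n)).pvalue < alpha
    for _ in range(trials)
)

# Type II: the true means differ (by half an SD), so every non-rejection is an error.
misses = sum(
    stats.ttest_ind(rng.normal(0, 1, n), rng.normal(0.5, 1, n)).pvalue >= alpha
    for _ in range(trials)
)

print(f"estimated alpha ~ {false_pos / trials:.3f}")  # close to the preset .05
print(f"estimated beta  ~ {misses / trials:.3f}")     # the Type II error rate
```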

  23. Error Types Cont. • The picture on the next slide, which is modified from Hays (1986) and Ferguson and Takane (1989), illustrates the relationships among Type I and Type II errors and alpha and beta.

  24. Error Types Cont. • Error Types Chart: • H0 true, H0 rejected: Type I error (probability = alpha) • H0 true, H0 not rejected: correct decision • H0 false, H0 rejected: correct decision (probability = power = 1 - beta) • H0 false, H0 not rejected: Type II error (probability = beta)

  25. Example Do we use Null Hypotheses in the real world? “Innocent until Proven Guilty”

  26. Power, Effect Size, and Measurement STATISTICAL POWER • Statistical power is “the probability of rejecting a null hypothesis that is, in fact, false” (Williams, 1986, p. 67). Put more simply, statistical power is the probability of finding relationships or differences that in fact exist (Cohen, 1988). • In our legal example, power is the probability of being found guilty if you committed a crime. In terms of beta (the probability of a Type II error, i.e., the odds of getting off despite being guilty), statistical power = 1 - beta. • So, if you have a 20 percent chance of being found not guilty even though you committed the crime, the statistical power of the judicial system is: 1 - .20 = .80.

  27. Power • Statistical power is a function of “the preset significance criterion [alpha], the reliability of sample results, and the effect size [the actual size of the difference or strength of the relationship]...” (Cohen, 1988, p. 4). • Considering the complex interrelationships among these criteria, one can say that: • The researcher can easily set alpha, but cannot easily set beta. • Alpha and beta are inversely, but not perfectly, related. • Lowering alpha increases beta and lowers the power. • Increasing alpha decreases beta and increases power.

  28. Power, Effect Size, and Measurement Cont. Statistical power is then related to: • Sample size • Effect size • Statistical design (including number of groups, 1- vs. 2-tailed tests) • Significance criteria
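A sketch tying these factors together analytically; the effect size, sample size, and the choice of statsmodels are assumptions for illustration:

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sided, two-sample t-test given the four inputs above:
# effect size, sample size per group, test design (two-tailed), and alpha.
power = TTestIndPower().solve_power(
    effect_size=0.5,   # assumed medium effect (Cohen's d)
    nobs1=30,          # assumed participants per group
    alpha=0.05,
    alternative='two-sided',
)
print(f"power = {power:.2f}")  # about .48 under these assumptions
```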

  29. Power, Effect Size, and Measurement Cont. EFFECT SIZE • Effect size (ES) refers to the amount of common variance between the independent variable(s) (IV) and the dependent variable(s) (DV), or the degree to which changes in the IV(s) result in changes in the DV(s).
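One common way to quantify effect size for a two-group mean difference is Cohen's d; the slide defines ES generally and names no specific index, so the standard formula below is an illustrative choice:

```python
import numpy as np

def cohens_d(group1, group2):
    """Mean difference divided by the pooled standard deviation."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * g1.var(ddof=1) + (n2 - 1) * g2.var(ddof=1)) / (n1 + n2 - 2)
    return (g1.mean() - g2.mean()) / np.sqrt(pooled_var)

# Hypothetical parking-ticket counts for the example on the next slide
print(round(cohens_d([6, 7, 5, 8, 6], [4, 5, 3, 5, 4]), 2))
```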

  30. Power, Effect Size, and Measurement Cont. Effect Size Cont. • For example, if I am interested in the differences between male and female UF students concerning the number of parking tickets over two years, my effect size would be the size of the difference between the means of the two groups. • Or, if I wanted to test a specific intervention for students with learning disabilities, and I had a test that I believed measured the effectiveness of my intervention, then my effect size might be the difference in test scores between an experimental group that received the intervention and a control group that did not receive the intervention.

  31. Power, Effect Size, and Measurement Cont. Effect Size Cont. • Similarly, if I wanted to examine the impact of a specific course on research anxiety, effect size could be the differences in the mean scores of research anxiety between an experimental group who completed the course and a control group who did not.

  32. Power, Effect Size, and Measurement Cont. Effect Size Cont. • Here is a large problem: Effect size depends on what measure we use to operationalize the construct. • For example, effect size depends on the net we use, the test we select, etc. Actual effect sizes may be much larger than observed effect sizes. What might be considered a moderate to large effect in a laboratory situation may appear as a small effect in the real world, where you can't control numerous sources of extraneous variance, e.g., variability in individual characteristics, treatment implementation, and environmental characteristics (Cohen, 1988). • Small effect sizes are common and should be expected in ex post facto and quasi-experimental situations (Cohen, 1988).

  33. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER • This is just a conceptual introduction. We will return to validity of measurement in a future lecture. • All research depends on an operational definition of the constructs of interest. In intervention research, the operational definitions of both the treatments and the outcomes influence effect size. As we are all aware, there are a variety of frames of reference regarding interventions and outcomes.

  34. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont. • Consider the elephant fable with the researchers who mapped different parts of the elephant. Their descriptions of the elephant differed considerably. • What we see in research depends, at least in part, on what facet(s) of the construct of interest is (are) operationalized by our outcome measure(s). It is always better to look at the construct in more than one way (more than one facet) in order to limit threats to validity from mono-operational bias. • In other words, looking at the elephant from different angles can improve the degree to which our descriptions of the elephant actually describe the elephant.

  35. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont. • Now, consider measuring the same elephant with portable X-Ray machines. Pictures of each part of the elephant are taken and then compared with each other. Not only do these pictures not resemble each other, but they also don't resemble the descriptions provided by the previous group of researchers. This chapter of the elephant fable shows how what we see is shaped by our method of observation or measurement. • Again, a researcher interested in a deeper understanding of the elephant may choose multiple methods of measurement in order to avoid threats to validity from mono-method bias.

  36. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont. • The relationship of measurement, research design, and statistical power means that large treatment effects can actually be observed as small effects. In other words, even if an intervention is very effective, measurement and design complications may make the effect appear small and thus require high statistical power for detection. • The following telescope model depicts the interrelation. The effect is obscured when we only look at part of the construct of interest. The apparent effect size is then attenuated by the extent to which our operational definitions (including our measurement techniques) do not reliably and validly capture the construct of interest (i.e., intervention effectiveness).

  37. Power, Effect Size, and Measurement Cont. • Telescope Model (diagram): the Actual Effect Size is viewed through the lenses of Measurement, Research Design, and Statistical Design, yielding the Observed Effect Size.

  38. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont. • Apparent effect size is further attenuated when research design does not fully filter out extraneous sources of variation (e.g., counselor or client differences). Violations of assumptions of statistical procedures can further attenuate effect size. Interestingly, problems in research design and statistical design can also introduce sources of Type I error (e.g., dust on the lens, or false positive results). • The relationship of effect size, measurement, and design is further complicated by the frame of reference or angle from which one approaches or operationalizes the construct, e.g., the researcher who only looks at the leg of the elephant.

  39. Power, Effect Size, and Measurement Cont. RELATIONSHIP OF MEASUREMENT, RESEARCH DESIGN, AND STATISTICAL POWER Cont. • Validity is a key element of the relationship of effect size, measurement, and design. Clearly, qualitative methods can further the valid operationalization of constructs. Multiple operational definitions and multiple methods, as recommended by Cook and Campbell (1979), can enhance the validity of research, including counseling effectiveness research.

  40. Power, Effect Size, and Measurement Cont. PRE-ANALYSIS STATISTICAL POWER • Pre-analysis statistical power estimation is a recommended technique. The following steps will allow you to consider statistical power in research planning. • Estimate effect size from past research and the type of experimental design planned. When you are unsure, underestimate effect size so that you do not overestimate power. Also, in quasi-experimental or ex post facto circumstances, it is usually best to estimate a small effect size unless otherwise indicated. • Decide on the exact statistical test and significance criterion. • Determine an acceptable level of power; .80 is desirable, but .70 may be acceptable in some circumstances.

  41. Power, Effect Size, and Measurement Cont. PRE-ANALYSIS STATISTICAL POWER Cont. • Use power tables for that statistical test or an appropriate computer program to determine the number of subjects required for the specified significance criterion and desired level of power. • If you have a fixed number of subjects, consider adjusting the significance criterion (alpha) or statistical design if necessary to obtain adequate power. • Recall, the .05 significance criterion is not sacred, especially when it results in a power of less than .30 (i.e., less than a 30% chance of finding differences that actually exist). (Szymanski & Parker, 1992)
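A sketch of the sample-size step just described; the effect size, power target, and the choice of statsmodels (standing in for printed power tables) are illustrative assumptions:

```python
from statsmodels.stats.power import TTestIndPower

# Solve for the n per group needed for a two-sided, two-sample t-test.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.3,   # conservative (smallish) estimate, per the advice above
    alpha=0.05,
    power=0.80,
    alternative='two-sided',
)
print(f"need ~{n_per_group:.0f} subjects per group")  # ~175 under these assumptions
```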

  42. Power, Effect Size, and Measurement Cont. ALPHA INFLATION • Multiple comparisons can increase alpha, the probability of a Type I error. • The probability of a Type I error escalates with the number of comparisons made in the study. The experiment-wise alpha is computed as: 1 - (1 - alpha)^n, where n is the number of comparisons.
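The formula is easy to compute directly; a tiny sketch:

```python
def experimentwise_alpha(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one Type I error across n independent comparisons."""
    return 1 - (1 - alpha) ** n_comparisons

for n in (1, 5, 10, 20):
    print(n, round(experimentwise_alpha(0.05, n), 3))
# 1 0.05, 5 0.226, 10 0.401, 20 0.642
```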

  43. Power, Effect Size, and Measurement Cont. ALPHA INFLATION • One way to guard against alpha inflation is to use a Bonferroni-type procedure and to split alpha by the number of comparisons. There are a variety of such procedures that can be used (see, e.g., Marascuilo & Serlin, 1988) according to the relative importance of the tested hypotheses. • The problem with reducing alpha is that it inflates beta. In situations in which alpha inflation is accepted due to a problem with power, one must look to replications for confidence in the findings.
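A sketch of the alpha-splitting idea, by hand and via a library routine; the p-values are hypothetical and statsmodels is an assumed tool:

```python
from statsmodels.stats.multitest import multipletests

p_values = [0.010, 0.020, 0.030, 0.040]
alpha = 0.05

# By hand: judge each test against alpha divided by the number of comparisons.
per_test_alpha = alpha / len(p_values)          # .0125
print([p < per_test_alpha for p in p_values])   # [True, False, False, False]

# The equivalent Bonferroni correction in statsmodels:
reject, p_adjusted, _, _ = multipletests(p_values, alpha=alpha, method='bonferroni')
print(list(reject), list(p_adjusted))
```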

  44. Power, Effect Size, and Measurement Cont. • See Power Tables

  45. Test Statistics, Probability, and Significance • In order to test a hypothesis, we compare the obtained value of a test statistic (e.g., the obtained F) to a critical value of the test statistic (e.g., a critical F) that is associated with the preset significance level (alpha). • If the obtained value of the test statistic is greater than the critical value, we determine that there is a significant difference or relationship.

  46. Test Statistics, Probability, and Significance Cont. • Test Statistic: The specific statistic (i.e., the tool) that is chosen to test the null hypothesis. Examples include F, t, r. • Obtained Value: The actual value obtained when applying the test statistic to the data of interest. The probability value associated with the obtained value is p. • Critical Value: The critical value of the test statistic that is associated with the chosen significance level (alpha). If the obtained value is greater than the critical value, the result is significant.

  47. Test Statistics, Probability, and Significance Cont. • Probability Value: The probability that observed relationships or differences are due to chance. • Alpha: Alpha is also known as the significance level or rejection region. It is the level of probability set by the researcher as grounds for rejection of the null hypothesis (Williams, 1986, p. 58). Alpha is the probability level associated with the critical value of the test statistic. • In other words, alpha is our predetermined risk that differences we declare to be real are actually due to chance. • Obtained Probability (p): Also known as the significance of the test statistic. It is the “probability that the data could have arisen if H0 were true” (Cohen, 1994, p. 998).

  48. Test Statistics, Probability, and Significance Cont. • Significance: What happens when the obtained probability p is less than our predetermined alpha. Significance also occurs when the obtained value of the test statistic is greater than the critical value of the test statistic. • The terms pair up as follows: • Test Statistic pairs with its Probability Value • Critical Value pairs with the Significance Level (alpha) • Obtained Value pairs with the Obtained or Actual Probability (p) • Note that larger obtained values of test statistics are generally associated with smaller values of p. • If Obtained Value > Critical Value, then Significance • If p < Alpha, then Significance
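A sketch showing both decision rules in code; the data are hypothetical and scipy is an assumed tool:

```python
from scipy import stats

group1 = [88, 92, 79, 85, 90, 94, 83, 87]
group2 = [75, 82, 70, 78, 80, 74, 77, 81]
alpha = 0.05

result = stats.ttest_ind(group1, group2)
df = len(group1) + len(group2) - 2
critical_t = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical value

# Rule 1: obtained value vs. critical value
print(f"obtained t = {result.statistic:.2f}, critical t = {critical_t:.2f}")
# Rule 2: obtained probability vs. alpha
print(f"p = {result.pvalue:.4f}, alpha = {alpha}")
print("significant" if result.pvalue < alpha else "not significant")
```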

  49. Revisiting the Bell Curve
