EVAL 6970: Experimental and Quasi-Experimental Designs

Dr. Chris L. S. Coryn, Kristin A. Hobson, Fall 2013

Agenda: Randomized experiments

Important Caveats

Caveats • Not every phenomenon of interest or value can be studied experimentally • Many variables of interest cannot be manipulated or isolated in the way experiments require • Most trait variables, including gender, race, and ethnicity, fall into this category • These types of variables can still be the subject of cause-probing studies, but they cannot be manipulated in the formal sense

Caveats • Many phenomena of interest also cannot be manipulated or isolated for ethical reasons • Investigators can’t withhold potentially effective treatments from participants • Example: the Tuskegee syphilis study • Investigators can’t assign participants to potentially harmful conditions • Physiologically: requiring participants to smoke or exposing them to a pathogen • Psychologically: the Stanford prison experiment

Theory of Random Assignment

Random Assignment • Random assignment is any procedure by which units are assigned to conditions based only on chance • Each unit has a known, nonzero probability of being assigned to each condition • This method of assignment reduces the plausibility of many alternative explanations for observed effects, particularly selection • By definition, randomization rules out selection threats; random chance cannot introduce systematic bias into the selection process • However, this works best with large samples • Random assignment distributes systematic differences (biases) equally over groups, in expectation, on every variable, whether observed or not • This is why random assignment is superior even to pretesting with statistical matching, which can equate groups only on the variables that were measured
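A minimal sketch of one chance-only procedure, complete randomization (exactly `n_treated` units are drawn at random for treatment), using Python's standard `random` module. The function name, unit labels, and seed are hypothetical choices for illustration. Repeating the draw many times shows that every unit's probability of treatment is known in advance (here 5/10) and nonzero.

```python
import random

def complete_random_assignment(unit_ids, n_treated, rng):
    """Randomly choose exactly n_treated units for treatment; the rest are controls.

    Under this procedure every unit has the same known, nonzero probability
    of treatment: n_treated / len(unit_ids).
    """
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    treated = set(shuffled[:n_treated])
    return {u: ("treatment" if u in treated else "control") for u in unit_ids}

# Hypothetical study: 10 units, 5 of which will be treated.
units = [f"unit{i}" for i in range(10)]
rng = random.Random(2013)  # fixed seed so the run is reproducible

# Repeat the assignment many times and count how often each unit is treated.
reps = 4000
treated_counts = {u: 0 for u in units}
for _ in range(reps):
    assignment = complete_random_assignment(units, 5, rng)
    for u, arm in assignment.items():
        if arm == "treatment":
            treated_counts[u] += 1

empirical_prob = {u: c / reps for u, c in treated_counts.items()}
print(empirical_prob)  # each value should be close to the known probability 0.5
```

Nothing about a unit other than chance enters the procedure, which is what keeps selection from being systematic.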

Random Assignment • Unlike other controls for validity threats (such as pretests and nonequivalent dependent variables), random assignment yields unbiased estimates of average treatment effects • Here, unbiased means that the estimate equals the true average effect in expectation: any between-group differences not caused by treatment are due solely to chance, rather than to systematic sources of error • Regression discontinuity also yields unbiased effect estimates, but randomized experiments are more flexible, and their analysis is often more straightforward
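The meaning of "unbiased" can be illustrated with a small simulation, assuming a hypothetical population whose potential outcomes are known by construction (true effect of 2.0 for every unit). Each randomization produces a difference-in-means estimate that misses the truth by chance, but the estimates average out to the true effect. The outcome scale, sample size, and seed are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(6970)

# Hypothetical potential outcomes for 100 units: y0 without treatment,
# y1 with treatment. The true average treatment effect is 2.0 by construction.
n = 100
y0 = [rng.gauss(50, 10) for _ in range(n)]
y1 = [y + 2.0 for y in y0]
true_ate = statistics.fmean(y1) - statistics.fmean(y0)

def difference_in_means(rng):
    """Randomly assign half the units to treatment and estimate the effect."""
    idx = list(range(n))
    rng.shuffle(idx)
    treated, control = idx[: n // 2], idx[n // 2 :]
    # Only one potential outcome is observed for each unit.
    return statistics.fmean(y1[i] for i in treated) - statistics.fmean(y0[i] for i in control)

estimates = [difference_in_means(rng) for _ in range(2000)]
print(f"true ATE: {true_ate:.3f}")
print(f"mean of estimates: {statistics.fmean(estimates):.3f}")
print(f"single estimates range from {min(estimates):.2f} to {max(estimates):.2f}")
```

Any single estimate can be off by a few points; unbiasedness is a property of the procedure across possible randomizations, not a guarantee about one study.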

Random Assignment versus Random Sampling • Random assignment is not the same thing as random sampling; the two procedures serve entirely different purposes • Random sampling • Places units from the population into the sample • Makes a sample more representative of the population • Strengthens external validity • Random assignment • Places units from the sample into treatment conditions • Makes groups equivalent to each other • Strengthens internal validity
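The two procedures can be sketched as two separate steps, assuming a hypothetical sampling frame of 10,000 numbered population members. `random.sample` performs the sampling step; shuffling and splitting the sample performs the assignment step.

```python
import random

rng = random.Random(42)

# Hypothetical sampling frame: 10,000 population members identified by ID.
population = list(range(10_000))

# Step 1, random SAMPLING: draws units from the population into the sample.
# This is what supports generalizing to the population (external validity).
sample = rng.sample(population, 200)

# Step 2, random ASSIGNMENT: splits the sample into conditions.
# This is what makes the groups comparable to each other (internal validity).
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:100]
control_group = shuffled[100:]

print(len(sample), len(treatment_group), len(control_group))  # 200 100 100
```

Either step can occur without the other; an experiment run on a convenience sample still uses step 2 but skips step 1.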

Why Randomization Works • It reduces the plausibility of threats to validity by distributing them randomly over conditions • It equates groups on the expected value of all variables at pretest, regardless of whether those variables are measured • It allows the selection process to be completely known and completely modeled; this property is unique to randomized experiments and regression discontinuity designs • It allows valid estimation of error variance that is orthogonal to treatment • It ensures that alternative causes are not confounded with a unit’s treatment condition
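One practical consequence of a completely known selection process is randomization inference (an exact permutation test), a standard technique not covered on this slide. Because every split of the units into groups of the given sizes was equally likely, all possible assignments can be enumerated to see how unusual the observed difference is. The scores below are hypothetical.

```python
from itertools import combinations
from statistics import mean

def randomization_test(treated, control):
    """Exact two-sided randomization test for a difference in means.

    The assignment mechanism is known: every split of the pooled units into
    groups of these sizes was equally likely. Enumerating all of them gives
    the reference distribution for the observed difference.
    """
    pooled = list(treated) + list(control)
    n_t = len(treated)
    observed = mean(treated) - mean(control)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n_t):
        chosen = set(idx)
        t = [pooled[i] for i in chosen]
        c = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = mean(t) - mean(c)
        total += 1
        if abs(diff) >= abs(observed) - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return observed, extreme / total

# Hypothetical posttest scores for three treated and three control units.
obs, p = randomization_test([8, 9, 10], [1, 2, 3])
print(f"observed difference = {obs:.1f}, p = {p:.2f}")  # observed difference = 7.0, p = 0.10
```

Of the 20 possible assignments, only the observed one and its mirror image produce a difference this large, so the two-sided p-value is 2/20.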

Why Randomization Works • Groups are equated before treatment, eliminating pretest selection differences as a plausible cause of posttest differences • The posttest of the control group serves as a very good estimate of the counterfactual for the treatment group posttest • Threats are randomly distributed over conditions, so control and treatment units have the same average characteristics in expectation • The only remaining systematic difference between conditions is treatment • Note that random assignment equates groups in expectation, not necessarily in any single study

Randomization Doesn’t Fix Everything • First and foremost, randomization works best in large samples: the smaller the sample, the more likely it is that substantial chance differences remain between groups • Attrition is the largest threat to randomized experiments (as selection is to quasi-experiments) • Attrition is often differential; there are usually differences between those who remain in a study and those who drop out • Randomization does nothing to prevent maturation effects, and it cannot prevent historical events from affecting groups (likewise, pretests can still cause a testing effect, and changes in instrumentation can still occur) • However, random assignment does reduce the likelihood that these threats are confounded with treatment effects
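The dependence on sample size can be seen in a quick simulation, assuming a hypothetical baseline covariate (mean 0, SD 1) that plays no part in the assignment, standing in for an unmeasured variable. For each sample size, the sketch averages the absolute between-group difference on that covariate over many random assignments.

```python
import random
import statistics

rng = random.Random(2013)

def mean_abs_imbalance(n, reps=500):
    """Average absolute between-group difference on a covariate under random assignment.

    The covariate (a hypothetical baseline score, mean 0, SD 1) is never used
    to assign units, mimicking an unmeasured variable.
    """
    gaps = []
    for _ in range(reps):
        covariate = [rng.gauss(0, 1) for _ in range(n)]
        idx = list(range(n))
        rng.shuffle(idx)
        half = n // 2
        treated = [covariate[i] for i in idx[:half]]
        control = [covariate[i] for i in idx[half:]]
        gaps.append(abs(statistics.fmean(treated) - statistics.fmean(control)))
    return statistics.fmean(gaps)

imbalance = {n: mean_abs_imbalance(n) for n in (10, 100, 1000)}
for n, gap in imbalance.items():
    print(f"n = {n:>4}: average |difference| = {gap:.3f} SD")
```

The typical chance imbalance shrinks roughly in proportion to 1/sqrt(n), so a 10-unit experiment can easily start with groups that differ by half a standard deviation.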

Randomization Doesn’t Fix Everything • Attrition also affects statistical power indirectly: it reduces the number of units that remain in a study, so the analyzed sample can fall below what was planned • An a priori power analysis indicates how many units are necessary to achieve a given minimum detectable effect size (MDES) • Oversampling can help avoid loss of power due to attrition • As a general guideline, recruit 25% to 50% more participants than the minimum required for adequate power, so that the expected effect can still be detected after some participants drop out
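A rough sketch of the calculation, assuming a two-group comparison of means with equal group sizes and the common normal approximation n = 2(z₁₋α/₂ + z_power)² / d² per group, where d is the standardized MDES. Python's `statistics.NormalDist` supplies the normal quantiles; the effect size and attrition rate are hypothetical inputs. t-based calculations in dedicated power software give a slightly larger number.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate units per group for a two-group comparison of means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized minimum detectable effect size.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return 2 * (z_alpha + z_power) ** 2 / effect_size ** 2

def inflate_for_attrition(n, attrition_rate):
    """Number to recruit so that about n remain after the expected attrition."""
    return n / (1 - attrition_rate)

required = n_per_group(0.5)                      # about 62.8 per group
recruit = inflate_for_attrition(required, 0.20)  # plan for 20% attrition
print(math.ceil(required), math.ceil(recruit))   # 63 79
```

Dividing by (1 − attrition rate) recruits 25% more units at 20% expected attrition and 50% more at one-third attrition, which matches the 25% to 50% guideline.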

Randomization and Units • A unit can be viewed as an opportunity to apply or withhold treatment • Units can be individuals (like people or animals) or higher-order aggregates (like families, job sites, or classrooms) • It is often easier to obtain the required power with individual units. Higher-order or nested units sometimes require larger sample sizes, because power is based on the unit of randomization • If classrooms were the unit of randomization, for instance, increasing power would require a larger number of classrooms, not merely more students per classroom
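Why the number of classrooms matters more than their size can be sketched with the Kish design effect, DEFF = 1 + (m − 1) × ICC, a standard approximation for equal-sized clusters that is not part of the slide. Here m is the number of students per classroom and ICC is the intraclass correlation; the ICC of 0.20 and the design sizes are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def effective_sample_size(n_clusters, cluster_size, icc):
    """Approximate number of independent individuals the clustered sample is worth."""
    total = n_clusters * cluster_size
    return total / design_effect(cluster_size, icc)

icc = 0.20  # hypothetical intraclass correlation among students in a classroom
for n_clusters, size in [(20, 25), (20, 50), (40, 25)]:
    ess = effective_sample_size(n_clusters, size, icc)
    print(f"{n_clusters} classrooms x {size} students: effective n = {ess:.1f}")
```

Doubling the students per classroom (20 × 50) barely raises the effective sample size, while doubling the number of classrooms (40 × 25) roughly doubles it.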

Limitations of Randomization • Randomized experiments are often considered the gold standard of cause-probing studies • However, randomized experiments are most useful for answering questions about local molar causation • The valid generalization of results from randomized experiments relies on correspondence between the units sampled and the population of interest

Basic Designs

Basic Designs • Note that none of these designs use pretests • Why would you skip pretests? • There might be a concern over sensitization (i.e., a testing effect) • Administration might be unfeasible • The variable of interest might be a constant, as in studies of mortality (all patients are alive at the start) • Why use pretests? • Pretests allow you to study attrition! • Do those who drop out of one condition differ from those who drop out of another?
[Figure: design diagrams for the basic design, two treatments, and two treatments and a control. Annotations: "Good if treatment A is known to be effective; otherwise there is no way to determine whether both were equally effective or equally ineffective." "Although groups are assumed to be equated…the problem is…"]

Basic Designs • These designs can be used for dismantling studies (studies of specific components or parts of a treatment) • They are also used for dose-response studies (differing doses of the same treatment) • Adding a pretest allows investigators to explore attrition • Pretests also increase power when used as covariates in ANCOVA
[Figure: design diagrams for the pretest-posttest design, the alternative-treatments design, and two treatments and a control]
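The power gain from using the pretest as a covariate can be sketched by fitting the same simulated experiment two ways with ordinary least squares in NumPy: posttest regressed on treatment alone, and posttest regressed on treatment plus pretest (ANCOVA). The data-generating model (pretest slope 0.7, a 3-point effect, noise SD 7) is a hypothetical assumption; the point is the smaller standard error of the treatment effect, which translates into more power.

```python
import numpy as np

def ols_effect_and_se(X, y):
    """OLS fit; returns the coefficient and standard error for column 1 (treatment)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)           # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # classical OLS covariance matrix
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(6970)
n = 200
pretest = rng.normal(50, 10, n)
treat = rng.permutation(np.repeat([0.0, 1.0], n // 2))  # random assignment, 100 per group
# Hypothetical model: posttest depends on pretest plus a 3-point treatment effect.
posttest = 0.7 * pretest + 3.0 * treat + rng.normal(0, 7, n)

ones = np.ones(n)
est_post, se_post = ols_effect_and_se(np.column_stack([ones, treat]), posttest)
est_anc, se_anc = ols_effect_and_se(np.column_stack([ones, treat, pretest]), posttest)
print(f"posttest only: effect = {est_post:.2f}, SE = {se_post:.2f}")
print(f"ANCOVA:        effect = {est_anc:.2f}, SE = {se_anc:.2f}")
```

Both estimates are unbiased because assignment was random; the covariate only removes outcome variance that the pretest explains, which is why the ANCOVA standard error is smaller.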

Factorial Designs • In a factorial design, two or more independent variables (factors) are investigated concurrently • Each factor must have at least 2 levels (treatment/control, low dose/high dose, etc.) • The number of factors and levels within factors determine the number of cells in the design • There are 4 cells in a 2 x 2 factorial design • There are 8 cells in a 2 x 2 x 2 design • There are 12 cells in a 3 x 2 x 2 design • The main advantage of factorial designs is that the joint contribution of two or more independent variables can be simultaneously studied (rather than requiring two or more separate studies)

Basic Factorial Design • A basic 2 x 2 factorial design • Factor A (Level 1 and Level 2) • Factor B (Level 1 and Level 2) • Results in four cells: A1B1, A1B2, A2B1, and A2B2

Notation for Factorial Designs • In the notation 2 x 3 x 4, the number of numbers (three) is the number of factors in the design • The numbers themselves (2, 3, and 4) indicate the number of levels in each factor
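The notation rules can be sketched in a few lines: the count of numbers gives the number of factors, and their product gives the number of cells. The helper names and the A, B, C / 1, 2, 3 labeling scheme are hypothetical conveniences that mirror the cell labels used for the 2 x 2 design.

```python
from itertools import product
from math import prod
from string import ascii_uppercase

def parse_design(notation):
    """Turn notation like '2 x 3 x 4' into a list of levels per factor."""
    return [int(part) for part in notation.lower().split("x")]

def cell_labels(levels):
    """Label every cell, naming factors A, B, C, ... and their levels 1, 2, ..."""
    names = ascii_uppercase[: len(levels)]
    combos = product(*(range(1, k + 1) for k in levels))
    return ["".join(f"{name}{lvl}" for name, lvl in zip(names, combo)) for combo in combos]

for notation in ["2 x 2", "2 x 2 x 2", "3 x 2 x 2", "2 x 3 x 4"]:
    levels = parse_design(notation)
    print(f"{notation}: {len(levels)} factors, {prod(levels)} cells")

print(cell_labels([2, 2]))  # ['A1B1', 'A1B2', 'A2B1', 'A2B2']
```

The loop prints 4, 8, 12, and 24 cells, matching the cell counts for the 2 x 2, 2 x 2 x 2, and 3 x 2 x 2 designs and extending them to 2 x 3 x 4.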

Main Effects and Interactions • Factorial designs involve both main effects and interactions • In a 2 x 2 design there are two main effects (one for Factor A and one for Factor B) and one interaction (Factor A x Factor B) • Main effects reflect the separate treatment effect of one independent variable (i.e., factor), averaged over the levels of the other independent variables • Interactions occur when the effect of one factor is not constant, but varies over the levels of another factor • A factor that changes the effect of another factor in this way is sometimes referred to as a moderator
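The definitions can be made concrete with a small calculation on hypothetical cell means from a 2 x 2 design. Each main effect averages over the levels of the other factor. The interaction is computed here as the difference between the simple effects of A at the two levels of B; some texts scale this contrast differently (for example, halving it).

```python
def two_by_two_effects(means):
    """Main effects and interaction from the four cell means of a 2 x 2 design.

    `means` maps the cell labels 'A1B1', 'A1B2', 'A2B1', 'A2B2' to cell means.
    """
    a1 = (means["A1B1"] + means["A1B2"]) / 2   # level 1 of A, averaged over B
    a2 = (means["A2B1"] + means["A2B2"]) / 2   # level 2 of A, averaged over B
    b1 = (means["A1B1"] + means["A2B1"]) / 2   # level 1 of B, averaged over A
    b2 = (means["A1B2"] + means["A2B2"]) / 2   # level 2 of B, averaged over A
    effect_a_at_b1 = means["A2B1"] - means["A1B1"]  # simple effect of A at B1
    effect_a_at_b2 = means["A2B2"] - means["A1B2"]  # simple effect of A at B2
    return {
        "main_A": a2 - a1,
        "main_B": b2 - b1,
        "interaction": effect_a_at_b2 - effect_a_at_b1,
    }

# Hypothetical cell means on an outcome measure.
cells = {"A1B1": 10.0, "A1B2": 12.0, "A2B1": 14.0, "A2B2": 20.0}
print(two_by_two_effects(cells))
# {'main_A': 6.0, 'main_B': 4.0, 'interaction': 4.0}
```

Here the effect of A is 4 points at B1 but 8 points at B2, so it is not constant across levels of B: B moderates the effect of A. If the two simple effects were equal, the interaction term would be 0.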

Example • As an example, consider a 2 x 3 factorial design • Factor A is Gender • There are 2 levels: male and female • Factor B is Age • There are 3 levels: young, middle-aged, and old • The outcome variable is performance on a mathematical aptitude test • Is this a randomized experiment? What would random assignment of participants to conditions look like? • If the performance of the male group differs as a function of age (for example, males perform worse as age increases), but the performance of the female group is consistent across age groups, then there is an Age x Gender interaction

Longitudinal Designs • Allows investigators to study how effects change over time • Repeated observations add power when sample sizes are small • However: • Attrition is a serious problem • It can be unethical to withhold effective treatment for a long period of time • Similar to a time series, but with fewer pretest and posttest observations • Can be used to study different outcomes over time that are causally related, for example: aspirations → expectations → achievement → educational success → quality of life (e.g., income, status)

Crossover Designs • Allows counterbalancing and assessment of order effects • The effects of the first treatment must dissipate before the next one begins (otherwise, carryover from earlier treatments is confounded with the effects of later ones) • This is essentially a variation of the factorial design • A variant of the Latin squares design, in which every unit receives every treatment and all possible treatment orders are used, in a within-subjects design • After the first posttest, units cross over to receive the treatment they did not previously get • If there were three treatment conditions (A, B, and C), there would be 6 possible orders (ABC, ACB, BAC, BCA, CAB, and CBA), so subjects would be divided into 6 groups
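The order groups for a three-treatment crossover can be generated with `itertools.permutations`, and units can then be randomly assigned to those groups. For comparison, the sketch also builds a cyclic Latin square, which uses only 3 of the 6 orders while still placing each treatment once in each position. The subject labels, group sizes, and seed are hypothetical.

```python
import random
from itertools import permutations

treatments = ["A", "B", "C"]

# Complete crossover: every possible order of the treatments.
all_orders = ["".join(p) for p in permutations(treatments)]
print(all_orders)  # ['ABC', 'ACB', 'BAC', 'BCA', 'CAB', 'CBA']

# Cyclic Latin square: each treatment appears once in each position (3 orders).
latin_square = ["".join(treatments[i:] + treatments[:i]) for i in range(len(treatments))]
print(latin_square)  # ['ABC', 'BCA', 'CAB']

def assign_orders(units, orders, rng):
    """Randomly assign units to order groups with (near-)equal group sizes."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: orders[i % len(orders)] for i, unit in enumerate(shuffled)}

rng = random.Random(7)
assignment = assign_orders([f"subject{i}" for i in range(12)], all_orders, rng)
print(assignment)  # 12 hypothetical subjects, 2 per order group
```

Shuffling before dealing units into groups in turn keeps the group sizes equal while leaving each unit's order to chance.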

Factors Conducive to Randomized Designs • Demand for a treatment outstrips the supply • An innovation cannot be delivered to all units at once • Experimental units can be temporally isolated • Experimental units are spatially or geographically separated, or communication between units is otherwise low • Change is mandated, but the quality or effectiveness of solutions is unknown • A tie can be broken, or ambiguity about need can be resolved • Some persons (participants) express no preference among alternatives • Investigators can create their own organization • Investigators have control over experimental units • Lotteries are an expected part of how treatment is allocated

Inhibiting Factors • Randomized experiments take a lot of time, in both design and execution (a time frame of several years from conceptualization to results is not unusual) • Policymakers and other stakeholders often need answers now • Randomized experiments provide very precise and valid answers about whether a treatment is effective, but at substantial cost • Policymakers and other stakeholders may not need such precise answers • Randomized experiments can answer only a fairly narrow set of questions, and the investigator must be able to actively manipulate treatment • Many questions of interest to policymakers and decision makers are not causal, or involve variables that cannot be manipulated

Inhibiting Factors • Before a randomized experiment is conducted, investigators must demonstrate (have evidence for, have a reasonable expectation of) all of the following: • Present conditions need improvement • The proposed improvement is of unclear value, or there are several changes whose relationship is unclear • The results of the experiment would clarify the situation • The results would be used to change the policy or practice relating to present conditions • The rights of participants will be protected throughout the process
