EVAL 6970: Experimental and Quasi-Experimental Designs Dr. Chris L. S. Coryn Dr. Anne Cullen Spring 2012
Agenda Statistical power/design sensitivity Statistical conclusion validity and internal validity
Types of Hypotheses • General forms: • Superiority • Nondirectional or directional • Equivalence and noninferiority • Within a prespecified bound
Type I Error • Type I error (sometimes referred to as a false-positive) is the conditional prior probability of rejecting H0 when it is true, where this probability is typically expressed as alpha (α) • Alpha is a prior probability because it is specified prior to data gathering, and it is a conditional probability because H0 is assumed to be true and can be expressed as α = p (Reject H0 | H0 true)
Type II Error • Power is the conditional prior probability of making the correct decision to reject H0 when it is actually false, where Power = p (Reject H0 | H0 false) • Type II error (often referred to as a false-negative) occurs when the sample result leads to the failure to reject H0 when it is actually false, and it also is a conditional prior probability, where β = p (Fail to reject H0 | H0 false)
Type II Error • Because power and β are complementary, Power + β = 1.00 • Whatever increases power decreases the probability of a Type II error and vice versa • Several factors affect statistical power, including α levels, sample size, score reliability, design elements (e.g., within-subject designs, covariates), and the magnitude of an effect in a population
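A minimal sketch of the Power + β = 1.00 relationship, assuming Python with the statsmodels package; the effect size, group size, and alpha level below are illustrative choices, not values from the slides.

```python
# Illustrative only: power and Type II error (beta) for an
# independent-samples t-test with assumed design values.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
power = analysis.power(effect_size=0.5,   # assumed Cohen's d
                       nobs1=64,          # assumed participants per group
                       alpha=0.05,
                       alternative="two-sided")
beta = 1.0 - power                        # Power + beta = 1.00

print(f"Power = {power:.3f}, Type II error (beta) = {beta:.3f}")
```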
Determinants of Power • Four primary factors (there are others) that affect design sensitivity/statistical power • Sample size • Alpha level • Statistical tests • Effect size • Lowering α, for example, reduces the likelihood of a Type I error but also reduces statistical power, which simultaneously increases the probability of a Type II error
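To make three of these determinants concrete, a hedged sketch (same assumed statsmodels tooling, hypothetical values) computes power across a small grid of effect sizes, sample sizes, and alpha levels; the fourth determinant, the statistical test itself, is held fixed as an independent-samples t-test.

```python
# Illustrative grid: power as effect size, sample size per group, and alpha
# vary, with the test fixed as an independent-samples t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):               # effect size (Cohen's d)
    for n in (25, 100):                 # sample size per group
        for alpha in (0.01, 0.05):      # alpha level
            power = analysis.power(effect_size=d, nobs1=n, alpha=alpha)
            print(f"d={d:.1f}, n={n:>3}, alpha={alpha:.2f} -> power={power:.3f}")
```

Within any fixed effect size and sample size, lowering alpha from 0.05 to 0.01 reduces power, which is the trade-off noted in the last bullet above.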
Sample Size Statistical significance testing is concerned with sampling error, the discrepancy between sample values and population parameters. Sampling error is smaller for larger samples and is therefore less likely to obscure real differences, which increases statistical power
Alpha Alpha levels influence the likelihood of statistical significance. Larger alpha levels make significance easier to attain than smaller levels. When the null hypothesis is false, statistical power increases as alpha increases
Statistical Tests Tests of statistical significance are made within the framework of particular statistical tests. The test itself is one of the factors affecting statistical power. Some tests are more sensitive than others (e.g., analysis of covariance)
Effect Size The larger the true effect, the greater the probability of statistical significance and the greater the statistical power
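As an illustration of what the effect size refers to here, a small sketch with hypothetical data (plain NumPy; nothing below is from the slides) computes Cohen's d as a standardized mean difference using the pooled standard deviation.

```python
# Hypothetical data: Cohen's d for a treatment-control comparison.
import numpy as np

treatment = np.array([12.1, 14.3, 13.8, 15.0, 12.9, 14.7])
control = np.array([11.0, 12.2, 11.8, 12.5, 11.4, 12.9])

n1, n2 = len(treatment), len(control)
s_pooled = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                    (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / s_pooled
print(f"Cohen's d = {d:.2f}")  # larger true d -> greater power, all else equal
```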
Basic Approaches to Power • Power determination approach (post hoc) • Begins with an assumption about an effect size • The aim is to determine the power to detect that effect size with a given sample size • Effect size approach (a priori) • Begins with a prespecified level of power (and sample size) and estimates the minimum detectable effect size (MDES)
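A sketch of the two approaches under the same assumed tooling (statsmodels) with illustrative numbers: the post hoc approach fixes the effect size and sample size and asks what power results, while the a priori approach fixes the desired power (along with sample size and alpha) and solves for the MDES.

```python
# Illustrative: post hoc power determination vs. a priori MDES estimation.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Post hoc: assume d = 0.4 with 50 participants per group; what power results?
post_hoc_power = analysis.power(effect_size=0.4, nobs1=50, alpha=0.05)

# A priori: require power of 0.80 with 50 per group and alpha = 0.05;
# solve for the smallest detectable effect size (MDES).
mdes = analysis.solve_power(effect_size=None, nobs1=50, alpha=0.05, power=0.80)

print(f"Post hoc power = {post_hoc_power:.2f}")
print(f"MDES (Cohen's d) at 80% power = {mdes:.2f}")
```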
Construct Validity The degree to which inferences are warranted from the observed persons, settings, treatments, and outcome (cause-effect) operations sampled within a study to the constructs that these samples represent
Construct Validity Most constructs of interest do not have natural units of measurement. Nearly all empirical studies are studies of specific instances of persons, settings, treatments, and outcomes and require inferences to the higher order constructs represented by sampled instances
Why Construct Inferences are a Problem Names reflect category memberships that have implications about relationships to other concepts, theories, and uses (i.e., a nomological network). In the social sciences it is nearly impossible to establish a one-to-one relationship between the operations of a study and corresponding constructs
Why Construct Inferences are a Problem • Construct validity is fostered by: • Clear explication of person, treatment, setting, and outcome constructs of interest • Careful selection of instances that match constructs • Assessment of match between instances and constructs • Revision of construct descriptions (if necessary)
Assessment of Sampling Particulars All sampled instances of persons, settings, treatments, and outcomes should be carefully assessed using whatever methods (e.g., quantitative or qualitative) are necessary to ensure a match between higher order constructs and sampled instances (i.e., careful explication)
A Note about “Operations” • To operationalize is to define a concept or variable in such a way that it can be measured or manipulated (i.e., operated on) • An operational definition is a description of the way a variable will be observed and measured • It specifies the actions [operations] that will be taken to measure a variable
Threats to Construct Validity
• Inadequate explication of constructs. Failure to adequately explicate a construct may lead to incorrect inferences about the relationship between operation and construct
• Construct confounding. Operations usually involve more than one construct, and failure to describe all constructs may result in incomplete construct inferences
• Mono-operation bias. Any one operationalization of a construct both underrepresents the construct of interest and measures irrelevant constructs, complicating inferences
• Mono-method bias. When all operationalizations use the same method (e.g., self-report), that method is part of the construct actually studied
• Confounding constructs with levels of constructs. Inferences about the constructs that best represent study operations may fail to describe the limited levels of the construct studied
Threats to Construct Validity
• Treatment-sensitive factorial structure. The structure of a measure may change as a result of treatment, change that may be hidden if the same scoring is always used
• Reactive self-report changes. Self-reports can be affected by participants' motivation to be in a treatment condition, motivation that can change after assignment has been made
• Reactivity to the experimental situation. Participant responses reflect not just treatments and measures but also participants' perceptions of the experimental situation, and those perceptions are actually part of the treatment construct
• Experimenter expectancies. The experimenter can influence participant responses by conveying expectations about desirable responses, and those responses are part of the treatment construct
• Novelty and disruption effects. Participants may respond unusually well to a novel innovation or unusually poorly to one that disrupts their routine, a response that must then be included as part of the treatment construct definition
Threats to Construct Validity
• Compensatory equalization. When treatment provides desirable goods or services, administrators, staff, or constituents may provide compensatory goods or services to those not receiving treatment, and this action must be included as part of the treatment construct description
• Compensatory rivalry. Participants not receiving treatment may be motivated to show they can do as well as those receiving treatment, and this must be included as part of the treatment construct
• Resentful demoralization. Participants not receiving a desirable treatment may be so resentful or demoralized that they respond more negatively than otherwise, and this must be included as part of the treatment construct
• Treatment diffusion. Participants may receive services from a condition to which they were not assigned, making construct definitions of both conditions difficult
External Validity The degree to which inferences are warranted about whether a causal relationship holds over variations in persons, settings, treatments, and outcomes
External Validity
• Inferences to (1) those who were in an experiment or (2) those who were not
• Narrow to broad
• Broad to narrow
• At a similar level
• To a similar or different kind
• Random sample to population members
Threats to External Validity
• Interaction of the causal relationship with units. An effect found with certain kinds of units might not hold if other kinds of units had been studied
• Interaction of the causal relationship over treatment variations. An effect found with one treatment variation might not hold with other variations of the treatment, when that treatment is combined with other treatments, or when only part of a treatment is used
• Interaction of the causal relationship with outcomes. An effect found on one kind of outcome observation may not hold if other outcome observations were used
• Interaction of the causal relationship with settings. An effect found in one kind of setting may not hold in other settings
• Context-dependent mediation. An explanatory mediator of a causal relationship in one context may not mediate in another
Constancy of Effect Size versus Constancy of Causal Direction Arguably, few causal relationships in the social world have consistent effect sizes. A better method of generalization is constancy of causal direction
Random Sampling and External Validity Random sampling has benefits for external validity, but poses practical limitations in experiments. Random samples of persons are not common in experiments, but are sometimes feasible. Random samples of settings are rare, but are increasing with the advent of place-based experiments. Random samples of treatments and outcomes are rarer still
The Relationship Between Construct Validity and External Validity • Both are generalizations • Valid knowledge of constructs can provide valuable knowledge about external validity • They differ in the kinds of inferences being made • Construct validity: inferences from sampled instances to the constructs they represent • External validity: inferences about whether the size or direction of a causal relationship holds over variations in persons, settings, treatments, and outcomes • Can be right about one and not the other