Overlooking Stimulus Variance
Jake Westfall, University of Colorado Boulder
Charles M. Judd, University of Colorado Boulder
David A. Kenny, University of Connecticut
Cornfield & Tukey (1956): "The two spans of the bridge of inference"
The bridge of inference (figure, built up over several slides):
• My actual samples: 50 University of Colorado undergraduates; 40 positively/negatively valenced English adjectives
• The "statistical span" crosses from these samples to all potentially sampled participants/stimuli: all CU undergraduates taking Psych 101 in Spring 2014; all short, common, strongly valenced English adjectives
• The "subject-matter span" crosses from there to the ultimate targets of generalization: all healthy, Western adults; all non-neutral visual stimuli
Difficulties crossing the statistical span
• Failure to account for the uncertainty associated with stimulus sampling (i.e., treating stimuli as fixed rather than random) leads to biased, overconfident estimates of effects
• The pervasive failure to model stimulus as a random factor is probably responsible for many failures to replicate when future studies use different stimulus samples
Doing the correct analysis is easy!
• Modern statistical procedures solve the statistical problem of stimulus sampling
• These linear mixed models with crossed random effects are easy to apply and are already widely available in major statistical packages
  • R, SAS, SPSS, Stata, etc.
Illustrative Design
• Participants crossed with Stimuli
  • Each Participant responds to each Stimulus
• Stimuli nested under Condition
  • Each Stimulus is always in either Condition A or Condition B
• Participants crossed with Condition
  • Participants make responses under both Conditions
(The slide shows a sample of the hypothetical dataset; a sketch of this layout follows.)
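A minimal sketch of the data layout just described, assuming a toy version with 4 participants and 4 stimuli (all names and values are hypothetical, in base R):
  # participants crossed with stimuli: every participant rates every stimulus
  participants <- paste0("P", 1:4)
  stimuli <- data.frame(stimulus  = paste0("S", 1:4),
                        condition = rep(c("A", "B"), each = 2))  # stimuli nested under condition
  dat <- merge(expand.grid(participant = participants, stimulus = stimuli$stimulus), stimuli)
  dat$y <- round(runif(nrow(dat), 1, 9), 2)  # placeholder ratings on an assumed 1-9 scale
  head(dat)  # one row per participant-stimulus pair; each participant appears in both conditions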
Typical repeated measures analyses (RM-ANOVA)
• "By-participant analysis": each participant's responses are aggregated into a single mean per condition
• How variable are the stimulus ratings around each of the participant means? That variance is lost due to the aggregation
Typical repeated measures analyses (RM-ANOVA)
• "By-stimulus analysis": each stimulus's responses are aggregated into a single stimulus mean, and the means of the Condition A stimuli are compared with those of the Condition B stimuli as two independent samples ("Sample 1 vs. Sample 2")
(The slide shows the resulting stimulus means; a sketch of both aggregations follows.)
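A minimal sketch of both aggregations, continuing the hypothetical dat data frame from the design sketch above (paired and independent t-tests stand in for the corresponding one-factor ANOVAs; the data are toy, so only the syntax matters):
  # by-participant analysis: one mean per participant per condition; participants are the only random factor
  pmeans <- aggregate(y ~ participant + condition, data = dat, FUN = mean)
  pmeans <- pmeans[order(pmeans$condition, pmeans$participant), ]
  t.test(pmeans$y[pmeans$condition == "A"], pmeans$y[pmeans$condition == "B"], paired = TRUE)
  # by-stimulus analysis: one mean per stimulus; stimuli are the only random factor ("Sample 1 vs. Sample 2")
  smeans <- aggregate(y ~ stimulus + condition, data = dat, FUN = mean)
  t.test(y ~ condition, data = smeans)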
Simulation of type 1 error rates for typical RM-ANOVA analyses
• Design is the same as previously discussed
• Draw random samples of participants and stimuli
• Variance components = 4, Error variance = 16
• Number of participants = 10, 30, 50, 70, 90
• Number of stimuli = 10, 30, 50, 70, 90
• Conducted both by-participant and by-stimulus analyses on each simulated dataset
• True Condition effect = 0
(A sketch of one cell of this simulation appears after this list.)
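A sketch of one cell of such a simulation (10 participants, 10 stimuli, 1000 replications), assuming the variance components of 4 refer to participant intercepts, participant slopes, and stimulus intercepts; the slide's exact generating model may differ:
  set.seed(1)
  n_p <- 10; n_s <- 10; nsim <- 1000
  hits_p <- hits_s <- 0
  for (k in 1:nsim) {
    d <- expand.grid(p = 1:n_p, s = 1:n_s)               # participants crossed with stimuli
    d$c <- rep(c(-0.5, 0.5), each = n_s / 2)[d$s]        # stimuli nested under condition
    p_int <- rnorm(n_p, 0, 2)                            # participant intercepts, variance 4
    p_slp <- rnorm(n_p, 0, 2)                            # participant slopes, variance 4
    s_int <- rnorm(n_s, 0, 2)                            # stimulus intercepts, variance 4
    d$y <- p_int[d$p] + p_slp[d$p] * d$c + s_int[d$s] + rnorm(nrow(d), 0, 4)  # error variance 16, true effect 0
    pm <- aggregate(y ~ p + c, data = d, FUN = mean)     # by-participant analysis
    pm <- pm[order(pm$c, pm$p), ]
    hits_p <- hits_p + (t.test(pm$y[pm$c > 0], pm$y[pm$c < 0], paired = TRUE)$p.value < .05)
    sm <- aggregate(y ~ s + c, data = d, FUN = mean)     # by-stimulus analysis
    hits_s <- hits_s + (t.test(y ~ c, data = sm)$p.value < .05)
  }
  c(by_participant = hits_p / nsim, by_stimulus = hits_s / nsim)  # empirical type 1 error rates
Under these assumed variance components, both rates should land well above the nominal .05, illustrating the positive bias summarized on the next slide.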
Type 1 error rate simulation results
• The exact simulated error rates depend on the variance components, which, although realistic, were ultimately arbitrary
• The main points to take away here are:
  • The standard analyses will virtually always show some degree of positive bias
  • In some (entirely realistic) cases, this bias can be extreme
  • The degree of bias depends in a predictable way on the design of the experiment (e.g., the sample sizes)
The old solution: Quasi-F statistics
• Although quasi-Fs successfully address the statistical problem, they suffer from a variety of limitations:
  • Require a complete orthogonal design (balanced factors)
  • No missing data
  • No continuous covariates
  • A different quasi-F must be derived (often laboriously) for each new experimental design
  • Not widely implemented in major statistical packages
(One example quasi-F statistic is sketched below.)
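For concreteness, one widely cited member of this family (not necessarily the variant the authors have in mind) is Clark's (1973) min F′, computed from the by-participant statistic $F_1$ and the by-stimulus statistic $F_2$:
  $\min F' = \dfrac{F_1 F_2}{F_1 + F_2}$
with denominator degrees of freedom approximated from the error degrees of freedom of the two component tests; it serves as a conservative lower bound on the full quasi-F.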
The new solution: Mixed models
• Known variously as:
  • Mixed-effects models, multilevel models, random effects models, hierarchical linear models, etc.
• Most psychologists are familiar with mixed models for hierarchical random factors
  • E.g., students nested in classrooms
• Less well known is that mixed models can also easily accommodate designs with crossed random factors
  • E.g., participants crossed with stimuli (the formula syntax for both cases is sketched below)
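A sketch of the distinction in lme4 formula syntax (the nested example uses hypothetical names and is left as a comment; the crossed example can be fit to the toy dat built earlier):
  library(lme4)
  # hierarchical (nested) random factors, e.g. students nested in classrooms:
  #   lmer(score ~ treatment + (1 | classroom/student), data = schooldata)   # hypothetical data frame
  # crossed random factors, e.g. participants crossed with stimuli:
  m <- lmer(y ~ condition + (1 | participant) + (1 | stimulus), data = dat)  # toy data; fit may be singular
  summary(m)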
Participant Means (figure: example values for individual participants)
Participant Slopes (figure: example values for individual participants)
The linear mixed-effects model with crossed random effects
(The slide displays the model equation with its fixed effects and random effects labeled; a sketch of the equation follows.)
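A sketch of the model implied by the syntax on the next slide, with participants indexed by i and stimuli by j (the notation is assumed, not taken from the slide):
  $y_{ij} = \beta_0 + \beta_1 c_j + u_{0i} + u_{1i} c_j + w_{0j} + e_{ij}$
Fixed effects: the intercept $\beta_0$ and the condition effect $\beta_1$. Random effects: by-participant intercepts and slopes $(u_{0i}, u_{1i}) \sim N(0, \Sigma_u)$, by-stimulus intercepts $w_{0j} \sim N(0, \sigma_w^2)$, and residual error $e_{ij} \sim N(0, \sigma_e^2)$.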
Fitting mixed models is easy: Sample syntax

R:
  library(lme4)
  model <- lmer(y ~ c + (1 | j) + (c | i))

SAS:
  proc mixed covtest;
    class i j;
    model y = c / solution;
    random intercept c / sub=i type=un;
    random intercept / sub=j;
  run;

SPSS:
  MIXED y WITH c
    /FIXED=c
    /PRINT=SOLUTION TESTCOV
    /RANDOM=INTERCEPT c | SUBJECT(i) COVTYPE(UN)
    /RANDOM=INTERCEPT | SUBJECT(j).
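As a usage sketch, the R syntax above applied to the toy dat built earlier (column names adapted; lmerTest is an optional add-on for Satterthwaite tests of the fixed effects, not something the slide prescribes):
  library(lmerTest)  # loads lme4 and adds denominator-df approximations for the fixed effects
  model <- lmer(y ~ condition + (1 | stimulus) + (condition | participant), data = dat)
  summary(model)     # with such tiny toy data the random-effects fit may be singular; the syntax is the point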
Mixed models successfully maintain the nominal type 1 error rate (α = .05)
Conclusion
• Stimulus variation is a generalizability issue
• The conclusions we draw in the Discussion sections of our papers ought to be in line with the assumptions of the statistical methods we use
• Mixed models with crossed random effects allow us to generalize across both participants and stimuli
The end
Further reading:
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54-69.