Learn how to conduct a Bayesian analysis of visual search data using the brms package in R, with step-by-step instructions and explanations.
Let’s do a Bayesian analysis • Greg Francis • PSY 626: Bayesian Statistics for Psychological Science • Fall 2018 • Purdue University
Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • https://coglab.cengage.com/?labname=visual_search • Log in with ID=GregTest-145, password=12345678
Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape)
Visual Search • A classic experiment in perception/attention involves visual search • Respond as quickly as possible whether an image contains a target (a green circle) or not • Vary number of distractors: 4, 16, 32, 64 • Vary type of distractors: feature (different color), conjunctive (different color or shape) • Measure reaction time: time between onset of image and participant’s response • 5 trials for each of 16 conditions: 4 distractor counts x 2 target conditions (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials
Visual Search • Typical results: For conjunctive distractors, response time increases with the number of distractors
Linear model • Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation • Let’s do it for a single participant • We are basically going through Section 4.4 of the text, but using a new data set and using brms instead of rethinking • Download files from the class web site and follow along in class
Read in data • rm(list=ls(all=TRUE)) # clear all variables • Clear old variables out from R’s memory • graphics.off() • Remove old graphics from the display • VSdata<-read.csv(file="VisualSearch.csv",header=TRUE,stringsAsFactors=FALSE) • Reads in the contents of the file VisualSearch.csv • Uses the contents of the first row as a header, and creates variables by the names of those headers (no spaces in the header names!) • VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction") • Creates a new variable that contains just the data for a particular Participant • Only includes data from the Target absent trials • Only includes data for the Conjunction distractors • Check the contents of VSdata2 in the R console • head(VSdata2)
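Putting the steps above into one script gives something like the following sketch; it assumes VisualSearch.csv sits in the working directory and uses the column and participant names shown on this slide.

rm(list=ls(all=TRUE))   # clear all variables from R's memory
graphics.off()          # remove old graphics from the display

# Read the data; the first row supplies the column names
VSdata <- read.csv(file="VisualSearch.csv", header=TRUE, stringsAsFactors=FALSE)

# Keep one participant, Target-absent trials, Conjunction distractors only
VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" &
                          VSdata$Target=="Absent" &
                          VSdata$DistractorType=="Conjunction")

head(VSdata2)           # check the contents in the R console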
Define the model • library(brms) • This loads up the R library we want to use • It has some nice functions for plotting and displaying tables • model1=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2 ) • Defines and fits the linear model • Assigns default prior distributions to each parameter: flat (“improper” uniform) priors for the coefficients b1 and b2, and a weakly informative prior for sigma
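If you want to see exactly which default priors brm() would assign before fitting, brms provides get_prior(); a minimal sketch:

library(brms)

# Show the default priors for this formula and data
get_prior(RT_ms ~ NumberDistractors, data = VSdata2)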
brms • Here, defining the model basically does all the work • This function takes the priors and the data, and computes posterior distributions • The main output is the set of b1, b2, and sigma that have the highest probabilities • Often this is actually probability density rather than probability
Posterior Distribution • Remember, these are distributions of the population parameters!
brms • The main output of the brm( ) calculation is the set of b1, b2, and sigma that have the highest probabilities • Often this is actually probability density rather than probability
Posterior Distributions • It is more complicated because we have three parameters with a joint posterior probability density function • Finding the peak of a multidimensional surface is not a trivial problem • brm calls Stan, a sampling engine that does the computations • It uses lots of random sampling (the NUTS variant of Hamiltonian Monte Carlo) to estimate the distributions • Things can go wrong sometimes, and the model may not “converge” • Sometimes it takes a long time
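A quick convergence check needs only functions already used on these slides; a sketch:

# Rhat values close to 1 and large effective sample sizes suggest the chains converged
summary(model1)

# Read any warnings Stan produced (e.g., about the short warmup used here)
warnings()

# Trace plots: well-mixed, overlapping "fuzzy caterpillar" chains are a good sign
plot(model1)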
Estimates • print(summary(model1)) • Prints out a table of estimates of b1, b2, and sigma

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept           829.14    138.81   549.44  1099.10       2235 1.00
NumberDistractors    41.53      3.80    34.12    49.02       2047 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   376.46     68.52   269.61   544.69       2162 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

There were 16 warnings (use warnings() to see them)
Estimates (2nd run) • print(summary(model1)) • Prints out a table of estimates of b1, b2, and sigma

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept           830.07    132.98   570.36  1087.89       2534 1.00
NumberDistractors    41.46      3.63    34.41    48.93       2643 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   372.40     66.54   267.22   531.17       2269 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

There were 16 warnings (use warnings() to see them)
Posteriors • plot(model1) • The fuzzy trace plots on the right indicate that the sampling chains mixed and converged (these look good)
Working with posteriors • post<-posterior_samples(model1) • Pulls out the sampled values for each parameter

> head(post)
  b_Intercept b_NumberDistractors    sigma      lp__
1    762.4510            39.89337 359.6752 -155.3252
2    920.2443            42.11010 371.4696 -155.2212
3    816.8137            38.72261 298.6904 -155.8784
4    820.5475            42.55215 435.3729 -155.2769
5    869.2564            38.18557 398.8995 -155.1636
6    608.1106            43.10367 352.3777 -156.7883
Working with posteriors • Plot the posterior of b_NumberDistractors • plot(density(post$b_NumberDistractors)) • This is the same plot we got from plot(model1)
Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • For example, what is the posterior probability that b_NumberDistractors is less than 42?

> length(post$b_NumberDistractors[post$b_NumberDistractors < 42])/length(post$b_NumberDistractors)
[1] 0.562963
> mean(post$b_NumberDistractors)
[1] 41.45843
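Because the posterior samples are just a numeric vector, the same proportion can be computed as the mean of a logical comparison; a small sketch:

# Proportion of posterior samples in which the slope is below 42
# (equivalent to the length()/length() calculation above)
mean(post$b_NumberDistractors < 42)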
Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • For example, what is the posterior distribution of RT_ms for 35 distractors?

RT_at_35 <- post$b_Intercept + post$b_NumberDistractors * 35
dev.new()
plot(density(RT_at_35))
Posterior • You can ask all kinds of questions about predictions and so forth by just using probability • What is the probability that RT_ms for 35 distractors is greater than 2400?

probGT2400 <- length(RT_at_35[RT_at_35 > 2400])/length(RT_at_35)
cat("Probability RT_ms for 35 distractors is greater than 2400 = ", probGT2400, "\n")

Probability RT_ms for 35 distractors is greater than 2400 =  0.07888889
Questions • What is the probability that predicted RT_ms is greater than 3000 for 42 distractors? • What is the probability that b_Intercept is more than 900? • What is the probability that b_NumberDistractors is less than 35?
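One way to answer these, following the same pattern as the previous slides (a sketch; RT_at_42 is just a suggested name):

# Predicted RT_ms at 42 distractors, one value per posterior sample
RT_at_42 <- post$b_Intercept + post$b_NumberDistractors * 42
mean(RT_at_42 > 3000)                  # P(predicted RT_ms > 3000 for 42 distractors)

mean(post$b_Intercept > 900)           # P(b_Intercept > 900)
mean(post$b_NumberDistractors < 35)    # P(b_NumberDistractors < 35)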
Predictions • Automatically predict RT_ms for different values of NumberDistractors • plot(marginal_effects(model1), points=TRUE) • Gray shading indicates 95% of the posterior distribution • Called “credible intervals” • This is not the same thing as a 95% confidence interval!
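If you want the numbers behind that plot, brms also provides fitted() and predict() methods; a sketch (the newdata values are arbitrary):

new_counts <- data.frame(NumberDistractors = c(4, 16, 32, 64))

# 95% credible intervals for the regression line (the shaded band)
fitted(model1, newdata = new_counts)

# 95% predictive intervals for new observations (these also include sigma)
predict(model1, newdata = new_counts)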
Compare to tradition • Is this any different from traditional linear regression? • The “traditional” best-fit line is • b_Intercept = 832.94 • b_NumberDistractors = 41.38 • Compare to

> mean(post$b_Intercept)
[1] 830.0674
> mean(post$b_NumberDistractors)
[1] 41.45843
Standard linear regression • Which parameter values b1, b2 minimize the prediction error? • The answer is necessarily just a single pair of numbers b1, b2
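For reference, the traditional fit reported on the previous slide comes from ordinary least squares; a sketch using base R's lm():

# Ordinary least-squares fit to the same data
traditional <- lm(RT_ms ~ NumberDistractors, data = VSdata2)
coef(traditional)   # a single pair of numbers: intercept and slope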
Bayesian estimates • The mean of the posterior distributions is just one possible characterization of population values • But the posterior distribution contains a lot more information • Uncertainty about the estimates • Uncertainty about the prediction • Compare to confidence intervals
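For example, the uncertainty about the slope and about a prediction can be summarized directly from the posterior samples; a small sketch:

# 95% credible interval for the slope, straight from the posterior samples
quantile(post$b_NumberDistractors, probs = c(0.025, 0.975))

# A matching interval for the predicted RT_ms at 35 distractors
quantile(RT_at_35, probs = c(0.025, 0.975))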
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • print(summary(model2))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept          1143.58    103.55   947.97  1356.70       2496 1.00
NumberDistractors    30.59      0.99    28.69    32.51       2461 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   449.58     77.59   325.82   627.83       2438 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • plot(marginal_effects(model2), points = TRUE)
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 30 • We can set a prior • model2=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(30, 1), class = "b")) ) • plot(model2)
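To see how strongly the normal(30, 1) prior pulled the slope away from the default-prior fit, the two posteriors can be compared directly; a sketch:

post1 <- posterior_samples(model1)   # default (flat) prior on the slope
post2 <- posterior_samples(model2)   # normal(30, 1) prior on the slope

# Posterior means of the slope under the two priors
mean(post1$b_NumberDistractors)
mean(post2$b_NumberDistractors)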
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model3=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) ) • print(summary(model3))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept           456.15    113.62   219.03   687.47       2383 1.00
NumberDistractors    54.35      0.99    52.41    56.23       1417 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   481.06     87.83   343.45   679.90       2112 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model3=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) ) • plot(marginal_effects(model3), points = TRUE)
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55 • We can set a prior • model3=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 1), class = "b")) ) • plot(model3)
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 10), class = "b")) ) • print(summary(model4))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept           782.72    139.06   491.94  1047.84       2171 1.00
NumberDistractors    43.11      3.63    36.15    50.50       2207 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   373.82     68.57   269.35   532.89       2235 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 10), class = "b")) ) • plot(marginal_effects(model4), points = TRUE)
Changing priors • Before running the model, we may have useful information about the model parameters • For example, maybe we know that the slope parameter is usually around 55, but we believe it could be rather larger or smaller • We can set a prior • model4=brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(55, 10), class = "b")) ) • plot(model4)
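A quick way to line up the four fits and see how each prior moved the slope estimate (a sketch; the list names are just labels):

models <- list(flat = model1, prior30 = model2, prior55 = model3, wide55 = model4)

# Posterior mean of the slope under each prior
sapply(models, function(m) mean(posterior_samples(m)$b_NumberDistractors))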
Conclusions • That wasn’t so bad • With an uninformative prior, the results are nearly the same as standard linear regression • But you get a distribution of population values rather than just point estimates • You can compute and consider probabilities of different ranges of population values • You can compute and consider probabilities of different outcomes • With an informative prior, you can get quite different results from standard linear regression • How different depends on the differences between the priors and the data • Better information leads to better inference!