  1. Let’s continue to do a Bayesian analysis
  Greg Francis
  PSY 626: Bayesian Statistics for Psychological Science
  Fall 2018
  Purdue University

  2. Visual Search
  • A classic experiment in perception/attention involves visual search
  • Respond as quickly as possible whether an image contains a target (a green circle) or not
  • Vary the number of distractors: 4, 16, 32, 64
  • Vary the type of distractors: feature (different color), conjunctive (different color or shape)

  3. Visual Search
  • Typical results: for conjunctive distractors, response time increases with the number of distractors

  4. Linear model
  • Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation (written out below)
  • Let’s do it for a single participant
  • We are basically going through Section 4.4 of the text, but using a new data set and using brms instead of rethinking
  • Download the files from the class web site and follow along in class
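
For reference, the model being fit can be written out as follows (notation mine, in the style of the text's Section 4.4; RT is RT_ms and x is NumberDistractors):

RT_i ~ Normal(mu_i, sigma)    # likelihood: each observed RT is normally distributed
mu_i = b1 + b2 * x_i          # linear model: intercept plus slope times number of distractors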

  5. Define the model
  • library(brms)
  • This loads the R library we want to use; it has some nice functions for plotting and displaying tables
  • model1 <- brm(RT_ms ~ NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3, thin = 2)
  • Defines the linear model
  • Assigns default prior distributions to each parameter (b1, b2, sigma)
  • By default the regression coefficients get flat ("improper") uniform priors; you can inspect the defaults as shown below
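
One way to check exactly which priors brms assigned (a quick sketch, not from the slides; assumes the VSdata2 data frame loaded from the class files):

> library(brms)
> get_prior(RT_ms ~ NumberDistractors, data = VSdata2)   # defaults brms would use, before fitting
> prior_summary(model1)                                  # priors actually used by the fitted model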

  6. Output
  • print(summary(model1))
  • Prints out a table of estimates of b1, b2, and sigma

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ NumberDistractors
   Data: VSdata2 (Number of observations: 20)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                  Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept           840.42    138.88   574.93  1121.98       2115 1.00
NumberDistractors    41.19      3.81    33.46    48.79       1836 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   374.90     69.17   273.24   543.65       2320 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

  7. HPDI
  • A useful quantitative characterization of the posterior distribution for a parameter is the Highest Posterior Density Interval (HPDI)

> library(broom)
> library(coda)
> tidyMCMC(model1$fit, conf.int = TRUE, conf.method = "HPDinterval")
                 term estimate std.error  conf.low  conf.high
1         b_Intercept 840.4235 138.87583 571.21624 1114.73219
2 b_NumberDistractors  41.1892   3.80958  33.64076   48.94022
3               sigma 374.9036  69.17027 259.08606  514.65371
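
The same intervals can also be computed directly from the posterior draws with coda's HPDinterval function (a sketch, not from the slides; posterior_samples() is the brms accessor for the draws):

> library(coda)
> post <- posterior_samples(model1)                 # one column per parameter
> HPDinterval(as.mcmc(as.matrix(post)), prob = 0.95)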

  8. HPDI vs CI

> library(broom)
> library(coda)
> tidyMCMC(model1$fit, conf.int = TRUE, conf.method = "HPDinterval")
                 term estimate std.error  conf.low  conf.high
1         b_Intercept 840.4235 138.87583 571.21624 1114.73219
2 b_NumberDistractors  41.1892   3.80958  33.64076   48.94022
3               sigma 374.9036  69.17027 259.08606  514.65371

> fit <- lm(RT_ms ~ NumberDistractors, data = VSdata2)
> fit

Call:
lm(formula = RT_ms ~ NumberDistractors, data = VSdata2)

Coefficients:
      (Intercept)  NumberDistractors
           832.94              41.38

> confint(fit)
                      2.5 %     97.5 %
(Intercept)       565.75775 1100.13180
NumberDistractors  34.10564   48.66024

  9. HPDI vs CI
  • Pretty similar, so why bother? The two intervals have different interpretations:
  • The HPDI is the smallest contiguous set of values of b_NumberDistractors that contains 95% of the posterior probability
  • If the model is valid, the priors are appropriate, and so forth
  • It is a statement about parameter values
  • The CI is the smallest set of values produced by a procedure that, across repeated samples, includes the true value of b_NumberDistractors 95% of the time
  • If the model is valid and so forth
  • It is a statement about the process of constructing the confidence interval (simulated below)
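
A quick simulation makes the "process" interpretation concrete (a sketch with made-up but plausible parameter values, not from the slides): generate many data sets from a known slope and count how often the 95% CI covers it.

set.seed(1)
true_slope <- 40
covered <- replicate(2000, {
  x  <- rep(c(4, 16, 32, 64), each = 5)                  # distractor counts
  y  <- 800 + true_slope * x + rnorm(length(x), sd = 375)
  ci <- confint(lm(y ~ x))["x", ]                        # 95% CI for the slope
  ci[1] <= true_slope && true_slope <= ci[2]
})
mean(covered)   # close to 0.95: the procedure, not any one interval, has the 95% property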

  10. Significance?
  • It might be tempting to use the HPDI in a way similar to how confidence intervals are used
  • If the HPDI does not include 0, then we have a “significant” result
  • There are better approaches, which we will develop later
  • For the moment, we just want to describe the posterior distributions from our Bayesian analysis

  11. Visual Search
  • Typical results: for conjunctive distractors, response time increases with the number of distractors

  12. Define the model
  • # Pull out just the trials for the first participant's conjunctive condition (includes Target present and Target absent)
  • VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")
  • # By default brms provides priors that are adapted to fit within the range of your data. Generally, this produces results similar to standard linear regression. Later, we will introduce informative priors
  • model2 <- brm(RT_ms ~ Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)
  • The Target:NumberDistractors term fits a separate NumberDistractors slope for each level of Target, with a single common intercept (see the design-matrix sketch below)
  • # print out summary of model
  • print(summary(model2))
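
To see exactly which columns that formula creates, you can inspect the design matrix R builds for it (a sketch, not from the slides; assumes the VSdata2 subset above):

> # One Intercept column plus one NumberDistractors slope column per Target level
> head(model.matrix(~ Target:NumberDistractors, data = VSdata2))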

  13. Output
  • print(summary(model2))
  • Prints out a table of estimates of b1, b2 (2 of them), and sigma

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ Target:NumberDistractors
   Data: VSdata2 (Number of observations: 40)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1;
         total post-warmup samples = 5400

Population-Level Effects:
                                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept                         836.53    147.75   545.74  1131.55       4014 1.00
TargetAbsent:NumberDistractors     41.19      4.73    32.07    50.67       3540 1.00
TargetPresent:NumberDistractors    23.81      4.74    14.64    32.98       3857 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   578.31     70.40   460.32   737.97       4301 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

  14. Visual Search
  • Looking at posteriors
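
The posterior density and trace plots on this slide are brms's default model plot, which can be reproduced with a one-liner (assuming model2 from slide 12):

> plot(model2)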

  15. Visual Search
  • Compare fit to data
  • > plot(marginal_effects(model2), points = TRUE)

  16. Varying intercepts
  • Our model so far has one (common) intercept and varying slopes (one for each search condition: target present/absent)
  • We can have different intercepts (there are several ways of doing this)
  • model3 <- brm(RT_ms ~ Target*NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)
  • Essentially the same as
  • model3 <- brm(RT_ms ~ Target + NumberDistractors + Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)
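
The claim that the two formulas are essentially the same can be checked directly: they expand to the same design matrix (a sketch, not from the slides):

> m1 <- model.matrix(~ Target * NumberDistractors, data = VSdata2)
> m2 <- model.matrix(~ Target + NumberDistractors + Target:NumberDistractors, data = VSdata2)
> all.equal(m1, m2, check.attributes = FALSE)   # same columns, same values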

  17. Output
  • print(summary(model3))
  • Prints out a table of estimates of b1, b2 (3 of them), and sigma

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: RT_ms ~ Target * NumberDistractors
   Data: VSdata2 (Number of observations: 40)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1;
         total post-warmup samples = 5400

Population-Level Effects:
                                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept                         828.33    213.76   390.80  1241.50       2609 1.00
TargetPresent                      10.55    300.03  -570.01   592.91       2134 1.00
NumberDistractors                  41.32      5.82    29.74    52.90       2506 1.00
TargetPresent:NumberDistractors   -17.52      8.18   -33.62    -1.47       2018 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   587.89     70.96   468.71   741.47       4200 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

  18. Visual Search
  • Looking at posteriors

  19. Visual Search
  • Compare fit to data
  • Has to be run from the command line (hit return for the next plot)
  • > plot(marginal_effects(model3), points = TRUE)
  • Now gives three plots
  • Main effects
  • Interaction (full set)
  • Not much different from the single-intercept model
  • This conclusion is specific to these data

  20. Different subject
  • VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")
  • model2 <- brm(RT_ms ~ Target:NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)

  21. Different subject
  • VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")
  • model3 <- brm(RT_ms ~ Target*NumberDistractors, data = VSdata2, iter = 2000, warmup = 200, chains = 3)

  22. Model comparison
  • We might want to see whether a model with a fixed intercept fits the data better than a model that uses a different intercept for different conditions
  • There are several ways to do it (we discuss them later)
  • As a taste, brms provides the LOO function (Leave-One-Out cross-validation)
  • LOO(model2)
  • LOO(model3)
  • compare_ic(LOO(model2), LOO(model3))
  • Smaller LOOIC is better
  • Here, though, the difference (3.95) is smaller than its standard error (4.28), so the data do not clearly favor either model

> LOO(model2)
Computed from 5400 by 40 log-likelihood matrix
         Estimate   SE
elpd_loo   -313.4  6.3
p_loo         4.5  1.6
looic       626.8 12.6
------
Monte Carlo SE of elpd_loo is 0.1.
Pareto k diagnostic values:
                         Count Pct.  Min. n_eff
(-Inf, 0.5]   (good)        39 97.5% 2523
 (0.5, 0.7]   (ok)           1  2.5%  253
   (0.7, 1]   (bad)          0  0.0%  <NA>
   (1, Inf)   (very bad)     0  0.0%  <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

> LOO(model3)
Computed from 5400 by 40 log-likelihood matrix
         Estimate   SE
elpd_loo   -311.4  5.1
p_loo         5.2  1.5
looic       622.8 10.2
------
Monte Carlo SE of elpd_loo is 0.1.
Pareto k diagnostic values:
                         Count Pct.  Min. n_eff
(-Inf, 0.5]   (good)        39 97.5% 2308
 (0.5, 0.7]   (ok)           1  2.5%  297
   (0.7, 1]   (bad)          0  0.0%  <NA>
   (1, Inf)   (very bad)     0  0.0%  <NA>
All Pareto k estimates are ok (k < 0.7).
See help('pareto-k-diagnostic') for details.

> compare_ic(LOO(model2), LOO(model3))
                 LOOIC    SE
model2          626.75 12.63
model3          622.81 10.19
model2 - model3   3.95  4.28

  23. Other distributions
  • We have assumed the data are subject to noise from a normal distribution
  • Our data are reaction times, which are often skewed toward longer RTs
  • One alternative is the ex-Gaussian distribution (exgaussian in brms): the sum of a Gaussian and an exponential component
  • Three parameters: mu and sigma for the Gaussian part, beta for the exponential part
  • > model4 <- brm(RT_ms ~ Target*NumberDistractors, family = exgaussian(link = "identity"), data = VSdata2, iter = 2000, warmup = 200, chains = 3)
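
A quick simulation shows the shape the ex-Gaussian assumes: a Gaussian plus an exponential tail (a sketch with made-up parameter values, not from the slides):

set.seed(2)
n  <- 1e4
mu <- 800; sigma <- 100; beta <- 300                 # illustrative values, in ms
rts <- rnorm(n, mean = mu, sd = sigma) +             # Gaussian component
       rexp(n, rate = 1 / beta)                      # exponential component (mean = beta)
hist(rts, breaks = 60, main = "ex-Gaussian sample")  # right-skewed, like empirical RTs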

  24. Output
  • print(summary(model4))
  • Prints out a table of estimates of b1, b2 (3 of them), sigma, and beta

 Family: exgaussian
  Links: mu = identity; sigma = identity; beta = identity
Formula: RT_ms ~ Target * NumberDistractors
   Data: VSdata2 (Number of observations: 40)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1;
         total post-warmup samples = 5400

Population-Level Effects:
                                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept                        -608.51    427.86 -1534.21   145.96          5 1.66
TargetPresent                       6.30     17.36   -16.47    47.56          5 2.89
NumberDistractors                  24.05     16.25    -5.15    54.36          4 1.92
TargetPresent:NumberDistractors    -4.85      8.40   -26.84    13.55         14 1.14

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma  1808.66    204.00  1468.43  2257.61         36 1.06
beta     56.64     10.81    41.72    82.43        380 1.01

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Warning messages:
1: The model has not converged (some Rhats are > 1.1). Do not analyse the results!
   We recommend running more iterations and/or setting stronger priors.
2: There were 5338 divergent transitions after warmup. Increasing adapt_delta above 0.8
   may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup

  25. Visual Search
  • Looking at posteriors
  • Bad chains!

  26. Visual Search
  • Compare fit to data
  • Has to be run from the command line (hit return for the next plot)
  • > plot(marginal_effects(model4), points = TRUE)

  27. Other distributions
  • Increase the number of iterations
  • > model4 <- brm(RT_ms ~ Target*NumberDistractors, family = exgaussian(link = "identity"), data = VSdata2, iter = 8000, warmup = 2000, chains = 3)

  28. Output
  • print(summary(model4))
  • Prints out a table of estimates of b1, b2 (3 of them), sigma, and beta

 Family: exgaussian
  Links: mu = identity; sigma = identity; beta = identity
Formula: RT_ms ~ Target * NumberDistractors
   Data: VSdata2 (Number of observations: 40)
Samples: 3 chains, each with iter = 8000; warmup = 2000; thin = 1;
         total post-warmup samples = 18000

Population-Level Effects:
                                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept                         829.50    208.98   421.10  1237.09       2263 1.00
TargetPresent                      11.57    298.93  -582.62   585.91       1679 1.00
NumberDistractors                  41.34      5.73    30.04    52.40       2073 1.00
TargetPresent:NumberDistractors   -17.61      8.18   -33.91    -1.70       1622 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma   575.28     66.72   462.26   723.10       4657 1.00
beta     25.66     10.08    14.87    52.50       2830 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
is a crude measure of effective sample size, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Warning message:
There were 12903 divergent transitions after warmup. Increasing adapt_delta above 0.8
may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
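
The Rhats now look fine, but the divergent-transition warning persists. Following the warning's own suggestion, adapt_delta can be raised through brm's control argument (a sketch, not from the slides; the value 0.99 is an illustrative choice):

model4 <- brm(RT_ms ~ Target*NumberDistractors,
              family = exgaussian(link = "identity"),
              data = VSdata2, iter = 8000, warmup = 2000, chains = 3,
              control = list(adapt_delta = 0.99))  # closer to 1 = smaller, more careful sampler steps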

  29. Visual Search
  • Looking at posteriors
  • Better chains!

  30. Visual Search
  • Compare fit to data
  • Has to be run from the command line (hit return for the next plot)
  • > plot(marginal_effects(model4), points = TRUE)

  31. Conclusions
  • Many types of models
  • Can compare models (more later)
  • HPDI vs. CI
  • Model convergence
