
Bayesian style t-test

Explore the comparison between a traditional t-test and its Bayesian counterpart for estimating how much time physicians expect to spend with patients, based on patient weight data. Learn how to implement and interpret the results of both approaches.




Presentation Transcript


  1. Bayesian style t-test Greg Francis PSY 626: Bayesian Statistics for Psychological Science Fall 2018 Purdue University

  2. Physicians and patient weight • Hebl & Xu (2001): sent medical charts to physicians. The patient complained of migraine headaches. The chart indicated whether the patient was average_weight or overweight • The question is how much time doctors estimate they would spend with the patient • Each physician contributes one score: a between-subjects design

  3. Standard analysis • # load data file • PWdata<-read.csv(file="PhysiciansWeight.csv",header=TRUE,stringsAsFactors=TRUE) • # Compare to traditional t-test • traditional <- t.test(Time ~ Weight, data=PWdata, var.equal=TRUE) • print(traditional) Two Sample t-test data: Time by Weight t = 2.856, df = 69, p-value = 0.005663 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 1.997955 11.255633 sample estimates: mean in group average_weight mean in group overweight 31.36364 24.73684

  4. Standard (regression) analysis • # Compare to traditional linear regression • print(summary(lm(Time~Weight, data=PWdata))) • Prints out a table of estimates of Intercept (average_weight) and deviation for overweight (difference of means) Call: lm(formula = Time ~ Weight, data = PWdata) Residuals: Min 1Q Median 3Q Max -19.737 -5.550 -1.364 5.263 35.263 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 31.364 1.697 18.477 < 2e-16 *** Weightoverweight -6.627 2.320 -2.856 0.00566 ** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘’ 1 Residual standard error: 9.751 on 69 degrees of freedom Multiple R-squared: 0.1057, Adjusted R-squared: 0.09276 F-statistic: 8.157 on 1 and 69 DF, p-value: 0.005663

  5. Bayesian variation of t-test • We start with default priors (uniform, improper) • Follow along by downloading “PhysiciansWeight.csv” and “PhysiciansWeight1.R” from the class web site # load data file PWdata<-read.csv(file="PhysiciansWeight.csv",header=TRUE,stringsAsFactors=TRUE) # load the brms library library(brms) model1 = brm(Time ~ Weight, data = PWdata, iter = 2000, warmup = 200, chains = 3, thin = 2 ) # print out summary of model print(summary(model1))
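As a sanity check on the sampler settings above, the number of retained posterior draws follows directly from the chains, iter, warmup, and thin arguments. A small Python sketch of that arithmetic:

```python
# Sampler settings from the brm() call above.
chains = 3
iters = 2000   # iterations per chain (iter)
warmup = 200   # warmup iterations discarded per chain
thin = 2       # keep every 2nd post-warmup draw

# Each chain keeps (2000 - 200) / 2 = 900 draws, so 3 chains give 2700,
# matching "total post-warmup samples = 2700" in the brms output.
total_draws = chains * (iters - warmup) // thin
print(total_draws)  # 2700
```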

  6. Output Family: gaussian Links: mu = identity; sigma = identity Formula: Time ~ Weight Data: PWdata (Number of observations: 71) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 31.41 1.77 27.78 34.74 2342 1.00 Weightoverweight -6.66 2.37 -11.40 -2.03 2476 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 9.89 0.87 8.38 11.70 2559 1.00 Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1). • Prints out a table of estimates of Intercept (average_weight) and deviation for overweight (difference of means)

  7. Output • # Check on model convergence and posteriors • plot(model1) • Looks good!

  8. Output • # Plot model fit to data • dev.new() • plot(marginal_effects(model1), points = TRUE) • (In newer versions of brms, marginal_effects() has been renamed conditional_effects())

  9. HPDI Two Sample t-test data: Time by Weight t = 2.856, df = 69, p-value = 0.005663 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 1.997955 11.255633 sample estimates: mean in group average_weight mean in group overweight 31.36364 24.73684 • You might be interested in the highest posterior density interval (HPDI) for the difference of means # Compute HPDIs library(broom) library(coda) print( tidyMCMC(model1$fit, conf.int = TRUE, conf.method = "HPDinterval") ) term estimate std.error conf.low conf.high 1 b_Intercept 31.408944 1.7658964 27.759071 34.703057 2 b_Weightoverweight -6.662331 2.3698441 -11.032153 -1.768079 3 sigma 9.886702 0.8658198 8.420497 11.715446
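tidyMCMC computes the HPD interval for you via coda. As an illustration of what that interval means (not the coda algorithm itself), here is a minimal Python sketch that finds the shortest interval containing 95% of a set of posterior draws:

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing `prob` of the posterior draws."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    k = int(np.ceil(prob * n))           # number of draws the interval must cover
    widths = x[k - 1:] - x[:n - k + 1]   # width of every candidate interval
    i = np.argmin(widths)                # the shortest one wins
    return x[i], x[i + k - 1]

# Skewed example: the HPD interval sits closer to the mode than an
# equal-tailed interval would.
rng = np.random.default_rng(0)
draws = rng.gamma(shape=2.0, scale=1.0, size=10_000)
lo, hi = hpd_interval(draws, 0.95)
```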

  10. t-test versus Bayesian • Note, we get essentially the same pattern of results either way • It looks like there is a difference in the mean times dependent on the patient weight

  11. Posteriors • You can compute things that are not possible with the standard t-test • Probability that the mean time for overweight patients is longer than for average_weight patients post<-posterior_samples(model1) dev.new() plot(density(post$b_Weightoverweight)) # check probability that difference is greater than zero cat("Probability that group mean difference is greater than zero is: ", sum(post$b_Weightoverweight > 0) / length(post$b_Weightoverweight)) >Probability that group mean difference is greater than zero is: 0.002222222
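The computation on this slide is just the fraction of posterior draws above zero. A Python sketch of the same idea, using a normal approximation to the posterior summarized earlier (mean -6.66, sd 2.37 for b_Weightoverweight) as a hypothetical stand-in for the actual MCMC draws:

```python
import numpy as np

# Hypothetical stand-in for post$b_Weightoverweight: approximate the
# posterior of the group-mean difference (overweight - average_weight)
# with a normal distribution built from the brms summary statistics.
rng = np.random.default_rng(1)
diff_draws = rng.normal(loc=-6.66, scale=2.37, size=2700)

# P(difference > 0) is simply the fraction of draws above zero.
p_greater = np.mean(diff_draws > 0)
print(p_greater)  # a small value, on the order of the slide's 0.0022
```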

  12. Comparing models Compare to sample statistics > mean(PWdata$Time) [1] 27.8169 > sd(PWdata$Time) [1] 10.23762 • What happens for a model with just one mean? (One intercept) • model2 = brm(Time ~ 1, data = PWdata, iter = 2000, warmup = 200, chains = 3, thin = 2 ) print(model2) Family: gaussian Links: mu = identity; sigma = identity Formula: Time ~ 1 Data: PWdata (Number of observations: 71) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 27.88 1.24 25.38 30.29 2414 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 10.39 0.91 8.75 12.28 2222 1.00

  13. Comparing models • Does a model with two means (model1) do better or worse at LOO than a model with one mean (model2)? • In terms of predicting data, the two-mean model does better than the one-mean model > LOO(model1, model2) LOOIC SE model1 529.91 16.43 model2 535.41 14.01 model1 – model2 -5.50 6.24
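LOOIC is on a lower-is-better scale (-2 times the expected log predictive density), and the paired difference between models has its own standard error. A small Python sketch, using the numbers above, of the common rule of thumb that a difference is only decisive when it exceeds roughly twice its SE:

```python
# LOOIC values and the SE of their difference, from the LOO() output above.
looic_model1 = 529.91   # two means (Time ~ Weight)
looic_model2 = 535.41   # one mean (Time ~ 1)
diff = looic_model1 - looic_model2   # -5.50: model1 predicts better
diff_se = 6.24

# Rule of thumb: treat the difference as decisive only when it is more
# than about twice its standard error.
decisive = abs(diff) > 2 * diff_se
print(round(diff, 2), decisive)  # model1 is favored, but not decisively
```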

  14. Welch’s t-test • We have unequal sample sizes in our dataset, so we might want to use Welch’s test in case we have unequal standard deviations (s=9.86 for average_weight and s=9.65 for overweight) > traditional <- t.test(Time ~ Weight, data=PWdata, var.equal=FALSE) > print(traditional) Welch Two Sample t-test data: Time by Weight t = 2.8516, df = 67.174, p-value = 0.005774 alternative hypothesis: true difference in means is not equal to 0 95 percent confidence interval: 1.988532 11.265056 sample estimates: mean in group average_weight mean in group overweight 31.36364 24.73684
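For reference, Welch's statistic and its degrees of freedom can be computed directly from the group summaries. A Python sketch, assuming group sizes of 33 (average_weight) and 38 (overweight), which is consistent with the pooled test's n = 71 and df = 69:

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2   # per-group variance of the mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Group means and sds from the slides; group sizes are assumed (see above).
t, df = welch_t(31.36364, 9.86, 33, 24.73684, 9.65, 38)
print(round(t, 3), round(df, 2))  # close to the R output: t = 2.8516, df = 67.174
```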

  15. Bayesian Welch’s t-test • We modify the model setup so that it treats sigma as differing across groups • # Define a formula that includes variations in sigma by group • uneq_var_frm <- brmsformula(Time ~ Weight, sigma ~ Weight) • model3 = brm(uneq_var_frm, data = PWdata, iter = 2000, warmup = 200, chains = 3, thin = 2 ) • # print out summary of model • print(summary(model3))

  16. Bayesian Welch’s t-test • Output splits up intercept and slope for sigma as well as for means • Careful of the link! Family: gaussian Links: mu = identity; sigma = log Formula: Time ~ Weight sigma ~ Weight Data: PWdata (Number of observations: 71) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 31.41 1.80 27.94 34.93 2344 1.00 sigma_Intercept 2.30 0.12 2.08 2.56 2421 1.00 Weightoverweight -6.60 2.44 -11.44 -1.76 2187 1.00 sigma_Weightoverweight -0.03 0.17 -0.36 0.30 2442 1.00 exp(2.30)=9.974 exp(2.30-0.03)= 9.679
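Because sigma uses a log link, the coefficients in the sigma rows combine additively on the log scale and must be exponentiated back to the data scale, as the slide's exp() lines show. A Python sketch of that conversion:

```python
import math

# sigma coefficients from the brms output above (log scale).
sigma_intercept = 2.30   # log-sigma for the average_weight group
sigma_slope = -0.03      # deviation in log-sigma for the overweight group

# Exponentiate to get each group's standard deviation on the data scale.
sd_average = math.exp(sigma_intercept)                    # ~ 9.974
sd_overweight = math.exp(sigma_intercept + sigma_slope)   # ~ 9.679
print(round(sd_average, 3), round(sd_overweight, 3))
```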

  17. Bayesian Welch’s t-test • Posteriors and chains look fine

  18. Bayesian Welch’s t-test • Does a model with two means and two standard deviations do better than a model with two means and one standard deviation? (No) > print(LOO(model1,model3)) LOOIC SE model1 529.91 16.43 model3 533.14 18.25 model1 - model3 -3.23 2.03 Warning messages: 1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 4: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 5: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 6: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' 7: Found 1 observations with a pareto_k > 0.7 in model 'model3'. It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 1 times to compute the ELPDs for the problematic observations directly.

  19. Bayesian Welch’s t-test • Does a model with two means and two standard deviations do better than a model with two means and one standard deviation? (No) > print(LOO(model1, model3, reloo=TRUE)) No problematic observations found. Returning the original 'loo' object. 1 problematic observation(s) found. The model will be refit 1 times. Fitting model 1 out of 1 (leaving out observation 59) Start sampling …… LOOIC SE model1 529.91 16.43 model3 533.11 18.22 model1 - model3 -3.20 2.00 There were 23 warnings (use warnings() to see them)

  20. Bayesian Welch’s t-test • Which kind of model does best?: • Two means, one standard deviation (model1) • One mean, one standard deviation (model2) • Two means, two standard deviations (model3) > print(LOO(model1,model2, model3)) LOOIC SE model1 529.91 16.43 model2 533.05 18.10 model3 533.14 18.25 model1 - model2 -3.14 1.89 model1 - model3 -3.23 2.03 model2 - model3 -0.09 0.19

  21. Finish up these models • Define and fit model4 that uses • One mean, two standard deviations • Compare all the models with > print(LOO(model1,model2, model3, model4))

  22. Non-default priors • model5 = brm(Time ~ Weight, data = PWdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c(prior(normal(0, 10), class = "Intercept"), prior(normal(0, 10), class = "b")) ) • print(summary(model5)) Family: gaussian Links: mu = identity; sigma = identity Formula: Time ~ Weight Data: PWdata (Number of observations: 71) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 30.83 1.69 27.53 34.13 2342 1.00 Weightoverweight -6.35 2.29 -10.84 -1.98 2340 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 9.86 0.85 8.36 11.76 2433 1.00

  23. Kind of crazy prior • model6 = brm(Time ~ Weight, data = PWdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c(prior(normal(10, 1), class = "Intercept"), prior(normal(0, 10), class = "b")) ) • print(summary(model6)) Family: gaussian Links: mu = identity; sigma = identity Formula: Time ~ Weight Data: PWdata (Number of observations: 71) Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700 Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat Intercept 16.37 2.41 11.56 21.06 2358 1.00 Weightoverweight -5.60 3.92 -13.04 2.13 2511 1.00 Family Specific Parameters: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat sigma 17.60 1.74 14.40 21.34 2536 1.00

  24. Reasonable priors • Let’s discuss what might be reasonable priors here • Then define corresponding models

  25. Conclusions • Bayesian t-tests • Welch’s test • Model comparison • Setting priors
