Bayesian Shrinkage
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2018, Purdue University
Shrinkage
• In Lecture 5, we noted that shrinking means toward a fixed value improved model fits for new data
• There is an error in the equations there (the numerator should be variance instead of standard deviation)
• Shrink to the average
• Morris-Efron estimator (see the sketch below)
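For reference, here is the shrinkage rule in the form the R code on the next slide implements (a sketch of the Morris-Efron estimator; the notation is mine: k group means ȳ_i, within-group variances s_i², group sizes n_i, and grand mean ȳ..):

\[
\hat{\mu}_i \;=\; \bar{y}_{\cdot\cdot} \;+\; \left(1 \;-\; \frac{(k-3)\, s_i^2 / n_i}{\sum_{j=1}^{k} \left(\bar{y}_j - \bar{y}_{\cdot\cdot}\right)^2}\right)\left(\bar{y}_i - \bar{y}_{\cdot\cdot}\right)
\]

Note the variance s_i² (not the standard deviation) in the numerator, which is the correction mentioned above.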
Morris-Efron shrinkage
• Using the Smiles and Leniency data set (independent means ANOVA)

# load data file
SLdata <- read.csv(file="SmilesLeniency.csv", header=TRUE, stringsAsFactors=TRUE)

# Morris-Efron shrinkage
# Get each sample mean, count, and variance
Means <- aggregate(Leniency~SmileType, FUN=mean, data=SLdata)
counts <- aggregate(Leniency~SmileType, FUN=length, data=SLdata)
Vars <- aggregate(Leniency~SmileType, FUN=var, data=SLdata)
GrandMean = sum(Means$Leniency)/4

newMeans = (1 - (length(Means$Leniency)-3)*Vars$Leniency/counts$Leniency / sum( (Means$Leniency - GrandMean)^2 )) * (Means$Leniency - GrandMean) + GrandMean

par(bg="lightblue")
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
plot(Means$SmileType, Means$Leniency, main="Morris-Efron", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian ANOVA
• Using the Smiles and Leniency data set (independent means ANOVA)
• We used linear regression to produce the Bayesian equivalent of an ANOVA (a little hard to interpret)

# Model as intercept and slopes
model1 = brm(Leniency ~ SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3)
print(summary(model1))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: Leniency ~ SmileType
   Data: SLdata (Number of observations: 136)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept 5.37 0.28 4.83 5.91 3708 1.00
SmileTypeFelt -0.46 0.40 -1.24 0.32 4173 1.00
SmileTypeMiserable -0.46 0.40 -1.22 0.29 4211 1.00
SmileTypeNeutral -1.26 0.40 -2.04 -0.50 4229 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 1.64 0.10 1.46 1.86 5400 1.00
Bayesian ANOVA
• Using the Smiles and Leniency data set (independent means ANOVA)
• We used linear regression to produce the Bayesian equivalent of an ANOVA (a little hard to interpret)

# compute means of posteriors
post <- posterior_samples(model1)
newMeans <- c(mean(post$b_Intercept),
              mean(post$b_Intercept + post$b_SmileTypeFelt),
              mean(post$b_Intercept + post$b_SmileTypeMiserable),
              mean(post$b_Intercept + post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 1", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Equivalent model
• Using the Smiles and Leniency data set (independent means ANOVA)
• Remove the Intercept (easier to interpret)

# Model without intercept (more natural)
model2 = brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3)
print(summary(model2))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: Leniency ~ 0 + SmileType
   Data: SLdata (Number of observations: 136)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
SmileTypeFALSE 5.36 0.28 4.79 5.92 5400 1.00
SmileTypeFelt 4.91 0.28 4.36 5.48 5400 1.00
SmileTypeMiserable 4.91 0.28 4.36 5.48 5400 1.00
SmileTypeNeutral 4.11 0.28 3.57 4.66 5400 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 1.64 0.10 1.45 1.86 5400 1.00
Bayesian ANOVA
• Using the Smiles and Leniency data set (independent means ANOVA)
• Remove the Intercept (easier to interpret)

# compute means of posteriors
post <- posterior_samples(model2)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
plot(Means$SmileType, Means$Leniency, main="Model 2", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian Shrinkage
• We get something very much like shrinkage by using a prior
• Let’s set up a prior to pull values toward the grand mean

# Model without intercept (more natural) and shrink to grand mean
stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 1), class = "b"))
model3 = brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3, prior = prs, stanvars = stanvars)
print(summary(model3))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: Leniency ~ 0 + SmileType
   Data: SLdata (Number of observations: 136)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 1; total post-warmup samples = 5400

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
SmileTypeFALSE 5.33 0.26 4.82 5.85 5400 1.00
SmileTypeFelt 4.90 0.26 4.38 5.42 5400 1.00
SmileTypeMiserable 4.90 0.27 4.37 5.43 5400 1.00
SmileTypeNeutral 4.17 0.27 3.64 4.71 5400 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 1.64 0.11 1.45 1.86 5400 1.00
Bayesian ANOVA
• Using the Smiles and Leniency data set (independent means ANOVA)
• Remove the Intercept (easier to interpret)

# compute means of posteriors
post <- posterior_samples(model3)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 3", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Model comparison
• Shrinkage should lead to better prediction
• Small effect, but helpful
• Shrinkage never really hurts prediction; is the same true of the effect of priors?

> model_weights(model2, model3, weights="loo")
   model2    model3
0.4154797 0.5845203
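As a side note, the same LOO-based comparison can be unpacked with brms's criterion functions; the sketch below is not from the slides (the add_criterion/loo_compare steps are my own illustration), but it uses the model2 and model3 objects fit above.

# compute and store leave-one-out (LOO) criteria for each fitted model
model2 <- add_criterion(model2, "loo")
model3 <- add_criterion(model3, "loo")

# compare expected log predictive density (elpd); higher elpd = better expected prediction for new data
loo_compare(model2, model3, criterion = "loo")

# model weights based on LOO, as reported on the slide
model_weights(model2, model3, weights = "loo")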
Bayesian Shrinkage
• Larger standard deviation

stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 5), class = "b"))
model4 = brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3, prior = prs, stanvars = stanvars)
print(summary(model4))

# compute means of posteriors
post <- posterior_samples(model4)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 4", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian Shrinkage
• Smaller standard deviation

stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 0.1), class = "b"))
model5 = brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3, prior = prs, stanvars = stanvars)
print(summary(model5))

# compute means of posteriors
post <- posterior_samples(model5)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 5", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Model comparison
• Contrary to shrinkage, the prior can hurt if it is too constraining
• Makes intuitive sense
• A standard deviation of 1 (model3) helps, but a standard deviation of 0.1 (model5) hurts
• A standard deviation of 5 (model4) helps some

> model_weights(model2, model3, model4, model5, weights="loo")
   model2    model3    model4    model5
0.2409125 0.3389294 0.2616917 0.1584664
Ad hoc approach
• Going too big with the standard deviation is not going to hurt (but it might not help as much as it could)
• Try using a standard deviation estimated from the data itself
• Standard error of the mean (0.2791214)

GrandSE = sqrt(mean(Vars$Leniency)/min(counts$Leniency))  # pooled across groups (all have the same sample size)
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 = brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3, prior = prs, stanvars = stanvars)
print(summary(model6))

# compute means of posteriors
post <- posterior_samples(model6)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range = c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 6", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Model comparison
• Does a good job here compared to the no-shrinkage model
• Does a good job compared to the other models
• No guarantees

> model_weights(model2, model6, weights="loo")
  model2   model6
0.313577 0.686423

> model_weights(model2, model3, model4, model5, model6, weights="loo")
   model2    model3    model4    model5    model6
0.1577314 0.2219054 0.1713360 0.1037519 0.3452754
ADHD Treatment
• 24 children given different dosages of a drug
• Measure scores on a Delay of Gratification task (bigger is better)
• Dependent means ANOVA
Shrinkage for subjects
• Our data show quite a bit of variability across subjects

# load data file
ATdata <- read.csv(file="ADHDTreatment.csv", header=TRUE, stringsAsFactors=TRUE)

# Pull out individual subjects and plot
plot(ATdata$DosageNumber, ATdata$CorrectResponses)
for(i in c(1:24)) {
  thisLabel <- paste("Subject", toString(i), sep="")
  thisSet <- subset(ATdata, ATdata$SubjectID == thisLabel)
  lines(thisSet$DosageNumber, thisSet$CorrectResponses, col=i)
}
Shrinkage for subjects
• We may care about the dosage effect for individual subjects
• Knowledge of other subjects should inform how we interpret the scores of a given subject
• Subject7 has the highest scores in the data set
  • But maybe we should suspect they are overestimates
• Subject4 has the lowest scores in the data set
  • But maybe we should suspect they are underestimates
Simple model
• Using fixed offset for each subject
• Dosage level corresponds to Subject1

model0 = brm(CorrectResponses ~ 0 + Dosage + SubjectID, data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model0))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + SubjectID
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 55.52 3.04 49.59 61.29 641 1.00
DosageD15 55.52 3.07 49.54 61.61 603 1.00
DosageD30 60.08 3.05 54.03 66.01 621 1.00
DosageD60 60.49 3.02 54.56 66.29 619 1.00
SubjectIDSubject10 -3.21 4.18 -11.55 4.70 870 1.00
SubjectIDSubject11 -22.85 4.17 -31.06 -14.56 998 1.00
SubjectIDSubject12 -8.65 4.14 -16.82 -0.77 774 1.00
SubjectIDSubject13 -19.39 4.16 -27.66 -11.41 989 1.00
SubjectIDSubject14 -6.32 4.30 -14.91 2.34 1051 1.00
SubjectIDSubject15 -22.21 4.21 -30.70 -14.17 840 1.01
SubjectIDSubject16 -2.95 4.20 -11.56 5.19 1204 1.00
SubjectIDSubject17 -24.35 4.11 -32.12 -16.24 1011 1.00
SubjectIDSubject18 -16.41 4.20 -24.68 -8.26 1026 1.00
SubjectIDSubject19 -22.41 4.08 -30.42 -14.46 1025 1.00
SubjectIDSubject2 -16.44 4.09 -24.88 -8.68 874 1.00
SubjectIDSubject20 -1.34 4.13 -9.31 6.94 788 1.00
SubjectIDSubject21 -23.10 4.17 -31.39 -15.23 968 1.00
SubjectIDSubject22 -24.41 4.13 -32.45 -16.45 842 1.00
SubjectIDSubject23 -20.99 4.14 -29.11 -13.15 923 1.00
SubjectIDSubject24 -21.34 4.21 -29.70 -13.05 1149 1.00
SubjectIDSubject3 -24.10 4.24 -32.53 -15.74 941 1.00
SubjectIDSubject4 -29.75 4.32 -38.23 -21.06 961 1.00
SubjectIDSubject5 -20.34 4.19 -28.86 -12.48 903 1.00
SubjectIDSubject6 -20.77 4.19 -28.92 -12.37 866 1.00
SubjectIDSubject7 9.20 4.28 1.01 17.65 980 1.00
SubjectIDSubject8 -17.42 4.13 -25.58 -9.33 996 1.00
SubjectIDSubject9 -18.95 4.23 -27.36 -10.43 906 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.07 0.52 5.15 7.16 2178 1.00
Subject level
• From the posterior, we can pull out the intercept of each subject and see the effect for each subject (mean of the resulting posterior distribution)

dev.new()
# Look at effect on each subject
post <- posterior_samples(model0)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 0")
xLabels <- c(0, 15, 30, 60)
for(i in c(1:24)) {
  thisLabel <- paste("b_SubjectIDSubject", toString(i), sep="")
  if(i==1){
    # Subject1 is the reference level, so its means are just the Dosage terms
    SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")]
  } else {
    SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  }
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
Bayesian Shrinkage
• We want to put a prior on the Subject intercepts
• By default brm uses a Student t distribution
• student_t parameters are (degrees of freedom, location, scale)
• Think of it as (df, mean, sd)

> get_prior(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata)
                prior class      coef     group resp dpar nlpar bound
1                         b
2                         b  DosageD0
3                         b DosageD15
4                         b DosageD30
5                         b DosageD60
6 student_t(3, 0, 10)    sd
7                        sd           SubjectID
8                        sd Intercept SubjectID
9 student_t(3, 0, 10) sigma
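To see where that default lives, the prior on the subject-level standard deviation can also be written out explicitly. A minimal sketch (the name model1b and the idea of refitting with the spelled-out default are my own illustration, not from the slides; the df/scale values simply mirror the default shown by get_prior above):

# spell out the default-style Student t prior on the subject-level sd
prs_default <- c(prior(student_t(3, 0, 10), class = "sd", group = "SubjectID"))

# refitting with this explicit prior should behave essentially like the default fit on the next slide
model1b = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs_default)
print(summary(model1b))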
Standard model
• Using default priors for the subject intercepts

model1 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model1))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.66 1.62 7.07 13.35 1219 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 39.66 2.35 35.01 44.17 873 1.00
DosageD15 39.61 2.31 35.17 44.21 853 1.00
DosageD30 44.26 2.39 39.67 49.00 780 1.00
DosageD60 44.64 2.32 40.25 49.19 895 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.07 0.51 5.16 7.17 2080 1.00
Impact of prior
• From the posterior, we can pull out the intercept of each subject and see the effect for each subject (mean of the resulting posterior distribution)

dev.new()
# Look at effect on each subject
post <- posterior_samples(model1)
plot(ATdata$DosageNumber, ATdata$CorrectResponses)
xLabels <- c(0, 15, 30, 60)
for(i in c(1:24)) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
Modeling
• In some sense, this is just modeling
• Extreme data are interpreted as noise, and the model predicts that testing an extreme subject again will result in something more “normal” (see the partial-pooling sketch below)
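One way to see why: in a hierarchical normal model with known variances, the subject-level estimate is approximately a precision-weighted average of that subject's own mean and the grand mean. This is a standard partial-pooling result, stated here with assumed notation (n_j observations for subject j, residual variance σ², between-subject variance τ², subject mean ȳ_j, grand mean ȳ..):

\[
\hat{\theta}_j \;\approx\; \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\tau^2}\,\bar{y}_j \;+\; \frac{1/\tau^2}{n_j/\sigma^2 + 1/\tau^2}\,\bar{y}_{\cdot\cdot}
\]

The smaller n_j is, or the larger σ² is relative to τ², the more the estimate is pulled toward the grand mean; and because the pull is proportional to the distance from the grand mean, the most extreme subjects are shrunk the most.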
Compare models
• Favors the model with shrinkage

> model_weights(model0, model1, weights="loo")
   model0    model1
0.2519673 0.7480327
More shrinkage!
• Adjust the t-distribution
• Bigger df (less heavy tails)
• Smaller scale (~standard deviation)
• Not much effect on the population means

prs <- c(prior(student_t(30, 0, 1), class = "sd"))
model2 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs)
print(summary(model2))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 6.50 0.83 4.96 8.28 1735 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 39.73 1.88 36.07 43.55 1623 1.00
DosageD15 39.68 1.86 36.14 43.50 1621 1.00
DosageD30 44.33 1.87 40.80 47.99 1585 1.00
DosageD60 44.74 1.90 41.11 48.55 1248 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.31 0.60 5.26 7.63 1745 1.00

(The slide also repeats the model1 summary from the Standard model slide for side-by-side comparison.)
Impact of prior
• From the posterior, we can pull out the intercept of each subject and see the effect for each subject (mean of the resulting posterior distribution)

dev.new()
# Look at effect on each subject
post <- posterior_samples(model2)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 2")
xLabels <- c(0, 15, 30, 60)
for(i in c(1:24)) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
Less shrinkage!
• Adjust the t-distribution
• Standard df
• Bigger scale (~standard deviation)
• Not much effect on the population means

prs <- c(prior(student_t(3, 0, 20), class = "sd"))
model3 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs)
print(summary(model3))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.95 1.73 7.13 13.82 852 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 39.71 2.41 35.06 44.45 821 1.00
DosageD15 39.63 2.47 34.75 44.58 869 1.00
DosageD30 44.29 2.46 39.52 49.13 907 1.00
DosageD60 44.74 2.46 39.70 49.55 868 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.07 0.53 5.15 7.22 2110 1.00

(The slide also repeats the model1 summary from the Standard model slide for side-by-side comparison.)
Impact of prior
• From the posterior, we can pull out the intercept of each subject and see the effect for each subject (mean of the resulting posterior distribution)
• Not much difference

dev.new()
# Look at effect on each subject
post <- posterior_samples(model3)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 3")
xLabels <- c(0, 15, 30, 60)
for(i in c(1:24)) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
Compare models
• Favors the model with default shrinkage
• Bigger scale is almost as good
• I was never able to beat the default prior

> model_weights(model0, model1, model2, model3, weights="loo")
    model0     model1     model2     model3
0.14602043 0.43350085 0.04518049 0.37529822
Morris-Efron shrinkage
• No need to ignore the population means

ATdata <- read.csv(file="ADHDTreatment.csv", header=TRUE, stringsAsFactors=TRUE)

# Morris-Efron shrinkage
# Get each sample mean, count, and variance
Means <- aggregate(CorrectResponses~Dosage, FUN=mean, data=ATdata)
counts <- aggregate(CorrectResponses~Dosage, FUN=length, data=ATdata)
Vars <- aggregate(CorrectResponses~Dosage, FUN=var, data=ATdata)
GrandMean = sum(Means$CorrectResponses)/4

newMeans = (1 - (4-3)*Vars$CorrectResponses/counts$CorrectResponses / sum( (Means$CorrectResponses - GrandMean)^2 )) * (Means$CorrectResponses - GrandMean) + GrandMean

par(bg="lightblue")
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
plot(Means$Dosage, Means$CorrectResponses, main="Morris-Efron", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian model
• Population-level effects

# This model has different means for different dosages and a (random) intercept for each subject
model4 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model4))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.74 1.68 7.04 13.76 893 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 39.81 2.42 34.94 44.55 850 1.00
DosageD15 39.71 2.38 35.12 44.19 941 1.00
DosageD30 44.36 2.40 39.46 48.89 916 1.00
DosageD60 44.69 2.42 39.85 49.46 902 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.10 0.53 5.17 7.19 2202 1.00
Bayesian ANOVA
• Plot the estimated means

# compute means of posteriors
post <- posterior_samples(model4)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15), mean(post$b_DosageD30), mean(post$b_DosageD60))
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
plot(Means$Dosage, Means$CorrectResponses, main="Model 4", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Shrink it all!
• The ADHD Treatment data set has multiple grouping factors
  • Dosage
  • SubjectID
• Both can be shrunk by imposing a prior for each one
• We define a shrinkage prior using the data

Means <- aggregate(CorrectResponses ~ Dosage, FUN=mean, data=ATdata)
counts <- aggregate(CorrectResponses ~ Dosage, FUN=length, data=ATdata)
Vars <- aggregate(CorrectResponses ~ Dosage, FUN=var, data=ATdata)
GrandMean = sum(Means$CorrectResponses)/length(Means$CorrectResponses)
GrandSE = sqrt(mean(Vars$CorrectResponses)/min(counts$CorrectResponses))  # pooled across groups (all have the same sample size)

stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
Bayesian Shrinkage
• We get something very much like shrinkage by using a prior
• Let’s set up a prior to pull values toward the grand mean
• Using the standard deviation of the sample means (2.782921)

GrandSE = sd(Means$CorrectResponses)
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model5 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model5))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.53 1.56 6.91 13.04 1069 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 40.14 1.50 37.15 43.08 1732 1.00
DosageD15 40.07 1.51 37.11 43.08 1720 1.00
DosageD30 43.96 1.52 40.96 46.94 1836 1.00
DosageD60 44.31 1.50 41.32 47.33 1690 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.10 0.52 5.19 7.22 1951 1.00
Bayesian ANOVA
• Effect of prior

# compute means of posteriors
post <- posterior_samples(model5)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15), mean(post$b_DosageD30), mean(post$b_DosageD60))
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 5", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Model comparison
• Shrinkage should lead to better prediction
• Small effect, but helpful
• Other standard deviations?

> print(model_weights(model4, model5, weights="loo"))
   model4    model5
0.4396908 0.5603092
Bayesian Shrinkage
• We get something very much like shrinkage by using a prior
• Let’s set up a prior to pull values toward the grand mean
• Use a large standard deviation

GrandSE = 10
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model6))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.62 1.56 7.03 13.17 1153 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 39.74 2.22 35.32 44.18 1157 1.00
DosageD15 39.67 2.17 35.36 43.94 1218 1.00
DosageD30 44.28 2.21 39.95 48.66 1248 1.00
DosageD60 44.63 2.18 40.42 48.83 1161 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.07 0.53 5.14 7.16 2302 1.00
Bayesian ANOVA
• Effect of a prior with a large SD

# compute means of posteriors
post <- posterior_samples(model6)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15), mean(post$b_DosageD30), mean(post$b_DosageD60))
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 6", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian Shrinkage
• We get something very much like shrinkage by using a prior
• Let’s set up a prior to pull values toward the grand mean
• Small standard deviation

GrandSE = 0.1
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model7 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model7))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.34 1.51 6.82 12.67 1204 1.00

Population-Level Effects:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
DosageD0 42.10 0.10 41.91 42.30 2422 1.00
DosageD15 42.10 0.10 41.91 42.29 2406 1.00
DosageD30 42.13 0.10 41.94 42.31 2105 1.00
DosageD60 42.13 0.10 41.93 42.32 2383 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.58 0.57 5.61 7.80 2344 1.00
Bayesian ANOVA
• Effect of a prior with a small SD

# compute means of posteriors
post <- posterior_samples(model7)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15), mean(post$b_DosageD30), mean(post$b_DosageD60))
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 7", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Bayesian Shrinkage
• We can treat the condition (Dosage) similarly to subjects
• Use the default brm prior

model8 = brm(CorrectResponses ~ 0 + (1 | Dosage) + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model8))

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 0 + (1 | Dosage) + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700

Group-Level Effects:
~Dosage (Number of levels: 4)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 38.08 12.29 21.46 70.03 568 1.00

~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept) 9.72 1.68 7.04 13.58 866 1.00

Family Specific Parameters:
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma 6.11 0.55 5.18 7.31 1773 1.00
Bayesian ANOVA
• Effect of the default prior: seems to pull the estimates toward zero (with no overall intercept in the model, the Dosage intercepts are shrunk toward their prior mean of 0)
• Need to reconfigure the model

# compute means of posteriors
post <- posterior_samples(model8)
newMeans <- c()
xLabels <- c(0, 15, 30, 60)
for(i in c(1:length(xLabels))) {
  thisLabel <- paste("r_Dosage[D", toString(xLabels[i]), ",Intercept]", sep="")
  DosageMeanEstimates <- post[, thisLabel]
  newMeans <- c(newMeans, mean(DosageMeanEstimates))
}
range = c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 8", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
Model comparison
• Shrinkage should lead to better prediction
• It does, unless the prior is too constraining (model7)
• Or the prior pulls to something far from the actual values (model8)
• It tends to be a small effect
• Bigger effects for more means
• As we saw in Lecture 5

> print(model_weights(model4, model5, model6, model7, model8, weights="loo"))
     model4      model5      model6      model7      model8
0.214826463 0.273758885 0.298573221 0.001356395 0.211485037
Conclusions
• Shrinkage
  • Side effect of a prior across a group
  • Helps prediction
• Need an “appropriate” prior
  • Not too tight
• For subject-level effects, the defaults in brm seem pretty good for the cases I have tested