1 / 36

Bayesian Model Checking and Parameter Tests

Learn strategies for model fit checks, parameter testing, and simulation-based methods. Explore examples like rating professors and risky behaviors. Discover the pros and cons of each approach.

epartain
Download Presentation

Bayesian Model Checking and Parameter Tests

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 36-463/663: Multilevel & Hierarchical Models Bayesian Model Checking & Parameter Tests Brian Junker 132E Baker Hall brian@stat.cmu.edu TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: AAA

  2. Outline • Model Checking • Two strategies for checking model fit • DIC • Simulation checks/fake data/posterior predictive checks • Example: Rating professors • Parameter Tests • Testing a difference by simulation • Example: Risky Behaviors

  3. Model Checking

  4. Two main strategies for checking model fit • Overall index of fit based on the deviance • AIC = D(M) + 2 k • BIC = D(M) + k log(n) • DIC = meansim(D(M)) + 2 kD • Simulation-based checks • “Fake data” methods • Posterior predictive checks

  5. Overall penalized-deviance indices • Pro’s: • Quick-and-dirty (a drop of 2 is marginally interesting, a drop of 3 is interesting, a drop of 10 is a big deal) • Tells you whether one model “fits better” overall than another • Con’s: • Doesn’t tell whether the “best” model really fits • If a model doesn’t fit, doesn’t tell you what went wrong

  6. Simulation-based checks • Basic idea: • Simulate data from the fitted model • Does the original data look like it could have come from this simulation? • Implementations • Eyeball a graph • Make a hypothesis test. Use the simulated “fake data” as the “null hypothesis” distribution • If the model fits, you will not reject the null when you test the “real data”

  7. Simulation-based checks • Con’s • The method is not automatic • You have to choose a “feature” of the data or model that you want to check • You have to program the simulation • Pro’s • You can “see” what went wrong with the model, and sometimes fix it • IDEALLY, DIC [or AIC or BIC] and simulation checks work together • Simulation checks illustrate what made DIC go down

  8. Example – Instructor Ratings • 463 “rate the professor” evaluations on 94 professors in 31 classroom settings [data in week12 area of website] • Several other covariates on the professors • tenured? female? minority? age … etc. • Several other covariates on the classes • number of students, number who filled out evaluation, “rate the course” evaluations … etc. • Beauty rating of each instructor (average of 6 student ratings) • Does beauty predict professor evaluations, after controlling for other covariates?

  9. > display(lm.1) lm(formula = eval ~ beauty) coef.est coef.se (Intercept) 4.01 0.03 beauty 0.13 0.03 --- n = 463, k = 2 residual sd = 0.55, R-Squared = 0.04 rube.lm.1 <- "model { for (i in 1:n) { eval[i] ~ dnorm(mu[i],sig2inv) mu[i] <- b0 + b1*beauty[i] } b0 ~ dnorm(0,.000001) b1 ~ dnorm(0,.000001) sig2inv <- pow(sig,-2) sig ~ dunif(0,100) }" (rube.lm.1.fit <- rube(rube.lm.1,data.list,rube.lm.1.inits, parameters.to.save = c("b0", "b1", "sig"), n.chains=3)) mean sd b0 4.010 0.0252 b1 0.133 0.0315 sig 0.547 0.0181 deviance 753.594 2.4783 Basic model: eval ~ beauty

  10. rube.lm.1.new <- "model { for (i in 1:n) { eval[i] ~ dnorm(mu[i],sig2inv) mu[i] <- b0 + b1*beauty[i] } b0 ~ dnorm(0,.000001) b1 ~ dnorm(0,.000001) sig2inv <- pow(sig,-2) sig ~ dunif(0,1) for (i in 1:n) { neweval[i] ~ dnorm(mu[i],sig2inv) } }" attach(rube.lm.1.fit$sims.list) b0.hat <- mean(b0) b1.hat <- mean(b1) sig.hat <- mean(sig) detach() rube.lm.1.new.inits <- function(){ list (b1=b1.hat, b0=b0.hat, sig=sig.hat) } rube.lm.1.new.fit <- rube(rube.lm.1.new, data.list,rube.lm.1.new.inits, parameters.to.save = c("neweval"), n.iter=400, n.chains=1) neweval <- rube.lm.1.new.fit$sims.list$neweval dim(neweval) # [1] 200 463 Making a sample for simulation checks… Starting at last best estimates This simulates new evaluations

  11. We think “female” may matter so look at, e.g., residuals based on “female” original data replication 1 Are there too many/ too few neg residuals? replication 2 replication 3 x = male o = female

  12. Check the proportion of negative residuals overall (eval.resid.neg <- sum(eval < b0.hat + b1.hat*beauty)/n) # [1] 0.4492441 neweval.resid.neg <- apply(neweval,1, function(x) sum(x < b0.hat + b1.hat*beauty)/n) par(mfrow=c(1,1)) hist(neweval.resid.neg) abline(v=eval.resid.neg, col="Red") sum(neweval.resid.neg < eval.resid.neg)/n.sims # [1] 0.02

  13. Check proportion of negative residuals for females f.eval <- eval[female==1] f.beauty <- beauty[female==1] f.n <- sum(female==1) f.neweval <- neweval[,female==1] (f.eval.resid.neg <- sum(f.eval < b0.hat + b1.hat*f.beauty)/f.n) # [1] 0.5282051 f.neweval.resid.neg <- apply(f.neweval,1,function(x) sum(x < b0.hat + b1.hat*f.beauty)/f.n) par(mfrow=c(1,1)) hist(f.neweval.resid.neg) abline(v=f.eval.resid.neg,col="Red") sum(f.neweval.resid.neg < f.eval.resid.neg)/n.sims # [1] 0.69

  14. Check proportion of negative residuals for males m.eval <- eval[female==0] m.beauty <- beauty[female==0] m.n <- sum(female==0) m.neweval <- neweval[,female==0] m.neweval <- neweval[,female==0] (m.eval.resid.neg <- sum(m.eval < b0.hat + b1.hat*m.beauty)/m.n) # [1] 0.3917910 m.neweval.resid.neg <- apply(m.neweval,1,function(x) sum(x < b0.hat + b1.hat*m.beauty)/m.n) par(mfrow=c(1,1)) hist(m.neweval.resid.neg) abline(v=m.eval.resid.neg,col="Red") sum(m.neweval.resid.neg<m.eval.resid.neg)/n.sims # [1] 0

  15. Conclusions so far… • It is worth putting “female” in the model… • In the R notes for this class I also checked to see if “course.id” should be in the model, again using simulation checks. • “course.id” should also be in the model. • For variable selection in lm(), stepAIC is faster! • But this illustrates the idea…

  16. Other models… • I tried several other models using just these variables… • summary(lm.2 <- lm(eval ~ beauty + female)) • summary(lm.8 <- lm(eval ~ beauty + as.factor(course.id))) • summary(lm.9 <- lm(eval ~ beauty + female + as.factor(course.id))) • summary(lm.10 <- lm(eval ~ beauty*female + as.factor(course.id))) • lm.10 seemed to work the best…

  17. summary(lm.10)… Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.13988 0.03799 108.966 < 2e-16 *** beauty 0.21823 0.04595 4.749 2.79e-06 *** female -0.22802 0.05259 -4.336 1.81e-05 *** beauty:female -0.13716 0.06480 -2.117 0.034873 * factor(course.id)1 0.47509 0.23538 2.018 0.044171 * factor(course.id)2 0.51671 0.36603 1.412 0.158774 factor(course.id)3 -0.13889 0.18437 -0.753 0.451685 factor(course.id)4 -0.24897 0.12202 -2.040 0.041922 * factor(course.id)5 -0.02639 0.25883 -0.102 0.918834 … • Some course.id coefficients are significant, most are not • Should we be treating course.id as a random effect instead?

  18. To check whether course.id should be a random effect… • Fit lm.10 in WinBUGS • Re-run WinBUGS to simulate new response data “neweval” • Look at the spread in the simulated data, vs the spread of the original data “eval”, especially between groups defined by “course.id” • Details in R handout • Example picture on next slide…

  19. Examining Between-Groups Spread with Boxplots… Replication 2 Replication 1 red = eval green = neweval Replication 4 Replication 3

  20. General conclusions… • The red boxplots (eval) seem less spread out than the green boxplots (replications of neweval) • It is probably worth it to fit course.id as a random effect • A more formal test might look at the between-group variance (variance of the group means) for the different course.id’s.

  21. rube.lmer.10 <- "model { for (i in 1:n) { eval[i] ~ dnorm(mu[i],sig2inv) mu[i] <- a0[course.id[i]+1] + b1*beauty[i] + b2*female[i] + b3*beauty[i]*female[i] } for (j in 1:J) { a0[j] ~ dnorm(b0,tau2inv) } b0 ~ dnorm(0,.0001) b1 ~ dnorm(0,.0001) b2 ~ dnorm(0,.0001) b3 ~ dnorm(0,.0001) tau2inv <- pow(tau,-2) tau ~ dunif(0,100) sig2inv <- pow(sig,-2) sig ~ dunif(0,100) }" data.list <- list(eval=eval,beauty=beauty,female=female, course.id=course.id,n=length(beauty), J=length(unique(course.id))) rube.lmer.10.inits <- function(){ list (b0=rnorm(1),b1=rnorm(1),b2=rnorm(1),b3=rnorm(1), a0=rnorm(J),tau=runif(1,0,10),sig=runif(1,0,10)) } rube(rube.lmer.10,data.list,rube.lmer.10.inits) rube.lmer.10.fit <- rube(rube.lmer.10,data.list,rube.lmer.10.inits, parameters.to.save=c("b0","b1", "b2", "b3", "a0", "sig","tau"), n.chains=3) Fitting the model with course.id as a random effect…

  22. Results of fit mean sd b0 4.090 0.0677 b1 0.213 0.0438 b2 -0.211 0.0515 b3 -0.131 0.0655 a0[1] 4.131 0.0378 a0[2] 4.364 0.1941 ... sig 0.517 0.0174 tau 0.260 0.0756

  23. Ideally, DIC and simulation checks should work together (DIC <- c(rube.lm.1.fit$DIC,rube.lm.10.fit$DIC,rube.lmer.10.fit$DIC)) [1] 756.510 732.603 723.647 • lm.1: eval ~ beauty • lm.10: eval ~ beauty * female + factor(course.id) • lmer.10: eval ~ beauty * female + (1 | course.id) • So in this case, they do work together…

  24. What about AIC, BIC? • Usual rules of thumb • difference of 2 marginally intersting • difference of 3 interesting • difference of 10 a big deal

  25. Testing a contrast by simulation

  26. Using Posterior Simulation to Test Hypotheses about Parameters • The “Risky Behaviors” data set (you worked with on earlier hw) compares two treatements (couples counseling and women-only counseling) with a null control group, for reducing HIV infection. • The variables are: sex : "woman","man" couples : 0/1: in couples counseling?women_alone: 0/1: in women-only counseling? bs_hiv : HIV status at baseline: "negative","positive" bupacts : number of unprotected sex acts @ baseline fupacts : number of unprotected sex acts @ end of study id : respondent id (1:434) • Recall that fupacts has to be rounded (always data cleaning…)

  27. One possible model… > summary(glmer.1 <- glmer(fupacts ~ couples + women_alone + bupacts + bs_hiv + (1|id), data=rb, family=poisson)) Random effects: Groups Name Variance Std.Dev. id (Intercept) 3.2374 1.7993 Number of obs: 434, groups: id, 434 Fixed effects: Estimate Std. Error z value Pr(>|z|) (Intercept) 1.666481 0.186823 8.920 < 2e-16 *** couples -0.759627 0.231159 -3.286 0.00102 ** women_alone -0.974322 0.235538 -4.137 3.53e-05 *** bupacts 0.021172 0.002847 7.438 1.03e-13 *** bs_hivpositive -1.080734 0.239413 -4.514 6.36e-06 *** • Clearly both treatments are better than the null control • Counseling only women looks better than counseling couples – is this real?

  28. The lazy statistician’s answer: Do the CI’s overlap? est <- coef(summary(glmer.1))[2:3,1:2] est[1,1] + c(-2,2)*est[1,2] est[2,1] + c(-2,2)*est[2,2] plot(c(est[1,1] + c(-2,2)*est[1,2], est[2,1] + c(-2,2)*est[2,2]),c(1,1,2,2), type="n",ylim=c(0,3), xlab="CI for coefficient", axes=F,ylab="condition") lines(c(est[1,1] + c(-2,2)*est[1,2]),c(1,1)) lines(c(est[2,1] + c(-2,2)*est[2,2]),c(2,2)) box() axis(1) axis(2,at=c(1,2), labels=c("couples", "women only"))

  29. A slightly less (?) lazy answer... n.sim <- 1000 beta.c <- rnorm(n.sim,est[1,1],est[1,2]) beta.w <- rnorm(n.sim,est[2,1],est[2,2]) plot(density(beta.c - beta.w), main="Simulated difference between coefficient estimates", xlab="beta(couples) - beta(women only)") abline(v=0,col="red") pval <- sum(beta.c - beta.w < 0)/n.sim legend("topleft",legend=paste("P[diff<0] =",round(pval,2)),lty=1,col="red") cor(beta.c,beta.w) # [1] .0455817 contour(kde2d(beta.c,beta.w)) 0.25 0.25 • The ’s have asymptotically normal distributions (under resampling of the data); simulate from that? From library(MASS)

  30. A substantially better answer... (mu <- est[,1]) # couples women_alone # -0.7596268 -0.9743223 (Sigma <- vcov(glmer.1)[2:3,2:3]) # 2 x 2 Matrix of class "dsyMatrix" # [,1] [,2] # [1,] 0.05343457 0.02872367 # [2,] 0.02872367 0.05547830 n.sim <- 1000 sims.1 <- mvrnorm(n.sim,mu,Sigma) beta.c <- sims.1[,1] beta.w <- sims.1[,2] plot(density(beta.c - beta.w), main="Simulated difference between coefficient estimates", xlab="beta(couples) - beta(women only)") abline(v=0,col="red") pval <- sum(beta.c - beta.w < 0)/n.sim legend("topleft",legend=paste("P[diff<0] =",round(pval,2)),lty=1,col="red") cor(beta.c,beta.w) # [1] 0.490836 contour(kde2d(beta.c,beta.w)) 0.16 From library(MASS) From library(MASS) • The asymptotic distribution of the ’s is correlated normal; simulate from that!

  31. An Even Better Answer… • The “normality” of the ’s is only asymptotic. • In fact, the ’s have a distribution that is not quite normal (because, for example, the data is Poisson and not normal). • We can get the exact distribution of the ’s from a Bayesian model. • We next fit the Bayesian model, and look at a simulation of the difference from the posterior distribution of that model.

  32. Setting up and fitting the Bayesian model… rube.glmer.1 <- "model{ # LEVEL 1 for (i in 1:n) { fupacts[i] ~ dpois(lambda[i]) log(lambda[i]) <- a0[i] + b1*couples[i] + b2*women_alone[i] + b3*bupacts[i] + b4*bs_hiv[i] } # LEVEL 2 for (i in 1:n) { a0[i] ~ dnorm(b0,tau2inv) } # LEVEL 3 b0 ~ dnorm(0,.0001); b1 ~ dnorm(0,.0001) b2 ~ dnorm(0,.0001); b3 ~ dnorm(0,.0001) b4 ~ dnorm(0,.0001); tau2inv <- 1/tau2 tau2 ~ dunif(0,10) }" data.list <- as.list(rb) data.list$bs_hiv <- ifelse(data.list$bs_hiv=="negative",0,1) data.list$n <- dim(rb)[1] inits <- function() { list(b0=rnorm(1), b1=rnorm(1), b2=rnorm(1), b3=rnorm(1), b4=rnorm(1), tau2=runif(1,0,10)) } run.glmer.1 <- rube(rube.glmer.1, data.list, inits, parameters.to.save=c("b1","b2","tau2"), n.chains=3) p3(run.glmer.1) Yikes! …so this is what unconverged MCMC samples look like…

  33. Some possible fixes… • Explore the full posterior… • just very diffuse? • actually has secondary modes? • Try a set of initial values from the glmer fit… • initial values from a “simpler” model! co <- coef(summary(glmer.1)) inits <- function() { list(b0=rnorm(1,co[1,1],co[1,2]), b1=rnorm(1,co[2,1],co[2,2]), b2=rnorm(1,co[3,1],co[3,2]), b3=rnorm(1,co[4,1],co[4,2]), b4=rnorm(1,co[5,1],co[5,2]), tau2=runif(1,0,5)) } run.glmer.1 <- rube(rube.glmer.1, data.list, inits, parameters.to.save = c("b1", "b2", "tau2"), n.chains=3) p3(run.glmer.1) Still not great, but better… (other params similar)

  34. Posterior simulation of the difference… beta.c <- run.glmer.1$sims.list$b1 beta.w <- run.glmer.1$sims.list$b2 n.sim <- length(beta.c) plot(density(beta.c - beta.w), main="Simulated difference between coefficient estimates", xlab="beta(couples) - beta(women only)") abline(v=0,col="red") pval <- sum(beta.c - beta.w < 0)/ n.sim legend("topleft",legend=paste("P[ diff<0] =", round(pval,2)), lty=1, col="red") cor(beta.c,beta.w) #[1] 0.6314137 contour(kde2d(beta.c,beta.w)) 0.21 • We now simulate from the joint posterior distribution of the ’s.

  35. Simulation-based testing (summary) • We can simulate “fake data” to check the fit of the model • Does a statistic from the real data “fit in” with fake data simulation of the statistic from the fitted model? • Plug in ’s and simulate data from that model (parametric bootstrap) • Simulate data from the “posterior predictive” distribution (posterior predictive checks) • We can simulate from • the [likelihood-based] asymptotic distribution of the parameter estimates, or • the [Bayesian] posterior distribution, to construct tests directly on the parameters themselves.

  36. Summary • Model Checking • Two strategies for checking model fit • DIC • Simulation checks/fake data/posterior predictive checks • Example: Rating professors • Parameter Tests • Testing a difference by simulation • Example: Risky Behaviors

More Related