Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.3
Why posteriors need to be computed
• If you can find a conjugate prior for your data-parameter model, it's very sensible to use it and to coerce your prior into that form (a closed-form example follows below)
• E.g. if you were going to use a normal prior but then noticed that the gamma is conjugate in your model, convert your normal prior into a gamma and use that
• If not, you need to do some computation
• In the past: this was a big problem that precluded Bayesian analysis outside of the ivory tower
• Now: personal computers make this no problem lah
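A minimal sketch of what conjugacy buys you, assuming a Beta-Binomial model (and borrowing the 98-out-of-727 counts from the H1N1 example later in this lecture):

    # a Beta(a,b) prior is conjugate to binomial data: the posterior is
    # Beta(a+x, b+n-x) in closed form, so nothing needs computing
    a=1; b=1                    # uniform Beta(1,1) prior
    x=98; n=727                 # counts borrowed from the H1N1 example later on
    post_mean=(a+x)/(a+b+n)                        # posterior mean
    post_ci=qbeta(c(0.025,0.975),a+x,b+n-x)        # 95% interval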
What's the challenge?
Frequentist: optimisation is fairly simple
• For ML estimation you need to find the maximum of a surface or curve
• Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing or cross entropy
Bayesian: quadrature is quite difficult
• For Bayesian estimation you need to find the whole posterior, not just its maximum
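As a one-dimensional illustration of the two tasks (a sketch, assuming a flat prior on a binomial success probability): one call to optimise() finds the ML estimate, while the posterior needs a normalising integral before its density can even be evaluated.

    loglik=function(p){dbinom(98,727,p,log=TRUE)}
    # frequentist: a single optimisation gives the ML estimate
    mle=optimise(loglik,interval=c(0,1),maximum=TRUE)$maximum
    # Bayesian: quadrature for the normalising constant before the
    # posterior density can be evaluated anywhere
    nc=integrate(function(p) exp(loglik(p)),0,1)$value
    posterior=function(p){exp(loglik(p))/nc}

In one dimension both calls are easy; the point is that integrate() has no cheap analogue in many dimensions, which is where the next slide picks up.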
The curse of dimensionality
Toy problems
• Illustrate concepts
• Simple data, few complications
• Often just use calculus
• Usually few parameters :o)
Real problems
• Complex data, always loads of complications
• Differentiation is not an option
• Usually lots of parameters :o(
Lots of parameters prohibit grid searches, as the sketch below shows
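A back-of-envelope sketch of why: with k grid points per parameter, a grid search needs k^d density evaluations, which explodes with the number of parameters d.

    k=100                                   # grid points per parameter
    d=c(1,2,5,10)                           # number of parameters
    data.frame(n_params=d,evaluations=k^d)
    # ten parameters already need 1e20 density evaluations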
Simulations
• Assume you're interested in the lengths of elephant tusks
• There will be some distribution of these
• How do you try to estimate that distribution?
Why not?
Simulations
• It is easy to sample statistical distributions (more so than elephants), and large samples can be built up quickly
• If your sample is of size 10 000 or 100 000 then it effectively IS the distribution you're sampling from
• So: sample the posterior, then calculate the things you're interested in, like the posterior mean (the mean of the sample), quantiles, etc (see the sketch below)
• The most popular method is MCMC; others are Monte Carlo and importance sampling
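For instance (a sketch with a known N(0,1) distribution standing in for a real posterior):

    draws=rnorm(100000)                 # pretend this is a posterior sample
    mean(draws)                         # approximates the posterior mean
    quantile(draws,c(0.025,0.975))      # approximates a 95% interval
    mean(draws>1)                       # approximates a posterior probability, e.g. P(theta > 1)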
Monte Carlo
• Capital of Monaco
• Casino
• The roulette wheel gave inspiration to US scientists working on the Manhattan project
• Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
When to use Monte Carlo
• When your posterior distribution is known and you can simulate from it, but it's awkward to do anything else analytically
• Example: you have data (the between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ,σ²) distribution
• x̄ = 3.18, s = 4.42, n = 22
• Prior: (μ,σ²) ~ NSIG(0, 1/100, 1/100, 1/100)
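For reference, the conjugate update implemented in the code on the next slide is the standard normal / scaled-inverse-χ² result (as in, e.g., Gelman et al., Bayesian Data Analysis):
• μₙ = (κ₀μ₀ + n·x̄) / (κ₀ + n)
• κₙ = κ₀ + n
• νₙ = ν₀ + n
• νₙσₙ² = ν₀σ₀² + (n − 1)s² + κ₀n(x̄ − μ₀)² / (κ₀ + n)
so that σ²|data ~ Scaled-Inv-χ²(νₙ, σₙ²) and μ|σ²,data ~ N(μₙ, σ²/κₙ).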
Example Monte Carlo code

    library(geoR)   # for rinvchisq, the scaled inverse chi-squared sampler
    set.seed(666)
    mu_0=0; kappa_0=1; nu_0=1; sigma2_0=100
    ndraws=100000
    # draws from the prior
    prior_sigma2=rinvchisq(ndraws,nu_0,sigma2_0)
    prior_mu=rnorm(ndraws,mu_0,sqrt(prior_sigma2/kappa_0))  # rnorm takes an sd, not a variance
    # data summaries
    xbar=3.18; s=4.42; n=22
    # conjugate update of the four hyperparameters
    mu_n=(kappa_0/(kappa_0+n))*mu_0 + (n/(kappa_0+n))*xbar
    kappa_n=kappa_0+n
    nu_n=nu_0+n
    sigma2_n=(nu_0*sigma2_0 + (n-1)*s*s + kappa_0*n*((xbar-mu_0)^2)/(kappa_0+n))/nu_n
    # draws from the posterior
    posterior_sigma2=rinvchisq(ndraws,nu_n,sigma2_n)
    posterior_mu=rnorm(ndraws,mu_n,sqrt(posterior_sigma2/kappa_n))
    # q indexes a random subsample so the scatter plot is not overplotted
    q=sample(ndraws,5000)
    plot(posterior_mu[q],sqrt(posterior_sigma2[q]),pch=16,col=rgb(0,0,0,0.1),
         xlab=expression(mu),ylab=expression(sigma))
Marginal posteriors

    summariser=function(x){print(paste(
      "Posterior mean is ",signif(mean(x),3),
      ", 95%CI is (",signif(quantile(x,0.025),3),
      ",",signif(quantile(x,0.975),3),")",sep=""),quote=FALSE)}

    summariser(posterior_mu)
    [1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
    summariser(posterior_sigma2)
    [1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
    summariser(sqrt(posterior_sigma2))
    [1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
    summariser(sqrt(posterior_sigma2)/posterior_mu)
    [1] Posterior mean is 1.37, 95%CI is (0.913,5.79)
Wondrousness of Bayesianism
• Once you have a sample from the posterior, the world is your oyster!
• Want a confidence interval for the variance?
• Want a confidence interval for the standard deviation?
• Want a confidence interval for the coefficient of variation?
• How to do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25--32. It's not nice
Problem with Monte Carlo
• Usually an exact form for the posterior cannot be obtained
• Two alternative simulation-based approaches present themselves:
• Importance sampling
• Markov chain Monte Carlo
Importance Sampling
• The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and "correct" the sample by weighting it
• If:
• you want f(θ|data)
• you can simulate from, and evaluate the density of, q(θ)
• you can evaluate the density f(θ|data) up to a constant of proportionality
• then you can use importance sampling
Importance sampling recipe
• Generate a sample θ₁, …, θₘ from q(θ)
• For each draw i, associate a weight wᵢ ∝ f(θᵢ|data)/q(θᵢ)
• Scale the weights so that Σᵢ wᵢ = 1
• The posterior mean of any function g(θ) of interest is then E[g(θ)] ≈ Σᵢ wᵢ g(θᵢ)
• You can get things like confidence intervals by working with the weighted CDF of the marginals (see the sketch below)
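A sketch of that last step: quantiles read off the weighted empirical CDF of an importance sample (the helper name wquantile is just illustrative):

    # quantiles from the weighted empirical CDF of an importance sample
    wquantile=function(theta,w,probs=c(0.025,0.975)){
      o=order(theta)                    # sort the draws
      cdf=cumsum(w[o])/sum(w)           # weighted empirical CDF
      sapply(probs,function(p) theta[o][which(cdf>=p)[1]])
    }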
Example: H1N1
• Naïve version: simulate (p,σ) from U(0,1)², i.e. q(p,σ) = 1 over [0,1]²
• Weight by the posterior and scale the weights to sum to one
Code

    set.seed(666)
    # log posterior: binomial likelihood for 98 cases out of 727, with a
    # Beta(1,1) prior on p and a Beta(630,136) prior on sigma
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # naive proposal: uniform over the unit square, so q(p,sigma)=1
    sampled=list(p=runif(NDRAWS),sigma=runif(NDRAWS))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)  # guard against underflow in exp()
    weights=exp(logposteriors)
    weights=weights/sum(weights)
    plot(sampled$p,sampled$sigma,cex=20*weights,pch=16,
         xlab='p',ylab=expression(sigma),xlim=0:1,ylim=0:1)
Better code

    set.seed(666)
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # better proposal: draw sigma from its Beta(630,136) prior instead of U(0,1)
    sampled=list(p=runif(NDRAWS),sigma=rbeta(NDRAWS,630,136))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)
    # weight = posterior/proposal, so subtract the log proposal density
    weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE))
    weights=weights/sum(weights)
Even better code

    set.seed(666)
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # even better: also draw p from a Beta(20,100), close to where the posterior sits
    sampled=list(p=rbeta(NDRAWS,20,100),sigma=rbeta(NDRAWS,630,136))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)
    # subtract the log proposal densities of both components
    weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE)
                             -dbeta(sampled$p,20,100,log=TRUE))
    weights=weights/sum(weights)
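A common way to quantify "better" is the effective sample size of the normalised weights, 1/Σwᵢ²; a minimal sketch that can be run after any of the three versions above:

    # effective sample size: NDRAWS when all weights are equal,
    # much smaller when a handful of draws carry all the weight
    ess=function(w){1/sum(w^2)}
    ess(weights)   # compare across the three proposal choices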
Problems with importance sampling
• Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted
• An alternative is to proceed iteratively: try the prior first, then tune q to be nearer the area that looks like the posterior, and iterate until a good distribution is found
• Ideally you would sample directly from the posterior, giving equal weights 1/m
• Dealing with a weighted sample is a bit of a nuisance (see the resampling sketch below)
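One standard remedy for that last nuisance is sampling importance resampling (SIR): resample the draws with probabilities given by the weights to get an approximately unweighted posterior sample. A sketch using the objects from the code slides above:

    # resample draw indices in proportion to the importance weights
    idx=sample(seq_len(NDRAWS),size=NDRAWS,replace=TRUE,prob=weights)
    post_p=sampled$p[idx]
    post_sigma=sampled$sigma[idx]
    mean(post_p); quantile(post_p,c(0.025,0.975))   # ordinary summaries now apply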
An alternative
• Finding the posterior is simpler if the computer will automatically move in the right direction, even given a rubbish guess at q
• Markov chain Monte Carlo is an ideal way to achieve this
• Next section!!!