Priors, Normal Models, Computing Posteriors
st5219: Bayesian hierarchical modelling, lecture 2.3
Why posteriors need to be computed
• If you can find a conjugate prior for your data-parameter model, it's very sensible to use it and to coerce your prior into that form (a closed-form example follows below)
• E.g. if you were going to use a normal prior but then noticed that the gamma is conjugate in your model, convert your normal prior into a gamma and use that
• If not, you need to do some computation
• In the past: this was a big problem that precluded Bayesian analysis outside of the ivory tower
• Now: personal computers make this no problem lah
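A minimal sketch of what conjugacy buys you, assuming a Beta-Binomial model (and borrowing the 98-out-of-727 counts from the H1N1 example later in this lecture):

    # a Beta(a,b) prior is conjugate to binomial data: the posterior is
    # Beta(a+x, b+n-x) in closed form, so nothing needs computing
    a=1; b=1                    # uniform Beta(1,1) prior
    x=98; n=727                 # counts borrowed from the H1N1 example later on
    post_mean=(a+x)/(a+b+n)                        # posterior mean
    post_ci=qbeta(c(0.025,0.975),a+x,b+n-x)        # 95% interval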
What's the challenge?
Frequentist: optimisation is fairly simple
• For ML estimation you need to find the maximum of a surface or curve
• Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing or cross entropy
Bayesian: quadrature is quite difficult
• For Bayesian estimation you need to find the whole posterior, not just its maximum
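As a one-dimensional illustration of the two tasks (a sketch, assuming a flat prior on a binomial success probability): one call to optimise() finds the ML estimate, while the posterior needs a normalising integral before its density can even be evaluated.

    loglik=function(p){dbinom(98,727,p,log=TRUE)}
    # frequentist: a single optimisation gives the ML estimate
    mle=optimise(loglik,interval=c(0,1),maximum=TRUE)$maximum
    # Bayesian: quadrature for the normalising constant before the
    # posterior density can be evaluated anywhere
    nc=integrate(function(p) exp(loglik(p)),0,1)$value
    posterior=function(p){exp(loglik(p))/nc}

In one dimension both calls are easy; the point is that integrate() has no cheap analogue in many dimensions, which is where the next slide picks up.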
The curse of dimensionality
Toy problems
• Illustrate concepts
• Simple data, few complications
• Often just use calculus
• Usually few parameters :o)
Real problems
• Complex data, always loads of complications
• Differentiation is not an option
• Usually lots of parameters :o(
Lots of parameters prohibit grid searches, as the sketch below shows
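A back-of-envelope sketch of why: with k grid points per parameter, a grid search needs k^d density evaluations, which explodes with the number of parameters d.

    k=100                                   # grid points per parameter
    d=c(1,2,5,10)                           # number of parameters
    data.frame(n_params=d,evaluations=k^d)
    # ten parameters already need 1e20 density evaluations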
Simulations
• Assume you're interested in the lengths of elephant tusks
• There will be some distribution of these
• How do you try to estimate that distribution?
Why not?
Simulations
• It is easy to sample statistical distributions (more so than elephants), and large samples can be built up quickly
• If your sample is of size 10 000 or 100 000 then it effectively IS the distribution you're sampling from
• So: sample the posterior, then calculate the things you're interested in, like the posterior mean (the mean of the sample), quantiles, etc (see the sketch below)
• The most popular method is MCMC; others are Monte Carlo and importance sampling
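For instance (a sketch with a known N(0,1) distribution standing in for a real posterior):

    draws=rnorm(100000)                 # pretend this is a posterior sample
    mean(draws)                         # approximates the posterior mean
    quantile(draws,c(0.025,0.975))      # approximates a 95% interval
    mean(draws>1)                       # approximates a posterior probability, e.g. P(theta > 1)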
Monte Carlo
• Capital of Monaco
• Casino
• The roulette wheel gave inspiration to US scientists working on the Manhattan project
• Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf
When to use Monte Carlo
• When your posterior distribution is known and you can simulate from it, but it's awkward to do anything else analytically
• Example: you have data (the between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ,σ²) distribution
• x̄ = 3.18, s = 4.42, n = 22
• Prior: (μ,σ²) ~ NSIG(0, 1/100, 1/100, 1/100)
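For reference, the conjugate update implemented in the code on the next slide is the standard normal / scaled-inverse-χ² result (as in, e.g., Gelman et al., Bayesian Data Analysis):
• μₙ = (κ₀μ₀ + n·x̄) / (κ₀ + n)
• κₙ = κ₀ + n
• νₙ = ν₀ + n
• νₙσₙ² = ν₀σ₀² + (n − 1)s² + κ₀n(x̄ − μ₀)² / (κ₀ + n)
so that σ²|data ~ Scaled-Inv-χ²(νₙ, σₙ²) and μ|σ²,data ~ N(μₙ, σ²/κₙ).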
Example Monte Carlo code

    library(geoR)   # for rinvchisq, the scaled inverse chi-squared sampler
    set.seed(666)
    mu_0=0; kappa_0=1; nu_0=1; sigma2_0=100
    ndraws=100000
    # draws from the prior
    prior_sigma2=rinvchisq(ndraws,nu_0,sigma2_0)
    prior_mu=rnorm(ndraws,mu_0,sqrt(prior_sigma2/kappa_0))  # rnorm takes an sd, not a variance
    # data summaries
    xbar=3.18; s=4.42; n=22
    # conjugate update of the four hyperparameters
    mu_n=(kappa_0/(kappa_0+n))*mu_0 + (n/(kappa_0+n))*xbar
    kappa_n=kappa_0+n
    nu_n=nu_0+n
    sigma2_n=(nu_0*sigma2_0 + (n-1)*s*s + kappa_0*n*((xbar-mu_0)^2)/(kappa_0+n))/nu_n
    # draws from the posterior
    posterior_sigma2=rinvchisq(ndraws,nu_n,sigma2_n)
    posterior_mu=rnorm(ndraws,mu_n,sqrt(posterior_sigma2/kappa_n))
    # q indexes a random subsample so the scatter plot is not overplotted
    q=sample(ndraws,5000)
    plot(posterior_mu[q],sqrt(posterior_sigma2[q]),pch=16,col=rgb(0,0,0,0.1),
         xlab=expression(mu),ylab=expression(sigma))
Marginal posteriors

    summariser=function(x){print(paste(
      "Posterior mean is ",signif(mean(x),3),
      ", 95%CI is (",signif(quantile(x,0.025),3),
      ",",signif(quantile(x,0.975),3),")",sep=""),quote=FALSE)}

    summariser(posterior_mu)
    [1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
    summariser(posterior_sigma2)
    [1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
    summariser(sqrt(posterior_sigma2))
    [1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
    summariser(sqrt(posterior_sigma2)/posterior_mu)
    [1] Posterior mean is 1.37, 95%CI is (0.913,5.79)
Wondrousness of Bayesianism
• Once you have a sample from the posterior, the world is your oyster!
• Want a confidence interval for the variance?
• Want a confidence interval for the standard deviation?
• Want a confidence interval for the coefficient of variation?
• How to do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25--32. It's not nice
Problem with Monte Carlo
• Usually an exact form for the posterior cannot be obtained
• Two alternative simulation-based approaches present themselves:
• Importance sampling
• Markov chain Monte Carlo
Importance Sampling
• The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and "correct" the sample by weighting it
• If:
• you want f(θ|data)
• you can simulate from, and evaluate the density of, q(θ)
• you can evaluate the density f(θ|data) up to a constant of proportionality
• then you can use importance sampling
Importance sampling recipe
• Generate a sample θ₁, …, θₘ from q(θ)
• For each draw i, associate a weight wᵢ ∝ f(θᵢ|data)/q(θᵢ)
• Scale the weights so that Σᵢ wᵢ = 1
• The posterior mean of any function g(θ) of interest is then E[g(θ)] ≈ Σᵢ wᵢ g(θᵢ)
• You can get things like confidence intervals by working with the weighted CDF of the marginals (see the sketch below)
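A sketch of that last step: quantiles read off the weighted empirical CDF of an importance sample (the helper name wquantile is just illustrative):

    # quantiles from the weighted empirical CDF of an importance sample
    wquantile=function(theta,w,probs=c(0.025,0.975)){
      o=order(theta)                    # sort the draws
      cdf=cumsum(w[o])/sum(w)           # weighted empirical CDF
      sapply(probs,function(p) theta[o][which(cdf>=p)[1]])
    }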
Example: H1N1
• Naïve version: simulate (p,σ) from U(0,1)², i.e. q(p,σ) = 1 over [0,1]²
• Weight by the posterior and scale the weights to sum to one
Code

    set.seed(666)
    # log posterior: binomial likelihood for 98 cases out of 727, with a
    # Beta(1,1) prior on p and a Beta(630,136) prior on sigma
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # naive proposal: uniform over the unit square, so q(p,sigma)=1
    sampled=list(p=runif(NDRAWS),sigma=runif(NDRAWS))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)  # guard against underflow in exp()
    weights=exp(logposteriors)
    weights=weights/sum(weights)
    plot(sampled$p,sampled$sigma,cex=20*weights,pch=16,
         xlab='p',ylab=expression(sigma),xlim=0:1,ylim=0:1)
Better code

    set.seed(666)
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # better proposal: draw sigma from its Beta(630,136) prior instead of U(0,1)
    sampled=list(p=runif(NDRAWS),sigma=rbeta(NDRAWS,630,136))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)
    # weight = posterior/proposal, so subtract the log proposal density
    weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE))
    weights=weights/sum(weights)
Even better code

    set.seed(666)
    logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
      dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
    NDRAWS=10000
    # even better: also draw p from a Beta(20,100), close to where the posterior sits
    sampled=list(p=rbeta(NDRAWS,20,100),sigma=rbeta(NDRAWS,630,136))
    logposteriors=logpost(sampled$p,sampled$sigma)
    logposteriors=logposteriors-max(logposteriors)
    # subtract the log proposal densities of both components
    weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE)
                             -dbeta(sampled$p,20,100,log=TRUE))
    weights=weights/sum(weights)
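A common way to quantify "better" is the effective sample size of the normalised weights, 1/Σwᵢ²; a minimal sketch that can be run after any of the three versions above:

    # effective sample size: NDRAWS when all weights are equal,
    # much smaller when a handful of draws carry all the weight
    ess=function(w){1/sum(w^2)}
    ess(weights)   # compare across the three proposal choices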
Problems with importance sampling
• Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted
• An alternative is to proceed iteratively: try the prior first, then tune q to be nearer the area that looks like the posterior, and iterate until a good distribution is found
• Ideally you would sample directly from the posterior, giving equal weights 1/m
• Dealing with a weighted sample is a bit of a nuisance (see the resampling sketch below)
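One standard remedy for that last nuisance is sampling importance resampling (SIR): resample the draws with probabilities given by the weights to get an approximately unweighted posterior sample. A sketch using the objects from the code slides above:

    # resample draw indices in proportion to the importance weights
    idx=sample(seq_len(NDRAWS),size=NDRAWS,replace=TRUE,prob=weights)
    post_p=sampled$p[idx]
    post_sigma=sampled$sigma[idx]
    mean(post_p); quantile(post_p,c(0.025,0.975))   # ordinary summaries now apply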
An alternative
• Finding the posterior is simpler if the computer will automatically move in the right direction, even given a rubbish guess at q
• Markov chain Monte Carlo is an ideal way to achieve this
• Next section!!!