Lecture 7: MCMC for Normal response multilevel models
Lecture Contents • Gibbs sampling recap. • Variance components models. • MCMC algorithm for VC model. • MLwiN MCMC demonstration using Reunion island dataset. • Choice of variance prior. • Residuals, predictions etc. • Chains for derived quantities.
MCMC Methods (Recap) • Goal: To sample from the joint posterior distribution. • Problem: For complex models this involves multidimensional integration. • Solution: It may be possible instead to sample from the conditional posterior distributions of each parameter in turn. • It can be shown that, after convergence, such a sampling approach generates dependent samples from the joint posterior distribution.
Gibbs Sampling (Recap) • When we can sample directly from the conditional posterior distributions, such an algorithm is known as Gibbs sampling. • This proceeds as follows for the linear regression example: • Firstly, give all unknown parameters starting values, e.g. β(0) and σ²(0). • Next, loop through the following steps:
Gibbs Sampling ctd. • Sample β(1) from p(β | y, σ²(0)), then sample σ²(1) from p(σ² | y, β(1)). These steps are then repeated with the generated values from this loop replacing the starting values. The chain of values produced by this procedure is known as a Markov chain, and it is hoped that this chain converges to its equilibrium distribution, which is the joint posterior distribution.
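To make the recap concrete, here is a minimal sketch of the two-step Gibbs sampler for a linear regression in Python. The simulated data, the inverse-Gamma prior parameters (a, b) and all variable names are illustrative assumptions, not part of the lecture or of MLwiN itself.

```python
# Sketch of Gibbs sampling for linear regression, assuming a flat prior on
# beta and an inverse-Gamma(a, b) prior on sigma^2 (so 1/sigma^2 is Gamma).
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # intercept + 1 predictor
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.7, size=n)

a, b = 0.001, 0.001                 # weak inverse-Gamma prior on sigma^2
sigma2 = 1.0                        # starting value
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y        # OLS estimate, reused in every iteration

chain = []
for it in range(5000):
    # Step 1: beta | y, sigma2 ~ N(beta_hat, sigma2 * (X'X)^-1)
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    # Step 2: 1/sigma2 | y, beta ~ Gamma(a + n/2, rate = b + SSR/2)
    resid = y - X @ beta
    sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + resid @ resid / 2))
    chain.append([*beta, sigma2])   # store the dependent samples
```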
Variance Components Model Yesterday you were introduced to multilevel models and in particular the variance components model, for example: yij = β0 + β1x1ij + uj + eij, with uj ~ N(0, σ²u) and eij ~ N(0, σ²e). This can be seen as an extension of a linear model to allow for differing intercepts for each higher level unit, e.g. schools, herds, hospitals.
Random intercepts model [Figure: regression lines with common slope, β0 + β1x1ij, shifted by the school-level residuals u0,1 (school 1) and u0,2 (school 2).] A variance components model with 1 continuous predictor is known as a random intercepts model.
Bayesian formulation of a variance components model To formulate a variance components model in a Bayesian framework we need to add prior distributions for β, σ²u and σ²e. For the Gibbs sampling algorithm that follows we will assume (improper) uniform priors for the fixed effects β, and conjugate inverse Gamma priors for the two variances (in fact we will use inverse χ² priors, which are a special case).
Full Bayesian Model Our model is now: yij = β0 + β1x1ij + uj + eij, uj ~ N(0, σ²u), eij ~ N(0, σ²e), with priors p(β) ∝ 1, 1/σ²u ~ Gamma(εu, εu), 1/σ²e ~ Gamma(εe, εe). We need to set starting values for all parameters to start our Gibbs sampler: β(0), uj(0), σ²u(0) and σ²e(0).
Gibbs Sampling for VC model • Sample β from p(β | y, u, σ²e), • each uj from p(uj | y, β, σ²u, σ²e), • 1/σ²u from p(1/σ²u | u), • and 1/σ²e from p(1/σ²e | y, β, u). These steps are then repeated with the generated values from this loop replacing the starting values. The chain of values produced by this procedure is known as a Markov chain. Note that β is generated as a block while each uj is updated individually.
Algorithm Summary Repeat the following four steps: • 1. Generate β from its (Multivariate) Normal conditional distribution. • 2. Generate each uj from its Normal conditional distribution. • 3. Generate 1/σ²u from its Gamma conditional distribution. • 4. Generate 1/σ²e from its Gamma conditional distribution.
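As an illustration of this four-step loop, the sketch below implements it for a simple random intercepts model. The simulated herd data, the Gamma(eps, eps) prior parameter and all variable names are assumptions for illustration; this is not the MLwiN implementation.

```python
# Sketch of the four-step Gibbs sampler for y_ij = X beta + u_j + e_ij,
# assuming flat priors on beta and Gamma(eps, eps) priors on both precisions.
import numpy as np

rng = np.random.default_rng(2)
J, n_j = 30, 20                         # 30 herds, 20 observations each
group = np.repeat(np.arange(J), n_j)
X = np.column_stack([np.ones(J * n_j)]) # intercept-only fixed part
y = 1.0 + rng.normal(scale=0.5, size=J)[group] + rng.normal(size=J * n_j)

eps = 0.001
u = np.zeros(J)                         # starting values
s2u, s2e = 1.0, 1.0
n = len(y)

for it in range(5000):
    # 1. beta | rest: multivariate Normal, regressing (y - u_j) on X
    ytil = y - u[group]
    V = np.linalg.inv(X.T @ X) * s2e
    m = V @ (X.T @ ytil) / s2e
    beta = rng.multivariate_normal(m, V)
    # 2. each u_j | rest: Normal, shrunk towards 0 by the level-2 precision
    r = y - X @ beta
    for j in range(J):
        rj = r[group == j]
        prec = len(rj) / s2e + 1.0 / s2u
        u[j] = rng.normal((rj.sum() / s2e) / prec, np.sqrt(1.0 / prec))
    # 3. 1/s2u | rest: Gamma(eps + J/2, rate = eps + sum(u^2)/2)
    s2u = 1.0 / rng.gamma(eps + J / 2, 1.0 / (eps + u @ u / 2))
    # 4. 1/s2e | rest: Gamma(eps + n/2, rate = eps + sum(e^2)/2)
    e = r - u[group]
    s2e = 1.0 / rng.gamma(eps + n / 2, 1.0 / (eps + e @ e / 2))
```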
Gibbs Sampling for other models • We have now looked at Gibbs sampling algorithms for 2 models. • From these you should get the general idea of how Gibbs sampling works. • From now on we will assume that, if Gibbs sampling is feasible for a model, the algorithm can be derived in the same way. • When (conjugate) Gibbs sampling is not feasible (see day 5) we will describe alternative methods.
Variance components model for the Reunion Island dataset • Dataset analysed in Dohoo et al. (2001). • Response is log(calving to first service interval). • 3 levels in the data – observations nested within cows nested within herds. • 2 dichotomous predictors, heifer and artificial insemination, neither of which is significant for this response.
Summary for β0 For details see the next lecture!!!
Summary for σ²u So we need to run for longer, but see the practical and next lecture for details.
Running for 100k iterations Little change in estimates and all diagnostics now happy!
Residuals • In MCMC the residuals are part of the model and are updated at each iteration. • By default MLwiN stores the sum and sum of squares for each residual so that it can calculate their mean and sd. • It is possible to store all iterations of all residuals but this takes up lots of memory!
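The storage trick mentioned above is the classical sums-and-sums-of-squares shortcut: keeping only two running totals per residual is enough to report its posterior mean and sd without storing the whole chain. A minimal sketch with illustrative names (not MLwiN code):

```python
# Recover the mean and sd of a residual chain from its running sums alone.
import numpy as np

def running_summary(draws):
    s = ss = n = 0.0
    for r in draws:                  # r = value of one residual at one iteration
        s, ss, n = s + r, ss + r * r, n + 1
    mean = s / n
    sd = np.sqrt((ss - n * mean**2) / (n - 1))   # sample sd from the two sums
    return mean, sd
```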
Herd level residuals in MCMC [Figure: herd-level residuals under MCMC, herd 28 in red, herd 35 in blue, compared to the IGLS estimates.]
Choice of variance prior • MLwiN offers 2 default variance prior choices: • Gamma(ε, ε) priors for the precisions, or • Uniform priors on the variances. Browne (1998) and Browne and Draper (2004) looked at the performance of these priors in detail and compared them with the IGLS and RIGLS methods.
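The prior choice only changes the shape and rate of the Gamma draw for each precision in step 3 of the algorithm. A hedged sketch of the two conditional draws for the herd-level variance, with illustrative names; the Uniform-on-variance case reduces to Gamma(J/2 − 1, Σu²/2), a standard result of the kind examined in Browne and Draper (2004):

```python
# How the two default prior choices change the conditional draw for s2u,
# given the current herd residuals u. Requires len(u) > 2 under the uniform
# prior for the conditional to be proper.
import numpy as np

rng = np.random.default_rng(3)

def draw_s2u(u, prior="gamma", eps=0.001):
    J, ssq = len(u), float(u @ u)
    if prior == "gamma":
        # Gamma(eps, eps) prior on the precision 1/s2u
        shape, rate = eps + J / 2, eps + ssq / 2
    else:
        # Uniform prior on the variance s2u itself
        shape, rate = J / 2 - 1, ssq / 2
    return 1.0 / rng.gamma(shape, 1.0 / rate)   # numpy's gamma takes a scale
```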
Uniform on variance priors: the main difference is a marginal increase in the herd level variance.
Adding in the heifer predictor As can be seen, heifer appears not to be significant at all. We look at model comparison in Lecture 9.
Confidence intervals for Variance Partition Coefficients • In our three level model there are two VPCs, one for herd and one for cow: VPC(herd) = σ²v / (σ²v + σ²u + σ²e) and VPC(cow) = σ²u / (σ²v + σ²u + σ²e), where σ²v, σ²u and σ²e are the herd, cow and observation level variances. • Note that MLwiN stores the parameters stacked in column c1090. If we split this column up we can look at chains for the VPCs.
Commands to create the VPC columns
CODE 5 1 5000 c1089    - creates an indicator column
SPLIT c1090 c1089 c21 c22 c23 c24 c25    - splits up the 5 parameter chains
CALC c26 = c23/(c23+c24+c25)    - calculates the chain for the herd VPC
CALC c27 = c24/(c23+c24+c25)    - calculates the chain for the cow VPC
NAME c26 'HerdVPC' c27 'CowVPC'    - names the columns
Note that Browne (2003) gives a more involved method for working out chains for ranks of residuals, as you will see in the practical!
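For readers working outside MLwiN, here is a Python analogue of these commands. It assumes the stacked chains in c1090 have already been unstacked into an (iterations × 5) array whose columns 2, 3 and 4 hold the herd, cow and residual variance chains (mirroring c23, c24, c25); that layout and all names are assumptions for illustration.

```python
# Compute VPC chains from unstacked parameter chains, then summarise them.
import numpy as np

def vpc_chains(chains):
    # chains: (n_iterations, 5) array, variance columns as described above
    s2v, s2u, s2e = chains[:, 2], chains[:, 3], chains[:, 4]
    total = s2v + s2u + s2e
    return s2v / total, s2u / total   # herd VPC chain, cow VPC chain
```

Quantiles of these chains, e.g. np.percentile(herd_vpc, [2.5, 97.5]), then give the interval estimates the slide title refers to.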
What have we covered • Setting up a model and running it using MCMC in MLwiN. • Looking at residuals, predictions, derived quantities. • Choosing default priors and incorporating informative priors. • Convergence and model comparison are to come … as are links with WinBUGS.
Introduction to Practical • The practical is taken from the MCMC in MLwiN book (Browne 2003) with some modifications. • The tutorial dataset is from education and includes pupils within schools. • You will cover similar material to this demonstration along with some information on convergence diagnostics.