
Lecture 8




  1. Lecture 8 Other MCMC features in MLwiN and the MLwiN->WinBUGS interface

  2. Lecture Contents • Convergence diagnostics. • Starting values. • MLwiN demonstration. • MLwiN to WinBUGS interface. • t distributed residuals example.

  3. Recap of practical session The practical covered two features of MCMC sampling that we will cover in more detail here: • Convergence diagnostics and summary statistics. • Choosing starting values for unknown parameters.

  4. Reunion Island Model revisited We first fitted the following model to the dataset:

  5. Trajectories window • By default the window shows the last 500 iterations. • Clearly some chains mix better than others!

  6. MCMC diagnostics for β0 • We will describe each pane separately.

  7. Trace plot • This graph plots the generated values of the parameter against the iteration number. • A crude test of mixing is the ‘blue finger’ test. • This chain doesn’t mix that well but could be worse!

  8. Kernel Density plot This plot is like a smoothed histogram. Instead of counting the estimates into bins of particular widths like a histogram, the effect of each iteration is spread around the estimate via a Kernel function e.g. a normal distribution. This means that at each point we get the sum of the Kernel function parts for each iteration. The Kernel density plot has a smoothness parameter that can be modified.
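The idea of spreading each iteration's contribution via a kernel can be sketched in Python with a Gaussian kernel (a hypothetical helper, not MLwiN's implementation; the bandwidth argument plays the role of the smoothness parameter):

```python
import math

def kernel_density(chain, x, bandwidth):
    """Gaussian kernel density estimate at point x: each iteration
    contributes a normal bump centred on its sampled value."""
    total = 0.0
    for v in chain:
        z = (x - v) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total / (len(chain) * bandwidth)

# A larger bandwidth gives a smoother, flatter density estimate.
```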

  9. Time series diagnostics Here we have the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The ACF measures how correlated the values in the chain are with their close neighbours; the lag is the distance apart in the chain of the two values being compared. An independent chain will have approximately zero autocorrelation at each lag. A Markov chain should have a power relationship in the lags, i.e. if ACF(1) = ρ then ACF(2) = ρ², etc. This is known as an AR(1) process. The PACF measures discrepancies from such a process and so should normally have values ≈ 0 after lag 1.
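The sample ACF described above can be sketched in a few lines of Python (a hypothetical helper, not MLwiN's implementation):

```python
def acf(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain)
    cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
              for i in range(n - lag))
    return cov / var

# For an AR(1) chain with parameter rho, acf(chain, 2) should be
# close to acf(chain, 1) ** 2, and the PACF beyond lag 1 near zero.
```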

  10. Accuracy Diagnostics MLwiN has 2 accuracy diagnostics: • Raftery-Lewis works on quantiles of the distribution. • Brooks-Draper works on quoting the mean to n significant figures. Its formula uses the estimate, its s.d. and the ACF, and it can often give very small or very large values!

  11. Raftery Lewis Diagnostic This diagnostic is based on the properties of a 2 state Markov chain. For a given quantile the chain is dichotomized into values that are greater (1) or less than (0) the value of the quantile. For an independent chain the 0’s and 1’s should appear randomly. If there is large clustering (in time) of 0’s or 1’s then the chain is mixing badly.
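The dichotomisation step described above is easy to sketch in Python (hypothetical helpers; the real Raftery-Lewis diagnostic then fits a 2-state Markov chain to the indicator sequence):

```python
def dichotomise(chain, quantile_value):
    """Recode the chain as 1 (greater than the quantile value)
    or 0 (less than or equal); long runs of identical indicators
    signal poor mixing."""
    return [1 if v > quantile_value else 0 for v in chain]

def longest_run(indicators):
    """Length of the longest run of identical 0s or 1s."""
    best = run = 1
    for a, b in zip(indicators, indicators[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best
```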

  12. Summary Statistics Three estimates of location are given: Mean – from the chain. Mode – from the kernel plot. Median (50% quantile) – by sorting the chain and finding the middle value. The SD is calculated from the chain and the other quantiles are used to give (possibly) non-symmetric interval estimates. The MCSE and ESS will be discussed next.
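The location estimates and quantile-based interval can be read straight off the sorted chain, as this sketch shows (a hypothetical helper using simple index-based quantiles, not MLwiN's exact interpolation):

```python
def chain_summary(chain):
    """Mean, median and a (possibly asymmetric) 95% interval
    taken from the sorted chain."""
    s = sorted(chain)
    n = len(s)
    mean = sum(s) / n
    median = s[n // 2] if n % 2 else 0.5 * (s[n // 2 - 1] + s[n // 2])
    lower = s[int(0.025 * (n - 1))]   # 2.5% quantile
    upper = s[int(0.975 * (n - 1))]   # 97.5% quantile
    return mean, median, (lower, upper)
```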

  13. Monte Carlo Standard Error The Monte Carlo Standard Error (MCSE) is an indication of how much error is in the estimate due to the fact that MCMC is used. As the number of iterations increases, the MCSE → 0. For an independent sampler it equals SD/√n; however, it is adjusted due to the autocorrelation in the chain. The graph above gives estimates of the MCSE for longer runs.
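One way to make this adjustment concrete is to inflate SD/√n by the summed autocorrelations; this is a sketch under that assumption (a hypothetical helper with a fixed lag cutoff, not MLwiN's exact estimator):

```python
import math

def mcse(chain, max_lag=20):
    """MCSE = SD/sqrt(n) inflated by chain autocorrelation;
    an independent chain gives approximately SD/sqrt(n)."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain) / n

    def rho(lag):
        cov = sum((chain[i] - mean) * (chain[i + lag] - mean)
                  for i in range(n - lag)) / n
        return cov / var

    kappa = 1.0 + 2.0 * sum(rho(k) for k in range(1, max_lag + 1))
    return math.sqrt(var * kappa / n)
```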

  14. Effective Sample Size This quantity gives an estimate of the equivalent number of independent iterations that the chain represents. This is related to the ACF and the MCSE. Its formula is ESS = n / κ, where κ = 1 + 2 Σk ρ(k) and ρ(k) is the autocorrelation at lag k. For this parameter our 5,000 actual iterations are equivalent to only 344 independent iterations!
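The ESS formula can be sketched as follows (a hypothetical helper that truncates the ACF sum once the autocorrelations become small; real implementations differ in their truncation rule):

```python
def effective_sample_size(chain, max_lag=50):
    """ESS = n / (1 + 2 * sum of autocorrelations): the number of
    independent draws the autocorrelated chain is equivalent to."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain) / n
    kappa = 1.0
    for lag in range(1, max_lag + 1):
        rho = sum((chain[i] - mean) * (chain[i + lag] - mean)
                  for i in range(n - lag)) / (n * var)
        if rho < 0.05:      # truncate once the ACF has died away
            break
        kappa += 2.0 * rho
    return n / kappa
```

A strongly autocorrelated chain (such as the σ²u chain above) gives an ESS far below the actual number of iterations, while a nearly independent chain gives an ESS close to n.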

  15. MCMC diagnostics for σ²u This is the worst mixing parameter, with a suggestion that a run of over 100k iterations is required.

  16. MCMC diagnostics for σ²u after 100,000 iterations Note that now the diagnostics are all satisfied.

  17. Starting Values in MLwiN By default MLwiN uses the values currently stored for each parameter. As we usually run MCMC directly after IGLS, this means we use IGLS estimates as starting values. These values are a good place to start and hence the burnin will often be short. This is OK for most multilevel models as we know they should be unimodal. However, it is better to try several starting values to ensure we have convergence. WinBUGS allows multiple chains to be run, which permits multiple-chain MCMC diagnostics.

  18. Diagnostics in WinBUGS WinBUGS uses a multiple chain diagnostic by Brooks, Gelman and Rubin. This is based on the ratio of between:within chain variance (ANOVA). WinBUGS produces plots of: • Average 80% interval within-chains (blue) and pooled 80% interval between chains (green), which should converge to stable values. • Ratio of pooled:average interval widths (red), which should converge to 1.0. • To print values of the diagnostic, double click on the plot and press CTRL and the left mouse button.
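The interval-width ratio at the heart of this diagnostic can be sketched in Python (a hypothetical helper using simple index-based quantiles; the WinBUGS plot shows this ratio evolving over the run rather than a single number):

```python
def bgr_ratio(chains):
    """Brooks-Gelman-Rubin style interval diagnostic: ratio of the
    pooled 80% interval width to the average within-chain width.
    Values near 1.0 suggest the chains have mixed together."""
    def width80(chain):
        s = sorted(chain)
        n = len(s)
        return s[int(0.9 * (n - 1))] - s[int(0.1 * (n - 1))]

    pooled = [v for chain in chains for v in chain]
    avg_within = sum(width80(c) for c in chains) / len(chains)
    return width80(pooled) / avg_within
```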

  19. MCSE in WinBUGS Accuracy of the posterior estimates can be assessed by the Monte Carlo standard error for each parameter. If the samples were generated independently we could estimate the SE of the mean as √(S²/n), where S² is the sample variance. This will underestimate the true MCSE due to autocorrelation in the chain. WinBUGS uses a ‘batch means’ method – replacing the sample variance S² by the variance of batch means that are assumed independent. MLwiN instead replaces the actual sample size with the effective sample size.
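A batch-means MCSE can be sketched as follows (a hypothetical helper under the assumption of equal-sized batches; the exact batching rule WinBUGS uses may differ):

```python
import math

def batch_means_mcse(chain, n_batches=20):
    """Batch-means MCSE: split the chain into batches, treat the
    batch means as (approximately) independent, and take the
    standard error of their mean."""
    batch_size = len(chain) // n_batches
    means = [sum(chain[i * batch_size:(i + 1) * batch_size]) / batch_size
             for i in range(n_batches)]
    grand = sum(means) / n_batches
    var_means = sum((m - grand) ** 2 for m in means) / (n_batches - 1)
    return math.sqrt(var_means / n_batches)
```

Batching works because averaging over a batch much longer than the autocorrelation length washes out the within-chain dependence.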

  20. BUGS history Bayesian inference Using Gibbs Sampling: • Language for specifying complex Bayesian models. • Constructs an object-oriented internal representation of the model. • Builds up an arbitrarily complex model through specification of local structure. • Simulation from full conditionals using Gibbs sampling. • The current version (WinBUGS 1.4) runs in Windows and incorporates a script language for running in batch mode. • ‘Classic’ BUGS is available for UNIX but this is an old version. WinBUGS is freely available from http://www.mrc-bsu.cam.ac.uk

  21. WinBUGS common distributions NB: As noted earlier, the Normal is parameterised in terms of its mean and precision = 1/variance = 1/sd². See ‘Model Specification/The BUGS language: stochastic nodes/Distributions’ in the manual for the full syntax. Note: Functions cannot be used as arguments in distributions! You need to create additional nodes.

  22. Some built-in WinBUGS functions • p <- step(x-.5): = 1 if x ≥ 0.5, 0 otherwise, hence can be used to monitor whether x ≥ 0.5. • p <- equals(x,.5): = 1 if x = 0.5, 0 otherwise. • tau <- 1/pow(s,2) sets τ = 1/s². • s <- 1/sqrt(tau) sets s = 1/√τ. • p[i,k] <- inprod(pi[], Lambda[i,,k]) sets p[i,k] = Σj pi[j]·Lambda[i,j,k]. • See ‘Model Specification/Logical nodes’ in the manual for full syntax.

  23. Graphical Models Model building Statistical modelling of complex systems usually involves many interconnected random variables. How do we build the connections? Key idea: conditional independence. It is helpful to represent the modelling process by a graph: • Nodes: all random quantities. • Links (directed or undirected): association between the nodes. Directed edges: natural ordering of association, “causal” influence. Undirected edges: symmetric association, correlation. The graph is used to represent a set of conditional independence statements.

  24. Graphical Model: Doodle in WinBUGS The following is a doodle for the rats example in WinBUGS

  25. The MLwiN to WinBUGS interface MLwiN has an interface that will generate WinBUGS code for an MLwiN model. The interface will save the model in two possible versions of WinBUGS and will input back the WinBUGS output in CODA format and use MLwiN diagnostics.

  26. Model Description

model {
# Level 1 definition
for(i in 1:N) {
lncfs[i] ~ dnorm(mu[i],tau)
mu[i] <- beta[1] * cons[i]
+ u2[cow[i]] * cons[i]
+ u3[herd[i]] * cons[i]
}
# Higher level definitions
for (j in 1:n2) { u2[j] ~ dnorm(0,tau.u2) }
for (j in 1:n3) { u3[j] ~ dnorm(0,tau.u3) }
# Priors for fixed effects
beta[1] ~ dflat()
# Priors for random terms
tau ~ dgamma(0.001000,0.001000)
sigma2 <- 1/tau
tau.u2 ~ dgamma(0.001000,0.001000)
sigma2.u2 <- 1/tau.u2
tau.u3 ~ dgamma(0.001000,0.001000)
sigma2.u3 <- 1/tau.u3
}

WinBUGS describes the model in a mixture of deterministic and stochastic statements. The interface takes the names from the MLwiN columns and uses some other standard names e.g. beta, sigma2. Note this means there is some redundancy e.g. cons is included. Caution: Care must be taken with some column names.

  27. Data description Note that the data are stored in the file after the word list and between parentheses. For example: list(N= 3027, n2 = 1575, n3 = 50, cow = c(1,1,2,2,3,4,5,5,5,5,6,6,6,7,7,7,8,8,8,9, 9,9,9,10,10,10,11,11,12,12,13,14,14,14,14,15,15,15,16,16, 16,17,17,17,17,18,18,18,18,19,20,20,20,21,21,21,22,22,22,22, 23,23,23,23,24,24,25,26,27,27,28,29,29,29,30,30,31,31,32,33, 33,33,33,34,34,34,35,35,35,36,36,37,38,39,39,40,40,40,41,41, 41,41,42,42,42,43,43,43,44,44,44,44,45,45,45,46,46,47,47,47, ….),…. Note that some notation, e.g. c(...) for vectors, is borrowed from R and S-Plus.

  28. Initial values in WinBUGS Note that the initial values are also stored in the file after the word list and between parentheses. For example: list(beta= c(4.217674), u2 = c( 0.062732,-0.020885,0.054452,0.125001,-0.063034,-0.028254,-0.013445,-0.137141,-0.056420,-0.017268, 0.025402,-0.013377,-0.013452,-0.018461,-0.103801,-0.095923,0.031627,-0.019062,-0.026862,-0.037929, 0.152077,0.008395,-0.015221,0.047518,-0.029718,0.041764,-0.040790,-0.055156,0.048479,-0.036886, The MLwiN->WinBUGS interface has given the IGLS starting values to WinBUGS.

  29. Running the Reunion Island dataset in WinBUGS The model was run for 3 parallel chains using MLwiN starting values and 2 other sets. We ran the 3 chains for 5,000 iterations after a burnin of 500. Even the worst-mixing parameter shows overlap in the three chains (see below).

  30. Running the Reunion Island dataset in WinBUGS Brooks, Gelman & Rubin diagnostic. Summary statistics for the run:

node       mean    sd      (2.5%, 97.5%)
beta[1]    4.217   0.0192  (4.18, 4.255)
sigma2     0.135   0.0048  (0.126, 0.145)
sigma2.u2  0.0193  0.0041  (0.0117, 0.0274)
sigma2.u3  0.0152  0.0039  (0.0092, 0.0244)

  31. Testing the sensitivity to the Normal assumption When we look at plots of residuals in a multilevel model we often see outlying residuals. These may be genuine outlying higher-level units that do not follow the same distribution, or it may be that the Normal distribution is not appropriate. The Normal is a member of the t family of distributions, and other members of the family have longer tails and so may be more appropriate. We will investigate this in the practical.

  32. Information on the practical In the practical we will look at how to fit the variance components model for the education dataset using WinBUGS. We will also examine sensitivity to the Normal assumption by considering instead a t distribution with unknown degrees of freedom and with fixed degrees of freedom (8).
