340 likes | 361 Views
Lecture 23. Spatial Modelling 2 : Multiple membership and CAR models for spatial data. Lecture Contents. Multiple membership models revisited Neighbourhood classification CAR models CAR models in MLwiN CAR models in WinBUGS GeoBUGS and mapping tools. Spatial Data (recap).
E N D
Lecture 23 Spatial Modelling 2 : Multiple membership and CAR models for spatial data
Lecture Contents • Multiple membership models revisited • Neighbourhood classification • CAR models • CAR models in MLwiN • CAR models in WinBUGS • GeoBUGS and mapping tools
Spatial Data (recap) In these 2 afternoon lectures we are interested in fitting statistical models that account for spatial correlation in the dataset. We are in the main considering responses that correspond to areas on maps. The response is then a total or average response for the whole area. We believe that areas that are spatially close together are more likely to have similar responses than areas that are far apart.
Approaches so far Approach 1 Nested random effects : Here lower level geographical areas are correlated via shared higher level geographical effects. This works well however the correlation is rather “all or nothing” as responses either side of a boundary are assumed uncorrelated! Approach 2 Location/Distance & Direction effects: Here fixed effects for the co-ordinates of each location or the distance and direction from a sensible landmark are included. The idea is then that the resulting random effects will be uncorrelated. This works well if there is one source to the spatial correlation e.g. distance from London but cannot pick up other ‘local’ effects.
Spatial approaches In this lecture we consider some approaches that are more related to the particular spatial nature of our data: • Multiple membership models – these link in nicely with the random effects nature of this course but are less common in spatial statistics. • CAR models – these are commonly used in image analysis and disease mapping modelling. • Other spatial models in GeoBUGS – these will generally be extensions to CAR models and will be typically used in disease mapping.
Multiple membership models These are models where each level 1 unit is a member of more than one higher level unit. For example, • Pupils change schools/classes and each school/class has an effect on pupil outcomes. • Patients are seen by more than one nurse during the course of their treatment. • Counties are bordered by more than one other neighbouring county!
n1(j=1) n2(j=2) n3(j=3) p1(i=1) 0.5 0 0.5 p2(i=2) 1 0 0 p3(i=3) 0 0.5 0.5 p4(i=4) 0.5 0.5 0 Here patient 1 was seen by nurse 1 and 3 but not nurse 2 and so on. If we substitute the values of w(2)i,j, i and j. from the table into (1) we get the series of equations : Non Spatial Example Note that nurse(i) now indexes the set of nurses that treat patient i and w(2)i,jis a weighting factor relating patient i to nurse j. For example, with four patients and three nurses, we may have the following weights:
Estimation methods • Although it is theoretically possible to fit multiple membership models using the IGLS method it is not reliable for larger problems and so we will not consider this here. • MCMC methods are far easier to use for multiple membership models in MLwiN and so we will consider only these methods here.
Revisiting the House price data In the last lecture we considered a 4-level variance components model: where i indexes year, j indexes town, k indexes county and l indexes region. The response, y is the log of the average price. We will here ignore the region effects and instead concentrate on fitting spatial effects at the county level.
Multiple membership (MM) model We will fit 2 sets of random effects at the county level: the county itself and its neighbours that form a multiple membership classification. We therefore have the following model: Here the weights for each set of neighbours are chosen to be equal and to sum to 1.
Estimates for the MM model Here we see that the neighbourhood variance is far larger (0.152) than the county variance (0.013). Although it should be noted that these are not really comparable due to the weighted effects. The rest of the model is little changed from before.
Model Comparison If we compare the nested models with and without random effects and the MM model in terms of DIC we get the following results which point to little difference between the models:
Plots of effects On the left are county effect and on right neighbour effects:
Choice of Neighbours We have here chosen all counties that border a particular county to be part of the neighbour classification. We can extend this idea to include counties further away or counties whose centroid is within a certain distance. Perhaps a more sensible neighbourhood definition would be to include the county itself in its neighbourhood therefore making neighbourhood a kind of higher level classification but unique to each county.
Including the county in its neighbourhood This results in the estimates to the right. The DIC at -9996.31 is very similar to the earlier models. The neighbourhood classification explains more of the county level variability.
Plots of effects On the left are county effect and on right neighbour effects:
Extension to the MM Model Alastair Leyland and Ian Langford worked with the MLwiN team on an extension to the MM model in the IGLS framework. They correlated the area and neighbourhood effects as there is 1 of each per area. This model can give good results and is available through MLwiN macros. The Bayesian equivalent that essentially assumes a MV Normal prior for the 2 sets of effects can be easily fitted in WinBUGS but is not available in MLwiN.
Conditional Autoregressive Models This is a more standard way of fitting spatial modelling and these models originated in image analysis (e.g. Besag, York & Mollie 1991). Here a set of spatially correlated random effects are introduced in the model that have a MVN model that accounts for spatial correlation. The conditional form results in conditional distributions for each random effect: where ni is the number of neighbours for area i and the weights are typically all 1.
CAR Models in MLwiN MLwiN has limited CAR model functionality. It allows the user to specify one set of random effects that are CAR distributed. CAR models need additional constraints to allow identifiability as the set of effects does not have a fixed mean. One solution is to remove the global intercept and MLwiN uses this method. As we will see in the practical CAR models work fine for Poisson models in MLwiN however there appears to be a bug with Normal responses and so we cannot use MLwiN for the house price dataset.
Lip Cancer example of a CAR model We will study a dataset on Scottish Lip cancer in the practical. Here is a CAR model for that dataset. Note the intercept has been removed. We have one predictor %age working in agriculture, forestry and fishing which is a surrogate for sun exposure.
CAR Models and convolution models A CAR model fits spatially correlated effects for a classification. It is also possible to fit uncorrelated random effects for the classification in the same model. Such a model is called a convolution model. The DIC diagnostic suggests this is not necessary for the lip cancer data.
CAR Models in WinBUGS Another approach to fitting CAR models is to, at each iteration of the MCMC algorithm, after updating the set of CAR residuals centre them so that they have mean 0. This approach is used in WinBUGS and has the advantage that we can also include a global intercept in the model. We can also, if we wish, consider CAR residuals at several levels in the model. WinBUGS also has no problems with Normal responses, so we will use it to consider the house price dataset here.
CAR models in WinBUGS CAR distributed residuals are an interesting case in WinBUGS as they must be treated as one block of parameters. This is because WinBUGS works on directed graphs without loops and the CAR residuals form a loop through their conditional formulation. WinBUGS therefore has a special distribution as follows: u4[1:n4] ~ car.normal(adj[],weights[],num[],tau.u4) Here we define an adjacency list adj which is a ragged matrix containing lists of neighbours, num which contains the number of neighbours for each area and weights which contains the CAR weights, typically all 1.
CAR models for the House price dataset We fitted a convolution model with both structured and unstructured random effects at the county level. We also had postal town effects and year and year2 as fixed effects. The model was run in WinBUGS with a burnin of 1,000 and a main run of 10,000 due to time constraints. We probably should have run for longer and the DIC value (-9985.3) may be due to the chain not quite converging in 1,000 iterations. model { # Level 1 definition for(i in 1:N) { logprice[i] ~ dnorm(mu[i],tau) mu[i]<- beta[1] * cons[i] + beta[2] * year[i] + beta[3] * year2[i] + u2[PT[i]] * cons[i] + u3[county[i]] * cons[i] + u4[county[i]] * cons[i] } # Higher level definitions for (j in 1:n2) { u2[j] ~ dnorm(0,tau.u2) } for (j in 1:n3) { u3[j] ~ dnorm(0,tau.u3) } u4[1:n4] ~ car.normal(adj[],weights[],num[],tau.u4) # Priors for fixed effects for (k in 1:3) { beta[k] ~ dflat() } # Priors for random terms tau ~ dgamma(0.001000,0.001000) sigma2 <- 1/tau tau.u2 ~ dgamma(0.001000,0.001000) sigma2.u2 <- 1/tau.u2 tau.u3 ~ dgamma(0.001000,0.001000) sigma2.u3 <- 1/tau.u3 tau.u4 ~ dgamma(0.001000,0.001000) sigma2.u4 <- 1/tau.u4 }
Estimates for the model The table below gives means and intervals for all parameters:
Plots of county effects On the left are uncorrelated effects and on the right CAR spatial effects:
GeoBUGS GeoBUGS is an add-on module to WinBUGS which provides an interface for: • Producing maps of the output from disease mapping and other spatial models. • Creating and manipulating adjacency matrices that are required as input for the conditional autoregressive (CAR) models available in WinBUGS1.4 for carrying out spatial smoothing. • It is actually incorporated in WinBUGS 1.4. • It has been used for the maps of the house price data.
Incorporating map (polygon) files GeoBUGS takes polygon files as its form of map input. It can take files from : Splus, Arcinfo, Epimap, ArcView. The input files are text based and contain: • The number of regions in the map. • Lists of labels for each region with corresponding ID number. • List of x and y co-ordinates for each polygon, plus the polygon label. See GeoBUGS manual for further details.
Example map of GB for house prices (GBMLwiN) This is in fact a modification of the supplied map GB-counties which contains only counties required for our dataset. Note our dataset had no data for 4 of the Scottish island based counties which we therefore removed. A note of caution: It is important when using the mapping tools to ensure your numbering of counties corresponds to the maps numbering!
Adjacency Tool All GeoBUGS options can be found under the map menu. The adjacency tool produces the grey map seen in the last slide and clicking on a county displays its neighbours. If you want to find a numbered county then you can click on show region. Clicking on adj matrix will give the ragged matrix of neighbours: list( num = c(3, 2, 5, 6, 4, 2, 3, 5, 3, 4, 6, 2, 5, 2, 7, 5, 4, 5, 3, 5, 5, 6, 7, 5, 5, 7, 6, 7, 3, 5, 7, 7, 5, 4, 4, 3, 3, 3, 8, 2, 7, 2, 3, 6, 3, 4, 3, 4, 1, 7, 5, 5, 4, 3, 7, 5, 7, 6, 3, 3, 4, 0, 5 ), adj = c( 4, 3, 2, 3, 1, 6, 5, 4, 2, 1, 9, 8, 7, 5, 3, 1, 7, 6, 4, 3, 5, 3, ….
Mapping Tool We can map estimates for chains as we have already seen. We can however also map predictor variables, quantiles of the chain and pretty much anything else we might like to map! The tool offers a selection of colour maps and options to change the number of cut points.
Spatial distributions WinBUGS offers several spatial distributions (we have only considered the car.normal distribution) that you can investigate and the GeoBUGS manual gives more information. These include: car.normal, car.l1, car.proper, spatial.exp, spatial.disc, spatial.pred, spatial.unipred, pois.conv and mv.car. You may also find examples of the use of some of these models in Lawson et al. (2003) although as these are not common random effects models we will not cover them in this course.
Information for the practical The practical will take you through an analysis of modelling Scottish lip cancer dataset. This will be done primarily in MLwiN with a little bit in WinBUGS and will contrast multiple membership and CAR models. If you finish the practical early and think spatial models are important for your data you may want to read through some of the GeoBUGS manual and examples.