Bayesian models for area health and mortality variations; an overview Peter Congdon, Centre for Statistics and Dept of G

Bayesian models for area health and mortality variations; an overviewPeter Congdon, Centre for Statistics and Dept of Geography, QMUL • Bayesian methods have played major role in recent developments in statistical models for spatial data. • Many Bayesian applications have occurred in spatial epidemiology – books by Lawson et al (1999), Elliott et al (2000). Other application areas: spatial econometrics – see Lesage (1999); and geostatistics - see Diggle et al (JRSS,1998), Banerjee et al (2004)

Discrete area (‘lattice’) framework predominates in disease mapping using administrative data • In geostatistics (continuous spatial framework) goal is often spatial interpolation (kriging) between readings at sampled locations. • My talk concentrates on discrete areas, health applications, & WINBUGS implementation of discrete spatial models • Consider issues such as identifiability, choices for priors, model parsimony, parameter interpretability, multivariate & interaction effects, validity of commonly made assumptions, sparse data, etc

Model for data y=(y1,..yn) begins with likelihood p(y|). Inference about parameters  combines knowledge in prior with evidence from data; conditioning on data apparent in posterior density for parameters, p(|y)  p(y|)p() The prior is chosen to (a) summarise existing knowledge; may be “noninformative” (b) be suitable for form of parameter; e.g. gamma prior for inverse variances (precisions) c) for collection of parameters (e.g. random area effects) prior may specify form of correlation or alternatively independent ‘exchangeable’ parameters BAYESIAN METHOD

Bayesian Hierarchical Models • In hierarchical models involving latent data Z (e.g. random effects, missing observations), the joint density has the form p(y,,Z)=p(y|,Z)p(,Z)= p(y|,Z)p(Z|)p(). • Examples: models with spatially correlated random effects, multilevel models, longitudinal models • Posterior is p(,Z|y) unless random effects integrated out. Bayes approach via MCMC easier if random effects retained to define complete data likelihood p(y|,Z)

MCMC • Independent sampling from posterior density ()=p(|y) or (,Z)=p(,Z|y) not usually feasible. • Markov Chain Monte Carlo (MCMC) methods generate dependent draws (t) or {(t), Z(t)} via Markov chains defined by the assumption p((t)|(1),.. (t-1)) = p((t)|(t-1)), so that only preceding state relevant to future state. Sampling from such a Markov chain converges to stationary distribution  if additional requirements on chain are satisfied.

CANONICAL MODEL FOR LATTICE DATA • Let Yi and Ei denote observed & expected disease cases in areas i, i =1,...,n. If disease is rare or area populations are small, may assume Yi ~ Poisson(i); i = Eii where i is relative disease risk; Ei is expected events (Implicitly assumes multiplicative age & area effects – no age x area interactions) • To allow for known covariates Xi and Poisson over‑dispersion, common to represent relative risks in log‑linear form.

Thus: log i =  + Xi + Ui where  represents log relative risk of disease across entire region; Ui represents impact of latent (unobserved) risk factors • Could make U spatially correlated: but imports strong prior belief • Or could make U independent of space (“exchangeable”) over areas regardless of adjacency patterns or distances between areas

Compromise between spatial & non-spatial effects (aka convolution prior) • Besag, York, Mollie (1991) or BYM model is compromise between these two extreme possibilities. Thus Ui=hi+si hicapture unstructured heterogeneity, and si are spatially interdependent • Assume hi normal with overall zero mean hiN(0,2h)

For si, BYM model assumes conditional autoregressive (CAR) dependence over areas. Prior for area i is conditional on remaining effects (areas 1,..i-1,i+1,..n) denoted s[i] . So si| s[i] ~ N(Ai,Vi) Ai is weighted average of remaining effects: Ai = jcijsj/jcij =jwijsj Variances are Vi = 2s/ jcij depend on conditional variance 2s & on spatial interaction structure represented by cij.

Spatial Interactions Typical forms for cij are • binary adjacency: cij=1 if areas i and j are neighbours, cij=0 otherwise (and cii=0) • distance decay: cij=exp(-dij) where >0, dij are distances between area centres

Problems with CAR prior and convolution model • Only sum of hi and si is identified. Partitioning of total residual variance between unstructured & spatial variation may be affected by prior specifications. Priors on variances {2s,2h} or their inverses especially relevant • CAR prior does not specify average level of si, only their differences. Need to centre them during MCMC sampling • Two sets of effects not parsimonious in modelling terms • Poor identification with diffuse priors on variances (near impropriety in posterior)

Other Options • Introduce extra parameter  (“proper CAR” prior) si| s[i] ~ N(Ai,Vi)  in range [0,1] • Interpretation issues regarding  and with limiting case =0. Not a straightforward spatial correlation parameter

LLB Spatial Compromise Prior • Leroux, Lei, Breslow (1999) ri|r[i]~ N(ai∑j≠irj,Vi) ai=λ/(1-λ+λ∑j≠icij) Vi=2r/(1-λ+λ∑j≠icij) Reduces to unstructured heterogeneity when =0. • Under binary adjacency, cij=1 if areas {i,j} adjacent, Mi=number of areas next to area i, ai=λ/(1-λ+λMi) Vi=2r/(1-λ+λMi)

Joint prior under LLB (r1,…,rn)~Nn(0, 2rD-1) D=R+(1-)I rii= -∑j≠icij, rij=cij (i≠j) • Adaptive version (spatial dependence varies over map), logit(λi) normal with unknown mean and variance D=ΛR+(I-Λ)I, Λ=diag(λi)

Worked Example • Elevated blood level readings yi among children (under 72 months) tested in 133 counties of Virginia in 2000 - see http://www.vahealth.org/leadsafe/dataclpp.htm Numbers sampled Ti vary considerably (from 1 to 3808). Spatial structure is binary, with cij=1 for intercounty distances under 50km and cij=0 otherwise.

n=133, NN=1056

Compare BYM & LLB for these data • Assume binomial sampling with yiBin(Ti,πi), one option is BYM convolution model, logit(πi)=α+si+hi conditional variance δs for CAR spatial effects si; variance δh for unstructured effects. • Share of total variance due to unstructured variance is δh/[δh+Vs]; Vs=var(si) is marginal variance of spatial eeffects calculated at each iteration • For BYM model, priors are 1/δs~Ga(1,0.001) and g=δh/(δh+Vs)~U(0,1). Analogous to uniform shrinkage prior (Daniels, Canadian J. Statistics, 1999)

LLB model for binomial data Compared to Leroux et al (1999) model with logit(πi)=α+ri ri|r[i]~ N(ai∑j≠irj,Vi) ai=λ/(1-λ+λMi) U(0,1) Adaptive version also with logit(i)~N(μλ,τλ)

Fit & Model Checking • Use deviance information criterion, abbreviated DIC, and log pseudo marginal likelihood, log(psML), to compare models • Model checking possible using estimates of conditional predictive ordinates (CPOs) • see Congdon, Bayesian Statistical Modelling 2nd ed (Wiley, 2006) for more on these measures

DIC • Deviance Information Criterion (DIC) built into WINBUGS (Spiegelhalter et al, 2002, JRSSB) • Benefit of DIC approach is in providing measure of model complexity pe (aka “effective parameter count”). Also reflects degree of heterogeneity in random effects (if all effects non-significant pe will be low)

BYM Model Code model {for (i in 1:133) {y[i] ~dbin(p[i],T[i]) logit(p[i]) <- alph+s[i]+h[i] ; h[i] ~dnorm(0,inv.delta.h); # assess significance of individual spatial effects p.sig[i] <- step(s[i]) # log-likelihood L[i] <- logfact(n[i])-logfact(y[i])-logfact(r[i])+y[i]*log(p[i])+r[i]*log(1-p[i]) # Estimate log(CPO) for each area by taking minus logs of posterior means of inverse likelihoods G[i] <- 1/exp(L[i]); r[i] <- T[i]-y[i]}

Rest of code for BYM model • s[1:133]~ car.normal(adjcodes[], wneigh[], numneigh[], inv.delta.s) for (j in 1:1056) {wneigh[j] <- 1} # prior on residual var as propn of total var g ~dunif(0,1); V.s <- pow(sd(s[]),2); delta.h <- g*V.s/(1-g); inv.delta.h <- 1/delta.h; inv.delta.s ~dgamma(1,0.001); alph ~dflat()}

BYM Results • Inferences from iterations 1000-50,000 of 2 chain run. • 15 from 133 si parameters significant: posterior probabilities Pr(si>0|y) > 0.95 or < 0.05. • Average deviance (-2*logLKD) is 501 • DIC and pe are 577 and 76 respectively. log pseudo marginal likelihood is -315.4 CPOs vary from 0.001 to 0.945 • g estimated at 0.42.

LLB Model Code • model {for (i in 1:133) {y[i] ~dbin(p[i],T[i]) p.sig[i] <- step(r[i]); logit(p[i]) <- alph+r[i] .. r[i] ~dnorm(R[i],inv.V[i]); inv.V[i] <- inv.delta * (1-lam+lam*M[i]) R[i] <- (lam/(1-lam+lam*M[i]))*sum(Wr[C[i]+1:C[i+1] ])} # error vector over neighbours for (i in 1:1056) {Wr[i] <- r[map[i]] } inv.delta ~dgamma(1,0.001); alph ~dflat(); lam ~dunif(0,1)}

LLB vs BYM Results • Since LLB model is less parameterised (one set of area effects) might expect average deviance to rise. In fact average deviance unchanged at 501.4. Lower pe (73.6) → reduced DIC of 575. • LLB model log psML is -313, -315.4 for BYM → psBF=10 • λ estimated at 0.63 (95% interval, 0.25-0.97). • Adaptive LLB has log(psML)=-311.6 but DIC of 575.7; λi vary between 0.53 and 0.64 • Only slight increase in effective parameters under convolution model may reflect lesser significance of individual spatial effects.

LLB vs BYM, Significance of Area Effects • In LLB model 34 effects ri significant. In BYM model, only 15 si parameters significant, but 40 composite terms hi+si significant. No individual hi parameters significant • Such a pattern suggests no great advantage (in terms of substantive interpretation not just statistical terms) in having two distinct sets of effects

Multivariate Spatial Models: Spatially Varying Predictor Effects • Let yi be health outcome (e.g. counts of CHD deaths) with yi~Po(Eii). Let Xi be predictor of dimension P (e.g. social class, nonwhite ethnicity, water hardness). • Instead of same regression effect over all parts of region allow spatially varying regression coefficients log(i)=  + Xii where i=(1i,…Pi)' vary by area according to a multivariate version of the CAR prior.

Example of Multivariate CAR So have P regression effects for each area, correlated with each other within areas, and also correlated over areas. MCAR prior of Gamerman et al (2003) and others generalises univariate pairwise difference joint prior P(1,….,n|)  ||-0.5(n-1)exp[-0.5 j≠icij(i -j)-1(i-j)] where  is dispersion matrix of order P.

Multivariate CAR prior: Conditional Form • Conditional autoregressive form of this prior under contiguity adjacency is i|[i] ~ NP(Gi, /Mi) where Gi =[G1i,….. ,GPi] and Gpi is average of pj over j=1,..,Mi neighbours of area i for predictors p=1,..,P. • Proper priors need to be specified for  • In WINBUGS available as mv.car • Multivariate version of LLB prior also available

Multivariate Models: Spatial Factor and Spatial SEMs • Multivariate correlated outcomes (e.g. male and female smoking linked cancers such as lung, oral cavity, pancreas, bladder). Say J outcomes yij~Po(Eijij). Multivariate version of BYM approach log(ij)=j+Xij+sij+hij would imply MCAR prior of dimension J for sij and Multivariate normal of dimension J for hij.

Parsimonious Spatial Factor Models Instead factor model might specify single set of spatial and unstructured effects with impact measured by outcome specific loadings j and j as in log(ij)=j+Xij+jsi+jhi To identify model need to set scale of si and hi. Either set their variances (e.g. to 1) or fix loadings (e.g. set 1=1=1). Standardised factors constraint vs “anchoring” constraint (terminology of Skrondal & Rabe-Hesketh)

Interactions • Consider deaths or disease counts by area and age. Yix ~ Po(Pixmix) where Pix are populations by area i, age group x • Proportional or multiplicative model: age & area independent (no interaction in log-linear model) log(mix)=a+si+x (ref: Hoem (1987) Statistical analysis of multiplicative model Int Stat Rev) • Leads on to canonical relative risk models which partial out age, e.g. Yi ~ Po(Eii) • Simple relative risk model not always appropriate. Age & area often not independent e.g. mortality in middle/early old age higher in deprived areas (those with higher si).

Life table functionsObtain death probabilities qix from central death rates mix. Then numbers lix surviving to exact age x & other life table functions. Spatial life tables (Congdon 2001 Stat Modelling) with pooling strength for age & area effects. Gives full posterior densities for parameters (e.g. expectancies Eix), unlike mechanical life tables

Other life table functions

If age & area interact, then life expectancies better than relative risk ρito summarise area mortality contrasts • So model full dimension interactions uix in log linear model (random parameters same dimension as data) log(mix)=a+si+x+uix? • Less heavily parameterised is product interaction log(mix)=a+x+xsi where for identification the x sum to 1 and the hi sum to zero. (Congdon, Biometrics 2006). • Alternative constraint: the x constrained to average 1 → then if x equal, modelreduces to proportional model

Age effects in non-proportional model (data for London). Excess mortality in high mortality areas concentrated at ages under 60.

Another option (Congdon, Int J Health Geographics, 2007). Select out uix (form of random effects selection) log(ix)=a+x+si+dxuix; dx~Bern(πd), where πd small (e.g. πd=0.05) • Model for mortality in Chinese provinces. Pr(dx=1|y) > 0.9 for ages 0-4 and ages over 85. Discrepancies from multiplicative model greatest at extreme ages, e.g. multiplicative model underestimates child mortality and overestimates deaths in very old in underdeveloped provinces

Other Possibilities for Smoothing • Sparse deaths data for very small areas (super output areas in England with populations of around 1500) • Random effects smoothing over age & space may not be enough for such sparse data • Regression smoothing (aka parametric smoothing): use relational model linking logit(lix) to logit(lsx) where lsx are from large (standard) population logit(lix)=i+ilogit(lsx)+eix Provides smoothed l*ix = expit(i+ilogit(lsx)) from which other life table functions obtained

Alternatives to Bayes • Can use Empirical Bayes (e.g. GLLAMM in STATA or GLIMMIX in SAS) (see Modelling of discrete spatial variation in epidemiology using GLIMMIX. Computer Methods & Programs in Biomedicine, 76, Rasmussen) • However Bayes methods arguably have greater flexibility, and less reliant on simplifying assumptions of classical approaches

Bayesian models for area health and mortality variations; an overview Peter Congdon, Centre for Statistics and Dept of G