HEALTH APPLICATIONS OF BAYES TECHNIQUES. Population Health Perspectives. Peter Congdon Research Professor of Quantitative Geography & Health Statistics, QMUL e-mail: p.congdon@qmul.ac.uk http://www.geog.qmul.ac.uk/staff/congdonp.html http://webspace.qmul.ac.uk/pcongdon/.
What is population health? Major tasks in definition and analysis
Population Health: Ecological (Contextual) Risk Factors • To describe/analyze health variation over • areas or area categories (poverty status, area socioeconomic classifications, “deprivation quintiles”) • by area SES scales (deprivation gradients), or other area characteristics (social “fragmentation”, social capital) • according to area environmental exposures (e.g. pollution levels or categories)
Population health: individual level risk factors • To describe health variation over • demographic categories (age, race, gender, family type) • individual socioeconomic variables (income, education) • health behaviours (smoker or not, obese or not) • Assess how individual and contextual factors (also termed downstream and upstream factors, respectively) interact in their impacts on health
Role of Statistical Analysis • Assess which potential sources of variation in health are significant (or not) • Summarise health variations parametrically • Provide stable estimates of health outcomes for areas and groups, including those with small populations
Policy Implications of Population Health Analysis: Tasks • Assess how health need (need for healthcare) is distributed over areas or social groups • to guide distribution of scarce health resources and effective targeting of healthcare interventions • May involve “health need indices” based on characteristics of areas or area populations
AREA STUDIES. • A major focus of my talk will be on models for spatial variations in health, and predictors of those variations (“ecological studies”) • These models typically use area counts (deaths, incidence or prevalence totals) from official registration systems • Statistical models often seek to assess the implications of area health variation (e.g. locating areas with excess risk, ranking areas according to health risk, measuring inequality, smoothing ragged observed area rates).
Area Studies (continued) • Relevance of “ecological studies” (despite the “ecological fallacy”) to the broader upstream/downstream debate: what are contextual effects, what causes them, how should they be modelled, etc. • Crude rates for rare events are unreliable → stabilized/robust/smoothed area health outcome rates are essential to an accurate description of population health • Smoothing may draw on the spatial structure of known or unknown risk factors
Spatial clustering of risk factors (or indicators of latent area SES)…
MULTILEVEL & SURVEY STUDIES. • I will also consider multilevel perspectives • Assess how individual and contextual factors interact in their impacts on health, e.g. area variables may act as effect modifiers for individual risk factors. • Health surveys (e.g. Health Survey for England, Behavioral Risk Factor Surveillance System) are the most suitable for analysing the effects on health of age, ethnicity, and individual SES, and their possible modification by geographic variables. • But administrative or census data can also be analysed profitably by multilevel methods
Outline • RELEVANT METHODS: some distinctive aspects of methodologies for modelling population health • APPLICATION THEMES: • 1 Assessing varying health risks in areas • 2 Spatially varying predictor effects • 3 Age and area: life table methods • 4 Spatial aspects of health care use • 5 Multilevel modelling • 6 Prevalence modelling • 7 Common spatial factor models
Relevant Methods: generalized linear model regression • Generalized linear models (e.g. with count or binary response) are more frequently used than normal linear models • For health survey data we often need binary regression, with logit or log link, possibly accounting for differential survey weights • For area counts of health events (e.g. deaths, prevalence) we typically need Poisson or binomial likelihoods, or over-dispersed versions of these densities
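As a minimal sketch of the Poisson case, assuming area death counts y[i], expected counts E[i] entering as an offset log(E[i]), and a single hypothetical area covariate (a made-up deprivation score), the regression can be fitted by Newton-Raphson in plain Python. The function name and all data are illustrative; in practice a GLM library or a Bayesian sampler would be used.

```python
import math

def fit_poisson_offset(y, x, E, max_iter=50, tol=1e-10):
    """Poisson GLM with log link and offset log(E_i):
    log(mu_i) = log(E_i) + b0 + b1 * x_i, fitted by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the score vector (s0, s1) and the 2x2 information matrix
        s0 = s1 = i00 = i01 = i11 = 0.0
        for yi, xi, ei in zip(y, x, E):
            mu = ei * math.exp(b0 + b1 * xi)
            s0 += yi - mu
            s1 += (yi - mu) * xi
            i00 += mu
            i01 += mu * xi
            i11 += mu * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step: solve I * delta = score by Cramer's rule
        d0 = (i11 * s0 - i01 * s1) / det
        d1 = (i00 * s1 - i01 * s0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: deaths, a deprivation score, and expected deaths for five areas
deaths = [12, 20, 9, 30, 15]
depriv = [0.1, 0.5, -0.2, 0.9, 0.3]
expected = [14.0, 16.5, 11.2, 21.0, 15.3]
b0, b1 = fit_poisson_offset(deaths, depriv, expected)
print(f"intercept={b0:.3f}, slope={b1:.3f}, RR per unit score={math.exp(b1):.3f}")
```

With the log link, exp(b1) reads as the relative risk per unit increase in the covariate; the offset makes the model describe relative risks y[i]/E[i] rather than raw counts.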
Relevant methods: pooling strength • Often use random effects to pool strength (or borrow strength) over areas or other relevant dimensions (e.g. age) • Essentially refers to adding stability/precision to estimates by referring to the overall population density (the distribution of the random effects across all units) • Maybe pool strength over variables too → multivariate random effect and common factor models
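A minimal sketch of pooling strength, assuming normally distributed area estimates (e.g. log relative risks) with known sampling variances and a between-area variance tau2 that is simply fixed here (a full Bayesian model would estimate it). Each estimate is pulled toward the overall mean by an amount that grows with its noise; the function name and numbers are hypothetical.

```python
def partial_pool(estimates, variances, tau2):
    """Normal-normal partial pooling with a known between-area variance tau2.

    estimates: noisy area estimates (e.g. log relative risks)
    variances: their sampling variances
    Returns the precision-weighted overall mean and the shrunken estimates."""
    weights = [1.0 / (v + tau2) for v in variances]
    mu = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled = []
    for e, v in zip(estimates, variances):
        b = tau2 / (tau2 + v)  # near 1 for precise estimates, near 0 for noisy ones
        pooled.append(b * e + (1 - b) * mu)
    return mu, pooled

# Hypothetical log relative risks: a large area (small variance) and two smaller areas
mu, pooled = partial_pool([0.30, -0.40, 0.05], [0.01, 0.25, 0.04], tau2=0.04)
print(round(mu, 3), [round(p, 3) for p in pooled])
```

The noisy second area (variance 0.25) keeps only about 14% of its deviation from the overall mean, while the precise first area keeps 80%.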
Relevant methods: spatially oriented applications • In area health applications, both outcomes (e.g. mortality or prevalence) and ecological risk factors (e.g. area deprivation, area smoking rates) are typically spatially structured. Also applies to “unknown” risk factors • So in statistical models, spatially correlated random effects often involved (in Bayesian terminology “spatial priors”) • Modelling aims to account for spatial structure • Inter alia, a good model will ensure regression residuals are free of spatial correlation
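One common way to check whether regression residuals are free of spatial correlation is Moran's I. The sketch below assumes binary adjacency weights (w_ij = 1 for neighbouring areas) and a hypothetical chain of six areas; a real application would use the actual area adjacency and a permutation or normal-approximation test. Under no spatial correlation the expected value of I is -1/(n-1).

```python
def morans_i(values, neighbours):
    """Moran's I with binary adjacency weights: w_ij = 1 if j is in neighbours[i]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # sum of all weights
    num = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Hypothetical chain of six areas, each adjacent to the previous and next one
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
print(morans_i([1, 2, 3, 4, 5, 6], chain))     # trending residuals: positive I
print(morans_i([1, -1, 1, -1, 1, -1], chain))  # alternating residuals: about -1
```

A clearly positive I on residuals suggests remaining spatial structure that a spatial prior (or an omitted spatially structured covariate) should absorb.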
Relevant methods: hierarchical nesting and interactions across levels • In multilevel modelling effects of individual risk factors may vary according to area contexts • For example, ethnic relativities in diabetes prevalence may not be constant over areas • So use random effects (maybe spatially structured) to model spatial variation in impacts of individual level risk factors
Relevant methods: benefits of MCMC and Bayesian techniques • Bayesian approach using MCMC sampling assists in monitoring “derived parameters” or outputs, providing full densities, and in testing hypotheses about derived parameters. • Classical estimation typically provides confidence intervals under assumed asymptotic normality for model parameters only, with delta method for derived parameters • Bayesian approach arguably more flexible for models with multiple or nested random effects, or where there is partially missing data
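To illustrate monitoring a derived parameter, the sketch below computes the rate ratio between two areas draw by draw and reads off a median and 95% interval, with no normality or delta-method approximation. It assumes Poisson counts with a Gamma(a, b) prior, so each relative risk has an exact Gamma posterior that can be sampled directly; these direct draws stand in for stored MCMC output. Function name, prior values and data are hypothetical. Note that Python's `gammavariate` takes a shape and a scale, so the posterior rate is inverted.

```python
import random

def ratio_summary(y1, e1, y2, e2, a=1.0, b=1.0, draws=20000, seed=1):
    """Posterior median and 95% interval for the rate ratio theta_1 / theta_2.

    Each theta has a conjugate Gamma(a + y, rate b + E) posterior; the ratio is a
    derived parameter computed draw by draw, as one would from stored MCMC output."""
    rng = random.Random(seed)
    ratios = sorted(
        rng.gammavariate(a + y1, 1.0 / (b + e1)) / rng.gammavariate(a + y2, 1.0 / (b + e2))
        for _ in range(draws)
    )
    median = ratios[draws // 2]
    return median, (ratios[int(0.025 * draws)], ratios[int(0.975 * draws) - 1])

# Hypothetical areas: 30 deaths vs 20 expected, and 10 deaths vs 20 expected
median, (lo, hi) = ratio_summary(30, 20.0, 10, 20.0)
print(f"ratio median {median:.2f}, 95% interval ({lo:.2f}, {hi:.2f})")
```

The resulting interval is asymmetric around the median, which is typical for ratios and is captured automatically by working with the sampled values.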
An example of a derived model output that is not the model response: the model response (Poisson) is deaths by area and age; the derived model output is life expectancy. [Figure: Eastern Region of England, male life expectancy. Ref: Congdon, 2009, International Statistical Review]
Relevant methods: latent variable techniques • Many relevant risk or outcome variables for analysing population health can be regarded as latent constructs, not directly observed but proxied by several observed variables • Examples in area studies: area unemployment or rates of social housing are proxies for area construct “deprivation” • Examples in survey studies: battery of survey items on neighbourhood perceptions and trust are proxies for individual level construct “social capital”
APPLICATION THEME 1 • ASSESSING VARYING HEALTH RISKS IN AREAS
Maximum Likelihood • Observed data (e.g. death totals by area i) are y[i], and E[i] are expected event totals. • Limitations of conventional (fixed effects) maximum likelihood estimates of relative risks (or “standardised mortality ratios”) y[i]/E[i] as a description of spatial contrasts. • Alternatively, data might be y[i] and populations P[i]; the MLE (e.g. crude death rate) is y[i]/P[i] (or such rates feed into an “age standardised” rate) • Or data might be y[i] (infant deaths) and births B[i]; the MLE is y[i]/B[i]
ML estimation (continued) • The maximum likelihood approach (which underlies conventional demographic techniques) treats each area (or risk category) as a separate, isolated entity, taking no account of: • the overall average for the event, • the location of the area or risk category in relation to other areas (or risk categories) • By neglecting the broader context, MLEs are also potentially unstable, especially when event counts are small
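A minimal sketch of the conventional quantities, assuming expected counts E[i] come from indirect standardisation (area population by age times reference death rates) and using the usual approximate Poisson standard error sqrt(y)/E for the SMR. All rates and populations are hypothetical. The last two lines show why isolated MLEs are unstable: two areas with the same SMR of 2 differ tenfold in precision.

```python
import math

def expected_deaths(pop_by_age, ref_rates):
    """Indirect standardisation: E_i = sum over ages x of P_ix * r_x."""
    return sum(p * r for p, r in zip(pop_by_age, ref_rates))

def smr_with_se(deaths, expected):
    """SMR = y/E with approximate Poisson standard error sqrt(y)/E."""
    return deaths / expected, math.sqrt(deaths) / expected

ref_rates = [0.001, 0.005, 0.04]     # hypothetical reference death rates for three age bands
pop = [5000, 3000, 1000]             # hypothetical population of one area in those bands
E = expected_deaths(pop, ref_rates)  # 5 + 15 + 40 expected deaths
smr, se = smr_with_se(70, E)
crude_rate = 70 / sum(pop)

# Same SMR of 2, very different precision:
print(smr_with_se(2, 1.0))      # small area: SE about 1.41
print(smr_with_se(200, 100.0))  # large area: SE about 0.14
```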
Bayesian Approach • Under Bayesian random effects, information on the pattern of disease risk across the collectivity of areas (or risk categories) is used to provide an estimate of the underlying relative risk for each area (or risk category) • Treat each area’s outcome with reference to the ensemble of areas • The “prior” specifies the chosen overarching density of relative risk (e.g. normal or gamma) and whether or not the density specifies local or global pooling of strength.
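For a gamma prior on relative risk, the ensemble-based estimate has a closed form. Assuming y[i] ~ Poisson(E[i] θ[i]) and θ[i] ~ Gamma(a, rate b), conjugacy gives a Gamma(a + y[i], rate b + E[i]) posterior, whose mean compromises between the SMR and the prior mean a/b. This is global pooling only; the values a = b = 5 are an illustrative prior centred on RR = 1 and would normally be estimated or given hyperpriors.

```python
def poisson_gamma_means(y, E, a, b):
    """Posterior mean relative risks under y_i ~ Poisson(E_i * theta_i), theta_i ~ Gamma(a, rate b).

    Conjugacy gives theta_i | y_i ~ Gamma(a + y_i, rate b + E_i), so the posterior mean is
    (a + y_i) / (b + E_i): a compromise between the SMR y_i/E_i and the prior mean a/b."""
    return [(a + yi) / (b + ei) for yi, ei in zip(y, E)]

# Two hypothetical areas with the same crude SMR of 2 but very different expected counts
print(poisson_gamma_means([2, 200], [1.0, 100.0], a=5.0, b=5.0))
# small area pulled to about 1.17; large area stays near 1.95
```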
Adaptive Spatial Priors • However, it may be unwise to uncritically assume complete spatial dependence or homogeneous spatial correlation. • So allow for some unstructured variation or for spatial outliers • Spatial outliers: areas unlike their neighbours, e.g. socially dissimilar (example: suburban “social housing” estates surrounded by owner-occupied housing areas) • Allow the extent of spatial dependence to vary across the map • Congdon, 2008, Statistical Methodology
Policy relevant posterior inferences • Use of spatial risk modelling for policy inferences • One may assess for example, the posterior probability that a particular area has an elevated relative risk (compared to the average) • Assume RR=1 on average. Then simply count the proportion of MCMC iterations where condition RR[i]>1 holds • More complicated to do this under frequentist approaches • e.g. Congdon, Health and Place, 1997, article on area contrasts in suicide and attempted suicide in NE London
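The counting rule for Pr(RR[i] > 1) can be sketched as below. It assumes a Poisson-gamma model so that exact posterior draws of RR[i] can be generated directly in place of MCMC iterations; with real MCMC output one would loop over the stored samples instead. The prior values and data are hypothetical.

```python
import random

def prob_exceed(y, E, a=5.0, b=5.0, threshold=1.0, draws=10000, seed=7):
    """Estimate Pr(RR_i > threshold | data) as the share of posterior draws above the threshold.

    Draws come from the conjugate Gamma(a + y, rate b + E) posterior; with real MCMC
    output, loop over the stored samples of RR_i instead."""
    rng = random.Random(seed)
    hits = sum(
        rng.gammavariate(a + y, 1.0 / (b + E)) > threshold  # True counts as 1
        for _ in range(draws)
    )
    return hits / draws

print(prob_exceed(40, 20.0))  # clearly elevated area: probability near 1
print(prob_exceed(5, 20.0))   # clearly low-risk area: probability near 0
```

The same loop works for any condition on the sampled values, e.g. RR[i] > 1.5 or a ranking among the top ten areas.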
APPLICATION THEME 2 • EXTENDING THE SMOOTHING PRINCIPLE: Spatial Heterogeneity In Regression Effects
Spatial Models for Regression Effects • Spatial pooling of strength may be applied not only to disease risks but also to the effects of area risk factors. Example: how are lung cancer incidence relativities in area $i$ affected by area smoking rates $x_i$? • Conventionally assume a constant slope on $x_i$ over all areas • However, the risk relationship may vary (smoothly) over space → spatially varying slopes $\beta_i$ • e.g. Congdon, Health and Place, 1997, article on area contrasts in suicide and attempted suicide in part of NE London (x = deprivation)
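One way to write such a model, as a sketch under stated assumptions: a Poisson likelihood with expected counts $E_i$, a log-linear relative risk $\theta_i$, and slope deviations $s_i$ given an intrinsic conditional autoregressive (ICAR) prior over adjacent areas. The symbols $\theta_i$, $\alpha$, $\beta$, $s_i$, $d_i$ and $\sigma_s$ are notation chosen here, not taken from the cited paper.

```latex
\begin{aligned}
y_i &\sim \mathrm{Poisson}(E_i\,\theta_i),\\
\log\theta_i &= \alpha + \beta_i x_i, \qquad \beta_i = \beta + s_i,\\
s_i \mid s_{j \neq i} &\sim \mathcal{N}\!\left(\frac{1}{d_i}\sum_{j \sim i} s_j,\; \frac{\sigma_s^2}{d_i}\right)
\end{aligned}
```

Here $d_i$ is the number of neighbours of area $i$ and $j \sim i$ denotes adjacency, so each area's slope is shrunk toward the average slope of its neighbours; letting $\sigma_s^2 \to 0$ recovers the conventional constant slope $\beta$. In practice an area-level random effect is usually added to $\log\theta_i$ as well.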
Application Theme 3 • EXTENDING THE SMOOTHING PRINCIPLE: Smoothing over areas and ages to derive small area life tables
Modelling area and age effects • Modelling mortality data $y_{ix}$ (and maybe illness data $h_{ix}$ too) by both area $i$ and age group $x$ • As before, neighbouring areas have similar rates under a prior incorporating spatial dependency • But also assume neighbouring ages have similar rates under a pooling (random effects) prior • Technically, often use “state space” or “random walk” priors for age effects
Why assume neighbouring ages are related: strong correlation between successive age effects
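A first-order random walk (RW1) prior encodes this correlation: each age effect is centred on the previous one. The sketch below evaluates the RW1 log density for two hypothetical age profiles of log death rates, assuming a fixed innovation standard deviation sigma (which a full model would estimate); smooth profiles receive higher prior density than jagged ones. The same prior applies to time effects.

```python
import math

def rw1_log_prior(effects, sigma):
    """Log density of a first-order random walk prior on ordered age (or time) effects:
    a_x | a_{x-1} ~ N(a_{x-1}, sigma^2) for x = 2..X; the level of a_1 is left unspecified."""
    total = 0.0
    for prev, cur in zip(effects, effects[1:]):
        step = cur - prev
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - step ** 2 / (2 * sigma ** 2)
    return total

# Hypothetical log death rates for five age bands: a smooth profile and a jagged one
smooth = [-6.0, -5.0, -4.0, -3.0, -2.0]
jagged = [-6.0, -2.0, -5.0, -3.0, -4.0]
print(rw1_log_prior(smooth, 1.0) > rw1_log_prior(jagged, 1.0))  # True: smooth profiles are favoured
```

A second-order random walk, which penalises changes in slope rather than level, is a common alternative when age effects trend steadily.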
Health and Mortality • Congdon 2006, Demographic Research, A model for geographical variation in health and total life expectancy • Spatial Framework, 33 London Boroughs, ca 230k population on average • Use illness data (long term ill status from 2001 UK Census) as well as deaths data (bivariate outcome) • With mortality and illness data can model both total life expectancy and healthy life expectancy - difference between expectancies is expected years lived in disability (“disease burden”) • Correlation between disease burden & area deprivation
Life Expectancies • Calculate life expectancies $E_{ix}$ for areas $i$ and ages $x$ using the usual life table calculations and “smoothed” age- and area-specific mortality rates $M_{ix}$ • Life expectancy at birth is $E_{i0}$ • Monitor the “derived outcomes” $E_{i0}$ in MCMC, whereas the likelihood for deaths uses $M_{ix}$ (the “actual model parameters”) • Conventional life expectancy calculations are problematic when populations are small and rates $M_{ix}$ are unstable → apply Bayesian random effects smoothing
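A minimal abridged life table, assuming deaths fall on average half-way through each closed age band and an open-ended last band with person-years l/M. Applying it to each MCMC draw of the smoothed rates would give a posterior density for life expectancy at birth. The optional illness-prevalence argument computes a Sullivan-type healthy life expectancy (person-years weighted by the proportion not ill); this is a standard technique used here for illustration, not necessarily the method of the cited papers. Age bands and rates are hypothetical.

```python
def life_table(ages, rates, ill_prev=None):
    """Abridged period life table from age-specific death rates M_x.

    ages:     start age of each band; the last band is open-ended
    rates:    death rate M_x (deaths per person-year) in each band; the last must be > 0
    ill_prev: optional proportion ill in each band, for a Sullivan-type healthy LE
    Assumes deaths fall on average half-way through each closed band (a_x = n_x / 2).
    Returns life expectancy at birth, plus healthy LE if ill_prev is given."""
    l = 1.0  # survivors at the start of the band (radix 1)
    person_years = []
    for k, m in enumerate(rates):
        if k + 1 < len(ages):
            n = ages[k + 1] - ages[k]
            q = n * m / (1 + 0.5 * n * m)  # probability of dying within the band
            d = l * q
            person_years.append(n * (l - d) + 0.5 * n * d)
            l -= d
        else:
            person_years.append(l / m)  # open band: L = l / M
    e0 = sum(person_years)  # divided by the radix, which is 1
    if ill_prev is None:
        return e0
    healthy = sum(L * (1 - p) for L, p in zip(person_years, ill_prev))
    return e0, healthy

# Hypothetical age bands and smoothed death rates for one area
ages = [0, 1, 5, 15, 25, 35, 45, 55, 65, 75, 85]
rates = [0.005, 0.0003, 0.0002, 0.0006, 0.0008, 0.0015, 0.003, 0.008, 0.02, 0.06, 0.15]
prev = [0.01, 0.02, 0.03, 0.05, 0.06, 0.08, 0.12, 0.2, 0.3, 0.45, 0.6]
e0, hle = life_table(ages, rates, ill_prev=prev)
print(f"LE {e0:.1f}, healthy LE {hle:.1f}, years with illness {e0 - hle:.1f}")
```

The difference between total and healthy life expectancy is the expected years lived with illness, the “disease burden” quantity discussed above.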
Goal: model should reflect spatial clustering in “derived outcomes” • Congdon (2007) A model for spatial variations in life expectancy; mortality in Chinese regions. Int J Health Geographics • Negative binomial model used because of large death counts and overdispersion, while allowing for correlated area and age effects
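A quick way to see the overdispersion that motivates a negative binomial likelihood is the Pearson dispersion statistic: close to 1 for a well-fitting Poisson model, well above 1 when counts vary more than the Poisson allows. The sketch below uses hypothetical large death counts against an intercept-only fitted mean and also shows the negative binomial variance mu + mu²/r, which exceeds the Poisson variance mu for any finite shape r. This is a common diagnostic, not necessarily the one used in the cited study.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.
    Roughly 1 under a well-fitting Poisson model; well above 1 signals overdispersion."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def negbin_variance(mu, r):
    """Variance of a negative binomial count with mean mu and shape r: mu + mu^2 / r."""
    return mu + mu * mu / r

# Hypothetical large death counts; the intercept-only fitted mean is their average, 137.5
deaths = [100, 150, 80, 220]
print(pearson_dispersion(deaths, [137.5] * 4, n_params=1))  # about 28.3
print(negbin_variance(137.5, 20.0))                          # 1082.8125, versus 137.5 under Poisson
```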
Spatio-temporal models • Similar ideas apply if the second dimension is time rather than age • Correlation between adjacent times is expected and should be included in the model • For example, could have “random walk” in time parameters
Area-age-time model with “derived output” • Congdon, 2004, J Appl Stat “Modelling Trends and Inequality in Small Area Mortality” has three dimensions: area, age, time (years) in an analysis of area mortality through time • “Derived outputs” monitored by MCMC are Theil and Gini indices of inequality in life expectancies $E_{it}$ between areas $i = 1, \ldots, n$ at year $t$ • If $R_{it} = E_{it}/E_t$, where $E_t$ is the average, then the Theil entropy index in year $t$ is $H_t = \frac{1}{n}\sum_i R_{it}\log(R_{it})$
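The two inequality indices can be computed for one year's area life expectancies as below; monitoring them in MCMC simply means evaluating them on each iteration's sampled values. The Theil index follows the formula above, and the Gini index uses the standard mean absolute difference form, G = Σ_i Σ_j |E_i − E_j| / (2 n² Ē). The life expectancy values are hypothetical.

```python
import math

def theil(values):
    """Theil entropy index: (1/n) * sum of R_i * log(R_i), with R_i = value_i / mean."""
    n = len(values)
    mean = sum(values) / n
    return sum((v / mean) * math.log(v / mean) for v in values) / n

def gini(values):
    """Gini index: sum over all pairs of |v_i - v_j|, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    total = sum(abs(a - b) for a in values for b in values)
    return total / (2 * n * n * mean)

# Hypothetical life expectancies for five areas in one year
le = [76.2, 78.9, 74.5, 80.1, 77.3]
print(f"Theil {theil(le):.6f}, Gini {gini(le):.4f}")
```

Both indices are zero when all areas have the same life expectancy and grow as the spread between areas widens.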