GIS and Spatial Statistics: Methods and Applications in Public Health

Institute of Behavioral Science, Computing and Research Services, and the Social Sciences Data Lab, University of Colorado at Boulder - March 11, 2008. Marcia Castro, Assistant Professor of Demography, Harvard School of Public Health

Spatial Statistics • First Law of Geography (Tobler 1979): “Everything is related to everything else, but near things are more related than distant things.”

Types of research questions • Spatial determinants of transmission • Spatial associations of risk factors with disease and interaction with temporal processes • Origins of diseases and outbreaks • Spatial and temporal distribution of disease and risk factors • Planning of surveillance programs and targeting of control activities • Improved allocation of limited resources

Types of spatial data • Points • Events – crimes, accidents, flu cases • Sample from a surface – air quality monitors, house sales • Objects – county centroids • Area • Aggregates of events – accidents per census tract • Summary measures – density, mean house value

Spatial Pattern Analysis • Some attributes • Testing of Hypothesis • Hypothesis generation • Pattern evolution • Pattern prediction • Clustering • Test spatial regression assumptions • Cannot unequivocally determine cause and effect • Cannot assign meaning to spatial relationships

Problems / Challenges • Modifiable areal unit problem (MAUP) • Scale effect – spatial data analysis at different scales may produce different results • Zoning effect – regrouping zones at a given scale may produce different results • Optimal neighborhood size • Alternative zoning schemes
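The scale and zoning effects can be illustrated with a small sketch. The exposure and rate values and the two zoning schemes below are hypothetical, made up only to show that the same unit-level data can yield different correlations once they are regrouped.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def aggregate(values, zones):
    """Average the unit-level values inside each zone (a list of unit indices)."""
    return [mean(values[i] for i in zone) for zone in zones]

# Hypothetical exposure (x) and disease rate (y) for 8 units along a transect.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 1, 4, 3, 6, 5, 8, 7]

zoning_a = [[0, 1], [2, 3], [4, 5], [6, 7]]    # units merged in pairs
zoning_b = [[0], [1, 2], [3, 4], [5, 6], [7]]  # same units, shifted boundaries

r_units = pearson(x, y)
r_a = pearson(aggregate(x, zoning_a), aggregate(y, zoning_a))
r_b = pearson(aggregate(x, zoning_b), aggregate(y, zoning_b))
print(round(r_units, 3), round(r_a, 3), round(r_b, 3))
```

In this toy case the aggregated correlations are higher than the unit-level one, and the two zoning schemes disagree with each other, which is the pattern the MAUP warns about.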

Problems / Challenges • Spatial dependence • Tobler’s law • Spatial heterogeneity • Uneven distributions at the global scale • Boundary problems • Missing data • Confidentiality • Collection, analysis, publication, data sharing • Disclosure risk • Methods to mask data

Spatial Autocorrelation • Null hypothesis: • Spatial randomness • Values observed at one location do not depend on values observed at neighboring locations • Observed spatial pattern of values is equally likely as any other spatial pattern • The location of values may be altered without affecting the information content of the data [Figure panels: Regular, Random, Aggregated]

Spatial autocorrelation • Formal test of match between locational similarity and value similarity • Locational similarity defined by spatial weights • Binary or Standardized • Types of neighborhoods: • Contiguity (common boundary) • Distance (distance band, K-nearest neighbors) • General weights (social distance, distance decay)
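A minimal sketch of how such spatial weights might be built, using only the Python standard library. The function names and the 3 x 3 grid are illustrative assumptions, not taken from any particular GIS package.

```python
import math

def rook_contiguity(nrows, ncols):
    """Binary contiguity weights for a regular grid (shared edge = neighbor)."""
    n = nrows * ncols
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[i][rr * ncols + cc] = 1.0
    return w

def distance_band(points, d):
    """Binary weights: j is a neighbor of i if it lies within distance d."""
    n = len(points)
    return [[1.0 if i != j and math.dist(points[i], points[j]) <= d else 0.0
             for j in range(n)] for i in range(n)]

def knn_weights(points, k):
    """Binary weights linking each location to its k nearest neighbors."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            w[i][j] = 1.0
    return w

def row_standardize(w):
    """Divide each row by its sum so the weights of every location sum to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

# Hypothetical 3 x 3 grid of cell centroids, indexed row by row.
points = [(float(c), float(r)) for r in range(3) for c in range(3)]
w_rook = rook_contiguity(3, 3)
w_band = distance_band(points, 1.0)
w_knn = knn_weights(points, 2)
w_std = row_standardize(w_rook)
```

On a unit grid a distance band of 1 reproduces rook contiguity, while k-nearest neighbors gives every location the same number of neighbors, which is one reason it is sometimes preferred when areas differ a lot in size.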

Spatial autocorrelation • Test for the presence of spatial autocorrelation • Global • Local • LISA – Local Indicators of Spatial Autocorrelation
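As a sketch of a global test, Moran's I can be computed directly and compared with its distribution under random permutation of the values across locations. The 4 x 4 grid, the two value patterns, the number of permutations, and the seed are all illustrative assumptions.

```python
import random

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid."""
    n = nrows * ncols
    w = [[0.0] * n for _ in range(n)]
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    w[r * ncols + c][rr * ncols + cc] = 1.0
    return w

def morans_i(values, w):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    s0 = sum(sum(row) for row in w)
    num = sum(w[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(zi * zi for zi in z)
    return (n / s0) * num / den

def permutation_p_value(values, w, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation."""
    rng = random.Random(seed)
    observed = morans_i(values, w)
    shuffled = list(values)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # spatial randomness: any arrangement equally likely
        if morans_i(shuffled, w) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)

w = rook_weights(4, 4)
clustered = [10] * 8 + [1] * 8  # high values in the top half
checker = [10 if (r + c) % 2 == 0 else 1 for r in range(4) for c in range(4)]

i_clustered, p_clustered = permutation_p_value(clustered, w)
i_checker = morans_i(checker, w)
print(i_clustered, p_clustered, i_checker)
```

The clustered pattern gives a positive I and the checkerboard a negative one. The permutation step mirrors the null hypothesis that values can be relocated without changing the information content of the data.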

Local spatial autocorrelation – LISA • Moran’s I_i, Geary’s c_i, K_i • Test of complete spatial randomness (CSR) – positive and negative autocorrelation • positive - similar values (either high or low) are spatially clustered • negative - neighboring values are dissimilar
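A minimal sketch of the local Moran statistic, assuming row-standardized weights on a hypothetical chain of six locations. In practice significance is usually assessed by conditional permutation, which is omitted here; the quadrant labels are only descriptive.

```python
def chain_weights(n):
    """Row-standardized weights for n locations on a line (left/right neighbors)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def local_morans_i(values, w):
    """Local Moran I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2) / n."""
    n = len(values)
    m = sum(values) / n
    z = [v - m for v in values]
    m2 = sum(zi * zi for zi in z) / n
    lags = [sum(w[i][j] * z[j] for j in range(n)) for i in range(n)]
    return [z[i] * lags[i] / m2 for i in range(n)], z, lags

def quadrant(zi, lag):
    """Label a location by its own deviation and its neighbors' average deviation."""
    if zi >= 0:
        return "high-high" if lag >= 0 else "high-low"
    return "low-low" if lag < 0 else "low-high"

# Hypothetical rates: high values on the left, low values on the right.
values = [10, 9, 8, 1, 2, 1]
w = chain_weights(len(values))
li, z, lags = local_morans_i(values, w)
labels = [quadrant(zi, lag) for zi, lag in zip(z, lags)]
for i, (stat, label) in enumerate(zip(li, labels)):
    print(i, round(stat, 3), label)
```

Positive I_i marks a location surrounded by similar values (high-high or low-low); negative I_i marks a location whose neighbors are dissimilar. With these weights the local values sum to n times the global Moran's I.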

Local spatial autocorrelation – LISA • G_i(d) • Does not consider the value of location i itself • Used for spread or diffusion studies • Useful for focal clustering • e.g. cholera infection around a specific water source • G_i*(d) • Takes the value of location i into account • Most appropriate for the identification of clusters • High and low values • Choice of d is not straightforward
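The difference between the two statistics can be sketched as follows. This assumes binary distance-band weights, the ratio form for G_i(d), and the standardized (z-score) form for G_i*(d); the seven locations and the distance d = 1 are hypothetical.

```python
import math

def getis_ord(points, x, d):
    """Return (G_i ratio, G_i* z-score) for every location, binary weights within d."""
    n = len(x)
    total = sum(x)
    mean = total / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    gi, gi_star_z = [], []
    for i in range(n):
        near = [j for j in range(n) if math.dist(points[i], points[j]) <= d]
        # G_i(d): location i is excluded from numerator and denominator.
        gi.append(sum(x[j] for j in near if j != i) / (total - x[i]))
        # G_i*(d): location i is included (w_ii = 1); standardized form.
        k = len(near)  # with binary weights, sum of w and sum of w^2 both equal k
        num = sum(x[j] for j in near) - mean * k
        den = s * math.sqrt((n * k - k * k) / (n - 1))  # assumes k < n
        gi_star_z.append(num / den)
    return gi, gi_star_z

# Hypothetical case counts at 7 sites along a road, one distance unit apart.
points = [(float(i), 0.0) for i in range(7)]
x = [1, 1, 1, 1, 8, 9, 10]
gi, gi_star_z = getis_ord(points, x, d=1.0)
for i in range(len(x)):
    print(i, round(gi[i], 3), round(gi_star_z[i], 3))
```

Large positive G_i* z-scores point to clusters of high values and large negative ones to clusters of low values. Rerunning the sketch with a different d changes which locations count as neighbors, which is why the choice of d matters.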

(23)  (24)  (10)  (22)  (20)  (11)  (3)  (21)  (4)  (2)  (19)  (9)  (1)  (5)  (12)  (8)  (17)  (6)  (7)  (18)  (16)  (13)  (15)  (14)  LocalStatistics

Local Statistics • Multiple and dependent tests • Two sources of spatial dependence • Geometric • Between the values of nearby locations

Local Statistics • Multiple comparison correction • Conservative – Bonferroni, Sidak • Control the probability that a true null hypothesis is incorrectly rejected (Type I error) • False Discovery Rate (FDR) • Proportion of rejected null hypotheses that are erroneously rejected: Q = V / (V + S), where V is the number of true null hypotheses rejected and S is the number of false null hypotheses rejected • FDR defined as the mean of Q: FDR = E(Q) = E[V / (V + S)]
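A sketch comparing the conservative corrections with an FDR-controlling procedure. The Benjamini-Hochberg step-up rule is used here as one common way to control the FDR; the slides do not say which procedure was applied, and the p-values are hypothetical. Because local statistics are dependent tests, the result should be read with that caveat in mind.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 when p <= alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def sidak(p_values, alpha=0.05):
    """Reject H0 when p <= 1 - (1 - alpha)^(1/m)."""
    m = len(p_values)
    threshold = 1.0 - (1.0 - alpha) ** (1.0 / m)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, k = largest rank with p_(k) <= (k / m) * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject

# Hypothetical p-values from local statistics at 10 locations.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
n_bonf = sum(bonferroni(p_values))
n_sidak = sum(sidak(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print(n_bonf, n_sidak, n_bh)
```

With these values the FDR rule rejects more hypotheses than Bonferroni or Sidak, which illustrates why it is considered less conservative.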

FDR & Local Statistics

Methods • Geostatistics • Semivariogram & Kriging • Weight the surrounding measured values to derive a prediction for each location • Weights are obtained from the semivariogram

Semivariogram

Creating the empirical semivariogram [Figure legend: empirical values]
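A minimal sketch of the empirical semivariogram, assuming the usual estimator gamma(h) = sum of squared differences over pairs in a lag bin, divided by twice the number of pairs. The five sample points, the lag width, and the maximum distance are hypothetical, and no directional (anisotropic) binning is attempted.

```python
import math
from collections import defaultdict

def empirical_semivariogram(points, z, lag_width, max_dist):
    """Return {bin: (mean distance, semivariance, pair count)} for each lag bin."""
    sq_diff = defaultdict(float)
    dist_sum = defaultdict(float)
    counts = defaultdict(int)
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            if h > max_dist:
                continue
            b = int(h // lag_width)  # bin b covers [b * width, (b + 1) * width)
            sq_diff[b] += (z[i] - z[j]) ** 2
            dist_sum[b] += h
            counts[b] += 1
    return {b: (dist_sum[b] / counts[b], sq_diff[b] / (2 * counts[b]), counts[b])
            for b in sorted(counts)}

# Hypothetical measurements at 5 monitors on a line, with a steady trend.
points = [(float(i), 0.0) for i in range(5)]
z = [1.0, 2.0, 3.0, 4.0, 5.0]
sv = empirical_semivariogram(points, z, lag_width=1.0, max_dist=3.0)
for b, (h, gamma, n_pairs) in sv.items():
    print(b, h, gamma, n_pairs)
```

Semivariance rising with distance, as in this toy example, is the signature of nearby values being more alike than distant ones.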

Directional Influence (Anisotropy)

Fitting a model to the empirical semivariogram [Figure legend: empirical values, fitted model]
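As a sketch of the fitting step, a spherical model can be matched to empirical values by least squares. The spherical form and the brute-force grid search are assumptions chosen for simplicity (software typically uses weighted least squares or likelihood methods), and the "empirical" values are synthetic, generated from known parameters so the fit can be checked.

```python
def spherical(h, nugget, partial_sill, a):
    """Spherical semivariogram with nugget, partial sill, and range a."""
    if h == 0:
        return 0.0
    if h >= a:
        return nugget + partial_sill
    r = h / a
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)

def fit_spherical(lags, gammas, nuggets, sills, ranges):
    """Grid search for the parameters minimizing the sum of squared errors."""
    best, best_sse = None, float("inf")
    for c0 in nuggets:
        for c in sills:
            for a in ranges:
                sse = sum((g - spherical(h, c0, c, a)) ** 2
                          for h, g in zip(lags, gammas))
                if sse < best_sse:
                    best, best_sse = (c0, c, a), sse
    return best, best_sse

# Synthetic empirical values from nugget 0.1, partial sill 1.0, range 5.0.
lags = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
gammas = [spherical(h, 0.1, 1.0, 5.0) for h in lags]

best, sse = fit_spherical(
    lags, gammas,
    nuggets=[i / 10 for i in range(0, 6)],   # 0.0 to 0.5
    sills=[i / 10 for i in range(5, 16)],    # 0.5 to 1.5
    ranges=[i / 2 for i in range(2, 21)],    # 1.0 to 10.0
)
print(best, sse)
```

The fitted nugget, sill, and range are what the kriging step later uses to turn distances into weights.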

Kriging • BLUE (best linear unbiased estimator) • Different models • e.g. Cokriging • Prediction error
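A minimal sketch of ordinary kriging at a single target location. It assumes an exponential semivariogram with made-up parameters, four hypothetical sample points, and a small Gaussian-elimination solver so that nothing beyond the standard library is needed. The weights are constrained to sum to 1 (unbiasedness), and the kriging variance gives the prediction error.

```python
import math

def exp_semivariogram(h, sill=1.0, scale=1.0):
    """Exponential semivariogram without nugget (illustrative parameters)."""
    return sill * (1.0 - math.exp(-h / scale))

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bv] for row, bv in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ordinary_kriging(points, z, target, gamma=exp_semivariogram):
    """Return (prediction, kriging variance, weights) at the target location."""
    n = len(points)
    # Semivariances between samples, bordered by the sum-to-one constraint.
    a = [[gamma(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    sol = solve(a, b)
    weights, mu = sol[:n], sol[n]  # mu is the Lagrange multiplier
    prediction = sum(wi * zi for wi, zi in zip(weights, z))
    variance = sum(wi * bi for wi, bi in zip(weights, b[:n])) + mu
    return prediction, variance, weights

# Hypothetical measurements at the corners of a unit square.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
z = [1.0, 2.0, 3.0, 4.0]
pred_center, var_center, w_center = ordinary_kriging(points, z, (0.5, 0.5))
pred_corner, var_corner, _ = ordinary_kriging(points, z, (0.0, 0.0))
print(pred_center, var_center, w_center)
```

At the center of the square all four samples are equally distant, so they receive equal weight. At a sampled location with no nugget the prediction reproduces the observed value and the variance drops to about zero.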

Methods • Multivariate analysis • The presence of spatial autocorrelation violates the independence assumption of standard linear regression models • Checking residuals – Moran’s I • Geographically weighted regression • Local estimates of regression parameters • Spatial weights – distance-decay kernel functions • Not parsimonious
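The idea behind geographically weighted regression can be sketched with one covariate: at each location, fit a weighted least-squares line in which nearby observations count more. The Gaussian distance-decay kernel, the bandwidth, the one-dimensional coordinates, and the data are illustrative assumptions.

```python
import math

def gwr_simple(coords, x, y, bandwidth):
    """Local (intercept, slope) at every location via kernel-weighted least squares."""
    estimates = []
    for si in coords:
        # Gaussian distance-decay kernel centered on the current location.
        w = [math.exp(-0.5 * (abs(si - sj) / bandwidth) ** 2) for sj in coords]
        sw = sum(w)
        xw = sum(wi * xi for wi, xi in zip(w, x)) / sw
        yw = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxy = sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
        sxx = sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x))
        slope = sxy / sxx
        estimates.append((yw - slope * xw, slope))
    return estimates

# Hypothetical data: the effect of x on y is 1 in the west and 3 in the east.
coords = [float(i) for i in range(10)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 6.0, 9.0, 12.0, 15.0]

local = gwr_simple(coords, x, y, bandwidth=1.5)
slopes = [slope for _, slope in local]
print([round(s, 2) for s in slopes])
```

A single global regression would report one averaged slope, while the local estimates recover the change across space. The price is one set of parameters per location, which is why the method is not parsimonious.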

Methods • Multivariate analysis • Spatially filtered regression • Spatial econometrics • Spatial lag model (real contagion) • Value of the dependent variable in one area is influenced by the values of that variable in the surrounding neighborhood; • A weighted average of the dependent value for the neighborhood location is introduced as an additional covariate. • Spatial error model (false contagion model) • Omitted covariates; • Autoregressive error term is included.
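A sketch of the covariate used in the spatial lag model, y = rho * W y + x * beta + error, assuming row-standardized weights on a hypothetical chain of five areas. The fixed-point iteration is only a way to generate values consistent with the model for |rho| < 1; it is not an estimation method, and fitting rho by ordinary least squares is generally biased because W y depends on y.

```python
def chain_weights(n):
    """Row-standardized weights for n areas on a line (left/right neighbors)."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            w[i][j] = 1.0 / len(nbrs)
    return w

def spatial_lag(w, y):
    """W y: weighted average of the dependent variable over each neighborhood."""
    return [sum(wij * yj for wij, yj in zip(row, y)) for row in w]

def simulate_lag_model(w, x, beta, rho, eps, n_iter=200):
    """Iterate y <- rho * W y + beta * x + eps until it settles (|rho| < 1)."""
    y = [beta * xi + ei for xi, ei in zip(x, eps)]
    for _ in range(n_iter):
        wy = spatial_lag(w, y)
        y = [rho * l + beta * xi + ei for l, xi, ei in zip(wy, x, eps)]
    return y

w = chain_weights(5)
lag_example = spatial_lag(w, [1.0, 2.0, 3.0, 4.0, 5.0])

# Hypothetical covariate, no noise, moderate spatial dependence.
x = [1.0, 0.0, 2.0, 1.0, 3.0]
eps = [0.0] * 5
y = simulate_lag_model(w, x, beta=2.0, rho=0.5, eps=eps)
wy = spatial_lag(w, y)
residual = max(abs(yi - (0.5 * l + 2.0 * xi)) for yi, l, xi in zip(y, wy, x))
print(lag_example, [round(v, 3) for v in y])
```

Even the area with x = 0 ends up with a nonzero outcome, because it inherits part of its neighbors' values through the lag term.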

Spatial Analysis & Policy Making “…although basic science is directed at the discovery of general principles, the ultimate value of such knowledge, apart from simple curiosity, lies in our ability to apply it to local conditions and, thus, determine specific outcomes. Although such science may itself be placeless, the application of scientific knowledge in policy inevitably requires explicit attention to spatial variation, particularly when the basis of policy is local.” (Goodchild, Anselin, Appelbaum and Harthorn 2000: 142)
