GIS and Spatial Statistics: Methods and Applications in Public Health
Marcia Castro, Assistant Professor of Demography, Harvard School of Public Health
Institute of Behavioral Science, Computing and Research Services, and the Social Sciences Data Lab, University of Colorado at Boulder, March 11, 2008
Spatial Statistics • First Law of Geography (Tobler 1979): “Everything is related to everything else, but near things are more related than distant things.”
Types of research questions • Spatial determinants of transmission • Spatial associations of risk factors with disease, and their interaction with temporal processes • Origins of diseases and outbreaks • Spatial and temporal distribution of disease and risk factors • Planning surveillance programs and targeting control activities • Improved allocation of limited resources
Types of spatial data • Points • Events – crimes, accidents, flu cases • Sample from a surface – air quality monitors, house sales • Objects – county centroids • Area • Aggregates of events – accidents per census tract • Summary measures – density, mean house value
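The step from point events to areal aggregates and density summaries can be sketched with NumPy. This is a minimal sketch assuming a regular grid of square cells stands in for areal units such as census tracts (real tract boundaries would need a point-in-polygon test); the coordinates and the helper names `events_per_cell` and `density` are hypothetical.

```python
import numpy as np

def events_per_cell(x, y, x_edges, y_edges):
    """Count point events in each grid cell (hypothetical stand-in for areal units)."""
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    return counts  # counts[i, j]: events with x in bin i and y in bin j

def density(counts, x_edges, y_edges):
    """Summary measure: events per unit area of each cell."""
    cell_area = np.outer(np.diff(x_edges), np.diff(y_edges))
    return counts / cell_area

# Hypothetical event locations (e.g., geocoded flu cases) in a unit square
x = np.array([0.2, 0.7, 0.8, 0.6, 0.1])
y = np.array([0.3, 0.1, 0.9, 0.6, 0.9])
edges = np.array([0.0, 0.5, 1.0])  # a 2 x 2 grid of cells, each of area 0.25

counts = events_per_cell(x, y, edges, edges)   # values [[1, 1], [1, 2]]
dens = density(counts, edges, edges)           # values [[4, 4], [4, 8]] events per unit area
```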
Spatial Pattern Analysis • Some attributes • Hypothesis testing • Hypothesis generation • Pattern evolution • Pattern prediction • Clustering • Testing spatial regression assumptions • Limitations • Cannot unequivocally determine cause and effect • Cannot assign meaning to spatial relationships
Problems / Challenges • Modifiable areal unit problem (MAUP) • Scale effect – spatial data analysis at different scales may produce different results • Zoning effect – regrouping zones at a given scale may produce different results • Optimal neighborhood size • Alternative zoning schemes
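The zoning effect can be illustrated with a tiny hypothetical example: four small areas with equal populations, grouped into two zones either by row or by column. Both zonings work at the same scale (two zones of two cells each), yet one shows no variation in rates and the other a strong contrast; merging everything into a single zone (a coarser scale) hides the pattern entirely. The counts and the helper `zone_rates` are made up for illustration.

```python
import numpy as np

# Hypothetical 2 x 2 grid of small areas: disease cases and population at risk
cases = np.array([[1, 9],
                  [2, 8]])
population = np.full((2, 2), 10)

def zone_rates(cases, population, axis):
    """Merge cells along `axis` into zones and return cases / population per zone."""
    return cases.sum(axis=axis) / population.sum(axis=axis)

rates_by_row = zone_rates(cases, population, axis=1)     # zones = rows    -> [0.5, 0.5]
rates_by_column = zone_rates(cases, population, axis=0)  # zones = columns -> [0.15, 0.85]
overall_rate = cases.sum() / population.sum()            # one zone        -> 0.5
```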
Problems / Challenges • Spatial dependence • Tobler’s law • Spatial heterogeneity • Uneven distributions at the global scale • Boundary problems • Missing data • Confidentiality • Collection, analysis, publication, data sharing • Disclosure risk • Methods to mask data
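One family of masking methods displaces each point at random before it is mapped or shared. Below is a minimal sketch of "donut" geomasking, assuming projected coordinates in meters: the minimum radius keeps a point from being left at or very near its true location, and the maximum radius bounds the loss of spatial accuracy. The radii, coordinates and the name `donut_mask` are hypothetical; real values would come from a disclosure-risk assessment (for example, population density around each point).

```python
import numpy as np

def donut_mask(x, y, r_min, r_max, rng):
    """Displace each point in a random direction by a distance between r_min and r_max.

    Radii are drawn so that displaced points are uniform over the annulus area.
    Coordinates are assumed to be projected (e.g., meters), not latitude/longitude.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    angle = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)
    radius = np.sqrt(rng.uniform(r_min ** 2, r_max ** 2, size=x.shape))
    return x + radius * np.cos(angle), y + radius * np.sin(angle)

# Hypothetical case locations (meters) and masking radii
rng = np.random.default_rng(42)
x = np.array([1000.0, 2500.0, 4000.0])
y = np.array([500.0, 1500.0, 3000.0])
x_masked, y_masked = donut_mask(x, y, r_min=100.0, r_max=500.0, rng=rng)
shift = np.hypot(x_masked - x, y_masked - y)  # displacement of each point
```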
Spatial Autocorrelation • Null hypothesis: spatial randomness • Values observed at one location do not depend on values observed at neighboring locations • The observed spatial pattern of values is as likely as any other spatial pattern • The locations of values may be altered without affecting the information content of the data [Figure panels: regular, random, aggregated patterns]
Spatial autocorrelation • Formal test of match between locational similarity and value similarity • Locational similarity defined by spatial weights • Binary or Standardized • Types of neighborhoods: • Contiguity (common boundary) • Distance (distance band, K-nearest neighbors) • General weights (social distance, distance decay)
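The neighborhood definitions listed above can be sketched as weight matrices in NumPy. This is a minimal sketch assuming areas on a regular grid (for contiguity) or point coordinates such as centroids (for distance-based weights); all function names are hypothetical, and packages built for this purpose handle irregular polygons, islands and sparse storage more carefully.

```python
import numpy as np

def rook_contiguity(nrows, ncols):
    """Binary weights on a regular grid: cells sharing an edge are neighbors."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < nrows and 0 <= cc < ncols:
                    W[r * ncols + c, rr * ncols + cc] = 1.0
    return W

def pairwise_distances(coords):
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def distance_band(coords, d):
    """Binary weights: neighbors are locations within distance d (self excluded)."""
    W = (pairwise_distances(coords) <= d).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def k_nearest(coords, k):
    """Binary weights: each location's k nearest neighbors (not symmetric in general)."""
    D = pairwise_distances(coords)
    np.fill_diagonal(D, np.inf)
    nearest = np.argsort(D, axis=1, kind="stable")[:, :k]
    W = np.zeros_like(D)
    W[np.arange(len(D))[:, None], nearest] = 1.0
    return W

def inverse_distance(coords, power=1.0):
    """General distance-decay weights w_ij = 1 / d_ij**power, with w_ii = 0.
    Coincident distinct points would give infinite weights."""
    D = pairwise_distances(coords)
    np.fill_diagonal(D, np.inf)  # 1 / inf = 0 on the diagonal
    return 1.0 / D ** power

def row_standardize(W):
    """Divide each row by its sum so neighbor weights sum to 1 (rows with no neighbors stay 0)."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)

# A 3 x 3 grid: the center cell gets four neighbors, each weighted 0.25 after standardizing
W = row_standardize(rook_contiguity(3, 3))
```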
Spatial autocorrelation • Test for the presence of spatial autocorrelation • Global • Local • LISA – Local Indicators of Spatial Autocorrelation
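A global test can be sketched with Moran's I, $I = \frac{n}{S_0}\,\frac{\sum_i\sum_j w_{ij} z_i z_j}{\sum_i z_i^2}$, where $z_i = x_i - \bar{x}$ and $S_0 = \sum_i\sum_j w_{ij}$. Inference below uses the spatial-randomness null directly: the observed values are randomly reassigned to locations many times, and the observed $I$ is compared with that reference distribution (a one-sided pseudo p-value for positive autocorrelation). This is a minimal sketch; the grid, the helper names and the number of permutations are assumptions.

```python
import numpy as np

def grid_rook_weights(nrows, ncols):
    """Hypothetical helper: binary rook-contiguity weights for a regular grid."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for i in range(n):
        r, c = divmod(i, ncols)
        if r > 0:
            W[i, i - ncols] = 1.0
        if r < nrows - 1:
            W[i, i + ncols] = 1.0
        if c > 0:
            W[i, i - 1] = 1.0
        if c < ncols - 1:
            W[i, i + 1] = 1.0
    return W

def morans_i(x, W):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)

def moran_permutation_test(x, W, n_perm=999, seed=0):
    """Pseudo p-value for positive autocorrelation: under spatial randomness,
    any reassignment of the observed values to locations is equally likely."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    perms = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    p = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p

W = grid_rook_weights(4, 4)
gradient = np.repeat(np.arange(4), 4)                      # value = row index: clustered
checkerboard = np.indices((4, 4)).sum(axis=0).ravel() % 2  # alternating: dispersed
I_grad, p_grad = moran_permutation_test(gradient, W)       # I = 2/3 for this layout
I_check = morans_i(checkerboard, W)                        # I = -1: every neighbor pair differs
```

Under the null, $E[I] = -1/(n-1)$, here $-1/15$, so values well above that indicate clustering of similar values and values below it indicate dispersion.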
Local spatial autocorrelation – LISA • Moran’s Ii, Geary’s ci, Ki • Test against complete spatial randomness (CSR), detecting positive and negative autocorrelation • Positive: similar values (either high or low) are spatially clustered • Negative: neighboring values are dissimilar
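A minimal sketch of the local Moran statistic, $I_i = \frac{z_i}{m_2}\sum_j w_{ij} z_j$ with $m_2 = \sum_i z_i^2 / n$, plus the Moran scatterplot quadrant of each location (HH and LL indicate positive local autocorrelation, HL and LH negative). The six-area transect and the values are hypothetical; pseudo p-values, usually obtained by conditional permutation, are omitted.

```python
import numpy as np

def local_moran(x, W):
    """Local Moran: I_i = (z_i / m2) * sum_j w_ij z_j, where m2 = sum(z^2) / n."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    m2 = (z @ z) / len(z)
    lag = W @ z                  # spatially lagged deviations
    Ii = z * lag / m2
    # Quadrant: own value above/below mean (H/L), then neighbors' lag above/below mean
    labels = np.where(z > 0, np.where(lag > 0, "HH", "HL"),
                      np.where(lag > 0, "LH", "LL"))
    return Ii, labels

# Hypothetical transect of 6 areas; each area neighbors the adjacent ones
n = 6
W = np.zeros((n, n))
idx = np.arange(n - 1)
W[idx, idx + 1] = 1.0
W[idx + 1, idx] = 1.0
W = W / W.sum(axis=1, keepdims=True)      # row-standardize
x = np.array([1.0, 1.0, 1.0, 9.0, 9.0, 1.0])

Ii, labels = local_moran(x, W)             # areas 3 and 4 form an HH cluster
```

With row-standardized weights the mean of the $I_i$ equals the global Moran's I, which is one way to see the local statistics as a decomposition of the global one.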
Local spatial autocorrelation – LISA • Gi(d) • Does not consider the value of location i itself • Used for spread or diffusion studies • Useful for focal clustering • e.g. cholera infection around a specific water source • Gi*(d) • Takes the value of location i into account • Most appropriate for the identification of clusters • High and low values • Choice of d is not straightforward
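The difference between Gi(d) and Gi*(d) is whether location i enters its own neighborhood sum. This sketch uses the standardized (z-score) forms as given by Ord and Getis (1995), as I understand them, with binary distance-band weights: for Gi* the mean and standard deviation use all n values, and for Gi they use the other n − 1 values. Large positive values suggest a cluster of high values, large negative values a cluster of low values. The transect, the band d = 1 and the name `getis_ord` are hypothetical.

```python
import numpy as np

def getis_ord(x, W, star=True):
    """Standardized Getis-Ord statistic for every location.

    W: binary distance-band weights; its diagonal is overwritten.
    star=True  -> Gi*(d): location i counts as its own neighbor (w_ii = 1).
    star=False -> Gi(d):  location i is excluded; mean and sd use the other n - 1 values.
    """
    x = np.asarray(x, dtype=float)
    W = np.array(W, dtype=float)          # copy, so the caller's matrix is untouched
    n = len(x)
    np.fill_diagonal(W, 1.0 if star else 0.0)
    z = np.empty(n)
    for i in range(n):
        if star:
            vals, w = x, W[i]
        else:
            mask = np.arange(n) != i
            vals, w = x[mask], W[i, mask]
        m = len(vals)
        mean = vals.mean()
        sd = np.sqrt((vals ** 2).mean() - mean ** 2)
        wsum, w2sum = w.sum(), (w ** 2).sum()
        num = w @ vals - mean * wsum
        den = sd * np.sqrt((m * w2sum - wsum ** 2) / (m - 1))
        z[i] = num / den
    return z

# Hypothetical transect of 7 sites, one unit apart; distance band d = 1
pos = np.arange(7.0)
W = (np.abs(pos[:, None] - pos[None, :]) <= 1.0).astype(float)
x = np.array([1, 1, 1, 1, 5, 6, 5], dtype=float)

gi_star = getis_ord(x, W, star=True)   # high-value cluster around site 5 -> positive
gi = getis_ord(x, W, star=False)
```

Re-running with several values of d is one pragmatic response to the fact that the choice of d is not straightforward.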
Local Statistics [Figure: map of 24 numbered areas, labeled 1 to 24]
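Local Statistics • Multiple and dependent tests • Two sources of spatial dependence • Geometric (neighborhoods of nearby locations overlap, so local tests share observations) • Between the values of nearby locations (spatial autocorrelation)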
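Local Statistics • Multiple comparison correction • Conservative: Bonferroni, Sidak • Control the probability that at least one true null hypothesis is incorrectly rejected (Type I error), i.e., the family-wise error rate • False Discovery Rate (FDR) • Let $V$ be the number of true null hypotheses incorrectly rejected and $S$ the number of false null hypotheses correctly rejected; then $Q = V/(V+S)$ is the proportion of rejected hypotheses that are erroneously rejected (with $Q = 0$ when $V + S = 0$) • FDR is defined as the mean of $Q$: $\mathrm{FDR} = E(Q) = E\left[V/(V+S)\right]$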
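The corrections above can be compared on one set of p-values. This sketch implements the Bonferroni and Sidak thresholds and the Benjamini-Hochberg step-up procedure for FDR; BH is valid for independent or positively dependent tests, and the Benjamini-Yekutieli variant (dividing the thresholds by $c(m) = \sum_{i=1}^m 1/i$) is valid under arbitrary dependence, which matters for overlapping local tests. The p-values and function names are hypothetical.

```python
import numpy as np

def bonferroni(p, alpha=0.05):
    """Reject H0_i if p_i <= alpha / m (controls the family-wise error rate)."""
    p = np.asarray(p, dtype=float)
    return p <= alpha / len(p)

def sidak(p, alpha=0.05):
    """Reject H0_i if p_i <= 1 - (1 - alpha)**(1/m); FWER exactly alpha for independent tests."""
    p = np.asarray(p, dtype=float)
    return p <= 1.0 - (1.0 - alpha) ** (1.0 / len(p))

def fdr_step_up(p, q=0.05, dependent=False):
    """Benjamini-Hochberg step-up procedure controlling FDR = E(Q) at level q.

    dependent=True applies the Benjamini-Yekutieli constant c(m) = sum_{i=1}^m 1/i.
    """
    p = np.asarray(p, dtype=float)
    m = len(p)
    c_m = np.sum(1.0 / np.arange(1, m + 1)) if dependent else 1.0
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / (m * c_m)
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # largest rank with p_(k) <= its threshold
        reject[order[: k + 1]] = True      # reject all hypotheses up to that rank
    return reject

# Hypothetical p-values from 10 local tests
p = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216])
n_bonf = bonferroni(p).sum()                  # 1 rejection
n_sidak = sidak(p).sum()                      # 1 rejection
n_bh = fdr_step_up(p).sum()                   # 2 rejections
n_by = fdr_step_up(p, dependent=True).sum()   # 1 rejection
```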
Methods • Geostatistics • Semivariogram & Kriging • Weight the surrounding measured values to derive a prediction for each location • Weights are obtained from the semivariogram
Creating the empirical semivariogram [Figure: empirical values]
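The empirical semivariogram averages half the squared differences between pairs of observations, grouped by separation distance: $\hat\gamma(h) = \frac{1}{2N(h)}\sum_{(i,j)\in N(h)}\left[z(x_i) - z(x_j)\right]^2$, where $N(h)$ is the set (and number) of pairs whose distance falls in the lag bin around $h$. A minimal NumPy sketch follows; the lag bins, the one-dimensional transect and the function name are assumptions.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Return (mean lag, semivariance, number of pairs) for each non-empty distance bin."""
    coords = np.asarray(coords, dtype=float)
    if coords.ndim == 1:
        coords = coords[:, None]              # allow 1-D positions
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)       # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        if in_bin.any():
            lags.append(dist[in_bin].mean())
            gammas.append(half_sq[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)

positions = np.arange(10.0)        # hypothetical 1-D transect of sample sites
values = positions.copy()          # a pure linear trend, z(x) = x
lags, gammas, counts = empirical_semivariogram(positions, values,
                                               np.array([0.5, 1.5, 2.5, 3.5]))
# Every pair at distance h differs by h, so gamma(h) = h^2 / 2
```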
Fitting a model to the empirical semivariogram [Figure: empirical values and fitted model]
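A model fit can be sketched with an exponential semivariogram, $\gamma(h) = c_0 + c\,(1 - e^{-h/a})$ for $h > 0$ and $\gamma(0) = 0$, with nugget $c_0$, partial sill $c$ and range parameter $a$ (the practical range is about $3a$). The choice of the exponential model (spherical and Gaussian models are common alternatives), the unweighted least-squares criterion, the range grid and the noiseless "empirical" values are all assumptions. For a fixed $a$ the model is linear in $c_0$ and $c$, so the sketch solves a linear least-squares problem for each candidate range and keeps the best.

```python
import numpy as np

def exponential_model(h, nugget, psill, a):
    """Exponential semivariogram: nugget + psill * (1 - exp(-h / a)) for h > 0, 0 at h = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / a)), 0.0)

def fit_exponential(lags, gammas, range_grid):
    """Grid search over the range; (nugget, psill) by linear least squares for each range."""
    best = None
    for a in range_grid:
        X = np.column_stack([np.ones_like(lags), 1.0 - np.exp(-lags / a)])
        coef, *_ = np.linalg.lstsq(X, gammas, rcond=None)
        sse = np.sum((X @ coef - gammas) ** 2)
        if best is None or sse < best[0]:
            best = (sse, coef[0], coef[1], a)
    _, nugget, psill, a = best
    return nugget, psill, a

lags = np.arange(1.0, 13.0)
gammas = exponential_model(lags, 0.2, 1.5, 2.0)   # hypothetical, noise-free values
nugget, psill, a = fit_exponential(lags, gammas, np.linspace(0.1, 10.0, 991))
```

With real data the empirical values are noisy, and weighting each lag by its number of pairs is a common refinement.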
Kriging • BLUE (best linear unbiased estimator) • Different models • e.g. cokriging • Provides the prediction error (kriging variance) at each location
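Ordinary kriging finds weights $\lambda_i$ that minimize the prediction error variance subject to $\sum_i \lambda_i = 1$ (unbiasedness), by solving $\begin{bmatrix}\Gamma & \mathbf{1}\\ \mathbf{1}^\top & 0\end{bmatrix}\begin{bmatrix}\lambda\\ \mu\end{bmatrix} = \begin{bmatrix}\gamma_0\\ 1\end{bmatrix}$, where $\Gamma$ holds semivariances between data points and $\gamma_0$ those between the data and the target. The prediction is $\lambda^\top z$ and the kriging variance is $\lambda^\top\gamma_0 + \mu$. This is a minimal sketch assuming an already fitted exponential semivariogram with hypothetical parameters; it is not cokriging or any other variant.

```python
import numpy as np

def exp_semivariogram(h, nugget=0.0, psill=1.0, a=1.0):
    """Hypothetical fitted model; gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + psill * (1.0 - np.exp(-h / a)), 0.0)

def ordinary_kriging(coords, values, target, gamma):
    """Ordinary kriging prediction, kriging variance and weights at one target location."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(values, dtype=float)
    n = len(z)
    D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(D)             # semivariances among data points
    A[n, n] = 0.0
    b = np.append(gamma(d0), 1.0)    # semivariances to the target, plus the constraint
    sol = np.linalg.solve(A, b)
    weights, mu = sol[:n], sol[n]
    prediction = weights @ z
    variance = weights @ b[:n] + mu  # ordinary kriging variance
    return prediction, variance, weights

corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # hypothetical samples
vals = [1.0, 2.0, 3.0, 4.0]
pred_center, var_center, w_center = ordinary_kriging(corners, vals, (0.5, 0.5), exp_semivariogram)
pred_data, var_data, _ = ordinary_kriging(corners, vals, (1.0, 0.0), exp_semivariogram)
```

By symmetry the center receives equal weights (prediction 2.5), and without a nugget kriging reproduces the observed value at a sampled location with zero variance.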
Methods • Multivariate analysis • The presence of spatial autocorrelation violates the independence assumption of standard linear regression models • Checking residuals – Moran’s I • Geographically weighted regression • Local estimates of regression parameters • Spatial weights – distance-decay kernel functions • Not parsimonious
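Geographically weighted regression can be sketched as one weighted least-squares fit per location, $\hat\beta_i = (X^\top W_i X)^{-1} X^\top W_i y$, with $W_i$ diagonal and built from a Gaussian distance-decay kernel $w_{ij} = \exp[-\tfrac12 (d_{ij}/b)^2]$. The kernel, the bandwidth $b = 1$ and the simulated transect are assumptions; in practice the bandwidth is usually chosen by cross-validation or AICc, and dedicated software also reports local standard errors.

```python
import numpy as np

def gwr(coords, X, y, bandwidth):
    """Local coefficients (n x p) from a Gaussian-kernel GWR; X must include an intercept."""
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    betas = np.empty((len(y), X.shape[1]))
    for i in range(len(y)):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = X.T * w                 # X^T W_i without building the diagonal matrix
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Hypothetical data: 30 sites on an east-west transect, covariate alternating 0/1
n = 30
coords = np.column_stack([np.linspace(0.0, 10.0, n), np.zeros(n)])
x = np.tile([0.0, 1.0], n // 2)
X = np.column_stack([np.ones(n), x])
y_const = 1.0 + 2.0 * x                              # same relationship everywhere
y_vary = 1.0 + (0.5 + 0.3 * coords[:, 0]) * x        # slope drifts from 0.5 (west) to 3.5 (east)
betas_const = gwr(coords, X, y_const, bandwidth=1.0)  # every row is [1, 2]
betas_vary = gwr(coords, X, y_vary, bandwidth=1.0)    # local slopes increase eastward
```

One set of coefficients per location is what makes GWR informative about non-stationarity and also why it is not parsimonious.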
Methods • Multivariate analysis • Spatially filtered regression • Spatial econometrics • Spatial lag model (real contagion) • The value of the dependent variable in one area is influenced by the values of that variable in the surrounding neighborhood • A weighted average of the dependent variable over neighboring locations (the spatial lag) is introduced as an additional covariate • Spatial error model (false contagion model) • Accounts for spatially correlated omitted covariates • An autoregressive error term is included
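The two specifications can be written as $y = \rho W y + X\beta + \varepsilon$ (spatial lag) and $y = X\beta + u$, $u = \lambda W u + \varepsilon$ (spatial error). The sketch below simulates data from each through its reduced form, $y = (I - \rho W)^{-1}(X\beta + \varepsilon)$ and $u = (I - \lambda W)^{-1}\varepsilon$. The ring-shaped neighborhood, the parameter values and the function names are hypothetical. Estimation is not shown: regressing $y$ on $Wy$ by OLS is biased because $Wy$ is endogenous, so maximum likelihood or instrumental-variable methods are used in practice.

```python
import numpy as np

def ring_weights(n):
    """Hypothetical row-standardized weights: each area neighbors the two adjacent areas on a ring."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, (i - 1) % n] = 0.5
        W[i, (i + 1) % n] = 0.5
    return W

def simulate_lag(X, beta, rho, W, rng):
    """Spatial lag model via its reduced form y = (I - rho W)^-1 (X beta + eps)."""
    eps = rng.standard_normal(X.shape[0])
    A = np.eye(X.shape[0]) - rho * W
    return np.linalg.solve(A, X @ beta + eps), eps

def simulate_error(X, beta, lam, W, rng):
    """Spatial error model: y = X beta + u with u = (I - lam W)^-1 eps."""
    eps = rng.standard_normal(X.shape[0])
    u = np.linalg.solve(np.eye(X.shape[0]) - lam * W, eps)
    return X @ beta + u, eps

rng = np.random.default_rng(0)
n = 50
W = ring_weights(n)
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
beta = np.array([1.0, 2.0])
y_lag, eps_lag = simulate_lag(X, beta, 0.6, W, rng)
y_err, eps_err = simulate_error(X, beta, 0.6, W, rng)
Wy = W @ y_lag   # the spatial lag: neighborhood average of y, the extra covariate
```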
Spatial Analysis & Policy Making “…although basic science is directed at the discovery of general principles, the ultimate value of such knowledge, apart from simple curiosity, lies in our ability to apply it to local conditions and, thus, determine specific outcomes. Although such science may itself be placeless, the application of scientific knowledge in policy inevitably requires explicit attention to spatial variation, particularly when the basis of policy is local.” (Goodchild, Anselin, Appelbaum and Harthorn 2000: 142)