Dealing with Spatial Autocorrelation

Spatial Analysis Seminar, Spring 2009

Spatial Autocorrelation Defined
• “…the property of random variables taking values, at pairs of locations a certain distance apart, that are more similar (positive autocorrelation) or less similar (negative autocorrelation) than expected for randomly associated pairs of observations.” (Legendre 1993)

Types of Spatial Autocorrelation
• Inherent autocorrelation: caused by “contagious biotic processes”
vs.
• Induced spatial dependence: biological variables of interest are functionally dependent on one or more autocorrelated exogenous variable(s)

Why Should We Care?
• “natural systems almost always have autocorrelation in the form of patchiness or gradients…over a wide range of spatial and temporal scales.” (Fortin & Dale 2005)
→ Autocorrelation is a “fact of life” for ecologists!

2 Views of Spatial Autocorrelation:
• It’s a nuisance that complicates statistical hypothesis testing
• It’s functionally important in many ecosystems, so we must revise our theories and models to incorporate spatial structure
• Either way, the first step involves describing the autocorrelation (i.e., the “spatial structure”)

Describing Spatial Autocorrelation
• Compute Moran’s I or Geary’s c coefficients over multiple distance classes
• Correlogram: plot the coefficient (Y-axis) against distance (X-axis)
• Mantel correlogram: the equivalent for a multivariate response
• Semi-variogram/variogram
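
The correlogram recipe above can be sketched in base R. This is a minimal illustration, not the seminar's own code: the function names `morans_i` and `correlogram`, the simulated 10 x 10 grid, and the distance breaks are all assumptions made for the example.

```r
# Moran's I for one binary weight matrix w
# (w[i, j] = 1 if locations i and j fall in the distance class, else 0)
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Correlogram: one Moran's I value per distance class (breaks[k], breaks[k + 1]]
correlogram <- function(x, coords, breaks) {
  d <- as.matrix(dist(coords))
  sapply(seq_len(length(breaks) - 1), function(k) {
    w <- (d > breaks[k] & d <= breaks[k + 1]) * 1
    morans_i(x, w)
  })
}

# Hypothetical data: a 10 x 10 grid with an east-west gradient plus noise
set.seed(1)
coords <- expand.grid(x = 1:10, y = 1:10)
z <- coords$x + rnorm(100, sd = 0.5)
breaks <- c(0, 1, 2, 3, 4, 5)

I_by_lag <- correlogram(z, coords, breaks)
print(round(I_by_lag, 2))
```

Calling `plot(breaks[-1], I_by_lag, type = "b")` draws the correlogram. Under no autocorrelation the expected value of Moran's I is -1/(n - 1), which is a useful reference line.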

Example Data
• Wetland hardwood forest (5 x 5 m cells)
• Response variable: log of the number of non-ground lidar points in the 0-1 m vertical height bin
• n1 = 217, n2 = 68
• Welch’s t-test (unequal variances, unequal sample sizes) results: t = 2.33, df = 181, p ≈ 0.021

Moran’s I correlograms

Now what do I do???
• Adjusting the effective sample size
• Spatial statistical modeling methods
• Restricted randomization
• Other methods: canonical ordination, partial Mantel tests, etc.

Adjusting the Effective Sample Size
• Estimate of effective sample size (Fortin & Dale 2005, p. 223, Equation 5.15): [equation not preserved]
• For first-order autocorrelation ρ and large n: [equation not preserved]
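
For orientation, the standard textbook forms of these two quantities are given below. They are offered as an assumption about what the slide showed, not as a verbatim copy of Equation 5.15. Here ρ_ij is the correlation between observations i and j (with ρ_ii = 1), and ρ is the lag-one autocorrelation.

```latex
\[
n' = \frac{n^{2}}{\sum_{i=1}^{n}\sum_{j=1}^{n}\rho_{ij}}
\qquad\text{and, for first-order autocorrelation with large } n,\qquad
n' \approx n\,\frac{1-\rho}{1+\rho}
\]
```

As a quick check of the approximation, n = 100 observations with ρ = 0.5 carry about as much information as 100 × 0.5 / 1.5 ≈ 33 independent observations.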

Adjusting the Effective Sample Size
• For the “Recently Burned” example data: [result not preserved]
• For the “Long Unburned” example data: [result not preserved]
• Welch’s t-test results: t = 1.76, df = 123, p ≈ 0.080
• BUT, this is a very simplistic model!
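
A sketch of how such an adjusted test could be computed from summary statistics, assuming the large-n first-order approximation n' ≈ n(1 - ρ)/(1 + ρ). The helper names `n_eff` and `welch_t`, and all means, standard deviations, and ρ values, are hypothetical. Only the sample sizes 217 and 68 come from the example data.

```r
# Large-n, first-order approximation of the effective sample size
n_eff <- function(n, rho) n * (1 - rho) / (1 + rho)

# Welch's t-test from summary statistics (means, SDs, sample sizes)
welch_t <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  t  <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t, df = df, p = 2 * pt(-abs(t), df))
}

# Hypothetical summary statistics and autocorrelation estimates
# (NOT the values behind the slide's results)
m1 <- 2.0; s1 <- 1.0; n1 <- 217; rho1 <- 0.20
m2 <- 1.7; s2 <- 0.9; n2 <- 68;  rho2 <- 0.10

naive    <- welch_t(m1, s1, n1, m2, s2, n2)
adjusted <- welch_t(m1, s1, n_eff(n1, rho1), m2, s2, n_eff(n2, rho2))
print(rbind(naive, adjusted))
```

Shrinking each n to its effective size inflates the standard error of the difference, so with positive autocorrelation the adjusted test is always more conservative than the naive one.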

Detour: Autocorrelation Models
• Model 1 (“spatial independence”): [equation not preserved]
• Model 2 (“first-order autoregressive”): [equation not preserved]
• Model 3 (“induced autoregressive”): [equation not preserved]
• Model 4 (“doubly autoregressive”): [equation not preserved]
SOURCE: Fortin & Dale (2005), pp. 213-216
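
The four model names can be made concrete with a small one-dimensional simulation. The forms below are one plausible reading of the names, not a transcription of Fortin & Dale's equations. The coefficient values, the helper `ar1`, and the choice to give Model 4 AR(1) errors are all assumptions.

```r
set.seed(42)
n    <- 2000
rho  <- 0.7   # assumed autoregressive coefficient
beta <- 1     # assumed effect of the exogenous variable z

# First-order autoregressive series: x[i] = rho * x[i - 1] + e[i]
ar1 <- function(n, rho) {
  as.numeric(stats::filter(rnorm(n), rho, method = "recursive"))
}

z  <- ar1(n, rho)              # autocorrelated exogenous variable
m1 <- rnorm(n)                 # Model 1: independent errors only
m2 <- ar1(n, rho)              # Model 2: autocorrelation inherent in the response
m3 <- beta * z + rnorm(n)      # Model 3: autocorrelation induced through z
m4 <- beta * z + ar1(n, rho)   # Model 4: both sources at once

# Lag-one autocorrelation of each simulated series
lag1 <- function(x) cor(x[-1], x[-length(x)])
print(round(sapply(list(m1 = m1, m2 = m2, m3 = m3, m4 = m4), lag1), 2))
```

The point of the comparison is that Models 2 to 4 all produce an autocorrelated response, so a correlogram of the response alone cannot tell inherent from induced autocorrelation.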

Detour: Autocorrelation Models
• The models on the previous slide were one-dimensional, but most spatial data is two-dimensional (Lat-Long, XY-coordinates, etc.)
• The two-dimensional spatial autocorrelation model incorporates W, a “proximity matrix” of neighbor weights, which in turn affects the variance-covariance matrix (C): [equation not preserved]
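
To show how W feeds into C, here is a base-R sketch that assumes the simultaneous autoregressive form C = σ²[(I - ρW)ᵀ(I - ρW)]⁻¹. Whether this is the exact equation from the slide is an assumption. The rook-neighbour rule, the row standardisation, the 4 x 4 grid, and the parameter values are illustrative choices.

```r
# Rook-neighbour proximity matrix W for an nr x nc grid, row-standardised
rook_weights <- function(nr, nc) {
  coords <- expand.grid(row = 1:nr, col = 1:nc)
  d <- as.matrix(dist(coords, method = "manhattan"))
  W <- (d == 1) * 1   # 1 for cells sharing an edge, 0 otherwise
  W / rowSums(W)      # each row sums to 1
}

W      <- rook_weights(4, 4)
rho    <- 0.5   # assumed autocorrelation parameter, |rho| < 1
sigma2 <- 1     # assumed error variance

# Assumed SAR covariance: sigma^2 * [(I - rho W)' (I - rho W)]^-1
A <- diag(nrow(W)) - rho * W
C <- sigma2 * solve(t(A) %*% A)

# Implied correlations between the first cell and cells 1 to 4 of its column
print(round(cov2cor(C)[1, 1:4], 3))
```

Even though W only links immediate neighbours, the implied correlations in C extend to every pair of cells and weaken with distance.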

Generalized Least Squares (GLS)
• Relatively easy way to introduce spatial autocorrelation structure to linear models
• Fits a parametric correlation function (exponential, Gaussian, spherical, etc.) directly to the variance-covariance matrix
• Assumes normally distributed errors, but errors are allowed to be correlated and/or have unequal variances
• Built-in R package: nlme

GLS Model – No Spatial Structure
library(nlme)
…
## Model A: spatial independence
ModelA <- gls(LN_COUNT ~ BURNED, data = SAC_data)
plot(Variogram(ModelA, form = ~ x + y))

GLS Models with Spatial Structure
> ## corAR1() with no form argument takes the row order of SAC_data as the lag
> ModelB <- gls(LN_COUNT ~ BURNED, data = SAC_data, correlation = corAR1())
> ModelC <- gls(LN_COUNT ~ BURNED, data = SAC_data, correlation = corExp(form = ~ x + y))
> ModelD <- gls(LN_COUNT ~ BURNED, data = SAC_data, correlation = corGaus(form = ~ x + y))
> ModelE <- gls(LN_COUNT ~ BURNED, data = SAC_data, correlation = corSpher(form = ~ x + y))
> AIC(ModelA, ModelB, ModelC, ModelD, ModelE)
       df      AIC
ModelA  3 702.1288
ModelB  4 677.3121
ModelC  4 591.7996
ModelD  4 607.3873
ModelE  4 604.7950
> anova(ModelA, ModelC)
       Model df      AIC      BIC    logLik   Test  L.Ratio p-value
ModelA     1  3 702.1288 713.0652 -348.0644
ModelC     2  4 591.7996 606.3814 -291.8998 1 vs 2 112.3293  <.0001
→ The exponential GLS model (ModelC) has the lowest AIC, so it seems to fit best
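
Since SAC_data is not available here, the same workflow can be tried on simulated data. The sketch below is not the seminar's analysis. The grid layout, effect size, and range are hypothetical, and only the column names LN_COUNT, BURNED, x, and y are borrowed from the slides.

```r
library(nlme)

set.seed(123)
# Hypothetical 10 x 10 grid of cells; left half "burned", right half not
dat <- expand.grid(x = 1:10, y = 1:10)
dat$BURNED <- factor(ifelse(dat$x <= 5, "yes", "no"))

# Simulate errors with an exponential correlation structure (range = 3 cells)
D <- as.matrix(dist(dat[, c("x", "y")]))
L <- chol(exp(-D / 3))   # upper-triangular factor, so t(L) %*% L equals the correlation matrix
dat$LN_COUNT <- 2 + 0.5 * (dat$BURNED == "yes") +
  as.numeric(t(L) %*% rnorm(nrow(dat)))

fit_indep <- gls(LN_COUNT ~ BURNED, data = dat)
fit_exp   <- gls(LN_COUNT ~ BURNED, data = dat,
                 correlation = corExp(form = ~ x + y))

print(AIC(fit_indep, fit_exp))
# Estimated range of the exponential correlation function
print(coef(fit_exp$modelStruct$corStruct, unconstrained = FALSE))
# Coefficient tables: compare the standard error of the BURNED effect
print(summary(fit_indep)$tTable)
print(summary(fit_exp)$tTable)
```

Comparing the two coefficient tables shows the practical consequence of the model choice: the spatial model generally reports a different, usually larger, standard error for the BURNED effect than the independence model.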

Other Autocorrelation Models
• Conditional autoregressive (CAR), simultaneous autoregressive (SAR), and moving average (MA) models
• See pp. 229-233 of Fortin & Dale (2005)
• Implemented in R package spdep, as well as SAM (Spatial Analysis for Macroecology) software
• Generalized linear mixed models (GLMMs): R built-in packages MASS, nlme
• But wait, there’s more: see the review paper by Dormann et al. (2007), Ecography 30: 609-628.

Models and Reality
• “Much of the treatment of spatial autocorrelation in the statistical literature is predicated on the simplest AR model, which produces an exponential decline in autocorrelation as a function of distance (Figure 5.16).” (Fortin & Dale 2005, pp. 247-248)
• BUT, simple corrections based on first-order AR don’t account for the effects of potentially negative autocorrelation at greater distances

Restricted Randomization
• PROBLEM: randomization tests based on complete spatial randomness will destroy autocorrelation structure
• POTENTIAL SOLUTIONS:
  • “Toroidal shift” randomization (Figure 5.12)
  • Contiguity-constrained permutations (see Legendre et al. 1990 for algorithms)
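
A minimal base-R sketch of the toroidal-shift idea for two variables measured on the same rectangular grid. One grid is slid by a random offset with wrap-around at the edges, so its internal spatial structure is largely kept while its alignment with the other grid is broken. The function names, the correlation test statistic, and the simulated data are assumptions for illustration.

```r
# Toroidal shift: slide a gridded variable by (dr, dc) cells, wrapping at the edges
toroidal_shift <- function(m, dr, dc) {
  rows <- ((seq_len(nrow(m)) - 1 - dr) %% nrow(m)) + 1
  cols <- ((seq_len(ncol(m)) - 1 - dc) %% ncol(m)) + 1
  m[rows, cols, drop = FALSE]
}

# Restricted randomization test of the correlation between two grids
toroidal_test <- function(a, b, n_perm = 999) {
  obs <- cor(as.vector(a), as.vector(b))
  null <- replicate(n_perm, {
    dr <- sample(0:(nrow(b) - 1), 1)   # random row offset
    dc <- sample(0:(ncol(b) - 1), 1)   # random column offset
    cor(as.vector(a), as.vector(toroidal_shift(b, dr, dc)))
  })
  # Two-sided p-value; the observed value counts as one permutation
  p <- (sum(abs(null) >= abs(obs)) + 1) / (n_perm + 1)
  list(observed = obs, p_value = p)
}

# Hypothetical example: b is a noisy copy of a
set.seed(99)
a <- matrix(rnorm(100), nrow = 10)
b <- a + matrix(rnorm(100), nrow = 10)
res <- toroidal_test(a, b)
print(res)
```

A grid with nr rows and nc columns has only nr × nc distinct shifts, so on small grids the reference distribution is coarse and the smallest attainable p-value is limited accordingly.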

Conclusion
• Incorporating spatial structure into ecological models was identified by Legendre as a “new paradigm” in 1993, BUT…
• …ecologists are still refining their methods for dealing with spatial autocorrelation
• OUR LAST HOPE?: Dale, M.R.T. and M.-J. Fortin. (in press). Spatial Autocorrelation and Statistical Tests: Some Solutions. Journal of Agricultural, Biological, and Environmental Statistics.

Spatial autocorrelation, don’t make me open this…
