Spatial autoregressive methods

Spatial autoregressive methods Nr245 Austin Troy Based on Spatial Analysis by Fortin and Dale, Chapter 5

Autcorrelation types • None: independence • Spatial independence, functional dependence • True autocorrelation>> inherent autoregressive • Functional autocorr>> induced autoregressive

Autocorrelation types • Double autoregressive • Notice there are now two autocorrelation parameters r-x and r-z

Effects? • Standard test statistics become “too liberal”—more significant results than the data justify • Because observations are not totally independent have lower actual degrees of freedom, or lower “effective sample size”: n’ instead of n; since t stat denominator = s/n, if n is too big it inflates the t statistic Above: simulations yield 4x the type 1 errors for inherent AR than expected, induced AR model yields 2 x and double AR model is 8x

What to do? Non-effective • Why not just adjust up the significance level? E.g. 99% instead of 95%? Because don’t how by how much to adjust without further information. Could end up with a test that is way too conservative • Why not just adjust sampling to only include “independent samples?” Because wasteful of data and because easy to mistake “critical distance to independence”

Best approach: Adjust effective sample size • In presence of SA, variance of mean of obs can be adjusted sing covariances of Xs Cov(Xi, Xj) becomes • For large sample sizes • So for instance n=1000 and ro=.4 means n’=429 • Problem is that, to be useful, autoregressive model (ro parameter) has to be an effective descriptor of the structure of autocorrelation of the data, but it’s a simplification • Next step therefore is factoring in correlation matrix, R, based on lag distances r(d)

Moving average models • At 1st order we get a matrix like: • Half of info for Xi contained in Xi+1 • Half contained in Xi-1 • Hence only every other ob. Needed • So produce ro=.5 for large n and n’=n/2. n’=n/2 • A k order model can take form • Translates into generalized matrix form • With variance covariance matrix

Moving average • When you increase the order, calculating sample size gets complicated; e.g. second order model, where two ro parameters now • Important point: If there are several different levels of autocorrelation (rk), each rk must be incorporated even if non-significant • Using only significant values can understate n’ • Fortin and Dale recommend not using moving average approach because very sensitive to irregularities in the data and can produce a wide range of estimates

Two dimensional approaches • Problem with MA approach as it was just presented is assumes one-dimensionality • In 2-d spatial data, xi depends on all neighbors most likely • Now must define what is “neighbor” in 2d (e.g. w=1/8 for 9 cell grid of neighbors, all else = 0) • Two best ways for dealing with this: • Simultaneous autoregressive models (SAR) • Conditional autoregressive models (CAR) • CAR’s neighborhood matrices specify relationship between lagged response values at each location and neighboring location • SAR’s specify relationship between lagged residuals • Both use nxn spatial weights matrix (W) composed of wij • Can be based on adjacency, number neighbors or distance • Zeros on diagonals, weights on off diagonals • In both SAR and CAR, SA tends to persist across long distances

CAR • More commonly used in spatial statistics • Not based on spatial dependence per se; instead probability of a certain value is conditional on neighbor values • Here Where j is the autocorrelation parameter and V is a symmetrical weight matrix • Symmetrical requirement means that directional processes can’t be modeled.

SAR • Based on concept of set of simultaneous equations to be solved. In this xi and xi-1 are each defined by their own equations containing other xs • Where x is a vector and is linearly dependent on a vector of underlying variables z1,z2z3…. Given as matrix Z, u is a vector non-independent error terms with mean zero and var-covar matrix C • Spatial autocorrelation enters via u where • Here e is independent error term and W is neighbor weights standardized to row totals of 1. W is not necessarily symmetrical, allowing for inclusion of anisotropy.Wijis >0 if values at location i is not independent of value at location j

SAR • This yields the model • With variance covariance matrix (from u) • Note how similar to MA—difference is no inverse in formula • The elements of C are variances From Fortin and Dale p. 231

SAR • Advantages: doesn’t require weight matrix to be symmetrical, so can model anisotropic phenomena. • SAR can take three forms • Lagged response model: autoregressive process only occurs in the response variable • Lagged mixed model, where SA affects both response and predictors • Spatial error model: assumes SA process occurs only in error term and not in response or predictor

Spatial autoregressive methods