Deana D. Pennington, PhD University of New Mexico. Spatial Modeling and Analysis. What is spatial analysis?. Analyses where the data are spatially located and explicit consideration is given to the possible importance of their spatial arrangement in the analysis. Statistical Issues.
Deana D. Pennington, PhD University of New Mexico Spatial Modeling and Analysis
What is spatial analysis? Analyses where the data are spatially located and explicit consideration is given to the possible importance of their spatial arrangement in the analysis
Statistical Issues Valid statistics depend on: • Temporal stability and causal transience • Unit homogeneity • Independence • Constant effects BUT Ecology & Earth Science violate all of these! We study: • Change with time (no temporal stability) • Legacies, persistence, recovery (no causal transience) • Heterogeneity through space and time (no unit homogeneity) • Spatial structure (no independence) • Differences in response through space/time (non-constant effects) • Attributes rather than causal factors, which must be inferred
Issues in Spatial Analysis • Error • Small sample sizes compared with size of environmental data sets • Spatial dependency • Spatial heterogeneity • Boundary effects • Modifiable Areal Unit Problem
Spatial Dependency Tobler's Law: All things are related, but nearby things are more related than distant things. ***Field samples tend to be taken from nearby locations, and are almost always spatially autocorrelated*** Non-independent observations partly duplicate other observations in the sample set, so they carry less information than independent observations. This affects the mean, variance, confidence intervals and significance tests.
Spatial Heterogeneity (heterogeneity in spatial data) • Stratification of the landscape (regions, classes, etc.) is problematic due to the gradational nature of the landscape • Intra-strata variability, mixtures • Differences in numbers of observations within strata
Hyperspectral Example True color and false color images, 6 km², 300 x 300 pixels. Training pixels by class: Clouds 23, Roads 33, River 23, Barren 22, Riparian 28, Agriculture 38, Arid upland 25 (192 training pixels out of 90,000 total pixels, 7 mislabeled) • Low % samples • Errors in samples
Hyperspectral Results K-means unsupervised classification, 10 classes (map labels: River/agriculture, Riparian, Clouds/barren, Arid upland, Semi-arid upland) • Confusion between river & agriculture • Confusion between clouds and barren • Unsampled semi-arid upland • Mislabeled arid upland • Unsampled variability in riparian • Road variability
Hyperspectral Results: classifier comparison against the K-means unsupervised map (map legend: Unclassified, Clouds, River, Riparian, Arid upland, Roads, Barren, Agriculture) • Confusion between river & agriculture • Confusion between clouds and barren • Unsampled semi-arid upland • Mislabeled arid upland (4.4%) • Unsampled variability in riparian • Road variability. Classification accuracy: Maximum Likelihood 89.44% • Naïve Bayesian 83.33% • Parallelepiped 82.78% • Support Vector Machine 77.22% • Minimum Distance 69.44%
Boundary Effects • Loss of neighbors in analyses that depend on neighborhood values • Solution: collect data along a border outside of the analysis area
Modifiable Areal Unit Problem (MAUP) • Results sensitive to cell size, location, orientation
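The sensitivity to cell size can be illustrated with a small sketch. It is not from the original slides: the point coordinates, the 4 x 4 study area and the helper names `quadrat_counts` and `variance_to_mean` are all made up for illustration. The same 16 points look perfectly uniform with 2 x 2 cells (variance-to-mean ratio of 0) and clustered with 1 x 1 cells (ratio well above 1).

```python
def quadrat_counts(points, extent, cell_size):
    """Count points per square cell on a square study area of side `extent`."""
    n = int(round(extent / cell_size))
    counts = [0] * (n * n)
    for x, y in points:
        col = min(int(x / cell_size), n - 1)  # clamp points on the far edge
        row = min(int(y / cell_size), n - 1)
        counts[row * n + col] += 1
    return counts


def variance_to_mean(counts):
    """Sample variance of the cell counts divided by their mean."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


# Hypothetical data: four tight clusters of four points each.
centers = [(0.5, 0.5), (2.5, 0.5), (0.5, 2.5), (2.5, 2.5)]
offsets = [(-0.1, -0.1), (-0.1, 0.1), (0.1, -0.1), (0.1, 0.1)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

vmr_coarse = variance_to_mean(quadrat_counts(points, 4.0, 2.0))  # 2 x 2 cells
vmr_fine = variance_to_mean(quadrat_counts(points, 4.0, 1.0))    # 1 x 1 cells
print(vmr_coarse, vmr_fine)
```

Shifting or rotating the grid would change the counts in the same way, which is why the choice of areal unit should be reported with any result.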
Components of Spatial Analysis • Exploratory Spatial Data Analysis (ESDA): finding interesting patterns • Visualization: showing interesting patterns • Spatial Modeling: explaining interesting patterns
Spatial Analyses Things to consider: • Objective: describe, map, causation • Data type: binary (Y/N), categorical, continuous • Expected pattern: gradient, periodic, clustered • Scale of pattern • Univariate/multivariate
Spatial Analyses Biological survey where each point denotes the observation of an endangered species. If a pattern exists, like this diagram, we may be able to analyze behavior in terms of environmental characteristics • Quantify pattern • Attraction or repulsion • Directionality • Make inferences about process based on observed pattern
Choices • Make maps from points: distance interpolation, kriging, trend surface analysis, spline • Network analysis: path analysis, allocation, connectivity • Test models with space as causal factor: Mantel test, Mantel correlogram, multivariate analysis • Describe spatial structure, point pattern analyses: single scale of pattern (quadrat analysis, nearest neighbor); multiscale pattern (refined nearest neighbor, 2nd order analysis, Ripley's K) • Describe spatial structure, context: adjacency measures, cross variogram, cross correlogram • Describe spatial structure, gradient or periodic: single scale of pattern (semivariogram, correlogram); multiscale pattern (spectral analysis) • Self-similarity: fractal dimension • Edge: wavelet analysis
Point Pattern Analysis (figure panels: Uniform (repulsion); Clustered (attraction))
Point Pattern Analysis Statistical tests for significant patterns in data, compared with the null hypothesis of a random spatial pattern. The standard against which spatial point patterns are compared is a Completely Spatially Random (CSR) point process: a Poisson probability distribution (mean = variance) is used to generate spatially random points.
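Quadrat Analysis • Divide the area up into quadrats • Count the number of points in each quadrat • Compare counts with expected counts in a random distribution
Expected mean number of points per cell in CSR: $\lambda = N / (\text{number of quadrats})$. For a Poisson distribution: $p(x) = e^{-\lambda}\lambda^{x}/x!$. Chi square: $\chi^2 = \sum (\text{observed} - \text{expected})^2 / \text{expected}$
Points per cell (x), observed cells $O_i$, probability $P(x)$, expected cells $E_i$:
x = 0: $O_i$ = 2, $P(x)$ = 0.0156, $E_i$ = 0.39
x = 1: $O_i$ = 2, $P(x)$ = 0.0649, $E_i$ = 1.62
x = 2: $O_i$ = 5, $P(x)$ = 0.1350, $E_i$ = 3.38
x = 0 to 2 pooled: $E_i$ = 5.39, $\chi^2$ term = 2.42
x = 3: $O_i$ = 1, $P(x)$ = 0.1873, $E_i$ = 4.68
…
Sum the $\chi^2$ terms and check a chi square table. If Ho rejected, mean <> variance: mean > variance (uniform), mean < variance (clustered)
(Figure: histogram of number of cells against number of points per cell, showing clustered, expected CSR (the null hypothesis) and uniform distributions.)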
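The Poisson and chi-square steps can be sketched in a few lines. This is an illustrative sketch, not the author's worksheet. The values λ = 4.16 and 25 quadrats are assumptions inferred from the probabilities and expected counts shown on the slide (P(0) = 0.0156, and each expected count is 25 times its probability). Only the four rows visible on the slide are used, so the full χ² sum is not computed.

```python
import math


def poisson_pmf(x, lam):
    """Probability of exactly x points in a quadrat under CSR."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def chi_square_term(observed, expected):
    """One (observed - expected)^2 / expected contribution."""
    return (observed - expected) ** 2 / expected


lam = 4.16        # assumed: inferred from P(0) = 0.0156
n_quadrats = 25   # assumed: inferred from expected count / P(x)
observed = {0: 2, 1: 2, 2: 5, 3: 1}  # observed cell counts from the slide

expected = {x: n_quadrats * poisson_pmf(x, lam) for x in observed}

# Classes 0-2 are pooled so that the pooled expected count exceeds 5.
pooled_observed = sum(observed[x] for x in (0, 1, 2))
pooled_expected = sum(expected[x] for x in (0, 1, 2))
pooled_term = chi_square_term(pooled_observed, pooled_expected)

print(round(pooled_expected, 2), round(pooled_term, 2))
```

Pooling sparse classes is a common way to keep expected counts large enough for the chi-square approximation; the threshold of 5 used here is a convention, not something stated on the slide.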
Nearest Neighbor Distance • Calculate the distance to the nearest neighbor for every point • Calculate the mean nearest neighbor distance $\bar{d}$ • Calculate the expected mean for a CSR distribution: $E(d_i) = 0.5\sqrt{A/N}$ • Compare expected mean to observed mean with the Z statistic: $Z = [\bar{d} - E(d_i)] / \sqrt{0.0683\,A/N^2}$ • Look up significance in a z-statistic table. If Ho rejected: observed mean < expected and Z < 0 => clustered; observed mean > expected and Z > 0 => uniform
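A minimal sketch of the nearest neighbor test follows. It uses the usual square-root forms, expected mean 0.5·sqrt(A/N) and standard error sqrt(0.0683·A/N²), and applies no edge correction. The two point sets and the function name `nearest_neighbor_z` are made up for illustration.

```python
import math


def nearest_neighbor_z(points, area):
    """Return (observed mean NN distance, expected mean under CSR, Z)."""
    n = len(points)
    nn_distances = []
    for i, (xi, yi) in enumerate(points):
        nn_distances.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nn_distances) / n
    expected = 0.5 * math.sqrt(area / n)
    std_error = math.sqrt(0.0683 * area / n ** 2)
    return observed, expected, (observed - expected) / std_error


# Hypothetical patterns on a 5 x 5 study area (A = 25, N = 25).
uniform = [(col + 0.5, row + 0.5) for row in range(5) for col in range(5)]
clustered = [(2.0 + 0.01 * col, 2.0 + 0.01 * row)
             for row in range(5) for col in range(5)]

obs_u, exp_u, z_uniform = nearest_neighbor_z(uniform, 25.0)
obs_c, exp_c, z_clustered = nearest_neighbor_z(clustered, 25.0)
print(z_uniform, z_clustered)  # positive for uniform, negative for clustered
```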
Ripley's K • Expand a circle of increasing radius around each point • Count the number of points within each circle • Calculate L(d), a measure of the expected number of points within distance d: $L(d) = [A \sum k_{ij} / (\pi N(N-1))]^{0.5}$, where A = area and $\sum k_{ij}$ = number of points j within distance d of all i points • Test with Monte Carlo simulations or t-test (Figure: L(d) against radius, with uniform and clustered curves compared with the expected CSR mean.) ***Note added information: mean clustering distance
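The L(d) formula can be sketched directly. This version has no edge correction and no Monte Carlo envelope, both of which a real analysis would need; the point sets and the name `ripley_l` are hypothetical. Under CSR, L(d) is expected to be close to d, so values above d suggest clustering at that radius and values below d suggest uniformity.

```python
import math


def ripley_l(points, area, d):
    """L(d) = sqrt(A * sum(k_ij) / (pi * N * (N - 1))), no edge correction."""
    n = len(points)
    pairs_within_d = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= d:
                pairs_within_d += 1
    return math.sqrt(area * pairs_within_d / (math.pi * n * (n - 1)))


# Hypothetical patterns on a 5 x 5 study area.
uniform = [(col + 0.5, row + 0.5) for row in range(5) for col in range(5)]
clustered = [(2.0 + 0.01 * col, 2.0 + 0.01 * row)
             for row in range(5) for col in range(5)]

radius = 0.5
l_uniform = ripley_l(uniform, 25.0, radius)      # no neighbors within 0.5
l_clustered = ripley_l(clustered, 25.0, radius)  # every pair within 0.5
print(l_uniform, l_clustered)
```

Evaluating `ripley_l` over a range of radii gives the curve from which a clustering distance can be read.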
Analysis of Continuous Data • Variation in mean values • Describe local variability & spatial dependence
Mean trends (diagram of input and output for each analysis type) • Focal (surface analysis) • Zonal: output grid or table • Global: single value
Grid Analysis: Focal Analysis Spatial filters: output value for each cell is calculated from neighboring cells (moving windows) • Neighborhood shapes (figure) • Statistics: Majority, Maximum, Mean, Median, Minimum, Range, Standard deviation, Sum, Variety • Low pass: smoothing, removing noise • High pass: emphasize local variation • Edge enhancement. Example (figure): Species A habitat and Species B habitat; range of Species A = 4 cells; Species A depends on B
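A moving-window (focal) mean can be sketched as follows. It is a low pass filter on a grid stored as a list of rows. The square neighborhood, the `radius` parameter and the sample grid are assumptions for illustration. Cells on the border simply use whichever neighbors exist, which is one way the loss of neighbors at boundaries shows up in practice.

```python
def focal_mean(grid, radius=1):
    """Mean of each cell's square neighborhood (window side = 2 * radius + 1)."""
    rows, cols = len(grid), len(grid[0])
    output = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            output[r][c] = sum(window) / len(window)
    return output


# Hypothetical grid with a single noisy spike in the center.
grid = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
smoothed = focal_mean(grid)
print(smoothed[1][1], smoothed[0][0])  # center uses 9 cells, corner only 4
```

Swapping `sum(window) / len(window)` for `max`, `min` or a majority count gives the other focal statistics listed on the slide.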
Grid Analysis: Zonal Analysis • Geometry: Area, Centroid, Perimeter, Thickness • Statistics: Majority, Maximum, Mean, Median, Minimum, Range, Standard deviation, Sum, Variety • Example zones: vegetation class A or land use A; vegetation class B or land use B; vegetation class C or land use C • Output is: a) grid with same value in each cell for a given zone b) table with values by zone
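Both zonal outputs, the table by zone and the grid with one value per zone, can be sketched for the zonal mean. The zone labels, cell values and the name `zonal_mean` are made up for illustration.

```python
def zonal_mean(zones, values):
    """Return (table of mean value by zone, grid with the zone mean per cell)."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    table = {zone: totals[zone] / counts[zone] for zone in totals}
    grid = [[table[zone] for zone in zone_row] for zone_row in zones]
    return table, grid


# Hypothetical zone grid (e.g. vegetation classes) and value grid.
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
]
values = [
    [1, 2, 10],
    [3, 20, 30],
]
zone_table, zone_grid = zonal_mean(zones, values)
print(zone_table)
print(zone_grid)
```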
Geostatistics Basics • Parametric stats, univariate (x): mean, variance • Parametric stats, multivariate (x, y): correlation, covariance • Spatial stats, univariate (x, h): semi-variance (variogram); lag correlation and lag covariance (correlogram) • Spatial stats, multivariate (x, y, h): cross-semivariance (variogram); cross correlation and cross covariance (correlogram) • h = lag (time or space) • The variogram and correlogram are inverse
Local mean w.r.t. study extent Variance: $s^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$ Semi-variance: $\gamma_h = \frac{1}{2N_h}\sum_{i=1}^{N_h}(x_i - x_{i+h})^2$ (figure: $x_i$ and $x_{i+h}$ separated by lag h) • Slide $x_i$ through space to get $\gamma_h$ • Vary h
Local mean Semi-variance: $\gamma_h = \frac{1}{2N_h}\sum_{i=1}^{N_h}(x_i - x_{i+h})^2$ Number of cells N = 10; number of windows $N_h$ = # cells - h, so h = 1 gives $N_h$ = 9 and h = 5 gives $N_h$ = 5 • Limit h to 1/3 of study extent
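The semi-variance calculation on a 1-D transect can be sketched as below, with N_h = N - h windows and lags limited to one third of the transect length. The transect values are made up (a steady gradient), and `semivariance` and `semivariogram` are illustrative names.

```python
def semivariance(x, h):
    """gamma_h = sum((x_i - x_{i+h})^2) / (2 * N_h), with N_h = len(x) - h."""
    n_h = len(x) - h
    return sum((x[i] - x[i + h]) ** 2 for i in range(n_h)) / (2 * n_h)


def semivariogram(x, max_lag=None):
    """Semi-variance for lags 1..max_lag (default: one third of the extent)."""
    if max_lag is None:
        max_lag = len(x) // 3
    return {h: semivariance(x, h) for h in range(1, max_lag + 1)}


# Hypothetical transect of N = 10 cells with a constant gradient.
transect = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
gamma = semivariogram(transect)
print(gamma)  # keeps rising with h: a gradient has no sill or range
```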
Semi-variogram Semi-variance: $\gamma_h = \frac{1}{2N_h}\sum_{i=1}^{N_h}(x_i - x_{i+h})^2$ (Figure: $\gamma_h$ against h, rising from the nugget at h = 0 through the zone of spatial dependence to the sill, reached at the range, beyond which values are independent.) If $x_i$ is similar to $x_{i+h}$, $\gamma_h$ is small, and they are spatially correlated. If $x_i$ is not similar to $x_{i+h}$, $\gamma_h$ is large, and they are not spatially correlated => $\gamma_h$ measures heterogeneity • Nugget: value of $\gamma_h$ at distance 0 (not in data); measure of unexplained variability • Range: distance h at which $\gamma_h$ levels off; below the range heterogeneity is increasing in a predictable manner, above the range heterogeneity is constant; measure of independence • Sill: measure of maximum heterogeneity in data ($\gamma_{max}$)
Semi-variograms (two plots of $\gamma_h$ against h) • Periodic, cyclic • Gradient: no sill or range. Examples: timber harvest, forest age; range ~ harvest area, sill ~ rotation
Lag Covariance: Geary's C Centered around mean values of x, $\bar{x}$ Lag covariance: $C_h = \frac{1}{N_h}\sum_{i=1}^{N_h}(x_i - x_{i-h})(x_i - x_{i+h})$ (figure: $x_{i-h}$, $x_i$ and $x_{i+h}$ around the local mean) • If $x_i$, $x_{i+h}$ and $x_{i-h}$ are all the same, $C_h = 0$ • If values are increasing or decreasing through space ($x_{i-h} < x_i < x_{i+h}$, or $x_{i-h} > x_i > x_{i+h}$), one term is negative and $C_h$ is negative: things are not similar. Otherwise $C_h$ is positive: things are similar • Correlograms have the inverse shape of semi-variograms
Lag Correlation: Moran's I Centered around mean values of x, $\bar{x}$; standardized against sample variation Lag covariance: $C_h = \frac{1}{N_h}\sum_{i=1}^{N_h}(x_i - x_{i-h})(x_i - x_{i+h})$ Lag correlation: $P_h = \frac{C_h}{S_{x-h}\,S_{x+h}}$
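Comparison (plots of $\gamma_h$, $C_h$ and $P_h$ against lag h: values are correlated below the range and independent above it) • Semi-variance $\gamma_h$: $0 < \gamma_h < \infty$ • Lag covariance (Geary's C) $C_h$: $-\infty < C_h < \infty$ • Lag correlation (Moran's I) $P_h$: $-1 < P_h < +1$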
Surface Analysis • Spatial distribution of surface information in terms of a three-dimensional structure • Surfaces do not have to be elevation, but could be population density, species richness, or any other measured attribute
Surface Analysis Given geolocated point data, calculate values at regular intervals between points • Inverse distance weighting: can't create extremes (ridges, valleys); isotropic influence (not ridge preserving); best with dense samples • Kriging: uses the semi-variogram to determine relative importance (weighting) of data at different distances; uses global variation, so it only works well if the semi-variogram captures variation across the entire map • Trend analysis: calculates a best-fit polynomial equation using linear regression; recalculates all positions using the equation (lose original data); smoothing depends on polynomial order • Spline: calculates a 2-D minimum curvature surface that passes through every input point
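Of the interpolators above, inverse distance weighting is the simplest to sketch. The sample points, the `power` parameter (2 is a common default) and the name `idw` are assumptions for illustration. Each estimate is a weighted average of the samples, so it always lies between the minimum and maximum sample values, which is why IDW cannot create new ridges or valleys.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples: iterable of (x, y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a sample point: return its value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical samples: (x, y, measured value).
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 2.0, 40.0)]
midpoint = idw(samples[:2], 1.0, 0.0)   # halfway between the first two samples
estimate = idw(samples, 0.5, 3.0)       # outside the sampled area
print(midpoint, estimate)
```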
Network Analysis • Designed specifically for line features organized in connected networks, typically applies to transportation problems and location analysis • Streams • Dispersal vectors • Community interactions
Network Analysis • Pathfinding: shortest or least cost • Allocation of network areas to a center based on supply, demand and impedance • Connectivity
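Least-cost pathfinding can be sketched with Dijkstra's algorithm, named here as one standard choice rather than the method any particular GIS package uses. The network, its node names and its impedance values are hypothetical.

```python
import heapq


def least_cost_path(network, start, goal):
    """Dijkstra's algorithm on a dict of {node: {neighbor: impedance}}.

    Returns (total cost, list of nodes), or (inf, []) if no path exists.
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, impedance in network.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(
                    queue, (cost + impedance, neighbor, path + [neighbor])
                )
    return float("inf"), []


# Hypothetical stream or road network with impedance on each link.
network = {
    "A": {"B": 4, "C": 1},
    "B": {"D": 1},
    "C": {"B": 2, "D": 7},
    "D": {},
}
cost, path = least_cost_path(network, "A", "D")
print(cost, path)
```

A finite cost from one node to another also answers the connectivity question for that pair.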
Integrated Analysis (workflow diagram linking: gauge points, field data (vector), DEM, watershed, hydro model, samples, grid process, land cover, statistics, soil, and modeling (regression, et al.))
Sampling • Spatial dependency must be considered in sample design • Non-independent observations: fewer degrees of freedom; differences within groups will appear small => overestimate significance of between-group variation • Spatial structure & heterogeneity can affect experimental results: is the response due to treatments or due to inherent spatial structure? • Solutions: include space as an explanatory variable (Mantel test); sample at greater distance than the variogram range
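The Mantel test mentioned above can be sketched as a permutation test on the correlation between two distance matrices. This is a simplified illustration, not a full implementation. The site positions and measured values are made up, and the helper names, the default of 999 permutations and the seed are assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(dist_a)))
    a = upper_triangle(dist_a, identity)
    r_observed = pearson(a, upper_triangle(dist_b, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)
        if pearson(a, upper_triangle(dist_b, order)) >= r_observed:
            hits += 1
    return r_observed, hits / (permutations + 1)


# Hypothetical transect: site positions and a measured response.
positions = [0, 1, 2, 3, 4, 5]
response = [1.0, 1.2, 2.1, 2.9, 4.2, 5.0]
space = [[abs(p - q) for q in positions] for p in positions]
difference = [[abs(u - v) for v in response] for u in response]

r, p_value = mantel(space, difference)
print(r, p_value)  # high r, small p: the response is spatially structured
```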
Example: Integrating Species Occurrence Points and Images • Point inputs: Excel file (Sample 1, lat, long, species, presence); Access file (Sample 3, lat, long, species, absence); Sample 2, lat, long, species, presence • Image layers: vegetation cover type, elevation (m), mean annual temperature (C) • Steps: semantics; compatible scales (reproject, resample grain, clip extent); sample occurrence points • Integrated data: P, juniper, 2200m, 16C; P, pinyon, 2320m, 14C; A, creosote, 1535m, 22C
ENM Results • Geographic patterns of species richness of 17 native rodent species (Sanchez-Cordero and Martinez-Meyer, 2000) • Model building and testing: a) training data; b) predictive model (Peterson, Ball and Cohoon, 2002)