Chapter 5 Part A: Spatial data exploration www.spatialanalysisonline.com
Spatial data exploration • Spatial analysis and data models (Anselin, 2002)
Spatial data exploration • Sampling frameworks • Pure random sampling • Stratified random – by class/strata (proportionate, disproportionate) • Randomised within defined grids • Uniform • Uniform with randomised offsets • Sampling and declustering
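The sampling frameworks listed above can be sketched in a few lines. This is a minimal illustration, assuming a rectangular study area; the function names (`pure_random`, `grid_sample`, `proportionate_allocation`) and the `max_offset` parameter are hypothetical and do not come from the slides.

```python
import random

def pure_random(n, xmin, ymin, xmax, ymax, rng):
    """Pure random sampling: n points drawn uniformly over the rectangle."""
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]

def grid_sample(nx, ny, xmin, ymin, xmax, ymax, max_offset=0.0, rng=None):
    """One sample point per cell of an nx-by-ny grid.

    max_offset = 0.0 -> uniform sample (cell centres)
    0 < max_offset   -> uniform with randomised offsets (fraction of cell size)
    max_offset = 0.5 -> randomised within each grid cell
    """
    rng = rng or random.Random()
    dx, dy = (xmax - xmin) / nx, (ymax - ymin) / ny
    pts = []
    for i in range(nx):
        for j in range(ny):
            ox = rng.uniform(-max_offset, max_offset)
            oy = rng.uniform(-max_offset, max_offset)
            pts.append((xmin + (i + 0.5 + ox) * dx, ymin + (j + 0.5 + oy) * dy))
    return pts

def proportionate_allocation(n, strata_sizes):
    """Split n samples across strata in proportion to stratum size
    (largest-remainder rounding, so the counts always sum to n)."""
    total = sum(strata_sizes.values())
    quotas = {k: n * v / total for k, v in strata_sizes.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftover = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc
```

A disproportionate stratified design would simply replace the computed allocation with counts chosen by the analyst, for example to over-sample a rare class.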
Spatial data exploration • Sampling frameworks – point sampling
Spatial data exploration • Sampling frameworks – within zones • Grid generation (hexagonal): selection of 1 point per cell, random offset from centre • Grid generation: square grid within field boundaries • Selection of 5 random points per zone
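Selecting a fixed number of random points per zone can be sketched with rejection sampling: draw candidates in the zone's bounding box and keep those that fall inside the polygon. This is one possible approach, not necessarily the one used to produce the slide; `point_in_polygon` and `random_points_in_zone` are hypothetical names, and the zone is assumed to be a simple polygon with non-zero area.

```python
import random

def point_in_polygon(x, y, poly):
    """Ray-casting test: poly is a list of (x, y) vertices in order."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def random_points_in_zone(poly, k, rng):
    """Return k uniform random points inside the polygon (rejection sampling)."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    pts = []
    while len(pts) < k:
        x = rng.uniform(min(xs), max(xs))
        y = rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, poly):
            pts.append((x, y))
    return pts
```

For example, `random_points_in_zone(zone, 5, random.Random(42))` returns 5 points for a single zone; repeating the call per zone gives the "5 random points per zone" design.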
Spatial data exploration • Random points on a network
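One way to generate random points on a network is to pick an edge with probability proportional to its length and then place the point uniformly along that edge. The sketch below assumes the network is given as straight-line segments of positive length; `random_points_on_network` is a hypothetical name.

```python
import bisect
import math
import random

def random_points_on_network(edges, n, rng):
    """edges: list of ((x1, y1), (x2, y2)) straight segments with length > 0.

    Points are uniform with respect to length along the whole network.
    """
    lengths = [math.dist(a, b) for a, b in edges]
    cumulative = []
    total = 0.0
    for length in lengths:
        total += length
        cumulative.append(total)
    pts = []
    for _ in range(n):
        u = rng.uniform(0.0, total)              # position along the total length
        i = min(bisect.bisect_left(cumulative, u), len(edges) - 1)
        t = (u - (cumulative[i] - lengths[i])) / lengths[i]   # fraction along edge i
        (x1, y1), (x2, y2) = edges[i]
        pts.append((x1 + t * (x2 - x1), y1 + t * (y2 - y1)))
    return pts
```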
Spatial data exploration • EDA, ESDA and ESTDA • EDA – basic aims (after NIST) • maximize insight into a data set • uncover underlying structure • extract important variables • detect outliers and anomalies • test underlying assumptions • develop parsimonious models • determine optimal factor settings
Spatial data exploration • ESDA (see GeoDa and STARS) • Extending EDA ideas to the spatial domain (lattice/zone models) • Brushing • Linking • Mapped histograms • Outlier mapping • Box plots • Conditional choropleth plots • Rate mapping
Spatial data exploration • ESDA: Brushing & linking
Spatial data exploration • ESDA: Histogram linkage
Spatial data exploration • ESDA: Parallel coordinate plot & star plot
Spatial data exploration • ESDA: Mapped box plots
Spatial data exploration • ESDA: Conditional choropleth mapping
Spatial data exploration • ESDA: Mapped point data
Spatial data exploration • ESDA: Trend analysis (continuous spatial data)
Spatial data exploration • ESDA: Cluster hunting – GAM/K (steps) • Step 1: Read data for the population at risk • Step 2: Identify the MBR (minimum bounding rectangle) containing the data, identify starting circle radius, and degree of overlap • Step 3: Generate a grid covering the MBR • Step 4: For each grid-intersection generate a circle of radius r • Step 5: Retrieve two counts for the population at risk and the variable of interest • Step 6: Apply some “significance” test procedure • Step 7: Keep the result if significant • Step 8: Repeat Steps 5 to 7 until all circles have been processed • Step 9: Increase circle radius by dr and return to Step 3, else go to Step 10 • Step 10: Create a smoothed density surface of excess incidence for the significant circles • Step 11: Map this surface and inspect the results
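The circle-scanning part of this procedure can be sketched as follows. This is an illustrative simplification, not the GAM/K implementation: the "significance" test is assumed here to be a one-sided Poisson test against the overall rate, the grid spacing is assumed to be a fraction of the radius (`step_frac`, which controls the overlap), and the final smoothing and mapping of the significant circles are omitted. The names `gam_scan`, `poisson_upper_tail`, `step_frac` and `alpha` are hypothetical.

```python
import math

def poisson_upper_tail(c, mu):
    """P(X >= c) for X ~ Poisson(mu), with c a non-negative integer."""
    if c <= 0:
        return 1.0
    term = math.exp(-mu)          # P(X = 0)
    cdf = term
    for k in range(1, c):
        term *= mu / k            # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def gam_scan(records, r0, dr, r_max, step_frac=0.5, alpha=0.01):
    """records: list of (x, y, population_at_risk, cases), cases as integers.

    Returns the significant circles as (cx, cy, radius, observed, expected).
    """
    rate = sum(rec[3] for rec in records) / sum(rec[2] for rec in records)
    xmin = min(rec[0] for rec in records)
    xmax = max(rec[0] for rec in records)
    ymin = min(rec[1] for rec in records)
    ymax = max(rec[1] for rec in records)
    hits = []
    r = r0
    while r <= r_max:                              # loop over circle radii
        step = r * step_frac                       # grid spacing for this radius
        nx = int((xmax - xmin) / step) + 1
        ny = int((ymax - ymin) / step) + 1
        for i in range(nx):
            for j in range(ny):
                cx, cy = xmin + i * step, ymin + j * step
                pop = cases = 0
                for x, y, p, c in records:         # two counts per circle
                    if (x - cx) ** 2 + (y - cy) ** 2 <= r * r:
                        pop += p
                        cases += c
                expected = rate * pop
                if pop > 0 and poisson_upper_tail(cases, expected) < alpha:
                    hits.append((cx, cy, r, cases, expected))
        r += dr
    return hits
```

Because many overlapping circles are tested, the retained circles are best read as an exploratory indication of where excess incidence may lie rather than as formal test results.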
Spatial data exploration • Grid-based statistics • Univariate analysis of attribute data (non-spatial metrics) • Cross-classification and cross-tab analyses • Spatial pattern analysis for grid data (including Landscape metrics) • Patch metrics; Class-level metrics; Landscape-level metrics • Quadrat analysis • Multi-grid regression analysis
Spatial data exploration • Grid-based statistics • Landscape metrics • Non-spatial • Proportional abundance; Richness; Evenness; Diversity • Spatial • Patch size distribution and density; Patch shape complexity; Core Area; Isolation/Proximity; Contrast; Dispersion; Contagion and Interspersion; Subdivision; Connectivity
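The non-spatial landscape metrics depend only on how many cells fall in each class, not on where they are. A minimal sketch for a classified grid follows, assuming Shannon's index as the diversity measure and Shannon evenness as the evenness measure (other indices exist); `composition_metrics` is a hypothetical name.

```python
import math
from collections import Counter

def composition_metrics(grid):
    """grid: list of rows, each a list of class labels.

    Returns (proportional abundance per class, richness,
             Shannon diversity, Shannon evenness).
    """
    counts = Counter(cell for row in grid for cell in row)
    total = sum(counts.values())
    abundance = {cls: c / total for cls, c in counts.items()}
    richness = len(abundance)
    diversity = -sum(p * math.log(p) for p in abundance.values())
    # Evenness is diversity relative to its maximum, ln(richness);
    # with a single class it is set to 0.0 here by convention (an assumption).
    evenness = diversity / math.log(richness) if richness > 1 else 0.0
    return abundance, richness, diversity, evenness
```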
Spatial data exploration • Point (event) based statistics • Typically analysis of point-pair distances • Points vs events • Distance metrics: Euclidean, spherical, Lp or network • Weighted or unweighted events • Events, NOT computed points (e.g. centroids) • Classical statistical models vs Monte Carlo and other computational methods
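Three of the distance metrics named above can be written compactly. In this sketch the Lp (Minkowski) metric covers Euclidean distance as the special case p = 2, and spherical distance uses the haversine formula with an assumed mean Earth radius of 6371 km. Network distance needs a shortest-path computation over a graph and is not shown. The function names are hypothetical.

```python
import math

def lp_distance(a, b, p=2.0):
    """Minkowski (Lp) distance; p=1 is city-block, p=2 is Euclidean."""
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)

def spherical_distance(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))
```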
Spatial data exploration • Point (event) based statistics • Basic Nearest neighbour (NN) model • Input coordinates of all points • Compute the (symmetric) distance matrix D • Sort the distances to identify the 1st, 2nd, ..., kth nearest values • Compute the mean of the observed 1st, 2nd, ..., kth nearest values • Compare this mean with the expected mean under the Complete Spatial Randomness (CSR or Poisson) model
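The steps of the basic NN model can be sketched directly. This is a brute-force illustration with no edge correction; the function names are hypothetical, and the CSR expectation shown is the standard first-order result 1/(2√m), with density m estimated as n divided by the study area.

```python
import math

def mean_kth_nn_distances(points, k_max=1):
    """Mean observed distance to the 1st, 2nd, ..., k_max-th nearest neighbour."""
    n = len(points)
    sums = [0.0] * k_max
    for i, p in enumerate(points):
        # Row i of the distance matrix, excluding the point itself, sorted
        row = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for k in range(k_max):
            sums[k] += row[k]
    return [s / n for s in sums]

def expected_mean_nn_csr(n, area):
    """Expected mean 1st-order NN distance under CSR for n points in the area."""
    m = n / area                      # estimated point density
    return 1.0 / (2.0 * math.sqrt(m))
```

For example, comparing `mean_kth_nn_distances(pts)[0]` with `expected_mean_nn_csr(len(pts), area)` gives the observed and expected values used in the NN index.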
Spatial data exploration • Point (event) based statistics – NN model
Spatial data exploration • Point (event) based statistics – NN model • Mean NN distance: • Variance: • NN Index (Ratio): • Z-transform:
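The slide's own expressions are not reproduced in this transcript. For reference, the standard first-order results under CSR (after Clark and Evans) are given below; the notation is an assumption, with m the point density, n the number of points, and the observed and expected mean NN distances written with subscripts o and e.

```latex
\begin{align*}
\bar{r}_e &= \frac{1}{2\sqrt{m}}
  && \text{expected mean NN distance under CSR} \\
\sigma^2 &= \frac{4-\pi}{4\pi m}, \qquad
\sigma_e^2 = \frac{4-\pi}{4\pi m n}
  && \text{variance of one NN distance, and of the mean of } n \\
R &= \frac{\bar{r}_o}{\bar{r}_e}
  && \text{NN index (ratio)} \\
z &= \frac{\bar{r}_o - \bar{r}_e}{\sigma_e}, \qquad
\sigma_e \approx \frac{0.26136}{\sqrt{n m}}
  && \text{z-transform}
\end{align*}
```

Values of R below 1 suggest clustering and values above 1 suggest a more regular (dispersed) pattern, subject to the density and edge-effect caveats that apply to this model.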
Spatial data exploration • Point (event) based statistics • Issues • Are observations n discrete points? • Sample size (esp. for kth order NN, k>1) • Model requires density estimation, m • Boundary definition problems (density and edge effects) – affects all methods • NN reflexivity of point sets • Limited use of frequency distribution • Validity of Poisson model vs alternative models
Spatial data exploration • Frequency distribution of nearest neighbour distances, i.e. • The frequency of NN distances in distance bands, say 0-1 km, 1-2 km, etc. • The cumulative frequency distribution is usually denoted • G(r) = #(d_i < r)/n, where d_i are the NN distances and n is the number of measurements, or • F(r) = #(d_i < r)/m, where m is the number of random points used in sampling (here the d_i are measured from those random points to their nearest events)
Spatial data exploration • Computing G(r) [computing F(r) is similar] • Find all the NN distances • Rank them and form the cumulative frequency distribution • Compare to expected cumulative frequency distribution: • Similar in concept to K-S test with quadrat model, but compute the critical values by simulation rather than table lookup
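The computation can be sketched as follows. The theoretical curve used here is the standard CSR result, 1 − exp(−mπr²) for density m, and the critical values are approximated by the minimum and maximum over random patterns simulated in a rectangle. Edge effects are ignored, and all function names and the `n_sims` default are assumptions.

```python
import math
import random

def nn_distances(points):
    """Distance from each event to its nearest other event."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def empirical_g(nn, r):
    """Cumulative proportion of NN distances shorter than r."""
    return sum(1 for d in nn if d < r) / len(nn)

def csr_g(r, m):
    """Expected cumulative distribution of NN distances under CSR, density m."""
    return 1.0 - math.exp(-m * math.pi * r * r)

def simulation_envelope(n, width, height, radii, n_sims=99, rng=None):
    """Lower and upper envelope of G over n_sims simulated CSR patterns."""
    rng = rng or random.Random()
    lo = [1.0] * len(radii)
    hi = [0.0] * len(radii)
    for _ in range(n_sims):
        sim = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]
        nn = nn_distances(sim)
        for k, r in enumerate(radii):
            g = empirical_g(nn, r)
            lo[k] = min(lo[k], g)
            hi[k] = max(hi[k], g)
    return lo, hi
```

An observed curve that rises above the upper envelope at short distances is consistent with clustering; one that falls below the lower envelope is consistent with a more regular pattern.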
Spatial data exploration • Point (event) based statistics – clustering (ESDA) • Is the observed clustering due to natural background variation in the population from which the events arise? • Over what spatial scales does clustering occur? • Are clusters a reflection of regional variations in underlying variables? • Are clusters associated with some feature of interest, such as a refinery, waste disposal site or nuclear plant? • Are clusters simply spatial or are they spatio-temporal?
Spatial data exploration • Point (event) based statistics – clustering • kth order NN analysis • Cumulative distance frequency distribution, G(r) • Ripley K (or L) function – single or dual pattern • PCP (Poisson cluster process) • Hot spot and cluster analysis methods
Spatial data exploration • Point (event) based statistics – Ripley K or L • Construct a circle, radius d, around each point (event), i • Count the number of other events, labelled j, that fall inside this circle • Repeat these first two stages for all points i, and then sum the results • Increment d by a small fixed amount • Repeat the computation, giving values of K(d) for a set of distances, d • Adjust to provide ‘normalised measure’ L:
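The counting procedure above can be sketched in its simplest form. This version applies no edge correction, scales the pair count by A/n² (some texts use n(n − 1) instead), and takes the normalised measure to be L(d) = √(K(d)/π), which equals d under CSR; which variant the slide intends is an assumption. The function names are hypothetical.

```python
import math

def ripley_k(points, area, distances):
    """Naive Ripley K (no edge correction) for each distance in `distances`."""
    n = len(points)
    # All ordered pairs (i, j), i != j: each circle is centred on event i
    pair_d = [math.dist(points[i], points[j])
              for i in range(n) for j in range(n) if i != j]
    return [area * sum(1 for x in pair_d if x <= d) / (n * n) for d in distances]

def ripley_l(k_values):
    """Normalised measure L(d) = sqrt(K(d) / pi)."""
    return [math.sqrt(k / math.pi) for k in k_values]
```

Plotting L(d) − d against d is a common way to read the result: values above zero suggest clustering at that scale and values below zero suggest dispersion.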
Spatial data exploration • Point (event) based statistics – Ripley K
Spatial data exploration • Point (event) based statistics – comments • CSR vs PCP vs other models • Data: location, time, attributes, error, duplicates • Duplicates: deliberate rounding, data resolution, genuine duplicate locations, agreed surrogate locations, deliberate data modification • Multi-approach analysis is beneficial • Methods: choice of methods and parameters • Other factors: borders, areas, metrics, background variation, temporal variation, non-spatial factors • Rare events and small samples • Process-pattern vs cause-effect • ESDA in most instances
Spatial data exploration • Hot spot and cluster analysis – questions • where are the main (most intensive) clusters located? • are clusters distinct or do they merge into one another? • are clusters associated with some known background variable? • is there a common size to clusters or are they variable in size? • do clusters themselves cluster into higher order groupings? • if comparable data are mapped over time, do the clusters remain stable or do they move and/or disappear?
Spatial data exploration • Hot spot (and cool-spot) analysis • Visual inspection of mapped patterns • Scale issues • Proximal and duplicate points • Point representation (size) • Background variation/controls (risk adjustment) • Weighted or unweighted • Hierarchical or non-hierarchical • Kernel & K-means methods
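As one illustration of the kernel approach, the sketch below evaluates a weighted Gaussian kernel density at a set of grid points. The choice of a Gaussian kernel, the single fixed `bandwidth`, and the function name `kernel_density` are assumptions; no risk adjustment for background variation is applied.

```python
import math

def kernel_density(events, grid_points, bandwidth, weights=None):
    """Gaussian kernel density of (optionally weighted) events at each grid point."""
    if weights is None:
        weights = [1.0] * len(events)        # unweighted events
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    surface = []
    for gx, gy in grid_points:
        total = 0.0
        for (x, y), w in zip(events, weights):
            d2 = (x - gx) ** 2 + (y - gy) ** 2
            total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
        surface.append(norm * total)
    return surface
```

The bandwidth governs the scale at which hot spots appear: small values pick out tight local concentrations, while large values smooth them into broad regional trends.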
Spatial data exploration • Hot spot analysis – Hierarchical NN • Cancer incidence data: 1st and 2nd order clusters