Spatial Statistics

Spatial Statistics Modified from Dr. YU-FEN LI

Point Pattern Descriptors • Central tendency • Mean Center (Spatial Mean) • Weighted Mean Center • Median Center (Spatial Median) – not widely used because of its ambiguity • Consider n points with coordinates $(x_i, y_i)$, $i = 1, \dots, n$

Central tendency – Mean Center (Spatial Mean) • The two means of the coordinates define the location of the mean center as $(\bar{x}, \bar{y}) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i\right)$

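Central tendency – Weighted Mean Center • The two weighted means of the coordinates define the location of the weighted mean center as $(\bar{x}_w, \bar{y}_w) = \left(\frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},\ \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}\right)$, where $w_i$ is the weight at point i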
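
The two centers can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mean_center` and `weighted_mean_center` and the sample coordinates and weights are made up for the example.

```python
def mean_center(points):
    """Return (x_bar, y_bar) for a list of (x, y) tuples."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return (x_bar, y_bar)


def weighted_mean_center(points, weights):
    """Return the mean center with each point weighted by w_i."""
    total = sum(weights)
    x_w = sum(w * x for (x, _), w in zip(points, weights)) / total
    y_w = sum(w * y for (_, y), w in zip(points, weights)) / total
    return (x_w, y_w)


# Hypothetical data: four corners of a 4 x 2 rectangle.
points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
weights = [1.0, 1.0, 3.0, 3.0]  # the upper two points count three times

print(mean_center(points))                     # (2.0, 1.0)
print(weighted_mean_center(points, weights))   # (2.0, 1.5)
```

The heavier weights on the two upper points pull the weighted center upward, while the x-coordinate is unchanged because the weights are symmetric left to right.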

Point Pattern Descriptors • Dispersion and Orientation • Standard distance • Weighted standard distance • Standard deviational ellipse

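Dispersion and Orientation – Standard Distance • How points deviate from the mean center • Recall the population standard deviation $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$ • Standard distance: $SD = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2}{n}}$ • $(\bar{x}, \bar{y})$ is the mean center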

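Dispersion and Orientation – Weighted Standard Distance • Points may have different attribute values that reflect their relative importance • Weighted standard distance: $SD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2 + \sum_{i=1}^{n} w_i (y_i - \bar{y}_w)^2}{\sum_{i=1}^{n} w_i}}$ • $(\bar{x}_w, \bar{y}_w)$ is the weighted mean center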
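
A minimal sketch of both measures, assuming the usual definition: the square root of the (weighted) mean squared distance of the points from the (weighted) mean center, in population form. The function name `standard_distance` and the sample data are hypothetical.

```python
import math


def standard_distance(points, weights=None):
    """Standard distance of (x, y) tuples; weighted if weights are given."""
    if weights is None:
        weights = [1.0] * len(points)  # unweighted case: every w_i = 1
    total = sum(weights)
    # (Weighted) mean center.
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    # (Weighted) sum of squared distances from the center.
    ss = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
             for (x, y), w in zip(points, weights))
    return math.sqrt(ss / total)


points = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
print(standard_distance(points))                        # sqrt(5), about 2.236
print(standard_distance(points, [1.0, 1.0, 3.0, 3.0]))  # sqrt(4.75), about 2.179
```

With all weights equal to 1 the weighted form reduces to the unweighted one, which is why a single function covers both cases here.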

Dispersion and Orientation – Standard Deviational Ellipse • Standard distance is a good single measure of the dispersion of the incidents around the mean center, but it does not capture any directional bias • The standard deviational ellipse gives dispersion in two dimensions and is defined by 3 parameters • Angle of rotation • Dispersion along major axis • Dispersion along minor axis

Dispersion and Orientation – Standard Deviational Ellipse • Basic concept is to: • Find the axis going through maximum dispersion (thus derive angle of rotation) • Calculate standard deviation of the points along this axis (thus derive the length of major axis) • Calculate standard deviation of points along the axis perpendicular to major axis (thus derive the length of minor axis)
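
The three steps above can be sketched as follows. This is one possible implementation under stated assumptions, not the slides' own formula: the angle is measured counterclockwise from the x-axis, the deviations use the population form (divide by n), and the name `standard_deviational_ellipse` is hypothetical. GIS packages may use other conventions, such as an angle clockwise from north or rescaled axis lengths.

```python
import math


def standard_deviational_ellipse(points):
    """Return (center, angle_deg, sd_major, sd_minor) for (x, y) tuples."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Second moments about the mean center.
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Step 1: direction of maximum dispersion (angle of rotation).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    # Step 2: variance of the points projected onto the major axis.
    var_major = sxx * c * c + 2.0 * sxy * s * c + syy * s * s
    # Step 3: variance along the perpendicular (minor) axis.
    var_minor = sxx * s * s - 2.0 * sxy * s * c + syy * c * c
    # Guard against tiny negative values caused by rounding.
    sd_major = math.sqrt(max(0.0, var_major))
    sd_minor = math.sqrt(max(0.0, var_minor))
    return (cx, cy), math.degrees(theta), sd_major, sd_minor


# Hypothetical points lying exactly on the line y = x.
center, angle, sd_major, sd_minor = standard_deviational_ellipse(
    [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
)
print(center, round(angle, 1), round(sd_major, 3), round(sd_minor, 3))
```

For points on a straight line the ellipse collapses: the major axis follows the line (45 degrees here) and the minor-axis dispersion is zero up to rounding.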

Statistical Methods in GIS • Point pattern analyzers • Location information only • Line pattern analyzers • Location + Attribute information • Polygon pattern analyzers • Location + Attribute information

POINT PATTERN ANALYZERS • Two primary approaches • Quadrat Analysis – based on observing the frequency distribution or density of points within a set of grids • Nearest Neighbor Analysis – based on distances of points

Quadrat Analysis (QA) • Point density approach • The density measured by QA is compared with that of a random pattern • [Figure: example point patterns: random, clustered, uniform/dispersed]

Quadrat Analysis (QA) • Multiple ways to create quadrats: exhaustive census or random sampling • Quadrats don’t have to be square, and their size has a big influence

Quadrat Analysis (QA) • Apply a uniform or random grid over the area (A), with the size of the quadrats given by $\frac{2A}{r}$, where r = # of points • width of a square quadrat is $\sqrt{2A/r}$ • radius of a circular quadrat is $\sqrt{2A/(\pi r)}$

Quadrat Analysis (QA) -- Frequency distribution comparison • Treat each cell as an observation and count the number of points within it • Compare the observed frequencies in the quadrats with the expected frequencies that would be generated by • a random process (modeled by the Poisson distribution) • a clustered process (e.g. one cell with r points, n-1 cells with 0 points, where n = number of quadrats) • a uniform process (e.g. each cell has r/n points) • The standard Kolmogorov-Smirnov (K-S) test for comparing two frequency distributions can then be applied
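
The counting step and the Poisson expectation can be sketched as below. This is an illustrative sketch only: it assumes a regular square grid, points that all fall inside the study area, and a Poisson mean equal to r/n (points per quadrat). The names `quadrat_counts` and `poisson_pmf` and the sample points are hypothetical.

```python
import math
from collections import Counter


def quadrat_counts(points, xmin, ymin, cell, ncols, nrows):
    """Count points per square cell of a regular grid (row-major order)."""
    counts = [0] * (ncols * nrows)
    for x, y in points:
        # Clamp so that points on the upper or right edge fall in the last cell.
        col = min(int((x - xmin) / cell), ncols - 1)
        row = min(int((y - ymin) / cell), nrows - 1)
        counts[row * ncols + col] += 1
    return counts


def poisson_pmf(k, lam):
    """Probability of exactly k points in a quadrat under a Poisson process."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical pattern: 4 points on a 2 x 2 grid of unit cells.
points = [(0.2, 0.2), (0.5, 0.7), (1.5, 0.5), (0.1, 0.9)]
counts = quadrat_counts(points, 0.0, 0.0, 1.0, 2, 2)
lam = len(points) / len(counts)   # mean points per quadrat, r / n
observed = Counter(counts)        # how many quadrats hold k points

for k in range(max(counts) + 1):
    obs_prop = observed.get(k, 0) / len(counts)
    print(k, obs_prop, round(poisson_pmf(k, lam), 4))
```

Each printed row pairs the observed proportion of quadrats containing k points with the proportion a random (Poisson) process would produce, which are the two distributions the K-S test compares.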

Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S) Test • The test statistic “D” is simply given by $D = \max_i |O_i - E_i|$, where $O_i$ and $E_i$ are the observed and expected cumulative proportions of the ith category in the two distributions, i.e. the largest difference (irrespective of sign) between the observed cumulative frequency and the expected cumulative frequency
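
A short sketch of the D statistic for two frequency tables that share the same categories. The function name `ks_statistic` and the frequencies are invented for illustration.

```python
def ks_statistic(observed, expected):
    """Largest absolute gap between two cumulative proportion curves.

    observed, expected: frequencies per category, in the same order.
    """
    total_o, total_e = sum(observed), sum(expected)
    cum_o = cum_e = 0.0
    d = 0.0
    for o, e in zip(observed, expected):
        cum_o += o / total_o   # observed cumulative proportion O_i
        cum_e += e / total_e   # expected cumulative proportion E_i
        d = max(d, abs(cum_o - cum_e))
    return d


# Hypothetical table: number of quadrats containing 0, 1, 2, 3 points.
observed = [5, 2, 2, 1]
expected = [2.5, 2.5, 2.5, 2.5]
print(ks_statistic(observed, expected))  # 0.25
```

Here the cumulative proportions are 0.5, 0.7, 0.9, 1.0 (observed) against 0.25, 0.5, 0.75, 1.0 (expected), so the largest gap occurs in the first category.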

Kolmogorov-Smirnov Test (Example 1) • Situations in which the control and treatment groups do not differ in mean, but only in some other way. For example, consider the datasets: controlA={0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09} treatmentA={-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}

Kolmogorov-Smirnov Test (Example 1) • There are a few situations in which it is a mistake to trust the results of a t-test • Notice that both datasets are approximately balanced around zero; evidently the mean in both cases is "near" zero. However, there is substantially more variation in the treatment group, which ranges approximately from -6 to 6, whereas the control group ranges approximately from -2½ to 2½. The datasets are different, but the t-test cannot see the difference.


Kolmogorov-Smirnov Test (Example 1) • [Figure: the percentile plot of this data (in red), along with the behavior expected for the above lognormal distribution (in blue)]

Kolmogorov-Smirnov Test (Example 2) • Situations in which the treatment and control groups are smallish datasets (say 20 items each) that differ in mean, but a substantially non-normal distribution masks the difference. For example, consider the datasets: controlB={1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20, 0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38} treatmentB={2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11, 27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19} • These datasets were drawn from lognormal distributions that differ substantially in mean. The KS test detects this difference; the t-test does not. Of course, if the user knew that the data were non-normally distributed, s/he would know not to apply the t-test in the first place.

Kolmogorov-Smirnov Test (Example 2) • Sorted controlB={0.08, 0.10, 0.15, 0.17, 0.24, 0.34, 0.38, 0.42, 0.49, 0.50, 0.70, 0.94, 0.95, 1.26, 1.37, 1.55, 1.75, 3.20, 6.98, 50.57}


Kolmogorov-Smirnov Test (Example 2) • [Figure: the percentile plot of this data (in red), along with the behavior expected for the above lognormal distribution (in blue)]
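
The two-sample comparison in Example 2 can be reproduced with a small sketch that builds both empirical cumulative distributions directly from the raw values. The function name `ks_two_sample` is hypothetical; the data are controlB and treatmentB as listed in the example. The threshold mentioned in the final comment uses the common large-sample coefficient 1.36 for the 5% level.

```python
def ks_two_sample(sample_a, sample_b):
    """Largest absolute gap between the two empirical CDFs."""
    n_a, n_b = len(sample_a), len(sample_b)
    d = 0.0
    # The gap can only change at observed values, so check each one.
    for v in sorted(set(sample_a) | set(sample_b)):
        cdf_a = sum(1 for x in sample_a if x <= v) / n_a
        cdf_b = sum(1 for x in sample_b if x <= v) / n_b
        d = max(d, abs(cdf_a - cdf_b))
    return d


control_b = [1.26, 0.34, 0.70, 1.75, 50.57, 1.55, 0.08, 0.42, 0.50, 3.20,
             0.15, 0.49, 0.95, 0.24, 1.37, 0.17, 6.98, 0.10, 0.94, 0.38]
treatment_b = [2.37, 2.16, 14.82, 1.73, 41.04, 0.23, 1.32, 2.91, 39.41, 0.11,
               27.44, 4.51, 0.51, 4.50, 0.18, 14.68, 4.66, 1.30, 2.06, 1.19]

d = ks_two_sample(control_b, treatment_b)
print(round(d, 2))  # 0.45, i.e. 9/20: at x = 0.95, 13 control vs 4 treatment values
```

With 20 values in each group, a D of 0.45 is slightly above the large-sample 5% threshold of about 0.43, which is consistent with the statement that the K-S test detects the difference.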

Quadrat Analysis (QA) -- Kolmogorov-Smirnov (K-S) Test • The critical value at the 5% level is given by $D_{0.05} = \frac{1.36}{\sqrt{n}}$, where n is the number of quadrats • In a two-sample case, $D_{0.05} = 1.36\sqrt{\frac{n_1 + n_2}{n_1 n_2}}$, where $n_1$ and $n_2$ are the numbers of quadrats in the two sets of distributions
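
A minimal sketch of the critical values, assuming the usual large-sample approximation in which the coefficient for the 5% level is 1.36. The function names are hypothetical, and for small n exact tables may differ from these approximations.

```python
import math


def ks_critical_one_sample(n, coeff=1.36):
    """Approximate critical D for one sample of n quadrats (5% level by default)."""
    return coeff / math.sqrt(n)


def ks_critical_two_sample(n1, n2, coeff=1.36):
    """Approximate critical D when comparing two sets of n1 and n2 quadrats."""
    return coeff * math.sqrt((n1 + n2) / (n1 * n2))


print(round(ks_critical_one_sample(100), 3))     # 0.136
print(round(ks_critical_two_sample(50, 50), 3))  # 0.272
```

An observed D larger than the critical value leads to rejecting the hypothesis that the two distributions are the same at the 5% level.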

Quadrat Analysis: Variance-Mean Ratio (VMR) • Test whether the observed pattern is different from a random pattern (generated from a Poisson distribution, for which mean = variance) • Treat each cell as an observation and count the number of points within it, to create the variable X • Calculate the variance and mean of X, and create the variance-to-mean ratio: variance / mean

Quadrat Analysis: Variance-Mean Ratio (VMR) • For a uniform distribution, the variance is zero • we expect a variance-mean ratio close to 0 • For a random distribution, the variance and mean are the same • we expect a variance-mean ratio around 1 • For a clustered distribution, the variance is relatively large • we expect a variance-mean ratio above 1
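
The three cases can be checked with a small sketch. It assumes the sample variance (dividing by n - 1, where n is the number of quadrats); the function name `variance_mean_ratio` and the quadrat counts are made up for illustration.

```python
def variance_mean_ratio(counts):
    """Variance-to-mean ratio of the points-per-quadrat counts."""
    n = len(counts)
    mean = sum(counts) / n
    # Sample variance (n - 1 in the denominator) is an assumption here.
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return variance / mean


uniform = [2, 2, 2, 2]     # every quadrat holds the same number of points
clustered = [8, 0, 0, 0]   # all points fall in one quadrat

print(variance_mean_ratio(uniform))    # 0.0
print(variance_mean_ratio(clustered))  # 8.0
```

Both patterns have the same mean of 2 points per quadrat; only the variance separates them, which is what the ratio exploits.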

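Significance Test for VMR • $\bar{x} = \frac{\sum_i n_i x_i}{n}$ = the mean of the observed distribution • $s^2 = \frac{\sum_i n_i (x_i - \bar{x})^2}{n - 1}$ = the variance of the observed distribution, where $x_i$ is the number of points in a quadrat, $n_i$ is the number of quadrats with $x_i$ points, and n is the total number of quadrats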

Weakness of Quadrat Analysis • Results may depend on quadrat size and orientation • Is a measure of dispersion, and not really pattern, because it is based primarily on the density of points, and not their arrangement in relation to one another • Results in a single measure for the entire distribution, so variations within the region are not recognized (could have clustering locally in some areas, but not overall)

Weakness of Quadrat Analysis • For example, quadrat analysis cannot distinguish between these two, obviously different, patterns

Nearest-Neighbor Index (NNI) • Uses distances between points as its basis • Compares the observed average distance between each point and its nearest neighbor with the expected average distance that would occur if the distribution were random: $NNI = \bar{r}_{obs} / \bar{r}_{exp}$ • For a random pattern, NNI = 1 • For a clustered pattern, NNI < 1 • For a dispersed pattern, NNI > 1

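Nearest-Neighbor Index (NNI) – Significance test • $Z = \frac{\bar{r}_{obs} - \bar{r}_{exp}}{SE_r}$, where the standard error is $SE_r = \frac{0.26136}{\sqrt{n^2 / A}}$, n is the number of points, and A is the area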
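
A brute-force sketch of the index and its Z score. It assumes the standard expectations for a random pattern, $\bar{r}_{exp} = 0.5\sqrt{A/n}$ and $SE_r = 0.26136 / \sqrt{n^2/A}$, and applies no edge correction. The function name `nearest_neighbor_index` and the sample points are hypothetical.

```python
import math


def nearest_neighbor_index(points, area):
    """Return (NNI, Z) for a list of (x, y) tuples within a region of given area."""
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Distance from point i to its closest other point (O(n^2) overall).
        nearest.append(min(math.hypot(xi - xj, yi - yj)
                           for j, (xj, yj) in enumerate(points) if j != i))
    r_obs = sum(nearest) / n
    r_exp = 0.5 * math.sqrt(area / n)        # expected mean distance if random
    se = 0.26136 / math.sqrt(n * n / area)   # standard error of the mean distance
    return r_obs / r_exp, (r_obs - r_exp) / se


# Hypothetical, perfectly regular pattern: 4 points in a 2 x 2 study area.
grid = [(0.5, 0.5), (1.5, 0.5), (0.5, 1.5), (1.5, 1.5)]
nni, z = nearest_neighbor_index(grid, area=4.0)
print(nni, round(z, 2))  # 2.0 3.83
```

An NNI above 1 with a large positive Z points toward a dispersed pattern; with only four points this is purely illustrative, and the result depends on the area chosen for the study region.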


Nearest-Neighbor Index (NNI) • Advantages • NNI takes distance into account • No quadrat size problem to be concerned with • However, NNI is not as good as it might appear -- • The index is highly dependent on the boundary of the area: its size and its shape (perimeter) • Fundamentally based on only the mean distance • Doesn’t incorporate local variations (could have clustering locally in some areas, but not overall) • Based on point location only and doesn’t incorporate the magnitude of the phenomenon at that point

Nearest-Neighbor Index (NNI) • An “adjustment for edge effects” available but does not solve all the problems

Nearest-Neighbor Index (NNI) • Some alternatives to the NNI are • the G and F functions, based on the entire frequency distribution of nearest neighbor distances, and • the K function based on all interpoint distances.

Spatial Autocorrelation • Most statistical analyses are based on the assumption that the values of observations in each sample are independent of one another • Positive spatial autocorrelation violates this, because samples taken from nearby areas are related to each other and are not independent

Spatial Autocorrelation • In ordinary least squares (OLS) regression, for example, the correlation coefficients will be biased and their precision exaggerated • Bias implies correlation coefficients may be higher than they really are • They are biased because the areas with higher concentrations of events will have a greater impact on the model estimate • Exaggerated precision (lower standard error) implies they are more likely to be found “statistically significant” • They will overestimate precision because, since events tend to be concentrated, there are actually fewer independent observations than is being assumed

Spatial Autocorrelation • Several measures available: • Join Count Statistic • Moran’s I • Geary’s Ratio C • General (Getis-Ord) G • Anselin’s Local Index of Spatial Autocorrelation (LISA) • These are discussed later

LINE PATTERN ANALYZERS • Two general types of linear features • Vectors (lines with arrows) • Networks • Spatial attributes of linear features • Length • Orientation and Direction • Spatial attribute of network features • Connectivity or Topology

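Spatial Attributes of Linear Features -- Length • Linear distance between $(x_1, y_1)$ and $(x_2, y_2)$: $c = \sqrt{a^2 + b^2}$, where $a = |y_2 - y_1|$ and $b = |x_2 - x_1|$ are the legs of the right triangle with corners $(x_1, y_1)$, $(x_1, y_2)$, and $(x_2, y_2)$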

Spatial Attributes of Linear Features -- Length • Great circle distance D of locations A and B: $\cos D = \sin a \sin b + \cos a \cos b \cos|\gamma|$, where a and b are the latitude readings of locations A and B, and $|\gamma|$ is the absolute difference in longitude between A and B
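
A minimal sketch of the great circle calculation using the spherical law of cosines, $\cos D = \sin a \sin b + \cos a \cos b \cos|\gamma|$. It assumes a spherical Earth with a mean radius of 6371 km to convert the angular distance D into kilometers; the function name `great_circle_distance` and the radius default are illustrative choices, not from the slides.

```python
import math


def great_circle_distance(lat_a, lon_a, lat_b, lon_b, radius_km=6371.0):
    """Distance in km between two locations given in decimal degrees."""
    a = math.radians(lat_a)                   # latitude of A
    b = math.radians(lat_b)                   # latitude of B
    gamma = math.radians(abs(lon_a - lon_b))  # absolute longitude difference
    cos_d = (math.sin(a) * math.sin(b)
             + math.cos(a) * math.cos(b) * math.cos(gamma))
    # Clamp to [-1, 1] so rounding errors cannot break acos.
    cos_d = max(-1.0, min(1.0, cos_d))
    return radius_km * math.acos(cos_d)


# Two points on the equator, 90 degrees of longitude apart:
# a quarter of the circumference.
print(round(great_circle_distance(0.0, 0.0, 0.0, 90.0), 1))  # 10007.5
```

The law-of-cosines form loses precision for points that are very close together; the haversine formula is a common alternative in that situation.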

Spatial Attributes of Linear Features – Orientation and Direction • Orientation • Directional e.g. West-East orientation • Non-directional (from … to …) e.g. to describe a fault line: from location y to location x = from location x to location y • Direction • Dependent on the beginning and ending locations: from location y to location x ≠ from location x to location y

Directional Statistics – Directional Mean • Directional mean = average direction of a set of vectors

Directional Statistics – Directional Mean • [Figure: vectors drawn on X-Y axes and added together; what is the direction of the resultant?]

Directional Statistics – Circular Variance • Shows the angular variability of the set of vectors

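Directional Statistics – Circular Variance • For a set of n vectors, $S_v = 1 - \frac{OR}{n}$, where OR is the length of the resultant of the n unit vectors • $S_v = 0$: all vectors have the same direction, i.e. no circular variability • $S_v = 1$: all vectors are in opposite directions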
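
Both directional statistics can be sketched together. The sketch assumes each vector is reduced to a unit vector, directions are given in degrees counterclockwise from the X axis, and the circular variance is one minus the resultant length divided by n. The function name `directional_stats` and the sample angles are hypothetical.

```python
import math


def directional_stats(angles_deg):
    """Return (directional mean in degrees, circular variance)."""
    # Sum the unit vectors component by component.
    sum_sin = sum(math.sin(math.radians(a)) for a in angles_deg)
    sum_cos = sum(math.cos(math.radians(a)) for a in angles_deg)
    resultant = math.hypot(sum_sin, sum_cos)   # length of the resultant vector
    # atan2 picks the correct quadrant for the resultant direction.
    mean_deg = math.degrees(math.atan2(sum_sin, sum_cos)) % 360.0
    circular_variance = 1.0 - resultant / len(angles_deg)
    return mean_deg, circular_variance


mean_deg, variance = directional_stats([80.0, 90.0, 100.0])
print(round(mean_deg, 1), round(variance, 4))  # 90.0 0.0101

# Two exactly opposite vectors cancel out: variance is close to 1,
# and the mean direction is not meaningful.
print(round(directional_stats([0.0, 180.0])[1], 4))  # 1.0
```

Averaging the angles arithmetically would fail for directions that straddle 0 degrees (for example 350 and 10), which is why the mean is taken from the summed vector components instead.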

Network Analysis • Connectivity – how different links are connected • Vertices: junctions or nodes • Links/edges: the lines joining the vertices
