
Spatial Analysis: successive levels of sophistication

Descriptive statistics for spatial distributions: a review of standard descriptive statistics; centrographic statistics for spatial data (mean center, centroid, standard distance deviation, standard distance ellipse); and density kernel estimation and mapping.


Presentation Transcript


  1. Descriptive Statistics for Spatial Distributions
     • Review Standard Descriptive Statistics
     • Centrographic Statistics for Spatial Data: Mean Center, Centroid, Standard Distance Deviation, Standard Distance Ellipse
     • Density Kernel Estimation, Mapping
     Briggs Henan University 2010

  2. Spatial Analysis: successive levels of sophistication
     • Spatial data description: classic GIS capabilities
       -- Spatial queries & measurement, buffering, map layer overlay
     • Exploratory Spatial Data Analysis (ESDA): searching for patterns and possible explanations
       -- GeoVisualization through data graphing and mapping
       -- Descriptive spatial statistics: Centrographic statistics
     • Spatial statistical analysis and hypothesis testing
       -- Are data “to be expected” or are they “unexpected” relative to some statistical model, usually of a random process?
     • Spatial modeling or prediction
       -- Constructing models (of processes) to predict spatial outcomes (patterns)

  3. Standard Statistical Analysis
     Two parts:
     1. Descriptive statistics: concerned with obtaining summary measures to describe a set of data, for example the mean and the standard deviation
     2. Inferential statistics: concerned with making inferences from samples about a population
     Similarly, we have Descriptive and Inferential Spatial Statistics

  4. Spatial Statistics
     Descriptive Spatial Statistics: Centrographic Statistics (this time)
     • Single, summary measures of a spatial distribution
     • Spatial equivalents of mean, standard deviation, etc.
     Inferential Spatial Statistics:
     • Point Pattern Analysis (next time): analysis of point location only, with no quantity or magnitude (no attribute variable)
       -- Quadrat Analysis
       -- Nearest Neighbor Analysis, Ripley’s K function
     • Spatial Autocorrelation (Weeks 5 and 6): one attribute variable with different magnitudes at each location
       -- The Weights Matrix
       -- Global Measures of Spatial Autocorrelation (Moran’s I, Geary’s C, Getis/Ord Global G)
       -- Local Measures of Spatial Autocorrelation (LISA and others)
     • Prediction with Correlation and Regression (Week 7): two or more attribute variables
       -- Standard statistical models
       -- Spatial statistical models

  5. Standard Statistical Analysis: A Quick Review
     1. Descriptive statistics
     • Concerned with obtaining summary measures to describe a set of data
     • Calculate a few numbers to represent all the data
     • We begin by looking at one variable (“univariate”)
     • Later, we will look at two variables (“bivariate”)
     Three types:
     • Measures of Central Tendency
     • Measures of Dispersion or Variability
     • Frequency distributions
     I hope you are already familiar with these. I will quickly review the main ideas.

  6. Standard Descriptive Statistics: Central Tendency
     Central tendency: a single summary measure for one variable
     1. mean (average): $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{N}$
     2. median (middle value): 50% of values are larger and 50% are smaller; rank order the data and select the middle number
     3. mode (most frequently occurring value)
     These may be obtained in ArcGIS by:
     -- opening a table, right clicking on a column heading, and selecting Statistics
     -- going to ArcToolbox>Analysis>Statistics>Summary Statistics

  7. Calculation of mean and median
     • Mean: 296.15 / 34 = 8.71
     • Median: (7.69 + 7.8) / 2 = 7.75 (there are 2 “middle values”)
     Note: data for Taiwan is included

  8. Standard Descriptive Statistics: Variability or Dispersion
     Dispersion: measures of spread or variability
     • Variance: the average squared distance of observations from the mean
     • Standard deviation (square root of variance): the “average” distance of observations from the mean
     Formulae for variance:
     • Definition formula: $\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{N}$
     • Computation formula: $\frac{\sum_{i=1}^{n} X_i^2 - \left[\left(\sum_{i=1}^{n} X_i\right)^2 / N\right]}{N}$
     These may be obtained in ArcGIS by:
     -- opening a table, right clicking on a column heading, and selecting Statistics
     -- going to ArcToolbox>Analysis>Statistics>Summary Statistics

  9. Calculation of Variance and Standard Deviation
     • Variance from Definition Formula: 1361.370 / 34 = 40.04
     • Variance from Computation Formula: [3940.924 - (296.15 * 296.15) / 34] / 34 = 40.04
     • Standard Deviation = √40.04 = 6.33
     Note: data for Taiwan is included
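As a quick check that the definition and computation formulas agree, here is a minimal Python sketch. The sample values are made up for illustration, because the province table is not reproduced in this transcript. The last lines reuse only the totals printed on the slide (N = 34, sum of X = 296.15, sum of X squared = 3940.924).

```python
import math

def variance_definition(values):
    """Variance as the mean squared deviation from the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n

def variance_computation(values):
    """Same variance from the sum of X and the sum of X squared."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / n

# Hypothetical illiteracy rates (percent), for illustration only.
sample = [3.1, 5.4, 7.7, 7.8, 10.7, 16.0]
var_def = variance_definition(sample)
var_comp = variance_computation(sample)

# Totals taken from the slide: N = 34, sum X = 296.15, sum X^2 = 3940.924.
slide_variance = (3940.924 - 296.15 ** 2 / 34) / 34
slide_std_dev = math.sqrt(slide_variance)

print(round(var_def, 4), round(var_comp, 4))
print(round(slide_variance, 2), round(slide_std_dev, 2))
```

Both functions divide by N, as the slide does. Many statistics packages divide by N - 1 for a sample, which gives a slightly larger value.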

  10. Classic Descriptive Statistics: Univariate Frequency Distributions
     • A count of the frequency with which values occur on a variable
     • Example: US population by age group (data for 2000): 50 million people age 45-59
       Source: US Bureau of the Census, Statistical Abstract of the US, http://www.census.gov/compendia/statab/
     • Often represented by the area under a frequency curve; this area represents 100% of the data
     In ArcGIS, you may obtain frequency counts on a categorical variable via:
     -- ArcToolbox>Analysis>Statistics>Frequency

  11. Frequency Distributions for China Province Data
     Symmetric distribution (percent urban):
     • Height of bar shows frequency
     • There are 16 provinces with percent urban between 38.4% and 50.8% (the modal class)
     • Mode = (38.1 + 50.8) / 2 = 44.5
     • Mean = 48.97
     • Median = 44.0
     • In a symmetric distribution: mean = median = mode
     Skewed distribution (right skew; percent illiterate):
     • Height of bar shows frequency
     • There are 17 provinces with illiteracy between 5.4% and 10.7% (the modal class)
     • Mode = (5.4 + 10.7) / 2 = 8.05
     • Mean = 8.7
     • Median = (7.69 + 7.8) / 2 = 7.75
     • In a right-skewed distribution: mean > median; the “tail” extends to the right and the mean is “pulled” to the right

  12. Frequency Distributions for China Province Data: Variability
     Standard deviation: a measure of “the average” distance of each observation from the mean
     • Symmetric distribution: standard deviation = 14.8
     • Skewed distribution (right skew, “tail” extends to right): standard deviation = 6.33
     On average, illiteracy values are closer to the mean. There is less “spread” in this data.

  13. Caution: these values are incorrect!
     • Why?
     • It is incorrect to calculate a simple mean of percentages
     • Each percentage has a different base population
     • Should calculate a weighted mean, $\bar{X}_w = \frac{\sum_i w_i X_i}{\sum_i w_i}$, where $w_i$ = population of each province
     • This is a very common error in GIS because we frequently use aggregated data

  14. Correct Values!
     • Unweighted mean = 8.7
     • Weighted mean = 7.75
     • Weighted mean is smaller. Why?
     • The largest provinces have lower illiteracy
     • The highest rates are in small provinces

  15. Calculation of weighted mean
     • Unweighted mean: 296.15 / 34 = 8.71
     • Weighted mean: 10,445,390,141 / 1,347,382,600 = 7.75
     Note: we should also calculate a weighted standard deviation
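The effect of weighting can be sketched in a few lines of Python. The three provinces below are hypothetical, chosen so that the small province has the highest rate. Only the two totals in the last calculation come from the slide.

```python
def weighted_mean(values, weights):
    """Mean of values, each weighted by its base population."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical provinces: illiteracy rate (percent) and population.
rates = [4.0, 6.0, 20.0]
populations = [90_000_000, 60_000_000, 3_000_000]

unweighted = sum(rates) / len(rates)
weighted = weighted_mean(rates, populations)

# Totals from the slide: sum of (population * rate) and sum of population.
slide_weighted = 10_445_390_141 / 1_347_382_600

print(round(unweighted, 2), round(weighted, 2), round(slide_weighted, 2))
```

In the made-up data the small province pulls the unweighted mean up, while the weighted mean stays close to the rates of the two large provinces. This is the same pattern the slides describe for China.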

  16. Centrographic Statistics: descriptive statistics for spatial distributions
     • Mean Center
     • Centroid
     • Standard Distance Deviation
     • Standard Distance Ellipse
     • Density Kernel Estimation
     (Add Frequency Distributions and mapping; use GeoDA to produce)

  17. Centrographic Statistics
     Measures of Centrality:
     • Mean Center
     • Centroid
     • Weighted mean center
     • Center of Minimum Distance
     Measures of Dispersion:
     • Standard Distance
     • Standard Deviational Ellipse
     Notes:
     • Two dimensional (spatial) equivalents of standard descriptive statistics for a single variable (univariate)
     • Used for point data
     • May be used for polygons by first obtaining the centroid of each polygon
     • Best used to compare two distributions with each other, for example 1990 with 2000, or males with females
     (O&U Ch. 4 p. 77-81)

  18. Mean Center
     • Simply the mean of the X and the mean of the Y coordinates for a set of points
     • The sum of differences between the mean X and all the Xs is zero (same for Y)
     • Minimizes the sum of squared distances between itself and all points
     • Distant points have a large effect: values for Xinjiang will have a larger effect
     • Provides a single point summary measure for the location of a set of points

  19. Centroid
     • The equivalent for polygons of the mean center for a point distribution
     • The center of gravity or balancing point of a polygon
     • If the polygon is composed of straight line segments between nodes, the centroid can be approximated by the “average X, average Y” of the nodes (there is an example later); the exact center of gravity also depends on how the polygon’s area is distributed
     • Calculation is sometimes approximated as the center of the bounding box; this is not good
     • By calculating the centroids for a set of polygons, we can apply Centrographic Statistics to polygons
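The “average X, average Y” rule is a convenient shortcut. For an irregular polygon, the center of gravity can instead be computed with the standard area-weighted (shoelace-based) formula. The sketch below compares the two on a made-up L-shaped polygon. The function names and the test polygon are illustrative and are not taken from the slides.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon.

    vertices: (x, y) pairs in order around the ring, first point not repeated.
    """
    n = len(vertices)
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return cx / (3 * twice_area), cy / (3 * twice_area)

def node_average(vertices):
    """Shortcut: plain average of the node coordinates."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n, sum(y for _, y in vertices) / n)

# Hypothetical L-shaped polygon.
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(polygon_centroid(l_shape))
print(node_average(l_shape))
```

The two results differ for the L shape. They coincide for regular shapes such as a rectangle.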

  20. Centroids for Provinces of China

  21. Centroids for Provinces of China

  22. Warning: Centroid may not be inside its polygon
     • For Gansu Province, China, the centroid is within the neighboring province of Qinghai
     • The problem arises with crescent-shaped polygons

  23. Weighted Mean Center
     • Produced by weighting each X and Y coordinate by another variable (Wi)
     • Centroids derived from polygons can be weighted by any characteristic of the polygon
     • For example, the population of a province

  24. Calculating the centroid of a polygon or the mean center of a set of points (same example data as for area of polygon).
     Figure (two panels): the points (4,7), (7,7), (7,3), (2,3) and (6,2) plotted on axes from 0 to 10.
     Calculating the weighted mean center: note how it is pulled toward the high-weight point.
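The calculation for the five example points can be sketched in Python as follows. The point coordinates are those shown on the slide. The weights are not recoverable from this transcript, so the ones below are hypothetical and simply give the point (2,3) a high weight.

```python
def mean_center(points):
    """Mean of the X coordinates and mean of the Y coordinates."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def weighted_mean_center(points, weights):
    """Mean center with each coordinate weighted by another variable."""
    total = sum(weights)
    x = sum(w * px for (px, _), w in zip(points, weights)) / total
    y = sum(w * py for (_, py), w in zip(points, weights)) / total
    return (x, y)

points = [(4, 7), (7, 7), (7, 3), (2, 3), (6, 2)]
weights = [1, 1, 1, 10, 1]  # hypothetical: (2,3) is the high-weight point

center = mean_center(points)
weighted_center = weighted_mean_center(points, weights)
print(center)
print(weighted_center)
```

With these weights the weighted mean center moves toward (2,3), the high-weight point, as the slide notes.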

  25. Center of Minimum Distance or Median Center
     • Also called the point of minimum aggregate travel
     • The point (MD) which minimizes the sum of distances between itself and all other points (i)
     • No direct solution; it can only be derived by approximation
     • Not a determinate solution: multiple points may meet this criterion (see next bullet)
     • Same as Median center:
       -- the intersection of two orthogonal lines (at right angles to each other), such that each line has half of the points to its left and half to its right
       -- because the orientation of the axes for the lines is arbitrary, multiple points may meet this criterion
     Source: Neft, 1966
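One common way to approximate the point of minimum aggregate travel is Weiszfeld's iterative scheme: start at the mean center and repeatedly move to the average of the points, each weighted by the inverse of its distance from the current estimate. The slides do not name a method, so this choice, the function names and the stopping rule are assumptions made for illustration.

```python
import math

def total_distance(center, points):
    """Sum of straight-line distances from center to every point."""
    cx, cy = center
    return sum(math.hypot(x - cx, y - cy) for x, y in points)

def center_of_minimum_distance(points, tolerance=1e-9, max_iterations=1000):
    """Approximate the center of minimum distance by Weiszfeld iteration."""
    n = len(points)
    # Start from the mean center.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    for _ in range(max_iterations):
        weight_sum = x_sum = y_sum = 0.0
        for x, y in points:
            d = math.hypot(x - cx, y - cy)
            if d < 1e-12:
                # Simplification: stop if the estimate lands on a data point,
                # to avoid dividing by zero.
                return (cx, cy)
            weight_sum += 1.0 / d
            x_sum += x / d
            y_sum += y / d
        new_cx, new_cy = x_sum / weight_sum, y_sum / weight_sum
        shift = math.hypot(new_cx - cx, new_cy - cy)
        cx, cy = new_cx, new_cy
        if shift < tolerance:
            break
    return (cx, cy)

points = [(4, 7), (7, 7), (7, 3), (2, 3), (6, 2)]
mean_ctr = (sum(x for x, _ in points) / 5, sum(y for _, y in points) / 5)
md = center_of_minimum_distance(points)
print(md, total_distance(md, points), total_distance(mean_ctr, points))
```

The total distance from the result is never larger than the total distance from the mean center. The mean center minimizes squared distances, not distances.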

  26. Median and Mean Centers for US Population
     • Median Center: intersection of a north/south and an east/west line drawn so that half of the population lives above and half below the e/w line, and half lives to the left and half to the right of the n/s line
     • Mean Center: balancing point of a weightless map, if equal weights were placed on it at the residence of every person on census day
     Source: US Statistical Abstract 2003

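  27. Standard Distance Deviation
     • Represents the standard deviation of the distance of each point from the mean center
     • Is the two dimensional equivalent of the standard deviation for a single variable, $\sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{N}}$
     • Given by: $SDD = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{N} + \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{N}}$
       which by Pythagoras reduces to: $SDD = \sqrt{\frac{\sum_{i=1}^{n} d_{i}^2}{N}}$, where $d_i$ is the distance of point i from the mean center $(\bar{X}, \bar{Y})$
       --- essentially the average distance of points from the center
     • Or, with weights: $SDD_w = \sqrt{\frac{\sum_{i=1}^{n} w_i (X_i - \bar{X}_w)^2}{\sum_{i=1}^{n} w_i} + \frac{\sum_{i=1}^{n} w_i (Y_i - \bar{Y}_w)^2}{\sum_{i=1}^{n} w_i}}$
     • Provides a single unit measure of the spread or dispersion of a distribution
     • We can also calculate a weighted standard distance analogous to the weighted mean center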

  28. Standard Distance Deviation Example
     Figure: the points (4,7), (7,7), (7,3), (2,3) and (6,2) plotted on axes from 0 to 10, with a circle of radius = SDD = 2.9
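The value SDD = 2.9 can be reproduced for the five example points with a short Python sketch. It assumes the divisor is N, which is the choice that matches the figure; some software uses a slightly different divisor. The optional weights argument is an illustrative extension for the weighted case.

```python
import math

def standard_distance(points, weights=None):
    """Standard distance deviation around the (weighted) mean center."""
    if weights is None:
        weights = [1.0] * len(points)
    total = sum(weights)
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    # Sum of (weighted) squared distances from the center.
    squared = sum(
        w * ((x - cx) ** 2 + (y - cy) ** 2)
        for (x, y), w in zip(points, weights)
    )
    return math.sqrt(squared / total)

points = [(4, 7), (7, 7), (7, 3), (2, 3), (6, 2)]
sdd = standard_distance(points)
print(round(sdd, 1))
```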

  29. Standard Deviational Ellipse: concept
     • Standard distance deviation is a good single measure of the dispersion of the points around the mean center, but it does not capture any directional bias
       -- it doesn’t capture the shape of the distribution
     • The standard deviational ellipse gives dispersion in two dimensions
     • Defined by 3 parameters:
       -- Angle of rotation
       -- Dispersion (spread) along major axis
       -- Dispersion (spread) along minor axis
     • The major axis defines the direction of maximum spread of the distribution
     • The minor axis is perpendicular to it and defines the minimum spread

  30. Standard Deviational Ellipse: calculation
     • Formulae for calculation may be found in references such as:
       -- Lee and Wong pp. 48-49
       -- Levine, Chapter 4, pp. 125-128
     • Basic concept is to:
       -- Find the axis going through maximum dispersion (thus derive the angle of rotation)
       -- Calculate the standard deviation of the points along this axis (thus derive the length (radius) of the major axis)
       -- Calculate the standard deviation of the points along the axis perpendicular to the major axis (thus derive the length (radius) of the minor axis)
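The three steps can be sketched in Python using the variances and covariance of the coordinates. This is not the formula set printed in Lee and Wong or Levine; it is an equivalent way of finding the direction of maximum variance. Published formulas and software differ in scaling factors and in how the angle is reported, so the numbers may not match a particular package. Here the angle is measured counterclockwise from the X axis, and the divisor N is an assumption.

```python
import math

def standard_deviational_ellipse(points):
    """Return (center, angle in degrees, major spread, minor spread)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Step 1: axis of maximum dispersion (counterclockwise from the X axis).
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Steps 2 and 3: standard deviation along that axis and along the
    # perpendicular axis.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    major = math.sqrt(half_trace + spread)
    minor = math.sqrt(max(half_trace - spread, 0.0))
    return (cx, cy), math.degrees(angle), major, minor

points = [(4, 7), (7, 7), (7, 3), (2, 3), (6, 2)]
center, angle_deg, major, minor = standard_deviational_ellipse(points)
print(center, round(angle_deg, 1), round(major, 2), round(minor, 2))
```

With this scaling, the squared major and minor spreads add up to the squared standard distance, which is a useful consistency check against the standard distance deviation.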

  31. Mean Center & Standard Deviational Ellipse: example
     There appears to be no major difference between the locations of the software and the telecommunications industries in North Texas.

  32. Implementation in ArcGIS
     • In ArcToolbox, tools are available for: median center for a set of points, centroid for a set of points, standard distance, and standard deviation ellipse
     • To calculate the centroid for a set of polygons with ArcGIS: ArcToolbox>Data Management Tools>Features>Feature to Point (requires ArcInfo)
     • To calculate using GeoDA: Tools>Shape>Polygons to Centroids

  33. Density Kernel Estimation
     • Commonly used to “visually enhance” a point pattern
     • Is an example of “exploratory spatial data analysis” (ESDA)
     Figure panels: Kernel=10,000 and Kernel=5,000

  34. Density Kernel Estimation: the two options
     • SIMPLE Kernel option (see example above)
       -- A “neighborhood” or kernel is defined around each grid cell, consisting of all grid cells with centers within the specified kernel (search) radius
       -- The number of points that fall within that neighborhood is totaled
       -- The point total is divided by the area of the neighborhood to give the grid cell’s value
     • Density KERNEL option
       -- A smoothly curved surface is fitted over each point
       -- The surface value is highest at the location of the point, and diminishes with increasing distance from the point, reaching zero at the kernel distance from the point
       -- Volume under the surface equals 1 (or the population value if a population variable is used)
       -- Uses the quadratic kernel function described in Silverman (1986, p. 76, equation 4.5)
       -- The density at each output grid cell is calculated by adding the values of all the kernel surfaces where they overlay the grid cell center
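Both options can be sketched for a single output cell in Python. The sketch makes two assumptions. First, the simple option's neighborhood area is taken as the area of a circle with the search radius. Second, the quadratic kernel has the form K(u) = (3/π)(1 - u²)² for u < 1, which is how Silverman's equation 4.5 is usually quoted, rescaled by the search radius. Function and parameter names are illustrative and are not ArcGIS identifiers.

```python
import math

def simple_density(cell_center, points, radius):
    """SIMPLE option: points within the radius divided by the circle area."""
    cx, cy = cell_center
    count = sum(1 for x, y in points if math.hypot(x - cx, y - cy) <= radius)
    return count / (math.pi * radius ** 2)

def quadratic_kernel_density(cell_center, points, radius, populations=None):
    """KERNEL option: sum of the quadratic kernel surfaces at the cell center."""
    if populations is None:
        populations = [1.0] * len(points)
    cx, cy = cell_center
    density = 0.0
    for (x, y), pop in zip(points, populations):
        u2 = ((x - cx) ** 2 + (y - cy) ** 2) / radius ** 2
        if u2 < 1.0:
            # Highest at the point, falling to zero at the search radius;
            # the volume under each surface equals pop.
            density += pop * 3.0 / (math.pi * radius ** 2) * (1.0 - u2) ** 2
    return density

# Hypothetical points and one grid cell center.
points = [(0.0, 0.0), (3.0, 0.0)]
cell = (1.0, 0.0)
print(simple_density(cell, points, radius=2.0))
print(quadratic_kernel_density(cell, points, radius=2.0))
```

To build the full raster, the same function would be evaluated at the center of every output cell. The simple option treats every point inside the radius equally, while the kernel option gives nearer points more influence.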

  35. Implementation in ArcGIS
     • If you specify a “population field”, the software calculates as if there were that number of points at that location
     • The search radius: the size of the neighborhood or kernel which is successively defined around every cell (simple kernel) or each point (density kernel)
     • Output cell size: the size of each raster cell
     • Search radius and output cell size are based on the measurement units of the data (here it is feet)
     • It is good to “round” them (e.g. to 10,000 and 1,000)

  36. What have we learned today?
     • We have learned about descriptive spatial statistics, often called Centrographic Statistics
     • Next time, we will learn about Inferential Spatial Statistics

  37. Project for you
     • The China data on my web site has population data for the provinces of China in 2008
     • Obtain population counts for 2000, 1990 and/or any other year
     • Calculate the weighted mean center of China’s population for each year
     • Be sure to use the same set of geographic units each time
     • For example, if you do not have data for Taiwan or Hong Kong for one year, omit these geographic units for all years

  38. Texts
     • O’Sullivan, David and David Unwin, 2010. Geographic Information Analysis. Hoboken, NJ: John Wiley, 2nd ed.
     Other Useful Books:
     • Mitchell, Andy 2005. ESRI Guide to GIS Analysis Volume 2: Spatial Measurement & Statistics. Redlands, CA: ESRI Press.
     • Allen, David W 2009. GIS Tutorial II: Spatial Analysis Workbook. Redlands, CA: ESRI Press.
     • Wong, David W.S. and Jay Lee 2005. Statistical Analysis of Geographic Information. Hoboken, NJ: John Wiley, 2nd ed.
     • Ned Levine and Associates, 2004 with later updates. CrimeStat III Manual. Washington, D.C.: National Institute of Justice. http://www.icpsr.umich.edu/CrimeStat/
     Density Kernel Estimation:
     • Silverman, B.W. 1986. Density Estimation for Statistics and Data Analysis. New York: Chapman and Hall.
