Geography 625 Intermediate Geographic Information Science Week2: The pitfalls and potential of spatial data Instructor: Changshan Wu Department of Geography The University of Wisconsin-Milwaukee Fall 2006
Outline • Introduction • The bad news: the pitfalls of spatial data • The good news: the potential of spatial data
1. Introduction Why do spatial data require spatial analytic techniques distinct from the standard statistical analysis that might be applied to any ordinary data? Is there anything special about spatial data? (Figure: number of cases of Lyme disease; Huxhold and Martin)
1. Introduction Bad news: many of the standard techniques and methods documented in standard statistics textbooks have significant problems when we try to apply them to the analysis of spatial distributions. Good news: geospatial referencing provides us with a number of new ways of looking at data and the relations among them (e.g. distance, adjacency, interaction, and neighborhood).
2. Pitfalls of Spatial Data Spatial data always violate the fundamental requirement of conventional statistical analysis • Spatial autocorrelation • Modifiable areal unit problem • Ecological fallacy • Scale • Nonuniformity of space • Edge effects
2. Pitfalls of Spatial Data - Spatial autocorrelation Data from locations near one another in space are more likely to be similar than data from locations remote from one another. Examples: • Housing market • Elevation change • Temperature (Figure: African American population concentration)
2. Pitfalls of Spatial Data - Spatial autocorrelation • The nonrandom distribution of phenomena in space has various consequences for conventional statistical analysis. • Biased parameter estimates • Data redundancy (affecting the calculation of confidence intervals)
2. Pitfalls of Spatial Data - Spatial autocorrelation Three general possibilities. Positive autocorrelation: observations from nearby locations are likely to be similar to one another. Negative autocorrelation: observations from nearby locations are likely to be different from one another. Zero autocorrelation: no spatial effect is discernible, and observations seem to vary randomly through space.
2. Pitfalls of Spatial Data - Spatial autocorrelation (Figure: example patterns of positive, negative, and zero (random) autocorrelation)
2. Pitfalls of Spatial Data - Spatial autocorrelation • Spatial autocorrelation diagnostic measures • Joins count statistics • Moran’s I • Geary’s C • Variogram cloud
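As an illustration of one of these measures, the following is a minimal sketch of Moran's I in its usual form, I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The six-cell strip, the binary weights, and the helper names morans_i and chain_weights are assumptions made for this example and do not come from the slides.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and an n x n spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)          # S0: sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))    # weighted cross-products
    den = sum(d * d for d in dev)                     # total squared deviation
    return (n / w_sum) * (num / den)


def chain_weights(n):
    """Binary weights for n cells in a row: 1 if two cells are next to each other."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(6)
clustered = [1, 1, 1, 0, 0, 0]      # similar values sit side by side
alternating = [1, 0, 1, 0, 1, 0]    # every neighbor differs

i_clustered = morans_i(clustered, w)      # about 0.6 (positive autocorrelation)
i_alternating = morans_i(alternating, w)  # about -1.0 (negative autocorrelation)
print(i_clustered, i_alternating)
```

Values near zero would indicate no discernible spatial pattern. A real analysis would also need a significance test, which this sketch leaves out.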
2. Pitfalls of Spatial Data - Spatial autocorrelation Spatial autocorrelation structure: spatial variation across a study area • First order spatial variation: occurs when observations across a study region vary from place to place due to changes in the underlying properties of the local “environment”. • Second order spatial variation: due to local interaction effects between observations.
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Many geographic data are aggregates of data at a more detailed level • National census: collected at the household level but reported for practical and privacy reasons at various levels of aggregation (block, block group, tract, county, state, etc.) • Traffic Analysis Zone (TAZ) • School district
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Modifiable Areal Unit Problem: the aggregation units used are arbitrary with respect to the phenomena under investigation, yet the aggregation units used will affect statistics determined on the basis of data reported in this way. If the spatial units in a particular study were specified differently, we might observe very different patterns and relationships.
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Openshaw and Taylor (1979) showed that with the same underlying data it is possible to aggregate units together in ways that can produce correlations anywhere between -1.0 and +1.0.
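The effect can be reproduced on a toy example. The sketch below uses a hypothetical 3 x 3 grid of cells with two made-up variables, x = row + column and y = row - column, and compares the correlation at the cell level with the correlation after aggregating the same cells into row zones or column zones. The data and the helper names (pearson, mean) are assumptions for illustration only; this is not Openshaw and Taylor's data or procedure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def mean(values):
    return sum(values) / len(values)


size = 3
# Hypothetical cell-level data on a 3 x 3 grid.
x = [[r + c for c in range(size)] for r in range(size)]
y = [[r - c for c in range(size)] for r in range(size)]

# Correlation using the nine individual cells.
r_cells = pearson([v for row in x for v in row], [v for row in y for v in row])

# Zoning 1: aggregate each row of cells into one zone (zone means).
r_rows = pearson([mean(row) for row in x], [mean(row) for row in y])

# Zoning 2: aggregate each column of cells into one zone (zone means).
col_x = [mean([x[r][c] for r in range(size)]) for c in range(size)]
col_y = [mean([y[r][c] for r in range(size)]) for c in range(size)]
r_cols = pearson(col_x, col_y)

print(r_cells, r_rows, r_cols)  # roughly 0, +1 and -1
```

The same nine cells give no correlation, a perfect positive correlation, or a perfect negative correlation depending only on how they are grouped into zones.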
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Two issues. Scale issue: involves the aggregation of smaller units into larger ones. Generally speaking, the larger the spatial units, the stronger the relationship among variables, because aggregation smooths the data.
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Modifiable area: units are arbitrarily defined, and a different arrangement of the units may produce different analytical results.
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem Potential problems arise in almost every field that uses spatial data, e.g. the boundaries of electoral districts. In the 2000 U.S. presidential election, Al Gore won more of the popular vote than George W. Bush but failed to become president. A different aggregation of U.S. counties into states could have produced a different outcome (switching just one northern Florida county to Georgia or Alabama would have been enough).
2. Pitfalls of Spatial Data - Modifiable Areal Unit Problem What are the reasons for this problem? • Problems of data? • Problems of spatial units? What are the solutions for this problem? • Using the most disaggregated data • Producing an optimal zoning system • Others?
2. Pitfalls of Spatial Data - Ecological Fallacy The Ecological Fallacy is a situation that can occur when a researcher or analyst makes an inference about an individual based on aggregate data for a group. (Reference: http://jratcliffe.net/research/ecolfallacy.htm)
2. Pitfalls of Spatial Data - Ecological Fallacy Example: we might observe a strong relationship between income and crime at the county level, with lower-income areas being associated with higher crime rates. • Possible conclusions: • Lower-income persons are more likely to commit crime • Lower-income areas are associated with higher crime rates • Lower-income counties tend to experience higher crime rates
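To see why the first conclusion does not follow from county-level data, consider a small fabricated data set. In the sketch below, three hypothetical counties each contain three individuals with an (income, offenses) pair. All numbers are invented for illustration, and the helper pearson is defined here, not taken from any source.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical individuals per county: (income, number of offenses).
counties = {
    "A": [(50, 1), (60, 2), (70, 3)],
    "B": [(30, 3), (40, 4), (50, 5)],
    "C": [(10, 5), (20, 6), (30, 7)],
}

# County-level (aggregate) relationship: correlate county means.
mean_income = [sum(i for i, _ in people) / len(people) for people in counties.values()]
mean_crime = [sum(c for _, c in people) / len(people) for people in counties.values()]
county_r = pearson(mean_income, mean_crime)

# Individual-level relationship inside each county.
within_r = {
    name: pearson([i for i, _ in people], [c for _, c in people])
    for name, people in counties.items()
}

print(county_r)   # about -1: poorer counties have more crime
print(within_r)   # about +1 in every county: the opposite pattern among individuals
```

In this made-up case the county-level pattern is real, yet within every county the individual-level relationship runs the other way, so a statement about individuals cannot be read off the aggregate figures.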
2. Pitfalls of Spatial Data - Ecological Fallacy Issues: • Is identifying associations between aggregate figures itself defective? • Are inferences drawn about associations between the characteristics of an aggregate population and the characteristics of sub-units within the population wrong? What should we do? Be aware that the process of aggregating or disaggregating data may conceal variations that are not visible at the larger aggregate level.
2. Pitfalls of Spatial Data - Ecological Fallacy Relationship between ecological fallacy and modifiable areal unit problem?
2. Pitfalls of Spatial Data - Scale The geographical scale at which we examine a phenomenon can affect the observations we make and must always be considered prior to spatial analysis • Problems of data representation • Is there an optimal scale?
2. Pitfalls of Spatial Data - Nonuniformity of Space Nonuniformity: space is not uniform. Example: is this an area with high crime rates? (Figure: crime locations)
2. Pitfalls of Spatial Data - Edge Effects Edge effects arise where an artificial boundary is imposed on a study, often just to keep it manageable. Spatial interpolation
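One simple way to see an edge effect is to count how many neighbors each location has inside the study area. The sketch below assumes a hypothetical 4 x 4 grid of cells and counts edge-sharing neighbors. The grid size and the function name neighbor_count are illustrative choices only.

```python
def neighbor_count(row, col, n_rows, n_cols):
    """Number of edge-sharing neighbors of a cell that fall inside the study grid."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sum(0 <= r < n_rows and 0 <= c < n_cols for r, c in candidates)


counts = [[neighbor_count(r, c, 4, 4) for c in range(4)] for r in range(4)]
for row in counts:
    print(row)
# [2, 3, 3, 2]
# [3, 4, 4, 3]
# [3, 4, 4, 3]
# [2, 3, 3, 2]
```

Cells on the boundary have fewer observed neighbors than interior cells, so any statistic or interpolation that relies on neighbors is based on less information near the edge.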
3. Potential of Spatial Data - Introduction • Potential insight provided by examination of the locational attributes of data • Distance • Adjacency • Interaction • Neighborhood
3. Potential of Spatial Data - Distance Distance between the spatial entities of interest can be calculated with spatial data Euclidean distance Network distance Others (e.g. travel time)
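The difference between Euclidean and network distance can be sketched with a tiny example. The three locations, the road lengths, and the function names below are hypothetical. Network distance is computed here with Dijkstra's shortest-path algorithm, which is one common choice and not something the slides prescribe.

```python
import heapq
import math


def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def network_distance(graph, source, target):
    """Shortest-path length between two nodes (Dijkstra's algorithm)."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for neighbor, length in graph[node].items():
            candidate = dist + length
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf  # target not reachable


# Hypothetical locations and roads: A-B and B-C exist, there is no direct A-C road.
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4)}
roads = {"A": {"B": 3.0}, "B": {"A": 3.0, "C": 4.0}, "C": {"B": 4.0}}

straight = euclidean(coords["A"], coords["C"])  # 5.0
by_road = network_distance(roads, "A", "C")     # 7.0
print(straight, by_road)
```

Travel time could be handled the same way by storing minutes in place of lengths on each road segment.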
3. Potential of Spatial Data - Adjacency Adjacency can be thought of as the nominal, or binary, equivalent of distance. Two spatial entities are either adjacent or not. Adjacency can be defined in different ways. Example 1: two entities are adjacent if they share a common boundary (e.g. Illinois and Wisconsin). Example 2: two entities are adjacent if they are within a specified distance of each other.
3. Potential of Spatial Data - Interaction Interaction may be considered as a combination of distance and adjacency and rests on the intuitively obvious idea that nearer things are “more related” than distant things, a notion often referred to as the first law of geography.
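One common way to put a number on this idea, offered here as an assumption and not as the definition used in the course, is an inverse-distance weight w = 1 / d^k, where the exponent k controls how quickly interaction falls off with distance. The function name interaction_weight is hypothetical.

```python
def interaction_weight(distance, k=1.0):
    """Inverse-distance interaction weight 1 / d**k (assumed functional form)."""
    if distance <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / distance ** k


for d in (1, 2, 4, 8):
    # Nearer pairs get larger weights; a larger k makes the decay steeper.
    print(d, interaction_weight(d), interaction_weight(d, k=2))
```

Other decay functions, or weights that also account for the size of the entities, are equally possible.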
3. Potential of Spatial Data - Neighborhood Different definitions. Example 1: the neighborhood of a particular spatial entity is the set of all other entities adjacent to it. Example 2: the neighborhood is a region of space associated with that entity and defined by distance from it.
3. Potential of Spatial Data - Neighborhood (Figure: panels illustrating adjacency, distance, neighborhood, and interaction)
3. Potential of Spatial Data - Matrix representation Distance matrix (Figure: 6 × 6 matrix of pairwise distances, with rows and columns labeled A to F)
3. Potential of Spatial Data - Matrix representation Adjacency matrix: two entities are adjacent if d <= 50 (Figure: 6 × 6 binary matrix, with rows and columns labeled A to F)
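The matrix representation can be sketched as follows. The coordinates for entities A to F are hypothetical, since the slide figures are not available. The distance matrix is built from Euclidean distances, the adjacency matrix applies the d <= 50 rule, and a neighborhood is read off as the set of adjacent entities.

```python
import math

# Hypothetical coordinates for the six entities.
points = {
    "A": (0, 0), "B": (30, 30), "C": (60, 60),
    "D": (100, 0), "E": (100, 40), "F": (200, 200),
}
names = sorted(points)


def distance(a, b):
    """Euclidean distance between two named entities."""
    (x1, y1), (x2, y2) = points[a], points[b]
    return math.hypot(x2 - x1, y2 - y1)


# Distance matrix: entry [i][j] is the distance between entity i and entity j.
dist = [[distance(a, b) for b in names] for a in names]

# Adjacency matrix: 1 if two different entities are within 50 units, else 0.
threshold = 50
adj = [[1 if a != b and distance(a, b) <= threshold else 0 for b in names]
       for a in names]

# Neighborhood of each entity: the set of entities adjacent to it.
neighbors = {a: [b for j, b in enumerate(names) if adj[i][j]]
             for i, a in enumerate(names)}

for name, row in zip(names, adj):
    print(name, row)
print(neighbors)
```

With these coordinates, B is adjacent to A and C, while F is adjacent to nothing. Both matrices are symmetric, and the diagonal of the adjacency matrix is set to 0 by convention here.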
3. Potential of Spatial Data - Proximity Polygons The proximity polygon of any entity is the region of space that is closer to that entity than to any other. Applications: service area delineation (e.g. schools, hospitals, supermarkets, etc.)
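Service area delineation follows directly from the definition: a location lies in the proximity polygon of whichever entity is nearest to it. The sketch below assigns hypothetical home locations to the nearest of three hypothetical schools. It does not construct the polygon boundaries themselves.

```python
import math


def nearest_entity(location, entities):
    """Name of the entity whose proximity polygon contains the location."""
    return min(
        entities,
        key=lambda name: math.hypot(entities[name][0] - location[0],
                                    entities[name][1] - location[1]),
    )


# Hypothetical school locations and home locations.
schools = {"North": (2, 8), "South": (2, 1), "East": (9, 4)}
homes = [(1, 7), (3, 2), (8, 5), (5, 4)]

service_area = {home: nearest_entity(home, schools) for home in homes}
print(service_area)
# {(1, 7): 'North', (3, 2): 'South', (8, 5): 'East', (5, 4): 'East'}
```

Straight-line distance is assumed here. Using network distance or travel time would generally give differently shaped service areas.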
3. Potential of Spatial Data - Proximity Polygons • Delaunay triangulation • Potential applications: • TIN model • Others
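A Delaunay triangulation can be characterized by the empty-circumcircle property: three points form a triangle if no other point lies inside the circle through them. The brute-force sketch below applies that test to every triple of a small hypothetical point set. It is meant only to illustrate the property: it is far too slow for real data, and production libraries use much faster algorithms.

```python
from itertools import combinations


def orientation(a, b, c):
    """Twice the signed area of triangle abc (positive if counterclockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a, b, c, p):
    """True if p lies strictly inside the circle through a, b and c."""
    ax, ay = a[0] - p[0], a[1] - p[1]
    bx, by = b[0] - p[0], b[1] - p[1]
    cx, cy = c[0] - p[0], c[1] - p[1]
    det = ((ax * ax + ay * ay) * (bx * cy - cx * by)
           - (bx * bx + by * by) * (ax * cy - cx * ay)
           + (cx * cx + cy * cy) * (ax * by - bx * ay))
    # The sign of det depends on the orientation of abc, so correct for it.
    return det * orientation(a, b, c) > 0


def delaunay_triangles(points):
    """All triangles whose circumcircle contains no other point (brute force)."""
    triangles = []
    for a, b, c in combinations(points, 3):
        if orientation(a, b, c) == 0:
            continue  # collinear points do not form a triangle
        if not any(in_circumcircle(a, b, c, p)
                   for p in points if p not in (a, b, c)):
            triangles.append((a, b, c))
    return triangles


# Hypothetical sample points: four corners of a square plus its center.
sample = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
triangles = delaunay_triangles(sample)
for triangle in triangles:
    print(triangle)
```

For this point set the result is four triangles, each joining the center to one side of the square. In a TIN model each point would also carry an elevation, and the triangles would serve as the facets of the terrain surface.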