GGIT 538 Spatial Data Analysis
Instructor: Dr. H. Şebnem Düzgün
Room: K4-123
duzgun@metu.edu.tr
Basic Aim of the Course
To introduce certain spatial statistical concepts and their use in GIS so that students can apply them in their studies at GGIT.
OUTLINE
• Introduction
  • 1.1. Introduction
    • 1.1.1. Scope of spatial statistics
  • 1.2. Spatial versus non-spatial data analysis
    • 1.2.1. Relationship between classes of spatial entities
    • 1.2.2. Facts on attributes of spatial entities
  • 1.3. Types of spatial phenomena and relationships
  • 1.4. Problem types in spatial data analysis
    • 1.4.1. Problems of spatially discrete point data
    • 1.4.2. Problems of spatially continuous point data
    • 1.4.3. Problems of area data
    • 1.4.4. Problems of spatial interaction data
Introduction to Spatial Data Analysis
CHAPTER I
• Introduction
• Spatial statistics deals with ways of analyzing all varieties of data in a spatial context. Some examples of the kinds of problems it addresses are:
  • Seismologists collect data on the regional distribution of earthquakes. Does this distribution show any pattern or predictability over space?
  • Public health specialists collect data on the occurrence of diseases. Does the distribution of cases of a disease form a pattern in space? Is there some association with possible sources of environmental pollution?
  • Police wish to investigate whether there is any spatial pattern in the distribution of certain crime locations. Does the crime rate in particular areas correlate with the socio-economic characteristics of those areas?
  • Geologists wish to estimate the extent of a mineral deposit over a particular region, given data on borehole samples taken from locations scattered across the area. How can we make sensible estimates?
  • A groundwater hydrologist collects data on the concentration of a toxic chemical in samples taken from a series of wells. Can we use these samples to construct a regional map of likely contamination?
  • Retailers wish to use socio-economic data, available for small areas from the population census, to assess the likely demand for their products if they open or expand an outlet. How should such areas be classified?
  • The same retailers collect information on the movements of shoppers from residential zones to stores. Can we build models of such flows? Can we predict changes in these flows if an outlet is expanded or a new one is opened?
The subject of spatial data analysis is relevant to specialists in many different fields, such as geographers, statisticians, economists, sociologists, epidemiologists, planners, biologists, environmental scientists, earth scientists and engineers.
Scope of Spatial Statistics
• General concepts in spatial analysis: spatial versus non-spatial data analysis, problem types, kinds of spatial phenomena and relationships. (Chapter 1)
• Review of basic statistics: random variables, expectations, probability distributions, maximum likelihood estimation, stationarity and anisotropy. (Chapter 2)
• General concepts in spatial data analysis: visualizing, exploring and modeling spatial data. (Chapter 3)
• Point pattern analysis: visualizing, exploring and modeling point patterns. (Chapters 4 & 5)
• Spatially continuous data analysis: visualizing, exploring and modeling spatially continuous data. (Chapter 6)
• Analysis of area data: visualizing, exploring and modeling area data. (Chapter 7)
1.2. Spatial Versus Non-spatial Data Analysis
• Spatial data analysis deals with situations where observational data are available on some process operating in space, and methods are sought to describe or explain the behavior of this process and its possible relationship to other spatial phenomena.
• The main purposes of the analysis are:
  • To increase our basic understanding of the process
  • To assess the evidence in favor of various hypotheses concerning it
  • To predict values in areas where observations have not been made
Spatial data analysis is involved when the data are spatially located and explicit consideration is given to the possible importance of their spatial arrangement, either in the analysis or in the interpretation of the results.
E.g. Consider the relationship between the number of plant species and the geographical area of a set of small islands. Empirical evidence suggests that the logarithm of the number of species is related to the logarithm of the island's area. The usual explanation is that, as area increases, there is a greater chance that a range of different habitats is available.
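The species-area relationship above is commonly written as a power law, S = cA^z, which becomes a straight line on the log scale: log S = log c + z log A. A minimal sketch of fitting that line by least squares follows. The island areas and species counts are hypothetical, generated from c = 10 and z = 0.25 purely for illustration. Note that the islands' locations never enter the calculation, which is why this analysis is not spatial.

```python
import math

def fit_log_log(areas, species):
    """Least-squares fit of log(S) = a + b * log(A).

    Only attribute values are used; island locations play no role,
    so this is an ordinary (non-spatial) simple regression.
    """
    xs = [math.log(a) for a in areas]
    ys = [math.log(s) for s in species]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx              # estimate of the exponent z
    a = y_bar - b * x_bar      # estimate of log(c)
    return a, b

# Hypothetical islands generated from S = 10 * A**0.25 (illustrative only).
areas = [1.0, 16.0, 81.0, 256.0, 625.0]
species = [10 * area ** 0.25 for area in areas]
a, b = fit_log_log(areas, species)
print(f"log c = {a:.3f}, z = {b:.3f}")
```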
At this stage the analysis is not spatial: the fact that one of the variables involved (area) is geographical does not by itself make the analysis a spatial one. However, if we ask whether the isolation of an island, measured by its distance from other islands or from a continental area, is an important factor, then this hypothesis must be handled in the context of spatial data analysis.
If the basic concern is to analyze spatial interaction, we try to determine whether there is an association between a set of points and a set of lines, or between a set of points and a set of areas.
E.g.
• Testing for an association between the occurrence of mineral deposits (point data) and configurations of geological lineaments (line data).
• Testing the hypothesis that there is a link between childhood leukemia (point data) and proximity to high-voltage power lines (line data).
• Testing for a relationship between the locations of a set of plants (point data) and soil type (areal units).
• Testing for a relationship between the incidence of Alzheimer’s disease (point data) and the presence of aluminum in water sampled in a set of water supply zones (areal units).
E.g. Suppose we want to model spatial variation in precipitation in California using a set of 30 monitoring stations distributed across the state.
Figure 1.1. Locations of rainfall measurement sites in California
For each station we have recordings of:
• Average annual precipitation (Y)
• Altitude (X1)
• Latitude (X2)
• Distance from the coast (X3)
A standard multiple linear regression model is fitted to the data, and all three independent variables turn out to be significant predictors of rainfall, together explaining 60% of the variation. (Non-spatial data analysis)
Next, the residuals (the differences between the observed precipitation at each station and the value predicted by the regression model) are mapped to see whether any spatial pattern exists. The map reveals a cluster of negative residuals on the leeward side of the mountains; in other words, the model over-predicts precipitation at these locations. This leads the researcher to introduce a new variable that takes the value 1 if the station lies in the lee of the mountains and 0 otherwise. With this variable added to the regression model, the explained variation rises to 74%. (Spatial data analysis)
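The two-step workflow above (fit a non-spatial regression, inspect the residuals for spatial pattern, then add a lee-side indicator) can be sketched in plain Python as follows. This is an illustration under stated assumptions: the 30 stations and the formula generating their rainfall are hypothetical (not the California data), the regression is solved through the normal equations rather than a statistics package, and "mapping" the residuals is reduced to averaging them over lee and non-lee stations.

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Each row of X must already contain a leading 1.0 for the intercept.
    Returns (coefficients, residuals, r_squared).
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = A[i][k] - sum(A[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = s / A[i][i]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum(r ** 2 for r in resid)
    return beta, resid, 1.0 - ss_res / ss_tot

# Hypothetical data for 30 stations (illustrative, not real measurements).
rng = random.Random(42)
stations = []
for i in range(30):
    altitude = rng.uniform(0, 2000)      # X1, metres
    latitude = rng.uniform(32.5, 42.0)   # X2, degrees north
    coast = rng.uniform(0, 300)          # X3, km from the coast
    lee = 1 if i % 3 == 0 else 0         # 1 = in the lee of the mountains
    rain = (800 + 0.3 * altitude + 40 * (latitude - 32.5) - 1.0 * coast
            - 400 * lee + rng.uniform(-20, 20))
    stations.append((altitude, latitude, coast, lee, rain))

y = [s[4] for s in stations]
X_base = [[1.0, s[0], s[1], s[2]] for s in stations]
X_lee = [row + [float(s[3])] for row, s in zip(X_base, stations)]

_, resid_base, r2_base = ols(X_base, y)   # non-spatial model
_, _, r2_lee = ols(X_lee, y)              # model with the lee dummy

# Stand-in for mapping: average base-model residual at lee stations.
lee_resid = [r for r, s in zip(resid_base, stations) if s[3] == 1]
mean_lee_resid = sum(lee_resid) / len(lee_resid)
print(f"R^2 without dummy: {r2_base:.2f}, with dummy: {r2_lee:.2f}")
print(f"mean residual at lee stations (base model): {mean_lee_resid:.1f}")
```

Because the model with the dummy contains the original one, its R² cannot be lower; the useful step is that the residual pattern points to which variable is missing.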
Relationship between Classes of Spatial Entities
Sometimes it is necessary to transform one class of objects into another:
• Point to area transformation: use of Thiessen polygons
• Area to point transformation: use of centroids
This notion of "new objects for old" relates to the subject of relations between entities. These relations can be of many types, for example:
• If the basic concern is to analyze the spatial arrangement of points, this involves measuring distances between points; distance is a spatial relation.
  E.g. Comparing the distribution of a set of disease cases with that of a set of healthy controls involves distance measurements.
• If the basic concern is to analyze areal data, simple information about spatial adjacency may be of interest. Usually spatial proximity is linked to attribute information: in many cases we ask whether areas close to each other on the ground have similar values on one or more attributes.
  E.g. Do neighboring health districts tend to have similar mortality rates? Do adjacent pixels in a remotely sensed image tend to have similar electromagnetic reflectance?
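Both transformations mentioned above can be sketched briefly. The centroid of a polygon follows from the standard shoelace-based formula. A location belongs to a site's Thiessen polygon exactly when that site is its nearest one, so assigning locations to Thiessen polygons reduces to a nearest-site search using distance, the spatial relation discussed above. The function names and coordinates below are hypothetical; constructing the polygon boundaries themselves (a Voronoi construction) would normally be left to a GIS or a computational geometry library.

```python
import math

def polygon_centroid(vertices):
    """Area to point: centroid of a simple polygon.

    Vertices are listed in order around the boundary, without repeating
    the first vertex at the end.
    """
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0      # shoelace term
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5                            # signed polygon area
    return cx / (6 * a), cy / (6 * a)

def thiessen_owner(location, sites):
    """Point to area: index of the site whose Thiessen polygon contains
    `location`, i.e. the nearest site (ties go to the lowest index)."""
    return min(range(len(sites)), key=lambda i: math.dist(location, sites[i]))

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(polygon_centroid(square))           # (2.0, 2.0)
sites = [(0, 0), (10, 0), (5, 8)]
print(thiessen_owner((7, 1), sites))      # 1
```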
Facts on Attributes of Spatial Entities
If attributes are analyzed alone, ignoring the spatial relationships between sample locations, we cannot claim to be doing spatial data analysis. Spatial data analysis requires, at a minimum, information on location, and usually information on both location and attributes.
• If we want to study the spatial arrangement or pattern of entities, this is essentially a geometric question, and data on the locations of the entities alone are sufficient.
• If we want to compare the arrangements of different types of entities, or to study spatial pattern in measurements taken at locations, we need to use both attribute and location information.
1.3. Types of Spatial Phenomena and Relationships
Different types of spatial phenomena and spatial relationships may be involved in spatial data analysis. These are based on two views of space:
• Entity view of space
• Field view of space
* Entity view: Space is considered as being filled with “objects”. The spatial phenomena being analyzed are usually conceptualized as points, lines or areas.
• Points: plants, people, shops, soil pits, the epicenters of earthquakes, etc.
• Lines: roads, streams, fault lines, etc.
• Areas: countries, voting areas, health regions, land covers, etc.
Note that representing objects in space as points, lines or areas is always scale dependent: a city may appear as a point on a national map but as an area on a city plan.
* Field view: Space is considered as being covered with "surfaces". In this view the emphasis is on the continuity of spatial phenomena. Phenomena in the natural environment, such as temperature, relief, atmospheric pressure, and soil or rock characteristics, can be observed and measured anywhere on the earth's surface. In practice, however, such variables are "discretised": they are sampled at a set of discrete locations, and the continuously varying field is represented through those samples.
Relation between kinds of spatial phenomena and problem types:
• Entity view: point pattern and area data
• Field view: spatially continuous data
In the entity view, spatial objects have features or attributes attached to them; in the field view, by contrast, an attribute is associated with the field itself and varies continuously over space. Such attributes are measured on one of the classic measurement scales:
• Nominal
• Ordinal
• Interval / ratio
| Entity type | Entity | Nominal | Ordinal | Interval / ratio |
|---|---|---|---|---|
| Point | Tree | Tree species | Short, medium, long | Age |
| Line | Stream | Clean or polluted | 1st, 2nd or higher order | Discharge |
| Area | Land | Land-use class | High, medium, low quality | Pollution density |
Table 1.1. Attributes of spatial entities according to measurement scale
1.4. Problem Types in Spatial Data Analysis
There are basically four classes of problems encountered in spatial data analysis:
1. Problems of spatially discrete point data
2. Problems of spatially continuous point data
3. Problems of area data
4. Problems of spatial interaction data
1. Problems of spatially discrete point data: This type of problem deals with data for a set of point events, or a point pattern. The points sometimes carry simple attributes that distinguish one kind of event from another. The main concern of such an analysis is the pattern of the event locations.
E.g.
• The locations of craters in a volcanic field
• The locations of a certain tree species in a forest
• The locations of the centers of biological cells in a section of tissue
• The locations of a certain type of crime in a neighborhood
• The locations of cases of a certain disease in an area
• The locations of a certain type of cancer in part of a country
Figure 1.2. Locations of cases of Legionnaires' disease in Glasgow
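One classic way to summarize the pattern of event locations is the Clark-Evans nearest-neighbor ratio, which compares the mean distance from each event to its nearest neighbor with the value expected under complete spatial randomness. The sketch below is an illustration rather than a method prescribed by these notes: it ignores edge effects and significance testing, and the event coordinates are hypothetical.

```python
import math

def clark_evans_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the value
    expected under complete spatial randomness, 0.5 / sqrt(n / area).

    R < 1 suggests clustering, R near 1 randomness, R > 1 regularity.
    No edge correction is applied.
    """
    n = len(points)
    nn = [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
          for i, p in enumerate(points)]
    observed = sum(nn) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical events in a 5 x 5 study region.
grid = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]
clustered = ([(1 + 0.1 * k, 1 + 0.05 * k) for k in range(5)]
             + [(4 + 0.1 * k, 4 - 0.05 * k) for k in range(5)])
print(clark_evans_ratio(grid, 25.0))       # 2.0 (regular pattern)
print(clark_evans_ratio(clustered, 25.0))  # well below 1 (clustered pattern)
```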
2. Problems of spatially continuous point data: This class of problems arises where there is again a set of points, but the pattern of these locations is not itself the subject of analysis. Rather, one or more variables are measured at these sites, and the problem is to understand the process generating these values and possibly to use this information to make predictions at locations where no measurements have been made.
E.g.
• Rainfall measurements
• Temperature at weather stations
• Groundwater levels
• Radon gas levels
• Geochemical data
• Climate measures
• Ore grade
• Soil and rock properties
Figure 1.3. Rainfall maps in England and Wales (panels: locations of rainfall measurement sites; contoured precipitation levels (mm); prediction errors (mm) of precipitation)
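To make a prediction where there is no measurement, the sampled values must be interpolated. Geostatistical methods such as kriging are common tools for this; the sketch below uses the simpler inverse distance weighting (IDW) instead, only to show the idea of estimating a value at an unsampled location from nearby samples. The gauge coordinates, rainfall values and the power parameter of 2 are assumptions for illustration.

```python
import math

def idw(sample_xy, sample_values, target, power=2.0):
    """Inverse distance weighted estimate at `target` from sampled values."""
    num = den = 0.0
    for xy, v in zip(sample_xy, sample_values):
        d = math.dist(xy, target)
        if d == 0.0:
            return v              # target coincides with a sample site
        w = 1.0 / d ** power      # closer samples get larger weights
        num += w * v
        den += w
    return num / den

# Hypothetical rain gauges (km coordinates) and annual rainfall (mm).
gauges = [(0, 0), (10, 0), (0, 10), (10, 10)]
rain = [800.0, 1000.0, 1200.0, 1400.0]
print(round(idw(gauges, rain, (5, 5)), 6))   # 1100.0: equidistant, so the mean
print(idw(gauges, rain, (0, 0)))             # 800.0: exact at a gauge
```

Applying the same function over a regular grid of target points would give the kind of contoured surface shown in Figure 1.3, although with different smoothing behavior than a kriged map.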
3. Problems of area data: This class of problems concerns data that have been aggregated to a set of areal units, such as counties, districts or census zones. In this case one or more variables are measured over this set of zones. The problem is to understand the spatial arrangement of these values, to detect pattern, and to examine relationships among the variables.
E.g.
• Child mortality rate
• Socio-economic data
• Census data
• Voting data
• Prevalence of human blood groups
• Emissions of nitrogen and ammonia
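A common first question for area data is whether neighboring zones have similar values, that is, whether there is spatial autocorrelation. A minimal sketch of the global Moran's I statistic with binary adjacency weights follows. It is one standard choice rather than a method these notes prescribe, the four-zone strip and its values are hypothetical, and no significance test is included. Values well above the null expectation of -1/(n-1) suggest that similar values cluster among neighbors; values well below it suggest alternation.

```python
def morans_i(values, neighbours):
    """Global Moran's I with binary contiguity weights.

    neighbours[i] lists the zones adjacent to zone i (w_ij = 1); the
    adjacency is assumed symmetric.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)     # sum of all weights
    cross = sum(dev[i] * dev[j] for i in range(n) for j in neighbours[i])
    return (n / s0) * cross / sum(d * d for d in dev)

# Hypothetical strip of four zones: 0-1-2-3, each adjacent to the next.
chain = [[1], [0, 2], [1, 3], [2]]
print(morans_i([1, 2, 3, 4], chain))   # positive: smooth gradient
print(morans_i([1, 4, 1, 4], chain))   # negative: alternating values
```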
4. Problems of spatial interaction data: This class of problems examines data on flows that link a set of locations (areas or points). The basic aim is to understand the arrangement of flows, to build models of such flows and possibly to use these models to predict how the flows may change under certain scenarios.
E.g.
• Business trips made by air within a country
• Migration between the provinces of a country
• Patients from different districts treated at a hospital
• The relative attractiveness of different shopping centers as branch sites for a financial district
• The effect of opening a new swimming pool
• The impact of a new housing district on existing flows
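Flow data of this kind are often modeled with gravity-type spatial interaction models. The sketch below is a production-constrained version, in which each origin's total number of trips is fixed and shared among destinations in proportion to attractiveness divided by distance raised to a power beta. It can address scenario questions such as how existing flows shift when a new outlet opens. The zones, stores, attractiveness values and beta = 2 are hypothetical, and the code assumes no store sits exactly on a zone's point location (a zero distance would divide by zero).

```python
import math

def constrained_flows(origins, stores, beta=2.0):
    """Production-constrained gravity model.

    T_ij = O_i * A_j * d_ij**-beta / sum_k(A_k * d_ik**-beta)
    origins: (x, y, O_i), with O_i the total trips leaving zone i
    stores:  (x, y, A_j), with A_j an attractiveness measure (e.g. floor space)
    """
    flows = []
    for ox, oy, total in origins:
        pulls = [a / math.dist((ox, oy), (sx, sy)) ** beta
                 for sx, sy, a in stores]
        s = sum(pulls)
        flows.append([total * p / s for p in pulls])   # row sums to O_i
    return flows

zones = [(0, 0, 1000.0), (6, 0, 500.0)]      # hypothetical residential zones
stores = [(3, 4, 20.0), (6, 8, 40.0)]         # hypothetical existing stores
before = constrained_flows(zones, stores)
after = constrained_flows(zones, stores + [(2, 0, 30.0)])   # open a new store
for i, (b, a) in enumerate(zip(before, after)):
    print(f"zone {i}: before {[round(t) for t in b]}, after {[round(t) for t in a]}")
```

Because totals are fixed per origin, opening the new store can only divert trips away from existing stores; an unconstrained gravity model would instead add new trips without reducing any existing flow.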