500 likes | 700 Views
GIS in the Sciences ERTH 4750 (38031). Interfacing R with GIS. Xiaogang (Marshall) Ma School of Science Rensselaer Polytechnic Institute Tuesday, Apr 02, 2013. A review of our course outline. MapInfo. Week 1 (Jan. 22/25): Introduction to GIS
E N D
GIS in the Sciences ERTH 4750 (38031) Interfacing R with GIS Xiaogang (Marshall) Ma School of Science Rensselaer Polytechnic Institute Tuesday, Apr 02, 2013
A review of our course outline MapInfo • Week 1 (Jan. 22/25): Introduction to GIS • Week 2 (Jan. 29/Feb. 1): Geographic information and spatial data types • Week 3 (Feb. 5/8): Spatial referencing • Week 4 (Feb 12/15): Geostatistical computing • Week 5 (Feb. 19/22): Exploring and visualizing spatial data • Week 6 (Feb. 26/Mar. 1): Modeling spatial structure from point samples • Week 7 (Mar. 5/8): Spatial prediction from point samples (Part 1) • Week 8 (Mar. 12/15: no classes - spring break) • Week 9 (Mar. 19/22): Spatial prediction from point samples (Part 2) • Week 10 (Mar. 26/29): Assessing the quality of spatial predictions • Week 11 (Apr. 2/5): Interfacing R spatial statistics with GIS • Week 12 (Apr. 9/12: no class - Grand Marshal week) • Week 13 (Apr. 16/19): Efficient and effective result presentation with GIS • Week 14 (Apr. 23/26): Tuesday: Guest Lecture • Week 15 (Apr. 30): Tuesday: Short final project presentations R Mashup
Acknowledgements • This lecture is partly based on: • Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer, New York, NY, 374pp. • Rossiter, D.G., 2012. R and GIS. Exercise in distance course Applied Geostatistics. ITC, University of Twente
Contents • GIS and R: Overview • Classes for spatial data in R • Handling coordinate reference systems in R • Spatial data import and export in R • Examples
1 GIS and R: Overview • GIS: a computerized system that facilitates the phases of data entry, data analysis and data presentation especially in cases when we are dealing with georeferenced data. • Geographic phenomena: continuous field, discrete field, collection of objects • Computer representations: raster and vector • Spatial autocorrelation and topology • Spatial reference: definitions, physical/geometric constructs and tools required to describe the geometry and motion of objects near and on the Earth’s surface • Datum (vertical and horizontal): a surface that represents the shape of the earth • Map projection: map the curved surface of the Earth (as modeled by an ellipsoid) onto a flat plane
So, data analysis is an essential part of a GIS • Statistics is a primary way of data analysis • Descriptive and Inferential statistics • Geostatistics: statistics on a population with known location • R: an open-source environment for statistical computing and visualization • Exploring and visualizing spatial data: spatial structure, regional trends, local spatial dependence and anisotropy • Modeling spatial structure from point samples • Spatial prediction from point samples • Assessing the quality of spatial predictions
GIS and R • Users approaching R with experience of GIS will want to see ‘layers’, ‘vectors’, ‘rasters’, and ‘georeferences’, etc. • They desire that R accommodates familiar concepts and well-known data models and formats • For statistician users of R, ‘everything’ is a data.frame, a rectangular table with rows of observations on columns of variables • They want their geo-data analysis result can be visualized and presented in main-stream GIS software programs, and thus extend the usage of the result.
In R, classes have grown that look like GIS data models to GIS people, and look and behave like data frames from the point of view of applied statisticians. • We have used the R packages sp, gstatand lattice quite often in the past few weeks • sp: provides classes for importing, manipulating and exporting spatial data in R, and for methods including print/show, plot, subset, names, dim, summary, and a number of methods specific to spatial data handling • gstat: creates gstatobjects that hold all the information necessary for univariate or multivariate geostatistical prediction (simple, ordinary or universal (co)kriging), or its conditional or unconditional Gaussian or indicator simulationequivalents • lattice: a high-level data visualization system with an emphasis on multivariate data
A simplified view Data import Data export Data analysis GIS (MapInfo, ArcGIS, Google Earth, etc.) R GIS (MapInfo, ArcGIS, Google Earth, etc.)
The real world is much more complicated • Combining spatial data from different sources often means that much more insight is needed into the data formats and data models involved • Methods are needed for handling and combining multisource datasets • Transformation of coordinate reference systems • Data import and export • Mash-up for themed studies
2 Classes for spatial data in R • Data frames are containers for data used everywhere in S • view data as a rectangle of rows of observations on columns of variables • New-style (S4) classes were introduced in the S language at release 4 • The central advantage of new-style classes is that they have formal definitions that specify the name and type of the components, called slots, that they contain.
For spatial objects the foundation class is the Spatial class, with just two slots. • The first is a bounding box, a matrix of numerical coordinates with column names c(‘min’, ‘max’), and at least two rows, with the first row eastings (x-axis) and the second northings (y-axis). • The second is a CRS class object defining the coordinate reference system, and may be set to ‘missing’, represented by NA in R, by CRS(as.character(NA)), its default value. • Operations on Spatial* objects should update or copy these values to the new Spatial* objects being created.
Build a simple Spatial object from a bounding box matrix, and a missing coordinate reference system > m <- matrix(c(0, 0, 1, 1), ncol = 2, dimnames = list(NULL, + c("min", "max"))) > crs<- CRS(projargs = as.character(NA)) > S <- Spatial(bbox = m, proj4string = crs)
SpatialPoints • The SpatialPoints class is the first subclass of Spatial in S > getClass("SpatialPoints") Slots: Name: coordsbbox proj4string Class: matrix matrix CRS Extends: "Spatial" Known Subclasses: Class "SpatialPointsDataFrame", directly Class "SpatialPixels", directly Class "SpatialGrid", by class "SpatialPixels", distance 2 Class "SpatialPixelsDataFrame", by class "SpatialPixels", distance 2 Class "SpatialGridDataFrame", by class "SpatialGrid", distance 3
SpatialLines • SpatialLinesobject contains the bounding box and projection information for the list of Lines objects stored in its lines slot. > getClass("SpatialLines") Slots: Name: lines bbox proj4string Class: list matrix CRS Extends: "Spatial" Known Subclasses: "SpatialLinesDataFrame"
SpatialPolygons • A SpatialPolygons object is a set of Polygons objects with the additional slots of a Spatial object to contain the bounding box and projection information of the set as a whole > getClass("SpatialPolygons") Slots: Name: polygons plotOrderbbox proj4string Class: list integer matrix CRS Extends: "Spatial" Known Subclasses: "SpatialPolygonsDataFrame"
3 Handling coordinate reference systems in R • Datum: a surface that represents the shape of the earth • Projection: render the surface of an ellipsoid as a plane • Project and datum together are referred to as a coordinate reference systems (CRS) • Most countries have multiple CRS, often for very good reasons. This section can be covered in the general topic of ‘Spatial data import and export in R’ (next section), but is worthy to be discussed separately
The EPSG List • The European Petroleum Survey Group (EPSG; now Oil & Gas Producers (OGP) Surveying & Positioning Committee) began collecting a geodetic parameter data set starting in 1986, based on earlier work in member companies. • The EPSG list is under continuous development and/or update. A copy of the list is provided in the rgdalpackage, because it permits the conversion of a large number of CRS into the PROJ.4 style description. • Datum transformation is based on transformation to the World Geodetic System of 1984 (WGS84), or inverse transformation from it to an alternative specified datum.
The PROJ.4 CRS Specification • PROJ.4 is an open-source of library of projection functions; it is not part of R or the sp package, and may be used stand-alone or linked in another program. • The PROJ.4 library uses a ‘tag=value’ representation of coordinate reference systems, with the tag and value pairs enclosed in a single character string • The only values used autonomously in CRS class objects are whether the string is a character NA(missing) value for an unknown CRS, and whether it contains the string longlat, in which case the CRS contains geographical coordinates
Browse CRS information of a spatial object > data(meuse) > coordinates(meuse) = ~x + y > str(meuse) We can browse the CRS information either by read the @proj4string slot in the result of str(meuse) or by using the proj4string function > proj4string(meuse) For the meuse data the result is NA
Specify a CRS with the EPSG database (1) > ?meuse The metadata of meuse dataset shows that the coordinates are in the RDH (Rijksdriehook = Dutch triangulation) CRS We use the make_EPSG utility function of the rgdal package to load the denitions into a list, and then search with grep for the string Amersfoort, which is the name of the origin of the Rijksdriehook (Dutch triangulation) coordinate system > EPSG <- make_EPSG() > (EPSG[grep("Amersfoort", fixed = T, EPSG$note), ]) > rm(EPSG) There are two systems ‘RD old’ and ‘RD new’. meuse data uses ‘RD new’, which is corresponding to reference 28992 in the EPSG database > proj4string(meuse) <- CRS("+init=epsg:28992") We can now browse the WRS information of meuse > proj4string(meuse)
Specify a CRS with the EPSG database (2) Unfortunately, the EPSG is not (as of June 2012) up-to-date; in particular it is missing a key issue: the offset of the center of the Earth in the RD system to that in the “WGS84” CRS The Netherlands Geodetic Commission [5] has published the following adjustment +towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812 Add the WGS84 displacement to the CRS definition for the Meuse dataset > proj4string(meuse) <- CRS(paste(proj4string(meuse), + "+towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812")) We can now browse the updated WRS information of meuse > proj4string(meuse)
Result of proj4string(meuse) [1] " +init=epsg:28992 +proj=sterea +lat_0=52.15616055555555 +lon_0=5.38763888888889 +k=0.9999079 +x_0=155000 +y_0=463000 +ellps=bessel +units=m +no_defs +towgs84=565.237,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812" Now browse the records of the coordinates > head(coordinates(meuse)) Result x y [1,] 181072 333611 [2,] 181025 333558 [3,] 181165 333537 [4,] 181298 333484 [5,] 181307 333330 [6,] 181390 333260
Transformation of CRS • The rgdalpackage provides methods to convert between projections and datums; this dual process we can call “transformation” between CRS • The work is done by the spTransform method, taking any Spatial* object, and returning an object with coordinates transformed to the target CRS.
An example of CRS transformation Transform the Meuse dataset from the Rijksdriehook CRS to geographic coordinates on the WGS84 ellipsoid > meuse.wgs84 <- spTransform(meuse, CRS("+proj=longlat +datum=WGS84")) We can now browse the updated WRS information of meuse > proj4string(meuse.wgs84) Result [1] " +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0" Now browse the records of the coordinates > head(coordinates(meuse)) Result x y [1,] 5.758560 50.99153 [2,] 5.757887 50.99105 [3,] 5.759880 50.99086 [4,] 5.761770 50.99037 [5,] 5.761887 50.98899 [6,] 5.763064 50.98836
4 Spatial data import and export in R • Transformation between spobjects in R and formats of external GIS programs • A number of open source projects made the transformation possible, such as: • Geospatial Data Abstraction Library ( http://www.gdal.org ) • PROJ.4 Cartographic Projections Library ( http://proj.maptools.org )
Vector File Formats • Spatial vector data are points, lines, polygons, and fit the equivalent spclasses. • GIS vector data can be either topological or simple. Legacy GIS were topological, desktop GIS were simple • The spvector classes are simple, meaning that for each polygon all coordinates are stored without checking that boundaries have corresponding points
Using OGR Drivers in rgdal • Using the OGR vector functions of the Geospatial Data Abstraction Library, interfaced in rgdal,lets us read spatial vector data for which drivers are available. • A driver is a software component plugged-in on demand – here the OGR library tries to read the data using all the formats that it knows, using the appropriate driver if available. • OGR also supports the handling of coordinate reference systems directly, so that if the imported data have a specification, it will be read.
The availability of OGR drivers differs from platform to platform, and can be listed using the ogrDriversfunction. The function also lists whether the driver supports the creation of output files. • The readOGRfunction: reads an OGR data source and layer into a suitable Spatial vector object • It takes at least two arguments – they are the data source name (dsn) and the layer (layer), and may take different forms for different drivers. • The writeOGRfunction: allowing data to be written out using supported drivers • Like readOGR, it uses drivers to handle different data formats
Other vector import/export functions (only with shapefiles) • The shapefilespackage is written without external libraries, using file connections. It can be very useful when a shapefile is malformed, because it gives access to the raw numbers. • The maptoolspackage contains a local copy of the library used in OGR for reading shapefiles(the DBF reader is in the foreignpackage), and provides a low-level import read.shape function, a helper function getinfo.shape to identify whether the shapefile contains points, lines, or polygons.
Raster File Formats • There are now a number of R packages that support image import and export • Such as the rimageand biOpspackages and the EBImagepackage in the Bioconductorproject • The requirements for spatial raster data handling include respecting the coordinate reference system of the image, so that specific solutions are needed • Using arguments to readGDAL, subregions or bands may be selected, and the data may be decimated, which helps handle large rasters. • The writeGDALfunction can be used directly for drivers that support file creation.
Other raster import/export functions • There is a simple readAsciiGridfunction in maptoolsthat reads ESRI™ Arc ASCII grids into SpatialGridDataFrame objects; it does not handle CRS and has a single band. The companion writeAsciiGridis for writing Arc ASCII grids. • It is also possible to use connections to read and write arbitrary binary files, provided that the content is not compressed.
5 Examples • Read and write MapInfo data • Export meuse data to Google Earth
Read and write MapInfo data With this example we only show the data import and export First, load the libraries needed > require(sp) > require(gstat) > require(rgdal) > require(maptools) > require(lattice) Set the working directory, where the MapInfo datasets ("CITY_125.DAT“, "CITY_125.ID“, "CITY_125.IND“, "CITY_125.MAP“, and "CITY_125.TAB") located > setwd("E:/RnGIS")
Show the metadata of the layer CITY_125, including CRS information > ogrInfo("CITY_125.TAB", layer="CITY_125") Result Source: "CITY_125.TAB", layer: "CITY_125" Driver: MapInfo File number of rows 125 Feature type: wkbPoint with 2 dimensions Extent: (-157.8583 18.40056) - (-66.10611 61.21806) CRS: +proj=longlat +ellps=GRS80 +towgs84=0,0,0,-0,-0,-0,0 +no_defs Number of fields: 4 name type length typeName 1 City 4 50 String 2 State 4 2 String 3 Tot_hu 0 0 Integer 4 Tot_Pop 0 0 Integer
Read this source and layer into a suitable Spatial vector object > CITY125 <- readOGR("CITY_125.TAB", layer="CITY_125") Returned notice OGR data source with driver: MapInfo File Source: "CITY_125.TAB", layer: "CITY_125" with 125 features and 4 fields Feature type: wkbPoint with 2 dimensions
Show the class of CITY125 > class(CITY125) Result [1] "SpatialPointsDataFrame" attr(,"package") [1] "sp" Show the CRS information of CITY125 > proj4string(CITY125) Result "+proj=longlat +ellps=GRS80 +towgs84=0,0,0,-0,-0,-0,0 +no_defs“
Plot CITY125 in a diagram > plot(CITY125, pch=1, col="red", border="grey60", + axes=TRUE, asp=1, + main="Big cities in USA") Result
Add name of cities in the diagram > text(coordinates(CITY125), + labels=as.character(CITY125$City), + font=2, cex=0.25, pos=4) Result
Export the spatial object to a MapInfo file. dsn="." means exporting to the working folder > writeOGR(CITY125, dsn=".", + layer="CITY125RwriteToMapInfo", + driver="MapInfo File", overwrite_layer=TRUE) Result: in the working folder there will be a list of generaged MapInfo files
Export meuse data to Google Earth • Here we show a simple example of exporting meuse data Zn values as a KML file. In coming lab there will be a few enriched examples with exporting result of kriging predication into Google Earth Load the meuse data > data(meuse) > coordinates(meuse) = ~ x + y Specify the CRS infromation > EPSG <- make_EPSG() > (EPSG[grep("Amersfoort", fixed=T, EPSG$note), ]) > proj4string(meuse) <- CRS("+init=epsg:28992") > proj4string(meuse) <- CRS(paste(proj4string(meuse), + "+towgs84=565.237,50.0087,465.658, + -0.406857,0.350733,-1.87035,4.0812"))
Transform the Meuse dataset from the Rijksdriehook CRS to geographic coordinates on the WGS84 ellipsoid > meuse.wgs84 <- spTransform(meuse, + CRS("+proj=longlat +datum=WGS84")) Export the point coverage of Meuse Zn values as a KML file. > writeOGR(meuse.wgs84["zinc"], "meuseZnPoints.kml", + "zinc", driver = "KML", overwrite_layer = TRUE) In the working folder there will be a KML file generated, which can be opened in Google Earth
We can also export the result of predication to Google Earth (details will be in the coming Friday lab)
Reading assignments for this week • Handout: Chapter 4 Spatial Data Import and Export. In: Bivand, R.S., Pebesma, E.J., Gómez-Rubio, V., 2008. Applied Spatial Data Analysis with R. Springer, New York, NY, 374pp. • Chapter 9 Selecting and Querying Data. In: MapInfo Professional 11.0 User Guide
Next classes • Friday class: • Interfacing R and GIS – practices • Next week • no class – Grand Marshal week • Term assignment