Geospatial Data and Spatial Data Analysis Tools For Ecologists University of California – Santa Barbara www.nceas.ucsb.edu Rick Reeves / March 17, 2005
Presentation Goals • Overview: Geospatial Data Analysis • Defining and distinguishing between spatial, geospatial, geographic data • Addressing the particular attributes of geospatial data • Inventory of Geospatial Data Types • Primary data types and common sources for data • Survey of Geoprocessing Software Tools • Key issues driving choice of geospatial processing software • A Tour of NCEAS Scientific Computing Web Site • Spatial Datasets, Tools, Tutorials, and Project Archives • Some Examples: Geospatial Data Analysis at NCEAS • From the Annals of the NCEAS Scientific Programmer: ‘Real World’ solutions to Ecological research challenges
Meet the Scientific Programmer • Rick’s Academic and Professional Background • Undergraduate: Environmental Remote Sensing • Graduate: Spatial Operations Research / Location-Allocation Heuristic Development • Spatial Modeling branch of Geographic Data Analysis • Problem Domain: Transportation and Facility Location within networks • Professional: Software Development, geospatial database development, training curriculum development
Spatial Data: A Hierarchical Definition • Spatial Data • Observations are distributed in multidimensional space • X / Y / Z coordinates attached to each data element • Geospatial Data • Spatial Data with attached Geographic coordinates • Latitude / Longitude, UTM • Optional: data subjected to a map projection transformation • Geographic Data • Geospatial Data that captures ‘Earth System’ phenomena • Terrain height • Drainage Network • Land surface cover or urban Land Use • Meteorological / climate data forecasts • Ecologists may work with any or all during a project
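The UTM coordinates mentioned above come from a grid of 60 zones, each 6° of longitude wide and numbered eastward from 180° W. The following is a minimal Python sketch with hypothetical function names. It computes a zone number and that zone's central meridian from longitude. It assumes the standard zone grid and ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_zone(lon_deg: float) -> int:
    """Standard UTM zone number (1-60) for a longitude in decimal degrees.

    Assumption: ignores the special-case zones around Norway and Svalbard.
    """
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude must lie in [-180, 180]")
    # Zones are 6 degrees wide, numbered eastward starting at 180 W.
    zone = math.floor((lon_deg + 180.0) / 6.0) + 1
    # Longitude exactly 180 E would give zone 61; fold it into zone 60.
    return min(zone, 60)


def central_meridian(zone: int) -> float:
    """Longitude (degrees) of the central meridian of a UTM zone."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones are numbered 1-60")
    return -183.0 + 6.0 * zone


# Example: Santa Barbara, CA (about 119.7 W) lies in zone 11.
print(utm_zone(-119.7), central_meridian(11))  # prints: 11 -117.0
```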
Overview: Geospatial / Geographic Data • Two Broad Primary Categories • Raster: A multidimensional, regularly spaced grid of values (samples) • Dimensions: Northing, Easting, Altitude, Time • Examples: Satellite Image, Digital Terrain, land surface cover maps • Vector: Three primary shapes stored in drawing-optimized format • Point, Line, Polygon, (TIN, vector field) • Thousands of datasets exist in hundreds of formats • Remote Sensing Imagery / Digital Elevation Models • Surface Features (political, physiographic) as points/lines/polygons • Meteorological data (observed / forecasted, short- and long-term) • File format standards set by Industry, Government, user community • Data Ingestion: First Step in Geospatial Analysis • Data input / format conversion / spatial registration
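A small sketch can show the raster/vector distinction. A raster is fully described by an origin, a cell size, and a grid of values. Any (easting, northing) coordinate therefore maps to a (row, col) index by simple arithmetic. Vector features are just ordered coordinate sequences. The `Raster` class below is hypothetical. It assumes square cells and a north-up grid anchored at its upper-left corner. It also ignores the rotation terms that real georeferencing formats can carry.

```python
import math
from dataclasses import dataclass


@dataclass
class Raster:
    """Hypothetical minimal raster: a regular north-up grid anchored at its upper-left corner."""
    x_origin: float   # easting of the upper-left corner
    y_origin: float   # northing of the upper-left corner
    cell_size: float  # square cells, same units as the coordinates
    values: list      # list of rows; row 0 is the northernmost

    def cell_index(self, x: float, y: float) -> tuple:
        """Map a world coordinate (easting, northing) to (row, col)."""
        col = math.floor((x - self.x_origin) / self.cell_size)
        row = math.floor((self.y_origin - y) / self.cell_size)  # northing decreases downward
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("coordinate falls outside the raster")
        return row, col

    def value_at(self, x: float, y: float):
        row, col = self.cell_index(x, y)
        return self.values[row][col]


# Vector features: shapes stored as coordinate sequences (illustrative values).
point = (250.0, 950.0)
line = [(0.0, 0.0), (100.0, 50.0), (200.0, 75.0)]
polygon = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]  # closing edge implied

grid = Raster(x_origin=0.0, y_origin=1000.0, cell_size=100.0, values=[[1, 2, 3], [4, 5, 6]])
print(grid.value_at(*point))  # prints: 3
```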
Geospatial Data Analysis • Geospatial Information Analysis: 3 Categories • From O’Sullivan & Unwin (2003) • Spatial Data Manipulation: Investigate the relationships between geographic dataset layers • Examples: ‘point-in-polygon’, buffer zones around spatial features • GIS software typically used to view / manipulate / create layers • Spatial/Statistical Data Analysis: Descriptive and Explanatory: What is there? How do we categorize it? • Data points treated as a statistical ‘population’ and compared to others • Spatial Modeling: Construct models to explore and understand geospatial systems • Based on ‘abstraction’ of a domain-specific problem into a systems framework. Some examples: • Predicting network flows; optimizing facility locations among demands • Lessons learned building the model can be as valuable as the model’s ‘answers’
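At heart, the ‘point-in-polygon’ and buffer operations named above are simple geometric tests. The sketch below uses the even-odd ray-casting rule, which is one common approach and not necessarily what any particular GIS uses. It also uses a plain Euclidean buffer around a point. It assumes planar (projected) coordinates and leaves the result undefined for points lying exactly on an edge.

```python
import math


def point_in_polygon(x: float, y: float, polygon: list) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast toward +x.

    polygon is a list of (x, y) vertices; the closing edge is implied.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def within_buffer(px: float, py: float, cx: float, cy: float, radius: float) -> bool:
    """True if (px, py) lies inside a circular buffer of `radius` around (cx, cy)."""
    return math.hypot(px - cx, py - cy) <= radius


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon(5, 5, square), point_in_polygon(15, 5, square))  # prints: True False
```

The crossing-count rule also handles concave shapes, such as an L-shaped study area, without any special casing.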
The Challenge of Geospatial Analysis • Geospatial Data violate some key statistical assumptions • Must be addressed in the experimental design and sampling scheme • Require specialized assessment techniques to factor out effects • Spatial Autocorrelation • Samples are NOT randomly selected from a normally distributed population • In fact, nearby samples are more likely to be similar than distant ones • Autocorrelated data points introduce redundancy into the sample set • Spatial Scaling • AKA Modifiable Areal Unit Problem • Statistical relationships in an area may change at different aggregations • The placement of the sampling grid can introduce artifacts • Nonuniform sampling space, edge effects • Geospatial Data Attributes have explanatory power • Spatial relationships may be causes for observed phenomena
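Moran's I is one widely used measure of the spatial autocorrelation described above, though the slide does not name it. The statistic is I = (N / W) · Σ_i Σ_j w_ij (x_i − x̄)(x_j − x̄) / Σ_i (x_i − x̄)². Values near +1 indicate clustering of similar values. Values near −1 indicate a checkerboard-like alternation. The sketch below is an illustration only. It assumes a complete rectangular grid and binary ‘rook’ weights, where each cell's neighbours are the four cells sharing an edge with it.

```python
def morans_i(grid: list) -> float:
    """Global Moran's I for a rectangular grid using binary rook-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    n = rows * cols
    mean = sum(sum(row) for row in grid) / n
    dev = [[grid[r][c] - mean for c in range(cols)] for r in range(rows)]
    denom = sum(d * d for row in dev for d in row)
    if denom == 0:
        raise ValueError("Moran's I is undefined for a constant surface")
    cross = 0.0   # sum of w_ij * dev_i * dev_j over all ordered neighbour pairs
    w_total = 0   # W: sum of all weights (each neighbour pair counted both ways)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    cross += dev[r][c] * dev[rr][cc]
                    w_total += 1
    return (n / w_total) * (cross / denom)


checkerboard = [[(r + c) % 2 for c in range(4)] for r in range(4)]
clustered = [[1, 1, 0, 0] for _ in range(4)]
print(morans_i(checkerboard), round(morans_i(clustered), 3))  # prints: -1.0 0.667
```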
Selecting Geospatial Software Tools • Geospatial software: layered software architecture • Data layer: Efficiently store geospatial data • Feature Set + spatial coordinates • Analytic Layer: Spatial/statistical analysis algorithms • Statistical packages increasingly contain geospatial analysis tools • Visualization Layer: Creates data views (AKA maps) • Geospatial tools broadly divided into two categories • Geographic Information Systems (GIS) • Three software layers are each extensive, ‘feature rich’ • Geospatial Analysis Packages • Data layer is ‘thinner’, Analytic layer ‘thicker’ • Visualization layer built on existing data plotting tools
Geospatial Software Tools: GIS ‘Value Added’ • Data layer is optimized for efficient geospatial data storage/processing • Raster and Vector Data storage, ‘mixed mode’ operations • Georeferencing tools for data layer projection, spatial registration • Map Algebra tools foster analysis and creation of data layers • Comprehensive cartographic tools for output map design
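The simplest form of map algebra is a ‘local’ operation. A new layer is computed cell by cell from the values of co-registered input layers. The following is a minimal sketch. It assumes the layers already share an identical grid, with the same extent, cell size, and projection. The layer contents, the ‘forest’ code, and the 400 m suitability rule are all made-up illustrations.

```python
def local_op(func, *layers):
    """Apply func cell by cell across equally shaped raster layers (lists of rows)."""
    shape = (len(layers[0]), len(layers[0][0]))
    for layer in layers:
        if (len(layer), len(layer[0])) != shape:
            raise ValueError("layers must share the same grid")
    return [[func(*(layer[r][c] for layer in layers)) for c in range(shape[1])]
            for r in range(shape[0])]


# Hypothetical co-registered layers.
elevation = [[120, 480, 950], [300, 1020, 1500]]                     # metres
landcover = [["forest", "forest", "grass"], ["urban", "forest", "forest"]]

# Derived layer: 1 where forested AND above 400 m, else 0.
suitable = local_op(lambda e, lc: int(e > 400 and lc == "forest"), elevation, landcover)
print(suitable)  # prints: [[0, 1, 0], [0, 1, 1]]
```

Keeping the rule as a plain function makes it easy to swap in other local operations, such as reclassification or weighted sums, without changing the cell-iteration code.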
Geospatial Software Tools: GIS Caveats • Underdeveloped geostatistical processing tools • Vendors pressured to include them in product • Yet validation data and algorithm details not available • Often, these are critical tools for ecological analysis • Steep Learning Curve • Identifying, mastering ‘essential’ features a challenge • Cost: GIS Software can be expensive • Upfront purchase and yearly license fees • Time investment in training and data maintenance • Workload • If non-GIS software must be used for part of the analysis, time must be spent moving data between software packages
Geospatial Software Tools: Choosing • Some Suggested Selection Criteria • Research Objectives should drive choice of tools • Identify the project’s core geospatial processing needs • Platform Flexibility • Select tools supported on multiple platforms (hardware/OpSys) • Widely supported/used platforms foster collaboration • Solution ‘Visibility’ • Can you obtain the details of the algorithm? • Does the community recognize the accuracy of the algorithm? • Costs of implementing your research idea in software • Scripted solutions using integrated environments are best • R, SAS, MATLAB • Avoid from-scratch development in general-purpose programming languages
Geospatial Software Tools: Choosing • Select GIS for core needs: • Construct, compare, create multiple spatial data layers • Simultaneously analyzing vector and raster data • Creating detailed production quality study site maps • Your data is exclusively in the GIS product format • You require spatial analysis tools unavailable outside GIS • Select Geospatial Analysis tools for core needs: • Spatial/Statistical data analysis is the focus • Your mapping requirements are modest • two-dimensional data plots with geographic coordinates, legend • You need in-depth understanding of algorithms used • Or, you wish to extend / modify the algorithms
Sources for Geospatial Software Tools • Commercial Software Products • For-profit corporations sell or license their software • Major players produce comprehensive products • ESRI (ArcGIS) is the dominant GIS vendor • Their goal: Provide a solution for every geospatial application • Other vendors offer tailored solutions • Examples: ENVI / IDL, ERDAS: Remote Sensing oriented GIS • Example: S-PLUS Spatial Statistics: geospatial statistics and spatial data visualization enhancements to a statistical package • Example: MATLAB has mapping and image processing toolkits • Example: SAS offers GIS, geospatial software tools • Commercial products often drive geospatial data formats • Example: ESRI Shape File, ERDAS IMG file
Sources for Geospatial Software Tools • Open Source Software • Broad-based effort by worldwide scientific and research community • Often distributed under the GNU General Public License (GPL) • Software development and maintenance by the user community • Most significant geospatial analysis products: R, GRASS GIS • Examples of others: PostGIS, GDAL libraries • Visit FreeGIS.org, or the open software foundation sites.
Tradeoffs: Commercial GIS Software • Centralized documentation and product support….. • At a price of $100s to $1000s per year • Comprehensive, integrated software product • Data/Analytic/Visualization layers populated w/ features • Steep learning curve: Where are my ‘essential features?’ • Training always available – at a cost…. • Details of proprietary geospatial algorithms usually unavailable
Tradeoffs: Open Source GIS Software • Open Source Software • Often distributed under the GNU General Public License (GPL) • Software development and maintenance by the user community • Most significant geospatial analysis products: R, GRASS GIS • Many applications available via the Internet but…. • Quality, features, support, and documentation are inconsistent • Algorithms and even source code are freely available • Open Source software drawbacks are shrinking as the user support community evolves and matures • But active participation in the community is advised for those wishing to stay technically proficient
Sources for Geospatial Data • Government Agencies • National Mapping and Survey Agencies: surface cover data • USGS • Research Centers: Climate forecasting models • NOAA, NASA, NCDC • For-Profit Corporations • The highest-quality UNCLASSIFIED imagery is now acquired by the private sector • Sometimes, no-cost government data is resold to the public • Data widely available via the Internet • Many data sets available at no or low cost • Notable Exception: Satellite Remote Sensing data • Some discounts available to education and/or research entities • The best sites allow ‘search by geographic coordinates’ • Examples from NCEAS Scientific Computing web site
Popular Geospatial Data Formats • Meteorological and Climatological Data • Historical measurements • Short-term model-based forecasts (3 – 10 days from now) • Long-term predictions (10 – 100 years): General Circulation Models • Widely-Used Formats: Gridded Binary (GRIB), NetCDF • Political and Physiographic features • Country Boundaries • Road Networks • Drainage Networks • Widely-Used Formats: Digital Line Graphs (DLG), ESRI Shape Files (.shp) • Most GIS/Geospatial packages ingest these formats • Or conversion utilities are available to ingest them
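The ESRI Shape File (.shp) illustrates how much detail a format specification pins down. Every .shp file begins with a fixed 100-byte header whose fields mix big- and little-endian integers. The sketch below follows the layout in ESRI's published Shapefile Technical Description: file code 9994, file length in 16-bit words, version 1000, shape type, and bounding box. `read_shp_header` is a hypothetical helper, and only the common 2D shape types are named.

```python
import struct

# Subset of shape-type codes from the shapefile specification (2D types only).
SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint"}


def read_shp_header(data: bytes) -> dict:
    """Parse the 100-byte main-file header of an ESRI .shp file."""
    if len(data) < 100:
        raise ValueError("a shapefile header is 100 bytes long")
    (file_code,) = struct.unpack(">i", data[0:4])           # big-endian
    if file_code != 9994:
        raise ValueError("not an ESRI shapefile (bad file code)")
    (length_words,) = struct.unpack(">i", data[24:28])      # big-endian, 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    xmin, ymin, xmax, ymax = struct.unpack("<4d", data[36:68])
    return {
        "file_length_bytes": length_words * 2,
        "version": version,
        "shape_type": SHAPE_TYPES.get(shape_type, f"other ({shape_type})"),
        "bbox": (xmin, ymin, xmax, ymax),
    }


# Usage with a hypothetical file name:
# with open("roads.shp", "rb") as f:
#     print(read_shp_header(f.read(100)))
```

Reading the header alone is a cheap way to check a dataset's geometry type and extent before committing to a full import.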
Popular Geospatial Data Formats • Remote Sensing Imagery • Many operational systems provide many kinds of images • Multispectral Imagery: Landsat, SPOT, IKONOS • Data Formats tend to be sensor-specific • Most GIS can ingest most imagery types • Portal sites • Commercial: http://www.vterrain.org/Imagery/commercial.html • Govt: http://www.nationalgeographic.com/maps/map_links.html • Digital Terrain Models • Raster Grid datasets containing elevation measurements • Available for complete Earth land surface • Primary format: USGS Digital Elevation Model (DEM) • AKA National Elevation Dataset (NED) • Portal sites • USGS: http://gisdata.usgs.net/Website/Seamless/ • Terrainmap.org: http://www.terrainmap.org/
Tour of the Scientific Computing Web Site • Links to Data Sources • Links to Geospatial Software Sources • Links to Tutorials and Research Papers • Archive of NCEAS Research Projects http://www.nceas.ucsb.edu/scicomp
Example: Spatial Modeling: Optimization • Route vehicles along network using environmental costs as a metric • Simultaneously locate facilities along shipment routes that mitigate environmental costs • Optimal Location of species reserve sites • Develop and compare performance of alternate solution methods • Mathematically optimal but operationally impractical • Heuristically derived: near-optimal, usable solution
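A toy version of reserve selection shows the trade-off between mathematically optimal and heuristic solutions. Here the problem is posed as set covering: choose the fewest candidate sites so that every species occurs in at least one of them. This is an illustrative sketch, not the method used in the project shown. Exhaustive search guarantees the optimum, but its cost grows combinatorially with the number of sites. The greedy heuristic repeatedly picks the site that adds the most uncovered species. It is fast but may use extra sites. Site and species names are hypothetical.

```python
from itertools import combinations


def greedy_reserves(sites: dict, species: set) -> list:
    """Greedy heuristic: repeatedly add the site covering the most uncovered species."""
    uncovered = set(species)
    chosen = []
    while uncovered:
        best = max(sites, key=lambda s: len(sites[s] & uncovered))
        gain = sites[best] & uncovered
        if not gain:
            raise ValueError("some species occur at no candidate site")
        chosen.append(best)
        uncovered -= gain
    return chosen


def optimal_reserves(sites: dict, species: set) -> list:
    """Exhaustive search: smallest set of sites covering every species."""
    names = sorted(sites)
    for k in range(1, len(names) + 1):              # try 1 site, then 2, ...
        for combo in combinations(names, k):
            if set().union(*(sites[s] for s in combo)) >= species:
                return list(combo)
    raise ValueError("no combination of sites covers every species")


# Hypothetical occurrence data: species (numbered 1-6) recorded at each candidate site.
sites = {"A": {1, 2, 3}, "B": {4, 5, 6}, "C": {1, 2, 4, 5}}
species = {1, 2, 3, 4, 5, 6}
print(greedy_reserves(sites, species))   # prints: ['C', 'A', 'B']
print(optimal_reserves(sites, species))  # prints: ['A', 'B']
```

In this instance the greedy rule takes the species-rich site C first and ends up needing three sites. Exhaustive search finds that A and B alone suffice.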
Selecting Species Reserve Locations Dr. Ross Gerrard, UCSB Biogeography Lab, 1996
Example: Spatial Data Manipulation • Elevation zone threshold calculation • Digital Elevation Models for selected worldwide sites • Classify sites into 100 meter ‘wide’ elevation zones • General Circulation Model climate data extraction • Identify, obtain, import GCM data files • Import the data into GIS as raster grid • Overlay point file, extract matching climate values
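Both manipulation steps above reduce to simple arithmetic once the data are on a regular grid. The following is a minimal sketch with hypothetical function names. It assigns an elevation to its 100 m zone, then looks up the grid cell under each site point, as in extracting GCM climate values at sites. It assumes a regular latitude/longitude grid whose row 0 lies at the northern edge, and it uses the cell containing each point, with no interpolation. The temperature values are made up.

```python
import math


def elevation_zone(elev_m: float, width_m: float = 100.0) -> tuple:
    """(lower, upper) bounds of the elevation band containing elev_m."""
    lower = math.floor(elev_m / width_m) * width_m
    return (lower, lower + width_m)


def extract_at_points(grid: list, lon_left: float, lat_top: float,
                      res_deg: float, points: list) -> list:
    """Value of the grid cell under each (lon, lat) point, or None if outside.

    Assumes a regular lat/lon grid: row 0 is the northern edge at lat_top,
    column 0 the western edge at lon_left, square cells of res_deg degrees.
    """
    values = []
    for lon, lat in points:
        col = math.floor((lon - lon_left) / res_deg)
        row = math.floor((lat_top - lat) / res_deg)
        inside = 0 <= row < len(grid) and 0 <= col < len(grid[0])
        values.append(grid[row][col] if inside else None)
    return values


# Hypothetical 2 x 2 grid of mean temperatures (deg C) on 5-degree cells.
temps = [[10.0, 12.0], [14.0, 16.0]]
sites = [(-119.7, 34.4), (-124.0, 39.0), (-100.0, 30.0)]
print(elevation_zone(1234.5))                          # prints: (1200.0, 1300.0)
print(extract_at_points(temps, -125.0, 40.0, 5.0, sites))  # prints: [16.0, 10.0, None]
```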
Spatial Analysis: ArcGIS and R Platforms • ESRI Shape files exported to the R programming environment • R geostatistical and spatial analysis methods can then be applied
A Sampling: R Geospatial Analysis packages • clim.pact: Climate data analysis and downscaling tools • geoR: Geostatistical Data Analysis: variograms, et al. • maptools: read/manipulate polygon data (ESRI .shp) • shapefiles: read/manipulate ESRI shape files • sgeostat: Geostatistical modeling code • splancs: Spatial and space-time point patterns • spatstat: Spatial Point Pattern analysis
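The variograms offered by packages such as geoR summarize how dissimilarity between samples grows with their separation. The classical empirical estimator is γ(h) = 1 / (2·N(h)) · Σ (z_i − z_j)², where the sum runs over the N(h) sample pairs whose separation falls in distance class h. The sketch below is a plain-Python illustration of that estimator, not geoR's implementation. It assumes planar coordinates, half-open distance bins [k·w, (k+1)·w), and Python 3.8 or later for `math.dist`.

```python
import math
from itertools import combinations


def empirical_variogram(coords: list, values: list, bin_width: float, n_bins: int) -> list:
    """Classical (Matheron) semivariogram estimate per distance bin.

    Returns (bin_centre, gamma, n_pairs) tuples; gamma is None for empty bins.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, j in combinations(range(len(values)), 2):   # each unordered pair once
        h = math.dist(coords[i], coords[j])
        k = int(h // bin_width)
        if k < n_bins:
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [((k + 0.5) * bin_width,
             sums[k] / (2 * counts[k]) if counts[k] else None,
             counts[k])
            for k in range(n_bins)]


# Four samples along a transect with a linear trend in the measured value.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
z = [0.0, 1.0, 2.0, 3.0]
for centre, gamma, n in empirical_variogram(pts, z, bin_width=1.0, n_bins=4):
    print(centre, gamma, n)
```

The semivariance rising steadily with distance, rather than levelling off at a sill, is the typical signature of a trend in the data rather than purely local autocorrelation.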
Concluding thoughts • NCEAS Associates use geospatial data extensively, in many creative ways • Geospatial Data Analysis requires specialized techniques • GIS and geospatial analysis tools are available from commercial vendors and the open source community • Choosing geospatial data and tools can be overwhelming and distract from the primary ‘science mission’ • The Scientific Programming Team has geospatial expertise, and can assist NCEAS Associates in this domain • Coming soon: Short course on the R Programming Language!