Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler. Michael P. Finn. High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014. Collaborators. Shaowen Wang, Anand Padmanabhan, Yan Liu
Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler Michael P. Finn High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014
Collaborators • Shaowen Wang, Anand Padmanabhan, Yan Liu • University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory • David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel • USGS, Center of Excellence for Geospatial Information Science (CEGIS) • Kristina H. Yamamoto • USGS, National Geospatial Technical Operations Center • Babak Behzad • UIUC, Department of Computer Science • Eric Shook • Kent State University, Department of Geography • Qingfeng (Gene) Guan • China University of Geosciences
Where Do We Want to Go? • Geospatial Analytics • Spatial Modeling • Geovisualization (GeoViz/ Visual Analytics) • For Decision Makers (agencies/ citizens) • Protect natural resources • Empower cultures • Provide for our future
So: • Where have we been? • Where are we now? • Where do we want to go?
Data • Analog → Digital • “Big” Data • Spatial Data (geometric structure) • Data: Open? – mostly • Findable, Accessible, Exploitable (standard format) • Example: USGS Data holdings • 8 Layers of the National Map • Soon: Hyperspectral cubes and LiDAR point clouds
The National Map- Elevation: Quality Levels http://nationalmap.gov/3DEP/neea.html
Big Spatial Data • High-resolution geographic data covering large areas creates big spatial data • Remotely-sensed images • One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage space (more than 4 PB equivalent for U.S.) • Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data • Satellite images with finer than one meter resolution • LiDAR data of level 1 (8 points per square meter), level 2 (2 points per square meter)
Big Spatial Data • USGS 3DEP – Level 2 LiDAR for all of U.S. except Alaska, which is acquiring level 5 IfSAR • Data volume for point cloud, intensity images, and bare Earth elevation model – 7 to 9 petabytes • Processing and file creation usually doubles to triples the storage requirements • Other geospatial data – USGS National Hydrography Dataset based on 1:24,000 scale about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE) • New project to extract hydrography from level 2 LiDAR • How big will the resulting vector dataset (< 1 m resolution) be?
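The storage figures above can be turned into a rough planning range. This is a minimal sketch that uses only the numbers on this slide (7 to 9 petabytes of source data, with processing doubling to tripling the requirement); the function name is illustrative.

```python
def storage_range_pb(base_low_pb, base_high_pb, factor_low=2, factor_high=3):
    """Return (low, high) total storage in PB once processing overhead is included."""
    # Lowest case: smallest base volume with the smallest overhead factor.
    # Highest case: largest base volume with the largest overhead factor.
    return base_low_pb * factor_low, base_high_pb * factor_high


low, high = storage_range_pb(7, 9)
print(f"Plan for roughly {low} to {high} PB")
```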
Software • Compiled and scripting languages • Manipulate data • Software • Commercial? Open? Modifiable code? Functional? • Tools: SAS (SPSS)/ R/ MATLAB, etc. • GIS Software: Esri ArcGIS/ QGIS • and image processing S/W: Imagine/ ENVI • Libraries: GDAL • Example software: mapIMG (based on GCTP; open)
Geospatial Methods, Technologies, and Applications • Analytical Cartography • Mathematical Cartography • Since roughly the 18th Century • Quantitative Geography • Since 1960s • GIS (and image processing S/W) • Since about the 1970s • combining data & software → GIS Packages • Legacy of primarily commercial software • Open Source Software • Since roughly 1980s • OpenGIS? • early widespread but often spotty “open” GIS • Foundation for maturity, expansion, and further openness
Here we are/ where are we going? • Open GIS: Technology and Applications (exploitable) • Hardware and Operating Systems evolving • Data Storage trying to keep pace with Big Data • Advanced GeoViz on cusp of exploding • HPC → High-Performance Spatial Computing • Increasing spatiotemporal fidelity • Cyberinfrastructure
CyberGIS • Cyberinfrastructure (eScience) • HPC & GIScience • A balance/ interaction between theory/ data (Rey, 2013) • Collaborative Research • Standards (for interoperability)
NSF CyberGIS Project • NSF Software Infrastructure for Sustained Innovation Award • http://cybergis.org • USGS/ CEGIS Participation • Cyberinfrastructure resources • XSEDE • Blue Waters supercomputer allocation • Open Science Grid • Integration • CyberGIS Toolkit • CyberGIS Gateway • GISolve middleware services
CyberGIS Software Environment From Liu et al. (2014)
CyberGIS Toolkit • Software Components • PABM – Parallel Agent-Based Modeling • pRasterBlaster – Parallel Map Reprojection • Parallel PySAL (Python Spatial Analysis Library) • Spatial Text • An open and reliable software toolbox for high-end users • Hide compute complexity • A rigorous software building, testing, packaging, and deployment framework • Focused on computational intensity, performance, scalability, and portability in various CI environments • Easy to configure and use
Scalable Raster Processing • Need for scalable map reprojection in CyberGIS analytics • Spatial analysis and modeling • Distance calculation on raster cells requires appropriate projection • Visualization • Reprojection for faster visualization on Web Mercator base maps • pRasterBlaster integration in CyberGIS Toolkit and Gateway • Software componentization: librasterblaster, pRasterBlaster, MapIMG • Build, test, and documentation • Gateway user interface
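Raster reprojection is commonly done by inverse mapping: for each output cell, find the geographic coordinate it represents and sample the input raster there. The sketch below shows this for a longitude/latitude input grid and a spherical Web Mercator output grid with nearest-neighbor sampling. It is an illustration under those assumptions, not the librasterblaster implementation, and all function and parameter names are hypothetical.

```python
import math

R = 6378137.0  # sphere radius in meters used by spherical Web Mercator


def lonlat_to_mercator(lon, lat):
    """Forward spherical Web Mercator: degrees -> meters."""
    x = R * math.radians(lon)
    y = R * math.log(math.tan(math.pi / 4.0 + math.radians(lat) / 2.0))
    return x, y


def mercator_to_lonlat(x, y):
    """Inverse spherical Web Mercator: meters -> degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0)
    return lon, lat


def reproject(src, src_west, src_north, src_cell_deg,
              out_rows, out_cols, out_west, out_north, out_cell_m,
              nodata=None):
    """Fill a Web Mercator grid by sampling a lon/lat grid (nearest neighbor)."""
    out = []
    for r in range(out_rows):
        y = out_north - (r + 0.5) * out_cell_m      # center of output row
        row = []
        for c in range(out_cols):
            x = out_west + (c + 0.5) * out_cell_m   # center of output column
            lon, lat = mercator_to_lonlat(x, y)
            sc = math.floor((lon - src_west) / src_cell_deg)
            sr = math.floor((src_north - lat) / src_cell_deg)
            inside = 0 <= sr < len(src) and 0 <= sc < len(src[0])
            row.append(src[sr][sc] if inside else nodata)
        out.append(row)
    return out
```

Because every output cell is computed independently, the outer loop over rows can be divided among processes, which is what makes the problem a good fit for HPC.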
Performance Profiling • Performance profiling is an important tool for developing scalable and efficient high performance applications • Performance profiling identified computational bottlenecks in pRasterBlaster • Demonstration of one example of the value of profilers for pRasterBlaster in the next slides
A Computational Bottleneck: Analysis • Spatial data-dependent performance anomaly • The anomaly is data dependent • Four corners of the raster dataset were processed by processors whose indexes are close to the two ends • Exception handling in C++ is costly • Coordinate transformation on nodata area was handled as an exception • Solution • Remove the C++ exception handling
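A load imbalance of this kind shows up when the time spent per partition is compared across processes. The sketch below times each partition and flags those that take much longer than the median; it is a simplified stand-in for a real profiler, and the function names and the threshold factor are assumptions.

```python
import time
from statistics import median


def time_partitions(work, partitions):
    """Run `work` on each partition and return the elapsed seconds for each."""
    timings = []
    for part in partitions:
        start = time.perf_counter()
        work(part)
        timings.append(time.perf_counter() - start)
    return timings


def slow_partitions(timings, factor=2.0):
    """Return indexes of partitions slower than `factor` times the median."""
    cutoff = factor * median(timings)
    return [i for i, t in enumerate(timings) if t > cutoff]
```

With hypothetical timings such as `[5.0, 1.0, 1.0, 1.0, 1.1, 4.0]`, the flagged indexes are the first and the last, which is the pattern described above for processors close to the two ends.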
A Computational Bottleneck: Summary • Symptom • Processors responsible for polar regions spent more time than those processing the equatorial region • Cause • Corner cells were mapped to invalid input raster cells, generating exceptions • C++ exception handling was expensive • Solution • Removed C++ exception handling • Corner cells need not be processed • They now contribute less computation time
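The fix amounts to treating out-of-domain cells as an ordinary outcome rather than an exceptional one. The following sketch illustrates that pattern for an inverse sinusoidal projection on a unit sphere, where the corners of the output rectangle fall outside the projected globe. The projection choice, function names, and nodata value are assumptions for illustration; pRasterBlaster is written in C++, and this is not its code.

```python
import math

NODATA = -9999  # hypothetical fill value for cells with no valid input


def inverse_sinusoidal(x, y):
    """Map projected (x, y) on a unit sphere to (lon, lat) in radians.

    Returns None when the point lies outside the projected globe, so the
    caller can skip it with a cheap branch instead of catching an exception.
    """
    lat = y
    if abs(lat) >= math.pi / 2:
        return None
    lon = x / math.cos(lat)
    if abs(lon) > math.pi:
        return None
    return lon, lat


def fill_row(y, xs, sample):
    """Reproject one output row; invalid cells get NODATA without raising."""
    row = []
    for x in xs:
        coord = inverse_sinusoidal(x, y)
        row.append(NODATA if coord is None else sample(*coord))
    return row
```

A row near the pole, such as `fill_row(1.5, [-3.0, 0.0, 3.0], sample)`, yields nodata at both ends, while an equatorial row with the same x values is fully valid.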
pRasterBlaster Component View CyberToolkit pRasterBlaster librasterblaster MapIMG via API Cyberinfrastructure Service Providers GIS Programmers End Users
Performance • Test: • On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center) • Using a parallel file system (Lustre) and MPI I/O (vs. traditional Network File System (NFS)) • 40 GB data • Processor cores were increased from 256 to 1024
Obstacles, Issues, Challenges • Parallel I/O (particularly raster) is the proverbial long pole in the tent • Raster decomposes nicely (embarrassingly parallel) • File I/O (especially output file re-composition) is a huge bottleneck • Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster • GeoTIFF (SPTW – Simple Parallel TIFF Writer) led by David Mattli, USGS • HDF5 parallel work by Babak Behzad, UIUC
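Row-wise decomposition lets each process compute the byte range it owns in an uncompressed, row-major output file, so writers never overlap. The sketch below emulates that idea with ordinary file seeks in a single process. It is not MPI I/O and not the SPTW code, and the function names and the one-byte cell size in the demo are assumptions.

```python
import os
import tempfile


def row_range(rank, nprocs, nrows):
    """Return the [start, stop) rows owned by `rank`, spreading the remainder."""
    base, extra = divmod(nrows, nprocs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_strip(path, rank, nprocs, nrows, ncols, bytes_per_cell, value):
    """Write this rank's rows at their byte offset in the shared file."""
    start, stop = row_range(rank, nprocs, nrows)
    offset = start * ncols * bytes_per_cell
    payload = bytes([value]) * ((stop - start) * ncols * bytes_per_cell)
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(payload)


def demo(nrows=10, ncols=4, nprocs=3):
    """Emulate three writers filling one file, in arbitrary order."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.truncate(nrows * ncols)  # preallocate one byte per cell
        for rank in reversed(range(nprocs)):  # write order does not matter
            write_strip(path, rank, nprocs, nrows, ncols, 1, rank + 1)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Formats with compression or internal indexes complicate the offset calculation, which is one reason output file re-composition is harder than the decomposition itself.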
Computational Challenges • Converting legacy (linear) code to HPC (parallel) environment requires a lot of skilled manpower • Scaling to large-scale analysis using HPC resources is difficult • Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise on computational performance profiling and analysis
Geospatial Analytics, Spatial Modeling/ Geovisualization • Solving “Changing World” Problems • Smart Decisions • Protecting Natural Resources • Democratizing Science • Empowering cultures • Products and Services for society and its citizens Data & Software Solving (Geospatial) Problems
References • Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH. • Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag. • Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia. • Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland. • Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256. • Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.
• http://cegis.usgs.gov/ • http://nationalmap.gov/3DEP/ • http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php • http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page • http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster
Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler Questions? http://cegis.usgs.gov/index.html High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014