ESMPy and OpenClimateGIS: Python Interfaces for High Performance Grid Remapping and Geospatial Dataset Manipulation. Ryan O’Kuinghttons, Ben Koziol, Robert Oehmke, Cecelia DeLuca, Gerhard Theurich, Peggy Li, Joseph Jacob. Cooperative Institute for Research in Environmental Sciences
ESMPy and OpenClimateGIS: Python Interfaces for High Performance Grid Remapping and Geospatial Dataset Manipulation
Ryan O’Kuinghttons, Ben Koziol, Robert Oehmke, Cecelia DeLuca, Gerhard Theurich, Peggy Li, Joseph Jacob
Cooperative Institute for Research in Environmental Sciences
NOAA Environmental Software Infrastructure and Interoperability Project
European Geosciences Union General Assembly
Vienna, Austria
April 22, 2016
ESMF and ESMPy
• The Earth System Modeling Framework (ESMF) is open source software for building modeling components and coupling them together to form weather prediction, climate, coastal, and other applications.
  • Provides infrastructure for time management, data communications, metadata and I/O, running models as web services, and grid remapping
  • Supports a full Fortran interface and limited C and Python interfaces
• ESMF provides a mature high performance regridding package
  • Transforms data from one grid to another by generating and applying interpolation weights
  • Supports structured and unstructured, global and regional, 2D and 3D grids, with many options
  • Fully parallel and highly scalable
• The Python interface to ESMF (ESMPy) offers access to the regridding functionality and other related features of ESMF.
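The phrase "generating and applying interpolation weights" can be made concrete with a small sketch. It assumes the weights are stored as sparse triplets (destination index, source index, weight) with 1-based indices, as in SCRIP-style weight files; the function name and sample numbers are hypothetical and this is not the ESMF implementation.

```python
def apply_weights(rows, cols, weights, src, n_dst):
    """Apply sparse weights: dst[row] += S * src[col].

    rows and cols hold 1-based destination and source indices
    (an assumed, SCRIP-like layout).
    """
    dst = [0.0] * n_dst
    for row, col, s in zip(rows, cols, weights):
        dst[row - 1] += s * src[col - 1]
    return dst


# Hypothetical weights mapping 3 source cells onto 2 destination cells.
rows = [1, 1, 2, 2]
cols = [1, 2, 2, 3]
weights = [0.5, 0.5, 0.25, 0.75]
src = [10.0, 20.0, 40.0]

dst = apply_weights(rows, cols, weights, src, n_dst=2)
print(dst)
```

Because the weights depend only on the two grids, they can be generated once and reapplied to every time step of a field.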
OCGIS
• OpenClimateGIS (OCGIS) is a standalone Python package enabling dynamic access to and manipulation of high resolution climate data
  • Subsetting, coordinate transformations, temporal averaging, and other computations
  • Data conversions between CSV, Shapefile, GRIDSPEC, and UGRID
• Data conversions between ESMPy and OCGIS bring together GIS capabilities with high performance regridding functionality to create a more unified set of Python tools for Earth system modeling
• One area of interest is connecting high resolution hydrological models with high performance climate models
ESMPy Overview
• High performance regridding is applied as a callable Python object
• NumPy array access to distributed data (parallelism for FREE)
• Many regridding methods including first-order conservative
• Data objects can be created from NetCDF files in standard metadata formats
• Supported grids and methods for regridding with ESMPy include:
  • Bilinear, higher order patch [1,2], first order conservative [3], or nearest neighbor regridding
  • Global or regional 2D or 3D logically rectangular Grids
  • 2D or 3D unstructured Meshes composed of triangles, quadrilaterals or hexahedrons
  • 1D streams of observational data or unconnected sets of points (LocStream)
Figures: 2D Unstructured Mesh (from www.ngdc.noaa.gov); Regional Grid; FIM Unstructured Grid
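As an illustration of the simplest method in the list above, the following sketch computes bilinear weights for one destination point inside a single rectangular source cell in Cartesian coordinates. The function names and the test field are hypothetical; ESMF works on general quadrilaterals and on the sphere, which this sketch does not attempt.

```python
def bilinear_weights(x, y, x0, x1, y0, y1):
    """Weights for corners (x0,y0), (x1,y0), (x0,y1), (x1,y1), in that order."""
    tx = (x - x0) / (x1 - x0)
    ty = (y - y0) / (y1 - y0)
    return [(1 - tx) * (1 - ty), tx * (1 - ty), (1 - tx) * ty, tx * ty]


def interpolate(corner_values, weights):
    """Weighted sum of the four corner values."""
    return sum(v * w for v, w in zip(corner_values, weights))


# Unit cell with the linear field f(x, y) = 1 + 2x + 3y sampled at the corners.
w = bilinear_weights(0.25, 0.5, 0.0, 1.0, 0.0, 1.0)
corners = [1.0, 3.0, 4.0, 6.0]
value = interpolate(corners, w)
print(w, value)
```

The four weights sum to one, and a linear field is reproduced exactly, which is a quick sanity check for any bilinear implementation.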
OpenClimateGIS Overview
• Python package designed to ease the “localization” and accessibility of high-dimensional scientific datasets
• Primary Features: geospatial subsetting, standardized calculation, bundling, format conversion, access to OPeNDAP datasets.
• Additional dependencies:
  • GDAL, Shapely, Fiona, netCDF4, osgeo
Developed by the NESII Group in association with the NCPP Project under funding provided by the NOAA Climate Program Office.
https://www.earthsystemcog.org/projects/openclimategis/
https://github.com/NCPP/ocgis
ESMPy – OCGIS Integration
• ESMPy and OCGIS have complementary capabilities
  • OCGIS allows access to and manipulation of high resolution data sets
  • ESMPy provides high performance regridding and access to distributed NumPy data
• There are several ways to create an integrated workflow
  • OCGIS can preprocess data files and convert between data formats
  • ESMPy Field object is an output format of OCGIS
  • ESMPy can read OCGIS outputs (NetCDF) in parallel, for high performance regridding
  • OCGIS offers serial regridding using ESMPy
• Parallel processing requires clever use of integrated capabilities
  • OCGIS is implemented and used in single processor mode
  • ESMPy is fully parallel IF objects are created in parallel
  • Conversion between serial and distributed objects is next
Integrated Workflow Example
1: Preprocess files using OCGIS (subsetting)
2: Read distributed ESMPy objects
3: Compute and apply regridding weights
4: Write parallel object to files for use by downstream applications
** Green text indicates steps that can be done in serial or parallel
Note: the ESMF command line application allows parallel regrid weight generation with file-based output in a single step
Figure: a data file distributed across objects on processor IDs 0, 1, 2, 3 (on read and on write)
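The figure shows one data file spread over processor IDs 0 to 3. A minimal sketch of how such a split could look is a contiguous block decomposition of an index range; the function name is hypothetical, and the decomposition ESMF actually chooses may differ.

```python
def block_bounds(n_items, n_procs, rank):
    """Return the half-open index range [start, stop) owned by `rank`.

    The first (n_items % n_procs) ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Hypothetical example: 10 grid rows shared by 4 processors.
n_rows = 10
bounds = [block_bounds(n_rows, 4, rank) for rank in range(4)]
print(bounds)
```

Each processor then reads and regrids only its own block, which is why objects must be created in parallel for the workflow to scale.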
Supported Data Conventions
ESMPy grid files use the following standard data file formats:
• Climate and Forecast (CF) grid conventions
  • UGRID – candidate CF convention for unstructured grids [4], used to represent grids with arbitrary polygons with no gaps
  • GRIDSPEC – accepted CF convention for logically rectangular grids [5]
• SCRIP – Spherical Coordinate Remapping and Interpolation Package [6]
  • Legacy format for 2D logically rectangular or 2D unstructured grids
• ESMF
  • Custom format for unstructured grids, more efficient storage than SCRIP or CF when used with ESMF codes
OCGIS has a rich set of conversion routines between the following:
• CF grid conventions (above)
• Shapefile – geospatial vector data format used by GIS software [7]
• CSV – comma separated value
Interfaces
ESMPy has objects for data (Field) and underlying distribution (Grid/Mesh):
• Grid - logically rectangular discretization object
    grid = ESMF.Grid(filename="gridspec.nc", filetype=ESMF.FileFormat.GRIDSPEC)
    grid = ESMF.Grid(max_index=numpy.array([7,8,9]), coord_sys=ESMF.CoordSys.CART)
• Mesh - unstructured mesh discretization object
    mesh = ESMF.Mesh(filename="ugrid.nc", filetype=ESMF.FileFormat.UGRID)
• Field – data object built on a grid or mesh with optional mask
  • derived type of numpy.ndarray
    field = ESMF.Field(dstgrid, "dstfield", meshloc=ESMF.MeshLoc.ELEMENT, ndbounds=[1, 365, 1])
OCGIS has a very compact interface for a wide range of capabilities:
    ops = ocgis.OcgOperations(dataset=rd, geom=path_ugid_shp, select_ugid=select_ugid,
                              agg_selection=True, prefix='subset_nc', output_format='nc',
                              add_auxiliary_files=False)
Regridding
    r1to2 = Regrid(field1, field2, regrid_method=RegridMethod.CONSERVE)
where:
    f(phi,theta) = 2 + cos(theta)**2 * cos(2*phi)
Source grid: fv1.9x2.5_050503.nc - 1.9x2.5 CAM finite volume grid
Destination grid: wr50a_090614.nc - Regional 205x275 grid
Mean relative error = 3.19E-03
Maximum relative error = 1.93E-02
Conservation error = 7.11E-15
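The three numbers reported on this slide can be computed along the following lines. The slide does not define the metrics, so the definitions here are assumptions: pointwise relative error against the analytic field, and conservation error as the relative difference between area-weighted totals. All sample values are made up for illustration.

```python
import math


def analytic_field(phi, theta):
    """The field from the slide: f(phi, theta) = 2 + cos(theta)**2 * cos(2*phi)."""
    return 2.0 + math.cos(theta) ** 2 * math.cos(2.0 * phi)


def relative_errors(regridded, exact):
    """Return (mean, max) of |regridded - exact| / |exact| (assumed definition)."""
    rel = [abs(r - e) / abs(e) for r, e in zip(regridded, exact)]
    return sum(rel) / len(rel), max(rel)


def conservation_error(src, src_area, dst, dst_area):
    """Relative difference between area-weighted totals (assumed definition)."""
    src_mass = sum(v * a for v, a in zip(src, src_area))
    dst_mass = sum(v * a for v, a in zip(dst, dst_area))
    return abs(dst_mass - src_mass) / abs(src_mass)


# Hypothetical sample points and a perturbed "regridded" result.
points = [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi / 4, math.pi / 3)]
exact = [analytic_field(phi, theta) for phi, theta in points]
regridded = [exact[0] * 1.01, exact[1] * 0.98, exact[2]]

mean_err, max_err = relative_errors(regridded, exact)
cons_err = conservation_error([1.0, 3.0], [2.0, 2.0], [2.0], [4.0])
print(mean_err, max_err, cons_err)
```

The analytic field never drops below 1, so dividing by it is safe; a field that crosses zero would need a different normalization.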
Conservative Regridding
• Conservative regridding is important in Earth system modeling to preserve the total integral of a field throughout the operation (e.g. water content)
• The algorithm used by ESMF computes interpolation weights between cell i on the source grid and cell j on the destination grid using:
    w_ij = f_ij * A_i / A_j
  where f_ij is the fraction of the source cell contributing to the destination cell and A_i and A_j are the relative areas of the source and destination cells.
• Options exist for:
  • Using internally computed (default) or user supplied areas
  • Computing areas and distances using great-circle (default) or straight line distances on the surface of the sphere
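A one-dimensional sketch shows why weights of the form f_ij * A_i / A_j preserve the integral. Here cell "areas" are interval lengths and f_ij is the overlap length divided by the source cell length. The helper names and grids are hypothetical, and real ESMF regridding computes overlaps of polygons on the sphere.

```python
def overlap(a0, a1, b0, b1):
    """Length of the intersection of intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def conservative_weights(src_edges, dst_edges):
    """Return weights[j][i] = f_ij * A_i / A_j for 1D cells given by their edges."""
    weights = []
    for d0, d1 in zip(dst_edges, dst_edges[1:]):
        area_j = d1 - d0
        row = []
        for s0, s1 in zip(src_edges, src_edges[1:]):
            area_i = s1 - s0
            f_ij = overlap(s0, s1, d0, d1) / area_i
            row.append(f_ij * area_i / area_j)
        weights.append(row)
    return weights


def regrid(weights, src):
    """Apply the dense weight matrix to the source values."""
    return [sum(w * s for w, s in zip(row, src)) for row in weights]


def total(values, edges):
    """Integral of a piecewise-constant field."""
    return sum(v * (b - a) for v, a, b in zip(values, edges, edges[1:]))


src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
dst_edges = [0.0, 1.5, 4.0]
src = [1.0, 2.0, 3.0, 4.0]

dst = regrid(conservative_weights(src_edges, dst_edges), src)
print(dst, total(src, src_edges), total(dst, dst_edges))
```

Both totals agree as long as the two grids cover the same interval; where they do not, only the overlapping part of the field is conserved.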
Enabling Hydrological Studies
• Hydrological impact studies can be improved when forced with data from climate models; hydrological feedbacks can affect climate
• A technology and scale gap exists:
  • Many hydrological models have limited scalability, run on desktop computers, and have watershed-sized domains
  • Many climate models are highly parallel, run on high performance supercomputers and have global domains
• However, scales are slowly converging (e.g. high resolution climate models, hydrological systems of greater extent)
  • This convergence gives scientists opportunities to explore new coupled model configurations and modes of coupling
  • It gives programmers opportunities to develop tools to handle this coupling interface
High Resolution Data
Task: Subset high resolution climate precipitation data to local scale and then regrid to catchment basins
Source data: CF formatted precipitation data file for the continental United States on a logically rectangular grid (nldas_met_update.obs.daily.pr.1990.nc)
Output: Multi-dimensional precipitation values (including time) on a subset of catchment basins in the region of interest after conservative regridding
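The subsetting step of this task can be pictured with a simple bounding-box selection on a logically rectangular grid. This is a plain-Python sketch with hypothetical names and values; it is not the OCGIS API, which subsets against geometries such as shapefile polygons.

```python
def subset_bbox(lons, lats, values, bbox):
    """Keep cells whose centers fall inside bbox = (min_lon, min_lat, max_lon, max_lat).

    values[j][i] is the field at (lats[j], lons[i]).
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    cols = [i for i, lon in enumerate(lons) if min_lon <= lon <= max_lon]
    rows = [j for j, lat in enumerate(lats) if min_lat <= lat <= max_lat]
    sub_values = [[values[j][i] for i in cols] for j in rows]
    return [lons[i] for i in cols], [lats[j] for j in rows], sub_values


# Hypothetical 3 x 4 grid with value 10*j + i at row j, column i.
lons = [-110.0, -105.0, -100.0, -95.0]
lats = [35.0, 40.0, 45.0]
values = [[10 * j + i for i in range(len(lons))] for j in range(len(lats))]

sub_lons, sub_lats, sub_values = subset_bbox(lons, lats, values, (-106.0, 38.0, -99.0, 46.0))
print(sub_lons, sub_lats, sub_values)
```

Subsetting before regridding keeps the source grid small, so weight generation only has to consider cells near the catchments of interest.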
High Performance Results
• Test done on IBM iDataPlex (yellowstone) with 128 and 256 cores
• Source grid has 2,647,454 elements with up to 58,396 nodes
• Weight file generation takes minutes, application takes seconds
Figure: Conservative regridding result with CONUS NHDPlus catchments using exact solution
Status and Future Work
• Both ESMPy and OCGIS are in production and fully supported
• Upcoming development:
  • Read and write ESMF formatted weight files
  • Write ESMF Fields in parallel
  • Seamless conversions between serial and distributed objects in ESMPy
  • Python 3 support
Requirements, Supported Platforms, Limitations, etc.
Requirements:
• ESMPy:
  • Python 2.6, 2.7
  • NumPy 1.6.1/2 (ctypes)
  • ESMF installation (with NetCDF)
• OCGIS (additional dependencies):
  • netCDF4
  • Shapely
  • Fiona
  • osgeo
Testing:
• Nightly regression testing
• Travis CI integration
Supported Platforms:
• ESMPy: Linux, Darwin, and Cray; Gfortran; OpenMP
• OCGIS: Linux, Darwin, Windows
Installation:
• ESMPy: python setup.py build --ESMFMKFILE=<path_to_esmf.mk> install
• OCGIS: python setup.py install
• Both: conda install -c conda-forge esmpy ocgis
Selected Users
• UV-CDAT (PCMDI) – Ultrascale Visualization Climate Data Analysis Tools
• cfpython (University of Reading) – Implementation of the CF data model for reading, writing and processing of data and metadata
• Iris (Met Office) – Python library for visualizing meteorological and oceanographic data sets
• PyFerret (NOAA) – Python based interactive visualization and analysis environment
• Community Surface Dynamics Modeling System (CU-Boulder) – Tools for hydrological and other surface modeling processes
• OCGIS – climate4impact portal (IS-ENES): Tools for climate modelers to tailor high resolution climate data
• OCGIS – ClimatePipes (Kitware): User-friendly data access, manipulation, analysis and visualization of community climate models
Contact Us!
Email: esmf_support@list.woc.noaa.gov or ocgis_support@list.woc.noaa.gov
Website: https://earthsystemcog.org/projects/esmpy/ or https://earthsystemcog.org/projects/openclimategis/
References:
[1] Khoei S.A., Gharehbaghi A.R., The superconvergent patch recovery technique and data transfer operators in 3D plasticity problems. Finite Elements in Analysis and Design, 43(8), 2007.
[2] Hung K.C., Gu H., Zong Z., A modified superconvergent patch recovery method and its application to large deformation problems. Finite Elements in Analysis and Design, 40(5-6), 2004.
[3] Ramshaw J.D., Conservative rezoning algorithm for generalized two-dimensional meshes. Journal of Computational Physics, 59, 1985.
[4] UGRID documentation: https://github.com/ugrid-conventions/ugrid-conventions, accessed Dec. 19, 2014.
[5] GridSpec whitepaper: https://ice.txcorp.com/trac/modave/wiki/CFProposalGridspec, accessed Dec. 19, 2014.
[6] Jones P.W., SCRIP: A Spherical Coordinate Remapping and Interpolation Package. http://www.acl.lanl.gov/climate/software/SCRIP. Los Alamos National Laboratory Software Release LACC 98-45.
[7] Shapefile whitepaper: http://www.esri.com/library/whitepapers/pdfs/shapefile.pdf, accessed Dec. 19, 2014.
ESMPy Regridding
Plotting the solution with matplotlib shows error on the order of 10^-7
OCGIS Utilities https://github.com/NCPP/ocgis/blob/master/examples/ipynb/USGS-Tech-Stack-20151114.ipynb
ctypes bindings to ESMF
• ESMPy is connected to ESMF using ctypes bindings to the C interface
Interfacing with ctypes:
    _ESMF.ESMC_GridGetCoord.restype = ctypes.POINTER(ctypes.c_void_p)
    _ESMF.ESMC_GridGetCoord.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_uint,
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        numpy.ctypeslib.ndpointer(dtype=numpy.int32),
                                        ctypes.POINTER(ctypes.c_int)]
    gridCoordPtr = _ESMF.ESMC_GridGetCoord(grid.struct.ptr, coordDim, staggerloc,
                                           exclusiveLBound, exclusiveUBound,
                                           ctypes.byref(lrc))
    # adjust bounds to be 0 based
    exclusiveLBound = exclusiveLBound - 1
Allocating NumPy array buffers for memory allocated in ESMF:
    buffer = numpy.core.multiarray.int_asbuffer(
        ctypes.addressof(pointer.contents),
        numpy.dtype(ESMF2PythonType[self.type]).itemsize*size)
    array = numpy.frombuffer(buffer, ESMF2PythonType[self.type])
Switching between Fortran and C array striding:
    array = numpy.reshape(array, self.size_local[stagger], order='F')
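The striding switch in the last step can be illustrated without ESMF or NumPy. In this sketch a ctypes array stands in for memory owned by the Fortran/C library, and a hypothetical helper reads it in column-major (Fortran) order into row-major nested lists, which is the reordering that reshape with order='F' performs.

```python
import ctypes

# Stand-in for library-owned memory: 6 doubles stored in column-major
# (Fortran) order for a 2 x 3 array.
c_buffer = (ctypes.c_double * 6)(1.0, 2.0, 3.0, 4.0, 5.0, 6.0)
pointer = ctypes.cast(c_buffer, ctypes.POINTER(ctypes.c_double))


def fortran_to_nested(ptr, n_rows, n_cols):
    """Read a column-major buffer into row-major nested lists.

    Element (i, j) of a Fortran-ordered array lives at offset i + j * n_rows.
    """
    return [[ptr[i + j * n_rows] for j in range(n_cols)] for i in range(n_rows)]


array = fortran_to_nested(pointer, 2, 3)
print(array)
```

Unlike the NumPy buffer approach on the slide, this sketch copies the values; wrapping the memory in place avoids the copy but requires the library to keep the allocation alive while the array is in use.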