Best Practices to Promote Data Interoperability
Chris Lynnes, Joe Glassy
Technology Infusion Working Group
Outline
• Data interoperability: what and why?
• Factors affecting data interoperability
• Implementations that support interoperability
What is Data Interoperability?
Data interoperability exists when data users can work with (view, analyze, process, etc.) a data provider's science data or model output "transparently": without reformatting the data, writing special tools to read or extract it, or relying on specific proprietary software.
"Quicker data usability, easier portability, more transparency." (S. Volz)
Illustration: Panoply
• Dataset comparison
  • North American Regional Reanalysis (NARR) from NCDC
  • Atmospheric Infrared Sounder (AIRS) from GES DISC
• Procedure
  • Copy and paste the NARR OPeNDAP URL into Panoply
  • Double-click a variable to display it
  • Repeat for AIRS
What good is data interoperability?
• It makes it easier to write tools that work with many datasets...
• ...which increases the ability to work with multiple datasets together...
• ...and promotes user satisfaction and good early experiences with your (my, our) data...
• ...which enhances a dataset's life-cycle economics.
Factors affecting data interoperability
There is no single path to interoperability…
File Formats
• Standard formats
  • More economical to develop general tools
    • Format is well documented
    • APIs* exist
  • Many datasets enabled by one set of code modules
• "Self-describing" formats
  • Contain embedded metadata to interpret the content, context, and/or structure of the file
*APIs: Application Programming Interfaces
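Because standard formats are publicly documented, one small piece of code can recognize many datasets. As an illustration that is not from the slides, the Python sketch below identifies a few standard formats from their published leading "magic" bytes. It assumes the signature sits at the very start of the file (HDF5 files with a user block place it at a later offset, which this sketch does not handle); the function names are illustrative.

```python
# Sketch: recognize a few standard formats from their documented file signatures.
# Assumption: the signature is at offset 0 (HDF5 user blocks are not handled).

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # netCDF-4 files are HDF5 files, so they match too
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"
NETCDF_CLASSIC_PREFIX = b"CDF"          # followed by one version byte
GRIB_SIGNATURE = b"GRIB"

NETCDF_VERSIONS = {1: "netCDF classic", 2: "netCDF 64-bit offset", 5: "netCDF CDF-5"}


def sniff_format(header: bytes) -> str:
    """Return a format label for the first bytes of a file."""
    if header.startswith(HDF5_SIGNATURE):
        return "HDF5 (or netCDF-4)"
    if header.startswith(HDF4_SIGNATURE):
        return "HDF4"
    if header.startswith(NETCDF_CLASSIC_PREFIX) and len(header) >= 4:
        # header[3] is the version byte (an int when indexing bytes)
        return NETCDF_VERSIONS.get(header[3], "unknown")
    if header.startswith(GRIB_SIGNATURE):
        return "GRIB"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough bytes from a local file to check its signature."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

Real tools go further and open the file through the format's own library, but the point stands: the documented format, not the individual dataset, is what the code has to know.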
File Structures
• Coordinates: where are they stored, and how are they named?
  • Latitude, longitude
  • Vertical dimension: altitude, pressure, sigma level, depth, ...
  • Time
• Flat vs. hierarchical
• Simple vs. complex
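The flat-vs.-hierarchical distinction matters to tools that look variables up by short name. The sketch below models a hierarchical file as nested Python dicts (a hypothetical stand-in, not a real file API; the group and variable names are made up), flattens it into full paths, and reports short names that occur in more than one group, which is exactly the case that confuses flat-namespace tools.

```python
from collections import defaultdict

# Hypothetical layout: nested dicts are groups, anything else is a variable
# (here represented only by its shape).
EXAMPLE_LAYOUT = {
    "Geolocation": {"Latitude": (45, 30), "Longitude": (45, 30)},
    "Data": {"Temperature": (45, 30), "Latitude": (45, 30)},
}


def flatten(group: dict, prefix: str = "") -> dict:
    """Map full paths such as '/Data/Temperature' to variable values."""
    flat = {}
    for name, item in group.items():
        path = f"{prefix}/{name}"
        if isinstance(item, dict):
            flat.update(flatten(item, path))  # descend into a subgroup
        else:
            flat[path] = item
    return flat


def short_name_collisions(flat: dict) -> dict:
    """Return short names that appear under more than one full path."""
    by_short = defaultdict(list)
    for path in flat:
        by_short[path.rsplit("/", 1)[-1]].append(path)
    return {name: sorted(paths) for name, paths in by_short.items() if len(paths) > 1}
```

For the example layout, `short_name_collisions(flatten(EXAMPLE_LAYOUT))` reports that `Latitude` lives under both `/Data` and `/Geolocation`; a tool that only sees "Latitude" cannot tell which one is meant.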
Usage Metadata
• Inside the file vs. in a separate file
  • A separate file is easy for users to lose
  • Embedded metadata is a key benefit of self-describing formats
• Variable-level metadata
  • Units
  • Fill value
  • Scale / offset
• File-level metadata
  • Standards (e.g., CF-1, HDF-EOS, ISO 19115)
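Fill value and scale/offset are the variable-level metadata a reader must apply before the numbers mean anything. In the CF and netCDF conventions these are carried as the attributes `scale_factor`, `add_offset`, and `_FillValue` (the slide does not name them), and the fill value is compared against the packed, stored value. A minimal sketch, assuming the attribute values have already been read from the file; the sample numbers are made up.

```python
import math
from typing import List, Optional, Sequence


def unpack(packed: Sequence[int], scale_factor: float = 1.0, add_offset: float = 0.0,
           fill_value: Optional[int] = None) -> List[float]:
    """CF-style unpacking: value = packed * scale_factor + add_offset.

    Packed values equal to fill_value are treated as missing and returned as NaN.
    """
    out = []
    for p in packed:
        if fill_value is not None and p == fill_value:
            out.append(math.nan)  # missing data: do not scale the fill value
        else:
            out.append(p * scale_factor + add_offset)
    return out
```

With hypothetical temperatures stored as 16-bit integers, `unpack([0, 100, -32767], 0.01, 273.15, -32767)` returns 273.15, approximately 274.15, and NaN. If the fill value lived in a separate, lost file, the last value would silently become 273.15 - 327.67.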
Grids
• Common grids enable dataset comparison, merging, etc.
• Reprojection from one grid to another usually loses information
• Tradeoff:
  • the most appropriate grid for a dataset vs. ...
  • ...the grid most commonly used in the "community"
• Keep in mind that the potential user community may be much broader than you think
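Why regridding usually loses information can be shown with a toy case: coarsen a grid by block averaging, then map it back to the original resolution. This sketch uses simple equal-weight Cartesian cells (it ignores map projections, interpolation schemes, and cell-area weighting on a sphere), so it only illustrates the round-trip loss and is not a regridding method to use on real data.

```python
def block_average(grid, factor):
    """Coarsen a 2-D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [grid[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def replicate(grid, factor):
    """Refine a grid by copying each cell into a factor x factor block."""
    return [
        [value for value in row for _col_copy in range(factor)]
        for row in grid
        for _row_copy in range(factor)
    ]
```

For the 4 × 4 grid `[[1, 3, 0, 0], [5, 7, 0, 0], [0, 0, 2, 2], [0, 0, 2, 2]]`, `block_average(grid, 2)` gives `[[4.0, 0.0], [0.0, 2.0]]`. Replicating that back fills the whole top-left block with 4.0, so the original values 1, 3, 5, and 7 cannot be recovered.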
Names and Units
• Variable names
  • Standard names (CF-1)
  • Unique names within a file
    • Some tools have difficulty with hierarchies that have variables with the same name in different branches
• Dimension / coordinate names
  • Latitude, longitude, time, altitude/pressure
• Unit names
  • Standard units
  • Unit conversion
    • Note that converting between altitude and pressure requires additional information
• Filenames
  • Descriptive filenames: dataset, version, data date/time…
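A descriptive filename lets a program recover the dataset, version, and data date without opening the file. The sketch below parses the AIRS granule name used in the OPeNDAP URL examples at the end of this deck. The field layout is inferred from that single example (an assumption; other products follow other patterns), and the trailing "G…" field is kept as an opaque string because the slides do not say what it encodes.

```python
import re
from datetime import date

# Field layout inferred from one example AIRS granule name; not a general rule.
GRANULE_NAME = re.compile(
    r"^(?P<instrument>[A-Z]+)\."
    r"(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})\."
    r"(?P<granule>\d{3})\."
    r"(?P<level>L\d)\."
    r"(?P<product>[A-Za-z]+)\."
    r"v(?P<version>\d+(?:\.\d+)*)\."
    r"(?P<stamp>G\d+)\."
    r"(?P<ext>\w+)$"
)


def parse_granule_name(name: str) -> dict:
    """Split a descriptive granule filename into its fields."""
    m = GRANULE_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized filename: {name}")
    f = m.groupdict()
    return {
        "instrument": f["instrument"],
        "data_date": date(int(f["year"]), int(f["month"]), int(f["day"])),
        "granule": int(f["granule"]),
        "level": f["level"],
        "product": f["product"],
        "version": f["version"],
        "stamp": f["stamp"],  # left undecoded: meaning not given in the source
        "format": f["ext"],
    }
```

For `AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf` this yields data date 2010-10-12, granule 90, product `RetStd`, and version `5.2.2.0`. Any renaming of the file breaks this kind of parsing, which is why filenames alone are a fragile identifier.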
Sidebar: Data Identifiers
• Filenames, even descriptive ones, may not be completely reliable as unique identifiers
• Identifiers should ideally be embedded within the data file
• Uniquely identifying datasets and data files helps:
  • Catalog interoperability
  • Transparency / provenance
  • Citation metrics
• See Ruth Duerr's talk for recommendations on unique identifiers for datasets and granules
• Future tools may use these embedded identifiers to look up references, get related data, ...
CF-1
• Climate and Forecast (CF) metadata conventions
• Popular in the modeling community
  • Being extended to point and satellite data
• Coordinate system: key for tool usage
  • Latitude + longitude
    • Specifications for both regular L3 grids and L2 swaths
  • Time, vertical
  • Recognizable via units (e.g., "degrees_north")
• Standard variable names: key for model incorporation
• Most often associated with netCDF
  • Also applicable in OPeNDAP
  • Work is underway to apply it to HDF5
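"Recognizable via units" means a generic tool can find the coordinates without knowing a dataset's variable names. The sketch below guesses a coordinate's axis from its attributes the way the CF conventions describe: latitude and longitude from their unit strings, time from a "<unit> since <reference>" unit, and vertical from pressure units or a `positive` attribute. The pressure-unit set is a small illustrative subset (real CF checkers use a units library such as UDUNITS), so treat this as a simplification.

```python
import re
from typing import Optional

LAT_UNITS = {"degrees_north", "degree_north", "degree_N", "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E", "degrees_E", "degreeE", "degreesE"}
# Illustrative subset only; a real check would ask a units library whether
# the units are convertible to pressure.
PRESSURE_UNITS = {"Pa", "hPa", "kPa", "mbar", "millibar", "bar", "atm"}
TIME_UNITS = re.compile(r"^\s*[A-Za-z]+\s+since\s+\S")


def guess_axis(attrs: dict) -> Optional[str]:
    """Guess which axis a coordinate variable represents from its attributes."""
    units = str(attrs.get("units", "")).strip()
    if units in LAT_UNITS:
        return "latitude"
    if units in LON_UNITS:
        return "longitude"
    if TIME_UNITS.match(units):
        return "time"
    if units in PRESSURE_UNITS or attrs.get("positive") in ("up", "down"):
        return "vertical"
    return None  # not identifiable from these attributes alone
```

A variable named `y` with units `degrees_north` and one named `lat` with the same units are treated identically, which is what lets a single tool plot both NARR and AIRS fields.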
OPeNDAP
• Open-Source Project for a Network Data Access Protocol
• Client-server framework
  • Standard web (HTTP GET) request syntax
  • Remote, fine-grained access to data files
• Presents a standard data model and "format" to clients
• Supports multiple formats on the back end
  • HDF, netCDF, ASCII, GRIB, binary
• Multiple server implementations
  • Hyrax, THREDDS, ERDDAP, GDS, Dapper, PyDAP, TSDS...
• Client support in many tools
  • IDV, McIDAS-V, GrADS, Matlab, IDL, Ferret, Panoply
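Because an OPeNDAP request is just a URL, fine-grained access can be scripted by string assembly: the dataset URL, an optional response suffix (such as `.ddx` or `.ascii`), and an optional constraint of the form `VARIABLE[start:stride:stop]...`. The sketch below rebuilds the AIRS request URLs listed in the OPeNDAP URL examples at the end of this deck. The helper names are hypothetical, it performs no network access, and some HTTP clients may need the square brackets percent-encoded before sending.

```python
from typing import Sequence, Tuple

# (start, stride, stop); stop is inclusive in DAP2 hyperslab syntax
Slice = Tuple[int, int, int]


def hyperslab(slices: Sequence[Slice]) -> str:
    """Render index ranges as '[start:stride:stop]' for each dimension."""
    return "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)


def opendap_url(dataset_url: str, response: str = "", variable: str = "",
                slices: Sequence[Slice] = ()) -> str:
    """Append an optional response suffix and constraint to a dataset URL."""
    url = dataset_url + (f".{response}" if response else "")
    if variable:
        url += "?" + variable + hyperslab(slices)
    return url
```

With the AIRS granule URL as `dataset_url`, `opendap_url(url, "ddx")` gives the XML metadata request, `opendap_url(url, "ascii", "H2OMMRStd", [(0, 1, 44), (0, 1, 29), (4, 1, 5)])` gives the ASCII slice request, and `opendap_url(url)` is the plain access URL that IDV or Panoply would take.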
Web Coverage Service
• Open Geospatial Consortium (OGC) protocol
• Client-server framework
  • Standard web (HTTP GET) request syntax
  • Multiple response formats, including GeoTIFF, netCDF/CF-1, and HDF-EOS
  • Includes spatial subsetting
• BUT:
  • Client support is still nascent outside the GIS community
  • Some data types are difficult or impossible to fit into WCS (e.g., limb-scanning profiles)
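A WCS request with spatial subsetting is likewise a GET URL built from key-value parameters. The sketch below assembles a GetCoverage request using the WCS 1.0.0 key-value parameter names (later WCS versions, such as 2.0, use different parameters). The endpoint, coverage name, and bounding box are hypothetical, and the accepted `FORMAT` strings vary by server, so check the server's GetCapabilities/DescribeCoverage responses before relying on them.

```python
from urllib.parse import urlencode


def wcs_getcoverage_url(endpoint: str, coverage: str, bbox, width: int, height: int,
                        fmt: str = "GeoTIFF", crs: str = "EPSG:4326") -> str:
    """Build a WCS 1.0.0 GetCoverage URL for a lon/lat box.

    bbox is (min_lon, min_lat, max_lon, max_lat) in the given CRS.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),  # spatial subset
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # Keep commas and colons readable; everything else is percent-encoded.
    return endpoint + "?" + urlencode(params, safe=",:")
```

For example, `wcs_getcoverage_url("https://example.org/wcs", "hypothetical_coverage", (-130, 20, -60, 55), 700, 350)` requests a GeoTIFF of a North American box; a GIS client would issue essentially the same request behind the scenes.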
Semantic Web
• Enables machine recognition of:
  • names
  • relationships
• Effective for:
  • metadata
  • small ASCII data
• Using the semantic web to make Earth science data interoperable is still experimental
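The core idea, machine-readable names and relationships, can be sketched with subject-predicate-object triples. This toy uses an in-memory set of tuples in place of RDF and a real ontology, and every dataset and variable identifier in it is hypothetical (only the CF standard names `air_temperature` and `humidity_mixing_ratio` are real). It shows how a program could discover that two differently named variables measure the same quantity.

```python
# Hypothetical triples: (subject, predicate, object). A real system would use
# RDF with shared vocabularies and a query language such as SPARQL.
TRIPLES = {
    ("datasetA:temp", "hasStandardName", "cf:air_temperature"),
    ("datasetB:T_air", "hasStandardName", "cf:air_temperature"),
    ("datasetB:h2o", "hasStandardName", "cf:humidity_mixing_ratio"),
    ("datasetA:temp", "hasUnits", "K"),
    ("datasetB:T_air", "hasUnits", "degC"),
}


def objects(subject, predicate, triples=TRIPLES):
    """All objects linked to subject by predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def equivalent_variables(subject, triples=TRIPLES):
    """Other variables that share a standard name with subject."""
    names = objects(subject, "hasStandardName", triples)
    return sorted(
        s for s, p, o in triples
        if p == "hasStandardName" and o in names and s != subject
    )
```

Here `equivalent_variables("datasetA:temp")` finds `datasetB:T_air`, and the `hasUnits` relationship then tells the program that a K-to-degC conversion is needed before comparing them.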
Data Tools for Use with Interoperable Data
• Panoply: http://www.giss.nasa.gov/tools/panoply/
• IDV: http://www.unidata.ucar.edu/software/idv/
• McIDAS-V: http://www.ssec.wisc.edu/mcidas/software/v/
• GrADS: http://www.iges.org/grads/
• Ferret: http://ferret.wrc.noaa.gov/Ferret/
Summary
• Data users benefit from data interoperability
  • More tools available to handle more datasets
• Consider format, structure, grids, metadata, and naming
• If interoperability cannot be built in at data production, some tools (OPeNDAP, WCS, semantic web) can compensate...
• ...IF the metadata and information content of the data are sufficient
References
• Practical Data Interoperability for Earth Scientists: http://www.esdswg.org/techinfusion/downloads/pdies/view
• Recommendations for Data Level Interoperability: http://tiwg.wik.is/Interoperability/Interoperability_Recommendations
• HDF: http://www.hdfgroup.org/
• HDF-EOS: http://hdfeos.org/
• netCDF: http://www.unidata.ucar.edu/software/netcdf/
• OPeNDAP: http://www.opendap.org
• CF-1: http://cf-pcmdi.llnl.gov/
• Web Coverage Service: http://en.wikipedia.org/wiki/Web_Coverage_Service
OPeNDAP URL examples
• Get metadata in XML:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ddx
• Get a data slice in ASCII:
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ascii?H2OMMRStd[0:1:44][0:1:29][4:1:5]
• Data access URL for clients (IDV, Panoply):
  http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf
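It helps to know how much data a constraint selects before sending it. Each `[start:stride:stop]` range includes its stop index, so a dimension contributes (stop - start) // stride + 1 values, and the ASCII request for `H2OMMRStd[0:1:44][0:1:29][4:1:5]` selects 45 × 30 × 2 = 2,700 values. The sketch below parses such a constraint and does that arithmetic; it handles only the simple numeric hyperslab form shown in these examples, and the function names are illustrative.

```python
import re
from math import prod

HYPERSLAB = re.compile(r"\[(\d+):(\d+):(\d+)\]")


def parse_constraint(query: str):
    """Split 'VAR[start:stride:stop]...' into the variable name and index ranges."""
    name = query.split("[", 1)[0]
    slices = [tuple(int(x) for x in groups) for groups in HYPERSLAB.findall(query)]
    return name, slices


def count_values(slices) -> int:
    """Number of values selected; stop is inclusive in DAP2 hyperslabs."""
    return prod((stop - start) // stride + 1 for start, stride, stop in slices)


def constraint_from_url(url: str) -> str:
    """Return the part of a request URL after '?', or '' if there is none."""
    return url.split("?", 1)[1] if "?" in url else ""
```

Applied to the ASCII example above, `parse_constraint(constraint_from_url(url))` returns `("H2OMMRStd", [(0, 1, 44), (0, 1, 29), (4, 1, 5)])`, and `count_values` on those ranges gives 2700.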