HDF-EOS 2/5 to netCDF Converter. Bob Bane, Richard Ullman, Jingli Yang Data Usability Group NASA/Goddard Space Flight Center. Introduction. Status report Properties of netCDF and HDF-EOS Conversion strategy. Status Report. Last year - hdfeos52netcdf HDF-EOS 5 -> netCDF COARDS compatible
HDF-EOS 2/5 to netCDF Converter Bob Bane, Richard Ullman, Jingli Yang Data Usability Group NASA/Goddard Space Flight Center
Introduction • Status report • Properties of netCDF and HDF-EOS • Conversion strategy
Status Report • Last year - hdfeos52netcdf • HDF-EOS 5 -> netCDF • COARDS compatible • Current • Uses he25 interoperability library, so does both HDF-EOS 2 and 5 • CF compatible
Data Formats and Conventions • Generic data containers • HDF, netCDF • Conventions for domain-specific metadata • HDF-EOS, COARDS/CF • HDF -> HDF-EOS • netCDF -> COARDS/CF
netCDF • netCDF files contain: • Variables • multi-dimensional arrays of basic data types (character/integer/float) • Dimensions • named sizes for dimensions of variables • Attributes • named one-dimensional arrays • properties of variables
netCDF Conventions • Metadata is stored in attributes • Conventions for names: “units” • Coordinate vector • Variable with the same name as a dimension • Value is a vector of same size as the dimension • Is a mapping between (0,1,2…) dimension indexing and physical quantities for dimension
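The coordinate-vector convention can be illustrated with plain Python containers. This is only a sketch of the idea, not netCDF library code: the dictionary layout, the helper names, and the latitude values are all hypothetical.

```python
# Hypothetical in-memory stand-ins for a netCDF file's dimensions and variables.
dimensions = {"lat": 4}

# A coordinate variable has the same name as a dimension and the same length.
variables = {
    "lat": {
        "dims": ("lat",),
        "units": "degrees_north",          # metadata lives in attributes
        "data": [-45.0, -15.0, 15.0, 45.0],
    },
}


def is_coordinate_variable(name, variables, dimensions):
    """True if `name` is a 1-D variable named after its own dimension."""
    var = variables.get(name)
    return (
        var is not None
        and var["dims"] == (name,)
        and len(var["data"]) == dimensions.get(name)
    )


def physical_value(dim, index, variables):
    """Map a 0-based dimension index to the physical quantity it stands for."""
    return variables[dim]["data"][index]
```

Here `physical_value("lat", 2, variables)` looks up the latitude that index 2 along the `lat` dimension represents.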
COARDS Conventions • Cooperative Ocean/Atmospheric Research Data Service • Conventions for use of netCDF • Order of dimensions for variables • Names of attributes (“units”, “_FillValue”) • Coordinate variables • http://ferret.wrc.noaa.gov/noaa_coop/coop_cdf_profile.html
CF Conventions • Climate and Forecast • Follow-on to COARDS • Tighter • Many attributes optional in COARDS are required in CF • More capable • Multi-dimensional geolocation support • http://www.cgd.ucar.edu/cms/eaton/cf-metadata/
HDF • Hierarchical Data Format • HDF files contain: • Datasets • multi-dimensional arrays of basic data types • Dimensions • Named sizes of dataset dimensions • Groups • Named groups of datasets (and groups) • Attributes • Named properties of datasets and groups, similar to netCDF
HDF-EOS • Conventions and API for HDF • HDF-EOS files contain: • Fields (datasets) • Points • Individually geolocated measurements • Swaths • Groups of data and geolocation fields, and mappings between them • Grids • Groups of data fields with rectilinear geolocation
HDF-EOS (cont) • HDF-EOS 2 over HDF4 • HDF-EOS 5 over HDF5 • HDF5 very different from HDF4 • HDF-EOS 2 and 5 have nearly identical APIs • Our he25 library allows uniform access to HDF-EOS 2/5, so converter works for both • Looks/works like HDF-EOS 5 • On HDF-EOS 2 (HDF4) files, translates in/out
Observations • HDF-EOS is “bigger” than netCDF • Additional structured metadata (ODL) • HDF-EOS API calls for geolocation • netCDF file ~= HDF-EOS Swath/Grid • Both are groups of related datasets
Conversion Strategies • One HDF-EOS file -> one netCDF file • Alternative is one Swath/Grid -> one file • COARDS/CF - if the original HDF-EOS file followed the conventions, the converted netCDF file will too • Most HDF-EOS data producers are aware of COARDS/CF • Skip HDF-EOS Point datasets • Reconsider this if real-world Point data emerges
Conversion Strategies (cont) • Convert data to enable future processing • Geolocation data, attributes (units) • Other metadata less important • Could transfer ODL metadata as a string, but why? • Can always go back to the original file and use good HDF-EOS tools
Conversion in General
HDF-EOS:
  Swath s1
    Dimensions(lat,lon,time)
    Datafield f1(lat,lon,time)
    Geofield f2(lat,lon,time)
  Swath s2
    Dimensions(lat,lon,time)
    Datafield f3(lat,lon,time)
    Geofield f4(lat,lon,time)
netCDF:
  Dimensions(lat,lon,time,s2_time)
  Variable s1_f1(lat,lon,time)
  Variable s1_GEO_f2(lat,lon,time)
  Variable s2_f3(lat,lon,s2_time)
  Variable s2_GEO_f4(lat,lon,s2_time)
• Flatten HDF-EOS hierarchy • Encode names, types in variable names
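The flattening on this slide can be sketched in a few lines of Python. This is not the converter's code: the input layout and the function name `flatten_swaths` are hypothetical, the dimension sizes are made up, and the rule that a dimension is renamed to `<swath>_<dim>` only when its size clashes with an already-defined dimension is an assumption inferred from the `s2_time` dimension in the example.

```python
def flatten_swaths(swaths):
    """Flatten {swath: {"dims", "data", "geo"}} into netCDF-style
    (dimensions, variables) dictionaries."""
    dimensions = {}  # netCDF dimension name -> size
    variables = {}   # netCDF variable name -> tuple of dimension names

    for swath_name, swath in swaths.items():
        rename = {}  # swath-local dimension name -> netCDF dimension name
        for dim, size in swath["dims"].items():
            if dim not in dimensions:
                dimensions[dim] = size
                rename[dim] = dim
            elif dimensions[dim] == size:
                rename[dim] = dim  # same name and size: share the dimension
            else:
                # Assumed rule: a size clash gets a swath-qualified name.
                qualified = f"{swath_name}_{dim}"
                dimensions[qualified] = size
                rename[dim] = qualified

        # Data fields become <swath>_<field>; geofields become <swath>_GEO_<field>.
        for field, dims in swath["data"].items():
            variables[f"{swath_name}_{field}"] = tuple(rename[d] for d in dims)
        for field, dims in swath["geo"].items():
            variables[f"{swath_name}_GEO_{field}"] = tuple(rename[d] for d in dims)

    return dimensions, variables


# Hypothetical sizes; only s2's time dimension differs from s1's.
swaths = {
    "s1": {"dims": {"lat": 10, "lon": 20, "time": 5},
           "data": {"f1": ("lat", "lon", "time")},
           "geo": {"f2": ("lat", "lon", "time")}},
    "s2": {"dims": {"lat": 10, "lon": 20, "time": 8},
           "data": {"f3": ("lat", "lon", "time")},
           "geo": {"f4": ("lat", "lon", "time")}},
}
dimensions, variables = flatten_swaths(swaths)
```

With these inputs the result has the same dimension and variable names as the netCDF side of the slide.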
Swaths
HDF-EOS:
  Swath s2
    Dimensions(lat,glat,lon,glon,time)
    DimensionMap(lat, glat, 0, 1)
    DimensionMap(lon, glon, 0, 1)
    Datafield f3(lat,lon,time)
    Geofield f4(glat,glon,time)
netCDF:
  Dimensions(lat,glat,lon,glon,time,s2_time)
  Attributes:
    s2_DimensionMap: “lat/glat, lon/glon”
    s2_DMOffsets: (0,0)
    s2_DMIncrements: (1,1)
  Variable s2_f3(lat,lon,s2_time)
    Attributes: coordinates: “s2_GEO_f4”
  Variable s2_GEO_f4(glat,glon,s2_time)
• Swath name, geofield type encoded in variable names • Record dimension map in global attributes
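The sketch below shows one way the dimension maps could be packed into the three global attributes, and how an offset/increment pair relates the two index spaces. The function names are hypothetical. The relation `data index = offset + increment * geolocation index` is an assumption about how HDF-EOS dimension maps with positive increments are read; check it against the HDF-EOS documentation before relying on it.

```python
def dimension_map_attributes(swath_name, dim_maps):
    """Encode (data_dim, geo_dim, offset, increment) tuples as the three
    global attributes used on the slide."""
    return {
        f"{swath_name}_DimensionMap": ", ".join(
            f"{data_dim}/{geo_dim}" for data_dim, geo_dim, _, _ in dim_maps
        ),
        f"{swath_name}_DMOffsets": tuple(off for _, _, off, _ in dim_maps),
        f"{swath_name}_DMIncrements": tuple(inc for _, _, _, inc in dim_maps),
    }


def data_index(geo_index, offset, increment):
    """Assumed mapping from a geolocation index to a data index
    (positive increments only)."""
    return offset + increment * geo_index


attrs = dimension_map_attributes(
    "s2", [("lat", "glat", 0, 1), ("lon", "glon", 0, 1)]
)
```

With offset 0 and increment 1, as on the slide, the mapping is the identity, so every data element has its own geolocation element.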
Grids
HDF-EOS:
  Grid g1
    Dimensions(lat,lon,time)
    Corners(upleft, upright, lowleft, lowright)
    Datafield f1(lat,lon,time)
netCDF:
  Dimensions(lat,lon,time)
  Variable lat(lat) = (lowright,…upright)
  Variable lon(lon) = (lowleft, … upleft)
  Variable g1_f1(lat,lon,time)
• Grid geolocation becomes coordinate variables
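A coordinate variable that runs from one corner value to another can be filled in by linear interpolation. The sketch below assumes a regular geographic grid whose coordinate values are evenly spaced with both endpoints included. The slide does not say how the converter spaces the values, and the corner coordinates and grid sizes used here are hypothetical.

```python
def coordinate_vector(start, stop, size):
    """Return `size` evenly spaced values from `start` to `stop`, inclusive."""
    if size == 1:
        return [start]
    step = (stop - start) / (size - 1)
    return [start + i * step for i in range(size)]


# Hypothetical corner values for a global grid with 5 cells per axis.
lat = coordinate_vector(-90.0, 90.0, 5)
lon = coordinate_vector(-180.0, 180.0, 5)
```

The resulting lists would be written as the `lat(lat)` and `lon(lon)` variables, which lets netCDF tools geolocate `g1_f1` without calling the HDF-EOS API.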
Converter • C command-line application • hdfeos2netcdf HDF_file netCDF_file • Should be portable to all HDF-EOS5/netCDF platforms • Naturally, depends on all of the underlying libraries
Where is the Software? • http://hdfeos.gsfc.nasa.gov • ‘Tools’ category • System ‘hdfeos2netcdf’
Big Picture
HDF-EOS:
  File Attributes
    fa1: “fa value”
  Swath s1
    Attributes: sa1: “sa value”
    Dimensions(lat,lon,time)
    Datafield f1(lat,lon,time)
    Geofield f2(lat,lon,time)
  Swath s2
    Dimensions(lat,lon,time)
    Datafield f3(lat,lon,time)
    Geofield f4(lat,lon,time)
netCDF:
  File Attributes:
    fa1: “fa value”
    s1_sa1: “sa value”
  Dimensions(lat,lon,time,s2_time)
  Variable s1_f1(lat,lon,time)
  Variable s1_GEO_f2(lat,lon,time)
  Variable s2_f3(lat,lon,s2_time)
  Variable s2_GEO_f4(lat,lon,s2_time)
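The attribute handling in this picture amounts to copying file attributes unchanged and prefixing each swath attribute with its swath name. A minimal sketch, using a hypothetical function name and dictionary layout rather than anything from the converter itself:

```python
def flatten_attributes(file_attrs, swath_attrs):
    """Merge file-level and per-swath attributes into one set of
    netCDF global attributes."""
    merged = dict(file_attrs)  # file attributes keep their names
    for swath_name, attrs in swath_attrs.items():
        for name, value in attrs.items():
            # Swath attributes are qualified as <swath>_<attribute>.
            merged[f"{swath_name}_{name}"] = value
    return merged


global_attrs = flatten_attributes(
    {"fa1": "fa value"},
    {"s1": {"sa1": "sa value"}, "s2": {}},
)
```

The prefix keeps attributes from different swaths from colliding once the hierarchy has been flattened into a single netCDF file.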