NetCDF-4 Interoperability with HDF4 and HDF5
Ed Hartnett, Unidata, 8/4/9
Purpose of Interoperability Features: World Conquest
• The purpose of the interoperability features is to let users run netCDF programs on non-netCDF data archives.
• NetCDF-Java can already read many data formats; the idea is to bring some of this functionality to the C/Fortran/C++ libraries.
Warning and Request
• The HDF4 and HDF5 interoperability features are still being tested. They are not ready for operational use yet.
• The interoperability features are available in the netCDF daily snapshot release.
• Please use them and send feedback to: support-netcdf@unidata.ucar.edu
Overview
• HDF4 Interoperability
  • What is HDF4 and why bother with it?
  • Reading HDF4 files with netCDF.
  • Limitations and request for help.
• HDF5 Interoperability
  • What is HDF5 and why bother with it?
  • Reading HDF5 files with netCDF.
  • Limitations.
What is HDF4?
• The original HDF format, superseded by HDF5.
• HDF4 has built-in 32-bit limits that make it unattractive for new data sets. It is still actively supported by The HDF Group, but no new features are being added.
• Get more info about HDF4 at: http://www.hdfgroup.org/products/hdf4
Why Read HDF4?
• Some important data sets are distributed in HDF4, for example the Aqua/Terra satellite data.
HDF4 Background
• HDF4 has several different APIs. The one of greatest interest to netCDF users is the SD (Scientific Data) API.
• The SD API is (intentionally) very similar to the netCDF classic data model.
Confusing: HDF4 Includes a NetCDF v2 API
• HDF4 ships with an implementation of the netCDF v2 API that writes SD data files.
• This must be turned off when HDF4 is installed if netCDF and HDF4 are to be linked in the same application.
• There is no easy way to use both HDF4's netCDF v2 API and netCDF's HDF4 read capability in the same program.
Reading HDF4 SD Files
• Starting with version 4.1, netCDF can read HDF4 files created with the “Scientific Dataset” (SD) API.
• This is read-only: netCDF can't write HDF4!
• The intention is to make netCDF software work automatically with important HDF4 scientific data collections.
Building NetCDF to Read HDF4
• This feature is only available when netCDF is also built with HDF5.
• HDF4, HDF5, zlib, and other compression libraries must be installed before netCDF is built.
• Build like this:
./configure --with-hdf5=/home/ed --enable-hdf4
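A fuller build sequence might look like the following sketch (install paths are placeholders; the CPPFLAGS/LDFLAGS settings are the usual autoconf way to point configure at HDF4 headers and libraries in non-system locations):

CPPFLAGS=-I/home/ed/include LDFLAGS=-L/home/ed/lib \
./configure --with-hdf5=/home/ed --enable-hdf4
make check install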
Compiling with HDF4
• Include the netCDF header file as usual.
• Include the locations of the netCDF, HDF5, and HDF4 include directories:
-I/loc/of/netcdf/include -I/loc/of/hdf5/include -I/loc/of/hdf4/include
Linking with HDF4
• The HDF4 and HDF5 libraries (and their associated libraries) must be linked into all netCDF applications. The locations of the lib directories must also be provided:
-L/loc/of/netcdf/lib -L/loc/of/hdf5/lib -L/loc/of/hdf4/lib
-lmfhdf -ldf -ljpeg -lhdf5_hl -lhdf5 -lz
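Putting these flags together, a complete compile-and-link command might look like this sketch (myapp.c and all paths are placeholders):

cc -o myapp myapp.c \
   -I/loc/of/netcdf/include -I/loc/of/hdf5/include -I/loc/of/hdf4/include \
   -L/loc/of/netcdf/lib -L/loc/of/hdf5/lib -L/loc/of/hdf4/lib \
   -lnetcdf -lmfhdf -ldf -ljpeg -lhdf5_hl -lhdf5 -lz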
Use nc-config to Help with Compile Flags
• The nc-config utility is provided to help with compiler flags:
$ ./nc-config --cflags
-I/usr/local/include
$ ./nc-config --libs
-L/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4
$ ./nc-config --flibs
-M/usr/local/lib -lnetcdf -L/machine/local/lib -lhdf5_hl -lhdf5 -lz -lm -lhdf4
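In a Makefile or build script, the nc-config output can be captured with ordinary shell command substitution, for example (myapp.c is a placeholder):

cc -o myapp myapp.c $(nc-config --cflags) $(nc-config --libs)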
Implementation Notes
• You don't need to identify the file as HDF4 when opening it with netCDF, but you do have to open it read-only (see the sketch below).
• The HDF4 SD API provides named, shared dimensions, which fit easily into the netCDF model.
• The HDF4 SD API uses other HDF4 APIs (like vgroups) to store metadata. This can be confusing when using the HDF4 data dumping tool, hdp.
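A minimal sketch of the read-only rule (FILE_NAME names a hypothetical HDF4 SD file; ERR is the error-handling macro used in the code slides below):

int ncid;
/* Asking for write access on an HDF4 file is expected to fail. */
if (nc_open(FILE_NAME, NC_WRITE, &ncid) == NC_NOERR) ERR;
/* Read-only access works; no flag is needed to identify the file as HDF4. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_close(ncid)) ERR;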
C Code to Read HDF4 SD File

/* Create a file with one SDS, containing our phony data. */
sd_id = SDstart(FILE_NAME, DFACC_CREATE);
sds_id = SDcreate(sd_id, PRES_NAME, DFNT_INT32, DIMS_2, dim_size);
SDwritedata(sds_id, start, NULL, edge, (void *)data_out);
if (SDendaccess(sds_id)) ERR;
if (SDend(sd_id)) ERR;

/* Now open with netCDF and check the contents. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in, &unlimdim_in)) ERR;
...
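Continuing the sketch above, the SDS appears as an ordinary netCDF variable, so its values could be read back with the usual netCDF inquiry and access calls (data_in is a hypothetical buffer sized to match dim_size):

/* Look up the variable under its SDS name and read its data. */
if (nc_inq_varid(ncid, PRES_NAME, &varid)) ERR;
if (nc_get_var_int(ncid, varid, data_in)) ERR;
if (nc_close(ncid)) ERR;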
ncdump and HDF4 SD Files
• With HDF4 reading enabled, ncdump works on HDF4 files.
• Sample MODIS file:
../ncdump/ncdump -h MOD29.A2000055.0005.005.2006267200024.hdf
netcdf MOD29.A2000055.0005.005.2006267200024 {
dimensions:
	Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice = 406 ;
	Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice = 271 ;
	Along_swath_lines_1km\:MOD_Swath_Sea_Ice = 2030 ;
	Cross_swath_pixels_1km\:MOD_Swath_Sea_Ice = 1354 ;
variables:
	float Latitude(Coarse_swath_lines_5km\:MOD_Swath_Sea_Ice, Coarse_swath_pixels_5km\:MOD_Swath_Sea_Ice) ;
		Latitude:long_name = "Coarse 5 km resolution latitude" ;
		Latitude:units = "degrees" ;
...
HDF-EOS Not Understood
• Many HDF4 data sets of interest follow the HDF-EOS metadata standard.
• Stored as a long text string in global attributes, the HDF-EOS metadata looks messy:
// global attributes:
		:HDFEOSVersion = "HDFEOS_V2.9" ;
		:StructMetadata.0 = "GROUP=SwathStructure\n\tGROUP=SWATH_1\n\t\tSwathName=\"MOD_Swath_Sea_Ice\"\n\t\tGROUP=Dimension\n\t\t\tOBJECT=Dimension_1\n\t\t\t\tDimensionName=\"Coarse_swath_lines_5km\"\n\t\t\t\tSize=406\n\t\t\tEND_OBJECT=Dimension_1\n\t\t\tOBJECT=Dimension_2\n\t\t\t\tDimensionName=\"Coarse_swath_pixels_5km\"\n\t\t\t\tSize=271\n\t\t\t...
HDF4 Read Testing
• Tested in libsrc4/tst_interops2.c, which creates some HDF4 files with the SD API, and then reads them with netCDF.
• If --enable-hdf4-file-tests is used with netCDF configure, some Aqua/Terra satellite data files are downloaded from the Unidata FTP site, then read by libsrc4/tst_interops3.c.
HDF4 Interoperability Limitations
• The file must be opened read-only.
• Only HDF4 SD data files are currently understood.
• This feature cannot be used at the same time as HDF4's netCDF v2 API, because HDF4 steals the netCDF v2 API function names. So you must use --disable-netcdf when building HDF4. (It might also work to build netCDF with --disable-v2.)
Future HDF4 Work
• More tests.
• Support for HDF4 image types.
• Test support for compressed data.
• Add some support for HDF-EOS metadata in the libcf library, using the HDF-EOS toolkit.
Request for User Help: What Data to Read?
• Please send me pointers to scientifically important HDF4 datasets.
• The intention is not to read every HDF4 dataset, just those of wide scientific interest.
Contribute Code to Write HDF4?
• Some programmers use the netCDF v2 API to write HDF4 files.
• It would not be too hard to write the glue code to allow v2 API -> HDF4 output from the netCDF library.
• The next step would be to allow netCDF v3/v4 API code to write HDF4 files.
• Writing HDF4 seems to be a low priority for our users. I would be happy to help any user who would like to undertake this task.
What is HDF5?
• HDF5 is an extremely general data storage format with many advanced features: on-the-fly compression, parallel I/O, a rich data model, etc.
• Starting with netCDF-4.0, netCDF has been able to use HDF5 as a storage layer, exposing some of these advanced features.
• But until version 4.1, netCDF-4 could only understand HDF5 files that were created with netCDF-4.
Why Read HDF5 Files?
• Many important datasets are available in HDF5 format, including data from the Aqua satellite.
Rules for Reading HDF5 Files
• NetCDF-4.1 provides read-only access to existing HDF5 files, as long as they do not violate some rules:
  • The file must not use a circular group structure.
  • The HDF5 reference type (and some other obscure types) is not understood.
• Write access is still only possible with netCDF-4/HDF5 files.
HDF5 Version 1.8 Background
• In version 1.8, HDF5 introduced “dimension scales” as a way of supporting shared dimensions (see the sketch below).
• Also in version 1.8, HDF5 introduced ordering by creation, rather than alphabetical ordering.
• But most data providers don't use these features; they use HDF5 1.6 instead.
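For reference, creating a shared dimension with the HDF5 1.8 dimension scale API looks roughly like this (a hedged sketch using the H5DS high-level routines from hdf5_hl.h; lat_datasetid and pres_datasetid are hypothetical handles to a 1D coordinate dataset and a data dataset):

/* Turn the 1D dataset into a dimension scale named "lat"... */
if (H5DSset_scale(lat_datasetid, "lat") < 0) ERR;
/* ...and attach it to dimension 0 of the data variable. */
if (H5DSattach_scale(pres_datasetid, lat_datasetid, 0) < 0) ERR;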
NetCDF-4.1 Relaxes Some Restrictions for HDF5 Files
• Before netCDF-4.1, HDF5 files had to use creation ordering and dimension scales in order to be understood by netCDF-4.
• Starting with netCDF-4.1, read-only access is possible to HDF5 files with alphabetical ordering and no dimension scales (created with HDF5 1.6, perhaps).
• An HDF5 file may have dimension scales for all dimensions, or for no dimensions, but not for just some of them.
HDF5 C Code to Write HDF5 File

/* Create file. */
if ((fileid = H5Fcreate(FILE_NAME, H5F_ACC_TRUNC, H5P_DEFAULT,
                        H5P_DEFAULT)) < 0) ERR;

/* Create the space for the dataset. */
dims[0] = LAT_LEN;
dims[1] = LON_LEN;
if ((pres_spaceid = H5Screate_simple(DIMS_2, dims, dims)) < 0) ERR;

/* Create a variable. It will not have dimension scales. */
if ((pres_datasetid = H5Dcreate(fileid, PRES_NAME, H5T_NATIVE_FLOAT,
                                pres_spaceid, H5P_DEFAULT)) < 0) ERR;

if (H5Dclose(pres_datasetid) < 0 ||
    H5Sclose(pres_spaceid) < 0 ||
    H5Fclose(fileid) < 0) ERR;
NetCDF C Code to Read HDF5 File

/* Read the data with netCDF. */
if (nc_open(FILE_NAME, NC_NOWRITE, &ncid)) ERR;
if (nc_inq(ncid, &ndims_in, &nvars_in, &natts_in, &unlimdim_in)) ERR;
if (ndims_in != 2 || nvars_in != 1 || natts_in != 0 ||
    unlimdim_in != -1) ERR;
if (nc_close(ncid)) ERR;
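Before the nc_close above, one could also check that the HDF5 dataset shows up under its own name as a netCDF variable; nc_inq_varname is the standard inquiry call for this (name_in is a local buffer, PRES_NAME as in the writing code):

char name_in[NC_MAX_NAME + 1];
if (nc_inq_varname(ncid, 0, name_in)) ERR;
if (strcmp(name_in, PRES_NAME)) ERR;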
Future Plans for HDF5 Interoperability
• More testing.
• Proper handling of reference types. This will (probably) require an extension of the netCDF APIs.
• Better handling of strange group structures, if this proves necessary to read important data.
Summary
• With the 4.1 release, the netCDF C/Fortran/C++ libraries allow read-only access to some existing HDF4 and HDF5 data archives.
• The intention is not to develop a completely general translation, but to focus on datasets of significance to the Earth science community.
• Write capability is quite possible, but we don't plan on providing it, because demand for it is low.