Deutscher Wetterdienst. Data Formats and Tools. R.W. Mueller, R.Hollmann, C.Träger-Chatterjee. Content. Overview HDF5 netCDF Binary ASCII Conclusion. Overview. New data formats have been developed
Deutscher Wetterdienst Data Formats and Tools R.W. Mueller, R.Hollmann, C.Träger-Chatterjee
Content • Overview • HDF5 • netCDF • Binary • ASCII • Conclusion
Overview • New data formats have been developed • better handling of manifold information provided by satellite data, reanalysis or model data • optimise computing performance (IO-process) • reduce disk space needed • Requirements to the data format: storage of … • … the data itself with high resolution in space-time • Different data layers possible • … the meta-information, e.g. • Calibration coefficients • Geolocation and projection • Statistical error information • Gain and offset • Whatever the operator would like to add as meta information
Overview – the favourites • For satellite data two formats are important. They are different but related, and each has an associated data model • HDF: Hierarchical Data Format • netCDF: network Common Data Form • Further formats for satellite data: • HRIT: raw data format, not discussed here (the focus is on products) • Specific binary formats: always possible, but no common data model • ASCII: no data model, rarely used
HDF – Hierarchical Data Format • HDF5 - general purpose library and file format for storing scientific data • Create and store almost any kind of scientific data structure • e.g. images, arrays of vectors, structured and unstructured grids, … • one can also mix and match different data formats in HDF5 files • Efficient storage and I/O • created to address the data management needs of high performance, data intensive computing environments • As a result, library and format emphasize storage and I/O efficiency (especially on parallel machines), including file compression
HDF – Hierarchical Data Format • The most recent version is HDF5, but a lot of data are still in HDF4 format. • Both are machine independent (no big / little endian problem) • Information, tools, examples and the HDF software (library) are available at http://hdf.ncsa.uiuc.edu/HDF5 and http://hdf.ncsa.uiuc.edu/hdf4.html • Widely used, e.g.: • MODIS (HDF4) • Eumetsat, e.g. all SAFs (HDF5)
HDF command line tools • No backward compatibility • many HDF5 command line tools and interfaces (e.g. those used in Fortran 90 or C programs) cannot be used for HDF4 files. • h5dump - displays the contents of the HDF5 file as ASCII • h5ls - lists the contents of a file; enables fast checks whether the needed data is in there • h5import - imports ASCII data into HDF5 • a configuration file is needed, hence some basic knowledge of the HDF data model and structure is required
HDF5 as ASCII using h5dump • Common data model, but in detail it can look quite different. Comments (red on the original slide) are marked with "<-".
HDF5 "TRS_SR_20040708_1200_V000.hdf" {                        <- filename
GROUP "/" {                                                    <- definition of a group
   GROUP "Data" {
      DATASET "TRS" {                                          <- definition of the dataset
         DATATYPE H5T_STD_I16BE                                <- definition of the data type
         DATASPACE SIMPLE { ( 3712, 3712 ) / ( 3712, 3712 ) }  <- the dimension
         DATA {                                                <- the data
         (0,0): -32767, -32767, -32767, -32767, -32767, -32767, -32767,
         (0,7): -32767, -32767, -32767, -32767, -32767, -32767, -32767,….
         (883,707): 495, 455, 436, 436, 378, 323, 378, 416, 342, 277, 296, ……
         }
         ATTRIBUTE "Gain" { …..                                <- definition of attributes, continued on the next slide
HDF5 as ASCII using h5dump
ATTRIBUTE "Gain" { ….                  <- Gain and offset are used to reduce the needed disk space (possible to save data as integer)
   DATATYPE H5T_IEEE_F32BE
   DATASPACE SCALAR
   DATA {
   (0): 0.25
   }
}
ATTRIBUTE "Offset" { ….. DATATYPE and DATASPACE ….
   DATA {
   (0): 0
   }
}
ATTRIBUTE "nodatavalue" { …. DATATYPE and DATASPACE ……   <- attributes are also used for unit, title, …
   DATA {
   (0): -32767
   }
HDF5 as ASCII using h5dump
GROUP "Geolocation" {                  <- definition of a new group, and the dataset needed to define the projection
   DATASET "projection" {
      DATATYPE H5T_COMPOUND {
         H5T_STRING {
            STRSIZE 128;
            STRPAD H5T_STR_NULLTERM;
            CSET H5T_CSET_ASCII;
            CTYPE H5T_C_S1;
         } "reference ellipsoid";
         H5T_ARRAY { [10] H5T_IEEE_F32LE } "parameter";
      }
      DATASPACE SIMPLE { ( 1 ) / ( 1 ) }
      DATA {
      (0): {
            "geostationary view",
            "WGS-84",
            [ 1856, 1856, 667.204, 667.204, -1, -1, -1, -1, -1, -1 ]
         }
      }
   }
   DATASET "region" {                  <- a group usually consists of different datasets
   }
}
}
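The Gain, Offset and nodatavalue attributes in the dump above suggest how the stored 16-bit integers relate to physical values. The following Python sketch illustrates one common convention, assuming a linear scaling (physical = count * gain + offset); the slides do not state the formula explicitly, so the function names and the scaling direction are assumptions for illustration only.

```python
# Values taken from the h5dump attributes shown above.
GAIN = 0.25
OFFSET = 0.0
NODATA = -32767


def to_physical(count, gain=GAIN, offset=OFFSET, nodata=NODATA):
    """Convert a stored integer count to a physical value.

    Assumes linear scaling: physical = count * gain + offset.
    Returns None for the no-data marker.
    """
    if count == nodata:
        return None
    return count * gain + offset


def to_count(value, gain=GAIN, offset=OFFSET, nodata=NODATA):
    """Pack a physical value into a 16-bit integer count (inverse of to_physical)."""
    if value is None:
        return nodata
    count = round((value - offset) / gain)
    # -32767 is reserved for no-data, so valid counts start at -32766.
    if not -32766 <= count <= 32767:
        raise ValueError("value does not fit into a 16-bit integer")
    return count


# A few counts copied from the DATA section of the dump.
row = [-32767, 495, 455, 436]
physical = [to_physical(c) for c in row]
print(physical)
```

With a gain of 0.25, each value occupies 2 bytes instead of the 4 bytes of a 32-bit float, at the cost of a resolution limited to 0.25 units.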
HDF GUI Tools -HDFView- • The complex data model might act as a deterrent for beginners • The graphical user interface HDFView overcomes this handicap. It is a tool for browsing and editing HDF4 and HDF5 files using a GUI • Relatively easy to install and available for many platforms, e.g. Windows, Solaris, AIX, Linux • Everything can be managed with buttons and mouse clicks • Data can be saved as an ASCII table • Images can be generated and saved. • http://www.hdfgroup.org/hdf-java-html/hdfview/index.html
HDF Tools – CMSAF GUI • Software available for CM-SAF customers via www.cmsaf.eu • Features: • visualisation of CM-SAF products (in HDF5 format) • simple data analysis • Export (ASCII, lat/lon grid) • Uses free IDL Virtual Machine
HDF Tools CM-SAF GUI More on this topic in the exercise session
netCDF • Information, tools, examples and the netCDF library are available at: http://www.unidata.ucar.edu/software/netcdf/ • Widely used, e.g.: • Reanalysis data of the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ERA40) • HOAPS, Hamburg Ocean Atmosphere Parameters and Fluxes from Satellite Data • CM-SAF selected monthly means
netCDF command line tools • ncdump - shows the contents of the netCDF file as ASCII • ncgen - converts ASCII to netCDF (ncdump does the reverse) • sounds easy, but a configuration file (CDL file) is needed • some basic knowledge of the netCDF data model and structure is required • however, easier to handle for beginners than HDF5 • example of an ASCII CDL configuration file:
netCDF as ASCII
netcdf SRBmm200604 {
dimensions:
   lat = 501 ;
   lon = 741 ;
   time = UNLIMITED ; // (0 currently)
variables:
   float lat(lat) ;
      lat:long_name = "latitude" ;
      lat:units = "degree" ;
   float lon(lon) ;
      lon:long_name = "longitude" ;
      lon:units = "degrees" ;
   float Z(lat, lon) ;
      Z:units = "Watt" ;
      Z:valid_range = 0., 1400. ;
data:
   lat = 35, 35.05, 35.1, 35.15, 35.2, 35.25, 35.3, 35.35, 35.4, … ;
   lon= 44,45,….;
   Z=300,340,…;
}
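Writing a CDL file by hand is tedious for real grids, so it can help to generate the text. The Python sketch below builds CDL with the same layout as the example above for a tiny grid; the function name make_cdl, the dataset name "example" and all numbers are hypothetical and serve only to illustrate the structure.

```python
def make_cdl(name, lats, lons, values, units):
    """Return CDL text for one float variable Z(lat, lon).

    `values` is a flat list in row-major order (lat varies slowest).
    """
    if len(values) != len(lats) * len(lons):
        raise ValueError("values must hold len(lats) * len(lons) entries")

    def join(seq):
        return ", ".join(repr(float(v)) for v in seq)

    lines = [
        f"netcdf {name} {{",
        "dimensions:",
        f"   lat = {len(lats)} ;",
        f"   lon = {len(lons)} ;",
        "variables:",
        "   float lat(lat) ;",
        '      lat:long_name = "latitude" ;',
        "   float lon(lon) ;",
        '      lon:long_name = "longitude" ;',
        "   float Z(lat, lon) ;",
        f'      Z:units = "{units}" ;',
        "data:",
        f"   lat = {join(lats)} ;",
        f"   lon = {join(lons)} ;",
        f"   Z = {join(values)} ;",
        "}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical 2 x 3 grid, for illustration only.
cdl_text = make_cdl(
    "example",
    lats=[35.0, 35.05],
    lons=[44.0, 45.0, 46.0],
    values=[300, 340, 320, 310, 305, 330],
    units="Watt",
)
print(cdl_text)
```

Saved to a file such as example.cdl, this text should be convertible with `ncgen -o example.nc example.cdl`, and `ncdump example.nc` should print it back as ASCII.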
netCDF Tools – Integrated Data Viewer (IDV) • Free GIS tool • Display data / generate maps • Imports netCDF
netCDF GUI Tool CDAT • Open source integrated environment for data analysis and visualisation. • Mainly netCDF, but can also deal with GRIB and HDF. • Import of binary and ASCII data possible. • Available for different platforms but not for Windows!
Binary Data • Usually used instead of ASCII to reduce disk space and to increase computing performance • Machine-readable format, not readable by humans • Usually files with or without a header, with the data stored as a defined data type, e.g. float (2.44) or integer (4) • Reading and writing with e.g. C, C++, Fortran • Formats are not standardised: individual read / write routines are needed • Some tools can read and visualise binary data, e.g. CDAT, GrADS, IDL • Data is not self-explanatory: the length of the header and the data type have to be known
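The point that header length and data type have to be known in advance can be illustrated with a short Python sketch. The file layout used here (an 8-byte header holding rows and columns as two 32-bit integers, followed by big-endian 32-bit floats) is a made-up example, not the layout of any real product; every name and constant is an assumption.

```python
import os
import struct
import tempfile

HEADER_BYTES = 8   # assumed layout: two 32-bit integers (rows, columns)
BYTE_ORDER = ">"   # assumed big-endian; "<" would mean little-endian


def write_grid(path, rows, cols, values):
    """Write the hypothetical format: header, then rows * cols floats."""
    with open(path, "wb") as f:
        f.write(struct.pack(BYTE_ORDER + "2i", rows, cols))
        f.write(struct.pack(BYTE_ORDER + f"{rows * cols}f", *values))


def read_grid(path):
    """Read the file back; this only works because the layout is known."""
    with open(path, "rb") as f:
        rows, cols = struct.unpack(BYTE_ORDER + "2i", f.read(HEADER_BYTES))
        count = rows * cols
        payload = f.read(4 * count)  # 4 bytes per 32-bit float
        values = struct.unpack(BYTE_ORDER + f"{count}f", payload)
    return rows, cols, list(values)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    write_grid(path, 2, 3, [2.5, 4.0, 0.0, -1.0, 8.25, 16.0])
    rows, cols, values = read_grid(path)

    # Reading the same bytes with the wrong byte order gives nonsense,
    # which is the big / little endian problem that HDF and netCDF avoid.
    with open(path, "rb") as f:
        wrong_rows = struct.unpack("<i", f.read(4))[0]

print(rows, cols, values)
print(wrong_rows)
```

Nothing in the file itself says that the header is 8 bytes long or that the values are floats, so a reader with a wrong assumption gets wrong numbers without any error message.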
Binary Data and ASCII • Examples for binary data: • International Satellite Cloud Climatology Project (ISCCP). http://isccp.giss.nasa.gov • AVHRR based USGS land use maps.
ASCII • readable with a text editor • a rather uncommon format • sometimes provided by the data centre for subsets of the data on request, e.g. CM-SAF. Example:
2006 9 27 6 0 71.93
2006 9 27 6 15 109.75
2006 9 27 6 30 73.28
2006 9 27 6 45 96.04
2006 9 27 7 0 84.16
2006 9 27 7 15 91.51
2006 9 27 7 30 110.54
2006 9 27 7 45 122.44
2006 9 27 8 0 166.66
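An ASCII table like the one above can be read with a few lines of Python. The sketch assumes the columns are year, month, day, hour, minute and a measured value; the slide does not label the columns, so this interpretation is a guess based on the numbers.

```python
from datetime import datetime

# First three lines of the sample above.
SAMPLE = """\
2006 9 27 6 0 71.93
2006 9 27 6 15 109.75
2006 9 27 6 30 73.28
"""


def parse_line(line):
    """Split one line into (timestamp, value).

    Assumed column order: year month day hour minute value.
    """
    year, month, day, hour, minute, value = line.split()
    timestamp = datetime(int(year), int(month), int(day), int(hour), int(minute))
    return timestamp, float(value)


records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for timestamp, value in records:
    print(timestamp.isoformat(), value)
```

Because ASCII carries no data model, the column meaning and the unit of the value have to come from separate documentation.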
Conclusion • HDF5 • Header, describing the data. Data in binary format • HDFView, CM-SAF GUI • Official format of CM-SAF data • netCDF • Header, describing the data. Less cryptic than HDF5. Data in binary format • Various GIS tools, e.g. ArcView, Integrated Data Viewer, CDAT • On demand some CM-SAF data can be provided in netCDF. • Binary • Instead of ASCII, to reduce disk space • ASCII • Readable with a text editor.