HDF5 library and tools Kent Yang The HDF Group ESDSWG SPG Oct. 21, 2010 9th ESDSWG meeting
Why HDF5? • HDF4 shortcomings • Limits on object and file size (<2GB) • Limited number of objects (<20K) • I/O performance • Code complexity
HDF5 • Recognized by communities • HDF-EOS5 and netCDF-4 built on HDF5 • Widely used by many organizations • 2002 R&D 100 Award • Not compatible with HDF4
Some HDF5 Features • Contiguous (default): data elements stored physically adjacent to each other • Chunked: better access time for subsets; extensible • Chunked & Compressed: improves storage efficiency, transmission speed
Accessing data in a contiguous dataset • Dataset has M rows • M seeks are needed to find the starting location of the selected elements in each of the M rows • Data is read/written using M disk accesses • Performance may be very bad
Motivation for chunking storage • Dataset has M rows • Two seeks are needed to find two chunks • Data is read/written using two disk accesses • For this access pattern, chunking helps with I/O performance
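The contrast between the two layouts can be made concrete by counting disk accesses. The sketch below is an illustration only, not HDF5 code: it assumes a row-major dataset, models one seek per contiguous run of selected elements in the contiguous layout and one seek per chunk touched in the chunked layout, and ignores caching and metadata lookups. The function names and the example sizes are hypothetical.

```python
def contiguous_seeks(row_start, row_stop, col_start, col_stop, n_cols):
    """Seeks for a rectangular selection in a row-major contiguous layout.

    Each selected row is one contiguous run, unless the selection spans
    full rows, in which case all rows merge into a single run.
    """
    n_rows = row_stop - row_start
    if n_rows <= 0 or col_stop <= col_start:
        return 0
    full_width = (col_start == 0 and col_stop == n_cols)
    return 1 if full_width else n_rows


def chunked_seeks(row_start, row_stop, col_start, col_stop,
                  chunk_rows, chunk_cols):
    """Seeks for the same selection when every touched chunk is read whole."""
    if row_stop <= row_start or col_stop <= col_start:
        return 0
    chunk_row_span = (row_stop - 1) // chunk_rows - row_start // chunk_rows + 1
    chunk_col_span = (col_stop - 1) // chunk_cols - col_start // chunk_cols + 1
    return chunk_row_span * chunk_col_span


# Hypothetical example: read a single column of a 100 x 100 dataset.
M, N = 100, 100
print("contiguous:", contiguous_seeks(0, M, 10, 11, N))    # one seek per row
print("chunked:   ", chunked_seeks(0, M, 10, 11, 50, 50))  # 50 x 50 chunks
```

With these assumed sizes the column read costs M seeks in the contiguous layout and two in the chunked one, which is the pattern the two slides describe. A selection that follows whole rows would favor the contiguous layout instead.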
Chunking storage • Chunk cache can be used to speed up performance • Chunk size cannot be changed after the dataset is created • Do not make chunk sizes too small (e.g. 1x1): there is metadata overhead for each chunk, each chunk is read individually, and many small reads are inefficient
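To see why very small chunks are costly, it helps to count how many chunks a dataset is split into, since each one carries its own metadata and is read separately. This is a minimal sketch with a hypothetical 1000 x 1000 dataset; `chunk_count` is an illustrative helper, not part of the HDF5 API.

```python
import math


def chunk_count(shape, chunk_shape):
    """Number of chunks needed to cover a dataset.

    Partial chunks at the edges count as whole chunks.
    """
    return math.prod(math.ceil(d / c) for d, c in zip(shape, chunk_shape))


shape = (1000, 1000)  # hypothetical dataset dimensions
for chunk_shape in [(1, 1), (10, 10), (100, 100), (1000, 1000)]:
    print(chunk_shape, chunk_count(shape, chunk_shape))
```

Under these assumptions a 1x1 chunk shape produces one million chunks, while 100x100 produces one hundred.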
HDF5 compression filters • GZIP (deflate) • SZIP: Rice algorithm developed at JPL; good for floating-point numbers; quick decoding time • Shuffle: use with GZIP or SZIP to gain a better compression ratio • Scale + offset: performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits
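The scale + offset idea can be illustrated with plain integer arithmetic. The sketch below is a simplified model of the description above, not the HDF5 filter's actual algorithm or on-disk format: it applies only an offset (the minimum value) and reports how many bits each remaining value needs. The function names and sample values are hypothetical.

```python
def offset_encode(values):
    """Subtract the minimum and report the bits needed per residual."""
    offset = min(values)
    residuals = [v - offset for v in values]
    bits = max(residuals).bit_length()  # 0 when all values are equal
    return offset, bits, residuals


def offset_decode(offset, residuals):
    """Restore the original values by adding the offset back."""
    return [r + offset for r in residuals]


values = [1000, 1003, 1001, 1007]  # hypothetical sample data
offset, bits, residuals = offset_encode(values)
print("offset:", offset, "bits per value:", bits, "residuals:", residuals)
print("round trip ok:", offset_decode(offset, residuals) == values)
```

In this example the values span a range of 7, so each residual fits in 3 bits rather than the 32 bits of a typical integer type.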
HDF5 compression filters • Shuffle: use with GZIP or SZIP to gain a better compression ratio • How shuffling works • Four 32-bit integers: 1, 23, 43, 56 • In hexadecimal form: 0x01, 0x17, 0x2B, 0x38 • Shuffling groups the bytes by their position within each value, so the zero bytes of all four integers end up adjacent and are easy to compress
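The byte regrouping can be reproduced in a few lines. This is a minimal sketch of the shuffle idea applied to the four integers from the slide, assuming little-endian 32-bit storage; `shuffle` and `unshuffle` are illustrative helpers, not the HDF5 library's implementation.

```python
import struct


def shuffle(data: bytes, element_size: int) -> bytes:
    """Put byte 0 of every element first, then byte 1 of every element, ..."""
    if len(data) % element_size:
        raise ValueError("data length must be a multiple of element_size")
    return b"".join(data[i::element_size] for i in range(element_size))


def unshuffle(data: bytes, element_size: int) -> bytes:
    """Inverse of shuffle: reassemble each element from its scattered bytes."""
    n_elements = len(data) // element_size
    return b"".join(data[i::n_elements] for i in range(n_elements))


# The four 32-bit integers from the slide, stored little-endian (assumption).
raw = struct.pack("<4i", 1, 23, 43, 56)
shuffled = shuffle(raw, 4)

print("raw:     ", raw.hex())
print("shuffled:", shuffled.hex())
print("round trip ok:", unshuffle(shuffled, 4) == raw)
```

After shuffling, the four non-zero bytes come first and the twelve zero bytes form a single run, which a general-purpose compressor such as GZIP handles well.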
High Level APIs • Included along with the HDF5 library • Simplify steps for creating, writing, and reading objects • Do not entirely ‘wrap’ the HDF5 library
HDF5 Platforms Supported • Systems: AIX, various Linux, Solaris, Windows, Mac OS, FreeBSD, Cray XT3, OpenVMS • Compilers: IBM C and Fortran; GNU C, gfortran, g95; Intel C and Fortran; PGI C and Fortran; Sun C and Fortran; Windows Visual Studio and Intel Fortran
HDFView • A Java tool that can view and edit HDF5 file contents • URL: http://www.hdfgroup.org/hdf-java-html/hdfview/
HDF5 Command-line tools • h5ls • h5dump • h5repack • h5diff • What these tools can do for you
h5dump • Examines file contents and dumps them to an ASCII or binary file • Structure • Dataset • Binary • XML
h5dump: Object Headers
> h5dump -H SDS.h5
HDF5 "SDS.h5" {
GROUP "/" {
   GROUP "Floats" {
      DATASET "FloatArray" {
         DATATYPE H5T_IEEE_F32LE
         DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
      }
   }
   DATASET "IntArray" {
      DATATYPE H5T_STD_I32LE
      DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
   }
}
}
h5ls • Lists file contents in the style of the Unix ls command
h5ls -r SDS2.h5
/Floats                  Group
/Floats/DoubleArray      Dataset {10, 5}
/Floats/FloatArray       Dataset {4, 3}
/Floats/subs             Group
/IntArray                Dataset {5, 6}
h5repack • Copies a file to a new file with different storage layouts and compression filters • Remove inaccessible objects / junk space • Change storage layout • Apply compression filter
h5repack
• Remove inaccessible objects
   h5repack tools_junk.h5 tmp.h5
• Change layout
   h5repack tools_bad_layout.h5 tmp.h5
   h5repack -l CHUNK=16x16 tools_bad_layout.h5 tmp.h5
• Change compression
   h5repack -f GZIP=6 tmp.h5 tmp2.h5
h5diff • Shows differences between two files or two objects • Like Unix diff • Can apply to an individual dataset or a whole file
Others • h5copy - Copies an object within a file or across files • h5import - Imports binary/ASCII data into an HDF5 file • h5check - Verifies whether an HDF5 file is compliant with the HDF5 file format specification • …
HDF5 Compile Scripts • h5cc: HDF5 C compiler command • h5fc: HDF5 F90 compiler command • h5c++: HDF5 C++ compiler command
To compile:
% h5cc h5prog.c
% h5fc h5prog.f90
Compile option: -show
-show: displays the compiler commands and options without executing them
% h5cc -show Sample_c.c
gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib
Tools under development • h5watch: allows the user to monitor the growth of a dataset; prints out the new elements appended to a dataset • h5edit: new tool for editing an HDF5 file; initial implementation will include creating and deleting attributes
HelpDesk • Send email to help@hdfgroup.org • Requests from NASA: response within 2 days
Update HDF-EOS website • Software: evaluating many packages • Examples: adding examples for many NASA products • Forums: moderating the forum • http://hdfeos.org
NCL/IDL/MATLAB examples • Many examples from different NASA data centers • Example code and plots • URL: http://hdfeos.org/zoo
An example to access AIRS Swath (NCL)
…
data=eos_file->radiances_L2_Standard_cloud_cleared_radiance_product(:,:,0) ; read specific subset of data field
; In order to read the radiances data field from the HDF-EOS2 file, the group
; under which the data field is placed must be appended to the data field in NCL. For more information,
; visit section 4.3.2 of http://hdfeos.org/software/ncl.php.
data@lat2d=eos_file->Latitude_L2_Standard_cloud_cleared_radiance_product ; associate longitude and latitude
data@lon2d=eos_file->Longitude_L2_Standard_cloud_cleared_radiance_product
data@_FillValue=-9999
; …
res@gsnCenterString="radiances at Channel=567"
plot(2)=gsn_csm_contour_map_polar(xwks,data_2,res)
res@gsnCenterString="radiances at Channel=1339"
plot(3)=gsn_csm_contour_map_polar(xwks,data_3,res)
delete(plot) ; cleaning up resources used
delete(data)
HDF5 and CF • HDF5 places no restrictions on creating or adding CF attributes inside an HDF5 file • We will provide example code showing how to add CF attributes to an HDF5 file
More information • About HDF5: http://hdfgroup.org/HDF5 • More HDF5 tutorials: http://hdfeos.org/workshops/ws14/agenda.php
Thank you!