Introduction to HDF5
Barbara Jones, The HDF Group
The 13th HDF & HDF-EOS Workshop (HDF/HDF-EOS Workshop XIII), November 3-5, 2009

Before We Begin …
• HDF-EOS Home Page: http://hdfeos.org/
• Workshop Info: http://hdfeos.org/workshops/ws13/workshop_thirteen.php
• The HDF Group Page: http://hdfgroup.org/
• HDF5 Home Page: http://hdfgroup.org/HDF5/
• HDF Helpdesk: help@hdfgroup.org
• HDF Mailing Lists: http://hdfgroup.org/services/support.html

HDF = Hierarchical Data Format
HDF5 is the second HDF format
• Development started in 1996
• First release was in 1998
HDF4 is the first HDF format
• Originally called HDF
• Development started in 1987
• Still supported by The HDF Group

HDF5 is like … [figure]

HDF5 is designed …
• for high volume and/or complex data
• for every size and type of system (portable)
• for flexible, efficient storage and I/O
• to enable applications to evolve in their use of HDF5 and to accommodate new models
• to support long-term data preservation

HDF5 Technology
HDF5 is a data model, library and file format for managing data.

HDF5 Technology
• HDF5 (Abstract) Data Model
  - Defines the “building blocks” for data organization and specification
  - Files, Groups, Datasets, Attributes, Datatypes, Dataspaces, …
• HDF5 Library (C, Fortran 90, C++ APIs)
  - Also Java Language Interface and High Level Libraries
• HDF5 Binary File Format
  - Bit-level organization of HDF5 file
  - Defined by HDF5 File Format Specification
• Tools For Accessing Data in HDF5 Format
  - h5dump, h5repack, HDFView, …

HDF5 Abstract Data Model
a.k.a. HDF5 Logical Data Model
a.k.a. HDF5 Data Model

HDF5 File
An HDF5 file is a container that holds data objects.

[Figure: a file holding two example objects, a table and a set of notes]

lat | lon | temp
----|-----|-----
 12 |  23 |  3.1
 15 |  24 |  4.2
 17 |  21 |  3.6

Experiment Notes:
Serial Number: 99378920
Date: 3/13/09
Configuration: Standard 3

HDF5 Groups and Links
HDF5 groups and links organize data objects.
[Figure: tree with root group “/” and groups SimOut and Viz, organizing the experiment notes and the lat/lon/temp table from the previous slide]

HDF5 Objects
The two primary HDF5 objects are:
• HDF5 Group: A grouping structure containing zero or more HDF5 objects
• HDF5 Dataset: Raw data elements, together with information that describes them
(There are other HDF5 objects that help support Groups and Datasets.)

HDF5 Groups
• Used to organize collections
• Every file starts with a root group
• Similar to UNIX directories
• Path to object defines it
• Objects can be shared:
  - /A/k and /B/l are the same
[Figure: tree rooted at “/” with members A, B, C, temp, k, and l; the legend distinguishes groups from datasets]

HDF5 Datasets
HDF5 Datasets organize and contain your “raw data values”. They consist of:
• Your raw data
• Metadata describing the data:
  - The information to interpret the data (Datatype)
  - The information to describe the logical layout of the data elements (Dataspace)
  - Characteristics of the data (Properties)
  - Additional optional information that describes the data (Attributes)

HDF5 Dataset
Metadata:
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Attributes (optional): Time = 32.4, Pressure = 987, Temp = 56
• Properties: Chunked, Compressed
Data: [figure]

HDF5 Dataspaces
An HDF5 Dataspace describes the logical layout for the data elements:
• Array
  - multiple elements in dataset organized in a multi-dimensional (rectangular) array
  - maximum number of elements in each dimension may be fixed or unlimited
• NULL
  - no elements in dataset
• Scalar
  - single element in dataset

HDF5 Dataspaces
Two roles:
• Dataspace contains spatial information (logical layout) about a dataset stored in a file
  - Rank and dimensions
  - Permanent part of dataset definition
• Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O
[Figure: two example dataspaces: Rank = 2, Dimensions = 4x6; Rank = 1, Dimension = 10]

HDF5 Datatypes
The HDF5 datatype describes how to interpret individual data elements.
HDF5 datatypes include:
• integer, float, unsigned, bitfield, …
• user-definable (e.g., 13-bit integer)
• variable length types (e.g., strings)
• references to objects/dataset regions
• enumerations: names mapped to integers
• opaque
• compound (similar to C structs)

HDF5 Dataset
[Figure: 5 x 3 array of data elements]
Datatype: 16-byte integer
Dataspace: Rank = 2, Dimensions = 5 x 3

HDF5 Properties
• Properties (also known as Property Lists) are characteristics of HDF5 objects that can be modified
• Default properties handle most needs
• By changing properties one can take advantage of the more powerful features in HDF5

Storage Properties
• Contiguous (default): Data elements stored physically adjacent to each other
• Chunked: Better access time for subsets; extensible
• Chunked & Compressed: Improves storage efficiency, transmission speed

HDF5 Attributes (optional)
• An HDF5 attribute has a name and a value
• Attributes typically contain user metadata
• Attributes may be associated with
  - HDF5 groups
  - HDF5 datasets
  - HDF5 named datatypes
• An attribute’s value is described by a datatype and a dataspace
• Attributes are analogous to datasets except…
  - they are NOT extensible
  - they do NOT support compression or partial I/O

HDF5 Abstract Data Model Summary
• The Objects in the Data Model are the “building blocks” for data organization and specification
• Files, Groups, Links, Datasets, Datatypes, Dataspaces, Attributes, …
• Projects using HDF5 “map” their data concepts to these HDF5 Objects

HDF5 Software

HDF5 Software Layers & Storage
[Figure: layered diagram, top to bottom]
• Tools and High Level APIs: h5dump tool, h5repack tool, HDFView tool, Java Interface, …
• API (HDF5 Library):
  - Language Interfaces: C, Fortran, C++
  - HDF5 Data Model Objects: Groups, Datasets, Attributes, …
  - Tunable Properties: Chunk Size, I/O Driver, …
• Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
• Virtual File Layer (I/O Drivers): Posix I/O, Split Files, MPI I/O, Custom
• Storage (HDF5 File Format): File, Split Files, File on Parallel Filesystem, Other (?)

HDF5 API and Applications
[Figure: layered diagram, top to bottom]
• Applications: a Climate Model, MATLAB, EOS library, …
• Domain Data Objects
• HDF5 Library
• Storage

HDF5 Home Page
HDF5 home page: http://hdfgroup.org/HDF5/
• Two releases: HDF5 1.8 and HDF5 1.6
HDF5 source code:
• Written in C, and includes optional C++, Fortran 90 APIs, and High Level APIs
• Contains command-line utilities (h5dump, h5repack, h5diff, …) and compile scripts
HDF pre-built binaries:
• When possible, include C, C++, F90, and High Level libraries. Check the ./lib/libhdf5.settings file.
• Built with and require the SZIP and ZLIB external libraries

Useful Tools For New Users
• h5dump: Tool to “dump” or display contents of HDF5 files
• h5cc, h5c++, h5fc: Scripts to compile applications
• HDFView: Java browser to view HDF4 and HDF5 files
  http://www.hdfgroup.org/hdf-java-html/hdfview/

h5dump Utility
h5dump [options] [file]
  -H, --header   Display header only (no data)
  -d <names>     Display the specified dataset(s)
  -g <names>     Display the specified group(s) and all members
  -p             Display properties
<names> is one or more appropriate object names.

Example of h5dump Output
[Figure: root group “/” containing the dataset ‘dset’]

    HDF5 "dset.h5" {
    GROUP "/" {
       DATASET "dset" {
          DATATYPE { H5T_STD_I32BE }
          DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
          DATA {
             1, 2, 3, 4, 5, 6,
             7, 8, 9, 10, 11, 12,
             13, 14, 15, 16, 17, 18,
             19, 20, 21, 22, 23, 24
          }
       }
    }
    }

HDF5 Compile Scripts
• h5cc: HDF5 C compiler command
• h5fc: HDF5 F90 compiler command
• h5c++: HDF5 C++ compiler command
To compile:
  % h5cc h5prog.c
  % h5fc h5prog.f90

Compile option: -show
-show: displays the compiler commands and options without executing them
  % h5cc -show Sample_c.c
• Shows the correct paths and libraries used by the installed HDF5 library.
• Shows the correct flags to specify when building an application with that HDF5 library.

Browsing HDF5 Files with HDFView
HDFView [screenshot: structure of file and contents of dataset]
HDFView File Menu [screenshot]
HDF-EOS5 File in HDFView [screenshot]

Introduction to HDF5 Programming Model and APIs

Operations Supported by the API
• Create objects (groups, datasets, attributes, complex data types, …)
• Assign storage and I/O properties to objects
• Perform complex subsetting during read/write
• Use variety of I/O “devices” (parallel, remote, etc.)
• Transform data during I/O
• Make inquiries on file and object structure, content, properties

General Programming Paradigm
• Properties of object are optionally defined
  - Creation properties
  - Access properties
• Object is opened or created
• Object is accessed, possibly many times
• Object is closed

Order of Operations
• An order is imposed on operations by argument dependencies
  For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
• Objects can be closed in any order.

The General HDF5 API
• Currently C, Fortran 90, Java, and C++ bindings.
• C routines begin with prefix H5?, where ? is a character corresponding to the type of object the function acts on
Example Functions:
  H5D: Dataset interface, e.g., H5Dread
  H5F: File interface, e.g., H5Fopen
  H5S: dataSpace interface, e.g., H5Sclose

HDF5 Defined Types
For portability, the HDF5 library has its own defined types:
  hid_t: object identifiers (native integer)
  hsize_t: size used for dimensions (unsigned long or unsigned long long)
  herr_t: function return value
  hvl_t: variable length datatype
For C, include hdf5.h in your HDF5 application.

The HDF5 API
• For flexibility, the API is extensive
  - 300+ functions
• This can be daunting… but there is hope
  - A few functions can do a lot
  - Start simple
  - Build up knowledge as more features are needed
[Figure: Victorinox Swiss Army CyberTool 34]

Basic Functions
  H5Fcreate (H5Fopen)           create (open) File
  H5Screate_simple/H5Screate    create dataSpace
  H5Dcreate (H5Dopen)           create (open) Dataset
  H5Dread, H5Dwrite             access Dataset
  H5Dclose                      close Dataset
  H5Sclose                      close dataSpace
  H5Fclose                      close File
NOTE: The order specified above is not required.

Other Common Functions
• DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
• Groups: H5Gcreate, H5Gopen, H5Gclose
• Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
• Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

High Level APIs
• Included along with the HDF5 library
• Simplify steps for creating, writing, and reading objects.
• Do not entirely ‘wrap’ HDF5 library

Example HDF5 Code

Steps to Create a File
1. Decide on properties the file should have and create them if necessary:
  • Creation properties, like size of user block
  • Access properties (improve performance)
  • Use default properties (H5P_DEFAULT)
2. Create the file
3. Close the file and the property lists, as needed

Code: Create a File

    hid_t  file_id;
    herr_t status;

    file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
    status  = H5Fclose(file_id);

[Figure: the new file contains only the root group “/”]
Note: Return codes not checked for errors in code samples.

Dataset Components
Metadata:
• Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: Integer
• Properties: Chunked, Compressed
Data: [figure]

Steps to Create a Dataset
1. Define dataset characteristics
  a) Datatype: integer
  b) Dataspace: 4x6
  c) Properties if needed, or use H5P_DEFAULT
2. Decide where to put it
  • Obtain location ID:
    - Group ID puts it in a Group
    - File ID puts it in Root Group
3. Create dataset in file
4. Close everything
[Figure: root group “/”]
