High-level view of HDF5 data structures and library
HDF Summit, Boeing, Seattle, September 19, 2006
HDF5 data model
• HDF5 file – container for scientific data
• Primary objects
  • Groups
  • Datasets
• Additional ways to organize data
  • Attributes
  • Sharable objects
  • Storage and access properties
Everything else is built from these parts.
HDF5 Dataset
• Metadata
  • Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
  • Datatype: IEEE 32-bit float
  • Attributes: time = 32.4, pressure = 987, temp = 56
  • Storage info: chunked, compressed
• Data
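To make the relationship between the metadata parts concrete, here is a minimal sketch in plain Python. It is not the HDF5 API: `DatasetModel` and its fields are hypothetical names, used only to show how the rank and raw data size follow from the dataspace and datatype shown on this slide.

```python
from dataclasses import dataclass, field
from math import prod


@dataclass
class DatasetModel:
    """Toy stand-in for a dataset's metadata (hypothetical, not the HDF5 API)."""
    dims: tuple                  # dataspace dimensions
    element_size: int            # bytes per element, implied by the datatype
    attributes: dict = field(default_factory=dict)
    storage: tuple = ()          # storage info labels

    @property
    def rank(self):
        return len(self.dims)

    @property
    def num_elements(self):
        return prod(self.dims)

    @property
    def raw_size_bytes(self):
        # Uncompressed size of the data array only, excluding metadata.
        return self.num_elements * self.element_size


# Values taken from the slide: 4 x 5 x 7 array of IEEE 32-bit (4-byte) floats.
ds = DatasetModel(
    dims=(4, 5, 7),
    element_size=4,
    attributes={"time": 32.4, "pressure": 987, "temp": 56},
    storage=("chunked", "compressed"),
)
print(ds.rank, ds.num_elements, ds.raw_size_bytes)  # 3 140 560
```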
Dataspaces
• Dataspace – spatial info about a dataset
  • Rank and dimensions
    • Permanent part of dataset definition
  • Subset of points, for partial I/O
    • Needed only during I/O operations
• Apply to datasets in memory or in the file
[Figure: example dataspace with Rank = 2, Dimensions = 4x6]
Datatypes (array elements)
• Datatype – how to interpret a data element
• Permanent part of the dataset definition
• Two classes: atomic and compound
Datatypes
• HDF5 atomic types
  • normal integer & float
  • user-definable (e.g. 13-bit integer)
  • variable length types (e.g. strings)
  • pointers - references to objects/dataset regions
  • enumeration - names mapped to integers
  • array
• HDF5 compound types
  • Comparable to C structs
  • Members can be atomic or compound types
HDF5 dataset: array of records
• Dimensionality: 5 x 3
• Datatype: Record
  • int8
  • int4
  • int16
  • 2x3x2 array of float32
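The record layout above can be sketched with Python's standard `struct` module, which is a rough analogue of a packed C struct. This is an illustration only, not how HDF5 stores compound types. Two assumptions are made: the slide's "int4" label is treated as a 4-byte integer, and the record is packed little-endian with no padding.

```python
import struct

# Assumed layout: int8, 4-byte int (for the slide's "int4"), int16,
# then the 2x3x2 float32 array flattened to 12 values. "<" = no padding.
RECORD = struct.Struct("<bih12f")


def pack_record(a, b, c, grid):
    """Pack one record; grid is a 2x3x2 nested list of floats."""
    flat = [v for plane in grid for row in plane for v in row]
    return RECORD.pack(a, b, c, *flat)


def unpack_record(buf):
    """Inverse of pack_record: rebuild the 2x3x2 nesting from 12 floats."""
    a, b, c, *flat = RECORD.unpack(buf)
    grid = [
        [flat[(i * 3 + j) * 2:(i * 3 + j) * 2 + 2] for j in range(3)]
        for i in range(2)
    ]
    return a, b, c, grid


grid = [[[float(i * 6 + j * 2 + k) for k in range(2)]
         for j in range(3)] for i in range(2)]
buf = pack_record(-5, 100000, 300, grid)

record_size = RECORD.size            # bytes per record under these assumptions
dataset_size = 5 * 3 * record_size   # the 5 x 3 array of records
print(record_size, dataset_size)
```

Under these assumptions each record occupies 1 + 4 + 2 + 48 bytes, so the 5 x 3 dataset holds 15 such records.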
Attributes
• Attribute – data of the form “name = value”, attached to an object
• Operations are scaled-down versions of dataset operations
  • Not extendible
  • No compression
  • No partial I/O
• Optional for the dataset definition
• Can be overwritten, deleted, added during the “life” of a dataset
Groups
• A mechanism for collections of related objects
• Every file starts with a root group
• Similar to UNIX directories
• Can have attributes
[Figure: tree rooted at “/” with nodes harry, tom, dick, b, a, c]
HDF5 objects are identified and located by their pathnames
• / (root)
• /x
• /foo
• /foo/temp
• /foo/bar/temp
[Figure: tree rooted at “/” with nodes foo, x, bar, temp, temp]
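Pathname lookup can be sketched with nested dictionaries standing in for groups. This is a conceptual model, not the HDF5 API: `resolve` is a hypothetical helper, and the leaf strings are placeholders, since the slide does not say what kind of object each leaf is.

```python
def resolve(root, path):
    """Walk an absolute pathname such as "/foo/bar/temp" through nested groups."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute path")
    node = root
    for name in path.split("/"):
        if not name:                      # skip the empty part before the leading "/"
            continue
        if not isinstance(node, dict):
            raise KeyError(f"{name!r}: parent is not a group")
        node = node[name]
    return node


# Groups are dicts; leaves are placeholder strings (assumed for illustration).
root = {
    "x": "object at /x",
    "foo": {
        "temp": "object at /foo/temp",
        "bar": {"temp": "object at /foo/bar/temp"},
    },
}

for p in ["/", "/x", "/foo", "/foo/temp", "/foo/bar/temp"]:
    target = resolve(root, p)
    print(p, "->", "group" if isinstance(target, dict) else target)
```

The two objects named temp do not collide because each is identified by its full pathname, not by its local name.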
Groups & their members can be shared
• /tom/P
• /dick/R
• /harry/P
[Figure: root group “/” with groups tom, harry, and dick; object labels R, P, P]
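Sharing can be sketched in the same dictionary model: two groups hold a link to one object, so the object is reachable by more than one pathname. This assumes that /tom/P and /harry/P on the slide name the same object, which the flattened figure suggests but does not state. `lookup` is a hypothetical helper, not an HDF5 call.

```python
def lookup(root, path):
    """Follow an absolute pathname through nested dicts (groups)."""
    node = root
    for name in filter(None, path.split("/")):
        node = node[name]
    return node


# One object P, linked from two groups (assumed reading of the slide).
shared_p = {"kind": "dataset", "name": "P", "attributes": {}}
r = {"kind": "dataset", "name": "R", "attributes": {}}

root = {
    "tom": {"P": shared_p},
    "dick": {"R": r},
    "harry": {"P": shared_p},
}

# A change made through one pathname is visible through the other,
# because both pathnames lead to the same object.
lookup(root, "/tom/P")["attributes"]["units"] = "kPa"
print(lookup(root, "/harry/P")["attributes"])  # {'units': 'kPa'}
```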
Special Storage Options
• chunked: better subsetting access time; extendable
• compressed: improves storage efficiency, transmission speed
• extendable: arrays can be extended in any direction
• split file: metadata in one file, raw data in another
[Figure: dataset “Fred” stored as a split file, with metadata for Fred in File A and data for Fred in File B]
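One way to see why chunking helps subsetting: a read of a small region only needs the chunks that overlap it. The sketch below computes which chunks a rectangular selection touches. `chunks_touched` is a hypothetical helper written for this illustration, and the chunk shape is an arbitrary assumption.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the grid coordinates of every chunk overlapping a selection.

    start:       first selected index in each dimension
    count:       number of selected elements in each dimension (each >= 1)
    chunk_shape: chunk size in each dimension
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch                # chunk holding the first selected index
        last = (s + c - 1) // ch       # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A 3 x 4 selection starting at (2, 3), with 4 x 4 chunks (assumed size).
print(chunks_touched((2, 3), (3, 4), (4, 4)))
# A selection aligned with a single chunk touches only that chunk.
print(chunks_touched((4, 0), (4, 4), (4, 4)))
```

With contiguous storage the same read would span many rows of the full array, while here the work is bounded by the number of overlapping chunks.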
HDF5 Software stack
• Tools & Applications
• HDF I/O Library
• HDF File
Structure of HDF5 Library
• Object API (C, Fortran 90, Java, C++)
  • Specify objects and transformation properties
  • Invoke data movement operations and data transformations
• Library internals
  • Performs data transformations and other prep for I/O
  • Configurable transformations (compression, etc.)
• Virtual file I/O (C only)
  • Perform byte-stream I/O operations (open/close, read/write, seek)
  • User-implementable I/O (stdio, network, memory, etc.)
Writing – move from memory to disk
[Figure: data moving from memory to disk]
Partial I/O – move just part of a dataset
(a) Hyperslab from a 2D array to the corner of a smaller 2D array
(b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
(c) A sequence of points from a 2D array to a sequence of points in a 3D array
(d) Union of hyperslabs in file to union of hyperslabs in memory
[Figure: four panels, each pairing a selection on disk with a selection in memory]
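Cases (a) and (b) can be sketched with plain Python lists. The parameter names start, stride, count, and block follow the usual HDF5 hyperslab description, but the helpers below are hypothetical and operate on in-memory lists, not on files. The array sizes and offsets are arbitrary assumptions.

```python
def hyperslab_indices(start, stride, count, block):
    """Indices along one dimension: `count` blocks of `block` elements,
    with consecutive blocks beginning `stride` elements apart."""
    return [start + i * stride + j for i in range(count) for j in range(block)]


def select_2d(array, rows, cols):
    """Gather the selected rows and columns of a 2D nested list."""
    return [[array[r][c] for c in cols] for r in rows]


# Stand-in for a 6 x 8 array on disk; each value encodes its row and column.
file_data = [[r * 10 + c for c in range(8)] for r in range(6)]

# (a) Hyperslab from a 2D array to the corner of a smaller 2D array.
rows_a = hyperslab_indices(start=1, stride=1, count=3, block=1)  # rows 1..3
cols_a = hyperslab_indices(start=2, stride=1, count=2, block=1)  # cols 2..3
memory = [[0] * 4 for _ in range(4)]
for i, row in enumerate(select_2d(file_data, rows_a, cols_a)):
    memory[i][:len(row)] = row       # place at the top-left corner

# (b) Regular series of blocks to a contiguous run at an offset in a 1D array.
rows_b = hyperslab_indices(start=0, stride=2, count=3, block=1)  # rows 0, 2, 4
cols_b = hyperslab_indices(start=0, stride=4, count=2, block=2)  # cols 0,1,4,5
flat = [v for row in select_2d(file_data, rows_b, cols_b) for v in row]
buffer_1d = [None] * 20
offset = 3
buffer_1d[offset:offset + len(flat)] = flat

print(memory)
print(buffer_1d)
```

In both cases the number of selected elements is the same on the source and destination sides; only the shape of the selection differs.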
Layers – parallel example
I/O flows through many layers from application to disk.
• Application
• Parallel computing system (Linux cluster): compute nodes
• I/O library (HDF5)
• Parallel I/O library (MPI-I/O)
• Parallel file system (GPFS)
• Switch network / I/O servers
• Disk architecture & layout of data on disk
Virtual I/O layer
• Object API (C, Fortran 90, Java, C++)
• Library internals
• Virtual file I/O (C only)
Virtual file I/O layer
• A public API for writing I/O drivers
• Allows HDF5 to interface to disk, the network, memory, or a user-defined device
• Virtual file I/O drivers: File Family, MPI I/O, Memory, Network, Stdio
• “Storage”: File, File Family, Memory, Network
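The idea of a driver layer can be sketched as a small interface with interchangeable back ends. The real virtual file layer is a C API; the classes below are hypothetical Python stand-ins that expose only the byte-stream operations named earlier in the deck (open/close, read/write, seek), with one memory-backed and one file-backed driver.

```python
import io
import os
import tempfile
from abc import ABC, abstractmethod


class ByteStreamDriver(ABC):
    """Hypothetical driver interface: byte-stream operations only."""

    @abstractmethod
    def open(self): ...

    @abstractmethod
    def close(self): ...

    @abstractmethod
    def seek(self, offset): ...

    @abstractmethod
    def read(self, nbytes): ...

    @abstractmethod
    def write(self, data): ...


class StreamDriver(ByteStreamDriver):
    """Shared logic for drivers backed by a Python binary stream."""

    def __init__(self):
        self._stream = None

    def _make_stream(self):
        raise NotImplementedError

    def open(self):
        self._stream = self._make_stream()

    def close(self):
        self._stream.close()

    def seek(self, offset):
        self._stream.seek(offset)

    def read(self, nbytes):
        return self._stream.read(nbytes)

    def write(self, data):
        self._stream.write(data)


class MemoryDriver(StreamDriver):
    """Keeps all bytes in memory."""

    def _make_stream(self):
        return io.BytesIO()


class FileDriver(StreamDriver):
    """Stores bytes in an ordinary file on disk."""

    def __init__(self, path):
        super().__init__()
        self.path = path

    def _make_stream(self):
        return open(self.path, "w+b")


def round_trip(driver, payload, offset=16):
    """Code above the driver layer: identical for every back end."""
    driver.open()
    try:
        driver.seek(offset)
        driver.write(payload)
        driver.seek(offset)
        return driver.read(len(payload))
    finally:
        driver.close()


payload = b"HDF5 bytes"
from_memory = round_trip(MemoryDriver(), payload)
with tempfile.TemporaryDirectory() as tmp:
    from_file = round_trip(FileDriver(os.path.join(tmp, "demo.bin")), payload)
print(from_memory == from_file == payload)
```

The point of the design is that `round_trip` never learns which device it is talking to, which is what lets a user-defined device be swapped in without changing the layers above.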
[Figure: layered view of HDF5 applications and I/O, from top to bottom]
• Apps: simulation, visualization, remote sensing…
  • Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
• Common application-specific data models (appl-specific APIs): UDM, SAF, hdf5mesh, IDL, HDF-EOS
  • Labels shown with these: LANL; LLNL, SNL; Grids; COTS; NASA
• HDF5 data model & API
• HDF5 serial & parallel I/O
• HDF5 virtual file layer (I/O drivers): Stdio, Split Files, MPI I/O, Stream, Custom
• Storage (HDF5 format)
  • File
  • Split metadata and raw data files
  • File on parallel file system
  • Across the network or to/from another application or library
  • User-defined device (?)
Other info
• Runs almost anywhere
  • Most workstations
  • Big ASCI machines, Cray, Compaq
  • TeraGrid and other clusters
• QA
  • Daily regression tests on key platforms
  • Meets NASA’s highest technology readiness level
Other HDF Software
• NCSA HDF
• Java tools
• Command-line utilities
• Regression and performance testing software
• Commercial (IDL, Matlab, HDF Explorer, etc.)
• Community (EOS, ASCI, etc.)
• Integration with other software (SRB, etc.)
HDF Information
• HDF Information Center: http://hdfgroup.org/
• HDF Help email address: hdfhelp@hdfgroup.org
• HDF users mailing list: hdfnews@hdfgroup.org