
High level view of HDF5 Data structures and library

High level view of HDF5 data structures and library. HDF Summit, Boeing, Seattle, September 19, 2006. Topics include a mesh example in HDFView and the HDF5 data model, in which an HDF5 file is a container for scientific data built from primary objects (groups and datasets) and additional ways to organize data, such as attributes.


Presentation Transcript


  1. High level view of HDF5 Data structures and library. HDF Summit, Boeing, Seattle, September 19, 2006

  2. Mesh Example, in HDFView

  3. HDF5 Data Model

  4. HDF5 data model
     • HDF5 file – container for scientific data
     • Primary Objects
       • Groups
       • Datasets
     • Additional ways to organize data
       • Attributes
       • Sharable objects
       • Storage and access properties
     Everything else is built from these parts.

  5. HDF5 Dataset: metadata and data
     • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
     • Datatype: IEEE 32-bit float
     • Attributes: time = 32.4, pressure = 987, temp = 56
     • Storage info: chunked, compressed
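
To make the dataspace and datatype on this slide concrete, here is a minimal sketch that derives the element count and raw (uncompressed) size from the values shown. It is plain Python arithmetic, not HDF5 library code, and the variable names are illustrative only.

```python
from math import prod

# Values taken from the slide: a rank-3 dataspace of IEEE 32-bit floats.
dims = (4, 5, 7)
bytes_per_element = 4  # one IEEE 32-bit float

rank = len(dims)                            # number of dimensions
n_elements = prod(dims)                     # 4 * 5 * 7
raw_bytes = n_elements * bytes_per_element  # size before any compression

print(rank, n_elements, raw_bytes)  # 3 140 560
```

Because the slide's dataset is stored chunked and compressed, the size on disk would generally differ from this raw figure.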

  6. Dataspaces
     • Dataspace – spatial info about a dataset
       • Rank and dimensions
         • Permanent part of dataset definition
       • Subset of points, for partial I/O
         • Needed only during I/O operations
     • Apply to datasets in memory or in the file
     [Figure: example dataspace with Rank = 2, Dimensions = 4x6]

  7. Datatypes (array elements)
     • Datatype – how to interpret a data element
       • Permanent part of the dataset definition
     • Two classes: atomic and compound

  8. Datatypes
     • HDF5 atomic types
       • normal integer & float
       • user-definable (e.g. 13-bit integer)
       • variable length types (e.g. strings)
       • pointers - references to objects/dataset regions
       • enumeration - names mapped to integers
       • array
     • HDF5 compound types
       • Comparable to C structs
       • Members can be atomic or compound types

  9. HDF5 dataset: array of records
     • Dimensionality: 5 x 3
     • Datatype: Record with fields int8, int4, int16, and a 2x3x2 array of float32
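
Since compound types are described as comparable to C structs, the record on this slide can be sketched with Python's standard `struct` module. This is only an analogy, not how HDF5 lays out records: the byte order, the absence of padding, and storing the 4-bit integer in a full byte are all assumptions made here for illustration.

```python
import struct

# Assumed layout (not specified on the slide): little-endian, no padding,
# and the 4-bit integer stored in one full byte.
# Fields: int8, int4 (in 1 byte), int16, then 2*3*2 = 12 float32 values.
RECORD = struct.Struct("<bbh12f")

rows, cols = 5, 3                           # dimensionality from the slide
record_bytes = RECORD.size                  # 1 + 1 + 2 + 12 * 4
dataset_bytes = rows * cols * record_bytes  # 15 records in total

# Round-trip one record to show that each element carries all its fields.
packed = RECORD.pack(-1, 7, 1000, *[0.5] * 12)
fields = RECORD.unpack(packed)
```

Under these assumptions each record occupies 52 bytes, so the 5 x 3 dataset holds 780 bytes of raw data.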

  10. Attributes
      • Attribute – data of the form “name = value”, attached to an object
      • Operations are scaled-down versions of dataset operations
        • Not extendible
        • No compression
        • No partial I/O
      • Optional for the dataset definition
      • Can be overwritten, deleted, added during the “life” of a dataset

  11. “Groups”
      • A mechanism for collections of related objects
      • Every file starts with a root group
      • Similar to UNIX directories
      • Can have attributes
      [Figure: group tree rooted at “/” with nodes tom, dick, harry, a, b, c]

  12. HDF5 objects are identified and located by their pathnames
      • / (root)
      • /x
      • /foo
      • /foo/temp
      • /foo/bar/temp
      [Figure: tree rooted at “/” with members x and foo; foo contains temp and bar; bar contains temp]

  13. Groups & their members can be shared
      • /tom/P
      • /dick/R
      • /harry/P
      [Figure: root group “/” with groups tom, dick, harry and objects P, R]
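
The pathname and sharing ideas on the last two slides can be sketched with nested dictionaries. This is a toy in-memory model, not the HDF5 API; the `resolve` helper is hypothetical, and it assumes that /tom/P and /harry/P name the same shared object, as the slide title suggests.

```python
# Toy model: a group is a dict mapping member names to objects.
P = {"kind": "dataset", "name": "P"}
R = {"kind": "dataset", "name": "R"}

root = {
    "tom": {"P": P},
    "dick": {"R": R},
    "harry": {"P": P},  # the same object as /tom/P, not a copy
}

def resolve(group, path):
    """Walk an absolute pathname such as '/tom/P' starting at the root group."""
    node = group
    for name in path.strip("/").split("/"):
        if name:  # the path '/' has no components and resolves to the root
            node = node[name]
    return node

# Two different pathnames reach one shared object.
same = resolve(root, "/tom/P") is resolve(root, "/harry/P")
```

In this model a change made through one pathname is visible through the other, which is the practical meaning of a shared member.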

  14. Special Storage Options
      • Chunked: better subsetting access time; extendable
      • Compressed: improves storage efficiency, transmission speed
      • Extendable: arrays can be extended in any direction
      • Split file: metadata in one file, raw data in another
      [Figure: dataset “Fred” split across File A and File B, with the metadata for Fred in one file and the data for Fred in the other]
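
One way to see why chunking helps subsetting is to count the chunks a read overlaps. The sketch below is a simplified model, not library code: `chunks_touched` is a hypothetical helper, and the 100x100 dataset with 10x10 chunks is an invented example.

```python
from itertools import product

def chunks_touched(start, count, chunk_shape):
    """Return the chunk coordinates overlapped by a contiguous block.

    start and count give the block's origin and extent per dimension,
    measured in elements.
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch           # chunk holding the first selected element
        last = (s + c - 1) // ch  # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# Hypothetical 100x100 dataset stored as 10x10 chunks.
row_read = chunks_touched((0, 0), (1, 100), (10, 10))     # one full row
col_read = chunks_touched((0, 0), (100, 1), (10, 10))     # one full column
tile_read = chunks_touched((20, 30), (10, 10), (10, 10))  # one aligned tile
```

Here a row and a column each overlap 10 chunks and an aligned tile overlaps exactly one, whereas in a contiguous row-major layout a column read would touch every one of the 100 rows.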

  15. HDF5 Software

  16. HDF5 Software stack
      • Tools & Applications
      • HDF I/O Library
      • HDF File

  17. Structure of HDF5 Library
      • Object API (C, Fortran 90, Java, C++)
        • Specify objects and transformation properties
        • Invoke data movement operations and data transformations
      • Library internals
        • Performs data transformations and other prep for I/O
        • Configurable transformations (compression, etc.)
      • Virtual file I/O (C only)
        • Perform byte-stream I/O operations (open/close, read/write, seek)
        • User-implementable I/O (stdio, network, memory, etc.)

  18. Writing – move from memory to disk
      [Figure: data moving from memory to disk]

  19. Partial I/O – move just part of a dataset
      (a) Hyperslab from a 2D array to the corner of a smaller 2D array
      (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
      [Figures: disk and memory layouts for (a) and (b)]
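
Cases (a) and (b) can be sketched in plain Python. The `hyperslab` helper below is hypothetical; its parameters follow the start/stride/count/block convention of the HDF5 C call `H5Sselect_hyperslab`, but the code only models which coordinates get selected. The 4x6 source array and all offsets are invented for illustration.

```python
from itertools import product

def hyperslab(start, stride, count, block):
    """Coordinates selected by a regular hyperslab, in row-major order."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # count blocks of b consecutive indices, spaced stride apart
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))

# Source: a 4x6 array whose values are their own row-major indices.
src = [[r * 6 + c for c in range(6)] for r in range(4)]

# (a) A 2x3 hyperslab starting at (1, 2), copied to the corner of a 2x3 array.
sel = hyperslab(start=(1, 2), stride=(1, 1), count=(1, 1), block=(2, 3))
corner = [[0] * 3 for _ in range(2)]
for r, c in sel:
    corner[r - 1][c - 2] = src[r][c]

# (b) A regular series of 1x2 blocks, written contiguously into a 1D
# buffer starting at offset 2 (-1 marks untouched slots).
blocks = hyperslab(start=(0, 0), stride=(2, 3), count=(2, 2), block=(1, 2))
flat = [-1] * 12
offset = 2
for i, (r, c) in enumerate(blocks):
    flat[offset + i] = src[r][c]
```

In both cases the source and destination selections have different shapes but the same number of elements, which is what lets the transfer pair them up one to one.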

  20. Partial I/O – move just part of a dataset
      (c) A sequence of points from a 2D array to a sequence of points in a 3D array.
      (d) Union of hyperslabs in file to union of hyperslabs in memory.
      [Figures: disk and memory layouts for (c) and (d)]
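
Case (d) can be modeled as a set union of coordinates. The sketch below is a simplified illustration with a hypothetical `block` helper and invented block positions; it shows only that an element covered by two overlapping hyperslabs appears once in the combined selection.

```python
def block(start, shape):
    """Set of (row, col) coordinates covered by one rectangular hyperslab."""
    return {
        (start[0] + r, start[1] + c)
        for r in range(shape[0])
        for c in range(shape[1])
    }

a = block((0, 0), (2, 3))  # rows 0-1, cols 0-2: 6 elements
b = block((1, 2), (2, 3))  # rows 1-2, cols 2-4: 6 elements, overlapping a at (1, 2)

# Sorting the set yields the combined selection in row-major order.
union = sorted(a | b)
n_selected = len(union)
```

The two blocks hold 12 elements between them, but the union selects 11 because (1, 2) is shared.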

  21. Layers – parallel example
      I/O flows through many layers from application to disk.
      • Application
      • Parallel computing system (Linux cluster): compute nodes
      • I/O library (HDF5)
      • Parallel I/O library (MPI-I/O)
      • Parallel file system (GPFS)
      • Switch network/I/O servers
      • Disk architecture & layout of data on disk

  22. Virtual I/O layer
      • Object API (C, Fortran 90, Java, C++)
      • Library internals
      • Virtual file I/O (C only)

  23. Virtual file I/O drivers
      • Virtual file I/O layer
        • A public API for writing I/O drivers
        • Allows HDF5 to interface to disk, the network, memory, or a user-defined device
      • Drivers: File, File Family, MPI I/O, Memory, Network, Stdio
      • “Storage”: File, File Family, Memory, Network
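
The idea of a driver layer can be sketched as two interchangeable classes that expose the same byte-stream operations. This is a conceptual model only: `MemoryDriver`, `FileDriver`, and `store_and_load` are hypothetical names, and the real HDF5 virtual file layer is a C interface with its own function set.

```python
import os
import tempfile

class MemoryDriver:
    """Keeps the whole 'file' in a bytearray."""
    def __init__(self):
        self.buf = bytearray()
    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):  # grow the buffer with zero bytes as needed
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = data
    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])
    def close(self):
        pass

class FileDriver:
    """Backs the same operations with an ordinary file on disk."""
    def __init__(self, path):
        self.f = open(path, "w+b")
    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)
    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)
    def close(self):
        self.f.close()

def store_and_load(driver):
    """Layer above the drivers: it only issues offset-based reads and writes."""
    driver.write(8, b"raw data")
    driver.write(0, b"metadata")
    return driver.read(0, 16)

result_mem = store_and_load(MemoryDriver())
with tempfile.TemporaryDirectory() as tmp:
    fd = FileDriver(os.path.join(tmp, "demo.bin"))
    result_file = store_and_load(fd)
    fd.close()
```

The calling code is identical for both back ends, which is the property that lets the same library target disk, memory, the network, or a user-defined device.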

  24. [Figure: HDF5 in the application stack, from top to bottom]
      • Apps: simulation, visualization, remote sensing… Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
      • Common application-specific data models and APIs: UDM, SAF, hdf5mesh, IDL, HDF-EOS (labels: LANL, LLNL, SNL, Grids, COTS, NASA)
      • HDF5 data model & API
      • HDF5 serial & parallel I/O
      • HDF5 virtual file layer (I/O drivers): Split Files, MPI I/O, Custom, Stdio, Stream
      • Storage: HDF5 format file; split metadata and raw data files; file on parallel file system; user-defined device; across the network or to/from another application or library

  25. Other info
      • Runs almost anywhere
        • Most workstations
        • Big ASCI machines, Cray, Compaq
        • TeraGrid and other clusters
      • QA
        • Daily regression tests on key platforms
        • Meets NASA’s highest technology readiness level

  26. Other HDF Software
      • NCSA HDF
      • Java tools
      • Command-line utilities
      • Regression and performance testing software
      • Commercial (IDL, Matlab, HDF Explorer, etc.)
      • Community (EOS, ASCI, etc.)
      • Integration with other software (SRB, etc.)

  27. Thank you

  28. HDF Information
      • HDF Information Center: http://hdfgroup.org/
      • HDF Help email address: hdfhelp@hdfgroup.org
      • HDF users mailing list: hdfnews@hdfgroup.org
