HDF5 Tutorial

LCI, April 28, 2008.

Presentation Transcript


  1. HDF5 Tutorial LCI April 28, 2008 LCI Tutorial

  2. Outline
     • Why HDF5?
     • Introduction to HDF5 data and programming models
     • HDF5 tools and utilities
     • HDF5 advanced topics
     • Introduction to parallel HDF5
     • HDF5 features that affect performance (or caching and buffering in HDF5)

  3. Why HDF5?

  4. Answering big questions …
     [Figure panels: Matter & the universe; Life and nature; Weather and climate (maps of total column ozone, in Dobson units, for August 24, 2001 and August 24, 2002)]

  5. … involves big data …

  6. … varied data … Thanks to Mark Miller, LLNL

  7. … and complex relationships …
     [Figure: interrelated sequence-assembly data: SNP score, contig summaries, discrepancies, contig qualities, coverage depth, trace reads, aligned bases, read quality, contig percent match]

  8. … on big computers …

  9. … and on little computers …

  10. How do we…
     • Describe our data?
     • Read it? Store it? Find it? Share it? Mine it?
     • Move it into, out of, and between computers and repositories?
     • Achieve storage and I/O efficiency?
     • Give applications and tools easy access to our data?

  11. HDF started right here at NCSA

  12. The HDF solution
     • Scientific data file format
     • Common data models
     • Standard APIs
     • I/O software & tools
     • Efficient storage, I/O

  13. The HDF5 Format

  14. An HDF5 file is a container… …into which you can put your data objects.
     [Figure: data objects, including a palette and the table below, placed in a file]
     lat | lon | temp
     ----|-----|-----
      12 |  23 | 3.1
      15 |  24 | 4.2
      17 |  21 | 3.6

  15. HDF5 structures for organizing objects
     [Figure: the root group “/” and a group “/foo” organizing a 3-D array, a table (lat | lon | temp), a palette, two raster images, and a 2-D array]

  16. Introduction to HDF5 Data and Programming Models, Tutorial Part I

  17. Mesh Example, in HDFView

  18. HDF5 Data Model

  19. HDF5 data model
     • HDF5 file – container for scientific data
     • Primary objects
        • Groups
        • Datasets
     • Additional ways to organize data
        • Attributes
        • Sharable objects
        • Storage and access properties
     Everything else is built from these parts.

  20. HDF5 Dataset
     A dataset consists of metadata and data. The metadata comprises:
     • Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
     • Datatype: IEEE 32-bit float
     • Attributes: Time = 32.4, Pressure = 987, Temp = 56
     • Storage info: chunked, compressed

  21. Dataspaces
     • Two roles:
        • A dataspace contains spatial information about a dataset stored in a file
           • Rank and dimensions
           • A permanent part of the dataset definition
        • A dataspace describes the application’s data buffer and the data elements participating in I/O
     [Figure: a rank-2 dataspace with dimensions 4x6 and a rank-1 dataspace with dimensions 12]
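
The two roles can be illustrated with a little arithmetic. A 4x6 dataspace describes 24 elements, while the rank-1 buffer in the figure holds 12, so only a 12-element selection of the file dataspace could take part in a transfer into that buffer; HDF5 requires the file and memory selections to contain the same number of elements. The plain-Python sketch below is not HDF5 code: it picks a hypothetical selection (rows 1 and 2 of the 4x6 dataset), checks the element counts, and computes where each selected element sits in row-major (C) order, the element order HDF5 uses. The helper names are made up for illustration.

```python
from itertools import product
from math import prod


def npoints(dims):
    """Number of elements in a simple dataspace with the given dimensions."""
    return prod(dims)


def linear_index(coord, dims):
    """Row-major (C-order) offset of `coord` in an array of shape `dims`."""
    if len(coord) != len(dims):
        raise ValueError("coordinate rank does not match dataspace rank")
    offset = 0
    for c, d in zip(coord, dims):
        if not 0 <= c < d:
            raise IndexError(f"index {c} out of range for dimension of size {d}")
        offset = offset * d + c
    return offset


def block_coords(start, count):
    """Coordinates of the block starting at `start` with extent `count`, row-major."""
    return list(product(*(range(s, s + c) for s, c in zip(start, count))))


file_dims = (4, 6)  # rank 2, dimensions 4x6: the dataset in the file
mem_dims = (12,)    # rank 1, dimensions 12: the application's buffer

# Hypothetical selection: rows 1 and 2, all six columns.
selection = block_coords(start=(1, 0), count=(2, 6))

# The file selection and the memory buffer must hold the same number of elements.
assert len(selection) == npoints(mem_dims)

# Position of each selected element within the file dataset, in row-major order.
file_offsets = [linear_index(c, file_dims) for c in selection]
```

Here `file_offsets` runs from 6 through 17: the twelve elements of rows 1 and 2, which land in buffer positions 0 through 11.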

  22. Datatypes (array elements)
     • Datatype – how to interpret a data element
     • Permanent part of the dataset definition
     • Two classes: atomic and compound

  23. Datatypes
     • HDF5 atomic types
        • normal integer & float
        • user-definable (e.g. 13-bit integer)
        • variable length types (e.g. strings)
        • pointers – references to objects/dataset regions
        • enumeration – names mapped to integers
        • array
     • HDF5 compound types
        • Comparable to C structs
        • Members can be atomic or compound types
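
Because compound types are comparable to C structs, a compound datatype can be thought of as an ordered list of named members, each stored at a byte offset. In the C API one is built with H5Tcreate(H5T_COMPOUND, size) plus H5Tinsert for each member, and H5Tpack removes padding between members. The plain-Python model below is a hypothetical illustration, not HDF5's API: it computes the packed (padding-free) offset of every atomic member, descending into nested compound members. The member names and the record itself are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atomic:
    name: str
    size: int  # size in bytes


@dataclass(frozen=True)
class Compound:
    members: tuple  # ordered (member name, Atomic or Compound) pairs


def size_of(t):
    """Size in bytes of a type laid out with no padding."""
    if isinstance(t, Atomic):
        return t.size
    return sum(size_of(member) for _, member in t.members)


def packed_offsets(t, prefix=""):
    """Map each atomic leaf's dotted path to its byte offset in a packed layout."""
    offsets = {}
    offset = 0
    for name, member in t.members:
        path = prefix + name
        if isinstance(member, Compound):
            # Nested compound: shift its members by this member's offset.
            for leaf, inner in packed_offsets(member, path + ".").items():
                offsets[leaf] = offset + inner
        else:
            offsets[path] = offset
        offset += size_of(member)
    return offsets


INT8 = Atomic("int8", 1)
INT16 = Atomic("int16", 2)
FLOAT32 = Atomic("float32", 4)

# Hypothetical record with a nested compound member "pos".
point = Compound((("x", FLOAT32), ("y", FLOAT32)))
record = Compound((("flag", INT8), ("count", INT16), ("pos", point)))

print(size_of(record))         # 11
print(packed_offsets(record))  # {'flag': 0, 'count': 1, 'pos.x': 3, 'pos.y': 7}
```

A C compiler would usually align the equivalent struct, placing `count` at offset 2 and `pos` at offset 4 for a total of 12 bytes. Since an HDF5 compound type records an explicit offset for each member (the offset argument of H5Tinsert), it can describe either the aligned or the packed layout.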

  24. HDF5 dataset: array of records
     • Dimensionality: 5 x 3
     • Datatype: a record whose members are int8, int4, int16, and a 2x3x2 array of float32

  25. Attributes
     • Attribute – data of the form “name = value”, attached to an object
     • Operations are scaled-down versions of dataset operations
        • Not extendible
        • No compression
        • No partial I/O
     • Optional for the dataset definition
     • Can be overwritten, deleted, or added during the “life” of a dataset
     • Size limited to under 64K in releases before HDF5 1.8.0

  26. Groups
     • A mechanism for collections of related objects
     • Every file starts with a root group
     • Similar to UNIX directories
     • Can have attributes
     [Figure: a tree rooted at “/” containing groups A, B, and C and objects k, l, and m]

  27. Path to HDF5 object in a file
     • / (root)
     • /x
     • /foo
     • /foo/temp
     • /foo/bar/temp
     [Figure: the root group “/” with members foo and x; foo contains temp and bar; bar contains another temp]

  28. Shared objects
     • /A/P
     • /B/R
     • /C/P
     [Figure: groups A, B, and C under “/”, with links named P and R pointing to shared objects]
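
Groups, paths, and shared objects fit together like a directory tree whose entries are links. A path is resolved one link name at a time starting from the root group, and because an object can be the target of more than one link, two different paths can lead to the same object. The plain-Python model below is a hypothetical sketch of that idea, not HDF5's API: the class names are made up, the dataset shapes are arbitrary, the first tree mirrors the paths on slide 27, and in the second tree /A/P and /C/P are assumed, for illustration, to be links to one shared dataset.

```python
class Dataset:
    def __init__(self, shape):
        self.shape = shape


class Group:
    def __init__(self):
        self.links = {}  # link name -> Group or Dataset
        self.attrs = {}  # groups can carry attributes as well


def resolve(root, path):
    """Follow an absolute path such as "/foo/bar/temp" from the root group."""
    if not path.startswith("/"):
        raise ValueError("this sketch only resolves absolute paths")
    obj = root
    for name in path.split("/"):
        if not name:  # skip empty pieces from leading, trailing, or doubled slashes
            continue
        if not isinstance(obj, Group) or name not in obj.links:
            raise KeyError(f"{path!r}: no link named {name!r}")
        obj = obj.links[name]
    return obj


# Tree mirroring slide 27: /x, /foo, /foo/temp, /foo/bar/temp.
root = Group()
root.links["x"] = Dataset((10,))
foo = root.links["foo"] = Group()
foo.links["temp"] = Dataset((4, 6))
bar = foo.links["bar"] = Group()
bar.links["temp"] = Dataset((4, 6))

# Shared object (hypothetical): /A/P and /C/P are two links to one dataset.
shared_root = Group()
for name in ("A", "B", "C"):
    shared_root.links[name] = Group()
p = Dataset((3,))
shared_root.links["A"].links["P"] = p
shared_root.links["C"].links["P"] = p
shared_root.links["B"].links["R"] = Dataset((5,))
```

With these trees, `resolve(root, "/foo/temp")` and `resolve(root, "/foo/bar/temp")` return two different datasets even though both links are named temp, while `resolve(shared_root, "/A/P") is resolve(shared_root, "/C/P")` is true.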

  29. Special Storage Options
     • chunked – better subsetting access time; extendable
     • compressed – improves storage efficiency, transmission speed
     • extendable – arrays can be extended in any direction
     • split file – metadata in one file, raw data in another
       [Figure: dataset “Fred” split between File A and File B, one holding the metadata for Fred and the other the data for Fred]
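
With chunked storage, a dataset is stored as a grid of fixed-size chunks, and a partial read or write only has to touch (and, for compressed data, decompress) the chunks that overlap the selection; chunking is also what makes extendable and compressed datasets possible in HDF5. The plain-Python sketch below illustrates this under assumed sizes and is not library code: it lists which chunks a rectangular selection overlaps, for a hypothetical 100x100 dataset stored in 10x10 chunks.

```python
from itertools import product


def chunks_touched(start, count, chunk_dims):
    """Grid coordinates of the chunks overlapped by the block [start, start + count)."""
    per_dim = []
    for s, c, ch in zip(start, count, chunk_dims):
        if c <= 0:
            return set()  # an empty selection touches no chunks
        first = s // ch           # chunk holding the first selected index
        last = (s + c - 1) // ch  # chunk holding the last selected index
        per_dim.append(range(first, last + 1))
    return set(product(*per_dim))


chunk_dims = (10, 10)  # hypothetical chunk shape for a 100x100 dataset

# A 10x10 block that straddles a chunk boundary in the first dimension.
block = chunks_touched(start=(5, 0), count=(10, 10), chunk_dims=chunk_dims)

# A single full-length column.
column = chunks_touched(start=(0, 3), count=(100, 1), chunk_dims=chunk_dims)

print(sorted(block))  # [(0, 0), (1, 0)]
print(len(column))    # 10
```

Reading the column touches 10 of the 100 chunks rather than the whole dataset; how well a given chunk shape works depends on the application's access pattern.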

  30. HDF5 Software

  31. HDF5 software stack
     • Tools & Applications
     • HDF I/O Library
     • HDF File

  32. Structure of HDF5 Library
     • Object API (C, Fortran 90, Java, C++)
        • Specify objects and transformation properties
        • Invoke data movement operations and data transformations
     • Library internals
        • Perform data transformations and other preparation for I/O
        • Configurable transformations (compression, etc.)
     • Virtual file I/O (C only)
        • Perform byte-stream I/O operations (open/close, read/write, seek)
        • User-implementable I/O (stdio, network, memory, etc.)

  33. Writing – move from memory to disk
     [Figure: data flowing from memory to disk]

  34. Partial I/O: move just part of a dataset
     (a) Hyperslab from a 2D array to the corner of a smaller 2D array
     (b) Regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array
     [Each panel pairs a selection on disk with one in memory]

  35. Partial I/O: move just part of a dataset
     (c) A sequence of points from a 2D array to a sequence of points in a 3D array
     (d) Union of hyperslabs in file to union of hyperslabs in memory
     [Each panel pairs a selection on disk with one in memory]
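
A regular hyperslab like those in panels (a) and (b) is described in the C API by four per-dimension arrays passed to H5Sselect_hyperslab: start, stride, count, and block. Along each dimension, count blocks of block consecutive elements are chosen, with block origins stride elements apart beginning at start, and the selection is the Cartesian product over the dimensions. The plain-Python sketch below enumerates such a selection and then mimics panel (b) by copying the selected elements, in row-major order (the order in which HDF5 iterates over a hyperslab selection), into a contiguous run of a 1-D buffer at an offset. The array sizes, selection parameters, and offset are hypothetical.

```python
from itertools import product


def hyperslab_coords(start, stride, count, block):
    """Coordinates of a regular hyperslab selection, in row-major order.

    Along each dimension: `count` blocks of `block` consecutive indices,
    with block origins `stride` apart, beginning at `start`.
    """
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        if c > 1 and b > st:
            raise ValueError("block larger than stride: blocks would overlap")
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))


# Hypothetical 6x8 dataset in the file; each value encodes its position as 10*row + col.
file_data = [[10 * r + c for c in range(8)] for r in range(6)]

# Regular series of 2x2 blocks: rows {0, 1, 3, 4}, columns {1, 2, 5, 6}.
selection = hyperslab_coords(start=(0, 1), stride=(3, 4), count=(2, 2), block=(2, 2))

# Panel (b): copy the blocks into a contiguous run of a 1-D buffer, starting at offset 2.
mem_buf = [None] * 20
mem_offset = 2
for k, (r, c) in enumerate(selection):
    mem_buf[mem_offset + k] = file_data[r][c]
```

After the loop, `mem_buf[2:18]` holds the 16 selected values (1, 2, 5, 6, 11, 12, 15, 16, …) and the other four slots of the buffer are untouched.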

  36. Layers – parallel example
     I/O flows through many layers from application to disk:
     • Application
     • Parallel computing system (Linux cluster) with compute nodes
     • I/O library (HDF5)
     • Parallel I/O library (MPI-I/O)
     • Parallel file system (GPFS)
     • Switch network / I/O servers
     • Disk architecture & layout of data on disk

  37. Virtual I/O layer
     • Object API (C, Fortran 90, Java, C++)
     • Library internals
     • Virtual file I/O (C only)

  38. Virtual file I/O drivers
     • Virtual file I/O layer
        • A public API for writing I/O drivers
        • Allows HDF5 to interface to disk, the network, memory, or a user-defined device
     [Figure: drivers (File Family, MPI I/O, Memory, Network, Stdio) connecting the library to “storage”: a file, a file family, memory, or the network]
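
The File Family driver stores one logical HDF5 file as a sequence of member files of a fixed size (in the C API it is selected with H5Pset_fapl_family, which takes that member size). The plain-Python sketch below models only the address arithmetic such a driver needs, not its implementation: a logical byte address maps to a member number and an offset within that member, and a request that crosses a member boundary is split into pieces. The 1 MiB member size and the file-name pattern are hypothetical.

```python
def family_location(addr, memb_size, name_pattern="family-%05d.h5"):
    """Map a logical byte address to (member file name, offset within that member)."""
    member, offset = divmod(addr, memb_size)
    return name_pattern % member, offset


def split_request(addr, nbytes, memb_size, name_pattern="family-%05d.h5"):
    """Split a logical byte range [addr, addr + nbytes) into per-member pieces."""
    pieces = []
    while nbytes > 0:
        name, offset = family_location(addr, memb_size, name_pattern)
        n = min(nbytes, memb_size - offset)  # stop at the end of this member
        pieces.append((name, offset, n))
        addr += n
        nbytes -= n
    return pieces


MEMB_SIZE = 1 << 20  # hypothetical member size: 1 MiB

# A 1000-byte write that starts 576 bytes before the end of member 0.
pieces = split_request(MEMB_SIZE - 576, 1000, MEMB_SIZE)
print(pieces)  # [('family-00000.h5', 1048000, 576), ('family-00001.h5', 0, 424)]
```

Because the library above the driver sees a single address space, this splitting stays invisible to the object API.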

  39. Apps: simulation, visualization, remote sensing…
     [Figure: layers from applications down to storage:
      • Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models
      • Common application-specific data models and APIs: UDM (LANL), SAF (LLNL, SNL), hdf5mesh (Grids), IDL (COTS), HDF-EOS (NASA)
      • HDF5 data model & API
      • HDF5 serial & parallel I/O
      • HDF5 virtual file layer (I/O drivers): Split Files, MPI I/O, Custom, Stdio, Stream
      • Storage in HDF5 format: split metadata and raw data files; a file on a parallel file system; a file; a user-defined device; across the network or to/from another application or library]

  40. Other info
     • Runs almost anywhere
        • Most workstations
        • Big ASC machines, Cray, Compaq
        • TeraGrid and other clusters
     • QA
        • Daily regression tests on key platforms
     • Meets NASA’s highest technology readiness level

  41. Other HDF Software
     • From The HDF Group (THG)
        • Java tools
        • Command-line utilities
        • Regression and performance testing software
     • Commercial (IDL, Matlab, HDF Explorer, etc.)
     • Community (EOS, ASCI, etc.)
     • Integration with other software (SRB, etc.)

  42. Creating an HDF5 file with HDF5 tools: HDFView, h5mkgrp, h5import

  43. Example: create this HDF5 file
     [Figure: the root group “/” with members A and B, a 4x6 array of floats, and a 3-D array of floats]

  44. Example: create this HDF5 file
     • HDFView
     • h5mkgrp file.h5 /B
     • h5import A.txt -c A.conf -o file.h5

  45. Introduction to HDF5 Programming Model and APIs: programming model for sequential access

  46. HDF5 Software stack
     • Tools & Applications
     • HDF I/O Library
     • HDF File

  47. Structure of HDF5 Library
     • Object API (C, Fortran 90, Java, C++)
        • Specify objects and transformation properties
        • Invoke data movement operations and data transformations
     • Library internals
        • Perform data transformations and other preparation for I/O
        • Configurable transformations (compression, etc.)
     • Virtual file I/O (C only)
        • Perform byte-stream I/O operations (open/close, read/write, seek)
        • User-implementable I/O (stdio, network, memory, etc.)

  48. Goals of HDF5 Library
     • Flexible API to support a wide range of operations on data
     • High performance access in serial and parallel computing environments
     • Compatibility with common data models and programming languages
     Because of these goals, the HDF5 API is rich and large.

  49. Operations supported by the API
     • Create groups, datasets, attributes, linkages
     • Create complex data types
     • Assign storage and I/O properties to objects
     • Complex subsetting during read/write
     • Flexible I/O (parallel, remote, etc.)
     • Ability to transform data during I/O
     • Query file structure and properties
     • Query object structure, content, and properties

  50. Characteristics of the HDF5 API
     • For flexibility, the API is extensive – 300+ functions
     • This can be daunting at first
     • But there is hope
        • You can do a lot with just a few functions
        • So start simple, and build up your knowledge
        • The library functions are categorized by object type
        • Once you learn the system, it’s much less daunting
        • And there is an “HDF5 Lite” (H5LT) API if all you want to do is simple things
