Tutorial II: HDF5 and NetCDF-4

10th International LCI Conference. Albert Cheng, Neil Fortner (The HDF Group); Ed Hartnett (Unidata/UCAR).

  1. Tutorial II: HDF5 and NetCDF-4. 10th International LCI Conference. Albert Cheng, Neil Fortner (The HDF Group); Ed Hartnett (Unidata/UCAR). 10th International LCI Conference - HDF5 Tutorial

  2. Outline • 8:30 – 9:30 Introduction to HDF5 data, programming models and tools • 9:30 – 10:00 Advanced features of the HDF5 library • 10:30 – 11:30 Advanced features of the HDF5 library (continued) • 11:30 – 12:00 Introduction to Parallel HDF5 • 1:00 – 2:30 Introduction to Parallel HDF5 (continued) and Parallel I/O Performance Study • 3:00 – 4:30 NetCDF-4

  3. Introduction to HDF5 Data, Programming Models and Tools

  4. What is HDF?

  5. HDF is… • HDF stands for Hierarchical Data Format • A file format for managing any kind of data • A software system to manage data in that format • Designed for high-volume or complex data • Designed for every size and type of system • Open format, software library and tools • There are two HDFs: HDF4 and HDF5 • Today we focus on HDF5

  6. Brief History of HDF • 1987: At NCSA (University of Illinois), a task force formed to create an architecture-independent format and library: AEHOO (All Encompassing Hierarchical Object Oriented format), which became HDF • Early 1990s: NASA adopted HDF for the Earth Observing System project • 1996: DOE’s ASC (Advanced Simulation and Computing) project began collaborating with the HDF group (NCSA) to create “Big HDF” (the increased computing power of DOE systems at LLNL, LANL and Sandia National Labs required bigger, more complex data files); “Big HDF” became HDF5 • 1998: HDF5 was released with support from the National Labs, NASA and NCSA • 2006: The HDF Group spun off from the University of Illinois as a non-profit corporation

  7. Why HDF5? In one sentence ...

  8. Answering big questions … [Figure panels: matter and the universe; life and nature; weather and climate (total column ozone in Dobson units, August 24, 2001 and August 24, 2002)]

  9. … involves big data …

  10. … varied data … (Thanks to Mark Miller, LLNL)

  11. … and complex relationships … [Figure labels: SNP score, contig summaries, discrepancies, contig qualities, coverage depth, trace reads, aligned bases, read quality, contig percent match]

  12. … on big computers … and small computers …

  13. How do we… • Describe our data? • Read it? Store it? Find it? Share it? Mine it? • Move it into, out of, and between computers and repositories? • Achieve storage and I/O efficiency? • Give applications and tools easy access to our data?

  14. Solution: HDF5! • Can store all kinds of data in a variety of ways • Runs on most systems • Lots of tools to access data • Emphasis on standards (HDF-EOS, CGNS) • Library and format emphasis on I/O efficiency and storage

  15. HDF5 Philosophy A single platform with multiple uses • One general format • One library, with • Options to adapt I/O and storage to data needs • Layers on top and below • Ability to interact well with other technologies • Attention to past, present, future compatibility

  16. Who uses HDF5?

  17. Who uses HDF5? • Applications that deal with big or complex data • Over 200 different types of apps • 2+ million product users worldwide • Academia, government agencies, industry

  18. NASA EOS remote sensing data • HDF is the standard file format for storing data from NASA's Earth Observing System (EOS) mission. • Petabytes of data are stored in HDF and HDF5 to support the Global Climate Change Research Program.

  19. Structure of HDF5 Library (top to bottom) • Applications • Object API (C, F90, C++, Java) • Library internals • Virtual file I/O • File or other “storage”

  20. HDF Tools • HDFView and Java products • Command-line utilities (h5dump, h5ls, h5cc, h5diff, h5repack)

  21. HDF5 Applications & Domains: simulation, visualization, remote sensing… • Examples: thermonuclear simulations, product modeling, data mining tools, visualization tools, climate models • Communities: HDF-EOS, CGNS, ASC • HDF5 Data Model & API • Virtual File Layer (I/O drivers): Stdio (file), Split Files (split metadata and raw data files), MPI I/O (file on parallel file system), Custom Storage (user-defined device) • HDF5 format

  22. HDF5: The Format

  23. An HDF5 “file” is a container… …into which you can put your data objects [Figure: a container holding a palette and a table]
lat | lon | temp
----|-----|-----
12 | 23 | 3.1
15 | 24 | 4.2
17 | 21 | 3.6

  24. “Groups” and “Datasets” • “Groups”: structures to organize objects, e.g. “/” (root) and “/foo” • “Datasets”: 3-D array, 2-D array, table (lat | lon | temp), palette, raster images
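A minimal sketch of the container idea using the h5py Python bindings. This is an assumption: the tutorial itself targets the C library, and h5py 3.x plus NumPy are assumed to be installed. The dataset names `array3d`, `foo/array2d` and `foo/table` are hypothetical; the table rows are the lat/lon/temp values from the slide.

```python
import h5py
import numpy as np

# Record layout for the slide's lat | lon | temp table (field types are assumptions).
TABLE_DTYPE = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])


def build_container(path):
    """Create an HDF5 file that organizes a few datasets with groups."""
    with h5py.File(path, "w") as f:              # every file starts with the root group "/"
        foo = f.create_group("foo")               # the group "/foo"
        # A 3-D and a 2-D array; the shapes are arbitrary examples.
        f.create_dataset("array3d", data=np.zeros((4, 5, 7), dtype="f4"))
        foo.create_dataset("array2d", data=np.arange(12, dtype="i4").reshape(3, 4))
        # The table stored as a 1-D dataset of records.
        rows = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=TABLE_DTYPE)
        foo.create_dataset("table", data=rows)


def list_objects(path):
    """Return the path of every object in the file, relative to "/"."""
    names = []
    with h5py.File(path, "r") as f:
        f.visit(names.append)                     # append returns None, so the walk continues
    return sorted(names)
```

Groups only give structure here; all numbers live in datasets.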

  25. HDF5 model • Groups – provide structure among objects • Datasets – where the primary data goes • Data arrays • Rich set of datatype options • Flexible, efficient storage and I/O • Attributes, for metadata. Everything else is built essentially from these parts.

  26. HDF5: The Software

  27. HDF5 Software (top to bottom) • Tools, applications, libraries • HDF5 I/O library • HDF5 file

  28. Users of HDF5 Software (top to bottom) • Tools & Applications: most data consumers are here; scientific/engineering applications; domain-specific libraries/APIs, tools • HDF5 Application Programming Interface: applications and tools use this API to create, read, write, query, etc.; power users (consumers) • “Virtual file layer” (VFL): modules to adapt I/O to specific features of a system, or to do I/O in some special way • File system, MPI-IO, SAN, other layers: the “file” could be on a parallel system, in memory, a collection of files, etc. • “HDF5 File”
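Because the virtual file layer sits below the API, changing where the bytes go should not change application code. A sketch with h5py (assumed installed) that selects HDF5's in-memory "core" driver; the file name is hypothetical and, with `backing_store=False`, nothing is written to disk.

```python
import h5py
import numpy as np


def roundtrip_in_memory(values):
    """Write and read a dataset through HDF5's "core" (in-memory) file driver."""
    # "scratch.h5" is a hypothetical name; with backing_store=False the file
    # image stays in memory and is discarded when the file is closed.
    with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as f:
        f.create_dataset("values", data=np.asarray(values, dtype="f8"))
        # These are the same calls an application would use for an on-disk file.
        return f.driver, f["values"][...].tolist()
```

Dropping the `driver` argument falls back to the default POSIX driver (`"sec2"`); only the file-opening line changes, not the dataset code.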

  29. HDF5 Data Model

  30. HDF5 model (recap) • Groups – provide structure among objects • Datasets – where the primary data goes • Data arrays • Rich set of datatype options • Flexible, efficient storage and I/O • Attributes, for metadata • Other objects • Links (point to data in a file or in another HDF5 file) • Datatypes (can be stored for complex structures and reused by multiple datasets)
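Links within a file and links into another HDF5 file can be sketched with h5py's `SoftLink` and `ExternalLink` classes (h5py and NumPy assumed installed; file and dataset names are hypothetical).

```python
import h5py
import numpy as np


def make_linked_files(main_path, other_path):
    """Create two files; the main file links to a dataset stored in the other one."""
    with h5py.File(other_path, "w") as other:
        other.create_dataset("data", data=np.arange(5))
    with h5py.File(main_path, "w") as f:
        f.create_dataset("local", data=np.ones(3))
        f["alias"] = h5py.SoftLink("/local")                   # link to a path in this file
        f["remote"] = h5py.ExternalLink(other_path, "/data")   # link into another HDF5 file


def read_links(main_path):
    """Report each link's kind and read through the soft and external links."""
    with h5py.File(main_path, "r") as f:
        kinds = {name: type(f.get(name, getlink=True)).__name__ for name in f}
        return kinds, f["alias"][...].tolist(), f["remote"][...].tolist()
```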

  31. HDF5 Dataset: metadata plus data • Dataspace: rank 3; dimensions Dim_1 = 4, Dim_2 = 5, Dim_3 = 7 • Datatype: IEEE 32-bit float • Attributes: Time = 32.4, Pressure = 987, Temp = 56 • Storage info: chunked, compressed
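The dataset described above (rank 3, 4 x 5 x 7, IEEE 32-bit float, chunked and compressed, three attributes) might be created with h5py as follows. The dataset name and chunk shape are assumptions, and gzip stands in for whatever compression the slide intends.

```python
import h5py
import numpy as np


def create_dataset_like_slide(path):
    """Rank-3 (4 x 5 x 7) float32 dataset, chunked and compressed, with three attributes."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset(
            "measurements",        # hypothetical name
            shape=(4, 5, 7),       # dataspace: rank 3, Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
            dtype="<f4",           # datatype: IEEE 32-bit float (little-endian)
            chunks=(2, 5, 7),      # storage info: chunked (assumed chunk shape)
            compression="gzip",    # storage info: compressed with the deflate filter
        )
        dset[...] = np.arange(4 * 5 * 7, dtype="<f4").reshape(4, 5, 7)
        dset.attrs["Time"] = 32.4  # attributes: small name = value metadata
        dset.attrs["Pressure"] = 987
        dset.attrs["Temp"] = 56


def describe(path):
    """Return the dataset's metadata as stored in the file."""
    with h5py.File(path, "r") as f:
        d = f["measurements"]
        return d.shape, d.dtype.str, d.chunks, d.compression, dict(d.attrs)
```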

  32. HDF5 Dataspace • Two roles • The dataspace contains spatial info about a dataset stored in a file • Rank and dimensions • Permanent part of the dataset definition • The dataspace describes the application’s data buffer and the data elements participating in I/O [Figure: a dataspace with rank = 2, dimensions = 4x6; a dataspace with rank = 1, dimensions = 12]
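The second role of a dataspace can be illustrated with h5py's low-level `h5s`/`h5d` wrappers (h5py 3.x assumed): a 3 x 4 hyperslab (12 elements) of a 4 x 6 file dataspace is read into a rank-1, 12-element memory dataspace. The selection offset and the dataset name are hypothetical.

```python
import h5py
import numpy as np


def hyperslab_to_1d(path):
    """Read a 3 x 4 block of a 4 x 6 dataset into a 12-element 1-D buffer."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("grid", data=np.arange(24, dtype="i4").reshape(4, 6))

        # File dataspace: rank 2, 4 x 6. Select 3 x 4 elements starting at (1, 1).
        fspace = dset.id.get_space()
        fspace.select_hyperslab((1, 1), (3, 4))

        # Memory dataspace: rank 1, 12 elements. The two selections must contain
        # the same number of elements, but their shapes may differ.
        mspace = h5py.h5s.create_simple((12,))
        buf = np.empty(12, dtype="i4")
        dset.id.read(mspace, fspace, buf)
        return buf
```

The buffer receives the selected elements in row-major order: 7 to 10, 13 to 16, then 19 to 22.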

  33. HDF5 Datatype • Datatype – how to interpret a data element • Permanent part of the dataset definition • Two classes: atomic and compound • Can be stored in a file as an HDF5 object (HDF5 committed datatype) • Can be shared among different datasets
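Committing a datatype and sharing it between datasets, sketched with h5py (assumed installed): assigning a NumPy dtype to a group member stores it in the file as a named (committed) datatype. The record layout and all names are hypothetical.

```python
import h5py
import numpy as np

# Hypothetical record layout shared by two datasets.
POINT = np.dtype([("x", "f8"), ("y", "f8"), ("id", "i4")])


def share_committed_type(path):
    """Commit POINT as "/point_t" and create two datasets that use it."""
    with h5py.File(path, "w") as f:
        f["point_t"] = POINT                  # stored as a committed datatype object
        named = f["point_t"]                  # opened back as an h5py.Datatype
        f.create_dataset("start", (2,), dtype=named)
        f.create_dataset("end", (3,), dtype=named)
        return (isinstance(named, h5py.Datatype),
                f["start"].dtype == f["end"].dtype == POINT)
```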

  34. HDF5 Datatype • HDF5 atomic types include • normal integer & float • user-definable (e.g., 13-bit integer) • variable length types (e.g., strings) • references to objects/dataset regions • enumeration - names mapped to integers • array • HDF5 compound types • Comparable to C structs (“records”) • Members can be atomic or compound types
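A few of these atomic types in h5py (3.x assumed): an enumeration, variable-length strings and an object reference. The names and values are hypothetical.

```python
import h5py
import numpy as np


def atomic_type_examples(path):
    """Store an enum, variable-length strings and an object reference."""
    with h5py.File(path, "w") as f:
        # Enumeration: names mapped to integers (8-bit signed integers here).
        color_t = h5py.enum_dtype({"RED": 0, "GREEN": 1, "BLUE": 2}, basetype="i1")
        f.create_dataset("colors", data=np.array([2, 0, 1], dtype=color_t))

        # Variable-length strings.
        names = f.create_dataset("names", (2,), dtype=h5py.string_dtype())
        names[:] = ["lat", "longitude"]

        # Object reference pointing at another object in the same file.
        refs = f.create_dataset("refs", (1,), dtype=h5py.ref_dtype)
        refs[0] = f["colors"].ref

        mapping = h5py.check_enum_dtype(f["colors"].dtype)   # the stored name -> value map
        target = f[refs[0]].name                             # dereference the reference
        return mapping, names.asstr()[:].tolist(), target
```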

  35. HDF5 dataset: array of records • Dimensionality: 5 x 3 • Datatype: record {int8, int4, int16, 2x3x2 array of float32}
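The array-of-records dataset can be sketched with a NumPy compound dtype written through h5py (both assumed installed). Field names are hypothetical, and the slide's `int4` is assumed here to mean a 4-byte integer; HDF5 itself can also describe non-standard bit widths (slide 34's 13-bit integer) that NumPy cannot.

```python
import h5py
import numpy as np

# Assumed NumPy spelling of the slide's record; field names are hypothetical.
RECORD = np.dtype([
    ("a", "i1"),                 # int8
    ("b", "i4"),                 # "int4", assumed to be a 4-byte integer
    ("c", "i2"),                 # int16
    ("d", "f4", (2, 3, 2)),      # 2x3x2 array of float32
])


def create_record_array(path):
    """Write a 5 x 3 dataset whose elements are RECORD structs."""
    data = np.zeros((5, 3), dtype=RECORD)           # dataspace: 5 x 3
    data["a"], data["b"], data["c"] = 1, 2, 3
    data["d"][4, 2] = np.arange(12, dtype="f4").reshape(2, 3, 2)
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("records", data=data)
        return dset.shape, dset.dtype["d"].shape, float(dset[4, 2]["d"].sum())
```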

  36. Special storage options for dataset • chunked: better subsetting access time; compressible; extendable • compressed: improves storage efficiency, transmission speed • extendable: arrays can be extended in any direction • external: metadata in the HDF5 file, raw data in a binary file [Figure: metadata for dataset “Fred” in File A, data for Fred in File B]
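Three of these options in one h5py sketch (h5py 3.x and NumPy assumed): a chunked, gzip-compressed dataset that is extended one row at a time, and a dataset named `Fred` whose raw data lives in an external binary file. Paths, shapes and chunk sizes are hypothetical.

```python
import h5py
import numpy as np


def storage_options(h5_path, raw_path):
    """Create an extendable chunked+compressed dataset and an external one."""
    with h5py.File(h5_path, "w") as f:
        # Chunked + compressed + extendable: maxshape None means unlimited along axis 0.
        grow = f.create_dataset("grow", shape=(0, 4), maxshape=(None, 4),
                                chunks=(16, 4), dtype="f8", compression="gzip")
        for step in range(3):                        # append one row per step
            grow.resize(grow.shape[0] + 1, axis=0)
            grow[step, :] = step

        # External: metadata stays in the HDF5 file, raw bytes go to raw_path.
        data = np.arange(10, dtype="<f8")
        fred = f.create_dataset("Fred", shape=data.shape, dtype="<f8",
                                external=[(raw_path, 0, data.nbytes)])
        fred[...] = data
        return grow.shape, grow[...].tolist()
```

Since external raw data is stored contiguously and without a header, the binary file can be read directly, for example with `np.fromfile`.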

  37. HDF5 Attribute • Attribute – data of the form “name = value”, attached to an object by the application • Operations similar to dataset operations, but … • Not extendable • No compression or partial I/O • Can be overwritten, deleted, added during the “life” of a dataset
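The attribute life cycle (add, overwrite, delete) through h5py's `attrs` mapping; h5py 3.x is assumed, and the names are hypothetical.

```python
import h5py
import numpy as np


def attribute_lifecycle(path):
    """Add, overwrite and delete attributes on a dataset; return what remains."""
    with h5py.File(path, "w") as f:
        dset = f.create_dataset("temperature", data=np.zeros(3))
        dset.attrs["units"] = "kelvin"           # add
        dset.attrs["units"] = "celsius"          # overwrite: the attribute is replaced whole
        dset.attrs["scale"] = np.array([1.0, 0.5])
        del dset.attrs["scale"]                  # delete
        return dict(dset.attrs)
```

Each assignment rewrites the whole value; unlike a dataset, an attribute has no slicing or partial write.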

  38. HDF5 Group • A mechanism for organizing collections of related objects • Every file starts with a root group, “/” • Similar to UNIX directories • Can have attributes

  39. Path to HDF5 object in a file • / (root) • /X • /Y • /Y/temp • /Y/bar/temp [Figure: root “/” holds X and Y; Y holds temp and bar; bar holds its own temp]
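Rebuilding the tree above with h5py (assumed installed) and listing every path. `visit` reports names relative to the root group, so a leading `/` is added here. Dataset contents are hypothetical.

```python
import h5py


def build_tree(path):
    """Recreate /X, /Y, /Y/temp and /Y/bar/temp, then list all paths."""
    with h5py.File(path, "w") as f:
        f.create_group("X")
        f.create_group("Y/bar")                       # the intermediate group "Y" is created too
        f.create_dataset("Y/temp", data=[1.0, 2.0])   # relative path
        f.create_dataset("/Y/bar/temp", data=[3.0])   # absolute path from the root group
        paths = []
        f.visit(lambda name: paths.append("/" + name))
        bar = f["Y/bar"]
        return sorted(paths), bar["temp"].name, f.name
```

The two datasets are both called `temp`, but their full paths differ, just as two files with the same name can live in different UNIX directories.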

  40. Shared HDF5 objects • /A/P • /B/R • /C/R [Figure: root “/” holds groups A, B and C; a single object is linked as P in A and as R in B and C]
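Assuming the figure shows one object reachable under three names, h5py creates the extra names as hard links when an existing object is assigned to a new path. Writing through one path and reading through the others shows that only one dataset exists. Names follow the slide; the data is hypothetical (h5py and NumPy assumed installed).

```python
import h5py
import numpy as np


def share_object(path):
    """Link one dataset as /A/P, /B/R and /C/R, then write through one name."""
    with h5py.File(path, "w") as f:
        for g in ("A", "B", "C"):
            f.create_group(g)
        f["A/P"] = np.zeros(4)      # creates the dataset; "/A/P" is its first link
        f["B/R"] = f["A/P"]         # hard link: same object, second path
        f["C/R"] = f["A/P"]         # hard link: same object, third path
        f["B/R"][0] = 99.0          # write through one path ...
        return float(f["C/R"][0]), float(f["A/P"][0])   # ... read through the others
```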

  41. HDF5 Data Model Example: ENSIGHT automotive crash simulation

  42. Automotive crash simulation

  43. Automotive crash simulation

  44. Automotive crash simulation

  45. Solid modeling

  46. Solid modeling

  47. HDF5 mesh

  48. Mesh example, in HDFView (LCI Tutorial, April 28, 2008)

  49. HDF5 Software

  50. HDF5 software stack (top to bottom) • Tools & Applications • HDF I/O Library • HDF File
