HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999. Mike Folk, HDF Group http://hdf.ncsa.uiuc.edu/ National Center for Supercomputing Applications University of Illinois at Urbana-Champaign. Topics. I. Overview II. NCSA HDF Activities III. HDF5 IV. HDF4 vs. HDF5. I. HDF Overview.
HDF Mission To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
What is HDF? • Scientific data file format & supporting software • For images, arrays, tables, other structures • Features • Portability across architectures • I/O library • Files • Efficient I/O • Efficient storage
Why use HDF? • Manage data • Share data • Use software that understands HDF • Improve I/O performance • Improve storage efficiency • Use an open standard
An HDF File: A Collection of Scientific Data Objects • [Figure: HDF file containing four 3-D arrays]
Mixing HDF Objects in One File • [Figure: one HDF file holding 3-D arrays, raster images, a palette, a table, and a group] • Example table (lat, lon, temp): (12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), (16, 35, 5.7)
HDF Software (layers, top to bottom) • General applications: utilities and applications for manipulating, viewing, and analyzing data • HDF I/O library • Application programming interfaces: high-level, object-specific APIs • Low-level interface: low-level API for I/O to files, etc. • HDF file: file or other data source
HDF Applications Software • Free software • NCSA HDF library and utilities • Other software • Commercial/other software that “understands” • all of HDF (Noesys, IDL, HDF Explorer) • certain HDF objects (MATLAB, WebWinds) • certain HDF applications (SHARP, WIM) • http://hdf.ncsa.uiuc.edu/tools.html
What platforms does HDF run on? • Sun: Solaris • SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E • HP9000, HP-Convex Exemplar • IBM: RS6000, SP2 • DEC: Alpha/Digital UNIX, OpenVMS • VAX: OpenVMS • Intel: Solaris x86, Linux, FreeBSD, Windows NT/98 • PowerPC: Mac-OS
A Sampling of HDF Users • NCSA-affiliated science teams: visualization, data exchange, fast I/O, ... • Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of visualization and data analysis software • Boeing: space-time change detection in images • Distributed Oceanographic Data System (DODS): remote access to earth science data • Army Research Lab: network distributed global memory • Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids • TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data
Major User #1: EOSDIS • ESDIS Project • open standard exchange format and I/O library for EOSDIS • EOS applications • HDF requirements • Earth science data types (HDF-EOS, etc.) • User support for scientists, data producers, etc. • Library and file structure improvements • HDF tools, utilities, access software • Software maintenance and QA
Major User #2: ASCI • ASCI Data Models and Formats (DMF) Group • open standard exchange format and I/O library for ASCI • DOE tri-lab ASCI applications • HDF requirements • large datasets (> a terabyte) • ASCI data types, especially meshes • good performance in massively parallel environments • primarily HDF5
Java applications • HDF APIs • Basis for tools that access HDF • HDF Viewers • HDF browser/visualizer • HDF4 Data Server Prototype • Lessons learned about remote access to
Remote Data Access • The SDB: Web-based Server-side Data Browser • Java for remote access • WP-ESIP: DODS project • Computational Grids (Globus/GASS)
HDF Standardization • To share files, users must organize them similarly. • HDF user groups create standard profiles • Ways to organize data in HDF files. • Metadata • API • Examples: HDF-EOS, ASCI DMF
HDF-EOS software layers • [Figure: the HDF software stack (General Applications, Application Programming Interfaces, Low-level Interface, HDF file) extended with HDF-EOS Applications, the HDF-EOS API, and HDF-EOS profiles]
“HDF Configuration Record” (HCR) • To simplify the tasks of defining, comparing, and producing HDF-EOS files • Formal (ODL) descriptions of HDF-EOS objects
HCR of Swath • /* Project XYZ */ • /* First version defined on June 10th, 1998 */ • OBJECT = SWATH • NAME = SCAN1 • OBJECT = Dimension • NAME = GeoTrack • Size = 1200 • END_OBJECT = Dimension • OBJECT = Dimension • NAME = GeoCrossTrack • Size = 205 • END_OBJECT = Dimension • OBJECT = Dimension • NAME = DataX • Size = 2410 • END_OBJECT = Dimension • END_OBJECT = SWATH • END
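The swath description above is regular enough to read mechanically. The following is a minimal sketch, not the HCR utilities' own parser: it assumes only OBJECT/END_OBJECT blocks, KEY = VALUE pairs, comment lines, and a final END. The function name `parse_hcr` and the indentation of the listing are assumptions made for illustration.

```python
# Sketch only: parse the ODL-style HCR text from the slide into nested dicts.
# parse_hcr is a hypothetical helper, not part of any HDF or HCR software.

HCR_TEXT = """
/* Project XYZ */
/* First version defined on June 10th, 1998 */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = DataX
    Size = 2410
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""


def parse_hcr(text):
    """Return a tree of {"type", "fields", "children"} dicts."""
    root = {"type": "ROOT", "fields": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("/*"):
            continue  # skip blank lines and single-line comments
        if line == "END":
            break
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "OBJECT":
            node = {"type": value, "fields": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            node = stack.pop()
            if node["type"] != value:
                raise ValueError("mismatched END_OBJECT = " + value)
        else:
            # Store digit-only values as integers, everything else as text.
            stack[-1]["fields"][key] = int(value) if value.isdigit() else value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT block")
    return root


swath = parse_hcr(HCR_TEXT)["children"][0]
dims = {d["fields"]["NAME"]: d["fields"]["Size"] for d in swath["children"]}
print(swath["fields"]["NAME"], dims)
```

A tree like this is one possible starting point for the comparison task named in the HCR utilities list: two descriptions could be compared field by field.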
HCR • HCR Utilities: • Converters between HCR and HDF-EOS • Edit HCR and HDF-EOS • Compare HCR with HDF-EOS file • Current projects: • Extend HCR converters to all of HDF4 • Similar work with HDF5 • XML too
Why HDF5? • HDF shortcomings exposed by EOSDIS, ASCI and others... • Limits on object & file size (<2GB) • Limited number of objects (<20K) • Rigid data models • I/O performance • Aging software infrastructure (code entropy)
…new Demands... • Bigger, faster machines and storage systems • massive parallelism, parallel file systems • teraflop speeds, terabyte storage • Greater complexity • complex data structures • complex subsetting • More emphasis on remote & distributed access
… and ASCI Requirements • Compatibility with vector bundle model • Compatibility with MPI-IO • Ability to transform data between memory & storage • Parallel file systems: PIOFS, HPSS, etc.
New HDF5 Features • More scalable • Larger arrays and files • More objects • Improved data model • New datatypes • Single comprehensive dataset object • Improved software • More flexible, robust library • More flexible API • More I/O options
HDF5 data model • Two primary objects • Dataset • multidimensional array of elements • rich variety of datatypes • Group • directory-like structure • contains datasets, groups, other objects
Dataset components • multidimensional array • header with metadata • datatype • dataspace • attributes • storage properties
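The components listed above can be pictured as a small record type. This is a sketch of the idea only, not the HDF5 library's data structures: the class names, the `ITEM_SIZE` table, and the example values are all hypothetical.

```python
from dataclasses import dataclass, field
from math import prod

# Hypothetical byte sizes for a few scalar datatypes (illustration only).
ITEM_SIZE = {"int8": 1, "int16": 2, "int32": 4, "float32": 4, "float64": 8}


@dataclass
class Dataspace:
    dims: tuple  # size of each dimension

    @property
    def rank(self):
        return len(self.dims)

    @property
    def n_elements(self):
        return prod(self.dims)


@dataclass
class Dataset:
    datatype: str                                    # element type name
    dataspace: Dataspace                             # rank and dimensions
    attributes: dict = field(default_factory=dict)   # small named values
    storage: dict = field(default_factory=dict)      # layout, compression, ...

    def raw_size(self):
        """Bytes needed for the uncompressed array itself."""
        return self.dataspace.n_elements * ITEM_SIZE[self.datatype]


# Made-up example values.
ds = Dataset(
    datatype="int16",
    dataspace=Dataspace((5, 4)),
    attributes={"time": 32.4},
    storage={"layout": "chunked", "compressed": True},
)
print(ds.dataspace.rank, ds.dataspace.n_elements, ds.raw_size())
```

The header holds everything except the array values, so a reader can learn the shape and type of a dataset without touching the raw data.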
Simple datatypes • The usual scalars: integer & float • user-defined scalars (e.g. 13-bit integers) • variable length (e.g. strings) • pointers to objects or regions of datasets • enumeration • opaque
Compound datatypes • User-defined • Comparable to C structs • Members can be simple or compound types • Members can be multidimensional
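Since compound datatypes are comparable to C structs, a record layout can be sketched with Python's standard `struct` module. This illustrates the concept and is not the HDF5 API. The field names and the choice of member types are assumptions.

```python
import struct

# Hypothetical record with four members. "<" fixes little-endian byte order
# and disables padding, so the layout is the same on every machine.
# b = int8, h = int16, i = int32, f = float32
RECORD = struct.Struct("<bhif")
FIELDS = ("flag", "count", "offset", "temp")


def pack_records(rows):
    """Serialize an iterable of 4-tuples into one contiguous byte string."""
    return b"".join(RECORD.pack(*row) for row in rows)


def unpack_records(buf):
    """Turn the byte string back into a list of field-name -> value dicts."""
    return [dict(zip(FIELDS, rec)) for rec in RECORD.iter_unpack(buf)]


rows = [(1, 12, 23, 3.5), (2, 15, 24, 4.25)]
buf = pack_records(rows)
print(RECORD.size, len(buf))
print(unpack_records(buf)[1])
```

Pinning the byte order and padding in the type description is one way to get the portability across architectures that the overview slides list as a feature.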
Data Spaces • How data are organized to form a dataset • rank • dimensions • Subsetting during I/O operations • What subset of data is to be moved • In-memory organization of data • In-file organization of data
HDF5 dataset: array of records • Datatype: record with fields int8, int4, int16, float32 • Dimensionality: 5 x 3
Dataspaces: Reading Dataset into Memory from File • [Figure: Read from File to Memory; labels: 2D array of integers, 3D array of floats]
Selection: Examples of mappings between file selections and memory selections. (a) A hyperslab from a 2D array to the corner of a smaller 2D array (b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array (c) A sequence of points from a 2D array to a sequence of points in a 3D array. (d) Union of slabs in file to union of slabs in memory. No. of elements must be equal.
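Mapping (a) can be sketched in plain Python. The start/stride/count/block parameters follow the usual description of a regular hyperslab. The helper names `hyperslab` and `transfer`, the array sizes, and the 2D-only copy are assumptions made for this illustration, which is not the HDF5 API.

```python
from itertools import product


def hyperslab(start, stride, count, block):
    """Return the coordinates of a regular hyperslab in row-major order."""
    axes = []
    for s, st, c, b in zip(start, stride, count, block):
        # count blocks per axis, each block elements long, stride apart
        axes.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*axes))


def transfer(src, src_sel, dst, dst_sel):
    """Copy selected elements of a 2D source to selected 2D destinations."""
    if len(src_sel) != len(dst_sel):
        raise ValueError("selections must contain the same number of elements")
    for (sr, sc), (dr, dc) in zip(src_sel, dst_sel):
        dst[dr][dc] = src[sr][sc]


# Mapping (a): a 2 x 3 hyperslab of a 6 x 8 "file" array goes to the
# corner of a smaller 4 x 4 "memory" array.
file_array = [[r * 10 + c for c in range(8)] for r in range(6)]
mem_array = [[0] * 4 for _ in range(4)]

file_sel = hyperslab(start=(2, 3), stride=(1, 1), count=(2, 3), block=(1, 1))
mem_sel = hyperslab(start=(0, 0), stride=(1, 1), count=(2, 3), block=(1, 1))
transfer(file_array, file_sel, mem_array, mem_sel)

for row in mem_array:
    print(row)
```

The two selections may differ in shape and position. The only check is that they hold the same number of elements, which matches the rule stated on the slide.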
Attributes • Named pieces of data • Stored in a dataset or group header • Operations are scaled-down versions of the dataset operations • Not extendible • No compression • No partial I/O
Property list • Properties of objects or operations • Describe how to create, store, access and transfer data
Some Properties • chunked: better subsetting access time; extendable • compressed: improves storage efficiency, transmission speed • extendable: datasets can be extended in any direction • split file: metadata in one file, raw data in another • [Figure: dataset “Fred” split across File A and File B, with the metadata for Fred in one file and the data for Fred in the other]
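Why chunking can improve subsetting access time is easiest to see by counting the chunks a selection overlaps. The sketch below is plain arithmetic, not library code. The dataset size, chunk shapes, and helper names are hypothetical.

```python
from itertools import product
from math import prod


def chunks_touched(start, count, chunk_shape):
    """Chunk-grid coordinates overlapped by a contiguous selection."""
    ranges = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c             # chunk holding the first selected index
        last = (s + n - 1) // c    # chunk holding the last selected index
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


def elements_read(start, count, chunk_shape):
    """Elements moved if every overlapped chunk must be read whole."""
    return len(chunks_touched(start, count, chunk_shape)) * prod(chunk_shape)


# Hypothetical 1000 x 1000 dataset. Wanted: a strip of 10 columns,
# all 1000 rows, which is 10,000 elements.
start, count = (0, 250), (1000, 10)

tiled = elements_read(start, count, (100, 100))     # square chunks
row_major = elements_read(start, count, (1, 1000))  # one chunk per row

print(tiled, row_major)
```

Under the assumption that whole chunks are read, the square chunks move a tenth of the data that the row-by-row layout moves for this column strip. A row-oriented selection would favor the other layout, so the chunk shape is best chosen to match the expected access pattern.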
Dataset components • Dataset = Metadata + Data • Dataspace: Rank=2, Dim_1=5, Dim_2=4, Dim_3=2 • Datatype: int16 • Storage properties: chunked; compressed • Attributes: time = 32.4, pressure = 987, temp = 56
Groups • Structures for organizing the file • Like Vgroups in HDF4 • Like directories in hierarchical file system • Every file starts with a root group • Groups have attributes
Groups “root” • A mechanism for collections of related objects • Every file starts with a root group • Can have attributes • Like directories in Unix, but a graph, rather than a tree
Groups • Groups and members of groups can be shared • [Figure: a root group whose descendant groups share members]
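The point that groups form a graph rather than a tree can be sketched with dictionaries of links. This is a conceptual model only. The `Group` class, the `resolve` helper, and the member names are hypothetical, and a Python list stands in for a dataset.

```python
class Group:
    """A named collection of links to other objects (groups or datasets)."""

    def __init__(self):
        self.members = {}     # link name -> object
        self.attributes = {}  # groups can carry attributes too

    def link(self, name, obj):
        """Add a link under this group and return the linked object."""
        self.members[name] = obj
        return obj


def resolve(root, path):
    """Follow a slash-separated path of link names starting at root."""
    node = root
    for part in path.strip("/").split("/"):
        if part:
            node = node.members[part]
    return node


root = Group()
a = root.link("a", Group())
b = root.link("b", Group())

temps = a.link("temps", [3.1, 4.2, 3.6])  # a list stands in for a dataset
b.link("shared_temps", temps)             # second link to the same object
b.link("top", root)                       # a link back up: a cycle

same = resolve(root, "/a/temps") is resolve(root, "/b/shared_temps")
print(same, resolve(root, "/b/top/a/temps"))
```

Because an object can be reached by more than one path, a tool that walks the structure needs to remember which objects it has visited, which a pure directory tree would not require.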
Mounting • [Figure: File A and File B, each with its own root group, joined by mounting]
Reading & writing with HDF5 • Set properties • Describe the data • datatypes • rank and dimensions • mapping between file and memory • Read/write
Files needn’t be files: Virtual File Layer • VFL: a public API for writing I/O drivers • “File” handle (hid_t) • VFL: Virtual File I/O Layer • I/O drivers: stdio, mpio, memory, network • “Storage”: files, memory, network
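The layering above amounts to code written against one small driver interface, with the storage behind it swappable. The sketch below shows that pattern in Python and is not the VFL API. The `Driver` interface, both driver classes, and `store_and_fetch` are hypothetical names.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical minimal driver interface: byte-addressed read and write."""

    @abstractmethod
    def write(self, offset, data): ...

    @abstractmethod
    def read(self, offset, size): ...


class MemoryDriver(Driver):
    """Keeps the whole "file" in a growable in-memory buffer."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            self.buf.extend(b"\0" * (end - len(self.buf)))  # zero-fill the gap
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])


class FileDriver(Driver):
    """Stores the bytes in an ordinary file on disk."""

    def __init__(self, path):
        self.f = open(path, "w+b")

    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)

    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()


def store_and_fetch(driver):
    """Library-side code: identical no matter which driver is plugged in."""
    driver.write(8, b"HDF5 data")
    return driver.read(8, 9)


from_memory = store_and_fetch(MemoryDriver())
with tempfile.TemporaryDirectory() as tmp:
    file_driver = FileDriver(os.path.join(tmp, "demo.bin"))
    try:
        from_file = store_and_fetch(file_driver)
    finally:
        file_driver.close()

print(from_memory == from_file)
```

A network or parallel driver would slot in the same way, by implementing the same two operations.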
HDF5 tools • Current • hdf5ls - lists contents of HDF5 file • h5dumper - higher level view • HDF5-to-HDF4 converter • Future • Convert HDF5 to/from ASCII, binary, GIF, etc. • Convert HDF4 to HDF5 • Java tools - VisAD, etc. • File/code generation from DDL description • Talking to vendors
Other HDF5 activities • Performance tuning • Object model • Fortran and C++ API • Thread-safe HDF5
HDF4 vs. HDF5 • HDF4 • Original format and library • Compatible with all earlier versions • 6 primary objects: multidim array of scalars, raster image, palette, table, annotation, group • Biggest current user: Earth Observing System Data and Info System (EOSDIS) • HDF5 - successor to HDF4 • New format and library • Not compatible with earlier versions • 2 primary objects: multidim. array of records, group • Biggest current user: Accelerated Strategic Computing Initiative (ASCI)
HDF4 object types can be derived from HDF5 datasets and groups • HDF5 group: HDF4 Vgroup • HDF5 dataset: HDF4 SDS (n-dim array of scalars), HDF4 Vdata (1-dim array of records), HDF4 24-bit raster and HDF4 8-bit raster (2-dim array of multi-component scalars) • [Figure: sample table, array, and annotation contents omitted]