HDF5 Datasets and I/O
Dataset storage and its effect on performance
HDF5 Workshop at PSI

Outline
• Dataset metadata and array data storage layouts
• Types of dataset storage layouts
• Factors affecting I/O performance
• I/O with compact datasets
• I/O with contiguous datasets
• I/O with chunked datasets
• Variable length data and I/O

HDF5 Layers
[Figure: data path from the application buffer in the HDF5 application, through the HDF5 Object Layer (API) where H5Dwrite is called, through the HDF5 internals where data is prepared for I/O, to the VFD layer where the SEC2 driver performs I/O on the HDF5 file.]

Goal of this talk
• Present what happens to data inside the HDF5 library
• Show how an application can control the HDF5 library's behavior
• Specifically:
  • Describe some basic operations and data structures, and explain how they affect performance and storage sizes
  • Give some “recipes” for improving performance

HDF5 dataset metadata

HDF5 Dataset
• Data array
  • Also called raw data
• Metadata
  • Dataspace
    • Rank, dimensions of dataset array
  • Datatype
    • Information on how to interpret data
  • Storage properties
    • How array is organized on disk
  • Attributes
    • User-defined metadata (optional)

HDF5 dataset components
[Figure: a dataset consists of a dataset header (metadata) and a dataset data array (raw data).]
Dataset header contents shown in the figure:
• Dataspace: Rank = 3; Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
• Datatype: IEEE 32-bit float
• Attributes: Time = 32.4, Pressure = 987, Temp = 56
• Storage info: Chunked, Compressed

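As a quick check on the example in the figure, the uncompressed size of the data array follows from the dataspace and datatype alone. The sketch below uses only the Python standard library; the function name `raw_data_bytes` is made up for illustration and is not part of HDF5.

```python
import math
import struct

def raw_data_bytes(dims, element_size):
    """Bytes needed for the uncompressed data array of a dataset."""
    return math.prod(dims) * element_size

# Dataspace from the slide: rank 3, dimensions 4 x 5 x 7.
dims = (4, 5, 7)
# Datatype from the slide: IEEE 32-bit float, i.e. 4 bytes per element.
float32_size = struct.calcsize("<f")

print(raw_data_bytes(dims, float32_size))  # 140 elements * 4 bytes = 560
```

The size on disk may differ from this figure, since the slide's dataset is chunked and compressed.
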
HDF5 metadata
• HDF5 metadata
  • Information about HDF5 objects used by the HDF5 library
  • Examples: object headers, B-tree nodes for groups, B-tree nodes for chunks, heaps, super-block, etc.
  • Usually small compared to raw data sizes (KB vs. MB-GB)

HDF5 metadata cache
[Figure: application memory holds the dataset array data and the metadata cache (MDC), which contains the dataset header; the HDF5 file holds HDF5 metadata and dataset array data.]
• Dataset header resides in MDC. MDC is handled by HDF5 library.
• Metadata is mixed with raw data in HDF5 file.

HDF5 metadata cache
• Metadata cache
  • Space allocated to handle pieces of the HDF5 metadata
  • Allocated by the HDF5 library in application’s memory space
  • Allocated per file; released when file is closed
• Metadata cache behavior affects overall performance
• Metadata cache implementation prior to HDF5 1.6.5 could cause performance degradation for some applications

HDF5 dataset storage layouts

HDF5 dataset storage layouts
• Contiguous
• External
• Chunked
• Compact

Contiguous storage layout
• Contiguous storage layout is the default storage layout for an HDF5 dataset
• Dataset raw data is stored in one contiguous block in the HDF5 file

Contiguous storage layout
[Figure: application memory holds the dataset array data and the metadata cache (MDC) with the dataset header; the HDF5 file holds the dataset header and the dataset array data.]
Raw data is stored in one contiguous block in HDF5 file.

External storage layout
• Dataset raw data is stored in one or more external files that should be kept together with the HDF5 file
• Layout in the external file is specified by the application
• An easy way to make legacy data available to the HDF5 library

External storage layout
[Figure: application memory holds the dataset array data and the metadata cache (MDC) with the dataset header; the HDF5 file holds the dataset header; a separate Unix/Windows file holds the raw data.]
Metadata is stored in HDF5 file. Raw data is stored in a separate file as specified by application.

Chunked storage layout
• Chunking: storage layout where a dataset is partitioned into fixed-size multi-dimensional tiles, or chunks
• Each chunk is stored as a contiguous block
• HDF5 library treats each chunk as an atomic object for I/O
• Greatly affects performance and file sizes
• Use for extendible datasets and datasets with filters applied (checksum, compression)
• Use for sub-setting of big datasets

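Chunked storage layout
[Figure: in application memory, the dataset array data is divided into chunks A, B, C, and D, and the metadata cache (MDC) holds the dataset header and the chunk index; in the HDF5 file, the dataset header, the chunk index, and chunks A-D are stored as separate blocks.]
Raw data is stored in separate chunks in HDF5 file.
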
Compact storage layout
• Raw data is stored in the dataset object header
• Raw data is read/written with the header
• Use for small (a few KB) datasets to minimize small I/O operations

Compact storage layout
[Figure: application memory holds the dataset array data and the metadata cache (MDC) with the dataset header; in the HDF5 file, the dataset array data sits inside the dataset header.]
Raw data is stored in a dataset object header.

Factors affecting I/O performance

HDF5 data structures
• Data structures used by HDF5 library
  • B-trees (groups, dataset chunks)
  • Hash tables
  • Local and global heaps (variable length data: link names, strings, etc.)
• Other concepts
  • HDF5 metadata cache
  • HDF5 chunk cache
  • Free space management data structure
  • Etc.

Operations on data inside HDF5 library
• Copying to/from internal buffers
• Datatype conversion, e.g.,
  • Float to integer
  • Little-endian to big-endian
  • 64-bit integer to 16-bit integer
• Variable-length data conversion from memory to file
• Scattering and gathering
  • Data is scattered/gathered from/to application buffers into internal buffers for datatype conversion and partial I/O

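The three conversions named above can be illustrated with Python's `struct` module. This is only a sketch of what each conversion means for the bytes involved, not how the HDF5 library implements them; the helper `fits_int16` is a made-up name, and how HDF5 treats out-of-range values is not covered here.

```python
import struct

# Little-endian to big-endian: same value, reversed byte order.
value = 1025
little = struct.pack("<i", value)                        # 4-byte little-endian
big = struct.pack(">i", struct.unpack("<i", little)[0])  # 4-byte big-endian
assert big == little[::-1]

# Float to integer: the fractional part is lost.
as_int = int(32.4)

# 64-bit integer to 16-bit integer: only values in [-32768, 32767] fit.
def fits_int16(n):
    """Return True if n can be stored in a signed 16-bit integer."""
    try:
        struct.pack("<h", n)
        return True
    except struct.error:
        return False

print(as_int, fits_int16(987), fits_int16(70000))
```

Each of these touches every element, which is why the conversion step shows up as a cost in the I/O path.
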
Operations on data inside HDF5 library
• Data transformation (filters, compression)
  • Checksum on raw data and metadata
  • Algebraic transform
  • GZIP and SZIP compression
  • HDF5 and user-defined data transformations

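To make the idea of filters concrete, here is a minimal sketch of a compress-then-checksum transformation and its inverse, using Python's `zlib`. Assumptions: CRC-32 stands in for whichever checksum HDF5 applies, the order of the two steps is chosen for illustration, and the function names `apply_filters` and `reverse_filters` are hypothetical.

```python
import zlib

def apply_filters(raw: bytes):
    """Compress a buffer, then checksum the compressed bytes."""
    compressed = zlib.compress(raw, 6)
    checksum = zlib.crc32(compressed)  # stand-in checksum, see lead-in
    return compressed, checksum

def reverse_filters(compressed: bytes, checksum: int) -> bytes:
    """Verify the checksum, then decompress."""
    if zlib.crc32(compressed) != checksum:
        raise ValueError("checksum mismatch")
    return zlib.decompress(compressed)

raw = bytes(4096)  # 4096 zero bytes, highly compressible
stored, check = apply_filters(raw)
assert reverse_filters(stored, check) == raw
assert len(stored) < len(raw)
```

The round trip shows the trade-off: less data goes to the file, but every read and write pays for the transformation.
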
I/O performance
• I/O performance depends on many factors
  • Storage layouts
  • Dataset storage properties
  • Chunking strategy
  • Metadata cache performance
  • Datatype conversion performance
  • Other filters, such as compression
  • Access patterns

I/O with different storage layouts

Writing compact dataset

Writing compact dataset
[Figure: the dataset array data moves from application memory into the dataset header in the metadata cache (MDC), and the header is then written to the HDF5 file.]
Raw data is written when object header is written.

Writing contiguous dataset

Writing contiguous dataset
[Figure: the dataset array data is written from application memory to the HDF5 file; the dataset header is held in the metadata cache (MDC) and written to the file separately.]
Raw data is written first. The header is written when flushed to file (H5Dclose, H5Fflush, or MDC flush done by the HDF5 library).

Writing contiguous dataset with conversion
[Figure: the dataset array data passes from application memory through a 1MB conversion buffer on its way to the HDF5 file; the dataset header is held in the metadata cache (MDC).]
Raw data goes through conversion buffer. The header is written when flushed to file (H5Dclose, H5Fflush, or MDC flush done by HDF5 library).

Partial I/O for contiguous dataset

Sub-setting of contiguous dataset: series of adjacent rows
[Figure: a subset of M complete rows of N elements each, shown in application memory and in the HDF5 file.]
One I/O operation. Subset is contiguous in file.

Sub-setting of contiguous dataset: adjacent, partial rows
[Figure: a subset of M partial rows of N elements each, shown in application memory and in the HDF5 file.]
Several I/O operations. Subset is in M contiguous blocks in file.

Sub-setting of contiguous dataset: extreme case, writing a column
[Figure: a column of M rows in application memory is written to the HDF5 file one element at a time.]
Several small I/O operations. Subset data is scattered in a file in M different locations.

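The three sub-setting cases above differ only in how many contiguous file blocks the selection maps to. The following sketch counts them for a rectangular selection from a row-major two-dimensional dataset; the function name `contiguous_blocks` and the example sizes are made up for illustration, and the count assumes no sieve buffer or other coalescing.

```python
def contiguous_blocks(n_cols, sel_rows, sel_cols):
    """Return (number of blocks, elements per block) for a rectangular
    selection of sel_rows x sel_cols from a row-major dataset whose rows
    hold n_cols elements each."""
    if sel_cols == n_cols:
        # Complete rows follow each other directly in the file.
        return 1, sel_rows * sel_cols
    # Partial rows: each selected row is its own block.
    return sel_rows, sel_cols

# Dataset rows of 50 elements; select 100 rows in three ways.
print(contiguous_blocks(50, 100, 50))  # adjacent full rows: (1, 5000)
print(contiguous_blocks(50, 100, 20))  # partial rows: (100, 20)
print(contiguous_blocks(50, 100, 1))   # a single column: (100, 1)
```

The column case produces as many blocks as rows, each one element long, which is the worst access pattern for contiguous storage.
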
Sub-setting of contiguous dataset: data sieve buffer
[Figure: the M single elements of a column are gathered from application memory into a sieve buffer, which is then written to the HDF5 file.]
Data is copied to a sieve buffer in memory (64K) with memcopy. One write operation.

Performance tuning for contiguous dataset
• Datatype conversion
  • Avoid for better performance
  • Use the H5Pset_buffer function to customize conversion buffer size
• Partial I/O
  • Write/read in big contiguous blocks
  • Use H5Pset_sieve_buf_size to improve performance for complex sub-setting
• Caution:
  • Sieve buffer is allocated when the first write occurs and is released when the dataset is closed.
  • Memory will grow if there are a lot of open datasets.

I/O for chunked dataset

Recall: Chunked storage layout
[Figure: in application memory, the dataset array data is divided into chunks A, B, C, and D, and the metadata cache (MDC) holds the dataset header and the chunk index; in the HDF5 file, the dataset header, the chunk index, and chunks A-D are stored as separate blocks.]
Raw data is stored in separate chunks in HDF5 file.

HDF5 chunking
• HDF5 library treats each chunk as an atomic object
  • Compression is applied to each chunk
  • Datatype conversion, other filters applied per chunk
• Chunk size greatly affects performance
  • Chunk overhead adds to file size
  • Chunk processing involves many steps

HDF5 chunk cache
• Chunk cache (general points, details later)
  • Caches chunks for better performance; remains allocated across multiple calls
  • Created for each chunked dataset
  • Size of chunk cache is set per file (default size 1 MB)
  • Each chunked dataset has its own chunk cache
  • A chunk may be too big to fit into the cache
  • Memory may grow if the application keeps opening datasets

HDF5 chunk cache
[Figure: application memory contains the metadata cache (MDC), holding dataset headers and chunking B-tree nodes, and the chunk caches (one per dataset). Figure label: “Default size is 1MB”.]

Writing chunked dataset
[Figure: in application memory space, chunks A, B, and C of a chunked dataset pass through the conversion buffer into the chunk cache; chunks leaving the cache go through the filter pipeline into the HDF5 file.]
• Datatype conversion is performed before the chunk is placed in the cache
• Chunk is written when evicted from the cache
• Compression and other filters are applied on eviction

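The write-on-eviction behavior described above can be modeled in a few lines. This is a toy model, not the HDF5 implementation: the class name `ChunkCacheModel` is hypothetical, eviction is simply oldest-first (the slides do not describe the library's replacement policy), `zlib` stands in for the filter pipeline, and a dictionary stands in for the file.

```python
import zlib
from collections import OrderedDict

class ChunkCacheModel:
    """Toy chunk cache: chunks are filtered and written only on eviction."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.chunks = OrderedDict()  # chunk id -> unfiltered bytes in cache
        self.file = {}               # chunk id -> compressed bytes "on disk"

    def _write_to_file(self, chunk_id, data):
        # Filters are applied at this point, not when the chunk is cached.
        self.file[chunk_id] = zlib.compress(data)

    def put(self, chunk_id, data):
        if chunk_id in self.chunks:
            self.used -= len(self.chunks.pop(chunk_id))
        if len(data) > self.capacity:
            # Chunk too big for the cache: it bypasses the cache entirely.
            self._write_to_file(chunk_id, data)
            return
        while self.used + len(data) > self.capacity:
            old_id, old = self.chunks.popitem(last=False)  # evict oldest
            self.used -= len(old)
            self._write_to_file(old_id, old)
        self.chunks[chunk_id] = data
        self.used += len(data)

    def flush(self):
        while self.chunks:
            chunk_id, data = self.chunks.popitem(last=False)
            self._write_to_file(chunk_id, data)
        self.used = 0

cache = ChunkCacheModel(capacity_bytes=2048)
for name in "ABC":
    cache.put(name, bytes(1024))
print(list(cache.file), list(cache.chunks))  # ['A'] ['B', 'C']
```

With room for two chunks, writing a third forces the first out to the file; the other two stay in memory until a flush.
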
Partial I/O for chunked dataset

Partial I/O for chunked dataset
• Example: write the green subset from the dataset, converting the data
• Dataset is stored as six chunks in the file.
• The subset spans four chunks, numbered 1-4 in the figure.
• Hence four chunks must be written to the file.
• But first, the four chunks must be read from the file, to preserve those parts of each chunk that are not to be overwritten.
[Figure: a dataset of six chunks with the subset overlapping chunks 1, 2, 3, and 4.]

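Which chunks a subset touches follows from integer division of the selection bounds by the chunk dimensions. The sketch below works that out; the function name `chunks_spanned` is made up, and the concrete sizes (a 4 x 6 dataset in 2 x 2 chunks) are assumed values chosen to give six chunks, since the figure's actual dimensions are not stated.

```python
from itertools import product

def chunks_spanned(start, count, chunk_dims):
    """Return the chunk grid coordinates touched by a rectangular
    selection given by its start offset and element count per dimension."""
    ranges = []
    for s, c, ch in zip(start, count, chunk_dims):
        first = s // ch            # chunk holding the first selected element
        last = (s + c - 1) // ch   # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))

# Assumed layout: 4 x 6 dataset, 2 x 2 chunks, so a 2 x 3 grid of six chunks.
touched = chunks_spanned(start=(1, 1), count=(2, 2), chunk_dims=(2, 2))
print(touched)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

A 2 x 2 selection that straddles chunk boundaries in both dimensions touches four chunks, so four chunks are read and rewritten for just four elements.
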
Partial I/O for chunked dataset
• For each of the four chunks:
  • Read chunk from file into chunk cache, unless it’s already there.
  • Determine which part of the chunk will be replaced by the selection.
  • Move those elements to conversion buffer and perform conversion
  • Move data elements to write from application buffer to conversion buffer
  • Move those elements back from conversion buffer to chunk cache.
  • Apply filters (compression) when chunk is flushed from chunk cache
• For each element, 3 memcopy operations are performed

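The read-modify-write cycle for a single chunk can be sketched as follows. This is a simplified model under stated assumptions: `zlib` stands in for the filter pipeline, the chunk is treated as a flat byte string, datatype conversion is left out, and the function name `update_chunk` is hypothetical.

```python
import zlib

def update_chunk(stored: bytes, offset: int, new_bytes: bytes) -> bytes:
    """Replace part of a compressed chunk and return the new compressed chunk."""
    chunk = bytearray(zlib.decompress(stored))  # read whole chunk into memory
    end = offset + len(new_bytes)
    if offset < 0 or end > len(chunk):
        raise ValueError("selection falls outside the chunk")
    chunk[offset:end] = new_bytes               # overwrite only the selection
    return zlib.compress(bytes(chunk))          # filter again when flushed

stored = zlib.compress(bytes(16))               # a 16-byte chunk of zeros
updated = update_chunk(stored, 4, b"\x01\x02\x03\x04")
print(zlib.decompress(updated))
```

Even though only four bytes change, the whole chunk is decompressed and recompressed, which is why small updates to large compressed chunks are expensive.
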
Partial I/O for chunked dataset
[Figure: elements move by memcopy between the chunk cache, the conversion buffer, and application memory, 3 memcopy operations in total; the chunk is then compressed and written to the HDF5 file.]

I/O for variable-length dataset

Examples of variable length data
• String
  A[0] “the first string we want to write”
  …
  A[N-1] “the N-th string we want to write”
• Each element is a record of variable length
  A[0] (1,1,0,0,0,5,6,7,8,9) [length = 10]
  A[1] (0,0,110,2005) [length = 4]
  …
  A[N] (1,2,3,4,5,6,7,8,9,10,11,12,….,M) [length = M]

Variable length data in HDF5
• Variable length description in HDF5 application
    typedef struct {
        size_t length;
        void *p;
    } hvl_t;
• Base type can be any HDF5 type
    H5Tvlen_create(base_type)
• ~ 20 bytes overhead for each element
• Data cannot be compressed

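The per-element overhead quoted above adds up quickly when elements are short. The sketch below gives a rough storage estimate; the function name `vlen_storage_estimate` is made up, the 20-byte figure is the slide's approximation rather than an exact value, and heap and other file overheads are ignored.

```python
def vlen_storage_estimate(lengths, base_size, per_element_overhead=20):
    """Rough byte count for a variable-length dataset: payload bytes plus
    a fixed overhead per element."""
    payload = sum(lengths) * base_size
    overhead = per_element_overhead * len(lengths)
    return payload + overhead

# The two records from the earlier example, stored as 4-byte integers.
lengths = [10, 4]
print(vlen_storage_estimate(lengths, base_size=4))  # 14 * 4 + 2 * 20 = 96
```

In this example the overhead (40 bytes) is comparable to the payload (56 bytes), so variable-length storage pays off mainly when elements are long or their lengths vary widely.
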