Learn about HDF5 and its capabilities, how it can address data management challenges, and how it compares with filesystems, XML, and relational databases.
Introduction to HDF5, Session Two: Data Model Comparison and HDF5 File Format
Our Purpose Today: 1) Familiarize you with HDF5 and its capabilities. 2) Help you understand how HDF5 might be applied to your data management challenges.
HDF5 Data Model • HDF5 objects: File, Group, Link, Dataset, Datatype, Dataspace, Attribute
Developing a Project Data Model (figure: a project data model realized either through the HDF5 data model in an HDF5 file or through a relational model in a relational database)
HDF5 / Directories and Files • Both support hierarchies for organizing information (and to some degree, directed graphs)
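A toy Python model can make the analogy concrete. It is a sketch under stated assumptions, not the HDF5 library: the `Group` and `Dataset` classes and the `resolve` helper are hypothetical. Groups act like directories and objects are addressed by slash-separated paths. Because a group stores references rather than copies, one dataset can be reached from two paths, which is why the structure can become a directed graph rather than a strict tree.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for an HDF5 dataset: a name plus some values."""
    name: str
    values: list


@dataclass
class Group:
    """Hypothetical stand-in for an HDF5 group: a directory-like container."""
    members: dict = field(default_factory=dict)

    def link(self, name, obj):
        # Store a reference (not a copy), so one object may be reachable by several paths.
        self.members[name] = obj


def resolve(root, path):
    """Follow a slash-separated path from the root group, like a filesystem lookup."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue  # "/" on its own resolves to the root group
        if not isinstance(node, Group):
            raise KeyError(f"{part!r}: parent is not a group")
        node = node.members[part]
    return node


root = Group()
runs, latest = Group(), Group()
root.link("runs", runs)
root.link("latest", latest)

temps = Dataset("temperature", [21.5, 22.0, 21.8])
runs.link("temperature", temps)
latest.link("temperature", temps)  # a second path to the same dataset

print(resolve(root, "/runs/temperature").values)  # [21.5, 22.0, 21.8]
print(resolve(root, "/runs/temperature") is resolve(root, "/latest/temperature"))  # True
```

Both paths name the same underlying object, much as hard links do; the toy omits everything else, such as link types and attributes.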
HDF5 / XML • Both support rich metadata and allow new types to be defined • HDF5 objects designed for numeric data; XML objects designed for text
HDF5 / Relational Databases • HDF5 supports multi-dimensional arrays with a common datatype in every cell; elements are located by offset • RDBs support rows whose fields have different datatypes; rows are located by primary key
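The difference in how data is located can be sketched in Python with only the standard library. The array half assumes a contiguous row-major layout of 8-byte float64 cells (an illustrative assumption; HDF5 also supports other layouts, such as chunked storage), so a cell's position follows from arithmetic on its indices. The relational half uses an in-memory SQLite table with hypothetical table and column names, where a row is found by its primary key.

```python
import sqlite3

# Array style: one datatype for every cell, stored contiguously in row-major order.
ROWS, COLS = 3, 4
flat = [float(10 * i + j) for i in range(ROWS) for j in range(COLS)]


def element_offset(i, j, cols=COLS):
    """Row-major position of cell (i, j), counted in elements."""
    return i * cols + j


def byte_offset(i, j, cols=COLS, itemsize=8):
    """Same position in bytes, assuming 8-byte float64 cells."""
    return element_offset(i, j, cols) * itemsize


print(flat[element_offset(2, 1)])  # 21.0
print(byte_offset(2, 1))           # 72

# Relational style: fields of different types in each row, located by primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, room TEXT, celsius REAL)")
conn.executemany(
    "INSERT INTO sensor VALUES (?, ?, ?)",
    [(1, "lab", 21.5), (2, "office", 22.0), (3, "hall", 19.8)],
)
row = conn.execute("SELECT room, celsius FROM sensor WHERE id = ?", (2,)).fetchone()
print(row)  # ('office', 22.0)
```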
HDF5 Technology Platform • HDF5 data model: the “building blocks” for data organization and specification • HDF5 software: library, language interfaces, tools • HDF5 file format: bit-level organization of an HDF5 file
HDF5 File Format • Defined by the HDF5 File Format Specification • Specifies the bit-level organization of an HDF5 file on storage media • Maps the data model objects to a linear address space • Other representations of the data model objects are also possible, but those are not the HDF5 format • Self-describing • All the information necessary to read and reconstruct the data model objects is specified by the format • Designed to work well with other technologies • Designed for speed and storage efficiency • Binary format
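What "self-describing" and "maps the data model objects to a linear address space" mean can be sketched with a deliberately tiny, hypothetical format. This is NOT the HDF5 format: the `TOY1` signature and the header layout are invented for illustration. A header records the element type, rank, and dimensions, and the reader reconstructs the array from the bytes alone, locating each part by its offset in the byte sequence.

```python
import struct
from math import prod

MAGIC = b"TOY1"  # hypothetical signature for this toy format (not HDF5's)


def write_array(shape, values, type_code="d"):
    """Lay out a header that describes the array, then the raw little-endian data."""
    if prod(shape) != len(values):
        raise ValueError("shape does not match number of values")
    header = MAGIC + struct.pack("<cI", type_code.encode(), len(shape))
    header += struct.pack(f"<{len(shape)}Q", *shape)
    return header + struct.pack(f"<{len(values)}{type_code}", *values)


def read_array(blob):
    """Rebuild shape and values using only information stored in the bytes."""
    if blob[:4] != MAGIC:
        raise ValueError("not a toy-format file")
    type_code, rank = struct.unpack_from("<cI", blob, 4)
    offset = 4 + struct.calcsize("<cI")          # dimensions follow the fixed header
    shape = struct.unpack_from(f"<{rank}Q", blob, offset)
    offset += struct.calcsize(f"<{rank}Q")       # data follows the dimensions
    values = struct.unpack_from(f"<{prod(shape)}{type_code.decode()}", blob, offset)
    return shape, list(values)


blob = write_array((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(read_array(blob))  # ((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

The byte order is fixed explicitly with `<` so the layout is the same on every platform; the real specification covers far more, including groups, attributes, and datatype descriptions.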
HDF5 File Format Specification (figure: excerpt from the specification's Introduction) • You can have the power of the format without worrying about the details of the specification.
Physical Instantiations: Format (figure: the project data model diagram repeated, highlighting the physical level, i.e. the HDF5 file and the relational database)
HDF5 / Filesystem • Both allow traversal of objects in the hierarchy • Both include internal metadata for fast access to subsets of the data • Both can handle a variety of data • An HDF5 file can easily be migrated or shared
HDF5 / “Binary Flat File” • “Binary Flat File” = a sequence of bytes representing (primarily) numeric data, often written by scientific and engineering applications to save results from simulations or experiments. • A binary flat file is usually the fastest way to write numeric data; read performance varies with access patterns. • Unlike HDF5, binary flat files are neither self-describing nor portable across architectures.
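The portability point can be sketched with Python's `struct` module; the value 21.5 is an arbitrary example. Without a description of the byte order, the same eight bytes decode to different numbers on little-endian and big-endian readers. A self-describing format records that information, so a reader need not guess.

```python
import struct
import sys

value = 21.5  # arbitrary sample reading

native = struct.pack("=d", value)  # whatever byte order this machine uses
little = struct.pack("<d", value)  # explicitly little-endian
big = struct.pack(">d", value)     # explicitly big-endian

print(sys.byteorder)            # 'little' on most current desktop and server CPUs
print(little == big[::-1])      # True: same bytes in opposite order

# A reader that assumes the wrong byte order gets a different number,
# and nothing in a flat file tells it which order was used.
misread = struct.unpack(">d", little)[0]
print(misread == value)         # False
```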
HDF5 / XML • Both HDF5 and XML are self-describing and portable • XML is text-based, and its contents must be accessed sequentially • HDF5 is binary and supports random access and subsetting
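A standard-library Python sketch of the access-pattern difference, assuming fixed-size float64 cells on the binary side and a simple, hypothetical `<readings><t>20.0</t>...</readings>` layout on the XML side. Because every binary cell is 8 bytes, the reader can seek directly to cell 500; in the XML text each value's position depends on the length of everything before it, so the parser has to scan from the start.

```python
import io
import struct
import xml.etree.ElementTree as ET

readings = [20.0 + 0.5 * k for k in range(1000)]

# Binary: fixed-size little-endian float64 cells, so cell k starts at byte 8 * k.
binary = io.BytesIO(struct.pack(f"<{len(readings)}d", *readings))


def read_slice(stream, start, count, itemsize=8):
    """Random access: seek straight to the subset, skipping all earlier bytes."""
    stream.seek(start * itemsize)
    return list(struct.unpack(f"<{count}d", stream.read(count * itemsize)))


# Text: each value is a variable-length string, e.g. <t>20.5</t>.
xml_text = "<readings>" + "".join(f"<t>{v}</t>" for v in readings) + "</readings>"


def read_slice_xml(text, start, count):
    """Sequential access: parse elements in document order until the subset is found."""
    values, position = [], 0
    for _event, elem in ET.iterparse(io.BytesIO(text.encode("utf-8"))):
        if elem.tag != "t":
            continue
        if len(values) == count:
            break
        if position >= start:
            values.append(float(elem.text))
        position += 1
    return values


print(read_slice(binary, 500, 3))        # [270.0, 270.5, 271.0]
print(read_slice_xml(xml_text, 500, 3))  # [270.0, 270.5, 271.0]
```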
HDF5 / PDF • Both HDF5 and PDF formats are published and open • Both can include heterogeneous types of information • PDF is focused on documents • HDF5 is focused on collections of different types, with strong support for multi-dimensional arrays of numeric data • Both are portable across architectures
HDF5 / Relational Databases • RDBs provide access control features; HDF5 does not • RDBs are transaction-based; HDF5 is not • Transactions and logging introduce overhead that may not be needed • HDF5 is not designed for many writers to ‘random’ locations • RDBs provide built-in indices on values • HDF5 provides navigation to datasets and to subsets within datasets • HDF5 files are portable across platforms
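The transaction and index features attributed to relational databases can be sketched with SQLite from Python's standard library, used here only as a representative RDB; the table and index names are hypothetical. The failed transaction is rolled back as a whole, and the range query on `celsius` can be served by the index on values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reading (id INTEGER PRIMARY KEY, celsius REAL NOT NULL)")
conn.execute("CREATE INDEX reading_by_value ON reading (celsius)")  # index on the values
conn.commit()

# A transaction that fails part-way is undone as a whole.
try:
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute("INSERT INTO reading (celsius) VALUES (?)", (21.5,))
        conn.execute("INSERT INTO reading (celsius) VALUES (?)", (None,))  # violates NOT NULL
except sqlite3.IntegrityError:
    pass

count_after_failure = conn.execute("SELECT COUNT(*) FROM reading").fetchone()[0]
print(count_after_failure)  # 0: the first insert was rolled back too

# A transaction that succeeds is committed; the range query can use the index.
with conn:
    conn.executemany("INSERT INTO reading (celsius) VALUES (?)", [(21.5,), (19.0,), (23.2,)])
rows = conn.execute(
    "SELECT celsius FROM reading WHERE celsius > ? ORDER BY celsius", (20.0,)
).fetchall()
print(rows)  # [(21.5,), (23.2,)]
```

This bookkeeping is the overhead the slide mentions; an application with a single writer appending large arrays may not need it.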
Discussion • How could daily temperature measurements made at various locations throughout a building be modeled in different formats: filesystem, binary flat file, XML, PDF, relational database? • What are some pros and cons of each?
Review • HDF5 consists of: a file format (self-describing, with many internal structures to support high performance); software; a data model (file, dataset, datatype, dataspace, attribute, group, link) • HDF5 is designed to support: management of high-volume, complex data; data sharing and preservation
HDF5 Data Model Example: ENSIGHT Automotive Crash Simulation