Cross Disciplinary Applications of Multiplex Observational and Computational Datasets using HDF5 for Archiving and High Performance Processing. Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol, Frank Steinbacher and Markus Aufleger.
Cross Disciplinary Applications of Multiplex Observational and Computational Datasets using HDF5 for Archiving and High Performance Processing. Marcel Ritter, Werner Benger, Joseph Stoeckl, Donna Delparte, Mike Folk, Quincey Koziol, Frank Steinbacher and Markus Aufleger. ASTRO@UIBK, Center for Computation & Technology
Outlook • Motivation • Requirements on a Data Format • Introduction HDF5 • F5 • Introduction • Examples of Data Sets • Application Example: • The Hawaiian Geospatial Data Repository • Conclusion
Motivation: Scientific Collaboration [Diagram: Workgroups A, B, C and D use Software Tool 1, Software Tool 2, Software 3 and Software 4; Software Tool 1 and Software Tool 2 are shown with File Format 1 and File Format 2; the workgroups are linked by Data Exchange]
Motivation • File Format 1, File Format 2, File Format 3, File Format 4, File Format 5, …, File Format N • Direct conversion between every pair of formats: O(N²) converters • Huge implementation effort
Motivation • File Format 1, File Format 2, File Format 3, File Format 4, File Format 5, …, File Format N, each converted to and from a Common Data Format: O(N) converters • Less implementation effort
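The scaling argument on these slides can be checked with a small count. This is only a sketch under stated assumptions: converters are counted as one-directional, and with a common format every file format needs one reader and one writer. The function names are hypothetical.

```python
def pairwise_converters(n_formats: int) -> int:
    """Converters needed when every format is translated directly into
    every other format (assumption: one converter per direction)."""
    return n_formats * (n_formats - 1)


def common_format_converters(n_formats: int) -> int:
    """Converters needed when every format is only translated to and from
    one common format (assumption: one reader plus one writer per format)."""
    return 2 * n_formats


if __name__ == "__main__":
    for n in (3, 5, 10, 50):
        print(n, pairwise_converters(n), common_format_converters(n))
```

The first count grows quadratically with the number of formats and the second only linearly, which is the O(N²) versus O(N) contrast drawn on the slides.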
Motivation • Easier collaboration • More time for science [Diagram: Workgroups A, B, C and D with Software Tool 1, Software Tool 2, Software 3 and Software 4, all connected through a Common Data Format]
Requirements on a Data Format • Easy to read and write • Fast and efficient • Holds huge data sets (terabytes) • Works on multiple operating systems • Holds a huge variety of data • Stores meta information about the data • Self-descriptive • Well-documented, with active support and community • Sustainable (still easily accessible in >10 years)
HDF5 Hierarchical Data Format 5 http://www.hdfgroup.org/HDF5
HDF5 - A Few Analogies • File system (in a file) • Binary XML file • PDF for numerical data • Database (container for array variables)
HDF5 - Relationships [Diagram labels: Group, Relation, Attribute, Dataset] • / • SimOut • Attributes: Parameters = 10;100;1000, Timestep = 36,000 • City A • Dataset:
lat | lon | temp
----|-----|-----
12  | 23  | 3.1
15  | 24  | 4.2
17  | 21  | 3.6
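A minimal sketch, assuming Python with the h5py and numpy packages, of how the kinds of objects named on this slide (group, attribute, dataset) could be created. Storing the table as a dataset named "City A" inside a group "SimOut" and attaching both attributes to that group are assumptions about the diagram, and the file name is hypothetical.

```python
import h5py
import numpy as np

# Compound row type for the lat | lon | temp table shown on the slide.
row_type = np.dtype([("lat", "i4"), ("lon", "i4"), ("temp", "f4")])
rows = np.array([(12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6)], dtype=row_type)

# The "core" driver without a backing store keeps the file in memory,
# so this sketch writes nothing to disk.
with h5py.File("simout_sketch.h5", "w", driver="core", backing_store=False) as f:
    sim = f.create_group("SimOut")             # group (assumed layout)
    sim.attrs["Parameters"] = [10, 100, 1000]  # attribute on the group
    sim.attrs["Timestep"] = 36000              # attribute on the group
    sim.create_dataset("City A", data=rows)    # dataset holding the table

    # Read everything back through the hierarchy, like paths in a file system.
    member_names = list(f["SimOut"].keys())
    parameters = [int(p) for p in f["SimOut"].attrs["Parameters"]]
    timestep = int(f["SimOut"].attrs["Timestep"])
    table = f["SimOut/City A"][()]

temperatures = table["temp"]
```

The file carries its own description: names, types and attributes are all recovered from the file itself, without any external schema.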
HDF5 - What Users Get… • A multi-platform library and tools built on over 10 years of experience in large data handling from the high performance computing (HPC) community. • A capability that: • Lets them organize large and/or complex collections of data • Gives them efficient and scalable data storage and access • Lets them integrate a wide variety of types of data and data sources • Guarantees long-term data integrity and preservation
HDF5 • Shapefiles: HDF5 as container format for pixel data, vector data and attribute data [Screenshot: browser application]
HDF5 - More Applications • Earth Science (Earth Observing System): Terra (CERES, MISR, MODIS, MOPITT), Aqua (6/01) (CERES, MODIS, AMSR), Aura (TES, HIRDLS, MLS, OMI) • Big simulations: billions of elements, dozens of associated values • Flight Testing • Movie Making
HDF5 • More than a ZIP or TAR archive: it also describes the structure of a file's contents • How can different kinds of data sets be stored consistently in HDF5?
F5 Fiber Bundle Data Model http://www.fiberbundle.net
F5 • Based on HDF5 • Inspired by concepts of: • Topology • Differential Geometry • Geometric Algebra • Separation of geometry (Grids) and data (Fields) [Diagram labels: Grid, Field]
F5 • Hierarchical Structure: Fiber Bundle → Time Slice → Grid → Topology → Coordinates → Field [Diagram annotation: "Visible to the end user"]
Fiber: 0D 1D 3D 6D Base: 3D 2D 1D 0D
F5 • Multi Channel – Multi Resolution Images:
Layout: /Time/Grid/Topology/Representation/Field [Datatype]
/1.4/Satellite/VertexRefinement1x1/Cartesian/Positions [uniform-grid]
                                            /RGB [byte,byte,byte]
                                            /N-IR [float64]
                                            /T-IR [float64]
              /VertexRefinement2x2/Cartesian/Positions
                                            /RGB
                                            /N-IR
                                            /T-IR
/1.6/ …
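The five-level layout on this slide (Time, Grid, Topology, Representation, Field) can be sketched as plain path handling. This is an illustration only, not the F5 library's API: the class `F5Path` and the function `parse_f5_path` are hypothetical names, and the component values are taken from the slide.

```python
from typing import NamedTuple


class F5Path(NamedTuple):
    """Hypothetical helper: one HDF5 object path following the layout
    /Time/Grid/Topology/Representation/Field shown on the slide."""
    time: str
    grid: str
    topology: str
    representation: str
    field: str

    def __str__(self) -> str:
        # Join the five components into a slash-separated HDF5-style path.
        return "/" + "/".join(self)


def parse_f5_path(path: str) -> F5Path:
    """Split a path string back into its five named components."""
    parts = path.strip("/").split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 path components, got {len(parts)}: {path!r}")
    return F5Path(*parts)


# All channels of one resolution level share Time, Grid, Topology and
# Representation; only the Field name changes.
field_names = ("Positions", "RGB", "N-IR", "T-IR")
paths = [
    str(F5Path("1.4", "Satellite", "VertexRefinement1x1", "Cartesian", name))
    for name in field_names
]

if __name__ == "__main__":
    for p in paths:
        print(p)
```

Because every data set follows the same five levels, a tool can locate another resolution level or time slice by changing a single path component.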
F5 • Full Waveform LIDAR: [Figure labels: t_emission, t1, t2, t3]
F5 • Full Waveform LIDAR: - Laser Data [Figure labels: t_emission, t1, t2, t3]
Layout: /Time/Grid/Topology/Representation/Field [Datatype]
/CorseTime/LASER/POINTS/CartesianCoords/Positions [point3D]
                                       /TimeStamp [float64]
                                       /Waveform [uint16,uint16]
                                       /Reflectance [float32]
                /SHOTS/SHOTSAsPOINTS/Positions vlen[uint32]
                      /Origin [point3D]
                      /Direction [vector3D]
                      /EmissionTime [float64]
F5 • Full Waveform LIDAR: - Airplane Data
/CorseTime/PLANE/POINTS/CartesianCoords/Positions [point3D]
                                       /Rotation [rotor3D]
                                       /TimeStamps [float64]
F5 • Bringing together in F5: • Satellite data • LIDAR • Shapefiles • Features of HDF5: • Sustainable storage • Metadata • Compression • Parallel I/O • Hyperslab access • Benefits: • Consistent data organization of simple and complex spatio-temporal data • Easy handling of time series • Tools from other disciplines become applicable to the geoscience community, such as the astrophysics image mosaic tool Montage (http://montage.ipac.caltech.edu) for satellite data
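Two of the HDF5 features listed here, compression and hyperslab access, can be sketched as follows, assuming Python with the h5py and numpy packages. The dataset name, array size and chunk shape are made up for illustration, and parallel I/O is not shown.

```python
import h5py
import numpy as np

# Hypothetical single-channel image standing in for satellite data.
image = np.arange(512 * 512, dtype=np.float64).reshape(512, 512)

# In-memory file (core driver, no backing store): nothing is written to disk.
with h5py.File("hyperslab_sketch.h5", "w", driver="core", backing_store=False) as f:
    # Chunked storage with gzip compression: each 64x64 chunk is
    # compressed and stored independently.
    ds = f.create_dataset(
        "channel", data=image, chunks=(64, 64), compression="gzip"
    )

    # Hyperslab access: slicing the dataset reads only the requested
    # rectangular region, not the whole array.
    tile = ds[128:192, 256:320]

    stored_compression = ds.compression
    stored_chunks = ds.chunks
```

Reading a small tile this way keeps memory use proportional to the tile, which is what makes terabyte-sized data sets workable on ordinary machines.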
Application Example HawaiIan Data repoSitory http://www.epscor.hawaii.edu
Hawaiian Data Repository • Goal: Centralized integrative capability to store and manage access to massive (terabytes) research datasets • Users: • Broad statewide research community • University of Hawaii research teams • Mission and Objectives: • Collect, store and manage access to data • Discovery, manipulation, fusion and visualization • Utilize user portals • Utilize and link to the Maui High Performance Computing Center (MHPCC)
Geospatial Information and Mass Storage • How can large, complex datasets be managed and stored?
Conclusion • A common data format eases collaborations and reduces wasted time spent on data conversions • Data formats for sustainable, transparent storage of huge and complex data exist, one just has to use them: HDF5 • F5 captures observational and simulation data consistently. • Geoscience repositories, such as the Hawaiian Data repository, can be built upon this format.
Thank you • References: http://www.hdfgroup.org/HDF5 http://www.fiberbundle.net http://www.epscor.hawaii.edu http://montage.ipac.caltech.edu http://sciviz.cct.lsu.edu http://www.marcel-ritter.com
HDF5 - HDFView screenshot of shapefiles
Geospatial Information and Mass Storage • Weather station data • Marine buoy sensor data • GPS data collection • Database datasets, Excel files • Spatial data - imagery, LiDAR, GIS • Geoweb application services – WMS, WFS, WPC • Database management • Data streaming • Data storage of statewide datasets • Access to HPC services • Real-time modeling and analysis • Upload and download capability • Metadata search capacity • Visualization of spatial and non-spatial datasets
F5 • Grid • Manifold describing the base space • Topology • Refinement level • Coordinate representation • Vertex positions in representation