Integrating HDF5 with SRB: The HDF5-SRB Architecture
Peter Cao, HDF, NCSA
February 24, 2005
Project Description
• Object-level access to HDF5 stored in the SRB
• Use SRB as middleware to transfer data between the server and client
• Interactive and efficient access
• Previous work
  • Extracting entire HDF5 files
  • Extracting byte-streams through the SRB’s POSIX interface
The SRB Architecture
[Figure: SRB architecture diagram. Components shown: SRB Client, MCAT, SRB Server, and the storage back ends HPSS, Unitree, DB2, ObjStore, HDF5, FTP.]
Distributed Storage Resources: database system, archival storage system, file system, ftp
The HDF5-SRB Architecture
[Figure: HDF5-SRB architecture diagram. Labels shown: HDF Application, MCAT, SRB Server, HDF5 Library, HDF5 file. The labels "HDF5 Object (File, Group, Dataset, Attribute)" and "HDF5-SRB Module (unpackMsg/packMsg)" each appear twice.]
The HDF5-SRB Module
Client API: srbObjRequest(void *obj, int objID)
Server API: srbObjProcess(void *obj, int objID)
[Figure: request flow involving the SRB Server, the HDF5 Library, and the HDF5 file, with numbered steps:]
1. packMsg()
2. unpackMsg()
3. H5Obj::op()
4. Access file
5. H5Object
6. packMsg()
7. unpackMsg()
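The seven numbered steps can be traced in a small single-process sketch. Only the names srbObjRequest, srbObjProcess, packMsg and unpackMsg and the (void *obj, int objID) signatures come from the slide. Everything else is an assumption made for illustration: the DatasetSketch struct, the OBJ_DATASET id, the Msg buffer that stands in for SRB transport, the reading that steps 1 and 7 run on the client and steps 2 to 6 on the server, and the hard-coded result that replaces the real HDF5 library call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { OBJ_DATASET = 1 };            /* hypothetical objID value */

/* Hypothetical object; the slides do not give the real layout. */
typedef struct {
    char     path[64];               /* set by the client */
    uint32_t rank;                   /* filled in by the server */
    uint32_t dims[4];                /* filled in by the server */
} DatasetSketch;

/* Stand-in for the bytes SRB would carry between client and server. */
typedef struct {
    unsigned char bytes[128];
    size_t        len;
} Msg;

/* Steps 1 and 6: flatten the object into a message.
 * A raw memcpy is only valid within one process; a real module
 * would need a portable wire encoding. */
static void packMsg(const DatasetSketch *d, Msg *m) {
    memcpy(m->bytes, d, sizeof *d);
    m->len = sizeof *d;
}

/* Steps 2 and 7: rebuild the object from a message. */
static int unpackMsg(const Msg *m, DatasetSketch *d) {
    if (m->len != sizeof *d) return -1;
    memcpy(d, m->bytes, sizeof *d);
    return 0;
}

/* Server API named on the slide; the body is an assumption. */
int srbObjProcess(void *obj, int objID) {
    if (objID != OBJ_DATASET) return -1;
    DatasetSketch *d = obj;
    /* Steps 3-5: H5Obj::op() would access the file through the
     * HDF5 library here; this sketch fakes the result. */
    d->rank = 2;
    d->dims[0] = 10;
    d->dims[1] = 20;
    return 0;
}

/* Client API named on the slide; the body is an assumption. */
int srbObjRequest(void *obj, int objID) {
    Msg wire;
    DatasetSketch server_copy;
    packMsg(obj, &wire);                                 /* step 1 */
    if (unpackMsg(&wire, &server_copy) != 0) return -1;  /* step 2 */
    if (srbObjProcess(&server_copy, objID) != 0)         /* steps 3-5 */
        return -1;
    packMsg(&server_copy, &wire);                        /* step 6 */
    return unpackMsg(&wire, obj);                        /* step 7 */
}
```

A caller would zero a DatasetSketch, set its path, and call srbObjRequest(&d, OBJ_DATASET); on success the rank and dims fields hold what the server side filled in.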
Implementation Requirements
• Object fashion
  • Interactive access
  • Data information encapsulated in structure
  • Easy mapping to objects in client application
• Simple and efficient
  • No complicated packMsg()/unpackMsg()
  • Use one set of objects for both server and client
• Minimum data to transfer between client and server
  • Pack only required data
  • No redundant member object within an object
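One way to read "pack only required data" is that client and server share a single struct but each direction serializes only the fields the other side needs. The sketch below illustrates that idea under assumptions: DatasetReq, OpSketch, pack_request and pack_reply are hypothetical names, the one-byte length prefixes are an arbitrary choice, and values are copied in host byte order, which a real client/server pair could not rely on.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef enum { OP_OPEN = 0, OP_READ = 1 } OpSketch;  /* hypothetical */

/* One struct shared by client and server (hypothetical layout). */
typedef struct {
    OpSketch op;
    char     path[32];    /* needed by the server only */
    uint32_t nvalues;     /* needed by the client only */
    int32_t  values[8];   /* needed by the client only */
} DatasetReq;

/* Append n bytes at offset off and return the new offset. */
static size_t put(unsigned char *out, size_t off, const void *src, size_t n) {
    memcpy(out + off, src, n);
    return off + n;
}

/* Client to server: operation code and path, nothing else. */
size_t pack_request(const DatasetReq *r, unsigned char *out) {
    uint8_t op   = (uint8_t)r->op;
    uint8_t plen = (uint8_t)strlen(r->path);
    size_t  off  = 0;
    off = put(out, off, &op, 1);
    off = put(out, off, &plen, 1);
    off = put(out, off, r->path, plen);
    return off;
}

/* Server to client: the value count and only the used values. */
size_t pack_reply(const DatasetReq *r, unsigned char *out) {
    uint8_t n   = (uint8_t)r->nvalues;
    size_t  off = 0;
    off = put(out, off, &n, 1);
    off = put(out, off, r->values, n * sizeof r->values[0]);
    return off;
}
```

With path "/d" the request occupies 4 bytes, and a reply carrying three values occupies 13 bytes, both smaller than the full struct.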
HDF5 Objects
[Figure: object diagram split into Client Side and Server Side. Objects shown: H5File, H5Group, H5Dataset, H5Dataspace, H5Attribute, H5Datatype.]
Data operations implemented on the server side
H5Group
typedef struct H5Object_t {
    enum { H5GROUP, H5DATASET } t;   /* selects the valid member of u */
    union {
        struct H5Group   group;
        struct H5Dataset dataset;
    } u;
} H5Object;
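The H5Object typedef is a tagged union: the enum t records which member of u is valid. A minimal sketch of how code might dispatch on the tag follows. The fields inside H5Group and H5Dataset, the union member names group and dataset, and the helper h5object_path are assumptions, since the slide shows only the outer struct.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical member structs; the slide does not list their fields. */
typedef struct H5Group   { char fullpath[64]; int nmembers; } H5Group;
typedef struct H5Dataset { char fullpath[64]; int rank;     } H5Dataset;

typedef struct H5Object_t {
    enum { H5GROUP, H5DATASET } t;   /* tag: which member of u is valid */
    union {
        H5Group   group;             /* valid when t == H5GROUP */
        H5Dataset dataset;           /* valid when t == H5DATASET */
    } u;
} H5Object;

/* Return the object's path, reading only the member selected by the tag. */
const char *h5object_path(const H5Object *o) {
    switch (o->t) {
    case H5GROUP:   return o->u.group.fullpath;
    case H5DATASET: return o->u.dataset.fullpath;
    }
    return NULL;                     /* unknown tag */
}
```

Keeping the tag next to the union lets a group's member list hold groups and datasets in one array while still telling them apart.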
Implementation Challenges
• Efficiency of the packMsg/unpackMsg
• Datatype conversion
  • The client needs to know the datatype from the server
  • The server has to use the client datatype to load data
• Life cycle of object
  • When to close object (dataset, group, file)
  • When to clean memory space
• Byte stream to transfer large raw data
• How to pack/unpack VL/compound data
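The slide leaves open how variable-length (VL) data should be packed. One common convention, shown here purely as an illustration and not as the project's chosen design, is a length-prefixed byte stream with fixed big-endian integers, so that client and server agree on the layout regardless of their native byte order. The function names pack_vl_strings and unpack_vl_string are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write and read a 32-bit value in big-endian order. */
static void put_u32(unsigned char *p, uint32_t v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static uint32_t get_u32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Layout: element count, then (length, bytes) for each element.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t pack_vl_strings(const char **strs, uint32_t n,
                       unsigned char *out, size_t cap) {
    size_t off = 4;
    if (cap < off) return 0;
    put_u32(out, n);
    for (uint32_t i = 0; i < n; i++) {
        uint32_t len = (uint32_t)strlen(strs[i]);
        if (cap - off < 4 + (size_t)len) return 0;
        put_u32(out + off, len);
        memcpy(out + off + 4, strs[i], len);
        off += 4 + len;
    }
    return off;
}

/* Copy element idx into dst as a NUL-terminated string.
 * Returns its length, or -1 on a bad index or malformed input. */
long unpack_vl_string(const unsigned char *in, size_t size, uint32_t idx,
                      char *dst, size_t dstcap) {
    if (size < 4 || idx >= get_u32(in)) return -1;
    size_t off = 4;
    for (uint32_t i = 0; ; i++) {
        if (size - off < 4) return -1;
        uint32_t len = get_u32(in + off);
        off += 4;
        if (size - off < len) return -1;
        if (i == idx) {
            if (dstcap <= len) return -1;
            memcpy(dst, in + off, len);
            dst[len] = '\0';
            return (long)len;
        }
        off += len;
    }
}
```

Packing the three strings "a", "" and "hdf5" this way takes 21 bytes: 4 for the count plus 4 per length prefix plus the 5 bytes of string content.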
Milestones
• Module specifications
  • RFC 02/11/05
  • Tech. seminar 02/24/05
  • Final publication 03/04/05
• Implementation
  • Compile and install test SRB server 03/18/05
  • Client-side module 03/31/05
  • Server-side module 04/22/05
  • Client application 05/20/05
• Testing and merge source with SDSC 07/15/05
• Documentation and release 08/31/05
Further Work
• Metadata Ingest
  • srbObjPut() calls the HDF5 ingest program to put file information into MCAT
• Datacutter
  • Searching and filtering HDF5 data
• HDF5 Indexing
  • Store HDF5 indexing table into MCAT
Questions/Comments?
• [Ruth Aydt] What objects can be packed in the new srbObjRequest() and srbObjProcess() APIs? What are the objIDs, and how are they managed?
• [Jae Alameda] What kind of message is transferred through SRB: objects or string messages?
• [Mike Folk and others] How should a large raw dataset be transferred: as a byte stream or the openDAP-g way?
• [Albert Cheng] How is a complex HDF5 request accomplished: a number of messages vs. one complex message?
• [Elena Pourmal] Are packMsg()/unpackMsg() part of the current SRB, or new functions?
• [Quincey Koziol] When passing objects between client and server, how do we ensure that only the fields needed for the operation are passed?
• [Bob Mcgrath] How is the life cycle of an object managed on the server side? When the client dies, how is the object closed on the server (timeout?)
• [Stuart Levy] Synchronization and locking issues: concurrent access to the file and operations on the file; file cache and physical file location.
• [Quincey Koziol and others] In general, there were a lot of questions about the message protocol, what parts of the structure are optional, etc. I would say we need to document the protocol as completely as we can.
• [Quincey Koziol] How will datatypes of attributes be handled? How will selection from a compound datatype [fields of compound] be done?
• [Joe Futrelle] How does MCAT handle complex queries from HDF5 or other data?
• [Ruth Aydt] How is file access control handled in HDF5 or SRB?
• Elena had some ideas about precomputing some of the messages. Not clear if this is really viable.
• It would be good to add an example that shows the steps of a simple operation, e.g., open dataset.