Milestone 5.1: Initial POSIX Function Shipping Demonstration
Jerome Soumagne, Quincey Koziol
09/24/2013
Overview – Mercury
• Mercury “Function Shipper”: RPC layer that supports
  • Non-blocking transfers
  • Large data arguments (w/RMA)
  • Native transport protocols of HPC systems
• Mercury serves as a basis for higher-level frameworks that need to operate on/store/access data remotely
  • HDF5 IOD virtual object plugin
  • IOFSL I/O forwarding scalability layer
  • Storage systems
  • Analysis frameworks
Overview – Mercury
• Already largely presented in previous milestones
• No major modification of Mercury was needed for this deliverable to support POSIX calls
• But Mercury is still being improved:
  • Performance tuning on an InfiniBand cluster
  • Support for additional network transports is being added (TCP / ibverbs / SSM)
• Paper submitted at end of Q4 now accepted and being presented at IEEE Cluster 2013:
  • J. Soumagne, D. Kimpe, J. Zounmevo, M. Chaarawi, Q. Koziol, A. Afsahi, and R. Ross, “Mercury: Enabling Remote Procedure Call for High-Performance Computing”, IEEE International Conference on Cluster Computing, Sep 2013
Fast Forward Stack – Function Shipping
[Stack diagram; components: HDF5 API, VOL, Native (H5), IOD VOL, Network, VFL, Mercury (Client), Mercury (Server), IOD VOL, …]
POSIX Function Shipping (Example)
[Stack diagram; components: HDF5 API, VOL, Native (H5), IOD VOL, VFL, POSIX I/O, Network, Mercury (Client), Mercury (Server), sec2, POSIX I/O, POSIX I/O, File System]
Mercury POSIX
• Support POSIX I/O routines through Mercury
• Completely separate package built on top of Mercury, called Mercury POSIX (lightweight library + server)
• Design keys:
  • Support 32/64-bit platforms and large files
  • No modification of original source code that uses POSIX I/O (e.g., HDF5 sec2 driver)
  • Redirects I/O to the Mercury server through dynamic linking
  • Can make use of all the transports available through Mercury (although MPI dynamic connection is not very flexible and not always available)
  • Code supporting each POSIX routine is automatically generated inside Mercury POSIX using Boost preprocessor macros
Mercury POSIX – Stub Generation
• Most routines are generated with a one-line macro
• Built on top of existing Mercury/Boost macros
• However, supporting variable-argument routines requires some extra lines to create encoding / decoding routines that check argument flags, etc.
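The variable-argument case can be illustrated with open(), whose third argument (the mode) is only present when O_CREAT is set. The sketch below is not Mercury POSIX code: demo_open_in_t and demo_pack_open_args are hypothetical names, and the example only shows the kind of flag check an encoding routine would need before touching the optional argument.

```c
#include <assert.h>
#include <fcntl.h>      /* O_CREAT */
#include <stdarg.h>     /* va_list, va_start, va_arg, va_end */
#include <stddef.h>     /* NULL */
#include <sys/types.h>  /* mode_t */

/* Hypothetical input structure for a forwarded open() call. */
typedef struct {
    const char *path;
    int         flags;
    int         has_mode;  /* 1 if the optional mode argument was supplied */
    mode_t      mode;      /* only meaningful when has_mode == 1 */
} demo_open_in_t;

/* Collect the arguments of an open()-style call into a structure.
 * The optional mode is read only when O_CREAT is set, because reading
 * a variadic argument that was never passed is undefined behavior. */
void demo_pack_open_args(demo_open_in_t *in, const char *path, int flags, ...)
{
    assert(in != NULL && path != NULL);

    in->path = path;
    in->flags = flags;
    in->has_mode = 0;
    in->mode = 0;

    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        /* mode_t is promoted when passed through "...", so read it as int. */
        in->mode = (mode_t) va_arg(ap, int);
        va_end(ap);
        in->has_mode = 1;
    }
}
```

A decoding routine on the server side would then consult has_mode (or the flags) to decide whether to call open() with two or three arguments.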
Mercury POSIX – Stub Generation
• Two main macros:

    /* Non-bulk routines */
    MERCURY_POSIX_GEN_STUB(func_name, ret_type, in_types, out_types)

    /* Bulk routines */
    MERCURY_POSIX_GEN_BULK_STUB(func_name, ret_type, in_types, out_types,
                                bulk_read) /* 1/0 if reading/writing bulk data */
Mercury POSIX – Stub Generation
• Example, showing results of the following macro:

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate input structure

    typedef struct {
        hg_int32_t in_param_0;
        hg_off_t   in_param_1;
        hg_int32_t in_param_2;
    } lseek_in_t;
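To make the expansion above concrete, the following sketch generates the same kind of input structure with plain C token pasting. It is a simplified stand-in, not the Boost.Preprocessor-based implementation: DEMO_GEN_IN_STRUCT3 is a hypothetical fixed-arity macro, demo_fill_lseek_in is a hypothetical helper, and the hg_* typedefs are assumed here to be fixed-width integers.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */
#include <stdint.h>

/* Assumed stand-ins for the Mercury types used on the slide. */
typedef int32_t hg_int32_t;
typedef int64_t hg_off_t;

/* Hypothetical fixed-arity generator: builds <name>_in_t with three
 * members named in_param_0..in_param_2, mirroring the slide's layout. */
#define DEMO_GEN_IN_STRUCT3(name, t0, t1, t2) \
    typedef struct {                          \
        t0 in_param_0;                        \
        t1 in_param_1;                        \
        t2 in_param_2;                        \
    } name##_in_t;

/* Expands to a typedef named lseek_in_t. */
DEMO_GEN_IN_STRUCT3(lseek, hg_int32_t, hg_off_t, hg_int32_t)

/* Small helper showing the generated type in use. */
void demo_fill_lseek_in(lseek_in_t *in, hg_int32_t fildes, hg_off_t offset,
                        hg_int32_t whence)
{
    assert(in != NULL);
    in->in_param_0 = fildes;
    in->in_param_1 = offset;
    in->in_param_2 = whence;
}
```

A fixed-arity macro like this needs one variant per argument count; iterating over a type sequence such as (hg_int32_t)(hg_off_t)(hg_int32_t) is what a preprocessor library makes possible with a single macro.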
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate proc routine for input structure

    static __inline__ int
    hg_proc_lseek_in_t(hg_proc_t proc, void *data)
    {
        int ret = HG_SUCCESS; /* error checking omitted for brevity */
        lseek_in_t *struct_data = (lseek_in_t *) data;

        hg_proc_hg_int32_t(proc, &struct_data->in_param_0);
        hg_proc_hg_off_t(proc, &struct_data->in_param_1);
        hg_proc_hg_int32_t(proc, &struct_data->in_param_2);

        return ret;
    }
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate output structure

    typedef struct {
        hg_off_t ret;
    } lseek_out_t;
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate proc routine for output structure

    static __inline__ int
    hg_proc_lseek_out_t(hg_proc_t proc, void *data)
    {
        int ret = HG_SUCCESS; /* error checking omitted for brevity */
        lseek_out_t *struct_data = (lseek_out_t *) data;

        hg_proc_hg_int64_t(proc, &struct_data->ret);

        return ret;
    }
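The proc routines above take the same form for every structure: one call per member, in declaration order. A common way to use such routines, familiar from XDR, is to let a single routine serve both encoding and decoding depending on state carried by the proc handle. The sketch below illustrates that idea under stated assumptions: demo_proc_t and every demo_* function are hypothetical, this is not Mercury's hg_proc API, and raw bytes are copied without any byte-order handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { DEMO_ENCODE, DEMO_DECODE } demo_op_t;

/* Hypothetical proc handle: a direction plus a cursor into a buffer. */
typedef struct {
    demo_op_t      op;
    unsigned char *buf;
    size_t         size;
    size_t         pos;
} demo_proc_t;

/* Copy n bytes between *data and the buffer, in the direction given by op.
 * Returns 0 on success, -1 if the buffer has too little room left. */
int demo_proc_raw(demo_proc_t *proc, void *data, size_t n)
{
    assert(proc != NULL && data != NULL);
    if (n > proc->size - proc->pos)
        return -1;
    if (proc->op == DEMO_ENCODE)
        memcpy(proc->buf + proc->pos, data, n);
    else
        memcpy(data, proc->buf + proc->pos, n);
    proc->pos += n;
    return 0;
}

int demo_proc_int32(demo_proc_t *proc, int32_t *v)
{
    return demo_proc_raw(proc, v, sizeof(*v));
}

int demo_proc_int64(demo_proc_t *proc, int64_t *v)
{
    return demo_proc_raw(proc, v, sizeof(*v));
}

/* Same shape as lseek_in_t on the slide, with assumed fixed-width types. */
typedef struct {
    int32_t in_param_0;
    int64_t in_param_1;
    int32_t in_param_2;
} demo_lseek_in_t;

/* One routine handles both directions, member by member. */
int demo_proc_lseek_in(demo_proc_t *proc, void *data)
{
    demo_lseek_in_t *s = (demo_lseek_in_t *) data;

    if (demo_proc_int32(proc, &s->in_param_0) != 0) return -1;
    if (demo_proc_int64(proc, &s->in_param_1) != 0) return -1;
    if (demo_proc_int32(proc, &s->in_param_2) != 0) return -1;
    return 0;
}
```

Because the same member order drives both directions, the encoder and decoder cannot drift apart, which is one reason this pattern suits generated code.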
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate client stub (simplified version)

    hg_off_t
    lseek(hg_int32_t in_param_0, hg_off_t in_param_1, hg_int32_t in_param_2)
    {
        lseek_in_t in_struct;
        lseek_out_t out_struct;
        hg_off_t ret;

        /* Initialization */
        ...
Mercury POSIX – Stub Generation

        /* Register function if not registered */
        MERCURY_REGISTER("lseek", lseek_in_t, lseek_out_t);

        /* Fill input structure */
        in_struct.in_param_0 = in_param_0;
        in_struct.in_param_1 = in_param_1;
        in_struct.in_param_2 = in_param_2;

        /* Forward call to remote addr and get a new request */
        HG_Forward(addr, id, &in_struct, &out_struct, &request);

        /* Wait for call to be executed */
        HG_Wait(request, HG_MAX_IDLE_TIME, &status);

        /* Get output parameters */
        ret = out_struct.ret;

        return ret;
    }
Mercury POSIX – Stub Generation

    /* off_t lseek(int fildes, off_t offset, int whence) */
    MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

• Generate server stub (simplified version)

    static int
    lseek_cb(hg_handle_t handle)
    {
        lseek_in_t in_struct;
        lseek_out_t out_struct;
        hg_int32_t in_param_0;
        hg_off_t in_param_1;
        hg_int32_t in_param_2;
        hg_off_t ret;
Mercury POSIX – Stub Generation

        /* Get input buffer */
        HG_Handler_get_input(handle, &in_struct);

        /* Get parameters */
        in_param_0 = in_struct.in_param_0;
        in_param_1 = in_struct.in_param_1;
        in_param_2 = in_struct.in_param_2;

        /* Call function */
        ret = lseek(in_param_0, in_param_1, in_param_2);

        /* Fill output structure */
        out_struct.ret = ret;

        /* Free handle and send response back */
        HG_Handler_start_output(handle, &out_struct);

        return HG_SUCCESS; /* error checking omitted for brevity */
    }
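The client and server stubs above can be exercised together without a network by letting a direct function call stand in for the forward/wait step. The sketch below does exactly that. All demo_* names are hypothetical, no Mercury API is used, and the "server" runs in the same process; the log line merely imitates the style of output a forwarding server might print.

```c
#include <assert.h>
#include <stddef.h>     /* NULL */
#include <stdint.h>
#include <stdio.h>      /* fprintf */
#include <sys/types.h>  /* off_t */
#include <unistd.h>     /* lseek, SEEK_SET, SEEK_END */

typedef struct {
    int32_t in_param_0;
    int64_t in_param_1;
    int32_t in_param_2;
} demo_lseek_in_t;

typedef struct {
    int64_t ret;
} demo_lseek_out_t;

/* "Server" side: unpack the input, call the real lseek(), fill the output. */
int demo_lseek_cb(const demo_lseek_in_t *in, demo_lseek_out_t *out)
{
    assert(in != NULL && out != NULL);
    fprintf(stderr, "Executing lseek\n");
    out->ret = (int64_t) lseek((int) in->in_param_0,
                               (off_t) in->in_param_1,
                               (int) in->in_param_2);
    return 0;
}

/* Stand-in for the transport: a real client would serialize the input,
 * send it, and wait for the response. Here the callback runs directly. */
int demo_forward(const demo_lseek_in_t *in, demo_lseek_out_t *out)
{
    return demo_lseek_cb(in, out);
}

/* "Client" side: lseek()-like signature, routed through demo_forward(). */
int64_t demo_remote_lseek(int32_t fildes, int64_t offset, int32_t whence)
{
    demo_lseek_in_t in_struct;
    demo_lseek_out_t out_struct;

    /* Fill input structure */
    in_struct.in_param_0 = fildes;
    in_struct.in_param_1 = offset;
    in_struct.in_param_2 = whence;

    if (demo_forward(&in_struct, &out_struct) != 0)
        return -1;

    /* Get output parameters */
    return out_struct.ret;
}
```

Note that a file descriptor is only meaningful in the process that opened it. The loopback works here because client and server share a process; in a forwarded setting the descriptor must have been obtained from the server through a forwarded open call.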
Mercury POSIX • Routines currently supported:
Mercury POSIX • Routines not yet supported:
Mercury POSIX - Configuration
• Environment variables required:
  • MERCURY_NA_PLUGIN: Underlying network transport method used to forward calls to remote servers.
    • e.g., "bmi"
  • MERCURY_PORT_NAME: Port name information (IP/port) specific to the network transport chosen – used to establish a connection with a remote server.
    • e.g., "tcp://72.36.68.242:22222"
  • LD_PRELOAD: Path to Mercury POSIX shared library.
    • e.g., "/usr/local/lib/libmercury_posix.so"
• Setting LD_PRELOAD redirects all POSIX calls to the Mercury server (can be an issue with local scripts, etc. that make use of POSIX I/O)
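A client library has to read these variables at startup. The sketch below shows one way to do that with getenv() and to split a port name of the form shown above into protocol, host, and port. The parsing rules and all demo_* names are assumptions made for illustration; the format actually accepted by each Mercury transport may differ.

```c
#include <assert.h>
#include <stddef.h>  /* NULL */
#include <stdio.h>   /* sscanf */
#include <stdlib.h>  /* getenv */

typedef struct {
    char protocol[16];
    char host[64];
    int  port;
} demo_port_name_t;

/* Parse "proto://host:port". Returns 0 on success, -1 on malformed input. */
int demo_parse_port_name(const char *s, demo_port_name_t *out)
{
    assert(out != NULL);
    if (s == NULL)
        return -1;
    /* %15[^:] reads the protocol, %63[^:] the host, %d the port;
     * the field widths leave room for the terminating NUL. */
    if (sscanf(s, "%15[^:]://%63[^:]:%d",
               out->protocol, out->host, &out->port) != 3)
        return -1;
    if (out->port <= 0 || out->port > 65535)
        return -1;
    return 0;
}

/* Read the two Mercury variables from the environment.
 * Returns 0 only if both are set and the port name parses. */
int demo_read_config(const char **plugin, demo_port_name_t *port_name)
{
    assert(plugin != NULL);
    *plugin = getenv("MERCURY_NA_PLUGIN");
    if (*plugin == NULL)
        return -1;
    return demo_parse_port_name(getenv("MERCURY_PORT_NAME"), port_name);
}
```

Failing early when a variable is missing matters more than usual here: with LD_PRELOAD set, every process started from the shell loads the library, including ones never meant to talk to a server.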
Mercury POSIX - Testing
• Integrated regression tests (limited POSIX test suite)
• HDF5 sec2 driver (demo)
• Lustre POSIX test suite
  • However: framework issues, needs to be modified; possibly need to support fdopen and FILE* routines?
Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ pwd
    ~jsoumagne/demo
    $ ls *.h5
    ls: *.h5: No such file or directory
    $ export MERCURY_NA_PLUGIN="bmi"
    $ export MERCURY_PORT_NAME="tcp://127.0.0.1:22222"
    $ export LD_PRELOAD=/path/to/libmercury_posix.so

Server terminal:

    $ pwd
    ~jsoumagne/demo_server
    $ ls
    coord.h5
    $ mercury_posix_server bmi
    Waiting for client...
Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5dump -H coord.h5
    HDF5 "coord.h5" {
    GROUP "/" {
       DATASET "multiple_ends_dset" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
       }
       DATASET "multiple_ends_dset_chunked" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
       }
       DATASET "single_end_dset" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
    ... (skip)

Server terminal:

    $ mercury_posix_server bmi
    Waiting for client...
    Thu, 19 Sep 13 17:31:00 CDT: Executing open64
    Thu, 19 Sep 13 17:31:00 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:31:00 CDT: Executing getcwd
    ... (skip)
Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5copy -i coord.h5 -s single_end_dset -o coord_simple.h5 -d simple

Server terminal:

    Thu, 19 Sep 13 17:33:51 CDT: Executing open64
    Thu, 19 Sep 13 17:33:51 CDT: Executing open64
    ... (skip)
    Thu, 19 Sep 13 17:33:51 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
    ... (skip)
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_write
    Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_write
    Thu, 19 Sep 13 17:33:51 CDT: Executing close
Demo – Mercury POSIX and HDF5 tools

Client terminal:

    $ h5dump coord_simple.h5
    HDF5 "coord_simple.h5" {
    GROUP "/" {
       DATASET "simple" {
          DATATYPE  H5T_STD_I32LE
          DATASPACE  SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
          DATA {
          (0,0,0,0): 0, 1,
          (0,0,1,0): 1, 2,
          ... (skip)
          (1,2,2,0): 122, 123,
          (1,2,3,0): 123, 124,
          (1,2,4,0): 124, 125,
          (1,2,5,0): 125, 126
          }
       }
    }
    }

Server terminal:

    Thu, 19 Sep 13 17:36:57 CDT: Executing open64
    Thu, 19 Sep 13 17:36:57 CDT: Executing __fxstat64
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    ... (skip)
    Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
    Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
    Thu, 19 Sep 13 17:36:57 CDT: Executing close
Conclusion – Future Work
• Forwarding POSIX I/O calls is very easy and does not require modification of existing tools / code
• Mercury POSIX can be easily extended to support additional system / library calls
• Can directly take advantage of updates to Mercury (network transports, etc.)
• Next Quarter:
  • Support remaining POSIX routines
  • Test with MPI I/O (ROMIO driver)
  • Test with Lustre POSIX test suite
    • If framework issues are solved