
Non-Blocking Collective MPI I/O Routines


Presentation Transcript


  1. Non-Blocking Collective MPI I/O Routines Ticket #273

  2. Introduction • I/O is one of the main bottlenecks in HPC applications. • Many applications and higher-level libraries rely on MPI-I/O for doing parallel I/O. • Several optimizations have been introduced in MPI-I/O to meet the needs of applications: • Non-blocking individual I/O • Different collective I/O algorithms

  3. Motivation • Routines for non-blocking individual I/O operations already exist (MPI_File_i(read/write)(_at)); see the sketch below. • Non-blocking point-to-point (existing) and collective (to be added) communication operations have demonstrated benefits. • Split collective I/O operations have their restrictions and limitations. • What’s keeping us from adding non-blocking collective I/O operations? • Implementation
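For context, a minimal sketch (file handle, buffer, and count are illustrative) of the two mechanisms these bullets refer to: the existing non-blocking individual write, which returns a request, and the split collective write, which must be matched by its _end counterpart and offers no request object to test or wait on.

```c
#include <mpi.h>

void existing_options(MPI_File fh, double *buf, int count)
{
    MPI_Request req;
    MPI_Status  status;

    /* Non-blocking *individual* write: returns immediately, completed later. */
    MPI_File_iwrite(fh, buf, count, MPI_DOUBLE, &req);
    /* ... overlap computation here ... */
    MPI_Wait(&req, &status);

    /* Split *collective* write: at most one pending split collective per
       file handle, and no request handle to test or wait on. */
    MPI_File_write_all_begin(fh, buf, count, MPI_DOUBLE);
    /* ... overlap computation here ... */
    MPI_File_write_all_end(fh, buf, &status);
}
```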

  4. Usecase (I) • HDF5 operations that modify metadata: • Collective, to keep the cache on all processes synchronized. • The metadata cache uses an LRU eviction scheme. • Items at the bottom of the list are evicted in a collective write call to disk. The amount of data written is usually small (< 1KB). • Non-blocking collective I/O would let us fire off those writes and continue with other work, hiding the I/O overhead (sketched below).
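A hedged sketch of how the metadata eviction might use the proposed interface; the entry structure, offsets, and sizes are invented for illustration, and MPI_File_iwrite_at_all is the routine proposed by this ticket.

```c
#include <mpi.h>

/* Hypothetical evicted metadata entry: a small (< 1 KB) blob plus its file offset. */
typedef struct {
    MPI_Offset offset;
    char       bytes[1024];
    int        nbytes;
} md_entry_t;

/* Fire off the collective eviction write and hand the request back so the
   caller can keep working and complete it later with MPI_Test/MPI_Wait. */
void evict_entry(MPI_File fh, const md_entry_t *e, MPI_Request *req)
{
    MPI_File_iwrite_at_all(fh, e->offset, e->bytes, e->nbytes, MPI_BYTE, req);
}
```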

  5. Usecase (II) • HDF5 raw data operations: • Chunking data in the file is a key optimization HDF5 uses for parallel I/O. • If HDF5 can detect a pattern in the way chunks are accessed, it can pre-fetch those chunks from disk. • Asynchronous I/O operations would hide the cost of those pre-fetches (sketched below). • Chunk cache for writes (currently disabled for parallel HDF5): • Similar concept to the metadata cache
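A sketch of the chunk pre-fetch idea under the proposed interface; the chunk size, offset computation, and cache slot are placeholders, and how the access pattern is detected is out of scope here.

```c
#include <mpi.h>

#define CHUNK_BYTES (1 << 20)   /* illustrative 1 MiB chunk */

/* Start fetching the chunk predicted to be needed next; the read completes
   in the background while the current chunk is being processed. */
MPI_Request prefetch_chunk(MPI_File fh, MPI_Offset chunk_index, char *cache_slot)
{
    MPI_Request req;
    MPI_File_iread_at_all(fh, chunk_index * CHUNK_BYTES,
                          cache_slot, CHUNK_BYTES, MPI_BYTE, &req);
    return req;
}
```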

  6. New Routines • MPI_File_iread_all(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Request *req); • MPI_File_iwrite_all(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Request *req); • MPI_File_iread_at_all(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype type, MPI_Request *req); • MPI_File_iwrite_at_all(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype type, MPI_Request *req); • Ordered read/write (add non-blocking versions or deprecate ordered) • Deprecate split collectives (see the usage sketch below) • Straw Vote: 22 - 0 - 0
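A hedged sketch of how the new routines would be used in place of the split collectives they are meant to supersede (error checking omitted, buffer and count illustrative):

```c
#include <mpi.h>

void write_with_overlap(MPI_File fh, double *buf, int count)
{
    MPI_Request req;
    MPI_Status  status;

    /* Old style: split collective, no request handle, at most one pending.
       MPI_File_write_all_begin(fh, buf, count, MPI_DOUBLE);
       ... compute ...
       MPI_File_write_all_end(fh, buf, &status);                           */

    /* Proposed style: a real request that composes with Test/Wait{any,some,all}. */
    MPI_File_iwrite_all(fh, buf, count, MPI_DOUBLE, &req);
    /* ... compute ... */
    MPI_Wait(&req, &status);
}
```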

  7. Challenges • Major difference between collective communication and collective I/O operations: • Each process is allowed to provide a different volume of data to a collective I/O operation, without any knowledge of the data volumes provided by other processes. • Collective I/O algorithms perform aggregation, so these volumes must be discovered at runtime (see the sketch below).
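The point about differing volumes, as a small sketch (the per-rank counts are made up): every rank calls the same collective, but each may legally pass a different count, including zero, with no prior exchange of sizes.

```c
#include <mpi.h>

void uneven_collective_write(MPI_File fh, double *buf)
{
    int         rank;
    MPI_Request req;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Each rank contributes a different amount (even zero) without telling
       the others -- unlike most collective communication operations. */
    int my_count = (rank % 2 == 0) ? 1024 : 0;

    MPI_File_iwrite_all(fh, buf, my_count, MPI_DOUBLE, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);
}
```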

  8. Implementation • Need non-blocking collective communication • Integrate with the progress engine • Test/Wait on the request like any other non-blocking operation (sketched below) • Explicit or implicit progress? • Different collective I/O algorithms
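A sketch of what progress-engine integration means from the user's side: polling the I/O request with MPI_Test between compute steps, so progress can be driven explicitly if the library does not progress the schedule in the background (the do_some_work callback is a placeholder).

```c
#include <mpi.h>

void overlap_io_and_compute(MPI_File fh, double *buf, int count,
                            void (*do_some_work)(void))
{
    MPI_Request req;
    int done = 0;

    MPI_File_iwrite_all(fh, buf, count, MPI_DOUBLE, &req);

    /* With explicit progress, each MPI_Test call gives the library a chance
       to advance the collective I/O schedule; with implicit progress the
       loop simply overlaps computation with background I/O. */
    while (!done) {
        do_some_work();
        MPI_Test(&req, &done, MPI_STATUS_IGNORE);
    }
}
```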

  9. Implementation • A recent implementation was done within an Open MPI-specific I/O library (OMPIO) and uses LibNBC: • It leverages the same concept of a schedule used for non-blocking collective communication operations • Work is still at a preliminary stage, so a large-scale evaluation is not available • Done at the PSTL at the University of Houston (Edgar Gabriel) in collaboration with Torsten Hoefler • Paper accepted at EuroMPI 2011: • Design and Evaluation of Nonblocking Collective I/O Operations

  10. Other MPI I/O Operations • Several MPI I/O functions other than the read/write functions are considered expensive: • Open/Close • Sync • Set view • Set/Get size • It would be valuable to have non-blocking versions of some of those functions too.

  11. Usecases • Applications that open a file but don’t touch it until a certain amount of computation has been done • The cost of opening the file will be hidden • A non-blocking sync would also provide great advantages when we flush data items to disk before going off to compute. • The intention is to hide the cost (whenever possible) of all the expensive MPI I/O operations.

  12. Proposed Routines (usage sketch below) • MPI_File_iopen(MPI_Comm comm, char *filename, int amode, MPI_Info info, MPI_File *fh, MPI_Request *req); • MPI_File_iclose(MPI_File fh, MPI_Request *req); • MPI_File_isync(MPI_File fh, MPI_Request *req); • MPI_File_iset_view(MPI_File fh, MPI_Offset disp, MPI_Datatype etype, MPI_Datatype filetype, char *datarep, MPI_Info info, MPI_Request *req); • MPI_File_iset_size(MPI_File fh, MPI_Offset size, MPI_Request *req); • MPI_File_ipreallocate(MPI_File fh, MPI_Offset size, MPI_Request *req); • MPI_File_iset_info(MPI_File fh, MPI_Info info, MPI_Request *req); • Straw Vote: 15 – 1 – [5(need to think), 1(doesn’t care)]
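A hedged sketch of how these proposed routines (per this slide, not yet part of the standard) might hide the cost of opening and syncing a file; the signatures are taken from the list above, and the computation callback is a placeholder.

```c
#include <mpi.h>

void open_then_compute(MPI_Comm comm, char *filename, MPI_Info info,
                       void (*initial_computation)(void))
{
    MPI_File    fh;
    MPI_Request req;

    /* Proposed routine from this slide (not standard MPI): start opening
       the file, then compute while the open completes. */
    MPI_File_iopen(comm, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                   info, &fh, &req);
    initial_computation();
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    /* ... writes go here ... */

    /* Likewise proposed: flush to disk in the background before more computation. */
    MPI_File_isync(fh, &req);
    initial_computation();
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}
```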

  13. Conclusion • The need for non-blocking collective I/O is fairly high. • Implementation is the hard part. • Performance benefits can be substantial. • Users would also benefit from non-blocking versions of some other MPI I/O operations that are considered fairly time consuming.
