CSCI-4320/6360: Parallel Programming & Computing Tues./Fri. 12-1:30 p.m. MPI File I/O

Prof. Chris Carothers Computer Science Department MRC 309a chrisc@cs.rpi.edu www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2013 Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt.

Presentation Transcript


  1. PPC 2013 - MPI Parallel File I/O CSCI-4320/6360: Parallel Programming & Computing, Tues./Fri. 12-1:30 p.m., MPI File I/O. Prof. Chris Carothers, Computer Science Department, MRC 309a, chrisc@cs.rpi.edu, www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2013. Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt

  2. PPC 2013 - MPI Parallel File I/O Common Ways of Doing I/O in Parallel Programs • Sequential I/O: • All processes send data to rank 0, and 0 writes it to the file

  3. PPC 2013 - MPI Parallel File I/O Pros and Cons of Sequential I/O
      • Pros:
        • parallel machine may support I/O from only one process (e.g., no common file system)
        • some I/O libraries (e.g., HDF-4, NetCDF, PMPIO) are not parallel
        • resulting single file is handy for ftp, mv
        • big blocks improve performance
        • short distance from original, serial code
      • Cons:
        • lack of parallelism limits scalability and performance (single-node bottleneck)

  4. PPC 2013 - MPI Parallel File I/O Another Way
      • Each process writes to a separate file
      • Pros:
        • parallelism, high performance
      • Cons:
        • lots of small files to manage
        • lots of metadata, which stresses the parallel file system
        • difficult to read the data back with a different number of processes

  5. PPC 2013 - MPI Parallel File I/O What is Parallel I/O? • Multiple processes of a parallel program accessing data (reading or writing) from a common file [Figure: processes P0, P1, P2, ..., P(n-1) all accessing one common FILE]

  6. PPC 2013 - MPI Parallel File I/O Why Parallel I/O?
      • Non-parallel I/O is simple, but it gives either
        • poor performance (a single process writes to one file), or
        • results that are awkward and not interoperable with other tools (each process writes a separate file)
      • Parallel I/O
        • provides high performance
        • can provide a single file that can be used with other tools (such as visualization programs)

  7. PPC 2013 - MPI Parallel File I/O Why is MPI a Good Setting for Parallel I/O?
      • Writing is like sending a message and reading is like receiving.
      • Any parallel I/O system will need a mechanism to
        • define collective operations (MPI communicators)
        • define noncontiguous data layout in memory and file (MPI datatypes)
        • test completion of nonblocking operations (MPI request objects)
      • i.e., lots of MPI-like machinery

  8. PPC 2013 - MPI Parallel File I/O MPI-IO Background • Marc Snir et al (IBM Watson) paper exploring MPI as context for parallel I/O (1994) • MPI-IO email discussion group led by J.-P. Prost (IBM) and Bill Nitzberg (NASA), 1994 • MPI-IO group joins MPI Forum in June 1996 • MPI-2 standard released in July 1997 • MPI-IO is Chapter 9 of MPI-2

  9. PPC 2013 - MPI Parallel File I/O Using MPI for Simple I/O • Each process needs to read a chunk of data from a common file [Figure: processes P0, P1, P2, ..., P(n-1) each reading its own chunk of one common FILE]

  10. PPC 2013 - MPI Parallel File I/O Using Individual File Pointers
      #include <stdio.h>
      #include <stdlib.h>
      #include "mpi.h"
      #define FILESIZE 1000   /* total bytes to read across all ranks */
      int main(int argc, char **argv)
      {
          int rank, nprocs;
          MPI_File fh;
          MPI_Status status;
          int bufsize, nints;
          int buf[FILESIZE];   /* larger than needed; each rank fills only nints entries */
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
          bufsize = FILESIZE / nprocs;       /* bytes per rank */
          nints   = bufsize / sizeof(int);   /* ints per rank */
          MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
          /* move this rank's individual file pointer to the start of its chunk */
          MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);
          MPI_File_read(fh, buf, nints, MPI_INT, &status);
          MPI_File_close(&fh);
          MPI_Finalize();
          return 0;
      }

  11. PPC 2013 - MPI Parallel File I/O Using Explicit Offsets
      #include <stdio.h>
      #include <stdlib.h>
      #include "mpi.h"
      #define FILESIZE 1000   /* total bytes to read across all ranks */
      int main(int argc, char **argv)
      {
          int rank, nprocs;
          MPI_File fh;
          MPI_Status status;
          int bufsize, nints;
          int buf[FILESIZE];   /* larger than needed; each rank fills only nints entries */
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
          bufsize = FILESIZE / nprocs;       /* bytes per rank */
          nints   = bufsize / sizeof(int);   /* ints per rank */
          MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
          /* seek and read in one call: the offset is passed explicitly */
          MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);
          MPI_File_close(&fh);
          MPI_Finalize();
          return 0;
      }

  12. PPC 2013 - MPI Parallel File I/O Function Details
      MPI_File_open(MPI_Comm comm, char *file, int mode, MPI_Info info, MPI_File *fh)
        (note: mode = MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY, MPI_MODE_CREATE, MPI_MODE_EXCL, MPI_MODE_DELETE_ON_CLOSE, MPI_MODE_UNIQUE_OPEN, MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)
      MPI_File_close(MPI_File *fh)
      MPI_File_read(MPI_File fh, void *buf, int count, MPI_Datatype type, MPI_Status *status)
      MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf, int count, MPI_Datatype type, MPI_Status *status)
      MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
        (note: whence = MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)
      MPI_File_write(MPI_File fh, void *buf, int count, MPI_Datatype datatype, MPI_Status *status)
      MPI_File_write_at( …same as read_at … )
      (Note: many other functions exist to get/set properties; see Gropp et al.)

  13. PPC 2013 - MPI Parallel File I/O Writing to a File • Use MPI_File_write or MPI_File_write_at • Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open • If the file does not already exist, the flag MPI_MODE_CREATE must also be passed to MPI_File_open • We can pass multiple flags by using bitwise-or ‘|’ in C, or addition ‘+’ in Fortran

  14. PPC 2013 - MPI Parallel File I/O MPI Datatype Interlude
      • Datatypes in MPI
        • Elementary: MPI_INT, MPI_DOUBLE, etc.
          • everything we’ve used to this point
        • Contiguous
          • next easiest: sequences of elementary types
        • Vector
          • sequences separated by a constant “stride”

  15. PPC 2013 - MPI Parallel File I/O MPI Datatypes, cont.
      • Indexed: more general
        • does not assume a constant stride
      • Struct
        • general mixed types (like C structs)

  16. PPC 2013 - MPI Parallel File I/O Creating simple datatypes
      • Let’s just look at the simplest types: contiguous and vector datatypes.
      • Contiguous example
        • Let’s create a new datatype which is two ints side by side. The calling sequence is
      MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);
      MPI_Datatype newtype;
      MPI_Type_contiguous(2, MPI_INT, &newtype);
      MPI_Type_commit(&newtype); /* required; takes the address of the datatype */

  17. PPC 2013 - MPI Parallel File I/O Using File Views • Processes write to shared file • MPI_File_set_view assigns regions of the file to separate processes

  18. PPC 2013 - MPI Parallel File I/O File Views • Specified by a triplet (displacement, etype, and filetype) passed to MPI_File_set_view • displacement = number of bytes to be skipped from the start of the file • etype = basic unit of data access (can be any basic or derived datatype) • filetype = specifies which portion of the file is visible to the process • This is a collective operation: every rank in the group that opened the file must call it, with the same data representation and the same etype extent. The displacement and filetype may differ from rank to rank.

  19. PPC 2013 - MPI Parallel File I/O File Interoperability
      • Users can optionally create files with a portable binary data representation
      • “datarep” parameter to MPI_File_set_view
        • native: the default; same as in memory, not portable
        • internal: implementation-defined representation providing an implementation-defined level of portability
        • external32: a specific representation defined in MPI (basically 32-bit big-endian IEEE format), portable across machines and MPI implementations

  20. PPC 2013 - MPI Parallel File I/O File View Example
      MPI_File thefile;
      /* buf, i, myrank, and BUFSIZE are assumed to be declared earlier */
      for (i = 0; i < BUFSIZE; i++)
          buf[i] = myrank * BUFSIZE + i;
      MPI_File_open(MPI_COMM_WORLD, "testfile", MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &thefile);
      /* the displacement is given in bytes, so scale the int count by sizeof(int) */
      MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int), MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
      MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
      MPI_File_close(&thefile);

  21. PPC 2013 - MPI Parallel File I/O Ways to Write to a Shared File
      • MPI_File_seek: like Unix seek
      • MPI_File_read_at, MPI_File_write_at: combine seek and I/O for thread safety
      • MPI_File_read_shared, MPI_File_write_shared: use the shared file pointer; good when order doesn’t matter
      • Collective operations

  22. PPC 2013 - MPI Parallel File I/O Collective I/O in MPI • A critical optimization in parallel I/O • Allows communication of “big picture” to file system • Framework for 2-phase I/O, in which communication precedes I/O (can use MPI machinery) • Basic idea: build large blocks, so that reads/writes in I/O system will be large [Figure: many small individual requests combined into one large collective access]

  23. PPC 2013 - MPI Parallel File I/O Collective I/O • MPI_File_read_all, MPI_File_read_at_all, etc • _all indicates that all processes in the group specified by the communicator passed to MPI_File_open will call this function • Each process specifies only its own access information -- the argument list is the same as for the non-collective functions

  24. PPC 2013 - MPI Parallel File I/O Collective I/O • By calling the collective I/O functions, the user allows an implementation to optimize the request based on the combined request of all processes • The implementation can merge the requests of different processes and service the merged request efficiently • Particularly effective when the accesses of different processes are noncontiguous and interleaved

  25. PPC 2013 - MPI Parallel File I/O Collective non-contiguous MPI-IO examples
      #include <stdlib.h>
      #include "mpi.h"
      #define FILESIZE 1048576
      #define INTS_PER_BLK 16
      int main(int argc, char **argv)
      {
          int *buf, rank, nprocs, nints, bufsize;
          MPI_File fh;
          MPI_Datatype filetype;
          MPI_Init(&argc, &argv);
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
          bufsize = FILESIZE / nprocs;       /* bytes per rank */
          buf = (int *) malloc(bufsize);
          nints = bufsize / sizeof(int);     /* ints per rank */
          MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
          /* blocks of INTS_PER_BLK ints; a stride of INTS_PER_BLK*nprocs ints interleaves the ranks */
          MPI_Type_vector(nints / INTS_PER_BLK, INTS_PER_BLK, INTS_PER_BLK * nprocs, MPI_INT, &filetype);
          MPI_Type_commit(&filetype);
          /* the displacement (in bytes) skips the first blocks of lower-numbered ranks */
          MPI_File_set_view(fh, INTS_PER_BLK * sizeof(int) * rank, MPI_INT, filetype, "native", MPI_INFO_NULL);
          MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);
          MPI_File_close(&fh);
          MPI_Type_free(&filetype);
          free(buf);
          MPI_Finalize();
          return 0;
      }

  26. PPC 2013 - MPI Parallel File I/O More on MPI_File_read_all • Note that the _all version has the same argument list • The difference is that all processes involved in the MPI_File_open must call the read • Contrast with the non-all version, where any subset of the processes may call it • Allows for many optimizations

  27. PPC 2013 - MPI Parallel File I/O Split Collective I/O
      • A restricted form of nonblocking collective I/O
      • Only one active nonblocking collective operation allowed at a time on a file handle
      • Therefore, no request object necessary
      MPI_File_write_all_begin(fh, buf, count, datatype);
      // available on Blue Gene/L, but may not improve performance
      for (i = 0; i < 1000; i++) {
          /* perform computation */
      }
      MPI_File_write_all_end(fh, buf, &status);

  28. PPC 2013 - MPI Parallel File I/O Passing Hints to the Implementation
      MPI_Info info;
      MPI_Info_create(&info);
      /* no. of I/O devices to be used for file striping */
      MPI_Info_set(info, "striping_factor", "4");
      /* the striping unit in bytes */
      MPI_Info_set(info, "striping_unit", "65536");
      MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile", MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
      MPI_Info_free(&info);

  29. PPC 2013 - MPI Parallel File I/O Examples of Hints (used in ROMIO)
      • MPI-2 predefined hints: striping_unit, striping_factor, cb_buffer_size, cb_nodes
      • New algorithm parameters: ind_rd_buffer_size, ind_wr_buffer_size
      • Platform-specific hints: start_iodevice, pfs_svr_buf, direct_read, direct_write

  30. PPC 2013 - MPI Parallel File I/O I/O Consistency Semantics • The consistency semantics specify the results when multiple processes access a common file and one or more processes write to the file • MPI guarantees stronger consistency semantics if the communicator used to open the file accurately specifies all the processes that are accessing the file, and weaker semantics if not • The user can take steps to ensure consistency when MPI does not automatically do so

  31. PPC 2013 - MPI Parallel File I/O Example 1
      • File opened with MPI_COMM_WORLD. Each process writes to a separate region of the file and reads back only what it wrote.
      Process 0                              Process 1
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_write_at(off=0,cnt=100)       MPI_File_write_at(off=100,cnt=100)
      MPI_File_read_at(off=0,cnt=100)        MPI_File_read_at(off=100,cnt=100)
      • MPI guarantees that the data will be read correctly

  32. PPC 2013 - MPI Parallel File I/O Example 2
      • Same as example 1, except that each process wants to read what the other process wrote (overlapping accesses)
      • In this case, MPI does not guarantee that the data will automatically be read correctly
      Process 0                              Process 1
      /* incorrect program */                /* incorrect program */
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_write_at(off=0,cnt=100)       MPI_File_write_at(off=100,cnt=100)
      MPI_Barrier                            MPI_Barrier
      MPI_File_read_at(off=100,cnt=100)      MPI_File_read_at(off=0,cnt=100)
      • In the above program, the read on each process is not guaranteed to get the data written by the other process!

  33. PPC 2013 - MPI Parallel File I/O Example 2 contd. • The user must take extra steps to ensure correctness • There are three choices: • set atomicity to true • close the file and reopen it • ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process/MPI rank • Can hurt performance….

  34. PPC 2013 - MPI Parallel File I/O Example 2, Option 1: Set atomicity to true
      Process 0                              Process 1
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_set_atomicity(fh1,1)          MPI_File_set_atomicity(fh2,1)
      MPI_File_write_at(off=0,cnt=100)       MPI_File_write_at(off=100,cnt=100)
      MPI_Barrier                            MPI_Barrier
      MPI_File_read_at(off=100,cnt=100)      MPI_File_read_at(off=0,cnt=100)

  35. PPC 2013 - MPI Parallel File I/O Example 2, Option 2: Close and reopen file
      Process 0                              Process 1
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_write_at(off=0,cnt=100)       MPI_File_write_at(off=100,cnt=100)
      MPI_File_close                         MPI_File_close
      MPI_Barrier                            MPI_Barrier
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_read_at(off=100,cnt=100)      MPI_File_read_at(off=0,cnt=100)

  36. PPC 2013 - MPI Parallel File I/O Example 2, Option 3 • Ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process • a sequence is a set of operations between any pair of open, close, or file_sync functions • a write sequence is a sequence in which any of the functions is a write operation

  37. PPC 2013 - MPI Parallel File I/O Example 2, Option 3
      Process 0                              Process 1
      MPI_File_open(MPI_COMM_WORLD,…)        MPI_File_open(MPI_COMM_WORLD,…)
      MPI_File_write_at(off=0,cnt=100)       MPI_File_sync /*collective*/
      MPI_File_sync                          MPI_Barrier
      MPI_Barrier                            MPI_File_sync
      MPI_File_sync /*collective*/           MPI_File_write_at(off=100,cnt=100)
      MPI_File_sync /*collective*/           MPI_File_sync
      MPI_Barrier                            MPI_Barrier
      MPI_File_sync                          MPI_File_sync /*collective*/
      MPI_File_read_at(off=100,cnt=100)      MPI_File_read_at(off=0,cnt=100)
      MPI_File_close                         MPI_File_close

  38. PPC 2013 - MPI Parallel File I/O General Guidelines for Achieving High I/O Performance • Buy sufficient I/O hardware for the machine • Use fast file systems, not NFS-mounted home directories • Do not perform I/O from one process only • Make large requests wherever possible • For noncontiguous requests, use derived datatypes and a single collective I/O call

  39. PPC 2013 - MPI Parallel File I/O Optimizations • Given complete access information, an implementation can perform optimizations such as: • Data Sieving: Read large chunks and extract what is really needed • Collective I/O: Merge requests of different processes into larger requests • Improved prefetching and caching

  40. PPC 2013 - MPI Parallel File I/O Summary • MPI-IO has many features that can help users achieve high performance • The most important of these features are the ability to specify noncontiguous accesses, the collective I/O functions, and the ability to pass hints to the implementation • Users must use the above features! • In particular, when accesses are noncontiguous, users must create derived datatypes, define file views, and use the collective I/O functions
