
Scaling Up Parallel I/O on the SP
David Skinner, NERSC Division, Berkeley Lab


Presentation Transcript


  1. Scaling Up Parallel I/O on the SP
     David Skinner, NERSC Division, Berkeley Lab

  2. Motivation • NERSC uses GPFS for $HOME and $SCRATCH • Local disk filesystems on seaborg (/tmp) are tiny • Growing data sizes and concurrencies often outpace I/O methodologies

  3. Seaborg.nersc.gov

  4. Case Study: Data Intensive Computing at NERSC • Binary black hole collisions • Finite differencing on a 1024x768x768x200 grid • Run on 64 NH2 nodes with 32GB RAM (2 TB total) • Need to save regular snapshots of the full grid. (Image caption: the first full 3D calculation of inward spiraling black holes, done at NERSC by Ed Seidel, Gabrielle Allen, Denis Pollney, and Peter Diener; Scientific American, April 2002.)
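     For scale (a rough estimate, assuming double precision and taking the last grid dimension as time): one grid function on a 1024x768x768 spatial slice is 1024*768*768*8 bytes, roughly 4.8 GB, so a full-grid snapshot of several grid functions runs to tens of gigabytes, and a regular snapshot schedule quickly produces terabytes of output.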

  5. Problems • The binary black hole collision uses a modified version of the Cactus code to solve Einstein’s equations. Its choices for I/O are serial and MPI-I/O • CPU utilization suffers as time is lost to I/O • Variation in write times can be severe

  6. Finding solutions • The data pattern is a common one • Survey I/O strategies to determine the write rate and its variation

  7. Parallel I/O Strategies

  8. Multiple File I/O

     /* optionally switch into a per-task directory (rank_dir is a benchmark helper) */
     if (private_dir) rank_dir(1, rank);
     /* each task writes its own file */
     fp = fopen(fname_r, "w");
     fwrite(data, nbyte, 1, fp);
     fclose(fp);
     /* switch back out of the per-task directory */
     if (private_dir) rank_dir(0, rank);
     MPI_Barrier(MPI_COMM_WORLD);
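     fname_r and rank_dir come from the benchmark code; a minimal, illustrative sketch of how each task might build its per-task filename (the naming scheme here is an assumption, not the slide's actual code):

     char fname_r[256];
     /* e.g. "data.00042" for task 42 */
     snprintf(fname_r, sizeof(fname_r), "data.%05d", rank);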

  9. Single File I/O

     /* all tasks share one file */
     fd = open(fname, O_CREAT | O_RDWR, S_IRUSR);
     /* seek into this task's region of the shared file */
     lseek(fd, (off_t)(rank * nbyte) - 1, SEEK_SET);
     write(fd, data, 1);
     close(fd);
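     A minimal alternative sketch (not from the slides) in which each task writes its entire nbyte block at offset rank*nbyte, using pwrite to combine the seek and the write; variable names follow the snippet above:

     /* write this task's whole block into the shared file at its own offset */
     fd = open(fname, O_CREAT | O_RDWR, S_IRUSR | S_IWUSR);
     pwrite(fd, data, nbyte, (off_t)rank * (off_t)nbyte);
     close(fd);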

  10. MPI-I/O

     MPI_Info_set(mpiio_file_hints, MPIIO_FILE_HINT0);
     MPI_File_open(MPI_COMM_WORLD, fname,
                   MPI_MODE_CREATE | MPI_MODE_RDWR,
                   mpiio_file_hints, &fh);
     /* each task views the shared file starting at its own byte offset */
     MPI_File_set_view(fh, (off_t)rank * (off_t)nbyte,
                       MPI_DOUBLE, MPI_DOUBLE, "native",
                       mpiio_file_hints);
     /* collective write */
     MPI_File_write_all(fh, data, ndata, MPI_DOUBLE, &status);
     MPI_File_close(&fh);
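     The snippet above assumes mpiio_file_hints has already been created and filled in (MPIIO_FILE_HINT0 is a placeholder from the benchmark); a minimal sketch of setting it up with the IBM_largeblock_io hint discussed on slide 14:

     MPI_Info mpiio_file_hints;
     MPI_Info_create(&mpiio_file_hints);
     /* GPFS-specific hint on the SP; see slide 14 */
     MPI_Info_set(mpiio_file_hints, "IBM_largeblock_io", "true");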

  11. Results

  12. Scaling of single file I/O

  13. Scaling of multiple file and MPI I/O

  14. Large block I/O • MPI-I/O on the SP includes the file hint IBM_largeblock_io • IBM_largeblock_io=true is used throughout these tests; with the default value, rates show large variation • IBM_largeblock_io=true also turns off data shipping

  15. Large block I/O = false • MPI-I/O on the SP includes the file hint IBM_largeblock_io • Except for the results shown here, IBM_largeblock_io=true is used throughout • IBM_largeblock_io=true also turns off data shipping

  16. Bottlenecks to scaling • Single file I/O has a tendency to serialize • Scaling up with multiple files creates filesystem problems • Akin to data shipping, consider the intermediate case: aggregate within each SMP node before writing (see the sketch below)
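     A minimal sketch of that intermediate case (node-level aggregation), assuming 16 tasks per seaborg node and reusing data/nbyte from the earlier snippets; the communicator split and file naming here are illustrative, not the benchmark's actual code:

     int rank, tasks_per_node = 16, node_rank, node_size;
     MPI_Comm node_comm;
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
     /* tasks on the same SMP node share a color */
     MPI_Comm_split(MPI_COMM_WORLD, rank / tasks_per_node, rank, &node_comm);
     MPI_Comm_rank(node_comm, &node_rank);
     MPI_Comm_size(node_comm, &node_size);
     /* gather every task's block to the node's first task */
     char *agg = NULL;
     if (node_rank == 0) agg = malloc((size_t)node_size * nbyte);
     MPI_Gather(data, nbyte, MPI_BYTE, agg, nbyte, MPI_BYTE, 0, node_comm);
     /* one writer per node: one file per node instead of one per task */
     if (node_rank == 0) {
         char fname_n[256];
         snprintf(fname_n, sizeof(fname_n), "data.node.%04d", rank / tasks_per_node);
         fp = fopen(fname_n, "w");
         fwrite(agg, nbyte, node_size, fp);
         fclose(fp);
         free(agg);
     }
     MPI_Comm_free(&node_comm);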

  17. Parallel IO with SMP aggregation (32 tasks)

  18. Parallel IO with SMP aggregation (512 tasks)

  19. Summary

  20. Future Work • Testing the NERSC port of NetCDF to MPI-I/O • Comparison with GPFS on Linux/Intel: the NERSC/LBL Alvarez cluster (84 2-way SMP Pentium nodes, Myrinet 2000 fiber optic interconnect) • Testing GUPFS technologies as they become available
