ZioLib, Parallel I/O Library

ZioLib, Parallel I/O Library. Woo-Sun Yang and Chris Ding, Computational Research Division, Lawrence Berkeley National Laboratory. Parallel netCDF write (256 × 256 × 256). Parallel netCDF read (256 × 256 × 256). Height (Z). Latitude (Y). Longitude (X).


Presentation Transcript


  1. ZioLib, Parallel I/O Library. Woo-Sun Yang and Chris Ding, Computational Research Division, Lawrence Berkeley National Laboratory

  2. Parallel netCDF write (256 × 256 × 256)

  3. Parallel netCDF read (256 × 256 × 256)

  4. ZioLib uses I/O staging processors for Z-decomposition
     Figure axes: Height (Z), Latitude (Y), Longitude (X)
     Distributed array in (X,Z,Y) index order
     Remapped at I/O staging PEs in (X,Y,Z) index order
     I/O staging PEs write global field in parallel
     • Relieves memory limitations of a PE
     • Relieves congestion on I/O nodes
     • Writes/reads in large blocks (no seeks) in parallel
     • Eliminates gather/scatter from user codes
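The index-order remap on this slide can be illustrated with a small serial sketch. It is not ZioLib code: the function names (`offset_xyz`, `offset_xzy`, `z_slabs`, `remap_slab`) and the tiny grid sizes are hypothetical. It assumes Fortran-style column-major storage with X varying fastest, and it replaces the MPI communication between compute and staging PEs with plain list indexing. Under those assumptions it shows why a Z-decomposition lets each staging PE write one contiguous block of the (X,Y,Z)-ordered file.

```python
# Serial sketch of the (X,Z,Y) -> (X,Y,Z) remap; all names are hypothetical.
# Assumption: column-major (Fortran-style) storage, X is the fastest index.

def offset_xyz(i, j, k, nx, ny):
    """Linear offset of element (i, j, k) in an (X,Y,Z)-ordered array."""
    return i + nx * (j + ny * k)

def offset_xzy(i, k, j, nx, nz):
    """Linear offset of element (i, k, j) in an (X,Z,Y)-ordered array."""
    return i + nx * (k + nz * j)

def z_slabs(nz, n_staging):
    """Split the Z levels into contiguous [start, stop) ranges, one per staging PE."""
    base, extra = divmod(nz, n_staging)
    slabs, start = [], 0
    for pe in range(n_staging):
        stop = start + base + (1 if pe < extra else 0)
        slabs.append((start, stop))
        start = stop
    return slabs

def remap_slab(field_xzy, nx, ny, nz, k0, k1):
    """Return levels k0..k1-1 of an (X,Z,Y) field, reordered to (X,Y,Z)."""
    return [field_xzy[offset_xzy(i, k, j, nx, nz)]
            for k in range(k0, k1)      # slowest index in the output
            for j in range(ny)
            for i in range(nx)]         # fastest index in the output

nx, ny, nz, n_staging = 4, 3, 5, 2

# Fill the (X,Z,Y) array so each element stores its own (X,Y,Z) file offset.
field_xzy = [None] * (nx * ny * nz)
for j in range(ny):
    for k in range(nz):
        for i in range(nx):
            field_xzy[offset_xzy(i, k, j, nx, nz)] = offset_xyz(i, j, k, nx, ny)

file_image = []
for k0, k1 in z_slabs(nz, n_staging):
    block = remap_slab(field_xzy, nx, ny, nz, k0, k1)
    # Each staging PE's block covers one unbroken run of file offsets.
    assert block == list(range(k0 * nx * ny, k1 * nx * ny))
    file_image.extend(block)
```

Because Z is the slowest-varying index in (X,Y,Z) order, each slab of Z levels maps to a single contiguous range of the file, which is consistent with the "large blocks (no seeks)" point on the slide.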

  5. Current status of ZioLib
     • A set of Fortran 90 modules supporting
       • netCDF I/O (serial and parallel)
       • direct-access unformatted I/O (serial and parallel)
       • sequential-access unformatted I/O (serial)
     • Works for arrays of any number of dimensions of integer*4, real*4 and real*8
     • Reads or writes in any array index order
     • Works with any parallel decomposition
     • Can handle ghost nodes
     • Uses MPI-1 routines only – can still work for serial I/O on machines without a parallel file system, a parallel netCDF library or MPI-2

  6. Direct-access write (256 × 256 × 256; XZY to XYZ)
     [Figure; legend text: transpose global array total remap]

  7. Direct-access write (256 × 256 × 256; XZY to XYZ)
     Speed-up w.r.t. existing MPI + single-PE I/O

  8. More on testing
     • Direct-access I/O with T42L26 resolution (128 × 64 × 26: 1.625 MB)
       • Write: speed up by 3-4
       • Read: speed up by 6-7
     • CAM2.0 history I/O with 8, 16 and 32 processors
       • with EUL (T42L26, Y-decomposition) and FV (B26, 2D-decomposition), load balancing chunking turned off
       • used the serial netCDF with one staging processor
       • speed-up by 1.5-2.5 (with serial netCDF only)
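The 1.625 MB figure quoted for the T42L26 field can be checked with a few lines of arithmetic. The sketch below assumes 8-byte (real*8) values and 1 MB = 2^20 bytes. Neither is stated on the slide, but together they reproduce the quoted size for a 128 × 64 × 26 grid.

```python
# Size of one T42L26 field: 128 longitudes x 64 latitudes x 26 levels.
nlon, nlat, nlev = 128, 64, 26

bytes_per_value = 8      # assumption: real*8 values
bytes_per_mb = 2 ** 20   # assumption: 1 MB = 1,048,576 bytes

n_values = nlon * nlat * nlev
size_bytes = n_values * bytes_per_value
size_mb = size_bytes / bytes_per_mb

print(n_values, size_bytes, size_mb)
```

With 4-byte values the same grid would come to 0.8125 MB, so the quoted size fits the 8-byte assumption.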
