ZioLib, Parallel I/O Library. Woo-Sun Yang and Chris Ding, Computational Research Division, Lawrence Berkeley National Laboratory. Figures: parallel netCDF write (256 × 256 × 256); parallel netCDF read (256 × 256 × 256). Axes: height (Z), latitude (Y), longitude (X).
ZioLib uses I/O staging processors for Z-decomposition
[Figure: a distributed array in (X,Z,Y) index order, with axes height (Z), latitude (Y) and longitude (X), is remapped at the I/O staging PEs into (X,Y,Z) index order; the I/O staging PEs then write the global field in parallel.]
• Relieves memory limitations of a PE
• Relieves congestion on I/O nodes
• Writes/reads in large blocks (no seeks) in parallel
• Eliminates gather/scatter from user codes
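The remap on this slide can be illustrated with a small serial simulation. This is a sketch under stated assumptions, not ZioLib code: the grid sizes, the Y-decomposition of the compute PEs, the PE counts and every function name below are hypothetical, and plain Python lists stand in for the MPI messages. It shows why a Z-decomposition at the staging PEs yields large contiguous blocks: once each staging PE holds its Z-slab in (X,Y,Z) order, the slabs laid end to end already form the global field in file order.

```python
# Toy sizes; all names here are hypothetical and not part of ZioLib.
NX, NY, NZ = 4, 6, 4
N_COMPUTE = 3   # compute PEs, each owning a contiguous block of Y (assumed)
N_STAGING = 2   # I/O staging PEs, each owning a contiguous block of Z


def block_ranges(n, parts):
    """Split range(n) into `parts` contiguous, nearly equal blocks."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def value(x, y, z):
    """Tag each grid point with its offset in the global (X,Y,Z) file order."""
    return x + NX * (y + NY * z)


def local_xzy(ys):
    """One compute PE's data, flattened in Fortran (X,Z,Y) order (X fastest)."""
    return [value(x, y, z) for y in ys for z in range(NZ) for x in range(NX)]


def remap(compute_ranges, staging_ranges):
    """Build each staging PE's Z-slab in (X,Y,Z) order from the compute PEs."""
    local = [local_xzy(ys) for ys in compute_ranges]
    staged = []
    for zs in staging_ranges:
        buf = [None] * (NX * NY * len(zs))
        for ys, data in zip(compute_ranges, local):
            for z in zs:
                for y in ys:
                    for x in range(NX):
                        # Offset in the sender's local (X,Z,Y) array.
                        src = x + NX * (z + NZ * (y - ys.start))
                        # Offset in the staging PE's (X,Y,Z) slab.
                        dst = x + NX * (y + NY * (z - zs.start))
                        buf[dst] = data[src]
        staged.append(buf)
    return staged


compute_ranges = block_ranges(NY, N_COMPUTE)
staging_ranges = block_ranges(NZ, N_STAGING)
staged = remap(compute_ranges, staging_ranges)

# Concatenating the slabs in Z order gives the whole file image.
file_image = [v for buf in staged for v in buf]
```

Because each slab is one contiguous run of the file, a staging PE in this picture needs a single starting offset and one large write, rather than many small strided writes.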
Current status of ZioLib
• A set of Fortran 90 modules supporting
  • netCDF I/O (serial and parallel)
  • direct-access unformatted I/O (serial and parallel)
  • sequential-access unformatted I/O (serial)
• Works for arrays of any number of dimensions of integer*4, real*4 and real*8
• Reads or writes in any array index order
• Works with any parallel decomposition
• Can handle ghost nodes
• Uses MPI-1 routines only; can still work for serial I/O on machines without a parallel file system, a parallel netCDF library or MPI-2
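The slides do not say how ZioLib treats ghost nodes internally. As a general illustration only, the sketch below shows the usual idea: a PE's local array carries a halo of ghost cells copied from its neighbours, and only the interior is part of the global field that gets written. The function name and the uniform halo width are assumptions made for this example.

```python
def strip_ghosts(local, ghost):
    """Return the interior of a 2D list of rows, dropping `ghost` cells on each side."""
    if ghost == 0:
        return [row[:] for row in local]
    return [row[ghost:-ghost] for row in local[ghost:-ghost]]


# A 5 x 6 local array with a one-cell halo: interior cells are 1, ghost cells are 0.
local = [[1 if 0 < i < 4 and 0 < j < 5 else 0 for j in range(6)] for i in range(5)]
interior = strip_ghosts(local, 1)
```

Here `interior` is the 3 x 4 block of owned cells; in a real code this is the part of the local array that would be handed to the I/O layer.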
[Figure: direct-access write (256 × 256 × 256; XZY to XYZ). Plot labels: transpose global array total remap]
[Figure: direct-access write (256 × 256 × 256; XZY to XYZ). Speed-up w.r.t. existing MPI + single-PE I/O]
More on testing
• Direct-access I/O with T42L26 resolution (128 × 64 × 26: 1.625 MB)
  • Write: speed-up by 3-4
  • Read: speed-up by 6-7
• CAM2.0 history I/O with 8, 16 and 32 processors
  • with EUL (T42L26, Y-decomposition) and FV (B26, 2D-decomposition), load balancing chunking turned off
  • used the serial netCDF with one staging processor
  • speed-up by 1.5-2.5 (with serial netCDF only)
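The 1.625 MB figure quoted for the T42L26 field can be checked with a short calculation. The sketch assumes 8-byte (real*8) values and 1 MB = 2**20 bytes, neither of which the slide states; under those assumptions the number matches exactly. The helper name is hypothetical.

```python
BYTES_PER_REAL8 = 8  # assumption: the field is stored as real*8


def field_mb(nx, ny, nz, bytes_per_value=BYTES_PER_REAL8):
    """Size of one nx x ny x nz field in MB, taking 1 MB = 2**20 bytes (assumed)."""
    return nx * ny * nz * bytes_per_value / 2**20


t42l26_mb = field_mb(128, 64, 26)   # T42L26 grid quoted above
print(t42l26_mb)                    # prints 1.625
```

Under the same assumptions, the 256 × 256 × 256 test array used in the timing figures would be 128 MB per field.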