High-Performance Parallel I/O Libraries
(PI) Alok Choudhary, (Co-I) Wei-Keng Liao, Northwestern University
In collaboration with the SEA Group (Group Leader: Rob Ross, ANL)
Parallel NetCDF
[Architecture figure: applications on the compute nodes call Parallel netCDF, which is layered on MPI-IO and the client-side file system and communicates over the network with the I/O servers.]
• NetCDF defines:
  • A portable file format
  • A set of APIs for file access
• Parallel netCDF:
  • New APIs for parallel access
  • Maintains the same file format
• Tasks:
  • Built on top of MPI for portability and high performance
  • Supports C and Fortran interfaces (see the sketch below)
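To make the API shape concrete, here is a minimal sketch of a collective write through the PnetCDF high-level C interface. The file name, dimension sizes, and variable name are illustrative assumptions, and error checking is omitted for brevity.

```c
/* Minimal PnetCDF write sketch: each process writes one row of a
 * global 2-D integer array.  File name, dimension sizes, and the
 * variable name are illustrative only. */
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv) {
    int rank, nprocs, ncid, dimids[2], varid;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* collective file creation over MPI_COMM_WORLD */
    ncmpi_create(MPI_COMM_WORLD, "example.nc", NC_CLOBBER,
                 MPI_INFO_NULL, &ncid);

    /* define an nprocs x 4 integer variable, then leave define mode */
    ncmpi_def_dim(ncid, "rows", (MPI_Offset)nprocs, &dimids[0]);
    ncmpi_def_dim(ncid, "cols", 4, &dimids[1]);
    ncmpi_def_var(ncid, "var", NC_INT, 2, dimids, &varid);
    ncmpi_enddef(ncid);

    /* each rank writes its own row collectively (high-level API) */
    int buf[4] = { rank, rank, rank, rank };
    MPI_Offset start[2] = { rank, 0 };
    MPI_Offset count[2] = { 1, 4 };
    ncmpi_put_vara_int_all(ncid, varid, start, count, buf);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}
```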
Parallel NetCDF - Status
• Version 1.0.1 was released on Dec. 7, 2005
• Web page receives 200 page views a day
• Supported platforms
  • Linux Cluster, IBM SP, BG/L, SGI Origin, Cray X, NEC SX
• Two sets of parallel APIs
  • High-level APIs (mimicking the serial netCDF APIs)
  • Flexible APIs (to utilize MPI derived datatypes; see the sketch below)
• Support for large files (> 2 GB)
• Test suites
  • Self-test codes ported from the Unidata netCDF package to validate against single-process results
• New data analysis APIs
  • Basic statistical functions: min, max, mean, median, variance, deviation
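The flexible APIs accept an MPI derived datatype that describes the in-memory layout, so non-contiguous buffers need no separate packing copy. The sketch below uses PnetCDF's flexible put, ncmpi_put_vara_all; the 6x6 local block with a 4x4 interior (a ghost-cell border) is an illustrative assumption, not a layout prescribed by the library.

```c
/* Flexible-API sketch (assumed layout for illustration): the in-memory
 * buffer is a 6x6 block whose 4x4 interior is the data of interest; an
 * MPI subarray datatype describes the interior so no packing is needed. */
#include <mpi.h>
#include <pnetcdf.h>

void write_interior(int ncid, int varid, int rank, int buf[6][6]) {
    int sizes[2]    = { 6, 6 };
    int subsizes[2] = { 4, 4 };
    int starts[2]   = { 1, 1 };      /* skip the ghost-cell border */
    MPI_Datatype interior;

    MPI_Type_create_subarray(2, sizes, subsizes, starts,
                             MPI_ORDER_C, MPI_INT, &interior);
    MPI_Type_commit(&interior);

    /* place this rank's 4x4 tile in the file; the buftype argument
     * tells PnetCDF how the data is laid out in memory */
    MPI_Offset start[2] = { 4 * rank, 0 };
    MPI_Offset count[2] = { 4, 4 };
    ncmpi_put_vara_all(ncid, varid, start, count, buf, 1, interior);

    MPI_Type_free(&interior);
}
```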
Illustrative PnetCDF Users
• FLASH – astrophysical thermonuclear application from the ASCI/Alliances Center at the University of Chicago
• ACTM – atmospheric chemical transport model, LLNL
• WRF – Weather Research and Forecast modeling system, NCAR
• WRF-ROMS – Regional Ocean Model System I/O module from the Scientific Data Technologies group, NCSA
• ASPECT – data understanding infrastructure, ORNL
• pVTK – parallel visualization toolkit, ORNL
• PETSc – portable, extensible toolkit for scientific computation, ANL
• PRISM – PRogram for Integrated Earth System Modeling; users from C&C Research Laboratories, NEC Europe Ltd.
• ESMF – Earth System Modeling Framework, National Center for Atmospheric Research
• CMAQ – Community Multiscale Air Quality code I/O module, SNL
• More …
PnetCDF Future Work
• Non-blocking I/O
  • Built on top of non-blocking MPI-IO (see the sketch below)
• Improve data type conversion
  • Type conversion while packing non-contiguous buffers
• Data analysis APIs
  • Statistical functions
  • Histogram functions
  • Range queries: regional sum, min, max, mean, …
  • Data transformation: DFT, FFT
• Collaboration with application users
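Non-blocking I/O was still planned work at the time of this slide; the sketch below follows the iput/wait interface that later PnetCDF releases provide (ncmpi_iput_vara_int, ncmpi_wait_all) and should be read as an assumption about the intended usage pattern, not as the API of version 1.0.1.

```c
/* Non-blocking write sketch.  The iput/wait interface shown here is the
 * one later PnetCDF releases ship; at the time of this slide it was still
 * planned work, so treat names and semantics as assumptions. */
#include <mpi.h>
#include <pnetcdf.h>

void post_writes(int ncid, int varid1, int varid2,
                 const MPI_Offset start[], const MPI_Offset count[],
                 const int *buf1, const int *buf2) {
    int req[2], status[2];

    /* post two writes; neither call blocks, so both can be aggregated */
    ncmpi_iput_vara_int(ncid, varid1, start, count, buf1, &req[0]);
    ncmpi_iput_vara_int(ncid, varid2, start, count, buf2, &req[1]);

    /* flush both pending requests with a single collective wait */
    ncmpi_wait_all(ncid, 2, req, status);
}
```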
MPI-IO Caching
• Client-side file caching
  • Reduces client-server communication costs
  • Enables write-behind to better utilize network bandwidth
  • Avoids file system locking overhead by aligning I/O with the file block size (or stripe size)
• Prototype in ROMIO
  • Collaborative caching by the group of MPI processes
  • A complete caching subsystem in the MPI library
    • Data consistency and cache coherence control
    • Distributed file locking
    • Memory management for data caching, eviction, and migration
  • Applicable to both MPI collective and independent I/O (see the sketch below)
• Two implementations
  • Creating an I/O thread in each MPI process
  • Using the MPI RMA utility
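Because the prototype lives inside the MPI library, caching and write-behind are transparent to application code. The sketch below is an ordinary MPI-IO collective write of the kind such a cache would intercept; it uses only standard MPI-IO calls and does not show the caching subsystem itself. The file name and data sizes are illustrative.

```c
/* An ordinary MPI-IO collective write; with a client-side caching layer
 * inside the MPI library, the data may be buffered and flushed later
 * (write-behind) without any change to this code. */
#include <mpi.h>

void collective_dump(MPI_Comm comm, const double *buf, int nelems) {
    int rank;
    MPI_File fh;
    MPI_Comm_rank(comm, &rank);

    MPI_File_open(comm, "dump.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* each rank writes a contiguous slab at its own file offset */
    MPI_Offset offset = (MPI_Offset)rank * nelems * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, nelems, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}
```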
FLASH I/O Benchmark
• The I/O kernel of the FLASH application, a block-structured adaptive mesh hydrodynamics code
• Each process writes 80 cubes
• I/O through HDF5 (see the sketch below)
• Write-only operations
• The observed improvement is due to write-behind
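For context, the sketch below shows the general shape of a collective parallel HDF5 write of the kind the FLASH I/O kernel performs. The dataset name, the flat 1-D layout, and the extents are illustrative assumptions; the actual benchmark writes 80 cubes per process through a more elaborate AMR layout, and a parallel HDF5 build is required.

```c
/* Sketch of the write pattern the FLASH I/O kernel exercises: each rank
 * writes its own hyperslab of a shared HDF5 dataset collectively.
 * Names and extents are illustrative, not the benchmark's real layout. */
#include <mpi.h>
#include <hdf5.h>

void write_blocks(const double *buf, int rank, int nprocs, hsize_t nelems) {
    /* open the file through the MPI-IO driver */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("flash_io.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* one shared 1-D dataset, nprocs * nelems values long */
    hsize_t gdims[1] = { (hsize_t)nprocs * nelems };
    hid_t filespace = H5Screate_simple(1, gdims, NULL);
    hid_t dset = H5Dcreate(file, "unk", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* select this rank's slab and write it collectively */
    hsize_t start[1] = { (hsize_t)rank * nelems };
    hsize_t count[1] = { nelems };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl); H5Sclose(memspace); H5Sclose(filespace);
    H5Dclose(dset); H5Fclose(file); H5Pclose(fapl);
}
```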
BTIO Benchmark
[Figure: the 4-D local array of process P2,0 and its placement in the file view, interleaved with the blocks of processes P0,0 through P2,2.]
• Block tri-diagonal array partitioning
• 40 MPI collective writes followed by 40 collective reads (see the file-view sketch below)
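The sketch below illustrates the file-view style of access that BTIO exercises: a subarray datatype describes one process's block of the global array, the file view is set to that datatype, and a single collective call writes the block. It is a simplified 3-D block decomposition for illustration; the actual BTIO partitioning is block tri-diagonal and the local arrays are 4-D.

```c
/* File-view sketch: a subarray datatype describes this rank's block of
 * the global array, and one collective write moves it.  Sizes and the
 * file name are illustrative; real BTIO decomposition is more involved. */
#include <mpi.h>

void write_block(MPI_Comm comm, const double *local,
                 const int gsizes[3], const int lsizes[3],
                 const int starts[3]) {
    MPI_Datatype filetype;
    MPI_File fh;

    /* describe this rank's block within the global 3-D array */
    MPI_Type_create_subarray(3, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, "btio.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    /* all ranks participate in one collective write of their blocks */
    int nelems = lsizes[0] * lsizes[1] * lsizes[2];
    MPI_File_write_all(fh, local, nelems, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
}
```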