Testing Efficiency of Parallel I/O Software Weikuan Yu, Jeffrey Vetter December 6, 2006
Testing Parallel IO at ORNL
• Earlier analysis of scientific codes running at ORNL/LCF
  • Most users have limited I/O capability in their applications because they have not had access to portable, widely available parallel I/O software
  • Seldom any direct use of MPI-IO
  • Little use of high-level I/O middleware such as PnetCDF or HDF5
  • Large variance in parallel I/O performance across different software stacks (as demonstrated in the VH1 experiments)
• Ongoing work
  • Collecting application I/O access patterns and Lustre server I/O traces with TAU, CrayPat, mpiP, etc.
  • Testing other parallel I/O components over Lustre
  • Analysis, benchmarking, and optimization of data-intensive scientific codes
Parallel IO Optimization at ORNL
• Parallel IO over Lustre
  • As a new file system, Lustre still relies on a generic ADIO implementation
  • Successive generations of platforms at ORNL demand efficient parallel IO
• Performance with Jaguar
  • Good read/write bandwidth for a large, shared single file
  • Not scalable for small reads/writes and parallel IO management (metadata) operations
• Approaches for optimization
  • Providing a specific ADIO implementation well tuned for Lustre
  • Investigating parameters for adjusting the striping pattern
  • Exploiting Lustre file joining
    • Regular files can be joined in place
    • Split writing and hierarchical striping
    • Developed a prototype on an 80-node Linux cluster
    • Paper submitted to CCGrid 2006, available if interested
Some Characteristics of Lustre IO Performance
• Performance can be significantly affected by the stripe width
• Need to introduce flexibility in the striping pattern
• Exploit file joining to grow the stripe width as the file size increases
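A quick way to see why the stripe width matters: under round-robin striping, a file only ever touches as many OSTs as it has stripe units, so a small file gains nothing from a wide stripe. A minimal illustrative sketch in Python (the function name and parameters are ours, not a Lustre API):

```python
import math

def osts_touched(file_size, stripe_size, stripe_width):
    """Distinct OSTs a file actually uses under round-robin striping.

    Illustrative model only: real Lustre placement also depends on the
    starting OST chosen at file creation.
    """
    stripe_units = math.ceil(file_size / stripe_size)
    # A file cannot spread over more OSTs than it has stripe units,
    # nor over more than the configured stripe width.
    return min(stripe_units, stripe_width)
```

For example, a 1 MiB file with 1 MiB stripes uses a single OST regardless of the stripe width, while a 64 MiB file can spread over all 32 OSTs of a width-32 stripe — which is why growing the stripe width with the file size (via file joining) is attractive.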
Explore Lustre File Joining
• Split writing:
  • Create/write a shared file as multiple small files, aka subfiles
  • A temporary structure holds the file attributes
  • Subfiles are joined at closing time
[Diagram of Split Writing: open → file attributes; read/write → subfiles; close → joined file]
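The split-writing flow can be mimicked with ordinary files. A toy sketch, assuming plain-file concatenation stands in for Lustre's in-place join (the SplitWriter class and its methods are hypothetical names, not Lustre or MPI-IO calls):

```python
import os
import shutil
import tempfile

class SplitWriter:
    """Toy model of split writing: each writer gets its own subfile,
    and close() joins the subfiles into the final shared file.
    Concatenation stands in for Lustre's in-place file joining."""

    def __init__(self, path, num_writers):
        self.path = path
        self.dir = tempfile.mkdtemp()  # temporary home for the subfiles
        self.subfiles = [open(os.path.join(self.dir, "sub%d" % i), "wb")
                         for i in range(num_writers)]

    def write(self, writer_id, data):
        self.subfiles[writer_id].write(data)

    def close(self):
        for f in self.subfiles:
            f.close()
        # Join the subfiles at closing time, in writer order.
        with open(self.path, "wb") as out:
            for i in range(len(self.subfiles)):
                with open(os.path.join(self.dir, "sub%d" % i), "rb") as sub:
                    shutil.copyfileobj(sub, out)
        shutil.rmtree(self.dir)

# Demo: two "writers" produce subfiles that are joined on close.
_dest = os.path.join(tempfile.mkdtemp(), "joined")
w = SplitWriter(_dest, 2)
w.write(0, b"rank0-data ")
w.write(1, b"rank1-data")
w.close()
joined = open(_dest, "rb").read()
```

The point of the scheme is that each writer creates and writes its subfile independently, so metadata operations on the shared file are not serialized until the single join at close.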
Hierarchical Striping
• Create another level of striping pattern over the subfiles
  • Allows maximum coverage of Lustre storage targets
  • Mitigates the impact of striping overhead
[Diagram of Hierarchical Striping (HS): subfile k (k = 0..n) is striped over ost2k and ost2k+1 with stripe width 2 and stripe size w; HS width: n+1, HS size: S*w; stripe units 0..S-1 go to subfile 0, S..2S-1 to subfile 1, ..., nS..(n+1)S-1 to subfile n]
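Our reading of the diagram can be written down as a small address calculation: the outer level stripes the joined file across the n+1 subfiles with HS stripe size S*w, and each subfile is in turn striped over two OSTs (subfile k on ost2k and ost2k+1) with stripe size w. A hedged Python sketch (hs_locate and its return convention are ours):

```python
def hs_locate(offset, num_subfiles, S, w):
    """Map a byte offset in the joined file to (subfile, ost, subfile_offset).

    Assumed two-level layout, reconstructed from the diagram:
      outer: HS stripe size S*w bytes, round-robin across num_subfiles subfiles
      inner: each subfile striped over 2 OSTs with stripe size w,
             subfile k using OSTs 2k and 2k+1
    """
    hs_size = S * w
    # Outer level: which subfile, and how far into that subfile?
    hs_round, rem = divmod(offset, hs_size * num_subfiles)
    subfile, within = divmod(rem, hs_size)
    subfile_offset = hs_round * hs_size + within
    # Inner level: round-robin over the subfile's two OSTs.
    stripe_unit = subfile_offset // w
    ost = 2 * subfile + (stripe_unit % 2)
    return subfile, ost, subfile_offset
```

With S = 4, w = 1024, and three subfiles, offset 0 lands on subfile 0 / ost0, offset 1024 on subfile 0 / ost1, and offset 4096 starts subfile 1 on ost2 — matching the round-robin pattern in the figure.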
Evaluation
Table 1: Scalability of Management Operations
Table 2: Performance of Collective Read/Write
• Write/read performance improved dramatically for new files
• Read/write of an existing joined file does not perform well, due to a non-optimized IO path for joined files in Lustre
• Scalability of file open and file resize improved dramatically
Results on Scientific Benchmarks – MPI-Tile-IO and BT/IO
• The IO pattern represented by BT-IO can be improved if the number of iterations is small; it would help if an arbitrary number of files could be joined
• Write performance in MPI-Tile-IO can be improved dramatically
• Read performance in MPI-Tile-IO cannot be improved by file joining, because reading an existing joined file does not perform well
Conclusions
• Parallel IO over Lustre
  • Split writing can improve metadata management operations
  • Striping overhead can be mitigated by careful adjustment of the stripe width
• Lustre file joining
  • Race conditions when joining files from multiple processes
  • Low read/write performance on an existing joined file
  • Arbitrary hierarchical striping is not possible because only a limited number of files can be joined
  • Needs improvement before production use for parallel IO
• Next steps
  • Continue optimization of parallel IO at ORNL
  • Adapt the earlier techniques to liblustre on XT3/XT4
  • Develop/exploit other features: group locks and dynamic stripe width
  • Adapt parallel I/O and parallel file systems to wide-area collaborative science with other IO protocols such as pNFS and Logistical Networking