
Improving Parallel I/O Performance with Data Layout Awareness


Presentation Transcript


  1. Improving Parallel I/O Performance with Data Layout Awareness Yong Chen, Oak Ridge National Laboratory Xian-He Sun, Illinois Institute of Technology Rajeev Thakur, Argonne National Laboratory Huaiming Song, Illinois Institute of Technology Hui Jin, Illinois Institute of Technology Presented by: Robert Latham, Argonne National Laboratory Cluster-2010

  2. I/O Bottleneck in High-Performance Computing • Significant gap between computing and I/O performance • Long I/O latency leads to performance degradation • Applications tend to be data intensive • Limited I/O is widely cited as the cause of low sustained performance

  3. I/O for Computational Science

  4. Limitation of Current Parallel I/O Systems • Historically, parallel file systems and parallel I/O middleware were designed and developed separately • There is an information gap between the parallel I/O subsystems • The parallel file system decides the data layout on storage • The parallel I/O middleware optimizes, groups, and rearranges accesses • This separation and information gap forfeit optimization opportunities that could benefit overall parallel I/O performance • For instance, collective I/O relies on the logical layout of file accesses, whereas the physical layout determines access latency and concurrency • Current parallel I/O does not exploit layout awareness well

  5. Data Layout and Data Accesses • The data layout mechanism decides how data are distributed among multiple file servers • A crucial factor determining data access latency and I/O subsystem performance • Its significance, and the performance gains from arranging data properly, have been demonstrated by • Log-like reordering • The Parallel Log-structured File System (PLFS) • Optimizing I/O accesses with awareness of the data layout is a necessity • A challenging and tedious task for users • Manual rearrangement is limited • Not scalable for petascale/exascale systems
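As a concrete illustration (not code from this work), a round-robin striped layout such as PVFS2's default maps each stripe-size unit of the logical file to the servers in turn; the function and parameter names below are hypothetical:

```c
#include <stddef.h>

/* Hypothetical sketch of a round-robin (simple-striped) layout:
 * the file is cut into stripe-size units, and the unit containing
 * byte `offset` is stored on server (offset/stripe) % nservers.
 * Real PVFS2 layouts are configurable; this shows only the default idea. */
static int server_of(size_t offset, size_t stripe, int nservers) {
    return (int)((offset / stripe) % nservers);
}
```

Two accesses that are adjacent in the logical file can thus land on different physical servers; layout awareness means the middleware knows `stripe` and `nservers` when scheduling requests.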

  6. Layout-aware Parallel I/O Strategy • A parallel I/O strategy with data layout awareness • Considers the physical data layout and data locality in the parallel I/O strategy • Fosters better integration of the parallel file system and middleware • Achieves a better-matched I/O • Contributions of this research • Demonstrate that the data layout has a clear impact on I/O performance • Propose layout-aware independent I/O and collective I/O strategies that consider the physical data layout and data locality • Verify with a prototype system that the layout-aware parallel I/O strategy achieves performance improvement over existing systems

  7. Independent I/O

  8. Layout-aware Independent I/O Exploit data layout to reduce contention and improve data locality

  9. Layout-aware Independent I/O • Considers the layout and improves access locality • Retrieves the data layout via file system calls and caches it in the middleware • Avoids interruptions caused by contention among processes • Reduces the performance loss due to contention • Decouples network communication and I/O operations • Avoids I/O serialization at the file servers • Reduces imbalanced response times across processes • The total execution time can be improved even if the response times of individual processes are not well balanced • Note that the total response time, or time-to-solution, is what users care about for a parallel application
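One way a layout-aware middleware can use the cached layout is to split a contiguous request at stripe-unit boundaries, so that each sub-request touches exactly one server and can be issued without cross-server serialization. A minimal sketch, with our own (hypothetical) names, not the authors' implementation:

```c
#include <stddef.h>

/* Split the byte range [offset, offset+len) at stripe-unit boundaries.
 * Each resulting piece lies inside one stripe unit and therefore on a
 * single file server. Writes the pieces into starts[]/lens[] (at most
 * `max` of them) and returns how many were produced. Hypothetical helper. */
static int split_by_stripe(size_t offset, size_t len, size_t stripe,
                           size_t *starts, size_t *lens, int max) {
    int n = 0;
    while (len > 0 && n < max) {
        size_t room  = stripe - (offset % stripe); /* bytes left in unit */
        size_t piece = len < room ? len : room;
        starts[n] = offset;
        lens[n]   = piece;
        n++;
        offset += piece;
        len    -= piece;
    }
    return n;
}
```

For example, a 200-byte request at offset 100 with a 128-byte stripe splits into three pieces (28, 128, and 44 bytes), each destined for a single server.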

  10. Collective I/O and Two-phase Implementation

  11. Layout-aware Collective I/O

  12. Layout-aware Collective I/O

  13. Layout-aware Collective I/O • Conventional collective I/O: combine noncontiguous accesses and split in a logically contiguous way • Layout-aware collective I/O: combine noncontiguous accesses and split in a logically noncontiguous way but with better physical locality and reduced data access contention • Layout-aware collective I/O can be beneficial • Still performs collective I/O – the overlapping and redundant requests are removed • The number of requests to the parallel file system is controlled by taking advantage of noncontiguous parallel file system calls • Access rearranging and reordering exploit better locality and reduce access contention
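The logically noncontiguous split can be sketched as follows: instead of carving the aggregate access into contiguous byte ranges, a layout-aware two-phase implementation can bind each aggregator to the stripe units of one server, so each aggregator issues its (possibly noncontiguous) requests to a single server. This is an illustrative sketch under that assumption, not the paper's actual code:

```c
#include <stddef.h>

/* Hypothetical layout-aware file-domain assignment: the stripe unit
 * containing `offset` resides on server (offset/stripe) % nservers,
 * and each server's data is handled by one fixed aggregator. When
 * naggs >= nservers, every aggregator talks to exactly one server,
 * trading logical contiguity for physical locality. */
static int aggregator_of(size_t offset, size_t stripe,
                         int nservers, int naggs) {
    int server = (int)((offset / stripe) % nservers);
    /* simple static binding; real middleware may balance dynamically */
    return server % naggs;
}
```

With 4 servers, a 64-byte stripe, and 4 aggregators, offsets 0, 64, 128, and 192 go to aggregators 0 through 3, and offset 256 wraps back to aggregator 0.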

  14. Experimental Setup • Experimental environment • 65-node Sun Fire Linux-based cluster • Sun Fire X4240 head node • 12x500GB 7.2K-RPM SATA-II drives configured as a RAID-5 • Sun Fire X2200 compute nodes • 250GB 7.2K-RPM SATA hard drive • MPICH2-1.0.5p3 release • PVFS2 2.8.1 • Benchmarks • Synthetic user-level checkpointing application • IOR benchmark

  15. Layout-aware Independent I/O on PVFS2 • Sustained bandwidth decreased (execution time increased) as the number of processes increased, even though the total image size remained the same, due to contention • Bandwidth improved by 8.36% and 45.7% on average, respectively • Achieved stable performance across the various cases

  16. Layout-aware Collective I/O Performance • Left: IOR random reads test: up to 74% speedup; 40% speedup on average • Right: IOR random writes test: up to 38% speedup; 23% speedup on average

  17. Layout-aware Collective I/O Performance • Left: IOR interleaved reads test: up to 112% speedup; 28% speedup on average • Right: IOR interleaved writes test: up to 45% speedup; 16% speedup on average

  18. Conclusion • Poor I/O performance has been a bottleneck in HPC • Parallel I/O middleware and parallel file systems are critical • Little has been done to exploit layout-aware optimization and to foster better integration of these two subsystems • We propose a new layout-aware parallel I/O strategy and exploit it for both independent I/O and collective I/O • Preliminary results have demonstrated its potential • More research is needed for next-generation I/O architectures to support layout awareness, access awareness, and intelligence

  19. Ongoing and Future Work • Application-specific customized data layout strategies • Adapt to a proper data layout depending on the specific access pattern • Continue investigating layout-awareness and access-awareness optimizations

  20. Any Questions? Thank you. Please visit: http://ft.ornl.gov http://www.cs.iit.edu/~scs This research was sponsored in part by

  21. Improving Parallel I/O Performance with Data Layout Awareness Yong Chen, Oak Ridge National Laboratory Xian-He Sun, Illinois Institute of Technology Rajeev Thakur, Argonne National Laboratory Huaiming Song, Illinois Institute of Technology Hui Jin, Illinois Institute of Technology Presented by: Robert Latham, Argonne National Laboratory

  22. Backup Slides

  23. Performance Gap Between Computing and I/O

  24. Data Layout Matters

  25. Independent I/O: An Ideal Case

  26. Dynamic Application-specific I/O Optimization Architecture
