
The Parallel and Grid I/O Perspective



Presentation Transcript


  1. The Parallel and Grid I/O Perspective
     • MPI, MPI-IO, NetCDF, and HDF5 are in common use
     • Multi-TB datasets also common
     • Testbeds needed for software at scale

  2. Topics for Discussion
     • NetCDF
     • Other applications (leverage)
     • Read (parallel analysis tools)
     • PVFS opportunities
     • TB in your office
     • Scalability Testbed
     • Where do you test at scale?
     • Application log files
     • Real application log files at > 1 GB
     • Other
     • Can we quantify applications' needs?

  3. PVFS Peak Write Performance
     • Using compute nodes for storage in these tests
     • Peak at around 25-30 Mbytes/sec per I/O server
     • Clients cannot maintain this to disk
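As a rough reading of the per-server figure on the PVFS slide, the sketch below estimates an aggregate write rate and the time to write a dataset. It assumes ideal linear scaling across I/O servers and decimal units (1 TB = 1,000,000 MB); neither assumption comes from the slides, and the function names and the 16-server count are hypothetical. Since the slide notes that clients cannot maintain the peak rate to disk, the results are best read as optimistic bounds.

```python
def aggregate_write_rate(num_servers, per_server_mb_s):
    """Upper-bound aggregate rate in MB/s, assuming perfect linear scaling."""
    return num_servers * per_server_mb_s


def hours_to_write(dataset_tb, num_servers, per_server_mb_s):
    """Lower-bound time in hours to write a dataset of dataset_tb terabytes."""
    total_mb = dataset_tb * 1_000_000  # assumption: decimal terabytes
    seconds = total_mb / aggregate_write_rate(num_servers, per_server_mb_s)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical configuration: 16 I/O servers at the low end of the range.
    print(aggregate_write_rate(16, 25), "MB/s")
    print(round(hours_to_write(1, 16, 25), 2), "hours for 1 TB")
```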

  4. Performance Visualization with Jumpshot
     [Figure: diagram labeled Processes, Logfile, Jumpshot, Display]
     • For detailed analysis of parallel program behavior, timestamped events are collected into a log file during the run.
     • A separate display program (Jumpshot) aids the user in conducting a post-mortem analysis of program behavior.
     • Log files can become large (> 1 GB), making it impossible to inspect the entire program at once.
     • The FLASH Project motivated an indexed file format (SLOG) that uses a preview to select a time of interest and quickly display an interval.
     • We collaborated with IBM and LLNL to collect SLOG files directly from AIX trace records and display traces from multithreaded programs.
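The idea of an indexed log that lets a viewer jump to a time interval without scanning the whole file can be sketched as follows. This is not the SLOG format itself: the frame layout, the in-memory event list, and the function names are all hypothetical, chosen only to show how a small index of per-frame time ranges narrows the search to the frames that overlap the requested interval.

```python
from bisect import bisect_left, bisect_right


def build_index(events, frame_size):
    """Split time-sorted (timestamp, name) events into fixed-size frames.

    Returns one (start_time, end_time, first, stop) tuple per frame, where
    events[first:stop] are the events belonging to that frame.
    """
    index = []
    for first in range(0, len(events), frame_size):
        stop = min(first + frame_size, len(events))
        index.append((events[first][0], events[stop - 1][0], first, stop))
    return index


def events_in_interval(events, index, t0, t1):
    """Return events with t0 <= timestamp <= t1, touching only overlapping frames."""
    frame_starts = [frame[0] for frame in index]
    frame_ends = [frame[1] for frame in index]
    first_frame = bisect_left(frame_ends, t0)     # first frame ending at or after t0
    last_frame = bisect_right(frame_starts, t1)   # one past last frame starting by t1
    selected = []
    for _, _, first, stop in index[first_frame:last_frame]:
        selected.extend(e for e in events[first:stop] if t0 <= e[0] <= t1)
    return selected


if __name__ == "__main__":
    log = [(t, f"event{t}") for t in range(10)]  # hypothetical timestamped events
    idx = build_index(log, frame_size=3)
    print(events_in_interval(log, idx, 4, 7))
```

In a file-based version the `first` and `stop` fields would be byte offsets, so only the selected frames would need to be read from disk.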

  5. Chiba City Scalability Testbed http://www.mcs.anl.gov/chiba/

  6. Notes
     • FAQ on Parallel I/O
     • Include performance graphs, tutorial links
     • Interaction with P2 (Data Mining and Access Pattern Discovery)
     • Parallel NetCDF (P2 as an application group)
     • Managing datasets of NetCDF files
     • Collect log files of application I/O
     • Explore use of WAN FTP for Grid I/O
     • Remote I/O through MPI-IO interface
     • PVFS Clusters for TB dataset experimentation
     • Close with John Drake on parallel NetCDF for Climate

  7. SC02 Demo
     • Use Parallel NetCDF over MPI-IO over PVFS to access dataset
     • Extract time series from collection of files
     • Parallel reads as well as writes
     • New feature: handle dynamically changing datasets
     • Observe progress of running application
     • Perform data analysis and visualization
     • Contrast with nonparallel approach
     • Prototype on Chiba scalability testbed at ANL
     • Bonus: collect log files of I/O behavior and show analysis and visualizations of log files
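One way to picture the "extract time series from collection of files" step is a block decomposition of the file list across processes. The sketch below is a serial stand-in, not the demo's code: each "file" is a hypothetical in-memory dictionary rather than a NetCDF file, the loop over ranks stands in for MPI processes plus a gather, and no Parallel NetCDF or MPI-IO calls are made.

```python
def block_range(n_items, n_ranks, rank):
    """Return the [start, stop) slice of items owned by one rank.

    Items are split as evenly as possible; the first (n_items % n_ranks)
    ranks receive one extra item.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def extract_local(files, variable, point, start, stop):
    """Read one value of `variable` at index `point` from each owned file."""
    return [files[i][variable][point] for i in range(start, stop)]


def gather_time_series(files, variable, point, n_ranks):
    """Concatenate per-rank pieces in rank order (stands in for an MPI gather)."""
    series = []
    for rank in range(n_ranks):
        start, stop = block_range(len(files), n_ranks, rank)
        series.extend(extract_local(files, variable, point, start, stop))
    return series


if __name__ == "__main__":
    # Hypothetical collection: one "file" per time step, each holding a
    # variable named "temp" with two grid points.
    files = [{"temp": [float(t), t + 0.5]} for t in range(10)]
    print(gather_time_series(files, "temp", point=1, n_ranks=4))
```

Because the files are assigned in contiguous blocks, concatenating the per-rank results in rank order preserves the time ordering of the series.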

  8. Demo Steps
     • Select variable from collection of files, write a new NetCDF file
     • Illustrates fast I/O
     • (address open performance for collections of files)
     • Perform PCA
     • Illustrates algorithmically efficient methods
     • Visualize at each time step
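The slides do not say which PCA algorithm the demo uses. As a minimal illustration of the PCA step only, the sketch below finds the leading principal component with power iteration on the sample covariance matrix, in pure Python. The choice of power iteration, the function names, and the toy data are all assumptions made for this example.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(matrix, vector):
    """Dense matrix-vector product."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def leading_component(cov, iterations=200):
    """Power iteration: returns (variance, unit direction) of the top component.

    Assumes the starting vector is not orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero covariance: no direction to find
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, matvec(cov, v)))  # Rayleigh quotient
    return variance, v


if __name__ == "__main__":
    # Toy data lying exactly on the line y = 2x (hypothetical values).
    data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    variance, direction = leading_component(covariance(center(data)))
    print(variance, direction)
```

For data with many variables, a library routine based on a singular value decomposition would usually be preferred over this hand-rolled loop.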

  9. Vision for the Future
     • Databases and parallel I/O integration
     • Data representations for standard file formats that provide better performance for typical access patterns (post NetCDF/HDF)
     • Transparent parallel I/O to/from everywhere (grid transparent, file system hierarchy transparent)
