Connecting HPIO Capabilities with Domain Specific Needs
Rob Ross
MCS Division, Argonne National Laboratory
rross@mcs.anl.gov
I/O in an HPC system
• Many cooperating tasks sharing I/O resources
• Relying on parallelism of hardware and software for performance
[Figure: clients running applications (100s-1000s), each with I/O system software, connected through a storage or system network to I/O devices or servers (10s-100s) running I/O system software on top of storage hardware]
Motivation
• HPC applications increasingly rely on I/O subsystems
  • Large input datasets, checkpointing, visualization
• Applications continue to be scaled, putting more pressure on I/O subsystems
• Application programmers desire interfaces that match the domain
  • Multidimensional arrays, typed data, portable formats
• Two issues to be resolved by the I/O system
  • Very high performance requirements
  • Gap between application abstractions and hardware abstractions
I/O history in a nutshell
• I/O hardware has lagged behind, and continues to lag behind, all other system components
• I/O software has matured more slowly than other components (e.g., message-passing libraries)
  • Parallel file systems (PFSs) are not enough
• This combination has led to poor I/O performance on most HPC platforms
• Only in a few instances have I/O libraries presented abstractions matching application needs
Evolution of I/O software
• Goal is convenience and performance for HPC
• Slowly, capabilities have emerged
• Parallel high-level libraries bring together good abstractions and performance, maybe
[Figure: stages in the evolution of I/O software: local disk (POSIX), remote access (NFS, FC), parallel file systems, MPI-IO, serial high-level libraries, parallel high-level libraries. Not to scale or necessarily in the right order.]
I/O software stacks
[Figure: layered I/O software stack, top to bottom: Application, High-level I/O Library, MPI-IO Library, Parallel File System, I/O Hardware]
• Myriad I/O components are converging into layered solutions
• Insulate applications from eccentric MPI-IO and PFS details
• Maintain (most of) I/O performance
  • Some high-level library (HLL) features do cost performance
Role of parallel file systems
• Manage storage hardware
  • Lots of independent components
  • Must present a single view
  • Provide fault tolerance
• Focus on concurrent, independent access
  • Difficult to pass knowledge of collectives to PFS
• Scale to many clients
  • Probably means removing all shared state
  • Lock-free approaches
• Publish an interface that MPI-IO can use effectively
  • Not POSIX
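One common way a parallel file system presents a single view over many independent servers is round-robin striping. The sketch below maps a logical byte range of a file onto per-server pieces. The stripe size and server count are hypothetical parameters chosen for illustration, not values taken from the talk or from any specific PFS.

```python
def stripe_map(offset, length, stripe_size, num_servers):
    """Split a logical byte range into per-server pieces under round-robin striping.

    Returns a list of (server, server_offset, length) tuples in logical order.
    stripe_size and num_servers are hypothetical tuning parameters.
    """
    pieces = []
    end = offset + length
    pos = offset
    while pos < end:
        stripe_index = pos // stripe_size              # which stripe this byte falls in
        within = pos % stripe_size                     # offset inside that stripe
        chunk = min(stripe_size - within, end - pos)   # stop at stripe or request end
        server = stripe_index % num_servers            # round-robin server choice
        # Each server stores its own stripes back to back in a local object.
        server_offset = (stripe_index // num_servers) * stripe_size + within
        pieces.append((server, server_offset, chunk))
        pos += chunk
    return pieces
```

For example, a 10-byte write at offset 0 with 4-byte stripes on 2 servers becomes `[(0, 0, 4), (1, 0, 4), (0, 4, 2)]`: the client sees one file, while two servers each handle part of the request in parallel.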
Role of MPI-IO implementations
• Facilitate concurrent access by groups of processes
  • Understanding of the programming model
• Provide hooks for tuning PFS
  • MPI_Info as interface to PFS tuning parameters
• Expose a fairly generic interface
  • Good for building other libraries
• Leverage MPI-IO semantics
  • Aggregation of I/O operations
• Hide unimportant details of the parallel file system
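Aggregation is one of the semantics-based optimizations an MPI-IO implementation can apply during a collective call. The sketch below simulates the idea behind two-phase collective I/O (as implemented in ROMIO): the overall access range is split into contiguous file domains, one per aggregator, and each aggregator coalesces the small pieces it receives into as few contiguous writes as possible. It is a simplified model under stated assumptions: buffer-size limits, holes, and the actual network exchange are omitted, and the request lists are illustrative.

```python
def two_phase_plan(requests, num_aggregators):
    """Plan aggregated writes for a collective write operation.

    requests: list of (rank, offset, length) byte ranges, one entry per piece.
    Returns {aggregator: [(offset, length), ...]} with adjacent pieces merged.
    """
    start = min(off for _, off, _ in requests)
    end = max(off + ln for _, off, ln in requests)
    # Setup: split [start, end) into equal contiguous file domains (ceiling division).
    domain = -(-(end - start) // num_aggregators)
    pieces = {a: [] for a in range(num_aggregators)}
    # Phase 1 (exchange): route each piece to the aggregator owning its domain,
    # cutting pieces that straddle a domain boundary.
    for _, off, ln in requests:
        pos, stop = off, off + ln
        while pos < stop:
            agg = (pos - start) // domain
            cut = min(stop, start + (agg + 1) * domain)
            pieces[agg].append((pos, cut - pos))
            pos = cut
    # Phase 2 (I/O): each aggregator merges adjacent or overlapping pieces.
    plan = {}
    for agg, plist in pieces.items():
        merged = []
        for off, ln in sorted(plist):
            if merged and off <= merged[-1][0] + merged[-1][1]:
                last_off, last_len = merged[-1]
                merged[-1] = (last_off, max(last_len, off + ln - last_off))
            else:
                merged.append((off, ln))
        plan[agg] = merged
    return plan
```

With four ranks each writing 1-byte pieces in interleaved order over 16 bytes, `reqs = [(r, r + 4 * k, 1) for r in range(4) for k in range(4)]`, the call `two_phase_plan(reqs, 2)` turns 16 small writes into `{0: [(0, 8)], 1: [(8, 8)]}`. The file system never sees the interleaving, which is why knowledge of the collective is more useful in MPI-IO than in the PFS.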
Role of high-level libraries
• Provide an appropriate abstraction for the domain
  • Multidimensional, typed datasets
  • Attributes
  • Consistency semantics that match usage
  • Portable format
• Maintain the scalability of MPI-IO
  • Map data abstractions to datatypes
  • Encourage collective I/O
• Implement optimizations that MPI-IO cannot (e.g., header caching)
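Mapping a multidimensional dataset to an MPI datatype amounts to describing which bytes of the file a process's piece occupies. `MPI_Type_create_subarray` with `MPI_ORDER_C` describes this for a row-major array; the sketch below computes the equivalent list of contiguous byte runs by hand, so the noncontiguity a high-level library hands to MPI-IO is visible. The array shapes and element size are assumptions for illustration, and adjacent runs are deliberately not merged.

```python
from itertools import product

def subarray_runs(global_shape, sub_shape, starts, elem_size):
    """Byte runs covered by a subarray of a row-major (C-order) global array.

    Mirrors what an MPI subarray datatype with MPI_ORDER_C describes:
    the last dimension is contiguous, and every index of the other
    dimensions starts a new run. Returns (byte_offset, byte_length) tuples
    in file order; adjacent runs are not merged.
    """
    ndim = len(global_shape)
    # Row-major strides, in elements.
    strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        strides[d] = strides[d + 1] * global_shape[d + 1]
    run_len = sub_shape[-1] * elem_size
    runs = []
    # Walk every index tuple of the leading (noncontiguous) dimensions.
    leading = (range(s, s + n) for s, n in zip(starts[:-1], sub_shape[:-1]))
    for idx in product(*leading):
        elem = sum(i * st for i, st in zip(idx, strides[:-1])) + starts[-1]
        runs.append((elem * elem_size, run_len))
    return runs
```

For a 2×2 tile starting at row 1, column 2 of a 4×4 array of 8-byte values, `subarray_runs((4, 4), (2, 2), (1, 2), 8)` gives `[(48, 16), (80, 16)]`: two separate runs that a single datatype (and a single collective call) can describe at once.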
Example: ASCI/Alliance FLASH
[Figure: FLASH I/O stack, top to bottom: ASCI FLASH, Parallel netCDF, IBM MPI-IO, GPFS, Storage]
• FLASH is an astrophysics simulation code from the ASCI/Alliance Center for Astrophysical Thermonuclear Flashes
  • Fluid dynamics code using adaptive mesh refinement (AMR)
  • Runs on systems with thousands of nodes
• Three layers of I/O software between the application and the I/O hardware
• Example system: ASCI White Frost
FLASH data and I/O
• 3D AMR blocks
  • 16³ elements per block
  • 24 variables per element
  • Perimeter of ghost cells
• Checkpoint writes all variables
  • No ghost cells
  • One variable at a time (noncontiguous)
• Visualization output is a subset of variables
• Portability of data desirable
  • Postprocessing on separate platform
[Figure: AMR block with a perimeter of ghost cells; each element holds 24 variables]
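Writing one variable at a time without ghost cells is what makes the checkpoint access noncontiguous. The sketch below gathers one variable's interior values from a block, assuming a hypothetical in-memory layout in which the 24 variables of each element are stored together (variable index fastest) and each block edge carries a ghost perimeter of `NGUARD` cells. The layout and the value `NGUARD = 4` are assumptions for illustration, not FLASH's documented data structure; only the 16³ elements and 24 variables come from the slide.

```python
NXB = 16      # interior elements per block edge (16^3 per block, from the slide)
NVARS = 24    # variables per element (from the slide)
NGUARD = 4    # ghost cells on each side: hypothetical value for illustration

def interior_indices(var, nxb=NXB, nvars=NVARS, nguard=NGUARD):
    """Flat memory indices of one variable's interior (non-ghost) elements.

    Assumed (hypothetical) layout: index = ((k * n + j) * n + i) * nvars + var,
    where n = nxb + 2 * nguard includes the ghost perimeter.
    """
    n = nxb + 2 * nguard
    out = []
    for k in range(nguard, nguard + nxb):
        for j in range(nguard, nguard + nxb):
            for i in range(nguard, nguard + nxb):
                out.append(((k * n + j) * n + i) * nvars + var)
    return out

def pack_variable(block, var, **kw):
    """Gather one variable's interior values into a contiguous list for writing."""
    return [block[idx] for idx in interior_indices(var, **kw)]
```

Under these assumptions, each variable contributes 16³ = 4,096 values per block, consecutive values along a row sit 24 entries apart in memory, and whole rows are separated by the skipped ghost cells. A checkpoint therefore walks every block 24 times with a strided pattern, which is exactly the kind of access that is best described once with a datatype and handed to a collective call.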
Tying it all together
• FLASH tells PnetCDF that all its processes want to write out regions of variables and store them in a portable format
• PnetCDF performs data conversion and calls the appropriate MPI-IO collectives
• MPI-IO optimizes writing of data to GPFS using data shipping and I/O agents
• GPFS handles moving data from agents to storage resources, storing the data, and maintaining file metadata
• In this case, PnetCDF is a better match to the application
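Part of the data conversion behind a portable format is byte order: the netCDF classic format stores values in big-endian (XDR) order, so on a little-endian machine every value must be byte-swapped before it reaches MPI-IO. The sketch below shows that conversion for doubles using Python's `struct` module; it illustrates the encoding only and is not PnetCDF code.

```python
import struct

def to_portable(values):
    """Encode floats as big-endian IEEE 754 doubles (netCDF classic on-disk order)."""
    return struct.pack(f">{len(values)}d", *values)

def from_portable(data):
    """Decode big-endian doubles back into native Python floats."""
    count = len(data) // 8
    return list(struct.unpack(f">{count}d", data))
```

For instance, `to_portable([1.0]).hex()` is `"3ff0000000000000"` regardless of the host's native byte order, which is what lets postprocessing run on a separate platform.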
Future of I/O system software
[Figure: layered I/O software stack, top to bottom: Application, Domain Specific I/O Library, High-level I/O Library, MPI-IO Library, Parallel File System, I/O Hardware]
• More layers in the I/O stack
  • Better match application view of data
  • Mapping this view to PnetCDF or similar
  • Maintaining collectives, rich descriptions
• More high-level libraries using MPI-IO
  • PnetCDF, HDF5 are great starts
  • These should be considered mandatory I/O system software on our machines
• Focusing component implementations on their roles
  • Less general-purpose file systems
    • Scalability and APIs of existing PFSs aren’t up to current workloads and scales
  • More aggressive MPI-IO implementations
    • Lots can be done if we’re not busy working around broken PFSs
  • More aggressive high-level library optimization
    • They know the most about what is going on
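A domain-specific layer's main job is mapping the application's view (here, AMR blocks owned by each process) onto the high-level library's view (a global multidimensional variable) without losing the collective nature of the write. The sketch below assumes a hypothetical layout in which each variable is a 4D array indexed (block, z, y, x) and rank r owns the global block IDs that follow those of ranks 0..r-1. It produces per-rank start/count pairs of the kind a collective vara-style call, such as PnetCDF's `ncmpi_put_vara_double_all`, takes; the layout and function name are illustrative assumptions.

```python
from itertools import accumulate

NXB = 16  # elements per block edge, from the FLASH slide

def block_hyperslabs(blocks_per_rank, nxb=NXB):
    """Map each rank's AMR blocks to a (start, count) hyperslab of a global variable.

    Hypothetical layout: every variable is a 4D array (block, z, y, x), and
    rank r owns the block IDs immediately after those of ranks 0..r-1.
    Ranks with zero blocks still get a zero-count entry so they can take
    part in the collective call.
    """
    offsets = [0] + list(accumulate(blocks_per_rank))[:-1]  # exclusive prefix sum
    return [
        ((first, 0, 0, 0), (n, nxb, nxb, nxb))
        for first, n in zip(offsets, blocks_per_rank)
    ]
```

With three ranks owning 3, 0, and 5 blocks, rank 2 writes `start = (3, 0, 0, 0)` and `count = (5, 16, 16, 16)`. Every rank, including the one with no blocks, issues the same collective call, so MPI-IO still sees the whole operation at once.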
Future
• Creation and adoption of parallel high-level I/O libraries should make things easier for everyone
• New domains may need new libraries or new middleware
  • HLLs that target database backends seem obvious; probably someone else is already doing this?
• Further evolution of components is necessary to get the best performance
  • Tuning/extending file systems for HPC (e.g., user metadata storage, better APIs)
  • Aggregation, collective I/O, and leveraging semantics are even more important at larger scale
  • Reliability too, especially for kernel FS components
• Potential HW changes (MEMS, active disk) are complementary