1. Parallel MxN Communication Using MPI-I/O Felipe Bertrand, Yongquan Yuan,
Kenneth Chiu, Randy Bramley
Department of Computer Science, Indiana University
Supported by
NSF Grants 0116050 and EIA-0202048
Department of Energy's Office of Science SciDAC grants
2. Motivation Interesting problems span multiple regimes of time, space, and physics. Multiphysics, multidiscipline:
Climate models
Combustion models
Fusion simulation
Protein simulation
Climate models: the biggest supercomputer (at least outside of the NSA) is being built in Japan and used for weather simulation. This topic was very present at the SuperComputing 2002 conference.
Combustion models: rockets, explosives.
Fusion simulation: there is a huge international effort to build the next generation of fusion reactors, expected to sustain fusion for a few minutes.
Protein simulation: find the shape of a protein from its sequence of amino acids (DNA -> RNA (sequence of codons = 3 nucleotide bases) -> proteins).
Our task as computer scientists in scientific computation is
3. Community Climate System Model (CCSM) CCSM models are parallel but the inter-model communication is serial
CCSM is an evolution of a number of projects
A fully-coupled, global climate model that provides state-of-the-art computer simulations of the Earth's past, present, and future climate states.
Based on a framework which divides the complete climate system into component models connected by a coupler. This design requires four components: atmosphere, land, ocean, and ice
In development by the National Center for Atmospheric Research since 1984.
Resolution is very poor: there is a lot of room for improvement.
Atmosphere and land grid:
High resolution grid (T42): 128x64 = 8,192 points.
Resolution is about 300 km at the equator (2.8 degrees).
26 levels in the vertical (total points: ~213k).
Ocean and Ice grid:
High resolution grid (gx1v3): 320x384 = 122,880 points (~123k).
Resolution is 1 degree longitudinal (constant) and 0.3 degrees latitudinal at the equator.
40 levels in the vertical, 10 to 250 meters thick (total points: ~5M).
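For reference, the totals quoted for the two grids work out as follows (a worked arithmetic check, not on the original slides):

$$128 \times 64 \times 26 = 212{,}992 \approx 213\text{k}, \qquad 320 \times 384 \times 40 = 4{,}915{,}200 \approx 5\text{M}.$$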
Components have to communicate state variables (for example, the atmosphere component has to communicate to the ocean component the pressure at the surface of the sea; maybe that has some relevance to evaporation) and fluxes (like heat or salt).
Albedo: amount of radiation that is reflected by a surface.
This is what is most important to us: the architectural design of the application. We can see components, processes, and communication links. The coupler does more work than just being a hub of communication: it does data interpolation between the grids and guarantees the conservation of the fluxes (if that much heat is emitted, then the coupler guarantees that the target grid will receive that much heat).
This picture suggests that a parallel communication scheme would be very beneficial. Things I would ponder if I had to make the decision of migrating to a parallel communication scheme: the load of communication versus computation (also: if we make communication cheaper, maybe we can benefit from more frequent communication), and the effort to maintain the parallel communication infrastructure (this is a management decision), including training and the cost of software dependencies.
4. Existing Approaches Refactor codes: integrate all components into a single, much larger program
More efficient data sharing
Large time investment in rewriting program
Closer coordination between teams: more costly in development, testing and deployment
Component models
Simplify the overall application development.
Complicate the efficient coordination and sharing of data between components (MxN problem).
5. Goals Scalably connect codes to create new multiphysics simulations
Target existing and currently evolving codes
Created by teams with disparate research interests
Spanning multiple time scales, spatial domains, and disciplines
Rapid prototyping/testing without extensive rewriting
Use standard APIs and paradigms familiar to most application area scientists (MPI I/O)
6. The MxN Problem
Transfer data from a parallel program running on M processors to another running on N processors.
M and N may differ
May require complex all-to-all communications, data redistribution
7. Solving the MxN Problem Existing solutions
Use process 0 on all components
Used by CCSM model
Not scalable
Read/Write through files
Scalable if parallel I/O used
Slow because it involves hard-drive reads and writes
Our solution
Use MPI I/O interface, create middleware to transfer data via network
Treat application codes as software components
Provide easy migration path for existing applications
8. Solving the MxN Problem MPI-I/O defines an API for parallel I/O using file-like semantics.
ROMIO is an implementation of MPI-IO.
Provides an abstract device interface (ADIO) that allows different physical I/O mechanisms to be plugged in.
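As a concrete reminder of what that API looks like, here is a minimal sketch (not taken from the slides) of a collective MPI-IO write; the file name and block size are illustrative.

```c
/* Minimal sketch of the standard MPI-IO API: each rank writes its own block
   of a distributed array to one shared file with a single collective call. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { N = 1024 };               /* local block size (illustrative) */
    double buf[N];
    for (int i = 0; i < N; i++)
        buf[i] = rank * N + i;       /* rank-dependent data */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "state.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write: rank r writes its block at offset r*N doubles. */
    MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```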
9. MxN MPI-IO Communication Application level:
Components live in different MPI instances
Transparency:
Not aware of the MxN communication
Reads and writes data through regular MPI interface.
No change in the source code is required:
Switch to MxN backend with filename prefix mxn:
Communication can be established between different MPI ROMIO-based implementations.
I am presenting now one solution to the MxN problem, developed by us. This solution is a back end to an MPI-IO implementation (MPI is a message-passing library, a library for communication).
Explain the picture.
The design goals were those shown on the slide.
Also: allows extreme decoupling for debugging and testing.
Also: no new paradigm (easy to learn).
Also: easy migration from current file-based applications (some only require relinking and changing a file name, others might need to change the access pattern to the file); a minimal sketch of such a change follows below.
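The migration path mentioned above can be sketched as follows. The stream name "mxn:coupler_exchange" and its pairing convention are assumptions for illustration only; the MPI calls themselves are the standard MPI-IO interface the slides refer to.

```c
/* Sketch of the migration path: the application keeps its ordinary MPI-IO
   calls and only the file name changes.  The stream name and pairing rules
   here are hypothetical, not taken from the original slides. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf[256];
    for (int i = 0; i < 256; i++)
        buf[i] = rank + 0.001 * i;   /* illustrative payload */

    MPI_File fh;

    /* Before: file-based coupling, e.g. "coupler_exchange.dat" on disk.
       After: the mxn: prefix selects the MxN ADIO device, which streams the
       same bytes over the network to the peer component instead of to disk. */
    MPI_File_open(MPI_COMM_WORLD, "mxn:coupler_exchange",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * 256 * sizeof(double);
    MPI_File_write_at_all(fh, offset, buf, 256, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```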
10. MxN MPI-IO Communication MxN backend:
Logical serialization: intuitive paradigm.
Parallel implementation: high performance.
The interface between the components is the format of the (virtual) file. The abstraction is that of a stream file, like writing to a tape. Random access is not supported, because the data is released as soon as it is read by the other side. The transfer is done through many read and write operations; a reader-side sketch follows below.
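A companion reader-side sketch (again an assumption, not taken from the slides) of how a consumer component would drain the stream through a sequence of forward-only collective reads.

```c
/* Hypothetical consumer component: opens the same stream name read-only and
   consumes it sequentially, matching the tape-like, no-random-access
   semantics described above. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double buf[256];
    MPI_File fh;

    MPI_File_open(MPI_COMM_WORLD, "mxn:coupler_exchange",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);

    /* The transfer proceeds as a sequence of collective reads; each rank
       consumes its own block of every step, moving strictly forward. */
    for (int step = 0; step < 4; step++) {
        MPI_Offset offset =
            ((MPI_Offset)step * nprocs + rank) * 256 * sizeof(double);
        MPI_File_read_at_all(fh, offset, buf, 256, MPI_DOUBLE,
                             MPI_STATUS_IGNORE);
    }

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```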
11. MxN MPI-IO Communication MxN backend:
12. MxN MPI-IO Communication Timing: first MxN connection between the discretizer and solver components, with 4 discretizer processes and 16 solver processes.
This technology was demonstrated at the SuperComputing conference with the setup shown above. Four components.
13. Thor Results: Time vs. Bytes
14. Future work Incorporate MxN communication system into a CCA component
Explore standard API for MxN components
Identify current computational challenges in areas of scientific application and design supporting middleware.
MxN introduces a new capability into scientific computing, one not previously available. This means a new application space is open and not yet fully explored or utilized by application scientists.
15. References Ian Foster, David Kohr, Jr., Rakesh Krishnaiyer, Jace Mogill. Remote I/O: Fast Access to Distant Storage. Proceedings of the Fifth Workshop on Input/Output in Parallel and Distributed Systems, 1997.
Climate and Global Dynamics Division, UCAR/NCAR. Community Climate System Model. http://www.cgd.ucar.edu/csm
Message Passing Interface Forum. http://www.mpi-forum.org
R. Thakur, W. Gropp, E. Lusk. An Abstract-Device Interface for Implementing Portable Parallel-I/O Interfaces. In Proceedings of the Sixth Symposium on the Frontiers of Massively Parallel Computation, p. 180, 1996.
S.A. Hutchinson, J.N. Shadid, R.S. Tuminaro. Aztec Users Guide: Version 2.30. Sandia National Laboratories. http://www.cs.sandia.gov/CRF/aztec1.html, 1998
16. Extra
17. Goals Large-scale scientific computations span multiple scales, domains, and disciplines, and are developed by large and diverse teams (multi-physics simulations)
Create multi-physics simulations using existing community parallel codes
Rapid prototyping/testing without rewriting codes
18. MxN Problem Defined
The transfer of data from a parallel program running on M processors to another parallel program running on N processors. Ideally, neither program knows the number of processes of the other one.
19. Solving MxN Problem Existing approaches
Use process 0 on all components
example: CCSM models
Read/Write through files
Our Approach
Decouple applications into components
Provide easy migration path for existing applications
Enable an intuitive model
Use MPI I/O interface
20. RI Support Critical to have a cluster where we can install variant file systems and modified middleware such as ROMIO with new abstract devices
Next phase: components on different clusters; fast network connection to university clusters critical for testing
Storage updates allow the ability to switch between MxN and standard file I/O.
MxN introduces a new capability into scientific computing, one not previously available. This means a new application space is open and not yet fully explored or utilized by application scientists.