Role of spectral turbulence simulations in developing HPC systems
YOKOKAWA, Mitsuo
Next-Generation Supercomputer R&D Center, RIKEN
One-day Meeting, INI, September 26th, 2008
Background
• Experience of developing the Earth Simulator
  • a 40 Tflops vector-type distributed-memory supercomputer system
  • a simulation code for box turbulence was used in the final adjustment of the system
  • a large simulation of box turbulence was carried out
• A peta-flops supercomputer project
Contents
• Simulations on the Earth Simulator
• A Japanese peta-scale supercomputer project
• Trends in HPC systems
• Summary
Simulations on the Earth Simulator
The Earth Simulator
• Completed in 2002
• Achieved 35.86 Tflops sustained on the LINPACK benchmark
• Chosen as one of the best inventions of 2002 by TIME magazine
Why did I do it?
• Evaluating the performance of the Earth Simulator during the final adjustment phase was important.
• Suitable codes had to be chosen
  • to evaluate the performance of the vector processors,
  • to measure the performance of all-to-all communication among compute nodes through the crossbar switch,
  • to stabilize operation of the Earth Simulator.
• Candidates
  • LINPACK benchmark?
  • Atmospheric general circulation model (AGCM)?
  • Any other code?
Why did I do it? (cont'd)
• A spectral turbulence simulation code
  • intensive computational kernel and a lot of data communication
  • simple code
  • significant to computational science: one of the grand challenges in computational science and high-performance computing
• A new spectral code for the Earth Simulator (sketched below)
  • Fourier spectral method for spatial discretization
  • mode truncation and phase-shift techniques to remove the aliasing error in computing the nonlinear terms
  • fourth-order Runge-Kutta method for time integration
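To make these ingredients concrete, here is a minimal sketch in Python/NumPy. It is not the Earth Simulator code: it solves a 1D viscous Burgers equation rather than 3D Navier-Stokes, and it applies only the 2/3-rule mode truncation (the phase-shift technique is omitted), but the structure — Fourier spectral differentiation, a dealiased nonlinear term, RK4 time stepping — is the same.

```python
import numpy as np

N, nu, dt = 256, 0.01, 1.0e-3                 # modes, viscosity, time step
x = 2.0 * np.pi * np.arange(N) / N            # periodic grid on [0, 2*pi)
k = np.fft.fftfreq(N, d=1.0 / N)              # integer wavenumbers
dealias = np.abs(k) < (2.0 / 3.0) * (N / 2)   # 2/3-rule mode truncation

def rhs(u_hat):
    """Spectral-space right-hand side of u_t = -u u_x + nu u_xx."""
    u = np.fft.ifft(u_hat).real
    nonlinear = 0.5j * k * np.fft.fft(u * u)  # u u_x written as (u^2/2)_x
    return -dealias * nonlinear - nu * k**2 * u_hat

def rk4_step(u_hat):
    """Classical fourth-order Runge-Kutta time integration."""
    k1 = rhs(u_hat)
    k2 = rhs(u_hat + 0.5 * dt * k1)
    k3 = rhs(u_hat + 0.5 * dt * k2)
    k4 = rhs(u_hat + dt * k3)
    return u_hat + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

u_hat = np.fft.fft(np.sin(x))                 # smooth initial condition
for _ in range(1000):
    u_hat = rk4_step(u_hat)
```

In 3D the same loop applies to each velocity component, with the nonlinear product formed in physical space after inverse FFTs; it is those forward/inverse 3D FFTs that dominate both the flop count and the communication.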
Points of coding
• Optimization for the Earth Simulator
  • coordinated assignment of the calculation to three levels of parallelism (vector processing, micro-tasking, and MPI parallelization)
  • higher-radix FFT (see the sketch below)
  • B/F: the ratio of the data transfer rate between CPU and memory to the arithmetic performance
  • removal of redundant processes and variables
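A rough way to see why a higher-radix FFT suits a bandwidth-limited machine: each radix-r pass streams the whole array through memory once but performs a log2(r) share of the arithmetic, so fewer, fatter passes mean fewer bytes moved per flop. The model below is a back-of-envelope sketch with assumed numbers (single-precision complex data, the textbook 5 N log2 N operation count), not an Earth Simulator measurement.

```python
import math

N = 2048**3               # hypothetical transform size (total elements)
elem_bytes = 8            # single-precision complex number

total_flops = 5.0 * N * math.log2(N)          # textbook FFT flop count
for radix in (2, 4, 8, 16):
    passes = math.log(N, radix)               # log_r(N) passes over the data
    traffic = 2.0 * N * elem_bytes * passes   # read + write on every pass
    print(f"radix {radix:2d}: {passes:5.1f} passes, "
          f"required B/F = {traffic / total_flops:.2f}")
```

Under this model, moving from radix 2 to radix 16 cuts the memory traffic per flop from 3.2 to 0.8 bytes — exactly the direction a bandwidth-limited vector machine needs.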
Calculation time for one time step
[Figure: wall-clock time per time step (log scale) vs. number of nodes, 64 to 512; measured points include 30.7 s and 3.21 s per step. A complete run took 3 days on 512 PNs.]
Performance
[Figure: sustained performance (Tflops, log scale) vs. number of PNs, 64 to 512; 16.4 Tflops achieved, about 50% of peak (single precision, analytical FLOP count).]
Achievement of box turbulence simulations
[Figure: number of grid points vs. year, 1960–2010]
• Orszag (1969), IBM 360-95: 32³
• Siggia (1981), Cray-1, NCAR: 64³
• Kerr (1985), Cray-1S, NCAR: 128³
• Jimenez et al. (1993), Caltech Delta machine: 512³
• Yamamoto (1994), Numerical Wind Tunnel: 240³
• Gotoh & Fukayama (2001), VPP5000/56, NUCC: 1024³
• K & I & Y (2002), Earth Simulator: 2048³ and 4096³
A Japanese Peta-Scale Supercomputer Project
Next-Generation Supercomputer Project
• Objectives, as one of Japan's Key Technologies of National Importance:
  • to develop the world's most advanced, highest-performance supercomputer
  • to develop and deploy its usage technologies as well as application software
• Period & budget: FY2006–FY2012, ~1 billion US$ (expected)
• RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law.
Goals of the project
• Development and installation of the most advanced high-performance supercomputer system, with a LINPACK performance of 10 petaflops.
• Development and deployment of application software in various science and engineering fields, designed to exploit the system's maximum capability.
• Establishment of an "Advanced Computational Science and Technology Center" (tentative name) as one of the Centers of Excellence for research, personnel development, and training built around the supercomputer.
Major applications for the system
[Figure: Grand Challenge applications]
Configuration of the system
• The Next-Generation Supercomputer will be a hybrid general-purpose supercomputer that provides the optimum computing environment for a wide range of simulations.
• Calculations will be performed on the processing units best suited to the particular simulation.
• Parallel processing in a hybrid configuration of scalar and vector units will make larger and more complex simulations possible.
Roadmap of the project
[Figure: project timeline, with a "We are here" marker.]
Location of the supercomputer site
• Kobe City, 450 km (280 miles) west of Tokyo
[Map: Kobe and Tokyo]
Artist's image of the building
Photos of the site (under construction), taken from the south side: June 10, 2008; July 17, 2008; Aug. 20, 2008
Trends in HPC systems
Trends in HPC systems
• Systems will have a large number of processors, around one million or more.
• Each chip will be a multi-core (8, 16, or 32 cores) or many-core (more than 64 cores) processor, with
  • low performance per core,
  • small main-memory capacity per core,
  • fine-grain parallelism.
• Each processor must consume little energy: low-power processors.
• The bandwidth between CPU and main memory will be narrow.
  • Bottleneck: the number of signal pins.
• The bisection bandwidth among compute nodes will be narrow.
  • A fully connected (one-to-one) network is very expensive and power-consuming.
Impact on spectral simulations
• High performance on the LINPACK benchmark
  • The more processors a system has, the higher its LINPACK performance.
  • LINPACK performance does not necessarily reflect real-world application performance, especially for spectral simulations.
• Small memory capacity per processor
  • forces fine-grain decomposition of the domain,
  • increasing the communication cost among parallel compute nodes.
• Narrow memory bandwidth and narrow inter-node bisection bandwidth
  • cause the memory-wall problem and low all-to-all communication performance,
  • creating the need for a low-B/F algorithm in place of the FFT (see the rough model below).
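To put a number on the all-to-all bottleneck, here is a crude model of one transpose step of a distributed 3D FFT. Only the 8192³ field size comes from the talk; the bisection bandwidth is an invented, purely illustrative figure.

```python
N = 8192                  # grid points per dimension (from the estimates below)
elem_bytes = 8            # single-precision complex
bisection_bw = 0.5e12     # assumed bisection bandwidth, bytes/s (hypothetical)

field_bytes = N**3 * elem_bytes
# In an all-to-all exchange, roughly half of all data crosses the bisection.
t_transpose = (field_bytes / 2.0) / bisection_bw
print(f"one field is {field_bytes / 1e12:.1f} TB; "
      f"one transpose takes >= {t_transpose:.1f} s at the bisection")
```

At these assumed numbers a single transpose already costs several seconds, and a time step needs several transposes per FFT and dozens of FFTs, so communication, not arithmetic, sets the pace.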
Impact on spectral simulations (cont'd)
• The trend does not fit the 3D FFT well at all; box turbulence simulations are becoming difficult to perform.
• We will be able to use more and more computational resources in the near future, ...
• ... but a finer-resolution spectral simulation will need a very long calculation time, because communication among parallel compute nodes is extremely slow, and we might not obtain the final results in a reasonable time.
Estimates for simulations beyond 4096³
• If a sustained performance of 500 Tflops can be used:
• an 8192³ simulation needs
  • 7 seconds per time step,
  • 100 TB of total memory,
  • 8 days for 100,000 steps, and 1 PB of data for a complete simulation;
• a 16384³ simulation needs
  • 1 minute per time step,
  • 800 TB of total memory,
  • 3 months for 125,000 steps, and 10 PB of data in total for a complete simulation.
(A back-of-envelope check of the per-step times follows.)
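The per-step times above can be reproduced from a simple flop count. The figure of roughly 36 3D FFTs per time step (about 9 per Runge-Kutta substage, four substages) is my assumption — the exact count depends on the formulation — but the order of magnitude is robust.

```python
import math

sustained = 500e12                     # 500 Tflops sustained, as assumed above
ffts_per_step = 36                     # assumed: ~9 FFTs x 4 RK4 substages

for N in (8192, 16384):
    flops_per_fft = 5.0 * N**3 * math.log2(N**3)   # textbook 3D FFT count
    step_seconds = ffts_per_step * flops_per_fft / sustained
    print(f"{N}^3: about {step_seconds:.0f} s per time step")
```

This yields about 8 s and 66 s per step, within rounding of the 7 s and 1 min figures above — so the arithmetic alone is tolerable; it is the inter-node communication of the previous slide that threatens feasibility.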
Summary
• Spectral methods are very useful for evaluating HPC systems.
• In this sense, the trend in HPC system architecture is getting worse:
  • even if the peak performance of a system is very high,
  • we cannot expect high sustained performance, and
  • it may take a long time to finish a simulation due to very slow data transfer between nodes.
• Can we discard spectral methods and change the algorithm? Or do we have to
  • put strong pressure on the computer-architecture community, and
  • think of an international collaboration to develop a supercomputer system that fits turbulence studies?
• I would think of such an HPC system as a particle accelerator, like CERN's.