Operational Forecasting on the SGI Origin 3800 and Linux Clusters
Roar Skålin, Norwegian Meteorological Institute
CAS 2001, Annecy, 31.10.2001
Contributions by: Dr. D. Bjørge, Dr. O. Vignes, Dr. E. Berge and T. Bø
DNMI Atmospheric Models
• Weather Forecasting
  • HIRLAM (HIgh Resolution Limited Area Model)
  • 3D VAR, hydrostatic, semi-implicit, semi-Lagrangian
  • Parallelisation by SHMEM and MPI (a halo-exchange sketch follows after this list)
  • Resolutions: 50 km -> 20 km, 10 km, 5 km
• Air Quality Forecasting (Clean City Air):
  • HIRLAM: 10 km
  • MM5: 3 km and 1 km
  • AirQUIS: emission database, Eulerian dispersion model, sub-grid treatment of line and point sources, receptor point calculations
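The talk only names the parallelisation strategy at this level. As a rough illustration of what "parallelisation by MPI" involves for a horizontally decomposed grid-point model such as HIRLAM, the sketch below exchanges one halo column between east-west neighbours on a 2-D Cartesian process grid. The local grid size, the one-point halo width and all names are illustrative assumptions, not the HIRLAM code.

```c
/* Minimal sketch of a halo (boundary) exchange for a horizontally
 * decomposed grid-point model.  All names and sizes are illustrative. */
#include <mpi.h>
#include <stdlib.h>

#define NX 64   /* local grid points, x-direction (example value) */
#define NY 64   /* local grid points, y-direction (example value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dims[2] = {0, 0}, periods[2] = {0, 0};
    int nprocs, west, east, south, north;
    MPI_Comm cart;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);              /* 2-D processor grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
    MPI_Cart_shift(cart, 0, 1, &west, &east);      /* x-neighbours */
    MPI_Cart_shift(cart, 1, 1, &south, &north);    /* y-neighbours */

    /* Local field with a one-point halo on each side. */
    double *field = calloc((NX + 2) * (NY + 2), sizeof(double));

    /* Exchange the eastern interior column for the western halo column;
     * MPI_Sendrecv avoids deadlock and lets the library buffer freely. */
    double sendbuf[NY], recvbuf[NY];
    for (int j = 0; j < NY; j++)
        sendbuf[j] = field[(j + 1) * (NX + 2) + NX];   /* easternmost column */
    MPI_Sendrecv(sendbuf, NY, MPI_DOUBLE, east, 0,
                 recvbuf, NY, MPI_DOUBLE, west, 0,
                 cart, MPI_STATUS_IGNORE);
    for (int j = 0; j < NY; j++)
        field[(j + 1) * (NX + 2) + 0] = recvbuf[j];    /* western halo column */

    /* ... the westward, northward and southward exchanges follow
     *     the same pattern ... */

    free(field);
    MPI_Finalize();
    return 0;
}
```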
DNMI Operational Computers
• Gridur: SGI Origin 3800, 220 PE / 220 GB, MIPS 400 MHz, Trix OS, compute server
• Cluster: Scali TeraRack, 20 PE / 5 GB, Intel PIII 800 MHz, Linux OS, compute server
• Monsoon: SGI Origin 2000, 4 PE / 2 GB, Irix OS, System Monitoring and Scheduling (SMS)
• 500 km link: peak 100 Mbit/s, ftp 55 Mbit/s, scp 20 Mbit/s
• 2 m link: peak 100 Mbit/s, ftp 44 Mbit/s
DNMI Operational Schedule
• Monsoon (SMS): EC frames 01:20, observations 02:15
• Gridur: HIRLAM 50 02:30, HIRLAM 20 03:15, HIRLAM 10 03:30
• Cluster: MM5 05:00
• NT systems: AirQUIS 05:50
• Met. workstation: HIRLAM 50 02:30, HIRLAM 10 03:30, MM5 05:00, AirQUIS 06:00
Cray T3E vs. SGI Origin 3800
• HIRLAM 50 on Cray T3E:
  • Version 2.6 of HIRLAM
  • DNMI-specific data assimilation and I/O
  • 188 x 152 x 31 grid points
  • Run on 84 EV5 300 MHz processors
• HIRLAM 20 on SGI Origin 3800:
  • Version 4.2 of HIRLAM
  • 3D VAR and GRIB I/O
  • 468 x 378 x 40 grid points
  • Run on 210 MIPS R14K 400 MHz processors
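In rough numbers, the T3E configuration covers 188 x 152 x 31 ≈ 0.89 million grid points, about 10,500 per processor, while the Origin 3800 configuration covers 468 x 378 x 40 ≈ 7.1 million grid points, about 33,700 per processor: roughly eight times the total grid handled by 2.5 times as many processors.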
Cray T3E vs. SGI Origin 3800
[Timing charts: HIRLAM 50 on Cray T3E, 84 PEs, vs. HIRLAM 20 on SGI Origin 3800, 210 PEs]
O3800 Algorithmic Challenges
• Reduce the number of messages and synchronisation points
  • Use buffers in nearest-neighbour communication
  • Develop new algorithms for data transposition
  • Remove unnecessary statistics
• Parallel I/O
  • Asynchronous I/O on a dedicated set of processors (see the sketch after this list)
• Dynamic load balancing
• Single-node optimisation
  • Currently far less important than on the Cray T3E
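As a rough illustration of the "asynchronous I/O on a dedicated set of processors" item above, the sketch below splits off one MPI rank as an I/O server that receives fields from the compute ranks and writes them while the integration continues elsewhere. The single-writer layout, communicator handling, tag and buffer size are illustrative assumptions, not the operational HIRLAM implementation.

```c
/* Minimal sketch of asynchronous output through a dedicated I/O rank. */
#include <mpi.h>
#include <stdio.h>

#define NFLD 1000   /* points in one output field (example value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int is_io = (rank == size - 1);   /* last rank acts as I/O server */

    /* Compute ranks get their own communicator for the model itself. */
    MPI_Comm compute;
    MPI_Comm_split(MPI_COMM_WORLD, is_io ? MPI_UNDEFINED : 0, rank, &compute);

    double field[NFLD];

    if (is_io) {
        /* Receive one field from every compute rank and write it out
         * while the time stepping continues on the other processors. */
        MPI_Status st;
        for (int n = 0; n < size - 1; n++) {
            MPI_Recv(field, NFLD, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            printf("I/O rank: wrote field from rank %d\n", st.MPI_SOURCE);
            /* ... actual GRIB/file output would go here ... */
        }
    } else {
        for (int i = 0; i < NFLD; i++) field[i] = (double)rank;

        /* Hand the field off with a non-blocking send and go straight
         * back to computing; the wait can be deferred. */
        MPI_Request req;
        MPI_Isend(field, NFLD, MPI_DOUBLE, size - 1, 0, MPI_COMM_WORLD, &req);
        /* ... next time step overlaps with the output ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Comm_free(&compute);
    }

    MPI_Finalize();
    return 0;
}
```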
O3800 System Challenges
• Interference from other users
  • CPU: must suspend all other jobs, even if we run on a subset of the system
  • Memory: global swapping under TRIX/IRIX
  • Interactive processes: cannot be suspended
• Security
  • Scp is substantially slower than ftp
  • TRIX is not a problem
• Communication at the system level
  • Memory: use local memory if possible
  • I/O: CXFS, NFS, directly mounted disks
Clean City Air
• A collaborative effort of:
  • The Norwegian Public Road Administration
  • The Municipality of Oslo
  • The Norwegian Meteorological Institute
  • Norwegian Institute for Air Research
Main Aims
• Reduce the undesired effects of wintertime air pollution in Norwegian cities
  • Components: NO2, PM10 (PM2.5)
• Develop a standardised, science-based forecast system for air pollution in Norwegian cities
• Develop a basis for decision makers who want to control emissions on winter days with high concentration levels
Scali TeraRack
• 10 dual nodes:
  • Two 800 MHz Pentium III processors
  • 512 MB RAM
  • 30 GB IDE disk
  • Dolphin interconnect
• Software:
  • RedHat Linux 6.2
  • Scali MPI implementation
  • PGI compilers
  • OpenPBS queuing system
MM5 on the TeraRack
Target: complete a 3 km and a 1 km run for the Oslo area within 90 minutes
MM5 on the TeraRack
• Modifications to MM5:
  • No changes to the source code, only to the configuration files
  • Inlined eight routines: DCPL3D, BDYTEN, SOLVE, EQUATE, DM_BCAST, EXCHANJ, ADDRX1C, SINTY
• Struggled with one bug in the PGI runtime environment and a few Scali bugs
Conclusions
• Shared memory (SM) vs. distributed memory (DM):
  • Performance of communication algorithms may differ significantly
  • DM systems are best for a single user (peak); SM is better for multi-user systems (throughput)
  • SM is easy to use for "new" users of parallel systems; DM is easier for "experienced" users
• Linux clusters:
  • So inexpensive that you can't afford to optimise code
  • So inexpensive that you can afford to buy a backup system
  • Main limitations: interconnect and I/O