Operational Forecasting on the SGI Origin 3800 and Linux Clusters
Roar Skålin, Norwegian Meteorological Institute
CAS 2001, Annecy, 31.10.2001
Contributions by: Dr. D. Bjørge, Dr. O. Vignes, Dr. E. Berge and T. Bø
DNMI Atmospheric Models
• Weather Forecasting
  • HIRLAM (HIgh Resolution Limited Area Model)
  • 3D VAR, hydrostatic, semi-implicit, semi-Lagrangian
  • Parallelisation by SHMEM and MPI (a halo-exchange sketch follows after this list)
  • Resolutions: 50 km -> 20 km, 10 km, 5 km
• Air Quality Forecasting (Clean City Air):
  • HIRLAM: 10 km
  • MM5: 3 km and 1 km
  • AirQUIS: emission database, Eulerian dispersion model, sub-grid treatment of line and point sources, receptor point calculations
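The talk only names the parallelisation strategy at this level. As a rough illustration of what "parallelisation by MPI" involves for a horizontally decomposed grid-point model such as HIRLAM, the sketch below exchanges one halo column between east-west neighbours on a 2-D Cartesian process grid. The local grid size, the one-point halo width and all names are illustrative assumptions, not the HIRLAM code.

```c
/* Minimal sketch of a halo (boundary) exchange for a horizontally
 * decomposed grid-point model.  All names and sizes are illustrative. */
#include <mpi.h>
#include <stdlib.h>

#define NX 64   /* local grid points, x-direction (example value) */
#define NY 64   /* local grid points, y-direction (example value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dims[2] = {0, 0}, periods[2] = {0, 0};
    int nprocs, west, east, south, north;
    MPI_Comm cart;

    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Dims_create(nprocs, 2, dims);              /* 2-D processor grid */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
    MPI_Cart_shift(cart, 0, 1, &west, &east);      /* x-neighbours */
    MPI_Cart_shift(cart, 1, 1, &south, &north);    /* y-neighbours */

    /* Local field with a one-point halo on each side. */
    double *field = calloc((NX + 2) * (NY + 2), sizeof(double));

    /* Exchange the eastern interior column for the western halo column;
     * MPI_Sendrecv avoids deadlock and lets the library buffer freely. */
    double sendbuf[NY], recvbuf[NY];
    for (int j = 0; j < NY; j++)
        sendbuf[j] = field[(j + 1) * (NX + 2) + NX];   /* easternmost column */
    MPI_Sendrecv(sendbuf, NY, MPI_DOUBLE, east, 0,
                 recvbuf, NY, MPI_DOUBLE, west, 0,
                 cart, MPI_STATUS_IGNORE);
    for (int j = 0; j < NY; j++)
        field[(j + 1) * (NX + 2) + 0] = recvbuf[j];    /* western halo column */

    /* ... the westward, northward and southward exchanges follow
     *     the same pattern ... */

    free(field);
    MPI_Finalize();
    return 0;
}
```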
DNMI Operational Computers
• Gridur: SGI Origin 3800, 220 PE / 220 GB, MIPS 400 MHz, Trix OS, compute server
• Cluster: Scali TeraRack, 20 PE / 5 GB, Intel PIII 800 MHz, Linux OS, compute server
• Monsoon: SGI Origin 2000, 4 PE / 2 GB, Irix OS, System Monitoring and Scheduling (SMS)
• 500 km link: peak 100 Mbit/s, ftp 55 Mbit/s, scp 20 Mbit/s
• 2 m link: peak 100 Mbit/s, ftp 44 Mbit/s
DNMI Operational Schedule
• Monsoon (SMS): EC frames 01:20, observations 02:15
• Gridur: HIRLAM 50 02:30, HIRLAM 20 03:15, HIRLAM 10 03:30
• Cluster: MM5 05:00
• NT systems: AirQUIS 05:50
• Met. workstation: HIRLAM 50 02:30, HIRLAM 10 03:30, MM5 05:00, AirQUIS 06:00
Cray T3E vs. SGI Origin 3800
• HIRLAM 50 on Cray T3E:
  • Version 2.6 of HIRLAM
  • DNMI-specific data assimilation and I/O
  • 188 x 152 x 31 grid points
  • Run on 84 EV5 300 MHz processors
• HIRLAM 20 on SGI Origin 3800:
  • Version 4.2 of HIRLAM
  • 3D VAR and GRIB I/O
  • 468 x 378 x 40 grid points
  • Run on 210 MIPS R14K 400 MHz processors
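In rough numbers, the T3E configuration covers 188 x 152 x 31 ≈ 0.89 million grid points, about 10,500 per processor, while the Origin 3800 configuration covers 468 x 378 x 40 ≈ 7.1 million grid points, about 33,700 per processor: roughly eight times the total grid handled by 2.5 times as many processors.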
Cray T3E vs. SGI Origin 3800
[Timing charts: HIRLAM 50 on Cray T3E, 84 PEs, vs. HIRLAM 20 on SGI Origin 3800, 210 PEs]
O3800 Algorithmic Challenges
• Reduce the number of messages and synchronisation points
  • Use buffers in nearest-neighbour communication
  • Develop new algorithms for data transposition
  • Remove unnecessary statistics
• Parallel I/O
  • Asynchronous I/O on a dedicated set of processors (see the sketch after this list)
• Dynamic load balancing
• Single-node optimisation
  • Currently far less important than on the Cray T3E
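As a rough illustration of the "asynchronous I/O on a dedicated set of processors" item above, the sketch below splits off one MPI rank as an I/O server that receives fields from the compute ranks and writes them while the integration continues elsewhere. The single-writer layout, communicator handling, tag and buffer size are illustrative assumptions, not the operational HIRLAM implementation.

```c
/* Minimal sketch of asynchronous output through a dedicated I/O rank. */
#include <mpi.h>
#include <stdio.h>

#define NFLD 1000   /* points in one output field (example value) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int is_io = (rank == size - 1);   /* last rank acts as I/O server */

    /* Compute ranks get their own communicator for the model itself. */
    MPI_Comm compute;
    MPI_Comm_split(MPI_COMM_WORLD, is_io ? MPI_UNDEFINED : 0, rank, &compute);

    double field[NFLD];

    if (is_io) {
        /* Receive one field from every compute rank and write it out
         * while the time stepping continues on the other processors. */
        MPI_Status st;
        for (int n = 0; n < size - 1; n++) {
            MPI_Recv(field, NFLD, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                     MPI_COMM_WORLD, &st);
            printf("I/O rank: wrote field from rank %d\n", st.MPI_SOURCE);
            /* ... actual GRIB/file output would go here ... */
        }
    } else {
        for (int i = 0; i < NFLD; i++) field[i] = (double)rank;

        /* Hand the field off with a non-blocking send and go straight
         * back to computing; the wait can be deferred. */
        MPI_Request req;
        MPI_Isend(field, NFLD, MPI_DOUBLE, size - 1, 0, MPI_COMM_WORLD, &req);
        /* ... next time step overlaps with the output ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        MPI_Comm_free(&compute);
    }

    MPI_Finalize();
    return 0;
}
```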
O3800 System Challenges
• Interference from other users
  • CPU: must suspend all other jobs, even if we run on a subset of the system
  • Memory: global swapping under TRIX/IRIX
  • Interactive processes: cannot be suspended
• Security
  • Scp is substantially slower than ftp
  • TRIX is not a problem
• Communication at the system level
  • Memory: use local memory if possible
  • I/O: CXFS, NFS, directly mounted disks
Clean City Air
• A collaborative effort of:
  • The Norwegian Public Road Administration
  • The Municipality of Oslo
  • The Norwegian Meteorological Institute
  • Norwegian Institute for Air Research
Main Aims
• Reduce the undesired effects of wintertime air pollution in Norwegian cities
  • Components: NO2, PM10 (PM2.5)
• Develop a standardised, science-based forecast system for air pollution in Norwegian cities
• Develop a basis for decision makers who want to control emissions on winter days with high concentration levels
Scali TeraRack
• 10 dual nodes:
  • Two 800 MHz Pentium III processors
  • 512 MB RAM
  • 30 GB IDE disk
  • Dolphin interconnect
• Software:
  • RedHat Linux 6.2
  • Scali MPI implementation
  • PGI compilers
  • OpenPBS queuing system
MM5 on the TeraRack
Target: complete a 3 km and a 1 km run for the Oslo area within 90 minutes
MM5 on the TeraRack
• Modifications to MM5:
  • No changes to the source code, only to the configuration files
  • Inlined eight routines: DCPL3D, BDYTEN, SOLVE, EQUATE, DM_BCAST, EXCHANJ, ADDRX1C, SINTY
• Struggled with one bug in the PGI runtime environment and a few Scali bugs
Conclusions
• Shared memory (SM) vs. distributed memory (DM):
  • Performance of communication algorithms may differ significantly
  • DM systems are best for a single user (peak); SM is better for multi-user systems (throughput)
  • SM is easy to use for "new" users of parallel systems; DM is easier for "experienced" users
• Linux clusters:
  • So inexpensive that you can't afford to optimise code
  • So inexpensive that you can afford to buy a backup system
  • Main limitations: interconnect and I/O