Comparison of the Unified Model Version 5.3 on Various Platforms
Zoe Chaplin
11 September 2003, CAS2K3
Why am I talking about the UM?
• SVE consists of CSAR (HPC group), MVC (visualisation) and eScience
• CSAR runs a 512-processor Origin 3000 and an 816-processor Cray T3E, and is acquiring a 256-processor Altix Itanium
• Many academic users of the UM throughout the UK use the CSAR service (UGAMP – the UK Universities Global Atmospheric Modelling Group)
• Links between Manchester Computing and the Met Office
The Platforms
• At CSAR
  • 'Green': Origin 3000, 400 MHz MIPS R12000 processors, 1 GB memory/processor, i.e. 512 GB memory in total
  • 'Turing': Cray T3E, 816 600 MHz processors, 256 MB memory/processor, i.e. 209 GB memory in total
• At ECMWF
  • IBM P690: 2×30 P690 nodes, 32 1.3 GHz processors/node, i.e. 16 POWER4 chips/node. Each node divided into 4 LPARs; most P690 nodes have 32 GB memory, 2×3 have 128 GB. SPSwitch2 connects the LPARs.
The Model Version
• All experiments performed at version 5.3, 'the New Dynamics'
• Semi-Lagrangian dynamics
• Semi-implicit physics
• Non-hydrostatic
• Arakawa C-grid
• Charney–Phillips grid in the vertical
• Must have an even number of processors in the x (east-west) direction (see the sketch below)
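The even east-west constraint limits which 2-D decompositions are available for a given processor count. Below is a minimal sketch (Python, illustrative only, not UM code) that enumerates the admissible east-west × north-south configurations; the example totals are chosen for illustration.

```python
# Minimal sketch (not UM code): enumerate 2-D processor decompositions that
# satisfy the UM 5.3 requirement of an even processor count in the x
# (east-west) direction.

def valid_decompositions(nproc_total):
    """Return (nproc_x, nproc_y) pairs with nproc_x even and
    nproc_x * nproc_y == nproc_total."""
    pairs = []
    for nproc_x in range(2, nproc_total + 1, 2):      # even east-west counts only
        if nproc_total % nproc_x == 0:
            pairs.append((nproc_x, nproc_total // nproc_x))
    return pairs

if __name__ == "__main__":
    for total in (24, 64, 144):                       # example processor counts
        print(total, valid_decompositions(total))
```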
Global Runs
• N216, i.e. 432×325 points in the horizontal
• 38 vertical levels
• 20-minute timestep
• Simulation run for 1 day, i.e. 72 timesteps
• 5-point halos in both directions (per-task subdomain sizes sketched below)
• Up to 256 processors used (144 on the IBM)
• Fast solver used
• MPI used for communication
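As a rough illustration of what these numbers mean per MPI task, here is a small sketch (Python, illustrative only, not UM source) that estimates the local subdomain size under a simple regular east-west × north-south split, with the largest task taking the rounded-up share.

```python
# Minimal sketch (assumptions, not UM source): approximate per-task subdomain
# size for the N216 global run -- 432 x 325 points, 38 levels, 5-point halos --
# under a simple regular east-west x north-south decomposition.

import math

def local_subdomain(nx, ny, nlev, px, py, halo):
    sub_x = math.ceil(nx / px)                 # columns on the largest task
    sub_y = math.ceil(ny / py)                 # rows on the largest task
    points = (sub_x + 2 * halo) * (sub_y + 2 * halo) * nlev
    return sub_x, sub_y, points

if __name__ == "__main__":
    for px, py in [(4, 6), (4, 36), (8, 32)]:  # example decompositions
        sx, sy, pts = local_subdomain(432, 325, 38, px, py, halo=5)
        print(f"{px}x{py}: {sx}x{sy} points/task, ~{pts:,} points incl. halos")
```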
Mesoscale Runs
• The mesoscale domain covers the UK, parts of Northern Europe and Scandinavia
• 146×182 points in the horizontal
• 38 vertical levels
• 5-minute timestep
• Simulation run for 36 hours, i.e. 432 timesteps
Mesoscale Runs (continued)
• 5-point halos in both directions
• 8 points for merging LBCs (lateral boundary conditions) with the main field
• Up to 120 processors used
• Limited to a maximum of 10 processors east-west and 13 processors north-south (the sketch below lists the allowed configurations)
• MPI used for communication
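A small sketch (Python, illustrative only) listing which processor configurations these limits actually allow, assuming the even east-west constraint from the model-version slide also applies to the mesoscale runs.

```python
# Minimal sketch (illustrative only): mesoscale decompositions allowed by the
# limits on this slide -- at most 10 processors east-west, at most 13
# north-south, up to 120 in total -- assuming the even east-west constraint
# from the model-version slide also applies here.

def mesoscale_decompositions(max_ew=10, max_ns=13, max_total=120):
    configs = []
    for px in range(2, max_ew + 1, 2):         # even east-west counts only
        for py in range(1, max_ns + 1):
            if px * py <= max_total:
                configs.append((px, py))
    return configs

if __name__ == "__main__":
    for px, py in mesoscale_decompositions():
        print(f"{px} x {py} = {px * py} processors")
```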
Global Results
• The T3E required a minimum of 24 processors, so comparisons are against this value
• At lower processor counts, the Origin 3000 proves the most scalable
• Above ~156 processors, the T3E overtakes the Origin
• The IBM is the least scalable, but was only run up to 144 processors (normally using < 8 CPUs/LPAR – discussed later)
• The curve showing IBM results by LPAR shows scalability tailing off above 4 LPARs
Global Results (chart)
General Comments on the Global Results
• Up to 144 processors, it is normally better to use 4 processors in the east-west direction
• Below 32 processors on the IBM, results are more varied – sometimes 2 is better
• The Origin is between 1.33 and 1.63 times faster than the T3E
• The IBM is between 2.36 and 3.07 times faster than the T3E (derivation of such factors sketched below)
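For clarity, this is how such figures are derived: relative speed is the ratio of elapsed times at the same processor count, and scalability is measured against the 24-processor T3E baseline. The sketch below (Python) uses hypothetical placeholder times, not the measured values behind these slides.

```python
# Minimal sketch of how the comparison figures are derived.  Relative speed is
# the ratio of elapsed times at the same processor count; scalability is the
# speedup relative to the 24-processor T3E baseline.  The times below are
# HYPOTHETICAL placeholders, not the measured values behind these slides.

t3e_times    = {24: 100.0, 48: 55.0, 96: 30.0}   # hypothetical elapsed times (s)
origin_times = {24: 70.0,  48: 38.0, 96: 21.0}   # hypothetical elapsed times (s)

for nproc in sorted(t3e_times):
    relative_speed = t3e_times[nproc] / origin_times[nproc]
    t3e_speedup = t3e_times[24] / t3e_times[nproc]
    print(f"{nproc:3d} procs: Origin {relative_speed:.2f}x faster than T3E; "
          f"T3E speedup vs 24-proc baseline {t3e_speedup:.2f}")
```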
Global Results (chart)
Mesoscale Results
• The T3E required a minimum of 8 processors, so comparisons are against this value
• Up to 48 processors the IBM outperforms the other two machines (using < 8 CPUs/LPAR – discussed later)
• Above ~64 processors, the T3E has the greatest scalability
• The curve showing IBM results by LPAR shows scalability tailing off above 4 LPARs (using all 8 CPUs/LPAR)
Mesoscale Results (chart)
General Comments on the Mesoscale Results
• For the T3E, it is better to use fewer processors in the east-west direction
• For the Origin, below 36 processors use 2 or 4 processors in the east-west direction; for 36 and above use 6 or even 8
• For the IBM, below 24 processors use more processors in the north-south direction than the east-west; for 24 and above, reverse this
General Comments on the Mesoscale Results (continued)
• The Origin is between 1.30 and 1.65 times faster than the T3E
• The IBM is between 1.93 and 3.82 times faster than the T3E
• The dip in the IBM results at 64 processors is due to having to use 8 processors/LPAR rather than 6 (discussed later)
Mesoscale Results (chart)
Discussion about the IBM
• For a given processor configuration, results indicate that it is better to use < 8 tasks/LPAR
• If you are charged by the number of LPARs used, then for a fixed number of LPARs it is faster to use all the processors on each LPAR (i.e. increase the configuration)
• E.g. for a 4×9 configuration, use 6 LPARs (i.e. 6 tasks/LPAR)
• However, if you are using 6 LPARs, the run will be faster with a 4×12 configuration; but 4×12 produces faster results on 8 LPARs than on 6 (see the sketch below)
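The LPAR arithmetic behind these examples is sketched below (Python, illustrative only; each LPAR on this machine provides 8 CPUs).

```python
# Minimal sketch (illustrative only) of the LPAR arithmetic on this slide:
# each LPAR provides 8 CPUs, so the number of LPARs a run occupies depends
# on how many MPI tasks are packed onto each LPAR.

import math

def lpars_needed(nproc_x, nproc_y, tasks_per_lpar):
    return math.ceil(nproc_x * nproc_y / tasks_per_lpar)

# 4x9 = 36 tasks at 6 tasks/LPAR occupies 6 LPARs (2 CPUs idle per LPAR)
print(lpars_needed(4, 9, 6))    # -> 6
# 4x12 = 48 tasks packed 8 per LPAR also fits on 6 LPARs
print(lpars_needed(4, 12, 8))   # -> 6
# the same 4x12 at 6 tasks/LPAR spreads over 8 LPARs, which is faster still
print(lpars_needed(4, 12, 6))   # -> 8
```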
Discussion about the IBM (chart)
Conclusions
• The amount of time spent optimising the UM for the T3E is clearly reflected in the results
• Further work is needed to optimise the code adequately for the Origin and the IBM
• The best processor configuration for the IBM may depend on the charging mechanism
Conclusions (continued)
• For a given configuration, using < 8 tasks/LPAR produces faster results
• On all machines, it is generally better to use as few processors in the east-west direction as possible
Further Work
• Optimisations for the Origin and the P690
• Perform similar tests on the SGI Altix Itanium
SVE @ Manchester Computing
World Leading Supercomputing Service, Support and Research
Bringing Science and Supercomputers Together
www.man.ac.uk/sve   sve@man.ac.uk
Thanks to ECMWF