
Comparison of the Unified Model Version 5.3 on Various Platforms


Presentation Transcript


  1. Comparison of the Unified Model Version 5.3 on Various Platforms
  Zoe Chaplin, 11 September 2003, CAS2K3

  2. Why am I talking about the UM?

  3. Why am I talking about the UM?
  • SVE consists of CSAR (the HPC group), MVC (visualisation) and eScience
  • CSAR has a 512-processor Origin 3000 and an 816-processor Cray T3E, and is getting a 256-processor Altix Itanium
  • Many academic users of the UM based throughout the UK use the CSAR service (UGAMP – the UK Universities Global Atmospheric Modelling Group)
  • Links between Manchester Computing and the Met Office

  4. The Platforms

  5. The Platforms
  • At CSAR:
  • ‘Green’: Origin 3000, 400 MHz MIPS R12000 processors, 1 GB memory/processor, i.e. 512 GB memory in total
  • ‘Turing’: Cray T3E, 816 600 MHz processors, 256 MB memory/processor, i.e. 209 GB memory in total
  • At ECMWF:
  • IBM P690: 2x30 P690 nodes, 32 1.3 GHz processors/node, i.e. 16 POWER4 chips/node. Each node is divided into 4 LPARs; most P690 nodes have 32 GB memory, 2x3 have 128 GB. An SP Switch2 connects the LPARs.

  6. The Model Version

  7. The Model Version
  • All experiments performed at version 5.3, ‘the new dynamics’
  • Semi-Lagrangian dynamics
  • Semi-implicit physics
  • Non-hydrostatic
  • Arakawa C-grid
  • Charney-Phillips grid in the vertical
  • Must have an even number of processors in the x direction

  8. Global Runs
  • N216, i.e. 432x325 points in the horizontal
  • 38 vertical levels
  • 20-minute timestep
  • Simulation run for 1 day, i.e. 72 timesteps
  • 5-point halos in both directions (see the decomposition sketch after this slide)
  • Up to 256 processors used (144 on the IBM)
  • Fast solver used
  • MPI used for communication
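The grid and halo figures above can be turned into a rough per-task domain size. The following Python sketch is purely illustrative and is not UM code: the function name local_domain and the 4x16 example configuration are assumptions; only the grid dimensions, halo width and the even east-west constraint come from the slides.

    def local_domain(nx=432, ny=325, nz=38, px=4, py=16, halo=5):
        """Approximate local grid size (with halos) held by each MPI task
        for a px x py block decomposition of the N216 global domain."""
        if px % 2 != 0:
            raise ValueError("UM 5.3 requires an even number of processors east-west")
        # Ceiling division gives the largest subdomain any task must hold,
        # then 5-point halos are added on each side.
        lx = -(-nx // px) + 2 * halo
        ly = -(-ny // py) + 2 * halo
        return lx, ly, nz

    # Example: a 64-processor run decomposed as 4 east-west x 16 north-south.
    print(local_domain(px=4, py=16))   # -> (118, 31, 38)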

  9. Mesoscale Runs
  • The mesoscale (mes) domain covers the UK, parts of Northern Europe and Scandinavia
  • 146x182 points in the horizontal
  • 38 vertical levels
  • 5-minute timestep
  • Simulation run for 36 hours, i.e. 432 timesteps

  10. Mesoscale Runs
  • 5-point halos in both directions
  • 8 points for merging LBCs (lateral boundary conditions) with the main field
  • Up to 120 processors used
  • Limited to a maximum of 10 procs east-west and 13 procs north-south (valid configurations are sketched below)
  • MPI used for communication
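A minimal sketch of how these limits constrain the mesoscale runs. The helper valid_configs is hypothetical; only the limits themselves (even processor count east-west, at most 10 east-west, 13 north-south, 120 tasks overall) are taken from the slides.

    def valid_configs(max_ew=10, max_ns=13, max_procs=120):
        """Enumerate mesoscale processor configurations allowed by the limits above."""
        configs = []
        for ew in range(2, max_ew + 1, 2):      # even counts only in the east-west direction
            for ns in range(1, max_ns + 1):
                if ew * ns <= max_procs:
                    configs.append((ew, ns))
        return configs

    cfgs = valid_configs()
    print(len(cfgs), "configurations, largest:", max(cfgs, key=lambda c: c[0] * c[1]))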

  11. Global Results

  12. Global Results
  • The T3E required a minimum of 24 processors, so comparisons are made against this value (see the speed-up sketch below)
  • At lower processor numbers, the Origin 3000 proves to be the most scalable
  • Above ~156 processors, the T3E overtakes the Origin
  • The IBM is the least scalable, but was only run on up to 144 processors (normally using < 8 CPUs/LPAR – discussed later)
  • The curve showing the IBM results by LPAR shows scalability tailing off above 4 LPARs
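For readers unfamiliar with how such scalability figures are derived, the sketch below shows the normalisation against the 24-processor T3E baseline. The wall-clock times in the dictionary are made-up placeholders, not the measured results from the talk.

    # Placeholder timings (seconds); the real measurements are in the charts, not here.
    baseline_t3e_24 = 1000.0                 # 24-processor T3E reference run

    runs = {
        ("T3E", 96): 280.0,
        ("Origin 3000", 96): 190.0,
        ("IBM P690", 96): 95.0,
    }

    for (machine, procs), t in runs.items():
        speedup = baseline_t3e_24 / t        # speed-up relative to the 24-proc T3E baseline
        efficiency = speedup / (procs / 24)  # parallel efficiency against ideal scaling
        print(f"{machine:12s} {procs:3d} procs: speed-up {speedup:4.1f}, efficiency {efficiency:.2f}")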

  13. Global Results

  14. General Comments on the Global Results
  • Up to 144 processors, normally better to use 4 processors in the east-west direction
  • Below 32 processors for the IBM, results more varied – sometimes 2 is better
  • Origin between 1.33 and 1.63 times faster than the T3E
  • IBM between 2.36 and 3.07 times faster than the T3E

  15. Global Results

  16. Mesoscale Results

  17. Mesoscale Results
  • The T3E required a minimum of 8 processors, so comparisons are against this value
  • Up to 48 processors, the IBM outperforms the other two machines (using < 8 CPUs/LPAR – discussed later)
  • Above ~64 processors, the T3E has the greatest scalability
  • The curve showing the IBM results by LPAR shows scalability tailing off above 4 LPARs (using all 8 CPUs/LPAR)

  18. Mesoscale Results

  19. General Comments on the Mesoscale Results
  • For the T3E, better to use fewer processors in the east-west direction
  • For the Origin, below 36 processors use 2 or 4 procs in the east-west direction; for 36 procs and above, use 6 or even 8
  • For the IBM, below 24 processors use more procs in the north-south direction than the east-west; for 24 and above, reverse this

  20. General Comments on the Mesoscale Results
  • Origin between 1.30 and 1.65 times faster than the T3E
  • IBM between 1.93 and 3.82 times faster than the T3E
  • The dip in the IBM results at 64 processors is due to having to use 8 procs/LPAR rather than 6 (discussed later)

  21. Mesoscale Results

  22. Discussion About the IBM

  23. Discussion about the IBM
  • For a given processor configuration, results indicate that it is better to use < 8 tasks/LPAR
  • If you are charged by the number of LPARs used, then it is faster to use all the processors on each LPAR (i.e. increase the configuration)
  • E.g. for a configuration of 4x9, use 6 LPARs (i.e. 6 tasks/LPAR)
  • However, if you are using 6 LPARs, the run will be faster with a 4x12 configuration; but 4x12 will in turn produce faster results on 8 LPARs rather than 6 (see the arithmetic sketch below)
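The trade-off on this slide is just arithmetic on the 8 CPUs available per LPAR (32 processors per node split into 4 LPARs, as on the platforms slide). The helper tasks_per_lpar below is a hypothetical illustration, not part of the UM or its job scripts.

    import math

    CPUS_PER_LPAR = 8                      # 32 processors per P690 node / 4 LPARs

    def tasks_per_lpar(ew, ns, lpars):
        """Average MPI tasks per LPAR when an ew x ns configuration runs on `lpars` LPARs."""
        tasks = ew * ns
        assert lpars >= math.ceil(tasks / CPUS_PER_LPAR), "not enough CPUs for this configuration"
        return tasks / lpars

    print(tasks_per_lpar(4, 9, 6))    # 36 tasks on 6 LPARs -> 6.0 tasks/LPAR (< 8, so faster per LPAR)
    print(tasks_per_lpar(4, 12, 6))   # 48 tasks on 6 LPARs -> 8.0 tasks/LPAR (LPARs fully packed)
    print(tasks_per_lpar(4, 12, 8))   # 48 tasks on 8 LPARs -> 6.0 tasks/LPAR (faster again, but uses 8 LPARs)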

  24. Discussion about the IBM

  25. Conclusions and Further Work

  26. Conclusions
  • The amount of time spent on optimising the UM for use with the T3E is clearly reflected in the results
  • Further work needs to be done to optimise the code adequately for the Origin and the IBM
  • The processor configuration for the IBM may be dependent on the charging mechanism

  27. Conclusions
  • For a given configuration, using < 8 tasks/LPAR will produce faster results
  • On all machines, it is generally better to use as few processors in the east-west direction as possible

  28. Further Work
  • Optimisations for the Origin and the P690
  • Perform similar tests on the SGI Altix Itanium

  29. SVE @ Manchester Computing
  World Leading Supercomputing Service, Support and Research
  Bringing Science and Supercomputers Together
  www.man.ac.uk/sve  sve@man.ac.uk
  Thanks to ECMWF
