
MITgcm


kioko


Presentation Transcript


  1. MITgcm History

  2. MITgcm Family Tree

  3. MITgcm Algorithm and applications

  4. MITgcm UV Ultra-Versatile Implementation

  5. Target Compute Environments • SMP • and also Clustered SMP • T3E ( SGI/CRAY ) • Single and multi-processor vector NEC-SX4, CRAY-C90 (SGI) • IBM, • SGI, • Sun, • Intel et al., • Digital, • HP.

  6. Goals • Useful today: good performance on current generation machines; TAMC “compatible” • With a future: practical route to a teraflop/s; practical route to a desktop gigaflop/s

  7. Challenges • Cache blocking v. long vectors. • Isolating and minimizing communication/synchronization primitives. • OS idiosyncrasies. • Varying degrees of compiler capability.

  8. Technologies • Vector processing. • Caches, deep memory hierarchy. • MPI. • HPF. • Multi-threading. • Network interfaces: SCI, Memory Channel, Giga-ring, Arctic.

  9. Cache and vector • “Voodoo numbers” [Figure: domain tiling diagram labelled nSx, nSy, sNx, sNy, sNx + OLx, sNy + OLy]

  10. Vector mode • Strips or one whole domain, e.g. a four-processor example [Figure: domain cut into strips of height sNy and full width sNx = Nx]

  11. Cache, deep memory mode • Block the domain. [Figure: domain cut into blocks of sNx by sNy points]
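
The difference between the vector and cache layouts in slides 9 to 11 can be illustrated with a small sketch. This is not MITgcm code: `tile_layout` is a hypothetical helper, the domain size and overlap width are made-up values, and it assumes each tile stores an overlap region of width OLx and OLy on every side of its sNx by sNy interior.

```python
# Hypothetical helper, not MITgcm code: list the tiles that cover an
# Nx-by-Ny domain when it is cut into tiles of sNx-by-sNy interior points.
def tile_layout(Nx, Ny, sNx, sNy, OLx, OLy):
    if Nx % sNx or Ny % sNy:
        raise ValueError("tile size must divide the domain size")
    tiles = []
    for j0 in range(0, Ny, sNy):
        for i0 in range(0, Nx, sNx):
            tiles.append({
                "i_range": (i0, i0 + sNx),  # interior columns, half-open
                "j_range": (j0, j0 + sNy),  # interior rows, half-open
                # assumed: storage carries an overlap region on every side
                "storage": (sNx + 2 * OLx, sNy + 2 * OLy),
            })
    return tiles

# Vector mode: four strips spanning the full width (sNx = Nx).
strips = tile_layout(Nx=128, Ny=64, sNx=128, sNy=16, OLx=2, OLy=2)
# Cache mode: the same domain cut into small blocks.
blocks = tile_layout(Nx=128, Ny=64, sNx=32, sNy=16, OLx=2, OLy=2)

print(len(strips), strips[0]["storage"])  # 4 tiles with long inner dimension
print(len(blocks), blocks[0]["storage"])  # 16 tiles with short inner dimension
```

Only the tile-size parameters change between the two modes; the loop over tiles is the same, which is the point of choosing the "voodoo numbers" per machine.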

  12. What about the algorithm? • We know it vectorizes • Can it be blocked? Let's hope so!

  13. MITgcm UV structure • Range 1:sNx+1,.. • Can be “block” by “block” • Fill overlaps • Don’t need any “long vector sweeps” • Range 1-OLx:sNx+OLx+1,.. • Depends on alg. and problem!

  14. Communication • Minimize comm. points • Keep at high level, not in compute primitives. • Overlap with computation (needs hardware and OS support to have an effect!). • Multi-threaded and/or multi-process (MPI).

  15. MITgcm UV communication • Send G’s • Update overlaps • Receive G’s • Depends on alg. and problem! • Send and receive ps
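
As a rough illustration of the "update overlaps" step, the sketch below fills the overlap cells of each tile from its neighbours' interior cells. It is not MITgcm code: `make_tiles` and `update_overlaps` are hypothetical names, and the example assumes a 1-D periodic domain, equally sized tiles held in shared memory, and an overlap width no larger than the tile interior.

```python
# Hypothetical sketch, not MITgcm code. Each tile is stored as
# [OL overlap cells | sN interior cells | OL overlap cells].
def make_tiles(field, sN, OL):
    return [[None] * OL + field[i:i + sN] + [None] * OL
            for i in range(0, len(field), sN)]

def update_overlaps(tiles, OL):
    n = len(tiles)
    for t, tile in enumerate(tiles):
        left = tiles[(t - 1) % n]    # periodic neighbour on the low side
        right = tiles[(t + 1) % n]   # periodic neighbour on the high side
        sN = len(tile) - 2 * OL
        # low-side overlap <- last OL interior cells of the left neighbour
        tile[:OL] = left[sN:sN + OL]
        # high-side overlap <- first OL interior cells of the right neighbour
        tile[OL + sN:] = right[OL:2 * OL]

tiles = make_tiles(list(range(8)), sN=4, OL=1)
update_overlaps(tiles, OL=1)
print(tiles)
```

Because the update only reads interior cells and only writes overlap cells, the tiles can be processed in any order. In a message-passing setting the two slice copies would be replaced by a send and a receive, while the calling code stays the same.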

  16. MPI and shared memory • Repeat domain in each process • Shared mem copies -> messaging calls, etc. [Figure: strip of width Nx and height sNy]

  17. Exploiting NI innovation • Ongoing collaborations • T3E production hardware • HP, Sun, Digital - semi-production • Intel, IBM experimental • Rapidly evolving field • MITgcm UV can exploit it

  18. Compiler and OS maturity • F77 v. F90 • F77 is universally OK • On SMP for predictable performance need batch execution, private environment. • Not always configured that way. • Virtual memory • Makes “cache speedup” hard to predict

  19. Is UV really “Ugly Version”? • Example code.

  20. Some HP Numbers.1 Forward Code • Per proc. grid size 64x32x20 • 100Mflop/s per proc. • Number of procs 16 • Total problem size 244x132x20 • Total performance 1.6GFlop/s • Time per block per time step 0.2 secs

  21. Some HP Numbers.2 Inverter • Per proc. grid size 64x32 • 200Mflop/s per proc. • Number of procs 16 • Total problem size 244x132 • Total performance 3.2GFlop/s • Time per block per time step 0.2 secs.
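
The totals quoted in slides 20 and 21 follow from the per-processor rates and the processor count. The short check below reproduces that arithmetic; `aggregate_gflops` is a hypothetical helper, and the work-per-block figure assumes the 0.2 s timing refers to a single processor's block running at the quoted rate.

```python
# Hypothetical helper: aggregate rate assuming every processor sustains
# the quoted per-processor rate (i.e. perfect scaling).
def aggregate_gflops(mflops_per_proc, n_procs):
    return mflops_per_proc * n_procs / 1000.0

forward = aggregate_gflops(100, 16)    # forward code
inverter = aggregate_gflops(200, 16)   # inverter

# Assumed reading of the timing: 0.2 s per block per step at 100 Mflop/s.
forward_mflop_per_block_step = 100 * 0.2

print(forward, inverter, forward_mflop_per_block_step)
```

Both aggregate figures therefore correspond to ideal 16-way scaling of the single-processor rate.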

  22. Outstanding Issues 1 • Base code debugging and testing. • Parameterizations: mixed layer, eddy mixing. • I/O pre and post-process. • Diagnostics. • SPP customization: communication primitives, solver. • TAMC compilation.

  23. Outstanding Issues 2 • Per platform customizations • Pipelined slices! • T3E TAMC tape • Scientific libraries for solver

  24. Conclusion • “Parallel computing has historically been a field whose promise has been characterized by hyperbole, but whose development has been defined by pragmatism” • HYPERBOLE - Combine the best of MITgcm Classic and MITgcm UV. • PRAGMATISM - We want performance today. - and - UV style implementation most likely teraflop/s model. UV style implementation most likely gigaflop/s desktop.

  25. MITgcm UV • Ultra-Versatile redesign and implementation of MITgcm algorithm • Implementation that can exploit • Wildfire shared-memory • Cache friendly, not vector dominated • Toward coupled ocean-atmosphere model.
