Scaling the High Order Method Modeling Environment (HOMME) on Blue Gene/L
Dr. Richard D. Loft
Scientific Computing Division
National Center for Atmospheric Research
loft@ucar.edu
Outline:
• Scalable system: Blue Gene/L
• Scalable model: HOMME
• Explicit dynamics: using SFCs and process mapping to get scalability
• Extensions to POP (Dennis)
• Limitations and future directions
Blue Gene/L: PetaFlops prototype
Blue Gene/L's petascale "DNA":
• Massive parallelism (up to 130K cores)
• Low power per core (~12 W/core)
• High component reliability
• High packaging density (2048 PEs/rack)
• Dedicated reduction network (solver scalability)
• Conventional programming model (usability): xlf90 and xlc compilers, MPI
Blue Gene/L @ NCAR “Frost”
HOMME Project Participants
Core development team (all NCAR):
• John Dennis (POP scalability)
• Jim Edwards
• Ram Nair
• Amik St-Cyr
• Steve Thomas (talking about timestepping schemes)
• Henry Tufo
Collaborators:
• Hae-Won Choi, UCB postdoc
• Jack Chen, UCB postdoc
• Vani Cheruvu, NCAR ASP postdoc
• Mike Levy, UCB graduate student
• Michael Oberg, UCB undergraduate student
• Phil Rasch, NCAR
• Mark Taylor, Sandia
• Theron Voran, UCB graduate student
Funding from NSF and DOE
HOMME Framework
• HOMME = High-Order Method Modeling Environment
• Framework for developing scalable and efficient General Atmospheric Circulation Models (GACMs) to support climate science.
• Serves as a prototype for the Community Atmospheric Model (CAM) component of the Community Climate System Model (CCSM).
• Designed for high-order methods (e.g., spectral element and discontinuous Galerkin methods) on the cubed-sphere.
• Configurable for shallow water and (dry/moist) hydrostatic primitive equations.
• Support for:
  • explicit and semi-implicit time stepping
  • several vertical discretization schemes (e.g., the Lin vertical Lagrangian method)
  • geometrically non-conforming elements and dynamically adaptive meshes (AMR)
Advantages of High-Order Methods
Algorithmic advantages:
• h-p element-based method on quadrilaterals (Ne × N)
• Exponential convergence in polynomial degree (N)
Computational advantages:
• Naturally cache-blocked N × N computations
• Nearest-neighbor communication between elements (explicit)
• Well suited to parallel microprocessor systems
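As a rough illustration of the cache-blocking point (a minimal numpy sketch, not HOMME's Fortran implementation, and with a placeholder differentiation matrix): derivatives on one element's N × N Gauss-Lobatto grid reduce to two small dense matrix products, so the working set of roughly N² values stays resident in cache and only element-edge data has to be exchanged with neighbors.

```python
import numpy as np

def element_derivatives(u, D):
    """Derivatives of a field on one N x N element via tensor-product operators.

    u : (N, N) array of nodal values on the element's Gauss-Lobatto grid
    D : (N, N) 1-D differentiation matrix (placeholder here; a real code builds its own)
    Returns (du/dxi, du/deta) in the element's reference coordinates.
    """
    du_dxi = D @ u       # differentiate along the first reference direction
    du_deta = u @ D.T    # differentiate along the second reference direction
    return du_dxi, du_deta

# With N ~ 8 the whole element (~64 points) fits comfortably in L1 cache,
# and the only inter-element dependence is the shared edge values, which is
# what keeps the explicit dynamics nearest-neighbor on the torus.
N = 8
u = np.random.rand(N, N)
D = np.random.rand(N, N)   # placeholder for a real spectral differentiation matrix
du_dxi, du_deta = element_derivatives(u, D)
```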
Geometry: Cubed-Sphere
[Figure: cubed-sphere grid with Ne = 16, shaded by degree of non-uniformity]
• The sphere is decomposed into 6 identical regions using a central projection (Sadourny, 1972) with an equiangular grid (Rancic et al., 1996).
• Avoids pole problems; quasi-uniform.
• Non-orthogonal curvilinear coordinate system with identical metric terms on each face.
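A minimal sketch of the equiangular central (gnomonic) projection for a single cube face (the face centered on the +x axis; the face choice and coordinate conventions here are illustrative assumptions, not HOMME's internals):

```python
import numpy as np

def equiangular_face_to_sphere(alpha, beta):
    """Map equiangular coordinates (alpha, beta) in [-pi/4, pi/4] on one cube
    face (centered on the +x axis) to a unit-sphere point via the central
    (gnomonic) projection.
    """
    X, Y = np.tan(alpha), np.tan(beta)    # gnomonic coordinates on the cube face
    r = np.sqrt(1.0 + X**2 + Y**2)        # distance from sphere center to the cube point
    return np.array([1.0, X, Y]) / r      # project radially onto the unit sphere

# Equiangular spacing keeps the gridpoint distribution quasi-uniform,
# with no pole singularity anywhere on the sphere.
print(equiangular_face_to_sphere(0.0, 0.0))           # face center -> (1, 0, 0)
print(equiangular_face_to_sphere(np.pi/4, np.pi/4))   # face corner -> (1, 1, 1)/sqrt(3)
```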
Computational Mesh
Elements:
• A quadrilateral "patch" of N × N gridpoints
• Gauss-Lobatto grid
• Typically N = 8
Cube:
• Ne = elements along each edge
• 6 × Ne × Ne elements total
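For orientation, the mesh sizes implied by these parameters can be tallied directly; the sketch below is my own bookkeeping, assuming a conforming mesh with shared element-edge points counted once:

```python
def cube_sphere_counts(Ne, N):
    """Element and gridpoint counts for a conforming cubed-sphere mesh.

    Ne : elements along each cube edge; N : Gauss-Lobatto points per element side.
    """
    elements = 6 * Ne * Ne
    local_points = elements * N * N      # per-element points, with shared edges double-counted
    n = Ne * (N - 1)                     # quadrature intervals along one cube edge
    unique_points = 6 * n * n + 2        # Euler's formula for the quad-tiled cube surface
    return elements, local_points, unique_points

print(cube_sphere_counts(16, 8))    # the Ne = 16 example mesh:    (1536, 98304, 75266)
print(cube_sphere_counts(128, 8))   # the 6 x 128 x 128 runs:      (98304, 6291456, 4816898)
```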
Key Points
• Only C0 continuity or flux conservation is enforced across element interfaces.
• Locally the mesh is structured, with solution, data, and geometry expressed as sums of Nth-order tensor-product Lagrange polynomials based on the Gauss or Gauss-Lobatto quadrature points.
• Globally the mesh is an unstructured array of deformed quadrilaterals (layered in 3-D).
• Exponential convergence (large N ideal for transitional flows because of minimal numerical dispersion and dissipation).
• The geometrically non-conforming formulation provides additional meshing flexibility and adaptivity.
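A small stand-alone illustration of the exponential-convergence claim (not HOMME code): interpolating a smooth function at Gauss-Lobatto-Legendre points drives the maximum error down roughly geometrically as the polynomial degree N grows.

```python
import numpy as np
from numpy.polynomial import legendre
from scipy.interpolate import BarycentricInterpolator

def gll_points(N):
    """Gauss-Lobatto-Legendre points of degree N: the endpoints plus the roots of P_N'."""
    PN = legendre.Legendre.basis(N)
    return np.concatenate(([-1.0], np.sort(PN.deriv().roots()), [1.0]))

f = lambda x: np.exp(np.sin(np.pi * x))    # a smooth (analytic) test function
x_fine = np.linspace(-1.0, 1.0, 2001)

for N in (2, 4, 6, 8, 10, 12):
    xg = gll_points(N)
    err = np.max(np.abs(BarycentricInterpolator(xg, f(xg))(x_fine) - f(x_fine)))
    print(f"N = {N:2d}   max interpolation error = {err:.2e}")
```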
Domain Decomposition
• Mapping the elements to processors is achieved using Hilbert space-filling curves (Sagan, 1994; Dennis et al., 2006).
• Generates the best partitionings when Ne = 2^n 3^m, where n and m are non-negative integers.
• (We have also examined METIS and Chaco, but have found the SFC approach superior at large processor counts.)
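A sketch of the idea, assuming Ne is a power of two and showing only one cube face (HOMME threads a single curve through all six faces, and the generalized curve also handles the factor-of-three cases): elements are sorted along a Hilbert curve and the ordered list is cut into contiguous, nearly equal chunks, so each processor receives a spatially compact set of elements.

```python
def hilbert_index(order, x, y):
    """Index of cell (x, y) along a Hilbert curve filling a 2**order x 2**order grid."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate so the sub-quadrant is canonically oriented
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_partition(Ne, nprocs):
    """Cut the Hilbert-ordered list of one face's Ne x Ne elements into
    contiguous chunks, one chunk of elements per processor."""
    order = Ne.bit_length() - 1          # valid because Ne is assumed to be 2**order here
    elems = sorted(((i, j) for i in range(Ne) for j in range(Ne)),
                   key=lambda e: hilbert_index(order, *e))
    owner = {}
    for rank in range(nprocs):
        lo, hi = rank * len(elems) // nprocs, (rank + 1) * len(elems) // nprocs
        for e in elems[lo:hi]:
            owner[e] = rank              # each rank gets a spatially compact blob of elements
    return owner

owner = sfc_partition(Ne=16, nprocs=8)   # 256 elements -> 32 curve-contiguous elements per rank
```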
Mapping SFCs to the Torus Network
• Must map the 1-D list of SFC domains to MPI processes on the 3-D torus intelligently.
• Need to maximize torus locality.
• Need to minimize wire contention.
• Basic idea: snake processes through the torus as well (a sketch of generating such a mapping follows the examples below).
Default "Lexical" Mapping (2-D Example): coprocessor mode. [Figure: ranks 0-15 assigned to the processor mesh in row-major (lexical) order, one rank per node.]
Default "Lexical" Mapping (2-D Example): virtual node mode. [Figure: ranks 0-31 with the default ordering; the two ranks on each node differ by 16 (0 with 16, 1 with 17, ...), so consecutive ranks do not share a node.]
Desirable "Grouped" Mapping (2-D Example): virtual node mode. [Figure: ranks 0-31 placed in order across the mesh, so consecutive ranks share a node or sit on adjacent nodes.]
2x2 "Snaked" Mapping (2-D Example): virtual node mode. [Figure: ranks 0-31 grouped into 2x2 blocks of consecutive ranks, with the blocks traversed in a snaking path across the mesh.]
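One way to generate a grouped, snaked mapping like the one above (a 2-D sketch with illustrative block sizes; the real BG/L torus is 3-D, and the resulting rank-to-coordinate list would be handed to the system in whatever mapping-file format its job launcher expects): consecutive MPI ranks fill a small block of processors, and the blocks themselves are visited in a boustrophedon sweep.

```python
def snaked_mapping(nx, ny, bx=2, by=2):
    """Order the (x, y) processor coordinates of an nx x ny mesh so that
    consecutive MPI ranks land in the same bx x by block, and the blocks are
    visited in a boustrophedon ("snaked") sweep across the mesh.
    Returns coords where coords[rank] = (x, y) for that rank.
    """
    coords = []
    for brow in range(ny // by):
        bcols = range(nx // bx)
        if brow % 2 == 1:
            bcols = reversed(bcols)          # snake the block rows to keep neighbors close
        for bcol in bcols:
            for dy in range(by):             # fill each block with consecutive ranks
                for dx in range(bx):
                    coords.append((bcol * bx + dx, brow * by + dy))
    return coords

# 32 ranks on a hypothetical 8 x 4 mesh, as in the 2-D example slides:
for rank, (x, y) in enumerate(snaked_mapping(8, 4)):
    print(rank, x, y)
```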
BG/L HOMME: Moist Dynamics
[Chart: sustained MFLOPS per processor for moist Held-Suarez; coprocessor mode reaches 8 TFlops aggregate, and virtual node mode improves with the snaked mapping.]
Explicit integration, Δt = 4 seconds; 6 × 128 × 128 elements, 96 vertical levels.
BG/L HOMME: Moist Dynamics with Physics
[Chart: sustained MFLOPS per processor for Aquaplanet with Emanuel physics; 11.3 TFlops aggregate sustained.]
Explicit integration, Δt = 4 seconds; 6 × 128 × 128 elements, 40 vertical levels.
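For reference, the per-processor rates on these charts and the quoted aggregate numbers are related by a single multiplication; the rate and processor count below are placeholders, not values taken from the charts.

```python
def aggregate_tflops(mflops_per_proc, nprocs):
    """Aggregate sustained TFlops from a sustained per-processor MFLOPS rate."""
    return mflops_per_proc * nprocs / 1.0e6

# Hypothetical example: ~350 MFLOPS/processor on 32,768 processors is ~11.5 TFlops.
print(aggregate_tflops(350.0, 32768))
```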
Limitations of This Work and Future Directions
• Explicit result: the integration rate at 10 km resolution is too low for useful climate work.
  • Solution: solvers and preconditioners (Thomas).
  • Some progress here, but no data on large systems yet.
• Lots of parallelism in the 3-D elements is left to exploit, particularly in physics.
  • Solution: redistribution of work between the physics and dynamics components.
Logical View of AGCM Dynamics/Physics Coupling [Figure: the DYNAMICS and PHYSICS components and the coupling between them.]
Hardware View of Dynamics/Physics Coupling on Blue Gene/L Begin with elements laid out for dynamics scheme
Hardware View of Dynamics/Physics Coupling on Blue Gene/L Begin scattering the columns
Hardware View of CRCP-HOMME Coupling on Blue Gene/L Continue scattering columns
Hardware View of CRCP Physics Layout on Blue Gene/L [Figure: physics columns spread over a simplified 2-D BG/L topology; colors denote the elements the columns came from.]
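A schematic of the column scatter these slides depict, written in Python with made-up names rather than HOMME's actual interfaces: dynamics ranks own whole elements, but the N × N physics columns inside each element are independent, so they can be dealt out across many more processors for the physics step and gathered back afterwards.

```python
def scatter_columns(element_owner, columns_per_element, nphys_ranks):
    """Deal the physics columns of every element out to nphys_ranks round-robin.

    element_owner       : dict element_id -> dynamics rank that owns the element
    columns_per_element : number of independent vertical columns per element (N*N)
    Returns a dict (element_id, column_id) -> physics rank.
    """
    placement = {}
    next_rank = 0
    for elem in sorted(element_owner):
        for col in range(columns_per_element):
            placement[(elem, col)] = next_rank
            next_rank = (next_rank + 1) % nphys_ranks
    return placement

# Hypothetical example: 4 elements owned by 2 dynamics ranks, 64 columns each,
# spread over 16 physics ranks for the physics step.
owners = {0: 0, 1: 0, 2: 1, 3: 1}
placement = scatter_columns(owners, columns_per_element=64, nphys_ranks=16)
# After physics, each column is gathered back to its owning element for the next dynamics step.
```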