This presentation explores the application of emerging computational architectures, such as GPUs and MICs, to atmospheric modeling. It discusses the challenges and benefits of these architectures, compares the performance and portability of different programming approaches, and outlines ongoing research and plans for adopting them in NOAA's atmospheric models.
Application of Emerging Computational Architectures (GPU, MIC) to Atmospheric Modeling
Tom Henderson, NOAA Global Systems Division
Thomas.B.Henderson@noaa.gov
Mark Govett, Jacques Middlecoff, Paul Madden, James Rosinski, Craig Tierney
HPC-Enabled Scientific Goals
• NIM
  • 2013: Run @ global 4km resolution, aqua-planet
  • 2013: Run @ global 30km resolution with real data & topography
  • 2014: Run @ global 4km resolution with real data & topography
• FIM
  • 2013: Run 60-100 ensemble members @ global 15km resolution
  • 2014: Run 100+ members @ 10km resolution, coupled to ocean-FIM
• However…
HPC Challenges
• CPU clock rates have stalled
• Emerging “accelerator” architectures crowd many (10s-100s) “cores” onto a chip
  • Graphics Processing Units (GPU): NVIDIA
  • Many Integrated Core (MIC): Intel
• But they require exploitation of fine-grained parallelism
HPC Challenges
• Atmospheric modeling has a lot of fine-grained parallelism
  • … but it is memory-bandwidth bound
• How do we write software that runs efficiently on GPU & MIC?
• How can we leverage our existing software investments?
• Do we need new algorithms/formulations?
• Enter GSD’s ACS…
ESRL’s Advanced Computing Section (Sandy’s Vision in 1991)
• Lead HPC R&D group at NOAA for 20+ years
  • Vector → MPP → COTS → fine-grained “accelerators” (GPU and MIC)
• Focus on software challenges
  • HPC: MPI, OpenMP, OpenACC, etc.
  • Provide modern SE support
  • Emphasize performance portability
• Early adoption of new HPC technology
  • Benefit: competitive HPC procurements
  • Top500 #8 in 2002 with a modest budget
“We did it before, we’ll do it again”
• GSD MPP (1992- )
• GSD GPU (2008- )
• 1st operational NCEP MPP (2000)
Current “Accelerator” Research
• GPU (NVIDIA)
  • NIM dynamical core
  • FIM dynamical core
  • Selected WRF physics packages
• MIC (Intel)
  • FIM dynamical core
• Ongoing close interaction with technical staff at NVIDIA, Intel, & compiler vendors
  • Technology transfer to commercial GPU compiler vendors
GPU vs. MIC
• GPU
  • >512 cores, 10,000s of “thin” threads
  • Many threads allow overlap of memory latency with useful computation
  • Limited working-set size
  • Code restructuring often required
  • Hardware relatively mature
• MIC
  • Fewer cores, fewer threads
  • Likely easier to port code (x86)
  • Code restructuring requirements unclear
  • Hardware still beta, Intel gag order
Performance-Portable Programming Approaches
• GPU
  • Commercial directive-based compilers
    • CAPS HMPP 3.0.5
    • Portland Group PGI Accelerator 11.10
    • Cray (beta), PathScale (beta)
  • Directive syntax converging on OpenACC
  • OpenMP in the long term
• MIC
  • OpenMP plus compiler vectorization
  • (A single-source directive sketch follows below.)
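To make the single-source, directive-based approach concrete, here is a minimal Fortran sketch (illustrative only, not NIM or FIM source): the same loop nest carries an OpenACC directive for GPU builds and an OpenMP directive for CPU/MIC builds, and whichever directive family a given compiler does not recognize is treated as a comment. The routine and variable names (update_tracer, q, tend, nz, nip) are assumptions for illustration.

! Minimal single-source sketch: OpenACC annotates the loop nest for GPU,
! OpenMP annotates it for CPU/MIC; each compiler ignores the other's
! directives as ordinary comments.
subroutine update_tracer(nz, nip, dt, tend, q)
  implicit none
  integer, intent(in)    :: nz, nip         ! vertical levels, horizontal points
  real,    intent(in)    :: dt
  real,    intent(in)    :: tend(nz, nip)   ! precomputed tendency
  real,    intent(inout) :: q(nz, nip)      ! single-precision tracer field
  integer :: ipn, k

!$acc parallel loop collapse(2) copyin(tend) copy(q)
!$omp parallel do private(k)
  do ipn = 1, nip            ! horizontal loop: one GPU gang/worker or CPU thread
    do k = 1, nz             ! vertical loop: stride-1, vectorized or SIMT-mapped
      q(k, ipn) = q(k, ipn) + dt * tend(k, ipn)
    end do
  end do
end subroutine update_tracer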
NIM NWP Dynamical Core
• Science-SE collaboration from the start
• “GPU-friendly” design (also good for CPU)
  • Single-precision floating-point computations
  • Computations structured as simple vector ops with horizontal indirect addressing and a directly addressed inner vertical loop (see the loop sketch below)
• Coarse-grained (MPI) parallelism via SMS directives
• Initial fine-grained (GPU) parallelism via locally developed “F2C-ACC”
  • Followed by PGI Accelerator and CAPS HMPP
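The loop structure described above (indirect addressing in the horizontal, direct stride-1 addressing in the inner vertical loop) can be pictured with the following hedged Fortran sketch. It is not actual NIM code: the names (prox, nprox, edge_wgt, grad) and the gradient-like edge accumulation are illustrative assumptions about a generic icosahedral-grid stencil.

! Illustrative sketch (not NIM source) of the described loop structure:
! horizontal neighbors are reached through an indirect-address table, while
! the inner vertical loop is direct and contiguous in memory, which suits
! both CPU vectorization and per-thread GPU work.
subroutine horiz_edge_sum(nz, nip, nprox, prox, edge_wgt, f, grad)
  implicit none
  integer, intent(in)  :: nz, nip            ! levels, horizontal points
  integer, intent(in)  :: nprox(nip)         ! number of neighbors (5 or 6)
  integer, intent(in)  :: prox(6, nip)       ! indirect neighbor indices
  real,    intent(in)  :: edge_wgt(6, nip)   ! precomputed per-edge weights
  real,    intent(in)  :: f(nz, nip)         ! scalar field
  real,    intent(out) :: grad(nz, nip)      ! accumulated edge contributions
  integer :: ipn, isn, k

  do ipn = 1, nip                    ! horizontal loop over icosahedral cells
    grad(:, ipn) = 0.0
    do isn = 1, nprox(ipn)           ! loop over this cell's edges/neighbors
      do k = 1, nz                   ! vertical loop: direct, stride-1 access
        grad(k, ipn) = grad(k, ipn) + edge_wgt(isn, ipn) * f(k, prox(isn, ipn))
      end do
    end do
  end do
end subroutine horiz_edge_sum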
Initial NIM Performance Results on GPU
• “G5-L96” test case
  • 10242 columns, 96 levels, 1000 time steps
  • Expect a similar number of columns on each GPU at the ~3km target resolution
• Optimized for both CPU and GPU
  • CPU = Intel Westmere (2.66GHz)
  • GPU = NVIDIA C2050 “Fermi”
• ~27% of peak on a 2.8 GHz Westmere CPU
  • Quite respectable for an NWP dynamical core!
[Chart: Fermi GPU vs. single/multiple Westmere CPU cores, “G5-L96” test case]
  * Total time includes I/O, PCIe, etc.
  ** Recent result: HMPP now complete
FIM MIC & GPU Work
• MIC
  • Added OpenMP parallelism to FIM (alongside SMS); see the sketch below
  • Working closely with Intel staff to analyze and tune kernel performance on MIC
  • Installed new “Knights Corner” boards at GSD
  • Gag order…
• GPU
  • Encouraging early results
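As a rough picture of layering OpenMP over the existing SMS (MPI) decomposition in a code like FIM, here is a hedged sketch. The bounds ips/ipe stand in for the locally owned horizontal index range that the SMS layer would supply, and the routine name and the physics-like update are invented for illustration, not taken from FIM.

! Hedged sketch: SMS/MPI owns the horizontal decomposition (local bounds
! ips:ipe assumed to come from that layer); OpenMP threads the independent
! columns owned by this task.
subroutine column_driver(nz, ips, ipe, dt, t, q)
  implicit none
  integer, intent(in)    :: nz, ips, ipe     ! levels; local horizontal bounds
  real,    intent(in)    :: dt
  real,    intent(inout) :: t(nz, ips:ipe)   ! temperature
  real,    intent(inout) :: q(nz, ips:ipe)   ! moisture
  integer :: ipn, k

!$omp parallel do private(k) schedule(static)
  do ipn = ips, ipe                ! columns are independent: thread over them
    do k = 1, nz
      ! trivial stand-in for a per-column physics tendency
      q(k, ipn) = max(0.0, q(k, ipn) + dt * 1.0e-6 * (t(k, ipn) - 273.15))
    end do
  end do
end subroutine column_driver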
Plans
• 2013
  • NIM aqua-planet runs on DoE Titan (GPU)
  • FIM dynamics GPU & MIC parallelization complete
  • Single source code for GPU/MIC/MPI
• 2014
  • Full FIM and NIM models with physics running on GPU and MIC nodes
  • Add FIM-ocean model on MIC/GPU
• Ongoing
  • Continue interactions with vendors
  • Compiler technology transfer
  • We benefit from multiple successful commercial hardware/software solutions!