
Application of Emerging Computational Architectures (GPU, MIC) to Atmospheric Modeling

This article explores the application of emerging computational architectures, such as GPUs and MICs, to atmospheric modeling. It discusses the challenges and benefits of using these architectures and provides insight into the performance and portability of different programming approaches. It also outlines ongoing research and plans for applying these architectures to atmospheric models.


Presentation Transcript


  1. Application of Emerging Computational Architectures (GPU, MIC) to Atmospheric Modeling. Tom Henderson, NOAA Global Systems Division, Thomas.B.Henderson@noaa.gov; Mark Govett, Jacques Middlecoff, Paul Madden, James Rosinski, Craig Tierney

  2. Correlation of Forecast Skill and Compute Power

  3. HPC-Enabled Scientific Goals • NIM • 2013: Run @ global 4km resolution aqua-planet • 2013: Run @ global 30km resolution with real data & topography • 2014: Run @ global 4km resolution with real data & topography • FIM • 2013: Run 60-100 ensemble members @ global 15km resolution • 2014: 100+ members @ 10km coupled to ocean-FIM • However…

  4. HPC Challenges • CPU clock rates have stalled • Emerging “accelerator” architectures crowd many (10s-100s) “cores” on a chip • Graphics Processing Units (GPU): NVIDIA • Many Integrated Core (MIC): Intel • But they require exploitation of fine-grained parallelism

  5. HPC Challenges • Atmospheric Modeling has a lot of fine-grained parallelism • … but it is memory-bandwidth bound • How do we write software that runs efficiently on GPU & MIC? • How can we leverage our existing software investments? • Do we need new algorithms/formulations? • Enter GSD’s ACS…

  6. ESRL’s Advanced Computing Section (Sandy’s Vision in 1991) • Lead HPC R&D group at NOAA for 20+ years • Vector → MPP → COTS → Fine-grained “accelerators” (GPU and MIC) • Focus on software challenges • HPC: MPI, OpenMP, OpenACC, etc. • Provide modern SE support • Emphasize performance-portability • Early adoption of new HPC technology • Benefit: competitive HPC procurements • Top500 #8 in 2002 with modest budget

  7. “We did it before, we’ll do it again” (timeline): GSD MPP (1992- ), GSD GPU (2008- ), 1st operational NCEP MPP (2000)

  8. Current “Accelerator” Research • GPU (NVIDIA) • NIM dynamical core • FIM dynamical core • Selected WRF physics packages • MIC (Intel) • FIM dynamical core • Ongoing close interaction with technical staff at NVIDIA, Intel, & compiler vendors • Technology transfer to commercial GPU compiler vendors

  9. GPU vs MIC • GPU • >512 cores, 10,000s of “thin” threads • Many threads allow overlap of memory latency with useful computation • Limited working set size • Code restructuring often required • Hardware relatively mature • MIC • Fewer cores, fewer threads • Likely easier to port code (x86) • Code restructuring requirements unclear • Hardware still beta, Intel gag order

  10. Performance-Portable Programming Approaches • GPU • Commercial directive-based compilers • CAPS HMPP 3.0.5 • Portland Group PGI Accelerator 11.10 • Cray (beta), Pathscale (beta) • Directive syntax converging to OpenACC • OpenMP long-term • MIC • OpenMP plus compiler vectorization
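  The directive-based approach above can be illustrated with a minimal Fortran sketch (hypothetical routine and variable names, not taken from NIM or FIM): the loop body is ordinary single-precision Fortran, and a single OpenACC directive asks the compiler to offload the loop nest to the GPU; when the directives are ignored, the same source compiles as serial CPU code.

      ! Minimal OpenACC sketch of directive-based GPU offload.
      ! Names here are illustrative only, not taken from NIM or FIM.
      subroutine update_field(nz, nip, fld, tend, dt)
        implicit none
        integer, intent(in)    :: nz, nip       ! vertical levels, horizontal columns
        real,    intent(inout) :: fld(nz, nip)  ! single-precision model field
        real,    intent(in)    :: tend(nz, nip) ! precomputed tendency
        real,    intent(in)    :: dt            ! time step
        integer :: k, ipn

        ! One directive requests GPU offload of the whole loop nest;
        ! host/device data movement is expressed via the copy clauses.
        !$acc parallel loop collapse(2) copy(fld) copyin(tend)
        do ipn = 1, nip
          do k = 1, nz
            fld(k, ipn) = fld(k, ipn) + dt * tend(k, ipn)
          end do
        end do
        !$acc end parallel loop
      end subroutine update_field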

  11. NIM NWP Dynamical Core • Science-SE collaboration from the start • “GPU-friendly” design (also good for CPU) • Single-precision floating-point computations • Computations structured as simple vector ops with horizontal indirect addressing and directly-addressed inner vertical loop • Coarse-grained (MPI) parallelism via SMS directives • Initial fine-grained (GPU) parallelism via locally-developed “F2C-ACC” • Followed by PGI Accelerator and CAPS HMPP
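  The loop structure described above can be sketched as follows (a hedged illustration with made-up names; the actual NIM arrays and routines differ): the outer horizontal loop reaches neighboring columns of the icosahedral grid through an indirect-address array, while the inner vertical loop is directly addressed and stride-1, which suits both CPU vectorization and GPU threading.

      ! Sketch of a "vector op with horizontal indirect addressing and a
      ! directly-addressed inner vertical loop" (illustrative names only).
      subroutine neighbor_sum(nz, nip, nprox, prox, fld, fldsum)
        implicit none
        integer, intent(in)  :: nz, nip
        integer, intent(in)  :: nprox(nip)    ! number of neighbors per column (5 or 6)
        integer, intent(in)  :: prox(6, nip)  ! indirect addresses of neighboring columns
        real,    intent(in)  :: fld(nz, nip)
        real,    intent(out) :: fldsum(nz, nip)
        integer :: ipn, isn, k

        do ipn = 1, nip                       ! horizontal loop: indirect addressing
          fldsum(:, ipn) = 0.0
          do isn = 1, nprox(ipn)              ! neighbors of this column
            do k = 1, nz                      ! vertical loop: direct, stride-1
              fldsum(k, ipn) = fldsum(k, ipn) + fld(k, prox(isn, ipn))
            end do
          end do
        end do
      end subroutine neighbor_sum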

  12. Initial NIM Performance Results on GPU • “G5-L96” test case • 10242 columns, 96 levels, 1000 time steps • Expect similar number of columns on each GPU at ~3km target resolution • Optimize for both CPU and GPU • CPU = Intel Westmere (2.66GHz) • GPU = NVIDIA C2050 “Fermi” • ~27% of peak on Westmere 2.8 GHz CPU • Quite respectable for a NWP dynamical core!

  13. Fermi GPU vs. single and multiple Westmere CPU cores, “G5-L96” (timing results table). * Total time includes I/O, PCIe transfers, etc. ** Recent result: HMPP now complete.

  14. FIM MIC & GPU Work • MIC • Added OpenMP parallelism to FIM (alongside SMS) • Working closely with Intel staff to analyze and tune kernel performance on MIC • Installed new “Knight’s Corner” boards at GSD • Gag order… • GPU • Encouraging early results

  15. Plans • 2013 • NIM aqua-planet runs on DoE Titan (GPU) • FIM dynamics GPU & MIC parallelization complete • Single source code for GPU/MIC/MPI • 2014 • Full FIM and NIM models with physics running on GPU and MIC nodes • Add FIM-ocean model on MIC/GPU • Ongoing • Continue interactions with vendors • Compiler technology transfer • We benefit from multiple successful commercial hardware/software solutions!
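  One possible shape of the "single source code for GPU/MIC/MPI" goal, sketched here with hypothetical names: the same loop carries both OpenACC and OpenMP directives, each of which is treated as a plain comment unless the corresponding compiler option is enabled, while the MPI (SMS) decomposition determines the ips:ipe range outside the kernel.

      ! Illustrative only: one loop annotated for both GPU (OpenACC) and
      ! MIC/CPU (OpenMP) builds.
      subroutine relax_field(nz, ips, ipe, fld, coef)
        implicit none
        integer, intent(in)    :: nz, ips, ipe
        real,    intent(inout) :: fld(nz, ips:ipe)
        real,    intent(in)    :: coef
        integer :: ipn, k

        ! The !$acc line is honored by an OpenACC build and ignored otherwise;
        ! the !$omp line is honored by an OpenMP build and ignored otherwise.
        !$acc parallel loop collapse(2) copy(fld)
        !$omp parallel do private(k)
        do ipn = ips, ipe
          do k = 1, nz
            fld(k, ipn) = (1.0 - coef) * fld(k, ipn)
          end do
        end do
        !$omp end parallel do
      end subroutine relax_field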
