This presentation explores the application of emerging computational architectures, such as GPUs and MICs, to atmospheric modeling. It discusses the challenges and benefits of these architectures, compares the performance and portability of different programming approaches, and outlines ongoing research and plans for adopting them in NOAA's atmospheric models.
Application of Emerging Computational Architectures (GPU, MIC) to Atmospheric Modeling
Tom Henderson, NOAA Global Systems Division
Thomas.B.Henderson@noaa.gov
Mark Govett, Jacques Middlecoff, Paul Madden, James Rosinski, Craig Tierney
HPC-Enabled Scientific Goals
• NIM
  • 2013: Run @ global 4km resolution, aqua-planet
  • 2013: Run @ global 30km resolution with real data & topography
  • 2014: Run @ global 4km resolution with real data & topography
• FIM
  • 2013: Run 60-100 ensemble members @ global 15km resolution
  • 2014: Run 100+ members @ 10km resolution, coupled to ocean-FIM
• However…
HPC Challenges
• CPU clock rates have stalled
• Emerging “accelerator” architectures crowd many (10s-100s) “cores” onto a chip
  • Graphics Processing Units (GPU): NVIDIA
  • Many Integrated Core (MIC): Intel
• But they require exploitation of fine-grained parallelism
HPC Challenges
• Atmospheric modeling has a lot of fine-grained parallelism
  • … but it is memory-bandwidth bound
• How do we write software that runs efficiently on GPU & MIC?
• How can we leverage our existing software investments?
• Do we need new algorithms/formulations?
• Enter GSD’s ACS…
ESRL’s Advanced Computing Section (Sandy’s Vision in 1991)
• Lead HPC R&D group at NOAA for 20+ years
  • Vector → MPP → COTS → fine-grained “accelerators” (GPU and MIC)
• Focus on software challenges
  • HPC: MPI, OpenMP, OpenACC, etc.
  • Provide modern SE support
  • Emphasize performance portability
• Early adoption of new HPC technology
  • Benefit: competitive HPC procurements
  • Top500 #8 in 2002 with a modest budget
“We did it before, we’ll do it again”
• GSD MPP (1992- )
• GSD GPU (2008- )
• 1st operational NCEP MPP (2000)
Current “Accelerator” Research
• GPU (NVIDIA)
  • NIM dynamical core
  • FIM dynamical core
  • Selected WRF physics packages
• MIC (Intel)
  • FIM dynamical core
• Ongoing close interaction with technical staff at NVIDIA, Intel, & compiler vendors
  • Technology transfer to commercial GPU compiler vendors
GPU vs. MIC
• GPU
  • >512 cores, 10,000s of “thin” threads
  • Many threads allow overlap of memory latency with useful computation
  • Limited working-set size
  • Code restructuring often required
  • Hardware relatively mature
• MIC
  • Fewer cores, fewer threads
  • Likely easier to port code (x86)
  • Code restructuring requirements unclear
  • Hardware still beta, Intel gag order
Performance-Portable Programming Approaches
• GPU
  • Commercial directive-based compilers
    • CAPS HMPP 3.0.5
    • Portland Group PGI Accelerator 11.10
    • Cray (beta), PathScale (beta)
  • Directive syntax converging on OpenACC
  • OpenMP in the long term
• MIC
  • OpenMP plus compiler vectorization
  • (A single-source directive sketch follows below.)
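To make the single-source, directive-based approach concrete, here is a minimal Fortran sketch (illustrative only, not NIM or FIM source): the same loop nest carries an OpenACC directive for GPU builds and an OpenMP directive for CPU/MIC builds, and whichever directive family a given compiler does not recognize is treated as a comment. The routine and variable names (update_tracer, q, tend, nz, nip) are assumptions for illustration.

! Minimal single-source sketch: OpenACC annotates the loop nest for GPU,
! OpenMP annotates it for CPU/MIC; each compiler ignores the other's
! directives as ordinary comments.
subroutine update_tracer(nz, nip, dt, tend, q)
  implicit none
  integer, intent(in)    :: nz, nip         ! vertical levels, horizontal points
  real,    intent(in)    :: dt
  real,    intent(in)    :: tend(nz, nip)   ! precomputed tendency
  real,    intent(inout) :: q(nz, nip)      ! single-precision tracer field
  integer :: ipn, k

!$acc parallel loop collapse(2) copyin(tend) copy(q)
!$omp parallel do private(k)
  do ipn = 1, nip            ! horizontal loop: one GPU gang/worker or CPU thread
    do k = 1, nz             ! vertical loop: stride-1, vectorized or SIMT-mapped
      q(k, ipn) = q(k, ipn) + dt * tend(k, ipn)
    end do
  end do
end subroutine update_tracer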
NIM NWP Dynamical Core
• Science-SE collaboration from the start
• “GPU-friendly” design (also good for CPU)
  • Single-precision floating-point computations
  • Computations structured as simple vector ops with horizontal indirect addressing and a directly addressed inner vertical loop (see the loop sketch below)
• Coarse-grained (MPI) parallelism via SMS directives
• Initial fine-grained (GPU) parallelism via locally developed “F2C-ACC”
  • Followed by PGI Accelerator and CAPS HMPP
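The loop structure described above (indirect addressing in the horizontal, direct stride-1 addressing in the inner vertical loop) can be pictured with the following hedged Fortran sketch. It is not actual NIM code: the names (prox, nprox, edge_wgt, grad) and the gradient-like edge accumulation are illustrative assumptions about a generic icosahedral-grid stencil.

! Illustrative sketch (not NIM source) of the described loop structure:
! horizontal neighbors are reached through an indirect-address table, while
! the inner vertical loop is direct and contiguous in memory, which suits
! both CPU vectorization and per-thread GPU work.
subroutine horiz_edge_sum(nz, nip, nprox, prox, edge_wgt, f, grad)
  implicit none
  integer, intent(in)  :: nz, nip            ! levels, horizontal points
  integer, intent(in)  :: nprox(nip)         ! number of neighbors (5 or 6)
  integer, intent(in)  :: prox(6, nip)       ! indirect neighbor indices
  real,    intent(in)  :: edge_wgt(6, nip)   ! precomputed per-edge weights
  real,    intent(in)  :: f(nz, nip)         ! scalar field
  real,    intent(out) :: grad(nz, nip)      ! accumulated edge contributions
  integer :: ipn, isn, k

  do ipn = 1, nip                    ! horizontal loop over icosahedral cells
    grad(:, ipn) = 0.0
    do isn = 1, nprox(ipn)           ! loop over this cell's edges/neighbors
      do k = 1, nz                   ! vertical loop: direct, stride-1 access
        grad(k, ipn) = grad(k, ipn) + edge_wgt(isn, ipn) * f(k, prox(isn, ipn))
      end do
    end do
  end do
end subroutine horiz_edge_sum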
Initial NIM Performance Results on GPU
• “G5-L96” test case
  • 10242 columns, 96 levels, 1000 time steps
  • Expect a similar number of columns on each GPU at the ~3km target resolution
• Optimized for both CPU and GPU
  • CPU = Intel Westmere (2.66GHz)
  • GPU = NVIDIA C2050 “Fermi”
• ~27% of peak on a 2.8 GHz Westmere CPU
  • Quite respectable for an NWP dynamical core!
[Chart: Fermi GPU vs. single/multiple Westmere CPU cores, “G5-L96” test case]
  * Total time includes I/O, PCIe, etc.
  ** Recent result: HMPP now complete
FIM MIC & GPU Work
• MIC
  • Added OpenMP parallelism to FIM (alongside SMS); see the sketch below
  • Working closely with Intel staff to analyze and tune kernel performance on MIC
  • Installed new “Knights Corner” boards at GSD
  • Gag order…
• GPU
  • Encouraging early results
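As a rough picture of layering OpenMP over the existing SMS (MPI) decomposition in a code like FIM, here is a hedged sketch. The bounds ips/ipe stand in for the locally owned horizontal index range that the SMS layer would supply, and the routine name and the physics-like update are invented for illustration, not taken from FIM.

! Hedged sketch: SMS/MPI owns the horizontal decomposition (local bounds
! ips:ipe assumed to come from that layer); OpenMP threads the independent
! columns owned by this task.
subroutine column_driver(nz, ips, ipe, dt, t, q)
  implicit none
  integer, intent(in)    :: nz, ips, ipe     ! levels; local horizontal bounds
  real,    intent(in)    :: dt
  real,    intent(inout) :: t(nz, ips:ipe)   ! temperature
  real,    intent(inout) :: q(nz, ips:ipe)   ! moisture
  integer :: ipn, k

!$omp parallel do private(k) schedule(static)
  do ipn = ips, ipe                ! columns are independent: thread over them
    do k = 1, nz
      ! trivial stand-in for a per-column physics tendency
      q(k, ipn) = max(0.0, q(k, ipn) + dt * 1.0e-6 * (t(k, ipn) - 273.15))
    end do
  end do
end subroutine column_driver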
Plans
• 2013
  • NIM aqua-planet runs on DoE Titan (GPU)
  • FIM dynamics GPU & MIC parallelization complete
  • Single source code for GPU/MIC/MPI
• 2014
  • Full FIM and NIM models with physics running on GPU and MIC nodes
  • Add FIM-ocean model on MIC/GPU
• Ongoing
  • Continue interactions with vendors
  • Compiler technology transfer
  • We benefit from multiple successful commercial hardware/software solutions!