Petascale
• LLNL Appro AMD: 9K processors [today]
• TJ Watson Blue Gene/L: 40K processors [today]
• NY Blue Gene/L: 32K processors
• ORNL Cray XT3/4: 44K processors [Jan 2008]
• TACC Sun: 55K processors [Jan 2008]
• ANL Blue Gene/P: 160K processors [Jan 2008]
CCSM and Component Models
• POP (Ocean)
• CICE (Sea Ice)
• CLM (Land Model)
• CPL (Coupler)
• CAM (Atmosphere)
• CCSM (the fully coupled system)
Status of POP (John Dennis)
• 17K Cray XT4 processors [12.5 simulated years/day]
• 29K IBM Blue Gene/L processors [8.5 simulated years/day] (BG-ready in Expedition Mode)
• Parallel I/O [underway]
• Land causes load imbalance at 0.1-degree resolution (see the decomposition sketch below)
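The imbalance arises because a uniform block decomposition hands land-only blocks to some ranks, which then have no ocean work to do. Below is a minimal sketch of the usual mitigation, dropping land-only blocks before assigning work; the block counts, the `is_ocean()` mask function, and the round-robin assignment are hypothetical (POP's actual scheme, due to Dennis, orders ocean blocks with space-filling curves).

```c
/* Sketch: drop land-only blocks before assigning work to MPI ranks.
 * Hypothetical setup: NX_BLOCKS x NY_BLOCKS grid of blocks, and
 * is_ocean(i, j) == 1 if block (i, j) contains any ocean points. */
#include <stdio.h>
#include <stdlib.h>

#define NX_BLOCKS 40
#define NY_BLOCKS 30

/* Placeholder land/ocean mask query for a block. */
static int is_ocean(int i, int j) { return (i + j) % 5 != 0; }

int main(void) {
    int nranks = 1024;                       /* assumed MPI rank count */
    int *owner = malloc(NX_BLOCKS * NY_BLOCKS * sizeof(int));
    int nocean = 0;

    /* Pass 1: keep only blocks that contain ocean points. */
    for (int j = 0; j < NY_BLOCKS; j++)
        for (int i = 0; i < NX_BLOCKS; i++) {
            int b = j * NX_BLOCKS + i;
            owner[b] = is_ocean(i, j) ? nocean++ : -1;  /* -1 = land-only, never assigned */
        }

    /* Pass 2: round-robin the surviving ocean blocks over ranks,
     * so every rank gets real ocean work. */
    for (int b = 0; b < NX_BLOCKS * NY_BLOCKS; b++)
        if (owner[b] >= 0) owner[b] %= nranks;

    printf("%d of %d blocks contain ocean and were assigned\n",
           nocean, NX_BLOCKS * NY_BLOCKS);
    free(owner);
    return 0;
}
```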
Status of CAM (John Dennis)
• CAM-HOMME in Expedition Mode
• Standard CAM "may be" run at 1-degree resolution or slightly higher on BG
[Figure: simulation rate for HOMME, Held-Suarez test, at 1/2-, 1/3-, and 1/4-degree resolutions]
CAM & CCSM BG/L Expedition not from climate scientistsParallel I/O is the biggest bottleneck
Cloud Resolving Models / LES
• Active Tracer High-resolution Atmospheric Model (ATHAM):
  • modularized
  • parallel-ready (MPI)
• Goddard Cloud Ensemble Model (GCE):
  • well-established (1970s to present)
  • parallel-ready (MPI)
  • scales linearly (99% efficiency up to 256 tasks)
  • comprehensive
Implementations
• Done (NERSC IBM SP, GSFC):
  • ATHAM: 2-D & 3-D bulk cloud physics
  • GCE: 3-D bulk cloud physics; 2-D size-bin cloud physics
• In progress and planned (Blue Gene):
  • GCE (ATHAM): 3-D size-bin cloud physics, larger domain, longer simulation period, finer resolution, …
Parallelism in WRF: Multi-level Decomposition (Slide courtesy: NCAR)
• Single version of the code for efficient execution on distributed-memory machines, shared-memory machines, clusters of SMPs, and both vector and microprocessor architectures
• Model domains are decomposed for parallelism on two levels (see the sketch below):
  • Patch: section of the model domain allocated to a distributed-memory node
  • Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model-layer subroutine
• Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches
• [Diagram: a logical domain split into patches, one patch divided into multiple tiles, with inter-processor communication between patches]
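A minimal sketch of the two-level scheme, assuming a 1-D split of the domain into per-rank patches (MPI) and each patch split into per-thread tiles (OpenMP). The names (`compute_tile`, `NGLOBAL`) and the even division are hypothetical; this is not WRF's actual code, only the shape of the pattern.

```c
/* Sketch: two-level decomposition -- MPI patches, OpenMP tiles. */
#include <mpi.h>
#include <omp.h>

#define NGLOBAL 4096          /* assumed global grid columns */

/* Hypothetical model-layer routine: operates on one tile of one patch. */
static void compute_tile(int start, int end) {
    for (int i = start; i < end; i++) { /* physics/dynamics on column i */ }
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Level 1: patch = this rank's share of the domain (distributed memory). */
    int patch_size  = NGLOBAL / nranks;
    int patch_start = rank * patch_size;

    /* Level 2: tiles = per-thread pieces of the patch (shared memory). */
    #pragma omp parallel
    {
        int tid      = omp_get_thread_num();
        int nthreads = omp_get_num_threads();
        int tile     = patch_size / nthreads;
        int start    = patch_start + tid * tile;
        compute_tile(start, start + tile);
    }

    /* Inter-processor communication (halo exchange between patches)
     * would go here, e.g. MPI_Sendrecv of patch boundary columns. */
    MPI_Finalize();
    return 0;
}
```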
NCAR WRF Issues with Blue Gene/L (from John Michalakes)
• Relatively slow I/O
• Limited memory per node
• Relatively poor per-processor performance
• "Lots of little gotchas, mostly related to immaturity, especially in the programming environment."