Petascale
• LLNL Appro AMD: 9K processors [today]
• TJ Watson Blue Gene/L: 40K processors [today]
• NY Blue Gene/L: 32K processors
• ORNL Cray XT3/4: 44K processors [Jan 2008]
• TACC Sun: 55K processors [Jan 2008]
• ANL Blue Gene/P: 160K processors [Jan 2008]
CCSM and Component Models
• POP (Ocean)
• CICE (Sea Ice)
• CLM (Land Model)
• CPL (Coupler)
• CAM (Atmosphere)
• CCSM
Status of POP (John Dennis)
• 17K Cray XT4 processors [12.5 years/day]
• 29K IBM Blue Gene/L processors [8.5 years/day] (BG ready in Expedition Mode)
• Parallel I/O [underway]
• Land causes load imbalance at 0.1-degree resolution
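To put the quoted rates in context, here is a minimal sketch that converts a simulation rate into wall-clock time. It assumes "years/day" means simulated years per wall-clock day, and the 100-year run length is a hypothetical example, not a figure from the slides.

```python
def wallclock_days(simulated_years, rate_years_per_day):
    """Wall-clock days needed to simulate `simulated_years` at a given rate."""
    return simulated_years / rate_years_per_day


# Rates as quoted on the POP status slide (simulated years per wall-clock day).
rates = {
    "Cray XT4, 17K processors": 12.5,
    "IBM Blue Gene/L, 29K processors": 8.5,
}

run_length_years = 100  # hypothetical run length, chosen only for illustration

for system, rate in rates.items():
    days = wallclock_days(run_length_years, rate)
    print(f"{system}: {days:.1f} wall-clock days for {run_length_years} simulated years")
```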
Status of CAM (John Dennis)
• CAM HOMME in Expedition Mode
• Standard CAM “may be” run at 1-degree resolution or slightly higher on BG
[Figure: Simulation rate for HOMME: Held-Suarez (series labeled 1/2, 1/3, 1/4)]
CAM & CCSM
• BG/L Expedition not from climate scientists
• Parallel I/O is the biggest bottleneck
Cloud Resolving Models/LES
• Active Tracer High-resolution Atmospheric Model (ATHAM):
  • modularized
  • parallel-ready (MPI)
• Goddard Cloud Ensemble Model (GCE):
  • well-established (1970s to present)
  • parallel-ready (MPI)
  • scales linearly (99% up to 256 tasks)
  • comprehensive
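The "99% up to 256 tasks" figure for GCE reads most naturally as a parallel efficiency, though the slide does not say so. Under that assumption, the sketch below shows how such a number would be computed from two timings. The timings are hypothetical values chosen to illustrate the formula; they are not GCE measurements.

```python
def parallel_efficiency(t_ref, n_ref, t_n, n):
    """Efficiency of an n-task run relative to a reference run on n_ref tasks.

    Perfect linear scaling gives 1.0: the run time shrinks in proportion
    to the number of tasks added.
    """
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical wall-clock timings in seconds (illustrative only).
t_1 = 2560.0    # run time on 1 task
t_256 = 10.1    # run time on 256 tasks

eff = parallel_efficiency(t_1, 1, t_256, 256)
print(f"Parallel efficiency on 256 tasks: {eff:.1%}")
```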
Implementations
• Done (NERSC IBM SP, GSFC):
  • ATHAM: 2D & 3D bulk cloud physics
  • GCE: 3D bulk cloud physics; 2D size-bin cloud physics
• In progress and planned (Blue Gene):
  • GCE (ATHAM): 3D size-bin cloud physics
  • larger domain
  • longer simulation period
  • finer resolution
  • …
Parallelism in WRF: Multi-level Decomposition
Single version of code for efficient execution on:
• Distributed-memory
• Shared-memory
• Clusters of SMPs
• Vector and microprocessors
[Figure: logical domain; 1 patch, divided into multiple tiles; inter-processor communication]
Model domains are decomposed for parallelism on two levels:
• Patch: section of model domain allocated to a distributed-memory node
• Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model layer subroutine
• Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches
Slide courtesy: NCAR
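The two-level patch/tile decomposition described above can be sketched as follows. This is an illustrative sketch only, not WRF code: the `Box`, `split_range`, and `decompose` names, the 100 x 60 grid, and the 4 x 2 patch and 1 x 3 tile layouts are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Half-open index range [i0, i1) x [j0, j1) on the horizontal grid."""
    i0: int
    i1: int
    j0: int
    j1: int

    @property
    def cells(self):
        return (self.i1 - self.i0) * (self.j1 - self.j0)


def split_range(start, stop, parts):
    """Split [start, stop) into contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(stop - start, parts)
    bounds, lo = [], start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def decompose(box, parts_i, parts_j):
    """Cut a box into parts_i x parts_j rectangular sub-boxes."""
    return [Box(i0, i1, j0, j1)
            for (i0, i1) in split_range(box.i0, box.i1, parts_i)
            for (j0, j1) in split_range(box.j0, box.j1, parts_j)]


# Level 1: the logical domain is cut into patches, one per distributed-memory node.
domain = Box(0, 100, 0, 60)
patches = decompose(domain, 4, 2)

# Level 2: each patch is cut into tiles, one per shared-memory processor in the node.
tiles = {patch: decompose(patch, 1, 3) for patch in patches}

for patch in patches:
    print(patch, "->", len(tiles[patch]), "tiles")
```

In this sketch, work on different patches would run in separate processes that exchange boundary data (the inter-processor communication in the figure), while tiles within a patch would be handled by threads sharing that patch's memory.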
NCAR WRF Issues with Blue Gene/L (from John Michalakes)
• Relatively slow I/O
• Limited memory per node
• Relatively poor processor performance
• “Lots of little gotchas mostly related to immaturity, especially in the programming environment.”