Benchmark performance on Bassi
Jonathan Carter, User Services Group Lead
jtcarter@lbl.gov
NERSC User Group Meeting, June 12, 2006
NERSC 5 Application Benchmarks
• CAM3: climate model, NCAR
• GAMESS: computational chemistry, Iowa State, Ames Lab
• GTC: fusion, PPPL
• MADbench: astrophysics (CMB analysis), LBL
• MILC: QCD, multi-site collaboration
• PARATEC: materials science, developed at LBL and UC Berkeley
• PMEMD: computational chemistry, University of North Carolina at Chapel Hill
CAM3
• Community Atmospheric Model version 3
• Developed at NCAR with substantial DOE input, both scientific and software
• The atmosphere model for CCSM, the coupled climate system model; also the most time-consuming part of CCSM
• Widely used by both American and foreign scientists for climate research
  • For example, carbon and bio-geochemistry models are built upon (integrated with) CAM3
  • IPCC predictions use CAM3 (in part)
• About 230,000 lines of Fortran 90 code
• 1D decomposition runs on up to 128 processors at T85 resolution (150 km)
• 2D decomposition runs on up to 1,680 processors at 0.5 degree (60 km) resolution (see the sketch below)
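A minimal sketch of why a 2D decomposition scales further than a 1D one: splitting only latitude bands caps the task count at the number of latitudes, while splitting both latitude and longitude multiplies the available parallelism. This is not CAM3 source; the grid and block sizes are hypothetical, chosen only to illustrate the idea.

```c
/* Illustrative sketch (not CAM3 code): each task's sub-block in a 2D
 * latitude/longitude decomposition.  Grid and block counts are made up. */
#include <stdio.h>

int main(void)
{
    const int nlat = 128, nlon = 256;   /* roughly a T85-sized grid        */
    const int plat = 16,  plon = 8;     /* 16 x 8 = 128 tasks, for example */

    for (int rank = 0; rank < plat * plon; rank++) {
        int pi = rank / plon, pj = rank % plon;                    /* task coords */
        int lat0 = pi * (nlat / plat), lat1 = lat0 + nlat / plat;  /* lat range   */
        int lon0 = pj * (nlon / plon), lon1 = lon0 + nlon / plon;  /* lon range   */
        if (rank < 2)                    /* print a couple of examples */
            printf("rank %d owns lat [%d,%d) lon [%d,%d)\n",
                   rank, lat0, lat1, lon0, lon1);
    }
    return 0;
}
```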
GAMESS
• Computational chemistry application
• Variety of electronic structure algorithms available
• About 550,000 lines of Fortran 90
• Communication layer makes use of highly optimized vendor libraries
• Many methods available within the code
• Benchmarks are DFT energy and gradient calculations, and MP2 energy and gradient calculations
  • Many computational chemistry studies rely on these techniques
• Exactly the same as the DOD HPCMP TI-06 GAMESS benchmark
  • Vendors will only have to do the work once
GAMESS: Performance
• Small case: large, messy, low computational-intensity kernels are problematic for compilers
• Large case: performance depends on asynchronous messaging
GTC
• Gyrokinetic Toroidal Code
• Important code for the Fusion SciDAC project and for the international fusion collaboration ITER
• Models transport of thermal energy via plasma microturbulence using the particle-in-cell (PIC) approach (see the sketch below)
[Figure: 3D visualization of the electrostatic potential in a magnetic fusion device]
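A minimal 1D particle-in-cell sketch, purely to show the structure of the PIC approach named above (GTC itself is a 3D gyrokinetic Fortran code; all sizes and parameters here are hypothetical). The scatter phase's indirect grid updates are exactly the kind of irregular access the performance slide below comments on.

```c
/* Minimal 1D PIC sketch (illustrative only): scatter (charge deposition),
 * then gather + push using the interpolated field. */
#include <stdio.h>

#define NG 64          /* grid cells  */
#define NP 1024        /* particles   */

int main(void)
{
    double rho[NG] = {0}, efield[NG] = {0};
    double x[NP], v[NP];
    double dx = 1.0 / NG, dt = 0.01, qm = -1.0;   /* hypothetical parameters */

    for (int p = 0; p < NP; p++) { x[p] = (p + 0.5) / NP; v[p] = 0.0; }

    /* Scatter: deposit each particle's charge onto its two nearest grid
     * points with linear weights -- irregular, indirect writes. */
    for (int p = 0; p < NP; p++) {
        int    i = (int)(x[p] / dx) % NG;
        double w = x[p] / dx - i;
        rho[i]            += 1.0 - w;
        rho[(i + 1) % NG] += w;
    }

    /* (Field solve omitted; assume efield has been filled from rho.) */

    /* Gather + push: interpolate the field back to each particle and advance. */
    for (int p = 0; p < NP; p++) {
        int    i = (int)(x[p] / dx) % NG;
        double w = x[p] / dx - i;
        double e = (1.0 - w) * efield[i] + w * efield[(i + 1) % NG];
        v[p] += qm * e * dt;
        x[p] += v[p] * dt;
        if (x[p] <  0.0) x[p] += 1.0;             /* periodic boundary */
        if (x[p] >= 1.0) x[p] -= 1.0;
    }

    printf("rho[0] = %g, x[0] = %g\n", rho[0], x[0]);
    return 0;
}
```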
GTC: Performance
• SX8 delivers the highest raw performance (ever) but lower efficiency than the ES
• Scalar architectures suffer from low computational intensity, irregular data access, and register spilling
• Opteron/InfiniBand is 50% faster than Itanium2/Quadrics and only half the speed of the X1
  • Opteron benefits from its on-chip memory controller and caching of floating-point data in L1
• X1 suffers from the overhead of scalar code portions
MADbench
• Cosmic microwave background radiation analysis tool (MADCAP)
• Used a large amount of time in FY04; one of the highest-scaling codes at NERSC
• MADbench is a benchmark version of the original code
  • Designed to be easily run with synthetic data for portability
• Used in a recent study in conjunction with the Berkeley Institute for Performance Studies (BIPS)
• Written in C, making extensive use of the ScaLAPACK libraries
• Has extensive I/O requirements
MADbench: Performance
• Runtime dominated by
  • BLAS3
  • I/O (see the sketch below)
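To make the two dominant phases concrete, here is an illustrative (non-MADbench) sketch pairing a dense BLAS3 call with a bulk binary write. The matrix size and file name are made up; the real code uses distributed ScaLAPACK routines rather than the serial CBLAS call shown here. Link against a CBLAS implementation.

```c
/* Illustrative sketch: the two operations said to dominate MADbench,
 * dense BLAS3 (dgemm) and large binary I/O.  Not MADbench source. */
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    const int n = 512;                          /* hypothetical matrix size */
    double *A = malloc(sizeof *A * n * n);
    double *B = malloc(sizeof *B * n * n);
    double *C = malloc(sizeof *C * n * n);
    if (!A || !B || !C) return 1;
    for (int i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* BLAS3 phase: C = A * B */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    /* I/O phase: write the result out in one large binary transfer */
    FILE *f = fopen("matrix.dat", "wb");        /* hypothetical file name */
    if (f) { fwrite(C, sizeof *C, (size_t)n * n, f); fclose(f); }

    printf("C[0] = %g\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}
```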
MILC
• Quantum chromodynamics (QCD) application
• Widespread community use, large allocation
• Easy to build: no dependencies, standards-conforming
• Can be set up to run at a wide range of concurrencies
• Conjugate gradient algorithm
• Physics on a 4D lattice
• Local computations are 3x3 complex matrix multiplies with a sparse (indirect) access pattern (see the sketch below)
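A minimal sketch of the local kernel described above: a 3x3 complex (SU(3)-style) matrix applied to a 3-vector fetched through a neighbor index table, i.e. a gather-style indirect access. This is not MILC source; the type names, the lattice size, and the stand-in neighbor table are all made up for illustration.

```c
/* Illustrative sketch (not MILC code): 3x3 complex matrix-vector multiply
 * with an indirect neighbor index, as in a staggered CG sweep. */
#include <complex.h>
#include <stdio.h>

#define NSITES 16                       /* hypothetical number of lattice sites */

typedef struct { double complex e[3][3]; } su3_matrix;
typedef struct { double complex c[3];    } su3_vector;

/* y = U * x : 3x3 complex matrix times 3-vector */
static void mult_su3_mat_vec(const su3_matrix *U, const su3_vector *x,
                             su3_vector *y)
{
    for (int i = 0; i < 3; i++) {
        y->c[i] = 0.0;
        for (int j = 0; j < 3; j++)
            y->c[i] += U->e[i][j] * x->c[j];
    }
}

int main(void)
{
    su3_matrix U[NSITES];
    su3_vector src[NSITES], dst[NSITES];
    int neighbor[NSITES];               /* indirect (sparse) access pattern */

    for (int s = 0; s < NSITES; s++) {
        neighbor[s] = (s + 1) % NSITES; /* stand-in for a 4D neighbor table */
        for (int i = 0; i < 3; i++) {
            src[s].c[i] = 1.0 + I * s;
            for (int j = 0; j < 3; j++)
                U[s].e[i][j] = (i == j) ? 1.0 : 0.0;
        }
    }

    /* Gather-style application of the link matrices */
    for (int s = 0; s < NSITES; s++)
        mult_su3_mat_vec(&U[s], &src[neighbor[s]], &dst[s]);

    printf("dst[0].c[0] = %g + %gi\n", creal(dst[0].c[0]), cimag(dst[0].c[0]));
    return 0;
}
```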
PARATEC
• Parallel Total Energy Code
• Plane-wave DFT using a custom 3D FFT (illustrated below)
• 70% of materials science computation at NERSC is done with plane-wave DFT codes; PARATEC captures the performance of a wide range of such codes (VASP, CPMD, PETOT)
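A sketch of the kind of 3D FFT at the heart of plane-wave DFT codes. PARATEC uses its own custom parallel 3D FFT; this example uses serial FFTW3 purely to show the operation, and the grid size is hypothetical.

```c
/* Illustrative sketch: transform a wavefunction-sized 3D grid between
 * reciprocal and real space with FFTW3 (not PARATEC's custom FFT). */
#include <fftw3.h>
#include <stdio.h>

int main(void)
{
    const int n = 32;                                   /* hypothetical grid */
    fftw_complex *grid = fftw_alloc_complex((size_t)n * n * n);

    for (int i = 0; i < n * n * n; i++) { grid[i][0] = 1.0; grid[i][1] = 0.0; }

    /* In-place forward 3D transform */
    fftw_plan fwd = fftw_plan_dft_3d(n, n, n, grid, grid,
                                     FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(fwd);

    printf("grid[0] = %g + %gi\n", grid[0][0], grid[0][1]);

    fftw_destroy_plan(fwd);
    fftw_free(grid);
    return 0;
}
```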
PARATEC: Performance
• All architectures generally perform well due to the computational intensity of the code (BLAS3, FFT)
• SX8 achieves the highest per-processor performance
• X1/X1E shows the lowest % of peak
  • Non-vectorizable code is much more expensive on X1/X1E (32:1)
  • Lower bisection bandwidth to computation ratio (4D hypercube)
  • X1 performance is comparable to Itanium2
• Itanium2 outperforms Opteron because
  • PARATEC is less sensitive to memory access issues (BLAS3)
  • Opteron lacks an FMA unit
• Quadrics shows better scaling of all-to-all at large concurrencies
PMEMD
• Particle Mesh Ewald Molecular Dynamics
• An F90 code with advanced MPI coding; should test the compiler and stress asynchronous point-to-point messaging (see the sketch below)
• PMEMD is very similar to the MD engine in AMBER 8.0, used in both chemistry and the biosciences
• Test system is a 91K-atom blood coagulation protein
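A minimal sketch of the asynchronous point-to-point pattern the slide refers to: post non-blocking receives and sends, overlap them with local work, then wait on all requests. PMEMD itself is Fortran 90; this is written in C for consistency with the other sketches, and the buffer size and ring-exchange pattern are hypothetical.

```c
/* Illustrative sketch: non-blocking MPI point-to-point exchange overlapped
 * with local work (not PMEMD source). */
#include <mpi.h>
#include <stdio.h>

#define NBUF 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int left  = (rank - 1 + size) % size;
    int right = (rank + 1) % size;

    double sendbuf[NBUF], recvbuf[NBUF];
    for (int i = 0; i < NBUF; i++) sendbuf[i] = rank;

    /* Post the asynchronous receive and send */
    MPI_Request req[2];
    MPI_Irecv(recvbuf, NBUF, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(sendbuf, NBUF, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* ... local force computation would overlap with communication here ... */

    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("rank 0 received data from rank %d: %g\n", left, recvbuf[0]);

    MPI_Finalize();
    return 0;
}
```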
Summary
• Average Bassi to Seaborg performance ratio is 6.0 for the N5 application benchmarks