NAMD Parallel Performance on Ranger: MPI Tuning

Philip Blood, Scientific Specialist, Pittsburgh Supercomputing Center

NAMD
• NAMD (NAnoscale Molecular Dynamics) is a highly scalable molecular dynamics code used widely on the TeraGrid and other HPC systems
• Improving parallel performance without touching the code:
  • Tune NAMD input parameters for specific systems
  • Tune MPI implementations for NAMD's message-driven Charm++ parallel framework
Image courtesy of Ivaylo Ivanov and J. Andrew McCammon, UCSD

Results of Tuning: ApoA1 Benchmark
• 92,224 atoms (protein + water)
• 1 angstrom PME grid
• PME every 4 timesteps (1 fs step)
• NVE

Results of Initial Tuning: Actual User Simulation

Tuning OpenMPI 1.2.6 to Improve NAMD Scaling on Ranger
• Eager message passing must be tuned so that it handles the message-driven execution used by Charm++ in NAMD efficiently.
• The following pertains to the OpenMPI 1.2.6 installation and default settings on Ranger:
  • Set CPU and memory affinity: export OMPI_MCA_mpi_paffinity_alone=1
  • Turn off RDMA for eager message passing over InfiniBand: export OMPI_MCA_btl_openib_use_eager_rdma=0
  • Increase the eager limit over InfiniBand from 12K to just under 32K: export OMPI_MCA_btl_openib_eager_limit=32767
    • Setting it one byte shy of 32K appears to be significant, perhaps because btl_openib_min_send_size is 32768. That parameter and the maximum send size are other candidates for tuning.
  • Increase the self eager limit: export OMPI_MCA_btl_self_eager_limit=32767
  • Increase the sm (shared-memory) eager limit: export OMPI_MCA_btl_sm_eager_limit=32767
    • If you run out of memory at higher processor counts, you may need to reduce this to OMPI_MCA_btl_sm_eager_limit=16384
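The settings above can be gathered into one fragment of a job script. This is a minimal sketch, assuming a bash job script and assuming the launcher passes exported environment variables on to the MPI processes. SM_EAGER_LIMIT is a hypothetical override variable introduced here for illustration; it is not an OpenMPI name.

```shell
#!/bin/bash
# Sketch: OpenMPI 1.2.6 settings from the slide, collected for a job script.

# "Just under 32K": one byte less than 32 * 1024 = 32768.
EAGER_LIMIT=$((32 * 1024 - 1))   # 32767

# CPU and memory affinity.
export OMPI_MCA_mpi_paffinity_alone=1

# No RDMA for eager messages over InfiniBand.
export OMPI_MCA_btl_openib_use_eager_rdma=0

# Raise the eager limits for InfiniBand and self.
export OMPI_MCA_btl_openib_eager_limit="$EAGER_LIMIT"
export OMPI_MCA_btl_self_eager_limit="$EAGER_LIMIT"

# Shared-memory eager limit. SM_EAGER_LIMIT is a hypothetical override:
# set it to 16384 before running if memory runs out at higher counts.
export OMPI_MCA_btl_sm_eager_limit="${SM_EAGER_LIMIT:-$EAGER_LIMIT}"

# Show the values in effect before the NAMD launch line.
printf 'openib eager limit: %s\n' "$OMPI_MCA_btl_openib_eager_limit"
printf 'sm eager limit:     %s\n' "$OMPI_MCA_btl_sm_eager_limit"
```

Running it with SM_EAGER_LIMIT=16384 set in the environment lowers only the shared-memory limit and leaves the InfiniBand and self limits at 32767.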

Tuning MVAPICH 1.0.1 for NAMD on Ranger
• For these benchmarks:
  • Turn off RDMA for short messages: VIADEV_ADAPTIVE_RDMA_LIMIT=0
  • Set the overall eager message size limit: VIADEV_RENDEZVOUS_THRESHOLD=50000
  • Set the eager limit and buffer size for intranode (shared-memory) communication:
    • VIADEV_SMP_EAGERSIZE=64
    • VIADEV_SMPI_LENGTH_QUEUE=256
  • tacc_affinity was not used, although it may help at some processor counts
• At higher processor counts, VIADEV_SMP_EAGERSIZE and VIADEV_SMPI_LENGTH_QUEUE may need to be adjusted
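The MVAPICH values listed above can be set the same way. This is a sketch only, assuming a bash job script and a launcher that forwards exported environment variables to the MPI processes; some launchers expect the VAR=value pairs on the launch command line instead. The threshold variable is spelled VIADEV_RENDEZVOUS_THRESHOLD here, as in the MVAPICH documentation.

```shell
#!/bin/bash
# Sketch: MVAPICH 1.0.1 settings used for the benchmarks on the slide.

# No RDMA for short messages.
export VIADEV_ADAPTIVE_RDMA_LIMIT=0

# Overall eager message size limit.
export VIADEV_RENDEZVOUS_THRESHOLD=50000

# Eager limit and buffer size for intranode (shared-memory) communication.
# These two may need adjusting at higher processor counts.
export VIADEV_SMP_EAGERSIZE=64
export VIADEV_SMPI_LENGTH_QUEUE=256

# List the VIADEV_* values in effect before the NAMD launch line.
env | grep '^VIADEV_' | sort
```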
