Facilitating Efficient Parallel Performance in Classical MD, Coarse-grained MD and Multi-state Empirical Valence Bond Calculations
Philip Blood, Scientific Specialist
Pittsburgh Supercomputing Center
ASTA Project with Voth Group: 2007-2008
• Classical MD: Improving NAMD performance
• Coarse-grained MD: Testing the performance of a new short-range tabulated potential
• MS-EVB: Migrating from DL_POLY to a more scalable MD code
NAMD
• NAMD (NAnoscale Molecular Dynamics) is a highly scalable molecular dynamics code used ubiquitously on the TeraGrid and other HPC systems
• Improving parallel performance without touching the code:
  • Tune NAMD input parameters for specific systems
  • Tune MPI implementations for NAMD's message-driven Charm++ parallel framework
Image courtesy of Ivaylo Ivanov and J. Andrew McCammon, UCSD
Tuning NAMD Input Parameters
• Check input files for options that hurt performance:
  • a PME grid that is denser than necessary
  • steps per cycle set too small (this controls how often pair lists are updated)
• A good summary of the tuning procedure is here:
  http://www.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdPerformanceTuning
• In practice, the most important knob seems to be the PME pencil (PMEPencils) decomposition; an illustrative config excerpt follows below.
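As a rough sketch of the kind of settings to check, the excerpt below uses standard NAMD keywords; the values are placeholders for illustration, not recommendations for any particular system:

  PMEGridSizeX   96     ;# keep the PME grid no denser than needed (~1 A spacing)
  PMEGridSizeY   96
  PMEGridSizeZ   96
  stepspercycle  20     ;# number of steps between pair-list rebuilds
  pairlistdist   13.5   ;# must cover the cutoff plus atom movement over one cycle
  PMEPencils     8      ;# the pencil decomposition is often the key PME tuning knob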
Tuning OpenMPI Communication Protocols to Improve NAMD Scaling on Ranger
• Eager message passing must be handled efficiently to support the message-driven execution that Charm++ uses in NAMD.
• The following pertains to the OpenMPI 1.2.6 installation and default settings on Ranger (a consolidated example follows below):
  • Set CPU and memory affinity: export OMPI_MCA_mpi_paffinity_alone=1
  • Turn off RDMA for eager message passing over InfiniBand: export OMPI_MCA_btl_openib_use_eager_rdma=0
  • Increase the eager limit over InfiniBand from 12K to just under 32K: export OMPI_MCA_btl_openib_eager_limit=32767
    • Setting it one byte shy of 32K matters for some reason, perhaps because btl_openib_min_send_size is 32768; this and the max send size could be other parameters worth tweaking.
  • Increase the self eager limit: export OMPI_MCA_btl_self_eager_limit=32767
  • Increase the shared-memory (sm) eager limit: export OMPI_MCA_btl_sm_eager_limit=32767
    • You may need to reduce this to 16384 at higher processor counts if you run out of memory.
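Put together in a job script, the settings might look like the sketch below; the launcher, binary name, and input file are illustrative assumptions, not taken from the original runs:

  export OMPI_MCA_mpi_paffinity_alone=1
  export OMPI_MCA_btl_openib_use_eager_rdma=0
  export OMPI_MCA_btl_openib_eager_limit=32767
  export OMPI_MCA_btl_self_eager_limit=32767
  export OMPI_MCA_btl_sm_eager_limit=32767   # try 16384 at high core counts if memory runs out

  ibrun ./namd2 apoa1.namd > apoa1.log       # ibrun: TACC's MPI launcher on Ranger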
Results of OpenMPI Tuning
This system runs PME every time step.
MS-EVB
• Multi-state Empirical Valence Bond: a method for simulating proton transport in classical MD simulations
• Currently implemented in DL_POLY2, which only scales to 4-8 processors
• LAMMPS chosen as the new code:
  • Relatively easy to add new features
  • Good parallel performance
http://www.cbms.utah.edu/Research/Protons%20-%20MS-EVB.htm
LAMMPS and MS-EVB
While the Voth group works on the serial implementation:
• Identifying opportunities to improve LAMMPS performance
  • Implemented a single-precision FFT in LAMMPS
  • Improved the performance of the particle-mesh electrostatics calculation by 50% at 1e-5 precision
  • Especially important for MS-EVB, since the electrostatics must be recalculated for every MS-EVB state
  • Profiling LAMMPS for other opportunities
• Working on a parallelization strategy
  • EVB states are independent, so parallelize over states (see the sketch below)
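One way to exploit that independence, sketched here purely for illustration (this is not the Voth group's actual implementation), is to split MPI_COMM_WORLD into one sub-communicator per EVB state, so each state's force and energy evaluation runs on its own group of ranks:

  // Hypothetical sketch: assign MPI ranks to per-state groups with MPI_Comm_split.
  #include <mpi.h>
  #include <cstdio>

  int main(int argc, char **argv) {
      MPI_Init(&argc, &argv);
      int world_rank;
      MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

      const int num_states = 4;             // number of EVB states (illustrative)
      int state = world_rank % num_states;  // map each rank to one state's group

      MPI_Comm state_comm;                  // sub-communicator for this state
      MPI_Comm_split(MPI_COMM_WORLD, state, world_rank, &state_comm);

      int state_rank;
      MPI_Comm_rank(state_comm, &state_rank);

      // Each group would evaluate forces/energies for its assigned EVB state here;
      // per-state results are then gathered to build and diagonalize the EVB matrix.
      printf("world rank %d -> EVB state %d (rank %d within state group)\n",
             world_rank, state, state_rank);

      MPI_Comm_free(&state_comm);
      MPI_Finalize();
      return 0;
  }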
Current ASTA Project with Voth Group
Team:
• Lonnie Crosby: 10% (NICS)
• Dodi Heryadi: 10% (NCSA)
• John Peterson: 10% (TACC)
• Phil Blood: 20% (PSC)
Tasks:
• Tune MD codes on new platforms
• Help optimize and implement the parallel MS-EVB code in LAMMPS
• Help optimize implementations of coarse-grained tabulated force fields in MD codes
• Optimize the workflow across the various sites (job submission, data movement and analysis)