Opportunities for Biological Consortia on HPCx: Code Capabilities and Performance HPCx and CCP Staff http://www.ccp.ac.uk/ http://www.hpcx.ac.uk/
Welcome to the Meeting • Background • HPCx • Objectives • to consider whether there is a case to bid • Agenda • Introduction to the HPCx service • Overview of Code Performance • Contributed Presentations • Invited Presentation - • Discussion HPCx/Biology Discussions
Outline • Overview of Code Capabilities and Performance • Macromolecular simulation • DL_POLY, AMBER, CHARMM, NAMD • Localised basis molecular codes • Gaussian, GAMESS-UK, NWChem • Local basis periodic code • CRYSTAL • Plane wave periodic codes • CASTEP • CPMD (Alessandro Curioni talk) • Note - consortium activity is not limited to these codes. HPCx/Biology Discussions
The DL_POLY Molecular Dynamics Simulation Package Bill Smith
DL_POLY Background • General purpose parallel MD code • Developed at Daresbury Laboratory for CCP5 1994-today • Available free of charge (under licence) to University researchers world-wide • DL_POLY versions: • DL_POLY_2 • Replicated Data, up to 30,000 atoms • Full force field and molecular description • DL_POLY_3 • Domain Decomposition, up to 1,000,000 atoms • Full force field but no rigid body description. HPCx/Biology Discussions
DL_POLY Force Field • Intermolecular forces • All common van der Waals potentials • Sutton-Chen many-body potential • 3-body angle forces (SiO2) • 4-body inversion forces (BO3) • Tersoff potential -> Brenner • Intramolecular forces • Bonds, angles, dihedrals, inversions • Coulombic forces • Ewald* & SPME (3D), HK Ewald* (2D), adiabatic shell model, reaction field, neutral groups*, truncated Coulombic • Externally applied field • Walled cells, electric field, shear field, etc. * Not in DL_POLY_3 HPCx/Biology Discussions
Boundary Conditions • None (e.g. isolated macromolecules) • Cubic periodic boundaries • Orthorhombic periodic boundaries • Parallelepiped periodic boundaries • Truncated octahedral periodic boundaries* • Rhombic dodecahedral periodic boundaries* • Slabs (i.e. x,y periodic, z nonperiodic) • (a minimal minimum-image sketch for the rectangular periodic cells follows below) * Not in DL_POLY_3 HPCx/Biology Discussions
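To make the periodic options above concrete, here is a minimal sketch of the minimum-image convention for a cubic or orthorhombic box in plain numpy. It is illustrative only (not DL_POLY source); the box dimensions and coordinates are made up.

import numpy as np

# Hypothetical orthorhombic box (edge lengths in Angstrom).
box = np.array([40.0, 50.0, 60.0])

def minimum_image(r_ij, box):
    """Wrap an interatomic displacement vector into the nearest periodic image."""
    return r_ij - box * np.round(r_ij / box)

# Example: two atoms near opposite faces of the box are actually close neighbours.
r_a = np.array([1.0, 2.0, 3.0])
r_b = np.array([39.0, 48.0, 58.0])
print(minimum_image(r_a - r_b, box))   # [2. 4. 5.] - a short displacement across the boundary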
Algorithms and Ensembles Algorithms • Verlet leapfrog • RD-SHAKE • Euler-Quaternion* • QSHAKE* • [All combinations] *Not in DL_POLY_3 Ensembles • NVE • Berendsen NVT • Hoover NVT • Evans NVT • Berendsen NPT • Hoover NPT • Berendsen NσT • Hoover NσT HPCx/Biology Discussions
Migration from Replicated to Distributed Data DL_POLY_3: Domain Decomposition • Distribute atoms and forces across the nodes • More memory efficient, can address much larger cases (10^5 - 10^7 atoms) • SHAKE and short-range forces require only neighbour communication • communications scale linearly with number of nodes • Coulombic energy remains global • strategy depends on problem and machine characteristics • Adopt Smooth Particle Mesh Ewald (SPME) scheme • includes Fourier transform of the smoothed charge density (reciprocal space grid typically 64x64x64 - 128x128x128) • (a minimal sketch of the domain assignment is given below) HPCx/Biology Discussions
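The sketch below illustrates the domain-decomposition idea in plain numpy: atoms are assigned to the block of a 3D process grid that contains them, and each block only needs to exchange halo data with its 26 periodic neighbours for SHAKE and short-range forces. All parameters (box size, process grid, atom count) are hypothetical, and this is not DL_POLY_3 code.

import numpy as np

# Hypothetical parameters for illustration only.
box = np.array([50.0, 50.0, 50.0])      # orthorhombic cell edge lengths (Angstrom)
pgrid = np.array([4, 4, 4])             # 4x4x4 grid of domains = 64 MPI tasks
natoms = 100000
rng = np.random.default_rng(0)
pos = rng.random((natoms, 3)) * box     # random coordinates inside the box

# Each atom belongs to the domain (block of link cells) that contains it.
idx3 = np.minimum((pos / box * pgrid).astype(int), pgrid - 1)   # (ix, iy, iz) per atom
owner = np.ravel_multi_index(idx3.T, pgrid)                     # flat domain / rank id

# Short-range forces and SHAKE only need halo exchange with the 26 neighbouring
# domains (with periodic wrap-around), so communication per step stays local.
def neighbour_ranks(rank, pgrid):
    ix, iy, iz = np.unravel_index(rank, pgrid)
    nbrs = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                if (dx, dy, dz) == (0, 0, 0):
                    continue
                jx = (ix + dx) % pgrid[0]
                jy = (iy + dy) % pgrid[1]
                jz = (iz + dz) % pgrid[2]
                nbrs.add(np.ravel_multi_index((jx, jy, jz), pgrid))
    return sorted(nbrs)

print("atoms on rank 0:", np.count_nonzero(owner == 0))
print("rank 0 talks to", len(neighbour_ranks(0, pgrid)), "neighbours")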
Migration from Replicated to Distributed Data DL_POLY_3: Coulomb Energy Evaluation • Conventional parallel 3D FFT routines (e.g. FFTW) assume plane or column distributions of the data • a global transpose of the data is required to complete the 3D FFT, and additional costs are incurred re-organising the data from the natural block domain decomposition • An alternative FFT algorithm has been designed to reduce communication costs • the 3D FFT is performed as a series of 1D FFTs, each involving communications only between blocks in a given column • more data is transferred, but in far fewer messages • rather than all-to-all, the communications are column-wise only (see the sketch below) HPCx/Biology Discussions
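The decomposition exploited above rests on the fact that a 3D FFT is exactly a 1D FFT applied along each axis in turn, which is what permits the column-wise communication pattern. A plain-numpy check (serial, not the DL_POLY_3 parallel implementation; the grid is synthetic):

import numpy as np

grid = np.random.default_rng(1).standard_normal((64, 64, 64))  # stand-in for the smoothed charge density

full_3d = np.fft.fftn(grid)             # conventional 3D transform

by_axes = np.fft.fft(grid, axis=0)      # 1D FFTs along x (one column direction)
by_axes = np.fft.fft(by_axes, axis=1)   # then along y
by_axes = np.fft.fft(by_axes, axis=2)   # then along z

print(np.allclose(full_3d, by_axes))    # True: same transform, done one axis at a time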
DL_POLY_2 & 3 Differences • Rigid bodies not in _3 • MSD not in _3 • Tethered atoms not in _3 • Standard Ewald not in _3 • HK_Ewald not in _3 • DL_POLY_2 I/O files work in _3 but NOT vice versa • No multiple timestep in _3 HPCx/Biology Discussions
DL_POLY_2 Developments • DL_MULTI - Distributed multipoles • DL_PIMD - Path integral (ionics) • DL_HYPE - Rare event simulation • DL_POLY - Symplectic versions 2/3 • DL_POLY - Multiple timestep • DL_POLY - F90 re-vamp HPCx/Biology Discussions
DL_POLY_3 on HPCx • Test case 1 (552960 atoms, 300 timesteps) • NaKSi2O5 - disilicate glass • SPME (128³ grid) + 3-body terms, 15625 link cells (LC) • 32-512 processors (4-64 nodes) HPCx/Biology Discussions
DL_POLY_3 on HPCx • Test case 2 (792960 atoms, 10 timesteps) • 64 x Gramicidin (354) + 256768 H2O • SHAKE + SPME (256³ grid), 14812 link cells (LC) • 16-256 processors (2-32 nodes) HPCx/Biology Discussions
DL_POLY People • Bill Smith DL_POLY_2 & _3 & GUI • w.smith@dl.ac.uk • Ilian Todorov DL_POLY_3 • i.t.todorov@dl.ac.uk • Maurice Leslie DL_MULTI • m.leslie@dl.ac.uk • Further Information: • W. Smith and T.R. Forester, J. Molec. Graphics, (1996), 14, 136 • http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml • W. Smith, C.W. Yong, P.M. Rodger,Molecular Simulation (2002), 28, 385 HPCx/Biology Discussions
AMBER, NAMD and Gaussian Lorna Smith and Joachim Hein
AMBER • AMBER (Assisted Model Building with Energy Refinement) • A molecular dynamics program, particularly for biomolecules • Weiner and Kollman, University of California, 1981. • Current version – AMBER7 • Widely used suite of programs • Sander, Gibbs, Roar • Main program for molecular dynamics: Sander • Basic energy minimiser and molecular dynamics • Shared memory version – only for SGI and Cray • MPI version: master / slave, replicated data model HPCx/Biology Discussions
AMBER - Initial Scaling • Factor IX protein with Ca++ ions – 90,906 atoms HPCx/Biology Discussions
Current developments - AMBER • Bob Duke • Developed a new version of Sander on HPCx • Originally called AMD (Amber Molecular Dynamics) • Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics) • Substantial rewrite of the code • Converted to Fortran90, removed multiple copies of routines,… • Likely to be incorporated into AMBER8 • We are looking at optimising the collective communications – the reduction / scatter (a sketch of this pattern follows below) HPCx/Biology Discussions
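For context, the reduction/scatter referred to above is the collective typically used in replicated-data MD: every task accumulates partial forces for all atoms, and the collective sums them and hands each task back only the atoms it owns. A minimal mpi4py sketch of that pattern (illustrative only; array sizes are made up and this is not PMEMD source):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, nproc = comm.Get_rank(), comm.Get_size()

natoms = 1024                          # hypothetical system size, assumed divisible by nproc
nlocal = natoms // nproc

# Replicated data: every task holds partial forces for *all* atoms.
partial_forces = np.random.default_rng(rank).standard_normal((natoms, 3))

# Reduce_scatter sums the contributions from every task and leaves each task
# holding the total forces for its own contiguous block of nlocal atoms only.
my_forces = np.empty((nlocal, 3))
comm.Reduce_scatter(partial_forces, my_forces, [nlocal * 3] * nproc, op=MPI.SUM)

print(f"rank {rank}: summed forces for atoms {rank * nlocal}..{(rank + 1) * nlocal - 1}")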
Optimisation – PMEMD HPCx/Biology Discussions
NAMD • NAMD • molecular dynamics code designed for high-performance simulation of large biomolecular systems. • Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign. • Versions 2.4, 2.5b and 2.5 available on HPCx • One of the first codes to be awarded a capability incentive rating – bronze HPCx/Biology Discussions
NAMD Performance • Benchmarks from Prof Peter Coveney • TCR-peptide-MHC system HPCx/Biology Discussions
NAMD Performance HPCx/Biology Discussions
Molecular Simulation - NAMD Scaling http://www.ks.uiuc.edu/Research/namd/ • Parallel, object-oriented MD code • High-performance simulation of large biomolecular systems • Scales to hundreds of processors on high-end parallel platforms • [Figure: speedup vs. number of CPUs for the standard NAMD ApoA-I benchmark, a system comprising 92,442 atoms, with 12 Å cutoff and PME every 4 time steps] • Scalability improves with larger simulations - speedup of 778 on 1024 CPUs of TCS-1 in a 327K particle simulation of F1-ATPase HPCx/Biology Discussions
Performance Comparison • Performance comparison between AMBER, CHARMM and NAMD • See: http://www.scripps.edu/brooks/Benchmarks/ • Benchmark • dihydrofolate reductase protein in an explicit water bath with cubic periodic boundary conditions. • 23,558 atoms HPCx/Biology Discussions
Performance HPCx/Biology Discussions
Gaussian • Gaussian 03 • Performs semi-empirical and ab initio molecular orbital calculations • Gaussian Inc., www.gaussian.com • Shared memory version available on HPCx • Limited to the size of a logical partition (8 processors) • Phase 2 upgrade will allow access to 32 processors • Task farming option HPCx/Biology Discussions
CRYSTAL and CASTEP Ian Bush and Martin Plummer
Crystal • Electronic structure and related properties of periodic systems • All electron, local Gaussian basis set, DFT and Hartree-Fock • Under continuous development since 1974 • Distributed to over 500 sites world wide • Developed jointly by Daresbury and the University of Turin HPCx/Biology Discussions
Crystal Functionality • Basis Set • LCAO - Gaussians • All electron or pseudopotential • Hamiltonian • Hartree-Fock (UHF, RHF) • DFT (LSDA, GGA) • Hybrid functionals (B3LYP) • Techniques • Replicated data parallel • Distributed data parallel • Forces • Structural optimization • Direct SCF • Visualisation • AVS GUI (DLV) • Properties • Energy • Structure • Vibrations (phonons) • Elastic tensor • Ferroelectric polarisation • Piezoelectric constants • X-ray structure factors • Density of States / Bands • Charge/Spin Densities • Magnetic Coupling • Electrostatics (V, E, EFG classical) • Fermi contact (NMR) • EMD (Compton, e-2e) HPCx/Biology Discussions
Benchmark Runs on Crambin • Very small protein from Crambe abyssinica - 1284 atoms per unit cell • Initial studies using STO-3G (3948 basis functions) • Improved to 6-31G** (12354 functions) • All calculations Hartree-Fock • As far as we know, the largest HF calculation ever converged HPCx/Biology Discussions
Crambin - Parallel Performance • Fit measured data to Amdahl’s law to obtain an estimate of the speedup • Increasing the basis set size increases the scalability • Speedup of about 700 on 1024 processors for 6-31G** • Takes about 3 hours instead of about 3 months • 99.95% parallel (see the worked example below) HPCx/Biology Discussions
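As a quick check of the figures quoted above, Amdahl's law with the fitted 99.95% parallel fraction reproduces both the speedup and the wall-time claim (the numbers are the slide's own; the script is only illustrative):

# Amdahl's law: S(p) = 1 / ((1 - f) + f / p), with f the parallel fraction.
f = 0.9995          # "99.95% parallel", from the fit quoted above
p = 1024            # processors

speedup = 1.0 / ((1.0 - f) + f / p)
print(f"predicted speedup on {p} processors: {speedup:.0f}")   # ~677, i.e. "about 700"

# Consistency check on the wall-time claim: ~3 months serial becomes a few hours.
serial_hours = 3 * 30 * 24
print(f"{serial_hours / speedup:.1f} hours")                    # roughly 3 hours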
Results – Electrostatic Potential • Charge density isosurface coloured according to potential • Useful to determine possible chemically active groups HPCx/Biology Discussions
Futures - Rusticyanin • Rusticyanin (from Thiobacillus ferrooxidans) has 6284 atoms and is involved in redox processes • We have just started calculations using over 33,000 basis functions • In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants HPCx/Biology Discussions
What is Castep? • First principles (DFT) materials simulation code • electronic energy • geometry optimization • surface interactions • vibrational spectra • materials under pressure, chemical reactions • molecular dynamics • Method (direct minimization) • plane wave expansion of valence electrons • pseudopotentials for core electrons HPCx/Biology Discussions
HPCx: biological applications • Examples currently include: • NMR of proteins • hydroxyapatite (major component of bone) • chemical processes following stroke • Possibility of treating systems with a few hundred atoms on HPCx • May be used in conjunction with classical codes (e.g. DL_POLY) for detailed QM treatment of ‘features of interest’ HPCx/Biology Discussions
Castep 2003 HPCx performance gain HPCx/Biology Discussions
Castep 2003 HPCx performance gain HPCx/Biology Discussions
HPCx: biological applications • Castep (version 2) is written by: • M Segall, P Lindan, M Probert, C Pickard, P Hasnip, S Clark, K Refson, V Milman, B Montanari, M Payne. • ‘Easy’ to understand top-level code. • Castep is fully maintained and supported on HPCx • Castep is distributed by Accelrys Ltd • Castep is licensed free to UK academics by the UKCP consortium (contact ukcp@dl.ac.uk) HPCx/Biology Discussions
CHARMM, NWChem and GAMESS-UK Paul Sherwood
NWChem • Objectives • Highly efficient and portable MPP computational chemistry package • Distributed data - scalable with respect to chemical system size as well as MPP hardware size • Extensible architecture: object-oriented design (abstraction, data hiding, handles, APIs) • Parallel programming model: non-uniform memory access, global arrays • Infrastructure: GA, Parallel I/O, RTDB, MA, … • Wide range of parallel functionality essential for HPCx • Tools • Global Arrays: portable distributed data tool, used by CCP1 groups (e.g. MOLPRO) • PeIGS: parallel eigensolver, guaranteed orthogonality of eigenvectors • [Figure: physically distributed data accessed as a single, shared data structure] HPCx/Biology Discussions
Distributed Data SCF • Pictorial representation of the iterative SCF process in (i) a sequential process and (ii) a distributed data parallel process: MOAO represents the molecular orbitals, P the density matrix and F the Fock or Hamiltonian matrix • [Figure panels: Sequential | Distributed Data] • (a toy serial SCF loop is sketched below for orientation) HPCx/Biology Discussions
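To make the loop in the figure concrete, here is a deliberately tiny, serial SCF sketch in plain numpy. It is not NWChem code and the "integrals" are synthetic, but it shows the roles of P (density), F (Fock) and the MO coefficients (MOAO) that the distributed-data version spreads across processors.

import numpy as np

rng = np.random.default_rng(0)
nbf, nocc = 6, 2                         # hypothetical basis size / doubly occupied MOs

# Synthetic one-electron Hamiltonian and 8-fold-symmetric "two-electron integrals",
# standing in for real AO integrals; an orthonormal basis (S = I) is assumed.
H = rng.standard_normal((nbf, nbf)); H = 0.5 * (H + H.T)
raw = rng.random((nbf, nbf, nbf, nbf)) * 0.1
eri = np.zeros_like(raw)
for perm in [(0,1,2,3), (1,0,2,3), (0,1,3,2), (1,0,3,2), (2,3,0,1), (3,2,0,1), (2,3,1,0), (3,2,1,0)]:
    eri += raw.transpose(perm)
eri /= 8.0

P = np.zeros((nbf, nbf))                 # density matrix (initial guess: empty)
for it in range(50):
    # Build the Fock matrix F = H + 2J - K from the current density P.
    J = np.einsum('ijkl,kl->ij', eri, P)
    K = np.einsum('ikjl,kl->ij', eri, P)
    F = H + 2.0 * J - K

    # Diagonalise F, occupy the lowest MOs, and rebuild the density.
    eps, C = np.linalg.eigh(F)           # C plays the role of MOAO in the figure
    P_new = C[:, :nocc] @ C[:, :nocc].T

    if np.max(np.abs(P_new - P)) < 1e-8: # self-consistency reached
        print(f"converged in {it} iterations")
        break
    P = 0.5 * (P + P_new)                # simple damping keeps the toy iteration stable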
NWChem Capabilities (Direct, Semi-direct and conventional): • RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives. • DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives. • CASSCF; analytic 1st and numerical 2nd derivatives. • Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st derivatives and numerical 2nd derivatives. • Coupled cluster, CCSD and CCSD(T) using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy. • Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources HPCx/Biology Discussions
Case Studies - Zeolite Fragments • DFT calculations with Coulomb fitting basis (Godbout et al.) • Orbital basis: DZVP (O, Si), DZVP2 (H) • Fitting basis: DGAUSS-A1 (O, Si), DGAUSS-A2 (H) • NWChem & GAMESS-UK: both codes use an auxiliary fitting basis for the Coulomb energy, with the 3-centre 2-electron integrals held in core (a sketch of the fitting contraction follows below) • Fragments: Si8O7H18 347/832, Si8O25H18 617/1444, Si26O37H36 1199/2818, Si28O67H30 1687/3928 HPCx/Biology Discussions
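The auxiliary-fitting approach mentioned above builds the Coulomb matrix from 3-centre integrals and a 2-centre metric instead of 4-centre integrals. A minimal numpy sketch with synthetic tensors (purely illustrative; basis sizes are made up and this is not the NWChem/GAMESS-UK implementation):

import numpy as np

rng = np.random.default_rng(0)
nbf, naux = 20, 60                       # hypothetical AO and auxiliary (fitting) basis sizes

# Synthetic stand-ins for the real integrals:
#   three_c[i,j,P] ~ (ij|P)  3-centre 2-electron integrals (held in core in the codes)
#   metric[P,Q]    ~ (P|Q)   2-centre Coulomb metric over the fitting basis
three_c = rng.standard_normal((nbf, nbf, naux))
three_c = 0.5 * (three_c + three_c.transpose(1, 0, 2))
A = rng.standard_normal((naux, naux))
metric = A @ A.T + naux * np.eye(naux)   # symmetric positive definite

D = rng.standard_normal((nbf, nbf))
D = 0.5 * (D + D.T)                      # a (synthetic) density matrix

# Coulomb matrix via the fit: J_ij = sum_PQ (ij|P) [metric^-1]_PQ sum_kl (Q|kl) D_kl
rho = np.einsum('klQ,kl->Q', three_c, D)        # project the density onto the fitting basis
coeff = np.linalg.solve(metric, rho)            # fitting coefficients c_P
J = np.einsum('ijP,P->ij', three_c, coeff)      # assemble J from 3-centre integrals only

print(J.shape, np.allclose(J, J.T))             # (20, 20) True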
DFT Coulomb Fit - NWChem • [Figures: measured time (seconds) vs. number of CPUs for Si26O37H36 (1199/2818) and Si28O67H30 (1687/3928)] HPCx/Biology Discussions
Memory-driven Approaches: NWChem - DFT (LDA): Performance on the IBM SP/p690 • Zeolite ZSM-5 • DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis: AO basis 3554, CD basis 12713 • IBM SP/p690 wall time (13 SCF iterations): 64 CPUs = 9,184 seconds; 128 CPUs = 3,966 seconds • MIPS R14k-500 CPUs (Teras) wall time (13 SCF iterations): 64 CPUs = 5,242 seconds; 128 CPUs = 3,451 seconds • 3-centre 2e-integrals = 1.00 × 10¹², Schwarz screening = 6.54 × 10⁹, % 3c 2e-ints in core = 100% (see the screening sketch below) HPCx/Biology Discussions
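Schwarz screening, referred to above, discards integral batches whose magnitude is provably negligible using the Cauchy-Schwarz bound |(p|q)| <= sqrt((p|p)) * sqrt((q|q)), so most of the formal 10¹² integrals are never computed. A small illustrative count with synthetic data (not the NWChem screening code; the pair count and threshold are made up):

import numpy as np

rng = np.random.default_rng(0)
npairs = 2000                 # hypothetical number of charge-distribution pairs
threshold = 1e-10

# Q[p] ~ sqrt((p|p)); values spread over many orders of magnitude, as they are
# for spatially separated Gaussian pairs in a large system.
Q = 10.0 ** rng.uniform(-12, 0, npairs)

# Any (p,q) whose bound Q[p]*Q[q] falls below the threshold is skipped outright.
bound = np.outer(Q, Q)
kept = np.count_nonzero(bound >= threshold)
total = npairs * npairs

print(f"integrals kept: {kept} of {total} ({100.0 * kept / total:.1f}%)")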
GAMESS-UK • GAMESS-UK is the general purpose ab initio molecular electronic structure program for performing SCF-, MCSCF- and DFT-gradient calculations, together with a variety of techniques for post Hartree Fock calculations. • The program is derived from the original GAMESS code, obtained from Michel Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC), and has been extensively modified and enhanced over the past decade. • This work has included contributions from numerous authors†, and has been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in the on-going development and support of the program include various academic funding agencies in the Netherlands, and ICI plc. • Additional information on the code may be found from links at: http://www.dl.ac.uk/CFS † M.F. Guest, J.H. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V. Bonacic-Koutecky van Lenthe, J. Kendrick, K. Schoffel & P. Sherwood, with contributions from R.D., W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone and D. Tozer. HPCx/Biology Discussions
GAMESS-UK features 1. • Hartree Fock: • Segmented/ GC + spherical harmonic basis sets • SCF-Energies and Gradients: conventional, in-core, direct • SCF-Frequencies: numerical and analytic 2nd derivatives • Restricted, unrestricted open shell SCF and GVB. • Density Functional Theory • Energies + gradients, conventional and direct including Dunlap fit • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 & LDA functionals • Numerical 2nd derivatives (analytic implementation in testing) • Electron Correlation: • MP2 energies, gradients and frequencies, Multi-reference MP2, MP3 Energies • MCSCF and CASSCF Energies, gradients and numerical 2nd derivatives • MR-DCI Energies, properties and transition moments (semi-direct module) • CCSD and CCSD(T) Energies • RPA (direct) and MCLR excitation energies / oscillator strengths, RPA gradients • Full-CI Energies • Green's functions calculations of IPs. • Valence bond (Turtle) HPCx/Biology Discussions
GAMESS-UK features 2. • Molecular Properties: • Mulliken and Lowdin population analysis, Electrostatic Potential-Derived Charges • Distributed Multipole Analysis, Morokuma Analysis, Multipole Moments • Natural Bond Orbital (NBO) + Bader Analysis • IR and Raman Intensities, Polarizabilities & Hyperpolarizabilities • Solvation and Embedding Effects (DRF) • Relativistic Effects (ZORA) • Pseudopotentials: • Local and non-local ECPs. • Visualisation: tools include CCP1 GUI • Hybrid QM/MM (ChemShell + CHARMM QM/MM) • Semi-empirical : MNDO, AM1, and PM3 hamiltonians • Parallel Capabilities: • MPP and SMP implementations (GA tools) • SCF/DFT energies, gradients, frequencies • MP2 energies and gradients • Direct RPA HPCx/Biology Discussions
Parallel Implementation of GAMESS-UK • Extensive use of Global Array (GA) Tools and Parallel Linear Algebra from NWChem Project (EMSL) • SCF and DFT • Replicated data, but … • GA Tools for caching of I/O for restart and checkpoint files • Storage of 2-centre 2-e integrals in DFT Jfit • Linear Algebra (via PeIGS, DIIS/MMOs, Inversion of 2c-2e matrix) • SCF and DFT second derivatives • Distribution of <vvoo> and <vovo> integrals via GAs • MP2 gradients • Distribution of <vvoo> and <vovo> integrals via GAs • Direct RPA Excited States • Replicated data with parallelisation of direct integral evaluation HPCx/Biology Discussions