
Opportunities for Biological Consortia on HPCx: Code Capabilities and Performance


Presentation Transcript


  1. Opportunities for Biological Consortia on HPCx: Code Capabilities and Performance HPCx and CCP Staff http://www.ccp.ac.uk/ http://www.hpcx.ac.uk/

  2. Welcome to the Meeting • Background • HPCx • Objectives • to consider whether there is a case to bid • Agenda • Introduction to the HPCx service • Overview of Code Performance • Contributed Presentations • Invited Presentation - • Discussion HPCx/Biology Discussions

  3. Outline • Overview of Code Capabilities and Performance • Macromolecular simulation • DL_POLY, AMBER, CHARMM, NAMD • Localised basis molecular codes • Gaussian, GAMESS-UK, NWChem • Local basis periodic code • CRYSTAL • Plane wave periodic codes • CASTEP • CPMD (Alessandro Curioni talk) • Note - consortium activity is not limited to these codes. HPCx/Biology Discussions

  4. The DL_POLY Molecular Dynamics Simulation Package Bill Smith

  5. DL_POLY Background • General purpose parallel MD code • Developed at Daresbury Laboratory for CCP5, 1994 to the present • Available free of charge (under licence) to university researchers worldwide • DL_POLY versions: • DL_POLY_2 • Replicated Data, up to 30,000 atoms • Full force field and molecular description • DL_POLY_3 • Domain Decomposition, up to 1,000,000 atoms • Full force field but no rigid body description. HPCx/Biology Discussions

  6. DL_POLY Force Field • Intermolecular forces • All common van der Waals potentials • Sutton-Chen many-body potential • 3-body angle forces (SiO2) • 4-body inversion forces (BO3) • Tersoff potential -> Brenner • Intramolecular forces • Bonds, angles, dihedrals, inversions • Coulombic forces • Ewald* & SPME (3D), HK Ewald* (2D), Adiabatic shell model, Reaction field, Neutral groups*, Truncated Coulombic • Externally applied field • Walled cells, electric field, shear field, etc. * Not in DL_POLY_3 HPCx/Biology Discussions
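The pair terms above are standard analytic forms evaluated inside a radial cutoff. As a minimal illustration of one of them, here is a 12-6 van der Waals interaction; this is a sketch with made-up parameters, not DL_POLY source:

```python
import numpy as np

def lj_energy(r, epsilon=0.65, sigma=3.17, r_cut=10.0):
    """12-6 Lennard-Jones pair energy with a simple radial cutoff.

    r, sigma and r_cut share the same length unit; epsilon sets the energy
    scale.  The default values are illustrative only.
    """
    r = np.asarray(r, dtype=float)
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 ** 2 - sr6)
    return np.where(r < r_cut, energy, 0.0)

print(lj_energy([3.0, 3.5, 12.0]))  # repulsive wall, attractive well, zero beyond cutoff
```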

  7. Boundary Conditions • None (e.g. isolated macromolecules) • Cubic periodic boundaries • Orthorhombic periodic boundaries • Parallelepiped periodic boundaries • Truncated octahedral periodic boundaries* • Rhombic dodecahedral periodic boundaries* • Slabs (i.e. x,y periodic, z nonperiodic) HPCx/Biology Discussions
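For the periodic cases listed above, pair separations are taken under the minimum-image convention. A small sketch for an orthorhombic cell (illustrative only, not DL_POLY code):

```python
import numpy as np

def minimum_image(r_ij, box):
    """Map a displacement vector onto its nearest periodic image.

    r_ij : displacement between two atoms, shape (3,)
    box  : orthorhombic cell edge lengths, shape (3,)
    """
    r_ij = np.asarray(r_ij, dtype=float)
    box = np.asarray(box, dtype=float)
    return r_ij - box * np.round(r_ij / box)

print(minimum_image([9.0, -7.5, 0.2], box=[10.0, 10.0, 10.0]))  # [-1.   2.5  0.2]
```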

  8. Algorithms and Ensembles Algorithms • Verlet leapfrog • RD-SHAKE • Euler-Quaternion* • QSHAKE* • [All combinations] *Not in DL_POLY_3 Ensembles • NVE • Berendsen NVT • Hoover NVT • Evans NVT • Berendsen NPT • Hoover NPT • Berendsen NσT • Hoover NσT HPCx/Biology Discussions
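As a reminder of what the leapfrog integrator listed above does, here is a generic single step (a sketch of the textbook scheme, not DL_POLY's implementation):

```python
import numpy as np

def leapfrog_step(x, v_half, force, mass, dt):
    """One Verlet leapfrog update.

    v_half holds velocities at t - dt/2; returns positions at t + dt and
    velocities at t + dt/2.  force(x) must return the force array.
    """
    v_new = v_half + force(x) / mass * dt   # advance velocities to t + dt/2
    x_new = x + v_new * dt                  # advance positions to t + dt
    return x_new, v_new

# toy harmonic oscillator with unit mass and force constant
x, v = np.array([1.0]), np.array([0.0])
for _ in range(1000):
    x, v = leapfrog_step(x, v, lambda q: -q, 1.0, dt=0.01)
print(x, v)   # stays on a bounded, energy-conserving orbit
```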

  9. Migration from Replicated to Distributed Data: DL_POLY_3 Domain Decomposition • Distribute atoms and forces across the nodes • More memory efficient, can address much larger cases (10⁵-10⁷ atoms) • SHAKE and short-range forces require only neighbour communication • communications scale linearly with the number of nodes • Coulombic energy remains global • strategy depends on problem and machine characteristics • Adopt the Smooth Particle Mesh Ewald (SPME) scheme • includes Fourier transform of the smoothed charge density (reciprocal space grid typically 64×64×64 to 128×128×128) HPCx/Biology Discussions
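A minimal sketch of the bookkeeping behind domain decomposition: each atom is assigned to the processor that owns its spatial block, so short-range forces and SHAKE only need halo exchange with neighbouring blocks. The function below is illustrative pseudocode for the mapping, not DL_POLY_3 source:

```python
import numpy as np

def owner_rank(positions, box, node_grid):
    """Map each atom to the rank that owns its spatial block.

    positions : (N, 3) coordinates, assumed to lie in [0, box)
    box       : orthorhombic cell edge lengths, shape (3,)
    node_grid : number of domains along x, y, z, e.g. (4, 4, 4)
    """
    positions = np.asarray(positions, dtype=float)
    grid = np.asarray(node_grid)
    block = np.floor(positions / np.asarray(box) * grid).astype(int) % grid
    # flatten the 3-D block index (bx, by, bz) into a single rank number
    return (block[:, 0] * grid[1] + block[:, 1]) * grid[2] + block[:, 2]

pos = np.random.default_rng(0).uniform(0.0, 50.0, size=(8, 3))
print(owner_rank(pos, box=(50.0, 50.0, 50.0), node_grid=(4, 4, 4)))
```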

  10. Migration from Replicated to Distributed Data: DL_POLY_3 Coulomb Energy Evaluation • Conventional routines (e.g. FFTW) assume plane or column distributions • A global transpose of the data is required to complete the 3D FFT, and additional costs are incurred re-organising the data from the natural block domain decomposition • An alternative FFT algorithm has been designed to reduce communication costs • the 3D FFT is performed as a series of 1D FFTs, each involving communications only between blocks in a given column • More data is transferred, but in far fewer messages • Rather than all-to-all, the communications are column-wise only • [figure: plane vs. block data distributions] HPCx/Biology Discussions
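The column-wise scheme exploits the fact that a 3D FFT factorises exactly into 1D FFTs applied along each axis in turn, so each stage only needs data that is contiguous along one direction. A serial numpy check of that identity (illustration only; the parallel implementation distributes the grid in blocks):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = rng.standard_normal((8, 8, 8))      # toy smoothed charge-density grid

# 3D FFT performed as three passes of 1D FFTs, one axis at a time
staged = np.fft.fft(np.fft.fft(np.fft.fft(rho, axis=0), axis=1), axis=2)

print(np.allclose(staged, np.fft.fftn(rho)))   # True: identical to a direct 3D FFT
```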

  11. DL_POLY_2 & 3 Differences • Rigid bodies not in _3 • MSD not in _3 • Tethered atoms not in _3 • Standard Ewald not in _3 • HK_Ewald not in _3 • DL_POLY_2 I/O files work in _3 but NOT vice versa • No multiple timestep in _3 HPCx/Biology Discussions

  12. DL_POLY_2 Developments • DL_MULTI - Distributed multipoles • DL_PIMD - Path integral (ionics) • DL_HYPE - Rare event simulation • DL_POLY - Symplectic versions 2/3 • DL_POLY - Multiple timestep • DL_POLY - F90 re-vamp HPCx/Biology Discussions

  13. DL_POLY_3 on HPCx • Test case 1 (552,960 atoms, 300 timesteps) • NaKSi2O5 disilicate glass • SPME (128³ grid) + 3-body terms, 15,625 link cells • 32-512 processors (4-64 nodes) HPCx/Biology Discussions

  14. DL_POLY_3 on HPCx • Test case 2 (792,960 atoms, 10 timesteps) • 64 × Gramicidin (354) + 256,768 H2O • SHAKE + SPME (256³ grid), 14,812 link cells • 16-256 processors (2-32 nodes) HPCx/Biology Discussions

  15. DL_POLY People • Bill Smith DL_POLY_2 & _3 & GUI • w.smith@dl.ac.uk • Ilian Todorov DL_POLY_3 • i.t.todorov@dl.ac.uk • Maurice Leslie DL_MULTI • m.leslie@dl.ac.uk • Further Information: • W. Smith and T.R. Forester, J. Molec. Graphics (1996), 14, 136 • http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml • W. Smith, C.W. Yong, P.M. Rodger, Molecular Simulation (2002), 28, 385 HPCx/Biology Discussions

  16. AMBER, NAMD and Gaussian Lorna Smith and Joachim Hein

  17. AMBER • AMBER (Assisted Model Building with Energy Refinement) • A molecular dynamics program, particularly for biomolecules • Weiner and Kollman, University of California, 1981. • Current version – AMBER7 • Widely used suite of programs • Sander, Gibbs, Roar • Main program for molecular dynamics: Sander • Basic energy minimiser and molecular dynamics • Shared memory version – only for SGI and Cray • MPI version: master / slave, replicated data model HPCx/Biology Discussions

  18. AMBER - Initial Scaling • Factor IX protein with Ca²⁺ ions – 90,906 atoms HPCx/Biology Discussions

  19. Current developments - AMBER • Bob Duke • Developed a new version of Sander on HPCx • Originally called AMD (Amber Molecular Dynamics) • Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics) • Substantial rewrite of the code • Converted to Fortran90, removed multiple copies of routines,… • Likely to be incorporated into AMBER8 • We are looking at optimising the collective communications – the reduction / scatter HPCx/Biology Discussions
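The collective being tuned combines a global sum of the replicated force array with a redistribution of the summed result to the processors that own each atom. A hedged mpi4py sketch of the two ways to express this (array sizes and names are illustrative, not PMEMD's actual buffers):

```python
# run with, e.g.:  mpirun -np 4 python collective_demo.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_atoms = 1024                       # illustrative; assumed divisible by size
forces = np.random.default_rng(rank).standard_normal((n_atoms, 3))

# Option 1: every rank receives the complete summed force array
total = np.empty_like(forces)
comm.Allreduce(forces, total, op=MPI.SUM)

# Option 2: sum and scatter in one call, so each rank receives only the
# slice of atoms it is responsible for
mine = np.empty((n_atoms // size, 3))
comm.Reduce_scatter_block(forces, mine, op=MPI.SUM)

if rank == 0:
    print(np.allclose(total[: n_atoms // size], mine))   # True
```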

  20. Optimisation – PMEMD HPCx/Biology Discussions

  21. NAMD • NAMD • molecular dynamics code designed for high-performance simulation of large biomolecular systems. • Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign. • Versions 2.4, 2.5b and 2.5 available on HPCx • One of the first codes to be awarded a capability incentive rating – bronze HPCx/Biology Discussions

  22. NAMD Performance • Benchmarks from Prof Peter Coveney • TCR-peptide-MHC system HPCx/Biology Discussions

  23. NAMD Performance HPCx/Biology Discussions

  24. Molecular Simulation - NAMD Scaling http://www.ks.uiuc.edu/Research/namd/ • Parallel, object-oriented MD code • High-performance simulation of large biomolecular systems • Scales to 100s of processors on high-end parallel platforms • [chart: speedup vs. number of CPUs for the standard NAMD ApoA-I benchmark, a system comprising 92,442 atoms, with 12Å cutoff and PME every 4 time steps] • Scalability improves with larger simulations - speedup of 778 on 1024 CPUs of TCS-1 in a 327K-particle simulation of F1-ATPase HPCx/Biology Discussions
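For reference, that F1-ATPase figure corresponds to a parallel efficiency of roughly 76%; the arithmetic:

```python
speedup, cpus = 778, 1024      # figures quoted above for the 327K-particle run
print(f"parallel efficiency = {speedup / cpus:.1%}")   # 76.0%
```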

  25. Performance Comparison • Performance comparison between AMBER, CHARMM and NAMD • See: http://www.scripps.edu/brooks/Benchmarks/ • Benchmark • dihydrofolate reductase protein in an explicit water bath with cubic periodic boundary conditions. • 23,558 atoms HPCx/Biology Discussions

  26. Performance HPCx/Biology Discussions

  27. Gaussian • Gaussian 03 • Performs semi-empirical and ab initio molecular orbital calculations. • Gaussian Inc, www.gaussian.com • Shared memory version available on HPCx • Limited to the size of a logical partition (8 processors) • Phase 2 upgrade will allow access to 32 processors • Task farming option HPCx/Biology Discussions
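Task farming here means running many independent Gaussian inputs side by side rather than one large parallel job. A hedged sketch of that pattern (the input file names and the `g03` command line are placeholders, not a documented HPCx interface):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_job(input_file):
    """Launch one independent Gaussian run (placeholder command line)."""
    log_file = input_file.replace(".com", ".log")
    with open(input_file) as inp, open(log_file, "w") as out:
        return subprocess.run(["g03"], stdin=inp, stdout=out).returncode

inputs = [f"conformer_{i}.com" for i in range(16)]    # hypothetical input decks
with ThreadPoolExecutor(max_workers=4) as pool:       # four concurrent jobs
    print(list(pool.map(run_job, inputs)))
```

A thread pool is sufficient here because each worker simply waits on an external process.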

  28. CRYSTAL and CASTEP Ian Bush and Martin Plummer

  29. Crystal • Electronic structure and related properties of periodic systems • All electron, local Gaussian basis set, DFT and Hartree-Fock • Under continuous development since 1974 • Distributed to over 500 sites worldwide • Developed jointly by Daresbury and the University of Turin HPCx/Biology Discussions

  30. Crystal Functionality • Basis Set • LCAO - Gaussians • All electron or pseudopotential • Hamiltonian • Hartree-Fock (UHF, RHF) • DFT (LSDA, GGA) • Hybrid functionals (B3LYP) • Techniques • Replicated data parallel • Distributed data parallel • Forces • Structural optimization • Direct SCF • Visualisation • AVS GUI (DLV) • Properties • Energy • Structure • Vibrations (phonons) • Elastic tensor • Ferroelectric polarisation • Piezoelectric constants • X-ray structure factors • Density of States / Bands • Charge/Spin Densities • Magnetic Coupling • Electrostatics (V, E, EFG classical) • Fermi contact (NMR) • EMD (Compton, e-2e) HPCx/Biology Discussions

  31. Benchmark Runs on Crambin • Very small protein from Crambe abyssinica - 1284 atoms per unit cell • Initial studies using STO-3G (3948 basis functions) • Improved to 6-31G** (12354 functions) • All calculations Hartree-Fock • As far as we know the largest HF calculation ever converged HPCx/Biology Discussions

  32. Crambin - Parallel Performance • Fit measured data to Amdahl's law to obtain an estimate of the speedup • Increasing the basis set size increases the scalability • Speedup of about 700 on 1024 processors for 6-31G** • Takes about 3 hours instead of about 3 months • 99.95% parallel HPCx/Biology Discussions
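A worked check of those numbers under Amdahl's law, taking the quoted 99.95% parallel fraction (a sketch of the fit described above, not the measured data):

```python
def amdahl_speedup(n_procs, parallel_fraction):
    """Amdahl's law: S(N) = 1 / ((1 - p) + p / N)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_procs)

# 99.95% parallel, as estimated for the 6-31G** Crambin runs
print(round(amdahl_speedup(1024, 0.9995)))   # 677, consistent with "about 700"
```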

  33. Results – Electrostatic Potential • Charge density isosurface coloured according to potential • Useful to determine possible chemically active groups HPCx/Biology Discussions

  34. Futures - Rusticyanin • Rusticyanin (Thiobacillus ferrooxidans) has 6284 atoms and is involved in redox processes • We have just started calculations using over 33,000 basis functions • In collaboration with S. Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants HPCx/Biology Discussions

  35. What is Castep? • First principles (DFT) materials simulation code • electronic energy • geometry optimization • surface interactions • vibrational spectra • materials under pressure, chemical reactions • molecular dynamics • Method (direct minimization) • plane wave expansion of valence electrons • pseudopotentials for core electrons HPCx/Biology Discussions
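The cost of a plane-wave calculation like this is governed by the kinetic-energy cutoff: only reciprocal-lattice vectors G with |k + G|²/2 below E_cut enter the valence-electron expansion. A small counting sketch for a cubic cell at the Gamma point, in atomic units (illustrative only, not CASTEP code):

```python
import numpy as np

def count_plane_waves(a, e_cut):
    """Count plane waves with |G|^2 / 2 < e_cut for a cubic cell of side a.

    a in bohr, e_cut in hartree; Gamma point only, so k = 0.
    """
    g_unit = 2.0 * np.pi / a                        # reciprocal-lattice spacing
    n_max = int(np.ceil(np.sqrt(2.0 * e_cut) / g_unit))
    n = np.arange(-n_max, n_max + 1)
    gx, gy, gz = np.meshgrid(n, n, n, indexing="ij")
    g2 = g_unit ** 2 * (gx ** 2 + gy ** 2 + gz ** 2)
    return int(np.count_nonzero(g2 / 2.0 < e_cut))

print(count_plane_waves(a=10.0, e_cut=20.0))   # grows roughly as a**3 * e_cut**1.5
```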

  36. HPCx: biological applications • Examples currently include: • NMR of proteins • hydroxyapatite (major component of bone) • chemical processes following stroke • Possibility of treating systems with a few hundred atoms on HPCx • May be used in conjunction with classical codes (eg DL_POLY) for detailed QM treatment of ‘features of interest’ HPCx/Biology Discussions

  37. Castep 2003 HPCx performance gain HPCx/Biology Discussions

  38. Castep 2003 HPCx performance gain HPCx/Biology Discussions

  39. HPCx: biological applications • Castep (version 2) is written by: • M Segall, P Lindan, M Probert, C Pickard, P Hasnip, S Clark, K Refson, V Milman, B Montanari, M Payne. • ‘Easy’ to understand top-level code. • Castep is fully maintained and supported on HPCx • Castep is distributed by Accelrys Ltd • Castep is licensed free to UK academics by the UKCP consortium (contact ukcp@dl.ac.uk) HPCx/Biology Discussions

  40. CHARMM, NWChem and GAMESS-UK Paul Sherwood

  41. NWChem • Objectives • Highly efficient and portable MPP computational chemistry package • Distributed data: scalable with respect to chemical system size as well as MPP hardware size • Extensible architecture: object-oriented design, abstraction, data hiding, handles, APIs • Parallel programming model: non-uniform memory access, global arrays • Infrastructure: GA, Parallel I/O, RTDB, MA, … • Wide range of parallel functionality essential for HPCx • Tools • Global Arrays: portable distributed-data tool (physically distributed data presented as a single, shared data structure); used by CCP1 groups (e.g. MOLPRO) • PeIGS: parallel eigensolver, guaranteed orthogonality of eigenvectors HPCx/Biology Discussions
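The Global Arrays model gives every process one-sided get/put access to patches of a logically shared matrix whose blocks are physically distributed. A toy sketch of the ownership bookkeeping only (plain Python, not the GA library's actual API):

```python
import numpy as np

class ToyGlobalArray:
    """Row-block distribution of an (n x n) matrix over `nprocs` owners."""

    def __init__(self, n, nprocs):
        self.n, self.block = n, int(np.ceil(n / nprocs))
        # in the real library each process stores only its own block;
        # here all blocks live in one list purely for illustration
        self.blocks = [np.zeros((max(0, min(self.block, n - p * self.block)), n))
                       for p in range(nprocs)]

    def owner(self, row):
        return row // self.block

    def put(self, row, values):
        """One-sided write into whichever block owns `row`."""
        p = self.owner(row)
        self.blocks[p][row - p * self.block, :] = values

    def get(self, row):
        """One-sided read; the caller never needs to know the owner."""
        p = self.owner(row)
        return self.blocks[p][row - p * self.block, :]

ga = ToyGlobalArray(n=10, nprocs=4)
ga.put(7, np.arange(10))
print(ga.owner(7), ga.get(7))
```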

  42. Distributed Data SCF • Pictorial representation of the iterative SCF process in (i) a sequential process and (ii) a distributed-data parallel process: MOAO represents the molecular orbitals, P the density matrix and F the Fock or Hamiltonian matrix • [figure panels: sequential vs. distributed data] HPCx/Biology Discussions
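The loop being distributed is the standard SCF cycle: build F from the current P, diagonalise, form a new P from the occupied orbitals, and repeat until self-consistent. A serial toy version with a random model Hamiltonian (purely illustrative, no real integrals or parallel data layout):

```python
import numpy as np

rng = np.random.default_rng(0)
nbf, nocc = 10, 3                      # basis functions, doubly occupied orbitals

h_core = rng.standard_normal((nbf, nbf))
h_core = (h_core + h_core.T) / 2       # symmetric one-electron "Hamiltonian"
eri = rng.random((nbf, nbf, nbf, nbf)) * 0.01   # tiny random two-electron tensor

p = np.zeros((nbf, nbf))               # density matrix
for it in range(50):
    # Fock build F = h + 2J - K: the expensive step that is distributed in NWChem
    j = np.einsum("pqrs,rs->pq", eri, p)
    k = np.einsum("prqs,rs->pq", eri, p)
    f = h_core + 2.0 * j - k
    f = (f + f.T) / 2                  # keep the toy Fock matrix symmetric
    e, c = np.linalg.eigh(f)           # diagonalisation (PeIGS in the parallel code)
    p_new = c[:, :nocc] @ c[:, :nocc].T
    if np.linalg.norm(p_new - p) < 1e-8:
        break
    p = p_new

print(f"converged after {it} iterations; HOMO-LUMO gap = {e[nocc] - e[nocc - 1]:.3f}")
```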

  43. NWChem Capabilities (Direct, Semi-direct and conventional): • RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives. • DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives. • CASSCF; analytic 1st and numerical 2nd derivatives. • Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st derivatives and numerical 2nd derivatives. • Coupled cluster, CCSD and CCSD(T) using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy. • Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources HPCx/Biology Discussions

  44. Case Studies - Zeolite Fragments • DFT calculations with Coulomb fitting • Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H • Fitting basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H • NWChem & GAMESS-UK: both codes use an auxiliary fitting basis for the Coulomb energy, with 3-centre 2-electron integrals held in core • Fragments: Si8O7H18 (347/832), Si8O25H18 (617/1444), Si26O37H36 (1199/2818), Si28O67H30 (1687/3928) HPCx/Biology Discussions

  45. DFT Coulomb Fit - NWChem • [charts: measured time (seconds) vs. number of CPUs for Si26O37H36 (1199/2818) and Si28O67H30 (1687/3928)] HPCx/Biology Discussions

  46. Memory-driven Approaches: NWChem DFT (LDA) Performance on the IBM SP/p690 • Zeolite ZSM-5 • DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis: AO basis 3554, CD basis 12713 • IBM SP/p690 wall time (13 SCF iterations): 64 CPUs = 9,184 seconds, 128 CPUs = 3,966 seconds • MIPS R14k-500 CPUs (Teras) wall time (13 SCF iterations): 64 CPUs = 5,242 seconds, 128 CPUs = 3,451 seconds • 3-centre 2e-integrals = 1.00 × 10¹² • After Schwarz screening = 6.54 × 10⁹ • % of 3c 2e-integrals in core = 100% HPCx/Biology Discussions
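The Schwarz figures above use the Cauchy-Schwarz bound |(ij|kl)| ≤ sqrt((ij|ij)) · sqrt((kl|kl)): any integral batch whose bound falls below the drop threshold is never computed. A toy sketch of the counting (random magnitudes, not NWChem's integral code):

```python
import numpy as np

def schwarz_survivors(q_diag, threshold=1e-10):
    """Count integral quartets passing the Cauchy-Schwarz test.

    q_diag[p] holds the Schwarz factor sqrt((p|p)) for shell pair p; a quartet
    (p, r) is kept when q_diag[p] * q_diag[r] >= threshold.
    """
    bound = np.outer(q_diag, q_diag)       # upper bound for every quartet
    return int(np.count_nonzero(bound >= threshold)), bound.size

rng = np.random.default_rng(2)
factors = 10.0 ** rng.uniform(-9, 1, size=2000)   # magnitudes over many decades
kept, total = schwarz_survivors(factors)
print(f"{kept} of {total} quartets survive screening ({kept / total:.1%})")
```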

  47. GAMESS-UK • GAMESS-UK is the general purpose ab initio molecular electronic structure program for performing SCF-, MCSCF- and DFT-gradient calculations, together with a variety of techniques for post Hartree Fock calculations. • The program is derived from the original GAMESS code, obtained from Michel Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC), and has been extensively modified and enhanced over the past decade. • This work has included contributions from numerous authors†, and has been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in the on-going development and support of the program include various academic funding agencies in the Netherlands, and ICI plc. • Additional information on the code may be found from links at: http://www.dl.ac.uk/CFS † M.F. Guest, J.H. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V. Bonacic-Koutecky van Lenthe, J. Kendrick, K. Schoffel & P. Sherwood, with contributions from R.D., W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone and D. Tozer. HPCx/Biology Discussions

  48. GAMESS-UK features 1. • Hartree Fock: • Segmented/ GC + spherical harmonic basis sets • SCF-Energies and Gradients: conventional, in-core, direct • SCF-Frequencies: numerical and analytic 2nd derivatives • Restricted, unrestricted open shell SCF and GVB. • Density Functional Theory • Energies + gradients, conventional and direct including Dunlap fit • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 & LDA functionals • Numerical 2nd derivatives (analytic implementation in testing) • Electron Correlation: • MP2 energies, gradients and frequencies, Multi-reference MP2, MP3 Energies • MCSCF and CASSCF Energies, gradients and numerical 2nd derivatives • MR-DCI Energies, properties and transition moments (semi-direct module) • CCSD and CCSD(T) Energies • RPA (direct) and MCLR excitation energies / oscillator strengths, RPA gradients • Full-CI Energies • Green's functions calculations of IPs. • Valence bond (Turtle) HPCx/Biology Discussions

  49. GAMESS-UK features 2. • Molecular Properties: • Mulliken and Lowdin population analysis, Electrostatic Potential-Derived Charges • Distributed Multipole Analysis, Morokuma Analysis, Multipole Moments • Natural Bond Orbital (NBO) + Bader Analysis • IR and Raman Intensities, Polarizabilities & Hyperpolarizabilities • Solvation and Embedding Effects (DRF) • Relativistic Effects (ZORA) • Pseudopotentials: • Local and non-local ECPs. • Visualisation: tools include CCP1 GUI • Hybrid QM/MM (ChemShell + CHARMM QM/MM) • Semi-empirical : MNDO, AM1, and PM3 hamiltonians • Parallel Capabilities: • MPP and SMP implementations (GA tools) • SCF/DFT energies, gradients, frequencies • MP2 energies and gradients • Direct RPA HPCx/Biology Discussions

  50. Parallel Implementation of GAMESS-UK • Extensive use of Global Array (GA) Tools and Parallel Linear Algebra from NWChem Project (EMSL) • SCF and DFT • Replicated data, but … • GA Tools for caching of I/O for restart and checkpoint files • Storage of 2-centre 2-e integrals in DFT Jfit • Linear Algebra (via PeIGS, DIIS/MMOs, Inversion of 2c-2e matrix) • SCF and DFT second derivatives • Distribution of <vvoo> and <vovo> integrals via GAs • MP2 gradients • Distribution of <vvoo> and <vovo> integrals via GAs • Direct RPA Excited States • Replicated data with parallelisation of direct integral evaluation HPCx/Biology Discussions
