Advanced User Support: Project Overview
Adrian E. Roitberg, University of Florida
July 2nd, 2009
By Ross C. Walker
Project Overview: Improvements to the AMBER MD Code
• Project composed of 3 distinct sections:
  • QM/MM performance, accuracy and feature improvements (SANDER)
  • Classical AMBER MD performance improvements (PMEMD)
  • GPU acceleration (PMEMD)
QM/MM Improvements
Performance
• AMBER QM/MM performance in parallel and serial.
• Looking at new / novel approaches for QM/MM in parallel:
  • Predictive SCF.
  • Parallel Fock matrix diagonalization.
  • Automated selection of optimum routines.
• Faster distance algorithms for variable QM solvent models.
Features (requiring parallel support)
• QM/MM Replica Exchange simulations (complete).
• QM/MM Thermodynamic Integration (work in progress).
Automated Optimum Routine Selection
| QMMM: *** Diagonalization Routine Information ***
| QMMM: Pseudo diagonalizations are allowed.
| QMMM: Auto diagonalization routine selection is in use.
| QMMM:
| QMMM: Timing diagonalization routines:
| QMMM:                          norbs = 168
| QMMM: diag iterations used for timing = 50
| QMMM:
| QMMM: Internal diag routine = 1.03 seconds
| QMMM:    Dspev diag routine = 1.05 seconds
| QMMM:   Dspevd diag routine = 0.87 seconds
| QMMM:   Dspevx diag routine = 2.55 seconds
| QMMM:    Dsyev diag routine = 0.98 seconds
| QMMM:   Dsyevd diag routine = 0.75 seconds
| QMMM:   Dsyevr diag routine = 0.52 seconds
| QMMM:
| QMMM:   Pseudo diag routine = 0.38 seconds
| QMMM:
| QMMM: Using dsyevr routine (diag_routine=7).

The latest version also selects the optimum number of SMP threads to use in hybrid MPI/SMP mode. The selection logic amounts to the small timing harness sketched below.
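The idea behind the auto-selection is simple: run each candidate diagonalizer on a matrix of the actual size (norbs) for a fixed number of iterations, then use the fastest one for the rest of the run. The C++ sketch below illustrates that pattern only; the candidate routines here are hypothetical stand-ins, not AMBER's actual LAPACK wrappers.

// Minimal sketch of timing-based routine selection (hypothetical stand-ins,
// not AMBER's actual diagonalization wrappers).
#include <chrono>
#include <cstddef>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

struct Candidate {
    std::string name;
    std::function<void()> diagonalize;  // wraps e.g. a LAPACK call on a norbs x norbs matrix
};

// Time each candidate for 'iters' repetitions and return the index of the fastest.
std::size_t select_fastest(const std::vector<Candidate>& candidates, int iters) {
    std::size_t best = 0;
    double best_seconds = 1e300;
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        auto t0 = std::chrono::steady_clock::now();
        for (int k = 0; k < iters; ++k) candidates[i].diagonalize();
        double seconds = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - t0).count();
        std::cout << candidates[i].name << " diag routine = " << seconds << " seconds\n";
        if (seconds < best_seconds) { best_seconds = seconds; best = i; }
    }
    return best;
}

int main() {
    // Placeholder candidates; in practice each lambda would call a real diagonalizer.
    std::vector<Candidate> candidates = {
        {"Dsyevd", [] { /* dsyevd-style call would go here */ }},
        {"Dsyevr", [] { /* dsyevr-style call would go here */ }},
    };
    std::size_t best = select_fastest(candidates, 50);
    std::cout << "Using " << candidates[best].name << " routine\n";
}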
Replica Exchange Simulations
• Low-temperature simulations converge poorly.
• Replica exchange simulations: multiple MD runs spanning a range of temperatures.
• Periodically swap structures between replicas (exchange criterion sketched below).
• Faster convergence.
• No direct kinetics.
• Yields populations as a function of temperature.
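For reference, the swap between two replicas is accepted with the standard temperature-REMD Metropolis criterion, which depends only on the product of the inverse-temperature and potential-energy differences. The sketch below is illustrative, not the SANDER implementation.

// Minimal sketch of the temperature-REMD exchange test (illustrative only,
// not the SANDER implementation).
#include <cmath>
#include <random>

// Accept a swap between replicas i and j with probability
//   min(1, exp[(beta_i - beta_j) * (E_i - E_j)]),
// where beta = 1/(kB*T) and E is the current potential energy.
bool attempt_exchange(double temp_i, double temp_j,
                      double energy_i, double energy_j,
                      std::mt19937_64& rng) {
    const double kB = 0.0019872041;  // kcal/(mol*K), matching AMBER energy units
    const double beta_i = 1.0 / (kB * temp_i);
    const double beta_j = 1.0 / (kB * temp_j);
    const double delta = (beta_i - beta_j) * (energy_i - energy_j);
    if (delta >= 0.0) return true;  // exp(delta) >= 1, so always accept
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    return uniform(rng) < std::exp(delta);
}

On acceptance the two replicas typically exchange temperatures (equivalently, structures), with velocities rescaled to the new target temperature.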
Science Example
• Manuscript just accepted (without revision!):
  • Seabra, G. M., Walker, R. C., Roitberg, A. E., "Are Current Semi-Empirical Methods Better than Force Fields? A Study from the Thermodynamics Perspective", J. Phys. Chem. A, 2009, in press.
• Critical evaluation of the accuracy of semi-empirical QM approaches.
• 1.6 microseconds of QM/MM MD using REMD completed for various polypeptides.
• Calculations run on the NICS Kraken XT5.
• Unprecedented timescales for QM/MM simulations.
Classical MD Performance
• Optimization of PMEMD for the NICS Kraken XT5 and Athena XT4.
• Optimizations aimed at large systems (>100K atoms) on high CPU counts (>512 CPUs).
• Precompilation for a fixed system size:
  • Provides fixed vector lengths.
• New parallel random number generator for Langevin dynamics (the idea is sketched below).
Performance improvement from 6.2 ns/day to 16.3 ns/day for neuraminidase (125K atoms) on the NICS Athena XT4 (1024 CPUs).
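The point of a parallel random number generator for Langevin dynamics is that the random forces should not depend on how atoms are distributed across MPI ranks. One common way to achieve this, shown below purely as an illustration and not as the actual PMEMD generator, is a counter-based stream keyed on (seed, atom index, step), so any rank can reproduce the same deviate for a given atom without communication.

// Illustrative counter-based Gaussian stream for Langevin dynamics
// (not the actual PMEMD generator): the random force component for
// (atom, step) is a pure function of (seed, atom, step), so it is the
// same regardless of which MPI rank owns the atom.
#include <cmath>
#include <cstdint>

// SplitMix64-style mixing of a 64-bit counter into a scrambled 64-bit word.
static std::uint64_t mix64(std::uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

// Uniform double in (0,1] derived from (seed, atom, step, component).
static double uniform01(std::uint64_t seed, std::uint64_t atom,
                        std::uint64_t step, std::uint64_t component) {
    std::uint64_t h = mix64(seed ^ mix64(atom ^ mix64(step ^ mix64(component))));
    return (static_cast<double>(h >> 11) + 1.0) * (1.0 / 9007199254740992.0);  // / 2^53
}

// Standard normal deviate via Box-Muller; used to build the Langevin random force.
double gaussian(std::uint64_t seed, std::uint64_t atom,
                std::uint64_t step, std::uint64_t component) {
    const double two_pi = 6.283185307179586;
    const double u1 = uniform01(seed, atom, step, 2 * component);
    const double u2 = uniform01(seed, atom, step, 2 * component + 1);
    return std::sqrt(-2.0 * std::log(u1)) * std::cos(two_pi * u2);
}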
Neuraminidase Simulations
• H1N1 swine flu.
• Collaborative project between:
  • Ross Walker (SDSC)
  • Adrian Roitberg (UFL)
  • Rommie Amaro (UC Irvine)
  • Andy McCammon (UCSD)
• ca. 125K atoms each.
• 100 ns per system run in 2 weeks on NICS Athena. (Previously this would have been impossible.)
GPU Support
• Collaboration with NVIDIA to produce a CUDA version of AMBER.
• PMEMD engine:
  • Implicit solvent GB (v1.0 complete).
  • Explicit solvent PME (in progress).
• Focus on accuracy:
  • It MUST pass the AMBER regression tests.
  • Energy conservation MUST be comparable to the double-precision CPU code.
GPU Accuracy
• Using double precision everywhere severely limits GPU performance.
• Instead, make careful use of double precision only where needed:
  • Calculate in single precision.
  • Accumulate in double precision.
• Avoid large dynamic ranges in arithmetic expressions.
• Switch over to double precision automatically if the dynamic range is too high.
• A sketch of this mixed-precision pattern follows below.
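The sketch below shows the basic pattern on the CPU side: each pairwise term is computed in single precision, while the running total is held in a double-precision accumulator. It is illustrative only and is not the PMEMD CUDA kernel code.

// Illustrative mixed-precision accumulation (not the PMEMD CUDA kernels):
// individual pair energies are computed in float, the running total in double.
#include <cmath>
#include <cstddef>
#include <vector>

struct Atom { float x, y, z, charge; };

// Coulomb energy with single-precision per-pair arithmetic and a
// double-precision accumulator, mirroring the strategy described above.
double coulomb_energy(const std::vector<Atom>& atoms) {
    double total = 0.0;                                 // accumulate in double precision
    for (std::size_t i = 0; i < atoms.size(); ++i) {
        for (std::size_t j = i + 1; j < atoms.size(); ++j) {
            const float dx = atoms[i].x - atoms[j].x;   // per-pair work in float
            const float dy = atoms[i].y - atoms[j].y;
            const float dz = atoms[i].z - atoms[j].z;
            const float r  = std::sqrt(dx * dx + dy * dy + dz * dz);
            const float e  = atoms[i].charge * atoms[j].charge / r;
            total += static_cast<double>(e);            // promote only when accumulating
        }
    }
    return total;
}

Accumulating in double keeps the roundoff from millions of small single-precision contributions from drifting the total, which is what makes energy conservation comparable to the double-precision CPU code.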
Provisional Performance
• 1 x Tesla C1060 (same as NCSA Lincoln).
• Multi-GPU support is work in progress.

Timings
-------
System                   Wall Time (s)   ns/day
ACE-ALA3-NME (GPU)              48.57    355.92
ACE-ALA3-NME (4xCPU)*           34.78    496.84
TRPCage (GPU)                   62.48    276.57
TRPCage (8xCPU)                150.84    114.56
Myoglobin (GPU)                 62.22     27.72
Myoglobin (8xCPU)              392.34      4.04
Nucleosome (GPU)               332.03      0.520
Nucleosome (8xCPU)            2877.60      0.060
Cray XT5 (1024xCPU)            175.49      0.985

(*Note: 4 CPUs, since we need 10x more atoms than processors.)