First-Principles Molecular Dynamics for Petascale Computers
François Gygi, Dept of Applied Science, UC Davis
fgygi@ucdavis.edu
http://eslab.ucdavis.edu
Zhaojun Bai, Dept of Computer Science, UC Davis
Giulia Galli, Dept of Chemistry, UC Davis
Kwan-Liu Ma, Dept of Computer Science, UC Davis
Supported by NSF-ITR-HECURA 0749217

The Qbox project
• Qbox is a C++/MPI implementation of First-Principles Molecular Dynamics (FPMD)
• Qbox includes a quantum mechanical description of electronic structure within Density Functional Theory
• Applications to Materials Science, Chemistry, Nanoscience
• Software development focuses on large-scale parallelism

Qbox code architecture
Components shown in the architecture diagram:
• Qbox
• ScaLAPACK/PBLAS
• XercesC (XML parser)
• BLAS/ATLAS
• BLACS
• FFTW lib
• DGEMM lib
• MPI
http://eslab.ucdavis.edu/software/qbox

Qbox performance results
• Electronic structure of a 1000-atom Molybdenum sample
• 12,000 electrons
• LLNL BlueGene/L
• 1 k-point: 108.8 TFlop/s (30% of peak)
• 4 k-points: 187.7 TFlop/s (51% of peak)
• 8 k-points: 207.3 TFlop/s (56% of peak)
2006 ACM/IEEE Gordon Bell Award for peak performance

Current Qbox availability on TeraGrid platforms
• Mercury, NCSA
• Cobalt, NCSA
• Tungsten, NCSA
• BlueGene/L, SDSC
• IBM p655, SDSC
Other platforms
• ANL BG/L
• ANL BG/P
• NERSC Franklin, Cray XT4
• NCSA Abe

New scalable algorithms for electronic structure calculations
• One-sided Jacobi simultaneous diagonalization algorithm used in electronic structure calculations
• 64-node dual dual-core AMD Opteron/InfiniPath cluster
• 1 rack ANL BlueGene/L
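The slide names the algorithm but does not describe it. As background only, the sketch below shows the textbook one-sided (Hestenes) Jacobi iteration applied to a single matrix: plane rotations are applied to pairs of columns until all columns are mutually orthogonal. This is not the simultaneous-diagonalization variant used in Qbox and is not taken from the Qbox source; the function name `one_sided_jacobi_svd` and its parameters are hypothetical.

```python
import math

def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=50):
    """Textbook one-sided Jacobi: return (singular_values, v) for matrix `a`.

    `a` is a list of rows. Rotations act on column pairs only, which is the
    property that makes the one-sided form attractive for column-distributed data.
    """
    m, n = len(a), len(a[0])
    u = [row[:] for row in a]  # working copy; its columns converge to U * Sigma
    v = [[float(i == j) for j in range(n)] for i in range(n)]  # accumulated rotations
    for _ in range(max_sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(u[i][p] ** 2 for i in range(m))       # |col p|^2
                beta = sum(u[i][q] ** 2 for i in range(m))        # |col q|^2
                gamma = sum(u[i][p] * u[i][q] for i in range(m))  # col p . col q
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that zeroes the inner product of the two columns.
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0.0 else -1.0
                t = sign / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for mat in (u, v):
                    for row in mat:
                        rp, rq = row[p], row[q]
                        row[p] = c * rp - s * rq
                        row[q] = s * rp + c * rq
        if not rotated:
            break  # a full sweep without rotations: converged
    sigma = [math.sqrt(sum(u[i][j] ** 2 for i in range(m))) for j in range(n)]
    return sigma, v
```

For example, `one_sided_jacobi_svd([[3.0, 0.0], [4.0, 5.0]])` converges after a single rotation, and the returned column norms are the singular values of the matrix.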

Qbox scalability for nanoscience applications
• Electronic structure of a 2260-atom silicon nanowire
• Cray-XT4, up to 8k CPUs
• Superlinear scaling due to cache effects and size-dependent MPI protocols
• 86% parallel efficiency between 2k and 8k CPUs
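The efficiency figure above can be read as a strong-scaling efficiency: the measured speedup between two CPU counts divided by the ideal speedup. The sketch below assumes that definition and assumes "2k" and "8k" mean 2048 and 8192 CPUs. The timings are hypothetical values chosen only to illustrate the arithmetic; they are not measurements from the slide.

```python
def parallel_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Strong-scaling efficiency of a run on p_scaled CPUs relative to p_base CPUs.

    t_base, t_scaled: wall-clock times for the same problem size.
    Returns 1.0 for ideal scaling and a value above 1.0 for superlinear scaling.
    """
    speedup = t_base / t_scaled
    ideal_speedup = p_scaled / p_base
    return speedup / ideal_speedup

# Hypothetical wall-clock times per step (seconds), for illustration only.
t_2k = 400.0
t_8k = 116.3
eff = parallel_efficiency(t_2k, 2048, t_8k, 8192)
print(f"efficiency from 2048 to 8192 CPUs: {eff:.2f}")
```

With this definition, superlinear scaling, as mentioned on the slide, corresponds to an efficiency greater than 1 over the CPU range where it occurs.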

Qbox parallel I/O strategy
• Advanced functions in MPI-IO are not supported by all file systems (MPI_File_write_shared, etc.)
• Qbox uses a strategy based on shared file pointer objects
• Achieves >700 MB/s write rate for file sizes of 50–250 GB

Analysis of MPI message traffic patterns in Qbox
• Multiple traffic patterns are involved during a Qbox simulation
    • physics kernels
    • 3D Fourier transforms
    • ScaLAPACK linear algebra
• Logical-to-physical mapping of tasks has a large impact on performance on large platforms (> 4k CPUs)
• We are developing instrumentation and visualization tools to analyze message traffic patterns on various interconnect architectures
Figure: Mapping of 65536 MPI tasks on the 32x32x64 torus of the LLNL BG/L
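To make the mapping issue concrete, the sketch below places 65536 task ranks on a 32x32x64 torus and counts the network hops between two tasks. The x-fastest rank ordering is an assumption chosen for illustration, not the mapping used in Qbox or on BG/L, and the function names `rank_to_coords` and `torus_hops` are hypothetical.

```python
DIMS = (32, 32, 64)  # torus dimensions from the slide; 32 * 32 * 64 = 65536 nodes

def rank_to_coords(rank, dims=DIMS):
    """Map a task rank to (x, y, z) torus coordinates, x varying fastest (assumed)."""
    nx, ny, nz = dims
    if not 0 <= rank < nx * ny * nz:
        raise ValueError("rank out of range for this torus")
    x = rank % nx
    y = (rank // nx) % ny
    z = rank // (nx * ny)
    return (x, y, z)

def torus_hops(a, b, dims=DIMS):
    """Minimum number of hops between nodes a and b, using wraparound links."""
    return sum(min(abs(i - j), n - abs(i - j)) for i, j, n in zip(a, b, dims))

# Ranks that are adjacent in the logical ordering are not always one hop apart.
print(torus_hops(rank_to_coords(0), rank_to_coords(1)))    # same row in x
print(torus_hops(rank_to_coords(31), rank_to_coords(32)))  # crosses a row boundary
```

Summing `torus_hops` over the task pairs of a given communication pattern gives a rough way to compare candidate mappings before running on the machine.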

Analysis of MPI message traffic patterns in Qbox
• Screenshot of the message traffic visualization tool showing MPI calls in a ScaLAPACK matrix multiplication (C. Muelder, K-L. Ma, UC Davis)

Qbox current developments
• Deployment on TeraGrid track-2 platforms
• Applications to Nanoscience simulations
    • G. Galli, Chemistry, UC Davis
• Specialized linear algebra algorithms
    • Z. Bai, Computer Science, UC Davis
• Visualization
    • K-L. Ma, Computer Science, UC Davis
• Application-specific data compression algorithms
• Large dataset management (10^10 to 10^12 bytes)
• XML standards for electronic structure data (http://www.quantum-simulation.org)
http://eslab.ucdavis.edu
Supported by NSF-ITR-HECURA 0749217