HFODD for Leadership Class Computers
N. Schunck, J. McDonnell, Hai Ah Nam
HFODD for Leadership Class Computers
DFT AND HPC COMPUTING
Classes of DFT solvers
• Coordinate-space: direct integration of the HFB equations
  • Accurate: provides the "exact" result
  • Slow and CPU/memory intensive for 2D-3D geometries
• Configuration space: expansion of the solutions on a basis (HO)
  • Fast and amenable to beyond-mean-field extensions
  • Truncation effects: source of divergences/renormalization issues
  • Wrong asymptotics unless different bases are used (WS, PTG, Gamow, etc.)
[Table: resources needed for a "standard HFB" calculation]
Why High Performance Computing?
Core of DFT: a global theory that averages out individual degrees of freedom
• From light nuclei to neutron stars
• Rich physics
• Fast and reliable
• Treatment of correlations?
• ~100 keV-level precision?
• Extrapolability?
The ground state of an even nucleus can be computed in a matter of minutes on a standard laptop: why bother with supercomputing?
• Large-scale DFT:
  • Static: fission, shape coexistence, etc. – compute > 100k different configurations
  • Dynamics: restoration of broken symmetries, correlations, time-dependent problems – combine > 100k configurations
  • Optimization of extended functionals on larger sets of experimental data
Computational Challenges for DFT
• Self-consistency = iterative process:
  • Not naturally prone to parallelization (suggests: lots of thinking…)
  • Computational cost: (number of iterations) × (cost of one iteration) + O(everything else)
• Cost of symmetry breaking: triaxiality, reflection asymmetry, time-reversal invariance
  • Large dense matrices (LAPACK) constructed and diagonalized many times – sizes of the order of (2,000 × 2,000) to (10,000 × 10,000) (suggests: message passing; see the sketch below)
  • Many long loops (suggests: threading)
• Finite-range forces/non-local functionals: exact Coulomb, Yukawa- and Gogny-like
  • Many nested loops (suggests: threading)
  • Precision issues
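As a concrete illustration of the repeated dense diagonalizations, here is a minimal LAPACK sketch; the matrix contents, dimension, and program name are placeholders and are not taken from HFODD:

  ! Minimal sketch: diagonalize one dense symmetric matrix with LAPACK,
  ! as a DFT solver does repeatedly at each self-consistent iteration.
  ! The matrix filled below is a stand-in for the HFB matrix.
  program diag_sketch
    implicit none
    integer, parameter :: n = 2000            ! illustrative dimension
    double precision, allocatable :: a(:,:), w(:), work(:)
    integer, allocatable :: iwork(:)
    integer :: lwork, liwork, info, i, j

    allocate(a(n,n), w(n))
    do j = 1, n                               ! placeholder symmetric matrix
       do i = 1, n
          a(i,j) = 1.0d0 / dble(i + j)
       end do
    end do

    allocate(work(1), iwork(1))               ! workspace query
    call dsyevd('V', 'L', n, a, n, w, work, -1, iwork, -1, info)
    lwork = int(work(1)); liwork = iwork(1)
    deallocate(work, iwork); allocate(work(lwork), iwork(liwork))

    ! Eigenvalues returned in w, eigenvectors overwrite a
    call dsyevd('V', 'L', n, a, n, w, work, lwork, iwork, liwork, info)
    if (info /= 0) stop 'dsyevd failed'
  end program diag_sketch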
HFODD
• Solves the HFB equations in the deformed, Cartesian HO basis
• Breaks all symmetries (if needed)
• Zero-range and finite-range forces coded
• Additional features: cranking, angular momentum projection, etc.
• Technicalities:
  • Fortran 77, Fortran 90
  • BLAS, LAPACK
  • I/O through standard input/output + a few files
Render unto Caesar the things that are Caesar's
HFODD for Leadership Class Computers
OPTIMIZATIONS
Loop reordering
• Fortran: matrices are stored in memory column-wise; elements must be accessed first by column index, then by row index (good stride)
• The cost of bad stride grows quickly with the number of indices and the dimensions
• Example, accessing M(i,j,k) (see the sketch below):
    bad stride:   do i = 1, N / do j = 1, N / do k = 1, N   (first index in the outermost loop)
    good stride:  do k = 1, N / do j = 1, N / do i = 1, N   (first index in the innermost loop)
[Figure: time of 10 HF iterations as a function of the model space (Skyrme SLy4, 208Pb, HF, exact Coulomb exchange)]
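A minimal sketch of the reordering; the array name and size are illustrative only:

  program stride_sketch
    implicit none
    integer, parameter :: n = 200
    double precision, allocatable :: m(:,:,:)
    double precision :: s
    integer :: i, j, k

    allocate(m(n,n,n))
    m = 1.0d0
    s = 0.0d0

    ! Bad stride (commented out): i, the fastest-varying index in
    ! column-major storage, sits in the outermost loop, so consecutive
    ! iterations jump through memory.
    ! do i = 1, n
    !    do j = 1, n
    !       do k = 1, n
    !          s = s + m(i,j,k)

    ! Good stride: i is the innermost loop, so memory is read contiguously.
    do k = 1, n
       do j = 1, n
          do i = 1, n
             s = s + m(i,j,k)
          end do
       end do
    end do
    print *, s
  end program stride_sketch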
Threading (OpenMP)
• OpenMP is designed to automatically parallelize loops
• Example: calculation of the density matrix in the HO basis, a triple loop of the form
    do j = 1, N
      do i = 1, N
        do k = 1, N     ! summation index
• Solutions (see the sketch below):
  • Thread it with OpenMP
  • When possible, replace such manual linear algebra with BLAS/LAPACK calls (threaded versions exist)
[Figure: time of 10 HFB iterations as a function of the number of threads (Jaguar Cray XT5 – Skyrme SLy4, 152Dy, HFB, 14 full shells)]
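A minimal sketch of the two options for a density-like contraction rho(i,j) = Σ_k v(i,k) v(j,k); the array names and the choice of dgemm as the BLAS replacement are illustrative assumptions:

  program thread_sketch
    implicit none
    integer, parameter :: n = 1000
    double precision, allocatable :: v(:,:), rho(:,:)
    integer :: i, j, k

    allocate(v(n,n), rho(n,n))
    v = 1.0d-3

    ! Option 1: thread the manual loops with OpenMP
    !$omp parallel do private(i, k)
    do j = 1, n
       do i = 1, n
          rho(i,j) = 0.0d0
          do k = 1, n
             rho(i,j) = rho(i,j) + v(i,k) * v(j,k)
          end do
       end do
    end do
    !$omp end parallel do

    ! Option 2: hand the same contraction to (threaded) BLAS: rho = v v^T
    call dgemm('N', 'T', n, n, n, 1.0d0, v, n, v, n, 0.0d0, rho, n)

    print *, rho(1,1)
  end program thread_sketch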
Parallel Performance (MPI)
• DFT = naturally parallel: 1 core = 1 configuration (only if everything fits on one core); see the sketch below
• HFODD characteristics:
  • Very little communication overhead
  • Lots of I/O per processor (specific to that processor): 3 ASCII files/core
• Scalability limited by:
  • File-system performance
  • Usability of the results (handling of thousands of files)
  • ADIOS library being implemented
[Figure: time of 10 HFB iterations as a function of the number of cores (Jaguar Cray XT5, no threads – Skyrme SLy4, 152Dy, HFB, 14 full shells)]
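A minimal sketch of the "one core = one configuration" model; the routine run_one_hfb_configuration is a hypothetical placeholder, not HFODD's API:

  program mpi_farm_sketch
    use mpi
    implicit none
    integer, parameter :: nconf = 100000    ! e.g. >100k configurations
    integer :: rank, nprocs, ierr, iconf

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, rank, ierr)
    call mpi_comm_size(mpi_comm_world, nprocs, ierr)

    ! Round-robin distribution: rank r handles configurations
    ! r+1, r+1+nprocs, r+1+2*nprocs, ... with no inter-rank communication
    do iconf = rank + 1, nconf, nprocs
       call run_one_hfb_configuration(iconf)
    end do

    call mpi_finalize(ierr)

  contains

    subroutine run_one_hfb_configuration(iconf)
      integer, intent(in) :: iconf
      ! A real solver would set up the constraints for this configuration,
      ! iterate the HFB equations to convergence, and write its own output files.
    end subroutine run_one_hfb_configuration

  end program mpi_farm_sketch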
ScaLAPACK
[Diagram: blocks of the matrix M distributed across the cores]
• Multi-threading: more memory available per core
• How about the scalability of the diagonalization for large model spaces?
• ScaLAPACK successfully implemented for simplex-breaking HFB calculations (J. McDonnell); see the sketch below
• Current issues:
  • Needs detailed profiling, as no speed-up is observed: where is the bottleneck?
  • Is the problem size adequate?
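For reference, a minimal sketch of a ScaLAPACK eigensolve (pdsyev) on a block-cyclically distributed symmetric matrix; the grid shape, block size, and matrix size are illustrative, and this shows only the general pattern, not the HFODD routine:

  program scalapack_sketch
    implicit none
    integer, parameter :: n = 10000, nb = 64   ! global size, block size (illustrative)
    integer :: ictxt, nprow, npcol, myrow, mycol
    integer :: iam, nprocs, locr, locc, lld, lwork, info
    integer :: desca(9), descz(9)
    integer, external :: numroc
    double precision, allocatable :: a(:,:), z(:,:), w(:), work(:)

    ! Process grid: a 1 x P grid for simplicity (a near-square grid scales better)
    call blacs_pinfo(iam, nprocs)
    nprow = 1;  npcol = nprocs
    call blacs_get(-1, 0, ictxt)
    call blacs_gridinit(ictxt, 'Row-major', nprow, npcol)
    call blacs_gridinfo(ictxt, nprow, npcol, myrow, mycol)

    ! Local dimensions of the block-cyclic distribution
    locr = numroc(n, nb, myrow, 0, nprow)
    locc = numroc(n, nb, mycol, 0, npcol)
    lld  = max(1, locr)
    call descinit(desca, n, n, nb, nb, 0, 0, ictxt, lld, info)
    call descinit(descz, n, n, nb, nb, 0, 0, ictxt, lld, info)

    allocate(a(lld,locc), z(lld,locc), w(n))
    a = 0.0d0    ! each process would fill its local blocks of the HFB matrix here

    allocate(work(1))                          ! workspace query
    call pdsyev('V', 'L', n, a, 1, 1, desca, w, z, 1, 1, descz, work, -1, info)
    lwork = int(work(1));  deallocate(work);  allocate(work(lwork))

    ! Parallel diagonalization: eigenvalues in w, eigenvectors in z
    call pdsyev('V', 'L', n, a, 1, 1, desca, w, z, 1, 1, descz, work, lwork, info)

    call blacs_gridexit(ictxt)
    call blacs_exit(0)
  end program scalapack_sketch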
Hybrid MPI/OpenMP Parallel Model
• Spread one HFB calculation across a few cores (< 12-24)
• MPI for task management (see the sketch below)
[Diagram: within one HFB calculation, OpenMP threads handle loop optimization; an optional MPI sub-communicator runs ScaLAPACK for very large bases; a top-level MPI layer distributes the tasks (HFB configuration i/N, then (i+1)/N, ...) over cores and time]
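A minimal sketch of the task-splitting layer: the world communicator is divided into small sub-communicators (a hypothetical group size of 12 here), each of which would own one HFB calculation and could host ScaLAPACK, while OpenMP threads the loops inside each rank. The group size and names are illustrative assumptions, not HFODD's actual scheme:

  program hybrid_sketch
    use mpi
    implicit none
    integer, parameter :: group_size = 12      ! illustrative sub-group size
    integer :: rank, nprocs, ierr, color, subcomm, subrank, subsize

    call mpi_init(ierr)
    call mpi_comm_rank(mpi_comm_world, rank, ierr)
    call mpi_comm_size(mpi_comm_world, nprocs, ierr)

    ! Ranks with the same color end up in the same sub-communicator
    color = rank / group_size
    call mpi_comm_split(mpi_comm_world, color, rank, subcomm, ierr)
    call mpi_comm_rank(subcomm, subrank, ierr)
    call mpi_comm_size(subcomm, subsize, ierr)

    ! Each sub-communicator would handle one HFB configuration at a time;
    ! inside each rank, OpenMP directives (as in the threading sketch above)
    ! parallelize the long loops.
    write(*,'(a,i6,a,i4,a,i2,a,i2)') 'world rank ', rank, '  group ', color, &
         '  local rank ', subrank, ' of ', subsize

    call mpi_comm_free(subcomm, ierr)
    call mpi_finalize(ierr)
  end program hybrid_sketch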
Conclusions
• DFT codes are naturally parallel and can easily scale to 1M processors or more
• High-precision applications of DFT are time- and memory-consuming computations, hence the need for fine-grained parallelization
• HFODD benefits from HPC techniques and code examination:
  • Loop reordering gives a speed-up factor N ≫ 1 (Coulomb exchange: N ~ 3; Gogny force: N ~ 8)
  • Multi-threading gives an extra factor > 2 (only a few routines have been upgraded)
  • ScaLAPACK implemented: very large bases (Nshell > 25) can now be used (e.g., near scission)
  • Scaling is only average on the standard Jaguar file system because of un-optimized I/O
Year 4 – 5 Roadmap
• Year 4:
  • More OpenMP, debugging of the ScaLAPACK routine
  • First tests of the ADIOS library (at scale)
  • First development of a prototype Python visualization interface
  • Tests of large-scale, I/O-bound, multi-constrained calculations
• Year 5:
  • Full implementation of ADIOS
  • Set up a framework for automatic restart (at scale)
• SVN repository (ask Mario for an account): http://www.massexplorer.org/svn/HFODDSVN/trunk