E N D
Order-N DFT calculations with the Conquest codeMichael J. Gillan1,2, T. Miyazaki3 and D. Bowler1,21Physics and Astronomy Department, University College London, U.K.2London Centre for Nanotechnology, University College London, U.K.3Computational Materials Science Center, NIMS, Tsukuba, Japan • Summary of O(N) DFT: the Conquest code • Summary of practical operation of Conquest: linear scaling, parallel scaling • Validation against standard plane-wave codes • Work in progress on Ge/Si hut-clusters CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Si/Ge nanostructures Epitaxial deposition of Ge on the Si (001) surface. There is a strain mismatch of a few percent, and the strain is relieved by formation of missing-row trenches. Evolution of surface structure with increasing coverage... M x N ‘Hut’ clusters Early 2 x N Late 2 x N CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Why traditional methods scale as N2 or worse • There are N Kohn-Sham orbitals • Each Kohn-Sham orbital extends over the entire volume V, so information in each is proportional to N. Hence, amount of information to be calculated is proportional to N 2. • In standard methods, we have to calculate all overlap integrals • Hence, number of operations is proportional to N 3. Prefactor of N 3 is small, so practical dependence is N 2, except for very large systems. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Density matrix is key to O(N) • Density matrix is defined in terms of Kohn-Sham eigenfunctions as: with f n orbital occupation numbers: at T = 0, f n = 0 or 1. The operator is idempotent: its eigenvalues are 0 or 1 (it is a projector). • But Etot in DFT can be expressed entirely in terms of , so we have a variational principle: minimize Etot with respect to density matrix, subject to conditions that : (i) is symmetric; (ii) is idempotent; (iii) gives the correct number of electrons. It is enough to require that be weakly idempotent: its eigenvalues lie between 0 and 1. • Now as . So we get an upper bound to Etot if we minimize Etot with the additional constraint that for . CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Density matrix and localised orbitals • In practice, we cannot work directly with , since it is a function of two vector variables. • Instead, express as: with localized orbitals that vanish outside ‘localization regions’. In practice, take the localization regions spherical, radius Rreg and centred on the atoms. Label specifies different localized orbitals on given atom i. • Matrix is the density matrix in the (non-orthogonal) representation of . • Search for ground state by minimizing Etot with respect to and subject to idempotency and correct electron number. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Enforcing idempotency • In orthogonal tight-binding, the equivalent problem is to minimize: • An effective method is that of Li, Nunes and Vanderbilt, in which: subject to weak idempotency of K, and fixed electron number. with L the ‘auxiliary density matrix’. This works because of the properties of the polynomial • The same scheme is applied in CONQUEST, but in a non-orthogonal version: where CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Summary of CONQUEST • Ground-state search is done as three nested loops: • inner: minimization with respect to auxiliary density matrix , with cut-off distance RL. • middle: self-consistency • outer: minimization with respect to with cut-off distance Rreg • Minimization with respect to at fixed Kohn-Sham potential and fixed is equivalent to non-orthogonal tight-binding; done by a combination of McWeeny purification and Li-Nunes-Vanderbilt. • Self-consistency: reduction of density residual to zero: done by ‘Guaranteed Reduction Pulay’. • Localized orbitals represented by B-splines, or by pseudo-atomic orbitals; minimization by conjugate gradients. • Written as a parallel code from an early stage. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Parallel operation • Main aspects of CONQUEST parallel coding: • Formation of overlap and Hamiltonian matrix elements by integration over grid points: integration grid is divided ito ‘domains’, with one processor responsible for a domain. • Matrix operations: atoms divided into ‘primary sets’, with one processor responsible for rows of matrices associated with one primary set. • Fourier transformation (Hartree potential etc...) is done in parallel. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Practical linear scaling Tests of convergence of total energy with respect to L-matrix cut-off RLand localization-region radius Rreg. Bulk silicon. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Practical parallel scaling Parallel scaling: increasing number of processors, constant number of atoms per processor O(N) scaling: constant number of processors, increasing number of atoms CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Relaxation of Si (001) surface Comparison of relaxed structure of Si (001) surface from non-self-consistent and self-consistent SIESTA and CONQUEST calculations using different basis sets with plane-wave results (VASP). Quantities compared are bond length and tilt angle of surface dimer pair. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Structure and energetics of Ge/Si hut clusters The faces of the hut-clusters are Ge (105) surfaces • Energetics of hut clusters with O(N) electronic-structure calculations: aims of the investigation: • The smallest stable hut-clusters contain ~10,000 atoms of Ge. Previous investigations have used continuum elasticity theory combined with DFT calculations. Energetics of faces and edges is important. • Combination of DFT and elasticity theory is not easy, and it is not known how large the clusters must be for this approach to be valid. • Our aim: to use the hierarchy of electronic-structure methods to test previous calculations. We are investigating the energetics using: tight-binding; non-self-consistent DFT tight-binding, and self-consistent DFT tight-binding. In the future, also full DFT with plane-wave accuracy. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Ge(105) : surface energy by different methods Ge(105): surface energy (LDA) empirical TB : 74.3 meV/Å2 NSC-AITB : 76.5 meV/Å2 SC-AITB : 81.5 meV/Å2 full DFT(blip) : 74.8 meV/Å2 STATE(planewave): 70.0 meV/Å2 Ge/Si(105): surface energy (GGA) STATE(12.25 Ry): 43.7 meV/Å2 STATE( 36 Ry ): 45.6 meV/Å2 by Hashimoto et al. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Ge(105): cutoff of (auxiliary) density matrix L Ge(105) : NSC-AITB RL=20.4 (or 25.4) bohr is enough. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
d a b Ge hut cluster (empirical) TB DFT calculations Strategy for calculations on hut clusters Si (001) Ge= (Etot-Esub)/(# of Ge hut atoms) • Ge: total energy per 1 Ge atom of hut cluster and its dependence on • size of hut cluster: a • spacings of hut clusters: b • thickness of Si substrate: d Ge of 2xN Ge/Si(001) surface CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
28x28 on 32x32, thickness 8 layer 22,746 atoms Calculations are done on the Earth Simulator using 64 nodes (512 processors, 1/10 of the ES) DMM : ~ 10min. Initial forces are already small. Structure Optimisation within NSC-AITB level 174Å 48Å Size: CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Ge hut cluster v.s. 2xN on Si(001): Tight-binding O(N) • Type 2 ‘hut’ is more stable than type 1 for small coverage. • crossing of (Ge) between 2xN and hut? • substrate of hut cluster: p2x2 (without dimer vacancies) • thicker substrates will reduce (Ge) of larger hut clusters. • -> hut should be more stable • Similar calculations by order-N DFT in progress.... CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Conquest O(N) DFT on Ge/Si hut-clusters • Type 1 ‘hut’ is more stable than type2. • Further calculations being done now. • Self-consistent DFT has also been achieved for this size of system. Minimal-basis, non-self-consistent DFT. Size of system: 22,746 atoms CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Summary and continuation... • Conquest O(N) DFT has been tested on simple systems, and shows satisfactory agreement with calculations using pseudo-atomic orbitals (Siesta) and with plane-wave calculations. • For Ge (105) surface, O(N) DFT calculations with different density-matrix cut-off radius RL show rapid convergence with respect to RL. Full blip-function basis in Conquest (equivalent to plane waves) gives good agreement with plane-wave results for surface formation energy. • Ge/Si hut clusters: O(N) tight-binding calculations on systems of 20,000 – 30,000 atoms are straightforward. O(N) self-consistent DFT calculations on this size of system are feasible. • Preliminary results on Ge/Si hut-clusters suggest cross-over to stable hut-clusters at coverage of about 3 monolayers. However, much more calculation and analysis is still needed. CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006
Credits • National Institute for Materials Science, Tsukuba • International Center for Young Scientists, NIMS, Tsukuba • MEXT grant for Scientific Research in Priority Areas “Development of New Quantum Simulators and Quantum Design” (No. 17064017) • Earth Simulator Center, Yokohama • UK Engineering and Physical Sciences Research Council • Royal Society CCP6 Conference on Computational Physics, Gyeongju , Republic of Korea, August 2006