Large-Scale Density-Functional Calculations for Nanometer-Size Si Materials
Jun-Ichi Iwata, Center for Computational Sciences, University of Tsukuba
Feb 23, 2010, Tsukuba-Edinburgh Computational Science Workshop, Edinburgh
Outline
• Quantum-mechanical (first-principles) simulation in solid-state physics
• Density-functional theory: W. Kohn (Nobel Prize in 1998)
• Density-functional simulations for large systems
• Real-space DFT program code for parallel computation (RSDFT)
• Applications of RSDFT to Si nanomaterials: systems of more than 10,000 atoms
First-Principles Calculation in Material Physics
• We describe material properties from the behavior of electrons and ions.
• Ions are treated classically, electrons quantum mechanically.
• We solve the Schrödinger equation for the electronic ground state.
• Density-functional theory is a powerful tool for this purpose.
Density-Functional Theory
• The energy is a functional of the electron density. Minimizing it gives stable atomic and electronic structures.
• Minimizing with respect to the Kohn-Sham orbitals gives the Kohn-Sham equation.
• The potential depends on the electron density, so the equation has to be solved self-consistently (a nonlinear eigenvalue problem).
P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964).
W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965).
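The equations on this slide did not survive as text. As a reminder only, the standard textbook form of the Kohn-Sham scheme is sketched below in Hartree atomic units. The symbols are the conventional ones and are an assumption, not a transcription of the slide; in a pseudopotential code the external potential is replaced by the ionic pseudopotentials.

```latex
% Total-energy functional of the electron density n(r)
E[n] = T_s[n]
     + \int v_{\mathrm{ext}}(\mathbf{r})\, n(\mathbf{r})\, d\mathbf{r}
     + \frac{1}{2} \iint \frac{n(\mathbf{r})\, n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}\, d\mathbf{r}'
     + E_{\mathrm{xc}}[n]

% Kohn-Sham equation obtained by minimizing E[n] with respect to the orbitals
\left[ -\frac{1}{2}\nabla^2 + v_{\mathrm{ext}}(\mathbf{r})
       + v_{\mathrm{H}}[n](\mathbf{r}) + v_{\mathrm{xc}}[n](\mathbf{r}) \right] \phi_i(\mathbf{r})
  = \varepsilon_i\, \phi_i(\mathbf{r}),
\qquad
n(\mathbf{r}) = \sum_{i}^{\mathrm{occ}} |\phi_i(\mathbf{r})|^2

% Hartree and exchange-correlation potentials
v_{\mathrm{H}}[n](\mathbf{r}) = \int \frac{n(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\, d\mathbf{r}',
\qquad
v_{\mathrm{xc}}[n](\mathbf{r}) = \frac{\delta E_{\mathrm{xc}}[n]}{\delta n(\mathbf{r})}
```

Because the Hartree and exchange-correlation potentials depend on the density built from the orbitals, the eigenvalue problem is nonlinear and is iterated to self-consistency.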
Performance of DFT with a Simple Approximation
• Exchange functional in the local-density approximation (LDA)
• Various properties are described correctly, with quantitatively good results.
• Example: Si in the diamond structure. M. T. Yin and M. L. Cohen, Phys. Rev. B 26, 5668 (1982).
Everybody Wants to Apply DFT to Large Systems
• Usually, we treat 10- to 1000-atom systems by DFT.
• However, we need to treat larger systems:
  • to study large objects (nanostructures, proteins)
  • to make the atomic model more realistic
• Proteins (cytochrome c oxidase): ~30,000 atoms
• Nanostructures (Si pyramid): ~100,000 atoms
A. Ichimiya et al., Surf. Sci. 493, 555 (2001).
Real-Space DFT Program Code (RSDFT)
• Solves the Kohn-Sham equation (an eigenvalue problem): computational cost ~ O(N^3)
• Developed for parallel computers
Real-Space Method (⇔ Reciprocal-Space (Plane-Wave) Method)
• Higher-order finite-difference pseudopotential method: J. R. Chelikowsky et al., Phys. Rev. B (1994)
• Discretize: continuous space → discrete space
• Function → column vector
• Laplacian → higher-order finite difference
• Typical number of grid points: 10,000 to 1,000,000
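To illustrate what "higher-order finite difference" means for the Laplacian, here is a minimal one-dimensional sketch on a periodic grid. The coefficient table holds the standard central-difference weights for the second derivative; the function names, the test function sin(x), and the grid size are assumptions chosen for illustration and are not taken from RSDFT.

```python
import math

# Standard central-difference weights for the second derivative.
# Entry k of each list is the weight at offsets +k and -k (entry 0 is the centre).
FD2_COEFFS = {
    1: [-2.0, 1.0],                                    # 3-point stencil
    2: [-5.0 / 2, 4.0 / 3, -1.0 / 12],                 # 5-point stencil
    3: [-49.0 / 18, 3.0 / 2, -3.0 / 20, 1.0 / 90],     # 7-point stencil
}

def second_derivative(f, h, half_width):
    """Apply the stencil with the given half width to samples f on a periodic grid."""
    c = FD2_COEFFS[half_width]
    n = len(f)
    out = []
    for i in range(n):
        s = c[0] * f[i]
        for k in range(1, len(c)):
            s += c[k] * (f[(i + k) % n] + f[(i - k) % n])
        out.append(s / h ** 2)
    return out

def max_error(half_width, n=32):
    """Largest error of the stencil applied to sin(x), whose exact second derivative is -sin(x)."""
    h = 2 * math.pi / n
    f = [math.sin(i * h) for i in range(n)]
    d2 = second_derivative(f, h, half_width)
    return max(abs(d + fi) for d, fi in zip(d2, f))

if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, max_error(m))
```

Each additional pair of stencil points reduces the error on the same grid, which is why a higher-order stencil allows a coarser grid for a given accuracy. In three dimensions the same stencil is applied along each axis, so the resulting matrix stays sparse.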
RSDFT: Suitable for Parallel First-Principles Calculation
• Real-space finite difference
• Sparse matrix
• FFT-free (an FFT is unavoidable in a conventional plane-wave code)
• MPI (Message Passing Interface) library
The 3D grid is divided into several regions for parallel computation (the figure shows a 3 × 3 arrangement of regions assigned to CPU0 through CPU8).
• Kohn-Sham equation (higher-order finite difference): neighboring regions communicate with MPI_ISEND and MPI_IRECV
• Integration: MPI_ALLREDUCE
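The grid partitioning described above can be sketched without any MPI calls. The example below only computes which index ranges each rank would own and how many boundary values a finite-difference stencil would need from neighboring regions; in a real code those values would travel through MPI_ISEND and MPI_IRECV. The grid size, the 3 × 3 × 1 process layout, the stencil half width of 6, and all function names are hypothetical choices for illustration.

```python
def split_1d(n_points, n_parts):
    """Split range(n_points) into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n_points, n_parts)
    bounds, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

def decompose(grid, procs):
    """Map rank -> ((x0, x1), (y0, y1), (z0, z1)) for a 3D grid on a 3D process layout."""
    xs, ys, zs = (split_1d(n, p) for n, p in zip(grid, procs))
    layout, rank = {}, 0
    for bz in zs:
        for by in ys:
            for bx in xs:
                layout[rank] = (bx, by, bz)
                rank += 1
    return layout

def block_volume(block):
    """Number of grid points owned by one block."""
    (x0, x1), (y0, y1), (z0, z1) = block
    return (x1 - x0) * (y1 - y0) * (z1 - z0)

def halo_points(block, half_width):
    """Boundary values needed across all six faces for an axis-aligned stencil.

    This is an upper bound: along an axis that is not split, no message is needed.
    """
    (x0, x1), (y0, y1), (z0, z1) = block
    nx, ny, nz = x1 - x0, y1 - y0, z1 - z0
    return 2 * half_width * (nx * ny + ny * nz + nz * nx)

if __name__ == "__main__":
    layout = decompose((96, 96, 96), (3, 3, 1))   # hypothetical grid and layout
    for rank, block in layout.items():
        print(rank, block, block_volume(block), halo_points(block, 6))
```

The ratio of halo points to owned points shrinks as the blocks grow, which is one reason a real-space grid code of this kind can scale to many nodes. Sums over the whole grid (integrations) would instead be combined with a reduction such as MPI_ALLREDUCE.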
Massively Parallel Computing with Our Recently Developed Code “RSDFT”
Iwata et al., J. Comp. Phys. (2010)
• Real-Space Density-Functional Theory code (RSDFT)
• Based on the finite-difference pseudopotential method (J. R. Chelikowsky et al., PRB 1994)
• Highly tuned for massively parallel computers
• Computations are done on PACS-CS, a massively parallel cluster at the University of Tsukuba (theoretical peak performance = 5.6 GFLOPS/node).
• The largest system in the present study: Si10701H1996, grid points = 3,402,059, bands = 22,432
• (Figure: convergence behavior for Si10701H1996.)
• Computational time with 1024 nodes of PACS-CS: 6781 sec. × 60 iteration steps ≈ 113 hours
Flow Chart
Algorithm: subspace iteration method (Rayleigh-Ritz method)
• Input the initial configuration of the ions
• Calculate the ionic potentials
• Electronic structure optimization (repeat until the convergence check passes):
  • Conjugate-gradient method: O(N^2)
  • Gram-Schmidt orthonormalization: O(N^3)
  • Subspace diagonalization: O(N^3)
  • Density and potentials update: O(N)
• Atomic structure optimization (repeat until the convergence check passes): Hellmann-Feynman forces → move the ions → recalculate the ionic potentials
Electronic structure optimization must be performed in each atomic optimization step.
Total computational cost ~ O(N^3)
Algorithm 1: Subspace Iteration Method (Rayleigh-Ritz Method)
• Problem: an M-dimensional eigenvalue problem, of which we need the smallest N (≪ M) eigenpairs
• Start from an initial guess
• Minimize the Rayleigh quotients by the conjugate-gradient method (wave function update)
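As an illustration of minimizing a Rayleigh quotient with conjugate gradients, the sketch below finds the lowest eigenpair of a small symmetric matrix. It is a single-vector toy version: the Polak-Ribiere update, the exact minimization in the two-dimensional span of the current vector and the search direction, the dense list-of-lists matrix, and all names are assumptions for illustration, not the scheme implemented in RSDFT.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def matvec(H, x):
    return [dot(row, x) for row in H]

def minimize_rayleigh(H, x0, max_iter=500, tol=1e-10):
    """Approximate the lowest eigenpair (rho, x) of the symmetric matrix H."""
    norm = math.sqrt(dot(x0, x0))
    x = [xi / norm for xi in x0]
    g_prev = d_prev = None
    for _ in range(max_iter):
        Hx = matvec(H, x)
        rho = dot(x, Hx)                                   # Rayleigh quotient
        g = [hx - rho * xi for hx, xi in zip(Hx, x)]       # residual, parallel to the gradient
        if math.sqrt(dot(g, g)) < tol:
            break
        if g_prev is None:
            d = [-gi for gi in g]                          # first step: steepest descent
        else:
            diff = [gi - gp for gi, gp in zip(g, g_prev)]
            beta = max(0.0, dot(g, diff) / dot(g_prev, g_prev))   # Polak-Ribiere with restart
            d = [-gi + beta * dp for gi, dp in zip(g, d_prev)]
        overlap = dot(d, x)                                # make the direction orthogonal to x
        d_hat = [di - overlap * xi for di, xi in zip(d, x)]
        d_norm = math.sqrt(dot(d_hat, d_hat))
        if d_norm < tol:
            break
        d_hat = [di / d_norm for di in d_hat]
        Hd = matvec(H, d_hat)
        b, c = dot(x, Hd), dot(d_hat, Hd)
        # Exact minimum of the quotient for x*cos(theta) + d_hat*sin(theta)
        theta = 0.5 * math.atan2(-2.0 * b, c - rho)
        x = [math.cos(theta) * xi + math.sin(theta) * di for xi, di in zip(x, d_hat)]
        norm = math.sqrt(dot(x, x))
        x = [xi / norm for xi in x]                        # remove rounding drift
        g_prev, d_prev = g, d
    return dot(x, matvec(H, x)), x

def laplacian_1d(n):
    """Tridiagonal test matrix with 2 on the diagonal and -1 next to it."""
    return [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]

if __name__ == "__main__":
    n = 20
    rho, _ = minimize_rayleigh(laplacian_1d(n), [1.0 + 0.1 * i for i in range(n)])
    print(rho, 2.0 - 2.0 * math.cos(math.pi / (n + 1)))
```

In a many-band calculation each band is updated in this spirit and the set of vectors is then re-orthonormalized, which is where the Gram-Schmidt step enters.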
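Algorithm 2
• Gram-Schmidt orthogonalization: O(MN^2)
• Subspace diagonalization, using the orthonormalized vectors as a basis set:
  • Calculate the matrix elements: O(MN^2)
  • Diagonalize the N × N matrix: O(N^3)
  • Form the Ritz vectors: O(MN^2)
• The Ritz vectors are the initial guess for the next iteration.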
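The formulas on this slide were lost in extraction. A conventional way to write the two steps, for N vectors of length M, is sketched below. The notation is an assumption (standard Rayleigh-Ritz notation), not a transcription of the slide.

```latex
% Gram-Schmidt orthonormalization, O(M N^2) in total
\psi_n \leftarrow \psi_n - \sum_{m<n} \psi_m \langle \psi_m | \psi_n \rangle ,
\qquad
\psi_n \leftarrow \frac{\psi_n}{\lVert \psi_n \rVert}

% Subspace (Rayleigh-Ritz) diagonalization
H^{\mathrm{sub}}_{mn} = \langle \psi_m | H | \psi_n \rangle
  \quad (N \times N \text{ matrix, } O(M N^2))

\sum_{n} H^{\mathrm{sub}}_{mn}\, c_{nk} = \theta_k\, c_{mk}
  \quad (O(N^3))

\psi'_k = \sum_{n} \psi_n\, c_{nk}
  \quad (\text{Ritz vectors, } O(M N^2))
```

Since both M and N grow in proportion to the number of atoms, the O(MN^2) and O(N^3) steps all scale as the cube of the system size.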
Gram-Schmidt Orthogonalization: Active Use of Level 3 BLAS in the O(N^3) Computation
• Algorithm of GS: part of the calculation can be performed as a matrix × matrix operation.
• Time and performance for Gram-Schmidt (theoretical peak performance = 5.6 GFLOPS/node): the O(N^3) part can be computed at 80% of the theoretical peak performance.
→ Collaboration with computer scientists has greatly improved the performance of RSDFT.
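One common way to expose matrix × matrix work in Gram-Schmidt is to process the vectors in blocks: each new block is projected against all previously finished vectors at once, and only the small in-block part is done vector by vector. The pure-Python sketch below shows that structure. The block size, the random test vectors, and the function names are assumptions, and the exact blocking used in RSDFT may differ. In a production code the two marked loops would be single Level 3 BLAS calls.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def gram_schmidt(vectors):
    """Vector-by-vector (modified) Gram-Schmidt; returns an orthonormal list."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

def block_gram_schmidt(vectors, block_size):
    """Orthonormalize the vectors block by block."""
    q = []
    for start in range(0, len(vectors), block_size):
        block = [list(v) for v in vectors[start:start + block_size]]
        if q:
            # C = Q^T B : a matrix x matrix product (Level 3 BLAS candidate)
            coeff = [[dot(qi, b) for b in block] for qi in q]
            # B <- B - Q C : a second matrix x matrix product
            for j, b in enumerate(block):
                for i, qi in enumerate(q):
                    c = coeff[i][j]
                    for k in range(len(b)):
                        b[k] -= c * qi[k]
        # Only the small in-block part is done vector by vector
        q.extend(gram_schmidt(block))
    return q

def make_test_vectors(n_vectors=6, length=12, seed=0):
    """Reproducible random vectors, used only to exercise the routine."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(length)] for _ in range(n_vectors)]

if __name__ == "__main__":
    basis = block_gram_schmidt(make_test_vectors(), block_size=2)
    print(max(abs(dot(a, b)) for i, a in enumerate(basis) for b in basis[:i]))
```

The projection against finished blocks is a single pass here, which is adequate for well-conditioned input; ill-conditioned input would call for a second pass.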
Elapsed Time for One Iteration Step
• PACS-CS (5.6 GFLOPS/node), 256 nodes
• (Figure: elapsed time of the O(N^2) part and of the two O(N^3) parts.)
→ The times for the O(N^2) part and the O(N^3) parts become comparable.
Application 1: Nanometer-Size Si Quantum Dots
Si Quantum Dots Are a Promising Material for Several Device Applications
• Memory
• Single-electron transistor
• Optical device
Clarifying the relation between the dot size and the band gap is important for controlling the device properties.
Are first-principles calculations useful for such studies? → Yes, but the system size is very large.
Model: a Si quantum dot of 6.6 nm diameter (Si7055H1596)
Band Gaps
• (Figure: band gaps (eV), with labels at 300 atoms and >10,000 atoms.)
• Experimental fit curve from STS measurement: B. Zaknoon et al., Nano Letters 8, 1689 (2008).
• The ΔSCF gap seems to be closer to the ΔKS gap …
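For readers unfamiliar with the two quantities compared here, the usual definitions are given below. This assumes that "ΔKS gap" on the slide denotes the Kohn-Sham eigenvalue gap and that E(N) is the total energy of the dot with N electrons; neither definition is spelled out in the extracted slide text.

```latex
% Delta-SCF gap from three total-energy calculations
E_{\mathrm{gap}}^{\Delta\mathrm{SCF}} = E(N+1) + E(N-1) - 2\,E(N)

% Kohn-Sham eigenvalue gap from a single calculation
E_{\mathrm{gap}}^{\mathrm{KS}} = \varepsilon_{\mathrm{LUMO}} - \varepsilon_{\mathrm{HOMO}}
```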
Application 2: Si Nanowires
Several Sizes of Si Nanowires
• 4 nm diameter (425 atoms)
• 10 nm diameter (2341 atoms)
• 20 nm diameter (8941 atoms)
There may be an optimum diameter in the region of 10 nm to 20 nm.
Band Structure and DOS of SiNW (d = 1 nm)
Si21H20 (41 atoms), Eg = 2.60 eV (LDA bulk: 0.53 eV)
Band Structure and DOS of SiNW (d = 4 nm)
Si341H84 (425 atoms), Eg = 0.81 eV (LDA bulk: 0.53 eV)
Band Structure and DOS of SiNW (d = 8 nm)
Si1361H164 (1525 atoms), Eg = 0.61 eV (bulk Si: Eg = 0.53 eV)
Si Nanowire with Surface Roughness
Si12822H1544 (14,366 atoms), side view and top view
• 10 nm diameter, 3.3 nm height, (100)
• Grid spacing: 0.45 Å (~14 Ry)
• Number of grid points: 4,718,592
• Number of bands: 29,024
• Memory: 1,022 GB to 2,044 GB
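Two of the figures on this slide can be cross-checked with short arithmetic. The sketch below assumes the common convention that a grid spacing h corresponds to a plane-wave cutoff of (π/h)^2 in Rydberg units with h in bohr, and that the dominant memory cost is one double-precision value per grid point per band. Both conventions and the function names are assumptions, not statements taken from the talk.

```python
import math

BOHR_IN_ANGSTROM = 0.529177  # length of 1 bohr in angstrom

def equivalent_cutoff_ry(h_angstrom):
    """Plane-wave cutoff (Ry) whose largest wave vector is pi / h."""
    h_bohr = h_angstrom / BOHR_IN_ANGSTROM
    return (math.pi / h_bohr) ** 2

def wavefunction_memory_gib(n_grid, n_bands, bytes_per_value=8):
    """Memory (GiB) to hold one value per grid point per band."""
    return n_grid * n_bands * bytes_per_value / 2 ** 30

if __name__ == "__main__":
    print(equivalent_cutoff_ry(0.45))                      # compare with "~14 Ry"
    print(wavefunction_memory_gib(4_718_592, 29_024))      # compare with "1,022 GB"
```

Under these assumptions the cutoff comes out between 13 and 14 Ry and the wave-function array is a little over 1,000 GiB, close to the lower memory figure on the slide. The upper figure is twice the lower one, which would be consistent with holding a second array of the same size, though the slide does not say so.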
DOS of SiNW with Roughness
d = 10 nm (with roughness), Si12822H1544 (14,366 atoms), Eg = 0.57 eV, shown together with the DOS of bulk Si
Timings for one step on PACS-CS, 1024 nodes (peak performance: 5.6 GFLOPS/node):
• Subspace diagonalization: 4600 sec.
• Gram-Schmidt: 2300 sec.
• Conjugate-gradient method: 3700 sec.
• Total energy calculation: 1200 sec.
• Total (1 step): 12,000 sec.
Application 3: Si Divacancy
Si Divacancy: What Is the Stable Structure?
There are two possibilities for the structure of the Si divacancy: the resonant-bond (RB) type and the large-pairing (LP) type.
(Figure: structure of the Si divacancy. Small yellow balls: vacancies (no atoms). Green balls: Si atoms with dangling bonds.)
• EPR experiment (Watkins & Corbett, 1965): large-pairing type
• LDA calculation (Saito & Oshiyama, 1994), model size ~60 atoms: the resonant-bond type is stable (the large-pairing type was not found)
• More recent LDA calculation (Öğüt et al., 1999), model size ~300 atoms:
  • Both the large-pairing and the resonant-bond structures were found.
  • The large-pairing type is the most stable (the RB type is a local minimum).
→ Model-size dependence?
Si Divacancy
• Structures converge at the 998-atom model.
• The LP structure appears in models of 510 atoms or larger.
• The RB structure is the most stable, but the energy difference is very small (<10 meV).
(Figure: dac, dab (Å) versus model size (number of atoms) for the large-pairing, resonant-bond, and small-pairing structures.)
J.-I. Iwata et al., Phys. Rev. B 77, 115208 (2008).
Summary
• We have developed a real-space DFT program code (RSDFT) for large systems by utilizing massively parallel computers.
• Collaboration with computer scientists has greatly improved the performance of RSDFT (especially the O(N^3) part, computed with Level 3 BLAS).
• By using a few hundred to 1000 CPUs, we have achieved first-principles calculations for:
  • Si 1000-atom systems with atomic structure optimization
  • self-consistent electronic structures of Si 10,000-atom systems
• Using large atomic models eliminates the model-size dependence.
• We have applied RSDFT to nanometer-scale Si materials (SiNW, SiQD).
• I think RSDFT will become a useful tool for future device development.