ELECTRONIC STRUCTURE COMPUTATIONS OF MULTI-MILLION ATOM NON-PERIODIC SAMPLES OF METAL OXIDES IN LESS THAN 10 MINUTES? YES.
Marek T. MICHALEWICZ* and Per NYBERG**
* Joint Bureau of Meteorology / CSIRO High Performance Computing and Communication Centre, CSIRO Mathematical and Information Sciences, 24th Floor, 150 Lonsdale St., Melbourne, Victoria 3000, Australia. e-mail: Marek.Michalewicz@hpc.csiro.au
** NEC Australia Pty. Ltd, 635 Ferntree Gully Road, Glen Waverley, Victoria 3150, Australia
Goal: study the electronic structure of disordered materials.
Disorder:
• point defects: vacancies, interstitials, substitutions, random alloys
• extended defects: surfaces, atomic steps, islands, microfacets on surfaces
Problem: solve the Schrödinger equation for electrons in a non-periodic solid; disorder means no periodicity, hence the Bloch theorem cannot be applied.
Physical quantity of interest: the electronic density of states (DOS).
Theoretical methods: the Recursion Method and the Equation of Motion Method.
We use the tight-binding Hamiltonian

H = \sum_{i} \varepsilon_{i}\, c_{i}^{\dagger} c_{i} + \sum_{i;\,j,n} \left( t_{i;j,n}\, c_{i}^{\dagger} c_{j,n} + \mathrm{h.c.} \right)

and integrate the equation of motion for the amplitudes,

i\hbar\, \frac{\partial F_{i}}{\partial t} = \sum_{j,n} H_{i;j,n}\, F_{j,n} .

The total electronic density of states is given by

N(\omega) = -\frac{1}{\pi}\, \mathrm{Im}\,\{\mathrm{Tr}\, G^{+}(\omega)\} = -\frac{1}{\pi}\, \mathrm{Im}\left[ \sum_{i} e^{-i\varphi_{i,m}} \int F_{i}(t)\, e^{i\omega t}\, dt \right],

or, equivalently,

N(\varepsilon) = \sum_{n,\,i,\,m} |\langle n\,|\,i,m\rangle|^{2}\, \delta(\varepsilon - \varepsilon_{n}),

where |n\rangle is an eigenvector and \varepsilon_{n} the corresponding eigenvalue of the Hamiltonian, i and m are the site and the orbital index, respectively, G(\omega) is the Green's function (in the energy domain), and F_{i}(t) = \sum_{j,n} a_{j,n}\, G_{i;j,n}(t) is the amplitude of the Green's function.
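The density of states is thus obtained by time-evolving the amplitudes F_i(t) and Fourier-transforming, as in the formulas above. Below is a minimal, illustrative Python sketch of this equation-of-motion idea for a toy 1-D tight-binding chain; it is not the authors' vectorised SX-4 code, and the chain size, time step, number of steps and random phases are assumptions made only for this sketch.

import numpy as np

# Toy equation-of-motion DOS calculation for a 1-D nearest-neighbour chain.
# (Illustrative sketch only; all sizes and parameters below are assumptions.)
rng = np.random.default_rng(0)
n_sites = 400            # toy system size
t_hop = 1.0              # nearest-neighbour hopping; on-site energies set to 0

def apply_H(F):
    """Sparse H*F product for the chain: O(N) work per call."""
    HF = np.zeros_like(F)
    HF[:-1] += t_hop * F[1:]
    HF[1:] += t_hop * F[:-1]
    return HF

# Random-phase start vector F_i(0) = exp(i*phi_i): a stochastic estimate of the trace
phi = 2.0 * np.pi * rng.random(n_sites)
F0 = np.exp(1j * phi)

# Leap-frog integration of i dF/dt = H F (hbar = 1); stable for dt well below 1/||H||
dt, n_steps = 0.05, 8192
F_prev = F0 + 1j * dt * apply_H(F0)      # F(-dt) to first order
F = F0.copy()
corr = np.empty(n_steps, dtype=complex)
for k in range(n_steps):
    corr[k] = np.vdot(F0, F)             # C(t_k) = sum_i exp(-i*phi_i) * F_i(t_k)
    F_prev, F = F, F_prev - 2j * dt * apply_H(F)

# N(omega) ~ (1/pi) Re of int_0^T C(t) exp(i*omega*t) dt, evaluated with an FFT;
# dos[n] is the DOS estimate at angular frequency omega[n] (peaks lie within [-2t, 2t]).
window = np.hanning(2 * n_steps)[n_steps:]          # damp the finite-time cut-off
omega = 2.0 * np.pi * np.fft.fftfreq(n_steps, d=dt)
dos = np.fft.ifft(corr * window).real * n_steps * dt / np.pi

In the actual TiO2 calculation the sparse product runs over the 14-16 neighbours of each atom (see the model description below) rather than over a 1-D chain, but the structure of the algorithm is the same.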
Performance criteria for an electronic structure code for "real" materials: • it should be able to handle very large systems, consisting of ~10^2 - 10^6 atoms or more; • it should exhibit linear computational complexity (scaling), O(N), i.e. the computational time should grow linearly with the number of atoms in the system; • the computer implementation should have good parallel performance; • the computations should be very fast. • These criteria are complementary, since the first can hardly be achieved without all the others.
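A rough, illustrative operation count (not a figure from the poster) shows why the linear scaling follows from the short-ranged Hamiltonian: each equation-of-motion time step updates every orbital amplitude from a fixed, size-independent set of neighbouring orbitals (14-16 neighbours per atom in the TiO2 model below), so

\text{work per time step} \;\propto\; N_{\text{atoms}} \times \bar{n}_{\text{orb}} \times \bar{n}_{\text{neigh}},

and for a fixed number of time steps the total time grows as O(N).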
COMPUTATIONAL PERFORMANCE, BENCHMARK RESULTS
Machines:
1. SX-4/16A with 32 GBytes of SDRAM main memory and 16 CPUs, located at the NEC research and training centre in Fuchu, Japan.
2. SX-4/32 with 8 GBytes of SSRAM main memory and 32 CPUs, owned and operated by the Joint Bureau of Meteorology / CSIRO High Performance Computing and Communication Centre and located in Melbourne, Australia.
Both machines have an 8 ns clock cycle and 8 vector pipelines per processor; their theoretical peak speed is 2 GFLOPS per CPU.
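As a consistency check of the quoted peak (assuming, as is typical for this class of vector processor, one add and one multiply result per pipeline per clock):

\text{peak per CPU} = \frac{1}{8\ \text{ns}} \times 8\ \text{pipelines} \times 2\ \text{flops} = 2\ \text{GFLOPS}.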
Elapsed time for runs on 1, 2, 4 and 8 processors for systems of 384,000 to 2,058,000 atoms. Linear scaling: the computation time grows linearly with the number of atoms, O(N). Speed-up: the total elapsed time decreases as 1 : 1/2 : 1/4 : 1/8 as the number of processors increases.
Timing results for runs on 24 and 32 processors, for model system sizes from 384,000 to 750,000 atoms. The downward bending of the timing lines indicates "superlinear" scaling for system sizes above 600,000 atoms. The elapsed time for a sample of 750,000 atoms on 32 CPUs was less than 2 minutes. The parallel performance was 42.8 GFLOPS, which is 66.8% of the theoretical peak speed of this machine. One of the fastest real applications on an SX-4 supercomputer!
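The quoted fraction of peak follows directly from the machine figures above (a simple check, no new data):

\frac{42.8\ \text{GFLOPS}}{32 \times 2\ \text{GFLOPS}} = \frac{42.8}{64} \approx 0.669,

consistent with the 66.8% stated (the small difference is rounding).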
Timing results for runs on the 16-CPU SX-4/16A machine. The largest system studied had 7,623,000 atoms; to the authors' best knowledge, this is the largest system ever computed at the quantum level. The computation took only 41 minutes. The scaling is linear, O(N). The peak parallel performance for the largest system size was 21 GFLOPS.
Speed-up vs. number of CPUs. • For most applications, performance degrades once some (usually moderate) number of processors is exceeded. • We achieve nearly ideal speed-up for the sample of 750,000 atoms. • We "compress" an entire hour of computation (on a single CPU) into less than two minutes of real time on 32 CPUs! (Samples shown: 384k, 480k, 600k and 750k atoms, together with the ideal speed-up line.)
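A back-of-the-envelope reading of the last bullet (the exact single-CPU time is not quoted on this slide): with speed-up defined as

S(p) = \frac{T(1)}{T(p)},

going from about one hour on 1 CPU to just under two minutes on 32 CPUs gives S(32) \approx 60/2 = 30, close to the ideal value of 32.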
TiO2 model system sizes and the number of NEC SX-4 CPUs on which they were run.

System size (ni x nj x nk) | Number of atoms | Number of processors used: 1 | 2 | 4 | 8 | 16 | 24 | 32
40 x 40 x 40    |   384,000 | x | x | x | x | x | & | &
40 x 50 x 40    |   480,000 | x | x | x | x | x | & | &
40 x 50 x 50    |   600,000 | x | x | x | x | x | & | &
50 x 50 x 50    |   750,000 | x | x | x | x | x | & | &
70 x 70 x 70    | 2,058,000 | x | x | x | x | x |   |
80 x 80 x 80    | 3,072,000 |   |   |   |   | x |   |
90 x 90 x 90    | 4,374,000 |   |   |   |   | x |   |
100 x 100 x 100 | 6,000,000 |   |   |   |   | x |   |
105 x 105 x 105 | 6,945,750 |   |   |   |   | x |   |
105 x 110 x 110 | 7,623,000 |   |   |   |   | x |   |
Table 1. System size increases and performance improvements of the electronic structure code on different machines, 1990-1998.
Test system: titanium dioxide, TiO2
rutile structure, tetragonal Bravais lattice
Our model: each Ti ion carries 5 d orbitals, each O ion 1 s + 3 p orbitals; each Ti has 16 neighbours, each O has 14 neighbours.
(Figure: 2x2x2 cells of the rutile structure)
Why TiO2?
• energy conversion (solar panels)
• corrosion inhibition
• fiber optic coatings
• electronic material (capacitors, computer memories)
• photoinduced hydrolysis of water
• unique amphiphilic and oleophilic properties of UV-illuminated TiO2 surfaces (speculation about "self-cleaning" windows)
• size-dependent optical properties
• white paint!!!
The TiO2 model system sizes were defined by ni x nj x nk, where the size of the sample is ni·a x nj·b x nk·c and a, b, c are the rutile lattice parameters (a = b = 4.59 Å, c = 2.96 Å). The largest sample studied was 105 x 110 x 110 and had dimensions of 48 nm x 50 nm x 32 nm. With this scale of computation we enter the domain of nanosystems that are observed under the electron microscope and have many interesting and important properties. (Figure: 10x10x10 sample)
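For concreteness, here is a small illustrative Python sketch (not the authors' code) that builds the atom list for an ni x nj x nk rutile sample and reproduces the atom counts and box dimensions quoted in this poster; the oxygen internal parameter u ≈ 0.305 is a textbook rutile value assumed here.

import numpy as np

A, C = 4.59, 2.96        # rutile lattice parameters in Angstrom (a = b = A, c = C)
U = 0.305                # oxygen internal parameter (assumed textbook value)

# Fractional coordinates of the 6-atom rutile unit cell (2 Ti + 4 O)
BASIS = [
    ("Ti", (0.0, 0.0, 0.0)),
    ("Ti", (0.5, 0.5, 0.5)),
    ("O",  (U, U, 0.0)),
    ("O",  (1.0 - U, 1.0 - U, 0.0)),
    ("O",  (0.5 + U, 0.5 - U, 0.5)),
    ("O",  (0.5 - U, 0.5 + U, 0.5)),
]

def build_sample(ni, nj, nk):
    """Species and Cartesian positions (Angstrom) for an ni x nj x nk block of cells."""
    species, positions = [], []
    for i in range(ni):
        for j in range(nj):
            for k in range(nk):
                for name, (fx, fy, fz) in BASIS:
                    species.append(name)
                    positions.append(((i + fx) * A, (j + fy) * A, (k + fz) * C))
    return species, np.array(positions)

species, pos = build_sample(10, 10, 10)          # small demo block
print(len(species))                              # 6000 atoms = 6 * 10 * 10 * 10
# Dimensions of the largest sample (105 x 110 x 110), converted to nm:
print(105 * A / 10, 110 * A / 10, 110 * C / 10)  # ~48.2 x 50.5 x 32.6 nm

The same counting gives 6 x 40^3 = 384,000 atoms for the smallest benchmark sample and 6 x 105 x 110 x 110 = 7,623,000 atoms for the largest.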
Electron micrographs of TiO2 microcrystallites. I. Harrowfield et al., CSIRO Minerals.
STM image of TiO2(110) annealed at 760 K; size: 69.3 x 69.3 nm. From H. Onishi, Dept. of Chemistry, The University of Tokyo.
STM image of TiO2(110) annealed at 860 K; size: 35.4 x 35.4 nm. From H. Onishi, Dept. of Chemistry, The University of Tokyo.
STM image of microfacets on TiO2(100)-(1x3). P.F. Murray et al., Phys. Rev. Lett. 72 (1994) 689.
Summary: • An extremely fast implementation of the equation-of-motion method for electronic structure computations was presented. • The program can be applied to non-periodic, disordered nanocrystalline samples, metals, transition metal oxides and other systems. It scales linearly, O(N), runs at up to 43 GFLOPS on a 32-processor NEC SX-4 vector-parallel supercomputer, and computes electronic densities of states for multi-million atom samples in mere minutes. • The largest test computation performed was the electronic density of states (DOS) for a TiO2 sample consisting of 7,623,000 atoms. Mathematically, this is equivalent to obtaining the spectrum of an n x n Hermitian operator (the Hamiltonian) with n = 38,115,000.
M.T. Michalewicz wishes to thank R. Bell and R. Thurling for their support and assistance during this work. We wish to thank NEC for generously granting us time on their SX-4/16A in Fuchu, Japan.