360 likes | 468 Views
Development of a 3-D tight-binding-based electronic structure simulator for multi-million atom systems NEMO 3-D Gerhard Klimeck, Fabiano Oyafuso, Paul von Allmen Jet Propulsion Laboratory, Caltech Tim Boykin, U Alabama in Huntsville gekco@jpl.nasa.gov, 818-354-2182
E N D
Development of a 3-D tight-binding-basedelectronic structure simulator formulti-million atom systems NEMO 3-D Gerhard Klimeck, Fabiano Oyafuso, Paul von Allmen Jet Propulsion Laboratory, Caltech Tim Boykin, U Alabama in Huntsville gekco@jpl.nasa.gov, 818-354-2182 http://hpc.jpl.nasa.gov/PEP/gekco This research was carried out by at the Jet Propulsion Laboratory, California Institute of Technology under a contract with the National Aeronautics and Space Administration.
Presentation Outline • Introduction / Motivation • NASA Motivation, Critical look at the SIA Roadmap. • Nano-scale device examples. • What is a quantum dot? • Modeling agenda. • Software Issues • Problem size. • Parallel computing. • Graphical user interfaces. • Physics modeling results • Strain. • Alloy Disorder. • Interface Interdiffusion. • Conclusion / Outlook
Limit of Military Interest Limit of Commercial Interest NASA radiation and temperature requirements are outside commercial and military interest
2-D Lithography feature 1-D Growth feature 5-100 Å Technology PushToward Fundamental Limitations • Commercial market pushes computing (FLOPS/weight/power): • Enabled by • device miniaturization • chip size increase • Limited by: • Costs of fabrication • Discrete atoms/electrons Moore’s Law for Lithography 2D Feature
2-D Lithography feature 1-D Growth feature 5-100 Å Technology PushToward Fundamental Limitations • Commercial market pushes computing (FLOPS/weight/power): • Enabled by • device miniaturization • chip size increase • Limited by: • Costs of fabrication • Discrete atoms/electrons • Additional NASA Requirements: • High radiation tolerance • Extreme temperature operation-hot/cold Moore’s Law for Lithography 2D Feature
2-D Lithography feature 1-D Growth feature 5-100 Å Technology PushToward Fundamental Limitations • Commercial market pushes computing (FLOPS/weight/power): • Enabled by • device miniaturization • chip size increase • Limited by: • Costs of fabrication • Discrete atoms/electrons • Additional NASA Requirements: • High radiation tolerance • Extreme temperature operation-hot/cold
2-D Lithography feature 1-D Growth feature 5-100 Å Technology PushToward Fundamental Limitations • Commercial market pushes computing (FLOPS/weight/power): • Enabled by • device miniaturization • chip size increase • Limited by: • Costs of fabrication • Discrete atoms/electrons • Additional NASA Requirements: • High radiation tolerance • Extreme temperature operation-hot/cold • Quantum Dots • Detectors / lasers • Memory and logic Quantum dots go beyond the SIA roadmap and enable near and long term NASA applications
What is a Quantum Dot ?Basic Application Mechanisms • Physical Structure: • Well conducting domain surrounded in all 3 dim. by low conducting region(s) • Domain size on the nanometer scale • Electronic structure: • Contains a countable number of electrons • Electron energy may be quantized -> artificial atoms (coupled QD->molecule)
What is a Quantum Dot ?Basic Application Mechanisms • Physical Structure: • Well conducting domain surrounded in all 3 dim. by low conducting region(s) • Domain size on the nanometer scale • Electronic structure: • Contains a countable number of electrons • Electron energy may be quantized -> artificial atoms (coupled QD->molecule)
Photon Absorption Tunneling/Transport Occupancy of states Photon Emission Detectors/ Input Lasers/ Output Logic / Memory What is a Quantum Dot ?Basic Application Mechanisms • Physical Structure: • Well conducting domain surrounded in all 3 dim. by low conducting region(s) • Domain size on the nanometer scale • Electronic structure: • Contains a countable number of electrons • Electron energy may be quantized -> artificial atoms (coupled QD->molecule)
Photon Absorption Tunneling/Transport Occupancy of states Photon Emission Detectors/ Input Lasers/ Output Logic / Memory What is a Quantum Dot ?Basic Application Mechanisms • Physical Structure: • Well conducting domain surrounded in all 3 dim. by low conducting region(s) • Domain size on the nanometer scale • Electronic structure: • Contains a countable number of electrons • Electron energy may be quantized -> artificial atoms (coupled QD->molecule) Quantum dots are artificial atoms that can be custom designed for a variety of applications
Self-assembled , InGaAs on GaAs. Pyramidal or dome shaped R.Leon,JPL(1998) JPL Application: IR Sensors Nanocrystals: Si implanted in SiO2 Atwater, Caltech (1996) Nanotube Arrays, Jimmy Xu, Brown Univ. (1999) JPL Applications: Transducers, filters Molecular Dots Ruthenium-based molecule Ru4(NH3)16(C4H4N2)410+ proposed by Marya Lieberman, Notre Dame (1999) JPL Applications: Non-volatile Memory Computing Appl. Nanotechnology / Nanoelectronic Example Implementations Low Dimensional quantum confinement can be achieved in a variety of material systems
Need for Nanoelectronic Simulation • Problems: • Design space is huge • Choice of materials, shapes, orientations, dopings, heat anneals • Characterizations are incomplete and invasive / destructive • Simulation Impact: • Aide Design • Fast, cost effective. • -> Device performance • successful for 1-D quantum devices • Aide Characterization • Non-invasive • More accurate • -> Structure and doping analysis • successful for 1-D quantum devices Simulation Fabrication Characterization Modeling, Characterization and Fabrication are inseparable for nano-scale devices
Nano-scale Device Analysis / Synthesis Development of a Bottom-Up Nanoelectronic Modeling Tool Atomic Orbitals size: 0.2nm Nanoscale Structures (~20nm) • Assertions / Problems: • Nanoscale structures are built today!The design space is huge: choice of materials, compositions, doping, size, shape • Radiation on today’s sub-micron devices modifies the electronics on a nanoscale. • Approach: • Deliver a 3-D atomistic simulation tool • Enable analysis of arbitrary crystal structures, particles, atom compositions and bond/structure at arbitrary temperatures and ambient electric and magnetic fields. • Collaborators: • U. of Alabama, Ames, Purdue, Ohio State, NIST
New Devices for Sensingand Computing Analyze Devices: EnvironmentandFailures Nano-scale Device Analysis / Synthesis Development of a Bottom-Up Nanoelectronic Modeling Tool Atomic Orbitals size: 0.2nm Nanoscale Structures (~20nm) • Assertions / Problems: • Nanoscale structures are built today!The design space is huge: choice of materials, compositions, doping, size, shape • Radiation on today’s sub-micron devices modifies the electronics on a nanoscale. • Approach: • Deliver a 3-D atomistic simulation tool • Enable analysis of arbitrary crystal structures, particles, atom compositions and bond/structure at arbitrary temperatures and ambient electric and magnetic fields. • Collaborators: • U. of Alabama, Ames, Purdue, Ohio State, NIST • NASA Relevance: • Enable new devices needed for NASA missions beyond existing industry roadmap: • Water detection -> 2-5mm Lasers and detectors. • Avionics -> High density, low power computing. • Analyze state-of-the-art devices for non-commercial environments: • Europa -> Radiation and low temperature effects. Aging and failure modes. • Jovian system -> Magnetic field effects • Venus -> high temperature materials: SiGe • Impact: • Low cost development of revolutionary techn. Modeling will narrow the empirical search space!
Presentation Outline • Introduction / Motivation • NASA motivation, Critical look at the SIA Roadmap. • Nano-scale device examples. • What is a quantum dot? • Modeling agenda. • Software Issues • Problem size. • Parallel computing. • Graphical user interfaces. • Physics modeling results • Strain. • Alloy Disorder. • Interface Interdiffusion. • Conclusion / Outlook
How big is a realistic problem? How many atoms? • Rule of thumb: 43 atoms/nm3 • 1 quantum dot: 40x40x15nm3-> 1 million atoms • 2x2 array of dots: 90x90x15nm3-> 5.2 million atoms • 70x70x70nm3 cube of Si -> 15 million atoms What is the Basis? • 5 orbitals (sp3s*), 2 spins -> basis=10 • 10 orbitals (sp3d5s*), 2 spins -> basis 20 How big are the matrices? • Atoms x basis ~ 107x107, sparse • 4 neighbors/atom • Storage=106x5x(20x20)x16bytes/2=16GB Basis States Impurity in quantum dot GaAs Non-ideal interface AlAs A deca-nano device contains >1 Million Atoms
Four Generations of Cluster Experience NewYork (2002) 66 Xserve G4 1GHz 1GB RAM per node 33 GB total 60 GB Disc per node 2 TB total 100 Mb/s ethernet crossbar MAC OS X, MPI 495GFlops Pluto (2001) 64 Pentium IIIs 800MHz dual CPUs 2 GB RAM per node 64 GB total 10 GB Disc per node 320 GB total 2 Gb/s Myricom crossbar Linux, MPI 51.2 GFlops Nimrod (1999) 32 Pentium IIIs 450MHz 512 MB RAM per node 16 GB total 8GB Disc per node 256 GB total 100 Mb/s ethernet crossbar Linux, MPI 14.4 GFlops Hyglac (1997) 16 Pentium Pros 200MHz 128 MB RAM per node 2 GB total 5GB Disc per node 80 GB total 100 Mb/s ethernet crossbar Linux, MPI 3.2GFlops Gordon Bell Prize 1997
Parallelization of NEMO 3-D • Divide Simulation domain into slices. • Communication only from one slice to the next (nearest neighbor) • Communication overhead across the surfaces of the slices. • Limiting operation: sparse matrix-vector multiplication • Enable Hamiltonian storage or re-computation on the fly.
Code Parallelization • Problem: • Need to calculate eigenvalues of a complex matrix of the order of 40 million.=> must parallelize code • Approach: • Evaluate 2 parallel programming paradigms • Shared memory (OpenMP) - CPUs can access the same memory. • Distributed memory - CPUs exchange data through messages (MPI) - data synchronization performed explicitly by program. • Vision: • Utilize a designated Beowulf cluster of PC’s as a workhorse for these simulations. Each node might have 1-4 shared memory CPUs on one motherboard. • Envision a “mixed” code with outer level MPI parallelism and inner level OpenMP parallelism. • This will run on a commercial supercomputer like an SGI Origin 2000 as well as a beowulf. • Results: • Inner level OpenMP parallelism does not speed up code significantly. Dynamic creation and destruction of threads is too expensive. • Decided to abandon the OpenMP implementation and concentrate on the optimization and scaling of the MPI version. • Impact: • Enabled simulation of 2 million atom systems with 20 orbitals on each atom => matrix of order 40million 30 Lanczos Iterations (sparse matrix-vector multiply) Algorithm scales very nicely on commodity cluster (Beowulf)
Parallel Eigenvalue Solver on a Beowulf(32 node, dual CPU Pentium III, 800MHz, 64GB RAM, Linux)
Computation scales almost linear with system size Parallel Eigenvalue Solver on a Beowulf(32 node, dual CPU Pentium III, 800MHz, 64GB RAM, Linux) • Eigenvalue computation ranging from 1/4 to 16 million atoms • Hamiltonian storage provides speed-up of >3x • If the Hamiltonian is stored the maximum system size is reduced to about 8 million atoms • Dual CPU Intel Pentium III has serious memory latency problems.
Comparison of 2 Beowulf Generations • 2.5 Years old: 32 CPU, 450 MHz Pentium III, 500MB RAM each CPU • 1.5 Years old: 64 CPU (32 nodes), 933 MHz Pentium III, 1GB RAM each node • Brand new: 128 CPU (64 nodes), 2.2GHz P4, 1GB RAM each node (RESULTS not in plot) • 450MHz and 933MHz results: • Plot total CPU cycles in (time x MHz rating) • Our problem remains CPU limited, small communication overhead • No Problem feeding the memory to the CPUs. Doubling the CPU Frequency, doubled the speed on slower machines only, 2.2GHz does not scale as nice
Comparison of 2 Computing Platforms • SGI Origin 2000, 128 CPUs, 300MHz R12000, 2GB RAM per 4 CPUs (4 years old) • Beowulf, 64 CPUs, 933MHz Pentium PIII, 1GB per 2 CPUs (1.5 years old) • If matrix is recomputed on the fly: Beowulf ~8x faster • If matrix is stored: Beowulf is ~ 2.5x faster • SGI very fast memory access Cluster of commodity PC’s can beat a supercomputer for our problem
Parallelization of Strain Calculation Problem: • Serial strain computation: ~43 min. • Serial electronic structure calculation:~ 9 hours • Parallel electronic structure computation on 20 CPUs:~30 min. Solution: • Parallelize strain calculation as well Result: • Reduce time to 2-5 minutes on a parallel machine. • See difference between a fast 2Gbps and a 100Mbps network. • Do not see that difference in the electronic structure calculation. • Parallel strain computation is more communication dependent than the electronic structure calculation.
Semi-classical Potential • Specify desired outputs • Quantum region:“Where are wave-functions?” • Non-equilibrium region:“Where are the reservoirs?” • Quantum Potential • exchange & correlation? • how to go from bias to bias? • Specify desired outputs • Quantum region:“Where are wave-functions?” • Non-equilibrium region:“Where are the reservoirs?” • Quantum Charge region:“Where is the charge quantum mechanically calculated?” Hierarchical Ordering of User Input Fact of Life: Typical simulators have a huge wash list of parameters. Problem: What parameters are really needed? What is the dependence between these parameters? Approach: Parent/child related hierarchical ordering Dynamic window generation.
Generic Data Structure I/O • Dynamic GUI Design. • data structure • member descriptor • -> I/O for GUI or files Create Read Data Structure PotType potential real hbarovertau Boolean Ec RangeStruct NonEq Graphical User Interface Translator File/Batch User Interface Read potential=Hartree hbarovertau=0.0066 Ec=FALSE < start=45, end=69 > Create Theorist Software Engineer Flexible software design enables use in various different simulators
XML Based Generic Data Structure I/O C++ data structures Read Create Create Read XML Structure PotType potential real hbarovertau Boolean Ec RangeStruct NonEq Graphical User Interface Translator File/Batch User Interface Read potential=Hartree hbarovertau=0.0066 Ec=FALSE < start=45, end=69 > Create Theorist
Presentation Outline • Introduction / Motivation • NASA Motivation, Critical look at the SIA Roadmap. • Nano-scale device examples. • What is a quantum dot? • Modeling agenda. • Software Issues • Problem size. • Parallel computing. • Graphical user interfaces. • Physics modeling results • Strain. • Alloy Disorder. • Interface Interdiffusion. • Conclusion / Outlook
Results: • Implemented a mechanical strain model. • Implemented atomistic bandstructure model that comprehends strain. • Impact: • Can simulate realistic quantum dots. • Can estimate optical transition energies. Electronics Problem: Effect of overlap changes Equilibrium Strained Orbital overlap changes bandgapsand masses j Rst Ga In As i 1.33eV 1.02eV Eeh Eeh Unstrained Dot Strained Dot Pyramidal InAs Dot Simulation Base: 7nmx7nm Height: 3nm Embedded in GaAs Mechanical Strain Calculations • Problem: • Self-assembly dot formation due to strain • Small mechanical strain (5% bond length)-> dramatic effects on electronic structures • Approach: • Nanomechanical strain calculation • Nanoelectronic strain calculation. Mechanics Problem: Minimize elastic strain (Keating) Small strain has dramatic effects on the electronic structure.
Results: • Simulated 50 dots with random cation distributions. • Inhomogeneous broadening factor of 9.4meV due to alloy disorder. • Impact: • Fundamental uniformity limit for ensemble of alloy-based quantum dots. Simulation of Alloy Dot Ensemble Eeh=1.05eV G=0.1-5meV Measured G=34.6 meV (R. Leon, PRB, 58, R4262) 5meV Theoretical Lower Limit Alloy Disorder in Quantum Dots • Problem: • Cations are randomly distributed in alloy dots. • Does alloy disorder limit electronic structure uniformity for dot ensembles? • Approach: • Simulate a statistical ensemble of alloyed dots. • Requires atomistic simulation tool. In0.6Ga0.4AsLens Shaped Dot (Diameter=30nm, Height=5nm, GaAs embedded) In and Ga atoms are randomly distributed Inhomogeneous Broadening? Alloy disorder presents a theoretical lower limit on optical linewidths
Transport in Molecules End of SIA Roadmap Dopant Fluctuations in Ultra-scaled CMOS Carbon Nanotubes Modeling is an Integral Part of Nano-Science Electron Transport in Exotic Dielectrics DNA TiO2 (Ba,Sr)TiO3 Conclusions / Future Vision • Parallelization (16 million atoms), • Graded junctions, alloy disorder, strain • Made significant progress towards a general atomistic simulation tool • Envision this tool to have impact on quantum dots, end of SIA roadmap issues, and molectronics. Quantum Dots Grading Atomistic Simulation Abrupt Graded We are looking for motivated people!!! CS, EE, Phys At all degree levels!!!!
Results: • More Ga in the quantum dot raises the energy of the transition energies. • Less Ga in the barriers softens the barriers, reduces the binding of the excited states to the quantum dot and reduces DE=E2-E1. • Impact: • Verify experimentally suggested interdiffusion process may be responsible for blue shift and reduction in DE. Ga In As Pyramidal InAs in GaAs, Diameter=10nm, Height=4.2nm 5 samples per data point Atomistic Grading Simulation • Problem: • Quantum dot interfaces may not be sharp. • There may be cation redistribution around the interface => grading of the concentration. • How does the interfacial grading affect the electronic structure? • Approach: • Simulate quantum dot atomistically with graded interfaces as a function of interdiffusion length. Cartoon Visualization of Interdiffusion Slice through2 Qdots with thickness of 3 atoms - with andwithout interdiffusion Interdiffusion widens the bandgap=> blueshift
Progressive Spacecraft Miniaturization Cassini 1000 kg Mars Pathfinder Clark Lewis Solar Probe NEAR Pluto/Kuiper Express Mars 98 Lander/Orbiter 100 kg Spacecraft Mass Europa Orbiter Stardust 10 kg “Microspacecraft” Past Present Future Low weight, low power and high efficiency Have a special meaning to NASA
Motivation Atmospheric Probes Distributed Sensors Penetrators Hydrobot Minaturized-Spacecraft Miniaturized-Rovers Our enthusiasm for Nano-technology stems from its potential value in addressing Deep Space technology needs: • Autonomous navigation and maneuvering, • Miniature in-situsensors, • Radiation and temperature tolerant electronics. Integrated Inflatable Sailcraft Landing on Small Bodies Nano-technology will provide essential computing and sensing capabilities.
Parallel Eigenvalue Solver on a Beowulf(32 node, dual CPU Pentium III, 800MHz, Linux) • Eigenvalue computation ranging from 1/4 to 16 million atoms • Problems are too big for a single CPU (memory requirements) • sp3s* basis set • Matrix sizes up to 1.6 108x1.6 108 • Recompute Hamiltonian matrix on the fly. • Measure time for 30 Lanczos iterations, Full problem 1000-5000 iterations • Solver behaves almost linear in system size!!!