
Overview of Lab Capabilities, Expertise & Hardware


Presentation Transcript


  1. Overview of Lab Capabilities, Expertise & Hardware. Jack Wells, Director of Science, Oak Ridge Leadership Computing Facility, Oak Ridge National Laboratory (presenting for David Skinner, LBNL). HPC4mobility: Stakeholders' Workshop, Chattanooga, 1 October 2018

  2. Overview of Lab Capabilities, Expertise & Hardware Jack Wells, Oak Ridge National Laboratory Acknowledgement: David Skinner, LBNL

  3. HPC4mobility: Principal Laboratories. All DOE Labs are eligible.

  4. High Performance Computing is … leveraging "supercomputers" to advance our study of that which is otherwise too big (e.g. stars and galaxies), too small (e.g. atomic to nano-scale), too fast (e.g. nuclear fusion), too slow (e.g. cosmology), or too dangerous/expensive (e.g. destructive testing). Examples: understanding how proteins work; extreme climate events; cheap & efficient solar; designing better batteries; understanding the universe; ...your app here.

  5. What is a Leadership Computing Facility (LCF)? • Collaborative DOE Office of Science user-facility program at ORNL and ANL • Mission: Provide the computational and data resources required to solve the most challenging problems. • Two centers and two architectures to address the diverse and growing computational needs of the scientific community • Highly competitive user allocation programs (INCITE, ALCC). • Projects receive 10x to 100x more resources than at other generally available centers. • LCF centers partner with users to enable science & engineering breakthroughs (Liaisons, Catalysts).

  6. HPC4X, where X = ... • Higher-performing components for higher-temperature engines: design new metal alloys to enable the production of lightweight, temperature-resistant cylinder heads that allow more flexibility in future advanced engine designs. • Increased fuel efficiency for transportation systems: use simulations to meet the EPA's rigorous Phase 2 regulations, which aim to reduce carbon emissions and improve the fuel efficiency of heavy-duty vehicles.

  7. Titan Speeds Inquiry into Engine Alloys that Can Take the Heat. Simulations reveal a crucial underlying mechanism during the design of an innovative high-temperature aluminum alloy family.
  The Science: Simulations on the OLCF's Titan filled in a knowledge gap about high-temperature–capable alloys and inspired engineers to successfully develop a new cast aluminum–copper alloy that promises to withstand engine temperatures as high as 300°C and offer more design flexibility. North American automaker FCA US LLC, Nemak, and ORNL collaborated on the project to explore how some aluminum–copper alloys retain their strength and resist cracking at higher temperatures. FCA and Nemak engineers worked with ORNL experimentalists to leverage newfound insights into the design of a new family of alloys that are durable, castable, and affordable: the aluminum–copper–manganese–zirconium (ACMZ) alloys. (Image: a cast cylinder head made from 16HT, an alloy in the ACMZ family. Image credit: Carlos Jones, ORNL.)
  The Impact: FCA and Nemak have cast more than 100 production-scale cylinder heads from various ACMZ alloys. One ACMZ alloy composition, 16HT, underwent a rigorous proof-of-concept engine test in December 2017 at FCA. The new family of ACMZ alloys might allow engineers to produce lightweight, temperature-resistant cylinder heads that allow more flexibility in future advanced engine designs.
  PI(s)/Facility Lead(s): Amit Shyam/Dongwon Shin. ASCR Program/Facility: DD. ASCR PM: Christine Chalk. Date submitted to ASCR:
  Publication(s) for this work: Dongwon Shin, Sangkeun Lee, Amit Shyam, and J. Allen Haynes, "Petascale Supercomputing to Accelerate the Design of High-Temperature Alloys," Science and Technology of Advanced Materials 18, no. 1 (2017): 828–38. doi:10.1080/14686996.2017.1371559.

  8. SmartTruck Steps up Simulations for Certification by Computation. Small business takes on certification by CFD with the OLCF's big HPC resources.
  The Science: SmartTruck completed, for the first time, a detailed unsteady analysis of its TopKit Aero product, which reduces drag on the sides and back of a trailer, using modeling and simulation on the OLCF's resources. SmartTruck engineers compared a computational baseline vehicle to the same model with the SmartTruck TopKit Aero product attached, using new modeling techniques that more accurately represent real-life conditions. Using computational fluid dynamics calculations, the team solved across five different physics equations with additional turbulence equations and found that the overall drag on the vehicle was reduced by approximately 6.6 percent. (Image: by forcing airflow to veer off the trailer towards the wake behind it, the TopKit Aero System (red) reduces the base wake (blue) and increases the pressure on the truck rear, creating a significant reduction in overall vehicle drag. Image credit: SmartTruck.)
  The Impact: The complex physics SmartTruck captured in its simulations successfully met the EPA's rigorous Phase 2 regulations, which aim to reduce carbon emissions and improve the fuel efficiency of heavy-duty vehicles. The drag reduction shown by the team's simulations meets the Phase 2 regulations and also translates to a roughly 4–5 percent increase in fuel efficiency. SmartTruck is the first company to request certification from the EPA through computational analysis instead of physical testing, and using numerical simulation, the team reduced the time to complete the certification process by 25 percent and reduced the cost by 75 percent. Eventually, the SmartTruck team plans to release the details of its methods and process to accelerate the adoption of simulation for certification of aerodynamic components.
  PI(s)/Facility Lead(s): Nathan See. ASCR Program/Facility: DD. ASCR PM: Christine Chalk

  9. ...where X = f(HPC systems, HPC experts, HPC apps) • Systems • HPC systems are the computing machines that can drive your science faster. • Systems come and go every 5 years or so. • Experts • People who scale algorithms, software, and workflows to run on HPC systems. • Plugged into rapidly evolving HPC trends and approaches. • Apps • HPC apps are the software that must either already exist or be newly developed to enable a project. • New app development requires real investment. Leverage prior art.

  10. Applied High Performance Computing includes … ...using a supercomputer to design/discover new materials ...using a supercomputer to validate a reduced-order model with large-scale reference calculations ...using a supercomputer in real time to understand complex system dynamics ...using a supercomputer in transportation systems optimization.

  11. Data. Improvements and new capabilities at BES facilities are creating challenges for which the community is not prepared. These include unprecedented growth in data volumes, complexity, and access; the need for curation; and integration of diverse datasets. Office of Science facilities have a pressing need to expand their capabilities in data science; this need cuts across all six exascale reviews. Needs include data storage, management, analysis, transfer, and I/O.

  12. Common Themes Across DOE/SC Offices. Across all the exascale requirements reviews, a number of common themes emerged as particularly challenging areas: data management, archiving, and curation; large-scale data storage and analysis; input/output (I/O) performance; remote access, sharing, and data transfer; experimental and simulation workflows.

  13. Emerging Science Activities: Selected Machine Learning Projects on Titan, 2016–2017

  14. Lab Resource Overview: Capabilities, Expertise, Hardware

  15. No more free lunch: Moore's Law continues, but Dennard Scaling is over. National Research Council (NRC) – Computer Science and Telecommunications Board (2012). https://www.karlrupp.net/2018/02/42-years-of-microprocessor-trend-data/

  16. Classic Dennard Scaling (MOSFET Scaling). Dennard scaling, also known as MOSFET scaling, is a scaling law based on a 1974 paper co-authored by Robert H. Dennard, after whom it is named. Originally formulated for MOSFETs, it states, roughly, that as transistors get smaller their power density stays constant, so that the power use stays in proportion with area: both voltage and current scale (downward) with length. -- https://en.wikipedia.org/wiki/Dennard_scaling
  Scale chip features down 0.7x per process generation: 1.4x faster transistors; 0.7x capacitance; 2x more transistors; 0.7x voltage.

  17. Post-Dennard Scaling. Dennard scaling, also known as MOSFET scaling, is a scaling law based on a 1974 paper co-authored by Robert H. Dennard, after whom it is named. Originally formulated for MOSFETs, it states, roughly, that as transistors get smaller their power density stays constant, so that the power use stays in proportion with area: both voltage and current scale (downward) with length. -- https://en.wikipedia.org/wiki/Dennard_scaling
  Scale chip features down 0.7x per process generation: transistors are no faster (no longer 1.4x faster); 0.7x capacitance; 2x more transistors; static leakage limits reduction in Vth, so Vdd stays constant (no longer 0.7x voltage).
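
To make the scaling factors on the previous two slides concrete, here is a minimal back-of-the-envelope sketch (not from the original slides) of how per-chip dynamic power changes per process generation under the classic Dennard assumptions versus the post-Dennard regime:

```python
# Back-of-the-envelope sketch of Dennard vs. post-Dennard scaling.
# Dynamic power per chip ~ C * V^2 * f * (number of transistors);
# each argument below is the per-generation ratio quoted on the slides.

def chip_power_ratio(cap, volt, freq, transistors):
    """Relative chip power after one process generation."""
    return cap * volt**2 * freq * transistors

# Classic Dennard scaling (features shrink 0.7x per generation):
classic = chip_power_ratio(cap=0.7, volt=0.7, freq=1.4, transistors=2.0)
print(f"Classic Dennard: chip power changes by {classic:.2f}x")  # ~0.96x, roughly constant

# Post-Dennard: neither voltage nor frequency scales any more:
post = chip_power_ratio(cap=0.7, volt=1.0, freq=1.0, transistors=2.0)
print(f"Post-Dennard:    chip power changes by {post:.2f}x")     # ~1.40x per generation
```

The roughly constant power in the first case versus the ~1.4x growth per generation in the second is why power, rather than transistor count, became the limiting constraint on the road to exascale.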

  18. Capabilities. • Software and algorithms that are cutting edge and scale to hundreds of thousands of cores. • Well-defined at-scale methods in simulation science and data analysis. Matrix, PDE, and machine learning libraries provide an on-ramp for new problems seeking massive computing. • A broad variety of scalable research application codes, often re-purposable to solve applied HPC challenges. Leverage existing HPC software by adapting it.
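
As a small illustration of the library "on-ramp" mentioned above (a generic sketch, not from the slides), the example below solves a 1D Poisson problem with SciPy's sparse linear-algebra routines; the same pattern carries over to the distributed matrix and PDE libraries used on LCF systems.

```python
# Minimal sketch: lean on an existing matrix library instead of writing a solver.
# Solves the 1D Poisson problem -u''(x) = 1 on (0, 1) with u(0) = u(1) = 0.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 1000                       # number of interior grid points
h = 1.0 / (n + 1)              # grid spacing

# Second-order finite-difference Laplacian (tridiagonal sparse matrix).
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr") / h**2
b = np.ones(n)                 # constant forcing term

u = spla.spsolve(A, b)         # direct sparse solve

# Exact solution is u(x) = x(1 - x)/2, so the midpoint value should be ~0.125.
print(u[n // 2])
```

Swapping the direct solve for an iterative Krylov method or a distributed backend is the usual next step when the problem grows to leadership scale.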

  19. Expertise • Applied mathematicians capable of delivering new algorithms in areas of interest. • HPC scientists and engineers who can collaborate within your HPC agenda. • Established mechanisms for postdoc, visiting faculty, and entrepreneur-in-residence programs. • An R&D community concerned with durable solutions in HPC software & capable of leveraging computing advances between fields.

  20. HPC4Materials Principal Laboratories. All DOE Labs are eligible.

  21. DOE HPC Facilities System Acquisitions (pre-exascale systems, 2013–2018; exascale systems, 2021–2022):
  Argonne: Mira (IBM BG/Q, 2013, unclassified); Theta (Intel/Cray KNL, 2016, unclassified); A21 (Intel/Cray, TBD, 2021, unclassified)
  ORNL: Titan (Cray/NVidia K20, 2013, unclassified); Summit (IBM/NVidia P9/Volta, 2018, unclassified); Frontier (TBD, 2021, unclassified)
  LBNL: CORI (Cray/Intel Xeon/KNL, 2016, unclassified); NERSC-9 (TBD, 2020, unclassified)
  LLNL: Sequoia (IBM BG/Q, 2013, classified); Sierra (IBM/NVidia P9/Volta, 2018, classified); El Capitan (TBD, 2021–2022, classified)
  LANL/SNL: Trinity (Cray/Intel Xeon/KNL, 2016, classified); Crossroads (TBD, 2021–2022, classified)

  22. ORNL Summit System Overview.
  System performance: peak of 200 petaflops (FP64) for modeling & simulation; peak of 3.3 ExaOps (FP16) for data analytics and artificial intelligence.
  Each node has: 2 IBM POWER9 processors; 6 NVIDIA Tesla V100 GPUs; 608 GB of fast memory (96 GB HBM2 + 512 GB DDR4); 1.6 TB of NV memory.
  The system includes: 4,608 nodes; dual-rail Mellanox EDR InfiniBand network; 250 PB IBM file system transferring data at 2.5 TB/s.
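
As a rough cross-check of those peaks (a back-of-the-envelope sketch, not from the slides, assuming nominal per-GPU figures of about 7 TF FP64 and 120 TF FP16 tensor-core throughput for a Tesla V100; the official numbers depend on exact clocks and include the CPU contribution):

```python
# Rough sanity check of Summit's quoted system peaks from its node configuration.
nodes = 4608
gpus_per_node = 6

# Assumed nominal per-GPU peaks for a Tesla V100 (approximate, clock-dependent):
fp64_per_gpu_tf = 7.0     # TF, double precision
fp16_per_gpu_tf = 120.0   # TF, tensor-core half precision

gpus = nodes * gpus_per_node
fp64_system_pf = gpus * fp64_per_gpu_tf / 1_000         # petaflops
fp16_system_eops = gpus * fp16_per_gpu_tf / 1_000_000   # exaops

print(f"GPU-only FP64 estimate: ~{fp64_system_pf:.0f} PF   (slide quotes 200 PF)")
print(f"FP16 tensor estimate:   ~{fp16_system_eops:.1f} ExaOps (slide quotes 3.3 ExaOps)")
```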

  23. Supercomputer Specialization vs ORNL Summit • As supercomputers got larger and larger, many expected them to be more specialized and limited to a reduced number of applications that can exploit their growing scale. • We also predicted that power consumption would be a dominant issue as we approached exascale. • Summit’s architecture seems to have stumbled into a sweet spot. It is not showing power consumption growth and has broad capability across: • Traditional HPC modeling and simulation • High performance data analytics • Artificial Intelligence

  24. Summit Excels Across Simulation, Analytics, and AI (advanced simulations, high-performance data analytics, artificial intelligence).
  • Data analytics – CoMet bioinformatics application for comparative genomics, used to find sets of genes that are related to a trait or disease in a population. It exploits cuBLAS and Volta tensor cores to solve this problem 5 orders of magnitude faster than the previous state-of-the-art code, and has achieved 2.36 ExaOps of mixed-precision (FP16–FP32) performance on Summit.
  • Deep learning – global climate simulations use a half-precision version of a neural network to learn to detect extreme weather patterns in the global climate output; this has achieved a sustained throughput of 1.0 ExaOps (FP16) on Summit.
  • Simulation – a nonlinear dynamic low-order unstructured finite-element solver accelerated using mixed precision (FP16 through FP64) and an AI-generated preconditioner, with the answer delivered in FP64; it has achieved a 25.3x speedup on a Japan earthquake–city structures simulation.
  • Many Early Science codes are reporting >10x speedup on Summit vs Titan.
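
To illustrate the mixed-precision pattern these applications exploit (a generic sketch, not code from CoMet or the climate network), the example below takes FP16 inputs and accumulates the matrix product in FP32, which is the combination Volta tensor cores implement in hardware:

```python
# Generic mixed-precision pattern: FP16 inputs, FP32 accumulation.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512)).astype(np.float16)  # half-precision inputs
b = rng.standard_normal((512, 512)).astype(np.float16)

# Accumulate the product in FP32 (as tensor cores do) to limit rounding error.
c_mixed = a.astype(np.float32) @ b.astype(np.float32)

# Full double-precision reference for comparison.
c_ref = a.astype(np.float64) @ b.astype(np.float64)

rel_err = np.abs(c_mixed - c_ref).max() / np.abs(c_ref).max()
print(f"max relative error, FP16-input/FP32-accumulate GEMM: {rel_err:.2e}")
```

On Summit the same pattern runs through cuBLAS on the V100 tensor cores, which is what drives the ExaOps figures quoted for CoMet above.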

  25. Five Gordon Bell Finalists Credit Summit Supercomputer. Five Summit users are among the finalists for the prestigious Gordon Bell Prize, one of the top annual honors in supercomputing. The finalists, representing Oak Ridge, Lawrence Berkeley, and Lawrence Livermore National Laboratories and the University of Tokyo, leveraged Summit's unprecedented computational capabilities to tackle a broad range of science challenges and produced innovations in machine learning, data science, and traditional modeling and simulation to maximize application performance. The Gordon Bell Prize winner will be announced at SC18 in Dallas in November. Finalists include:
  • An ORNL team developed a genomics algorithm capable of using mixed-precision arithmetic to attain exascale speeds.
  • A University of Tokyo-led team applied AI and mixed-precision arithmetic to accelerate the simulation of earthquake physics in urban environments.
  • A Lawrence Berkeley National Laboratory-led collaboration trained a deep neural network to identify extreme weather patterns from high-resolution climate simulations.
  • An ORNL team scaled a deep learning technique on Summit to produce intelligent software that can automatically identify materials' atomic-level information from electron microscopy data.
  • An LBNL and Lawrence Livermore National Laboratory team led by physicists André Walker-Loud and Pavlos Vranas developed improved algorithms to help scientists predict the lifetime of neutrons and answer fundamental questions about the universe.
  PI(s)/Facility Lead(s): Dan Jacobson, ORNL; Tsuyoshi Ichimura, Univ. Tokyo; Prabhat, LBNL; Robert Patton, ORNL; André Walker-Loud and Pavlos Vranas, LBNL and LLNL. ASCR Program/Facility: Summit Early Science. ASCR PM: Christine Chalk

  26. The US exascale strategy includes four major elements within the Exascale Computing Initiative (ECI): the Exascale Computing Project (ECP); computer facility site preparation investments; US computer vendor R&D investments (NRE); and exascale systems procurement activities. ECI is a partnership between the US DOE Office of Science and the National Nuclear Security Administration to accelerate research, development, acquisition, and deployment to deliver exascale computing capability to US national laboratories by the early-to-mid-2020s for essential science and mission simulations. It is focused on the delivery of "capable" exascale computing: an enduring computing capability that a wide range of applications of importance to the US will be able to use.

  27. ECP Areas of Technical Focus: Application Development, Software Technology, and Hardware and Integration.
  • Application Development develops and enhances the predictive capability of applications critical to the DOE, including the science, energy, and national security mission space. The scope of the AD focus area includes targeted development of requirements-based models, algorithms, and methods; integration of appropriate software and hardware via co-design methodologies; systematic improvement of exascale system readiness and utilization; and demonstration and assessment of effective software integration.
  • Software Technology spans low-level operational software to high-level application software development environments, including the software infrastructure to support large data management and data science for the DOE SC and NNSA computational science and national security activities at exascale. Projects will have: line of sight to application efforts; inclusion of a Software Development Kit to enhance the drive for collaboration; and delivery of specific software products across this focus area.
  • Hardware and Integration is centered on the integrated delivery of specific outcomes (ECP Key Performance Parameters, or KPPs) and products (e.g., science as enabled by applications, software, and hardware innovations) on targeted systems at leading DOE computing facilities. Areas include: PathForward; hardware evaluation; application integration at facilities; software deployment at facilities; facility resource utilization; and training and productivity.

  28. ECP History: from the Start to Present

  29. What is CORAL? The program through which ORNL, ANL, and LLNL are procuring supercomputers. • Several DOE labs have strong supercomputing programs and facilities. • DOE created CORAL (the Collaboration of Oak Ridge, Argonne, and Livermore) to jointly procure these systems, and in so doing, align strategy and resources across the DOE enterprise. • Collaboration grouping of DOE labs was done based on common acquisition timings. Collaboration is a win-win for all parties. Paving The Road to Exascale Performance

  30. DOE HPC Facilities System Acquisitions, with the CORAL procurements highlighted (pre-exascale systems, 2013–2018; exascale systems, 2021–2022):
  Argonne: Mira (IBM BG/Q, 2013, unclassified); Theta (Intel/Cray KNL, 2016, unclassified); A21 (Intel/Cray, TBD, 2021, unclassified)
  ORNL: Titan (Cray/NVidia K20, 2013, unclassified); Summit (IBM/NVidia P9/Volta, 2018, unclassified); Frontier (TBD, 2021, unclassified)
  LBNL: CORI (Cray/Intel Xeon/KNL, 2016, unclassified); NERSC-9 (TBD, 2020, unclassified)
  LLNL: Sequoia (IBM BG/Q, 2013, classified); Sierra (IBM/NVidia P9/Volta, 2018, classified); El Capitan (TBD, 2021–2022, classified)
  LANL/SNL: Trinity (Cray/Intel Xeon/KNL, 2016, classified); Crossroads (TBD, 2021–2022, classified)
  CORAL-1 systems: Summit, Sierra, A21. CORAL-2 systems: Frontier, El Capitan.

  31. CORAL-2 Acquisition Time Line. • Released RFP April 9, 2018 • Responses due May 24, 2018 • Evaluations by Lab teams • Selections made in June 2018 • Negotiations for NRE contracts and System Build contracts begin. Resulting contracts: ORNL Frontier contract (2021 delivery); LLNL El Capitan contract (2022 delivery); possible ANL contract (2022 delivery); one NRE contract per awarded vendor (1–3).

  32. ALCF 2021 Exascale Supercomputer – A21. The Intel Aurora supercomputer planned for 2018 has shifted to 2021 and been scaled up from 180 PF to over 1000 PF, with support for three "pillars": simulation, data, and learning. (Timeline chart, CY 2017–CY 2022: pre-planning, design, and rebaseline reviews; NRE contract award; build contract modification; NRE hardware and software engineering and productization; ALCF-3 facility and site prep and commissioning; ALCF-3 ESP application readiness; build/delivery; acceptance.)

  33. Frontier: the OLCF 2021 exascale computer for advanced simulations, high-performance data analytics, and artificial intelligence. 2021 delivery planned with >1000 PF performance (50–100x the scientific productivity of the current Titan system). (Timeline chart, CY 2017–CY 2023: solicit bids for computer build; design review; award contract; Frontier NRE hardware and software engineering; prepare power, cooling, and space; prepare scientific software applications; installation; Frontier available. Summit available: 200 PF production system; Titan available: 27 PF.)

  34. Summary. • HPC is expanding from the realm of scientific discovery (the lab) to the factory, power plant & shop floor. • The HPC "app store" is growing: materials genomics, turbines, furnaces, metals, PV, batteries, & paper. • HPC models of energy systems can reveal cost savings & deliver confidence in design changes. • HPC software used in discovery science can be re-purposed to solve applied problems. Open source. • HPC is not just simulation: HPC data analytics, e.g. from sensor networks, can inform the operation of energy systems. • HPC algorithms deliver game-changing speed-ups and can change how we think about models (digital twins, e.g.). (Images: lowering the risk of HPC adoption in industry; NERSC's Cori system advances time-to-solution.)

  35. Questions? Jack Wells, wellsjc@ornl.gov
