Explore the evolution of computing architectures in the HL-LHC era, with a focus on the shift towards heterogeneous computing, domain-specific architectures, and the challenges of achieving usable performance portability. Discover the potential of new materials and devices, such as post-CMOS technologies, and the impact of AI and machine learning on science applications. Gain insights into the future of computer architecture and the implications for high-energy physics.
Computing Architectures in the HL-LHC Era
Paolo Calafiura (LBNL), FastML Workshop, Sep 2019
Overview/Disclaimers
• Apologies for the ATLAS/DOE/Offline bias
• The only certainty about all HL-LHC predictions is that they will be proven wrong!
• But we can still learn something from them...
Rushing towards Heterogeneous Computing
Evolution of DOE HPC systems:
• Over 20x more Flops available by 2026
• At most 10% will come from CPUs
• GPUs will dominate
• Multiple vendor platforms (NVIDIA CUDA, Intel oneAPI, ...)
Why is this happening?
• Transistor density keeps growing, but there is a ~15x gap w.r.t. the 1975 Moore's law trend
• Power keeps frequency down (the end of Dennard scaling)
• In the multicore era, Amdahl's Law (written out below) limits the useful core count
• HEP workloads are typically <~1% serial
Credit: John Hennessy, David Patterson
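For reference, Amdahl's Law written out, with s the serial fraction of the workload and N the number of cores; the <~1% serial figure quoted above is what keeps HEP on the friendly side of this limit:

```latex
% Amdahl's Law: speedup of a workload with serial fraction s on N cores
\mathrm{Speedup}(N) = \frac{1}{s + \frac{1 - s}{N}}
  \;\xrightarrow{\;N \to \infty\;}\; \frac{1}{s}
% e.g. s = 0.01 gives an asymptotic speedup of at most ~100x
```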
History of a Benchmark
• We use 20%/yr in our HL-LHC extrapolations
(Plot: benchmark performance over time, annotated with the PC era and the SoC/Cloud era)
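To make the 20%/yr assumption concrete (the five- and ten-year windows below are illustrative, not from the slide), compounding gives:

```latex
% Compound growth at 20%/yr
1.20^{5} \approx 2.5 \qquad 1.20^{10} \approx 6.2
```

A decade of 20%/yr CPU improvement therefore buys only ~6x, compared with the >20x growth in HPC Flops quoted earlier, most of which comes from GPUs.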
A New Golden Age for Computer Architecture
• The 2020s: hardware specialization, domain-specific architectures, more efficient architectures and packaging
• The 2030s (10-year lead time): post-CMOS, new materials and devices
Credit: Shalf, Leland
Domain Specific Architectures: Deep Learning
(Examples from Google, Intel/Nervana, Intel, Graphcore, NVIDIA)
Credit: Dean, Patterson, Young
DSA: Neural Network Inference Examples
• Google TPU
  • Introduced to meet user inference needs (speech recognition)
  • First commercial implementation of a systolic array (matrix unit)
  • 50x more power efficient than a CPU
  • Edge TPU available as a USB dongle
(Plot: TOps/s vs. Ops/Byte. Source: Google)
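As a back-of-the-envelope illustration of the Ops/Byte axis of that plot, the sketch below computes the arithmetic intensity of a dense matrix multiply; the matrix sizes are arbitrary, and the only TPU connection assumed is that its matrix unit targets exactly this high-reuse regime:

```python
# Illustrative only: arithmetic intensity (ops per byte) of an (m x k) by (k x n) matmul.
# bytes_per_element=1 corresponds to int8 operands; sizes are made up.
def matmul_intensity(m, n, k, bytes_per_element=1):
    flops = 2 * m * n * k                                     # one multiply + one add per term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element  # read A, read B, write C
    return flops / bytes_moved

print(matmul_intensity(256, 256, 256))   # ~170 ops/byte: far into the compute-bound regime
# An elementwise op stays near 1 op/byte, which is why a dedicated matrix unit pays off.
```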
DSA: Neural Network Training Examples
• NVIDIA Volta Tensor Core
  • 4-8x speedup w.r.t. Pascal
  • Mixed-precision matrix multiply (sketched below)
• Intel/Nervana Spring Crest
  • 32 GB of HBM2 memory at 1 TB/s, 5x faster than GDDR5
  • Goal is to speed up NN training 100x by 2020
Source: Intel
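A minimal NumPy sketch of the mixed-precision scheme mentioned above, with FP16 operands and FP32 accumulation; it only mimics the numerics, since Volta performs the small multiply-accumulates in hardware:

```python
import numpy as np

a = np.random.randn(512, 512).astype(np.float16)   # FP16 operands
b = np.random.randn(512, 512).astype(np.float16)

c_mixed = a.astype(np.float32) @ b.astype(np.float32)   # accumulate in FP32
c_half = (a @ b).astype(np.float32)                     # result kept at FP16 precision

print(np.max(np.abs(c_mixed - c_half)))   # difference caused by the reduced precision
```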
Domain Specific Architectures: QUBO
D-Wave Quantum Annealer: "a special purpose computer that is designed to solve a particular optimization problem, namely finding the ground state of a classical Ising spin glass" (Vazirani et al.)
Tracking with QUBOs (Linder, Zlokapa)
D-Wave instruction: minimize O(a, b; q) = Σ_i a_i q_i + Σ_{i<j} b_ij q_i q_j over binary q_i
Domain Specific Architectures: QUBO
Fujitsu Digital Annealer: "a next-generation architecture inspired by quantum phenomena, for the high-speed resolution of combinatorial optimization problems" (Fujitsu)
• Qubits replaced by "bit-updating blocks" with on-chip memory for their a_i and b_ij coefficients
• "Logic blocks" perform bit flips
• U Tokyo ran the Linder QUBO on DAU 1; results to be presented at CHEP
• Will start testing DAU 2 @ LBL "soon"
• A 1M-block annealer is under development
DAU instruction: the same QUBO objective, Σ_i a_i q_i + Σ_{i<j} b_ij q_i q_j
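For concreteness, a tiny classical sketch of the objective both annealers minimize; the a_i and b_ij below are random placeholders rather than the Linder tracking QUBO, and brute force stands in for the annealer:

```python
import itertools
import numpy as np

# E(q) = sum_i a_i q_i + sum_{i<j} b_ij q_i q_j over binary q_i
rng = np.random.default_rng(0)
n = 12
a = rng.normal(size=n)                      # linear coefficients a_i
b = np.triu(rng.normal(size=(n, n)), k=1)   # couplings b_ij, i < j

def energy(q):
    q = np.asarray(q)
    return a @ q + q @ b @ q

best = min(itertools.product((0, 1), repeat=n), key=energy)   # 2^12 states: fine classically
print(best, energy(best))
```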
A New Golden Age for Computer Architecture?
• Many unproven domain-specific candidates yet to be investigated at scale
• Most are disruptive to our current ecosystem
A New Golden Age for Computer Architects!
• Developers are left to deal with the uncertainty
Usable Performance Portability is Hard
• A GPU does not work like a TPU, much less like a CPU
• Of course, NVIDIA, AMD, and Intel GPUs have a lot in common
  • But they sit behind proprietary software platforms
• No magic C++ compiler or library yet
• Loop-level frameworks like SYCL, Kokkos, OpenMP, RAJA, and numba offer good performance and good portability
  • At the cost of non-trivial restrictions on what can be done
  • Multiplication of fixed-size arrays parallelizes everywhere (see the sketch below)
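As a hedged illustration of the loop-level approach, here is a minimal numba sketch (numba is one of the frameworks listed above; SYCL, Kokkos, OpenMP, and RAJA express the same pattern in C++). An elementwise multiply stands in for the fixed-size array operations that parallelize everywhere:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def multiply(x, y, out):
    # The loop body is limited to what numba can compile: the
    # "non-trivial restrictions" trade-off mentioned above.
    for i in prange(x.shape[0]):
        out[i] = x[i] * y[i]

x = np.arange(1_000_000, dtype=np.float32)
y = np.full_like(x, 2.0)
out = np.empty_like(x)
multiply(x, y, out)   # runs the loop in parallel across host cores
```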
AI and Compute: Distributed ML
• AI training compute: 3.5-month doubling time (5x faster than Moore's law)
Source: OpenAI blog
AI and Compute: Science Applications
Some "intensive" HEP models:
• CWoLa bump hunting (composition of 10K NNs)
• Detector simulation (GANs)
• Full-chain neutrino reconstruction
(Figure labels: ν, CWoLa, HEP Det GANs, NERSC Users Survey)
Final Thoughts
"It's worth preparing for the implications of systems far outside today's capabilities" (OpenAI blog)
• HEP is starting the transition to large-scale AI
  • Millions of parameters
  • Data parallelism
  • Model parallelism and model composition
  • Error estimation and sensitivity studies
Thanks!
John Shalf, Charles Leggett, Lucy Linder, Illya Shapoval, Vakho Tsulaia, Wahid Bhimji, Steve Farrell, Ben Nachman, David Rousseau, Kazu Terao