
Computing Architectures in the HL-LHC Era

Explore the evolution of computing architectures in the HL-LHC era, with a focus on the shift towards heterogeneous computing, domain-specific architectures, and the challenges of achieving usable performance portability. Discover the potential of new materials and devices, such as post-CMOS technologies, and the impact of AI and machine learning on science applications. Gain insights into the future of computer architecture and the implications for high-energy physics.

Presentation Transcript


  1. Computing Architectures in the HL-LHC Era. Paolo Calafiura (LBNL), FastML Workshop, Sep 2019

  2. Overview/Disclaimers • Apologies for the ATLAS/DOE/Offline bias • The only certainty about all HL-LHC predictions is that they will be proven wrong! • but we can still learn something from them...

  3. Rushing towards Heterogeneous Computing: Evolution of DOE HPC Systems • Over 20x more FLOPS available by 2026 • At most 10% will come from CPUs • GPUs will dominate • Multiple vendor platforms (NVIDIA CUDA, Intel oneAPI, …)

  4. Why is this happening? • Transistor density keeps growing • But a 15x gap has opened with respect to the 1975 Moore's-law trend • End of Dennard scaling: power keeps frequency down • In the multicore era, Amdahl's Law limits the useful core count • HEP code is typically <~1% serial (see the worked example below) Credit: John Hennessy, David Patterson
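
  Amdahl's Law makes the slide's <~1% serial figure concrete. A minimal worked example; the core counts chosen here are illustrative:

```python
# Amdahl's Law: speedup(N) = 1 / (s + (1 - s) / N)
# Worked example assuming the ~1% serial fraction quoted on the slide.
def amdahl_speedup(serial_fraction: float, n_cores: int) -> float:
    """Upper bound on parallel speedup for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

for n in (16, 64, 256, 1024):
    print(f"{n:5d} cores -> {amdahl_speedup(0.01, n):6.1f}x speedup")
# Even at 1% serial, speedup saturates near 100x regardless of core count.
```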

  5. History of a Benchmark • Benchmark performance growth has slowed from the PC era through the SoC/Cloud era • We use 20%/yr in our HL-LHC extrapolations (see the arithmetic below)
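
  To see what the 20%/yr assumption implies, here is the compound-growth arithmetic; the 2019-to-2027 window is an illustrative assumption, not from the slide:

```python
# Sketch of the 20%/yr single-core growth assumption used in the
# HL-LHC extrapolations. The year range is illustrative.
growth = 1.20
years = 2027 - 2019
print(f"Cumulative factor over {years} years: {growth ** years:.1f}x")
# ~4.3x from per-core improvements alone, far short of the >20x
# FLOPS growth expected from accelerators on slide 3.
```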

  6. A New Golden Age for Computer Architecture • The 2020s: domain-specific hardware specialization, with more efficient architectures and packaging • The 2030s (10-year lead time): post-CMOS, with new materials and devices Credit: Shalf, Leland

  7. Domain Specific Architectures: Deep Learning • Vendor examples shown: Google, Intel/Nervana, Intel, Graphcore, NVIDIA Credit: Dean, Patterson, Young

  8. DSA: Neural Network Inference Examples • Google TPU • Introduced to meet user inference needs (speech recognition) • First commercial implementation of a systolic array (matrix unit) • 50x more power efficient than a CPU • Edge TPU available as a USB dongle [Roofline plot: TOps/s vs Ops/Byte. Source: Google]
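
  The Ops/Byte axis of that plot is roofline arithmetic intensity. A small sketch of the quantity for a dense matrix multiply; the square shapes and int8 (1-byte) elements are assumptions:

```python
# Roofline arithmetic intensity (Ops/Byte) for an n x n matmul.
def matmul_intensity(n: int, bytes_per_elem: int = 1) -> float:
    ops = 2 * n**3                       # one multiply + one add per MAC
    traffic = 3 * n**2 * bytes_per_elem  # read A and B, write C once
    return ops / traffic

for n in (256, 1024, 4096):
    print(f"n={n:5d}: {matmul_intensity(n):7.1f} ops/byte (int8)")
# Intensity grows with n: large matmuls can saturate the TPU's systolic
# matrix unit instead of being limited by memory bandwidth.
```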

  9. DSA: Neural Network Training Examples • NVIDIA Volta Tensor Core • 4-8x speedup with respect to Pascal • Mixed-precision matrix multiply • Intel/Nervana Spring Crest • 32 GB of HBM2 memory at 1 TB/s • 5x faster than GDDR5 • Goal is to speed up NN training 100x by 2020 Source: Intel
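
  A minimal NumPy sketch of the mixed-precision pattern Tensor Cores implement (FP16 inputs, FP32 accumulation); the 4x4 shape mirrors the Tensor Core tile, but the data and comparison are illustrative:

```python
import numpy as np

# Mixed precision: FP16 inputs, FP32 accumulation (as in D = A*B + C).
a = np.random.randn(4, 4).astype(np.float16)
b = np.random.randn(4, 4).astype(np.float16)

# Accumulating the FP16 products in FP32 limits rounding error.
c32 = a.astype(np.float32) @ b.astype(np.float32)

# Naive all-FP16 accumulation, for comparison.
c16 = (a @ b).astype(np.float32)
print("max accumulation error:", np.abs(c32 - c16).max())
```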

  10. Domain Specific Architectures: QUBO • D-Wave Quantum Annealer: a special-purpose computer designed to solve a particular optimization problem, namely finding the ground state of a classical Ising spin glass (Vazirani et al.) • Tracking with QUBOs (Linder, Zlokapa) • D-Wave instruction: minimize $E(s) = \sum_i a_i s_i + \sum_{i<j} b_{ij} s_i s_j$
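
  A toy version of the problem handed to the annealer: minimize a QUBO energy over binary variables. The coefficients below are random and purely illustrative, and brute force stands in for the annealer at this tiny size:

```python
import itertools
import numpy as np

# Toy QUBO: minimize E(x) = sum_i a_i x_i + sum_{i<j} b_ij x_i x_j, x_i in {0,1}.
rng = np.random.default_rng(0)
n = 8
Q = np.triu(rng.normal(size=(n, n)))  # a_i on the diagonal, b_ij above it

def energy(x: np.ndarray) -> float:
    # For binary x, x @ Q @ x reproduces the linear + quadratic terms.
    return float(x @ Q @ x)

# Brute force the ground state (feasible only for tiny n; approximating
# this for thousands of variables is the annealer's job).
best = min((np.array(bits) for bits in itertools.product((0, 1), repeat=n)),
           key=energy)
print("ground state:", best, "energy:", round(energy(best), 3))
```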

  11. Domain Specific Architectures: QUBO • Fujitsu Digital Annealer: a next-generation architecture inspired by quantum phenomena, for the high-speed resolution of combinatorial optimization problems (Fujitsu) • Qubits replaced by "bit updating blocks" with on-chip memory for their $a_i$ and $b_{ij}$ coefficients • "Logic blocks" perform bit flips • U Tokyo ran the Linder QUBO on DAU 1 • Results to be presented at CHEP • Will start testing DAU 2 @LBL "soon" • A 1M-block annealer is under development • DAU instruction: the same QUBO minimization as above
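
  A sketch of the single-bit-flip update that the bit-updating blocks evaluate; the problem size, couplings, and cooling schedule are illustrative assumptions, and the hardware evaluates candidate flips in parallel rather than one at a time as here:

```python
import numpy as np

# Simulated annealing with single-bit-flip (Metropolis) updates on a QUBO.
rng = np.random.default_rng(1)
n = 32
Q = np.triu(rng.normal(size=(n, n)))  # a_i on the diagonal, b_ij above
x = rng.integers(0, 2, size=n)

def flip_delta(x, i):
    """Energy change from flipping bit i, computed from row/column i alone."""
    neighbors = Q[i, :] @ x + Q[:, i] @ x - 2 * Q[i, i] * x[i]
    return (1 - 2 * x[i]) * (Q[i, i] + neighbors)

for temp in np.geomspace(2.0, 0.01, 2000):  # geometric cooling schedule
    i = rng.integers(n)
    d = flip_delta(x, i)
    if d < 0 or rng.random() < np.exp(-d / temp):  # Metropolis acceptance
        x[i] ^= 1
print("final energy:", round(float(x @ Q @ x), 3))
```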

  12. A New Golden Age for Computer Architecture? Domain Specific • Many unproven candidates yet to be investigated at scale. Most are disruptive to our current ecosystem.

  13. A New Golden Age for Computer Architects! • Developers left to deal with the uncertainty

  14. Usable Performance Portability is Hard • A GPU does not work like a TPU, much less like a CPU • Of course, NVIDIA, AMD, and Intel GPUs have a lot in common • But they sit behind proprietary software platforms • No magic C++ compiler or library yet • Loop-level frameworks like SYCL, Kokkos, OpenMP, RAJA, and numba offer good performance and good portability • At the cost of non-trivial restrictions on what can be done • Multiplication of fixed-size arrays parallelizes everywhere (see the numba sketch below)
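
  As a concrete instance of the loop-level approach, a numba sketch of the fixed-size-array case from the last bullet; the batched 4x4 kernel shape is an illustrative assumption:

```python
import numpy as np
from numba import njit, prange

# One independent fixed-size matmul per "event": the embarrassingly
# parallel pattern that maps cleanly onto CPUs, GPUs, and vector units.
@njit(parallel=True)
def batched_matmul(a, b, out):
    for e in prange(a.shape[0]):  # events are independent: parallel loop
        for i in range(4):
            for j in range(4):
                acc = 0.0
                for k in range(4):
                    acc += a[e, i, k] * b[e, k, j]
                out[e, i, j] = acc

a = np.random.randn(10_000, 4, 4)
b = np.random.randn(10_000, 4, 4)
out = np.empty_like(a)
batched_matmul(a, b, out)  # numba compiles and parallelizes the outer loop
```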

  15. TensorFlow (+ PyTorch, MXNet, etc.) to the Rescue!
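
  One reason such frameworks rescue portability: kernel dispatch is the framework's problem, not the user's. A minimal TensorFlow sketch (shapes illustrative); the same code runs unchanged on CPU, GPU, or TPU:

```python
import tensorflow as tf

a = tf.random.normal((1024, 1024))
b = tf.random.normal((1024, 1024))

@tf.function  # traced once, then executed by the runtime on any device
def product(x, y):
    return tf.matmul(x, y)

c = product(a, b)
print(c.device)  # e.g. ".../GPU:0" if an accelerator is visible
```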

  16. AI and Compute: Distributed ML • 3.5-month doubling time for training compute (5x faster than Moore's Law) Source: OpenAI blog
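
  The quoted figures, made concrete; the 18-month Moore doubling period is the baseline the OpenAI post compares against:

```python
# Doubling times in months, per the OpenAI "AI and Compute" post.
ai_doubling, moore_doubling = 3.5, 18.0
print(f"{moore_doubling / ai_doubling:.1f}x faster doubling than Moore")
print(f"AI training compute grows {2 ** (12 / ai_doubling):.1f}x per year")
# -> 5.1x faster doubling; roughly 10.8x more compute per year.
```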

  17. AI and Compute: Science Applications • Some compute-intensive HEP models: • CWoLa bump hunting (composition of 10k NNs) • Detector simulation (GANs) • Full-chain neutrino reconstruction [Chart of model scales: ν reconstruction, CWoLa, HEP detector GANs. Source: NERSC Users Survey]

  18. Final Thoughts • "It's worth preparing for the implications of systems far outside today's capabilities" (OpenAI blog) • HEP is starting the transition to large-scale AI • Millions of parameters • Data parallelism (see the sketch below) • Model parallelism and model composition • Error estimation and sensitivity studies
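
  A minimal sketch of the data-parallelism item using tf.distribute; the model, optimizer, and synthetic data are all illustrative assumptions:

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()  # replicate across local GPUs

with strategy.scope():  # variables are created once per replica
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Each replica sees a shard of every batch; gradients are all-reduced.
x = tf.random.normal((1024, 20))
y = tf.random.normal((1024, 1))
model.fit(x, y, batch_size=256, epochs=1)
```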

  19. Thanks! John Shalf, Charles Leggett, Lucy Linder, Illya Shapoval, Vakho Tsulaia, Wahid Bhimji, Steve Farrell, Ben Nachman, David Rousseau, Kazu Terao
