Explore the evolution of computing architectures in the HL-LHC era, with a focus on the shift towards heterogeneous computing, domain-specific architectures, and the challenges of achieving usable performance portability. Discover the potential of new materials and devices, such as post-CMOS technologies, and the impact of AI and machine learning on science applications. Gain insights into the future of computer architecture and the implications for high-energy physics.
Computing Architectures in the HL-LHC Era
Paolo Calafiura (LBNL), FastML Workshop, Sep 2019
Overview/Disclaimers
• Apologies for the ATLAS/DOE/Offline bias
• The only certainty about all HL-LHC predictions is that they will be proven wrong!
• But we can still learn something from them...
Rushing towards Heterogeneous Computing
Evolution of DOE HPC systems:
• Over 20x more Flops available by 2026
• At most 10% will come from CPUs
• GPUs will dominate
• Multiple vendor platforms (NVIDIA CUDA, Intel oneAPI, ...)
Why is this happening?
• Transistor density keeps growing, but there is a ~15x gap w.r.t. the 1975 Moore's law trend
• Power keeps frequency down (the end of Dennard scaling)
• In the multicore era, Amdahl's Law (written out below) limits the useful core count
• HEP workloads are typically <~1% serial
Credit: John Hennessy, David Patterson
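For reference, Amdahl's Law written out, with s the serial fraction of the workload and N the number of cores; the <~1% serial figure quoted above is what keeps HEP on the friendly side of this limit:

```latex
% Amdahl's Law: speedup of a workload with serial fraction s on N cores
\mathrm{Speedup}(N) = \frac{1}{s + \frac{1 - s}{N}}
  \;\xrightarrow{\;N \to \infty\;}\; \frac{1}{s}
% e.g. s = 0.01 gives an asymptotic speedup of at most ~100x
```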
History of a Benchmark
• We use 20%/yr in our HL-LHC extrapolations
(Plot: benchmark performance over time, annotated with the PC era and the SoC/Cloud era)
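To make the 20%/yr assumption concrete (the five- and ten-year windows below are illustrative, not from the slide), compounding gives:

```latex
% Compound growth at 20%/yr
1.20^{5} \approx 2.5 \qquad 1.20^{10} \approx 6.2
```

A decade of 20%/yr CPU improvement therefore buys only ~6x, compared with the >20x growth in HPC Flops quoted earlier, most of which comes from GPUs.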
A New Golden Age for Computer Architecture
• The 2020s: hardware specialization, domain-specific architectures, more efficient architectures and packaging
• The 2030s (10-year lead time): post-CMOS, new materials and devices
Credit: Shalf, Leland
Domain Specific Architectures: Deep Learning
(Examples from Google, Intel/Nervana, Intel, Graphcore, NVIDIA)
Credit: Dean, Patterson, Young
DSA: Neural Network Inference Examples
• Google TPU
  • Introduced to meet user inference needs (speech recognition)
  • First commercial implementation of a systolic array (matrix unit)
  • 50x more power efficient than a CPU
  • Edge TPU available as a USB dongle
(Plot: TOps/s vs. Ops/Byte. Source: Google)
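As a back-of-the-envelope illustration of the Ops/Byte axis of that plot, the sketch below computes the arithmetic intensity of a dense matrix multiply; the matrix sizes are arbitrary, and the only TPU connection assumed is that its matrix unit targets exactly this high-reuse regime:

```python
# Illustrative only: arithmetic intensity (ops per byte) of an (m x k) by (k x n) matmul.
# bytes_per_element=1 corresponds to int8 operands; sizes are made up.
def matmul_intensity(m, n, k, bytes_per_element=1):
    flops = 2 * m * n * k                                     # one multiply + one add per term
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element  # read A, read B, write C
    return flops / bytes_moved

print(matmul_intensity(256, 256, 256))   # ~170 ops/byte: far into the compute-bound regime
# An elementwise op stays near 1 op/byte, which is why a dedicated matrix unit pays off.
```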
DSA: Neural Network Training Examples
• NVIDIA Volta Tensor Core
  • 4-8x speedup w.r.t. Pascal
  • Mixed-precision matrix multiply (sketched below)
• Intel/Nervana Spring Crest
  • 32 GB of HBM2 memory at 1 TB/s, 5x faster than GDDR5
  • Goal is to speed up NN training 100x by 2020
Source: Intel
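A minimal NumPy sketch of the mixed-precision scheme mentioned above, with FP16 operands and FP32 accumulation; it only mimics the numerics, since Volta performs the small multiply-accumulates in hardware:

```python
import numpy as np

a = np.random.randn(512, 512).astype(np.float16)   # FP16 operands
b = np.random.randn(512, 512).astype(np.float16)

c_mixed = a.astype(np.float32) @ b.astype(np.float32)   # accumulate in FP32
c_half = (a @ b).astype(np.float32)                     # result kept at FP16 precision

print(np.max(np.abs(c_mixed - c_half)))   # difference caused by the reduced precision
```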
Domain Specific Architectures: QUBO
D-Wave Quantum Annealer: "a special purpose computer that is designed to solve a particular optimization problem, namely finding the ground state of a classical Ising spin glass" (Vazirani et al.)
Tracking with QUBOs (Linder, Zlokapa)
D-Wave instruction: minimize O(a, b; q) = Σ_i a_i q_i + Σ_{i<j} b_ij q_i q_j over binary q_i
Domain Specific Architectures: QUBO
Fujitsu Digital Annealer: "a next-generation architecture inspired by quantum phenomena, for the high-speed resolution of combinatorial optimization problems" (Fujitsu)
• Qubits replaced by "bit-updating blocks" with on-chip memory for their a_i and b_ij coefficients
• "Logic blocks" perform bit flips
• U Tokyo ran the Linder QUBO on DAU 1; results to be presented at CHEP
• Will start testing DAU 2 @ LBL "soon"
• A 1M-block annealer is under development
DAU instruction: the same QUBO objective, Σ_i a_i q_i + Σ_{i<j} b_ij q_i q_j
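For concreteness, a tiny classical sketch of the objective both annealers minimize; the a_i and b_ij below are random placeholders rather than the Linder tracking QUBO, and brute force stands in for the annealer:

```python
import itertools
import numpy as np

# E(q) = sum_i a_i q_i + sum_{i<j} b_ij q_i q_j over binary q_i
rng = np.random.default_rng(0)
n = 12
a = rng.normal(size=n)                      # linear coefficients a_i
b = np.triu(rng.normal(size=(n, n)), k=1)   # couplings b_ij, i < j

def energy(q):
    q = np.asarray(q)
    return a @ q + q @ b @ q

best = min(itertools.product((0, 1), repeat=n), key=energy)   # 2^12 states: fine classically
print(best, energy(best))
```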
A New Golden Age for Computer Architecture?
• Many unproven domain-specific candidates yet to be investigated at scale
• Most are disruptive to our current ecosystem
A New Golden Age for Computer Architects!
• Developers are left to deal with the uncertainty
Usable Performance Portability is Hard
• A GPU does not work like a TPU, much less like a CPU
• Of course, NVIDIA, AMD, and Intel GPUs have a lot in common
  • But they sit behind proprietary software platforms
• No magic C++ compiler or library yet
• Loop-level frameworks like SYCL, Kokkos, OpenMP, RAJA, and numba offer good performance and good portability
  • At the cost of non-trivial restrictions on what can be done
  • Multiplication of fixed-size arrays parallelizes everywhere (see the sketch below)
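As a hedged illustration of the loop-level approach, here is a minimal numba sketch (numba is one of the frameworks listed above; SYCL, Kokkos, OpenMP, and RAJA express the same pattern in C++). An elementwise multiply stands in for the fixed-size array operations that parallelize everywhere:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def multiply(x, y, out):
    # The loop body is limited to what numba can compile: the
    # "non-trivial restrictions" trade-off mentioned above.
    for i in prange(x.shape[0]):
        out[i] = x[i] * y[i]

x = np.arange(1_000_000, dtype=np.float32)
y = np.full_like(x, 2.0)
out = np.empty_like(x)
multiply(x, y, out)   # runs the loop in parallel across host cores
```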
AI and Compute: Distributed ML
• AI training compute: 3.5-month doubling time (5x faster than Moore's law)
Source: OpenAI blog
AI and Compute: Science Applications
Some "intensive" HEP models:
• CWoLa bump hunting (composition of 10K NNs)
• Detector simulation (GANs)
• Full-chain neutrino reconstruction
(Figure labels: ν, CWoLa, HEP Det GANs, NERSC Users Survey)
Final Thoughts
"It's worth preparing for the implications of systems far outside today's capabilities" (OpenAI blog)
• HEP is starting the transition to large-scale AI
  • Millions of parameters
  • Data parallelism
  • Model parallelism and model composition
  • Error estimation and sensitivity studies
Thanks!
John Shalf, Charles Leggett, Lucy Linder, Illya Shapoval, Vakho Tsulaia, Wahid Bhimji, Steve Farrell, Ben Nachman, David Rousseau, Kazu Terao