Explore the journey of supercomputers from CDC 6600 to PostK, the evolution of benchmark software for HPC systems, and the significance of SPEC CPUint, Linpack, HPCC, HPCG, and more. Discover the transition from MFLOPS to EFLOPS, the challenges in comparing diverse architectures, and the future outlook towards Zflops technologies.
Benchmark software for HPC systems
Kei Hiraki
The University of Tokyo
My Position
• Working in “Computer Architecture”
• For me, benchmark means SPEC CPU
• But the SPEC CPUint scores of most supercomputers are small, except for Intel x86
• HPC is a different world for measuring benchmarks
Diversity of Supercomputers
• 1 MFLOPS
  • 1964, CDC 6600
  • ILP (pipeline parallel), out of order, scoreboard
• 1 GFLOPS
  • 1984, Cray XMP/4
  • Vector architecture, SMP
• 1 TFLOPS
  • 1997, ASCI-Red
  • Cluster computer of many MPUs
• 1 PFLOPS
  • 2008, IBM Roadrunner (Cell based)
  • GPGPU
  • On-chip multi-CPU, huge parallelism
Development of Supercomputers (1964-2018)
[Figure: timeline chart of supercomputer systems from the CDC 6600 (1964) through today's systems to Post-K (2020). Legend: Vector, SIMD, GPU, Cluster, Distributed Memory, Shared Memory, Fastest system at one time, Research/Special system.]
History of Fastest supercomputers (1)
Name, year to start, Linpack performance, (peak performance)
• UNIVAC LARC 1960 (0.16 Mflops)
• IBM Stretch 1961 (0.3 Mflops)
• CDC-6600 1964 0.5 Mflops (3 Mflops) *N=100 Linpack
• CDC-7600 1969 3.3 Mflops (10 Mflops) *N=100 Linpack
• TI ASC 1972 ~30 Mflops (64 Mflops)
• ILLIAC IV 1975 ~40 Mflops (150 Mflops)
• Cray-1 1976 110 Mflops (160 Mflops) *N=1000 Linpack
• Cray-XMP4 1982 714 Mflops (800 Mflops)
• SX-2 1985 885 Mflops (1.3 Gflops)
• Cray-2 1985 1.4 Gflops (1.9 Gflops)
• CM-2 1987 2.4 Gflops (5 Gflops)
(ETA-10 1988 496 Mflops (9.1 Gflops single / 4.6 Gflops double, 8 proc.); parallel operation did not work. *N=1000 Linpack, 1 proc., 7 ns)
• Every fastest supercomputer has its own interesting drama.
• Behind the fastest supercomputers, there are numerous supercomputers that failed to become the world's fastest.
History of Fastest supercomputers (2)
Name, year to start, Linpack performance, (peak performance)
• SX-3/44R 1990 23.2 Gflops (25.6 Gflops)
• CM-5 1993 60 Gflops (131 Gflops)
• Fujitsu NWT 1993 124 Gflops (236 Gflops)
• Intel Paragon XP 1994 143 Gflops (184 Gflops)
• Fujitsu NWT 1994 170 Gflops (236 Gflops)
• Hitachi SR-2201 1996 220 Gflops (307 Gflops)
• Hitachi CP-PACS 1996 368 Gflops (614 Gflops)
• Intel ASCI RED 1997 1.1 Tflops (1.5 Tflops)
• IBM ASCI White 2000 4.9 Tflops (12.4 Tflops)
• NEC ES 2002 35 Tflops (40.1 Tflops)
• IBM BlueGene/L 2004 71 Tflops (92 Tflops)
• IBM Roadrunner 2008 1.0 Pflops (1.4 Pflops)
• Cray XT-5 2009 1.8 Pflops (2.3 Pflops)
• Tianhe-1A 2010 2.5 Pflops (4.7 Pflops)
• K-computer 2011 10.5 Pflops (11 Pflops)
• BlueGene/Q 2012 16 Pflops (20 Pflops)
• Cray XK7 2012 17.6 Pflops (27 Pflops)
• Tianhe-2 2013 33.9 Pflops (55 Pflops)
• Sunway 2016 93.9 Pflops (125 Pflops)
• IBM AC922 + NVIDIA V100 2018 122.3 Pflops (188 Pflops)
Various Benchmarks
• SPEC CPUint
  • We cannot submit papers without SPEC CPUint
  • Even Dhrystone is useful
• Linpack
  • HPC Linpack is not a bad benchmark
  • Good for time-line comparison
• HPCC
  • Too many result figures
  • Redundant
  • DGEMM, FFT, STREAM are useful
• HPCG
  • Today’s topic
HPC Benchmarks
• How can I compare apples and oranges?
  • Distributed memory
  • Shared memory
  • Vectors
  • GPGPUs
  • SIMD
Simplest history of supercomputers
• 1 MFLOPS
  • 1964, CDC 6600
  • ILP (pipeline parallel), out of order, scoreboard
• 1 GFLOPS
  • 1984, Cray XMP/4
  • Vector architecture, SMP
• 1 TFLOPS
  • 1997, ASCI-Red
  • Cluster computer of many MPUs
• 1 PFLOPS
  • 2008, IBM Roadrunner (Cell based)
  • GPGPU
  • On-chip multi-CPU, huge parallelism
• 1 EFLOPS
  • 2022?
  • Special purpose accelerator? 3D semiconductor?
• 1 ZFLOPS
  • 2038??
  • Billion cores? More specialized accelerators?
Intervals between milestones: 20 years, 13 years, 11 years, 14 years, 16 years?
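The intervals on this slide can be checked with a few lines of arithmetic. The sketch below recomputes the gap between each 1000x milestone and the average yearly growth factor it implies. The years are the slide's own (2022 and 2038 are its guesses); the function name `intervals` and the constant-growth assumption within each step are illustrative only.

```python
# Milestone years as listed on the slide; 2022 and 2038 are the slide's guesses.
milestones = [
    ("1 MFLOPS", 1964),
    ("1 GFLOPS", 1984),
    ("1 TFLOPS", 1997),
    ("1 PFLOPS", 2008),
    ("1 EFLOPS", 2022),
    ("1 ZFLOPS", 2038),
]


def intervals(points):
    """Return (from_label, to_label, years, yearly_growth) per 1000x step.

    yearly_growth assumes constant exponential growth within the step,
    which is a simplification made for this illustration.
    """
    rows = []
    for (label_a, year_a), (label_b, year_b) in zip(points, points[1:]):
        years = year_b - year_a
        yearly_growth = 1000 ** (1 / years)
        rows.append((label_a, label_b, years, yearly_growth))
    return rows


for label_a, label_b, years, growth in intervals(milestones):
    print(f"{label_a} -> {label_b}: {years} years, about {growth:.2f}x per year")
```

Under these assumptions the shortest step (TFLOPS to PFLOPS, 11 years) corresponds to the fastest yearly growth, and the projected steps are slower again.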
Today’s Topics
• What are the best benchmarks for
  • Exaflops development
  • Zettaflops development
  • Comparison with quantum computers
Why is XXX/Rpeak important?
• The ratio improves when the CPU has fewer FPUs
• Now the area for FPUs is not a major factor in a CPU
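One reading of this slide is that XXX stands for whichever benchmark result is reported, so the ratio is measured performance divided by Rpeak. The sketch below computes that ratio for three Linpack entries from the "History of Fastest supercomputers (2)" slide, then shows with made-up numbers why removing FPUs raises the ratio for a benchmark that is not FPU-bound. The function name `efficiency` and the hypothetical machine are assumptions for illustration, not part of the talk.

```python
def efficiency(measured, rpeak):
    """Fraction of peak performance actually achieved (same units for both)."""
    return measured / rpeak


# (Linpack, peak) in Pflops, copied from the history slide.
systems = {
    "K-computer": (10.5, 11.0),
    "Tianhe-2": (33.9, 55.0),
    "Sunway": (93.9, 125.0),
}

for name, (linpack, rpeak) in systems.items():
    print(f"{name}: {efficiency(linpack, rpeak):.1%} of peak")

# Hypothetical machine: a memory-bound benchmark sustains 1.0 Pflops
# regardless of how many FPUs the chip has (assumed for illustration).
sustained = 1.0
full_fpu_rpeak = 100.0   # hypothetical peak with all FPUs
half_fpu_rpeak = 50.0    # same design with half the FPUs

print(f"all FPUs:  {efficiency(sustained, full_fpu_rpeak):.1%}")
print(f"half FPUs: {efficiency(sustained, half_fpu_rpeak):.1%}")
```

In the hypothetical case the ratio doubles although the machine is no faster, which is the effect the first bullet describes.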
Return to simplicity
• Weighted means of
  • DGEMM
  • STREAM
  • FFT
• Selection of weights is the problem
OR
Return to simplicity
• Weighted means of
  • Linpack
  • HPCG
  • FFT
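A minimal sketch of why weight selection is the problem: the same two systems swap rank when the weights change. It assumes a weighted arithmetic mean of scores normalized to a reference system (a geometric mean would be another reasonable choice). The systems, scores, and weights are all hypothetical.

```python
def weighted_mean(scores, weights):
    """Weighted arithmetic mean of per-benchmark scores.

    scores and weights are dicts keyed by benchmark name; weights need not
    sum to 1 because the result is divided by their total.
    """
    total = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total


# Hypothetical scores, normalized so that a reference system scores 1.0.
system_a = {"DGEMM": 2.0, "STREAM": 0.5, "FFT": 1.0}  # strong on dense compute
system_b = {"DGEMM": 1.0, "STREAM": 1.5, "FFT": 1.0}  # strong on memory bandwidth

# Two hypothetical weightings of the same three benchmarks.
compute_heavy = {"DGEMM": 0.6, "STREAM": 0.2, "FFT": 0.2}
memory_heavy = {"DGEMM": 0.2, "STREAM": 0.6, "FFT": 0.2}

for label, weights in [("compute-heavy", compute_heavy), ("memory-heavy", memory_heavy)]:
    a = weighted_mean(system_a, weights)
    b = weighted_mean(system_b, weights)
    winner = "A" if a > b else "B"
    print(f"{label}: A={a:.2f}, B={b:.2f}, winner={winner}")
```

With the compute-heavy weights system A ranks first, and with the memory-heavy weights system B does, so the single number reflects the weights as much as the machines.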
Purpose of Benchmark software
• Characterization of the system
• Balance of system components
• Proof of improvements
• Evidence for purchase decisions
• Performance / Cost
• Performance / Power
• Time-line comparison
• Single number vs. multiple numbers