How fast are fast computers? Xing Cai October 26, 1998
Overview • Modern fast computers at a glimpse • Fast computers & scientific computing • A closer look at SC performance • Current situation & future trends • Concluding remarks
An indirect answer The slowest fast computer is faster than the fastest slow computer.
http://www.top500.org • Performance ranking of the world’s 500 most powerful computers • LINPACK benchmark (floating-point intensive) • J. Dongarra, H. Meuer, E. Strohmaier • Report every 6 months since June 1993 • A realistic correction to theoretical peak performance • Scale: KFlops, MFlops, GFlops, TFlops
ASCI Red TFLOPS 85 cabinets, 9216 Intel Pentium Pro processors http://www.sandia.gov/ASCI/Red/main.html
Some “high-end” computers • SGI Cray T3E 1200 • SGI Cray Origin 2000 • Fujitsu VPP 700 • NEC SX-4 • IBM RS/6000 SP
Vendor overview http://www.netlib.org/utk/people/JackDongarra/top500-698/
Scientific computing 50 years ENIAC - world’s 1st electronic computer for scientific computing
Advance in hardware • Rapid advance of microprocessor tech. • World’s most powerful computer • ENIAC 330 Flops, 1946 • Digital Alpha-21164 processor 1.2 GFlops, 1997 • World’s most powerful computing site • ONR 583.73 KFlops, 1956 • NSA 4,088.76 GFlops, 1998-Oct-14 http://www.cnct.com/~gunter “If car industry had made equal progress, you could buy a car for a few $, drive across US in a few minutes, and park it in your pocket!”
Scientific computing today Earth & environment DNA modelling & medical research http://www.psc.edu/science/projects.html
Grand challenge “Fundamental problem in science or engineering, with potentially broad economic, political and/or scientific impact, that could be advanced by applying high performance computing resources.” Keyword: simulation
Numerical simulation 3rd paradigm of science! Physical phenomenon • Mathematical model • Hardware • Algorithm • Software
Advance in numerics • Solution of Poisson’s equation • For “standard” size n = 10^6 (100 x 100 x 100) • Multigrid: 14.42 seconds, 56 MBytes • Banded LU: 232.96 days, 160 GBytes Linear system with sparse matrices
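The two memory figures on this slide can be reproduced with a back-of-the-envelope estimate. The sketch below is not from the talk; it assumes a 7-point finite-difference stencil, 8-byte double precision values, and a plane-by-plane numbering of the unknowns so that the half-bandwidth is 100 x 100. Band storage is approximated as twice the half-bandwidth per row.

```python
# Memory estimate for a 3D Poisson problem on an m x m x m grid.
# Assumptions (not stated on the slide): 7-point stencil, 8-byte doubles,
# unknowns numbered plane by plane, so the half-bandwidth is m * m.

BYTES_PER_DOUBLE = 8


def sparse_matrix_bytes(m: int, nonzeros_per_row: int = 7) -> int:
    """Bytes needed for the nonzero values of the sparse matrix."""
    n = m ** 3
    return n * nonzeros_per_row * BYTES_PER_DOUBLE


def banded_lu_bytes(m: int) -> int:
    """Bytes needed for the band of the LU factors (the band fills in)."""
    n = m ** 3
    half_bandwidth = m ** 2
    return n * 2 * half_bandwidth * BYTES_PER_DOUBLE


m = 100
print(sparse_matrix_bytes(m) / 1e6, "MBytes")  # 56.0 MBytes
print(banded_lu_bytes(m) / 1e9, "GBytes")      # 160.0 GBytes
```

Under these assumptions the sparse matrix needs 56 MBytes and the banded LU factors need 160 GBytes, which matches the two sizes quoted on the slide.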
How fast (and big) should fast computers be? Global weather prediction • Navier-Stokes on 3D grid for the earth • 100 m cells, 100 levels - 5 x 10^12 cells • 5 variables per cell - 200 TBytes • 100 Flops/cell/minute • Required performance: 8 TFlops There is never enough computing power?
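The weather-prediction numbers follow from simple arithmetic. The sketch below retraces them; the earth's surface area (about 5.1 x 10^14 m^2) and the 8 bytes per variable are assumptions added here, not values given on the slide.

```python
# Back-of-the-envelope sizing of the global weather grid.
# Assumptions: earth surface of about 5.1e14 m^2, 8-byte variables.

EARTH_SURFACE_M2 = 5.1e14


def grid_cells(cell_size_m: float, levels: int) -> float:
    """Number of cells for square surface cells stacked in vertical levels."""
    return EARTH_SURFACE_M2 / cell_size_m ** 2 * levels


def memory_bytes(cells: float, variables_per_cell: int = 5,
                 bytes_per_variable: int = 8) -> float:
    """Storage for all variables on the grid."""
    return cells * variables_per_cell * bytes_per_variable


def required_flops(cells: float, flops_per_cell_per_minute: float = 100) -> float:
    """Sustained floating-point rate needed, in Flops per second."""
    return cells * flops_per_cell_per_minute / 60


cells = grid_cells(100.0, 100)
print(f"cells:  {cells:.1e}")
print(f"memory: {memory_bytes(5e12) / 1e12:.0f} TBytes")
print(f"rate:   {required_flops(5e12) / 1e12:.1f} TFlops")
```

With the slide's rounded 5 x 10^12 cells this gives 200 TBytes and roughly 8 TFlops.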
Electrical potential depolarization in human heart • Grid node spacing 2 nodes/mm • Estimated 3D grid - 4,200,000 nodes • Estimated CPU time - one processor • CPU time per node: 3.3 seconds • total: 4,200,000 x 3.3 s = 160 days • Elapsed physical time: 300 ms We need parallel computing http://www.ifi.uio.no/~xingca/HEART/
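The 160-day estimate is a direct product of node count and CPU time per node. The sketch below reproduces it and, purely as an illustration, shows what a perfect parallel speedup would give. The processor counts and the assumption of zero communication cost are hypothetical and not taken from the talk.

```python
# Serial CPU time for the heart simulation, plus an idealized parallel estimate.

SECONDS_PER_DAY = 24 * 60 * 60


def serial_cpu_days(nodes: int, cpu_seconds_per_node: float) -> float:
    """Total single-processor CPU time in days."""
    return nodes * cpu_seconds_per_node / SECONDS_PER_DAY


def ideal_parallel_days(serial_days: float, processors: int) -> float:
    """Hypothetical best case: perfect speedup, no communication cost."""
    return serial_days / processors


days = serial_cpu_days(4_200_000, 3.3)
print(f"one processor: {days:.1f} days")
for p in (16, 128, 1024):  # illustrative processor counts
    hours = ideal_parallel_days(days, p) * 24
    print(f"{p:5d} processors (ideal): {hours:.1f} hours")
```

Real codes fall short of this ideal because of communication and synchronization, but the order of magnitude shows why a single processor is not an option.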
Parallel computing • We are approaching the limit of single microprocessor performance • We want to run larger simulations • We want shorter simulation time • More cost-effective computing
Oil reservoir simulation Simulation of 1000 days of gas injection • Single-processor workstation simulation • one day for 80,000 unknowns • 10 days for 800,000 unknowns • 200 days for 32,000,000 unknowns (impossible) • Efficient parallel computing • 128 processor IBM SP • 23 minutes for 32,000,000 unknowns (PETSc) Importance of efficient parallel computing! http://www.mcs.anl.gov/petsc/petsc.html
Main question Actual performance of real-life SC applications is well below the peak performance. Why?
LINPACK benchmark revisited • Direct solution of dense matrix systems • Limited application in SC • Simple data structure • Close to artificial test problem • Only a more realistic upper bound of achievable peak performance - 20% of reported performance can be expected
Characteristics of SC • Data intensive computing • 1 GFlops needs memory bandwidth of 24 GB/s (example: DAXPY) • Memory hierarchy • Complex data structure • Sparse matrices • Structured grid vs unstructured grid • Adaptive grid refinement • Communication & synchronization
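The DAXPY example can be made concrete. DAXPY computes y <- a*x + y; each element update loads x[i], loads y[i], and stores y[i]. The sketch below assumes 8-byte doubles, no cache reuse, and that the slide counts one element update (one multiply-add) per operation. If the multiply and the add are counted as two separate Flops, the requirement per Flop is halved.

```python
# DAXPY kernel and the memory traffic it generates.
# Assumptions: 8-byte doubles, every operand comes from main memory.

BYTES_PER_DOUBLE = 8
MEMORY_ACCESSES_PER_ELEMENT = 3  # load x[i], load y[i], store y[i]
BYTES_PER_ELEMENT = BYTES_PER_DOUBLE * MEMORY_ACCESSES_PER_ELEMENT


def daxpy(a: float, x: list, y: list) -> list:
    """Compute y <- a*x + y element by element, in place."""
    for i in range(len(x)):
        y[i] = a * x[i] + y[i]
    return y


def required_bandwidth(updates_per_second: float) -> float:
    """Bytes per second that memory must deliver to sustain the update rate."""
    return updates_per_second * BYTES_PER_ELEMENT


print(daxpy(2.0, [1.0, 2.0], [10.0, 20.0]))  # [12.0, 24.0]
print(required_bandwidth(1e9) / 1e9, "GB/s")  # 24.0 GB/s
```

The kernel does very little arithmetic per byte moved, so its speed is set by memory bandwidth rather than by the processor's peak rate.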
Multigrid method • Well suited to large sparse systems • asymptotically optimal operation count • fewer than 100 floating-point operations per unknown • Complex data structure • Relatively low performance Stals & Rüde - Techniques for improving the data locality of iterative methods
Architecture bottleneck • Imbalance between processor speed and memory access speed • Processor speed annual increase >= 60% • Memory access speed annual increase 5%-10% • Inter-processor communication latency & bandwidth • Memory size
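The imbalance compounds year after year. The sketch below applies the growth rates from the slide (60% per year for processors, 5% to 10% for memory access) over a hypothetical 10-year span; the time horizon is an illustration, not a figure from the talk.

```python
# How the processor/memory speed gap widens under compound annual growth.

def relative_gap(years: int, cpu_growth: float = 0.60,
                 mem_growth: float = 0.10) -> float:
    """Factor by which processor speed outgrows memory access speed."""
    return ((1 + cpu_growth) / (1 + mem_growth)) ** years


for mem_growth in (0.05, 0.10):
    gap = relative_gap(10, 0.60, mem_growth)
    print(f"memory +{mem_growth:.0%}/year: gap after 10 years is about {gap:.0f}x")
```

Even with the optimistic 10% figure for memory, the gap grows by more than an order of magnitude in a decade, which is why cache-aware algorithms matter.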
SC software today • Inefficient (not very cache-aware) • Not very portable • Not very easy to maintain • Not very user-friendly • Hard to program real-life applications • Limited compiler parallelism • Hard to program parallel codes
O-O numerical software • Better representation of mathematics • Manpower effective • Stable code, easy maintenance • Good flexibility & extensibility • Structured & efficient parallelization • Need care for efficiency • Standard is not settled yet
Trend in architecture http://www.netlib.org/utk/people/JackDongarra/top500-698/
Trend in CPU technology http://www.netlib.org/utk/people/JackDongarra/top500-698/
Future trends • Progress of semi-conductor technology • over 10^9 transistors per chip in future • increased on-chip parallelism • Architecture changes are needed • Impact on scientific computing • Rüde: Technological trends and their impact on the future of supercomputers • Different levels of parallelism
Metacomputing • Demand for enormous computing power • US Air Force battle simulation (8 US supercomputing centers) • Unicore project (link supercomputers in Germany and US) • Better utilization of idle comp. power • “Seamless web” - heterogeneous comp. • Need a balanced system connected by high-speed networks • Need a scalable distr. operating system
Supercomputers in future • ASCI Option White - IBM 10 TFlops • 100 TFlops computers in near future • Petaflops (10^15) • 10,000-1,000,000 procs • feasible and “affordable” in 2010?
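The processor range quoted for a Petaflops machine implies a per-processor rate that is easy to work out. The sketch below divides 10^15 Flops evenly over the processors; perfect load balance is assumed here for illustration only.

```python
# Per-processor performance needed to reach one Petaflops.
# Assumption: the work divides evenly over all processors.

PETAFLOPS = 1e15


def per_processor_flops(total_flops: float, processors: int) -> float:
    """Sustained rate each processor must deliver."""
    return total_flops / processors


for procs in (10_000, 100_000, 1_000_000):
    rate = per_processor_flops(PETAFLOPS, procs)
    print(f"{procs:>9,d} processors: {rate / 1e9:.0f} GFlops each")
```

At the upper end of the range each processor needs about 1 GFlops, comparable to the 1.2 GFlops quoted earlier for the Alpha 21164; at the lower end each needs 100 GFlops.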
Some observations • HPSC is a small but exciting field • Supercomputers adopt commodity tech • Affordable parallel systems available • SMP, distributed shared memory • cluster of shared memory machines • parallel computing standard appearing • Scientific software industry is still in its early stage
Challenges for SC • Numerics • faster algorithms • good data locality • low communication requirement • Software • efficient (performance, manpower) • high-level problem solving environment • Hardware • changes of architecture
Some citations ‘There’s a future for high-performance parallel computing out there.’ Tony Hey, Univ. Southampton ‘Allow datastructures and algorithms to guide us to the appropriate architecture.’ John Vrolyk, SGI senior vice president ‘Intentions of the scientific users strongly differ from the industrial users.’ Ulrich Trottenberg, GMD
The whole picture Government • Industry • General public? • Supercomputer vendor • Scientific computing We are in the same boat...
Concluding remarks • Huge potential of scientific computing • More real-life applications to come • Growing demand for computing power • Scientific computing needs advances in • numerical algorithms • software technology • hardware
Quiz What was the world’s fastest computer on June 2nd 1998? ‘It was a HP notebook used on Space shuttle “Discovery” to compute orbital position. The speed was 17,500 mph.’ Jack Dongarra