Developments in High Performance Computing
A Preliminary Assessment of the NAS SGI 256/512 CPU SSI Altix (1.5 GHz) Systems
SC'03, November 17-20, 2003
Jim Taft, NASA Ames Research Center
Jtaft@nas.nasa.gov
A statement from the heart

"I am quite concerned that benchmarks for stripped-down elements of a CFD code are being confused with performance numbers for a full CFD code, which are usually 2-10 times slower."

"Taking these two things into account, my bet is that we will not beat a C-90 (e.g. 4-6 GFLOP/s) on any SNX architecture. I would love to be proved wrong, but there have been the CM2s, the Hypercubes, the Paragons, the DaVincis, the SP1s, the SP2s, etc., all of which were supposed to outperform the Crays on real problems, and which never really got close."

"Please convince me otherwise."
Crusty Old CFDer (May 1997)

"Well, these points were well taken just 6 years ago. Since then, a revolution has taken place in RISC-based HPC systems. We routinely beat C90 times by a factor of 100, and this is only the beginning."
Less Crusty CFDer (Nov 2003)
A Revolution is Brewing in HPC: Accelerating Scientific Discovery and Understanding

The NASA Advanced Supercomputing (NAS) Division, working with NASA's Aeronautics and Earth Sciences Enterprises, has begun exploring new ways of doing high performance computing. The effort is focused on providing highly available HPC platforms for mission-critical problems, in order to significantly accelerate the pace of scientific discovery in those areas.

Earth Sciences - The ECCO Project
The Consortium for Estimating the Circulation and Climate of the Ocean (ECCO) is a joint venture between the Jet Propulsion Laboratory, MIT, and the Scripps Institution of Oceanography. A major current effort is to execute a number of decadal ocean simulations using MIT's MITgcm code at 1/4 degree global resolution or better. This is the first major project to use the new SGI Altix 512p single-system-image (SSI) computer at NAS. NASA's new HPC system has provided a significant increase in throughput to the ECCO team, as the performance chart (below) suggests. Additional optimizations are planned for this code on the Altix platform and are expected to push performance to almost a simulated decade per day on 512 CPUs. This level of computing capability, dedicated to a core science team, can revolutionize the rate of scientific discovery, a process critical for national leadership in the sciences.

Aerosciences - The OSP and RTF Projects
A second major NAS effort to accelerate the time to solution for key mission programs centers on work in the Aerosciences. NASA, in particular, needs extensive computational resources to address the specific needs of its Return to Flight (RTF) and Orbital Space Plane (OSP) projects. The NAS Division is working with NASA Enterprise leadership to define a small series of focused efforts in support of these activities. The OVERFLOW code has long been essential for NASA and aerospace industry flow simulations over existing and notional designs. The Altix system has significantly improved the time to solution for this code; the chart (right) shows a comparison with previous best efforts.

Chart: MITgcm Performance - 1/4 Degree (yrs/day vs CPU Count)
Chart: OVERFLOW - 35M Point Problem (GFLOP/s vs CPU Count)

SGI Altix - 512p SSI System: The SGI Altix SSI system has been operational since 10/30/03. It is routinely scaling production applications to 512 CPUs with excellent results.
Chapman: The "Current" Generation of NAS HPC

CPUs
• 1024 MIPS R14000 CPUs at 600 MHz
• 1200 MFLOP/s per CPU, 1.2 TFLOP/s total
• 8 MByte cache per CPU, 8 GByte total cache

Memory
• 256 GB main memory

Disk
• 10 TB FC RAID disks

System Software
• Single-system-image OS
• Single XFS file system
• OpenMP support to 1024 CPUs

The Largest SSI HPC System in the World
Altix: The First of the Next Generation at NAS

CPUs
• 512 Intel Itanium 2 CPUs at 1.5 GHz
• 6.0 GFLOP/s per CPU, 3 TFLOP/s total
• 6 MB cache per CPU, 3 GB total cache

Memory
• 1 TB total, cache coherent, globally addressable, shared

Disk
• 20 TB FC RAID disks

System Software
• Linux OS - single system image
• Single XFS file system
• OpenMP support to 512 CPUs

Altix 512 - The Fastest Cache Coherent SSI HPC System in the World
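To make the shared-memory programming model behind these SSI specifications concrete, here is a minimal OpenMP sketch in C. It is not from the original slides; the array size and loop body are illustrative, and it assumes only standard OpenMP. On an SSI machine such as the 512p Altix, the same executable scales simply by raising OMP_NUM_THREADS, because all of main memory is globally addressable by every thread.

```c
/* Minimal illustration (not from the slides) of the shared-memory OpenMP
 * model used on the NAS SSI systems.  Peak arithmetic for the Altix:
 * 512 CPUs x 6.0 GFLOP/s per CPU ~= 3 TFLOP/s.
 */
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10000000L   /* illustrative problem size */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double sum = 0.0;

    /* Initialize shared arrays; every thread sees the same global memory. */
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        a[i] = 1.0;
        b[i] = 2.0;
    }

    /* A simple parallel reduction; scale by setting OMP_NUM_THREADS=512. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < N; i++)
        sum += a[i] * b[i];

    printf("threads=%d  dot=%g\n", omp_get_max_threads(), sum);
    free(a);
    free(b);
    return 0;
}
```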
Quick Note on Altix Status

Aug 2003 - 1st 128 CPUs arrive (1.3 GHz)
• Booted initially as 2x64; a few days later booted as 128
• Excellent stability, scalability, and performance

Sep 2003 - 2nd 128 CPUs arrive (1.5 GHz)
• Systems run as 2x128 and intermittently as 1x256

Oct 2003 - 2nd 256 CPUs arrive (1.5 GHz)
• All CPUs updated to 1.5 GHz
• System booted as a 512 CPU SSI
• Codes routinely scale to 256p and 512p

Nov 2003 - Limited production use
• System is stable: uptime measured in weeks, busy 24/7
• Executing mission-critical code at 4x previous performance

NOTE: This joint development between NASA and SGI to move from a 64p to a 512p Linux-based Altix SSI system has been highly successful. It has produced a system with wide applicability to NASA mission-critical codes.
Performance Results for Applications in the Earth Sciences
• MM5 (NCAR)
• MITgcm (ECCO)
• POP (LANL)
• CCSM (NCAR)
MM5 10 Km Performance Results

MM5 is a classic NCAR weather-code benchmark, known for its excellent scaling on clusters given the right problem size. It is NOT a climate model, and it is inappropriate to use MM5 to set expectations for most climate models, which usually have great difficulty scaling to large CPU counts on clustered systems without shared-memory interconnects.

NOTE: The Cray X-1 results have been plotted using SSP count rather than MSP count as the "processor" count. This is more of an apples-to-apples comparison of "processors". Note that X-1 scaling is already falling off rapidly. The Altix 1.5 GHz is outstanding on this code.

Chart: MM5 Performance (GFLOP/s) versus "Processor" Count

NOTE: Non-Altix data replotted from Paul Muzio charts presented at IDC-Utah
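As context for the "processor" counting choice above, a Cray X-1 MSP is built from four SSPs, so replotting by SSP count multiplies the processor count by four while leaving the aggregate rate unchanged. The sketch below (not from the slides, with made-up sample numbers) shows that bookkeeping and why the per-"processor" rate moves even though the measurement does not.

```c
/* Illustrative bookkeeping (not from the slides) for re-plotting Cray X-1
 * results by SSP count instead of MSP count.  Assumes the standard X-1
 * organization of 4 SSPs per MSP; the sample numbers below are made up.
 */
#include <stdio.h>

#define SSPS_PER_MSP 4

int main(void)
{
    int    msp_count        = 64;    /* hypothetical measured configuration */
    double aggregate_gflops = 90.0;  /* hypothetical aggregate rate */

    int ssp_count = msp_count * SSPS_PER_MSP;

    /* Aggregate performance is unchanged; only the per-"processor" rate
     * and the x-axis position of the data point move. */
    printf("per-MSP counting: %.2f GFLOP/s per processor at x=%d\n",
           aggregate_gflops / msp_count, msp_count);
    printf("per-SSP counting: %.2f GFLOP/s per processor at x=%d\n",
           aggregate_gflops / ssp_count, ssp_count);
    return 0;
}
```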
Altix 1.5 GHz MM5 10 Km Performance Results

The MM5 results below are plotted together with those for all other systems.

NOTE: From a Paul Muzio chart presented at IDC-Utah
ECCO Code Performance (11/04/03)

The ECCO code is a well-known ocean circulation model with features that allow it to run in a coupled mode, in which land, ice, and atmospheric models are run together to provide a complete earth-system modeling capability. In addition, the code can run in a "data assimilation" mode that allows observational data to be used to improve the quality of the various physical sub-models in the calculation.

The chart below shows the current performance on the Altix and other platforms for a 1/4 degree resolution global ocean circulation problem. (In reality, much of the calculation runs at an effectively much higher resolution because the grid shrinks toward the poles.)

Note: Virtually no changes to the code have been made across platforms; only the changes needed to make it functional have been made. The preliminary Altix results are very good to date, and a number of code modifications have been identified that will significantly improve on this performance number.

NOTE: The performance on both Chapman and the Altix with full I/O is super-linear; that is, as you add more CPUs you get better-than-proportional speedups. The Alpha numbers show a knee at 256 CPUs.

CURRENT PERFORMANCE: 256p Altix 1.5 GHz = 1.6 yrs/day! (or a decade a week)

Chart: ECCO Performance (yrs/day) vs CPU Count

NOTE: Alpha data re-plotted from Gerhard Theurich charts in NCCS paper
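As a worked example of the throughput metric quoted above (not part of the original slides): at 1.6 simulated years per wall-clock day on 256 CPUs, a decade of simulation takes 10 / 1.6, or about 6.25 days, which is the "decade a week" figure. The sketch below also shows the usual speedup/efficiency check behind the term "super-linear"; the timing pair in it is a placeholder, not measured ECCO data.

```c
/* Small sketch (not from the slides) of the two calculations behind the
 * ECCO discussion: simulated-years-per-day throughput, and the
 * speedup/efficiency check that labels a scaling result "super-linear".
 * Only the 1.6 yrs/day at 256 CPUs is quoted from the slides; the
 * per-CPU-count timings are placeholders.
 */
#include <stdio.h>

/* Wall-clock days needed to simulate `years` at a given yrs/day rate. */
static double days_for(double years, double yrs_per_day)
{
    return years / yrs_per_day;
}

int main(void)
{
    /* Quoted Altix figure: 1.6 simulated years per wall-clock day on 256p. */
    printf("decade at 1.6 yrs/day: %.2f wall-clock days\n",
           days_for(10.0, 1.6));

    /* Super-linear check on hypothetical timings (seconds per model year). */
    int    p_base = 64,     p = 256;
    double t_base = 4000.0, t = 900.0;   /* placeholder timings */

    double speedup    = t_base / t;          /* measured speedup vs. 64p  */
    double ideal      = (double)p / p_base;  /* ideal linear speedup = 4  */
    double efficiency = speedup / ideal;     /* > 1 means super-linear    */

    printf("speedup %.2f vs ideal %.2f, efficiency %.2f%s\n",
           speedup, ideal, efficiency,
           efficiency > 1.0 ? " (super-linear)" : "");
    return 0;
}
```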