
Overview of Hitachi’s Super Technical Server SR8000

The Third International Workshop on Next Generation Climate Models, March 2001. Yoshiro Aihara, Hitachi, Ltd., Enterprise Server Division.




  1. The Third International Workshop on Next Generation Climate Models. Overview of Hitachi’s Super Technical Server SR8000. March, 2001. Yoshiro Aihara, Hitachi, Ltd., Enterprise Server Division.

  2. HITACHI Supercomputers
     (Chart: peak performance in FLOPS, from 0.01G to 10T, against year of announcement, '77 to '01. Machine classes: Vector Type, RISC Parallel, Advanced RISC Parallel.)
     - M-200H IAP, M-280H IAP, M-680 IAP: Integrated Array Processor system
     - S-810 Series: first Japanese vector supercomputer
     - S-820 Series: single CPU peak performance 3 GFlops
     - S-3000 Series: single CPU peak performance 8 GFlops (fastest in the world)
     - SR2201 Series: first commercially available distributed memory parallel processor
     - SR8000: new concept machine for advanced HPC users (combination of parallel and vector)
     IAP: Integrated Array Processor

  3. Design Concept of SR8000
     SR8000: a new concept combining the advantages of the vector processor and the RISC parallel processor.
     Target of design and Hitachi’s solution:
     - High single node performance. Vector processor strengths: vector processing, element parallel processing, high memory throughput. SR8000 new features: PVP feature, COMPAS feature.
     - High scalability: multi-dimensional crossbar network (high-speed inter-node network)
     - Short development cycle, easy enhancement: RISC based processor (HITACHI developed)
     PVP: Pseudo Vector Processing. COMPAS: Co-operative Micro-Processors in single Address Space.

  4. Basic Configuration of SR8000
     - Nodes connected by a multi-dimensional crossbar network (high-speed inter-node network)
     - Within a node: high performance RISC microprocessors (Hitachi developed) with Pseudo-Vector Processing, an SP, main memory, network control, and PCI I/O adapters
     - COMPAS: CO-operative Micro-Processors in single Address Space
     - I/O: Ether, ATM, HiPPi; I/O devices include RAID disk
     - System control: SVP and MCD
     SVP: SerVice Processor. MCD: Maintenance Console Device. SP: System Processor.

  5. Multi-dimensional Crossbar Network
     Example: 8x8x2 (128 nodes) configuration, with 8 nodes along each of two axes and 2 nodes along the third.
     (Diagram legend: nodes, X axis crossbar, Y axis crossbar.)
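The 8x8x2 example can be made concrete with a small sketch. The node numbering (x varying fastest) and the rule that a message needs one crossbar traversal per axis on which the two nodes' coordinates differ are assumptions made for illustration; the slide does not state them.

```python
# Dimensions taken from the slide's example: 8 x 8 x 2 = 128 nodes.
DIMS = (8, 8, 2)


def node_coords(node_id, dims=DIMS):
    """Map a linear node number to (x, y, z); x varies fastest (hypothetical numbering)."""
    coords = []
    for size in dims:
        coords.append(node_id % size)
        node_id //= size
    return tuple(coords)


def crossbar_hops(a, b, dims=DIMS):
    """Crossbar traversals between nodes a and b, assuming one traversal
    per axis on which their coordinates differ."""
    pairs = zip(node_coords(a, dims), node_coords(b, dims))
    return sum(1 for ca, cb in pairs if ca != cb)


total_nodes = 1
for size in DIMS:
    total_nodes *= size

# Worst case over all destinations reachable from node 0.
max_hops = max(crossbar_hops(0, n) for n in range(total_nodes))
print(total_nodes, max_hops)  # 128 3
```

Under these assumptions, any pair of nodes is at most three traversals apart, independent of where the nodes sit along each axis.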

  6. SR8000 Hardware Specification

  7. Pseudo-Vector Processing (PVP)
     Problems of conventional RISC processors:
     - Performance drops for large scale simulations because of cache overflow
     - Sustained performance: under 10% of peak
     Prefetch:
     - Reads data from main memory into cache before calculation
     - Accelerates sequential data access
     Preload:
     - Reads data from main memory into the extended floating point registers before calculation
     - Accelerates stride memory access and indirectly addressed memory access
     (Diagram: main memory feeds cache memory by prefetch and the extended floating point registers (160) by preload; the cache feeds the registers by load; the registers feed the FPU.)
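Why reading data before the calculation helps can be seen in a toy timing model. This is an idealized sketch, not a description of the SR8000 pipeline: the cycle counts are hypothetical, and it assumes enough registers and memory bandwidth to keep loads for later iterations in flight.

```python
def time_without_preload(n, mem_latency, compute):
    """Each iteration waits for its own load to finish, then computes."""
    return n * (mem_latency + compute)


def time_with_preload(n, mem_latency, compute):
    """Loads for later iterations are issued ahead of use, so only the
    first load's latency is exposed; afterwards compute time dominates."""
    return mem_latency + n * compute


# Hypothetical values: 1000 iterations, 100-cycle memory latency,
# 4 cycles of arithmetic per iteration.
n, mem_latency, compute = 1000, 100, 4
slow = time_without_preload(n, mem_latency, compute)
fast = time_with_preload(n, mem_latency, compute)
print(slow, fast)  # 104000 4100
```

In this model the unoptimized loop spends almost all of its time waiting on memory, which is consistent in spirit with the "under 10% of peak" figure on the slide, although the numbers here are illustrative only.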

  8. COMPAS Feature of SR8000
     Elementwise parallel processing of DO loops, as employed in vector supercomputers, is realized by multiple processors in a node (automatic elementwise parallelization in a node by the compiler).
     Program behavior:
     - Scalar part: one IP runs while the other IPs wait for startup
     - Start parallel instruction
     - Loop part: each IP executes its own part of the loop
     - End parallel instruction
     - Scalar part
     Hardware feature (COMPAS): high speed processing by multiple processors is realized by a hardware high-speed communication mechanism linking the IPs through the SC to the MS.
     IP: Instruction Processor. SC: Storage Controller. MS: Main Storage. COMPAS: CO-operative Micro-Processors in single Address Space.
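The program behavior on this slide (scalar part, start parallel, loop parts, end parallel, scalar part) can be sketched as follows. This only mirrors the structure: `NUM_IPS`, `split_range`, `loop_part` and `parallel_loop` are hypothetical names, the IP count is an assumed value, and Python threads stand in for the IPs without giving a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_IPS = 8  # hypothetical number of IPs per node; not stated on the slide


def split_range(n, parts):
    """Split range(n) into contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def loop_part(a, b, c, start, stop):
    """One IP's share of the loop body: c(i) = a(i) + b(i)."""
    for i in range(start, stop):
        c[i] = a[i] + b[i]


def parallel_loop(a, b, num_ips=NUM_IPS):
    c = [0.0] * len(a)                                    # scalar part
    with ThreadPoolExecutor(max_workers=num_ips) as pool:
        futures = [pool.submit(loop_part, a, b, c, s, e)  # start parallel
                   for s, e in split_range(len(a), num_ips)]
        for f in futures:
            f.result()                                    # end parallel: wait for every part
    return c                                              # scalar part resumes


print(parallel_loop([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```

Each chunk writes a disjoint slice of `c`, which is what makes the elementwise split safe without locking.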

  9. Programming Models

  10. Physical Data of SR8000
     Example: 128 node configuration (G1 model)
     - Power consumption: approx. 370 kVA
     - Heat dissipation: approx. 330 kW
     - Cooling air inlet temperature: 16 to 22 deg C
     - Weight: approx. 15,000 kg
     - Floor space: approx. 50 sq. meters (incl. service area)
     - Footprint (128 node): approx. 8.0 m x approx. 3.3 m
     - Height: approx. 1.8 m

  11. Overview of Software Products
     - OS: HI-UX/MPP (OSF/1 microkernel-based OS); NQS, BGT, DIFF, SFF, PFF
     - Language processor: Optimizing FORTRAN77/90, HPF, Optimizing C, C++, OpenMP (Ver1)
     - Program development:
       - Parallel library: MPI-2, PVM, PARALLELWARE
       - Numerical calculation: MATRIX/MPP, MATRIX/MPP/SSS, MSL2
       - Development support: Symbolic Debugger (Optimizing C / FORTRAN90), Performance Monitor (for HP-UX)
     - Graphics:
       - GUI: X11R6, Motif1.2
       - Graphic library: GKS, PEX, PHIGS
     - Network: Ethernet / Fast Ethernet, GbE, HiPPi, ATM; TCP/IP, NFS V3, telnet, rlogin

  12. Single UNIX System
     • Single UNIX System: single system operation (file system, process control, network)
     • Open System (standardized OS, compiler, network)
     • Flexible System Operation (partitioning operation, automatic operation)
     • Scalable System (4 to 512 nodes)
     Software structure: a micro-kernel (control of all IPs) and a UNIX (OSF/1) server (functional co-operation with other nodes).
     (Diagram: SR8000 nodes on the 3D-XB with console, disk and RAID, connected by HIPPI and Ethernet networks to an SR2000 Series system, 3500 Series and H-9000V Series workstations, PCs, X terminals, and other vendors' graphics systems (SGI, etc.). Within a node, the COMPAS feature links the IPs and the SP to main storage.)
     COMPAS: CO-operative Micro-Processors in single Address Space. IP: Instruction Processor. 3D-XB: 3-dimensional Cross-bar Network.

  13. Remote DMA Transfer
     ● Direct memory copy between user programs on different nodes, which minimizes OS overhead (protocol processing, context switch, interrupt handling)
     - Remote DMA transfer: no buffering in kernel, no OS system call; data moves from the sending program across the crossbar network directly into the receiving program
     - Normal transfer: data is copied from the program into the OS send buffer, crosses the crossbar network into the OS receive buffer, and is copied again into the receiving program
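The difference between the two paths comes down to how many memory copies happen on the nodes. The sketch below counts them in a simple model; `normal_transfer` and `remote_dma_transfer` are hypothetical names, and the network hop is modeled as a plain assignment rather than a real transfer.

```python
def normal_transfer(data):
    """Buffered path: user memory -> OS send buffer -> network ->
    OS receive buffer -> user memory. Returns (received data, memory copies)."""
    copies = 0
    send_buffer = bytearray(data)          # memory copy into the OS send buffer
    copies += 1
    receive_buffer = send_buffer           # network hop (modeled, not a memory copy)
    dest = bytearray(receive_buffer)       # memory copy out of the OS receive buffer
    copies += 1
    return dest, copies


def remote_dma_transfer(data):
    """Direct path: the network hop lands in the receiving program's memory,
    so no intermediate kernel buffer copies are counted."""
    dest = bytearray(data)                 # network hop straight into user memory
    return dest, 0


print(normal_transfer(b"payload")[1], remote_dma_transfer(b"payload")[1])  # 2 0
```

Both paths deliver the same bytes; the model only makes visible the two kernel-buffer copies that the slide says Remote DMA avoids.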

  14. Examples of ISV Package
     - Structural Analysis: MSC.Nastran, MSC.Marc, LS-DYNA, PAM_CRASH, ABAQUS/Standard, ABAQUS/Explicit
     - Computational Fluid Dynamics: STAR-CD, PHOENICS, SCRYU, STREAM, FLUENT
     - Chemical Analysis: GAUSSIAN98, AMBER
     - Libraries: NAG, IMSL
     - Tools: TotalView, Vampir, AVS/EXPRESS

  15. SR8000 Installation Sites (Example)
     - Leibniz Rechenzentrum (Germany)
     - High Energy Accelerator Research Organization
     - University of Tokyo
     - Japan Meteorological Agency
     - University of Tokyo / Institute for Solid State Physics
     - Tsukuba Advanced Computing Center - TACC / AIST
     - Meteorological Research Institute
     - Hokkaido University
     - Institute of Statistical Mathematics
     - HWW / Universität Stuttgart & DLR (Germany)

  16. TOP500 Supercomputing Sites - November 3rd, 2000
     Rmax/Rpeak > 75 %: Hitachi SR8000 works efficiently.

  17. TOP500 Supercomputing Sites - November 3rd, 2000
     - Rmax/Rpeak = 85.3 % on SR8000/128
     - Rmax/Rpeak = 90.0 % on SR8000-E1/80
     Hitachi SR8000 works efficiently.

  18. SR8000 F1 & G1 LINPACK Performance
     - 313.30 Gflop/s on SR8000F1/32 with Nmax=65000
     - 331.50 Gflop/s on SR8000F1/32 with Nmax=84800 (6% speed-up over the previous result)
     - 398.50 Gflop/s on SR8000G1/32 with Nmax=84800 (20% speed-up over the previous result)
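The quoted speed-ups follow directly from the three LINPACK figures. A short check, using only the numbers on the slide (the helper name `speedup_percent` is illustrative):

```python
# (system, Nmax, Gflop/s) as listed on the slide.
results = [
    ("SR8000F1/32", 65000, 313.30),
    ("SR8000F1/32", 84800, 331.50),
    ("SR8000G1/32", 84800, 398.50),
]


def speedup_percent(before, after):
    """Relative gain of `after` over `before`, in percent."""
    return (after / before - 1.0) * 100.0


larger_problem = speedup_percent(results[0][2], results[1][2])  # F1, larger Nmax
g1_over_f1 = speedup_percent(results[1][2], results[2][2])      # G1 vs F1, same Nmax
print(round(larger_problem), round(g1_over_f1))  # 6 20
```

The first gain comes from the larger problem size on the same F1 system; the second is the G1 against the F1 at the same Nmax.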

  19. NAS Parallel Benchmark (FT)
     Model G1 is 1.28 to 1.30 times faster than Model F1.
     FT: a 3-D fast-Fourier transform partial differential equation benchmark

  20. NAS Parallel Benchmark (MG)
     Model G1 is 1.22 to 1.24 times faster than Model F1.
     MG: a simple 3D multigrid benchmark

  21. MPI Ping-Pong Performance
     Remote DMA (Direct Memory Access) is sender driven and performs a memory-to-memory copy of the data. It provides a high-speed inter-processor communication function without redundant copying.
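A ping-pong test times a message sent to a partner and echoed back; one-way time is conventionally taken as half the round trip. The sketch below shows that arithmetic together with the usual latency-plus-size-over-bandwidth model. The function names and all numeric inputs are hypothetical and are not measurements from this presentation.

```python
def one_way_time(round_trip_seconds):
    """Half of the measured round trip, the usual ping-pong convention."""
    return round_trip_seconds / 2.0


def effective_bandwidth(message_bytes, round_trip_seconds):
    """Bytes per second achieved for one message size."""
    return message_bytes / one_way_time(round_trip_seconds)


def model_time(message_bytes, latency_seconds, peak_bandwidth):
    """Simple linear model: startup latency plus size over peak bandwidth."""
    return latency_seconds + message_bytes / peak_bandwidth


# Hypothetical example: a 1,000,000-byte message with a 2 ms round trip.
bw = effective_bandwidth(1_000_000, 0.002)
print(f"{bw / 1e9:.2f} GB/s")
```

For small messages the latency term dominates the model, which is where avoiding kernel buffering and system calls matters most.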
