
WTEC Panel on High End Computing in Japan. Site visits: March 29 - April 3, 2004



Presentation Transcript


  1. WTEC Panel on High End Computing in Japan. Site visits: March 29 - April 3, 2004. Study Commissioned By: National Coordination Office, Department of Energy, National Science Foundation, National Aeronautics and Space Administration

  2. WTEC Overview • Provides assessments of research and development • This was one of 55 international technology assessments done by WTEC • WTEC Process • Write proposals for NSF “umbrella” grants • Put together a coalition of sponsors • Recruit a panel of experts • Conduct the study with on-site visits • Publish a report • Full text reports at wtec.org WTEC High End Computing in Japan

  3. Purpose & Scope of this Study • Gather information on current status and future trends in Japanese high end computing • Govt agencies, research communities, vendors • Focus on long-term HEC research in Japan • Compare Japanese and U.S. HEC R&D • Provide review of ES development process and operational experience • Include user experience and its impact on computer science and computational science communities • Report on follow-on projects • Determine HEC areas amenable for Japan-U.S. cooperation to accelerate future advances WTEC High End Computing in Japan

  4. WTEC HEC Panel Members • Al Trivelpiece (Panel Chair), Former Director, Oak Ridge National Laboratory • Peter Paul, Deputy Director, S&T, Brookhaven National Laboratory • Rupak Biswas, Group Lead, NAS Division, NASA Ames Research Center • Kathy Yelick, Computer Science Professor, University of California, Berkeley • Jack Dongarra, Director, Innovative Computing Lab, University of Tennessee & Oak Ridge National Laboratory • Horst Simon (Advisor), Director, NERSC, Lawrence Berkeley National Lab • Dan Reed (Advisor), Computer Science Professor, University of North Carolina, Chapel Hill • Praveen Chaudhari (Advisor), Director, Brookhaven National Laboratory WTEC High End Computing in Japan

  5. Sites Visited (1) • Earth Simulator Center • Frontier Research System for Global Change • National Institute for Fusion Science (NIFS) • Japan Aerospace Exploration Agency (JAXA) • University of Tokyo • Tokyo Institute of Technology • National Institute of Advanced Industrial S&T (AIST) • High Energy Accelerator Research Org. (KEK) • Tsukuba University • Inst. of Physical and Chemical Research (RIKEN) • National Research Grid Initiative (NAREGI) • Research Org. for Information Sci. & Tech. (RIST) • Japan Atomic Energy Research Institute (JAERI) WTEC High End Computing in Japan

  6. Sites Visited (2) • Council for Science and Technology Policy (CSTP) • Ministry of Education, Culture, Sports, Science, and Technology (MEXT) • Ministry of Economy, Trade, and Industry (METI) • Fujitsu • Hitachi • IBM-Japan • Sony Computer Entertainment Inc. (SECI) • NEC WTEC High End Computing in Japan

  7. HEC Business and Government Environment in Japan

  8. Government Agencies • Council for Science & Tech. Policy (CSTP) • Cabinet Office; the PM presides over monthly meetings • Sets strategic directions for S&T • Rates proposals submitted to MEXT, METI, and others • Ministry of Education, Culture, Sports, Science, and Technology (MEXT) • Funds most S&T R&D activities in Japan • Funded the Earth Simulator • Ministry of Economy, Trade, & Industry (METI) • Administers industrial policy • Funds R&D projects with ties to industry • Not interested in HEC, except for grids WTEC High End Computing in Japan

  9. Business and Government • New Independent Administrative Institution (IAI) model • Some research institutes had already converted • Universities were being converted during our visit • Govt. funds each institution as a whole; institutions control their own budgets • Funding is also being cut annually • Commercial viability of vector supercomputers is problematic • Only NEC is still committed to this architectural model • Commodity PC clusters increasingly prevalent • All three Japanese vendors have cluster products WTEC High End Computing in Japan

  10. Business Partnerships • Each of the Japanese vendors is partnered with a US vendor • NEC and Cray (?) • Fujitsu and Sun Microsystems • Hitachi and IBM WTEC High End Computing in Japan

  11. HEC Hardware in Japan

  12. Architecture/Systems Continuum Loosely Coupled • Commodity processor with commodity interconnect • Clusters • Pentium, Itanium, Opteron, Alpha, PowerPC • GigE, Infiniband, Myrinet, Quadrics, SCI • NEC TX7 • Fujitsu IA-Cluster • Commodity processor with custom interconnect • SGI Altix • Intel Itanium 2 • Cray Red Storm • AMD Opteron • Fujitsu PrimePower • Sparc based • Custom processor with custom interconnect • Cray X1 • NEC SX-7 • Hitachi SR11000 Tightly Coupled WTEC High End Computing in Japan

  13. Fujitsu PRIMEPOWER HPC2500 • 1.3 GHz SPARC-based architecture • 5.2 Gflop/s per processor • 41.6 Gflop/s per system board • 666 Gflop/s per node • Peak (128 nodes): 85 Tflop/s system • 8.36 GB/s per system board, 133 GB/s total • [Diagram: up to 128 SMP nodes of 8-128 CPUs each, connected by a high-speed optical interconnect (4 GB/s x 4); within a node, a crossbar network gives uniform memory access across up to 16 system boards of 8 CPUs each; DTU (Data Transfer Unit) boards link each node to I/O channels (PCIBOX) and to the optical interconnect] WTEC High End Computing in Japan
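A quick consistency check of the quoted peaks, assuming the 8 CPUs per system board and 16 system boards per node shown in the diagram:

\[
5.2 \times 8 = 41.6\ \text{Gflop/s per board},\qquad
41.6 \times 16 \approx 666\ \text{Gflop/s per node},\qquad
666 \times 128 \approx 85\ \text{Tflop/s}.
\]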

  14. Fujitsu IA-Cluster: System Configuration • Compute and control nodes drawn from the PRIMERGY line: PRIMERGY (1U rack server), PRIMERGY BX300 (up to 20 blades in a 3U chassis), PRIMERGY RXI600 (2-4 Itanium processors at 1.5 GHz) • Control network: Gigabit Ethernet switch • Compute network: InfiniBand or Myrinet switch WTEC High End Computing in Japan

  15. Latest Installation of FUJITSU HPC Systems WTEC High End Computing in Japan

  16. Hitachi HPC Systems (through the SR11000) • [Timeline chart: peak performance (0.01 to 100,000 GFLOPS, log scale) versus year, 1977-2005] • The lineage runs from the M-series mainframes with Integrated Array Processors (M-200H, M-280H, M-680; automatic vectorization; VOS3/HAP, HI-OSF/1-MJ), through the S-810, S-820, S-3600, and S-3800 vector supercomputers, to the scalar-parallel (MPP-type) SR2201 and SR8000 with automatic pseudo vectorization (HI-UX/MPP), and now the SR11000 (POWER4+, AIX 5L) • Chart annotations include: first Japanese vector supercomputer; first commercially available distributed-memory parallel processor; vector-scalar combined type; single-CPU peaks of 3 GFlops and 8 GFlops (the latter fastest in the world at the time); first HPC machine combining vector and scalar processing WTEC High End Computing in Japan

  17. SR8000 Pseudo Vector Processing (PVP) • Problem with conventional RISC: performance drops on large-scale simulations because of cache overflow; sustained performance is under 10% of peak • PVP features: • Prefetch (H/W or S/W controlled): read data from main memory into cache before calculation; accelerates sequential data access • Preload (S/W controlled): read data from main memory directly into floating-point registers before calculation; accelerates strided and indirectly addressed memory access • [Diagram: a vector unit feeds pipelined arithmetic units from vector registers; PVP emulates this on a RISC core by streaming data into the cache (prefetch) and the floating-point registers (preload)] WTEC High End Computing in Japan
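On the SR8000 these prefetch/preload operations are generated by the compiler. As a rough illustration of the software-prefetch idea only, here is a hedged C sketch using GCC's __builtin_prefetch on an indirectly addressed loop; the look-ahead distance PF_AHEAD is a made-up tuning parameter, not an SR8000 value:

```c
#include <stddef.h>

/* Illustrative only: the SR8000 compiler inserted such operations
 * automatically.  This shows the software-prefetch idea on an
 * indirectly addressed access pattern the hardware cannot predict. */
enum { PF_AHEAD = 16 };   /* hypothetical look-ahead distance */

void scaled_gather(double *restrict y, const double *restrict x,
                   const size_t *restrict idx, double a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i + PF_AHEAD < n)
            /* hint: read access, low temporal locality */
            __builtin_prefetch(&x[idx[i + PF_AHEAD]], 0, 1);
        y[i] = a * x[idx[i]];
    }
}
```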

  18. Hitachi SR11000 • Based on IBM POWER4+ • SMP with 16 processors/node; 109 Gflop/s per node (6.8 Gflop/s per processor) • IBM uses 32 processors per node in its own machine • IBM Federation switch • Hitachi: 6 switch planes for 16 proc/node; IBM uses 8 planes for 32 proc/node • Pseudo vector processing features • Minimal hardware enhancements • Fast synchronization • No preload, unlike the SR8000 • Hitachi's compiler effort is separate from IBM • Automatic vectorization, no plans for HPF • Three customers for the SR11000: • National Institute for Materials Science, Tsukuba - 64 nodes (7 Tflop/s) • Okazaki Institute for Molecular Science - 50 nodes (5.5 Tflop/s) • Institute of Statistical Mathematics - 4 nodes WTEC High End Computing in Japan
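The quoted system sizes follow directly from the per-node peak:

\[
16 \times 6.8 \approx 109\ \text{Gflop/s per node},\qquad
64 \times 109 \approx 7\ \text{Tflop/s},\qquad
50 \times 109 \approx 5.5\ \text{Tflop/s}.
\]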

  19. SR11000 Pseudo Vector Processing (PVP) • Problem with conventional RISC: performance drops on large-scale simulations because of cache overflow; sustained performance is under 10% of peak • PVP feature: Prefetch (S/W or H/W controlled): read data from main memory into cache before calculation; accelerates sequential data access • [Diagram: as on the SR8000, data is streamed from main memory into the cache and floating-point registers to emulate vector-register pipelining on the POWER4+ core] WTEC High End Computing in Japan

  20. SR11000 Next Model • Continuing IBM partnership • Power5 processor • Greatly enhanced memory bandwidth - Flat Memory Interleaving • Hardware Barrier Synchronisation Register WTEC High End Computing in Japan

  21. NEC HPC Products • High-end capability computing: SX-6/7 Series parallel vector processors • Middle to small size capacity computing: TX7 Series IA-64 servers (e.g., Express5800/1160Xa) and Express5800 parallel Linux clusters (parallel PC clusters) • IA-32 workstations: Express5800/50 Series WTEC High End Computing in Japan

  22. TX7 Itanium 2 Server • cc-NUMA architecture; a chipset and crossbar switch developed in-house by NEC give near-uniform high-speed memory access • Up to 32 Itanium 2 processors • Up to 128 GB of RAM • Linux operating system with NEC enhancements • More than 100 GF on Linpack • File server functionality for the SX series WTEC High End Computing in Japan

  23. SX-Series Evolution • 1983 SX Series: the first computer in the world surpassing 1 GFLOPS • 1989 SX-3 Series: shared memory, multi-function processor, UNIX OS • 1994 SX-4 Series: innovative CMOS technology, entirely air-cooled • 1998 SX-5 Series: high sustained performance, large-capacity shared memory • 2001 SX-6 Series: single-chip vector processor, greater scalability • Next-generation SX: use the latest technology to build up and develop the new supercomputer ("the latest technology always in the SX Series") WTEC High End Computing in Japan

  24. NEC SX-7/160M5 • SX-6: 8 proc/node, 8 Gflop/s per processor, 16 GB, processor to memory • SX-7: 32 proc/node, 8.825 Gflop/s per processor, 256 GB, processor to memory • Rumors of SX-8: 8 CPU/node, 26 Gflop/s per processor WTEC High End Computing in Japan

  25. Special Purpose: GRAPE-6 • The 6th generation of the GRAPE (Gravity Pipe) project • Gravity (N-body) calculation for many particles, with 31 Gflop/s per chip • 32 chips per board - 0.99 Tflop/s per board • The full system of 64 boards is installed at the University of Tokyo - 63 Tflop/s • On each board, all particle data are held in SRAM memory; each target particle is injected into the pipeline and its acceleration is computed • No software! • Gordon Bell Prize at SC for a number of years (Prof. Makino, U. Tokyo) WTEC High End Computing in Japan
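The pipelines hard-wire the usual softened pairwise gravitational sum. A minimal C sketch of the per-particle computation each GRAPE chip performs in hardware; the data layout and the softening parameter eps are illustrative, not the board's actual interface:

```c
#include <math.h>

/* Acceleration on one target particle i due to all source particles j,
 * with Plummer softening eps; G is assumed folded into the masses mj. */
void grav_accel(double ai[3], const double xi[3],
                const double (*xj)[3], const double *mj,
                int n, double eps)
{
    ai[0] = ai[1] = ai[2] = 0.0;
    for (int j = 0; j < n; j++) {
        double dx = xj[j][0] - xi[0];
        double dy = xj[j][1] - xi[1];
        double dz = xj[j][2] - xi[2];
        double r2 = dx * dx + dy * dy + dz * dz + eps * eps;
        double inv_r3 = 1.0 / (r2 * sqrt(r2));   /* 1 / r^3 */
        ai[0] += mj[j] * dx * inv_r3;
        ai[1] += mj[j] * dy * inv_r3;
        ai[2] += mj[j] * dz * inv_r3;
    }
}
```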

  26. Sony PlayStation2 • Emotion Engine: • 6 Gflop/s peak • Superscalar MIPS 300 MHz core + vector coprocessor + graphics/DRAM • About $200 • 70M sold (PS1: 100M sold) • 8 KB D-cache; 32 MB memory, not expandable (the OS resides here as well) • 32-bit floating point; not IEEE compliant • 2.4 GB/s to memory (0.38 B/flop) • Potential 20 fl pt ops/cycle • FPU w/ FMAC+FDIV • VPU1 w/ 4 FMAC+FDIV • VPU2 w/ 4 FMAC+FDIV • EFU w/ FMAC+FDIV
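The 20 ops/cycle figure works out if each FMAC is counted as a fused multiply-add (2 flops) and the divide units are left out, which is our reading of the slide; the bytes-per-flop ratio matches the roughly 6.2 Gflop/s peak commonly cited for the Emotion Engine:

\[
(2 + 8 + 8 + 2)\ \text{flops/cycle} \times 0.3\ \text{GHz} = 6\ \text{Gflop/s},\qquad
\frac{2.4\ \text{GB/s}}{\approx 6.2\ \text{Gflop/s}} \approx 0.38\ \text{B/flop}.
\]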

  27. High-Performance Chips for Embedded Applications • The driving market is gaming (PCs and game consoles) • Motivation for almost all the technology developments • Demonstrates that arithmetic is quite cheap • Today there are several big problems with these apparently non-standard "off-the-shelf" chips: • Most of these chips have very limited memory bandwidth and little if any support for inter-node communication • Integer or only 32-bit floating point • No software support to map scientific applications to these processors; minimal general-purpose programming tools • Poor memory capacity for program storage • Not clear that they do much for scientific computing • Developing "custom" software is much more expensive than developing custom hardware. WTEC High End Computing in Japan

  28. TOP500 Data WTEC High End Computing in Japan

  29. Top 20 Computers Where They are Located WTEC High End Computing in Japan

  30. Efficiency is Declining Over Time • Analysis of the top 100 machines in 1994 and 2004 • Shows the number of machines in the top 100 that achieve a given efficiency on the Linpack benchmark • In 1994, 40 machines had >90% efficiency • In 2004, 50 have <50% efficiency WTEC High End Computing in Japan
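Efficiency here is the usual ratio of achieved Linpack performance to theoretical peak; the Earth Simulator, for example, sits at the high end of the range:

\[
\text{efficiency} = \frac{R_{\max}}{R_{\text{peak}}},\qquad
\text{e.g. Earth Simulator: } \frac{35.86\ \text{Tflop/s}}{40.96\ \text{Tflop/s}} \approx 87.5\%.
\]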

  31. ESS Impact on Climate Modeling • NERSC IBM SP3: • 1 simulated year per compute day on 112 processors • ORNL/NCAR IBM SP4: • ~2 simulated years per compute day on 96 processors • ORNL/NCAR IBM SP4: • 3 simulated years per compute day on 192 processors • ESS: • 40 simulated years per compute day on unknown number of processors (probably ~128) • Cray X1 rumor: • 14 simulated years per compute day on 128 procs. Source: Michael Wehner WTEC High End Computing in Japan
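Taking these figures at face value, and using the assumed ~128 Earth Simulator processors, the per-processor throughput gap versus the ORNL/NCAR SP4 is roughly a factor of 20:

\[
\frac{40/128}{3/192} = \frac{0.3125}{0.015625} = 20\times
\quad \text{(simulated years per compute day per processor)}.
\]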

  32. Technology Transfer from Research • Numerical Wind Tunnel → Fujitsu VPP500 • CP-PACS → Hitachi SR2201 • Earth Simulator → NEC SX-6 • GRAPE, MDM, eHPC, … → ? (MD-engine) • Government projects encouraged new architectures • New technologies were commercialized WTEC High End Computing in Japan

  33. Hardware Summary • The commercial viability of "traditional" supercomputing architectures with vector processors and high-bandwidth memory subsystems is problematic. • NEC only remaining in Japan • Clusters are replacing traditional high-bandwidth systems WTEC High End Computing in Japan

  34. HEC Software in Japan

  35. Software Overview • Emphasis on vendor software • Fujitsu, Hitachi, NEC • Earth Simulator software • Languages and compilers • Persistent effort in High Performance Fortran • Including HPF/JA extensions • Use of common libraries • Little academic work for supercomputers: vendors supply tools • Support for clusters WTEC High End Computing in Japan

  36. Achievements: HPF on the Earth Simulator • PFES • Oceanic general circulation model based on the Princeton Ocean Model • Achieved 9.85 TFLOPS with 376 nodes • 41% of the peak performance • Impact3D • Plasma fluid code using a Total Variation Diminishing (TVD) scheme • Achieved 14.9 TFLOPS with 512 nodes • 45% of the peak performance WTEC High End Computing in Japan
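These percentages are consistent with the Earth Simulator's 64 Gflop/s per node (8 vector processors at 8 Gflop/s each):

\[
\frac{9.85\ \text{Tflop/s}}{376 \times 0.064\ \text{Tflop/s}} \approx 41\%,\qquad
\frac{14.9\ \text{Tflop/s}}{512 \times 0.064\ \text{Tflop/s}} \approx 45\%.
\]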

  37. HPF/JA Extensions • HPF research in language and compilers • HPF 2.0 extends HPF 1.0 for irregular apps • HPF/JA further extends HPF for performance • REFLECT: placement of near-neighbor communication • LOCAL: communication not needed for a scope • Extended ON HOME: partial computation replication • Compiler doesn’t need full interprocedural communication and availability analyses • HPF/JA was a consortium effort by vendors • NEC, Hitachi, Fujitsu WTEC High End Computing in Japan
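REFLECT directs the compiler to refresh the shadow (halo) regions of a distributed array, the near-neighbor communication that MPI programmers otherwise write by hand. A minimal 1-D sketch of that hand-written exchange, with illustrative names and a one-cell halo width (not code from the study):

```c
#include <mpi.h>

/* Hand-written halo exchange that an HPF/JA REFLECT directive asks the
 * compiler to generate: each rank refreshes one ghost cell on each side
 * of its local slice u[0..local_n+1] before a stencil sweep. */
void halo_exchange(double *u, int local_n, MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* send first interior cell left, receive right ghost cell */
    MPI_Sendrecv(&u[1],           1, MPI_DOUBLE, left,  0,
                 &u[local_n + 1], 1, MPI_DOUBLE, right, 0,
                 comm, MPI_STATUS_IGNORE);
    /* send last interior cell right, receive left ghost cell */
    MPI_Sendrecv(&u[local_n],     1, MPI_DOUBLE, right, 1,
                 &u[0],           1, MPI_DOUBLE, left,  1,
                 comm, MPI_STATUS_IGNORE);
}
```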

  38. Vectorization and Parallelization on the Earth Simulator (NEC) • Inter-node parallelization: HPF or MPI across nodes linked by the interconnection network • Intra-node parallelization: HPF, OpenMP, or automatic parallelization across the arithmetic processors (APs) sharing a node's main memory • Vectorization within each processor • [Diagram: nodes of 8 APs over shared main memory, connected by the interconnection network] WTEC High End Computing in Japan
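A minimal sketch of this layered model in C, with MPI standing in for the inter-node level and OpenMP for the intra-node level; array names and sizes are illustrative, and on the SX the vector compiler would pipeline the innermost loop:

```c
#include <mpi.h>
#include <omp.h>

#define NI 256
#define NJ 256

/* Hybrid model from the slide: MPI ranks map to nodes, OpenMP threads
 * to the processors inside a node, and the innermost loop is left in
 * a form the vectorizing compiler can handle. */
static double a[NI][NJ], b[NI][NJ], c[NI][NJ];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);            /* inter-node parallelism */

    #pragma omp parallel for           /* intra-node parallelism */
    for (int i = 0; i < NI; i++)
        for (int j = 0; j < NJ; j++)   /* vectorized by the compiler */
            c[i][j] = a[i][j] + b[i][j];

    MPI_Finalize();
    return 0;
}
```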

  39. Hitachi Automatic Vectorization = COMPAS + PVP • Inter-node: parallelized with parallel libraries (HPF, MPI, PVM, etc.) • Intra-node: COMPAS (CO-operative Micro-Processors in single Address Space), automatic elementwise parallel processing across the instruction processors (IPs) in a node • Within each IP: PVP (Pseudo Vector Processing), automatic pseudo vectorization • Example of application: in a triple loop (DO i / DO j / DO k), the outer loop is parallelized across nodes with parallel libraries, the middle loop across IPs within a node by COMPAS, and the inner DO loop is vector-processed in each IP by PVP WTEC High End Computing in Japan

  40. Conclusions • Longer sustained effort on HPF than in the US • Part of the Earth Simulator vision • Successful on two of the large codes, including a Gordon Bell prize winner • Language extensions were also needed • MPI is the dominant model for inter-node communication • Although larger nodes on vector/parallel systems mean a smaller degree of MPI parallelism • Combined with automatic vectorization within nodes • Other familiar tools developed outside Japan: numerical libraries, debuggers, etc. WTEC High End Computing in Japan

  41. Grid Computing in Japan Kathy Yelick U.C. Berkeley and Lawrence Berkeley National Laboratory

  42. Outline • Motivation for Grid Computing in Japan • E-Business, E-Government, Science • Summary of grid efforts • Labs, Universities • Grid Research Contributions • Hardware • Middleware • Applications • Funding summary WTEC High End Computing in Japan

  43. Grid Motivation • e-Japan: create a "knowledge-emergent society" where everyone can utilize IT • In 2001, Japan's internet usage was at the lowest level among major industrial nations • Four strategies to address this: • Ultra high speed network infrastructure • Facilitate electronic commerce • Realize electronic government • Key is information sharing across agencies and society • Nurturing high quality human resources • Training, support of researchers, etc. WTEC High End Computing in Japan

  44. Overview of Grid Projects in Japan • Super-SINET (NII) • National Research Grid Initiative (NAREGI) • Campus Grid (Titech) • Grid Technology Research Center (AIST) • Information Technology Based Lab (ITBL) • Applications: • VizGrid (JAIST) • BioGrid (Osaka-U) • Japan Virtual Observatory (JVO) WTEC High End Computing in Japan

  45. SuperSINET: All-Optical Production Research Network • Operational since Jan. 2002 • 10 Gbps photonic backbone • GbEther bridges for peer connection • 6,000+ km of dark fiber • 100+ end-to-end lambdas and 300+ Gb/s WTEC High End Computing in Japan

  46. NAREGI: National Research Grid Initiative • Funded by MEXT: Ministry of Education, Culture, Sports, Science and Technology • 5-year project (FY2003-FY2007) • 2 B yen (~US$17M) budget in FY2003 • Collaboration of national labs, universities, and industry in the R&D activities • Applications in IT and nano-science • Acquisition of computer resources underway WTEC High End Computing in Japan

  47. NAREGI Goals • Develop a Grid Software System: • R&D in Grid Middleware and Upper Layer • Prototype for future Grid Infrastructure in scientific research in Japan • Provide a Testbed • 100+Tflop/s expected by 2007 • Demonstrate High-end Grid Computing Environment can be applied to Nano-science • Simulations over the Super SINET • Participate in International Collaboration • U.S., Europe, Asian Pacific • Contribute to standards activities, e.g., GGF WTEC High End Computing in Japan

  48. NAREGI Phase 1 Testbed (~3000 CPUs, ~17 Tflops) • Center for Grid R&D (NII): ~5 Tflops • Computational Nano-science Center (IMS): ~10 Tflops • Small test application clusters at ISSP, Kyoto Univ., Tohoku Univ., KEK, AIST, and Kyushu Univ. • Also connected: TiTech Campus Grid, Osaka Univ. BioGrid, AIST SuperCluster • Sites linked over Super-SINET (10 Gbps) WTEC High End Computing in Japan

  49. AIST Super Cluster for Grid R&D • P32: IBM eServer 325, Opteron 2.0 GHz, 6 GB, 2-way x 1074 nodes, Myrinet 2000, 8.59 TFlops peak • M64: Intel Tiger 4, Madison 1.3 GHz, 16 GB, 4-way x 131 nodes, Myrinet 2000, 2.72 TFlops peak • F32: Linux Networx, Xeon 3.06 GHz, 2 GB, 2-way x 256+ nodes, GbE, 3.13 TFlops peak • Total: 14.5 TFlops peak, 3188 CPUs WTEC High End Computing in Japan
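The subsystem peaks add up to the quoted total, and the CPU count is consistent with the "256+" F32 node figure:

\[
8.59 + 2.72 + 3.13 = 14.44 \approx 14.5\ \text{TFlops};\qquad
2 \times 1074 + 4 \times 131 + 2 \times 256 = 3184\ \text{CPUs (3188 quoted)}.
\]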

  50. NAREGI Grid Software Stack (WP = "Work Package") • WP6: Grid-Enabled Applications • WP3: Grid Visualization, Grid PSE, Grid Workflow • WP2: Grid Programming (Grid RPC, Grid MPI) • WP4: Packaging • WP1: Grid Monitoring & Accounting, SuperScheduler (Globus, Condor, UNICORE, OGSA), Grid VM • WP5: High-Performance & Secure Grid Networking WTEC High End Computing in Japan
