The 3rd International Workshop on Next Generation Climate Models for Advanced High Performance Computing Facilities

Visualization for High-End Weather and Climate Modeling on the NEC SX Series

March 29, 2001
Hiroshi Takahara & Toshifumi Takei
NEC Corporation
E-mail: h-takahara@bc.jp.nec.com
Advancement of HPC Technology

[Chart: performance (FLOPS) on a Giga-Tera-Peta scale over time. The progression runs from microprocessor-based PCs and parallel-processing cluster servers, through highly parallel vector supercomputers and the Earth Simulator, toward future systems built on high-performance single-chip vector / massively parallel processing and multi-layered highly parallel / distributed global computing. Driving applications: structural and thermal analysis, crash, CFD, weather/climate, ocean, biochemistry.]
Required Performance and Memory Capacity

[Chart: applications plotted by required performance (100 MFLOPS to 1 TFLOPS) and memory capacity (8 MBytes to 80 GBytes). Toward the high end (around 8-80 GBytes): climate modeling, turbulence simulation, human genome, oceanic circulation, viscous fluid dynamics, semiconductor modeling, quantum chromodynamics. Mid-range (around 80-800 MBytes): vehicle design, structural biology, pharmaceutical design, chemical dynamics, 72-hour weather, estimate of Higgs boson mass. Lower end (around 8 MBytes): 3-D plasma modeling, 48-hour weather, 2-D plasma modeling, oil reservoir modeling, airfoil (DARPA).]
Vector & Scalar Processing

• Vector: tailored to large-scale simulations and huge data (meteo/climate, CFD, crash, …)
• Scalar: suitable for small-to-medium-sized problems; performance scalability is limited by inter-PE communication

[Chart: applications plotted by data size vs. amount of computation, with genome and weather in the vector-tailored region and crash, CFD, chemistry and FEM toward the scalar-tailored region.]
Merits and Demerits of Each Architecture

Vector / shared memory
  Merits: excellent effective performance; ease of use (auto parallelization)
  Demerits: high cost; limited scalability
Vector / distributed and distributed-shared memory
  Merits: wide application range; high scalability
  Demerits: difficult parallelization (requires high skill)
Scalar / shared memory
  Merits: ease of use (auto parallelization)
  Demerits: poor effective performance; limited scalability
Scalar / distributed memory
  Merits: excellent cost/peak-performance ratio
  Demerits: difficult parallelization (requires high skill); poor effective performance
Some Views from the Weather & Climate Community on Vector Computers

"Shared-memory vector computers manufactured in Japan have a combination of usability and performance... The purchase of Japanese vector computers would have an immediate impact on climate and weather science in the U.S. The use of distributed memory, commodity-based processor parallel computers increases the needed software investment..."
(USGCRP Report, Dec. 2000)
Pros and Cons of the Validity of the TOP500

Pros:
• A ranking covering high-performance computers worldwide, carrying considerable sway
Cons:
• Does NOT represent the complete range of applications, yet has excessive impact on policy making
• Acceptance among the HPC community is changing because of the increased dominance of business computing (particularly in the lower rankings)

TOP500 top 10 (Nov. 2000), Rmax in GFLOPS:
1. IBM ASCI White, 4938, Lawrence Livermore National Laboratory
2. Intel ASCI Red, 2379, Sandia National Laboratories
3. IBM ASCI Blue-Pacific, 2144, Lawrence Livermore National Laboratory
4. SGI ASCI Blue Mountain, 1608, Los Alamos National Laboratory
5. IBM SP Power3 375 MHz, 1417, Naval Oceanographic Office (NAVOCEANO)
6. IBM SP Power3 375 MHz, 1179, National Centers for Environmental Prediction
7. Hitachi SR8000-F1/112, 1035, Leibniz Rechenzentrum
8. IBM SP Power3 375 MHz 8-way, 929, UCSD / San Diego Supercomputer Center
9. Hitachi SR8000-F1/100, 917, High Energy Accelerator Research Organization (KEK)
10. Cray Inc. T3E1200, 892, Government

Business computing sites (finance, DB, Web, etc.) on the list include:
• 72 sites, e.g. #15/#34 Charles Schwab, #53 European Patent Office, #93 Sobeys, #102 Deutsche Telekom, #112 Bank Administration Institute (BAI), #120 State Farm, #177 NTT, #213 Chase Manhattan
• 54 sites, e.g. #136 New York City Human Resources, #139 bank (Westboro), #140 e-commerce (Santa Clara), #169 airline (London), #170 bank (Milano), #171 bank (Munich), #173 Chase GlobalNet, #176 Rakuten**
** Rank 176: Rakuten is the largest cyber mall in Japan!

[Chart: number of TOP500 systems by vendor: IBM 215, Sun 92, SGI 67, Cray 47, NEC 23, Fujitsu 17, Hitachi 16, Compaq 11, HP 5, self-made 5, Intel 1, HPTi 1.]
SX-5 Series / SX-5S (HPC Server) Products

[Chart: product lineup by peak performance, from 1 GFLOPS to 4 TFLOPS. Single-node SX-5 models (4-8 GFLOPS per CPU):
• Model A: 64-128 GFLOPS
• Model B: 32-64 GFLOPS
• Model C: 16-32 GFLOPS
• Model D: 8-16 GFLOPS
Multi-node configurations, coupled via IXS or a HIPPI switch, scale beyond 1 TFLOPS toward the 4 TFLOPS peak. The HPC server SX-5S covers the 4-8 GFLOPS range.]
...what you pay for: Performance of Mission-Critical NWP Codes on the SX Series
SX Series in Meteorology / Environmental Science

SX Series at major meteorological institutions worldwide:

Europe:
• Danish Meteorological Institute (DMI)
• Institute for Atmospheric Physics, Germany (IAP)
• Deutsches Klimarechenzentrum (DKRZ)
• Czech Hydrometeorological Institute (CHMI)
• Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University (ICMW)
• Istituto Nazionale di Geofisica e Vulcanologia (INGV)
• Swiss Center for Scientific Computing (CSCS)
North America:
• IRI / Lamont-Doherty
• Atmospheric Environment Service (AES)
Asia:
• Korea Meteorological Administration (KMA)
• Meteorological Service Singapore (MSS)
Japan:
• National Institute for Environmental Studies (NIES)
• Japan Marine Science and Technology Center (JAMSTEC) / Frontier Research System for Global Change
South America:
• Instituto Nacional de Pesquisas Espaciais (INPE)
Australia:
• Bureau of Meteorology (BOM) / CSIRO
Real-Time Visual Simulation Library: RVSLIB

Image-based visualization tailored to the large data volumes produced by numerical simulations and observations.
http://www.sw.nec.co.jp/APSOFT/SX/rvslib_e/
Challenges in Visualizing a Large Volume of Data

A typical CFD solver and an excerpt of its raw numerical output:

      program cfd
c
      implicit real*8 (a-h,o-z)
      parameter ( maxi=81, maxj=41, maxk=5 )
      parameter ( maxgrd=maxi*maxj*maxk, maxobj=101
     &          , maxiwk=512*512*15, maxrwk=maxgrd*61 )
      integer irvslibstate
c
c-- permanent arrays --
      dimension x(maxgrd), y(maxgrd), z(maxgrd)
     &        , scal(maxgrd*5)
     &        , iobj(maxobj*6), rwork(maxrwk), iwork(maxiwk)
     &        , iobj2(maxobj*6)
      :
      1.199937909288815  -0.1175956774159311   3.017484603229200D-04  0.3024392297917339
      1.219247451290822  -0.1220634853191233   2.453941548883239D-04  0.2809288930730908
      1.238643912752106  -0.1256990939733991   1.843648193366380D-04  0.2589568927816136
      :

Moving such output from the computing server to the user's terminal over the Internet raises three problems:
• Data transfer bottleneck: transferring GB-order data over a network is next to impossible. Ex.: at an effective rate of 1 MB/sec, 100 GB / (1 MB/sec) ≈ 28 h.
• Memory capacity: loading into a post-processor a data volume that was produced by a supercomputer may be difficult on the user's terminal.
• Disk space: storing all the computational results for each parameter set needs more than several GBytes. Ex.: 100 × 100 × 100 grid points × 10000 time steps, with 5 variables at each grid point, comes to 200 GBytes at 4 bytes per value.
Intensive Needs for Grasping Simulated Results on the Fly

• Memory capacity
  - NWP codes hold 100-200 array elements per grid point, and demand grows with model resolution and complexity
  - A T319L50 model (40-km mesh) requires 20-40 GBytes (see the rough cross-check below)
  - Ensemble forecasting with 50 members: well over 1 TByte
  - Data assimilation and chemical models are even more demanding
  - Climate codes: a 1-year simulation takes 30-60 GBytes (T213L50) and 2-4 TBytes (T1280L100)
• Disk space
  - NCAR: empirically 114 Bytes per MFLOP; 5 TBytes/month net growth* (*RCI Workshop, April 2000)
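As a rough cross-check of the 20-40 GBytes figure (a back-of-envelope sketch only; the 960 × 480 Gaussian grid assumed here for T319 is not stated on the slide):

      960 × 480 grid points × 50 levels × 150 variables × 8 Bytes ≈ 27.6 GBytes

This falls inside the quoted range, and 50 ensemble members would then need roughly 1.4 TBytes, consistent with the "well over 1 TByte" estimate.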
Approach A: Conventional Post-Processing (Vis5D, GrADS, and many off-the-shelf packages)

Graphical mapping and rendering on the client side; the approach adopted by many conventional post-processors.

Numerical simulation (computing server) --> mapping --> rendering --> image display (user's terminal)

• Advantages
  - Full exploitation of the server for number crunching and of local machine resources for graphical processing
• Drawbacks
  - Transferring a huge volume of (polygon) data across the network and manipulating it on the local machine is challenging
Approach B: Server-Side Visualization (the approach of NEC RVSLIB)

Both the mapping and rendering processes run on the server side.

Numerical simulation --> mapping --> rendering (computing server) --> image display (user's terminal)

• Advantages
  - Efficient use of the network, because image data (NOT massive polygon data) is transferred
  - Image compression techniques are available for further reduction of the data (see the worked numbers below)
• Drawbacks
  - Increased load on the server's computing and memory resources for the graphical mapping and rendering processes
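To put rough numbers on the bandwidth argument (illustrative assumptions, not figures from the slide): an uncompressed 800 × 600 true-color frame amounts to

      800 × 600 pixels × 3 Bytes ≈ 1.4 MBytes

and compression typically shrinks it by an order of magnitude or more; crucially, this size does not grow with the simulation grid. Polygon data for an isosurface extracted from a large grid, by contrast, can reach tens of MBytes per frame and scales up with resolution.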
RVSLIB: Real-Time Visual Simulation Library

• Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while the simulation continues
• Constant and reduced data transfer rate between server and client, regardless of the scale of the simulation

Computing server (supercomputer/workstation): the user's program (flow simulator etc.) calls RVSLIB; the RVSLIB server handles visualization, steering of the solver, and creation of images, driven by a scenario file and producing an animation file.
Terminal (workstation/PC): the RVSLIB client provides a GUI for image display, rendering control, animation, tracking and steering.
The two sides exchange compressed image data over the network (LAN/WAN).

RVSLIB/Server: SX, WS. RVSLIB/Client: PC, WS (Java).
--> Reduced cost and effort; efficient use of network bandwidth.
Usage of RVSLIB

• Interactive mode: tracking and steering of the user's code via the RVSLIB/Client GUI
  - UNIX version based on X/Motif
  - Java version for Windows / UNIX
• Client/server communication protocols
  - Intranet: TCP/IP socket
  - Internet/firewall: HTTP
  - Single machine: shared memory
• Batch mode: movie generation
  - Scenario script and off-line converter; movie in AVI or MPEG-2
• Calling sequence in the user's code (a minimal Fortran sketch follows below)
  - CALL RVS_INIT --> initialization: handshake with the client in interactive mode, loading a scenario in batch mode
  - Main loop body (time integration): CALL RVS_BFC and CALL RVS_MAIN --> data management (no data copy), rendering, client/server communication
  - CALL RVS_TERM --> termination
• Data interfaces with the GrADS and NetCDF formats are supported for post-processing
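A minimal sketch of how the calling sequence above might look in a user's solver. The RVS_* routine names are those on the slide, but every argument list here is a hypothetical placeholder, not the documented RVSLIB interface:

      program solver
c --- Minimal sketch only: the RVS_* argument lists below are
c --- hypothetical placeholders, not the actual RVSLIB API.
      implicit real*8 (a-h,o-z)
      parameter ( maxgrd=81*41*5, nsteps=10000 )
      dimension x(maxgrd), y(maxgrd), z(maxgrd), scal(maxgrd*5)
c
c Initialization: handshake with the client in interactive mode,
c or loading of a scenario file in batch mode.
      call rvs_init( ierr )
c
      do 10 it = 1, nsteps
c        ... advance the solution by one time step ...
c
c        Hand the grid and solution arrays to RVSLIB (no data copy);
c        mapping, rendering and client/server communication happen here.
         call rvs_bfc ( x, y, z, scal, ierr )
         call rvs_main( ierr )
   10 continue
c
c Termination: close the client connection / finalize the movie.
      call rvs_term( ierr )
      end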
Best Benefits Gained from RVSLIB

• Reduced cost and effort in trial-and-error work
  - Monitoring of an on-going simulation (tracking) and alteration of its parameters (steering) while the simulation continues
  - Conventional post-processing and batch-mode graphics are also available
• Efficient use of vector/parallel facilities and of the network
  - Efficient graphical processing and image creation capitalizing on vector/parallel computing capabilities
  - Reduced and almost constant network traffic, exploiting image data compression
• Animation based on a scenario
  - Easily navigable visualization following a plot described in a scenario file
• Library format tailored to a wide spectrum of simulation programs (BFC grid, FEM, multi-block grid, particle simulation, …)
Applications: Visualization of flow around a baseball
(Collaboration with the Physical & Chemical Research Institute, Japan)

• Computation: finite difference method; unsteady, incompressible, viscous flow
• Grid size: 169 × 92 × 101
• Reynolds number: 100,000-200,000
• Ball speed: 75-150 km/h
• Computation timing (10000 time steps):
  - Solver only (no visualization): 27,150 sec (7.54 h)
  - With visualization, fixed viewing: 28,000 sec (7.78 h)
  - With visualization, variable viewing: 28,150 sec (7.82 h)
  --> Almost no additional CPU time is required for visualization, thanks to high-speed visualization on the SX Series (see the overhead figures below)
# Computation on an SX-5S1 (4 GFLOPS)
# Visualization every 10 time steps (contours and tracers)
# Tracer movement calculated at each time step
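From the timings above, the visualization overhead works out to

      (28000 - 27150) / 27150 ≈ 3.1%  (fixed viewing)
      (28150 - 27150) / 27150 ≈ 3.7%  (variable viewing)

for rendering every 10th of the 10000 time steps.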
Post-Processing with RVSLIB (Collaboration with BoM, Australia)

On the server (an SX-4/32), the user's numerical weather prediction solver writes NetCDF-format files; the RVSLIB server visualizes them offline and sends compressed image data to the RVSLIB client. (For reference, a sketch of the standard netCDF Fortran interface follows below.)
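The slide does not show how the NetCDF files are produced; as background, dumping one field with the standard netCDF Fortran 77 API looks roughly like this (file name, variable names and sizes are illustrative, not from the BoM setup):

      program dumpnc
c --- Illustrative only: writes one relative-humidity field with the
c --- standard netCDF F77 API; names and sizes are made up.
      include 'netcdf.inc'
      integer ncid, dimids(3), varid, status
      real rh(100,100,50)
c     ... fill rh from the solver ...
      status = nf_create('rh.nc', NF_CLOBBER, ncid)
      status = nf_def_dim(ncid, 'lon', 100, dimids(1))
      status = nf_def_dim(ncid, 'lat', 100, dimids(2))
      status = nf_def_dim(ncid, 'lev',  50, dimids(3))
      status = nf_def_var(ncid, 'rh', NF_REAL, 3, dimids, varid)
      status = nf_enddef(ncid)
      status = nf_put_var_real(ncid, varid, rh)
      status = nf_close(ncid)
      end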
Bureau of Meteorology (Australia)
• Oceanic circulation simulated with ACOM2
• Atmospheric simulation: relative humidity around Australia represented by isosurfaces
On-going & Future Enhancements
• MPI-based performance optimization
• Hierarchical data structure, combined with wavelet transformation, for visualization of huge data
• Inter-server collaboration
• Active visualization
  - Automatic extraction of specific features from data
  - Visualization combined with data mining
Toward Global Computing Environments

One single machine never fits all...

Needs for Grid services:
• Collaboration tools, data management tools, distributed simulation, …
• Information services, resource control, fault detection, …
• Remote access, remote monitoring
Contact: sx-5@sxsmd.ho.nec.co.jp
SX-5 Series: http://www.sw.nec.co.jp/hpc/sx-e/index.html
RVSLIB: http://www.sw.nec.co.jp/APSOFT/SX/rvslib_e/