Planning and building a Linux-based cluster for NWP Climatological Research Institute (CRI Cluster) Dr. Jamali Chezgi
Outline • Introduction • Our problem • Our solution • Building the CRI Cluster • Monitoring and controlling • Benchmarking • Future plans • References
Introduction [Diagram: Simulation stands alongside Theory and Experiment as a way of studying Nature]
Environment / Climate / Weather • Aeronautics and space exploration • Energy research • Virtual reality • Scientific visualization • Health sciences
The NWP workflow: Make observations → Collect and process data → Run the forecast model → Create products → Provide them to end users (a scripted sketch of one cycle follows).
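In operations this chain is typically driven by a single scheduled script. A minimal sketch in Bash; every program name and path below is a placeholder, not a real tool:

#!/bin/bash
# one forecast cycle (all names below are placeholders, not real tools)
get_observations /data/obs/latest       # make observations
decode_and_qc /data/obs/latest          # collect and process the data
run_forecast_model arps40.input         # run the forecast model
plot_products /data/out                 # create products
publish_products /data/out              # provide them to end users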
Main issues • Very large data sets • Distributed data • High processing requirements • Need for real-time processing • Coupled models
Our problems • Data management • Lisa • NWP models • ARPS • MM5 • HRM • Climatological models • NCM
NWP models • ARPS • MM5 • HRM
ARPS • Advanced Regional Prediction System • Open source • Parallel code • Runs on all Unix platforms: • IBM RS/6000 workstations • Cray C-90 • Cray T3D • Cray J90 • CM-5 • PC Linux
ARPS Model Process Flow Chart
• ARPSTERN (terrain data preprocessor): reads the indexed terrain elevation file (1°, 5-min, or 30-sec); control file arpstern.input
• ARPSSFC (surface characteristics data preprocessor): reads soil, vegetation type, and other land-use data
• EXT2ARPS (gridded data interpolator): reads user-supplied gridded data (e.g. OLAPS, NMC analyses); control file arps40.input
• ARPS Data Assimilation System: takes rawinsondes, VAD, and wind profilers
• ARPSRETRV (Doppler radar data retrieval system): takes Doppler radar data
• ARPS Analysis System: takes single-level data
• ARPS (main model driver): control file arps40.input
• ARPSCVT (history data format converter): control file arpscvt.input
• ARPSPLT (vector graphics post-processor): control file arpsplt.input
• Other post-processing tools and visualization packages (Savi3D, AVS, etc.)
Our solution • Memory: use a bigger memory? • CPU: use a better CPU? • Cluster: scale both memory and CPU
Prebuilt clusters? Why build our own: • A direct relation between the technology and the end user • We can customize it for our users • We obtain the technology ourselves • We make better use of it • We can upgrade it • Lower costs • Sample clusters around the world
OU Cluster • Breakdown of nodes • 132 compute nodes (computing jobs) • 8 storage nodes (Parallel Virtual File System) • 2 head nodes (login, compile, debug, test) • 1 management node (PVFS control, batch queue) • Each node • 2 Pentium 4 Xeon DP CPUs (2 GHz, 512 KB L2 cache) • 2 GB RDRAM (400 MHz, 3.2 GB/sec) • Myrinet-2000 adapter
Cluster room • Space • Packing • Power • Air conditioning • Ease of repair • Security • Cabling
Linux • True multitasking • Virtual memory • Shared libraries • Demand loading • Shared copy-on-write executables • Proper memory management • TCP/IP networking • Up to 64 GB memory support on i386 • IP Virtual Server support: via NAT, tunneling, or direct routing • VLAN • Fast switching • Bonding driver (see the sketch below) • EQL • Runs on 386/486-based PCs, ARM, DEC Alpha, SUN SPARC, M68000, MIPS, PowerPC, …
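The bonding driver matters for cluster interconnects because it aggregates several NICs into one logical link. A minimal sketch for a 2.4-era system; the address and interface names are placeholders:

# load the bonding driver (mode 0 = round-robin, check links every 100 ms)
modprobe bonding mode=0 miimon=100
# give the logical interface an address, then enslave the physical NICs
ifconfig bond0 192.168.1.10 netmask 255.255.255.0 up
ifenslave bond0 eth0 eth1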
Communication protocols • Internet protocols • Low latency protocols • Active messages • Fast messages • VMMC • U-net • BIP
TCP/IP problems for clustering • High latency, which dominates for small packets • Limited bandwidth for big packets
Protocol overhead (the send path from a user process to the NIC):
1) The user process prepares data in user memory
2) A send interrupt traps into the OS
3) The OS copies the data into its internal buffers in OS memory
4) An interrupt triggers sending the data out
5) The data is handed to the NIC
A quick way to measure both costs on a real link is sketched below.
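To see the latency and bandwidth sides separately, standard tools suffice. A sketch assuming a node named node01 and that iperf is installed, with `iperf -s` already running on node01:

# round-trip latency with small packets (56-byte ICMP payload)
ping -c 100 -s 56 node01 | tail -1
# sustained TCP bandwidth with a large 30-second transfer
iperf -c node01 -t 30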
Cluster computing standards • VIA • A combination of many earlier protocols • Like U-Net, it uses a virtual network interface • Native and emulated versions • An emulated VIA implementation still outperforms TCP/IP • MPICH over VIA • InfiniBand • Backed by Compaq, Dell, HP, IBM, Intel, Microsoft, and Sun • Replaces shared I/O with a high-speed serial, channel-based, message-passing, scalable, switched fabric • Uses HCAs and TCAs to connect to the channel • Supports six transfer methods: reliable and unreliable connections and datagrams, multicast connections, and raw packets • Supports DMA • IPv6
Hardware products • Ethernet, Fast Ethernet, and Gigabit Ethernet • Giganet (cLAN) • Myrinet • QsNet • ServerNet • SCI (Scalable Coherent Interface) • ATM • Fibre Channel • HIPPI • Reflective Memory • ATOLL
Installing and configuring • Installing server • Building services • Auto installing clients • Auto configuring clients • Management of the nodes
NIS configuration on the server
1) Specify the domain name:
# domainname <DOMAIN_NAME>
2) Put it in /etc/sysconfig/network:
NISDOMAIN=<DOMAIN_NAME>
3) Specify the server name in /etc/yp.conf:
domain <DOMAIN_NAME> server <SERVER_NAME>
4) Restart the daemons:
# /etc/rc.d/init.d/ypserv restart
# /etc/rc.d/init.d/ypbind restart
5) Enable them at init
6) Edit /var/yp/Makefile:
• change MERGE_PASSWD=FALSE to TRUE
• change MERGE_GROUP=FALSE to TRUE
• delete netgrp from the all target
7) Build the NIS database:
# /usr/lib/yp/ypinit -m
8) After any future change, only run:
# cd /var/yp; make
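Steps 1-5 can be collected into one script. A minimal sketch for a Red Hat-style system; the domain name is a placeholder, and ypinit -m still prompts interactively for the host list:

#!/bin/bash
# one-shot NIS server setup (sketch; cri.local is a placeholder domain)
DOMAIN=cri.local
domainname $DOMAIN
echo "NISDOMAIN=$DOMAIN" >> /etc/sysconfig/network
/etc/rc.d/init.d/ypserv restart
/usr/lib/yp/ypinit -m          # build the initial maps (interactive)
/etc/rc.d/init.d/ypbind restart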
NIS configuration on the client
1) Specify the domain name:
# domainname <DOMAIN_NAME>
2) Put it in /etc/sysconfig/network:
NISDOMAIN=<DOMAIN_NAME>
3) Specify the server name in /etc/yp.conf:
domain <DOMAIN_NAME> server <SERVER_NAME>
4) Restart the daemon:
# /etc/rc.d/init.d/ypbind restart
5) Enable it at init
6) Test it by logging in with one of the server's users
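A quick way to verify the binding from a client, using the standard NIS query tools:

# which NIS server are we bound to?
ypwhich
# can we read the password map from the server?
ypcat passwd | head -3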
Monitoring and controlling 1) Scripts: Perl, Python, Bash (an example follows) 2) Prebuilt tools: Webmin, Scyld, SCD
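A minimal sketch of the script route in Bash, assuming passwordless SSH and compute nodes named node01 through node16 (the hostnames are placeholders):

#!/bin/bash
# report load average and memory use on every compute node
for i in $(seq -w 1 16); do
    node="node$i"
    echo "== $node =="
    ssh "$node" 'uptime; free -m | grep Mem' || echo "$node unreachable"
done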
Hardware monitoring and control (ICE Box) • Hardware-level management of the cluster • Monitors temperatures within nodes and remotely resets motherboards through internally placed probes • SNMP compliant • DHCP or static network configuration • NIMP (Network ICE Management Protocol) • SIMP (Serial ICE Management Protocol) • Out-of-band serial data buffering • Accessible with several protocols (NIMP, SIMP, null modem, Telnet, SNMP, ClusterWorX) • Remote monitoring of CPU temperatures • Remote power management • Power sequencing to start up nodes • Optional cabinet temperature monitoring (eight sensors per ICE Box) • Node reset • Multiple ICE Boxes scale to support large clusters • Embedded CPU powered by Linux for a stable run-time environment • Ability to easily and safely update the ICE Box operating system without cluster downtime
Security • SSH • PAM • Xinetd
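SSH on a cluster usually means passwordless logins for users and job scripts. A minimal sketch of setting that up; with home directories shared cluster-wide (e.g. over NFS with NIS accounts) this is done once per user:

# generate a key pair with an empty passphrase (sketch; ssh-agent is
# the safer alternative if the nodes are not fully trusted)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# authorize the key; a shared home directory makes it valid on every node
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys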
Running ARPS • Fortran 77 compiler (GNU) • Preprocessing data • BC and IC data from other models • Post-processing tools (NCARG) • Run flowchart (a job-script sketch follows this list): • Preprocessing (done once) • Splitting • Initializing • Boundary conditions • Running • Joining • Post-processing (on other computers)
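A job script tying the flow together might look like the following sketch; arps_mpi, splitfiles, and joinfiles are assumed names for the model binary and the split/join utilities, so substitute whatever names your ARPS build actually produces:

#!/bin/bash
# sketch of one cluster run of ARPS (binary names are assumptions)
./splitfiles < arps40.input                        # split IC/BC data into subdomains
mpirun -np 8 ./arps_mpi < arps40.input > arps.log  # run the model in parallel
./joinfiles < arps40.input                         # rejoin subdomain history files
# post-processing (ARPSPLT, ARPSCVT) then runs on other machines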
[Diagram: domain decompositions of 800×400, 200×200, and 800×800 grids]
Nested domains: 10 km → 3 km → 1 km. Grid computing? 1) Big domain: a low-resolution coarse domain with nested, better-resolution domains 2) In data assimilation, the code moves close to where the data resides
Benchmarking • ARPS results • GMandel • BPS
Performance Utilities • AIMS - instrumentors, monitoring library, and analysis tools • MPE logging library and Nupshot performance visualization tool • Pablo - monitoring library and analysis tools • Paradyn - dynamic instrumentation and run-time analysis tool • SvPablo - integrated instrumentor, monitoring library, and analysis tool • VAMPIRtrace monitoring library and VAMPIR performance visualization tool • VT - monitoring library and performance analysis and visualization tool for the IBM SP
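As one concrete example from this list, MPE logging with Nupshot fits an MPICH-based cluster. A rough sketch of the workflow; the -mpilog wrapper flag and the logfile name are assumptions about an MPICH 1.x-era installation:

# relink the application against the MPE logging library
# (-mpilog is the MPICH 1.x wrapper flag; check your installation)
mpif77 -mpilog -o arps_mpi *.f
# run normally; MPE writes a logfile when MPI_Finalize is called
mpirun -np 8 ./arps_mpi < arps40.input
# visualize the timeline (logfile name/extension depends on the MPE version)
nupshot arps_mpi.alog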
ARPS performance • Performance is better with a larger domain per CPU • Because of the network limitation of the cluster, we need more calculation per data transfer
Model situation • 200×200 grid points per processor • Prediction time = 60 s • Output = none • dtbig = 6 s • 1 km × 1 km × 500 m grid cells
All five runs ended normally at 60.000 seconds ("ARPS stopped normally in the main program"). CPU time per process in seconds, keeping 200×200 points per CPU; the 16-CPU run covers (200-3)*4+3 = 791, i.e. ~800×800 points in total:

Process            1 CPU     2 CPUs    4 CPUs    8 CPUs    16 CPUs
                  200x200   400x200   400x400   800x400   ~800x800
------------------------------------------------------------------
Initialization      7.60      7.63      7.62      7.58      7.62
Data output         8.29      8.23      8.27      8.27      8.20
Wind advection     19.07     19.06     19.13     19.05     19.13
Scalar advection   39.78     40.20     40.40     40.44     40.36
Buoyancy term       6.19      6.16      6.14      6.20      6.15
Small time steps  241.00    241.52    241.75    242.26    243.19
Turbulence         87.41     87.21     87.46     87.40     88.06
Comput. mixing     35.26     35.19     35.10     35.27     35.46
Rayleigh damping    2.71      2.76      2.74      2.72      2.76
TKE src terms      28.73     28.53     28.51     28.61     28.73
Bound. conditions   0.22      0.24      0.25      0.29      0.31
Warmrain microph   45.24     45.12     45.16     45.10     45.56
Miscellaneous      16.98     16.84     16.90     16.92     16.97
Entire model      541.82    542.00    542.85    543.51    545.87

Coriolis force, radiation, soil model, surface physics, grid-scale precipitation, Kuo cumulus, Kain-Fritsch, Lin and NEM ice microphysics, and hydrometeor fall were disabled (0.00 s in every run). Small time steps dominate at ~44.5% of the total throughout.
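Because the per-CPU subdomain is fixed, this is a weak-scaling test: the ideal total time is constant, and the measured efficiency (541.82/545.87) stays above 99% even at 16 CPUs. A sketch for pulling that bottom line out of a set of run logs; the log file names are hypothetical:

#!/bin/bash
# weak-scaling summary: total model time per run, and efficiency
# relative to the 1-CPU run (the log names are placeholders)
base=$(awk '/Entire model/ {sub("s","",$4); printf "%.2f", $4+0}' arps_1cpu.log)
for n in 1 2 4 8 16; do
    t=$(awk '/Entire model/ {sub("s","",$4); printf "%.2f", $4+0}' arps_${n}cpu.log)
    echo "$n cpu: ${t}s  efficiency: $(echo "scale=3; $base/$t" | bc)"
done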
Gmandel-PVM benchmark
Calculating with: x1=-0.760416667 y1=-0.354166667 x2=-0.614583333 y2=-0.208333333 limit=1000000
wall time = 97 s, MFLOPS = 19556.6
Calculating with: x1=-2.000000000 y1=-2.000000000 x2=2.000000000 y2=2.000000000 limit=1000000
wall time = 17 s, MFLOPS = 19461.0