High Speed Supercomputer Communications in Broadband Networks
Ralph Niederberger, Research Center Jülich GmbH, R.Niederberger@fz-juelich.de
Helmut Grund, Ferdinand Hommes, Eva Pless, GMD - German National Research Center for Information Technology, Helmut.Grund@gmd.de, Ferdinand.Hommes@gmd.de, Eva.Pless@gmd.de
Outline
• Introduction
• GTB West
• Goals, Projects, Timeframes and Configuration
• Supercomputer Impediments and Solutions
• Status of Cray Supercomputer Communications
• Future Tests
• Summary
Introduction
• New kinds of microprocessors and larger internal storage are leading to new kinds of supercomputing systems, each best suited to different kinds of problems.
• The two best-known types of supercomputer are massively parallel systems and vector systems.
• A new kind of supercomputer is the metacomputer.
• A metacomputer distributes an application across two or more identical or different machines that are coupled dynamically via an external network.
• The distribution may be done by quality (functional distribution) or by quantity.
Introduction
Distributing an application across more than one system is recommended only if the computation time can be decreased significantly. This depends on the degree of parallelization and on the communication time between processes.
Communication time depends on:
• communication medium and protocol
• length of the communication link
• number of intermediate systems
• performance of the communicating systems (CPU, internal communication, ...)
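The trade-off described on this slide can be illustrated with a very simple timing model. This is a sketch under stated assumptions, not a formula from the talk: the function names, the even split of the parallel part across machines, and the single lumped communication term are all hypothetical.

```python
def distributed_time(t_single, parallel_fraction, n_machines, t_comm):
    """Estimated run time on n_machines (hypothetical model).

    Assumption: the serial part runs unchanged, the parallel part is
    split evenly, and all communication is lumped into t_comm.
    """
    serial = t_single * (1.0 - parallel_fraction)
    parallel = t_single * parallel_fraction / n_machines
    return serial + parallel + t_comm


def speedup(t_single, parallel_fraction, n_machines, t_comm):
    """Ratio of single-machine time to distributed time."""
    return t_single / distributed_time(
        t_single, parallel_fraction, n_machines, t_comm
    )


if __name__ == "__main__":
    # Illustrative numbers only: 100 s job, 90% parallel, 2 machines.
    for t_comm in (5.0, 50.0):
        s = speedup(100.0, 0.9, 2, t_comm)
        print(f"communication {t_comm:5.1f} s -> speedup {s:.2f}")
```

With these made-up numbers, a small communication cost still yields a speedup above 1, while a large one makes the distributed run slower than the single machine, which is the situation the slide warns about.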
GTB - West
Project sponsored by BMBF and DFN, with financial participation of the project partners.
Partners:
• Research Center Jülich GmbH, http://www.fz-juelich.de
• GMD - German National Research Center for Information Technology, http://www.gmd.de
• Deutsches Klimarechenzentrum, http://www.dkrz.de
• Alfred Wegener Inst. for Polar & Marine Res., http://www.awi.de
• Pallas GmbH, http://www.pallas.de
• o.tel.o, http://www.o-tel-o.de
Duration: Aug 1st, 1997 - Jan 31st, 2000
More info: http://www.fz-juelich.de/gigabit
GTB - West Projects
• Giganet - Configuration, Management and Performance Analysis of the Gigabit Testbed
• Methods and Tools, Software Support
• Solute Transport in Ground Water
• Algorithmic Analysis of Magnetoencephalography Data
• Complex Visualization over a Gigabit WAN
• Multimedia Applications in a Gigabit WAN
• Distributed Calculations of Climate and Weather Models
• Porting Parallel and Distributed Applications from the CEC CISPAR Project
GTB West - Goals
• Demonstrate the usefulness of high speed wide-area communication networks for scientific computing
• Engage in selected applications which are known to need very high communication bandwidth
• Major objective:
  • coupling of architecturally different supercomputers, i.e. vector computers and massively parallel computers, to build a new kind of metacomputer
  • strengthen the know-how in
    • high speed computer communications
    • metacomputing in LAN and WAN environments
    • coupling of the supercomputer centers in Germany
Status (Phase 1)
• Installation on top of o.tel.o high-tension cables
• most problems occurred on the last mile
  • at GMD, underground construction work was necessary
  • at Research Center Jülich, fiber cables were installed together with the hot water supply
• o.tel.o offers SDH infrastructure and uses Lucent Technologies equipment
• no major problems have been found using the o.tel.o trunk lines
Trunk lines (Phase 1)
Figure (not reproduced): trunk lines with a repeater.
Status
• 622 Mbit/s link stable for one year
• CRAY T3E supercomputer connected with 155 Mbit/s ATM
• FZJ - GMD link upgrade (622 Mbit/s to 2.4 Gbit/s): end of July 1998
• Aug. 5, 1998: first ATM-WAN connection with 2.4 Gbit/s user data (8 workstations with 155 / 622 Mbit/s interfaces)
  • 96.4% (TCP)
  • 99.97% (UDP) (high packet loss)
• Beta test of FORE ASX-4000 ended
• Beta test of HiPPI to ATM gateway (SUN and SGI) ongoing
• Throughput and delay measurements ongoing
• Monitoring and accounting of the trunk line with HP OpenView at GMD
IBM SP2 internals
Figure (not reproduced): SP2 nodes grouped in Frames 1 to 3 with three HP-Switches; a 10 Mbps internal Ethernet and a 10 Mbps external Ethernet; a HIPPI switch and an ATM switch leading to external machines. Link rates labelled in the figure: 10, 155, 622 and 800 Mbps.
Cray T3E internals
Figure (not reproduced): T3E processors in a 3D torus (groups of 4 processors) attached to GigaRings; communication nodes for FDDI, Ethernet (10 Mbps), HIPPI and ATM; an FDDI ring, an ATM switch and a HIPPI switch leading to external machines. Link rates labelled in the figure: 10, 100, 155 and 800 Mbps.
Cray T3E internals (2)
Figure (not reproduced): user PEs, support/OS PEs and device driver PEs; I/O controllers attached to GigaRings; node types D-MPN, HPN and MPN (MPN: Sbus system with 200 MB); network interfaces ATM, FDDI and Ethernet.
Impediments
Current problem: communication throughput within and between supercomputers differs extremely.
Example: Cray/T3E with an internal communication throughput of 500 MB/s bidirectional in three dimensions (3D torus)
High speed external connections:
• (Fast) Ethernet (10 - 100 Mb/s)
• FDDI (100 Mb/s)
• HiPPI (800 Mb/s - 1600 Mb/s)
• Super HiPPI (6400 Mb/s)
• ATM (155 Mb/s, 622 Mb/s - 2.4 Gb/s)
• Gigabit Ethernet (1 Gb/s)
Cray Systems Network Environment
Figure (not reproduced). Components shown: CRAY/T3E 512, CRAY/T3E 256, CRAY/T90, CRAY/J90 compute server, CRAY/J90 file server, Essential HiPPI EPS1004, FDDI concentrator, two Cisco routers, 155 Mb/s ATM, JuNet, world wide Internet.
Connecting a Cray system with n systems requires 2 * n PVC entries.
PVC configuration not recommended
High speed communication
Alternatives for communication between CRAY/T3E and IBM/SP2:
• raw HiPPI (800 Mb/s)
• HiPPI tunneling (622 Mb/s, currently MTU 9180)
• HiPPI SONET extender (currently 155 Mb/s or 932 Mb/s)
• TCP/IP via HiPPI (622 Mb/s, currently MTU 9180 because of routing)
• native ATM (155 Mb/s, 622 Mb/s) (Hardware?, Software?)
• TCP/IP via ATM (155 Mb/s, 622 Mb/s) (Hardware?)
Throughput considerations
• Transmission time in fiber optic cables: tt = length of medium / (0.66 * c), with c = 300,000 km/s; delays in routers, switches etc. come on top of this.
  ttopt = 100 km / (0.66 * 300,000 km/s) ≈ 1/2000 s = 0.5 ms
• Use path MTU discovery.
• Adapt socket buffers to the bandwidth-delay product:
  BDP = B * RTT = 622 Mb/s * 0.5 ms = 311 kb ≈ 40 kB (this uses the one-way time of 0.5 ms; with the full round trip of about 1 ms the product doubles to 622 kb ≈ 78 kB)
• Use setsockopt to set:
  • SO_SNDBUF and SO_RCVBUF: 1 MB
  • TCP_NODELAY=1 and TCP_WINSHIFT=4
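The arithmetic and the socket settings on this slide can be sketched in Python as follows. The function names are hypothetical, the 0.66 velocity factor and the 1 MB buffer size are taken from the slide, and the operating system may silently cap the requested buffer sizes. TCP_WINSHIFT is left out because it is a platform-specific option that the standard Python socket module does not define.

```python
import socket

C_KM_PER_S = 300_000      # speed of light, as used on the slide
VELOCITY_FACTOR = 0.66    # signal speed in fiber relative to c (from the slide)


def propagation_delay_s(length_km):
    """One-way propagation delay in seconds for a fiber of length_km."""
    return length_km / (VELOCITY_FACTOR * C_KM_PER_S)


def bdp_bytes(bandwidth_bit_per_s, delay_s):
    """Bandwidth-delay product in bytes for the given delay."""
    return bandwidth_bit_per_s * delay_s / 8


def tuned_socket(buffer_bytes=1 << 20):
    """TCP socket with 1 MB send/receive buffers and TCP_NODELAY set.

    Assumption: the kernel accepts (or caps) the requested buffer size.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, buffer_bytes)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, buffer_bytes)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


if __name__ == "__main__":
    one_way = propagation_delay_s(100)           # about 0.5 ms
    print(f"one-way delay: {one_way * 1e3:.3f} ms")
    print(f"BDP, one-way delay: {bdp_bytes(622e6, one_way):.0f} bytes")
    print(f"BDP, round trip:    {bdp_bytes(622e6, 2 * one_way):.0f} bytes")
```

Either product is far below the 1 MB buffer size named on the slide, so that setting leaves generous headroom for additional delays in switches and routers.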
Throughput considerations (2)
Supercomputer - Impediments
CRAY T3E communication throughput measured:
• Maximum of 115 Mb/s via TCP/IP over ATM (MTU 9180, the default MTU from the standard)
• Maximum of 430 Mb/s via TCP/IP over HiPPI (MTU 64 KB because of IP header fields)
• Maximum of 530 Mb/s via raw HiPPI (no real MTU limitation)
Netperf between SUN Ultra/60 and SGI Origin 200: maximum of 535 Mb/s user data via 622 Mb/s ATM
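To put these measurements in proportion, the short sketch below relates each measured maximum to the nominal rate of the interface. The measured values come from this slide; the nominal rates (155 Mb/s ATM on the T3E, 800 Mb/s HiPPI, 622 Mb/s ATM on the workstations) come from other slides of the talk, and pairing them this way is an assumption of the example.

```python
# (description, measured Mb/s, nominal Mb/s); pairing is an assumption
MEASUREMENTS = [
    ("T3E, TCP/IP over ATM", 115, 155),
    ("T3E, TCP/IP over HiPPI", 430, 800),
    ("T3E, raw HiPPI", 530, 800),
    ("SUN Ultra/60 to SGI Origin 200, ATM", 535, 622),
]


def utilization(measured, nominal):
    """Fraction of the nominal link rate that was actually achieved."""
    return measured / nominal


if __name__ == "__main__":
    for name, measured, nominal in MEASUREMENTS:
        pct = utilization(measured, nominal) * 100
        print(f"{name:40s} {measured:4d} / {nominal:4d} Mb/s = {pct:5.1f}%")
```

Under these assumptions the workstation pair comes closest to its nominal link rate, while the T3E stays well below the HiPPI rate even with raw HiPPI.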
Net topology GMD - FZJ
Figure (not reproduced). Components shown: IBM SP2, 8 x SUN Ultra 2 (2 processors), SUN Ultra 2 (2 processors), SUN Enterprise 4000, SUN Ultra 60 (2 processors), SGI O200, CRAY T3E (512 processors), CRAY T3E (256 processors), CRAY T90 (16 processors), two HIPPI switches, Fore ASX-1000 and Fore ASX-4000 switches, Cisco LS1010, Cisco A 100, and SUN and SGI hosts with atm-fore and atm-sun interfaces, at the sites GMD and FZJ.
Legend: 155 Mbit/s, 622 Mbit/s, 800 Mbit/s, 2.4 Gbit/s
Gigabit Testbed West
Network Layout
Figure (not reproduced). Labels shown: IBM/SP2 and CRAY/T3E, each on HiPPI 800 Mb/s with MTU 64 K; gateways SGI/SUN HiPPI/PCI and SUN HiPPI/Sbus; ATM 622 Mb/s with 64K MTU; ASX4000 switches at GMD and FZJ joined by 2.4 Gb/s ATM over 110 km; two Cisco routers on ATM 155 / 622 Mb/s with 9K MTU.
Gigabit Tests, July 30th, 1998
Figure (not reproduced). Labels shown: ATM switch with 2.4 Gbit interface at GMD; hosts filou (SUN Enterprise 5000) and baloo (SUN Ultra 60); link rates 2.4 Gbps and 622 Mbps; ATM/SDH.
Gigabit Speed Record, August 5th, 1998
Figure (not reproduced). Labels shown: ATM switches at FZJ and GMD joined by a 2.4 Gbps ATM/SDH link; attached interfaces labelled 622 Mbps, 3 * 622 Mbps and 155 Mbps.
Gigabit Testbed West
Connecting CRAY T3E and IBM SP2 via separate network
Problem: interrupt rate of CRAY/T3E systems
Solution: create two logical networks on one physical network
• network 1 with 64k MTU between gateway systems (exact MTU 65280), as specified for CRAY systems on HiPPI networks
• network 2 with MTU 9180 between directly connected ATM systems
Advantage: path MTU discovery on the end systems will find the maximum value to use.
MTU values labelled in the figure: 9180, 4356, 1500, 9180, 65280
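The effect of the MTU on the packet rate, and hence on the interrupt load, can be sketched as follows. The functions are hypothetical helpers, the 40 bytes of header overhead assume IPv4 and TCP headers without options, and treating the MTU labels from the figure as the hops of one path is purely illustrative.

```python
import math

HEADER_BYTES = 40  # assumption: 20-byte IPv4 + 20-byte TCP header, no options


def path_mtu(link_mtus):
    """Largest packet that fits every link: the minimum of the link MTUs."""
    if not link_mtus:
        raise ValueError("path needs at least one link")
    return min(link_mtus)


def packets_needed(payload_bytes, mtu):
    """Number of TCP segments needed to carry payload_bytes at this MTU."""
    return math.ceil(payload_bytes / (mtu - HEADER_BYTES))


if __name__ == "__main__":
    gateway_path = [65280, 65280, 65280]              # network 1 (hypothetical hops)
    mixed_path = [9180, 4356, 1500, 9180, 65280]      # labels from the figure
    for name, path in (("gateway", gateway_path), ("mixed", mixed_path)):
        mtu = path_mtu(path)
        n = packets_needed(1_000_000, mtu)
        print(f"{name}: path MTU {mtu}, {n} packets per 1 MB")
```

Fewer and larger packets mean fewer interrupts per transferred byte, which is why the 64k network between the gateway systems helps with the problem named on the slide.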
Status
CRAY HiPPI Testbed configuration
Figure (not reproduced). Components shown: CRAY/T90, CRAY/T3E 256, CRAY/T3E 512 and two CRAY/J90 systems, each with an HPN1; a 16-port HiPPI switch (ports 0 to 15) with parallel HiPPI card, serial HiPPI card and Ethernet module; SGI O200 and SUN Ultra 60; two Fore ASX4000 switches; hosts gmdsp2 and gmdsun.
Addresses shown: 134.94.72.1, 134.94.72.2, 134.94.72.3, 134.94.72.4, 134.94.72.5; 192.168.110.3, 192.168.110.36, 192.168.110.49; 192.168.115.5, 192.168.115.6, 192.168.115.9, 192.168.115.10, 192.168.115.25, 192.168.115.26; 192.168.116.3, 192.168.116.36, 192.168.116.49
Communication: nominal and real throughput
Figure (not reproduced). Path shown: CRAY T3E/256, CRAY T3E/512 and CRAY T90 at FZJ, HIPPI switch, H/A-router, ATM switch, ATM/SDH, ATM switch, H/A-router, HIPPI switch, IBM SP2 at GMD.
Nominal (Mbps, in the order shown): 800, 800, 622, 2400, 622, 800, 800
Real (Mbps, in the order shown): 430, 430, 530, 530, 530, 370, 370
Gigabit Testbed West
TCP-Gateway-Layout (Beta-Tests in Jülich)
Figure (not reproduced). Components shown: CRAY/T3E (256) and CRAY/T3E (512) on parallel HiPPI (800 Mb/s, MTU 64 K); a 16-port HiPPI switch with Ethernet module; SUN HiPPI/PCI and SGI HiPPI/PCI gateways on serial HiPPI (800 Mb/s, MTU 64 K); ATM 622 Mb/s with MTU 9180 or 64 K.
Throughput values labelled in the figure: 430 (direct), 340 (gate), 350 (direct), 270 (gate), 430, 370, 350, 315, 320, 380, 440, 250, 535, 415
Future Tests
CRAY HiPPI Testbed configuration
• Solve the HiPPI problem: using large MTU sizes (65280 bytes) does not work correctly
• Test the other Cray systems with the HiPPI to ATM gateway (T90, J90)
• Test different configurations if the testbed is available
  • using 2 HPN1
  • using 2 communication nodes within the CRAY/T3E
  • using one gateway for more than one machine
  • using the same HiPPI device for local and remote communication
  • using multiple HiPPI devices for higher throughput
Possible future test scenario
Figure (not reproduced). Labels shown: two sites, each with multiple HiPPI interfaces, a gateway and an ASX4000; ATM 622 Mb/s and 4 * ATM 622 Mb/s into the switches; ATM 2.4 Gb/s between them; IP over HiPPI on both sides and IP over ATM in between.
Internal communication: M1 ... Mm, N1 ... Nn
External communication: Mm-k+1, Mm-k+2, ..., Mm (multiplex of k HiPPI interfaces); Nn-j, Nn-j+1, ..., Nn (multiplex of j HiPPI interfaces)
Summary
• No problems left with the 2.4 Gbit/s ATM/SDH trunk line
• Workstation systems can generate and transfer data streams that saturate a 622 Mbit/s ATM link
• Coupling of supercomputer systems over high-bandwidth WANs is currently possible only with a HiPPI to ATM gateway solution and a special configuration
But the time is ripe for gigabit transmissions.
Summary
• Applications are capable of using gigabit networks.
• Metacomputing may become reality in LAN as well as in WAN environments.
• Therefore supercomputer system designers have to equip their systems with gigabit communication interfaces.
"The net is the computer and the computer is the net"
((SuperComputer) Communications) != (Super (ComputerCommunications))