Swiss-Tx Swiss-T1: A Commodity MPI computing solution
Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne
March 1999
Swiss-Tx Swiss-T1: A Commodity MPI computing solution
• Content:
  • Distributed Commodity HPC
  • Characterisation of machines and applications
  • Swiss-Tx project
March 2000
Swiss-Tx Past: SUPERCOMPUTER manufacturers
What happened:
• Cray Research: taken over by SGI
• Convex: taken over by HP
• Connection Machines: disappeared
• KSR: disappeared
• Intel Paragon: stopped supercomputing
• Japanese companies: still existing (not their main activity)
• Teracomputers: in development for 6 years
Why it happened:
• Produced their own processors
• Developed their own memory switches
• Needed special memories
• Developed their own operating system
• Developed their own compiler
• Special I/O: HW and SW
• Own communication system
July 1998
Swiss-Tx Processor performance evolution July 1998
Swiss-Tx SMP/NUMA
Manufacturer: parallel server
• DIGITAL: Wildfire
• SUN: Starfire
• IBM: SP-2
• HP: Exemplar
• SGI: Origin 2000
• …
Present situation:
• Off-the-shelf processors
• Off-the-shelf memory switches
• Off-the-shelf memories
• Special parts of operating system
• Special compiler extensions
• Special I/O and SW
• Own communication system
What is the trend?
July 1998
Swiss-Tx Commodity Computing (MPI/PCI)
• PC clusters/Linux: Fast Ethernet: Beowulf
• SOS cooperation (Alpha):
  • Myrinet/DS10: C-Plant (SNL)
  • T-Net/DS20: Swiss-T1 (EPFL)
• Customised commodity: Quadrics/ES40: Compaq/Sierra
• Off-the-shelf processors
• Off-the-shelf memory switches
• Off-the-shelf memories
• Off-the-shelf local I/O HW and SW
• Off-the-shelf operating systems
• Off-the-shelf compilers
• New communication system
• New distributed file/I/O system
March 2000
Swiss-Tx 4th SOS workshop on Distributed Commodity HPC
• Participants: SNL, ORNL, Swiss-Tx, LLNL, LANL, ANL, NASA, LBL, PSC, DOE, UNM, Syracuse, Compaq, IBM, Cray, Sun, SMEs
• Content: Vision, Clusters, Interconnects, Integration, OS, I/O, Applications, Usability, Crystal ball
March 2000
Swiss-Tx Distributed commodity HPC User’s Group
Goals:
• Characterise the machines
• Characterise the applications
• Match machines to applications
March 2000
Swiss-Tx Characterise processors, machines, and applications
Performance
• Processors: $V_{mac}$ = peak processor performance / peak memory bandwidth
• Parallel machines: $\gamma_{mac}$ = effective processor performance / effective network performance
• Applications: $\gamma_{app}$ = operation count / words to be sent
Swiss-Tx In a box: $V_{mac}$ values
$V_{mac} = R_\infty\,[\text{Mflop/s}] \,/\, M_\infty\,[\text{Mword/s}]$
Table: $V_{mac}$ values for Alpha 21164 and 21264 boxes and NEC SX-4
Machine              N    R∞ [Mflop/s]   M∞ [Mword/s]   V_mac
Alpha server 1200    2    2133           138            15
DS20                 2    2000           667            3
DS20+                2    2667           667            4
NEC SX-4             1    2000           2000           1
15 June 1998
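The $V_{mac}$ column can be reproduced from the two peak figures alone. The Python sketch below does only that; the dictionary `BOXES` and the function `v_mac` are illustrative names, and the remark that a kernel needing one memory word per flop is limited to roughly $1/V_{mac}$ of peak is a standard bandwidth-bound argument, not a statement from the slide.

```python
# Sketch: V_mac = R_inf / M_inf for the boxes in the table above.
# R_inf is the peak performance of the box [Mflop/s], M_inf its peak memory
# bandwidth [Mword/s]; both figures are copied from the slide.

BOXES = {
    # name: (N processors, R_inf [Mflop/s], M_inf [Mword/s])
    "Alpha server 1200": (2, 2133, 138),
    "DS20": (2, 2000, 667),
    "DS20+": (2, 2667, 667),
    "NEC SX-4": (1, 2000, 2000),
}


def v_mac(r_inf: float, m_inf: float) -> float:
    """Peak flops the box can execute per word its memory can deliver."""
    return r_inf / m_inf


if __name__ == "__main__":
    for name, (n, r_inf, m_inf) in BOXES.items():
        # assumption: a kernel needing one memory word per flop reaches at most
        # about 1/V_mac of the peak performance
        print(f"{name:18s} N={n}  V_mac={v_mac(r_inf, m_inf):5.1f}")
```

For the Alpha server 1200 this gives 2133/138 ≈ 15.5, which the slide rounds to 15.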
Swiss-Tx Between boxes: $\gamma_{mac}$ value
$\gamma_{mac} = N \cdot R\,[\text{Mflop/s}] \cdot \langle d \rangle \,/\, C\,[\text{Mword/s}]$
Table: $\gamma_{mac}$ of different machines
Machine    Type       Nproc   Peak   Eff. perf.   Eff. BW   γ_mac
Gravitor   Beowulf    128     50     6.4*         0.064     100
Swiss-T1   T-Net      64      64     13           0.32      40
Swiss-T1   FE         64      64     13           0.032     400
Baby T1    C+PCI      12      12     2.4          0.072     30
Origin2K   NUMA/MPI   80      32     9            1         9
NEC SX4    vector     8       16     8            6.4       1.3
Effective performance measured with MATMULT (* estimated); effective bandwidth measured with point-to-point communication.
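In the table, $\gamma_{mac}$ is simply the effective performance divided by the effective bandwidth. The sketch below recomputes that ratio for each row as a consistency check; `MACHINES` and `gamma_mac` are illustrative names, and since the slide gives no units for the two measured columns, only their ratio is used.

```python
# Sketch: gamma_mac as the ratio of the two measured columns in the table
# (effective performance from MATMULT over effective point-to-point bandwidth).
# Column units are not stated on the slide; only their ratio matters here.

MACHINES = [
    # (machine, type, effective performance, effective bandwidth, printed gamma_mac)
    ("Gravitor", "Beowulf", 6.4, 0.064, 100),
    ("Swiss-T1", "T-Net", 13, 0.32, 40),
    ("Swiss-T1", "FE", 13, 0.032, 400),
    ("Baby T1", "C+PCI", 2.4, 0.072, 30),
    ("Origin2K", "NUMA/MPI", 9, 1, 9),
    ("NEC SX4", "vector", 8, 6.4, 1.3),
]


def gamma_mac(eff_perf: float, eff_bw: float) -> float:
    """Operations the machine performs per word it can communicate."""
    return eff_perf / eff_bw


if __name__ == "__main__":
    for name, kind, perf, bw, printed in MACHINES:
        print(f"{name:9s} {kind:9s} computed={gamma_mac(perf, bw):7.1f} printed={printed}")
```

The recomputed ratios agree with the printed column to within rounding, except Baby T1 (2.4/0.072 ≈ 33, printed as 30).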
Swiss-Tx The $\gamma_{app}$ value
$\gamma_{app}$ = operations / communicated words
• Material sciences (3D Fourier analysis): $\gamma_{app} \approx 50$. Beowulf insufficient, Swiss-T1 just about right.
• Crash analysis (3D non-linear FE): $\gamma_{app} > 1000$. Beowulf sufficient; latency?
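The verdicts above are consistent with a simple matching rule: a machine is adequate when the application performs at least as many operations per communicated word as the machine needs, i.e. $\gamma_{app} \ge \gamma_{mac}$. This rule is our reading of the slides, not a formula stated on them, and it ignores latency, which is exactly the open question raised for crash analysis. `suitable_machines` is a hypothetical helper; the $\gamma_{mac}$ values come from the table of different machines.

```python
# Sketch of a matching rule, assumed here rather than stated on the slide:
# a machine keeps its processors busy only if the application performs at least
# gamma_mac operations per communicated word, i.e. gamma_app >= gamma_mac.
# Latency is ignored.

GAMMA_MAC = {  # values from the gamma_mac table of different machines
    "Gravitor (Beowulf)": 100,
    "Swiss-T1 (T-Net)": 40,
    "Swiss-T1 (FE)": 400,
    "Origin2K": 9,
    "NEC SX4": 1.3,
}


def suitable_machines(gamma_app: float, gamma_mac: dict[str, float]) -> list[str]:
    """Return, sorted by name, the machines whose gamma_mac does not exceed gamma_app."""
    return sorted(name for name, g in gamma_mac.items() if g <= gamma_app)


if __name__ == "__main__":
    print("3D Fourier analysis (gamma_app ~ 50):", suitable_machines(50, GAMMA_MAC))
    print("3D non-linear FE (gamma_app > 1000):", suitable_machines(1000, GAMMA_MAC))
```

With $\gamma_{app} \approx 50$ the Beowulf ($\gamma_{mac} = 100$) is rejected while the T-Net Swiss-T1 ($\gamma_{mac} = 40$) just qualifies, matching the slide's assessment.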
Swiss-Tx The $\gamma_{app}$ value for Finite Elements
$\gamma_{app}$ = operations / communicated words
FE operations scale with:
• the number of volume nodes
• the square of the number of variables per node
• the number of non-zero matrix elements
• the number of operations per matrix element
FE communication scales with:
• the number of surface nodes
• the number of variables per node
Hence FE $\gamma_{app}$ scales with:
• the number of nodes in one direction
• the number of variables per node
• the number of non-zero matrix elements
• the number of operations per matrix element
• the number of surfaces
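The proportionalities above can be combined into one line for a cubic subdomain. As a sketch, write $n$ for the number of nodes in one direction, $m$ for the variables per node, $z$ for the non-zero matrix elements and $o$ for the operations per matrix element (symbols introduced here for illustration, not taken from the slide):

```latex
\text{Ops} \propto n^{3}\, m^{2}\, z\, o, \qquad
\text{Comm} \propto n^{2}\, m
\quad\Longrightarrow\quad
\gamma_{app} = \frac{\text{Ops}}{\text{Comm}} \propto n\, m\, z\, o .
```

This recovers the first four scaling factors listed for $\gamma_{app}$; the slide's last factor, the number of surfaces, is not modelled in this sketch. It is also consistent with the brick-problem statistics: more subdomains means fewer nodes per direction in each subdomain, hence a smaller $\gamma_{app}$.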
Swiss-Tx The $\gamma_{app}$ value
• Statistics for 3D brick problem (finite elements)
Subd.  Nodes  Interface nodes  Mflop/cycle  Mflop/cycle/proc  kB transfer/cycle  kB/cycle/proc  γ_app
1      5049   0                13.5         13.5              0.0                0.0            ∞
2      5202   153              13.5         6.8               7.2                3.6            15074
4      5508   459              13.5         3.4               21.5               5.4            5028
16     6366   1317             13.5         0.8               61.7               3.9            1755
32     6960   1911             13.6         0.4               89.6               2.8            1211
64     7572   2523             13.6         0.2               118.3              1.8            918
128    8796   3747             13.6         0.1               175.6              1.4            620
Table: Current day case, 4096 elements
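The $\gamma_{app}$ column follows from the Mflop/cycle and kB transfer/cycle columns once bytes are converted to words. The sketch below recomputes it; the word size of 8 bytes and 1 kB = 1000 bytes are assumptions (the slide does not state them), and `gamma_app` and `ROWS` are illustrative names.

```python
# Sketch: recompute the gamma_app column of the 3D brick table from two of its
# other columns. Assumptions not stated on the slide: a word is 8 bytes and
# 1 kB = 1000 bytes.

BYTES_PER_WORD = 8

ROWS = [
    # (subdomains, Mflop/cycle, kB transfer/cycle, gamma_app as printed)
    (2, 13.5, 7.2, 15074),
    (4, 13.5, 21.5, 5028),
    (16, 13.5, 61.7, 1755),
    (32, 13.6, 89.6, 1211),
    (64, 13.6, 118.3, 918),
    (128, 13.6, 175.6, 620),
]


def gamma_app(mflop_per_cycle: float, kb_per_cycle: float) -> float:
    """Operations per communicated word (infinite when nothing is exchanged)."""
    words = kb_per_cycle * 1000 / BYTES_PER_WORD
    if words == 0:
        return float("inf")
    return mflop_per_cycle * 1e6 / words


if __name__ == "__main__":
    for subd, mflop, kb, printed in ROWS:
        print(f"{subd:4d} subdomains: computed {gamma_app(mflop, kb):8.0f}, printed {printed}")
```

Under these assumptions every recomputed value lies within 1% of the printed one, and the single-subdomain row, with no transfer, gives ∞ as on the slide.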
Swiss-Tx Fat-tree/Crossbars 16x16
• N=8, P=8, N*P=64 PUs, X=12, BiW=32, L=64
March 2000
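The slide does not define its symbols. One reading that reproduces all the numbers is a two-level fat tree: N leaf crossbars, each with P ports to processing units and the other 16 − P ports going up; L counts the leaf-to-top links, X all crossbars, and BiW the links cut between two halves. This interpretation and the helper `two_level_fat_tree` are assumptions for illustration.

```python
# Sketch of a two-level fat tree built from 16x16 crossbars. Assumed reading of
# the slide's symbols: N leaf crossbars, P processing units (PUs) per leaf,
# X crossbars in total, L leaf-to-top links, BiW bisection width in links.

PORTS = 16


def two_level_fat_tree(n_leaves: int, pus_per_leaf: int) -> dict[str, int]:
    up_per_leaf = PORTS - pus_per_leaf       # remaining leaf ports go upwards
    links = n_leaves * up_per_leaf           # one link per uplink port
    if links % PORTS:
        raise ValueError("uplinks do not fill whole top-level crossbars")
    top = links // PORTS                     # top crossbars use all ports downwards
    return {
        "PUs": n_leaves * pus_per_leaf,
        "X": n_leaves + top,
        "L": links,
        # assumes each leaf spreads its uplinks evenly over the top crossbars,
        # which are split between the two halves: half of all uplinks are cut
        "BiW": links // 2,
    }


if __name__ == "__main__":
    print(two_level_fat_tree(8, 8))
```

With N = 8 and P = 8 this yields 4 top crossbars, so X = 12, L = 64 and BiW = 32, as on the slide.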
Swiss-Tx Circulant graphs/Crossbars 12x12
• K=2 (1/3): N=8, P=8, X=8, BiW=8, L=16
• K=3 (1/3/5): N=11, P=6, X=11, BiW=18, L=33
• K=4 (1/3/5/7): N=16, P=4, X=16, BiW=32, L=64
March 2000
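The three configurations can be reproduced if (1/3), (1/3/5) and (1/3/5/7) are read as the jumps of a circulant graph with one 12×12 crossbar per graph node: each jump costs two ports per crossbar, so P = 12 − 2K and L = N·K. BiW matches the number of links cut when the ring is split into two contiguous halves; this reading is an assumption, and that cut is only an upper bound on the true minimum bisection (for N = 11 the split {0, 1, 2, 6, 7} versus the rest cuts 16 links). All function names are illustrative.

```python
# Sketch: reproduce the circulant-graph figures for networks of 12x12 crossbars.
# Assumed reading of the slide (not spelled out there): one crossbar per graph
# node (X = N), crossbar i is linked to crossbars i +/- j (mod N) for every
# jump j, the ports left over take processing units (P), and BiW counts the
# links cut when the ring of crossbars is split into two contiguous halves.

PORTS = 12

CONFIGS = [
    # (K, N, jumps) as listed on the slide
    (2, 8, [1, 3]),
    (3, 11, [1, 3, 5]),
    (4, 16, [1, 3, 5, 7]),
]


def circulant_links(n: int, jumps: list[int]) -> set[frozenset[int]]:
    """Undirected crossbar-to-crossbar links of the circulant graph C_n(jumps)."""
    return {frozenset((i, (i + j) % n)) for i in range(n) for j in jumps}


def processors_per_crossbar(jumps: list[int]) -> int:
    """Ports left for PUs: each jump j < n/2 uses two ports (to i+j and i-j)."""
    return PORTS - 2 * len(jumps)


def half_ring_cut(n: int, links: set[frozenset[int]]) -> int:
    """Links with exactly one end among crossbars 0 .. n//2 - 1."""
    half = set(range(n // 2))
    return sum(1 for link in links if len(link & half) == 1)


if __name__ == "__main__":
    for k, n, jumps in CONFIGS:
        links = circulant_links(n, jumps)
        p = processors_per_crossbar(jumps)
        print(f"K={k}: N={n}, P={p}, PUs={n * p}, X={n}, "
              f"BiW={half_ring_cut(n, links)}, L={len(links)}")
```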
Swiss-Tx Fat-tree/Circulant graphs March 2000
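Swiss-Tx The Swiss-Tx machines
• Swiss-T0: installed 12.97 at EPFL; 8 processors; 8 Gflop/s peak; 2 GBytes memory; 64 GBytes disk; 1 TBytes archive**; Digital Unix; EasyNet bus, FE bus
• Swiss-T0(Dual)*: installed 10.98 at EPFL; 16 processors; 16 Gflop/s peak; 8 GBytes memory; 170 GBytes disk; no archive; Windows NT, Digital Unix; EasyNet bus, FE switch
• Baby T1*: installed 8.99 at EPFL (4.00: DGM); 16 processors; 16 Gflop/s peak; 8 GBytes memory; 170 GBytes disk; no archive; Tru64 Unix; crossbar 12x12, FE switch
• Swiss-T1: installed 1.00 at EPFL; 70 processors; 70 Gflop/s peak; 35 GBytes memory; 950 GBytes disk; 1 TBytes archive**; Tru64 Unix; crossbar 12x12, FE switch
• Swiss-T2: date ?, place ?; 504 processors; 1008 Gflop/s peak; 252 GBytes memory; 9000 GBytes disk; no archive; operating system not decided; crossbar 12x12, FE switch
* Baby T1 is an upgrade of T0(Dual)
** Archive ported from T0 to T1
September 1998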
Swiss-Tx Swiss-T1 March 2000
Swiss-Tx Swiss-T1 Components
• 32 computational DS20E
• 2 frontend DS20E
• 1 development DS20E
• 300 GB RAID disks
• 600 GB distributed disks
• 1 TB DLT archive
• Fast/Gigabit Ethernet
• Tru64/TruCluster Unix
• LSF, GRD/Codine
• Totalview, Paradyn
• MPICH/PVM
T-Net network technology:
• (8+1) 12x12 crossbars, 100 MB/s
• 32-bit PCI adapter: 75 MB/s (64-bit PCI adapter: 180 MB/s)
• Flexible, non-blocking
• Reliable
• Optimal routing
• Latency: FCI 5 µs, MPI 18 µs
• Monitoring system
• Remote control
• Up to 3 Tflop/s ($\gamma < 100$)
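A common first-order way to read the latency and bandwidth figures above is the model t(n) = t₀ + n/B for one point-to-point message of n bytes. The sketch below applies it to the MPI latency of 18 µs and the two PCI adapter bandwidths; the model itself, the use of the same latency for the 64-bit adapter, 1 MB = 10⁶ bytes, and the helper names are assumptions, not Swiss-T1 measurements.

```python
# Sketch of the usual first-order (latency + size/bandwidth) message-time model,
# fed with the Swiss-T1 figures: 18 us MPI latency, 75 MB/s through the 32-bit
# PCI adapter and 180 MB/s through the 64-bit one (same latency assumed).
# Assumption: 1 MB = 10**6 bytes; contention and protocol switches are ignored.

MPI_LATENCY_S = 18e-6
BW_32BIT_PCI = 75e6   # bytes/s
BW_64BIT_PCI = 180e6  # bytes/s


def transfer_time(nbytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Time in seconds to send one message of nbytes point to point."""
    return latency_s + nbytes / bandwidth_bps


def half_bandwidth_length(latency_s: float, bandwidth_bps: float) -> float:
    """Message size in bytes at which latency and transfer time are equal,
    i.e. where half of the asymptotic bandwidth is reached."""
    return latency_s * bandwidth_bps


if __name__ == "__main__":
    for label, bw in (("32-bit PCI", BW_32BIT_PCI), ("64-bit PCI", BW_64BIT_PCI)):
        n_half = half_bandwidth_length(MPI_LATENCY_S, bw)
        t_1mb = transfer_time(10**6, MPI_LATENCY_S, bw)
        print(f"{label}: n_1/2 = {n_half:.0f} bytes, 1 MB message = {t_1mb * 1e3:.2f} ms")
```

Under this model, messages shorter than about 1350 bytes (32-bit adapter) are dominated by the 18 µs latency, which is why latency matters for applications that exchange many small messages.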
Swiss-Tx Swiss-T1 Architecture March 2000
Swiss-Tx Swiss-T1 Routing table March 2000
Swiss-Tx Swiss-T1: Software in a Box
• *Digital Unix (Compaq): operating system in each box
• *F77/F90 (Compaq): Fortran compilers
• *HPF (Compaq): High Performance Fortran
• *C/C++ (Compaq): C and C++ compilers
• *DXML (Compaq): Digital math library in each box
• *MPI (Compaq): SMP message passing interface
• *Posix threads (Compaq): threading in a box
• *OpenMP (Compaq): multiprocessor usage in a box through directives
• *KAP-F (KAI): to parallelise a Fortran code in a multiprocessor box
• *KAP-C (KAI): to parallelise a C program in a multiprocessor box
March 2000
Swiss-Tx Swiss-T1: Software between Boxes
• *LSF (Platform Inc.): Load Sharing Facility for resource management
• *Totalview (Dolphin): parallel debugger
• *Paradyn (Madison/CSCS): profiler to help parallelise programs
• *MPI-1/FCI (SCS AG): message passing interface between boxes running over T-Net
• *MPICH (Argonne): message passing interface running over Fast Ethernet
• **PVM (UTK): parallel virtual machine running over Fast Ethernet
• *BLACS (UTK): Basic Linear Algebra Communication Subprograms
• *ScaLAPACK (UTK): linear algebra matrix solvers
• MPI I/O (SCS/LSP): message passing interface for I/O
• MONITOR (EPFL): monitoring of system parameters
• NAG (NAG): math library package
• Ensight (Ensight): 4D visualisation
• MEMCOM (SMR SA): data management system for distributed architectures
• Shmem (EPFL): Cray SHMEM interface for Swiss-Tx
March 2000
Swiss-Tx Baby T1 Architecture March 2000
Swiss-Tx Swiss-T1: Alternative network March 2000
Swiss-Tx Swiss-T2: K-Ring architecture March 2000
Swiss-Tx Create SwissTx Company
• Commercialise T-Net
• Commercialise dedicated machines
• Transfer know-how in parallel application technology
Swiss-Tx Between boxes: $\gamma_{mac}$ value
$\gamma_{mac} = N \cdot R\,[\text{Mflop/s}] \cdot \langle d \rangle \,/\, C\,[\text{Mword/s}]$
Table: The $\gamma_{mac}$ values for Swiss-T0, Swiss-T0(Dual) and Swiss-T1 for MATMUL
Machine               N      R∞ [Mflop/s]   %     N·R [Mflop/s]   C [Mword/s]   <d>    γ_mac
T0 (Bus)              8      8000           5*    400*            4*            1      100
T0(Dual) (Bus)        8×2    16533          6*    1000*           4*            1      250
Baby T1 (Switch)      6×2    12000          20*   2400*           90*           1      27
T1(local) (Switch)    4×2    8000           20*   1600*           60**          1      27
T1(global) (Switch)   32×2   64000          20*   12800*          400**         1.25   40
T1 (Fast Ethernet)    32×2   64000          20*   12800*          80**          1      160
* measured (SAXPY and Parkbench); ** expected
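In this table N·R is the sustained performance, i.e. the peak R∞ multiplied by the efficiency in percent, and $\gamma_{mac}$ then follows from the formula. The sketch below recomputes each row; the slide does not define ⟨d⟩, so it is carried along as a plain multiplicative factor, and `gamma_mac` and `TABLE` are illustrative names.

```python
# Sketch: gamma_mac = N*R * <d> / C, where N*R is the sustained performance
# (peak R_inf times the efficiency in percent) and C the communication
# bandwidth in Mword/s. <d> is not defined on the slide; it is used here only
# as a multiplicative factor.


def gamma_mac(r_inf_mflops: float, efficiency_pct: float,
              c_mwords: float, d: float = 1.0) -> float:
    sustained = r_inf_mflops * efficiency_pct / 100.0  # the N*R column [Mflop/s]
    return sustained * d / c_mwords


TABLE = [
    # (machine, R_inf [Mflop/s], %, C [Mword/s], <d>, printed gamma_mac)
    ("T0 (Bus)", 8000, 5, 4, 1, 100),
    ("T0(Dual) (Bus)", 16533, 6, 4, 1, 250),
    ("Baby T1 (Switch)", 12000, 20, 90, 1, 27),
    ("T1(local) (Switch)", 8000, 20, 60, 1, 27),
    ("T1(global) (Switch)", 64000, 20, 400, 1.25, 40),
    ("T1 (Fast Ethernet)", 64000, 20, 80, 1, 160),
]

if __name__ == "__main__":
    for name, r_inf, pct, c, d, printed in TABLE:
        print(f"{name:20s} computed={gamma_mac(r_inf, pct, c, d):6.1f} printed={printed}")
```

The recomputed values match the printed ones to within 2%; the small gaps (e.g. 248 versus 250 for T0(Dual)) come from the slide rounding N·R to 1000.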
Swiss-Tx Time Schedule
1st phase, 2nd phase; timeline marks: 1.1.98, 1.6.98, 1.1.99, 1.11.99, 1.1.00, 31.10.00
EasyNet bus based prototypes:
• Swiss-T0(Dual): 16 processors, Windows NT
• Swiss-T0(Dual): 16 processors, Digital Unix
T-Net switch based prototype/production machines:
• Baby T1: 12 processors, Digital Unix
• Swiss-T1: 68 processors, Digital Unix
• Swiss-T2: 504 processors, OS not defined
March 2000
Swiss-Tx Phase I: Machines installed
• Swiss-T0: 23 December 97 (accepted 25 May 98)
• Swiss-T0(Dual): 29 September 98 (accepted 11 Dec. 98 / NT)
• Swiss-T0(Dual): 29 September 98 (accepted 22 Jan. 99 / Unix)
• Swiss-T1 Baby: 19 August 99 (accepted 18 Oct. 99 / Unix)
• Swiss-T1: 21 Jan. 2000
March 2000
Swiss-Tx Swiss-T1 Node Architecture March 1999
Swiss-Tx 2nd Phase Swiss-Tx: The 8 WPs
• Managing Board: Michel Deville
• Technical Team: Ralf Gruber
• Management: Jean-Michel Lafourcade
• WP1: Hardware development (Roland Paul, SCS)
• WP2: Communication software development (Martin Frey, SCS)
• WP3: System and user environment (Michel Jaunin, SIC-EPFL)
• WP4: Data management issues (Roger Hersch, DI-EPFL)
• WP5: Applications (Ralf Gruber, CAPA/SIC-EPFL)
• WP6: Swiss-Tx concept (Pierre Kuonen, DI-EPFL)
• WP7: Management (Jean-Michel Lafourcade, CAPA/DGM-EPFL)
• WP8: SwissTx Spin-off Company (Jean-Michel Lafourcade, CAPA/DGM-EPFL)
March 2000
Swiss-Tx 2nd Phase Swiss-Tx: The MUSTs
• WP1: PCI adapter page table / 64-bit PCI adapter
• WP2: Dual processor FCI / Network monitoring / Shmem
• WP3: Management / Automatic SI / Monitoring / PE / Libraries
• WP4: MPI-I/O / Distributed file management
• WP5: Applications
• WP6: Swiss-Tx architecture / Autoparallelisation
• WP7: Management
• WP8: SwissTx Spin-off Company
March 2000