Cluster Computers
Introduction
• Cluster computing
  • Standard PCs or workstations connected by a fast network
  • Good price/performance ratio
  • Exploit existing (idle) machines or use (new) dedicated machines
• Cluster computers versus supercomputers
  • Processing power is similar: based on microprocessors
  • Communication performance was the key difference
  • Modern networks (Myrinet, InfiniBand) have bridged this gap
Overview
• Cluster computers at our department
  • DAS-1: 128-node Pentium Pro / Myrinet cluster (gone)
  • DAS-2: 72-node dual-Pentium-III / Myrinet-2000 cluster
  • DAS-3: 85-node dual-core dual-Opteron / Myrinet-10G cluster
  • Part of a wide-area system: the Distributed ASCI Supercomputer
• Network interface protocols for Myrinet
  • Low-level systems software
  • Partly runs on the network interface card (firmware)
DAS-1 node configuration
• 200 MHz Pentium Pro
• 128 MB memory
• 2.5 GB disk
• Fast Ethernet, 100 Mbit/s
• Myrinet, 1.28 Gbit/s (full duplex)
• Operating system: Red Hat Linux
DAS-2 Cluster (2002-2006)
• 72 nodes, each with 2 CPUs (144 CPUs in total)
• 1 GHz Pentium III
• 1 GB memory per node
• 20 GB disk
• Fast Ethernet, 100 Mbit/s
• Myrinet-2000, 2 Gbit/s (crossbar)
• Operating system: Red Hat Linux
• Part of the wide-area DAS-2 system (5 clusters with 200 nodes in total)
(Figure: each node connects to both an Ethernet switch and a Myrinet switch)
DAS-3 Cluster (Sept. 2006)
• 85 nodes, each with 2 dual-core CPUs (340 cores in total)
• 2.4 GHz AMD Opterons (64-bit)
• 4 GB memory per node
• 250 GB disk
• Gigabit Ethernet
• Myrinet-10G, 10 Gb/s (crossbar)
• Operating system: Scientific Linux
• Part of the wide-area DAS-3 system (5 clusters with 263 nodes in total), using the SURFnet-6 optical network with 40-80 Gb/s wide-area links
DAS-3 networks
(Figure: each of the 85 compute nodes has a 1 Gb/s Ethernet link to a Nortel 5530 + 3 x 5510 Ethernet switch stack and a 10 Gb/s Myrinet link to a Myri-10G switch; 8 x 10 Gb/s fiber links connect the Ethernet switch to the Myrinet switch's 10 Gb/s Ethernet blade; the headnode provides 10 TB of mass storage; a Nortel OME 6500 with DWDM blade provides an 80 Gb/s DWDM uplink to SURFnet6 and a 1 or 10 Gb/s campus uplink)
DAS-1 Myrinet
Components:
• 8-port switches
• Network interface card for each node (on the PCI bus)
• Electrical cables: reliable links
Myrinet switches:
• 8 x 8 crossbar switch
• Each port connects to a node (network interface) or another switch
• Source-based, cut-through routing (see the sketch below)
• Less than 1 microsecond switching delay
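To make "source-based, cut-through routing" concrete, here is a minimal C sketch: the sender prepends one routing byte per switch on the path, and each switch strips its leading byte and forwards the rest of the packet on that output port. The header layout and the switch behavior shown here are illustrative assumptions, not the actual Myrinet wire format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_HOPS 8

/* A packet begins with one routing byte per switch on the path; each
 * 8x8 crossbar strips the leading byte and cuts the packet through to
 * that output port with less than 1 us of delay. Field layout here is
 * illustrative, not the real Myrinet header format. */
struct packet {
    uint8_t nhops;            /* routing bytes still in front */
    uint8_t route[MAX_HOPS];  /* output port to take at each hop */
    char    payload[32];
};

/* What one switch does conceptually (forwarding stubbed with printf). */
static void switch_forward(struct packet *p)
{
    uint8_t port = p->route[0];
    memmove(p->route, p->route + 1, --p->nhops);  /* consume our byte */
    printf("forwarding on output port %u\n", port);
}

int main(void)
{
    struct packet p = { .nhops = 2, .route = { 3, 5 } };
    strcpy(p.payload, "hello");
    while (p.nhops > 0)        /* simulate the two switches en route */
        switch_forward(&p);
    return 0;
}
```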
128-node DAS-1 cluster
• A ring topology would need:
  • 22 switches (6 PCs per 8-port switch, with 2 ports left for the ring)
  • Poor diameter: 11
  • Poor bisection width: 2
Topology of the 128-node cluster
• 4 x 8 grid with wrap-around (a torus)
• Each switch is connected to 4 other switches and 4 PCs
• 32 switches (128/4)
• Diameter: 6
• Bisection width: 8
(see the sanity check below)
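As a quick sanity check on the numbers in this slide and the previous one, a small C program using the standard formulas for the diameter and bisection width of a ring of s switches and of a k x m torus:

```c
#include <stdio.h>

/* A ring of s switches has diameter floor(s/2) and bisection width 2
 * (any cut into two halves crosses exactly two links). A k x m torus
 * ("grid with wrap-around") has diameter floor(k/2) + floor(m/2) and
 * bisection width 2 * min(k, m). */
int main(void)
{
    int s = 22;              /* ring: 128 PCs / 6 PC ports per 8-port switch */
    printf("ring:  diameter %d, bisection %d\n", s / 2, 2);

    int k = 4, m = 8;        /* DAS-1: 4 x 8 torus, 4 PCs per switch */
    printf("torus: diameter %d, bisection %d\n",
           k / 2 + m / 2, 2 * (k < m ? k : m));
    return 0;
}
```

The output (ring: diameter 11, bisection 2; torus: diameter 6, bisection 8) matches the figures on both slides.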
Myrinet interface board
Hardware:
• 40 MHz custom CPU (LANai 4.1)
• 1 MB SRAM
• 3 DMA engines (send, receive, to/from host)
• Full-duplex Myrinet link
• PCI bus interface
Software:
• LANai Control Program (LCP)
Network interface protocols for Myrinet
• Myrinet has a programmable network interface processor
  • Gives much flexibility to the protocol designer
• NI protocol: low-level software running on the NI and on the host
  • Used to implement higher-level programming languages and libraries
  • Critical for performance: want a few μs of latency and hundreds of MB/s of throughput
• Map the network interface (NI) into user space to avoid OS overhead (see the sketch below)
• Goal: give clusters the communication performance of supercomputers
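A minimal sketch of what mapping the NI into user space can look like on Linux, assuming the kernel driver exposes the NI's SRAM through a device file (/dev/myri0 is a hypothetical name). After the mmap, sends and receives are plain loads and stores, with no system call or interrupt on the critical path:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/myri0", O_RDWR);   /* hypothetical device file */
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 1 << 20;                  /* the NI board's 1 MB of SRAM */
    volatile uint32_t *ni = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
    if (ni == MAP_FAILED) { perror("mmap"); return 1; }

    /* e.g. write a send descriptor directly into the queue that the
     * LANai control program polls; no kernel involvement needed */
    ni[0] = 42;

    munmap((void *)ni, len);
    close(fd);
    return 0;
}
```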
Enhancements (see paper)
• Optimize throughput using programmed I/O instead of DMA
• Make communication reliable using flow control
• Combine polling and interrupts to reduce message-receipt overhead (see the sketch below)
• Efficient multicast communication in firmware
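A sketch of the polling/interrupt combination: spin on the NI status word for a bounded budget (cheap when a message is imminent), then fall back to blocking on an interrupt (cheap when it is not). ni_status, ni_enable_irq, and ni_wait_irq are assumed primitives, not a real Myrinet API:

```c
#include <stdint.h>

extern volatile uint32_t *ni_status;  /* set nonzero by NI firmware */
extern void ni_enable_irq(void);      /* ask NI to raise an interrupt */
extern void ni_wait_irq(void);        /* sleep until the interrupt fires */

#define POLL_BUDGET 1000

void wait_for_message(void)
{
    /* Phase 1: poll for a bounded spell; avoids interrupt cost when
     * the message arrives quickly. */
    for (int i = 0; i < POLL_BUDGET; i++)
        if (*ni_status)
            return;

    /* Phase 2: give up the CPU instead of spinning indefinitely. */
    ni_enable_irq();
    while (!*ni_status)
        ni_wait_irq();
}
```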
Multicast
• Implement a spanning-tree forwarding protocol on the NIs
  • Reduces forwarding latency
  • No interrupts on the hosts
(Figure: a four-node spanning tree, nodes 1-4; see the sketch below)
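A minimal sketch of spanning-tree forwarding as NI firmware might do it, using a binary tree over the node ranks: each NI re-forwards an arriving multicast packet to its children before handing it to the host, so interior nodes take no host interrupt. ni_send is an assumed primitive, stubbed here with a printf:

```c
#include <stdio.h>

static void ni_send(int dest) { printf("  forward to node %d\n", dest); }

/* Forward a multicast packet down a binary spanning tree rooted at
 * `root`, in a cluster of `nnodes` nodes. */
static void multicast_forward(int self, int root, int nnodes)
{
    int rank = (self - root + nnodes) % nnodes;  /* position in tree */
    for (int c = 1; c <= 2; c++) {               /* two children per node */
        int child = 2 * rank + c;
        if (child < nnodes)
            ni_send((child + root) % nnodes);
    }
    /* ...then DMA the packet up to the local host, without an interrupt */
}

int main(void)
{
    printf("node 0 (root):\n"); multicast_forward(0, 0, 8);
    printf("node 1:\n");        multicast_forward(1, 0, 8);
    return 0;
}
```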
Performance
• DAS-2:
  • 9.6 μs one-way null-latency
  • 168 MB/s throughput
• DAS-3:
  • 2.6 μs one-way null-latency
  • 950 MB/s throughput
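For context, a one-way null-latency figure like these is typically measured with a ping-pong microbenchmark: time many round trips of empty messages and halve the average. msg_send and msg_recv below are hypothetical stand-ins for the NI protocol's send and receive primitives, not a real API:

```c
#include <stdio.h>
#include <sys/time.h>

extern void msg_send(int peer, const void *buf, int len);  /* assumed */
extern void msg_recv(int peer, void *buf, int len);        /* assumed */

/* Average one-way latency in microseconds over `iters` ping-pongs. */
double one_way_latency_usec(int peer, int iters)
{
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for (int i = 0; i < iters; i++) {
        msg_send(peer, NULL, 0);   /* empty ("null") message out */
        msg_recv(peer, NULL, 0);   /* wait for the echo back */
    }
    gettimeofday(&t1, NULL);
    double usec = (t1.tv_sec - t0.tv_sec) * 1e6
                + (t1.tv_usec - t0.tv_usec);
    return usec / iters / 2.0;     /* halve the round-trip time */
}
```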
MareNostrum: the largest Myrinet cluster in the world
• IBM system at the Barcelona Supercomputing Center
• 4812 PowerPC 970 processors, 9.6 TB memory