Hardware • So you want to build a cluster. What do you need to buy? • Remember the definition of a Beowulf cluster: • • Commodity machines • • Private cluster network • • Open source software • Two of these are hardware-related. THE MOVES INSTITUTE
Racks • You need some place to keep all the hardware organized. Racks are the usual solution. The things you put into racks are measured in rack units, called “U”s. • The standard rack is 42 U’s tall. A U is a unit of vertical height, 1.75”. • Servers are often one or two U’s, i.e. they take up one or two slots in the rack.
Racks • It’s easy to get a cable mess in the back of the rack. Besides being hard to manage, a cable mess can block air flow. Be consistent about routing power and network cables. • Color-coded cables are an excellent idea; for example, make all private network cables one color and public network cables another. • Use zip ties to group cables. • Make sure you have enough space for air flow (typically front to back).
KVM • Keyboard-Video-Mouse. While usually you want to remotely manage the cluster, you’ll often wind up attaching a monitor to a box to watch it boot or troubleshoot. It’s good to have a lightweight flat panel display, keyboard, and mouse you can attach as needed.
CPU: 64 vs. 32 Bit • Isn’t 64 bits twice as good as 32 bits? Not really. • The 64 bits usually refers to the amount of memory that can be addressed. With 32 bits you can ‘count’ up to about 4 billion, which means a single process can’t address more than 4 GB of memory. Because of the way VM is set up, 2 GB is a more typical maximum address space for 32 bit processes. • A full 64 bit address can in principle reach 2^64 bytes (16 exabytes). Real CPUs implement fewer address bits; the AMD64 architecture, for example, allows up to 52 bits of physical address, about 4.5 petabytes. • This can be important if you have an application that breaks the 2 GB address space barrier. • This is plausible now that many machines can be configured with more than 4 GB of real memory.
64 vs. 32 bit • 64 bit CPUs have bigger registers (high speed holding areas) but this often isn’t all that big of a deal. • Almost every 64 bit CPU has a compatibility mode that allows 32 bit applications to be run • To run in 64 bit mode your application will need to be compiled for 64 bits and link to 64 bit libraries.
64 bit: The Dark Side • A pointer is just a piece of data that holds the address of something in memory • int count = 100; • int *aPtr = &count; • [Figure: memory drawn from address 0 up to 4 GB; count, holding 100, sits at address 0x2248, and aPtr holds the value 0x2248]
64 bit: The Dark Side • Since a pointer refers to a place in memory, it needs to be big enough to reference any place in the memory range. For 32 bit applications that is 4 bytes. For 64 bit applications that is 8 bytes, twice as big. • That means code and data that contain 64 bit pointers take up more space than the same code and data with 32 bit pointers. If you have a cache with a fixed size you can fit fewer 64 bit pointers into it. • [Figure: a 24 byte cache can hold six 32 bit pointers (p1 through p6) but only three 64 bit pointers (p1 through p3)]
64 bit: Dark Side • The extra space that 64 bit code takes up in caches can reduce performance by increasing cache misses. For this reason you probably shouldn’t compile to 64 bits unless your application requires a large address space • (In fact under OS X the GUI libraries are all 32 bit, even though the underlying hardware may be 64 bit. Compiling them to 64 bit would have reduced performance.)
Dual Core • CPU designers have been running up against a wall lately; they can’t increase clock speed as much as in the past. • As a result they are adding more silicon instead. A dual-core CPU puts two CPU cores, with their caches, on one chip. • This doesn’t increase sequential speed, but it can increase speed for multi-process or multi-threaded programs.
CPUs • This is a major religious issue. The major contenders in the HPC Beowulf cluster space are • • AMD Opteron (64/32 bit) • • Intel Itanium (64/32 bit) • • Intel Xeon MP (Dual-core, EM64T) • • Intel Pentium M (32 bit, low power) • • PowerPC G5/970 (64 bit, OS X usually)
CPUs • The Opteron is a strong choice. Good price/performance, good power consumption, good SMP capabilities • But some vendors (famously including Dell) don’t sell Opterons
SMP • Each node in the cluster can be an SMP machine, for example a dual-processor 1U box. • If the CPU is dual core this can give 4 cores per box • Four and eight CPU SMP boxes are also available. More cores per CPU are likely in the future.
Blades • Lots of people want extremely dense installations, with the most possible CPUs per square foot. • Blades are essentially hosts on plug-in cards. They’re inserted into a blade chassis.
Blades • The blades have an interconnect to the chassis and can share some resources such as the power supply, CD and floppy, network cables, etc. • They may implement features such as hot swap • The downside is that they can be somewhat more expensive than plain 1U rackmounts, may be more proprietary, and are somewhat less flexible. • Also, while they take up less floor space, they still generate a similar amount of heat • IBM is a major blade vendor
Memory • How much memory do you need on a compute node? • That depends on your application and current computer part economics. • Roughly 1/2 GB per GFLOP of speed seems to do OK. Opterons are at this writing roughly 3 GFLOP/processor, so about 3-4 GB per dual processor box. This may change with dual core CPUs. • You don’t want your application to page-fault. • If you’re running 64 bit that probably means you need an address space over 4 GB, so you may well need 4-8 GB or more of physical memory
Virtual Memory • Page faults happen when a page isn’t in working memory and has to be retrieved from disk. If you can fit the entire process into physical memory you can avoid page faults and speed up the process. • [Figure: the virtual process address space (often 4 GB) mapped onto the process working set in physical memory; the working set’s maximum size is the physical memory size]
Disk • There are several schools of thought about disk on compute nodes: • • Leave them diskless. • -Less heat, less to fail, single image on server • • Put a big enough disk in to swap locally • -Don’t have to swap/page across the network • • Put a big enough disk in to run the OS • -Don’t have to mess with net booting • • Put a big enough disk in to run the OS and keep some data cached locally • I favor the last method. Disk space is cheap, labor is expensive.
CD • If you do go the disk-per-compute node route you should have at least a CDROM drive in order to boot from CD and install. • OS bloat has made it likely that a DVD may be required in a few years.
Video • You should have at least an el-cheapo VGA card in order to hook up a monitor. • If you’re doing visualization work a high-end graphics card may be needed. Note that a fancy graphics card may consume more power and generate more heat.
Networking • The compute nodes have to communicate with each other and the front end over the private network. What network should be used for this? • We want: • • High speed • • Low latency • • Cheap • Major technologies are gigabit ethernet, Myrinet, and Infiniband
Networking • Myrinet is a 2 Gb/sec, 2-3 µs latency networking standard that uses multimode fiber. • NIC price (as of this writing) is about $500, and a 16 port switch is about $5K.
Infiniband • Infiniband is all-singing, all-dancing: 10 Gb/sec (for 4X rate), ~4 µs latency. • Pricing seems to be similar to that of Myrinet. HP prices are on the order of $1K for a PCI adapter and $10K for a 24 port switch. Other places are probably cheaper. • Often a cluster will have both an Infiniband and an ethernet network, the Infiniband for MPI and the ethernet for conventional communications.
Gigabit Ethernet • 1 Gb/sec, high latency (it goes through the TCP/IP protocol stack by default) • Advantage: dirt cheap, ubiquitous. Gigabit ethernet is built into most server motherboards by default, and unmanaged L2 gigabit switches are at approx. $10/port. • You can reuse existing expertise, cables, etc. • It’s pretty tough to argue against gigabit ethernet on an economics basis.
Front End • The front end is the cluster’s face to the rest of the world. It alone is connected to the public network. • It typically runs a web server, scheduling software, NFS, DHCP on the private network, a firewall, and a few other utilities • More memory than a compute node is good. More disk is good. (Disk is the subject of another talk.)
Price • Very roughly: • Compute nodes at $3K each (dual Opterons, 4 GB memory, 120 GB disk) • Front end at $4K • Rack at $2K • Small L2 GigE switch ($150) • For approx. $40K you can get a ~10 node cluster with ~20 CPUs that has a peak performance of around 30-50 GFLOPS. • (This pretty much ignores disk space.)