Cluster Management the IBM Way • Background • ScotGRID project • IBM Management Solutions • Technologies • Boot, Installation, Server, IBM x330 server, KVM, Console Server, RAID, VLAN • Procedures • putting the hardware together • installing RedHat HEP SYSMAN 26th Nov 2002
ScotGRID • http://www.scotgrid.ac.uk/ • JREI funded • ScotGRID is a £800k prototype two-site Tier 2 centre in Scotland for the analysis of data primarily from the ATLAS and LHCb experiments at the Large Hadron Collider and from other experiments. The centre currently consists of a 118 CPU Monte Carlo production facility run by the Glasgow PPE group and a 5TB datastore and associated high-performance server run by Edinburgh Parallel Computing Centre. HEP SYSMAN 26th Nov 2002
ScotGRID-Glasgow HEP SYSMAN 26th Nov 2002
IBM Management Suites • Xcat is a set of scripts, tables and goodies written to help install a Beowulf cluster the IBM Way http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246623.html?Open • CSM is an alternative suite of software from IBM to install and manage a cluster of PC servers. It seems to have its origin in the RS/6000/AIX world http://publib-b.boulder.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246601.html?Open • IBM seem to have other management software with titles including words like Tivoli and Director. HEP SYSMAN 26th Nov 2002
Booting - PXE • http://support.intel.com/support/network/adapter/pro100/bootagent/index.htm • http://syslinux.hackerdojo.com/pxe.php • ftp://download.intel.com//labs/manage/wfm/download/pxespec.pdf HEP SYSMAN 26th Nov 2002
PXE • PXE is Intel’s Preboot eXecution Environment and includes a dhcp/tftp loader in PROM on an Ethernet card. It serves much the same role that MOP did for diskless booting of VAXstations and DECstations. • The Xcat solution uses PXE to load PXELINUX - a network variant of SYSLINUX as used on RedHat CDs HEP SYSMAN 26th Nov 2002
PXE • PXELINUX can load a kernel either over the network or from the local disk according to a tftp-ed configuration file. • IBM suggest a boot order - floppy, CD, net, disk • In Xcat, network loading is mainly used to initiate kickstart installation and the local disk load is used for normal operation. • (Re)installation is just a matter of changing the configuration file. HEP SYSMAN 26th Nov 2002
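To make that mechanism concrete, here is a minimal sketch of the kind of tftp-served PXELINUX configuration file involved - the server name, paths and labels are illustrative assumptions, not ScotGRID's actual files:

    # pxelinux.cfg entry while a node is marked for (re)installation
    default ks
    label ks
      kernel vmlinuz                 # RedHat installer kernel fetched over tftp
      append initrd=initrd.img ks=nfs:masternode:/install/ks/node01.ks ksdevice=eth0

    # for normal operation the file is replaced by one that boots the local disk
    # default local
    # label local
    #   localboot 0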
Automated Linux installation • kickstart • http://www.lcfg.org/ • http://www.sisuite.org/ • http://www.openclustergroup.org/ HEP SYSMAN 26th Nov 2002
kickstart • RedHat’s scheme to automate installation • installation parameters live in a file, supplied as • a file on the bootnet floppy • an nfs accessible file named with the ip number • http ….. • possibility of scripts to customise the installation after the RedHat code has finished • Xcat calculates kickstart files for compute, head and storage nodes from local configuration tables and locally modified templates. HEP SYSMAN 26th Nov 2002
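As a rough illustration (not one of the Xcat-generated files - the server name, paths, password hash and partition sizes here are made up), a kickstart file of that era looks something like:

    install
    nfs --server masternode --dir /install/redhat   # where the RedHat tree lives (hypothetical path)
    lang en_GB
    keyboard uk
    network --bootproto dhcp
    timezone Europe/London
    rootpw --iscrypted $1$..........                # placeholder hash
    clearpart --all
    part / --size 4096
    part swap --size 512
    reboot

    %packages
    @ Base

    %post
    # site customisation runs here once RedHat's installer has finished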
Servers • Redundancy - fileservers have dual power supplies, RAID • error checking - ECC • reduce chances of undetected data corruption • CDF had an incident recently that proved to be undetected corruption of data in one system • mountable in 19 inch rack • Blades (like PCs as CAMAC Modules) are now supported by xcat • http://www.pc.ibm.com/uk/xseries/bladecenter.html HEP SYSMAN 26th Nov 2002
IBM x330 Server X330 server is 1U high ( 1U = 1¾ inch and a full rack is 42U ) HEP SYSMAN 26th Nov 2002
(Remote) Support Processors • Each system has a PowerPC Support Processor to power the main processor on/off, do monitoring,... • Support processors are daisy chained with RS485 to a card with serial and Ethernet connections • Remote access via Ethernet and serial with snmp, telnet, http and proprietary protocols HEP SYSMAN 26th Nov 2002
KVM Switch • in a box - ScotGRID has an 8 port KVM switch from Apex (www.apex.com) • integrated C2T - the IBM x330 compute nodes do not have conventional Keyboard, Video and Mouse connections - they have part of a KVM switch onboard and a pair of connectors to daisy-chain a rack’s worth of servers into a distributed KVM switch HEP SYSMAN 26th Nov 2002
KVM Switch box • On-screen control by pressing “PrintScreen” • >1 port for real K, V and M • Cascadeable in an organised manner HEP SYSMAN 26th Nov 2002
KVM Switch integrated An adapter cable connects keyboard, video and mouse to the first system and short interconnecting cables daisy-chain the rest of the rack’s worth of servers into a distributed KVM switch. Switching is effected by keyboard shortcuts or a button on the front of the server HEP SYSMAN 26th Nov 2002
Console Servers • conserver package http://www.conserver.com • Terminal Servers in reverse • IBM supplied 4 Equinox ELS16s • accept incoming telnet HEP SYSMAN 26th Nov 2002
Serial Line access • Terminal Server lines connect to COM1 • which talks to the support processor when the server is unbooted • and is /dev/ttyS0 when linux is running • Conserver establishes telnet sessions to all COM1 ports and multiplexes access out to multiple linux clients - presumably important so that nobody finds access to a console blocked by someone else’s session. HEP SYSMAN 26th Nov 2002
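In practice an admin then attaches to a node's console through the conserver client rather than telnetting to the terminal server directly; a minimal usage sketch, assuming the console is named node01 in the local conserver.cf:

    # attach to node01's serial console; conserver keeps its own session
    # to the terminal server port open and multiplexes the watchers onto it
    console node01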
Raid • remote management of the IBM ServeRAID controllers using RaidMan software • configuration • monitoring • the remote agent component seems to provoke crashes of kernel 2.4.9-34 HEP SYSMAN 26th Nov 2002
Switches • IBM supplied 3 x 48 port Cisco Catalyst Switches and 1 x 8 port Gigabit switch to interconnect the file servers, other switches and the Campus Backbone • The switches divide the ports between an Internet accessible VLAN and a private 10.0.0.0 VLAN for the compute nodes and management units • The management interface is via the serial port, telnet and http HEP SYSMAN 26th Nov 2002
Cisco Catalyst 3500XL Switch • Sockets on right take GBICs • ScotGRID-Glasgow has three 48 port switches and a single unit with 8 GBIC sockets • Separate VLANs for private 10.0.0.0 and Internet. ( GBIC = GigaBit Interface Converter - a Cisco gigabit transceiver to various fibres and 1000baseT ) HEP SYSMAN 26th Nov 2002
ScotGRID-Glasgow • Masternode and Fileserver nodes on left • Compute Nodes in middle and right with headnodes low down • Daisy chained KVM and Support Proc network • Cascaded starwired Ethernet and serial lines HEP SYSMAN 26th Nov 2002
Putting it together • network diagram: Masternode, Storage Nodes and Head Nodes, Campus Backbone, Internet VLAN, private 10.0.0.0 VLAN, Compute Nodes, with 100 Mbps and 1000 Mbps links HEP SYSMAN 26th Nov 2002
Compute Nodes • block diagram: 2 Pentium III processors, eth0 and eth1 ethernet, Keyboard/Video/Mouse with Console in/Console out, Support Processor on COM1, Remote Support Adapter, serial and RS485 in/RS485 out HEP SYSMAN 26th Nov 2002
Software • Configure VLANs • Install RedHat and xcat manually on masternode • Tabulate topology • Update BIOS • Configure Terminal Servers HEP SYSMAN 26th Nov 2002
VLANs • Cisco boxes use a serial port to start configuration • Use a CISCO flavoured RJ45 <-> 9 Pin D type adapter • cu -l /dev/ttyS0 -s 9600 • set ip address; enable telnet,... • ongoing management via telnet/tftp/http HEP SYSMAN 26th Nov 2002
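A hedged sketch of the sort of IOS dialogue this involves - the addresses, VLAN numbers and port choice below are invented for illustration, not the ScotGRID configuration:

    enable
    configure terminal
     interface vlan 1
      ip address 10.0.0.251 255.0.0.0    ! management address, reachable for telnet/http
     line vty 0 4
      password something-secret          ! allow telnet logins
      login
     interface FastEthernet0/1
      switchport access vlan 2           ! move a port onto the other VLAN
     end
    copy running-config startup-config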
RedHat • Install everything off CD + errata • Failed to spot the unconfigured mailman rpm - eventually exhausted inodes with 95,671 empty error logs • /etc/syslog.conf:
# 1/10/02 Trick learned from IBM - Ctl Alt F12 displays /dev/tty12
*.info;mail.none;news.none;authpriv.none /dev/tty12
HEP SYSMAN 26th Nov 2002
Tables • Tables describe the connections, ip networking parameters, selectable options
Cisco3500.tab
storage1 cisco1,1
storage2 cisco1,2
storage3 cisco1,3
node01 cisco2,1
node02 cisco2,2
…………………
# file of Global Settings for ScotGRID cluster 21/5/02 DJM
#
# r-series commands pathnames - actually ssh equivalents
rsh /usr/bin/ssh
rcp /usr/bin/scp
#
# Absolute location of the SSH Global Known Hosts file that contains the ssh public keys for all the nodes
gkhfile /usr/local/xcat/etc/gkh
#
# Directory used by the tftp daemon to retrieve 2nd stage boot loaders
………………………….
HEP SYSMAN 26th Nov 2002
BIOS • Boot floppy and use CD in every box HEP SYSMAN 26th Nov 2002
Terminal Servers • Use serial port to start configuration • Use Equinox flavoured RJ45 <-> 9 Pin D type adapter • cu -l /dev/ttyS0 -s 9600 • Setup script over /dev/ttyS0 • Ongoing management via telnet HEP SYSMAN 26th Nov 2002
Harvest MAC Addresses • cause compute nodes to remotely boot a kernel and application that send packets out of the ethernet ports and emit MAC addresses on COM1 • collect MAC addresses from the serial lines or via the Cisco management interface • use the topology tables and MAC addresses to calculate dhcpd.conf HEP SYSMAN 26th Nov 2002
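The end product is an ordinary ISC dhcpd.conf; a minimal sketch of one generated host entry, with the MAC and addresses invented for illustration:

    subnet 10.0.0.0 netmask 255.0.0.0 {
      next-server 10.0.0.1;                    # tftp server holding pxelinux.0
      filename "pxelinux.0";
      host node01 {
        hardware ethernet 00:02:55:12:34:56;   # harvested MAC (hypothetical)
        fixed-address 10.0.0.101;
        option host-name "node01";
      }
    }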
Configure Remote Support Adapters • The RSA Ethernet <-> RS485 boxes can live in their own box or in a spare PCI slot - as in ScotGRID-Glasgow • boot a utility in the hosting server and set the ip address • an xcat script uses the tables to locate and configure the RSA over ethernet HEP SYSMAN 26th Nov 2002
Compute Node Installation • calculate the kickstart file • reboot the compute node - press the button or use the xcat command: rpower noderange boot • the last stage of the kickstart post-installation script resets booting to the local disk HEP SYSMAN 26th Nov 2002
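Putting those steps together, a reinstall of one node looks roughly like the following sketch - the pxelinux.cfg file name follows PXELINUX's hex-of-IP convention, and the plain copy stands in for whichever xcat command actually flips the boot target:

    # point the node's PXELINUX config at the network/kickstart entry
    # (0A000065 is 10.0.0.101 in hex; the template name is assumed)
    cp /tftpboot/pxelinux.cfg/install.tmpl /tftpboot/pxelinux.cfg/0A000065

    # power-cycle the node so PXE picks up the new config
    rpower node01 boot

    # the kickstart %post script then rewrites the config to boot from local disk,
    # so the next reboot comes up from the freshly installed system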
Batch • OpenPBS http://www.openpbs.org • MAUI scheduler http://www.supercluster.org • plugin scheduler for PBS/WIKI/Loadleveler/Sun’s Grid Engine • able to organise parallel jobs using >1 cpu, perhaps slight overkill for us • lots of control - per user, per group, per job length, target fair share, time waiting,……... HEP SYSMAN 26th Nov 2002
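For the user's side of this, a minimal OpenPBS job script looks something like the sketch below - the job name, queue, limits and program are invented for illustration. It is submitted with qsub, and MAUI decides when it runs according to the configured fair-share and priority settings:

    #!/bin/sh
    #PBS -N mc-production          # job name
    #PBS -l nodes=1:ppn=2          # one dual-CPU compute node
    #PBS -l walltime=12:00:00      # wall clock limit
    #PBS -q workq                  # queue name is site dependent
    cd $PBS_O_WORKDIR              # start where the job was submitted from
    ./run_simulation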
Ongoing Management • Snmp, mainly from the support processors, via syslog to warning emails • xcat rcommands • psh noderange command - parallel ssh • party trick: psh compute eject HEP SYSMAN 26th Nov 2002
rvitals
[root@masternode martin]# rvitals node08 all
node08: CPU 1 Temperature = 21.0 C (69.8 F)
node08: CPU 2 Temperature = 21.0 C (69.8 F)
node08: hard shutdown: 85.0 C (185.0 F)
node08: soft shutdown: 80.0 C (176.0 F)
node08: warning: 75.0 C (167.0 F)
node08: warning reset: 63.0 C (145.4 F)
node08: DASD 1 Temperature not available.
node08: Ambient Temperature = 16.0 C (60.8 F)
node08: System Board 5V: 5.06
node08: System Board 3V: 3.29
node08: System Board 12V: 11.85
node08: System Board 2.5V: 2.63
node08: VRM1: 1.78
node08: VRM2: 1.78
node08: Fan 1: 70%
node08: Fan 2: 72%
node08: Fan 3: 78%
node08: Fan 4: 78%
node08: Fan 5: 76%
node08: Fan 6: 75%
node08: Power is on.
node08: System uptime = 3556
node08: The number of system restarts = 139
node08: System State: Currently booting the OS, or no transition was reported.
HEP SYSMAN 26th Nov 2002