Hostnames used in CERN IT data centres AI forum 9th of January 2014 Procurement team IT CF/FPP
Outline
• Motivation
• Lessons from past
• Quattor to Puppet sanitization opportunity
• Possible workarounds with host aliases
What we want to achieve
• Three distinct and independent goals
  • Name is generated automatically by the node itself
  • Name is unique forever
    • Name was not used before and will not be used again
  • Name is useful for those who need to deal with hardware
    • On-site repair services at CERN & Wigner
    • Procurement team
Automated host naming
• Goal: name is generated automatically by the node itself when it is powered up for the first time
• It cannot:
  • depend on its location (room, rack, U)
  • relate to any service (e.g. batch) it may host in future
  • relate to a functional element (lxhwproc01..)
• It can be:
  • completely random, and/or
  • based on some local feature that is unique and can always be retrieved locally on the host, e.g. BMC Field Replaceable Unit (FRU) information, provided that:
    • it is not overwritten
    • the repair technician has an established procedure to transfer the original information if the BMC is changed
• Constructing the name from the MAC address or the nominally invariable s/n of a component (e.g. mainboard) is not a good idea, because these change when the component is replaced
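A minimal sketch of reading such a locally retrievable identifier, assuming the FRU data is available as "Key : Value" text of the kind printed by `ipmitool fru`. The helper name `parse_fru`, the field labels in the sample and the asset tag value are illustrative assumptions, not the team's tooling.

```ruby
# Parse "Key : Value" lines (as printed by tools such as `ipmitool fru`)
# into a hash. Field labels in SAMPLE are assumptions for illustration.
def parse_fru(text)
  text.each_line.each_with_object({}) do |line, fields|
    key, value = line.split(':', 2)
    next if value.nil?              # skip lines without a "key : value" pair
    fields[key.strip] = value.strip
  end
end

# Sample input: the serial is the example quoted on the slides,
# the asset tag is a hypothetical placeholder.
SAMPLE = <<~FRU
  Product Serial        : cd5151113-6n006225ts-2
  Product Asset Tag     : EXAMPLE-TAG
FRU

fru = parse_fru(SAMPLE)
puts fru['Product Serial']
```

On a real host the text would come from the BMC rather than from a constant, and the labels should be checked against the actual output.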
All new deliveries (>2012)
• Vendors required to
  • print custom labels with CERN order reference and a unique s/n
  • set BMC FRU 'Product serial' and 'Asset Tag'
[Figures: output from 'ipmitool fru'; sticker at rear of chassis]
• Unfortunately 'cd5151113-6n006225ts-2' is too long for NetBIOS (names are limited to 15 characters)
• Compromise: 'p05151113126469', where 'p' stands for physical and the last part is random
• Suggested by a CMS user: 'p05151113e26469', with a random character (skipping 'l', 'i', 'o' and 'z')
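The naming compromise can be sketched as follows. This is an illustration only: it assumes the name is 'p', followed by an 8-digit order number, one random letter and five random digits, which matches the examples on the slide but is not confirmed as the actual procedure. The function name `generate_hostname` and the way the order digits are passed in are hypothetical.

```ruby
require 'securerandom'

NETBIOS_MAX = 15
# Letters allowed for the random character: a-z without 'l', 'i', 'o', 'z'
RANDOM_LETTERS = ('a'..'z').to_a - %w[l i o z]

# Build a name such as 'p05151113e26469' from an 8-digit order number.
# The layout (1 letter + 5 digits as random part) is an assumption.
def generate_hostname(order_digits)
  unless order_digits =~ /\A\d{8}\z/
    raise ArgumentError, 'order number must be exactly 8 digits'
  end
  letter = RANDOM_LETTERS[SecureRandom.random_number(RANDOM_LETTERS.size)]
  digits = format('%05d', SecureRandom.random_number(100_000))
  name = "p#{order_digits}#{letter}#{digits}"
  raise 'name exceeds NetBIOS limit' if name.length > NETBIOS_MAX
  name
end

puts generate_hostname('05151113')
```

The random part alone does not guarantee uniqueness, so a generated name would still have to be checked against names already issued.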
Unique names
• Goal: every hostname is unique forever
• LANDB assures a registered name is unique at any point in time
• However, nothing prevents reuse of names of decommissioned hosts
  • For instance: the headnode of the 'pony' service is always called 'lxpony01', and the name is inherited when the hardware is replaced
• Most history records are keyed by (DNS) name
  • Name reuse can lead to serious confusion; as a result, historic data becomes unreliable and therefore useless for h/w problem analysis
  • Past events or incidents recorded in SNOW, Lemon, syslog, etc. may refer to different hardware
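The difference between "unique at any point in time" and "unique forever" can be illustrated with a small in-memory sketch. The class `NameRegistry` and its methods are hypothetical and do not describe LANDB; a real registry would need persistent storage.

```ruby
require 'set'

# Hypothetical registry that remembers every name ever issued,
# so a decommissioned host's name can never be handed out again.
class NameRegistry
  def initialize
    @ever_used = Set.new   # all names ever registered
    @active    = Set.new   # names of hosts currently in service
  end

  # Returns true if the name was accepted, false if it was ever used before.
  def register(name)
    key = name.downcase
    return false if @ever_used.include?(key)
    @ever_used << key
    @active << key
    true
  end

  # Retire a host: the name leaves service but stays reserved.
  def decommission(name)
    @active.delete(name.downcase)
  end

  def active?(name)
    @active.include?(name.downcase)
  end
end

registry = NameRegistry.new
registry.register('lxpony01')      # accepted
registry.decommission('lxpony01')  # hardware retired
puts registry.register('lxpony01') # prints false: the name is never reused
```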
Useful names
• Goal: name is useful for those who need to deal with hardware
• Purchase order as part of the name allows convenient grouping
  • Technician can immediately tell from the hostname which stock (out of 50+) to use for repairs
• Failure analysis: systematic hardware issues are often related to a delivery
  • E.g. firmware bug or defective component batch
• Example: p05153061300107 crashed during power re-cabling at Wigner
  • A search for p05153061* in SNOW and Lemon gives all other nodes affected in the same batch
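The grouping by purchase order can be sketched as below, assuming the order is encoded in the first nine characters of the name ('p' plus eight digits), as in the example. The helper names are hypothetical, and all hostnames in the list except p05153061300107 are made up for illustration.

```ruby
ORDER_PREFIX_LENGTH = 9  # 'p' plus 8 digits (assumed layout)

# Delivery (purchase order) part of a hostname.
def order_prefix(hostname)
  hostname[0, ORDER_PREFIX_LENGTH]
end

# All hostnames matching a shell-style pattern such as 'p05153061*'.
def same_delivery(hostnames, pattern)
  hostnames.select { |h| File.fnmatch(pattern, h) }
end

# Only the first name appears on the slides; the others are hypothetical.
hosts = %w[p05153061300107 p05153061412345 p05153065200001]

by_order = hosts.group_by { |h| order_prefix(h) }
puts by_order.keys.inspect
puts same_delivery(hosts, 'p05153061*').inspect
```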
Failure analysis example
[Figures: Metric 6104: IPMI SEL Log; Metric 9001: uptime]
• Correlation of IPMI SEL entries with uptime <10000
• Quick diagnosis: of the almost identical batches p05153061* and p05153065*, only nodes in the first batch crashed during the Wigner re-cabling intervention (a SNOW query gives the same information)
Lessons from past
• CDB combined information from
  • Delivery spreadsheets (MAC, s/n)
  • Spreadsheets from the rack mounting (rack, U-pos)
  • EDH (contract id == purchase order)
  • CDB-SQL warranty table
  • LANDB (ip, gateway, netmask)
  • Hardware discovery (CPU, RAM, HDD, RAID,…)
• LEAF tools and procedures for maintaining consistency upon changes (e.g. rename)
  • Complex rollback when something failed in the middle
  • Risk of information degradation from software bugs and human errors
Quattor to Puppet campaign
• Moving hosts from Quattor to Puppet is an opportunity to patch information and restore consistency
• Found so far:
  • 0.5% wrong s/n
  • 4% with missing interfaces (especially IPMI) in LANDB; expect 10% for older deliveries
  • 0.5% hardware issues
  • 8 (out of 1400) hosts with wrong location
• Our conclusions:
  • A lot of information is manually gathered, entered and thoroughly checked once, when equipment is received and installed
  • There is an inevitable risk of information degradation over time due to subsequent changes
  • Maximize automation and information discovery
  • Minimize the need for subsequent changes
Workaround with host aliases
• Can't use DNS to list aliases
  • DNS mapping is one-way: alias (CNAME) to address (A)
  • Nothing stored in DNS goes in the other direction for aliases
• However, LANDB can
  • getMyDeviceInfo() SOAP call
  • Runs on the local host and requires no authentication
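A small sketch of the forward direction, using Ruby's standard `resolv` library: a CNAME query returns the canonical name for one given alias, but there is no corresponding query that returns all aliases pointing at a host. The helper name `cname_target` is hypothetical, and the resolver is passed in as a parameter so the function can be exercised without network access.

```ruby
require 'resolv'

# Return the canonical name an alias points to, or nil if the name
# has no CNAME record. `dns` defaults to the system resolver.
def cname_target(alias_name, dns = Resolv::DNS.new)
  records = dns.getresources(alias_name, Resolv::DNS::Resource::IN::CNAME)
  records.empty? ? nil : records.first.name.to_s
end

# Usage (needs network access and a resolvable alias):
#   puts cname_target('lxpony01.cern.ch')
```

Going from the canonical name back to its aliases would require data that DNS does not expose to ordinary queries, which is why the slides turn to LANDB.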
LANDB & DNS
[Figure: LANDB device record]
• Device name is unknown to DNS but will usually correspond to one of the interface names
• Interface names are recorded in DNS address (A) records
• Aliases are recorded in DNS Canonical Name (CNAME) records
Adding an alias in LANDB
• Don't remove the existing CD…-… alias when you add your own alias
Calling getMyDeviceInfo()
Puppet uses Ruby, so it's installed by default…

    #!/usr/bin/ruby -w
    require 'soap/rpc/driver'

    NAMESPACE = 'urn:NetworkService'
    URL = 'https://network.cern.ch/sc/soap/soap.fcgi?v=5'

    begin
      # Silence warnings emitted by the SOAP library
      $stderr.reopen("/dev/null", "w")
      driver = SOAP::RPC::Driver.new(URL, NAMESPACE)
      # Add remote service methods
      driver.add_method('getMyDeviceInfo')
      # Call remote service method for getting the device information
      myInfo = driver.getMyDeviceInfo()
      # Initialize the hostname to the device name, as a fallback
      # in case we can't identify a proper alias
      hostname = myInfo['DeviceName']
      # Get aliases for the main interface
      for interface in myInfo['Interfaces']
        # The main interface is the one matching the device name
        if interface['Name'].downcase == hostname.downcase
          if interface['IPAliases']  # Do we have any aliases?
            for ipAlias in interface['IPAliases']
              # Take the first alias that does not match the serial number
              if myInfo['SerialNumber'] == nil || ipAlias.downcase != myInfo['SerialNumber'].downcase
                hostname = ipAlias
                break
              end
            end
          end
          break  # Once we have reached the main interface we stop
        end
      end
      puts hostname.downcase  # Output the new hostname
    rescue => err
      puts err.message
    end

Perhaps not so pretty, but it seems to work.
Could it be wrapped somehow into a Puppet fact adding aliases to /etc/hosts?
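As a step towards such a wrapper, the alias selection can be isolated from the SOAP call so that it can be tried out offline. The sketch below assumes the device information is a hash with the keys used in the script above ('DeviceName', 'SerialNumber', 'Interfaces', 'Name', 'IPAliases'); the function name `select_hostname` is hypothetical, and the sample hash simply combines example values quoted earlier in the slides.

```ruby
# Pick the hostname from a device-info hash: the first alias of the main
# interface that is not the serial number, else the device name itself.
def select_hostname(info)
  device = info['DeviceName']
  serial = info['SerialNumber']
  # The main interface is the one whose name matches the device name
  main = (info['Interfaces'] || []).find { |i| i['Name'].casecmp(device).zero? }
  aliases = main ? Array(main['IPAliases']) : []
  chosen = aliases.find { |a| serial.nil? || !a.casecmp(serial).zero? }
  (chosen || device).downcase
end

# Illustrative input only, not real LANDB data.
info = {
  'DeviceName'   => 'P05151113126469',
  'SerialNumber' => 'CD5151113-6N006225TS-2',
  'Interfaces'   => [
    { 'Name' => 'P05151113126469',
      'IPAliases' => ['CD5151113-6N006225TS-2', 'LXPONY01'] }
  ]
}

puts select_hostname(info)   # prints lxpony01
```

A custom fact could then call the SOAP service and pass the result to a function like this one; that wiring is not shown here and would need to be checked against the Facter version in use.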
Use host alias example
Questions/comments