Console Infrastructure in the CERN Computer Centre
Helge.Meinhard@cern.ch
HEPiX / HEPNT Autumn 2003, Vancouver
Mostly work done by Andras.Horvath@cern.ch
The problem
• CERN CC is running large farms
  • CPU servers: now 1500 boxes, 6000* in 2006
  • Disk/tape servers: now 300 boxes, 1200* in 2006
  *) Error bar: ~ factor 2
• Attempt at high-level management solution: ELFms (T. Kleinwort)
• Low-level problems
  • E.g. machine unpingable
  • Console access and/or reset required
HEPiX Vancouver: Console management at CERN
Existing solutions…
… do not scale
Requirements
• Considered systematically in summer 2003
• Main points:
  • Remote console access
    • To boot loader and operating system (Linux)
    • Preferably to BIOS as well
  • Remote reset
    • ATX reset and/or ATX power on/off and/or
    • Remote power cycling
Options (1 CHF = 0.75 USD = 0.65 EUR = 1 CAD)
[Comparison table of options not recovered. Table footnote: † = yes, but…]
Serial daisy-chaining
• Up to 4 nodes
• BIOS, boot loader, OS
• Console: minicom
• But few boards come with two serial lines these days…
Remote reset
Prototypes
[Diagram: four nodes, each with serial port 0 and port 1]
Decisions
• Infrastructure for serial console via serial cards in PCs to be deployed
• Nothing else for now (no remote reset etc.)
  • 24 x 7 operator coverage can step in
  • Many services are redundant
• Specs for all new servers require support for
  • Redirection of BIOS to serial line…
  • and controllable system behaviour (stay off vs. previous state) on power cycle
Serial infrastructure: head nodes
• Dedicated head nodes vs. worker nodes serving as heads for small number of peers
  + Cleaner – all worker nodes remain the same
  + Can be used for other head node applications (e.g. software distribution) if desired
  – Extra investment, extra space
  – If down, larger number of machines inaccessible via serial console
• Decided in favour of dedicated head nodes
Concentration factor, scope
• Head nodes equipped with six 8-port serial cards
  • Complete head node (w/o serial cables) is about 1800 CHF
  • By far cheaper than a higher number of ports per console server, even though more console servers are needed
• Will equip the entire CERN computer centre
  • Machine rooms on ground floor and basement
  • Except Windows machines and machines dedicated to network services
• Procurement running for 75 head nodes
  • Cross-connection of head nodes not decided yet
  • Some free ports on head nodes
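The figures on this slide imply a per-port cost and a total port count that are not spelled out. A minimal sketch of that arithmetic follows; the input numbers come from the slide, while the function names are illustrative only and cable costs are excluded, as on the slide.

```python
# Inputs taken from the slide; everything else is derived arithmetic.
CARDS_PER_HEAD_NODE = 6
PORTS_PER_CARD = 8
HEAD_NODE_COST_CHF = 1800  # complete head node, without serial cables
HEAD_NODES_ORDERED = 75


def ports_per_head_node() -> int:
    """Serial ports available on one head node."""
    return CARDS_PER_HEAD_NODE * PORTS_PER_CARD


def cost_per_port_chf() -> float:
    """Approximate hardware cost per serial port, cables excluded."""
    return HEAD_NODE_COST_CHF / ports_per_head_node()


def total_ports() -> int:
    """Serial ports across all head nodes in the current procurement."""
    return HEAD_NODES_ORDERED * ports_per_head_node()


if __name__ == "__main__":
    print(f"ports per head node: {ports_per_head_node()}")
    print(f"cost per port (CHF): {cost_per_port_chf():.2f}")
    print(f"total ports:         {total_ports()}")
```

With these inputs each head node offers 48 ports at roughly 37.50 CHF per port, and 75 head nodes give 3600 ports in total.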
Software
• Need a bit more than minicom
  • Logging into one of ~75 servers and requesting /dev/ttyS25 not going to scale
  • Authentication and authorisation
  • Logging of console output
• Started prototyping our own solution (Andras Horvath / CERN)
  • Put on hold when we learned (at HEPiX Amsterdam) of …
• Software by Chuck Boeheim (SLAC) used at SLAC, Fermi, LBL, …
  • Provides most of the functionality we require
  • CERN-specific extensions can be easily added (wrapper scripts)
  • Constructive discussions with Chuck, expect to share the work
  • Aim is one common code base
Software schematics
[Diagram: user app on lxplusnnn; console servers 1 to 75 (pcitfionnn), each with a server process, local conf and log; machines 1.1 … 1.44 through 75.1 … 75.44 attached via RS-232; console log repository]
• CDB – config service
  • Machine – port @ head node mapping
  • User – machine authorisations
Software components
• User application
  • Should run on all on-site Linux machines; Windows, Solaris?
• Console application on head nodes
  • Grants and logs access to serial lines
  • Logs console output
• Configuration service
  • Machine – port @ head node mapping
  • User – machine mapping (authorisation to access serial line)
• Store for console logs
• Nothing on machines…
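The two mappings held by the configuration service can be illustrated with a small lookup. This is only a sketch under assumptions: the machine, user and head node names, the `SerialLine` type and the `resolve_console` function are all hypothetical, and the real data would come from the configuration service rather than from in-memory dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialLine:
    head_node: str  # hypothetical head node name
    port: int       # serial port index on that head node


# Hypothetical sample data standing in for the configuration service.
MACHINE_TO_LINE = {
    "node001": SerialLine("head01", 0),
    "node002": SerialLine("head01", 1),
    "node045": SerialLine("head02", 0),
}
USER_TO_MACHINES = {
    "alice": {"node001", "node002"},
    "bob": {"node045"},
}


def resolve_console(user: str, machine: str) -> SerialLine:
    """Return the serial line for a machine if the user is authorised."""
    if machine not in MACHINE_TO_LINE:
        raise KeyError(f"unknown machine: {machine}")
    if machine not in USER_TO_MACHINES.get(user, set()):
        raise PermissionError(f"{user} may not access the console of {machine}")
    return MACHINE_TO_LINE[machine]
```

Keeping both mappings in one central place means neither the user application nor the head nodes need to know the cabling by heart; they ask, and the answer also carries the authorisation decision.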
Software: TBD
• On our wishlist:
  • Authentication of head node towards user app, and of user towards server process on head node
  • Per-line control of access rights
  • (Possibility of) logging via syslog
• CERN-specific extensions being designed
  • Machine detection, feedback to config service
  • Wrapper around user app asking config service to provide mapping of machine to (port @) head node
  • Automatic creation of local config files on head nodes
  • Collection of console logs in central repository
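The automatic creation of local config files on head nodes could, for instance, be derived from the central machine-to-port mapping. The sketch below is an assumption throughout: the mapping contents, the function name and the "port machine" line format are invented for illustration and do not describe the format used by the SLAC software or by CERN.

```python
from collections import defaultdict

# Hypothetical central mapping: machine -> (head node, port).
CENTRAL_MAPPING = {
    "node001": ("head01", 0),
    "node002": ("head01", 1),
    "node045": ("head02", 0),
}


def render_head_node_configs(mapping):
    """Group the central mapping by head node and render one config text each.

    The "<port> <machine>" line format is an illustrative assumption.
    """
    per_head = defaultdict(list)
    for machine, (head, port) in mapping.items():
        per_head[head].append((port, machine))

    configs = {}
    for head, lines in per_head.items():
        # Sort by port so the generated file is stable between runs.
        configs[head] = "".join(f"{port} {machine}\n" for port, machine in sorted(lines))
    return configs


if __name__ == "__main__":
    for head, text in sorted(render_head_node_configs(CENTRAL_MAPPING).items()):
        print(f"# {head}")
        print(text, end="")
```

Generating the files from one source keeps the head nodes consistent with the configuration service, so a recabled machine needs a change in only one place.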
Status, outlook
• HW: Orders for head nodes, serial cards, cables out or being finalised
  • Expected delivery: second half of November 2003
• SW: Started discussing and investigating adaptations, CERN-specific elements being designed
• Hope to have first head node ready in time for next disk server delivery (early December; no KVM switches!)
• Full deployment will run well into 2004