Cluster Monitoring with EPICS and SNMP
CBM Conference 2006
Motivation
• We wish to monitor the ALICE HLT analysis cluster (500 PCs)
• Analysing the data obtained from the ALICE experiment will take a long time, so a stable analysis cluster is needed
• To ensure stability, the cluster must be monitored continuously
• The EPICS architecture with SNMP support makes it possible to monitor such a PC cluster
Contents
• Cluster Management
• SNMP
   • MIB Trees
   • SNMP Operations
   • Using Data from SNMP
• EPICS
   • Overview
   • Channel Access
   • Record Display
   • Device Support
   • devSNMP
   • Management Possibilities
• Test Implementation
   • Overview
   • Software
   • Monitored Resources
• Example Implementation
• Extended Implementation
   • Extension Possibilities
   • Current State
• Summary
Cluster Management
• PC clusters are now widely used for data analysis in many settings, such as physics experiments and commercial organisations
• These clusters often consist of hundreds to thousands of individual PCs (nodes)
• To maintain a healthy, efficient cluster, key resources of the nodes must be monitored, e.g.:
   • Hard disk usage
   • Processor usage
   • Running processes, etc.
• What is the best way to obtain this information from the nodes?
   • Self-monitoring?
   • Operating system logging?
   • SNMP?
Simple Network Management Protocol
• The Simple Network Management Protocol (SNMP) is a management protocol for gathering statistical data about network/host traffic and the behaviour of network components
• It is a telecom-industry standard protocol and is therefore supported by most standards organisations and major vendors
• An SNMP agent on each host exposes an extensive Management Information Base (MIB), a database of information useful for network management
• MIB objects are organised in a tree structure with public (standard) and private branches
• These MIBs contain key system resource information that can be used for monitoring
MIB Tree - Graphical View
• The MIB tree can be referred to symbolically or numerically
• e.g. iso.org.dod.internet.mgmt.mib-2.system.sysUpTime = 1.3.6.1.2.1.1.3

iso(1)
   org(3)
      dod(6)
         internet(1)
            mgmt(2)
               mib-2(1)
                  system(1)
                     sysDescr(1)
                     sysUpTime(3)
            private(4)
               enterprises(1)
                  ucdavis(2021)
                     dskTable(9)
                        dskEntry(1)
                           dskTotal(6)
                           dskAvail(7)
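The symbolic and numeric forms are related by walking down the tree from the root, one sub-identifier per level. The Python sketch below hard-codes only the nodes shown in the graphic (the `MIB_SUBSET` table and both function names are hypothetical); real tools such as NET-SNMP's `snmptranslate` load the full MIB definition files instead.

```python
# Hypothetical subset of the MIB tree, copied from the graphic:
# each node name maps to (parent name, sub-identifier).
MIB_SUBSET = {
    "iso": (None, 1),
    "org": ("iso", 3),
    "dod": ("org", 6),
    "internet": ("dod", 1),
    "mgmt": ("internet", 2),
    "mib-2": ("mgmt", 1),
    "system": ("mib-2", 1),
    "sysDescr": ("system", 1),
    "sysUpTime": ("system", 3),
    "private": ("internet", 4),
    "enterprises": ("private", 1),
    "ucdavis": ("enterprises", 2021),
    "dskTable": ("ucdavis", 9),
    "dskEntry": ("dskTable", 1),
    "dskTotal": ("dskEntry", 6),
    "dskAvail": ("dskEntry", 7),
}

NUMERIC = "<numeric>"  # marker parent: no named node may follow a number


def to_numeric(symbolic: str) -> str:
    """Translate a full symbolic OID (starting at 'iso') to dotted numbers."""
    numbers, parent = [], None
    for label in symbolic.split("."):
        if label.isdigit():
            numbers.append(label)  # e.g. an instance suffix such as '.0'
            parent = NUMERIC
            continue
        if label not in MIB_SUBSET:
            raise KeyError(f"unknown MIB node: {label}")
        expected_parent, sub_id = MIB_SUBSET[label]
        if expected_parent != parent:
            raise ValueError(f"{label} is not a child of {parent}")
        numbers.append(str(sub_id))
        parent = label
    return ".".join(numbers)


def to_symbolic(numeric: str) -> str:
    """Reverse lookup: match each sub-identifier against the parent's children."""
    labels, parent = [], None
    for part in numeric.split("."):
        sub_id = int(part)
        match = next((name for name, (p, n) in MIB_SUBSET.items()
                      if p == parent and n == sub_id), None)
        if match is None:
            labels.append(part)  # outside the subset, e.g. an instance index
            parent = NUMERIC
        else:
            labels.append(match)
            parent = match
    return ".".join(labels)
```

For example, `to_numeric("iso.org.dod.internet.private.enterprises.ucdavis.dskTable.dskEntry.dskTotal")` gives `1.3.6.1.4.1.2021.9.1.6`, the column holding each partition's total size.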
MIB Tree - Output View
+--iso(1)
   |
   +--org(3)
      |
      +--dod(6)
         |
         +--internet(1)
            |
            +--directory(1)
            |
            +--mgmt(2)
            |  |
            |  +--mib-2(1)
            |     |
            |     +--system(1)
            |     |  |
            |     |  +-- -R-- String    sysDescr(1)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -R-- ObjID     sysObjectID(2)
            |     |  +-- -R-- TimeTicks sysUpTime(3)
            |     |  +-- -RW- String    sysContact(4)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -RW- String    sysName(5)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -RW- String    sysLocation(6)
            |     |  |        Textual Convention: DisplayString
            |     |  |        Size: 0..255
            |     |  +-- -R-- INTEGER   sysServices(7)
            |     |  |        Range: 0..127
            |     |  +-- -R-- TimeTicks sysORLastChange(8)
            |     |  |        Textual Convention: TimeStamp
            |     |  |
            |     |  +--sysORTable(9)
            |     |     |
            |     |     +--sysOREntry(1)
            |     |        |  Index: sysORIndex
            |     |        |
            |     |        +-- ---- INTEGER   sysORIndex(1)
            |     |        |        Range: 1..2147483647
            |     |        +-- -R-- ObjID     sysORID(2)
            |     |        +-- -R-- String    sysORDescr(3)
            |     |        |        Textual Convention: DisplayString
            |     |        |        Size: 0..255
            |     |        +-- -R-- TimeTicks sysORUpTime(4)
            |     |                 Textual Convention: TimeStamp
            |     |
            |     +--interfaces(2)
            |     |  |
            |     |  +-- -R-- Integer32 ifNumber(1)
            |     |  |
            |     |  +--ifTable(2)
            |     |     |
            |     |     +--ifEntry(1)
            |     |        |  Index: ifIndex
            |     |        |
            |     |        +-- -R-- Integer32 ifIndex(1)
            |     |        |        Textual Convention: InterfaceIndex
            |     |        |        Range: 1..2147483647
            |     |        +-- -R-- String    ifDescr(2)
            |     |        |        Textual Convention: DisplayString
            |     |        |        Size: 0..255
            |     |        +-- -R-- EnumVal   ifType(3)
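A listing in this form (it resembles the tree printed by NET-SNMP's `snmptranslate -Tp`) encodes each node's depth purely by indentation, so the numeric OID of every node can be recovered with a stack of ancestors. A minimal Python sketch, assuming the listing keeps its line breaks and uses three characters of indentation per nesting level; `parse_tree` and `NODE_RE` are hypothetical names.

```python
import re

# Matches the node part of a tree line, e.g. "+--system(1)" or
# "+-- -R-- TimeTicks sysUpTime(3)"; the node is the final "name(number)".
NODE_RE = re.compile(r"\+--.*?([A-Za-z][\w-]*)\((\d+)\)\s*$")


def parse_tree(text: str, indent: int = 3) -> dict:
    """Build a {name: numeric OID} map from an ASCII MIB tree listing.

    Annotation lines ("Size: ...", "Index: ...") and bare "|" connector
    lines contain no "+--name(n)" node and are skipped.
    """
    oids = {}
    stack = []  # stack[d] holds the OID components of the latest node at depth d
    for line in text.splitlines():
        match = NODE_RE.search(line)
        if match is None:
            continue
        depth = line.index("+--") // indent
        name, sub_id = match.group(1), match.group(2)
        del stack[depth:]  # close branches at this depth or deeper
        parent = stack[-1] if stack else []
        stack.append(parent + [sub_id])
        oids[name] = ".".join(stack[-1])
    return oids
```

Because the depth is taken from the column of `+--`, the leading `|` connectors of enclosing branches do not need to be interpreted at all.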
SNMP Operations - Overview
• SNMP uses simple client-server interactions, with only a few operations for accessing information held in the MIB tree:
   • {Get} {Set} {GetNext} {Walk} {Table} {Trap} {Translate}
   • Strictly, {Walk}, {Table} and {Translate} are NET-SNMP command-line tools built on top of the protocol operations
• These operations can query the local MIB tree or those of networked machines
[Diagram: an SNMP operation sent across the network to the SNMP agents of several managed devices, each holding a MIB]
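A {Walk} can be understood as repeated {GetNext} requests: each GetNext returns the first object whose OID sorts after the one given (comparing sub-identifiers as integers), and the walk stops once the returned OID leaves the requested subtree. The Python sketch below models one agent's MIB as an in-memory dict; `EXAMPLE_MIB` and its values are made up, and the code illustrates only the ordering logic, not the SNMP wire protocol.

```python
def oid_key(oid: str) -> tuple:
    """OIDs sort component-wise as integers (1.3.6.1.2 < 1.3.6.1.10)."""
    return tuple(int(part) for part in oid.split("."))


def get_next(mib: dict, oid: str):
    """Return the first (oid, value) pair strictly after `oid`, or None."""
    later = [o for o in mib if oid_key(o) > oid_key(oid)]
    if not later:
        return None
    nxt = min(later, key=oid_key)
    return nxt, mib[nxt]


def walk(mib: dict, root: str) -> list:
    """Repeat GetNext from `root` until the result leaves the subtree."""
    prefix = oid_key(root)
    results, current = [], root
    while True:
        step = get_next(mib, current)
        if step is None or oid_key(step[0])[:len(prefix)] != prefix:
            break
        results.append(step)
        current = step[0]
    return results


# Hypothetical agent contents (instance OIDs end in .0 for scalars).
EXAMPLE_MIB = {
    "1.3.6.1.2.1.1.1.0": "Linux node01",  # sysDescr.0
    "1.3.6.1.2.1.1.3.0": 123456,          # sysUpTime.0
    "1.3.6.1.2.1.1.5.0": "node01",        # sysName.0
    "1.3.6.1.2.1.2.1.0": 2,               # ifNumber.0
}
```

Walking `1.3.6.1.2.1.1` (system) returns the three system objects in OID order and stops before `ifNumber.0`, which belongs to the interfaces subtree.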
SNMP Operations - Command Structure
• Typical structure of an SNMP {get} command: operation, community, PC to query, MIB object to query
• Output: MIB object queried, object type, object value
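The command and output structure above can be handled programmatically. A minimal Python sketch, assuming NET-SNMP's `snmpget` command-line tool (the `-v 2c` protocol version is an assumption) and its default output layout `OBJECT = TYPE: VALUE`, for example `SNMPv2-MIB::sysUpTime.0 = Timeticks: (123456) 0:20:34.56`; both helper names are hypothetical.

```python
def build_get_command(host: str, community: str, oid: str,
                      version: str = "2c") -> list:
    """Argument list for snmpget: operation, version, community, host, object."""
    return ["snmpget", "-v", version, "-c", community, host, oid]


def parse_get_output(line: str) -> tuple:
    """Split 'OBJECT = TYPE: VALUE' into (object, type, value).

    Only the first ' = ' and the first ': ' are used as separators, so
    colons inside the value (e.g. in a time stamp) are preserved.
    """
    obj, _, rest = line.partition(" = ")
    obj_type, _, value = rest.partition(": ")
    return obj.strip(), obj_type.strip(), value.strip()
```

To run the query for real, the argument list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the first line of its `stdout` handed to `parse_get_output`.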
Using Data from SNMP
• Once information has been obtained from the MIB trees, it must be fed into a control system to be useful in a management context
• The control system might process the information, store it for later analysis, or simply display it in a Graphical User Interface (GUI)
• Many such systems currently exist:
   • EPICS
   • Ganglia
   • Lemon
EPICS - Overview
• One such system is the Experimental Physics and Industrial Control System (EPICS)
   • www.aps.anl.gov/epics
• It is currently used by more than 12 organisations to control devices in major projects such as particle accelerators, telescopes and large experiments
   • GSI, SLAC, ANL, DESY, LANL, ...
• It therefore has a large support and knowledge base
• It is based on a client/server network model: servers hold information in records, which clients can access
EPICS - Architecture
[Diagram: EPICS clients connected over the network to EPICS servers; each server holds records made up of fields (Field 1: x, Field 2: y, Field 3: z)]
EPICS - Channel Access
• Remote access to EPICS records is provided by the Channel Access (CA) protocol
• This requires a CA server running on the EPICS server and a CA client running on the EPICS client
• These are usually already integrated into EPICS clients and servers when they are built
EPICS - Architecture
[Diagram: EPICS clients, each with a CA client, connected over the network to EPICS servers; each server runs a CA server in front of its records (Field 1: x, Field 2: y, Field 3: z)]
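The client/server split can be illustrated with a purely in-process toy model. In the Python sketch below, `Record`, `ToyCAServer` and `ToyCAClient` are hypothetical names, not the EPICS or Channel Access API: real CA is a network protocol in which clients locate the server hosting a channel by broadcasting a search for its name, whereas here the client simply asks each known server in turn. Channel names follow the EPICS `record.FIELD` convention, with a bare record name meaning its `VAL` field.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Toy EPICS-style record: a name plus named fields (VAL holds the value)."""
    name: str
    fields: dict = field(default_factory=dict)


class ToyCAServer:
    """Stands in for the CA server on one EPICS server (no real networking)."""

    def __init__(self, records):
        self.records = {r.name: r for r in records}

    def get(self, channel: str):
        # "record.FIELD" selects a field; a bare record name selects VAL.
        rec_name, _, fld = channel.partition(".")
        return self.records[rec_name].fields[fld or "VAL"]


class ToyCAClient:
    """Asks each known server in turn whether it hosts the requested record."""

    def __init__(self, servers):
        self.servers = servers

    def get(self, channel: str):
        rec_name = channel.partition(".")[0]
        for server in self.servers:
            if rec_name in server.records:
                return server.get(channel)
        raise LookupError(f"no server hosts record {rec_name!r}")


# Hypothetical records on two cluster nodes.
node01 = ToyCAServer([Record("node01:uptime", {"VAL": 4200, "EGU": "s"})])
node02 = ToyCAServer([Record("node02:uptime", {"VAL": 37})])
client = ToyCAClient([node01, node02])
```

The point of the model is the separation of roles: clients only ever see channel names, while each server alone knows which records it holds.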
EPICS - Record Display
• Information from EPICS records can be displayed in a GUI such as MEDM
EPICS - Record Display
• Another display GUI: GumTree
EPICS - Device Support
• Records can be interfaced to numerous devices, which may be hardware or software
• This interfacing allows information from a device to be fed into EPICS records
• The interfacing layer is known as device support
EPICS - Architecture
[Diagram: EPICS clients with CA clients connected over the network to EPICS servers; each server has a CA server, records with fields (Field 1: x, Field 2: y, Field 3: z), and device support beneath the records]
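Device support can be thought of as a pluggable read function that a record calls whenever it processes. The toy Python sketch below (hypothetical class and function names, not the EPICS C device-support interface) shows that separation: the record logic stays the same, and only the device hook changes between a "hardware" and a "software" source.

```python
import os
from typing import Callable


class ToyInputRecord:
    """Toy input record: processing asks its device support for a new value."""

    def __init__(self, name: str, read_device: Callable[[], object]):
        self.name = name
        self.read_device = read_device  # the pluggable "device support" hook
        self.val = None

    def process(self):
        # The record does not know what kind of device it reads; it only
        # calls the hook and stores the result as its value.
        self.val = self.read_device()
        return self.val


def read_thermometer() -> float:
    """Hypothetical hardware device support (returns a fixed placeholder)."""
    return 21.5


def read_cpu_count() -> int:
    """Hypothetical software device support: number of CPUs on this host."""
    return os.cpu_count() or 0
```

Swapping `read_thermometer` for `read_cpu_count` changes where the data comes from without touching `ToyInputRecord`, which is the role devSNMP plays for SNMP data.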
Device Support for SNMP - devSNMP
• devSNMP is the EPICS device support for SNMP
• It allows data from SNMP to be fed into EPICS records
• It sets the input field of a record to an SNMP {get} operation
• It is written for the open-source NET-SNMP package
   • NET-SNMP is just one particular implementation of SNMP
   • www.net-snmp.org
Device Support for SNMP - devSNMP
• SNMP {get} command:
• Record definition file:
    record(stringin, "System_Description") {
        field(DTYP, "Snmp")
        field(INP, "@localhost public system.sysUpTime.0 STRING:100")
        field(SCAN, "5 second")
    }
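The INP field packs the whole {get} request into one string; the leading `@` marks an EPICS instrument I/O (INST_IO) link, whose text is handed to the device support as-is. A minimal Python sketch that splits it the way the example suggests (host, community, OID, then a value type with a maximum length). This reading of the fields is inferred from the single example and is not taken from the devSNMP documentation; `SnmpLink`, `parse_inp` and `scan_period_seconds` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SnmpLink:
    host: str
    community: str
    oid: str
    value_type: str
    max_length: int


def parse_inp(inp: str) -> SnmpLink:
    """Split an INP string such as '@localhost public system.sysUpTime.0 STRING:100'.

    Assumed layout (inferred): @host community OID TYPE[:LENGTH].
    """
    if not inp.startswith("@"):
        raise ValueError("expected an INST_IO link starting with '@'")
    host, community, oid, type_spec = inp[1:].split()
    value_type, _, length = type_spec.partition(":")
    return SnmpLink(host, community, oid, value_type,
                    int(length) if length else 0)


def scan_period_seconds(scan: str) -> float:
    """Convert a periodic SCAN value such as '5 second' to seconds."""
    number, unit = scan.split()
    if unit not in ("second", "seconds"):
        raise ValueError(f"not a periodic scan: {scan}")
    return float(number)
```

With `SCAN` set to `"5 second"`, the record (and hence the SNMP {get}) is processed every 5 seconds.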
Management Possibilities
• EPICS records can carry out simple calculations and conditional logic, but nothing very complicated
• The data from SNMP can therefore be used to control other devices interfaced with EPICS records
• One possible reaction is an SNMP {set} operation, which writes values to a MIB
• However, the current release of devSNMP supports only the {get} operation
   • Support for other SNMP commands is planned for the future
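A typical simple conditional relation is comparing a monitored value against limits. The Python sketch below is a simplified version of what an EPICS analog input record does with its HIGH/HIHI limits, assuming the corresponding severities (HSV/HHSV) are configured as MINOR/MAJOR and ignoring hysteresis and the LOW/LOLO limits; `action` stands in for whatever reaction is wired up and is hypothetical.

```python
# EPICS alarm severity names, from least to most severe.
NO_ALARM, MINOR, MAJOR = "NO_ALARM", "MINOR", "MAJOR"


def high_alarm(value: float, high: float, hihi: float) -> str:
    """Simplified upper-limit check in the style of HIGH/HIHI limits."""
    if value >= hihi:
        return MAJOR
    if value >= high:
        return MINOR
    return NO_ALARM


def react(value: float, high: float, hihi: float, action) -> str:
    """Call the (hypothetical) reaction hook only on a MAJOR alarm."""
    severity = high_alarm(value, high, hihi)
    if severity == MAJOR:
        action(value)
    return severity
```

Anything beyond a fixed comparison like this, such as combining several values, quickly exceeds what single records are designed for.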
Test Implementation - Overview
• Carried out on the Linux PC cluster at the Kirchhoff Institute for Physics, University of Heidelberg
• 32 PCs running SuSE 9 Linux
Test Implementation - Software
• EPICS servers:
   • 30 cluster nodes (2.4 and 2.6 kernels) running EPICS soft IOCs with devSNMP
   • NET-SNMP tool set and libraries installed on each node
• EPICS clients:
   • Two cluster nodes (2.6 kernel) running the Motif Editor and Display Manager (MEDM) on top of an EPICS base installation
Test Implementation - Architecture
[Diagram: two MEDM displays, each with a CA client, connected over the network to three cluster nodes; on each node a CA server serves a record whose input (Inp) is SNMP, read via devSNMP from the local SNMP agent and its MIB]
Test Implementation - Information Flow
[Diagram: information flow between the MEDM displays with their CA clients and the CA servers holding the SNMP-input records]
Test Implementation - Monitored Resources
• Some of the resources monitored:
   • Hard disk partition usage (total, available, used, percentage used, alarm limit)
   • Average CPU usage over 1 min
   • System uptime (since the SNMP daemon started)
   • Inbound packet errors
   • Unicast outbound packets
   • SNMP daemon process check
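The partition figures can be derived from the UCD disk table shown in the MIB tree graphic (ucdavis(2021), dskTable(9), with dskTotal(6) and dskAvail(7)). A minimal Python sketch, assuming both values are reported in kilobytes and that "used" is simply total minus available (on file systems with reserved blocks the agent's own figure for used space may differ); `disk_usage` and the alarm limit parameter are hypothetical.

```python
def disk_usage(total_kb: int, avail_kb: int, alarm_percent: float) -> dict:
    """Derive used space, percentage used and an alarm flag for one partition.

    total_kb / avail_kb correspond to dskTotal / dskAvail for one row of
    the UCD disk table; alarm_percent is a hypothetical site-chosen limit.
    """
    if total_kb <= 0:
        raise ValueError("total size must be positive")
    used_kb = total_kb - avail_kb
    percent = 100.0 * used_kb / total_kb
    return {
        "used_kb": used_kb,
        "percent_used": round(percent, 1),
        "alarm": percent >= alarm_percent,
    }
```

In the EPICS setup, each input would be its own devSNMP record and the derived values would come from calculation records; the function only shows the arithmetic.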
Example Implementation - DESY
• EPICS with devSNMP is currently used at DESY to monitor key switches and routers:
   • Network traffic
   • Status
• Solaris and Linux PC clusters are to be monitored in the future
• Around 25 managed devices in total, a number that is growing all the time
• More information on EPICS/devSNMP at DESY:
   • http://www-mks2.desy.de/content/e4/e40/e41/e12212/index_ger.html
Extension Possibilities
• EPICS has limitations as a management system:
   • EPICS is a static system
   • Records have limited analysis and reaction capabilities; in particular, there are no rule-based events
• For dynamic management, information from EPICS records can be forwarded to an expert management system: SysMES (Camilo Lara, et al.)
   • This allows complex analysis of, and reaction to, the data obtained from SNMP
   • The management system must have a CA client to communicate with the EPICS records
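What a rule-based layer adds over single records is the ability to combine values from many records into one decision. A minimal Python sketch, assuming the CA client has already gathered the current record values into a snapshot dict; `Rule`, `evaluate`, `RULES` and the record names are hypothetical and do not describe SysMES's actual rule language.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Snapshot = Dict[str, float]


@dataclass
class Rule:
    """Hypothetical rule: a name, a condition on a snapshot, and an action."""
    name: str
    condition: Callable[[Snapshot], bool]
    action: Callable[[Snapshot], str]


def evaluate(rules: List[Rule], snapshot: Snapshot) -> List[str]:
    """Fire every rule whose condition holds for this snapshot of record values."""
    return [rule.action(snapshot) for rule in rules if rule.condition(snapshot)]


# Example rules; the second correlates records from two different nodes,
# which a single EPICS record cannot easily do.
RULES = [
    Rule("disk-full",
         lambda s: s["node01:disk:percent"] > 95,
         lambda s: "clean scratch space on node01"),
    Rule("cluster-busy",
         lambda s: min(s["node01:cpu"], s["node02:cpu"]) > 90,
         lambda s: "stop scheduling new jobs"),
]
```

Each evaluation returns the list of reactions to carry out; a real expert system would also track state between snapshots, for example to avoid firing the same event repeatedly.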
Current State
• The interface between the CA client and SysMES has been written
• Interfaces to the cluster monitoring systems LEMON and Ganglia have been defined and are being implemented
Current State - Architecture
[Diagram: a SysMES client (through an interface and its own CA client) and two MEDM displays (each with a CA client) connected over the network to the monitored nodes; on each node, records with SNMP input are served by a CA server and read via devSNMP from the local SNMP agent and its MIB]
Summary
• SNMP:
   • Is the standard for network management in almost all modern networked devices (e.g. PCs, workstations, bridges, switches, routers, ...)
   • Is a widely implemented protocol with a large knowledge base
   • Has very low system resource usage
   • Gives access to a large amount of system information stored in the nodes' MIB trees
• EPICS:
   • Is a widely implemented control system with a large support base
   • Allows input from and output to a vast array of devices
• Through device support for SNMP, the two can be combined into a monitoring system
• This can be extended by forwarding the monitoring data to an expert management system (such as SysMES)
Thanks
• Many thanks to all who have helped, but especially:
   • Camilo Lara (coordinator, KIP)
   • Albert Kagarmanov (devSNMP at DESY)
The End
Thank you for your attention. Any questions?