
Lemon Monitoring

Lemon Monitoring. Miroslav Siket, German Cancio, David Front, Maciej Stepniewski, CERN-IT/FIO-FS. LCG Operations Workshop, Bologna, 24-26 May 2005. Outline: Lemon; structure and design; how it works, deployment; use cases, web interface; installation and setup; summary.


Presentation Transcript


  1. Lemon Monitoring
     Miroslav Siket, German Cancio, David Front, Maciej Stepniewski
     CERN-IT/FIO-FS
     LCG Operations Workshop, Bologna, 24-26 May 2005

  2. Outline
     • Lemon
     • Structure and design
     • How it works, deployment
     • Use cases, web interface
     • Installation and setup
     • Summary
     LCG Operations Workshop 24-26/05/2005 Bologna

  3. Lemon – LHC Era Monitoring
     • Lemon is a system containing tools for monitoring status and performance of computers:
       • Distributed monitoring system scalable to ~10k nodes
       • Provides active monitoring of software and hardware in the Computer Center on centrally managed clusters
       • Facilitates early error detection and problem prevention
       • Executes corrective actions and sends notifications
       • Provides persistent storage of the monitoring data
       • Offers a framework for further creation of sensors for monitoring
       • Site independent functionality
     • Link: http://cern.ch/lemon
     • Part of the ELFms toolsuite: http://cern.ch/elfms

  4. Lemon Use
     • It is used inside and outside CERN by:
       • System administrators, service managers, cluster responsibles
       • Developers and service/data challenges
       • Managers and general users
     • Deployments outside CERN:
       • EDG testbeds
       • Accelerator (AB) department at CERN
       • CMS online
       • GridICE
       • BARC India (development partner)

  5. Lemon architecture
     [Figure: architecture diagram. Labels: Sensors, Monitoring Agent (on the nodes), Monitoring Repository, Repository backend, Correlation Engines, Lemon CLI, RRDTool / PHP on apache, Web browser, User. Connections are labeled TCP/UDP, SOAP, and HTTP.]

  6. Components
     • Lemon is a typical server/client application with the following components:
     • MSA – Monitoring Sensor Agent (Lemon Agent)
       • Daemon on a client machine that spawns multiple Monitoring Sensors to measure data at defined intervals and sends the data to the Monitoring Repository
     • MS – Monitoring Sensor
       • Uses a standard C++ or Perl API, so it is easy to write your own sensor
       • Several sensors exist for performance, process, hardware and software monitoring, grid VO job reporting, database monitoring, security, and alarms (260 metrics in total)
     • MR – Monitoring Repository
       • Server application that receives samples and processes/validates them
       • Stores the full monitoring history data
       • Two implementations: flat files or Oracle DB based
     • LRF – Lemon RRD Framework
       • Pre-processes data into rrd files and creates cluster summaries
       • These are used for web graphics
       • Provides service and cluster overview in its web displays
     • LAG – Lemon Alarm Gateway
       • Generic gateway for alarms (in development)
       • Gateways to MonALISA and GridICE exist
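
The agent's job as described on this slide (sample several sensors, each at its own interval, and forward the samples) can be sketched as follows. This is only an illustration in Python: the real sensor API is C++/Perl, and every name here (`run_agent`, the sensor tuples, the placeholder values) is hypothetical. Time is simulated, and the samples are returned rather than sent to a repository.

```python
import heapq
from typing import Callable, Dict, List, Tuple

# Hypothetical sensor: (metric name, sampling interval in seconds, measure function).
# None of these names come from the Lemon API.
Sensor = Tuple[str, int, Callable[[], float]]


def run_agent(sensors: List[Sensor], duration: int) -> List[Dict]:
    """Simulate an agent sampling each sensor at its own interval.

    Time is simulated (nothing sleeps), and instead of sending samples to a
    repository the function returns them as a list.
    """
    # Priority queue of (next due time, sensor index); every sensor is due at t=0.
    queue = [(0, i) for i in range(len(sensors))]
    heapq.heapify(queue)
    samples = []
    while queue and queue[0][0] < duration:
        due, i = heapq.heappop(queue)
        name, interval, measure = sensors[i]
        samples.append({"time": due, "metric": name, "value": measure()})
        # Reschedule the same sensor one interval later.
        heapq.heappush(queue, (due + interval, i))
    return samples


sensors = [
    ("load_average", 30, lambda: 0.42),   # placeholder values, not real measurements
    ("disk_used_pct", 60, lambda: 71.0),
]
samples = run_agent(sensors, duration=120)
for s in samples:
    print(s)
```

A priority queue keeps the loop simple when sensors have different intervals; a real daemon would sleep until the next due time instead of advancing a simulated clock.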

  7. Lemon at CERN
     • Lemon monitors about 2200 computers in ~100 clusters
     • On average it collects about 70 metrics from each host
     • Integrated with Sure alarm system
     • Collecting about 1.5 GB/day
     • LEAF (LHC-Era Automated Fabric) for high-level intervention scheduling
     [Figure labels: Node Configuration Management, Node Management]
     • Configuration
       • Derived from the Quattor Configuration Database (CDB)
       • Individual configuration per cluster/host
       • Hierarchical structure
     • Alarm system
       • Sure – legacy system receiving alarms from Lemon
       • Integration with new LASER system (LHC alarm system) via LAG is ongoing
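
The figures on this slide allow a rough back-of-the-envelope estimate of the data volume per host and per metric. The sketch below assumes 1 GB = 10^9 bytes and treats the slide's "about" values as exact; what the 1.5 GB/day includes (storage overhead, alarm logs, and so on) is not stated, so the results are order-of-magnitude only.

```python
hosts = 2200              # "about 2200 computers"
metrics_per_host = 70     # "about 70 metrics from each host"
bytes_per_day = 1.5e9     # "about 1.5 GB/day", assuming 1 GB = 10**9 bytes

# Number of (host, metric) pairs being collected.
metric_streams = hosts * metrics_per_host

per_host_per_day = bytes_per_day / hosts
per_stream_per_day = bytes_per_day / metric_streams

print(f"metric streams:          {metric_streams}")
print(f"bytes per host per day:  {per_host_per_day:,.0f}")
print(f"bytes per metric stream: {per_stream_per_day:,.0f}")
```

Under these assumptions, each host contributes roughly 0.7 MB per day and each metric on a host roughly 10 kB per day.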

  8. Web interface
     • Cluster view displays accumulated statistics and status for all machines in the cluster
     • Host view gives overview of the host status with basic metrics
     • Other views available:
       • Rack view
       • Hardware type view
       • Other views can be added, working on user defined views
     • With the newest version (to be released soon):
       • Generic entry page displaying status overview of the key services
       • Configurable views
     • In development: database services monitoring with database specific view

  9. Use(ful) case
     [Figure: reboot occurrence history graph]
     • Kernel upgrade
       • Kernel version is “measured” on the boot of the machine
       • Automatic tools for upgrading the kernel on a cluster retrieve information from Lemon and schedule reboot of a machine based on this info
       • Web interface allows monitoring of the progress

  10. Computer Center display
      • Lemon Web Interface can be interfaced with a Computer Center database of objects (racks, silos, …)
      • Provides search of objects as well as listing
      • Interfaced through an XML-defined geometry of the computer center
      • Generic design that can be used anywhere:

          <?xml version="1.0" ?>
          <CC>
            <ROOM ID="0513-S-0034" DESCRIPTION="Tape Vault" R="0" G="0" B="0">
              <DOORS R="0" G="255" B="0">
                <DOOR X="63" Y="39" LX="64" LY="39" />
                <DOOR X="34" Y="0" LX="36" LY="0" />
              </DOORS>
              <RACKS R="0" G="0" B="203">
                <RACK ID="EA01" X="73" Y="9" LX="75" LY="10" PLANNED="0"/>
                <RACK ID="EA03" X="73" Y="8" LX="75" LY="9" PLANNED="0"/>
              </RACKS>
              <WALLS R="0" G="0" B="0">
                <WALL X="0" Y="0" LX="0" LY="60" />
                <WALL X="0" Y="0" LX="76" LY="0" />
              </WALLS>
              <STEPS R="255" G="163" B="0">
                <STEP X="47" Y="36" LX="52" LY="37" />
                <STEP X="47" Y="37" LX="52" LY="38" />
              </STEPS>
            </ROOM>
          </CC>
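
A geometry file of this shape is straightforward to read with a standard XML parser. The following minimal sketch uses Python's `xml.etree.ElementTree` on a shortened copy of the slide's example. The function name `load_racks` is hypothetical, the meaning of the `X`/`Y`/`LX`/`LY` attributes is not explained in the slides (they are simply read as integers), and treating `PLANNED="1"` as "planned" is an assumption. This is not how the Lemon web interface itself parses the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example geometry from the slide.
GEOMETRY = """<?xml version="1.0" ?>
<CC>
  <ROOM ID="0513-S-0034" DESCRIPTION="Tape Vault" R="0" G="0" B="0">
    <DOORS R="0" G="255" B="0">
      <DOOR X="63" Y="39" LX="64" LY="39" />
    </DOORS>
    <RACKS R="0" G="0" B="203">
      <RACK ID="EA01" X="73" Y="9" LX="75" LY="10" PLANNED="0"/>
      <RACK ID="EA03" X="73" Y="8" LX="75" LY="9" PLANNED="0"/>
    </RACKS>
  </ROOM>
</CC>
"""


def load_racks(xml_text):
    """Return one dict per <RACK>, tagged with the ID of its room."""
    root = ET.fromstring(xml_text)  # root element is <CC>
    racks = []
    for room in root.findall("ROOM"):
        for rack in room.findall("RACKS/RACK"):
            racks.append({
                "room": room.get("ID"),
                "id": rack.get("ID"),
                # Coordinates are read as plain integers; their exact
                # meaning is not defined in the slides.
                "x": int(rack.get("X")),
                "y": int(rack.get("Y")),
                "lx": int(rack.get("LX")),
                "ly": int(rack.get("LY")),
                # Assumption: PLANNED="1" marks a rack that is only planned.
                "planned": rack.get("PLANNED") == "1",
            })
    return racks


racks = load_racks(GEOMETRY)
for rack in racks:
    print(rack)
```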

  11. Service challenges, GRID VOs
      • Lemon allows for:
      • Virtual clusters
        • Clusters defined on request by service managers
        • or defined by scripts – updated dynamically on demand
        • or defined for specific purpose
        • Examples: Alice MDC, network challenges, …
      • Clusters defined dynamically
        • Example: hosts running GRID jobs on the batch cluster belonging to the given Virtual Organization
        • Hooks in Lemon for defining any dynamic grouping of hosts
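
The dynamic-cluster example on this slide (hosts grouped by the Virtual Organization whose jobs they are running) amounts to a simple grouping step. The sketch below is illustrative only: the slides do not describe the actual hook interface, and the function name, host names, and input format are all hypothetical.

```python
from collections import defaultdict


def group_hosts_by_vo(running_jobs):
    """Build one dynamic cluster per VO from (host, vo) pairs.

    `running_jobs` stands in for whatever the batch system reports;
    its format here is an assumption made for illustration.
    """
    clusters = defaultdict(set)
    for host, vo in running_jobs:
        clusters[vo].add(host)  # a host may appear in several VO clusters
    return {vo: sorted(hosts) for vo, hosts in clusters.items()}


# Hypothetical snapshot of jobs currently running on the batch cluster.
jobs = [
    ("lxb0001", "alice"),
    ("lxb0002", "cms"),
    ("lxb0001", "cms"),
    ("lxb0003", "alice"),
]
dynamic_clusters = group_hosts_by_vo(jobs)
print(dynamic_clusters)
```

Re-running the grouping whenever the job list changes is what makes such a cluster "dynamic": membership follows the workload rather than a static host list.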

  12. Automatic recovery actions and Alarms
      • Alarm Sensor
        • For defined values of measured metrics an actuator is called with a predefined action
        • An example: ssh daemon dead – action /sbin/service sshd start
        • Definition: metric X, field Y <op> reference value Z => call actuator
        • <op> can be ==, <, >, regexp, range, etc.
        • If success, log only; else call action up to max times
        • Each occurrence is logged in the Monitoring Repository
        • Already about 70 predefined alarms with automatic recovery actions
        • After the first month of deployment, it reduced the number of problem tickets by half
      • Correlation engine (CMDaemon)
        • Allows ‘global’ correlations, and in the future client/server alarms and recovery actions
      • Lemon Alarm Gateway (LAG)
        • Lemon’s LAG can be used to feed alarms into arbitrary alarm systems (under development)
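
The alarm definition on this slide ("metric X, field Y <op> reference value Z => call actuator") can be sketched as a small rule evaluator. This is one possible reading, not Lemon's implementation: the rule format, the function names, and the retry interpretation (call the action again until it succeeds, up to a maximum number of attempts, logging each attempt) are assumptions. The actuator here is a stand-in; a real one would run a command such as `/sbin/service sshd start`.

```python
import operator
import re

# Comparison operators named on the slide; the table itself is illustrative.
OPS = {
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "regexp": lambda value, pattern: re.search(pattern, str(value)) is not None,
    "range": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def check_alarm(sample, rule, actuator, log):
    """Evaluate one rule against one sample; run the actuator if it fires.

    Returns True if the alarm condition held, False otherwise.
    """
    value = sample[rule["field"]]
    if not OPS[rule["op"]](value, rule["reference"]):
        return False  # condition not met, nothing to do
    # Assumed retry policy: stop at the first successful attempt.
    for attempt in range(1, rule["max_attempts"] + 1):
        ok = actuator()
        log.append((rule["metric"], attempt, ok))
        if ok:
            break
    return True


# Hypothetical rule for the slide's "ssh daemon dead" example.
rule = {"metric": "sshd_process", "field": "running", "op": "==",
        "reference": 0, "max_attempts": 3}
sample = {"running": 0}

# Stand-in actuator: fails once, then succeeds.
outcomes = iter([False, True])
log = []
raised = check_alarm(sample, rule, lambda: next(outcomes), log)
print(raised, log)
```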

  13. Installation and setup (I)
      Lemon installation consists of three steps:
      • Server installation
      • Client installation
      • Web interface installation
      1. Server installation:
         • Install the edg-fabricMonitoring-server rpm (“flat file” server)
         • Configure the receiving port in /etc/edg-fmon-server.conf
         • Start the server daemon
      2. Client installation:
         • Install the edg-fabricMonitoring-agent rpm (comes with default metric configuration)
         • Configure the server and its port in /etc/edg-fmon-agent.conf
         • Start the client daemon on all monitored hosts

  14. Installation and setup (II)
      3. Web interface installation:
         • Install and start apache server (with php) on your server
         • Install the rrdtool and lrf (Lemon RRD Framework) rpms
         • Configure your clusters in the clusters.conf file and start the lemonmrd daemon
         • Drink Champagne… you have Lemon up and running! ;-)
         • You can do all this on your laptop!
      • Possible additional components:
        • Computer center synoptic view through xml file
        • Problem tracking system integration (through php plug-in to your DB/application)
        • Quattor CDB configuration view – through CDB xml profiles
        • Oracle based Repository (for very large installations with high scalability and increased functionality)
        • Other, new components are easy to add
      • View detailed instructions at: http://cern.ch/lemon/doc/installation/installation.html

  15. Summary
      • Lemon serves to provide monitoring information about the farms in Computer Centers (or your laptop).
      • Lemon provides a framework for recovery actions and alarms.
      • Lemon is easy to install (…and it is easy to add your own metrics and visualize them).
      • It is flexible with respect to your needs – you can add clusters and views, and specify your own definitions of virtual and dynamic clusters.
      • It has been a useful tool for general monitoring of performance and also for system administrators in debugging problems.
      • For more information check http://cern.ch/lemon
