
Lemon Web Monitoring



Presentation Transcript


  1. Lemon Web Monitoring Miroslav Šiket CERN IT/FIO http://cern.ch/lemon-status HEPIX, Edinburgh, May 24-28

  2. Outline
     • Concepts
     • Design and architecture
     • Web visualization
     • Deployment
     • Current development

  3. Concepts of Monitoring
     • Monitoring information in computer centers
     • CERN: ~2000 computers and ~70 clusters
     • Huge amount of data: ~150 metrics per host
     • High demand for organizing the information so that it is easily accessible and easy to parse
     • A variety of views for different groups of users: sysadmins, users, managers
     • Lemon tries to do the job by incorporating many relatively new technologies

  4. Monitoring information
     • We generally have three types of data:
        • Performance metrics: CPU usage, load averages, memory use, disk use/performance, sockets, network, …
        • Exceptions: high load, swap use over 90%, service down, …
        • Status information: uptime, boot time, kernel version, …
     • Heartbeat
     • All data is gathered at different frequencies, ranging from every 60 s to once a day or once per boot
     • About 1 GB of data per day
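To make the three data types and their different sampling frequencies concrete, here is a minimal Python sketch of a metric catalogue. The metric names and periods are hypothetical examples, not Lemon's actual metric definitions; the helper only counts how many samples a given set of metrics produces per day.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class MetricKind(Enum):
    """The three broad kinds of monitoring data listed on the slide."""
    PERFORMANCE = "performance"  # CPU, load, memory, disk, sockets, network, ...
    EXCEPTION = "exception"      # high load, swap use over 90%, service down, ...
    STATUS = "status"            # uptime, boot time, kernel version, ...


@dataclass(frozen=True)
class MetricSpec:
    name: str                # hypothetical metric name
    kind: MetricKind
    period_s: Optional[int]  # sampling period in seconds; None = only on boot

    def samples_per_day(self) -> float:
        """Samples one host produces per day; boot-only metrics count as 0 here."""
        if self.period_s is None:
            return 0.0
        return 86_400 / self.period_s


def daily_samples(specs: Iterable[MetricSpec], n_hosts: int) -> float:
    """Total samples per day for n_hosts hosts running the same metric set."""
    return n_hosts * sum(s.samples_per_day() for s in specs)


# Hypothetical metric set spanning the "60 s to 1 day / on boot" range.
SPECS = [
    MetricSpec("cpu_load", MetricKind.PERFORMANCE, 60),
    MetricSpec("swap_over_90pct", MetricKind.EXCEPTION, 300),
    MetricSpec("kernel_version", MetricKind.STATUS, 86_400),
    MetricSpec("boot_time", MetricKind.STATUS, None),
]
```

With these four example metrics, each host produces 1440 + 288 + 1 samples a day, so most of the volume comes from the fast performance metrics.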

  5. Lemon Architecture

  6. Components (I)
     • MSA (Monitoring Sensor Agent) and MS (Monitoring Sensor): the MS measures the data and the MSA transports it to the MR
     • MR (Monitoring Repository), with backends for Oracle, MySQL, flat files, …
     • Correlation Engine: a framework for creating metric correlations
     • Alarm Broker (prototype): a daemon that handles exceptions and the communication between the alarm GUI and the MR
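The division of labour between sensor, agent and repository can be sketched as a simple in-process pipeline. This is a hypothetical Python illustration only: in Lemon the MS, MSA and MR are separate programs, and this sketch does not reproduce their actual protocol or storage backends. Here a sensor is any function returning a value, the agent polls its sensors and ships a batch, and the repository just keeps samples in memory.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

# (host, metric, timestamp, value)
Sample = Tuple[str, str, float, float]


class MonitoringRepository:
    """Stand-in for the MR; a real MR would write to Oracle, MySQL, flat files, ..."""

    def __init__(self) -> None:
        self.samples: List[Sample] = []

    def store(self, batch: List[Sample]) -> None:
        self.samples.extend(batch)


class SensorAgent:
    """Stand-in for the MSA: polls registered sensors and ships samples to the MR."""

    def __init__(self, host: str, repository: MonitoringRepository) -> None:
        self.host = host
        self.repository = repository
        self.sensors: Dict[str, Callable[[], float]] = {}  # metric -> sensor (MS)

    def register(self, metric: str, sensor: Callable[[], float]) -> None:
        self.sensors[metric] = sensor

    def poll_once(self, now: Optional[float] = None) -> int:
        """Read every sensor once, send the batch to the MR, return the batch size."""
        ts = time.time() if now is None else now
        batch = [(self.host, metric, ts, read()) for metric, read in self.sensors.items()]
        self.repository.store(batch)
        return len(batch)
```

For example, an agent on a hypothetical host `host001` with sensors `loadavg` and `swap_used_pct` registered sends two samples to the repository per `poll_once()` call.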

  7. Components (II)
     • Anamon (Analysis of MONitoring information): a Java-based GUI for real-time visualization of metrics
     • SOAP/WSDL: the MR provides a Web services extension for any additional users
     • RRD/Apache/PHP framework for easy access to the pre-processed information
     • CDB (Configuration Database): part of the Quattor framework at CERN; many components access its information

  8. RRD Tool Framework
     • RRD (Round Robin Database)
     • Data is organized in fixed-size time-series files in which older data is consolidated or overwritten as it ages
     • Supported data source types: Gauge, Counter, Derive, Absolute
     • Framework for storing measurement averages, minima, maxima, derivatives, …
     • Provides graphing capabilities
     • Provides simple mathematical operations on stored data
     • Data does not grow in size over time
     • Provides export to XML and flat-file formats
     • Widely used by many applications: MRTG, Ganglia, CDF Farm Control, FBSNG WWW
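The two RRD properties above (four data source types, and files that never grow) can be illustrated with a short Python sketch. It is a simplified model of RRDtool's semantics, not RRDtool's code or API: `to_rate` converts a raw reading into the per-second value stored for each type, and `RoundRobinArchive` keeps a fixed number of consolidated rows, overwriting the oldest. Assumptions: counters wrap at 32 bits only (real RRDtool also checks for a 64-bit wrap), heartbeats and unknown values are ignored, and each row stores average, minimum and maximum together, whereas RRDtool keeps one consolidation function per archive.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def to_rate(ds_type: str, value: float, prev_value: Optional[float], dt: float) -> Optional[float]:
    """Per-second value stored for a reading, by data source type (simplified)."""
    if ds_type == "GAUGE":
        return value                   # stored as measured (e.g. load, temperature)
    if ds_type == "ABSOLUTE":
        return value / dt              # counter that is reset on every read
    if ds_type not in ("COUNTER", "DERIVE"):
        raise ValueError(f"unknown data source type: {ds_type}")
    if prev_value is None:
        return None                    # a rate needs two readings
    delta = value - prev_value
    if ds_type == "COUNTER" and delta < 0:
        delta += 2**32                 # assume a 32-bit counter wrapped around
    return delta / dt                  # DERIVE keeps negative rates as-is


class RoundRobinArchive:
    """Fixed-size archive: once full, each new row overwrites the oldest one."""

    def __init__(self, rows: int, steps_per_row: int) -> None:
        self.rows: Deque[Tuple[float, float, float]] = deque(maxlen=rows)
        self.steps_per_row = steps_per_row  # primary points consolidated per row
        self._pending: List[float] = []

    def add(self, rate: float) -> None:
        self._pending.append(rate)
        if len(self._pending) == self.steps_per_row:
            p = self._pending
            # Simplification: (average, minimum, maximum) kept in one row.
            self.rows.append((sum(p) / len(p), min(p), max(p)))
            self._pending = []
```

Feeding eight values into an archive with 3 rows of 2 steps each produces four rows, but only the newest three are kept, which is why the file size stays constant.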

  9. Framework Architecture
     • The RRD Tool framework is used to store and manipulate the data
     • A daemon retrieves data from the Monitoring Repository at 5-minute intervals
     • The data is pre-processed and the RRD files are updated
     • Apache/PHP and RRD Tool access these files to create per-host and per-cluster statistics
     • Configuration information from CDB is provided alongside
     • JpGraph (PHP) is used to present, in graphical form, MR information that is not available through the RRD framework
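The 5-minute daemon cycle described above can be sketched as follows. This is a hypothetical Python outline, not the actual Lemon daemon: `fetch_latest` stands in for the query against the Monitoring Repository, `cluster_of` stands in for the host-to-cluster mapping that the framework takes from CDB, and `write_rrd` stands in for the RRD update. The pre-processing step shown is one plausible example: reducing one metric's per-host values to per-cluster average, minimum and maximum.

```python
import time
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Tuple

POLL_INTERVAL_S = 300  # the 5-minute retrieval interval from the slide

Stats = Tuple[float, float, float]  # (average, minimum, maximum)


def cluster_summary(latest: Dict[str, float], cluster_of: Dict[str, str]) -> Dict[str, Stats]:
    """Aggregate one metric's latest per-host values into per-cluster statistics."""
    by_cluster: Dict[str, List[float]] = defaultdict(list)
    for host, value in latest.items():
        cluster = cluster_of.get(host)
        if cluster is not None:        # skip hosts with no known cluster
            by_cluster[cluster].append(value)
    return {c: (mean(v), min(v), max(v)) for c, v in by_cluster.items()}


def run_daemon(
    fetch_latest: Callable[[], Dict[str, float]],
    cluster_of: Dict[str, str],
    write_rrd: Callable[[str, Stats], None],
    cycles: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll the repository, pre-process, and update one RRD per cluster."""
    for _ in range(cycles):
        latest = fetch_latest()
        for cluster, stats in cluster_summary(latest, cluster_of).items():
            write_rrd(cluster, stats)
        sleep(POLL_INTERVAL_S)
```

Injecting `sleep` and the three callbacks keeps the loop testable without a real repository or RRD files; a production daemon would run indefinitely rather than for a fixed number of cycles.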

  10. Cluster information

  11. Host information

  12. JpGraph and host reboots

  13. Scalability
     • Scalability is usually an issue for large-scale monitoring frameworks
     • Our framework currently covers ~2000 computers at CERN and is scalable to more than 10,000 computers
     • RRD Tool reduces the need to access the MR (Oracle) directly by providing cached information
     • Our framework supports clusters in the RRD framework and is expandable; it currently uses about 40 of the most common performance metrics
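The point about cached information is a general pattern: repeated reads are served from a local copy, and the backend database is queried only when that copy is stale. In the framework the RRD files play this role; the Python sketch below is a generic time-to-live cache illustrating the same idea, not part of Lemon. The `fetch` function is a hypothetical stand-in for a query against the MR, and the 300 s lifetime is an assumption chosen to match the 5-minute refresh interval.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Serve repeated reads locally; query the backend only when an entry is stale."""

    def __init__(self, fetch: Callable[[Hashable], Any], ttl_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fetch = fetch            # hypothetical query against the repository
        self.ttl_s = ttl_s            # how long a cached entry stays valid
        self.clock = clock
        self.backend_calls = 0        # number of times the backend was actually hit
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_s:
            return entry[1]           # fresh enough: no backend access
        value = self.fetch(key)
        self.backend_calls += 1
        self._entries[key] = (now, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]
    cache = TTLCache(lambda host: f"stats for {host}", ttl_s=300, clock=lambda: fake_now[0])
    cache.get("host001")
    cache.get("host001")              # served from the cache
    fake_now[0] = 301.0
    cache.get("host001")              # entry expired, backend queried again
    print(cache.backend_calls)
```

However many Web page views request the same host statistics within one interval, the backend sees at most one query per key, which is what keeps the load on the repository independent of the number of viewers.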

  14. Issues and future work
     • RRD Tool lacks certain features that we have added on top of it: support for uploading historical data, easy addition and removal of metrics, …
     • Current development:
        • Dynamic configuration of stored data in connection with CDB (Configuration Database)
        • Packaging and providing a site-independent structure
        • Expanding the framework for Web displays: on-demand correlations, manipulation of cluster configuration, …
        • Summary displays for exception metrics
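Uploading historical data is awkward with RRDtool because an RRD only accepts updates whose timestamps are later than its last update. One way to prepare a backfill is sketched below in Python. It is a hypothetical helper, not the feature the authors added: it orders the samples by time and drops any sample that is not strictly newer than the previous one, so the result can be fed to an RRD in a single pass. It assumes the RRD was created with a start time earlier than the oldest sample.

```python
from typing import Iterable, List, Optional, Tuple

Sample = Tuple[int, float]  # (unix timestamp, value)


def prepare_backfill(samples: Iterable[Sample], last_update: Optional[int] = None) -> List[Sample]:
    """Order historical samples so each update is newer than the one before it.

    last_update is the RRD's last update time, or None for a freshly created file.
    Samples at or before that time, and duplicate timestamps, are dropped
    because an RRD would reject them.
    """
    result: List[Sample] = []
    last = last_update
    for ts, value in sorted(samples):
        if last is not None and ts <= last:
            continue                  # not newer than the previous update
        result.append((ts, value))
        last = ts
    return result
```

When two samples share a timestamp, the sort places the smaller value first and only that one is kept; a real importer might instead prefer the most recent measurement, which would require tracking arrival order.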

  15. Conclusion
     • The framework is currently being deployed at CERN
     • It already helps sysadmins, developers, and experiments in data challenges
     • The framework provides an easy overview of the computing capabilities of our computer center
     • It is under active development and is being improved to suit user needs, to centralize information, and to provide more functionality
