Lemon Web Monitoring
Miroslav Šiket, CERN IT/FIO
http://cern.ch/lemon-status
HEPIX, Edinburgh, May 24-28
Outline
• Concepts
• Design and architecture
• Web visualization
• Deployment
• Current development
Concepts of Monitoring
• Monitoring information in computer centers
• CERN: ~2000 computers and ~70 clusters
• Huge amount of data: ~150 metrics per host
• High demand for organizing the information so that it is easy to access and easy to parse
• A variety of views for different groups of users: sysadmins, users, managers
• Lemon tries to meet these needs by combining many relatively new technologies
Monitoring information
• We generally have three types of data:
  • Performance metrics: CPU usage, load averages, memory use, disk use/performance, sockets, network, …
  • Exceptions: high load, swap use over 90%, service down, …
  • Status information:
    • Uptime, boot time, kernel version, …
    • Heartbeat
• All data is gathered at frequencies ranging from every 60 s to once a day or once per boot
• About 1 GB of data a day
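To make the three categories concrete, here is a minimal Python sketch, assuming one simple record per measurement. The names `Sample`, `MetricKind`, `swap_exception` and the metric name `swap_used_pct` are hypothetical illustrations, not Lemon's actual schema. It shows how an exception ("swap use over 90%") can be derived from a performance metric.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MetricKind(Enum):
    PERFORMANCE = "performance"  # e.g. CPU usage, load average, memory use
    EXCEPTION = "exception"      # e.g. high load, swap use over 90%, service down
    STATUS = "status"            # e.g. uptime, boot time, kernel version


@dataclass(frozen=True)
class Sample:
    host: str
    metric: str          # hypothetical metric name, e.g. "swap_used_pct"
    kind: MetricKind
    timestamp: int       # seconds since the epoch
    value: float


SWAP_THRESHOLD_PCT = 90.0  # "swap use over 90%" from the slide


def swap_exception(sample: Sample) -> Optional[Sample]:
    """Derive a 0/1 exception sample from a swap-usage performance sample."""
    if sample.metric != "swap_used_pct" or sample.kind is not MetricKind.PERFORMANCE:
        return None
    active = sample.value > SWAP_THRESHOLD_PCT
    return Sample(host=sample.host, metric="swap_exception", kind=MetricKind.EXCEPTION,
                  timestamp=sample.timestamp, value=1.0 if active else 0.0)
```

For example, `swap_exception(Sample("host01", "swap_used_pct", MetricKind.PERFORMANCE, 1085400000, 93.5))` returns an exception sample with value `1.0`; a reading of 50.0 would give `0.0`.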
Lemon Architecture
Components (I)
• MSA (Monitoring Sensor Agent) and MS (Monitoring Sensor): the MS measures the data and the MSA transports it to the MR
• MR (Monitoring Repository), with Oracle, MySQL, flat-file, … backends
• Correlation Engine: a framework for creating metric correlations
• Alarm Broker (prototype): a daemon that handles exceptions and the communication between the alarm GUI and the MR
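The division of labour between MS, MSA and MR can be sketched as below. This is a hypothetical, in-memory model: `SensorAgent`, `MonitoringRepository` and their methods are illustrative names, sensors are plain Python callables, and the real agents transport data over the network to a database-backed repository.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Measurement = Tuple[str, str, int, float]  # (host, metric, timestamp, value)


class MonitoringRepository:
    """In-memory stand-in for the MR; the real one has Oracle/MySQL/flat-file backends."""

    def __init__(self) -> None:
        self._series: Dict[Tuple[str, str], List[Tuple[int, float]]] = {}

    def store(self, batch: List[Measurement]) -> None:
        for host, metric, ts, value in batch:
            self._series.setdefault((host, metric), []).append((ts, value))

    def latest(self, host: str, metric: str) -> Tuple[int, float]:
        return self._series[(host, metric)][-1]


class SensorAgent:
    """Stand-in for the MSA: runs registered sensors (MS) and ships results to the MR."""

    def __init__(self, host: str, repository: MonitoringRepository) -> None:
        self.host = host
        self.repository = repository
        self._sensors: Dict[str, Callable[[], float]] = {}

    def register(self, metric: str, sensor: Callable[[], float]) -> None:
        self._sensors[metric] = sensor

    def poll_once(self, now: Optional[int] = None) -> None:
        # One timestamp per polling round, one batch per transport call.
        ts = int(time.time()) if now is None else now
        batch = [(self.host, metric, ts, sensor()) for metric, sensor in self._sensors.items()]
        self.repository.store(batch)
```

On a Unix host a sensor could be as simple as `agent.register("loadavg1", lambda: os.getloadavg()[0])`. Keeping sensors as small callables and putting batching and transport in the agent mirrors the MS/MSA split described above.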
Components (II)
• Anamon (Analysis of MONitoring information): a Java-based GUI for real-time visualization of metrics
• SOAP/WSDL: the MR provides a Web services extension for any additional users
• RRD/Apache/PHP framework for easy access to the pre-processed information
• CDB (Configuration Database): part of the Quattor framework at CERN; many components access the configuration information it holds
RRD Tool Framework
• RRD (Round Robin Database)
• Data is organized in time-series files of aging information
• Supported data source types: Gauge, Counter, Derive, Absolute
• Framework for storing measurement averages, minima, maxima, derivatives, …
• Provides graphing capabilities
• Provides simple mathematical operations on stored data
• Data does not grow in size over time
• Provides export to XML and flat-file formats
• Widely used by many applications: MRTG, Ganglia, CDF Farm Control, FBSNG WWW
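The core RRD ideas, fixed-size storage and per-type conversion of raw readings, can be illustrated with the simplified Python sketch below. It is not RRDtool's implementation: real RRDtool also normalizes samples to step boundaries, enforces heartbeats and keeps several consolidated archives per file. `RoundRobinArchive` and `primary_value` are hypothetical names, and the COUNTER branch assumes a 32-bit counter that wrapped at most once.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

RRD_TYPES = ("GAUGE", "COUNTER", "DERIVE", "ABSOLUTE")


class RoundRobinArchive:
    """Fixed number of rows: once full, each new row overwrites the oldest one."""

    def __init__(self, rows: int) -> None:
        self._rows: Deque[float] = deque(maxlen=rows)

    def append(self, value: float) -> None:
        self._rows.append(value)  # deque(maxlen=...) silently drops the oldest row

    def values(self) -> List[float]:
        return list(self._rows)

    def consolidate(self) -> Tuple[float, float, float]:
        """Return (min, max, average) over the rows currently held."""
        rows = list(self._rows)
        return min(rows), max(rows), sum(rows) / len(rows)


def primary_value(ds_type: str, prev: Optional[float], curr: float,
                  interval_s: float) -> Optional[float]:
    """Convert a raw reading into the value stored for one interval."""
    if ds_type not in RRD_TYPES:
        raise ValueError(f"unknown data source type: {ds_type}")
    if ds_type == "GAUGE":
        return curr                      # stored as-is (e.g. load, temperature)
    if ds_type == "ABSOLUTE":
        return curr / interval_s         # counter is reset on every read
    if prev is None:
        return None                      # rate types need two readings
    delta = curr - prev
    if ds_type == "COUNTER" and delta < 0:
        delta += 2 ** 32                 # assume a single 32-bit wrap
    return delta / interval_s            # DERIVE may legitimately be negative
```

Because the archive never holds more than `rows` entries, its size is fixed when it is created, which is why RRD data does not grow over time.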
Framework Architecture
• The RRD Tool framework is used to store and manipulate data
• A daemon retrieves data from the Monitoring Repository at 5-minute intervals
• The data is pre-processed and the RRD files are updated
• Apache/PHP and RRD tools read these files and create statistics per host and per cluster
• In combination with CDB, configuration information is also provided
• JpGraph (PHP) is used to present, in graphical form, information from the MR that is not available through the RRD framework
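Per-cluster statistics of the kind described above come down to grouping per-host values by cluster membership within each 5-minute slot. In this hypothetical sketch, `host_to_cluster` stands in for the membership information that would come from CDB, and the function and statistic names are illustrative; the actual framework computes its statistics with Apache/PHP and RRD tools.

```python
from statistics import mean
from typing import Dict, List, Mapping

STEP_S = 300  # the 5-minute retrieval interval from the slide


def align(ts: int, step: int = STEP_S) -> int:
    """Round a timestamp down to the start of its 5-minute slot."""
    return ts - ts % step


def cluster_summary(host_values: Mapping[str, float],
                    host_to_cluster: Mapping[str, str]) -> Dict[str, Dict[str, float]]:
    """Aggregate one metric's per-host values for a slot into per-cluster statistics.

    Hosts without a cluster entry are skipped.
    """
    grouped: Dict[str, List[float]] = {}
    for host, value in host_values.items():
        cluster = host_to_cluster.get(host)
        if cluster is not None:
            grouped.setdefault(cluster, []).append(value)
    return {
        cluster: {"hosts": len(vals), "min": min(vals), "max": max(vals), "avg": mean(vals)}
        for cluster, vals in grouped.items()
    }
```

Computing these summaries once per slot, rather than on every page view, is what lets the web pages stay cheap to render.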
Cluster information
Host information
JpGraph and host reboots
Scalability
• Scalability is usually an issue for large-scale monitoring frameworks
• Our framework currently covers ~2000 computers at CERN and is scalable to more than 10,000 computers
• RRD Tool reduces the need for direct access to the MR (Oracle) by providing cached information
• The framework supports clusters within the RRD framework and is extensible; it currently uses about 40 of the most common performance metrics
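The scalability argument rests on decoupling page views from repository queries. A minimal sketch of that pattern, assuming a periodic job that refreshes a local snapshot and web requests that only read the snapshot; `PreprocessedStore` and `fetch_all` are hypothetical names, and the real framework keeps the snapshot in RRD files rather than in memory.

```python
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[str, str]  # (host, metric)


class PreprocessedStore:
    """Web requests read a local snapshot; only refresh() queries the repository."""

    def __init__(self, fetch_all: Callable[[], Dict[Key, float]]) -> None:
        self._fetch_all = fetch_all          # hypothetical bulk query against the MR
        self._snapshot: Dict[Key, float] = {}
        self.repository_queries = 0

    def refresh(self) -> None:
        """Run periodically (e.g. every 5 minutes) by a background job."""
        self.repository_queries += 1
        self._snapshot = dict(self._fetch_all())

    def get(self, host: str, metric: str) -> Optional[float]:
        """Serve a page request from the snapshot without touching the MR."""
        return self._snapshot.get((host, metric))
```

With this arrangement, the load on the MR grows with the refresh schedule and the number of metrics, not with the number of users browsing the pages.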
Issues and future work
• RRD Tool lacks certain features that we have added to the framework: support for uploading historical data, easy removal and addition of metrics, …
• Current development:
  • Dynamic configuration of stored data in connection with CDB (Configuration Database)
  • Packaging and a site-independent structure
  • Expanding the framework for Web displays: on-demand correlations, manipulation of cluster configuration, …
  • Summary displays for exception metrics
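Uploading historical data into RRD files is awkward because RRDtool rejects any update whose timestamp is not later than the file's last update. A hedged sketch of the preparation step such a feature needs, using the hypothetical helper `backfill_order`: sort the history, collapse duplicate timestamps, and drop points the file would refuse. It only prepares the data; it does not call RRDtool.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def backfill_order(history: Iterable[Point], last_update: Optional[int] = None) -> List[Point]:
    """Prepare historical points for a store that, like RRDtool, only accepts
    updates with strictly increasing timestamps.

    Points are sorted by time; for duplicate timestamps the last value seen wins,
    and anything at or before last_update is dropped because it would be rejected.
    """
    latest: Dict[int, float] = {}
    for ts, value in history:
        latest[ts] = value          # later duplicates overwrite earlier ones
    ordered = sorted(latest.items())
    if last_update is not None:
        ordered = [(ts, v) for ts, v in ordered if ts > last_update]
    return ordered
```

The target RRD file must also have been created with a start time earlier than the oldest historical point, otherwise those updates are refused as well.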
Conclusion
• The framework is currently being deployed at CERN
• It already helps sysadmins, developers, and experiments in data challenges
• The framework provides an easy overview of the computing capabilities of our computer center
• It is actively developed and is being improved to suit user needs, to provide centralized information, and to add functionality