
Farm monitoring




  1. Farm monitoring Massimo Biasotto - LNL

  2. Local Farm Monitoring
     • LNL experiences with “local” farm monitoring
     • July 2001, we first started with MRTG: a lot of problems
       • heavy footprint on the server
       • unreliable (processes hanging with unreachable hosts)
       • not scalable
     • November 2001, remstats: improvements
       • lighter and more robust than MRTG
       • more flexibility in graph display and alarm management
       • still scalability problems (it works in sequential mode)
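The scalability limit of sequential polling can be illustrated with simple arithmetic. The sketch below is not remstats or MRTG code: the function names, host count, per-host poll time, timeout, and the 5-minute sampling interval are all assumptions chosen only for illustration.

```python
def sequential_cycle_seconds(n_hosts, poll_s, n_unreachable=0, timeout_s=0.0):
    """Duration of one polling cycle when hosts are queried one after another.

    Each reachable host costs poll_s seconds; each unreachable host costs a
    full timeout_s. All values are hypothetical illustration inputs.
    """
    reachable = n_hosts - n_unreachable
    return reachable * poll_s + n_unreachable * timeout_s


def fits_in_interval(cycle_s, interval_s):
    """True if a whole polling cycle completes before the next sample is due."""
    return cycle_s <= interval_s


# Hypothetical farm: 100 nodes, 2 s per poll, 5 nodes down, 30 s timeout.
cycle = sequential_cycle_seconds(100, 2.0, n_unreachable=5, timeout_s=30.0)
ok = fits_in_interval(cycle, interval_s=300)  # assumed 5-minute sampling interval
```

Under these assumed numbers a single cycle takes longer than the sampling interval, so a few unreachable hosts are enough to make a sequential collector fall behind, and the cycle time grows linearly with the number of nodes.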

  3. Remstats example

  4. Remstats example

  5. Remstats vs MRTG

  6. Ganglia
     • March 2001, ganglia: many advantages
       • much greater resolution: metrics sampled every 15 sec instead of 5 min
       • scalability: based on a distributed architecture, with data exchange via multicast channel
       • single host metrics easily integrated to produce “cumulative” overview graphs
     • there is still need to customize the tool (adding more metrics, customizing web pages, etc)
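Two of the points above lend themselves to a small worked example: the gain in sampling resolution, and the way single-host metrics can be combined into a "cumulative" overview. This is a minimal sketch and not Ganglia's own code or API; the function names, host names, and metric names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_s):
    """Number of samples collected per host and metric in one day."""
    return SECONDS_PER_DAY // interval_s


def cluster_overview(host_metrics):
    """Sum every metric across hosts to get one cumulative value per metric.

    host_metrics maps a host name to a dict of {metric name: numeric value}.
    """
    totals = {}
    for metrics in host_metrics.values():
        for name, value in metrics.items():
            totals[name] = totals.get(name, 0) + value
    return totals


# Hypothetical snapshot of two farm nodes.
snapshot = {
    "node01": {"load_one": 1.5, "cpu_num": 2},
    "node02": {"load_one": 0.5, "cpu_num": 2},
}
overview = cluster_overview(snapshot)
```

With a 15-second interval, `samples_per_day(15)` gives 5760 samples against 288 for a 5-minute interval, a factor of 20. Summing is only one possible aggregation; averages or maxima may be more meaningful for some metrics.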

  7. Ganglia example

  8. Ganglia example

  9. Ganglia example

  10. Netsaint
      • During our survey of the existing monitoring tools, Netsaint was considered and discarded
        • Main reason: it didn’t monitor host performance metrics, like % cpu, load, network traffic, etc. (at least, not without heavy customization). Maybe now the necessary plugins have been added.
        • It didn’t have a database to record the historical data
      • It monitors the status of the hosts (up or down) and of some network services
      • It provides a log of all relevant events (hosts/services going up or down, etc.)
      • Probably other features, but I’ve never investigated the tool deeply
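The difference between status monitoring and performance monitoring can be made concrete with a short sketch of an event log built from up/down state changes. This is not Netsaint code and does not use its configuration or plugin interface; the function name, host names, and status strings are hypothetical.

```python
def status_events(previous, current):
    """Return one log line for every host whose status changed.

    previous and current map a host (or service) name to a status string
    such as "UP" or "DOWN". Hosts seen for the first time produce no event.
    """
    events = []
    for host in sorted(current):
        old = previous.get(host)
        new = current[host]
        if old is not None and old != new:
            events.append(f"{host}: {old} -> {new}")
    return events


# Two hypothetical consecutive checks of the same nodes.
before = {"node01": "UP", "node02": "UP"}
after = {"node01": "UP", "node02": "DOWN"}
log = status_events(before, after)
```

A monitor of this kind records only transitions, which is why it needs no time-series database, and also why it cannot answer questions about CPU, load, or network traffic trends.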

  11. Grid monitoring
      • Grid monitoring is different from “local” farm monitoring
        • you cannot monitor on a WAN all the performance metrics of all the farm nodes (and you probably don’t want to)
      • Currently, Netsaint is used on the DataGrid Testbed to monitor the status of the testbed nodes and their grid services
        • http://infngrid.ct.infn.it/index-orig.html (infn-tb/guest)
      • Is this useful for CMS?
      • Can other useful features be added?

  12. Netsaint example

  13. Adapting CMS monitoring to Grid
      • What are the CMS requirements for “Grid monitoring”?
        • What do we want to monitor and why?
      • Once these questions have been addressed, we can decide if Netsaint fulfills the requirements
      • Integrating Netsaint into existing CMS farms shouldn’t be difficult
        • the main issue is probably the setup (and maintenance) of the central repository
      • But it should be done only if there is a real need, not just for the sake of it
