Farm monitoring
Massimo Biasotto - LNL
Local Farm Monitoring
• LNL experiences with “local” farm monitoring
• July 2001, we first started with MRTG: a lot of problems
  • heavy footprint on the server
  • unreliable (processes hanging with unreachable hosts)
  • not scalable
• November 2001, remstats: improvements
  • lighter and more robust than MRTG
  • more flexibility in graph display and alarm management
  • still scalability problems (it works in sequential mode)
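The two problems noted above (a poller that hangs on unreachable hosts, and one that visits hosts one after another) can be illustrated with a small sketch. It does not reproduce how MRTG or remstats work internally; `poll_all`, `probe`, and the default values are hypothetical names chosen for illustration. Hosts are probed concurrently and the whole cycle has a time budget, so one dead host cannot stall the others.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def poll_all(hosts, probe, timeout=2.0, workers=16):
    """Probe every host concurrently and return {host: value or None}.

    `probe` is a hypothetical callable that takes a host name and returns
    a metric value. `timeout` is the budget in seconds for the whole
    polling cycle; hosts that fail or miss the deadline map to None.
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=workers)
    try:
        futures = {host: pool.submit(probe, host) for host in hosts}
        deadline = time.monotonic() + timeout
        for host, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[host] = future.result(timeout=remaining)
            except Exception:  # deadline missed, or the probe itself failed
                results[host] = None
    finally:
        # Do not wait for stragglers. Their worker threads keep running
        # until the probe returns, so a real probe should also set its
        # own network timeout.
        pool.shutdown(wait=False)
    return results
```

With a sequential loop the cycle time grows with the number of hosts and with every unreachable one; here it is bounded by `timeout` regardless of farm size, as long as there are enough workers.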
Ganglia
• March 2002, ganglia: many advantages
  • much greater resolution: metrics sampled every 15 sec instead of every 5 min
  • scalability: based on a distributed architecture, with data exchange via a multicast channel
  • single-host metrics easily integrated to produce “cumulative” overview graphs
• the tool still needs customization (adding more metrics, customizing web pages, etc.)
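To make the “cumulative” overview idea concrete, here is a minimal sketch of combining per-host samples into farm-wide values. It does not use ganglia's API or data format; the `samples` layout, the metric names, and `farm_overview` are hypothetical. The arithmetic at the end works out the resolution gain quoted above.

```python
def farm_overview(samples):
    """Combine per-host samples into one farm-wide summary per metric.

    `samples` maps host name -> {metric name: value} (hypothetical layout).
    Returns metric name -> {"sum": ..., "hosts": ..., "mean": ...}.
    """
    overview = {}
    for host, metrics in samples.items():
        for name, value in metrics.items():
            entry = overview.setdefault(name, {"sum": 0.0, "hosts": 0})
            entry["sum"] += value
            entry["hosts"] += 1
    for entry in overview.values():
        entry["mean"] = entry["sum"] / entry["hosts"]
    return overview


# Made-up sample values for two nodes.
samples = {
    "node01": {"load_one": 0.5, "net_in_kbps": 100.0},
    "node02": {"load_one": 1.5, "net_in_kbps": 300.0},
}
overview = farm_overview(samples)

# Sampling every 15 s instead of every 5 min (300 s) gives
# 300 / 15 = 20 times as many data points per metric.
resolution_gain = 300 // 15
```

Sums suit additive metrics such as network traffic, while means suit per-node metrics such as load; keeping both lets the overview graph pick whichever applies.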
Netsaint
• During our survey of the existing monitoring tools, Netsaint was considered and discarded
  • Main reason: it didn’t monitor host performance metrics such as % CPU, load, network traffic, etc. (at least, not without heavy customization). The necessary plugins may have been added since.
  • It didn’t have a database to record historical data
• It monitors the status of the hosts (up or down) and of some network services
• It provides a log of all relevant events (hosts/services going up or down, etc.)
• It probably has other features, but I’ve never investigated the tool deeply
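The status-plus-event-log model described above can be sketched in a few lines. This is not Netsaint's plugin interface or log format; `check_tcp`, `log_transitions`, and the event strings are hypothetical. A service counts as up if a TCP connection succeeds within a timeout, and an event is recorded only when a state changes.

```python
import socket


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False


def log_transitions(previous, current):
    """Compare two {service: is_up} snapshots and return event strings
    for services whose state changed or that are seen for the first time."""
    events = []
    for service, is_up in sorted(current.items()):
        if previous.get(service) != is_up:
            events.append(f"{service} is {'UP' if is_up else 'DOWN'}")
    return events
```

A monitoring loop would call `check_tcp` for each service, build the `current` snapshot, and append the result of `log_transitions` to the event log. Note that this records only state, not performance history, which is the limitation raised above.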
Grid monitoring
• Grid monitoring is different from “local” farm monitoring
  • you cannot monitor over a WAN all the performance metrics of all the farm nodes (and you probably don’t want to)
• Currently, Netsaint is used on the DataGrid Testbed to monitor the status of the testbed nodes and their grid services
  • http://infngrid.ct.infn.it/index-orig.html (infn-tb/guest)
• Is this useful for CMS?
• Can other useful features be added?
Adapting CMS monitoring to Grid
• What are the CMS requirements for “Grid monitoring”?
  • What do we want to monitor and why?
• Once these questions have been addressed, we can decide whether Netsaint fulfills the requirements
• Integrating Netsaint into existing CMS farms shouldn’t be difficult
  • the main issue is probably the setup (and maintenance) of the central repository
• But it should be done only if there is a real need, not just for the sake of it