CrossGrid Task 3.3 Grid Monitoring
Trinity College Dublin (TCD, AC14 – CR11): Brian Coghlan, Stuart Kenny, David O’Callaghan
CYFRONET Academic Computer Center, Krakow (CYFRO, C01): Bartosz Balis, Slawomir Zielinski, Kazimierz Balos
ICM, University of Warsaw (ICM, AC2 – C01): Krzysztof Nawrocki, Adam Padee
CrossGrid Task 3.3 Grid Monitoring
Provides monitoring information from four main sources:
- Applications (OCM-G)
  - gathers performance data from an executing application
  - used by application developers to understand an application’s behavior and improve its performance
- Infrastructure (JIMS)
  - gathers and exposes information about the state of the devices used to build a grid environment
  - notifies the user not only about simple events, but about derived ones as well
  - takes managerial actions in cases of failure
- Instruments/Networks (SANTA-G)
  - allows information captured by external monitoring instruments to be introduced into the Grid information system
  - used in validation and calibration of both intrusive monitoring systems and systemic models, and also for performance analysis
CrossGrid Task 3.3 Grid Monitoring
- Derived Results
  - gathers information from other monitoring tools and presents it through one consistent user interface
  - generates forecasts of future grid state using Kalman filters and neural networks
Grid Monitoring System
[Diagram labels: Infrastructure monitoring; R-GMA/OGSA info; Application monitoring]
Task 3.3.1 OCM-G, Current State
• OCM-G integrated with GT.
• Secure communication between components based on globus_io (authentication, possibly encryption).
• Service Managers run on a "well-known" port (3331, configurable).
• Configuration via local config files (user home dir or /opt/cg/etc).
• A shared file system is no longer needed.
• Still one central Service Manager
  • can handle multi-site applications unless firewalls block communication
• Registration of application processes improved
  • locks remove the race condition while forking LMs
• Support for user-defined events (probes) added.
• CVS status:
  • code up to date
  • building with autobuild, on RH6.2
  • changes still needed to comply with the developers guide
Task 3.3.1 OCM-G, Task Contacts
• Task 2.4
  - G-PM fully integrated with OCM-G in its current functionality.
  - G-PM now needs a user certificate to connect to the OCM-G.
Task 3.3.1 OCM-G, Integration
• Smooth integration with G-PM.
• Communication based on globus_io.
• No dependencies on other Globus/EDG components.
Task 3.3.1 OCM-G, Problems and Issues
• Building under RH7.3: problems with the globus_io development package.
• The interface to Grid Benchmarks still has to be defined.
Task 3.3.2 SANTA-G, Current State
• Improve the schema of information available:
  - Done, still more to do
• Add more SQL parsing support:
  - Done, added more WHERE predicates
  - Supports =, >, < queries
• Add on-line data acquisition:
  - Sensor now starts/stops tcpdump at startup/shutdown
  - Allows querying of dynamically generated network traffic
• Integrate Sensor and QueryEngine components:
  - Sensor now contacts the QueryEngine at startup
  - Sensor informs the QueryEngine when a new log file is generated, and when it shuts down
• Enhance Viewer functionality:
  - Improved Viewer GUI
  - Graphical packet display, timestamps shown in the correct format, IP addresses resolved automatically…
  - Query Builder added to allow the user to construct complex queries
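To make the "=, >, <" WHERE support concrete, here is a minimal sketch of how a single predicate could be parsed and applied to captured packet records. It is an illustration only, not SANTA-G code: the column names (`source_ip`, `packet_len`), the `select` helper and the sample records are all hypothetical, and the real QueryEngine works against the R-GMA schema rather than Python dictionaries.

```python
import operator
import re

# The three comparison operators the slide lists as supported.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

# column, operator, literal (hypothetical mini-grammar for one predicate)
PREDICATE = re.compile(r"^\s*(\w+)\s*(=|>|<)\s*(.+?)\s*$")


def parse_predicate(text):
    """Split 'column op literal' into (column, comparison function, value)."""
    match = PREDICATE.match(text)
    if match is None:
        raise ValueError("unsupported predicate: %r" % text)
    column, op, raw = match.groups()
    if len(raw) >= 2 and raw.startswith("'") and raw.endswith("'"):
        value = raw[1:-1]   # quoted string literal
    else:
        value = int(raw)    # anything else must be an integer literal
    return column, OPS[op], value


def select(rows, where):
    """Return the rows for which the single WHERE predicate holds."""
    column, compare, value = parse_predicate(where)
    return [row for row in rows if compare(row[column], value)]


# Made-up packet records standing in for a tcpdump log.
PACKETS = [
    {"source_ip": "10.0.0.1", "packet_len": 60},
    {"source_ip": "10.0.0.2", "packet_len": 1500},
]

large = select(PACKETS, "packet_len > 100")
from_first_host = select(PACKETS, "source_ip = '10.0.0.1'")
```

Mapping each operator to a comparison function keeps the evaluator small, and any operator outside the three listed ones is rejected instead of being silently misread.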
Task 3.3.2 SANTA-G, Task Contacts
• EDG WP3
  - SANTA-G makes use of the EDG R-GMA.
  - It has also contributed to it: the CanonicalProducer is an extension to the EDG R-GMA developed as part of Task 3.3.2.
• Task 3.3.3 JIMS
  - Integration with this task has begun.
  - Work should be completed by the end of the summer (see next slide).
Task 3.3.2 SANTA-G, Integration
[Diagram labels: JMX Client; JMX Request; SANTA-G (R-GMA Producer); R-GMA SQL; R-GMA Consumer API; JMX ResultSet]
Task 3.3.2 SANTA-G, Integration
[Diagram labels: R-GMA Producer Code; R-GMA Producer API; JIMS (MBean Server); JMX Request; R-GMA SQL; JMX ResultSet]
Task 3.3.2 SANTA-G, Problems and Issues
• Need the most recent EDG R-GMA RPMs
  - CanonicalProducer is not in the earlier release!
• R-GMA RPMs are available for Red Hat 7.3 only!
• Still to do:
  - Expand schema of available information
  - Improve SQL support
  - Complete SANTA-G/JMX integration
  - Testing
  - Investigate security
Task 3.3.3 JIMS, Current State
• JIRO-based Infrastructure Monitoring System – JIMS
  - ported from JDMK to the pure JMX reference implementation
  • host monitoring module: ready
  • SNMP: in progress
• SOAP Gateway for integration with other CG tasks
  • exposes a Web Services based interface
  • makes integration with OGSA (Open Grid Services Architecture) easier:
    • Web Services Gateway module
    • simple SOAP client for testing purposes
JIMS, SOAP Gateway Facilities
• The Web Services Gateway serves as a mediator between the MBean Servers in monitored stations and external applications.
• It is the place where active monitored stations are registered and non-existent ones are removed.
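The registration role of the gateway can be sketched as a small in-memory registry. The slides do not say how non-existent stations are detected, so this sketch assumes that stations re-register periodically and are dropped after a timeout. The class name, method names and timeout value are hypothetical, and nothing here reflects the actual JIMS gateway code.

```python
import time


class StationRegistry:
    """Tracks monitored stations; drops those not heard from recently (assumed policy)."""

    def __init__(self, timeout=60.0):
        self.timeout = timeout   # seconds of silence before a station is dropped
        self._last_seen = {}     # station name -> time of last registration

    def register(self, station, now=None):
        """Record (or refresh) a station; 'now' can be injected for testing."""
        self._last_seen[station] = time.time() if now is None else now

    def unregister(self, station):
        """Explicit removal, e.g. on a clean shutdown of the station."""
        self._last_seen.pop(station, None)

    def prune(self, now=None):
        """Remove and return stations whose last registration is too old."""
        now = time.time() if now is None else now
        stale = [name for name, seen in self._last_seen.items()
                 if now - seen > self.timeout]
        for name in stale:
            del self._last_seen[name]
        return stale

    def active(self):
        """Names of the stations currently considered alive, sorted."""
        return sorted(self._last_seen)


registry = StationRegistry(timeout=10.0)
registry.register("node-a", now=0.0)
registry.register("node-b", now=8.0)
removed = registry.prune(now=15.0)   # node-a has been silent for 15 s
```

Passing the clock in explicitly keeps the pruning logic deterministic under test. A real gateway would call `prune` from a periodic task.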
Task 3.3.3 JIMS, Problems and Issues
What is done:
• Host monitoring system (JIMS): ready
• SOAP Gateway: completed before the deadline
• Open (non-commercial) implementation of discovery services: completed before the deadline
To do:
• Integration with CVS and the autobuild process, by the end of this week
• Simplifying the installation process
• Adding functionality:
  • other mechanisms for unregistering monitored stations
  • security when connecting modules via Web Services (SOAP/XML)
Task 3.3.4 PostProcessing, Current State
• Forecaster based on a linear Kalman filter is implemented and available as an RPM.
• More work is needed to put it in CVS; this will be done during the meeting.
• The current source of real monitoring data from clusters is VO-centric Ganglia.
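For readers unfamiliar with the technique, the following is a minimal sketch of a scalar linear Kalman filter used as a one-step-ahead forecaster. It assumes a random-walk state model and made-up noise variances. The class and function names are hypothetical, and this is not the Task 3.3.4 forecaster, whose state model is not described in these slides.

```python
class ScalarKalman:
    """One-dimensional linear Kalman filter with a random-walk state model."""

    def __init__(self, x0, p0, q, r):
        self.x = x0   # current state estimate (e.g. CPU load)
        self.p = p0   # variance of the estimate
        self.q = q    # process noise variance (assumed value)
        self.r = r    # measurement noise variance (assumed value)

    def predict(self):
        """Time update: the state is expected to persist, uncertainty grows."""
        self.p += self.q
        return self.x

    def update(self, z):
        """Measurement update: blend the prediction with the observation z."""
        gain = self.p / (self.p + self.r)   # Kalman gain
        self.x += gain * (z - self.x)
        self.p *= (1.0 - gain)
        return self.x


def forecast_next(samples, q=0.01, r=0.25):
    """Filter a non-empty series and return the one-step-ahead prediction."""
    kf = ScalarKalman(x0=samples[0], p0=1.0, q=q, r=r)
    for z in samples[1:]:
        kf.predict()
        kf.update(z)
    return kf.predict()


# Made-up load samples standing in for Ganglia metrics.
next_load = forecast_next([0.40, 0.42, 0.47, 0.45, 0.50])
```

With a random-walk model the forecast is the filtered estimate of the latest value. A larger `r` makes the filter smooth more heavily, while a larger `q` makes it follow recent samples more closely.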
Task 3.3.4 PostProcessing, Integration
• For the integration meeting we will provide 2 RPMs:
  • ganglia-monitor-core-mcastmin-2.4.1-1.i386.rpm
    • serves as the monitoring daemon on worker nodes
  • gmmetad-2.2-1.i386.rpm
    • located on the cluster CE; gathers information from the monitoring daemons and passes it to the central monitoring host
    • RPM slightly altered with respect to the original
• Would like to install these on X# clusters for testing during the integration meeting.
Task 3.3.4 PostProcessing, Problems and Issues
• The third component, which binds the forecaster to the data sources, is under development.
• It will not be ready for the integration meeting.