Grid Monitoring
A conceptual introduction to GridICE
Sergio Andreozzi
sergio.andreozzi@cnaf.infn.it
DataTAG WP4
OUTLINE
• Definition of Grid Monitoring Service
• GridICE:
  • Architecture overview
  • Data flow: from resources to users
• Current deployment in the DataTAG/Glue Testbed
Defining Grid Monitoring Service
• Grid Monitoring Service:
  • the activity of measuring significant parameters related to grid resources
  • in order to:
    • analyze usage, behavior and performance of the grid
    • detect and notify:
      • fault situations
      • contract violations (SLA)
      • user-defined events
• An essential part of any Grid Management activity
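The definition pairs measurement with detection. A minimal sketch of the detection half, assuming one measurement sample is a flat dictionary of parameter values; the parameter names, thresholds and the three rules are hypothetical and only mirror the three event classes listed above (faults, SLA violations, user-defined events).

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    kind: str  # "fault", "sla" or "user": the three event classes on the slide
    fires: Callable[[dict], bool]  # True when the sample triggers the event


def detect(sample: dict, rules: list[Rule]) -> list[tuple[str, str]]:
    """Return (kind, name) for every rule that fires on one measurement sample."""
    return [(rule.kind, rule.name) for rule in rules if rule.fires(sample)]


# Hypothetical rules over hypothetical parameter names.
rules = [
    Rule("host unreachable", "fault", lambda s: not s["reachable"]),
    Rule("fewer than 10 free CPUs (contracted minimum)", "sla", lambda s: s["free_cpus"] < 10),
    Rule("1-minute load above 4", "user", lambda s: s["load1"] > 4.0),
]

sample = {"reachable": True, "free_cpus": 6, "load1": 5.2}
print(detect(sample, rules))
```

Keeping the event class on the rule lets the notification side route faults, SLA violations and user-defined events differently without re-inspecting the sample.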
Components for a Grid Monitoring Service
• Measurement Service:
  • service able to probe the resources for certain parameters (especially QoS-related ones)
• Discovery Service:
  • service able to find out which resources are currently available
  • relies on the Grid Information Service
• Detection and Notification Service:
  • fault situations, SLA violations, user-defined events
• Data Analyzer:
  • performance, usage, general reports and statistics
• Presentation Service:
  • web-based graphical user interface
  • role-based view
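One way to see how the components fit together is a single monitoring cycle: discovery yields resources, measurement probes them, detection turns observations into notifications. The sketch below assumes hypothetical interfaces (`available_resources`, `probe`, `check`) and stand-in implementations; it is not GridICE code. The Data Analyzer and Presentation Service would consume the returned observations.

```python
from typing import Protocol


class DiscoveryService(Protocol):
    def available_resources(self) -> list[str]: ...


class MeasurementService(Protocol):
    def probe(self, resource: str) -> dict[str, float]: ...


class DetectionService(Protocol):
    def check(self, resource: str, params: dict[str, float]) -> list[str]: ...


def monitoring_cycle(discovery: DiscoveryService,
                     measurement: MeasurementService,
                     detection: DetectionService):
    """One cycle: discover resources, probe each one, collect notifications."""
    observations: dict[str, dict[str, float]] = {}
    notifications: list[str] = []
    for resource in discovery.available_resources():
        params = measurement.probe(resource)
        observations[resource] = params
        notifications.extend(detection.check(resource, params))
    return observations, notifications


# Hypothetical stand-ins for the real services.
class StaticDiscovery:
    def available_resources(self) -> list[str]:
        return ["ce01.example.org", "se01.example.org"]


class FixedMeasurement:
    def probe(self, resource: str) -> dict[str, float]:
        return {"load1": 0.5 if resource.startswith("ce") else 7.0}


class HighLoadDetection:
    def check(self, resource: str, params: dict[str, float]) -> list[str]:
        return [f"{resource}: high load"] if params["load1"] > 4.0 else []


observations, notifications = monitoring_cycle(
    StaticDiscovery(), FixedMeasurement(), HighLoadDetection())
print(notifications)
```

Structural `Protocol` types keep each component replaceable, for example swapping the static discovery for one backed by the Grid Information Service.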
Measurement Service
• Service able to probe the resources for certain parameters
• Parameters:
  • based on the Glue Schema (version 1.1)
  • a richer set of host-related parameters
  • coming soon:
    • Glue Network Service, Job Details Monitoring, Collective Services
• Collecting observations:
  • worker-node-related: relies on a customized EDG WP4 fmon to collect worker node parameters at the cluster head node
• Injecting parameters into the GIS:
  • standard EDG4Glue plus extensions for worker node information
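A minimal sketch of the sensor and publishing steps, assuming a worker node's load and memory are read from the Linux /proc filesystem and rendered as an LDIF entry for the GIS. The fmon transport from worker nodes to the head node is not shown. The DN and the `WorkerNode...` attribute names are hypothetical placeholders, not the actual EDG4Glue extension attributes.

```python
def parse_loadavg(text: str) -> tuple[float, float, float]:
    """The first three fields of /proc/loadavg are the 1-, 5- and 15-minute load averages."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for the memory lines)."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def worker_node_ldif(host: str, loadavg: str, meminfo: str) -> str:
    """Render one worker node's observations as an LDIF entry.
    The DN and attribute names are hypothetical placeholders."""
    l1, l5, l15 = parse_loadavg(loadavg)
    mem = parse_meminfo(meminfo)
    lines = [
        f"dn: WorkerNodeName={host},mds-vo-name=local,o=grid",
        f"WorkerNodeLoadLast1Min: {l1}",
        f"WorkerNodeLoadLast5Min: {l5}",
        f"WorkerNodeLoadLast15Min: {l15}",
        f"WorkerNodeRAMSizeMB: {mem['MemTotal'] // 1024}",
        f"WorkerNodeRAMFreeMB: {mem['MemFree'] // 1024}",
    ]
    return "\n".join(lines) + "\n"


# Sample contents in the /proc formats; on a Linux node these would come from
# open("/proc/loadavg").read() and open("/proc/meminfo").read().
loadavg = "0.42 0.30 0.25 1/123 4567\n"
meminfo = "MemTotal:        2048000 kB\nMemFree:          512000 kB\nHugePages_Total:       0\n"
print(worker_node_ldif("wn01.example.org", loadavg, meminfo))
```

Parsing the text separately from reading the files keeps the logic testable on any machine.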
Discovery Service
• Service able to find out which resources are currently available:
  • relies on the available Grid Information Service to discover resources automatically
  • at the moment, MDS 2.x is supported
  • porting to the LCG-1/EDG 2.0 Grid Information Service is straightforward
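A sketch of automatic discovery, assuming the two-phase pattern of the example scenario: the index (GIIS) yields the registered GRIS endpoints, and each GRIS is then queried for its entries. LDAP is replaced by in-memory dictionaries so the example runs without an MDS 2.x server. Host names are placeholders, and port 2135 is used only because it is the conventional MDS 2.x port.

```python
# Phase 1 data (hypothetical): GRIS endpoints registered in the GIIS index.
GIIS_REGISTRATIONS = [
    "ldap://ce01.example.org:2135",
    "ldap://se01.example.org:2135",
    "ldap://ce02.example.org:2135",  # registered, but currently not answering
]

# Phase 2 data (hypothetical): what each reachable GRIS returns for a query.
GRIS_CONTENT = {
    "ldap://ce01.example.org:2135": [
        {"objectClass": ["GlueCE"], "GlueCEUniqueID": "ce01.example.org:2119/jobmanager-pbs-short"},
    ],
    "ldap://se01.example.org:2135": [
        {"objectClass": ["GlueSE"], "GlueSEUniqueID": "se01.example.org"},
    ],
}


def query_gris(url: str):
    """Stand-in for an LDAP search against one GRIS; None means no answer."""
    return GRIS_CONTENT.get(url)


def discover(registrations: list[str], object_class: str) -> list[dict]:
    """Return entries of the given object class from every GRIS that answers."""
    available = []
    for url in registrations:          # phase 1: endpoints known to the index
        entries = query_gris(url)      # phase 2: ask each GRIS directly
        if entries is None:
            continue                   # silent GRIS: not currently available
        available.extend(e for e in entries if object_class in e["objectClass"])
    return available


print([e["GlueCEUniqueID"] for e in discover(GIIS_REGISTRATIONS, "GlueCE")])
```

Skipping endpoints that do not answer is what turns "registered" into "currently available".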
Detection and Notification Service
• Relies on the following Nagios functionalities:
  • activity scheduler
  • event notification
• At the moment, a predefined set of events is checked for notification
• LCG should promptly define the set of events of interest
• Dynamic event configuration is foreseen as a low-priority development task
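Nagios's scheduler runs each check as an external command that prints one status line (optionally followed by `|` and performance data) and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN); notifications are driven by those states. A minimal sketch of such a check, assuming a hypothetical "free CPUs" event with hypothetical thresholds; the slides do not list which events are actually in the predefined set.

```python
import sys

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # Nagios plugin return codes
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_free_cpus(free, warn, crit):
    """Lower is worse: CRITICAL at or below crit, WARNING at or below warn."""
    if free is None:
        return UNKNOWN, "free CPU count not available"
    if free <= crit:
        state = CRITICAL
    elif free <= warn:
        state = WARNING
    else:
        state = OK
    # Text after "|" is performance data: label=value;warn;crit
    return state, f"{free} free CPUs|free_cpus={free};{warn};{crit}"


def main(argv):
    # Hypothetical invocation: check_free_cpus.py <free> <warn> <crit>
    try:
        free, warn, crit = (int(arg) for arg in argv[1:4])
    except ValueError:
        print("FREE CPUS UNKNOWN - usage: <free> <warn> <crit>")
        return UNKNOWN
    state, message = check_free_cpus(free, warn, crit)
    print(f"FREE CPUS {LABELS[state]} - {message}")
    return state


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the state is carried by the exit code, the same script works unchanged whether it is run by hand or by the Nagios scheduler.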
Presentation Service
• Web-based graphical user interface
• Role-based view:
  • VO manager:
    • resources available to the VO
    • total running jobs owned by users belonging to the VO
  • Site manager:
    • status of local resources
  • Grid manager (e.g. operator of the LCG GOC):
    • status of all general services
  • User:
    • total accessible free processors
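The role-based views are different filters and aggregations over the same monitoring data. A minimal sketch, assuming a hypothetical per-CE record with its site, allowed VOs, free CPUs and running jobs per VO; site names, VO names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class ComputingElement:
    name: str
    site: str
    vos: frozenset[str]           # VOs allowed to use this CE
    free_cpus: int
    running_jobs: dict[str, int]  # running jobs per VO


def vo_manager_view(ces, vo):
    """Resources available to the VO and total running jobs of its users."""
    mine = [ce for ce in ces if vo in ce.vos]
    return {"resources": [ce.name for ce in mine],
            "running_jobs": sum(ce.running_jobs.get(vo, 0) for ce in mine)}


def site_manager_view(ces, site):
    """Status (here: free CPUs) of the resources at one site."""
    return {ce.name: ce.free_cpus for ce in ces if ce.site == site}


def user_view(ces, user_vos):
    """Total accessible free processors: free CPUs on CEs open to any of the user's VOs."""
    return sum(ce.free_cpus for ce in ces if ce.vos & user_vos)


ces = [
    ComputingElement("ce01.site-a.example.org", "SiteA", frozenset({"atlas", "cms"}), 12, {"atlas": 30, "cms": 5}),
    ComputingElement("ce02.site-b.example.org", "SiteB", frozenset({"cms"}), 4, {"cms": 40}),
    ComputingElement("ce03.site-b.example.org", "SiteB", frozenset({"alice"}), 8, {"alice": 2}),
]

print(vo_manager_view(ces, "cms"))
print(site_manager_view(ces, "SiteB"))
print(user_view(ces, {"cms"}))
```

The grid manager view is omitted because it needs status data for general services, which this record does not model.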
Data flow from resources to user
[Figure: layered data flow diagram, from resources up to the user: Measurement Service; Grid Information Service; Data Collection (historical), Detection & Notification and Data Analyzer; Presentation Service]
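The historical Data Collection feeds the Data Analyzer, which produces usage reports and statistics. A minimal sketch of one such report, assuming hypothetical samples of (timestamp, resource, busy CPUs, total CPUs); the field layout and the 300-second sampling interval are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def usage_report(history):
    """history: iterable of (timestamp, resource, busy_cpus, total_cpus) samples.
    Returns {resource: (mean utilization, peak utilization)} as fractions."""
    per_resource = defaultdict(list)
    for _timestamp, resource, busy, total in history:
        if total > 0:  # skip samples from resources reporting no CPUs
            per_resource[resource].append(busy / total)
    return {res: (mean(u), max(u)) for res, u in per_resource.items()}


# Hypothetical samples taken every 300 s.
history = [
    (0, "ce01", 10, 20),
    (300, "ce01", 20, 20),
    (600, "ce01", 15, 20),
    (0, "ce02", 2, 8),
    (300, "ce02", 4, 8),
]
print(usage_report(history))
```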
An example scenario: cluster
[Figure: on each cluster worker node, WP4 sensors read the /proc filesystem and an EDG-WP4 monitoring agent runs them and reads their metric output; the agents report to the EDG-WP4 fmonserver on the cluster head node, which writes the farm monitoring archive. Information providers read the archive and produce LDIF output for the GRIS (GLUE schema). A monitoring server queries the GIIS (GLUE schema) information index via LDAP (first discovery phase), then the GRIS (second discovery phase), and writes the results into the Central Monitoring Database, which the web interface reads.]
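In this scenario the monitoring server turns LDAP query results into rows of the Central Monitoring Database. A minimal sketch of that step, assuming the query result is already available as LDIF text and using an in-memory SQLite database. The single attribute/value `observation` table is a hypothetical layout, not GridICE's actual schema. `GlueCEStateFreeCPUs` and `GlueCEStateRunningJobs` are GLUE CE state attributes; host names and values are made up.

```python
import sqlite3


def parse_ldif(text):
    """Minimal LDIF reader: blank-line separated records of 'attr: value' lines.
    Ignores comments; does not handle line continuations or base64 ('::') values."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue
        attr, _, value = line.partition(": ")
        current.setdefault(attr, []).append(value)  # attributes may be multi-valued
    if current:
        records.append(current)
    return records


def store(db, records, collected_at):
    """Write one row per (entry, attribute, value); returns the number of rows."""
    db.execute("CREATE TABLE IF NOT EXISTS observation "
               "(collected_at INTEGER, dn TEXT, attr TEXT, value TEXT)")
    rows = [(collected_at, rec["dn"][0], attr, value)
            for rec in records
            for attr, values in rec.items() if attr != "dn"
            for value in values]
    db.executemany("INSERT INTO observation VALUES (?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)


ldif = """dn: GlueCEUniqueID=ce01.example.org:2119/jobmanager-pbs-short,mds-vo-name=local,o=grid
GlueCEStateFreeCPUs: 12
GlueCEStateRunningJobs: 30

dn: GlueCEUniqueID=ce02.example.org:2119/jobmanager-pbs-long,mds-vo-name=local,o=grid
GlueCEStateFreeCPUs: 4
GlueCEStateRunningJobs: 40
"""

db = sqlite3.connect(":memory:")
inserted = store(db, parse_ldif(ldif), collected_at=1060000000)
free_cpus = db.execute("SELECT SUM(CAST(value AS INTEGER)) FROM observation "
                       "WHERE attr = 'GlueCEStateFreeCPUs'").fetchone()[0]
print(inserted, free_cpus)
```

In practice the LDIF would come from an LDAP search against the GIIS or a GRIS, typically with OpenLDAP's `ldapsearch -x -H ldap://<host>:2135 -b "mds-vo-name=local,o=grid"`, where `<host>` is a placeholder.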