Regional Grid Monitoring Introduction & database components. Wojciech Lapka SAM Team CERN EGEE’09 Conference, 21 - 25 September 2009, Barcelona. Outline. Introduction to the new Service Availability Monitoring System Description of the Database Components
Regional Grid Monitoring: Introduction & database components Wojciech Lapka SAM Team CERN EGEE’09 Conference, 21 - 25 September 2009, Barcelona
Outline Introduction to the new Service Availability Monitoring System Description of the Database Components • Aggregated Topology Provider (ATP) • Metric Description Database (MDDB) • Metric Results Store (Metric Store)
Databases - ATP • What will be tested? Aggregated Topology Provider (ATP) • How will it be tested? ? • What to do with test results? ?
Databases - ATP What information is provided by the ATP? • Topology information containing: • Projects (WLCG) and grid infrastructures (EGEE, OSG, NDGF) • Sites, Services, VOs and their groupings • Downtimes • A history of the above Why do we need it? • Availability recalculations need a history of the grid topology • Without it, we could not name groups of arbitrary grid resources (e.g. ATLAS clouds) • It gives a single authoritative source of topology information
ATP - why do we need it? • Diagram (not reproduced): current flow of grid topology data across the various monitoring tools
ATP - why do we need it? • Diagram (not reproduced): streamlined grid topology data flow using the ATP
ATP – data sources • Diagram (not reproduced): the ATP synchronizer (ATP sync) populates the Aggregated Topology Provider from several feeds • Labels in the diagram: OSG IM, GOCDB, BDII, Gstat 2.0, CIC Portal, WLCG MOU Portal, OSG topology & downtimes, EGEE topology & downtimes, VO / service mappings, VO feeds, Project feeds, VO cards, Installed capacity, Alice Voboxes
ATP – status What do we have today? • MySQL and Oracle versions • Synchronizer • A programmatic interface to retrieve ATP information (XML/JSON)
ATP – status What needs to be added? • History tables to record changes in topology information • Programmatic Interface - parameterised queries (similar to SAM PI)
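One way to picture the planned history tables is a table in which every row carries a validity interval, so that a parameterised query can return the topology as it was at any point in time. The sketch below is an illustration only: the table name, columns, hostnames and site name are hypothetical, and SQLite stands in for the MySQL and Oracle back ends mentioned above.

```python
import sqlite3

# Hypothetical schema (not the real ATP schema): each row records one period
# during which a service belonged to a site. An open period has valid_to NULL.
SCHEMA = """
CREATE TABLE service_history (
    hostname   TEXT NOT NULL,
    flavour    TEXT NOT NULL,
    site       TEXT NOT NULL,
    valid_from TEXT NOT NULL,   -- ISO 8601 timestamp, inclusive
    valid_to   TEXT             -- ISO 8601 timestamp, exclusive; NULL = current
);
"""

def services_at(conn, site, when):
    """Return (hostname, flavour) pairs that belonged to `site` at time `when`."""
    rows = conn.execute(
        """
        SELECT hostname, flavour FROM service_history
        WHERE site = ? AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        ORDER BY hostname
        """,
        (site, when, when),  # bound parameters, never string formatting
    )
    return rows.fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO service_history VALUES (?, ?, ?, ?, ?)",
    [
        ("ce01.example.org", "CE", "SITE-A", "2009-01-01T00:00:00", "2009-06-01T00:00:00"),
        ("ce02.example.org", "CE", "SITE-A", "2009-06-01T00:00:00", None),
        ("se01.example.org", "SRM", "SITE-A", "2009-01-01T00:00:00", None),
    ],
)

# The same site queried at two different dates gives two different topologies.
march = services_at(conn, "SITE-A", "2009-03-15T00:00:00")
september = services_at(conn, "SITE-A", "2009-09-21T00:00:00")
print(march)
print(september)
```

Closing a row by setting valid_to and inserting a new row, rather than updating in place, is what would allow availability to be recalculated against the topology that was valid at the time.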
Databases - MDDB • What will be tested? Aggregated Topology Provider (ATP) • How will it be tested? Metric Description Database (MDDB) • What to do with test results? ?
Databases - MDDB What information is provided by the MDDB? • Metrics which are used to test the Grid infrastructure • Profiles – combinations of metrics for computing different availabilities and for configuring Nagios installations Why do we need it? • More flexible availability calculations: • Example: CMS would like to test Tier-1 and Tier-2 sites differently • Maintain a history of which metrics and calculations were valid at each point in time
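The idea of a profile as a named combination of metrics can be sketched as follows. This is a simplified illustration, not the MDDB data model: the profile names, metric names and the "worst status wins" rule are all assumptions made for the example.

```python
# Hypothetical profiles: each one names the metrics that count for it.
PROFILES = {
    "CMS_TIER1": {"CE-job-submit", "SRM-put", "SRM-get"},
    "CMS_TIER2": {"CE-job-submit"},
}

def service_status(profile, results):
    """Collapse metric results into one status using only the profile's metrics.

    `results` maps metric name -> "OK" or "CRITICAL". Metrics outside the
    profile are ignored; a profile metric with no result yields "UNKNOWN".
    """
    statuses = [results.get(metric) for metric in sorted(PROFILES[profile])]
    if "CRITICAL" in statuses:
        return "CRITICAL"
    if None in statuses:
        return "UNKNOWN"
    return "OK"

# The same raw results give different answers under different profiles.
results = {"CE-job-submit": "OK", "SRM-put": "CRITICAL", "SRM-get": "OK"}
tier1 = service_status("CMS_TIER1", results)
tier2 = service_status("CMS_TIER2", results)
print(tier1, tier2)
```

Keeping the metric set in data rather than in code is what would let a VO such as CMS evaluate Tier-1 and Tier-2 sites differently without changing the calculation itself.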
MDDB - Architecture • Diagram (not reproduced): Central MDDB, MDDB Sync, Local Cache
MDDB - Status What do we have today? • MySQL and Oracle versions • Integration with ATP • Web User Interface • A programmatic interface to retrieve MDDB information (JSON) What needs to be added? • Synchronizer between the Central DB and local (ROC) caches • Interface for populating and querying profiles • Profiles: mapping to grid resources
Databases – Metric Store • What will be tested? Aggregated Topology Provider (ATP) • How will it be tested? Metric Description Database (MDDB) • What to do with test results? Metric Results Store
Databases – Metric Store What information is provided by the Metric Store? • Metric results for service end-points in the grid infrastructure • Status changes for service end-points in the infrastructure What do we have today? • MySQL and Oracle versions: • Integration with MDDB and ATP • Per-service status change calculation for Profiles • Data loader • Data from 11 ROCs is being loaded into the Central Metric Store: • Some records are rejected (mainly because service end-points are not defined correctly in GOCDB)
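The difference between metric results and status changes can be illustrated with a small sketch: a stream of results is reduced to the moments at which the status actually changed. The function name, tuple layout and sample data are assumptions for this example, not the Metric Store implementation.

```python
def status_changes(results):
    """Reduce (timestamp, status) results to the points where the status changed.

    A change is recorded for the first result and whenever the status differs
    from the previous one. Results are sorted by timestamp first.
    """
    changes = []
    previous = None
    for timestamp, status in sorted(results):
        if status != previous:
            changes.append((timestamp, status))
            previous = status
    return changes

# Hypothetical hourly results for one service end-point.
results = [
    ("2009-09-21T10:00", "OK"),
    ("2009-09-21T11:00", "OK"),
    ("2009-09-21T12:00", "CRITICAL"),
    ("2009-09-21T13:00", "CRITICAL"),
    ("2009-09-21T14:00", "OK"),
]
changes = status_changes(results)
print(changes)
```

Five results collapse to three status changes here, which is the form a portal or an availability calculation would typically consume.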
Metric Store – status What needs to be added? • MySQL – tuning of the DB (e.g. table partitioning) • Programmatic Interface - parameterised queries • Purging mechanism • Alerting mechanism integrated with Nagios (e.g. when too few metric results are received in a given period of time)
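The alerting item could take the form of a Nagios-style check that counts how many metric results arrived in a recent window. The sketch below is only an assumption about how such a check might look: the function name, thresholds and sample data are hypothetical, while the exit codes follow the standard Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
from datetime import datetime, timedelta

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

def check_result_rate(timestamps, now, window, warn_below, crit_below):
    """Return (exit_code, message) from the number of results inside the window."""
    recent = sum(1 for t in timestamps if now - window <= t <= now)
    if recent < crit_below:
        code, label = CRITICAL, "CRITICAL"
    elif recent < warn_below:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"{label}: {recent} metric results in the last {window}"

# Hypothetical arrival times: only three results fall inside the last hour.
now = datetime(2009, 9, 21, 12, 0)
arrivals = [now - timedelta(minutes=m) for m in (5, 20, 50, 90, 200)]
code, message = check_result_rate(
    arrivals, now, timedelta(hours=1), warn_below=10, crit_below=4
)
print(code, message)
```

A real plugin would print the message and exit with the code so that Nagios can raise the alert; the thresholds would depend on how many results each ROC is expected to deliver.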
Central Metric Store Population • Diagram (not reproduced). Labels: Active & Passive Checks Results, Service Definition, Metric & Profile Definition
Outline Introduction to the new Service Availability Monitoring System Description of the Database Components • Aggregated Topology Provider (ATP) • Metric Description Database (MDDB) • Metric Results Store (Metric Store) Publicity
Publicity - Demo Watch our demo and vote for it: • Tuesday 16:30-17:00 • Wednesday lunch • http://tinyurl.com/EgeeSAM (YouTube) • http://www.youtube.com/watch?v=PADq2x8q0kw
Acknowledgments Thanks to the following people for their contributions: • James Casey (CERN) • Emir Imamagic (SRCE) • Pradyumna Joshi (BARC) • Rajesh Kalmady (BARC) • Vaibhav Kumar (BARC) • Steve Traylen (CERN) SAM Team at CERN: • John Shade • David Collados • Karolis Eigelis • Judit Novak • Konstantin Skaburskas
Summary The new, enhanced SAM system, based on Nagios (a very popular and powerful open-source tool), will: • Simplify the transition to the EGI era • Help site administrators with fabric monitoring. ATP, acting as a single authoritative information aggregator, will simplify the job of assimilating grid resource information. MDDB will allow flexible availability calculations. The Metric Results Store will help the MyEGEE portal display the test results. Demo: http://tinyurl.com/EgeeSAM
Thank you! • Questions? • egee3-operations-automation-discuss@cern.ch