Grid Event Management Using R-GMA Monitoring Framework
ISGC, April 27-29, 2005
Min Tsai, ASCC/CERN
Overview
• Introduction to R-GMA
• R-GMA Monitoring Framework
• Event Management System
R-GMA Introduction (Slides from Steve Fisher)
• Background
  • R-GMA developed in EDG
  • Deployed by LCG
  • Re-engineered for the EGEE project
• Uniform data transport mechanism for:
  • Measurement
  • Resource information
  • Logging
  • Monitoring
R-GMA – one cloud
• A relational implementation of GMA (from GGF)
• Powerful data model and query language
  • All data modelled as tables
  • SQL can express complex queries in one expression
• Creates the impression that you have one RDBMS
[Diagram: Producer, R-GMA, Consumer]
Example Deployment
[Diagram: four sites (A, B, C, D) with servers running Registry Service, Schema Service, Producer Service and Consumer Service instances; producer and consumer applications connect through the API. Key: control, data]
The Producer Service
• Producer created
• Registry Service contacted to register producer
• Data transferred from the application to the Producer Service
• If there are no consumers, the data goes no further than the Producer Service
[Diagram: Producer application, API, Producer Service, Registry Service; messages: declare, register producer, insert. Key: control, data]
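The producer-side sequence above (declare, register, insert) can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class and method names (`Registry`, `ProducerService`, `declare`, `insert`) are hypothetical and are not the real R-GMA API, and the sample row is made up.

```python
"""Toy model of the producer-side flow; NOT the real R-GMA API."""


class Registry:
    """Hypothetical stand-in for the Registry Service."""

    def __init__(self):
        self.producers = {}  # table name -> list of producer services

    def register(self, table, producer):
        self.producers.setdefault(table, []).append(producer)


class ProducerService:
    """Hypothetical stand-in for the Producer Service."""

    def __init__(self, registry):
        self.registry = registry
        self.tables = {}  # table name -> list of stored tuples

    def declare(self, table):
        # Declare the table locally, then register it with the registry.
        self.tables[table] = []
        self.registry.register(table, self)

    def insert(self, table, row):
        # With no consumers, the tuple simply stays in this service.
        self.tables[table].append(row)


registry = Registry()
producer = ProducerService(registry)
producer.declare("TestData")
producer.insert("TestData", {"testName": "ping", "nodeName": "ce01", "status": 10})
```

The point of the sketch is that inserting never contacts a consumer: the tuple is held by the producer service until something asks for it.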
Producers
• Primary Producers (PP)
  • Initial source of data
  • Data published by user code is stored by the Producer Service
• Secondary Producers (SP)
  • Used to republish data in order to:
    • co-locate information to speed up queries
    • reduce network traffic
[Diagram: one SP and three PPs]
The Consumer Service
• Query issued by user code
• Consumer Service carries out all of the work on its behalf
• Registry Service contacted to identify relevant producers
• Mediator works out the list of producers needed to answer the query
• Consumer Service contacts relevant producers
• Data transferred directly from Producer Services to the Consumer Service
[Diagram: Consumer application, API, Consumer Service, Mediator, Registry Service, Producer Services; messages: query, list of producers, tuples. Key: control, data]
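The consumer-side sequence above can be sketched in the same toy style: look up the relevant producers, query each one, and collect the tuples directly from them. All names here are hypothetical and the predicate-based query stands in for SQL; this is not the real R-GMA API or mediator logic.

```python
"""Toy model of the consumer-side flow; NOT the real R-GMA API."""


class ProducerService:
    """Hypothetical producer holding rows for one table."""

    def __init__(self, table, rows):
        self.table = table
        self.rows = rows

    def answer(self, predicate):
        return [row for row in self.rows if predicate(row)]


class Registry:
    """Hypothetical registry mapping table names to producers."""

    def __init__(self):
        self._by_table = {}

    def register(self, producer):
        self._by_table.setdefault(producer.table, []).append(producer)

    def lookup(self, table):
        return list(self._by_table.get(table, []))


class ConsumerService:
    """Does the work on behalf of the consumer application."""

    def __init__(self, registry):
        self.registry = registry

    def query(self, table, predicate):
        # Mediation step (simplified): every producer of the table is relevant.
        producers = self.registry.lookup(table)
        tuples = []
        for producer in producers:
            # Data comes straight from each producer, not via the registry.
            tuples.extend(producer.answer(predicate))
        return tuples


registry = Registry()
registry.register(ProducerService("TestData", [{"nodeName": "ce01", "status": 10}]))
registry.register(ProducerService("TestData", [{"nodeName": "ce02", "status": 0}]))

consumer = ConsumerService(registry)
not_ok = consumer.query("TestData", lambda row: row["status"] != 10)
```

The registry only returns the list of producers; the tuples themselves never pass through it, which mirrors the control/data split in the diagram.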
The Consumer Service
• Query types:
  • Continuous queries: as soon as new data becomes available, it is broadcast to all interested parties
  • History queries: return time-sequenced data
  • Latest queries: correspond to the intuitive idea of current information
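To make the difference between the three query types concrete, here is a minimal sketch over an in-memory tuple store. The `TupleStore` class, its method names and the choice of `nodeName` as the key are assumptions for illustration; real R-GMA semantics (for example retention periods) are not modelled.

```python
"""Toy illustration of continuous, history and latest queries."""


class TupleStore:
    def __init__(self, key_fields):
        self.key_fields = key_fields  # fields that identify "the same thing"
        self.history = []             # list of (timestamp, row)
        self.subscribers = []         # callbacks for continuous queries

    def subscribe(self, callback):
        """Continuous query: callback is invoked for every new tuple."""
        self.subscribers.append(callback)

    def insert(self, timestamp, row):
        self.history.append((timestamp, row))
        for callback in self.subscribers:
            callback(timestamp, row)

    def history_query(self):
        """History query: all tuples in time order."""
        return sorted(self.history, key=lambda item: item[0])

    def latest_query(self):
        """Latest query: the most recent tuple for each key."""
        latest = {}
        for timestamp, row in self.history_query():
            key = tuple(row[field] for field in self.key_fields)
            latest[key] = (timestamp, row)
        return list(latest.values())


store = TupleStore(key_fields=["nodeName"])
seen = []
store.subscribe(lambda ts, row: seen.append((ts, row["nodeName"])))

store.insert(1, {"nodeName": "ce01", "status": 10})
store.insert(2, {"nodeName": "ce02", "status": 10})
store.insert(3, {"nodeName": "ce01", "status": 20})

current = {row["nodeName"]: row["status"] for _, row in store.latest_query()}
```

After the three inserts, the continuous subscriber has seen every tuple, the history query returns all three in time order, and the latest query returns one tuple per node.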
Using R-GMA
• Three steps to producing
  • Create a Producer
  • Declare the table
  • Insert data
• Three steps to consuming
  • Create a Consumer with your query
  • Start it
  • Read data
• More information
  • JRA1-UK web site: http://hepunx.rl.ac.uk/egee/jra1-uk/
Problems with Monitoring Tools
• Inconsistent site configuration sources
  • Results in different site coverage
• Difficult to correlate test results
  • Different site identification methods
• Search through many web pages
  • Time consuming, especially with hundreds of sites!
Unified Monitoring
• Site configurations are consistent
  • Only from GOCDB
• Data is sent to a single data transport (R-GMA)
  • Shared data schema
  • Shared client applications (UI, Alarm Sys.)
• Data can be selectively collected to create custom views (CIC, ROC, site admin)
• Applications can share results and build dependencies
• Applications can more easily be distributed, using R-GMA to aggregate the data
Unified Schema
• TestDef
  • Tests are registered here
  • Provides information describing each test
    • testName, friendlyName, isVirtual
    • dataType, Unit
    • testHelp, testTitle
• TestDefRelation
  • Defines logical test groups and organizes tests
    • superTestName, subTestName
    • displayPriority, flavour
Unified Schema (cont.)
• TestData
  • Holds data produced by tests
    • testName
    • nodeName
    • summaryData
    • detailedData
    • Status
      • 0: NA, 10: OK, 20: INFO, etc.
• TestDataRelation
  • Defines test data dependencies
    • superTestName, superNodeName
    • subTestName, subNodeName
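The four tables of the unified schema can be written down as SQL to see how they fit together. The sketch below uses Python's built-in `sqlite3` purely as a stand-in relational store; the table and column names come from the slides, while the column types, the primary key, and the sample rows are assumptions.

```python
import sqlite3

# Column names follow the slides; types and keys are assumed for illustration.
SCHEMA = """
CREATE TABLE TestDef (
    testName     TEXT PRIMARY KEY,
    friendlyName TEXT,
    isVirtual    INTEGER,
    dataType     TEXT,
    unit         TEXT,
    testHelp     TEXT,
    testTitle    TEXT
);
CREATE TABLE TestDefRelation (
    superTestName   TEXT,
    subTestName     TEXT,
    displayPriority INTEGER,
    flavour         TEXT
);
CREATE TABLE TestData (
    testName     TEXT,
    nodeName     TEXT,
    summaryData  TEXT,
    detailedData TEXT,
    status       INTEGER  -- 0: NA, 10: OK, 20: INFO
);
CREATE TABLE TestDataRelation (
    superTestName TEXT,
    superNodeName TEXT,
    subTestName   TEXT,
    subNodeName   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Made-up sample data: one registered test and two results.
conn.execute(
    "INSERT INTO TestDef (testName, friendlyName, isVirtual) VALUES (?, ?, ?)",
    ("ping", "Ping test", 0),
)
conn.executemany(
    "INSERT INTO TestData (testName, nodeName, summaryData, status) VALUES (?, ?, ?, ?)",
    [
        ("ping", "ce01.example.org", "reply received", 10),
        ("ping", "ce02.example.org", "not measured", 0),
    ],
)

# One SQL expression joins test results with their definitions.
ok_rows = conn.execute(
    """
    SELECT d.nodeName, f.friendlyName, d.status
    FROM TestData AS d
    JOIN TestDef AS f ON f.testName = d.testName
    WHERE d.status = 10
    """
).fetchall()
```

Because every monitoring tool publishes into the same `TestData` shape, a single join like this is enough to build a per-site or per-test view.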
Introduction to EMS
• The goal of Operation Centers is to maximize the availability of Grid services
• EMS will help facilitate this by providing:
  • Immediate notification as faults and performance degradation are identified
  • Shared EMS services that enable:
    • Correlation and analysis of historical events from disparate monitoring tools
    • Less redundant development of notification modules for Grid monitoring applications
EMS Prototype
• Web Services event transport and command client developed as a proof of concept
• Individual monitoring applications send events directly to a centralized EMS Core Service
• Client applications can then query the EMS Core for new events
• Events are stored in an XML format and filtered using XPath expressions
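Filtering XML events with XPath, as the prototype does, might look roughly like the sketch below. The event document layout (element names, `site` and `severity` attributes) is invented for illustration, since the slides do not show the EMS event format; the code uses the limited XPath subset supported by Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical event format; the real EMS schema is not given in the slides.
EVENTS_XML = """
<events>
  <event site="SiteA" severity="critical">
    <test>ping</test>
    <message>No reply</message>
  </event>
  <event site="SiteB" severity="info">
    <test>ping</test>
    <message>OK</message>
  </event>
  <event site="SiteA" severity="info">
    <test>job-submit</test>
    <message>OK</message>
  </event>
</events>
""".strip()

root = ET.fromstring(EVENTS_XML)

# Filter on an attribute value.
critical = root.findall("./event[@severity='critical']")

# Filter on the text of a child element.
ping_events = root.findall("./event[test='ping']")

critical_sites = [event.get("site") for event in critical]
```

A client interested only in critical events, or only in one test, would register an expression like these with the core service instead of parsing every event itself.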
Migration to R-GMA Transport
• Already fully deployed in LCG2
• R-GMA benefits
  • No additional work required by applications using the R-GMA Monitoring Framework
  • Built-in data archiving
  • Automatic timestamps
  • Flexible SQL queries
• Planned features
  • Security
  • Fully redundant directory and schema service
What Lies Ahead
• Not ready for operational use yet
• Display
  • Filtering mechanism to customize the events received
  • Display list of currently active alarms
• Event reduction
  • Based on test dependencies
  • Damping of flapping events
• Notification
  • Email
  • Audio
• Communication
  • Broadcast comments for a particular event
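As one possible reading of "damping of flapping events", the sketch below reports a state change only after the new state has been observed several times in a row, so a test that flips between OK and FAIL does not trigger a notification on every flip. The `FlapDamper` class, the `hold` parameter and the state names are assumptions; the slides do not describe the planned EMS mechanism.

```python
"""Toy flap damper: report a state only once it has been stable."""


class FlapDamper:
    def __init__(self, hold=3):
        self.hold = hold        # consecutive observations needed to report
        self.reported = None    # last state that was actually reported
        self.candidate = None   # state currently being counted
        self.count = 0

    def observe(self, state):
        """Return the state to notify about, or None if it is damped."""
        if state == self.reported:
            # Back to the known state: discard any pending change.
            self.candidate, self.count = None, 0
            return None
        if state == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = state, 1
        if self.count >= self.hold:
            self.reported = state
            self.candidate, self.count = None, 0
            return state
        return None


damper = FlapDamper(hold=3)
observations = ["OK", "OK", "OK", "FAIL", "OK", "FAIL", "OK", "FAIL", "FAIL", "FAIL"]
notifications = [damper.observe(state) for state in observations]
sent = [n for n in notifications if n is not None]
```

With this input, the flapping in the middle of the sequence is suppressed and only two notifications are produced: the initial stable OK and the final stable FAIL.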
Questions?
• R-GMA Monitoring Framework
  • http://goc.grid.sinica.edu.tw/gocwiki/RgmaUnifiedMonitoringSystem
  • Piotr Nyczyk
  • Judit Novak
  • Andrey Kiryanov
  • Dave Kant
• EMS
  • http://goc.grid.sinica.edu.tw/gocwiki/EventManagementSystem
  • Antun Gusev
  • Doug Chen
  • Mark Ho
  • Jeng-Hsueh Wu