R-GMA – DataGrid’s Monitoring System 1/7/2003 Werner Nutt (Heriot-Watt University) <w.nutt@hw.ac.uk>
R-GMA = Relational Grid Monitoring Architecture
• Grid Monitoring and Information System developed within DataGrid (Work Package 3)
• Based on the “Grid Monitoring Architecture” of the Global Grid Forum
• Code is open source and freely available
• Homepage: type “wp3” into Google
R-GMA -DataGrid's Monitoring System
Contributors
• Heriot-Watt, Edinburgh
  • Andrew Cooke, Alasdair Gray, Lisha Ma, Werner Nutt
• IBM-UK
  • James Magowan, Manfred Oevers, Paul Taylor
• Queen Mary, University of London
  • Roney Cordenonsi
• CCLRC/PPARC
  • Rob Byrom, Laurence Field, Steve Hicks, Manish Soni, Antony Wilson, Jason Leake
  • Linda Cornwall, Abdeslem Djaoui, Steve Fisher, Robin Middleton
• SZTAKI, Hungary
  • Peter Kacsuk, Norbert Podhorszki
• Trinity College Dublin
  • Brian Coghlan, Stuart Kenny, David O’Callaghan
Overview
• Grid monitoring: requirements
• The R-GMA approach: a virtual monitoring database
• Components of R-GMA:
  • Schema
  • Producers and Consumers
  • Registry
  • Republishers
  • Query Planning
Major Components of DataGrid
(Figure: User Interface, Resource Broker, Information Monitoring System, Logging and Bookkeeping, Replica Catalogue, Computing Element, Storage Element, computers; flows: Job Submission, Status, Data Transfer.)
WP7: R-GMA Collects Network Monitoring Data
The Grid Monitoring Problem
In a Grid we have
• computers
• storage elements
• network nodes and connections
• application programs, …
Monitoring:
• What is the current state of the system?
• How did the system behave in the past?
Monitoring Data Come in Two Kinds
A Grid monitoring system makes two kinds of data available:
• static data “pools”, e.g., databases on
  • network topology, nodes connected
  • applications available (versions, licences, ...)
• “streams” of data, e.g.,
  • sensor data (CPU load, network traffic, ...)
Data streams may give rise to data pools if they are archived.
Today: R-GMA is tailored towards streams rather than pools.
Examples of Monitoring Queries
• “Show me the (average) CPU load of computers at Heriot-Watt!”
• “Between which nodes was the average transportation time for 1 MB packets higher than 0.… seconds yesterday?”
• “For every computing element CE, how many computers of CE currently have a CPU load of no more than 30%?”
Grid Monitoring Requirements
• Support for publishing data “pools” and “streams”
• Support for locating data sources (automatic, if possible)
• Queries with different temporal interpretations (continuous, latest state, history)
• Scalability (there may be thousands of data sources)
• Resilience to failure (data sources may become unavailable)
• Flexibility (we don’t know which queries will be posed)
Architecture Approach 1: A Monitoring Data Warehouse
Idea:
• store all data about the Grid status in a huge database
• and query it
Not realistic:
• loading takes time
• data occupy space
• connections to the warehouse may fail
• monitoring data often flow as data streams, and queries ask for data streams as output
Approach 2: Monitoring with a “Multi-agent System”
(Figure: Directory Service; Consumer (Monitoring Application); Producer (Sensor, Data Base); find/register interactions with the Directory Service.)
The Grid Monitoring Architecture (GMA) of the Global Grid Forum distinguishes between:
• consumers of information
• producers of information
• a directory service
  • producers register their supply
  • consumers register their demand
The directory service mediates between producers and consumers.
Questions about GMA:
• Which kinds of producers and consumers are there?
• In which language do producers register their supply and consumers their demand?
• What is the meaning of a registration?
• How does a consumer find suitable producers? And how does a producer find suitable consumers?
• Producers have different capabilities to answer queries (e.g., selections, joins, …). Which of them should they register?
R-GMA: A Virtual Monitoring Data Warehouse
(Figure: Consumer, DB-Producer, Stream Producer with Sensor, and a Registry holding views V1, V2, …, Vn on the Global Schema S.)
• Language of producers and consumers: relational queries (SQL)
• Vocabulary: relations in a global schema
• Consumer: poses queries over the global schema
• Producer:
  • has a type (stream producer, database producer)
  • publishes relations R1, …, Rk
  • for every Ri, registers a simple view V on the global schema
Schema & Contributions
Contributions are Views
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'GLA'
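To make the registration step concrete, here is a minimal Python sketch of a registry in which each producer registers a selection view on a global relation, using the two cpuLoad views above. The classes `View` and `Registry`, their methods and the producer names are hypothetical illustrations, not R-GMA's API; views are assumed to be conjunctions of equality conditions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class View:
    """Simple selection view: SELECT * FROM relation WHERE attr = 'value' AND ..."""
    relation: str
    conditions: tuple  # (attribute, value) equality pairs

    def to_sql(self):
        where = " AND ".join(f"{a} = '{v}'" for a, v in self.conditions)
        return f"SELECT * FROM {self.relation} WHERE {where}"


@dataclass
class Registry:
    """Toy registry: each producer registers one view on the global schema."""
    entries: list = field(default_factory=list)

    def register_producer(self, producer_id, producer_type, view):
        # producer_type is a free-form label here, e.g. "stream" or "database"
        self.entries.append((producer_id, producer_type, view))

    def producers_for(self, relation, producer_type=None):
        # Producers whose registered view is on the requested global relation
        return [pid for pid, ptype, view in self.entries
                if view.relation == relation
                and (producer_type is None or ptype == producer_type)]


registry = Registry()
ral = View("cpuLoad", (("country", "UK"), ("site", "RAL")))
gla = View("cpuLoad", (("country", "UK"), ("site", "GLA")))
registry.register_producer("ral-cpu", "stream", ral)
registry.register_producer("gla-cpu", "stream", gla)
print(ral.to_sql())                       # SELECT * FROM cpuLoad WHERE country = 'UK' AND site = 'RAL'
print(registry.producers_for("cpuLoad"))  # ['ral-cpu', 'gla-cpu']
```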
Keys in the Global Schema
Network throughput: tp(src, dest, method, pcktSize, timestamp, time)
Intuitively, tp has the primary key (src, dest, method, pcktSize, timestamp).
We need to know the primary keys
• to understand the global schema
• to answer latest snapshot queries
Primary keys are declared, but not enforced!
However, sometimes they hold globally if they hold locally!
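The remark that keys may hold globally when they hold locally can be illustrated with a small Python sketch: if each producer's view fixes `src` to a different value and each producer's own tuples respect the key, their union cannot contain two tuples with the same key. The helpers `key_violations` and `tp` and the sample rows are hypothetical; rows are modelled as dictionaries.

```python
from collections import Counter

# Declared (but not enforced) primary key of tp(src, dest, method, pcktSize, timestamp, time)
TP_KEY = ("src", "dest", "method", "pcktSize", "timestamp")


def key_violations(rows, key=TP_KEY):
    """Return the key values that occur more than once in `rows`."""
    counts = Counter(tuple(row[a] for a in key) for row in rows)
    return [k for k, n in counts.items() if n > 1]


def tp(src, dest, timestamp, time):
    # Hypothetical helper: method and packet size are fixed to keep the example short
    return {"src": src, "dest": dest, "method": "ping", "pcktSize": 1024,
            "timestamp": timestamp, "time": time}


p1 = [tp("hw", "ral", 1, 0.02), tp("hw", "ral", 2, 0.03)]  # view: src = hw
p2 = [tp("ral", "hw", 1, 0.02)]                            # view: src = ral
# A second producer for src = hw can report the same key with a different value.
p3 = [tp("hw", "ral", 1, 0.05)]                            # view: src = hw

print(key_violations(p1 + p2))       # []
print(key_violations(p1 + p2 + p3))  # [('hw', 'ral', 'ping', 1024, 1)]
```

Because the views of P1 and P2 fix a key attribute to different values, their local keys combine into a global key; P3 overlaps with P1 on `src`, so nothing prevents a clash.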
Metaphor: Roles and Agents
R-GMA clients: Grid components or Grid applications
• Clients can play the roles of producers or consumers
A client would need special capabilities for a role:
• Clients are supported in their roles by agents
Implementation:
• APIs for client roles: “new StreamProducer(…)”
• Agents are objects on a Web server
Primary Producers
Database producer
• supports queries over a fixed set of tuples (static queries)
• can be used to publish a database
Stream producer
• supports queries over a changing set of tuples (continuous queries)
• supports “latest snapshot queries”
  • offers up-to-date values for each primary key, like a database
Today: the DatabaseProducers and StreamProducers in R-GMA are different from the above!
Communication Modes of Stream Producers
(Figure: Producer and ProducerServlet, Consumer and ConsumerServlet, each servlet holding a queue of tuples.)
Stream producers may offer two communication modes for continuous queries:
• lossless (… but tuples could become stale)
• lossy (… but tuples are fresh)
Today: R-GMA’s StreamProducers are resilient and support lossless communication.
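A minimal Python sketch of the two modes, assuming a lossless queue that buffers every tuple and a lossy queue that keeps only the newest `capacity` tuples and silently drops older ones. Drop-oldest is one possible lossy policy chosen for illustration; the class names are hypothetical and this is not how R-GMA's servlets are implemented.

```python
from collections import deque


class LosslessQueue:
    """Keeps every tuple in arrival order; a slow consumer receives stale tuples."""

    def __init__(self):
        self._items = deque()

    def put(self, item):
        self._items.append(item)

    def drain(self):
        items = list(self._items)
        self._items.clear()
        return items


class LossyQueue:
    """Bounded buffer: when full, the oldest tuple is discarded, so tuples stay fresh."""

    def __init__(self, capacity):
        self._items = deque(maxlen=capacity)  # a full deque drops from the left on append

    def put(self, item):
        self._items.append(item)

    def drain(self):
        items = list(self._items)
        self._items.clear()
        return items


lossless, lossy = LosslessQueue(), LossyQueue(capacity=2)
for load in [10, 20, 30, 40, 50]:  # tuples arrive faster than the consumer reads them
    lossless.put(load)
    lossy.put(load)

all_tuples = lossless.drain()
fresh_tuples = lossy.drain()
print(all_tuples)    # [10, 20, 30, 40, 50]
print(fresh_tuples)  # [40, 50]
```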
Republishers Publish Query Answers
Archiver: shows the history of a stream.
Stream republisher: enables
• merging,
• thinning,
• summarising
of streams …
Republishers in R-GMA Today
Republishers are called “archivers” (although some of them do not archive anything).
An archiver (= republisher)
• is defined by a query
• consumes only from “stream producers”
• publishes the query result according to its type, using
  • a “stream producer”, or
  • a “latest snapshot producer”, or
  • a “database producer” (which keeps an archive)
Republishers are used to answer complex queries!
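The three republishing operations can be sketched in Python on streams of `(timestamp, machine, load)` tuples, each assumed to be sorted by timestamp. The functions and the thinning rule (keep a tuple only if at least `period` time units have passed since the last kept tuple for that machine) are illustrative assumptions, not R-GMA's archiver behaviour.

```python
import heapq
from collections import defaultdict
from statistics import mean

# Each stream is a list of (timestamp, machine, load) tuples sorted by timestamp.


def merge(*streams):
    """Merge several time-ordered streams into one time-ordered stream."""
    return list(heapq.merge(*streams, key=lambda t: t[0]))


def thin(stream, period):
    """Keep a tuple only if `period` has passed since the last kept tuple of that machine."""
    last_kept = {}
    kept = []
    for ts, machine, load in stream:
        if machine not in last_kept or ts - last_kept[machine] >= period:
            last_kept[machine] = ts
            kept.append((ts, machine, load))
    return kept


def summarise(stream):
    """Average load per machine over the whole stream."""
    loads = defaultdict(list)
    for _, machine, load in stream:
        loads[machine].append(load)
    return {machine: mean(values) for machine, values in loads.items()}


ral = [(0, "ral-1", 20), (10, "ral-1", 40)]
hw = [(5, "hw-1", 60), (15, "hw-1", 80)]
merged = merge(ral, hw)
print(merged)            # [(0, 'ral-1', 20), (5, 'hw-1', 60), (10, 'ral-1', 40), (15, 'hw-1', 80)]
print(thin(merged, 20))  # [(0, 'ral-1', 20), (5, 'hw-1', 60)]
print(summarise(merged)) # {'ral-1': 30, 'hw-1': 70}
```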
The Next Step: Hierarchies of Stream Republishers
(Figure: a national republisher (country = 'uk') above local/site republishers (site = 'ral', site = 'hw'), which consume from the stream producers at ral and hw.)
Republisher Hierarchies: The Issues
• Republishers are defined by queries: hierarchies have to be maintained automatically
  • new stream producers must only be added to republishers at the “lowest level”
  • the hierarchy has to be replanned if a republisher fails
  • difficult: the transition from one plan to another without loss of tuples
• How well can we describe the content of a stream? We may need descriptions that join
  • stream relations, e.g., CPULoad(machineID, load, timestamp)
  • static relations, e.g., locatedAt(machineID, site)
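The rule that new stream producers are attached at the lowest level can be sketched as follows, assuming equality-only views and republishers annotated with an explicit hierarchy level. Under those assumptions a republisher can take a producer's stream if the republisher's conditions are a subset of the producer's conditions. `Republisher`, `covers` and `lowest_republisher` are hypothetical names; replanning after failures is not modelled.

```python
from dataclasses import dataclass


@dataclass
class Republisher:
    name: str
    level: int        # 0 = top of the hierarchy; larger = closer to the producers
    conditions: dict  # equality-only selection, e.g. {"site": "ral"}


def covers(republisher, producer_view):
    """True if every tuple allowed by the producer's view satisfies the republisher's query.

    With equality-only selections this holds when each republisher condition
    also appears in the producer's view.
    """
    return all(producer_view.get(a) == v for a, v in republisher.conditions.items())


def lowest_republisher(republishers, producer_view):
    """Pick the deepest republisher that covers the new producer, or None."""
    candidates = [r for r in republishers if covers(r, producer_view)]
    return max(candidates, key=lambda r: r.level, default=None)


hierarchy = [
    Republisher("national-uk", 0, {"country": "uk"}),
    Republisher("site-ral", 1, {"site": "ral"}),
    Republisher("site-hw", 1, {"site": "hw"}),
]
new_producer = {"country": "uk", "site": "ral"}
print(lowest_republisher(hierarchy, new_producer).name)  # site-ral
```

The national republisher also covers the new producer, but it receives the producer's tuples indirectly through site-ral, so attaching the producer there avoids duplicates.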
What is the Meaning of a Query in R-GMA?
Assumption: the views of (primary) producers are selections on a single relation, i.e., queries of the form
SELECT * FROM cpu_load WHERE machine_id = 'AB123' AND loc = 'hw'
(each producer contributes its part of a relation)
• The virtual database contains the union of the data of all the primary producers
• Conceptually, a query is evaluated over the entire virtual database
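The union semantics can be sketched in a few lines of Python: each producer's tuples must satisfy its registered selection, the virtual relation is the union of all contributions, and a query is a predicate evaluated over that union. The producer names, rows and helper functions are hypothetical; the union is kept as a list, so duplicates are not removed as SQL `UNION` would.

```python
# Hypothetical registered views and contributions of two primary producers to cpu_load.
views = {
    "P_hw": {"machine_id": "AB123", "loc": "hw"},
    "P_ral": {"machine_id": "CD456", "loc": "ral"},
}
contributions = {
    "P_hw": [{"machine_id": "AB123", "loc": "hw", "load": 35}],
    "P_ral": [{"machine_id": "CD456", "loc": "ral", "load": 80}],
}


def respects_view(rows, view):
    """Every tuple a producer publishes must satisfy its registered selection."""
    return all(row[a] == v for row in rows for a, v in view.items())


def virtual_relation(contributions):
    """The virtual database: union (as a list) of all primary producers' tuples."""
    return [row for rows in contributions.values() for row in rows]


def select(rows, predicate):
    """Evaluate a selection query over the whole virtual relation."""
    return [row for row in rows if predicate(row)]


assert all(respects_view(contributions[p], views[p]) for p in views)
busy = select(virtual_relation(contributions), lambda row: row["load"] > 50)
print(busy)  # [{'machine_id': 'CD456', 'loc': 'ral', 'load': 80}]
```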
Stream Queries can have Various Temporal Interpretations
Consider a query over the relation “Transport Time”
tt(src, dest, pcktSize, method, timestamp, time)
SELECT * FROM tt WHERE src = ral AND dest = bologna
What is meant? Measurements
• from now on? (continuous query)
• up until now? (history query)
• right now? (latest snapshot query)
Today: queries can be “flagged” with their type.
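A Python sketch of the three interpretations over tt, assuming integer timestamps, that a latest snapshot keeps one tuple per (src, dest, pcktSize, method) (the key without the timestamp, by analogy with the key of tp), and that a continuous query is notified of each tuple published after it was posed. All names are hypothetical.

```python
from typing import NamedTuple


class TT(NamedTuple):
    """Transport time tuple: tt(src, dest, pcktSize, method, timestamp, time)."""
    src: str
    dest: str
    pcktSize: int
    method: str
    timestamp: int
    time: float


def q(t):
    # WHERE src = ral AND dest = bologna
    return t.src == "ral" and t.dest == "bologna"


def history(stored, now, pred):
    """History query: all matching measurements up until now."""
    return [t for t in stored if t.timestamp <= now and pred(t)]


def latest_snapshot(stored, now, pred):
    """Latest snapshot query: the most recent matching measurement per key."""
    latest = {}
    for t in history(stored, now, pred):
        k = (t.src, t.dest, t.pcktSize, t.method)  # assumed key without the timestamp
        if k not in latest or t.timestamp > latest[k].timestamp:
            latest[k] = t
    return list(latest.values())


class ContinuousQuery:
    """Continuous query: matching measurements published from now on."""

    def __init__(self, pred):
        self.pred = pred
        self.answers = []

    def on_publish(self, t):
        if self.pred(t):
            self.answers.append(t)


stored = [TT("ral", "bologna", 1024, "ping", 1, 0.50),
          TT("ral", "bologna", 1024, "ping", 2, 0.70),
          TT("hw", "bologna", 1024, "ping", 2, 0.30)]
print(len(history(stored, now=2, pred=q)))             # 2
print(latest_snapshot(stored, now=2, pred=q)[0].time)  # 0.7
cq = ContinuousQuery(q)
cq.on_publish(TT("ral", "bologna", 1024, "ping", 3, 0.60))  # arrives after the query was posed
print(len(cq.answers))                                 # 1
```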
Advanced Queries: Mixing Temporal Query Types
• “Which connections currently have a transportation time that is higher than last week's average?” (latest snapshot and history)
• “Show me the CPU load of those machines where it is lower than yesterday's load average!” (continuous and history)
We do not intend to support such queries in R-GMA!
In R-GMA, Query Answering Needs Mediation
Suppose P1 and P2 publish for tp (throughput):
P1: … WHERE src = hw
P2: … WHERE src = ral AND pcktSize > 20
A global consumer poses its query over global relations:
SELECT * FROM tp WHERE pcktSize > 10
A mediator translates this into queries over local relations:
SELECT * FROM P1.tp WHERE pcktSize > 10
UNION
SELECT * FROM P2.tp
(P2's view already guarantees pcktSize > 20, hence pcktSize > 10, so no condition is needed for P2.)
Today: R-GMA’s mediator handles simple queries like the one above.
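The translation above can be sketched in Python for selection queries whose conditions are atoms `(attribute, operator, value)` with `=` or `>`. A query condition is dropped for a producer when a single atom of that producer's view already implies it, which is why the P2 branch loses `pcktSize > 10`. This is an illustrative rewriting rule with hypothetical function names, not R-GMA's mediator code, and it does not discard producers whose views contradict the query.

```python
def implies(view, atom):
    """Does one atom of the producer's view guarantee the query atom?"""
    attr, op, value = atom
    for v_attr, v_op, v_value in view:
        if v_attr != attr:
            continue
        if op == "=" and v_op == "=" and v_value == value:
            return True
        if op == ">" and v_op == ">" and v_value >= value:
            return True
        if op == ">" and v_op == "=" and v_value > value:
            return True
    return False


def render(atoms):
    # Quote string constants, leave numbers as they are
    return " AND ".join(
        f"{a} {op} '{v}'" if isinstance(v, str) else f"{a} {op} {v}"
        for a, op, v in atoms)


def rewrite(relation, query, views):
    """Translate a global selection query into a UNION over the producers' local relations."""
    parts = []
    for producer, view in views.items():
        # Conditions already guaranteed by the producer's view can be dropped.
        residual = [atom for atom in query if not implies(view, atom)]
        sql = f"SELECT * FROM {producer}.{relation}"
        if residual:
            sql += " WHERE " + render(residual)
        parts.append(sql)
    return " UNION ".join(parts)


views = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
}
print(rewrite("tp", [("pcktSize", ">", 10)], views))
# SELECT * FROM P1.tp WHERE pcktSize > 10 UNION SELECT * FROM P2.tp
```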
Global and Local Consumers
• Global consumers pose queries over global relations
  SELECT * FROM tp WHERE pcktSize > 10
  which are translated into queries over local relations
  SELECT * FROM P1.tp WHERE pcktSize > 10 UNION SELECT * FROM P2.tp
• Local consumers pose queries over local relations directly
  SELECT * FROM P1.tp WHERE method = ping
Today: a consumer can be global or local, but local relations cannot be referred to explicitly.
How does the Mediator Find Suitable Publishers?
P1, P2, P3 publish for tt (Transport Time):
P1: … src = hw
P2: … src = ral AND pcktSize > 20
P3: … src = ral AND method = ping
Q: SELECT * FROM tt WHERE src = ral AND method = ping
We see: P1 is not suitable for Q, but P2 and P3 are. Why?
• src = hw AND src = ral AND method = ping is never true
• src = ral AND pcktSize > 20 AND … is sometimes true
Satisfiability test!
Today: implemented
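The satisfiability test for this kind of view can be sketched in Python for conjunctions of atoms `(attribute, operator, value)` with `=`, `<` or `>`, assuming numeric attributes range over a dense domain such as the reals (over the integers, `x > 1 AND x < 2` would be unsatisfiable, but this check accepts it). The function name and producer data are hypothetical.

```python
from collections import defaultdict


def satisfiable(atoms):
    """Can a tuple satisfy all atoms (attr, op, value), op in "=", "<", ">"?"""
    by_attr = defaultdict(list)
    for attr, op, value in atoms:
        by_attr[attr].append((op, value))
    for constraints in by_attr.values():
        equal = {v for op, v in constraints if op == "="}
        lower = [v for op, v in constraints if op == ">"]
        upper = [v for op, v in constraints if op == "<"]
        if len(equal) > 1:
            return False  # e.g. src = hw AND src = ral
        if equal:
            (v,) = equal
            if any(v <= lo for lo in lower) or any(v >= hi for hi in upper):
                return False
        elif lower and upper and max(lower) >= min(upper):
            return False  # empty open interval
    return True


views = {
    "P1": [("src", "=", "hw")],
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
    "P3": [("src", "=", "ral"), ("method", "=", "ping")],
}
query = [("src", "=", "ral"), ("method", "=", "ping")]
suitable = [p for p, view in views.items() if satisfiable(view + query)]
print(suitable)  # ['P2', 'P3']
```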
… So Which Publishers Should the Mediator Ask?
P2: … src = ral AND pcktSize > 20
P3: … src = ral AND method = ping
Q: SELECT * FROM tt WHERE src = ral AND method = ping
All answers to Q returned by P2 are also returned by P3: whenever
  src = ral AND pcktSize > 20 AND src = ral AND method = ping
is true, then
  src = ral AND method = ping AND src = ral AND method = ping
is true.
Hence, R-GMA only needs to ask P3.
Entailment test!
Needed for republisher hierarchies! (not yet implemented)
… But What Did the Producers Promise?
P registers view V. Does P promise
• some of V? (sound description)
• all of V? (sound and complete description)
• The entailment test only makes sense when the registered views are sound and complete descriptions
• Producers should register completeness flags
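Entailment together with completeness flags can be sketched in Python: a producer need not be asked if some producer that registered its view as complete is guaranteed to return all of its answers to the query. The sketch assumes conjunctions of atoms `(attribute, operator, value)` with `=`, `<` or `>` over a dense domain, that all producers have already passed the satisfiability test, and that ties between equivalent views are broken by keeping the first producer. All names and flags are hypothetical.

```python
def implies_atom(atoms, atom):
    """Does a satisfiable conjunction `atoms` imply the single `atom`?"""
    attr, op, c = atom
    for a, o, v in atoms:
        if a != attr:
            continue
        if op == "=" and o == "=" and v == c:
            return True
        if op == ">" and ((o == ">" and v >= c) or (o == "=" and v > c)):
            return True
        if op == "<" and ((o == "<" and v <= c) or (o == "=" and v < c)):
            return True
    return False


def entails(a, b):
    """Conjunction a entails conjunction b if a implies every atom of b."""
    return all(implies_atom(a, atom) for atom in b)


def producers_to_ask(views, complete, query):
    """Skip a producer whose answers are guaranteed to come from a complete producer."""
    names = list(views)
    ask = []
    for i, p in enumerate(names):
        mine = views[p] + query
        covered = False
        for j, r in enumerate(names):
            if i == j or not complete[r]:
                continue  # only a complete producer can stand in for another
            theirs = views[r] + query
            # r returns everything p returns; for equivalent views keep the first one
            if entails(mine, theirs) and (not entails(theirs, mine) or j < i):
                covered = True
                break
        if not covered:
            ask.append(p)
    return ask


views = {
    "P2": [("src", "=", "ral"), ("pcktSize", ">", 20)],
    "P3": [("src", "=", "ral"), ("method", "=", "ping")],
}
query = [("src", "=", "ral"), ("method", "=", "ping")]
print(producers_to_ask(views, {"P2": True, "P3": True}, query))   # ['P3']
print(producers_to_ask(views, {"P2": True, "P3": False}, query))  # ['P2', 'P3']
```

Without the completeness flag on P3, its registered view is only a sound description, so the mediator cannot rely on it to cover P2.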
… Why May a Producer not be Complete?
• The language of views is more restricted than the language of queries. Hence, republishers may be unable to say exactly what they publish
• Archivers may archive in lossy mode
• Producers may lose tuples
• A producer may not know everything about the real world
• Open to debate
Summary (1)
Monitoring data come in pools and streams
Global schema
• primary keys
Types of stream queries
• continuous vs. history vs. latest snapshot
Producers
• DB producers: publish a database
• stream producers: lossless vs. lossy communication modes