OGSA-based Grid Workload Monitoring

R. Zhang¹, S. Heisig², S. Moyle¹ and S. McKeever¹ ¹ Oxford University Computing Laboratory ² IBM T.J. Watson Research Centre

Complicated Systems • The Open Grid Services Architecture (OGSA) is, in a nutshell: The Grid + Web Services • While OGSA brings computational power and interoperability, it also inevitably brings dynamics and complexity

Complicated Problems • For instance, the system has been slow (i.e. SLA violation) in the past hour • What is causing the problem? • How can it be fixed and prevented? • We must find out: • Grid services (and underlying platforms) touched • Time spent on services (and underlying platforms) • End-to-end response time composition

Monitoring: The First Step • We need to trace work across Grid services from end to end, monitoring workload and reporting data. • “If you don’t measure it, you can’t control it.” (TQM) • Workload monitoring is the first step towards achieving a self-managing and self-optimising system.

Instrumentation • Monitoring points inserted into common (OGSA-based Grid) middleware. • Requests given a unique ID and traced through the system.
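The idea of tagging each request with a unique ID at the first monitoring point and carrying it through every later one can be sketched as follows. This is a minimal illustration in Python, not the authors' middleware code: the decorator `monitoring_point`, the in-memory `TRACE_LOG`, and the dictionary-shaped request are all hypothetical stand-ins for hooks that would live inside the container and SOAP layers.

```python
import uuid

# Hypothetical in-memory sink; a real monitoring point would hand events to an agent.
TRACE_LOG = []

def monitoring_point(name):
    """Wrap a handler so that entry and exit are logged under the request's ID."""
    def wrap(handler):
        def instrumented(request):
            # The first monitoring point a request reaches assigns the ID;
            # later points reuse it because it travels with the request.
            request.setdefault("request_id", uuid.uuid4().hex)
            TRACE_LOG.append((request["request_id"], name, "enter"))
            try:
                return handler(request)
            finally:
                TRACE_LOG.append((request["request_id"], name, "exit"))
        return instrumented
    return wrap

@monitoring_point("Axis@eD")
def axis_layer(request):
    return "result for " + request["query"]

@monitoring_point("Tomcat@eD")
def tomcat_layer(request):
    # The outer layer forwards the same request object, ID included.
    return axis_layer(request)

response = tomcat_layer({"query": "q1"})
for event in TRACE_LOG:
    print(event)
```

Because the ID is attached to the request rather than to any one component, the events logged by different layers can later be joined into a single end-to-end trace.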

Measurement • A timer at every monitoring point measures the local response time. • Subtraction of local timer readings gives the elapsed time, so no clock synchronisation is needed. • Monitoring points, each with a Start/Stop timer pair: 0 (Client), 1 (Tomcat@eD), 2 (Axis@eD), 3 (Tomcat@Ogsa-Dai), 4 (Axis@Ogsa-Dai)
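A small worked example may help show why local timers are enough. The sketch below assumes the five monitoring points are strictly nested (each one's interval contains the next one's) and uses made-up readings taken from clocks with unrelated offsets; the function names and all numbers are illustrative only.

```python
def elapsed(start_ms, stop_ms):
    """Elapsed time at one monitoring point; both readings come from the same clock."""
    return stop_ms - start_ms

def time_per_layer(readings):
    """Split end-to-end time into the share spent at each nested layer.

    `readings` lists (name, start_ms, stop_ms) from the outermost point inwards.
    A layer's own share is its elapsed time minus the elapsed time of the
    layer nested directly inside it.
    """
    totals = [(name, elapsed(start, stop)) for name, start, stop in readings]
    breakdown = {}
    for i, (name, total) in enumerate(totals):
        inner = totals[i + 1][1] if i + 1 < len(totals) else 0
        breakdown[name] = total - inner
    return breakdown

# Hypothetical readings: the three hosts' clocks disagree wildly, which is harmless
# because each subtraction only uses readings from a single clock.
readings = [
    ("Client",          1000,   1120),
    ("Tomcat@eD",       500010, 500110),
    ("Axis@eD",         500015, 500105),
    ("Tomcat@Ogsa-Dai", 72000,  72040),
    ("Axis@Ogsa-Dai",   72005,  72035),
]

breakdown = time_per_layer(readings)
print(breakdown)
```

In this sketch the per-layer shares add up to the client's end-to-end time of 120 ms. The share attributed to an outer layer also includes the network time between it and the next layer, since nothing measures the wire separately.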

Reporting • Data are batched and aggregated at agents to reduce reporting overhead. • Data are reported via the Java Message Service (JMS) to provide reliability and scalability.
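Batching and aggregation at an agent might look roughly like the sketch below. It is written in Python for brevity and makes several assumptions: the class `ReportingAgent`, the choice of count/total/max as the aggregates, and the `send` callable (standing in for whatever publishes a message to the JMS provider) are all hypothetical.

```python
from collections import defaultdict

class ReportingAgent:
    """Collects local measurements and reports one summary per batch."""

    def __init__(self, send, batch_size=100):
        self.send = send              # stand-in for a message producer
        self.batch_size = batch_size
        self._pending = 0
        self._stats = defaultdict(lambda: {"count": 0, "total_ms": 0, "max_ms": 0})

    def record(self, service, elapsed_ms):
        stats = self._stats[service]
        stats["count"] += 1
        stats["total_ms"] += elapsed_ms
        stats["max_ms"] = max(stats["max_ms"], elapsed_ms)
        self._pending += 1
        if self._pending >= self.batch_size:
            self.flush()

    def flush(self):
        """Send one aggregated message for everything recorded since the last flush."""
        if self._pending == 0:
            return
        self.send({name: dict(stats) for name, stats in self._stats.items()})
        self._stats.clear()
        self._pending = 0

sent = []  # captures outgoing messages instead of publishing them
agent = ReportingAgent(sent.append, batch_size=3)
for service, ms in [("Axis@eD", 50), ("Axis@eD", 70), ("Tomcat@eD", 10), ("Axis@eD", 40)]:
    agent.record(service, ms)
agent.flush()
print(sent)
```

Four measurements produce two messages here instead of four, which is the overhead reduction the slide refers to; the trade-off is that per-request detail is lost unless the agent also forwards raw samples.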

Concurrency Issue • Parallel invocation is common in practice. For example, Grid service A calls B and D in parallel, and then C after B and D return. • Concurrency is modelled by a response time service Petri-Net (RTSPN), which is constructed automatically from the data collected.
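The timing consequence of the A, B, D, C example can be written down directly: parallel branches contribute the maximum of their response times, sequential stages add up. The sketch below only illustrates that fork/join arithmetic with invented numbers; it is not the RTSPN formalism itself, and `composed_response_time` is a hypothetical helper.

```python
def composed_response_time(local_ms, stages):
    """Response time of a caller that runs `stages` one after another.

    Each stage maps service name -> response time for calls issued in parallel;
    the stage finishes only when its slowest call returns.
    """
    return local_ms + sum(max(stage.values()) for stage in stages)

def critical_path(stages):
    """The slowest service of each stage, i.e. the calls that determine the total."""
    return [max(stage, key=stage.get) for stage in stages]

# A calls B and D in parallel, then C once both have returned (illustrative times).
stages = [{"B": 30, "D": 45}, {"C": 20}]
total = composed_response_time(5, stages)   # 5 ms of A's own work + 45 + 20
path = critical_path(stages)
print(total, path)
```

Treating the calls as purely sequential would give 5 + 30 + 45 + 20 = 100 ms rather than 70 ms, and would wrongly suggest that speeding up B helps, which is why the concurrency has to be captured in the model.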

Experiment in eDiaMoND Setting

Monitoring Data in DB

Visualisation Screen Shot

Conclusions • We have developed a monitoring infrastructure for OGSA-based Grids that: • discovers services touched; • monitors workload in an end-to-end manner; • captures concurrency in workload; • provides automated visualisation; • is portable (thanks to OGSA), scalable and lightweight (5 ms per request per service).

Future Work • The current infrastructure has enabled research on: • Performance problem determination; • End-to-end performance tuning/service differentiation; • Real eDiaMoND workload data collection; • Instrumentation with finer granularity.

We are grateful to • DTI for project grant • IBM for software/research support • eDiaMoND for experiment environment • all of you for coming along • Questions?

RTSPN Construction • Automatic construction from data • Each service receives ID of the service invoking it. • Each service receives IDs from services it depends on: • workflow description • temporal relation
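One way to picture the automatic construction is as two steps: group the collected records by the ID of the invoking service, then use the temporal relation between sibling calls to decide which ran in parallel. The sketch below is an assumption-laden illustration, not the authors' algorithm: the record layout, the function names, and the rule "overlapping intervals mean parallel" are hypothetical, and the start/stop values are taken to be readings on the caller's own clock.

```python
from collections import defaultdict

def children_by_caller(records):
    """Group invocation records under the service that issued them."""
    tree = defaultdict(list)
    for record in records:
        if record["caller"] is not None:
            tree[record["caller"]].append(record)
    for calls in tree.values():
        calls.sort(key=lambda r: r["start"])
    return tree

def group_into_stages(calls):
    """Split calls (sorted by start) into sequential stages of overlapping calls."""
    stages = []
    stage_end = None
    for call in calls:
        if stages and call["start"] < stage_end:
            # Started before the current stage finished: treat as parallel.
            stages[-1].append(call["service"])
            stage_end = max(stage_end, call["stop"])
        else:
            # Started after everything so far returned: a new sequential stage.
            stages.append([call["service"]])
            stage_end = call["stop"]
    return stages

# Illustrative records for: A calls B and D in parallel, then C.
records = [
    {"service": "A", "caller": None, "start": 0,  "stop": 100},
    {"service": "B", "caller": "A",  "start": 5,  "stop": 35},
    {"service": "D", "caller": "A",  "start": 6,  "stop": 50},
    {"service": "C", "caller": "A",  "start": 55, "stop": 75},
]

tree = children_by_caller(records)
stages = group_into_stages(tree["A"])
print(stages)
```

Each stage with more than one service would correspond to a fork/join pair in the net, and consecutive stages to a sequence. Where a workflow description is available it could replace the overlap heuristic, since timing alone cannot distinguish intended parallelism from coincidental overlap.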
