
OGSA-based Grid Workload Monitoring

R. Zhang (1), S. Heisig (2), S. Moyle (1) and S. McKeever (1). (1) Oxford University Computing Laboratory; (2) IBM T.J. Watson Research Centre.


Presentation Transcript


  1. OGSA-based Grid Workload Monitoring • R. Zhang (1), S. Heisig (2), S. Moyle (1) and S. McKeever (1) • (1) Oxford University Computing Laboratory • (2) IBM T.J. Watson Research Centre

  2. Complicated Systems • The Open Grid Services Architecture (OGSA) is, in a nutshell, the Grid + Web Services. • While OGSA brings computational power and interoperability, it also inevitably brings dynamics and complexity.

  3. Complicated Problems • For instance, the system has been slow (i.e. an SLA violation) over the past hour. • What is causing the problem? • How can it be fixed and prevented? • We must find out: • Grid services (and underlying platforms) touched • Time spent on services (and underlying platforms) • End-to-end response time composition

  4. Monitoring: The First Step • We need to trace work across Grid services from end to end, monitoring workload and reporting data. • "If you don't measure it, you can't control it." (TQM) • Workload monitoring is the first step towards a self-managing and self-optimising system.

  5. Instrumentation • Monitoring points inserted into common (OGSA-based Grid) middleware. • Requests given a unique ID and traced through the system.
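The idea of tagging each request with a unique ID and recording it at every monitoring point can be sketched as follows. This is a minimal illustration only: the `Tracer` class, `call_service` wrapper and in-memory event list are hypothetical names, not the middleware hooks used in the talk.

```python
import uuid


class Tracer:
    """Hypothetical in-memory tracer (illustrative, not the talk's code)."""

    def __init__(self):
        # Each event is (request_id, monitoring_point, "start" | "stop").
        self.events = []

    def new_request_id(self):
        # A random UUID stands in for whatever unique ID scheme is used.
        return uuid.uuid4().hex

    def record(self, request_id, point, event):
        self.events.append((request_id, point, event))


def call_service(tracer, request_id, point, handler):
    """Wrap a handler with start/stop events and pass the ID downstream."""
    tracer.record(request_id, point, "start")
    try:
        return handler(request_id)
    finally:
        tracer.record(request_id, point, "stop")


tracer = Tracer()
rid = tracer.new_request_id()

# One request passing through two nested monitoring points.
result = call_service(
    tracer, rid, "Tomcat@eD",
    lambda r: call_service(tracer, r, "Axis@eD", lambda _: "ok"),
)

# The monitoring points touched by this request, in invocation order.
points_touched = [p for (r, p, e) in tracer.events if r == rid and e == "start"]
```

Because the ID travels with the request, filtering the recorded events by that ID recovers which points a given request touched.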

  6. Measurement • A timer at every monitoring point measures the local response time. • Each elapsed time is the difference of two readings from the same local clock, so no clock synchronisation is needed. • [Figure: Start/Stop timestamp pairs at monitoring points 0 (Client), 1 (Tomcat@eD), 2 (Axis@eD), 3 (Tomcat@Ogsa-Dai), 4 (Axis@Ogsa-Dai)]
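A small worked example may help show why subtraction avoids clock synchronisation. The sketch assumes the five monitoring points named on the slide are strictly nested (each one calls the next); the timestamps are made up, and the clocks are deliberately given unrelated offsets.

```python
# (monitoring point, local start, local stop) in milliseconds.
# Each pair is read from that point's own clock; the numbers are invented.
readings = [
    ("Client", 1000, 1120),
    ("Tomcat@eD", 50300, 50400),
    ("Axis@eD", 50310, 50395),
    ("Tomcat@Ogsa-Dai", 7000, 7060),
    ("Axis@Ogsa-Dai", 7005, 7050),
]


def elapsed(readings):
    """Elapsed time per point: stop minus start on the same local clock."""
    return [(name, stop - start) for name, start, stop in readings]


def self_times(durations):
    """Time spent at each point itself: its elapsed time minus the
    elapsed time of the point nested directly inside it (assumed chain)."""
    result = []
    for i, (name, total) in enumerate(durations):
        nested = durations[i + 1][1] if i + 1 < len(durations) else 0
        result.append((name, total - nested))
    return result


durations = elapsed(readings)
composition = self_times(durations)
end_to_end = durations[0][1]
```

Under the nesting assumption, the per-point self times add up to the end-to-end time seen by the client, which is the response time composition the talk is after.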

  7. Reporting • Data batched and aggregated at agents to reduce reporting overhead. • Data reported with Java Messaging Service (JMS) to provide reliability and scalability.
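Batching and aggregation at an agent could look roughly like the sketch below. The `Agent` class, its `batch_size` parameter and the report layout are assumptions for illustration; the `send` callable stands in for a JMS publisher, which is not modelled here.

```python
from collections import defaultdict


class Agent:
    """Hypothetical agent that aggregates measurements before reporting."""

    def __init__(self, send, batch_size=3):
        self.send = send              # placeholder for a JMS publish call
        self.batch_size = batch_size  # measurements per report (assumed)
        self._count = 0
        self._stats = defaultdict(lambda: [0, 0.0])  # service -> [n, total ms]

    def observe(self, service, elapsed_ms):
        entry = self._stats[service]
        entry[0] += 1
        entry[1] += elapsed_ms
        self._count += 1
        if self._count >= self.batch_size:
            self.flush()

    def flush(self):
        """Send one aggregated report for everything buffered so far."""
        if self._count == 0:
            return
        report = {
            service: {"count": n, "mean_ms": total / n}
            for service, (n, total) in self._stats.items()
        }
        self.send(report)
        self._stats.clear()
        self._count = 0


sent = []
agent = Agent(sent.append, batch_size=3)
for service, ms in [("Axis@eD", 10.0), ("Axis@eD", 20.0),
                    ("Tomcat@eD", 30.0), ("Axis@eD", 40.0)]:
    agent.observe(service, ms)
```

Four measurements here produce one message rather than four, with the last measurement still buffered until the next flush.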

  8. Concurrency Issue • Parallel invocation is common in practice. For example, Grid service A calls B and D in parallel, and then calls C after both B and D return. • Concurrency is modelled by a response time service Petri net (RTSPN), which is constructed automatically from the data collected.
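The effect of parallel invocation on response time can be illustrated with a simple fork-join calculation. This is not the RTSPN itself, only a simplified stand-in: the tuple-based workflow encoding and all timings are invented for the example.

```python
def response_time(node):
    """Response time of a workflow node.

    ("service", name, ms) -> ms
    ("seq", [nodes])      -> sum of the parts (one after another)
    ("par", [nodes])      -> max of the parts (wait for the slowest)
    """
    kind = node[0]
    if kind == "service":
        return node[2]
    if kind == "seq":
        return sum(response_time(child) for child in node[1])
    if kind == "par":
        return max(response_time(child) for child in node[1])
    raise ValueError(f"unknown node kind: {kind}")


# Service A does 5 ms of its own work, calls B and D in parallel,
# then calls C once both have returned (timings are made up).
workflow = ("seq", [
    ("service", "A (own work)", 5),
    ("par", [("service", "B", 30), ("service", "D", 50)]),
    ("service", "C", 20),
])

total = response_time(workflow)
```

Summing all four times would overstate the response time by the duration of B, because B's work is hidden behind the slower D. That is why the concurrency structure has to be captured rather than just the list of services touched.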

  9. Experiment in eDiamond Setting

  10. Monitoring Data in DB

  11. Visualisation Screen Shot

  12. Conclusions • We have developed a monitoring infrastructure for OGSA-based Grids that: • discovers services touched; • monitors workload in an end-to-end manner; • captures concurrency in workload; • provides automated visualisation; • is portable (thanks to OGSA), scalable and lightweight (5 ms per request per service).

  13. Future Work • The current infrastructure has enabled research on: • Performance problem determination; • End-to-end performance tuning/service differentiation • Real eDiamond workload data collection; • Instrumentation with finer granularity

  14. We are grateful to • DTI for project grant • IBM for software/research support • eDiaMoND for experiment environment • all of you for coming along • Questions?

  15. RTSPN Construction • Automatic construction from data • Each service receives ID of the service invoking it. • Each service receives IDs from services it depends on: • workflow description • temporal relation
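One way to recover the workflow structure from collected data is sketched below. It is a simplified illustration rather than the RTSPN construction algorithm: the record layout is hypothetical, timestamps are assumed to be taken on the caller's clock, and overlapping intervals are treated as evidence of parallel invocation.

```python
from collections import defaultdict

# (service, invoked_by, start_ms, stop_ms); times as seen by the caller.
# The values are invented for the A / B, D / C example.
records = [
    ("A", None, 0, 80),
    ("B", "A", 5, 35),
    ("D", "A", 5, 55),
    ("C", "A", 55, 75),
]


def children_by_caller(records):
    """Workflow description: which services each service invoked."""
    tree = defaultdict(list)
    for service, caller, start, stop in records:
        if caller is not None:
            tree[caller].append((start, stop, service))
    return tree


def stages(calls):
    """Temporal relation: group calls with overlapping intervals into
    one parallel stage; stages themselves run one after another."""
    result = []
    stage_end = None
    for start, stop, service in sorted(calls):
        if stage_end is not None and start < stage_end:
            result[-1].append(service)
            stage_end = max(stage_end, stop)
        else:
            result.append([service])
            stage_end = stop
    return result


tree = children_by_caller(records)
a_stages = stages(tree["A"])
```

For the sample records, the calls made by A fall into a parallel stage containing B and D, followed by a stage containing C, matching the example from the concurrency slide.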
