
The Grid Job Monitoring Service



Presentation Transcript


  1. The Grid Job Monitoring Service
     Luděk Matyska et al.
     CESNET, z.s.p.o., Prague, Czech Republic

  2. Motivation
     • Job tracking
     • Too complex environment
     • Responsibility delegation
     • Independent decision by components
     • Security issues (only delegated contact)
     • Parallel and multipart jobs
     • Too many sub-tasks
     • View aggregation

  3. Job Movement

  4. The Logging and Bookkeeping Service
     • Collects events associated with job life, e.g.
       • Job submitted
       • Resource found
       • Job started on a CE (Computing Element)
       • Job finished its computation
     • Stores them in bookkeeping and logging databases
     • Provides the job state to end users

  5. Job Life Cycle

  6. LB service architecture
     • Two APIs
       • logging API
       • server API
     • Local logger service
     • The database servers

  7. Architecture — Schema

  8. Architecture: Comments
     • Message format:
       • ULM based (NetLogger)
       • Semantic rules prescribed
     • Local logger service
       • locallogger daemon
       • interlogger daemon
       • local persistency (local disk file)
     • Data transfer to database servers
       • Bookkeeping server: persistent during the job life time
       • Logging server: “eternally” persistent
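The ULM-based message format can be illustrated with a minimal sketch of key=value encoding. The fields DATE, HOST, PROG and LVL follow the general ULM convention; the JOBID and EVENT keys, the host names, the date layout, and the shell-style quoting rule are assumptions made for this example, not the semantic rules actually prescribed by the LB service.

```python
import shlex
from datetime import datetime, timezone


def format_ulm(fields: dict) -> str:
    """Encode fields as space-separated KEY=value pairs.

    Values containing spaces or other special characters are quoted
    (shell-style quoting is an assumption of this sketch).
    """
    return " ".join(f"{key}={shlex.quote(str(value))}" for key, value in fields.items())


def parse_ulm(line: str) -> dict:
    """Decode a line produced by format_ulm back into a dict of strings."""
    return dict(token.split("=", 1) for token in shlex.split(line))


# Hypothetical event: all names and values below are illustrative only.
message = format_ulm({
    "DATE": datetime(2003, 1, 1, 12, 0, 0, tzinfo=timezone.utc).strftime("%Y%m%d%H%M%S"),
    "HOST": "ce.example.org",
    "PROG": "jobmanager",
    "LVL": "Usage",
    "JOBID": "https://lb.example.org:9000/abc123",
    "EVENT": "Job started",
})
fields = parse_ulm(message)
```

A line-oriented text format of this kind is easy to append to the local disk file kept by the local logger and to forward unchanged to the database servers.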

  9. Logging API
     • Simple
     • Just one function dg_log_event()
     • Always stores date/time, event producer, jobID
     • Authenticated
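The idea of a single logging call that always stamps the same mandatory fields can be sketched as follows. This is a hypothetical Python analogue, not the signature of dg_log_event(), which the slides do not give; the function name, the JSON encoding, and the field names are assumptions, and authentication is left out.

```python
import io
import json
import time


def log_event(sink, job_id: str, producer: str, event_type: str, **details) -> dict:
    """Append one event to `sink` (any object with write()) and return the record.

    The mandatory fields are written last so that caller-supplied details
    cannot overwrite them.
    """
    record = {
        **details,
        "type": event_type,
        "timestamp": time.time(),  # date/time: always stored
        "producer": producer,      # event producer: always stored
        "job_id": job_id,          # jobID: always stored
    }
    sink.write(json.dumps(record) + "\n")
    return record


# Usage with an in-memory sink standing in for the local logger.
buffer = io.StringIO()
log_event(buffer, "https://lb.example.org:9000/abc123", "workload-manager",
          "JobSubmitted", queue="short")
stored = json.loads(buffer.getvalue())
```

Keeping the mandatory fields inside the one entry point means that no caller can produce an event the server is unable to attribute to a job.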

  10. Server API
      • State computed on-demand
      • Three core functions:
        • List of user’s jobs
        • Job status for a given job
        • List of events related to a given job
      • Authenticated
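Computing the state on demand means that the server stores only events and derives the current state when asked. A minimal in-memory sketch of the three core functions follows; the event type names, the state names, and the class and method names are all hypothetical, and authentication is omitted.

```python
from dataclasses import dataclass

# Hypothetical mapping from event type to the job state it implies.
STATE_AFTER = {
    "JobSubmitted": "submitted",
    "ResourceFound": "scheduled",
    "JobStarted": "running",
    "JobFinished": "done",
}


@dataclass(frozen=True)
class Event:
    job_id: str
    owner: str
    timestamp: float
    type: str


class BookkeepingStore:
    """Keeps raw events only; job state is never stored, only derived."""

    def __init__(self):
        self._events = []

    def store(self, event: Event) -> None:
        self._events.append(event)

    def list_jobs(self, owner: str) -> list:
        """List of the user's jobs."""
        return sorted({e.job_id for e in self._events if e.owner == owner})

    def job_events(self, job_id: str) -> list:
        """List of events related to a given job, ordered by timestamp."""
        return sorted((e for e in self._events if e.job_id == job_id),
                      key=lambda e: e.timestamp)

    def job_status(self, job_id: str) -> str:
        """Job status, replayed from the stored events on every call."""
        state = "unknown"
        for event in self.job_events(job_id):
            state = STATE_AFTER.get(event.type, state)
        return state


store = BookkeepingStore()
# Events may arrive out of order; sorting by timestamp restores the sequence.
store.store(Event("job-1", "alice", 3.0, "JobStarted"))
store.store(Event("job-1", "alice", 1.0, "JobSubmitted"))
store.store(Event("job-1", "alice", 2.0, "ResourceFound"))
store.store(Event("job-2", "bob", 1.5, "JobSubmitted"))
```

Because the state is replayed from the event list, a late-arriving event changes the answer of the next query without any stored state having to be repaired.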

  11. Job Identification
      • GRID-wide (global) identifier
      • Used to identify the appropriate bookkeeping server
      • Currently “wired in”
      • In the future probably via Information service
      • URL-like syntax: https://hostname:port/unique_string?...
      • unique_string: to distinguish individual jobs
      • Bookkeeping server “speaks” https protocol
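Because the job identifier has URL-like syntax, the bookkeeping server can be located by splitting the identifier itself. The sketch below uses the standard urllib.parse module; the function name and the example identifier are hypothetical, and the part after "?" is ignored because the slides do not specify it.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id: str) -> tuple:
    """Split a job identifier into (hostname, port, unique_string).

    Hostname and port name the bookkeeping server; unique_string
    distinguishes the individual job on that server.
    """
    parts = urlsplit(job_id)
    if parts.scheme != "https" or parts.hostname is None or parts.port is None:
        raise ValueError(f"not a valid job identifier: {job_id!r}")
    unique_string = parts.path.lstrip("/")
    if not unique_string:
        raise ValueError(f"job identifier has no unique string: {job_id!r}")
    return parts.hostname, parts.port, unique_string


# Hypothetical identifier; host, port and unique string are illustrative.
host, port, unique = parse_job_id("https://lb.example.org:9000/abc123")
```

Embedding the server address in the identifier is what allows the lookup to be “wired in” without a separate directory service.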

  12. Security Considerations
      • Authentication
        • Both for logging and database queries
        • Certificate based (user and/or host/service)
        • User associated with jobID on first authenticated event
      • Secure channels
      • Storage (database) access

  13. R-GMA Integration
      • Work in progress
      • The goals:
        • To lower database load
        • To provide notification service
        • To allow better integration with other information services

  14. R-GMA—First Extension

  15. LB Service Extensions
      • User defined attributes
        • To store additional information associated with a job
        • To retrieve job collections
      • Synchronous API
      • Job checkpointing (at the application level)
        • Information stored in Bookkeeping server

  16. Job Partitioning
      • Group ID
      • Job collections
      • Hierarchical
      • Aggregate queries
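An aggregate query over a hierarchical job collection can be sketched as a walk over the group tree that counts the states of all jobs beneath a group ID. The data layout (one mapping from group to members, one from job to state) and every name below are assumptions for illustration; the slides do not describe how the LB service represents groups. The sketch also assumes the hierarchy is a tree, with no cycles.

```python
from collections import Counter


def aggregate_states(group_id: str, children: dict, states: dict) -> Counter:
    """Count job states over every job below `group_id`, recursively.

    children: group ID -> list of member IDs (sub-groups or jobs)
    states:   job ID   -> current state
    """
    counts = Counter()
    stack = [group_id]
    while stack:
        node = stack.pop()
        if node in states:            # a job: count its state
            counts[states[node]] += 1
        stack.extend(children.get(node, []))  # a group: descend into members
    return counts


# Hypothetical two-level collection.
children = {"G": ["G1", "job-3"], "G1": ["job-1", "job-2"]}
states = {"job-1": "running", "job-2": "done", "job-3": "done"}
summary = aggregate_states("G", children, states)
```

A single summary per group gives the aggregated view that tracking many sub-tasks individually does not.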

  17. Conclusion
      • LB service provides
        • Job tracking
        • Persistent event storage
        • Job state provision
      • Future work
        • (R-)GMA integration
        • Authorization
        • Collective operations

  18. Thank you for your interest
