The Grid Job Monitoring Service. Luděk Matyska et al. CESNET, z.s.p.o. Prague Czech Republic. Motivation. Job tracking Too complex environment Responsibility delegation Independent decision by components Security issues (only delegated contact) Parallel and multipart jobs
The Grid Job Monitoring Service Luděk Matyska et al. CESNET, z.s.p.o. Prague Czech Republic
Motivation • Job tracking • Too complex environment • Responsibility delegation • Independent decision by components • Security issues (only delegated contact) • Parallel and multipart jobs • Too many sub-tasks • View aggregation
The Logging and Bookkeeping Service • Collects events associated with job life, e.g. • Job submitted • Resource found • Job started on a CE (Computing Element) • Job finished its computation • Stores them in bookkeeping and logging databases • Provides the job state to end users
LB service architecture • Two APIs • logging API • server API • Local logger service • The database servers
Architecture: Comments • Message format: • ULM based (NetLogger) • Semantic rules prescribed • Local logger service • locallogger daemon • interlogger daemon • local persistency (local disk file) • Data transfer to database servers • Bookkeeping server: persistent during the job lifetime • Logging server: “eternally” persistent
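The ULM-based message format mentioned above can be illustrated with a minimal sketch. It assumes the usual ULM convention of one event per line, written as space-separated KEY=value pairs with quoted values where needed. The helper names `format_ulm` and `parse_ulm`, and the field names JOBID and REASON, are hypothetical and not taken from the LB service itself.

```python
import shlex

# Characters that force a value to be wrapped in double quotes.
_NEEDS_QUOTING = set("\"'\\")


def format_ulm(fields):
    """Serialize a dict as one ULM-style line of KEY=value pairs."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if (text == ""
                or any(c.isspace() for c in text)
                or any(c in _NEEDS_QUOTING for c in text)):
            # Escape backslashes and double quotes, then quote the value.
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            text = '"' + escaped + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def parse_ulm(line):
    """Parse one ULM-style line back into a dict of strings."""
    fields = {}
    for token in shlex.split(line):
        key, _, value = token.partition("=")
        fields[key] = value
    return fields


# Illustrative event; every value here is made up.
event = {
    "DATE": "20020101120000.000000",
    "HOST": "ce01.example.org",
    "PROG": "jobmanager",
    "LVL": "Usage",
    "JOBID": "https://lb.example.org:9000/abc123",
    "REASON": "job started on CE",
}
line = format_ulm(event)
```

A flat key=value line keeps the local disk file appendable and easy to replay, which suits the local persistency requirement.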
Logging API • Simple • Just one function dg_log_event() • Always stores date/time, event producer, jobID • Authenticated
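As a rough illustration of a single-function logging call, the sketch below stamps every event with date/time, producer and jobID and appends it to a local spool file, which a forwarding step then reads. It is not the signature of dg_log_event(), which the slides do not give; `log_event`, `forward_spool`, the JSON record layout and all field names are assumptions, and authentication is left out.

```python
import json
import os
import time


def log_event(spool_path, producer, job_id, event_type, **extra):
    """Append one event to the local spool file and return the record.

    The mandatory fields are written last so that caller-supplied
    extras can never overwrite them.
    """
    record = dict(extra)
    record.update(
        timestamp=time.time(),
        producer=producer,
        job_id=job_id,
        event=event_type,
    )
    with open(spool_path, "a", encoding="utf-8") as spool:
        spool.write(json.dumps(record) + "\n")
        spool.flush()
        os.fsync(spool.fileno())  # make the event survive a crash
    return record


def forward_spool(spool_path, send):
    """Replay every spooled event through `send`; return the count.

    Stands in for the interlogger step; the spool is not truncated here.
    """
    count = 0
    with open(spool_path, encoding="utf-8") as spool:
        for line in spool:
            if line.strip():
                send(json.loads(line))
                count += 1
    return count
```

Writing to disk before any network transfer means the caller returns quickly and events are not lost if the database server is unreachable.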
Server API • State computed on-demand • Three core functions: • List of user’s jobs • Job status for a given job • List of events related to a given job • Authenticated
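The three core queries and the on-demand state computation can be sketched with an in-memory store. This is only an illustration: the class `BookkeepingStore`, the event kinds and the state names are hypothetical, and the real service's state machine is not described in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    job_id: str
    owner: str
    timestamp: float
    kind: str


# Hypothetical mapping from event kind to the state it implies.
STATE_AFTER = {
    "submitted": "SUBMITTED",
    "resource_found": "SCHEDULED",
    "started": "RUNNING",
    "finished": "DONE",
}


class BookkeepingStore:
    """Stores raw events only; job state is derived when asked for."""

    def __init__(self):
        self._events = []

    def add(self, event):
        self._events.append(event)

    def list_user_jobs(self, owner):
        """Query 1: all job IDs belonging to a user."""
        return sorted({e.job_id for e in self._events if e.owner == owner})

    def job_events(self, job_id):
        """Query 3: events of one job, ordered by timestamp."""
        matching = (e for e in self._events if e.job_id == job_id)
        return sorted(matching, key=lambda e: e.timestamp)

    def job_status(self, job_id):
        """Query 2: fold the job's events into its current state."""
        state = "UNKNOWN"
        for event in self.job_events(job_id):
            state = STATE_AFTER.get(event.kind, state)
        return state
```

Because no state is stored, events that arrive out of order are handled simply by sorting at query time.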
Job Identification • GRID-wide (global) identifier • Used to identify the appropriate bookkeeping server • Currently “wired in” • In the future probably via Information service • URL-like syntax: https://hostname:port/unique_string?... • unique_string: distinguishes individual jobs • Bookkeeping server “speaks” the HTTPS protocol
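Since the job identifier is URL-like and names its bookkeeping server, a client can recover the server address from the ID alone. The sketch below uses Python's standard `urllib.parse.urlsplit`; the `JobId` type, the function name and the example host are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class JobId(NamedTuple):
    server_host: str   # bookkeeping server to contact
    server_port: int
    unique: str        # distinguishes individual jobs on that server
    query: str         # anything after '?', kept verbatim


def parse_job_id(job_id):
    """Split an https://hostname:port/unique_string job ID into parts."""
    parts = urlsplit(job_id)
    if parts.scheme != "https":
        raise ValueError(f"job ID must use https: {job_id!r}")
    if not parts.hostname or parts.port is None:
        raise ValueError(f"job ID must name host and port: {job_id!r}")
    unique = parts.path.lstrip("/")
    if not unique:
        raise ValueError(f"job ID has no unique string: {job_id!r}")
    return JobId(parts.hostname, parts.port, unique, parts.query)
```

For example, `parse_job_id("https://lb.example.org:9000/abc123")` yields the host `lb.example.org`, the port `9000` and the unique string `abc123`.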
Security Considerations • Authentication • Both for logging and database queries • Certificate based (user and/or host/service) • User associated with jobID on first authenticated event • Secure channels • Storage (database) access
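The rule that a user becomes associated with a jobID on the first authenticated event can be sketched as a first-writer-wins binding. This is a simplified model under stated assumptions: `OwnerRegistry` is a hypothetical name, the subject strings stand in for already verified certificate subjects, and certificate validation itself is not shown.

```python
class OwnerRegistry:
    """Binds each job ID to the first authenticated user seen for it."""

    def __init__(self):
        self._owner = {}

    def bind_owner(self, job_id, subject):
        """Record `subject` as owner unless the job already has one.

        Returns the effective owner, which may differ from `subject`.
        """
        return self._owner.setdefault(job_id, subject)

    def may_query(self, job_id, subject):
        """Allow queries only from the bound owner of a known job."""
        return self._owner.get(job_id) == subject
```

Later events carrying a different subject do not change the binding, so a job cannot be taken over after its first authenticated event.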
R-GMA Integration • Work in progress • The goals: • To lower database load • To provide notification service • To allow better integration with other information services
LB Service Extensions • User defined attributes • To store additional information associated with a job • To retrieve job collections • Synchronous API • Job checkpointing (at the application level) • Information stored in Bookkeeping server
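Retrieving a job collection through user-defined attributes amounts to filtering jobs by the tags their owner attached. A minimal sketch, with a hypothetical function name and made-up attribute names:

```python
def select_jobs(job_attributes, **wanted):
    """Return IDs of jobs whose attributes match every wanted pair.

    `job_attributes` maps job ID -> dict of user-defined attributes.
    """
    return sorted(
        job_id
        for job_id, attrs in job_attributes.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


# Illustrative data only.
tagged = {
    "job-1": {"experiment": "run42", "stage": "simulate"},
    "job-2": {"experiment": "run42", "stage": "analyze"},
    "job-3": {"experiment": "run43", "stage": "simulate"},
}
```

With this data, `select_jobs(tagged, experiment="run42")` returns `job-1` and `job-2`.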
Job Partitioning • Group ID • Job collections • Hierarchical • Aggregate queries
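An aggregate query over a hierarchical job collection can be sketched as a recursive count of leaf-job states. The data layout (a parent-to-children mapping plus a state per leaf job) and the function name are assumptions made for illustration.

```python
from collections import Counter


def aggregate_states(job_id, children, states):
    """Count the states of all leaf jobs beneath `job_id`.

    `children` maps a collection ID to its member IDs; an ID without
    an entry is treated as a leaf job whose state is in `states`.
    """
    members = children.get(job_id, [])
    if not members:
        return Counter([states[job_id]])
    total = Counter()
    for member in members:
        total += aggregate_states(member, children, states)
    return total


# Illustrative two-level collection.
children = {"collection": ["a", "sub"], "sub": ["b", "c"]}
states = {"a": "RUNNING", "b": "DONE", "c": "DONE"}
summary = aggregate_states("collection", children, states)
```

The same traversal works at any depth, so a sub-collection can be queried on its own with the same call.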
Conclusion • LB service provides • Job tracking • Persistent event storage • Job state provision • Future work • (R-)GMA integration • Authorization • Collective operations