Explore a comprehensive monitoring service for complex grid job environments, enabling job tracking, security, and delegation of responsibilities. The system offers services for logging, bookkeeping, and secure job movement tracking, ensuring efficient handling of parallel and multipart jobs.
The Grid Job Monitoring Service • Luděk Matyska et al. • CESNET, z.s.p.o. • Prague, Czech Republic
Motivation • Job tracking • Too complex environment • Responsibility delegation • Independent decision by components • Security issues (only delegated contact) • Parallel and multipart jobs • Too many sub-tasks • View aggregation
The Logging and Bookkeeping Service • Collects events associated with job life, e.g. • Job submitted • Resource found • Job started on a CE (Computing Element) • Job finished its computation • Stores them in bookkeeping and logging databases • Provides the job state to end users
LB service architecture • Two APIs • logging API • server API • Local logger service • The database servers
Architecture: Comments • Message format: • ULM based (NetLogger) • Semantic rules prescribed • Local logger service • locallogger daemon • interlogger daemon • local persistency (local disk file) • Data transfer to database servers • Bookkeeping server: persistent during the job lifetime • Logging server: “eternally” persistent
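To make the ULM-based message format concrete, here is a minimal sketch of a formatter and parser for one-line, space-separated KEY=value records in the ULM style. DATE, HOST, PROG and LVL follow the usual ULM field names; JOBID and EVENT, the example values, and the quoting rule are assumptions for illustration, not the LB service's actual schema.

```python
import shlex


def format_ulm(fields):
    """Render a dict as one line of space-separated KEY=value pairs."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        # Quote empty values and values with whitespace, quotes or backslashes
        # so the line can be split back into the same fields.
        needs_quotes = text == "" or any(
            ch.isspace() or ch in "\"'\\" for ch in text
        )
        if needs_quotes:
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


def parse_ulm(line):
    """Split a KEY=value line back into a dict (inverse of format_ulm)."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"token without '=': {token!r}")
        fields[key] = value
    return fields


# Hypothetical event record; JOBID and EVENT are illustrative field names.
example = {
    "DATE": "20020101120000.000000",
    "HOST": "ce01.example.org",
    "PROG": "locallogger",
    "LVL": "Usage",
    "JOBID": "https://lb.example.org:9000/a1b2c3",
    "EVENT": "job started",
}
line = format_ulm(example)
```

A flat text line of this kind is easy to append to a local disk file, which fits the local persistency role of the local logger service.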
Logging API • Simple • Just one function dg_log_event() • Always stores date/time, event producer, jobID • Authenticated
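The idea of a single logging entry point that always stamps the same mandatory fields can be sketched as follows. This is not the signature of dg_log_event(); the function name log_event, its parameters, and the in-memory sink are hypothetical stand-ins, and authentication is left out.

```python
import time


def log_event(job_id, producer, event_type, sink, clock=time.time, **attrs):
    """Append one event to `sink`, always adding time, producer and job ID.

    `sink` is any list-like object (hypothetical stand-in for the local
    logger); `clock` is injectable so the example is reproducible.
    """
    record = dict(attrs)  # optional, event-specific attributes
    # Mandatory fields are written last so callers cannot override them.
    record.update(
        {
            "timestamp": clock(),
            "producer": producer,
            "job_id": job_id,
            "event": event_type,
        }
    )
    sink.append(record)
    return record


events = []
log_event(
    "https://lb.example.org:9000/a1b2c3",  # hypothetical job ID
    "scheduler",
    "submitted",
    events,
    clock=lambda: 1.0,
    queue="short",
)
```

Keeping the mandatory fields inside the one function means no producer can emit an event that lacks a timestamp, a source, or a job ID.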
Server API • State computed on-demand • Three core functions: • List of user’s jobs • Job status for a given job • List of events related to a given job • Authenticated
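The three core queries, with the job state computed on demand from stored events, might look like the sketch below. The class name, the event names, the state names, and the event-to-state mapping are all assumptions made for illustration; the real service defines its own state machine and requires authentication.

```python
# Hypothetical mapping from event type to the job state it implies.
EVENT_TO_STATE = {
    "submitted": "SUBMITTED",
    "resource_found": "SCHEDULED",
    "started": "RUNNING",
    "finished": "DONE",
}


class BookkeepingStore:
    """In-memory stand-in for a bookkeeping server's event database."""

    def __init__(self, events):
        self._events = list(events)

    def job_events(self, job_id):
        """List of events related to a given job, oldest first."""
        matching = [e for e in self._events if e["job_id"] == job_id]
        return sorted(matching, key=lambda e: e["timestamp"])

    def job_status(self, job_id):
        """Replay the job's events on demand; no state is stored."""
        state = "UNKNOWN"
        for event in self.job_events(job_id):
            state = EVENT_TO_STATE.get(event["event"], state)
        return state

    def user_jobs(self, user):
        """List of job IDs that have at least one event owned by `user`."""
        return sorted({e["job_id"] for e in self._events if e.get("user") == user})


store = BookkeepingStore(
    [
        {"job_id": "job-1", "user": "alice", "event": "submitted", "timestamp": 1},
        {"job_id": "job-1", "user": "alice", "event": "started", "timestamp": 3},
        {"job_id": "job-1", "user": "alice", "event": "resource_found", "timestamp": 2},
        {"job_id": "job-2", "user": "alice", "event": "submitted", "timestamp": 4},
    ]
)
```

Because the state is derived by replaying events, events that arrive late or out of order are handled by the sort rather than by updating a stored status.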
Job Identification • GRID-wide (global) identifier • Used to identify the appropriate bookkeeping server • Currently “wired in” • In the future probably via Information service • URL-like syntax: https://hostname:port/unique_string?... • unique_string: distinguishes individual jobs • Bookkeeping server “speaks” the HTTPS protocol
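Since the job identifier is URL-like, the bookkeeping server address can be read straight out of it. A minimal sketch using Python's standard urllib.parse follows; the function name, the returned dictionary layout, and the example identifier are assumptions, and whatever follows the "?" is passed through uninterpreted.

```python
from urllib.parse import urlsplit


def parse_job_id(job_id):
    """Split a URL-like job ID into server address and unique string."""
    parts = urlsplit(job_id)
    if parts.scheme != "https":
        raise ValueError("job ID must use the https scheme")
    if not parts.hostname or parts.port is None:
        raise ValueError("job ID must name a host and a port")
    unique = parts.path.lstrip("/")
    if not unique:
        raise ValueError("job ID has no unique string")
    return {
        "server": (parts.hostname, parts.port),  # where to send queries
        "unique": unique,                        # distinguishes the job
        "query": parts.query,                    # left uninterpreted here
    }


# Hypothetical identifier in the https://hostname:port/unique_string?... form.
info = parse_job_id("https://lb.example.org:9000/a1b2c3?x=1")
```

With the server encoded in the identifier, any component holding a job ID can contact the right bookkeeping server without a separate lookup.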
Security Considerations • Authentication • Both for logging and database queries • Certificate based (user and/or host/service) • User associated with jobID on first authenticated event • Secure channels • Storage (database) access
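The rule that a user becomes associated with a jobID on the first authenticated event can be sketched as a bind-once mapping. The class name, the method names, and the certificate subject strings are hypothetical, and certificate verification itself is assumed to have happened before these calls.

```python
class JobOwnership:
    """Remembers which authenticated subject first logged for each job."""

    def __init__(self):
        self._owner = {}

    def record_event(self, job_id, subject):
        """Bind `job_id` to `subject` on first sight; later events keep it."""
        return self._owner.setdefault(job_id, subject)

    def owner_of(self, job_id):
        """Return the bound subject, or None if no event has been seen."""
        return self._owner.get(job_id)


owners = JobOwnership()
# Hypothetical certificate subjects, already authenticated by the caller.
owners.record_event("job-1", "/O=Example/CN=Alice")
owners.record_event("job-1", "/O=Example/CN=ce01.example.org")
```

Later events from host or service certificates then leave the original user association unchanged.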
R-GMA Integration • Work in progress • The goals: • To lower database load • To provide notification service • To allow better integration with other information services
LB Service Extensions • User defined attributes • To store additional information associated with a job • To retrieve job collections • Synchronous API • Job checkpointing (at the application level) • Information stored in Bookkeeping server
Job Partitioning • Group ID • Job collections • Hierarchical • Aggregate queries
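An aggregate query over a hierarchical job collection can be sketched as a walk over the group tree that counts job states. The data layout (a children map keyed by group ID and a separate job-state map), the identifiers, and the state names are assumptions for illustration only.

```python
from collections import Counter


def aggregate_states(group_id, children, job_state):
    """Count the states of all jobs below `group_id` in the hierarchy."""
    counts = Counter()
    seen = set()
    stack = [group_id]
    while stack:
        node = stack.pop()
        if node in seen:  # guard against accidental cycles
            continue
        seen.add(node)
        if node in job_state:  # leaf: an individual job
            counts[job_state[node]] += 1
        stack.extend(children.get(node, ()))  # descend into sub-groups
    return counts


# Hypothetical collection: group "g" holds sub-group "g1" and job "j3".
children = {"g": ["g1", "j3"], "g1": ["j1", "j2"]}
job_state = {"j1": "DONE", "j2": "RUNNING", "j3": "DONE"}
summary = aggregate_states("g", children, job_state)
```

A summary of this kind gives the aggregated view that the motivation asks for when a job has too many sub-tasks to inspect one by one.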
Conclusion • LB service provides • Job tracking • Persistent event storage • Job state provision • Future work • (R-)GMA integration • Authorization • Collective operations