Computing Services Tiziana.Ferrari@cnaf.infn.it INFN – CNAF Corso di Laurea specialistica in Informatica Anno Acc. 2004/2005 Slide sources: Practical approaches to workload management in the EGEE project, M. Sgaravatto, Sep 2004; CE Design Report, L. Zangrando, INFN Padova; GRAM, Globus Toolkit Developer Tutorial, The Globus Project Computing Services
Outline • PART I: Computing Element Definition • PART II: Web Services-based CE Architecture (Example 1) • PART III: Web Services-based CE Architecture: GRAM (Example 2) • PART IV: CE Glue Schema v 1.1 • References Computing Services
PART I: Computing Element Definition Computing Services
Computing Services in the Layered Grid Architecture [Diagram: the layered Grid architecture (Fabric, Connectivity, Resource, Collective, Application) shown side by side with the Internet protocol architecture (Link, Internet, Transport, Application). The Computing Element sits at the Resource layer: "sharing single resources", i.e. negotiating access and controlling use.] Computing Services
Computing resources at the fabric layer • Cluster. A cluster is a container that groups together: • Subclusters: subcluster elements represent "homogeneous" collections of computational nodes; • Nodes: unique nodes, such as head nodes or individual computing nodes. A cluster may be referenced by more than one computing service at the "Resource" layer. • SubCluster. A subcluster represents a "homogeneous" collection of nodes, where homogeneity means that all required node attributes have the same value. For example, a subcluster represents a set of nodes with the same CPU, memory, OS, network interfaces, etc. Strictly speaking, subclusters are not necessary, but they provide a convenient way of representing useful collections of nodes. A subcluster captures a node count and the set of attributes for which homogeneous values are being asserted. • Host. Represents a physical computing element. This element characterizes the physical configuration of a computing node: processors, software, storage elements, etc. Computing Services
Computing resources at the fabric layer: hierarchy [Diagram: the Cluster / SubCluster / Host containment hierarchy at the Fabric layer, referenced by computing services at the Resource layer.] Computing Services
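As a purely illustrative sketch (not part of any Grid middleware API), the Cluster / SubCluster / Host containment described above can be modelled like this in Python:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Host:
    # Physical configuration of one computing node.
    name: str
    cpu_model: str
    memory_mb: int
    os: str

@dataclass
class SubCluster:
    # "Homogeneous" set of nodes: every asserted attribute has the same value.
    asserted_attributes: Dict[str, str]
    node_count: int

@dataclass
class Cluster:
    # Container grouping subclusters and individual (e.g. head) nodes.
    # It may be referenced by more than one computing service at the Resource layer.
    name: str
    subclusters: List[SubCluster] = field(default_factory=list)
    nodes: List[Host] = field(default_factory=list)

# Example: a cluster with one homogeneous subcluster of 64 identical worker nodes.
workers = SubCluster({"cpu": "Xeon 2.8GHz", "memory_mb": "2048", "os": "SL3"}, node_count=64)
farm = Cluster("cnaf-farm",
               subclusters=[workers],
               nodes=[Host("head01", "Xeon 2.8GHz", 2048, "SL3")])
```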
Computing Element (1/2) • The Computing Element (CE) is the service representing a computing resource and comprises a set of functionalities related to computing. • Functionalities: • job management (job submission, job control, etc.), • provision of information about the resource characteristics and status, • resource reservation enforcement, • resource reservation usage monitoring, etc. Computing Services
Computing Element (2/2) • Different architecture and implementation examples of the CE are: the DataGrid CE, the AliEn CE and the Globus GRAM. • A CE refers to a set, or cluster, of computational resources managed by a Local Resource Management System (LRMS). This cluster can encompass resources that are heterogeneous in their hardware and software configuration. • When a CE encompasses heterogeneous resources, it is not sufficient to let the underlying LRMS dispatch jobs to any worker node. Instead, when a job has been submitted to a CE, the underlying resource management system must be instructed to submit the job to a resource matching the requirements specified by the user. • The interface with the underlying LRMS must be very well specified (possibly according to existing standards), to ease the integration of new resource management systems (even by third-party entities) as needed. The definition and provision of common interfaces to different resource management systems is still an open issue, but there are proposed recommendations currently under discussion, such as the Distributed Resource Management Application API (DRMAA), currently discussed within the Global Grid Forum. Computing Services
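To give a feel for what a standardized LRMS interface looks like in practice, here is a minimal sketch using the DRMAA Python binding (the drmaa package). The command, paths and LRMS configuration are assumptions; the CE described in these slides does not necessarily use this binding.

```python
import drmaa

def submit_and_wait():
    # Open a session with whatever LRMS the DRMAA library is configured for (PBS, LSF, SGE, ...).
    with drmaa.Session() as s:
        jt = s.createJobTemplate()
        jt.remoteCommand = "/bin/hostname"       # executable to run on a worker node
        jt.outputPath = ":/tmp/drmaa-demo.out"   # DRMAA path format: [host]:path
        job_id = s.runJob(jt)                    # submit through the LRMS, get its job id
        info = s.wait(job_id, drmaa.Session.TIMEOUT_WAIT_FOREVER)
        s.deleteJobTemplate(jt)
        return info.hasExited, info.exitStatus

if __name__ == "__main__":
    print(submit_and_wait())
```

The point of such an interface is exactly the one made above: the CE code stays the same while the LRMS behind it (PBS, LSF, ...) can be swapped.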
Job types • Sequential, batch jobs • Parallel (MPI) jobs • Checkpointable jobs • Interactive jobs • DAG jobs (sets of jobs with inter-dependencies modeled as Directed Acyclic Graphs) • DAGs whose nodes have to be planned and executed within the CE • Partitionable jobs • Jobs to be partitioned within the CE Computing Services
Push vs pull model A given CE can work in both push and pull mode. • PUSH: the job is pushed to a CE for execution. When a job is pushed to a CE, it is accepted only if there are resources matching the requirements specified by the user and usable according to the local policies set by the local administrator. The job is then dispatched to a worker node matching all these constraints. • PULL: the CE asks the Workload Management Service for jobs. When a CE is willing to receive a job (according to policies specified by the local administrator, e.g. when the CE local queue is empty or is getting empty), it requests a job from a known Workload Management Service. This request must include the characteristics of, and the policies applied to, the available resources, so that this information can be used by the Workload Management Service to select a suitable job to be executed on the considered resource. Computing Services
Pull model: getting a job from the Workload Management Service • Approach 1: the CE requests a job from all known Workload Management Services. If two or more Workload Management Services offer a job, only the first one to arrive is accepted by the CE, while the others are refused. • Approach 2: the CE requests a job from just one Workload Management Service and gets ready to accept a job from it. If the contacted Workload Management Service has no job to offer within a certain time frame, another Workload Management Service is notified. Such a mechanism would allow supporting priorities on resource usage: a CE belonging to a certain VO would first contact a Workload Management Service referring to that VO, and only if that service has no jobs to be executed are the Workload Management Services of other VOs notified, according to policies defined by the owner of the resource. A sketch of this second approach follows below. Computing Services
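A minimal sketch of Approach 2, purely illustrative: the WMS endpoints, the request_job call, the timeout value and the LRMS hand-off are assumptions, not part of any specified interface.

```python
import time

# Ordered list of Workload Management Services: the VO's own WMS first,
# then the WMSs of other VOs (the order encodes the resource owner's priorities).
KNOWN_WMS = ["https://wms.myvo.example.org", "https://wms.othervo.example.org"]

def request_job(wms_url, ce_description, timeout_s=30):
    """Hypothetical call: ask one WMS for a job matching this CE's characteristics
    and policies; return a job description, or None if nothing arrives in time."""
    raise NotImplementedError  # would be a web-service call in a real CE

def submit_to_lrms(job):
    """Hypothetical hand-off of the accepted job to the local batch system (LSF, PBS, ...)."""
    raise NotImplementedError

def pull_one_job(ce_description):
    # Contact one WMS at a time; fall through to the next only on timeout.
    for wms in KNOWN_WMS:
        job = request_job(wms, ce_description, timeout_s=30)
        if job is not None:
            return job
    return None

def pull_loop(ce_description, queue_is_getting_empty):
    # Pull mode: ask for work only when local policy says the queue needs it.
    while True:
        if queue_is_getting_empty():
            job = pull_one_job(ce_description)
            if job is not None:
                submit_to_lrms(job)
        time.sleep(60)
```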
1. Job Management Job Management is the main functionality provided by the CE. It makes it possible to: • run jobs (which also includes the staging of all the required files). Characteristics and requirements of the jobs to be executed are specified using a given language, for example the Job Description Language (JDL), which is also used within the whole Workload Management System; an illustrative JDL is sketched below; • get an assessment of the foreseen quality of service for a given job to be submitted: • existence of resources matching the requirements and available according to the local policies • local queue traversal time (the time elapsed from when the job enters the queue of the LRMS until it starts execution); • cancel previously submitted jobs; • suspend / resume jobs, if the LRMS allows these operations; • send signals to jobs; • get the status of some specified jobs, or of all the active jobs "belonging" to the user issuing the request; • be notified about job status, for example when a job changes its status or when a certain status is reached. Computing Services
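For illustration, a small JDL description embedded in a Python string. The attribute names (Executable, InputSandbox, Requirements, Rank, ...) follow EDG/gLite JDL conventions, but the concrete values and the GLUE attribute names used in the Requirements expression are examples, not prescriptions.

```python
# Illustrative JDL (EDG/gLite-style) describing a simple batch job.
# The Requirements/Rank expressions reference GLUE CE attributes (see Part IV);
# names and values here are examples only.
EXAMPLE_JDL = """
[
  Executable     = "analysis.sh";
  Arguments      = "run42.cfg";
  StdOutput      = "std.out";
  StdError       = "std.err";
  InputSandbox   = {"analysis.sh", "run42.cfg"};
  OutputSandbox  = {"std.out", "std.err"};
  Requirements   = other.GlueCEStateStatus == "Production"
                   && other.GlueCEPolicyMaxCPUTime >= 120;
  Rank           = -other.GlueCEStateEstimatedResponseTime;
]
"""
```

A string like this is what a client would pass to the jobAssess and jobSubmit operations described in Part II.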
2. Information Provisioning A CE must also provide information describing itself. • In the push model this information is published in the Information Service, and it is used during resource discovery (through the match-making engine in the Workload Manager), which matches available resources to queued jobs. • In the pull model the CE information is embedded in the "CE availability" message, which is sent by the CE to a Workload Management Service. The matchmaker then uses this information to find a suitable job for the CE. • The information that each CE should provide includes: • the characteristics of the CE (e.g. the types and numbers of existing resources, their hardware and software configurations, etc.); • the status of the CE (e.g. the number of in-use and available resources, the number of running and pending jobs, etc.); • the policies enforced on the CE resources (e.g. the list of users and/or VOs authorized to run jobs on the resources of the CE, etc.); • resource usage: the CE must measure user activities on its resources, providing resource usage information. This information, after having been properly translated into an appropriate format, has to be forwarded to the Grid Accounting System. Computing Services
PART II: Web Services-based CE Architecture (EGEE Project, Example 1) Computing Services
Access to the CE • From an implementation point of view the CE, exposing a Web Service interface, may be used by: • a generic client: an end-user interacting directly with the Computing Element, or • the Workload Manager System, which submits a given job to an appropriate CE found by a matchmaking process. Computing Services
CE Architecture (1/2) [Diagram: a client invokes job management operations (JobSubmit, JobAssess, JobKill, JobSuspend, JobResume, JobGetStatus) on the CE Web Service; the CE and the CE Monitor (CE Mon) sit in front of the LRMS (LSF, PBS, ...) and its Worker Nodes.] Computing Services
CE Architecture (2/2) [Diagram: 1. the CE Monitor (CE Mon) sends asynchronous notifications about job/CE events to the client; 2. it issues job requests (for a CE working in pull mode); behind it, the LRMS (LSF, PBS, ...) and the Worker Nodes.] Computing Services
Components (1/2) • CE: it exposes a Web Service interface: • it represents the entry point for submitting jobs to the resources of the CE; • the CE includes the functionality of a Site Gatekeeper and is responsible for the mapping between Grid users and local users; • it checks whether the job can be accepted according to the configuration options that may have been set to limit the load caused by job processing; • after having checked that the considered job can be executed in the CE (that is, there are resources matching the constraints specified in the job's JDL expression and which can be used according to the local policies), the CE updates the User Context (UC), a data structure holding information about the user and the active jobs she/he owns on the CE, accordingly. • Job Controller: the job is then forwarded to the Job Controller (JC), which is in charge of submitting the job to the underlying LRMS and controlling its execution. • The client interacts with the JC (also exposing a Web Service interface) to control jobs: to get their status, to suspend them, to kill them, etc. • The JC relies on the information stored in the user's UC when serving a request on that user's job. Computing Services
Components (2/2) • The CE Monitoring service deals with notifications. It can be customized in particular to: • asynchronously notify users of job status events, according to policies specified by users (e.g. when a job changes its status, when a job reaches a certain status, etc.). The jobs to be monitored and the type of notifications to support are stored in the User Context of the job's user; • notify about the CE characteristics and status. • For a CE working in pull mode, this service is also used to request jobs from the Workload Management Service. Computing Services
CE Architecture in detail (1/2) [Diagram] A client could: 1) ask the CE whether a job could be executed and what the expected QoS is (e.g. ETT, the estimated traversal time), via jobAssess; 2) submit a job, via jobSubmit. The CE matches the job requirements (JDL) against the available resources and computes the expected QoS. Internally, the CE uses the Job Controller (JC), the Job Monitor (JM) and a local information base of Worker Nodes (getWN, insertWN, deleteWN, updateWN) and User Contexts (getUC, createUC, deleteUC, updateUC); the JC drives the LRMS (LSF, PBS, ...) through a DRMAA interface. Computing Services
CE Architecture in detail (2/2) [Diagram] Job control and monitoring path: the client invokes jobKill, jobSuspend, jobResume, jobGetStatus, jobGetOutput, jobSignal, jobMonitorSub (as well as jobAssess and jobSubmit) on the CE, which returns the JC URL and the job status; the CE forwards the submit request (with the JDL) to the JC; the JC and JM use the WN and UC tables (getWN, insertWN, deleteWN, updateWN; getUC, createUC, deleteUC, updateUC) and drive the LRMS (LSF, PBS, ...), possibly through DRMAA, on the Worker Nodes; notifications flow back to the client. Computing Services
UML sequence diagram: jobSubmit Computing Services
UML sequence diagram: jobDelete Computing Services
UML sequence diagram: jobStatus Computing Services
API specification • jobAssess • jobSubmit • jobSuspend / jobResume • jobList • jobKill • jobGetStatus / jobGetAllStatus • jobGetOutput • jobMonitorSub • jobSignal Computing Services
API specification jobAssess • Description: Checks whether the job specified in the JDL could be run in the CE. It matches the job requirements against the available resources. If the job is effectively runnable on the worker nodes of the CE, it provides an estimation of the expected QoS (e.g. the waiting time in the local queue before the job can be run). jobSubmit • Description: Submits the job specified in the JDL to the CE. Computing Services
API specification jobSuspend • Description: Suspends the execution of the specified job(s), or holds the job(s) in the local queue. jobResume • Description: Resumes the execution of the specified job(s), or releases the job(s) in the local queue. jobKill • Description: Kills one or more jobs. jobList • Description: Retrieves the list of the jobIDs submitted by the user. Computing Services
API specification jobGetOutput • Description: Allows the user to retrieve the final results of the execution of the specified job(s). jobGetStatus • Description: Retrieves the status of the specified job(s). jobSignal • Description: Sends a signal to the specified job(s). jobMonitorSub • Description: Allows the user to subscribe to the asynchronous notification system (JM) of the CE (e.g. to be notified about job status changes). Computing Services
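As an illustration of how a generic client might drive this Web Service interface, here is a hedged Python sketch using the zeep SOAP library. Only the operation names come from the slides; the WSDL URL, the exact signatures, the returned values and the "DONE" status check are assumptions.

```python
from zeep import Client  # generic SOAP client; the CE WSDL location below is hypothetical

CE_WSDL = "https://ce.example.org:8443/ce-service?wsdl"

# Minimal illustrative JDL for the job to run (see Part I for a richer example).
jdl = '[ Executable = "/bin/hostname"; StdOutput = "std.out"; OutputSandbox = {"std.out"}; ]'

client = Client(CE_WSDL)

# 1. Ask whether the job could run here and what QoS to expect (queue traversal time).
qos = client.service.jobAssess(jdl)

# 2. Submit, keeping the returned job identifier for later control calls.
job_id = client.service.jobSubmit(jdl)

# 3. Poll the status, or subscribe for asynchronous notifications instead.
status = client.service.jobGetStatus(job_id)
client.service.jobMonitorSub(job_id)   # notify me on status changes

# 4. When the job is done, fetch its output; otherwise it could be killed.
if status == "DONE":
    output = client.service.jobGetOutput(job_id)
else:
    pass  # e.g. client.service.jobKill(job_id)
```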
PART III: Web Services-based CE Architecture: GRAM (Globus, Example 2) Computing Services
Grid Resource Allocation and Management (GRAM) • In this case the Resource Specification Language (RSL) is used to communicate requirements instead of the JDL (same purpose, but different syntax). • The Grid Resource Allocation and Management (GRAM) API allows programs to be started on remote resources, despite local heterogeneity. Computing Services
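For comparison with the JDL, here is a small GRAM RSL string, wrapped in Python. The attribute names (executable, arguments, stdout, count, maxWallTime) follow classic GRAM RSL usage, but the specific job, values and submission details are illustrative assumptions.

```python
# Illustrative pre-WS GRAM RSL describing the same kind of simple batch job
# that the JDL example expressed; values are examples only.
EXAMPLE_RSL = (
    "&(executable=/bin/hostname)"
    "(arguments=-f)"
    "(stdout=hostname.out)"
    "(count=1)"
    "(maxWallTime=10)"
)

# A client would hand this string to the GRAM client API (or to a command-line
# submission tool) together with the contact string of the target gatekeeper,
# e.g. "ce.example.org/jobmanager-pbs"; both are assumptions here.
```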
GRAM Components [Diagram: the client uses MDS client API calls to locate resources (Grid Index Info Server) and to get resource info (Grid Resource Info Server, which queries the current status of the resource), and GRAM client API calls to request resource allocation and process creation; GRAM client API state-change callbacks flow back to the client. Inside the site boundary, the Gatekeeper (with the Globus Security Infrastructure) receives the request and creates a Job Manager, which parses the RSL (RSL library), asks the Local Resource Manager to allocate and create processes, and monitors and controls those processes.] * MDS: Metacomputing Directory Service Computing Services
Web Service GRAM Architecture v. 3.2 (1/7) [Diagram: overview of the WS GRAM CE, showing the flow of CE status information and static data, authentication and authorization, and the job path.] Computing Services
Web Service GRAM Architecture v. 3.2 (2/7) • The Master is configured to use the Redirector to redirect calls to it, and to use the Starter UHE module when there is no running UHE (User Hosting Environment) for the user. A createService call on the Master uses the Redirector to invoke the Starter UHE module and start up a UHE. • The Master publishes its handle to a remote registry (optional). • A client submits a createService request, which is received by the Redirector. • The Redirector calls the Starter UHE class, which authorizes the request via the grid-mapfile to determine the local username and port to be used, and constructs a target URL. • The Redirector attempts to forward the call to that target URL. If it is unable to forward the call because the UHE is not up, the Launch UHE module is invoked. • The Launch UHE creates a new UHE process under the authenticated user's local uid. • The Starter UHE waits for the UHE to start up (ping loop) and returns the target URL to the Redirector. • The Redirector forwards the createService call to the MJFS unmodified, and mutual authentication/authorization can take place. • The MJFS creates a new Managed Job Service (MJS). • The MJS submits the job to a back-end scheduling system. • Subsequent calls to the MJS from the client are redirected through the Redirector. • RIPS provides data to the MJS instances and the Master. It gathers data from the local scheduling system, the file system, host info, ... • FindServiceData requests to the Master result either in an SDE being returned (populated by the Service Data Aggregator) or in redirection to the MJFS of the requestor's UHE. • In order to stream stdout/stderr back to the client, the MJS creates two File Stream Factory Services (FSFS), one for stdout and one for stderr. • The MJS then creates the File Stream Service (FSS) instances as specified in the job request. • The GRIM handler is run in the UHE to create a user host certificate, which is used for mutual authentication between the MJS service and the client. Computing Services
Web Service GRAM Architecture (3/7) • Starter UHE: this java class is used by the Redirector to resolve incoming calls to a UHE. The grid-mapfile is used to obtain the username corresponding to a particular subject DN, and one UHE is run per user on a machine. The mapping from username to UHE port number is maintained in a configuration file. When a request to resolve a URL comes in and an entry is found in the configuration file, the target URL is constructed and returned to the Redirector. If an entry does not exist in the configuration file, a free port number is chosen. • Redirector: it accepts all incoming SOAP messages and redirects them to the User Hosting Environment (UHE). Computing Services
Web Service GRAM Architecture (4/7) • Master Managed Job Factory Service (Master): it is responsible for exposing the virtual GRAM service to the outside world. It configures the Redirector to direct createService calls sent to it through the Starter UHE and Launch UHE modules, so that they eventually reach the MJFS unmodified. The Redirector is instructed to redirect subsequent createService calls sent to it to the user's hosting environment. The Master uses the Service Data Aggregator to collect and populate local Service Data Elements (SDEs), which represent local scheduler data (e.g. freenodes, totalnodes) and general host information (e.g. host CPU type, host OS). If a request is for any known MJFS SDE, it is redirected to the MJFS of the UHE. All other queries are handled locally. Computing Services
Web Service GRAM Architecture (5/7) • Managed Job Factory Service (MJFS): it is responsible for instantiating a new MJS when it receives a createService request. The MJFS stays up for the life of the UHE. • Managed Job Service (MJS): a service that, given a job request specification, can submit a job to a local scheduler, monitor its status and send notifications. The MJS starts two File Stream Factory Services (FSFS), one for the job's stdout and one for the job's stderr, and the initial set of FSS instances specified in the job specification. The FSFSs' Grid Service Handles (GSHs) are available in the description of the MJS, which enables the client to start additional FSS instances for stdout/stderr or terminate existing FSS instances. Computing Services
Web Service GRAM Architecture (6/7) • File Stream Factory Service (FSFS): it is responsible for instantiating new File Stream Service instances when it receives a createService request. It exposes two properties: the path to the local file being streamed and the current size of the file. • File Stream Service (FSS): given a destination URL, it streams the local file (stdout or stderr) to that URL. It exposes two properties: the URL of the stream destination and a done flag indicating that the streaming of the file has been completed. Computing Services
PART IV: GLUE CE Schema v.1.1 Computing Services
Computing Element • The CE service needs an abstract representation in order to save information about CE service instances in a Grid Information Service and to perform discovery. • The schema expresses an abstraction (for example, the CE properties and functionality) in a structured, machine-processable form. • The "GLUE" CE schema contains a minimal but necessary set of qualifying attributes needed to distinguish different service instances and to perform discovery. • Typically some of the attributes are static or rarely change, while other attributes, for example the ones about the status of the CE, are dynamic. • The schema described in the following slides is based on the following CE abstraction: an entry point into a queuing system. • There is one computing element per queue. • Queuing systems with multiple queues are represented by creating one computing element per queue. (*) • The information associated with a computing element is limited to information relevant to the queue. • All information about the physical resources accessed by a queue is represented by the Cluster information element. • (*) Note: in the CE implementation architecture described earlier in this presentation, a CE can be the entry point of multiple heterogeneous queues. Computing Services
GLUE Schema • Attributes are grouped together to form named objects. • There are two types of objects in the information model: • Structural objects (computing element, cluster, sub-cluster, nodes and hosts) are containers for other objects. • Auxiliary objects include the attributes that carry the actual information. • Objects which are part of a container can be of different types: • required, • advised, or • optional. Computing Services
CE Schema Structure (1/3) • The Computing Element is a container and can include the following objects: • Info (required): • UniqueID: unique identifier for the computing element. Example: CE-hn:CE-port/jobmanager-CE-lrms-CE-queue • InformationServiceURL: URL of the local information service providing info about this entity. • Name: a name for this service. • Info (optional): • LRMSType: name of the local resource management system • LRMSVersion: version of the local resource manager • GRAMVersion: the GRAM version • HostName: fully qualified host name of the host on which the gatekeeper (entry point to the CE) corresponding to the computing element runs • GatekeeperPort: port number of the gatekeeper • TotalCPUsNum: number of CPUs available to the queue. NB: this number should not be used to compute the total available resources, as more than one queue may point to the same physical resources. Computing Services
CE Schema Structure (2/3) • Policy (optional): • MaxWallClockTime: the maximum wall clock time allowed for jobs submitted to the CE, in minutes (0 = not specified) • MaxCPUTime: the maximum CPU time allowed for jobs submitted to the CE, in minutes (0 = not specified) • MaxTotalJobs: the maximum allowed number of jobs in the CE (0 = not specified) • MaxRunningJobs: the maximum number of jobs allowed to be running (0 = not specified) • Priority: info about the queue priority • State (optional): • RunningJobs: number of currently running jobs • TotalJobs: number of jobs in the CE (= RunningJobs + WaitingJobs) • Status: queue status, which can be: • 1. Queueing: the queue can accept job submissions, but cannot be served by the scheduler • 2. Production: the queue can accept job submissions and is served by a scheduler • 3. Closed: the queue cannot accept job submissions and cannot be served by a scheduler • 4. Draining: the queue cannot accept job submissions, but can be served by a scheduler • WaitingJobs: number of jobs in a state different from running • WorstResponseTime: worst-case time between job submission and the start of its execution, in seconds • EstimatedResponseTime: estimated time between job submission and the start of its execution, in seconds • FreeCPUs: number of free CPUs available to a scheduler (generally used with Condor) Computing Services
CE Schema Structure (3/3) • Job (optional): • LocalOwner: owner's local username • GlobalOwner: owner's GSI subject name • LocalID: job local id • GlobalID: job global id • Status: job status {SUBMITTED, WAITING, READY, SCHEDULED, RUNNING, ABORTED, DONE, CLEARED, CHECKPOINTED} • SchedulerSpecific: scheduler-specific info • AccessControlBase (optional): • Rule: a rule that grants/denies access to the Computing Element service; the specific semantics need to be defined (e.g. list of X509 user certificate subjects, VO names). Computing Services
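To make the schema concrete, here is an illustrative and purely hypothetical CE description using the attribute groups above, expressed as a Python dictionary; a real deployment would publish these values through the Information Service (typically as GlueCE* attributes), and every value below is made up.

```python
# Hypothetical GLUE CE entry for one queue of a PBS-managed cluster.
# Attribute grouping follows the schema on the previous slides.
example_ce = {
    "Info": {
        "UniqueID": "ce.example.org:2119/jobmanager-pbs-short",
        "Name": "short",
        "InformationServiceURL": "ldap://ce.example.org:2135",
        "LRMSType": "PBS",
        "LRMSVersion": "OpenPBS 2.3",
        "GRAMVersion": "1.5",
        "HostName": "ce.example.org",
        "GatekeeperPort": 2119,
        "TotalCPUsNum": 64,            # other queues may share the same CPUs
    },
    "Policy": {
        "MaxWallClockTime": 120,       # minutes, 0 = not specified
        "MaxCPUTime": 60,
        "MaxTotalJobs": 200,
        "MaxRunningJobs": 64,
        "Priority": 1,
    },
    "State": {
        "Status": "Production",
        "RunningJobs": 40,
        "WaitingJobs": 12,
        "TotalJobs": 52,               # RunningJobs + WaitingJobs
        "EstimatedResponseTime": 300,  # seconds
        "WorstResponseTime": 3600,
        "FreeCPUs": 24,
    },
    "AccessControlBase": {"Rule": ["VO:atlas", "VO:cms"]},
}

# A matchmaker could evaluate a JDL Requirements expression such as
#   other.GlueCEStateStatus == "Production" && other.GlueCEPolicyMaxCPUTime >= 30
# against this information during resource discovery.
```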
References • Glue Computing Element Schema, version 1.1, March 2003 (http://www.cnaf.infn.it/~sergio/datatag/glue/v11/CE/index.html). • EGEE Middleware Architecture (Release 1), Deliverable DJRA1.1, pp. 32-35 (http://edms.cern.ch/document/476451). • GRAM, Globus Toolkit Developer Tutorial, The Globus Project, 2004. • WS GRAM v 3.2: Developer's Guide, The Globus Project, 2004. Computing Services