UI Interactions and Interfaces with the Workload Manager Components • DataGrid WP1 • F. Pacini fpacini@datamat.it
Summary • Interactions between UI and RB • Interactions between UI and M&C • Other Points
Interactions between UI and RB (1/10) • The Job Submission UI contacts the RB when the following commands are issued by the user: • dg-job-submit • dg-list-job-match • dg-job-cancel • Communication is performed via socket (TCP/IP) • An agreement on the Communication protocol is needed
Interactions between UI and RB (2/10) • dg-job-submit dg-job-submit <jdl_file> [-resource res_id] [-notify e_mail_address] • The information flow from the UI to the RB consists of a job class-ad built from the job description file plus a variable indicating the request type. The job class-ad object consists of a list of entries “attribute = expression”. The following attributes are always present in the job class-ad (jobAd): • UserID • CertificateSubject • ExecutableName • Input • Output • Constraints
Interactions between UI and RB (3/10) • If dg-job-submit has been issued with the “-resource” option, then the job class-ad contains the attribute: • ResourceID = res_id and the RB shall submit the job to the resource identified by “res_id” without going through the match-making process. • If dg-job-submit has been issued with the “-notify” option, then the job class-ad contains the attribute: • UserContact = e_mail_address and the RB shall send an e-mail notification to e_mail_address at each job status transition. The content is the same as the output of the dg-job-status command (TBD).
Interactions between UI and RB (4/10) • The variable indicating the request type, say requestType, is an enumeration with the following values: • JOB_SUBMIT • LIST_MATCH and is passed in this case with the JOB_SUBMIT value. • Summarising, the UI passes to the RB a structure with the following two fields: • requestType • jobAd
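The two-field structure described above could be sketched as follows. This is only an illustration under stated assumptions: the Python names (`RequestType`, `Request`, `build_submit_request`) are hypothetical, the jobAd is modelled as a plain dictionary of “attribute = expression” entries, and nothing here reflects the actual wire format, which the slides leave to be agreed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class RequestType(Enum):
    """The two request types named on the slide."""
    JOB_SUBMIT = "JOB_SUBMIT"
    LIST_MATCH = "LIST_MATCH"


@dataclass
class Request:
    """Hypothetical container for what the UI passes to the RB."""
    request_type: RequestType
    job_ad: Dict[str, str]  # "attribute = expression" entries


def build_submit_request(job_ad: Dict[str, str],
                         resource_id: Optional[str] = None,
                         user_contact: Optional[str] = None) -> Request:
    """Build a JOB_SUBMIT request, adding the optional attributes.

    resource_id mirrors the "-resource" option, user_contact the "-notify" one.
    """
    ad = dict(job_ad)  # copy so the caller's jobAd is not modified
    if resource_id is not None:
        ad["ResourceID"] = resource_id
    if user_contact is not None:
        ad["UserContact"] = user_contact
    return Request(RequestType.JOB_SUBMIT, ad)
```

Copying the jobAd before adding the optional attributes keeps the class-ad built from the JDL file reusable, for example for a later dg-list-job-match call.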
Interactions between UI and RB (5/10) • The expected response from the RB consists of: • ReturnCode, a numeric code indicating the operation result • ReturnMessage, a string describing the operation result • dg_jobId, a string identifying the submitted job
Interactions between UI and RB (6/10) • dg-list-job-match dg-list-job-match <jdl_file> • The information flow from the UI to the RB consists of a job class-ad built from the job description file plus a variable indicating the request type. The job class-ad object consists of a list of entries “attribute = expression”. The following attributes are always present in the job class-ad (jobAd): • UserID • CertificateSubject • ExecutableName • Input • Output • Constraints
Interactions between UI and RB (7/10) • The variable indicating the request type, requestType, is passed in this case with the LIST_MATCH value. • The jobAd associated with this requestType will never contain either the ResourceID or the UserContact attribute. • In this case the RB does not submit the job but only searches for resources compatible with the input jobAd.
Interactions between UI and RB (8/10) • The expected response from the RB consists of: • ReturnCode, a numeric code indicating the operation result (0 stands for success) • ReturnMessage, a string describing the operation result • ResourceList, a list of ResourceIds, i.e. strings identifying the resources matching the jobAd.
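A minimal sketch of the dg-list-job-match exchange, assuming the request and response are modelled as simple Python objects. The names `ListMatchResponse`, `build_list_match_request` and `matching_resources` are hypothetical; the only facts taken from the slides are the three response fields, the rule that a LIST_MATCH jobAd carries neither ResourceID nor UserContact, and that a ReturnCode of 0 means success.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Attributes that a LIST_MATCH jobAd never contains, per the slides.
FORBIDDEN_IN_LIST_MATCH = ("ResourceID", "UserContact")


@dataclass
class ListMatchResponse:
    """Hypothetical model of the RB response to a LIST_MATCH request."""
    return_code: int
    return_message: str
    resource_list: List[str] = field(default_factory=list)


def build_list_match_request(job_ad: Dict[str, str]) -> Dict[str, object]:
    """Return the two-field structure, rejecting forbidden attributes."""
    bad = [name for name in FORBIDDEN_IN_LIST_MATCH if name in job_ad]
    if bad:
        raise ValueError("not allowed in a LIST_MATCH jobAd: " + ", ".join(bad))
    return {"requestType": "LIST_MATCH", "jobAd": dict(job_ad)}


def matching_resources(response: ListMatchResponse) -> List[str]:
    """Return the ResourceIds on success (ReturnCode 0), else raise."""
    if response.return_code != 0:
        raise RuntimeError(response.return_message)
    return list(response.resource_list)
```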
Interactions between UI and RB (9/10) • dg-job-cancel dg-job-cancel <jobID1 ... jobIDn | -all> • The information flow from the UI to the RB consists of: • userID, a string representing the user identifier • certSubject, a string containing the user certificate subject • dg_jobIdList, a list of strings representing the identifiers of the jobs to be canceled, as specified by the user • If the dg-job-cancel command has been issued with the “-all” input parameter, then the dg_jobIdList passed to the RB will be empty, indicating that all jobs submitted by the user identified by userID have to be canceled.
Interactions between UI and RB (10/10) • The expected response from the RB consists of: • ReturnCode, a numeric code indicating the operation result • ReturnMessage, a string describing the operation result • dg_jobIdList, a list of dg_jobIds, i.e. strings identifying the jobs actually canceled
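The “empty list means all jobs” convention of dg-job-cancel is easy to get wrong, so here is a small sketch of how the UI side might build the request and compare it with the response. The class and function names are hypothetical, and the check that rejects an empty list without “-all” is an assumption added for safety, not something stated in the slides.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CancelRequest:
    """Hypothetical model of the three fields sent to the RB."""
    user_id: str
    cert_subject: str
    dg_job_id_list: List[str]  # empty means "cancel all jobs of user_id"


def build_cancel_request(user_id: str, cert_subject: str,
                         job_ids: Sequence[str] = (),
                         cancel_all: bool = False) -> CancelRequest:
    """Build the request for either an explicit job list or "-all"."""
    if cancel_all and job_ids:
        raise ValueError("give either job identifiers or -all, not both")
    if not cancel_all and not job_ids:
        # Assumption: never send an empty list by accident, since the RB
        # would read it as a request to cancel every job of the user.
        raise ValueError("no job identifiers given")
    return CancelRequest(user_id, cert_subject, list(job_ids))


def not_canceled(requested: Sequence[str], canceled: Sequence[str]) -> List[str]:
    """Jobs that were requested explicitly but are missing from the response."""
    done = set(canceled)
    return [job_id for job_id in requested if job_id not in done]
```

`not_canceled` is only meaningful for an explicit job list; with “-all” the UI does not know in advance which identifiers to expect.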
Interactions between UI and M&C (1/11) • The Job Submission UI contacts the M&C when the following commands are issued by the user: • dg-job-status • dg-get-logging-info • To serve these requests the UI uses the bookkeeping and logging APIs made available by the M&C component (see the Cesnet L&B Service Document) • The API implementation shall encompass network communication
Interactions between UI and M&C (2/11) • dg-job-status dg-job-status <jobID1 ... jobIDn | -all> [-full] • The UI will use the provided L&B server API with the following input information: • UserId, a string representing the user identifier • dg_jobIdList, a list of dg_jobIds • InformationLevel, indicating the required information level (SHORT/FULL) according to the command option “-full” • If the dg-job-status command has been issued with the “-all” input parameter, then the dg_jobIdList will be empty, indicating that status information about all jobs submitted by the user identified by userID is requested.
Interactions between UI and M&C (3/11) • Returned information should encompass: • ReturnCode, a numeric code indicating the operation result • ReturnMessage, a string describing the operation result • JobsStatusInfo, which if the InformationLevel is SHORT consists of (TBD): • userID • dg_jobID • jobStatus • ResourceID • Executable • input • output • submissionTime (when the job has been submitted from the UI) • scheduledTime (when the job has been submitted to the resource) • startRunningTime (when the job has started its execution) • StopRunningTime (when the job has completed its execution)
Interactions between UI and M&C (4/11) • JobsStatusInfo, which if the InformationLevel is FULL consists of (TBD): • userID • dg_jobID • jobStatus • ResourceID • ResourceName • Executable • input • output • submissionTime • scheduledTime • startRunningTime • StopRunningTime • CpuTime • Rank • Constraint • ResourceManagementType • ResourceManagementVersion • Gramversion • Architecture • OpSys • traversalTime • TotalCpus • FreeCpus • RunningJobs • IdleJobs • MaxTotalJobs • MaxRunningJobs • Status
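Since the SHORT field list is a subset of the FULL one, the difference between the two information levels can be sketched as a projection of a status record. This is an illustration only: the field lists are still marked TBD in the slides, the record is assumed to be a plain dictionary, and `project_status` is a hypothetical helper, not part of the L&B API.

```python
from typing import Dict

# SHORT-level fields as listed on the slides (marked TBD there).
SHORT_FIELDS = (
    "userID", "dg_jobID", "jobStatus", "ResourceID", "Executable",
    "input", "output", "submissionTime", "scheduledTime",
    "startRunningTime", "StopRunningTime",
)


def project_status(record: Dict[str, object], level: str) -> Dict[str, object]:
    """Return the part of a job status record visible at the given level.

    FULL returns a copy of the whole record, SHORT only the SHORT_FIELDS
    that are present in it.
    """
    if level == "FULL":
        return dict(record)
    if level == "SHORT":
        return {name: record[name] for name in SHORT_FIELDS if name in record}
    raise ValueError("InformationLevel must be SHORT or FULL")
```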
Interactions between UI and M&C (5/11) • Summarising, with the dg-job-status command we have:
Interactions between UI and M&C (6/11) • dg-get-logging-info dg-get-logging-info <jobID1 ... jobIDn | -all> [-from T1] [-to T2] [-full] • The UI will use the provided L&B server API with the following input information: • UserId, a string representing the user identifier • dg_jobIdList, a list of dg_jobIds • fromTime, timestamp • toTime, timestamp • InformationLevel, indicating the required information level (SHORT/FULL) according to the command option “-full” • If the dg-get-logging-info command has been issued with the “-all” input parameter, then the dg_jobIdList will be empty.
Interactions between UI and M&C (7/11) • Returned information should encompass: • ReturnCode, a numeric code indicating the operation result • ReturnMessage, a string describing the operation result • JobsLogInfo according to the requested InformationLevel. • Summarising, with the dg-get-logging-info command we have:
Interactions between UI and M&C (8/11) • JobLogInfo has to be defined: • userID • jobID • jobStatus • submissionTime • scheduledTime • startTime • finishTime • executable • executableSize • input data LFN • output data LFN • pendingReasons • Constraint • Rank • NumCpus • CpuTime • swapSpace • totalI/O • totalDataSpace • WallClockTime
Interactions between UI and M&C (9/11) • ResourceID • ResourceName • ResourceManagementType • ResourceManagementVersion • Gramversion • executingHost • Architecture • OpSys • traversalTime • TotalCpus • MaxTotalJobs • MaxRunningJobs • ResourceStatus • RunWindows • ResourcePriority • MaxCpuTime • MaxWallTime • networkReq • fromTime • toTime • RunWindows • queuePriority • MaxCpuTime • MaxWallTime
Interactions between UI and M&C (10/11) • Some points from the L&B service document (pages 2, 3, 5): • SUBMITTED status: does it mean that the job is still in the UI? • The UI does not know the dg_jobId before the job is in the RB. How can it use the logging service? • CHKPT status: is checkpointing supported for PM9? • CLEARED status: who triggers transition 8? Who is going to log the JobClearedEvent? • GridScheduler, Condor-G and Globus job-manager are expected to log: the UI too? • Modification to Globus job-manager for logging: how much effort? Who does it? • JobDoneEvent: does Globus provide the job exit status?
Interactions between UI and M&C (11/11) • Some points from the L&B service document (pages 10, 11, 13): • The L&B API does not seem to provide a way to select info by time • Is it foreseen for PM9? • Shall the UI filter by time locally? • jobLog(): “level” vs “InformationLevel” • jobStatus(): are the “appropriate details” already defined? • Encapsulation of network communication in the L&B server API is needed • JobSubmitEvent: the source of this log event is the GUI. Again, the jobID is needed but the UI does not know it.
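If the UI does end up filtering by time locally, the logic for the “-from T1” and “-to T2” options could look like the sketch below. It assumes that log events arrive as dictionaries with a `timestamp` entry holding a `datetime`, and that both bounds are inclusive; neither assumption comes from the L&B service document, and `filter_by_time` is a hypothetical helper.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional


def filter_by_time(events: Iterable[Dict[str, object]],
                   from_time: Optional[datetime] = None,
                   to_time: Optional[datetime] = None) -> List[Dict[str, object]]:
    """Keep the events whose timestamp lies in [from_time, to_time].

    A bound left as None is treated as open, mirroring an omitted
    "-from" or "-to" option.
    """
    selected = []
    for event in events:
        stamp = event["timestamp"]
        if from_time is not None and stamp < from_time:
            continue
        if to_time is not None and stamp > to_time:
            continue
        selected.append(event)
    return selected
```

Filtering locally means the whole log for each job still crosses the network, which is one reason the slide asks whether time selection is foreseen in the L&B API itself.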
Other Points (1/2) • The information flow between the RB and the SS (the Condor-G wrapper) consists of class-ad objects • The RB maintains a persistent queue of jobs submitted by the users through the UI. Jobs are described by jobAd objects and identified by their dg_jobIds. • Once a suitable resource for a job has been found by the RB through the match-making process, the jobAd is enriched with the found ResourceId and passed to the SS for submission • What information is returned to the RB? • A job handle (the condor_jobId) and the jobStatus (condor_rm command)?
Other Points (2/2) • How to detect job status transitions? Who does it? • Inspection of the Condor-G log files • globus_gram_client_job_status function • Other ways?
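Whichever source is chosen (the Condor-G log files or a status query), one generic way to detect transitions is to compare two successive snapshots of job statuses. The sketch below assumes each snapshot is a dictionary from dg_jobId to a status string; how the snapshots are obtained is left open, and `detect_transitions` is a hypothetical helper rather than part of any of the components named above.

```python
from typing import Dict, List, Tuple


def detect_transitions(previous: Dict[str, str],
                       current: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (job_id, old_status, new_status) for every job whose status changed.

    Jobs that appear only in the current snapshot are not reported,
    because there is no earlier status to compare against.
    """
    transitions = []
    for job_id, new_status in current.items():
        old_status = previous.get(job_id)
        if old_status is not None and old_status != new_status:
            transitions.append((job_id, old_status, new_status))
    return transitions
```

Snapshot comparison can miss intermediate states if a job changes status more than once between two polls, so an event-based source such as the log files would be needed where every transition matters.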