Workload Management
David Colling, Imperial College London
Release 2 is not based on release 1
• Whole new architecture (pretty much described in D1.4)
• More modular
• I have little practical experience of this new architecture (yet).
So what is the new architecture? See D1.4 for details…
The architecture
• User Interface:
  • Although the architecture has changed in several ways, the commands available to the user are (almost) the same, now edg-job-submit etc.
  • There are now also APIs.
• Network Server:
  • The Network Server is a generic network daemon responsible for accepting incoming requests from the UI (e.g. job submission, job removal). Valid requests are then passed to the Workload Manager.
The architecture
• Workload Manager:
  • The Workload Manager is the core component of the Workload Management System. Given a valid request, it has to take the appropriate actions to satisfy it.
  • To do so, it may need support from other components, which are specific to the different request types.
The architecture
• Resource Broker:
  • This has been turned into one of the modules that help the Workload Manager, actually 3 sub-modules:
    • Matchmaking
    • Ranking
    • Scheduling
• Job Adapter:
  • The Job Adapter puts the finishing touches to the job’s JDL and creates the job wrapper.
The architecture
• Job Controller and CondorG:
  • Actually submit the job to the resources and track its progress.
So how does this all work…
Job submission example (for a “simple” job)
[Diagram: the UI; the RB node containing the Network Server, Workload Manager and Job Controller/CondorG; the Replica Catalog; the Information Service; a Computing Element and a Storage Element. Labels: CE characteristics & status, SE characteristics & status.]
Job submission
edg-job-submit myjob.jdl
myjob.jdl:
  JobType = "Normal";
  Executable = "$(CMS)/exe/sum.exe";
  InputData = "LF:testbed0-00019";
  ReplicaCatalog = "ldap://sunlab2g.cnaf.infn.it:2010/rc=WP2 INFN Test Replica Catalog,dc=sunlab2g, dc=cnaf, dc=infn, dc=it";
  DataAccessProtocol = "gridftp";
  InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
  OutputSandbox = {"sim.err", "test.out", "sim.log"};
  Requirements = other.GlueHostOperatingSystemName == "linux" &&
                 other.GlueHostOperatingSystemRelease == "Red Hat 6.2" &&
                 other.GlueCEPolicyMaxWallClockTime > 10000;
  Rank = other.GlueCEStateFreeCPUs;
Job status: submitted
• UI: allows users to access the functionalities of the WMS
• Job Description Language (JDL): used to specify job characteristics and requirements
Job status: submitted → waiting
• NS: network daemon responsible for accepting incoming requests
[Diagram: the job and its Input Sandbox files go from the UI to the Network Server; RB storage now appears on the RB node.]
• WM: responsible for taking the appropriate actions to satisfy the request
[Diagram: the job passes from the Network Server to the Workload Manager.]
Job status: submitted → waiting
Where must this job be executed?
• Matchmaker: responsible for finding the “best” CE to which to submit the job
[Diagram: the Workload Manager hands the question to the Matchmaker/Broker, which consults the Replica Catalog (“Where are the needed data, i.e. on which SEs?”) and the Information Service (“What is the status of the Grid?”).]
[Diagram: the Matchmaker returns its CE choice to the Workload Manager.]
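The matchmaking and ranking steps can be sketched in a few lines of Python. This is only an illustration, not the WP1 broker code: the CE records, their values and the function names are invented, and the Requirements and Rank expressions from myjob.jdl are translated into Python by hand rather than evaluated as ClassAds. The data-location part of matchmaking (InputData and the Replica Catalog) is left out.

```python
# Hypothetical CE records, as the Information Service might report them.
# Attribute names follow the Glue names used in myjob.jdl; all values are invented.
CES = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 50000, "GlueCEStateFreeCPUs": 12},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxWallClockTime": 50000, "GlueCEStateFreeCPUs": 40},
    {"id": "ce-d", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 6.2",
     "GlueCEPolicyMaxWallClockTime": 5000, "GlueCEStateFreeCPUs": 30},
]


def requirements(other):
    """Hand translation of the Requirements expression in myjob.jdl."""
    return (other["GlueHostOperatingSystemName"] == "linux"
            and other["GlueHostOperatingSystemRelease"] == "Red Hat 6.2"
            and other["GlueCEPolicyMaxWallClockTime"] > 10000)


def rank(other):
    """Hand translation of the Rank expression: prefer more free CPUs."""
    return other["GlueCEStateFreeCPUs"]


def choose_ce(ces):
    """Matchmaking (filter on requirements), then ranking (highest rank wins)."""
    matching = [ce for ce in ces if requirements(ce)]
    if not matching:
        return None  # no CE satisfies the job's requirements
    return max(matching, key=rank)


best = choose_ce(CES)
print(best["id"])  # ce-b
```

Here ce-c is rejected on the OS release and ce-d on the wall-clock limit, so only ce-a and ce-b are ranked.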
Job status: submitted → waiting
• JA: responsible for the final “touches” to the job before submission (e.g. creation of the wrapper script, etc.)
[Diagram: the Job Adapter sits between the Workload Manager and the Job Controller/CondorG.]
Job status: submitted → waiting → ready
• JC: responsible for the actual job management operations (done via CondorG)
[Diagram: the job is passed to the Job Controller/CondorG.]
Job status: … → ready → scheduled
[Diagram: the job and its Input Sandbox files go from the Job Controller/CondorG to the Computing Element.]
Job status: … → scheduled → running
[Diagram: the job runs on the Computing Element with its Input Sandbox and performs “Grid enabled” data transfers/accesses on the Storage Element.]
Job status: … → running → done
[Diagram: the Output Sandbox files leave the Computing Element.]
edg-job-get-output <dg-job-id>
[Diagram: the Output Sandbox is requested from the UI.]
Job status: submitted → waiting → ready → scheduled → running → done → cleared
[Diagram: the Output Sandbox files are delivered to the UI.]
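The status sequence built up over the preceding slides can be written down as a small table. The sketch below is an illustration only: the list reflects the order shown on the slides for a job that succeeds, and the helper names are made up.

```python
# Status sequence for a successful job, in the order shown on the slides.
LIFECYCLE = ["submitted", "waiting", "ready", "scheduled",
             "running", "done", "cleared"]


def next_status(current):
    """Return the status that follows `current`, or None after 'cleared'."""
    i = LIFECYCLE.index(current)
    return LIFECYCLE[i + 1] if i + 1 < len(LIFECYCLE) else None


def is_valid_history(history):
    """True if `history` starts at 'submitted' and skips no status."""
    return len(history) > 0 and history == LIFECYCLE[:len(history)]


print(next_status("ready"))                      # scheduled
print(is_valid_history(["submitted", "ready"]))  # False: 'waiting' was skipped
```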
Logging and bookkeeping
edg-job-status <dg-job-id>
• LB (Logging & Bookkeeping): receives and stores job events; processes the corresponding job status
• LM (Log Monitor): parses the CondorG log file (where CondorG logs information about jobs) and notifies the LB
[Diagram: the UI; the RB node with the Network Server, Workload Manager, Job Controller/CondorG, Log Monitor and Logging & Bookkeeping; the Computing Element. Labels: job status, log of job events.]
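The division of labour between the Log Monitor and the LB can be sketched as follows. Everything concrete here is an assumption made for illustration: the log format (one "<job_id> <event>" pair per line) is invented and is not CondorG's real log format, and the event names and their mapping to statuses are made up.

```python
# Invented mapping from (made-up) event names to job statuses.
EVENT_TO_STATUS = {"submit": "scheduled", "execute": "running", "terminate": "done"}


class LoggingAndBookkeeping:
    """Stores job events and derives the current status of each job."""

    def __init__(self):
        self.events = {}  # job_id -> list of events, oldest first

    def log_event(self, job_id, event):
        self.events.setdefault(job_id, []).append(event)

    def status(self, job_id):
        # The status comes from the most recent event that maps to a status.
        for event in reversed(self.events.get(job_id, [])):
            if event in EVENT_TO_STATUS:
                return EVENT_TO_STATUS[event]
        return "unknown"


def log_monitor(log_text, lb):
    """Parse the invented log format and notify the LB of every event found."""
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        job_id, event = parts
        lb.log_event(job_id, event)


lb = LoggingAndBookkeeping()
log_monitor("job1 submit\njob1 execute\njob2 submit\n", lb)
print(lb.status("job1"))  # running
print(lb.status("job2"))  # scheduled
```

A status query such as edg-job-status would then only need to read from the LB, never from the CondorG log itself.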
New functionality…
• Release 2 of WP 1 software
• New functionality includes:
  • MPI job submission
  • User APIs
  • Accounting infrastructure (Management have decided not to deploy this for testbed 2)
  • Interactive job support
  • Job logical checkpointing
New functionality…
All of these are implemented…
Specify the sort of job using the JobType ClassAd attribute, e.g. JobType = "Checkpointable"
However, so far they have only been tested on the WP 1 testbed…
There isn’t time to go through all of these, so I will just go through checkpointing.
Job checkpointing scenario
[Diagram: the UI; the RB node with the Network Server, Workload Manager and Job Controller/CondorG; the Logging & Bookkeeping Server; Computing Element X and Computing Element Y.]
edg-job-submit jobchkpt.jdl
jobchkpt.jdl:
  [
  JobType = "Checkpointable";
  Executable = "hsum.exe";
  StdOutput = "Outfile";
  InputSandbox = "/home/user/hsum.exe";
  OutputSandbox = "Outfile";
  Requirements = member("ROOT", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
                 member("CHKPT", other.GlueHostApplicationSoftwareRunTimeEnvironment);
  Rank = -other.GlueCEStateEstimatedResponseTime;
  ]
Job status: submitted
• UI: allows users to access the functionalities of the WMS
• Job Description Language (JDL): used to specify job characteristics and requirements
Job status: submitted → waiting → ready → scheduled → running
[Diagram: numbered steps 1 to 6 follow the same path as in the simple-job example: Network Server, Matchmaker, Workload Manager, Job Adapter, Job Controller/CondorG, and finally the job and its Input Sandbox files on a Computing Element.]
Job status: submitted → waiting → ready → scheduled → running
From time to time the user’s job asks to save its intermediate state:
  …
  <save intermediate files>;
  State.saveValue("var1", value1);
  …
  State.saveValue("varn", valuen);
  State.saveState();
  …
Job status: submitted → waiting → ready → scheduled → running
[Diagram: saving of intermediate files; saving of job state.]
Job status: … → running → done (failed)
The job fails (e.g. because of a CE problem).
Job status: … → running → done (failed) → waiting
Reschedule and resubmit the job. Where must this job be executed? Possibly on a CE different from the one to which it was previously submitted…
[Diagram: the Workload Manager consults the Matchmaker again.]
CE choice: CEy
Job status: … → done (failed) → waiting → ready
[Diagram: the job passes through the Job Adapter to the Job Controller/CondorG.]
Job status: … → waiting → ready → scheduled
[Diagram: the job and its Input Sandbox files are sent to the Computing Element.]
Job status: … → ready → scheduled → running
• Retrieval of the last saved state when the job starts
• Retrieval of the intermediate files (previously saved)
• The job keeps running from the point corresponding to the retrieved state (it doesn’t need to start from the beginning)
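The save-and-resume pattern of the scenario can be tried out locally with a small Python sketch. It is an analogue only: the JobState class and its methods are hypothetical stand-ins for the State object on the slides, the state is written to a local JSON file rather than through any WP1 service, and the "job" is a trivial summation with a simulated failure.

```python
import json
import os
import tempfile


class JobState:
    """Hypothetical stand-in for the State object; persists to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.values = {}
        if os.path.exists(path):  # a previous run left a saved state behind
            with open(path) as f:
                self.values = json.load(f)

    def save_value(self, name, value):
        self.values[name] = value

    def get_value(self, name, default=None):
        return self.values.get(name, default)

    def save_state(self):
        # Write to a temporary file and rename it, so that a crash
        # cannot leave a half-written state file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.values, f)
        os.replace(tmp, self.path)


def run_sum(path, n, fail_at=None):
    """Sum 0..n-1, saving state after every step; optionally fail at `fail_at`."""
    state = JobState(path)
    i = state.get_value("next_i", 0)     # resume point, 0 on a fresh start
    total = state.get_value("total", 0)
    while i < n:
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated CE failure")
        total += i
        i += 1
        state.save_value("next_i", i)
        state.save_value("total", total)
        state.save_state()
    return total


path = os.path.join(tempfile.mkdtemp(), "state.json")
try:
    run_sum(path, 10, fail_at=6)   # first attempt dies part-way through
except RuntimeError:
    pass
print(run_sum(path, 10))           # second attempt resumes at step 6; prints 45
```

The second call plays the role of the resubmitted job: it never repeats steps 0 to 5, because it starts from the last saved state.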
Further additional functionality
• The order of implementation is not up to WP 1 people…
• Dependent jobs:
  • Using Condor DAGMan
  • For example…
Further additional functionality
  A = [
    Executable = "A.sh";
    PreScript = "PreA.sh";
    PreScriptArguments = { "1" };
    Children = { "B", "C" }
  ];
  B = [
    Executable = "B.sh";
    PostScript = "PostA.sh";
    PostScriptArguments = { "$RETURN" };
    Children = { "D" }
  ];
  C = [
    Executable = "C.sh";
    Children = { "D" }
  ];
  D = [
    Executable = "D.sh";
    PreScript = "PreD.sh";
    PostScript = "PostD.sh";
    PostScriptArguments = { "1", "a" }
  ]
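The Children attributes in this example define a dependency graph in which A runs first, B and C depend on A, and D depends on both B and C. As an illustration of what that implies for execution order, here is a short Python sketch using Kahn's topological sort. It is not how DAGMan is implemented; the function name is made up and only the Children lists are taken from the example.

```python
from collections import deque

# Children lists copied from the example: A -> B, C; B -> D; C -> D.
CHILDREN = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def execution_order(children):
    """Return one valid order in which the nodes can run (Kahn's algorithm)."""
    # Count, for every node, how many parents have not finished yet.
    pending_parents = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            pending_parents[kid] += 1

    # Nodes without parents can start immediately.
    ready = deque(sorted(n for n, count in pending_parents.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for kid in children[node]:
            pending_parents[kid] -= 1
            if pending_parents[kid] == 0:  # all parents done
                ready.append(kid)

    if len(order) != len(children):
        raise ValueError("dependency cycle detected")
    return order


print(execution_order(CHILDREN))  # ['A', 'B', 'C', 'D']
```

B and C become ready at the same moment, so a real scheduler could run them concurrently; the list only shows one order that respects the dependencies.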
Further additional functionality
Job partitioning will be similar to checkpointing, with the jobs being partitioned according to some variable. Partitioned jobs will also have a pre-job and an aggregator, e.g.
Further additional functionality
  JobType = "Partitionable";
  Executable = ...;
  JobSteps = ...;
  StepWeight = ...;
  Requirements = ...;
  ...
  Prejob = [
    Executable = ...;
    Requirements = ...;
    ...
  ];
  Aggregator = [
    Executable = ...;
    Requirements = ...;
    ...
  ];
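The pre-job, partition and aggregator pattern can be illustrated with a toy Python sketch. The slide elides the actual values of JobSteps and StepWeight, so everything below is an assumption: the job is taken to consist of a number of equally weighted steps, the steps are split into contiguous ranges, and all function names are made up.

```python
def pre_job(job_steps, n_parts):
    """Runs once before the partitions: here it only validates the inputs."""
    if job_steps < 0 or n_parts < 1:
        raise ValueError("need job_steps >= 0 and n_parts >= 1")
    return job_steps, n_parts


def partition_steps(job_steps, n_parts):
    """Split steps 0..job_steps-1 into n_parts contiguous, near-equal ranges."""
    base, extra = divmod(job_steps, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)  # first `extra` parts get one more
        ranges.append(range(start, start + size))
        start += size
    return ranges


def sub_job(steps):
    """One partition's work: here, the sum of squares of its step numbers."""
    return sum(s * s for s in steps)


def aggregator(partial_results):
    """Runs once after all partitions: combines their partial results."""
    return sum(partial_results)


job_steps, n_parts = pre_job(10, 3)
parts = partition_steps(job_steps, n_parts)
result = aggregator(sub_job(p) for p in parts)
print([list(p) for p in parts])  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(result)                    # 285
```

Each call to sub_job stands in for an independent sub-job that could be submitted to a different CE.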
Further additional functionality
Also planned are advance reservation of resources and co-location.
Much more monitoring and performance quantification…
Summary
• The new architecture has been implemented
• Lots of new functionality… but not stress tested
• Further functionality and performance quantification to be implemented by testbed 3.
Further into the future…
EDG will not use OGSA; however, the future is in the OGSA grid world.
Work is being done at LeSC (see Steven Newhouse’s talk tomorrow) to wrap the WP 1 components.
• Communication via JDML and LBML
• Virtualisation of the RB through an OGSA factory
• Use virtualisation to load balance
• Increase interoperability