Keldysh Institute of Applied Mathematics Russian Academy of Sciences. Resource Manager for Grid with global job queue and with planning based on local schedules. V.N.Kovalenko, E.I.Kovalenko, D.A.Koryagin, E.Z.Ljubimskii, A.V.Orlov, E.V.Huhlaev
Keldysh Institute of Applied Mathematics, Russian Academy of Sciences
Resource Manager for Grid with global job queue and with planning based on local schedules
V.N.Kovalenko, E.I.Kovalenko, D.A.Koryagin, E.Z.Ljubimskii, A.V.Orlov, E.V.Huhlaev
{kvn,kei,koryagin,ljubimsk,ao,huh}@keldysh.ru
Figure: job submission in the Globus system, and job submission by means of a Broker.
Resource Brokers
• GRID Resource Broker (GRB) - HPC lab, University of Lecce, Italy and CACR, California Institute of Technology. http://sara.unile.It/grb/
• EZ-Grid - Department of Computer Science, University of Houston. http://www.cs.uh.edu/~ezgrid/
• MetaDispatcher - Keldysh Institute of Applied Mathematics, Moscow
Problem of scheduling
The scheduling problem is defined over two sets: 1) the set of jobs and 2) the set of computing elements.
Scheduling results:
• The dispatch time for each job
• The computing element to which the job is directed and where it is executed
There are two management levels, local and global. Each has its own objects: a job, a queue, and a management system. The management systems are the Local Resource Monitor (LRM) at the local level and MetaDispatcher at the global level.
Figure labels: Global level (global queue, MetaDispatcher); Local level (local queue, LRM, config. file).
Question 1: In What Order Should the Global Jobs Be Served?
• The order in which the scheduler serves the job queue should differ from FIFO.
• The user should have management facilities for placing a job at any position in the global queue.
To achieve that:
• A limited budget is allocated to each user.
• Within the budget limits, the user prices his jobs.
• A function GP evaluates the global priority of the job: GP = GP(price, required resources, run time)
Figure: a global queue in which jobs are grouped by price (Price=20, Price=10, Price=1), with a new job arriving.
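The pricing rule above can be illustrated with a small sketch. The slides do not define GP, so the formula used here (price divided by requested CPU-hours) is only an assumption, as are the names `GlobalJob`, `global_priority` and `order_queue`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GlobalJob:
    name: str
    price: float     # price set by the user, within the user's budget
    cpus: int        # required resources (simplified to a CPU count)
    run_time: float  # requested run time, in hours


def global_priority(job: GlobalJob) -> float:
    """Hypothetical GP: price offered per requested CPU-hour."""
    return job.price / (job.cpus * job.run_time)


def order_queue(jobs: List[GlobalJob]) -> List[GlobalJob]:
    """Serve jobs by descending GP rather than FIFO.

    sorted() is stable, so jobs with equal GP keep their arrival order.
    """
    return sorted(jobs, key=global_priority, reverse=True)
```

With such a function, a user moves a job forward in the queue by raising its price and moves it back by lowering it, which is the kind of positioning control the slide asks for.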
Question 2: When to Forward a Job to a Target Computing Element?
If the destination of a job is determined at the moment it enters the global queue, and the job is immediately routed to a local queue, it may be delayed there by the arrival of local jobs. At the same time, resources of other computing elements may become free and stay idle.
Conclusion: it is more reasonable to keep global jobs in the global queue as long as possible, ideally up to the moment of start.
Question 3: To Which Computing Element Should a Job Be Passed?
The scheduling model of a computing installation: a set of resources.
Resource description:
• Static attributes: OS type, CPU time, memory volume
• Dynamic attributes: free/busy, resource amount
Figure: running jobs plotted as resource against time, with the resource release time marked.
Busy resources have an additional attribute, the release time, estimated from the request of the running job. Knowing the release time, the scheduler is able to plan the future usage of a busy resource. However, the scheduler must have a guarantee that the planned global job will really start and will not stay waiting in a local queue.
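As a rough illustration of planning with release times, the sketch below computes the earliest moment at which a job could obtain the CPUs it needs. The function name `earliest_start` and the one-release-time-per-busy-CPU representation are assumptions made for the example, not part of the presented system.

```python
from typing import List, Optional


def earliest_start(now: float, free_cpus: int,
                   release_times: List[float], needed: int) -> Optional[float]:
    """Earliest time at which `needed` CPUs are available on one element.

    release_times holds the estimated release time of each busy CPU,
    taken from the run-time request of the job occupying it.
    Returns None if the element does not have enough CPUs at all.
    """
    if needed <= free_cpus:
        return now
    missing = needed - free_cpus
    upcoming = sorted(release_times)
    if missing > len(upcoming):
        return None
    # The job can start once the `missing`-th busy CPU has been released.
    return max(now, upcoming[missing - 1])
```

The estimate is only as good as the run-time requests of the running jobs, and it says nothing about local jobs that may claim the released CPUs first, which is why the slide asks for a start guarantee.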
Question 4: How Should the Interaction of the Global Scheduler and Local Resource Monitor Be Organized?
If two jobs, one local and one global, ask for free resources, which one should be preferred?
Autonomy of computing element: each computing element of the Grid belongs to a certain owner, who may restrict access for external jobs completely or partly.
If global and local jobs make demands for the same resources, their priorities are compared. For this purpose each computing element i defines a function LPi() that calculates the local priority of a global job. This function depends on the job's price, consumable resources and run time: LPi = LPi(price, consumable resources, run time)
Question 4 (continued): How Should the Interaction of the Global Scheduler and Local Resource Monitor Be Organized?
Figure: a global job jobG with priority PG = LP(jobG) and a local job jobL with priority PL compete for a resource occupied by running jobs; the case shown is PG < PL.
The global scheduler should distribute its jobs so that global jobs do not delay the start of any more "expensive" local jobs.
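The rule that a global job must not delay a more expensive local job can be sketched as a simple check. Both functions below are hypothetical: the slides leave LPi unspecified, so the owner-defined `weight` and the price-per-CPU-hour formula are assumptions, and `may_start` considers a single resource only.

```python
from typing import List, Tuple


def local_priority(price: float, cpus: int, run_time: float,
                   weight: float = 1.0) -> float:
    """Hypothetical LP_i: an owner-chosen weight times price per CPU-hour."""
    return weight * price / (cpus * run_time)


def may_start(pg: float, start: float, run_time: float,
              local_jobs: List[Tuple[float, float]]) -> bool:
    """Check whether a global job with local priority pg may occupy
    the resource during [start, start + run_time).

    local_jobs lists (planned_start, PL) for local jobs planned on the
    same resource. The global job is rejected if it would delay any
    local job whose priority PL is higher than pg.
    """
    end = start + run_time
    for planned_start, pl in local_jobs:
        delayed = start <= planned_start < end
        if delayed and pl > pg:
            return False
    return True
```

Setting `weight` to zero on a computing element would make every global job lose against any local job with positive priority, which is one way an owner could close the element to external jobs.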
Schedule
Figure: resource against time; running jobs are followed by future local jobs labelled priority1 to priority4.
The local schedule is the plan of resource occupation by local jobs for some period of time in the future.
Local schedule: for each local job, {priority, assigned resources, occupation and release time}
The local schedule is drawn up by special agents of the global scheduler. These agents, running on each computing installation, build the schedule in strict conformity with the scheduling strategy and configuration parameters of the local monitor. The current state of all local schedules is delivered to the information base of the global scheduler, which therefore has the usage plan of all resources of the virtual organization at its disposal. On the basis of this aggregate schedule the scheduler can plan the allocation of global jobs to resources.
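A minimal sketch of how an aggregate schedule might be queried is given below. It is a simplification under stated assumptions: each computing element is treated as a single resource, a local schedule is reduced to a list of (start, end) reservations, and the names `first_gap`, `AggregateSchedule`, `update` and `best_element` are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # (start, end) of one reservation


def first_gap(reservations: List[Interval], now: float, duration: float) -> float:
    """Earliest t >= now such that [t, t + duration) overlaps no reservation."""
    t = now
    for start, end in sorted(reservations):
        if start - t >= duration:
            return t
        t = max(t, end)
    return t


class AggregateSchedule:
    """Information base: the latest local schedule of every computing element."""

    def __init__(self) -> None:
        self._by_element: Dict[str, List[Interval]] = {}

    def update(self, element: str, reservations: List[Interval]) -> None:
        """Called when an agent reports a new local schedule."""
        self._by_element[element] = list(reservations)

    def best_element(self, now: float, duration: float) -> Optional[Tuple[float, str]]:
        """Return (start_time, element) with the earliest slot, or None if empty."""
        candidates = [(first_gap(r, now, duration), name)
                      for name, r in self._by_element.items()]
        return min(candidates) if candidates else None
```

Usage: after `update("A", [(0, 5), (7, 10)])` and `update("B", [(0, 4)])`, a call to `best_element(0, 3)` picks element B, because the gap on A between 5 and 7 is too short for a job of length 3.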
Program architecture of scheduling
Figure labels: Global queue, Scheduler, Data Base, Agents, LRMs, local queue, jobs.
• The global scheduler, implementing a certain scheduling strategy, draws up the global schedule.
• The information base resides alongside the scheduler and stores the aggregate schedule. For data management, a distributed system such as Spitfire of the DataGrid project, with a relational database as its core, is being considered.
• The local agents of the scheduler work on each computing element. Interacting with the local resource monitor, an agent builds the local schedule of its computing element and transfers updates to the global scheduler. The proposed implementation is based on the Maui scheduler.
Future directions:
• Implementation of a backfill algorithm at the global level to avoid blocking of jobs.
• Advance resource reservation for distributed multiprocessor jobs.
• An economic model of the virtual organization as applied to scheduling.