Job Scheduling for Grid Computing on Metacomputers

Job Scheduling for Grid Computing on Metacomputers Keqin Li Proceedings of the 19th IEEE International Parallel and Distributed Procession Symposium (IPDPS’05)

Outline • Introduction • The Scheduling Model • A Communication Cost Model • Scheduling Algorithms • Worst-Case Performance Analysis • Experimental Data

Introduction 1 • A metacomputer is a network of computational resources linked by software in such a way that they can be used as easily as a single computer. • A metacomputer is able to support distributed supercomputing applications by combining multiple high-speed high-capacity resources on a computational grid into a single, virtual distributed supercomputer.

Introduction 2 • The most significant result of the paper is that by using any initial order of jobs and any processor allocation algorithm, the list scheduling algorithm can achieve worst-case performance bound with Notation: p is the maximum size of an individual machine P is the total size of a metacomputer s is minimum job size with s ≥ p αis the ratio of the communication bandwidth within a parallel machine to the communication bandwidth of a network β is the fraction of the communication time in the jobs

Introduction • The Scheduling Model • A Communication Cost Model • Scheduling Algorithms • Worst-Case Performance Analysis • Experimental Data

A metacomputer is specified as M = (P1, P2, ..., Pm), where Pj , 1 ≤ j ≤ m, is the name as well as the size (i.e., the number of processors) of a parallel machine. • Let P = P1 +P2 +…+Pmdenote the total number of processors. • The m machines are connected by a LAN, MAN, WAN, or the Internet. • A job J is specified as (s, t), where s is the size of J (i.e., the number of processors required to execute J) and t is J’s execution time. The cost of J is the product st. • Given a metacomputer M and a list of jobs L = (J1, J2, ..., Jn), where Ji= (si, ti), 1 ≤ i ≤ n, we are interested in scheduling the n jobs on M.

A schedule of a job Ji= (si, ti) is • τiis the starting time of Ji • Jiis divided into ri subjobs Ji,1, Ji,2, ..., Ji,ri , of sizes si,1, si,2, ..., si,ri , respectively, with si = si,1 + si,2 + … + si,ri • The subjob Ji,kis executed on Pjkby using si,kprocessors, for all 1 ≤ k ≤ ri

siprocessors allocated to Ji communicate with each other during the execution of Ji. • Communication time between two processors residing on different machines connected by a LAN, MAN, WAN, or the Internet is significantly longer than that on the same machine. • The communication cost model takes both inter-machine and intra-machine communications into consideration. • The execution time ti is divided into two components, ti = ti,comp + ti,comm • Each processor on Pjkneeds to communicate with the si,kprocessors on Pjk and the si − si,kprocessors on Pjk’with k’ ≠k. • t*I,k, the execution time of the subjob Ji,k on Pjk, as

The execution time of job Ji is t*I = max(t*I,1 , t*i,2 , …, t*I,ri) we call t*I the effective execution time of job Ji. • The above measure of extra communication time among processors on different machines discourages division of a job into small subjobs.

Our job scheduling problem for grid computing on metacomputers can be formally defined as follows: given a metacomputer M = (P1, P2, ..., Pm) and a list of jobs L = (J1, J2, ..., Jn), where Ji = (si, ti), 1 ≤ i ≤ n, find a schedule ψof L, ψ= (ψ1, ψ2, ..., ψn), with ψi = (τi, (Pj1, si,1), (Pj2, si,2), ..., (Pjri, si,ri )), where Jiis executed during the time interval [τi, τi +t*i ] by using si,kprocessors on Pjkfor all 1 ≤ k ≤ ri, such that the total execution time of L on M, is minimized.

When α= 1, that is, extra communication time over a LAN, MAN, WAN, or the Internet is not a concern, the above scheduling problem is equivalent to the problem of scheduling independent parallel tasks in multiprocessors, which is NP-hard even when all tasks are sequential.

A complete description of the list scheduling (LS) algorithm is given in the next slide. • There is a choice on the initial order of the jobs in L. Four ordering strategies: • Largest Job First (LJF) – Jobs are arranged such that s1≥ s2≥…≥ sn • Longest Time First (LTF) – Jobs are arranged such that t1≥ t2≥…≥ tn • Largest Cost First (LCF) – Jobs are arranged such that s1t1≥ s2t2≥…≥ sntn. • Unordered (U) – Jobs are arranged in any order.

The number of available processors P’jon machine Pjis dynamically maintained. The total number of available processors is P’ = P’1 + P’2 + · · · + P’m

Each job scheduling algorithm needs to use a processor allocation algorithm to find resources in a metacomputer. • Several processor allocation algorithms have been proposed, including Naive, LMF (largest machine first), SMF (smallest machine first), and MEET (minimum effective execution time).

Let A(L) be the length of a schedule produced by algorithm A for a list L of jobs, and OPT(L) be the length of an optimal schedule of L. We say that algorithm A achieves worst-case performance bound B if A(L)/OPT(L) ≤ B for all L

Let t*i,LS be the effective execution time of a job Ji in an LS schedule. • Assume that all the n jobs are executed during the time interval [0, LS(L)]. • Let Jibe a job which is finished at time LS(L). • It is clear that before Jiis scheduled at time LS(L) − t*i,LS, there are no si processors available; otherwise, Ji should be scheduled earlier. • That is, during the time interval [0, LS(L)−t*i,LS], the number of busy processors is at least P − si+ 1. • During the time interval [LS(L)−t*i,LS, LS(L)], the number of busy processors is at least si. • Define effective cost of L in an LS schedule as • Then, we have

No matter which processor allocation algorithm is used, always have • The effective execution time of Ji in an optimal schedule is • Thus, we get where • It is clear that φiis an increasing function of si, which is minimized when si= s. Hence, we have where

=> => • Since => => • The right hand side of the above inequality is minimized when =>

=> The right hand side of the above inequality is a decreasing function of Si, which is maximized when Si = s.

Theorem. If Pj ≤ p for all 1 ≤ j ≤ m, and si ≥ s for all 1 ≤ i ≤ n, where p ≤ s, then algorithm LS can achieve worst-case performance bound where • The above performance bound is independent of the initial order of L and the processor allocation algorithm.

Corollary. If a metacomputer only contains sequential machines, i.e., p = 1, communication heterogeneity vanishes and the worst-case performance bound in the theorem becomes

Job Scheduling for Grid Computing on Metacomputers

Job Scheduling for Grid Computing on Metacomputers

Presentation Transcript

Middleware for Grid Computing On Virtual Machines

Grid Scheduling

Grid Computing

Job Scheduling on Amazon EC2

Grid Computing Now! Making the Case for Grid Computing

Grid Computing

Grid Computing

Two Lectures on Grid Computing

Grid Scheduling

Grid computing for CMS

Grid Computing:

Scheduling for Grid Computing

Parallel Computing Systems Part III: Job Scheduling

Job Scheduling

Grid computing

Job Scheduling

Performance-responsive Scheduling for Grid Computing

Job scheduling

Grid Computing

Job Scheduling for MapReduce

Grid Computing

Grid Computing