Grid Scheduling Cécile Germain-Renaud
Scheduling • Job • A computation to run on a machine • Possibly with network access, e.g. input/output files (coarse grain) or communication with other jobs (the DAG model) • Schedule • s(J) = date at which execution of job J begins • Alloc(J) = machine assigned to J • One of the oldest Computer Science problems • Principles of classification: [Graham et al. Optimization and approximation in deterministic sequencing and scheduling: A survey. Ann. Discrete Math. 5 (1979), 287-326] • Computer-aided classification of complexity results (4536 at the time of the paper) [Lageweg et al. Computer-aided complexity classification of combinatorial problems. CACM 25:11, 1982]
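A trivial sketch of this notation, assuming jobs are identified by name and have a known cost w(J) (all names and values below are illustrative, not from the slides): a schedule is just the pair of maps s(J) and Alloc(J).

```python
# Hypothetical encoding of a schedule: s(J) = start date, Alloc(J) = machine.
w     = {"J1": 30.0, "J2": 10.0}   # w(J): computational cost (seconds)
s     = {"J1": 0.0,  "J2": 30.0}   # s(J): date at which execution of J begins
alloc = {"J1": "m0", "J2": "m0"}   # Alloc(J): machine assigned to J

def completion(job):
    """C(J) = s(J) + w(J): date at which job J finishes."""
    return s[job] + w[job]

print(completion("J2"))   # 40.0
```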
Classical scheduling in HPC • Context: parallel computing/computers • Application = Directed Acyclic Graph (T, E, w, c) • T = set of sequential tasks • E = dependence constraints • w(t) = computational cost of task t • c(t,t') = communication cost (data sent from t to t') • Infrastructure • p identical processors • With or without preemption, dedicated (no sharing) • An optimization problem with objective function: Makespan = total execution time S(T) = max over t in T of (s(t) + w(t)) • Complexity • NP-complete even for independent tasks with no communication (E = ∅, p = 2, c = 0) • NP-complete for UET-UCT graphs (w = c = 1) • A very old result: without communication, list scheduling provides a (2 − 1/p) approximation
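A minimal sketch of list scheduling, shown here for independent tasks (E = ∅, c = 0) on p identical machines: greedily assign each task to the earliest-available machine. This greedy rule is what yields the classical (2 − 1/p) bound on the optimal makespan (Graham's result also covers precedence constraints without communication); the function name and task encoding are illustrative.

```python
import heapq

def list_schedule(durations, p):
    """Greedy list scheduling of independent tasks on p identical machines.

    durations: list of task costs w(t); p: number of processors.
    Returns (schedule, makespan) where schedule[t] = (s(t), Alloc(t)).
    For independent tasks, makespan <= (2 - 1/p) * optimal.
    """
    # Min-heap of (time the machine becomes free, machine id).
    machines = [(0.0, m) for m in range(p)]
    heapq.heapify(machines)
    schedule, makespan = {}, 0.0
    for t, w in enumerate(durations):
        free_at, m = heapq.heappop(machines)   # earliest-available machine
        schedule[t] = (free_at, m)             # s(t) = free_at, Alloc(t) = m
        finish = free_at + w
        makespan = max(makespan, finish)
        heapq.heappush(machines, (finish, m))
    return schedule, makespan

# Example: 5 tasks on 2 machines.
print(list_schedule([3, 2, 2, 1, 4], p=2))
```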
Scheduling in Institutional Grids • Institutional: federation of resources • Accounted for: fair share on the medium-to-long time scale is a prime constraint • Partially autonomous local policies must be allowed • Grid • Permanent (steady-state) regime: on-line decisions • Large scale: strongly distributed • Information system • Scheduling services • Relevant contexts • Autonomous, multi-agent systems • Auction algorithms • Service Level Agreement (SLA) technology
EGEE gLite Scheduling • [diagram: user interfaces (UIs) submit jobs to Brokers; each Broker dispatches jobs to a Computing Element (CE) at a site (node), where a local scheduler assigns them to processors; CEs publish their status to the BDII information system]
EGEE gLite Scheduling • [diagram: the Broker queries the BDII and ranks the candidate CEs] • The information published is • Static: e.g. which VOs are accepted • Dynamic: expected traversal time
EGEE gLite Scheduling • [diagram: the Broker queries the BDII and ranks the candidate CEs] • Rank: may be any user-defined function, e.g. to avoid "bad" machines • The default is locality first, then expected traversal time
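A hedged sketch of what such a rank could look like in spirit (plain Python, not actual JDL/ClassAd syntax; the attribute names site, estimated_traversal_time, and the blacklist parameter are hypothetical): prefer local CEs, then the shortest expected traversal time, and skip known "bad" machines.

```python
def rank_ce(ce, user_site, blacklist=frozenset()):
    """Return a score for a candidate Computing Element; higher is better.

    Mirrors the default policy described above: locality first,
    then expected traversal time. Attribute names are illustrative.
    """
    if ce["name"] in blacklist:          # user-defined: avoid "bad" machines
        return float("-inf")
    locality_bonus = 1_000_000 if ce["site"] == user_site else 0
    # Shorter expected traversal time => higher rank.
    return locality_bonus - ce["estimated_traversal_time"]

candidates = [
    {"name": "ce1.example.org", "site": "SiteA", "estimated_traversal_time": 420},
    {"name": "ce2.example.org", "site": "SiteB", "estimated_traversal_time": 60},
]
best = max(candidates, key=lambda ce: rank_ce(ce, user_site="SiteA"))
print(best["name"])
```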
EGEE gLite Scheduling • [diagram: the Broker maintains a BDII cache, updated by its queries against the published site information]
Not only academic • Long waiting times • When EGEE was not so heavily loaded • [plot: overhead ratio vs. execution time (s)]
Batch scheduling • Very complex policies • Maximise throughput under constraints • Weighted fair-share – per VO, per job type • Priorities • Hardware requirements • Advance reservations • An indication of job duration is given by the type of queue: infinite, long, medium, short, and exotic ones [B. Bode et al. The Portable Batch Scheduler and the Maui Scheduler on Linux Clusters]
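A minimal sketch in the spirit of such priority-based policies (the weights, attribute names, and formula are invented for illustration and are not the actual PBS/Maui computation): each queued job's priority mixes its waiting time, a VO fair-share term, and a queue-type weight.

```python
def job_priority(job, vo_usage, vo_share, now,
                 w_wait=1.0, w_fairshare=100.0, w_queue=10.0):
    """Toy priority combining waiting time, fair-share, and queue weight.

    vo_usage: recent resource usage of the job's VO (normalised 0..1)
    vo_share: target share of the job's VO (normalised 0..1)
    Weights are illustrative, not those of any real scheduler.
    """
    waiting = now - job["submit_time"]
    # Positive when the VO is below its target share, negative when above.
    fairshare = vo_share[job["vo"]] - vo_usage[job["vo"]]
    queue_weight = {"short": 3, "medium": 2, "long": 1, "infinite": 0}.get(job["queue"], 1)
    return w_wait * waiting + w_fairshare * fairshare + w_queue * queue_weight

job = {"submit_time": 400.0, "vo": "biomed", "queue": "short"}
print(job_priority(job, vo_usage={"biomed": 0.1}, vo_share={"biomed": 0.3}, now=1000.0))
```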
Classical vs Grid • (Relatively) easy: • Throughput instead of makespan + master-slave graphs instead of DAGs allow, for instance, cyclic schedules that are computable in polynomial time and asymptotically optimal, but not local [Y. Robert] [A. Rosenberg] • Moderately difficult: information about • Applications • Infrastructures • The same program on different data may run at very different speeds • Network performance is dynamic • Really difficult • Queues managed by local policies • On-line decisions
Information and Scheduling (I) • Considerable work has been done on predicting CPU load in shared environments – desktops, clusters, desktop grids [P.A. Dinda, R. Wolski, J. Schopf] • The basic technique is linear time-series analysis, e.g. a model of the form φ(B)(1 − B)^d z_t = θ(B) a_t • Self-similarity and epochal behavior • The usual goal is the prediction of the next value • Applied to soft real-time scheduling on shared clusters • Practical application in NWS
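As a minimal sketch of the "predict the next value" idea (a plain AR model fitted by least squares, not the full ARIMA machinery of the equation above, and not the NWS implementation; data and model order are illustrative): fit coefficients on a load history and produce a one-step-ahead prediction.

```python
import numpy as np

def fit_ar(history, order=3):
    """Fit an AR(order) model z_t = a_1*z_{t-1} + ... + a_order*z_{t-order} by least squares."""
    z = np.asarray(history, dtype=float)
    # Each design-matrix row is (z_{t-1}, ..., z_{t-order}).
    rows = [z[t - order:t][::-1] for t in range(order, len(z))]
    X, y = np.array(rows), z[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(history, coeffs):
    """One-step-ahead prediction of the next CPU-load sample."""
    order = len(coeffs)
    recent = np.asarray(history[-order:], dtype=float)[::-1]
    return float(coeffs @ recent)

load = [0.20, 0.25, 0.30, 0.28, 0.35, 0.40, 0.38, 0.42, 0.45]
c = fit_ar(load, order=3)
print(predict_next(load, c))
```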
Information and scheduling (III) • Less work on predicting the behavior of dedicated systems • Papers are on parallel systems, mostly based on time-series techniques, but at least one is based on a genetic algorithm [Downey, Foster, Wolski] • The traces are much more difficult to access • No time slice – irregular time series: the records are event-driven • Which analysis? • Average waiting time: clear, but not very useful for prediction • Fitting a distribution: not convincing for parallel systems • Predicting an upper bound with a confidence interval: what is the metric of success?
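One way to make "predicting an upper bound" concrete (an empirical-quantile sketch on historical waiting times; this is an illustration of a possible metric, not a method from the cited papers): report the q-quantile of recent waiting times as the bound, and the empirical coverage on later jobs as the metric of success.

```python
import numpy as np

def waiting_time_bound(history, q=0.95):
    """Empirical q-quantile of past waiting times, used as a predicted upper bound."""
    return float(np.quantile(np.asarray(history, dtype=float), q))

def coverage(bound, future_waits):
    """Metric of success: fraction of later jobs whose waiting time stayed below the bound."""
    future = np.asarray(future_waits, dtype=float)
    return float(np.mean(future <= bound))

past = [12, 30, 45, 60, 90, 120, 150, 300, 600, 900]   # seconds, illustrative
bound = waiting_time_bound(past, q=0.9)
print(bound, coverage(bound, [40, 75, 500, 1200]))
```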
Information and grid • We cannot directly log the entire state of the system • Access rights • Size • Currently available data • The lifecycle of jobs going through certain brokers • The job ranking at the same brokers • The detailed behavior of the queues on certain sites • Certain = LAL + possibly other mainstream sites • Easy to get • Summary data about the lifecycle of all jobs • From which it could be possible to reconstruct the detailed state and dynamics of the CEs
What should we learn? • Learning beyond time series makes sense in a grid: massive use of community programs instead of (?) sparse runs of a very long and complex digital experiment • Information, as sketched before • Beware: this is not a steady-state system • New users, new machines, and new software are the expected regime for some years from now • A community-based resource will tend to display correlated activity • Is there an invariant social graph? Is it a feature? • System algorithms, e.g. a site scheduler or the broker • Validation? • Scheduling algorithms • Validation?