Heuristics for Meta Scheduling DCBG Seminar Francis Tang Heuristics for Metascheduling
Overview
• Introduction
• Metascheduler (scheduling on the Grid)
• Application Run Time Prediction
• Gibbons’s algorithm
• Templates and categories
• Prediction
• Update
• GridX implementation
• Further work
Intro - Metascheduling
• Each scientific cluster (e.g. White, Viper, Cobra and Mamba) runs a scheduler (e.g. PBS, SGE, LSF)
• The scheduler allows users of a cluster to submit jobs to that same cluster
• A metascheduler runs jobs by submitting them to another scheduler (hence “Meta-”)
• It allows users to submit jobs to other clusters
[Figure: GX shown with the clusters white (PBS), viper (LSF), cobra (SGE) and mamba (PBS)]
Intro – Run Time Prediction
• The purpose of a scheduler is to coordinate resource use for increased efficiency
• The scheduler must decide where to send a job for execution
• How to make this decision?
• One approach: predict the run time of a job on each of the different clusters
• There are theoretical issues concerning run time prediction
  • We do not even know whether the program will terminate
  • Thus the problem is undecidable, since it is at least as hard as the Halting Problem
• Without annotating the programs, we must use heuristic approaches
Gibbons’s algorithm
• Gibbons’s algorithm comprises two parts:
  • Prediction: use a job’s attributes to guess its run time by comparing against historical records
  • Update: update the historical records as jobs complete
• Gibbons uses the following attributes:
  • Owner/username (u)
  • Executable name (e)
  • Number of processors (n)
• It can be generalised to use other attributes
Gibbons’s – templates and categories
Templates
• A template is an equivalence relation over the space of all jobs
• Typically an equivalence relation is specified by a list of attributes which must be equal
• In Gibbons’s algorithm, we use the following templates: (u,e,n), (u,e), (u,n), (u), (n), ()
Categories
• Gibbons calls the equivalence classes induced by the equivalence relation categories
• Informally, a category is a set of similar jobs
• The meaning of “similar” is determined by the template (equivalence relation) from which the category is created
  • Example: if C is created from (u,e), then jobs are considered similar if they have the same user name and executable name
  • Example: if C is created from (u), then jobs are considered similar if they have the same user name
• Given another job j, it is possible to determine whether j is similar to the jobs of C
• Given a category C, a prediction of a job’s run time can be made using those historical records which fall into C
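One way to picture a template is as a projection of a job onto a list of attribute names: two jobs are similar under the template exactly when their projections agree. The sketch below assumes jobs are plain dictionaries keyed by u, e and n; the function names category_key and similar are illustrative and do not come from the slides.

```python
from typing import Dict, Tuple

# Templates from the slide, written as tuples of attribute names.
TEMPLATES = [("u", "e", "n"), ("u", "e"), ("u", "n"), ("u",), ("n",), ()]


def category_key(job: Dict[str, object], template: Tuple[str, ...]) -> tuple:
    """Project a job onto the attributes named by the template."""
    return tuple(job[attr] for attr in template)


def similar(j1: Dict[str, object], j2: Dict[str, object],
            template: Tuple[str, ...]) -> bool:
    """Two jobs are similar under a template if their projections are equal."""
    return category_key(j1, template) == category_key(j2, template)


# Hypothetical jobs, for illustration only.
a = {"u": "francis", "e": "rnafind", "n": 4}
b = {"u": "francis", "e": "wavemotif", "n": 4}

print(similar(a, b, ("u", "e")))  # False: executables differ
print(similar(a, b, ("u", "n")))  # True: same user and processor count
print(similar(a, b, ()))          # True: the empty template relates all jobs
```

Because similarity is equality of projections, it is automatically reflexive, symmetric and transitive, which is what makes each template an equivalence relation.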
Categories – a note on implementation
• A category is defined to be a set of jobs
• Prediction is defined using historical data
• However, categories are used only to test whether another job is similar, and historical data only to compute predictions (e.g. by linear regression or mean)
• Thus, we do not need to store every job in each set, nor the complete history
• We store:
  • A representative job of each equivalence class; equivalently, just the values of the attributes that deem the jobs to be “similar”
  • Enough aggregate stats for prediction (e.g. for the mean, we store the cumulative sum and sample size)
• We define a refinement relation between the abstract and concrete state
• The refinement relation allows us to check the correctness of the algorithm
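The storage idea above can be sketched for the mean predictor. The abstract state is the full list of run times in a category; the concrete state keeps only the cumulative sum and sample size. The class MeanStats and the function refines are hypothetical names, and the run times are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MeanStats:
    """Concrete state of one category: just enough to compute a mean."""
    total: float = 0.0  # cumulative sum of run times
    count: int = 0      # sample size

    def add(self, run_time: float) -> None:
        self.total += run_time
        self.count += 1

    def predict(self) -> float:
        if self.count == 0:
            raise ValueError("category has no history")
        return self.total / self.count


def refines(abstract: List[float], concrete: MeanStats) -> bool:
    """Refinement relation: the stats summarise exactly the abstract history."""
    return (concrete.count == len(abstract)
            and abs(concrete.total - sum(abstract)) < 1e-9)


history: List[float] = []  # abstract state: every run time seen
stats = MeanStats()        # concrete state: aggregates only

for run_time in [120.0, 90.0, 150.0]:
    history.append(run_time)
    stats.add(run_time)
    # The relation must hold after every update step.
    assert refines(history, stats)

print(stats.predict())  # 120.0
```

Checking refines after each step is one way to use the refinement relation as a correctness test: if it holds initially and every update preserves it, predictions from the aggregates agree with predictions from the full history.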
Gibbons’s - prediction
• Suppose a user submits a job j
• The algorithm finds a category C to which j is considered similar
• Since there may be many such categories, we find the one which would give the most accurate prediction
• C is used to predict the run time of j
  • Use linear regression if a CPU count is available for j
  • Otherwise, use a simple mean
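The two predictors can both be computed from running sums, so the full history is not needed for regression either. The sketch below assumes the regression is an ordinary least-squares fit of run time against CPU count; the slides do not specify the exact form, and the class name RegressionStats and the data points are illustrative.

```python
from dataclasses import dataclass


@dataclass
class RegressionStats:
    """Running sums sufficient for a least-squares line y = a + b*x."""
    k: int = 0          # sample size
    sx: float = 0.0     # sum of CPU counts
    sy: float = 0.0     # sum of run times
    sxx: float = 0.0    # sum of squared CPU counts
    sxy: float = 0.0    # sum of CPU count * run time

    def add(self, cpus: float, run_time: float) -> None:
        self.k += 1
        self.sx += cpus
        self.sy += run_time
        self.sxx += cpus * cpus
        self.sxy += cpus * run_time

    def predict(self, cpus: float) -> float:
        if self.k == 0:
            raise ValueError("category has no history")
        denom = self.k * self.sxx - self.sx ** 2
        if denom == 0:
            # Every record has the same CPU count: fall back to the mean.
            return self.sy / self.k
        slope = (self.k * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.k
        return intercept + slope * cpus


stats = RegressionStats()
# Made-up records lying exactly on run_time = 200 - 10 * cpus.
for cpus, run_time in [(2, 180.0), (4, 160.0), (8, 120.0)]:
    stats.add(cpus, run_time)

print(stats.predict(6))  # 140.0
```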
Gibbons’s – prediction (example)
Categories:
i. (u,e) categories
  1. (francis, rnafind)
  2. (arun, repeats)
  3. (arun, scatter-blast)
ii. (u) categories
  1. (francis)
  2. (arun)
iii. () categories
  1. ()
Jobs:
• (francis, rnafind) matches i.1, ii.1 and iii.1
• (francis, wavemotif) matches ii.1 and iii.1
• (arun, scatter-blast) matches i.3, ii.2 and iii.1
• (azmi, gillespie) matches iii.1
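The matching in this example can be reproduced with a small lookup. The stored sums and counts below are made up, and the selection rule is a stand-in: the slides say only that the category giving the most accurate prediction is chosen, so this sketch simply assumes the most specific matching category wins.

```python
from typing import Dict, List, Optional, Tuple

TEMPLATES = [("u", "e"), ("u",), ()]

# template -> category key -> (cumulative run time, sample size).
# The keys mirror the example; the numbers are invented.
categories: Dict[tuple, Dict[tuple, Tuple[float, int]]] = {
    ("u", "e"): {
        ("francis", "rnafind"): (300.0, 3),
        ("arun", "repeats"): (80.0, 2),
        ("arun", "scatter-blast"): (1000.0, 4),
    },
    ("u",): {("francis",): (300.0, 3), ("arun",): (1080.0, 6)},
    (): {(): (1380.0, 9)},
}


def matching(job: Dict[str, str]) -> List[Tuple[tuple, tuple]]:
    """Return (template, key) for every stored category the job falls into."""
    found = []
    for template in TEMPLATES:
        key = tuple(job[attr] for attr in template)
        if key in categories[template]:
            found.append((template, key))
    return found


def predict(job: Dict[str, str]) -> Optional[float]:
    """Mean run time of the chosen category, or None if nothing matches."""
    found = matching(job)
    if not found:
        return None
    # Assumption: TEMPLATES is ordered most specific first, so take the first.
    template, key = found[0]
    total, count = categories[template][key]
    return total / count


for u, e in [("francis", "rnafind"), ("francis", "wavemotif"),
             ("arun", "scatter-blast"), ("azmi", "gillespie")]:
    job = {"u": u, "e": e}
    print((u, e), len(matching(job)), predict(job))
```

The number of matches per job is 3, 2, 3 and 1, in line with the example. A different selection rule, such as one based on the spread of the run times in each category, would only change the body of predict.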
Gibbons’s - update
• When a job finishes, we know its run time
• We must update the concrete state of the scheduler
• The update algorithm can be derived using the refinement relation
• Explicitly: for each template, we try to find a category of similar jobs
  • If found, we add this new job to the category. This entails updating the stats associated with the category, e.g. cumulative sum, sample size
  • If not found, we create a new category based on this job
Gibbons’s – update (example)
Categories:
i. (u,e) categories
  1. (francis, rnafind)
  2. (arun, repeats)
  3. (arun, scatter-blast)
ii. (u) categories
  1. (francis)
  2. (arun)
iii. () categories
  1. ()
Job (francis, rnafind) finishes:
• Add job to i.1, ii.1 and iii.1
Job (francis, repeats) finishes:
• Add job to ii.1 and iii.1
• Create category (francis, repeats)
Job (azmi, gillespie) finishes:
• Add job to iii.1
• Create categories (azmi, gillespie), (azmi)
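The update step in this example can be sketched as a loop over templates that either extends an existing category or creates a new one. The store layout, the function name update and all run times are assumptions made for illustration; the starting state is built by replaying three earlier jobs so that it contains the categories listed above.

```python
from typing import Dict, List

TEMPLATES = [("u", "e"), ("u",), ()]

# template -> category key -> [cumulative run time, sample size]
store: Dict[tuple, Dict[tuple, list]] = {t: {} for t in TEMPLATES}


def update(job: Dict[str, str], run_time: float) -> List[tuple]:
    """Record a finished job; return the keys of any categories created."""
    created = []
    for template in TEMPLATES:
        key = tuple(job[attr] for attr in template)
        if key not in store[template]:
            # No category of similar jobs exists yet: create one.
            store[template][key] = [0.0, 0]
            created.append(key)
        stats = store[template][key]
        stats[0] += run_time  # cumulative sum
        stats[1] += 1         # sample size
    return created


# Build the starting state of the example (run times are invented).
update({"u": "francis", "e": "rnafind"}, 100.0)
update({"u": "arun", "e": "repeats"}, 40.0)
update({"u": "arun", "e": "scatter-blast"}, 250.0)

# The three finishing jobs from the example.
print(update({"u": "francis", "e": "rnafind"}, 110.0))
# [] : only existing categories are extended
print(update({"u": "francis", "e": "repeats"}, 45.0))
# [('francis', 'repeats')]
print(update({"u": "azmi", "e": "gillespie"}, 500.0))
# [('azmi', 'gillespie'), ('azmi',)]
```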
Gibbons’s scheduler in GridX
• We generalise Gibbons’s templates to allow us to schedule over the Grid
  • Attributes: Machine (m), User (u), Executable (e), Number of processors (n)
  • Templates: (m,u,e,n), (m,u,e), (m,u,n), (m,u), (m,n), (m), ()
• We match processor counts based on exponential intervals: i.e. 1, 2, 3-4, 5-8, 9-16, 17-32, etc.
• So far, we only use the mean as our predictor; linear regression will be added later
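The exponential intervals can be computed directly from the processor count. The sketch below is one possible encoding, not necessarily the one used in GridX: it maps a count n to the smallest b with n <= 2**b, so that counts in the same interval share a bucket number. The names cpu_bucket and bucket_range are illustrative.

```python
from typing import Tuple


def cpu_bucket(n: int) -> int:
    """Bucket index for n processors: 1 -> 0, 2 -> 1, 3-4 -> 2, 5-8 -> 3, ..."""
    if n < 1:
        raise ValueError("processor count must be at least 1")
    # (n - 1).bit_length() is the smallest b such that n <= 2**b.
    return (n - 1).bit_length()


def bucket_range(b: int) -> Tuple[int, int]:
    """Inclusive range of processor counts that fall into bucket b."""
    if b == 0:
        return (1, 1)
    return (2 ** (b - 1) + 1, 2 ** b)


for b in range(6):
    print(b, bucket_range(b))
# 0 (1, 1)
# 1 (2, 2)
# 2 (3, 4)
# 3 (5, 8)
# 4 (9, 16)
# 5 (17, 32)
```

With this encoding, a template containing n would compare cpu_bucket(n) rather than n itself, so a 6-processor job and an 8-processor job fall into the same category.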
Further work
• Aging of historical information
  • Only using historical records up to 1 month old (or some other limit) is straightforward
  • Using a weighted average (recent records are more heavily weighted) is more difficult, especially with respect to update
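One possible shape for the weighted average, offered only as a sketch and not as the approach planned for GridX, is an exponentially decayed mean. It keeps the update incremental by decaying both the weighted sum and the total weight whenever a new record arrives. The half_life parameter and the class name DecayedMean are hypothetical, and the sketch assumes records arrive in non-decreasing time order.

```python
from dataclasses import dataclass


@dataclass
class DecayedMean:
    """Mean in which a record's weight halves every half_life time units."""
    half_life: float          # hypothetical tuning parameter
    weighted_sum: float = 0.0
    weight: float = 0.0
    last_time: float = 0.0    # time of the most recent update

    def add(self, run_time: float, now: float) -> None:
        # Age everything recorded so far, then add the new record at weight 1.
        decay = 0.5 ** ((now - self.last_time) / self.half_life)
        self.weighted_sum = self.weighted_sum * decay + run_time
        self.weight = self.weight * decay + 1.0
        self.last_time = now

    def predict(self) -> float:
        if self.weight == 0:
            raise ValueError("category has no history")
        return self.weighted_sum / self.weight


stats = DecayedMean(half_life=10.0)
stats.add(100.0, now=0.0)   # older record
stats.add(200.0, now=10.0)  # newer record, one half-life later

# The older record now counts half as much: (0.5*100 + 200) / 1.5
print(stats.predict())
```

The plain mean of the two records would be 150; the decayed mean is pulled towards the more recent run time. Other weighting schemes may not reduce to a fixed set of running aggregates in this way, which is where the difficulty with update arises.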