A Unified Approach to Scheduling in Grid Environments Peter Kelly DHPC Group, School of Computer Science Supervisors: Dr. Paul Coddington Dr. Andrew Wendelborn
What is grid computing? • Middleware which allows people and organisations to share computing resources in a coordinated manner • Data and computation can be distributed between machines in other institutions around the country/world • Remote access to resources that are not available locally Grid computing aims to be seamless and secure • Users interact with “the grid”, instead of individual machines • Different platforms integrated using standard protocols • Support for a wide range of applications Common ways in which a grid is used • Execute a program on a remote host • Read and write data stored on another machine • Access services provided by a particular server
Types of grids “Heavyweight” grids • Consist of supercomputers and other high-end machines • Generally used for computation sharing • High level of security and autonomy • Can be complex to set up and maintain “Lightweight” grids • Commodity desktop machines in homes and offices • Generally used for computation sharing • Usually based on master-worker model • Software is much simpler and can be installed by end users Service based grids • Can consist of any type of machine • Provide specific functionality rather than generic compute cycles • Platform independent; implementation details hidden behind interface
What is Scheduling? Need to decide when and where computation, data transfer and other activities should occur Traditional CPU scheduling • Familiar case of task scheduling on a single CPU - time-slicing allocates CPU time to processes • SMP systems - several CPUs - time slices are allocated across processors • Parallel computers - many processors - tasks usually have exclusive access to a certain number of processors • Clusters of workstations – similar case – tasks must be assigned to specific processors Lots of past research already done for parallel computers and clusters • Grid computing has some similarities to these • However, many additional factors have to be considered
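The time-slicing idea behind traditional CPU scheduling can be sketched as a simple round-robin simulation (the task names, service times and quantum below are purely illustrative):

```python
from collections import deque

def round_robin(tasks, quantum):
    """Simulate round-robin time-slicing.

    tasks: dict mapping task name -> remaining CPU time needed.
    Returns the order in which tasks finish.
    """
    queue = deque(tasks.items())
    finished = []
    while queue:
        name, remaining = queue.popleft()
        remaining -= quantum                  # task runs for one time slice
        if remaining > 0:
            queue.append((name, remaining))   # not done: back of the queue
        else:
            finished.append(name)
    return finished

# Three tasks needing 3, 5 and 1 units of CPU time, with a quantum of 2
print(round_robin({"A": 3, "B": 5, "C": 1}, 2))  # → ['C', 'A', 'B']
```

The shortest task finishes first even though it was queued last, which is exactly the fairness property time-slicing buys over run-to-completion.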
Grid Scheduling Many extra complexities Scheduling between multiple computers • Different CPU speeds, architectures • Network connectivity can vary widely between machines • Many different users and concurrent tasks Data location is important • Data should be close to computation for efficient access • Affected by network bandwidth and available storage resources Centralised vs. distributed scheduling • Centralised scheduling offers more control, but limits autonomy of individual resources • Centralised algorithms only scale to a few hundred machines • Distributed scheduling gives more control to machine owners, and is much more scalable
Grid Scheduling Current scheduling strategies for grids are limited • Some only support specific application models e.g. task farming • Generic mechanisms do not usually support parallel programs • Scheduling is generally based on independent jobs, or application-specific parallel scheduling • Centralised schedulers - limited scalability Three main types of scheduling • Job submission • Services • Data placement (replica management) These types of scheduling are normally independent of each other.
Job submission • Similar to batch processing model used on mainframes • Client supplies details of job to run, including program name, command-line arguments, environment variables • In many cases, client also uploads executable to run • Server runs job immediately or in the future, and notifies user on completion • Useful for gaining access to faster computers on the grid to run your own programs • Platform-dependent; client needs knowledge of server configuration, and if binary executable supplied, server must have specific OS/architecture Common examples • Globus GRAM • Condor • LSF • PBS
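The details a client supplies can be pictured as a small structured description; the field names below are hypothetical and do not follow any particular system's syntax (GRAM, Condor, LSF and PBS each have their own):

```python
# A minimal, hypothetical job description of the kind a client might
# submit to a batch system (field names are illustrative only):
job = {
    "executable": "/home/user/bin/simulate",   # program to run
    "arguments": ["--steps", "1000"],          # command-line arguments
    "environment": {"OMP_NUM_THREADS": "4"},   # environment variables
    "stage_in": ["input.dat"],                 # files uploaded with the job
    "notify": "user@example.org",              # notification on completion
}
```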
Job submission - Scheduling Metascheduling (done at grid level) • On which resource should the job run? • Choice based on job requirements, access permissions, machine load and other parameters • Parallel jobs can be scheduled to run across multiple resources • Each resource may physically contain multiple processors – e.g. parallel computer or cluster • Example – parameter sweep application. Each resource handles a particular range. Local scheduling (done at individual resource level) • Once a job is assigned to a resource, when should it run? • Usually determined by local job queuing system • Job is run if machine is currently unused, or may be delayed until other jobs have completed • Alternatively, job may start straight away and run concurrently with any other existing jobs on that machine
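A metascheduling decision of this kind can be sketched as a filter-then-rank policy; the resource attributes and the "lowest load wins" rule here are deliberately simplified assumptions, not any particular metascheduler's algorithm:

```python
def choose_resource(job, resources):
    """Pick the least-loaded resource among those that satisfy the job's
    requirements and access permissions (a simple metascheduling policy)."""
    eligible = [r for r in resources
                if r["free_cpus"] >= job["cpus"]
                and job["user"] in r["allowed_users"]]
    if not eligible:
        return None                      # no resource can run this job
    return min(eligible, key=lambda r: r["load"])["name"]

resources = [
    {"name": "cluster-a", "free_cpus": 16, "load": 0.9, "allowed_users": {"alice"}},
    {"name": "cluster-b", "free_cpus": 4,  "load": 0.2, "allowed_users": {"alice", "bob"}},
    {"name": "smp-c",     "free_cpus": 8,  "load": 0.1, "allowed_users": {"bob"}},
]
print(choose_resource({"cpus": 4, "user": "alice"}, resources))  # → cluster-b
```

Note that smp-c has the lowest load but is excluded by access permissions, illustrating why the choice cannot be made on load alone.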
[Diagram: a client submits jobs to a resource broker (metascheduler), which assigns them to one of several computational resources]
Services • Well-defined interface with a set of operations • Accessed via standard protocol – no knowledge of platform necessary • Different implementations of a service, all conforming to the same interface, can be accessed transparently by clients • Set of services provided by a machine is generally fixed – clients can’t supply code to execute as they can in the job submission model • Additional services can only be installed/configured by machine owner Uses • Client-server app – access a single service • Workflow application – access many services and coordinate flow of data between them Common technologies • Java RMI • Web Services • Sun RPC
Services - Scheduling • If a service is provided by more than one machine, client can choose which to connect to • Some machines may be more desirable than others, based on expected time to perform operations or transfer data • Clients either obtain list of servers from a central registry, or have messages redirected by some intermediate entity Example: Website mirroring • Multiple web servers host copies of the same site • Load balancer intercepts requests from clients and redirects them to servers according to some scheduling algorithm • Alternatively, multiple DNS entries for the same site – client selects a machine
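The website-mirroring case can be sketched with one of the simplest policies a load balancer might use, round-robin dispatch over the replica servers (server names are made up for the example):

```python
import itertools

class LoadBalancer:
    """Round-robin dispatch over replicated servers: each incoming
    request is sent to the next server in a repeating cycle."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        return next(self._cycle)

lb = LoadBalancer(["www1", "www2", "www3"])
print([lb.next_server() for _ in range(5)])
# → ['www1', 'www2', 'www3', 'www1', 'www2']
```

Round-robin ignores server load and proximity; the point of this talk is precisely that better policies weigh those factors too.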
Data access Grid applications need to access data residing on other machines Common approaches • Mounting remote file system (e.g. NFS, CIFS) then using OS APIs • Protocols such as FTP, HTTP, GridFTP. Programs often use shared libraries implementing these (e.g. the GASS library) • Interposition agents – intercept system calls and redirect file operations to remote host • Pre-staging – data files copied to remote machine before job runs
Data scheduling and replication • The faster a program can access its data, the better • Can move data to the program, or move program to the data • Multiple copies (replicas) of the data can exist on different machines • Programs access the “closest” replica (the one they have the fastest access to) • Data from remote hosts may be cached locally Knowledge about data access patterns can help with replication • e.g. if many jobs access the same file, can pre-stage that file to remote machine(s) that will run the jobs
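"Closest replica" selection can be sketched as picking the replica with the highest available bandwidth to the client; treating bandwidth as the only distance metric is an assumption made for illustration:

```python
def closest_replica(replicas, bandwidth_to):
    """Pick the replica the client can fetch fastest, where 'closest'
    simply means highest available bandwidth (an illustrative metric)."""
    return max(replicas, key=lambda host: bandwidth_to[host])

# Hypothetical measured bandwidths from this client, in MB/s
bandwidth_to = {"storage-a": 100.0, "storage-b": 12.5, "storage-c": 0.03}
print(closest_replica(["storage-b", "storage-c"], bandwidth_to))  # → storage-b
```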
Data vs. Computation scheduling • Scheduling decisions that choose a machine based only on CPU speed/load could result in very slow data access • Is it better to place the job on Host 1 or Host 2? • This depends on how much data it reads… [Diagram: an 18GB data file resides on Host 1 (400MHz); Host 2 (2.6GHz) is reachable only over a 256kbps DSL link]
Data vs. Computation scheduling • If the job accesses the entire file and does only a small amount of computation, it is better to run it on Host 1 • But if it only reads a few KB from different parts of the file, running the job on Host 2 would be quicker • The file could be pre-staged to Host 2 if it is reused by multiple jobs
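The trade-off can be made concrete with a back-of-the-envelope model using the figures from the slides (compute time scaling linearly with clock speed is a crude assumption, used only to show the orders of magnitude involved):

```python
def run_time(compute_seconds_at_400mhz, bytes_read, cpu_mhz, link_bps=None):
    """Estimated job time: compute time scaled by CPU clock speed, plus
    transfer time if the data must cross a network link. Crude model."""
    compute = compute_seconds_at_400mhz * 400.0 / cpu_mhz
    transfer = 0.0 if link_bps is None else bytes_read * 8 / link_bps
    return compute + transfer

GB = 1024 ** 3
# Job reads the whole 18GB file and needs 1 hour of CPU on the 400MHz host:
host1 = run_time(3600, 0, cpu_mhz=400)                          # data is local
host2 = run_time(3600, 18 * GB, cpu_mhz=2600, link_bps=256_000)
print(host1 < host2)  # → True: moving 18GB over DSL takes roughly a week

# Same job, but it reads only 100KB: now the faster CPU wins
host2_small = run_time(3600, 100 * 1024, cpu_mhz=2600, link_bps=256_000)
print(host2_small < host1)  # → True: ~9 minutes vs an hour
```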
Service scheduling and communication speed • Communication costs must be taken into account not just for data file access, but also for interaction between multiple tasks and for accessing services • Should the client access Server 1 or Server 2? • Again, this depends on the amount of data that needs to be transferred and the amount of computation performed by the service • The scheduler needs a lot of information – network bandwidth, CPU speed, application data access requirements [Diagram: the client reaches Server 1 (1.2GHz, load average 0.1) over a 256kbps DSL link, and Server 2 (1.2GHz, load average 5.2) over 100Mbps Ethernet]
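The server choice can be sketched with a rough cost model combining transfer time and load-inflated compute time; inflating compute time by a factor of (1 + load average) is an assumption for illustration, not a measured relationship:

```python
def service_time(request_bytes, response_bytes, compute_seconds, loadavg, link_bps):
    """Rough cost of one service call on a server: transfer the request
    and response over the link, plus compute time inflated by load."""
    transfer = (request_bytes + response_bytes) * 8 / link_bps
    return transfer + compute_seconds * (1 + loadavg)

# A call moving 10MB each way and needing 5s of CPU on an idle machine:
# Server 1 sits behind slow DSL but is nearly idle; Server 2 is on the
# fast LAN but heavily loaded (figures from the slide's diagram).
s1 = service_time(10_000_000, 10_000_000, 5.0, loadavg=0.1, link_bps=256_000)
s2 = service_time(10_000_000, 10_000_000, 5.0, loadavg=5.2, link_bps=100_000_000)
print("Server 2" if s2 < s1 else "Server 1")  # → Server 2
```

Here the fast link dominates; with a request a thousand times smaller, the nearly idle Server 1 would win instead, which is why the scheduler needs both network and application information.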
Problem Existing schedulers generally only consider one type of scheduling • Optimising one aspect of performance, e.g. execution speed, can have negative effects on other aspects, and on overall performance • Ignoring data transfer costs could result in large transfer times • Ignoring CPU performance could result in slow programs • Ignoring common requirements between jobs can result in missing out on chances for optimisation • Simplifying assumptions made by some schedulers don’t always hold Proposed Solution • Build a scheduler which considers all aspects of performance • This will allow for better overall performance in a wider range of situations
An integrated view Remove the distinction between different types of things that we need to schedule • Computation (tasks), data (files) and services can all be considered “entities” that comprise an application • Each entity is capable of residing on a subset of the machines on the grid • A scheduler can just consider where to place the entities, and a separate mechanism deals with the low-level details Examples • A task entity corresponding to a Java class can run on any machine with a JVM installed • A service entity, representing a connection made by the program to a service, can be assigned to any machine which provides that service • A file entity, corresponding to a particular file accessed by a program, can be moved or copied to any host with sufficient storage space available
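One way the unified "entity" idea could be represented is a single type carrying a kind and a set of candidate hosts; the class and host names below are hypothetical, not the project's actual data structures:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Tasks, files and services share one representation, differing
    only in their kind and in where they are allowed to reside."""
    name: str
    kind: str                                      # "task", "file" or "service"
    candidates: set = field(default_factory=set)   # hosts the entity may reside on

task = Entity("Task 1", "task", {"host-a", "host-b"})            # hosts with a JVM
svc  = Entity("Service A", "service", {"host-b"})                # hosts providing it
data = Entity("File X", "file", {"host-a", "host-b", "host-c"})  # enough storage

# The scheduler's job reduces to picking one host from each candidate set,
# e.g. co-locating a task with the service it calls:
print(sorted(task.candidates & svc.candidates))  # → ['host-b']
```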
Graph representation • Based on the entities and relationships between them, an application can be represented as a graph • Relationships include communication between tasks, I/O on files, and operation calls on services [Diagram: an example graph whose nodes are Tasks 1–4, Service A, and Files X and Y, connected by edges for their communication and I/O relationships]
Integrated scheduling • Scheduler is given information about each of the different entities • This is used to make scheduling decisions, which are then carried out by a separate component [Example entity descriptions – Task 1: Type=task, ComputationCost=0.4, StorageSize=0MB, MemoryReq=18MB; Service A: Type=service, ComputationCost=0.8, StorageSize=0MB, MemoryReq=33MB; File X: Type=file, ComputationCost=0, StorageSize=180MB, MemoryReq=0MB]
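A first step such a scheduler might take is a feasibility check over the per-entity attributes shown on the slide; the host capacities and attribute handling here are assumptions for illustration (MB units throughout):

```python
def feasible_hosts(entity, hosts):
    """Hosts that satisfy an entity's storage and memory requirements,
    using the attribute names from the slide (sizes in MB)."""
    return [h for h, caps in hosts.items()
            if caps["free_storage"] >= entity["StorageSize"]
            and caps["free_memory"] >= entity["MemoryReq"]]

hosts = {
    "node1": {"free_storage": 500, "free_memory": 64},
    "node2": {"free_storage": 100, "free_memory": 16},
}
file_x = {"Type": "file", "ComputationCost": 0, "StorageSize": 180, "MemoryReq": 0}
print(feasible_hosts(file_x, hosts))  # → ['node1']: node2 lacks the 180MB
```

The separate enactment component would then actually move the file, start the task, or bind the service connection on the chosen host.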
Process networks • A graph-based model of computation • A network consists of a number of processes, each of which performs computation and inputs/outputs data along channels • Data flows along the channels connecting nodes This is a useful model on which to base a grid scheduler
Process networks Any program run on a grid can conceptually be thought of as a process network • Each task, service and file corresponds to a “process” • Tasks execute instructions according to code in the program • Files are like processes which perform no computation but simply input or output a stream of data (corresponding to read or write access) • Services are like tasks except their operations are pre-defined, and communication occurs through requests and responses Channels connecting one node to another can represent: • Messages being sent from one task to another • Data read from or written to a file • The requests/responses sent between a task and a service
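The correspondence can be sketched as a tiny two-node network in which a "file" process streams data to a "task" process over a bounded channel; this is a toy model of the idea, not how PAGIS implements process networks:

```python
from queue import Queue
from threading import Thread

def file_node(out_ch):
    """A file is a process that performs no computation: it just emits
    its contents as a stream of data."""
    for value in [1, 2, 3]:
        out_ch.put(value)
    out_ch.put(None)             # end-of-stream marker

def task_node(in_ch, out_ch):
    """A task reads from its input channel, computes, and writes results
    to its output channel."""
    while (v := in_ch.get()) is not None:
        out_ch.put(v * 10)       # the task's "computation"
    out_ch.put(None)

a, b = Queue(maxsize=2), Queue(maxsize=2)   # channels are bounded queues
Thread(target=file_node, args=(a,)).start()
Thread(target=task_node, args=(a, b)).start()
results = []
while (v := b.get()) is not None:
    results.append(v)
print(results)  # → [10, 20, 30]
```

Because every node looks the same from the outside (ports connected by channels), a scheduler can place any node on any eligible host, which is exactly the uniformity the integrated view exploits.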
Initial research This will utilise PAGIS, a grid programming infrastructure based on process networks, developed within the DHPC group • Support exists for writing tasks in Java • Tasks can be configured together in a process network • Will be extended to support data files and services as nodes • Metascheduler to be written which runs within PAGIS and allows multiple programs to be scheduled concurrently Investigation • Centralised and distributed scheduling algorithms • Information about resources and applications needed by the scheduler • Advantages of integrated view vs. scheduling based on one aspect of performance • Look at a range of different grid and application scenarios
Further research Applicability of our research to other systems • Will look at scheduling in the context of additional systems: e.g. Condor, LSF, NetSolve, Globus, Nimrod, Taverna • Scheduler developed in previous stages will be enhanced and interfaced with other systems • An existing simulation environment will be used to experiment with scheduling algorithms on larger grid configurations Outcomes • Demonstrate and evaluate the integrated approach to scheduling • Theoretical level: What are the best scheduling algorithms for grid environments, and how are they affected by nature of grid and application? • Practical level: How can such scheduling be used to enhance the operation of existing grid systems?
Conclusion • Grid computing is far from a “solved problem” • Scheduling is an important area of research • Much work done previously in the area of parallel and cluster computing • Some scheduling support already available in existing grids, but with various limitations In this project • A range of different scheduling algorithms and techniques will be investigated • These will include existing algorithms used for parallel/cluster computing, as well as the development of new algorithms • Integrated approach to scheduling is proposed – will allow for more effective scheduling over a wider range of cases • Demonstration of our approach and evaluation of how well it applies in a range of situations