The KOALA Grid Scheduler over DAS-3 and Grid’5000 Processor and data co-allocation in grids Dick Epema, Alexandru Iosup, Mathieu Jan, Hashim Mohamed, Ozan Sonmez Parallel and Distributed Systems Group
Contents • Our context: grid scheduling and co-allocation • The design of the KOALA co-allocating scheduler • Some performance results • KOALA over Grid’5000 and DAS-3 • Conclusion & future work
Grid scheduling environment • System • Grid schedulers usually do not own resources themselves • Grid schedulers have to interface to different local schedulers • Sun Grid Engine (SGE 6.0) on DAS-2/DAS-3 • OAR on Grid’5000 • Workload • Various kinds of applications • Various requirements
Co-Allocation (1) [diagram: multiple separate jobs, each running in a single grid site] • In grids, jobs may use multiple types of resources in multiple sites: co-allocation or multi-site operation • Without co-allocation, a grid is just a big load-sharing device • Find a suitable candidate system for running a job • If the candidate is no longer suitable, migrate
Co-Allocation (2) [diagram: a single global job spanning multiple grid sites] • With co-allocation • Use available resources (e.g., processors) across sites • Access and/or process geographically spread data • Application characteristics (e.g., simulation in one location, visualization in another) • Problems • More difficult resource-discovery process • Need to coordinate the allocations of the local schedulers • Slowdown due to wide-area communications
A model for co-allocation: schedulers [diagram: a global queue served by the grid scheduler KOALA on top of per-cluster local queues served by local schedulers (LS)] • Global jobs are submitted to the global queue and are co-allocated across clusters by KOALA • Non-local jobs are placed on a single cluster by KOALA (load sharing) • Local jobs go directly to the local queues of the local schedulers
A model for co-allocation: job types fixed job non-fixed job job components job component placement fixed scheduler decides on component placement flexible job same total job size scheduler decides on split up and placement
A model for co-allocation: policies • Placement policies dictate where the components of a job go • Placement policies for non-fixed jobs • Load-aware: Worst Fit (WF) (balance the load across clusters) • Input-file-location-aware: Close-to-Files (CF) (reduce file-transfer times) • Communication-aware: Cluster Minimization (CM) (reduce the number of wide-area messages) • Placement policy for flexible jobs • Communication- and queue-time-aware: Flexible Cluster Minimization (FCM) (CM + reduce queue wait time)
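The following is a minimal sketch of the load-aware Worst Fit policy for a non-fixed job, assuming each cluster simply reports its number of idle processors; the cluster names and the function are illustrative, not KOALA’s implementation:

```python
def worst_fit(component_sizes, idle_processors):
    """Load-aware Worst Fit: place every component of a non-fixed job on the
    cluster that currently has the most idle processors, balancing the load."""
    idle = dict(idle_processors)           # remaining idle processors per cluster
    placement = {}
    for i, size in enumerate(sorted(component_sizes, reverse=True)):
        cluster = max(idle, key=idle.get)  # cluster with the most idle processors
        if idle[cluster] < size:
            return None                    # no feasible placement; retry later
        placement[f"component-{i}"] = cluster
        idle[cluster] -= size
    return placement

# Example: a 2x8 non-fixed job over three clusters with hypothetical idle counts
print(worst_fit([8, 8], {"fs0": 12, "fs1": 10, "fs2": 16}))
```

Close-to-Files and Cluster Minimization differ only in the selection criterion: CF prefers clusters that already hold (replicas of) the input files, and CM packs components into as few clusters as possible to reduce wide-area messages.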
KOALA: a Co-Allocating grid scheduler • Main goals • Processor co-allocation: non-fixed/flexible jobs • Data co-allocation: move large input files to the locations where the job components will run prior to execution • Load sharing: in the absence of co-allocation • KOALA • Runs alongside the local schedulers • Scheduler independent of Globus • Uses Globus components (e.g., RSL and GridFTP) • For launching jobs, uses its own mechanisms or Globus DUROC • Has been deployed on DAS-2 since September 2005
KOALA: the architecture [architecture diagram: the KOALA components on top of local schedulers such as SGE] • PIP/NIP: information services • RLS: replica location service • CO: co-allocator • PC: processor claimer • RM: run monitor • RL: runners listener • DM: data manager • Ri: runners
KOALA: the runners • The KOALA runners are adaptation modules for different application types • Set up communication • Launch applications • Current runners • KRunner: default KOALA runner that only co-allocates processors • DRunner: DUROC runner for co-allocated MPI applications • IRunner: runner for applications using the Ibis Java library for grid applications
KOALA: job flow with four phases [diagram: new submissions enter the placement queue; successfully placed jobs move to the claiming queue; failed placement and claiming attempts are retried; the runners launch the job] • Phase 1: job placement (retried from the placement queue on failure) • Phase 2: file transfer • Phase 3: claim processors (retried from the claiming queue on failure) • Phase 4: launch the job (by the runners)
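A rough sketch of this flow, assuming a cycle-based driver over two FIFO queues; the real KOALA scheduler is event-driven, and place, transfer_files, claim, and launch are placeholder callbacks:

```python
from collections import deque

def scheduling_cycle(placement_q: deque, claiming_q: deque,
                     place, transfer_files, claim, launch):
    """One pass over both queues; jobs that cannot be placed or claimed yet
    stay in their queue and are retried in a later cycle."""
    for _ in range(len(placement_q)):      # Phases 1-2: placement and file transfer
        job = placement_q.popleft()
        if place(job):                     # find a cluster for every component
            transfer_files(job)            # stage the input files before the job starts
            claiming_q.append(job)
        else:
            placement_q.append(job)        # placement failed: retry in a later cycle

    for _ in range(len(claiming_q)):       # Phases 3-4: claiming and launching
        job = claiming_q.popleft()
        if claim(job):                     # claim processors at the local schedulers
            launch(job)                    # the runner launches the job components
        else:
            claiming_q.append(job)         # claim failed: retry in a later cycle
```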
KOALA: job time line [time line: job submission → job placement → estimated file-transfer time → claiming time → estimated start time, with the intervals marked as processor gained time and processor wasted time] • If advance reservations are not supported, don’t claim processors immediately after placing, but wait until close to the estimated job start time • Until claiming, the processors are left available to other jobs (processor gained time); time that claimed processors sit idle before the job actually starts is processor wasted time • Placing and claiming may have to be retried multiple times
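A small sketch of how such a delayed claiming time could be computed, assuming the estimated start time is simply the placement time plus the estimated file-transfer time and using an arbitrary 30 s safety margin (both are assumptions for illustration, not KOALA’s exact heuristic):

```python
def claiming_time(placement_time, estimated_transfer_time, safety_margin=30.0):
    """All times in seconds since submission. The estimated start time is taken
    to be the moment the input-file transfers should have finished; processors
    are claimed only shortly before it, so that until then they stay available
    to other jobs (processor gained time instead of processor wasted time)."""
    estimated_start_time = placement_time + estimated_transfer_time
    return max(estimated_start_time - safety_margin, placement_time)

# Hypothetical numbers: job placed at t=10 s with an estimated 120 s file transfer
# -> estimated start around t=130 s, so start claiming around t=100 s
print(claiming_time(10.0, 120.0))
```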
KOALA: performance results (1) • With replication (3 copies of the input files of 2, 4, or 6 GB) • Offer a 30% co-allocation load during two hours • Try to keep the background load between 30% and 40% [plots: utilization (%) over time (s) for the KOALA workload, background load, processor gained time, and processor wasted time; number of CF and WF placement and claiming tries versus job size (number of components x component size: 1x8, 2x8, 4x8, 1x16, 2x16, 4x16)] See, e.g.: H.H. Mohamed and D.H.J. Epema, “An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters,” IEEE Cluster 2004.
KOALA: performance results (2) • Communication-intensive applications • Workload 1: low load • Workload 2: high load • Background load: 15-20% [plots: average execution time (s), average wait time (s), and average middleware overhead (s) versus the number of job components, for workloads 1 and 2] See: O. Sonmez, H.H. Mohamed, and D.H.J. Epema, “Communication-Aware Job-Placement Policies for the KOALA Grid Scheduler,” 2nd IEEE Int’l Conf. on e-Science and Grid Computing, Dec. 2006.
Grid’5000 and DAS-3 interconnection: scheduling issues • Preserve each system’s usage model • Characterize jobs (especially for Grid’5000) • Usage policies • Allow simultaneous use of both testbeds • One more level of hierarchy in latencies • Co-allocation of jobs • Various types of applications: PSAs, GridRPC, etc.
KOALA over Grid’5000 and DAS-3 • Goal: testing KOALA policies … • … in a heterogeneous environment • … with different workloads • … with OAR reservation capabilities • Grid’5000 from DAS-3 • “Virtual” clusters inside KOALA • Used whenever DAS-3 is overloaded • How: deployment of the DAS-3 environment on Grid’5000
KOALA over Grid’5000 and DAS-3: how [diagram: DAS-3 environments deployed on the Grid’5000 sites Orsay, Rennes, and Lyon, with OAR and a DAS-3 file-server]
Using DAS-3 from Grid’5000 • Authorize Grid’5000 users to submit jobs … • … via SGE directly, OARGrid, or KOALA • Usage policies? • Deployment of environments on DAS-3 as in Grid’5000? • When: during nights and weekends? • Deployment at the grid level • KOALA submits kadeploy jobs
Current progress • Collected traces of Grid’5000 [done] • OAR tables of 15 clusters • OARGrid tables • LDAP database • Analysis in progress • KOALA over Grid’5000 [in progress] • KOALA communicates with OAR for its information service [done] • GRAM interface to OAR • “DAS-2” image on Grid’5000: Globus, KOALA, OAR
Conclusion • KOALA is a grid resource management system • Supports processor and data co-allocation • Several job placement policies (WF, CF, CM, FCM) Future work • Use bandwidth and latency in job placements (lightpaths?) • Deal with more application types (PSAs, …) • A decentralized P2P KOALA
Information • Publications • see PDS publication database at www.pds.ewi.tudelft.nl • Web site • KOALA: www.st.ewi.tudelft.nl/koala
Slowdown due to wide-area communications • Co-allocated applications are less efficient due to the relatively slow wide-area communications • Extension factor of a job = (service time on the multicluster) / (service time on a single cluster), usually > 1 • Co-allocation is beneficial when the extension factor ≤ 1.20 • Unlimited co-allocation is no good • Communications libraries may be optimized for wide-area communication See, e.g.: A.I.D. Bucur and D.H.J. Epema, “Trace-Based Simulations of Processor Co-Allocation Policies in Multiclusters,” HPDC 2003.
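A quick worked example of this rule of thumb (illustrative numbers, not taken from the measurements above): if a job takes 100 s on a single cluster and 115 s when co-allocated across clusters, then

```latex
\text{extension factor} \;=\; \frac{T_{\text{multicluster}}}{T_{\text{single cluster}}}
\;=\; \frac{115\,\mathrm{s}}{100\,\mathrm{s}} \;=\; 1.15 \;\le\; 1.20,
```

so by the threshold above, co-allocating this job is still worthwhile.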