
The KOALA Grid Scheduler over DAS-3 and Grid’5000



  1. The KOALA Grid Scheduler over DAS-3 and Grid’5000. Processor and data co-allocation in grids. Dick Epema, Alexandru Iosup, Mathieu Jan, Hashim Mohamed, Ozan Sonmez. Parallel and Distributed Systems Group.

  2. Contents • Our context: grid scheduling and co-allocation • The design of the KOALA co-allocating scheduler • Some performance results • KOALA over Grid’5000 and DAS-3 • Conclusion & future work

  3. Grid scheduling environment • System • Grid schedulers usually do not own resources themselves • Grid schedulers have to interface to different local schedulers • Sun Grid Engine (SGE 6.0) on DAS-2/DAS-3 • OAR on Grid’5000 • Workload • Various kinds of applications • Various requirements

  4. Co-Allocation (1) [figure: multiple separate jobs running in a grid] • In grids, jobs may use multiple types of resources in multiple sites: co-allocation or multi-site operation • Without co-allocation, a grid is just a big load-sharing device • Find a suitable candidate system for running a job • If the candidate is not suitable anymore, migrate

  5. Co-Allocation (2) [figure: a single global job spanning the grid] • With co-allocation • Use available resources (e.g., processors) • Access and/or process geographically spread data • Application characteristics (e.g., simulation in one location, visualization in another) • Problems • More difficult resource-discovery process • Need to coordinate allocations of local schedulers • Slowdown due to wide-area communications

  6. A model for co-allocation: schedulers [figure: a global queue with the grid scheduler KOALA, which does load sharing and co-allocation, on top of local queues with local schedulers (LS) at the clusters; global jobs enter the global queue, non-local and local jobs enter the local queues]

  7. A model for co-allocation: job types • Fixed job: the placement of the job components is fixed • Non-fixed job: the scheduler decides on the component placement • Flexible job: same total job size, but the scheduler decides on the split-up and the placement (see the data-model sketch below)
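The three job types can be captured in a small data model. Below is a minimal sketch in Python; the class and field names are chosen here for illustration only and are not KOALA’s internal names.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class JobComponent:
    size: int                      # number of processors this component needs
    cluster: Optional[str] = None  # only set for fixed jobs

@dataclass
class Job:
    # "fixed":     every component already names its execution cluster
    # "non-fixed": component sizes are given, the scheduler picks the clusters
    # "flexible":  only the total size matters, the scheduler decides the split-up
    kind: str
    components: List[JobComponent]

    def total_size(self) -> int:
        return sum(c.size for c in self.components)

# Example: a non-fixed job with two components of 8 processors each
job = Job("non-fixed", [JobComponent(8), JobComponent(8)])
print(job.total_size())  # -> 16
```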

  8. A model for co-allocation: policies • Placement policies dictate where the components of a job go • Placement policies for non-fixed jobs • Load-aware: Worst Fit (WF) (balance load in clusters; see the sketch below) • Input-file-location-aware: Close-to-Files (CF) (reduce file-transfer times) • Communication-aware: Cluster Minimization (CM) (reduce number of wide-area messages) • Placement policies for flexible jobs • Communication- and queue-time-aware: Flexible Cluster Minimization (FCM) (CM + reduce queue wait time)
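To make the load-aware policy concrete, here is a minimal Worst Fit sketch in Python. It assumes that only per-cluster idle-processor counts are available; the function name, variable names, and cluster names are invented for illustration and do not come from KOALA’s code.

```python
def worst_fit(component_sizes, idle_processors):
    """Load-aware placement (WF) for a non-fixed job: put each component on
    the cluster that currently has the most idle processors, which tends to
    balance the load over the clusters.
    component_sizes: list of processor counts, one entry per job component.
    idle_processors: dict mapping cluster name -> number of idle processors.
    Returns {component index: cluster}, or None if some component does not
    fit anywhere (the scheduler would then retry the placement later)."""
    idle = dict(idle_processors)           # copy: placed components consume capacity
    placement = {}
    for i, size in enumerate(component_sizes):
        cluster = max(idle, key=idle.get)  # the emptiest cluster
        if idle[cluster] < size:
            return None
        placement[i] = cluster
        idle[cluster] -= size
    return placement

# Example: a 2x8 job over three clusters (cluster names are placeholders)
print(worst_fit([8, 8], {"fs0": 16, "fs1": 12, "fs2": 6}))  # -> {0: 'fs0', 1: 'fs1'}
```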

  9. KOALA: a Co-Allocating grid scheduler • Main goals • Processor co-allocation: non-fixed/flexible jobs • Data co-allocation: move large input files to the locations where the job components will run prior to execution • Load sharing: in the absence of co-allocation • KOALA • Runs alongside local schedulers • Scheduler independent from Globus • Uses Globus components (e.g., RSL and GridFTP) • For launching jobs uses its own mechanisms or Globus DUROC • Has been deployed on DAS-2 since September 2005

  10. KOALA: the architecture [figure: the KOALA components below, interacting with the local SGE schedulers] • PIP/NIP: information services • RLS: replica location service • CO: co-allocator • PC: processor claimer • RM: run monitor • RL: runners listener • DM: data manager • Ri: runners

  11. KOALA: the runners • The KOALA runners are adaptation modules for different application types • Set up communication • Launch applications • Current runners (see the interface sketch below) • KRunner: the default KOALA runner, which only co-allocates processors • DRunner: DUROC runner for co-allocated MPI applications • IRunner: runner for applications using the Ibis Java library for grid applications
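The runner concept boils down to a small interface between the scheduler and one application type. The sketch below is an assumption-laden illustration: it supposes a runner receives the placement decided by the scheduler, and the interface and class names are invented here, not KOALA’s actual API.

```python
from abc import ABC, abstractmethod

class Runner(ABC):
    """A runner adapts the scheduler to one application type: it knows how to
    wire up the co-allocated components and how to launch them."""

    @abstractmethod
    def setup_communication(self, placement):
        """Prepare inter-component communication (e.g., exchange contact info)."""

    @abstractmethod
    def launch(self, placement):
        """Start the job components on the clusters chosen by the scheduler."""

class MinimalRunner(Runner):
    """Analogue of the default-KRunner idea: co-allocate processors, nothing more."""
    def setup_communication(self, placement):
        pass  # no application-specific wiring
    def launch(self, placement):
        for component, cluster in placement.items():
            print(f"launching component {component} on cluster {cluster}")

# Example, using a placement as produced by a placement policy
MinimalRunner().launch({0: "fs0", 1: "fs1"})
```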

  12. KOALA: job flow with four phases [figure: new submissions enter a placement queue and, once placed, a claiming queue; the runners launch the job] • Phase 1: job placement (retried via the placement queue on failure) • Phase 2: file transfer • Phase 3: claim processors (retried via the claiming queue on failure) • Phase 4: launch the job through the runners (see the control-loop sketch below)
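The queues and retries of the four phases can be summarized as a small control loop. This is only a sketch: the callables passed in (place, transfer_files, claim_processors, launch) and the retry limits are hypothetical stand-ins for KOALA’s real components.

```python
import time

def run_job(job, place, transfer_files, claim_processors, launch,
            max_tries=10, retry_interval=1.0):
    """Sketch of a four-phase job flow.
    Phase 1: job placement (failed placements go back into the placement queue)
    Phase 2: file transfer to the chosen execution sites
    Phase 3: claiming the processors (failed claims go back into the claiming queue)
    Phase 4: launching the job through a runner"""
    placement = None
    for _ in range(max_tries):          # placement queue with retries
        placement = place(job)
        if placement is not None:
            break
        time.sleep(retry_interval)
    else:
        raise RuntimeError("job could not be placed")

    transfer_files(job, placement)      # stage the input files before the start time

    for _ in range(max_tries):          # claiming queue with retries
        if claim_processors(placement):
            break
        time.sleep(retry_interval)
    else:
        raise RuntimeError("processors could not be claimed")

    launch(job, placement)              # hand over to the runner

# Example with trivial stand-ins
run_job("job-1",
        place=lambda j: {0: "fs0"},
        transfer_files=lambda j, p: None,
        claim_processors=lambda p: True,
        launch=lambda j, p: print("launched", j, "on", p))
```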

  13. KOALA: job time line [figure: time line from job submission via job placement and the claiming time to the estimated start time, with the estimated file-transfer time in between, marking processor gained time and processor wasted time] • If advance reservations are not supported, don’t claim processors immediately after placing, but wait until close to the estimated job start time • So processors are left idle for a shorter time (processor gained time) • Placing and claiming may have to be retried multiple times (see the timing sketch below)
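The timing argument amounts to a simple subtraction: the estimated start time follows from the placement time plus the estimated file-transfer time, and claiming is deferred until shortly before it. A minimal sketch, in which the safety margin and the variable names are assumptions, not KOALA’s actual values:

```python
def claiming_time(placement_time, estimated_file_transfer_time, margin=30.0):
    """Defer claiming until shortly before the estimated job start time, so the
    processors are not held idle while the input files are still in transit.
    All times are in seconds; the 30-second safety margin is an arbitrary
    choice for this sketch."""
    estimated_start_time = placement_time + estimated_file_transfer_time
    return max(placement_time, estimated_start_time - margin)

# Example: job placed at t = 100 s, file transfer estimated to take 600 s
print(claiming_time(100.0, 600.0))  # -> 670.0, i.e. 30 s before the estimated start
```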

  14. KOALA: performance results (1) • With replication (3 copies of input files, 2, 4, or 6 GB) • Offer a 30% co-allocation load during two hours • Try to keep the background load between 30% and 40% [figures: utilization (%) over time (s) for the KOALA workload, the background load, and the processor gained/wasted time; number of CF and WF placement and claiming tries per job size (number of components x component size: 1x8, 2x8, 4x8, 1x16, 2x16, 4x16)] See, e.g.: H.H. Mohamed and D.H.J. Epema, “An Evaluation of the Close-to-Files Processor and Data Co-Allocation Policy in Multiclusters,” IEEE Cluster 2004.

  15. KOALA: performance results (2) • Communication-intensive applications • Workload 1: low load • Workload 2: high load • Background load: 15-20% [figures: average execution time (s), average wait time (s), and average middleware overhead (s) versus the number of job components, for workload 1 and workload 2] See: O. Sonmez, H.H. Mohamed, and D.H.J. Epema, “Communication-Aware Job-Placement Policies for the KOALA Grid Scheduler,” 2nd IEEE Int’l Conf. on e-Science and Grid Computing, Dec. 2006.

  16. Grid’5000 and DAS-3 interconnection: scheduling issues • Preserve each system’s usage • Characterize jobs (especially for Grid’5000) • Usage policies • Allow simultaneous use of both testbeds • One more level of hierarchy in latencies • Co-allocation of jobs • Various types of applications: PSAs, GridRPC, etc.

  17. KOALA over Grid’5000 and DAS-3 • Goal: testing KOALA policies … • … in a heterogeneous environment • … with different workloads • … with OAR reservation capabilities • Grid’5000 from DAS-3 • “Virtual” clusters inside KOALA • Used whenever DAS-3 is overloaded • How: deployment of the DAS-3 environment on Grid’5000

  18. KOALA over Grid’5000 and DAS-3: how [figure: DAS-3 environments deployed on the Grid’5000 sites Orsay, Rennes, and Lyon, together with a DAS-3 file-server and OAR]

  19. Using DAS-3 from Grid’5000 • Authorize Grid’5000 users to submit jobs … • via SGE directly, OARGrid, or KOALA • Usage policies? • Deployment of environments on DAS-3 as in Grid’5000? • When: during nights and weekends? • Deployment at grid level • KOALA submits kadeploy jobs

  20. Current progress • Collected traces of Grid’5000 [done] • OAR tables of 15 clusters • OARGrid tables • LDAP database • Analysis in progress • KOALA over Grid’5000 [in progress] • KOALA communicates with OAR for its information service [done] • GRAM interface to OAR • “DAS-2” image on Grid’5000: Globus, KOALA, OAR

  21. Conclusion • KOALA is a grid resource management system • Supports processor and data co-allocation • Several job placement policies (WF, CF, CM, FCM) Future work • Use bandwidth and latency in job placements (lightpaths?) • Deal with more application types (PSAs, …) • A decentralized P2P KOALA

  22. Information • Publications • see PDS publication database at www.pds.ewi.tudelft.nl • Web site • KOALA: www.st.ewi.tudelft.nl/koala

  23. Slowdown due to wide-area communications • Co-allocated applications are less efficient due to the relatively slow wide-area communications • Extension factor of a job = (service time on a multicluster) / (service time on a single cluster), usually > 1 • Co-allocation is beneficial when the extension factor is ≤ 1.20 • Unlimited co-allocation is no good • Communication libraries may be optimized for wide-area communication (a small worked example follows below) See, e.g.: A.I.D. Bucur and D.H.J. Epema, “Trace-Based Simulations of Processor Co-Allocation Policies in Multiclusters,” HPDC 2003.
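A small worked example of the extension-factor rule, with invented service times purely for illustration:

```python
def extension_factor(multicluster_service_time, single_cluster_service_time):
    """Extension factor = service time on a multicluster divided by the service
    time on a single cluster; wide-area communication usually makes it > 1."""
    return multicluster_service_time / single_cluster_service_time

# Invented numbers, purely for illustration: 115 s co-allocated vs 100 s on one cluster
ef = extension_factor(115.0, 100.0)
print(f"extension factor = {ef:.2f}")
print("co-allocation pays off" if ef <= 1.20 else "better to stay on one cluster")
```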
