Batch Scheduling at LeSC with Sun Grid Engine
David McBride <dwm@doc.ic.ac.uk>
Systems Programmer, London e-Science Centre
Department of Computing, Imperial College
Overview
• End-user requirements
• Brief description of compute hardware
• Sun Grid Engine software deployment
• Tweaks to the default SGE configuration
• Future changes
• References for more information, and questions
End-User Requirements
• We have many different users: high-energy physicists, bioinformaticians, chemists, parallel-software researchers.
• Jobs are many and varied:
  • Some users run relatively few long-running tasks; others submit large batches of shorter jobs.
  • Some require several cluster nodes to be co-allocated at runtime (16, 32+ MPI hosts); others simply use a single machine.
  • Some require lots of RAM (1, 2, 4, 8 GB+ per machine). See the qsub sketch below.
• In general, users are fairly happy so long as they get a reasonable response time.
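These varied requirements are expressed at submission time through qsub resource requests. A minimal sketch; the parallel environment name "mpi", the limits and the script names are illustrative assumptions, not LeSC's actual settings:

  # 16-way MPI job: co-allocate 16 slots via a site-defined "mpi"
  # parallel environment, with 4 GB per slot and a 24-hour limit:
  qsub -pe mpi 16 -l h_vmem=4G -l h_rt=24:00:00 simulate.sh

  # A short single-node job needs no special requests:
  qsub analyse.sh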
Hardware
• Saturn: 24-way 750 MHz UltraSPARC III Sun E6800
  • 36 GB RAM, ~20 TB online RAID storage
  • 24 TB tape library to support long-term offline backups
  • Running Solaris 8
• Viking cluster: 260 nodes, dual 2 GHz+ P4 Xeon
  • 128 machines with Fast Ethernet; 2x64 machines also with Myrinet
  • 2 front-end nodes and 2 development nodes
  • Running Red Hat Linux 7.2 (plus local additions and updates)
• Mars cluster: 204 nodes, dual 1.8 GHz+ AMD Opteron
  • 128 machines with Gigabit Ethernet; 72 machines also with InfiniBand
  • Running Red Hat Enterprise Linux 3 (plus local refinements)
  • 4 front-end interactive nodes
Sun Grid Engine Deployment
• Two separate logical SGE installations (cells), one per cluster; clients select between them as shown below.
• Saturn acts as the master node for both cells.
• However, Viking runs SGE 5.3 while Mars runs SGE 6.0.
• Mars is still in beta; Viking still provides the main production service.
• When Mars's configuration is finalized, end-users will be migrated to Mars; Viking will then be reinstalled with the new configuration.
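Client commands are directed at a particular cell via the standard SGE_CELL environment variable. A minimal sketch, assuming the cells are named "viking" and "mars" (the actual cell names here are an assumption):

  # Point client tools at the Viking (SGE 5.3) production cell:
  export SGE_CELL=viking
  qstat -f

  # Switch to the Mars (SGE 6.0) beta cell:
  export SGE_CELL=mars
  qstat -f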
Changes to Default Configuration
• Issue 1:
  • If all the available worker nodes are running long-lived jobs, then a new short-lived job added to the queue will not execute until one of the long-lived jobs has completed. (SGE does not provide a job checkpoint-and-preempt facility.)
• Resolution: a subset of nodes is configured to run only short-lived jobs (sketched below).
  • Trades slightly reduced cluster utilization for a shorter average-case response time for short-lived jobs.
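One way to express such a dedicated short-job queue is a hard wall-clock limit on a queue restricted to the reserved nodes. A sketch in SGE 6 qconf syntax; the queue name, hostgroup and one-hour limit are illustrative assumptions:

  # Excerpt from `qconf -sq short.q`:
  qname      short.q
  hostlist   @short_nodes    # hostgroup holding the reserved subset of nodes
  h_rt       1:00:00         # hard wall-clock limit; only short jobs fit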
Tweaks to SGE Configuration
• Issue 2:
  • If a job is submitted that requires the co-allocation of several cluster nodes simultaneously (e.g. for a 16-way MPI job), that job can be starved by a larger number of single-node jobs.
• Resolution (SGE 5.3): manually intervene to manipulate the queues so that the large 16-way job is scheduled.
• Resolution (SGE 6): upgrade; SGE 6 uses a more advanced scheduling algorithm (advance reservation with backfill), sketched below.
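With reservation enabled, SGE 6 holds resources for the large pending job and backfills shorter jobs around the reservation. A minimal sketch; the parameter values and the "mpi" parallel environment name are assumptions:

  # Scheduler configuration (edited via `qconf -msconf`):
  max_reservation    32        # pending jobs allowed to hold reservations
  default_duration   8:00:00   # assumed runtime for jobs lacking -l h_rt

  # Submit the 16-way job with a reservation (-R y) so that
  # single-node jobs can no longer starve it:
  qsub -R y -pe mpi 16 -l h_rt=12:00:00 big_mpi_job.sh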
Future Changes
• Default requirements for jobs:
  • Different cluster nodes have different resources; e.g. some have more memory or faster processors than others.
  • Sometimes a low-requirement job is allocated to one of these more capable machines unnecessarily, because the submitter has not specified the job's requirements.
  • This can prevent a job which does have high requirements from being run as quickly.
• Plan: change the SGE configuration so that a job will, by default, only request the resources of the least-capable node (see the sketch below).
• Places the onus on the user to request extra resources if needed.
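SGE reads cluster-wide default submission options from the file $SGE_ROOT/<cell>/common/sge_request. A sketch of such a default, with figures chosen to suit a hypothetical least-capable node (the values are illustrative, not LeSC's actual plan):

  # $SGE_ROOT/<cell>/common/sge_request -- defaults applied to every qsub
  -l h_vmem=1G     # memory of the least-capable node (illustrative)
  -l h_rt=1:00:00  # conservative default runtime (illustrative)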
Future Changes: LCG
• We are participating in the Large Hadron Collider Computing Grid (LCG) as part of the London Tier-2.
• This has been non-trivial: the standard LCG distribution only supports PBS-based clusters.
• We have developed SGE-specific Globus JobManager and Information Reporter components for use with LCG.
• We have also been working with the LCG developers to address issues with running on 64-bit Linux distributions.
• We are currently deploying front-end nodes (CE, SE, etc.) to expose Mars as an LCG compute site.
• We are also joining the LCG Certification Testbed to provide an SGE-based test site, helping to ensure future support.
References
• London e-Science Centre homepage:
  http://www.lesc.ic.ac.uk/
• SGE integration tools for Globus Toolkit 2, 3, 4 and LCG:
  http://www.lesc.ic.ac.uk/projects/sgeindex.html