Cluster Scheduler

Cluster Scheduler Ytao2013.5.15 Reference: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center NSDI’2011 Multi-agent Cluster Scheduling for Scalability and Flexibility. Berkerlytechdoc EECS-2012-273. (doctoral dissertation) Omega: flexible scalable schedulers for large compute clusters EuroSys’2013

Cluster Scheduler(Intro) • Cloud computing framework varies. • New frameworks will likely continue to merge, and no single framework will be optimal for all application. • Cluster with multiple frameworks improves ultilization and data sharing.

Problem Statement: • In the face of increasing demand for cluster resources by diverse cluster computing applications and the growing number of machines in typical clusters, it is a challenge to design cluster schedulers that provide flexible, scalable, and effcient resource allocations

Cluster SchedulerMonolithic State Scheduling(MSS) • Traditional & popular ones: (LSF , condor Hadoop) • Concept: a single scheduling agent process that makes all scheduling decisions sequentially • Usage: The agent takes input about framework requirements, resource availability, and organizational policies, and computes a global schedule for all tasks

Cluster SchedulerMonolithic State Scheduling(MSS 2) • Advantage: optimal scheduling. Global! • Challenge: • Complexity: capture all framework requirements • Scalebility : New frameworks emerge • Lose framework’s own scheduling optimization

Cluster Scheduler (update) • Scalability. (response time, number of machines) • Flexibility (heterogeneous mix of job) • Usability and Maintainability(easily adapt new types of jobs, frameworks) • Fault isolation(Minimize dependencies between unrelated jobs) • Utilization(Achieve high cluster resource utilization. e.g., cpu utilization, memory utilization)

Partitioned State Scheduling(PSS) • PSS: in PSS, cluster state is divided between multiple scheduling agents as non-overlapping scheduling domains • Statically Partitioned State Scheduling (SPS): statically set cluster resources for particular frameworks. • Dynamically Partitioned State Scheduling(DPS) : Mesos NSDI 2011

Replicated State Scheduling(RSS) • in RSS scheduling domains may overlap and optimistic consistency control is used to resolve conflicting transactions • Omega EuroSys 2013

Cluster Environment • Use of commodity servers • Tens to hundreds of thousands of servers • Heterogeneous resources • Use of commodity networks

Mix workloads • Service Jobs vs. Terminating Jobs • Service Jobs consist of a set of service tasks that conceptually are intended to run forever, and these tasks are interacted with by means of request-response interfaces. , e.g., a set of web servers or relational database servers. • Terminating Jobs, on the other hand, are given a set of inputs, perform some work as a function of those inputs, and are intended terminate eventually (traditional HPC cluster management only considers this)

Mesos • Goal: • Support and demonstrate multi-agent scheduling • Support fair-sharing meta-scheduling policy • Increase overall cluster utilization • Scale to tens of thousands of machines and hundreds of jobs • Mesos aims to provide a scalable and resilient core for enabling various frameworks to efficiently share clusters. • To define a minimal interface that enables efficient resource sharing across frameworks, • push control of task scheduling and execution to the frameworks.

Mesos (resource allocation) • Resource offer strategy: spare available resource • Fairness • Priority • Framework rejects offer if resource allocation is not satisfied. Wait for a good offer.

Mesos(sum) • take advantage of short tasks to increase cluster utilization • Two-level :Resource offer and scheduler makes it scalable. • Good at batch jobs. How about service jobs?

RSS Omega • One of the major drawbacks of Partitioned State Scheduling is that scheduling domains must be selected before the scheduling agent performs its task-resource assignments, thereby potentially restricting the “goodness of fit” that might be achieved by the scheduling agent in its task-resource assignments.

Role of job manager • 1.If job queue is not empty, remove next job from job queue • 2. Sync: Begin a transaction by synchronizing private cluster state with common cluster • 3. Schedule: Engage scheduling agent to attempt to create task-resource assignments for all tasks in job, modifying private cluster state in the process. • 4. Submit: Attempt to commit job transaction (i.e., all task-resource assignments for the job) from private cluster state back to common cluster state. Job transaction can succeed or fail. • 5. Record which task-resource assignments were successfully committed to common cluster state. • 6. If any tasks in job remain unscheduled---either because no suitable resources were found for the task during the “schedule" stage or the task-resource assignment, experienced a---insert job back into job queue to be handled again in a future transaction

Role of meta scheduling agent • Attempt to execute transactions submitted by job managers according to transaction mode settings • Detect conflicts according to conflict detection semantics and policies • Enforce meta-scheduling policies

For each submitted job • 1. reject task-resource assignments that would violate policies • 2. reject task-resource assignments that conflict with previously accepted transactions • 3. reject job all task-resource assignments in a job transaction if using all-or-nothing transaction semantics and at least one task-resource assignment was rejected

Conflict mode • Machine-granularity: • Reject task-resourse assign only if sequential number and machine are both conflicted. • Resource-granularity: • Reject task-resourse assign if cluster state enter an invalid state(cpu memory)

Transaction Granularity Semantics • all-or-nothing • if a single task-resource assignment conflicts thennone of the task-resource assignments in the transaction are applied to common cluster state • incremental • each task-resource assignment can fail independent of all others

Omega (sum) • trade of between whole cluster resource utilization and conflicts. • By having a view of whole cluster, dynastically assign task number based on global optimal is feasible • Meta-sheduler needs more modification

Thanks

Cluster Scheduler

Cluster Scheduler

Presentation Transcript

Deploy Scheduler

Cranberry Scheduler

Linux Scheduler

Scheduler

Scheduler Activations

Linux Scheduler

Meeting Scheduler

Video Scheduler

Scheduler Basics

Referee Scheduler

Sassy Scheduler

CPU Scheduler

ORAC Scheduler

Meeting Scheduler

JF Scheduler

Exploring Quartz Scheduler quartz-scheduler

Loading Scheduler

Scheduler 1.0

HOOP Scheduler

Scheduler Activations

STAR Scheduler

SCHEDULER ACTIVATIONS