
EECS 262a Advanced Topics in Computer Systems Lecture 11 Scheduling October 3 rd , 2019

Learn about the concept of lottery scheduling, a flexible and proportional-share resource management algorithm that ensures efficiency and fairness in computer systems. Explore its advantages over traditional scheduling methods and discover innovative solutions like ticket transfer, ticket inflation, and currencies.





Presentation Transcript


  1. EECS 262a Advanced Topics in Computer Systems, Lecture 11: Scheduling, October 3rd, 2019. John Kubiatowicz, Electrical Engineering and Computer Sciences, University of California, Berkeley. http://www.eecs.berkeley.edu/~kubitron/cs262

  2. Today’s Papers • Lottery Scheduling: Flexible Proportional-Share Resource Management. Carl A. Waldspurger and William E. Weihl. Appears in Proceedings of the First USENIX Symposium on Operating Systems Design and Implementation (OSDI), 1994 • Integrating Multimedia Applications in Hard Real-Time Systems. Luca Abeni and Giorgio Buttazzo. Appears in Proceedings of the Real-Time Systems Symposium (RTSS), 1998 • Thoughts?

  3. Scheduling Review • Scheduling: selecting a waiting process and allocating a resource (e.g., CPU time) to it • First Come First Serve (FCFS/FIFO) Scheduling: • Run threads to completion in order of submission • Pros: Simple (+) • Cons: Short jobs get stuck behind long ones (-) • Round-Robin Scheduling: • Give each thread a small amount of CPU time (quantum) when it executes; cycle between all ready threads • Pros: Better for short jobs (+) • Cons: Poor when jobs are same length (-)

  4. Multi-Level Feedback Scheduling (long-running compute tasks demoted to low priority) • A scheduling method for exploiting past behavior • First used in the Compatible Time-Sharing System (CTSS) • Multiple queues, each with a different priority • Higher-priority queues often considered “foreground” tasks • Each queue has its own scheduling algorithm • e.g., foreground – Round Robin, background – First Come First Serve • Sometimes multiple RR priorities with quantum increasing exponentially (highest: 1ms, next: 2ms, next: 4ms, etc.) • Adjust each job’s priority as follows (details vary; see the sketch below) • Job starts in the highest-priority queue • If its timeout expires, drop one level • If its timeout doesn’t expire, push up one level (or to the top)
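A minimal sketch of the demote/promote rule above, assuming three levels with exponentially growing quanta; the level count, quantum values, and job fields are illustrative, not taken from any particular kernel:

```python
from collections import deque

QUANTA_MS = [1, 2, 4]                       # quantum doubles at each lower level
queues = [deque() for _ in QUANTA_MS]       # index 0 = highest priority

def on_arrival(job):
    job.level = 0
    queues[0].append(job)                   # new jobs start at the top

def on_quantum_expired(job):
    # Used its whole quantum: looks CPU-bound, demote one level.
    job.level = min(job.level + 1, len(queues) - 1)
    queues[job.level].append(job)

def on_blocked_early(job):
    # Yielded before the quantum expired: looks interactive, promote one level.
    job.level = max(job.level - 1, 0)
    queues[job.level].append(job)

def pick_next():
    # Serve the highest-priority non-empty queue, round-robin within it.
    for level, q in enumerate(queues):
        if q:
            return q.popleft(), QUANTA_MS[level]
    return None, None
```

The growing quanta mean that once a long-running job settles at low priority it is preempted less often, while newly arrived or interactive jobs keep getting the small, fast-turnaround slices.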

  5. Lottery Scheduling • Very general, proportional-share scheduling algorithm • Problems with traditional schedulers: • Priority systems are ad hoc at best: the highest priority always wins, with a risk of starvation • “Fair share” is implemented by adjusting priorities with a feedback loop to achieve fairness over the (very) long term (the highest priority still wins all the time, but now the Unix priorities are always changing) • Priority inversion: high-priority jobs can be blocked behind low-priority jobs • Schedulers are complex, difficult to control, and exhibit hard-to-understand behaviors

  6. Lottery Scheduling • Give each job some number of lottery tickets • On each time slice, randomly pick a winning ticket and give owner the resource • On average, resource fraction (CPU time) is proportional to number of tickets given to each job • Tickets can be used for a wide variety of different resources (uniform) and are machine independent (abstract)
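The core mechanism fits in a few lines; a minimal sketch, with the ticket counts and job names purely illustrative:

```python
import random

# job -> number of tickets held (illustrative values: short jobs get 10, the long job 1)
tickets = {"short_A": 10, "short_B": 10, "long": 1}

def hold_lottery(tickets):
    """Decide one time slice: draw a ticket uniformly at random and find its owner."""
    winner = random.randrange(sum(tickets.values()))
    for job, count in tickets.items():
        if winner < count:
            return job
        winner -= count

# Over many slices, each job's share of wins converges to its share of tickets.
wins = {job: 0 for job in tickets}
for _ in range(10_000):
    wins[hold_lottery(tickets)] += 1
print(wins)
```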

  7. How to Assign Tickets? • Priority determined by the number of tickets each process has: • Priority is the relative percentage of all of the tickets competing for this resource • To avoid starvation, every job gets at least one ticket (everyone makes progress)

  8. Lottery Scheduling Example • Assume short jobs get 10 tickets, long jobs get 1 ticket

  9. How Fair is Lottery Scheduling? • If a client has probability p = t/T of winning, then the expected number of wins over n drawings (from the binomial distribution) is np • Probabilistically fair • Variance of the binomial distribution: σ² = np(1 – p) • Accuracy improves with √n • The geometric distribution gives the number of tries until the first win • Advantage over strict priority scheduling: behaves gracefully as load changes • Adding or deleting a job affects all jobs proportionally, independent of how many tickets each job possesses • Big picture answer: mostly accurate, but short-term inaccuracies are possible • Stride Scheduling provides a follow-on solution
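As an illustrative sanity check of these formulas (the ticket counts are made up): a client holding t = 10 of T = 100 tickets, observed over n = 100 time slices:

```latex
p = \frac{t}{T} = 0.1, \qquad
E[\text{wins}] = np = 100 \times 0.1 = 10, \qquad
\sigma = \sqrt{np(1-p)} = \sqrt{100 \times 0.1 \times 0.9} = 3
```

So the client expects roughly 10 ± 3 wins out of 100 slices; because σ grows only as √n while the expectation grows as n, the relative deviation shrinks as more slices are observed.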

  10. Ticket Transfer • How to deal with dependencies • Basic idea: if you are blocked on someone else, give them your tickets • Example: client-server • Server has no tickets of its own • Clients give server all of their tickets during RPC • Server’s priority is the sum of the priorities of all of its active clients • Server can use lottery scheduling to give preferential service to high-priority clients • Very elegant solution to long-standing problem (not the first solution however)
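A sketch of the transfer during an RPC, assuming user-level client and server objects with a tickets field; the paper performs this bookkeeping inside the kernel's ticket machinery rather than in application code:

```python
def rpc_call(client, server, request):
    # The client is about to block on the server, so it loans the server all of its tickets.
    loaned = client.tickets
    server.tickets += loaned
    client.tickets = 0
    try:
        reply = server.handle(request)   # server now competes with the combined funding
    finally:
        # When the RPC returns, the loan is revoked and the client is runnable again.
        server.tickets -= loaned
        client.tickets = loaned
    return reply
```

Because the server's effective funding is the sum of its blocked clients' tickets, it automatically serves on behalf of its high-priority clients.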

  11. Ticket Inflation • Make up your own tickets (print your own money) • Only works among mutually trusting clients • Presumably works best if inflation is temporary • Allows clients to adjust their priority dynamically with zero communication

  12. Currencies • Set up an exchange rate with the base currency • Enables inflation just within a group • Also isolates from other groups • Simplifies mini-lotteries, such as for a mutex

  13. Compensation Tickets • What happens if a thread is I/O bound and regularly blocks before its quantum expires? • Without adjustment, that thread gets less than its share of the processor • Basic idea: • If you complete fraction f of the quantum, your tickets are inflated by 1/f until the next time you win • Example: • If B on average uses 1/5 of a quantum, its tickets will be inflated 5x, so it wins 5 times as often and gets its correct share overall
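A minimal sketch of the compensation rule; the function and its arguments are illustrative:

```python
def compensation_factor(cpu_used, quantum):
    """Ticket inflation for a thread that blocked after using only part of its quantum."""
    f = cpu_used / quantum
    return 1.0 / f if 0 < f < 1 else 1.0   # inflate by 1/f until the thread next wins

# Example from the slide: blocking after 1/5 of the quantum yields a 5x boost,
# so the thread wins 5x as often and still receives its entitled CPU share overall.
assert compensation_factor(2, 10) == 5.0
```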

  14. Linux Completely Fair Scheduler (CFS) • First appeared in 2.6.23, modified in 2.6.24 • “CFS doesn't track sleeping time and doesn't use heuristics to identify interactive tasks—it just makes sure every process gets a fair share of CPU within a set amount of time given the number of runnable processes on the CPU.” • Inspired by Networking “Fair Queueing” • Each process given their fair share of resources • Models an “ideal multitasking processor” in which N processes execute simultaneously as if they truly got 1/N of the processor • Tries to give each process an equal fraction of the processor • Priorities reflected by weights such that increasing a task’s priority by 1 always gives the same fractional increase in CPU time – regardless of current priority

  15. CFS (Continued) • Idea: track the amount of “virtual time” received by each process while it is executing • Take real execution time, scale by a weighting factor • Higher priority → real time divided by a greater weight • Actually – multiply by (sum of all weights / current weight) • Keeps virtual time advancing at the same rate across processes • Targeted latency: period of time after which all processes get to run at least a little • Each process runs with a quantum proportional to its share of the targeted latency • Never smaller than the “minimum granularity” • Use a red-black tree to hold all runnable processes, sorted on the vruntime variable • O(log n) time to perform insertions/deletions • Cache the item at the far left (the item with the earliest vruntime) • When ready to schedule, grab the process with the smallest vruntime (the item at the far left)
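A simplified model of the vruntime bookkeeping, assuming a task object with weight and vruntime fields; the real kernel keeps tasks in a red-black tree with a cached leftmost node, which is replaced here by a heap purely for brevity:

```python
import heapq

NICE_0_WEIGHT = 1024    # weight of a nice-0 task in Linux; other weights assumed given

def charge(task, delta_exec_ns):
    # Virtual time advances more slowly for heavily weighted (higher-priority) tasks.
    task.vruntime += delta_exec_ns * NICE_0_WEIGHT // task.weight

def pick_next(runqueue):
    # Entries are (vruntime, seq, task) tuples; popping the minimum mimics taking
    # the leftmost (smallest-vruntime) node of CFS's red-black tree.
    _, _, task = heapq.heappop(runqueue)
    return task
```

The seq field is just a tie-breaker so tasks with equal vruntime never need to be compared directly.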

  16. Is this a good paper? • What were the authors’ goals? • What about the evaluation/metrics? • Did they convince you that this was a good system/approach? • Were there any red-flags? • What mistakes did they make? • Does the system/approach meet the “Test of Time” challenge? • How would you review this paper today?

  17. Realtime Scheduling Motivation: Consolidation/Energy Efficiency • Consider the current approach with ECUs in cars: • Today: 50-100 individual “Engine Control Units” • Trend: Consolidate into a smaller number of processors • How to provide guarantees? • Better coordination for hard realtime/streaming media • Save energy rather than “throwing hardware at it”

  18. Recall: Non-Real-Time Scheduling • Primary Goal: maximize performance • Secondary Goal: ensure fairness • Typical metrics: • Minimize response time • Maximize throughput • E.g., FCFS (First-Come-First-Served), RR (Round-Robin)

  19. Characteristics of a RTS (slides adapted from Frank Drew) • Extreme reliability and safety • Embedded systems typically control the environment in which they operate • Failure to control can result in loss of life, damage to the environment, or economic loss • Guaranteed response times • We need to be able to predict with confidence the worst-case response times for systems • Efficiency is important but predictability is essential • In RTS, performance guarantees are: • Task- and/or class-centric • Often ensured a priori • In conventional systems, performance is: • System-oriented and often throughput-oriented • Post-processing (… wait and see …) • Soft Real-Time • Attempt to meet deadlines with high probability • Important for multimedia applications

  20. Typical Realtime Workload Characteristics • Tasks are preemptable and independent, with arbitrary arrival (= release) times • Tasks have deadlines (D) and known computation times (C) • Tasks execute on a uniprocessor system • Example Setup:

  21. Example: Non-preemptive FCFS Scheduling

  22. Example: Round-Robin Scheduling

  23. Real-Time Scheduling • Primary goal: ensure predictability • Secondary goal: ensure predictability • Typical metrics: • Guarantee miss ratio = 0 (hard real-time) • Guarantee Probability(missed deadline) < X% (firm real-time) • Minimize miss ratio / maximize completion ratio (firm real-time) • Minimize overall tardiness; maximize overall usefulness (soft real-time) • E.g., EDF (Earliest Deadline First), LLF (Least Laxity First), RMS (Rate-Monotonic Scheduling), DM (Deadline Monotonic Scheduling) • Real-time is about enforcing predictability; it does not mean fast computing!!!

  24. Task Assignment and Scheduling • Cyclic executive scheduling (covered later) • Cooperative scheduling • The scheduler relies on the current process to give up the CPU before it can start the execution of another process • A static priority-driven scheduler can preempt the current process to start a new process. Priorities are set pre-execution • E.g., Rate-Monotonic Scheduling (RMS), Deadline Monotonic scheduling (DM) • A dynamic priority-driven scheduler can assign, and possibly also redefine, process priorities at run-time • E.g., Earliest Deadline First (EDF), Least Laxity First (LLF)

  25. Simple Process Model • Fixed set of processes (tasks) • Processes are periodic, with known periods • Processes are independent of each other • System overheads (context switches, etc.) are ignored (zero cost) • Processes have a deadline equal to their period • i.e., each process must complete before its next release • Processes have a fixed worst-case execution time (WCET)

  26. Performance Metrics • Completion ratio / miss ratio • Maximize total usefulness value (weighted sum) • Maximize value of a task • Minimize lateness • Minimize error (imprecise tasks) • Feasibility (all tasks meet their deadlines)

  27. Scheduling Approaches (Hard RTS) • Off-line scheduling / analysis (static analysis + static scheduling) • All tasks, times, and priorities given a priori (before system startup) • Time-driven; schedule computed and hardcoded (before system startup) • E.g., Cyclic Executives • Inflexible • May be combined with static or dynamic scheduling approaches • Fixed-priority scheduling (static analysis + dynamic scheduling) • All tasks, times, and priorities given a priori (before system startup) • Priority-driven, dynamic(!) scheduling • The schedule is constructed by the OS scheduler at run time • For hard / safety-critical systems • E.g., RMA/RMS (Rate Monotonic Analysis / Rate Monotonic Scheduling) • Dynamic priority scheduling • Task times may or may not be known • Assigns priorities based on the current state of the system • For hard / best-effort systems • E.g., Least Completion Time (LCT), Earliest Deadline First (EDF), Least Slack Time (LST)

  28. Schedulability Test • Test to determine whether a feasible schedule exists • Sufficient Test • If test is passed, then tasks are definitely schedulable • If test is not passed, tasks may be schedulable, but not necessarily • Necessary Test • If test is passed, tasks may be schedulable, but not necessarily • If test is not passed, tasks are definitely not schedulable • Exact Test (= Necessary + Sufficient) • The task set is schedulable if and only if it passes the test.

  29. Rate Monotonic Analysis: Assumptions A1: Tasks are periodic (activated at a constant rate). Period = interval between two consecutive activations of a task A2: All instances of a periodic task have the same computation time A3: All instances of a periodic task have the same relative deadline, which is equal to the period A4: All tasks are independent (i.e., no precedence constraints and no resource constraints) Implicit assumptions: A5: Tasks are preemptable A6: No task can suspend itself A7: All tasks are released as soon as they arrive A8: All overhead in the kernel is assumed to be zero (or folded into the task computation times)

  30. Rate Monotonic Scheduling: Principle • Principle: Each process is assigned a (unique) priority based on its period (rate); always execute the active job with the highest priority • The shorter the period, the higher the priority (task 1 = lowest priority) • W.l.o.g., number the tasks in reverse order of priority

  31. Example: Rate Monotonic Scheduling • Example instance • RMA Gantt chart

  32. Example: Rate Monotonic Scheduling (deadline miss; timeline figure over 0–15 showing the response time of a job)

  33. Utilization (timeline figure over 0–15)

  34. RMS: Schedulability Test • Theorem (Utilization-Based Schedulability Test): A periodic task set with total utilization U = \sum_{i=1}^{n} C_i / T_i is schedulable by the rate-monotonic scheduling algorithm if U \le n(2^{1/n} - 1) • This schedulability test is “sufficient”! • For harmonic periods (each shorter period evenly divides each longer one), the utilization bound is 100%
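A small checker for this bound (the example task set is made up):

```python
def rms_utilization_test(tasks):
    """tasks: list of (C, T) pairs -- worst-case execution time and period."""
    n = len(tasks)
    u = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)     # Liu & Layland bound; tends to ln 2 ~ 0.693
    return u, bound, u <= bound        # sufficient only: u > bound is inconclusive

# Three tasks with U = 0.75 <= 0.780 (the bound for n = 3), so this set passes.
print(rms_utilization_test([(1, 4), (2, 8), (1, 4)]))
```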

  35. RMS Schedulability Example • For our failed example earlier: • The schedulability test requires U \le n(2^{1/n} - 1) • Hence, the task set's utilization does not satisfy the schedulability condition!

  36. EDF: Assumptions A1: Tasks are periodic or aperiodic. Period = interval between two consecutive activations of a task A2: All instances of a periodic task have the same computation time A3: All instances of a periodic task have the same relative deadline, which is equal to the period A4: All tasks are independent (i.e., no precedence constraints and no resource constraints) Implicit assumptions: A5: Tasks are preemptable A6: No task can suspend itself A7: All tasks are released as soon as they arrive A8: All overhead in the kernel is assumed to be zero (or folded into the task computation times)

  37. EDF Scheduling: Principle • Preemptive priority-based dynamic scheduling • Each task is assigned a (current) priority based on how close its absolute deadline is • The scheduler always schedules the active task with the closest absolute deadline (timeline figure over 0–15)
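The scheduling decision itself is tiny; a sketch assuming each job object carries its current absolute deadline and each periodic task its period:

```python
def edf_pick(ready_jobs):
    # Dynamic priorities: re-evaluated at every scheduling point.
    return min(ready_jobs, key=lambda job: job.absolute_deadline) if ready_jobs else None

def on_release(job, release_time):
    # Under the model above (relative deadline == period), a freshly released
    # instance's absolute deadline is its release time plus the task period.
    job.absolute_deadline = release_time + job.period
```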

  38. EDF: Schedulability Test • Theorem (Utilization-Based Schedulability Test): A task set with total utilization U = \sum_{i=1}^{n} C_i / T_i is schedulable by the earliest deadline first (EDF) scheduling algorithm if U \le 1 • Exact schedulability test (necessary + sufficient) • Proof: [Liu and Layland, 1973]

  39. EDF Optimality EDF Properties • EDF is optimal with respect to feasibility (i.e., schedulability) • EDF is optimal with respect to minimizing the maximum lateness

  40. EDF Example: Domino Effect EDF minimizes lateness of the “most tardy task” [Dertouzos, 1974]

  41. Constant Bandwidth Server • Intuition: give a fixed share of the CPU to a certain class of jobs • Good for tasks with probabilistic resource requirements • Basic approach: slots (called “servers”) are scheduled with EDF, rather than the jobs themselves • A CBS server is defined by two parameters: Qs and Ts • Mechanism for tracking processor usage so that no more than Qs CPU seconds are used every Ts seconds (or whatever measure you like) when there is demand; otherwise the processor can be used freely • Since it uses EDF, it can mix hard-realtime and soft-realtime tasks:
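A simplified sketch of the budget bookkeeping; the parameter names Qs and Ts follow the slide, but the full CBS rule from the paper (e.g., deadline recomputation when a job arrives at an idle server) is omitted:

```python
class CBSServer:
    """Track usage so the served tasks get at most Qs seconds of CPU every Ts seconds."""

    def __init__(self, Qs, Ts):
        self.Qs, self.Ts = Qs, Ts
        self.budget = Qs
        self.deadline = Ts      # the server itself is scheduled by EDF on this deadline

    def charge(self, exec_time):
        self.budget -= exec_time
        if self.budget <= 0:
            # Budget exhausted: recharge and push the deadline one period into the
            # future, so the server's demand on the CPU never exceeds Qs per Ts.
            self.budget += self.Qs
            self.deadline += self.Ts
```

Because the servers are ordinary EDF entities, hard-realtime tasks scheduled directly by EDF and soft-realtime tasks wrapped in CBS servers can share the same processor.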

  42. Comparison of CBS with EDF in Overload • If scheduled items do not meet EDF schedulability: • EDF yields unpredictable results when overload starts and stops • CBS servers provide better isolation • Isolation is particularly important when deadlines are crucial • i.e., hard realtime tasks (figures: overload with EDF vs. overload with CBS)

  43. Is this a good paper? • What were the authors’ goals? • What about the evaluation/metrics? • Did they convince you that this was a good system/approach? • Were there any red-flags? • What mistakes did they make? • Does the system/approach meet the “Test of Time” challenge? • How would you review this paper today?
