W4118 Operating Systems

W4118 Operating Systems Instructor: Junfeng Yang

Logistics • CLIC machines are ready

Last lecture: concurrency errors • Why synchronization is hard • Concurrency error patterns • Deadlock • Race • Data race • Atomicity • Order • Concurrency error detection • Deadlock detection • Lock order • Race detection • Happens-before • Eraser

Today • Eraser (cont.) • Intro to scheduling

Recall: Lockset algorithm v1: infer the locks • Intuition: it must be one of the locks held at the time of access • C(v): a set of candidate locks for protecting v • Initialize C(v) to the set of all locks • On access to v by thread t, refine C(v) • C(v) = C(v) ^ locks_held(t) • If C(v) = {}, report error • Sounds good! But …

Recall: Problems with lockset v1 • Initialization • When shared data is first created and initialized • Read-shared data • Shared data is only read (once initialized) • Read/write lock • We’ve seen it last week • Locks can be held in either write mode or read mode

Recall: State transitions • Each shared data value (memory location) is in one of the four states Virgin write, first thread Exclusive write, new thread Shared/ Modified Refine C(v) and check Read, new thread Shared Refine C(v), no check write

Read-write locks • Read-write locks allow a single writer and multiple readers • Locks can be held in read mode and write mode • read_lock(m); read v; read_unlock(m) • write_lock(m); write v; write_unlock(m) • Locking discipline • Lock can be held in some mode (read or write) for read access • Lock must be held in write mode for write access • A write access with lock held in read mode  error

Handling read-write locks • Idea: distinguish read and write access when refining lockset • On each read of v by thread t (same as before) • C(v) = C(v) ^ locks_held(t) • If C(v) = {}, report error • On each write of v by thread t • C(v) = C(v) ^ write_locks_held(t) • If C(v) = {}, report error

Results • Eraser works • Find bugs in mature software • Though many limitations • Major: benign races (intended races) • However, slow • 10-30X slow down • Instrumentation each memory access is costly • Can be made faster • With static analysis • Smarter instrumentation • Lockset algorithm is influential, used by many tools • E.g. Helgrind (a race detetection tool in Valgrind)

Benign race example • Double-checking locking • Faster if v is often 0 • Doesn’t work with compiler/hardware reordering if(v) { // benign race lock(m); if(v) …; unlock(m); }

Today • Eraser (cont.) • Intro to scheduling • Basic concepts • Different scheduling algorithms • Advantages and disadvantages

Direction within Course • Until now: Interrupts, Processes, Threads • Mostly mechanisms • From now on: Resources • Resources: things processes operate upon • E.g., CPU time, memory, disk space • Policy • We’ll look at how to manage CPU time with scheduling today

Types of Resource • Preemptible • OS can take resource away, use it for something else, and give it back later • E.g., CPU • Non-preemptible • Once given resource, it can’t be reused until voluntarily relinquished • E.g., disk space • Given set of resources and set of requests for the resources, types of resource determines how OS manages it

Decisions about Resources • Allocation: Which process gets which resources • Which resources should each process receive? • Space sharing: Control access to resource • Implication: Resources are not easily preemptible • E.g., Disk space • Scheduling: How long process keeps resource • In which order should requests be serviced? • Time sharing: More resources requested than can be granted • Implication: Resource is preemptible • E.g., Processor scheduling

Alternating Sequence of CPU And I/O Bursts

Role of Dispatcher vs. Scheduler • Dispatcher • Low-level mechanism • Responsibility: context switch • Save previous process state in PCB • Load next process state from PCB to registers • Change scheduling state of process (running, ready, or blocked) • Migrate processes between different scheduling queues • Switch from kernel to user mode • Scheduler • High-level policy • Responsibility: Deciding which process to run • Could have an Allocator for CPU as well • Parallel and distributed systems

Preemptive vs. Nonpreemptive Scheduling • When does scheduler need to make a decision? When a process • Switches from running to waiting state • Switches from running to ready state • Switches from waiting to ready • Terminates • Minimal: Nonpreemptive • When? • Additional circumstances: Preemptive • When?

Scheduling Performance Metrics • Min waiting time: don’t have process wait long in ready queue • Max CPU utilization: keep CPU busy • Max throughput: complete as many processes as possible per unit time • Min turnaround time: complete as fast as possible • Min response time: respond immediately • Fairness: give each process (or user) same percentage of CPU

Scheduling Algorithms • Next, we’ll look at different scheduling algorithms and their advantages and disadvantages

First-Come, First-Served (FCFS) • Simplest CPU scheduling algorithm • First job that requests the CPU gets the CPU • Nonpreemptive • Advantage: simple implementation with FIFO queue • Disadvantage: waiting time depends on arrival order • short jobs that arrive later have to wait long

P1 P2 P3 0 24 27 30 Example of FCFS ProcessBurst Time P1 24 P2 3 P3 3 • Suppose that the processes arrive in the order: P1 , P2 , P3 • The Gantt Chart for the schedule is: • Waiting time for P1 = 0; P2 = 24; P3 = 27 • Average waiting time: (0 + 24 + 27)/3 = 17

P2 P3 P1 0 3 6 30 Example of FCFS (Cont.) Suppose that the processes arrive in the order P2 , P3 , P1 • The Gantt chart for the schedule is: • Waiting time for P1 = 6;P2 = 0; P3 = 3 • Average waiting time: (6 + 0 + 3)/3 = 3 • Much better than previous case • Convoy effect: short process stuck waiting for long process • Also called head of the line blocking

Shortest Job First (SJF) • Schedule the process with the shortest time • FCFS if same time • Advantage: minimize average wait time • provably optimal • Moving shorter job before longer job decreases waiting time of short job more than increases waiting time of long job • Reduces average wait time • Disadvantage • Not practical: difficult to predict burst time • Possible: past predicts future • May starve long jobs

Shortest Remaining Time First (SRTF) • SRTF == SJF with preemption • New process arrives w/ shorter CPU burst than the remaining for current process • SJF without preemption: let current process run • SRTF: preempt current process • Reduces average wait time

P1 P3 P2 P4 0 3 7 8 12 16 Example of Nonpreemptive SJF ProcessArrival TimeBurst Time P1 0.0 7 P2 2.0 4 P3 4.0 1 P4 5.0 4 • SJF • Average waiting time = (0 + 6 + 3 + 7)/4 = 4

P1 P2 P3 P2 P4 P1 11 16 0 2 4 5 7 Example of Preemptive SJF (SRTF) ProcessArrival TimeBurst Time P1 0.0 7 P2 2.0 4 P3 4.0 1 P4 5.0 4 • SJF (preemptive) • Average waiting time = (9 + 1 + 0 +2)/4 = 3

Round-Robin (RR) • Practical approach to support time-sharing • Run process for a time slice, then move to back of FIFO queue • Preempted if still running at end of time-slice • Advantages • Fair allocation of CPU across processes • Low average waiting time when job lengths vary widely

P1 P2 P3 P1 P2 P3 P1 P2 P3 P1 0 20 40 60 80 100 120 140 160 180 200 Example of RR with Time Slice = 20 Process Arrival TimeBurst Time P1 0 400 P2 20 60 P3 20 60 • The Gantt chart is: • Average waiting time: • Compare to FCFS and SJF:

Disadvantage of Round-Robin • Poor average waiting time when jobs have similar lengths • Imagine N jobs each requiring T time slices • RR: all complete roughly around time N*T • Average waiting time is even worse than FCFS! • Performance depends on length of time slice • Too high  degenerate to FCFS • Too low  too many context switch, costly • How to set time-slice length?

Time slice and Context Switch Time

Priorities • A priority is associated with each process • Run highest priority ready job (some may be blocked) • Round-robin among processes of equal priority • Can be preemptive or nonpreemptive • Representing priorities • Typically an integer • The larger the higher or the lower? • Solaris: 0-59, 59 is the highest • Linux: 0-139, 0 is the highest • Question: what data structure to use for priority scheduling? One queue for all processes?

Setting Priorities • Priority can be statically assigned • Some processes always have higher priority than others • Problem: starvation • Priority can be dynamically changed by OS • Aging: increase the priority of processes that wait in the ready queue for a long time

Priority Inversion • High priority process depends on low priority process (e.g. to release a lock) • What if another process with in-between priority arrives? • Solution: priority inheritance • inherit priority of highest process waiting for a resource • Must be able to chain multiple inheritances • Must ensure that priority reverts to original value

Multilevel Queue • Ready queue is partitioned into separate queues:foreground (interactive)background (batch) • Each queue has its own scheduling algorithm • foreground – RR • background – FCFS • Scheduling must be done between the queues • Fixed priority scheduling; (i.e., serve all from foreground then from background). Possibility of starvation. • Time slice – each queue gets a certain amount of CPU time which it can schedule amongst its processes; i.e., 80% to foreground in RR • 20% to background in FCFS

Multilevel Queue Scheduling

Multilevel Feedback Queue • A process can move between the various queues; aging can be implemented this way • Multilevel-feedback-queue scheduler defined by the following parameters: • number of queues • scheduling algorithms for each queue • method used to determine when to upgrade a process • method used to determine when to demote a process • method used to determine which queue a process will enter when that process needs service

Example of Multilevel Feedback Queue • Three queues: • Q0– RR with time quantum 8 milliseconds • Q1– RR time quantum 16 milliseconds • Q2– FCFS • Scheduling • A new job enters queue Q0which is servedFCFS. When it gains CPU, job receives 8 milliseconds. If it does not finish in 8 milliseconds, job is moved to queue Q1. • At Q1 job is again served FCFS and receives 16 additional milliseconds. If it still does not complete, it is preempted and moved to queue Q2.

Multilevel Feedback Queues

Implementation How to monitor variable access? Binary instrumentation How to represent state? For each memory word, keep a shadow word First two bits: what state the word is in How to represent lockset? The remaining 30 bits: lockset index A table maps lockset index to a set of locks Assumption: not many distinct locksets

P1 P2 P3 P4 P1 P3 P4 P1 P3 P3 0 20 37 57 77 97 117 121 134 154 162 Example of RR with Time Slice = 20 ProcessBurst Time P1 53 P2 17 P3 68 P4 24 • The Gantt chart is: • Typically, higher average turnaround than SJF, but better response

W4118 Operating Systems