Process Scheduling (Chapter 5)

Introduction
• Policy and implementation
• Objectives:
  • Fast response time
  • High throughput (turnaround time)
  • Avoidance of process starvation
• Context switching is expensive
  • The context is a snapshot of the values of the general-purpose, memory-management, and other special registers.

Types of Scheduling
• Long-term
  • Performed when a new process is created.
  • The decision to add to the pool of processes to be executed.
• Medium-term
  • Swapping
  • The decision to add to the number of processes that are partially or fully in main memory.

Types of Scheduling (cont’d)
• Short-term
  • Which ready process to execute next
  • The decision as to which available process will be executed by the processor.
  • FCFS, round-robin, shortest process next, shortest remaining time
• I/O
  • The decision as to which process’s pending I/O request shall be handled by an available I/O device.

Scheduling and Process State Transition
[Figure: process state diagram. States: New, Ready, Ready/suspend, Running, Blocked, Blocked/suspend, Exit. Transitions are labeled with long-term, medium-term, and short-term scheduling.]

Processor Queuing Diagram for Scheduling
[Figure: queuing diagram. Inputs: batch jobs (via long-term scheduling) and interactive users. Queues: Ready, Ready/suspend, Blocked, Blocked/suspend. Labels: short-term scheduling, medium-term scheduling, time-out, event wait, event occurs, release.]

5.2 Clock Interrupt Handling
• The clock interrupt is second in priority only to the power-failure interrupt.
• Tasks of the clock interrupt handler:
  • Rearms the hardware clock if necessary
  • Updates CPU usage statistics for the current process
  • Performs scheduler-related functions
  • Sends a SIGXCPU signal to the current process if it has exceeded its CPU usage quota
  • Updates the time-of-day clock and other related clocks
  • Handles callouts
  • Wakes up system processes
  • Handles alarms

5.2.1 Callouts
• A callout records a function that the kernel must invoke at a later time.
  • int to_ID = timeout(void (*fn)(), caddr_t arg, long delta);
  • void untimeout(int to_ID);
• Typical uses:
  • Retransmission of network packets
  • Certain scheduler and memory management functions
  • Monitoring devices to avoid losing interrupts
  • Polling devices that do not support interrupts

Callout in BSD UNIX
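A common way to keep pending callouts cheap to process on every tick is a list sorted by firing time, where each entry stores only the delay relative to the entry before it. The sketch below illustrates that idea. The struct, its field names, and the functions callout_insert(), callout_tick(), and count_call() are hypothetical and are not the kernel's own definitions.

```c
#include <stddef.h>

/* Hypothetical callout entry; names are illustrative only. */
struct callout {
    long delta;              /* ticks to wait after the previous entry fires */
    void (*fn)(void *);      /* function to invoke */
    void *arg;               /* argument passed to fn */
    struct callout *next;
};

/* Example handler: counts how many times it was invoked. */
void count_call(void *arg)
{
    ++*(int *)arg;
}

/* Insert c so that it fires `ticks` clock ticks from now. */
void callout_insert(struct callout **head, struct callout *c, long ticks)
{
    struct callout **pp = head;

    /* Skip entries that fire no later than c, consuming their deltas. */
    while (*pp != NULL && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;
        pp = &(*pp)->next;
    }
    c->delta = ticks;
    c->next = *pp;
    if (*pp != NULL)
        (*pp)->delta -= ticks;   /* successor is now relative to c */
    *pp = c;
}

/* Called once per clock tick. Only the head is decremented; every
 * entry whose delta has reached zero is removed and invoked.
 * Returns the number of callouts fired on this tick. */
int callout_tick(struct callout **head)
{
    int fired = 0;

    if (*head != NULL)
        (*head)->delta--;
    while (*head != NULL && (*head)->delta <= 0) {
        struct callout *c = *head;
        *head = c->next;
        c->fn(c->arg);
        fired++;
    }
    return fired;
}
```

The point of the relative encoding is that the per-tick work touches only the head of the list; the cost of keeping the list sorted is paid at insertion time.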

5.2.2 Alarms
• Real-time:
  • Relates to the actual elapsed time and notifies the process via a SIGALRM signal.
• Profiling:
  • Measures the amount of time the process has been executing and uses the SIGPROF signal for notification.
• Virtual-time:
  • Monitors only the time spent by the process in user mode and sends the SIGVTALRM signal.
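The three alarm types differ only in which clock ticks they count. The following sketch makes that explicit by advancing three counters according to what the process is doing on each tick. The enum, the struct, and alarm_tick() are hypothetical names chosen for illustration; a real kernel would decrement per-process timers and post the corresponding signal on expiry.

```c
/* What the process is doing when a clock tick arrives (illustrative). */
enum proc_mode { MODE_USER, MODE_KERNEL, MODE_NOT_RUNNING };

/* Ticks accumulated on each of the three alarm time bases. */
struct alarm_clocks {
    long real_ticks;   /* real-time alarm: all elapsed time (SIGALRM) */
    long prof_ticks;   /* profiling alarm: time the process is executing (SIGPROF) */
    long virt_ticks;   /* virtual-time alarm: user-mode time only (SIGVTALRM) */
};

/* Account one clock tick against the appropriate time bases. */
void alarm_tick(struct alarm_clocks *a, enum proc_mode mode)
{
    a->real_ticks++;                 /* wall-clock time always advances */
    if (mode != MODE_NOT_RUNNING)
        a->prof_ticks++;             /* executing, in user or kernel mode */
    if (mode == MODE_USER)
        a->virt_ticks++;             /* executing in user mode */
}
```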

5.3 Scheduler Goals
• The scheduler must ensure that the system delivers acceptable performance to each application.
• Different kinds of applications:
  • Interactive: response delays of 50-150 ms are acceptable
  • Batch: e.g. scientific computation
  • Real-time: time-critical

5.4 Traditional UNIX Scheduling
• Goal: improve response times of interactive users while ensuring that low-priority background jobs do not starve.
• Priority-based:
  • User processes are preemptible
  • The kernel is strictly nonpreemptible

Priority
• Kernel: 0-49, user: 50-127 (a lower value means a higher priority)
• proc fields:
  • p_pri: current scheduling priority
  • p_usrpri: user mode priority
  • p_cpu: measure of recent CPU usage
  • p_nice: user-controllable nice factor
• Kernel:
  • Sleeping priority

User mode priority
• Depends on two factors:
  • Nice: 0-39
  • CPU usage
• Time-sharing: equal opportunity
• Decay factor: 1/2 for SVR3; for 4.3BSD:
  • decay = (2*load_average)/(2*load_average+1)
  • p_cpu = p_cpu * decay
• p_usrpri = PUSER + (p_cpu/4) + (2*p_nice)
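The formulas above translate directly into integer arithmetic. The sketch below is a minimal illustration, not kernel source: the function names are hypothetical, PUSER = 50 is taken from the example on the next slide, and clamping the result to 127 is an assumption based on the stated user priority range of 50-127.

```c
#define PUSER   50    /* base user priority (value used in the slide example) */
#define PRI_MAX 127   /* numerically largest, i.e. worst, user priority */

/* SVR3: p_cpu is halved on each recomputation. */
int decay_svr3(int p_cpu)
{
    return p_cpu / 2;
}

/* 4.3BSD: decay = (2*load_average) / (2*load_average + 1).
 * Multiply before dividing so integer truncation happens only once. */
int decay_bsd(int p_cpu, int load_average)
{
    return (2 * load_average * p_cpu) / (2 * load_average + 1);
}

/* p_usrpri = PUSER + (p_cpu/4) + (2*p_nice), clamped to the user range
 * (the clamp is an assumption, see the lead-in). */
int user_priority(int p_cpu, int p_nice)
{
    int pri = PUSER + p_cpu / 4 + 2 * p_nice;
    return pri > PRI_MAX ? PRI_MAX : pri;
}
```

With p_cpu = 100 and nice = 20 this gives 50 + 25 + 40 = 115. After one SVR3 decay, p_cpu is 50 and the integer division 50/4 = 12 gives 102.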

Example: PUSER = 50, decay = 1/2

Process  Nice  Time  p_cpu  p_usrpri
P1       20    T1    100    115
P1       20    T2     50    102
P1       20    T3     80    110
P2       25    T1     80    120
P2       25    T2     40    110
P2       25    T3     60    115

Scheduler Implementation
• 32 run queues: each is a doubly linked list of proc structures for runnable processes.
• whichqs: bitmask with one bit per queue; a “1” means that the queue holds a runnable process.
• swtch(): context switch via p_addr
  • Saves part of the u area (pcb)
  • Loads the saved context
• VAX ffs & ffc (find first set / find first clear bit): special instructions that speed up the scan of whichqs during a context switch
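To make the role of whichqs concrete, here is a minimal sketch of how a bitmask can locate the best non-empty run queue. It assumes that the 128 priorities map onto the 32 queues four at a time (queue = priority / 4) and that a lower queue index is better. The helper names are hypothetical, and the portable loop stands in for a hardware find-first-set instruction.

```c
#include <stdint.h>

#define NQS 32                 /* number of run queues */

uint32_t whichqs;              /* bit q set => run queue q is non-empty */

/* Assumed mapping: four adjacent priorities share one queue. */
int queue_of(int pri)
{
    return pri >> 2;
}

/* Record that the queue for this priority holds a runnable process. */
void mark_runnable(int pri)
{
    whichqs |= (uint32_t)1 << queue_of(pri);
}

/* Record that the queue for this priority has become empty. */
void mark_empty(int pri)
{
    whichqs &= ~((uint32_t)1 << queue_of(pri));
}

/* Return the lowest-numbered non-empty queue, or -1 if none.
 * A find-first-set instruction does this scan in one step. */
int first_nonempty_queue(void)
{
    for (int q = 0; q < NQS; q++)
        if (whichqs & ((uint32_t)1 << q))
            return q;
    return -1;
}
```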

Run Queue Manipulation
• roundrobin(): schedules among processes with the same priority.
• schedcpu(): recomputes priorities once per second. For each process it:
  • Removes the process from the run queue
  • Recomputes the priority
  • Puts the process back

When to switch context
• The current process blocks on a resource or exits.
• The priority recomputation procedure results in the priority of another process becoming greater than that of the current one (the runrun flag is set).
• The current process, or an interrupt handler, wakes up a higher-priority process.

Analysis
• Does not scale well
• No way to guarantee a specific process a share of the CPU
• No guarantees for real-time applications
• Applications have little control over their priorities
• Because the kernel is nonpreemptive, high-priority runnable processes may have to wait for the kernel to relinquish the CPU

5.5 The SVR4 Scheduler
• Support a diverse range of applications, including those requiring real-time response.
• Separate the scheduling policy from the mechanisms that implement it.
• Provide applications with greater control over their priority and scheduling.
• Define a scheduling framework with a well-defined interface to the kernel.
• Allow new scheduling policies to be added in a modular manner, including dynamic loading of scheduler implementations.
• Limit the dispatch latency for time-critical applications.

The Class-Independent Layer
• Responsible for context switching, run queue management, and preemption.

Preemption points
• Places in the kernel code where the kernel data is in a stable state and the kernel is about to begin a long computation:
  • In the pathname parsing routine lookuppn()
  • In the open system call, before file creation
  • In the memory subsystem, before freeing the pages of a process
• At each preemption point the kernel calls PREEMPT(), which checks the kprunrun flag.
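A preemption point amounts to a cheap flag test placed just before a long stretch of kernel work. The sketch below shows the shape of such a check under stated assumptions: the body of PREEMPT(), the preempt() routine, the switches counter, and release_pages() are all hypothetical stand-ins, and a real preempt() would perform a context switch rather than count.

```c
int kprunrun;      /* set when a higher-priority process is waiting */
int switches;      /* hypothetical counter standing in for a context switch */

/* Stand-in for giving up the CPU: clears the flag and records the switch. */
void preempt(void)
{
    kprunrun = 0;
    switches++;
}

/* Preemption point: yield only if a preemption has been requested. */
#define PREEMPT() do { if (kprunrun) preempt(); } while (0)

/* Illustrative long-running kernel routine with a preemption point
 * placed before the lengthy loop, while data structures are stable. */
int release_pages(int npages)
{
    int freed = 0;

    PREEMPT();
    for (int i = 0; i < npages; i++)
        freed++;               /* placeholder for freeing one page */
    return freed;
}
```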

Interface to the Scheduling Classes
• 3 fields of proc:
  • p_cid: class ID, an index into the global class table
  • p_clfuncs: pointer to the classfuncs vector for the class
  • p_clproc: pointer to a class-dependent private data structure
• #define CL_SLEEP(procp, clprocp, …) (*(procp)->p_clfuncs->cl_sleep)(clprocp, …)
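The CL_SLEEP macro is an indirect call through a per-class table of function pointers. The following sketch shows that dispatch in isolation. The struct layouts are reduced to the three proc fields named above, the macro takes a single class-private argument instead of the elided parameter list, and ts_private and ts_sleep_demo() are hypothetical.

```c
/* Per-class operations vector, reduced to one entry for illustration. */
struct classfuncs {
    void (*cl_sleep)(void *clprocp);
};

/* Only the three class-related proc fields named on the slide. */
struct proc {
    int p_cid;                      /* index into the global class table */
    struct classfuncs *p_clfuncs;   /* operations vector of the class */
    void *p_clproc;                 /* class-dependent private data */
};

/* Class-independent code calls the class routine through the vector. */
#define CL_SLEEP(procp, clprocp) \
    (*(procp)->p_clfuncs->cl_sleep)(clprocp)

/* Hypothetical private data and sleep routine for one class. */
struct ts_private {
    int sleeping;
};

void ts_sleep_demo(void *clprocp)
{
    ((struct ts_private *)clprocp)->sleeping = 1;
}
```

Because callers only see the vector, a new class can be supported by supplying a different classfuncs instance without changing the class-independent code.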

Interface (cont’d)
• Entry points:
  • CL_TICK: called from the clock interrupt handler
  • CL_FORK, CL_FORKRET: called from fork
  • CL_ENTERCLASS, CL_EXITCLASS: called when a process enters or exits a class
  • CL_SLEEP: called from sleep()
  • CL_WAKEUP: called from wakeprocs()
• Priorities:
  • 0-59: time-sharing class
  • 60-99: system priority
  • 100-159: real-time class
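The three global priority ranges can be expressed as a small classification function. This is only a sketch of the ranges listed above; the enum and function names are hypothetical.

```c
/* Which range a global priority falls in (names are illustrative). */
enum pri_range { RANGE_TIME_SHARING, RANGE_SYSTEM, RANGE_REAL_TIME, RANGE_INVALID };

/* Map a global priority (0-159) to its range. */
enum pri_range range_of_global_pri(int pri)
{
    if (pri < 0 || pri > 159)
        return RANGE_INVALID;
    if (pri <= 59)
        return RANGE_TIME_SHARING;   /* 0-59 */
    if (pri <= 99)
        return RANGE_SYSTEM;         /* 60-99 */
    return RANGE_REAL_TIME;          /* 100-159 */
}
```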

The Time-Sharing Class
• The default class for a process.
• Round-robin scheduling among processes of equal priority
• Event-driven scheduling
• tsproc:
  • ts_timeleft: time remaining in the quantum
  • ts_cpupri: system part of the priority
  • ts_upri: user part of the priority (nice value)
  • ts_umpri: user mode priority (ts_cpupri + ts_upri)
  • ts_dispwait: seconds since the start of the quantum
• Dispatcher parameter table

Dispatcher parameter table

Field       Meaning
ts_slpret   New ts_cpupri to set when returning to user mode after sleeping.
ts_maxwait  Number of seconds to wait for quantum expiry before using ts_lwait.
ts_tqexp    New ts_cpupri to set when the quantum expires.
ts_lwait    Used instead of ts_tqexp if the process took longer than ts_maxwait to use up its quantum.
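The table is indexed by the current ts_cpupri, and each event selects one column as the new ts_cpupri. The sketch below follows the field descriptions above. The field names ts_quantum and ts_slpret, the struct name, the function names, and every value in demo_table are assumptions made for illustration, not values from a real system.

```c
/* One row of a dispatcher parameter table, indexed by ts_cpupri. */
struct ts_dpent {
    int ts_quantum;   /* quantum granted at this level (assumed field) */
    int ts_tqexp;     /* new ts_cpupri when the quantum expires */
    int ts_slpret;    /* new ts_cpupri on return to user mode after sleep */
    int ts_maxwait;   /* seconds to wait for quantum expiry before ts_lwait */
    int ts_lwait;     /* new ts_cpupri if the wait exceeded ts_maxwait */
};

/* Made-up four-level table: quantum, tqexp, slpret, maxwait, lwait. */
const struct ts_dpent demo_table[4] = {
    {100, 0, 2, 5, 2},
    { 80, 0, 3, 5, 3},
    { 40, 1, 3, 5, 3},
    { 20, 2, 3, 5, 3},
};

/* New ts_cpupri when the quantum is used up. A process that needed
 * longer than ts_maxwait seconds to consume its quantum gets ts_lwait
 * instead of ts_tqexp. */
int ts_after_quantum_expiry(const struct ts_dpent *tbl, int cpupri, int dispwait)
{
    const struct ts_dpent *e = &tbl[cpupri];
    return dispwait > e->ts_maxwait ? e->ts_lwait : e->ts_tqexp;
}

/* New ts_cpupri when the process returns to user mode after sleeping. */
int ts_after_sleep(const struct ts_dpent *tbl, int cpupri)
{
    return tbl[cpupri].ts_slpret;
}
```

In this made-up table, using up a quantum lowers the level while sleeping raises it, which is how event-driven scheduling ends up favoring interactive jobs.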

The Real-Time Class
• Priorities 100-159: higher than any time-sharing process.
• A real-time process that becomes runnable must wait until the current process is about to return to user mode or reaches a kernel preemption point.
• Real-time processes require bounded dispatch latency and bounded response time.
• Response time = interrupt handling time + dispatch latency.

The priocntl System Call
• Basic operations:
  • Changing the priority class of the process
  • Setting ts_upri for time-sharing processes
  • Resetting priority and quantum for real-time processes
  • Obtaining the current value of several scheduling parameters
• priocntlset: performs the same operations on a set of processes: all processes in the system, a process group, a session, a scheduling class, the processes of a particular user, or the processes having the same parent.

Adding a scheduling class
• Provide an implementation of each class-dependent scheduling function.
• Initialize a classfuncs vector to point to these functions.
• Provide an initialization function to perform setup tasks such as allocating internal data structures.
• Add an entry for this class in the class table.
• Rebuild the kernel.

Analysis
• Provides a flexible approach that allows scheduling classes to be added to a system.
• Event-driven scheduling favors I/O-bound and interactive jobs over CPU-bound ones.
• There is no good way for a time-sharing class process to switch to a different class; most of priocntl is restricted to the superuser.
• It is difficult to tune the system properly for a mixed set of applications.
• Solaris 2.x improves on the SVR4 scheduler.

5.6 Solaris 2.x Enhancements
• Multithreaded, symmetric-multiprocessing OS
• Preemptive kernel
  • Fully preemptive
  • Interrupts are implemented by special kernel threads
  • Interrupt threads always run at the highest priority in the system

Multiprocessor Support
• Processors can communicate by cross-processor interrupts.
• Per-processor data structure:
  • cpu_thread: currently running thread
  • cpu_dispthread: thread last selected to run
  • cpu_idle: idle thread
  • cpu_runrun: preemption flag used for time-sharing threads
  • cpu_kprunrun: preemption flag set by real-time threads
  • cpu_chosen_level: priority of the thread that is going to preempt the current thread

Multiprocessor scheduling
[Figure: T6 becomes runnable and preempts T3.]

Hidden Scheduling
• The kernel schedules the work without considering the priority of the thread for which it is doing the work.
  • E.g. STREAMS service routines
• Solaris solutions:
  • STREAMS processing is moved into kernel threads.
  • Callouts are handled by a special callout thread (which runs at the maximum system priority).

Priority Inversion
• A situation where a lower-priority thread holds a resource needed by a higher-priority thread, thereby blocking that higher-priority thread.

Solution
• Solved by priority inheritance or priority lending.

Priority inheritance must be transitive.

Implementation of Priority Inheritance
• An extra state is needed to implement priority inheritance.
• Each thread has a global priority and an inherited priority.
• pi_willto(): traverses the synchronization chain and passes on the inherited priority of the calling thread.
• pi_waive(): surrenders the thread’s inherited priority.
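The chain traversal is what makes inheritance transitive: the blocked thread's priority is passed to the owner of the lock it waits on, then to the owner of whatever that owner waits on, and so on. The sketch below illustrates the idea only. It is not the Solaris implementation; the structs and the functions effective_pri(), will_priority(), and waive_priority() are hypothetical, a larger number is assumed to mean a higher priority, and an inherited priority of 0 is used to mean "none".

```c
#include <stddef.h>

struct lock;

struct thread {
    int global_pri;            /* the thread's own priority */
    int inherited_pri;         /* priority willed to it by blocked threads, 0 if none */
    struct lock *blocked_on;   /* lock this thread is waiting for, or NULL */
};

struct lock {
    struct thread *owner;      /* single, known owner, or NULL */
};

/* The priority a thread actually runs at. */
int effective_pri(const struct thread *t)
{
    return t->inherited_pri > t->global_pri ? t->inherited_pri : t->global_pri;
}

/* Walk the chain of owners starting at the lock `waiter` blocks on and
 * raise every owner that runs below the waiter's priority. Stopping at
 * an owner that is already high enough also ends the walk on a cycle. */
void will_priority(struct thread *waiter)
{
    int pri = effective_pri(waiter);

    for (struct lock *l = waiter->blocked_on;
         l != NULL && l->owner != NULL;
         l = l->owner->blocked_on) {
        if (effective_pri(l->owner) >= pri)
            break;
        l->owner->inherited_pri = pri;
    }
}

/* Give up any inherited priority, e.g. after releasing the lock. */
void waive_priority(struct thread *t)
{
    t->inherited_pri = 0;
}
```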

Limitations of Priority Inheritance
• It can be implemented only when it is known which thread is going to free the resource, i.e. when the resource is held by a single, known thread.
• For mutexes the owner is always known, so priority inheritance can be used.
• For semaphores and condition variables the owner is usually indeterminate, so priority inheritance is not used.
• When a reader/writer lock is held for writing there is a single, known owner. It may, however, be held by multiple readers, in which case there is no single owner.

Limitations of Priority Inheritance (cont’d)
• Solaris defines an owner-of-record, which is the first thread that obtained the read lock. If a higher-priority writer blocks on this object, the owner-of-record thread inherits its priority. Any other readers cannot inherit the writer’s priority, so the solution is limited.
• Priority inheritance reduces the time a high-priority process must block, but in the worst case this time is still much greater than what is acceptable for many real-time applications.
• Alternative solution: the ceiling protocol. It requires a priori knowledge of all processes in the system and their resource requirements, which is possible in embedded applications.

Turnstiles
• Restrict the sleep queue to the threads blocked on a particular resource, which limits the time taken to process the queue.
• Threads are queued in order of their priority.
• Releasing threads from a turnstile: signal wakes the single highest-priority thread; broadcast wakes all blocked threads.
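A turnstile can be pictured as a per-resource wait queue kept in priority order, so that signal is a constant-time removal from the head. The sketch below shows that structure. All names are hypothetical, a larger number is assumed to mean a higher priority, and waking a thread is reduced to setting a flag.

```c
#include <stddef.h>

/* One blocked thread, reduced to what the queue needs (illustrative). */
struct waiter {
    int pri;                 /* priority; larger means more urgent */
    int awake;               /* set to 1 when the thread is woken */
    struct waiter *next;
};

/* Queue of threads blocked on one particular resource. */
struct turnstile {
    struct waiter *head;     /* highest-priority waiter first */
};

/* Block w on the turnstile, keeping the queue sorted by descending
 * priority and first-come first-served among equal priorities. */
void tstile_block(struct turnstile *t, struct waiter *w)
{
    struct waiter **pp = &t->head;

    while (*pp != NULL && (*pp)->pri >= w->pri)
        pp = &(*pp)->next;
    w->next = *pp;
    *pp = w;
}

/* signal: wake the single highest-priority waiter, or return NULL. */
struct waiter *tstile_signal(struct turnstile *t)
{
    struct waiter *w = t->head;

    if (w != NULL) {
        t->head = w->next;
        w->next = NULL;
        w->awake = 1;
    }
    return w;
}

/* broadcast: wake every waiter; returns how many were woken. */
int tstile_broadcast(struct turnstile *t)
{
    int n = 0;

    while (tstile_signal(t) != NULL)
        n++;
    return n;
}
```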

Solaris scheduling evaluation
• Suitable for multithreaded and many real-time applications on both uniprocessors and multiprocessors.
• Still missing other desirable real-time features such as gang scheduling and deadline-driven scheduling.

Linux Scheduling
• Scheduling classes:
  • SCHED_FIFO: first-in-first-out real-time threads
  • SCHED_RR: round-robin real-time threads
  • SCHED_OTHER: other, non-real-time threads
• Within each class multiple priorities may be used.
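The three classes are visible to programs as policy constants in <sched.h>, and the valid priority range of each can be queried with sched_get_priority_min() and sched_get_priority_max(). The sketch below uses only those standard calls; the helper names policy_is_realtime() and priority_valid() are hypothetical.

```c
#include <sched.h>

/* True for the two real-time policies listed above. */
int policy_is_realtime(int policy)
{
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Check whether prio lies in the range the system reports for policy.
 * Returns 0 if the range cannot be queried. */
int priority_valid(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return 0;
    return prio >= lo && prio <= hi;
}
```

Querying the range at run time avoids hard-coding limits that differ between systems.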
