CISC 3595 Operating Systems Part 2

CISC 3595 Operating SystemsPart 2 Midterm Review (Chapters 1-5, part of 6)

Chapter 4: Threads

Multithreaded Server Architecture

Benefits • Responsiveness – may allow continued execution if part of process is blocked, especially important for user interfaces • Resource Sharing – threads share resources of process, easier than shared memory or message passing • Economy – cheaper than process creation, thread switching lower overhead than context switching • Scalability – process can take advantage of multiprocessor architectures

Multicore Programming • Multicore or multiprocessor systems putting pressure on programmers, challenges include: • Dividing activities • Balance • Data splitting • Data dependency • Testing and debugging • Parallelism implies a system can perform more than one task simultaneously. Requires more than 1 processor. • Concurrency supports more than one task making progress • Single processor / core, scheduler providing concurrency • Processes are interleaved in a way that allows them all to make progress. Pseudo-parallelism.

Concurrency vs. Parallelism • Concurrent execution on single-core system (interleaved): • Parallelism on a multi-core system:

Single and Multithreaded Processes

User Threads and Kernel Threads • User threads - management done by user-level threads library • Three primary thread libraries: • POSIX Pthreads • Windows threads • Java threads • Kernel threads - Supported by the Kernel • Examples – virtually all general purpose operating systems, including: • Windows • Linux • Mac OS X • iOS • Android

Data and Task Parallelism

Amdahl’s Law • Identifies performance gains from adding additional cores to an application that has both serial and parallel components • S is serial portion; N processing cores • That is, if application is 75% parallel / 25% serial, moving from 1 to 2 cores results in speedup of 1.6 times • As N approaches infinity, speedup approaches 1 / S Serial portion of an application has a disproportionate effect on performance gained by adding additional cores. • But does the law take into account contemporary multicore systems?

Amdhal’s Law • S = .25 and N = 2 1/(.25 + (1-.25/2)) 1/(.25 + (.75/2)) 1/(.25 + .375) 1/.625 • 1.6 speedup

Multithreading Models • Multithreading Models refer to mapping user threads to kernel threads. • Many-to-One • One-to-One • Many-to-Many

Many-to-One • Many user-level threads mapped to single kernel thread • One thread blocking causes all to block • Multiple threads may not run in parallel on muticore system because only one may be in kernel at a time • Few systems currently use this model

One-to-One • Each user-level thread maps to kernel thread • Creating a user-level thread creates a kernel thread • More concurrency than many-to-one • Number of threads per process sometimes restricted due to overhead

Many-to-Many Model • Allows many user level threads to be mapped to many kernel threads • Allows the operating system to create a sufficient number of kernel threads • Windows with the ThreadFiber package

Two-level Model • Similar to M:M, except that it allows a user thread to be bound to kernel thread

Thread Libraries • Thread libraryprovides programmer with API for creating and managing threads • Two primary ways of implementing • Library entirely in user space • Kernel-level library supported by the OS

Pthreads • May be provided either as user-level or kernel-level • A POSIX standard (IEEE 1003.1c) API for thread creation and synchronization • Specification, not implementation • API specifies behavior of the thread library, implementation is up to development of the library • Common in UNIX operating systems (Solaris, Linux, Mac OS X)

Pthreads Example

Pthreads Example (Cont.)

Pthreads API • pthread_t // thread data including id. • pthread_attr_t // Opaque struct. Uses functions to get/set attrs • pthread_attr_init(pthread_attr* attrs); • Returns the default thread attrs. • pthread_create(pthread_t* thread, const pthread_attr_t* attrs, void *(*start_routine) (void *), void *arg); • pthread_join(pthread_t thread, void** retval); • Blocks until the given thread returns. If retval is non-NULL, then the pthread_exitvalue is returned in retval • pthread_exit(void* retval); • Exits the thread, if the main thread calls pthread_exit rather than exit, the children can continue executing.

Thread Pools • Create a number of threads in a pool where they await work • Advantages: • Usually slightly faster to service a request with an existing thread than create a new thread • Allows the number of threads in the application(s) to be bound to the size of the pool • Separating task to be performed from mechanics of creating task allows different strategies for running task • i.e.Tasks could be scheduled to run periodically

Threading Issues • Semantics of fork() and exec() system calls • Signal handling • Synchronous and asynchronous • Thread cancellation of target thread • Asynchronous or deferred • Thread-local storage • Scheduler Activations

Semantics of fork() and exec() • Does fork()duplicate only the calling thread or all threads? • Some UNIXes have two versions of fork • exec() usually works as normal – replace the running process including all threads

Signal Handling • Signals are used in UNIX systems to notify a process that a particular event has occurred. • A signal handleris used to process signals • Signal is generated by particular event • Signal is delivered to a process • Signal is handled by one of two signal handlers: • default • user-defined • Every signal has default handler that kernel runs when handling signal • User-defined signal handler can override default • For single-threaded, signal delivered to process

Signal Handling (Cont.) • Where should a signal be delivered for multi-threaded? • Deliver the signal to the thread to which the signal applies • Deliver the signal to every thread in the process • Deliver the signal to certain threads in the process • Assign a specific thread to receive all signals for the process

Thread Cancellation • Terminating a thread before it has finished • Thread to be canceled is target thread • Two general approaches: • Asynchronous cancellationterminates the target thread immediately • Deferred cancellationallows the target thread to periodically check if it should be cancelled • Pthread code to create and cancel a thread:

Thread Cancellation (Cont.) • Invoking thread cancellation requests cancellation, but actual cancellation depends on thread state • If thread has cancellation disabled, cancellation remains pending until thread enables it • Default type is deferred • Cancellation only occurs when thread reaches cancellation point • I.e. pthread_testcancel() • Then cleanup handler is invoked • On Linux systems, thread cancellation is handled through signals

Thread-Local Storage • Thread-local storage (TLS) allows each thread to have its own copy of data • Useful when you do not have control over the thread creation process (i.e., when using a thread pool) • Different from local variables • Local variables visible only during single function invocation • TLS visible across function invocations • Similar to static data • TLS is unique to each thread

Chapter 5 CPU Scheduling

Basic Concepts • Maximum CPU utilization obtained with multiprogramming • CPU–I/O Burst Cycle – Process execution consists of a cycle of CPU execution and I/O wait • CPU burst followed by I/O burst • CPU burst distribution is of main concern • Most programs alternate between CPU and I/O activity. • Studies show that programs either have high frequency of short CPU bursts or low frequency of long CPU bursts. • Scheduling to maximize CPU utilization.

CPU Scheduler • Short-term scheduler selects from among the processes in ready queue, and allocates the CPU to one of them • Queue may be ordered in various ways • CPU scheduling decisions may take place when a process: 1. Switches from running to waiting state 2. Switches from running to ready state 3. Switches from waiting to ready • Terminates • Scheduling under 1 and 4 is nonpreemptive • All other scheduling is preemptive • Consider access to shared data • Consider preemption while in kernel mode • Consider interrupts occurring during crucial OS activities

Diagram of Process State

Dispatcher • Dispatcher module gives control of the CPU to the process selected by the short-term scheduler; this involves: • switching context • switching to user mode • jumping to the proper location in the user program to restart that program • Dispatch latency – time it takes for the dispatcher to stop one process and start another running

CPU Switch From Process to Process

Scheduling Criteria • CPU utilization – keep the CPU as busy as possible [Maximize] • Throughput – # of processes that complete their execution per time unit [Maximize] • Turnaround time – amount of time to execute a particular process [Minimize] • Waiting time – amount of time a process has been waiting in the ready queue [Minimize] • Response time – amount of time it takes from when a request was submitted until the first response is produced, not output (for time-sharing environment) [Minimize]

CPU Scheduling Analytics

First- Come, First-Served (FCFS) Scheduling ProcessBurst Time P1 24 P2 3 P3 3 • Suppose that the processes arrive in the order: P1 , P2 , P3 The Gantt Chart for the schedule is: • Waiting time for P1 = 0; P2 = 24; P3 = 27 • Average waiting time: (0 + 24 + 27)/3 = 17 • Average throughput = [1/24 + 2/27 + 3/30 ]/3 = .072 process/msec • Average turnaround = (24 + 27 + 30)/3 = 27

FCFS Scheduling (Cont.) Suppose that the processes arrive in the order: P2 , P3 , P1 • The Gantt chart for the schedule is: • Waiting time for P1 = 6;P2 = 0; P3 = 3 • Average waiting time: (6 + 0 + 3)/3 = 3 • Much better than previous case • Convoy effect - short process backed up behind long process • Consider one CPU-bound and many I/O-bound processes • Is there a change in throughput? .2555 • Is there a change in turnaround? 13

Example of SJF ProcessArriva l TimeBurst Time P10.0 6 P2 2.0 8 P34.0 7 P45.0 3 • SJF scheduling chart • Average waiting time = (3 + 16 + 9 + 0) / 4 = 7 • Average throughput time = [1/3 + 2/9 + 3/16 + 4/24]/4 = .227proc/msec • Average turnaround time = (3 + 9 + 16 + 24)/4 = 13

Shortest-Job-First (SJF) Scheduling • Associate with each process the length of its next CPU burst • Use these lengths to schedule the process with the shortest time • SJF is optimal – gives minimum average waiting time for a given set of processes • The difficulty is knowing the length of the next CPU request • Not practical since predicting is extremely hard. • Could ask the user

Example of Shortest-remaining-time-first • Now we add the concepts of varying arrival times and preemption to the analysis ProcessAarriArrival TimeTBurst Time P10 8 P2 1 4 P32 9 P43 5 • Preemptive SJF Gantt Chart • Average waiting time = [(10-1)+(1-1)+(17-2)+5-3)]/4 = 26/4 = 6.5 msec

Priority Scheduling • A priority number (integer) is associated with each process • The CPU is allocated to the process with the highest priority (smallest integer  highest priority) • Preemptive • Nonpreemptive • SJF is priority scheduling where priority is the inverse of predicted next CPU burst time • Problem  Starvation– low priority processes may never execute • Solution  Aging– as time progresses increase the priority of the process

Example of Priority Scheduling ProcessAarriBurst TimeTPriority P1 10 3 P2 1 1 P32 4 P41 5 P5 5 2 • Priority scheduling Gantt Chart • Average waiting time = (0+1+6+16+18)/5 = 8.2 msec • Average turn-around = (1+6+16+18+19)/5 = 12 msec • Average throughput = (1/1 + 2/6 + 3/16 + 4/18 + 5/19)/5 = .402 jobs/msec

Round Robin (RR) • Each process gets a small unit of CPU time (timequantumq), usually 10-100 milliseconds. After this time has elapsed, the process is preempted and added to the end of the ready queue. • If there are n processes in the ready queue and the time quantum is q, then each process gets 1/n of the CPU time in chunks of at most q time units at once. No process waits more than (n-1)q time units. • Timer interrupts every quantum to schedule next process • Performance • q large  FIFO • q small  q must be large with respect to context switch, otherwise overhead is too high

Example of RR with Time Quantum = 4 ProcessBurst Time P1 24 P2 3 P3 3 • The Gantt chart is: • Typically, higher average turnaround than SJF, but better response • q should be large compared to context switch time • q usually 10ms to 100ms, context switch < 10 usec

Time Quantum and Context Switch Time The greater the time quantum, the fewer the number of context Switches. Number of context switches is inversely related to time quantum.

Multilevel Queue • Ready queue is partitioned into separate queues, eg: • foreground (interactive) • background (batch) • Process permanently in a given queue • Each queue has its own scheduling algorithm: • foreground – RR • background – FCFS • Scheduling must be done between the queues: • Fixed priority scheduling; (i.e., serve all from foreground then from background). Possibility of starvation. • Time slice – each queue gets a certain amount of CPU time which it can schedule amongst its processes; i.e., 80% to foreground in RR • 20% to background in FCFS

Multilevel Queue Scheduling

Multilevel Feedback Queue • A process can move between the various queues; aging can be implemented this way • Multilevel-feedback-queue scheduler defined by the following parameters: • number of queues • scheduling algorithms for each queue • method used to determine when to upgrade a process • method used to determine when to demote a process • method used to determine which queue a process will enter when that process needs service

CISC 3595 Operating Systems Part 2