CS 6560 Operating System Design

CS 6560 Operating System Design Lecture 4: Processes

Processes • LKD: Chapter 3: Process Management • Examples from Unix and Linux • Examination of actual Linux code

References • Our textbook: Robert Love, Linux Kernel Development, 2nd edition, Novell Press, 2005. • Linux 2.4 Internals: http://www.tldp.org/LDP/lki/lki.pdf (covers 2.4) • Understanding the Linux Kernel, 3rd edition, O’Reilly, 2005. (covers 2.6) • Tanenbaum, Modern Operating Systems • Linux code at www.kernel.org • Viewing Linux code in html format: http://lxr.linux.no/

General Definition of Process • Definition (General): A process is a program in execution. It has an address space for access to memory and one or more threads of execution for access to a central processing unit (CPU). It is an active entity, which has a lifetime, state, and behavior. • A program is typically a file containing executable code, which is loaded into memory when the program is run. • Lifetime means that it is created, exists, and eventually terminates. • State means that it has resources such as data and that it exists in different circumstances, sometimes running, sometimes waiting. • Behavior means that it operates on its data and other parts of the system. It executes a program.

General Definition of Thread • Definition: A thread is a function in execution. It shares an address space for access to memory with other threads within a process. This process executes the program that contains the function. A thread provides access to a central processing unit (CPU). It is an active entity, which has a lifetime, state, and behavior. • Lifetime means that it is created, exists, and eventually terminates. • State means that it has data and exists in different circumstances, sometimes running, sometimes waiting. Its state is determined by a stack and a program counter. Local variables are normally located on the stack. It shares state with other threads through global variables. • Behavior means that it operates on its data and other parts of the system. It executes a function. When that function terminates or returns, the thread terminates.

Process Creation and Termination • Processes are created • During system initialization • When an existing process makes a system call to create another process • When the OS accepts and then starts a new job • When a user launches a new program • During a user program as needed • Processes are terminated by • Voluntary exit • Normal exit • Error exit • Involuntary termination • Termination by the system (fatal error, resources exceeded) • Termination by another process (through a system call) • Termination by the user (perhaps through terminal I/O)

Threads and Processes • Some programs call for several threads of execution. All threads run within the same address space of the process. • Processes can be grouped so that they can be better managed. Examples: • Linux & many forms of Unix have sessions, process groups, thread groups, and processes. Processes can share an address space. When they do, they are assigned the same thread group ID (TGID). • Windows 2000 and XP have four levels of process types • Job: an object that acts as a managed collection of processes, can be named • Process: container for resources, and executing a program • Thread: active execution entity, the execution unit to which the system allocates CPU time • Fiber: lightweight unit of execution managed by the application

POSIX Standardization • POSIX (IEEE Std 1003.1-2001) specifies a thread interface. The POSIX pthread library implements threads for application programs. For example, the pthread_create function creates a new thread that runs a specified function. This corresponds to the above definition of thread. (This is available for various versions of Unix, Linux, and even Win32.)
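The thread definition above can be sketched with the pthread interface. This is a minimal example, not part of the original slides: double_value and run_in_thread are hypothetical names, and on some systems the program must be linked with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical thread function: doubles the int it is handed. */
static void *double_value(void *arg)
{
    int *value = arg;
    *value *= 2;        /* operates on data shared with the creating thread */
    return NULL;        /* returning from the function ends the thread */
}

/* Runs double_value in a new thread, waits for it, and returns the result. */
int run_in_thread(int input)
{
    pthread_t tid;
    int value = input;  /* lives on the creator's stack, same address space */

    int rc = pthread_create(&tid, NULL, double_value, &value);
    assert(rc == 0);
    rc = pthread_join(tid, NULL);   /* wait until the thread has terminated */
    assert(rc == 0);
    return value;
}
```

Because both threads share one address space, the new thread can modify the creator's variable directly through the pointer it receives.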

Processes and Resources • A typical modern OS will have the following resources for each process: • Address Space: Each process has an address space consisting of the memory that it uses. Modern systems use virtual memory (VM) to create their address space. • Open Files: Each process has a collection of files that are open • Signals: Typically, each process has a collection of signal handlers to take care of asynchronous events • File system: Typically, each process has access to files by name. These names are typically organized in a hierarchy of directories.

Linux Processes and Threads • In Linux, threads and processes are merged into one concept and handled through one kernel function do_fork(). Processes can share resources such as their address space, signal handlers, open files, etc. • The rest of these slides assumes a Unix-like operating system and Linux 2.6 in particular.

Process IDs • Each process is uniquely identified by a number called its PID. • Processes are assigned to the following types of groups: • Thread group = share same address space • Process group = run the same job such as a pipeline • Session group = part of the same terminal session • Each process has a set of identification numbers (in addition to its PID) that distinguish which groups it belongs to. These are TGID, PGID, and SID.

Unix & Linux Process Initialization • In traditional Unix, when the system starts, the kernel is the only program loaded into memory. The init process is soon created as process number (PID) 1. Typically, fork_init is used to create this initial process. • In Linux, several threads run in the kernel (called kernel threads), taking care of such things as reclaiming memory.

Typical life of a process • Users: The user logs in, getting a process to run the user’s shell. The shell calls fork() to run each external command. This generates a new process (the child), which calls execve() to run the command. Meanwhile, the parent process waits for the child to terminate with a form of wait(). Once the child terminates, the parent continues.

The fork Family • In Unix & Linux, new processes are generated using a form of the fork system call. • For traditional Unix, the two forms are fork and vfork. Linux also has the clone system call. They all use the same kernel function (called do_fork). • If successful, these functions return twice, once to the parent and once to the child process. The return value tells the process whether it is the parent or the child. A return value of 0 indicates the child. A positive return value indicates the parent. In that case, the return value is the PID of the newly created child process. A negative return value means an error has occurred.

fork() • The fork system call makes a nearly identical copy of the existing (parent) process. • In Linux, the fork call uses “copy-on-write” to avoid duplicating the entire address space.
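The return-value convention and the "nearly identical copy" behavior can be sketched together. This is a minimal illustration, not from the slides; fork_copy_demo is a hypothetical helper name.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks once. The child adds 1 to its copy of `value` and exits with it.
 * The parent stores the child's exit status in *child_saw and returns its
 * own, unmodified copy of `value`. Returns -1 on failure.
 */
int fork_copy_demo(int value, int *child_saw)
{
    assert(value >= 0 && value < 255);  /* must fit in an exit status */

    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* negative: error, no child exists */
    if (pid == 0) {
        value += 1;                     /* zero: child, changes a private copy */
        _exit(value);
    }

    /* positive: this is the parent, and pid is the child's PID */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    *child_saw = WEXITSTATUS(status);
    return value;                       /* unchanged by the child's write */
}
```

With copy-on-write, the page holding `value` is duplicated only when the child writes to it, yet the observable result is the same as a full copy: the parent's variable is untouched.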

vfork() • The vfork system call makes an incomplete copy of the existing process that is only good for launching new programs (see the man page for vfork). • It returns the same values as fork, but the child's behavior is undefined if it modifies data (other than a variable used to store the return value), returns from the function that called vfork, or calls any other function except _exit or a form of exec.

clone() • Linux also has the clone system call which makes a process that can share resources such as the address space (memory) with the parent process. This, in effect, makes a thread. The parameters to this function determine what is shared and what function to execute.

_exit() & exit() • In Unix & Linux, _exit and exit terminate processes voluntarily. • Signals are used to involuntarily terminate processes.

The exec family • The exec family (execve and its library wrappers) loads a new program into a process's address space. • These functions specify a path to the file for the program to be loaded, command line arguments, and optionally environment variables (see the man page).
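The shell-style sequence described earlier (fork, then execve in the child, then wait in the parent) can be sketched as follows. The helper name run_command and the use of exit status 127 for a failed execve are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;  /* the caller's environment, passed on unchanged */

/*
 * Runs the program at `path` with arguments `argv` in a child process and
 * returns its exit status, or -1 if it could not be started or collected.
 */
int run_command(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, environ);    /* replaces the child's program */
        _exit(127);                     /* reached only if execve failed */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

For example, calling run_command("/bin/sh", argv) with argv set to { "sh", "-c", "exit 3", NULL } should return 3 on a system that has /bin/sh.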

Unix/Linux Signals • Unix/Linux signals are a mechanism to control the execution of processes. • Signals are sent to processes by • Other processes, through a system call (kill) • User input, via the I/O system (interrupt, quit, suspend characters) • The kernel, when an error or exception occurs (illegal instruction, floating point error, illegal memory reference) • Receiving a signal can cause a process to terminate or to stop. There is also a signal to resume a stopped process. Signals can be ignored or caught. • Catching a signal means that the process interrupts what it is doing and jumps to a designated function, which then returns to the point where the process was interrupted. • The sigaction and signal system calls control the action.

Case Study: Linux • Linux process descriptor: the task structure • Linux sys_fork

Linux Task Structure (0) • Currently located in sched.h • The first few members are accessed via offsets

Figure: task_struct (Process Descriptor) • Fields: state (runnability), stack, scheduling info, debugging support, state (exit status), pid and tgid, mm, process hierarchy info, pid hash table, credentials, fs, files, signal, sighand, … • Structures pointed to: stack + thread_info (low-level scheduling), mm_struct (memory), fs_struct (directory name space), files_struct (open files), signal_struct (signals), sighand_struct (signal handlers), tty_struct (communications)

Explanation: runnable state • volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */ • The state can be • TASK_RUNNING = running or runnable • TASK_INTERRUPTIBLE = waiting for condition such as a signal • TASK_UNINTERRUPTIBLE = waiting for condition such as hardware • TASK_STOPPED = suspended by software • TASK_TRACED = under debug control

Kernel Stack and thread_info • The memory page that contains the kernel stack also contains the thread_info structure (see LKD page 25) • The thread_info structure is used for low-level scheduling issues where a quick address lookup is needed for things like memory and current status and state • It also points to the task structure.
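The "quick address lookup" works because the stack area is aligned to its own size, so masking off the low bits of any address inside it yields the start of the area, where thread_info sits. A user-space sketch of that arithmetic follows; the 8 KB THREAD_SIZE is an assumption (typical of 32-bit x86 builds of 2.6), and thread_info_base is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the kernel stack area is 8 KB and aligned to 8 KB. */
#define THREAD_SIZE 8192u

/* Maps any address inside a stack area to the base of that area. */
uintptr_t thread_info_base(uintptr_t stack_pointer)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((THREAD_SIZE & (THREAD_SIZE - 1)) == 0);
    return stack_pointer & ~((uintptr_t)THREAD_SIZE - 1);
}
```

No memory access or search is needed: one AND instruction on the stack pointer finds the structure, which in turn points to the task structure.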

Some Flags • unsigned long flags; /* per process flags, defined below */
/*
 * Per process flags
 */
#define PF_ALIGNWARN    0x00000001      /* Print alignment warning msgs */
                                        /* Not implemented yet, only for 486 */
#define PF_STARTING     0x00000002      /* being created */
#define PF_EXITING      0x00000004      /* getting shut down */
#define PF_EXITPIDONE   0x00000008      /* pi exit done on shut down */
#define PF_FORKNOEXEC   0x00000040      /* forked but didn't exec */
#define PF_SUPERPRIV    0x00000100      /* used super-user privileges */
#define PF_DUMPCORE     0x00000200      /* dumped core */
#define PF_SIGNALED     0x00000400      /* killed by a signal */
#define PF_MEMALLOC     0x00000800      /* Allocating memory */

IDs • Various IDs • pid_t pid; /* actual process id */ • pid_t tgid; /* thread group id */ • The tgid is returned by getpid to satisfy POSIX threads, so all threads in a thread group report the same process ID.

Process Hierarchy • Specifies the parent-child-sibling relationships
/*
 * pointers to (original) parent process, youngest child, younger sibling,
 * older sibling, respectively. (p->father can be replaced with
 * p->parent->pid)
 */
struct task_struct *real_parent;        /* real parent process (when being debugged) */
struct task_struct *parent;             /* parent process */
/*
 * children/sibling forms the list of my children plus the
 * tasks I'm ptracing.
 */
struct list_head children;              /* list of my children */
struct list_head sibling;               /* linkage in my parent's children list */
struct task_struct *group_leader;       /* threadgroup leader */

Visualization of Parent-Child • Using doubly linked lists for children and siblings • Figure: the parent's children list head (children.next, children.prev) links child 1, child 2, child 3, …, child n through each child's sibling entry (sibling.next, sibling.prev); each child's parent field is just a pointer back to the parent.
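The children/sibling linkage can be sketched in user space. This is a simplified model, not kernel code: struct task, task_init, task_add_child, and count_children are hypothetical names, and the list helpers only imitate the style of the kernel's list_head.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node, embedded inside the objects it links. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Simplified stand-in for the hierarchy fields of task_struct. */
struct task {
    int pid;
    struct task *parent;        /* just a pointer */
    struct list_head children;  /* head of the list of my children */
    struct list_head sibling;   /* my link in my parent's children list */
};

/* Recovers the task that contains a given sibling link. */
#define task_of_sibling(ptr) \
    ((struct task *)((char *)(ptr) - offsetof(struct task, sibling)))

void task_init(struct task *t, int pid)
{
    t->pid = pid;
    t->parent = NULL;
    list_init(&t->children);
    list_init(&t->sibling);
}

void task_add_child(struct task *parent, struct task *child)
{
    assert(parent != NULL && child != NULL);
    child->parent = parent;
    list_add_tail(&child->sibling, &parent->children);
}

/* Walks the parent's children list by following sibling links. */
int count_children(const struct task *parent)
{
    int n = 0;
    for (const struct list_head *p = parent->children.next;
         p != &parent->children; p = p->next)
        n++;
    return n;
}
```

The list node lives inside each child, so adding or removing a child needs no allocation, and task_of_sibling converts a link back into the task that contains it.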

Task Structure: hashing • Hashing by PID. A hash table enables fast lookup by PID.
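A chained hash table keyed by PID can be sketched as below. Everything here is illustrative: the bucket count, the modulo hash, and the names task_entry, pid_hash_insert, and pid_hash_find are assumptions, and the kernel's own hash function and data layout differ.

```c
#include <assert.h>
#include <stddef.h>

#define PID_HASH_BUCKETS 16     /* hypothetical table size */

/* Simplified stand-in for a task that can be found by PID. */
struct task_entry {
    int pid;
    struct task_entry *hash_next;   /* next entry in the same bucket */
};

static struct task_entry *pid_hash[PID_HASH_BUCKETS];

/* Hypothetical hash function: PID modulo the number of buckets. */
static unsigned pid_bucket(int pid)
{
    return (unsigned)pid % PID_HASH_BUCKETS;
}

void pid_hash_insert(struct task_entry *t)
{
    assert(t != NULL);
    unsigned b = pid_bucket(t->pid);
    t->hash_next = pid_hash[b];     /* push onto the front of the chain */
    pid_hash[b] = t;
}

/* Returns the entry with the given PID, or NULL if there is none. */
struct task_entry *pid_hash_find(int pid)
{
    for (struct task_entry *t = pid_hash[pid_bucket(pid)]; t; t = t->hash_next)
        if (t->pid == pid)
            return t;
    return NULL;
}
```

A lookup inspects only the entries in one bucket, so its cost depends on the chain length rather than on the total number of processes.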

Pointers • The task_struct contains pointers to other structures • stack = kernel stack + thread_info • mm = memory management • fs = filesystem • files = open files • Signaling system

Credentials • User and group: • uid_t uid, euid, suid, fsuid; • gid_t gid, egid, sgid, fsgid;

Case Study: Fork family

sys_fork, sys_clone
asmlinkage int sys_fork(struct pt_regs regs)
{
        return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
}

asmlinkage int sys_clone(struct pt_regs regs)
{
        unsigned long clone_flags;
        unsigned long newsp;
        int __user *parent_tidptr, *child_tidptr;

        clone_flags = regs.ebx;
        newsp = regs.ecx;
        parent_tidptr = (int __user *)regs.edx;
        child_tidptr = (int __user *)regs.edi;
        if (!newsp)
                newsp = regs.esp;
        return do_fork(clone_flags, newsp, &regs, 0, parent_tidptr, child_tidptr);
}

do_fork • See the book for what do_fork does • It calls copy_process, starts the new process, possibly waits (as vfork requires), and then returns.

Flags for Linux clone Bits in the sharing_flags bitmap

Cloning Flags
/*
 * cloning flags:
 */
#define CSIGNAL         0x000000ff      /* signal mask to be sent at exit */
#define CLONE_VM        0x00000100      /* set if VM shared between processes */
#define CLONE_FS        0x00000200      /* set if fs info shared between processes */
#define CLONE_FILES     0x00000400      /* set if open files shared between processes */
#define CLONE_SIGHAND   0x00000800      /* set if signal handlers and blocked signals shared */
#define CLONE_PID       0x00001000      /* set if pid shared */
#define CLONE_PTRACE    0x00002000      /* set if we want to let tracing continue on the child too */
#define CLONE_VFORK     0x00004000      /* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT    0x00008000      /* set if we want to have the same parent as the cloner */
#define CLONE_THREAD    0x00010000      /* Same thread group? */
#define CLONE_NEWNS     0x00020000      /* New namespace group? */
#define CLONE_SIGNAL    (CLONE_SIGHAND | CLONE_THREAD)
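How these bits combine can be sketched in plain C. The flag values are copied from the slide above; the helper names (compose_flags, fork_like_flags, vfork_like_flags, thread_like_flags, exit_signal_of, shares_address_space) are hypothetical, and the thread-like combination is one plausible choice rather than what any particular thread library passes.

```c
#include <assert.h>
#include <signal.h>

/* Values copied from the cloning flags listed in the slide. */
#define CSIGNAL       0x000000ff
#define CLONE_VM      0x00000100
#define CLONE_FS      0x00000200
#define CLONE_FILES   0x00000400
#define CLONE_SIGHAND 0x00000800
#define CLONE_VFORK   0x00004000
#define CLONE_THREAD  0x00010000

/* Packs the sharing bits and the exit signal into one flags word. */
unsigned long compose_flags(unsigned long sharing, int exit_signal)
{
    assert((exit_signal & ~CSIGNAL) == 0);  /* signal must fit in the low byte */
    assert((sharing & CSIGNAL) == 0);       /* sharing bits live above it */
    return sharing | (unsigned long)exit_signal;
}

/* fork: share nothing, notify the parent with SIGCHLD (as sys_fork does). */
unsigned long fork_like_flags(void)
{
    return compose_flags(0, SIGCHLD);
}

/* vfork: borrow the parent's memory and have the parent wait. */
unsigned long vfork_like_flags(void)
{
    return compose_flags(CLONE_VFORK | CLONE_VM, SIGCHLD);
}

/* Assumed thread-like combination: share memory, fs info, files, handlers. */
unsigned long thread_like_flags(void)
{
    return compose_flags(CLONE_VM | CLONE_FS | CLONE_FILES |
                         CLONE_SIGHAND | CLONE_THREAD, 0);
}

int exit_signal_of(unsigned long flags)
{
    return (int)(flags & CSIGNAL);
}

int shares_address_space(unsigned long flags)
{
    return (flags & CLONE_VM) != 0;
}
```

The low byte carries the signal sent to the parent when the child exits, which is why sys_fork can pass SIGCHLD as its entire flags argument.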

CPU Scheduling Chapter 4

Scheduling Requirements • Support a mix of tasks, run by multiple users • For users: Treat each user fairly. Don’t let any job stop others from working. Favor interactive jobs. • For the system: Optimize the use of processor time in a possibly multiprocessor system

Linux 2.6 Scheduler • Overhaul from 2.4: • O(1) scheduler - can determine what to run next in constant time, independent of the number of processes.

Policy • Policy is the set of rules that determine which jobs run, when, and for how long. • Jobs can be classified as I/O bound or processor bound (a job's classification can change over time). • Linux • Uses a dynamic priority-based scheme • Dynamically adjusts the timeslice (how long the job can run) • Preempts jobs

Linux 2.6 Scheduling Algorithm • Code is located in kernel/sched.c • Each processor has its own runqueue. • Each runqueue has two priority arrays (active = runnable, expired = timeslice expired). • Each priority array has 140 lists, one for each priority level (including 100 real-time levels). • The scheduler selects processes from the active array according to their priority. • When a process's timeslice expires, it is moved to the expired array. At that time the timeslice and priority are recalculated. • Once the active array is empty, the scheduler switches the active and expired arrays. • If a process is sufficiently interactive, it will stay in the active array.
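The two-array scheme can be modeled in user space. This is a deliberately simplified sketch, not the kernel's code: per-priority lists are replaced by counters, every selected task is assumed to use up its timeslice at once, the interactivity exception is omitted, and names such as rq_init, enqueue, and pick_next are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_PRIO     140                    /* priority levels, 0 = highest */
#define BITMAP_WORDS ((MAX_PRIO + 31) / 32)

struct prio_array {
    int nr_active;                  /* tasks queued in this array */
    uint32_t bitmap[BITMAP_WORDS];  /* bit set = that priority is non-empty */
    int count[MAX_PRIO];            /* stand-in for the per-priority lists */
};

struct runqueue {
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

void rq_init(struct runqueue *rq)
{
    memset(rq, 0, sizeof *rq);
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

void enqueue(struct prio_array *a, int prio)
{
    assert(prio >= 0 && prio < MAX_PRIO);
    a->count[prio]++;
    a->bitmap[prio / 32] |= (uint32_t)1 << (prio % 32);
    a->nr_active++;
}

static void dequeue(struct prio_array *a, int prio)
{
    if (--a->count[prio] == 0)
        a->bitmap[prio / 32] &= ~((uint32_t)1 << (prio % 32));
    a->nr_active--;
}

/* Scans a fixed number of bits, so the cost does not grow with task count. */
static int first_set_prio(const struct prio_array *a)
{
    for (int w = 0; w < BITMAP_WORDS; w++)
        for (int b = 0; a->bitmap[w] && b < 32; b++)
            if (a->bitmap[w] & ((uint32_t)1 << b))
                return w * 32 + b;
    return -1;
}

/* Returns the priority of the task chosen to run next, or -1 if idle. */
int pick_next(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {       /* active empty: swap the arrays */
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    int prio = first_set_prio(rq->active);
    if (prio < 0)
        return -1;
    dequeue(rq->active, prio);
    enqueue(rq->expired, prio);             /* model: timeslice used up */
    return prio;
}
```

Swapping two pointers replaces any per-task recalculation loop, which is the main reason the selection step takes constant time.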

Timeslice and Priority Calculations • Priority is dynamic, dependent upon • Niceness level (static priority), which ranges from -20 to 19 (lower = better service) • Penalty (-5 to 5) that is computed from the sleep_avg • Timeslice is calculated from the niceness level.
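The calculation can be sketched numerically. The mapping of niceness onto levels 100 to 139 follows from the 140-level layout with 100 real-time levels; the sign convention (a positive penalty worsens priority) and the timeslice scaling with its 5 ms, 100 ms, and 400 ms constants are assumptions modeled on some 2.6 releases, and the exact formula varies by kernel version. All function names are hypothetical.

```c
#include <assert.h>

#define MAX_RT_PRIO 100     /* levels 0..99 are real time */
#define MAX_PRIO    140     /* levels 100..139 are for normal tasks */

/* Niceness -20..19 maps onto static priorities 100..139. */
int nice_to_static_prio(int nice)
{
    assert(nice >= -20 && nice <= 19);
    return MAX_RT_PRIO + 20 + nice;
}

/*
 * Dynamic priority: static priority plus a penalty in -5..5.
 * Assumption: a negative penalty rewards tasks that sleep a lot.
 * The result is clamped to the range used by normal tasks.
 */
int dynamic_prio(int nice, int penalty)
{
    assert(penalty >= -5 && penalty <= 5);
    int prio = nice_to_static_prio(nice) + penalty;
    if (prio < MAX_RT_PRIO)
        prio = MAX_RT_PRIO;
    if (prio > MAX_PRIO - 1)
        prio = MAX_PRIO - 1;
    return prio;
}

/* Assumed scaling: timeslice in milliseconds from niceness alone. */
int timeslice_ms(int nice)
{
    int static_prio = nice_to_static_prio(nice);
    int base = (static_prio < MAX_RT_PRIO + 20) ? 400 : 100;
    int slice = base * (MAX_PRIO - static_prio) / 20;
    return slice < 5 ? 5 : slice;   /* never below the assumed minimum */
}
```

Under these assumptions a task at niceness 0 gets 100 ms, the most favored task (niceness -20) gets 800 ms, and the least favored (niceness 19) gets the 5 ms minimum.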

Load Balancing • With multiple processors, work needs to be evenly divided among the processors. • See the book on how this is done.
