Processes in Unix and Nachos
Elements of the Unix Process and I/O Model • 1. rich model for IPC and I/O: “everything is a file” • file descriptors: most/all interactions with the outside world are through system calls to read/write from file descriptors, with a unified set of syscalls for operating on open descriptors of different types. • 2. simple and powerful primitives for creating and initializing child processes • fork: easy to use, expensive to implement • 3. general support for combining small simple programs to perform complex tasks • standard I/O and pipelines: good programs don’t know/care where their input comes from or where their output goes
Unix File Descriptors • Unix processes name I/O and IPC objects by integers known as file descriptors. • File descriptors 0, 1, and 2 are reserved by convention for standard input, standard output, and standard error. • “Conforming” Unix programs read input from stdin, write output to stdout, and write errors to stderr by default. • Other descriptors are assigned by syscalls that open/create files, create pipes, or bind to devices or network sockets. • pipe, socket, open, creat • A common set of syscalls operates on open file descriptors independent of their underlying types. • read, write, dup, close
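The uniformity claim above can be sketched in C. This is a minimal example assuming a POSIX system with a writable /tmp; the helper names `round_trip`, `pipe_round_trip`, and `file_round_trip` are hypothetical. The same `write`/`read` pair moves bytes through a pipe and through a regular file; only the calls that create the descriptors (`pipe`, `mkstemp`/`open`) differ.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Write msg on wfd, then read up to cap bytes from rfd. Nothing here
   depends on what kind of object the descriptors refer to. */
static ssize_t round_trip(int wfd, int rfd, const char *msg, char *buf, size_t cap)
{
    size_t len = strlen(msg);
    if (write(wfd, msg, len) != (ssize_t)len)
        return -1;
    return read(rfd, buf, cap);
}

/* Round trip through a pipe: pfd[1] is the write end, pfd[0] the read end. */
ssize_t pipe_round_trip(const char *msg, char *buf, size_t cap)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    ssize_t n = round_trip(pfd[1], pfd[0], msg, buf, cap);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}

/* Round trip through a temporary file opened twice: each open gets its
   own file offset, so the reader starts at byte 0. */
ssize_t file_round_trip(const char *msg, char *buf, size_t cap)
{
    char path[] = "/tmp/fd_demo_XXXXXX";
    int wfd = mkstemp(path);            /* opened read/write */
    if (wfd == -1)
        return -1;
    int rfd = open(path, O_RDONLY);
    unlink(path);                       /* name removed; descriptors still work */
    if (rfd == -1) {
        close(wfd);
        return -1;
    }
    ssize_t n = round_trip(wfd, rfd, msg, buf, cap);
    close(rfd);
    close(wfd);
    return n;
}
```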
Unix File Descriptors Illustrated [Figure: user space above, kernel below; each process's file descriptor table points into the system open file table, whose entries refer to a file, a pipe, a socket, or a tty. Disclaimer: this drawing is oversimplified.] • File descriptors are a special case of kernel object handles. • The binding of file descriptors to objects is specific to each process, like the virtual translations in the virtual address space.
The Concept of Fork • The Unix system call for process creation is called fork(). • The fork system call creates a child process that is a clone of the parent. • Child has a (virtual) copy of the parent’s virtual memory. • Child is running the same program as the parent. • Child inherits open file descriptors from the parent. • (Parent and child file descriptors point to a common entry in the system open file table.) • Child begins life with the same register values as parent. • The child process may execute a different program in its context with a separate exec() system call.
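A small sketch of the child's “virtual copy” of memory, assuming a POSIX system; `fork_copy_demo` is a hypothetical name. The child changes its copy of a local variable and reports it through its exit status, while the parent's copy stays unchanged.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that changes its copy of x and exits with that value.
   Returns the parent's x afterwards, and stores the child's exit status
   in *child_x. Returns -1 on error. */
int fork_copy_demo(int *child_x)
{
    int x = 1;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {          /* child: writes only its private copy */
        x = 42;
        _exit(x);            /* _exit skips the parent's stdio buffers/atexit */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    *child_x = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return x;                /* still 1 in the parent */
}
```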
Unix Fork/Exec/Exit/Wait Example • int pid = fork(); • Create a new process that is a clone of its parent. • exec*("program" [, argvp, envp]); • Overlay the calling process virtual memory with a new program, and transfer control to it. • exit(status); • Exit with status, destroying the process. • int pid = wait*(&status); • Wait for exit (or other status change) of a child. [Figure: the parent forks; the child initializes its context and calls exec; the child eventually exits while the parent waits.]
Example: Process Creation in Unix
The fork syscall returns twice: it returns a zero to the child and the child process ID (pid) to the parent.

    int pid;
    int status = 0;

    if ((pid = fork()) > 0) {
        /* parent */
        /* ... */
        pid = wait(&status);
    } else if (pid == 0) {
        /* child */
        /* ... */
        exit(status);
    } else {
        /* fork failed (pid == -1): no child was created */
        perror("fork");
    }

Parent uses wait to sleep until the child exits; wait returns the child's pid and status. Wait variants allow waiting on a specific child, or notification of stops and other signals.
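The same pattern extended with exec, as a sketch assuming POSIX `fork`, `execvp`, and `waitpid`; `run_program` is a hypothetical helper, and using exit status 127 for a failed exec follows the common shell convention rather than anything the slides specify.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with arguments argv in a child and return its exit status,
   or -1 if fork/wait fails or the child did not exit normally. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);   /* returns only on failure */
        perror("execvp");
        _exit(127);              /* shell convention for "command not found" */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```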
What’s So Cool About Fork • 1. fork is a simple primitive that allows process creation without troubling with what program to run, args, etc. • Serves some of the same purposes as threads. • 2. fork gives the parent program an opportunity to initialize the child process…especially the open file descriptors. • Unix syscalls for file descriptors operate on the current process. • Parent program running in child process context may open/close I/O and IPC objects, and bind them to stdin, stdout, and stderr. • Also may modify environment variables, arguments, etc. • 3. Using the common fork/exec sequence, the parent (e.g., a command interpreter or shell) can transparently cause children to read/write from files, terminal windows, network connections, pipes, etc.
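Point 2 can be sketched as follows, assuming a POSIX system with `echo` on the PATH; `capture_output` is a hypothetical helper. Between `fork` and `exec`, the parent's code (running in the child) rebinds descriptor 1 to a pipe with `dup2`; the program it then runs is unmodified and simply writes to its stdout.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv with its stdout bound to a pipe and collect up to cap bytes of
   its output in buf. Returns the number of bytes captured, or -1. */
ssize_t capture_output(char *const argv[], char *buf, size_t cap)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: set up, then exec */
        dup2(pfd[1], STDOUT_FILENO);     /* stdout now refers to the pipe */
        close(pfd[0]);
        close(pfd[1]);
        execvp(argv[0], argv);
        _exit(127);
    }
    close(pfd[1]);                       /* parent keeps only the read end */
    size_t total = 0;
    ssize_t n;
    while (total < cap && (n = read(pfd[0], buf + total, cap - total)) > 0)
        total += (size_t)n;
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Because the rebinding happens before exec, the same child-side steps work for files, sockets, or terminals; only the descriptor passed to `dup2` changes.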
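Producer/Consumer Pipes

    char inbuffer[1024];
    char outbuffer[1024];
    ssize_t inbytes, outbytes;

    while ((inbytes = read(STDIN_FILENO, inbuffer, sizeof inbuffer)) > 0) {
        /* process: placeholder for "process data from inbuffer to outbuffer" */
        outbytes = process(inbuffer, inbytes, outbuffer);
        write(STDOUT_FILENO, outbuffer, outbytes);
    }

Pipes support a simple form of parallelism with built-in flow control: a writer blocks when the pipe's buffer is full, and a reader blocks when it is empty. [Figure: input flows through the process to output.] e.g.: sort <grades | grep Dan | mail sprenkle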
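A runnable version of this loop as a reusable filter, as a sketch assuming POSIX `read`/`write`; `upcase_filter` is a hypothetical name, and upper-casing stands in for “process data.” It also handles partial writes, which can occur on pipes.

```c
#include <ctype.h>
#include <unistd.h>

/* A conforming filter: read from in until EOF, transform, write to out.
   Returns total bytes written, or -1 on a read/write error. */
long upcase_filter(int in, int out)
{
    char inbuffer[1024], outbuffer[1024];
    ssize_t inbytes;
    long total = 0;
    while ((inbytes = read(in, inbuffer, sizeof inbuffer)) > 0) {
        for (ssize_t i = 0; i < inbytes; i++)
            outbuffer[i] = (char)toupper((unsigned char)inbuffer[i]);
        ssize_t done = 0;
        while (done < inbytes) {          /* write may be partial on pipes */
            ssize_t w = write(out, outbuffer + done, (size_t)(inbytes - done));
            if (w == -1)
                return -1;
            done += w;
        }
        total += inbytes;
    }
    return inbytes == 0 ? total : -1;     /* 0 means EOF, -1 means error */
}
```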
Unix as an Extensible System • “Complex software systems should be built incrementally from components.” • independently developed • replaceable, interchangeable, adaptable • The power of fork/exec/exit/wait makes Unix highly flexible/extensible...at the application level. • write small, general programs and string them together • general stream model of communication • this is one reason Unix has survived • These system calls are also powerful enough to implement full-featured command interpreters (shells).
The Shell • The Unix command interpreters run as ordinary user processes with no special privilege. • This was novel at the time Unix was created: other systems viewed the command interpreter as a trusted part of the OS. • Users may select from a range of interpreter programs available, or even write their own (to add to the confusion). • csh, sh, ksh, tcsh, bash: choose your flavor...or use perl. • Shells use fork/exec/exit/wait to execute commands composed of program filenames, args, and I/O redirection symbols. • Shells are general enough to run files of commands (scripts) for more complex tasks, e.g., by redirecting shell’s stdin. • Shell’s behavior is guided by environment variables.
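A toy command interpreter in this spirit, as a sketch assuming a POSIX system; `run_command_line` and `MAXARGS` are hypothetical, and it handles only blank-separated words plus `<` and `>` redirection (no quoting, pipes, or variables). It shows that redirection is just `open` plus `dup2` in the child before `execvp`.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAXARGS 32

/* Run one command line such as "sort < grades > out". Modifies line in
   place. Returns the command's exit status, or -1 on error. */
int run_command_line(char *line)
{
    char *argv[MAXARGS + 1];
    char *in_path = NULL, *out_path = NULL;
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok; tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, "<") == 0)
            in_path = strtok(NULL, " \t\n");    /* next word is the file */
        else if (strcmp(tok, ">") == 0)
            out_path = strtok(NULL, " \t\n");
        else if (argc < MAXARGS)
            argv[argc++] = tok;
    }
    argv[argc] = NULL;
    if (argc == 0)
        return -1;

    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {                /* child: bind fds 0/1, then exec */
        if (in_path) {
            int fd = open(in_path, O_RDONLY);
            if (fd == -1)
                _exit(1);
            dup2(fd, STDIN_FILENO);
            close(fd);
        }
        if (out_path) {
            int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
            if (fd == -1)
                _exit(1);
            dup2(fd, STDOUT_FILENO);
            close(fd);
        }
        execvp(argv[0], argv);
        _exit(127);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```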
Limitations of the Unix Process Model • The pure Unix model has several shortcomings/limitations: • Any setup for a new process must be done in its context. • Separated Fork/Exec is slow and/or complex to implement. • A more flexible process abstraction would expand the ability of a process to manage another externally. • This is a hallmark of systems that support multiple operating system “personalities” (e.g., NT) and “microkernel” systems (e.g., Mach). • Pipes are limited to transferring linear byte streams between a pair of processes with a common ancestor. • Richer IPC models are needed for complex software systems built as collections of separate programs.
Two Views of Threads in Nachos • 1. Nachos is a thread library running inside a Unix (Solaris) process, with no involvement from the kernel. • SPARC interrupts and Solaris timeslicing are invisible. • the Nachos scheduler does its own pseudo-random timeslicing. • 2. Nachos is a toolkit for building a simulated OS kernel. • Threads are a basis for implementing Nachos processes; when running in kernel mode they interact/synchronize as threads. • Nachos kernel’s timeslicing is implemented in the scheduler. • - driven by timer interrupts on the “simulated machine” • A Nachos kernel could provide a kernel interface for threads.
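Nachos Thread States and Transitions [Figure: states running (user), running (kernel), ready, and blocked. ready → running (kernel): Scheduler::Run; running (kernel) → ready: Thread::Yield; running (kernel) → blocked: Thread::Sleep; blocked → ready: Scheduler::ReadyToRun; running (kernel) → running (user): Machine::Run; running (user) → running (kernel): interrupt or exception, handled by ExceptionHandler.] When running in user mode, the thread executes within the SPIM machine simulator. In Labs 1-3 we are only concerned with the states in this box.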
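The transitions in the diagram can be written down as a table-driven function. This is a hypothetical model for study, not Nachos source code: the enum names are invented and simply label the states and the routines that cause each transition.

```c
/* Hypothetical labels for the diagram's states and transition causes. */
enum thread_state { READY, RUNNING_KERNEL, RUNNING_USER, BLOCKED, INVALID };

enum thread_event {
    SCHEDULER_RUN,          /* Scheduler::Run picks this thread */
    THREAD_YIELD,           /* Thread::Yield gives up the CPU */
    THREAD_SLEEP,           /* Thread::Sleep blocks on an event */
    SCHEDULER_READY_TO_RUN, /* Scheduler::ReadyToRun wakes it */
    MACHINE_RUN,            /* Machine::Run enters user mode in SPIM */
    TRAP                    /* interrupt or exception -> ExceptionHandler */
};

/* Return the state after event e, or INVALID if the diagram has no
   such arrow out of state s. */
enum thread_state next_state(enum thread_state s, enum thread_event e)
{
    switch (s) {
    case READY:
        return e == SCHEDULER_RUN ? RUNNING_KERNEL : INVALID;
    case RUNNING_KERNEL:
        if (e == THREAD_YIELD) return READY;
        if (e == THREAD_SLEEP) return BLOCKED;
        if (e == MACHINE_RUN)  return RUNNING_USER;
        return INVALID;
    case RUNNING_USER:
        return e == TRAP ? RUNNING_KERNEL : INVALID;
    case BLOCKED:
        return e == SCHEDULER_READY_TO_RUN ? READY : INVALID;
    default:
        return INVALID;
    }
}
```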
A Simple Page Table [Figure: a user virtual address splits into page #i and an offset; the process page table maps page #i to PFN i, and the physical address is PFN i + offset within the page frames of physical memory.] Each process/VAS has its own page table. Virtual addresses are translated relative to the current page table. In this example, each VPN j maps to PFN j, but in practice any physical frame may be used for any virtual page. The page tables are themselves stored in memory; a protected register holds a pointer to the current page table.
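Translation through a simple page table, as a sketch: `struct pte`, `translate`, and the 128-byte `PAGE_SIZE` are assumptions for illustration (128 bytes matches the Nachos page size as far as we know). The page number indexes the table, and the offset passes through unchanged.

```c
#include <stdint.h>

#define PAGE_SIZE 128u   /* assumed page size */

/* One page table entry: which physical frame backs this virtual page. */
struct pte {
    uint32_t pfn;
    int valid;
};

/* Translate a user virtual address through the current page table.
   Returns 0 and sets *pa on success, -1 if the page is out of range
   or not valid. */
int translate(const struct pte *table, uint32_t npages, uint32_t va, uint32_t *pa)
{
    uint32_t vpn = va / PAGE_SIZE;      /* page # */
    uint32_t offset = va % PAGE_SIZE;   /* unchanged by translation */
    if (vpn >= npages || !table[vpn].valid)
        return -1;
    *pa = table[vpn].pfn * PAGE_SIZE + offset;
    return 0;
}
```

For example, with page 1 mapped to frame 2, virtual address 130 (page 1, offset 2) translates to 2 × 128 + 2 = 258.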
Nachos: A Peek Under the Hood [Figure: user programs (shell, cp) in user space run as MIPS instructions executed by the SPIM MIPS emulator, whose Machine::Run() loop does fetch/execute. Traps enter the Nachos kernel through ExceptionHandler(). The kernel examines and deposits values in the Machine object's registers (Rn, SP, PC) and memory, and uses SaveState/RestoreState to install the current process's page table.]
The User-Mode Context for Nachos [Figure: the user virtual address space (text, data, BSS, user stack, args/env) and the registers Rn, SP, and PC; a user virtual address (page #i, offset) maps through the page table to PFN i + offset.] boolean Machine::Translate(uva, alignment, &kva): translate a user virtual address to a kernel memory address, checking access and alignment.
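A hypothetical analogue of the checks `Machine::Translate` is described as performing, assuming 1-, 2-, or 4-byte accesses that must be naturally aligned (as on MIPS), a per-page read-only bit, and a 128-byte page. The names and result codes are invented for this sketch; they are not the Nachos API.

```c
#include <stdint.h>

#define PAGE_SIZE 128u   /* assumed page size */

enum xlate_result { XLATE_OK, XLATE_ADDRESS_ERROR, XLATE_PAGE_FAULT, XLATE_READ_ONLY };

struct pte {
    uint32_t pfn;
    int valid;
    int read_only;
};

/* Check alignment and access rights, then translate va to an index
   (*kva) into the simulated physical memory array. */
enum xlate_result translate_checked(const struct pte *table, uint32_t npages,
                                    uint32_t va, uint32_t size, int writing,
                                    uint32_t *kva)
{
    if ((size != 1 && size != 2 && size != 4) || va % size != 0)
        return XLATE_ADDRESS_ERROR;          /* misaligned access */
    uint32_t vpn = va / PAGE_SIZE;
    if (vpn >= npages)
        return XLATE_ADDRESS_ERROR;          /* beyond the address space */
    if (!table[vpn].valid)
        return XLATE_PAGE_FAULT;
    if (writing && table[vpn].read_only)
        return XLATE_READ_ONLY;
    *kva = table[vpn].pfn * PAGE_SIZE + va % PAGE_SIZE;
    return XLATE_OK;
}
```

Since accesses are aligned and no larger than 4 bytes, an access can never straddle a page boundary, so one page table lookup suffices.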
Creating a Nachos Process

    void StartProcess(char *filename)
    {
        OpenFile *executable;
        AddrSpace *space;

        // Create a handle for reading text and initial data out of the executable file.
        executable = fileSystem->Open(filename);
        if (executable == NULL) {
            printf("Unable to open file %s\n", filename);
            return;
        }

        // Create an AddrSpace object, allocating physical memory and
        // setting up the process page table.
        space = new AddrSpace(executable);

        // Set address space of current thread/process.
        currentThread->space = space;
        delete executable;          // close file

        // Initialize registers and begin execution in user mode.
        space->InitRegisters();
        space->RestoreState();
        machine->Run();
        ASSERT(FALSE);              // machine->Run never returns
    }
Creating a Nachos Address Space

    AddrSpace::AddrSpace(OpenFile *executable)
    {
        NoffHeader noffH;
        unsigned int i, size;

        executable->ReadAt((char *)&noffH, sizeof(noffH), 0);

        // how big is address space?
        size = noffH.code.size + noffH.initData.size + noffH.uninitData.size
                + UserStackSize;    // we need to increase the size
                                    // to leave room for the stack
        numPages = divRoundUp(size, PageSize);
        size = numPages * PageSize;

        pageTable = new TranslationEntry[numPages];
        for (i = 0; i < numPages; i++) {
            pageTable[i].virtualPage = i;   // for now, virtual page # = phys page #
            pageTable[i].physicalPage = i;
            pageTable[i].valid = TRUE;
        }
        ....
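The sizing arithmetic above, as a small C sketch with hypothetical names. Assuming `PageSize` is 128 and `UserStackSize` is 1024 (values we believe the standard Nachos sources use), a program with 1000 bytes of code, 200 bytes of initialized data, and 300 bytes of uninitialized data needs 2524 bytes, which rounds up to 20 pages (2560 bytes).

```c
/* Number of s-sized units needed to hold n bytes (rounds up). */
static unsigned div_round_up(unsigned n, unsigned s)
{
    return (n + s - 1) / s;
}

/* Pages needed for code + initialized data + uninitialized data + stack. */
unsigned address_space_pages(unsigned code, unsigned init_data,
                             unsigned uninit_data, unsigned stack,
                             unsigned page_size)
{
    return div_round_up(code + init_data + uninit_data + stack, page_size);
}
```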
Initializing a Nachos Address Space

        // zero out the address space (this indexes physical memory
        // directly, which works only because VPN == PFN here)
        bzero(machine->mainMemory, size);

        // copy in the code and data segments into memory
        if (noffH.code.size > 0) {
            DEBUG('a', "Initializing code segment, at 0x%x, size %d\n",
                  noffH.code.virtualAddr, noffH.code.size);
            executable->ReadAt(&(machine->mainMemory[noffH.code.virtualAddr]),
                               noffH.code.size, noffH.code.inFileAddr);
        }
        if (noffH.initData.size > 0) {
            DEBUG('a', "Initializing data segment, at 0x%x, size %d\n",
                  noffH.initData.virtualAddr, noffH.initData.size);
            executable->ReadAt(&(machine->mainMemory[noffH.initData.virtualAddr]),
                               noffH.initData.size, noffH.initData.inFileAddr);
        }
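Outside Nachos, the same segment-loading step can be sketched with POSIX `pread`; `load_segment` is a hypothetical helper, and `mem` stands in for `machine->mainMemory`. As in the code above, it relies on VPN == PFN so that a segment's virtual address can index physical memory directly; it also adds a bounds check that the excerpt above does not show.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy size bytes from file offset in_file_addr into mem[virtual_addr...].
   Returns 0 on success, -1 if the segment would overflow mem or the file
   is shorter than the header claims. */
int load_segment(int fd, unsigned char *mem, size_t mem_size,
                 size_t virtual_addr, size_t size, off_t in_file_addr)
{
    if (virtual_addr > mem_size || size > mem_size - virtual_addr)
        return -1;
    size_t done = 0;
    while (done < size) {
        ssize_t n = pread(fd, mem + virtual_addr + done, size - done,
                          in_file_addr + (off_t)done);
        if (n <= 0)
            return -1;          /* read error or premature end of file */
        done += (size_t)n;
    }
    return 0;
}
```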
Join Scenarios • Several cases must be considered for join (e.g., exit/wait). • What if the child exits before the parent joins? • “Zombie” process object holds child status and stats. • What if the parent continues to run but never joins? • How not to fill up memory with zombie processes? • What if the parent exits before the child? • Orphans become children of init (process 1). • What if the parent can’t afford to get “stuck” on a join? • Unix makes provisions for asynchronous notification.
Review: The Virtual Address Space [Figure: a VAS for a private address space system (e.g., Unix) executing on a typical 32-bit architecture, running from 0 (0x0) to 2^n - 1 (0xffffffff): text, data, BSS, user stack, and args/env in the lower half; kernel text and kernel data in the upper half.] • A typical process VAS includes: • user regions in the lower half • V->P mappings specific to each process • accessible to user or kernel code • kernel regions in upper half • shared by all processes • accessible only to kernel code • Nachos: process virtual address space includes only user portions. • mappings change on each process switch
Process Internals [Figure: a process descriptor (user ID, process ID, parent PID, sibling links, children) plus resources, with a thread (and its stack) bound to the virtual address space.] The address space is represented by a page table, a set of translations to physical memory allocated from a kernel memory manager. The kernel must initialize the process memory with the program image to run. Each process has a thread bound to the VAS. The thread has a saved user context as well as a system context. The kernel can manipulate the user context to start the thread in user mode wherever it wants. Process state includes a file descriptor table, links to maintain the process tree, and a place to store the exit status.
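One way to picture the process descriptor described above is a C struct. Every field and name here (`struct proc`, `NOFILE`, `proc_add_child`, `proc_count_children`) is a hypothetical simplification, not any particular kernel's layout; the point is the tree links (parent, first child, next sibling) and the slot for the exit status.

```c
#include <stddef.h>
#include <sys/types.h>

#define NOFILE 16   /* hypothetical per-process descriptor table size */

struct open_file;   /* system open file table entry (opaque here) */

/* Hypothetical process descriptor with the fields named on the slide. */
struct proc {
    pid_t pid, ppid;
    uid_t uid;
    struct open_file *fd_table[NOFILE];  /* per-process descriptor table */
    struct proc *parent;
    struct proc *children;               /* first child */
    struct proc *sibling;                /* next child of the same parent */
    int exit_status;                     /* kept for the parent's wait */
    int zombie;
};

/* Link child into parent's list of children (the process tree). */
void proc_add_child(struct proc *parent, struct proc *child)
{
    child->parent = parent;
    child->ppid = parent->pid;
    child->sibling = parent->children;
    parent->children = child;
}

/* Count the children currently linked under p. */
int proc_count_children(const struct proc *p)
{
    int n = 0;
    for (const struct proc *c = p->children; c != NULL; c = c->sibling)
        n++;
    return n;
}
```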
What’s in an Object File or Executable? [Figure: an object file laid out as header, text, idata, wdata, symbol table, and relocation records.] • Header “magic number” indicates type of image. • Section table: an array of (offset, len, startVA). • Program sections: text holds program instructions (p); idata holds immutable data (constants) such as "hello\n"; wdata holds writable global/static data (j, s). • Symbol table (j, s, p, sbuf) and relocation records: used by linker; may be removed after final link step and strip.

    int j = 327;
    char* s = "hello\n";
    char sbuf[512];

    int p() {
        int k = 0;
        j = write(1, s, 6);
        return(j);
    }
The Birth of a Program [Figure: myprogram.c → compiler → myprogram.s → assembler → myprogram.o (object file) → linker, together with libraries and other objects → myprogram (executable file).]

    int j;
    char* s = "hello\n";

    int p() {
        j = write(1, s, 6);
        return(j);
    }

The compiler's assembly output, sketched:

    p:  store this
        store that
        push
        jsr _write
        ret
        etc.
The Program and the Process VAS [Figure: program sections (header, text, idata, wdata, symbol table, relocation records) mapped onto process VAS segments (text, data, BSS, user stack, args/env, kernel).] • BSS: “Block Started by Symbol” (uninitialized global data), e.g., heap and sbuf go here. • Process text segment is initialized directly from program text section. • Process data segment(s) are initialized from idata and wdata sections. • Process stack and BSS (e.g., heap) segment(s) are zero-filled. • Process BSS segment may be expanded at runtime with a system call (e.g., Unix sbrk) called by the heap manager routines. • Text and idata segments may be write-protected. • Args/env strings copied in by kernel when the process is created.
Processes and the Kernel [Figure: several process virtual address spaces side by side, each with text, data, BSS, user stack, and args/env starting at 0, and a kernel area at the top (up to 2^n - 1) that maps the same kernel.]
Nachos as a Thread Library • The Nachos library implements concurrent threads. • no special support needed from the kernel (use any Unix) • thread creation and context switch are fast (no syscall) • defines its own thread model and scheduling policies • library threads are sometimes called coroutines, lightweight threads, or fibers in NT. [Figure: the scheduler's readyList of threads.] The scheduler loop:

    while (1) {
        t = scheduler->FindNextToRun();
        scheduler->Run(t);
    }
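A library-level round-robin scheduler in the spirit of the loop above, assuming a system that still provides the POSIX `ucontext` routines (`getcontext`, `makecontext`, `swapcontext`; they are obsolescent in POSIX but available in glibc). All names are hypothetical. Switching threads is a function call into the library; note that glibc's `swapcontext` also saves and restores the signal mask, which does cost a system call, whereas a library like Nachos saves registers with its own assembly-language switch routine.

```c
#include <ucontext.h>

#define NTHREADS 2
#define STACK_SIZE (64 * 1024)

static ucontext_t scheduler_ctx;            /* where the scheduler loop runs */
static ucontext_t thread_ctx[NTHREADS];
static char stacks[NTHREADS][STACK_SIZE];
static int finished[NTHREADS];
static int current;                         /* index of the running thread */

int trace[3 * NTHREADS];                    /* order in which threads ran */
int trace_len;

/* Give up the CPU: save this thread's registers, resume the scheduler. */
static void thread_yield(void)
{
    swapcontext(&thread_ctx[current], &scheduler_ctx);
}

static void thread_body(void)
{
    for (int i = 0; i < 3; i++) {
        trace[trace_len++] = current;
        thread_yield();
    }
    finished[current] = 1;  /* returning switches to uc_link (the scheduler) */
}

/* Create the threads, then loop round-robin (like FindNextToRun + Run)
   until every thread has finished. Returns the number of time slices. */
int run_threads(void)
{
    trace_len = 0;
    for (int t = 0; t < NTHREADS; t++) {
        getcontext(&thread_ctx[t]);
        thread_ctx[t].uc_stack.ss_sp = stacks[t];
        thread_ctx[t].uc_stack.ss_size = sizeof stacks[t];
        thread_ctx[t].uc_link = &scheduler_ctx;
        makecontext(&thread_ctx[t], thread_body, 0);
        finished[t] = 0;
    }
    int live = NTHREADS;
    while (live > 0) {
        for (int t = 0; t < NTHREADS; t++) {
            if (finished[t])
                continue;
            current = t;
            swapcontext(&scheduler_ctx, &thread_ctx[t]);  /* run t */
            if (finished[t])
                live--;
        }
    }
    return trace_len;
}
```

The threads interleave strictly (0, 1, 0, 1, 0, 1) because each one yields after every step; a preemptive library would instead force the switch from a timer.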
Fork/Exit/Wait Example [Figure: the parent forks a child; the parent later waits while the child exits; the wait/exit pair forms a “join”.] • Child process starts as clone of parent: increment refcounts on shared OS resources. • Parent and child execute independently: memory states and resources may diverge. • On exit, release memory and decrement refcounts on shared resources. • Parent sleeps in wait until child stops or exits. • Child enters zombie state: process is dead and most resources are released, but process descriptor remains until parent reaps exit status via wait.
Sharing Open File Instances [Figure: parent and child process objects (user ID, process ID, process group ID, parent PID, signal state, siblings, children), each with its own process file descriptor table; corresponding descriptors point to the same entry in the system open file table, which holds the shared seek offset and refers to the shared file (inode or vnode).]
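File Sharing Between Parent/Child

    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        char c;
        int fdrd, fdwt;

        if (argc != 3)
            exit(1);
        if ((fdrd = open(argv[1], O_RDONLY)) == -1)
            exit(1);
        if ((fdwt = creat(argv[2], 0666)) == -1)
            exit(1);

        fork();     /* parent and child now share both open file table entries */

        for (;;) {
            if (read(fdrd, &c, 1) != 1)
                exit(0);
            write(fdwt, &c, 1);
        }
    }

[Bach]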
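The shared seek offset can be observed directly. A sketch assuming POSIX and a writable /tmp; `shared_offset_demo` is a hypothetical name. The child reads the first byte through the inherited descriptor, and the parent's next read returns the second byte, because both descriptors refer to one system open file table entry. In the [Bach] program the same sharing means that, in principle, each input byte is read by only one of the two processes.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the byte the parent reads after the child has read one byte
   ('B' if the offset is shared), or -1 on error. */
int shared_offset_demo(void)
{
    char path[] = "/tmp/shared_off_XXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);
    if (write(fd, "AB", 2) != 2 || lseek(fd, 0, SEEK_SET) != 0) {
        close(fd);
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {                     /* child: consume the first byte */
        char c;
        _exit(read(fd, &c, 1) == 1 && c == 'A' ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        close(fd);
        return -1;
    }
    char c = 0;
    ssize_t n = read(fd, &c, 1);        /* continues at the shared offset */
    close(fd);
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0 || n != 1)
        return -1;
    return c;
}
```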
Join Scenarios • Several cases must be considered for join (e.g., exit/wait). • What if the child exits before the parent joins? • “Zombie” process object holds child status and stats. • What if the parent can’t afford to get “stuck” on a join? • Unix provides for asynchronous notification via SIGCHLD. • What if the parent exits before the child? • Orphans become children of init (process 1). • What if the parent continues to run but never joins? • How not to fill up memory with zombie processes? • (Don’t create zombies if SIGCHLD ignored.)
Unix Process States [Figure: state diagram with states new, ready, run kernel, run user, blocked, zombie, and suspended (swapped-out) variants; transitions labeled fork, run, trap/fault, user interrupt, kernel interrupt, preempted, sleep, wakeup, exit, suspend/run, and swapout/swapin.]
Example: Unix Signals • Unix systems can notify a user program of a fault with a signal. • The system defines a fixed set of signal types (e.g., SIGSEGV, SIGBUS, etc.). • A user program may choose to catch some signal types, using a syscall to specify a (user mode) signal handler procedure. • system passes interrupted context to handler • handler may munge and/or return to interrupted context • Signals are also used for other forms of asynchronous event notifications. • E.g., a process may request a SIGALRM after some interval has passed, or signal another process using the kill syscall or command.
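Signaling another process with `kill`, as a sketch assuming POSIX; `kill_child_demo` is a hypothetical name. The child never installs a handler, so the signal's default action (termination, for SIGTERM and SIGKILL) applies, and the parent learns which signal killed it from `WTERMSIG`.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that just waits, send it sig with kill(), and report how
   it died. Returns the terminating signal number, or -1. */
int kill_child_demo(int sig)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();             /* sleep until a signal arrives */
    }
    if (kill(pid, sig) == -1)
        return -1;
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```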
Unix Signals 101 • Signals notify processes of internal or external events. • the Unix software equivalent of interrupts/exceptions • only way to do something to a process “from the outside” • Unix systems define a small set of signal types • Examples of signal generation: • keyboard ctrl-c and ctrl-z signal the foreground process • synchronous fault notifications, syscall errors • asynchronous notifications from other processes via kill • IPC events (SIGPIPE, SIGCHLD) • alarm notifications signal == “upcall”
Process Handling of Signals • 1. Each signal type has a system-defined default action. • abort and dump core (SIGSEGV, SIGBUS, etc.) • ignore, stop, exit, continue • 2. A process may choose to block (inhibit) or ignore some signal types. • 3. The process may choose to catch some signal types by specifying a (user mode) handler procedure. • specify alternate signal stack for handler to run on • system passes interrupted context to handler • handler may munge and/or return to interrupted context
Delivering Signals • 1. Signal delivery code always runs in the process context. • 2. All processes have a trampoline instruction sequence installed in user-accessible memory. • 3. Kernel delivers a signal by doctoring user context state to enter user mode in the trampoline sequence. • First copies the trampoline stack frame out to the signal stack. • 4. Trampoline sequence invokes the signal handler. • 5. If the handler returns, trampoline returns control to kernel via sigreturn system call. • Handler gets a sigcontext (machine state) as an arg; handler may modify the context before returning from the signal.
When to Deliver Signals? [Figure: the Unix process state diagram (new, ready, run kernel, run user, blocked, zombie, suspended; transitions fork, run, trap/fault, preempted, sleep, wakeup, exit, suspend/run, swapout/swapin), annotated with the points where signals are checked.] • Deliver signals when returning to user mode from trap/fault. • Deliver signals when resuming to user mode. • Interrupt low-priority sleep if signal is posted. • Check for posted signals after wakeup.
Questions About Signals • 1. What if handler corrupts the sigcontext before sigreturn? • 2. What if a process signal handler... • makes a system call? • never returns? • 3. What if a process is signalled again while it is executing in a signal handler? • 4. How to signal a process sleeping in a system call?
Process Blocking with Sleep/Wakeup • A Unix process executing in kernel mode may block by calling the internal sleep() routine. • wait for a specific event, represented by an address • kernel suspends execution, switches to another ready process • wait* is the first example we’ve seen • also: external input, I/O completion, elapsed time, etc. • Another process or interrupt handler may call wakeup (event address) to break the sleep. • search sleep hash queues for processes waiting on event • processes marked runnable, placed on internal run queue
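The sleep/wakeup mechanism can be modeled in a few lines of C. This is a hypothetical user-level simulation, not kernel code: a real `sleep` would also switch to another process, and real kernels protect these queues against interrupts. It shows the two key ideas: processes sleep on an event address hashed into one of several queues, and `wakeup` makes every process waiting on that address runnable.

```c
#include <stddef.h>
#include <stdint.h>

#define NHASH 4   /* number of sleep hash queues (hypothetical) */

enum pstate { RUNNABLE, SLEEPING };

struct proc {
    enum pstate state;
    const void *wchan;      /* event address the process sleeps on */
    struct proc *next;      /* link in a sleep queue or the run queue */
};

static struct proc *sleepq[NHASH];
static struct proc *runq;

static size_t hash_chan(const void *chan)
{
    return ((uintptr_t)chan >> 3) % NHASH;
}

/* Block p on chan (a kernel would then switch to another process). */
void sleep_on(struct proc *p, const void *chan)
{
    size_t h = hash_chan(chan);
    p->state = SLEEPING;
    p->wchan = chan;
    p->next = sleepq[h];
    sleepq[h] = p;
}

/* Wake every process sleeping on chan; others sharing the bucket stay.
   Returns the number of processes made runnable. */
int wakeup(const void *chan)
{
    int woken = 0;
    struct proc **pp = &sleepq[hash_chan(chan)];
    while (*pp != NULL) {
        struct proc *p = *pp;
        if (p->wchan == chan) {
            *pp = p->next;            /* unlink from the sleep queue */
            p->state = RUNNABLE;
            p->wchan = NULL;
            p->next = runq;           /* place on the run queue */
            runq = p;
            woken++;
        } else {
            pp = &p->next;
        }
    }
    return woken;
}
```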
Interruptible Sleeps • A Unix process entering a sleep specifies a scheduler priority. • Determines scheduling priority after wakeup. • Sleep priority is always higher than base priority. • Sleeps for internal kernel resources wait at a higher priority. • Low-priority sleeps are interruptible. • A process in an interruptible sleep may awaken for a signal. • Interrupted system calls must back out all side effects. • Return errno EINTR, but the system call may be restartable. • FreeBSD uses tsleep variant for interruptible sleeps. • A process entering tsleep may specify a timeout.
Process Groups • It is sometimes useful to signal all processes in a group. • children of a common parent • ctrl-c and ctrl-z to a group of children executing together • job control facilities in BSD derivatives • Kill all children of shell on terminal hangup. • Kill children of shell if controlling process (shell) dies. • sessions • System calls: setpgrp, killpg.
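Signaling a whole group, as a sketch assuming POSIX; `kill_group_demo` is hypothetical. It uses `setpgid` (the POSIX successor to the older `setpgrp`) and `killpg`. Parent and child both call `setpgid`, so group membership is in place whichever process runs first.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Put two children in a new process group and terminate both with one
   killpg(). Returns how many died from SIGTERM (2 expected), or -1. */
int kill_group_demo(void)
{
    pid_t kids[2];
    pid_t pgid = 0;                  /* 0: first child starts a new group */

    for (int i = 0; i < 2; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            if (i == 1) {            /* clean up the first child */
                kill(kids[0], SIGKILL);
                waitpid(kids[0], NULL, 0);
            }
            return -1;
        }
        if (pid == 0) {              /* child: join (or create) the group */
            setpgid(0, pgid);
            for (;;)
                pause();
        }
        if (pgid == 0)
            pgid = pid;              /* group ID = leader's PID */
        setpgid(pid, pgid);          /* parent does it too: no race */
        kids[i] = pid;
    }

    if (killpg(pgid, SIGTERM) == -1)
        return -1;
    int killed = 0;
    for (int i = 0; i < 2; i++) {
        int status;
        if (waitpid(kids[i], &status, 0) == kids[i] &&
            WIFSIGNALED(status) && WTERMSIG(status) == SIGTERM)
            killed++;
    }
    return killed;
}
```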
Controlling Children • 1. After a fork, the parent program has complete control over the behavior of its child. • 2. The child inherits its execution environment from the parent...but the parent program can change it. • user ID (if superuser), global variables, etc. • sets bindings of file descriptors with open, close, dup • pipe sets up data channels between processes • 3. Parent program may cause the child to execute a different program, by calling exec* in the child context.
Setting Up Pipes

    int pfd[2] = {0, 0};    /* pfd[0] is read, pfd[1] is write */
    int in, out;            /* pipeline entrance and exit */
    int i;

    pipe(pfd);              /* create pipeline entrance */
    out = pfd[0];
    in = pfd[1];

    /* loop to create a child and add it to the pipeline */
    for (i = 1; i < procCount; i++) {
        out = setup_child(out);
    }

    /* pipeline is a producer/consumer bounded buffer */
    write(in, ..., ...);
    read(out, ..., ...);

[Figure: the parent writes into the pipeline at in and reads from it at out; the children form the stages in between.]
Setting Up a Child in a Pipeline

    int setup_child(int rfd)
    {
        int pfd[2] = {0, 0};    /* pfd[0] is read, pfd[1] is write */
        int wfd;

        pipe(pfd);              /* create right-hand pipe */
        wfd = pfd[1];           /* this child's write side */

        if (fork()) {
            /* parent */
            close(wfd);
            close(rfd);
        } else {
            /* child */
            close(pfd[0]);      /* close far end of right pipe */
            close(0);
            close(1);
            dup(rfd);           /* lowest free descriptor: becomes stdin (0) */
            dup(wfd);           /* next lowest: becomes stdout (1) */
            close(rfd);
            close(wfd);
            ...
        }
        return(pfd[0]);
    }

[Figure: the new child reads from rfd and writes to wfd (pfd[1]); pfd[0] is the read end of the new right-hand pipeline segment.]
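A runnable two-stage pipeline in the same style, assuming POSIX and that `echo` and `tr` are on the PATH; the helper names are hypothetical, and `dup2` replaces the slide's close-then-dup idiom. The essential detail is closing every unused pipe end: a reader sees EOF only after all copies of the write end are closed.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child running argv with in_fd as stdin and out_fd as stdout,
   closing every pipe descriptor in fds[] so that readers see EOF. */
static pid_t spawn_stage(char *const argv[], int in_fd, int out_fd,
                         const int fds[], int nfds)
{
    pid_t pid = fork();
    if (pid == 0) {
        dup2(in_fd, STDIN_FILENO);
        dup2(out_fd, STDOUT_FILENO);
        for (int i = 0; i < nfds; i++)
            close(fds[i]);
        execvp(argv[0], argv);
        _exit(127);
    }
    return pid;                       /* -1 if fork failed */
}

/* Run "left | right" and capture up to cap bytes of right's output.
   Returns the number of bytes captured, or -1 on error. */
ssize_t run_pipeline2(char *const left[], char *const right[],
                      char *buf, size_t cap)
{
    int a[2], b[2];                   /* a: left -> right, b: right -> us */
    if (pipe(a) == -1)
        return -1;
    if (pipe(b) == -1) {
        close(a[0]);
        close(a[1]);
        return -1;
    }
    int fds[4] = {a[0], a[1], b[0], b[1]};

    pid_t p1 = spawn_stage(left, STDIN_FILENO, a[1], fds, 4);
    pid_t p2 = p1 == -1 ? -1 : spawn_stage(right, a[0], b[1], fds, 4);

    /* Drop the parent's copies, or right never sees EOF on a and we
       never see EOF on b. */
    close(a[0]);
    close(a[1]);
    close(b[1]);

    size_t total = 0;
    ssize_t n;
    if (p2 != -1)
        while (total < cap && (n = read(b[0], buf + total, cap - total)) > 0)
            total += (size_t)n;
    close(b[0]);

    if (p1 != -1)
        waitpid(p1, NULL, 0);
    if (p2 != -1)
        waitpid(p2, NULL, 0);
    return (p1 == -1 || p2 == -1) ? -1 : (ssize_t)total;
}
```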
Exec, Execve, etc. • Children should have lives of their own. • Exec* “boots” the child with a different executable image. • parent program makes exec* syscall (in forked child context) to run a program in a new child process • exec* overlays child process with a new executable image • restarts in user mode at predetermined entry point (e.g., crt0) • no return to parent program (it’s gone) • arguments and environment variables passed in memory • file descriptors etc. are unchanged
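Exec with an explicit argument vector and environment, as a sketch assuming POSIX `execve` and a shell at the path given; `exec_with_env` is hypothetical. The new image sees only the arguments and environment strings it was handed, and `execve` returns only when it fails.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Exec the shell at sh_path with argv {"sh", "-c", "exit $CODE"} and an
   environment holding only env_entry (e.g. "CODE=7"). Returns the
   child's exit status (127 if execve failed), or -1 on fork/wait error. */
int exec_with_env(const char *sh_path, const char *env_entry)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        char *const argv[] = {"sh", "-c", "exit $CODE", NULL};
        char *const envp[] = {(char *)env_entry, NULL};
        execve(sh_path, argv, envp);  /* on success, this never returns */
        _exit(127);                   /* reached only if execve failed */
    }
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```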
The Program and the Process VAS [Figure: program sections (header, text, idata, wdata, symbol table, relocation records) mapped onto process segments (text, data, BSS, user stack, args/env, kernel u-area); sbrk() grows the BSS segment.] • BSS: “Block Started by Symbol” (uninitialized static data). • Header “magic number” indicates type of image. • Section table: an array of (offset, len, startVA). • Symbol table and relocation records may be removed after final link step and strip. • Args/env copied in by kernel on exec.
Linking 101 [Figure: several object files, each with header, text, idata, wdata, symbol table, relocation records, and unresolved external references, are combined by the linker (link) into a single image with one header, text, data (idata, wdata), symbol table, and relocation records.]
Shared Libraries and DLLs [Figure: an executable image and a shared library or DLL (each with header, text, data, symbol table, relocation records) placed by the loader into one address space alongside BSS, user stack, args/env, and the kernel u-area.] • Multiple modules attached to address space with mmap. • Dynamic linker/loader dynamically imports DLLs. • How to trap references to nonresident symbols? • How to address external symbols from a DLL?
Questions for Exec • 1. How to copy argv, env? • what if the process passes endless strings? • use kernel stack as intermediate buffer? • 2. How to effect a return back to user mode after exec? • child stack and context • 3. What happens when the child returns from main? • 4. What about virtual caches? • 5. What about shell scripts etc.?