W4118 Operating Systems • Interrupts and System Calls in Linux • Instructor: Junfeng Yang
Logistics • Find two teammates before next Thursday • Post ads in CourseWorks
Last lecture • OS structure • Monolithic vs. microkernel • Modern OS: modules • Virtual machines • Intro to Linux • Interrupts in Linux
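Interrupts in Linux [Figure: devices raise IRQs to the PIC, which asserts INTR and supplies an interrupt number (intr #) to the CPU; the CPU uses idtr to locate the IDT (entries 0 to 255) and jumps to the ISR; labels: Memory Bus, Mask points] • How to handle interrupts?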
Today • Interrupts in Linux (cont.) • Interrupt handlers • System calls in Linux • Intro to Process
Nested Interrupts • What if a second interrupt occurs while an interrupt routine is executing? • Generally it is a good thing to permit that. Is it possible? • And why is it a good thing?
Maximizing Parallelism • You want to keep all I/O devices as busy as possible • In general, an I/O interrupt represents the end of an operation; another request should be issued as soon as possible • Most devices don’t interfere with each other’s data structures, so there’s no reason to block out other devices
Handling Nested Interrupts • Hardware invokes the handler with interrupts disabled • As soon as possible, unmask the global interrupt flag • Interrupts from the same IRQ line? • We want to process them serially • Thus, interrupts from the same IRQ line stay disabled during interrupt handling
Interrupt Handling Philosophy • To preserve IRQ order on the same line, incoming interrupts on that line must be disabled • New interrupts can get lost if the controller’s buffer overflows • An interrupt preempts what the CPU was doing, which may be important • Even if it is not important, blocking a user program for long is undesirable • So, the handler must run for a very short time! • Do as little as possible in the interrupt handler • Often just: queue a work item and set a flag • Defer non-critical actions until later
Intr handlers have no process context! • Interrupts (as opposed to exceptions) are associated neither with particular instructions nor with the current process. It’s like an unexpected jump. • Why? Context switches are expensive • Implication • Interrupt handlers cannot call functions that may sleep (i.e., yield the CPU to the scheduler)! • Why not? • The scheduler only schedules processes, so it wouldn’t know to reschedule the interrupt handler • The current process may be doing something dangerous and cannot sleep
Linux Interrupt Handler Structure • Top half (th) and bottom half (bh) • Top half: do minimum work and return (ISR) • Bottom half: deferred processing (softirqs, tasklets, workqueues) [Figure: the top half hands work to the bottom-half mechanisms: softirq, tasklet, workqueue]
Top half • Performs minimal, common functions: saving registers and unmasking other interrupts; eventually it undoes that, restoring registers and returning to the previous context • Most important: calls the proper interrupt handler provided by the device driver (a C function) • Typically queues the request and sets a flag for deferred processing [Figure: the top half sets the softirq flag from 0 to 1]
Deferrable Work • Three deferred-work mechanisms: softirqs, tasklets, and work queues (tasklets are built on top of softirqs) • All of these use request queues • Think of requests as (function, args) • All can be interrupted
Softirqs • Types are statically allocated at kernel compile time • Limited number:
Priority  Type
0         High-priority tasklets (generic)
1         Timer interrupts
2         Network transmission
3         Network reception
4         SCSI disks
5         Regular tasklets (generic)
• Timer, network, and SCSI disk softirqs are singled out because they matter most for server performance • Each type is mapped to a bit in a per-CPU bitmask; raising a softirq simply sets that bit • What does a softirq handler do?
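The pending-bitmask idea can be sketched as a user-space simulation of one CPU. This is not kernel code. The enum names, `pending`, `raise_softirq_sim` and `run_pending_softirqs` are made up for illustration, and the real kernel keeps one such mask per CPU.

```c
#include <assert.h>

/* Softirq types in priority order, mirroring the table above (0 runs first).
   These names are made up for this simulation. */
enum { SIRQ_HI, SIRQ_TIMER, SIRQ_NET_TX, SIRQ_NET_RX, SIRQ_SCSI, SIRQ_TASKLET,
       NR_SIRQ };

typedef void (*softirq_handler)(int type);

static unsigned int pending;        /* one bit per type (one simulated CPU) */
static int run_order[4 * NR_SIRQ];  /* records which handlers ran, in order */
static int run_len;

static void note_run(int type) { run_order[run_len++] = type; }

static softirq_handler handlers[NR_SIRQ] = {
    note_run, note_run, note_run, note_run, note_run, note_run,
};

/* Raising a softirq only sets its bit; nothing runs yet. */
static void raise_softirq_sim(int type) {
    assert(type >= 0 && type < NR_SIRQ);
    pending |= 1u << type;
}

/* Called at a safe point (e.g. after an IRQ): snapshot the mask, clear it,
   then run handlers from the lowest bit (highest priority) upward. */
static void run_pending_softirqs(void) {
    unsigned int mask = pending;
    pending = 0;
    for (int type = 0; type < NR_SIRQ; type++)
        if (mask & (1u << type))
            handlers[type](type);
}
```

Suppose network reception is raised twice and then the timer softirq is raised. Only two handler runs follow: the timer first, then network reception. Raising an already-pending type just sets the same bit again.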
Example: Network card SoftIRQ • Parse packet header • Verify checksum • Deliver packet up to the network stack • return Linux-2.6.11/net/core/dev.c, function net_rx_action
Running Softirqs • When are softirqs executed? • At various points, by the kernel, using the current process’s context • Most important: after handling IRQs and after timer interrupts • Essentially polling • Problem: while one softirq is being processed, another is raised. Process it too? • Pro: no long delay for the newly raised softirq • Con: user programs always starve during a long softirq burst • Livelock!
Livelock • 100% CPU utilization, but no progress • Why? • User programs must eventually process the requests • E.g. a web server • However, if there are too many interrupt requests, the user program starves • A big deal for networking in the 1990s • Solution: • “Eliminating Receive Livelock in an Interrupt-Driven Kernel,” Jeffrey C. Mogul and K. K. Ramakrishnan • Adopted into Linux
Avoid Livelock • Goal: give user programs a fair share of CPU time despite interrupt bursts • Quota + dedicated context ksoftirqd • Process up to N softirqs per softirq handler invocation • Bounds the time spent in the handler • Process the rest in ksoftirqd • ksoftirqd is subject to scheduling, like a user process • Provides fairness to user processes
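A rough sketch of the quota idea in user-space C. Each invocation makes at most a fixed number of passes over pending work and hands whatever is left to ksoftirqd. The backlog counter, the quota of 10 passes, and all function names are assumptions for illustration. The kernel's real loop lives in kernel/softirq.c.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RESTART 10   /* assumed quota: passes per handler invocation */

static int backlog;            /* simulated softirq work still pending */
static bool ksoftirqd_woken;   /* set when leftover work is handed off */

static bool softirq_pending(void) { return backlog > 0; }

/* One pass handles one unit of work. */
static void handle_one_pass(void) {
    assert(backlog > 0);
    backlog--;
}

/* Runs on the interrupt-return path: bounded work, then defer the rest to a
   thread that the scheduler treats like any other process. */
static void do_softirq_bounded(void) {
    int restarts = MAX_RESTART;
    while (softirq_pending() && restarts-- > 0)
        handle_one_pass();
    if (softirq_pending())
        ksoftirqd_woken = true;
}
```

With a backlog of 3, everything is handled inline. With a backlog of 25, the handler stops after 10 passes and wakes ksoftirqd for the remaining 15. That bound is what keeps a burst from starving user programs.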
Tasklets • Problem: softirqs are static • To add a new softirq type, you need to convince Linus! • Solution: tasklets • Built on top of softirqs • New types are created and destroyed dynamically • Simplified for multicore processing: at any time, at most one tasklet of a given type runs, across all CPUs • Problem with softirqs and tasklets: they have no process context either, and thus cannot sleep
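The “one tasklet of a type at a time” rule can be sketched with a per-tasklet atomic flag. In the kernel, a per-tasklet RUN state bit plays roughly this role. `struct tasklet_sim`, `tasklet_try_run` and `probe` are hypothetical names. As far as I know, the 2.6 kernel puts a busy tasklet back on its list rather than spinning, which the `false` return stands in for here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical tasklet: a function, its argument, and a "running" flag
   shared by all CPUs. */
struct tasklet_sim {
    atomic_flag running;
    void (*func)(unsigned long);
    unsigned long data;
};

/* Try to run the tasklet on this CPU. If another CPU already runs it,
   give up and report false (the caller would re-queue it for later). */
static bool tasklet_try_run(struct tasklet_sim *t) {
    if (atomic_flag_test_and_set(&t->running))
        return false;
    t->func(t->data);
    atomic_flag_clear(&t->running);
    return true;
}

/* Demo body standing in for a second CPU: while the tasklet is running,
   it tries to run the same tasklet again and records the result. */
static bool nested_ok = true;
static void probe(unsigned long arg) {
    nested_ok = tasklet_try_run((struct tasklet_sim *)arg);
}
```

In use, a tasklet whose body is `probe` (with `data` pointing at the tasklet itself) runs successfully. The nested attempt made from inside it is refused, and the flag is clear again once the outer run finishes.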
Work Queues • Softirqs and tasklets run in interrupt context; work queues have a process context • The idea: • You hand work (fn, args) to a workqueue • The workqueue adds it to an internal FIFO queue • A dedicated workqueue process loops forever, dequeuing (fn, args) and running fn(args) • Because work items run in a process context, they can sleep
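The workqueue loop can be mimicked in user space with POSIX threads: a FIFO of (fn, arg) pairs and one worker thread that dequeues and runs them. This is a sketch, not the kernel API. The kernel's worker is a kernel thread and its interface differs, and every name below (`struct workqueue`, `wq_init`, `wq_queue`, `wq_destroy`, `append_int`) is hypothetical. Compile with -pthread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define WQ_CAP 64

struct work { void (*fn)(void *); void *arg; };

struct workqueue {
    struct work items[WQ_CAP];        /* ring buffer used as a FIFO */
    size_t head, tail, count;
    bool stopping;
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    pthread_t worker;
};

/* The dedicated worker: loop forever, dequeue (fn, arg), run fn(arg). */
static void *worker_loop(void *p) {
    struct workqueue *wq = p;
    for (;;) {
        pthread_mutex_lock(&wq->lock);
        while (wq->count == 0 && !wq->stopping)
            pthread_cond_wait(&wq->nonempty, &wq->lock);
        if (wq->count == 0) {                 /* stopping and drained */
            pthread_mutex_unlock(&wq->lock);
            return NULL;
        }
        struct work w = wq->items[wq->head];
        wq->head = (wq->head + 1) % WQ_CAP;
        wq->count--;
        pthread_mutex_unlock(&wq->lock);
        w.fn(w.arg);   /* runs in a thread with its own context: may sleep */
    }
}

static void wq_init(struct workqueue *wq) {
    wq->head = wq->tail = wq->count = 0;
    wq->stopping = false;
    pthread_mutex_init(&wq->lock, NULL);
    pthread_cond_init(&wq->nonempty, NULL);
    int rc = pthread_create(&wq->worker, NULL, worker_loop, wq);
    assert(rc == 0);
    (void)rc;
}

/* Returns false if the queue is full. */
static bool wq_queue(struct workqueue *wq, void (*fn)(void *), void *arg) {
    pthread_mutex_lock(&wq->lock);
    bool ok = wq->count < WQ_CAP;
    if (ok) {
        wq->items[wq->tail] = (struct work){ fn, arg };
        wq->tail = (wq->tail + 1) % WQ_CAP;
        wq->count++;
        pthread_cond_signal(&wq->nonempty);
    }
    pthread_mutex_unlock(&wq->lock);
    return ok;
}

/* Lets the worker drain the queue, then joins it. */
static void wq_destroy(struct workqueue *wq) {
    pthread_mutex_lock(&wq->lock);
    wq->stopping = true;
    pthread_cond_signal(&wq->nonempty);
    pthread_mutex_unlock(&wq->lock);
    pthread_join(wq->worker, NULL);
    pthread_mutex_destroy(&wq->lock);
    pthread_cond_destroy(&wq->nonempty);
}

/* Demo work item: append an int to a log touched only by the worker. */
static int log_buf[WQ_CAP];
static int log_len;
static void append_int(void *arg) { log_buf[log_len++] = *(int *)arg; }
```

Queueing `append_int` with 1, 2 and 3 and then calling `wq_destroy()` runs the three items in FIFO order on the worker thread. After the join, the log reads 1, 2, 3.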
Monitoring Interrupt Activity • Linux has a pseudo-file system, /proc, for monitoring (and sometimes changing) kernel behavior • Run cat /proc/interrupts to see what’s going on
           CPU0
  0:        162   IO-APIC-edge      timer
  1:          0   IO-APIC-edge      i8042
  4:         10   IO-APIC-edge
  7:          0   IO-APIC-edge      parport0
  8:    1232299   IO-APIC-edge      rtc
  9:          0   IO-APIC-fasteoi   acpi
 12:          1   IO-APIC-edge      i8042
 16:   19256781   IO-APIC-fasteoi   uhci_hcd:usb1, …
 17:         79   IO-APIC-fasteoi   uhci_hcd:usb2, uhci_hcd:usb4 …
# Columns: IRQ, count, interrupt controller, devices
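Each row of the listing can be pulled apart with sscanf(). This sketch assumes the single-CPU layout shown above, with one count column. A machine with more CPUs prints one count per CPU, which this parser does not handle. `parse_irq_row` and `struct irq_row` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One parsed row of a single-CPU /proc/interrupts listing. */
struct irq_row {
    int irq;
    unsigned long count;
    char chip[32];       /* e.g. "IO-APIC-edge" */
    char devices[128];   /* may be empty, e.g. IRQ 4 above */
};

/* Parses "IRQ: count chip [devices...]"; returns 1 on success, 0 otherwise
   (for example on the "CPU0" header line). */
static int parse_irq_row(const char *line, struct irq_row *r) {
    int consumed = 0;
    r->devices[0] = '\0';
    if (sscanf(line, " %d: %lu %31s %n", &r->irq, &r->count, r->chip,
               &consumed) < 3)
        return 0;
    if (consumed > 0) {  /* the rest of the line is the device list */
        strncpy(r->devices, line + consumed, sizeof r->devices - 1);
        r->devices[sizeof r->devices - 1] = '\0';
    }
    return 1;
}
```

For the `rtc` row, this yields IRQ 8, count 1232299, chip `IO-APIC-edge` and devices `rtc`. The IRQ 4 row parses with an empty device list.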
Today Interrupts in Linux (cont.) System calls in Linux Intro to Process
API – System Call – OS Relationship
User mode: printf("hello world!\n") → libc: %eax = sys_write; int 0x80
Kernel mode: IDT entry 0x80 → system_call() { fn = syscalls[%eax] } (syscalls table) → sys_write(…) { // do real work }
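The same path can be entered from C at two levels. One is the libc write() wrapper. The other is glibc's generic syscall() with the number SYS_write, which bypasses the per-call wrapper but still traps into sys_write. In this sketch, `write_both_ways` is a hypothetical name. How the trap is actually issued (int 0x80, sysenter, or syscall) depends on the architecture and the libc.

```c
#define _GNU_SOURCE          /* glibc declares syscall() under this */
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>

/* Writes msg to stdout twice: once through the libc wrapper write(), once
   by system call number via syscall(). printf() would add stdio buffering
   on top but end up in the same kernel routine. Returns the byte count
   reported by the raw call. */
static long write_both_ways(const char *msg) {
    size_t len = strlen(msg);
    ssize_t a = write(STDOUT_FILENO, msg, len);                  /* wrapper */
    long b = syscall(SYS_write, (long)STDOUT_FILENO, msg, len);  /* raw */
    assert(a == (ssize_t)len);
    (void)a;
    return b;
}
```

Calling `write_both_ways("hello world!\n")` prints the line twice and returns 13, the number of bytes the kernel wrote.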
System Calls vs. Library Calls • Library calls that stay in user space are much faster than system calls • If you can do it in user space, you should • strlen? write? • Learn what a library call or system call does: the documentation pages are called “manpages” and are divided into sections • Library calls (section 3), e.g. man 3 strlen • System calls (section 2), e.g. man 2 write
Next: Syscall Wrapper Macros
Syscall Wrapper Macros • Generating the assembly code for trapping into the kernel is complex, so Linux provides a set of macros to do it for you! • Macros named _syscallN(), where N is the number of system call parameters • _syscallN(return_type, name, arg1type, arg1name, …) • In linux-2.6.11/include/asm-i386/unistd.h • The macro expands to a wrapper function • Example: • long open(const char *filename, int flags, int mode); • _syscall3(long, open, const char *, filename, int, flags, int, mode) • NOTE: _syscallN is obsolete after 2.6.18; it is replaced by syscall(…), which can take a varying number of arguments
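Since glibc exposes the generic syscall() function, a wrapper that _syscallN would once have generated can be written by hand. This is a sketch: `my_getpid` and `my_open` are hypothetical names. Some newer architectures (arm64, for example) provide only openat, so the open example passes AT_FDCWD to SYS_openat instead of using SYS_open.

```c
#define _GNU_SOURCE          /* glibc declares syscall() under this */
#include <assert.h>
#include <fcntl.h>           /* AT_FDCWD, O_RDONLY */
#include <sys/syscall.h>     /* SYS_getpid, SYS_openat */
#include <sys/types.h>
#include <unistd.h>

/* Zero-argument wrapper, in the spirit of _syscall0(pid_t, getpid). */
static pid_t my_getpid(void) {
    return (pid_t)syscall(SYS_getpid);
}

/* Three-argument wrapper matching the open() example above. Arguments are
   widened to long because syscall() passes them as machine words. */
static long my_open(const char *filename, int flags, int mode) {
    return syscall(SYS_openat, (long)AT_FDCWD, filename, (long)flags,
                   (long)mode);
}
```

`my_getpid()` returns the same value as `getpid()`. `my_open("/dev/null", O_RDONLY, 0)` returns a usable file descriptor.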
Lib call/Syscall Return Codes • Library calls return -1 on error and place a specific error code in the global variable errno • System calls return specific negative values to indicate an error • Most system calls return -errno • The library wrapper code is responsible for converting the return values to the errno convention
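The -errno to -1/errno translation that a library wrapper performs can be sketched as follows. Treat the -4095 cutoff (raw values in [-4095, -1] count as errors) and the name `to_libc_convention` as assumptions for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Converts a raw kernel return value to the library convention: on error
   the kernel returns -errno, so store the positive code in errno and
   return -1; any other value passes through unchanged. */
static long to_libc_convention(long raw) {
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;
        return -1;
    }
    return raw;
}
```

`to_libc_convention(-EBADF)` returns -1 and sets errno to EBADF, while `to_libc_convention(42)` returns 42. glibc's syscall() applies the same convention. For example, `syscall(SYS_close, -1L)` returns -1 with errno set to EBADF, because the kernel returned -EBADF.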
Next: Syscall Implementation
System call handler
.section .text
system_call:
    // copy parameters from registers onto stack…
    call *sys_call_table(,%eax,4)   // indirect call through the table
    jmp ret_from_sys_call
ret_from_sys_call:
    // perform rescheduling and signal-handling…
    iret                            // return to caller (in user mode)
// File arch/i386/kernel/entry.S
• Why a jump table? Can’t we use if-then-else?
The system-call jump-table • There are approximately 300 system-calls • Any specific system-call is selected by its ID-number (it’s placed into register %eax) • It would be inefficient to use if-else tests or even a switch-statement to transfer to the service-routine’s entry-point • Instead an array of function-pointers is directly accessed (using the ID-number) • This array is named ‘sys_call_table[]’ • Defined in file arch/i386/kernel/entry.S
System call table definition
.section .data
sys_call_table:
    .long sys_restart_syscall
    .long sys_exit
    .long sys_fork
    .long sys_read
    .long sys_write
    …
• NOTE: syscall numbers cannot be reused (why?); deprecated syscalls are implemented by a special “not implemented” syscall (sys_ni_syscall)
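The jump-table dispatch and the retired-slot rule can be sketched in plain C. There is one array of function pointers indexed by the call number, with a bounds check first. The toy routines, their numbers and `toy_dispatch` are hypothetical, and real handlers take their arguments from the saved registers instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*syscall_fn)(long a1, long a2, long a3);

/* Toy service routines; names and numbers are hypothetical. */
static long toy_exit(long a1, long a2, long a3) {
    (void)a2; (void)a3;
    return a1;
}
static long toy_getpid(long a1, long a2, long a3) {
    (void)a1; (void)a2; (void)a3;
    return 1234;
}
/* A retired number keeps its slot and points here, so it is never reused. */
static long toy_ni_syscall(long a1, long a2, long a3) {
    (void)a1; (void)a2; (void)a3;
    return -ENOSYS;
}

static const syscall_fn toy_call_table[] = {
    toy_exit,        /* 0 */
    toy_ni_syscall,  /* 1: retired call */
    toy_getpid,      /* 2 */
};
#define NR_TOY_CALLS (sizeof toy_call_table / sizeof toy_call_table[0])

/* Like system_call: reject out-of-range numbers, then one indexed indirect
   call. The cost does not grow with the number of calls, unlike an
   if/else chain. */
static long toy_dispatch(unsigned long nr, long a1, long a2, long a3) {
    if (nr >= NR_TOY_CALLS)
        return -ENOSYS;
    return toy_call_table[nr](a1, a2, a3);
}
```

`toy_dispatch(1, 0, 0, 0)` lands in the retired slot and gets -ENOSYS. If number 1 were reassigned, an old binary still using it would silently invoke the wrong service routine.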
Syscall Naming Convention • Usually a library function “foo()” will do some work and then call a system call (“sys_foo()”) • In Linux, all system calls begin with “sys_” • Often “sys_foo()” just does some simple error checking and then calls a worker function named “do_foo()”
Tracing System Calls • Linux has a powerful mechanism for tracing system call execution for a compiled application • Output is printed for each system call as it is executed, including parameters and return codes • It uses the ptrace() system call • ptrace() is also used by debuggers (breakpoints, single-stepping, etc.) • Use the “strace” command (man strace for info) • You can trace library calls using the “ltrace” command
Passing system call parameters • The first parameter is always the syscall number • %eax on Intel • Linux allows up to six additional parameters • %ebx, %ecx, %edx, %esi, %edi, %ebp on Intel • System calls that require more parameters package the remaining parameters in a struct and pass a pointer to that struct as the sixth parameter • Problem: pointers must be validated • A pointer could be invalid, e.g. NULL, which could crash the OS • Or worse, it could point into OS or device memory: a security hole
How to validate user pointers? • A thorough check is too expensive • It would need to verify that the pointer lies within one of the valid memory regions of the calling process • Solution: no comprehensive check • Linux does a simple check on pointer arguments: it only determines whether they fall within the largest possible range of user memory (more details when we talk about processes) • Even if a pointer value passes this check, the specific value may still be invalid • Dereferencing an invalid pointer in kernel code would normally be treated as a kernel bug: the kernel prints an Oops message on the console and kills the offending process • Linux does something very sophisticated to avoid this situation
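The cheap check amounts to a single range comparison. `USER_LIMIT` (3 GiB) is an assumption modeled on the default 32-bit x86 user/kernel split, and `range_ok` is a hypothetical name. Linux's own version of this check is the access_ok() macro, whose exact signature has changed across versions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed top of user memory: 3 GiB, as on a default 32-bit x86 split. */
#define USER_LIMIT 0xC0000000UL

/* Is [addr, addr + size) entirely below USER_LIMIT? Written without
   computing addr + size, so huge values cannot wrap around and sneak
   through. Says nothing about whether the pages are actually mapped. */
static bool range_ok(uintptr_t addr, uintptr_t size) {
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}
```

A page-sized buffer at 0x08048000 passes. A single byte at 0xC0000000 does not, and neither does a 2-byte buffer starting at 0xBFFFFFFF, since it would cross into kernel space. Passing this test only rules out kernel addresses; the page-fault fixup mechanism handles pointers that pass but are unmapped.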
Handling faults due to user pointers • Kernel code must access user pointers through a small set of “paranoid” routines (e.g. copy_from_user) • Thus, the kernel knows which addresses in its code can raise invalid memory access exceptions (page faults) • When a page fault occurs, the kernel’s page fault handler checks the faulting EIP (recall: saved by hardware) • If the EIP matches one of the paranoid routines, the kernel does not oops; instead, it calls “fixup” code • Linux has many violations of this rule; we once built a checker and found tons of security holes
How to find “fixup” code? • Exception table: faulting instruction address → fixup code • On a page fault, the kernel scans the exception table to find the fixup code • Typically the fixup code terminates the system call with an EFAULT error code (means: bad address) • Some ELF tricks help generate the exception table and implement the fixup code; see ULK Chapter 10 for the gruesome details
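The exception-table lookup can be sketched as a map from faulting instruction address to fixup address. This version keeps the table sorted and uses bsearch() instead of a linear scan. The struct layout, the addresses in the usage and `search_fixup` are illustrative assumptions, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry: an instruction allowed to fault on a user pointer, and the
   address to resume at if it does. */
struct extable_entry {
    uintptr_t insn;
    uintptr_t fixup;
};

static int cmp_insn(const void *key, const void *elem) {
    uintptr_t k = *(const uintptr_t *)key;
    uintptr_t e = ((const struct extable_entry *)elem)->insn;
    return (k > e) - (k < e);
}

/* Returns the fixup address for a faulting EIP, or 0 if the fault is not
   in an approved user-access routine (a genuine kernel bug: oops).
   The table must be sorted by insn. */
static uintptr_t search_fixup(const struct extable_entry *tbl, size_t n,
                              uintptr_t eip) {
    const struct extable_entry *e =
        bsearch(&eip, tbl, n, sizeof *tbl, cmp_insn);
    return e ? e->fixup : 0;
}
```

Take a table {0x1000 → 0x9000, 0x1040 → 0x9010, 0x2000 → 0x9020}. A fault at 0x1040 resumes at 0x9010. A fault at 0x1044 finds no entry, so the kernel would oops.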
Intel Fast System Calls • int 0x80 is no longer the preferred path (I lied …) • Intel has a hardware optimization (sysenter) that provides faster system call invocation • Read the gory details in ULK Chapter 10
Today Interrupts in Linux (cont.) System calls in Linux Intro to Process What are processes? Why need them?
What is a Process • “Program in execution”; a “virtual CPU” • A process is an execution stream in the context of a particular process state • Execution stream: a stream of instructions • A running piece of code • A sequential stream of instructions
What is a Process? (cont.) • Process state: determines the effects of the instructions • Stuff that the running code can affect or be affected by • Registers • General purpose, floating point, EIP … • Memory: everything a process can address • Code, data, stack, heap • I/O • File descriptor table • More … [Figure: process state as CPU registers (SP, IP) plus memory holding code, data, heap, and stack]
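To make the memory part of process state concrete, the sketch below prints one address from each region of the running process: code, data, heap and stack. The function name `region_addresses` is made up. The actual numbers differ between runs on systems with address-space layout randomization.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static int data_var = 42;   /* lives in the initialized data segment */

/* Fills out[] with one address per region: code, data, heap, stack. */
static void region_addresses(uintptr_t out[4]) {
    int stack_var = 0;                         /* lives on the stack */
    int *heap_var = malloc(sizeof *heap_var);  /* lives on the heap */
    assert(heap_var != NULL);
    out[0] = (uintptr_t)region_addresses;      /* code (text) */
    out[1] = (uintptr_t)&data_var;             /* data */
    out[2] = (uintptr_t)heap_var;              /* heap */
    out[3] = (uintptr_t)&stack_var;            /* stack */
    printf("code %#lx  data %#lx  heap %#lx  stack %#lx\n",
           (unsigned long)out[0], (unsigned long)out[1],
           (unsigned long)out[2], (unsigned long)out[3]);
    free(heap_var);
}
```

On a typical Linux system the stack address is far above the others. Code and data sit near each other, because they come from the program file.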
Program vs. Process • Process != program • Program: static code and static data • Process: dynamic instantiation of code and data • Process <-> program: no 1:1 mapping • Process > program: code + data + other things [Figure: the program text main() { f(x); } f(int x) { } on its own, next to a process running it, which adds a stack frame for f(), registers, IP, and a heap]
Program vs. Process (cont.) • Process <-> program: no 1:1 mapping • Program > process: one program can invoke multiple processes • E.g. a shell can run commands in different processes • Process > program: there can be multiple processes of the same program • E.g. multiple users running /usr/bin/tcsh
Address Space (AS) • More details when discussing memory management • AS = all the memory a process can address (the addresses and what they hold) • Virtual address space: • Really large memory to use • Linear array of bytes: [0, N), with N roughly 2^32 or 2^64 • Process and virtual address space: 1:1 mapping • Key: an AS is a protection domain • One process can’t address another process’s address space (without permission) • E.g. the value stored at 0x800abcd in p1 is different from the value at 0x800abcd in p2 • Thus p1 can’t read/write p2’s memory
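The protection-domain point can be checked with fork(). Parent and child see the same variable at the same virtual address, yet a write in the child does not change the parent's copy. This is a minimal sketch; `fork_and_modify` and `value` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int value = 1;

/* Forks a child that writes 2 to value. The child exits with status 0 if
   it saw its write at the parent's address. Returns the parent's value
   afterwards and stores the child's exit status in *child_status. */
static int fork_and_modify(int *child_status) {
    uintptr_t parent_addr = (uintptr_t)&value;
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {                        /* child: private copy of memory */
        value = 2;
        _exit((uintptr_t)&value == parent_addr && value == 2 ? 0 : 1);
    }
    int status = 0;
    pid_t w = waitpid(pid, &status, 0);
    assert(w == pid);
    (void)w;
    *child_status = WIFEXITED(status) ? WEXITSTATUS(status) : -1;
    return value;                          /* parent still sees 1 */
}
```

The child reports success (status 0): it wrote 2 at the same virtual address the parent uses. The parent still reads 1, because each process has its own address space behind that address.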
Process vs. Threads • More details when discussing threads • Process != threads • Threads: multiple streams of execution in one process • Threads share an address space [Figure: the program main() { f(x); } f(int x) { } running as one process with three threads, each with its own registers, IP, and stack frame for f(), all sharing the heap]
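Different processes see different values at the same address. Threads of one process, by contrast, share a single address space, so a write made by one thread is seen by the others. This is a minimal POSIX-threads sketch with hypothetical names (compile with -pthread).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static int counter = 1;   /* one copy, visible to every thread */

static void *set_to_two(void *arg) {
    (void)arg;
    counter = 2;          /* same memory the main thread will read */
    return NULL;
}

/* Starts one thread that writes counter, waits for it, and returns the
   value the creating thread now sees. */
static int thread_and_modify(void) {
    pthread_t t;
    int rc = pthread_create(&t, NULL, set_to_two, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, NULL);
    return counter;
}
```

`thread_and_modify()` returns 2. After pthread_join, the creating thread sees the write made by the other thread, because both use the same address space.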
Why do we need processes? • Divide and conquer • Decompose a large problem into smaller, well-contained ones that are easier to think about • Sequential • Each process is easier to think about • Increase performance • The system has many concurrent jobs going on
System categorization • Most OSes support processes • Uniprogramming: only one process at a time • Good: simple • Bad: low utilization, low interactivity • Multiprogramming: multiple processes at a time • When one process blocks (e.g. on I/O), switch to another • NOTE: different from multiprocessing (systems with multiple processors) • Good: increased utilization and interactivity • Bad: complex
Multiprogramming • OS support for multiprogramming • Policy: scheduling, what proc to run? (next week) • Mechanism: • dispatching, how to run/block process? • how to protect from one another? • Separation of policy and mechanism • Recurring theme in OS • Policy: decision making with some performance metric and workload • Scheduling (next week) • Mechanism: low-level code to implement decisions • Dispatching (today)
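The separation of policy and mechanism can be sketched with a function pointer. The dispatcher (mechanism) stays fixed, while the policy that picks the next process can be swapped. `struct proc`, `dispatch` and both policies are hypothetical. The dispatcher only returns a pid instead of performing a real context switch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical process-table entry. */
struct proc { int pid; int runnable; int priority; };

/* Policy: returns the index of the process to run next, or -1. */
typedef int (*pick_next_fn)(const struct proc *procs, size_t n);

/* Mechanism: carries out whatever the policy decided. A real dispatcher
   would save/restore registers and switch address spaces. */
static int dispatch(const struct proc *procs, size_t n, pick_next_fn policy) {
    int i = policy(procs, n);
    return i < 0 ? -1 : procs[i].pid;   /* -1 means: idle */
}

/* Two interchangeable policies; dispatch() does not change. */
static int first_runnable(const struct proc *procs, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (procs[i].runnable)
            return (int)i;
    return -1;
}

static int highest_priority(const struct proc *procs, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++)
        if (procs[i].runnable &&
            (best < 0 || procs[i].priority > procs[best].priority))
            best = (int)i;
    return best;
}
```

Take processes {pid 10 blocked, pid 11 runnable with priority 1, pid 12 runnable with priority 7}. `first_runnable` picks 11 and `highest_priority` picks 12, through the same `dispatch` code.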