Linux Overview Anand Sivasubramaniam Dept. of Computer Science & Eng. The Pennsylvania State University
Why Linux?
• It’s free!
• Open Source (modifiability, extensibility, …)
• Works on several platforms
• Robustness (after several revisions, and several people working on it)
• Widespread usage
• Compatibility with several other platforms
Coverage from …
• M. Beck, H. Bohme, M. Dziadzka, U. Kunitz, R. Magnus, D. Verworner. Linux Kernel Internals, 2nd edition. Addison-Wesley.
• R. Card, E. Dumas, F. Mevel. The Linux Kernel Book. John Wiley.
• Mainly Linux 2.0
Linux Features
• Monolithic kernel (but with well-defined interfaces)
• Multi-tasking
• Multi-user capability
• Multi-processing support (since 2.0)
• Architecture independence (PCs, Alpha, Sparc, …)
• Demand-loaded executables (on fork, the address space is shared and pages are copied on write)
• 4K pages, demand paging with memory protection
• Dynamically sized disk cache
• Shared libraries (DLL)
• Support for the POSIX standard
• Several executable formats
• Several file systems (e.g. Ext2)
• Several network protocols
/usr/src/linux hierarchy (directory tree figure)
• Top level: kernel, mm, fs, net, ipc, drivers, arch, init, include, lib
• fs/: proc, ext, ext2, fat, minix, msdos, nfs, vfat, …
• net/: core, unix, ipv4, ipv6, …
• drivers/: pci, char, block, net, sound, video, scsi, cdrom, …
• arch/: i386, alpha, sparc, sparc64, mips, …
• include/: asm-arch, linux, net, video, scsi
• arch/* contains architecture-specific code.
• arch/i386/boot contains the init code (in assembly) that initializes the hardware, loads the kernel, installs ISRs, switches to protected mode and then calls start_kernel(void) in init/.
• kernel/ and arch/i386/kernel/ contain the core kernel code (fork, scheduler, timers, DMA and interrupt management, signal handling, switching protection modes).
• The VM system, memory allocation and paging are in mm/ and arch/i386/mm/.
• The virtual file system interface is in fs/, and its subdirectories contain the individual file systems.
• drivers/ contains the device drivers.
• ipc/ has the sources for IPC (semaphores, shared memory, message queues).
• net/ has the code for the different network protocols.
• lib/ contains the standard C library routines used by the kernel.
• include/ and include/asm/ have the necessary include files, with /usr/include having links to these.
Building
• make config (reads arch/i386/config.in to find out which components to include, which in turn consults the config.in files in other directories)
• config.in contains directives on which packages to include.
• make depend
• make boot (gives a bootable kernel, arch/i386/boot/zImage)
• make bzlilo (copies the bootable kernel to /vmlinuz; the kernel can then be installed using LILO)
• make drivers (to compile just the drivers)
• make modules (file systems and drivers not linked into the kernel can be built as modules)
• make modules_install (installs the built modules in the /lib/modules/kernel_version directory)
Booting
• On power up, the CPU is reset and the program counter (PC) is set to a fixed address in the BIOS ROM.
• The BIOS performs hardware tests and initializations.
• The BIOS tries to read the first sector (boot sector) of a disk (first the floppy, then the hard disk).
• This sector is read into memory and has a pre-determined format:
    0x000  Jmp xxx
    0x003  Disk Params
    0x03E  Code
    0x1BE  Partition 1 entry
    0x1CE  Partition 2 entry
    0x1DE  Partition 3 entry
    0x1EE  Partition 4 entry
    0x1FE  Magic Number
• The PC is then set to the first location, which jumps to the code that checks which partition is the active one. That partition's entry holds the sector number of the partition's boot block, which has a similar layout and contains the code to boot that OS (part of LILO for Linux).
• After the kernel is loaded, control jumps to arch/i386/boot/setup.S (label start).
• This code initializes the hardware and then switches to protected mode by setting a bit in the Machine Status Word.
• It executes "jmp 0x1000, KERNEL_CS".
• This goes to arch/i386/kernel/head.S (label startup_32).
• That code does the MMU, coprocessor and interrupt descriptor initializations, and sets up the environment for the kernel's C functions. Subsequently start_kernel() in init/main.c is called.
    asmlinkage void start_kernel(void)
    {
        memory_start = paging_init(memory_start, memory_end);
        trap_init();
        init_IRQ();
        sched_init();
        time_init();
        parse_options(command_line);
        init_modules();
        memory_start = console_init(memory_start, memory_end);
        memory_start = pci_init(memory_start, memory_end);
        memory_start = kmalloc_init(memory_start, memory_end);
        sti();                          // enable interrupts
        memory_start = inode_init(memory_start, memory_end);
        memory_start = file_table_init(memory_start, memory_end);
        memory_start = name_cache_init(memory_start, memory_end);
        mem_init(memory_start, memory_end);
        buffer_init();
        sock_init();
        ipc_init();
        …
• Process 0 is now running. It creates a kernel thread which executes the init function: kernel_thread(init, NULL, 0);
• Process 0 then runs the idle loop: cpu_idle(NULL);

    init()
    {
        kernel_thread(bdflush, NULL, 0);    // buffer cache sync daemon
        kernel_thread(kswapd, NULL, 0);     // swap daemon
        setup();                            // init file systems and mount root
        …
        open console, file descriptors 0, 1 and 2
        …
        execve("…../init", argv_init, envp_init);   // init then runs getty on each tty
    }
Adding a System Call
• Example: show_mult(x, y, *z)
• Each system call has a name and a number.
• Go to /asm/unistd.h and add the new number:
    #define __NR_sched_rr_get_interval 161
    #define __NR_nanosleep 162
    #define __NR_mremap 163
    #define __NR_show_mult 164
• Go to arch/i386/kernel/entry.S and add the new table entry:
    .long SYMBOL_NAME(sys_nanosleep)
    .long SYMBOL_NAME(sys_mremap)
    .long SYMBOL_NAME(sys_show_mult)
    .space (NR_syscalls-164)*4    // padding
• Now add the code for the syscall, say in /sys.c (which has other syscalls):

    asmlinkage int sys_show_mult(int x, int y, int *res)
    {
        int error, compute;

        /* check that the user-space result pointer is writable */
        error = verify_area(VERIFY_WRITE, res, sizeof(*res));
        if (error)
            return error;
        compute = x * y;
        put_user(compute, res);         /* copy the result to user space */
        printk("Value computed\n");
        return 0;
    }

• Compile the kernel and reboot the machine.
• How do you use this? If the call is already defined in a library, then fine.
• Otherwise, you can use a macro definition:
    _syscall3(int, show_mult, int, x, int, y, int *, resul);
• This expands to:

    int show_mult(int x, int y, int *resul)
    {
        long __res;
        __asm__ __volatile__ ("int $0x80"
            : "=a" (__res)
            : "0" (164), "b" ((long) (x)), "c" ((long) (y)), "d" ((long) (resul)));
        if (__res >= 0)
            return (int) __res;
        errno = -__res;
        return -1;
    }

• The expansion places the syscall number in the eax register and the parameters in ebx, ecx and edx, and then invokes software interrupt 0x80.
• Upon this trap, the function system_call in arch/i386/kernel/entry.S is invoked.
• It uses the syscall number (in eax) to index the table sys_call_table and calls the corresponding function.
• The user program is as follows:

    #include <stdio.h>
    #include <stdlib.h>
    #include <errno.h>              /* the macro expansion sets errno */
    #include <linux/unistd.h>

    _syscall3(int, show_mult, int, x, int, y, int *, resul);

    int main(void)
    {
        int ret = 0;

        show_mult(2, 5, &ret);
        printf("%d\n", ret);        /* show the value written by the kernel */
        return 0;
    }
What happens on a syscall?
• On the software interrupt (0x80), control is transferred to system_call() in arch/i386/kernel/entry.S.
• Here is what goes on inside this routine:

    …
    SAVE_ALL;                               // saves registers
    …
    (*sys_call_table[sys_call_num])(sys_call_args);
    …
    if (intr_count)
        goto exit_now;                      // nested interrupts
    if (bh_mask & bh_active) {
        ++intr_count;
        sti();
        do_bottom_half();
        --intr_count;
    }
    sti();
    if (need_resched) {
        schedule();                         // return much later!
        goto ret_from_sys_call;
    }
    …
    if (current->signal & ~current->blocked)
        do_signal();
    …
    exit_now:
        RESTORE_ALL;                        // return using iret
Basics of Interrupt Handling
• The code is in arch/i386/kernel/irq.c and include/asm/irq.h.
• Three types of interrupts: fast, slow, and system calls (software).
• Slow interrupts (the typical case) turn off interrupts only for a little while, e.g. the timer interrupt.

    SLOW_IRQ(intr_num, intr_controller, intr_mask)
    {
        SAVE_ALL;                           // macro in include/asm/irq.h
        ENTER_KERNEL;                       // exclusive execution in kernel, for SMP
        ACK(intr_controller, intr_mask);
        ++intr_count;                       // nesting depth
        sti();                              // enable interrupts
        do_IRQ(intr_num, Register);         // do actual ISR
        cli();                              // disable interrupts again
        UNBLK(intr_controller, intr_mask);
        --intr_count;
        ret_from_sys_call();
    }
    FAST_IRQ(intr_num, intr_controller, intr_mask)
    {
        SAVE_MOST;                          // macro in include/asm/irq.h
        ENTER_KERNEL;
        ACK(intr_controller, intr_mask);
        ++intr_count;
        do_fast_IRQ(intr_num);
        UNBLK(intr_controller, intr_mask);
        --intr_count;
        LEAVE_KERNEL;
        RESTORE_MOST;
    }
Timer Interrupt
• 1 tick = 10 ms, i.e. 100 interrupts every second