UNIX Internals (with a Focus on the LINUX Kernel)
Contents
Part I. UNIX Operating System
  1. Introduction
  2. Process Management
  3. Memory Management
  4. File System
  5. Synchronization & IPC
  6. I/O System (Device Driver)
Part II. Detailed Study: LINUX Kernel Internals
  1. Where is everything?
     • System call implementation
     • Device driver using module programming
  2. Linux internals
References
• U. Vahalia, "Unix Internals: The New Frontiers", Prentice Hall, 1996.
• H. M. Deitel, "Operating Systems", 2nd edition, Addison-Wesley, 1990.
• Silberschatz and Galvin, "Operating System Concepts", 5th edition, Addison-Wesley, 1998.
• Mukesh Singhal and Niranjan G. Shivaratri, "Advanced Concepts in Operating Systems", McGraw-Hill, 1994.
• Maurice J. Bach, "The Design of the UNIX Operating System", Prentice Hall, 1986.
• M. Beck et al., "Linux Kernel Internals", 2nd edition, Addison-Wesley, 1997.
• Marshall K. McKusick, K. Bostic, M. Karels, and J. Quarterman, "The Design and Implementation of the 4.4BSD Operating System", Addison-Wesley, 1996.
• Berny Goodheart and James Cox, "The Magic Garden Explained", Prentice Hall, 1994.
I. Introduction • What is UNIX Operating System? • Brief History • Kernel Architecture • Features of UNIX Operating System
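What is UNIX Operating System?
• What is the similarity between an onion and UNIX?
[Figure: layered "onion" diagram. Hardware is at the center, wrapped by the kernel, which is in turn wrapped by utilities and applications: X window, csh, vi, du, who, wc, telnet, ps, grep, sort, ls, gcc, a.out, RDBMS, network administration packages.]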
What is UNIX Operating System? (Cont'd)
[Figure: block diagram of the system (Source: The Design of the UNIX OS)]
• User level: user programs and libraries; a trap crosses into the kernel
• Kernel level: system call interface; file system management (buffer cache, device drivers); process management (IPC, context, memory management); hardware control (interrupt handling, etc.)
• HW level: hardware
What is UNIX Operating System? (Cont'd)
• The UNIX operating system is a resource manager
  • Physical resources: CPU, memory, disk, network, ...
  • Abstract resources: process, thread, page, file, inode, message, security, ...
• The UNIX operating system is a computing environment
  • It provides resource services to users through system calls (the API)
  • An abstraction is just a set of data structures at the kernel level
Brief History
• Before UNIX
  • Multics: 1965, AT&T (Bell Labs), General Electric, MIT
• Epoch
  • 1969: Ken Thompson, "Space Travel" on the PDP-7
  • Dennis Ritchie
  • s5fs, ed, shell (ancestor of the Bourne shell)
  • "The UNIX Time-Sharing System" (presented in 1973, published in CACM in 1974)
• BSD
  • Bill Joy, Chuck Haley (graduate students)
  • ex, csh, paging-based virtual memory system, TCP/IP, ffs, socket
  • 1993: 4.4BSD (final version; afterwards the company BSDI)
• AT&T System V
  • Version 1, 2, ..., 7, System III, System V, ..., SVR4.2/ESMP
  • region-based virtual memory, IPC, remote file sharing, STREAMS
Brief History (Cont'd)
• Commercial UNIX
  • XENIX (MS, SCO), SCO UNIX (SCO), AIX (IBM, journaling FS), HP-UX (HP), ULTRIX (DEC, the first MP), OSF/1 (Digital), ...
  • SunOS (Sun Microsystems, VFS, NFS), Solaris, UnixWare (Novell)
• Mach: the first micro-kernel
  • Chorus, Exokernel, SPIN, L4, ...
  • http://ssrnet.snu.ac.kr/~choijm/current_os.html
• Standards
  • SVID (System V Interface Definition), POSIX (IEEE), X/Open (Inc.)
  • UI (Sun, AT&T: Solaris), OSF (OSF/1)
• Linux
  • Performance oriented
  • Philosophy of COPYLEFT
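Kernel Architecture
• Monolithic kernel: traditional UNIX, SVR4, Solaris, Linux, ...
[Figure: several processes enter one integrated kernel through system calls. The integrated kernel holds both the OS functionality and the OS personality, and sits directly on the hardware.]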
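Kernel Architecture (Cont'd)
• Monolithic kernel: call paths inside the kernel
[Figure: a process calling read() enters through the system call layer at sys_read() (file system), which calls bread() (buffer cache), then hd_request() and do_hd_io() (disk device driver), down to the hardware. A process calling fork() enters at sys_fork() (process management), which calls copy_mm() (memory manager) and copy_thread() (CPU). The system call layer is labeled "OS Personality".]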
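Kernel Architecture (Cont'd)
• Micro-kernel: Mach, Chorus, L3/L4, SPIN, QNX, Windows NT, ...
[Figure: processes and several servers run above the microkernel. The OS functionality lives in the servers; processes reach them through system calls handled by the microkernel, which sits on the hardware.]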
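Kernel Architecture (Cont'd)
• Micro-kernel: what is the advantage of a micro-kernel?
[Figure: a process calls read(); the system call enters the microkernel, which forwards it to the file system server (sys_read(), hd_request()). A process server and other servers run alongside; the microkernel sits on the hardware.]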
Windows NT Architecture
[Figure: Windows NT architecture (Source: Inside Windows NT)]
• User mode: applications (Win32 client, OS/2 client, POSIX client, logon process) exchange messages with the protected subsystems (servers): Win32 server, OS/2 server, POSIX server, security server
• A trap crosses from user mode into kernel mode
• Kernel mode, NT Executive: system services, I/O manager (file system, cache manager, device drivers, network drivers), object manager, security reference monitor, process manager, LPC facility, VM management; below them the kernel and the Hardware Abstraction Layer (HAL), which performs HW control
• Hardware
Features
• What is good about UNIX
  • Open system: free
  • "Small is beautiful" philosophy: a file is just a stream of bytes
  • Simple and coherent: data, device, pipe, socket, memory, process, ... can be treated as a single abstraction (file)
  • Portability: high-level language; new paradigms (OO, client-server model, clustering, PDA, MM server)
  • True parallelism: multitasking (time sharing), multiprogramming, multiprocessor, MPP
Features (Cont'd)
• What is wrong with UNIX
  • Too many variants: a dumping ground
  • Not small and simple any more: uncontrolled growth
  • Building-block approach: inappropriate for beginners
  • Lack of GUI: no longer true
  • Ritchie's words: "It takes a genius to understand and appreciate UNIX's simplicity"
Overview • What is a process? • Process state transition • Context • Scheduling • Kernel entry points: interrupt, trap, system call • Signal
What is a Process?
• Definition
  • an instance of a running program (runnable program)
  • an execution environment of a program
  • a scheduling entity
  • a control flow and an address space
  • PCB (Process Control Block): proc table and U area
• Manipulation of a process
  • create, destroy
  • context
  • state transition: dispatch (context switch), sleep, wakeup
  • swap
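Process State Transition
[Figure: process state diagram (Source: UNIX Internals). Transitions recoverable from the figure:]
• fork: the new process starts in the initial (idle) state; when fork completes it becomes ready to run
• ready to run -> kernel running: swtch
• kernel running -> user running: return from syscall or interrupt
• user running -> kernel running: syscall, interrupt
• kernel running -> asleep: sleep, lock
• asleep -> ready to run: wakeup, unlock
• kernel running -> ready to run: swtch
• kernel running -> zombie: exit; the zombie is removed by wait
• ready to run <-> suspended ready, asleep <-> suspended asleep: swap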
Process State Transition (Cont'd)
• Flow of execution: execution mode (cf. address space)
• An interrupt or trap causes a change of execution mode
[Figure: timeline that alternates between kernel execution and the execution of processes A, B, and C, including the creation of process B (Source: Magic Garden)]
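Context
• context: system context, address (memory) context, H/W context
[Figure: in memory, the proc table entry leads to the segment table, page table, file table (fd), and U area. The hardware context is the register set (TSS): eip, sp, eflags, eax, cs, ... Parts of the address space may reside in swap on disk.]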
Context : system context
• proc table
  • identification: pid, process group id, ...
  • family relation
  • state
  • sleep channel: sleep queue
  • scheduling information: p_cpu, p_pri, p_nice, ...
  • signal handling information
  • address (memory) information
• U area
  • stores the hardware context while the process is not running
  • UID, GID
  • arguments, return values, and error status for system calls
  • signal catch functions
  • file descriptors
  • usage statistics
• The details may differ across versions and variants of UNIX.
Context : address context
• fork example: guess what output this program produces.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <unistd.h>

    int glob = 6;                       /* external variable in initialized data */
    char buf[] = "a write to stdout\n";

    int main(void)
    {
        int var;                        /* automatic variable on the stack */
        pid_t pid;

        var = 88;
        write(STDOUT_FILENO, buf, sizeof(buf) - 1);
        printf("before fork\n");
        if ((pid = fork()) == 0) {      /* child */
            glob++;
            var++;
        } else
            sleep(2);                   /* parent */
        printf("pid = %ld, glob = %d, var = %d\n", (long)getpid(), glob, var);
        exit(0);
    }

(Source: Adv. Programming in the UNIX Env., pgm 8.1)
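Context : address context (Cont'd)
• fork internal: compile results
• gcc compiles test.c into a.out (ELF: Executable and Linking Format), which consists of a header, text, and data
• glob++ becomes a load, add, store sequence: movl %eax, [glob]; addl %eax, 1; movl [glob], %eax
[Figure: the user's perspective (virtual address space). From 0x0 upward: text, data (glob, buf), bss, then the stack (var, pid) ending at 0xbfffffff; the kernel occupies the range up to 0xffffffff.]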
Context : address context (Cont'd)
• fork internal: before fork (after running a.out)
• cf. The figure assumes that there is no paging mechanism.
[Figure: the proc table entry for pid = 11 points to a segment table, whose entries map the text, data (glob, buf), and stack (var, pid) in memory.]
Context : address context (Cont'd)
• fork internal: after fork
• address space: the basic protection barrier
[Figure: pid = 11 and pid = 12 each have their own proc table entry and segment table. The text is mapped once in memory; each process has its own data (glob, buf) and its own stack (var, pid).]
Context : address context (Cont'd)
• fork internal: with the COW (Copy on Write) mechanism
[Figure, panel "after fork with COW": pid = 11 and pid = 12 have separate proc table entries and segment tables, but both map the same text, data, and stack in memory.]
[Figure, panel "after the glob++ operation": the written data has been copied, so each process now maps its own data.]
Context : address context (Cont'd)
• execve internal
[Figure: the a.out file (header, text, data, bss, stack) is loaded for pid = 11. Its segment table, which mapped the old text, data, and stack, now maps the new text, data, and stack in memory.]
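Context : hardware context
• time sharing (multitasking)
[Figure: CPU time is divided into time quanta that are handed in turn to process 1, process 2, process 3, ... When a process gets the CPU back it must answer the question "Where am I?"]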
Context : hardware context (Cont'd)
• A brief reminder of the 80x86 architecture
[Figure: CPU with ALU, control unit, IN/OUT, and registers]
• Registers: eip, eflags; eax, ebx, ecx, edx, esi, edi, ...; cs, ds, ss, es, ...; cr0, cr1, cr2, cr3, GDTR, TR, ...
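Context : hardware context (Cont'd)
• context swtch
[Figure: the CPU saves its context (eip, sp, eflags, eax, cs, ...) into the TSS of the outgoing process and restores the context from the TSS of the incoming process. Each process has its own proc table entry and U area.]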
Context : hardware context (Cont'd)
• context swtch: pseudo-code in UNIX
• trick: a register (e.g., eax on an 80x86 CPU) carries the return value of save_context()
• Think about the difference between a context switch and a system call.

    ...                                 /* need context swtch */
    if (save_context()) {
        /* pick another process to run from the ready queue */
        ...
        restore_context(new process);
        /* control NEVER arrives here */
    }
    /* the resuming process executes from here */
    ...

(Source: The Design of the UNIX OS)
Process Scheduling
• Process scheduling: allocate the CPU resource among competing processes
• Criteria: fairness, efficiency (response time vs. throughput)
• Types of processes
  • Interactive
  • Batch (computation-intensive)
  • Real-time: video, hospital
• Types of scheduling
  • Preemptive scheduling: other processes can take the CPU away from the currently running process
  • Non-preemptive scheduling (e.g., 16-bit applications on Windows 98): other processes cannot take the CPU away from the currently running process
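Scheduling Criteria
• CPU utilization
• Throughput: completed processes per unit time
• Turnaround time: from process start to finish
• Waiting time: total time spent in the ready queue
• Response time: time from job submission until the first response begins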
Process Scheduling (Cont'd)
• Existing policies
  • FCFS (First Come First Served): e.g., a queue at a bank
  • RR (Round-Robin): time quantum (10-100 ms)
  • SJF (Shortest Job First)
  • Multilevel Feedback Queue: multiple queues
  • EDF (Earliest Deadline First)
  • RM (Rate Monotonic)
  • Fair Queuing
  • Gang Scheduling
  • Causality Scheduling
  • Process migration
Process Scheduling (Cont'd)
• UNIX: round robin with a multilevel feedback queue
[Figure: round-robin. Processes P1, P2, P3 wait in the ready queue and take turns on the CPU.]
Process Scheduling (Cont'd)
• Multilevel feedback queue
[Figure: ready queues 1, 2, ..., n each feed the CPU, with processes P1 to P8 spread across the queues. Queues nearer the top have higher priority and a shorter time quantum.]
Process Scheduling (Cont'd)
• Round-robin: real implementation
  • scheduling information in the proc table: p_pri, p_cpu, p_nice
  • every clock tick: increment p_cpu for the currently running process
  • every second: p_cpu = p_cpu * decay factor (generally 1/2)
  • p_pri = PUSER + p_cpu/2 + p_nice
• Example of System III: 3 processes, PUSER = 50, p_nice = 0, the clock ticks 60 times per second

    second   P1             P2             P3
             p_pri  p_cpu   p_pri  p_cpu   p_pri  p_cpu
    0        50     0       50     0       50     0
    1        65     30      50     0       50     0
    2        57     15      65     30      50     0
    3        53     7       57     15      65     30
    4        66     33      53     7       57     15
Process Scheduling (Cont'd)
• Example of BSD
  • decay factor: (2*load_average) / (2*load_average + 1)
  • p_pri = PUSER + (p_cpu/4) + (2*p_nice)
  • a clock tick is 10 msec; the time quantum is 10 clock ticks
• Example of Mach
  • decay factor: 5/8
  • p_usrpri = PUSER + (3.8*(max(1,M/P)) * p_cpu)/T + 0.5 * p_nice
• Example of SVR4
  • supports a REAL-TIME class of processes
  • class-independent scheduler / class-dependent scheduler
• Example of LINUX
  • supports REAL-TIME processes
  • selects the process that has the highest value of "priority + counter"
  • the "counter" of the current process decreases at each clock tick
Process Scheduling (Cont'd)
• Range of process priorities (Source: The Design of the UNIX OS)
[Figure: priority levels listed from top to bottom, each with a queue of processes (P).]
• Kernel mode priority: swapper, waiting for disk I/O, waiting for buffer, waiting for inode, waiting for TTY I/O, waiting for child exit
• User mode priority: user level 0 (50), user level 1, ..., user level n
Kernel Entry Point
• Interrupt
• Trap
• System call
[Figure: devices and processes enter the kernel, which contains MM, HWM, PM, DD, and FS.]
Interrupt Handling
• Interrupt: a mechanism by which peripheral devices inform the UNIX operating system of an asynchronous event
• What is the difference between polling and interrupt? (analogy: the announcement of exam results)
[Figure: the real-time clock (0), disk (1), tty (2), network (3), and cdrom (4) are wired to the PIC, which signals the CPU. The kernel looks up the IVT, whose entries point to the interrupt handlers clock(), nmi(), tty_intr(), disk_intr(), net_intr(), ...]
Interrupt Handling (Cont'd)
• Interrupt handling mechanism: similar to receiving a letter while talking on the telephone
• Steps
  1. if in user mode, change to kernel mode
  2. save the context of the current process (make a new context layer)
  3. determine the interrupt source
  4. find the interrupt vector and call the interrupt handler
  5. ... interrupt handling ...
  6. restore the saved context
• What if another interrupt is triggered while an interrupt is being handled?
Interrupt Handling (Cont'd)
• clock interrupt handler (timer_interrupt() in Linux)

    clock()
    {
        restart clock                   /* will interrupt again */
        if (callout table not empty)    /* e.g., timer_list in LINUX */
            adjust time and schedule callout function if necessary
        if (profiling on)
            count program counter at time of interrupt
        gather statistics per process and system
        update CPU usage for the current running process
        if (one second elapsed) {
            alarm handling
            calculate the p_pri for all processes
            reschedule if necessary
            wake up swapper or page daemon if necessary
        }
    }

(Source: The Design of the UNIX OS)
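Trap Handling
• trap: a synchronous software event, raised by the instruction being executed (unlike an interrupt, which is asynchronous)
[Figure: IVT layout. Entries 0-4: div_by_zero(), invalid_opcode(), overflow(), segment_fault(), page_fault(), ...; entries 20-24: clock(), nmi(), tty_intr(), disk_intr(), net_intr(), ...; entry 0x80: system_call().]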
System Call Handling
• system call: an example of a trap
[Figure: inside the kernel, the IVT holds the trap handlers in entries 0-4 (div_by_zero(), invalid_opcode(), overflow(), segment_fault(), page_fault(), ...) and system_call() in entry 0x80. system_call() indexes sys_call_table (sysent[]): 0 sys_no_syscall(), 1 sys_exit(), 2 sys_fork(), 3 sys_read(), 4 sys_write(), ..., 47 sys_getpid(), ..., 255 sys_no_syscall().]
System Call Handling (Cont'd)
• invoking a system call
[Figure: the process's main() calls fork(). The fork() stub in libc.a loads the system call number and traps into the kernel (movl $2, %eax; int $0x80); read() and the other stubs work the same way. In the kernel, IVT entry 0x80 leads to system_call(), which indexes sys_call_table (sysent[]): 0 sys_no_sys(), 1 sys_exit(), 2 sys_fork(), 3 sys_read(), 4 sys_write(), ..., 47 sys_getpid(), ..., 255 sys_no_sys(). IVT entries 0-4 hold div_by_zero(), in_opcode(), overflow(), seg_fault(), page_fault(), ...]
System Call Handling (Cont'd)
• How to make a new system call
  1. code the new system call function in kernel space
  2. allocate a syscall number (an empty slot in sys_call_table[]) and register the function
  3. rebuild the kernel
  4. reconfigure the library (ar, ranlib)
  5. write your program using the new system call
Signal
• a mechanism to inform a process of an asynchronous event
• types of signal: SIGKILL, SIGINT, SIGBUS, SIGUSR1, ...
• actions: abort, exit, ignore, stop, user-level catch function
• What is the difference among interrupt, trap, and signal?

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    void sig_handler(int signo)
    {
        signal(SIGUSR1, sig_handler);           /* reinstall */
        printf("received signal %d\n", signo);
        /* handle the signal ... */
    }

    int main(void)
    {
        signal(SIGUSR1, sig_handler);           /* install the handler */
        /* ... */
        for ( ; ; )
            pause();
    }