The Performance of Micro-Kernel-Based Systems H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter Presentation by: Seungweon Park
Introduction • μ-kernels have a reputation for being too slow and inflexible • Can a 2nd-generation μ-kernel (L4) overcome these limitations? • Experiment: • Port Linux to run on L4 • Compare it to native Linux and to MkLinux (Linux on a 1st-generation μ-kernel derived from Mach 3.0)
Introduction (cont.) • Test the speed of a standard OS personality on top of a fast μ-kernel: Linux implemented on L4 • Test the extensibility of the system: • pipe-based communication implemented directly on the μ-kernel • mapping-related OS extensions implemented as user tasks • user-level real-time memory management implemented • Test whether the L4 abstractions are independent of the platform
L4 Essentials • Based on threads and address spaces • Recursive construction of address spaces by user-level servers • Initial address space σ0 represents physical memory • Basic operations: granting, mapping, and unmapping. • Owner of address space can grant or map page to another address space • All address spaces maintained by user-level servers (pagers)
L4Linux – Design & Implementation • Fully binary compatible with Linux/x86 • Modifications restricted to the architecture-dependent part of Linux • No Linux-specific modifications to the L4 kernel
L4Linux – Design & Implementation • Address Spaces • Initial address space σ0 represents physical memory • Basic operations: granting, mapping, and unmapping. • L4 uses “flexpages”: logical memory ranging from one physical page up to a complete address space. • An invoker can only map and unmap pages that have been mapped into its own address space
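The three address-space operations above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class `AddressSpace`, its fields, and the page numbers are hypothetical names chosen for illustration, not the L4 API, and it tracks single pages rather than flexpages.

```python
class AddressSpace:
    """Toy address space: virtual page number -> physical frame number."""

    def __init__(self, name):
        self.name = name
        self.pages = {}    # vpage -> frame
        self.derived = {}  # vpage -> [(space, vpage)] mappings handed out from here
        self.origin = {}   # vpage -> (space, vpage) this page was received from

    def _owned(self, vpage):
        # An invoker may only pass on pages present in its own address space.
        if vpage not in self.pages:
            raise PermissionError(f"{self.name} does not hold page {vpage}")
        return self.pages[vpage]

    def map(self, vpage, target, target_vpage):
        """Share a page: it stays here and also becomes visible in target."""
        frame = self._owned(vpage)
        if target_vpage in target.pages:
            raise ValueError("target page already in use")
        target.pages[target_vpage] = frame
        target.origin[target_vpage] = (self, vpage)
        self.derived.setdefault(vpage, []).append((target, target_vpage))

    def grant(self, vpage, target, target_vpage):
        """Hand a page over: it disappears here and appears in target."""
        frame = self._owned(vpage)
        if target_vpage in target.pages:
            raise ValueError("target page already in use")
        target.pages[target_vpage] = frame
        # Mappings derived from the granted page now hang off the new holder.
        target.derived[target_vpage] = self.derived.pop(vpage, [])
        for child, child_vpage in target.derived[target_vpage]:
            child.origin[child_vpage] = (target, target_vpage)
        # The space we received the page from must now point at the new holder.
        parent = self.origin.pop(vpage, None)
        if parent is not None:
            parent_space, parent_vpage = parent
            siblings = parent_space.derived[parent_vpage]
            siblings[siblings.index((self, vpage))] = (target, target_vpage)
            target.origin[target_vpage] = parent
        del self.pages[vpage]

    def unmap(self, vpage):
        """Revoke the page from every space that received it, directly or not."""
        self._owned(vpage)
        for child, child_vpage in self.derived.pop(vpage, []):
            child.unmap(child_vpage)
            del child.pages[child_vpage]
            child.origin.pop(child_vpage, None)


sigma0 = AddressSpace("sigma0")
sigma0.pages = {n: n for n in range(4)}  # sigma0 stands in for physical memory
server = AddressSpace("linux-server")
app = AddressSpace("app")
sigma0.map(0, server, 10)
server.map(10, app, 20)
sigma0.unmap(0)  # recursively revokes the page from server and app
```

The point of the recursive `unmap` is that whoever handed out a page keeps control over it: revoking at the root also revokes every mapping derived from it.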
L4Linux – Design & Implementation • Address Spaces (cont.) • I/O ports are parts of address spaces. • Hardware interrupts are handled by user-level processes: the L4 kernel delivers each interrupt to its handler as an IPC message.
L4Linux – Design & Implementation • The Linux server • L4Linux uses a single-server approach. • A single Linux server runs on top of L4, multiplexing a single thread for system calls and page faults. • The Linux server maps physical memory into its address space and acts as the pager for any user processes it creates. • The server cannot directly access the hardware page tables, so it must maintain logical page tables in its own address space.
L4Linux – Design & Implementation • Interrupt Handling • All hardware interrupts are mapped to messages. • The Linux server contains threads that do nothing but wait for interrupt messages. • Interrupt threads have a higher priority than the main thread.
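The interrupt-as-message idea can be sketched with ordinary Python threads. This is an analogy under stated assumptions, not L4Linux code: a `queue.Queue` stands in for L4 IPC, the function names and the `None` stop sentinel are hypothetical, and Python threads cannot express the priority difference between interrupt threads and the main thread.

```python
import queue
import threading


def interrupt_thread(irq_messages, handled):
    """Dedicated handler thread: block until an interrupt message arrives."""
    while True:
        irq = irq_messages.get()  # stands in for a blocking IPC receive
        if irq is None:           # sentinel, used only to stop this demo
            break
        handled.append(irq)       # a real handler would service the device here


def run_demo(irqs):
    """Deliver each interrupt number as a message and return what was handled."""
    irq_messages = queue.Queue()
    handled = []
    worker = threading.Thread(target=interrupt_thread,
                              args=(irq_messages, handled))
    worker.start()
    for irq in irqs:
        irq_messages.put(irq)     # the "kernel" turns an interrupt into a message
    irq_messages.put(None)
    worker.join()
    return handled
```

Calling `run_demo([14, 1, 14])` returns the interrupts in the order they were delivered, since a single thread drains a FIFO queue.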
L4Linux – Design & Implementation • User Processes • Each user process is implemented as a separate L4 task with its own address space and threads. • The Linux server is the pager for these processes: the L4 kernel forwards any fault in a user-level process to the server by RPC.
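The pager relationship can be sketched as follows. All names here (`LinuxServerPager`, `UserTask`, `handle_fault`) are hypothetical, a plain method call stands in for the fault RPC, and a dictionary per task stands in for the logical page tables the server keeps in its own address space.

```python
class LinuxServerPager:
    """Toy pager: resolves faults for user tasks from a pool of free frames."""

    def __init__(self, free_frames):
        self.free_frames = list(free_frames)
        self.page_tables = {}  # task_id -> {vpage: frame}, kept by the server

    def handle_fault(self, task_id, vpage):
        """Called in place of the fault RPC; returns the frame to map."""
        table = self.page_tables.setdefault(task_id, {})
        if vpage not in table:
            if not self.free_frames:
                raise MemoryError("no free frames left")
            table[vpage] = self.free_frames.pop(0)
        return table[vpage]


class UserTask:
    """Toy user process: its own address space, faults go to the pager."""

    def __init__(self, task_id, pager):
        self.task_id = task_id
        self.pager = pager
        self.mapped = {}  # vpage -> frame currently mapped into this task
        self.faults = 0

    def access(self, vpage):
        if vpage not in self.mapped:  # page fault
            self.faults += 1
            self.mapped[vpage] = self.pager.handle_fault(self.task_id, vpage)
        return self.mapped[vpage]
```

Only the first access to a page takes the fault path; later accesses hit the existing mapping without involving the server.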
L4Linux – Design & Implementation • System Calls • Three system call interfaces: • A modified version of libc.so that uses L4 primitives. • A modified version of libc.a • A user-level exception handler (trampoline) calls the corresponding routine in the modified shared library. • The first two options are the fastest. The third is maintained for compatibility.
L4Linux – Design & Implementation • Signalling • Each user-level process has an additional thread for signal handling. • The main server thread sends a message to the signal-handling thread, which tells the user thread to save its state and enter Linux.
L4Linux – Design & Implementation • Scheduling • All thread scheduling is done by the L4 kernel. • The Linux server’s schedule() routine is only used for multiplexing its single thread. • After each system call, if no other system call is pending, the server simply resumes the user process thread and sleeps.
L4Linux – Design & Implementation • Tagged TLBs & Small Spaces • To reduce TLB conflicts, L4Linux provides a special library that customizes the code and data used for communicating with the Linux server. • The emulation library and signal thread are mapped close to the application instead of in the default high-memory area.
Performance • What is the penalty of using L4Linux? Compare L4Linux to native Linux • Does the performance of the underlying micro-kernel matter? Compare L4Linux to MkLinux • Does co-location improve performance? Compare L4Linux to an in-kernel version of MkLinux
Microbenchmarks • Measured system call overhead on the shortest system call, “getpid()”
Microbenchmarks (cont.) • Measured specific system calls to determine basic performance.
Macrobenchmarks • Measured the time to recompile the Linux server
Macrobenchmarks (cont.) • Next, used a commercial test suite to simulate a system under full load.
Performance Analysis • L4Linux is, on average, 8.3% slower than native Linux, and only 6.8% slower at maximum load. • MkLinux: 49% slower on average, 60% at maximum load. • Co-located MkLinux: 29% slower on average, 37% at maximum load.
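To put the percentages above side by side, a short calculation can express each MkLinux penalty as a multiple of the L4Linux penalty. The dictionary and function names are hypothetical; the numbers are the ones on the slide. Note that this compares slowdowns relative to native Linux, not absolute run times.

```python
# Slowdown relative to native Linux, in percent: (average, maximum load).
slowdown_pct = {
    "L4Linux": (8.3, 6.8),
    "MkLinux": (49.0, 60.0),
    "Co-located MkLinux": (29.0, 37.0),
}


def penalty_ratio(system, column, baseline="L4Linux"):
    """How many times larger a system's penalty is than the baseline's.

    column 0 selects the average figure, column 1 the maximum-load figure.
    """
    return slowdown_pct[system][column] / slowdown_pct[baseline][column]


for name in ("MkLinux", "Co-located MkLinux"):
    print(name,
          round(penalty_ratio(name, 0), 1),
          round(penalty_ratio(name, 1), 1))
```

On these figures the MkLinux penalty is roughly six times the L4Linux penalty on average, and co-location narrows that gap without closing it.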
Extensibility Performance • A micro-kernel must provide more than just the features of the OS running on top of it. • Specialization – improved implementation of OS functionality • Extensibility – permits implementation of new services that cannot be easily added to a conventional OS.
Pipes and RPC • (1) The first five entries, labelled (1), use the standard pipe mechanism of the Linux kernel. • (2) Asynchronous; uses only L4 IPC primitives. Emulates POSIX standard pipes without signalling, adding a thread for buffering and cross-address-space communication. • (3) Synchronous; uses blocking IPC without buffering data. • (4) Maps pages into the receiver’s address space.
Virtual Memory Operations • The “Fault” operation is an example of extensibility – measures the time to resolve a page fault by a user-defined pager in a separate address space. • “Trap” – Latency between a write operation to a protected page, and the invocation of related exception handler. • “Appel1” – Time to access a random protected page. The fault handler unprotects the page, protects some other page, and resumes. • “Appel2” – Time to access a random protected page where the fault handler only unprotects the page and resumes.
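The difference between the two Appel handlers can be made concrete with a toy simulation that counts faults rather than measuring time. This is a sketch under stated assumptions: `ProtectedMemory` and both handler functions are hypothetical names, protection is modelled as a set of page numbers, and the Appel1 handler deterministically picks the lowest-numbered unprotected page as the "other page" to protect.

```python
class ProtectedMemory:
    """Toy memory in which every page starts out protected."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.protected = set(range(num_pages))
        self.faults = 0

    def access(self, page, handler):
        """Touch a page; a protected page invokes the fault handler first."""
        if page in self.protected:
            self.faults += 1
            handler(self, page)
        return page not in self.protected  # True once the access can proceed


def appel1_handler(mem, page):
    """Unprotect the faulting page, protect some other page, and resume."""
    mem.protected.discard(page)
    others = [p for p in range(mem.num_pages)
              if p != page and p not in mem.protected]
    if others:
        mem.protected.add(others[0])


def appel2_handler(mem, page):
    """Only unprotect the faulting page and resume."""
    mem.protected.discard(page)
```

Under `appel2_handler` each page faults at most once, while under `appel1_handler` at most one page is unprotected at a time, so alternating between two pages faults on every access.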
Conclusion • Running Linux on the L4 micro-kernel imposes a 5-10% slowdown relative to native Linux, a much smaller penalty than on previous micro-kernels. • Further optimizations, such as co-locating the Linux server and exploiting extensibility, could improve L4Linux even further.