The Performance of Micro-Kernel Based Systems
H. Haertig, M. Hohmuth, J. Liedtke, S. Schoenberg, J. Wolter
Presented by: Chandana Kannatintavida
Outline of the Paper • Introduction • Overview of L4 • Design and Implementation of the Linux Server • Evaluating Compatibility Performance • Evaluating Extensibility Performance • Alternative Concepts from a Performance Point of View • Conclusion
Introduction • Motivation: microkernel-based systems have been found too slow. • Goal: show that a microkernel-based system can be practical with good performance. • Method: • conduct experiments on L4, a lean second-generation microkernel, with Linux running on top of it; the resulting system is called L4Linux. • compare the performance of L4Linux to native Linux and to MkLinux, Linux running on a Mach-derived first-generation microkernel.
L4 Essentials • Based on two concepts: address spaces and threads. • Address spaces are constructed recursively by user-level servers called pagers, which run outside the kernel. • The initial address space represents physical memory. • Further address spaces are created by granting, mapping, and unmapping flexpages. • Flexpages are logical pages of size 2^n, ranging from one physical page up to an entire address space. • Pagers act as main-memory managers, enabling memory-management policies to be implemented outside the kernel. • Threads are activities executing inside an address space and can be dynamically associated with individual pagers. • IPC denotes cross-address-space communication. • I/O ports are treated as part of the address space. • Hardware interrupts are handled as IPC.
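As a rough illustration of how pagers and flexpages interact, the sketch below shows the general shape of a user-level pager's service loop: a page fault arrives as an IPC message and is answered by mapping a flexpage from the pager's own address space. The IPC primitives, message layout, and helper functions are invented placeholders for this sketch, not the real L4 system-call interface.

```c
#include <stdint.h>

/* Hypothetical descriptors: a flexpage of size 2^log2_size and a fault message. */
typedef struct { uint32_t base; uint32_t log2_size; int writable; } flexpage_t;
typedef struct { uint32_t fault_addr; int is_write; } fault_msg_t;

/* Hypothetical IPC stubs standing in for the kernel's IPC primitives. */
extern int  ipc_wait(int *client, fault_msg_t *msg);   /* block until a fault message arrives */
extern void ipc_reply_map(int client, flexpage_t fp);  /* reply by mapping a flexpage */

/* Policy hook: choose which of the pager's own pages backs the faulting address. */
extern flexpage_t resolve_fault(uint32_t addr, int is_write);

void pager_loop(void)
{
    int client;
    fault_msg_t msg;

    for (;;) {
        /* The kernel turns a client's page fault into an IPC message
         * delivered to that client's pager thread. */
        if (ipc_wait(&client, &msg) != 0)
            continue;

        /* Reply with a flexpage taken from the pager's own address space;
         * the kernel inserts the mapping into the client's address space. */
        flexpage_t fp = resolve_fault(msg.fault_addr, msg.is_write);
        ipc_reply_map(client, fp);
    }
}
```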
Linux Design and Implementation • L4 is implemented on the Pentium, Alpha, and MIPS architectures. • Linux has architecture-dependent and architecture-independent parts. • All modifications are confined to the architecture-dependent part. • The Linux application binary interface remains unmodified. • No Linux-specific modifications are made to L4.
The Linux Kernel • On booting, the Linux server requests memory from its pager, which maps physical memory into the server's address space. • The Linux server then acts as the pager for the user processes it creates. • Hardware page tables are kept inside L4 and cannot be accessed directly, so the Linux server maintains additional logical page tables. • L4Linux multiplexes a single L4 thread to handle system calls and page faults. • Interrupt disabling is used for synchronization in critical sections.
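To make the "logical page table" point concrete, a minimal sketch of a two-level page table kept purely in the server's user memory is shown below; the layout and field names are invented for illustration and do not reflect the actual L4Linux data structures.

```c
#include <stdint.h>
#include <stdlib.h>

#define PT_ENTRIES 1024   /* 10 bits of directory index, 10 bits of table index */
#define PAGE_SHIFT 12     /* 4 KB pages */

/* Purely software-maintained entries: the hardware page tables live inside L4. */
typedef struct { uint32_t frame; unsigned present : 1; unsigned writable : 1; } lpte_t;
typedef struct { lpte_t *tables[PT_ENTRIES]; } lpgd_t;

/* Look up (and optionally allocate) the logical PTE for a virtual address. */
static lpte_t *logical_pte(lpgd_t *pgd, uint32_t vaddr, int create)
{
    uint32_t dir = vaddr >> 22;
    uint32_t idx = (vaddr >> PAGE_SHIFT) & (PT_ENTRIES - 1);

    if (!pgd->tables[dir]) {
        if (!create)
            return NULL;
        pgd->tables[dir] = calloc(PT_ENTRIES, sizeof(lpte_t));
        if (!pgd->tables[dir])
            return NULL;
    }
    return &pgd->tables[dir][idx];
}
```

The server would consult such a structure when a page-fault RPC arrives and then ask L4 to establish or revoke the corresponding mapping.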
Interrupt Handling and Device Drivers • Interrupt handlers in native Linux are subdivided into top halves (run immediately) and bottom halves (run later). • L4 maps hardware interrupts into messages. • Top-half interrupt handlers are implemented as threads waiting for such messages, one thread per interrupt source. • Another single thread executes all bottom halves once the corresponding top half has completed.
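A hedged sketch of the resulting structure: one thread per interrupt source blocks on the interrupt message, runs the top half, and then notifies the single bottom-half thread. The function names are hypothetical placeholders, not the real L4Linux driver interface.

```c
/* Hypothetical stubs: block on the interrupt-as-IPC message and defer work. */
extern int  irq_wait(int irq);                  /* returns when L4 delivers the interrupt message */
extern void run_top_half(int irq);              /* the device's immediate handler */
extern void wake_bottom_half_thread(int irq);   /* queue deferred work for the bottom-half thread */

/* One such thread exists per interrupt source. */
void irq_thread(int irq)
{
    for (;;) {
        if (irq_wait(irq) != 0)        /* the hardware interrupt arrives as a message */
            continue;
        run_top_half(irq);             /* executed immediately */
        wake_bottom_half_thread(irq);  /* a single separate thread runs all bottom halves later */
    }
}
```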
Linux User Processes • Each Linux user process is implemented as an L4 task. • The task is created by the Linux server, which associates it with a pager. • L4 converts any page fault of a Linux user process into an RPC to the Linux server. • The server replies by mapping or unmapping pages of its own address space into or out of the address space of the faulting process.
System Call Mechanisms • L4Linux system calls are implemented as RPCs between user processes and the Linux server. • There are three system-call interfaces: • a modified version of libc.so that uses L4 IPC primitives to call the Linux server; • a corresponding statically linked libc.a; • a user-level exception handler that emulates the native system-call trap instruction by calling the corresponding routine in the modified shared library. • TLB flushes are avoided: L4Linux uses physical copy-in and copy-out to exchange data between kernel and user processes instead of relying on hardware address translation.
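The sketch below suggests how a stub in the modified shared library could replace the native trap with an RPC to the Linux server; the message format and the IPC call are assumptions made for the sketch, not the actual L4Linux library interface.

```c
/* Hypothetical request record carried by the RPC to the Linux server. */
struct syscall_msg { int nr; long args[5]; long ret; };

/* Hypothetical blocking call/reply IPC to the Linux server thread. */
extern int ipc_call_linux_server(struct syscall_msg *msg);

long l4linux_syscall(int nr, long a0, long a1, long a2, long a3, long a4)
{
    struct syscall_msg msg = { .nr = nr, .args = { a0, a1, a2, a3, a4 }, .ret = -1 };

    /* Instead of executing the trap instruction, send the request to the
     * Linux server and block until it replies. Arguments and results are
     * copied physically, so no TLB flush is required. */
    if (ipc_call_linux_server(&msg) != 0)
        return -1;
    return msg.ret;
}
```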
Signaling • The native Linux kernel signals user processes by manipulating their stack, stack pointer (SP), and program counter (PC). • In L4Linux, each user process has an additional signal-handler thread. • Upon receiving a signal message from the Linux server, the signal-handler thread forces the process's main thread to save its state and enter the Linux server, and resumes the main thread afterwards.
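A minimal sketch of that signal-handler thread's loop, assuming hypothetical primitives for receiving the server's signal message and for redirecting the main thread; the real mechanism relies on L4's thread-manipulation primitives and differs in detail.

```c
/* Hypothetical stubs for the per-process signal thread. */
extern int  wait_for_signal_msg(int *signum);      /* IPC receive from the Linux server */
extern void redirect_main_thread_to_kernel(void);  /* make the main thread save state and enter Linux */
extern void resume_main_thread(void);

void signal_thread(void)
{
    int signum;

    for (;;) {
        if (wait_for_signal_msg(&signum) != 0)
            continue;
        redirect_main_thread_to_kernel();  /* the signal is delivered inside the Linux server */
        resume_main_thread();
    }
}
```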
Scheduling • All threads are scheduled by L4's internal scheduler. • The Linux server's schedule() operation is used only for multiplexing the Linux server thread across its coroutines when concurrent system calls are in progress. • The number of coroutine switches is minimized by sleeping until a new system-call or wakeup message is received.
Supporting Tagged TLBs or Small Spaces • A tagged TLB (or its emulation through small address spaces) avoids the TLB flushes otherwise required when switching address spaces. • However, TLB conflicts can have much the same effect as TLB flushes, because of the extensive use of shared libraries and the identical virtual allocation of code and data across address spaces. • In L4Linux, a special library permits the customization of code and data addresses. • The emulation library and the signal thread can also be mapped close to the application. • Thus servers executing in small address spaces can be built.
Compatibility Performance • Three questions: • What is the penalty of using L4Linux instead of native Linux? Answered by running benchmarks on native Linux and L4Linux on the same hardware. • Does the performance of the underlying microkernel matter? Answered by comparing L4Linux to MkLinux. • How much does co-location improve performance? Answered by comparing user-mode L4Linux to the in-kernel (co-located) version of MkLinux.
Micro Benchmarks • Used to analyze the detailed behavior of L4Linux mechanisms. • getpid, the shortest system call, was repeated in a tight loop to measure raw system-call cost.
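For illustration, a user-level version of such a loop can be written with ordinary POSIX calls; the paper's figures were of course taken with cycle-accurate measurements on the test hardware, so the sketch below (using clock_gettime) only shows the shape of the benchmark.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    enum { N = 1000000 };
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++)
        (void)getpid();                 /* shortest system call, repeated in a loop */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("getpid: %.1f ns per call on average\n", ns / N);
    return 0;
}
```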
Micro Benchmarks • The lmbench benchmark suite measures system calls, context switches, memory accesses, pipe operations, networking operations, etc. • hbench is a revised version of lmbench.
Macro Benchmarks • Measure the system's overall performance. • Recompiling the Linux server was 6-7% slower on L4Linux than on native Linux, and 10-20% faster than on either MkLinux version. • The commercial AIM multiuser benchmark was used for a more systematic evaluation. • It measures system performance under different application loads.
Compatibility Performance Analysis • The current implementation of L4Linux comes close to native Linux, even under high load, with penalties in the range of 5-10%. • Both the macro- and micro-benchmarks show that the performance of the underlying microkernel matters. • All benchmarks suggest that co-location by itself does not improve performance.
Extensibility Performance • The main advantage claimed for microkernels is extensibility/specialization. • Three questions: • Can services be added outside L4Linux to improve performance by specializing Unix functionality? • Can certain applications be improved by using native microkernel mechanisms in addition to the classical API? • Can high performance be achieved for non-classical systems, not necessarily Unix-compatible, coexisting with L4Linux? • These three questions are answered by specific examples.
Pipes and RPC • Four variants of data exchange are compared (a simple latency test for the first variant is sketched below): • the standard pipe mechanism; • asynchronous pipes on L4, which run only on L4 and need no Linux kernel; • synchronous RPC, which uses blocking IPC directly without buffering data; • synchronous mapping RPC, in which the sender maps pages into the receiver's address space. • lmbench was used to measure latency and bandwidth.
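For the first variant, a small lmbench-style ping-pong over ordinary POSIX pipes looks roughly like this; it is only an illustrative stand-in for the actual lmbench measurements and says nothing about the L4-specific variants.

```c
#include <stdio.h>
#include <time.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void)
{
    enum { N = 100000 };
    int p2c[2], c2p[2];
    char byte = 'x';

    if (pipe(p2c) < 0 || pipe(c2p) < 0) { perror("pipe"); return 1; }

    if (fork() == 0) {                          /* child: echo every byte back */
        for (int i = 0; i < N; i++) {
            if (read(p2c[0], &byte, 1) != 1) break;
            if (write(c2p[1], &byte, 1) != 1) break;
        }
        _exit(0);
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; i++) {               /* parent: send one byte, wait for the echo */
        if (write(p2c[1], &byte, 1) != 1 || read(c2p[0], &byte, 1) != 1) {
            perror("ping-pong");
            return 1;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    wait(NULL);

    double us = ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / 1e3;
    printf("pipe round trip: %.2f us on average\n", us / N);
    return 0;
}
```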
Cache Partitioning • L4's hierarchical user-level pagers allow the L4Linux memory system and a dedicated real-time memory system to run in parallel. • In real-time systems, the worst-case execution time is the optimization criterion. • A memory manager on top of L4 partitions the cache between multiple real-time tasks to minimize cache-interference costs. • The time for a matrix multiplication was measured: • uninterrupted execution: 10.9 ms; • execution interrupted by activity causing cache conflicts: 96.1 ms; • with cache partitioning avoiding secondary-cache interference: 24.9 ms.
Virtual Memory Operations • The times taken (in microseconds) for selected memory operations in native Linux and in L4Linux are compared.
Extensibility Performance Analysis • Unix-compatible functionality can be accelerated using microkernel primitives, e.g. pipes and VM operations. • Unix-compatible or partially compatible functions can be added to the system that outperform implementations based purely on the Unix API, e.g. RPC and user-level pagers for VM operations. • The microkernel offers possibilities for coexisting systems based on different paradigms, e.g. real-time systems with their own memory management.
Alternative Basic Concepts • Can a mechanism at a lower level than IPC, or a grafting model, improve the performance of a microkernel? • Protected Control Transfer (PCT): • a parameterless cross-address-space procedure call via a callee-defined gate. • The times for PCT and IPC were compared; PCT offers no significant improvement. • Grafting: • downloading extensions into the kernel. • Its performance impact is still an open question.
Conclusion • The performance of L4 is significantly better than that of first-generation microkernels. • L4Linux throughput is only about 5% below native Linux, whereas first-generation microkernel systems were 5-7 times worse than native Linux. • Overall system performance does depend on the performance of the microkernel. • Modifying Linux to better suit L4 would improve performance further. • L4 provides an apt platform for building specialized systems.