Explore the performance of L4Linux, a Linux "personality" built on top of the L4 microkernel. Discover how the microkernel architecture affects system performance compared with native (monolithic) Linux and with MkLinux, which runs on the first-generation Mach microkernel. Delve into the implementation details, upcalls, and optimizations for efficient address-space handling and interrupt management.
What is a microkernel • a kernel that provides only address spaces, threads, and IPC • the kernel does not implement e.g. the file system or device interrupt handling; these run at user level
Microkernel abstraction level • some researchers feel the abstraction level is too high • the kernel should map more directly to the hardware • some researchers feel the abstraction level is too low • focus on extensibility (Mach)
L4 • a so-called 2nd-generation microkernel • built from scratch, as opposed to being developed from earlier monolithic kernel approaches (e.g. Mach)
L4 essentials • threads, address spaces, and cross-address-space communication (IPC) • higher-level operations, e.g. RPC and cross-address-space thread migration, are built from the IPC primitives
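To make "RPC is built from IPC" concrete, here is a minimal Python sketch. It assumes a toy kernel whose only communication service is send/receive between thread ids; the names `ToyKernel`, `rpc_call`, and `serve` are hypothetical. Real L4 IPC is a synchronous rendezvous in the kernel, whereas this sketch uses per-thread queues only to stay short.

```python
import queue
import threading


class ToyKernel:
    """Toy model (assumption): the kernel's only communication service is IPC.

    Real L4 IPC is a synchronous rendezvous; per-thread queues are used here
    only to keep the sketch short.
    """

    def __init__(self):
        self._inbox = {}  # thread id -> queue of (sender id, message)

    def register_thread(self, tid):
        self._inbox[tid] = queue.Queue()

    def ipc_send(self, src, dst, msg):
        self._inbox[dst].put((src, msg))

    def ipc_receive(self, tid):
        # Blocks until a message for `tid` arrives.
        return self._inbox[tid].get()


def rpc_call(kernel, client, server, request):
    """RPC built from IPC: send the request, then block for the server's reply."""
    kernel.ipc_send(client, server, request)
    sender, reply = kernel.ipc_receive(client)
    if sender != server:
        raise RuntimeError(f"unexpected reply from {sender!r}")
    return reply


def serve(kernel, server, handler, n_requests):
    """Server loop: receive a request, run the handler, reply to the caller."""
    for _ in range(n_requests):
        client, request = kernel.ipc_receive(server)
        kernel.ipc_send(server, client, handler(request))


if __name__ == "__main__":
    k = ToyKernel()
    k.register_thread("client")
    k.register_thread("server")
    worker = threading.Thread(target=serve, args=(k, "server", lambda x: x * 2, 1))
    worker.start()
    print(rpc_call(k, "client", "server", 21))  # prints 42
    worker.join()
```

The point of the design is that the kernel knows nothing about "calls" or "replies"; RPC is just a convention of one send followed by one receive.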
Address spaces • recursive construction • granting, mapping, unmapping • a page owner can grant or map any of its pages to another address space, with the receiver's consent • after a map, the page is accessible in both address spaces; after a grant, it is removed from the granter's address space
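The recursive construction can be sketched as a small "mapping database" that remembers which mapping was derived from which. This is a toy model under stated assumptions: page tables are plain dicts, the receiver's consent is not modelled, and `AddressSpace` and `MappingDB` are hypothetical names. `unmap` follows the L4 flush semantics: the owner keeps the page, but every address space that received it directly or indirectly loses it.

```python
class AddressSpace:
    """A toy address space: virtual page number -> physical frame number."""

    def __init__(self, name):
        self.name = name
        self.pages = {}

    def __repr__(self):
        return f"AddressSpace({self.name!r})"


class MappingDB:
    """Toy mapping database recording which mapping was derived from which.

    Keys are (AddressSpace, virtual page) pairs; AddressSpace uses identity hashing.
    """

    def __init__(self):
        self.children = {}  # (space, vpage) -> list of (space, vpage) derived from it
        self.parent = {}    # (space, vpage) -> (space, vpage) it was derived from

    def map(self, src, src_page, dst, dst_page):
        # Share the frame: both address spaces now translate to it.
        dst.pages[dst_page] = src.pages[src_page]
        self.children.setdefault((src, src_page), []).append((dst, dst_page))
        self.parent[(dst, dst_page)] = (src, src_page)

    def grant(self, src, src_page, dst, dst_page):
        # Move the frame: the receiver gets it and the granter loses it.
        old, new = (src, src_page), (dst, dst_page)
        dst.pages[dst_page] = src.pages.pop(src_page)
        # The receiver takes the granter's place in the derivation tree.
        if old in self.parent:
            p = self.parent.pop(old)
            self.parent[new] = p
            siblings = self.children[p]
            siblings[siblings.index(old)] = new
        for child in self.children.get(old, []):
            self.parent[child] = new
        if old in self.children:
            self.children[new] = self.children.pop(old)

    def unmap(self, space, page):
        # Revoke every mapping derived, directly or indirectly, from (space, page).
        # The owner keeps its own copy of the page.
        for child in self.children.pop((space, page), []):
            child_space, child_page = child
            self.unmap(child_space, child_page)
            del child_space.pages[child_page]
            del self.parent[child]
```

For example, if an initial space maps a frame to a pager, and the pager maps it on to an application, one `unmap` by the initial space removes the frame from both the pager and the application.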
Address spaces (cont) • only the grant, map, and unmap operations are implemented in the kernel • user-level “pagers” handle page faults
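A minimal sketch of user-level paging, assuming 4 KiB pages and a pager that simply hands out frames from a free list. In L4 the kernel would turn the fault into an IPC to the pager, and the pager's reply would carry a mapping; here a direct method call (`handle_fault`) stands in for that IPC round trip, and `Pager`, `Task`, and `access` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class Pager:
    """Hypothetical user-level pager: owns a pool of frames and maps them on demand."""

    def __init__(self, free_frames):
        self.free_frames = list(free_frames)
        self.faults_handled = 0

    def handle_fault(self, task, vpn):
        # Policy lives here, at user level: pick a frame and "reply" with a mapping.
        frame = self.free_frames.pop(0)
        task.page_table[vpn] = frame
        self.faults_handled += 1


class Task:
    def __init__(self, pager):
        self.pager = pager
        self.page_table = {}  # virtual page number -> frame number


def access(task, vaddr):
    """Kernel side: translate, or turn a miss into a fault message to the task's pager."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in task.page_table:
        task.pager.handle_fault(task, vpn)  # stands in for fault IPC + map reply
    return task.page_table[vpn] * PAGE_SIZE + offset
```

Only the first access to a page reaches the pager; later accesses to the same page are translated directly.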
Interrupts, exceptions, and traps • all handled at user level • the kernel transforms interrupts into IPC messages and sends them to the appropriate user-level thread • exceptions and traps are synchronous to the associated thread, and the kernel mirrors them to that thread
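The kernel's share of interrupt handling can be sketched as pure message conversion. This is a hypothetical model: `MicroKernel`, `attach_irq`, and the `("interrupt", irq)` message format are illustrative names, not the L4 interface. Exceptions are modelled as messages delivered back to the thread that raised them, which is a simplification of how the kernel mirrors them.

```python
from collections import deque


class MicroKernel:
    """Toy model: the kernel does no device work, it only produces messages."""

    def __init__(self):
        self.irq_owner = {}  # IRQ number -> user-level thread id
        self.inbox = {}      # thread id -> pending messages

    def attach_irq(self, irq, thread_id):
        self.irq_owner[irq] = thread_id
        self.inbox.setdefault(thread_id, deque())

    def hardware_interrupt(self, irq):
        # Asynchronous: forward to whichever user-level thread owns this IRQ.
        owner = self.irq_owner.get(irq)
        if owner is None:
            return False  # no user-level handler attached: drop it
        self.inbox[owner].append(("interrupt", irq))
        return True

    def exception(self, thread_id, kind):
        # Synchronous: delivered to the same thread that raised it.
        self.inbox.setdefault(thread_id, deque()).append(("exception", kind))
```

A driver thread would then loop on its inbox, exactly as it would for any other IPC.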
Implementation of L4Linux • the authors developed L4Linux, a Linux “personality” on top of the L4 microkernel • due to time restrictions, the Linux kernel was not fine-tuned for L4Linux, so the results are only an upper bound on the performance penalty
L4Linux • Linux 2.0.21 on top of L4 • the Linux kernel runs as a user-level server • 100% binary compatible • modified versions of the C library, shared (libc.so) and static (libc.a) • user-level exception handler (“trampoline”) • 14 engineer-months and 6500 rewritten lines out of a total of ~340,000
Trampoline • 100% binary compatible means that a program statically linked against the native Linux library must run, unmodified, on L4Linux • the trampoline “bounces” the system-call trap, which on native Linux enters the kernel, back into the modified shared library on L4Linux • the microkernel upcalls into a user-level handler; the handler then makes an RPC (i.e. invokes the kernel again) to the OS personality to perform the system call
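The two system-call paths can be contrasted in a short sketch that counts kernel entries. Assumptions: `LinuxServer`, `Process`, and their methods are hypothetical; each RPC to the server is counted as one kernel entry; only `getpid` (number 20 in the i386 Linux ABI) is implemented, and anything else returns `-ENOSYS` (-38). A modified libc goes straight to the server, while an unmodified binary first traps into the microkernel, which upcalls the trampoline, which then makes the same RPC.

```python
class LinuxServer:
    """Hypothetical stand-in for the L4Linux server: executes system calls by number."""

    def __init__(self, pid):
        self.pid = pid

    def rpc(self, nr, args):
        if nr == 20:   # __NR_getpid on i386 Linux
            return self.pid
        return -38     # -ENOSYS: not implemented in this toy server


class Process:
    def __init__(self, server):
        self.server = server
        self.kernel_entries = 0

    def _ipc_rpc(self, nr, args):
        self.kernel_entries += 1  # the RPC to the server is an IPC, i.e. a kernel entry
        return self.server.rpc(nr, args)

    def libc_syscall(self, nr, *args):
        # Modified shared libc: one RPC directly to the Linux server.
        return self._ipc_rpc(nr, args)

    def int80(self, nr, *args):
        # Unmodified binary: the "int 0x80" trap enters the microkernel...
        self.kernel_entries += 1
        # ...which upcalls the user-level trampoline in this address space.
        return self._trampoline(nr, args)

    def _trampoline(self, nr, args):
        # The trampoline performs the same RPC the modified libc would have made.
        return self._ipc_rpc(nr, args)
```

The extra kernel entry on the trampoline path is why binary-compatible system calls cost more than calls made through the modified library.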
L4Linux (cont) • L4 maps the entire initial address space to the Linux server • the server runs as a single L4 thread, which acts as a single virtual processor for Linux • the Linux server occupies a small memory region and uses the Pentium’s segmentation feature so that its TLB entries are not flushed on address-space switches; the TLB therefore keeps the Linux server’s translations (small-address-space optimization)
L4Linux (cont) • L4 allows user-level processes to disable interrupts, so the critical sections of uniprocessor Linux did not need to be modified
L4Linux (cont) • interrupt threads have a higher priority than the server itself, so on a uniprocessor they preempt the server instead of executing concurrently with it • signals are forwarded to a co-located signal-handler thread inside each user process, since only a thread in the same address space can manipulate another thread’s state
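A toy sketch of signal forwarding, assuming a thread's state is reduced to a program counter held in a dict. `UserProcess`, `LinuxServerStub`, and `deliver_signal` are hypothetical names, and the direct method call stands in for the IPC to the signal-handler thread. The server, living in another address space, only sends a message; the co-located thread is the one that rewrites the main thread's state. Signals without a handler are simply ignored here, unlike real Linux default actions.

```python
class UserProcess:
    """Hypothetical user process with a co-located signal-handler thread."""

    def __init__(self, handlers):
        self.handlers = handlers  # signal number -> name of user handler
        # Toy state of the process's main thread: just a program counter.
        self.main_thread = {"pc": "main_loop", "saved_pc": []}

    def signal_thread_on_message(self, signo):
        # Same address space, so this thread may rewrite the main thread's state:
        # save the current pc and redirect execution to the handler.
        handler = self.handlers.get(signo)
        if handler is None:
            return False  # toy simplification: ignore unhandled signals
        self.main_thread["saved_pc"].append(self.main_thread["pc"])
        self.main_thread["pc"] = handler
        return True


class LinuxServerStub:
    """Lives in another address space, so it can only send a message."""

    def deliver_signal(self, proc, signo):
        # Stands in for an IPC to the process's signal-handler thread.
        return proc.signal_thread_on_message(signo)
```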
L4Linux (cont) • scheduling is mostly done by the L4 scheduler • four priority levels: top-half interrupts, bottom-half interrupts, the Linux server, user processes; no priority decay • so L4 interrupts the Linux server in the same way the hardware would interrupt a native Linux kernel
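The four-level scheme can be sketched as a static-priority scheduler. Only the ordering of the levels comes from the slide; the numeric values, the class name `FixedPriorityScheduler`, and FIFO order within a level are assumptions of this sketch. Because priorities never decay, a ready interrupt thread always wins over the Linux server, and the server always wins over user processes.

```python
from collections import deque

# Assumed numeric levels (higher = more urgent); only the ordering is from the slide.
TOP_HALF, BOTTOM_HALF, LINUX_SERVER, USER = 3, 2, 1, 0


class FixedPriorityScheduler:
    def __init__(self):
        self.ready = {lvl: deque() for lvl in (TOP_HALF, BOTTOM_HALF, LINUX_SERVER, USER)}

    def make_ready(self, thread, level):
        self.ready[level].append(thread)

    def pick_next(self):
        # Run the highest non-empty level, FIFO within a level. Priorities are
        # static (no decay), so interrupt threads always preempt the server.
        for level in sorted(self.ready, reverse=True):
            if self.ready[level]:
                return self.ready[level].popleft()
        return None
```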
L4Linux (cont) • “user level schedulers can dynamically change priority and time slice of any thread”?
Experiments • micro- and macrobenchmarks used to compare native Linux and MkLinux (a Linux port running on the Mach microkernel) with L4Linux • Linux vs. L4Linux shows the performance penalty of using a microkernel • L4Linux vs. MkLinux shows the influence of the microkernel on the overall system, including the influence of colocation • extensibility experiments • functionality specialized for L4Linux
Performance: L4Linux, MkLinux and Linux • microbenchmarks • getpid: L4Linux 2.4 times (via the modified libc) or 3.4 times (via the trampoline) slower than Linux; MkLinux 3.9 times (in-kernel server) or 28 times (user-mode server) slower than L4Linux • lmbench and hbench: L4Linux 1 to 3 times slower than Linux; MkLinux 1 to 32 times slower than L4Linux
Performance (cont): L4Linux, MkLinux and Linux • macrobenchmarks • recompiling the Linux server: L4Linux 6%-7% slower than Linux; MkLinux 10%-20% slower than L4Linux • AIM multiuser benchmark suite: job throughput of L4Linux is 7%-8% lower than that of Linux; MkLinux is 30%-52% lower than L4Linux
Conclusions • At the application level there is a 5%-10% performance penalty for using L4Linux vs. bare Linux • The particular microkernel used matters • Colocation is secondary to the microkernel implementation
Extensibility • “Can we add services outside L4Linux to improve performance by specializing Unix functionality” • “Can we improve certain applications by using native microkernel mechanisms in addition to the classical API” • “Can we achieve high performance for non-classical, Unix-incompatible systems coexisting with L4Linux”
Virtual Memory • measured a user-level page fault that maps a page from one address space to another (not available on Unix) • measured a plain trap and two different trap/protect/unprotect patterns, which ran on average ~4 times faster than on native Linux
Cache Partitioning • a user-level main-memory manager can coordinate with L4 to give certain tasks physical pages that map to specific parts of the L2 cache • matrix multiplication example: a roughly four-fold improvement in worst-case performance
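One common way a user-level memory manager can partition a physically indexed cache is page colouring; the sketch below assumes that technique rather than claiming it is the exact mechanism used. Frames with the same "colour" (frame number modulo the number of colours) compete for the same cache sets, so reserving disjoint colour sets for two tasks keeps them from evicting each other's cache lines. The cache geometry, `ColoringMemoryManager`, and task names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    # Number of distinct page-sized slices of one cache way.
    return cache_size // (associativity * page_size)


def color_of(frame, colors):
    # Frames with equal colour map to the same cache sets.
    return frame % colors


class ColoringMemoryManager:
    """Hypothetical user-level memory manager handing out frames by cache colour."""

    def __init__(self, total_frames, colors):
        self.colors = colors
        self.free = {c: [f for f in range(total_frames) if color_of(f, colors) == c]
                     for c in range(colors)}
        self.partition = {}  # task -> set of colours reserved for it

    def reserve(self, task, color_set):
        self.partition[task] = set(color_set)

    def alloc(self, task):
        # Only hand out frames whose colour belongs to the task's partition.
        for c in sorted(self.partition[task]):
            if self.free[c]:
                return self.free[c].pop(0)
        raise MemoryError(f"no free frame in {task}'s cache partition")
```

For example, a 256 KiB direct-mapped L2 cache with 4 KiB pages gives 64 colours; reserving colours 0-31 for a matrix-multiplication task and 32-63 for everything else splits the cache in half.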
Possible Alternatives • Protected Control Transfers • Grafting