The Performance of Microkernel-Based Systems

This presentation examines the performance of L4Linux, a Linux "personality" built on top of the L4 microkernel. It compares L4Linux with native (monolithic) Linux and with MkLinux, a Linux variant built on the Mach microkernel, and covers implementation details such as the system-call trampoline, address-space handling, and interrupt management.

kevinl

Presentation Transcript


  1. The Performance of Microkernel-Based Systems L4Linux

  2. What is a microkernel • kernels that only provide address spaces, threads, and IPC • kernel does not handle e.g. the file system or interrupts

  3. Microkernel abstraction level • some researchers feel the abstraction level is too high • kernel should map more directly to hardware • some researchers feel the abstraction level is too low • focus on extensibility (Mach)

  4. L4 • so-called 2nd-generation microkernel • built from scratch, as opposed to being developed from earlier monolithic kernel approaches (e.g. Mach)

  5. L4 essentials • threads, address spaces and cross-address-space communication (IPC) • other kernel operations e.g. RPC and address-space crossing thread migration are built up from IPC primitives

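  6. Address spaces • recursive construction • granting, mapping, unmapping • a page owner can grant or map any of its pages to another address space with the receiver's permission • a mapped page is then accessible in both address spaces; a granted page is removed from the granter's address space and belongs to the receiver • unmapping revokes the page from every address space that received it, directly or indirectly, from the unmapper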

  7. Address spaces (cont) • only the grant, map, and unmap are implemented in the kernel • user-level “pagers” handle page faults
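
The recursive construction on the two slides above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the names `AddressSpace` and `MappingDB` are hypothetical, the receiver's consent is not modeled, and the real L4 kernel works on flexible page ranges rather than single page numbers.

```python
class AddressSpace:
    """Toy address space: virtual page number -> physical frame (hypothetical model)."""
    def __init__(self, name):
        self.name = name
        self.pages = {}


class MappingDB:
    """Records which mapping was derived from which, so unmap can recurse."""
    def __init__(self):
        self.children = {}   # (space, vpage) -> list of derived (space, vpage)
        self.parent = {}     # (space, vpage) -> (space, vpage) it came from

    def map(self, src, src_page, dst, dst_page):
        # After map, the frame is visible in both address spaces.
        dst.pages[dst_page] = src.pages[src_page]
        self.children.setdefault((src, src_page), []).append((dst, dst_page))
        self.parent[(dst, dst_page)] = (src, src_page)

    def grant(self, src, src_page, dst, dst_page):
        # After grant, the frame leaves the granter and belongs to the receiver.
        dst.pages[dst_page] = src.pages.pop(src_page)
        old, new = (src, src_page), (dst, dst_page)
        kids = self.children.pop(old, [])
        self.children[new] = kids
        for kid in kids:
            self.parent[kid] = new
        origin = self.parent.pop(old, None)
        if origin is not None:
            # The receiver takes over the granter's place in the mapping tree.
            self.parent[new] = origin
            siblings = self.children[origin]
            siblings[siblings.index(old)] = new

    def unmap(self, src, src_page):
        # Revoke the page from every space that got it (directly or
        # indirectly) from src; src itself keeps the page.
        for space, page in self.children.pop((src, src_page), []):
            self.unmap(space, page)
            del space.pages[page]
            del self.parent[(space, page)]
```

In this model a root space that maps a page to a pager, which in turn maps it to an application, can revoke both derived mappings with a single `unmap`, which is the property a user-level pager relies on.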

  8. Interrupts, exceptions, and traps • all handled at user level • interrupts are transformed, by kernel, into IPC messages and sent to appropriate user level thread • exceptions and traps are synchronous to associated thread and kernel mirrors them to that thread
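
As a rough illustration of the slide above, the following Python sketch shows a kernel that does nothing with a hardware interrupt except turn it into a message for the user-level thread registered for that interrupt line. The classes `Thread` and `Kernel` and the method names are hypothetical, and the message queue is a simplification: L4 IPC is synchronous, which this model does not capture.

```python
from collections import deque


class Thread:
    """User-level thread with a simple inbox (a simplification of L4 IPC)."""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()

    def receive(self):
        # Return the next pending message, or None if nothing is waiting.
        return self.inbox.popleft() if self.inbox else None


class Kernel:
    """Toy kernel: its only interrupt policy is 'deliver as a message'."""
    def __init__(self):
        self.irq_owner = {}   # interrupt line -> handling thread

    def attach(self, irq, thread):
        self.irq_owner[irq] = thread

    def hardware_interrupt(self, irq):
        handler = self.irq_owner.get(irq)
        if handler is None:
            return False      # no user-level driver registered for this line
        handler.inbox.append(("irq", irq))
        return True


kernel = Kernel()
disk_driver = Thread("disk-driver")
kernel.attach(14, disk_driver)
kernel.hardware_interrupt(14)
message = disk_driver.receive()   # the driver sees the interrupt as a message
```

All device-specific handling happens in the driver thread after `receive` returns; the kernel never interprets the interrupt itself.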

  9. Implementation of L4Linux • develop L4Linux, a linux “personality” on top of the L4 microkernel • due to time restrictions the linux kernel was not fine tuned in L4Linux, so results are only an upper bound on the performance penalty

  10. L4Linux • linux 2.0.21 on top of L4 • linux kernel is a user level server • 100% binary compatible • modified versions of shared C library libc.so and libc.a • user level “trampoline” exception • 14 engineer months and 6500 rewritten lines out of a total of ~340,000

  11. Trampoline • 100% binary compatible means that a program statically linked against the native linux library must run, unmodified, on L4Linux • the trampoline “bounces” the system-call trap that on native linux went into the kernel back into the modified shared library on L4Linux • the microkernel upcalls into a user-level handler; the handler then makes an RPC (which invokes the kernel again) to the OS personality to perform the system call
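
The two system-call paths described above can be traced with a short Python sketch. Everything here is a hypothetical model for illustration: `LinuxServer`, `modified_libc_call`, `trampoline`, and `run_syscall` are invented names, and only `getpid` is modeled.

```python
class LinuxServer:
    """Stand-in for the user-level Linux server (hypothetical interface)."""
    def rpc(self, pid, syscall):
        if syscall == "getpid":
            return pid
        raise NotImplementedError(syscall)


def modified_libc_call(server, pid, syscall, trace):
    # Dynamically linked programs use the modified library, which
    # sends the request to the Linux server directly.
    trace.append("libc: RPC to Linux server")
    return server.rpc(pid, syscall)


def trampoline(server, pid, syscall, trace):
    # The microkernel reflects the trap to this user-level handler,
    # which re-enters the same library path as above.
    trace.append("trampoline: trap reflected to user level")
    return modified_libc_call(server, pid, syscall, trace)


def run_syscall(server, pid, syscall, statically_linked):
    trace = []
    if statically_linked:
        # An unmodified binary still executes the native trap instruction.
        trace.append("binary: native system-call trap")
        result = trampoline(server, pid, syscall, trace)
    else:
        result = modified_libc_call(server, pid, syscall, trace)
    return result, trace


server = LinuxServer()
static_result, static_trace = run_syscall(server, 42, "getpid", True)
dynamic_result, dynamic_trace = run_syscall(server, 42, "getpid", False)
```

Both paths return the same result, but the statically linked path records extra steps before the RPC, which is consistent with the trampoline variant being the slower of the two L4Linux getpid figures.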

  12. L4Linux (cont) • L4 maps the entire initial address space to kernel server • single thread in L4, acts as a single virtual processor to the linux server • Linux server occupies a small memory region, which utilizes Pentium’s segment feature to protect its TLB entries, so the TLB always has the linux server’s translations (small-address-space optimization)

  13. L4Linux (cont) • L4 allows user level processes to disable interrupts, so uniprocessor version of linux did not need modification of critical sections

  14. L4Linux (cont) • interrupt threads have a priority above the server itself, so they don’t execute concurrently • signals are forwarded to a co-located signal handler inside each user process, since only a thread in the same address space can manipulate another thread’s state

  15. L4Linux (cont) • scheduling is mostly done by L4 scheduler • Four priority levels: top half interrupts, bottom half interrupts, the linux server, user processes. No priority decay. • so L4 interrupts the linux server in the same way the hardware would interrupt a native linux kernel
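
A minimal sketch of the four-level scheme on the slide above, assuming strict fixed priorities with no decay. The `Priority` enum and `pick_next` function are hypothetical names, and time slices are not modeled.

```python
from enum import IntEnum


class Priority(IntEnum):
    """The four levels from the slide; a higher value preempts a lower one."""
    USER_PROCESS = 0
    LINUX_SERVER = 1
    BOTTOM_HALF = 2
    TOP_HALF = 3


def pick_next(ready):
    """Return the highest-priority ready thread, or None if nothing is ready.

    `ready` is a list of (name, Priority) pairs. Priorities never decay,
    so the choice depends only on the static level of each thread.
    """
    if not ready:
        return None
    return max(ready, key=lambda thread: thread[1])


ready = [
    ("cc1", Priority.USER_PROCESS),
    ("linux-server", Priority.LINUX_SERVER),
    ("disk-top-half", Priority.TOP_HALF),
]
first = pick_next(ready)              # the interrupt thread wins
ready.remove(first)
second = pick_next(ready)             # then the Linux server
```

Because an interrupt thread always outranks the Linux server, a pending interrupt preempts the server in the same way a hardware interrupt preempts a native kernel.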

  16. L4Linux (cont) • “user level schedulers can dynamically change priority and time slice of any thread”?

  17. Experiments • micro- and macro-benchmarks used to compare native linux and MkLinux (Mach-derived variant) to L4Linux • linux vs L4Linux demonstrates the performance penalty for using a microkernel • L4Linux vs MkLinux demonstrates the influence of the microkernel on the overall system, including the influence of colocation • extensibility experiments • functionality specialized for L4Linux

  18. Performance: L4Linux, MkLinux and Linux • microbenchmarks • getpid: L4Linux 2.4 or 3.4 times slower than linux; MkLinux 3.9 or 28 times slower than L4Linux • lmbench and hbench: L4Linux 1 to 3 times slower than linux; MkLinux 1 to 32 times slower than L4Linux

  19. Performance (cont): L4Linux, MkLinux and Linux • macrobenchmarks • recompiling linux server: L4Linux 6%-7% slower than linux; MkLinux 10%-20% slower than L4Linux • AIM multiuser benchmark suite: job throughput in L4Linux is 7%-8% lower than linux; MkLinux is 30%-52% lower than L4Linux

  20. Conclusions • At application level there is a 5%-10% performance penalty for using L4Linux vs bare linux • The particular microkernel used matters • Colocation is secondary to microkernel implementation

  21. Extensibility • “Can we add services outside L4Linux to improve performance by specializing Unix functionality” • “Can we improve certain applications by using native microkernel mechanisms in addition to the classical API” • “Can we achieve high performance for non-classical, Unix-incompatible systems coexisting with L4Linux”

  22. Pipes and RPC

  23. Virtual Memory • measure user level page fault that maps a page from one address space to another (not available on Unix) • measured traps and two different trap, protect, unprotect patterns which performed on average ~4 times faster than native linux

  24. Cache Partitioning • User level main-memory manager can coordinate with L4 to allocate specific L2 cache pages to a certain processor • matrix multiplication example with a four times speed-up of worst case performance

  25. Possible Alternatives • Protected Control Transfers • Grafting
