
Improving IPC by Kernel Design Jochen Liedtke


Presentation Transcript


  1. Improving IPC by Kernel Design, Jochen Liedtke. Shane Matthews, Portland State University

  2. Summary • Review • Performance improvements at four levels • Architecture Level • Algorithmic Level • Interface Level • Coding Level

  3. Micro-kernels • Minimal OS, providing a set of primitives used to implement thread/address space management and IPC [1] • Everything else is moved to user-space (servers)

  4. Terminology (L3) • Dataspace • Memory object, mapped into address space • Task • Composed of threads, dataspaces, and an address space • Message • String/memory object

  5. L3 Architecture & IPC • Active components communicate via messages • Applies to: • Device drivers • Implemented as user level tasks • Hardware Interrupts • Interrupt message from micro-kernel to thread

  6. L3 Redesign Principles • IPC performance is the master • Security and performance must not be affected • Synergetic effects taken into consideration • (Think combined effects) • May lead to reinforcement or diminution • Design must aim at a concrete performance goal • 350 cycles (7 microseconds) per short message transfer

  7. Architectural Level • Messages • Process Structure • Control Blocks

  8. Compound Messages • Multiple send/receive operations -> 1 send/receive • Messages consist of direct strings, indirect strings, and memory objects
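
A minimal sketch of what a compound-message descriptor might look like: one descriptor bundles a direct (in-line) string, indirect strings, and memory objects, so a single send/receive replaces several. All field names, counts, and sizes here are illustrative assumptions, not the real L3 layout.

```c
#include <stddef.h>
#include <stdint.h>

struct indirect_string {        /* data copied from a caller-supplied buffer */
    void   *base;
    size_t  length;
};

struct memory_object {          /* region mapped (not copied) into receiver */
    uintptr_t start;
    size_t    size;
};

struct compound_msg {
    size_t                 direct_len;   /* bytes of in-line data below      */
    size_t                 n_indirect;   /* number of indirect strings       */
    size_t                 n_objects;    /* number of memory objects         */
    struct indirect_string indirect[4];
    struct memory_object   objects[2];
    uint8_t                direct[64];   /* in-line (direct) string          */
};
```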

  9. Twofold Message Copy • [A space] -> [kernel] -> [B space] • Costs roughly 20 + 0.75n cycles, where n is the message length in bytes • Good for small messages • Need something better as n grows
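
Taking the slide's cost formula at face value: an 8-byte message costs roughly 20 + 0.75·8 ≈ 26 cycles, while a 4 KB message costs roughly 20 + 0.75·4096 ≈ 3100 cycles, which is why the twofold copy only suits short messages.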

  10. LRPC and SRC RPC • Client and server share user-level memory • Sender copies the message into the shared buffer • Problems • When one server talks to many clients, the shared address-space regions become critical resources • Shared regions require explicit opens (unlike L3) • Messages can be changed during/after checking

  11. Direct Message Copy Via Windows • L3's method • Destination region mapped into a communication window • Message copied through the window • Window • One per address space • Accessed exclusively by the kernel

  12. Communication Windows • Problems • Window switching must be fast • Different threads coexisting within one address space • L3 Implementation • Copy one word of B's page directory into A's window region
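
A minimal sketch of the one-word window mapping: on the 486, a single page-directory entry covers 4 MB, so mapping B's destination region into A's kernel-only communication window is one word copy (plus a TLB flush of that range). The slot number and function names are illustrative, not the actual L3 code.

```c
#include <stdint.h>

#define WINDOW_PDE 768u               /* assumed slot reserved for the window */

typedef uint32_t pde_t;               /* one 32-bit page-directory entry      */

void map_comm_window(pde_t *a_pgdir,          /* sender's page directory   */
                     const pde_t *b_pgdir,    /* receiver's page directory */
                     unsigned dest_pde)       /* PDE covering destination  */
{
    /* Copy one word: B's 4 MB region now appears in A's window region.    */
    a_pgdir[WINDOW_PDE] = b_pgdir[dest_pde];

    /* The window must only be reachable in kernel mode; a real kernel      */
    /* would also clear the user bit and flush the stale TLB entries here.  */
}
```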

  13. Process Structure • Each thread has its own kernel stack when running in kernel mode • Efficient since interrupts, page faults, and IPC already save state on the kernel stack • Continuations • Pro: • Reduce kernel stack space • Cons: • Require additional copies between kernel stack and continuation • Interfere with other optimizations

  14. Thread Control Blocks • Implemented as a large array in kernel space • Fast TCB access • Address = array base + (tcb number × tcb size) • Saves TLB misses during IPC • Kernel stacks of sender and receiver located in their TCB pages • Locking done by unmapping the TCB

  15. Algorithmic Level • Thread Identifier • Lazy Scheduling • Short Messages Via Registers

  16. Thread Identifier • Threads addressed by a 64-bit UID in user mode • Thread number in the lower 32 bits of the UID • AND with a bit mask, add to the TCB array base
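
A minimal sketch of this uid-to-TCB translation, assuming (purely for illustration) that the thread-number bits are already positioned so that masking yields a byte offset into the TCB array; the constants and struct name are hypothetical.

```c
#include <stdint.h>

#define TCB_ARRAY_BASE 0xE0000000u  /* assumed kernel virtual base of array   */
#define TCB_MASK       0x003FFC00u  /* assumed: thread no. << log2(tcb size)  */

struct tcb;                         /* thread control block, details elided   */

static inline struct tcb *uid_to_tcb(uint64_t uid)
{
    uint32_t low = (uint32_t)uid;   /* thread number lives in the low word    */
    return (struct tcb *)(TCB_ARRAY_BASE + (low & TCB_MASK));
}
```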

  17. Lazy Scheduling • IPC operations such as call or reply-and-receive-next conventionally require: • Delete sending thread from ready queue • Insert it into waiting queue • Delete receiving thread from waiting queue • Insert it into ready queue • Too many queue operations!

  18. Lazy Scheduling cont. • L3 queue invariants • Ready queue contains all ready threads • Waiting queue contains at least all waiting threads • The TCB holds the thread's actual state (ready/waiting) • The scheduler removes threads that no longer belong to a queue while parsing it
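
A minimal sketch of the lazy-removal idea, with hypothetical list and field names: an IPC operation only flips the state word in the TCB, and the scheduler unlinks stale entries when it walks the ready queue.

```c
#include <stddef.h>

enum thread_state { READY, WAITING };

struct tcb {
    enum thread_state state;
    struct tcb *next;          /* singly linked ready list, for brevity   */
};

/* Pick the next thread to run, unlinking any thread that is no longer
 * ready (it was left in the queue by a lazy IPC path).                   */
struct tcb *schedule(struct tcb **ready_head)
{
    struct tcb **pp = ready_head;
    while (*pp) {
        struct tcb *t = *pp;
        if (t->state == READY)
            return t;          /* first genuinely ready thread wins       */
        *pp = t->next;         /* stale entry: remove it now, lazily      */
        t->next = NULL;
    }
    return NULL;               /* nothing ready (idle)                    */
}
```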

  19. Short Messages Via Registers • A high proportion of messages are short • Ex. driver ack/error codes, hardware interrupts • 486 • 7 general registers • 3 needed for sender ID and result code • 4 available • 8-byte messages using a coding scheme
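
A minimal sketch, with a purely hypothetical encoding, of packing an up-to-8-byte message into two register-sized words so it can travel in the registers left free by the sender ID and result code.

```c
#include <stdint.h>
#include <string.h>

struct short_msg { uint32_t w0, w1; };     /* 8 bytes, two register-sized words */

struct short_msg pack_short(const void *buf, unsigned len)
{
    struct short_msg m = {0, 0};
    if (len > 8)
        len = 8;                            /* longer messages go through memory */
    memcpy(&m, buf, len);                   /* small enough to stay in registers */
    return m;
}
```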

  20. Interface Level • Simple RPC stubs • Load registers, system call, check success • Compiler generates stubs inline • Parameter Passing • Use registers when possible
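
A minimal sketch of such a stub, assuming a hypothetical ipc_call() kernel entry point and message protocol: load the message, trap into the kernel, check the result code. Generating the stub inline removes the extra call overhead.

```c
#include <stdint.h>

struct short_msg { uint32_t w0, w1; };

/* Assumed kernel entry point: returns 0 on success, nonzero error code.    */
extern int ipc_call(uint32_t dest, struct short_msg *in_out);

static inline int rpc_get_time(uint32_t time_server, uint32_t *seconds)
{
    struct short_msg m = { 1u, 0u };       /* hypothetical "get time" opcode  */
    int rc = ipc_call(time_server, &m);    /* trap into the kernel            */
    if (rc == 0)
        *seconds = m.w1;                   /* reply returned in word w1       */
    return rc;                             /* caller checks success           */
}
```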

  21. Coding Level • Reduce cache and TLB misses • Short kernel code • Short jumps, use registers, short address displacements • IPC kernel code fits in one page • Handle coprocessor save/restore lazily • Delayed until a different thread needs to use it
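
A minimal sketch of the lazy coprocessor handling, assuming hypothetical set_ts/clear_ts/fnsave/frstor wrappers and a simplified TCB: a context switch only sets the 486 TS flag, and the FPU state is saved/restored in the "device not available" trap, only when a different thread actually touches the FPU.

```c
struct fpu_state { unsigned char image[108]; };   /* assumed FNSAVE image size */

struct tcb { struct fpu_state fpu; /* other fields elided */ };

extern void set_ts(void);                     /* set CR0.TS: next FPU insn traps */
extern void clear_ts(void);                   /* clear CR0.TS                    */
extern void fnsave(struct fpu_state *s);      /* save current FPU state          */
extern void frstor(const struct fpu_state *s);/* restore FPU state               */

static struct tcb *fpu_owner;                 /* thread whose state is in the FPU */

void on_context_switch(void)
{
    set_ts();                                 /* cheap: no save/restore here      */
}

void on_coprocessor_trap(struct tcb *current)
{
    clear_ts();                               /* let the faulting FPU insn rerun  */
    if (fpu_owner == current)
        return;                               /* FPU already holds our state      */
    if (fpu_owner)
        fnsave(&fpu_owner->fpu);              /* evict the previous owner's state */
    frstor(&current->fpu);                    /* load the current thread's state  */
    fpu_owner = current;
}
```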

  22. Results • A 100% increase would mean the IPC time doubled • Removing all optimizations increases IPC time by 134% for an 8-byte message

  23. Results • L3 vs. Mach • System • Intel 486 DX-50 • 256 KB external cache • 16 MB memory

  24. Results cont.

  25. Conclusions • IPC improved by applying • Performance-based reasoning • Consideration of synergetic effects • Design from the architecture level down to the coding level

  26. References • [1] Microkernel, Wikipedia: http://en.wikipedia.org/wiki/Micro_kernel • [2] Jochen Liedtke, Improving IPC by Kernel Design, Proc. 14th ACM SOSP, 1993
