Improving IPC by Kernel Design
Jochen Liedtke
Shane Matthews, Portland State University
Summary
• Review of L3 and its IPC mechanism
• How performance was improved at each level:
  • Architectural level
  • Algorithmic level
  • Interface level
  • Coding level
Micro-kernels
• Minimal OS, providing a set of primitives used to implement thread/address-space management and IPC [1]
• Everything else is moved to user space (servers)
Terminology (L3)
• Dataspace
  • Memory object, mapped into an address space
• Task
  • Composed of threads, dataspaces, and an address space
• Message
  • Strings and/or memory objects
L3 Architecture & IPC • Active components communicate via messages • Applies to: • Device drivers • Implemented as user level tasks • Hardware Interrupts • Interrupt message from micro-kernel to thread
L3 Redesign Principles
• IPC performance is the master
• Security and functionality must not be sacrificed for performance
• Synergetic effects must be taken into consideration
  • (Think combined effects of several design decisions)
  • They may lead to reinforcement or diminution
• The design must aim at a concrete performance goal
  • Per short message transfer: 350 cycles (7 microseconds on a 50 MHz 486)
Architectural Level
• Messages
• Process structure
• Control blocks
Compound Messages
• Multiple send/receive operations -> one send/receive
• A message consists of direct strings, indirect strings, and memory objects
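One way to picture a compound message is as a single descriptor that carries the in-line (direct) data together with references to indirect strings and memory objects, so that one kernel call transfers all of them. This is a sketch with assumed field names, not L3's actual message format:

```c
/* Illustrative sketch of a compound-message descriptor (assumed names and
 * layout -- not L3's real message format).                                 */
#include <stddef.h>

struct string_ref {
    void   *start;    /* address of an indirect string in the sender space  */
    size_t  length;   /* its length in bytes                                 */
};

struct compound_msg {
    size_t            direct_len;   /* bytes of in-line (direct) data        */
    unsigned          n_strings;    /* number of indirect strings            */
    unsigned          n_objects;    /* number of memory objects to transfer  */
    struct string_ref strings[4];   /* indirect strings, copied by the kernel*/
    unsigned long     objects[4];   /* memory-object (dataspace) ids         */
    char              direct[64];   /* direct data, copied with the header   */
};
```

Because everything travels in one send/receive, the fixed per-IPC overhead (trap, address-space switch, scheduling) is paid once instead of once per item.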
Twofold Message Copy
• [A's space] -> [kernel] -> [B's space]
• Cost ≈ 20 + 0.75n cycles per copy, n := message length in bytes
• Good for small messages
• Need something better as n grows
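As a worked example under this cost model (assuming the ~20 + 0.75n figure is per copy): an 8-byte message costs roughly 20 + 0.75 × 8 ≈ 26 cycles per copy, so even the twofold copy stays near 50 cycles; a 4 KB message costs roughly 20 + 0.75 × 4096 ≈ 3,100 cycles per copy, and copying it twice wastes about 3,100 cycles compared with a single direct copy.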
LRPC and SRC RPC
• Client and server share a user-level memory buffer
• Sender writes the message into the shared buffer
• Problems:
  • When one server serves many clients, the shared regions of address space become critical resources
  • Shared regions require explicit opens (unlike L3)
  • The message can be changed by the sender during/after the receiver checks it
Direct Message Copy via Windows
• L3's method
• The destination region is mapped into a communication window
• The message is copied once, directly into the window
• Window:
  • One per address space
  • Accessed exclusively by the kernel
Communication Windows
• Problems:
  • Must be fast
  • Different threads coexist within one address space
• L3 implementation:
  • Copy one word (a page-directory entry) from B's page directory into A's
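A minimal sketch of the mechanism, assuming 486-style two-level paging and invented names: the kernel makes the 4 MB region of B's space that holds the destination visible inside A's space by copying one page-directory entry into the window slot, then copies the message exactly once.

```c
/* Simplified sketch of a direct copy through a kernel communication window.
 * Assumes 486-style two-level paging; pd_a / pd_b are the page directories
 * of sender A and receiver B, and WINDOW_BASE is the (hypothetical) start
 * of the kernel-only window reserved in every address space.               */
#include <string.h>

#define PDE_INDEX(va)    (((unsigned long)(va)) >> 22)     /* top 10 bits   */
#define PDE_OFFSET(va)   (((unsigned long)(va)) & 0x3fffffUL)
#define WINDOW_BASE      0xE0000000UL
#define WINDOW_PDE       PDE_INDEX(WINDOW_BASE)

void copy_via_window(unsigned long *pd_a, const unsigned long *pd_b,
                     const void *src_in_a, void *dst_in_b, size_t len)
{
    /* One word copied from B's page directory into A's: the 4 MB region of
     * B that contains the destination is now visible through A's window.   */
    pd_a[WINDOW_PDE] = pd_b[PDE_INDEX(dst_in_b)];

    /* The destination is reachable at WINDOW_BASE + offset, so the message
     * is copied exactly once, directly out of A's address space.           */
    void *dst = (void *)(WINDOW_BASE + PDE_OFFSET(dst_in_b));
    memcpy(dst, src_in_a, len);

    /* A real kernel must also flush stale TLB entries for the window range
     * (or rely on the next address-space switch) -- omitted here.          */
}
```

A real implementation must also handle destinations that cross the 4 MB boundary, one reason the actual window can span more than one page-directory entry.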
Process Structure
• Each thread has its own kernel stack while running in kernel mode
• Efficient, since interrupts, page faults, and IPC already save state on the kernel stack
• Continuations (the alternative):
  • Pro: reduce kernel stack space
  • Cons:
    • Require additional copies between kernel stack and continuation
    • Interfere with other optimizations
Thread Control Blocks (TCBs)
• Implemented as one large array in the kernel
• Fast TCB access: array base + thread number × TCB size
• Saves TLB misses during IPC
• Kernel stacks of sender and receiver are located in their TCB pages
• A TCB is locked by unmapping it
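A sketch of the layout (field names, sizes, and addresses are assumptions): the TCBs form one large virtually contiguous array, each TCB has a fixed power-of-two size, and the thread's kernel stack lives in the same page as its TCB, so one TLB entry covers both.

```c
/* Sketch of a TCB array with the kernel stack embedded in the TCB page
 * (sizes, field names, and addresses are assumptions, not L3's layout).   */
#define TCB_SIZE   1024U                       /* power of two, <= one page */
#define TCB_BASE   ((char *)0xFD000000UL)      /* virtual base of the array */

struct tcb {
    unsigned long long uid;          /* the thread's 64-bit unique id       */
    unsigned           state;        /* ready / waiting / ...               */
    struct tcb        *queue_next;   /* link for ready / waiting queues     */
    unsigned long      kernel_sp;    /* saved kernel stack pointer          */
    char               fpu_state[108];     /* FPU save area (fnsave format) */
    char               kernel_stack[512];  /* kernel stack shares the page  */
};

static inline struct tcb *tcb_of(unsigned thread_number)
{
    /* array base + thread number * TCB size: no table lookup needed, and
     * at most one TLB miss because the TCB and its stack share a page.     */
    return (struct tcb *)(TCB_BASE + (unsigned long)thread_number * TCB_SIZE);
}
```

Locking by unmapping then comes essentially for free: touching a locked (unmapped) TCB raises a page fault, which the kernel can use to serialize access without a separate lock word.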
Algorithmic Level
• Thread identifiers
• Lazy scheduling
• Short messages via registers
Thread Identifiers
• A thread is addressed by a 64-bit unique identifier (UID) in user mode
• The thread number is held in the lower 32 bits of the UID
• TCB lookup: AND the lower word with a bit mask, add the result to the TCB array base
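Continuing the TCB-array sketch above, the UID-to-TCB translation is then just a mask and an add. The mask value below is illustrative; the point is that the thread-number bits are placed in the UID so that the masked value is already a byte offset into the TCB array.

```c
/* Sketch: user-visible 64-bit UID -> kernel TCB address (mask, then add).
 * The thread-number bits are assumed to sit inside the UID so that masking
 * the lower word already yields a byte offset into the TCB array (i.e. the
 * thread number pre-multiplied by the TCB size); the mask is illustrative. */
#define TCB_OFFSET_MASK  0x003ffc00UL    /* thread no. << log2(TCB_SIZE)    */

static inline struct tcb *tcb_of_uid(unsigned long long uid)
{
    unsigned long offset = (unsigned long)uid & TCB_OFFSET_MASK;
    return (struct tcb *)(TCB_BASE + offset);
}
```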
Lazy Scheduling
• A conventional IPC operation ("call" or "reply & receive next") would:
  • Delete the sending thread from the ready queue
  • Insert it into the waiting queue
  • Delete the receiving thread from the waiting queue
  • Insert it into the ready queue
• Too many queue operations!
Lazy Scheduling cont.
• L3 relaxes the queue invariants:
  • The ready queue contains at least all ready threads (except possibly the current one)
  • The waiting queue contains at least all waiting threads
  • The thread's actual state (ready/waiting) is held in its TCB
• The scheduler removes all threads that no longer belong to a queue while parsing that queue
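A minimal sketch of lazy scheduling, using the TCB fields sketched earlier and invented helper names: the IPC path only updates the state fields and switches threads, while the scheduler cleans stale entries out of the ready queue the next time it walks it.

```c
/* Lazy scheduling sketch (invented names, using the struct tcb above).     */
enum { READY = 0, WAITING = 1 };

extern void switch_to(struct tcb *next);     /* context switch, elsewhere   */

/* IPC "call" / "reply & receive next": change states, switch -- but do not
 * touch any queue on this (frequent) path.                                 */
void ipc_switch(struct tcb *sender, struct tcb *receiver)
{
    sender->state   = WAITING;   /* may linger in the ready queue for now   */
    receiver->state = READY;     /* may be absent from the ready queue --   */
                                 /* allowed, since it becomes current       */
    switch_to(receiver);
}

/* The scheduler restores the invariants lazily: while parsing the ready
 * queue it simply unlinks every thread whose TCB says it is not ready.     */
struct tcb *schedule(struct tcb **ready_queue)
{
    struct tcb **link = ready_queue;
    while (*link) {
        struct tcb *t = *link;
        if (t->state != READY) {
            *link = t->queue_next;       /* lazy removal of a stale entry   */
            t->queue_next = 0;
        } else {
            return t;                    /* first genuinely ready thread    */
        }
    }
    return 0;                            /* nothing ready: idle             */
}
```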
Short Messages via Registers
• A high proportion of messages are short
  • E.g. driver ack/error replies, hardware-interrupt messages
• On the 486:
  • 7 general-purpose registers
  • 3 are needed for the sender ID and the result code
  • 4 remain available
• 8-byte messages can be transferred in registers using a coding scheme
Interface Level
• Simple RPC stubs
  • Load registers, system call, check success
  • The compiler generates the stubs inline
• Parameter passing
  • Use registers whenever possible
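Purely as an illustration (this is not L3's real system-call ABI): an inlined stub for a short RPC loads the destination and the 8-byte message into registers, traps into the kernel, and checks the result code, so the common case touches no message buffer in memory. `l3_short_call` below stands in for the trap with a fixed register assignment.

```c
/* Hypothetical inline stub for a short RPC (not L3's real ABI).            */
#include <stdint.h>

/* Stand-in for the kernel trap: destination id and two 32-bit message words
 * go in, result code and two reply words come out.  In reality this would
 * be a trap instruction with a fixed register assignment, emitted inline.  */
extern int l3_short_call(uint64_t dest, uint32_t w0, uint32_t w1,
                         uint32_t *r0, uint32_t *r1);

/* Example stub: ask a (hypothetical) clock server for the current time.    */
static inline int get_time(uint64_t clock_server, uint32_t *seconds)
{
    uint32_t reply0, reply1;
    /* Load registers, system call, check success -- all inlined into the
     * caller, so there is no extra call/return or message buffer.          */
    int rc = l3_short_call(clock_server, /* opcode */ 1u, 0u, &reply0, &reply1);
    if (rc == 0)
        *seconds = reply0;
    return rc;
}
```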
Coding Level
• Reduce cache and TLB misses
  • Short kernel code
  • Short jumps, register use, short address displacements
  • IPC kernel code fits in one page
• Handle save/restore of the coprocessor (FPU) lazily
  • Delayed until a different thread actually needs to use it
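The lazy coprocessor handling corresponds to the classic x86 technique: a thread switch only sets the TS bit in CR0, and the FPU state is saved and restored later, inside the resulting "device not available" trap, and only if the newly running thread actually executes an FPU instruction. A rough sketch with invented names, reusing the TCB sketch from above:

```c
/* Lazy FPU switching sketch (invented names; reuses the struct tcb above). */
#include <stddef.h>

extern void set_cr0_ts(void);     /* set the TS bit: next FPU insn traps    */
extern void clear_cr0_ts(void);   /* clear TS: FPU usable again             */
extern void fnsave(void *area);   /* save FPU state to memory               */
extern void frstor(void *area);   /* restore FPU state from memory          */

static struct tcb *fpu_owner;     /* thread whose state is in the FPU       */

void on_thread_switch(struct tcb *next)
{
    /* Do not save/restore the coprocessor here -- just arm the trap if the
     * incoming thread is not the current owner of the FPU state.           */
    if (next != fpu_owner)
        set_cr0_ts();
    else
        clear_cr0_ts();
}

/* "Device not available" (#NM) trap: the running thread has touched the
 * FPU for the first time since the switch, so pay the cost only now.       */
void trap_device_not_available(struct tcb *current)
{
    clear_cr0_ts();
    if (fpu_owner != NULL)
        fnsave(fpu_owner->fpu_state);   /* save the previous owner's state  */
    frstor(current->fpu_state);         /* load the current thread's state  */
    fpu_owner = current;
}
```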
Results
• Each optimization's effect is measured by how much IPC time increases when it is removed (100% would mean the time doubles)
• Removing all of the optimizations increases IPC time by 134% for an 8-byte message
Results: L3 vs. Mach
• Measurement system:
  • Intel 486 DX-50
  • 256 KB external cache
  • 16 MB memory
Results cont.
Conclusions
• IPC performance was improved by applying:
  • Performance-based reasoning
  • Attention to synergetic effects
  • Design work at every level, from architecture down to coding
References
• [1] Microkernel, Wikipedia: http://en.wikipedia.org/wiki/Micro_kernel
• [2] Jochen Liedtke. "Improving IPC by Kernel Design." SOSP 1993.