Exokernel: An Operating System Architecture for Application-Level Resource Management Dawson Engler, Frans Kaashoek, James O’Toole MIT Laboratory for Computer Science
Function of Traditional Kernel • Provides abstraction(s) of the hardware • Processes • Virtual Memory • File System • Provides Protection • Hardware • Kernel Itself • Users From Each Other
Motivation: A Database • I/O Abstraction: Cooked I/O • Operating System buffers I/O • Database Requirement • Cannot tell a Database user that transaction has committed until log pages have hit the surface of the disk • Database may need to sequence writes • Database better at predicting future I/O
The Ever Shrinking Kernel • Linux, Windows – VM, FS, ... • Microkernels – Fewer Abstractions: remove FS • Mach • L4 • Virtual Machines (VMM is between OS and hardware) -- Virtualization • DISCO • Xen • Exokernel -- Multiplexing • Aegis • XOK
Exokernel Architecture • [Figure: architecture diagram; surviving labels: Environments, Request, Revoke]
Securely Expose Hardware • Hardware: • Disks, Physical Memory, TLB, Frame Buffer, Network Access • Less Tangible Resources: • CPU Time Slices • Interrupts, Exceptions, Cross Domain Calls • DMA • Privileged Instructions • Exokernel Exports (read-only): • Free lists, cached TLB entries, disk arm positions
Exokernel Functions • Resource Allocation (Inter-environment) • Grant (or not) Resource Requests (policy set by the system administrator) • Process Release (Dealloc) Requests • Revoke Resources • Visible Revocation (Library OS may get to choose which to free) • Abort • Note: Usually some resources are exempt, e.g. page table memory • Track Resource Ownership • Guard all resource usage or binding points
Resource Allocation • Allocation (almost always explicit) • Alloc System Call • Deallocation • Dealloc System Call • Visible Revocation • E.g.: Loss of the CPU when a time slice expires: • Library OS must save required processor state • Abort Protocol • Break all existing secure bindings • Library OS gets a Repossession Exception, which includes a Repossession Vector
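The allocation, visible revocation, and abort steps on this slide can be sketched as a toy model. This is an illustration only, not Aegis or XOK code: the class `ToyExokernel`, its methods, and the `choose_victims` callback are hypothetical names, and pages stand in for any resource.

```python
class ToyExokernel:
    """Toy model of explicit allocation, visible revocation, and abort."""

    def __init__(self, num_pages):
        self.free = list(range(num_pages))  # free list (exported read-only in a real system)
        self.owner = {}                     # page -> owning environment

    def alloc(self, env):
        """Explicit allocation: hand out one page, or None if none are free."""
        if not self.free:
            return None
        page = self.free.pop(0)
        self.owner[page] = env
        return page

    def dealloc(self, env, page):
        """Explicit release; only the owner may free a page."""
        if self.owner.get(page) != env:
            raise PermissionError("environment does not own this page")
        del self.owner[page]
        self.free.append(page)

    def revoke(self, env, count, choose_victims):
        """Ask the library OS to give up `count` pages, then abort if it falls short.

        choose_victims(owned, count) is the library OS's (hypothetical) callback.
        Returns the repossession vector: the pages taken by force.
        """
        owned = sorted(p for p, e in self.owner.items() if e == env)
        count = min(count, len(owned))
        # Visible revocation: the library OS picks which pages to free.
        offered = []
        for page in choose_victims(owned, count):
            if page in owned and page not in offered and len(offered) < count:
                offered.append(page)
        for page in offered:
            self.dealloc(env, page)
        # Abort protocol: break bindings by force and report what was taken.
        shortfall = count - len(offered)
        repossessed = [p for p in owned if p not in offered][:shortfall]
        for page in repossessed:
            self.dealloc(env, page)
        return repossessed
```

A cooperative library OS gets an empty repossession vector; an uncooperative one learns from the vector which pages it lost and can repair its own data structures.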
Secure Bindings • Break up protection into bind and access • Can be implemented in: • Hardware • TLB • Frame Buffer Ownership Tag • Software • STLB • Downloading Code into ExoKernel • Dynamic Packet Filter
Examples • Physical Page • Bind: Get Exokernel to Load Mapping into TLB • Page allocation • Exokernel grants self-authenticating capability (R/W) • LibOS stores capability in Page Table • Passes capability and mapping on TLB write request • Access: LibOS/Application code uses TLB • Network Access • Bind: Download DPF (Dynamic Packet Filter) • Access: Exokernel runs DPF on every incoming packet • Sends packets to correct Environment
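The network example splits protection the same way: the check happens once at bind time, and delivery afterwards only runs the installed filters. A minimal sketch under stated assumptions: filters here are plain field/value matches, `ToyDemux` and its methods are hypothetical, and real DPF compiles and merges filters rather than scanning a list.

```python
class ToyDemux:
    """Toy packet demultiplexer: filters are checked at bind, run at access."""

    def __init__(self):
        self.filters = []  # list of (environment, {header field: required value})

    @staticmethod
    def _overlaps(a, b):
        # Two filters can match the same packet unless they disagree
        # on some header field that both of them test.
        return all(b[k] == v for k, v in a.items() if k in b)

    def bind(self, env, **fields):
        """Bind time: refuse a filter that could claim another environment's packets."""
        for other_env, other in self.filters:
            if other_env != env and self._overlaps(fields, other):
                return False
        self.filters.append((env, dict(fields)))
        return True

    def deliver(self, packet):
        """Access time: run the filters on an incoming packet, return the owner."""
        for env, fields in self.filters:
            if all(packet.get(k) == v for k, v in fields.items()):
                return env
        return None  # no environment has claimed this packet
```

For example, after `bind("web", proto="tcp", dst_port=80)`, a later `bind("thief", dst_port=80)` is refused because the two filters could match the same packet.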
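[Figure: worked example of page allocation and a TLB miss. emacs runs m = malloc(3000); ... strcpy(m, “The Ever Shrinking Kernel”); Layers shown: Library OS (page table with Virtual, Physical, and CAP columns; Req/Alloc against the free list), Exokernel (STLB, free list, capability check), Hardware (MIPS TLB miss; physical pages 0 to 5)]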
Downloading Code • Advantages: • Avoid Kernel Crossing • Executed when environment is not scheduled • Allowed because execution time is bounded • Specification • High Level Language • Individual DPF code can be merged • Safety by Language • C • Application Specific Handlers • Dynamic Message Vectoring • Message Initiation • Protection: SFI (Sandboxing), Infinite Loop??
TLB Miss in Aegis • Aegis checks if the mapping is in the STLB. If so, it loads the mapping into the TLB. • If the virtual address is one of the pinned pages, Aegis loads the mapping into the TLB. • Otherwise, the environment checks its page tables. An unmapped address is a segmentation fault. A mapped address yields the physical page and its associated capability. • Aegis checks the capability. If valid, it loads the mapping into the TLB. • Control returns to the environment.
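The miss-handling steps above can be sketched as a small simulation. Everything here is an assumption for illustration: `ToyAegis` and `ToyLibOS` are hypothetical names, and the capability is modeled as an HMAC over the page number and rights, which is one way to make a capability self-authenticating; the slides do not say how Aegis implements it.

```python
import hashlib
import hmac


class ToyLibOS:
    """Library OS side: an application-managed page table."""

    def __init__(self):
        self.page_table = {}  # vpage -> (ppage, rights, capability)

    def map(self, vpage, ppage, rights, cap):
        self.page_table[vpage] = (ppage, rights, cap)

    def lookup(self, vpage):
        return self.page_table.get(vpage)  # None means segmentation fault


class ToyAegis:
    """Kernel side: TLB, software TLB (STLB), pinned mappings, capability check."""

    def __init__(self, secret=b"kernel-secret", pinned=None):
        self._secret = secret
        self.tlb = {}                     # hardware TLB: vpage -> ppage
        self.stlb = {}                    # larger software cache of checked mappings
        self.pinned = dict(pinned or {})  # guaranteed mappings

    def grant(self, ppage, rights):
        """Mint a capability (assumed scheme: HMAC only the kernel can compute)."""
        msg = f"{ppage}:{rights}".encode()
        return hmac.new(self._secret, msg, hashlib.sha256).hexdigest()

    def tlb_miss(self, vpage, env):
        """Resolve a miss; returns the name of the path that handled it."""
        if vpage in self.stlb:                      # step 1: STLB hit
            self.tlb[vpage] = self.stlb[vpage]
            return "stlb"
        if vpage in self.pinned:                    # step 2: pinned page
            self.tlb[vpage] = self.pinned[vpage]
            return "pinned"
        entry = env.lookup(vpage)                   # step 3: upcall to the library OS
        if entry is None:
            return "segfault"
        ppage, rights, cap = entry
        if not hmac.compare_digest(self.grant(ppage, rights), cap):
            return "bad-capability"                 # step 4: capability check failed
        self.tlb[vpage] = self.stlb[vpage] = ppage  # install in TLB and STLB
        return "page-table"
```

The point of the split is that the page table lives in untrusted code, yet a forged entry cannot reach the TLB because the kernel re-checks the capability before installing the mapping.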
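Protected Control Transfer • Two Properties • Operation is Atomic • No overwrite of environment-visible registers • Together these allow registers to be used to pass messages • Acall (asynchronous) • Donates the remainder of the current time slice • Scall (synchronous) • Donates the current time slice and all future instantiations of it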
Performance Summary • Microbenchmarks: roughly 10X faster • Cheetah web server (XOK): up to 8X faster
Persistent Storage • Disk Block Shadowing • Disk Block tag • Low level metadata language • Untrusted Deterministic Function
Persistent Storage • [Figure: persistent storage diagram; surviving labels: PhD Thesis, emacs, ExOS Library OS (twice), XOK, crash, Disk]
Conclusions • Microbenchmarks and number of kernel crossings are not critical • Power (e.g. downloaded code) is the critical factor • Top Down vs. Bottom Up • Encourages Innovation • Writing an OS is like writing a compiler • Operating System is Untrusted • Untrusted Code Evolves Faster than Trusted
…and Caveats • Hardware Specific: MIPS vs. 486 • Persistent Storage is Complex • Multi-CPU and scalability?? • Are all of the DISCO tricks available here??
Additional References • Application Performance and Flexibility on Exokernel Systems, Frans Kaashoek, Dawson Engler, Gregory Ganger et al. • Pdos.csail.mit.edu/exo/exo-slides/sld001.htm
Overriding Abstractions • OS Extensions • How to override generic abstractions implemented in protected kernel, with better application specific abstractions in user space • Even if possible, won’t be efficient