Presenter: Phillip Sitbon

Exokernel: An Operating System Architecture for Application-Level Management
Dawson R. Engler, M. Frans Kaashoek, and James O’Toole Jr.
M.I.T. Laboratory for Computer Science
Spring 2006

Motivation
• Interfaces between the OS and applications are typically hard-coded as high-level abstractions
  • Processes, files, address spaces, IPC, etc.
• These abstractions define a kind of virtual machine for applications
• Hard-coding them into the kernel means generality comes at the expense of flexibility and optimization opportunities
• Part of the motivation for SPIN as well

Motivation
• Whether the kernel is a microkernel or monolithic, abstractions must be used system-wide
  • All applications use the same kernel or user-level server features
• There is no single “best way” to implement an abstraction for all applications
• Abstractions hide information
  • Upcalls have been used to address this
• New features that break abstractions are hard to add, because all applications depend on them

Solution
• Allow abstractions to be implemented at the application (non-kernel) level
• Specialization becomes a matter of using different libraries
• Do all the work in library operating systems that communicate with the exokernel
• Do secure multiplexing in the exokernel so that the users of resources don’t have to

Design

Design
• Securely exposed hardware resources
  • Low-level primitives to access hardware as directly as possible
  • Includes less tangible resources, such as interrupts and privileged instructions
• Exposed allocation
  • Library OSes can request specific physical resources
  • The library OS participates in every allocation decision
• Exposed names
  • Physical names for resources and bookkeeping structures such as free lists, disk arm positions, etc.
• Visible revocation
  • Lets library OSes choose which resources to relinquish, and lets the exokernel reclaim resources gracefully
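To make exposed allocation and physical names concrete, here is a minimal sketch assuming a toy kernel that exports its free-page list and grants exactly the frame a library OS asks for. The names `ToyExokernel`, `alloc_page`, and `pick_page_with_color` are hypothetical, and page coloring (with color taken as `frame % num_colors`) is only an illustrative policy that a library OS could implement once frames are named physically.

```python
# Hypothetical toy model: the kernel exposes physical page names (frame numbers)
# and its free list; the library OS decides which frame it wants.


class ToyExokernel:
    def __init__(self, num_pages):
        self.free = [True] * num_pages   # exposed free list, readable by library OSes
        self.owner = [None] * num_pages  # which environment owns each frame

    def alloc_page(self, env_id, page_no):
        """Exposed allocation: grant exactly the requested frame, or refuse."""
        if not self.free[page_no]:
            return False
        self.free[page_no] = False
        self.owner[page_no] = env_id
        return True


def pick_page_with_color(kernel, color, num_colors):
    """Library-OS policy (illustrative assumption): free frame with a given cache color."""
    for page_no, is_free in enumerate(kernel.free):
        if is_free and page_no % num_colors == color:
            return page_no
    return None


kernel = ToyExokernel(num_pages=16)
page = pick_page_with_color(kernel, color=3, num_colors=4)  # frame 3
kernel.alloc_page(env_id=1, page_no=page)                     # True
```

The policy lives entirely in the library OS; the kernel only checks that the named frame is free.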

Secure Bindings
• Protection mechanism that decouples authorization from the actual use of a resource
  • Allows access checks to be implemented efficiently at access time
  • Achieved by performing the actual authorization at bind time
  • Lets the kernel protect resources without understanding them
• Implemented by
  • Hardware: the kernel sets the appropriate access rights (e.g., TLB entries)
  • Software caching: e.g., a software TLB
  • Downloading code: code runs in the kernel when events occur
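The split between bind time and access time can be sketched as a software-TLB-like table. This is a minimal illustration, not Aegis code: `SecureBindingTable`, its methods, and the string capabilities are hypothetical. The expensive capability check runs once in `bind`; `access` is a single lookup that needs no knowledge of what the page contains.

```python
class SecureBindingTable:
    """Hypothetical toy model: authorize once at bind time, check cheaply at access time."""

    def __init__(self, page_caps):
        self.page_caps = page_caps  # physical page -> capability required to bind it
        self.bindings = {}          # (env_id, vpage) -> (ppage, writable)

    def bind(self, env_id, vpage, ppage, cap, writable):
        # Bind time: the (potentially expensive) authorization happens only here.
        if self.page_caps.get(ppage) != cap:
            raise PermissionError("capability does not authorize this page")
        self.bindings[(env_id, vpage)] = (ppage, writable)

    def access(self, env_id, vpage, write=False):
        # Access time: one lookup plus a permission bit, no resource semantics.
        entry = self.bindings.get((env_id, vpage))
        if entry is None:
            raise LookupError("no binding (would be reported to the library OS)")
        ppage, writable = entry
        if write and not writable:
            raise PermissionError("write through a read-only binding")
        return ppage
```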

Multiplexing Memory
• Library OSes specify capabilities when allocating physical pages
  • The allocating library OS becomes the owner and can set read/write access
  • Bind-time authorization
• Capabilities are presented to the kernel on access
  • An example of low-overhead access through a secure binding
• Applications can grant rights to others without kernel intervention
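A minimal sketch of page capabilities, assuming random tokens (from Python's `secrets` module) stand in for Aegis capabilities; `PhysicalMemory`, `allocate`, and `check` are hypothetical names. It shows the two properties on the slide: capabilities are recorded at allocation, and sharing is just handing someone a token, with no kernel call.

```python
import hmac
import secrets


class PhysicalMemory:
    """Hypothetical model: capabilities are recorded when a page is allocated."""

    def __init__(self):
        self.pages = {}  # frame -> (owner, read_cap, write_cap)

    def allocate(self, env_id, frame, read_cap, write_cap):
        if frame in self.pages:
            return False
        self.pages[frame] = (env_id, read_cap, write_cap)
        return True

    def check(self, frame, cap, write=False):
        """Access-time check: compare the presented capability with the recorded one."""
        entry = self.pages.get(frame)
        if entry is None:
            return False
        _, read_cap, write_cap = entry
        needed = write_cap if write else read_cap
        return hmac.compare_digest(cap, needed)  # constant-time string comparison


mem = PhysicalMemory()
read_cap, write_cap = secrets.token_hex(16), secrets.token_hex(16)
mem.allocate(env_id=1, frame=7, read_cap=read_cap, write_cap=write_cap)
# Granting read access to another environment is just handing it `read_cap`;
# the kernel is not involved until that environment presents it.
can_read = mem.check(7, read_cap)               # True
can_write = mem.check(7, read_cap, write=True)  # False
```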

Multiplexing the Network
• Protocol-specific knowledge is required to determine a message’s recipient
• Some hardware support exists, e.g., binding virtual ATM circuits securely to applications
• Aegis/ExOS uses only packet filters
  • Filter rules are written in a special language and compiled to machine code
  • Applications specify their own packet filters
  • Basic checks prevent one application from “stealing” another’s packets
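The demultiplexing idea can be sketched with exact-match byte filters. This is an illustrative assumption, not Aegis's filter language: a filter here is a dict `{offset: byte}`, and the sketch interprets filters rather than compiling them to machine code. The anti-stealing check rejects a new filter if some packet could match it and another environment's filter, which for exact-match filters happens exactly when they agree on every shared offset.

```python
def filters_overlap(a, b):
    """Two exact-match filters can accept the same packet iff they agree on every shared offset."""
    return all(a[off] == b[off] for off in a.keys() & b.keys())


class PacketDemux:
    """Hypothetical kernel-side demultiplexer built from application-supplied filters."""

    def __init__(self):
        self.filters = []  # list of (env_id, {offset: byte})

    def install(self, env_id, flt):
        for other_env, other in self.filters:
            if other_env != env_id and filters_overlap(flt, other):
                return False  # would let env_id "steal" packets meant for other_env
        self.filters.append((env_id, flt))
        return True

    def deliver(self, packet):
        """Return the environment whose filter accepts the packet, or None."""
        for env_id, flt in self.filters:
            if all(off < len(packet) and packet[off] == val for off, val in flt.items()):
                return env_id
        return None


demux = PacketDemux()
demux.install(1, {0: 0x08, 1: 0x00, 2: 0x01})   # True
demux.install(2, {0: 0x08, 1: 0x00, 2: 0x02})   # True: disjoint from env 1
demux.install(3, {0: 0x08})                     # False: would also match env 1's packets
demux.deliver(bytes([0x08, 0x00, 0x02, 0x63]))  # 2
```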

Downloading Code
• Eliminates kernel crossings
• Downloaded code can run when the application is not scheduled
• Bounded runtime, for situations where context switches aren’t a good idea
• Packet filters: without downloading, each potential packet consumer would have to be scheduled
• Downloaded code can also react to events: ASHes
• Mostly run as machine-readable object code, but the instruction set is extended in a few places
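One way to guarantee bounded runtime for downloaded code is to run it under an instruction budget. The sketch below assumes a tiny, entirely hypothetical stack bytecode (`push`, `add`, `jmp`, `halt`) and aborts once the budget is spent; it illustrates the bounding idea only and is not how Aegis executes downloaded code.

```python
class BudgetExceeded(Exception):
    """Raised when downloaded code runs longer than the kernel allows."""


def run_downloaded(program, budget):
    """Interpret a hypothetical bytecode program with a hard instruction budget."""
    stack, pc, steps = [], 0, 0
    while True:
        if steps >= budget:
            raise BudgetExceeded(f"aborted after {steps} instructions")
        steps += 1
        op = program[pc]
        if op[0] == "push":
            stack.append(op[1])
            pc += 1
        elif op[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        elif op[0] == "halt":
            return stack[-1] if stack else None
        else:
            raise ValueError(f"illegal instruction {op[0]!r}")


result = run_downloaded([("push", 2), ("push", 3), ("add",), ("halt",)], budget=10)  # 5
```

An infinite loop such as `[("jmp", 0)]` is cut off with `BudgetExceeded` instead of holding the CPU.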

Visible Resource Revocation
• There must be a way to break secure bindings
• Notifying library OSes lets them take application-specific action
  • For example, the processor is revoked at the end of a time slice; the library OS can react by saving only the registers it needs
• The exokernel must also be able to take resources by force
  • After some time, revocation requests become imperative
  • Secure bindings are then broken forcefully
  • The offending library OS can update its state and recover when it receives a repossession exception
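The cooperative-then-forced revocation protocol can be sketched as follows. All names (`RevocationManager`, `request_frame`, `tick`) and the deadline measured in ticks are hypothetical, and the kernel's choice of victim (the lowest-numbered frame) is arbitrary for the sketch. The library OS first gets a chance to pick which frame to give up; only after the grace period does the kernel break a binding itself and record it so the library OS can recover.

```python
class RevocationManager:
    """Hypothetical model of visible revocation with a forced fallback."""

    def __init__(self, grace_ticks):
        self.grace_ticks = grace_ticks
        self.owned = {}        # env_id -> set of frames
        self.deadlines = {}    # env_id -> tick by which a frame must be returned
        self.repossessed = {}  # env_id -> frames taken by force (repossession vector)

    def grant(self, env_id, frame):
        self.owned.setdefault(env_id, set()).add(frame)

    def request_frame(self, env_id, now):
        """Visible revocation: ask first, and let the library OS choose which frame."""
        self.deadlines.setdefault(env_id, now + self.grace_ticks)

    def release(self, env_id, frame):
        """Cooperative path: the library OS gives back a frame of its choosing."""
        frames = self.owned.get(env_id, set())
        if frame in frames:
            frames.remove(frame)
            self.deadlines.pop(env_id, None)

    def tick(self, now):
        """Forced path: once the deadline passes, break a binding without asking."""
        for env_id, deadline in list(self.deadlines.items()):
            if now < deadline:
                continue
            frames = self.owned.get(env_id, set())
            if frames:
                victim = min(frames)  # arbitrary kernel choice in this sketch
                frames.remove(victim)
                # Recorded so the library OS can recover on its repossession exception.
                self.repossessed.setdefault(env_id, []).append(victim)
            del self.deadlines[env_id]
```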

Aegis
• Uses a set of primitives as pseudo-instructions that encapsulate privileged instructions
• Applications are responsible for general-purpose context switching
  • Gives applications full control
• Fairness is achieved by tracking time usage
  • Excess time is paid for by taking time away from subsequent time slices
  • Too much excess causes the environment to be destroyed
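The fairness rule can be sketched as a simple per-environment ledger. This is an assumption-laden model, not Aegis's scheduler: `TimeSliceLedger`, the microsecond units, and the `max_debt_us` threshold are hypothetical. Each slice entitles the environment to `slice_us`; any overrun becomes debt that shrinks later slices, and too much debt destroys the environment.

```python
class TimeSliceLedger:
    """Hypothetical model: overruns are repaid from later slices."""

    def __init__(self, slice_us, max_debt_us):
        self.slice_us = slice_us
        self.max_debt_us = max_debt_us
        self.debt_us = 0
        self.destroyed = False

    def charge(self, used_us):
        """Record one slice's usage; return the time available in the next slice."""
        if self.destroyed:
            raise RuntimeError("environment already destroyed")
        # Each slice entitles the environment to slice_us; anything beyond that is debt.
        self.debt_us = max(0, self.debt_us + used_us - self.slice_us)
        if self.debt_us > self.max_debt_us:
            self.destroyed = True  # too much excess: the environment is torn down
            return 0
        return max(0, self.slice_us - self.debt_us)


ledger = TimeSliceLedger(slice_us=100, max_debt_us=150)
next_slice = ledger.charge(130)  # 70: the 30 us overrun is taken from the next slice
```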

Aegis
• Four kinds of events
  • Exceptions
  • Interrupts
  • Protected entry
  • Addressing
• Each event has an associated context and resource
• Exceptions are given to applications so they can resume execution
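A minimal sketch of vectoring events to the library OS, assuming each environment registers one handler per event kind; `Event`, `Environment`, and `dispatch` are hypothetical names, and the 4-byte instruction skip in the usage is an invented library-OS policy. The point is that the kernel does not interpret the exception; the application decides where execution resumes.

```python
from enum import Enum


class Event(Enum):
    EXCEPTION = "exception"
    INTERRUPT = "interrupt"
    PROTECTED_ENTRY = "protected_entry"
    ADDRESSING = "addressing"


class Environment:
    """Hypothetical per-application environment holding library-OS event handlers."""

    def __init__(self, name):
        self.name = name
        self.handlers = {}  # Event -> callable(context) supplied by the library OS

    def register(self, event, handler):
        self.handlers[event] = handler


def dispatch(env, event, context):
    """Kernel side: vector the event straight to the library OS handler."""
    handler = env.handlers.get(event)
    if handler is None:
        raise RuntimeError(f"{env.name} has no {event.value} handler")
    return handler(context)


env = Environment("libos-1")
# The library OS decides what an exception means; here it skips the faulting
# instruction (assumed to be 4 bytes long) and returns the PC to resume at.
env.register(Event.EXCEPTION, lambda ctx: ctx["pc"] + 4)
resume_pc = dispatch(env, Event.EXCEPTION, {"pc": 100})  # 104
```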

Aegis
• An application’s virtual address space is split into two segments
  • The first holds normal data and code
  • The second can be pinned using guaranteed mappings (it holds exception-handling code and page tables)
• On a TLB miss:
  • Guaranteed mappings are handled automatically
  • Otherwise, the application looks up the address in its own page table
  • Aegis checks the application’s capability to determine access
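The TLB-miss path can be sketched as below, assuming a 4 KB page size and a library-OS lookup function that returns a `(frame, capability)` pair; `ToyTLB`, `miss`, and the string capabilities are hypothetical. Guaranteed mappings are installed by the kernel directly; anything else goes through the application's own page table, and the kernel only verifies the capability before installing the entry.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class ToyTLB:
    """Hypothetical model of Aegis-style TLB-miss handling."""

    def __init__(self, guaranteed, owner_caps):
        self.entries = {}             # vpn -> ppn currently installed
        self.guaranteed = guaranteed  # vpn -> ppn pinned mappings handled by the kernel
        self.owner_caps = owner_caps  # ppn -> capability required to map it

    def miss(self, vaddr, lib_os_lookup):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.guaranteed:
            # Guaranteed mapping: installed without involving the library OS.
            ppn = self.guaranteed[vpn]
        else:
            # The library OS walks its own page table (any structure it likes).
            ppn, cap = lib_os_lookup(vpn)
            # The kernel checks the presented capability before installing.
            if self.owner_caps.get(ppn) != cap:
                raise PermissionError(f"capability does not cover frame {ppn}")
        self.entries[vpn] = ppn
        return ppn * PAGE_SIZE + offset


tlb = ToyTLB(guaranteed={0: 7}, owner_caps={9: "cap-9"})
app_page_table = {3: (9, "cap-9")}
paddr = tlb.miss(3 * PAGE_SIZE + 5, app_page_table.__getitem__)  # 9 * 4096 + 5
```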

Protected Control Transfers
• For efficient implementation of IPC
• The caller’s time slice is donated to the callee
  • Synchronous and asynchronous variants
  • Asynchronous: donates only the remainder of the current time slice to the callee
  • Synchronous: donates all of the caller’s time slices until control returns to the caller
• Minimal requirements allow high performance
• Result: 6.6x faster than scaled L3 results
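The difference between the two donation rules can be sketched with a round-robin vector of slice owners. `PCTScheduler`, `call`, `ret`, and `who_runs` are hypothetical names and this is a scheduling model only, not the transfer mechanism itself. Both variants hand the callee the rest of the current slice; only a synchronous call also redirects the caller's future slices until the callee returns.

```python
class PCTScheduler:
    """Hypothetical model of time-slice donation in protected control transfers."""

    def __init__(self, slots):
        self.slots = slots          # round-robin vector: owner of each time slice
        self.sync_donations = {}    # caller -> callee while a synchronous call is outstanding

    def call(self, caller, callee, remaining_us, synchronous):
        """Transfer control now; return (who runs, time left in the current slice)."""
        if synchronous:
            self.sync_donations[caller] = callee
        # Both variants give the callee what is left of the caller's current slice.
        return callee, remaining_us

    def ret(self, caller):
        """The callee returns control; the caller gets its future slices back."""
        self.sync_donations.pop(caller, None)

    def who_runs(self, slot_index):
        owner = self.slots[slot_index % len(self.slots)]
        seen = set()
        # Follow chains of synchronous calls (A calls B, B calls C, ...).
        while owner in self.sync_donations and owner not in seen:
            seen.add(owner)
            owner = self.sync_donations[owner]
        return owner


sched = PCTScheduler(["A", "B", "A", "C"])
sched.call("A", "B", 250, synchronous=False)  # ("B", 250); A keeps its later slices
```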

ASH
• Application-specific safe handlers can perform general computations
• Allow reaction to messages without leaving the kernel
• Untrusted; made safe by sandboxing and code inspection
• Allow for:
  • Direct, dynamic message vectoring (eliminates copying)
  • Integrated processing (e.g., checksumming)
  • Message and control initiation
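Dynamic message vectoring and integrated processing can be combined in a single pass over the message. The sketch assumes a hypothetical 1-byte header that names the destination buffer, and uses the 16-bit ones'-complement Internet checksum as the integrated computation; `ash_receive` is an illustrative name, and a real ASH would run in the kernel under sandboxing rather than as ordinary Python. The payload is copied exactly once, straight to its final destination, while the checksum is computed along the way.

```python
def ash_receive(packet, buffers):
    """Hypothetical ASH body: vector the payload and checksum it in one pass."""
    dest = buffers[packet[0]]  # assumed 1-byte header naming the destination buffer
    payload = packet[1:]
    total = 0
    for i in range(0, len(payload), 2):
        chunk = payload[i:i + 2]
        dest.extend(chunk)  # message vectoring: copy straight into its final place
        # Big-endian 16-bit word; an odd trailing byte is padded with zero.
        word = (chunk[0] << 8) | (chunk[1] if len(chunk) == 2 else 0)
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry (ones' complement)
    return ~total & 0xFFFF


buffers = [bytearray()]
csum = ash_receive(bytes([0]) + bytes.fromhex("0001f203f4f5f6f7"), buffers)  # 0x220D
```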

Performance with ASH

Conclusion
• A different angle: export the hardware interface rather than virtualize it
• Supports specialized OSes (like Disco)
• Packet filtering for network multiplexing (like Xen)
• More efficient than virtual machines
• The only real abstractions are in the library operating systems
• Shows substantial improvement while still allowing traditional abstractions; the modifications required are trivial
