CS 5204 OPERATING SYSTEMS Michael M. Swift, Brian N. Bershad and Henry M. Levy Improving the Reliability of Commodity Operating Systems Presented by: Suraj Menon
RELIABILITY!!! • Reliability refers to the degree of tolerance against errors and component failures in a system. [Stankovic 1984]. • Motivation to achieve reliable systems: • Crucial but unsolved problem. • Increasing cost of failures. • Increasing number of OS extensions. • Extensions cause most of the problems.
RELIABILITY??? • Problems persist in spite of having highly reliable Core OS kernels • Demand for reliability solution: • BACKWARD COMPATIBLE • EFFICIENT
……………..NOOKS!!!! • Different approaches to improve reliability: • Capability architectures and ring and segment architectures • Transaction based systems • Software fault isolation • Virtual machine technologies • Recursive recovery • NOOKS!!!
NOOKS!!! • Just an isolation service!!! • Mission: To reduce crashes from device drivers and other extensions
NOOKS • Relies on: • Conventional processor architecture • Conventional programming language • Conventional Operating systems architecture • Existing extensions
NOOKS • IT MAKES THE SYSTEM FAULT RESISTANT • BUT NOT FAULT TOLERANT • IT TRUSTS THE EXTENSIONS TO AN EXTENT (BUGGY BUT NOT MALICIOUS)
Goals of NOOKS • ISOLATION • RECOVERY • BACKWARD COMPATIBILITY
OPERATIONS WITHIN NOOKS • Isolation • Interposition • Object Tracking • Recovery
ISOLATION • Memory Management - implement lightweight protection domains with virtual memory protection
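As a rough illustration of lightweight protection domains, the sketch below simulates page-level write protection in user space. It assumes that an extension may read kernel pages but write only to its own pages; that policy, and the names `Domain`, `Memory`, and `ProtectionFault`, are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass, field


class ProtectionFault(Exception):
    """Raised when a domain writes to a page it is not allowed to modify."""


@dataclass
class Domain:
    name: str
    writable_pages: set = field(default_factory=set)


class Memory:
    def __init__(self):
        self.pages = {}  # page number -> stored value

    def write(self, domain, page, value):
        # A write succeeds only if the page is writable in this domain.
        if page not in domain.writable_pages:
            raise ProtectionFault(f"{domain.name} cannot write page {page}")
        self.pages[page] = value

    def read(self, domain, page):
        # Assumption: every domain may read every page.
        return self.pages.get(page)


KERNEL_PAGES = {0, 1}
EXTENSION_PAGES = {2}

kernel = Domain("kernel", KERNEL_PAGES | EXTENSION_PAGES)
driver = Domain("driver", set(EXTENSION_PAGES))

mem = Memory()
mem.write(kernel, 0, "kernel data")
mem.write(driver, 2, "driver data")

try:
    mem.write(driver, 0, "corrupted")  # a stray write from a buggy driver
    fault_caught = False
except ProtectionFault:
    fault_caught = True
```

In this sketch the stray write is stopped before it reaches the kernel page, which is the property the protection domains are meant to provide.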
ISOLATION • Extension Procedure Call (XPC) - safe control transfer between extensions and the kernel - transparency (achieved by WRAPPERS) • Deferred call mechanism - extension domain queue - kernel domain queue
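The two mechanisms on this slide can be sketched as a toy user-space model: an `xpc` helper that switches the current domain for the duration of a call, and a per-domain queue that batches deferred calls until they are flushed. All names here (`Domain`, `xpc`, `defer`, `flush`) are hypothetical, and the model ignores the stack and page-table switching a real implementation would need.

```python
from collections import deque


class Domain:
    def __init__(self, name):
        self.name = name
        self.deferred = deque()  # queued (function, args) pairs


current_domain = None  # the domain that is executing right now
trace = []             # records which domain each function ran in


def xpc(target, func, *args):
    """Switch to the target domain, run func, and always switch back."""
    global current_domain
    caller = current_domain
    current_domain = target
    try:
        return func(*args)
    finally:
        current_domain = caller


def defer(target, func, *args):
    """Queue a call for the target domain instead of crossing over now."""
    target.deferred.append((func, args))


def flush(target):
    """Run every queued call in the target domain, in FIFO order."""
    results = []
    while target.deferred:
        func, args = target.deferred.popleft()
        results.append(xpc(target, func, *args))
    return results


kernel = Domain("kernel")
extension = Domain("extension")
current_domain = kernel


def driver_read(nbytes):
    trace.append(("driver_read", current_domain.name))
    return nbytes * 2


def count_packet(seq):
    trace.append(("count_packet", current_domain.name))
    return seq


result = xpc(extension, driver_read, 21)

defer(kernel, count_packet, 1)
defer(kernel, count_packet, 2)
queued_before = len(kernel.deferred)
flushed = flush(kernel)
```

Batching calls in a queue trades latency for fewer domain crossings, which is one plausible reason to offer a deferred path alongside the synchronous one.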
INTERPOSITION • Transparent integration of extensions with the kernel • Ensures: - Kernel-to-extension and extension-to-kernel control flow goes through the XPC mechanism only - The object tracker monitors and manages all data transfer between kernel and extension • A set of wrapper stubs forms the interface between the kernel, the Nooks Isolation Manager, and the extension.
WRAPPER STUBS • Basic tasks: • Check parameters for validity • Object tracker code within the wrapper implements call-by-value-result semantics for XPC • Perform an XPC into the kernel or extension to execute the desired function
WRAPPER STUBS • Two types of wrappers - kernel wrappers - extension wrappers • Writing a wrapper body is a one-time task needed to support the kernel-extension interface of a specific OS • Wrapper code sharing!!!!
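The three wrapper tasks (validate parameters, apply call-by-value-result, cross into the other domain) can be sketched as follows. This is a minimal user-space model, not the actual wrapper code: `wrap`, `run_in_extension`, and the packet dictionaries are hypothetical, and `run_in_extension` is only a stand-in for an XPC.

```python
import copy


class InvalidParameter(Exception):
    """Raised by a wrapper when an argument fails its validity check."""


def run_in_extension(func, *args):
    # Stand-in for an XPC; a real system would switch domains here.
    return func(*args)


def wrap(func, validate):
    """Build a wrapper stub around func."""
    def stub(obj):
        if not validate(obj):
            raise InvalidParameter(f"rejected argument: {obj!r}")
        private = copy.deepcopy(obj)           # copy in (by value)
        ret = run_in_extension(func, private)  # callee sees only the copy
        obj.clear()                            # copy out (by result),
        obj.update(private)                    # reached only on success
        return ret
    return stub


def is_valid_packet(pkt):
    return isinstance(pkt, dict) and isinstance(pkt.get("len"), int) and pkt["len"] >= 0


def driver_send(pkt):
    pkt["sent"] = True
    return pkt["len"]


def buggy_send(pkt):
    pkt["len"] = -1  # corrupts its private copy, then fails
    raise RuntimeError("driver bug")


safe_send = wrap(driver_send, is_valid_packet)
safe_buggy = wrap(buggy_send, is_valid_packet)

good = {"len": 64}
sent_len = safe_send(good)

victim = {"len": 64}
try:
    safe_buggy(victim)
except RuntimeError:
    pass  # the kernel's object was never touched

try:
    safe_send({"len": -5})
    rejected = False
except InvalidParameter:
    rejected = True
```

Because results are copied back only after the call returns normally, a failing callee leaves the caller's object unchanged in this sketch.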
OBJECT TRACKING • Manages how extensions manipulate kernel objects • Basic tasks: • Records all objects that are used by an extension • Records associations between the kernel and extension versions of each object
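A minimal sketch of the two bookkeeping tasks, assuming a simple per-extension table keyed by a kernel object identifier. The `ObjectTracker` class, its method names, and the string identifiers are hypothetical illustrations.

```python
class ObjectTracker:
    def __init__(self):
        # extension name -> {kernel object id -> extension-side copy}
        self._by_extension = {}

    def register(self, ext, kernel_id, ext_copy):
        """Record that ext uses kernel_id and remember its private version."""
        self._by_extension.setdefault(ext, {})[kernel_id] = ext_copy

    def lookup(self, ext, kernel_id):
        """Return the extension's version of a kernel object, or None."""
        return self._by_extension.get(ext, {}).get(kernel_id)

    def objects_of(self, ext):
        """List every kernel object the extension currently uses."""
        return sorted(self._by_extension.get(ext, {}))

    def release_all(self, ext):
        """Forget everything tracked for ext and report what was dropped."""
        return sorted(self._by_extension.pop(ext, {}))


tracker = ObjectTracker()
tracker.register("sound_driver", "buffer:1", {"size": 4096})
tracker.register("sound_driver", "timer:7", {"period_ms": 10})
tracker.register("net_driver", "buffer:2", {"size": 1500})

in_use = tracker.objects_of("sound_driver")
copy_of_timer = tracker.lookup("sound_driver", "timer:7")
dropped = tracker.release_all("sound_driver")
```

Keeping the table per extension means that everything a failed extension held can be enumerated in one step.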
RECOVERY • As soon as a fault is detected: • Recovery manager releases resources in use by the extension • User Mode agent coordinates recovery and determines what course of action to take
RECOVERY • Nooks suspends the running extension and notifies the recovery manager • Involves disabling interrupts for the device • User-mode recovery agent facilitates flexible recovery • Recovery agent - unloads the extension - releases all its kernel and physical resources - reloads and restarts the extension
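The recovery sequence can be sketched as a small state machine. This is a simulation under stated assumptions: the `Extension` and `RecoveryManager` classes are hypothetical, the resource names are made up, and re-enabling interrupts on restart is an assumption not stated on the slides.

```python
class Extension:
    def __init__(self, name, resources):
        self.name = name
        self.resources = set(resources)
        self.loaded = True
        self.suspended = False
        self.interrupts_enabled = True
        self.restarts = 0


class RecoveryManager:
    def __init__(self):
        self.log = []  # ordered record of the recovery steps taken

    def on_fault(self, ext):
        # Suspend the extension and silence its device first.
        ext.suspended = True
        self.log.append("suspend")
        ext.interrupts_enabled = False
        self.log.append("disable interrupts")
        return self.user_mode_agent(ext)

    def user_mode_agent(self, ext):
        # Policy chosen here: always unload, release, reload, restart.
        ext.loaded = False
        self.log.append("unload")
        freed = sorted(ext.resources)
        ext.resources.clear()
        self.log.append("release resources")
        ext.loaded = True
        ext.suspended = False
        ext.interrupts_enabled = True  # assumption: restored on restart
        ext.restarts += 1
        self.log.append("reload and restart")
        return freed


manager = RecoveryManager()
nic = Extension("net_driver", ["irq 11", "dma buffer"])
freed = manager.on_fault(nic)
```

Placing the policy in a separate `user_mode_agent` method mirrors the slide's point that a user-mode agent decides the course of action, so a different policy could be swapped in without touching fault detection.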
MISSION ACCOMPLISHED????? • NOOKS substantially improved reliability compared with the native OS • It could recover from almost all system crashes • Non-fatal errors were also reduced in cases where extensions were process oriented. • Revealed several latent bugs in existing extensions
YES……………….BUT • Not all errors are eliminated by NOOKS • Performance does drop (BLAME IT ON XPCs)
PERFORMANCE ISSUE • Cost of execution with NOOKS vs native OS - Sound benchmarks: imperceptible overhead (150 XPCs per second) - Network benchmarks: overhead is higher when sending packets than when receiving them
PERFORMANCE ISSUE • Cost of execution with NOOKS vs native OS - Compile benchmarks: more code, more delay; XPC overheads; TLB misses - Web server benchmarks
SUMMARY • Nooks provides substantial reliability improvement • Focus on backward compatibility • Modest engineering effort • Major emphasis on device drivers • Performance considerations on a case-by-case basis
My view on the paper • The paper implements, for the first time, both hardware-based and software-based isolation, and does so with moderate engineering effort. • Its central idea is to confine device drivers in a new subsystem that isolates the kernel from driver malfunctions. • The paper backs its claims with experiments. • It also gives food for thought about the performance cost of introducing such a system.