
Enhancing OS Reliability with NOOKS: A Case Study

Learn how NOOKS offers isolation and recovery for improved OS reliability, addressing crucial system errors and failures while maintaining backward compatibility and efficiency. Explore the implementation, performance considerations, and successful results.



Presentation Transcript


  1. CS 5204 OPERATING SYSTEMS • Improving the Reliability of Commodity Operating Systems • Michael M. Swift, Brian N. Bershad, and Henry M. Levy • Presented by: Suraj Menon

  2. RELIABILITY!!! • Reliability refers to the degree of tolerance against errors and component failures in a system [Stankovic 1984]. • Motivation to achieve reliable systems: • Crucial but unsolved problem. • Increasing cost of failures. • Increasing number of OS extensions. • Extensions cause most of the problems.

  3. RELIABILITY??? • Problems persist in spite of highly reliable core OS kernels • Demand for a reliability solution that is: • BACKWARD COMPATIBLE • EFFICIENT

  4. ……………..NOOKS!!!! • Different approaches to improve reliability: • Capability architectures and ring and segment architectures • Transaction based systems • Software fault isolation • Virtual machine technologies • Recursive recovery • NOOKS!!!

  5. NOOKS!!! • Just an isolation service!!! • Mission: To reduce crashes from device drivers and other extensions

  6. NOOKS • Relies on: • Conventional processor architecture • Conventional programming language • Conventional Operating systems architecture • Existing extensions

  7. NOOKS • IT WILL MAKE THE SYSTEM FAULT RESISTANT • BUT NOT FAULT TOLERANT • IT TRUSTS THE EXTENSIONS TO AN EXTENT (BUGGY BUT NOT MALICIOUS)

  8. Goals of NOOKS • ISOLATION • RECOVERY • BACKWARD COMPATIBILITY

  9. OPERATIONS WITHIN NOOKS • Isolation • Interposition • Object Tracking • Recovery

  10. ISOLATION • Memory Management - implement lightweight protection domains with virtual memory protection
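
The idea of a lightweight protection domain can be modeled as a per-domain table of page permissions. The sketch below is only an illustration, not the Nooks implementation: the class `ProtectionDomain`, the page numbers, and the policy shown (the driver may read kernel pages but write only its own) are assumptions made for the example.

```python
READ, WRITE = "r", "w"

class ProtectionDomain:
    """Toy model of a protection domain: a table of per-page access rights."""

    def __init__(self, name):
        self.name = name
        self._perms = {}  # page number -> set of allowed access kinds

    def map_page(self, page, *access):
        self._perms[page] = set(access)

    def check(self, page, access):
        # A real system would rely on the MMU; here we raise an exception instead.
        if access not in self._perms.get(page, set()):
            raise PermissionError(f"{self.name}: {access} access to page {page} denied")

# Hypothetical layout: one kernel page and one driver page.
KERNEL_PAGE, DRIVER_PAGE = 0, 1

kernel = ProtectionDomain("kernel")
kernel.map_page(KERNEL_PAGE, READ, WRITE)
kernel.map_page(DRIVER_PAGE, READ, WRITE)

driver = ProtectionDomain("driver")
driver.map_page(KERNEL_PAGE, READ)          # assumed policy: read-only view of the kernel
driver.map_page(DRIVER_PAGE, READ, WRITE)   # full access to its own memory

driver.check(KERNEL_PAGE, READ)             # allowed
try:
    driver.check(KERNEL_PAGE, WRITE)        # a stray write is caught, not applied
    stray_write_blocked = False
except PermissionError:
    stray_write_blocked = True
```

The point of the model is that a buggy write from the extension surfaces as a detectable fault rather than silently corrupting kernel data.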

  11. ISOLATION • Extension Procedure Call (XPC) - safe control transfer between extensions and the kernel - transparency (achieved by wrappers) • Deferred call mechanism - extension domain queue - kernel domain queue
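
A deferred call mechanism can be pictured as a queue that collects calls bound for the other domain and runs them later in one batch, so that not every call pays for a domain crossing. The following is a minimal sketch under that assumption; `DeferredCallQueue` and the direction assigned to each queue are hypothetical and chosen only to mirror the two queues named on the slide.

```python
from collections import deque

class DeferredCallQueue:
    """Collects calls for another domain and runs them later in FIFO order."""

    def __init__(self):
        self._pending = deque()

    def defer(self, func, *args):
        # Record the call instead of crossing domains immediately.
        self._pending.append((func, args))

    def flush(self):
        """Run every queued call; return how many were executed."""
        count = 0
        while self._pending:
            func, args = self._pending.popleft()
            func(*args)
            count += 1
        return count

# One queue per domain (assumed direction: each holds calls postponed in that domain).
extension_domain_queue = DeferredCallQueue()
kernel_domain_queue = DeferredCallQueue()

log = []
extension_domain_queue.defer(log.append, "packet 1 handed to kernel")
extension_domain_queue.defer(log.append, "packet 2 handed to kernel")
pending_before_flush = list(log)        # nothing has run yet
ran = extension_domain_queue.flush()    # one crossing handles both calls
```

Batching trades a little latency for fewer crossings, which matters for extensions that make many small calls.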

  12. INTERPOSITION • Transparency in the integration of extensions and kernel • Ensures: - Kernel-to-extension and extension-to-kernel control flow goes through the XPC mechanism only - The object tracker sees and manages all data transfer between kernel and extension • The interface between the kernel, the Nooks Isolation Manager, and the extension is provided by a set of wrapper stubs.

  13. WRAPPER STUBS • Basic task: • Checks parameters for validity • Object tracker code within Wrapper implements call-by-value-result semantics for XPC • Wrappers perform an XPC into kernel or extension to execute the desired function
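
The three wrapper tasks above (validate parameters, apply call-by-value-result, perform the XPC) can be sketched in a few lines. This is an illustrative model only: `xpc`, `wrap_call_by_value_result`, `driver_set_mtu`, and the dictionary standing in for a kernel object are all hypothetical names, and the "XPC" here is a plain function call rather than a real domain switch.

```python
import copy

class InvalidParameter(Exception):
    """Raised when a wrapper rejects an argument before the call is made."""

def xpc(func, *args):
    # Stand-in for a cross-domain call; a real XPC would switch protection domains.
    return func(*args)

def wrap_call_by_value_result(func, validate):
    """Build a wrapper that validates, copies in, calls, then copies out."""
    def wrapper(kernel_obj):
        if not validate(kernel_obj):
            raise InvalidParameter(repr(kernel_obj))
        private = copy.deepcopy(kernel_obj)  # copy in: callee sees only a copy
        result = xpc(func, private)          # if this raises, kernel_obj is untouched
        kernel_obj.clear()
        kernel_obj.update(private)           # copy out: publish changes on normal return
        return result
    return wrapper

# Hypothetical extension entry point and validity check.
def driver_set_mtu(dev):
    dev["mtu"] = 9000
    return 0

def looks_like_device(obj):
    return isinstance(obj, dict) and "name" in obj and "mtu" in obj

safe_set_mtu = wrap_call_by_value_result(driver_set_mtu, looks_like_device)
dev = {"name": "eth0", "mtu": 1500}
status = safe_set_mtu(dev)
```

Because changes are copied back only after a normal return, a call that faults midway leaves the kernel's version of the object as it was.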

  14. WRAPPER STUBS • Two types of wrappers - kernel wrappers - extension wrappers • Writing a wrapper body is a one-time task required to support the kernel-extension interface of a specific OS • Wrapper code sharing!!!!

  15. OBJECT TRACKING • Oversees the manipulation of kernel objects by extensions • Basic tasks: • Records all objects that are used by an extension • Records associations between the kernel and extension versions of an object
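
Both tracking tasks amount to keeping a per-extension table that pairs each kernel object with the extension's copy of it. A minimal sketch, assuming a hypothetical `ObjectTracker` class and dictionaries standing in for kernel objects:

```python
class ObjectTracker:
    """Per-extension table pairing kernel objects with the extension's copies."""

    def __init__(self):
        # extension name -> {id(kernel_obj): (kernel_obj, extension_copy)}
        # Keeping kernel_obj in the tuple holds a reference, so its id stays valid.
        self._by_extension = {}

    def record(self, extension, kernel_obj, ext_copy):
        table = self._by_extension.setdefault(extension, {})
        table[id(kernel_obj)] = (kernel_obj, ext_copy)

    def lookup(self, extension, kernel_obj):
        """Return the extension's version of kernel_obj, or None if untracked."""
        entry = self._by_extension.get(extension, {}).get(id(kernel_obj))
        return entry[1] if entry else None

    def objects_in_use(self, extension):
        """List every kernel object the extension currently uses."""
        return [k for k, _ in self._by_extension.get(extension, {}).values()]

    def forget(self, extension):
        """Drop all records for an extension; return how many were dropped."""
        return len(self._by_extension.pop(extension, {}))

tracker = ObjectTracker()
kernel_buf = {"len": 64}            # hypothetical kernel object
driver_buf = dict(kernel_buf)       # the extension's private version
tracker.record("net_driver", kernel_buf, driver_buf)
```

The same table serves two purposes: wrappers consult it to translate between versions, and a recovery path can walk it to find what a failed extension still holds.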

  16. RECOVERY • As soon as a fault is detected: • Recovery manager releases resources in use by the extension • User Mode agent coordinates recovery and determines what course of action to take

  17. RECOVERY • Nooks suspends the running extension and notifies the recovery manager • Involves disabling interrupts for the device • User-mode recovery agent allows flexible recovery policies • Recovery agent - unloads the extension - releases all its kernel and physical resources - reloads and restarts the extension
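
The recovery sequence on the two slides above can be written out as an ordered procedure. This is a sketch of the steps only, not of how Nooks implements them: the `Extension` class, its fields, the resource names, and the `policy` parameter are hypothetical.

```python
class Extension:
    """Toy stand-in for a loaded driver and the resources it holds."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.loaded = True
        self.interrupts_enabled = True
        self.resources = ["irq line", "dma buffer"]

def recover(ext, policy="restart"):
    """Apply the recovery steps in order and return a log of what was done."""
    steps = []
    ext.running = False
    steps.append("suspend extension")
    ext.interrupts_enabled = False
    steps.append("disable device interrupts")
    ext.loaded = False
    steps.append("unload extension")
    ext.resources.clear()
    steps.append("release resources")
    if policy == "restart":              # the agent may choose a different policy
        ext.loaded = True
        ext.interrupts_enabled = True
        ext.running = True
        steps.append("reload and restart extension")
    return steps

ext = Extension("sound_driver")
steps = recover(ext)
```

Passing a different `policy` value models an agent that decides to leave the extension unloaded, which is the kind of flexibility a user-mode agent allows.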

  18. MISSION ACCOMPLISHED????? • Nooks substantially improved reliability • It recovered from almost all system crashes • Non-fatal errors were also reduced in cases where extensions were process oriented • It revealed several latent bugs in existing extensions

  19. YES……………….BUT • Not all errors are eliminated by Nooks • Performance does drop (BLAME IT ON XPCs)

  20. PERFORMANCE ISSUE • Cost of execution with Nooks vs. native OS - Sound benchmarks: imperceptible overhead (150 XPCs per second) - Network benchmarks: the overhead is higher when sending packets than when receiving them

  21. PERFORMANCE ISSUE • Cost of execution with NOOKS vs Native OS - Compile Benchmarks - more code more delay - XPC Overheads - TLB misses - Web Server Benchmarks

  22. SUMMARY • Nooks provides substantial reliability improvement • Focus on backward compatibility • Modest engineering effort • Major emphasis on device drivers • Performance considerations on a case-by-case basis

  23. My view on the paper • For the first time, the paper implements hardware-based as well as software-based isolation, and does so with moderate engineering effort. • Its central idea is to corner the device drivers by creating a new subsystem that isolates the kernel from driver malfunctions. • The paper backs its claims with experiments. • However, it also leaves food for thought about the performance cost once such a system is introduced.

  24. Thank You
