
The Synergy Between Non-blocking Synchronization and Operating System Structure

Presentation Transcript


  1. The Synergy Between Non-blocking Synchronization and Operating System Structure By Michael Greenwald and David Cheriton Presented by Jonathan Walpole CS510 – Concurrent Systems

  2. The Cache Kernel • Yet Another Attempt at a µ-kernel • Minimal privileged-mode code: a caching model of kernel functionality • OS functionality lives in user-mode class libraries, allowing application-specific customization of the OS • Shared memory and signal-based communication are the only communication mechanisms • Goal: a scalable, robust, flexible operating system design CS510 – Concurrent Systems

  3. Synergy of NBS and kernel structure • Claim: NBS and Good Design go together in the Cache Kernel design and implementation • NBS allows better OS structuring • Good OS structure simplifies NBS CS510 – Concurrent Systems

  4. Non-blocking Synchronization • Basic idea of NBS • Associate version number with data structure • Read version number at start of critical section • Atomically check and increment version number at end of critical section at the same time as applying the update(s), using a Double Compare and Swap (DCAS) instruction. • Roll back and retry on failure (i.e., if version number changed while you were in the critical section). • An optimistic synchronization technique similar to that of the Synthesis Kernel CS510 – Concurrent Systems

  5. Example: Deletion from Linked List

     do {
     retry:
        backoffIfNeeded();
        version = list->version;                 /* Read version # */
        for (p = list->head; (p->next != elt); p = p->next) {
           if (p == NULL) {                      /* Not found */
              if (version != list->version) {
                 goto retry;                     /* List changed */
              }
              return NULL;                       /* Really not found */
           }
        }
     } while (!DCAS(&(list->version), &(p->next),
                    version,          elt,
                    version+1,        elt->next));

     CS510 – Concurrent Systems

  6. Double Compare and Swap

     int DCAS(int *addr1, int *addr2,
              int old1,   int old2,
              int new1,   int new2)
     {
        <begin atomic>
        if ((*addr1 == old1) && (*addr2 == old2)) {
           *addr1 = new1;
           *addr2 = new2;
           return(TRUE);
        } else {
           return(FALSE);
        }
        <end atomic>
     }

     CS510 – Concurrent Systems

  7. Hardware Support for DCAS • Could be supported through extension to LL/SC (load-linked/store-conditional) sequence • LLP/SCP • LLP (load-linked-pipelined) • Load and link a second address after an earlier LL • LLP is linked to the following SCP • SCP (store-conditional-pipelined) • Store depends on both previous LLP and LL not having been invalidated in this CPU’s cache • If LLP/SCP fails, so does enclosing LL/SC CS510 – Concurrent Systems

  8. Software Support for DCAS • Basic idea: make DCAS operations atomic by putting them in a critical section protected by a lock! • OS manages the lock (how many locks do you need?) • Presumably, DCAS is a system call • Must make sure that delayed lock holders release the lock and roll back • Ensures that DCAS still has non-blocking properties • Problems implementing this efficiently • How to support readers? • How to avoid lock contention? CS510 – Concurrent Systems
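
A rough sketch of this idea, assuming a C11 environment: the atomic_flag spinlock below stands in for the kernel-managed lock, and the function simply reuses the DCAS signature from slide 6. A real implementation would additionally need the OS to roll back a delayed or preempted lock holder so that the operation keeps its non-blocking properties.

     #include <stdatomic.h>

     static atomic_flag dcas_lock = ATOMIC_FLAG_INIT;   /* stand-in for the OS-managed lock */

     int DCAS(int *addr1, int *addr2,
              int old1,   int old2,
              int new1,   int new2)
     {
        int ok;
        while (atomic_flag_test_and_set_explicit(&dcas_lock, memory_order_acquire))
           ;   /* spin; the kernel would bound this or roll back a delayed holder */
        ok = (*addr1 == old1) && (*addr2 == old2);
        if (ok) {
           *addr1 = new1;
           *addr2 = new2;
        }
        atomic_flag_clear_explicit(&dcas_lock, memory_order_release);
        return ok;
     }

As for the slide's question of how many locks are needed: a single global lock suffices for correctness, while hashing addresses to an array of such locks would reduce contention.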

  9. Back to the List Example

     do {
     retry:
        backoffIfNeeded();
        version = list->version;                 /* Read version # */
        for (p = list->head; (p->next != elt); p = p->next) {
           if (p == NULL) {                      /* Not found */
              if (version != list->version) {
                 goto retry;                     /* List changed */
              }
              return NULL;                       /* Really not found */
           }
        }
     } while (!DCAS(&(list->version), &(p->next),
                    version,          elt,
                    version+1,        elt->next));

     CS510 – Concurrent Systems

  10. Problem … • What happens if another thread deletes an element while we are traversing the list? • We may end up with an invalid pointer in p • What if we traverse it into the free pool? • What if the memory has been reused? • What if we end up in a different data structure? • What if the memory is reused for a different type? • How can we prevent this kind of reader hijacking? • Or at least, how can we prevent it from crashing the reader? CS510 – Concurrent Systems

  11. Type-stable memory management (TSM) • A descriptor allocated as type T is guaranteed to remain of type T for at least t_stable • A generalization of an existing technique – allocation pools: • e.g. process descriptors are statically allocated at system init, and are type-stable for the lifetime of the system • But is this sufficient to ensure that read-side code will reach the DCAS in the presence of concurrent modification? • And how long is t_stable? • When is it safe to reuse memory? CS510 – Concurrent Systems
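
To make TSM concrete, here is a minimal sketch of a type-stable allocation pool. The Thread descriptor, pool size, and function names are made up for illustration, and the free list is left unsynchronized for brevity (a real kernel would protect it, e.g. with DCAS); the point is only that freed descriptors return to a per-type free list, so the memory never becomes anything other than a Thread.

     #include <stddef.h>

     typedef struct Thread Thread;            /* hypothetical descriptor type T */
     struct Thread {
        Thread *next_free;                    /* links the per-type free list */
        int     id;
        /* ... other descriptor fields ... */
     };

     #define POOL_SIZE 1024
     static Thread  thread_pool[POOL_SIZE];   /* reserved for Threads, forever */
     static Thread *free_list;

     void pool_init(void) {
        for (int i = 0; i < POOL_SIZE; i++) {
           thread_pool[i].next_free = free_list;
           free_list = &thread_pool[i];
        }
     }

     Thread *thread_alloc(void) {
        Thread *t = free_list;
        if (t != NULL)
           free_list = t->next_free;
        return t;                             /* NULL when the pool is exhausted */
     }

     void thread_free(Thread *t) {
        /* The memory stays a Thread: a reader holding a stale pointer still
           dereferences a valid (if recycled) Thread, never another type. */
        t->next_free = free_list;
        free_list = t;
     }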

  12. Other solutions to reader hijacking • If memory is reclaimed immediately upon updates, readers must protect themselves from hijacking • But how? • Must ensure that a pointer is valid before dereferencing it • Additional checks at the end of the read side critical section are required to ensure no concurrent write access took place CS510 – Concurrent Systems

  13. Claim: TSM aids NBS • Type stability ensures safety of pointer dereferences. • Without TSM, delete example is too big to fit on slide • And very expensive to execute • Need to check for changes on each loop iteration • TSM makes NBS code simpler and faster CS510 – Concurrent Systems

  14. Other benefits of NBS in the Cache Kernel • Signals are the only form of IPC in the Cache Kernel • NBS simplifies synchronization in signal handlers • Makes it easier to write efficient signal-safe code CS510 – Concurrent Systems

  15. Contention-Minimizing Data Structures (CMDS) • Locality-based structuring used in the Cache Kernel: • Replication: per-processor structures (e.g., run queues) • Hierarchical data structures with read-mostly high levels (e.g., hash tables with per-bucket synchronization) • Cache-block alignment of descriptors to minimize false sharing and improve cache efficiency • Well-known techniques used in other systems (Tornado, Synthesis, …) CS510 – Concurrent Systems
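
A small illustration of the layout techniques above, assuming GCC/Clang, 64-byte cache lines, and a fixed processor count (all of these are assumptions): each processor gets its own run queue, aligned and padded to a cache line so that enqueueing on one CPU does not falsely share a line with another.

     #define CACHE_LINE_SIZE 64           /* assumed line size */
     #define NCPUS           16           /* assumed processor count */

     struct thread;                       /* descriptor type, defined elsewhere */

     /* One run queue per processor; the aligned attribute pads each array
        element out to its own cache line. */
     struct run_queue {
        struct thread *head;
        unsigned       length;
     } __attribute__((aligned(CACHE_LINE_SIZE)));

     static struct run_queue run_queues[NCPUS];

     /* The common case touches only the local CPU's queue and cache lines. */
     struct run_queue *my_run_queue(int cpu) {
        return &run_queues[cpu];
     }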

  16. Benefits of CMDS • Minimizes logical and physical contention • Minimizes (memory) consistency overhead • Minimizes false sharing • Reduces lock conflicts/convoys in lock-based system • Reduces synchronization contention with NBS • fewer retries from conflict at point of DCAS • CMDS is good for locks, cache consistency and NBS! • NBS needs CMDS CS510 – Concurrent Systems

  17. Minimizing the Window of Inconsistency • Delaying writes and grouping them together at the end minimizes the window of inconsistency • Advantages of a small window of inconsistency: • Lower probability of a failure leaving behind an inconsistency • Preemption-safe: easy to back out of the critical section • Reduces lock hold time & contention (in lock-based systems) • It's good for system design whether you use blocking or non-blocking synchronization! CS510 – Concurrent Systems
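
To illustrate grouping writes at the end (and, incidentally, the insertion example referred to on slide 20), here is a sketch of insertion at the head of the list from slide 5, written in the same loose pseudocode style as the slides (it reuses DCAS and backoffIfNeeded as primitives and glosses over C's pointer/int typing, as the slides do). The new element is prepared entirely off-list, so the shared structure is inconsistent only for the duration of the DCAS itself.

     typedef struct Elt  { struct Elt *next; /* ... payload ... */ } Elt;
     typedef struct List { unsigned version; Elt *head; } List;

     void insert(List *list, Elt *elt)
     {
        unsigned version;
        do {
           backoffIfNeeded();
           version   = list->version;   /* read version # */
           elt->next = list->head;      /* private write: elt is not yet visible */
        } while (!DCAS(&(list->version), &(list->head),
                       version,          elt->next,
                       version+1,        elt));    /* publish in one atomic step */
     }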

  18. How Does NBS Affect Contention? • What is the relationship between window of inconsistency and probability of conflict (contention)? • NBS approaches have a critical section just like locking approaches • NBS critical section is defined by the scope of the version number • This is NOT the same as the window of inconsistency! CS510 – Concurrent Systems

  19. Priority Inversion Issues • NBS allows synchronization to be subordinate to scheduling • It avoids the priority inversion problem of locks • It also avoids the analogous problems caused by page faults, I/O, and other blocking operations inside critical sections • Does the highest-priority process always make progress? • Is there a retry-based equivalent to priority inversion? CS510 – Concurrent Systems

  20. Why NBS is good for OS structure • Fail-stop safe: OS class libraries tolerant of user threads being terminated. • Most Cache Kernel OS functionality implemented in user-mode class libraries • Portable: same code on uniprocessor, multiprocessor, and signal handlers • Deadlock free (almost) • See examples of deletion and insertion without going through reclamation CS510 – Concurrent Systems

  21. Implementing non-blocking Synchronization • Basic approach • Read version number • Rely on TSM for type safety of pointers • Increment version number and check with every modification (abort if changed). • Straightforward transformation from locking • … so long as you have DCAS and can tolerate TSM • Replace acquire with read of version number • Replace release with DCAS … CS510 – Concurrent Systems
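
A minimal sketch of this transformation on a toy two-word counter (the Counter type and field names are invented, and DCAS is the primitive from slide 6): acquire becomes a read of the version number, the update is computed on private copies, and release becomes the DCAS that checks the version and commits.

     typedef struct {
        int version;                     /* takes the place of the lock */
        int value;
     } Counter;

     /* Lock-based original, for comparison:
           acquire(&c->lock);  c->value += n;  release(&c->lock);        */
     void counter_add(Counter *c, int n)
     {
        int version, value;
        do {
           version = c->version;         /* "acquire": read the version # */
           value   = c->value;           /* work on private copies        */
        } while (!DCAS(&(c->version), &(c->value),
                       version,        value,
                       version+1,      value + n));   /* "release": check & commit */
     }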

  22. Implementing non-blocking Synchronization (cont.) • Variants of the basic approach: • N reads, 1 write: No backout • 2 reads, 2 writes: No version number • In the Cache Kernel, every use of synchronization falls into one of these special cases CS510 – Concurrent Systems
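
A hypothetical illustration (not one of the Cache Kernel's actual uses) of the "2 reads, 2 writes: No version number" case: when an update touches exactly two shared words, DCAS can commit it directly, with the two old values themselves acting as the consistency check.

     /* Atomically move 'amount' from one shared counter to another.  Exactly
        two shared words are read and written, so no version number is needed;
        DCAS simply retries if either word changed in the meantime. */
     void transfer(int *from, int *to, int amount)
     {
        int old_from, old_to;
        do {
           old_from = *from;
           old_to   = *to;
        } while (!DCAS(from, to,
                       old_from,          old_to,
                       old_from - amount, old_to + amount));
     }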

  23. Complexity and Correctness • DCAS reduces size of algorithms (lines of code) by 75% compared to CAS, for linked lists, queues and stacks • Special case uses of DCAS reduce complexity further • Relatively straightforward transformation from locking • Similar code size CS510 – Concurrent Systems

  24. Performance Issues • Simulation-based study • With non-preemptive scheduling: • DCAS-based NBS almost as fast as spin-locks • CAS-based NBS slower • With preemptive scheduling: • DCAS- and CAS-based NBS better than spin-locks • Note: this was using mid-1990s CPUs • The cost of CAS is much higher on today's CPUs • What are the implications of this? CS510 – Concurrent Systems

  25. Hardware-Based Optimizations • A hardware-based optimization using advisory locking • Cache-based advisory locking can help avoid contention in the form of “useless parallelism” • Instead of just checking at the end and forcing retries, check at the beginning and back off if a retry is inevitable • Uses the Cload instruction (conditional load), which succeeds only if the location does not have an advisory lock set, and sets the lock on success • Initially load the version # with Cload; wait and retry on failure (seems like using TSL in a spin-lock?) CS510 – Concurrent Systems

  26. Conclusions • Claim: “Good OS structure” can support Non-blocking synchronization • Type-Stable Mem. Mgmt (TSM) • Data Structures that Minimize Contention (CMDS) • Minimizing the window of inconsistency • Claim: Non-blocking synchronization can support convenient OS structure • Avoids deadlock; allows signals as sole IPC mechanism • Fault tolerant; kernel functionality can be moved to user space • Performance; isolates scheduling from synchronization • Claim: Strong synergy between non-blocking synchronization & good OS structure. CS510 – Concurrent Systems

  27. Advantages of Non-blocking Synchronization • Non-blocking: • No deadlock! Especially useful for signal handlers. • Portability: • same code on uni and multiprocessors and in signal handlers • Performance: • Minimizes interference between synchronization and process scheduling • Recovery: • Insulation from process failures • So why isn’t it universally deployed? CS510 – Concurrent Systems

  28. Obstacles to deployment • Complexity: • Confusing to design and write efficient algorithms, especially in the absence of a DCAS primitive • Correctness: • It's hard to be convinced that there are no subtle bugs, especially on architectures with weak memory consistency • Is TSM enough to preserve critical section invariants? • Performance: • Poor performance under contention due to excessive retries • High overhead due to the expense of CAS on modern CPUs CS510 – Concurrent Systems
