Memory Management Issues in Non-Blocking Synchronization

Maged Michael, IBM T. J. Watson Research Center. ISMM 2009. Outline: non-blocking synchronization; dynamic memory solves problems in non-blocking algorithms; dynamic memory raises problems in non-blocking algorithms.


Presentation Transcript


  1. Memory Management Issues in Non-Blocking Synchronization. Maged Michael, IBM T. J. Watson Research Center. ISMM 2009.

  2. Outline
     • Non-blocking synchronization
     • Dynamic memory solves problems in non-blocking algorithms
     • Dynamic memory raises problems in non-blocking algorithms
     • Memory management solutions and tradeoffs

  3. System Model
     [Figure: a scheduler controls the threads; the threads access shared memory through the memory access primitives Read, Write, and Compare-and-swap.]

  4. The Scheduler
     • The scheduler decides when and if to let a ready thread take a step
     • Bad decisions by the scheduler can indefinitely prevent active threads from making progress. In some cases (e.g., real-time applications, signal handlers, OS kernels) this is unacceptable, as it may lead to deadlock, livelock, or delay of high-priority operations
     • The scheduler does not know all dependencies among threads
     • The scheduler can make very bad decisions

  5. Example: Deadlock in Signal Handling
     • A thread acquires a lock to operate on some shared data
     • The scheduler decides to interrupt the thread to deliver a signal
     • The signal handler runs
     • The signal handler needs to acquire the lock
     The interrupted thread will not run until the signal handler completes. The signal handler will not complete until the interrupted thread releases the lock. The result is DEADLOCK.
     NO LOCKS IN SIGNAL HANDLERS

  6. Non-Blocking Progress Guarantees
     • Three levels of non-blocking guarantees: wait-free (no starvation), lock-free (no livelock), obstruction-free (no blocking)
     • An operation is wait-free if, whenever a thread executing the operation takes a finite number of steps, the thread must have completed the operation, regardless of the actions/inaction of other threads.
     • An operation is lock-free if, whenever a thread executing the operation takes a finite number of steps, some thread must have completed the operation, regardless of the actions/inaction of other threads.
     • An operation is obstruction-free if, whenever a thread executing the operation takes a finite number of steps alone, the thread must have completed the operation, regardless of where the other threads stopped.

  7. Non-blocking is a property of operations
     • Non-blocking progress is a property of an operation in an implementation of an abstract shared data type. E.g., the lookup operation in a hash table implementation of a shared set is wait-free, while the insert and remove operations are blocking.
     • If all operations in an implementation of an abstract shared data type are non-blocking, then the whole implementation is non-blocking. E.g., a lock-free hash table implementation of a shared set.

  8. Non-blocking synchronization is not about ...
     • Non-blocking progress is not about fairness: fair is not the same as non-blocking
     • Non-blocking synchronization is not just about not using locks: no locks is not the same as non-blocking
     Non-blocking synchronization is all about ...
     • Delay of any number of threads does not prevent active threads from making progress

  9. Simple Non-Blocking Example: Lock-Free Counter
     Structures
       X : integer
     Operations
       Read() : integer
       FetchAndIncrement() : integer
     CAS(X,expval,newval)
       atomically
         r := (X == expval)
         if r X := newval
         return r
     Read()
       return X
     FetchAndIncrement()
       do
         oldval := X
       until CAS(X,oldval,oldval+1)
       return oldval
     • Read is wait-free. It completes in one step.
     • FetchAndIncrement is lock-free. Whenever one loop iteration (two steps) is executed, some operation must have completed.
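The counter above maps closely onto C++ atomics. The following is a minimal sketch, assuming C++17 and `std::atomic<int>`; the class name `LockFreeCounter` and its method names are illustrative and do not come from the talk.

```cpp
#include <atomic>
#include <cassert>

// Shared counter X from the slide. The class and method names are
// assumptions made for this sketch.
class LockFreeCounter {
public:
    // Read: a single atomic load, so it completes in one step.
    int read() const { return x_.load(); }

    // FetchAndIncrement: retry the CAS until it succeeds.
    int fetch_and_increment() {
        int oldval = x_.load();
        // On failure, compare_exchange_weak writes the current value of
        // x_ into oldval, so the next attempt uses a fresh snapshot.
        while (!x_.compare_exchange_weak(oldval, oldval + 1)) {
        }
        return oldval;
    }

private:
    std::atomic<int> x_{0};
};
```

A CAS attempt fails here only because another thread changed `x_` (or spuriously, in the weak form), which is why a failed iteration implies progress elsewhere. In practice `std::atomic<int>::fetch_add` provides the same operation directly.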

  10. Dynamic Memory and Non-Blocking Algorithms
     • Dynamic memory solves problems
        • Atomic access to large blocks
        • ABA problem
     • Dynamic memory causes problems
        • ABA problem
        • Memory reclamation problem
        • Persistent pointers
        • Non-blocking allocation and deallocation

  11. Atomic Access to Multiple Words
     Some algorithms need to operate atomically on multiple or large locations that exceed the size of HW atomic primitives. E.g., Wide CAS on X:
       atomically
         ret := (X == u)
         if ret X := v
     A common solution in non-blocking algorithms:
     • Place multi-word data in a dynamic block
     • Updates replace the block
       ptr := P
       ret := (*ptr == u)
       if ret
         newb := new Block(v)
         ret := CAS(P,ptr,newb)
         delete ret ? ptr : newb
     Issues annotated on this code: ABA problem, unsafe access, unsafe reclamation, allocation, deallocation.
     Solved one problem. Created more problems.
     [Figure: P points to a block holding u; an update replaces it with a block holding v.]

  12. The ABA Problem: Example
       ptr := P                      (step 1)
       ret := (*ptr == u)            (step 2)
       if ret
         newb := new Block(v)        (step 6)
         ret := CAS(P,ptr,newb)      (step 7)
         delete ret ? ptr : newb
     1 Thread i reads A from P
     2 Thread i reads u from *A
     3 Thread j sets P to B
     4 Thread j reuses block A to hold value z
     5 Thread j sets P to A again
     6 Thread i allocates block C to hold value v
     7 Thread i checks that P is equal to A. The CAS succeeds although *P == z != u
     INCORRECT OUTCOME
     Problem: CAS cannot tell if P changed or not
     [Figure: P points to block A holding u, then to block B holding w, then to A again, now holding z; block C holds v.]

  13. The ABA Problem
     • A thread i reads a value A from a shared variable X
     • Other threads change X to a different value B and then back to A again
     • Thread i checks X using a primitive that cannot tell if X changed, finds X equal to A, and acts as if X never changed
     Primitives susceptible to the ABA problem include read and variants of CAS.
     This interleaving of events is a necessary but not sufficient condition for the ABA problem. In some cases, the effect is benign.

  14. LIFO Linked List: Classic ABA Example
     Introduced in IBM System 370 documentation in the 1970s
     Pop
       do
         first := Anchor             (step 1)
         next := *first              (step 2)
       until CAS(Anchor,first,next)  (step 5)
       return first
     1 Thread i reads A from Anchor
     2 Thread i reads B from *A
     3 Thread j pops A and B
     4 Thread j pushes A back
     5 Thread i checks that Anchor is equal to A, sets Anchor to B
     The list is corrupted
     [Figure: Anchor points to the list A, B, C.]
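The five steps above can be replayed deterministically in a single thread, which makes the corruption easy to observe. This is a hedged sketch, assuming C++17; the names `Node`, `LifoList`, and `replay_aba` are hypothetical, and the two "threads" are simulated by ordering the calls by hand rather than by real concurrency.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    std::atomic<Node*> next{nullptr};
};

// CAS-based LIFO list without any ABA protection (illustrative names).
struct LifoList {
    std::atomic<Node*> anchor{nullptr};

    void push(Node* n) {
        Node* first = anchor.load();
        do {
            n->next.store(first);
        } while (!anchor.compare_exchange_weak(first, n));
    }

    Node* pop() {
        Node* first = anchor.load();
        while (first != nullptr &&
               !anchor.compare_exchange_weak(first, first->next.load())) {
        }
        return first;
    }
};

// Replays steps 1-5 by hand. Returns true if Anchor ends up pointing at
// B even though thread j already popped B.
bool replay_aba() {
    Node a, b, c;
    LifoList list;
    list.push(&c);
    list.push(&b);
    list.push(&a);                         // Anchor -> A -> B -> C

    Node* first = list.anchor.load();      // 1: thread i reads A
    Node* next = first->next.load();       // 2: thread i reads B from *A

    Node* pa = list.pop();                 // 3: thread j pops A ...
    Node* pb = list.pop();                 //    ... and B
    list.push(pa);                         // 4: thread j pushes A back

    // 5: thread i's CAS succeeds because Anchor holds A again.
    bool swapped = list.anchor.compare_exchange_strong(first, next);
    return swapped && list.anchor.load() == pb;
}
```

After the replay, the list head is a node that its remover still owns, which is the corrupted state the slide describes.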

  15. Classic Solution: ABA Tags
     Introduced in IBM System 370 documentation in 1983
     • Pack a tag with the shared variable. Increment the tag upon every pop.
     • Use double-width primitives
     Pop
       do
         [first,tag] := Anchor                           (step 1)
         next := *first                                  (step 2)
       until CASD(Anchor,[first,tag],[next,tag+1])       (step 5)
       return first
     1 Thread i reads [A,tag] from Anchor
     2 Thread i reads B from *A
     3 Thread j pops A and B
     4 Thread j pushes A back, sets Anchor to [A,tag+2]
     5 Thread i finds Anchor != [A,tag] and the CAS fails as it should
     ABA problem prevented
     [Figure: Anchor points to the list A, B, C, shown with tag 100 and with tag 102.]
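The same interleaving can be replayed with a tag packed next to the list head. The slide uses a double-width CASD on a pointer and a tag; the sketch below swaps in a different packing to stay portable: nodes live in a small array and are named by 32-bit index, so an [index, tag] pair fits in one 64-bit word. That packing, `kNil`, `TaggedStack`, and `replay_with_tags` are all assumptions of this example, assuming C++17.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kNil = 0xFFFFFFFFu;  // hypothetical "null" index

struct TaggedStack {
    static std::uint64_t pack(std::uint32_t idx, std::uint32_t tag) {
        return (static_cast<std::uint64_t>(tag) << 32) | idx;
    }
    static std::uint32_t index_of(std::uint64_t w) {
        return static_cast<std::uint32_t>(w);
    }
    static std::uint32_t tag_of(std::uint64_t w) {
        return static_cast<std::uint32_t>(w >> 32);
    }

    std::atomic<std::uint32_t> next[8];              // next[i]: link of node i
    std::atomic<std::uint64_t> anchor{pack(kNil, 0)};

    // Push leaves the tag unchanged, as on the slide.
    void push(std::uint32_t n) {
        std::uint64_t old = anchor.load();
        do {
            next[n].store(index_of(old));
        } while (!anchor.compare_exchange_weak(old, pack(n, tag_of(old))));
    }

    // Pop increments the tag on every successful removal.
    std::uint32_t pop() {
        std::uint64_t old = anchor.load();
        while (index_of(old) != kNil &&
               !anchor.compare_exchange_weak(
                   old, pack(next[index_of(old)].load(), tag_of(old) + 1))) {
        }
        return index_of(old);
    }
};

// Replays steps 1-5 by hand (node 0 is A, 1 is B, 2 is C). Returns true
// if thread i's stale CAS is rejected and A is still the head.
bool replay_with_tags() {
    TaggedStack s;
    s.push(2);
    s.push(1);
    s.push(0);                                        // Anchor -> A -> B -> C

    std::uint64_t seen = s.anchor.load();             // 1: reads [A,tag]
    std::uint32_t nxt =
        s.next[TaggedStack::index_of(seen)].load();   // 2: reads B

    std::uint32_t a = s.pop();                        // 3: thread j pops A
    s.pop();                                          //    and B
    s.push(a);                                        // 4: Anchor = [A,tag+2]

    // 5: thread i attempts [A,tag] -> [B,tag+1]; the tag no longer matches.
    bool swapped = s.anchor.compare_exchange_strong(
        seen, TaggedStack::pack(nxt, TaggedStack::tag_of(seen) + 1));
    return !swapped && TaggedStack::index_of(s.anchor.load()) == 0;
}
```

A 32-bit tag can in principle wrap around, which corresponds to the wraparound caveat listed among the cons of tags.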

  16. Pros and Cons of ABA Tags
     Pros
     • Wait-free
     • Low time and space overheads
     Cons
     • Not portable: requires wide primitives when packed with a full word
     • A theoretical chance of exact wraparound if tag size is exceeded
     • Complicates/prevents reclamation of dynamic memory

  17. ABA-Immune Primitives
     • LoadLinked (LL), Validate (VL), StoreConditional (SC)
       LL(X) : value
         atomically return X
       VL(X) : boolean
         atomically return X not written by others since last LL
       SC(X,v) : boolean
         atomically
           r := VL(X)
           if (r) X := v
           return r
     Pop
       do
         first := LL(Anchor)
         next := *first
       until SC(Anchor,next)
       return first
     • Inherently immune to the ABA problem
     • Only partially supported on real architectures
     • ABA solutions are often represented as LL/SC/VL implementations using practical primitives

  18. Benign ABA Cases
     • Example: between the read of X and a successful CAS, the value of X might have changed and returned back to its old value, but the outcome is still correct
       AtomicAdd(X,v)
         do
           old := X
         until CAS(X,old,old+v)
     • Another example is Push in a LIFO list
       Push(block)
         do
           first := Anchor.ptr
           *block := first
         until CAS(Anchor.ptr,first,block)
     • LL/SC/VL are unnecessarily strong as they prevent benign cases

  19. The Memory Reclamation Problem: Example
       ptr := P                      (step 1)
       ret := (*ptr == u)            (step 3)
       if ret
         newb := new Block(v)
         ret := CAS(P,ptr,newb)
         delete ret ? ptr : newb
     1 Thread i reads pointer value A from P
     2 Thread j sets P to B and frees A to the OS
     3 Thread i accesses free memory
     ACCESS VIOLATION
     [Figure: P moves from block A (value u) to block B (value w); A is returned to the OS.]

  20. The Memory Reclamation Problem
     • A thread i reads a pointer to a dynamic memory location
     • Another thread j removes the block and frees it
     • Thread i dereferences the pointer to access the freed block
        • Thread i might read/write unmapped memory, resulting in an access violation
        • Thread i might read unrelated data from the recycled block and return an incorrect result
        • Thread i might write into the recycled node and corrupt some shared structure
     • How can dynamic memory blocks removed from non-blocking structures be reclaimed while guaranteeing that no thread will access the contents of free blocks?

  21. Memory Reclamation and the ABA Problem
     Two different but related problems
     • Memory reclamation is all about dynamic memory
        • No dynamic memory use implies no memory reclamation problem
     • The ABA problem can occur even when no dynamic memory is used at all
        • E.g., array-based structures
        • No dynamic memory use does not imply no ABA problem
     • Solving the memory reclamation problem often prevents some but not all cases of the ABA problem
     • Complete ABA solutions can be constructed by using memory reclamation solutions

  22. How does GC help?
     • Completely solves the memory reclamation problem
     • Prevents the ABA problem if
        • The ABA problem only involves pointers to dynamic blocks
        • The contents of a dynamic block are never changed while it is globally reachable
        • Once a dynamic block is removed, it is not reinserted (in the same structure) before going through GC
     • Other ABA cases can use an extra level of indirection to be preventable by GC
     Pop is correct under GC:
       do
         first := Anchor
         next := *first
       until CAS(Anchor,first,next)
       return first
     [Figure: block lifecycle with the stages allocated, inserted, removed, and reclaimed, annotated "may be reinserted" and "always reclaimed before reuse".]

  23. Memory Reclamation Approaches
     • Epoch-based
     • Reference counting
     • Hazard pointers

  24. Epoch-Based Solutions
     • E.g., RCU (read-copy-update), heavily used in the Linux kernel
     • Depend on the notion of quiescence points, where a thread is guaranteed not to hold references to removable memory blocks
     • Typically use a per-thread timestamp
     • A removed block is reclaimed only after each thread (that could have had access to it) has gone through at least one quiescence point after the block was removed
     Pros:
     • Fast reading (no time overhead per dereference)
     • No reader interference. No writer starvation by readers.
     Cons:
     • At user level, either blocking or can result in an unbounded number of removed blocks that are not yet reclaimed
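The reclamation rule above can be illustrated with a very small quiescence-based scheme. This is only a sketch under stated assumptions (C++17, a fixed thread count, one retire list owned by a single remover); it is not RCU, and `global_epoch`, `quiescent_at`, `quiescent_point`, `retire`, and `reclaim` are hypothetical names.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kThreads = 3;                          // assumption: fixed thread count
std::atomic<std::uint64_t> global_epoch{1};
std::atomic<std::uint64_t> quiescent_at[kThreads]{}; // per-thread timestamp

struct Retired {
    int* block;
    std::uint64_t epoch;                             // epoch at removal time
};

// Called by thread tid at a point where it holds no references.
void quiescent_point(int tid) {
    quiescent_at[tid].store(global_epoch.load());
}

// Records an already removed block and advances the epoch, so that only
// later quiescence points carry a larger timestamp.
void retire(std::vector<Retired>& list, int* block) {
    list.push_back(Retired{block, global_epoch.fetch_add(1)});
}

// Frees every block for which all threads have passed a quiescence
// point since its removal. Returns the number of blocks freed.
std::size_t reclaim(std::vector<Retired>& list) {
    std::uint64_t oldest = quiescent_at[0].load();
    for (int t = 1; t < kThreads; ++t) {
        oldest = std::min(oldest, quiescent_at[t].load());
    }
    std::vector<Retired> keep;
    std::size_t freed = 0;
    for (const Retired& r : list) {
        if (r.epoch < oldest) {
            delete r.block;
            ++freed;
        } else {
            keep.push_back(r);
        }
    }
    list.swap(keep);
    return freed;
}
```

A thread that stops calling `quiescent_point` holds `oldest` back indefinitely, so the retire list can grow without bound; that is the drawback listed under Cons.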

  25. Per-Block Reference Counting
     • Threads increment or decrement a per-block reference counter whenever they create or destroy references to the block
     Pros:
     • O(n) bound on not-yet-reclaimed removed blocks
     • Lock-free
     Cons:
     • Reader-reader contention
     • Writer starvation by readers possible
     • To reclaim blocks for arbitrary reuse, requires either
        • DCAS (CAS on two locations), or
        • Extra level of indirection and extra space per pointer

  26. Hazard Pointers
     • A hazard pointer is a single-writer multi-reader pointer
     • Each hazard pointer has one owner (that can write to it)
     • By setting a hazard pointer to the address of a dynamic block, the owner thread is telling other threads: “if any of you remove this block after the last time I set this hazard pointer to this block, don’t reclaim this block until I change my hazard pointer”
     Pop
       do
         do
           first := Anchor
           *myHP := first
         until Anchor == first
         next := *first
       until CAS(Anchor,first,next)
       *myHP := null
       return first
     As long as *myHP remains equal to first:
     • safe access: first will not be freed
     • no ABA: first will not be inserted
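The Pop above can be written with C++ atomics roughly as follows. This is a sketch, assuming C++17, one hazard pointer per thread, and a fixed thread count; `kMaxThreads`, the global `hazard` array, and the explicit empty-list check are assumptions added for the example. The default sequentially consistent ordering is used throughout, so the hazard pointer store is ordered before the re-read of `anchor`.

```cpp
#include <atomic>
#include <cassert>

struct Node {
    int value = 0;
    std::atomic<Node*> next{nullptr};
};

constexpr int kMaxThreads = 4;                  // assumption: fixed thread count
std::atomic<Node*> anchor{nullptr};
std::atomic<Node*> hazard[kMaxThreads]{};       // one hazard pointer per thread

void push(Node* n) {
    Node* first = anchor.load();
    do {
        n->next.store(first);
    } while (!anchor.compare_exchange_weak(first, n));
}

// Pop for thread tid. The returned node is removed but not reclaimed;
// it must not be freed while any hazard pointer holds its address.
Node* pop(int tid) {
    Node* first;
    for (;;) {
        // Publish the hazard pointer, then re-read Anchor to confirm
        // that first was still the head after publication.
        do {
            first = anchor.load();
            hazard[tid].store(first);
        } while (anchor.load() != first);
        if (first == nullptr) break;            // empty list (not on the slide)
        Node* next = first->next.load();        // first cannot be reclaimed here
        if (anchor.compare_exchange_strong(first, next)) break;
    }
    hazard[tid].store(nullptr);
    return first;
}
```

The sketch only covers the reader side; it assumes removed nodes are retired and later checked against the hazard pointers before being freed.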

  27. Reclaiming Blocks under Hazard Pointers
     • After accumulating a number of removed nodes:
       1. Read active hazard pointers. Keep a private copy of non-null values
          • The private copy can be arranged in an efficient search structure, e.g., a hash table with constant expected lookup time
       2. For each removed block, do a lookup in the private structure
          • Found? Keep the block for the next scan of hazard pointers
          • Not found? It is safe to reclaim the block
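The two-step scan above could look like the following. It is a sketch, assuming C++17, a fixed array of hazard pointer slots, and a retire list private to the scanning thread; `Block`, `hazard_slots`, and `scan` are hypothetical names, and `std::unordered_set` stands in for the hash table mentioned in step 1.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

struct Block {
    int value = 0;
};

constexpr int kSlots = 4;                        // assumption: fixed slot count
std::atomic<Block*> hazard_slots[kSlots]{};

// Reclaims every retired block that no hazard pointer protects and
// returns how many were reclaimed. Protected blocks stay in `retired`
// for the next scan.
std::size_t scan(std::vector<Block*>& retired) {
    // Step 1: private copy of the non-null hazard pointers.
    std::unordered_set<Block*> protected_blocks;
    for (auto& slot : hazard_slots) {
        Block* p = slot.load();
        if (p != nullptr) protected_blocks.insert(p);
    }
    // Step 2: look up each removed block in the private copy.
    std::vector<Block*> keep;
    std::size_t reclaimed = 0;
    for (Block* b : retired) {
        if (protected_blocks.count(b) != 0) {
            keep.push_back(b);                   // found: keep for next scan
        } else {
            delete b;                            // not found: safe to reclaim
            ++reclaimed;
        }
    }
    retired.swap(keep);
    return reclaimed;
}
```

Because each lookup is an expected constant-time hash probe, the cost per reclaimed block stays constant in expectation when the scan runs only after enough blocks have accumulated.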

  28. Hazard Pointers
     Pros:
     • Wait-free
     • No atomic instructions needed
     • Even reads and writes to hazard pointers can be nonatomic
     • Constant expected time per reclaimed block
     • No reader interference, and no writer starvation
     Cons:
     • Worst case O(m·n) not-yet-reclaimed removed blocks, where m is the number of active removing threads (readers of hazard pointers) and n is the maximum number of active traversing threads (writers of hazard pointers)
     • O(m) bound possible, but at the cost of O(n) time per reclaimed block

  29. The Persistent Pointers Problem
     • Some non-blocking algorithms require some pointers in removed blocks to retain their values (as long as there are direct or indirect references to the blocks)
     • This is done for simplicity
     • But it can lead to unbounded memory use
     Example:
     • Simple linked list traversal
     • But pointers in removed blocks cannot be nullified
     • Unbounded memory

  30. Avoiding Persistent Pointers
     • Just don’t use persistent pointers in algorithms
     • Algorithms should be designed such that pointers in removed blocks are immediately nullifiable
     • But traversal becomes a bit more complicated
     • Double-check that the previous node still points to the current one before moving on to the next

  31. Persistent Pointers and Memory Reclamation
     • Algorithms with persistent pointers are limited in the memory reclamation solutions/approaches that they can use
     • The restricted reuse (no reclamation) approach? NO
        • No, because the approach implies the possibility of immediate reuse of removed blocks.
     • Hazard pointers? NO in general
        • No, because hazard pointers allow the reclamation of blocks that are indirectly reachable from private references
     • GC/reference counting/epoch-based solutions? YES
        • Yes, because these methods do not reclaim blocks that are indirectly reachable from a private reference.
        • But this same feature can lead to unbounded memory use with persistent pointers

  32. Dynamic Memory Allocation and Deallocation
     Non-blocking algorithms that use dynamic memory need a non-blocking allocator to manage the reuse of reclaimed blocks.
     The key challenge in building a non-blocking allocator is the capability to coalesce free blocks for arbitrary reuse or to be returned to the OS.

  33. High-Level Design of a Non-Blocking Allocator
     • Use coalescing units (superblocks) rather than arbitrary coalescing
     • Keep track of each superblock’s state to detect when its blocks become fully free
     • Use a separate descriptor to avoid memory reclamation problems
     • Manage the free blocks in a superblock as a linked list
     • Manage both the free blocks list and the superblock state together atomically
     [Figure: a descriptor holding the state, pointing to its superblock.]

  34. Allocation
     • First try a fast path of allocation from the active superblock (of the appropriate heap)
     • If there is no active superblock, then try to find a partially allocated superblock to make it active
     • If not, then allocate a new superblock of an appropriate size, divide it, and make it the active superblock after taking a block

  35. Malloc (common case)
     Identify the heap based on the requested block size and thread id, then:
     1. Read header
     2. Read descriptor packed state
     3. Recheck header
     4. Read next pointer of first block
     5. CAS changes to packed state
     Done
     [Figure: the heap header points to the descriptor of the active superblock; the descriptor's packed state holds head, count, state (ACTIVE), and ABA tag; the superblock holds blocks 0 to 7, some allocated, and the block at the head becomes the new block.]

  36. Deallocation
     • Push the freed block back into its superblock
     • If the superblock was fully allocated, now it becomes partially allocated and needs to be added to the set of partially allocated superblocks
     • If the superblock was partially allocated and is now fully free, then remove it from the set of partially allocated superblocks and coalesce it

  37. Free (common case)
     The block header points to the descriptor of the original superblock
     1. Read descriptor packed state
     2. Set next pointer of freed block
     3. CAS changes to packed state
     Done
     [Figure: heap header, descriptor, and active superblock; the descriptor's packed state holds head, unreserved count, and state (ACTIVE); the superblock holds blocks 0 to 7, one of which is to be freed.]

  38. Superblock Lifecycle
     States:
     • ACTIVE: Active, any count
     • BUSY: not Active, count = 0
     • PARTIAL: not Active, 0 < count < total
     • FREE: not Active, count = total
     Transition labels in the figure: New superblock; Taking the last block; Freeing the first block; No Active superblock; Freeing the last block; Unmap or reuse arbitrarily
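The four states above are determined by whether the superblock is the active one and by its count. The helper below is a small sketch of that classification, assuming C++17 and assuming that count is the number of free blocks in the superblock (consistent with count = 0 after taking the last block); `SuperblockState` and `classify` are hypothetical names.

```cpp
#include <cassert>

enum class SuperblockState { Active, Busy, Partial, Free };

// Derives the lifecycle state from the Active flag and the count.
// Assumption: count is the number of free blocks, total the capacity.
SuperblockState classify(bool is_active, unsigned count, unsigned total) {
    if (is_active) return SuperblockState::Active;           // any count
    if (count == 0) return SuperblockState::Busy;            // fully allocated
    if (count == total) return SuperblockState::Free;        // fully free
    return SuperblockState::Partial;                         // 0 < count < total
}
```

In the allocator itself the state is kept in the descriptor's packed word and updated by CAS together with the free list head; this helper only spells out the state definitions.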

  39. Dealing with Memory Management in Non-Blocking Algorithms
     • First, abstract away memory management problems to focus on the core algorithm
     • But avoid abstractions that limit the memory management solutions or hide problems
        • Memory reclamation: assume perfect GC but with explicit deallocation
        • ABA: think in terms of ABA just not happening rather than LL/SC/VL
     • After designing the core algorithm under these assumptions, the options for dealing with memory management remain open and it is easier to weigh the trade-offs among the solutions
     • Consider ABA and memory reclamation solutions together

  40. Non-Blocking GC and Its Challenges
     • Can we build a pure user-level non-blocking GC without special scheduler support?
        • Yes. One can use memory reclamation methods as a foundation. But it will be slow
     • The biggest challenge for a non-blocking GC is for the collector to find out the private references of the mutators at any arbitrary point, and to do so efficiently
     • Non-blocking memory reclamation methods add per-reference overheads
     • Adding these overheads to basically every load and store that may create or destroy a private reference may be prohibitively expensive

  41. Concluding Remarks
     • Non-blocking synchronization is intertwined with memory management
     • Memory management solves problems and creates problems in the design of non-blocking algorithms
     • There were many advances in non-blocking memory management in this decade but there is space for more
     • The memory reclamation and ABA problems occur under blocking optimistic concurrency

  42. THANK YOU
