
Wait-Free Queues with Multiple Enqueuers and Dequeuers

Alex Kogan and Erez Petrank, Computer Science, Technion, Israel


Presentation Transcript


  1. Wait-Free Queues with Multiple Enqueuers and Dequeuers Alex Kogan, Erez Petrank Computer Science, Technion, Israel

  2. FIFO queues • One of the most fundamental and common data structures [figure: a queue holding 5, 3, 2, 9, with enqueue at one end and dequeue at the other]

  3. Concurrent FIFO queues • A concurrent implementation supports “correct” concurrent adding and removing of elements • correct = linearizable • The access to the shared memory must be synchronized [figure: several threads concurrently enqueue and dequeue; one dequeue finds the queue empty]

  4. Non-blocking synchronization • No thread is blocked in waiting for another thread to complete • e.g., no locks / critical sections • Progress guarantees: • Obstruction-freedom • progress is guaranteed only in the eventual absence of interference • Lock-freedom • among all threads trying to apply an operation, one will succeed • Wait-freedom • a thread completes its operation in a bounded number of steps

  5. Lock-freedom • Among all threads trying to apply an operation, one will succeed • opportunistic approach • make attempts until succeeding • global progress • all but one thread may starve • Many efficient and scalable lock-free queue implementations exist
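The opportunistic retry pattern behind lock-freedom can be shown with a minimal sketch (not from the talk; the `LockFreeCounter` class is illustrative): every thread keeps retrying its CAS until it succeeds, so some thread always makes progress, but any individual thread may retry indefinitely.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: a lock-free increment via a CAS retry loop.
public class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        while (true) {
            int cur = value.get();
            // Attempt to install cur + 1; on interference, retry.
            if (value.compareAndSet(cur, cur + 1)) {
                return cur + 1;
            }
        }
    }

    public int get() { return value.get(); }
}
```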

  6. Wait-freedom • A thread completes its operation in a bounded number of steps • regardless of what other threads are doing • A highly desired property of any concurrent data structure • but, commonly regarded as inefficient and too costly to achieve • Particularly important in several domains • real-time systems • operating under SLA • heterogeneous environments

  7. Related work: existing wait-free queues • Limited concurrency • one enqueuer and one dequeuer [Lamport’83] • multiple enqueuers, one concurrent dequeuer [David’04] • multiple dequeuers, one concurrent enqueuer [Jayanti & Petrovic’05] • Universal constructions [Herlihy’91] • a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object • expensive, impractical implementations • (Almost) no experimental results

  8. Related work: lock-free queue [Michael & Scott’96] • One of the most scalable and efficient lock-free implementations • Widely adopted by industry • part of the Java Concurrency package • Relatively simple and intuitive implementation • Based on a singly-linked list of nodes [figure: list 12 → 4 → 17 with head and tail pointers]
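A condensed Java sketch of the Michael & Scott queue may help before the diagrams that follow. Class and field names are my own; memory reclamation is omitted, matching the garbage-collected setting the slides assume. A dummy node keeps head and tail non-null.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch of the Michael & Scott lock-free queue (simplified).
public class MSQueue<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    public MSQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    public void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    // First CAS: link the new node after the last node.
                    if (last.next.compareAndSet(null, node)) {
                        // Second CAS: swing tail; a helper may also do this.
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    // Help a concurrent enqueue finish before retrying.
                    tail.compareAndSet(last, next);
                }
            }
        }
    }

    public T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;   // queue is empty
                    tail.compareAndSet(last, next);  // help a lagging tail
                } else {
                    T value = next.value;
                    // CAS on head removes the first real node.
                    if (head.compareAndSet(first, next)) return value;
                }
            }
        }
    }
}
```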

  9. MS-queue brief review: enqueue CAS 12 17 9 4 CAS head tail enqueue 9

  10. MS-queue brief review: enqueue CAS 12 17 9 5 4 CAS head tail CAS enqueue enqueue 9 5

  11. MS-queue brief review: dequeue 12 17 9 4 12 CAS head tail dequeue

  12. Our idea (in a nutshell) • Based on the lock-free queue by Michael & Scott • Helping mechanism • each operation is applied in a bounded time • “Wait-free” implementation scheme • each operation is applied exactly once

  13. Helping mechanism • Each operation is assigned a dynamic age-based priority • inspired by the Doorway mechanism used in the Bakery mutex • Each thread accessing the queue • chooses a monotonically increasing phase number • writes down its phase and operation info in a special state array (one entry per thread: phase: long, pending: boolean, enqueue: boolean, node: Node) • helps all threads with a non-larger phase to apply their operations
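The announce-then-help rule above can be sketched as follows. The entry fields (phase, pending, enqueue, node) follow the slide; the `HelpingState` class, the idle-entry convention, and the returned list of helped thread IDs are illustrative assumptions, and the actual application of each helped operation is elided.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the per-thread state array and the helping rule.
public class HelpingState {
    static final class Node { final int value; Node(int v) { value = v; } }

    static final class OpDesc {
        final long phase; final boolean pending; final boolean enqueue; final Node node;
        OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
            this.phase = phase; this.pending = pending; this.enqueue = enqueue; this.node = node;
        }
    }

    final AtomicReferenceArray<OpDesc> state;

    HelpingState(int nThreads) {
        state = new AtomicReferenceArray<>(nThreads);
        for (int i = 0; i < nThreads; i++)
            state.set(i, new OpDesc(-1, false, true, null));  // idle entry
    }

    // Announce my operation, then help every thread whose pending
    // operation has a phase no larger than mine.
    List<Integer> announceAndHelp(int tid, long phase, boolean enq, Node node) {
        state.set(tid, new OpDesc(phase, true, enq, node));
        List<Integer> helped = new ArrayList<>();
        for (int i = 0; i < state.length(); i++) {
            OpDesc d = state.get(i);
            if (d.pending && d.phase <= phase) helped.add(i);  // apply i's op here
        }
        return helped;
    }
}
```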

  14. Helping mechanism in action [figure: state array snapshot with phases 4, 9, 3, 9 and their pending, enqueue and node fields]

  15. Helping mechanism in action [figure: a thread choosing phase 10 sees a pending operation with phase 9: “I need to help!”]

  16. Helping mechanism in action [figure: a pending operation with a larger phase than the helper’s: “I do not need to help!”]

  17. Helping mechanism in action [figure: with phases 10 and 11 announced, each thread helps only the pending operations with a non-larger phase]

  18. Helping mechanism in action • The number of operations that may linearize before any given operation is bounded • hence, wait-freedom [figure: state array snapshot]

  19. Optimized helping • The basic scheme has two drawbacks: • the number of steps executed by each thread on every operation depends on n (the number of threads) • even when there is no contention • it creates scenarios where many threads help the same operations • e.g., when many threads access the queue concurrently • large redundant work • Optimization: help one thread at a time, in a cyclic manner • faster threads help slower peers in parallel • reduces the amount of redundant work

  20. How to choose the phase numbers • Every time ti chooses a phase number, it is greater than the number of any thread that made its choice before ti • defines a logical order on operations and provides wait-freedom • Like in the Bakery mutex: • scan through state • calculate the maximal phase value + 1 • requires O(n) steps • Alternative: use an atomic counter • requires O(1) steps [figure: scanning announced phases 4, 3, 5 yields 6 as the next phase]
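Both phase-selection strategies are small enough to sketch directly (the `PhaseChoice` class and method names are my own):

```java
import java.util.concurrent.atomic.AtomicLong;

// Two ways to pick a monotonically increasing phase number.
public class PhaseChoice {
    // Bakery-style: scan all announced phases, take max + 1. O(n) steps.
    static long scanMaxPhase(long[] announcedPhases) {
        long max = -1;
        for (long p : announcedPhases) max = Math.max(max, p);
        return max + 1;
    }

    // Counter-based alternative: one atomic increment. O(1) steps.
    static final AtomicLong counter = new AtomicLong(0);
    static long nextPhase() { return counter.incrementAndGet(); }
}
```

With announced phases 4, 3 and 5 (as in the slide's figure), the scan yields 6.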

  21. “Wait-free” design scheme • Break each operation into three atomic steps • can be executed by different threads • cannot be interleaved • Initial change of the internal structure • concurrent operations realize that there is an operation-in-progress • Updating the state of the operation-in-progress as being performed (linearized) • Fixing the internal structure • finalizing the operation-in-progress

  22. Internal structures [figure: queue 1 → 2 → 4 with head and tail; state array with phases 9, 4, 9, all non-pending]

  23. Internal structures • enqTid: int holds the ID of the thread that performs / has performed the insertion of the node into the queue [figure: nodes tagged with enqTid 0, 1, 0; two elements were enqueued by Thread 0, one by Thread 1]

  24. Internal structures • deqTid: int holds the ID of the thread that performs / has performed the removal of the node from the queue [figure: an element marked with deqTid 1 was dequeued by Thread 1]
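The node layout described on the two slides above can be sketched as follows (the `WFNode` class name is my own; -1 marks “no thread yet”). enqTid is fixed at creation, while deqTid is claimed later by a CAS so that only one dequeuer can own the node.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of a queue node extended with the ownership fields.
public class WFNode {
    final int value;
    final AtomicReference<WFNode> next = new AtomicReference<>(null);
    final int enqTid;                                    // set at creation
    final AtomicInteger deqTid = new AtomicInteger(-1);  // claimed by CAS

    WFNode(int value, int enqTid) {
        this.value = value;
        this.enqTid = enqTid;
    }
}
```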

  25. enqueue operation • Creating a new node [figure: thread with ID 2 creates a node holding 6]

  26. enqueue operation • Announcing a new operation [figure: thread 2 writes phase 10, pending = true, enqueue = true and a reference to the new node into its state entry]

  27. enqueue operation • Step 1: Initial change of the internal structure [figure: CAS links the new node after the last node]

  28. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed [figure: CAS on the state entry resets pending to false]

  29. enqueue operation • Step 3: Fixing the internal structure [figure: CAS swings tail to the new node]
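The three steps walked through on slides 27–29 can be condensed into a single-threaded sketch. This is deliberately simplified: there are no retries and no helping, whereas in the real algorithm any helper may perform steps 2 and 3 on another thread's behalf; the `OpDesc` shape and class names are assumptions.

```java
import java.util.concurrent.atomic.AtomicReference;

// Condensed sketch of the three atomic steps of a wait-free enqueue.
public class ThreeStepEnqueue {
    static final class Node {
        final int value; final int enqTid;
        final AtomicReference<Node> next = new AtomicReference<>(null);
        Node(int v, int tid) { value = v; enqTid = tid; }
    }
    static final class OpDesc {
        final long phase; final boolean pending; final Node node;
        OpDesc(long phase, boolean pending, Node node) {
            this.phase = phase; this.pending = pending; this.node = node;
        }
    }

    final AtomicReference<Node> head, tail;
    final AtomicReference<OpDesc>[] state;

    @SuppressWarnings("unchecked")
    ThreeStepEnqueue(int nThreads) {
        Node dummy = new Node(-1, -1);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
        state = new AtomicReference[nThreads];
        for (int i = 0; i < nThreads; i++)
            state[i] = new AtomicReference<>(new OpDesc(-1, false, null));
    }

    void enqueue(int tid, long phase, int value) {
        Node node = new Node(value, tid);
        state[tid].set(new OpDesc(phase, true, node));  // announce
        Node last = tail.get();
        // Step 1: initial change of the structure -- link the node.
        if (last.next.compareAndSet(null, node)) {
            // Step 2: mark the announced operation as performed.
            OpDesc cur = state[tid].get();
            state[tid].compareAndSet(cur, new OpDesc(phase, false, node));
            // Step 3: fix the structure -- swing the tail.
            tail.compareAndSet(last, node);
        }
    }
}
```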

  30. enqueue operation • Step 1: Initial change of the internal structure [figure: thread 0 begins an enqueue of 3 while thread 2’s enqueue of 6 is still in progress]

  31. enqueue operation • Creating a new node; Announcing a new operation [figure: thread 0 creates a node holding 3 and announces it with phase 11]

  32. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed [figure: queue and state snapshot before the state CAS]

  33. enqueue operation • Step 2: Updating the state of the operation-in-progress as being performed [figure: CAS on thread 2’s state entry marks the enqueue of 6 as performed]

  34. enqueue operation • Step 3: Fixing the internal structure [figure: CAS swings tail to the node holding 6]

  35. enqueue operation • Step 1: Initial change of the internal structure [figure: CAS links thread 0’s node holding 3 after the new tail]

  36. dequeue operation [figure: initial queue 12 → 4 → 17 and state array, before the dequeue by thread 2]

  37. dequeue operation • Announcing a new operation [figure: thread 2 writes phase 10, pending = true, enqueue = false into its state entry]

  38. dequeue operation • Updating state to refer to the first node [figure: CAS stores a reference to the first node in the state entry]

  39. dequeue operation • Step 1: Initial change of the internal structure [figure: CAS sets the first node’s deqTid to 2]

  40. dequeue operation • Step 2: Updating the state of the operation-in-progress as being performed [figure: CAS on the state entry resets pending to false]

  41. dequeue operation • Step 3: Fixing the internal structure [figure: CAS advances head past the dequeued node]

  42. Performance evaluation

  43. Benchmarks • Enqueue-Dequeue benchmark • the queue is initially empty • each thread iteratively performs an enqueue and then a dequeue • 1,000,000 iterations per thread • 50%-Enqueue benchmark • the queue is initialized with 1000 elements • each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue • 1,000,000 operations per thread
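The Enqueue-Dequeue benchmark loop can be sketched as below, run here against `java.util.concurrent.ConcurrentLinkedQueue` (the MS-queue-based class the slides mention) as a stand-in for the compared implementations; the harness class and a reduced iteration count are my own choices.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the enqueue-dequeue benchmark: each thread repeatedly
// enqueues one element and then dequeues one; we measure completion time.
public class EnqDeqBenchmark {
    static long run(Queue<Integer> q, int nThreads, int iters) throws InterruptedException {
        long start = System.nanoTime();
        Thread[] ts = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    q.add(i);   // enqueue
                    q.poll();   // dequeue
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();
        return System.nanoTime() - start;  // completion time in ns
    }
}
```

Since every thread alternates an enqueue with a dequeue, the queue is never empty at any poll, and it ends empty once all threads join.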

  44. Tested algorithms Compared implementations: • MS-queue • Base wait-free queue • Optimized wait-free queue • Opt 1: optimized helping (help one thread at a time) • Opt 2: atomic counter-based phase calculation • Measure completion time as a function of # threads

  45. Enqueue-Dequeuebenchmark • TBD: add figures

  46. The impact of optimizations • TBD: add figures

  47. Optimizing further: false sharing • Created by accesses to the state array • Resolved by stretching the state entries with dummy pads • TBD: add figures

  48. Optimizing further: memory management • Every attempt to update state is preceded by an allocation of a new record • these records can be reused when the attempt fails • (more) validation checks can be performed to reduce the number of failed attempts • When an operation is finished, remove the reference from state to the list node • helps the garbage collector

  49. Implementing the queue without GC • Apply the Hazard Pointers technique [Michael’04] • each thread is associated with hazard pointers • single-writer multi-reader registers • used by threads to point to objects they may access later • when an object should be deleted, a thread stores its address in a special stack • once in a while, it scans the stack and recycles objects only if no hazard pointers point to them • In our case, the technique can be applied with a slight modification in the dequeue method
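A minimal, single-threaded illustration of the Hazard Pointers idea [Michael’04] follows: before a retired node is recycled, all published hazard pointers are scanned, and only nodes nobody may still access are reclaimed. The `HazardPointers` class, one slot per thread, and list-based retire set are illustrative simplifications (the real technique uses per-thread retire lists and a scan threshold).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Minimal sketch of hazard-pointer-based reclamation.
public class HazardPointers {
    final AtomicReferenceArray<Object> hazard;  // one published slot per thread
    final List<Object> retired = new ArrayList<>();

    HazardPointers(int nThreads) { hazard = new AtomicReferenceArray<>(nThreads); }

    void protect(int tid, Object node) { hazard.set(tid, node); }  // publish
    void clear(int tid) { hazard.set(tid, null); }

    void retire(Object node) { retired.add(node); }  // defer deletion

    // Scan: reclaim retired nodes that no hazard pointer still covers.
    int scan() {
        int reclaimed = 0;
        for (Iterator<Object> it = retired.iterator(); it.hasNext(); ) {
            Object node = it.next();
            boolean hazardous = false;
            for (int i = 0; i < hazard.length(); i++)
                if (hazard.get(i) == node) { hazardous = true; break; }
            if (!hazardous) { it.remove(); reclaimed++; }  // safe to recycle
        }
        return reclaimed;
    }
}
```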

  50. Summary • First wait-free queue implementation supporting multiple enqueuers and dequeuers • Wait-freedom incurs an inherent trade-off • bounds the completion time of a single operation • has a cost in a “typical” case • The additional cost can be reduced and become tolerable • Proposed design scheme might be applicable for other wait-free data structures
