
A Methodology for Creating Fast Wait-Free Data Structures



  1. A Methodology for Creating Fast Wait-Free Data Structures. Alex Kogan and Erez Petrank, Computer Science, Technion, Israel

  2. Concurrency & (non-blocking) synchronization
  • Concurrent data structures require fast and scalable synchronization
  • Non-blocking synchronization:
    • no thread is blocked waiting for another thread to complete
    • no locks / critical sections

  3. Lock-free (LF) algorithms
  • Among all threads trying to apply operations on the data structure, one will succeed
  • Opportunistic approach:
    • read some part of the data structure
    • make an attempt to apply an operation
    • when the attempt fails, retry
  • Many scalable and efficient algorithms
  • Global progress, but all threads except one may starve

  4. Wait-free (WF) algorithms
  • A thread completes its operation in a bounded number of steps
    • regardless of what other threads are doing
  • A particularly important property in several domains
    • e.g., real-time systems and operating systems
  • Commonly regarded as too inefficient and too complicated to design

  5. The overhead of wait-freedom
  • Much of the overhead is due to helping
    • the key mechanism employed by most WF algorithms
    • controls the way threads help each other with their operations
  • Can we eliminate the overhead?
  • The goal: the average-case efficiency of lock-freedom with the worst-case bound of wait-freedom

  6. Why is helping slow?
  • A thread helps others immediately when it starts its own operation
  • All threads help others in exactly the same order → contention → redundant work
  • Each operation has to be applied exactly once
    • usually results in a larger number of expensive atomic operations

  7. Reducing the overhead of helping
  Main observation:
  • "Bad" cases happen, but they are very rare
  • Typically a thread can complete without any help, if only it has a chance to do so
  Main ideas:
  • Ask for help only when you really need it
    • i.e., after trying several times to apply the operation
  • Help others only after giving them a chance to proceed on their own
    • delayed helping

  8. Fast-path-slow-path methodology
  • Start an operation by running its (customized) lock-free implementation: the fast path
  • Upon several failures, switch to a (customized) wait-free implementation: the slow path
    • notify others that you need help
    • keep trying
  • Once in a while, threads on the fast path check whether their help is needed and provide it: delayed helping

  9. Fast-path-slow-path: generic scheme
  • Start: do I need to help? If yes, help someone first
  • Apply my op using the fast path (at most N times)
  • Success? If yes, return; if no, apply my op using the slow path (until success), then return
  • Different threads may run on the two paths concurrently!
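
  A minimal Java sketch of this generic scheme. All identifiers here (apply, fastPathAttempt, slowPathApply, helpIfNeeded) are hypothetical placeholders for illustration, not the paper's actual code; only MAX_FAILURES is a name used later in the deck.

```java
// Generic fast-path-slow-path scheme (illustrative sketch only;
// all identifiers are hypothetical, not the paper's code).
public abstract class FastPathSlowPathOp<T> {
    // MAX_FAILURES: how many fast-path trials before asking for help (tunable)
    protected static final int MAX_FAILURES = 10;

    protected abstract boolean fastPathAttempt(T op); // one lock-free attempt
    protected abstract void slowPathApply(T op);      // wait-free; runs until done
    protected abstract void helpIfNeeded();           // delayed helping check

    public void apply(T op) {
        helpIfNeeded();  // occasionally help a thread stuck on the slow path
        for (int trials = 0; trials < MAX_FAILURES; trials++) {
            if (fastPathAttempt(op)) {
                return;  // common case: completed on the fast path
            }
        }
        slowPathApply(op);  // rare case: announce the op and get helped
    }
}
```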

  10. Fast-path-slow-path: queue example
  • Fast path: the lock-free MS-queue (Michael-Scott)
  • Slow path: the wait-free KP-queue (Kogan-Petrank)

  11. Fast-path-slow-path: queue example (internal structures)
  [Figure: the state array, one entry per thread ID; each entry holds the fields phase, pending, enqueue, and node]

  12. Fast-path-slow-path: queue example (internal structures)
  • phase counts the # of ops the thread has performed on the slow path
  [Figure: the state array, with the phase field highlighted]

  13. Fast-path-slow-path: queue example (internal structures)
  • pending: is there a pending operation on the slow path?
  [Figure: the state array, with the pending field highlighted]

  14. Fast-path-slow-path: queue example (internal structures)
  • enqueue and node: what is the pending operation?
  [Figure: the state array, with the enqueue and node fields highlighted]
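
  The per-thread state entries shown above map naturally onto a small Java class. The field names (phase, pending, enqueue, node) follow the slides; the types, the Node layout, and the surrounding declarations are assumptions for the sketch.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the per-thread state array; field names follow the slides,
// everything else (types, Node layout) is an assumption.
class Node {
    final Object value;
    final AtomicReference<Node> next = new AtomicReference<>(null);
    Node(Object value) { this.value = value; }
}

class OpDesc {
    final long phase;       // # of ops this thread has run on the slow path
    final boolean pending;  // is a slow-path operation pending?
    final boolean enqueue;  // which op is pending: enqueue (true) or dequeue
    final Node node;        // the node involved in the pending operation

    OpDesc(long phase, boolean pending, boolean enqueue, Node node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

class SharedState {
    // One immutable descriptor per thread, indexed by thread ID and
    // swapped atomically as the operation progresses.
    final AtomicReferenceArray<OpDesc> state;
    SharedState(int numThreads) { state = new AtomicReferenceArray<>(numThreads); }
}
```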

  15. Fast-path-slow-path: queue example (internal structures)
  [Figure: the helpRecords array, one record per thread ID; each record holds the fields curTid, lastPhase, and nextCheck]

  16. Fast-path-slow-path: queue example (internal structures)
  • curTid: the ID of the next thread that I will try to help
  [Figure: the helpRecords array, with the curTid field highlighted]

  17. Fast-path-slow-path: queue example (internal structures)
  • lastPhase: the phase # of that thread at the time the record was created
  [Figure: the helpRecords array, with the lastPhase field highlighted]

  18. Fast-path-slow-path: queue example (internal structures)
  • nextCheck: decremented with every operation of mine; I check whether my help is needed when this counter reaches 0
  • HELPING_DELAY controls the frequency of helping checks
  [Figure: the helpRecords array, with the nextCheck field highlighted]
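
  A sketch of how these records could drive delayed helping, reusing the OpDesc/state sketch above. The field names (curTid, lastPhase, nextCheck) and HELPING_DELAY follow the slides; the control flow around them is an assumption for illustration, not the paper's exact code.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Delayed helping via a per-thread help record (illustrative sketch).
class HelpRecord {
    int curTid;      // ID of the next thread I will try to help
    long lastPhase;  // that thread's phase when this record was created
    long nextCheck;  // countdown; at 0, check whether my help is needed
}

class DelayedHelper {
    static final long HELPING_DELAY = 16;  // frequency of helping checks (tunable)
    final AtomicReferenceArray<OpDesc> state;
    final HelpRecord rec = new HelpRecord();

    DelayedHelper(AtomicReferenceArray<OpDesc> state) {
        this.state = state;
        advanceRecord();  // initialize the first record
    }

    // Called once at the start of each of my operations.
    void helpIfNeeded() {
        if (--rec.nextCheck > 0) return;  // not time to check yet
        OpDesc d = state.get(rec.curTid);
        // Help only if the same operation is still pending since the
        // record was created: that thread really is stuck.
        if (d != null && d.pending && d.phase == rec.lastPhase) {
            helpThread(rec.curTid, d);
        }
        advanceRecord();
    }

    private void advanceRecord() {
        rec.curTid = (rec.curTid + 1) % state.length();
        OpDesc d = state.get(rec.curTid);
        rec.lastPhase = (d == null) ? -1 : d.phase;
        rec.nextCheck = HELPING_DELAY;
    }

    void helpThread(int tid, OpDesc d) { /* apply tid's announced op for it */ }
}
```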

  19. Fast-path-slow-path: queue example (fast path)
  1. help_if_needed()
  2. int trials = 0
     while (trials++ < MAX_FAILURES) {
       apply_op_with_customized_LF_alg()  // finish if succeeded
     }
  3. switch to the slow path
  • MAX_FAILURES controls the number of trials on the fast path
  • Customization of the LF algorithm is required to synchronize operations running on the two paths
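
  One fast-path trial could look like a single MS-queue-style enqueue attempt. This sketch reuses the Node class from the earlier sketch and omits the paper's customization that lets the fast path coexist with the slow path; it is in the spirit of the MS-queue, not the paper's code.

```java
import java.util.concurrent.atomic.AtomicReference;

// One fast-path enqueue attempt, MS-queue style (sketch only; the
// customization needed to coexist with the slow path is omitted).
class FastPathQueue {
    final AtomicReference<Node> tail;

    FastPathQueue(Node sentinel) { tail = new AtomicReference<>(sentinel); }

    boolean fastEnqueueAttempt(Node node) {
        Node last = tail.get();
        Node next = last.next.get();
        if (last != tail.get()) return false;  // tail moved: count as a failure
        if (next != null) {
            tail.compareAndSet(last, next);    // tail is lagging: help advance it
            return false;
        }
        if (last.next.compareAndSet(null, node)) {
            tail.compareAndSet(last, node);    // swing tail; failure is benign
            return true;                       // linked: the enqueue took effect
        }
        return false;                          // lost the CAS race: one failure
    }
}
```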

  20. Fast-path-slow-path: queue example (slow path)
  1. my phase++
  2. announce my operation (in state)
  3. apply_op_with_customized_WF_alg()  // until finished
  • Customization of the WF algorithm is required to synchronize operations running on the two paths
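
  A sketch of steps 1-3 for an enqueue, reusing the OpDesc/state names from above. Here wfEnqueueStep is a hypothetical placeholder for one round of the customized wait-free (KP-queue-style) algorithm, whose details are elided.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Entering the slow path for an enqueue (illustrative sketch).
class SlowPathEnqueue {
    final AtomicReferenceArray<OpDesc> state;

    SlowPathEnqueue(AtomicReferenceArray<OpDesc> state) { this.state = state; }

    void enqueueSlow(int tid, Node node) {
        OpDesc old = state.get(tid);
        long phase = (old == null ? 0 : old.phase) + 1;       // 1. my phase++
        state.set(tid, new OpDesc(phase, true, true, node));  // 2. announce the op
        while (state.get(tid).pending) {                      // 3. run the WF alg
            wfEnqueueStep(tid, phase);  // until some thread marks the op done
        }
    }

    // One round of the customized wait-free algorithm; helpers run the
    // same code, so the op is applied exactly once (details elided).
    void wfEnqueueStep(int tid, long phase) { /* ... */ }
}
```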

  21. Performance evaluation
  • 32-core Ubuntu server with OpenJDK 1.6
    • 8 quad-core 2.3 GHz AMD 8356 processors
  • Enqueue-Dequeue benchmark: the queue is initially empty; each thread iteratively performs an enqueue and then a dequeue (100k times)
  • Measured: completion time as a function of the # of threads
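
  A minimal harness for this benchmark. WFQueue is a hypothetical stand-in for the queue under test, and details such as warm-up and thread pinning are elided; only the enqueue-then-dequeue loop and the 100k-ops-per-thread count come from the slide.

```java
import java.util.concurrent.CyclicBarrier;

// Minimal Enqueue-Dequeue benchmark harness (sketch; WFQueue is a
// hypothetical stand-in for the queue under test).
interface WFQueue { void enq(int x); int deq(); }

class EnqDeqBenchmark {
    static final int OPS_PER_THREAD = 100_000;

    static long runNanos(int numThreads, WFQueue q) throws Exception {
        CyclicBarrier start = new CyclicBarrier(numThreads + 1);
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();  // all threads begin together
                    for (int j = 0; j < OPS_PER_THREAD; j++) {
                        q.enq(j);   // enqueue, then immediately dequeue
                        q.deq();
                    }
                } catch (Exception e) { throw new RuntimeException(e); }
            });
            workers[i].start();
        }
        start.await();                       // release the workers
        long t0 = System.nanoTime();
        for (Thread t : workers) t.join();   // wait for every loop to finish
        return System.nanoTime() - t0;       // completion time for this run
    }
}
```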

  22. Performance evaluation

  23. Performance evaluation
  [Figure: results for different settings of MAX_FAILURES and HELPING_DELAY]

  24. Performance evaluation

  25. The impact of configuration parameters
  [Figure: impact of MAX_FAILURES and HELPING_DELAY on performance]

  26. The use of the slow path
  [Figure: how often operations take the slow path, for different HELPING_DELAY and MAX_FAILURES settings]

  27. Tuning the performance parameters
  • Why not just always use large values for both parameters (MAX_FAILURES, HELPING_DELAY)?
    • that would (almost) always eliminate the slow path
  • Lemma: the number of steps required for a thread to complete an operation on the queue in the worst case is O(MAX_FAILURES + HELPING_DELAY · n²), where n is the number of threads
  • Hence there is a tradeoff between average-case performance and the worst-case completion-time bound

  28. Summary
  • A novel methodology for creating fast wait-free data structures
    • key ideas: two execution paths + delayed helping
    • good performance when the fast path is extensively utilized
    • concurrent operations can proceed on both paths in parallel
  • Can be used in other scenarios
    • e.g., running real-time and non-real-time threads side by side

  29. Thank you! Questions?
