
Who’s Afraid of a Big Bad Lock



Presentation Transcript


  1. Who’s Afraid of a Big Bad Lock. Nir Shavit, Sun Labs at Oracle. Joint work with Danny Hendler, Itai Incze, and Moran Tzafrir

  2. Amdahl and Shared Data Structures. Fine-grained parallelism has a huge performance benefit, yet the 25% of the work spent on shared data is the reason we get only a 2.9 speedup. [Figure: a 25% shared / 75% unshared workload on many cores, coarse-grained vs. fine-grained locking of the shared part.]
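The 2.9 figure follows directly from Amdahl's law. A minimal sketch of the arithmetic (the function name `amdahl_speedup` and the 8-core count are my assumptions, not from the slides):

```python
# Amdahl's law: with serial (shared) fraction s, the speedup on n cores
# is bounded by 1 / (s + (1 - s) / n).
def amdahl_speedup(serial_fraction, n_cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With 25% of the work on shared data (serialized) and 75% unshared:
print(round(amdahl_speedup(0.25, 8), 1))     # -> 2.9 on 8 cores
print(round(amdahl_speedup(0.25, 1000), 1))  # -> 4.0; the bound tends to 1/0.25 = 4
```

However many cores you add, the serialized 25% caps the speedup at 4x, which is what makes the cost of synchronizing on that shared part so important.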

  3. But… • Can we always draw the right conclusions from Amdahl’s law? • Claim: sometimes the overhead of using fine-grained synchronization is so high that it is better to have a single thread do all the work sequentially in order to avoid it

  4. Flat Combining • Have a single lock holder collect and perform the requests of all others • Without using CAS operations to coordinate requests • With combining of requests (if the cost of k batched operations is less than that of k operations in sequence, we win)
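The scheme can be sketched in Python over a sequential stack. This is a sketch under assumptions, not the authors' implementation: the names (`FlatCombiningStack`, `_records`, `_request`) are hypothetical, and a `threading.Lock` plus per-record `Event` stand in for the paper's spin lock and the spinning on publication records:

```python
import threading

class FlatCombiningStack:
    """Flat-combining sketch: each thread publishes its request in a record;
    whoever acquires the lock becomes the combiner and applies every
    pending request to the sequential object."""
    def __init__(self):
        self._lock = threading.Lock()
        self._items = []    # the sequential object (a stack)
        self._records = {}  # thread id -> publication record

    def _record(self):
        return self._records.setdefault(
            threading.get_ident(),
            {"op": None, "arg": None, "result": None,
             "done": threading.Event()})

    def _apply(self, op, arg):
        if op == "push":
            self._items.append(arg)
            return None
        return self._items.pop() if self._items else None  # "pop"

    def _request(self, op, arg=None):
        rec = self._record()
        rec["done"].clear()
        rec["arg"] = arg
        rec["op"] = op  # publishing op makes the request visible
        while not rec["done"].is_set():
            if self._lock.acquire(blocking=False):   # become the combiner
                try:
                    # Combining pass: serve everyone's pending requests.
                    for r in list(self._records.values()):
                        if r["op"] is not None:
                            r["result"] = self._apply(r["op"], r["arg"])
                            r["op"] = None
                            r["done"].set()
                finally:
                    self._lock.release()
            else:
                rec["done"].wait(0.001)  # "spin" on our own record
        return rec["result"]

    def push(self, x): self._request("push", x)
    def pop(self):     return self._request("pop")
```

Note the point the slide makes: a losing thread never touches the shared object or a CAS; it only writes its own publication record and waits for the combiner to fill in the result.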

  5. Flat-Combining. Most requests do not involve a CAS, in fact not even a memory barrier. [Figure: threads post requests (Enq(d), Deq(), …) in a publication list; one thread CASes the object’s lock, collects the requests, applies them to the object, then again tries to collect requests.]

  6. Fine-Grained FIFO Queue. The lock-free algorithm by Michael and Scott, used in JDK 6.0 (on > 10 million desktops). [Figure: linked list a → b → c → d with Head and Tail pointers; P: Dequeue() => a and Q: Enqueue(d) each require CAS() operations.]
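A sketch of the Michael–Scott queue's structure, with hardware CAS emulated by a small lock-based `AtomicRef` helper (my name) purely to make it runnable in Python; real implementations use atomic compare-and-swap instructions:

```python
import threading

class AtomicRef:
    """CAS emulated with a lock, standing in for hardware compare-and-swap."""
    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()
    def get(self):
        return self._value
    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

class Node:
    def __init__(self, value):
        self.value = value
        self.next = AtomicRef(None)

class MSQueue:
    """Michael-Scott queue sketch: CAS tail.next to link a node,
    CAS head to dequeue, and help swing a lagging tail."""
    def __init__(self):
        dummy = Node(None)
        self.head = AtomicRef(dummy)
        self.tail = AtomicRef(dummy)

    def enqueue(self, value):
        node = Node(value)
        while True:
            tail = self.tail.get()
            nxt = tail.next.get()
            if nxt is None:
                if tail.next.compare_and_set(None, node):  # link the node
                    self.tail.compare_and_set(tail, node)  # swing tail (may fail, ok)
                    return
            else:
                self.tail.compare_and_set(tail, nxt)       # help a slow enqueuer

    def dequeue(self):
        while True:
            head = self.head.get()
            tail = self.tail.get()
            nxt = head.next.get()
            if nxt is None:
                return None                                # empty
            if head is tail:
                self.tail.compare_and_set(tail, nxt)       # help swing tail
                continue
            if self.head.compare_and_set(head, nxt):       # advance head
                return nxt.value
```

Every successful enqueue and dequeue pays at least one CAS on a shared location, which is exactly the per-operation coordination cost flat combining avoids.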

  7. Flat-Combining FIFO Queue. The combiner collects the published requests (Enq(a), Enq(b), Deq(), …) from the publication list and applies them to a sequential FIFO queue. OK, but we can do better with combining: collect all items into a “fat node” and enqueue them in one step. [Figure: publication list in front of a sequential FIFO queue with Head and Tail.]

  8. Flat-Combining FIFO Queue. Combining: collect all items into a “fat node” and enqueue them in one step. A “fat node” is easy sequentially but cannot be done in a concurrent algorithm without CAS. [Figure: sequential “fat node” FIFO queue, each node holding a batch of items.]
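The fat-node idea can be sketched as follows; the names are hypothetical and the request-collection machinery from the flat-combining sketch above is omitted, keeping only what the combiner does with k collected items:

```python
from collections import deque
import threading

class FatNodeQueue:
    """Combining FIFO queue sketch: the lock holder batches all k pending
    enqueues into one 'fat node' (a list) appended in a single step --
    easy for the sequential combiner, not doable concurrently without CAS."""
    def __init__(self):
        self._lock = threading.Lock()
        self._fat_nodes = deque()  # each entry is a list of batched items

    def enqueue_batch(self, items):
        # One append for k collected items, instead of k linked-list insertions.
        with self._lock:
            if items:
                self._fat_nodes.append(list(items))

    def dequeue(self):
        with self._lock:
            while self._fat_nodes:
                node = self._fat_nodes[0]
                if node:
                    return node.pop(0)        # FIFO within the fat node
                self._fat_nodes.popleft()     # discard an emptied fat node
            return None
```

This is where combining pays off: the cost of enqueuing k batched items is one pointer update rather than k, so a batch of k operations is cheaper than k operations in sequence.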

  9. Linearizable FIFO Queue. [Graph: throughput of Flat Combining vs. a combining tree, the MS queue, Oyama, and Log-Synch; higher is better.]

  10. Benefits of Flat Combining. [Log-scale graph; Flat Combining in red; higher is better.]

  11. Why? [Log-scale graphs; Parallel Flat Combining in blue; higher is better.]

  12. Summary • FC provides superior linearizable implementations of quite a few structures • Parallel FC, when applicable, adds scalability to FC • Good fit with heterogeneous architectures • But FC and Parallel FC are not always applicable, for example for search trees (we tried)
