Who’s Afraid of a Big Bad Lock
Nir Shavit, Sun Labs at Oracle
Joint work with Danny Hendler, Itai Incze, and Moran Tzafrir
Amdahl and Shared Data Structures
Fine-grained parallelism has a huge performance benefit. With a coarse-grained lock, the 25% of the work on the shared data structure stays sequential while the 75% unshared part runs in parallel, and that shared fraction is the reason we get only a 2.9 speedup.
[Figure: cores running the 75% unshared work in parallel; fine-grained vs. coarse-grained handling of the 25% shared data structure]
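To make the 2.9 figure concrete, here is the Amdahl's-law arithmetic. The 25%/75% split is from the slide; the core count used below is only an assumed example.

```latex
% Amdahl's law: speedup with sequential fraction s on n cores.
% From the slide, s = 0.25 (shared, under a coarse-grained lock) and
% 1 - s = 0.75 (unshared). Assuming n = 8 cores for illustration:
\[
  \text{speedup} \;=\; \frac{1}{s + \frac{1-s}{n}}
  \;=\; \frac{1}{0.25 + \frac{0.75}{8}} \;\approx\; 2.9
\]
```

Adding cores barely helps: as long as the shared 25% remains sequential, the speedup is capped at 1/0.25 = 4.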
But… • Can we always draw the right conclusions from Amdahl’s law? • Claim: sometimes the overhead of using fine-grained synchronization is so high that it is better to have a single thread do all the work sequentially in order to avoid it
Flat Combining • Have a single lock holder collect and perform the requests of all other threads • Without using CAS operations to coordinate requests • With combining of requests (if the cost of k batched operations is less than that of k operations performed in sequence, we win)
Flat Combining
Most requests do not involve a CAS; in fact, not even a memory barrier.
[Figure: each thread posts its request (Enq(d), Deq(), …) to a publication list; one thread acquires the object’s lock with a CAS, collects the published requests, applies them to the object, then tries to collect requests again]
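The publication-list mechanism on this slide can be sketched in a few lines of Java. This is a minimal illustration under assumed names (FlatCombiner, PubRecord, execute), not the authors' implementation: a thread publishes its request in its own record and either becomes the combiner or spins locally until the combiner serves it.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

final class FlatCombiner<S, R> {
    // One publication record per thread: the posted request and its response.
    static final class PubRecord<S, R> {
        volatile Function<S, R> request;   // null = no pending request
        volatile R response;
        volatile boolean done;
        PubRecord<S, R> next;              // link in the publication list
    }

    private final S structure;             // the sequential object
    private final AtomicBoolean lock = new AtomicBoolean(false);
    private final AtomicReference<PubRecord<S, R>> head = new AtomicReference<>();
    private final ThreadLocal<PubRecord<S, R>> myRecord =
        ThreadLocal.withInitial(this::enlist);

    FlatCombiner(S structure) { this.structure = structure; }

    // A thread joins the publication list once (a rare CAS); afterwards,
    // posting a request is just a write to its own record.
    private PubRecord<S, R> enlist() {
        PubRecord<S, R> rec = new PubRecord<>();
        PubRecord<S, R> h;
        do {
            h = head.get();
            rec.next = h;
        } while (!head.compareAndSet(h, rec));
        return rec;
    }

    R execute(Function<S, R> op) {
        PubRecord<S, R> rec = myRecord.get();
        rec.done = false;
        rec.request = op;                  // publish the request
        while (!rec.done) {
            // Test-and-test-and-set: while a combiner is active, waiting
            // threads issue no CAS at all, they only spin on their record.
            if (!lock.get() && lock.compareAndSet(false, true)) {
                try {
                    combine();             // we are the combiner
                } finally {
                    lock.set(false);
                }
            }
        }
        return rec.response;
    }

    // The combiner scans the publication list and applies every pending
    // request to the sequential structure, one after the other.
    private void combine() {
        for (PubRecord<S, R> r = head.get(); r != null; r = r.next) {
            Function<S, R> req = r.request;
            if (req != null && !r.done) {
                r.response = req.apply(structure);
                r.request = null;
                r.done = true;             // release the waiting thread
            }
        }
    }
}
```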
Fine-Grained FIFO Queue
The lock-free algorithm by Michael and Scott, shipped in JDK 6.0 (on more than 10 million desktops).
[Figure: a linked queue a, b, c, d with Head and Tail pointers; P: Dequeue() => a and Q: Enqueue(d) each perform CAS operations on the pointers]
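For reference, a minimal Java sketch of the Michael-Scott queue (the algorithm behind java.util.concurrent.ConcurrentLinkedQueue); class and field names are illustrative. Every successful operation needs at least one CAS, which is exactly the fine-grained cost the following slides attack.

```java
import java.util.concurrent.atomic.AtomicReference;

final class MSQueue<T> {
    static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MSQueue() {
        Node<T> dummy = new Node<>(null);            // sentinel node
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {                // tail still consistent?
                if (next == null) {
                    // CAS 1: link the new node after the current last node
                    if (last.next.compareAndSet(null, node)) {
                        // CAS 2: swing tail forward (helpers may do this too)
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);  // help a lagging tail
                }
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;       // queue is empty
                    tail.compareAndSet(last, next);      // help a lagging tail
                } else {
                    T value = next.value;
                    // CAS: move head past the old sentinel
                    if (head.compareAndSet(first, next)) return value;
                }
            }
        }
    }
}
```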
Flat-Combining FIFO Queue
OK, but we can do better… combining: collect all items into a “fat node” and enqueue them in one step. A “fat node” is easy to build sequentially, but it cannot be built in a concurrent algorithm without CAS.
[Figure: the combiner gathers the published Enq() requests into a single fat node and links it to the tail of a sequential “fat node” FIFO queue, then serves the Deq() requests]
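A sketch of that fat-node step, assuming the combiner already holds the flat-combining lock and has gathered the pending Enq() items from the publication list; the FatNodeQueue and combinedEnqueue names are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class FatNodeQueue<T> {
    static final class FatNode<T> {
        final List<T> items;             // several enqueued items per node
        FatNode<T> next;
        int readIndex;                   // next item to hand out on dequeue
        FatNode(List<T> items) { this.items = items; }
    }

    private FatNode<T> head;             // dequeue end
    private FatNode<T> tail;             // enqueue end

    // Run only by the combiner: all enqueue requests collected in one pass
    // become a single fat node, linked into the queue in one step.
    void combinedEnqueue(List<T> pendingItems) {
        if (pendingItems.isEmpty()) return;
        FatNode<T> node = new FatNode<>(new ArrayList<>(pendingItems));
        if (tail == null) {
            head = tail = node;
        } else {
            tail.next = node;            // one link operation for k items
            tail = node;
        }
    }

    // Sequential dequeue, also run only by the combiner.
    T dequeue() {
        while (head != null) {
            if (head.readIndex < head.items.size()) {
                return head.items.get(head.readIndex++);
            }
            if (head == tail) {
                head = tail = null;      // last fat node exhausted
            } else {
                head = head.next;        // drop the exhausted fat node
            }
        }
        return null;                     // queue is empty
    }
}
```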
Linearizable FIFO Queue
[Figure: throughput graph (higher is better) comparing Flat Combining against the MS queue, a combining tree, Oyama, and Log-Synch]
Benefits of Flat Combining
[Figure: log-scale performance graph (higher is better); Flat Combining shown in red]
Why? Parallel Flat Combining
[Figure: log-scale performance graphs (higher is better); Parallel Flat Combining shown in blue]
Summary • FC provides superior linearizable implementations of quite a few structures • Parallel FC, when applicable, allows FC + scalability • Good fit with heterogeneous architectures • But FC and Parallel FC are not always applicable, for example, for search trees (we tried)