Pessimistic Software Lock-Elision Nir Shavit (Joint work with Yehuda Afek and Alexander Matveev)
Read-Write Locks • One of the most prevalent lock forms in concurrent applications • The 80/20 rule applies to reading vs. writing of data • Mutual exclusion between write calls, and between writes and read-only calls • Read-only calls may proceed in parallel with one another
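As a point of reference for the lock semantics described on this slide, here is a minimal sketch using the C++17 standard library's `std::shared_mutex`. The `PhoneBook` class and its methods are hypothetical names chosen for illustration; they do not come from the talk.

```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical example class (not from the slides) protected by a read-write lock.
class PhoneBook {
 public:
  // Read-only call: takes the lock in shared mode, so it may run in
  // parallel with other lookups but never with an update.
  bool lookup(const std::string& name, int& number) const {
    std::shared_lock<std::shared_mutex> guard(lock_);
    auto it = entries_.find(name);
    if (it == entries_.end()) return false;
    number = it->second;
    return true;
  }

  // Write call: takes the lock in exclusive mode, so it excludes both
  // other updates and all lookups.
  void update(const std::string& name, int number) {
    std::unique_lock<std::shared_mutex> guard(lock_);
    entries_[name] = number;
  }

 private:
  mutable std::shared_mutex lock_;
  std::map<std::string, int> entries_;
};
```

With this baseline, a lookup has to wait whenever an update holds the lock; that waiting is the read-write exclusion the rest of the talk aims to remove.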
Speculative Lock Elision (SLE) • Rajwar and Goodman: speculative execution of locks by optimistic hardware transactions (Haswell) • Roy, Hand, and Harris: software implementation of SLE, with transactions executed speculatively in software • Speculate: try to execute the critical sections concurrently using transactions • On failure: revert to the lock • [Diagram: Thread 1 and Thread 2 each go through Start Acquire, Lock Elided, Start Release]
SLE: Good and Bad • Advantages: • Concurrency among writes, and among reads and writes, as long as they do not share or contend for memory • Disadvantages: • Contention implies defaulting to the lock • Reads are delayed by writes • System calls and I/O cannot be used (they cause the transaction to fail) • Debugging is hard because of the speculative, non-deterministic behavior • Speculative execution breaks the lock semantics: you need to rewrite the code
Pessimistic Lock Elision (PLE) • Non-speculatively replace read-write locks • By pessimistic software transactions • In a way that: • Preserves the lock semantics • No code rewriting • Allows I/O in transactions • Allows read-write concurrency always! • Disadvantage: • Does not allow concurrency among writes • How important is this for RW-locked code?
Pessimistic STM [MatveevShavit2011] • A commit-time privatizing STM in which all transactions execute once and never abort • Read-only transactions run in parallel with each other and with writes • To create PLE, we designed a new encounter-order version of this pessimistic STM with wait-free read-only transactions
Encounter Order Pessimistic STM • Quiescence mechanism [MatveevShavit2010] to tell when reads terminate • Write transactions execute sequentially (commits are serialized) by “passing a baton” • Writes maintain a public undo log • Wait-free reads collect a snapshot of the memory using the undo log
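One common way to tell when reads terminate is a per-reader counter that is odd while the reader is inside a read section. The sketch below follows that idea; it is an assumption about the general shape of such a mechanism, not the algorithm of [MatveevShavit2010]. The names `Quiescence`, `kMaxReaders`, and the fixed slot count are hypothetical, and the default sequentially consistent atomics are used for simplicity rather than to reproduce the talk's performance claims.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>

constexpr std::size_t kMaxReaders = 8;  // hypothetical fixed number of reader slots

struct Quiescence {
  using Snapshot = std::array<std::uint64_t, kMaxReaders>;

  // One counter per reader slot: odd while inside a read section, even otherwise.
  // Each slot is written only by its owning reader, so no CAS is needed.
  std::array<std::atomic<std::uint64_t>, kMaxReaders> epoch{};

  void read_begin(std::size_t slot) { epoch[slot].store(epoch[slot].load() + 1); }
  void read_end(std::size_t slot) { epoch[slot].store(epoch[slot].load() + 1); }

  // Writer side: record the current counter of every reader.
  Snapshot snapshot() const {
    Snapshot s{};
    for (std::size_t i = 0; i < kMaxReaders; ++i) s[i] = epoch[i].load();
    return s;
  }

  // True once every reader that was inside a read section at snapshot time
  // has left it. Readers that started later are ignored.
  bool quiescent_since(const Snapshot& s) const {
    for (std::size_t i = 0; i < kMaxReaders; ++i) {
      const bool was_reading = (s[i] % 2) == 1;
      if (was_reading && epoch[i].load() == s[i]) return false;
    }
    return true;
  }

  // Writer side: block until all overlapping readers have finished.
  void wait_for_readers() const {
    const Snapshot s = snapshot();
    while (!quiescent_since(s)) std::this_thread::yield();
  }
};
```

In this sketch only the writer ever waits; a reader just bumps its own counter on entry and exit, which is consistent with reads being wait-free.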
Pessimistic Read-Write Interaction • Write transactions must not write to locations being read by overlapping reads • Solution: • On a write, the old value is logged publicly before the new value is written • In the read phase, logged values of concurrent writes are read • In the commit phase, the old values are discarded once the quiescence mechanism has ensured that no one is still reading them
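The logging rule above can be modeled in a few lines. This is a deliberately simplified, single-threaded sketch: it shows only the order of operations (log the old value, then write; overlapping reads prefer the logged value; commit discards the log) and leaves out all synchronization and the question of which readers count as overlapping. `LoggedMemory` and its method names are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical single-threaded model of a public undo log (not the authors' code).
class LoggedMemory {
 public:
  explicit LoggedMemory(std::size_t size) : cells_(size, 0) {}

  // Writer: publish the old value in the undo log before overwriting.
  // emplace() keeps an existing entry, so the log always holds the value
  // from before the current write transaction started.
  void write(std::size_t addr, int value) {
    undo_log_.emplace(addr, cells_[addr]);
    cells_[addr] = value;
  }

  // Overlapping reader: if the location has a logged old value, read that
  // instead of the in-progress new value.
  int read_snapshot(std::size_t addr) const {
    auto it = undo_log_.find(addr);
    return it != undo_log_.end() ? it->second : cells_[addr];
  }

  // Writer commit: discard old values. In the real scheme this happens only
  // after quiescence shows that no reader still needs them.
  void commit() { undo_log_.clear(); }

 private:
  std::vector<int> cells_;
  std::unordered_map<std::size_t, int> undo_log_;
};
```

A short usage example: after `write(1, 20)` and `write(1, 30)` in one transaction on a cell that held 10, `read_snapshot(1)` still yields 10 until `commit()` is called, and 30 afterwards.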
Why does this work well? • No need for CAS or even memory barriers in the common case • Even though logging is public, it's done by only one transaction at a time, so it is very easy to implement
Applying Pessimistic Lock-Elision • Input: program with RW-locks • STM compiler (Intel STM Compiler with PLE transactions) • Output: program with PLE • Execute on a processor with HLE (Intel’s Haswell): HLE code is executed with software fallback to PLE • Execute on a standard processor: PLE code is executed • Point 1: The semantics are not changed by adding PLE • Point 2: Concurrency between read and write critical sections • Point 3: HLE has limitations, but HLE + PLE does not • Point 4: PLE works on current processors
Performance • We empirically evaluated our algorithm on an Intel 40-way machine with 2 Xeon E7-4870 chips in a NUMA setup • PLE: our fully pessimistic encounter-time STM • RW_Lock_Egress: an ingress-egress counter based reader-writer mutex implementation for Intel platforms • MCS-Lock: Mellor-Crummey and Scott's MCS lock • RW_Lock_SPAA: the new RWLock proposal from SPAA 2012 • [Plot regions: HYPERTHREADS, NUMA, NORMAL]
Three Ways to Elide Locks • Software-only lock elision • If you don’t have hardware support • A fallback (slow path) for the hardware HLE • Intel’s SLE • A fallback using HTM • Intel’s RTM
If Your Machine Doesn’t Have Hardware Support • Automatically replace all read-write locked code with PLE STM code at compile time • As easy as using STM in the new C++ compiler • This will improve on your RW-locks because it allows read-only calls to proceed in parallel with writes • Write calls are sequential, but they were sequential anyhow…
If Your Machine Has SLE • The XTEST instruction returns true if the thread is currently executing in SLE • Execute XTEST after the XACQUIRE-prefixed instruction (which starts the HLE transaction) • At compile time, create a duplicate PLE code path; if the XTEST fails, the duplicate PLE path is executed
If Your Machine Has RTM • Two copies: one is the PLE code path, the other is the RTM code path • The RTM hardware fallback routine is the start of the PLE code path • After the XBEGIN, add a read (load) of the is_abort variable • The PLE code path first executes a small RTM transaction that updates is_abort • This causes all concurrently executing RTM transactions to fail
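The two-path structure can be sketched with the RTM intrinsics from `<immintrin.h>`. Several things here are assumptions made for brevity: the PLE path is replaced by a plain mutex, `is_abort` is updated with an atomic increment instead of the small RTM transaction the slide describes, and a failed hardware attempt goes straight to the fallback with no retry. Only `is_abort` is named on the slide; `run_elided` and `software_path_lock` are hypothetical. The hardware branch is compiled only when RTM is enabled (for example with `-mrtm`) and assumes the CPU actually supports it; otherwise every call takes the software path.

```cpp
#include <atomic>
#include <mutex>
#if defined(__RTM__)
#include <immintrin.h>  // _xbegin, _xend, _xabort
#endif

std::atomic<int> is_abort{0};   // nonzero while a software-path writer is active
std::mutex software_path_lock;  // stand-in for the PLE write path (assumption)

// Runs `body` as a hardware transaction when possible, otherwise on the
// software path. Returns true if the hardware transaction committed.
template <class CriticalSection>
bool run_elided(CriticalSection&& body) {
#if defined(__RTM__)
  if (_xbegin() == _XBEGIN_STARTED) {
    // Loading is_abort adds it to the transaction's read set, so a later
    // update by a software-path writer aborts this transaction.
    if (is_abort.load(std::memory_order_relaxed) != 0) _xabort(0xff);
    body();
    _xend();
    return true;
  }
  // An abort resumes here, with _xbegin() having returned an abort status.
#endif
  std::lock_guard<std::mutex> guard(software_path_lock);
  is_abort.fetch_add(1);  // conflicts with every running hardware transaction
  body();
  is_abort.fetch_sub(1);
  return false;
}
```

A caller would wrap each critical section, for example `run_elided([&] { ++counter; });`. Whichever path is taken, the body runs exactly once to completion; an aborted hardware attempt has its effects rolled back before the fallback runs.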
Lock-Elision Theory • We are going to see a lot of use of lock elision in industry… • So, what are the inherent costs of lock elision using STMs? • What are the inherent costs of pessimistic STM implementations? • Can we quantify the interaction between hardware and software transactions (or with locks)?