Transactional Memory: Architectural Support for Lock-Free Data Structures
Herlihy & Moss
Presented by Robert T. Bauer
Problem
• Software implementations of lock-free (non-locking) data structures do not perform as well as lock-based implementations.
• Qualifications:
  • Lock-based implementations can suffer from:
    • Priority inversion;
    • Convoying;
    • Deadlock; and
    • Contention & synchronization (memory barriers).
  • In the absence of these, lock-based implementations can outperform lock-free approaches.
“Solution”
• If software is the problem, perhaps the solution is hardware.
• In this case the solution tendered is transactional memory:
  • Modify the cache-coherence protocol
  • Provide new instructions
• Goal:
  • Make lock-free approaches as efficient and easy to use as conventional lock-based approaches
Results
• Demonstrate that transactional memory can be more efficient than:
  • Test-and-test-and-set (TTS)
  • MCS (software queueing: instead of spinning, waiters wait on a queue)
  • LL/SC
  • Hardware queueing (uses cache lines to maintain the “list”)
• Important: the reported results were obtained from a simulator.
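For reference, here is a minimal sketch (not from the paper) of the test-and-test-and-set lock that serves as the first comparison point, written with C11 atomics; the type and function names are illustrative.

```c
#include <stdatomic.h>

/* Minimal test-and-test-and-set (TTS) spin lock sketch.
 * Waiters spin on a plain read first, so they hit only their local
 * (shared) cache line; the atomic exchange is attempted only when the
 * lock appears free, which limits coherence traffic under contention. */
typedef struct { atomic_int locked; } tts_lock_t;

static void tts_acquire(tts_lock_t *l) {
    for (;;) {
        /* "test": read-only spin, no bus traffic while the line stays shared */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        /* "test-and-set": try to grab the lock with an atomic exchange */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
    }
}

static void tts_release(tts_lock_t *l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```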
About the simulator
• 32 processors
• Regular cache: direct-mapped, 2048 eight-byte lines
• Transactional cache: 64 eight-byte lines
• Simulator based on Proteus; does not capture the effects of instruction or data caches.
• Simulation assumptions:
  • Cache (regular or transactional) access = 1 cycle
  • Single-cycle commit (is this realistic???)
Cache
• Memory bus cycles:
  • Read (cache line becomes shared)
  • Read-For-Ownership, RFO (private read; cache line becomes exclusive)
  • Write (cache line becomes exclusive)
  • T_Read (transactional read)
  • T_RFO (transactional read-for-ownership)
• An RFO is usually issued on the compiler’s behalf: it reads a cache line and gains ownership over the line in anticipation of a subsequent write.
• Busy: the responding cache refuses the request; the requester aborts and retries.
Transaction Operations: General
• Each transactional access caches two entries:
  • XCommit (discarded on commit) [old value]
  • XAbort (discarded on abort) [new value]
• Transaction commits:
  • XCommit entries become Empty (contain no data)
  • XAbort entries become Normal (contain the committed data)
• Transaction aborts:
  • XCommit entries become Normal
  • XAbort entries become Empty
• Allocating a new entry: search, in order, for
  • an Empty entry;
  • a Normal entry (if dirty, it needs to be “evicted” first);
  • an XCommit entry (error in the paper: this can never be “dirty”, but might be invalid).
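The commit/abort transitions and the replacement order can be summarized with a small illustrative model. This is a sketch for exposition only, not the paper’s hardware: the array layout, field names, and helper names are assumptions.

```c
#include <stdint.h>

/* Illustrative model of transactional-cache entry states (hypothetical names). */
typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tstate_t;

typedef struct {
    tstate_t  state;
    uintptr_t tag;    /* cache-line address tag */
    uint64_t  data;   /* one eight-byte line */
    int       dirty;
} tentry_t;

#define TCACHE_LINES 64  /* size of the simulated transactional cache */
static tentry_t tcache[TCACHE_LINES];

/* COMMIT: old values (XCOMMIT) are dropped; tentative new values (XABORT)
 * become the committed contents of their lines. */
static void tcache_commit(void) {
    for (int i = 0; i < TCACHE_LINES; i++) {
        if (tcache[i].state == XCOMMIT)      tcache[i].state = EMPTY;
        else if (tcache[i].state == XABORT)  tcache[i].state = NORMAL;
    }
}

/* ABORT: tentative new values (XABORT) are dropped; old values (XCOMMIT)
 * become visible again. */
static void tcache_abort(void) {
    for (int i = 0; i < TCACHE_LINES; i++) {
        if (tcache[i].state == XABORT)       tcache[i].state = EMPTY;
        else if (tcache[i].state == XCOMMIT) tcache[i].state = NORMAL;
    }
}

/* Allocation: prefer an EMPTY entry, then a NORMAL one (written back first if
 * dirty), then an XCOMMIT entry, per the replacement order on the slide. */
static tentry_t *tcache_alloc_slot(void) {
    for (int i = 0; i < TCACHE_LINES; i++)
        if (tcache[i].state == EMPTY)   return &tcache[i];
    for (int i = 0; i < TCACHE_LINES; i++)
        if (tcache[i].state == NORMAL)  return &tcache[i];  /* write back if dirty */
    for (int i = 0; i < TCACHE_LINES; i++)
        if (tcache[i].state == XCOMMIT) return &tcache[i];
    return 0;  /* no slot: the transaction overflows the transactional cache */
}
```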
Transaction Operations: LT
• LT operation:
  • If an XAbort entry exists in the transactional cache, return its value.
  • Else, if a Normal entry exists in the transactional cache:
    • Change the Normal entry to XAbort;
    • Allocate a second entry with the same data, tagged XCommit.
  • Otherwise, issue a T_Read cycle, then:
    • Create a transactional-cache entry tagged XCommit;
    • Create a transactional-cache entry tagged XAbort.
  • If the read returns Busy (the cache line is being updated):
    • Drop all XAbort entries, set all XCommit entries to Normal, and set TStatus = False.
Transaction Operations: ST
• ST operation:
  • Cache hit: the XAbort entry is updated with the new value.
  • Cache miss: set up two transactional-cache entries as before:
    • XCommit (old value)
    • XAbort (new value)
  • Use a T_RFO cycle and set the cache-line state to reserved “Exclusive” (so T_Read and T_RFO requests from other processors will return “Busy”).
  • As before, if the bus cycle (T_RFO) returns “Busy”, the transaction is aborted.
Transaction Operations: LTX
• LTX operation:
  • Same as LT, but uses a T_RFO cycle on a cache miss (the line is read for ownership, since a subsequent write is expected).
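The three preceding slides (LT, ST, LTX) can be tied together in one rough sketch. Everything below is hypothetical: the helper functions, bus-cycle return codes, and the shared lt_common routine are stand-ins for hardware behavior, not a real API.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tstate_t;
typedef enum { CYCLE_OK, CYCLE_BUSY } bus_result_t;
typedef struct { tstate_t state; uintptr_t tag; uint64_t data; } tentry_t;

/* Assumed helpers: lookup by (tag, state), allocation per the replacement
 * order on the "General" slide, and the T_Read / T_RFO bus cycles. */
extern tentry_t    *tcache_find(uintptr_t tag, tstate_t state);
extern tentry_t    *tcache_alloc(uintptr_t tag, tstate_t state, uint64_t data);
extern bus_result_t bus_t_read(uintptr_t tag, uint64_t *data_out);
extern bus_result_t bus_t_rfo(uintptr_t tag, uint64_t *data_out);
extern void         tcache_abort_all(void);  /* drop XABORT, XCOMMIT -> NORMAL */
extern bool         TStatus;

/* LT: transactional load. LTX is identical except that it issues T_RFO on a
 * miss (read for ownership, because a write is expected). */
static bool lt_common(uintptr_t tag, uint64_t *value, bool for_ownership) {
    tentry_t *e = tcache_find(tag, XABORT);
    if (e) { *value = e->data; return true; }      /* already accessed transactionally */

    e = tcache_find(tag, NORMAL);
    if (e) {                                       /* promote a committed entry */
        e->state = XABORT;
        tcache_alloc(tag, XCOMMIT, e->data);       /* keep the old value */
        *value = e->data;
        return true;
    }

    uint64_t data;
    bus_result_t r = for_ownership ? bus_t_rfo(tag, &data) : bus_t_read(tag, &data);
    if (r == CYCLE_BUSY) {                         /* the line is busy elsewhere */
        tcache_abort_all();
        TStatus = false;                           /* transaction is aborted */
        return false;
    }
    tcache_alloc(tag, XCOMMIT, data);
    tcache_alloc(tag, XABORT, data);
    *value = data;
    return true;
}

/* ST: tentative store. On a hit, update the XABORT entry; on a miss, bring the
 * line in with T_RFO (other processors' T_Read/T_RFO now get BUSY), then update. */
static bool st(uintptr_t tag, uint64_t value) {
    uint64_t old;
    if (!lt_common(tag, &old, /*for_ownership=*/true))
        return false;                              /* BUSY: transaction aborted */
    tcache_find(tag, XABORT)->data = value;        /* new value, discarded on abort */
    return true;
}
```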
Transaction Operations: Validate
• Validate:
  • Returns TStatus (false means the transaction has been aborted).
An Example (Counting)
• Read (exclusive access)
• Write
• Commit
• In a multiprocessor environment, it is possible for all writes but one to be lost. If each of M processors adds N to the counter (initially 0), the final value of the counter is in the range N ≤ counter ≤ M*N.
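A sketch of the counter’s inner loop using the proposed primitives: the LTX/ST/COMMIT declarations below are hypothetical stand-ins for the new instructions, not an existing compiler API.

```c
/* Hypothetical intrinsics standing in for the proposed instructions. */
extern long LTX(long *addr);          /* transactional load, read for ownership */
extern void ST(long *addr, long v);   /* tentative transactional store */
extern int  COMMIT(void);             /* try to commit; nonzero on success */

/* Each processor adds n to the shared counter, one transaction per increment.
 * If another processor touched the counter's cache line, COMMIT fails, the
 * tentative write is discarded, and the increment is simply retried. */
void count_n(long *counter, long n) {
    for (long i = 0; i < n; i++) {
        do {
            long v = LTX(counter);    /* read the counter, expecting to write it */
            ST(counter, v + 1);       /* tentative increment, visible only locally */
        } while (!COMMIT());
    }
}
```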
Performance (Counting)
• Locking: read lock (test), write lock (acquire), read counter, write counter, write lock (release) = 5 memory references.
• Transactional memory and LL/SC (single-word memory operation): no separate commit traffic is needed (the update is a cache write).
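For comparison, the LL/SC-style counter can be approximated in portable C with a compare-and-swap retry loop, which compilers typically lower to load-linked/store-conditional on LL/SC machines; the function name is illustrative.

```c
#include <stdatomic.h>

/* LL/SC-style increment, approximated with a CAS retry loop. The retry
 * fires exactly when another processor wrote the word between our load
 * and our (conditional) store. */
void llsc_style_increment(atomic_long *counter) {
    long old = atomic_load_explicit(counter, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               counter, &old, old + 1,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* 'old' has been refreshed with the current value; just retry. */
    }
}
```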
Another Example (Doubly Linked List)
• Read (exclusive); plan to write.
• The update takes effect only if no other processor has modified anything in the transaction set (read ∪ write).
• The commit fails if another processor/transaction modified anything in the transaction set.
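Below is a sketch of a transactional enqueue on the doubly linked list, in the spirit of the paper’s producer/consumer benchmark. The pointer-flavored intrinsics (LTXp, STp, VALIDATE, COMMIT) are hypothetical stand-ins for the proposed instructions, and the backoff policy is only noted in a comment.

```c
#include <stddef.h>

typedef struct elem {
    struct elem *next;
    struct elem *prev;
    int value;
} elem_t;

/* Hypothetical pointer-width stand-ins for the proposed instructions. */
extern void *LTXp(void **addr);          /* transactional load, read for ownership */
extern void  STp(void **addr, void *v);  /* tentative transactional store */
extern int   VALIDATE(void);             /* is the transaction still consistent? */
extern int   COMMIT(void);               /* try to commit; nonzero on success */

static elem_t *Head, *Tail;

/* Append at the tail. The transaction set covers Tail, the new element's prev
 * link, and either Head (empty list) or the old tail's next link; if another
 * transaction touches any of these lines, COMMIT fails and we retry. */
void list_enq(elem_t *new_elem) {
    new_elem->next = new_elem->prev = NULL;
    for (;;) {
        elem_t *old_tail = LTXp((void **)&Tail);
        if (VALIDATE()) {
            STp((void **)&new_elem->prev, old_tail);
            if (old_tail == NULL)
                STp((void **)&Head, new_elem);        /* empty list: Head changes too */
            else
                STp((void **)&old_tail->next, new_elem);
            STp((void **)&Tail, new_elem);
            if (COMMIT())
                return;
        }
        /* Aborted or invalidated: back off (e.g., exponentially) and retry. */
    }
}
```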
Performance (Doubly Linked List)
• [Benchmark chart comparing LL/SC, MCS, and transactional memory]
Observations
• Many simplifications:
  • Small data sets
  • Single-cycle updates
  • Sequentially consistent memory (no barriers)
  • Write-back cache
• More complex cache-control logic.
• The cache can only snoop on a write, but in the transactional system a write-first policy won’t work, so ownership needs to be “propagated”.