Transactional Memory CDA6159
Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ’93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ’04)
Introduction • Transaction: a sequence of actions that appears indivisible and instantaneous to an outside observer • Four specific attributes: atomicity, consistency, isolation, and durability, collectively known as the ACID properties
Introduction • Concurrency control • Locks? Poor performance, deadlock, etc. • Alternatives: lock-free and optimistic concurrency control • Herlihy and Moss (1993) proposed hardware-supported transactional memory as a mechanism for building lock-free data structures
Basic Transactional Mechanisms • Isolation • Detect when transactions conflict • Track read and write sets • Version management • Record new and old values • Atomicity • Commit new values • Abort back to old values
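To make the three mechanisms concrete, here is a minimal single-threaded sketch in C. It tracks read and write sets as bitmasks over a hypothetical 64-word memory, records the old value of each written word (version management), and either keeps the new values (commit) or restores the old ones (abort). All names (`tx_t`, `tx_read`, `tx_write`, `tx_conflict`, ...) are hypothetical, and eager versioning (new values written in place, old values logged) is an assumption; hardware designs may instead buffer new values, as H&M's transactional cache does by keeping both copies. Because writes land in place immediately, this models only the bookkeeping, not isolation from concurrent readers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NWORDS 64  /* hypothetical memory size: one bit per word in a 64-bit set */

typedef struct {
    uint64_t read_set;          /* bit i set => word i was read    */
    uint64_t write_set;         /* bit i set => word i was written */
    long     old_value[NWORDS]; /* old value of each written word  */
} tx_t;

static long memory[NWORDS];

static void tx_begin(tx_t *tx) { tx->read_set = 0; tx->write_set = 0; }

/* Isolation bookkeeping: remember every word read. */
static long tx_read(tx_t *tx, int addr) {
    tx->read_set |= UINT64_C(1) << addr;
    return memory[addr];
}

/* Version management: log the old value on the first write, then write in place. */
static void tx_write(tx_t *tx, int addr, long value) {
    uint64_t bit = UINT64_C(1) << addr;
    if (!(tx->write_set & bit)) {
        tx->old_value[addr] = memory[addr];
        tx->write_set |= bit;
    }
    memory[addr] = value;
}

/* Conflict: either transaction writes a word the other reads or writes. */
static bool tx_conflict(const tx_t *a, const tx_t *b) {
    return (a->write_set & (b->read_set | b->write_set)) ||
           (b->write_set & (a->read_set | a->write_set));
}

/* Commit: new values are already in memory; just forget the old ones. */
static void tx_commit(tx_t *tx) { tx_begin(tx); }

/* Abort: roll every written word back to its logged old value. */
static void tx_abort(tx_t *tx) {
    for (int i = 0; i < NWORDS; i++)
        if (tx->write_set & (UINT64_C(1) << i))
            memory[i] = tx->old_value[i];
    tx_begin(tx);
}
```

Note that two transactions that only read the same word do not conflict; only a write on either side does.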
H/W Transactional Memory Systems • Knight’s Lisp Work • Transactional Memory • Oklahoma Update • SLE/TLR • Transactional Coherence and Consistency • Unbounded TM • Virtual TM • Thread-level TM
Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ’93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ’04)
Locks and Their Problems • Locks are commonly used to protect shared data • Priority inversion • A lower-priority process holds a lock needed by a higher-priority process • Convoying • When the lock holder is interrupted (e.g., descheduled), the other processes are forced to wait • Deadlock • A circular dependence among processes acquiring locks, so every process waits forever for a lock held by another
H&M’s Transactional Memory [’93] • Intended to replace short critical sections • Motivated by lock-free data structures • Transactions: • Read and write multiple locations • Commit in arbitrary order • Implicit begin, explicit commit operations • Abort affects memory, not registers • Software manages restarting execution • Validate instruction detects pending abort • Implementation extends cache coherence • Read/Write locks correspond to MESI states • Add orthogonal transaction states
Transactional Hardware State • Processor state • Transaction active flag (TACTIVE): whether a transaction is in progress; implicitly set by the first xactional operation • Transaction status flag (TSTATUS): whether the transaction is active (true) or aborted (false) • Small, fully associative xactional cache, disjoint from the L1 cache (a line can be in one or the other, not both) • Holds tentative writes before they propagate; invalidated if the transaction aborts, snooped and/or written back if it commits • Keeps 2 copies of each xactional line (old and new value), so xactional writes need no write-back to memory before commit • Can abort another transaction that would cause a conflict • Transactions are aborted by interrupts and xactional cache overflows • Acts like a regular cache when no transaction is in progress • Fast commit and abort (in a single cache cycle)
TM Instructions • Instructions for accessing memory • Load-transactional (LT): reads from shared memory into a private register • Load-transactional-exclusive (LTX): LT, plus a hint that a write to the location is likely to follow • Store-transactional (ST): tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit • Instructions for manipulating transaction state • Commit: tries to make the tentative writes permanent. It succeeds only if no other transaction has read its write set or written its read or write set; on success, the write set becomes visible to others; on failure, all updates to the write set are discarded • Abort: discards all updates to the write set • Validate: returns the current transaction status, indicating whether it has been aborted; if the status is false, discards all updates to the write set
Transaction Example
/* keep trying */
while (true) {
    /* read variables */
    v1 = LT(V1); …; vn = LT(Vn);
    /* check consistency */
    if (!VALIDATE()) continue;
    /* compute new values into v1 … vn */
    compute(v1, …, vn);
    /* write tentative values */
    ST(v1, V1); …; ST(vn, Vn);
    /* try to commit */
    if (COMMIT()) return result;
    else backoff();
}
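LT, ST, VALIDATE, and COMMIT cannot be issued from portable C, but the same read, compute, commit, back-off loop can be sketched for a single word with C11 atomics. This is a swapped-in technique, not H&M's transactional memory: `atomic_compare_exchange_weak` stands in for COMMIT (it succeeds only if the word is unchanged since it was read), much like the load-locked/store-conditional competitor, and it covers exactly one location where a transaction covers many. `atomic_update`, `backoff`, and `increment` are hypothetical names, and the backoff cap of 1024 spin iterations is arbitrary.

```c
#include <assert.h>
#include <stdatomic.h>

/* Exponential backoff: spin, then double the delay (capped at an arbitrary 1024). */
static void backoff(unsigned *delay) {
    for (volatile unsigned i = 0; i < *delay; i++) {
        /* busy-wait */
    }
    if (*delay < 1024) {
        *delay *= 2;
    }
}

/* Optimistic single-word update. The atomic load plays the role of LT and the
   compare-exchange plays the role of COMMIT. Returns the committed value. */
static long atomic_update(_Atomic long *shared, long (*compute)(long)) {
    unsigned delay = 1;
    for (;;) {
        long old = atomic_load(shared);           /* read */
        long new_value = compute(old);            /* compute new value */
        if (atomic_compare_exchange_weak(shared, &old, new_value)) {
            return new_value;                     /* commit succeeded */
        }
        backoff(&delay);                          /* commit failed: back off, retry */
    }
}

static long increment(long v) { return v + 1; }
```

For example, `atomic_update(&counter, increment)` performs one step of a shared-counter increment; the weak compare-exchange may fail spuriously, which the loop simply treats as another failed commit.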
Transactional Cache • Extends cache coherence protocols • Any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost • Works with both snoopy-bus and directory protocols • Additional transactional tag per entry • EMPTY (no data), NORMAL (committed data), XCOMMIT (discard on commit; holds the old value), XABORT (discard on abort; holds the new, tentative value) • Two entries per transactional datum: XCOMMIT and XABORT • Allocation policy (victim preference): EMPTY > NORMAL > XCOMMIT • Bus cycles • T_READ and T_RFO (read for ownership) • BUSY: a request can be refused by responding BUSY; a transaction that receives BUSY is aborted. This prevents deadlock and continual mutual aborts
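The allocation policy can be sketched as a victim-selection function over a flat array of entries. This is a minimal sketch; `tx_entry_t` and `tx_pick_victim` are hypothetical names, and writing back a dirty victim is left to the caller. XABORT entries are never chosen because they hold tentative values, so when only XABORT entries remain the function reports overflow, the case that traps into software.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag_t;

typedef struct {
    tx_tag_t      tag;
    bool          dirty;  /* caller writes the victim back if set */
    unsigned long addr;
    long          data;
} tx_entry_t;

/* Returns the index of the entry to reuse, preferring EMPTY, then NORMAL,
   then XCOMMIT; returns -1 if every entry is XABORT (transactional overflow). */
static int tx_pick_victim(const tx_entry_t cache[], int n) {
    static const tx_tag_t preference[] = { EMPTY, NORMAL, XCOMMIT };
    for (int p = 0; p < 3; p++)
        for (int i = 0; i < n; i++)
            if (cache[i].tag == preference[p])
                return i;
    return -1;
}
```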
Processor Operations • LT • Check for an XABORT entry; if present, return its value • If none, check for a NORMAL entry • If present, switch NORMAL to XABORT and allocate an XCOMMIT entry with the same data • If none, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries • If T_READ receives BUSY, abort: • Set TSTATUS to false • Drop all XABORT entries • Set all XCOMMIT entries to NORMAL • Return arbitrary (random) data • LTX, ST • Same as LT, except • Use T_RFO rather than T_READ on a miss, and set the cache line state to RESERVED • For ST, the XABORT entry is updated with the new value
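The LT steps can be traced in a small C model. It is a sketch under several assumptions: the cache is a flat 8-entry array, new entries are taken only from EMPTY slots (running out is treated as an overflow abort, consistent with overflows aborting transactions), and the bus is replaced by a hypothetical `sim_t_read` stub that answers BUSY for one configurable address, standing in for another processor's transactional cache. Every name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag_t;
typedef enum { BUS_OK, BUS_BUSY } bus_reply_t;
typedef struct { tx_tag_t tag; unsigned long addr; long data; } tx_entry_t;

#define TX_LINES 8  /* tiny cache for illustration */

typedef struct {
    tx_entry_t line[TX_LINES];
    bool tactive;  /* a transaction is in progress */
    bool tstatus;  /* true = active, false = aborted */
} tx_cpu_t;

/* Hypothetical bus model: a small memory plus one address held elsewhere. */
static long sim_memory[16];
static unsigned long sim_busy_addr = 99;

static bus_reply_t sim_t_read(unsigned long addr, long *data) {
    if (addr == sim_busy_addr) return BUS_BUSY;
    *data = sim_memory[addr];
    return BUS_OK;
}

static tx_entry_t *tx_find(tx_cpu_t *c, unsigned long addr, tx_tag_t tag) {
    for (int i = 0; i < TX_LINES; i++)
        if (c->line[i].tag == tag && c->line[i].addr == addr)
            return &c->line[i];
    return NULL;
}

/* Simplification: only EMPTY entries are allocated; none left means overflow. */
static bool tx_alloc(tx_cpu_t *c, unsigned long addr, tx_tag_t tag, long data) {
    for (int i = 0; i < TX_LINES; i++) {
        if (c->line[i].tag == EMPTY) {
            c->line[i] = (tx_entry_t){ tag, addr, data };
            return true;
        }
    }
    return false;
}

/* Abort path: TSTATUS = false, drop XABORT entries, XCOMMIT entries -> NORMAL. */
static long tx_abort_and_return(tx_cpu_t *c) {
    c->tstatus = false;
    for (int i = 0; i < TX_LINES; i++) {
        if (c->line[i].tag == XABORT) c->line[i].tag = EMPTY;
        else if (c->line[i].tag == XCOMMIT) c->line[i].tag = NORMAL;
    }
    return 0;  /* arbitrary data: the program must VALIDATE before trusting it */
}

static long tx_lt(tx_cpu_t *c, unsigned long addr) {
    c->tactive = true;                          /* set by the first transactional op */
    tx_entry_t *e = tx_find(c, addr, XABORT);
    if (e) return e->data;                      /* already in this transaction */
    e = tx_find(c, addr, NORMAL);
    if (e) {                                    /* hit on committed data */
        if (!tx_alloc(c, addr, XCOMMIT, e->data)) return tx_abort_and_return(c);
        e->tag = XABORT;
        return e->data;
    }
    long data;
    if (sim_t_read(addr, &data) == BUS_BUSY)    /* miss: T_READ refused */
        return tx_abort_and_return(c);
    if (!tx_alloc(c, addr, XCOMMIT, data) || !tx_alloc(c, addr, XABORT, data))
        return tx_abort_and_return(c);          /* overflow also aborts */
    return data;
}
```

LTX and ST would follow the same shape, issuing T_RFO on a miss and, for ST, overwriting the XABORT entry's data.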
Processor Operations • VALIDATE • Return the TSTATUS flag • If it is false, set TSTATUS to true and TACTIVE to false • ABORT • Set TSTATUS to true and TACTIVE to false • Change XABORT entries to EMPTY and XCOMMIT entries to NORMAL • COMMIT • Return TSTATUS, then set TSTATUS to true and TACTIVE to false • Drop all XCOMMIT entries and change all XABORT entries to NORMAL
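VALIDATE, ABORT, and COMMIT amount to flag updates plus a retagging pass over the transactional cache. The sketch below assumes the same hypothetical flat-array entry layout; `retag`, `tx_validate`, `tx_abort`, and `tx_commit` are illustrative names, and the retagging that the hardware performs in a single cache cycle is written here as sequential loops.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag_t;
typedef struct { tx_tag_t tag; unsigned long addr; long data; } tx_entry_t;

#define TX_LINES 8

typedef struct {
    tx_entry_t line[TX_LINES];
    bool tactive;
    bool tstatus;
} tx_cpu_t;

/* Rewrite every entry tagged `from` as `to`. */
static void retag(tx_cpu_t *c, tx_tag_t from, tx_tag_t to) {
    for (int i = 0; i < TX_LINES; i++)
        if (c->line[i].tag == from) c->line[i].tag = to;
}

static bool tx_validate(tx_cpu_t *c) {
    bool ok = c->tstatus;
    if (!ok) {                  /* aborted: reset the flags for the retry */
        c->tstatus = true;
        c->tactive = false;
    }
    return ok;
}

static void tx_abort(tx_cpu_t *c) {
    c->tstatus = true;
    c->tactive = false;
    retag(c, XABORT, EMPTY);    /* discard tentative (new) values */
    retag(c, XCOMMIT, NORMAL);  /* old values become the live copies again */
}

static bool tx_commit(tx_cpu_t *c) {
    bool ok = c->tstatus;
    c->tstatus = true;
    c->tactive = false;
    retag(c, XCOMMIT, EMPTY);   /* old values are no longer needed */
    retag(c, XABORT, NORMAL);   /* tentative values become committed */
    return ok;
}
```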
Snoopy Cache Actions • Regular cache • Acts as a MESI cache, treating T_READ like READ and T_RFO like RFO • Transactional cache • Non-xactional cycle: acts like a regular cache, considering NORMAL entries only • T_READ: if the entry is valid (shared), returns the value • All other cycles: responds BUSY • Memory • Responds to READ, T_READ, RFO, and T_RFO when no cache responds • Also handles WRITE cycles
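A transactional cache's snoop response can be written as one decision function. This sketch encodes one reading of the rules above: a held line in SHARED state answers T_READ, every other transactional cycle that hits a held line is answered BUSY, and non-transactional cycles see only NORMAL entries. The state and reply names (`RESERVED`, `RESPOND`, `NO_RESPONSE`, ...) are assumptions, and data transfer and invalidation are folded into a single `RESPOND` reply.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag_t;
typedef enum { INVALID, SHARED, RESERVED, DIRTY } line_state_t;
typedef enum { READ, RFO, T_READ, T_RFO } bus_cycle_t;
typedef enum { NO_RESPONSE, RESPOND, BUSY } snoop_reply_t;

typedef struct { tx_tag_t tag; line_state_t state; unsigned long addr; } tx_entry_t;

static snoop_reply_t tx_snoop(const tx_entry_t *e, bus_cycle_t cycle, unsigned long addr) {
    if (e->tag == EMPTY || e->state == INVALID || e->addr != addr)
        return NO_RESPONSE;              /* not held here: memory or another cache answers */
    bool transactional = (cycle == T_READ || cycle == T_RFO);
    if (!transactional)                  /* regular-cache behavior, NORMAL entries only */
        return e->tag == NORMAL ? RESPOND : NO_RESPONSE;
    if (cycle == T_READ && e->state == SHARED)
        return RESPOND;                  /* shared copy: return the value */
    return BUSY;                         /* all other transactional cycles are refused */
}
```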
Advantages and Disadvantages • Alternative: a single cache for both regular and xactional data • The set size would determine the maximum xaction size • Commit/abort logic would have to work in parallel across a larger cache • Xaction size is limited by the xactional cache size • Overflow traps into software • Xaction data sets must be small • Transactions cannot survive interrupts
Simulation • Proteus simulator • 32 processors • Regular cache • Direct mapped, 2048 8-byte lines • Transactional cache • Fully associative, 64 8-byte lines • Single-cycle cache access • 4-cycle memory access • Both snoopy bus and directory are simulated • 2-stage network with a switch delay of 1 cycle per stage
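The simulated cache sizes translate into capacities as follows. The last line assumes that every transactionally accessed line occupies both an XCOMMIT and an XABORT entry and that no NORMAL entries are resident, so it is an upper bound on a transaction's footprint rather than a measured figure.

```latex
\begin{aligned}
\text{regular cache} &= 2048 \times 8\,\text{B} = 16384\,\text{B} = 16\,\text{KB} \\
\text{transactional cache} &= 64 \times 8\,\text{B} = 512\,\text{B} \\
\text{max lines per transaction} &= 64 / 2 = 32 \;\Rightarrow\; 32 \times 8\,\text{B} = 256\,\text{B}
\end{aligned}
```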
Benchmarks • Counter • n processors each increment a shared counter 2^16/n times • Producer/Consumer buffer • n/2 processors produce and n/2 processors consume through a shared FIFO • Ends when 2^16 items have been consumed • Doubly-linked list • n processors repeatedly move items from the tail to the head • Ends when 2^16 items have been moved • Which shared variables an operation touches is conditional on the list's state (e.g., head and tail coincide when it holds one item) • Traditional locking can introduce deadlock
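The doubly-linked list benchmark's unit of work can be sketched sequentially; under transactional memory, the body of the function would run between the first LT and COMMIT. `node_t`, `list_t`, and `rotate_tail_to_head` are hypothetical names. The early return shows why the shared variables are conditional: with zero or one item nothing moves, and with exactly two items the old head becomes the new tail, so which pointers are written depends on the list's length.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *prev, *next; int value; } node_t;
typedef struct { node_t *head, *tail; } list_t;

/* Move the tail node to the head. Under TM this whole body is one transaction. */
static void rotate_tail_to_head(list_t *l) {
    node_t *t = l->tail;
    if (t == NULL || t == l->head)   /* 0 or 1 items: nothing to move */
        return;
    l->tail = t->prev;               /* detach the old tail */
    l->tail->next = NULL;
    t->prev = NULL;                  /* attach it in front of the old head */
    t->next = l->head;
    l->head->prev = t;
    l->head = t;
}
```

A lock-based version would need a lock order for head, tail, and neighboring nodes that stays correct in every one of these cases, which is where deadlock can creep in.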
Comparisons • Competitors • Transactional memory • Load-locked/store-cond (Alpha) • Spin lock with backoff • Software queue • Hardware queue
Conclusion • Avoids extra lock variables and the problems of locks • Trades deadlock for possible livelock/starvation • Performance comparable to locking when the shared data structure is small • Relatively easy to implement
Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ’93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ’04)
Basic TCC Transaction Control Bits • In each local cache • Read bits (per cache line, or per word to eliminate false sharing) • Set on speculative loads • Snooped when another CPU's transaction commits its writes • Modified bits (per cache line) • Set on speculative stores • Indicate what to roll back if a violation is detected • Different from the dirty bit
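The control bits can be sketched as fields of a cache-line record. This minimal sketch assumes 8-word lines with per-word read bits (the option that eliminates false sharing) and a per-line modified bit kept separate from the conventional dirty bit; `tcc_line_t`, `tcc_spec_load`, `tcc_spec_store`, and `tcc_violation` are hypothetical names. A committed write from another CPU causes a violation only if it hits a word this transaction actually read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned long tag;    /* line address */
    bool    valid;
    uint8_t read_bits;    /* bit w set => word w was read speculatively */
    bool    modified;     /* speculative store to this line (per line) */
    bool    dirty;        /* conventional write-back dirty bit: not the same thing */
} tcc_line_t;

static void tcc_spec_load(tcc_line_t *l, int word) {
    l->read_bits |= (uint8_t)(1u << word);
}

static void tcc_spec_store(tcc_line_t *l) {
    l->modified = true;   /* marks the line for rollback or for the commit packet */
}

/* Snoop another CPU's committed write to (tag, word): a violation means this
   transaction read a value that is now stale and must roll back. */
static bool tcc_violation(const tcc_line_t *l, unsigned long tag, int word) {
    return l->valid && l->tag == tag && (l->read_bits & (1u << word)) != 0;
}
```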
During a Transaction Commit • All of the modified cache lines must be collected into a commit packet • Potential solutions • A separate write buffer, or • An address buffer maintaining a list of the line tags to be committed • Size? • Broadcast all writes to the rest of the system as one single (large) packet
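The address-buffer option can be sketched as follows: the first speculative store to a line appends the line's index to a small buffer, and commit walks that buffer, rather than scanning the whole cache, to assemble the packet. The buffer size of 8 is an arbitrary assumption; because it also caps the packet, it is one concrete answer to the "Size?" question. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ADDR_BUF 8  /* hypothetical address-buffer size; also caps the packet */

typedef struct { unsigned long tag; bool modified; long data; } tcc_line_t;
typedef struct { size_t count; size_t index[ADDR_BUF]; } addr_buffer_t;
typedef struct { size_t count; unsigned long tag[ADDR_BUF]; long data[ADDR_BUF]; } commit_packet_t;

/* Speculative store: record the line in the address buffer on its first store. */
static bool tcc_store(tcc_line_t cache[], size_t i, long value, addr_buffer_t *ab) {
    if (!cache[i].modified) {
        if (ab->count == ADDR_BUF)
            return false;          /* buffer full: caller must stall or commit */
        ab->index[ab->count++] = i;
        cache[i].modified = true;
    }
    cache[i].data = value;
    return true;
}

/* Commit: gather the buffered lines into one packet for broadcast; the lines
   are then no longer speculative. Returns the number of lines in the packet. */
static size_t tcc_build_packet(tcc_line_t cache[], addr_buffer_t *ab, commit_packet_t *p) {
    p->count = 0;
    for (size_t k = 0; k < ab->count; k++) {
        tcc_line_t *l = &cache[ab->index[k]];
        p->tag[p->count] = l->tag;
        p->data[p->count] = l->data;
        p->count++;
        l->modified = false;
    }
    ab->count = 0;
    return p->count;
}
```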
Other Issues • Rollback is needed when a transaction cannot commit • A checkpoint must be taken before each transaction • Checkpoint the register state • Hardware approach: flash-copy the rename table / architectural register file • Software approach: extra instruction overhead • Overflow • Conflict or capacity misses require all the speculative victim lines to be kept somewhere (e.g., a victim cache) • Alternatively, stall temporarily and request permission to commit
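Checkpointing and rollback can be sketched in C with the software approach: copy the register file (including the PC) at transaction start and restore it after a violation, discarding speculatively modified lines and clearing read bits. The sketch assumes that committed copies of discarded lines are still available further down the memory hierarchy; `tcc_begin`, `tcc_rollback`, and the 32-register layout are hypothetical. Overflow handling is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NREGS  32
#define NLINES 8

typedef struct { long reg[NREGS]; long pc; } regfile_t;
typedef struct { bool valid, read, modified; long data; } tcc_line_t;

typedef struct {
    regfile_t  regs;        /* live architectural registers */
    regfile_t  checkpoint;  /* copy taken when the transaction starts */
    tcc_line_t line[NLINES];
} tcc_cpu_t;

/* Software checkpoint: copy the whole register file. This copy is the
   "extra instruction overhead"; hardware would flash-copy instead. */
static void tcc_begin(tcc_cpu_t *c) {
    c->checkpoint = c->regs;
}

/* Rollback after a violation: restore registers and PC, invalidate
   speculatively modified lines, and clear all read bits. */
static void tcc_rollback(tcc_cpu_t *c) {
    c->regs = c->checkpoint;
    for (int i = 0; i < NLINES; i++) {
        if (c->line[i].modified) {
            c->line[i].valid = false;
            c->line[i].modified = false;
        }
        c->line[i].read = false;
    }
}
```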