
Transactional Memory CDA6159


leanne

Presentation Transcript


  1. Transactional Memory CDA6159

  2. Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ‘04)

  3. Introduction • Transaction A sequence of actions that appears indivisible and instantaneous to an outside observer. Four specific attributes: atomicity, consistency, isolation, and durability — collectively known as the ACID properties.

  4. Introduction • Concurrency control • Locks: poor performance, deadlock, etc. • Alternatives: lock-free synchronization, optimistic concurrency control • In 1993, Herlihy and Moss proposed hardware-supported transactional memory as a mechanism for building lock-free data structures.

  5. Basic Transactional Mechanisms • Isolation • Detect when transactions conflict • Track read and write sets • Version management • Record new and old values • Atomicity • Commit new values • Abort back to old values
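
The three mechanisms on this slide can be illustrated in software. The following is a minimal Python sketch, not the API of any real TM system: the `Transaction` class, its method names, and the dict-backed `memory` are all hypothetical. It tracks read and write sets, buffers new values while memory keeps the old ones (one possible form of version management), and then either commits or discards them.

```python
class Transaction:
    """Toy transaction over a dict-backed memory (all names are hypothetical)."""

    def __init__(self, memory):
        self.memory = memory
        self.read_set = set()   # addresses this transaction has read
        self.write_buf = {}     # address -> tentative new value

    def read(self, addr):
        self.read_set.add(addr)
        # A transaction sees its own tentative writes first.
        return self.write_buf.get(addr, self.memory[addr])

    def write(self, addr, value):
        # The old value stays in memory until commit.
        self.write_buf[addr] = value

    def conflicts_with(self, other):
        """Conflict: my write set overlaps the other's read or write set."""
        mine = set(self.write_buf)
        return bool(mine & (other.read_set | set(other.write_buf)))

    def commit(self):
        self.memory.update(self.write_buf)  # publish the new values
        self._reset()

    def abort(self):
        self._reset()                       # old values were never overwritten

    def _reset(self):
        self.read_set.clear()
        self.write_buf.clear()
```

Buffering writes until commit is only one design choice; a system could equally write in place and keep an undo log of old values.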

  6. H/W Transactional Memory Systems • Knight’s Lisp Work • Transactional Memory • Oklahoma Update • SLE/TLR • Transactional Coherence and Consistency • Unbounded TM • Virtual TM • Thread-level TM

  7. Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ‘04)

  8. Locks and Their Problems • Locks are commonly used to protect shared data • Priority inversion • A lower-priority process holds a lock needed by a higher-priority process • Convoy effect • When the lock holder is interrupted, the other processes are forced to wait • Deadlock • Circular dependence among processes acquiring locks, so every process waits for a lock that is never released

  9. H&M’s Transactional Memory [’93] • Intended to replace short critical sections • Motivated by lock-free data structures • Transactions: • Read and write multiple locations • Commit in arbitrary order • Implicit begin, explicit commit operations • Abort affects memory, not registers • Software manages restarting execution • Validate instruction detects pending abort • Implementation extends cache coherence • Read/Write locks correspond to MESI states • Add orthogonal transaction states

  10. Transactional Hardware State • Processor state • Transaction active flag (TACTIVE): whether a transaction is in progress; implicitly set by the first xactional op • Transaction status flag (TSTATUS): whether the transaction is active (true) or aborted (false) • Small, fully associative xactional cache: disjoint from the L1 cache (a line can be in only one or the other) • Holds tentative writes before propagation: invalidated if aborted, snooped and/or written back if committed • Two copies of each xactional line: avoids writebacks to memory and lets a xactional write hold both the old and new value • Aborts another xaction that would cause a conflict • Aborted by interrupts and xactional cache overflows • Acts like a regular cache when not in a xaction • Fast commit and abort (in a single cache cycle)

  11. TM Instructions • Instructions for accessing memory • Load-transactional (LT): reads from shared memory into a private register • Load-transactional-exclusive (LTX): like LT, plus a hint that a write is coming • Store-transactional (ST): tentatively writes from a private register to shared memory; the new value is not visible to other processors until commit • Instructions for manipulating xaction state • Commit: tries to make the tentative writes permanent. Succeeds if no other processor has read its write set or written its read/write set; the write set then becomes visible to others. On failure, discards all updates to the write set • Abort: discards all updates to the write set • Validate: returns the current transaction status, indicating whether the transaction has aborted. If the status is false, discards all updates to the write set

  12. Transaction Example
      /* keep trying */
      while ( true ) {
          /* read variables */
          v1 = LT ( V1 ); …; vn = LT ( Vn );
          /* check consistency */
          if ( ! VALIDATE () ) continue;
          /* compute new values */
          compute ( v1, … , vn );
          /* write tentative values */
          ST ( v1, V1 ); … ST ( vn, Vn );
          /* try to commit */
          if ( COMMIT () ) return result;
          else backoff;
      }
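
The retry loop above can be exercised with a small software stand-in. This is a hedged Python sketch, not the hardware mechanism: the `ToyTM` class, the per-address version counters used to detect conflicts, and the `interfere` callback that imitates another processor's commit are all assumptions made for illustration. Backoff is omitted because the sketch is single-threaded.

```python
class ToyTM:
    """Software stand-in for LT/ST/VALIDATE/COMMIT (all names hypothetical)."""

    def __init__(self, memory):
        self.memory = memory                    # addr -> value
        self.version = {a: 0 for a in memory}   # bumped on every committed write
        self.seen = {}                          # addr -> version observed by LT
        self.tentative = {}                     # addr -> value written by ST

    def lt(self, addr):
        self.seen.setdefault(addr, self.version[addr])
        return self.tentative.get(addr, self.memory[addr])

    def st(self, addr, value):
        self.tentative[addr] = value            # not visible until commit

    def validate(self):
        ok = all(self.version[a] == v for a, v in self.seen.items())
        if not ok:
            self._discard()                     # aborted: drop tentative state
        return ok

    def commit(self):
        if not self.validate():
            return False
        for addr, value in self.tentative.items():
            self.memory[addr] = value
            self.version[addr] += 1
        self._discard()
        return True

    def _discard(self):
        self.seen.clear()
        self.tentative.clear()


def increment(tm, addr, interfere=None):
    """The slide's retry loop, specialised to incrementing one variable."""
    attempts = 0
    while True:
        attempts += 1
        v = tm.lt(addr)                 # read variable
        if not tm.validate():           # check consistency
            continue
        if interfere and attempts == 1:
            interfere()                 # imitate another processor's commit
        tm.st(addr, v + 1)              # write tentative value
        if tm.commit():                 # try to commit
            return attempts
```

With no interference `increment` succeeds on the first attempt; when `interfere` changes the variable between the read and the commit, the first commit fails and the loop retries with the fresh value.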

  13. Transactional Cache • Extends cache coherence protocols • Any protocol capable of detecting accessibility conflicts can also detect transaction conflicts at no extra cost • Applies to both snoopy-bus and directory protocols • Additional transactional tags • EMPTY, NORMAL, XCOMMIT, XABORT • Two entries per xactional datum: XCOMMIT (old value, discarded on commit) and XABORT (new value, discarded on abort) • Replacement order when allocating: EMPTY, then NORMAL, then XCOMMIT • Bus cycles • T_READ and T_RFO (read for ownership) • BUSY • A request can be refused by responding BUSY • When BUSY is received, the xaction is aborted • This prevents deadlock and continual mutual aborts
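
The allocation policy on this slide amounts to a priority search over the tags. A minimal Python sketch follows; the `Tag` enum and `pick_victim` function are hypothetical names, and returning `None` when every entry is XABORT is this sketch's way of signalling the overflow case.

```python
from enum import Enum


class Tag(Enum):
    EMPTY = 0
    NORMAL = 1
    XCOMMIT = 2
    XABORT = 3


# Replacement preference from the slide: EMPTY first, then NORMAL, then XCOMMIT.
VICTIM_ORDER = (Tag.EMPTY, Tag.NORMAL, Tag.XCOMMIT)


def pick_victim(tags):
    """Return the index of the entry to reuse, or None if only XABORT entries remain."""
    for wanted in VICTIM_ORDER:
        for index, tag in enumerate(tags):
            if tag is wanted:
                return index
    return None  # nothing replaceable: the transactional cache has overflowed
```

For example, `pick_victim([Tag.XABORT, Tag.NORMAL, Tag.EMPTY])` selects index 2, the EMPTY entry, even though a NORMAL entry appears earlier.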

  14. Processor Operations • LT • Check for an XABORT entry; on a hit, return its value • Otherwise, check for a NORMAL entry • On a hit, switch NORMAL to XABORT and allocate an XCOMMIT entry • Otherwise, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries • If T_READ receives BUSY, abort • Set TSTATUS to false • Drop all XABORT entries • Set all XCOMMIT entries to NORMAL • Return arbitrary data • LTX, ST • Same as LT, except • Use T_RFO on a miss rather than T_READ, and set the cache line state to RESERVED • For ST, the XABORT entry is updated

  15. Processor Operations • VALIDATE • Return TSTATUS flag • If false, set TSTATUS true, TACTIVE false • ABORT • Set TSTATUS true, TACTIVE false • Change XABORT to EMPTY, XCOMMIT to NORMAL • COMMIT • Return TSTATUS, set TSTATUS true, TACTIVE false • Drops all XCOMMIT and changes all XABORT to NORMAL
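
The tag transitions on slides 14 and 15 can be traced with a toy model. This Python sketch is an illustration under simplifying assumptions, not the hardware design: the `TxCache` class is hypothetical, a plain dict lookup stands in for the T_READ/T_RFO bus cycles, BUSY responses and capacity limits are not modelled, and committed values simply stay in the cache as NORMAL entries.

```python
class TxCache:
    """Toy model of transactional-cache tag transitions (hypothetical names)."""

    def __init__(self):
        self.entries = []       # each entry is [addr, tag, value]
        self.tstatus = True
        self.tactive = False

    def _find(self, addr, tag):
        for e in self.entries:
            if e[0] == addr and e[1] == tag:
                return e
        return None

    def lt(self, addr, memory):
        self.tactive = True     # implicitly set by the first transactional op
        hit = self._find(addr, "XABORT")
        if hit is None:
            normal = self._find(addr, "NORMAL")
            if normal is not None:
                normal[1] = "XABORT"                  # NORMAL -> XABORT
                value = normal[2]
            else:
                value = memory[addr]                  # stands in for T_READ
                self.entries.append([addr, "XABORT", value])
            self.entries.append([addr, "XCOMMIT", value])  # copy of old value
            hit = self._find(addr, "XABORT")
        return hit[2]

    def st(self, addr, value, memory):
        self.lt(addr, memory)                     # same lookup path as LT
        self._find(addr, "XABORT")[2] = value     # ST updates the XABORT entry

    def _finish(self, drop, keep):
        self.entries = [e for e in self.entries if e[1] != drop]
        for e in self.entries:
            if e[1] == keep:
                e[1] = "NORMAL"
        self.tstatus, self.tactive = True, False

    def commit(self):
        ok = self.tstatus
        self._finish(drop="XCOMMIT", keep="XABORT")   # keep the new values
        return ok

    def abort(self):
        self._finish(drop="XABORT", keep="XCOMMIT")   # keep the old values
```

Because each transactional line has both an XCOMMIT and an XABORT entry, commit and abort only retag entries; no data has to be copied in either case.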

  16. Snoopy Cache Actions • Regular cache • Acts as MESI; treats T_READ like READ and T_RFO like RFO • Transactional cache • Non-xactional cycle: acts like the regular cache, but for NORMAL entries only • T_READ: if the entry is valid (shared), returns the value • All other cycles: responds BUSY • Memory • Responds to READ, T_READ, RFO, and T_RFO when no cache responds • Responds to WRITE
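
The transactional cache's snoop behaviour reduces to a small decision function. The Python sketch below is one reading of the slide: the function name, the response strings `"DATA"`, `"IGNORE"`, and `"BUSY"`, and the line-state names `"VALID"`, `"RESERVED"`, and `"DIRTY"` are assumptions chosen for illustration.

```python
def tx_cache_snoop(cycle, tag, state):
    """Response of the transactional cache to a snooped cycle for a line it holds.

    cycle: "READ", "RFO", "T_READ" or "T_RFO"
    tag:   "NORMAL", "XCOMMIT" or "XABORT"
    state: "VALID", "RESERVED" or "DIRTY" (assumed coherence-state names)
    """
    if cycle in ("READ", "RFO"):
        # Non-transactional cycle: behave like the regular cache,
        # but only NORMAL entries take part.
        return "DATA" if tag == "NORMAL" else "IGNORE"
    if cycle == "T_READ" and state == "VALID":
        return "DATA"   # a shared copy can be handed out
    return "BUSY"       # refuse; the requester aborts its transaction
```

Answering BUSY rather than stalling is what lets the requester abort immediately instead of waiting on another transaction.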

  17. Advantages and Disadvantages • Single cache for both regular and xactional data • Set size would determine the max xaction size • Parallel commit/abort logic is needed for a larger cache • Xaction size is limited by the xactional cache size • Overflow traps into software • Xaction data set must be small • Cannot survive an interrupt

  18. Simulation • Proteus simulator • 32 processors • Regular cache • Direct-mapped, 2048 8-byte lines • Transactional cache • Fully associative, 64 8-byte lines • Single-cycle cache access • 4-cycle memory access • Both snoopy-bus and directory protocols are simulated • 2-stage network with a switch delay of 1 cycle each

  19. Benchmarks • Counter • n processors each increment a shared counter (2^16)/n times • Producer/consumer buffer • n/2 processors produce and n/2 processors consume through a shared FIFO • Ends when 2^16 items have been consumed • Doubly-linked list • n processors rotate the contents from tail to head • Ends when 2^16 items have been moved • Which variables are shared depends on the state of the list • Traditional locking can introduce deadlock

  20. Comparisons • Competitors • Transactional memory • Load-locked/store-cond (Alpha) • Spin lock with backoff • Software queue • Hardware queue

  21. Counter Result

  22. Producer/Consumer Result

  23. Doubly Linked List Result

  24. Conclusion • Avoids extra lock variables and lock-related problems • Trades deadlock for possible livelock/starvation • Performance comparable to locking techniques when the shared data structure is small • Relatively easy to implement

  25. Outline • Introduction • Paper 1: Architectural Support for Lock-Free Data Structures (Maurice Herlihy, ISCA ‘93) • Paper 2: Transactional Memory Coherence and Consistency (Lance Hammond, ISCA ‘04)

  26. Basic TCC Transaction Control Bits • In each local cache • Read bits (per cache line, or per word to eliminate false sharing) • Set on speculative loads • Snooped by a committing transaction (writes by other CPUs) • Modified bits (per cache line) • Set on speculative stores • Indicate what to roll back if a violation is detected • Different from the dirty bit
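
The role of the two kinds of bits can be shown with a short model. This is a Python sketch under stated assumptions: the `TccCache` class and its method names are hypothetical, bits are kept per line rather than per word, and Python sets stand in for the per-line hardware bits.

```python
class TccCache:
    """Toy model of TCC read/modified bits (hypothetical names, line granularity)."""

    def __init__(self):
        self.read_bits = set()      # lines touched by speculative loads
        self.modified_bits = set()  # lines touched by speculative stores

    def load(self, line):
        self.read_bits.add(line)

    def store(self, line):
        self.modified_bits.add(line)

    def snoop_commit(self, committed_lines):
        """Return True (violation) if another CPU committed a line we have read."""
        violated = bool(self.read_bits & set(committed_lines))
        if violated:
            self.rollback()
        return violated

    def rollback(self):
        # The modified bits identify which speculative lines must be discarded.
        discarded = set(self.modified_bits)
        self.read_bits.clear()
        self.modified_bits.clear()
        return discarded
```

Only the read bits are checked against an incoming commit; a line that was speculatively written but never read does not trigger a violation in this model.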

  27. During a Transaction Commit • Need to collect all of the modified cache lines together into a commit packet • Potential solutions • A separate write buffer, or • An address buffer maintaining a list of the line tags to be committed • Size? • Broadcast all writes out as one single (large) packet to the rest of the system
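
The address-buffer option can be sketched as follows. This Python illustration rests on several assumptions: the function names are hypothetical, a dict models the cache's data array, the packet is a list of `(tag, data)` pairs, and sorting by tag is an arbitrary choice made only to keep the example deterministic.

```python
def build_commit_packet(cache_data, modified_tags):
    """Gather every speculatively modified line into one packet of (tag, data) pairs."""
    return [(tag, cache_data[tag]) for tag in sorted(modified_tags)]


def broadcast(packet, memory, other_read_sets):
    """Apply the packet to memory and report which other CPUs it violates.

    other_read_sets holds one set of read line tags per remote CPU; the result
    lists the indices of CPUs that had read a line written by this packet.
    """
    written = {tag for tag, _ in packet}
    for tag, data in packet:
        memory[tag] = data
    return [i for i, reads in enumerate(other_read_sets) if reads & written]
```

Sending the whole write set as one packet means the other processors observe either all of the transaction's writes or none of them.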

  28. Other • Rollback is needed when a transaction cannot commit • Checkpoints are needed prior to a transaction • Checkpoint register state • Hardware approach: flash-copying the rename table / architectural register file • Software approach: extra instruction overhead • Overflow issue • Conflict or capacity misses require all victim lines to be kept somewhere (e.g. a victim cache) • Otherwise, stall temporarily and request permission to commit
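
Register checkpointing for rollback can be pictured with a few lines of code. This is a loose Python analogy with hypothetical names (`ToyCpu`, `begin_transaction`, `rollback`): copying a dict stands in for flash-copying the architectural register file, and memory state is not modelled.

```python
class ToyCpu:
    """Register checkpoint and rollback (hypothetical model, registers only)."""

    def __init__(self, regs):
        self.regs = dict(regs)
        self.checkpoint = None

    def begin_transaction(self):
        # Stands in for flash-copying the architectural register file.
        self.checkpoint = dict(self.regs)

    def rollback(self):
        # Restore the register state saved when the transaction started.
        self.regs = dict(self.checkpoint)
```

After a rollback the processor resumes from the saved state, so the transaction can re-execute from its first instruction.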
