Transactional Memory Yujia Jin
Locks and Their Problems • Locks are commonly used to protect shared data • Priority inversion: a lower-priority process holds a lock needed by a higher-priority process • Convoy effect: when the lock holder is interrupted (e.g., descheduled or page-faulted), every other process that needs the lock is forced to wait • Deadlock: a circular dependence among processes acquiring locks, so each one waits forever for a lock another holds
Lock-free • A shared data structure is lock-free if its operations do not require mutual exclusion • (-) Does not prevent multiple processes from operating on the same object at the same time • (+) Avoids the lock problems above • (-) Existing lock-free techniques are implemented in software and do not perform as well as their lock-based counterparts
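For contrast with locking, here is a minimal C11 sketch of a lock-free shared counter built on compare-and-swap (`atomic_compare_exchange_weak` from `<stdatomic.h>`). The function name `lockfree_add` is hypothetical. It updates only a single word; transactional memory aims to give the same kind of lock-free update to several independent words at once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical helper: lock-free add using only standard C11 atomics.
 * No thread ever holds a lock, so a preempted thread cannot block the
 * others; a thread that loses a race simply retries. */
long lockfree_add(atomic_long *counter, long delta)
{
    assert(counter != NULL);
    long old = atomic_load(counter);
    /* On failure the CAS copies the counter's current value into `old`,
     * so the next attempt starts from fresh data. */
    while (!atomic_compare_exchange_weak(counter, &old, old + delta)) {
        /* another thread updated the counter first; try again */
    }
    return old + delta;   /* the value this call installed */
}
```

Any number of threads may call `lockfree_add` on the same counter concurrently; none of them waits on a lock holder, although an unlucky thread may have to retry several times.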
Transactional Memory • Uses transaction-style operations to operate on lock-free data • Lets the programmer define customized read-modify-write operations on multiple, independent words • Easy to support in hardware: a straightforward extension of conventional multiprocessor cache-coherence protocols
Transaction Style • A finite sequence of machine instructions consisting of • a sequence of reads, • computation, • a sequence of writes, and • a commit • Formal properties • Atomicity and serializability (comparable to the database ACID properties)
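Access Instructions • Load-transactional (LT) • Reads a value from shared memory into a private register • Load-transactional-exclusive (LTX) • Same as LT, but hints that the location is likely to be written • Store-transactional (ST) • Tentatively writes a value from a private register to shared memory; the new value is not visible to other processors until commit • Locations read by LT form the transaction's read set; locations accessed by LTX or ST form its write set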
State Instructions • Commit • Tries to make the tentative writes permanent • Succeeds only if no other transaction has written any location in this transaction's read or write set, and no other transaction has read any location in its write set • On failure, discards all updates to the write set • Returns whether it succeeded • Abort • Discards all updates to the write set • Validate • Returns the current transaction status (true if the transaction has not aborted) • If the status is false, discards all updates to the write set
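Stated precisely (following Herlihy and Moss), COMMIT succeeds only if no other transaction wrote any location in this transaction's data set (read set plus write set) and no other transaction read any location in its write set. Here the read set is the locations read by LT and the write set is the locations accessed by LTX or ST. The sketch below is a simplified, hypothetical model: each shared word is one bit of a 64-bit mask, and `others_read`/`others_wrote` summarize what other processors accessed while this transaction ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit i stands for shared word i (at most 64 words). */
typedef uint64_t word_set;

typedef struct {
    word_set read_set;    /* words loaded with LT */
    word_set write_set;   /* words accessed with LTX or ST */
} txn_footprint;

/* COMMIT succeeds only if, while this transaction ran, no other processor
 * wrote a word in its data set (read set | write set) and no other
 * processor read a word in its write set. */
bool commit_succeeds(txn_footprint self, word_set others_read, word_set others_wrote)
{
    word_set data_set = self.read_set | self.write_set;
    bool data_set_written = (others_wrote & data_set) != 0;
    bool write_set_read   = (others_read & self.write_set) != 0;
    return !data_set_written && !write_set_read;
}
```

Note that other processors merely reading this transaction's read set is harmless; only writes to the data set or reads of tentative (write-set) locations cause a conflict.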
Typical Transaction
/* keep trying */
while (true) {
    /* read variables */
    v1 = LT(V1); …; vn = LT(Vn);
    /* check consistency */
    if (!VALIDATE()) continue;
    /* compute new values */
    compute(v1, …, vn);
    /* write tentative values */
    ST(v1, V1); … ST(vn, Vn);
    /* try to commit */
    if (COMMIT()) return result;
    else backoff();
}
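The loop above relies on hardware instructions, so it cannot run as written. As a hedged illustration of the same control flow, the C sketch below emulates LT, ST, VALIDATE and COMMIT in software with per-word version numbers. This version-validation technique is borrowed from software transactional memory; it is not the cache-based hardware design these slides describe. All names (`tm_word`, `tx_load`, `tx_store`, `tx_validate`, `tx_commit`, `transfer`), `TX_MAX`, and the backoff limit are hypothetical, and the sketch is single-threaded: it shows the semantics, not a thread-safe implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_MAX 8   /* hypothetical limit on words per transaction */

/* Software stand-in for a shared word: the version changes on every commit. */
typedef struct { long value; unsigned version; } tm_word;

typedef struct {
    tm_word *rd[TX_MAX]; unsigned seen[TX_MAX]; size_t nrd;   /* read set  */
    tm_word *wr[TX_MAX]; long pending[TX_MAX];  size_t nwr;   /* write set */
} txn;

static void tx_begin(txn *t) { t->nrd = t->nwr = 0; }

/* Stand-in for LT: return our own tentative value if we wrote the word,
 * otherwise read it and log the version we saw. */
static long tx_load(txn *t, tm_word *w)
{
    for (size_t i = 0; i < t->nwr; i++)
        if (t->wr[i] == w) return t->pending[i];
    assert(t->nrd < TX_MAX);
    t->rd[t->nrd] = w;
    t->seen[t->nrd++] = w->version;
    return w->value;
}

/* Stand-in for ST: buffer the value; it is invisible until commit. */
static void tx_store(txn *t, tm_word *w, long v)
{
    for (size_t i = 0; i < t->nwr; i++)
        if (t->wr[i] == w) { t->pending[i] = v; return; }
    assert(t->nwr < TX_MAX);
    t->wr[t->nwr] = w;
    t->pending[t->nwr++] = v;
}

/* Stand-in for VALIDATE: consistent while no word we read has changed. */
static bool tx_validate(const txn *t)
{
    for (size_t i = 0; i < t->nrd; i++)
        if (t->rd[i]->version != t->seen[i]) return false;
    return true;
}

/* Stand-in for COMMIT: publish buffered writes only if validation passes.
 * Single-threaded sketch: real concurrent code would have to make the
 * check and the publication one atomic step. */
static bool tx_commit(txn *t)
{
    if (!tx_validate(t)) return false;   /* tentative writes are discarded */
    for (size_t i = 0; i < t->nwr; i++) {
        t->wr[i]->value = t->pending[i];
        t->wr[i]->version++;
    }
    return true;
}

/* The typical transaction: move `amount` from *a to *b, retrying with
 * exponential backoff until the commit succeeds. Returns the new *a. */
static long transfer(tm_word *a, tm_word *b, long amount)
{
    unsigned delay = 1;
    for (;;) {                                   /* keep trying */
        txn t;
        tx_begin(&t);
        long va = tx_load(&t, a);                /* read variables */
        long vb = tx_load(&t, b);
        if (!tx_validate(&t)) continue;          /* check consistency */
        tx_store(&t, a, va - amount);            /* compute, write tentatively */
        tx_store(&t, b, vb + amount);
        if (tx_commit(&t)) return va - amount;   /* try to commit */
        for (volatile unsigned i = 0; i < delay; i++) { }   /* back off */
        if (delay < 1024) delay *= 2;
    }
}
```

`transfer` plays the role of the `while (true)` loop: it reads two words, validates, writes tentative values, and retries after an exponentially growing delay if the commit fails. To see a failed commit, start a transaction, change a word it has read (as another processor would), and `tx_commit` returns false without publishing anything.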
Warning… • Not intended for database use • Transactions are expected to be short in duration • Transactions are expected to touch a small data set
Idea Behind Implementation • Existing cache protocols already detect accessibility conflicts • Any protocol that detects accessibility conflicts can also detect transaction conflicts at no extra cost • Can be extended to cache-coherence protocols in general • Including bus snooping and directory-based protocols
Bus Snoopy Example • [Figure: the processor is connected to a regular cache (2048 8-byte lines, direct mapped) and a transactional cache (64 8-byte lines, fully associative), both attached to the bus] • The caches are exclusive: a line resides in at most one of them • The transactional cache holds tentative writes without propagating them to other processors
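The two cache geometries differ mainly in how a line is located. Using only the sizes given for this example (8-byte lines, 2048 direct-mapped lines, 64 fully associative lines), a minimal C sketch of the address split follows; 32-bit byte addresses and the names `regular_split` and `tx_line_tag` are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry from the slide: 8-byte lines; the regular cache has 2048 lines
 * (16 KB, direct mapped), the transactional cache 64 lines (512 B, fully
 * associative). Assumes 32-bit byte addresses (hypothetical). */
enum { LINE_BYTES = 8, REGULAR_LINES = 2048, TX_LINES = 64 };
enum { OFFSET_BITS = 3, INDEX_BITS = 11 };   /* log2(8), log2(2048) */

_Static_assert((1 << OFFSET_BITS) == LINE_BYTES, "offset width");
_Static_assert((1 << INDEX_BITS) == REGULAR_LINES, "index width");

typedef struct { uint32_t tag, index, offset; } split_addr;

/* Direct mapped: the index bits select the single line that may hold addr. */
split_addr regular_split(uint32_t addr)
{
    split_addr s;
    s.offset = addr & (LINE_BYTES - 1);
    s.index  = (addr >> OFFSET_BITS) & (REGULAR_LINES - 1);
    s.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return s;
}

/* Fully associative: there is no index; the whole line address is the tag
 * and is compared against all TX_LINES entries. */
uint32_t tx_line_tag(uint32_t addr)
{
    return addr >> OFFSET_BITS;
}
```

The arithmetic shows the trade-off: the 16 KB regular cache checks one candidate line per access, while the 512-byte transactional cache must compare the tag against all 64 entries, which is affordable at that small size.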
Transaction Cache • Each cache line carries a separate transactional tag in addition to its coherence-protocol tag • Transactional tag states: EMPTY, NORMAL, XCOMMIT, XABORT • Two entries per transactionally accessed line • XABORT holds the tentative (modified) value and is set to EMPTY on abort • XCOMMIT holds the original value and is set to EMPTY on commit • Allocation policy, in decreasing order of preference • EMPTY entries, NORMAL entries, XCOMMIT entries • The hardware must guarantee a minimum transaction size
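The allocation policy can be read as a priority search. In the minimal sketch below (hypothetical names), XABORT entries are never evicted because they hold the running transaction's only copy of its tentative values. We assume that evicting an XCOMMIT entry would first require writing it back if dirty; that step is not modeled.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { TX_EMPTY, TX_NORMAL, TX_XCOMMIT, TX_XABORT } tx_tag_state;

/* Hypothetical helper: pick the entry to (re)use for a new line, following
 * the slide's order: EMPTY, then NORMAL, then XCOMMIT. XABORT entries hold
 * tentative values and are never chosen. Returns the entry index, or -1
 * if no entry qualifies. */
int choose_victim(const tx_tag_state *entries, size_t n)
{
    static const tx_tag_state preference[] = { TX_EMPTY, TX_NORMAL, TX_XCOMMIT };
    const size_t npref = sizeof preference / sizeof preference[0];

    assert(entries != NULL || n == 0);
    for (size_t p = 0; p < npref; p++)
        for (size_t i = 0; i < n; i++)
            if (entries[i] == preference[p])
                return (int)i;
    return -1;
}
```

A result of -1 means every entry is XABORT: the transaction has outgrown the cache, which illustrates why the hardware can only guarantee transactions up to some minimum size.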
Bus Actions • T_READ and T_RFO (read for ownership) are added for transactional requests • A transactional request can be refused with a BUSY response • A processor that receives BUSY aborts its transaction • This prevents deadlock and continual mutual aborts • It can, however, lead to starvation
Processor Actions • The transaction active (TACTIVE) flag indicates whether a transaction is in progress; it is set by the first transactional operation • The transaction status (TSTATUS) flag indicates whether the transaction is still active (true) or has aborted (false)
LT Actions • Check for an XABORT entry; if one exists, return its value • Otherwise, check for a NORMAL entry • If one exists, switch it from NORMAL to XABORT and allocate an XCOMMIT entry holding the same value • Otherwise, issue T_READ on the bus, then allocate XABORT and XCOMMIT entries for the returned value • If T_READ receives BUSY, abort the transaction: • Set TSTATUS to false • Drop all XABORT entries • Set all XCOMMIT entries to NORMAL • Return arbitrary data
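The LT steps map naturally onto a small state machine. The C sketch below models only the transactional cache (the regular cache and coherence states are omitted). `tcache`, `tc_init`, `tx_lt`, the bus hook type `bus_t_read`, and its two demo stubs are hypothetical; a hook returning false stands for a BUSY response.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TC_LINES 64   /* transactional cache size used in the slides */

typedef enum { TC_EMPTY, TC_NORMAL, TC_XCOMMIT, TC_XABORT } tstate;
typedef struct { tstate st; unsigned addr; long value; } tline;
typedef struct { tline line[TC_LINES]; bool tactive, tstatus; } tcache;

/* Hypothetical bus hook for T_READ: false means the reply was BUSY. */
typedef bool (*bus_t_read)(unsigned addr, long *value);

/* Demo stubs (hypothetical): one bus always answers, one always says BUSY. */
static bool demo_bus_ok(unsigned addr, long *value) { *value = (long)addr * 10; return true; }
static bool demo_bus_busy(unsigned addr, long *value) { (void)addr; (void)value; return false; }

void tc_init(tcache *c)
{
    for (size_t i = 0; i < TC_LINES; i++) c->line[i] = (tline){ TC_EMPTY, 0, 0 };
    c->tactive = false;
    c->tstatus = true;   /* true = transaction not aborted */
}

static tline *find(tcache *c, unsigned addr, tstate st)
{
    for (size_t i = 0; i < TC_LINES; i++)
        if (c->line[i].st == st && c->line[i].addr == addr) return &c->line[i];
    return NULL;
}

/* Allocation order: EMPTY, NORMAL, XCOMMIT (write-back of dirty lines not modeled). */
static tline *alloc(tcache *c)
{
    static const tstate pref[] = { TC_EMPTY, TC_NORMAL, TC_XCOMMIT };
    for (size_t p = 0; p < 3; p++)
        for (size_t i = 0; i < TC_LINES; i++)
            if (c->line[i].st == pref[p]) return &c->line[i];
    assert(!"transactional cache is full of XABORT entries");
    return NULL;
}

/* Abort on BUSY: TSTATUS false, drop XABORT, turn XCOMMIT back into NORMAL. */
static void abort_entries(tcache *c)
{
    c->tstatus = false;
    for (size_t i = 0; i < TC_LINES; i++) {
        if (c->line[i].st == TC_XABORT) c->line[i].st = TC_EMPTY;
        else if (c->line[i].st == TC_XCOMMIT) c->line[i].st = TC_NORMAL;
    }
}

long tx_lt(tcache *c, unsigned addr, bus_t_read bus)
{
    c->tactive = true;                        /* first transactional op sets TACTIVE */
    tline *hit = find(c, addr, TC_XABORT);
    if (hit) return hit->value;               /* already in this transaction */

    long v;
    tline *normal = find(c, addr, TC_NORMAL);
    if (normal) {
        v = normal->value;
        normal->st = TC_XABORT;               /* NORMAL becomes the tentative copy */
    } else {
        if (!bus(addr, &v)) {                 /* T_READ answered BUSY */
            abort_entries(c);
            return 0;                         /* arbitrary data; caller must VALIDATE */
        }
        tline *xa = alloc(c);
        *xa = (tline){ TC_XABORT, addr, v };
    }
    tline *xc = alloc(c);
    *xc = (tline){ TC_XCOMMIT, addr, v };     /* XCOMMIT keeps the original value */
    return v;
}
```

After a BUSY abort, the line that was read earlier survives as a NORMAL entry, so a retry of the transaction can hit it without any bus traffic.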
LTX and ST Actions • Same as LT, except • They issue T_RFO rather than T_READ on a miss • ST also writes the new value into the XABORT entry
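More Exciting Actions • VALIDATE • Returns the TSTATUS flag • If it is false, sets TSTATUS to true and TACTIVE to false • ABORT • Discards tentative entries (drops all XABORT entries and sets all XCOMMIT entries to NORMAL), then sets TSTATUS to true and TACTIVE to false • COMMIT • Returns TSTATUS, then sets TSTATUS to true and TACTIVE to false • Drops all XCOMMIT entries and changes all XABORT entries to NORMAL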
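The three state instructions differ only in which copy of each line survives and what they report. The C sketch below tracks just the transactional tags and the two flags; the names are hypothetical. One deliberate deviation: `tx_commit` defensively discards tentative entries when TSTATUS is false, whereas on the described hardware the abort path has already removed them by then.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TC_LINES 64

typedef enum { TC_EMPTY, TC_NORMAL, TC_XCOMMIT, TC_XABORT } tstate;

typedef struct {
    tstate st[TC_LINES];     /* transactional tag of each entry */
    bool tactive, tstatus;   /* TACTIVE and TSTATUS flags */
} proc_state;

/* Commit outcome: originals (XCOMMIT) dropped, tentative values kept. */
static void keep_tentative(proc_state *p)
{
    for (size_t i = 0; i < TC_LINES; i++) {
        if (p->st[i] == TC_XCOMMIT) p->st[i] = TC_EMPTY;
        else if (p->st[i] == TC_XABORT) p->st[i] = TC_NORMAL;
    }
}

/* Abort outcome: tentative values (XABORT) dropped, originals kept. */
static void keep_original(proc_state *p)
{
    for (size_t i = 0; i < TC_LINES; i++) {
        if (p->st[i] == TC_XABORT) p->st[i] = TC_EMPTY;
        else if (p->st[i] == TC_XCOMMIT) p->st[i] = TC_NORMAL;
    }
}

static void end_transaction(proc_state *p) { p->tstatus = true; p->tactive = false; }

bool tx_validate(proc_state *p)
{
    bool ok = p->tstatus;
    if (!ok) end_transaction(p);   /* aborted: reset flags for the retry */
    return ok;
}

void tx_abort(proc_state *p)
{
    keep_original(p);
    end_transaction(p);
}

bool tx_commit(proc_state *p)
{
    bool ok = p->tstatus;
    /* If the transaction already aborted, its XABORT/XCOMMIT entries were
     * removed by the abort path, so keep_original is then a no-op. */
    if (ok) keep_tentative(p); else keep_original(p);
    end_transaction(p);
    return ok;
}
```

In every case the flags end up as TSTATUS true and TACTIVE false, so the processor is ready for the next transaction.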
Snoopy Cache Actions • The regular cache behaves like a MESI invalidation protocol, treating T_READ like READ and T_RFO like RFO • Transactional cache • Non-transactional cycle: acts like the regular cache, but considers NORMAL entries only • T_READ: if the entry is valid (shared), returns the value • All other cycles: BUSY
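One reading of the snoop rules above, written as pure C functions. The enum names and the `shared` flag (standing for a valid, shared, non-exclusive line) are hypothetical, and the regular cache's full MESI behavior is not modeled; `SNOOP_AS_REGULAR` only marks where that logic would take over.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { BUS_READ, BUS_RFO, BUS_T_READ, BUS_T_RFO } bus_cycle;
typedef enum { TC_EMPTY, TC_NORMAL, TC_XCOMMIT, TC_XABORT } tstate;
typedef enum { SNOOP_IGNORE, SNOOP_AS_REGULAR, SNOOP_VALUE, SNOOP_BUSY } snoop_reply;

/* Regular cache: transactional cycles are treated like their ordinary twins. */
bus_cycle regular_view(bus_cycle c)
{
    if (c == BUS_T_READ) return BUS_READ;
    if (c == BUS_T_RFO) return BUS_RFO;
    return c;
}

/* Transactional cache's reply to a snooped cycle for a line whose
 * transactional tag is `tag` (TC_EMPTY means the line is not held). */
snoop_reply tx_snoop(bus_cycle c, tstate tag, bool shared)
{
    if (tag == TC_EMPTY) return SNOOP_IGNORE;             /* line not held */
    if (c == BUS_READ || c == BUS_RFO)                    /* non-transactional */
        return tag == TC_NORMAL ? SNOOP_AS_REGULAR : SNOOP_IGNORE;
    if (c == BUS_T_READ && shared) return SNOOP_VALUE;    /* valid, shared line */
    return SNOOP_BUSY;                                    /* all other cycles */
}
```

Because a transactional request that would disturb a running transaction gets BUSY, it is the requester that aborts rather than the holder, which is the source of the starvation risk mentioned under Bus Actions.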
Simulation • Proteus simulator • 32 processors • Regular cache: direct mapped, 2048 8-byte lines • Transactional cache: fully associative, 64 8-byte lines • Single-cycle cache access • 4-cycle memory access • Both a snoopy bus and a directory protocol are simulated • 2-stage network with a switch delay of 1 cycle per stage
Benchmarks • Counter • n processors each increment a shared counter (2^16)/n times • Producer/consumer buffer • n/2 processors produce and n/2 processors consume through a shared FIFO • Ends when 2^16 items have been consumed • Doubly-linked list • n processors try to rotate the contents from tail to head • Ends when 2^16 items have been moved • Whether variables are shared depends on the list's state • Traditional locking can introduce deadlock
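To see why sharing in the doubly-linked list benchmark is conditional, here is a sequential C sketch of one step (move the tail item to the head). The names `dlist`, `dequeue_tail`, `enqueue_head` and `rotate_once` are hypothetical. In the benchmark each step would run as one transaction (or under locks); this version only shows which pointers a step touches.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { struct node *prev, *next; int item; } node;
typedef struct { node *head, *tail; } dlist;

/* Unlink and return the tail node, or NULL if the list is empty.
 * Normally writes only `tail` and the new tail's `next`; on a
 * one-item list it must also clear `head`. */
node *dequeue_tail(dlist *l)
{
    node *n = l->tail;
    if (n == NULL) return NULL;
    l->tail = n->prev;
    if (l->tail != NULL) l->tail->next = NULL;
    else l->head = NULL;              /* list is now empty: head changes too */
    n->prev = n->next = NULL;
    return n;
}

/* Thread n onto the head. Normally writes only `head` and the old head's
 * `prev`; on an empty list it must also set `tail`. */
void enqueue_head(dlist *l, node *n)
{
    assert(n != NULL);
    n->prev = NULL;
    n->next = l->head;
    if (l->head != NULL) l->head->prev = n;
    else l->tail = n;                 /* list was empty: tail changes too */
    l->head = n;
}

/* One benchmark step: move the tail item to the head. */
void rotate_once(dlist *l)
{
    node *n = dequeue_tail(l);
    if (n != NULL) enqueue_head(l, n);
}
```

With two or more items, the tail-side and head-side updates touch different words, so a dequeuer and an enqueuer need not conflict. With one item (or none), both ends are touched, so a lock-based version has to take both a head lock and a tail lock, which is how such a scheme can deadlock if threads acquire the two locks in different orders.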
Comparisons • Competitors • Transactional memory • Load-locked/store-conditional (Alpha) • Spin lock with backoff • Software queue • Hardware queue
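Of the competitors, the spin lock with backoff is easy to sketch in portable C11. The version below is a common test-and-test-and-set lock with exponential backoff. The slides do not say exactly which variant was measured, and the names `spinlock`, `spin_lock`, `spin_trylock`, `spin_unlock` and the `BACKOFF_MIN`/`BACKOFF_MAX` values are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool held; } spinlock;

#define BACKOFF_MIN 4u      /* hypothetical tuning values */
#define BACKOFF_MAX 4096u

void spin_lock(spinlock *l)
{
    unsigned delay = BACKOFF_MIN;
    for (;;) {
        /* Test: spin on a plain load so waiters read their cached copy. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed)) { }
        /* Test-and-set: try to grab the lock. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Lost the race: wait, then retry with a doubled delay. */
        for (volatile unsigned i = 0; i < delay; i++) { }
        if (delay < BACKOFF_MAX) delay *= 2;
    }
}

bool spin_trylock(spinlock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

void spin_unlock(spinlock *l)
{
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The inner read-only loop keeps waiting processors spinning in their own caches instead of generating bus traffic, and the growing delay spreads out their retries once the lock is released.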
Conclusion • Avoids extra lock variables and the problems of locks • Trades deadlock for possible livelock/starvation • Performance comparable to lock-based techniques when the shared data structure is small • Relatively easy to implement