Transactional Memory -or- The Transactional Koolaid Acid Test
Herlihy & Moss
Presented by Chris Rossbach
Compulsory Outline Slide
• Motivation
• Transactional Memory Concept/Design
• Evaluation
• Discussion
• Conclusion/Questions
Motivation: Lock-based synchronization is hard
• Priority inversion
• Convoying
• Deadlock
• Locks don’t compose
• Coarse-grained locks lose opportunities for parallelism
• Fine-grained locking can be fast, but is hard to get right
• More recently, the impending CMP (chip multiprocessor) era increases the need to exploit thread-level parallelism
Transactional Memory as a programming model

void my_noncomposable_func() {
    acquire_lock(&bad_dangerous_lock);
    // enjoy the mutual exclusion!
    party_on_my_data_structure();
    release_lock(&bad_dangerous_lock);
}

BECOMES:

void my_new_improved_func() {
    begin_transaction();
    // enjoy the mutual exclusion!
    party_on_my_data_structure();
    end_transaction();
}
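To make the "locks don't compose" point concrete, here is a minimal sketch of how begin_transaction() / end_transaction() could let small atomic operations be combined into a larger one. Everything beyond those two names is an assumption for illustration: the runtime is a toy that uses one global spin lock plus a per-thread nesting counter (not hardware TM), and withdraw, deposit, and transfer are hypothetical helpers.

```c
#include <stdatomic.h>

/* ASSUMPTION: a toy runtime, not hardware TM. One global spin lock makes
   every transaction atomic; a per-thread depth counter lets transactions
   nest, so only the outermost begin/end touches the lock. */
static atomic_flag tx_global_lock = ATOMIC_FLAG_INIT;
static _Thread_local int tx_depth = 0;

void begin_transaction(void) {
    if (tx_depth++ == 0) {
        while (atomic_flag_test_and_set_explicit(&tx_global_lock,
                                                 memory_order_acquire)) {
            /* spin until the transaction running in another thread ends */
        }
    }
}

void end_transaction(void) {
    if (--tx_depth == 0) {
        atomic_flag_clear_explicit(&tx_global_lock, memory_order_release);
    }
}

/* Hypothetical helpers: each one is a complete transaction on its own. */
void withdraw(long *account, long amount) {
    begin_transaction();
    *account -= amount;
    end_transaction();
}

void deposit(long *account, long amount) {
    begin_transaction();
    *account += amount;
    end_transaction();
}

/* Composition: the two inner transactions become part of the outer one,
   so no other transaction can observe the money missing from both
   accounts. */
void transfer(long *from, long *to, long amount) {
    begin_transaction();
    withdraw(from, amount);
    deposit(to, amount);
    end_transaction();
}
```

With per-object locks, transfer would have to know which locks withdraw and deposit take and in what order. A real transactional memory keeps this interface but, unlike the global lock assumed here, can run non-conflicting transactions in parallel.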
Transactions in Hardware
• Serializable
• Atomic
• Obey ACI properties (not D!)

ISA Enhancements:
LT - transactional load
LTX - transactional load before ST
ST - transactional store
COMMIT - make transactional changes permanent
VALIDATE - check current transaction status for violations/conflicts
ABORT - discard transactional changes
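The semantics of the six instructions can be sketched as a single-threaded software model. This is an illustration under stated assumptions, not the hardware design: memory is a small array indexed by word, the tx_ prefixed function names are invented here, and tx_simulate_conflict() is a hypothetical hook standing in for a conflicting access by another processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MEM_WORDS 8

static int  memory[MEM_WORDS];        /* committed, globally visible values */
static int  tentative[MEM_WORDS];     /* buffered transactional stores */
static bool in_write_set[MEM_WORDS];  /* which words have a buffered store */
static bool tx_aborted = false;       /* set once a conflict is detected */

/* Drop all buffered stores and start with a clean transaction state. */
static void tx_discard(void) {
    for (size_t i = 0; i < MEM_WORDS; i++) in_write_set[i] = false;
    tx_aborted = false;
}

/* LT: transactional load; a transaction sees its own buffered stores. */
int tx_lt(size_t addr) {
    assert(addr < MEM_WORDS);
    return in_write_set[addr] ? tentative[addr] : memory[addr];
}

/* LTX: same value as LT; it only signals that a store is likely to follow. */
int tx_ltx(size_t addr) { return tx_lt(addr); }

/* ST: transactional store; buffered, invisible to others until COMMIT. */
void tx_st(size_t addr, int value) {
    assert(addr < MEM_WORDS);
    tentative[addr] = value;
    in_write_set[addr] = true;
}

/* VALIDATE: true if no conflict so far; otherwise discard and report. */
bool tx_validate(void) {
    if (!tx_aborted) return true;
    tx_discard();
    return false;
}

/* COMMIT: publish buffered stores if no conflict occurred, else discard. */
bool tx_commit(void) {
    bool ok = !tx_aborted;
    if (ok) {
        for (size_t i = 0; i < MEM_WORDS; i++)
            if (in_write_set[i]) memory[i] = tentative[i];
    }
    tx_discard();
    return ok;
}

/* ABORT: discard all buffered stores. */
void tx_abort(void) { tx_discard(); }

/* Hypothetical hook: pretend another processor touched our data set. */
void tx_simulate_conflict(void) { tx_aborted = true; }
```

In this model a store only reaches memory[] through a successful tx_commit(), which is the "ACI but not D" behavior listed above: changes are all-or-nothing and isolated, but nothing is made durable.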
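Shared Counter Example

Transactional version:
my_silly_func() {
    atomic {
        shared_counter++;
    }
}

Lock-based version:
my_silly_func() {
    spin_lock(&shrc_lock);
    shared_counter++;
    spin_unlock(&shrc_lock);
}

LOCK-BASED
1:  lock; decb shrc_lock     # atomically decrement the lock byte
    jns 3f                   # result not negative: lock acquired
2:  pause                    # lock is held: spin politely
    cmpb $0, shrc_lock
    jle 2b                   # still held, keep spinning
    jmp 1b                   # looks free, retry the decrement
3:  mov $3,shared_counter    # load the counter (slide's register notation)
    add $3, 1
    mov shared_counter,$3    # store it back
    movb $1, shrc_lock       # release the lock

TRANSACTIONAL
    LTX R1, shared_counter   # transactional load, ahead of the store
    VALIDATE                 # check the transaction is still conflict-free
    ADD R1, 1
    ST shared_counter, R1    # transactional store, buffered until commit
    COMMIT                   # make the store permanent

Is this actually better?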
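The transactional sequence above shows a single attempt; if the commit fails, the increment has to be retried. The sketch below shows the shape of such a retry loop in portable C11. It is an analogy, not the proposed hardware: a compare-and-swap stands in for the ST + COMMIT pair, and the function name increment_with_retry is made up for this example.

```c
#include <stdatomic.h>

static atomic_int shared_counter = 0;

/* Optimistic increment: read, compute, then try to publish. Returns the
   number of attempts it took. ASSUMPTION: compare-and-swap plays the role
   of ST + COMMIT; it succeeds only if nobody changed the counter since it
   was read. */
int increment_with_retry(void) {
    int attempts = 0;
    int seen = atomic_load(&shared_counter);   /* role of LTX */
    for (;;) {
        attempts++;
        /* On failure, 'seen' is refreshed with the current value, so the
           next iteration recomputes from up-to-date data. */
        if (atomic_compare_exchange_weak(&shared_counter, &seen, seen + 1)) {
            return attempts;
        }
    }
}
```

Unlike the spin lock, no thread ever holds anything while it is descheduled, which is why this style avoids convoying and priority inversion. The difference from transactional memory is scope: compare-and-swap covers one word, while a transaction can cover several locations.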
Implementation
• Generalization of LL/SC mechanisms
• Extend the cache coherence protocol: if you can detect access conflicts (think MSI or variants), you can detect TX conflicts
• Need additional TX caches with augmented state [EMPTY, NORMAL, XCOMMIT, XABORT], exclusive with the non-TX cache
• Per-processor TACTIVE and TSTATUS flags
• Commit/abort is local to a cache; writes are buffered in the cache until commit
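Why commit and abort can be local to a cache is easier to see in code. The sketch below models the four transactional tags, assuming the scheme in which a transactional write keeps two entries for the same address: the old value tagged XCOMMIT (discard on commit) and the new value tagged XABORT (discard on abort). The struct layout and function names are assumptions for illustration; coherence traffic and the TACTIVE/TSTATUS flags are left out.

```c
#include <stddef.h>

/* Tags carried by each line of the transactional cache. */
typedef enum { EMPTY, NORMAL, XCOMMIT, XABORT } tx_tag;

/* Hypothetical cache line: just a tag, an address, and one data word. */
typedef struct {
    tx_tag tag;
    int    addr;
    int    value;
} tx_line;

/* Commit: old values (XCOMMIT) are dropped, new values (XABORT) become
   ordinary committed data. No bus traffic is needed, only retagging. */
void tx_cache_commit(tx_line *lines, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (lines[i].tag == XCOMMIT)     lines[i].tag = EMPTY;
        else if (lines[i].tag == XABORT) lines[i].tag = NORMAL;
    }
}

/* Abort: the mirror image. New values are dropped, old values survive. */
void tx_cache_abort(tx_line *lines, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (lines[i].tag == XABORT)       lines[i].tag = EMPTY;
        else if (lines[i].tag == XCOMMIT) lines[i].tag = NORMAL;
    }
}
```

Because both outcomes are just a retagging pass over the local cache, neither needs to write data back or notify other processors at the moment of commit or abort.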
Evaluation
Shared Counter Benchmark: Performance / Bus Usage
TTS = test-and-test-and-set, MCS = queuing lock, QOSB = queuing lock, LL/SC = load-locked/store-conditional
Why Isn’t Everyone Using This?
• Benchmarks: shared counter, linked list, producer/consumer!?
• How do you virtualize this? Cache overflow?
• What happens on a context switch?
• What happens on an interrupt?
• Is this really easier to program?
Conclusion
I hereby conclude.
Questions?