Extending Open64 with Transactional Memory features
Jiaqi Zhang
Tsinghua University
Contents
• Background
• Design
• Implementation
• Optimization
• Experiment
• Conclusion
Transactional Memory Background
• Trend to concurrent programming
• Current solution:
  • Lock
  • Flaws:
    • Association between locks and data
    • Deadlock
    • Not composable
Transactional Memory Background

    class Account{
        int balance;
        lock mylock;
        bool credit(int amount);
        bool debit(int amount);
    };

    bool credit(int amount){
        acquire(mylock);
        balance+=amount;
        release(mylock);
    }

    bool debit(int amount){
        acquire(mylock);
        balance-=amount;
        release(mylock);
    }

    transfer(Account a, Account b, int amount){ }

Candidate bodies for transfer():

• Plain calls: inconsistent state

    a.credit(amount);
    b.debit(amount);

• Explicit locks around the calls: poor abstraction of class Account, deadlock, exposed implementation details

    acquire(a.mylock);
    acquire(b.mylock);
    a.credit(amount);
    b.debit(amount);
    release(a.mylock);
    release(b.mylock);

• Atomic block

    atomic{
        a.credit(amount);
        b.debit(amount);
    }
Transactional Memory Background
• Current Implementations
  • TM libraries
    • DSTM
    • DracoSTM
    • TL2
    • TinySTM
    • ……..
  • Function calls (explicit transaction):
      TM_INIT()/TM_SHUTDOWN()
      TM_ATOMIC_BEGIN()/TM_ATOMIC_END()
      TM_SHARED_READ()/TM_SHARED_WRITE()
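To illustrate what "explicit transaction" means for the programmer, here is a minimal sketch in C. Only the macro names come from the slide. Their definitions below are hypothetical stand-ins (a single global spin lock and direct memory access, not a real STM), and `balance_a`, `balance_b`, and `transfer` are invented for the example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for a TM library's macros: one global spin lock,
 * no versioning or conflict detection. A real library does far more. */
static atomic_flag tm_global_lock = ATOMIC_FLAG_INIT;

#define TM_ATOMIC_BEGIN() \
    do { while (atomic_flag_test_and_set(&tm_global_lock)) { } } while (0)
#define TM_ATOMIC_END()          atomic_flag_clear(&tm_global_lock)
#define TM_SHARED_READ(var)      (var)
#define TM_SHARED_WRITE(var, v)  ((var) = (v))

static long balance_a = 100;
static long balance_b = 0;

/* With a TM library, every shared access is instrumented by hand. */
static void transfer(long amount)
{
    assert(amount >= 0);
    TM_ATOMIC_BEGIN();
    TM_SHARED_WRITE(balance_a, TM_SHARED_READ(balance_a) - amount);
    TM_SHARED_WRITE(balance_b, TM_SHARED_READ(balance_b) + amount);
    TM_ATOMIC_END();
}
```

Calling `transfer(30)` moves 30 from `balance_a` to `balance_b`. The point of the sketch is the amount of manual annotation, which is what compiler support aims to remove.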
Transactional Memory Background
• Current Implementations
  • Compilers
    • Intel C++ STM Compiler
    • Tanger
    • OpenTM
    • GCC
Design
• Programming Interfaces

    #pragma tm atomic [clause]
        structured block
      clauses: readonly, private(var list), shared(var list)

    #pragma tm abort

    #pragma tm function
        function declaration

    #pragma tm waiver
        function declaration
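The pragmas listed above might be used as in the following sketch. The pragma spellings and the readonly clause are taken from the slide; the variable `counter` and the functions `next_ticket`, `take_two`, and `peek` are invented for illustration, and the exact semantics of each clause are not specified in the slides. A compiler without this extension ignores the unknown pragmas and runs the blocks as ordinary code.

```c
#include <assert.h>

static int counter = 0;   /* would be shared between threads in a real program */

/* Marked so that the compiler can generate an instrumented clone
 * for calls made from inside a transaction. */
#pragma tm function
static int next_ticket(void)
{
    return ++counter;
}

/* Two calls that must appear to other threads as one indivisible step. */
static int take_two(void)
{
    int last = 0;
    #pragma tm atomic
    {
        next_ticket();
        last = next_ticket();
    }
    return last;
}

/* A transaction that only reads shared data can carry the readonly clause. */
static int peek(void)
{
    int seen = 0;
    #pragma tm atomic readonly
    {
        seen = counter;
    }
    assert(seen >= 0);
    return seen;
}
```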
Design • TM runtime interfaces (TL2)
Design
• Wrapper functions
  • To ease the process of integrating new TM libraries
  • Called by programmers:
      tm_init()/tm_finalize()
      tm_thread_start()/tm_thread_end()
  • Inserted by compiler:
      __tm_atomic_begin()/__tm_atomic_end()
      __tm_shared_read()/__tm_shared_read_float()
      __tm_shared_write()/__tm_shared_write_float()
      __tm_local_write()/__tm_local_write_float()
  • More wrapper functions are needed for other data types and additional TM semantics
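One way such a wrapper layer could keep the TM library replaceable is sketched below. The wrapper names come from the slide, but their signatures, the `tm_backend` structure, and the `direct_*` backend are assumptions made for this example. The backend here touches memory directly and performs no conflict detection.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical backend interface: one entry per operation the wrappers need.
 * Integrating a different TM library would mean supplying another table. */
typedef struct {
    intptr_t (*read_word)(intptr_t *addr);
    void     (*write_word)(intptr_t *addr, intptr_t value);
} tm_backend;

/* Trivial backend: plain loads and stores, no transactional bookkeeping. */
static intptr_t direct_read(intptr_t *addr) { return *addr; }
static void direct_write(intptr_t *addr, intptr_t value) { *addr = value; }

static const tm_backend direct_backend = { direct_read, direct_write };
static const tm_backend *active_backend = &direct_backend;

/* Wrappers the compiler would call; signatures are assumed. */
static intptr_t __tm_shared_read(intptr_t *addr)
{
    assert(addr != 0);
    return active_backend->read_word(addr);
}

static void __tm_shared_write(intptr_t *addr, intptr_t value)
{
    assert(addr != 0);
    active_backend->write_word(addr, value);
}

/* Floats get a separate wrapper so the value is not converted
 * through an integer type on the way. */
static float __tm_shared_read_float(float *addr)
{
    assert(addr != 0);
    return *addr;   /* a real backend would route this through the TM library */
}
```

The compiler-generated code depends only on the wrapper names, so switching libraries would not require changes to the transformation itself.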
Design • Optimization • Eliminate redundant calls to runtime libraries
Implementation • General Transformation
Implementation
• General Transformation
  • #pragma tm atomic

        setjmp();
        __tm_atomic_begin();

  • simple statements

        a = b+c;

    becomes

        PARM #address of c
        CALL <__tm_shared_read>
        LDID <return_offset>
        STID #tm_preg_num_0
        PARM #address of b
        CALL <__tm_shared_read>
        LDID <return_offset>
        STID #tm_preg_num_1
        LDID #tm_preg_num_0
        LDID #tm_preg_num_1
        ADD
        PARM
        PARM #address of a
        CALL <__tm_shared_write>

  • control flow statements
    • IF
    • WHILE_DO

        for(;i<10;i++){ }

    becomes

        PARM #address of i
        CALL <__tm_shared_read>
        LDID <return_offset>
        STID #tm_preg_num_0
        WHILE_DO
          LDID #tm_preg_num_0
          INTCONST 9
          LE
        BODY
          BLOCK
            …………….
            PARM #address of i
            CALL <__tm_shared_read>
            LDID <return_offset>
            STID #tm_preg_num_0
          END_BLOCK
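The IR above can be read back as C roughly as follows. This is a source-level paraphrase for illustration, not output of the compiler: the `__tm_*` functions are stubs with assumed signatures that access memory directly, and `tm_restart_point`, `t0`, `t1`, and `ti` are invented names standing in for the jump buffer and the pseudo-registers.

```c
#include <assert.h>
#include <setjmp.h>

static int a, b = 2, c = 3, i;
static jmp_buf tm_restart_point;   /* where an aborted transaction would restart */

/* Stubs with assumed signatures; a real runtime would log and validate. */
static void __tm_atomic_begin(void) {}
static void __tm_atomic_end(void) {}
static int  __tm_shared_read(int *addr) { return *addr; }
static void __tm_shared_write(int *addr, int value) { *addr = value; }

/* Source form:
 *   #pragma tm atomic
 *   { a = b + c; for (; i < 10; i++) { } }
 */
static void transformed_region(void)
{
    setjmp(tm_restart_point);            /* restart point, then begin */
    __tm_atomic_begin();

    int t0 = __tm_shared_read(&c);       /* tm_preg_num_0 */
    int t1 = __tm_shared_read(&b);       /* tm_preg_num_1 */
    __tm_shared_write(&a, t0 + t1);

    int ti = __tm_shared_read(&i);       /* loop condition read before the loop */
    while (ti <= 9) {
        __tm_shared_write(&i, __tm_shared_read(&i) + 1);
        ti = __tm_shared_read(&i);       /* re-read at the end of the body */
    }

    __tm_atomic_end();
    assert(ti == 10);
}
```

Note how the loop condition is evaluated on a pseudo-register that is refreshed by a read barrier at the end of each iteration, mirroring the WHILE_DO shape in the IR.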
Implementation
• Functions
  • clone and instrument

        #pragma tm function
        void calculate(){}

    produces

        void calculate()
        __tm_cloned__calculate() //instrumented

  • calls inside a transaction are redirected to the clone

        #pragma tm atomic
        {
            calculate();
        }

    becomes

        #pragma tm atomic
        {
            __tm_cloned__calculate();
        }
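The clone-and-instrument scheme can be pictured in C as below. The function names `calculate` and `__tm_cloned__calculate` come from the slide; the function body, the variables `total` and `barrier_calls`, the helper `call_both`, and the barrier stubs are assumptions added so that the difference between the two versions is observable.

```c
#include <assert.h>

static int total;
static int barrier_calls;   /* counts barrier invocations, for illustration only */

/* Stub barriers with assumed signatures. */
static int __tm_shared_read(int *addr) { barrier_calls++; return *addr; }
static void __tm_shared_write(int *addr, int v) { barrier_calls++; *addr = v; }

/* Original version: used by callers outside any transaction, no barriers. */
static void calculate(void)
{
    total = total + 1;
}

/* Clone generated for "#pragma tm function": same logic, with every
 * access to shared data routed through the barriers. */
static void __tm_cloned__calculate(void)
{
    __tm_shared_write(&total, __tm_shared_read(&total) + 1);
}

/* Both versions must have the same effect on program state. */
static void call_both(void)
{
    int before = total;
    calculate();
    __tm_cloned__calculate();
    assert(total == before + 2);
}
```

Keeping the uninstrumented original means non-transactional callers pay no barrier cost.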
Implementation
• Optimization
  • Transaction local variables: detected by the frontend
Implementation
• Optimization
  • Barrier free variables: detected according to their storage class
Implementation • Optimization
Implementation
• Optimization
  • Optimization opportunities detection strategy
    • Pthread parallel task
      • transaction local: declared in tm atomic scope
      • barrier free: auto variables
    • Cloned transactional function
      • transaction local: declared in the function
    • OpenMP parallel task
      • transaction local: declared in tm atomic scope
      • barrier free: declared in micro task, marked in openmp private clause
    • Checking readonly transactions
  • Limitation
    • Reserved design for pointers
    • Needs programmers to participate in optimization
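The detection rule for Pthread parallel tasks listed above can be written as a small classification function. This is only a sketch of the decision, not the compiler's code: the enum names and `classify_pthread` are hypothetical, and treating every remaining variable as needing a full barrier is an assumption.

```c
#include <assert.h>

/* Where a variable accessed inside a transaction was declared. */
typedef enum {
    VAR_GLOBAL,             /* static or global storage */
    VAR_AUTO_OUTSIDE_TXN,   /* auto variable declared outside the atomic block */
    VAR_DECLARED_IN_TXN     /* declared inside the tm atomic scope */
} var_origin;

/* How an access to that variable would be treated. */
typedef enum {
    ACCESS_SHARED_BARRIER,  /* full read/write barrier */
    ACCESS_BARRIER_FREE,    /* no barrier needed */
    ACCESS_TXN_LOCAL        /* transaction local variable */
} access_kind;

/* Rule for Pthread parallel tasks as listed on the slide. */
static access_kind classify_pthread(var_origin origin)
{
    assert(origin <= VAR_DECLARED_IN_TXN);
    switch (origin) {
    case VAR_DECLARED_IN_TXN:
        return ACCESS_TXN_LOCAL;
    case VAR_AUTO_OUTSIDE_TXN:
        return ACCESS_BARRIER_FREE;
    default:
        return ACCESS_SHARED_BARRIER;   /* assumption: everything else is shared */
    }
}
```

The rules for cloned functions and OpenMP tasks would follow the same shape with different inputs, such as membership in an OpenMP private clause.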
Preliminary Experiments • Compare with fine-grained lock based application
Preliminary Experiments • Compare with manually instrumented application
Preliminary Experiments

    #pragma tm atomic private(feature)
    {
        int j;
        (*new_centers_len[index])++;   /* increment the count, not the pointer */
        for(j=0;j<nfeatures;j++){
            new_centers[index][j]+=feature[i][j];
        }
    }
Conclusion & Future work
• An infrastructure for TM on Open64
• Replaceable TM implementation
• Optimization
• More experiments on non-trivial applications are desired
• Nested transaction
  (FastDB: 8 out of 75 critical regions contain nested transactions)
• Signal processing
  (FastDB: 28 out of 75 critical regions contain signal processing;
   PARSEC: 20 out of 55 critical regions contain signal processing)
• Event handler
• Indirect calls
• Dealing with legacy code
• …