Read-Write Lock Allocation in Software Transactional Memory

Amir Ghanbari Bavarsad and Ehsan Atoofian, Lakehead University

Transactional Memory
• Software transactional memory (STM) exploits a global clock to validate transactional data
  • Pros: reduces validation overhead
  • Cons: contention
• Alternative: Read Write Lock Allocation (RWLA)
  • Pros: no central clock
  • Cons: overhead if a TX aborts
• Speculative RWLA: changes validation policy dynamically → Speedup: up to 66%
[Figure: processors P1 … Pn, each with a cache ($), all accessing the Global Clock]

Outline • Background • RWLA • Speculative RWLA • Conclusion

Counter in STM
T1:
    TM_BEGIN();
    local_counter = TM_READ(counter);
    local_counter++;
    TM_WRITE(counter, local_counter);
    TM_END();

Validation in STM
• Transactional data are validated using:
  • Global clock
    • Shared variable
    • Timestamp for transactions
  • Lock
    • Memory is mapped to Lock Table
    • Each entry of the table: Version #
[Figure: Memory mapped to a Lock Table of Version # entries, alongside the Global Clock]

Updating Global Clock & Lock
• Increment Global Clock
• Version # = global_clock
[Figure: the Lock Table entry for counter in Memory receives its Version # from the Global Clock]
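The two update steps on this slide can be sketched in C11 as follows. This is a minimal illustration under stated assumptions, not the TL2 or GV4 source: the table size, the address hash `lock_index`, and the function name `publish_write` are hypothetical, and the commit-time locking that a real STM performs is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_TABLE_SIZE 1024u  /* hypothetical size, not given in the slides */

static atomic_uint_fast64_t global_clock;                 /* shared by all threads */
static atomic_uint_fast64_t lock_table[LOCK_TABLE_SIZE];  /* one version # per entry */

/* Hypothetical hash that maps a memory address to a lock-table entry. */
static size_t lock_index(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) % LOCK_TABLE_SIZE);
}

/* Step 1: increment the global clock.
 * Step 2: store the new clock value as the entry's version #. */
static uint64_t publish_write(const void *addr) {
    assert(addr != NULL);
    uint64_t wv = atomic_fetch_add(&global_clock, 1) + 1;
    atomic_store(&lock_table[lock_index(addr)], wv);
    return wv;
}
```

Every committing writer performs the `atomic_fetch_add` on the same variable, which is why a frequently committing workload makes the clock a point of contention.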

Validation in STM
• rv (read version) is set to global_clock
[Figure: transaction T1 (the counter increment shown earlier), with rv stored in the metadata for TX1 next to the Global Clock]

Successful Read Validation
• rv >= version#
• The most recent write to counter occurred before TM_BEGIN()
[Figure: transaction T1 (the counter increment shown earlier), with rv in the metadata for TX1 and the Global Clock]

Failed Read Validation
• rv < version#
• The most recent write to counter occurred after TM_BEGIN()
[Figure: transaction T1 (the counter increment shown earlier), with rv in the metadata for TX1 and the Global Clock]
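The read-validation rule from the last three slides can be written as a short C11 sketch. The struct `tx_meta` and the functions `tx_begin` and `read_is_valid` are hypothetical names chosen for illustration; only the comparison between rv and the version # comes from the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static atomic_uint_fast64_t global_clock;  /* shared by all threads */

/* Per-transaction metadata; the slides show only the rv field. */
typedef struct {
    uint64_t rv;  /* read version, sampled at TM_BEGIN() */
} tx_meta;

/* TM_BEGIN(): rv is set to the current value of the global clock. */
static void tx_begin(tx_meta *tx) {
    assert(tx != NULL);
    tx->rv = atomic_load(&global_clock);
}

/* A read is valid when rv >= version #, i.e. the most recent write
 * to the location was published no later than TM_BEGIN(). */
static bool read_is_valid(const tx_meta *tx, uint64_t version) {
    return tx->rv >= version;
}
```

A transaction whose `read_is_valid` check returns false would abort and restart, sampling a fresh rv.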

Overhead of Validation
• This method, called GV4, results in many cache coherence misses if transactions commit frequently
[Figure: processors P1 … Pn, each with a cache ($), all accessing the Global Clock]

Outline • Background • RWLA • Speculative RWLA • Conclusion

Read Write Lock Allocation (RWLA)
• Lock
  • Memory is mapped to Lock Table
  • Each entry of the table:
    • Lock bit
    • Read bits
[Figure: Memory mapped to a Lock Table; each entry holds a lock bit followed by read bits Pn-1 … P1 P0]
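One way to picture a lock-table entry is as a single machine word. The sketch below assumes a 64-bit word with the lock bit in the top position and one read bit per thread below it; the width, the 63-thread limit, and the helper names are assumptions made for illustration and are not stated in the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 63u                 /* assumed: one read bit per thread */
#define LOCK_BIT (UINT64_C(1) << 63)    /* assumed: top bit is the lock bit */

/* Mask of the read bit that belongs to thread tid (P0 is bit 0). */
static uint64_t read_bit(unsigned tid) {
    assert(tid < MAX_THREADS);
    return UINT64_C(1) << tid;
}

/* True if a writer currently holds the entry. */
static bool is_locked(uint64_t entry) {
    return (entry & LOCK_BIT) != 0;
}

/* True if at least one thread has its read bit set. */
static bool has_readers(uint64_t entry) {
    return (entry & ~LOCK_BIT) != 0;
}
```

Packing both fields into one word lets a single atomic operation inspect the lock bit and the read bits together.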

TM_READ
• TM_READ(): Lock bit is free?
  • Yes: set read bit in the corresponding lock entry
  • No: abort
[Figure: transaction T1 (the counter increment shown earlier) beside the lock entry (lock bit, then read bits); entry contents across the animation frames: 0 0 0 ….. 0 0 0, then 1 0 0 0 ….. 0 0 0, then 0 0 0 ….. 0 0 1]
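The TM_READ decision can be sketched with a compare-and-swap loop in C11. The entry layout (lock bit in bit 63, read bits below it) and the name `rwla_read` are assumptions for illustration; the slides specify only the check on the lock bit and the setting of the read bit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 63u                 /* assumed: one read bit per thread */
#define LOCK_BIT (UINT64_C(1) << 63)    /* assumed: top bit is the lock bit */

/* Returns true if the read may proceed, false if the transaction must abort. */
static bool rwla_read(_Atomic uint64_t *entry, unsigned tid) {
    assert(tid < MAX_THREADS);
    uint64_t old = atomic_load(entry);
    do {
        if (old & LOCK_BIT) {
            return false;  /* lock bit is not free: abort */
        }
        /* Lock bit is free: try to set this thread's read bit.
         * On failure, old is refreshed and the lock bit is rechecked. */
    } while (!atomic_compare_exchange_weak(entry, &old,
                                           old | (UINT64_C(1) << tid)));
    return true;
}
```

Because each thread only touches the entries it actually reads, no shared counter is updated on this path.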

TM_WRITE
• TM_WRITE: All read bits are clear?
  • No: abort
  • Yes: acquire lock; if that fails, abort
[Figure: transaction T1 (the counter increment shown earlier) beside the lock entry (lock bit, then read bits); entry contents across the animation frames: 0 0 1 ….. 0 0 0, then 0 0 0 ….. 0 0 1, then 1 0 0 0 ….. 0 0 0]
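The TM_WRITE decision can be sketched in the same style. Two points are assumptions rather than facts from the slides: the writer's own read bit is ignored when checking for readers (otherwise the counter example, which reads before it writes, could never acquire the lock), and the lock is attempted once without retrying. The layout and the name `rwla_write` are likewise hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_THREADS 63u                 /* assumed: one read bit per thread */
#define LOCK_BIT (UINT64_C(1) << 63)    /* assumed: top bit is the lock bit */

/* Returns true if the lock was acquired, false if the transaction must abort. */
static bool rwla_write(_Atomic uint64_t *entry, unsigned tid) {
    assert(tid < MAX_THREADS);
    uint64_t self = UINT64_C(1) << tid;
    uint64_t old = atomic_load(entry);

    /* All read bits clear? (Assumption: the caller's own bit does not count.) */
    if ((old & ~LOCK_BIT & ~self) != 0) {
        return false;  /* another thread is reading: abort */
    }
    if (old & LOCK_BIT) {
        return false;  /* lock already held: acquisition fails, abort */
    }
    /* Single attempt; losing a race with another thread also means abort. */
    return atomic_compare_exchange_strong(entry, &old, old | LOCK_BIT);
}
```

The compare-and-swap fails if a reader sets its bit between the check and the acquisition, so a successful return implies no other reader was registered at that moment.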

Experimental Framework
• Benchmarks: STAMP v0.9.7
• Run to completion
• Measured statistics over 10 runs
• TL2 as an STM framework
• Two Intel Xeon E5660, 6-way CMP

Performance of RWLA
[Figure: performance results for RWLA; an arrow labeled "better" marks the favorable direction]

Speculative RWLA
• Conflict occurs frequently → select GV4
• Conflict occurs rarely → select RWLA
• How to predict conflict?

Contention Predictor
• xi: global transaction history, bipolar value
• wi: weight vector
• Prediction:
  • y ≥ 0 → predict commit
  • y < 0 → predict abort
• Update:
  • If the outcomes of the current TX and TXi agree/disagree → increment/decrement wi
[Figure: perceptron with inputs 1, x1 … xn, weights w0, w1 … wn, and output y]
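The predictor on this slide has the shape of a perceptron, so a plausible reading is y = w0 + w1·x1 + … + wn·xn. The C sketch below follows that reading. The history length, the initial state, the treatment of the bias weight w0 (its constant input 1 is treated as agreeing with a commit), training on every outcome, and the mapping from prediction to policy are all assumptions made for illustration; weight saturation is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HISTORY_LEN 8  /* hypothetical n; the slides do not give a value */

typedef enum { POLICY_RWLA, POLICY_GV4 } policy;

typedef struct {
    int w[HISTORY_LEN + 1];  /* w[0] is the weight of the constant input 1 */
    int x[HISTORY_LEN];      /* bipolar history: +1 commit, -1 abort; x[0] newest */
} predictor;

static void predictor_init(predictor *p) {
    assert(p != NULL);
    p->w[0] = 0;
    for (int i = 0; i < HISTORY_LEN; i++) {
        p->w[i + 1] = 0;
        p->x[i] = 1;  /* assumed: start from an all-commit history */
    }
}

/* y = w0 + sum of wi * xi */
static int predictor_output(const predictor *p) {
    int y = p->w[0];
    for (int i = 0; i < HISTORY_LEN; i++) {
        y += p->w[i + 1] * p->x[i];
    }
    return y;
}

/* y >= 0 predicts commit, y < 0 predicts abort. */
static bool predict_commit(const predictor *p) {
    return predictor_output(p) >= 0;
}

/* Assumed mapping: predicted abort (frequent conflict) selects GV4. */
static policy select_policy(const predictor *p) {
    return predict_commit(p) ? POLICY_RWLA : POLICY_GV4;
}

/* Increment wi when the current outcome agrees with TXi, decrement otherwise,
 * then shift the current outcome into the history. */
static void predictor_update(predictor *p, bool committed) {
    int t = committed ? 1 : -1;
    p->w[0] += t;
    for (int i = 0; i < HISTORY_LEN; i++) {
        p->w[i + 1] += (p->x[i] == t) ? 1 : -1;
    }
    for (int i = HISTORY_LEN - 1; i > 0; i--) {
        p->x[i] = p->x[i - 1];
    }
    p->x[0] = t;
}
```

A transaction would call `select_policy` at TM_BEGIN() and `predictor_update` once its outcome is known, so the validation policy can follow the recent conflict pattern.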

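Performance of Speculative RWLA
• Number of threads varies from 2 to 16
• On average, the performance improvement ranges from 21% in Bayes to 47% in Labyrinth
[Figure: performance results for Speculative RWLA; an arrow labeled "better" marks the favorable direction]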

Conclusion
• RWLA overcomes contention over the global clock
• Applications react differently to GV4 and RWLA
• Speculative RWLA changes validation policy dynamically
• Speculative RWLA improves the performance of STMs by up to 66%

Thank You! Questions?
