LogTM: Log-Based Transactional Memory. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood Presented by Colleen Lewis. Credits. Animations from the original LogTM HPCA presentation Original graphs modified for readability. Big Picture.
LogTM: Log-Based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood Presented by Colleen Lewis
Credits • Animations from the original LogTM HPCA presentation • Original graphs modified for readability
Big Picture • Hardware transaction motivation • Per thread log • Optimize commits (Hardware)
Design Decisions • Version Management • Eager – write in place • Lazy – write on commit • Conflict Detection • Eager – detect at read/write time • Lazy – detect at commit time
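The version-management choice above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (a single thread, memory modeled as a dictionary); the class names `EagerMemory` and `LazyMemory` are hypothetical and do not come from the slides.

```python
class EagerMemory:
    """Eager version management: write in place, remember old values."""

    def __init__(self, data):
        self.mem = dict(data)
        self.undo = []                      # (address, old value), oldest first

    def store(self, addr, value):
        self.undo.append((addr, self.mem[addr]))
        self.mem[addr] = value              # new value lives in memory at once

    def commit(self):
        self.undo.clear()                   # fast: nothing to copy

    def abort(self):
        while self.undo:                    # slow: restore newest entry first
            addr, old = self.undo.pop()
            self.mem[addr] = old


class LazyMemory:
    """Lazy version management: buffer writes, apply them on commit."""

    def __init__(self, data):
        self.mem = dict(data)
        self.buffer = {}                    # address -> speculative value

    def store(self, addr, value):
        self.buffer[addr] = value           # memory still holds the old value

    def load(self, addr):
        return self.buffer.get(addr, self.mem[addr])

    def commit(self):
        self.mem.update(self.buffer)        # slow: copy every buffered write
        self.buffer.clear()

    def abort(self):
        self.buffer.clear()                 # fast: just drop the buffer
```

The eager variant makes commit cheap and abort expensive; the lazy variant reverses that trade-off.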
Transaction Logs • Pointer to the beginning of the log • Pointer to the end of the log • Read and Write bits for each cache line
Transaction Log Example • Initial State • LogBase = LogPointer • TM count > 0
Cache blocks (VA: Data Block, R, W) • 00: 12--------------, 0, 0 • 40: --------------23, 0, 0 • C0: 34--------------, 0, 0
Log memory at 1000, 1040, 1080 (empty) • Log Base: 1000 • Log Ptr: 1000 • TM count: 0 → 1
HPCA-12
Transaction Log Example • Store r2, (c0) /* r2 = 56 */ • Set W bit for block (c0) • Store address (c0) and old data on the log • Increment Log Ptr to 1048 • Update memory
Cache blocks (VA: Data Block, R, W) • 00: 12--------------, 0, 0 • 40: --------------23, 0, 0 • C0: 34-------------- → 56--------------, 0, 0 → 1
Log entry at 1000: c0, 34------------ • Log Base: 1000 • Log Ptr: 1000 → 1048 • TM count: 1
Transaction Log Example • Commit transaction • Clear R & W for all blocks • Reset Log Ptr to Log Base (1000) • Clear TM count
Cache blocks (VA: Data Block, R, W) • 00: 12--------------, 0, 0 • 40: --------------23, 0, 0 • C0: 56--------------, 0, 1 → 0
Log entry at 1000 (now stale): c0, 34------------ • Log Base: 1000 • Log Ptr: 1048 → 1000 • TM count: 1 → 0
Transaction Log Example • Abort transaction • Replay log entries to “undo” the transaction • Reset Log Ptr to Log Base (1000) • Clear R & W bits for all blocks • Clear TM count
Cache blocks (VA: Data Block, R, W) • 00: 12--------------, 0, 0 • 40: --------------23, 0, 0 • C0: 56-------------- → 34--------------, 0, 1 → 0
Log entry at 1000 (replayed): c0, 34------------ • Log Base: 1000 • Log Ptr: 1048 → 1000 • TM count: 1 → 0
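The store, commit, and abort steps of the log example can be replayed with a small Python model. It is a sketch, not the hardware design: the class name `LogTMThread` is hypothetical, addresses are read as hexadecimal, and the entry size of 0x48 bytes (an 8-byte address plus a 64-byte block) is an assumption chosen because it matches the Log Ptr moving from 1000 to 1048. Logging a block only on its first write is also an assumption of this sketch.

```python
ADDR_SIZE = 8                    # bytes per logged address (assumed)
BLOCK_SIZE = 64                  # bytes per cache block (assumed)
ENTRY_SIZE = ADDR_SIZE + BLOCK_SIZE   # 0x48, so 0x1000 -> 0x1048


class LogTMThread:
    def __init__(self, memory, log_base=0x1000):
        self.memory = memory     # block address -> value, updated in place
        self.log_base = log_base
        self.log_ptr = log_base
        self.log = []            # (block address, old value), oldest first
        self.r_bits = set()      # blocks read in the transaction
        self.w_bits = set()      # blocks written in the transaction
        self.tm_count = 0

    def begin(self):
        self.tm_count += 1

    def load(self, addr):
        self.r_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.w_bits:          # first write: log the old data
            self.w_bits.add(addr)
            self.log.append((addr, self.memory[addr]))
            self.log_ptr += ENTRY_SIZE
        self.memory[addr] = value            # eager: update memory in place

    def _reset(self):
        # The slides only reset the pointer; this model also drops the list.
        self.log.clear()
        self.log_ptr = self.log_base
        self.r_bits.clear()
        self.w_bits.clear()
        self.tm_count = 0

    def commit(self):
        self._reset()                        # no data movement on commit

    def abort(self):
        for addr, old in reversed(self.log): # undo newest entry first
            self.memory[addr] = old
        self._reset()


memory = {0x00: 12, 0x40: 23, 0xC0: 34}
t = LogTMThread(memory)
t.begin()
t.store(0xC0, 56)        # memory[0xC0] is now 56, old value 34 is logged
t.abort()                # memory[0xC0] is restored to 34
```

Calling `commit()` instead of `abort()` would leave 56 in place and reset the same pointers and bits, which is why commit is the cheap path.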
Conflict Detection • Checked at every read/write • Directory forwards read requests • Directory can have “sticky” data • Individual nodes responsible for detecting conflicts • Needs • Transaction mode bit • Overflow bit
Conflict Detection (example) • P0 store • P0 sends get exclusive (GETX) request • Directory responds with data (old) • P0 executes store
Messages: GETX (P0 to Directory), DATA (Directory to P0)
Directory: I [old] → M@P0 [old]
P0: TM mode 1, Overflow 0, block I (--) [none] → M (--) [old] → M (-W) [new]
P1: TM mode 0, Overflow 0, block I (--) [none]
Conflict Detection (example) • In-cache transaction conflict • P1 sends get shared (GETS) request • Directory forwards to P0 • P0 detects conflict and sends NACK
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: M@P0 [old]
P0: TM mode 1, Overflow 0, block M (-W) [new]
P1: TM mode 0, Overflow 0, block I (--) [none]
Conflict Detection (example) • Cache overflow • P0 sends put exclusive (PUTX) request • Directory acknowledges • P0 sets overflow bit • P0 writes data back to memory
Messages: PUTX (P0 to Directory), ACK (Directory to P0), DATA (P0 to Directory)
Directory: M@P0 [old] → Msticky@P0 [new]
P0: TM mode 1, Overflow 0 → 1, block M (-W) [new] → I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
Conflict Detection (example) • Out-of-cache conflict • P1 sends GETS request • Directory forwards to P0 • P0 detects a (possible) conflict • P0 sends NACK
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: Msticky@P0 [new]
P0: TM mode 1, Overflow 1, block I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
Conflict Detection (example) • Commit • P0 clears TM mode and Overflow bits
Directory: Msticky@P0 [new] (unchanged)
P0: TM mode 1 → 0, Overflow 1 → 0, block I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
Conflict Detection (example) • Lazy cleanup • P1 sends GETS request • Directory forwards request to P0 • P0 detects no conflict, sends CLEAN • Directory sends Data to P1
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), CLEAN (P0 to Directory), DATA (Directory to P1)
Directory: Msticky@P0 [new] → S(P1) [new]
P0: TM mode 0, Overflow 0, block I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none] → S (--) [new]
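The message sequence in the conflict-detection slides (store, in-cache conflict, overflow, out-of-cache conflict, commit, lazy cleanup) can be traced with the following Python sketch. It models only what the slides show: one block, a directory with an `Msticky` state, and per-processor TM mode and Overflow bits. All class and method names are hypothetical, and the coherence protocol is heavily simplified (for example, ownership downgrades are not modeled).

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.tm_mode = False
        self.overflow = False
        self.cache = {}                  # addr -> {"R": bool, "W": bool}

    def begin(self):
        self.tm_mode = True

    def tx_store(self, directory, addr):
        directory.getx(addr, self)       # GETX, then set the W bit
        self.cache[addr] = {"R": False, "W": True}

    def evict(self, directory, addr):
        line = self.cache.pop(addr)
        transactional = self.tm_mode and (line["R"] or line["W"])
        if transactional:
            self.overflow = True         # R/W bits for this block are lost
        directory.putx(addr, self, sticky=transactional)

    def commit(self):
        self.tm_mode = False
        self.overflow = False
        for line in self.cache.values():
            line["R"] = line["W"] = False

    def on_forwarded_gets(self, addr):
        line = self.cache.get(addr)
        if line is not None:             # in-cache check uses the W bit
            return "NACK" if line["W"] else "DATA"
        if self.tm_mode and self.overflow:
            return "NACK"                # possible conflict, be conservative
        return "CLEAN"                   # stale sticky state, safe to clean


class Directory:
    def __init__(self):
        self.entries = {}                # addr -> (state, processor)

    def getx(self, addr, proc):
        self.entries[addr] = ("M", proc)

    def putx(self, addr, proc, sticky):
        self.entries[addr] = ("Msticky", proc) if sticky else ("I", None)

    def gets(self, addr, requester):
        state, owner = self.entries.get(addr, ("I", None))
        if state in ("M", "Msticky") and owner is not requester:
            if owner.on_forwarded_gets(addr) == "NACK":
                return "NACK"            # requester must retry later
        self.entries[addr] = ("S", requester)
        return "DATA"


directory, p0, p1 = Directory(), Processor("P0"), Processor("P1")
p0.begin()
p0.tx_store(directory, 0xC0)
in_cache = directory.gets(0xC0, p1)      # NACK: W bit set in P0's cache
p0.evict(directory, 0xC0)                # overflow, directory goes sticky
out_of_cache = directory.gets(0xC0, p1)  # NACK: Overflow bit set
p0.commit()
after_commit = directory.gets(0xC0, p1)  # DATA: P0 answers CLEAN
```

The sticky state keeps forwarding requests to P0 after the eviction, so P0 can still detect conflicts for blocks it no longer caches.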
False Positives? • What if P0 has started a new transaction without cleaning the sticky data?
False Positive Example • Cache overflow • P0 sends put exclusive (PUTX) request • Directory acknowledges • P0 sets overflow bit • P0 writes data back to memory
Messages: PUTX (P0 to Directory), ACK (Directory to P0), DATA (P0 to Directory)
Directory: M@P0 [old] → Msticky@P0 [new]
P0: TM mode 1, Overflow 0 → 1, block M (-W) [new] → I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
False Positive Example • Commit • P0 clears TM mode and Overflow bits • Start New Transaction • P0 sets TM mode • Eventually overflows • P0 sets Overflow bit
Directory: Msticky@P0 [new] (unchanged, never cleaned)
P0: TM mode 1 → 0 → 1, Overflow 1 → 0 → 1, block I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
False Positive Example • Out-of-cache conflict • P1 sends GETS request • Directory forwards to P0 • P0 detects a (possible) conflict • P0 sends NACK • The block belongs to the earlier, already committed transaction, so this NACK is a false positive
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: Msticky@P0 [new]
P0: TM mode 1, Overflow 1, block I (--) [none]
P1: TM mode 0, Overflow 0, block I (--) [none]
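The false positive arises because the out-of-cache check looks only at the TM mode and Overflow bits, not at which blocks the current transaction actually evicted. The sketch below makes that gap explicit by tracking a ground-truth set that the hardware does not keep. The names `Proc`, `hardware_says_conflict`, and `really_conflicts` are hypothetical, and addresses 0xC0 and 0x40 are arbitrary.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    tm_mode: bool = False
    overflow: bool = False
    # Ground truth for illustration only; the hardware does not store this.
    evicted_this_tx: set = field(default_factory=set)

    def begin(self):
        self.tm_mode = True

    def overflow_evict(self, addr):
        self.overflow = True
        self.evicted_this_tx.add(addr)

    def commit(self):
        self.tm_mode = False
        self.overflow = False
        self.evicted_this_tx.clear()

    def hardware_says_conflict(self, addr):
        # All the hardware can check for a block that is not in the cache.
        return self.tm_mode and self.overflow

    def really_conflicts(self, addr):
        return addr in self.evicted_this_tx


p0 = Proc()
p0.begin()
p0.overflow_evict(0xC0)      # directory entry for 0xC0 becomes sticky
p0.commit()                  # sticky entry for 0xC0 is not cleaned up

p0.begin()                   # new transaction
p0.overflow_evict(0x40)      # a different block overflows

# P1's request for 0xC0 is still forwarded to P0 via the sticky entry.
nack_sent = p0.hardware_says_conflict(0xC0)
true_conflict = p0.really_conflicts(0xC0)
false_positive = nack_sent and not true_conflict
```

The cost is an unnecessary NACK and retry, never a missed conflict, which is why the scheme stays correct while remaining conservative.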
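Conflict Resolution and Deadlock Avoidance • Options • Wait – risk deadlock? • Abort – risk livelock? • Current Behavior • Wait (stall and retry the request) • Abort only when waiting could deadlock: a transaction that has NACKed a logically older transaction aborts if it is itself NACKed by a logically older one • Future Behavior? • Software contention manager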
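A minimal sketch of the wait-then-abort policy follows. It assumes the timestamp rule as I understand it from the LogTM paper: a transaction sets a `possible_cycle` flag when it NACKs a logically older transaction, and it aborts if it later receives a NACK from a logically older transaction while that flag is set. The names `Tx` and `deliver_nack` are hypothetical, and a smaller timestamp means an older transaction.

```python
class Tx:
    def __init__(self, name, timestamp):
        self.name = name
        self.timestamp = timestamp       # smaller value = logically older
        self.possible_cycle = False

    def is_older_than(self, other):
        return self.timestamp < other.timestamp


def deliver_nack(sender, requester):
    """`sender` refuses `requester`'s request; return the requester's action."""
    if requester.is_older_than(sender):
        # The sender is making an older transaction wait: a cycle is possible.
        sender.possible_cycle = True
    if sender.is_older_than(requester) and requester.possible_cycle:
        return "abort"                   # break the potential deadlock
    return "stall"                       # otherwise wait and retry


old = Tx("T_old", timestamp=1)           # holds block B
young = Tx("T_young", timestamp=2)       # holds block A

# T_old asks for A; T_young NACKs it and notes the possible cycle.
first = deliver_nack(sender=young, requester=old)

# T_young asks for B; T_old NACKs it, which would close the cycle.
second = deliver_nack(sender=old, requester=young)
```

Under this rule only the younger transaction aborts, so the oldest transaction always makes progress and stalling cannot deadlock.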
Evaluation • 32 SPARC processors • Solaris 9 OS • SIMICS – full system simulator • Magic no-ops • Tests • Micro-benchmarks • SPLASH suite
Microbenchmarks • High Contention / Short Transactions • Comparing: • EXP – TTS (test-and-test-and-set) locks with exponential backoff • MCS – software queue-based locks
Shared-counter kernel:
    BEGIN_TRANSACTION();
    new_total = total.count + 1;
    private_data[id].count++;
    total.count = new_total;
    COMMIT_TRANSACTION();
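For readers unfamiliar with the EXP baseline, the sketch below shows a test-and-test-and-set lock with exponential backoff guarding the same shared-counter kernel. It is an illustration only: Python has no hardware test-and-set, so a non-blocking `threading.Lock.acquire` stands in for it, and the delay bounds and the names `TTSLock` and `run_microbenchmark` are assumptions, not the benchmark's actual code.

```python
import random
import threading
import time


class TTSLock:
    def __init__(self, min_delay=1e-6, max_delay=1e-3):
        self._flag = threading.Lock()    # stands in for the lock word
        self.min_delay = min_delay
        self.max_delay = max_delay

    def acquire(self):
        delay = self.min_delay
        while True:
            while self._flag.locked():   # "test": spin on a plain read
                time.sleep(0)            # yield so the holder can run
            if self._flag.acquire(blocking=False):   # "test-and-set"
                return
            time.sleep(random.uniform(0, delay))     # lost the race: back off
            delay = min(2 * delay, self.max_delay)   # exponential growth

    def release(self):
        self._flag.release()


def run_microbenchmark(num_threads=4, iterations=200):
    lock = TTSLock()
    total = {"count": 0}
    private_data = [0] * num_threads

    def worker(thread_id):
        for _ in range(iterations):
            lock.acquire()               # replaces BEGIN_TRANSACTION()
            try:
                new_total = total["count"] + 1
                private_data[thread_id] += 1
                total["count"] = new_total
            finally:
                lock.release()           # replaces COMMIT_TRANSACTION()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["count"], private_data
```

With a lock, every update to `total.count` is serialized; the transactional version serializes only when two transactions actually conflict on the same block.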
SPLASH2 Benchmark Results • Data presented as: $\frac{\text{PARMACS locks execution time}}{\text{LogTM execution time}}$ • Modified version: $1 - \frac{\text{LogTM execution time}}{\text{PARMACS locks execution time}}$
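The two presentations carry the same information. Assuming the modified metric is one minus the inverse ratio, and writing $T_P$ for the PARMACS locks execution time, $T_L$ for the LogTM execution time, and $S$ for the original speedup (symbols introduced here for illustration), the conversion is:

```latex
\[
S = \frac{T_P}{T_L},
\qquad
1 - \frac{T_L}{T_P} = 1 - \frac{1}{S}
\]
% Hypothetical example, not a result from the talk:
% S = 1.25 gives 1 - 1/1.25 = 0.20, i.e. a 20% reduction in execution time.
```

A value of 0 in the modified form means the two versions take equal time, positive values mean LogTM is faster, and negative values mean it is slower.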
Conclusions • Optimize commits • Aborts handled by software • Stall to avoid wasting work • Allow sticky data because overflow is rare • Good performance on microbenchmarks • False sharing has a big impact on LogTM