Explore LogTM—an innovative Log-based Transactional Memory system that prioritizes eager version management, fast conflict detection, and efficient transactional block updates. This system optimizes memory operations and effectively handles conflicts with minimal complexity, enhancing overall system performance.
LogTM: Log-based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo
Motivation • Previous TM systems abort fast, commit slow • Old values “in place” • New values somewhere else • Commit is the common case! • Remember Amdahl’s Law • Conflicts usually solved by hardware • Fast but myopic • Trapping to SW if needed for careful resolution
LogTM • Eager version management • Puts new values in place for faster commits • No data moves even on cache overflow • Eager conflict detection • Detects offending ld/st immediately • Fast conflict detection on evicted blocks • Fast commit by lazy reset of directory state • Handle aborts by SW • Aborts are much less common than commits
Eager Version Management • Per-thread log in cacheable virtual memory • On a store, logs the address and previous contents of the block • Write (W) bit • Tracks whether a block has been stored to and logged • Faster commits • Clear W bits and reset the log pointer • Slower aborts • Must also write old values back
[Figure: animation over six slides of eager version management. Columns: Virtual Address, R, W, Data Block, for blocks 00, 40 and c0; log at 1000, 1040, 1080 with LogBase = 1000. Frame by frame: all R/W bits clear and LogPtr = 1000; R set on block 00; W set on block c0 with LogPtr = 1048; R and W set on block 40 with LogPtr = 1090; finally all R/W bits cleared and LogPtr back at 1000.]
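The eager version management bullets above can be sketched as a toy software model. This is only an illustration under stated assumptions: the class name `TxThread`, its methods, and the dictionary standing in for memory are all hypothetical, and real LogTM does this in hardware at cache-block granularity.

```python
class TxThread:
    """Toy model of one thread's eager version management (illustrative names)."""

    def __init__(self, memory):
        self.memory = memory   # shared dict: block address -> contents
        self.log = []          # per-thread undo log of (address, old contents)
        self.r_bits = set()    # blocks read in the current transaction
        self.w_bits = set()    # blocks stored to and already logged

    def load(self, addr):
        self.r_bits.add(addr)
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.w_bits:
            # First store to this block: log the address and previous contents.
            self.log.append((addr, self.memory[addr]))
            self.w_bits.add(addr)
        self.memory[addr] = value  # the new value goes in place

    def commit(self):
        # Fast path: no data moves, just clear the bits and reset the log.
        self.r_bits.clear()
        self.w_bits.clear()
        self.log.clear()

    def abort(self):
        # Slow path: walk the log backwards and write old values back.
        while self.log:
            addr, old = self.log.pop()
            self.memory[addr] = old
        self.r_bits.clear()
        self.w_bits.clear()
```

A second store to an already-logged block adds no log entry, which is the job of the W bit; an abort restores every logged block, which is why aborts are the slow case.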
Conflict detection • Coherence requests sent to directory • Directory will forward to other processor(s) • Processors will detect conflict • Using local state • Ack/Nack as response • Requester resolves any conflict • Adds read bit to each cache block • Extends MOESI protocol • “Sticky” states
Conflict detection • Works even after cache overflow • Forwards conflicting requests to “interested” processors • Adds a per-processor overflow bit • The transactional block can be updated • Requests will still be redirected to the processor • Processor can Nack on conflict
Replacement behavior • Depends on MOESI state • M: Replace with transactional writeback • Sets state as “Sticky@Processor” • Requests are forwarded to the processor • S: Silently replaced • Adds processor to sharer list • Requests forwarded to all sharers • O: Write back to directory • Adds itself to sharer list, same as S if requested exclusively • E: Same as O
[Figure: animation over six slides of the directory protocol with processors P and Q, each showing cache state, (R W) bits, TMcount and Overflow. (1) Directory Idle [old]; P in I (- -), TMcount 1, Overflow 0. (2) Messages GETX, DATA, ACK; directory M@P [old]; P in M (R W) [new]. (3) Messages GETS, Fwd_GETS, NACK, NACK; directory M@P [old]; P in M (R W) [new], Q in I (- -). (4) Messages PUTX, NACK, WB_XACT; directory M@P [new]; P in I (- -), Overflow 1. (5) Messages GETS, Fwd_GETS, NACK, NACK; directory M@P [new]; P in I (- -) with Overflow 1, Q in I (- -). (6) Messages GETS, Fwd_GETS, CLEAN, DATA, ACK; directory E@Q [new]; P in I (- -) with TMcount 0 and Overflow 0, Q in E (R -) [new] with TMcount 1.]
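The conflict detection and replacement slides describe how a processor can still Nack requests for a block it has evicted. A minimal sketch of that idea follows, with heavy simplifications that are assumptions of this example rather than part of LogTM: one owner per block instead of full MOESI states and sharer lists, every request made from inside a transaction, and hypothetical names `Processor`, `Directory`, `on_forwarded` and `request`.

```python
class Processor:
    def __init__(self, name):
        self.name = name
        self.read_set = set()   # R bits of cached blocks
        self.write_set = set()  # W bits of cached blocks
        self.overflow = False   # set once a transactional block is evicted

    def evict(self, addr):
        # Eviction loses the block's R/W bits; only the overflow bit remembers
        # that some transactional block left the cache.
        if addr in self.read_set or addr in self.write_set:
            self.overflow = True
        self.read_set.discard(addr)
        self.write_set.discard(addr)

    def on_forwarded(self, addr, exclusive):
        # Conflict from local state: a written block conflicts with any request,
        # a read block only with an exclusive request.
        if addr in self.write_set or (exclusive and addr in self.read_set):
            return "NACK"
        if self.overflow:
            return "NACK"       # block may have been evicted: be conservative
        return "ACK"

    def commit(self):
        self.read_set.clear()
        self.write_set.clear()
        self.overflow = False


class Directory:
    def __init__(self):
        self.owner = {}         # addr -> Processor, kept across eviction ("sticky")

    def request(self, requester, addr, exclusive):
        owner = self.owner.get(addr)
        if owner is not None and owner is not requester:
            if owner.on_forwarded(addr, exclusive) == "NACK":
                return "NACK"
        # Lazy cleanup: stale sticky state is replaced on the first acked request.
        self.owner[addr] = requester
        (requester.write_set if exclusive else requester.read_set).add(addr)
        return "ACK"
```

Because the directory entry is never cleaned up at commit, the first request after P commits is still forwarded to P, which then acknowledges it; that is the lazy reset the slides refer to.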
Conflict detection • Lazy cleanup works best if overflow is rare • Can be improved otherwise (e.g. using Bloom filters) • Ambiguities handled conservatively • Refetch during same vs. earlier transaction • Set R&W bits • Log old values
Conflict Resolution • When two transactions conflict • At least one must stall or abort • Quick myopic decision by HW • Slow and careful by SW • Hybrid approach: • HW seeks fast solution, traps to software if problem persists
Conflict resolution • Distributed timestamps • Trap to conflict handler (SW) • Transaction could cause deadlock • Logically later than transaction in conflict • Per-processor “possible cycle” flag • Conflict triggered if a nack is received from a logically earlier transaction while the possible cycle flag is set
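The timestamp rule above can be sketched in a few lines. The slide does not say when the possible cycle flag is set; this sketch assumes it is set when a transaction nacks a logically earlier one, which is my reading of the LogTM paper. The class `Tx` and its method names are hypothetical.

```python
class Tx:
    def __init__(self, timestamp):
        self.timestamp = timestamp   # smaller value = logically earlier
        self.possible_cycle = False

    def send_nack_to(self, requester):
        """Nack `requester`; return True if the requester should trap to software."""
        if requester.timestamp < self.timestamp:
            # Assumption: stalling an older transaction marks a possible cycle.
            self.possible_cycle = True
        return requester.on_nack(self)

    def on_nack(self, nacker):
        # Trap only when an older transaction nacks us while our flag is set;
        # otherwise the hardware simply stalls and retries.
        return nacker.timestamp < self.timestamp and self.possible_cycle
```

With an older transaction (timestamp 1) and a younger one (timestamp 2), the younger nacking the older only sets the younger's flag and the older stalls. If the older then nacks the younger, the younger traps to the software handler, since it is the logically later transaction in a possible deadlock.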
Evaluation • Target System • 32-processor SPARC system running Solaris, 1 GHz • L1: 16 KB 4-way split, 1-cycle latency • L2: 4 MB 4-way unified, 12-cycle latency • Memory: 4 GB, 80-cycle latency • Directory: Full-bit vector sharer list, migratory sharing optimization, directory cache, 6-cycle latency • Interconnection: Hierarchical switch topology, 14-cycle link latency • Simulated using Simics • LogTM interface added by “magic” instructions
Microbenchmark • Shared counter micro-benchmark • Compared to • Exponential Backoff • MCS locks • LogTM outperforms them • LogTM does not abort transactions
SPLASH • Evaluated using a subset of SPLASH-2 • Used two versions of raytrace (with/without false sharing) • False sharing has significant impact! • Performance gains range from moderate to large
Benchmark Analysis • LogTM must read a block before writing it to the log • Benchmarks showed that data is usually read anyway • LogTM is more sensitive to false sharing than lock approaches • The log needs to be valid only on an abort • So a k-block log write buffer eliminates most log writes, as shown in the benchmarks
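One way to picture the k-block log write buffer is the sketch below. It is an interpretation, not the authors' design: the class `BufferedLog`, the spill-oldest policy and the `memory_writes` counter are all assumptions made for illustration.

```python
class BufferedLog:
    def __init__(self, k):
        self.k = k
        self.buffer = []         # up to k newest entries, not yet written to memory
        self.memory_log = []     # entries spilled to the in-memory log
        self.memory_writes = 0   # log writes that actually reached memory

    def append(self, addr, old):
        if len(self.buffer) == self.k:
            # Buffer full: spill the oldest entry to the in-memory log.
            self.memory_log.append(self.buffer.pop(0))
            self.memory_writes += 1
        self.buffer.append((addr, old))

    def commit(self):
        # Buffered entries are dropped without ever being written to memory.
        self.buffer.clear()
        self.memory_log.clear()

    def entries_for_abort(self):
        # Newest first, so old values are restored in reverse order.
        return list(reversed(self.memory_log + self.buffer))
```

In this model a transaction that writes k or fewer blocks commits with `memory_writes` still at zero, which is the saving the slide points to.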
Related Work • TCC • Lazy version management (slow commits) • Lazy conflict detection (detect on commit) • LTM • On overflow stores new values in uncacheable in-memory hash table • LogTM allows both old and new versions cached
Related Work • UTM • Logs blocks targeted by both loads and stores • More complete conflict detection • Must walk log on certain coherence requests • VTM • Per address space virtual mode for cache evictions, paging, context switches • Virtualized VTM uses micro-code for conflict detection. (LogTM uses MOESI extension)
Conclusion • Presents a TM implementation designed to speed up the common case • Efficiently handles cache evictions • Requires simple architectural changes • Registers, state, directory extension • Work towards hybrid conflict detection • No paging or context switch support • Very sensitive to false sharing