
LogTM: Log-based Transactional Memory

LogTM: Log-based Transactional Memory. Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo. Motivation. Previous TM systems abort fast, commit slow Old values “in place” New values somewhere else Commit is the common case!



  1. LogTM: Log-based Transactional Memory Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill & David A. Wood Presented by: Eduardo Cuervo

  2. Motivation • Previous TM systems abort fast, commit slow • Old values “in place” • New values somewhere else • Commit is the common case! • Remember Amdahl’s Law • Conflicts usually solved by hardware • Fast but myopic • Trapping to SW if needed for careful resolution

  3. Transactional Memory Taxonomy

  4. LogTM • Eager version management • Puts new values in place for faster commits • No data moves even on cache overflow • Eager conflict detection • Detects offending ld/st immediately • Fast conflict detection on evicted blocks • Fast commit by lazy reset of directory state • Handle aborts by SW • Aborts are much less common than commits

  5. Eager Version Management • Per-thread log in cacheable virtual memory • On a store, logs the address and previous contents of the block • Write (W) bit • Tracks whether a block has already been stored to and logged • Faster commits • Clear W bits and reset the log pointer • Slower aborts • Must also write the old values back

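  6. [Figure: version management example] Blocks (Virtual Address: R W): 00: 0 0, 40: 0 0, c0: 0 0 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1000 • TMcount 1

  7. [Figure: version management example] Blocks (Virtual Address: R W): 00: 1 0, 40: 0 0, c0: 0 0 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1000 • TMcount 1

  8. [Figure: version management example] Blocks (Virtual Address: R W): 00: 1 0, 40: 0 0, c0: 0 1 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1048 • TMcount 1

  9. [Figure: version management example] Blocks (Virtual Address: R W): 00: 1 0, 40: 1 1, c0: 0 1 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1090 • TMcount 1

  10. [Figure: version management example] Blocks (Virtual Address: R W): 00: 0 0, 40: 0 0, c0: 0 0 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1000 • TMcount 0

  11. [Figure: version management example] Blocks (Virtual Address: R W): 00: 0 0, 40: 0 0, c0: 0 0 • Log at 1000, 1040, 1080 • LogBase 1000 • LogPtr 1000 • TMcount 0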
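
The version management steps in slides 5 to 11 can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware design: the class name `TxThread`, the 64-byte block size, and the 8-byte address field per log entry are assumptions (chosen so that each logged store advances the log pointer by 0x48, as in slides 8 and 9), and memory is modeled as a plain dictionary.

```python
BLOCK_BYTES = 64                  # assumed cache block size
ENTRY_BYTES = 8 + BLOCK_BYTES     # assumed log entry: address + old block contents


class TxThread:
    """Hypothetical model of one thread's eager version management."""

    def __init__(self, memory, log_base=0x1000):
        self.memory = memory      # dict: block address -> current value
        self.log_base = log_base
        self.log_ptr = log_base
        self.log = []             # list of (block address, old value)
        self.w_bits = set()       # blocks already stored to and logged

    def store(self, addr, value):
        # Log the old value only on the first store to a block (W bit clear).
        if addr not in self.w_bits:
            self.log.append((addr, self.memory[addr]))
            self.log_ptr += ENTRY_BYTES
            self.w_bits.add(addr)
        self.memory[addr] = value  # new value goes in place

    def _reset(self):
        self.w_bits.clear()
        self.log.clear()
        self.log_ptr = self.log_base

    def commit(self):
        # Fast path: nothing to copy, just clear W bits and reset the log.
        self._reset()

    def abort(self):
        # Slow path: walk the log backwards and restore the old values.
        for addr, old in reversed(self.log):
            self.memory[addr] = old
        self._reset()
```

In this model a commit touches no data at all, while an abort does work proportional to the number of logged blocks, which is the trade-off the slides describe.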

  12. Conflict detection • Coherence requests sent to directory • Directory will forward to other processor(s) • Processors will detect conflict • Using local state • Ack/Nack as response • Requester resolves any conflict • Adds read bit to each cache block • Extends MOESI protocol • “Sticky” states

  13. Conflict detection • Works even after cache overflow • Forwards conflicting requests to “interested” processors • Adds a per-processor overflow bit • The transactional block can be updated • Requests will still be redirected to the processor • Processor can Nack on conflict
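
The local check a processor performs on a forwarded request can be sketched as follows. This is a simplified model under stated assumptions: the `Processor` class and its fields are hypothetical, only GETS (read) and GETX (write) requests are handled, and a processor that no longer caches the block answers conservatively from its overflow bit alone. The real protocol distinguishes more response types than the ACK/NACK pair used here.

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    """Hypothetical model of the per-processor state used to detect conflicts."""
    bits: dict = field(default_factory=dict)  # cached block -> (R bit, W bit)
    overflow: bool = False                    # set once a transactional block is evicted

    def respond(self, request, block):
        if request not in ("GETS", "GETX"):
            raise ValueError(f"unsupported request: {request}")
        if block in self.bits:
            r_bit, w_bit = self.bits[block]
            # A read conflicts with a transactional write;
            # a write conflicts with a transactional read or write.
            conflict = w_bit if request == "GETS" else (r_bit or w_bit)
        else:
            # Block not cached: the R/W bits are gone, so be conservative
            # whenever this processor has overflowed transactional data.
            conflict = self.overflow
        return "NACK" if conflict else "ACK"
```

For example, a processor holding a block with only its R bit set would acknowledge another reader but nack a writer.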

  14. Replacement behavior • Depends on MOESI state • M: Replace with transactional writeback • Sets state as “Sticky@Processor” • Requests are forwarded to the processor • S: Silently replaced • Adds processor to sharer list • Requests forwarded to all sharers • O: Write back to directory • Add itself to sharer list, same as S if requested exclusively • E: Same as O

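  15. [Figure: directory example] Directory: Idle [old] • P: I (- -) [none], TMcount: 1, Overflow: 0

  16. [Figure: directory example] Directory: M@P [old] • Messages: GETX, DATA, ACK • P: M (R W) [new], TMcount: 1, Overflow: 0

  17. [Figure: directory example] Directory: M@P [old] • Messages: GETS, Fwd_GETS, NACK, NACK • P: M (R W) [new], TMcount: 1, Overflow: 0 • Q: I (- -) [ ], TMcount: 1, Overflow: 0

  18. [Figure: directory example] Directory: M@P [new] • Messages: PUTX, NACK, WB_XACT • P: I (- -) [ ], TMcount: 1, Overflow: 1

  19. [Figure: directory example] Directory: M@P [new] • Messages: GETS, Fwd_GETS, NACK, NACK • P: I (- -) [ ], TMcount: 1, Overflow: 1 • Q: I (- -) [ ], TMcount: 1, Overflow: 0

  20. [Figure: directory example] Directory: E@Q [new] • Messages: GETS, Fwd_GETS, CLEAN, DATA, ACK • P: I (- -) [ ], TMcount: 0, Overflow: 0 • Q: E (R -) [new], TMcount: 1, Overflow: 0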

  21. Conflict detection • Lazy cleanup works best if overflow is rare • Can be improved otherwise (e.g., by using Bloom filters) • Ambiguities handled conservatively • Refetch during the same versus an earlier transaction • Set R & W bits • Log old values

  22. Hardware support

  23. Conflict Resolution • When two transactions conflict • At least one must stall or abort • Quick myopic decision by HW • Slow and careful by SW • Hybrid approach: • HW seeks fast solution, traps to software if problem persists

  24. Conflict resolution • Distributed timestamps • Trap to the conflict handler (SW) if • The transaction could cause deadlock • It is logically later than the transaction it conflicts with • Per-processor possible-cycle flag • Conflict if a nack is received from a logically earlier transaction while the possible-cycle flag is set
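
The timestamp rule on slide 24 can be sketched as below. The class `Tx` and its method names are hypothetical, and one detail is an assumption drawn from my reading of the LogTM paper rather than from the slide: the possible-cycle flag is set when a transaction nacks a request from a logically earlier (smaller timestamp) transaction. Treat this as a sketch of the idea, not the exact hardware policy.

```python
class Tx:
    """Hypothetical model of one transaction's deadlock-avoidance state."""

    def __init__(self, timestamp):
        self.timestamp = timestamp     # smaller value = logically earlier
        self.possible_cycle = False    # per-processor possible-cycle flag

    def on_request_from(self, requester_ts, conflicts):
        """Answer a forwarded request; `conflicts` is the local R/W-bit check."""
        if not conflicts:
            return "ACK"
        # Assumption: nacking an earlier transaction means an earlier
        # transaction may now be waiting on us, so a cycle is possible.
        if requester_ts < self.timestamp:
            self.possible_cycle = True
        return "NACK"

    def on_nack_from(self, nacker_ts):
        """Decide what to do after our own request was nacked."""
        if nacker_ts < self.timestamp and self.possible_cycle:
            return "TRAP"              # hand off to the software conflict handler
        return "RETRY"                 # stall and reissue the request
```

Under this rule only the logically later transaction ever traps, so the earliest transaction in any potential cycle keeps making progress.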

  25. Evaluation • Target system • SPARC, Solaris, 32 processors, 1 GHz • L1: 16 KB, 4-way, split, 1-cycle latency • L2: 4 MB, 4-way, unified, 12-cycle latency • Memory: 4 GB, 80-cycle latency • Directory: full bit-vector sharer list, migratory sharing optimization, directory cache, 6-cycle latency • Interconnect: hierarchical switch topology, 14-cycle link latency • Simulated using Simics • LogTM interface added by “magic” instructions

  26. Microbenchmark • Shared counter micro-benchmark • Compared to • Exponential Backoff • MCS locks • LogTM outperforms them • LogTM does not abort transactions

  27. SPLASH • Evaluated using a subset of SPLASH-2 • Used two versions of raytrace (with/without false sharing) • False sharing has significant impact! • Performance gains from moderate to large

  28. Benchmark Analysis • LogTM must read a block before writing it to the log • Benchmarks showed that the data is usually read anyway • LogTM is more sensitive to false sharing than lock-based approaches • The log only needs to be valid when an abort occurs • So a k-block log write buffer avoids most log writes, as shown in the benchmarks

  29. Related Work • TCC • Lazy version management (slow commits) • Lazy conflict detection (detect on commit) • LTM • On overflow stores new values in uncacheable in-memory hash table • LogTM allows both old and new versions cached

  30. Related Work • UTM • Logs blocks targeted by both loads and stores • More complete conflict detection • Must walk log on certain coherence requests • VTM • Per address space virtual mode for cache evictions, paging, context switches • Virtualized VTM uses micro-code for conflict detection. (LogTM uses MOESI extension)

  31. Conclusion • Presents a TM implementation designed to speed up the common case • Efficiently handles cache evictions • Requires simple architectural changes • Registers, state, directory extension • Work towards hybrid conflict detection • No paging or context switch support • Very sensitive to false sharing
