
Hybrid Transactional Memory

Hybrid Transactional Memory. Nir Shavit, MIT and Tel-Aviv University. Joint work with Alex Matveev (and describing the work of many in this summer school).


Presentation Transcript


  1. Hybrid Transactional Memory. Nir Shavit, MIT and Tel-Aviv University. Joint work with Alex Matveev (and describing the work of many in this summer school).

  2. Haswell

  3. Transactional Memory [HerlihyMoss93]

  4. Transactional Memory
      • Memory transactions are collections of reads and writes executed atomically
      • Should provide disjoint access parallelism
      • Should maintain internal and external consistency
        • External (serializability): with respect to the interleavings of other transactions
        • Internal (opacity): the transaction itself should operate on a consistent state

  5. External Consistency
      • Transaction A: Read y; Write x = 4; Return x+y
      • Transaction B: Read x; Write y = 4; Return x+y
      • Cannot both return 4
      • Canonical synchronization problem all STM/HTM implementations must solve
      [Figure: application memory with X = 0 and Y = 0.]
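To see why both transactions cannot return 4, it helps to enumerate the serial orders. The sketch below is an illustration only: the function names and the dictionary standing in for application memory are assumptions, not part of the slides. It runs A and B one after the other in both orders from X = 0, Y = 0.

```python
from itertools import permutations


def txn_a(mem):
    y = mem["y"]            # Read y
    mem["x"] = 4            # Write x = 4
    return mem["x"] + y     # Return x + y


def txn_b(mem):
    x = mem["x"]            # Read x
    mem["y"] = 4            # Write y = 4
    return x + mem["y"]     # Return x + y


def serial_outcomes():
    """Return the set of (A result, B result) pairs over all serial orders."""
    outcomes = set()
    for order in permutations(["A", "B"]):
        mem = {"x": 0, "y": 0}          # initial state from the slide
        results = {}
        for name in order:
            results[name] = txn_a(mem) if name == "A" else txn_b(mem)
        outcomes.add((results["A"], results["B"]))
    return outcomes


print(sorted(serial_outcomes()))  # [(4, 8), (8, 4)]
```

In every serial order the second transaction sees the first one's write and returns 8, so an execution in which both return 4 is not serializable.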

  6. Locking STMs
      [Figure: application memory mapped onto an array of versioned write-locks (V#).]

  7. Commit Time Locking (Write Buff)
      • To Read/Write: check unlocked, add to Read/Write set
      • Acquire Locks
      • Validate read/write v#’s unchanged
      • Write Values
      • Release each lock with v#+1
      [Figure: Mem and Locks arrays of versioned write-locks (V# plus lock bit), some advancing to V#+1; timeline for X and Y: Read/Write, Lock, Validate, Write, Unlock.]
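The commit-time locking steps can be sketched as a small single-threaded simulation in which interleavings are driven by hand. All class and method names here are hypothetical, and a real STM would acquire locks with atomic compare-and-swap rather than plain assignments.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class VersionedLock:
    def __init__(self):
        self.version = 0      # the v# on the slide
        self.locked = False   # the lock bit


class Memory:
    def __init__(self, **values):
        self.values = dict(values)
        self.locks = {addr: VersionedLock() for addr in values}


class Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.seen_versions = {}   # addr -> v# observed at first access
        self.write_buffer = {}    # addr -> value, applied only at commit

    def _track(self, addr):
        lock = self.mem.locks[addr]
        if lock.locked:                        # "check unlocked"
            raise Abort(addr)
        self.seen_versions.setdefault(addr, lock.version)

    def read(self, addr):
        if addr in self.write_buffer:          # read our own buffered write
            return self.write_buffer[addr]
        self._track(addr)
        return self.mem.values[addr]

    def write(self, addr, value):
        self._track(addr)
        self.write_buffer[addr] = value

    def commit(self):
        locks = self.mem.locks
        acquired = []
        try:
            for addr in sorted(self.write_buffer):          # acquire locks
                if locks[addr].locked:
                    raise Abort(addr)
                locks[addr].locked = True
                acquired.append(addr)
            for addr, seen in self.seen_versions.items():   # validate v#'s
                held_by_other = locks[addr].locked and addr not in acquired
                if held_by_other or locks[addr].version != seen:
                    raise Abort(addr)
            for addr, value in self.write_buffer.items():   # write values
                self.mem.values[addr] = value
                locks[addr].version += 1                    # release with v#+1
        finally:
            for addr in acquired:                           # always unlock
                locks[addr].locked = False
```

Running the two transactions of the External Consistency slide against this sketch, whichever commits first bumps a version that the other has already recorded, so the second commit raises `Abort` and has to retry.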

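  8. Internal Inconsistency (Opacity) [GuerraouiKapalka07]
      • Transaction A: Write x = 4
      • Transaction B: Read x; Read y; Compute z = 1/(x-y)
      • Transaction A: Write y = 2
      • B reads the new x = 4 together with the old y = 4, so x-y = 0: DIV by 0 ERROR!
      [Figure: memory cell X shown with values 4 and 8, cell Y with value 4.]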

  9. TL2/TinySTM’s Global Clock [DiceShalevShavit06/RiegelFelberFetzer06]
      • Have a shared global version clock
      • Incremented by writing transactions (as infrequently as possible)
      • Read by all transactions
      • Used to validate that the state viewed by a transaction is always opaque

  10. TL2 Style STM
      • Read VClock
      • Read/Write: if unlocked and v# less than clock, add to Read/Write-Set
      • Acquire Locks
      • Increment Clock
      • Validate each v# less than clock
      • Write values
      • Release locks with v# = new clock
      [Figure: VClock (shown with values 100, 120, 121) and the Mem/Locks array of version numbers with lock bits; timeline for X and Y: Read Clock, Read/Write, Lock, Inc, Validate, Write, Unlock.]
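A minimal single-threaded sketch of these TL2-style steps follows. The names are hypothetical, and one detail is an assumption: the check used here is "version not newer than the clock value read at start" (`<=`), so that a transaction starting right after a commit can read what that commit wrote.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class TL2Memory:
    def __init__(self, **values):
        self.clock = 0                                # global version clock
        self.values = dict(values)
        self.version = {addr: 0 for addr in values}   # v# per location
        self.locked = {addr: False for addr in values}


class TL2Transaction:
    def __init__(self, mem):
        self.mem = mem
        self.read_clock = mem.clock    # "Read VClock"
        self.read_set = set()
        self.write_buffer = {}

    def _check(self, addr):
        # Abort if locked or written after this transaction started.
        if self.mem.locked[addr] or self.mem.version[addr] > self.read_clock:
            raise Abort(addr)

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        self._check(addr)
        self.read_set.add(addr)
        return self.mem.values[addr]

    def write(self, addr, value):
        self._check(addr)
        self.write_buffer[addr] = value

    def commit(self):
        mem = self.mem
        acquired = []
        try:
            for addr in sorted(self.write_buffer):    # acquire locks
                if mem.locked[addr]:
                    raise Abort(addr)
                mem.locked[addr] = True
                acquired.append(addr)
            mem.clock += 1                            # increment clock
            write_version = mem.clock
            for addr in self.read_set:                # validate read set
                held_by_other = mem.locked[addr] and addr not in acquired
                if held_by_other or mem.version[addr] > self.read_clock:
                    raise Abort(addr)
            for addr, value in self.write_buffer.items():
                mem.values[addr] = value              # write values
                mem.version[addr] = write_version     # v# = new clock
        finally:
            for addr in acquired:                     # release locks
                mem.locked[addr] = False
```

Because every read compares the location's version with the clock value sampled at start, a transaction that has read x before another transaction commits new values of x and y aborts on its next read instead of computing on a mixed state.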

  11. TL2 Style STM
      • Advantages
        • Great disjoint access parallelism
      • Disadvantages
        • Accessing meta-data is expensive
        • Progress guarantee is only deadlock freedom

  12. NOrec STM [DalessandroSpearScott10]
      • Use shared global clock as a seqlock
      • Validation in every read if a seqlock change is detected
      • Value-based validation: no need for meta-data (local time stamps or locks)

  13. NOrec STM
      • Read/Write (with validation if seqlock changed)
      • Seqlock not odd?
      • Lock seqlock (set odd), with validation if seqlock changed
      • Write
      • Unlock seqlock (set even)
      [Figure: seqlock advancing through values 100 to 104; R/W set holding X, Y, Z.]
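The NOrec idea of a single seqlock plus value-based validation can be sketched as follows. This is a hand-driven, single-threaded illustration with hypothetical names; a real implementation would take the seqlock with compare-and-swap and spin while it is odd rather than abort.

```python
class Abort(Exception):
    """Raised when a transaction must be retried."""


class NOrecMemory:
    def __init__(self, **values):
        self.seqlock = 0              # even: free, odd: a writer is committing
        self.values = dict(values)


class NOrecTransaction:
    def __init__(self, mem):
        if mem.seqlock % 2:           # a real implementation would wait here
            raise Abort("writer in progress")
        self.mem = mem
        self.snapshot = mem.seqlock
        self.read_log = {}            # addr -> value seen (no per-location meta-data)
        self.write_buffer = {}

    def _validate(self):
        # Value-based validation: every logged read must still hold.
        if self.mem.seqlock % 2:
            raise Abort("writer in progress")
        for addr, seen in self.read_log.items():
            if self.mem.values[addr] != seen:
                raise Abort(addr)
        self.snapshot = self.mem.seqlock

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        if self.mem.seqlock != self.snapshot:   # validate only on a change
            self._validate()
        value = self.mem.values[addr]
        self.read_log[addr] = value
        return value

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def commit(self):
        if not self.write_buffer:               # read-only: nothing to publish
            return
        if self.mem.seqlock != self.snapshot:
            self._validate()
        self.mem.seqlock += 1                   # lock seqlock (set odd)
        self.mem.values.update(self.write_buffer)
        self.mem.seqlock += 1                   # unlock seqlock (set even)
```

A reader whose logged values are untouched by an intervening commit passes validation and continues, while a reader whose logged value was overwritten aborts. Every writer, however, goes through the one seqlock.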

  14. NOrec STM
      • Advantages
        • No expensive meta-data
      • Disadvantages
        • Poor disjoint access parallelism (all writes are serialized by clock)
        • Progress guarantee is only starvation freedom

  15. Hardware TM [HerlihyMoss93, IBM/Intel13]
      • Advantages
        • Everything in hardware, no meta-data
        • Great disjoint access parallelism
      • Disadvantages
        • No progress guarantee; fails because of:
          • Unsupported instructions: system or protected instructions
          • Exceptions: page faults and similar
          • Capacity limit: too many accessed locations

  16. Hybrid TM [Moir, Damron et al., Kumar et al.]
      • Fast-Path: execute transaction using best-effort HTM
      • If it aborts because of special instructions or because the transaction is too large, then…
      • Slow-Path: execute transaction using STM
      Performance of HTM with the progress guarantee of STM
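The fast-path/slow-path structure can be illustrated with a toy driver. Everything here is an assumption made for the sketch: the hardware transaction is simulated by running on a private copy, the capacity limit `HTM_CAPACITY` is an invented number, and the slow path is a bare write buffer standing in for a full STM.

```python
class HardwareAbort(Exception):
    """Simulated abort of a best-effort hardware transaction."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


HTM_CAPACITY = 4   # hypothetical limit on locations one hardware txn may touch


def run_fast_path(body, mem):
    """Simulated HTM: work on a private copy, publish only on success."""
    working = dict(mem)
    touched = set()

    def access(addr):
        touched.add(addr)
        if len(touched) > HTM_CAPACITY:
            raise HardwareAbort("capacity")

    def read(addr):
        access(addr)
        return working[addr]

    def write(addr, value):
        access(addr)
        working[addr] = value

    result = body(read, write)
    mem.update(working)          # "commit": an aborted run never reaches this
    return result


def run_slow_path(body, mem):
    """Stand-in for an STM slow path: buffer writes, apply them at the end."""
    write_buffer = {}

    def read(addr):
        return write_buffer.get(addr, mem[addr])

    def write(addr, value):
        write_buffer[addr] = value

    result = body(read, write)
    mem.update(write_buffer)
    return result


def run_hybrid(body, mem, max_attempts=3):
    """Try the fast path first; fall back to the slow path if it keeps aborting."""
    for _ in range(max_attempts):
        try:
            return "fast", run_fast_path(body, mem)
        except HardwareAbort as abort:
            if abort.reason == "capacity":
                break            # retrying in hardware cannot help
    return "slow", run_slow_path(body, mem)
```

A transaction that touches one location completes on the fast path, while one that touches more than `HTM_CAPACITY` locations aborts in the simulated hardware, leaves memory untouched, and completes on the slow path.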

  17. Traditional Hybrid TM [DamronFedorovaLevLuchangcoMoirNussbaum06]
      • Software transaction: update locks
      • Hardware transaction: test versioned write-lock in every read/write; update in write
      [Figure: versioned write-locks (version, lock bit) shared by the software and hardware transactions.]

  18. Traditional Hybrid TM
      • Advantages
        • Progress guarantee of STM
      • Disadvantages
        • HTM must access meta-data
        • Fast path is actually slow because of extra load and branch on every read

  19. Traditional Hybrid TM

  20. Phased TM [LevMoirNussbaum07]
      • Two modes: all hardware or all software
      • Shared global mode indicator
      • If some hardware transaction aborts, switch to software mode
      • Eventually mode reverts back to hardware

  21. Phased TM
      • Advantages
        • Fast-path pure HTM: no meta-data accesses
      • Disadvantages
        • A single software transaction causes all HTM to switch to the STM slow path
        • Not clear how to tune to avoid frequent mode transitions…

  22. Hybrid NOrec (1st Attempt)
      • Software NOrec: Read/Write (with validation); seqlock not odd?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • Hardware: Read/Write (no validation); seqlock not odd?; Write seqlock +2
      • Callout: Software will fail seqlock validation!

  23. Hybrid NOrec (1st Attempt)
      • Software NOrec: Read/Write (with validation); seqlock not odd?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • Hardware: Read/Write (no validation); seqlock not odd?; Write seqlock +2
      • Callout: Hardware will fail seqlock validation!

  24. Hybrid NOrec (1st Attempt)
      • Software NOrec: Read/Write (with validation); seqlock odd?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • Hardware: Read/Write (no validation); seqlock not odd?; Write seqlock +2
      • Callouts: Guaranteed external consistency. Hardware will fail seqlock validation!

  25. Hybrid NOrec (1st Attempt)
      • Software NOrec: Read/Write (with validation); seqlock not odd?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • Hardware: Read/Write (no validation); seqlock not odd?; Write seqlock +2
      • Callouts: Problem: hardware opacity. Hardware will fail seqlock validation!

  26. Internal Inconsistency (Opacity) [GuerraouiKapalka07]
      • Software A: Lock seqlock +1; Write x = 4; … Write y = 2; Unlock seqlock +1
      • Hardware B: Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR! … Odd? seqlock
      [Figure: memory cell X shown with values 4 and 8, cell Y with value 4.]

  27. Hybrid NOrec (2nd Attempt)
      • Software NOrec: Read/Write (with validation); seqlock not odd?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • Hardware steps shown: seqlock not odd?; Read/Write (no validation); Write seqlock +2
      • Callouts: Guarantee hardware opacity. Hardware will detect seqlock invalidation!

  28. Hybrid NOrec
      • Advantages
        • Fast-path HTM: no meta-data accesses
      • Disadvantages
        • Limited disjoint access parallelism
        • Seqlock is in hardware tracking set throughout HTM transaction
        • Major sequential bottleneck

  29. Possible Solutions
      • Forget opacity, use sandboxing [DalessandroCarougeWhiteLevMoirScottSpear2011]
      • Hybrid NOrec 2 [RiegelMarlierNowackFelberFetzer11]: use non-transactional operations in a hardware transaction to read the seqlock and validate that it has not changed after every read
      But sandboxing is complex, and non-transactional ops are only available in the AMD proposal, not in actual IBM or Intel …

  30. Reduced Hardware Approach to HyTM [MatveevShavit13]
      • Use short hardware transactions in the software slow-path
      • I.e., create a new “mixed” software/hardware path
      • Not in order to make the slow-path faster
      • But rather, in order to remove meta-data accesses from the fast path
      • Default to all software if the mixed path fails

  31. Transactional Writes Imply Hardware Opacity
      • Trans A: Write x = 4; Write y = 2
      • Hardware B: Read x; Read y; Compute z = 1/(x-y): DIV by 0 ERROR!
      • If in a hardware transaction, this cannot happen…
      [Figure: memory cell X shown with values 4 and 8, cell Y with values 4 and 2.]

  32. Reduced Hardware NOrec [MatveevShavit13]
      • In slow-path commit, use a small hardware transaction to:
        • Write all values
        • Check seqlock has not changed
        • Write seqlock+1
      • In fast-path:
        • Move seqlock test to end, un-instrumented reads/writes

  33. Reduced Hardware NOrec
      • Software NOrec: Read/Write (with validation); seqlock changed?; Lock seqlock (set odd); Validate; Write; Unlock seqlock (set even)
      • In HTM Trans: Write values; seqlock changed?; seqlock +1
      • Hardware: Read/Write (no instrumentation); Read seqlock; seqlock changed?; Write seqlock +1
      • Callouts: Guarantee fast-path opacity without having seqlock in TM tracking set for long. Hardware will detect write conflict without seqlock!

  34. Reduced Hardware NOrec
      • Properties
        • Fast-path: no meta-data; no instrumentation of reads or writes
        • Slow-path:
          • Short hardware transaction: size of write set
          • Can repeatedly attempt short hardware transaction in commit
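The slow-path commit described above can be sketched loosely as follows. This is a single-threaded illustration under stated assumptions, not the authors' implementation: the "hardware transactions" are plain Python functions that are atomic only because nothing else runs in between, conflict detection by the hardware is not modeled, the all-software fallback is left to the caller, and every name is hypothetical.

```python
class Abort(Exception):
    """The software transaction read stale data and must restart."""


class HardwareAbort(Exception):
    """The simulated short hardware transaction failed."""


class Memory:
    def __init__(self, **values):
        self.seqlock = 0
        self.values = dict(values)


def fast_path(mem, body):
    """Simulated HTM fast path: plain accesses, seqlock touched only at the end."""
    working = dict(mem.values)
    result = body(working)               # un-instrumented reads and writes
    changed = working != mem.values      # did the body change any value?
    mem.values.update(working)
    if changed:
        mem.seqlock += 1                 # last step of the hardware transaction
    return result


class SlowPathTransaction:
    def __init__(self, mem):
        self.mem = mem
        self.snapshot = mem.seqlock
        self.read_log = {}               # addr -> value seen
        self.write_buffer = {}

    def _revalidate(self):
        for addr, seen in self.read_log.items():
            if self.mem.values[addr] != seen:
                raise Abort(addr)
        self.snapshot = self.mem.seqlock

    def read(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        if self.mem.seqlock != self.snapshot:
            self._revalidate()
        value = self.mem.values[addr]
        self.read_log[addr] = value
        return value

    def write(self, addr, value):
        self.write_buffer[addr] = value

    def _small_htm_commit(self):
        # Simulated short hardware transaction, sized by the write set.
        if self.mem.seqlock != self.snapshot:      # check seqlock unchanged
            raise HardwareAbort("seqlock changed")
        self.mem.values.update(self.write_buffer)  # write all values
        self.mem.seqlock += 1                      # write seqlock+1

    def commit(self, max_attempts=3):
        for _ in range(max_attempts):
            try:
                self._small_htm_commit()
                return True
            except HardwareAbort:
                self._revalidate()       # raises Abort if a logged read is stale
        return False                     # caller would default to all software
```

In this sketch a fast-path writer that touches unrelated data only forces the slow path to revalidate and retry its short commit, while a fast-path write to a value the slow path has already read makes revalidation fail with `Abort`.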

  35. Reduced Hardware NOrec
      • Advantages
        • Hardware disjoint access parallelism
        • Seqlock accessed only at end of HTM transaction
        • Surprise: 1st HyTM that is obstruction-free and privatizing
      • Disadvantages
        • Still a window of possible abort due to seqlock increment

  36. Reduced Hardware NOrec

  37. Reduced Hardware NOrec

  38. Reduced Hardware TL2 Style
      • Software TL2 style: Read Clock; Read/Write (validate); Validate; In HTM Trans: Write values
      • Hardware: Read/Write (no validation); Read Clock; Write values with Clock +1
      • Callouts: Hardware will see software. Hardware will detect write conflict.

  39. Reduced Hardware TL2 Style
      • Problem: if between validate and hardware write, can have inconsistency
      • Solution: combine validation and writes in a single transaction
      • Software TL2 style: Read Clock; Read/Write (validate); In HTM Trans: Validate and Write values (shown alongside the earlier separate Validate step and "In HTM Trans: Write values")
      • Hardware: Read/Write (no validation); Read Clock; Write values with Clock +1
      • Callout: Hardware will detect write conflict

  40. Reduced Hardware TL2 Style
      • Advantages
        • Complete disjoint access parallelism
        • GV6 clock incremented on aborts only
        • Obstruction-free
      • Disadvantages
        • No privatization
        • Mixed path transaction size of meta-data set

  41. RH1: Reduced Hardware TL2 Style

  42. RH1: Reduced Hardware TL2 Style

  43. HyTM: Long Journey
      • Combination of ideas:
        • Hardware transactions
        • Global clocks
        • No meta-data access
        • Mixed hardware/software paths
      • And there is still room for improvement
