1 / 37

Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems

Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems. P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix Conference 1994 2008-22952 Jun Lee. DSM (distributed shared memory). A software system for parallel computation

alton
Download Presentation

Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Treadmarks: Distributed Shared Memory on Standard Workstations and Operating Systems P. Keleher, A. Cox, S. Dwarkadas, and W. Zwaenepoel The Winter Usenix Conference 1994 2008-22952 Jun Lee

  2. DSM (distributed shared memory) • A software system for parallel computation • Shares distributed memories • Easier programming • Provide a single global address space

  3. DSM (distributed shared memory) • No widely available DSM implementations • In-house research platforms • Kernel modifications • Poor performance • Imitating consistency protocols of hardware • False sharing

  4. Treadmarks • Objectives • Commercially available workstations and OS • Standard Unix system on DECstation • Efficient user-level DSM implementation • Reduce communication overhead • Design • LRC (lazy release consistency) • Multiple writer protocol • Lazy diff creation

  5. Consistency protocol (SC) • Sequential Consistency • Every write visible “immediately” • Single writer P0 P1 R(a):0 W(a):1 R(a):1

  6. Consistency protocol (SC) • Sequential Consistency • Every write visible “immediately” • Single writer P0 P1 R(a):0 W(a):1 Big problem with pagesize granularity R(a):? R(a):1

  7. Consistency protocol (SC) • Sequential Consistency • Every write visible “immediately” • Single writer P0 P1 Page X Page X a b c a b c d W(x0):a W(x1):b W(x2):c W(x3):d False sharing

  8. Consistency protocol (RC) • Release Consistency • Relaxed memory consistency model • delay making its changes visible to other processors until certain synchronization accesses occurs • Synchronization points • Acquire(), Release() (similar to locks, barriers) • Two types • ERC (eager), LRC (lazy)

  9. Consistency protocol (RC) • Release Consistency • Acquire() and release() are sequentially consistent • Release() is performed after all previous operations have completed • Operations are performed after previous acquire() have been performed • Acquire() and release() pair between conflicting accesses • SC and RC produce the same results.

  10. Consistency protocol (RC) • ERC • Write information is delivered at the release point P0 P1 Acquire(L) W(a):1 Release(L) Write Notice R(a):0 Acquire(L) R(a):? R(a):1 Release(L)

  11. Consistency protocol (RC) • ERC • Write information is delivered at the release point P0 P1 Acquire(L) W(a):1 Acquire(K) Release(L) R(a):0 Release(K) Acquire(L) R(a):1 Release(L)

  12. Consistency protocol (RC) • LRC • The delivery is postponed until the acquire • Fewer messages than ERC P0 P1 Acquire(L) W(a):1 R(a):0 Release(L) Acquire(L) R(a):? R(a):1 Release(L)

  13. Consistency protocol (RC) • ERC vs. LRC ERC P0 P1 P2 P3 Acquire(L) R(a):0 R(a):0 R(a):0 W(a):1 Release(L) Acquire(L) R(a):1 Release(L)

  14. Consistency protocol (RC) • ERC vs. LRC ERC LRC P0 P0 P1 P1 P3 P2 P2 P3 Acquire(L) Acquire(L) R(a):0 R(a):0 R(a):0 R(a):0 R(a):0 R(a):0 W(a):1 W(a):1 Release(L) Release(L) Acquire(L) Acquire(L) R(a):1 R(a):1 Release(L) Release(L)

  15. Multiple writer protocol Page X Page X a b c a b c d a c b a c W(x0):a W(x0):a W(x1):b W(x1):b P0 P1 W(x2):c Acquire(L) P0 P1 W(x3):d Page X W(x2):c Page X Release(L) Acquire(L) W(x3):? False sharing

  16. Multiple writer protocol Page X Page X a b c a b c d a c b d a c W(x0):a W(x0):a W(x1):b W(x1):b P0 P1 W(x2):c Acquire(L) P0 P1 W(x3):d Page X W(x2):c Page X False sharing Release(L) Acquire(L) W(x3):? W(x3):d Release(L)

  17. Twin and Diff a c b d a c W(x0):a W(x1):b Diff Acquire(L) P0 P1 Page X W(x2):c Twin X Twin X Page X Release(L) Acquire(L) a c diff W(x3):? W(x3):d Release(L)

  18. Twin and Diff a c b d a c W(x0):a W(x1):b Diff Acquire(L) Diff P0 P1 Page X W(x2):c b a c Twin X Page X Twin X Twin X Release(L) Acquire(L) a c diff b diff W(x3):? interval a c diff W(x3):d Release(L)

  19. Implementation

  20. Etc. • Lock & barrier • Statically assigned manager • Garbage collection • reclaim the space used by write notice records, interval records, and diffs • Triggered when the free space drops below a threshold

  21. Evaluation • Experimental Environment • 8 DECstation-5000/240 • connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet • Applications • Water – molecular dynamics simulation • Jacobi – Successive Over-Relaxation • TSP – branch & bound algorithm to solve the traveling salesman problem • Quicksort – using bubblesort to sort subarray of less than 1K element • ILINK – genetic linkage analysis

  22. Evaluation Speedup Execution statistics

  23. Evaluation Execution time breakdown

  24. Evaluation Unix overhead breakdown TreadMarks overhead breakdown

  25. Evaluation Execution time breakdown for Water

  26. Evaluation ERC vs. LRC Speedup Message rate

  27. Evaluation ERC vs. LRC Data rate Diff creation rate

  28. Implementation P0 side P1 side pages pages . . . . . . Acq(L) . . . . . . P P . . . . . . time stamp time stamp 0 0 0 0 0 0

  29. Implementation P0 side P1 side twin twin pages pages . . . . . . Acq(L) W.N W.N W(a) . . . . . . a b P P W(b) . . . . . . Rel(L) P0 P1 time stamp time stamp 0 1 0 0 0 0

  30. Implementation P0 side P1 side twin twin pages pages . . . . . . Acq(L) W.N W.N W(a) . . . . . . a b W(b) . . . . . . Rel(L) P1 P0 Acq(L) time stamp time stamp 0 2 0 0 0 0

  31. Implementation P0 side P1 side twin twin pages pages . . . . . . Acq(L) W.N W.N W(a) . . . . . . a b W(b) . . . . . . Rel(L) P0 P0 P1 Acq(L) time stamp time stamp 0 2 0 0 0 0 1 0 0

  32. Implementation P0 side P1 side twin twin pages pages . . . . . . Acq(L) W.N W.N W.N W(a) . . . . . . a b P W(b) . . . . . . Rel(L) P1 P0 P0 b a a Acq(L) time stamp time stamp diff diff 1 2 1 0 0 0 W(c) 1 0 0 P0 P1

  33. Implementation P0 side P1 side twin a b pages pages . . . . . . Acq(L) W.N W.N W.N W.N W(a) . . . . . . a a c P b P W(b) . . . . . . Rel(L) P0 P1 P0 P1 a b a Acq(L) time stamp time stamp diff diff diff 1 2 1 0 0 0 W(c) Rel(L) 1 0 0 P0 P1

  34. Implementation P0 side P1 side twin a b pages pages . . . . . . Acq(L) W.N W.N W.N W.N W(a) . . . . . . a a c b P W(b) . . . . . . Rel(L) P0 P1 P1 P0 a b a Acq(L) time stamp time stamp diff diff diff 1 2 2 0 0 0 W(c) Rel(L) 1 0 0 P0 P1

  35. Implementation P1 side P2 side twin a b pages pages . . . . . . Acq(L) W.N W.N W.N . . . . . . a c b P . . . . . . P1 P1 P0 b a time stamp time stamp diff diff 0 0 0 1 2 0 0 1 1 1 0 0 0 0 0 P0 P1

  36. Implementation P1 side P2 side twin a b pages pages . . . . . . Acq(L) W.N W.N W.N W.N . . . . . . W.N W.N a c b P . . . . . . P0 P1 P1 P1 P1 P0 b a time stamp time stamp diff diff 1 1 1 2 1 0 0 1 1 1 0 0 0 0 0 P0 P1

  37. Thank you ! Any questions?

More Related