TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

Jeongho Kim, Youngseok Son (M.S. students), Real-Time Operating Systems Laboratory, Seoul National University. Contents: Introduction, Techniques, Implementation: TreadMarks, Evaluation.

Presentation Transcript


  1. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems. Jeongho Kim, Youngseok Son (M.S. students), Real-Time Operating Systems Laboratory, Seoul National University

  2. Contents • Introduction • Techniques • Implementation: TreadMarks • Evaluation

  3. I. Introduction: Distributed Shared Memory • Distributed shared memory (DSM) implements the shared memory model on distributed systems, which have no physical shared memory. • The shared memory model provides a virtual address space shared among all processors. • Communication is expensive in distributed systems, and a DSM system has to move data between processors. [Figure: processors 1 to N, each with its own memory (Mem 1 to Mem N), connected by a network and presenting a single shared memory]

  4. I. Introduction: Problems of Existing DSM • OS dependency: kernel modifications • Poor performance: communication overhead, false sharing

  5. I. Introduction: Problems of Existing DSM • Communication overhead: if a message is sent every time the variable x is written, the overhead is high. • Solution: communicate only once, after the final write to x. [Figure: timelines of processes A and B. In the first, each of three writes w(x) sends the page to B; in the second, the page is sent once after the third write]

  6. I. Introduction: Problems of Existing DSM • False sharing: two or more processes access different variables that lie within the same page. • If only one process may write to a page at a time, false sharing causes unnecessary communication, known as the "ping-pong" effect. [Figure: one process uses variable a and another uses variable b on the same page: (1) access to write a, (2) the page is transferred, (3) access to read b, (4) wait]
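
The ping-pong effect can be made concrete with a toy simulation. This is a minimal sketch, assuming a single-writer protocol in which a page has exactly one holder and any access by another process transfers the whole page; `count_page_transfers` and its arguments are hypothetical names, not part of TreadMarks.

```python
def count_page_transfers(accesses, page_of):
    """Count page transfers under an assumed single-holder protocol.

    accesses: list of (process, variable) pairs in execution order.
    page_of:  maps each variable to the page that holds it.
    """
    holder = {}      # page -> process that currently holds it
    transfers = 0
    for process, variable in accesses:
        page = page_of[variable]
        if page in holder and holder[page] != process:
            transfers += 1   # the whole page moves, although the variables differ
        holder[page] = process
    return transfers


# Processes A and B alternate, each touching only its own variable.
accesses = [("A", "a"), ("B", "b")] * 4

same_page = count_page_transfers(accesses, {"a": 0, "b": 0})
separate_pages = count_page_transfers(accesses, {"a": 0, "b": 1})
print(same_page, separate_pages)
```

With both variables on one page, every access after the first forces a transfer; with the variables on separate pages, no transfers occur. No data is actually shared in either case, which is why the traffic is "false" sharing.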

  7. I. Introduction: Objectives • User-level implementation • Good performance • Reducing communication overhead: lazy release consistency, invalidate-based protocol • Solving false sharing: multiple-writer protocol

  8. Contents • Introduction • Techniques • Implementation: TreadMarks • Evaluation

  9. II. Related Works • Lazy release consistency • Multiple-writer protocol

  10. II. Related Works: 1. Lazy Release Consistency • Release consistency (RC) is a memory consistency model that lets a node delay making its changes to shared data visible to other nodes until certain synchronization accesses occur. • Shared memory accesses are either ordinary accesses or synchronization accesses; a synchronization access is either an acquire or a release. • Example: acq(L1), w(x)1, w(y)1, rel(L1), where L1 is the lock related to x and y.
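
The acq(L1), w(x)1, w(y)1, rel(L1) example can be sketched in a few lines. This is a toy model under stated assumptions: a plain dict stands in for the memory that other nodes can see, and the class `RCNode` with its methods is hypothetical, chosen only to show that ordinary writes stay local until the release.

```python
class RCNode:
    """Toy node that buffers ordinary writes until it releases its lock."""

    def __init__(self, shared):
        self.shared = shared   # dict standing in for memory visible to other nodes
        self.pending = {}      # writes made since the acquire
        self.held = None

    def acquire(self, lock):
        self.held = lock

    def write(self, name, value):
        self.pending[name] = value   # ordinary access: not yet visible elsewhere

    def read(self, name):
        return self.pending.get(name, self.shared.get(name, 0))

    def release(self, lock):
        if self.held != lock:
            raise RuntimeError("release of a lock that is not held")
        self.shared.update(self.pending)   # changes become visible at the release
        self.pending.clear()
        self.held = None


shared = {}
node = RCNode(shared)
node.acquire("L1")
node.write("x", 1)
node.write("y", 1)
visible_before = dict(shared)   # nothing has been published yet
node.release("L1")
visible_after = dict(shared)    # both writes appear together
```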

  11. II. Related Works: 1. Lazy Release Consistency • Two types of RC: eager release consistency and lazy release consistency

  12. II. Related Works: 1. Lazy Release Consistency • Eager release consistency (ERC) postpones sending modifications until the next release. • Understanding ERC: a release operation does not complete (is not performed) until acknowledgements have been received from all processes. • L1 is the lock related to x and y; L2 is the lock related to z. [Figure: timelines of processes A, B, and C. B performs acq(L1), w(x)1, w(y)1, rel(L1); C performs acq(L2), w(z)1, rel(L2). At each release the changes are applied at the other processes, after which their reads return the new values, e.g. r(x)1, r(y)1, r(z)1]

  14. II. Related Works: 1. Lazy Release Consistency • Lazy release consistency (LRC) postpones sending modifications until a remote processor actually needs them. • Understanding LRC: the acquirer of the same lock is guaranteed to see the modifications that precede the release in program order. • How does process B know whether variable x has to be updated? 1) Write notice 2) Timestamp [Figure: timelines of processes A, B, and C with w(x)1, rel(L1), acq(L1), and acq(L2); reads of x return 0 except for the read that follows the acquire of L1, which returns 1]

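  15. II. Related Works: 1. Lazy Release Consistency • Lazy release consistency (LRC): write notice • The releaser sends write notices to the process that acquires the lock; the modified data itself is not sent until it is needed. [Figure: process A writes w(x)1 and performs rel(L1); a write notice for x=1 goes to process B, which reads r(x)0 before its acq(L1) and r(x)1 after it]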
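
The role of write notices can be sketched as follows. This is a simplified illustration, not the TreadMarks protocol: it assumes notices travel with the lock, treats each variable as its own page, and fetches the whole value instead of diffs. The classes `Lock` and `Process` and the `stale` and `fetches` fields are hypothetical.

```python
class Lock:
    def __init__(self):
        self.notices = {}   # page -> process that last wrote it under this lock


class Process:
    def __init__(self, name):
        self.name = name
        self.memory = {}    # local copies of pages
        self.dirty = set()  # pages written since the last release
        self.stale = {}     # invalidated page -> process to fetch it from
        self.fetches = 0

    def write(self, page, value):
        self.memory[page] = value
        self.dirty.add(page)

    def release(self, lock):
        # Only write notices are recorded here; no page data is sent.
        for page in self.dirty:
            lock.notices[page] = self
        self.dirty.clear()

    def acquire(self, lock):
        # Write notices arrive with the lock and invalidate local copies.
        for page, writer in lock.notices.items():
            if writer is not self:
                self.stale[page] = writer

    def read(self, page):
        if page in self.stale:   # data moves only when it is actually needed
            writer = self.stale.pop(page)
            self.memory[page] = writer.memory[page]
            self.fetches += 1
        return self.memory.get(page, 0)


lock = Lock()
a, b, c = Process("A"), Process("B"), Process("C")
a.acquire(lock)
a.write("x", 1)
a.release(lock)
before = b.read("x")   # B has not acquired the lock yet
b.acquire(lock)
after = b.read("x")    # fetched on first use after the acquire
never = c.read("x")    # C never acquires the lock
```

In this sketch process C causes no communication at all, which is the saving LRC aims for compared with updating every process at each release.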

  16. II. Related Works: 1. Lazy Release Consistency • Lazy release consistency (LRC): timestamp • Satisfying the happened-before relationship between all operations is enough to satisfy LRC. • The ordering is applied to process intervals. • An interval is a segment of time in the execution of a single process. • A new interval begins each time a process executes a synchronization operation. [Figure: intervals of processes 1, 2, and 3, delimited by rel(L1) and acq(L3)]

  17. II. Related Works: 1. Lazy Release Consistency • Lazy release consistency (LRC): timestamp • It is implemented as a vector timestamp. [Figure: process 1 performs rel(L1) at (1,0,0) and later reaches (2,3,0); process 2 performs acq(L1) and rel(L1) while moving through (1,1,0), (1,2,0), (1,3,0); process 3 moves from (1,0,1) to (1,3,2)]
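
Vector timestamps can be compared and merged with a few lines of code. The sketch below uses the standard vector-clock rules (componentwise maximum on receipt, then increment of the local component); the function names are hypothetical, and the sample values are taken from the timestamps shown on the slide, on the assumption that each acquire merges the releaser's timestamp.

```python
def happened_before(a, b):
    """True if the interval stamped a precedes the interval stamped b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if neither interval precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


def on_acquire(local, releaser, pid):
    """Merge the releaser's timestamp, then start a new local interval."""
    merged = [max(x, y) for x, y in zip(local, releaser)]
    merged[pid] += 1
    return tuple(merged)


# Process 3 (index 2) at (1,0,1) acquires a lock last released at (1,3,0).
p3_after = on_acquire((1, 0, 1), (1, 3, 0), pid=2)

# Process 1 (index 0) at (1,0,0) acquires the same lock.
p1_after = on_acquire((1, 0, 0), (1, 3, 0), pid=0)
```

Two intervals whose timestamps are concurrent, such as (1,0,1) and (1,3,0), are unordered by happened-before, so LRC imposes no visibility requirement between them.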

  18. II. Related Works: 2. Multiple-Writer Protocol • The multiple-writer protocol is designed to solve false sharing: several processes may modify different variables on the same page at the same time. • Two techniques • Twin: a copy of the original page, which is later compared with the modified page. • Diff: the difference between the twin and the modified page. [Figure: process 1 writes variable X and process 2 writes variable Y, both on the same page]

  19. II. Related Works: 2. Multiple-Writer Protocol • Two techniques (cont'd): twin and diff • Note that twinning and diffing not only allow multiple independent writers but also significantly reduce the amount of data sent. [Figure: on a write to page P, a twin is made and the page becomes a writable working copy; at release, the twin and the working copy are compared to produce the diff]
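
Twinning and diffing can be illustrated on a small byte array. This is a minimal sketch, assuming a diff is simply a list of (offset, new value) pairs at byte granularity; the actual encoding used by TreadMarks is not given in these slides, and the function names are hypothetical.

```python
def make_twin(page):
    """Copy the page before its first write."""
    return bytes(page)


def make_diff(twin, page):
    """Record only the bytes that differ from the twin."""
    return [(i, page[i]) for i in range(len(page)) if page[i] != twin[i]]


def apply_diff(page, diff):
    """Patch a page in place with the recorded changes."""
    for offset, value in diff:
        page[offset] = value


original = bytearray(8)

# Two processes write different variables on their own copies of the page.
copy1, copy2 = bytearray(original), bytearray(original)
twin1, twin2 = make_twin(copy1), make_twin(copy2)
copy1[0] = 7          # process 1 writes variable X
copy2[5] = 9          # process 2 writes variable Y
diff1 = make_diff(twin1, copy1)
diff2 = make_diff(twin2, copy2)

# A third copy receives both diffs and ends up with both writes.
merged = bytearray(original)
apply_diff(merged, diff1)
apply_diff(merged, diff2)
```

Each diff here carries one byte of payload instead of the whole 8-byte page, which is the reduction in data sent that the slide refers to. Merging works because the two writers touched disjoint offsets.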

  20. II. Related Works: Summary • Solving communication overhead: lazy release consistency • Solving false sharing: multiple-writer protocol • Implementation: lazy release consistency + multiple-writer protocol, or eager release consistency + multiple-writer protocol

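  21. II. Related Works: Summary • Eager release consistency + multiple-writer protocol [Figure: timelines of P1, P2, and P3 sharing page P, which holds x and y. The writes w(x) and w(y) each produce a twin and a diff; rel(L1) and rel(L2) are followed by inv(P); P2 performs acq(L1), then r(x) and r(y)]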

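  22. II. Related Works: Summary • Lazy release consistency + multiple-writer protocol [Figure: timelines of P1, P2, and P3 sharing page P, which holds x and y. The writes w(x) and w(y) each produce a twin and a diff; rel(L1) and rel(L2) are followed by inv(P); P2 performs acq(L1), then r(x) and r(y)]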

  23. Contents • Introduction • Techniques • Implementation: TreadMarks • Evaluation

  24. III. Implementation: TreadMarks • Data structures • Interval and diff creation • Access misses • Garbage collection • Etc.

  25. III. Implementation: TreadMarks: 1) Data Structure (1/2) • Lazy release consistency: write notice, timestamp • Multiple-writer protocol: diff

  26. III. Implementation: TreadMarks: 1) Data Structure (2/2) • Page array (entries 1, 2, 3, 4, ...): page state, copyset, and write notice records • Processor array (entries 1, 2, 3, 4, ...): vector timestamps • If a page has a write notice record, the happened-before relationship must be considered through the vector timestamp. • Diffs are kept in a diff pool for efficient memory management. [Figure: page array, processor array, write notice records 1 to 3, vector timestamp, and diff pool, linked together]
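
The relationships between these structures can be sketched with a few record types. The field names follow the labels on the slide; how the records link to one another (a write notice pointing at its interval, a per-processor list of intervals) is an assumption made for illustration, and `record_write_notice` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IntervalRecord:
    processor: int
    vector_timestamp: tuple
    write_notices: list = field(default_factory=list)


@dataclass
class WriteNoticeRecord:
    page: int
    interval: IntervalRecord
    diff: Optional[list] = None   # stays None until a diff is created


@dataclass
class PageEntry:
    state: str = "invalid"
    copyset: set = field(default_factory=set)
    write_notices: list = field(default_factory=list)


def record_write_notice(page_array, proc_array, processor, page, timestamp):
    """Link a new write notice into both the page array and the processor array."""
    interval = IntervalRecord(processor, timestamp)
    notice = WriteNoticeRecord(page, interval)
    interval.write_notices.append(notice)
    proc_array[processor].append(interval)
    page_array[page].write_notices.append(notice)
    return notice


page_array = [PageEntry() for _ in range(4)]
proc_array = [[] for _ in range(3)]
notice = record_write_notice(page_array, proc_array,
                             processor=0, page=2, timestamp=(1, 0, 0))
```

Starting from a page entry, the write notice leads to the interval and its vector timestamp, which is what allows the happened-before check mentioned on the slide.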

  27. III. Implementation: TreadMarks: 2) Interval and Diff Creation • Interval creation • Logically, a new interval begins at each release and acquire. • In practice, interval creation can be postponed until the process communicates with another process, which avoids the overhead when a lock is reacquired by the same processor. • Diff creation • With lazy diff creation, modified pages remain writable until a diff request or a write notice arrives for that page. • After that, a subsequent write results in a write notice for the next interval.

  28. III. Implementation: TreadMarks: 3) Access Misses • When an access miss occurs without a write notice: the page is initially set up to reside on processor 0. • When an access miss occurs with a write notice: get the diffs for the write notices, starting with the smallest timestamp. • Create the actual diff as the collection of all diffs related to the page. • The twin is discarded and the result is copied to the working copy of the page.
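
Applying diffs from the smallest timestamp upward can be sketched as below. This is an illustration under assumptions: diffs are lists of (offset, value) pairs, and the order is obtained by sorting on the sum of the vector components, which is one simple way to get a total order consistent with happened-before. It is not presented as the ordering TreadMarks itself uses, and `handle_access_miss` is a hypothetical name.

```python
def handle_access_miss(page, notices):
    """Bring a page up to date from its outstanding write notices.

    page:    bytearray holding the stale local copy.
    notices: list of (vector_timestamp, diff) pairs, in any order.
    """
    # If a happened before b, every component of a is <= the matching
    # component of b and at least one is smaller, so sum(a) < sum(b).
    ordered = sorted(notices, key=lambda notice: sum(notice[0]))
    for _, diff in ordered:
        for offset, value in diff:
            page[offset] = value
    return page


page = bytearray(4)
notices = [
    ((1, 2, 0), [(0, 5)]),            # later interval overwrites offset 0
    ((1, 1, 0), [(0, 3), (1, 4)]),    # earlier interval
]
handle_access_miss(page, notices)
```

Because the diff stamped (1,1,0) is applied before the one stamped (1,2,0), the later write to offset 0 wins, as the happened-before order requires.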

  29. III. Implementation: TreadMarks: 4) Etc. • Locks and barriers: statically assigned manager • Garbage collection: reclaims the space used by write notice records, interval records, and diffs; it is triggered when the free space drops below a threshold. • Unix aspects: TreadMarks relies on standard Unix libraries for remote process creation, interprocessor communication, and memory management.

  30. Contents • Introduction • Techniques • Implementation: TreadMarks • Evaluation

  31. IV. Evaluation: Experimental Environment • Experimental environment: 8 DECstation-5000/240 workstations, connected to a 100-Mbps ATM LAN and a 10-Mbps Ethernet • Applications • Water: molecular dynamics simulation, 343 molecules for 5 steps • Jacobi: successive over-relaxation on a grid of 2000 by 1000 elements • TSP: branch-and-bound algorithm for the traveling salesman problem with 19 cities • Quicksort: sorts an array of 256K integers, using bubblesort for subarrays of fewer than 1K elements • ILINK: genetic linkage analysis

  32. IV. Evaluation: Results • Speedup • Message rate (messages/sec)

  33. IV. Evaluation: Results • Diff creation rate (diffs/sec)
