ECE 1747: Parallel Programming

inga

Presentation Transcript


  1. ECE 1747: Parallel Programming Distributed Shared Memory (DSM)

  2. Multiprocessor (SMP) [figure: proc1, proc2, and proc3 each hold a cached copy X=0; memory also holds X=0]

  3. Consistency Models • Sequential Consistency • All processors observe the same order • Must correspond to some serial order • The only ordering constraint is program order: the reads/writes of each processor appear in the order that processor issued them, with no restriction on the relative ordering between processors.
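The sequential consistency rule on slide 3 can be checked on a small litmus test. The sketch below is not from the slides: it assumes two hypothetical processors (P1 runs `x = 1; r1 = y;`, P2 runs `y = 1; r2 = x;`) and enumerates every interleaving that preserves each processor's program order. The function names are made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical litmus test (an assumption, not from the slides):
 *   P1: x = 1; r1 = y;        P2: y = 1; r2 = x;
 * Bit `step` of `schedule` picks which processor issues the next operation
 * (0 = P1, 1 = P2), so program order within each processor is preserved. */
bool run_schedule(unsigned schedule, int *r1, int *r2)
{
    int x = 0, y = 0;
    int pc1 = 0, pc2 = 0;                 /* next operation of P1 / P2 */

    for (int step = 0; step < 4; step++) {
        if (((schedule >> step) & 1u) == 0) {
            if (pc1 == 2) return false;   /* P1 has no operations left */
            if (pc1 == 0) x = 1; else *r1 = y;
            pc1++;
        } else {
            if (pc2 == 2) return false;   /* P2 has no operations left */
            if (pc2 == 0) y = 1; else *r2 = x;
            pc2++;
        }
    }
    return true;
}

/* Returns the number of legal serial orders and reports whether any of
 * them produces r1 == 0 and r2 == 0. */
int count_sc_outcomes(bool *both_zero)
{
    int legal = 0;
    *both_zero = false;
    for (unsigned s = 0; s < 16; s++) {
        int r1 = -1, r2 = -1;
        if (!run_schedule(s, &r1, &r2)) continue;
        legal++;
        if (r1 == 0 && r2 == 0) *both_zero = true;
    }
    return legal;
}
```

Under these assumptions there are six legal serial orders, and none of them lets both reads return 0. A system that produced that outcome would not be sequentially consistent.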

  4. Common consistency protocols • Write update • Multicast update to all replicas • Write invalidate • Invalidate cached copies in p2, p3 • Cache miss if p2/p3 access X • Valid data from other cache

  5. Distributed Shared Memory (DSM) [figure: processors proc0 ... procN, each with a local memory mem0 ... memN, connected by a network and presented as one shared memory]

  6. DSM programming • Standard – pthread-like • synchronizations • Barriers • Locks • Semaphores

  7. Sequential SOR
       for some number of timesteps/iterations {
         for (i=1; i<n; i++)
           for (j=1; j<n; j++)
             temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                  grid[i][j-1] + grid[i][j+1]);
         for (i=1; i<n; i++)
           for (j=1; j<n; j++)
             grid[i][j] = temp[i][j];
       }

  8. Parallel SOR with Barriers (1 of 2)
       void* sor (void* arg) {
         int slice = (int)arg;
         int from = (slice * (n-1))/p + 1;
         int to = ((slice+1) * (n-1))/p + 1;
         for some number of iterations { … }
       }
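The `from`/`to` arithmetic on slide 8 splits the interior rows 1 .. n-1 among `p` slices. The sketch below only re-expresses that formula as functions so the partition can be checked. The helper names are hypothetical, and the half-open range [from, to) is taken from the `i<to` loop bound on slide 9.

```c
/* Slice s of p owns rows [from, to); the formulas are those on slide 8. */
int slice_from(int slice, int n, int p)
{
    return (slice * (n - 1)) / p + 1;
}

int slice_to(int slice, int n, int p)
{
    return ((slice + 1) * (n - 1)) / p + 1;
}

/* Returns 1 if the p slices tile rows 1 .. n-1 with no gap and no overlap. */
int slices_cover_interior(int n, int p)
{
    int next = 1;                          /* first row not yet assigned */
    for (int s = 0; s < p; s++) {
        if (slice_from(s, n, p) != next) return 0;
        next = slice_to(s, n, p);
    }
    return next == n;                      /* last slice ends just past row n-1 */
}
```

For example, with n = 10 and p = 3 the slices are rows 1..3, 4..6, and 7..9. When p does not divide n-1, the integer division makes the slices differ in size by at most one row.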

  9. Parallel SOR with Barriers (2 of 2)
       for (i=from; i<to; i++)
         for (j=1; j<n; j++)
           temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1]);
       barrier();
       for (i=from; i<to; i++)
         for (j=1; j<n; j++)
           grid[i][j] = temp[i][j];
       barrier();
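Slides 8 and 9 can be assembled into a complete program. The sketch below assumes POSIX threads on ordinary shared memory rather than a DSM system, and uses `pthread_barrier_wait` in place of the slides' `barrier()`. The grid size, thread count, iteration count, boundary values, and the `run_*` helper names are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <string.h>

#define N 16      /* assumed size: grid is (N+1) x (N+1), rows/cols 1..N-1 interior */
#define P 4       /* assumed number of worker threads */
#define ITERS 10  /* assumed number of iterations */

static double grid[N + 1][N + 1], temp[N + 1][N + 1];
static pthread_barrier_t bar;

static void init_grid(void)
{
    memset(grid, 0, sizeof grid);
    memset(temp, 0, sizeof temp);
    for (int j = 0; j <= N; j++) grid[0][j] = 1.0;  /* assumed boundary condition */
}

static void *sor(void *arg)
{
    int slice = (int)(intptr_t)arg;
    int from = (slice * (N - 1)) / P + 1;
    int to = ((slice + 1) * (N - 1)) / P + 1;

    for (int it = 0; it < ITERS; it++) {
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        pthread_barrier_wait(&bar);   /* all of temp is written before grid changes */
        for (int i = from; i < to; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
        pthread_barrier_wait(&bar);   /* all of grid is updated before the next sweep */
    }
    return NULL;
}

/* Runs the parallel version; returns 0 on success, -1 on a setup failure. */
int run_parallel_sor(void)
{
    pthread_t tid[P];
    init_grid();
    if (pthread_barrier_init(&bar, NULL, P) != 0) return -1;
    for (intptr_t s = 0; s < P; s++)
        if (pthread_create(&tid[s], NULL, sor, (void *)s) != 0) return -1;
    for (int s = 0; s < P; s++) pthread_join(tid[s], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}

/* Sequential reference, same loops as slide 7. */
void run_sequential_sor(void)
{
    init_grid();
    for (int it = 0; it < ITERS; it++) {
        for (int i = 1; i < N; i++)
            for (int j = 1; j < N; j++)
                temp[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j] +
                                     grid[i][j - 1] + grid[i][j + 1]);
        for (int i = 1; i < N; i++)
            for (int j = 1; j < N; j++)
                grid[i][j] = temp[i][j];
    }
}
```

The first barrier keeps any thread from overwriting a `grid` row that a neighbouring slice is still reading. The second keeps the next sweep from starting on a partly updated `grid`. With both in place, the parallel result should match the sequential one cell for cell.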

  10. Sequential Consistency DSM • As proposed by Li & Hudak, TOCS ‘86. • Use virtual memory to implement sharing. • Shared memory divided up by virtual memory pages. • Use an SMP-like coherence protocol. • Keep pages in one of three states: • invalid, read-only, read-write
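The three page states on slide 10 behave like a small state machine driven by page faults. The sketch below is a simplification under stated assumptions: it tracks only each processor's access right to a single page and leaves out ownership tracking, managers, and data transfer. The names `read_fault`, `write_fault`, and `coherent` are hypothetical.

```c
typedef enum { INVALID, READ_ONLY, READ_WRITE } page_state;

#define NPROC 3   /* assumed number of processors */

/* Processor p read-faults on the page: any writer is downgraded to
 * read-only, and p obtains a read-only copy. */
void read_fault(page_state state[NPROC], int p)
{
    if (state[p] != INVALID) return;       /* p can already read: no fault */
    for (int q = 0; q < NPROC; q++)
        if (state[q] == READ_WRITE) state[q] = READ_ONLY;
    state[p] = READ_ONLY;
}

/* Processor p write-faults on the page: every other copy is invalidated
 * and p becomes the single writer. */
void write_fault(page_state state[NPROC], int p)
{
    if (state[p] == READ_WRITE) return;    /* p can already write: no fault */
    for (int q = 0; q < NPROC; q++)
        if (q != p) state[q] = INVALID;
    state[p] = READ_WRITE;
}

/* Invariant: a read-write copy never coexists with any other valid copy. */
int coherent(const page_state state[NPROC])
{
    int writers = 0, readers = 0;
    for (int q = 0; q < NPROC; q++) {
        if (state[q] == READ_WRITE) writers++;
        if (state[q] == READ_ONLY) readers++;
    }
    return writers == 0 || (writers == 1 && readers == 0);
}
```

In a real system the faults would come from virtual memory protection: an access that the page's current protection forbids traps to a handler, which performs the transition.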

  11. SC implementation • Synchronous read/write • Writes must be propagated before moving on to the next operation

  12. Read-Write False Sharing x y

  13. Read-Write False Sharing (Cont.) w(x) w(x) w(x) r(x) r(y) r(y)

  14. Read-Write False Sharing (Cont.) w(x) w(x) w(x) r(x) r(y) r(y) synch

  15. Weak Consistency (WEAKC) • Data modifications are only propagated at the time of synchronization. • Works fine if program is properly synchronized through system primitives. • All programs should be …

  16. Read-Write False Sharing (Before) w(x) w(x) w(x) r(x) r(y) r(y) synch

  17. Read-Write False Sharing (WEAKC) w(x) w(x) r(y) r(y) r(x) synch

  18. Write-Write False Sharing x y

  19. Write-Write False Sharing w(x) w(x) w(x) r(x) w(y) w(y) synch

  20. Write-Write False Sharing (WEAKC) w(x) w(x) w(x) r(x) w(y) w(y) synch

  21. Multiple Writer (MW) Protocols • Allow multiple writers per page. • Modifications are merged at synchronization (according to the WEAKC definition). • Modifications are recorded through a mechanism called twinning and diffing.

  22. Write-Write False Sharing and MW w(x) w(x) w(x) w(y) r(x) w(y) synch

  23. Creating a diff (delta) Diff (delta) twin w(x) ... w(x) write-protected write-protected writable
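Twinning and diffing can be sketched in a few functions. The version below assumes a tiny page of `int` words and a word-granularity diff. The page size, the `diff_entry` layout, and the function names are illustrative assumptions, not the format of any particular DSM system.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_WORDS 8   /* assumed tiny page, for illustration only */

typedef struct {
    size_t index;      /* which word of the page changed */
    int value;         /* its new value */
} diff_entry;

/* On the first write to a write-protected page: save an unmodified copy
 * (the twin) before the page is made writable. Returns NULL on failure. */
int *make_twin(const int *page)
{
    int *twin = malloc(PAGE_WORDS * sizeof *twin);
    if (twin != NULL) memcpy(twin, page, PAGE_WORDS * sizeof *twin);
    return twin;
}

/* At synchronization: compare the page with its twin and record only the
 * words that changed. `out` must have room for PAGE_WORDS entries. */
size_t make_diff(const int *page, const int *twin, diff_entry *out)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_WORDS; i++) {
        if (page[i] != twin[i]) {
            out[n].index = i;
            out[n].value = page[i];
            n++;
        }
    }
    return n;
}

/* Merge a diff into another copy of the page. */
void apply_diff(int *page, const diff_entry *diff, size_t n)
{
    for (size_t i = 0; i < n; i++)
        page[diff[i].index] = diff[i].value;
}
```

Two writers that modify different words of the same page, as with x and y in the write-write false sharing slides, produce diffs that do not overlap. Applying both to one copy merges the updates without either overwriting the other.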

  24. Write-Write False Sharing and MW x synch twin w(x) w(x) w(x) x w(y) r(x) w(y) x twin y y

  25. Release Consistency (RC) • Distinguish acquires from releases • Ordinary read/write wait until the previous acquire is performed • Release waits until previous read/write are performed • Acquire/release are sequentially consistent w.r.t. one another

  26. Eager & Lazy Release Consistency • Eager release consistency: transfer consistency information at release of a lock. • Lazy release consistency: transfer consistency information at acquire of a lock.
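The difference between the eager and lazy variants shows up in when, and to whom, updates travel. The toy simulation below assumes a single shared variable x, four processors, and one message per updated copy. It follows the lock hand-off chain drawn on the slides after this one (p1 to p4, numbered 0 to 3 here). The `rc_sim` type and its functions are hypothetical.

```c
#define NPROC 4   /* assumed number of processors */

typedef struct {
    int copy[NPROC];     /* each processor's local copy of x */
    int last_releaser;   /* who released the lock most recently, or -1 */
    int messages;        /* update messages sent so far */
} rc_sim;

void rc_init(rc_sim *s)
{
    for (int p = 0; p < NPROC; p++) s->copy[p] = 0;
    s->last_releaser = -1;
    s->messages = 0;
}

void rc_write(rc_sim *s, int p, int value) { s->copy[p] = value; }

/* Eager: the releaser pushes its update to every other processor. */
void rc_release(rc_sim *s, int p, int eager)
{
    if (eager) {
        for (int q = 0; q < NPROC; q++) {
            if (q == p) continue;
            s->copy[q] = s->copy[p];
            s->messages++;
        }
    }
    s->last_releaser = p;
}

/* Lazy: only the acquirer pulls the update, from the last releaser. */
void rc_acquire(rc_sim *s, int p, int eager)
{
    if (!eager && s->last_releaser >= 0 && s->last_releaser != p) {
        s->copy[p] = s->copy[s->last_releaser];
        s->messages++;
    }
}

/* Chain: processors 0..2 each write x and release; processor 3 acquires
 * and reads x. Returns the message count; *read_value is what it reads. */
int run_chain(int eager, int *read_value)
{
    rc_sim s;
    rc_init(&s);
    for (int p = 0; p < NPROC - 1; p++) {
        if (p > 0) rc_acquire(&s, p, eager);
        rc_write(&s, p, p + 1);
        rc_release(&s, p, eager);
    }
    rc_acquire(&s, NPROC - 1, eager);
    *read_value = s.copy[NPROC - 1];
    return s.messages;
}
```

Under this simplified cost model the final reader sees the same value either way, but the eager run sends nine messages where the lazy run sends three. Real protocols also exchange lock and request messages, which this sketch does not count.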

  27. Eager Release Consistency w(x) rel p1 acq w(x) rel p2 Acq w(x) rel p3 acq r(x) p4

  28. Lazy Release Consistency w(x) rel p1 acq w(x) rel p2 Acq w(x) rel p3 acq r(x) p4

  29. Lazy Release Consistency • Acquiring processor determines which modifications it needs to see. w(x) rel p1 acq w(y) rel p2 acq r(x) r(y) p3 synch

  30. Vector Timestamps 1 0 0 0 0 0 w(x) rel p1 1 1 0 acq w(y) rel 0 0 0 p2 acq r(x) r(y) p3 0 0 0
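The vector timestamps on slide 30 let the acquiring processor work out which modifications it is missing. The sketch below assumes three processors and assumes that each release starts a new interval on the releasing processor, which matches the numbers shown on the slide. The `vclock` type and function names are hypothetical.

```c
#define NPROC 3   /* assumed number of processors, as on the slide */

typedef struct { int t[NPROC]; } vclock;

/* A release closes the current interval on processor `self`. */
void vc_release(vclock *c, int self)
{
    c->t[self]++;
}

/* How many intervals of processor p the acquirer has not yet seen,
 * given the releaser's timestamp. */
int vc_missing(const vclock *mine, const vclock *theirs, int p)
{
    int gap = theirs->t[p] - mine->t[p];
    return gap > 0 ? gap : 0;
}

/* An acquire merges the releaser's timestamp: element-wise maximum. */
void vc_merge(vclock *mine, const vclock *theirs)
{
    for (int p = 0; p < NPROC; p++)
        if (theirs->t[p] > mine->t[p]) mine->t[p] = theirs->t[p];
}
```

Following the slide's scenario: p1 writes x and releases, giving (1,0,0). p2 acquires, merges, writes y, and releases, giving (1,1,0). When p3 acquires with (0,0,0), it is missing one interval from p1 and one from p2, so it needs the modifications to both x and y.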

  31. DSM Summary • Relaxed consistency • application’s definition of correctness • >70% performance of corresponding message passing applications
