
CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Explore simulation of shared memory over message passing, memory consistency, linearizability, and sequential consistency in distributed systems. Learn about operation invocations, response handling, and specification of shared memory systems.


Presentation Transcript


  1. Set 16: Distributed Shared Memory CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS Fall 2011 Prof. Jennifer Welch

  2. Distributed Shared Memory
     • A model for inter-process communication
     • Provides illusion of shared variables on top of message passing
     • Shared memory is often considered a more convenient programming platform than message passing
     • Formally, give a simulation of the shared memory model on top of the message passing model
     • We'll consider the special case of
        • no failures
        • only read/write variables to be simulated

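  3. The Simulation
     [Figure: layered diagram. Users of read/write shared memory issue read/write invocations and receive return/ack responses at the top interface. The Shared Memory layer is implemented by the algorithms alg_0 … alg_{n-1}, which use send/recv on the underlying Message Passing System.]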

  4. Shared Memory Issues
     • A process invokes a shared memory operation (read or write) at some time
     • The simulation algorithm running on the same node executes some code, possibly involving exchanges of messages
     • Eventually the simulation algorithm informs the process of the result of the shared memory operation
     • So shared memory operations are not instantaneous!
        • Operations (invoked by different processes) can overlap
     • What values should be returned by operations that overlap other operations?
        • defined by a memory consistency condition

  5. Sequential Specifications
     • Each shared object has a sequential specification: specifies behavior of object in the absence of concurrency
     • Object supports operations
        • invocations
        • matching responses
     • Set of sequences of operations that are legal

  6. Sequential Spec for R/W Registers
     • Each operation has two parts, invocation and response
     • Read operation has invocation read_i(X) and response return_i(X,v) (subscript i indicates proc.)
     • Write operation has invocation write_i(X,v) and response ack_i(X) (subscript i indicates proc.)
     • A sequence of operations is legal iff each read returns the value of the latest preceding write to the same variable
     • Ex: [write_0(X,3) ack_0(X)] [read_1(X) return_1(X,3)]
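The legality condition above can be sketched as a short check. This is a minimal illustration, not part of the slides: the tuple layout `(proc, kind, var, value)`, the function name `is_legal`, and the default initial value 0 are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical encoding: one tuple per completed operation,
# (process id, "write" or "read", variable name, value written or returned).
Op = Tuple[int, str, str, int]

def is_legal(seq: List[Op], initial: int = 0) -> bool:
    """True iff every read returns the value of the latest preceding write
    to the same variable (or `initial` if no write precedes it)."""
    current = {}
    for _proc, kind, var, value in seq:
        if kind == "write":
            current[var] = value
        elif current.get(var, initial) != value:
            return False
    return True

# The slide's example: write_0(X,3) followed by read_1(X) returning 3.
print(is_legal([(0, "write", "X", 3), (1, "read", "X", 3)]))
# A read that returns a value nobody wrote is not legal.
print(is_legal([(0, "write", "X", 3), (1, "read", "X", 5)]))
```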

  7. Memory Consistency Conditions
     • Consistency conditions tie together the sequential specification with what happens in the presence of concurrency
     • We will study two well-known conditions:
        • linearizability
        • sequential consistency
     • We will only consider read/write registers, in the absence of failures

  8. Definition of Linearizability
     • Suppose σ is a sequence of invocations and responses for a set of operations
        • an invocation is not necessarily immediately followed by its matching response; operations can be concurrent and overlap
     • σ is linearizable if there exists a permutation π of all the operations in σ (now each invocation is immediately followed by its matching response) s.t.
        • π|X is legal (satisfies sequential spec) for all vars X, and
        • if response of operation O1 occurs in σ before invocation of operation O2, then O1 occurs in π before O2 (π respects real-time order of non-overlapping operations in σ)

  9. Linearizability Examples
     • Suppose there are two shared variables, X and Y, both initially 0
     [Figure: timelines for p0 and p1 showing the operations write(X,1)/ack(X), read(Y)/return(Y,1), write(Y,1)/ack(Y) and read(X)/return(X,1), with linearization points numbered 1 to 4.]
     • Is this sequence linearizable? Yes - brown triangles.
     • What if p1's read returns 0? No - see arrow.
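For small histories the definition of linearizability can be checked by brute force over all permutations. The sketch below is illustrative only: the `Op` record, the function names, and in particular the invocation and response times are assumptions, chosen so that both writes finish before either read starts, which is one plausible reading of the figure.

```python
from itertools import permutations
from typing import NamedTuple, Sequence

class Op(NamedTuple):
    proc: int     # invoking process (informational only here)
    kind: str     # "write" or "read"
    var: str
    value: int    # value written, or value returned by the read
    inv: float    # invocation time (hypothetical)
    resp: float   # response time (hypothetical)

def legal(order: Sequence[Op], initial: int = 0) -> bool:
    mem = {}
    for op in order:
        if op.kind == "write":
            mem[op.var] = op.value
        elif mem.get(op.var, initial) != op.value:
            return False
    return True

def respects_real_time(order: Sequence[Op]) -> bool:
    # Violated if an operation placed later in the permutation had already
    # responded before an operation placed earlier was invoked.
    return not any(later.resp < earlier.inv
                   for i, earlier in enumerate(order)
                   for later in order[i + 1:])

def is_linearizable(history: Sequence[Op]) -> bool:
    return any(legal(p) and respects_real_time(p)
               for p in permutations(history))

def example(read_x_value: int):
    return [
        Op(0, "write", "X", 1, inv=0, resp=2),
        Op(1, "write", "Y", 1, inv=1, resp=3),
        Op(0, "read", "Y", 1, inv=4, resp=6),
        Op(1, "read", "X", read_x_value, inv=5, resp=7),
    ]

print(is_linearizable(example(1)))  # p1's read returns 1
print(is_linearizable(example(0)))  # p1's read returns 0
```

The search is factorial in the number of operations, so it is only meant for hand-sized examples like the ones on these slides.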

  10. Definition of Sequential Consistency
      • Suppose σ is a sequence of invocations and responses for some set of operations
      • σ is sequentially consistent if there exists a permutation π of all the operations in σ s.t.
         • π|X is legal (satisfies sequential spec) for all vars X, and
         • if response of operation O1 occurs in σ before invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects real-time order of operations by the same process in σ)

  11. Sequential Consistency Examples
      • Suppose there are two shared variables, X and Y, both initially 0
      [Figure: timelines for p0 and p1 showing the operations write(X,1)/ack(X), read(Y)/return(Y,1), write(Y,1)/ack(Y) and read(X)/return(X,0), with positions in the permutation numbered 1 to 4.]
      • Is this sequence sequentially consistent? Yes - brown numbers.
      • What if p0's read returns 0? No - see arrows.
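Sequential consistency only constrains the order of operations at the same process, so a brute-force check needs each process's program order but no timestamps. The following sketch rests on assumptions: the function names are invented for illustration, and the assignment of operations to processes (p0 writes X then reads Y, p1 writes Y then reads X) is a reading of the figure rather than something the transcript states outright.

```python
from itertools import permutations

def legal(order, initial=0):
    mem = {}
    for _proc, _idx, (kind, var, value) in order:
        if kind == "write":
            mem[var] = value
        elif mem.get(var, initial) != value:
            return False
    return True

def keeps_program_order(order):
    # Each process's operations must appear in increasing program index.
    last = {}
    for proc, idx, _op in order:
        if idx < last.get(proc, -1):
            return False
        last[proc] = idx
    return True

def is_sequentially_consistent(per_proc):
    """per_proc maps a process id to its operations in program order,
    each operation being (kind, var, value)."""
    ops = [(p, i, op) for p, seq in per_proc.items() for i, op in enumerate(seq)]
    return any(keeps_program_order(o) and legal(o) for o in permutations(ops))

slide_example = {0: [("write", "X", 1), ("read", "Y", 1)],
                 1: [("write", "Y", 1), ("read", "X", 0)]}
variant = {0: [("write", "X", 1), ("read", "Y", 0)],   # p0's read returns 0
           1: [("write", "Y", 1), ("read", "X", 0)]}

print(is_sequentially_consistent(slide_example))
print(is_sequentially_consistent(variant))
```

In the variant, each read would have to precede the other process's write while following its own process's write, which forms a cycle, so no permutation qualifies.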

  12. Specification of Linearizable Shared Memory Communication System
      • Inputs are invocations on the shared objects
      • Outputs are responses from the shared objects
      • A sequence σ is in the allowable set iff
         • Correct Interaction: each proc. alternates invocations and matching responses
         • Liveness: each invocation has a matching response
         • Linearizability: σ is linearizable

  13. Specification of Sequentially Consistent Shared Memory Communication System
      • Inputs are invocations on the shared objects
      • Outputs are responses from the shared objects
      • A sequence σ is in the allowable set iff
         • Correct Interaction: each proc. alternates invocations and matching responses
         • Liveness: each invocation has a matching response
         • Sequential Consistency: σ is sequentially consistent

  14. Algorithm to Implement Linearizable Shared Memory
      • Uses totally ordered broadcast as the underlying communication system
      • Each proc keeps a replica for each shared variable
      • When read request arrives:
         • send bcast msg containing request
         • when own bcast msg arrives, return value in local replica
      • When write request arrives:
         • send bcast msg containing request
         • upon receipt, each proc updates its replica's value
         • when own bcast msg arrives, respond with ack
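The algorithm above can be mimicked in a single-threaded toy simulation. Everything in this sketch is an assumption made for illustration: `TotalOrderBroadcast` is a stand-in that uses one shared log to fix the delivery order (a real totally ordered broadcast is a distributed protocol), `LinReplica` is a hypothetical name, and variables are taken to start at 0.

```python
class TotalOrderBroadcast:
    """Toy stand-in: a single shared log fixes one delivery order for all."""
    def __init__(self):
        self.log = []

    def send(self, msg):
        self.log.append(msg)
        return len(self.log)  # number of log entries up to and including msg

class LinReplica:
    def __init__(self, bcast):
        self.bcast = bcast
        self.mem = {}       # local replica of every shared variable
        self.applied = 0    # how many broadcasts this process has received

    def _receive_up_to(self, count):
        # Receive broadcasts in the total order; only writes change the replica.
        while self.applied < count:
            kind, var, value = self.bcast.log[self.applied]
            if kind == "write":
                self.mem[var] = value
            self.applied += 1

    def read(self, var):
        own = self.bcast.send(("read", var, None))
        self._receive_up_to(own)        # wait until own bcast msg arrives
        return self.mem.get(var, 0)

    def write(self, var, value):
        own = self.bcast.send(("write", var, value))
        self._receive_up_to(own)        # wait until own bcast msg arrives
        return "ack"

bcast = TotalOrderBroadcast()
p0, p1, p2 = (LinReplica(bcast) for _ in range(3))
ack = p1.write("X", 1)
results = (p0.read("X"), p2.read("X"))
print(ack, results)
```

Because each read is itself broadcast, a reader cannot respond until it has received every write ordered before its own request, which is why both reads here see the completed write.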

  15. The Simulation
      [Figure: the layered diagram from slide 3 with Totally Ordered Broadcast as the bottom layer: users of read/write shared memory exchange read/write and return/ack with alg_0 … alg_{n-1}, which use to-bc-send/to-bc-recv.]

  16. Correctness of Linearizability Algorithm
      • Consider any admissible execution α of the algorithm in which
         • underlying totally ordered broadcast behaves properly
         • users interact properly (alternate invocations and responses)
      • Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability

  17. Correctness of Linearizability Algorithm
      • Liveness (every invocation has a response): By Liveness property of the underlying totally ordered broadcast
      • Linearizability: Define the permutation π of the operations to be the order in which the corresponding broadcasts are received
         • π is legal: because all the operations are consistently ordered by the TO bcast
         • π respects real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast

  18. Why is Read Bcast Needed?
      • The bcast done for a read causes no changes to any replicas, just delays the response to the read
      • Why is it needed?
      • Let's see what happens if we remove it

  19. Why Read Bcast is Needed
      [Figure: timelines for p0, p1 and p2. p1 invokes write(1) and does a to-bc-send; p0's read returns 1; p2's read returns 0.]
      Not linearizable!

  20. Algorithm for Sequential Consistency
      • The linearizability algorithm, without doing a bcast for reads:
      • Uses totally ordered broadcast as the underlying communication system
      • Each proc keeps a replica for each shared variable
      • When read request arrives:
         • immediately return the value stored in the local replica
      • When write request arrives:
         • send bcast msg containing request
         • upon receipt, each proc updates its replica's value
         • when own bcast msg arrives, respond with ack
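A toy, single-threaded simulation can show both the local reads and the stale value they may return. The names `TotalOrderBroadcast` and `SCReplica`, the shared-log stand-in for the broadcast service, the explicit `receive` step and the initial value 0 are all assumptions for illustration, and the scenario is a simplified version of the one on slide 19.

```python
class TotalOrderBroadcast:
    """Toy stand-in: a single shared log fixes one delivery order for all."""
    def __init__(self):
        self.log = []

    def send(self, msg):
        self.log.append(msg)
        return len(self.log)  # number of log entries up to and including msg

class SCReplica:
    def __init__(self, bcast):
        self.bcast = bcast
        self.mem = {}       # local replica of every shared variable
        self.applied = 0    # how many broadcasts this process has received

    def receive(self, upto=None):
        """Receive pending broadcasts in log order and apply them locally."""
        upto = len(self.bcast.log) if upto is None else upto
        while self.applied < upto:
            var, value = self.bcast.log[self.applied]
            self.mem[var] = value
            self.applied += 1

    def read(self, var):
        return self.mem.get(var, 0)     # purely local, no communication

    def write(self, var, value):
        own = self.bcast.send((var, value))
        self.receive(upto=own)          # wait until own bcast msg arrives
        return "ack"

bcast = TotalOrderBroadcast()
p0, p1, p2 = (SCReplica(bcast) for _ in range(3))
p1.write("X", 1)          # completes once p1 receives its own broadcast
p0.receive()              # the broadcast has reached p0 ...
first = p0.read("X")
second = p2.read("X")     # ... but not yet p2, which answers from its stale copy
p2.receive()
third = p2.read("X")
print(first, second, third)
```

The middle read returns the old value after the write has already completed, which linearizability forbids; it is still sequentially consistent because p2's stale read can be ordered before the write in the permutation.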

  21. Correctness of SC Algorithm
      Lemma (9.3): The local copies at each proc. take on all the values appearing in write operations, in the same order, which preserves the order of non-overlapping writes
         • implies per-process order of writes is preserved
      Lemma (9.4): If p_i writes Y and later reads X, then p_i's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

  22. Correctness of the SC Algorithm (Theorem 9.5)
      Why does SC hold?
      • Given any admissible execution α, must come up with a permutation π of the shared memory operations that is
         • legal and
         • respects per-proc. ordering of operations

  23. The Permutation π
      • Insert all writes into π in their to-bcast order
      • Consider each read R in α in the order of invocation:
         • suppose R is a read by p_i of X
         • place R in π immediately after the later of
            • the operation by p_i that immediately precedes R in α, and
            • the write that R "read from" (caused the latest update of p_i's local copy of X preceding the response for R)
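The construction can be sketched as a small list-insertion routine. The function name, the string labels for operations, and the input encoding are hypothetical. One case the slide does not spell out is a read with no preceding operation that also read the initial value; this sketch places such a read at the front, which is an assumption.

```python
def build_permutation(writes, reads):
    """writes: operation labels of all writes, in to-bcast order.
    reads: (read label, label of the same process's immediately preceding
            operation or None, label of the write it read from or None),
            listed in order of invocation."""
    pi = list(writes)
    for read, prev_op, read_from in reads:
        anchors = [pi.index(a) for a in (prev_op, read_from) if a is not None]
        # Immediately after the later of the two anchors; front if none (assumed).
        pos = max(anchors) + 1 if anchors else 0
        pi.insert(pos, read)
    return pi

# Slide 24's example as read from the figure: p2 writes 1 and then reads 1,
# p1 writes 2, p0 reads 2. The variable name X is assumed.
writes = ["write_2(X,1)", "write_1(X,2)"]
reads = [
    ("read_2(X)=1", "write_2(X,1)", "write_2(X,1)"),
    ("read_0(X)=2", None, "write_1(X,2)"),
]
pi = build_permutation(writes, reads)
print(pi)
```

A read's preceding operation is always already in the list when the read is processed: it is either a write, inserted up front, or an earlier-invoked read by the same process.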

  24. Permutation Example
      [Figure: timelines for p0, p1 and p2. p2 does write(1)/ack (numbered 1) and then a read that returns 1 (numbered 2); p1 does write(2)/ack (numbered 3); p0 does a read that returns 2 (numbered 4). Each write does a to-bc-send.]
      Permutation is given by brown numbers.

  25. Permutation π Respects Per Proc. Ordering
      For a specific proc:
      • Relative ordering of two writes is preserved by Lemma 9.3
      • Relative ordering of two reads is preserved by the construction of π
      • If write W precedes read R in exec. α, then W precedes R in π by construction
      • Suppose read R precedes write W in α. Show same is true in π.

  26. Permutation π Respects Ordering
      • Suppose in contradiction R and W are swapped in π:
         • There is a read R' by p_i that equals or precedes R in α
         • There is a write W' that equals W or follows W in the to-bcast order
         • And R' "reads from" W'.
        α|p_i : … R' … R … W …
        π : … W … W' … R' … R …
      • But:
         • R' finishes before W starts in α and
         • updates are done to local replicas in to-bcast order (Lemma 9.3) so update for W' does not precede update for W
         • so R' cannot read from W'.

  27. Permutation π is Legal
      • Consider some read R of X by p_i and some write W s.t. R reads from W in α
      • Suppose in contradiction, some other write W' to X falls between W and R in π:
        π : … W … W' … R …
      • Why does R follow W' in π?

  28. Permutation π is Legal
      Case 1: W' is also by p_i. Then R follows W' in π because R follows W' in α.
      • Update for W at p_i precedes update for W' at p_i in α (Lemma 9.3).
      • Thus R does not read from W, contradiction.

  29. Permutation π is Legal
      π : … W … W' … O … R …
      Case 2: W' is not by p_i. Then R follows W' in π due to some operation O, also by p_i, s.t.
      • O precedes R in α, and
      • O is placed between W' and R in π
      Consider the earliest such O.
      • Case 2.1: O is a write (not necessarily to X).
         • update for W' at p_i precedes update for O at p_i in α (Lemma 9.3)
         • update for O at p_i precedes p_i's local read for R in α (Lemma 9.4)
         • So R does not read from W, contradiction.

  30. Permutation π is Legal
      π : … W … W' … O … R …
      • Case 2.2: O is a read.
         • By construction of π, O must read X and in fact read from W' (otherwise O would not be after W')
         • Update for W at p_i precedes update for W' at p_i in α (Lemma 9.3).
         • Update for W' at p_i precedes local read for O at p_i in α (otherwise O would not read from W').
         • Thus R cannot read from W, contradiction.

  31. Performance of SC Algorithm
      • Read operations are implemented "locally", without requiring any inter-process communication
      • Thus reads can be viewed as "fast": time between invocation and response is only that needed for some local computation
      • Time for a write is time for delivery of one totally ordered broadcast (depends on how to-bcast is implemented)

  32. Alternative SC Algorithm
      • It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast that has reverse performance:
         • writes are local/fast (even though bcasts are sent, don't wait for them to be received)
         • reads can require waiting for some bcasts to be received
      • Like the previous SC algorithm, this one does not implement linearizable shared memory

  33. Time Complexity for DSM Algorithms
      • One complexity measure of interest for DSM algorithms is how long it takes for operations to complete
      • The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally-ordered broadcast message to be received
      • The sequential consistency algorithm required D time for writes and 0 time for reads, since we are assuming time for local computation is negligible
      • Can we do better? To answer this question, we need some kind of timing model

  34. Timing Model
      • Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast)
      • Assume that every message has delay in the range [d-u,d]
      • Claim: Totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d)

  35. Time and Clocks in Layered Model
      • Timed execution: associate an occurrence time with each node input event
      • Times of other events are "inherited" from time of triggering node input
         • recall assumption that local processing time is negligible
      • Model hardware clocks as before: run at same rate as real time, but not synchronized
      • Notions of view, timed view, shifting are same:
         • Shifting Lemma still holds (relates h/w clocks and msg delays between original and shifted execs)

  36. Lower Bound for SC
      Let Tread = worst-case time for a read to complete
      Let Twrite = worst-case time for a write to complete
      Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, Tread + Twrite ≥ d.

  37. SC Lower Bound Proof
      • Consider any SC simulation with Tread + Twrite < d
      • Let X and Y be two shared variables, both initially 0
      • Let α_0 be admissible execution whose top layer behavior is write_0(X,1) ack_0(X) read_0(Y) return_0(Y,0)
         • write begins at time 0, read ends before time d
         • every msg has delay d
      • Why does α_0 exist?
         • The alg. must respond correctly to any sequence of invocations.
         • Suppose user at p0 wants to do a write, immediately followed by a read.
         • By SC, read must return 0.
         • By assumption, total elapsed time is less than d.

  38. SC Lower Bound Proof
      [Figure: execution α_0 on a time axis from 0 to d. p0 does write(X,1) followed by read(Y) returning 0; no operations are shown for p1.]

  39. SC Lower Bound Proof
      • Similarly, let α_1 be admissible execution whose top layer behavior is write_1(Y,1) ack_1(Y) read_1(X) return_1(X,0)
         • write begins at time 0, read ends before time d
         • every msg has delay d
      • α_1 exists for similar reason.

  40. SC Lower Bound Proof
      [Figure: executions α_0 and α_1 on a time axis from 0 to d. In α_0, p0 does write(X,1) followed by read(Y) returning 0. In α_1, p1 does write(Y,1) followed by read(X) returning 0.]

  41. SC Lower Bound Proof
      • Now merge p0's timed view in α_0 with p1's timed view in α_1 to create admissible execution α'
      • But α' is not SC, contradiction!

  42. SC Lower Bound Proof
      [Figure: executions α_0, α_1 and the merged execution α' on a time axis from 0 to d. In α', p0 does write(X,1) followed by read(Y) returning 0 while p1 does write(Y,1) followed by read(X) returning 0.]
      Not SC - contradiction!

  43. Linearizability Write Lower Bound
      Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, Twrite ≥ u/2.
      Proof: Consider any linearizable simulation with Twrite < u/2.
      • Let α be an admissible exec. whose top layer behavior is: p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X
      • Shift α to create admissible exec. in which p1 and p2's writes are swapped, causing p0's read to violate linearizability

  44. Linearizability Write Lower Bound
      [Figure: execution α on a time axis marked 0, u/2, u. p1 does write 1, p2 does write 2, and p0 reads 2. Delay pattern among p0, p1 and p2: four of the six link delays are d - u/2, and the other two are d and d - u. The execution is linearizable and admissible.]

  45. Linearizability Write Lower Bound
      [Figure: the shifted execution on a time axis marked 0, u/2, u, obtained by shifting p1 by u/2 and p2 by -u/2; p0 still reads 2. Delay pattern after the shift: three of the six link delays are d and three are d - u. The execution is admissible but not linearizable: contradiction!]
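The effect of shifting on message delays can be checked with a few lines of arithmetic. The sketch assumes the usual form of the Shifting Lemma: if process k is shifted by x_k (positive meaning later in real time), a message from i to j with delay δ ends up with delay δ - x_i + x_j. The concrete numbers d = 10 and u = 4, the function names, and the assignment of the d and d - u delays to the links between p1 and p2 are assumptions; the figure residue only shows which delay values occur.

```python
def shift_delays(delays, shift):
    """Delay of each message from i to j after shifting process k by shift[k]."""
    return {(i, j): delay - shift[i] + shift[j]
            for (i, j), delay in delays.items()}

def admissible(delays, d, u):
    """Every delay must lie in the range [d - u, d]."""
    return all(d - u <= delay <= d for delay in delays.values())

d, u = 10, 4          # hypothetical values, chosen so that u/2 is an integer
half = u // 2

# Assumed reading of slide 44's delay pattern: d - u/2 on every link that
# involves p0, d from p1 to p2, and d - u from p2 to p1.
original = {
    (0, 1): d - half, (1, 0): d - half,
    (0, 2): d - half, (2, 0): d - half,
    (1, 2): d,        (2, 1): d - u,
}

# Shift p1 by u/2 and p2 by -u/2, as on slide 45.
shifted = shift_delays(original, {0: 0, 1: half, 2: -half})

print(admissible(original, d, u), admissible(shifted, d, u))
print(sorted(shifted.items()))
```

Under these assumptions every shifted delay is either d or d - u, three of each, which agrees with the delay values that survive from slide 45. With the p1/p2 delays assigned the other way round, one delay would drop to d - 2u and the shifted execution would not be admissible.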

  46. Linearizability Read Lower Bound
      • Approach is similar to the write lower bound
      • Assume in contradiction there is an algorithm with Tread < u/4
      • Identify a particular execution:
         • fix a pattern of read and write invocations, occurring at particular times
         • fix the pattern of message delays
      • Shift this execution to get one that is
         • still admissible
         • but not linearizable

  47. Linearizability Read Lower Bound
      Original execution:
      • p1 reads X and gets 0 (old value)
      • Then p0 starts writing 1 to X
      • When write is done, p2 reads X and gets 1 (new value)
      • Also, during the write, p1 and p2 alternate reading X
      • At some point, the reads stop getting the old value (0) and start getting the new value (1)

  48. Linearizability Read Lower Bound
      [Figure: timelines for p0, p1 and p2, with an interval of length u/2 marked. p0 does write 1 while p1 and p2 alternate reads; the earlier reads return 0 and the later reads return 1.]

  49. Linearizability Read Lower Bound
      • Set all delays in this execution to be d - u/2
      • Now shift p2 earlier by u/2
      • Verify that result is still admissible (every delay either stays the same or becomes d or d - u)
      • But in shifted execution, sequence of values read is 0, 0, …, 0, 1, 0, 1, 1, …, 1: not linearizable!
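The violation in the shifted execution is a read of the old value that comes after a read of the new value. A minimal check for that pattern follows, under the assumptions that the reads are listed in real-time order, do not overlap one another, and that the only write changes X from 0 to 1. The function name and the concrete sequence lengths are made up for illustration.

```python
def first_new_old_inversion(values, old=0, new=1):
    """Index of the first read returning `old` after some earlier read
    returned `new`, or None if there is no such read."""
    seen_new = False
    for idx, value in enumerate(values):
        if value == new:
            seen_new = True
        elif value == old and seen_new:
            return idx
    return None

original_reads = [0, 0, 0, 0, 1, 1, 1, 1]   # old values, then new values
shifted_reads = [0, 0, 0, 1, 0, 1, 1, 1]    # the pattern 0, …, 0, 1, 0, 1, …, 1

print(first_new_old_inversion(original_reads))
print(first_new_old_inversion(shifted_reads))
```

An inversion rules out linearizability here because the write must be placed before the read that returned 1 and after the later read that returned 0, while real-time order requires the earlier read to come first.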

  50. Linearizability Read Lower Bound
      [Figure: the original execution and the shifted execution, with an interval of length u/2 marked. In both, p0 does write 1 while p1 and p2 alternate reads that return 0 and then 1; in the shifted execution a read that returns 1 precedes a read that returns 0.]
