Distributed Shared Memory
CIS825 Project Presentation
Sathish R. Yenna, Avinash Ponugoti, Rajaravi Kollarapu, Yogesh Bharadwaj, Sethuraman Subramanian, Nagarjuna Nagulapati, Manmohan Uttarwar
Distributed Shared Memory
• Introduction
• Consistency models
  - Sequential consistency
  - PRAM consistency
  - Release consistency
• Final system
• Performance evaluation
Introduction
What is shared memory?
- A memory location or object accessed by two or more processes running on the same machine.
- A mechanism must be defined for access to the shared location; otherwise unpredictable states will result.
- Many operating systems provide mechanisms to avoid simultaneous access to shared memory, for example semaphores and monitors.
Example: consider the reader/writer problem. We have a shared buffer into which a writer writes values and from which a reader reads them. To avoid overwriting a value before it is read, or reading the same value twice, we need a synchronization mechanism; the OS provides semaphores/monitors to prevent simultaneous access. But what if the writer is writing from one machine and the reader is reading from another machine?
[Figure: Reader - Memory - Writer]
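The single-machine mechanism mentioned above can be illustrated with a small sketch (not part of the original slides; the class and method names are ours): a Java monitor serializing access to a one-slot buffer so the writer never overwrites an unread value and the reader never reads the same value twice.

```java
// Minimal sketch (assumed example): a one-slot buffer guarded by a Java monitor.
public class SharedBuffer {
    private int value;
    private boolean full = false;   // true once a value is waiting to be read

    // Writer blocks until the previous value has been consumed.
    public synchronized void write(int v) throws InterruptedException {
        while (full) wait();
        value = v;
        full = true;
        notifyAll();                // wake a waiting reader
    }

    // Reader blocks until a fresh value is available.
    public synchronized int read() throws InterruptedException {
        while (!full) wait();
        full = false;
        notifyAll();                // wake a waiting writer
        return value;
    }
}
```

This works only because both threads share one address space and one monitor; once the reader and writer sit on different machines, no such shared lock exists, which is the problem DSM addresses next.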
What is distributed shared memory?
- Memory accessed by two or more processes running on different machines connected via a communication network.
Formal definition: a distributed shared memory system is a pair (P, M), where P is a set of N processes {P1, P2, ..., Pn} and M is a shared memory. Each process Pi sequentially executes read and write operations on data items in M in the order defined by the program running on it.
• DSM improves the performance of the whole system.
• An abstraction like DSM simplifies application programming.
BUT
• The main problem is how to keep the memory consistent.
• We do not have traditional semaphores or monitors to control the accesses in DSM.
• We can implement DSM by keeping the memory at a central location and letting processes on different machines access it.
• We can only use message transmission to control the accesses.
• But networks are slow, so to improve performance we have to keep copies of the same variable on several machines.
• Maintaining perfect consistency (i.e., any read of a variable x returns the value stored by the most recent write to x) across all copies is hard and performs poorly, because the processes run on different machines communicating over a slow network.
• The solution is to accept less-than-perfect consistency as the price of better performance.
• Moreover, many application programs do not require strict consistency.
For all these reasons, many consistency models have been defined.
Consistency Models
A consistency model is essentially a contract between the software and the memory: if the software agrees to obey certain rules, the memory promises to work correctly. In our project we implement three of them:
- Sequential consistency
- PRAM consistency
- Release consistency
Sequential Consistency
A system is sequentially consistent if the result of any execution is the same as if
- the operations of all the processors were executed in some sequential order, and
- the operations of each individual processor appear in this sequence in the order specified by its program.
- When processes run in parallel on different machines, any valid interleaving is acceptable behavior, but all processes must see the same sequence of memory references.
- Note that nothing is said about time; there is no reference to the "most recent" store.
- It merely guarantees that all processes see all memory references in the same order.
Two possible results of the same program:

P1: W(x)1                      P1: W(x)1
----------------------------   ----------------------------
P2:        R(x)0  R(x)1        P2:        R(x)1  R(x)1
Implementation Brown’s Algorithm: Each Process has a queue INi of invalidation requests W( x) v: Perform all invalidations in IN queue. Update the main memory and cache. Place invalidation requests in IN queue of each process. R( x): If x is in cache then read it from cache Else Perform all invalidations in INi Read from the main memory Distributed Shared Memory
Problems with Brown’s Implementation • All the three operations in W( x) v i.e., updating cache, main memory and broadcasting invalid message should be done atomically. • For ensuring the above atomicity, we will have to use robust mechanism involving an agreement by all the processes. There is lot of communication overhead involved in ensuring this atomicity. • For a single write, we have N invalid messages being transmitted, where ‘N’ is the number of processes. Distributed Shared Memory
Sequentially Consistent DSM Protocol - J. Zhou, M. Mizuno, and G. Singh
The DSM system consists of a shared memory module (SMem manager) and a local manager (processor manager) at each machine.
Each processor manager:
- handles requests from the user processes to read or write objects
- communicates with the SMem manager.
The SMem manager:
- processes request messages from processor managers to read or write objects.
Protocol Description
SMem manages the following data structures:
- Object memory: M[Object Range]
- Two-dimensional binary array: Hold_Last_Write[Processor Range, Object Range]
At any time T,
- Hold_Last_Write[i, x] = 1 : object x in the cache at processor i holds the value written by the last write with respect to T
- Hold_Last_Write[i, x] = 0 : object x in the cache at processor i does not hold the value written by the last write with respect to T.
Each element of Hold_Last_Write is initialized to 0. Say there are n processors and m objects.
Each processor i maintains the following data structures:
- One-dimensional binary array Validi[Object Range]:
  -- Validi[x] = 1 : object x in the cache is valid
  -- Validi[x] = 0 : object x in the cache is not valid
  Each element of Validi is initialized to 0.
- For each object x such that Validi[x] = 1, Ci[x] holds the value of x (Ci is the cache memory).
Operations at processor i

Write(x, v)::
  send [write, x, v] to SMem;
  receive [Invalid_array[1..m]] from SMem;
  Validi[1..m] := Invalid_array[1..m];   // element-wise assignment
  Ci[x] := v;

Read(x)::
  if Validi[x] = 0 then
    send [read, x] to SMem;
    receive [v, Invalid_array[1..m]] from SMem;
    Validi[1..m] := Invalid_array[1..m];
    Ci[x] := v;
  endif
  return Ci[x];
Operations at SMem

Process [write, x, v] message from processor i::
  M[x] := v;
  Hold_Last_Write[1..n, x] := 0;
  Hold_Last_Write[i, x] := 1;
  send [Hold_Last_Write[i, 1..m]] to processor i;
  /* send processor i's row of Hold_Last_Write to i;
     processor i receives the row in Invalid_array */

Process [read, x] message from processor i::
  Hold_Last_Write[i, x] := 1;
  send [M[x], Hold_Last_Write[i, 1..m]] to processor i;

Each procedure is executed atomically.
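A compact sketch of the SMem side of this protocol (an assumed illustration with our own names; the real system exchanges these arrays as messages rather than return values): the write handler clears column x of Hold_Last_Write, sets the writer's bit, and hands back the writer's row, which the processor installs as its new Valid array.

```java
// Assumed illustration of the SMem data structures from the Zhou/Mizuno/Singh protocol.
public class SMem {
    private final int[] memory;              // M[0..m-1] (0-indexed here)
    private final boolean[][] holdLastWrite; // Hold_Last_Write[processor][object]

    public SMem(int processors, int objects) {
        memory = new int[objects];
        holdLastWrite = new boolean[processors][objects];
    }

    /** [write, x, v] from processor i: returns i's row (its new Invalid_array). */
    public synchronized boolean[] write(int i, int x, int v) {
        memory[x] = v;
        for (boolean[] row : holdLastWrite) row[x] = false; // nobody holds the last write...
        holdLastWrite[i][x] = true;                         // ...except the writer i
        return holdLastWrite[i].clone();
    }

    /** [read, x] from processor i: returns the value plus i's row. */
    public synchronized Object[] read(int i, int x) {
        holdLastWrite[i][x] = true;     // i's cache will now hold the last-written value
        return new Object[] { memory[x], holdLastWrite[i].clone() };
    }
}
```

The processor side then simply assigns the returned row to Validi and caches the value, exactly as in the pseudocode above.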
Advantages of the SC-DSM Protocol by J. Zhou, M. Mizuno, and G. Singh
• The number of messages exchanged for read and write operations is the same, and is considerably smaller.
  - A write operation requires one round of message exchange between the processor and the shared memory.
  - A read operation also requires one round of message exchange between the processor and the shared memory if the object is not found in the local cache.
• The protocol does not require an atomic broadcast; in fact, it does not require any broadcast of messages.
Release Consistency
• Sequential and PRAM consistency are restrictive, for example when a process is reading or writing some variables inside a critical section.
• Drawback: the memory has no way to differentiate between entering and leaving a critical section.
• So release consistency was introduced.
Release Consistency
• Three classes of variables:
  - Ordinary variables
  - Shared data variables
  - Synchronization variables: acquire and release (critical section)
• The DSM has to guarantee the consistency of the shared data variables. If a shared variable is read without an acquire, the memory has no obligation to return the current value.
Protected Variables
• Acquire and release do not have to apply to all of the memory.
• Only specific shared variables may be guarded, in which case these variables are kept consistent and are called protected variables.
• On an acquire, the memory makes sure that all local copies of the protected variables are made consistent; on a release, the changes are propagated to the other machines.
P1: Acq(L) W(x)1 W(x)2 Rel(L)
P2:                            Acq(L) R(x)2 Rel(L)
P3:                            R(x)1
Fig: A valid event sequence for release consistency.
Rules for release consistency
• Before an ordinary access to a shared variable is performed, all previous acquires done by the process must have completed successfully.
• Before a release is allowed to be performed, all previous reads and writes done by the process must have completed.
• The acquire and release accesses must be processor consistent (sequential consistency is not required).
Implementation of Release Consistency
• Two types of implementation:
  - Eager release consistency: the modified data are broadcast to all other processors at the time of the release.
  - Lazy release consistency: a process gets the most recent values of the variables when it tries to acquire them.
Our Implementation
• Eager release consistency.
• All operations are done locally by the process and then sent to the DSM, which broadcasts the updated values to all the other processes.
Data Structures
Each process Pi maintains the following data structures:
  cache[1..n]     // cache memory
  valid[1..n]     // whether the value in the cache is valid (0/1)
  locked[1..n]    // whether the variable is locked (0/1)
  request[1..m]   // which variables it wants to lock
The DSM maintains the following data structures:
  M[1..n]           // central memory
  lock[1..n]        // which variables are locked (0/1)
  whom[1..n]        // locked by which processor
  pending[1..m]     // processes that are yet to be replied to
  invalidate[1..m]  // values the processes need to invalidate
Operations at Processor Pi

lock(list of variables):
  send(Pid, ACQUIRE, no_of_variables, request[1..m]);
  receive(ACK, received_values[]);
  for each variable j in the request list
    locked[j] = 1;

read(i):
  if locked[i] return cache[i];
  else if valid[i] return cache[i];
  else
    send(Pid, READ, i);
    receive(x);
    cache[i] = x;
    valid[i] = 1;
    return cache[i];
Operations at Processor Pi

write(i, x):
  if locked[i] {
    cache[i] = x;
    valid[i] = 1;
  } else {
    send(Pid, WRITE, i, x);
    cache[i] = x;
    valid[i] = 1;
  }

unlock(list of variables):
  send(Pid, RELEASE, locked[1..m], cache[1..m]);
  receive(ACK);
  for each variable j in the list
    locked[j] = 0;
Operations at DSM

receive() {
  switch(message) {
    case READ:
      send(M[i]);
      break;
    case WRITE:
      M[i] = x;
      break;
    case ACQUIRE:
      /* for all the variable indices in request[1..m], check in lock[]
         whether they are free */
      all_free = true;
      for i = 0 to no_of_variables - 1 {
        if (lock[request[i]] == 0) {
          lock[request[i]] = 1;
          whom[request[i]] = Pid;
          requested_variable_values[i] = M[request[i]];
        } else {
          all_free = false;
          break;
        }
      }
      if (all_free)
        send(requested_variable_values[]);
      else {
        /* undo the locks taken so far and queue the request */
        for i = 0 to no_of_variables - 1 {
          lock[request[i]] = 0;
          whom[request[i]] = 0;
          pending[Pid, i] = request[i];   /* add request[i] to pending[] */
        }
      }
      break;
    case RELEASE:
      /* the message carries the arrays locked[] and cache[] */
      for i = 0 to no_of_variables - 1 {
        M[locked[i]] = cache[i];
        invalidate[i] = locked[i];
      }
      broadcast(invalidate[]);
      receive(ACK);
      for i = 0 to no_of_variables - 1 {
        lock[locked[i]] = 0;
        whom[locked[i]] = 0;
      }
      send(Pid, ACK);
      check(pending[]);
      break;
  }
}

check() {
  for i = 0 to n {
    /* if all variables requested by a pending process are now free,
       send(ACK, Pid) to that process */
  }
}
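To make the flow above concrete, here is a single-JVM sketch of the eager-release manager (an assumed illustration with our own names, not the project's code: message passing is replaced by synchronized method calls, the broadcast by a listener callback, and blocking in acquire stands in for the pending[] queue).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Assumed single-JVM sketch of the eager-release scheme described above.
public class EagerReleaseDsm {
    private final int[] memory;    // M[0..n-1]
    private final int[] lockedBy;  // -1 = free, otherwise the holding processor's id
    private final List<BiConsumer<Integer, Integer>> invalidationListeners = new ArrayList<>();

    public EagerReleaseDsm(int n) {
        memory = new int[n];
        lockedBy = new int[n];
        Arrays.fill(lockedBy, -1);
    }

    /** A processor registers to receive (variable, value) invalidation broadcasts. */
    public synchronized void register(BiConsumer<Integer, Integer> listener) {
        invalidationListeners.add(listener);
    }

    /** ACQUIRE: blocks until every requested variable is free, then returns their values. */
    public synchronized int[] acquire(int pid, int[] vars) throws InterruptedException {
        while (!allFree(vars)) wait();          // stands in for the pending[] queue
        int[] values = new int[vars.length];
        for (int k = 0; k < vars.length; k++) {
            lockedBy[vars[k]] = pid;
            values[k] = memory[vars[k]];
        }
        return values;
    }

    /** RELEASE: write back the cached values, broadcast them, free the locks. */
    public synchronized void release(int pid, int[] vars, int[] values) {
        for (int k = 0; k < vars.length; k++) {
            memory[vars[k]] = values[k];
            for (BiConsumer<Integer, Integer> l : invalidationListeners)
                l.accept(vars[k], values[k]);   // eager push of the new value
            lockedBy[vars[k]] = -1;
        }
        notifyAll();                            // wake any pending acquirers
    }

    public synchronized int read(int var) { return memory[var]; }
    public synchronized void write(int var, int value) { memory[var] = value; }

    private boolean allFree(int[] vars) {
        for (int v : vars) if (lockedBy[v] != -1) return false;
        return true;
    }
}
```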
Sample Execution

Code for P1:
  Lock(a, b, c);
  Write(a);
  Read(b);
  Write(c);
  Write(c);
  Unlock(a, b, c);

Messages between P1, the DSM, and P2:
  P1  -> DSM : (ACQUIRE, request[])
  DSM -> P1  : (ACK, values[])
  P1 enters the CS: Write(a), Read(b), Write(c), Write(b)
  P1  -> DSM : (RELEASE, locked[], cache[])
  DSM -> P2  : BROADCAST
  P2  -> DSM : ACK
  DSM -> P1  : RELEASE_ACK
  P1 leaves the CS
Performance Issues
• Knowing the execution history
• Broadcast overhead can be reduced
• No potential deadlocks
• Operations inside the critical section are atomic
PRAM Consistency
• The total ordering of requests leads to inefficiency, due to more data movement and synchronization than a program may really call for.
• PRAM is a more relaxed model than sequential consistency.
PRAM (contd.)
• PRAM stands for Pipelined RAM, thus pipelined random access:
"Writes done by a single process are received by all the processes in the order in which they were issued, but writes from different processes may be seen in a different order by different processes."
Example
P1: W(x)1
P2:        R(x)1  W(x)2
P3:               R(x)1  R(x)2
P4:               R(x)2  R(x)1
Fig: A valid sequence of events for PRAM consistency.
Weak Restrictions
• Only write operations performed by a single process are required to be viewed by other processes in the order in which they were performed.
• In other terms, all writes generated by different processes are concurrent.
• Only the write order from the same process needs to be consistent, hence the name pipelined.
• This is a weaker model than the causal model.
System Architecture
[Figure: processes with local caches connected through the middleware (JavaGroups) to the DSM system and its central memory]
Implementation
The operations by the processes are carried out as shown below:
Write(x): update the local cache value and send the updated value to all the processes.
Read(x): if x is present in the cache, read it from the cache; else go to main memory for the variable.
continued…
• Whenever a write is carried out, the value is pushed to all the processes; thus writes done by a process are always seen in the order in which they appear in its program, since each write is broadcast as soon as it occurs.
Data Structures
• Central Memory (CM):
  - An array CM[] of shared variables var1..var2.
  - We can do read and write operations on this array.
  - Implemented using a Vector.
• Local Cache:
  - An array C[] of type int, with the same size as the central memory.
  - A boolean array V[] recording whether the i-th variable is valid.
  - We can do read and write operations on the cache.
  - Both arrays are implemented using Vectors.
Pseudo Code
At process n:

Read(in):
  if (valid(in))
    fetch element in from the cache vector Vc
  else
    send read(in, n) to CM
    receive value(in, n) from CM
    update element in in the cache
    set valid(in) = true
  return value(in);
Continued…

Write(in, valn):
  write value valn into element in of the cache vector
  send write(in, valn) to CM

Receive(in, valn):
  write value valn into element in of the cache vector
At Central Memory

Write(index in, value vn):
  write value vn into element in of the vector
  send (in, vn) to all the n processes

Read(process n, index in):
  fetch element in from the vector
  send value(in) to process n
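A single-JVM sketch of this PRAM scheme (an assumed illustration with our own names, not the project's code): the broadcast is modeled as one FIFO update queue per process, which is what gives the per-writer pipelined ordering while still letting different processes interleave different writers differently.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Assumed single-JVM sketch: the central memory pushes every write to a FIFO
// queue per process, so each process applies a given writer's updates in order.
public class PramDsm {
    private final int[] central;                        // CM[]
    private final BlockingQueue<int[]>[] updateQueues;  // one FIFO per process: {index, value}

    @SuppressWarnings("unchecked")
    public PramDsm(int numVars, int numProcs) {
        central = new int[numVars];
        updateQueues = new BlockingQueue[numProcs];
        for (int p = 0; p < numProcs; p++) updateQueues[p] = new LinkedBlockingQueue<>();
    }

    /** Write at the central memory: update CM and push (in, vn) to every process. */
    public synchronized void write(int index, int value) {
        central[index] = value;
        for (BlockingQueue<int[]> q : updateQueues) q.add(new int[] { index, value });
    }

    /** Read at the central memory, used on a cache miss. */
    public synchronized int read(int index) { return central[index]; }

    /** Per-process cache that applies queued updates in FIFO order. */
    public class ProcessCache {
        private final int pid;
        private final int[] cache = new int[central.length];
        private final boolean[] valid = new boolean[central.length];

        public ProcessCache(int pid) { this.pid = pid; }

        public void write(int index, int value) {
            cache[index] = value;              // update the local cache first...
            valid[index] = true;
            PramDsm.this.write(index, value);  // ...then send to CM, which broadcasts
        }

        public int read(int index) {
            applyPendingUpdates();
            if (!valid[index]) {               // miss: fetch from the central memory
                cache[index] = PramDsm.this.read(index);
                valid[index] = true;
            }
            return cache[index];
        }

        private void applyPendingUpdates() {
            int[] u;
            while ((u = updateQueues[pid].poll()) != null) {
                cache[u[0]] = u[1];
                valid[u[0]] = true;
            }
        }
    }
}
```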
Issues
• Easy to implement.
• No guarantee about the order in which different processes see writes.
  - Except that writes issued by a particular process must arrive in pipelined (FIFO) order.
• A processor does not have to stall waiting for each write to complete before starting the next one.
Final System
• We use JavaGroups as the middleware.
• We have a single group containing all the processes and the central DSM.
• We use the reliable, FIFO JChannel for communication between the processes and the DSM.
• We have only two types of communication, unicast and broadcast, both of which are provided efficiently by JChannel.
DSM initialization: the DSM is given an argument saying which consistency level it should provide for the processes.
Process initialization: when a process starts execution, it:
- sends a message to the DSM inquiring about the consistency level the DSM provides
- waits for the response
- initializes the variables related to that consistency level, so as to use the corresponding library for communicating with the DSM.
In order to connect to the system, each process only needs to know:
• the group address/group name
• the central DSM address
This makes the system scalable and easy to connect to (just one round of messages), and it keeps the load on the network low.
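A minimal sketch of how a process might join the group and exchange messages over a JChannel. It assumes a JGroups 3.x-style API (no-arg JChannel, connect, send, ReceiverAdapter), which differs in detail from the original JavaGroups releases; the group name, payload strings, and the way the DSM address is obtained are all illustrative.

```java
import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

// Assumed sketch (JGroups 3.x-style API): joining the DSM group and exchanging
// unicast/broadcast messages. Group name and payload format are made up here.
public class DsmProcess extends ReceiverAdapter {
    private JChannel channel;

    public void start() throws Exception {
        channel = new JChannel();          // default protocol stack
        channel.setReceiver(this);         // deliver incoming messages to receive()
        channel.connect("CIS825-DSM");     // single group for all processes and the DSM

        // Broadcast (dest = null): e.g., an updated value pushed to everyone.
        channel.send(new Message(null, "WRITE x 42"));

        // Unicast to the DSM: here we just pick the first group member as an
        // example; the real system would know the central DSM's address.
        channel.send(new Message(channel.getView().getMembers().get(0), "READ x"));
    }

    @Override
    public void receive(Message msg) {
        // Apply updates/invalidations pushed by the DSM.
        System.out.println("from " + msg.getSrc() + ": " + msg.getObject());
    }

    public void stop() { channel.close(); }
}
```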
Performance Evaluation
• We plan to test the performance of each consistency level with a large number of processes accessing the shared memory.
• We will measure the write-cycle time and read-cycle time for each consistency level at the application level.
• We will compare our implementations of the consistency levels using the above criteria.
References
• G. Brown. Asynchronous multicaches. Distributed Computing, 4:31-36, 1990.
• M. Mizuno, M. Raynal, and J. Z. Zhou. Sequential consistency in distributed systems.
• J. Zhou, M. Mizuno, and G. Singh. A Sequentially Consistent Distributed Shared Memory.
• A. S. Tanenbaum. Distributed Operating Systems.
• www.javagroups.com
• www.cis.ksu.edu/~singh