Distributed Shared Memory Chapter 9
Introduction • Distributed Shared Memory • Making the main memory of a cluster of computers look as though it is a single memory with a single address space. • Then you can use shared memory programming techniques.
1. Distributed Shared Memory • Messages or other mechanisms are still needed to move data to the processors, but these are hidden from the programmer:
1. Distributed Shared Memory • Advantages of DSM • The system is scalable • Hides the message passing - the programmer does not explicitly specify sending messages between processes • Can use simple extensions to sequential programming • Can handle complex and large databases without replication or sending the data to processes
1. Distributed Shared Memory • Disadvantages of DSM • May incur a performance penalty • Must provide protection against simultaneous access to shared data (locks, etc.) • Little programmer control over the actual messages being generated • Good performance may be difficult to achieve, for irregular problems in particular
2. Implementing Distributed Shared Memory • Hardware • Special network interfaces and cache coherence circuits • Software • Modifying the OS kernel • Adding a software layer between the operating system and the application - most convenient way for teaching purposes
2.1 Software DSM Systems • Page based • Using the system’s virtual memory • Shared variable approach • Using routines to access shared variables • Object based • Shared data held within a collection of objects • Access to shared data through an object-oriented discipline (ideally)
2.1 Software DSM Systems • Software Page Based DSM Implementation
2.1 Software DSM Systems • Some Software DSM Systems • Treadmarks • Page-based DSM system • Apparently no longer available • JIAJIA • C based • Some users have said it required significant modifications to work (in its message-passing calls) • Adsmith • Object based • C++ library routines
2.2 Hardware DSM Implementation • Special network interfaces and cache coherence circuits are added to the system to make a memory reference to a remote location look like a reference to a local memory location • Typical examples are in clusters of SMP machines
2.3 Managing Shared Data • There are several ways that a processor could be given access to shared data • The simplest solution is to have a central server responsible for all read and write routines. • Problem – the server is the bottleneck • Solution?
2.3 Managing Shared Data • Multiple copies of data. • Allows simultaneous access by different processors. • How do you maintain these copies using a coherence policy? • One option is Multiple Reader/Single Writer • When the writer changes the data, there are two choices • Update policy • Invalidate policy
2.3 Managing Shared Data • Another option is Multiple Reader / Multiple Writer • This is the most complex scenario.
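The update/invalidate choice above can be sketched as a toy single-process model. This is only an illustration of the two policies, not a real DSM layer: all names (`MRSWVariable`, `Copy`, `master`) are invented, and a real system would exchange messages instead of touching a shared vector.

```cpp
#include <cassert>
#include <vector>

// Toy model of Multiple Reader / Single Writer coherence.
// Each "processor" holds a cached copy with a valid flag.
struct Copy { int value; bool valid; };

struct MRSWVariable {
    std::vector<Copy> copies;   // one cached copy per processor
    int master = 0;             // authoritative value at the owner

    explicit MRSWVariable(int nprocs) : copies(nprocs, Copy{0, true}) {}

    // Update policy: the writer pushes the new value to every copy.
    void write_update(int value) {
        master = value;
        for (auto &c : copies) { c.value = value; c.valid = true; }
    }
    // Invalidate policy: the writer marks all other copies stale.
    void write_invalidate(int value) {
        master = value;
        for (auto &c : copies) c.valid = false;
    }
    // A read of a stale copy re-fetches from the owner.
    int read(int proc) {
        if (!copies[proc].valid) copies[proc] = Copy{master, true};
        return copies[proc].value;
    }
};
```

The trade-off is visible in the sketch: update pays one message per copy on every write, while invalidate pays one message per copy only when a stale copy is actually read again.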
2.4 Multiple Reader/Single Writer Policy in a Page-Based System • Remember that a page holds more than one variable location. • Problem… • Processors A and B can change different variables in the same page • How do you deal with the consistency?
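The problem above ("false sharing") can be demonstrated directly: two unrelated variables normally land on the same virtual page, so a write to one forces coherence traffic for readers of the other. The 4 KB page size and all names here are illustrative assumptions.

```cpp
#include <cstdint>

// Map an address to its page number, assuming a 4 KB page size.
constexpr std::uintptr_t PAGE_SIZE = 4096;

std::uintptr_t page_of(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) / PAGE_SIZE;
}

// alignas(64) keeps this 8-byte struct inside one 64-byte region, which
// can never straddle a 4096-byte page boundary, so x and y are
// guaranteed to share a page here.
struct alignas(64) SharedPageData {
    int x;  // written by processor A
    int y;  // read by processor B
};
```

Because the coherence unit is the whole page, A's writes to `x` invalidate (or force updates of) B's copy of the page even though B only ever reads `y`.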
3. Achieving Consistent Memory in a DSM System • The term memory consistency model refers to when an update to a shared variable is seen by other processors. • There are various models with decreasing constraints, which provide higher performance.
3. Achieving Consistent Memory in a DSM System • Consistency Models • Strict Consistency - Every processor sees the most recent update, • i.e. a read returns the most recent write to the location. • Sequential Consistency - The result of any execution is the same as some interleaving of the individual programs. • Relaxed Consistency - Delay making writes visible to reduce messages.
3. Achieving Consistent Memory in a DSM System • Consistency Models (cont) • Weak consistency - programmer must use synchronization operations to enforce sequential consistency when necessary. • Release Consistency - programmer must use specific synchronization operators, acquire and release. • Lazy Release Consistency - update only done at time of acquire.
3. Achieving Consistent Memory in a DSM System • Strict Consistency • Every write is immediately visible • Disadvantages: number of messages, latency, and many updates may be unnecessary.
3. Achieving Consistent Memory in a DSM System • Release Consistency • An extension of weak consistency in which the synchronization operations have been specified: • acquire operation - used before a shared variable or variables are to be read. • release operation - used after the shared variable or variables have been altered (written); allows another process to access the variable(s) • Typically acquire is done with a lock operation and release with an unlock operation (although not necessarily).
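Release consistency can be sketched as a toy single-process model in which writes stay in a local buffer until `release()` publishes them, and `acquire()` pulls the latest published state. All names (`Process`, `main_memory`) are invented for illustration; a real DSM would propagate the updates in messages.

```cpp
#include <map>
#include <string>

// The released, globally visible state (stands in for "main memory").
std::map<std::string, int> main_memory;

struct Process {
    std::map<std::string, int> local;   // this process's private view

    void acquire() { local = main_memory; }            // pull released updates
    void write(const std::string &k, int v) { local[k] = v; } // buffered write
    int  read(const std::string &k) { return local[k]; }
    void release() { main_memory = local; }            // make writes visible
};
```

Note what the model allows: a process that reads without a fresh `acquire()` may see stale data, and writes made between `acquire()` and `release()` are invisible to everyone else until the `release()`. Lazy release consistency goes further and defers the transfer itself until the next `acquire()`.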
3. Achieving Consistent Memory in a DSM System • Release Consistency • Arrows show messages
3. Achieving Consistent Memory in a DSM System • Lazy Release Consistency • Messages only on acquire • Advantages: Fewer Messages
4. Distributed Shared Memory Programming Primitives • 4 fundamental primitives • Process/thread creation (and termination) • Shared-data creation • Mutual-exclusion synchronization (controlled access to shared data) • Process/thread and event synchronization.
4.1 Process Creation • Simple routines such as: • dsm_spawn(filename, num_processes); • dsm_wait();
4.2 Shared Data Creation • Routines to construct shared data • dsm_malloc(); • dsm_free();
4.3 Shared Data Access • In a DSM system employing a relaxed-consistency model (most DSM systems) • dsm_lock(lock1); • dsm_refresh(sum); • (*sum)++; • dsm_flush(sum); • dsm_unlock(lock1);
4.4 Synchronization Access • Two types must be provided • Global Synchronization • Process-Pair Synchronization • Typically done with barrier identifiers • dsm_barrier(identifier);
4.5 Features to Improve Performance • Overlapping computation with communication. • dsm_prefetch(sum); • This tells the system to get the variable because we will need it soon. • Similar to speculative loads used in processor/cache algorithms.
4.5 Features to Improve Performance • Reducing the Number of Messages • Old way: dsm_lock(lock1); dsm_refresh(sum); (*sum)++; dsm_flush(sum); dsm_unlock(lock1); • New way: dsm_acquire(sum); (*sum)++; dsm_release(sum);
5. Distributed Shared Memory Programming • Uses the same concepts as shared memory programming. • Message passing occurs that is hidden from the user, and thus there are additional inefficiency considerations. • You may have to modify the code to request variables before you need them. • It would be nice if the compiler could do this automatically
6. Implementing a Simple DSM System • It is relatively straightforward to write your own simple DSM system. • In this section we will review how this can be done.
6.1 User Interface Using Classes and Methods. • Using the OO methodology of C++ we might have a wrapper class SharedInteger sum; • This class could wrap an integer value to provide the following methods. sum.lock() sum.refresh() sum++ sum.flush() sum.unlock()
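A minimal sketch of such a SharedInteger wrapper is below, with the underlying DSM layer reduced to a single local "home" copy so the class is self-contained. The method names match the slide; all internals (`home`, `local`, `value()`) are invented for illustration, and a real implementation would send messages in `refresh()` and `flush()`.

```cpp
// Toy SharedInteger: lock/refresh/++/flush/unlock over a local stand-in
// for the variable's remote home copy.
class SharedInteger {
    int home  = 0;   // stands in for the remote home copy
    int local = 0;   // this process's working copy
public:
    void lock()    { /* would acquire the variable's DSM lock */ }
    void refresh() { local = home; }   // fetch the current value
    void operator++(int) { local++; }  // sum++ updates the local copy
    void flush()   { home = local; }   // write the value back
    void unlock()  { /* would release the DSM lock */ }
    int value() const { return home; } // helper for inspecting the result
};
```

Usage follows the slide's sequence exactly: `sum.lock(); sum.refresh(); sum++; sum.flush(); sum.unlock();`.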
6.2 Basic Shared-Variable Implementation • Option 1: Centralized Server
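Option 1 can be sketched as follows: every read and write of shared data is forwarded to one central server, which is exactly why that server becomes the bottleneck. The request-forwarding structure stands in for real message passing, and all names are invented.

```cpp
#include <map>
#include <string>

// The central server holds the only copy of all shared variables and
// services every read and write request.
class CentralServer {
    std::map<std::string, int> store;
public:
    void handle_write(const std::string &name, int v) { store[name] = v; }
    int  handle_read(const std::string &name) { return store[name]; }
};

// A client owns no shared data; it forwards every access to the server
// (in a real system, as a request message plus a reply).
struct Client {
    CentralServer &server;
    void write(const std::string &n, int v) { server.handle_write(n, v); }
    int  read(const std::string &n) { return server.handle_read(n); }
};
```

Because there is a single copy, coherence is trivial, but every access costs a round trip to one process; Options 2-4 trade that simplicity for distribution of the load.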
6.2 Basic Shared-Variable Implementation • Option 2: Multiple Servers
6.2 Basic Shared-Variable Implementation • Option 3: Multiple Readers (1 owner)
6.2 Basic Shared-Variable Implementation • Option 4: Migrating the owner
DSM Projects • Write a DSM system in C++ using MPI for the underlying message passing and process communication. • (More advanced) One of the fundamental disadvantages of a software DSM system is the lack of control over the underlying message passing. Provide parameters in the DSM routines to control the message passing. Write routines that allow communication and computation to be overlapped.