Distributed Shared Memory Systems and Programming. By: Kenzie MacNeil. Adapted from Parallel Programming Techniques and Applications Using Networked Workstations and Parallel Computers by Barry Wilkinson and Michael Allen.
Distributed Shared Memory Systems • Shared memory programming model on a cluster • Memory is physically distributed and separate • Programming viewpoint: • Memory is grouped together and sharable between processes • Known as Distributed Shared Memory (DSM)
Distributed Shared Memory Systems • Can be achieved by software or hardware • Software: • Easy to use on clusters • Inferior in performance to explicit message passing on the same cluster • Utilizes the same techniques as true shared memory systems (Chapter 8)
Distributed Shared Memory • Shared memory programming is generally more convenient than message passing • Data can be accessed by individual processors without explicitly sending it • Access to shared data has to be controlled • Locks or other means • Both message passing and shared memory often require synchronization
Distributed Shared Memory • Distributed Shared Memory is a group of interconnected computers appearing to have a single memory with a single address space • Each computer has its own memory, which is physically distributed • Any memory location can be accessed by any processor in the cluster • Regardless of whether the memory resides locally
Advantages of DSM • Normal shared memory programming techniques can be used • Easily scalable, compared to traditional bus-connected shared memory multiprocessors • Message passing is hidden from the user • Can handle complex and large databases without replicating or sending the data to processes
Disadvantages of DSM • Lower performance than true shared memory multiprocessor systems • Must provide protection against simultaneous access to shared data • Locks, etc. • Little programmer control over the actual messages being generated • Incurs performance penalties compared to message passing routines on a cluster
Hardware DSM Systems • Special network interfaces and cache coherence circuits are required • Several interfaces that support shared memory operations • Higher level of performance • More expensive
Software DSM Systems • Requires no hardware changes • Performed by software routines • Software layer added between the operating system and the applications • Kernel may or may not be modified • Software layer can be • Page based • Shared variable based • Object based
Page Based DSM • Existing virtual memory is used to instigate movement of data between computers • Occurs when the page referenced does not reside locally • Referred to as a virtual shared memory system • Page based systems include: • The first DSM system by Li (1986), TreadMarks (1996), Locust (1998)
Page Based DSM Disadvantages • Size of the unit of data, a page, can be too big • More than the specific data is usually referenced • Leads to longer messages • Not portable, because they are tied to particular virtual memory hardware and software • False sharing effects appear at the page level • Different parts of a page are required by different processors without any actual sharing of data, yet the whole page must be moved between the processors to access the different parts
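The false-sharing bullet above can be made concrete with a toy simulation (all names invented, not part of any real DSM system): two processors write to disjoint variables that happen to lie on the same page, and a single-writer protocol forces the page to migrate on every write even though no data is actually shared.

```python
# Hypothetical simulation of false sharing in a page-based DSM.
# Variables x and y happen to live on the same 4 KB page; processor A
# only writes x and processor B only writes y, yet every write forces
# the whole page to move to the writer.

PAGE_SIZE = 4096

def page_of(addr):
    """Map a byte address to its page number."""
    return addr // PAGE_SIZE

def simulate(writes):
    """Count page migrations for (processor, address) writes under a
    single-writer protocol: a write by a processor that does not
    currently own the page transfers the page to it."""
    owner = {}        # page number -> owning processor
    transfers = 0
    for proc, addr in writes:
        page = page_of(addr)
        if owner.get(page) != proc:
            transfers += 1          # page must move to the new writer
            owner[page] = proc
    return transfers

# x at address 0, y at address 8: same page, no data actually shared.
writes = [("A", 0), ("B", 8)] * 4
transfers = simulate(writes)        # every single write migrates the page
```

If both variables were written by the same processor, or placed on different pages, the page would move at most once per processor.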
Shared Variable DSM • Only variables declared as shared are transferred • Transferred on demand • Paging mechanism is not used • Software routines perform the actions • Shared variable DSM approach includes: • Munin (1990), JIAJIA (1999), Adsmith (1996)
Object Based DSM • Shared data is embodied in objects • Includes data items and procedures/methods • Methods used to access the data • Similar to the shared variable approach, even considered an extension of it • Easily implemented in OO languages
Managing Shared Data • There are many ways a processor can be given access to shared data • Simplest is the use of a central server • Responsible for all read/write operations on the shared data • Requests are sent to this server • Operations occur sequentially on the server • Implements a single reader/single writer policy
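The central-server scheme above can be sketched as follows (a minimal toy, with invented names; threads and a queue stand in for processors and messages): all reads and writes are packaged as request messages to one server thread, which applies them one at a time, so access is serialized.

```python
import queue
import threading

class CentralServer:
    """Toy centralized DSM server: one thread owns all shared
    variables and serves read/write requests sequentially."""

    def __init__(self):
        self.store = {}                  # the shared variables
        self.requests = queue.Queue()    # incoming read/write messages
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:                      # apply one request at a time
            op, name, value, reply = self.requests.get()
            if op == "write":
                self.store[name] = value
                reply.put(None)          # acknowledge the write
            else:                        # "read"
                reply.put(self.store.get(name))

    # Client-side stubs: package the operation as a message and wait.
    def write(self, name, value):
        reply = queue.Queue()
        self.requests.put(("write", name, value, reply))
        reply.get()                      # block until acknowledged

    def read(self, name):
        reply = queue.Queue()
        self.requests.put(("read", name, None, reply))
        return reply.get()

server = CentralServer()
server.write("x", 42)
```

Because a single queue feeds a single serving thread, no two operations ever overlap, which is exactly the single reader/single writer behaviour (and the bottleneck) described next.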
Managing Shared Data • A single reader/single writer policy incurs a bottleneck • Additional servers can be added to relieve this bottleneck by partitioning the shared variables among them • However, multiple copies of the data are preferable • Allows simultaneous access to the data by different processors • A coherence policy must be used to maintain these copies
Multiple Reader / Single Writer • Allows multiple processors to read shared data • Achieved by replicating the data • Allows only one processor, the owner, to alter the data at any instant • When the owner alters the data, two policies are available: • Update policy • Invalidate policy
Multiple Reader/Single Writer Policy • Update policy • Utilizes broadcast • All copies are altered to reflect the broadcast message • Invalidate policy • All unaltered copies of the data are flagged as invalid • Requires a processor holding an invalid copy to request the data from the owner • Any copies of the data that are not accessed remain invalid • Both policies require the update or invalidate messages to be delivered reliably
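The invalidate policy above can be sketched in a few lines (a toy model with invented names, not any real DSM protocol): each processor keeps a local copy with a valid flag; a write by the owner marks every other copy invalid, and an invalidated reader re-fetches the value from the owner on its next access.

```python
class InvalidateVar:
    """Toy multiple reader / single writer variable using the
    invalidate policy: only the owner may write."""

    def __init__(self, value, owner):
        self.value = value               # the owner's master copy
        self.owner = owner
        self.copies = {}                 # proc -> (value, valid?)

    def read(self, proc):
        if proc == self.owner:
            return self.value
        copy = self.copies.get(proc)
        if copy is None or not copy[1]:  # no copy, or invalidated
            self.copies[proc] = (self.value, True)   # fetch from owner
        return self.copies[proc][0]

    def write(self, proc, value):
        assert proc == self.owner, "only the owner may write"
        self.value = value
        # Flag all unaltered copies invalid instead of updating them.
        self.copies = {p: (v, False) for p, (v, _) in self.copies.items()}

x = InvalidateVar(0, owner="P0")
a = x.read("P1")        # P1 caches the value 0
x.write("P0", 7)        # P1's copy is now flagged invalid
b = x.read("P1")        # re-fetch from the owner: sees 7
```

The update policy would instead push the new value into every entry of `copies` on each write, trading more write traffic for cheaper reads.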
Multiple Reader/Single Writer Policy • Page based approach • The complete page, which holds the variable, is transferred • A variable stored on a page that is not itself shared will still be moved or invalidated • Systems such as TreadMarks offer protocols that allow two processors to write to different parts of a single page
Achieving Consistent Memory in DSM • Memory consistency addresses when a new value of a shared variable becomes visible to other processors • Various models are available: • Strict Consistency • Sequential Consistency • Relaxed Consistency • Weak Consistency • Release Consistency • Lazy Release Consistency
Strict Consistency • A read returns the value of the most recent write to the shared variable • As soon as a variable is altered, all other processors are informed • Can be done by update or invalidate • Disadvantages are the large number of messages and that changes are not instantaneous • With relaxed memory consistency, writes are delayed to reduce message passing
Sequential and Weak Consistency • Sequential consistency: the result of any execution is the same as some interleaving of the individual programs • Weak consistency: synchronization operations are used by the programmer to enforce sequential consistency • Any access to shared data can be controlled with synchronization operations • Locks, etc.
Release Consistency • Extension of weak consistency • Uses specific synchronization operations • Acquire operation, used before a shared variable or variables are to be read • Release operation, used after the shared variable or variables have been altered • Acquire is performed with a lock operation • Release is performed with an unlock operation
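A minimal sketch of the acquire/release idea, assuming invented names and using a Python lock for the acquire/release pair: writes go into a local buffer and are published to the shared store only at the release, so a reader who acquires afterwards sees all of them at once.

```python
import threading

class ReleaseConsistentRegion:
    """Toy release-consistency region: buffered writes become
    globally visible only at release time."""

    def __init__(self):
        self.shared = {}             # the globally visible values
        self.lock = threading.Lock()
        self.local = {}              # per-acquire write buffer

    def acquire(self):
        self.lock.acquire()          # acquire before reading/writing
        self.local = {}

    def write(self, name, value):
        self.local[name] = value     # buffered, not yet visible

    def read(self, name):
        # Inside the region, our own buffered writes take precedence.
        return self.local.get(name, self.shared.get(name))

    def release(self):
        self.shared.update(self.local)   # publish all buffered writes
        self.lock.release()

region = ReleaseConsistentRegion()
region.acquire()
region.write("x", 1)
region.write("y", 2)
region.release()                     # both writes become visible here
```

Under lazy release consistency (next slide), the `shared.update` would be deferred further, to the moment another processor performs its acquire.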
Lazy Release Consistency • Version of release consistency • The update is only done at the time of an acquire rather than at the release • Generates fewer messages than release consistency
Distributed Shared Memory Programming Primitives • Four fundamental and necessary operations of shared memory programming: • Process/thread creation and termination • Shared data creation • Mutual exclusion synchronization, controlled access to shared data • Process/thread and event synchronization • Typically provided by user-level library calls
Process Creation • A set of routines is defined by DSM systems • Such as Adsmith and TreadMarks • Used to start new processes if process creation is supported • dsm_spawn(filename, num_processes);
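A hypothetical stand-in for a `dsm_spawn()`-style call, using Python threads in place of DSM processes (the name and signature are borrowed from the slide, the implementation is invented): start `num_processes` workers running the same function, each with its own rank, and wait for all to finish.

```python
import threading

def dsm_spawn(worker, num_processes):
    """Toy analogue of dsm_spawn(filename, num_processes): run the
    same worker in num_processes threads, one rank each, and join."""
    threads = [threading.Thread(target=worker, args=(rank,))
               for rank in range(num_processes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                     # wait for every "process" to end

started = []

def worker(rank):
    started.append(rank)             # each "process" records its rank

dsm_spawn(worker, 4)
```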
Shared Data Creation • A routine is necessary to declare shared data • dsm_shared(&x); or shared int x; • Dynamically creates memory space for shared data in the manner of a C malloc • Afterwards, the memory space can be discarded
Shared Data Access • Various forms of data access are provided depending on the memory consistency used • Some systems provide efficient routines for different classes of accesses • Adsmith provides three types of accesses: • Ordinary Access • Synchronization Access • Non-Synchronization Access
Synchronization Accesses • Two principal forms: • Global synchronization and process-process pair synchronization • Global synchronization is usually done through barrier routines • Process-process pair synchronization can be done by the same routine or by separate routines built on simple synchronous send/receive routines • DSM systems could also provide their own routines
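Global synchronization via a barrier can be demonstrated directly; here `threading.Barrier` stands in for a DSM barrier routine (the worker layout is invented): no worker proceeds past the barrier until every worker has reached it, so all phase-1 results exist before any phase-2 work begins.

```python
import threading

N = 4
barrier = threading.Barrier(N)   # stands in for a DSM barrier routine
before, after = [], []

def worker(rank):
    before.append(rank)          # phase 1 work
    barrier.wait()               # block until all N workers arrive
    after.append(rank)           # phase 2: all phase-1 results exist

threads = [threading.Thread(target=worker, args=(r,)) for r in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Process-process pair synchronization would instead use a rendezvous between exactly two workers, e.g. a synchronous send matched with a receive.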
Overlapping Computations with Communications • Can be provided by starting a nonblocking communication before its results are needed • Called a prefetch routine • The program continues execution after the prefetch has been called, while the data is being fetched • Could even be done speculatively • A special mechanism must be in place to handle memory exceptions • Similar to the speculative load mechanism used in advanced processors that overlap memory operations with program execution
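The prefetch pattern can be sketched with a future (here `concurrent.futures` stands in for a DSM prefetch routine; `fetch_remote` and its delay are invented): the fetch starts in the background, unrelated computation proceeds, and the program blocks only when the value is actually needed.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_remote(name):
    """Simulated remote fetch with a page-transfer delay."""
    time.sleep(0.05)
    return {"x": 42}[name]

with ThreadPoolExecutor() as pool:
    future = pool.submit(fetch_remote, "x")    # prefetch begins here
    partial = sum(i * i for i in range(1000))  # overlapped computation
    x = future.result()                        # block only when needed
```

A speculative version would issue the prefetch before knowing whether the value is needed, and would need the exception-handling mechanism mentioned above in case the access turns out to be invalid.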
Distributed Shared Memory Programming • DSM programming on a cluster uses the same concepts as shared memory programming on a shared memory multiprocessor system • Uses user level library routines or methods • Message passing is hidden from the user
Basic Shared-Variable Implementation • Simplest DSM implementation is to use a shared variable approach with user-level DSM library routines • Sitting on top of an existing message passing system, such as MPI • Routines can be embodied into classes and methods • The routines could send messages to a central location that is responsible for the shared variables
Simple DSM System using a Centralized Server (single reader/writer protocol)
Basic Shared-Variable Implementation • A simple DSM system using a centralized server can easily result in a bottleneck • One method to reduce this bottleneck is to have multiple servers running on different processors • Each server is responsible for specific shared variables • This is still a single reader/single writer protocol
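The multiple-server idea can be sketched as follows (a toy with invented names; the choice of hashing the variable name to pick its server is an assumption, one common way to partition variables): each server owns a fixed subset of the shared variables, so requests for different variables go to different servers.

```python
# Toy multi-server DSM: each server is responsible for the shared
# variables that hash to it, relieving the single-server bottleneck.

NUM_SERVERS = 3
servers = [{} for _ in range(NUM_SERVERS)]   # each server's variables

def server_for(name):
    """Deterministically map a variable name to its server."""
    return sum(name.encode()) % NUM_SERVERS

def dsm_write(name, value):
    servers[server_for(name)][name] = value  # request goes to one server

def dsm_read(name):
    return servers[server_for(name)].get(name)

dsm_write("x", 1)
dsm_write("y", 2)
```

Every access to a given variable still lands on one server, so per-variable access remains serialized: single reader/single writer, just spread across servers.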
Basic Shared-Variable Implementation • Can also provide multiple reader capability • A specific server is responsible for each shared variable • When it is altered, the other local copies are invalidated
Simple DSM System using Multiple Servers and Multiple Reader Policy
Overlapping Data Groups • Data groups can be overlapped according to: • The existing interconnection structure • The access patterns of the application • Static overlapping • Defined by the programmer prior to execution • Alternatively, shared variables can migrate according to usage
Symmetrical Multiprocessor System with Overlapping Data Regions