Topic: What is Shared Memory? By M. N. Adow
Overview • On-Chip Memory • Bus-Based Multiprocessors • Ring-Based Multiprocessors • Switched Multiprocessors • NUMA Multiprocessors • Comparison of Shared Memory Systems
On-Chip Memory • Self-contained chips, each holding • A CPU • Memory • The CPU portion of the chip has address and data lines that connect directly to the memory portion • Applications • Cars, toys, and appliances • The chip can be extended to have multiple CPUs directly sharing the same memory • The result is a shared-memory multiprocessor
On-Chip Memory • Fig.6-1 (a) and (b) [1]
Bus-Based Multiprocessors • The connection between the CPU and memory is a collection of wires • Some carry the address the CPU wants to READ or WRITE • Some send and receive DATA; the rest CONTROL the transfers • Such a collection of wires is called a bus
Bus-Based Multiprocessors • A bus-based multiprocessor has more than one CPU connected to a single memory by a bus • The memory is said to be shared • When a CPU wants to READ from memory, it puts the address it wants to read on the bus • It also asserts (puts a signal on) a bus control line to indicate that it wants to read • When memory has fetched the requested word, it puts the word on the bus and asserts another control line to announce that the word is ready • The CPU then reads the word
Bus-Based Multiprocessors • To prevent two or more CPUs from accessing memory at the same time, bus arbitration is required • Example • A CPU might first have to request the bus by asserting a special request line • Only after receiving permission would it be allowed to use the bus • Permission can be granted in a centralized way, using a bus arbitration device • Or in a decentralized way, with the first requesting CPU along the bus winning any conflict
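The two arbitration styles on this slide can be sketched in Python. This is only an illustration, not a hardware description: the function names are hypothetical, a CPU's index stands in for its position along the bus, and the round-robin policy in the centralized arbiter is an assumption (the slide does not say how the arbitration device chooses).

```python
def centralized_arbiter(requests, last_granted, n_cpus):
    """Central arbitration device (assumed round-robin policy).

    requests: set of CPU indices currently asserting their request line.
    Returns the CPU that is granted the bus, or None if nobody asked.
    """
    for offset in range(1, n_cpus + 1):
        cpu = (last_granted + offset) % n_cpus
        if cpu in requests:
            return cpu
    return None


def decentralized_arbiter(requests):
    """No central device: the first requesting CPU along the bus wins.

    Position along the bus is modeled by the CPU index (an assumption).
    """
    return min(requests) if requests else None
```

With CPUs 0 and 2 both requesting and CPU 0 granted last, the centralized sketch grants CPU 2, while the decentralized one always favors CPU 0 because it sits first on the bus.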
Bus-Based Multiprocessors • Problem • The bus becomes overloaded • Solution • Reduce the load by equipping each CPU with a snooping cache • Problem • Different caches may contain different values for the same memory location • Solution • Cache consistency protocols
Bus-Based Multiprocessors • Cache consistency protocols • Write-through protocol • Write-once protocol
Bus-Based Multiprocessors • Write-through protocol • It is simple and common • Read cases • Read miss (word not cached) • Read hit (word cached) • When a CPU first reads a word from memory, the word is fetched over the bus and stored in the cache • If the word is needed later, the CPU takes it from the cache • This reduces bus traffic
Bus-Based Multiprocessors • Each processor does its caching independently • It is possible for a particular word to be cached at two or more CPUs at the same time • Next: how writes to memory are handled in the write-through protocol
Bus-Based Multiprocessors • Write cases • If no CPU has the word being written in its cache, memory is simply updated • If the CPU doing the write has the only copy of the word, its cache and memory are both updated • If two or more CPUs (including the CPU doing the write) have the word in their caches, the writer's cache is updated first and the write is also put on the bus to update memory. All the other caches see the write (by snooping on the bus) and check whether they also hold the word being updated. If so, they invalidate their copies • Afterwards, only one CPU has the word in its cache, and memory is up to date • An alternative to invalidating is to update all the caches, but this is slower
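The write-through cases above can be sketched as a small simulation. This is a minimal sketch under stated assumptions: the class names `Bus` and `SnoopingCache`, the one-word cache lines, and the `transactions` counter are all hypothetical and exist only to make the bus traffic visible.

```python
class Bus:
    """Shared bus in front of a single memory (modeled as a dict)."""

    def __init__(self, memory):
        self.memory = memory      # address -> value
        self.caches = []          # every snooping cache attached to the bus
        self.transactions = 0     # number of bus transfers so far

    def read(self, addr):
        self.transactions += 1
        return self.memory[addr]

    def write(self, writer, addr, value):
        self.transactions += 1
        self.memory[addr] = value             # write-through: memory is always updated
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_write(addr)       # other caches see the write on the bus


class SnoopingCache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}           # address -> cached value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:            # read miss: fetch over the bus
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]               # read hit: no bus traffic

    def write(self, addr, value):
        if addr in self.lines:                # update the local copy only if it is cached
            self.lines[addr] = value
        self.bus.write(self, addr, value)     # every write goes to memory

    def snoop_write(self, addr):
        self.lines.pop(addr, None)            # invalidate our copy, if we hold one
```

If two caches both read the same address and one of them then writes it, the other cache loses its copy and must fetch the new value over the bus on its next read. Repeated reads by the same CPU cost no bus transfers, which is where the traffic saving comes from.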
Bus-Based Multiprocessors • Advantages of the protocol • Simple to understand • Easy to implement • Reduces bus traffic • Disadvantage • The number of CPUs that can share the bus is limited
Bus-Based Multiprocessors • Write-once protocol • Premises • Once a CPU has written a word, that CPU is likely to need the word again, and it is unlikely that another CPU will use the word soon • The CPU writing the word takes temporary ownership of it • It can then avoid updating memory on subsequent writes until a different CPU shows interest in the word • The protocol manages cache blocks, each of which can be in one of three states: • Invalid: the block does not contain valid data • Clean: memory is up to date; the block may be in other caches • Dirty: memory is incorrect; no other cache holds the block • A word that is being read by multiple CPUs is allowed to be present in all their caches • A word that is being heavily written by one machine is kept in its cache
Bus-Based Multiprocessors • Illustration • Each cache block consists of a single word • Three CPUs: A, B, and C • CPU B has a cached copy of the word at address w • Its value is W1 • The copy in memory is valid
Bus-Based Multiprocessors • Fig 6-4 [1]
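The three block states and the illustration with CPUs A, B, and C can be sketched as follows. This is a simplified model, not the protocol as drawn in Fig. 6-4: the class name `WriteOnceSystem` is hypothetical, blocks are one word each (as in the illustration), and the handling of a read while another cache holds the block dirty (the owner writes the value back to memory and becomes clean) is an assumption chosen to keep the sketch short.

```python
INVALID, CLEAN, DIRTY = "invalid", "clean", "dirty"


class WriteOnceSystem:
    def __init__(self, cpus, memory):
        self.memory = dict(memory)              # address -> value
        self.state = {c: {} for c in cpus}      # cpu -> address -> block state
        self.value = {c: {} for c in cpus}      # cpu -> address -> cached value
        self.bus_transactions = 0

    def _state(self, cpu, addr):
        return self.state[cpu].get(addr, INVALID)

    def _flush_owner(self, addr):
        # Assumption: a dirty owner that sees another CPU's request on the
        # bus writes its value back to memory and keeps a clean copy.
        for cpu in self.state:
            if self._state(cpu, addr) == DIRTY:
                self.memory[addr] = self.value[cpu][addr]
                self.state[cpu][addr] = CLEAN

    def read(self, cpu, addr):
        if self._state(cpu, addr) == INVALID:   # miss: one bus transaction
            self.bus_transactions += 1
            self._flush_owner(addr)
            self.value[cpu][addr] = self.memory[addr]
            self.state[cpu][addr] = CLEAN
        return self.value[cpu][addr]            # hit: served from the cache

    def write(self, cpu, addr, new_value):
        if self._state(cpu, addr) != DIRTY:     # first write: take ownership
            self.bus_transactions += 1
            for other in self.state:
                if other != cpu and addr in self.state[other]:
                    self.state[other][addr] = INVALID
        # Later writes by the owner stay local; memory is not updated.
        self.value[cpu][addr] = new_value
        self.state[cpu][addr] = DIRTY
```

Starting from the slide's setup (B caches w with value W1), a read by A leaves both copies clean, A's first write invalidates B's copy and marks A's block dirty, and A's second write causes no bus traffic at all. Memory only catches up when C later reads w.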
Bus-Based Multiprocessors • Important properties • Consistency is achieved by having all caches snoop on the bus • The protocol is built into the memory management unit • The entire algorithm is performed in less than one memory cycle
Ring-Based Multiprocessors • An example of a ring-based multiprocessor is Memnet • In Memnet, a single address space is divided into a private part and a shared part • The private part is divided into regions so that each machine has a piece for its stacks and other unshared data and code • The shared part is common to all machines and is kept consistent by a hardware protocol • Shared memory is divided into 32-byte blocks, which are the unit of transfer between machines • A block may be cached on a machine other than its home machine • A read-only block may be present on multiple machines
Ring-Based Multiprocessors • A read-write block may be present on only one machine • A block need not be present on its home machine, but the home machine always guarantees storage space for it, because there is no global memory • The Memnet device on each machine contains a table with an entry for each block in the shared space, indexed by block number • Each entry contains • A valid bit: the block is cached and up to date • An exclusive bit: the local copy is the only one • A home bit: this machine is the block's home machine • An interrupt bit: used for forcing interrupts • A location field: where the block is in the cache, if it is present and valid
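The per-machine block table can be pictured as a list of small records indexed by block number. The sketch below is only a model of the fields listed above: the names `BlockEntry`, `block_number`, and `make_block_table` are hypothetical, and the initial state (home blocks start out valid, exclusive, and stored in consecutive cache slots) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_SIZE = 32  # bytes per shared-memory block, as stated for Memnet


@dataclass
class BlockEntry:
    valid: bool = False             # block is cached here and up to date
    exclusive: bool = False         # local copy is the only copy
    home: bool = False              # this machine is the block's home
    interrupt: bool = False         # used for forcing interrupts
    location: Optional[int] = None  # cache slot, if present and valid


def block_number(shared_offset: int) -> int:
    """Map a byte offset in the shared region to its block number."""
    return shared_offset // BLOCK_SIZE


def make_block_table(n_blocks: int, home_blocks: set) -> list:
    """Build one machine's table; home_blocks are the blocks it is home to."""
    table = [BlockEntry() for _ in range(n_blocks)]
    for slot, number in enumerate(sorted(home_blocks)):
        # Assumed initial state: each block starts at home as the only copy.
        table[number] = BlockEntry(valid=True, exclusive=True,
                                   home=True, location=slot)
    return table
```

A lookup for a shared address is then `table[block_number(offset)]`, which is what lets the device decide locally whether a request can be satisfied without touching the ring.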
Ring-Based Multiprocessors • Protocol • Read • Write
Ring-Based Multiprocessors • Read • When a CPU wants to read a word from shared memory, the address is passed to the Memnet device • The Memnet device checks the block table to see whether the block is present; if so, the request is satisfied • If not, the device waits until it captures the circulating token, then puts a request packet on the ring and suspends the CPU • Each Memnet device along the way checks whether it has the needed block; if so, it supplies the block and modifies the packet to inhibit later machines from doing the same • If the block's exclusive bit is set, it is cleared • The packet comes back to the sender, where the block is stored • If there is no free space in its cache, the sender picks a block at random and sends it home, freeing a cache slot • Blocks whose home bit is set are never chosen, because they are already at home
Ring-Based Multiprocessors • Write • Three cases • 1. The exclusive bit is set • The block is present and is the only copy • The word is written locally • 2. The block is present but is not the only copy • An invalidation packet is sent out • The exclusive bit is then set • 3. The block is not present • A packet is sent that combines a read and an invalidation • The first machine with a copy puts it in the packet and invalidates its own copy • All subsequent machines discard their copies
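The read protocol and the three write cases can be put together in a small ring simulation. This is a rough sketch under several assumptions: the names `Node` and `Ring` are hypothetical, a packet's trip around the ring is modeled as a loop over the other machines in ring order, token capture and cache eviction are left out, and the requested block is assumed to exist somewhere on the ring.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.cache = {}         # block number -> list of words
        self.exclusive = set()  # blocks for which this node holds the only copy


class Ring:
    def __init__(self, names):
        self.nodes = [Node(n) for n in names]

    def _downstream(self, node):
        """Other machines in the order a packet from `node` visits them."""
        i = self.nodes.index(node)
        return self.nodes[i + 1:] + self.nodes[:i]

    def read(self, node, block, offset):
        if block not in node.cache:                  # miss: send a request packet
            for other in self._downstream(node):
                if block in other.cache:
                    node.cache[block] = list(other.cache[block])
                    other.exclusive.discard(block)   # no longer the only copy
                    break                            # later machines are inhibited
        return node.cache[block][offset]

    def write(self, node, block, offset, value):
        if block in node.exclusive:
            pass                                     # case 1: write locally
        elif block in node.cache:
            for other in self._downstream(node):     # case 2: invalidate the others
                other.cache.pop(block, None)
        else:
            fetched = None                           # case 3: read plus invalidate
            for other in self._downstream(node):
                copy = other.cache.pop(block, None)  # every holder drops its copy
                other.exclusive.discard(block)
                if fetched is None:
                    fetched = copy                   # first holder supplies the block
            node.cache[block] = fetched
        node.exclusive.add(block)
        node.cache[block][offset] = value
```

For example, if machine A holds block 7 exclusively, a read by B clears A's exclusive bit and leaves two copies; a later write by B is case 2 and removes A's copy; a write by C is then case 3, which moves the block to C in a single trip around the ring.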
Switched Multiprocessors • Bus-based and ring-based multiprocessors work well for small systems • As CPUs are added, bandwidth saturates and performance drops • Two approaches to the bandwidth problem • 1. Reduce the amount of communication • Use caching • Improve the caching schemes and protocols [2] • Optimize the block size • Reorganize the process to increase locality of memory references • 2. Increase the communication capacity (add bandwidth) • Change the topology: add more buses and change the interconnection • Build a hierarchy: treat the CPUs on each bus as a cluster, build multiple clusters, and connect them with an intercluster bus
Switched Multiprocessors • Most CPUs communicate primarily within their own cluster, so there is little intercluster traffic • Intercluster buses can be added as required • The system can be scaled up to multiple superclusters connected by a bus • An example of a hierarchical design based on a grid of clusters is the Dash machine
NUMA Multiprocessors • NUMA stands for Non-Uniform Memory Access • There is a single virtual address space that is visible to all CPUs • There is no caching, so access to remote memory is slower than access to local memory • Cm* was the first NUMA machine • The machine consists of a number of clusters • Each cluster consists of a CPU, an MMU, a memory module, and so on
NUMA Multiprocessors • The components of a cluster are all connected by a bus • No caches are present, and there is no bus snooping • Clusters are connected by intercluster buses
NUMA Multiprocessors • Properties of NUMA multiprocessors • 1. Access to remote memory is possible • 2. Access to remote memory is slower than access to local memory • 3. Remote access times are not hidden by caching
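Because remote access is slower and no cache hides it, the mean memory access time of a program depends directly on the fraction of its references that are remote. The helper below is a back-of-the-envelope sketch: the function name is hypothetical, and any timings passed to it are illustrative values, not measurements from a real NUMA machine.

```python
def mean_access_time(remote_fraction, t_local, t_remote):
    """Weighted mean of local and remote access times.

    remote_fraction: share of references that go to remote memory (0.0 to 1.0).
    t_local, t_remote: access times in the same unit (for example, nanoseconds).
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * t_local + remote_fraction * t_remote
```

With an assumed tenfold penalty for remote access, moving even a modest share of references from remote to local memory changes the mean noticeably, which is why page placement matters on these machines.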
NUMA Multiprocessors • NUMA algorithms • NUMA systems have a daemon process, called the page scanner, that allows the system to adapt to changes in reference patterns • The page scanner periodically gathers usage statistics about local and remote references • If the statistics indicate that a page is in the wrong place, the page scanner unmaps the page so that the next reference causes a page fault • This allows a new placement decision to be made • If a page is moved too often within a short interval, the page scanner can mark it as frozen, which inhibits further movement until some number of seconds has elapsed • Newly proposed strategies include • Invalidate any page that has more remote references than local ones
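The page scanner's behavior can be sketched as two small routines: a periodic scan and a page-fault handler. This is an illustrative model only. The class names, the thresholds (`max_moves`, `window`, `freeze_for`), and the placement rule (move the page to the node that faulted on it) are assumptions; the test for a misplaced page is the "more remote references than local ones" strategy from the slide.

```python
class Page:
    def __init__(self, home):
        self.home = home          # node whose memory currently holds the page
        self.refs = {}            # node -> references seen since the last scan
        self.mapped = True
        self.moves = []           # times of recent moves
        self.frozen_until = 0.0   # no movement before this time

    def touch(self, node):
        """Record one reference to this page from `node`."""
        self.refs[node] = self.refs.get(node, 0) + 1


class PageScanner:
    def __init__(self, max_moves=2, window=10.0, freeze_for=5.0):
        self.max_moves = max_moves    # moves tolerated within `window` seconds
        self.window = window
        self.freeze_for = freeze_for  # seconds a page stays frozen

    def scan(self, page, now):
        """Periodic pass: unmap the page if it looks misplaced."""
        local = page.refs.get(page.home, 0)
        remote = sum(page.refs.values()) - local
        page.refs.clear()                       # start a fresh statistics period
        if now >= page.frozen_until and remote > local:
            page.mapped = False                 # next reference will page-fault

    def page_fault(self, page, node, now):
        """A reference hit an unmapped page: make a new placement decision."""
        page.mapped = True
        if node == page.home:
            return                              # already in the right place
        page.home = node                        # assumed rule: move to the faulting node
        page.moves = [t for t in page.moves if now - t < self.window] + [now]
        if len(page.moves) > self.max_moves:    # moving too often: freeze it
            page.frozen_until = now + self.freeze_for
```

Freezing exists to stop a page that is actively shared by two nodes from bouncing back and forth; while it is frozen, the scanner still collects statistics but leaves the mapping alone.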
References 1. Andrew S. Tanenbaum, Distributed Operating Systems, Prentice Hall, 1995. 2. Http:/courseweb.sp.cs.cmu.edu/~cs612 3. Http://1se.sourceforge.net/numa 4. John B. Carter et al., Distributed Shared Memory, Computer Systems Laboratory, University of Utah, IEEE, 1995.