Module 3 Distributed Multiprocessor Architectures
Syllabus (Chapter 7, KHAB; Sections 7.1.1 and 7.1.2) • Loosely coupled and tightly coupled architectures • Cluster computing as an application of loosely coupled architecture. Examples: Cm* and Hadoop.
Some Basics • When working on a project, several people coordinating usually reach a better solution than one person trying to piece things together alone. • Multiprocessing follows the same idea. • Multiprocessing means p processors operating concurrently. • A multiprocessing system is a system configuration that contains more than one central processing unit (CPU).
Why use a multiprocessing system? • First, a multiprocessing system increases overall system performance, measured as the amount of work accomplished per unit time (throughput). • A problem can be divided among the processors for faster completion, an approach called “divide and conquer”. • Another reason is to increase system availability, since the remaining processors can keep working if one fails.
Introduction • Key attributes of multiprocessors: • A single computer that includes multiple processors • Processors may communicate at various levels • Message passing or shared memory • Multiprocessor and multicomputer systems • A multicomputer system consists of several autonomous computers, which may or may not communicate with each other. • A multiprocessor system is controlled by a single operating system, which provides mechanisms for interaction among the processors. • Architectural models • Tightly coupled multiprocessor • Loosely coupled multiprocessor
Tightly coupled multiprocessor (Basics) • Processors communicate via shared memory. • There is complete connectivity between processors and memory. • This connectivity is provided by an interconnection network. • Drawback: performance degrades because of memory conflicts.
Tightly Coupled Architecture (Details) • A tightly coupled multiprocessor system is suited to cases where speed is the main concern. • Models: • Without private cache • With private cache
Architecture (Without Private Cache) • This model consists of p processors, l memory modules, and d I/O channels. • These are connected through a processor-memory interconnection network (PMIN). • The PMIN is a switch that can connect every processor to every memory module. • A memory module can satisfy only one processor's request in a given memory cycle. When several processors request the same module in the same cycle, the PMIN arbitrates the conflict.
However, the number of these conflicts can be reduced by making l equal to p (i.e. as many memory modules as processors). • Another way to reduce conflicts is to give each processor an unmapped local memory (ULM), a reserved memory area for that processor. • The ULM reduces the traffic through the PMIN and thereby reduces conflicts at the shared memory.
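The one-request-per-module rule can be illustrated with a small sketch. This is only an illustration, not the PMIN's actual arbitration logic: the function name `arbitrate` and the fixed priority (lowest-numbered processor wins) are assumptions made for the example.

```python
from collections import defaultdict

def arbitrate(requests):
    """Grant at most one request per memory module for one memory cycle.

    requests maps processor id -> requested memory module id.
    Returns (granted, stalled): granted maps module -> winning processor,
    stalled lists the processors that must retry in a later cycle.
    Assumption: the lowest-numbered processor wins a conflict.
    """
    by_module = defaultdict(list)
    for proc, module in requests.items():
        by_module[module].append(proc)
    granted, stalled = {}, []
    for module, procs in by_module.items():
        procs.sort()
        granted[module] = procs[0]      # one winner per module
        stalled.extend(procs[1:])       # everyone else waits
    return granted, sorted(stalled)

# Four processors; processors 0 and 1 target module 0 in the same cycle.
granted, stalled = arbitrate({0: 0, 1: 0, 2: 1, 3: 2})
print(granted)   # {0: 0, 1: 2, 2: 3}
print(stalled)   # [1]
```

When every processor addresses a different module, `stalled` is empty. Spreading references over more modules, or keeping them in the ULM, works toward that case.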
Tightly coupled multiprocessor contd. [Figure: tightly coupled multiprocessor without private caches. Labeled components: processors 0 to p-1 with mapped and unmapped local memory; shared memory modules 0 to l-1 behind the processor-memory interconnection network (PMIN); I/O channels 0 to d-1 and disks behind the input/output interconnection network (IOPIN); and the interrupt signal interconnection network (ISIN).]
Problem • In this architecture most memory references made by the processors go to main memory. • References to memory shared by all processors cause conflicts. • The PMIN resolves these conflicts, but the arbitration adds delay, which lengthens the instruction cycle time and lowers throughput.
Solution • The delay can be reduced by giving each processor a private cache that holds the memory references of that processor. • The cache coherence problem (keeping copies of the same location in different caches consistent) must then be handled. • Refer to the diagram.
Tightly coupled multiprocessor contd. [Figure: tightly coupled multiprocessor with private caches. Labeled components: processors 0 to p-1 with mapped and unmapped local memory and private caches; shared memory modules 0 to l-1 behind the processor-memory interconnection network (PMIN); I/O channels 0 to d-1 and disks behind the input/output interconnection network (IOPIN); and the interrupt signal interconnection network (ISIN).]
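A minimal sketch of why private caches help and why coherence then matters. The classes `SharedMemory` and `CachedProcessor` are hypothetical names for this example. The write-through policy and the write-invalidate step are assumptions: they show one common way to keep caches consistent, not necessarily the scheme these slides have in mind.

```python
class SharedMemory:
    def __init__(self):
        self.data = {}
        self.accesses = 0              # references that travel through the PMIN

    def read(self, addr):
        self.accesses += 1
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.accesses += 1
        self.data[addr] = value


class CachedProcessor:
    def __init__(self, memory, peers):
        self.memory = memory
        self.cache = {}
        self.peers = peers             # list shared by all processors
        peers.append(self)

    def read(self, addr):
        if addr not in self.cache:     # miss: one reference through the PMIN
            self.cache[addr] = self.memory.read(addr)
        return self.cache[addr]        # hit: no PMIN traffic

    def write(self, addr, value, invalidate=True):
        self.cache[addr] = value
        self.memory.write(addr, value) # write-through (assumption)
        if invalidate:                 # write-invalidate (assumption)
            for p in self.peers:
                if p is not self:
                    p.cache.pop(addr, None)


mem, peers = SharedMemory(), []
p0, p1 = CachedProcessor(mem, peers), CachedProcessor(mem, peers)
p1.read(100)
p1.read(100)                           # second read is served by the cache
p0.write(100, 7, invalidate=False)     # coherence ignored
stale = p1.read(100)                   # p1 still sees its old copy
p0.write(100, 8)                       # peers' copies are invalidated
fresh = p1.read(100)                   # p1 misses and reloads the new value
print(stale, fresh, mem.accesses)
```

The repeated read by `p1` costs only one shared-memory access, which is the traffic reduction the cache is meant to provide. The stale value shows what goes wrong when coherence is ignored.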
• The ISIN permits each processor to interrupt any other processor. • The ISIN is also used by a failing processor to broadcast a message to the other processors. • The IOPIN permits a processor to communicate with an I/O channel.
Tightly coupled multiprocessor contd. • Processor types • Homogeneous, if all processors perform the same function • Heterogeneous, if processors perform different functions • Note: Two functionally identical processors may still differ in other parameters such as I/O or memory size, i.e. they are asymmetric.
Loosely Coupled Architecture • Each processor has its own set of I/O devices and its own memory, from which it fetches most of its instructions and data. • Computer module: processor, I/O interface and memory. [Figure: a computer module, showing the processor (P), local memory (LM), input/output (I/O) and the channel and arbiter switch (CAS).]
Loosely coupled multiprocessor contd. • Processes in different modules communicate by exchanging messages through a message transfer system (MTS). • This is a distributed system; the degree of coupling is loose. • The degree of memory conflict is lower than in a tightly coupled system. [Figure: computer modules 0 to N-1, each with P, LM, I/O and a CAS, attached to the message transfer system (MTS).]
Loosely coupled multiprocessor contd. • Intermodule communication • Channel and arbiter switch (CAS) • The arbiter decides which request proceeds when requests from two or more computer modules collide in accessing a physical segment of the MTS. • It is also responsible for delaying the other requests until the request being serviced is completed.
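The arbiter's two duties (pick one request, delay the rest) can be sketched as follows. The class name `SegmentArbiter` and the first-come, first-served order are assumptions for illustration; the slides do not say which policy the CAS uses.

```python
from collections import deque

class SegmentArbiter:
    """Arbiter for one physical segment of the MTS (illustrative sketch)."""

    def __init__(self):
        self.owner = None              # module currently being serviced
        self.waiting = deque()         # delayed requests, in arrival order

    def request(self, module_id):
        """Return True if access is granted now, False if the request is delayed."""
        if self.owner is None:
            self.owner = module_id
            return True
        self.waiting.append(module_id)
        return False

    def release(self):
        """Finish the current transfer and hand the segment to the next waiter."""
        self.owner = self.waiting.popleft() if self.waiting else None
        return self.owner


arb = SegmentArbiter()
print(arb.request(0))   # True: segment was free
print(arb.request(1))   # False: delayed until module 0 finishes
print(arb.release())    # 1: module 1 now owns the segment
```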
Message Transfer System (MTS) • The MTS can be a time-shared bus or a shared memory system. • The latter can be implemented with a set of memory modules and a processor-memory interconnection network, or with a multiported main memory. • The MTS largely determines the performance of the multiprocessor system.
• For a loosely coupled system (LCS) that uses a single time-shared bus, performance is limited by the message arrival rate on the bus, the message length and the bus capacity. • For an LCS with shared memory, the limiting factor is the memory conflict problem imposed by the processor-memory interconnection network.
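A rough way to see how the three bus parameters interact is to compute the offered load as a fraction of bus capacity. The function `bus_utilization` and the numbers below are illustrative assumptions, not figures from the slides. A value approaching 1 means the bus is saturated and messages start to queue.

```python
def bus_utilization(arrival_rate, message_length, bus_capacity):
    """Fraction of the bus capacity demanded by the offered message traffic.

    arrival_rate   -- messages per second placed on the bus
    message_length -- average bits per message
    bus_capacity   -- bits per second the bus can carry
    """
    if bus_capacity <= 0:
        raise ValueError("bus_capacity must be positive")
    return arrival_rate * message_length / bus_capacity


# Hypothetical figures: 2000 messages/s of 1000 bits each on a 10 Mbit/s bus.
print(bus_utilization(2000, 1000, 10_000_000))   # 0.2
```

Doubling either the arrival rate or the message length doubles the load, while doubling the bus capacity halves it.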
Cm* Architecture • A project at Carnegie Mellon University • What is a computer module (Cm)? • A computer module consists of a processor, an Slocal, local memory and I/O. • The Slocal plays a role similar to the CAS in the loosely coupled architecture. [Figure: a computer module with P, S (Slocal), LM and I/O.]
Cluster of Computer Modules [Figure: a cluster. Labeled components: computer modules Cm1 to Cm10, each with P, S, LM and I/O; the map bus; the Kmap; and the intercluster bus.]
Role of Slocal • Receives and interprets requests for access to local memory and I/O, both from the local processor P and from other computer modules. • Allows the local processor P to access resources in other Cms. • To tell local references from external ones, Slocal provides: • A translation of local addresses
• The Slocal uses the 4 high-order bits of the processor's address, along with 1 bit of the processor status word (PSW), to access a map table. • The map table determines whether the referenced memory is local or not. • If the memory is nonlocal, control is passed to the Kmap via the map bus. • Each Cm is connected to the Kmap via the map bus. • The Kmap is responsible for routing data between Slocals.
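The map-table lookup can be sketched as below. Only the "4 high-order bits plus 1 PSW bit" index comes from the text. The 16-bit address width, the position of the PSW bit within the index, and the entry layout (`local`, `frame`) are assumptions made so the example runs.

```python
ADDRESS_BITS = 16                  # assumed processor address width
PAGE_SELECT_BITS = 4               # four high-order address bits (from the text)
OFFSET_BITS = ADDRESS_BITS - PAGE_SELECT_BITS

def map_index(address, psw_bit):
    """Combine the PSW bit with the four high-order address bits (5-bit index)."""
    page = (address >> OFFSET_BITS) & 0xF
    return (psw_bit << PAGE_SELECT_BITS) | page   # PSW bit on top: an assumption

def route(address, psw_bit, map_table):
    """Return ('local', physical_address) or ('kmap', address)."""
    entry = map_table[map_index(address, psw_bit)]
    offset = address & ((1 << OFFSET_BITS) - 1)
    if entry["local"]:
        return "local", (entry["frame"] << OFFSET_BITS) | offset
    return "kmap", address         # nonlocal: hand over to the Kmap via the map bus


# Hypothetical table: 32 entries, only entry 1 maps to local memory.
table = [{"local": False, "frame": 0} for _ in range(32)]
table[1] = {"local": True, "frame": 3}

kind, paddr = route(0x1ABC, 0, table)
print(kind, hex(paddr))            # local 0x3abc
print(route(0x1ABC, 1, table))     # ('kmap', 6844)
```

With the PSW bit set, the same address selects a different table entry, so one address can be local in one address space and nonlocal in another.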
Kmap Components [Figure: Kmap components. Labels: intercluster buses 1 and 2; Linc with send ports 1 and 2; RETURN and SERVICE; the RUN and OUT queues; Pmap; Kbus; and the map bus connecting the Cms.]
• A request for nonlocal memory arrives at the Kbus via the map bus. • The Linc manages communication between the Kmap and other Kmaps. • The Pmap is the mapping processor, which responds to requests from the Kbus and the Linc.
• The Kmap can handle up to 8 processor requests simultaneously. • The Pmap uses queues to handle the requests.
• A service request is signaled to the Kbus whenever a Cm makes a nonlocal memory reference. • Such a computer module is called the master Cm. • The Kmap fetches the virtual address via the map bus and allocates a context for the Pmap. • It places the virtual address in the Pmap RUN queue. • The Pmap performs the virtual-to-physical address translation.
• Using the physical address, the Pmap can initiate a memory access in any Cm. • The Kmap services the OUT request by sending the physical address of the memory request over the map bus. • When the destination Cm completes the memory access, it sends a return signal to the Kmap.
Intracluster Communication [Figure: Kmap with Pmap, Kbus and the RUN and OUT queues; a map bus connecting the master Cm and the slave Cm; steps numbered 1 to 5.] • The master Cm initiates a nonlocal memory access. • The Kbus reads the virtual address issued by the master Cm. • The Kbus activates a context (creating a data structure specific to the transaction) and places it in the Pmap RUN queue. • The Pmap processes the context and performs the address translation. • The Pmap places in the OUT queue a request for a memory cycle in the slave Cm of the current cluster.
[Figure: the same Kmap, map bus, master Cm and slave Cm, with steps numbered 1 to 9.] • The Kbus sends the physical address to the slave Cm over the map bus. • The slave Cm performs the access cycle on its local memory. • The Kbus allows the result of the memory access to be delivered to the master Cm. • The master Cm takes the data and continues execution.
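The intracluster steps above can be traced with a small queue-based sketch. The class `Kmap`, its method names and the dictionary-based translation table are hypothetical stand-ins. Only the RUN and OUT queues, the Kbus and Pmap roles and the limit of eight requests come from the slides.

```python
from collections import deque

class Kmap:
    MAX_CONTEXTS = 8                   # eight requests in flight (from the text)

    def __init__(self, translation, modules):
        self.translation = translation # virtual addr -> (cm id, physical addr); hypothetical
        self.modules = modules         # cm id -> dict standing in for local memory
        self.run_queue = deque()
        self.out_queue = deque()
        self.active = 0

    def service_request(self, master, virtual_addr):
        """Kbus: accept a nonlocal reference and queue a context for the Pmap."""
        if self.active >= self.MAX_CONTEXTS:
            return False               # no free context; the master must retry
        self.active += 1
        self.run_queue.append({"master": master, "vaddr": virtual_addr})
        return True

    def pmap_step(self):
        """Pmap: translate the next context and queue a memory cycle request."""
        ctx = self.run_queue.popleft()
        ctx["slave"], ctx["paddr"] = self.translation[ctx["vaddr"]]
        self.out_queue.append(ctx)

    def kbus_out_step(self):
        """Kbus: run the access in the slave Cm and return data for the master."""
        ctx = self.out_queue.popleft()
        data = self.modules[ctx["slave"]][ctx["paddr"]]
        self.active -= 1               # the context is free again
        return ctx["master"], data


# Cm 1 (master) reads virtual address 0x2000, which lives in Cm 3 (slave).
kmap = Kmap({0x2000: (3, 0x0040)}, {3: {0x0040: 99}})
kmap.service_request(1, 0x2000)
kmap.pmap_step()
print(kmap.kbus_out_step())            # (1, 99)
```

Keeping RUN and OUT as separate queues lets the Pmap translate one request while the Kbus finishes the memory cycle of another.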
Intercluster Communication [Figure: a master Kmap and a slave Kmap joined by the intercluster bus, each with a map bus to its own Cms; steps numbered 1 to 5.] • The master Cm sends a transfer request to the master Kmap. • The master Kmap prepares and encodes the intercluster message (request packet). • The intercluster message is transmitted on the intercluster bus according to the routing algorithm. • The slave Kmap decodes the incoming request and sends it to its local cluster. • The memory cycle request is sent to the slave Cm.
[Figure: the same master and slave Kmaps, with steps numbered 1 to 10 and address formats labeled Cop, Segment, Offset; R/W, Cm #, Page, Offset; and K/U, R/W, Cm #, Page, Offset.] • The slave Cm transmits the result to the slave Kmap. • The slave Kmap prepares the intercluster message (i.e. context reactivation). • The slave Kmap transmits the result to the master Kmap. • The master Kmap receives and interprets the message. • The result is sent to the master Cm.