Presented by: Tan Q. Nguyen

ORGANIZATION OF MULTIPROCESSOR SYSTEMS Section 12.2 Presented by: Tan Q. Nguyen

Why do we need multiprocessor systems? Who will carry this?

Recall the goal of computer architecture • To maximize computer system performance • Within the CPU: • Incorporate an instruction pipeline to increase the number of instructions processed per clock cycle • Include cache memory to reduce the time needed to load and store data • Speed up transfers between memory and I/O devices by using a Direct Memory Access (DMA) controller • Learn the status of I/O devices by accepting interrupts

The significance of multiprocessor systems • Just another way to maximize system performance

Multiport Memory • Is designed to handle multiple transfers within the memory itself. • A multiport memory chip has two sets of address, data, and control pins for simultaneous data transfer. • The CPU and DMA controller can transfer data concurrently. • A system with more than one CPU can handle simultaneous requests from two different processors.

Multiport Memory (cont’d) • Advantage: it can handle two requests to read data from the same location at the same time. • Disadvantage: it cannot process two simultaneous requests to write data to the same memory location, or to read from and write to the same memory location.
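The read/write rule above can be written as a small check. This is a minimal sketch, assuming each request is a hypothetical `(operation, address)` pair; the function name `can_service_together` is illustrative and not taken from any real memory controller.

```python
def can_service_together(req_a, req_b):
    """Return True if a two-port memory could serve both requests at once.

    Each request is an (operation, address) pair, where operation is
    "read" or "write". This representation is an assumption for illustration.
    """
    (op_a, addr_a), (op_b, addr_b) = req_a, req_b
    # Requests to different locations never collide.
    if addr_a != addr_b:
        return True
    # Same location: only two reads can proceed together.
    return op_a == "read" and op_b == "read"


print(can_service_together(("read", 0x10), ("read", 0x10)))
print(can_service_together(("read", 0x10), ("write", 0x10)))
```

Two reads of the same address pass the check, while any pairing that includes a write to that address is rejected.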

The ways to organize processors in multiprocessor systems • There are many divergent, complex computer designs. • Three common ways to describe them are: • Flynn’s Classification • System Topologies • MIMD System Architectures.

Flynn’s Classification • Named after the researcher Michael J. Flynn. • This classification is based on the flow of instructions and data within the computer. • There are four categories: • SISD: single instruction, single data • SIMD: single instruction, multiple data • MISD: multiple instruction, single data • MIMD: multiple instruction, multiple data Picture found on Dr. Lee’s website

Flynn’s Classification (cont’d) • SISD: the classic von Neumann architecture • MISD: not practical, so it is not considered further • SIMD: practical, but there is no need for multiple processors to each fetch and decode the one shared instruction. • The main advantage of the SIMD organization: • the processors are less complex than traditional CPUs

The classic von Neumann architecture [Figure: a CPU, a memory subsystem, and an I/O subsystem of I/O devices, connected by an address bus, a data bus, and a control bus]

Generic organization of SIMD [Figure: a control unit, several processors each paired with its own memory, a communication network, and main memory]

Generic organization of MIMD • Each processor has its own control unit • The processors can be assigned to parts of the same task or to completely separate tasks, depending on their topology and architecture

System Topologies • The topology of a multiprocessor system refers to the pattern of connections between its processors • Diameter: the maximum distance between any two processors, counted in links along the shortest path • Total bandwidth: the capacity of a communications link multiplied by the number of such links in the system • Bisection bandwidth: • Split the processors into two halves • Compute the total bandwidth of the links connecting the two halves
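These three measures can be computed for any topology stored as an adjacency list. The following is a minimal sketch, assuming a connected network with undirected links of equal bandwidth `l`; the function names and the sample `ring4` network are illustrative. Note that `cut_bandwidth` evaluates one given split and does not search for the best way to halve the network.

```python
from collections import deque


def diameter(adj):
    """Longest shortest-path length, in links, between any two processors."""
    def farthest(start):
        # Breadth-first search gives shortest hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        return max(dist.values())

    return max(farthest(p) for p in adj)


def total_bandwidth(adj, l=1):
    """Number of links times the bandwidth l of one link."""
    # Every undirected link appears in two adjacency lists.
    links = sum(len(neighbors) for neighbors in adj.values()) // 2
    return links * l


def cut_bandwidth(adj, half, l=1):
    """Bandwidth of the links joining the processors in `half` to the rest."""
    half = set(half)
    crossing = sum(1 for p in half for q in adj[p] if q not in half)
    return crossing * l


# Illustrative network: four processors connected as 0-1-2-3-0.
ring4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diameter(ring4), total_bandwidth(ring4), cut_bandwidth(ring4, {0, 1}))
```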

Types of System Topologies • Shared Bus • Ring • Tree • Mesh • Hypercube • Completely Connected

Shared Bus • Processors communicate with each other exclusively via this bus. • The bus can only handle one data transmission at a time. • Its diameter is 1, its total bandwidth is 1*l, and its bisection bandwidth is also 1*l (where l is the bandwidth of the bus).

[Figure: processors (P), each with its own memory (M), attached to a shared bus along with global memory]

Ring • Processors communicate with each other directly instead of over a bus. • All communication links are active simultaneously. • A ring with n processors has a diameter of ⌊n/2⌋, total bandwidth of n*l, and bisection bandwidth of 2*l (where l is the bandwidth of one link).

[Figure: six processors (P) connected in a ring]
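The ring figures can be checked with a short calculation. This is a sketch assuming processors are numbered 0 to n-1 around the ring and n is at least 3; `ring_distance` and `ring_metrics` are illustrative names.

```python
def ring_distance(i, j, n):
    """Fewest links between processors i and j on an n-processor ring."""
    d = abs(i - j) % n
    # A message may travel either way around the ring.
    return min(d, n - d)


def ring_metrics(n, l=1):
    """Diameter, total bandwidth, and bisection bandwidth of a ring (n >= 3)."""
    return {
        # By symmetry it is enough to measure distances from processor 0.
        "diameter": max(ring_distance(0, j, n) for j in range(n)),
        # One link per processor.
        "total_bandwidth": n * l,
        # Cutting a ring into two arcs always severs exactly two links.
        "bisection_bandwidth": 2 * l,
    }


print(ring_metrics(6))
```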

Tree • Processors communicate with each other directly, as in the ring topology. • Each processor has up to three connections (one to its parent and two to its children). • It has an advantageously low diameter of 2*⌊log n⌋, total bandwidth of (n-1)*l, and bisection bandwidth of 1*l (where log is base 2 and l is the bandwidth of one link).

[Figure: seven processors (P) connected as a binary tree]
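The tree diameter can be confirmed by brute force on a small tree. This sketch assumes a binary tree numbered heap-style (root is 1, and the parent of node k is k // 2), which is a convenient labeling rather than anything the slides specify. The formula 2*⌊log n⌋ matches exactly when the tree is full, such as n = 7 or n = 15.

```python
def tree_distance(a, b):
    """Links between nodes a and b when node k's parent is k // 2 (root is 1)."""
    hops = 0
    while a != b:
        # The larger label is never closer to the root, so move it up one level.
        if a > b:
            a //= 2
        else:
            b //= 2
        hops += 1
    return hops


def tree_diameter(n):
    """Largest distance between any two of the n nodes, found by brute force."""
    return max(
        tree_distance(a, b)
        for a in range(1, n + 1)
        for b in range(1, n + 1)
    )


n = 7
floor_log2 = n.bit_length() - 1  # floor of log base 2 for a positive integer
print(tree_diameter(n), 2 * floor_log2)
```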

Mesh • Every processor connects to the processors above and below it, and to its left and right. • For n processors arranged as a √n × √n square without wraparound links, it has a diameter of 2(√n - 1), total bandwidth of (2n - 2√n)*l, and bisection bandwidth of √n*l when √n is even (where l is the bandwidth of one link).

[Figure: nine processors (P) connected as a 3 × 3 mesh]
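Mesh measurements are easy to verify by counting. The sketch below assumes a square mesh with `side` processors per row and no wraparound links; that assumption, along with the name `mesh_metrics`, is mine and not stated on the slide.

```python
from itertools import product


def mesh_metrics(side, l=1):
    """Metrics for a side x side mesh without wraparound links."""
    nodes = list(product(range(side), repeat=2))
    # Count only the link to the right of and the link below each node,
    # so every link is counted exactly once.
    links = sum(
        1
        for r, c in nodes
        for dr, dc in ((0, 1), (1, 0))
        if r + dr < side and c + dc < side
    )
    # Without wraparound, distance is the Manhattan distance.
    diameter = max(
        abs(r1 - r2) + abs(c1 - c2)
        for (r1, c1), (r2, c2) in product(nodes, repeat=2)
    )
    # Links crossing a vertical cut down the middle; the two sides are
    # equal halves only when `side` is even.
    middle_cut = sum(1 for r, c in nodes if c == side // 2 - 1)
    return {
        "processors": side * side,
        "diameter": diameter,
        "total_bandwidth": links * l,
        "middle_cut_bandwidth": middle_cut * l,
    }


print(mesh_metrics(3))
print(mesh_metrics(4))
```

For the 3 × 3 case this gives 12 links, which agrees with 2n - 2√n for n = 9.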

Hypercube • Is a multidimensional mesh. • It has n processors, each with log n connections. • It has a relatively low diameter of log n, total bandwidth of (n/2)*log n*l, and a bisection bandwidth of (n/2)*l (where log is base 2 and l is the bandwidth of one link).

[Figure: sixteen processors (P) connected as a hypercube]
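A common way to reason about a hypercube is to label each processor with a binary address, so that neighbors differ in exactly one bit. The sketch below uses that labeling, which is an assumption not stated on the slide, to recompute the figures; the function names are illustrative and `dims` is assumed to be at least 1.

```python
def hypercube_neighbors(node, dims):
    """Neighbors of `node`: flip each of its `dims` address bits in turn."""
    return [node ^ (1 << bit) for bit in range(dims)]


def hypercube_distance(a, b):
    """Links between a and b: the number of address bits in which they differ."""
    return bin(a ^ b).count("1")


def hypercube_metrics(dims, l=1):
    """Metrics for a hypercube with n = 2**dims processors (dims >= 1)."""
    n = 1 << dims
    # Each link is seen from both of its endpoints.
    links = sum(len(hypercube_neighbors(p, dims)) for p in range(n)) // 2
    diameter = max(hypercube_distance(0, p) for p in range(n))
    # Split on the top address bit: count links from the lower half
    # that land in the upper half.
    crossing = sum(
        1
        for p in range(n // 2)
        for q in hypercube_neighbors(p, dims)
        if q >= n // 2
    )
    return {
        "processors": n,
        "diameter": diameter,
        "total_bandwidth": links * l,
        "bisection_bandwidth": crossing * l,
    }


print(hypercube_metrics(4))
```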

Completely Connected • Every processor has n-1 connections, one to each of the other processors. • It has a diameter of 1, total bandwidth of (n/2)*(n-1)*l, and bisection bandwidth of (⌊n/2⌋ * ⌈n/2⌉)*l (where l is the bandwidth of one link).

[Figure: eight processors (P), each connected to every other processor]
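The link counts for a completely connected network follow from enumerating every pair of processors. This is a small sketch under the assumption that processors are numbered 0 to n-1 and the first ⌊n/2⌋ of them form one half; `complete_metrics` is an illustrative name.

```python
from itertools import combinations


def complete_metrics(n, l=1):
    """Metrics for n processors with a link between every pair."""
    links = list(combinations(range(n), 2))
    half = set(range(n // 2))
    # A link crosses the bisection when exactly one endpoint is in `half`.
    crossing = sum(1 for a, b in links if (a in half) != (b in half))
    return {
        "diameter": 1,
        "total_bandwidth": len(links) * l,
        "bisection_bandwidth": crossing * l,
    }


print(complete_metrics(8))
print(complete_metrics(7))
```

With n = 7 the halves hold 3 and 4 processors, so 3 * 4 = 12 links cross the cut.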

MIMD Architectures • The architecture of an MIMD system refers to its connections with respect to system memory. • A Symmetric Multiprocessor (SMP) is a computer system that has two or more processors with comparable capabilities. • The processors are capable of performing the same functions; this is the symmetry of SMPs.

Types of SMP • Uniform Memory Access (UMA) • Nonuniform Memory Access (NUMA) • Cache Coherent NUMA (CC-NUMA) • Cache Only Memory Access (COMA)

UMA • UMA gives all CPUs equal access to all locations in shared memory. [Figure: processors 1 through n connected to shared memory through a communications mechanism]

NUMA • NUMA architectures do not give uniform access to all shared memory locations. • Each processor can access the memory module closest to it (its local shared memory) faster than the other modules, hence the nonuniform memory access times. • Example: the Cray T3E supercomputer.

[Figure: processors 1 through n, each attached to its own memory module, linked by a communications mechanism]
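The effect of nonuniform access times can be illustrated with a weighted average. The latencies and the local-access fraction below are made-up numbers chosen only for illustration; they do not describe the Cray T3E or any other real machine, and `average_access_time` is a hypothetical helper.

```python
def average_access_time(local_ns, remote_ns, local_fraction):
    """Expected memory access time when some accesses stay in local memory.

    local_ns and remote_ns are hypothetical latencies; local_fraction is the
    share of accesses (0.0 to 1.0) served by the processor's own module.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Illustrative values only: 100 ns local, 300 ns remote.
for fraction in (1.0, 0.75, 0.5):
    print(fraction, average_access_time(100, 300, fraction))
```

The average rises as fewer accesses stay local, which is why data placement matters on a NUMA machine.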

Cache Coherent NUMA (CC-NUMA) • It is similar to the NUMA architecture. • In addition, each processor includes cache memory. • Example: Silicon Graphics’ SGI systems.

Cache Only Memory Access (COMA) • In this architecture, each processor’s local memory is treated as a cache. • Examples: 1) Kendall Square Research’s KSR1 and KSR2. • 2) The Swedish Institute of Computer Science’s Data Diffusion Machine (DDM).

Multicomputer • Network of Workstations (NOW) or Cluster of Workstations (COW): • NOWs and COWs are more than a group of workstations on a local area network (LAN). • They have a master scheduler, which matches tasks and processors together.
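The slide does not say how a master scheduler matches tasks to processors. As one possible illustration, the sketch below uses a simple greedy policy that hands each task, largest first, to the workstation with the least work so far. The policy, the task costs, and the name `assign_tasks` are all assumptions, not a description of any real NOW or COW scheduler.

```python
import heapq


def assign_tasks(task_costs, workstations):
    """Greedy matching: each task goes to the currently least-loaded workstation.

    task_costs maps a task name to an estimated cost; workstations is a list
    of workstation names. Both are hypothetical inputs for illustration.
    """
    # Min-heap of (current load, workstation name).
    heap = [(0, name) for name in workstations]
    heapq.heapify(heap)
    assignment = {name: [] for name in workstations}
    # Place the most expensive tasks first so the loads even out.
    for task, cost in sorted(task_costs.items(), key=lambda item: -item[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(task)
        heapq.heappush(heap, (load + cost, name))
    return assignment


tasks = {"a": 4, "b": 3, "c": 2, "d": 1}
print(assign_tasks(tasks, ["w1", "w2"]))
```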

Massively Parallel Processor (MPP) • Consists of many self-contained nodes, each having a processor, memory, and hardware for implementing internal communications. • The processors communicate with each other using shared memory. • Example: IBM’s Blue Gene.

