Computer Architecture Guidance
Keio University, AMANO, Hideharu, hunga@am.ics.keio.ac.jp
Contents
Techniques of two key architectures for future system LSIs:
• Parallel Architectures
• Reconfigurable Architectures
• Advanced uni-processor architecture → Special Course of Microprocessors (by Prof. Yamasaki, Fall term)
Class
• Lecture using PowerPoint (70 mins.)
• The ppt file is uploaded to the web site http://www.am.ics.keio.ac.jp, and you can download/print it before the lecture.
• When the file is uploaded, a notification message is sent to you by e-mail.
• Textbook: “Parallel Computers” by H. Amano (Sho-ko-do)
• Exercise (20 mins.): a simple design or calculation on design issues
Evaluation • Exercise on Parallel Programming using SCore on RHiNET-2 (50%) • Exercise after every lecture (50%)
Computer Architecture 1: Introduction to Parallel Architectures
Keio University, AMANO, Hideharu, hunga@am.ics.keio.ac.jp
Parallel Architecture A parallel architecture consists of multiple processing units which work simultaneously. • Purposes • Classifications • Terms • Trends
Boundary between Parallel Machines and Uniprocessors
• Uniprocessors: ILP (Instruction Level Parallelism) — a single program counter; parallelism inside/between instructions
• Parallel machines: TLP (Thread Level Parallelism) — multiple program counters; parallelism between processes and jobs
This definition follows Hennessy & Patterson’s Computer Architecture: A Quantitative Approach.
Performance Improvement vs. Tight Coupling
Two directions of development converge on on-chip implementation:
• Increasing the number of simultaneously issued instructions: single pipeline → multiple instruction issue → multiple thread execution → on-chip implementation
• Tighter coupling of multiple processors: connecting multiple processors → shared memory, shared registers → on-chip implementation
Purposes of providing multiple processors
• Performance: a job can be executed quickly with multiple processors
• Fault tolerance: if a processing unit is damaged, the total system remains available (redundant systems)
• Resource sharing: multiple jobs share memory and/or I/O modules for cost-effective processing (distributed systems)
• Low power: high performance with low-frequency operation
Parallel architecture: performance centric!
Flynn’s Classification
Based on the number of instruction streams, M (Multiple) / S (Single), and the number of data streams, M/S:
• SISD: uniprocessors (including superscalar and VLIW)
• MISD: no practical machine exists (analog computers are sometimes cited)
• SIMD
• MIMD
SIMD (Single Instruction Stream, Multiple Data Streams)
• All processing units execute the same instruction
• Low degree of flexibility
• Illiac-IV/MMX type (coarse grain)
• CM-2 type (fine grain)
[diagram: a single instruction memory broadcasts each instruction to many processing units, each with its own data memory]
Two types of SIMD
• Coarse grain: each node performs floating-point numerical operations
  • ILLIAC-IV, BSP, GF-11
  • Multimedia instructions in recent high-end CPUs
  • Dedicated on-chip approach: NEC’s IMEP
• Fine grain: each node performs only operations of a few bits
  • ICL DAP, CM-2, MP-2
  • Image/signal processing
  • Connection Machines extended the application area to artificial intelligence (CmLisp)
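As a concrete illustration of the multimedia-instruction style of coarse-grain SIMD mentioned above, here is a minimal sketch in C using x86 SSE intrinsics (an assumption chosen for illustration; none of the machines above are x86). A single `_mm_add_ps` instruction adds four floating-point lanes at once:

```c
/* Minimal sketch of coarse-grain SIMD via multimedia instructions.
   Illustrative only; compile with: gcc -msse simd_add.c */
#include <stdio.h>
#include <xmmintrin.h>  /* SSE: 128-bit registers holding 4 floats */

int main(void) {
    float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float c[4];

    __m128 va = _mm_loadu_ps(a);     /* load 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vc = _mm_add_ps(va, vb);  /* one instruction adds all 4 lanes */
    _mm_storeu_ps(c, vc);

    for (int i = 0; i < 4; i++)
        printf("%.1f ", c[i]);       /* 11.0 22.0 33.0 44.0 */
    printf("\n");
    return 0;
}
```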
A processing unit of CM-2
[diagram: a 1-bit serial ALU with flags A, B, F, OP, C, a context flag, and 256-bit memory per PE]
Element of CM-2
[diagram: one LSI chip holds a 4×4 processor array (16 PEs) with 256-bit × 16 PE RAM and a router with 12 links; 4096 chips are connected in a hypercube, giving 64K PEs]
The future of SIMD
• Coarse grain SIMD
  • A large-scale supercomputer like Illiac-IV/GF-11 will not revive.
  • Multimedia instructions will continue to be used.
  • Special-purpose on-chip systems will become popular.
• Fine grain SIMD
  • Advantageous for specific applications like image processing
  • General-purpose machines are difficult to build (cf. CM-2 → CM-5)
MIMD
• Each processor executes individual instructions
• Synchronization is required
• High degree of flexibility
• Various structures are possible
[diagram: processors connected through interconnection networks to memory modules holding instructions and data]
Classification of MIMD machines: structure of shared memory
• UMA (Uniform Memory Access model) provides shared memory which can be accessed from all processors in the same manner.
• NUMA (Non-Uniform Memory Access model) provides shared memory, but it is not uniformly accessed.
• NORA/NORMA (No Remote Memory Access model) provides no shared memory; communication is done with message passing.
UMA
• The simplest structure of shared memory machine
• The natural extension of uniprocessors
• An OS extended from a single-processor OS can be used.
• Programming is easy (see the sketch below).
• System size is limited.
• Bus connected or switch connected
• A total system can be implemented on a single chip: on-chip multiprocessor / chip multiprocessor / single-chip multiprocessor
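To show why programming a UMA machine is easy, here is a minimal shared-memory sketch using POSIX threads (an illustrative assumption; it is not the lecture’s exercise environment). Every thread reads the same global array directly, with no explicit communication:

```c
/* Minimal sketch of the UMA programming model: all threads see one
   shared memory, so they communicate through ordinary global arrays.
   Compile with: gcc uma_sum.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
#define NTHREADS 4

static double data[N];               /* shared: visible to all threads */
static double partial[NTHREADS];

static void *worker(void *arg) {
    long id = (long)arg;
    long chunk = N / NTHREADS;
    double sum = 0.0;
    for (long i = id * chunk; i < (id + 1) * chunk; i++)
        sum += data[i];
    partial[id] = sum;               /* each thread writes its own slot */
    return NULL;
}

int main(void) {
    pthread_t th[NTHREADS];
    for (long i = 0; i < N; i++) data[i] = 1.0;

    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&th[t], NULL, worker, (void *)t);

    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(th[t], NULL);
        total += partial[t];
    }
    printf("sum = %.0f\n", total);   /* 1000000 */
    return 0;
}
```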
An example of UMA: bus connected
[diagram: PUs, each with a snoop cache, attached to a shared bus together with the main memory]
SMP (Symmetric MultiProcessor), on-chip multiprocessor
Switch connected UMA
[diagram: CPUs with local memories attached through interfaces to a switch, which connects to the main memory]
The gap between switch and bus is becoming small.
NUMA
• Each processor has a local memory and accesses other processors’ memories through the network.
• Address translation and cache control often make the hardware structure complicated.
• Scalable: programs for UMA can run without modification, and performance improves with system size.
• Competitive with WS/PC clusters running software DSM.
Typical structure of NUMA
[diagram: nodes 0–3 connected by an interconnection network; the logical address space is divided into regions 0–3, one per node]
Classification of NUMA
• Simple NUMA: remote memory is not cached; simple structure, but the access cost of remote memory is large.
• CC-NUMA (Cache-Coherent NUMA): cache consistency is maintained by hardware; the structure tends to be complicated.
• COMA (Cache Only Memory Architecture): no home memory; data migrates to the node that uses it; complicated control mechanism.
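The following toy sketch illustrates the partitioned logical address space of a simple NUMA shown two slides above. The four-node layout and the contiguous per-node mapping are assumptions made for illustration, not any real machine’s scheme:

```c
/* Toy sketch of a simple NUMA's partitioned logical address space
   (cf. the "Typical structure of NUMA" figure). Assumption: the space
   is divided into four equal contiguous regions, one per node. */
#include <stdio.h>
#include <stdint.h>

#define NODES       4
#define NODE_MEM_SZ 0x40000000ULL   /* 1 GiB of local memory per node (assumed) */

typedef struct { int node; uint64_t offset; } location_t;

/* Translate a global (logical) address into (home node, local offset). */
static location_t translate(uint64_t addr) {
    location_t loc;
    loc.node   = (int)(addr / NODE_MEM_SZ);
    loc.offset = addr % NODE_MEM_SZ;
    return loc;
}

int main(void) {
    uint64_t addr = 0x80000000ULL + 0x1234;  /* falls in node 2's region */
    location_t loc = translate(addr);
    printf("addr 0x%llx -> node %d, offset 0x%llx\n",
           (unsigned long long)addr, loc.node,
           (unsigned long long)loc.offset);
    /* An access to this address from node 0 is a remote access; in a
       simple NUMA it is not cached and is therefore expensive. */
    return 0;
}
```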
Cray’s T3D: A simple NUMA supercomputer (1993) • Using Alpha 21064
SGI Origin: bristled hypercube
[diagram: main memory connected directly to the Hub chip, which attaches to the network]
One cluster consists of 2 PEs.
SGI’s CC-NUMA Origin 3000 (2000)
• Using R12000
DDM (Data Diffusion Machine): an example of COMA
[diagram: hierarchical structure; details not recoverable]
NORA/NORMA
• No shared memory
• Communication is done with message passing
• Simple structure but high peak performance
• Hard to program
• Inter-PU communication is the key issue in cluster computing
• The fastest machine is almost always a NORA (The Earth Simulator is the exception)
Fujitsu’s NORA: AP1000 (1990)
• Mesh connection
• SPARC
Intel’s Paragon XP/S (1991)
• Mesh connection
• i860
PC Cluster
• Beowulf cluster (NASA’s Beowulf Project, 1994, led by Sterling)
  • Commodity components
  • TCP/IP
  • Free software
• Others
  • Commodity components
  • High-performance networks like Myrinet / Infiniband
  • Dedicated software
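As a taste of message-passing programming on a NORA machine, here is a minimal sketch using MPI, the de facto standard library on Beowulf-style clusters (the exercises in this class use SCore on RHiNET-2 instead; this example is only illustrative):

```c
/* Minimal message-passing sketch: with no shared memory, rank 0 must
   explicitly send data that rank 1 explicitly receives.
   Typical build/run: mpicc ping.c -o ping && mpirun -np 2 ./ping */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}
```

Contrast this with the UMA sketch earlier: here every data exchange is an explicit send/receive pair, which is why NORA machines are simple to build but hard to program.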
Terms (1)
• Multiprocessors: MIMD machines with shared memory
• Strict definition (by Enslow Jr.): shared memory, shared I/O, distributed OS, homogeneous
• Extended definition: all parallel machines (wrong usage)
Terms (2)
• Multicomputer: MIMD machines without shared memory, i.e., NORA/NORMA
• Array computer: a machine consisting of an array of processing elements, i.e., SIMD; also misused for a supercomputer for array calculation (wrong usage)
• Loosely coupled / tightly coupled: loosely coupled = NORA, tightly coupled = UMA. But are NORAs really loosely coupled?? Don’t use these terms if possible.
Classification
• Stored programming based
  • SIMD: fine grain / coarse grain
  • MIMD
    • Multiprocessors: UMA (bus connected, switch connected), NUMA (simple NUMA, CC-NUMA, COMA)
    • Multicomputers: NORA
• Others: systolic architecture, data flow architecture, mixed control, demand driven architecture
Systolic Architecture
[diagram: data streams x and y flow through a computational array]
Data streams are inserted into an array of special-purpose computing nodes in a certain rhythm. Introduced further in the Reconfigurable Architectures lectures.
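The rhythm of a systolic array can be imitated in software. The following toy simulation is a sketch only: the 3-tap FIR filter, weights, and sizes are assumptions for illustration. An input stream flows through an array of cells, one cell per beat, and each cell holds a fixed weight:

```c
/* Toy simulation of a 1-D systolic array computing a 3-tap FIR filter:
   y[n] = w0*x[n] + w1*x[n-1] + w2*x[n-2].
   Each cell holds one fixed weight; the input stream x advances one
   cell per "beat", and each cell contributes its product to the sum. */
#include <stdio.h>

#define TAPS 3
#define LEN  8

int main(void) {
    double w[TAPS] = {0.5, 0.3, 0.2};   /* weights resident in the cells */
    double x[LEN]  = {1, 2, 3, 4, 5, 6, 7, 8};
    double cell_x[TAPS] = {0};          /* x value currently held in each cell */

    for (int n = 0; n < LEN; n++) {
        /* one beat: shift the input stream one cell to the right */
        for (int k = TAPS - 1; k > 0; k--)
            cell_x[k] = cell_x[k - 1];
        cell_x[0] = x[n];

        /* each cell multiplies its weight by its current x value */
        double y = 0.0;
        for (int k = 0; k < TAPS; k++)
            y += w[k] * cell_x[k];
        printf("y[%d] = %.2f\n", n, y);
    }
    return 0;
}
```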
Data flow machines
A process is driven by the data.
[diagram: dataflow graph with two + nodes and two × nodes computing (a+b)×(c+(d×e))]
Also introduced further in Reconfigurable Systems.
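Here is a toy sketch of the data-driven firing rule: each node fires as soon as both of its operand tokens have arrived, with no program counter imposing an order. The graph is the slide’s example, (a+b)×(c+(d×e)); the token-passing helper is purely hypothetical:

```c
/* Toy sketch of data-driven execution: a node "fires" (computes) only
   when both operand tokens are present, regardless of program order. */
#include <stdio.h>

typedef struct {
    double val[2];
    int    arrived;   /* how many operand tokens have arrived so far */
    char   op;        /* '+' or '*' */
} node_t;

/* Deliver one operand token; fire the node when both are present. */
static int send_token(node_t *n, double v, double *result) {
    n->val[n->arrived++] = v;
    if (n->arrived < 2) return 0;                  /* still waiting */
    *result = (n->op == '+') ? n->val[0] + n->val[1]
                             : n->val[0] * n->val[1];
    return 1;                                      /* fired */
}

int main(void) {
    double a = 1, b = 2, c = 3, d = 4, e = 5, t, u, y;
    node_t mul1 = {.op = '*'}, add1 = {.op = '+'},
           add2 = {.op = '+'}, mul2 = {.op = '*'};

    send_token(&mul1, d, &t); send_token(&mul1, e, &t);  /* d*e  = 20 */
    send_token(&add1, c, &u); send_token(&add1, t, &u);  /* c+20 = 23 */
    send_token(&add2, a, &t); send_token(&add2, b, &t);  /* a+b  = 3  */
    send_token(&mul2, t, &y); send_token(&mul2, u, &y);  /* 3*23 = 69 */
    printf("(a+b)*(c+(d*e)) = %.0f\n", y);
    return 0;
}
```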
Exercise 1
• In this class, a PC cluster, RHiNET, is used for the parallel programming exercises. Is RHiNET classified as a Beowulf cluster? Why or why not?
• If you take this class, send your answer with your name and student number to hunga@am.ics.keio.ac.jp