CS61V Parallel Architectures II
Computing Components
How did we learn to fly? By constructing a machine that flaps its wings like a bird?
Answer: by applying the aerodynamic principles that nature demonstrates.
Likewise, we model parallel processing on the way biological species process information.
Motivating Factors
• The aggregate speed with which neurons carry out complex calculations is high.
• The response of an individual neuron is slow (measured in ms).
This demonstrates the feasibility of parallel processing.
Computing Components
[Figure: a multi-processor computing system drawn as layers. Labels: Applications, Programming paradigms, Threads Interface, Operating System, Microkernel, Hardware, and a row of processors (P). Legend: Processor, Process, Thread.]
Processing Elements
Simple classification by Flynn (by number of instruction and data streams):
• SISD - conventional
• SIMD - data parallel, vector computing
• MISD -
• MIMD - very general, multiple approaches
Current focus is on the MIMD model, using general-purpose processors.
SISD: A Conventional Computer
[Figure: a single Processor fed by Instructions, with Data Input and Data Output.]
Speed is limited by the rate at which the computer can transfer information internally.
Examples: PC, Macintosh, workstations
The MISD Architecture
[Figure: Processors A, B and C, each driven by its own instruction stream (A, B, C), between one Data Input Stream and one Data Output Stream.]
More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.
SIMD Architecture
[Figure: a single Instruction Stream drives Processors A, B and C; each processor has its own data input stream and data output stream (A, B, C).]
Examples: Cray vector-processing machines, Thinking Machines CM, Intel MMX (multimedia support)
MIMD Architecture
[Figure: Processors A, B and C, each with its own instruction stream, data input stream and data output stream (A, B, C).]
Unlike SIMD, a MIMD computer works asynchronously.
• Shared memory (tightly coupled) MIMD
• Distributed memory (loosely coupled) MIMD
Shared Memory MIMD machine
[Figure: Processors A, B and C, each connected by a memory bus to one Global Memory System.]
Communication: the source PE writes data to global memory (GM) and the destination PE retrieves it.
• Easy to build; conventional SISD operating systems can easily be ported.
• Limitation: reliability and expandability. A failure of a memory component or of any processor affects the whole system.
• Increasing the number of processors leads to scalability problems.
Examples: Silicon Graphics supercomputers....
SMM Examples
• Dual and quad Pentiums
• Power Mac G5s
  • Dual processor (2 GHz each)
Quad Pentium Shared Memory Multiprocessor
[Figure: four processors, each with its own L1 cache, L2 cache and bus interface, attached to a common processor/memory bus. Also on the bus: a memory controller leading to the shared memory, and an I/O interface leading to the I/O bus.]
Shared memory
• Any memory location is accessible by any of the processors.
• A single address space exists, meaning that each memory location is given a unique address within a single range of addresses.
• Shared memory programming is generally more convenient, although it requires the programmer to control access to shared data.
• Inter-process communication is done in the memory interface through reads and writes.
• A virtual memory address maps to a real address.
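The point about programmer-controlled access to shared data can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: threads stand in for processors, a dictionary entry stands in for a shared memory location, and the names `run_shared_counter`, `shared` and `worker` are hypothetical.

```python
import threading


def run_shared_counter(num_threads=4, increments=10_000):
    """Several threads update one shared location, guarded by a lock."""
    shared = {"counter": 0}      # one location, visible to every thread
    lock = threading.Lock()      # access control supplied by the programmer

    def worker():
        for _ in range(increments):
            with lock:           # one thread at a time may read-modify-write
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


if __name__ == "__main__":
    print(run_shared_counter())
```

Communication here is nothing more than reads and writes to the same location; the lock is what keeps concurrent updates from being lost.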
Shared Memory Address Space
• Different processors may have memory locally attached to them.
• Different memory accesses could take different amounts of time. Collisions are possible.
• UMA (i.e., shared memory) vs. NUMA (i.e., distributed shared memory)
Building Shared Memory systems
Building SMM machines with more than 4 processors is very difficult and very expensive, e.g. the Sun Microsystems E10000 “Starfire” server:
• 64 processors
• Price: several million US dollars
Distributed Memory MIMD
[Figure: Processors A, B and C, each connected by a memory bus to its own memory system (A, B, C); the processors are linked to one another by IPC channels.]
• Communication: IPC over a high-speed network.
• Network can be configured to ... Tree, Mesh, Cube, etc.
• Unlike shared memory MIMD:
  • easily/readily expandable
  • highly reliable (a CPU failure does not affect the whole system)
Distributed Memory
Decentralized memory (memory module with CPU)
• Lower memory latency
Drawbacks
• Longer communication latency
• Software model more complex
Decentralized Memory versions
Message passing "multi-computer" with a separate address space per processor
• Can invoke software with Remote Procedure Call (RPC)
• Often via a library, such as MPI (Message Passing Interface)
• Also called "synchronous communication", since communication causes synchronization between 2 processes
Message Passing System
• Inter-process communication is done at the program level using sends and receives.
• Reads and writes refer only to a processor’s local memory.
• Data can be packed into long messages before being sent, to compensate for latency.
• Global scheduling of messages can help avoid message collisions.
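The send/receive style can be sketched as follows. This is a minimal illustration under stated assumptions, not an MPI program: Python threads stand in for processors, `queue.Queue` objects stand in for IPC channels, and the names `worker` and `message_passing_sum` are hypothetical. Threads in one process do share memory, so the sketch simply follows the discipline that each worker touches only the data it received.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """Receive one packed message, work on a private copy, send one reply."""
    chunk = inbox.get()               # receive: blocks until a message arrives
    local = list(chunk)               # "local memory": private to this worker
    outbox.put((rank, sum(local)))    # send: the result travels as a message


def message_passing_sum(data, num_workers=3):
    """Sum `data` by sending each worker one message and collecting replies."""
    inboxes = [queue.Queue() for _ in range(num_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(num_workers)
    ]
    for t in threads:
        t.start()
    # Pack each worker's whole share into one long message, not many short ones.
    for r in range(num_workers):
        inboxes[r].put(data[r::num_workers])
    partials = dict(results.get() for _ in range(num_workers))
    for t in threads:
        t.join()
    return sum(partials.values())


if __name__ == "__main__":
    print(message_passing_sum(list(range(10))))
```

Sending one message per worker, rather than one per element, mirrors the slide's advice to pack data into long messages so that per-message latency is paid only once.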
MIMD program structure
• Multiple Program Multiple Data (MPMD): each processor has its own program to execute.
• Single Program Multiple Data (SPMD): a single source program is written, and each processor executes its own copy of the program.
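A rough sketch of the SPMD structure: one function plays the role of the single source program, and every "processor" runs it with a different rank, which selects the slice of data it works on. The thread pool standing in for processors and the names `program`, `run_spmd`, `rank` and `size` are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def program(rank, size, data):
    """The single program; each copy picks its own slice of data from its rank."""
    lo = rank * len(data) // size
    hi = (rank + 1) * len(data) // size
    return sum(x * x for x in data[lo:hi])


def run_spmd(data, size=4):
    """Run `size` copies of the same program and combine their partial results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda r: program(r, size, data), range(size)))
    return sum(partials)


if __name__ == "__main__":
    print(run_spmd(list(range(8))))
```

An MPMD version would instead hand a different function to each worker.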
Speedup factor S(n) gives the increase in speed obtained by using a multiprocessor:
$$S(n) = \frac{\text{Execution time on a single processor}}{\text{Execution time on a multiprocessor with } n \text{ processors}}$$
The speedup factor can also be cast in terms of computational steps:
$$S(n) = \frac{\text{Number of steps using one processor}}{\text{Number of parallel steps using } n \text{ processors}}$$
Maximum speedup is n with n processors (linear speedup); this theoretical limit is not always achieved.
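As a small worked example of the step-count form of S(n), consider summing 16 numbers. This scenario is an assumption for illustration, not taken from the slides: one processor needs 15 additions, while 8 processors adding pairs in a tree need 4 parallel steps. The helper names `speedup` and `tree_sum_steps` are hypothetical.

```python
def speedup(single, parallel):
    """S(n) = (time or steps on one processor) / (time or steps on n processors)."""
    if parallel <= 0:
        raise ValueError("parallel cost must be positive")
    return single / parallel


def tree_sum_steps(m):
    """Parallel steps to sum m numbers pairwise, assuming m // 2 processors."""
    return (m - 1).bit_length()   # equals ceil(log2(m)) for m >= 1


if __name__ == "__main__":
    m = 16
    sequential_steps = m - 1                 # 15 additions on one processor
    parallel_steps = tree_sum_steps(m)       # 4 steps on 8 processors
    print(speedup(sequential_steps, parallel_steps))
```

Here S(8) = 15 / 4 = 3.75, well short of the linear limit of 8, because fewer processors have useful work at each later step of the tree.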
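Maximum Speedup - Amdahl’s Law
[Figure: on one processor, the total time t_s is split into a serial section f t_s and parallelizable sections (1-f) t_s. On n processors the parallelizable part shrinks to (1-f) t_s / n, giving a total time t_p.]
$$t_p = f t_s + \frac{(1-f)\, t_s}{n}$$
$$S(n) = \frac{t_s}{t_p} = \frac{n}{1 + f(n-1)}$$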
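To get a feel for the numbers, Amdahl's formula S(n) = n / (1 + f(n-1)) can be evaluated directly, with f the serial fraction of the single-processor time, as in the slide's figure. The function name `amdahl_speedup` and the sample values are illustrative assumptions.

```python
def amdahl_speedup(f, n):
    """Amdahl's law: S(n) = n / (1 + f * (n - 1)), with f the serial fraction."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return n / (1 + f * (n - 1))


if __name__ == "__main__":
    # Hypothetical workload in which 5% of the time is serial.
    for n in (1, 4, 16, 64, 1024):
        print(n, round(amdahl_speedup(0.05, n), 2))
```

As n grows, S(n) approaches 1/f, so a 5% serial fraction caps the speedup below 20 however many processors are added.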
Parallel Architectures
• Function-parallel architectures
  • Instruction level PAs
  • Thread level PAs
  • Process level PAs (MIMDs)
    • Distributed Memory MIMD
    • Shared Memory MIMD
• Data-parallel architectures