Parallel Processing I’ve gotta spend at least 10 hours studying for the IT 344 final! I’m going to study with 9 friends… we’ll be done in an hour.
Next up: TIPS (tera-instructions per second) • Mega- = 10^6, Giga- = 10^9, Tera- = 10^12, Peta- = 10^15 • BOPS, anyone? • Light travels about 1 ft per nanosecond (10^-9 s) in free space. • A terahertz uniprocessor could have no clock-to-clock path longer than 300 microns… • We already know of problems that require more than a TIP (simulations of weather, weapons, brains)
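The two light-travel figures on this slide can be checked with a few lines of arithmetic. This is a minimal sketch; the function name `max_path_microns` is made up for illustration, and the only physical input is the speed of light in free space.

```python
# Speed of light in free space, in metres per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def max_path_microns(clock_hz: float) -> float:
    """Distance light covers in one clock period, in microns."""
    period_s = 1.0 / clock_hz
    return SPEED_OF_LIGHT_M_PER_S * period_s * 1e6


# Distance covered in one nanosecond, converted to feet (1 ft = 0.3048 m).
one_ns_feet = SPEED_OF_LIGHT_M_PER_S * 1e-9 / 0.3048

# Upper bound on any clock-to-clock path at 1 THz.
terahertz_microns = max_path_microns(1e12)

print(f"{one_ns_feet:.2f} ft per ns")
print(f"{terahertz_microns:.0f} microns per 1 THz clock period")
```

Real signals on a chip travel slower than light in free space, so the usable path would be shorter still.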
Solution: Parallelism • Pipelining – reasonable for a small number of stages (5-10), after that bypassing and stalls become unmanageable. • Superscalar – replicate data paths and design control logic to discover parallelism in traditional programs. • Explicit parallelism – must learn how to write programs that run on multiple CPUs.
Superscalar – How far can it go? • Multiple functional units (ALUs, Addr, Floating point, etc.) • Instruction dispatch • Dynamic scheduling • Pipelines • Speculative execution
Explicit Parallelism • Distributed • Transaction-oriented • Geographically dispersed locations • E.g. SETI@home • Parallel • Single goal computing • Computing intense and/or data-intense • High-speed data exchange • Often on custom hardware • E.g. Geochemical surveys
Challenges • For distributed processing, parallelism is given and usually cannot easily change. Programming is relatively easy. • For parallel processing, the programmer defines parallelism by partitioning the serial program(s). Parallel programming in general is more difficult than transaction applications.
Other vocabulary • Decomposition • The way that a program can be broken up for parallel processing • Coarse-grain • Breaks into big chunks (fewer processors) • SMP • Distributed (often) • Fine-grain • Breaks into small chunks (more processors) • Image processing
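As a rough illustration of decomposition, the same list of work items can be cut into a few big chunks or many small ones. This is only a sketch: `decompose` is a hypothetical helper, the 16 integers stand in for independent work items such as pixels, and the chunks are processed serially here rather than on separate processors.

```python
def decompose(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


pixels = list(range(16))        # stand-in for 16 independent work items

coarse = decompose(pixels, 8)   # 2 big chunks: suits a few processors
fine = decompose(pixels, 2)     # 8 small chunks: suits many processors

# Each chunk could be handed to its own worker. Here the chunks are summed
# one after another, then the partial results are combined.
partial_sums = [sum(chunk) for chunk in fine]
total = sum(partial_sums)

print(len(coarse), len(fine), total)
```

The combined result matches the serial answer regardless of grain size; what changes is how many workers can be kept busy and how much combining work is left at the end.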
Inter-processor communications • A spectrum from loosely-coupled to tightly-coupled • Systems shown: custom supercomputers, distributed processors, Beowulf clusters
More Terminology • SIMD (Single Instruction Multiple Data) • MIMD (Multiple Instruction Multiple Data) • MISD (Multiple Instruction Single Data; pipeline)
SIMD • Same instruction executed in multiple units, on different data • Examples: Vector processors, AltiVec • Diagram: one instruction I applied to each of the data items D1, D2, D3, D4
MIMD • Each unit executes its own instruction on its own data • Examples: Mercury, Beowulf, etc. • Diagram: instructions I1, I2, I3, I4 applied to data items D1, D2, D3, D4 respectively
MISD (pipeline) • Diagram: data items D1, D2, D3, D4 pass in turn through instructions I1, I2, I3, I4
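The three slide diagrams differ only in how instructions are paired with data, which can be modeled in a few lines. This is a conceptual sketch, not a parallel implementation: everything runs serially, and the function names and the four toy instructions are invented for illustration.

```python
def simd(instruction, data):
    """One instruction applied to every data item."""
    return [instruction(d) for d in data]


def mimd(instructions, data):
    """Each unit applies its own instruction to its own data item."""
    return [instr(d) for instr, d in zip(instructions, data)]


def misd_pipeline(instructions, data):
    """Every data item flows through the same sequence of stages."""
    results = []
    for d in data:
        for stage in instructions:
            d = stage(d)
        results.append(d)
    return results


def double(x): return x * 2
def inc(x): return x + 1
def square(x): return x * x
def negate(x): return -x


data = [1, 2, 3, 4]
simd_out = simd(double, data)                         # [2, 4, 6, 8]
mimd_out = mimd([double, inc, square, negate], data)  # [2, 3, 9, -4]
misd_out = misd_pipeline([double, inc], data)         # [3, 5, 7, 9]
```

On real hardware the SIMD lanes, MIMD units, and pipeline stages would all be active at the same time; the sketch only shows which instruction touches which data.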
Distributed Programming Tools • C/C++ with TCP/IP • Perl with TCP/IP • Java • CORBA • ASP • .NET
Parallel Programming Tools • PVM • MPI • Synergy • Others (proprietary hardware)
Parallel Programming Difficulties • Program partition and allocation • Data partition and allocation • Program (process) synchronization • Data access mutual exclusion • Dependencies • Process(or) failures • Scalability…
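Data access mutual exclusion, one of the difficulties listed above, can be sketched with threads sharing a counter. This is an illustrative example using Python's standard `threading` module, not a tool named in the slides; the worker count and iteration count are arbitrary.

```python
import threading

counter = 0                 # shared data that every worker updates
lock = threading.Lock()     # guards access to counter


def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global counter
    for _ in range(iterations):
        with lock:          # only one thread may be in this block at a time
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                # synchronization: wait for every worker to finish

print(counter)
```

Without the lock, the read-modify-write on `counter` is not guaranteed to be atomic, so updates from different threads could be lost.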
Software techniques • Shared Memory Buffers: areas of memory that any node can read or write • Sockets: provide full-duplex message passing between processes • Semaphores and Spinlocks: provide locking and synchronization functions • Mailbox Interrupts: provide an interrupt-driven communication mechanism • Direct Memory Access: provides asynchronous shared memory buffer I/O
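Full-duplex message passing over sockets can be sketched with a connected socket pair, where each end both sends and receives. This is a minimal illustration using Python's standard `socket` module; a thread stands in for the second process, and the message contents and the `echo_upper` helper are invented for the example.

```python
import socket
import threading


def echo_upper(conn):
    """Receive one message, send back an upper-cased reply, then close."""
    data = conn.recv(1024)
    conn.sendall(data.upper())
    conn.close()


# A pair of already-connected sockets; either end can send and receive.
left, right = socket.socketpair()

peer = threading.Thread(target=echo_upper, args=(right,))
peer.start()

left.sendall(b"partial result")   # message travels left -> right
reply = left.recv(1024)           # reply travels right -> left

peer.join()
left.close()

print(reply)
```

Between real nodes the two ends would be created with `connect` and `accept` over TCP/IP instead of `socketpair`, but the send and receive calls look the same.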
What it really looks like Note: this computer would rank well on www.top500.org
Summary • Prospects for future CPU architectures: • Pipelining - Well understood, but mined-out • Superscalar - Nearing its practical limits • SIMD - Limited use for special applications • VLIW - Returns control to software. The future? • Prospects for future Computer System architectures: • SMP - Limited scalability. Harder than it appears. • MIMD/message-passing - It’s been the future for over 20 years now. How to program?