Introduction • What Is a Parallel Algorithm? • Why Parallel Algorithms? • Evolution and Convergence of Parallel Algorithms • Fundamental Design Issues
What Is a Parallel Algorithm? • An algorithm that uses multiple hardware and software components at the same time. • Levels of parallelism • Circuits -- VLSI arithmetic • Functional units -- instruction level • Processors • Processes • Tasks (jobs) • Data • Closely related to parallel architectures and computation models
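As a rough illustration of the data level of parallelism listed above, the sketch below splits a list into chunks, lets each worker sum one chunk, and then combines the partial sums. The names `chunked` and `parallel_sum` are hypothetical and not from the slides. It uses threads only to show the decomposition; in standard CPython, threads are not expected to speed up a CPU-bound loop like this one.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum(data, n_workers=4):
    """Sum data by giving each worker one chunk, then combining the results."""
    chunks = chunked(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Independent step: each chunk is summed without talking to the others.
        partial_sums = list(pool.map(sum, chunks))
    # Serial step: the partial sums are combined on one worker.
    return sum(partial_sums)


if __name__ == "__main__":
    values = list(range(1, 1001))
    print(parallel_sum(values))
```

The final `sum(partial_sums)` is the part that stays sequential no matter how many workers are used.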
What is Parallel Architecture? • A parallel computer is a collection of processing elements that cooperate to solve problems fast • Issues: • Resource • how large a collection? • how powerful are the processing elements? • Communication and Synchronization • how do the elements cooperate and communicate? • how are data transmitted between processors? • what are the abstractions and primitives for cooperation? • Performance and Scalability • how does it all translate into performance? • how does it scale?
Inevitability of Parallel Computing • Application demands: demand for computing cycles • Scientific computing: CFD, Biology, Chemistry, Physics, ... • General-purpose computing: Video, Graphics, CAD, Databases, TP, ... • Technology Trends • Number of transistors on chip growing rapidly • Clock rates expected to go up only slowly • Architecture Trends • Instruction-level parallelism valuable but limited • Coarser-level parallelism, as in MPs, the most viable approach • Economics • Current trends: • Today’s microprocessors have multiprocessor support • Servers and workstations becoming MP: Sun, SGI, DEC, COMPAQ, ... • Tomorrow’s microprocessors are multiprocessors
Driving Force of Parallel Architectures • Application Trends • Applications provide wish lists (Grand Challenge Problems): Weather Simulation, Data Mining, N-body • Some problems require teraflops (10^12) to petaflops (10^15) • VLSI Technology Trends • Smaller minimum feature size (transistor size) • Increasing die size => transistor count doubles every 3 years • Architectural Trends • Trends in computer performance: annual improvement factor of 1.35 before 1985, 1.85 after 1986 (RISC) • Programming Model Convergence • Economics
Application Trends • Transition to parallel computing has occurred for scientific and engineering computing • In rapid progress in commercial computing • Database and transactions as well as financial • Usually smaller-scale, but large-scale systems also used • Desktop also uses multithreaded programs, which are a lot like parallel programs • Demand for improving throughput on sequential workloads • Greatest use of small-scale multiprocessors • Solid application demand exists and will increase
General Technology Trends • Microprocessor performance increases 50% - 100% per year • Transistor count doubles every 3 years • DRAM size quadruples every 3 years • Huge investment per generation is carried by huge commodity market • Not that single-processor performance is plateauing, but that parallelism is a natural way to improve it. [Figure: integer and FP performance of microprocessors, 1987 to 1992: Sun 4/260, MIPS M/120, MIPS M2000, IBM RS6000/540, HP 9000/750, DEC Alpha]
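To get a feel for what these rates mean over a decade, the sketch below compounds them. The helper names `growth_factor` and `annual_rate_from_period` are hypothetical, and the only assumption is that each rate holds steadily from year to year.

```python
def growth_factor(annual_rate, years):
    """Total improvement after `years` at a fixed annual multiplier."""
    return annual_rate ** years


def annual_rate_from_period(factor, period_years):
    """Annual multiplier equivalent to growing by `factor` every `period_years`."""
    return factor ** (1.0 / period_years)


if __name__ == "__main__":
    decade = 10

    # Microprocessor performance: 50% to 100% per year.
    for rate in (1.5, 2.0):
        print(f"{rate:.2f}x per year -> {growth_factor(rate, decade):.0f}x per decade")

    # Transistor count doubles every 3 years; DRAM size quadruples every 3 years.
    for name, factor in (("transistors", 2.0), ("DRAM size", 4.0)):
        yearly = annual_rate_from_period(factor, 3)
        print(f"{name}: {yearly:.2f}x per year, "
              f"{growth_factor(yearly, decade):.0f}x per decade")
```

Under that assumption, even the low end of the performance range compounds to well over an order of magnitude per decade, while transistor count alone grows by roughly one order of magnitude.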
Technology: A Closer Look • Basic advance is decreasing feature size (λ) • Circuits become either faster or lower in power • Die size is growing too • Clock rate improves roughly proportional to improvement in λ • Number of transistors improves like λ^2 (or faster) • Performance > 100x per decade; clock rate 10x, rest transistor count • How to use more transistors? • Parallelism in processing • multiple operations per cycle reduces CPI • Locality in data access • avoids latency and reduces CPI • also improves processor utilization • Both need resources, so tradeoff • Fundamental issue is resource distribution, as in uniprocessors [Figure: processor (Proc), cache ($), and interconnect]
Economics • Commodity microprocessors not only fast but CHEAP • Development cost is tens of millions of dollars (5-100 typical) • BUT, many more are sold compared to supercomputers • Crucial to take advantage of the investment, and use the commodity building block • Exotic parallel architectures no more than special-purpose • Multiprocessors being pushed by software vendors (e.g. database) as well as hardware vendors • Standardization by Intel makes small, bus-based SMPs commodity • Desktop: few smaller processors versus one larger one? • Multiprocessor on a chip
Challenge of Parallel Processing • Microprocessors and memory are cheap. • Connect bunches of them together. • Use them either asynchronously or synchronously. • Example: • each board: 100 microprocessors of 10 MFLOPS each • each system: 100 boards • Speed: 100 GFLOPS (peak) • Reasons this is hard in practice: • Cache coherency • Extra design time for bus and interface • Compiler may not find parallelism easily • Communication and I/O complexity • Economics • Amdahl's Law • Writing parallel programs is not easy
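The peak figure in the example is plain multiplication, sketched below. The constants come from the slide; the `efficiency` parameter is a hypothetical knob added here only to show that the listed obstacles reduce what is actually delivered, and the value 0.1 is an arbitrary illustration, not a measurement.

```python
MFLOPS_PER_PROCESSOR = 10     # from the slide
PROCESSORS_PER_BOARD = 100    # from the slide
BOARDS_PER_SYSTEM = 100       # from the slide


def peak_gflops():
    """Peak rate if every processor runs flat out with no overhead."""
    total_mflops = MFLOPS_PER_PROCESSOR * PROCESSORS_PER_BOARD * BOARDS_PER_SYSTEM
    return total_mflops / 1000.0


def delivered_gflops(efficiency):
    """Delivered rate for a hypothetical efficiency between 0 and 1."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return peak_gflops() * efficiency


if __name__ == "__main__":
    print(f"peak: {peak_gflops():.0f} GFLOPS")
    # 0.1 is an arbitrary illustrative value, not a measured one.
    print(f"at 10% efficiency: {delivered_gflops(0.1):.0f} GFLOPS")
```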
Amdahl's Law • Speedup S is bounded by S < 1 / (f + (1 - f)/p), where p is the number of PEs and f is the serial fraction of the program. • Example: if 10% of a program must run sequentially, the maximum speedup is 10. • To achieve S = 80 with p = 100, about 99.75% of the code must be parallel. • How to overcome the law? • Increase the problem size: scaled speedup. • Develop parallel algorithms that maximize parallelism and minimize sequential operations. • How to reduce communication overhead • Increase communication bandwidth • Decrease communication latency • Hide communication latency • Increase granularity (Figure 8.4 of Hennessy's book)
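The bound can be checked numerically with a few lines. In the sketch below the function names are hypothetical; `serial_fraction_for` simply solves the Amdahl formula for f, and `scaled_speedup` uses Gustafson's formulation of scaled speedup, S = p - f(p - 1), which the slides name but do not write out, so treat it as an assumption about what is meant.

```python
def amdahl_speedup(f, p):
    """Amdahl bound: serial fraction f, p processing elements."""
    return 1.0 / (f + (1.0 - f) / p)


def max_speedup(f):
    """Limit of the Amdahl bound as p grows without bound."""
    return 1.0 / f


def serial_fraction_for(target_speedup, p):
    """Largest serial fraction that still allows target_speedup on p PEs.

    Obtained by solving 1/S = f + (1 - f)/p for f.
    """
    return (p / target_speedup - 1.0) / (p - 1.0)


def scaled_speedup(f, p):
    """Gustafson's scaled speedup; f is the serial fraction of the parallel run."""
    return p - f * (p - 1)


if __name__ == "__main__":
    print(f"f = 0.10, p = 100: S = {amdahl_speedup(0.10, 100):.2f}")
    print(f"f = 0.10, p -> infinity: S = {max_speedup(0.10):.0f}")

    f80 = serial_fraction_for(80, 100)
    print(f"S = 80 on p = 100 needs {100 * (1 - f80):.2f}% parallel code")

    print(f"scaled speedup, f = 0.10, p = 100: {scaled_speedup(0.10, 100):.1f}")
```

With 10% serial code, 100 PEs give a speedup of only about 9, while a target of 80 on 100 PEs leaves room for roughly a quarter of a percent of serial work. Growing the problem with p, as in the scaled form, is what keeps the serial share from dominating.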