
Enhancing Computing Performance: Trends & Techniques

Explore the evolution of computing speed, from hardware improvements to parallel programming. Learn about speedup, parallelism, and factors impacting performance in modern applications. Discover why faster components and advanced architectures are essential for computationally intensive applications.


Presentation Transcript


1. The Tim Allen View of Computing
• Faster!!!
• Bigger problems
  • I want 7 days of weather, not 2
  • I want 1024x1024x16-bit color
  • …
• Most modern applications, such as weather prediction, aerodynamics, and bioinformatics, are computationally intensive

2. Clock Speeds Have Been Increasing
Source: Culler, D., Singh, J.P., Gupta, A., “Parallel Computer Architecture: A Hardware/Software Approach”, Morgan Kaufmann Publishers

3. Hardware Continues to Improve
• Motherboard performance measured in MHz/$
Source: http://www.zoology.ubc.ca/~rikblok/ComputingTrends/

4. Moravec, Hans, “When will computer hardware match the human brain?”, Journal of Evolution and Technology, 1998, Vol. 1

5. Problem Size Is Growing Faster
Source: http://www.ncbi.nlm.nih.gov/Genbank/genbankstats.html

6. Why High Performance?
• Calculating a 24-hour forecast for the UK requires about 10¹² operations to be performed
• That takes about 2.7 hours on a machine capable of 10⁸ operations per second
• What about a 7-day forecast?
• Okay, so buy a faster computer…
• The speed of light is 3×10⁸ m/s. Consider two electronic devices (each capable of performing 10¹² operations/second) placed 0.5 mm apart. It takes longer for a signal to travel between them than it takes either of them to process it. (A worked version of this arithmetic follows below.)
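
A quick check of these numbers, as a Python sketch (the 7-day figure assumes the cost scales linearly with forecast length, which the slide does not state):

# Back-of-the-envelope numbers from the slide.
ops_needed = 1e12         # operations for a 24-hour UK forecast
ops_per_sec = 1e8         # machine speed: 10^8 operations/second

seconds = ops_needed / ops_per_sec
print(f"24-hour forecast: {seconds / 3600:.1f} hours")     # ~2.8 hours
print(f"7-day forecast:   {7 * seconds / 3600:.1f} hours") # ~19.4 hours, assuming linear scaling

# Speed-of-light limit: signal travel time between two devices 0.5 mm apart
# versus the time each needs for one operation at 10^12 ops/second.
c = 3e8                       # speed of light, m/s
distance = 0.5e-3             # 0.5 mm in metres
travel_time = distance / c    # ~1.7e-12 s
op_time = 1 / 1e12            # 1e-12 s
print(travel_time > op_time)  # True: the signal takes longer than one operation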

7. Why Are Things Getting Better?
• The huge increase in computing speed since the 1950s can be attributed to:
  • Faster components
  • More efficient algorithms
  • More sophisticated architectures
• Most “advanced” architectures attempt to eliminate the von Neumann bottleneck

8. Von Neumann Architecture
[Diagram: a CPU connected to memory. The memory may not have the bandwidth required to match the CPU, and is typically much slower than the CPU.]
• Some improvements have included:
  • Interleaved memory
  • Caching
  • Pipelining

9. The Bottom Line
• It is getting harder to extract the performance modern applications require out of a single-processor machine
• Given practical constraints, such as the speed of light, using multiple processing elements to solve a problem is the only way to go

10. Concurrent Programming
• Operations that occur one after another, ordered in time, are said to be sequential
• Operations are concurrent if they could be, but need not be, executed in parallel
• A web browser often loads images concurrently (see the sketch below)
• Concurrency in a program and parallelism in the underlying hardware are independent concepts
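
A minimal sketch of that kind of concurrency in Python: several downloads are in flight at once, whether or not the hardware actually runs them in parallel. The URLs are hypothetical placeholders.

# Concurrent downloads, in the spirit of a browser fetching several images.
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

urls = [
    "https://example.com/a.png",   # placeholder URLs, not real images
    "https://example.com/b.png",
    "https://example.com/c.png",
]

def fetch(url: str) -> int:
    with urlopen(url) as resp:     # blocking I/O; the threads overlap the waits
        return len(resp.read())

# The fetches are concurrent: they *may* run in parallel, but need not.
with ThreadPoolExecutor(max_workers=3) as pool:
    sizes = list(pool.map(fetch, urls))
print(sizes)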

11. Parallel Computing
• A large collection of processing elements that can communicate and cooperate to solve large problems quickly
• A form of information processing which uses concurrent events during execution
• In other words, both the language and the hardware support concurrency

12. Parallelism
• If several operations can be performed simultaneously, the total computation time is reduced
• [Diagram: the same computation run sequentially and in parallel.] Here the parallel version has the potential to be 3 times faster

13. Measuring Performance
• How should the performance of a parallel computation be measured?
• Traditional measures like MIPS and MFLOPS really don’t cut it
• New ways to measure parallel performance are needed:
  • Speedup
  • Efficiency

14. Speedup
• Speedup is the most often used measure of parallel performance
• If Ts is the best possible serial time, and Tn is the time taken by a parallel algorithm on n processors, then:
  • Speedup = Ts / Tn
(a small example follows below)
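
A tiny illustration of the definition; the times here are assumed for the example, not measurements:

# Speedup = T_s / T_n, with T_s the best serial time and
# T_n the parallel time on n processors.
def speedup(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

# Example: the best serial run takes 100 s; 8 processors take 16 s.
print(speedup(100.0, 16.0))   # 6.25, i.e. short of the ideal 8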

15. Read Between the Lines
• Exactly what is meant by Ts (i.e. the time taken to run the fastest serial algorithm on one processor)?
  • One processor of the parallel computer?
  • The fastest serial machine available?
  • A parallel algorithm run on a single processor?
• Is the serial algorithm the best one?
• To keep things fair, Ts should be the best possible time in the serial world

16. Speedup†
• A slightly different definition of speedup also exists: the time taken by the parallel algorithm on one processor divided by the time taken by the parallel algorithm on N processors
• However, this is misleading, since many parallel algorithms contain extra operations to accommodate the parallelism (e.g. the communication)
• The result is that the one-processor time is inflated, exaggerating the speedup

17. Factors That Limit Speedup
• Software overhead
  • Even with a completely equivalent algorithm, software overhead arises in the concurrent implementation
• Load balancing
  • Speedup is generally limited by the speed of the slowest node, so an important consideration is to ensure that each node performs the same amount of work
• Communication overhead
  • Assuming that communication and calculation cannot be overlapped, any time spent communicating data between processors directly degrades the speedup

18. Linear Speedup
• Whichever definition is used, the ideal is to produce linear speedup: a speedup of N using N processors
• In practice, however, the speedup falls short of its ideal value of N
• Superlinear speedup results when:
  • unfair values are used for Ts
  • there are differences in the nature of the hardware used

19. Speedup Curves
[Figure: speedup versus number of processors, with curves for superlinear speedup, linear (ideal) speedup, and typical speedup.]

20. Maximum Speedup
• Every parallel program has a portion that cannot be parallelized
• Given that F is the fraction of the computation that must be done serially, and Ts is the best possible sequential time, the time to perform the computation with p processors is:
  • F·Ts + (1 - F)·Ts / p
• Therefore the speedup is:
  • Ts / ( F·Ts + (1 - F)·Ts / p ) = p / ( 1 + (p - 1)F )
• This seems to be bad news, since as p increases the speedup approaches 1/F
• This is known as Amdahl’s Law (a small numeric check follows below)
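
A small numeric check of Amdahl's Law in Python, assuming (for illustration) a serial fraction F of 10%:

# Amdahl's Law: speedup(p) = p / (1 + (p - 1) * F),
# where F is the fraction of the work that must be done serially.
def amdahl_speedup(p: int, f: float) -> float:
    return p / (1 + (p - 1) * f)

f = 0.1   # assume 10% of the computation is inherently serial
for p in (1, 2, 8, 64, 1024):
    print(f"p = {p:5d}  speedup = {amdahl_speedup(p, f):6.2f}")
# The speedup saturates near 1/F = 10 no matter how many processors are added.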

21. Efficiency
• Speedup does not measure how efficiently the processors are being used
  • Is it worth using 100 processors to get a speedup of 2?
• Efficiency is defined as the ratio of the speedup to the number of processors required to achieve it
• Efficiency is bounded from above by 1 (see the example below)
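
A sketch of the efficiency calculation applied to the slide's own question; the times are assumed for illustration:

# Efficiency = speedup / number of processors, bounded above by 1.
def efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    return (t_serial / t_parallel) / n_procs

# A speedup of 2 (100 s serial, 50 s parallel) on 100 processors:
print(efficiency(100.0, 50.0, 100))   # 0.02 -- the processors are mostly idle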

22. Scalability
• Scalability refers to the ability of a system to grow
  • Hardware
  • Software
• For example:
  • How difficult is it to add another 10 processors to a system?
  • How difficult is it to increase the image being calculated from 5 megapixels to 10?

23. Parallel Architectures
• Unlike traditional von Neumann machines, there is no single standard architecture used on parallel machines
• In fact, dozens of different parallel architectures have been built and are being used
• Several people have tried to classify the different types of parallel machines
• The taxonomy proposed by Flynn is the most commonly used

24. Flynn’s Model of Computation
• Any computer, whether sequential or parallel, operates by executing instructions on data
  • A stream of instructions (the algorithm) tells the computer what to do
  • A stream of data (the input) is affected by these instructions
• Depending on whether there is one or several of each of these streams, Flynn’s taxonomy defines four classes of computers

25. Flynn’s Taxonomy

                               Single Data Stream   Multiple Data Streams
  Single Instruction Stream          SISD                  SIMD
  Multiple Instruction Streams       MISD                  MIMD

26. SISD Computers
• Standard sequential computers
• A single processing element receives a single stream of instructions that operate on a single stream of data
• No parallelism here

27. SIMD Computers
• All processors operate under the control of a single instruction stream
• Processors can be selected under program control
• There are N data streams, one per processor
• Variables can live either in the parallel machine or the scalar host
• Often referred to as data parallel computing

28. SIMD
[Diagram: a single controller broadcasting instructions to an array of processing elements (PEs), each with its own memory, linked by an interconnection network.]

29. SIMD Algorithm
• Calculate the number of heads in a series of coin tosses
  • All processors flip a coin
  • If (coin is a head) raise your hand
• Note: there are better ways to count the hands
(a data-parallel sketch follows below)
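
One way to capture the data-parallel flavor of this algorithm is with vectorized operations, where one instruction is applied to many data elements at once. A sketch using NumPy, with the "processor" count chosen arbitrarily:

# Data-parallel coin-toss count: all "processors" flip together,
# then the raised hands are counted.
import numpy as np

rng = np.random.default_rng()
n_processors = 1024                        # assume one toss per processor
tosses = rng.integers(0, 2, n_processors)  # 1 = head, 0 = tail, all flipped at once
heads = int(tosses.sum())                  # count the raised hands
print(heads)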

30. MIMD Computers
• This is the most general and most powerful class in Flynn’s taxonomy
• N processors, N streams of instructions, and N streams of data
• Each processor operates under the control of an instruction stream issued by its own control unit
  • Each processor is capable of executing its own program on different data
• Processors operate asynchronously and can be doing different things on different data at the same time

31. MIMD – Shared Memory
[Diagram: processing elements (PEs) connected through an interconnection network to a shared memory.]
• MIMD computers with shared memory are known as multiprocessors or tightly coupled machines.

32. MIMD – Message Passing
[Diagram: processing elements (PEs), each with its own local memory, connected by an interconnection network.]
• MIMD computers in which each processor has its own memory are known as multicomputers or loosely coupled machines.
• Multicomputers are sometimes referred to as distributed systems, which is incorrect: distributed systems refer to a network of computers, and even though the number of processing units can be large, the communication is typically slow.

33. MIMD Algorithm
• Generate the prime numbers from 2 to 20
• Each processor runs:
  • Receive a number; this is your prime
  • Repeat:
    • Receive a number
    • If this number is not evenly divisible by your prime, pass it to the next processor
(a pipeline sketch follows below)
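
A minimal sketch of this pipeline using Python processes connected by pipes. The stage count (8, one per prime up to 20) and the None sentinel used for shutdown are assumptions of this sketch, not part of the slide's algorithm:

# Pipeline sieve: each stage keeps the first number it receives as "its"
# prime and forwards every number that its prime does not divide.
from multiprocessing import Pipe, Process

def stage(recv, send):
    prime = recv.recv()            # first number in is this stage's prime
    print(prime, flush=True)
    while True:
        n = recv.recv()
        if n is None:              # sentinel: pass it on and stop
            send.send(None)
            return
        if n % prime != 0:         # not a multiple: forward it
            send.send(n)

if __name__ == "__main__":
    n_stages = 8                   # exactly 8 primes lie between 2 and 20
    pipes = [Pipe(duplex=False) for _ in range(n_stages + 1)]
    workers = [Process(target=stage, args=(pipes[i][0], pipes[i + 1][1]))
               for i in range(n_stages)]
    for w in workers:
        w.start()
    feed = pipes[0][1]             # feed the stream 2, 3, ..., 20
    for n in range(2, 21):
        feed.send(n)
    feed.send(None)
    for w in workers:
        w.join()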

34. Another MIMD Algorithm
• Given a list of 5 numbers:
  • Sort your list of numbers
  • Processors 1, 3, 5, and 7: give your list to processor n-1
  • Processors 0, 2, 4, and 6: merge your list with the new list
  • Processors 2 and 6: give your list to processor n-2
  • Processors 0 and 4: merge your list with the new list
  • Processor 4: give your list to processor 0
  • Processor 0: merge your list with the new list
  • Processor 0: give your list to me
(a sketch of this merge tree follows below)
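
A sequential Python sketch of the communication pattern only: eight simulated processors sort locally, then merge pairwise up the tree until processor 0 holds the result. The input lists are randomly generated for illustration:

# Tree reduction: step 1 pairs (0,1),(2,3),(4,5),(6,7); step 2 pairs
# (0,2),(4,6); step 3 pairs (0,4) -- matching the slide's schedule.
import heapq
import random

lists = [sorted(random.sample(range(100), 5)) for _ in range(8)]  # local sorts

step = 1
while step < 8:
    for i in range(0, 8, 2 * step):
        # processor i merges its list with the one from processor i + step
        lists[i] = list(heapq.merge(lists[i], lists[i + step]))
    step *= 2

print(lists[0])   # processor 0's final merged, sorted list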

35. The 4 Classes
[Diagram: the four Flynn classes (SISD, SIMD, MISD, MIMD).]

36. SPMD Computing
• SPMD stands for single program, multiple data
• The same program is run on the processors of an MIMD machine
• Occasionally the processors may synchronize
• Because an entire program is executed on separate data, it is possible that different branches are taken, leading to asynchronous parallelism
• SPMD came about from the desire to do SIMD-like calculations on MIMD machines
• SPMD is not a hardware paradigm; it is the software equivalent of SIMD
(a sketch follows below)
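
A minimal SPMD sketch using mpi4py (an assumed dependency, not mentioned on the slide): every rank runs this same program, and behavior diverges only by branching on the rank:

# Every rank executes this one program on its own slice of the data,
# synchronizing only at the reduce.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

local = sum(range(rank * 100, (rank + 1) * 100))  # each rank works on its own slice

total = comm.reduce(local, op=MPI.SUM, root=0)    # occasional synchronization
if rank == 0:                                     # a rank-dependent branch
    print(f"total from {size} ranks: {total}")

Run it with, for example, "mpiexec -n 4 python spmd_sum.py" (the filename is hypothetical); all four ranks execute the same program, which is exactly the SPMD model.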

37. Conclusions
• Parallel/distributed computing seems to be the only way to achieve the computing power required by the current generation of applications
• Speedup and efficiency are common measures of parallel performance
• Flynn proposed a taxonomy to classify parallel architectures:
  • SIMD
  • MIMD
• Software needs to be retooled to take advantage of the high-performance environment
