
Presentation Transcript


  1. CSE 8383 - Advanced Computer Architecture Week-5 Week of Feb 9, 2004 engr.smu.edu/~rewini/8383

  2. Contents • Project/Schedule • Introduction to Multiprocessors • Parallelism • Performance • PRAM Model • …

  3. Warm Up • Parallel Numerical Integration • Parallel Matrix Multiplication In class: Discuss with your neighbor! Videotape: Think about it! What kind of architecture do we need?
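Below is a minimal sketch of how the numerical-integration warm-up might be parallelized (assuming Python's multiprocessing; the integrand, midpoint rule, and worker count are illustrative choices, not part of the slides):

    # Midpoint-rule integration with the interval split across processes.
    from multiprocessing import Pool

    def f(x):
        return 4.0 / (1.0 + x * x)  # integrates to pi over [0, 1]

    def partial_sum(args):
        lo, hi, n = args            # integrate f over [lo, hi] with n strips
        h = (hi - lo) / n
        return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

    if __name__ == "__main__":
        workers, strips = 4, 100_000
        bounds = [(i / workers, (i + 1) / workers, strips) for i in range(workers)]
        with Pool(workers) as pool:
            print(sum(pool.map(partial_sum, bounds)))  # ~3.141592...

Each worker integrates an independent subinterval, so the only communication is collecting the partial sums; a parallel matrix multiplication can be decomposed the same way, one block of the result per worker.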

  4. Explicit vs. Implicit Parallelism • Explicit: a parallel program is written directly in a parallel programming environment and run on the parallel architecture • Implicit: a sequential program is converted by a parallelizer (parallelizing compiler) into a parallel program for the parallel architecture

  5. Motivation • One-processor systems are not capable of delivering solutions to some problems in reasonable time • Multiple processors cooperate to jointly execute a single computational task in order to speed up its execution • Speed-up versus Quality-up

  6. Multiprocessing • One processor: limited by physical constraints • Multiprocessor: N processors cooperate to solve a single computational task • Benefits: speed-up, quality-up, sharing

  7. Flynn’s Classification (revisited) • SISD (single instruction stream over a single data stream) • SIMD (single instruction stream over multiple data streams) • MIMD (multiple instruction streams over multiple data streams) • MISD (multiple instruction streams over a single data stream)

  8. SISD (single instruction stream over a single data stream) [Figure: SISD uniprocessor architecture; the CU issues an IS to the PU, which exchanges a DS with the MU and I/O] Captions: CU = control unit, PU = processing unit, MU = memory unit, IS = instruction stream, DS = data stream, PE = processing element, LM = local memory

  9. SIMD (single instruction stream over multiple data streams) [Figure: SIMD architecture; the program is loaded from the host into the CU, which broadcasts a single IS to PE1 … PEn; each PE exchanges a DS with its local memory LM, and data sets are loaded from the host]

  10. MIMD (multiple instruction streams over multiple data streams) [Figure: MIMD architecture with shared memory; CU1 … CUn each issue their own IS to PU1 … PUn, which exchange DSs with the shared memory and I/O]

  11. MISD (multiple instruction streams over a single data stream) [Figure: MISD architecture (the systolic array); CU1, CU2, …, CUn each issue an IS to PU1, PU2, …, PUn, while a single DS flows through the chain of PUs; the memory holds program and data, with I/O at the ends]

  12. System Components • Three major Components • Processors • Memory Modules • Interconnection Network

  13. Memory Access • Shared memory: all processors P access a common memory M • Distributed memory: each processor P has its own local memory M

  14. Interconnection Network Taxonomy • Static: 1-D, 2-D, hypercube (HC) • Dynamic: bus-based (single bus, multiple buses) or switch-based (single-stage SS, multi-stage MS, crossbar)

  15. MIMD Shared Memory Systems [Figure: processors P connected through an interconnection network to shared memory modules M]

  16. Shared Memory • Single address space • Communication via read & write • Synchronization via locks
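A minimal sketch of these three properties, using Python threads as a stand-in for processors sharing one address space (the counter and thread count are illustrative):

    from threading import Thread, Lock

    counter = 0                     # shared variable in the single address space
    lock = Lock()

    def worker(n):
        global counter
        for _ in range(n):
            with lock:              # synchronization via a lock
                counter += 1        # communication via read & write

    threads = [Thread(target=worker, args=(10_000,)) for _ in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(counter)                  # 40000; without the lock, updates can be lost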

  17. Bus-Based & Switch-Based SM Systems [Figure: left, processors P with caches C on a shared bus to a global memory; right, processors P with caches C connected through a switch to memory modules M]

  18. Cache-Coherent NUMA [Figure: nodes, each with a processor P, cache C, and local memory M, connected through an interconnection network]

  19. MIMD Distributed Memory Systems [Figure: processor-memory pairs (P, M) connected through an interconnection network]

  20. Distributed Memory • Multiple address spaces • Communication via send & receive • Synchronization via messages
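A minimal sketch of the distributed-memory style, with Python multiprocessing pipes standing in for the network (the worker and message contents are illustrative):

    from multiprocessing import Process, Pipe

    def worker(conn):
        data = conn.recv()          # receive work from the other address space
        conn.send(sum(data))        # send the result back as a message
        conn.close()

    if __name__ == "__main__":
        parent, child = Pipe()
        p = Process(target=worker, args=(child,))
        p.start()
        parent.send([1, 2, 3, 4])   # communication via send & receive
        print(parent.recv())        # 10
        p.join()

Nothing is shared: each process sees only its own memory, and all coordination happens through the messages themselves.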

  21. SIMD Computers [Figure: a von Neumann computer (the host) drives an array of processor-memory pairs (P, M) connected by some interconnection network]

  22. SIMD (Data Parallel) • Parallel operations within a computation are partitioned spatially rather than temporally • Scalar instructions vs. array instructions • Processors are incapable of operating autonomously; they must be driven by the control unit

  23. Past Trends in Parallel Architecture (inside the box) • Completely custom designed components (processors, memory, interconnects, I/O) • Longer R&D time (2-3 years) • Expensive systems • Quickly becoming outdated • Bankrupt companies!!

  24. New Trends in Parallel Architecture (outside the box) • Advances in commodity processors and network technology • Network of PCs and workstations connected via LAN or WAN forms a Parallel System • Network Computing • Compete favorably (cost/performance) • Utilize unused cycles of systems sitting idle

  25. Clusters [Figure: nodes, each with a processor P, cache C, memory M, I/O, and its own OS, connected by an interconnection network; middleware and a programming environment sit on top]

  26. Grids • Grids are geographically distributed platforms for computation • They provide dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities

  27. Problem Assume that a switching component such as a transistor can switch in zero time. We propose to construct a disk-shaped computer chip with such a component. The only limitation is the time it takes to send electronic signals from one edge of the chip to the other. Make the simplifying assumption that electronic signals travel 300,000 kilometers per second. What must the diameter of a round chip be so that it can switch 10^9 times per second? What would the diameter be if the switching requirement were 10^12 times per second?
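A quick worked check (assuming the signal must be able to cross the full diameter between consecutive switchings, so diameter = signal speed / switching rate):

    SIGNAL_SPEED = 3.0e8            # 300,000 km/s in meters per second

    for rate in (1e9, 1e12):        # required switchings per second
        diameter = SIGNAL_SPEED / rate
        print(f"{rate:.0e} switches/s -> diameter {diameter:g} m")

    # 1e+09 switches/s -> diameter 0.3 m    (30 cm)
    # 1e+12 switches/s -> diameter 0.0003 m (0.3 mm)

The point of the exercise: even with zero switching time, signal propagation alone forces chips to shrink as clock rates rise.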

  28. Grosch’s Law (1960s) • “To sell a computer for twice as much, it must be four times as fast” • Vendors skip small speed improvements in favor of waiting for large ones • Buyers of expensive machines would wait for a twofold improvement in performance for the same price.

  29. Moore’s Law • Gordon Moore (cofounder of Intel) • Processor performance would double every 18 months • This prediction has held for several decades • Unlikely that single-processor performance continues to increase indefinitely

  30. Von Neumann’s bottleneck • Great mathematician of the 1940s and 1950s • Single control unit connecting a memory to a processing unit • Instructions and data are fetched one at a time from memory and fed to processing unit • Speed is limited by the rate at which instructions and data are transferred from memory to the processing unit.

  31. Parallelism • Multiple CPUs • Within the CPU • One Pipeline • Multiple pipelines

  32. Speedup • S = Speed(new) / Speed(old) • S = Work/time(new) / Work/time(old) • S = time(old) / time(new) • S = time(before improvement) / time(after improvement)

  33. Speedup • Time (one CPU): T(1) • Time (n CPUs): T(n) • Speedup: S • S = T(1)/T(n)

  34. Amdahl’s Law The performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used

  35. Example A trip consists of 200 miles to cover (part A, which can use ever-faster modes) followed by a part B that takes a fixed 20 hours; speedup is measured relative to walking.
  Walk: 4 miles/hour, 50 + 20 = 70 hours, S = 1
  Bike: 10 miles/hour, 20 + 20 = 40 hours, S = 1.8
  Car-1: 50 miles/hour, 4 + 20 = 24 hours, S = 2.9
  Car-2: 120 miles/hour, 1.67 + 20 = 21.67 hours, S = 3.2
  Car-3: 600 miles/hour, 0.33 + 20 = 20.33 hours, S = 3.4
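A short sketch verifying the table (200 improvable miles plus a fixed 20-hour part; speedup relative to walking):

    modes = {"Walk": 4, "Bike": 10, "Car-1": 50, "Car-2": 120, "Car-3": 600}

    baseline = 200 / modes["Walk"] + 20          # 70 hours
    for name, mph in modes.items():
        total = 200 / mph + 20
        print(f"{name}: {total:.2f} h, S = {baseline / total:.1f}")
    # Speedup can never exceed 70/20 = 3.5, however fast part A becomes

This is Amdahl's law in miniature: the fixed 20-hour part plays the role of the serial fraction.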

  36. Amdahl’s Law (1967) • α: the fraction of the program that is naturally serial • (1 − α): the fraction of the program that is naturally parallel

  37. Derivation:
  S = T(1) / T(N)
  T(N) = α T(1) + (1 − α) T(1) / N
  S = 1 / (α + (1 − α)/N) = N / (αN + (1 − α))

  38. Amdahl’s Law

  39. Gustafson-Barsis Law • N and α are not independent of each other • α: the fraction of the program that is naturally serial • Normalize the parallel time: T(N) = 1, so T(1) = α + (1 − α)N • S = T(1)/T(N) = α + (1 − α)N = N − (N − 1)α

  40. Gustafson-Barsis Law

  41. Comparison of Amdahl’s Law vs Gustafson-Barsis’ Law
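A minimal sketch contrasting the two laws numerically (the serial fraction α = 0.05 is an illustrative value):

    def amdahl(alpha, n):           # fixed problem size
        return n / (alpha * n + (1 - alpha))

    def gustafson(alpha, n):        # problem size scales with n
        return n - (n - 1) * alpha

    for n in (10, 100, 1000):
        print(f"N={n:5d}  Amdahl: {amdahl(0.05, n):6.2f}  "
              f"Gustafson-Barsis: {gustafson(0.05, n):7.2f}")

    # Amdahl saturates near 1/alpha = 20, while Gustafson-Barsis grows
    # almost linearly, because the parallel part grows with the machine.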

  42. Example
  For I = 1 to 10 do
  begin
      S[I] = 0.0;
      for J = 1 to 10 do
          S[I] = S[I] + M[I, J];
      S[I] = S[I] / 10;
  end
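The ten outer iterations are independent, so the loop parallelizes naturally; a sketch (assuming Python multiprocessing; the random matrix stands in for M):

    from multiprocessing import Pool
    import random

    M = [[random.random() for _ in range(10)] for _ in range(10)]

    def row_average(row):
        return sum(row) / len(row)     # the inner J loop, one row at a time

    if __name__ == "__main__":
        with Pool(4) as pool:
            S = pool.map(row_average, M)   # the outer I loop, run in parallel
        print(S)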

  43. Distributed Computing Performance • Single Program Performance • Multiple Program Performance

  44. PRAM Model

  45. What is a Model? • According to Webster’s Dictionary, a model is “a description or analogy used to help visualize something that cannot be directly observed.” • According to The Oxford English Dictionary, a model is “a simplified or idealized description or conception of a particular system, situation or process.”

  46. Why Models? • In general, the purpose of modeling is to capture the salient characteristics of phenomena with clarity and the right degree of accuracy to facilitate analysis and prediction. Maggs, Matheson and Tarjan (1995)

  47. Models in Problem Solving • Computer Scientists use models to help design problem solving tools such as: • Fast Algorithms • Effective Programming Environments • Powerful Execution Engines

  48. An Interface • A model is an interface separating high-level properties from low-level ones [Figure: Applications sit above the MODEL, which provides operations to them; Architectures sit below and are required to implement it]

  49. PRAM Model • Synchronized read-compute-write cycle under a common control • Memory-access variants: EREW (exclusive read, exclusive write), ERCW (exclusive read, concurrent write), CREW (concurrent read, exclusive write), CRCW (concurrent read, concurrent write) • Complexity measures: time T(n), processors P(n), cost C(n) [Figure: processors P1 … Pp, each with a private memory, connected to a shared global memory]
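A minimal sketch of the PRAM style of computation: an EREW parallel sum that finishes in log2(n) synchronized read-compute-write rounds, simulated sequentially here (the shared list plays the role of the global memory):

    def pram_sum(shared):
        n = len(shared)
        stride = 1
        while stride < n:           # one synchronized cycle per round
            # each "processor" reads two cells and writes one; no cell is
            # touched by two processors in the same round (EREW)
            for i in range(0, n - stride, 2 * stride):
                shared[i] = shared[i] + shared[i + stride]
            stride *= 2
        return shared[0]

    print(pram_sum(list(range(8))))  # 28, after log2(8) = 3 rounds

On a real PRAM this would use P(n) = n/2 processors and T(n) = O(log n) time, for cost C(n) = P(n) × T(n) = O(n log n).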
