
Chapter 7: Parallel Computers





Presentation Transcript


  1. Chapter 7: Parallel Computers
22540 - Computer Arch. & Org. (2)

  2. Parallelism
• Uniprocessor vs. Multiprocessors
• Process per Processor → Process-Level Parallelism
• Parallel Processing Program (Multithreading)
• Multicore vs. Cluster
• Single Chip vs. LAN Interconnect

  3. Parallel Processing Program
• Amdahl's Law:
Execution Time After Improvement = (Execution Time Affected / Amount of Improvement) + Execution Time Unaffected
Exercise: To achieve a speedup of 90 times faster with 100 processors, what percentage of the original computation can be sequential?
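One way to check the exercise is to rearrange the formula above. Below is a minimal Python sketch (the function name sequential_fraction is illustrative, not from the slides): with n processors and sequential fraction s, speedup = 1 / (s + (1 − s)/n), which solves to s = (n/speedup − 1)/(n − 1).

```python
# Worked check for the Amdahl's Law exercise (function name is illustrative).
# With n processors and sequential fraction s:
#   speedup = 1 / (s + (1 - s) / n)
# Solving for s:
#   s = (n / speedup - 1) / (n - 1)

def sequential_fraction(speedup, n):
    return (n / speedup - 1) / (n - 1)

s = sequential_fraction(90, 100)
print(f"{s:.6f} ({s * 100:.3f}% sequential)")  # 0.001122 (0.112%), i.e. ~0.1%
```

So the computation can be only about 0.1% sequential to reach a speedup of 90 with 100 processors.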

  4. Scaling
• Strong Scaling: speedup achieved on a multiprocessor without increasing the size of the problem.
Exercise: Consider the sum of 10 scalars (10 sequential additions, each taking Tadd) and the sum of two 10 × 10 matrices (100 additions that can run in parallel). What are the speedups for 10 and 100 processors?
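A minimal Python sketch of the exercise (the speedup helper is illustrative, not from the slides), assuming the 10 sequential additions are not sped up and the 100 matrix additions divide evenly across processors:

```python
# Speedup model for the strong-scaling exercise (units: multiples of Tadd).
import math

def speedup(seq_adds, par_adds, n):
    t_one = seq_adds + par_adds                 # time on 1 processor
    t_n = seq_adds + math.ceil(par_adds / n)    # sequential part is not sped up
    return t_one / t_n

print(speedup(10, 100, 10))    # 110 / 20 = 5.5
print(speedup(10, 100, 100))   # 110 / 11 = 10.0
```

Going from 10 to 100 processors only raises the speedup from 5.5 to 10 because the 10 sequential additions dominate once the parallel work shrinks.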

  5. Scaling
• Weak Scaling: speedup achieved on a multiprocessor while increasing the size of the problem in proportion to the increase in the number of processors.
Exercise: Consider the sum of 10 scalars (10 sequential additions, each taking Tadd) and the sum of two 100 × 100 matrices (10,000 additions that can run in parallel). What are the speedups for 10 and 100 processors?
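The same model applies to the weak-scaling exercise; only the parallel workload grows. A minimal sketch under the same assumptions (helper name illustrative):

```python
# Same speedup model, with the matrix grown to 100 x 100 (10,000 additions).
import math

def speedup(seq_adds, par_adds, n):
    t_one = seq_adds + par_adds
    t_n = seq_adds + math.ceil(par_adds / n)
    return t_one / t_n

print(speedup(10, 10_000, 10))    # 10010 / 1010 ~= 9.9
print(speedup(10, 10_000, 100))   # 10010 / 110  ~= 91.0
```

With the larger problem, the fixed sequential cost is amortized and the speedups are close to the processor counts.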

  6. Load Balance
• Non-Ideal Balance: processors do not get an equal amount of work.
Exercise: Consider 10 sequential additions and 10,000 parallel additions on 100 processors. What is the speedup when one processor has 2% of the load instead of 1%? What about 5% of the load?
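A minimal sketch of the exercise (names are illustrative, not from the slides), assuming the parallel phase lasts as long as the busiest processor's share:

```python
# Load-balance model: the busiest processor determines the parallel phase
# time (units: multiples of Tadd).

def speedup_with_imbalance(seq_adds, par_adds, max_share):
    t_one = seq_adds + par_adds
    t_n = seq_adds + par_adds * max_share   # slowest processor's share
    return t_one / t_n

print(speedup_with_imbalance(10, 10_000, 0.01))  # balanced 1%: ~91
print(speedup_with_imbalance(10, 10_000, 0.02))  # 2% load: 10010 / 210 ~= 48
print(speedup_with_imbalance(10, 10_000, 0.05))  # 5% load: 10010 / 510 ~= 20
```

Doubling one processor's share roughly halves the speedup, and a 5% share cuts it by almost a factor of five.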

  7. Shared Memory Multiprocessors (SMP)
• Single Physical Address Space
• Uniform Memory Access (UMA)
• Nonuniform Memory Access (NUMA)
• Synchronization (Lock)
[Diagram: several processors, each with its own cache, connected through an interconnect to main memory and an I/O controller]
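The "Synchronization (Lock)" bullet is the key programming consequence of a single address space. A minimal Python sketch (not from the slides) of threads coordinating updates to shared data with a lock:

```python
# Threads share one address space; a lock serializes updates to `total`
# so no increment is lost.
import threading

total = 0
lock = threading.Lock()

def add_partial(values):
    global total
    for v in values:
        with lock:        # acquire/release the lock around the shared update
            total += v

threads = [threading.Thread(target=add_partial, args=(range(1000),))
           for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(total)  # always 4 * sum(range(1000)) = 1998000
```

Without the lock, two threads can read the same value of total before either writes it back, and one increment is lost.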

  8. Message-Passing Multiprocessors
• Private Physical Address Space
• Send-Message & Receive-Message Routines
[Diagram: several processors, each with its own cache and main memory, connected through an interconnect to an I/O controller]
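By contrast, a minimal Python sketch (not from the slides) of the message-passing style: each worker has private memory, and data moves only through explicit send and receive operations, modeled here with a multiprocessing.Queue:

```python
# Each worker process has a private address space; partial sums travel
# back to the parent only as explicit messages.
from multiprocessing import Process, Queue

def worker(values, results):
    results.put(sum(values))            # "send-message" routine

if __name__ == "__main__":
    results = Queue()
    data = list(range(100))
    chunks = [data[i::4] for i in range(4)]
    procs = [Process(target=worker, args=(c, results)) for c in chunks]
    for p in procs: p.start()
    total = sum(results.get() for _ in procs)   # "receive-message" routine
    for p in procs: p.join()
    print(total)  # 4950
```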

  9. Multithreading
• Hardware Multithreading: sharing a processor's functional units among threads (switch state from one thread to another when one stalls)
• Fine-Grained Multithreading: switching state after every instruction
• Coarse-Grained Multithreading: switching state after a cache miss
• Simultaneous Multithreading (SMT): multiple-issue, dynamically scheduled processor (exploits thread-level & instruction-level parallelism)

  10. SISD, MIMD, SIMD, SPMD
• Single-Instruction Single-Data (SISD): uniprocessor
• Multiple-Instruction Multiple-Data (MIMD): multiprocessor
• Single-Instruction Multiple-Data (SIMD): vector/array processor (data-level parallelism)
• Single-Program Multiple-Data (SPMD): different code sections of one program execute in parallel (a way to program an MIMD)
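A minimal software illustration (not from the slides) of the SIMD idea, using NumPy's vectorized addition as a stand-in for one instruction operating on multiple data elements:

```python
# One vectorized operation over many elements (SIMD-style) vs. the
# scalar one-element-at-a-time equivalent (SISD-style).
import numpy as np

a = np.arange(8)
b = np.arange(8)

c_simd = a + b                        # one "instruction", 8 data elements

c_sisd = np.empty_like(a)             # scalar loop: one element per step
for i in range(8):
    c_sisd[i] = a[i] + b[i]

print(np.array_equal(c_simd, c_sisd))  # True
```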

  11. Chapter 7 The End
