
Parallelism (Multi-threaded)


havyn


  1. Parallelism (Multi-threaded)

  2. Outline • A touch of theory • Amdahl’s Law • Metrics (speedup, efficiency, redundancy, utilization) • Trade-offs • Tightly coupled vs. Loosely coupled • Shared Distributed vs Shared Centralized • CMP vs SMT (and while we are at it: SSMT) • Heterogeneous vs. Homogeneous • Interconnection networks • Other important issues • Cache Coherency • Memory Consistency
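
The outline lists Amdahl's Law, but no slide in this transcript states it. As a reminder, here is a minimal sketch of the usual formulation, assuming a fraction f of the work can be spread perfectly over p processors and the rest stays serial. The function name and parameters are illustrative, not taken from the slides.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Usual statement of Amdahl's Law: speedup = 1 / ((1 - f) + f / p).

    parallel_fraction (f): share of the work that can run in parallel, 0..1.
    processors (p): number of processors applied to the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # With 90% parallel work, the serial 10% caps the speedup at 10x,
    # however many processors are added.
    for p in (1, 2, 10, 100, 1000):
        print(p, round(amdahl_speedup(0.9, p), 3))
```

The serial fraction sets the ceiling: as p grows, the speedup approaches 1 / (1 - f).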

  3. Metrics • Speed-up: • Efficiency: • Utilization: • Redundancy:
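
The formulas on this slide did not survive in the transcript. The sketch below uses the definitions commonly given in architecture texts, which may differ in notation from the slide: T1 and Tp are the execution times on 1 and p processors, O1 and Op are the total operation counts on 1 and p processors. All function and parameter names are assumptions made for illustration.

```python
def speedup(t1, tp):
    """Speed-up: time on one processor divided by time on p processors."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Efficiency: speed-up per processor, T1 / (p * Tp)."""
    return t1 / (p * tp)


def utilization(ops_p, tp, p):
    """Utilization: operations done on p processors over the p * Tp slots available."""
    return ops_p / (p * tp)


def redundancy(ops_1, ops_p):
    """Redundancy: operations on p processors relative to the one-processor count."""
    return ops_p / ops_1


if __name__ == "__main__":
    # Illustrative numbers only: T1 = 8, Tp = 5, p = 3, O1 = 8, Op = 11.
    print(speedup(8, 5), efficiency(8, 5, 3), utilization(11, 5, 3), redundancy(8, 11))
```

With these definitions, utilization equals redundancy times efficiency, which is a quick consistency check on any set of numbers.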

  4. An example: a4*x^4 + a3*x^3 + a2*x^2 + a1*x + a0
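
The worked numbers for this example are not in the transcript. As a sketch, the polynomial above can be evaluated on one processor with Horner's rule, and on three processors with a hand-built schedule. The schedule below is an assumption chosen for illustration, not necessarily the one used in the lecture; each "step" is one time unit in which at most three independent operations run.

```python
def horner(coeffs, x):
    """One processor. coeffs = [a4, a3, a2, a1, a0]. Returns (value, time, operations)."""
    result = coeffs[0]
    ops = 0
    for a in coeffs[1:]:
        result = result * x + a  # one multiply and one add, strictly sequential
        ops += 2
    return result, ops, ops  # on one processor, time equals operation count


def three_processor_schedule(coeffs, x):
    """Hypothetical 3-processor schedule. Returns (value, time, operations)."""
    a4, a3, a2, a1, a0 = coeffs
    ops_per_step = []
    # Step 1: two independent multiplies.
    x2, t1 = x * x, a1 * x
    ops_per_step.append(2)
    # Step 2: three independent multiplies.
    x3, x4, t2 = x2 * x, x2 * x2, a2 * x2
    ops_per_step.append(3)
    # Step 3: two multiplies and one add.
    t3, t4, s0 = a3 * x3, a4 * x4, t1 + a0
    ops_per_step.append(3)
    # Step 4: two independent adds.
    s1, s2 = t4 + t3, t2 + s0
    ops_per_step.append(2)
    # Step 5: final add.
    value = s1 + s2
    ops_per_step.append(1)
    assert max(ops_per_step) <= 3  # never more work in a step than processors
    return value, len(ops_per_step), sum(ops_per_step)


if __name__ == "__main__":
    coeffs, x, p = [1, 2, 3, 4, 5], 2, 3
    v1, time_1, ops_1 = horner(coeffs, x)
    vp, time_p, ops_p = three_processor_schedule(coeffs, x)
    assert v1 == vp
    print("speed-up   ", time_1 / time_p)
    print("efficiency ", time_1 / (p * time_p))
    print("utilization", ops_p / (p * time_p))
    print("redundancy ", ops_p / ops_1)
```

Under this schedule the parallel version finishes sooner but performs more operations than Horner's rule, which is exactly what the redundancy metric captures.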

  5. Tightly-coupled vs Loosely-coupled • Tightly coupled (i.e., Multiprocessor) • Shared memory • Each processor capable of doing work on its own • Easier for the software • Hardware has to worry about cache coherency, memory contention • Loosely coupled (i.e., Multicomputer Network) • Message passing • Easier for the hardware • Programmer’s job is tougher • An early distributed shared-memory example: Cm*
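
To make the software-side contrast concrete, here is a minimal sketch of the two programming styles. It is an analogy only: Python threads stand in for processors, a lock-protected variable stands in for shared memory, and a queue stands in for the message network. Function names are made up for this example.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Shared-memory style: every worker updates one shared variable under a lock."""
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:  # contention point: only one worker may update at a time
            total += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def message_passing_sum(chunks):
    """Message-passing style: workers share nothing and send results explicitly."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))  # explicit send

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Explicit receive: one message per worker.
    return sum(mailbox.get() for _ in chunks)


if __name__ == "__main__":
    data = [[1, 2, 3], [4, 5], [6]]
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the first version the programmer just reads and writes a variable, and the difficulty moves to synchronization and contention. In the second, all communication is spelled out by the programmer.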

  6. CMP vs SMT • CMP (chip multiprocessor) • aka Multi-core • SMT (simultaneous multithreading) • One core – a PC for each thread, in a pipelined fashion • Multithreading introduced by Burton Smith, HEP ~1975 • “Simultaneous” first proposed by H. Hirata et al., ISCA 1992 • Carried forward by Nemirovsky, HICSS 1994 • Popularized by Tullsen, Eggers, ISCA 1995

  7. Heterogeneous vs Homogeneous • [Figure: three chip organizations built from large cores and Niagara-like small cores, labeled “Tile-Large” Approach, “Niagara” Approach, and ACMP Approach]

  8. Large core vs. Small Core • Large core • Out-of-order • Wide fetch, e.g. 4-wide • Deeper pipeline • Aggressive branch predictor (e.g. hybrid) • Many functional units • Trace cache • Memory dependence speculation • Small core • In-order • Narrow fetch, e.g. 2-wide • Shallow pipeline • Simple branch predictor (e.g. gshare) • Few functional units

  9. Throughput vs. Serial Performance

  10. Interconnection Networks • Basic Topologies • Bus • Crossbar • Omega Network • Hypercube • Tree • Mesh • Ring • Parameters: Cost, Latency, Contention • Example: Omega network, N=64, k=2
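
For the Omega network example, the cost and latency parameters follow directly from N and k. The sketch below assumes the standard construction (log_k N stages of k-by-k switches, N/k switches per stage, a perfect shuffle in front of each stage) and destination-tag routing for k = 2. Function names are illustrative.

```python
def omega_parameters(n_ports, k):
    """Return (stages, switches_per_stage, total_switches) for an N-port Omega network."""
    stages, size = 0, 1
    while size < n_ports:
        size *= k
        stages += 1
    if size != n_ports:
        raise ValueError("n_ports must be a power of k")
    switches_per_stage = n_ports // k
    return stages, switches_per_stage, stages * switches_per_stage


def route(src, dst, n_ports):
    """Destination-tag routing for k = 2. Returns the line occupied after each stage."""
    stages, _, _ = omega_parameters(n_ports, 2)
    mask = n_ports - 1
    pos, path = src, [src]
    for i in range(stages):
        # Perfect shuffle: rotate the line number left by one bit.
        pos = ((pos << 1) | (pos >> (stages - 1))) & mask
        # The 2x2 switch picks its upper (0) or lower (1) output from the
        # i-th destination bit, most significant bit first.
        bit = (dst >> (stages - 1 - i)) & 1
        pos = (pos & ~1) | bit
        path.append(pos)
    return path


if __name__ == "__main__":
    print(omega_parameters(64, 2))  # stages, switches per stage, total switches
    print(route(5, 42, 64))         # the last entry is the destination line
```

Latency grows with the number of stages, cost with the total switch count, and contention arises whenever two messages need the same switch output in the same stage.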

  11. Three issues • Interconnection networks • Cost • Latency • Contention • Cache Coherency • Snoopy • Directory • Memory Consistency • Sequential Consistency and Mutual Exclusion

  12. Sequential Consistency • Definition: Memory sees loads/stores in program order • Examples • Why it is important: guarantees mutual exclusion

  13. Two Processors and a critical section • Initially: L1 = 0, L2 = 0 • Processor 1 • A: L1 = 1 • B: If L2 = 0 {critical section} • C: L1 = 0 • Processor 2 • X: L2 = 1 • Y: If L1 = 0 {critical section} • Z: L2 = 0 • What can happen?

  14. How many orders can memory see A, B, C, X, Y, Z? (Let’s list them!)
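
The listing can be done mechanically. The sketch below assumes sequential consistency, so each processor's operations keep their program order (A before B before C, X before Y before Z) and memory sees some interleaving of the two sequences. It enumerates every interleaving, replays it against the two flags, and checks whether both processors are ever inside the critical section together. The bookkeeping names are invented for this example.

```python
from itertools import combinations

P1_OPS = ["A", "B", "C"]
P2_OPS = ["X", "Y", "Z"]


def interleavings():
    """Yield every merge of P1_OPS and P2_OPS that preserves each program order."""
    for p1_slots in combinations(range(6), 3):
        p1, p2 = iter(P1_OPS), iter(P2_OPS)
        yield [next(p1) if i in p1_slots else next(p2) for i in range(6)]


def replay(order):
    """Replay one order. Returns (p1_entered, p2_entered, both_inside_at_once)."""
    flags = {"L1": 0, "L2": 0}
    inside = {"P1": False, "P2": False}
    entered = {"P1": False, "P2": False}
    overlap = False
    for op in order:
        if op == "A":
            flags["L1"] = 1
        elif op == "B" and flags["L2"] == 0:
            inside["P1"] = entered["P1"] = True
        elif op == "C":
            inside["P1"] = False
            flags["L1"] = 0
        elif op == "X":
            flags["L2"] = 1
        elif op == "Y" and flags["L1"] == 0:
            inside["P2"] = entered["P2"] = True
        elif op == "Z":
            inside["P2"] = False
            flags["L2"] = 0
        overlap = overlap or (inside["P1"] and inside["P2"])
    return entered["P1"], entered["P2"], overlap


def summarize():
    results = [replay(order) for order in interleavings()]
    total = len(results)
    violations = sum(1 for _, _, overlap in results if overlap)
    neither = sum(1 for e1, e2, _ in results if not e1 and not e2)
    return total, violations, neither


if __name__ == "__main__":
    for order in interleavings():
        print(" ".join(order), replay(order))
    print(summarize())
```

Under this model no interleaving puts both processors in the critical section at once, although in some orders neither processor gets in. If the hardware let memory see B before A or Y before X, the check would no longer hold.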

  15. Cache Coherence • Definition: If a cache line is present in more than one cache, it has the same values in all caches • Why is it important? • Snoopy schemes • All caches snoop the common bus • Directory schemes • Each cache line has a directory entry in memory
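
The snoopy idea can be sketched in a few lines. The model below is an assumption for illustration, a simple write-invalidate protocol with three line states (Modified, Shared, Invalid), and is not necessarily the scheme drawn on the next slide. Every cache sees every bus transaction through its snoop_* methods.

```python
class Bus:
    """The common bus: holds main memory and the list of attached caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class Cache:
    def __init__(self, bus):
        self.bus = bus
        self.lines = {}  # addr -> (state, value), state is "M", "S" or "I"
        bus.caches.append(self)

    def read(self, addr):
        state, value = self.lines.get(addr, ("I", None))
        if state != "I":
            return value  # hit: no bus traffic needed
        # Miss: broadcast the read so a Modified owner can write back first.
        for other in self.bus.caches:
            if other is not self:
                other.snoop_read(addr)
        value = self.bus.memory.get(addr, 0)
        self.lines[addr] = ("S", value)
        return value

    def write(self, addr, value):
        # Broadcast the write so every other copy is invalidated.
        for other in self.bus.caches:
            if other is not self:
                other.snoop_write(addr)
        self.lines[addr] = ("M", value)

    def snoop_read(self, addr):
        state, value = self.lines.get(addr, ("I", None))
        if state == "M":
            self.bus.memory[addr] = value  # supply the latest data
            self.lines[addr] = ("S", value)

    def snoop_write(self, addr):
        state, value = self.lines.get(addr, ("I", None))
        if state == "M":
            self.bus.memory[addr] = value
        if state != "I":
            self.lines[addr] = ("I", None)


if __name__ == "__main__":
    bus = Bus()
    c0, c1 = Cache(bus), Cache(bus)
    c0.write(0x10, 7)
    print(c1.read(0x10), c0.lines[0x10], c1.lines[0x10])
    c1.write(0x10, 9)
    print(c0.lines[0x10], c0.read(0x10))
```

Because every valid copy is either the single Modified copy or a Shared copy refreshed from memory, any line present in more than one cache holds the same value in all of them, which is the definition on the slide.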

  16. A Snoopy Scheme

  17. Directory Scheme (p=3)
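
The figure for this slide is missing from the transcript. As a sketch of the general idea, a directory entry for p = 3 can be modeled as three presence bits plus a dirty bit, with the directory sending point-to-point messages instead of relying on a shared bus. The message names and class layout are assumptions for illustration.

```python
P = 3  # number of processors, as in the slide title


class DirectoryEntry:
    def __init__(self):
        self.present = [False] * P  # one presence bit per processor
        self.dirty = False          # True when exactly one cache holds a modified copy


class Directory:
    def __init__(self):
        self.entries = {}

    def entry(self, addr):
        return self.entries.setdefault(addr, DirectoryEntry())

    def read(self, addr, cpu):
        """Processor `cpu` reads the line. Returns the messages the directory sends."""
        e = self.entry(addr)
        messages = []
        if e.dirty and not e.present[cpu]:
            owner = e.present.index(True)
            messages.append(("writeback", owner))  # fetch the latest data
            e.dirty = False
        e.present[cpu] = True
        return messages

    def write(self, addr, cpu):
        """Processor `cpu` writes the line. All other copies are invalidated."""
        e = self.entry(addr)
        messages = []
        for other in range(P):
            if other != cpu and e.present[other]:
                messages.append(("invalidate", other))
                e.present[other] = False
        e.present[cpu] = True
        e.dirty = True
        return messages


if __name__ == "__main__":
    d = Directory()
    print(d.read(0x40, 0), d.read(0x40, 1))
    print(d.write(0x40, 2), d.entry(0x40).present, d.entry(0x40).dirty)
    print(d.read(0x40, 0), d.entry(0x40).present, d.entry(0x40).dirty)
```

Only the caches whose presence bit is set receive messages, which is why directory schemes scale to interconnects where broadcasting on a common bus is not practical.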
