
Multithreading

Presentation Transcript


  1. Multithreading Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

  2. Fine-Grain Multithreading • Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary goal of such an approach?

  3. Fine-Grain Multithreading • Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary drawback of such an approach?
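
A minimal sketch of the fine-grain policy (the thread count and cycle count are illustrative assumptions, not from the slides): the hardware simply picks a different thread every cycle, round-robin, so no single thread's stalls dominate the pipeline.

```c
#include <stdio.h>

#define NUM_THREADS 4   /* assumed number of hardware thread contexts */

int main(void) {
    for (int cycle = 0; cycle < 8; cycle++) {
        /* Fine-grain policy: switch threads EVERY cycle, round-robin. */
        int thread = cycle % NUM_THREADS;
        printf("cycle %d: fetch/issue from thread %d\n", cycle, thread);
    }
    return 0;
}
```

Note that with only one runnable thread this policy does useful work just one cycle in four, which previews the drawback the question above is asking about.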

  4. Coarse-Grain Multithreading • Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary goal of such an approach?

  5. Coarse-Grain Multithreading • Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary drawback of such an approach?
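
For contrast, a minimal sketch of the coarse-grain policy (the miss pattern and thread count are made up for illustration): one thread keeps issuing until it hits a long-latency event such as a cache miss, and only then does the hardware switch.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_THREADS 2   /* assumed number of hardware thread contexts */

/* Hypothetical stand-in for a cache: pretend every 5th access misses. */
static bool long_latency_miss(int thread, int cycle) {
    return ((cycle + thread) % 5) == 4;
}

int main(void) {
    int thread = 0;
    for (int cycle = 0; cycle < 15; cycle++) {
        printf("cycle %2d: issue from thread %d\n", cycle, thread);
        if (long_latency_miss(thread, cycle)) {
            /* Coarse-grain policy: switch only on a high-latency event,
             * hiding the miss latency behind the other thread's work. */
            thread = (thread + 1) % NUM_THREADS;
            printf("          miss -> switch to thread %d\n", thread);
        }
    }
    return 0;
}
```

Short stalls go unhidden, and each switch pays the pipeline-drain cost listed on the next slide.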

  6. Context Switch • What happens on a context switch? • Transfer of register state • Transfer of the PC • Draining of the pipeline • Additionally: • Warm up caches • Warm up branch predictors
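
A minimal sketch of the state that actually moves (register count and struct layout are assumptions for illustration): only the architectural registers and the PC are copied, while caches and branch predictors stay behind and must be re-warmed by the incoming thread.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define NUM_REGS 32   /* assumed architectural register count */

typedef struct {
    uint64_t regs[NUM_REGS];   /* architectural register file */
    uint64_t pc;               /* program counter */
} context_t;

/* Save the outgoing thread's registers and PC, restore the incoming
 * thread's.  Cache and branch-predictor state is NOT part of this. */
static void context_switch(context_t *out, const context_t *in,
                           uint64_t regs[NUM_REGS], uint64_t *pc) {
    memcpy(out->regs, regs, sizeof(out->regs));
    out->pc = *pc;
    memcpy(regs, in->regs, sizeof(out->regs));
    *pc = in->pc;
}

int main(void) {
    uint64_t regs[NUM_REGS] = {0};
    uint64_t pc = 0x1000;
    context_t t0 = {{0}, 0};
    context_t t1 = {{0}, 0x2000};
    context_switch(&t0, &t1, regs, &pc);   /* thread 0 out, thread 1 in */
    printf("now running at pc 0x%llx\n", (unsigned long long)pc);
    return 0;
}
```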

  7. Multithreading • [Figure: issue-slot utilization over time, one panel each for Coarse Grain, Fine Grain, and SMT, with issue width across each panel]

  8. Simultaneous Multithreading • Given a modern out-of-order processor with register renaming, an instruction queue, a reorder buffer, etc. – what is REQUIRED to perform simultaneous multithreading? • More functional units • Larger instruction queue • Larger reorder buffer • A means to differentiate between threads in the instruction queue, register rename, and reorder buffer • The ability to fetch from multiple programs • Speaker note: the point is that if you can just fetch from multiple streams, the processor is usually over-provisioned anyway.
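
A minimal sketch of the cheapest of those additions (the field names are assumptions, not from the slides): tag every entry of the shared instruction queue with a thread ID so instructions from different programs can share the same structures without mixing up their renamed registers.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t  thread_id;   /* which hardware thread owns this entry */
    uint64_t pc;          /* instruction address */
    uint8_t  dest_preg;   /* renamed (physical) destination register */
} iq_entry_t;             /* one entry of the SHARED instruction queue */

int main(void) {
    /* Instructions from two threads sit side by side in one queue;
     * the thread_id keeps their register names and ordering apart. */
    iq_entry_t iq[4] = {
        {0, 0x1000, 12},
        {1, 0x8000, 13},
        {0, 0x1004, 14},
        {1, 0x8004, 15},
    };
    for (int i = 0; i < 4; i++)
        printf("entry %d: thread %u, pc 0x%llx -> p%u\n",
               i, iq[i].thread_id,
               (unsigned long long)iq[i].pc, iq[i].dest_preg);
    return 0;
}
```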

  9. Modern OOO Processor • [Block diagram: Fetch → Decode → Register Rename → Instruction Queue, feeding INT ALUs and FP ALUs, with a Reorder Buffer, Load Queue, Store Queue, and L1 cache] • Speaker note: draw just the need to fetch more instructions.

  10. SMT vs. early multi-core • The argument was between a single aggressive SMT out-of-order processor and a number of simpler processors. • At the time, the advantage for the simpler processors was a higher clock rate. • The disadvantage for the simpler processors was the lack of functional units / in-order execution / smaller caches / etc.

  11. SMT vs. MP

  12. SMT vs. early CMP • SMT – 4-issue, 4 int ALUs, 4 FP ALUs • CMP – 2 cores, each 2-issue with 2 int ALUs and 2 FP ALUs • Say you have 4 threads • Say you have 2 threads – one floating-point intensive and the other integer intensive • Say you have 1 thread • Speaker note: point out that a single thread drives benchmark tests – no one buys a processor which does worse!

  13. Multi-core recently • Instruction queues were taking up 20% of the core area at 4-issue – how complex would an 8-issue queue be? • Simpler hardware no longer means a faster clock rate. • Tons of die space is available. • Larger caches weren't helping performance that much. • Why not just replicate a single advanced processor (core)?

  14. SMT vs. CMP - Revised • SMT – 4-issue, 4 int ALUs, 4 FP ALUs • CMP – 2 cores, each 4-issue with 4 int ALUs and 4 FP ALUs • Say you have 4 threads • Say you have 2 threads – one floating-point intensive and the other integer intensive • Say you have 1 thread…

  15. Multi-core Today • 4-8 cores per chip: the "multi-core era". • Throughput scales well with the number of cores. • Each core is frequently SMT as well (for more throughput). • Great when you have 4-8 threads (most of us have a fair number running at any given time). • What do we do when we get 128 cores (the "many-core era")?
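
A minimal software-side sketch of why throughput scales (the worker count and busy-loop workload are assumptions for illustration): give each core one independent thread and the chip finishes roughly N times the work in the same time.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4   /* e.g. one thread per core on a 4-core chip */

/* Independent busy work: no shared data, so the cores don't contend. */
static void *worker(void *arg) {
    long id = (long)arg;
    volatile long sum = 0;
    for (long i = 0; i < 100000000L; i++)
        sum += i;
    printf("worker %ld done (sum=%ld)\n", id, (long)sum);
    return NULL;
}

int main(void) {
    pthread_t tids[NUM_WORKERS];
    for (long i = 0; i < NUM_WORKERS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
    return 0;
}
```

Compile with -pthread. With fewer runnable threads than cores, the extra cores sit idle, which is exactly the many-core question the slide ends on.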

  16. Multithreading Key Points • Simultaneous Multithreading • Inexpensive addition to increase throughput for multiple threads • Enables good throughput for multiple threads • Does not impact single-thread performance • Single-Chip Multiprocessors • ILP wall / memory wall / power wall – all point to multi-core • Enables excellent throughput for multiple threads • Where do we find all these threads? The "field of dreams" argument: build the cores and the threads will come.
