Multithreading. Peer Instruction Lecture Materials for Computer Architecture by Dr. Leo Porter is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
Fine-Grain Multithreading • Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary goal of such an approach?
Fine-Grain Multithreading • Fine-grain multithreading performs a switch between threads EVERY cycle. What is the primary drawback of such an approach?
Coarse-Grain Multithreading • Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary goal of such an approach?
Coarse-Grain Multithreading • Coarse-grain multithreading performs a switch between threads whenever one thread encounters a high-latency event. What is the primary drawback of such an approach?
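The two switching policies above can be explored with a toy model. This is a minimal sketch, not a description of any real pipeline: it assumes a single-issue core, a long-latency event after every `miss_every`-th instruction of a thread, and a fixed `switch_penalty` for draining the pipeline on a coarse-grain switch. The function name and all parameters are hypothetical.

```python
def simulate(policy, n_threads=4, instrs_per_thread=100, miss_every=10,
             miss_latency=20, switch_penalty=3):
    """Cycles until the last instruction issues on a toy single-issue core.

    policy is "fine" (rotate threads every cycle) or "coarse" (stay on one
    thread until it stalls or finishes). All parameters are illustrative.
    """
    remaining = [instrs_per_thread] * n_threads
    ready_at = [0] * n_threads   # first cycle each thread may issue again
    issued = [0] * n_threads
    cycle = 0
    current = 0                  # thread that issued most recently
    while any(remaining):
        runnable = [t for t in range(n_threads)
                    if remaining[t] > 0 and ready_at[t] <= cycle]
        if not runnable:
            cycle += 1           # every unfinished thread is stalled: idle cycle
            continue

        def rr_distance(t):
            # Round-robin distance from the thread that issued last.
            return (t - current - 1) % n_threads

        if policy == "fine":
            chosen = min(runnable, key=rr_distance)
        elif current in runnable:
            chosen = current     # coarse grain: keep the pipeline on this thread
        else:
            chosen = min(runnable, key=rr_distance)
            cycle += switch_penalty   # assumed drain/refill cost of a switch
        remaining[chosen] -= 1
        issued[chosen] += 1
        if issued[chosen] % miss_every == 0:
            ready_at[chosen] = cycle + 1 + miss_latency  # long-latency event
        current = chosen
        cycle += 1
    return cycle


if __name__ == "__main__":
    for policy in ("fine", "coarse"):
        print(policy, simulate(policy))
```

Varying `miss_latency`, `switch_penalty`, and `n_threads` shows how sensitive each policy is to the kind of stall it is trying to hide. The model ignores short pipeline hazards and per-thread completion time, both of which matter when answering the questions above.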
Context Switch • What happens on a context switch? • Transfer of register state • Transfer of PC • Draining of the pipeline • Additionally: • Warm up caches • Warm up branch predictors
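The three explicit steps in the list (register state, PC, pipeline drain) can be sketched as follows. This is an illustrative model only: the `Core` and `ThreadContext` classes, the 32-register file, and the "one cycle per in-flight instruction" drain cost are assumptions, and cache and branch-predictor warm-up are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadContext:
    """Architectural state saved for a thread that is not running."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)


class Core:
    """Hypothetical core holding the live state of the running thread."""

    def __init__(self):
        self.pc = 0
        self.regs = [0] * 32
        self.in_flight = []   # instructions currently in the pipeline

    def context_switch(self, outgoing, incoming):
        """Swap threads; return the assumed drain cost in cycles."""
        # 1. Drain the pipeline: let in-flight instructions finish
        #    (assumed cost: one cycle per in-flight instruction).
        drain_cycles = len(self.in_flight)
        self.in_flight.clear()
        # 2. Save the outgoing thread's PC and register state.
        outgoing.pc = self.pc
        outgoing.regs = list(self.regs)
        # 3. Load the incoming thread's PC and register state.
        self.pc = incoming.pc
        self.regs = list(incoming.regs)
        # Not modeled: caches and branch predictors still hold the old
        # thread's data, so the incoming thread starts "cold".
        return drain_cycles
```

For example, switching a core with two in-flight instructions from thread `a` to thread `b` returns a drain cost of 2 and leaves `a.pc` holding the old program counter.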
Multithreading • [Figure: three issue-slot diagrams, each with issue width on the horizontal axis, labeled Coarse Grain, Fine Grain, and SMT]
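The issue-slot picture for coarse grain, fine grain, and SMT can be approximated in text. The sketch below is an assumption-laden toy: `ready[t][c]` is a made-up count of instructions thread `t` could issue in cycle `c`, coarse-grain switches are treated as free, and the function name `issue_grid` is hypothetical. Each row is one cycle; letters mark the issuing thread and `.` marks an empty slot.

```python
def issue_grid(ready, width, policy):
    """Return one string per cycle showing which thread fills each issue slot."""
    n_threads = len(ready)
    n_cycles = len(ready[0])
    rows = []
    current = 0
    for c in range(n_cycles):
        if policy == "smt":
            # Several threads may share the issue slots of a single cycle.
            slots = []
            for t in range(n_threads):
                take = min(ready[t][c], width - len(slots))
                slots += [chr(ord("A") + t)] * take
        else:
            if policy == "fine":
                current = c % n_threads              # new thread every cycle
            elif ready[current][c] == 0:
                # Coarse grain: switch only when the running thread stalls
                # (switch cost ignored in this sketch).
                current = (current + 1) % n_threads
            slots = [chr(ord("A") + current)] * min(ready[current][c], width)
        rows.append("".join(slots).ljust(width, "."))
    return rows


if __name__ == "__main__":
    ready = [[2, 0, 1, 3],   # thread A, hypothetical ready counts per cycle
             [1, 2, 2, 0]]   # thread B
    for policy in ("coarse", "fine", "smt"):
        print(policy, issue_grid(ready, 4, policy))
```

With the sample table, only the SMT rows ever mix two threads within one cycle, which is the visual difference the three panels illustrate.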
Simultaneous Multithreading • The point: if you can just fetch from multiple streams, the processor is usually overprovisioned anyway. • More functional units • Larger instruction queue • Larger reorder buffer • Means to differentiate between threads in the instruction queue, register rename, and reorder buffer • Ability to fetch from multiple programs • Given a modern out-of-order processor with register renaming, an instruction queue, a reorder buffer, etc., what is REQUIRED to perform simultaneous multithreading?
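One way to picture "a means to differentiate between threads" in shared structures is a rename stage that keeps one map table per thread over a single shared pool of physical registers, plus a thread tag on each queued instruction. This is a hedged sketch of that idea, not how any particular processor does it; `SharedRenamer`, `QueueEntry`, and the register counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueEntry:
    """Instruction-queue entry tagged with the thread it belongs to."""
    thread_id: int
    op: str
    dest_phys: int


class SharedRenamer:
    """Per-thread map tables over one shared physical register pool."""

    def __init__(self, n_threads, n_arch_regs=32, n_phys_regs=128):
        self.free_list = list(range(n_phys_regs))
        self.map_table = [[None] * n_arch_regs for _ in range(n_threads)]

    def rename_dest(self, thread_id, arch_reg):
        """Allocate a physical register for a thread's destination register."""
        if not self.free_list:
            # Real hardware would stall rename; the sketch just raises.
            raise RuntimeError("no free physical registers")
        phys = self.free_list.pop(0)
        self.map_table[thread_id][arch_reg] = phys
        return phys

    def lookup_src(self, thread_id, arch_reg):
        """Find the physical register holding a thread's source operand."""
        return self.map_table[thread_id][arch_reg]


if __name__ == "__main__":
    renamer = SharedRenamer(n_threads=2)
    # Both threads write architectural r1, yet land in different physical regs.
    queue = [QueueEntry(tid, "add", renamer.rename_dest(tid, 1)) for tid in (0, 1)]
    print(queue)
```

Because each thread's `r1` maps to a different physical register, instructions from both threads can sit in the same queue without confusing their operands.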
Modern OOO Processor • [Figure: block diagram of an out-of-order core with L1, Fetch, Decode, Register Rename, Instruction Queue, Reorder Buffer, Load Queue, Store Queue, three INT ALUs, and two FP ALUs] • Draw just the need to fetch more instructions
SMT vs. early multi-core • The argument was between a single aggressive SMT out-of-order processor and a number of simpler processors. • At the time, the advantage of the simpler processors was a higher clock rate. • The disadvantages of the simpler processors were a lack of functional units, in-order execution, smaller caches, etc.
SMT vs. early CMP • SMT: 4-issue, 4 int ALUs, 4 FP ALUs • CMP: 2 cores, each 2-issue, 2 int ALUs, 2 FP ALUs • Say you have 4 threads • Say you have 2 threads: one is floating-point intense and the other is integer intense • Say you have 1 thread • Point out that single-thread performance drives benchmark tests: no one buys a processor that does worse!
Multi-core recently • Instruction queues were taking up 20% of a core's area for 4-issue; how complex would it be for 8-issue? • Simpler hardware does not mean a faster clock rate. • Tons of die space. • Larger caches weren’t helping performance that much. • Why not just replicate a single advanced processor (core)?
SMT vs. CMP - Revised • SMT – 4 issue, 4 int ALU, 4 FP ALU • CMP – 2 cores each 4-issue, 4 int ALU, 4 FP ALUs • Say you have 4 threads • Say you have 2 threads – one is floating point intense and the other is integer intense. • Say you have 1 thread….
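The SMT vs. CMP scenarios (4 threads, 2 mixed threads, 1 thread) can be played with in a peak-issue-rate toy model. Everything here is an assumption made for illustration: each thread is described by a made-up `(int_ops, fp_ops)` demand per cycle, a non-SMT core time-slices its threads equally, and clock rate, caches, and real dependences are ignored. The core configurations follow the issue widths and ALU counts given in the two comparisons.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    width: int       # issue width
    int_alus: int
    fp_alus: int
    smt: bool = False


def group_ipc(core, demands):
    """Peak instructions/cycle when the given threads issue together on one core."""
    int_issued = min(core.int_alus, sum(d[0] for d in demands))
    fp_issued = min(core.fp_alus, sum(d[1] for d in demands))
    return min(core.width, int_issued + fp_issued)


def chip_ipc(cores, demands):
    """Peak chip throughput with threads placed round-robin across cores."""
    total = 0.0
    for i, core in enumerate(cores):
        mine = demands[i::len(cores)]
        if not mine:
            continue                      # idle core
        if core.smt:
            total += group_ipc(core, mine)
        else:
            # Assumed equal time slicing: average of each thread running alone.
            total += sum(group_ipc(core, [d]) for d in mine) / len(mine)
    return total


SMT = [Core(4, 4, 4, smt=True)]
EARLY_CMP = [Core(2, 2, 2)] * 2
REVISED_CMP = [Core(4, 4, 4)] * 2

# Hypothetical per-thread demands: (integer ops, FP ops) per cycle.
FOUR_THREADS = [(1, 1)] * 4
MIXED_PAIR = [(4, 0), (0, 4)]     # integer-intense and FP-intense
SINGLE = [(2, 2)]

if __name__ == "__main__":
    for name, chip in (("SMT", SMT), ("early CMP", EARLY_CMP),
                       ("revised CMP", REVISED_CMP)):
        print(name, [chip_ipc(chip, w) for w in (FOUR_THREADS, MIXED_PAIR, SINGLE)])
```

Under these assumed demands the single-thread case is where the early 2-issue CMP falls behind the SMT core, and the revised 4-issue CMP closes that gap. Different demand tuples will change the picture, so treat the numbers as a prompt for discussion rather than a result.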
Multi-core Today • 4-8 cores per chip. “Multi-core Era” • Throughput scales well with the number of cores. • Each core is frequently SMT as well (for more throughput) • Great when you have 4-8 threads (most of us have a fair number at any given time) • What to do when we get 128 cores (“Many core era”)??
Multithreading Key Points • Simultaneous Multithreading • Inexpensive addition to increase throughput for multiple threads • Enables good throughput for multiple threads • Does not impact single thread performance • Single Chip Multiprocessors • ILP wall/Memory Wall/ Power Wall – all point to multi-core • Enables excellent throughput for multiple threads • Where do we find all these threads? Field of dreams argument