
ECE562/468 Advanced Computer Architecture Prof. Honggang Wang

ECE Department, University of Massachusetts Dartmouth, 285 Old Westport Rd., North Dartmouth, MA 02747-2300. Ch3. Limits on Instruction-Level Parallelism: 1. ILP Limits; 2. SMT (Simultaneous Multithreading).


Presentation Transcript


  1. ECE Department, University of Massachusetts Dartmouth, 285 Old Westport Rd., North Dartmouth, MA 02747-2300 • Ch3. Limits on Instruction-Level Parallelism: 1. ILP Limits; 2. SMT (Simultaneous Multithreading) • ECE562/468 Advanced Computer Architecture, Prof. Honggang Wang • Slides based on the PowerPoint presentations created by David Patterson as part of the Instructor Resources for the textbook by Hennessy & Patterson. Updated by Honggang Wang.

  2. Administrative Issues (04/01/2014) • Interim Report is due on Tuesday, April 8 • Submit a short narrative report and PPT slides with preliminary results included • It was originally due on Tuesday, April 1, 2014 (original schedule) • Draft of Final Report is due on Thursday, April 24, 2014 • Submit a narrative semi-final report containing figures, tables, graphs and references • Final Project Report is due on Tuesday, May 6, 2014 • Submit one hardcopy & one softcopy of your complete report and PPT slides. Orally present your report with PPT slides. • My office hours: • T./TH. 1-2 pm, Fri. 1:00-3:00 pm • www.faculty.umassd.edu/honggang.wang/teaching.html

  3. Outline • Limits to ILP (another perspective) • 5 Assumptions for an Ideal Processor • Realizable Processors • HW vs. SW Speculation • SMT (Simultaneous Multithreading) • Thread Level Parallelism • Multithreading • Power 4 vs. Power 5 • Head to Head: VLIW vs. Superscalar vs. SMT

  4. Limits to ILP • How much ILP is available using existing mechanisms with increasing HW budgets? • Do we need to invent new HW/SW mechanisms to stay on the processor performance curve? • Intel MMX, SSE (Streaming SIMD Extensions): 64 bit ints • Intel SSE2: 128 bit, including 2 64-bit Fl. Pt. per clock • Motorola AltiVec: 128 bit ints and FPs • SuperSPARC multimedia ops, etc.

  5. Overcoming Limits • Advances in compiler technology, combined with significantly new and different hardware techniques, may be able to overcome the limitations assumed in these studies • However, it is unlikely that such advances, when coupled with realistic hardware, will overcome these limits in the near future

  6. Limits to ILP • Initial HW model here; MIPS compilers. Assumptions for an ideal/perfect machine to start: • 1. Register renaming – infinite virtual registers => all register WAW & WAR hazards are avoided • 2. Branch prediction – perfect; no mispredictions • 3. Jump prediction – all jumps perfectly predicted (returns, case statements) • Assumptions 2 & 3 => no control dependencies; perfect speculation & an unbounded buffer of instructions available • 4. Memory-address alias analysis – addresses known & a load can be moved before a store provided that the addresses are not equal • Assumptions 1 & 4 => all data dependencies eliminated except RAW • 5. Perfect caches; 1-cycle latency for all instructions (including FP *, /); unlimited instructions issued per clock cycle
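
Under the ideal-machine assumptions on this slide, only true (RAW) dependences constrain execution, so the available ILP is roughly the instruction count divided by the length of the longest dependence chain. The following is a minimal sketch of that calculation, not the methodology of the study itself. The `(dest, srcs)` instruction encoding and the function name `ideal_ilp` are hypothetical, chosen only for illustration.

```python
def ideal_ilp(instrs):
    """Estimate ILP on an idealized machine (illustrative sketch).

    instrs: list of (dest, srcs) tuples in program order, where dest is a
    register name (or None) and srcs is a list of register names.
    Assumes perfect renaming, perfect prediction, unit latency, and
    unlimited issue, so only RAW dependences delay an instruction.
    """
    ready = {}   # register -> first cycle in which its newest value is usable
    depth = 0    # length of the longest dependence chain seen so far
    for dest, srcs in instrs:
        # Issue in the first cycle in which every source value is available.
        cycle = max((ready.get(s, 0) for s in srcs), default=0)
        if dest is not None:
            ready[dest] = cycle + 1   # 1-cycle latency
        depth = max(depth, cycle + 1)
    return len(instrs) / depth if instrs else 0.0


# Hypothetical 4-instruction program: r3 depends on r1 and r2.
prog = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r1", "r2"]), ("r4", ["r0"])]
print(ideal_ilp(prog))  # 4 instructions over a 2-deep chain
```

A fully serial chain gives an ILP of 1.0 under this model, while fully independent instructions give an ILP equal to the instruction count.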

  7. Limits to ILP HW Model comparison

  8. Upper Limit to ILP: Ideal Machine (Figure 3.1) • Instructions per clock • FP: 75 - 150 • Integer: 18 - 60

  9. Limits to ILP HW Model comparison

  10. More Realistic HW: Window Impact (Figure 3.2) • Change from infinite window to 2048, 512, 128, 32 • FP: 9 - 150 • Integer: 8 - 63
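
To illustrate why a smaller window lowers the achievable IPC, here is a rough sketch of a window-limited issue model. It keeps the other ideal assumptions (unit latency, RAW dependences only, issue as many instructions as are ready) and is an assumption-laden toy, not the simulator behind Figure 3.2. The `(dest, srcs)` encoding and the name `windowed_ilp` are hypothetical.

```python
def windowed_ilp(instrs, window):
    """IPC when only `window` un-issued instructions are visible at once.

    instrs: list of (dest, srcs) tuples in program order (hypothetical format).
    Each cycle, every instruction in the window whose producers issued in an
    earlier cycle is issued; the window is then refilled in program order.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    n = len(instrs)
    if n == 0:
        return 0.0

    # Record, for each instruction, which earlier instructions it reads from.
    last_writer, producers = {}, []
    for i, (dest, srcs) in enumerate(instrs):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        if dest is not None:
            last_writer[dest] = i

    issue_cycle = [None] * n
    pending, next_fetch, cycle, issued = [], 0, 0, 0
    while issued < n:
        # Refill the window with the next instructions in program order.
        while len(pending) < window and next_fetch < n:
            pending.append(next_fetch)
            next_fetch += 1
        # Ready = all producers issued in a strictly earlier cycle.
        ready = [i for i in pending
                 if all(issue_cycle[p] is not None and issue_cycle[p] < cycle
                        for p in producers[i])]
        for i in ready:
            issue_cycle[i] = cycle
        pending = [i for i in pending if issue_cycle[i] is None]
        issued += len(ready)
        cycle += 1
    return n / cycle


# Hypothetical program: a 4-long dependence chain followed by 4 independent ops.
chain = [("a", []), ("a", ["a"]), ("a", ["a"]), ("a", ["a"])]
indep = [("b", []), ("c", []), ("d", []), ("e", [])]
print(windowed_ilp(chain + indep, 8), windowed_ilp(chain + indep, 2))
```

With a window of 8 the independent instructions overlap with the chain; with a window of 2 they stay hidden behind it until the chain has almost drained, so the IPC drops.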

  11. Limits to ILP HW Model comparison

  12. More Realistic HW: Branch Impact (Figure 3.3) • Change from an infinite window to a 2048-instruction window and a maximum issue of 64 instructions per clock cycle • FP: 15 - 45 • Integer: 6 - 12 • Predictors compared: Perfect, Tournament, BHT (512), Profile, No prediction

  13. Misprediction Rates (Figure 3.4)
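
As a concrete reference point for the "BHT (512)" configuration, the sketch below models a branch history table of 512 two-bit saturating counters and measures its misprediction rate on a branch trace. The indexing scheme (word-aligned PC modulo table size), the initial counter value, and the trace format are assumptions made for illustration; they are not taken from the slides.

```python
class TwoBitBHT:
    """Branch history table of 2-bit saturating counters (illustrative)."""

    def __init__(self, entries=512):
        self.entries = entries
        # Assumption: counters start at 1 ("weakly not taken").
        self.counters = [1] * entries

    def _index(self, pc):
        # Assumption: 4-byte instructions, so drop the low 2 bits of the PC.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        # Counter values 2 and 3 predict taken; 0 and 1 predict not taken.
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def misprediction_rate(predictor, trace):
    """trace: list of (pc, taken) pairs; returns the fraction mispredicted."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses / len(trace) if trace else 0.0


# Hypothetical loop branch: taken 9 times, then falls through; run 10 times.
trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
rate = misprediction_rate(TwoBitBHT(512), trace)
print(rate)
```

On this trace the predictor misses once at warm-up and once at every loop exit, which is the usual behavior of a 2-bit counter on loop-closing branches.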

  14. Limits to ILP HW Model comparison

  15. More Realistic HW: Renaming Register Impact (N int + N fp) (Figure 3.5) • Change: 2048 instr window, 64 instr issue, 8K 2-level prediction • FP: 11 - 45 • Integer: 5 - 15 • Renaming registers compared: Infinite, 256, 128, 64, 32, None
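
The effect of renaming can be seen in a small sketch: giving every destination a fresh physical register removes WAW and WAR name conflicts and leaves only RAW dependences. This toy version corresponds to the "Infinite" case, because it never runs out of names and never frees one; a finite pool would have to stall when no free register is available. The `(dest, srcs)` encoding and the `p0, p1, ...` naming are hypothetical.

```python
from itertools import count


def rename(instrs):
    """Rename architectural registers to fresh physical names (sketch).

    instrs: list of (dest, srcs) tuples in program order.
    Assumes an unlimited supply of physical registers.
    """
    fresh = (f"p{i}" for i in count())
    mapping = {}   # architectural register -> current physical register
    renamed = []
    for dest, srcs in instrs:
        new_srcs = []
        for s in srcs:
            if s not in mapping:
                mapping[s] = next(fresh)   # live-in value gets its own name
            new_srcs.append(mapping[s])
        new_dest = None
        if dest is not None:
            new_dest = next(fresh)         # every write gets a new name
            mapping[dest] = new_dest
        renamed.append((new_dest, new_srcs))
    return renamed


# Hypothetical sequence with a WAR hazard on r2 and a WAW hazard on r1.
prog = [("r1", ["r2"]), ("r2", ["r3"]), ("r1", ["r2"])]
print(rename(prog))
# [('p1', ['p0']), ('p3', ['p2']), ('p4', ['p3'])]
```

After renaming, the second instruction no longer has to wait for the first to read `r2`, and the two writes to `r1` target different physical registers; only the RAW dependence of the third instruction on the second remains.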

  16. Limits to ILP HW Model comparison

  17. More Realistic HW: Memory Address Alias Impact (Figure 3.6) • Change: 2048 instr window, 64 instr issue, 8K 2-level prediction, 256 renaming registers • FP: 4 - 45 (Fortran, no heap) • Integer: 4 - 9 • Alias models compared: Perfect; Global/Stack perfect, heap conflicts; Inspection of assembly; None
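
The alias models differ in when a load is allowed to move ahead of an earlier store. The sketch below encodes three of them under stated assumptions: memory references are hypothetical `(region, address)` pairs, and the "Global/Stack perfect, heap conflicts" model is interpreted here as treating any pair that involves a heap reference as conflicting. That interpretation, and all names in the code, are assumptions for illustration; the inspection-based model is omitted.

```python
def may_conflict(load, store, model):
    """Return True if the model must assume the load and store may alias.

    load, store: (region, address) with region in {"global", "stack", "heap"}.
    model: "perfect", "global_stack_perfect", or "none" (hypothetical names).
    """
    region_l, addr_l = load
    region_s, addr_s = store
    if model == "none":
        return True                       # every pair is assumed to conflict
    if model == "perfect":
        return addr_l == addr_s           # conflict only on equal addresses
    if model == "global_stack_perfect":
        if "heap" in (region_l, region_s):
            return True                   # assumption: heap refs always conflict
        return addr_l == addr_s
    raise ValueError(f"unknown model: {model}")


def can_hoist_load(load, earlier_stores, model):
    """A load may move above earlier stores only if none of them may conflict."""
    return not any(may_conflict(load, s, model) for s in earlier_stores)


heap_store = [("heap", 0x900)]
print(can_hoist_load(("heap", 0x800), heap_store, "perfect"))
print(can_hoist_load(("heap", 0x800), heap_store, "global_stack_perfect"))
```

Under this reading, heap-heavy integer codes lose most of the reordering freedom that the perfect model allows, which is consistent with the slide's note that the Fortran FP programs use no heap.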

  18. Outline • Limits to ILP (another perspective) • 5 Assumptions for an Ideal Processor • Realizable Processors • HW vs. SW Speculation • SMT (Simultaneous Multithreading) • Thread Level Parallelism • Multithreading • Power 4 vs. Power 5 • Head to Head: VLIW vs. Superscalar vs. SMT

  19. Limits to ILP HW Model comparison

  20. Realistic HW: Window Impact (Figure 3.7) • Perfect disambiguation (HW), 1K Selective Prediction, 16-entry return predictor, 64 registers, issue as many as window • FP: 8 - 45 • Integer: 6 - 12 • Window sizes compared: Infinite, 256, 128, 64, 32, 16, 8, 4

  21. How to Exceed ILP Limits of this study? • These are not laws of physics, just practical limits for today, which may perhaps be overcome via research • Compiler and ISA advances could change the results • WAR and WAW hazards through memory: the study eliminated WAW and WAR hazards on registers through renaming, but not those arising through memory usage • Conflicts can arise via the allocation of stack frames, as a called procedure reuses the memory addresses of a previous frame on the stack

  22. Outline • Limits to ILP (another perspective) • 5 Assumptions for an Ideal Processor • Realizable Processors • HW vs. SW Speculation • SMT (Simultaneous Multithreading) • Thread Level Parallelism • Multithreading • Power 4 vs. Power 5 • Head to Head: VLIW vs. Superscalar vs. SMT

  23. HW vs. SW to increase ILP • Memory disambiguation: HW best • Speculation: • HW best when dynamic branch prediction is better than compile-time prediction • Exceptions are easier for HW • HW doesn’t need bookkeeping code or compensation code • HW speculation is very complicated to get right • Scheduling: SW can look ahead to schedule better • Compiler independence: HW does not require a new compiler or recompilation to run well

  24. Outline • Limits to ILP (another perspective) • 5 Assumptions for an Ideal Processor • Realizable Processors • HW vs. SW Speculation • SMT (Simultaneous Multithreading) • Thread Level Parallelism • Multithreading • Power 4 vs. Power 5 • Head to Head: VLIW vs. Superscalar vs. SMT
