
Where are we?


ciro

Presentation Transcript


  1. Where are we?
     • Chapter 1: Fundamentals of Computer Design
       – What is Computer Architecture?
       – Trends in technology: Moore’s law
       – How to measure performance: CPU time
       – Benchmarks: SPEC CPU
     • Appendix B: Instruction Set Principles and Examples
       – How to design an ISA
       – RISC vs CISC
       – Amdahl’s law
     • Appendix A: Pipelining: Basic and Intermediate Concepts
       – Pipeline hazards and solutions
     • Appendix C: Review of Memory Hierarchy
       – Need for a memory hierarchy
       – Cache performance
       – Virtual memory
     • We are here!
     • Chapter 2: Instruction-Level Parallelism and Its Exploitation
     • Chapter 3: Limits on Instruction-Level Parallelism
     • Chapter 4: Multiprocessors and Thread-Level Parallelism
     • Chapter 5: Memory Hierarchy Design
     • Chapter 6: Storage Systems
     CSCI 620 NOTE4

  2. Chapter 2: Instruction-Level Parallelism and Its Exploitation
     • Instruction-Level Parallelism (ILP)
       – Definition: the potential to overlap the execution of instructions
       – Pipelining is one way to exploit it
     • Limitations of ILP come mainly from data and control hazards
     • Two largely separable approaches to overcoming these limitations:
       – Dynamic approaches that use hardware (Sections 2.4–2.9)
       – Static approaches that use software (Sections 2.2–2.3)

  3. Major Techniques to increase ILP
     Technique | Reduces
     Forwarding and bypassing | Potential data hazard stalls
     Delayed branches and simple branch scheduling | Control hazard stalls
     Basic dynamic scheduling (scoreboarding) | Data hazard stalls from true dependences
     Dynamic scheduling with renaming | Data hazard stalls and stalls from antidependences and output dependences
     Dynamic branch prediction | Control stalls
     Issuing multiple instructions per cycle | Ideal CPI
     Speculation | Data hazard and control hazard stalls
     Dynamic memory disambiguation | Data hazard stalls with memory
     Loop unrolling | Control hazard stalls
     Basic compiler pipeline scheduling | Data hazard stalls
     Compiler dependence analysis | Ideal CPI, data hazard stalls
     Software pipelining, trace scheduling | Ideal CPI, data hazard stalls
     Compiler speculation | Ideal CPI, data hazard stalls, control stalls

  4. Dynamic Approaches by Hardware
     • Dynamic instruction scheduling
       – Scoreboarding
       – Tomasulo’s Algorithm
       – Register renaming (removes the artificial dependences behind WAR/WAW hazards)
     • Dynamic branch prediction
     • Superscalar/multiple instruction issue
     • Hardware-based speculation

  5. Dynamically Scheduled Pipeline (by hardware)
     • Hardware tries to reschedule (rearrange) instructions to improve performance while maintaining data flow and exception behavior
     • The simple pipelines seen so far issue an instruction unless it has a data dependence on an instruction already in the pipeline
     • Forwarding logic helps reduce hazards
     • If an unavoidable hazard occurs, the pipeline stalls until the data dependence is resolved; compiler (static) scheduling can reduce such stalls
     • Is there a better way than the simple pipeline?
     • A dynamically scheduled pipeline! (next slide)

  6. Figure A.29 The MIPS pipeline with three additional unpipelined, floating-point, functional units.
     • Pipeline: IF → ID → EX → MEM → WB, where EX is one of four units: integer unit, FP/integer multiply, FP adder, FP/integer divider
     • The integer unit takes only one clock cycle
     • The additional units are not pipelined, so no new instruction can enter an EX unit that is in use until the previous instruction leaves it
     • When an instruction cannot proceed to EX, the pipeline stalls

  7. Scheduling Alternatives
     • Let’s first look at the varieties of scheduling possible
     • Static scheduling
       – The compiler tries to avoid/reduce dependences
     • Dynamic scheduling
       – The hardware tries to avoid/reduce stalling
     • Why hardware and not the compiler? (Advantages of dynamic over static scheduling)
       – Code portability
       – More information is available dynamically (at run time) than at compile time
         * Values of variables are known at run time
         * Some dependences are unknown at compile time, e.g. memory references:
             LD R1, 100(R2)   # R2=200
             LD R3, 200(R4)   # R4=100
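The memory-reference example above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming MIPS-style displacement addressing (address = offset + base register contents) and the register values given in the slide's comments; the helper name `effective_address` is made up for illustration.

```python
def effective_address(offset: int, base_value: int) -> int:
    """Displacement addressing: offset plus the contents of the base register."""
    return offset + base_value

# Register contents as stated in the slide's comments (known only at run time).
regs = {"R2": 200, "R4": 100}

addr1 = effective_address(100, regs["R2"])  # LD R1, 100(R2)
addr2 = effective_address(200, regs["R4"])  # LD R3, 200(R4)

# The two loads look unrelated in the source text, yet they touch the same location.
same_location = addr1 == addr2
print(addr1, addr2, same_location)
```

A compiler sees only `100(R2)` and `200(R4)`; whether the two references alias depends on register values that the hardware can observe at run time.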

  8. Dynamic scheduling
     • The basic problem with the pipelining techniques used so far is that they all use in-order instruction issue
     • A stalled instruction stalls all instructions behind it
     • Yet some of the instructions that follow may be able to issue. For example:
         DIV.D F0, F2, F4
         ADD.D F10, F0, F8
         SUB.D F12, F8, F14
     • Basic out-of-order execution:
       – SUB.D does not depend on DIV.D or ADD.D, so there is no reason why the CPU can't execute it before the ADD.D
       – Doing so reduces the penalty caused by the data dependence stall: more ILP, improved FU utilization
       – However, this forces out-of-order completion, which causes problems in handling exceptions
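The claim that SUB.D can run ahead of ADD.D can be checked mechanically. The sketch below is illustrative only: `Instr` and `raw_producers` are hypothetical names, and the check simply compares register names (it does not pick only the most recent writer, which a fuller analysis would).

```python
from typing import List, NamedTuple

class Instr(NamedTuple):
    op: str
    dest: str
    src1: str
    src2: str

# The three-instruction sequence from the slide.
program = [
    Instr("DIV.D", "F0", "F2", "F4"),
    Instr("ADD.D", "F10", "F0", "F8"),
    Instr("SUB.D", "F12", "F8", "F14"),
]

def raw_producers(prog: List[Instr], i: int) -> List[int]:
    """Indices of earlier instructions whose destination is read by instruction i."""
    cur = prog[i]
    return [j for j in range(i) if prog[j].dest in (cur.src1, cur.src2)]

add_deps = raw_producers(program, 1)  # ADD.D reads F0, produced by DIV.D
sub_deps = raw_producers(program, 2)  # SUB.D reads F8 and F14, produced by nobody here
print(add_deps, sub_deps)
```

An empty producer list for SUB.D is what lets it start while ADD.D is still waiting on the divide.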

  9. How to implement out-of-order execution?
     • Previously:
       – Instruction decode and operand fetch happen in a single cycle
     • To allow out-of-order execution, the ID stage is split into two phases:
       – Issue (IS): decode the instruction and check for structural hazards. When a structural hazard is detected, stall the pipeline (with multiple FUs there are fewer structural hazards). All instructions pass through this stage in order
       – Read Operands (RD): wait until there are no data hazards, then read operands. When an instruction has a data hazard, the hardware (assuming it has fetched several instructions) tries to reschedule instructions (scoreboarding) → out-of-order execution → out-of-order completion
     • Out-of-order execution => possibility of WAR/WAW hazards
       – ADDD depends on F0 (written by DIVD), so it cannot be scheduled yet. SUBD can be scheduled ahead of ADDD, and then a WAR or WAW hazard can occur:
           WAR (SUBD writes F8, which ADDD reads)    WAW (SUBD and ADDD both write F10)
           DIVD F0, F2, F4                           DIVD F0, F2, F4
           ADDD F10, F0, F8                          ADDD F10, F0, F8
           SUBD F8, F8, F14                          SUBD F10, F8, F14
     • These hazards need to be resolved (covered later)
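The two ADDD/SUBD pairs above can be classified with a small check. This is a sketch under simple assumptions: each instruction is written as a `(dest, src1, src2)` tuple, and `hazards_if_reordered` is a hypothetical helper that names the hazard created when the later instruction completes before the earlier one.

```python
from typing import List, Tuple

Instr = Tuple[str, str, str]  # (dest, src1, src2)

def hazards_if_reordered(earlier: Instr, later: Instr) -> List[str]:
    """Hazards that arise if `later` writes its result before `earlier` is done."""
    e_dest, *e_srcs = earlier
    l_dest = later[0]
    found = []
    if l_dest in e_srcs:   # later overwrites a register the earlier one still has to read
        found.append("WAR")
    if l_dest == e_dest:   # both write the same register; the final value could be wrong
        found.append("WAW")
    return found

addd = ("F10", "F0", "F8")                                    # ADDD F10, F0, F8
war_case = hazards_if_reordered(addd, ("F8", "F8", "F14"))    # SUBD F8, F8, F14
waw_case = hazards_if_reordered(addd, ("F10", "F8", "F14"))   # SUBD F10, F8, F14
print(war_case, waw_case)
```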

  10. Scoreboarding
     • This technique issues instructions in order (in-order issue)
     • However, instructions can bypass other waiting instructions in the "read operands" phase
     • It is named after the scoreboard of the CDC 6600, which was the first machine to use one

  11. Scoreboarding
     • Previously:
       – Instruction decode and operand fetch happen in a single cycle
     • Now: the goal of this technique (and of other dynamic scheduling methods) is to maintain an execution rate of one instruction per cycle
       – To support multiple instructions in the ID stage, we need two things:
         * Buffered storage (instruction buffer/window/queue)
         * ID split into two phases: Instruction Issue and Read Operands

  12. Figure A.50 The basic structure of a MIPS processor with a scoreboard
     • Functional units: integer unit, FP add, FP divide, two FP mult
     • Data flows between the registers and the functional units over data buses; control/status flows between the scoreboard and each unit
     • The scoreboard was originally proposed for the CDC 6600 (Seymour Cray, 1964)

  13. Functions of Scoreboard (control logic for centralized hazard detection and resolution)
     • Keeps a record of data dependences between instructions
     • Determines when an instruction can/cannot read its operands (keeps monitoring)
     • Also controls when an instruction can/cannot write its result
     • Simple pipeline: IF → ID → EX → MEM → WB
     • With scoreboard: IF → IS → RD → EX → MEM → WB (IS and RD, backed by the I-buffer and scoreboard, replace ID; the IF and MEM stages are identical)
     • Every instruction goes through the scoreboard. The scoreboard determines when an instruction can read its operands and write its results, so all hazard detection and resolution are centralized

  14. Scoreboarding Stages – Issue
     • Issue (check for structural hazards)
       – If the needed FU is free and no other active instruction has the same destination register, issue the instruction (this guarantees that WAW hazards cannot happen)
       – Do not issue until structural hazards have cleared
       – A stalled instruction stays in the I-buffer
       – While an instruction is stalled, the buffer fills up
       – The size of the buffer can itself become a structural hazard: fetch has to stall if the buffer fills up
     • Algorithm:
       – Ensure in-order issue
       – Multiple issues per cycle are allowed (usage of multiple FUs)
       – Check whether the destination register is already reserved for writing (WAW)
       – Check whether the Read Operands stage of the functional unit is free (structural)
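The issue condition above boils down to two lookups. The sketch below assumes the scoreboard state is held in two plain dictionaries; the names `fu_busy`, `result_writer`, and `can_issue` are illustrative, and the unit names follow the functional units of Figure A.50.

```python
from typing import Dict

def can_issue(fu_busy: Dict[str, bool], result_writer: Dict[str, str],
              fu: str, dest: str) -> bool:
    """Issue only if the unit is free (structural) and nobody else will write dest (WAW)."""
    structural_ok = not fu_busy[fu]
    waw_ok = dest not in result_writer
    return structural_ok and waw_ok

# Hypothetical snapshot: Mult1 is busy and is the pending writer of F0.
fu_busy = {"Integer": False, "Mult1": True, "Mult2": False, "Add": False, "Divide": False}
result_writer = {"F0": "Mult1"}

issue_add_f10 = can_issue(fu_busy, result_writer, "Add", "F10")   # free unit, free destination
issue_add_f0 = can_issue(fu_busy, result_writer, "Add", "F0")     # WAW on F0
issue_mult_f6 = can_issue(fu_busy, result_writer, "Mult1", "F6")  # structural: Mult1 busy
print(issue_add_f10, issue_add_f0, issue_mult_f6)
```

Because issue is in order, a `False` here stalls every instruction behind the one being checked.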

  15. Scoreboarding Stages – Read Operands
     • Read Operands (check for data hazards)
       – Check the scoreboard to see whether the source operands are available
       – Available? Yes, if no earlier-issued active instruction will write to the register
       – The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order
     • Algorithm:
       – Wait for operands to become available
       – Operand caching is allowed
       – Forwarding from another WB stage is allowed
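The wait in this stage can be sketched with the Qj/Qk and Rj/Rk fields of the functional unit status. This is a minimal illustration, not the slide's own code: `FUStatus`, `can_read_operands`, and `on_write_result` are made-up names, and the MUL.D entry is an invented instruction added only to show a unit whose operands are already available.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FUStatus:
    qj: Optional[str]  # unit that will produce source 1 (None if no pending producer)
    qk: Optional[str]  # unit that will produce source 2
    rj: bool           # source 1 available and not yet read
    rk: bool           # source 2 available and not yet read

def can_read_operands(fu: FUStatus) -> bool:
    """An instruction leaves Read Operands only when both sources are available."""
    return fu.rj and fu.rk

def on_write_result(units: Dict[str, FUStatus], finished: str) -> None:
    """When a unit writes its result, operands waiting on it become available."""
    for fu in units.values():
        if fu.qj == finished:
            fu.rj = True
        if fu.qk == finished:
            fu.rk = True

units = {
    "Add": FUStatus(qj="Divide", qk=None, rj=False, rk=True),   # ADD.D F10, F0, F8 waits for DIV.D
    "Mult1": FUStatus(qj=None, qk=None, rj=True, rk=True),      # hypothetical MUL.D, operands ready
}
ready_before = {name: can_read_operands(fu) for name, fu in units.items()}
on_write_result(units, "Divide")
ready_after = {name: can_read_operands(fu) for name, fu in units.items()}
print(ready_before, ready_after)
```

The Mult1 entry proceeds while Add is still waiting, which is where out-of-order execution begins.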

  16. Scoreboarding Stages – Execution/Write Result
     • 3. Execution: operate on operands (EX)
       – The functional unit begins execution upon receiving its operands. When the result is ready, it notifies the scoreboard that it has completed execution. This stage can be sub-pipelined and can take multiple cycles, depending on the functional unit
     • 4. Write result: finish execution (WB)
       – Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards. If there are none, the result is written. If there is a WAR hazard, the instruction stalls
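The WAR check before write-back can be sketched as a scan over the other functional units. This is an illustration under assumptions: `FUStatus` and `can_write_result` are hypothetical names, and the MUL.D instruction on Mult1 is invented so that the writer sits on a different unit than ADDD.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class FUStatus:
    fi: str    # destination register
    fj: str    # source register 1
    fk: str    # source register 2
    rj: bool   # source 1 available and not yet read
    rk: bool   # source 2 available and not yet read

def can_write_result(units: Dict[str, FUStatus], name: str) -> bool:
    """Stall the write while another unit still has to read the old value of our destination."""
    dest = units[name].fi
    for other_name, other in units.items():
        if other_name == name:
            continue
        if (other.fj == dest and other.rj) or (other.fk == dest and other.rk):
            return False  # WAR hazard: a pending read of dest has not happened yet
    return True

units = {
    "Add": FUStatus(fi="F10", fj="F0", fk="F8", rj=False, rk=True),      # ADDD F10, F0, F8: F8 not yet read
    "Mult1": FUStatus(fi="F8", fj="F6", fk="F14", rj=False, rk=False),   # hypothetical MUL.D F8, F6, F14
}
write_before = can_write_result(units, "Mult1")

# Once ADDD has read its operands, both of its flags are cleared.
units["Add"].rj = False
units["Add"].rk = False
write_after = can_write_result(units, "Mult1")
print(write_before, write_after)
```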

  17. Control logic for Scoreboarding for MIPS with 5 FUs
     • 1. Instruction status: indicates which of the 4 steps the instruction is in
     • 2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
       – Busy: indicates whether the unit is busy or not
       – Op: operation to perform in the unit (e.g., add or subtract)
       – Fi: destination register
       – Fj, Fk: source-register numbers
       – Qj, Qk: functional units producing source registers Fj, Fk
       – Rj, Rk: flags indicating when Fj, Fk are available and not yet read (alternatively: read and cached)
     • 3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register
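The three tables above map naturally onto a small data structure. This is a sketch, not a reference implementation: the class and function names (`Stage`, `FUStatus`, `Scoreboard`, `record_issue`) are invented for illustration, and the five unit names are taken from the functional units of Figure A.50.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import Dict, Optional

class Stage(Enum):
    ISSUE = 1
    READ_OPERANDS = 2
    EXECUTE = 3
    WRITE_RESULT = 4

@dataclass
class FUStatus:
    busy: bool = False
    op: Optional[str] = None
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # source register 1
    fk: Optional[str] = None   # source register 2
    qj: Optional[str] = None   # unit producing fj
    qk: Optional[str] = None   # unit producing fk
    rj: bool = False           # fj available and not yet read
    rk: bool = False           # fk available and not yet read

@dataclass
class Scoreboard:
    fu_status: Dict[str, FUStatus]
    instruction_status: Dict[str, Stage] = field(default_factory=dict)
    register_result: Dict[str, str] = field(default_factory=dict)  # register -> writing unit

def record_issue(sb: Scoreboard, label: str, fu: str, op: str,
                 dest: str, s1: str, s2: str) -> None:
    """Bookkeeping at issue time: fill in the unit's fields and reserve the destination."""
    e = sb.fu_status[fu]
    e.busy, e.op, e.fi, e.fj, e.fk = True, op, dest, s1, s2
    e.qj = sb.register_result.get(s1)
    e.qk = sb.register_result.get(s2)
    e.rj = e.qj is None
    e.rk = e.qk is None
    sb.register_result[dest] = fu
    sb.instruction_status[label] = Stage.ISSUE

sb = Scoreboard(fu_status={n: FUStatus() for n in
                           ("Integer", "Mult1", "Mult2", "Add", "Divide")})
record_issue(sb, "DIV.D F0,F2,F4", "Divide", "DIV.D", "F0", "F2", "F4")
record_issue(sb, "ADD.D F10,F0,F8", "Add", "ADD.D", "F10", "F0", "F8")
add_entry = sb.fu_status["Add"]
print(add_entry.qj, add_entry.rj, add_entry.rk, sb.register_result)
```

After the two issues, the Add entry records that F0 is still owed by the Divide unit (Qj = Divide, Rj = no), while F8 is available.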

  18. Scoreboard Example
     • Busy: indicates whether the unit is busy or not
     • Op: operation to perform in the unit (e.g., add or subtract)
     • Fi: destination register
     • Fj, Fk: source-register numbers
     • Qj, Qk: functional units producing source registers Fj, Fk
     • Rj, Rk: flags indicating when Fj, Fk are available and not yet read

  19–41. Scoreboard Example, continued (scoreboard tables not captured; field legend as in slide 18)

  42. Skip some cycles

  43–45. Scoreboard Example, continued (scoreboard tables not captured; field legend as in slide 18)

  46. Review

  47. Review

  48. Scoreboarding Limitations
     • Number and type of functional units
     • Number of instruction buffer entries (scoreboard size)
     • Amount of application ILP (RAW hazards)
     • Presence of antidependences (WAR) and output dependences (WAW)
       – In-order issue for WAW/structural hazards limits the scheduler
       – WAR stalls are critical for loops (hardware loop unrolling)
