Where are we?
• Chapter 1: Fundamentals of Computer Design
  – What is Computer Architecture?
  – Trends in technology: Moore’s law
  – How to measure performance: CPU time
  – Benchmarks: SPEC CPU
• Appendix B: Instruction Set Principles and Examples
  – How to design an ISA
  – RISC vs CISC
  – Amdahl’s law
• Appendix A: Pipelining: Basic and Intermediate Concepts
  – Pipeline hazards and solutions
• Appendix C: Review of Memory Hierarchy
  – Need for a memory hierarchy
  – Cache performance
  – Virtual memory
• Chapter 2: Instruction-Level Parallelism and Its Exploitation (we are here!)
• Chapter 3: Limits on Instruction-Level Parallelism
• Chapter 4: Multiprocessors and Thread-Level Parallelism
• Chapter 5: Memory Hierarchy Design
• Chapter 6: Storage Systems
CSCI 620 NOTE4
Chapter 2: Instruction-Level Parallelism and Its Exploitation
• Instruction-Level Parallelism (ILP)
  – Definition: the potential to overlap the execution of instructions
  – Pipelining is one way to exploit it
• Limitations of ILP come mainly from data and control hazards
• Two largely separable approaches to overcoming these limitations
  – Dynamic approaches that use hardware (Sections 2.4 to 2.9)
  – Static approaches that use software (Sections 2.2 to 2.3)
Major Techniques to Increase ILP
Technique: what it reduces
• Forwarding and bypassing: potential data hazard stalls
• Delayed branches and simple branch scheduling: control hazard stalls
• Basic dynamic scheduling (scoreboarding): data hazard stalls from true dependences
• Dynamic scheduling with renaming: data hazard stalls and stalls from antidependences and output dependences
• Dynamic branch prediction: control stalls
• Issuing multiple instructions per cycle: ideal CPI
• Speculation: data hazard and control hazard stalls
• Dynamic memory disambiguation: data hazard stalls with memory
• Loop unrolling: control hazard stalls
• Basic compiler pipeline scheduling: data hazard stalls
• Compiler dependence analysis: ideal CPI, data hazard stalls
• Software pipelining, trace scheduling: ideal CPI, data hazard stalls
• Compiler speculation: ideal CPI, data and control stalls
Dynamic Approaches by Hardware
• Dynamic instruction scheduling
  – Scoreboarding
  – Tomasulo’s algorithm
  – Register renaming (removes the artificial WAR/WAW dependences)
• Dynamic branch prediction
• Superscalar/multiple instruction issue
• Hardware-based speculation
Dynamically Scheduled Pipeline (by hardware)
• Hardware reschedules (rearranges) instructions to improve performance while preserving data flow and exception behavior
• The simple pipelines seen so far issue an instruction unless it has a data dependence on an instruction already in the pipeline
• Forwarding logic reduces the number of hazards that cause stalls
• If a hazard cannot be hidden by forwarding, the pipeline stalls until the dependence is resolved; compiler (static) scheduling is one way to reduce such stalls
• Is there a better way than the simple pipeline?
• Yes: a dynamically scheduled pipeline
Figure A.29 The MIPS pipeline with three additional unpipelined floating-point functional units.
• Stages: IF, ID, EX, MEM, WB, where EX is one of four units: integer unit, FP/integer multiplier, FP adder, FP/integer divider
• The integer unit takes only one clock cycle
• The additional units are not pipelined, so no new instruction can enter an EX unit that is in use until the previous instruction leaves it
• When an instruction cannot proceed to EX, the pipeline stalls
Scheduling Alternatives
• Static scheduling
  – The compiler tries to avoid or reduce dependences
• Dynamic scheduling
  – The hardware tries to avoid or reduce stalls
• Why hardware and not the compiler? (Advantages of dynamic over static scheduling)
  – Code portability
  – More information is available at run time than at compile time
    * Values of variables are known at run time
    * Some dependences are unknown at compile time, e.g. between memory references:
        LD R1, 100(R2)   # R2 = 200
        LD R3, 200(R4)   # R4 = 100
    * Both references resolve to address 300, which becomes visible only once R2 and R4 hold their run-time values
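The memory-reference example can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `effective_address` and the `regs` dictionary are assumptions made for the example, and the register values are the ones given in the slide's comments.

```python
# Illustrative sketch; effective_address and regs are hypothetical names.
def effective_address(offset, base_value):
    """Address of a MIPS-style memory reference: offset + base register value."""
    return offset + base_value

# Register contents as given in the slide comments; known only at run time.
regs = {"R2": 200, "R4": 100}

addr_first = effective_address(100, regs["R2"])   # LD R1, 100(R2)
addr_second = effective_address(200, regs["R4"])  # LD R3, 200(R4)

# The two references look unrelated in the source text but alias at run time.
same_location = addr_first == addr_second
print(addr_first, addr_second, same_location)  # 300 300 True
```

If either reference were a store, the hardware would have to keep the two accesses in order, which a compiler looking only at `100(R2)` and `200(R4)` cannot determine in general.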
Dynamic scheduling
• The basic problem with the pipelining techniques used so far is that they all use in-order instruction issue
• A stalled instruction stalls every instruction behind it
• Yet the instructions that follow may be able to issue. For example:
    DIV.D F0, F2, F4
    ADD.D F10, F0, F8
    SUB.D F12, F8, F14
• ADD.D must wait for F0 from DIV.D, but SUB.D depends on neither instruction
• Basic out-of-order execution:
  – Nothing prevents the CPU from executing SUB.D before ADD.D
  – Doing so reduces the penalty of the data dependence stall: more ILP and better FU utilization
  – However, it forces out-of-order completion, which complicates exception handling
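The dependence argument for this three-instruction sequence can be sketched as a simple check of source registers against the destination of the long-running DIV.D. The `Instr` tuple and the `raw_blocked` helper are hypothetical names introduced only for this illustration.

```python
from collections import namedtuple

# Hypothetical instruction record: opcode, destination, two sources.
Instr = namedtuple("Instr", "op dest src1 src2")

program = [
    Instr("DIV.D", "F0", "F2", "F4"),
    Instr("ADD.D", "F10", "F0", "F8"),
    Instr("SUB.D", "F12", "F8", "F14"),
]

def raw_blocked(program, pending_index):
    """Opcodes of later instructions that read the result of program[pending_index]."""
    pending_dest = program[pending_index].dest
    return [i.op for i in program[pending_index + 1:]
            if pending_dest in (i.src1, i.src2)]

blocked = raw_blocked(program, 0)  # instructions waiting on DIV.D's F0
free = [i.op for i in program[1:] if i.op not in blocked]
print(blocked, free)  # ['ADD.D'] ['SUB.D']
```

With in-order issue, SUB.D would wait behind ADD.D even though it appears in the `free` list; out-of-order execution lets it proceed.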
How to implement out-of-order execution?
• Previously: instruction decode and operand fetch happen in a single cycle (ID)
• To allow out-of-order execution, split the ID stage into two phases:
  – Issue (IS): decode the instruction and check for structural hazards. On a structural hazard, stall the pipeline; with multiple FUs, structural hazards are less frequent. All instructions pass through this stage in order.
  – Read Operands (RD): wait until there are no data hazards, then read operands. When an instruction has a data hazard, the hardware (assuming it has fetched several instructions) tries to reschedule instructions (scoreboarding), which leads to out-of-order execution and out-of-order completion.
• Out-of-order execution introduces the possibility of WAR and WAW hazards
• In both sequences below, ADDD depends on F0 from DIVD and cannot be scheduled, while SUBD can be scheduled ahead of ADDD:
    WAR on F8 (SUBD writes F8, which ADDD still has to read):
      DIVD F0, F2, F4
      ADDD F10, F0, F8
      SUBD F8, F8, F14
    WAW on F10 (SUBD and ADDD both write F10):
      DIVD F0, F2, F4
      ADDD F10, F0, F8
      SUBD F10, F8, F14
• These hazards must be resolved (covered later)
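The RAW, WAR, and WAW cases for these instruction pairs can be classified mechanically from the register names. The sketch below assumes each instruction is written as a `(dest, src1, src2)` tuple; the function name `hazards` is made up for this example.

```python
def hazards(earlier, later):
    """Classify the hazards of `later` against `earlier`.

    Each instruction is a (dest, src1, src2) tuple; this layout is an
    assumption made for the sketch.
    """
    e_dest, *e_srcs = earlier
    l_dest, *l_srcs = later
    found = set()
    if e_dest in l_srcs:      # later reads what earlier writes
        found.add("RAW")
    if l_dest in e_srcs:      # later writes what earlier reads
        found.add("WAR")
    if l_dest == e_dest:      # both write the same register
        found.add("WAW")
    return found

divd = ("F0", "F2", "F4")        # DIVD F0, F2, F4
addd = ("F10", "F0", "F8")       # ADDD F10, F0, F8
subd_war = ("F8", "F8", "F14")   # SUBD F8, F8, F14
subd_waw = ("F10", "F8", "F14")  # SUBD F10, F8, F14

print(hazards(divd, addd))      # {'RAW'}
print(hazards(addd, subd_war))  # {'WAR'}
print(hazards(addd, subd_waw))  # {'WAW'}
```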
Scoreboarding
• This technique issues instructions in order (in-order issue)
• However, an instruction can bypass other waiting instructions in the "read operands" phase
• The name comes from the scoreboard of the CDC 6600, the first machine to use this technique
Scoreboarding
• Previously: instruction decode and operand fetch happen in a single cycle
• Now: the goal of this technique (and of other dynamic scheduling methods) is to maintain an execution rate of one instruction per cycle
• To support multiple instructions in the ID stage, we need two things:
  – Buffered storage (instruction buffer/window/queue)
  – ID split into two phases: Instruction Issue (IS) and Read Operands (RD)
Figure A.50 The basic structure of a MIPS processor with a scoreboard.
• Functional units: integer unit, two FP multipliers, FP adder, FP divider
• Data flows between the registers and the functional units over the data buses; control/status flows between the scoreboard and both the registers and the functional units
• The scoreboard was originally proposed for the CDC 6600 (Seymour Cray, 1964)
Functions of the Scoreboard (control logic for centralized hazard detection and resolution)
• Keeps a record of data dependences between instructions
• Continuously monitors when an instruction can and cannot read its operands
• Also controls when an instruction can and cannot write its result
• Pipeline stages:
  – Simple pipeline: IF, ID, EX, MEM, WB
  – With scoreboard: IF, IS, RD, EX, MEM, WB (IS and RD replace ID, backed by the I-buffer and the scoreboard)
  – The IF and MEM stages are identical
• Every instruction goes through the scoreboard, which determines when it can read its operands and write its results, so all hazard detection and resolution are centralized
Scoreboarding Stages – Issue
• 1. Issue (check for structural hazards)
  – If the needed FU is free and no other active instruction has the same destination register, issue the instruction (this guarantees that WAW hazards cannot occur)
  – Do not issue until structural hazards have cleared
  – A stalled instruction stays in the I-buffer
  – While an instruction is stalled, the buffer fills up
  – The size of the buffer can itself become a structural hazard: fetch must stall if the buffer fills up
• Algorithm:
  – Ensure in-order issue
  – Multiple issues per cycle are allowed (use of multiple FUs)
  – Check whether the destination register is already reserved for writing (WAW)
  – Check whether the Read Operands stage of the functional unit is free (structural)
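The issue condition above can be sketched as a single predicate over two small tables. The names `can_issue`, `fu_busy`, and `register_result` are assumptions for this illustration, and the sketch ignores the instruction buffer and multiple issue.

```python
def can_issue(fu, dest, fu_busy, register_result):
    """Issue only if the FU is free (no structural hazard) and no active
    instruction already has `dest` as its destination (no WAW hazard)."""
    structural_hazard = fu_busy[fu]
    waw_hazard = dest in register_result
    return not structural_hazard and not waw_hazard

# Hypothetical state: DIV.D F0, F2, F4 is active on the divide unit.
fu_busy = {"Add": False, "Divide": True}
register_result = {"F0": "Divide"}  # register -> FU that will write it

print(can_issue("Add", "F10", fu_busy, register_result))    # True
print(can_issue("Add", "F0", fu_busy, register_result))     # False (WAW on F0)
print(can_issue("Divide", "F6", fu_busy, register_result))  # False (divider busy)
```

An instruction that fails this check stays in the buffer, and because issue is in order, everything behind it waits as well.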
Scoreboarding Stages – Read Operands
• 2. Read Operands (check for data hazards)
  – Check the scoreboard to see whether the source operands are available
  – A source operand is available if no earlier-issued active instruction is going to write that register
  – The scoreboard resolves RAW hazards dynamically in this step, and instructions may be sent into execution out of order
• Algorithm:
  – Wait for operands to become available
  – Operand caching is allowed
  – Forwarding from another WB stage is allowed
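A minimal sketch of the read-operands check: when an instruction issues, it records which functional unit (if any) will produce each source register, and it may read its operands only once neither source has a pending producer. The helper names below are hypothetical, and the sketch omits operand caching and forwarding.

```python
def producers_at_issue(src1, src2, register_result):
    """Look up, at issue time, which FU (if any) will write each source register."""
    return register_result.get(src1), register_result.get(src2)

def can_read_operands(qj, qk):
    """Operands can be read once no earlier active instruction still has to write them."""
    return qj is None and qk is None

# Hypothetical state: DIV.D F0, F2, F4 is executing on the divide unit.
register_result = {"F0": "Divide"}

add_q = producers_at_issue("F0", "F8", register_result)   # ADD.D F10, F0, F8
sub_q = producers_at_issue("F8", "F14", register_result)  # SUB.D F12, F8, F14

print(add_q, can_read_operands(*add_q))  # ('Divide', None) False
print(sub_q, can_read_operands(*sub_q))  # (None, None) True
```

SUB.D passes the check while ADD.D is still waiting, which is how instructions end up entering execution out of order.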
Scoreboarding Stages – Execution/Write Result
• 3. Execution: operate on operands (EX)
  – The functional unit begins execution upon receiving its operands. When the result is ready, it notifies the scoreboard that it has completed execution. This stage can be sub-pipelined and can take multiple cycles, depending on the functional unit.
• 4. Write result: finish execution (WB)
  – Once the scoreboard knows that the functional unit has completed execution, it checks for WAR hazards. If there are none, the result is written. If there is a WAR hazard, the instruction stalls until the earlier instruction has read its operand.
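The WAR check in the write-result step can be sketched as a scan over the other functional units: a result may not be written while another active unit still has that register as a ready-but-unread source. The dictionary layout and the unit names `Add1` and `Add2` are assumptions for this illustration (the example assumes two adders so that ADDD and SUBD can be active together).

```python
def can_write_result(fu, fu_status):
    """Allow the write only if no other active FU still needs to read the
    destination register as a source (Rj/Rk set means ready but not yet read)."""
    dest = fu_status[fu]["Fi"]
    for other, s in fu_status.items():
        if other == fu or not s["Busy"]:
            continue
        if (s["Fj"] == dest and s["Rj"]) or (s["Fk"] == dest and s["Rk"]):
            return False  # WAR hazard: writing now would clobber an unread operand
    return True

fu_status = {
    # ADDD F10, F0, F8: waiting for F0, with F8 ready but not yet read
    "Add1": {"Busy": True, "Fi": "F10", "Fj": "F0", "Fk": "F8", "Rj": False, "Rk": True},
    # SUBD F8, F8, F14: operands already read, execution finished
    "Add2": {"Busy": True, "Fi": "F8", "Fj": "F8", "Fk": "F14", "Rj": False, "Rk": False},
}

before = can_write_result("Add2", fu_status)
fu_status["Add1"]["Rk"] = False  # ADDD has now read its operands
after = can_write_result("Add2", fu_status)
print(before, after)  # False True
```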
Control Logic for Scoreboarding for MIPS with 5 FUs
• 1. Instruction status: indicates which of the 4 steps the instruction is in
• 2. Functional unit status: indicates the state of the functional unit (FU). There are 9 fields for each functional unit:
  – Busy: whether the unit is busy or not
  – Op: operation to perform in the unit (e.g., add or subtract)
  – Fi: destination register
  – Fj, Fk: source-register numbers
  – Qj, Qk: functional units producing source registers Fj, Fk
  – Rj, Rk: flags indicating when Fj, Fk are available and not yet read (alternatively: read and cached)
• 3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register.
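The three tables can be sketched as plain Python data structures. Field names follow the slide; the class name `FUStatus`, the unit names, and the dictionary layout are assumptions made for this illustration rather than a description of any particular implementation.

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class FUStatus:
    """Functional unit status: the nine fields listed on the slide."""
    busy: bool = False
    op: Optional[str] = None   # operation to perform
    fi: Optional[str] = None   # destination register
    fj: Optional[str] = None   # first source register
    fk: Optional[str] = None   # second source register
    qj: Optional[str] = None   # FU producing fj, if any
    qk: Optional[str] = None   # FU producing fk, if any
    rj: bool = False           # fj available and not yet read
    rk: bool = False           # fk available and not yet read

# Five functional units (names are hypothetical).
fu_names = ["Integer", "Mult1", "Mult2", "Add", "Divide"]
fu_status = {name: FUStatus() for name in fu_names}

register_result = {}      # register -> FU that will write it
instruction_status = {}   # instruction text -> current step

# Record DIV.D F0, F2, F4 issued to the divider with both operands ready.
fu_status["Divide"] = FUStatus(True, "DIV.D", "F0", "F2", "F4", None, None, True, True)
register_result["F0"] = "Divide"
instruction_status["DIV.D F0, F2, F4"] = "Issue"

print(len(fields(FUStatus)), len(fu_status))  # 9 5
```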
Scoreboard Example
Legend for the functional unit status table:
• Busy: whether the unit is busy or not
• Op: operation to perform in the unit (e.g., add or subtract)
• Fi: destination register
• Fj, Fk: source-register numbers
• Qj, Qk: functional units producing source registers Fj, Fk
• Rj, Rk: flags indicating when Fj, Fk are available and not yet read
Scoreboarding Limitations
• Number and type of functional units
• Number of instruction buffer entries (scoreboard size)
• Amount of application ILP (RAW hazards)
• Presence of antidependences (WAR) and output dependences (WAW)
  – In-order issue for WAW/structural hazards limits the scheduler
  – WAR stalls are critical for loops (hardware loop unrolling)