Instruction Level Parallelism and Tomasulo’s approach. Vincent H. Berk October 7, 2005 Reading for today: chapter A.8 Reading for Monday: chapter 3.2 – 3.6 Homework #2: due Friday 14th, 2.8, A.2, A.13, 3.6a&b, 3.10, 4.5, 4.8, (4.13 optional). Instruction Level Parallelism.
ENGS 116 Lecture 8
Instruction Level Parallelism and Tomasulo’s approach
• Vincent H. Berk
• October 7, 2005
• Reading for today: chapter A.8
• Reading for Monday: chapter 3.2 – 3.6
• Homework #2: due Friday 14th, 2.8, A.2, A.13, 3.6a&b, 3.10, 4.5, 4.8, (4.13 optional)
Instruction Level Parallelism
• Pipeline CPI = Ideal pipeline CPI + Structural stalls + Data hazard stalls + Control stalls
• Reduce stalls, reduce CPI
• Reduce CPI, increase IPC
• Instruction-level parallelism (ILP) seeks to reduce stalls
• Loop-level parallelism is easiest to see:
    for (i=1; i<100; i=i+1)
    {
        A[i] = B[i] + C[i];
        D[i] = E[i] + F[i];
    }
  • No iteration depends on another (assuming the arrays do not overlap), so iterations can overlap
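As a worked illustration of the CPI equation above, the following sketch adds stall contributions to an ideal CPI of 1.0 and converts the result to IPC. The stall values are hypothetical numbers chosen for illustration; they do not come from the lecture.

```python
def pipeline_cpi(ideal_cpi, structural_stalls, data_hazard_stalls, control_stalls):
    """Pipeline CPI = ideal CPI + average stall cycles per instruction."""
    return ideal_cpi + structural_stalls + data_hazard_stalls + control_stalls


def ipc(cpi):
    """IPC is the reciprocal of CPI."""
    return 1.0 / cpi


# Hypothetical stall cycles per instruction (illustrative values only).
baseline = pipeline_cpi(1.0, 0.05, 0.30, 0.15)
# Suppose better scheduling removes half of the data hazard stalls.
improved = pipeline_cpi(1.0, 0.05, 0.15, 0.15)

print(f"baseline: CPI={baseline:.2f} IPC={ipc(baseline):.3f}")
print(f"improved: CPI={improved:.2f} IPC={ipc(improved):.3f}")
```

Lowering any one stall term lowers CPI and therefore raises IPC, which is the chain of reasoning on the slide.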
Instruction Level Parallelism
• ILP in SW (static) or HW (dynamic)
• HW intensive ILP dominates desktop and server markets
• SW compiler intensive approaches more likely seen in embedded systems
Dependences
• Two instructions are parallel if they can execute simultaneously in a pipeline without causing any stalls (assuming no structural hazards) and can be reordered
• Two instructions that are dependent are not parallel and cannot be reordered
• Types of dependences
  • Data dependences
  • Name dependences
  • Control dependences
Dependences
• Dependences are properties of programs
• Hazards are properties of the pipeline organization
• Dependence indicates the potential for a hazard
• The compiler is concerned with dependences in the program; whether a HW hazard actually occurs depends on the given pipeline
Review of Hazards
• Consider instructions i and j, where i occurs before j.
• RAW (read after write): j tries to read a source before i writes it, so j gets the old value
• WAW (write after write): j tries to write an operand before it is written by i, so the writes happen in the wrong order and i's value is left behind (only possible in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled)
• WAR (write after read): j tries to write a destination before it is read by i, so i incorrectly gets the new value (only possible when some instructions can write results early in the pipeline and other instructions can read sources late in the pipeline)
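The three definitions above can be sketched as a small classifier over register names. This is a minimal sketch, not part of the lecture: the `Instr` record and the `dependences` function are hypothetical names, each instruction is assumed to have at most one destination register, and the operands are borrowed from the lecture's renaming example. It reports dependences only; whether any of them becomes a hazard depends on the pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record: at most one destination register."""
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def dependences(i, j):
    """Classify how later instruction j depends on earlier instruction i."""
    found = set()
    if i.dest is not None and i.dest in j.srcs:
        found.add("RAW")   # j reads what i writes
    if j.dest is not None and j.dest in i.srcs:
        found.add("WAR")   # j overwrites what i still has to read
    if i.dest is not None and i.dest == j.dest:
        found.add("WAW")   # both write the same register
    return found


div = Instr("DIV.D", "F0", ("F2", "F4"))
add = Instr("ADD.D", "F6", ("F0", "F8"))
sub = Instr("SUB.D", "F8", ("F10", "F14"))
mul = Instr("MUL.D", "F6", ("F10", "F8"))

for first, second in [(div, add), (add, sub), (add, mul), (div, sub)]:
    print(first.op, "->", second.op, sorted(dependences(first, second)))
```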
Data Dependences
• (True) data dependences (RAW if a hazard for HW): instruction j is data dependent on instruction i if either
  • Instruction i produces a result used by instruction j, or
  • Instruction j is data dependent on instruction k, and instruction k is data dependent on instruction i.
• Easy to determine for registers (fixed names)
• Hard for memory:
  • Does 100(R4) = 20(R6)?
  • From different loop iterations, does 20(R6) = 20(R6)?
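To see why the memory case is hard, note that the answer to "does 100(R4) = 20(R6)?" depends on register contents that are only known at run time. The sketch below makes that concrete; the function names and the register values are hypothetical, chosen only for illustration.

```python
def effective_address(offset, base, regs):
    """Address of offset(base), e.g. 100(R4) -> 100 + regs['R4']."""
    return offset + regs[base]


def same_location(ref_a, ref_b, regs):
    """True if two (offset, base) references resolve to the same address."""
    return effective_address(*ref_a, regs) == effective_address(*ref_b, regs)


a, b = (100, "R4"), (20, "R6")

# Hypothetical register contents: the answer flips with the run-time values.
print(same_location(a, b, {"R4": 1000, "R6": 1080}))  # True: both are 1100
print(same_location(a, b, {"R4": 1000, "R6": 2000}))  # False: 1100 vs 2020
```

The same reasoning applies across loop iterations: 20(R6) names the same location in two iterations only if R6 has not changed in between.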
Name Dependences
• Another kind of dependence is the name dependence: two instructions use the same name but don’t exchange data
• Antidependence (WAR if a hazard for HW)
  • Instruction j writes a register or memory location that instruction i reads from and instruction i is executed first
• Output dependence (WAW if a hazard for HW)
  • Instruction i and instruction j write the same register or memory location; ordering between instructions must be preserved
Name Dependences
• Hard for memory accesses
  • Does 100(R4) = 20(R6)?
  • From different loop iterations, does 20(R6) = 20(R6)?
• Example of renaming (S and T are temporary names):
    Original              Renamed
    DIV.D F0,F2,F4        DIV.D F0,F2,F4
    ADD.D F6,F0,F8        ADD.D S,F0,F8
    S.D   F6,0(R1)        S.D   S,0(R1)
    SUB.D F8,F10,F14      SUB.D T,F10,F14
    MUL.D F6,F10,F8       MUL.D F6,F10,T
• Renaming removes the WAR on F8 (ADD.D reads it, SUB.D writes it) and the WAW on F6 (ADD.D and MUL.D both write it); the RAW dependences remain
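The renaming idea can be sketched in a few lines. Note that this sketch is more aggressive than the slide: it gives every destination a fresh name (T0, T1, ...) rather than renaming only F6 and F8. The tuple layout, the `rename` function, and the T-names are assumptions made for illustration, and the T-names are assumed not to clash with real register names.

```python
def rename(program):
    """Give every destination a fresh name; sources follow the newest name.

    program: list of (op, dest, srcs) tuples; dest is None for a store.
    """
    latest = {}      # architectural register -> its current renamed register
    renamed = []
    counter = 0
    for op, dest, srcs in program:
        # Read sources first so an instruction never sees its own new name.
        new_srcs = tuple(latest.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"T{counter}"
            counter += 1
            latest[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


program = [
    ("DIV.D", "F0", ("F2", "F4")),
    ("ADD.D", "F6", ("F0", "F8")),
    ("S.D", None, ("F6", "R1")),
    ("SUB.D", "F8", ("F10", "F14")),
    ("MUL.D", "F6", ("F10", "F8")),
]

renamed = rename(program)
for op, dest, srcs in renamed:
    print(op, dest, srcs)
```

Because every write gets its own name, no two instructions write the same register and no later write can clobber a value an earlier instruction still needs; only the true (RAW) dependences survive, carried by the renamed sources.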
Control Dependence
• Final kind of dependence is the control dependence
• Example:
    if p1 {S1;}
    if p2 {S2;}
• S1 is control dependent on p1, and S2 is control dependent on p2 but not on p1.
• Note that S2 could be data dependent on S1.
Control Dependences
• Two (obvious) constraints on control dependences:
  • An instruction that is control dependent on a branch cannot be moved before the branch so that its execution is no longer controlled by the branch
  • An instruction that is not control dependent on a branch cannot be moved after the branch so that its execution is controlled by the branch
• Control dependences are often relaxed to get parallelism; we get the same effect if we preserve the order of exceptions and the data flow
Hardware Schemes for ILP
• Why in hardware at run time?
  • Works when dependence is not known at compile time
  • Simplifies compiler
  • Allows code for one machine to run well on another
• Key idea: Allow instructions behind stall to proceed
    DIVD F0, F2, F4
    ADDD F10, F0, F8
    SUBD F12, F8, F14
  • ADDD must wait for F0 from DIVD; SUBD depends on neither and can proceed
• Enables out-of-order execution, which implies out-of-order completion
• ID stage checks for both structural hazards and data dependences
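The benefit of letting instructions proceed past a stall can be shown with a toy timing model of the three-instruction example. This is only a sketch under stated assumptions: the latencies are hypothetical, one instruction is issued per cycle, there are no structural hazards, and only RAW dependences delay execution (the example has no WAR or WAW dependences). The function name `finish_cycles` is also hypothetical.

```python
LATENCY = {"DIVD": 10, "ADDD": 2, "SUBD": 2}   # hypothetical latencies in cycles

PROGRAM = [
    ("DIVD", "F0", ("F2", "F4")),
    ("ADDD", "F10", ("F0", "F8")),
    ("SUBD", "F12", ("F8", "F14")),
]


def finish_cycles(program, out_of_order):
    """Return the cycle in which each instruction finishes executing."""
    ready_at = {}      # register -> cycle in which its new value is available
    prev_start = 0
    finish = []
    for n, (op, dest, srcs) in enumerate(program):
        start = n + 1                                  # one issue per cycle
        for s in srcs:                                 # RAW: wait for producers
            start = max(start, ready_at.get(s, 0) + 1)
        if not out_of_order:                           # stall behind predecessor
            start = max(start, prev_start + 1)
        prev_start = start
        done = start + LATENCY[op] - 1
        ready_at[dest] = done
        finish.append(done)
    return finish


print("in-order:    ", finish_cycles(PROGRAM, out_of_order=False))
print("out-of-order:", finish_cycles(PROGRAM, out_of_order=True))
```

In this model SUBD finishes long before ADDD when it is allowed to bypass the stall, which is exactly out-of-order completion.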
Hardware Schemes for ILP
• Out-of-order execution divides ID stage:
  1. Issue: decode instructions, check for structural hazards
  2. Read operands: wait until no data hazards, then read operands
Tomasulo’s Algorithm
• For IBM 360/91 about 3 years after CDC 6600
• Goal: High performance without special compilers
• Differences between IBM 360 & CDC 6600 ISA
  • IBM has only 2 register specifiers/instruction vs. 3 in CDC 6600
  • IBM has 4 FP registers vs. 8 in CDC 6600
• Differences between Tomasulo’s Algorithm & Scoreboard
  • Control & buffers (called “reservation stations”) distributed with functional units vs. centralized in scoreboard
  • Registers in instructions replaced by pointers to reservation station buffer
  • HW renaming of registers to avoid WAR, WAW hazards
  • Common data bus (CDB) broadcasts results to functional units
  • Loads and stores treated as functional units as well
• Descendants: Alpha 21264, HP PA-8000, MIPS R10000, Pentium III, PowerPC 604, ...
Three Stages of Tomasulo Algorithm
• 1. Issue: Get instruction from FP operation queue
  • If reservation station free, issues instruction & sends operands (renames registers).
• 2. Execution: Operate on operands (EX)
  • When operands ready then execute; if not ready, watch common data bus for result.
• 3. Write result: Finish execution (WB)
  • Write on common data bus to all awaiting units; mark reservation station available.
• Common data bus: data + source (“come from” bus)
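The three stages above can be sketched as a toy simulator. This is a heavily simplified illustration, not the 360/91 design: it assumes a single pool of reservation stations instead of separate adder and multiplier pools, omits loads and stores, lets only one station broadcast on the CDB per cycle, and uses hypothetical latencies. The names `Station`, `run`, `OPS`, and `LATENCY` are all made up for this sketch.

```python
import operator

OPS = {"ADD": operator.add, "SUB": operator.sub,
       "MUL": operator.mul, "DIV": operator.truediv}
LATENCY = {"ADD": 2, "SUB": 2, "MUL": 3, "DIV": 5}   # hypothetical cycle counts


class Station:
    """One reservation station: values (vj, vk) or producer tags (qj, qk)."""

    def __init__(self, name):
        self.name, self.busy = name, False
        self.op = self.vj = self.vk = self.qj = self.qk = None
        self.remaining = 0


def run(program, regs, n_stations=3):
    """Run (op, dest, src1, src2) tuples; updates regs, returns cycles used."""
    stations = [Station(f"RS{i}") for i in range(n_stations)]
    reg_status = {}                 # register -> station that will write it
    queue, cycle = list(program), 0
    while queue or any(s.busy for s in stations):
        cycle += 1
        # Snapshot at the start of the cycle so each stage takes a full cycle.
        executing = [s for s in stations
                     if s.busy and s.qj is None and s.qk is None and s.remaining > 0]
        finished = next((s for s in stations if s.busy and s.remaining == 0), None)

        # 3. Write result: broadcast (tag, value) on the CDB, free the station.
        if finished is not None:
            value = OPS[finished.op](finished.vj, finished.vk)
            for s in stations:
                if s.qj == finished.name:
                    s.vj, s.qj = value, None
                if s.qk == finished.name:
                    s.vk, s.qk = value, None
            for reg in [r for r, tag in reg_status.items() if tag == finished.name]:
                regs[reg] = value
                del reg_status[reg]
            finished.busy = False

        # 2. Execute: stations that had both operands count down their latency.
        for s in executing:
            s.remaining -= 1

        # 1. Issue: in program order, only if a reservation station is free.
        free = next((s for s in stations if not s.busy), None)
        if queue and free is not None:
            op, dest, src1, src2 = queue.pop(0)
            free.busy, free.op, free.remaining = True, op, LATENCY[op]
            free.qj, free.qk = reg_status.get(src1), reg_status.get(src2)
            free.vj = regs[src1] if free.qj is None else None
            free.vk = regs[src2] if free.qk is None else None
            reg_status[dest] = free.name   # renaming: dest now names a station
    return cycle


regs = {"F0": 0.0, "F2": 8.0, "F4": 2.0, "F6": 0.0,
        "F8": 1.0, "F10": 3.0, "F14": 1.0}
program = [("DIV", "F0", "F2", "F4"), ("ADD", "F6", "F0", "F8"),
           ("SUB", "F8", "F10", "F14"), ("MUL", "F6", "F10", "F8")]
cycles = run(program, regs)
print(cycles, regs["F0"], regs["F6"], regs["F8"])
```

The ADD copies the old value of F8 at issue, so the later SUB can overwrite F8 early without harm (no WAR hazard), and the register status table makes sure only the last writer of F6 updates the register (no WAW hazard).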
Tomasulo Organization
[Figure: block diagram. Labeled components: From Instruction Unit, FP Op Queue, FP Registers, From Memory, Load Buffers, Store Buffers, To Memory, Operand Bus, Operation Bus, Reservation Stations (FP Add, FP Mul), FP Adders, FP Multipliers, Common data bus (CDB)]
Reservation Station Components
• Op: Operation to perform in the unit (e.g., + or –)
• Qj, Qk: Reservation stations producing source registers
• Vj, Vk: Value of source operands
• Rj, Rk: Flags indicating when Vj, Vk are ready
• Busy: Indicates reservation station and FU are busy
• Register result status: Indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.
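The fields listed above map naturally onto a small record type. The sketch below is one possible encoding, not the lecture's: the class and method names are hypothetical, the Rj/Rk flags are derived from Qj/Qk rather than stored, and the station and register names in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReservationStation:
    """Hypothetical encoding of one reservation station entry."""
    name: str
    busy: bool = False
    op: Optional[str] = None       # operation to perform, e.g. "+"
    vj: Optional[float] = None     # source operand values, once known
    vk: Optional[float] = None
    qj: Optional[str] = None       # stations that will produce the operands
    qk: Optional[str] = None

    @property
    def rj(self):
        """Ready flag for Vj: no producer is still pending."""
        return self.busy and self.qj is None

    @property
    def rk(self):
        """Ready flag for Vk: no producer is still pending."""
        return self.busy and self.qk is None

    def capture(self, tag, value):
        """Snoop a CDB broadcast and grab the value if we were waiting on tag."""
        if self.qj == tag:
            self.vj, self.qj = value, None
        if self.qk == tag:
            self.vk, self.qk = value, None


# Register result status: register -> station that will write it (absent if none).
register_result_status = {"F6": "Add1"}

rs = ReservationStation("Add1", busy=True, op="+", vk=1.0, qj="Load1")
print(rs.rj, rs.rk)        # waiting on Load1 for Vj, Vk already present
rs.capture("Load1", 4.0)   # Load1 broadcasts its result on the CDB
print(rs.rj, rs.rk, rs.vj)
```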
Tomasulo Example, Cycles 1 through 6
[Slide figures for each cycle not preserved.]
• Cycle 3: Register names are renamed in reservation stations. Load1 completing: who is waiting for Load1?
• Cycle 4: Load2 completing: who is waiting for it?
Tomasulo Summary
• Reservation stations: renaming to larger set of registers + buffering source operands
  • Prevents registers from becoming a bottleneck
  • Avoids WAR, WAW hazards of scoreboard
  • Allows loop unrolling in HW
• Not limited to basic blocks (integer units get ahead, beyond branches)
• Lasting contributions
  • Dynamic scheduling
  • Register renaming
  • Load/store disambiguation
• 360/91 descendants are Pentium III; PowerPC 604; MIPS R10000; HP-PA 8000; Alpha 21264