CDA 3101 Discussion Section 11 Pipelining
Question 1 Suppose that the time for an ALU operation can be shortened by 25% in the following figure. a. Will it affect the speedup obtained from pipelining? If yes, by how much? If no, why? b. What if the ALU operation now takes 25% more time?
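Question 1 • Shortening the ALU operation by 25% (from 200 ps to 150 ps) • It will not affect the pipelined clock cycle, because the slowest stages (IF and MEM) still take 200 ps, so the pipeline still completes one instruction every 200 ps. (If the single-cycle reference time is also recomputed, as part b does, lw drops to 750 ps and the ratio becomes 750/200 = 3.75.) • Lengthening the ALU operation by 25% (from 200 ps to 250 ps) • It will affect the speedup obtained from pipelining, because the ALU becomes the slowest stage and the clock cycle grows to 250 ps. • Original speedup = 800/200 = 4 • New speedup = 850/250 = 3.4 • Therefore, the speedup is (4 - 3.4)/4 = 15% lower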
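The Question 1 arithmetic can be checked with a short Python sketch. It assumes the stage latencies of the textbook figure, which is not reproduced here: IF 200 ps, register read 100 ps, ALU 200 ps, MEM 200 ps, register write 100 ps (these add up to the 800 ps used above). The pipelined clock is taken as the slowest stage and the single-cycle time as the sum of all stages; the stage names and helper functions are illustrative only.

```python
# Stage latencies in ps, assumed from the textbook figure (not reproduced here).
BASE_STAGES = {"IF": 200, "RegRead": 100, "ALU": 200, "MEM": 200, "RegWrite": 100}


def with_alu(alu_ps):
    """Copy of BASE_STAGES with the ALU latency replaced."""
    stages = dict(BASE_STAGES)
    stages["ALU"] = alu_ps
    return stages


def single_cycle_time(stages):
    # A single-cycle lw uses every stage within one long clock cycle.
    return sum(stages.values())


def pipelined_clock(stages):
    # Every pipeline stage gets the same cycle, so the slowest stage sets it.
    return max(stages.values())


def speedup(stages):
    return single_cycle_time(stages) / pipelined_clock(stages)


for alu in (150, 200, 250):
    s = with_alu(alu)
    print(f"ALU {alu} ps: single-cycle {single_cycle_time(s)} ps, "
          f"pipelined clock {pipelined_clock(s)} ps, ratio {speedup(s):.2f}")
```

For ALU latencies of 150, 200 and 250 ps the pipelined clock is 200, 200 and 250 ps; the ratio for 150 ps differs from 4 only because the single-cycle reference shrinks to 750 ps.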
Question 2 Identify all of the data dependencies in the following code.
add $3, $4, $2
sub $5, $3, $1
lw $6, 200($3)
add $7, $3, $6
a. Which dependencies are data hazards that will be resolved via forwarding? b. Which dependencies are data hazards that will cause a stall?
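Question 2 • Data dependencies 1. Through $3, between the first instruction (add) and each subsequent instruction 2. Through $6, between lw and the last instruction • Dependencies that will be resolved via forwarding: the first instruction's result is forwarded to sub (from EX/MEM) and to lw (from MEM/WB). The last add is three instructions after the first, so it reads $3 from the register file no earlier than the cycle in which the first instruction writes it back (the register file writes in the first half of a cycle and reads in the second half); no forwarding is needed there. • Dependencies that will cause a stall: the dependency from lw to the last instruction cannot be resolved by forwarding alone, because the loaded value is available only at the end of MEM; the pipeline stalls for one cycle and then forwards the value.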
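A minimal Python sketch of the Question 2 classification, assuming the textbook five-stage pipeline with full forwarding, a hazard-detection unit, and a register file that writes in the first half of a cycle and reads in the second half. For each source register it finds the most recent earlier writer and classifies the dependency by how many instructions apart the two are. The tuple layout and the function name `classify` are illustrative, not from the course material.

```python
# Instruction model: (text, destination register, source registers, is_load).
PROGRAM = [
    ("add $3, $4, $2", "$3", ["$4", "$2"], False),
    ("sub $5, $3, $1", "$5", ["$3", "$1"], False),
    ("lw $6, 200($3)", "$6", ["$3"], True),
    ("add $7, $3, $6", "$7", ["$3", "$6"], False),
]


def classify(program):
    """Return (producer, consumer, register, resolution) for each RAW dependency."""
    found = []
    for c, (c_text, _, sources, _) in enumerate(program):
        for reg in sources:
            # Only the most recent earlier writer of reg creates the dependency.
            for p in range(c - 1, -1, -1):
                p_text, dest, _, is_load = program[p]
                if dest != reg:
                    continue
                distance = c - p
                if is_load and distance == 1:
                    how = "stall one cycle, then forward"   # load-use hazard
                elif distance <= 2:
                    how = "forwarding"                      # EX/MEM or MEM/WB
                else:
                    how = "register file"                   # write-then-read in one cycle
                found.append((p_text, c_text, reg, how))
                break
    return found


for producer, consumer, reg, how in classify(PROGRAM):
    print(f"{reg}: {producer} -> {consumer}: {how}")
```

Run as is, it reports forwarding for the first add to sub and to lw, the register file for the first add to the last add, and a one-cycle load-use stall for lw to the last add.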
Question 3 Consider executing the following code on the pipelined datapath of Figure 4.56:
lw $4, 100($2)
sub $6, $4, $3
add $2, $3, $5
a. Draw a diagram that illustrates the dependencies that need to be resolved. b. Provide another diagram that illustrates how the code will actually be executed. c. How many cycles will it take to execute this code?
Question 3 Cont. Figure 4.56
Question 3-1 Diagram that illustrates the dependencies that need to be resolved
Question 3-2 Diagram that illustrates how the code will actually be executed
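Question 3-3 The code takes 8 cycles to execute: 5 cycles for lw, 1 more cycle for each of the two following instructions, and 1 stall cycle because sub needs the value of $4 loaded by the immediately preceding lw.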
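The execution diagram for Question 3 can be regenerated with a small Python model. It assumes the datapath forwards an ALU result to the very next instruction and has a hazard-detection unit that inserts one bubble on a load-use dependency (a loaded value can feed an EX stage two cycles after the load's EX). Only RAW dependencies are modeled; the WAR on $2 (lw reads it, the later add writes it) is harmless in this in-order pipeline. A stalled instruction is shown repeating ID, and the instruction behind it repeating IF. All names are hypothetical.

```python
# Instruction model: (text, destination register, source registers, is_load).
PROGRAM = [
    ("lw $4, 100($2)", "$4", ["$2"], True),
    ("sub $6, $4, $3", "$6", ["$4", "$3"], False),
    ("add $2, $3, $5", "$2", ["$3", "$5"], False),
]


def schedule(program):
    """Return the EX cycle of each instruction (cycle 1 is the first IF).

    An ALU result can feed the next instruction's EX one cycle later; a loaded
    value needs two cycles, so the hazard-detection unit inserts one bubble.
    """
    ex = []
    for c, (_, _, sources, _) in enumerate(program):
        earliest = 3 if c == 0 else ex[-1] + 1  # in-order, one EX per cycle
        for reg in sources:
            for p in range(c - 1, -1, -1):       # most recent earlier writer
                _, dest, _, is_load = program[p]
                if dest == reg:
                    earliest = max(earliest, ex[p] + (2 if is_load else 1))
                    break
        ex.append(earliest)
    return ex


def stage_table(program):
    """Return (total_cycles, rows) where each row lists the stage in every cycle."""
    ex = schedule(program)
    total = ex[-1] + 2                      # WB of the last instruction
    rows = []
    prev_id, prev_ex = (1, 1), 2            # fictitious predecessor of instruction 0
    for i, (text, _, _, _) in enumerate(program):
        cells = ["."] * (total + 1)         # index 0 unused; cycles are 1-based
        # An instruction sits in IF while its predecessor sits in ID.
        for cyc in range(prev_id[0], prev_id[1] + 1):
            cells[cyc] = "IF"
        # It stays in ID (repeating on a stall) until its EX cycle.
        id_span = (prev_ex, ex[i] - 1)
        for cyc in range(id_span[0], id_span[1] + 1):
            cells[cyc] = "ID"
        cells[ex[i]], cells[ex[i] + 1], cells[ex[i] + 2] = "EX", "MEM", "WB"
        rows.append((text, cells[1:]))
        prev_id, prev_ex = id_span, ex[i]
    return total, rows


total, rows = stage_table(PROGRAM)
print(" " * 16 + "".join(f"{c:>5}" for c in range(1, total + 1)))
for text, cells in rows:
    print(f"{text:<16}" + "".join(f"{x:>5}" for x in cells))
print("total cycles:", total)
```

The printed table places lw's WB in cycle 5 and the final add's WB in cycle 8, consistent with the 8-cycle total.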
Question 4 (4.12.6) The latencies of the individual datapath stages are as follows: Assume the instructions executed by the processor are broken down as follows: Instead of a single-cycle organization, we can use a multi-cycle organization in which each instruction takes multiple cycles but one instruction finishes before the next is fetched. In this organization, an instruction only goes through the stages it actually needs. Compare clock cycle times and execution times for the single-cycle, multi-cycle, and pipelined organizations.
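Question 4 (4.12.6) In the single-cycle organization, every instruction takes one clock cycle, as long as all stages combined. In the pipelined organization, a long-running program with no pipeline stalls completes one instruction every cycle. Finally, a multi-cycle organization completes a lw in 5 cycles, a sw in 4 cycles (no WB), an ALU instruction in 4 cycles (no MEM), and a beq in 3 cycles (no MEM, no WB). The multi-cycle and pipelined organizations share the same clock cycle time (the slowest stage), so it cancels out. The speedup of the pipelined organization is therefore: • Over multi-cycle: (0.15 * 5 + 0.25 * 3 + 0.6 * 4) * Cycle Time / Cycle Time = 3.9 • Over single-cycle: 1650/500 = 3.30 (sum of all stage latencies divided by the slowest stage latency)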
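The Question 4 ratios reduce to weighted sums, sketched below in Python. Because the latency and mix tables are not reproduced, the inputs are read back from the worked answer and should be treated as assumptions: a stage-latency sum of 1650 ps with a 500 ps slowest stage, and class fractions of 0.15 (lw, 5 cycles), 0.25 (beq, 3 cycles) and 0.6 for the 4-cycle classes, which this sketch assumes are ALU instructions and sw grouped together.

```python
# Hypothetical inputs inferred from the worked answer (the original tables are not shown).
SUM_OF_STAGES_PS = 1650   # single-cycle clock: all stage latencies added together
SLOWEST_STAGE_PS = 500    # multi-cycle and pipelined clock: the slowest stage

# (instruction class, fraction of executed instructions, multi-cycle cycle count)
MIX = [
    ("lw", 0.15, 5),
    ("beq", 0.25, 3),
    ("ALU and sw (assumed grouping)", 0.60, 4),
]


def multicycle_cpi(mix):
    """Average cycles per instruction for the multi-cycle organization."""
    return sum(fraction * cycles for _, fraction, cycles in mix)


def pipelined_speedups(mix, sum_ps, slowest_ps):
    """Speedup of an ideal pipeline (CPI = 1, clock = slowest stage).

    Multi-cycle shares the pipeline's clock, so only the CPI ratio remains.
    Single-cycle has CPI = 1 but a clock as long as every stage combined.
    """
    over_multicycle = multicycle_cpi(mix) * slowest_ps / slowest_ps
    over_single_cycle = sum_ps / slowest_ps
    return over_multicycle, over_single_cycle


multi, single = pipelined_speedups(MIX, SUM_OF_STAGES_PS, SLOWEST_STAGE_PS)
print(f"pipelined vs multi-cycle: {multi:.2f}x, vs single-cycle: {single:.2f}x")
```

Keeping the clock terms explicit in `pipelined_speedups` mirrors the "Cycle Time / Cycle Time" step of the answer, which is why the multi-cycle speedup equals the CPI.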
Question 5 (4.13) In this exercise, we examine how data dependences affect execution in the basic five-stage pipeline. Problems in this exercise refer to the following sequence of instructions:
lw $1, 40($6)
add $6, $2, $2
sw $6, 50($1)
a) Assume there is no forwarding in this pipelined processor. Indicate hazards and add nop instructions to eliminate them. b) Assume there is full forwarding. Indicate hazards and add nop instructions to eliminate them.
Question 5 cont. Assume the following clock cycle times: c) What is the total execution time of this instruction sequence without forwarding and with full forwarding? What is the speedup achieved by adding full forwarding to a pipeline that had no forwarding? d) Add nop instructions to this code to eliminate hazards if there is ALU-ALU forwarding only (no forwarding from the MEM to the EX stage). e) What is the total execution time of this instruction sequence with only ALU-ALU forwarding? What is the speedup over a no-forwarding pipeline?
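Question 5 a) To avoid the RAW hazard on $1, we need to delay I3 by one nop. To avoid the RAW hazard on $6, we need to delay I3 by two nops. Together, we need to delay I3 by two nops:
lw $1, 40($6)
add $6, $2, $2
nop
nop
sw $6, 50($1)
b) No nops are needed: with full forwarding, $1 is forwarded to sw from lw's MEM/WB register and $6 from add's EX/MEM register, so no RAW hazard remains.
c) No forwarding: 9 cycles (5 instructions including the nops, plus 4 cycles to drain the pipeline) * 300 ps = 2700 ps. With full forwarding: 7 cycles * 400 ps = 2800 ps. Speedup = 2700/2800 ≈ 0.96, so full forwarding makes this sequence slightly slower because of its longer clock cycle.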
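Parts a to c of Question 5 can be checked mechanically. The Python sketch below verifies that a nop-padded sequence has no RAW hazard under a given forwarding mode, then counts cycles as instructions + 4. It assumes a split-cycle register file (so a consumer three issue slots after its producer is always safe), forwarding into the EX stage for both ALU operands and store data, and the clock cycle times used in c) (300 ps without forwarding, 400 ps with full forwarding). The mode strings and helper names are hypothetical.

```python
# Instruction model: (text, destination register or None, source registers, is_load).
NOP = ("nop", None, [], False)
LW = ("lw $1, 40($6)", "$1", ["$6"], True)
ADD = ("add $6, $2, $2", "$6", ["$2"], False)
SW = ("sw $6, 50($1)", None, ["$6", "$1"], False)  # $1: address, $6: store data


def min_distance(is_load, mode):
    """Smallest safe producer-to-consumer distance, in issue slots.

    Distance 3 is always safe: the consumer's ID lines up with the producer's WB,
    and the register file writes before it reads within a cycle.
    """
    if mode == "none":
        return 3
    if mode == "full":
        # Loaded data exists only after MEM, so it can reach an EX stage two slots later.
        return 2 if is_load else 1
    raise ValueError(f"unknown forwarding mode: {mode}")


def hazard_free(program, mode):
    """True if every RAW dependency in the padded program is satisfied."""
    for c, (_, _, sources, _) in enumerate(program):
        for reg in sources:
            # Only the most recent earlier writer of reg matters.
            for p in range(c - 1, -1, -1):
                _, dest, _, is_load = program[p]
                if dest == reg:
                    if c - p < min_distance(is_load, mode):
                        return False
                    break
    return True


def cycles(program):
    # Five-stage pipeline, one issue per cycle, no hardware stalls (nops only).
    return len(program) + 4


NO_FWD = [LW, ADD, NOP, NOP, SW]
FULL_FWD = [LW, ADD, SW]

assert hazard_free(NO_FWD, "none") and not hazard_free([LW, ADD, NOP, SW], "none")
assert hazard_free(FULL_FWD, "full")

t_none = cycles(NO_FWD) * 300   # ps
t_full = cycles(FULL_FWD) * 400  # ps
print(f"no forwarding: {t_none} ps, full forwarding: {t_full} ps, "
      f"speedup {t_none / t_full:.2f}")
```

The asserts confirm that one nop is not enough without forwarding and that no nops are needed with full forwarding.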
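Question 5 Cont. d) With ALU-ALU-only forwarding, an ALU instruction can forward to the next instruction, but not to the second-next instruction (because that would be forwarding from MEM to EX). A load cannot forward at all, because it produces its data value in the MEM stage, which is too late for ALU-ALU forwarding. We have:
lw $1, 40($6)
add $6, $2, $2
nop
nop
sw $6, 50($1)
e) No forwarding: 9 * 300 ps = 2700 ps. With ALU-ALU forwarding: 9 * 360 ps = 3240 ps. Speedup = 2700/3240 ≈ 0.83. Placing a single nop between lw and add instead also removes both hazards: sw then reads $1 from the register file in the cycle lw writes it back, and add sits directly before sw, so $6 can be forwarded ALU to ALU. That version takes 8 * 360 ps = 2880 ps, a speedup of 2700/2880 ≈ 0.94.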
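The ALU-ALU-only case in d) and e) needs a different rule because it is not monotonic: an ALU result is usable one issue slot later (EX/MEM to EX) or three or more slots later (register file), but not two slots later, and a load's result only three or more slots later. The Python sketch below, under the same hypothetical instruction model and split-cycle register file assumption, checks the two-nop placement from the answer and an alternative with a single nop between lw and add, and prices both at the 360 ps cycle time used in e).

```python
# Instruction model: (text, destination register or None, source registers, is_load).
NOP = ("nop", None, [], False)
LW = ("lw $1, 40($6)", "$1", ["$6"], True)
ADD = ("add $6, $2, $2", "$6", ["$2"], False)
SW = ("sw $6, 50($1)", None, ["$6", "$1"], False)  # $1: address, $6: store data


def safe_with_alu_alu_only(distance, is_load):
    """Is a RAW dependency this many issue slots apart safe with EX/MEM->EX forwarding only?"""
    if distance >= 3:
        return True          # value arrives through the split-cycle register file
    if is_load:
        return False         # loaded data appears after MEM: too late to forward
    return distance == 1     # distance 2 would need MEM/WB->EX forwarding


def hazard_free(program):
    for c, (_, _, sources, _) in enumerate(program):
        for reg in sources:
            for p in range(c - 1, -1, -1):   # most recent earlier writer
                _, dest, _, is_load = program[p]
                if dest == reg:
                    if not safe_with_alu_alu_only(c - p, is_load):
                        return False
                    break
    return True


def cycles(program):
    return len(program) + 4  # five stages, one issue per cycle


NO_FWD_TIME_PS = 9 * 300             # from part c)
TWO_NOPS = [LW, ADD, NOP, NOP, SW]   # placement used in the answer
ONE_NOP = [LW, NOP, ADD, SW]         # alternative placement

for name, prog in (("two nops", TWO_NOPS), ("one nop", ONE_NOP)):
    t = cycles(prog) * 360
    print(f"{name}: hazard-free={hazard_free(prog)}, {cycles(prog)} cycles, "
          f"{t} ps, speedup {NO_FWD_TIME_PS / t:.2f}")
```

Both placements pass the check; the single-nop version needs 8 cycles (2880 ps) against 9 cycles (3240 ps) for the two-nop version, while a single nop placed after add fails because add would then be two slots ahead of sw.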