CPE 232 Computer Organization
Basic MIPS Pipelining – Part I
Dr. Iyad Jafar
Adapted from Dr. Gheith Abandah slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html
Review: Single Cycle vs. Multiple Cycle Timing
[Figure: clock timing diagrams for the two implementations]
• Single Cycle Implementation: lw occupies Cycle 1 and sw occupies Cycle 2; the clock period is set by lw, so part of the sw cycle is wasted
• Multiple Cycle Implementation: lw occupies Cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw occupies Cycles 6-9 (IFetch, Dec, Exec, Mem), and the R-type instruction starts its IFetch in Cycle 10
• The multicycle clock period is longer than 1/5 of the single cycle clock period because of stage register overhead
How Can We Make It Even Faster?
• Split the multicycle instruction execution into smaller and smaller steps
  • There is a point of diminishing returns where as much time is spent loading the state registers as doing the work
• Start fetching and executing the next instruction before the current one has completed
  • Pipelining: (all?) modern processors are pipelined for performance
  • Remember the performance equation: CPU time = CPI * CC * IC (cycles per instruction * clock cycle time * instruction count)
• Fetch (and execute) more than one instruction at a time
  • Superscalar processing
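The performance equation above can be tried out with a small sketch. The CPI values, the 200 ps cycle time, and the instruction count below are assumed numbers chosen only to show how each technique acts on one factor of the equation; they are not taken from the slides.

```python
def cpu_time_ps(cpi, cycle_time_ps, instruction_count):
    """CPU time = CPI * CC * IC, with the clock cycle (CC) in picoseconds."""
    return cpi * cycle_time_ps * instruction_count


IC = 1_000_000  # assumed instruction count

# All CPI and cycle-time values are assumptions for illustration only.
designs = {
    "multicycle": cpu_time_ps(cpi=4.0, cycle_time_ps=200, instruction_count=IC),
    "pipelined": cpu_time_ps(cpi=1.0, cycle_time_ps=200, instruction_count=IC),
    "2-way superscalar": cpu_time_ps(cpi=0.5, cycle_time_ps=200, instruction_count=IC),
}

for name, t in designs.items():
    # 1 microsecond = 1e6 picoseconds
    print(f"{name:>18}: {t / 1e6:.0f} us")
```

With the same clock and instruction count, pipelining lowers the effective CPI toward 1, and fetching more than one instruction per cycle can push it below 1.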
A Pipelined MIPS Processor
• In the multicycle implementation, only one functional unit is used in each clock cycle. Can the idle units be used to do something else?
• Start the next instruction before the current one has completed
  • improves throughput: the total amount of work done in a given time
  • instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced
[Figure: lw, sw, and an R-type instruction overlapped in the pipeline, each passing through IFetch, Dec, Exec, Mem, WB; cycle axis labeled Cycle 1 to Cycle 8]
• The clock cycle (pipeline stage time) is limited by the slowest stage
• For some instructions, some stages are wasted cycles
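The overlap described on this slide can be sketched as a small chart generator. This is a minimal illustration that assumes an ideal pipeline (one instruction issued per cycle, no hazards or stalls); the function names `pipeline_chart` and `print_chart` are made up for this example.

```python
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def pipeline_chart(instructions):
    """Return (name, row) pairs; row[c] is the stage occupied in cycle c + 1.

    Assumes an ideal pipeline: instruction i enters IFetch in cycle i + 1
    and moves one stage forward every cycle.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    chart = []
    for i, name in enumerate(instructions):
        row = ["-"] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = stage
        chart.append((name, row))
    return chart


def print_chart(chart):
    n_cycles = len(chart[0][1])
    header = "".join(f"{'C' + str(c + 1):>8}" for c in range(n_cycles))
    print(f"{'':8}{header}")
    for name, row in chart:
        print(f"{name:8}" + "".join(f"{stage:>8}" for stage in row))


chart = pipeline_chart(["lw", "sw", "R-type"])
print_chart(chart)
```

Every instruction passes through all five stages in this model, which is why sw and R-type instructions each spend one stage doing no useful work.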
Single Cycle, Multiple Cycle, vs. Pipeline
[Figure: timing of lw, sw, and an R-type instruction under the three implementations]
• Single Cycle Implementation: lw in Cycle 1 and sw in Cycle 2, with part of the sw cycle wasted
• Multiple Cycle Implementation: lw in Cycles 1-5 (IFetch, Dec, Exec, Mem, WB), sw in Cycles 6-9 (IFetch, Dec, Exec, Mem), R-type starting its IFetch in Cycle 10
• Pipeline Implementation: lw, sw, and R-type each pass through IFetch, Dec, Exec, Mem, WB, overlapped in the pipeline
Pipelining Performance
• Example: Comparing pipelining to single-cycle
  • Consider an ISA with three instruction classes: memory-reference, R-type, and branch. Compare the average time between instructions and the total execution time in the single-cycle and pipelined implementations.
  • Assume the operation time of each major unit (memory, ALU, and register file) is 200 ps.
  • The single-cycle clock period is determined by the load instruction, which passes through all five steps: 200 + 200 + 200 + 200 + 200 = 1000 ps
  • The pipelined clock period is determined by the slowest stage: 200 ps
Pipelining Performance (cont'd)
• Consider a sequence of load instructions
[Figure: timing diagrams of the load sequence, Single Cycle vs. Pipelining]
Pipelining Performance (cont'd)
• Time to start executing the fourth instruction
  • Single-cycle = 3 x 1000 = 3000 ps
  • Pipelined = 3 x 200 = 600 ps (one instruction is fetched every clock cycle)
  • Speedup = 3000/600 = 5
• Effect on execution time for three load instructions
  • Single-cycle = 3 x 1000 = 3000 ps
  • Pipelined = (5 + 3 - 1) x 200 = 1400 ps
  • Speedup = 3000/1400 ≈ 2.14, well below 5, because three instructions are too few to keep the pipeline full
• Consider adding 1,000,000 instructions
  • Speedup = (3000 + 1000 x 1,000,000) / (1400 + 200 x 1,000,000) ≈ 5
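The arithmetic on this slide can be reproduced with a short sketch. It assumes an ideal 5-stage pipeline with no stalls, so that n instructions take (5 + n - 1) cycles; the function names are made up for this example.

```python
def single_cycle_time_ps(n_instructions, cycle_ps=1000):
    """Every instruction takes one full (long) clock cycle."""
    return n_instructions * cycle_ps


def pipelined_time_ps(n_instructions, stage_ps=200, n_stages=5):
    """The first instruction needs n_stages cycles; each later one
    completes one cycle after its predecessor (ideal pipeline, no stalls)."""
    return (n_stages + n_instructions - 1) * stage_ps


for n in (3, 1_000_003):
    sc = single_cycle_time_ps(n)
    pl = pipelined_time_ps(n)
    print(f"{n:>9} loads: single-cycle {sc} ps, pipelined {pl} ps, "
          f"speedup {sc / pl:.2f}")
```

For three loads the pipeline fill time dominates and the speedup stays near 2.14; with 1,000,003 loads the fill time is negligible and the speedup approaches 5.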
Pipelining Performance (cont'd)
• Assume that the register file takes only 100 ps and we have 1,000,000 load instructions
  • Single-cycle clock period = 800 ps
  • Pipelined clock period = 200 ps
• Time to start the fourth instruction
  • Single-cycle = 3 x 800 = 2400 ps
  • Pipelined = 3 x 200 = 600 ps
  • Speedup = 2400/600 = 4
• Total execution time
  • Speedup ≈ (1,000,000 x 800) / (1,000,000 x 200) = 4 < 5
  • This is due to unbalanced stages
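The effect of unbalanced stages can be sketched as follows. The per-stage split is an assumption consistent with the 800 ps total on the slide: the register file is taken to be accessed in Dec (read) and WB (write) at 100 ps each, and the other stages take 200 ps.

```python
# Assumed per-stage latencies in ps; Dec and WB are the register-file stages.
STAGE_PS = {"IFetch": 200, "Dec": 100, "Exec": 200, "Mem": 200, "WB": 100}


def single_cycle_ps(stage_ps):
    """Single-cycle clock period: a load passes through every stage."""
    return sum(stage_ps.values())


def pipelined_cycle_ps(stage_ps):
    """Pipelined clock period: set by the slowest stage."""
    return max(stage_ps.values())


def speedup(stage_ps, n_instructions):
    single = n_instructions * single_cycle_ps(stage_ps)
    # Ideal pipeline: fill time of (stages - 1) cycles, then one result per cycle.
    piped = (len(stage_ps) + n_instructions - 1) * pipelined_cycle_ps(stage_ps)
    return single / piped


print(single_cycle_ps(STAGE_PS), pipelined_cycle_ps(STAGE_PS))
print(f"{speedup(STAGE_PS, 1_000_000):.3f}")
```

The faster register file shortens the single-cycle clock but not the pipelined clock, so the speedup settles near 4 instead of 5.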
Pipelining Performance (cont'd)
• Ideally, the pipelined implementation is n times faster than the single-cycle one, where n is the number of pipeline stages
  • In the 5-stage MIPS, the pipelined version would be 5 times faster
  • When the pipeline is full, the throughput is one instruction per cycle
• Factors that affect pipelining performance
  • Time to fill and empty the pipeline with instructions
  • Number of instructions to execute
  • Unbalanced operation times for stages (for example, a register file access of only 100 ps)
  • Pipeline hazards (discussed in Part II)
  • Instruction mix
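The factors listed above can be tied together in one expression. The symbols are introduced here for illustration and are not from the slides: N is the number of instructions, k the number of stages, and t_i the time of stage i; hazards are ignored.

```latex
\[
\text{Speedup}(N)
  = \frac{T_{\text{single}}}{T_{\text{pipe}}}
  = \frac{N \sum_{i=1}^{k} t_i}{(k + N - 1)\,\max_i t_i}
  \;\xrightarrow{\;N \to \infty\;}\;
  \frac{\sum_{i=1}^{k} t_i}{\max_i t_i} \;\le\; k
\]
```

With five balanced 200 ps stages the limit is 1000/200 = 5; with the 100 ps register file it is 800/200 = 4, and for small N the fill term (k - 1) pulls the speedup lower still.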
MIPS Pipeline Datapath Modifications
• Notes on instruction execution in MIPS
  • The execution of instructions is divided into 5 steps/cycles/stages: IF, ID, EX, MEM, and WB
  • Instruction flow is from left to right except in two cases:
    • In the write-back stage, where the result is written into the register file in the middle of the datapath
    • Choosing between the incremented PC and the branch address in the MEM stage
• What do we need to add or modify in our MIPS datapath to implement pipelining?
  • In pipelined execution, all units operate in every cycle, so we have to duplicate hardware where needed
  • State registers are added between stages to preserve intermediate data and control for each instruction
MIPS Pipeline Datapath Modifications
[Figure: pipelined datapath divided into five stages (IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack) by the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB, driven by the system clock. Components shown: PC, Add (PC + 4), Instruction Memory, Register File, Sign Extend (16 to 32), Shift left 2, Add (branch address), ALU, Data Memory]
• Do you see any problem?
Corrected Datapath to Save RegWrite Addr
[Figure: the same pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, in which the register write address is carried through the pipeline registers back to the Register File]
• Need to preserve the destination register address in the pipeline state registers
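Why the destination register address has to travel with the instruction can be shown with a small sketch. It assumes an ideal 5-stage pipeline in which instruction i is fetched in cycle i, so it is in WB during cycle i + 4 while instruction i + 3 is in ID; the function `write_targets` and the register numbers are made up for this example.

```python
def write_targets(dest_regs, carry_dest):
    """For each instruction, return the register number written in its WB stage.

    dest_regs[i] is the destination register of instruction i.
    carry_dest=True models the corrected datapath (the destination number is
    stored in ID/EX, EX/MEM, MEM/WB). carry_dest=False models the uncorrected
    datapath, where the write address comes from whatever instruction sits
    in the IF/ID register at that moment.
    """
    targets = []
    for i, dest in enumerate(dest_regs):
        if carry_dest:
            targets.append(dest)
        else:
            j = i + 3  # instruction occupying ID while instruction i is in WB
            targets.append(dest_regs[j] if j < len(dest_regs) else None)
    return targets


dests = [8, 9, 10, 11, 12]  # hypothetical destination registers of five loads
print(write_targets(dests, carry_dest=False))
print(write_targets(dests, carry_dest=True))
```

Without the carried address, the first load would write into the register named by the fourth instruction, which is the problem the corrected datapath removes.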
MIPS Pipeline Data Flow Example: Load Instruction
• Instruction Fetch
  • Put the incremented PC and the fetched instruction in the IF/ID register
• Instruction Decode and register read
  • Store the values read from rs and rt, the sign-extended offset, and the incremented PC in the ID/EX register
• Execute or address calculation
  • Store the branch address, the rt value, the ALU result, and the zero flag in the EX/MEM register
• Memory access
  • Store the data read from memory in the MEM/WB register
• Write back
  • Copy the loaded data from the MEM/WB register to the register file
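The five steps of the load can be traced with a minimal sketch. The field names (`pc_plus_4`, `rs_val`, `dest`, and so on), the function `run_lw`, and the register, address, and memory values are all assumptions for illustration; the offset is taken to be already sign-extended, and the destination register number is carried along as in the corrected datapath.

```python
def run_lw(pc, rs, rt, offset, regs, mem):
    """Trace one `lw rt, offset(rs)` through the four pipeline registers."""
    # IF: fetch the instruction and increment the PC
    if_id = {"pc_plus_4": pc + 4, "instr": ("lw", rs, rt, offset)}

    # ID: read the register file; keep the destination register number
    _, rs_num, rt_num, imm = if_id["instr"]
    id_ex = {
        "pc_plus_4": if_id["pc_plus_4"],
        "rs_val": regs[rs_num],
        "rt_val": regs[rt_num],
        "imm": imm,
        "dest": rt_num,
    }

    # EX: compute the memory address (and the branch address, unused by lw)
    alu_result = id_ex["rs_val"] + id_ex["imm"]
    ex_mem = {
        "branch_addr": id_ex["pc_plus_4"] + (id_ex["imm"] << 2),
        "zero": alu_result == 0,
        "alu_result": alu_result,
        "rt_val": id_ex["rt_val"],
        "dest": id_ex["dest"],
    }

    # MEM: read data memory at the computed address
    mem_wb = {"mem_data": mem[ex_mem["alu_result"]], "dest": ex_mem["dest"]}

    # WB: write the loaded data into the register file
    regs[mem_wb["dest"]] = mem_wb["mem_data"]
    return if_id, id_ex, ex_mem, mem_wb


regs = {8: 0, 17: 100}   # hypothetical register contents
mem = {108: 42}          # hypothetical data memory
trace = run_lw(pc=0x400000, rs=17, rt=8, offset=8, regs=regs, mem=mem)
for name, reg in zip(("IF/ID", "ID/EX", "EX/MEM", "MEM/WB"), trace):
    print(name, reg)
print("register 8 =", regs[8])
```

Each dictionary holds only what later stages still need, so the registers shrink as the instruction moves toward write back.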
Data Fields in the Pipeline Registers
• Data fields are moved from one pipeline register to the next every clock cycle until they are no longer needed
MIPS Pipeline Control Path
• Define the control signals
• The pipeline registers are updated every cycle, so they need no separate write signals
MIPS Pipeline Control Path
• All control signals can be determined during Decode
• Expand the pipeline registers to store and move the control signals between stages until they are needed
MIPS Pipeline Control Details
• Control signals that have to go into each stage
• Control signal values based on instruction type
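The idea of generating all control signals in Decode and carrying them forward can be sketched as below. The slide's own tables are not reproduced in this text, so the signal names and values are an assumption: they follow the control table commonly given in textbooks for this MIPS subset, with `None` marking a don't-care. The helper names `decode` and `advance` are made up for this example.

```python
# Assumed control table, grouped by the stage that consumes each signal.
CONTROL = {
    "R-type": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":    {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
               "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def decode(kind):
    """Decode stage: produce every control group; the result goes into ID/EX."""
    return {stage: dict(signals) for stage, signals in CONTROL[kind].items()}


def advance(control, finished_stage):
    """Drop the group just consumed; the rest moves to the next pipeline register."""
    return {s: sig for s, sig in control.items() if s != finished_stage}


id_ex = decode("lw")               # holds the EX, MEM, and WB groups
ex_mem = advance(id_ex, "EX")      # holds the MEM and WB groups
mem_wb = advance(ex_mem, "MEM")    # holds the WB group only
print(list(id_ex), list(ex_mem), list(mem_wb))
```

Each pipeline register stores only the control groups that later stages still need, mirroring how the data fields are handled.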