CPE 232 Computer Organization Basic MIPS Pipelining – Part I

Dr. Iyad Jafar. Adapted from Dr. Gheith Abandah's slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

Presentation Transcript


  1. CPE 232 Computer Organization: Basic MIPS Pipelining – Part I
     Dr. Iyad Jafar
     Adapted from Dr. Gheith Abandah's slides: http://www.abandah.com/gheith/Courses/CPE335_S08/index.html

  2. Review: Single Cycle vs. Multiple Cycle Timing
     • Single Cycle Implementation: lw occupies Cycle 1 and sw occupies Cycle 2; part of the sw cycle is marked as waste
     • Multiple Cycle Implementation: lw takes five cycles (IFetch, Dec, Exec, Mem, WB; Cycles 1 to 5), sw takes four (IFetch, Dec, Exec, Mem; Cycles 6 to 9), and the R-type instruction starts its IFetch in Cycle 10
     • The multicycle clock period is longer than 1/5 of the single-cycle clock period because of stage register overhead

  3. How Can We Make It Even Faster?
     • Split the multiple instruction cycle into smaller and smaller steps
       • There is a point of diminishing returns where as much time is spent loading the state registers as doing the work
     • Start fetching and executing the next instruction before the current one has completed
       • Pipelining: (all?) modern processors are pipelined for performance
       • Remember the performance equation: CPU time = CPI x CC x IC
     • Fetch (and execute) more than one instruction at a time
       • Superscalar processing

  4. A Pipelined MIPS Processor
     • In the multicycle implementation, only one functional unit is used in each clock cycle. Can the idle units do something else?
     • Start the next instruction before the current one has completed
       • improves throughput: the total amount of work done in a given time
       • instruction latency (execution time, delay time, response time: the time from the start of an instruction to its completion) is not reduced
     • [Figure: lw, sw, and R-type overlapped in the pipeline over Cycles 1 to 8; each instruction passes through IFetch, Dec, Exec, Mem, and WB, starting one cycle after the previous instruction]
     • clock cycle (pipeline stage time) is limited by the slowest stage
     • for some instructions, some stages are wasted cycles
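The overlap described on this slide can be sketched as a small stage-occupancy chart. This is a minimal illustration, assuming an ideal pipeline with no stalls in which every instruction passes through all five stages; the function names `pipeline_chart` and `render` are hypothetical and not part of the original slides.

```python
# Stage names as used on the slides.
STAGES = ["IFetch", "Dec", "Exec", "Mem", "WB"]


def pipeline_chart(instructions):
    """Return a list of (name, {cycle: stage}) pairs for an ideal pipeline.

    Assumption: no stalls or hazards; instruction i enters IFetch in
    cycle i + 1 and then advances one stage per clock cycle.
    """
    chart = []
    for i, name in enumerate(instructions):
        row = {i + 1 + s: stage for s, stage in enumerate(STAGES)}
        chart.append((name, row))
    return chart


def render(chart, width=8):
    """Format the chart as a plain-text table, one row per instruction."""
    last_cycle = max(max(row) for _, row in chart)
    cycles = range(1, last_cycle + 1)
    header = "instr".ljust(width) + "".join(f"C{c}".ljust(width) for c in cycles)
    lines = [header.rstrip()]
    for name, row in chart:
        cells = "".join(row.get(c, "").ljust(width) for c in cycles)
        lines.append((name.ljust(width) + cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(pipeline_chart(["lw", "sw", "R-type"])))
```

Each row is shifted one cycle to the right of the row above it, which is why throughput improves while the latency of any single instruction stays at five cycles.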

  5. Single Cycle, Multiple Cycle, vs. Pipeline
     • [Figure: timing of lw, sw, and R-type in the three implementations. Single Cycle Implementation: one long clock cycle per instruction, with wasted time in the sw cycle. Multiple Cycle Implementation: lw takes five cycles, sw four, and R-type begins in Cycle 10. Pipeline Implementation: each instruction passes through IFetch, Dec, Exec, Mem, and WB, overlapped with its neighbors]

  6. Pipelining Performance
     • Example: Comparing pipelining to single-cycle
       • Consider an ISA with three instruction classes: memory-reference, R-type, and branch. Compare the average time between instructions and the total execution time in the single-cycle and pipelined implementations.
       • Assume the operation time for each major unit (memory, ALU, and register file) is 200 ps.
       • Cycle time in the single-cycle implementation is determined by the load instruction and equals 1000 ps (200+200+200+200+200)
       • Cycle time in the pipelined implementation is determined by the slowest stage: 200 ps

  7. Pipelining Performance – cont’d
     • Consider a sequence of load instructions
     • [Figure: timing of the load sequence in the single-cycle and pipelined implementations]

  8. Pipelining Performance – cont’d
     • Time to start executing the fourth instruction
       • Single-cycle = 3 x 1000 = 3000 ps
       • Pipelining = 3 x 200 = 600 ps (we fetch one instruction every clock cycle)
       • Speedup = 3000/600 = 5
     • Effect on execution time for three load instructions
       • Single-cycle = 3 x 1000 = 3000 ps
       • Pipelined = 1400 ps (five cycles for the first load, then one more cycle for each of the other two)
       • Speedup = 3000/1400 ≈ 2.14, well below 5 (not enough workload)
     • Consider adding 1,000,000 instructions
       • Speedup = (3000 + 1000 x 1,000,000) / (1400 + 200 x 1,000,000) ≈ 5
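The arithmetic on this slide can be reproduced with a short sketch. It assumes an ideal five-stage pipeline with no hazards, so the first instruction takes five cycles and every later instruction completes one cycle after its predecessor; the function names are illustrative only.

```python
def single_cycle_time(n_instr, cycle_ps=1000):
    """Total time when every instruction takes one long clock cycle."""
    return n_instr * cycle_ps


def pipelined_time(n_instr, stage_ps=200, n_stages=5):
    """Total time in an ideal pipeline (assumption: no stalls).

    The first instruction needs n_stages cycles to drain through the
    pipeline; each additional instruction adds exactly one cycle.
    """
    return (n_stages + n_instr - 1) * stage_ps


def speedup(n_instr, cycle_ps=1000, stage_ps=200, n_stages=5):
    """Single-cycle execution time divided by pipelined execution time."""
    return single_cycle_time(n_instr, cycle_ps) / pipelined_time(
        n_instr, stage_ps, n_stages
    )


if __name__ == "__main__":
    for n in (3, 1_000_003):
        print(f"{n} loads: speedup = {speedup(n):.5f}")
```

With three loads the fill time dominates and the speedup stays near 2.14; with 1,000,003 loads the fill time is negligible and the speedup approaches the stage count of 5.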

  9. Pipelining Performance – cont’d
     • Assume that the register file takes only 100 ps and we have 1,000,000 load instructions
       • Single-cycle cycle time = 800 ps (200+100+200+200+100)
       • Pipelining cycle time = 200 ps
     • Time to start the fourth instruction
       • Single-cycle = 3 x 800 = 2400 ps
       • Pipeline = 3 x 200 = 600 ps
       • Speedup = 2400/600 = 4
     • Total execution time
       • Speedup = (1,000,000 x 800) / (1,000,000 x 200) = 4 < 5
       • This is due to unbalanced stages
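The effect of unbalanced stages can be sketched by deriving both clock periods from per-stage latencies. The mapping of the register file to the ID (register read) and WB (register write) stages is an assumption made for this illustration, as are the dictionary and function names.

```python
# Per-stage latencies in picoseconds for a load instruction.
# Assumption: the register file is used in ID (read) and WB (write).
BALANCED = {"IF": 200, "ID": 200, "EX": 200, "MEM": 200, "WB": 200}
FAST_REGFILE = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(stages):
    """The single-cycle clock must cover every stage of the slowest instruction."""
    return sum(stages.values())


def pipeline_period(stages):
    """The pipeline clock is set by the slowest stage."""
    return max(stages.values())


def steady_state_speedup(stages):
    """Speedup for a long instruction stream, ignoring pipeline fill time."""
    return single_cycle_period(stages) / pipeline_period(stages)


if __name__ == "__main__":
    for label, stages in (("balanced", BALANCED), ("fast regfile", FAST_REGFILE)):
        print(label, single_cycle_period(stages), pipeline_period(stages),
              steady_state_speedup(stages))
```

Making the register file faster shortens the single-cycle period to 800 ps but leaves the pipeline period at 200 ps, so the speedup drops from 5 to 4.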

  10. Pipelining Performance – cont’d
     • Ideally, the pipelined implementation is n times faster than the single-cycle implementation, where n is the number of pipeline stages.
       • In the 5-stage MIPS, the pipelined version would be 5 times faster.
     • When the pipeline is full, the throughput is one instruction per cycle
     • Factors that affect pipelining performance
       • Time to fill and empty the pipeline with instructions
       • Number of instructions to execute
       • Unbalanced operation times for stages (for example, a register file access of 100 ps)
       • Pipeline hazards (discussed in Part II)
       • Instruction mix

  11. MIPS Pipeline Datapath Modifications
     • Notes on instruction execution in MIPS
       • The execution of instructions is divided into 5 steps/cycles/stages: IF, ID, EX, MEM, and WB
       • Instruction flow is from left to right except in two cases:
         • In the write-back stage, where the result is written into the register file in the middle of the datapath
         • Choosing between the incremented PC and the branch address in the MEM stage
     • What do we need to add/modify in our MIPS datapath to implement pipelining?
       • In pipelined execution, all units are operating in every cycle; we have to duplicate hardware where needed
       • State registers are added between stages to preserve intermediate data and control for each instruction

  12. MIPS Pipeline Datapath Modifications
     • [Figure: five-stage datapath (IF: IFetch, ID: Dec, EX: Execute, MEM: MemAccess, WB: WriteBack) showing the PC, Instruction Memory, Register File, Sign Extend (16 to 32), Shift left 2, adders, ALU, and Data Memory, with the pipeline registers IFetch/Dec, Dec/Exec, Exec/Mem, and Mem/WB between stages, all driven by the System Clock]
     • Do you see any problem?

  13. Corrected Datapath to Save RegWrite Addr
     • [Figure: the same five-stage datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB, corrected so that the register write address travels through the pipeline registers]
     • Need to preserve the destination register address in the pipeline state registers

  14. MIPS Pipeline Data Flow Example – Load Instruction
     • Instruction Fetch
       • Put the PC and the loaded instruction in the IF/ID register

  15. MIPS Pipeline Data Flow Example – Load Instruction
     • Instruction Decode and register read
       • Store rs, rt, the sign-extended offset, and the updated PC in the ID/EX register

  16. MIPS Pipeline Data Flow Example – Load Instruction
     • Execute or address calculation
       • Store the branch address, rt, the ALU result, and the zero flag in the EX/MEM register

  17. MIPS Pipeline Data Flow Example – Load Instruction
     • Memory access
       • Store the data from memory into the MEM/WB register

  18. MIPS Pipeline Data Flow Example – Load Instruction
     • Write back
       • Copy the data loaded in the MEM/WB register to the register file
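The five steps of the load example can be traced with a small sketch that models each pipeline register as a dictionary. This is a simplified illustration under stated assumptions, not the datapath from the slides: the instruction is a pre-decoded dictionary, the offset is treated as already sign-extended, memories are plain dictionaries keyed by byte address, and all field names (`rs_val`, `dest`, and so on) are hypothetical.

```python
def lw_through_pipeline(pc, instr_mem, regs, data_mem):
    """Trace one lw through IF, ID, EX, MEM, and WB.

    Returns the contents of the four pipeline registers and updates
    `regs` in place during write-back.
    """
    # IF: fetch the instruction and keep the incremented PC.
    if_id = {"pc": pc + 4, "instr": instr_mem[pc]}

    # ID: read the register file; carry the offset, the updated PC, and
    # the destination register number (rt for a load) forward.
    instr = if_id["instr"]
    id_ex = {
        "pc": if_id["pc"],
        "rs_val": regs[instr["rs"]],
        "rt_val": regs[instr["rt"]],
        "imm": instr["offset"],  # assumed already sign-extended
        "dest": instr["rt"],
    }

    # EX: compute the memory address; the branch adder also runs, although
    # a load never uses its result.
    alu_result = id_ex["rs_val"] + id_ex["imm"]
    ex_mem = {
        "branch_addr": id_ex["pc"] + (id_ex["imm"] << 2),
        "zero": alu_result == 0,
        "alu_result": alu_result,
        "rt_val": id_ex["rt_val"],
        "dest": id_ex["dest"],
    }

    # MEM: read data memory at the computed address.
    mem_wb = {"mem_data": data_mem[ex_mem["alu_result"]], "dest": ex_mem["dest"]}

    # WB: write the loaded value into the destination register.
    regs[mem_wb["dest"]] = mem_wb["mem_data"]
    return if_id, id_ex, ex_mem, mem_wb


if __name__ == "__main__":
    regs = [0] * 32
    regs[9] = 100                                   # base address in $t1
    data_mem = {108: 42}
    instr_mem = {0: {"op": "lw", "rs": 9, "rt": 8, "offset": 8}}  # lw $t0, 8($t1)
    for reg in lw_through_pipeline(0, instr_mem, regs, data_mem):
        print(reg)
    print("regs[8] =", regs[8])
```

The `dest` field is carried through every pipeline register, which mirrors the correction on the earlier slide: the write address must still be available when the load reaches write-back.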

  19. Data Fields in the Pipeline Registers
     • Data fields are moved from one pipeline register to the next every clock cycle until they are no longer needed

  20. MIPS Pipeline Control Path
     • Define the control signals
     • The pipeline registers are updated every cycle, so they need no separate write signals

  21. MIPS Pipeline Control Path
     • All control signals can be determined during Decode
     • Expand the pipeline registers to store and move the control signals between stages until they are needed

  22. MIPS Pipeline Control Details
     • Control signals that have to go into each stage
     • Control signal values based on instruction type
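The idea of generating all control signals during Decode and carrying them forward can be sketched as follows. The transcript does not preserve the slide's signal tables, so the signal names and values here are an assumption based on the usual textbook MIPS control (RegDst, ALUOp, ALUSrc for EX; Branch, MemRead, MemWrite for MEM; RegWrite, MemtoReg for WB); `None` marks a don't-care value, and the function names are hypothetical.

```python
# Control values grouped by the stage that consumes them.
# Assumption: standard textbook MIPS signal names; None = don't care.
CONTROL = {
    "R-type": {
        "EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
    "beq": {
        "EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
        "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}


def decode(instr_type):
    """Produce the control fields written into ID/EX during Decode."""
    return {stage: dict(signals) for stage, signals in CONTROL[instr_type].items()}


def pass_to_next(control, used_stage):
    """Drop the signals consumed by `used_stage`; the rest move on."""
    return {stage: sig for stage, sig in control.items() if stage != used_stage}


if __name__ == "__main__":
    id_ex = decode("lw")
    ex_mem = pass_to_next(id_ex, "EX")
    mem_wb = pass_to_next(ex_mem, "MEM")
    print(id_ex, ex_mem, mem_wb, sep="\n")
```

Each pipeline register holds only the signals still needed downstream, so the control fields shrink from three groups in ID/EX to one group in MEM/WB.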
