1 / 135

CSC3050 – Computer Architecture

Explore the concept of pipelining in computer architecture, covering pipelined datapaths, control mechanisms, hazards, and optimizations for improved performance. Compare single-cycle and pipelined implementations to understand the advantages and trade-offs.

dawnd
Download Presentation

CSC3050 – Computer Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSC3050 – Computer Architecture Prof. Yeh-Ching Chung School of Science and Engineering Chinese University of Hong Kong, Shenzhen

  2. Outline • An overview of pipelining • A pipelined datapath • Pipelined control • Hazards:types of hazard • Handling data hazards • InsertingNOP(software) • Forwarding,R-Type-use(hardware) • Stalls,load-use(hardware) • Handlingbranch hazards • Exceptions • Superscalar and dynamic pipelining

  3. Single-Cycle Advantages & Disadvantages • Uses the clock cycle inefficiently – the clock cycle must be timed to accommodate the slowestinstruction • Especially problematic for more complex instructions, e.g., floating-point multiplication • May be wasteful of area since some functional units (e.g., adders) must be duplicated since they can not be shared during a clock cycle • But it is simple and easy to understand Cycle 1 Cycle 2 Clk lw sw waste

  4. Pipelining Analogy • Pipelined laundry: overlapping execution • Parallelism improves performance • 4-load speedup= 8/3.5 = 2.3 • 20-load speedup= 40/11.5= 3.478 • Under ideal conditions • The speedup from pipelining ≈ The number of pipe stages

  5. Five Stages of Load Instruction • Five stages, one step per stage • IF: Instruction fetch from (instruction) memory • ID: Instruction decode & register read • EX: Execute operation or calculate address • MEM: Access (data) memory operand • WB: Write result back to register

  6. EX WB IF ID MEM IF EX ID MEM WB Store IF ID EX R-type MEM WB Single Cycle VS Pipeline Cycle 1 Cycle 2 Clk Single Cycle Implementation: Load Store Waste Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk Pipeline Implementation Load

  7. Pipeline Performance (1) • Assume time for stages is • 100 ns for register read or write • 200 ns for other stages • Compare pipelined datapath with single-cycle datapath

  8. Pipeline Performance (2)

  9. Pipeline Speedup • If all stages are balanced, i.e., all take the same time • If not balanced, speedup is less • Speedup due to increased throughput • Latency(time for each instruction) does not decrease

  10. Pipeline and ISA Design • MIPS ISA designed for pipelining • All instructionsare 32-bits • Easier to fetch and decode in one cycle • c.f. x86: 1- to 17-byte instructions • Few and regularinstruction formats • Can decode and read registers in one step • Load/store addressing • Can calculate addressin the 3rd stage, access memoryin the 4th stage • Alignment of memory operands • Memory access takes only one cycle

  11. Pipeline Lessons • Does not help latency of single task, but throughput of entire • Pipeline rate limited by slowest stage • Multiple tasks working at the same time using different resources • Potential speedup = Number pipeline stages • Unbalanced stage length; time to “fill” & “drain” the pipeline reduce speedup • Stall for dependences

  12. Outline • An overview of pipelining • A pipelined datapath • Pipelined control • Hazards:types of hazard • Handling data hazards • InsertingNOP(software) • Forwarding,R-Type-use(hardware) • Stalls,load-use(hardware) • Handlingbranch hazards • Exceptions • Superscalar and dynamic pipelining

  13. Design a Pipelined Processor • Examine the datapath and control diagram • Starting with single cycle datapath • Single cycle control? • Partition datapath into stages: • IF (instruction fetch), ID (instruction decode and register file read), EX (execution or address calculation), MEM (data memory access), WB (write back) • Associate resources with stages • Ensure that flows do not conflict, or figure out how to resolve • Assert control in appropriate stage

  14. Multi-Execution Steps But, use single-cycle datapath ...

  15. Feedback Path Split Single-cycle Datapath What to add to split the datapath into stages?

  16. Add Pipeline Registers • Use registers between stages to carry data and control Pipeline registers (latches)

  17. Ifetch Reg/Dec Exec Mem Wr Consider load • IF: Instruction Fetch • Fetch the instruction from the Instruction Memory • ID: Instruction Decode • Registers fetch and instruction decode • EX: Calculate the memory address • MEM: Read the data from the Data Memory • WB: Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load

  18. 1st lw Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Pipelining load • 5 functional units in the pipeline datapath are • Instruction Memory for the Ifetch stage • Register File’s Read ports (busA and busB) for the Reg/Dec stage • ALU for the Exec stage • Data Memory for the MEM stage • Register File’s Write port (busW) for the WB stage Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Clock 2nd lw 3rd lw

  19. IF Stage of load • IR = mem[PC]; PC = PC + 4 IR, PC+4

  20. ID Stage of load • A = Reg[IR[25-21]]; B = Reg[IR[20-16]];

  21. EX Stage of load • ALUout = A + sign-ext(IR[15-0])

  22. MEM State of load • MDR = mem[ALUout]

  23. WB Stage of load • Reg[IR[20-16]] = MDR Who will supply this address?

  24. Ifetch Reg/Dec Exec Wr The Four Stages of R-type • IF: fetch the instruction from the Instruction Memory • ID: registers fetch and instruction decode • EX: ALU operates on the two register operands • WB: write ALU output back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 R-type

  25. Ops! We have a problem! Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Wr Ifetch Reg/Dec Exec Wr Pipelining R-type and load Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Clock R-type R-type Load R-type R-type

  26. 1 2 3 4 5 Load Ifetch Reg/Dec Exec Mem Wr 1 2 3 4 R-type Ifetch Reg/Dec Exec Wr Important Observation • Each functional unit can only be used once per instruction • Each functional unit must be used at the same stage for all instructions • Load uses Register File’s write port during its 5th stage • R-type uses Register File’s write port during its 4th stage Several ways to solve: forwarding, adding pipeline bubble, making instructions same length

  27. Ifetch Reg/Dec Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Ifetch Reg/Dec Exec Mem Wr Solution - Delay R-type’s Write • Delay R-type’s register write by one cycle • R-type also use Reg File’s write port at Stage 5 • MEM is a NOP stage: nothing is being done. 4 1 2 3 5 Exec Mem R-type Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Clock R-type R-type Load R-type R-type also has 5 stages R-type

  28. Ifetch Reg/Dec Exec Mem The Four Stages of store • IF: fetch the instruction from the Instruction Memory • ID: registers fetch and instruction decode • EX: calculate the memory address • MEM: write the data into the Data Memory Add an extra stage: • WB: NOP Cycle 1 Cycle 2 Cycle 3 Cycle 4 Store WB

  29. Ifetch Reg/Dec Exec The Three Stages of beq • IF: fetch the instruction from the Instruction Memory • ID: registers fetch and instruction decode • EX: • Compares the two register operand • Select correct branch target address • Latch into PC Add two extra stages: • MEM: NOP • WB: NOP Cycle 1 Cycle 2 Cycle 3 Cycle 4 Beq Mem WB

  30. Graphically Representing Pipelines • Can help with answering questions like • How many cycles to execute this code? • What is the ALU doing during cycle 4? • Help understand datapaths

  31. Example 1 - Cycle 1

  32. Example 1 - Cycle 2

  33. Example 1 - Cycle 3

  34. Example 1 - Cycle 4

  35. Example 1 - Cycle 5

  36. Example 1 - Cycle 6

  37. Outline • An overview of pipelining • A pipelined datapath • Pipelined control • Hazards:types of hazard • Handling data hazards • InsertingNOP(software) • Forwarding,R-Type-use(hardware) • Stalls,load-use(hardware) • Handlingbranch hazards • Exceptions • Superscalar and dynamic pipelining

  38. Pipeline Control - Control Signals

  39. Group Signals According to Stages • Can use control signals of single-cycle CPU

  40. Data Stationary Control (1) • Pass control signals along just like the data • Main control generates control signals during ID

  41. Data Stationary Control (2) • Signals for EX (ExtOp, ALUSrc, ...) are used 1 cycle later • Signals for MEM (MemWr, Branch) are used 2 cycles later • Signals for WB (MemtoReg, MemWr) are used 3 cycles later ID EX MEM WB ExtOp ExtOp ALUSrc ALUSrc ALUOp ALUOp Main Control RegDst RegDst Ex/MEM Register MEM/WB Register ID/Ex Register MemWr MemWr MemW IF/ID Register Branch Branch Branch MemtoReg MemtoReg MemtoReg MemtoReg RegWr RegWr RegWr RegWr

  42. WB Stage of load • Reg[IR[20-16]] = MDR Who will supply this address?

  43. Datapath with Control

  44. Let’s Try it Out lw $10, 20($1) sub $11, $2, $3 and $12, $4, $5 or $13, $6, $7 add $14, $8, $9

  45. Example 2 - Cycle 1

  46. Example 2 - Cycle 2

  47. Example 2 - Cycle 3

  48. Example 2 - Cycle 4

  49. Example 2 - Cycle 5

  50. Example 2 - Cycle 6

More Related