
Understanding MIPS Implementations: Simplified and Pipelined Versions

This chapter provides an introduction to two different MIPS processor implementations - a simplified version and a more realistic pipelined version. It covers important aspects such as memory reference, arithmetic/logical operations, and control transfer. The chapter also discusses the concept of pipelining and its impact on performance.


Presentation Transcript


  1. Chapter 4 The Processor

  2. Introduction §4.1 Introduction • We will examine two MIPS implementations • A simplified version • A more realistic pipelined version • Simple subset, shows most aspects • Memory reference: lw, sw • Arithmetic/logical: add, sub, and, or, slt • Control transfer: beq, j Chapter 4 — The Processor — 2

  3. Uoh.blackboard.com • Login using Username : your username Password : your email password.

  4. Go to “Courses” menu

  5. Select “201401_COE308_001_3646: Computer Architecture”

  6. Select “Content “

  7. Slides

  8. First Task

  9. First Task

  10. Pipelining Analogy (§4.5 An Overview of Pipelining) • Pipelined laundry: overlapping execution • Parallelism improves performance • Four loads: Speedup = 8/3.5 = 2.3

  11. MIPS Pipeline • Five stages, one step per stage • IF: Instruction fetch from memory • ID: Instruction decode & register read • EX: Execute operation or calculate address • MEM: Access memory operand • WB: Write result back to register

  12. Pipeline Performance • Assume time for stages is • 100ps for register read or write • 200ps for other stages • Compare pipelined datapath with single-cycle datapath

  13. Pipeline Performance • Single-cycle (Tc = 800ps) • Pipelined (Tc = 200ps)
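The two clock periods on the slide can be reproduced from the stage times given on the previous slide. The sketch below is only an illustration: the mapping of the 100ps register read to ID and the 100ps register write to WB follows the five-stage list above, and `total_time` assumes an ideal pipeline with no stalls.

```python
# Stage latencies in picoseconds, as stated on the slides:
# 100 ps for register read or write, 200 ps for the other stages.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_clock(stages):
    """Clock period when an instruction passes through every stage in one cycle."""
    return sum(stages.values())


def pipelined_clock(stages):
    """Clock period when each stage has its own cycle: set by the slowest stage."""
    return max(stages.values())


def total_time(n_instructions, stages, pipelined):
    """Time in ps to run n instructions, assuming no stalls (an idealisation)."""
    if pipelined:
        cycles = len(stages) + (n_instructions - 1)
        return cycles * pipelined_clock(stages)
    return n_instructions * single_cycle_clock(stages)


if __name__ == "__main__":
    print(single_cycle_clock(STAGE_PS))               # 800
    print(pipelined_clock(STAGE_PS))                  # 200
    print(total_time(3, STAGE_PS, pipelined=False))   # 2400
    print(total_time(3, STAGE_PS, pipelined=True))    # 1400
```

The 800ps figure corresponds to an instruction that uses all five stages (such as lw). The pipelined clock is 200ps rather than 800/5 = 160ps because the stages are not perfectly balanced.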

  14. Basic Idea • Assembly line • Divide the execution of a task among a number of stages • A task is divided into subtasks to be executed in sequence • Performance improvement compared to sequential execution

  15. Pipeline [Figure: a task divided into sub-tasks 1, 2, ..., n; a stream of tasks entering a pipeline of stages 1, 2, ..., n]

  16. 5 Tasks on 4 stage pipeline [Figure: Gantt chart of Task 1 to Task 5 against time units 1 to 8]

  17. Speedup • Stream of m tasks through a pipeline of n stages, each stage taking time t • T(Seq) = n * m * t • T(Pipe) = n * t + (m-1) * t • Speedup = T(Seq)/T(Pipe) = (n * m)/(n + m - 1)

  18. Efficiency • Stream of m tasks through a pipeline of n stages, each stage taking time t • T(Seq) = n * m * t • T(Pipe) = n * t + (m-1) * t • Efficiency = Speedup/n = m/(n+m-1)

  19. Throughput • Stream of m tasks through a pipeline of n stages, each stage taking time t • T(Seq) = n * m * t • T(Pipe) = n * t + (m-1) * t • Throughput = no. of tasks executed per unit of time = m/((n+m-1) * t)
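The speedup, efficiency, and throughput formulas from the three slides above can be checked numerically. This is a minimal sketch; the function name `pipeline_metrics` is made up for illustration, and speedup is taken as T(Seq)/T(Pipe).

```python
def pipeline_metrics(n, m, t):
    """Metrics for m tasks on an n-stage pipeline where every stage takes time t."""
    t_seq = n * m * t                # sequential execution time
    t_pipe = n * t + (m - 1) * t     # pipelined execution time
    speedup = t_seq / t_pipe         # equals n*m / (n + m - 1)
    efficiency = speedup / n         # equals m / (n + m - 1)
    throughput = m / t_pipe          # tasks completed per unit of time
    return {
        "t_seq": t_seq,
        "t_pipe": t_pipe,
        "speedup": speedup,
        "efficiency": efficiency,
        "throughput": throughput,
    }


if __name__ == "__main__":
    # 5 tasks on a 4-stage pipeline with unit stage time.
    print(pipeline_metrics(n=4, m=5, t=1))
    # Laundry analogy: 4 loads, 4 stages of half an hour each.
    print(round(pipeline_metrics(n=4, m=4, t=0.5)["speedup"], 1))  # 2.3
```

With n = 4, m = 5, and t = 1 the pipelined time is 8 units, which matches the 8 time units on the "5 Tasks on 4 stage pipeline" slide. The laundry case reproduces the 8/3.5 = 2.3 speedup quoted earlier.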

  20. Instruction Pipeline • Pipeline stall: some of the stages might need more time to perform their function • E.g. I2 needs 3 time units to perform its function • This is called a “bubble” or “pipeline hazard”

  21. Pipeline and Instruction Dependency • Instruction dependency: the operation performed by a stage depends on the operation(s) performed by other stage(s) • E.g. conditional branch: instruction I4 cannot be executed until the branch condition in I3 is evaluated and stored • The branch takes 3 units of time

  22. Group Activity • Show a Gantt chart for 10 instructions that enter a four-stage pipeline (IF, ID, IE, and IS). • Assume that fetching I5 depends on the result of evaluating I4.

  23. Answer
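The answer figure did not survive in this transcript. One way to explore the activity is the sketch below, which builds a Gantt chart under two assumptions that are not stated on the slide: every stage takes one time unit, and I5 cannot be fetched until I4 has finished its last stage (IS). The helper names `schedule` and `gantt` are hypothetical.

```python
STAGES = ["IF", "ID", "IE", "IS"]


def schedule(n_instructions, stages, depends_on=None):
    """Return {instruction number: start time}, 1-based, one time unit per stage.

    depends_on maps an instruction to an earlier instruction whose final
    stage must complete before the dependent instruction can be fetched.
    Instructions are fetched in order, at most one per time unit.
    """
    depends_on = depends_on or {}
    start = {}
    for i in range(1, n_instructions + 1):
        earliest = 1 if i == 1 else start[i - 1] + 1
        if i in depends_on:
            producer_done = start[depends_on[i]] + len(stages) - 1
            earliest = max(earliest, producer_done + 1)
        start[i] = earliest
    return start


def gantt(start, stages):
    """Render the schedule as text; return (chart, total time units)."""
    total = max(start.values()) + len(stages) - 1
    lines = []
    for i, s in sorted(start.items()):
        row = ["--"] * total
        for k, name in enumerate(stages):
            row[s - 1 + k] = name
        lines.append(f"I{i:<3}" + " ".join(row))
    return "\n".join(lines), total


if __name__ == "__main__":
    start = schedule(10, STAGES, depends_on={5: 4})
    chart, total = gantt(start, STAGES)
    print(chart)
    print("total time units:", total)
```

Under these assumptions I4 completes at time 7, I5 is fetched at time 8, and the ten instructions take 16 time units instead of the 13 an unstalled pipeline would need. A different stall length changes only the `depends_on` rule, not the method.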

  24. Pipeline and Data Dependency • Data dependency: a source operand of instruction Ii depends on the result of executing a preceding instruction Ij, i > j • E.g. the operand of Ii cannot be fetched until the result of Ij is saved.

  25. Group Activity
      ADD R1, R2, R3   ; R3 ← R1 + R2    (Ii)
      SL  R3           ; R3 ← SL(R3)     (Ii+1)
      SUB R5, R6, R4   ; R4 ← R5 - R6    (Ii+2)
      Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), IS (Instruction Store). Show a Gantt chart for this code.

  26. Answer • R3 needs to be written by both Ii and Ii+1 • Therefore, the problem is a write-after-write data dependency (Ii+1 also reads R3, so it has a read-after-write dependency on Ii as well)

  27. When do stalls occur in the pipeline? • Write after write • Read after write • Write after read • Read after read → does not cause a stall
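The four cases listed above can be detected mechanically from the registers each instruction reads and writes. The following sketch is illustrative only; the dictionary layout and the function name `classify` are assumptions, and the operand convention (last register is the destination) follows the ADD/SL/SUB example from the earlier group activity.

```python
def classify(first, second):
    """List the dependencies of `second` on `first`, where `first` comes earlier.

    Each instruction is a dict with two sets of register names:
    'reads' and 'writes'. Read after read is not reported because
    it does not cause a stall.
    """
    kinds = []
    if first["writes"] & second["reads"]:
        kinds.append("read after write")
    if first["reads"] & second["writes"]:
        kinds.append("write after read")
    if first["writes"] & second["writes"]:
        kinds.append("write after write")
    return kinds


# ADD R1, R2, R3 ; SL R3 ; SUB R5, R6, R4
ADD = {"reads": {"R1", "R2"}, "writes": {"R3"}}
SL = {"reads": {"R3"}, "writes": {"R3"}}
SUB = {"reads": {"R5", "R6"}, "writes": {"R4"}}

if __name__ == "__main__":
    print(classify(ADD, SL))   # ['read after write', 'write after write']
    print(classify(SL, SUB))   # []
    print(classify(ADD, SUB))  # []
```

For that example, only the ADD/SL pair shares a register, so SUB is independent of both earlier instructions.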

  28. Read after write

  29. Group Activity • Consider the execution of the following sequence of instructions on a five-stage pipeline consisting of IF, ID, OF, IE, and IS. • Show the succession of these instructions in the pipeline. • Show all types of data dependency. • Show the speedup and efficiency.

  30. Answer

  31. No Operation Method • Prevents fetching the wrong instruction / operand • Equivalent to doing nothing

  32. Group Activity • Consider the execution of ten instructions I1–I10 on a pipeline consisting of four pipeline stages: IF, ID, IE, and IS. • Assume that instruction I4 is a conditional branch instruction and that when it is executed, the branch is not taken; that is, the branch condition is not satisfied. • Draw a Gantt chart showing the Nop instructions.

  33. Answer • Prevents fetching the wrong instruction

  34. Group Activity • Consider the execution of the following piece of code on a five-stage pipeline (IF, ID, OF, IE, IS). • Draw a Gantt chart with Nop.

  35. Answer • Prevents fetching the wrong operands

  36. Reducing the Stalls Due to Instruction Dependency

  37. Unconditional Branch Instructions • Reordering of instructions • Use of dedicated hardware in the fetch unit: speeds up instruction fetching • Precomputing the branch and reordering the instructions • Instruction prefetch: instructions can be fetched and stored in the instruction queue

  38. Conditional Branching Instructions • The target address of a conditional branch will not be known until the execution of the conditional branch has been completed. • Delayed branch: fill the pipeline with some instructions until the branch instruction is executed • Prediction of the next instruction to be fetched • It is based on the assumption that the branch outcome is random • Assume that the branch is not taken • If the prediction is correct, we save the time • Otherwise, we redo everything

  39. Example [Figure: code before delaying; code after delaying]

  40. Reducing Pipeline Stalls due to Data Dependency

  41. Hardware Operand Forwarding • Allows the result of one ALU operation to be available to another ALU operation. • SUB cannot start until R3 is stored • If we can forward R3 to the SUB at the same time as the store operation → this saves a stall

  42. Group Activity

  43. Group Activity

  44. Group Activity
      int i, X = 3;
      for (i = 0; i < 10; i++) { X = X + 5; }
      Assume that we have five stages in the pipeline: IF (Instruction Fetch), ID (Instruction Decode), OF (Operand Fetch), IE (Instruction Execute), IS (Instruction Store). Show a Gantt chart for this code.

  45. Group Activity: MIPS Code
      int i, X = 3;
      for (i = 0; i < 10; i++) { X = X + 5; }

            li   $t0, 10          # t0 is a constant 10
            li   $t1, 0           # t1 is our counter (i)
            li   $t2, 3           # t2 is our x
      loop: beq  $t1, $t0, end    # if t1 == 10 we are done
            addi $t2, $t2, 5      # add 5 to x
            addi $t1, $t1, 1      # add 1 to t1
            j    loop             # jump back to the top
      end:
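Before drawing the Gantt chart it helps to know how many instructions the loop actually executes. The sketch below mirrors the MIPS code one instruction at a time and counts them. It assumes each `li` counts as a single instruction, and `ideal_cycles` ignores every branch and data stall, so it gives only a lower bound on the real answer.

```python
def run_loop():
    """Mirror the MIPS loop step by step; return (x, i, instructions executed)."""
    executed = 0
    t0 = 10; executed += 1          # li $t0, 10
    t1 = 0;  executed += 1          # li $t1, 0
    t2 = 3;  executed += 1          # li $t2, 3
    while True:
        executed += 1               # beq $t1, $t0, end
        if t1 == t0:
            break
        t2 += 5; executed += 1      # add 5 to x
        t1 += 1; executed += 1      # addi $t1, $t1, 1
        executed += 1               # j loop
    return t2, t1, executed


def ideal_cycles(n_instructions, n_stages=5):
    """Cycles on an n-stage pipeline with no stalls at all (a lower bound)."""
    return n_stages + n_instructions - 1


if __name__ == "__main__":
    x, i, executed = run_loop()
    print(x, i, executed)           # 53 10 44
    print(ideal_cycles(executed))   # 48
```

The loop body runs 10 times (4 instructions each), plus 3 setup instructions and the final `beq` that exits, for 44 executed instructions and a final X of 53. Any stalls caused by the branch, the jump, and the dependencies on $t1 and $t2 add to the 48-cycle lower bound.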
