
The Processor: Datapath and Control


zeki

Presentation Transcript


  1. The Processor: Datapath and Control

  2. Outline • Goals in processor implementation • Brief review of sequential logic design • Pieces of the processor implementation puzzle • A simple implementation of a MIPS integer instruction subset • Datapath • Control logic design • A multi-cycle MIPS implementation • Datapath • Control logic design • Microcoded control • Exceptions • Some real microprocessor datapath and control

  3. Goals in processor implementation • Balance the rate of supply of instructions and data and the rate at which the execution core can consume them and can update memory • [Figure: instruction supply, execution core, data supply]

  4. Goals in processor implementation • Recall from Chapter 2 • CPU Time = INST x CPI x CT (instruction count x cycles per instruction x clock cycle time) • INST largely a function of the ISA and compiler • Objective: minimize CPI x CT within design constraints (cost, power, etc.) • Trading off CPI and CT is tricky • [Figure: multiplier and logic blocks]
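The CPI versus CT trade-off can be made concrete with a small calculation. The sketch below only evaluates the formula from the slide; the instruction count, CPI values, and cycle times are hypothetical numbers chosen for illustration, not measurements of any real design.

```python
def cpu_time(inst_count, cpi, cycle_time):
    """CPU Time = INST x CPI x CT, with cycle_time in seconds."""
    return inst_count * cpi * cycle_time

# Hypothetical designs running the same 1,000,000-instruction program.
# Design A: every instruction takes one long cycle.
time_a = cpu_time(1_000_000, 1.0, 16e-9)
# Design B: shorter cycle, but more cycles per instruction on average.
time_b = cpu_time(1_000_000, 3.5, 4e-9)

# B wins only because its CPI x CT product (14 ns) is below A's (16 ns).
faster = "B" if time_b < time_a else "A"
```

Lowering CT helps only if CPI does not rise by a larger factor, which is why the two cannot be tuned independently.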

  5. Brief review of sequential logic design • State elements are clocked devices • Flip-flops, etc. • Combinational elements hold no state • ALU, multiplier, multiplexers, etc. • In edge-triggered clocking, state elements are only updated on the (rising) edge of the clock pulse

  6. Brief review of sequential logic design • The same state element can be read at the beginning of a clock cycle and updated at the end • Example: incrementing the PC • [Figure: timing diagram; the PC register holds 8 while the adder (PC + 4) outputs 12, and the PC register takes the value 12 at the next clock edge]
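A minimal software sketch of the PC example, assuming a hypothetical `Register` class that models an edge-triggered state element: the output stays stable during the cycle, and the input is captured only when `clock_edge()` is called.

```python
class Register:
    """Hypothetical model of an edge-triggered state element."""

    def __init__(self, value=0):
        self.value = value       # output, stable for the whole cycle
        self.next_value = value  # input, sampled at the clock edge

    def clock_edge(self):
        # State changes only here, at the (rising) clock edge.
        self.value = self.next_value


pc = Register(8)
pc.next_value = pc.value + 4  # combinational adder output is 12
during_cycle = pc.value       # PC still reads 8 during the cycle
pc.clock_edge()
after_edge = pc.value         # PC reads 12 after the edge
```

Because the read and the update are separated by the clock edge, the adder can use the old PC value to compute the new one without a race.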

  7. Our processor design progression • (1) Instruction fetch, execute, and operand reads from data memory all take place in a single clock cycle • (2) Instruction fetch, execute, and operand reads from data memory take place in successive clock cycles • (3) A pipelined design

  8. Pieces of the processor puzzle • Instruction fetch • Execution • Data memory • [Figure: instruction supply, execution core, data supply]

  9. Instruction fetch datapath • Memory to hold instructions • Register to hold the instruction memory address • Logic to generate the next instruction address • [Figure: PC and +4 adder]

  10. Execution datapath • Focus on only a subset of all MIPS instructions • add, sub, and, or • lw, sw • slt • beq, j • For all instructions except j, we • Read operands from the register file • Perform an ALU operation • For all instructions except sw, beq, and j, we write a result into the register file

  11. Execution datapath • Register file block diagram • Read register 1,2: source operand register numbers • Read data 1,2: source operands (32 bits each) • Write register: destination operand register number • Write data: data written into register file • RegWrite: when asserted, enables the writing of Write Data
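The register file interface can be sketched in software as follows. The class and method names are hypothetical; the port names follow the block diagram described above. The rule that register 0 always reads as zero comes from the MIPS architecture rather than from this slide.

```python
class RegisterFile:
    """Hypothetical model: 32 registers, two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_register_1, read_register_2):
        # Returns Read data 1 and Read data 2 (32 bits each).
        return self.regs[read_register_1], self.regs[read_register_2]

    def write(self, write_register, write_data, reg_write):
        # Write data is stored only when RegWrite is asserted.
        # MIPS register $0 is hardwired to zero, so writes to it are ignored.
        if reg_write and write_register != 0:
            self.regs[write_register] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(18, 7, reg_write=True)
rf.write(30, 5, reg_write=False)  # RegWrite deasserted: no change
operands = rf.read(18, 30)
```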

  12. Execution datapath • Datapath for R-type (add, sub, and, or, slt) • R-type instruction format: op [31-26], rs [25-21], rt [20-16], rd [15-11], shamt [10-6], funct [5-0]
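The R-type field boundaries translate directly into shifts and masks. The sketch below uses a hypothetical `decode_r_type` helper and encodes `add $4, $18, $30` (funct 0x20 is the MIPS encoding of add) to show that decoding recovers the original fields.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31-26
        "rs": (word >> 21) & 0x1F,     # bits 25-21
        "rt": (word >> 16) & 0x1F,     # bits 20-16
        "rd": (word >> 11) & 0x1F,     # bits 15-11
        "shamt": (word >> 6) & 0x1F,   # bits 10-6
        "funct": word & 0x3F,          # bits 5-0
    }


# add $4, $18, $30: rd = 4, rs = 18, rt = 30, funct = 0x20
ADD_4_18_30 = (0 << 26) | (18 << 21) | (30 << 16) | (4 << 11) | (0 << 6) | 0x20
fields = decode_r_type(ADD_4_18_30)
```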

  13. Execution datapath • Datapath for beq instruction • I-type instruction format: op [31-26], rs [25-21], rt [20-16], immediate [15-0] • The ALU subtracts the two register operands; its Zero output indicates whether the contents of rs and rt are equal (branch taken) or not (branch not taken) • Branch target address is the sign-extended immediate, shifted left two positions and added to PC+4
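The branch decision and target calculation can be sketched as below. The function names are hypothetical; the arithmetic follows the slide: subtract the operands, test for zero, and add the shifted, sign-extended immediate to PC+4.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def branch_target(pc, imm16):
    # Target = PC + 4 + (sign-extended immediate << 2), wrapped to 32 bits.
    return (pc + 4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF


def next_pc_for_beq(pc, imm16, rs_value, rt_value):
    # The ALU subtracts; a zero result means the operands are equal.
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, imm16) if zero else (pc + 4) & 0xFFFFFFFF


taken = next_pc_for_beq(0x1000, 3, 5, 5)      # equal operands
not_taken = next_pc_for_beq(0x1000, 3, 5, 6)  # unequal operands
backward = branch_target(0x1000, 0xFFFF)      # immediate of -1
```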

  14. Data memory • Used for lw, sw (I-type format) • Block diagram • Address: memory location to be read or written • Read data: data out of the memory on a load • Write data: data into the memory on a store • MemRead: indicates a read operation is to be performed • MemWrite: indicates a write operation is to be performed

  15. Execution datapath + data memory • Datapath for lw, sw • Address is the sign-extended immediate added to the source operand read out of the register file • sw: data written to memory from specified register • lw: data written to register file from specified memory address
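A small sketch of the load/store behavior described above, assuming a word-addressed dictionary as a stand-in for data memory. The helper names and the initial register values are hypothetical.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


regs = [0] * 32
memory = {}  # address -> 32-bit word; a stand-in for the data memory


def effective_address(rs, imm16):
    # Base register contents plus the sign-extended immediate.
    return (regs[rs] + sign_extend16(imm16)) & 0xFFFFFFFF


def sw(rt, imm16, rs):
    # Store: register rt is written to memory.
    memory[effective_address(rs, imm16)] = regs[rt]


def lw(rt, imm16, rs):
    # Load: memory contents are written to register rt.
    regs[rt] = memory.get(effective_address(rs, imm16), 0)


regs[2] = 0x2000
regs[10] = 1234
sw(10, 112, 2)  # sw $10, 112($2)
lw(8, 112, 2)   # lw $8, 112($2)
```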

  16. Putting the pieces together • Single clock cycle for fetch, execute, and operand read from data memory • 3 MUXes • Register file operand or sign extended immediate to ALU • ALU or data memory output written to register file • PC+4 or branch target address written to PC register

  17. Datapath for R-type instructions • Example: add $4, $18, $30

  18. Datapath for I-type ALU instructions • Example: slti $7, $4, 100

  19. Datapath for not taken beq instruction • Example: beq $28, $13, EXIT

  20. Datapath for taken beq instruction • Example: beq $28, $13, EXIT

  21. Datapath for load instruction • Example: lw $8, 112($2)

  22. Datapath for store instruction • Example: sw $10, 0($3)

  23. Control signals we need to generate

  24. ALU operation control • ALU control input codes from Chapter 4 • Two steps to generate the ALU control input • Use the opcode to distinguish R-type, lw and sw, and beq • If R-type, use funct field to determine the ALU control input

  25. ALU operation control • Opcode used to generate a 2-bit signal called ALUOp with the following encodings • 00: lw or sw, perform an ALU add • 01: beq, perform an ALU subtract • 10: R-type, ALU operation is determined by the funct field
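The two-step decoding can be sketched as a function of ALUOp and the funct field. The function returns operation names rather than ALU control bit patterns, because the specific control codes are not listed in this transcript. The funct values are the standard MIPS encodings for the five R-type instructions in the subset.

```python
# Standard MIPS funct encodings for the R-type subset.
FUNCT_TO_OPERATION = {
    0x20: "add",
    0x22: "subtract",
    0x24: "and",
    0x25: "or",
    0x2A: "slt",
}


def alu_operation(alu_op, funct):
    """Map the 2-bit ALUOp (plus funct for R-type) to an ALU operation."""
    if alu_op == 0b00:  # lw or sw: address calculation
        return "add"
    if alu_op == 0b01:  # beq: compare by subtracting
        return "subtract"
    if alu_op == 0b10:  # R-type: funct field decides
        return FUNCT_TO_OPERATION[funct]
    raise ValueError("unused ALUOp encoding")
```

Splitting the decode this way keeps the main control unit small: it looks only at the opcode, and the funct field is examined by a separate, smaller block.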

  26. Comparing instruction fields • Opcode, source registers, function code, and immediate fields always in same place • Destination register is • bits 15-11 (rd) for R-type • bits 20-16 (rt) for lw • MUX to select the right one • R-type: op = 0 [31-26], rs [25-21], rt [20-16], rd [15-11], shamt [10-6], funct [5-0] • beq: op = 4 [31-26], rs [25-21], rt [20-16], immediate (offset) [15-0] • lw (sw): op = 35 (43) [31-26], rs [25-21], rt [20-16], immediate (offset) [15-0]
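The destination register MUX can be sketched as a two-input selector. The select signal is called `reg_dst` here; that name is an assumption borrowed from the usual textbook presentation, since the transcript does not name it.

```python
def mux2(select, input0, input1):
    """Two-input multiplexer: input1 when select is asserted, else input0."""
    return input1 if select else input0


def write_register(word, reg_dst):
    rt = (word >> 16) & 0x1F  # bits 20-16, destination for lw
    rd = (word >> 11) & 0x1F  # bits 15-11, destination for R-type
    return mux2(reg_dst, rt, rd)


ADD_4_18_30 = 0x025E2020  # add $4, $18, $30
LW_8_112_2 = 0x8C480070   # lw $8, 112($2)
dest_r_type = write_register(ADD_4_18_30, reg_dst=1)
dest_load = write_register(LW_8_112_2, reg_dst=0)
```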

  27. Datapath with instr fields and ALU control

  28. Main control unit design

  29. Main control unit design • Truth table, with one column per instruction class: R-format (opcode 0), lw (opcode 35), sw (opcode 43), beq (opcode 4)
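The truth table itself did not survive in this transcript, so the sketch below is a reconstruction under an assumption: it uses the signal names and values of the standard textbook single-cycle MIPS design (RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch, ALUOp). Only RegWrite, MemRead, MemWrite, and ALUOp are named in the slides above, and the original table may differ. `None` marks a don't-care.

```python
# Assumed control values, keyed by opcode; None means "don't care".
CONTROL_TABLE = {
    0: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
            MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    4: dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
            MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def main_control(opcode):
    """Look up the control signals for a 6-bit opcode."""
    return CONTROL_TABLE[opcode]
```

A table lookup is adequate here because the main control unit is purely combinational: its outputs depend only on the opcode.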

  30. Adding support for jump instructions • J-type format: op = 2 [31-26], target [25-0] • Next PC formed by shifting left the 26-bit target two bits and combining it with the 4 high-order bits of PC+4 • Now the next PC will be one of • PC+4 • beq target address • j target address • We need another MUX and control bit
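The jump address formation can be written as a short bit-manipulation sketch. The function name and the example addresses are hypothetical.

```python
def jump_target(pc, instruction):
    """Form the j target from PC+4's top 4 bits and the 26-bit target field."""
    target26 = instruction & 0x03FFFFFF         # bits 25-0
    high4 = ((pc + 4) & 0xFFFFFFFF) & 0xF0000000  # 4 high-order bits of PC+4
    return high4 | (target26 << 2)              # 4 + 26 + 2 = 32 bits


J_EXAMPLE = (2 << 26) | 0x10  # j with a target field of 0x10
next_pc = jump_target(0x10000000, J_EXAMPLE)
```

Since the top 4 bits come from PC+4, a jump can only reach addresses inside the current 256 MB region.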

  31. Adding support for jump instructions

  32. Evaluation of the simple implementation • All instructions take one clock cycle (CPI = 1) • Assume the following worst case delays • Instruction memory: 4 time units • Data memory: 4 time units (read), 2 time units (write) • ALU: 4 time units • Adders: 3 time units • Register file: 2 time units (read), 1 time unit (write) • MUXes, sign extension, gates, and shifters: 1 time unit • Large disparity in worst case delays among instruction types • R-type: 4+2+1+4+1+1 = 13 time units • beq: 4+2+1+4+1+1+1 = 14 time units • j: 4+1+1 = 6 time units • store: 4+2+4+2 = 12 time units • load: 4+2+4+4+1+1 = 16 time units
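The path sums above can be checked mechanically. The sketch below copies each list of delays exactly as written on the slide and takes the maximum, since a single-cycle design must set its clock period to the slowest instruction.

```python
# Per-instruction delay terms, copied from the sums on the slide (time units).
PATH_DELAYS = {
    "R-type": [4, 2, 1, 4, 1, 1],
    "beq": [4, 2, 1, 4, 1, 1, 1],
    "j": [4, 1, 1],
    "store": [4, 2, 4, 2],
    "load": [4, 2, 4, 4, 1, 1],
}

totals = {name: sum(delays) for name, delays in PATH_DELAYS.items()}

# With CPI = 1, the cycle time is bounded below by the slowest path.
min_cycle_time = max(totals.values())
slowest = max(totals, key=totals.get)
```

With these numbers the load sets the cycle time at 16 units, so a jump that needs only 6 units still occupies a full 16-unit cycle.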

  33. Evaluation of the simple implementation • Disparity would be worse in a real machine • Even slower integer instructions (e.g., multiply/divide in MIPS) • Floating point instructions • Simple instructions take as long as complex ones

  34. A multicycle implementation • Instruction fetch, register file access, etc. occur in separate clock cycles • Different instruction types take different numbers of cycles to complete • Clock cycle time should be shorter

  35. High level view of datapath • New registers store results of each step • Not programmer visible! • Hardware can be shared • One ALU for PC+4, branch target calculation, effective address (EA) calculation, and arithmetic operations • One memory for instructions and data

  36. Detailed multi-cycle datapath

  37. Multi-cycle control

  38. First two cycles for all instructions • Instruction fetch (1st cycle) • Load the instruction into the IR register • IR = Memory[PC] • Increment the PC • PC = PC+4 • Instruction decode and register fetch (2nd cycle) • Read register file locations rs and rt, results into the A and B registers • A=Reg[IR[25-21]] • B=Reg[IR[20-16]] • Calculate the branch target address and load into ALUOut • ALUOut = PC+(sign-extend (IR[15-0]) <<2)
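The two common cycles can be sketched as register transfers on a small state dictionary. The function names, the dictionary layout, and the initial values are hypothetical; the transfers themselves follow the slide. Note that the decode cycle uses the PC value already incremented during fetch.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def fetch(state, memory):
    state["IR"] = memory[state["PC"]]             # IR = Memory[PC]
    state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF  # PC = PC + 4


def decode(state, regs):
    ir = state["IR"]
    state["A"] = regs[(ir >> 21) & 0x1F]          # A = Reg[IR[25-21]]
    state["B"] = regs[(ir >> 16) & 0x1F]          # B = Reg[IR[20-16]]
    offset = sign_extend16(ir & 0xFFFF) << 2
    # Speculative branch target; unused if the instruction is not a branch.
    state["ALUOut"] = (state["PC"] + offset) & 0xFFFFFFFF


LW_8_112_2 = (35 << 26) | (2 << 21) | (8 << 16) | 112  # lw $8, 112($2)
memory = {0: LW_8_112_2}
regs = [0] * 32
regs[2] = 0x1000
state = {"PC": 0, "IR": 0, "A": 0, "B": 0, "ALUOut": 0}
fetch(state, memory)
decode(state, regs)
```

The branch target is computed before the instruction type is known because the ALU would otherwise sit idle during decode.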

  39. Instruction fetch • IR=Mem[PC]

  40. Instruction fetch • PC=PC+4

  41. Instruction decode and register fetch • A=Reg[IR[25-21]], B=Reg[IR[20-16]]

  42. Instruction decode and register fetch • ALUOut = PC+(sign-extend (IR[15-0]) <<2)

  43. Additional cycles for R-type • Execution • ALUOut = A op B • Completion • Reg[IR[15-11]] = ALUOut

  44. R-type execution cycle • ALUOut = A op B

  45. R-type completion cycle • Reg[IR[15-11]] = ALUOut

  46. Additional cycles for store • Address computation • ALUOut = A + sign-extend (IR[15-0]) • Memory access • Memory[ALUOut] = B

  47. Store address computation cycle • ALUOut = A + sign-extend (IR[15-0])

  48. Store memory access cycle • Memory[ALUOut] = B

  49. Additional cycles for load • Address computation • ALUOut = A + sign-extend (IR[15-0]) • Memory access • MDR = Memory[ALUOut] • Read completion • Reg[IR[20-16]] = MDR
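The additional cycles for R-type, store, and load can be sketched together. The function below is a hypothetical illustration that starts from the IR, A, and B values left by the first two cycles, performs the register transfers listed above, and returns how many extra cycles were used. Here `add` stands in for any R-type operation.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def finish_instruction(kind, ir, a, b, regs, memory):
    """Run the cycles after decode; return the number of extra cycles."""
    imm = sign_extend16(ir & 0xFFFF)
    if kind == "add":
        alu_out = (a + b) & 0xFFFFFFFF     # execution: ALUOut = A op B
        regs[(ir >> 11) & 0x1F] = alu_out  # completion: Reg[IR[15-11]] = ALUOut
        return 2
    if kind == "sw":
        alu_out = (a + imm) & 0xFFFFFFFF   # address computation
        memory[alu_out] = b                # memory access: Memory[ALUOut] = B
        return 2
    if kind == "lw":
        alu_out = (a + imm) & 0xFFFFFFFF   # address computation
        mdr = memory.get(alu_out, 0)       # memory access: MDR = Memory[ALUOut]
        regs[(ir >> 16) & 0x1F] = mdr      # read completion: Reg[IR[20-16]] = MDR
        return 3
    raise ValueError("instruction class not covered in this sketch")


regs = [0] * 32
memory = {}
COMMON_CYCLES = 2  # instruction fetch + decode/register fetch
store_cycles = COMMON_CYCLES + finish_instruction(
    "sw", (43 << 26) | 8, 0x100, 99, regs, memory)
load_cycles = COMMON_CYCLES + finish_instruction(
    "lw", (35 << 26) | (8 << 16) | 8, 0x100, 0, regs, memory)
add_cycles = COMMON_CYCLES + finish_instruction(
    "add", (4 << 11) | 0x20, 3, 4, regs, memory)
```

Under this accounting an R-type or store instruction takes four cycles and a load takes five, which is the sense in which different instruction types complete in different numbers of cycles.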

  50. Load memory access cycle • MDR = Memory[ALUOut]
