
55:035 Computer Architecture and Organization



Presentation Transcript


  1. 55:035 Computer Architecture and Organization, Lecture 9

  2. Outline
     • Building a CPU
     • Basic Components
     • MIPS Instructions
     • Basic 5 Steps for CPU
     • Single-Cycle Design
     • Multi-cycle Design
     • Comparison of Single and Multi-cycle Designs

  3. Overview
     • Brief look
     • Digital logic
     • CPU Datapath
     • MIPS Example

  4. Digital Logic
     [Figure: D-type flip-flop (input D, edge-triggered clock, output Q); multiplexer (inputs A and B, select input S, output F); D-type flip-flop with enable (EN), drawn with a multiplexer in front of the D input]

  5. Digital Logic: Registers
     [Figure: 1-bit, 4-bit, and N-bit registers. An N-bit register is N enabled D flip-flops (D0–D3 in, Q0–Q3 out for 4 bits) sharing one enable (EN) and one edge-triggered clock]
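
The enable flip-flop and the register on slides 4 and 5 can be sketched in Python as shown below. This is a behavioral model, not a gate-level one. The names (`mux2`, `DFlipFlopEN`, `Register`, `rising_edge`) are illustrative. The model assumes the common construction where the multiplexer feeds the old Q back when EN = 0 and passes the new D when EN = 1.

```python
def mux2(a: int, b: int, s: int) -> int:
    """2-to-1 multiplexer: F = A when S = 0, F = B when S = 1."""
    return b if s else a


class DFlipFlopEN:
    """Edge-triggered D flip-flop with enable (behavioral model)."""

    def __init__(self) -> None:
        self.q = 0

    def rising_edge(self, d: int, en: int) -> None:
        # The flip-flop captures mux2(Q, D, EN) on each clock edge:
        # EN = 0 recirculates the old Q, EN = 1 loads the new D.
        self.q = mux2(self.q, d & 1, en)


class Register:
    """N-bit register: N enabled flip-flops sharing one clock and one EN."""

    def __init__(self, n_bits: int) -> None:
        self.bits = [DFlipFlopEN() for _ in range(n_bits)]

    def rising_edge(self, d: int, en: int) -> None:
        for i, ff in enumerate(self.bits):
            ff.rising_edge((d >> i) & 1, en)  # bit i of D goes to flip-flop i

    @property
    def q(self) -> int:
        return sum(ff.q << i for i, ff in enumerate(self.bits))


r = Register(4)
r.rising_edge(0b1010, en=1)  # loads 1010
r.rising_edge(0b0101, en=0)  # EN = 0: the value is held
print(bin(r.q))  # 0b1010
```

Because every flip-flop shares the same EN, the whole register either loads or holds as a unit.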

  6. Digital Logic: Tri-state Driver (Buffer)
     [Figure: tri-state driver with data input "in", control input "drive", and output "out"]
     • What is Z?
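
A behavioral model can represent the tri-state driver from slide 6 by treating Z (high impedance, meaning the output is effectively disconnected) as Python's `None`. The helper `resolve_bus` is a hypothetical illustration of why the third state is useful. Several drivers can share one wire, as long as at most one of them is enabled at a time.

```python
from typing import Optional

Z = None  # high impedance: the driver is not driving the wire at all


def tristate(inp: int, drive: int) -> Optional[int]:
    """Tri-state buffer: out = in when drive = 1, otherwise Z."""
    return inp if drive else Z


def resolve_bus(outputs: list) -> Optional[int]:
    """Value seen on a wire shared by several tri-state drivers."""
    driven = [v for v in outputs if v is not Z]
    if not driven:
        return Z  # no driver enabled: the wire floats
    if len(set(driven)) > 1:
        raise ValueError("bus contention: enabled drivers disagree")
    return driven[0]


# Two drivers share one wire; only the second is enabled.
print(resolve_bus([tristate(1, drive=0), tristate(0, drive=1)]))  # 0
```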

  7. Digital Logic: Adder/Subtractor or ALU
     [Figure: adder/subtractor or ALU with operand inputs A and B, control input Add/sub (or ALUop), Carry-in, Carry-out, and result F]
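
The adder/subtractor on slide 7 can be modeled as shown below. The model assumes two's-complement subtraction, computed as A + ~B + 1 (invert B and set the carry-in). This is the usual way a single adder serves both operations. The operation names passed to `alu` are hypothetical stand-ins for ALUop encodings.

```python
def add_sub(a: int, b: int, subtract: int, width: int = 32) -> tuple:
    """Return (F, carry_out) for A + B, or for A - B computed as A + ~B + 1."""
    mask = (1 << width) - 1
    b_in = (~b & mask) if subtract else (b & mask)  # invert B to subtract
    carry_in = 1 if subtract else 0                 # the "+ 1" of two's complement
    total = (a & mask) + b_in + carry_in
    return total & mask, total >> width


def alu(a: int, b: int, op: str) -> tuple:
    """Tiny ALU returning (result, zero_flag); the `op` names are illustrative."""
    if op == "add":
        f, _ = add_sub(a, b, 0)
    elif op == "sub":
        f, _ = add_sub(a, b, 1)
    elif op == "and":
        f = a & b & 0xFFFF_FFFF
    elif op == "or":
        f = (a | b) & 0xFFFF_FFFF
    else:
        raise ValueError(f"unknown ALU op: {op}")
    return f, int(f == 0)


print(add_sub(5, 3, subtract=1))  # (2, 1)
print(alu(7, 7, "sub"))           # (0, 1)
```

The zero flag corresponds to the "Zero?" output that appears on the later datapath slides.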

  8. Overview
     • Brief look
     • Digital logic
     • How to Design a CPU Datapath
     • MIPS Example

  9. Designing a CPU: 5 Steps
     • Step 1: Analyze the instruction set → datapath requirements
       • MIPS: ADD, SUB, ORI, LW, SW, BR
       • Meaning of each instruction given by RTL (register transfers)
       • 2 types of registers: CPU/ISA registers, temporary registers
     • Step 2: Datapath requirements → select the datapath components
       • ALU, register file, adder, data memory, etc.
     • Step 3: Assemble the datapath
       • Datapath must support the planned register transfers
       • Ensure all instructions are supported
     • Step 4: Analyze the datapath control required for each instruction
     • Step 5: Assemble the control logic

  10. Step 1a: Analyze ISA
      • All MIPS instructions are 32 bits long.
      • Three instruction formats:
        • R-type
        • I-type
        • J-type
      • R: registers, I: immediate, J: jumps
      • These formats were intentionally chosen to simplify the design
      [Figure: instruction formats, bit 31 on the left]
        R-type:  op [31:26] (6 bits) | rs [25:21] (5) | rt [20:16] (5) | rd [15:11] (5) | shamt [10:6] (5) | funct [5:0] (6)
        I-type:  op [31:26] (6 bits) | rs [25:21] (5) | rt [20:16] (5) | immediate [15:0] (16)
        J-type:  op [31:26] (6 bits) | target address [25:0] (26)
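
The fixed field positions make decoding simple, as the Python sketch below shows: it slices every field out of a 32-bit word. It extracts all fields regardless of format, mirroring how the hardware wires each bit range out of the instruction. Which fields are meaningful depends on `op`. The function name `decode` and the dictionary keys are illustrative.

```python
def decode(instr: int) -> dict:
    """Slice a 32-bit MIPS instruction word into its R-, I-, and J-type fields."""

    def bits(hi: int, lo: int) -> int:
        # Extract instr[hi:lo] (inclusive), using the slide's bit numbering.
        return (instr >> lo) & ((1 << (hi - lo + 1)) - 1)

    return {
        "op": bits(31, 26),
        "rs": bits(25, 21),
        "rt": bits(20, 16),
        "rd": bits(15, 11),
        "shamt": bits(10, 6),
        "funct": bits(5, 0),
        "immediate": bits(15, 0),  # I-type view of the low 16 bits
        "target": bits(25, 0),     # J-type view of the low 26 bits
    }


fields = decode(0x00221821)  # addu $3, $1, $2
print(fields["op"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))  # 0 1 2 3 0x21
```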

  11. Step 1b: Analyze ISA
      [Figure: the R-, I-, and J-type formats of slide 10]
      • Meaning of the fields:
        • op: operation of the instruction
        • rs, rt, rd: the source and destination register specifiers
          • Destination is either rd (R-type) or rt (I-type)
        • shamt: shift amount
        • funct: selects the variant of the operation in the "op" field
        • immediate: address offset or immediate value
        • target address: target address of the jump instruction

  12. MIPS ISA: subset for today
      • ADD and SUB (R-type)
        • addU rd, rs, rt
        • subU rd, rs, rt
      • OR Immediate (I-type):
        • ori rt, rs, imm16
      • LOAD and STORE Word (I-type)
        • lw rt, rs, imm16
        • sw rt, rs, imm16
      • BRANCH (I-type):
        • beq rs, rt, imm16
      [Figure: field layouts, one R-type (op | rs | rt | rd | shamt | funct) and three I-type (op | rs | rt | immediate)]
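
The sketch below connects the subset to the formats by assembling each instruction into a 32-bit word. The opcode and funct values are the standard MIPS encodings: addu funct 0x21, subu funct 0x23, ori 0x0D, lw 0x23, sw 0x2B, beq 0x04. The helper names are hypothetical. Operands are passed by keyword because the slide's operand order differs between ori/lw/sw (rt first) and beq (rs first).

```python
R_FUNCT = {"addu": 0x21, "subu": 0x23}                     # R-type: op = 0
I_OP = {"ori": 0x0D, "lw": 0x23, "sw": 0x2B, "beq": 0x04}


def encode_r(name: str, *, rd: int, rs: int, rt: int) -> int:
    """op | rs | rt | rd | shamt | funct, with op = 0 and shamt = 0."""
    return (rs << 21) | (rt << 16) | (rd << 11) | R_FUNCT[name]


def encode_i(name: str, *, rs: int, rt: int, imm16: int) -> int:
    """op | rs | rt | immediate; negative immediates are stored in two's complement."""
    return (I_OP[name] << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)


print(hex(encode_r("addu", rd=3, rs=1, rt=2)))     # 0x221821
print(hex(encode_i("lw", rt=8, rs=29, imm16=4)))   # 0x8fa80004
print(hex(encode_i("beq", rs=1, rt=0, imm16=-1)))  # 0x1020ffff
```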

  13. Step 2: Datapath Requirements
      • REGISTER FILE
        • MIPS ISA requires 32 registers, 32 bits each
        • Called a register file: 32 entries, each 32 bits
      • AddU rd, rs, rt or SubU rd, rs, rt
        • Read two sources, rs and rt
        • Operation: rs + rt or rs – rt
        • Write destination: rd ← rs +/– rt
      • Requirements
        • Read two registers (rs, rt)
        • Perform ALU operation
        • Write a third register (rd)
      • How to implement?
      [Figure: REGFILE with read-register inputs RdReg1 and RdReg2, write-register input WrReg (register numbers, 5 bits each), write-data input WrData, and write enable RegWrite; outputs RdData1 and RdData2 feed an ALU controlled by ALUop, which produces Result and a Zero? flag]
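
Here is a behavioral sketch of the register-file interface from slide 13. It has two read ports that return values combinationally and one write port that updates a register only when RegWrite is asserted. The class and method names are illustrative. The rule that register 0 always reads as zero is a standard MIPS convention that the slide does not mention, so it is included here as a labeled assumption.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, rd_reg1: int, rd_reg2: int) -> tuple:
        # Read ports: RdData1 = R[RdReg1], RdData2 = R[RdReg2]
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def write(self, wr_reg: int, wr_data: int, reg_write: int) -> None:
        # Write port, modeled as happening at the clock edge.
        # Assumption (standard MIPS, not on the slide): register 0 is hardwired to zero.
        if reg_write and wr_reg != 0:
            self.regs[wr_reg] = wr_data & 0xFFFF_FFFF


# AddU rd, rs, rt with rs = 1, rt = 2, rd = 3
rf = RegisterFile()
rf.write(1, 5, reg_write=1)
rf.write(2, 7, reg_write=1)
a, b = rf.read(1, 2)
rf.write(3, a + b, reg_write=1)
print(rf.read(3, 0))  # (12, 0)
```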

  14. Step 3: Datapath Assembly
      • ADDU rd, rs, rt / SUBU rd, rs, rt
        • Need an ALU
        • Hook it up to the REGISTER FILE
        • REGFILE has 2 read ports (rs, rt) and 1 write port (rd)
      • Parameters come from instruction fields: rs → RdReg1, rt → RdReg2, rd → WrReg
      • Control signals depend upon instruction fields
        • E.g., ALUop = f(Instruction) = f(op, funct)
      [Figure: REGFILE outputs RdData1 and RdData2 feed the ALU, and the ALU Result returns to WrData; control signals ALUop and RegWrite]

  15. Steps 2 and 3: ORI Instruction
      • ORI rt, rs, Imm16
        • Need a new ALUop for the OR function; hook it up to REGFILE
        • 1 read port (rs), 1 write port (rt), 1 constant value (Imm16)
      • Control signals depend upon instruction fields
        • E.g., ALUsrc = f(Instruction) = f(op, funct)
      [Figure: the ALUsrc multiplexer selects the ALU's second operand, either RdData2 (input 0) or ZERO-EXTEND(Imm16), 16 bits extended to 32 (input 1); the WrReg input is marked X because it must now come from rt instead of rd; control signals RegWrite, ALUsrc, ALUop]
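
The ALUsrc multiplexer and zero extension on slide 15 can be sketched as follows. The sketch assumes, as the figure labels suggest, that ALUsrc = 0 selects RdData2 and ALUsrc = 1 selects the zero-extended immediate. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def zero_extend16(imm16: int) -> int:
    """ZERO-EXTEND: the upper 16 bits of the 32-bit result are 0."""
    return imm16 & 0xFFFF


def alu_operand_b(rd_data2: int, imm16: int, alu_src: int) -> int:
    """ALUsrc multiplexer: 0 -> RdData2 (R-type), 1 -> zero-extended Imm16 (ORI)."""
    return zero_extend16(imm16) if alu_src else rd_data2


def execute_ori(rs_value: int, imm16: int) -> int:
    """ORI rt, rs, Imm16: the value written to R[rt]."""
    # RdData2 is ignored when ALUsrc = 1, so any value can be passed for it.
    return (rs_value | alu_operand_b(0, imm16, alu_src=1)) & MASK32


print(hex(execute_ori(0x1234_0000, 0xABCD)))  # 0x1234abcd
```

Zero extension (rather than sign extension) keeps ORI from setting the upper 16 bits when Imm16 has bit 15 set.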

  16. Steps 2 and 3: Destination Register
      • Must select the proper destination, rd or rt
        • Depends on instruction type
        • R-type may write rd
        • I-type may write rt
      [Figure: a RegDst multiplexer selects rt or rd as WrReg; the ALUsrc multiplexer selects RdData2 or ZERO-EXTEND(Imm16); control signals RegWrite, RegDst, ALUsrc, ALUop]

  17. Steps 2 and 3: Load Word
      • LW rt, rs, Imm16
        • Need data memory: data ← Mem[Addr]
        • Addr is rs + Imm16; Imm16 is signed, so use the ALU for +
        • Store in rt: rt ← Mem[rs + Imm16]
      [Figure: the ALU Result drives the DATAMEM Addr input; a MemtoReg multiplexer selects the ALU Result or DATAMEM RdData as the register-file WrData; the extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp; control signals RegWrite, RegDst, ALUsrc, ALUop, MemtoReg, ExtOp]
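
The SIGN/ZERO-EXTEND unit on slide 17 lets ExtOp choose between the zero extension that ORI needs and the sign extension that LW and SW need. The sketch below assumes ExtOp = 1 means sign-extend, since the slide does not give the polarity. It computes the load address R[rs] + sign_ext(Imm16) with 32-bit wraparound, as a 32-bit ALU would. The function names are illustrative.

```python
MASK32 = 0xFFFF_FFFF


def extend16(imm16: int, ext_op: int) -> int:
    """SIGN/ZERO-EXTEND unit. Assumed polarity: ext_op = 1 sign-extends, 0 zero-extends."""
    imm16 &= 0xFFFF
    if ext_op and (imm16 & 0x8000):
        return imm16 | 0xFFFF_0000  # copy bit 15 into bits 31..16
    return imm16


def load_store_address(rs_value: int, imm16: int) -> int:
    """Addr = R[rs] + sign_ext(Imm16), wrapped to 32 bits."""
    return (rs_value + extend16(imm16, ext_op=1)) & MASK32


print(hex(extend16(0xFFFC, ext_op=1)))          # 0xfffffffc
print(hex(load_store_address(0x1000, 0xFFFC)))  # 0xffc (offset -4)
```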

  18. Steps 2 and 3: Store Word
      • SW rt, rs, Imm16
        • Need data memory: Mem[Addr] ← data
        • Addr is rs + Imm16; Imm16 is signed, so use the ALU for +
        • Store in memory: Mem[rs + Imm16] ← rt
      [Figure: RdData2 (the value of rt) drives the DATAMEM WrData input; control signals RegWrite, RegDst, ALUsrc, ALUop, MemWrite, MemtoReg, ExtOp]

  19. Writes: Need to Control Timing
      • Problem: writes to data memory
        • Data can come anytime
        • Addr must come first
        • MemWrite must come after Addr
        • Else? Writes go to the wrong Addr!
      • Solution: use an ideal data memory
        • Assume everything works OK
      • How to fix this for real?
        • One solution: synchronous memory
        • Another solution: delay MemWrite so that it arrives late
      • Problems?: writes to the register file
        • Does the RegWrite signal come after the WrReg number?
        • When does the write to a register happen?
        • What about a read from the same register that is being written?

  20. Missing Pieces: Instruction Fetching
      • Where does the instruction come from?
        • From instruction memory, of course!
        • Recall: the stored-program concept
        • Alternatives? How about hard-coding wires and switches? This is how ENIAC was programmed!
      • How to branch?
        • BEQ rs, rt, Imm16

  21. Instruction Processing
      • Fetch instruction
      • Execute instruction
      • Fetch next instruction
      • Execute next instruction
      • Fetch next instruction
      • Execute next instruction
      • Etc.
      • How to maintain the sequence? Use a counter!
      • Branches (out of sequence)? Load the counter!

  22. Instruction Processing
      • Program Counter
        • Points to the current instruction
        • Provides the address to instruction memory
          • Instr ← InstrMem[PC]
        • Next instruction: counts up by 4
          • Remember: memory is byte-addressable, and instructions are 4 bytes
          • PC ← PC + 4
        • Branch instruction: replace the PC contents
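
The fetch sequence on slides 21 and 22 can be sketched as a loop over a PC. The PC normally counts up by 4 and is loaded with a new value on a taken branch. Instruction memory is modeled as a Python list of words, so the word index is the byte address divided by 4. The `taken_branches` map is a hypothetical stand-in for branch instructions, because execution itself is not modeled here.

```python
def fetch_trace(instr_mem: list, taken_branches: dict, max_steps: int) -> list:
    """Return the sequence of PC values fetched, starting from PC = 0."""
    pc, trace = 0, []
    for _ in range(max_steps):
        index = pc // 4  # byte-addressable memory, 4-byte instructions
        if index >= len(instr_mem):
            break
        instr = instr_mem[index]  # Instr <- InstrMem[PC]
        trace.append(pc)
        # (executing `instr` would happen here)
        # Sequential: PC <- PC + 4. Taken branch: load the counter instead.
        pc = taken_branches.get(pc, pc + 4)
    return trace


print(fetch_trace([0] * 4, {}, max_steps=10))     # [0, 4, 8, 12]
print(fetch_trace([0] * 4, {8: 0}, max_steps=6))  # [0, 4, 8, 0, 4, 8]
```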

  23. Step 1: Analyze Instructions
      • Register Transfer Language (RTL)
        • R-type fetch: op | rs | rt | rd | shamt | funct = InstrMem[PC]
        • I-type fetch: op | rs | rt | Imm16 = InstrMem[PC]
      Instr   Register Transfers
      ADDU    R[rd] ← R[rs] + R[rt]; PC ← PC + 4
      SUBU    R[rd] ← R[rs] – R[rt]; PC ← PC + 4
      ORI     R[rt] ← R[rs] OR zero_ext(Imm16); PC ← PC + 4
      LOAD    R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
      STORE   MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
      BEQ     if (R[rs] == R[rt]) then PC ← PC + 4 + (sign_ext(Imm16) || b'00') else PC ← PC + 4
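
The RTL table on slide 23 is precise enough to execute directly. The sketch below is a minimal single-instruction interpreter. It assumes that instructions arrive already decoded as dictionaries (a hypothetical representation), that registers are a 32-entry list, and that data memory is a dict from byte address to 32-bit word. It returns the next PC so that BEQ's two outcomes are explicit. Register 0 is not special-cased.

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF_0000) if imm16 & 0x8000 else imm16


def zero_ext(imm16: int) -> int:
    return imm16 & 0xFFFF


def step(instr: dict, R: list, MEM: dict, pc: int) -> int:
    """Apply one instruction's register transfers; return the next PC."""
    name, rs, rt = instr["name"], instr.get("rs", 0), instr.get("rt", 0)
    next_pc = (pc + 4) & MASK32
    if name == "addu":
        R[instr["rd"]] = (R[rs] + R[rt]) & MASK32
    elif name == "subu":
        R[instr["rd"]] = (R[rs] - R[rt]) & MASK32
    elif name == "ori":
        R[rt] = R[rs] | zero_ext(instr["imm16"])
    elif name == "lw":
        R[rt] = MEM[(R[rs] + sign_ext(instr["imm16"])) & MASK32]
    elif name == "sw":
        MEM[(R[rs] + sign_ext(instr["imm16"])) & MASK32] = R[rt]
    elif name == "beq":
        if R[rs] == R[rt]:
            # PC + 4 + (sign_ext(Imm16) || b'00'): appending b'00' is a shift left by 2
            next_pc = (next_pc + (sign_ext(instr["imm16"]) << 2)) & MASK32
    else:
        raise ValueError(f"not in today's subset: {name}")
    return next_pc


R, MEM, pc = [0] * 32, {}, 0
program = [
    {"name": "ori", "rs": 0, "rt": 1, "imm16": 5},
    {"name": "ori", "rs": 0, "rt": 2, "imm16": 3},
    {"name": "subu", "rs": 2, "rt": 1, "rd": 3},     # 3 - 5 wraps to 0xfffffffe
    {"name": "sw", "rs": 0, "rt": 3, "imm16": 0x100},
    {"name": "lw", "rs": 0, "rt": 4, "imm16": 0x100},
]
for instr in program:
    pc = step(instr, R, MEM, pc)
print(hex(R[4]), pc)  # 0xfffffffe 20
```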

  24. Steps 2 and 3: Datapath & Assembly
      • PC: a register
        • Counter, counts by +4
        • Provides the address to Instruction Memory
      [Figure: PC drives the Read address input of Instruction Memory, which outputs Instruction[31:0]; an Add unit computes PC + 4]

  25. Steps 2 and 3: Datapath & Assembly
      • PC: a register
        • Counter, counts by +4
        • Sometimes, must add SignExtend(Imm16) || b'00' for branch instructions
      • Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
      [Figure: a second adder adds PC + 4 to the output of Sign/Zero Extend(Instruction[15:0]) (16 → 32 bits, controlled by ExtOp) passed through Shift Left 2; a multiplexer controlled by PCSrc selects PC + 4 or this branch target as the next PC; the instruction fields Instruction[25:21], [20:16], [15:11], and [15:0] are broken out]
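
The branch-target adder on slide 25 computes PC + 4 + (sign-extended Imm16 shifted left by 2). The standalone sketch below (the function name is hypothetical) shows why the shift matters. Imm16 counts instructions (words), so appending b'00' converts it into a byte offset. This gives a branch a reach of −32768 to +32767 instructions from PC + 4.

```python
def branch_target(pc: int, imm16: int) -> int:
    """PC + 4 + (sign_ext(Imm16) || b'00'), modulo 2**32."""
    imm16 &= 0xFFFF
    offset_words = imm16 - 0x1_0000 if imm16 & 0x8000 else imm16  # sign-extend
    offset_bytes = offset_words << 2                               # append b'00'
    return (pc + 4 + offset_bytes) & 0xFFFF_FFFF


print(hex(branch_target(0x0040_0000, 3)))       # 0x400010: 3 instructions past PC + 4
print(hex(branch_target(0x0040_0010, 0xFFFF)))  # 0x400010: offset -1 branches to itself
```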

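  26. Steps 2 and 3: Add Previous Datapath
      [Figure: complete single-cycle datapath. PC → Instruction Memory; Instruction[25:21] → Read reg. 1; Instruction[20:16] → Read reg. 2; the RegDst multiplexer selects Instruction[20:16] or Instruction[15:11] as Write reg.; Instruction[15:0] (Imm16) → Sign/Zero Extend (16 → 32 bits, ExtOp); the ALUSrc multiplexer selects Read data 2 or the extended immediate; the ALU (outputs Zero and ALU result) is driven by ALU Control from Instruction[5:0] (funct) and ALUOp; ALU result → Data Memory address; Read data 2 → Data Memory write data (MemWrite); the MemtoReg multiplexer selects the ALU result or Data Memory read data as the register Write data (RegWrite); the PC + 4 adder, Shift Left 2, branch adder, and PCSrc multiplexer produce the next PC]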

  27. What have we done?
      • Created a simple CPU datapath
        • Control still missing (next slide)
      • Single-cycle CPU
        • Every instruction takes 1 clock cycle
        • Clocking?

  28. One Clock Cycle
      • Clock locations
        • PC and REGFILE have clocks
      • Operation
        • On the rising edge, PC gets a new value
          • REGFILE may also have one register updated
        • After the rising edge
          • PC and REGFILE hold their values until the next edge
          • New value comes out of PC
          • Instruction comes out of INSTRMEM
          • Instruction selects the registers to read from REGFILE
          • Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
          • ALU does its work
          • DataMem may be read (depending on the instruction)
          • Result value goes back to REGFILE
          • New PC value goes back to PC
          • Await the next clock edge
      • Lots to do in only 1 clock cycle!!

  29. Missing Steps?
      • Control is missing (Steps 4 and 5 mentioned earlier)
        • Generate the control signals (shown in green in the datapath figures)
        • ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
        • These are all f(Instruction), where f() is a logic expression
        • Will look at control strategies in an upcoming lecture
      • Implementation details
        • How to implement REGFILE?
          • Read port: tristate buffers? Multiplexer? Memory?
          • Two read ports: two of the above?
          • Write port: how to write only 1 register?
        • How to control writes to memory? To the register file?
      • More instructions
        • Shift instructions
        • Jump instruction
        • Etc.

  30. 1-Cycle CPU Datapath
      [Figure: the complete single-cycle datapath, same as slide 26]

  31. 1-cycle CPU Datapath + Control
      [Figure: the single-cycle datapath with its control unit. The Control block decodes Instruction[31:26] into RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; the ALU control block uses Instruction[5:0] and ALUOp; Branch and the ALU Zero output determine PCSrc]

  32. 1-cycle CPU Control – Lookup Table
      [Table: control-signal values for each instruction type (not recoverable)]
      • Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
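
The slide's lookup table is not reproduced here, so the sketch below is a hedged reconstruction. It uses the widely published textbook control values for R-type, lw, sw, and beq, with "X" for don't-care. It follows the convention that RegDst = 1 selects rd and MemtoReg = 1 selects memory data. The ori row, the ExtOp column, and the "or" ALUOp code are assumptions added to cover the slide's note about ORI and ExtOp, so the lecture's actual table may differ. PCSrc is formed as Branch AND Zero, which is the usual arrangement for this datapath.

```python
X = "X"  # don't care

# Main control, indexed by opcode = Instruction[31:26].
CONTROL = {
    0x00: dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,  # R-type
               MemWrite=0, Branch=0, ALUOp="10", ExtOp=X),
    0x23: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,  # lw
               MemWrite=0, Branch=0, ALUOp="00", ExtOp=1),
    0x2B: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0,  # sw
               MemWrite=1, Branch=0, ALUOp="00", ExtOp=1),
    0x04: dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0,  # beq
               MemWrite=0, Branch=1, ALUOp="01", ExtOp=1),
    0x0D: dict(RegDst=0, ALUSrc=1, MemtoReg=0, RegWrite=1, MemRead=0,  # ori (assumed row)
               MemWrite=0, Branch=0, ALUOp="or", ExtOp=0),
}


def main_control(instr: int) -> dict:
    return CONTROL[(instr >> 26) & 0x3F]


def alu_control(alu_op: str, funct: int) -> str:
    """ALU control: ALUOp from main control, funct = Instruction[5:0]."""
    if alu_op == "00":
        return "add"  # lw/sw address computation
    if alu_op == "01":
        return "sub"  # beq comparison
    if alu_op == "or":
        return "or"   # assumed ori encoding
    return {0x21: "add", 0x23: "sub"}[funct]  # R-type: funct decides


def pc_src(ctrl: dict, zero: int) -> int:
    return ctrl["Branch"] & zero  # branch only for beq when R[rs] == R[rt]


ctrl = main_control(0x8FA8_0004)  # lw $8, 4($29)
print(ctrl["MemtoReg"], alu_control(ctrl["ALUOp"], 0x04))  # 1 add
```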

  33. 1-cycle CPU + Jump Instruction
      [Figure: the single-cycle datapath extended for jumps. Jump address[31:0] is formed from PC + 4 [31:28] and Instruction[25:0] shifted left by 2; control decodes Instruction[31:26]; fields Instruction[25:21], [20:16], [15:11], [15:0], and [5:0] are used as before]
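
The jump address on slide 33 keeps the upper 4 bits of PC + 4 and replaces the remaining bits with Instruction[25:0] shifted left by 2, which is the standard MIPS J-type rule. Here is a sketch with a hypothetical function name:

```python
def jump_address(pc: int, target26: int) -> int:
    """{ (PC + 4)[31:28], Instruction[25:0], b'00' }"""
    upper4 = ((pc + 4) & 0xFFFF_FFFF) & 0xF000_0000  # keep bits 31..28 of PC + 4
    return upper4 | ((target26 & 0x03FF_FFFF) << 2)  # 26 + 2 = 28 low-order bits


print(hex(jump_address(0x0040_0000, 0x010_0000)))  # 0x400000
print(hex(jump_address(0x8FFF_FFFC, 0)))           # 0x90000000: upper bits come from PC + 4
```

Because the top 4 bits are inherited, a jump can only reach addresses within the same 256 MB region as PC + 4.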

  34. 1-cycle CPU Problems?
      • Every instruction takes 1 cycle
        • Some instructions "do more work"
          • E.g., lw must read from DATAMEM
        • All instructions must have the same clock period…
        • Many instructions run slower than necessary
      • Tricky timing on MemWrite, RegWrite(?) signals
        • Write signal must come *after* the address is stable
      • Need extra resources…
        • PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM

  35. Performance!
      • Single-cycle CPU performance
        • Execute one instruction per clock cycle (CPI = 1)
        • Clock cycle time? Note that the dataflow includes:
          • INSTRMEM read
          • REGFILE access
          • Sign extension
          • ALU operation
          • DATAMEM read
          • REGFILE/PC write
        • Not every instruction uses all resources (e.g., DATAMEM read)
        • Can we change the clock period for each instruction?
          • No! (Why not?)
        • One clock period: the worst case!
      • This is why a single-cycle CPU is not good for performance
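
The sketch below makes "one clock period: the worst case" concrete. It sums component delays along each instruction class's path and takes the maximum. The delay values are hypothetical round numbers chosen for illustration and do not come from this lecture. The sketch also assumes that sign extension overlaps the register read, so it adds no time.

```python
# Hypothetical component delays in picoseconds (illustrative only).
DELAY = {"INSTRMEM": 200, "REGREAD": 100, "ALU": 200, "DATAMEM": 200, "REGWRITE": 100}

# Resources each instruction class uses, in dataflow order.
PATH = {
    "R-type": ["INSTRMEM", "REGREAD", "ALU", "REGWRITE"],
    "lw":     ["INSTRMEM", "REGREAD", "ALU", "DATAMEM", "REGWRITE"],
    "sw":     ["INSTRMEM", "REGREAD", "ALU", "DATAMEM"],
    "beq":    ["INSTRMEM", "REGREAD", "ALU"],
}


def path_delay(kind: str) -> int:
    return sum(DELAY[unit] for unit in PATH[kind])


# A single-cycle CPU has one clock period, so it must fit the slowest path.
clock_period = max(path_delay(kind) for kind in PATH)

for kind in PATH:
    print(f"{kind:7s} needs {path_delay(kind)} ps, gets {clock_period} ps")
```

With these made-up numbers, an R-type instruction needs 600 ps but still occupies the 800 ps cycle set by lw.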

  36. 1-cycle CPU Datapath + Controller
      [Figure: the single-cycle datapath and control unit, including the jump path, same as slide 33]

  37. 1-cycle CPU Summary
      • Operation
        • 1 cycle per instruction
        • Control signals held fixed during the entire cycle (except BRANCH)
        • Only 2 registers
          • PC, updated every clock cycle
          • REGFILE, updated when required
        • During the clock cycle, data flows from register outputs to register inputs
        • Fixed clock frequency / period
      • Performance
        • 1 instruction per cycle
        • Slowest instruction determines the clock frequency
      • Outstanding issue: MemWrite timing
        • Assume this signal writes to memory at the end of the clock cycle

  38. Multi-cycle CPU Goals
      • Improve performance
        • Break each instruction into smaller steps / multiple cycles
          • LW instruction → 5 cycles
          • SW instruction → 4 cycles
          • R-type instruction → 4 cycles
          • Branch, Jump → 3 cycles
        • Aim for 5x clock frequency
          • Complex instructions (e.g., LW) → 5 cycles → same performance as before
          • Simple instructions (e.g., ADD) → fewer cycles → faster
      • Save resources (gates/transistors)
        • Re-use the ALU over multiple cycles
        • Put INSTR + DATA in the same memory
      • MemWrite timing solved?
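
A small calculation can check the payoff claimed on slide 38. The cycle counts below come from the slide, but the instruction mix is a hypothetical workload. The sketch assumes the ideal case where the multi-cycle clock really is 5x faster, so each single-cycle instruction lasts 5 short cycles. Integer percentages keep the cycle totals exact.

```python
CYCLES = {"lw": 5, "sw": 4, "R-type": 4, "branch/jump": 3}  # from the slide
FREQ_RATIO = 5  # "aim for 5x clock frequency" (ideal case)


def speedup(mix_percent: dict) -> float:
    """Single-cycle time / multi-cycle time for an instruction mix given in percent."""
    single = 100 * FREQ_RATIO                                   # short cycles per 100 instructions
    multi = sum(p * CYCLES[k] for k, p in mix_percent.items())  # short cycles per 100 instructions
    return single / multi


hypothetical_mix = {"lw": 25, "sw": 10, "R-type": 45, "branch/jump": 20}
print(round(speedup(hypothetical_mix), 3))  # 1.235 (average CPI = 4.05)
print(speedup({"lw": 100}))                 # 1.0: all-LW code gains nothing
```

The gain comes entirely from instructions that finish in fewer than 5 cycles, which matches the slide's "complex instructions → same performance, simple instructions → faster".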

  39. Multi-cycle CPU Datapath
      [Figure: multi-cycle datapath. PC; a single Memory (Address, Write data, MemData); Instruction Register; Memory Data Register; Registers (RdReg1 ← Instruction[25:21], RdReg2 ← Instruction[20:16], Write reg, Write data, with Instruction[15:11] available as the write register); A and B registers; Sign Extend and Shift Left 2 on Instruction[15:0]; one ALU (Zero, ALU result) with the constant 4 as an input option; ALUOut register; ALU control from Instruction[5:0]; five multiplexers]
      • Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
      • Move signal paths (+4, Shift Left 2)

  40. Multi-cycle CPU Datapath
      [Figure: the multi-cycle datapath of slide 39]
      • Add registers + control signals (IR, MDR, A, B, ALUOut)
      • Registers with no control signal load a value every clock cycle (e.g., PC)

  41. Instruction Execution Example
      • Execute a "Load Word" instruction
        • LW rt, 0(rs)
      • 5 steps
        1. Fetch instruction
        2. Read registers
        3. Compute address
        4. Read data
        5. Write registers

  42. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with the active path highlighted]
      1. Fetch Instruction: InstructionRegister ← Mem[PC]

  43. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with the active path highlighted]
      2. Read Registers: A ← Registers[Rs]

  44. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with the active path highlighted]
      3. Compute Address: ALUOut ← A + SignExt(Imm16)

  45. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with the active path highlighted]
      4. Read Data: MDR ← Memory[ALUOut]

  46. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with the active path highlighted]
      5. Write Registers: Registers[Rt] ← MDR

  47. Load Word Instruction Sequence
      [Figure: the multi-cycle datapath of slide 39, with all 5 steps shown]

  48. Multi-cycle Load Word: Recap
      1. Fetch Instruction: InstructionRegister ← Mem[PC]
      2. Read Registers: A ← Registers[Rs]
      3. Compute Address: ALUOut ← A + SignExt(Imm16)
      4. Read Data: MDR ← Memory[ALUOut]
      5. Write Registers: Registers[Rt] ← MDR
      • Missing steps?

  49. Multi-cycle Load Word: Recap
      1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
      2. Read Registers: A ← Registers[Rs]
      3. Compute Address: ALUOut ← A + SignExt(Imm16)
      4. Read Data: MDR ← Memory[ALUOut]
      5. Write Registers: Registers[Rt] ← MDR
      • Missing steps?
        • Must increment the PC
        • Do it as part of the instruction fetch (in step 1)
        • Need a PCWrite control signal

  50. Multi-cycle R-Type Instruction
      1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
      2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
      3. Compute Value: ALUOut ← A op B
      4. Write Registers: Registers[Rd] ← ALUOut
      • RTL describes the data-flow action in each clock cycle
        • Control signals determine the precise data flow
        • Each step implies unique control values
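
The per-cycle RTL on slides 41 to 50 can be sketched as code that performs one register transfer per clock cycle and records it, so the cycle count falls out of the trace. This minimal model assumes a single memory for instructions and data, as slide 38 proposes, modeled as a dict from byte address to word. It handles only LW, ADDU, and SUBU, and the function name and trace strings are illustrative. It reads both A and B in cycle 2 for every instruction, which is harmless for LW.

```python
MASK32 = 0xFFFF_FFFF


def sign_ext(imm16: int) -> int:
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF_0000) if imm16 & 0x8000 else imm16


def run_one(mem: dict, regs: list, pc: int) -> tuple:
    """Execute the instruction at PC, one RTL step per cycle; return (next PC, trace)."""
    trace = []
    # Cycle 1: fetch and increment the PC (this is what needs PCWrite)
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    trace.append("IR <- Mem[PC]; PC <- PC + 4")
    op, rs, rt = ir >> 26, (ir >> 21) & 0x1F, (ir >> 16) & 0x1F
    rd, funct = (ir >> 11) & 0x1F, ir & 0x3F
    # Cycle 2: read registers into A and B
    a, b = regs[rs], regs[rt]
    trace.append("A <- R[rs]; B <- R[rt]")
    if op == 0x23:  # lw
        alu_out = (a + sign_ext(ir)) & MASK32
        trace.append("ALUOut <- A + SignExt(Imm16)")
        mdr = mem[alu_out]
        trace.append("MDR <- Mem[ALUOut]")
        regs[rt] = mdr
        trace.append("R[rt] <- MDR")
    elif op == 0 and funct in (0x21, 0x23):  # addu / subu
        alu_out = (a + b if funct == 0x21 else a - b) & MASK32
        trace.append("ALUOut <- A op B")
        regs[rd] = alu_out
        trace.append("R[rd] <- ALUOut")
    else:
        raise ValueError("only lw, addu, and subu are modeled in this sketch")
    return pc, trace


mem = {0: 0x8FA8_0004,  # lw $8, 4($29)
       4: 0x0022_1821,  # addu $3, $1, $2
       0x104: 42}       # data word
regs = [0] * 32
regs[29], regs[1], regs[2] = 0x100, 5, 7

pc, t = run_one(mem, regs, 0)
print(len(t), regs[8])  # 5 42
pc, t = run_one(mem, regs, pc)
print(len(t), regs[3])  # 4 12
```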
