55:035 Computer Architecture and Organization. Lecture 9.
Outline
• Building a CPU
• Basic Components
• MIPS Instructions
• Basic 5 Steps for CPU
• Single-Cycle Design
• Multi-cycle Design
• Comparison of Single and Multi-cycle Designs
Overview
• Brief look
  • Digital logic
  • CPU Datapath
  • MIPS Example
Digital Logic
[Figure: D-type flip-flop (input D, edge-triggered Clock, output Q); D-type flip-flop with enable (inputs D and EN, edge-triggered Clock, output Q); multiplexer (data inputs A and B, select input S, output F)]
Digital Logic: Registers
[Figure: 1-bit, 4-bit (D0 to D3, Q0 to Q3) and N-bit registers, each with an EN input and an edge-triggered Clock]
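The components above can be modeled behaviorally. The following is a minimal sketch, not the lecture's own implementation: the names `mux2`, `DFlipFlopEn` and `Register` are hypothetical, and the multiplexer polarity (S = 0 selects A) is an assumption.

```python
def mux2(a, b, s):
    """2-to-1 multiplexer: F = A when S is 0, F = B when S is 1 (assumed polarity)."""
    return b if s else a


class DFlipFlopEn:
    """Edge-triggered D flip-flop with enable (hypothetical model)."""

    def __init__(self, q=0):
        self.q = q

    def clock_edge(self, d, en=1):
        # Behaves like a plain D flip-flop fed by mux2(Q, D, EN):
        # Q keeps its old value unless EN is asserted at the clock edge.
        self.q = mux2(self.q, d, en)
        return self.q


class Register:
    """N-bit register: N enabled flip-flops sharing EN and Clock."""

    def __init__(self, n_bits):
        self.bits = [DFlipFlopEn() for _ in range(n_bits)]

    def clock_edge(self, value, en=1):
        # Bit i of the input goes to flip-flop i.
        for i, ff in enumerate(self.bits):
            ff.clock_edge((value >> i) & 1, en)
        return self.value()

    def value(self):
        return sum(ff.q << i for i, ff in enumerate(self.bits))
```

With EN held at 0 the register ignores its input at the clock edge, which is the behavior the datapath later relies on for RegWrite-style control.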
Digital Logic: Tri-state Driver (Buffer)
[Figure: tri-state buffer with input "in", output "out", and control "drive"]
What is Z?
Digital Logic: Adder/Subtractor or ALU
[Figure: ALU with inputs A, B and Carry-in, control input Add/sub or ALUop, and outputs F and Carry-out]
Overview
• Brief look
  • Digital logic
  • How to Design a CPU Datapath
  • MIPS Example
Designing a CPU: 5 Steps
1. Analyze the instruction set datapath requirements
  • MIPS: ADD, SUB, ORI, LW, SW, BR
  • Meaning of each instruction given by RTL (register transfers)
  • 2 types of registers: CPU/ISA registers, temporary registers
2. Datapath requirements select the datapath components
  • ALU, register file, adder, data memory, etc.
3. Assemble the datapath
  • Datapath must support planned register transfers
  • Ensure all instructions are supported
4. Analyze datapath control required for each instruction
5. Assemble the control logic
Step 1a: Analyze ISA
• All MIPS instructions are 32 bits long.
• Three instruction formats:
  • R-type (R: registers)
  • I-type (I: immediate)
  • J-type (J: jumps)
• These formats were intentionally chosen to simplify design

R-type: op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | rd [15:11], 5 bits | shamt [10:6], 5 bits | funct [5:0], 6 bits
I-type: op [31:26], 6 bits | rs [25:21], 5 bits | rt [20:16], 5 bits | immediate [15:0], 16 bits
J-type: op [31:26], 6 bits | target address [25:0], 26 bits
Step 1b: Analyze ISA
[Figure: R-type, I-type and J-type field layouts, repeated from the previous slide]
• Meaning of the fields:
  • op: operation of the instruction
  • rs, rt, rd: the source and destination register specifiers
    • Destination is either rd (R-type) or rt (I-type)
  • shamt: shift amount
  • funct: selects the variant of the operation in the "op" field
  • immediate: address offset or immediate value
  • target address: target address of the jump instruction
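The field boundaries above translate directly into shifts and masks. A minimal sketch follows; the function name `decode` and the dictionary layout are illustrative assumptions, while the bit positions are the ones given on the slide.

```python
def decode(instr):
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted regardless of format; which ones are
    meaningful depends on whether the word is R-, I- or J-type.
    """
    return {
        "op":     (instr >> 26) & 0x3F,   # bits 31:26
        "rs":     (instr >> 21) & 0x1F,   # bits 25:21
        "rt":     (instr >> 16) & 0x1F,   # bits 20:16
        "rd":     (instr >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (instr >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  instr & 0x3F,           # bits 5:0   (R-type)
        "imm16":  instr & 0xFFFF,         # bits 15:0  (I-type)
        "target": instr & 0x03FFFFFF,     # bits 25:0  (J-type)
    }


# addu $8, $9, $10 encodes as 0x012A4021 (op = 0, funct = 0x21).
fields = decode(0x012A4021)
print(fields["rs"], fields["rt"], fields["rd"])
```

Extracting every field unconditionally mirrors the hardware, where the wires for all fields exist and the control logic decides which ones are used.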
MIPS ISA: subset for today
• ADD and SUB (R-type)
  • addU rd, rs, rt
  • subU rd, rs, rt
• OR Immediate (I-type):
  • ori rt, rs, imm16
• LOAD and STORE Word (I-type)
  • lw rt, rs, imm16
  • sw rt, rs, imm16
• BRANCH (I-type):
  • beq rs, rt, imm16
[Figure: R-type field layout for addU/subU; I-type field layout for ori, lw/sw and beq]
Step 2: Datapath Requirements
• REGISTER FILE
  • MIPS ISA requires 32 registers, 32b each
  • Called a register file
  • Contains 32 entries, each entry is 32b
• AddU rd,rs,rt or SubU rd,rs,rt
  • Read two sources: rs, rt
  • Operation: rs + rt or rs - rt
  • Write destination: rd ← rs +/- rt
• Requirements
  • Read two registers (rs, rt)
  • Perform ALU operation
  • Write a third register (rd)
• How to implement?
[Figure: REGFILE with register-number inputs RdReg1, RdReg2, WrReg (5 bits each), data ports RdData1, RdData2, WrData, and control RegWrite; ALU with control ALUop and outputs Result and Zero?]
Step 3: Datapath Assembly
• ADDU rd, rs, rt
• SUBU rd, rs, rt
• Need an ALU
  • Hook it up to REGISTER FILE
  • REGFILE has 2 read ports (rs, rt), 1 write port (rd)
• Parameters come from instruction fields
• Control signals depend upon instruction fields
  • E.g.: ALUop = f(Instruction) = f(op, funct)
[Figure: rs, rt, rd drive RdReg1, RdReg2, WrReg of REGFILE; RdData1 and RdData2 feed the ALU; ALU Result returns to WrData; control signals ALUop and RegWrite]
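The register file and ALU hookup described above can be sketched as follows. This is an illustrative model under stated assumptions: the class `RegFile`, the function `alu`, and the string-valued `alu_op` encoding are hypothetical, and the special behavior of MIPS register 0 is not modeled.

```python
MASK32 = 0xFFFFFFFF


class RegFile:
    """32 x 32-bit register file: 2 read ports, 1 write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rd_reg1, rd_reg2):
        # Both read ports are combinational: data follows the register numbers.
        return self.regs[rd_reg1], self.regs[rd_reg2]

    def clock_edge(self, wr_reg, wr_data, reg_write):
        # The write happens only at a clock edge, and only if RegWrite is set.
        if reg_write:
            self.regs[wr_reg] = wr_data & MASK32


def alu(a, b, alu_op):
    """ALU with a hypothetical string-valued ALUop; returns (result, zero)."""
    if alu_op == "add":
        result = (a + b) & MASK32
    elif alu_op == "sub":
        result = (a - b) & MASK32
    elif alu_op == "or":
        result = a | b
    else:
        raise ValueError(f"unsupported ALUop: {alu_op}")
    return result, int(result == 0)


# ADDU rd, rs, rt with rs = 9, rt = 10, rd = 8
rf = RegFile()
rf.clock_edge(9, 5, reg_write=1)
rf.clock_edge(10, 7, reg_write=1)
a, b = rf.read(9, 10)
result, zero = alu(a, b, "add")
rf.clock_edge(8, result, reg_write=1)
```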
Steps 2 and 3: ORI Instruction
• ORI rt, rs, Imm16
• Need new ALUop for 'OR' function, hook up to REGFILE
  • 1 read port (rs), 1 write port (rt), 1 const value (Imm16)
• Control signals depend upon instruction fields
  • E.g.: ALUsrc = f(Instruction) = f(op, funct)
[Figure: rs drives RdReg1; rt drives WrReg; a multiplexer controlled by ALUsrc selects between RdData2 (input 0) and the zero-extended 16-bit Imm16 (input 1) as the second ALU operand; control signals RegWrite, ALUsrc, ALUop]
Steps 2 and 3: Destination Register
• Must select proper destination, rd or rt
• Depends on instruction type
  • R-type may write rd
  • I-type may write rt
[Figure: a multiplexer controlled by RegDst selects rd or rt as WrReg; the ALUsrc multiplexer selects RdData2 or the zero-extended Imm16; control signals RegWrite, RegDst, ALUsrc, ALUop]
Steps 2 and 3: Load Word
• LW rt, rs, Imm16
• Need Data Memory: data ← Mem[Addr]
  • Addr is rs+Imm16, Imm16 is signed, use ALU for +
• Store in rt: rt ← Mem[rs+Imm16]
[Figure: the ALU Result drives the DATAMEM Addr input; a multiplexer controlled by MemtoReg selects between the ALU Result and DATAMEM RdData as REGFILE WrData; the extender becomes SIGN/ZERO-EXTEND, controlled by ExtOp; control signals RegWrite, RegDst, ALUsrc, ALUop, MemtoReg, ExtOp]
Steps 2 and 3: Store Word
• SW rt, rs, Imm16
• Need Data Memory: Mem[Addr] ← data
  • Addr is rs+Imm16, Imm16 is signed, use ALU for +
• Store in Mem: Mem[rs+Imm16] ← rt
[Figure: RdData2 (the value of rt) drives the DATAMEM WrData input; the ALU Result drives Addr; control signals RegWrite, RegDst, ALUsrc, ALUop, MemWrite, MemtoReg, ExtOp]
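The difference between zero extension (used by ORI) and sign extension (used for LW/SW address offsets) is easy to get wrong, so a small sketch may help. The function names `zero_ext`, `sign_ext` and `effective_address` are illustrative, not taken from the slides.

```python
MASK32 = 0xFFFFFFFF


def zero_ext(imm16):
    """Zero-extend a 16-bit immediate to 32 bits (upper bits become 0)."""
    return imm16 & 0xFFFF


def sign_ext(imm16):
    """Sign-extend a 16-bit immediate to 32 bits (bit 15 is copied upward)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def effective_address(rs_value, imm16):
    """Address used by LW and SW: R[rs] + sign_ext(Imm16), modulo 2**32."""
    return (rs_value + sign_ext(imm16)) & MASK32


# An offset of -4 is encoded as 0xFFFC in the 16-bit immediate field.
print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc
```

With zero extension the same immediate would be read as +65532, which is why the extender needs the ExtOp control signal.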
Writes: Need to Control Timing
• Problem: write to data memory
  • Data can come anytime
  • Addr must come first
  • MemWrite must come after Addr
    • Else? Writes to wrong Addr!
• Solution: use ideal data memory
  • Assume everything works OK
  • How to fix this for real?
    • One solution: synchronous memory
    • Another solution: delay MemWr to come late
• Problems?: write to register file
  • Does RegWrite signal come after WrReg number?
  • When does the write to a register happen?
  • Read from same register as being written?
Missing Pieces: Instruction Fetching
• Where does the Instruction come from?
  • From instruction memory, of course!
  • Recall: stored-program concept
  • Alternatives? How about hard-coding wires and switches…? This is how ENIAC was programmed!
• How to branch?
  • BEQ rs, rt, Imm16
Instruction Processing
• Fetch instruction
• Execute instruction
• Fetch next instruction
• Execute next instruction
• Fetch next instruction
• Execute next instruction
• Etc…
• How to maintain sequence? Use a counter!
• Branches (out of sequence)? Load the counter!
Instruction Processing
• Program Counter
  • Points to current instruction
  • Address to instruction memory
    • Instr ← InstrMem[PC]
• Next instruction: counts up by 4
  • Remember: memory is byte-addressable, instructions are 4 bytes
    • PC ← PC + 4
• Branch instruction: replace PC contents
Step 1: Analyze Instructions
• Register Transfer Language…
  op | rs | rt | rd | shamt | funct = InstrMem[ PC ]
  op | rs | rt | Imm16 = InstrMem[ PC ]

Instr    Register Transfers
ADDU     R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU     R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI      R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LOAD     R[rt] ← MEM[ R[rs] + sign_ext(Imm16) ]; PC ← PC + 4
STORE    MEM[ R[rs] + sign_ext(Imm16) ] ← R[rt]; PC ← PC + 4
BEQ      if ( R[rs] == R[rt] ) then PC ← PC + 4 + { sign_ext(Imm16) || b'00' } else PC ← PC + 4
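The register transfers above are precise enough to execute directly. The sketch below is one possible instruction-level interpreter for the six-instruction subset, with ORI implemented as a bitwise OR. The function `step`, the dictionary-based memories, and the sample program are assumptions made for illustration; the opcode and funct values are the standard MIPS encodings, and the hard-wiring of register 0 is not modeled.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def step(pc, regs, imem, dmem):
    """Perform the register transfers of one instruction; return the next PC."""
    instr = imem[pc]                                   # Instr <- InstrMem[PC]
    op, rs, rt = instr >> 26, (instr >> 21) & 31, (instr >> 16) & 31
    rd, funct, imm = (instr >> 11) & 31, instr & 63, instr & 0xFFFF
    next_pc = (pc + 4) & MASK32                        # PC <- PC + 4 (default)
    if op == 0x00 and funct == 0x21:                   # ADDU
        regs[rd] = (regs[rs] + regs[rt]) & MASK32
    elif op == 0x00 and funct == 0x23:                 # SUBU
        regs[rd] = (regs[rs] - regs[rt]) & MASK32
    elif op == 0x0D:                                   # ORI: imm is zero-extended
        regs[rt] = regs[rs] | imm
    elif op == 0x23:                                   # LW (unwritten memory reads 0)
        regs[rt] = dmem.get((regs[rs] + sign_ext(imm)) & MASK32, 0)
    elif op == 0x2B:                                   # SW
        dmem[(regs[rs] + sign_ext(imm)) & MASK32] = regs[rt]
    elif op == 0x04:                                   # BEQ
        if regs[rs] == regs[rt]:
            next_pc = (next_pc + (sign_ext(imm) << 2)) & MASK32
    else:
        raise ValueError(f"unsupported instruction {instr:#010x}")
    return next_pc


# Hypothetical test program, keyed by byte address.
imem = {
    0:  0x34010005,  # ori  $1, $0, 5
    4:  0x34020007,  # ori  $2, $0, 7
    8:  0x00221821,  # addu $3, $1, $2
    12: 0xAC030000,  # sw   $3, 0($0)
    16: 0x8C040000,  # lw   $4, 0($0)
    20: 0x10640001,  # beq  $3, $4, +1   (skips the next instruction)
    24: 0x34050001,  # ori  $5, $0, 1    (not executed)
    28: 0x00613023,  # subu $6, $3, $1
}
regs, dmem, pc = [0] * 32, {}, 0
while pc in imem:
    pc = step(pc, regs, imem, dmem)
```

Each `step` call corresponds to one clock cycle of the single-cycle design: everything between fetching the instruction and updating the PC happens in one pass.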
Steps 2 and 3: Datapath & Assembly
• PC: a register
  • Counter, counts by +4
  • Provides address to Instruction Memory
[Figure: PC drives the Read address input of Instruction Memory, which outputs Instruction[31:0]; an adder computes PC + 4 and feeds it back to the PC]
Steps 2 and 3: Datapath & Assembly
• PC: a register
  • Counter, counts by +4
  • Sometimes, must add SignExtend{Imm16||b'00'} for branch instructions
• Note: the sign-extender for Imm16 is already in the datapath (everything else is new)
[Figure: a second adder adds PC + 4 to the sign-extended Instruction[15:0] (Imm16, 16 to 32 bits) after Shift Left 2; a multiplexer controlled by PCSrc selects PC + 4 or the branch adder result as the next PC; Instruction Memory outputs Instruction[25:21], Instruction[20:16], Instruction[15:11]; the extender is controlled by ExtOp]
Steps 2 and 3: Add Previous Datapath
[Figure: complete single-cycle datapath. Fetch logic (PC, Instruction Memory, PC + 4 adder, Shift Left 2, branch adder, PCSrc multiplexer) joined to the Register File (Read reg. 1 from Instruction[25:21], Read reg. 2 from Instruction[20:16], Write reg. selected by RegDst between Instruction[20:16] and Instruction[15:11]), the Sign/Zero Extend unit on Instruction[15:0] (Imm16), the ALU with ALU Control driven by Instruction[5:0] (funct) and ALUOp, and Data Memory. Control signals: RegWrite, PCSrc, MemtoReg, ALUSrc, RegDst, MemWrite, ExtOp, ALUOp]
What have we done?
• Created a simple CPU datapath
  • Control still missing (next slide)
• Single-cycle CPU
  • Every instruction takes 1 clock cycle
  • Clocking?
One Clock Cycle
• Clock Locations
  • PC, REGFILE have clocks
• Operation
  • On rising edge, PC will get new value
    • Maybe REGFILE will have one value updated as well
  • After rising edge
    • PC and REGFILE can't change
    • New value out of PC
    • Instruction out of INSTRMEM
    • Instruction selects registers to read from REGFILE
    • Instruction controls ALUop, ALUsrc, MemWrite, ExtOp, etc.
    • ALU does its work
    • DataMem may be read (depending on instruction)
    • Result value goes back to REGFILE
    • New PC value goes back to PC
    • Await next clock edge
Lots to do in only 1 clock cycle!
Missing Steps?
• Control is missing (Steps 4 and 5 we mentioned earlier)
  • Generate the green signals
    • ALUsrc, MemWrite, MemtoReg, PCSrc, RegDst, etc.
  • These are all f(Instruction), where f() is a logic expression
  • Will look at control strategies in upcoming lecture
• Implementation Details
  • How to implement REGFILE?
    • Read port: tristate buffers? Multiplexer? Memory?
    • Two read ports: two of above?
    • Write port: how to write only 1 register?
  • How to control writes to memory? To register file?
• More instructions
  • Shift instructions
  • Jump instruction
  • Etc.
1-Cycle CPU Datapath
[Figure: the complete single-cycle datapath from the previous slides, with control signals RegWrite, PCSrc, MemtoReg, ALUSrc, RegDst, MemWrite, ExtOp and ALUOp]
1-cycle CPU Datapath + Control
[Figure: single-cycle datapath with a Control unit that decodes Instruction[31:26] and generates RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; PCSrc drives the next-PC selection; ALU control decodes Instruction[5:0]]
1-cycle CPU Control – Lookup Table • Also: I-type instructions (ORI) & ExtOp (sign-extend control), etc.
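Since every control signal is a function of the instruction, the controller can be expressed as a lookup table. The values below follow the common textbook convention for this style of MIPS datapath (for example, RegDst = 1 selects rd and MemtoReg = 1 selects the memory data); they are an assumption for illustration and are not taken from this lecture's table, which may use different polarities. `None` marks a don't-care value, and the names `CONTROL` and `pc_src` are hypothetical.

```python
# One row per instruction class; None means "don't care".
CONTROL = {
    "R-type": dict(RegDst=1,    ALUSrc=0, MemtoReg=0,    RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ExtOp=None),
    "ORI":    dict(RegDst=0,    ALUSrc=1, MemtoReg=0,    RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ExtOp="zero"),
    "LW":     dict(RegDst=0,    ALUSrc=1, MemtoReg=1,    RegWrite=1,
                   MemRead=1, MemWrite=0, Branch=0, ExtOp="sign"),
    "SW":     dict(RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=1, Branch=0, ExtOp="sign"),
    "BEQ":    dict(RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
                   MemRead=0, MemWrite=0, Branch=1, ExtOp="sign"),
}


def pc_src(branch, zero):
    """The branch is taken only when Branch is set and the ALU reports Zero."""
    return branch & zero


for name, signals in CONTROL.items():
    print(f"{name:7s}", signals)
```

Only the instructions that write a register (R-type, ORI, LW) assert RegWrite, and only SW asserts MemWrite, which is what keeps the other instructions from corrupting state.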
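1-cycle CPU + Jump Instruction
[Figure: single-cycle datapath extended with a jump path; the jump address [31..0] is formed from PC + 4 [31..28] and Instruction[25:0]. Other instruction fields shown: Instruction[31:26], Instruction[25:21], Instruction[20:16], Instruction[15:11], Instruction[15:0], Instruction[5:0]]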
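The jump address in the figure combines the upper bits of PC + 4 with the 26-bit target field. In standard MIPS the target field is also shifted left by 2, since instructions are word-aligned; the sketch below assumes that convention, and the function name `jump_target` is illustrative.

```python
def jump_target(pc, instr):
    """Jump address = { (PC + 4)[31:28], Instruction[25:0], b'00' }."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper = pc_plus_4 & 0xF0000000            # bits 31:28 of PC + 4
    lower = (instr & 0x03FFFFFF) << 2         # 26-bit target, word-aligned
    return upper | lower


# j with target field 0x10 (op = 2), fetched from address 0x10000008
print(hex(jump_target(0x10000008, 0x08000010)))  # 0x10000040
```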
1-cycle CPU Problems?
• Every instruction 1 cycle
• Some instructions "do more work"
  • E.g., lw must read from DATAMEM
• All instructions must have same clock period…
  • Many instructions run slower than necessary
• Tricky timing on MemWrite, RegWrite(?) signals
  • Write signal must come *after* address is stable
• Need extra resources…
  • PC+4 adder, ALU for BEQ instruction, DATAMEM+INSTRMEM
Performance!
• Single-Cycle CPU Performance
  • Execute one instruction per clock cycle (CPI=1)
  • Clock cycle time? Note dataflow includes:
    • INSTRMEM read
    • REGFILE access
    • Sign extension
    • ALU operation
    • DATAMEM read
    • REGFILE/PC write
• Not every instruction uses all resources (e.g., DATAMEM read)
• Can we change clock period for each instruction?
  • No! (Why not?)
  • One clock period: the worst case!
• This is why a single-cycle CPU is not good for performance
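The worst-case argument can be made concrete with a small calculation. The unit delays below are invented round numbers, not measurements from the lecture, and sign extension is treated as free for simplicity; the point is only that the clock period must cover the slowest path.

```python
# Hypothetical delays in nanoseconds (illustrative values only).
DELAY_NS = {
    "INSTRMEM": 2.0,
    "REGFILE_READ": 1.0,
    "ALU": 2.0,
    "DATAMEM": 2.0,
    "REGFILE_WRITE": 1.0,
}

# Resources each instruction class passes through in one cycle.
PATHS = {
    "R-type": ["INSTRMEM", "REGFILE_READ", "ALU", "REGFILE_WRITE"],
    "LW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM", "REGFILE_WRITE"],
    "SW":     ["INSTRMEM", "REGFILE_READ", "ALU", "DATAMEM"],
    "BEQ":    ["INSTRMEM", "REGFILE_READ", "ALU"],
}

path_delay = {name: sum(DELAY_NS[u] for u in units) for name, units in PATHS.items()}

# A single fixed clock must be long enough for the slowest instruction.
clock_period = max(path_delay.values())

for name, delay in path_delay.items():
    print(f"{name:7s} needs {delay:.1f} ns, wastes {clock_period - delay:.1f} ns per cycle")
```

Under these assumed delays the load sets the period, and every other instruction idles for part of each cycle.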
1-cycle CPU Datapath + Controller
[Figure: complete single-cycle datapath with controller, including the jump path (jump address [31..0] from PC + 4 [31..28] and Instruction[25:0])]
1-cycle CPU Summary
• Operation
  • 1 cycle per instruction
  • Control signals held fixed during entire cycle (except BRANCH)
  • Only 2 registers
    • PC, updated every clock cycle
    • REGFILE, updated when required
  • During clock cycle, data flows from register-outputs to register-inputs
  • Fixed clock frequency / period
• Performance
  • 1 instruction per cycle
  • Slowest instruction determines clock frequency
• Outstanding issue: MemWrite timing
  • Assume this signal writes to memory at end of clock cycle
Multi-cycle CPU Goals
• Improve performance
  • Break each instruction into smaller steps / multiple cycles
    • LW instruction: 5 cycles
    • SW instruction: 4 cycles
    • R-type instruction: 4 cycles
    • Branch, Jump: 3 cycles
  • Aim for 5x clock frequency
    • Complex instructions (e.g., LW): 5 cycles, same performance as before
    • Simple instructions (e.g., ADD): fewer cycles, faster
• Save resources (gates/transistors)
  • Re-use ALU over multiple cycles
  • Put INSTR + DATA in same memory
• MemWrite timing solved?
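The performance goal can be checked with a quick estimate. The cycle counts below are the ones listed on the slide, and the 5x clock is the stated aim; the instruction mix is a made-up example, so the resulting speedup is illustrative only.

```python
# Cycles per instruction class (from the slide).
CYCLES = {"LW": 5, "SW": 4, "R-type": 4, "Branch/Jump": 3}

# Hypothetical instruction mix in percent (must sum to 100).
MIX_PERCENT = {"LW": 25, "SW": 10, "R-type": 45, "Branch/Jump": 20}

# Assumed: the multi-cycle clock runs 5x faster than the single-cycle clock.
CLOCK_SPEEDUP = 5

avg_cpi = sum(CYCLES[k] * MIX_PERCENT[k] for k in CYCLES) / 100

# Single-cycle time per instruction: 1 long cycle.
# Multi-cycle time per instruction: avg_cpi short cycles, each 1/5 as long.
speedup = CLOCK_SPEEDUP / avg_cpi

print(f"average CPI = {avg_cpi:.2f}, speedup over single-cycle = {speedup:.2f}x")
```

The speedup is largest when the mix is dominated by short instructions; a program made entirely of loads would see no gain under these assumptions.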
Multi-cycle CPU Datapath
[Figure: multi-cycle datapath with PC, a single Memory (Address, MemData, Write data), Instruction Register (Instruction[25:21], [20:16], [15:11], [15:0], [5:0]), Memory Data Register, Registers (RdReg1, RdReg2, Write reg, Write data, RdData1, RdData2), A and B registers, Sign Extend, Shift Left 2, constant 4, ALU (Zero, ALU result), ALUOut, and five multiplexers]
• Add multiplexers + control signals (IorD, MemtoReg, ALUSrcA, ALUSrcB)
• Move signal paths (+4, Shift Left 2)
Multi-cycle CPU Datapath
[Figure: multi-cycle CPU datapath, repeated from the previous slide]
• Add registers + control signals (IR, MDR, A, B, ALUOut)
• Registers with no control signal load value every clock cycle (e.g., PC)
Instruction Execution Example
• Execute a "Load Word" instruction
  • LW rt, 0(rs)
• 5 Steps
  1. Fetch instruction
  2. Read registers
  3. Compute address
  4. Read data
  5. Write registers
Load Word Instruction Sequence
[Figure: multi-cycle CPU datapath, repeated once per step and once with all 5 steps shown]
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
• Missing Steps?
Multi-cycle Load Word: Recap
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]
3. Compute Address: ALUOut ← A + SignExt(Imm16)
4. Read Data: MDR ← Memory[ALUOut]
5. Write Registers: Registers[Rt] ← MDR
• Missing Steps?
  • Must increment the PC
  • Do it as part of the instruction fetch (in step 1)
  • Need PCWrite control signal
Multi-cycle R-Type Instruction
1. Fetch Instruction: InstructionRegister ← Mem[PC]; PC ← PC + 4
2. Read Registers: A ← Registers[Rs]; B ← Registers[Rt]
3. Compute Value: ALUOut ← A op B
4. Write Registers: Registers[Rd] ← ALUOut
• RTL describes data flow action in each clock cycle
• Control signals determine precise data flow
• Each step implies unique control values
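The step-by-step RTL for the multi-cycle design can be traced in code, one statement per clock cycle. The sketch below covers only LW and the R-type ADDU/SUBU, uses a single dictionary as the shared instruction and data memory, and returns the cycle count. The function name `run_multicycle` and the sample program are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_ext(imm16):
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16


def run_multicycle(pc, regs, mem):
    """Run one LW or ADDU/SUBU instruction; return (new_pc, cycles_used)."""
    # Cycle 1: IR <- Mem[PC]; PC <- PC + 4
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    op, rs, rt = ir >> 26, (ir >> 21) & 31, (ir >> 16) & 31
    rd, funct, imm = (ir >> 11) & 31, ir & 63, ir & 0xFFFF

    # Cycle 2: A <- Registers[Rs]; B <- Registers[Rt]
    a, b = regs[rs], regs[rt]

    if op == 0x23:                                # LW
        alu_out = (a + sign_ext(imm)) & MASK32    # Cycle 3: compute address
        mdr = mem[alu_out]                        # Cycle 4: MDR <- Memory[ALUOut]
        regs[rt] = mdr                            # Cycle 5: Registers[Rt] <- MDR
        return pc, 5
    if op == 0x00 and funct in (0x21, 0x23):      # ADDU / SUBU
        # Cycle 3: ALUOut <- A op B
        alu_out = ((a + b) if funct == 0x21 else (a - b)) & MASK32
        regs[rd] = alu_out                        # Cycle 4: Registers[Rd] <- ALUOut
        return pc, 4
    raise ValueError(f"unsupported instruction {ir:#010x}")


# Instructions and data share one memory (hypothetical contents).
mem = {
    0x000: 0x8C010100,  # lw   $1, 0x100($0)
    0x004: 0x00211021,  # addu $2, $1, $1
    0x100: 21,          # data word
}
regs = [0] * 32
pc, lw_cycles = run_multicycle(0, regs, mem)
pc, addu_cycles = run_multicycle(pc, regs, mem)
```

The temporaries `ir`, `a`, `b`, `alu_out` and `mdr` play the role of the IR, A, B, ALUOut and MDR registers: each holds a value produced in one cycle for use in the next.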