450 likes | 686 Views
EECS 318 CAD Computer Aided Design. LECTURE 7: Multicycle CPU. Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow. MIPS instructions. ALU alu $rd,$rs,$rt $rd = $rs <alu> $rt .
E N D
EECS 318 CADComputer Aided Design LECTURE 7: Multicycle CPU Instructor: Francis G. Wolff wolff@eecs.cwru.edu Case Western Reserve University This presentation uses powerpoint animation: please viewshow CWRU EECS 318
MIPS instructions ALUalu $rd,$rs,$rt$rd = $rs <alu> $rt ALUialui $rd,$rs,value$rd = $rs <alu> value Data lw $rt,offset($rs)$rt = Mem[$rs + offset]Transfer sw $rt,offset($rs)Mem[$rs + offset] = $rt Branch beq $rs,$rt,offset $pc = ($rd == $rs)? (pc+4+offset):(pc+4); Jump j addresspc = address CWRU EECS 318
MIPS fixed sized instruction formats R - Format ALUalu $rd,$rs,$rt func shamt rs op rt rd rs op rt value or offset op absolute address J - Format Jump j address ALUi alui $rt,$rs,value I - Format Data lw $rt,offset($rs)Transfer sw $rt,offset($rs) Branch beq $rs,$rt,offset CWRU EECS 318
Assembling Instructions func shamt rs op rt rd ALUalu $rd,$rs,$rt rs op rt value or offset 0x00400020 addu$23, $0, $31 001001:00000:11111:10111:00000:000000 ALUi alui $rt,$rs,value 0x00400024 addi $17, $0, 5 001000:00000:00101:0000000000000101 Suppose there are 32 registers, addu opcode=001001, addi op=001000 CWRU EECS 318
MIPS instruction formats 1 . I m m e d i a t e a d d r e s s i n g o p r s r t I m m e d i a t e 2 . R e g i s t e r a d d r e s s i n g o p r s r t r d . . . f u n c t R e g i s t e r s R e g i s t e r 3 . B a s e a d d r e s s i n g M e m o r y o p r s r t A d d r e s s + B y t e H a l f w o r d W o r d R e g i s t e r 4 . P C - r e l a t i v e a d d r e s s i n g M e m o r y o p r s r t A d d r e s s + W o r d P C 5 . P s e u d o d i r e c t a d d r e s s i n g M e m o r y o p A d d r e s s W o r d Arithmeticaddi $rt, $rs, value add $rd,$rs,$rt Data Transferlw $rt,offset($rs)sw $rt,offset($rs) Conditional branchbeq $rs,$rt,offset Unconditional jumpj address CWRU EECS 318
MIPS registers and conventions NameNumberConventional usage$0 0 Constant 0$v0-$v1 2-3 Expression evaluation & function return$a0-$a3 4-7 Arguments 1 to 4$t0-$t9 8-15,24,35 Temporary (not preserved across call)$s0-$s7 16-23 Saved Temporary (preserved across call)$k0-$k1 26-27 Reserved for OS kernel$gp 28 Pointer to global area$sp 29 Stack pointer$fp 30 Frame pointer$ra 31 Return address (used by function call) CWRU EECS 318
C function to MIPS Assembly Language Exit condition of a while loop is if ( i >= y ) then goto w2 int power_2(int y) { /* compute x=2^y; */ register int x, i; x=1; i=0; while(i<y) { x=x*2; i=i+1; } return x;} Assember .s Commentsaddi $t0, $0, 1 # x=1; addu $t1, $0, $0 # i=0;w1:bge $t1,$a0,w2 # while(i<y) { /* bge= greater or equal */addu $t0, $t0, $t0 # x = x * 2; /* same as x=x+x; */addi $t1,$t1,1 # i = i + 1;beq $0,$0,w1 # }w2:addu $v0,$0,$t0 # return x;jr $ra# jump on register ( pc = ra; ) CWRU EECS 318
Power_2.s: MIPS storage assignment Byte address, not word address 2 words after pc fetch after bge fetch pc is 0x00400030 plus 2 words is 0x00400038 .text 0x00400020 addi $8, $0, 1 # addi $t0, $0, 1 0x00400024 addu $9, $0, $0 # addu $t1, $0, $0 0x00400028 bge $9, $4, 2 # bge $t1, $a0, w2 0x0040002c addu $8, $8, $8 # addi $t0, $t0, $t0 0x00400030 addi $9, $9, 1 # addi $t1, $t1, 1 0x00400034 beq $0, $0, -3 # beq $0, $0, w1 0x00400038 addu $2, $0, $8 # addu $v0, $0, $t0 0x0040003c jr $31 # jr $ra CWRU EECS 318
Machine Language Single Stepping Values changes after the instruction! $pc $v0 $a0 $t0 $t1 $ra $2 $4 $8 $9 $31 00400020 ? 0 ? ? 700018 addi $t0, $0, 1 0040003c 1 0 1 0 700018 jr $ra 00700018 ? 0 1 0 700018 … Assume power2(0); is called; then $a0=0 and $ra=700018 00400024 ? 0 1 ? 700018 addu $t1, $0, $0 00400028 ? 0 1 0700018 bge $t1,$a0,w2 00400038 ? 0 1 0 700018 add $v0,$0,$t0 CWRU EECS 318
Von Neuman & Harvard CPU Architectures ALU I/O ALU I/O Address bus Data bus instructions and data instructions data Von Neuman architectureArea efficient but requires higher bus bandwidth because instructions and data must compete for memory. Harvard architecture was coined to describe machines with separate memories.Speed efficient: Increased parallelism. CWRU EECS 318
Multi-cycle Processor Datapath CWRU EECS 318
Multi-cycle Datapath: with controller CWRU EECS 318
Multi-cycle using Finite State Machine Finite State Machine( hardwired control ) C o m b i n a t i o n a l c o n t r o l l o g i c D a t a p a t h c o n t r o l o u t p u t s O u t p u t s I n p u t s N e x t s t a t e I n p u t s f r o m i n s t r u c t i o n S t a t e r e g i s t e r r e g i s t e r o p c o d e f i e l d CWRU EECS 318
Finite State Machine: program overview T1 T2 T3 T4 T5 Fetch Decode Rformat1 BEQ1 JUMP1 Mem1 Rformat1+1 LW2 SW2 LW2+1 CWRU EECS 318
The Four Stages of R-Format Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Wr R-Format • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: ALU • ALU operates on the two register operands • Update PC • Write: Reg • Write the ALU output back to the register file CWRU EECS 318
R-Format State Machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Clock=1 Ifetch Reg/Dec Exec Wr R-Format Exec Fetch Write Clock=1 Clock=1 Clock=1 CWRU EECS 318
The Five Stages of Load Instruction Ifetch Reg/Dec Exec Mem Wr • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: Offset • Calculate the memory offset • Mem: • Read the data from the Data Memory • Wr: • Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load CWRU EECS 318
R-Format & I-Format State Machine Need to check instruction format Clock=1 AND I-Format=1 Clock=1 Decode ExecOffset Exec ALU Fetch Clock=1 AND opcode=LW Write Reg Mem Read Need to check opcode WriteReg Clock=1 Clock=1 Clock=1 AND R-Format=1 Clock=1 Clock=1 CWRU EECS 318
Multi-Instruction sequence Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk Load Store R-type Ifetch Reg Exec Mem Wr Ifetch Reg Exec Mem Ifetch CWRU EECS 318
State machine stepping: T1 Fetch (Done in parallel) IRMEMORY[PC] & PC PC + 4 PC IR CWRU EECS 318
T1 Fetch: State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Write Reg Exec Start MemRead=1, MemWrite=0IorD=1 (MemAddrPC)IRWrite=1 (IRMem[PC])ALUSrcA=0 (=PC)ALUSrcB=1 (=4)ALUOP=ADD (PC4+PC)PCWrite=1, PCSource=1 (=ALU)RegWrite=0, MemtoReg=X, RegDst=X Instruction Fetch CWRU EECS 318
T2Decode (read $rs and $rt and offset+pc) & ALUOutPC+signext(IR[15-0]) <<2 PC $rs$rt offset AReg[IR[25-21]] & BReg[IR[20-16]] CWRU EECS 318
T2 Decode State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Ifetch Reg/Dec Exec Wr R-Format Write Reg Exec Fetch MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=0 (=PC)ALUSrcB=3 (=signext(IR<<2))ALUOP=0 (=add)PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Start Instr. Decode & Register Fetch CWRU EECS 318
T3 ExecALU (ALU instruction) ALUOut A op(IR[31-26]) B op(IR[31-26]) CWRU EECS 318
T3ExecALU State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Write Reg Fetch Start R-Format Execution MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=1 (=A =Reg[$rs])ALUSrcB=0 (=B =Reg[$rt])ALUOP=2 (=IR[28-26])PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X CWRU EECS 318
T4WrReg (ALU instruction) Reg[ IR[15-11] ] ALUOut CWRU EECS 318
T4 WrReg State machine Cycle 1 Cycle 2 Cycle 3 Cycle 4 Decode Ifetch Reg/Dec Exec Wr R-Format Fetch Start Exec R-Format Write Register MemRead=0, MemWrite=0IorD=XIRWrite=0ALUSrcA=XALUSrcB=XALUOP=X PCWrite=0, PCSource=XRegWrite=1, (Reg[$rd] ALUout) MemtoReg=0, (=ALUout)RegDst=1 (=$rd) CWRU EECS 318
Review Moore Machine IR[31-0] Processor Control LinesMemReadMemWriteIorD ... Next State CWRU EECS 318
Moore Output State Tables: O(State) T2 0 0 X 0 0 + 0 =PC 3 =offset 0 X 0 X X T3-R 0 0 X 0 2 =op 1 =A =$rs 0 =B =$rt 0 X 0 X X T4-R 0 0 X 0 X X X 0 X 1 0 =ALUOut 1 =$rd T1 1 0 0 =PC 1 0 + 0 =PC 1 =4 1 0 =ALU 0 X X State MemRead MemWrite MUX IorD IRWrite ALUOP MUX ALUSrcA MUX ALUSrcB PCWrite MUX PCSource RegWrite MUX MemtoReg MUX RegDst CWRU EECS 318
Review: The Five Stages of Load Instruction Ifetch Reg/Dec Exec Mem Wr • Fetch: • Fetch the instruction from the Instruction Memory • Decode: • Registers Fetch and Instruction Decode • Exec: Offset • Calculate the memory offset • Mem: • Read the data from the Data Memory • Wr: • Write the data back to the register file Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Load CWRU EECS 318
Review: R-Format & I-Format State Machine Need to check instruction format Clock=1 AND I-Format=1 Clock=1 Decode ExecOffset Exec ALU Fetch Clock=1 AND opcode=LW Write Reg Mem Read Need to check opcode WriteReg Clock=1 Clock=1 Clock=1 AND R-Format=1 Clock=1 Clock=1 CWRU EECS 318
T3–I Mem1 (common to both load & store) ALUOut A + sign_extend(IR[15-0]) CWRU EECS 318
T3 Mem1 I-Format State Machine =$rs + offset Clock=1 AND I-Format=1 Decode Exec ALU Fetch Write Reg Mem Read WriteReg Clock=1 AND R-Format=1 MemRead=0, MemWrite=0IorD=XIRWrite=0ALUOP=0 +ALUSrcA=1 =A =Reg[$rs]ALUSrcB=2 =signext(IR[15-0])PCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Clock=1 AND opcode=LW I-Format ExecutionALUout=$rs+offset CWRU EECS 318
T4 –LW1 : load instruction, read memory MDR Memory[ALUOut] CWRU EECS 318
T4 LW2 I-Format State Machine =Mem[ALU] Decode ExecOffset Exec ALU Fetch Write Reg WriteReg Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=LW I-Format Memory Read MemRead=1, MemWrite=0IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Clock=1 AND opcode=LW CWRU EECS 318
T5 –LW2 Load instruction, write to register Reg[IR[20-16]] MDR CWRU EECS 318
T5 LW2 I-Format State Machine $rt=MDR Decode ExecOffset Exec ALU Fetch Write Reg Mem Read Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=LW I-Format Register Write MemRead=1, MemWrite=0IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=1, MemtoReg=1, RegDst=1 Clock=1 AND opcode=LW CWRU EECS 318
T4–SW2 Store instruction, write to memory Memory[ ALUOut ] B CWRU EECS 318
T4 SW2 I-Format State Machine Mem[ALU]= Decode ExecOffset Exec ALU Fetch Write Reg Clock=1 AND I-Format=1 Clock=1 AND R-Format=1 Clock=1 AND opcode=SW I-Format Memory Write MemRead=0, MemWrite=1IorD=1IRWrite=0ALUOP=XALUSrcA=XALUSrcB=XPCWrite=0, PCSource=XRegWrite=0, MemtoReg=X, RegDst=X Store not Load! CWRU EECS 318
T3 BEQ1 (Conditional branch instruction) If (A - B == 0) { PC ALUOut; } Zero ALUOut = Address computed in T2 ! CWRU EECS 318
T3 BEQ1 I-Format State Machine =$rs + offset Decode Exec ALU Fetch Write Reg Clock=1 AND opcode=branch Clock=1 AND R-Format=1 MemRead=0, MemWrite=0IorD=XIRWrite=0ALUOP=0 =subtractALUSrcA=1 =A =Reg[$rs]ALUSrcB=0 =B =Reg[$rt]PCWrite=0, PCWriteCond=1, PCSource=1 =ALUoutRegWrite=0, MemtoReg=X, RegDst=X B-Format Execution CWRU EECS 318
T3 Jump1 (Jump Address) PC PC[31-28] || IR[25-0]<<2 CWRU EECS 318
Moore Output State Tables: O(State) T4-SW 0 1 1=ALU 0 X X X 0 X 0 X X T4-R 0 0 X 0 X X X 0 X 1 0=ALU 1=$rd T2 0 0 X 0 0 0 3 0 X 0 X X T3-R 0 0 X 0 2=op 1=A=$rs 0=B=$rt 0 X 0 X X T4-LW 1 0 1=ALU 0 X X X 0 X 0 X X T1 1 0 0=PC 1 0=+ 0=PC 1=4 1 0=AL 0 X X T3-I 0 0 X 0 0=add 1=A=$rs 2=sign 0 X 0 X X T5-LW 0 0 X 0 X X X 0 X 1 1=MDR 1=$rt State MemRead MemWrite MUX IorD IRWrite ALUOP MUX ALUSrcA MUX ALUSrcB PCWrite MUX PCSource RegWrite MUX MemtoReg MUX RegDst CWRU EECS 318
Multi-cycle: 5 execution steps • T1 (a,lw,sw,beq,j) Instruction Fetch • T2 (a,lw,sw,beq,j) Instruction Decode and Register Fetch • T3 (a,lw,sw,beq,j) Execution, Memory Address Calculation, or Branch Completion • T4 (a,lw,sw) Memory Access or R-type instruction completion • T5 (a,lw) Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES! CWRU EECS 318
Multi-cycle Approach All operations in each clock cycle Ti are done in parallel not sequential! For example, T1, IR = Memory[PC] and PC=PC+4 are done simultaneously! T1 T2 T3 T4 T5 Between Clock T2 and T3 the microcode sequencer will do a dispatch 1 CWRU EECS 318