EEM 486: Computer Architecture, Lecture 4: Designing a Multicycle Processor
The Big Picture
(Figure: the five components of a computer: Processor (Control and Datapath), Memory, Input, Output)
• Designing a Multiple Clock Cycle Datapath
Single-Cycle Processor
(Figure: the opcode is decoded by the control logic / store (PLA, ROM) into a microinstruction that drives the control points of the datapath; the datapath returns conditions and the instruction.)
• In our single-cycle processor, each instruction is realized by exactly one control command, or microinstruction
Abstract View of Single-Cycle Processor
(Figure: Next PC and PC, Instruction Fetch, Register Fetch, Ext, ALU, Mem Access with Data Mem, Reg. Wrt)
• Main Control (input: op) and ALU control (input: fun) generate nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr
• Equal is fed back from the datapath
What’s Wrong with a CPI=1 Processor?
• Long cycle time
• All instructions take as much time as the slowest
• Real memory is not as nice as our idealized memory
• Cannot always get the job done in one (short) cycle
Paths through the datapath by instruction type (figure):
• Arithmetic & Logical: PC, Inst Memory, Reg File, mux, ALU, mux, setup
• Load (critical path): PC, Inst Memory, Reg File, mux, ALU, Data Mem, mux, setup
• Store: PC, Inst Memory, Reg File, mux, ALU, Data Mem
• Branch: PC, Inst Memory, Reg File, cmp, mux
Memory Access Time
• Physics => fast memories are small (large memories are slow)
(Figure: storage array with address decoder, selected word line, storage cells, bit lines, and sense amps)
• => Use a hierarchy of memories
(Figure: Processor, Cache (1 time-period), L2 Cache (2-3 time-periods), memory (20-50 time-periods), connected by the proc. bus and mem. bus)
Multicycle Approach
• Break up the instructions into steps
• Let each step take one “smaller” clock cycle
  - Balance the amount of work to be done
  - Restrict each cycle to use only one major functional unit (major functional units: Memory, Register File, and ALU)
• Let different instructions take different numbers of cycles
• Use a functional unit more than once within the execution of one instruction (less hardware)
  - A single memory unit for both instructions and data
  - A single ALU, rather than an ALU and two adders
• At the end of a cycle
  - Store values for use in later cycles
  - Introduce additional “internal” registers
Partitioning the CPI=1 Datapath
• Add registers between smallest steps
(Figure: the single-cycle datapath split into Instruction fetch, Decode and Operand fetch, Execution, Memory access, and Write back. Control signals shown: nPC_sel, RegDst, RegWr, ExtOp, ALUSrc, ALUctr, MemRd, MemWr, with Equal fed back.)
Recall: Step-by-step Processor Design
Step 1: ISA => Logical Register Transfers
Step 2: Components of the Datapath
Step 3: RTL + Components => Datapath
Step 4: Datapath + Logical RTs => Physical RTs
Step 5: Physical RTs => Control
Step 4: R-type (add, sub, ...) inst
Logical Register Transfers
  ADDU  R[rd] ← R[rs] + R[rt]; PC ← PC + 4
Step 1. Instruction Fetch: IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch: A ← R[rs], B ← R[rt]
Step 3. Execution: ALUOut ← A op B
Step 4. Write-back: R[rd] ← ALUOut
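The four steps above can be traced in a few lines of Python. This is only a sketch: the dictionary-based machine state, the pre-decoded instruction fields, and the function name `execute_rtype` are illustrative assumptions, not part of the lecture's datapath.

```python
import operator

def execute_rtype(state, alu_op=operator.add):
    """Run the four multicycle steps of one R-type instruction (illustrative model)."""
    # Step 1. Instruction fetch: IR <- MEM[PC], PC <- PC + 4
    state["IR"] = state["MEM"][state["PC"]]
    state["PC"] += 4
    # Step 2. Decode and register fetch: A <- R[rs], B <- R[rt]
    ir = state["IR"]
    state["A"] = state["R"][ir["rs"]]
    state["B"] = state["R"][ir["rt"]]
    # Step 3. Execution: ALUOut <- A op B (kept to 32 bits)
    state["ALUOut"] = alu_op(state["A"], state["B"]) & 0xFFFFFFFF
    # Step 4. Write-back: R[rd] <- ALUOut
    state["R"][ir["rd"]] = state["ALUOut"]
    return 4  # one clock cycle per step

# Hypothetical state: an add of R[1] and R[2] into R[3], stored at address 0.
state = {
    "PC": 0,
    "MEM": {0: {"rs": 1, "rt": 2, "rd": 3}},  # instruction fields already decoded
    "R": {1: 5, 2: 7, 3: 0},
}
cycles = execute_rtype(state)
```

The internal registers IR, A, B, and ALUOut appear as dictionary entries because each one carries a value from one clock cycle to the next.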
R-type - Fetch
(Figure: datapath with PC, Memory, Instruction register, and ALU; the ALU adds 4 to the PC)
• Control signals: MemRead=1, nPCWrite=1, IRWrite=1, ALUctr=Add
R-type – Decode/Register Fetch
(Figure: datapath; register ports Rs = Instruction [25-21], Rt = Instruction [20-16], Rw = Instruction [15-11]; Read data 1 and Read data 2 go to A and B)
• Control signals: RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0, ALUctr=x
R-type - Execution
(Figure: datapath; A and B feed the ALU, whose result goes to ALU Out)
• Control signals: ALUSrcA=1, ALUSrcB=0, ALUctr=Func, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
R-type – Write Back
(Figure: datapath; ALU Out is written to register Rw)
• Control signals: RegWrite=1, ALUSrcA=x, ALUSrcB=x, ALUctr=x, nPCWrite=0, MemRead=0, IRWrite=0
Step 4: Logical immed inst
Logical Register Transfers
  ORI  R[rt] ← R[rs] OR ZExt(Im16); PC ← PC + 4
Step 1. Instruction Fetch: IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch: A ← R[rs]
Step 3. Execution: ALUOut ← A OR ZExt(Im16)
Step 4. Write-back: R[rt] ← ALUOut
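As a rough illustration of steps 2 to 4, the sketch below applies ORI to a small register dictionary. The helper names `zero_extend16` and `execute_ori`, and the register values, are assumptions made for this example.

```python
def zero_extend16(imm16):
    """ZExt(Im16): keep the low 16 bits; the upper 16 bits of the result are zero."""
    return imm16 & 0xFFFF

def execute_ori(regs, rs, rt, imm16):
    """Steps 2-4 of ORI on a dictionary of 32-bit register values."""
    a = regs[rs]                         # Step 2: A <- R[rs]
    alu_out = a | zero_extend16(imm16)   # Step 3: ALUOut <- A OR ZExt(Im16)
    regs[rt] = alu_out                   # Step 4: R[rt] <- ALUOut
    return alu_out

regs = {1: 0xF0F00000, 2: 0}
result = execute_ori(regs, rs=1, rt=2, imm16=0x8001)
```

Because the immediate is zero-extended, an immediate with bit 15 set (here 0x8001) leaves the upper half of the register operand unchanged.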
Logical immediate - Execution
(Figure: datapath; Instruction [15-0] is zero-extended from 16 to 32 bits and selected as the second ALU operand)
• Control signals: ALUSrcA=1, ALUSrcB=2, ALUctr=Or, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
Logical immediate – Write Back
(Figure: datapath; ALU Out is written to the register selected by RegDst)
• Control signals: RegDst=0, RegWrite=1, ALUSrcA=x, ALUSrcB=x, ALUctr=x, nPCWrite=0, MemRead=0, IRWrite=0
Step 4: Load inst
Logical Register Transfers
  LW  R[rt] ← MEM[R[rs] + SExt(Im16)]; PC ← PC + 4
Step 1. Instruction Fetch: IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch: A ← R[rs]
Step 3. Memory address computation: ALUOut ← A + SExt(Im16)
Step 4. Memory access: MDR ← Memory[ALUOut]
Step 5. Load completion: R[rt] ← MDR
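Steps 2 to 5 of the load can be sketched as follows. The dictionary-based registers and memory, the helper names, and the sample values are all assumptions for illustration.

```python
def sign_extend16(imm16):
    """SExt(Im16): interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def execute_lw(regs, data_mem, rs, rt, imm16):
    """Steps 2-5 of LW; returns the effective address that was read."""
    a = regs[rs]                                        # Step 2: A <- R[rs]
    alu_out = (a + sign_extend16(imm16)) & 0xFFFFFFFF   # Step 3: ALUOut <- A + SExt(Im16)
    mdr = data_mem[alu_out]                             # Step 4: MDR <- Memory[ALUOut]
    regs[rt] = mdr                                      # Step 5: R[rt] <- MDR
    return alu_out

regs = {8: 0x100, 9: 0}
data_mem = {0xFC: 42}
# The immediate 0xFFFC encodes -4, so the effective address is 0x100 - 4 = 0xFC.
address = execute_lw(regs, data_mem, rs=8, rt=9, imm16=0xFFFC)
```

The extra MDR step is what makes the load one cycle longer than the other instruction types.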
Load: Address Calculation
(Figure: datapath; Instruction [15-0] passes through the zero/sign extender, 16 to 32 bits, into the ALU)
• Control signals: ALUSrcA=1, ALUSrcB=2, ALUctr=Add, ExtOp=Sign, RegDst=x, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
Load: Memory Read
(Figure: datapath; ALU Out addresses memory and MemData is latched into MDR)
• Control signals: IorD=1, MemRead=1, RegDst=x, ALUSrcA=x, ALUSrcB=x, ALUctr=x, ExtOp=x, RegWrite=0, nPCWrite=0, IRWrite=0
Load: Write Back
(Figure: datapath; MDR is written to register Rt)
• Control signals: RegDst=0, RegWrite=1, MemtoReg=1, ALUSrcA=x, ALUSrcB=x, ALUctr=x, ExtOp=x, IorD=x, nPCWrite=0, MemRead=0, IRWrite=0
Step 4: Store inst
Logical Register Transfers
  SW  MEM[R[rs] + SExt(Im16)] ← R[rt]; PC ← PC + 4
Step 1. Instruction Fetch: IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch: A ← R[rs], B ← R[rt]
Step 3. Memory address computation: ALUOut ← A + SExt(Im16)
Step 4. Memory access: Memory[ALUOut] ← B
Store: Address Calculation
(Figure: datapath; A and the sign-extended Instruction [15-0] feed the ALU)
• Control signals: ALUSrcA=1, ALUSrcB=2, ALUctr=Add, ExtOp=Sign, RegDst=x, MemtoReg=x, IorD=x, RegWrite=0, nPCWrite=0, MemRead=0, IRWrite=0
Store: Memory Write
(Figure: datapath; ALU Out addresses memory and B supplies the write data)
• Control signals: IorD=1, MemWrite=1, RegWrite=0, MemRead=0, nPCWrite=0, IRWrite=0, RegDst=x, ALUSrcA=x, ALUSrcB=x, ALUctr=x, MemtoReg=x, ExtOp=x
Step 4: Branch inst
Logical Register Transfers
  BEQ  if R[rs] == R[rt] then PC ← PC + 4 + SExt(Im16) || 00 else PC ← PC + 4
Step 1. Instruction Fetch: IR ← MEM[PC], PC ← PC + 4
Step 2. Instruction Decode and Register Fetch: A ← R[rs], B ← R[rt], ALUOut ← PC + SExt(Im16) || 00
Step 3. Branch completion: if A = B, PC ← ALUOut
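The branch steps can be sketched as below. Note that step 2 uses the PC already incremented in step 1, so the target is relative to PC + 4. Function names and sample values are assumptions for illustration.

```python
def sign_extend16(imm16):
    """SExt(Im16): interpret the low 16 bits as a two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def beq_next_pc(pc, a, b, imm16):
    """Return the PC value after a BEQ located at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF            # Step 1: PC <- PC + 4
    # Step 2: ALUOut <- PC + (SExt(Im16) || 00); appending 00 is a shift left by 2
    alu_out = (pc_plus_4 + (sign_extend16(imm16) << 2)) & 0xFFFFFFFF
    # Step 3: if A = B, PC <- ALUOut; otherwise the PC keeps PC + 4
    return alu_out if a == b else pc_plus_4

taken_backward = beq_next_pc(0x100, a=1, b=1, imm16=0xFFFF)   # offset -1 word
not_taken = beq_next_pc(0x100, a=1, b=2, imm16=0xFFFF)
taken_forward = beq_next_pc(0x100, a=3, b=3, imm16=2)         # offset +2 words
```

Computing the target during decode costs nothing extra, since the ALU is otherwise idle in that cycle; the result is simply discarded if the instruction turns out not to be a branch.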
Branch – Address Calculation
(Figure: datapath; the PC and the sign-extended Instruction [15-0], shifted left by 2, feed the ALU)
• Control signals: ALUSrcA=0, ALUSrcB=3, ALUctr=Add, ExtOp=Sign, RegDst=0, IorD=x, MemtoReg=x, RegWrite=0, MemWrite=0, MemRead=0, IRWrite=0
Branch: Execution
(Figure: datapath; A and B are compared in the ALU, and the Zero output together with PCWriteCond decides whether ALU Out is written to the PC)
• Control signals: PCWriteCond=1, PCWrite=0, PCSource=1, ALUSrcA=1, ALUSrcB=0, ALUctr=Sub, IorD=0, RegDst=x, ExtOp=x, MemtoReg=x, RegWrite=0, MemWrite=0, MemRead=0, IRWrite=0
Multicycle Processor
(Figure: complete multicycle datapath with PC, Memory, Instruction register, MDR, Registers, A, B, Extender, Shift left 2, ALU with Zero output, and ALU Out)
• Control (input: Op [5-0] = Instruction [31-26]) generates PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ExtOp, ALUSrcA, ALUSrcB, ALUOp
• ALU Control takes ALUOp and Instruction [5-0]
Simple Questions
• How many cycles will it take to execute this code?
      lw  $t2, 0($t3)
      lw  $t3, 4($t3)
      beq $t2, $t3, Label    # assume not taken
      add $t5, $t2, $t3
      sw  $t5, 8($t3)
  Label: ...
• What is going on during the 8th cycle of execution?
• In what cycle does the actual addition of $t2 and $t3 take place?
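One way to check answers to these questions is to lay the instructions out cycle by cycle. The sketch below assumes the per-instruction cycle counts used elsewhere in this lecture (load 5, store 4, R-type 4, branch 3) and that the branch is not taken; the names `CYCLES` and `timeline` are illustrative.

```python
# Cycles per instruction type in the multicycle design (assumed from the step lists).
CYCLES = {"lw": 5, "sw": 4, "add": 4, "beq": 3}

PROGRAM = ["lw", "lw", "beq", "add", "sw"]  # branch assumed not taken

def timeline(program):
    """Return one (instruction index, mnemonic, step) tuple per clock cycle."""
    cycles = []
    for index, op in enumerate(program):
        for step in range(1, CYCLES[op] + 1):
            cycles.append((index, op, step))
    return cycles

schedule = timeline(PROGRAM)
total_cycles = len(schedule)                   # 21
eighth_cycle = schedule[8 - 1]                 # (1, 'lw', 3): second lw, address computation
add_cycle = schedule.index((3, "add", 3)) + 1  # 16: step 3 of add is the ALU execution
```

Step 3 of a load is the memory address computation, so in the 8th cycle the second lw is adding the sign-extended offset to $t3.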
Finite State Machine (FSM) Controller
(Figure: Next State Logic takes the inputs (conditions) and the current Control State; Output Logic produces the outputs (control points))
• State specifies control points for Register Transfer
• Transfer occurs upon exiting state (same falling edge)
FSM for Control
(Figure: control logic block with a state register)
• Inputs: Op5-Op0 (instruction register opcode field) and S3-S0 (state register)
• Outputs: PCWrite, PCWriteCond, IorD, MemRead, MemWrite, IRWrite, MemtoReg, PCSource, ALUOp, ALUSrcB, ALUSrcA, RegWrite, RegDst, and the next state NS3-NS0
Step 4 Control Specification
(Figure: state diagram, listed here by phase)
• Instruction fetch: IR <= MEM[PC], PC <= PC + 4
• Decode / operand fetch: A <= R[rs], B <= R[rt], S <= PC + SX || 00
• Execute: R-type: S <= A fun B; ORi: S <= A or ZX; LW and SW: S <= A + SX; BEQ: PC <= Next(PC,Equal)
• Memory: LW: M <= MEM[S]; SW: MEM[S] <= B
• Write-back: R-type: R[rd] <= S; ORi: R[rt] <= S; LW: R[rt] <= M
Step 5 (datapath + state diagram => control)
• Translate RTs into control points
• Assign states
• Then go build the controller
Mapping RTs to Control Points
(Figure: state diagram with the control points asserted in each state)
• Instruction fetch: PCSource=0, PCWrite, IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, ALUOp=000
• Instruction decode / register fetch: ALUSrcA=0, ALUSrcB=11, ALUOp=000, ExtOp=1
• Branch: ALUSrcA=1, ALUSrcB=00, ALUOp=001, PCSource=1, PCWriteCond
• R-type execute: ALUSrcA=1, ALUSrcB=00, ALUOp=100
• R-type write-back: RegDst=1, RegWrite, MemtoReg=0
• ORi execute: ALUSrcA=1, ALUSrcB=10, ALUOp=010, ExtOp=0
• ORi write-back: RegDst=0, RegWrite, MemtoReg=0
• LW / SW address: ALUSrcA=1, ALUSrcB=10, ALUOp=000, ExtOp=1
• LW memory read: IorD=1, MemRead
• LW write-back: RegDst=0, RegWrite, MemtoReg=1
• SW memory write: IorD=1, MemWrite
Assigning States
(Figure: state diagram with a 4-bit code per state)
• 0000: IR <= MEM[PC], PC <= PC + 4
• 0001: A <= R[rs], B <= R[rt], S <= PC + SX || 00
• 0011 (BEQ): PC <= Next(PC,Equal)
• 0100 (R-type): S <= A fun B
• 0101 (R-type): R[rd] <= S
• 0110 (ORi): S <= A or ZX
• 0111 (ORi): R[rt] <= S
• 1000 (LW, SW): S <= A + SX
• 1001 (LW): M <= MEM[S]
• 1010 (LW): R[rt] <= M
• 1011 (SW): MEM[S] <= B
Control Logic – Datapath Control Outputs
(Figure: state diagram with state numbers and the control outputs of each state)
• State 0, Instruction fetch: PCSource=0, PCWrite, IorD=0, MemRead, IRWrite, ALUSrcA=0, ALUSrcB=01, ALUOp=000
• State 1, Instruction decode / register fetch: ALUSrcA=0, ALUSrcB=11, ALUOp=000, ExtOp=1
• State 3, Branch: ALUSrcA=1, ALUSrcB=00, ALUOp=001, PCSource=1, PCWriteCond
• State 4, R-type: ALUSrcA=1, ALUSrcB=00, ALUOp=100
• State 5, R-type: RegDst=1, RegWrite, MemtoReg=0
• State 6, ORi: ALUSrcA=1, ALUSrcB=10, ALUOp=010, ExtOp=0
• State 7, ORi: RegDst=0, RegWrite, MemtoReg=0
• State 8, LW / SW: ALUSrcA=1, ALUSrcB=10, ALUOp=000, ExtOp=1
• State 9, LW: IorD=1, MemRead
• State 10, LW: RegDst=0, RegWrite, MemtoReg=1
• State 11, SW: IorD=1, MemWrite
Control Logic – Next State Function
(Figure: the same state diagram, states 0 to 11; the edges out of the decode state are labelled Branch, LW / SW, ORi, and R-type, and the edges out of the LW / SW address state are labelled SW and LW)
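A next-state function for this diagram can be sketched as a small lookup. The state numbers follow the Assigning States slide; the transition structure is read off the state diagram and should be treated as an assumption, as are the names `next_state` and `trace`.

```python
# Where the decode state (1) goes for each instruction class.
AFTER_DECODE = {"beq": 3, "rtype": 4, "ori": 6, "lw": 8, "sw": 8}

# States with a single successor regardless of the opcode.
UNCONDITIONAL = {0: 1, 4: 5, 6: 7, 9: 10}

def next_state(state, op):
    """Return the next control state for the current state and instruction class."""
    if state == 1:
        return AFTER_DECODE[op]
    if state == 8:                      # address computed: lw reads, sw writes
        return 9 if op == "lw" else 11
    # Final states (3, 5, 7, 10, 11) return to instruction fetch, state 0.
    return UNCONDITIONAL.get(state, 0)

def trace(op):
    """List the states visited by one instruction, starting at fetch."""
    states = [0]
    while True:
        following = next_state(states[-1], op)
        if following == 0:
            return states
        states.append(following)

lw_states = trace("lw")    # [0, 1, 8, 9, 10]
beq_states = trace("beq")  # [0, 1, 3]
```

The length of each trace is the cycle count of that instruction type, which is how the state diagram yields the per-type CPI.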
Performance Evaluation
• What is the average CPI?
• State diagram gives CPI for each instruction type
• Workload gives frequency of each type
Type          CPIi for type   Frequency   CPIi x freqi
Arith/Logic   4               40%         1.6
Load          5               30%         1.5
Store         4               10%         0.4
Branch        3               20%         0.6
Average CPI: 4.1
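The average is the frequency-weighted sum of the per-type CPIs. A minimal sketch using the numbers from the table (the variable and function names are illustrative):

```python
# (CPI for type, frequency) taken from the table above.
WORKLOAD = {
    "Arith/Logic": (4, 0.40),
    "Load": (5, 0.30),
    "Store": (4, 0.10),
    "Branch": (3, 0.20),
}

def average_cpi(workload):
    """Sum of CPI_i x freq_i over all instruction types."""
    return sum(cpi * freq for cpi, freq in workload.values())

avg = average_cpi(WORKLOAD)  # 1.6 + 1.5 + 0.4 + 0.6, about 4.1
```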
Address Select Logic
(Figure: finite state machine, states 0 to 9)
• State 0, Instruction fetch (Start): MemRead, ALUSrcA=0, IorD=0, IRWrite, ALUSrcB=01, ALUOp=00, PCWrite, PCSource=00
• State 1, Instruction decode / register fetch: ALUSrcA=0, ALUSrcB=11, ALUOp=00
• State 2, Memory address computation (Op='LW' or Op='SW'): ALUSrcA=1, ALUSrcB=10, ALUOp=00
• State 3, Memory access (Op='LW'): MemRead, IorD=1
• State 4, Write-back step: RegDst=0, RegWrite, MemtoReg=1
• State 5, Memory access (Op='SW'): MemWrite, IorD=1
• State 6, Execution (Op=R-type): ALUSrcA=1, ALUSrcB=00, ALUOp=10
• State 7, R-type completion: RegDst=1, RegWrite, MemtoReg=0
• State 8, Branch completion (Op='BEQ'): ALUSrcA=1, ALUSrcB=00, ALUOp=01, PCWriteCond, PCSource=01
• State 9, Jump completion (Op='JMP'): PCWrite, PCSource=10
Dispatch ROMs
(Figure: the finite state machine with states 0 to 9, as on the Address Select Logic slide)
Dispatch ROM 1
Op       Opcode name   Value
000000   R-format      0110
000010   jmp           1001
000100   beq           1000
100011   lw            0010
101011   sw            0010
Dispatch ROM 2
Op       Opcode name   Value
100011   lw            0011
101011   sw            0101
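The two dispatch ROMs are lookup tables indexed by the opcode. The sketch below copies the table contents from this slide; the `dispatch` helper and the sample instruction words are assumptions for illustration.

```python
# Opcode (Instruction [31-26]) -> next state, copied from the two tables above.
DISPATCH_ROM_1 = {
    0b000000: 0b0110,  # R-format
    0b000010: 0b1001,  # jmp
    0b000100: 0b1000,  # beq
    0b100011: 0b0010,  # lw
    0b101011: 0b0010,  # sw
}
DISPATCH_ROM_2 = {
    0b100011: 0b0011,  # lw
    0b101011: 0b0101,  # sw
}

def dispatch(rom, instruction_word):
    """Look up the next state for a 32-bit instruction word."""
    opcode = (instruction_word >> 26) & 0x3F
    return rom[opcode]

lw_word = 0b100011 << 26   # only the opcode bits matter for dispatch
sw_word = 0b101011 << 26
after_decode = dispatch(DISPATCH_ROM_1, lw_word)     # state 2 for both lw and sw
after_address_lw = dispatch(DISPATCH_ROM_2, lw_word)  # state 3
after_address_sw = dispatch(DISPATCH_ROM_2, sw_word)  # state 5
```

ROM 1 is consulted when leaving the decode state and ROM 2 when leaving the memory address computation state; every other state either advances to the next state in sequence or returns to fetch.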
Microprogramming
• Microprogramming: designing the control as a program that implements the machine instructions in terms of microinstructions
Microinstruction ???
(Figure: Main Memory holds the user program plus data (ADD, SUB, AND, ..., DATA); this can change! Each of these machine instructions is mapped into a microsequence in the CPU's control memory, e.g., the AND microsequence: Fetch, Calc Operand Addr, Fetch Operand(s), Calculate, Save Answer(s). The control memory drives the execution unit.)
Microprogramming a Multicycle Processor
1) Choose datapath and sequencer architecture
2) Assign states and sequence of each (multicycle) instruction (i.e., define the controller FSM)
3) Choose microinstruction format (minimum bits to describe all allowable functions of sequencer and datapath)
4) Map instructions into microinstruction sequences
Designing a Microinstruction Set
1) Start with list of control signals
2) Group signals together that make sense: called “fields”
3) Place fields in some logical order (e.g., ALU operation & ALU operands first and microinstruction sequencing last)
4) Create a symbolic legend for the microinstruction format, showing name of field values and how they set the control signals
5) To minimize the width, encode operations that will never be used at the same time
Multicycle Processor
(Figure: the complete multicycle datapath and control unit, repeated from the earlier Multicycle Processor slide. Control signals: PCWriteCond, PCWrite, PCSource, IorD, MemRead, MemWrite, IRWrite, MemtoReg, RegDst, RegWrite, ExtOp, ALUSrcA, ALUSrcB, ALUOp.)