Chapter 5: Datapath and Control (Part 3)
CS 447
Jason Bakos
Single-Cycle CPU
• The single-cycle CPU from the last lecture had a CPI of 1
• Clock cycle is determined by the longest possible path in the machine
  • Loads are the worst case: they use 5 functional units in series
• Performance, utilization, and efficiency will be poor, because most instructions don't need such a long clock cycle
• A variable-speed clock could solve this problem, but it hinders parallelism
• Pipelining overlaps instruction executions
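To make the "longest path" point concrete, here is a minimal sketch that sums unit latencies along each instruction class's path. The latency values and the names `LATENCY_PS`, `PATHS`, `path_delay`, and `single_cycle_period` are assumptions chosen for illustration; they are not taken from the slides.

```python
# Hypothetical unit latencies in picoseconds (assumed values, not from the slides).
LATENCY_PS = {"imem": 200, "reg_read": 50, "alu": 100, "dmem": 200, "reg_write": 50}

# Functional units each instruction class uses in series on a single-cycle datapath.
PATHS = {
    "load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":  ["imem", "reg_read", "alu", "dmem"],
    "r_type": ["imem", "reg_read", "alu", "reg_write"],
    "branch": ["imem", "reg_read", "alu"],
    "jump":   ["imem"],
}

def path_delay(instr_class):
    """Total delay of the units one instruction class passes through."""
    return sum(LATENCY_PS[unit] for unit in PATHS[instr_class])

def single_cycle_period():
    """A fixed clock must be at least as long as the slowest instruction."""
    return max(path_delay(c) for c in PATHS)

for instr_class in PATHS:
    print(instr_class, path_delay(instr_class), "ps")
print("clock period:", single_cycle_period(), "ps")
```

Under these assumed numbers the load path sets the clock period, so every other instruction class finishes early and then waits for the rest of the cycle.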
Multicycle Implementation
• Break instructions into steps, where each step requires one clock cycle
• We want to reuse functional units within an instruction instead of just across instructions
  • Reduces hardware
  • Use a single memory for instructions and data
  • Use a single ALU instead of one ALU and two adders
• Add registers to functional units to hold intermediate results (state data) for future cycles
  • Used within an instruction's execution
• Register file and memory hold state data to be used across instruction executions
  • These are programmer-visible
• We will need an FSM to control the CPU
Registers
• The location of registers is determined by the following:
  • What combinational units will fit in one clock cycle
    • Assume one memory access, one regfile access (two reads or one write), or one ALU operation
    • Any data needed by these operations must be stored in a temporary register
    • Instruction Register, Memory Data Register, A, B, and ALUOut registers added to the design
    • All of these except IR only need to hold data between two adjacent clock cycles
  • What data are needed in later cycles implementing the instruction
Multiplexors
• Need to add extra multiplexors (or expand existing muxes) to facilitate the reuse of the ALU within instructions
  • Add a mux to the first ALU input
  • Expand the mux on the second ALU input
Breaking Instruction Execution into Clock Cycles
• Goal is to balance the latency of the operations performed during each clock cycle
• Each clock cycle can contain at most one of the following:
  • One ALU operation
  • One register file access (or multiple in parallel)
  • One memory access (a one-cycle memory access is a joke, but we'll accept this for now)
Execution Stages
• To clearly define the CPU operation for each step, we'll use RTL (register transfer language)
• Architecture research has defined 5 standard phases of instruction execution
  • Instruction fetch
  • Decode
    • Fetch register values from register file
  • Execute
    • Perform arithmetic/logic operation
  • Memory
    • Load/store memory
  • Write back
    • Write register result back to register file
Execution Stages
• Fetch
  • IR=Memory[PC]
  • PC=PC+4
• Decode
  • A=Reg[IR[25..21]]
  • B=Reg[IR[20..16]]
  • ALUOut=PC+(sign_extend(IR[15..0]) << 2)
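The decode-step RTL above can be sketched in Python as plain bit-field arithmetic. The helper names `bits`, `sign_extend16`, and `decode` are made up for this illustration, and `pc` is assumed to be the already-incremented value (PC+4) left by the fetch step.

```python
def bits(word, hi, lo):
    """Return word[hi..lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret a 16-bit field as a signed two's-complement number."""
    return value - 0x10000 if value & 0x8000 else value

def decode(ir, pc, reg):
    """Decode step: read both source registers and precompute the branch target.

    pc is assumed to already hold PC+4 from the fetch step.
    """
    a = reg[bits(ir, 25, 21)]                      # A = Reg[IR[25..21]]
    b = reg[bits(ir, 20, 16)]                      # B = Reg[IR[20..16]]
    offset = sign_extend16(bits(ir, 15, 0)) << 2   # word offset to byte offset
    alu_out = (pc + offset) & 0xFFFFFFFF           # ALUOut = PC + offset
    return a, b, alu_out

# beq $1, $2, -1 fetched from address 0x100, so pc is 0x104 at decode time.
regs = [0] * 32
regs[1] = regs[2] = 7
a, b, target = decode(0x1022FFFF, 0x104, regs)
print(a, b, hex(target))
```

The branch target is computed for every instruction, whether or not it turns out to be a branch, because the ALU is otherwise idle during decode.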
Execution Stages
• Execute
  • Memory access
    • ALUOut=A+sign_extend(IR[15..0])
  • R-type
    • ALUOut=A op B
  • Branch (beq)
    • if (A==B) PC=ALUOut
  • Jump
    • PC=PC[31..28] || (IR[25..0]<<2)
Execution Stages
• Memory Access/Write Back
  • Load
    • MDR=Memory[ALUOut]
  • Store
    • Memory[ALUOut]=B
  • R-type
    • Reg[IR[15..11]]=ALUOut
• Memory Read Completion
  • Load
    • Reg[IR[20..16]]=MDR
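Putting the five RTL steps together, the following is a rough behavioral sketch of one instruction passing through the multicycle steps. It is only an illustration under stated assumptions: memory is modeled as a Python dict keyed by byte address, only the add, sub, and, or function codes are handled, and the names `run_instruction`, `write_reg`, and `ALU_OPS` are hypothetical.

```python
MASK = 0xFFFFFFFF
LW, SW, BEQ, J, RTYPE = 0x23, 0x2B, 0x04, 0x02, 0x00  # MIPS opcodes

# R-type function codes handled by this sketch (add, sub, and, or only).
ALU_OPS = {
    0x20: lambda a, b: (a + b) & MASK,
    0x22: lambda a, b: (a - b) & MASK,
    0x24: lambda a, b: a & b,
    0x25: lambda a, b: a | b,
}

def bits(word, hi, lo):
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sext16(value):
    return value - 0x10000 if value & 0x8000 else value

def write_reg(reg, index, value):
    if index != 0:               # register $0 stays zero
        reg[index] = value & MASK

def run_instruction(pc, reg, mem):
    """Run one instruction step by step; return (new_pc, cycles_used)."""
    ir = mem[pc]                                   # step 1: IR = Memory[PC]
    pc = (pc + 4) & MASK                           #         PC = PC + 4
    a = reg[bits(ir, 25, 21)]                      # step 2: A, B, branch target
    b = reg[bits(ir, 20, 16)]
    alu_out = (pc + (sext16(bits(ir, 15, 0)) << 2)) & MASK
    op = bits(ir, 31, 26)
    if op == BEQ:                                  # step 3: branch completes
        return (alu_out if a == b else pc), 3
    if op == J:                                    # step 3: jump completes
        return (pc & 0xF0000000) | (bits(ir, 25, 0) << 2), 3
    if op in (LW, SW):
        alu_out = (a + sext16(bits(ir, 15, 0))) & MASK   # step 3: address
        if op == SW:
            mem[alu_out] = b                       # step 4: Memory[ALUOut] = B
            return pc, 4
        mdr = mem[alu_out]                         # step 4: MDR = Memory[ALUOut]
        write_reg(reg, bits(ir, 20, 16), mdr)      # step 5: Reg[rt] = MDR
        return pc, 5
    if op == RTYPE:
        alu_out = ALU_OPS[bits(ir, 5, 0)](a, b)    # step 3: ALUOut = A op B
        write_reg(reg, bits(ir, 15, 11), alu_out)  # step 4: Reg[rd] = ALUOut
        return pc, 4
    raise ValueError(f"unsupported opcode {op:#x}")

# lw $8, 0x20($0); add $9, $8, $8; sw $9, 0x24($0)
reg = [0] * 32
mem = {0x00: 0x8C080020, 0x04: 0x01084820, 0x08: 0xAC090024, 0x20: 21}
pc, total = 0, 0
for _ in range(3):
    pc, cycles = run_instruction(pc, reg, mem)
    total += cycles
print(reg[8], reg[9], mem[0x24], total)
```

The returned cycle count is simply the number of RTL steps the instruction class needs, which is why a load costs more cycles than a branch.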
Control Signals
• Control Unit signals
  • Refer to figure 5.34 (pg. 384) in the book
• ALU Control signals
  • Provide an appropriate ALUOp signal based on what the ALU is being used for (for an R-type, perform a lookup based on the function code)
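The ALU control lookup can be sketched as a small table. The 2-bit `ALUOp` values and 4-bit ALU control codes below are assumed to follow the convention commonly used with this textbook datapath; treat the exact encodings, and the names `alu_control` and `FUNCT_TO_ALU`, as assumptions rather than the course's definitive table.

```python
# Assumed 4-bit ALU control encodings.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b0000, 0b0001, 0b0010, 0b0110, 0b0111

# MIPS R-type function codes mapped to ALU operations.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,  # add
    0b100010: ALU_SUB,  # sub
    0b100100: ALU_AND,  # and
    0b100101: ALU_OR,   # or
    0b101010: ALU_SLT,  # slt
}

def alu_control(alu_op, funct=0):
    """Map the control unit's ALUOp (plus funct for R-type) to an ALU operation."""
    if alu_op == 0b00:          # address calculation and PC arithmetic: add
        return ALU_ADD
    if alu_op == 0b01:          # beq comparison: subtract
        return ALU_SUB
    if alu_op == 0b10:          # R-type: look up the function code
        return FUNCT_TO_ALU[funct]
    raise ValueError(f"unsupported ALUOp {alu_op:#b}")

print(bin(alu_control(0b10, 0b100010)))
```

Splitting the decision this way keeps the main control unit independent of the function code: it only has to say whether the ALU is adding, comparing, or executing an R-type.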
Control Signals • All that's left is for us to build the control unit as an FSM and the ALU control as a lookup table
Control Unit • The fetch and decode stages are the same for every instruction...
Control Unit • Here are the states and transitions for the memory-reference instructions
Control Unit • Here are the states and transitions for R-type, branch, and jump instructions
Control Unit • Final control unit FSM...
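As a rough sketch of what such a control FSM looks like in code, the next-state function below follows the RTL steps from the earlier slides. The state names and numbering are labels invented for this example (they may not match the figure), and only lw, sw, R-type, beq, and j are covered.

```python
from enum import Enum

class State(Enum):
    FETCH = 0
    DECODE = 1
    MEM_ADDR = 2
    MEM_READ = 3
    MEM_READ_COMPLETE = 4
    MEM_WRITE = 5
    EXECUTE = 6
    RTYPE_COMPLETE = 7
    BRANCH_COMPLETE = 8
    JUMP_COMPLETE = 9

LW, SW, BEQ, J, RTYPE = 0x23, 0x2B, 0x04, 0x02, 0x00  # MIPS opcodes

def next_state(state, opcode):
    """Return the state that follows `state` for the given opcode."""
    if state is State.FETCH:
        return State.DECODE                  # same for every instruction
    if state is State.DECODE:
        if opcode in (LW, SW):
            return State.MEM_ADDR
        if opcode == RTYPE:
            return State.EXECUTE
        if opcode == BEQ:
            return State.BRANCH_COMPLETE
        if opcode == J:
            return State.JUMP_COMPLETE
        raise ValueError(f"unsupported opcode {opcode:#x}")
    if state is State.MEM_ADDR:
        return State.MEM_READ if opcode == LW else State.MEM_WRITE
    if state is State.MEM_READ:
        return State.MEM_READ_COMPLETE
    if state is State.EXECUTE:
        return State.RTYPE_COMPLETE
    return State.FETCH                       # every final state returns to fetch

def cycles_for(opcode):
    """Count the states visited from fetch until the FSM returns to fetch."""
    state, cycles = State.FETCH, 0
    while True:
        cycles += 1
        state = next_state(state, opcode)
        if state is State.FETCH:
            return cycles

print({name: cycles_for(op) for name, op in
       [("lw", LW), ("sw", SW), ("r-type", RTYPE), ("beq", BEQ), ("j", J)]})
```

In hardware each state would also assert a set of control signals; this sketch models only the transitions.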
Problems to Think About
• How could we add bne, blt, and bgez instructions to our CPU?
• How do you calculate CPI for our CPU if we are given instruction-type distributions?
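For the CPI question, one way to set it up is as a weighted average of the cycle counts per instruction class. The cycle counts follow from the RTL steps in the earlier slides (5 for loads, 4 for stores and R-type, 3 for branches and jumps); the instruction mix below is a hypothetical example, and `average_cpi` is a name chosen for this sketch.

```python
# Cycles per instruction class on the multicycle CPU, from the RTL steps.
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

def average_cpi(mix):
    """Weighted average CPI; `mix` maps instruction class to its fraction."""
    total = sum(mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * CYCLES[cls] for cls, fraction in mix.items())

# Hypothetical instruction mix (assumed for illustration).
mix = {"load": 0.25, "store": 0.10, "r_type": 0.52, "branch": 0.11, "jump": 0.02}
print(round(average_cpi(mix), 2))
```

The result lies between the cheapest and most expensive instruction class, and shifts toward 5 as the fraction of loads grows.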