CS-447 – Computer Architecture, M,W 10-11:20am
Lecture 11: Single Cycle Datapath
October 3rd, 2007
Majd F. Sakr, msakr@qatar.cmu.edu
www.qatar.cmu.edu/~msakr/15447-f07/
Lecture Objectives
• Learn what a datapath is and how it provides the required functions.
• Appreciate why different implementation strategies affect the clock rate and CPI of a machine.
• Understand how the ISA determines many aspects of the hardware implementation.
Implementation vs. Performance
The performance of a processor is determined by:
• Instruction count of a program
• CPI (cycles per instruction)
• Clock cycle time (the inverse of the clock rate)
The compiler and the ISA determine the instruction count. The implementation of the processor determines the CPI and the clock cycle time.
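
The three factors above multiply to give execution time. A minimal sketch of that relation follows; the function name and the sample numbers (1000 instructions, CPI of 1, 8 ns cycle) are illustrative assumptions, not values from the lecture.

```python
def execution_time_ns(instruction_count, cpi, cycle_time_ns):
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical machine: every instruction takes one 8 ns cycle (CPI = 1).
total = execution_time_ns(instruction_count=1000, cpi=1, cycle_time_ns=8)
print(total)  # 8000
```

Lowering any one factor helps only if the other two do not grow to compensate, which is why CPI and cycle time have to be considered together.
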
Possible Execution Steps of Any Instruction
• Instruction Fetch
• Instruction Decode and Register Fetch
• Execution of a Memory Reference Instruction
• Execution of an Arithmetic-Logical Operation
• Execution of a Branch Instruction
• Execution of a Jump Instruction
Instruction Processing
Five steps:
• Instruction fetch (IF)
• Instruction decode and operand fetch (ID)
• ALU/execute (EX)
• Memory access (MEM), not required by every instruction
• Write-back (WB)
[Figure: the five steps IF, ID, EX, MEM, WB]
Datapath & Control
[Figure: datapath with control unit]
Datapath Elements
The datapath contains 2 types of logic elements:
• Combinational (e.g. ALU): elements that operate on data values. Their outputs depend only on their current inputs.
• State (e.g. registers and memory): elements with internal storage. Their state is defined by the values they contain.
Pentium Processor Die
• State
  • Registers
  • Memory
  • Control ROM
• Combinational logic (Compute)
Single Cycle Implementation
• This simple processor can execute an ALU instruction, access memory, or compute the next instruction's address in a single cycle.
Program Counter
If each instruction occupies 4 memory locations (bytes), then
Next PC <= PC + 4
PC Datapath – Branch Offset
PC <= PC + Branch Offset
The Register File
• Arithmetic and logical instructions (R-type) read the contents of 2 registers, perform an ALU operation, and write the result back to a register.
• Registers are stored in the register file. The register file has inputs to specify the registers to read and write, outputs for the data read, an input for the data to be written, and 1 control signal that decides whether the data is written. In addition, we need an ALU to perform the operations.
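
The register file interface described above can be sketched as a small class. This is only a behavioral model under assumptions: 32 registers of 32 bits each (as in MIPS), and the names `RegisterFile`, `read`, `write`, and `reg_write` are hypothetical. A hardwired zero register is not modeled.

```python
class RegisterFile:
    """Behavioral sketch: two read ports, one write port, one write-enable."""

    def __init__(self, num_regs=32):  # 32 registers is an assumption
        self.regs = [0] * num_regs

    def read(self, rs, rt):
        # Two register numbers in, two data values out.
        return self.regs[rs], self.regs[rt]

    def write(self, rd, data, reg_write):
        # The control signal decides whether the data is actually stored.
        if reg_write:
            self.regs[rd] = data & 0xFFFFFFFF  # keep 32 bits (assumed width)


rf = RegisterFile()
rf.write(3, 42, reg_write=True)   # stored
rf.write(4, 7, reg_write=False)   # ignored, control signal is off
print(rf.read(3, 4))  # (42, 0)
```
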
R-Type Instructions
• Assembly (e.g., register-register signed addition)
  ADD rd_reg rs_reg rt_reg
• Machine encoding: [figure]
• Semantics
  if MEM[PC] == ADD rd rs rt
    GPR[rd] ← GPR[rs] + GPR[rt]
    PC ← PC + 4
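
The ADD semantics can be traced in a short sketch. It assumes the standard MIPS32 R-type field layout (opcode in bits 31:26, rs 25:21, rt 20:16, rd 15:11, shamt 10:6, funct 5:0), which the slide's encoding figure presumably showed but which is not recoverable here. The helper names are hypothetical, and the overflow exception of the real ADD is ignored.

```python
def decode_r_type(word):
    """Split a 32-bit word into R-type fields (assumed MIPS32 layout)."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def execute_add(gpr, pc, word):
    """GPR[rd] <- GPR[rs] + GPR[rt]; PC <- PC + 4 (wraps, no overflow trap)."""
    f = decode_r_type(word)
    gpr[f["rd"]] = (gpr[f["rs"]] + gpr[f["rt"]]) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 5, 7
pc = execute_add(gpr, 0, 0x00221820)  # encodes ADD with rd=3, rs=1, rt=2
print(gpr[3], pc)  # 12 4
```
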
I-Type ALU Instructions
• Assembly (e.g., register-immediate signed addition)
  ADDI rt_reg rs_reg immediate_16
• Machine encoding: [figure]
• Semantics
  if MEM[PC] == ADDI rt rs immediate
    GPR[rt] ← GPR[rs] + sign-extend(immediate)
    PC ← PC + 4
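
The key new step for I-type instructions is sign extension of the 16-bit immediate. A minimal sketch, assuming 32-bit registers and using hypothetical helper names; as with ADD, the overflow exception is not modeled.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_addi(gpr, pc, rt, rs, imm16):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate); PC <- PC + 4."""
    gpr[rt] = (gpr[rs] + sign_extend16(imm16)) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1] = 10
pc = execute_addi(gpr, 0, rt=2, rs=1, imm16=0xFFFE)  # immediate is -2
print(gpr[2], pc)  # 8 4
```
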
Data Memory
• The element needed to implement load and store instructions is the data memory. In addition, we use the existing ALU to compute the address to access.
• The data memory has 2 x-bit inputs (the address and the write data) and 1 x-bit output (the read data). In addition, it has 2 control lines: MemWrite and MemRead.
Load Instruction
• Assembly (e.g., load 4-byte word)
  LW rt_reg offset_16 (base_reg)
• Machine encoding: [figure]
• Semantics
  if MEM[PC] == LW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    GPR[rt] ← MEM[ translate(EA) ]
    PC ← PC + 4
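
The load semantics can be sketched as follows. Several simplifications are assumptions of this sketch: data memory is a Python dict mapping a byte address to a whole 32-bit word, address translation is the identity, and unwritten locations read as 0. The function name is hypothetical.

```python
def execute_lw(gpr, mem, pc, rt, base, offset16):
    """EA = sign-extend(offset) + GPR[base]; GPR[rt] <- MEM[EA]; PC <- PC + 4."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:           # sign-extend the 16-bit offset
        offset -= 0x10000
    ea = (gpr[base] + offset) & 0xFFFFFFFF   # the ALU computes the address
    gpr[rt] = mem.get(ea, 0)                 # translate(EA) assumed to be EA
    return pc + 4


gpr = [0] * 32
gpr[4] = 0x1000                  # base register
mem = {0x0FFC: 0xDEADBEEF}       # one word of data memory
pc = execute_lw(gpr, mem, 0, rt=5, base=4, offset16=0xFFFC)  # offset is -4
print(hex(gpr[5]), pc)  # 0xdeadbeef 4
```
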
Branch Equal
• The beq (branch if equal) instruction has 3 operands: two registers that are compared for equality and an n-bit offset used to compute the branch address relative to the PC.
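
A possible sketch of beq follows. The slide only says the offset is relative to the PC; this sketch assumes the MIPS convention, where the offset counts instructions relative to PC + 4, so it is shifted left by 2 before being added. The offset is passed as an already sign-extended Python integer, and the function name is hypothetical.

```python
def execute_beq(gpr, pc, rs, rt, offset):
    """Return the next PC: branch target if GPR[rs] == GPR[rt], else PC + 4."""
    fall_through = pc + 4
    if gpr[rs] == gpr[rt]:                    # the ALU compares the registers
        # Assumed MIPS convention: word offset relative to PC + 4.
        return (fall_through + (offset << 2)) & 0xFFFFFFFF
    return fall_through


gpr = [0] * 32
gpr[1] = gpr[2] = 9
print(hex(execute_beq(gpr, 0x100, 1, 2, offset=-2)))  # 0xfc (taken)
print(hex(execute_beq(gpr, 0x100, 1, 3, offset=-2)))  # 0x104 (not taken)
```
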
Unconditional Jump
• Assembly
  J immediate_26
• Machine encoding: [figure]
• Semantics
  if MEM[PC] == J immediate26
    target = { PC[31:28], immediate26, 2'b00 }
    PC ← target
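
The bit concatenation in the jump semantics can be written with masks and shifts. A minimal sketch that follows the slide's formula literally (upper 4 bits taken from the PC); the function name is hypothetical.

```python
def jump_target(pc, immediate26):
    """target = { PC[31:28], immediate26, 2'b00 }"""
    upper = pc & 0xF0000000                    # PC[31:28] stays in place
    middle = (immediate26 & 0x03FFFFFF) << 2   # 26-bit field, then two 0 bits
    return upper | middle


print(hex(jump_target(0x40000008, 0x100)))  # 0x40000400
```

Because only 26 bits come from the instruction, a jump can reach targets only within the 256 MB region selected by the upper 4 bits of the PC.
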
Combining ALU and Memory Instructions
• The ALU datapath and the memory datapath are similar. The differences are:
  • The second input to the ALU is a register (R-type) or the sign-extended offset (I-type).
  • The value stored into the destination register comes from the ALU (R-type) or from memory (load).
• Using 2 multiplexers (Mux), we can combine both datapaths.
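
The two multiplexers can be sketched as plain selection functions. The control signal names `alu_src` and `mem_to_reg` are assumptions borrowed from common textbook datapaths; the slide does not name them.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexer: output in1 when select is set, otherwise in0."""
    return in1 if select else in0


def alu_second_operand(alu_src, register_value, extended_offset):
    # R-type uses the register; loads and stores use the sign-extended offset.
    return mux2(alu_src, register_value, extended_offset)


def write_back_value(mem_to_reg, alu_result, memory_data):
    # R-type writes the ALU result; a load writes the data read from memory.
    return mux2(mem_to_reg, alu_result, memory_data)


print(alu_second_operand(0, 7, 16))   # 7  (R-type)
print(write_back_value(1, 23, 99))    # 99 (load)
```
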
What’s Wrong with Single Cycle?
• All instructions run at the speed of the slowest instruction.
• Adding a long instruction can hurt performance.
  • What if you wanted to include multiply?
• You cannot reuse any parts of the processor.
  • We have 3 different adders: one for PC+4, one for PC+4+offset, and the ALU.
• No profit in making the common case fast,
  • since every instruction runs at the speed of the slowest instruction.
  • This is particularly important for loads, as we will see later.
What’s Wrong with Single Cycle?
Assumed delays:
• 1 ns: register read/write
• 2 ns: ALU/adder
• 2 ns: memory access
• 0 ns: MUX, PC access, sign extend, ROM

Instruction  Get instr  Read reg  ALU operation  Mem    Write reg  Total
add          2 ns       1 ns      2 ns           -      1 ns       6 ns
beq          2 ns       1 ns      2 ns           -      -          5 ns
sw           2 ns       1 ns      2 ns           2 ns   -          7 ns
lw           2 ns       1 ns      2 ns           2 ns   1 ns       8 ns
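
The per-instruction totals can be reproduced by summing the delay of each unit an instruction passes through. The dictionary names below are hypothetical; the delays and the unit sequences are taken from the slide.

```python
# Delay of each functional unit in ns (from the slide).
UNIT_NS = {"get_instr": 2, "read_reg": 1, "alu": 2, "mem": 2, "write_reg": 1}

# Units each instruction uses, in order (from the slide).
STEPS = {
    "add": ["get_instr", "read_reg", "alu", "write_reg"],
    "beq": ["get_instr", "read_reg", "alu"],
    "sw": ["get_instr", "read_reg", "alu", "mem"],
    "lw": ["get_instr", "read_reg", "alu", "mem", "write_reg"],
}

latency_ns = {op: sum(UNIT_NS[u] for u in units) for op, units in STEPS.items()}
cycle_time_ns = max(latency_ns.values())  # the clock must fit the slowest one

print(latency_ns)     # {'add': 6, 'beq': 5, 'sw': 7, 'lw': 8}
print(cycle_time_ns)  # 8
```
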
Computing Execution Time
Assume 100 instructions are executed, of which:
• 25% are loads,
• 10% are stores,
• 45% are adds, and
• 20% are branches.
Single-cycle execution: 100 * 8 ns = 800 ns
Optimal execution: 25 * 8 ns + 10 * 7 ns + 45 * 6 ns + 20 * 5 ns = 640 ns
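
The same arithmetic as a short script, using the instruction mix and latencies from the slides. The variable names are hypothetical, and the ratio at the end is derived here rather than stated in the lecture.

```python
latency_ns = {"lw": 8, "sw": 7, "add": 6, "beq": 5}   # per-instruction time
mix = {"lw": 25, "sw": 10, "add": 45, "beq": 20}      # counts out of 100

total_instructions = sum(mix.values())

# Single cycle: every instruction pays for the slowest one.
single_cycle_ns = total_instructions * max(latency_ns.values())

# Optimal: each instruction takes only the time it actually needs.
optimal_ns = sum(mix[op] * latency_ns[op] for op in mix)

print(single_cycle_ns)               # 800
print(optimal_ns)                    # 640
print(single_cycle_ns / optimal_ns)  # 1.25
```
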