EE204 Computer Architecture

EE204 Computer Architecture: Single-Cycle Datapath Performance
Hina Anwar Khan, 2011

Performance of Single-Cycle Machines
• Let's assume the following operation times: memory 2 nanoseconds (ns), ALU and adders 2 ns, register file 1 ns. We assume that MUXs, control, sign extension, PC accesses, and wires have no delay.
• Which implementation is faster?
  1. Every instruction completes in 1 clock cycle of fixed length.
  2. Every instruction completes in 1 clock cycle whose length varies with the instruction.
• Let's look at the time (in ns) needed by each instruction; "-" means the unit is not used:

  Inst.     Fetch  Reg. Rd  ALU op  Memory  Reg. Wr  Total
  R-type    2      1        2       -       1        6 ns
  Load      2      1        2       2       1        8 ns
  Store     2      1        2       2       -        7 ns
  Branch    2      1        2       -       -        5 ns
  Jump      2      -        -       -       -        2 ns

Hina Anwar Khan Spring 2011
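Each total in the table is simply the sum of the delays of the units on that instruction's path. The minimal Python sketch below recomputes the totals from the slide's unit delays; the names `UNIT_DELAY_NS`, `PATH`, and `latency_ns` are hypothetical, and the paths are read off the table rather than taken from any real simulator.

```python
# Unit delays in ns, as assumed on the slide; MUXs, control, and wires cost 0 ns.
UNIT_DELAY_NS = {"memory": 2, "alu": 2, "regfile": 1}

# Functional units on each instruction's path, in order (read off the table).
PATH = {
    "R-type": ["memory", "regfile", "alu", "regfile"],            # fetch, read, ALU, write back
    "load":   ["memory", "regfile", "alu", "memory", "regfile"],  # fetch, read, address, load, write back
    "store":  ["memory", "regfile", "alu", "memory"],             # fetch, read, address, store
    "branch": ["memory", "regfile", "alu"],                       # fetch, read, compare
    "jump":   ["memory"],                                         # fetch only
}


def latency_ns(instr: str) -> int:
    """Time one instruction class needs: the sum of the unit delays on its path."""
    return sum(UNIT_DELAY_NS[unit] for unit in PATH[instr])


latencies = {name: latency_ns(name) for name in PATH}
# A fixed-length clock cycle must be long enough for the slowest instruction.
fixed_cycle_ns = max(latencies.values())

if __name__ == "__main__":
    for name, t in latencies.items():
        print(f"{name:<7} {t} ns")
    print(f"fixed cycle: {fixed_cycle_ns} ns")
```

Changing a value in `UNIT_DELAY_NS` (for example, a slower memory) shows directly how the fixed cycle time tracks the load path, since the load uses the memory twice.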

Fixed vs. Variable Cycle Length
• Let's assume a program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
• For the fixed cycle length, the cycle time is 8 ns, long enough for the longest instruction (load). Thus every instruction takes 8 ns to execute.
• For the variable cycle length, the average CPU clock cycle is:
  8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% = 6.34 ns (≈ 6.3 ns)
• The variable-clock implementation is therefore faster, by 8 / 6.34 ≈ 1.26 (1.27 when the rounded 6.3 ns is used), but it is extremely hard to implement.
• When instructions such as multiply and divide, which can take tens of cycles, are added, a single-cycle scheme becomes too slow, because the clock cycle must be long enough for the slowest instruction.
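The average cycle above is a mix-weighted mean of the per-instruction latencies, and the speedup is the ratio of the fixed cycle to that mean. A small Python sketch of the same arithmetic, assuming the latencies from the single-cycle table and the slide's instruction mix (the names `LATENCY_NS`, `MIX`, and `average_cycle_ns` are illustrative):

```python
# Per-instruction latencies (ns) from the single-cycle table, and the program's mix.
LATENCY_NS = {"load": 8, "store": 7, "R-type": 6, "branch": 5, "jump": 2}
MIX = {"load": 0.24, "store": 0.12, "R-type": 0.44, "branch": 0.18, "jump": 0.02}


def average_cycle_ns(latency: dict, mix: dict) -> float:
    """Mix-weighted mean latency: the average cycle of a variable-length clock,
    where each instruction's cycle lasts exactly as long as it needs."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("instruction-mix fractions must sum to 1")
    return sum(latency[name] * frac for name, frac in mix.items())


fixed_ns = max(LATENCY_NS.values())              # 8 ns: sized for the load
variable_ns = average_cycle_ns(LATENCY_NS, MIX)  # about 6.34 ns
speedup = fixed_ns / variable_ns                 # about 1.26

print(f"fixed = {fixed_ns} ns, variable = {variable_ns:.2f} ns, speedup = {speedup:.2f}")
```

Run as-is, this prints a variable cycle of 6.34 ns and a speedup of 1.26; the 1.27 figure comes from rounding the average to 6.3 ns first.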

Observations on the Single-Cycle Design
• The single-cycle datapath is straightforward, but:
  • It has to use 3 separate ALUs (the main ALU plus two adders).
  • It has separate instruction and data memories.
  • Its cycle time is determined by the worst-case path.
• A multi-cycle datapath might be better:
  • We can reuse some of the hardware.
  • We can combine the memories.
  • The cycle time is still constant, but instructions may take differing numbers of cycles.

Multi-Cycle Implementation
• Each step in execution takes 1 clock cycle.
• Different instructions take different numbers of clock cycles.
• A functional unit can be used more than once per instruction, as long as it is used on different clock cycles.
• This reduces the number of hardware units and lets them be shared.

Multicycle Datapath
[Figure: a single instruction & data memory, a single ALU, and registers]

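Multicycle Execution
• Extra registers hold values produced in one clock cycle for use in a later one:
  • Instruction Register (IR): holds the instruction until the end of execution
  • Memory Data Register (MDR): holds data read from memory
  • A and B registers: hold the two operands read from the register file
  • ALUOut register: holds the ALU result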
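To see why each of these registers is needed, the minimal Python sketch below follows one load word through five clock cycles, latching every intermediate value in IR, A/B, ALUOut, or MDR. It assumes the usual five-cycle breakdown of lw, the MIPS I-format fields, and a dict-based memory; `run_lw` and `sign_extend16` are hypothetical helpers, and any branch-target precomputation a real decode step may perform is omitted.

```python
# Hypothetical sketch: one MIPS lw executed step by step through the multicycle
# state registers (IR, A, B, ALUOut, MDR). Assumptions not on the slides: memory is
# a dict keyed by byte address holding 32-bit words, and the instruction uses the
# MIPS I-format fields op[31:26], rs[25:21], rt[20:16], imm[15:0].
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit field as a signed integer."""
    return imm - 0x10000 if imm & 0x8000 else imm


def run_lw(mem: dict, regs: list, pc: int) -> int:
    """Run one lw, one step per clock cycle; return the updated PC."""
    # Cycle 1: fetch into IR (held until the instruction finishes); PC += 4.
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2: read both source registers into A and B (B is not used by lw).
    rs, rt, imm = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, ir & 0xFFFF
    a, b = regs[rs], regs[rt]
    # Cycle 3: the ALU forms the effective address; the result is latched in ALUOut.
    alu_out = (a + sign_extend16(imm)) & MASK32
    # Cycle 4: read the data word into MDR.
    mdr = mem[alu_out]
    # Cycle 5: write MDR back to register rt.
    regs[rt] = mdr
    return pc


# Usage: lw $2, 8($1) at address 0, with $1 = 100 and the word 42 at address 108.
mem = {0: (35 << 26) | (1 << 21) | (2 << 16) | 8, 108: 42}
regs = [0] * 32
regs[1] = 100
next_pc = run_lw(mem, regs, 0)
```

Because every intermediate value sits in a register at the end of its cycle, the memory is free to serve the fetch in cycle 1 and the data read in cycle 4.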

Multicycle Datapath
[Figure: PC, shared Inst/Data Memory, Register Block, and ALU; annotations: PC = PC + 4, branch target address, arithmetic/branch instruction, lw/sw instruction (address and data)]

Multicycle Datapath
[Figure]

Multicycle Datapath & Control Signals
[Figure]

A Single ALU
• One ALU is used to perform all of the necessary functions:
  • An arithmetic operation on two register operands
  • Adding a register to a sign-extended constant, to compute memory addresses for lw/sw instructions
  • Computing PC + 4 to increment the PC
  • Adding a sign-extended, shifted offset to (PC + 4) for branches
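All four jobs listed above reduce to a 32-bit addition with different operands, which is why one ALU can cover them on different clock cycles. The Python sketch below is a hypothetical illustration: `alu_add`, `sign_extend16`, and the operand values are made up, and only the add case of the "arithmetic operation" is shown (an R-type instruction may ask the ALU for other operations).

```python
# Hypothetical illustration: the four jobs of the single ALU are all 32-bit additions.
MASK32 = 0xFFFFFFFF


def alu_add(x: int, y: int) -> int:
    """The shared ALU performing an add, wrapping to 32 bits."""
    return (x + y) & MASK32


def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit immediate as a signed integer."""
    return imm - 0x10000 if imm & 0x8000 else imm


# Example operand values (made up for illustration); imm 0xFFF8 encodes -8.
pc, rs_val, rt_val, imm = 0x00400000, 7, 5, 0xFFF8
base = 0x1000

r_type_sum = alu_add(rs_val, rt_val)                          # register + register
mem_addr = alu_add(base, sign_extend16(imm))                  # lw/sw: base + sign-extended offset
pc_plus_4 = alu_add(pc, 4)                                    # PC increment
branch_target = alu_add(pc_plus_4, sign_extend16(imm) << 2)   # (PC + 4) + (offset << 2)
```

In the single-cycle design these four additions would need the main ALU plus two adders at once; here they share one unit because each happens on its own cycle.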

Implications of Shared Functional Units
• Multiplexors must be added, or existing ones expanded. For example:
  • The memory unit now holds both instructions (address in PC) and data (address in ALUOut).
  • The ALU must now accommodate all the inputs of the previous ALU and adders.

Two Extra Multiplexors
• To enable all the actions listed for the ALU, two extra multiplexors are needed:
  • A 2-to-1 mux, ALUSrcA, selects whether the first ALU input is the PC or a register.
  • A 4-to-1 mux, ALUSrcB, selects the second input from among:
    • a register read from the register file
    • the constant 4
    • a sign-extended constant
    • a sign-extended and shifted constant
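The two multiplexors can be modelled as functions that pick one operand each, with the select values acting as control signals. In this hedged Python sketch the select encodings are an assumption (they follow the common Patterson & Hennessy textbook convention and the slide's ordering), and `alu_src_a`, `alu_src_b`, and `alu_with_muxes` are hypothetical names.

```python
# Hypothetical sketch of the ALUSrcA / ALUSrcB multiplexors feeding one ALU.
# The select encodings are an assumption, listed in the same order as the slide.
MASK32 = 0xFFFFFFFF


def sign_extend16(imm: int) -> int:
    """Interpret a 16-bit immediate as a signed integer."""
    return imm - 0x10000 if imm & 0x8000 else imm


def alu_src_a(sel: int, pc: int, a: int) -> int:
    """2-to-1 mux: 0 -> PC, 1 -> register A."""
    if sel not in (0, 1):
        raise ValueError("ALUSrcA select must be 0 or 1")
    return pc if sel == 0 else a


def alu_src_b(sel: int, b: int, imm: int) -> int:
    """4-to-1 mux: 0 -> register B, 1 -> constant 4,
    2 -> sign-extended imm, 3 -> sign-extended imm shifted left by 2."""
    if sel not in range(4):
        raise ValueError("ALUSrcB select must be 0..3")
    inputs = (b, 4, sign_extend16(imm), sign_extend16(imm) << 2)
    return inputs[sel]


def alu_with_muxes(src_a: int, src_b: int, pc: int, a: int, b: int, imm: int) -> int:
    """One ALU add whose operands are chosen by the two multiplexors."""
    return (alu_src_a(src_a, pc, a) + alu_src_b(src_b, b, imm)) & MASK32


# (ALUSrcA, ALUSrcB) settings for the four jobs of the shared ALU:
pc_plus_4 = alu_with_muxes(0, 1, pc=0x00400000, a=0, b=0, imm=0)     # PC + 4
branch = alu_with_muxes(0, 3, pc=0x00400004, a=0, b=0, imm=0xFFF8)   # (PC + 4) + (offset << 2)
address = alu_with_muxes(1, 2, pc=0, a=0x1000, b=0, imm=0xFFF8)      # base + offset (lw/sw)
r_sum = alu_with_muxes(1, 0, pc=0, a=7, b=5, imm=0)                  # A + B (R-type add)
```

Only the control values change between the four calls; the datapath hardware is the same, which is the point of adding the multiplexors.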

A Single Memory
• One memory is used for both the instruction fetch and data access stages.
• The address for this memory may come from:
  • the PC register, when fetching an instruction
  • the ALU output (held in ALUOut), when a lw/sw instruction needs the effective memory address
• ⇒ Add a 2-to-1 mux, IorD, to select whether the memory is being accessed for an instruction or for data.
