EE204 Computer Architecture

EE204 Computer Architecture. Single-Cycle Datapath Performance. Performance of Single-Cycle Machines.

Presentation Transcript


  1. EE204 Computer Architecture: Single-Cycle Datapath Performance. Hina Anwar Khan, 2011

  2. Performance of Single-Cycle Machines
     • Assume the following operation times for the functional units: memory, 2 nanoseconds (ns); ALU and adders, 2 ns; register file, 1 ns. Assume that MUXes, control, sign-extension, PC accesses, and wires have no delay.
     • Which implementation is faster?
       1. Every instruction executes in one clock cycle of fixed length.
       2. Every instruction executes in one clock cycle of varying length.
     • Let's look at the time needed by each instruction class (all times in ns):

       Instruction   Inst. Fetch   Reg. Read   ALU op   Memory   Reg. Write   Total
       R-type        2             1           2        0        1            6
       Load          2             1           2        2        1            8
       Store         2             1           2        2        0            7
       Branch        2             1           2        0        0            5
       Jump          2             0           0        0        0            2

     Hina Anwar Khan Spring 2011
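
The per-instruction totals on this slide can be reproduced from the unit delays. The following is a minimal sketch; the names `DELAY_NS`, `UNITS_USED`, and `latency_ns` are hypothetical and chosen only for illustration.

```python
# Unit delays in ns, as assumed on the slide (MUXes, control, wires: no delay).
DELAY_NS = {"memory": 2, "alu": 2, "register_file": 1}

# Functional units each instruction class passes through, in order of use.
UNITS_USED = {
    "R-type": ["memory", "register_file", "alu", "register_file"],
    "load":   ["memory", "register_file", "alu", "memory", "register_file"],
    "store":  ["memory", "register_file", "alu", "memory"],
    "branch": ["memory", "register_file", "alu"],
    "jump":   ["memory"],
}

def latency_ns(kind):
    """Sum the delays of every unit the instruction class uses."""
    return sum(DELAY_NS[unit] for unit in UNITS_USED[kind])

latencies = {kind: latency_ns(kind) for kind in UNITS_USED}

# A fixed-length clock must be long enough for the slowest instruction class.
single_cycle_period_ns = max(latencies.values())

for kind, ns in latencies.items():
    print(f"{kind:7s} {ns} ns")
print("fixed clock period:", single_cycle_period_ns, "ns")
```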

  3. Fixed vs. Variable Cycle Length
     • Let's assume a program has the following instruction mix: 24% loads, 12% stores, 44% R-type, 18% branches, 2% jumps.
     • For the fixed cycle length, the cycle time is 8 ns, long enough for the longest instruction (load). Thus every instruction takes 8 ns to execute.
     • For the variable cycle length, the average CPU clock cycle is: 8*24% + 7*12% + 6*44% + 5*18% + 2*2% = 6.34 ns, or about 6.3 ns.
     • The variable-clock implementation is therefore 8/6.3 ≈ 1.27 times faster, but it is extremely hard to implement.
     • When instructions such as multiply and divide are added, which can take tens of cycles, the fixed-cycle scheme becomes too slow, because the single cycle must be long enough for the slowest instruction.
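
The weighted average on this slide can be checked with a few lines of Python. This is a sketch using the latencies and instruction mix stated in the slides; the variable names are hypothetical. Without rounding, the average comes to 6.34 ns and the speedup to roughly 1.26, which agrees with the rounded figures on the slide.

```python
# Per-class latencies in ns (from the previous slide) and the instruction mix.
LATENCY_NS = {"load": 8, "store": 7, "R-type": 6, "branch": 5, "jump": 2}
MIX = {"load": 0.24, "store": 0.12, "R-type": 0.44, "branch": 0.18, "jump": 0.02}

# Fixed clock: every instruction takes as long as the slowest one.
fixed_cycle_ns = max(LATENCY_NS.values())

# Variable clock: average cycle length weighted by instruction frequency.
variable_cycle_ns = sum(LATENCY_NS[kind] * MIX[kind] for kind in MIX)

speedup = fixed_cycle_ns / variable_cycle_ns

print(f"fixed:    {fixed_cycle_ns} ns")
print(f"variable: {variable_cycle_ns:.2f} ns")
print(f"speedup:  {speedup:.2f}x")
```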

  4. Observations on the Single Cycle Design
     • The single-cycle datapath is straightforward, but...
       • It has to use 3 separate ALUs
       • It has separate instruction and data memories
       • Cycle time is determined by the worst-case path
     • A multi-cycle datapath might be better
       • We can reuse some of the hardware
       • We can combine the memories
       • Cycle time is still constant, but instructions may take differing numbers of cycles

  5. Multi-Cycle Implementation
     • Each step in execution takes 1 clock cycle
     • Different instructions take different numbers of clock cycles
     • A functional unit can be used more than once per instruction, as long as it is used on different clock cycles
     • Hardware units are reduced and shared

  6. Multicycle Datapath
     [Figure: single instruction and data memory, single ALU, registers]

  7. Multicycle Execution
     • Instruction Register (IR)
       • Holds the instruction until the end of execution
     • Memory Data Register (MDR)
     • A Register
     • B Register
     • ALUOut Register

  8. Multicycle Datapath
     [Figure labels: branch target address; address; register block; instruction/data memory; instruction; PC = PC + 4; ALU; data; arithmetic/branch instruction; lw/sw instruction]

  9. Multicycle Datapath

  10. MultiCycle Datapath & Control Signals

  11. One Single ALU
     • One single ALU is used to perform all of the necessary functions:
       • An arithmetic operation on two register operands
       • Adding a register to a sign-extended constant, to compute memory addresses in lw/sw instructions
       • Computing PC+4 to increment the PC
       • Adding a sign-extended, shifted offset to (PC+4) for branches

  12. Implications of Shared Functional Units
     • Need to add multiplexers or expand existing multiplexers
       • e.g. the memory unit now contains both instructions (address in PC) and data (address in ALUOut)
       • e.g. the ALU must now accommodate all inputs of the previous ALU and adders

  13. Two extra multiplexers
     • To enable all the actions listed for the ALU, two extra multiplexers are needed
       • A 2-to-1 mux, ALUSrcA, selects whether the first ALU input is the PC or a register
       • A 4-to-1 mux, ALUSrcB, selects the second input from among
         • the register file
         • a constant 4
         • a sign-extended constant, and
         • a sign-extended and shifted constant
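
The two ALU input multiplexers can be sketched in software as simple selection functions. The select encodings used here (0 = PC for ALUSrcA; 0 to 3 for ALUSrcB in the order listed on the slide) are an assumption, though they agree with the settings shown on the later control-signal slides. Function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_src_a(select, pc, reg_a):
    """2-to-1 mux: 0 selects the PC, 1 selects register A (assumed encoding)."""
    return pc if select == 0 else reg_a

def alu_src_b(select, reg_b, imm16):
    """4-to-1 mux over the four sources listed on the slide (assumed order)."""
    sources = (
        reg_b,                      # 0: register file output B
        4,                          # 1: constant 4, for PC + 4
        sign_extend16(imm16),       # 2: sign-extended constant, for lw/sw
        sign_extend16(imm16) << 2,  # 3: sign-extended, shifted constant (branches)
    )
    return sources[select]

# PC + 4 during instruction fetch: first input is the PC, second is constant 4.
next_pc = alu_src_a(0, pc=100, reg_a=7) + alu_src_b(1, reg_b=9, imm16=0)
print(next_pc)
```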

  14. One single memory
     • One single memory is used in both the instruction fetch and data access stages.
     • The address for this memory may come from:
       • the PC register, when fetching an instruction
       • the ALU output, when executing a lw/sw instruction that needs the effective memory address
     • => Add a 2-to-1 mux, IorD, to select whether the memory is being accessed for instructions or for data.
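
As a rough illustration, the IorD mux reduces to a single selection of the memory address. The encoding (0 = instruction, 1 = data) follows the IorD=0 setting shown for instruction fetch later in the deck; the memory contents and the function name are hypothetical.

```python
def memory_address(i_or_d, pc, alu_out):
    """2-to-1 IorD mux: 0 selects the PC (instruction), 1 selects ALUOut (data)."""
    return pc if i_or_d == 0 else alu_out

# One memory holds both the instruction stream and the data (made-up contents).
memory = {0: 0x8C220004, 104: 21}

# Instruction fetch: the address comes from the PC.
instruction = memory[memory_address(0, pc=0, alu_out=104)]

# Data access for lw/sw: the address comes from the ALU output.
data = memory[memory_address(1, pc=0, alu_out=104)]

print(hex(instruction), data)
```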

  15. Breaking Instruction into Clock Cycles
     • Goal: balance the amount of work done in each cycle so that the clock period is minimized.
     • Restrict each step to contain at most one of:
       • ALU operation
       • Register file access
       • Memory access
     • The clock cycle time will be that of the longest of the above operations.
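
Under the unit delays assumed at the start of the deck (memory 2 ns, ALU 2 ns, register file 1 ns), the rule on this slide gives the multicycle clock period directly. A small sketch, with hypothetical names:

```python
# Unit delays in ns, as assumed on the single-cycle performance slide.
UNIT_DELAY_NS = {"memory": 2, "alu": 2, "register_file": 1}

# Each step uses at most one of these units, so the clock period only has
# to cover the slowest single unit rather than a whole instruction.
clock_period_ns = max(UNIT_DELAY_NS.values())

def instruction_time_ns(num_cycles):
    """Execution time of an instruction that needs num_cycles steps."""
    return num_cycles * clock_period_ns

print("clock period:", clock_period_ns, "ns")
```

For example, an instruction that needs five steps would take `instruction_time_ns(5)`, that is 10 ns, under these assumptions.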

  16. Complete Multicycle Datapath

  17. Arithmetic Instruction Steps
     • Instruction Fetch
       • IR = Mem[PC]
       • PC = PC + 4
     • Instruction Decode
       • A = Reg[IR[25-21]]
       • B = Reg[IR[20-16]]
     • Instruction Execution
       • ALUOut = A op B
     • Store Result
       • Reg[IR[15-11]] = ALUOut

  18. lw Instruction Steps
     • Instruction Fetch
       • IR = Mem[PC]
       • PC = PC + 4
     • Instruction Decode
       • A = Reg[IR[25-21]]
     • Address calculation
       • ALUOut = A + sign-extend(IR[15-0])
     • Memory Access
       • MDR = Memory[ALUOut]
     • Memory read completion
       • Reg[IR[20-16]] = MDR

  19. sw Instruction Steps
     • Instruction Fetch
       • IR = Mem[PC]
       • PC = PC + 4
     • Instruction Decode
       • A = Reg[IR[25-21]]
       • B = Reg[IR[20-16]]
     • Address calculation
       • ALUOut = A + sign-extend(IR[15-0])
     • Memory write completion
       • Mem[ALUOut] = B

  20. Branch Instruction Steps
     • Instruction Fetch
       • IR = Mem[PC]
       • PC = PC + 4
     • Instruction Decode
       • A = Reg[IR[25-21]]
       • B = Reg[IR[20-16]]
       • ALUOut = PC + (sign-extend(IR[15-0]) << 2)
     • Branch Execution
       • If (A == B) PC = ALUOut

  21. Jump Instruction
     • Instruction Fetch
       • IR = Mem[PC]
       • PC = PC + 4
     • Jump Execution
       • PC = PC[31-28] || (IR[25-0] << 2)

  22. Breaking instruction into steps
     • Instruction Fetch
       • IR = Mem[PC] (all instructions)
       • PC = PC + 4 (all instructions)
     • Instruction Decode
       • A = Reg[IR[25-21]] (all instructions except jump)
       • B = Reg[IR[20-16]] (arithmetic, sw, and branch)
       • ALUOut = PC + (sign-extend(IR[15-0]) << 2) (branch only)

  23. Breaking instruction into steps
     • Execution, memory address calculation, or branch
       • ALUOut = A + sign-extend(IR[15-0]) (lw/sw)
       • ALUOut = A op B (arithmetic)
       • If (A == B) PC = ALUOut (branch)
       • PC = PC[31-28] || (IR[25-0] << 2) (jump)
     • Memory access or R-type instruction completion
       • MDR = Memory[ALUOut] (lw)
       • Mem[ALUOut] = B (sw)
       • Reg[IR[15-11]] = ALUOut (arithmetic)

  24. Breaking instruction into steps
     • Memory read completion
       • Reg[IR[20-16]] = MDR (lw)
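
The step breakdown above can be sketched as a small Python function that executes one instruction and reports how many steps (clock cycles) it used. This is an illustrative model, not the hardware: it assumes the standard MIPS opcodes (0x00 R-type, 0x23 lw, 0x2B sw, 0x04 beq, 0x02 j), models only add and sub for R-type, represents memory as a dictionary keyed by byte address, and counts the decode step for jump as well, since decode is performed for all instructions. The names `execute`, `field`, and `sign_extend16` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def field(word, hi, lo):
    """Return bits hi..lo of a 32-bit word, as in IR[hi-lo] on the slides."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def execute(pc, regs, mem):
    """Run one instruction; return (new_pc, cycles_used)."""
    # Step 1: instruction fetch
    ir = mem[pc]
    pc = (pc + 4) & MASK32
    # Step 2: decode / register fetch (B is read unconditionally here)
    a = regs[field(ir, 25, 21)]
    b = regs[field(ir, 20, 16)]
    imm = sign_extend16(field(ir, 15, 0))
    alu_out = (pc + (imm << 2)) & MASK32        # branch target, computed early
    opcode = field(ir, 31, 26)
    # Step 3: execution, address calculation, or branch/jump completion
    if opcode == 0x02:                          # j
        return (pc & 0xF0000000) | (field(ir, 25, 0) << 2), 3
    if opcode == 0x04:                          # beq
        return (alu_out if a == b else pc), 3
    if opcode == 0x00:                          # R-type: funct 0x20 is add,
        is_add = field(ir, 5, 0) == 0x20        # anything else treated as sub
        alu_out = (a + b if is_add else a - b) & MASK32
        regs[field(ir, 15, 11)] = alu_out       # Step 4: R-type completion
        return pc, 4
    alu_out = (a + imm) & MASK32                # lw/sw effective address
    if opcode == 0x2B:                          # sw
        mem[alu_out] = b                        # Step 4: memory write
        return pc, 4
    mdr = mem[alu_out]                          # Step 4: memory read (other
    regs[field(ir, 20, 16)] = mdr               # opcodes are treated as lw)
    return pc, 5                                # Step 5: write-back
```

In this model, lw takes five steps, R-type and sw take four, and branch and jump take three.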

  25. Finite State Machine Control

  26. Cycle 1: Instruction Fetch (all instructions)
     [Figure: multicycle datapath with control signals]
     • Control settings: IorD=0, MemRead=1, MemWrite=0, IRWrite=1, ALUSelA=0, ALUSelB=1, ALUOp=0, PCWrite=1, PCSource=0, RegWrite=0

  27. Cycle 2: Instruction Decode/Register Fetch (all instructions)
     [Figure: multicycle datapath with control signals]
     • Control settings: MemRead=0, MemWrite=0, IRWrite=0, ALUSelA=0, ALUSelB=3, ALUOp=0, PCWrite=0, PCWriteCond=0, RegWrite=0
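
The control settings for the first two cycles can be written down as plain dictionaries and used to drive a toy model of the ALU inputs. This is a sketch under assumptions: ALUOp=0 is taken to mean "add", ALUSelA=0 to select the PC, and ALUSelB values 1 and 3 to select the constant 4 and the sign-extended, shifted offset respectively. The function name `alu_output` is hypothetical.

```python
# Control words for the two cycles shared by all instructions (from the slides).
FETCH = {"IorD": 0, "MemRead": 1, "MemWrite": 0, "IRWrite": 1, "ALUSelA": 0,
         "ALUSelB": 1, "ALUOp": 0, "PCWrite": 1, "PCSource": 0, "RegWrite": 0}
DECODE = {"MemRead": 0, "MemWrite": 0, "IRWrite": 0, "ALUSelA": 0,
          "ALUSelB": 3, "ALUOp": 0, "PCWrite": 0, "PCWriteCond": 0,
          "RegWrite": 0}

def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def alu_output(signals, pc, a, b, imm16):
    """Compute the ALU result selected by ALUSelA/ALUSelB (add only)."""
    if signals["ALUOp"] != 0:
        raise ValueError("only ALUOp=0 (assumed to mean add) is modeled")
    in_a = pc if signals["ALUSelA"] == 0 else a
    in_b = (b, 4, sign_extend16(imm16),
            sign_extend16(imm16) << 2)[signals["ALUSelB"]]
    return (in_a + in_b) & 0xFFFFFFFF

# Cycle 1: the ALU computes PC + 4, and PCWrite=1 latches it into the PC.
pc_plus_4 = alu_output(FETCH, pc=8, a=0, b=0, imm16=0)

# Cycle 2: the ALU computes the branch target from the updated PC; PCWrite=0,
# so the result is only held in ALUOut.
branch_target = alu_output(DECODE, pc=pc_plus_4, a=0, b=0, imm16=0xFFFE)

print(pc_plus_4, branch_target)
```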

  28. Fetch & Decode Instruction State Diagram
