CprE 381 Computer Organization and Assembly Level Programming, Fall 2013
Chapter 4: The Processor
Zhao Zhang, Iowa State University
Revised from original slides provided by MKP

Week 8 Overview
• CPU design overview
• Datapath and Control
• Control Unit
• ALU Control Unit
Chapter 1 — Computer Abstractions and Technology — 2

Announcements
• Mini-project B starts in week 9
• Mini-projects B and C will be revised
• The grading scale will be discussed by Friday (week 8)

§4.1 Introduction
• CPU performance factors
  • Instruction count
    • Determined by ISA and compiler
  • CPI and cycle time
    • Determined by CPU hardware
• We will examine two MIPS implementations
  • A simplified, single-cycle version
  • A more realistic, pipelined version
Chapter 4 — The Processor — 4

Nine-Instruction MIPS
• We will first use a MIPS subset of nine instructions, then extend the subset
• It is enough to illustrate most aspects of CPU design, particularly datapath and control design
  • Memory reference: LW and SW
  • Arithmetic/logic: ADD, SUB, AND, OR, SLT
  • Branch: BEQ, BNE

Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
Then, depending on instruction class
• Execute: use ALU to calculate
  • Arithmetic result
  • Memory address for load/store
  • Branch target address
• Memory access: access data memory for load/store
• Register writeback: write data back to registers
PC update (for all): PC ← target address or PC + 4

CPU Overview: A Sketchy View
• Next sequential PC = PC + 4
• Branch target = (PC + 4) + offset
An instruction may change
• PC (all instructions)
• Some register (arithmetic/logic, load)
• Some memory word/halfword/byte (store)

Multiplexers
• Can’t just join wires together
  • Use multiplexers
• What would happen if you just join signals in VHDL?

Control
• Control signals: mux select, read/write enable, ALU opcode, etc.

§4.2 Logic Design Conventions
Logic Design Basics
• Combinational elements
  • Operate on data
  • Output is a function of input
• State (sequential) elements
  • Store information
  • Output is a function of internal state and input

Combinational Elements
• AND-gate
  • Y = A & B
• Adder
  • Y = A + B
• Arithmetic/Logic Unit
  • Y = F(A, B)
• Multiplexer
  • Y = S ? I1 : I0
(Figure: symbols for the AND gate, adder, ALU, and multiplexer)

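As an illustration of the multiplexer equation Y = S ? I1 : I0, here is a minimal dataflow sketch of a 32-bit, 2-to-1 mux. The entity name mux2_32, the port names, and the use of std_logic_vector (rather than the course's own m32 types) are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2-to-1, 32-bit multiplexer: y = s ? i1 : i0
entity mux2_32 is
  port (i0, i1 : in  std_logic_vector(31 downto 0);
        s      : in  std_logic;
        y      : out std_logic_vector(31 downto 0));
end mux2_32;

architecture dataflow of mux2_32 is
begin
  -- Conditional signal assignment: i1 when s is '1', i0 for any other value of s
  y <= i1 when s = '1' else i0;
end dataflow;
```

Only one source drives y at a time, which is what makes the mux a safe replacement for joining two wires.
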
Sequential Elements
• Register: stores data in a circuit
  • Uses a clock signal to determine when to update the stored value
  • Edge-triggered: update when Clk changes from 0 to 1
  • Data output Q is stable for a clock cycle
(Figure: register symbol and timing diagram for Clk, D, and Q)

Sequential Elements
• Register with write control
  • Only updates on clock edge when write control input is 1
  • VHDL: rising_edge(Clk) and Write = '1'
  • Used when stored value is required later
(Figure: register symbol with Write input and timing diagram for Clk, Write, D, and Q)

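A register with write control can be sketched behaviorally as follows. This is only one possible coding: the entity name reg32 and the port name we (standing in for the slide's Write input) are assumptions, and std_logic types are used in place of the course package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 32-bit register with write enable
entity reg32 is
  port (clk : in  std_logic;
        we  : in  std_logic;                        -- write control
        d   : in  std_logic_vector(31 downto 0);
        q   : out std_logic_vector(31 downto 0));
end reg32;

architecture behavior of reg32 is
begin
  process (clk)
  begin
    -- Update only on a rising clock edge, and only when we is '1'
    if rising_edge(clk) then
      if we = '1' then
        q <= d;
      end if;
    end if;
  end process;
end behavior;
```

The nested if is equivalent to testing rising_edge(clk) and we = '1' in a single condition; when we is '0' the register simply keeps its old value.
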
Clocking Methodology
• Combinational logic transforms data during clock cycles
  • Input from state elements
  • Output must stabilize within one cycle
    • Longest delay determines clock period
  • Output to state element at the next rising edge

Clocking Methodology
• Processor is a big state machine
• Works like a Moore machine in non-I/O phase
  • Output is a function of the state
  • States include PC, all registers and memory contents

§4.3 Building a Datapath
• Datapath elements
  • Elements that process data and addresses in the CPU
  • Registers, ALUs, muxes, memories, …
• We will build a MIPS datapath incrementally
  • Refining the overview design

Instruction Fetch
• PC: 32-bit register
• Adder: increment by 4 for next instruction
• Datapath elements: PC register, instruction memory, 32-bit adder

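The PC register and the "+4" adder of the fetch stage might be modeled as in the sketch below. The entity name pc_unit, the synchronous reset input, and the next_pc input (which would come from the branch/jump selection logic) are assumptions not shown on the slide; the instruction memory is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical fetch logic: PC register plus PC + 4 adder
entity pc_unit is
  port (clk, reset : in  std_logic;
        next_pc    : in  std_logic_vector(31 downto 0);
        pc         : out std_logic_vector(31 downto 0);
        pc_plus_4  : out std_logic_vector(31 downto 0));
end pc_unit;

architecture behavior of pc_unit is
  signal pc_reg : std_logic_vector(31 downto 0) := (others => '0');
begin
  -- State element: the PC is updated once per clock cycle
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pc_reg <= (others => '0');   -- assumed reset address 0
      else
        pc_reg <= next_pc;
      end if;
    end if;
  end process;

  -- Combinational elements: expose the PC and compute the sequential address
  pc        <= pc_reg;
  pc_plus_4 <= std_logic_vector(unsigned(pc_reg) + 4);
end behavior;
```
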
R-Format Instructions
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
Datapath elements: register file, ALU

Load/Store Instructions
• Read register operands
• Calculate address using 16-bit offset
  • Use ALU, but sign-extend offset
• Load: read memory and update register
• Store: write register value to memory
Datapath elements: data memory, sign extender

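The sign extender used for the 16-bit offset could be written as a one-line dataflow model. The entity and port names below are assumptions for illustration; resize on a signed value from ieee.numeric_std replicates the sign bit into the upper 16 bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 16-to-32-bit sign extender
entity sign_ext16 is
  port (imm16 : in  std_logic_vector(15 downto 0);
        imm32 : out std_logic_vector(31 downto 0));
end sign_ext16;

architecture dataflow of sign_ext16 is
begin
  -- Treat the offset as a signed number and widen it to 32 bits
  imm32 <= std_logic_vector(resize(signed(imm16), 32));
end dataflow;
```

The ALU then adds imm32 to the base register value to form the memory address.
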
Branch Instructions
• Read register operands
• Compare operands
  • Use ALU, subtract and check Zero output
• Calculate target address
  • Sign-extend displacement
  • Shift left 2 places (word displacement)
  • Add to PC + 4
    • Already calculated by instruction fetch

Branch Instructions
• Shift left 2: just re-routes wires
• Sign extend: sign-bit wire replicated
• New: shifter, 2nd 32-bit adder

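The branch target calculation and the choice between it and PC + 4 can be sketched as below, for BEQ only (BNE would test the opposite Zero condition). The entity name branch_unit and all port names are assumptions; imm32 is taken to be the already sign-extended displacement.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical branch target adder and next-PC selection (BEQ only)
entity branch_unit is
  port (pc_plus_4 : in  std_logic_vector(31 downto 0);
        imm32     : in  std_logic_vector(31 downto 0);  -- sign-extended displacement
        branch    : in  std_logic;                      -- Branch control signal
        zero      : in  std_logic;                      -- ALU Zero output
        next_pc   : out std_logic_vector(31 downto 0));
end branch_unit;

architecture dataflow of branch_unit is
  signal offset : std_logic_vector(31 downto 0);
  signal target : std_logic_vector(31 downto 0);
begin
  -- Shift left 2 is only wiring: drop the top two bits and append "00"
  offset <= imm32(29 downto 0) & "00";

  -- Second 32-bit adder: target = (PC + 4) + offset
  target <= std_logic_vector(unsigned(pc_plus_4) + unsigned(offset));

  -- Take the branch only when it is a branch and the operands were equal
  next_pc <= target when (branch = '1' and zero = '1') else pc_plus_4;
end dataflow;
```
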
Composing the Elements
• First-cut datapath does an instruction in one clock cycle
  • Each datapath element can only do one function at a time
  • Hence, we need separate instruction and data memories
• Use multiplexers where alternate data sources are used for different instructions

R-Type/Load/Store Datapath

Full Datapath

Performance Issues
• Longest delay determines clock period
  • Critical path: load instruction
  • Instruction memory → register file → ALU → data memory → register file
• Not every instruction requires the same time

Performance Issues
• Some instructions may take substantially longer, e.g. multiply/divide
• Not feasible to vary clock cycle for different instructions
  • Must use the worst-case delay as the clock cycle
• Violates the design principle of making the common case fast
• We will improve performance by pipelining

§4.4 A Simple Implementation Scheme
ALU Control
• ALU used for
  • Load/Store: F = add
  • Branch: F = subtract
  • R-type: F depends on funct field

ALU Control
• Assume 2-bit ALUOp derived from opcode
  • Combinational logic derives ALU control

VHDL Notes
• How to program the ALU control?
  -- Behavior style
  process (alu_op, funct)
  begin
    case alu_op is
      when "00" => alu_code <= "0010";
      when "01" => …
    end case;
  end process;

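For reference, a complete ALU control unit in the same behavior style might look like the sketch below. It is not the course's provided code: the entity wrapper and std_logic_vector ports are assumptions, and the 4-bit ALU codes other than "0010" for add follow the textbook's convention (AND 0000, OR 0001, add 0010, subtract 0110, slt 0111), which the slides do not list. The funct values are the standard MIPS encodings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical ALU control for the nine-instruction subset
entity alu_control is
  port (alu_op   : in  std_logic_vector(1 downto 0);
        funct    : in  std_logic_vector(5 downto 0);
        alu_code : out std_logic_vector(3 downto 0));
end alu_control;

architecture behavior of alu_control is
begin
  process (alu_op, funct)
  begin
    case alu_op is
      when "00" => alu_code <= "0010";          -- load/store: add
      when "01" => alu_code <= "0110";          -- branch: subtract
      when "10" =>                              -- R-type: decode funct field
        case funct is
          when "100000" => alu_code <= "0010";  -- add
          when "100010" => alu_code <= "0110";  -- sub
          when "100100" => alu_code <= "0000";  -- and
          when "100101" => alu_code <= "0001";  -- or
          when "101010" => alu_code <= "0111";  -- slt
          when others   => alu_code <= "0000";  -- unsupported funct (assumed default)
        end case;
      when others => alu_code <= "0000";        -- unused ALUOp value (assumed default)
    end case;
  end process;
end behavior;
```

Every branch of both case statements assigns alu_code, so the process describes purely combinational logic with no inferred latch.
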
The Main Control Unit
• Control signals derived from instruction

Type        31:26     25:21  20:16  15:11  10:6   5:0
R-type      0         rs     rt     rd     shamt  funct
Load/Store  35 or 43  rs     rt     address (15:0)
Branch      4         rs     rt     address (15:0)

• Bits 31:26: opcode
• rs: always read
• rt: read, except for load
• rd (R-type) or rt (load): write for R-type and load
• address: sign-extend and add

Datapath With Control

Summary of Control Signals
• RegDst: Write to register rt or rd?
• ALUSrc: Immediate to ALU?
• MemtoReg: Write memory or ALU output?
• RegWrite: Write to regfile at all?
• MemRead: Read from Data Memory?
• MemWrite: Write to the Data Memory?
• Branch: Is it a branch instruction?
• ALUOp[1:0]: ALU control field

R-Type Instruction

R-Type: Control Signals
Signal      Value  Note
RegDst      1      write to rd
ALUSrc      0      no immediate
MemtoReg    0      write data not from memory
RegWrite    1      does write regfile
MemRead     0      no memory read
MemWrite    0      no memory write
Branch      0      not a branch
ALUOp[1:0]  10     R-type ALU op

Load Instruction

Load: Control Signals
Signal      Value
RegDst      0
ALUSrc      1
MemtoReg    1
RegWrite    1
MemRead     1
MemWrite    0
Branch      0
ALUOp[1:0]  00

Store: Control Signals
Signal      Value
RegDst      X
ALUSrc      1
MemtoReg    X
RegWrite    0
MemRead     0
MemWrite    1
Branch      0
ALUOp[1:0]  00

Branch-on-Equal Instruction

BEQ: Control Signals
Signal      Value
RegDst      X
ALUSrc      0
MemtoReg    X
RegWrite    0
MemRead     0
MemWrite    0
Branch      1
ALUOp[1:0]  01

Control Signal Setting
• What are the control signal values for each instruction or instruction type?
Note: “R-” means R-format

VHDL Notes
• How to program the control?
  entity control is
    port (op_code    : in  m32_6bits;
          reg_dst    : out m32_1bit;
          alu_src    : out m32_1bit;
          mem_to_reg : out m32_1bit;
          reg_write  : out m32_1bit;
          mem_read   : out m32_1bit;
          mem_write  : out m32_1bit;
          branch     : out m32_1bit;
          alu_op     : out m32_2bits);
  end control;

VHDL Notes
  architecture rom of control is
    subtype code_t is m32_vector(8 downto 0);
    type rom_t is array (0 to 63) of code_t;
    -- The ROM content for control signals
    signal rom : rom_t := (
      00 => "100100010",   -- R-type
      35 => "011110000",   -- LW
      …                    -- More for other instructions
      others => "000000000");
  begin
    (reg_dst, alu_src, mem_to_reg, reg_write, mem_read, mem_write,
     branch, alu_op(1), alu_op(0)) <= rom(to_integer(unsigned(op_code)));
  end rom;

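A self-contained variant of the ROM idea, using only standard IEEE types, is sketched below. The entity name control_rom, the single 9-bit signals output, and the constant name rom_c are assumptions for this example. The SW and BEQ rows are taken from the Store and BEQ control-signal slides, with each don't-care (X) driven as 0, which is an arbitrary choice.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical ROM-based main control with standard types
entity control_rom is
  port (op_code : in  std_logic_vector(5 downto 0);
        signals : out std_logic_vector(8 downto 0));
end control_rom;

architecture table of control_rom is
  subtype code_t is std_logic_vector(8 downto 0);
  type rom_t is array (0 to 63) of code_t;

  -- Bit order, left to right: reg_dst, alu_src, mem_to_reg, reg_write,
  -- mem_read, mem_write, branch, alu_op(1), alu_op(0)
  constant rom_c : rom_t := (
     0     => "100100010",   -- R-type
    35     => "011110000",   -- LW
    43     => "010001000",   -- SW  (X entries driven as 0)
     4     => "000000101",   -- BEQ (X entries driven as 0)
    others => "000000000");  -- unsupported opcodes: all signals off
begin
  -- The 6-bit opcode indexes one of the 64 ROM entries
  signals <= rom_c(to_integer(unsigned(op_code)));
end table;
```

Adding an instruction to this style of control unit means adding one ROM row, which is why the slides favor it over hand-written gate logic.
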
Implementing Jumps
Jump format: 2 (31:26), address (25:0)
• Jump uses word address
• Update PC with concatenation of
  • Top 4 bits of old PC
  • 26-bit jump address
  • 00
• Need an extra control signal decoded from opcode

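The jump target is pure wiring, as the following sketch suggests. The entity and port names are assumptions. The slide takes the top 4 bits from the old PC, while the textbook datapath taps them from PC + 4; the sketch uses whichever value is connected to pc.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical jump target formation: 4 + 26 + 2 = 32 bits
entity jump_addr is
  port (pc      : in  std_logic_vector(31 downto 0);
        address : in  std_logic_vector(25 downto 0);  -- instruction bits 25:0
        jump_pc : out std_logic_vector(31 downto 0));
end jump_addr;

architecture dataflow of jump_addr is
begin
  -- Concatenate the top 4 PC bits, the 26-bit word address, and "00"
  jump_pc <= pc(31 downto 28) & address & "00";
end dataflow;
```

A further mux, selected by the Jump control signal, would then choose between jump_pc and the branch/sequential next PC.
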
Datapath With Jumps Added

Grading Scale
• Tentative grading scale
  A: 90, A-: 87
  B+: 84, B: 80, B-: 75
  C+: 70, C: 65, C-: 60
  D: 50
• There will be a bonus in lab projects

Mini-Project B, Tentative
Implement a single-cycle processor (SCP). There will be three parts:
• Part 1, SCPv1: Implement the nine-instruction ISA plus the J instruction
• Part 2, SCPv2a: Support all the instructions needed to run bubble sort
• Part 3, SCPv2b: Detailed modeling of data elements

Mini-Project B
• Bonus part, SCPv3: Support all integer instructions on the green sheet, due in the last lab
• Some support files will be provided
  • High-level modeling of Register File, ALU, Adder, to be used in Parts 1 and 2
  • Partial sample VHDL code will be provided

Mini-Project B
• The CPU composition must be strongly structural
• Parts 1 and 2 may use behavior/dataflow modeling for data elements
• Part 3 must use detailed modeling for data elements
  • Reuse your VHDL code in the labs

Extend Single-Cycle MIPS
Consider the following instructions
• addi: add immediate
• sll: shift left logical by a constant
• bne: branch if not equal
• jal: jump and link
• jr: jump register

SCPv0: R-Format, LW/SW, BEQ