Understanding the CPU performance factors such as instruction count, CPI, and cycle time. Learn the steps of processor design and overview of instruction execution in a computer system. Includes logic design conventions, MIPS instruction subset, and execution cycle details. Processor organization overview with a focus on datapath components and control logic. Introduction to memory reference, arithmetic, logical operations, and control transfer in processor design. Exploring the performance perspective and components influencing clock cycle time and CPI. Processor design steps and implementation considerations covered in depth.
Computer Organization CS224 Fall 2012 Lesson 22
The Big Picture: The Five Classic Components of a Computer
• Chapter 4 topic: Processor Design
[Figure: the five components: Processor (Control and Datapath), Memory, Input, Output]
Introduction (§4.1)
• CPU performance factors
  • Instruction count: determined by ISA and compiler
  • CPI and cycle time: determined by CPU hardware
• We will examine two MIPS implementations
  • A simplified version
  • A more realistic pipelined version
• Simple subset, shows most aspects
  • Memory reference: lw, sw
  • Arithmetic/logical: add, sub, and, ori, slt
  • Control transfer: beq, j
The Performance Perspective
• Performance of a machine is determined by:
  • Instruction count
  • Clock cycle time
  • Clock cycles per instruction
• Processor design (datapath and control) will determine:
  • Clock cycle time (CCT)
  • Clock cycles per instruction (CPI)
• This week: single-cycle processor (datapath + control)
  • Advantage: one clock cycle per instruction
  • Disadvantage: long cycle time
[Figure: CPI, instruction count, and cycle time]
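The three factors on this slide combine multiplicatively in the classic performance equation, CPU time = instruction count × CPI × clock cycle time. A minimal sketch follows; the function name and all numbers are hypothetical, chosen only to show the single-cycle trade-off (CPI of 1 but a long cycle).

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time in picoseconds = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_ps

# Hypothetical numbers, for illustration only.
single_cycle = cpu_time_ps(1_000_000, 1, 800)   # CPI = 1, long cycle
multi_cycle = cpu_time_ps(1_000_000, 4, 200)    # higher CPI, short cycle

print(single_cycle, multi_cycle)
```

With these made-up values the two designs take the same time, which shows that a low CPI alone does not guarantee better performance.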
Processor Design Steps
1. Analyze instruction set => datapath requirements
  • The meaning of each instruction is given by the register transfers (ISA model => RTL model)
  • Datapath must include storage elements for ISA registers, and possibly more
  • Datapath must support each register transfer
2. Select set of datapath components and establish clocking methodology
3. Assemble datapath meeting the RTL requirements
Processor Design (cont’d)
4. Analyze implementation of each instruction to determine setting of control points that effect the register transfer
5. Assemble the control logic
6. RTL datapath and control design are refined to track physical design and functional validation
  • Changes made for timing and errata (a.k.a. “bug”) fixes
  • Amount of work varies with capabilities of CAD tools and degree of optimization for cost/performance
Subset of Instructions
• To simplify our study of processor design, we will focus on a subset of the MIPS instructions
  • Memory: lw and sw
  • Arithmetic: add, sub, and, ori, and slt
  • Branch: beq and j
• The lecture example uses ori rather than the or covered in the text, to demonstrate one more category of instructions
• The method of implementing other instructions should follow naturally from these
MIPS Format Review
R-Format: add rd, rs, rt; sub rd, rs, rt
Field  OP=0  rs  rt  rd  sa  funct
Bits   6     5   5   5   5   6
• rs: first source register
• rt: second source register
• rd: result register
• sa: shift amount
• funct: function code
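The R-format field layout can be sketched as bit packing and unpacking. The helper names below are hypothetical; the funct value 0x20 for add and the register numbers ($t0 = 8, $s1 = 17, $s2 = 18) follow the standard MIPS conventions.

```python
def encode_r(rs, rt, rd, sa, funct):
    """Pack R-format fields: op(6)=0 | rs(5) | rt(5) | rd(5) | sa(5) | funct(6)."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (sa << 6) | funct

def decode_r(word):
    """Unpack a 32-bit R-format word into its fields."""
    return {
        "op": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }

# add $t0, $s1, $s2  ->  rd = 8, rs = 17, rt = 18, funct = 0x20
word = encode_r(rs=17, rt=18, rd=8, sa=0, funct=0x20)
print(hex(word))  # 0x2324020
```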
MIPS Format Review (cont)
I-Format: lw rt, rs, imm; sw rt, rs, imm; beq rs, rt, imm; ori rt, rs, imm
Field  OP  rs  rt  imm
Bits   6   5   5   16
• rs: first source register
• rt: second source register
• imm: immediate
Reminder: branch uses PC-relative addressing (PC + 4 + 4 × imm)
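The PC-relative formula PC + 4 + 4 × imm can be worked through in a few lines. This is a sketch with hypothetical helper names; it assumes the 16-bit immediate is sign-extended before use, as MIPS branches do, and that addresses wrap at 32 bits.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def branch_target(pc, imm):
    """Branch target = PC + 4 + 4 * sign-extended immediate (32-bit wraparound)."""
    return (pc + 4 + 4 * sign_extend16(imm)) & 0xFFFFFFFF

forward = branch_target(0x00400000, 3)        # skip 3 instructions ahead
backward = branch_target(0x00400010, 0xFFFF)  # imm = -1: branch to itself
print(hex(forward), hex(backward))
```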
MIPS Format Review (cont)
J-Format: j target
Field  OP  target
Bits   6   26
• target: jump target address
Reminders
• Uses pseudodirect addressing (target × 4) to allow addressing 2^28 bytes directly
• Uses top 4 bits from PC
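Pseudodirect addressing can be sketched as concatenating the top 4 bits of the PC with the 26-bit target shifted left by 2. The function name is hypothetical, and the sketch assumes the upper bits are taken from PC + 4 (the address of the following instruction), which is the usual MIPS convention.

```python
def jump_target(pc, target26):
    """Jump address = top 4 bits of (PC + 4), then the 26-bit target times 4."""
    upper = (pc + 4) & 0xF0000000          # assumption: upper bits come from PC + 4
    return upper | ((target26 & 0x03FFFFFF) << 2)

near = jump_target(0x00400000, 0x0100004)
high = jump_target(0x90000000, 1)
print(hex(near), hex(high))
```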
Execution Cycle
• Instruction Fetch: obtain instruction from program storage
• Instruction Decode: determine required actions and instruction size
• Operand Fetch: locate and obtain operand data
• Execute: compute result value or status
• Result Store: deposit results in storage for later use
• Next Instruction: determine successor instruction
What Happens?
• It’s hard to see how we should go about organizing the processor
• To start thinking about it, look at what happens on each instruction
  • The instruction specified by the PC is fetched from memory
  • One or two registers are read (lw vs. add, for instance)
  • The ALU must be used to add, subtract, etc.
  • The results are stored (to memory or a register)
Instruction Execution
• PC → instruction memory, fetch instruction
• Register numbers → register file, read registers
• Depending on instruction class
  • Use ALU to calculate
    • Arithmetic result
    • Memory address for load/store
    • Branch target address
  • Access data memory for load/store
  • PC ← target address or PC + 4
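The steps above can be sketched as a tiny interpreter that executes one instruction per call. This is an illustrative model, not the datapath built in the lecture: instructions are stored pre-decoded as hypothetical tuples (op, rs, rt, rd-or-imm), and only add, sub, lw, sw, and beq are handled.

```python
MASK = 0xFFFFFFFF

def step(pc, regs, imem, dmem):
    """Execute the instruction at pc and return the next PC."""
    op, rs, rt, x = imem[pc]            # fetch (x is rd for R-type, imm otherwise)
    a, b = regs[rs], regs[rt]           # read registers
    next_pc = pc + 4                    # default successor
    if op == "add":
        regs[x] = (a + b) & MASK        # ALU: arithmetic result
    elif op == "sub":
        regs[x] = (a - b) & MASK
    elif op == "lw":
        regs[rt] = dmem.get((a + x) & MASK, 0)   # ALU: memory address, then load
    elif op == "sw":
        dmem[(a + x) & MASK] = b                 # ALU: memory address, then store
    elif op == "beq":
        if a == b:
            next_pc = pc + 4 + 4 * x    # branch target address
    else:
        raise ValueError(f"unsupported op: {op}")
    regs[0] = 0                         # register 0 always reads as zero
    return next_pc

imem = {
    0: ("lw", 0, 1, 0),     # r1 = mem[r0 + 0]
    4: ("add", 1, 1, 2),    # r2 = r1 + r1
    8: ("beq", 2, 2, 1),    # always taken: skip the next instruction
    12: ("sw", 0, 2, 4),    # skipped
    16: ("sw", 0, 2, 8),    # mem[r0 + 8] = r2
}
regs, dmem, pc = [0] * 32, {0: 21}, 0
while pc in imem:
    pc = step(pc, regs, imem, dmem)
print(regs[2], dmem)
```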
Processor Overview • Data flows through memory and functional units
Multiplexers • Can’t just join wires together • Use multiplexers
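A two-input multiplexer simply forwards one of its inputs depending on a select signal. The sketch below uses hypothetical names and, as an example, picks the next PC between PC + 4 and a branch target.

```python
def mux2(select, in0, in1):
    """Two-input multiplexer: output in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0

pc = 0x00400000
branch_target = 0x00400020   # hypothetical value
not_taken = mux2(0, pc + 4, branch_target)
taken = mux2(1, pc + 4, branch_target)
print(hex(not_taken), hex(taken))
```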
Logic Design Basics (§4.2 Logic Design Conventions)
• Information encoded in binary
  • Low voltage = 0, high voltage = 1
  • One wire per bit
  • Multi-bit data encoded on multi-wire buses
• Combinational element
  • Operates on data
  • Output is a function of input
  • Example: ALU
• State (sequential) elements
  • Store information or state
  • Example: register file
1-bit ALU
• Using a MUX, we can combine the AND, OR, and adder operations into a single ALU
[Figure: 1-bit ALU with inputs A, B, Cin, and ALUOp; a 1-bit full adder feeding a mux; outputs Result and Cout]
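The 1-bit ALU can be sketched as three candidate results (AND, OR, adder sum) with a mux choosing among them. The ALUOp encoding used here (0 = AND, 1 = OR, 2 = ADD) is an assumption for illustration; the slide does not give one.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def alu1(a, b, cin, alu_op):
    """1-bit ALU. Assumed encoding: alu_op 0 = AND, 1 = OR, 2 = ADD."""
    s, cout = full_adder(a, b, cin)
    result = (a & b, a | b, s)[alu_op]   # the mux selects one candidate
    return result, cout

print(alu1(1, 0, 0, 0), alu1(1, 0, 0, 1), alu1(1, 1, 1, 2))
```

All three operations are computed in parallel, as in hardware; the select signal only decides which result is passed on.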
4-bit ALU
[Figure: four 1-bit ALUs chained together, with each COut feeding the next CIn (CIn0 through COut3); 4-bit inputs A and B, a shared ALUop, outputs Result0 to Result3 and COut3]
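Chaining four 1-bit ALUs so that each carry-out feeds the next carry-in gives a ripple-carry 4-bit ALU. The sketch below is self-contained; the names and the ALUOp encoding (0 = AND, 1 = OR, 2 = ADD) are assumptions for illustration.

```python
def alu1(a, b, cin, alu_op):
    """1-bit ALU. Assumed encoding: alu_op 0 = AND, 1 = OR, 2 = ADD."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return (a & b, a | b, s)[alu_op], cout

def alu4(a, b, alu_op, cin=0):
    """4-bit ripple ALU: bit i's carry-out becomes bit i+1's carry-in."""
    result, carry = 0, cin
    for i in range(4):
        bit, carry = alu1((a >> i) & 1, (b >> i) & 1, carry, alu_op)
        result |= bit << i
    return result, carry   # carry is COut3

print(alu4(0b0110, 0b0011, 2))  # 6 + 3
print(alu4(0b1111, 0b0001, 2))  # 15 + 1 overflows 4 bits
```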
Combinational Elements
• Adder: 32-bit inputs A and B, Carry_In; outputs 32-bit Sum and Carry
• MUX: 32-bit inputs A and B, Select; 32-bit output Y
• ALU: 32-bit inputs A and B, OP; outputs 32-bit Result and Zero
D Latches
• Modified SR latch
• Latches value when C is asserted
[Figure: D latch with inputs D and C, outputs Q and its complement]
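The latch behavior can be modeled at the level of its inputs and output: while C is asserted the output follows D, and otherwise it holds the last value. This is a behavioral sketch with hypothetical names, not a gate-level SR model.

```python
class DLatch:
    """Level-sensitive D latch: Q follows D while C is 1, holds while C is 0."""

    def __init__(self):
        self.q = 0

    def update(self, d, c):
        if c:
            self.q = d
        return self.q

latch = DLatch()
trace = [latch.update(1, 1), latch.update(0, 0), latch.update(0, 1)]
print(trace)
```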
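D Flip Flop
• Uses master/slave D latches
[Figure: two D latches in series; the first latch's Q drives the second latch's D; CLK drives the two C inputs; outputs Q and its complement]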
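A master/slave flip-flop can be sketched with two latches driven by opposite clock phases. The sketch assumes the master is transparent while CLK is 1 and the slave while CLK is 0, which makes the output change on the falling edge; the names are hypothetical and the polarity is an assumption, since the figure's clock inversion did not survive.

```python
class DLatch:
    """Level-sensitive D latch: Q follows D while C is 1, holds while C is 0."""
    def __init__(self):
        self.q = 0
    def update(self, d, c):
        if c:
            self.q = d
        return self.q

class DFlipFlop:
    """Master/slave D flip-flop (assumed falling-edge triggered)."""
    def __init__(self):
        self.master = DLatch()
        self.slave = DLatch()
    def update(self, d, clk):
        m = self.master.update(d, clk)         # master open while clk = 1
        return self.slave.update(m, 1 - clk)   # slave open while clk = 0

ff = DFlipFlop()
trace = [
    ff.update(1, 1),  # clk high: master captures 1, output unchanged
    ff.update(1, 0),  # falling edge: output becomes 1
    ff.update(0, 0),  # clk low: D changes, but master is closed
    ff.update(0, 1),  # clk high: master captures 0, output unchanged
    ff.update(0, 0),  # falling edge: output becomes 0
]
print(trace)
```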
Storage Element: Register
• Register
  • Similar to D flip-flop
  • N-bit input and output
  • Write Enable input
• Write Enable
  • 0: Data Out will not change
  • 1: Data Out will become Data In
• Data changes only on falling edge!
[Figure: register with N-bit Data In, N-bit Data Out, Write Enable, and Clk]
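The register's behavior can be sketched as a value that is updated only when a falling clock edge coincides with Write Enable = 1. The class and method names are hypothetical; the model tracks the previous clock level to detect the edge.

```python
class Register:
    """N-bit register: loads Data In on a falling clock edge when Write Enable is 1."""

    def __init__(self, width=32):
        self.mask = (1 << width) - 1
        self.data_out = 0
        self._prev_clk = 0

    def tick(self, clk, data_in, write_enable):
        falling_edge = self._prev_clk == 1 and clk == 0
        if falling_edge and write_enable:
            self.data_out = data_in & self.mask
        self._prev_clk = clk
        return self.data_out

r = Register()
trace = [
    r.tick(1, 5, 1),  # clk high: nothing stored yet
    r.tick(0, 5, 1),  # falling edge with enable: stores 5
    r.tick(1, 9, 0),
    r.tick(0, 9, 0),  # falling edge without enable: keeps 5
]
print(trace)
```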
Storage Element: Reg File
• Register file consists of 32 registers
  • Two 32-bit output buses: busA and busB
  • One 32-bit input bus: busW
  • Register 0 hard-wired to value 0
• Register selected by:
  • RA selects register to put on busA
  • RB selects register to put on busB
  • RW selects register to be written via busW when Write Enable is 1
• Clock input (CLK)
  • CLK input is a factor only for write operation
  • During read, behaves as combinational logic block: RA or RB stable => busA or busB valid after “access time”
  • Minor simplification of reality
[Figure: register file of 32 32-bit registers; 5-bit RW, RA, and RB inputs; Write Enable; Clk; 32-bit busW input; 32-bit busA and busB outputs]
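The register file interface can be sketched with a combinational read and a clocked write. The class and method names are hypothetical; `clock_edge` stands for the active clock edge, and access time is ignored.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, ra, rb):
        """Combinational read: returns (busA, busB) for register numbers RA, RB."""
        return self.regs[ra], self.regs[rb]

    def clock_edge(self, rw, bus_w, write_enable):
        """Clocked write: register RW takes busW when Write Enable is 1."""
        if write_enable and rw != 0:      # register 0 is hard-wired to 0
            self.regs[rw] = bus_w & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(rw=8, bus_w=42, write_enable=1)
rf.clock_edge(rw=9, bus_w=7, write_enable=0)   # ignored: enable is 0
rf.clock_edge(rw=0, bus_w=99, write_enable=1)  # ignored: register 0
bus_a, bus_b = rf.read(8, 9)
print(bus_a, bus_b, rf.read(0, 0))
```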
Storage Element: Memory
• Memory
  • One input bus: Data In
  • One output bus: Data Out
• Address selection
  • Address selects the word to put on Data Out
  • To write to address, set Write Enable to 1
• Clock input (CLK)
  • CLK input is a factor only for write operation
  • During read, behaves as combinational logic block: Address valid => Data Out valid after “access time”
  • Minor simplification of reality
[Figure: memory with Address, Write Enable, Clk, 32-bit Data In, and 32-bit Data Out]
Some Logic Design…
• All storage elements have same clock
• Edge-triggered clocking
• “Instantaneous” state change (simplification!)
• Timing always works if the clock is slow enough
• Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew
[Figure: clock waveform showing setup and hold windows around each clock edge, with “don’t care” intervals in between]
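The cycle-time formula can be evaluated directly once the four delays are known. The function name and the delay values below are hypothetical, chosen only to show the arithmetic.

```python
def min_cycle_time_ps(clk_to_q, longest_delay, setup, clock_skew):
    """Cycle Time = Clk-to-Q + Longest Delay + Setup + Clock Skew (all in ps)."""
    return clk_to_q + longest_delay + setup + clock_skew

# Hypothetical delays in picoseconds.
cycle_ps = min_cycle_time_ps(clk_to_q=50, longest_delay=700, setup=30, clock_skew=20)
max_freq_mhz = 1_000_000 / cycle_ps   # 1 MHz corresponds to a 1,000,000 ps period

print(cycle_ps, max_freq_mhz)
```

The longest combinational delay dominates the sum, which is why a single-cycle design is limited by its slowest instruction.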