
Introduction to Processor Implementation: Datapath and Control

Explore the implementation of a processor, starting with a simple design and gradually adding complexity. Learn about functional units, control signals, clocking methodology, and bus width.

wrowley


Presentation Transcript


  1. Computer Organization and Architecture, Chapter 5: The Processor: Datapath and Control • Yu-Lun Kuo • Computer Sciences and Information Engineering, University of Tunghai, Taiwan • sscc6991@gmail.com

  2. 5.1 Introduction • The performance of a machine is determined by three factors • Instruction count • Clock cycle time • Clock cycles per instruction (CPI) • The compiler and the instruction set architecture • Determine the instruction count required for a given program
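The three factors on this slide combine multiplicatively into execution time. A minimal sketch of that relation follows; the function name and the sample numbers are hypothetical and chosen only for illustration.

```python
def cpu_time(instruction_count: int, cpi: float, clock_cycle_time_s: float) -> float:
    """Execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * clock_cycle_time_s


# Hypothetical workload: 1 million instructions, CPI of 2, 1 ns clock cycle.
example_time = cpu_time(instruction_count=1_000_000, cpi=2.0, clock_cycle_time_s=1e-9)
```

The compiler and the ISA influence the first argument; the processor implementation influences the other two.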

  3. 5.1 Introduction • Both the clock cycle time and the clock cycles per instruction (CPI) • Are determined by the implementation of the processor • We construct the datapath and control unit for two different implementations of the MIPS instruction set • Single-cycle implementation • Multicycle implementation

  4. 5.1 Introduction • We are going to see how the processor is implemented • starting with a very simple processor, and adding some more complexity

  5. Basic MIPS Implementation • Includes a subset of the MIPS instruction set • Memory-reference instructions: lw and sw • The ALU instructions: add, sub, and, or, slt • Control flow instructions: beq and j • Generic Implementation • Use the program counter (PC) to supply the instruction address • Fetch the instruction from memory • Read one or two registers • Use the instruction to decide exactly what to do

  6. Basic MIPS Implementation • All instructions use the ALU after reading the registers (except jump) • Memory-reference instructions use the ALU for address calculation • Arithmetic-logical instructions use it to execute the operation • Branches use it for comparison

  7. Our Processor, sort of… • What’s missing • How to combine inputs that are “joined” together • How to tell each component what to do

  8. Multiplexers and Controllers • In the previous figure we have two or more “wires” going into the input of a component • This is because, depending on the instruction being executed, a different input should be provided • So, based on the instruction, we need to decide which input should be selected • This is done with a multiplexer (多工器) • [Figure: a MUX with inputs 1 through n, one selected output, and a control input of ceil(log2(n)) bits]
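A small behavioral sketch of the multiplexer described on this slide. The function names are hypothetical, and a Python list stands in for the input wires.

```python
from math import ceil, log2


def select_bits(n_inputs: int) -> int:
    """Number of control bits needed to choose among n inputs: ceil(log2(n))."""
    if n_inputs < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    return ceil(log2(n_inputs))


def mux(inputs: list, select: int):
    """Return the input chosen by the control value `select`."""
    if not 0 <= select < len(inputs):
        raise ValueError("select value does not name an input")
    return inputs[select]
```

For example, a 2-input MUX needs 1 control bit and a 5-input MUX needs 3.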

  9. What about the Control? • So great, now we can control multiplexers • We need a controller that sends the appropriate control bits to all the multiplexers and the components • Besides, there are other things to control • Example: the ALU has a bunch of control bits that tell it what to do • 2-bit control: 00: ADD, 01: SUB, 10: MUL, 11: SHIFT
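The 2-bit ALU control in this example can be sketched as a lookup on the control value. This is only an illustration: the slide does not say which shift is meant, so a left shift of `a` by `b` is assumed here, and results are truncated to 32 bits.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def alu(control: int, a: int, b: int) -> int:
    """Toy ALU driven by the 2-bit control encoding from the slide."""
    if control == 0b00:      # ADD
        result = a + b
    elif control == 0b01:    # SUB
        result = a - b
    elif control == 0b10:    # MUL
        result = a * b
    elif control == 0b11:    # SHIFT (assumption: shift a left by b positions)
        result = a << (b & 31)
    else:
        raise ValueError("control must be a 2-bit value")
    return result & MASK32
```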

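  10. Control Unit (Simplified) • [Figure: the control unit reads the instruction register and sends a 0-or-1 select signal to a MUX that chooses the next PC from two inputs, the Add-4 path and the offset path]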

  11. A More Complete Picture

  12. 5.2 Logic Design Conventions • The functional units (功能單元) in the MIPS implementation consist of two different types of logic elements • Elements that operate on data values (combinational) • Outputs depend only on the current inputs • Always produce the same output for the same input • Have no internal storage • Elements that contain state (sequential) • Have at least two inputs and one output • Input: the data value to be written into the element • Input: the clock, which determines when the data value is written • Output: the value that was written in a previous clock cycle

  13. Clocking Methodology • [Figure: clock waveform with the falling edge, rising edge, and clock cycle labeled] • A clocking methodology defines • When signals can be read and when they can be written • If a signal is written at the same time it is read, the value read could be the old value, the new value, or a mix of the two • Computer designs cannot tolerate such unpredictability • The clock cycle/period is divided into two portions • high clock • low clock

  14. Edge-triggered Clocking • [Figure: state element 1 feeds combinational logic, which feeds state element 2, within one clock cycle] • Edge-triggered clocking (邊緣觸發) • State changes (in state elements) occur only at a clock edge • Using either the rising edge or the falling edge • Typical execution: • Read contents of some state elements • Send values through some combinational logic • Write results to one or more state elements

  15. The Clock • In the figure above, we want to use the value in state element #1 to modify the value in state element #2: it takes one cycle • We need all signals to be stable before the edge • [Figure: state element #1 (stable) feeds a combinational circuit (stable by the edge), which feeds state element #2 (updated on the edge), over one clock cycle]

  16. Read/Write in a Clock Cycle • A great implication of edge-triggered clocking • A state element can be read and written in the same clock cycle • We will say things like: “reads happen in the first half of the clock cycle, writes happen in the second half” • [Figure: state element #1 (stable) feeds a combinational circuit (stable by the edge), which feeds state element #2 (updated on the edge)]
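The read-then-write behavior of edge-triggered clocking can be mimicked in software by reading a snapshot of all state first and applying every write only at the end of the cycle. The sketch below is a behavioral model under that assumption; `tick` and the register names `a` and `b` are hypothetical.

```python
def tick(state: dict, next_state_logic) -> dict:
    """Model one clock cycle of an edge-triggered design."""
    snapshot = dict(state)                # values read during the cycle (stable)
    updates = next_state_logic(snapshot)  # combinational logic computes new values
    state.update(updates)                 # all writes land together at the clock edge
    return state


# Two state elements exchange values in a single cycle: each one is read and
# written in the same cycle, and each read sees the value from before the edge.
registers = {"a": 1, "b": 2}
tick(registers, lambda s: {"a": s["b"], "b": s["a"]})
```

Without the snapshot, the second assignment would read an already-updated value, which is the unpredictability that the clocking methodology rules out.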

  17. Write Control Signal (p.291) • Both the clock signal and the write control signal are inputs • The state element is changed only when • The write control signal is asserted • A clock edge occurs • Assuming a rising edge update: • While the control bit stays at 0, nothing happens • If we set the control bit to 1, the state element will be updated at the next rising edge
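A minimal sketch of a state element with a write control signal, assuming a rising-edge update as on this slide. The class and method names are hypothetical.

```python
class StateElement:
    """Register that updates only on a rising edge with write asserted."""

    def __init__(self, value: int = 0):
        self.value = value

    def rising_edge(self, data_in: int, write: int) -> None:
        # The stored value changes only if the write control bit is 1
        # at the moment the clock edge occurs.
        if write:
            self.value = data_in


element = StateElement(value=7)
element.rising_edge(data_in=99, write=0)   # control bit is 0: nothing happens
value_after_no_write = element.value
element.rising_edge(data_in=99, write=1)   # control bit is 1: updated on this edge
value_after_write = element.value
```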

  18. Busses and bus width • Many of the state elements and combinational elements take multi-bit inputs (often 32-bit inputs) • The term “bus” refers to a wire that carries more than one bit • multiple 1-bit wires, really • We simply indicate the width of the busses by annotating each line with its bit count • [Figure: lines labeled with widths 16 and 8, next to a control signal]

  19. Building a Datapath • A datapath element is a unit in the processor that operates on or holds data • instruction memory, data memory, register file, ALU, adders • Let’s re-examine the datapath elements we only barely introduced earlier

  20. Building a Datapath • Start by looking at which datapath elements each instruction needs • Also show their control signals • Instruction memory • Memory unit that stores the instructions of a program and supplies an instruction given an address • Program Counter (PC) (程式計數器) • 32-bit register that is written at the end of every clock cycle (so it does not need a write control signal) • Adder (加法器) • Increments the PC to the address of the next instruction • Combinational; can be built from an ALU

  21. The Three Elements • Two state elements are needed to store and access instructions • The instruction memory provides read access only • Its output at any time reflects the contents of the location specified by the address input • An adder is needed to compute the next instruction address (+4 bytes) • An ALU wired to always perform an add

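  22. Fetching Instructions • The PC supplies the read address, and the instruction is retrieved from the instruction memory • PC + 4 is latched into the PC • The PC gets updated in 1 clock cycle because we use edge-triggered clocking • [Figure: the 32-bit PC feeds the read address of the instruction memory, which outputs a 32-bit instruction; an adder computes PC + 4 and feeds it back to the PC]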

  23. Register File • The processor’s 32 general-purpose registers • Stored in a structure called the register file • Register file • Collection of registers in which any register can be read or written by specifying the number of the register in the file • [Figure: register file with three 5-bit register-number inputs, three 32-bit data ports, a clock input, and a write control signal]
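The register file can be sketched as 32 words with two read ports and one write port gated by a write signal. The class below is a hypothetical behavioral model: the port names follow the usual MIPS datapath labels, and special handling of register 0 is left out for brevity.

```python
class RegisterFile:
    """32 general-purpose 32-bit registers, addressed by 5-bit register numbers."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple:
        # Reads are combinational: the outputs follow the register numbers.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def clock_edge(self, write_reg: int, write_data: int, reg_write: int) -> None:
        # The write happens only at the clock edge, and only if RegWrite is set.
        if reg_write:
            self.regs[write_reg & 0x1F] = write_data & 0xFFFFFFFF
```

Usage: call `read` during the cycle, then `clock_edge` once at the end of the cycle with the value to write back.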

  24. Datapath: Instruction Store/Fetch & PC Increment • Three elements are used to store and fetch instructions and increment the PC • [Figure: datapath]

  25. Animating the Datapath Instruction <- MEM[PC] PC <- PC + 4
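The two register transfers on this slide can be sketched as a single fetch step. This is an illustrative model only: a Python dict keyed by byte address stands in for the instruction memory, and the sample word is the standard encoding of `add $t1, $t2, $t3`.

```python
def fetch(instruction_memory: dict, pc: int) -> tuple:
    """Instruction <- MEM[PC]; PC <- PC + 4."""
    instruction = instruction_memory[pc]   # read the word addressed by the PC
    next_pc = (pc + 4) & 0xFFFFFFFF        # adder wired to always add 4
    return instruction, next_pc


# Hypothetical one-instruction program placed at address 0.
program = {0: 0x014B4820}   # add $t1, $t2, $t3
instruction, pc = fetch(program, 0)
```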

  26. What about R-type instructions • These instructions take 3 registers as arguments: • 1 output register • 2 input registers • Example: add $t1, $t2, $t3 • Which reads $t2 and $t3 and writes $t1 • We need an input that contains the data to be written into the output register • Typically comes from the ALU • We need a Write signal to trigger the register write on the next clock edge • A write at any time during the clock cycle could lead to race conditions if that register is also read

  27. Datapath: R-Type Instruction • Two elements are used to implement R-type instructions • [Figure: datapath]

  28. Register File and ALU • Register numbers are extracted from the 32-bit instruction code • [Figure: register file with 5-bit inputs Read register 1, Read register 2, and Write register, a 32-bit Write data input, 32-bit outputs Read data 1 and Read data 2, and the RegWrite control; the ALU takes two 32-bit inputs and a 4-bit Operation control, and produces a 32-bit result and a zero output]

  29. Add t1, t1, t2 (sketch) • [Figure: the instruction supplies t1 to Read register 1, t2 to Read register 2, and t1 to Write register; Read data 1 and Read data 2 feed the ALU (4-bit Operation), whose 32-bit result returns to Write data; RegWrite must be set only at the next edge]

  30. Animating the Datapath (R-type) add rd, rs, rt R[rd] <- R[rs] + R[rt];
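The transfer `R[rd] <- R[rs] + R[rt]` can be sketched by pulling the three register numbers out of the instruction word and performing the add. The field positions used are those of the MIPS R-format (rs in bits 25-21, rt in bits 20-16, rd in bits 15-11); the function names and the list-based register model are hypothetical.

```python
def decode_r_type(instruction: int) -> dict:
    """Extract the three 5-bit register numbers of an R-format instruction."""
    return {
        "rs": (instruction >> 21) & 0x1F,
        "rt": (instruction >> 16) & 0x1F,
        "rd": (instruction >> 11) & 0x1F,
    }


def execute_add(regs: list, instruction: int) -> None:
    """R[rd] <- R[rs] + R[rt], truncated to 32 bits."""
    fields = decode_r_type(instruction)
    regs[fields["rd"]] = (regs[fields["rs"]] + regs[fields["rt"]]) & 0xFFFFFFFF


# add $t1, $t2, $t3 with $t1 = register 9, $t2 = register 10, $t3 = register 11.
regs = [0] * 32
regs[10], regs[11] = 5, 7
execute_add(regs, 0x014B4820)
```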

  31. What about the Load/Store • Ex. lw t1, offset(t2) • The memory address is computed by adding the 16-bit signed offset to the input register • The offset is 16 bits, but memory addresses are 32 bits • Therefore, the offset must be sign-extended into a 32-bit value before being added to the input register • The memory has both read and write control • MemWrite control signal • MemRead control signal
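Sign extension copies bit 15 of the offset into the upper 16 bits, so that a negative offset stays negative when treated as a 32-bit value. A minimal sketch follows; the function names are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    if value & 0x8000:               # sign bit set: fill the upper half with ones
        return value | 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Address = input register + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF
```

For example, the 16-bit pattern `0xFFFC` encodes -4, so a base of 100 yields address 96.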

  32. Datapath: Load/Store Instruction • Two additional elements are used to implement loads and stores • [Figure: datapath]

  33. Implementing Load/Store • Data memory unit: 32-bit Address and Write data inputs, 32-bit Read data output, MemWrite and MemRead controls • Sign-extension unit: 16-bit input, 32-bit output

  34. Implementing lw s1,offset(s2) • [Figure: the instruction supplies s2 to Read register 1 and s1 to Write register; the 16-bit offset is sign-extended to 32 bits and added to Read data 1 to form the data memory Address; Read data from the data memory goes to the register file’s Write data; MemRead is set, MemWrite is not set, and RegWrite is set on the next edge]

  35. Animating the Datapath (Load) lw rt, offset(rs) R[rt] <- MEM[R[rs]+s_extend(offset)];

  36. Animating the Datapath (Store) sw rt, offset(rs) MEM[R[rs]+sign_extend(offset)] <- R[rt]
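The load and store transfers can be sketched together, since both compute the same address and differ only in the direction of the data. This is a simplified model under stated assumptions: a dict keyed by byte address stands in for the data memory, alignment is not checked, and all names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def lw(regs: list, memory: dict, rt: int, rs: int, offset: int) -> None:
    """R[rt] <- MEM[R[rs] + sign_extend(offset)]  (MemRead and RegWrite set)."""
    address = (regs[rs] + sign_extend16(offset)) & MASK32
    regs[rt] = memory.get(address, 0)


def sw(regs: list, memory: dict, rt: int, rs: int, offset: int) -> None:
    """MEM[R[rs] + sign_extend(offset)] <- R[rt]  (MemWrite set)."""
    address = (regs[rs] + sign_extend16(offset)) & MASK32
    memory[address] = regs[rt]
```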

  37. What about the Branch (beq) • Two registers are compared • To do a branch we must • Compute the branch’s target address based on its offset • Decide whether the branch is taken or not taken • Taken: if the operands are equal, the branch target address becomes the new PC, PC = (PC+4)+4*(target field) • Not taken: if the operands are not equal, PC=PC+4 as usual

  38. Branch Datapath • No shift hardware is required: simply connect the wires from input to output, each shifted left by 2 bits • [Figure: datapath]

  39. Animating the Datapath (branch) beq rs, rt, offset • if (R[rs] == R[rt]) then PC <- PC+4 + (s_extend(offset) << 2)
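The next-PC choice for `beq` can be sketched as follows: the 16-bit offset is sign-extended, shifted left by 2 (multiplied by 4), and added to PC + 4 when the two register values are equal. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value: int) -> int:
    """Extend a 16-bit two's-complement value to 32 bits."""
    value &= 0xFFFF
    return (value | 0xFFFF0000) if value & 0x8000 else value


def beq_next_pc(pc: int, rs_value: int, rt_value: int, offset16: int) -> int:
    """Return the PC that follows a beq at address `pc`."""
    pc_plus_4 = (pc + 4) & MASK32
    if rs_value == rt_value:                      # taken: operands are equal
        return (pc_plus_4 + (sign_extend16(offset16) << 2)) & MASK32
    return pc_plus_4                              # not taken: fall through
```

An offset of 3 from a branch at 0x100 targets 0x110; an offset of -1 (`0xFFFF`) targets the branch itself.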

  40. Putting It All Together • The simplest design is one in which • all instructions are executed in a single clock cycle • In this case, every element of the datapath can be used only once per clock cycle • An element needed more than once per instruction must therefore be duplicated • In practice this means only a few extra adders here and there • It is also why we need separate data and instruction memories • Let’s first put together the pieces for the R-type (ALU) instructions and the memory instructions, as they are quite similar.

  41. All Together (not quite) • Combining the datapaths for R-type instructions and load/stores using two multiplexers • We “simply” add multiplexers (多工器) for choosing between the datapath for the ALU instructions and the datapath for the memory instructions
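A rough sketch of how two multiplexers let one datapath serve both instruction classes. The select names `alu_src` and `mem_to_reg` are assumptions borrowed from the usual textbook labels, not taken from this slide, and the ALU is fixed to addition to keep the example short.

```python
MASK32 = 0xFFFFFFFF


def datapath_step(read_data1: int, read_data2: int, imm32: int,
                  alu_src: int, mem_to_reg: int, memory: dict) -> int:
    """Return the value sent to the register file's write-data port."""
    # MUX 1: second ALU input is a register (R-type) or the extended offset (load/store).
    alu_input2 = imm32 if alu_src else read_data2
    alu_result = (read_data1 + alu_input2) & MASK32
    # MUX 2: write-back value is the ALU result (R-type) or the memory word (load).
    return memory.get(alu_result, 0) if mem_to_reg else alu_result


# R-type add: both selects are 0, so the ALU result is written back.
r_type_result = datapath_step(5, 7, 0, alu_src=0, mem_to_reg=0, memory={})
# Load: the ALU computes the address and the memory word is written back.
load_result = datapath_step(100, 0, 8, alu_src=1, mem_to_reg=1, memory={108: 42})
```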

  42. Animating the Datapath: R-type Instruction add rd,rs,rt

  43. Animating the Datapath: Load Instruction lw rt,offset(rs)
