
Outline • Introduction • Version 0 EMY CPU : Unpipelined EMY CPU • It executes only integer instructions • How a memory hierarchy can be attached to the unpipelined CPU is also studied • Handout to use • EMY CPU CS 2214

Introduction • On the microarchitecture layer, a computer is a collection of at least three interconnected digital systems • A central processing unit (CPU) • A (main) memory • An I/O controller to control an I/O device, such as the disk • There can be several I/O controllers to control different I/O devices • [Figure: CPU, memory and disk I/O controller connected by an interconnection system]

Digital Systems • A digital system performs microoperations • It consists of a datapath (data unit) and a control unit • The datapath actually performs the microoperations • The control unit determines which microoperation happens when • [Figure: a datapath (ALUs, registers, buses) and a control unit (sequencer), connected by status signals and control signals]

Digital Systems • The datapath (data unit) has registers, ALUs and buses to perform the microoperations • Registers keep information temporarily • ALUs perform arithmetic/logic operations • Buses interconnect the registers and ALUs • Other components used include • Multiplexers (MUXes), decoders, encoders, comparators, counters, etc.

Digital Systems • The control unit has a sequencer circuit that determines the sequence of microoperations • The sequencer needs status signals from the data unit to know what is happening there • Then, it determines which microoperations are to be performed and indicates them to the datapath by means of control signals

Designing Digital systems • Datapath design is simpler than control unit design since the datapath has highly regular (duplicated) circuits • A 32-bit adder is composed of two identical 16-bit adders • A 32-bit comparator consists of four identical 8-bit comparators, etc. • Control unit design is more difficult due to • Large amounts of random logic • The effort needed to make sure there are no timing problems • Microoperations must start at the right time and end at the right time!

Designing digital systems • We will use the finite-state machine (FSM) technique to design the EMY CPU, where the states of the FSM state diagram contain microoperations • The state diagram shows precisely which state follows which state • Each state indicates which microoperations to perform • The state diagram shows which states are needed, and when, for each machine language instruction
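As a rough illustration of the idea, a state diagram can be written as a table that maps each state to its microoperations and to a rule for choosing the next state. The sketch below is only an assumption made for illustration: the state names (S0, S1, ...), the two-instruction subset and the microoperations placed in each state are hypothetical and are not taken from the EMY CPU handout.

```python
# Hypothetical FSM table: state -> (microoperations, rule for the next state).
# State names and contents are illustrative, not the handout's state diagram.
STATES = {
    "S0": (["IR <- M[PC]", "PC <- PC + 4"], lambda opcode: "S1"),
    "S1": (["A <- Rs", "B <- Rt"],
           lambda opcode: "S2_ADD" if opcode == "ADD" else "S2_SUB"),
    "S2_ADD": (["ALUout <- A + B"], lambda opcode: "S3"),
    "S2_SUB": (["ALUout <- A - B"], lambda opcode: "S3"),
    "S3": (["Rd <- ALUout"], lambda opcode: "S0"),
}


def trace(opcode, start="S0"):
    """Return the (state, microoperations) pairs visited for one instruction."""
    visited = []
    state = start
    while True:
        microops, next_state = STATES[state]
        visited.append((state, microops))
        state = next_state(opcode)
        if state == start:  # back at the first state: the instruction is done
            return visited


for state, microops in trace("ADD"):
    print(state, microops)
```

Each entry of the table corresponds to one state of the diagram, so the number of entries visited by `trace` is the number of states (clock periods, if each state takes one) spent on that instruction.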

Designing digital systems • We will design the EMY CPU by using the finite-state machine (FSM) technique • More specifically, we will obtain the following for the complete EMY CPU design • A high-level state diagram to show which microoperation happens when • The datapath from the high-level state diagram • The low-level state diagram from the high-level state diagram and the datapath • The control unit from the low-level state diagram • It can be implemented by hardwiring and/or microprogramming

Designing the microarchitecture level of a computer • There are two tasks in this design • Develop the CPU and memory digital systems so that instructions can be run • Develop the memory and I/O controller digital systems so that I/O can happen • We will concentrate on the CPU and memory digital systems

Designing the CPU and memory digital systems • First we focus on the CPU digital system while we quickly make a few design decisions on the memory • We will design the CPU as a slow CPU running only integer instructions: no pipelining • This is Version 0 • We will assume the memory is fast, which is not realistic today • Then, we will see how a memory hierarchy with cache memories, etc. can be incorporated • Then, we will improve the CPU speed by using pipelining, but still running integer instructions • This is Version 1 • We will assume the memory is fast, which is not realistic today • Then, we will see how a memory hierarchy with cache memories, etc. can be incorporated • This CPU coverage will be in another PowerPoint presentation • For both versions the memory will be a black box with a few details

Designing the CPU as a Digital System • The EMY CPU digital system • We will concentrate on designing the EMY CPU for nine integer instructions in the beginning • High-level state diagram of the EMY CPU • Datapath of the CPU • Low-level state diagram of the CPU • Control unit of the CPU

Designing the CPU digital system • To design the EMY CPU, we will start with the EMY architecture • What is the connection between the architecture and the CPU? • A computer processes digital information by running machine language instructions • A program is a list of instructions, each of which specifies operations on data (arguments) • An instruction specifies architectural operations • Each architectural operation is implemented by microoperations

Designing the CPU Digital System • In order to perform an architectural operation, the CPU performs a series of microoperations in a number of clock periods • That is, an architectural operation is broken down into smaller operations called microoperations • That is, to run a machine language instruction, the CPU performs microoperations • The CPU performs some microoperations alone and some in cooperation with the memory and the I/O controllers

Designing the CPU Digital System • Architectural operations • An architectural operation is what we describe as the semantics of the instruction, such as • The architectural operation specified by the ADD instruction • Rd ← Rs + Rt • The architectural operation specified by the SUB instruction • Rd ← Rs - Rt • The architectural operation specified by the SLT instruction • If Rs < Rt then Rd ← 1 else Rd ← 0 • The architectural operation specified by the J instruction • PC[27-0] ← (Address × 4) • It is the CPU that contributes the most to the execution of an instruction since it performs most of the microoperations needed for an architectural operation
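The three register-to-register operations above can be sketched as plain functions on 32-bit values. This is a minimal sketch under stated assumptions: results wrap around modulo 2^32 (no overflow trap is modeled), and SLT compares its operands as signed two's-complement numbers. Neither choice is stated on the slide, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def add(rs, rt):
    """Rd <- Rs + Rt (assumption: wraps modulo 2**32)."""
    return (rs + rt) & MASK32


def sub(rs, rt):
    """Rd <- Rs - Rt (assumption: wraps modulo 2**32)."""
    return (rs - rt) & MASK32


def slt(rs, rt):
    """If Rs < Rt then Rd <- 1 else Rd <- 0 (assumption: signed compare)."""
    return 1 if to_signed(rs) < to_signed(rt) else 0


print(hex(add(0xFFFFFFFF, 1)))   # wraps to 0x0
print(hex(sub(0, 1)))            # 0xffffffff
print(slt(0xFFFFFFFF, 0))        # -1 < 0, so 1
```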

Designing the CPU Digital System • Typical CPU digital system microoperations • Add, subtract, multiply • In the past, a 32-bit addition was completed in 1 clock period. • Today, a 32-bit addition is completed in several clock periods • AND, OR, XOR • Shift right, Shift left • Read data from memory, write data to memory • In the past, a memory access was completed in 1 clock period. • Today, it is completed in several clock periods • Read instructions from memory (fetch) • Increment the program counter • Transfer a register to another register • …

Designing the CPU as a Digital System • Other machines, especially CISC machines, require other microoperations such as • Reading indirect address(es) from the memory • Effective address calculation for • Indexing • Autoincrement • Autodecrement • Alignment for • Instructions • Data • Addresses

Designing the CPU Digital System • Architecture’s effect on microoperations • The decisions made on architecture determine the microoperations needed for the execution of the instructions • General microoperations found on most CPUs • The ones mentioned on previous slides • Specific microoperations for certain CPUs • Specific microoperations for Memory Management Units (MMUs), caches, I/O controllers • The architecture also determines the characteristics of each microoperation • If the 26-bit PC-direct addressing mode is used, the rightmost 26 bits of IR are concatenated with the leftmost 4 bits of PC and the resulting 30 bits are shifted to the left by 2 • Thus, each machine language instruction requires a certain number of microoperations that take a certain time: the CPIi
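The 26-bit PC-direct address calculation described above can be sketched with shifts and masks. This is only an illustration: the function name is hypothetical, and it assumes the 26-bit field occupies the rightmost bits of a 32-bit IR, as the slide states.

```python
def jump_target(pc, ir):
    """Form a 32-bit jump target from the PC and the instruction register."""
    offset26 = ir & 0x03FFFFFF                 # rightmost 26 bits of IR
    pc_top4 = (pc >> 28) & 0xF                 # leftmost 4 bits of PC
    thirty_bits = (pc_top4 << 26) | offset26   # 4 + 26 = 30 bits
    return (thirty_bits << 2) & 0xFFFFFFFF     # shift left by 2 -> 32 bits


# Example values are arbitrary; the upper 6 bits of IR stand in for an opcode.
print(hex(jump_target(pc=0x40000008, ir=0x08000010)))  # 0x40000040
```

Shifting left by 2 multiplies the 26-bit field by 4, which matches the J instruction semantics PC[27-0] ← (Address × 4) with the top 4 bits of PC left unchanged.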

Designing the CPU Digital System • Microoperations • The CPU can perform one or more microoperations per clock period, depending on the complexity of the microoperation and the availability of the hardware resources • Most often a microoperation can be completed in one clock period unless it is a complex microoperation • If a complex microoperation is to be run in one clock period, the clock period needs to be longer • The more numerous and complex the microoperations are, the longer it takes to run the machine language instruction • CISC instructions take a longer time to execute (larger CPIi)

Designing the CPU Digital System • Calculating CPIi • The time it takes to run an instruction, CPIi, is then determined by • The number of microoperations needed for it • The complexity of the microoperations • The number of clock periods for an instruction, CPIi, becomes a matter of figuring out the microoperations and distributing them to individual clock periods • One can come up with 5-10 simple microoperations to be performed one after another, resulting in a CPIi of 5-10 • But, since microoperations are simple, the clock period is short • Alternatively, one can come up with 2-4 complex microoperations, resulting in a CPIi of 2-4 • But, the clock period is longer
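The trade-off above comes down to one product: the time for an instruction is its CPIi multiplied by the clock period. The sketch below works through that arithmetic with made-up numbers; the CPIi values lie in the ranges quoted above, but the clock periods are purely hypothetical and chosen only to show that neither option wins automatically.

```python
def instruction_time_ns(cpi_i, clock_period_ns):
    """Time to run one instruction = clock periods needed x length of each."""
    return cpi_i * clock_period_ns


# Hypothetical design 1: many simple microoperations, short clock period.
many_short = instruction_time_ns(cpi_i=8, clock_period_ns=1.0)
# Hypothetical design 2: few complex microoperations, longer clock period.
few_long = instruction_time_ns(cpi_i=3, clock_period_ns=2.5)

print(many_short, few_long)  # 8.0 7.5
```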

Designing the CPU Digital System • Calculating CPIi • What can we do? • Few long clock periods vs. many but shorter clock periods? • Since increasing the clock frequency is important for marketing purposes, the second option carries substantial weight • It turns out that if pipelining is implemented, having many shorter clock periods is beneficial, as we will see • CPIi figures will be large but CPIave will be close to 1 (one)! • Today’s microprocessors have instruction CPIi values in the range of 10-30, but CPIave figures for their targeted applications are even less than 1 (one)! • Because they employ advanced pipelining techniques, such as superscalar execution, hyperthreading, etc.

Designing the CPU Digital System • Determining microoperations for a machine language instruction • Some microoperations are performed for all the instructions • Usually at the same point in time during the execution of every instruction • Fetching the instruction is always the first microoperation to perform for all CPUs • Updating PC (PC ← PC + 4) so that it points at the next instruction is also universal • The other microoperations depend on the instruction, the addressing mode, where the arguments are, the length of the arguments, etc.

Designing the CPU Digital System • Determining microoperations for a machine language instruction • We would list all the microoperations for each instruction, making sure that we are consistent in terms of • Bus usage • We often decide on an approximate number of buses we need for our datapath • Today’s CPUs have at least three internal buses to complete an integer arithmetic microoperation in one clock period • Two buses carry the numbers from two registers and the third bus carries the result to a register • ALU usage • An ALU is expensive and so we try to limit the number of them

Designing the CPU Digital System • Determining microoperations for a machine language instruction • We would list all the microoperations for each instruction, making sure that we are consistent in terms of • Register usage • Additional registers not visible to the architecture level are used to keep temporary values: microarchitecture registers • Typically, the more registers are used, the more clock periods we spend for an instruction since temporary values will be passed from one register in one clock period to another register to be used the following clock period • But, sometimes we have to use microarchitecture registers, such as the instruction register that keeps the current instruction • Control unit usage

Designing the CPU Digital System • Determine how each EMY architectural operation is implemented by microoperations • Most microoperations must be simple enough to be completed in less than one clock period • A few microoperations may not be completed in a clock period • For example, a memory read may take several clock periods since the memory is slower • These long microoperations should be accommodated in the high-level state diagram, the datapath, the low-level state diagram and the control unit • We will assume in the beginning that every microoperation is completed in one clock period

Designing the CPU Digital System • The EMY microoperations implied by the EMY machine language instructions include • Instruction fetch, performed always • Update PC for next instruction, performed always • Effective address calculation for Displacement and relative addressing modes • Sign extension or catenation of 0s for data/addresses • Reading data from the memory • Writing data to the memory • Performing an arithmetic/logic operation • Register transfer • Testing a condition

Unpipelined EMY CPU : Version 0 • By using the EMY CPU Handout • The most interesting component of a computer is the CPU • We know that the CPU has registers, buses, ALUs and a sequencer, among others • Note that whether hardwiring or microprogramming is used, the datapath stays the same, at least theoretically • The datapath performs microoperations on data • It uses registers, buses and the ALU for that purpose • The microoperations are in turn controlled by the control unit. Unpipelined EMY CPU Design : Version 0 CS 2214

Overview • We are now ready for the organizational design of the EMY • We know the architecture of EMY • We will design • The EMY CPU that will have • A control unit with a sequencer • A datapath containing registers, buses and the ALU • The datapath performs the microoperations and the control unit determines the timing and sequence of these microoperations

Overview • The way the EMY computer is covered indicates that the authors organized the computer similarly to the commercial EMY systems, where • There is an integer EMY CPU • A system control coprocessor (CP0) responsible for memory management and cache control. • An FP coprocessor (CP1) • The integer EMY CPU registers are either architectural or microarchitectural (temporary registers) • There are two other coprocessors, CP2 and CP3, that are reserved for future use

Overview • Designing the EMY CPU for all of the instructions is prohibitive • First, we will design an EMY CPU to execute only integer instructions that include • LW, SW • ADD, SUB, SLT, AND, OR • BEQ, J • These integer instructions use the three formats: R, I and J

Overview • The EMY CPU will have all the architectural registers needed by these nine integer instructions • 32 32-bit GPRs • 32-bit PC

New Microarchitectural registers • These (temporary) registers are not a part of the state (hence architecture) • 32-bit instruction register, IR, to keep the current instruction • IR contains the instruction until it is completely executed • 32-bit A and B registers • They keep the content of Rs and Rt registers of the current instruction • 32-bit register ALUout • It contains a memory address or A/L operation result • 32-bit Memory Data Register, MDR • It keeps the data read from the memory for Load instructions

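New Microarchitectural registers • 32-bit A and B registers • I format (field widths in bits): Opcode (6) | Rs (5) | Rt (5) | Displacement/Offset/Immediate (16) • R format (field widths in bits): Opcode (6) | Rs (5) | Rt (5) | Rd (5) | Shamt (5) | Function (6) • In both formats, the Rs field selects the register copied to register A and the Rt field selects the register copied to register B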

New Microarchitectural registers • Even if an instruction does not have Rs and Rt fields, such as a J-format instruction, Rs and Rt field bits are used to move Rs and Rt content to A and B, respectively • The values of A and B registers will not be used! • The reason for moving to A and B is to make the common case fast where we think most instructions are R-format or I-format and require this move! • J format (field widths in bits): Opcode (6) | Offset26 (26) • The bits of Offset26 that sit in the Rs and Rt field positions still select the registers copied to A and B
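The unconditional move to A and B can be sketched by slicing the same bit positions out of every instruction word, whatever its format. The bit positions below are an assumption (fields laid out from the most significant bit in the order Opcode, Rs, Rt, Rd, Shamt, Function, with the widths given on the slides), and the names `decode` and `register_fetch` are hypothetical.

```python
def decode(ir):
    """Slice a 32-bit instruction word into every field of every format."""
    return {
        "opcode":   (ir >> 26) & 0x3F,  # 6 bits
        "rs":       (ir >> 21) & 0x1F,  # 5 bits
        "rt":       (ir >> 16) & 0x1F,  # 5 bits
        "rd":       (ir >> 11) & 0x1F,  # 5 bits (R format)
        "shamt":    (ir >> 6) & 0x1F,   # 5 bits (R format)
        "function": ir & 0x3F,          # 6 bits (R format)
        "doimm":    ir & 0xFFFF,        # 16 bits (I format)
        "offset26": ir & 0x03FFFFFF,    # 26 bits (J format)
    }


def register_fetch(ir, gpr):
    """A <- GPR[Rs bits], B <- GPR[Rt bits], done for every instruction."""
    fields = decode(ir)
    return gpr[fields["rs"]], gpr[fields["rt"]]


gpr = list(range(100, 132))  # 32 registers holding arbitrary test values
word = (1 << 21) | (2 << 16) | (3 << 11) | 32  # rs=1, rt=2, rd=3, function=32
print(decode(word)["rd"], register_fetch(word, gpr))  # 3 (101, 102)
```

For a J-format word the same two slices are taken, and the values that land in A and B are simply ignored later.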

New Microarchitectural registers • Note that the Displacement used for loads and stores is signed • The offset of BEQ is also signed • We have to sign extend the 16-bit Displacement, Offset and Immediate (DOImm) value for some of the integer instructions • These include LW, SW, BEQ • We will use DOImm+ to indicate a sign-extended value from now on
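Sign extension copies bit 15 of the 16-bit value into the upper 16 bits of the 32-bit result. A minimal sketch follows; the function name `sign_extend16` is hypothetical and simply stands for the DOImm+ notation.

```python
def sign_extend16(value16):
    """Return the 32-bit sign extension (DOImm+) of a 16-bit field."""
    value16 &= 0xFFFF
    if value16 & 0x8000:                # bit 15 set: the value is negative
        return value16 | 0xFFFF0000     # fill the upper 16 bits with 1s
    return value16                      # otherwise the upper 16 bits are 0s


print(hex(sign_extend16(0x0004)))  # 0x4
print(hex(sign_extend16(0xFFFC)))  # 0xfffffffc, i.e. -4 in two's complement
```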

The EMY CPU state diagram • The design of a CPU is very complex • We have to consider the space (hardware) and time (speed) • The design, analysis, description, testing, modification, optimization, servicing and maintenance can be more efficient if there are efficient tools around • These include HDLs and CAD tools • The textbook uses a typical register transfer language (RTL) notation in Appendix A to describe the execution of instructions • We will use the same RTL notation which is also used in the handout • To quickly see the execution steps of the integer machine language instructions, a high-level state diagram, a CPU datapath and a low-level state diagram are developed in the handout • Additionally, timing diagrams and tables need to be studied to understand the CPU design

The EMY CPU state diagram • An instruction goes through several phases when executed • We give a name to each phase of an instruction execution • A phase is also called a major cycle • Each major cycle will take one or more minor cycles (clock periods) • Each minor cycle is a state • Each minor cycle typically takes one clock period • Each major cycle often has at least one microoperation • Often the name of a major cycle is derived from the major microoperation of the cycle

The EMY CPU state diagram • The number of major cycles and their complexity are small for RISC systems and larger for CISC systems • Often for RISC systems, the CPIi for most frequently used instructions is between 4 and 6 • However, this number has to be larger to have deep pipelining and high clock frequencies • In simple systems like RISC systems, sharing of hardware among different major cycles is not necessary • A hardware resource is often needed in one major cycle only • The hardware for each major cycle can then be easily identified and is often called a stage • So, the execution of an instruction is the movement of the instruction through some or all of the stages of the CPU!

The EMY CPU state diagram • The EMY integer instructions go through at most five major cycles during the execution • However, even for this RISC machine, it is difficult to come up with five cycle names because not all instructions do similar things in a major cycle • Some microoperations will be performed in advance in anticipation of a frequent operation • The early operations will not alter the state and will not cause longer clock periods, but will slightly increase the hardware

The EMY CPU state diagram • The EMY CPU major cycles for integer instructions • Instruction fetch cycle • Abbreviated as IF, standing for instruction fetch • Same for all EMY instructions. • Instruction decode/Register fetch cycle • Abbreviated as ID, standing for instruction decode • Same for all EMY instructions. • Execution/effective address cycle • Abbreviated as EX, standing for execution • Memory access cycle • Abbreviated as MEM, standing for memory • Write-back cycle • Abbreviated as WB, standing for write-back

The EMY CPU state diagram • To emphasize again, designing a CPU means determining which microoperation happens when for each architectural operation (the semantics of the instruction) • For the EMY, like many other CPUs, the IF and ID stages are identical for all instructions • The same microoperations are performed for all instructions • These microoperations implement portions of the architectural operation • For the EMY, the remaining portions of the architectural operation are performed in the EX, MEM and WB cycles

The EMY CPU state diagram • Architectural operations (≡ semantics) of I-format instructions among the integer instructions • Load/Store instructions • LW Rt, Disp(Rs): Rt ← M[Rs + Disp+] • SW Rt, Disp(Rs): M[Rs + Disp+] ← Rt • The + after Disp (a superscript on the slide) indicates sign extension • I format (field widths in bits): Opcode (6) | Rs (5) | Rt (5) | Displacement/Offset/Immediate (16)

The EMY CPU state diagram • Architectural operations of I-format instructions among the integer instructions • Branch instruction • BEQ Rs, Rt, Offset: If Rs = Rt, then PC ← PC + (Offset+ × 4) • I format (field widths in bits): Opcode (6) | Rs (5) | Rt (5) | Displacement/Offset/Immediate (16)
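The I-format semantics (LW, SW and BEQ) can be sketched as small functions over a register list and a memory. This is a sketch under stated assumptions: memory is modeled as a Python dict from byte address to 32-bit word, address arithmetic wraps modulo 2^32, and the function names are hypothetical. The PC passed to `beq` is used exactly as written in the semantics; which PC value the hardware actually holds at that point is not specified here.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(value16):
    """32-bit sign extension of a 16-bit field (the superscript + notation)."""
    value16 &= 0xFFFF
    return (value16 | 0xFFFF0000) if value16 & 0x8000 else value16


def lw(gpr, memory, rt, rs, disp):
    """LW Rt, Disp(Rs): Rt <- M[Rs + Disp+]"""
    address = (gpr[rs] + sign_extend16(disp)) & MASK32
    gpr[rt] = memory[address]


def sw(gpr, memory, rt, rs, disp):
    """SW Rt, Disp(Rs): M[Rs + Disp+] <- Rt"""
    address = (gpr[rs] + sign_extend16(disp)) & MASK32
    memory[address] = gpr[rt]


def beq(gpr, pc, rs, rt, offset):
    """BEQ Rs, Rt, Offset: if Rs = Rt then PC <- PC + (Offset+ x 4)."""
    if gpr[rs] == gpr[rt]:
        return (pc + sign_extend16(offset) * 4) & MASK32
    return pc


gpr = [0] * 32
gpr[1] = 0x1000
memory = {0x0FFC: 42}
lw(gpr, memory, rt=2, rs=1, disp=0xFFFC)   # displacement -4: reads M[0x0FFC]
sw(gpr, memory, rt=2, rs=1, disp=0x0004)   # displacement +4: writes M[0x1004]
print(gpr[2], memory[0x1004], hex(beq(gpr, 0x100, 2, 2, 0xFFFE)))
```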

The EMY CPU state diagram • Architectural operations of R-format instructions among the integer instructions • Arithmetic/Logic instructions • ADD Rd, Rs, Rt: Rd ← Rs + Rt • SUB Rd, Rs, Rt: Rd ← Rs - Rt • AND Rd, Rs, Rt: Rd ← Rs & Rt • OR Rd, Rs, Rt: Rd ← Rs | Rt • SLT Rd, Rs, Rt: If Rs < Rt then Rd ← 1 else Rd ← 0 • R format (field widths in bits): Opcode (6) | Rs (5) | Rt (5) | Rd (5) | Shamt (5) | Function (6)

The EMY CPU state diagram • Architectural operations of J-format instructions among the integer instructions • Jump instruction • PC[27-0] ← (Address × 4) • J format (field widths in bits): Opcode (6) | Offset26 (26), with the Rs and Rt field positions falling inside Offset26

The EMY CPU state diagram • The major cycles of the EMY CPU are shown by the high-level state diagram given in the EMY CPU handout • Registers A and B are used to prepare operands for an ALU operation • Each state takes 1 clock period • Later, we will change it to one or more clock periods • Memory accesses and complex arithmetic operations can take more than one clock period to perform • The state that has a memory access or a complex arithmetic operation will take more than one clock period • All microoperations mentioned in a state are performed in parallel, so their order does not matter • If a state takes more than one clock period, one has to be careful about the parallel operations • We now obtain the state diagram and the datapath hardware of the EMY CPU

The EMY major cycles and states • The instruction fetch cycle • It is performed for all the instructions • There are two microoperations performed • In general, all CPUs, regardless of their architecture, do these two microoperations • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR) • Update the program counter so that it points at the instruction that follows the instruction being read from the memory

The EMY major cycles and states • The instruction fetch cycle • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR) • IR ← M[PC] • Note the RTL notation: we use an equal sign (=) if the destination is a wire or a bus and an arrow sign (←) if the destination is a register, such as IR

The EMY major cycles and states • The instruction fetch cycle • Read the machine language instruction pointed to by the program counter (PC) into the instruction register (IR) • IR ← M[PC] • Then, the read of the instruction in terms of buses is as follows: MABUS = PC ; MemRead = 1 ; IR ← MRBUS • Note again that the three microoperations implement the instruction read, they happen at the same time and their order does not matter • Note the RTL notation: we use an equal sign (=) if the destination is a wire or a bus, such as MABUS, and an arrow sign (←) if the destination is a register, such as IR

The EMY major cycles and states • The instruction fetch cycle • Update the program counter so that it points at the next instruction • PC ← PC + 4 • Since an instruction is four bytes long, we need to add 4 to PC • We will use the general ALU to do the addition, at the expense of increasing the complexity of the ALU input logic
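The instruction fetch state can be sketched as one function that models a single clock period. The sketch is illustrative only: memory is a hypothetical dict from byte address to 32-bit word, and the register file is reduced to PC and IR. Bus and wire assignments (=) are plain local variables, while register transfers (←) all read the old register values and are committed together, which is why their order within the state does not matter.

```python
def instruction_fetch(state, memory):
    """One clock period of the fetch state; returns the next register values."""
    # Wires and buses (=): valid only during this clock period.
    mabus = state["PC"]                          # MABUS = PC
    mem_read = 1                                 # MemRead = 1
    mrbus = memory[mabus] if mem_read else None  # memory drives MRBUS

    # Registers (<-): computed from the OLD state, committed at the clock edge.
    next_state = dict(state)
    next_state["IR"] = mrbus                          # IR <- MRBUS
    next_state["PC"] = (state["PC"] + 4) & 0xFFFFFFFF  # PC <- PC + 4
    return next_state


state = {"PC": 0x100, "IR": 0}
memory = {0x100: 0xDEADBEEF}  # arbitrary word standing in for an instruction
after = instruction_fetch(state, memory)
print(hex(after["IR"]), hex(after["PC"]))  # 0xdeadbeef 0x104
```

Because IR is loaded from the address held in the old PC, swapping the two register-transfer lines would give the same result.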
