October 8th, 2007 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f07/

CS-447– Computer Architecture M,W 10-11:20amLecture 12Multiple Cycle Datapath October 8th, 2007 Majd F. Sakrmsakr@qatar.cmu.edu www.qatar.cmu.edu/~msakr/15447-f07/

Implementation vs. Performance Performance of a processor is determined by • Instruction count of a program • CPI • Clock cycle time (clock rate) The compiler & the ISA determine the instruction count. Theimplementationof the processor determines the CPI and the clock cycle time.

Possible Execution Steps of Any Instructions • Instruction Fetch • Instruction Decode and Register Fetch • Execution of the Memory Reference Instruction • Execution of Arithmetic-Logical operations • Branch Instruction • Jump Instruction

Instruction Processing • Five steps: • Instruction fetch (IF) • Instruction decode and operand fetch (ID) • ALU/execute (EX) • Memory (not required) (MEM) • Write-back (WB) WB IF EX ID MEM

Single Cycle Implementation

Multiple ALUs and Memory Units

Single Cycle Datapath

What’s Wrong with Single Cycle? • All instructions run at the speed of the slowest instruction. • Adding a long instruction can hurt performance • What if you wanted to include multiply? • You cannot reuse any parts of the processor • We have 3 different adders to calculate PC+1, PC+1+offset and the ALU • No profit in making the common case fast • Since every instruction runs at the slowest instruction speed • This is particularly important for loads as we will see later

What’s Wrong with Single Cycle? 1 ns – Register read/write time 2 ns – ALU/adder 2 ns – memory access 0 ns – MUX, PC access, sign extend, ROM add: 2ns + 1ns + 2ns + 1ns = 6 ns beq: 2ns + 1ns + 2ns = 5 ns sw: 2ns + 1ns + 2ns + 2ns = 7 ns lw: 2ns + 1ns + 2ns + 2ns + 1ns = 8 ns Get read ALU mem write Instr reg operation reg

Computing Execution Time Assume: 100 instructions executed 25% of instructions are loads, 10% of instructions are stores, 45% of instructions are adds, and 20% of instructions are branches. Single-cycle execution: 100 * 8ns = 800ns Optimal execution: 25*8ns + 10*7ns + 45*6ns + 20*5ns = 640ns

Single Cycle Problems • A sequence of instructions: • LW (IF, ID, EX, MEM, WB) • SW (IF, ID, EX, MEM) • etc Single Cycle Implementation: Cycle 1 Cycle 2 Clk Load Store Waste • what if we had a more complicated instruction like floating point? • wasteful of area

Multiple Cycle Solution • use a “smaller” cycle time • have different instructions take different numbers of cycles • a “multicycle” datapath:

Multicycle Approach • We will be reusing functional units • ALU used to compute address and to increment PC • Memory used for instruction and data • We’ll use a finite state machine for control

IF ID Ex Mem WB The Five Stages of an Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • IF: Instruction Fetch and Update PC • ID: Instruction Decode and Registers Fetch • Ex: Execute R-type; calculate memory address • Mem: Read/write the data from/to the Data Memory • WB: Write the result data into the register file

Multicycle Implementation • Break up the instructions into steps, each step takes a cycle • balance the amount of work to be done • restrict each cycle to use only one major functional unit • At the end of a cycle • store values for use in later cycles (easiest thing to do) • introduce additional “internal” registers

IF ID Ex Mem WB The Five Stages of Load Instruction Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 • IF: Instruction Fetch and Update PC • ID: Instruction Decode and Registers Fetch • Ex: Execute R-type; calculate memory address • Mem: Read/write the data from/to the Data Memory • WB: Write the result data into the register file lw

IFetch IFetch Dec Dec Exec Exec Mem Mem WB WB Multiple Cycle Implementation • Break the instruction execution into Clock Cycles • Different instructions require a different number of clock cycles • Clock cycle is limited by the slowest stage • Instruction latency is not reduced (time from the start of an instruction to its completion) Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 lw sw

IFetch Dec Exec Mem WB IFetch Dec Exec Mem IFetch Single Cycle vs. Multiple Cycle Single Cycle Implementation: Cycle 1 Cycle 2 Clk Load Store Waste Multiple Cycle Implementation: Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk lw sw R-type

Multicycle Implementation • Break up the instructions into steps, each step takes a cycle • balance the amount of work to be done • restrict each cycle to use only one major functional unit • At the end of a cycle • store values for use in later cycles (easiest thing to do) • introduce additional “internal” registers

Instructions from ISA perspective • Consider each instruction from perspective of ISA. • Example: • The add instruction changes a register. • Register specified by bits 15:11 of instruction. • Instruction specified by the PC. • New value is the sum (“op”) of two registers. • Registers specified by bits 25:21 and 20:16 of the instruction Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] • In order to accomplish this we must break up the instruction. (kind of like introducing variables when programming)

Breaking down an instruction • ISA definition of arithmetic:Reg[Memory[PC][15:11]] <= Reg[Memory[PC][25:21]] op Reg[Memory[PC][20:16]] • Could break down to: • IR <= Memory[PC] • A <= Reg[IR[25:21]] • B <= Reg[IR[20:16]] • ALUOut <= A op B • Reg[IR[20:16]] <= ALUOut • We forgot an important part of the definition of arithmetic! • PC <= PC + 4

Idea behind multicycle approach • We define each instruction from the ISA perspective (do this!) • Break it down into steps following our rule that data flows through at most one major functional unit (e.g., balance work across steps) • Introduce new registers as needed (e.g, A, B, ALUOut, MDR, etc.) • Finally try and pack as much work into each step (avoid unnecessary cycles)while also trying to share steps where possible (minimizes control, helps to simplify solution)

Five Execution Steps • Instruction Fetch • Instruction Decode and Register Fetch • Execution, Memory Address Computation, or Branch Completion • Memory Access or R-type instruction completion • Write-back step INSTRUCTIONS TAKE FROM 3 - 5 CYCLES!

Step 1: Instruction Fetch • Use PC to get instruction and put it in the Instruction Register. • Increment the PC by 4 and put the result back in the PC. • Can be described succinctly using RTL "Register-Transfer Language" IR <= Memory[PC]; PC <= PC + 4;Can we figure out the values of the control signals?What is the advantage of updating the PC now?

Step 2: Instruction Decode and Register Fetch • Read registers rs and rt in case we need them • Compute the branch address in case the instruction is a branch • RTL: A <= Reg[IR[25:21]]; B <= Reg[IR[20:16]]; ALUOut <= PC + (sign-extend(IR[15:0]) << 2); • We aren't setting any control lines based on the instruction type (we are busy "decoding" it in our control logic)

Step 3 (instruction dependent) • ALU is performing one of three functions, based on instruction type • Memory Reference: ALUOut <= A + sign-extend(IR[15:0]); • R-type: ALUOut <= A op B; • Branch: if (A==B) PC <= ALUOut;

Step 4 (R-type or memory-access) • Loads and stores access memory MDR <= Memory[ALUOut]; or Memory[ALUOut] <= B; • R-type instructions finish Reg[IR[15:11]] <= ALUOut;

Write-back step • Reg[IR[20:16]] <= MDR; Which instruction needs this?

Summary:

Multiple Cycle Implementation

Review: finite state machines • Finite state machines: • a set of states and • next state function (determined by current state and the input) • output function (determined by current state and possibly input) • We’ll use a Moore machine (output based only on current state)

Implementing the Control • Value of control signals is dependent upon: • what instruction is being executed • which step is being performed • Use the information we’ve accumulated to specify a finite state machine • specify the finite state machine graphically, or • use microprogramming • Implementation can be derived from specification

Graphical Specification of FSM

Finite State Machine for Control • Implementation:

October 8th, 2007 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f07/

October 8th, 2007 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f07/

Presentation Transcript

Driving Safely in Qatar

smart storage company in Qatar

Qatar Railways Delivery of an Integrated Railway Network

Michael Kahn Research and Innovation Associates, South Africa

Qatar Economic Forum 17 April 2006 Investing in Qatar: Energy City Qatar

Towards a Freedom of Information Law in Qatar

October 1st, 2007 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f07/

Sep 3 rd , 2008 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f08/

August 27 th , 2007 qatar.cmu

Qatar - Yemen - Argentina

September 17th, 2008 Majd F. Sakr msakr@qatar.cmu qatar.cmu/~msakr/15447-f08/

Qatar Exchange Misnad Al- Misnad CRD Director

Qatar Economic Horizons

Qatar Recruitment Consultant –Manpower Recruitment Consultant Qatar

Qatar Recruitment Agency – Manpower Recruitment Qatar

Car Rental Companies in Qatar

Qatar Recruitment Agency –Manpower Recruitment Consultant Qatar

Construction Companies in Qatar

Best Landscaping Companies in Qatar

Exploring Tourism: Qatar Travel Agency & Tour Operator

Recruit Cook In Qatar For Hire - Alliance Recruitment Agency