CS/COE0447 Computer Organization & Assembly Language: Multi-Cycle Execution
A Multi-cycle Datapath • A single memory unit for both instructions and data • Single ALU rather than ALU & two adders • Registers added after every major functional unit to hold the output until it is used in a subsequent clock cycle
Multi-Cycle Control: What we need to cover • Adding registers after every functional unit • Need to modify the "instruction execution" slides to reflect this • Breaking instruction execution down into cycles • What can be done during the same cycle? What requires a cycle? • Need to modify the "instruction execution" slides again • Timing • Control signal values • What they are per cycle, per instruction • Finite state machine which determines signals based on instruction type + which cycle it is • Putting it all together
Execution: single-cycle (reminder) • add (example: add $t2,$t1,$t0) • Fetch instruction and add 4 to PC • Read two source registers $t1 and $t0 • Add the two values: $t1 + $t0 • Store the result to the destination register: $t2 <= $t1 + $t0
A Multi-cycle Datapath • For add: • Instruction is stored in the instruction register (IR) • Values read from rs and rt are stored in A and B • Result of ALU is stored in ALUOut
Execution: single-cycle (reminder) • lw (load word; example: lw $t0,-12($t1)) • Fetch instruction and add 4 to PC • Read the base register $t1 • Sign-extend the immediate offset: 0xfff4 → 0xfffffff4 • Add the two values to get the address: X = 0xfffffff4 + $t1 • Access data memory with the computed address: M[X] • Store the memory data to the destination register $t0
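The sign-extension and address arithmetic for lw can be checked with a short Python sketch. Values are kept as unsigned 32-bit patterns, as the hardware sees them; the function names and the value chosen for $t1 are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit pattern (returned as unsigned)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # bit 15 set: the offset is negative
        return imm16 | 0xFFFF0000
    return imm16


def lw_address(base: int, imm16: int) -> int:
    """Effective address for lw/sw: base register + sign-extended offset, mod 2**32."""
    return (base + sign_extend16(imm16)) & MASK32


# lw $t0,-12($t1): the offset field is 0xfff4
print(hex(sign_extend16(0xFFF4)))       # 0xfffffff4
t1 = 0x10010020                         # hypothetical contents of $t1
print(hex(lw_address(t1, 0xFFF4)))      # 0x10010014, i.e. $t1 - 12
```

Adding the 32-bit pattern 0xfffffff4 and discarding the carry out of bit 31 is the same as subtracting 12, which is why the ALU needs no separate subtract path for negative offsets.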
A Multi-cycle Datapath • For lw: lw $t0, -12($t1) • Instruction is stored in the IR • Contents of rs ($t1) stored in A • Output of ALU (address of memory location to be read) stored in ALUOut • Value read from memory is stored in the memory data register (MDR)
Execution: single-cycle (reminder) • sw (example: sw $t0,-4($t1)) • Fetch instruction and add 4 to PC • Read the base register $t1 • Read the source register $t0 • Sign-extend the immediate offset: 0xfffc → 0xfffffffc • Add the two values to get the address: X = 0xfffffffc + $t1 • Store the contents of the source register to the computed address: Memory[X] <= $t0
A Multi-cycle Datapath • For sw: sw $t0, -12($t1) • Instruction is stored in the IR • Contents of rs ($t1) stored in A • Contents of rt ($t0), the value to be written, stored in B • Output of ALU (address of memory location to be written) stored in ALUOut
Execution: single-cycle (reminder) • beq (example: beq $t0,$t1,L) • Fetch instruction and add 4 to PC • Assume that L is 4 instructions after the beq; since the offset is counted from PC + 4, the immediate field is 0x0003 • Read two source registers $t0, $t1 • Sign-extend the immediate and shift it left by 2: 0x0003 → 0x0000000c • Perform the test, and update the PC if it is true • If $t0 == $t1, the already-incremented PC becomes PC + 0x0000000c
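A minimal Python sketch of the beq next-PC computation, assuming the offset is counted in instructions from PC + 4 as in the example above. The beq address 0x00400000, the register values, and the function names are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16: int) -> int:
    """Interpret a 16-bit field as a signed value (-32768..32767)."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc_of_beq: int, imm16: int, rs_val: int, rt_val: int) -> int:
    """Next PC after beq: PC + 4, plus (offset << 2) if the registers are equal."""
    pc_plus_4 = (pc_of_beq + 4) & MASK32
    target = (pc_plus_4 + (sign_extend16(imm16) << 2)) & MASK32
    return target if rs_val == rt_val else pc_plus_4


pc = 0x00400000                                 # hypothetical address of the beq
print(hex(beq_next_pc(pc, 0x0003, 7, 7)))       # taken: 0x400010 (4 instructions ahead)
print(hex(beq_next_pc(pc, 0x0003, 7, 8)))       # not taken: 0x400004
```

The shift by 2 turns an instruction count into a byte offset, so a 16-bit field reaches about ±32K instructions rather than ±32K bytes.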
A Multi-cycle Datapath • For beq: beq $t0,$t1,label • Instruction stored in IR • Registers rs and rt are stored in A and B • The ALU computes rs - rt to test for equality; the branch target address is held in ALUOut
Execution: single-cycle (reminder) • j • Fetch instruction and add 4 to PC • Take the 26-bit immediate field • Shift it left by 2 (making a 28-bit value) • Take the upper 4 bits (bits 31:28) of the incremented PC and attach them to the left of the shifted immediate • Assign the resulting 32-bit value to PC
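The jump-address construction can be sketched in Python as a mask-and-shift. The function name and the example PC and target field are hypothetical; the PC passed in is assumed to be the already-incremented PC + 4.

```python
def jump_target(pc_plus_4: int, ir: int) -> int:
    """Build {PC[31:28], IR[25:0], 2'b00} from the incremented PC and the instruction."""
    upper4 = pc_plus_4 & 0xF0000000      # keep the top 4 bits of PC + 4
    field26 = ir & 0x03FFFFFF            # the 26-bit target field of the instruction
    return upper4 | (field26 << 2)       # shifted field fills bits 27:2


# hypothetical: a j at 0x90000000 whose target field is 0x0000010
ir = (0x02 << 26) | 0x0000010            # opcode 2 = j
print(hex(jump_target(0x90000004, ir)))  # 0x90000040
```

Because the top 4 bits come from the PC, j can only reach addresses inside the current 256 MB region.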
A Multi-cycle Datapath • For j • No accesses to registers or memory; no need for ALU
Multicycle Approach • Break up the instructions into steps • Each step takes one clock cycle • Balance the amount of work done in each step/cycle so that the steps take about the same time • Restrict each cycle to use each major functional unit at most once, so that such units do not have to be replicated • Functional units can be shared between different cycles within one instruction
Operations • These take time: • Memory (read/write); register file (read/write); ALU operations • The other connections and logical elements have no latency (for our purposes)
Five Execution Steps • Each step takes one cycle • In one cycle, there can be at most one memory access, at most one register file access, and at most one ALU operation • A single cycle may combine a memory access, an ALU operation, and/or a register file access, as long as they do not compete for the same resource • Changes to registers are made at the end of the clock cycle • PC, ALUOut, A, B, etc. save information for the next clock cycle
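The rule that registers change only at the end of the clock cycle can be sketched in Python by computing every next value from the current register values and then committing them together. The `Regs` class, `fetch_cycle`, and the memory contents are hypothetical names for illustration.

```python
from dataclasses import dataclass, replace

MASK32 = 0xFFFFFFFF


@dataclass(frozen=True)
class Regs:
    pc: int
    ir: int


def fetch_cycle(cur: Regs, memory: dict) -> Regs:
    """One instruction-fetch cycle with an end-of-cycle register update."""
    # Combinational part: every input is read from the *current* register values
    next_ir = memory[cur.pc]
    next_pc = (cur.pc + 4) & MASK32
    # Clock edge: both registers change together
    return replace(cur, pc=next_pc, ir=next_ir)


memory = {0x0: 0x1111, 0x4: 0x2222}      # hypothetical word contents
after = fetch_cycle(Regs(pc=0x0, ir=0), memory)
print(hex(after.pc), hex(after.ir))      # 0x4 0x1111
```

IR receives the word at the old PC (0x1111), not the word at PC + 4, even though PC is also updated in the same cycle.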
Step 1: Instruction Fetch • Access memory with the PC to fetch the instruction and store it in the Instruction Register (IR) • Increment PC by 4 • We can do this because the ALU is not being used for anything else this cycle
Step 2: Decode and Reg. Read • Read registers rs and rt • We read both of them regardless of necessity • Compute the branch address in case the instruction is a branch • We can do this because the ALU is not busy • ALUOut will keep the target address
Step 3: Various Actions • The action depends on the instruction type: the ALU performs one of three functions (memory reference, R-type, or branch), while jump completes without the ALU (later we list the cycles per instruction type, which is easier to follow) • Memory reference • ALUOut <= A + sign-extend(IR[15:0]); • R-type • ALUOut <= A op B; • Branch (the ALU subtracts to compare A and B) • if (A==B) PC <= ALUOut; • Jump • PC <= {PC[31:28],IR[25:0],2'b00};
Step 4: Memory Access or R-type Completion • If the instruction is a memory reference • MDR <= Memory[ALUOut]; // if it is a load • Memory[ALUOut] <= B; // if it is a store • Store is complete! • If the instruction is R-type • Reg[IR[15:11]] <= ALUOut; • Now the instruction is complete!
Step 5: Register Write Back • Only the lw instruction reaches this step • Reg[IR[20:16]] <= MDR;
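Reading the five steps per instruction type gives a cycle count for each class, which a short Python sketch can use to total the cycles of an instruction sequence. The sequence below is a hypothetical example, not a measured workload.

```python
# Cycles per instruction class, read off the five-step breakdown
CYCLES = {
    "R-type": 4,  # fetch, decode, execute, write rd (step 4)
    "lw": 5,      # fetch, decode, address, memory read, write rt (step 5)
    "sw": 4,      # fetch, decode, address, memory write (step 4)
    "beq": 3,     # fetch, decode, compare and conditional PC update (step 3)
    "j": 3,       # fetch, decode, PC update (step 3)
}


def total_cycles(program):
    """Total clock cycles for a list of instruction classes."""
    return sum(CYCLES[kind] for kind in program)


program = ["lw", "R-type", "sw", "beq", "j"]   # hypothetical instruction sequence
cycles = total_cycles(program)
print(cycles, cycles / len(program))            # 19 3.8
```

The multi-cycle design pays off because short instructions such as beq and j finish in 3 cycles instead of being stretched to the lw-length cycle of the single-cycle design.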
Multicycle Execution Step (1): Instruction Fetch • IR = Memory[PC]; • PC = PC + 4;
Multicycle Execution Step (2): Instruction Decode & Register Fetch • A = Reg[IR[25-21]]; (A = Reg[rs]) • B = Reg[IR[20-16]]; (B = Reg[rt]) • ALUOut = PC + (sign-extend(IR[15-0]) << 2); (branch target address)
Multicycle Execution Step (3): Memory Reference Instructions • ALUOut = A + sign-extend(IR[15-0]);
Multicycle Execution Step (4): Memory Access - Write (sw) • Memory[ALUOut] = B;
Multicycle Execution Step (4): Memory Access - Read (lw) • MDR = Memory[ALUOut];
Multicycle Execution Step (5): Memory Read Completion (lw) • Reg[IR[20-16]] = MDR;
Multicycle Execution Step (3): ALU Instruction (R-Type) • ALUOut = A op B;
Multicycle Execution Step (4): ALU Instruction (R-Type) • Reg[IR[15-11]] = ALUOut;
Multicycle Execution Step (3): Branch Instructions • if (A == B) PC = ALUOut;
Multicycle Execution Step (3): Jump Instruction • PC = PC[31-28] concat (IR[25-0] << 2);
For Reference • The next 5 slides give the steps, one slide per instruction
Multi-Cycle Execution: R-type (example: sub $t0,$t1,$t2) • Instruction fetch • IR <= Memory[PC]; • PC <= PC + 4; • Decode instruction/register read • A <= Reg[IR[25:21]]; (rs) • B <= Reg[IR[20:16]]; (rt) • ALUOut <= PC + (sign-extend(IR[15:0])<<2); • Execution • ALUOut <= A op B; (op = add, sub, and, or, …) • Completion • Reg[IR[15:11]] <= ALUOut; ($t0 <= ALU result)
Multi-cycle Execution: lw (example: lw $t0,-12($t1)) • Instruction fetch • IR <= Memory[PC]; • PC <= PC + 4; • Instruction decode/register read • A <= Reg[IR[25:21]]; (rs) • B <= Reg[IR[20:16]]; (rt) • ALUOut <= PC + (sign-extend(IR[15:0])<<2); • Execution • ALUOut <= A + sign-extend(IR[15:0]); ($t1 + (-12), sign-extended) • Memory access • MDR <= Memory[ALUOut]; (M[$t1 - 12]) • Write-back • Reg[IR[20:16]] <= MDR; ($t0 <= M[$t1 - 12])
Multi-cycle Execution: sw (example: sw $t0,-12($t1)) • Instruction fetch • IR <= Memory[PC]; • PC <= PC + 4; • Decode/register read • A <= Reg[IR[25:21]]; (rs) • B <= Reg[IR[20:16]]; (rt) • ALUOut <= PC + (sign-extend(IR[15:0])<<2); • Execution • ALUOut <= A + sign-extend(IR[15:0]); ($t1 + (-12), sign-extended) • Memory access • Memory[ALUOut] <= B; (M[$t1 - 12] <= $t0)
Multi-cycle Execution: beq (example: beq $t0,$t1,label) • Instruction fetch • IR <= Memory[PC]; • PC <= PC + 4; • Decode/register read • A <= Reg[IR[25:21]]; (rs) • B <= Reg[IR[20:16]]; (rt) • ALUOut <= PC + (sign-extend(IR[15:0])<<2); • Execution • if (A == B) then PC <= ALUOut; (branch is taken if $t0 == $t1)
Multi-cycle Execution: j (example: j label) • Instruction fetch • IR <= Memory[PC]; • PC <= PC + 4; • Decode/register read (done for every instruction because the type is not yet known; j does not use these values) • A <= Reg[IR[25:21]]; • B <= Reg[IR[20:16]]; • ALUOut <= PC + (sign-extend(IR[15:0])<<2); • Execution • PC <= {PC[31:28],IR[25:0],2'b00};
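The five per-instruction slides above can be tied together in a small Python model that runs one instruction at a time and counts its cycles. It is a sketch under stated assumptions: only add/sub, lw, sw, beq, and j are modeled; memory is a Python dict keyed by byte address; statements inside each cycle are ordered so that they read the values registers held at the start of that cycle; and the helpers `enc_i`, `enc_r`, `enc_j` and all register and memory values in the demo are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sext16(x: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    x &= 0xFFFF
    return (x | 0xFFFF0000) if x & 0x8000 else x


class MultiCycleCPU:
    """Simplified model: add/sub (R-type), lw, sw, beq, j."""

    def __init__(self, mem: dict, pc: int = 0):
        self.mem = mem
        self.reg = [0] * 32
        self.pc = pc
        self.ir = self.a = self.b = self.alu_out = self.mdr = 0
        self.cycles = 0

    def _write_reg(self, n: int, value: int) -> None:
        if n != 0:                                  # $zero is hard-wired to 0
            self.reg[n] = value & MASK32

    def step(self) -> int:
        """Run one instruction; return how many cycles it took."""
        start = self.cycles
        # Cycle 1: IR <= Memory[PC]; PC <= PC + 4
        self.ir = self.mem[self.pc]
        self.pc = (self.pc + 4) & MASK32
        self.cycles += 1
        # Cycle 2: A <= Reg[rs]; B <= Reg[rt]; ALUOut <= branch target
        op = self.ir >> 26
        rs, rt, rd = (self.ir >> 21) & 31, (self.ir >> 16) & 31, (self.ir >> 11) & 31
        self.a, self.b = self.reg[rs], self.reg[rt]
        self.alu_out = (self.pc + (sext16(self.ir) << 2)) & MASK32
        self.cycles += 1
        # Cycle 3: action depends on the instruction class
        self.cycles += 1
        if op == 0x04:                              # beq: done after cycle 3
            if self.a == self.b:
                self.pc = self.alu_out
        elif op == 0x02:                            # j: done after cycle 3
            self.pc = (self.pc & 0xF0000000) | ((self.ir & 0x03FFFFFF) << 2)
        elif op == 0x00:                            # R-type: ALUOut <= A op B
            funct = self.ir & 0x3F
            if funct == 0x20:
                self.alu_out = (self.a + self.b) & MASK32
            elif funct == 0x22:
                self.alu_out = (self.a - self.b) & MASK32
            else:
                raise ValueError(f"unsupported funct {funct:#x}")
            self.cycles += 1                        # cycle 4: Reg[rd] <= ALUOut
            self._write_reg(rd, self.alu_out)
        elif op in (0x23, 0x2B):                    # lw/sw: ALUOut <= A + offset
            self.alu_out = (self.a + sext16(self.ir)) & MASK32
            self.cycles += 1                        # cycle 4: memory access
            if op == 0x2B:
                self.mem[self.alu_out] = self.b     # sw completes here
            else:
                self.mdr = self.mem[self.alu_out]
                self.cycles += 1                    # cycle 5: Reg[rt] <= MDR
                self._write_reg(rt, self.mdr)
        else:
            raise ValueError(f"unsupported opcode {op:#x}")
        return self.cycles - start


# Hypothetical encoders used only to build the demo program
def enc_i(op, rs, rt, imm):
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def enc_r(rs, rt, rd, funct):
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def enc_j(op, addr):
    return (op << 26) | ((addr >> 2) & 0x03FFFFFF)


mem = {
    0x00: enc_i(0x23, 9, 8, 4),      # lw  $t0, 4($t1)
    0x04: enc_r(8, 8, 10, 0x20),     # add $t2, $t0, $t0
    0x08: enc_i(0x2B, 9, 10, 8),     # sw  $t2, 8($t1)
    0x0C: enc_i(0x04, 8, 8, 1),      # beq $t0, $t0, +1 (skips 0x10)
    0x10: enc_r(0, 0, 0, 0x20),      # never executed
    0x14: enc_j(0x02, 0x40),         # j 0x40
    0x104: 42,                       # data word at $t1 + 4
}
cpu = MultiCycleCPU(mem)
cpu.reg[9] = 0x100                   # $t1
counts = [cpu.step() for _ in range(5)]
print(counts, cpu.reg[10], mem[0x108], hex(cpu.pc))  # [5, 4, 4, 3, 3] 84 84 0x40
```

Every instruction shares cycles 1 and 2, and the branch target is computed in cycle 2 even for instructions that never use it; the per-class cycle counts fall out directly from where each branch of `step` stops.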
Example from beginning to end • lw $t0,4($t1) • Machine code fields: • opcode, IR[31:26] = 100011 • rs, IR[25:21] = 01001 ($t1) • rt, IR[20:16] = 01000 ($t0) • immediate, IR[15:0] = 0000 0000 0000 0100
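The field split of lw $t0,4($t1) can be reproduced with shifts and masks, which is what the IR[...] bit ranges denote. A minimal Python sketch; `decode_i_type` is a hypothetical helper name.

```python
def decode_i_type(word: int) -> dict:
    """Split a 32-bit I-type instruction into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # IR[31:26]
        "rs": (word >> 21) & 0x1F,       # IR[25:21]
        "rt": (word >> 16) & 0x1F,       # IR[20:16]
        "imm": word & 0xFFFF,            # IR[15:0]
    }


# lw $t0,4($t1): opcode 100011, rs 01001, rt 01000, immediate 4
word = int("100011" "01001" "01000" "0000000000000100", 2)
fields = decode_i_type(word)
print(hex(word), fields)  # 0x8d280004 {'opcode': 35, 'rs': 9, 'rt': 8, 'imm': 4}
```

Opcode 35 is lw, and registers 9 and 8 are $t1 and $t0 in the MIPS register numbering.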
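Example: Load (1) [figure: datapath with control-signal values for the instruction fetch cycle]
Example: Load (2) [figure: datapath with control-signal values for the decode/register read cycle]
Example: Load (3) [figure: datapath with control-signal values for the address computation cycle]
Example: Load (4) [figure: datapath with control-signal values for the memory access cycle]
Example: Load (5) [figure: datapath with control-signal values for the write-back cycle]
Example: Jump (1) [figure: datapath with control-signal values for the instruction fetch cycle]
Example: Jump (2) [figure: datapath with control-signal values for the decode/register read cycle]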
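The sequencing part of the control finite state machine, which picks the next cycle's state from the current state and the opcode, can be sketched in Python. This covers only the state transitions implied by the five steps; the control-signal values asserted in each state are not modeled, and the state names are hypothetical labels for those steps.

```python
from enum import Enum, auto


class State(Enum):
    FETCH = auto()
    DECODE = auto()
    MEM_ADDR = auto()    # lw/sw: ALUOut <= A + sign-extend(IR[15:0])
    MEM_READ = auto()    # lw: MDR <= Memory[ALUOut]
    MEM_WRITE = auto()   # sw: Memory[ALUOut] <= B
    WRITE_BACK = auto()  # lw: Reg[rt] <= MDR
    EXECUTE = auto()     # R-type: ALUOut <= A op B
    R_COMPLETE = auto()  # R-type: Reg[rd] <= ALUOut
    BRANCH = auto()      # beq: if (A == B) PC <= ALUOut
    JUMP = auto()        # j: PC <= {PC[31:28], IR[25:0], 2'b00}


OP_R, OP_J, OP_BEQ, OP_LW, OP_SW = 0x00, 0x02, 0x04, 0x23, 0x2B


def next_state(state: State, opcode: int) -> State:
    """Next state of the controller; unknown opcodes raise KeyError in DECODE."""
    if state is State.FETCH:
        return State.DECODE
    if state is State.DECODE:
        return {OP_LW: State.MEM_ADDR, OP_SW: State.MEM_ADDR, OP_R: State.EXECUTE,
                OP_BEQ: State.BRANCH, OP_J: State.JUMP}[opcode]
    if state is State.MEM_ADDR:
        return State.MEM_READ if opcode == OP_LW else State.MEM_WRITE
    if state is State.MEM_READ:
        return State.WRITE_BACK
    if state is State.EXECUTE:
        return State.R_COMPLETE
    return State.FETCH   # MEM_WRITE, WRITE_BACK, R_COMPLETE, BRANCH, JUMP end the instruction


def trace(opcode: int) -> list:
    """States visited by one instruction, starting from FETCH."""
    states = [State.FETCH]
    while (nxt := next_state(states[-1], opcode)) is not State.FETCH:
        states.append(nxt)
    return states


print([s.name for s in trace(OP_LW)])  # ['FETCH', 'DECODE', 'MEM_ADDR', 'MEM_READ', 'WRITE_BACK']
print([s.name for s in trace(OP_J)])   # ['FETCH', 'DECODE', 'JUMP']
```

The load example walks through five states and the jump example through three, matching the number of cycles each instruction takes.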