CS/COE0447 Computer Organization & Assembly Language. Chapter 5 Part 3. A Multi-cycle Datapath. A single memory unit for both instructions and data. Single ALU rather than ALU & two adders.
CS/COE0447 Computer Organization & Assembly Language, Chapter 5 Part 3
A Multi-cycle Datapath • A single memory unit for both instructions and data • Single ALU rather than ALU & two adders • Registers added after every major functional unit to hold the output until it is used in a subsequent clock cycle
Multi-Cycle Control: What we need to cover • Adding registers after every functional unit • Need to modify the “instruction execution” slides to reflect this • Breaking instruction execution down into cycles • What can be done during the same cycle? What requires a cycle? • Need to modify the “instruction execution” slides again • Timing: Registers/memory updated at the beginning of the next clock cycle • Control signal values • What they are per cycle, per instruction • Finite state machine which determines signals based on instruction type + which cycle it is • Putting it all together
Execution: single-cycle (reminder)
• add (example: add $t2,$t1,$t0)
  • Fetch instruction and add 4 to PC
  • Read two source registers: $t1 and $t0
  • Add two values: $t1 + $t0
  • Store result to the destination register: $t2 <= $t1 + $t0
A Multi-cycle Datapath • For add: • Instruction is stored in the instruction register (IR) • Values read from rs and rt are stored in A and B • Result of ALU is stored in ALUOut
Multi-Cycle Execution: R-type (example: sub $t0,$t1,$t2)
• Instruction fetch
  • IR <= Memory[PC];
  • PC <= PC + 4;
• Decode instruction/register read
  • A <= Reg[IR[25:21]]; (rs)
  • B <= Reg[IR[20:16]]; (rt)
  • ALUOut <= PC + (sign-extend(IR[15:0])<<2); (branch target; covered later)
• Execution
  • ALUOut <= A op B; (op = add, sub, and, or, …)
• Completion
  • Reg[IR[15:11]] <= ALUOut; ($t0 <= ALU result)
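The four R-type steps can be traced in software. The following is a minimal Python sketch, not the course's implementation: the name `run_r_type`, the dictionary-based register file and memory, and the sample register values are assumptions for illustration. The word 0x012A4022 is the standard MIPS encoding of sub $t0,$t1,$t2 ($t0 = register 8, $t1 = 9, $t2 = 10).

```python
MASK32 = 0xFFFFFFFF

# funct field -> ALU operation for a few R-type instructions
ALU_OPS = {
    0x20: lambda a, b: a + b,  # add
    0x22: lambda a, b: a - b,  # sub
    0x24: lambda a, b: a & b,  # and
    0x25: lambda a, b: a | b,  # or
}

def sign_extend16(field):
    """Sign-extend the low 16 bits of `field` to 32 bits."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field

def run_r_type(pc, memory, reg):
    """Run one R-type instruction through its four cycles; return the new PC."""
    # Cycle 1: instruction fetch
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2: decode / register read
    a = reg[(ir >> 21) & 0x1F]  # rs
    b = reg[(ir >> 16) & 0x1F]  # rt
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32  # computed, but unused by R-type
    # Cycle 3: execution (overwrites ALUOut)
    alu_out = ALU_OPS[ir & 0x3F](a, b) & MASK32
    # Cycle 4: completion (write rd)
    reg[(ir >> 11) & 0x1F] = alu_out
    return pc

reg = {8: 0, 9: 7, 10: 5}            # hypothetical values: $t1 = 7, $t2 = 5
memory = {0x00400000: 0x012A4022}    # sub $t0,$t1,$t2
new_pc = run_r_type(0x00400000, memory, reg)
print(hex(new_pc), reg[8])           # 0x400004 2
```

Each commented cycle corresponds to one clock cycle in the datapath; in hardware the values held in `ir`, `a`, `b`, and `alu_out` live in the IR, A, B, and ALUOut registers.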
Execution: single-cycle (reminder)
• lw (load word) (example: lw $t0,-12($t1))
  • Fetch instruction and add 4 to PC
  • Read the base register: $t1
  • Sign-extend the immediate offset: 0xfff4 → 0xfffffff4
  • Add two values to get the address: X = 0xfffffff4 + $t1
  • Access data memory with the computed address: M[X]
  • Store the memory data to the destination register: $t0 <= M[X]
A Multi-cycle Datapath • For lw (example: lw $t0, -12($t1)) • Instruction is stored in the IR • Contents of rs ($t1) stored in A • Output of ALU (address of memory location to be read) stored in ALUOut • Value read from memory is stored in the memory data register (MDR)
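The sign-extension and address arithmetic for the lw example can be checked with a few lines of Python. This is an illustrative sketch: the function names and the base address 0x10010020 for $t1 are assumptions, not values from the slides.

```python
def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a 32-bit value."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def effective_address(base, imm16):
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF

print(hex(sign_extend16(0xFFF4)))                  # 0xfffffff4
print(hex(effective_address(0x10010020, 0xFFF4)))  # 0x10010014
```

Adding 0xfffffff4 and discarding the carry out of bit 31 is the same as subtracting 12, which is why the address lands 12 bytes below the base.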
Multi-cycle Execution: lw (example: lw $t0,-12($t1))
• Instruction fetch
  • IR <= Memory[PC];
  • PC <= PC + 4;
• Instruction decode/register read
  • A <= Reg[IR[25:21]]; (rs)
  • B <= Reg[IR[20:16]];
  • ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  • ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign-extended)
• Memory access
  • MDR <= Memory[ALUOut]; (M[$t1 + -12])
• Write-back
  • Load: Reg[IR[20:16]] <= MDR; ($t0 <= M[$t1 + -12])
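The five lw cycles can be mirrored one-for-one in Python. This is a sketch under stated assumptions: `run_lw`, `encode_i`, the dictionary memory, and the sample address and data word are hypothetical; only the opcode (0x23) and the field layout come from the MIPS instruction format.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(field):
    """Sign-extend the low 16 bits of `field` to 32 bits."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field

def encode_i(opcode, rs, rt, imm):
    """Assemble an I-format word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def run_lw(pc, memory, reg):
    """Run one lw through its five cycles; return (new PC, address, MDR)."""
    # Cycle 1: instruction fetch
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2: decode / register read
    a = reg[(ir >> 21) & 0x1F]  # base register (rs)
    b = reg[(ir >> 16) & 0x1F]  # read regardless of need; unused by lw
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32  # branch target, unused by lw
    # Cycle 3: execution (address calculation)
    alu_out = (a + sign_extend16(ir)) & MASK32
    # Cycle 4: memory access
    mdr = memory[alu_out]
    # Cycle 5: write-back to rt
    reg[(ir >> 16) & 0x1F] = mdr
    return pc, alu_out, mdr

T0, T1 = 8, 9
reg = {T0: 0, T1: 0x10010020}
memory = {
    0x00400000: encode_i(0x23, T1, T0, -12),  # lw $t0,-12($t1)
    0x10010014: 0xCAFEBABE,                   # hypothetical data word
}
pc, addr, mdr = run_lw(0x00400000, memory, reg)
print(hex(addr), hex(reg[T0]))  # 0x10010014 0xcafebabe
```

The separate `mdr` variable stands in for the memory data register: the loaded word is held for a cycle before it is written to the register file.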
Execution: single-cycle (reminder)
• sw (store word) (example: sw $t0,-4($t1))
  • Fetch instruction and add 4 to PC
  • Read the base register: $t1
  • Read the source register: $t0
  • Sign-extend the immediate offset: 0xfffc → 0xfffffffc
  • Add two values to get the address: X = 0xfffffffc + $t1
  • Store the contents of the source register to the computed address: Memory[X] <= $t0
A Multi-cycle Datapath • For sw (example: sw $t0, -12($t1)) • Instruction is stored in the IR • Contents of rs ($t1) stored in A • Output of ALU (address of memory location to be written) stored in ALUOut
Multi-cycle Execution: sw (example: sw $t0,-12($t1))
• Instruction fetch
  • IR <= Memory[PC];
  • PC <= PC + 4;
• Decode/register read
  • A <= Reg[IR[25:21]]; (rs)
  • B <= Reg[IR[20:16]]; (rt)
  • ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  • ALUOut <= A + sign-extend(IR[15:0]); ($t1 + -12, sign-extended)
• Memory access
  • Memory[ALUOut] <= B; (M[$t1 + -12] <= $t0)
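A store finishes in four cycles because it ends at the memory access. The Python sketch below illustrates this; `run_sw`, `encode_i`, and the sample values are assumptions for illustration, while 0x2B is the MIPS opcode for sw.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(field):
    """Sign-extend the low 16 bits of `field` to 32 bits."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field

def encode_i(opcode, rs, rt, imm):
    """Assemble an I-format word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def run_sw(pc, memory, reg):
    """Run one sw through its four cycles; return (new PC, store address)."""
    # Cycle 1: instruction fetch
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2: decode / register read
    a = reg[(ir >> 21) & 0x1F]  # base register (rs)
    b = reg[(ir >> 16) & 0x1F]  # value to store (rt)
    # Cycle 3: execution (address calculation)
    alu_out = (a + sign_extend16(ir)) & MASK32
    # Cycle 4: memory access; the store is complete after this write
    memory[alu_out] = b
    return pc, alu_out

T0, T1 = 8, 9
reg = {T0: 0x12345678, T1: 0x10010020}
memory = {0x00400000: encode_i(0x2B, T1, T0, -12)}  # sw $t0,-12($t1)
pc, addr = run_sw(0x00400000, memory, reg)
print(hex(addr), hex(memory[addr]))  # 0x10010014 0x12345678
```

Unlike lw, the value read into B during decode is actually used here: it is the data written to memory.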
Execution: single-cycle (reminder)
• beq (example: beq $t0,$t1,L)
  • Fetch instruction and add 4 to PC
  • Assume that L is +3 instructions away
  • Read two source registers: $t0, $t1
  • Sign-extend the immediate, and shift it left by 2
    • 0x0003 → 0x0000000c
  • Perform the test, and update the PC if it is true
    • If $t0 == $t1, then PC = PC + 0x0000000c
    • [we will follow what Mars does, so this is not Immediate == 0x0002; PC = PC + 4 + 0x00000008]
A Multi-cycle Datapath • For beq (example: beq $t0,$t1,label) • Instruction stored in IR • Registers rs and rt are stored in A and B • Result of ALU (rs – rt) is stored in ALUOut
Multi-cycle execution: beq (example: beq $t0,$t1,label)
• Instruction fetch
  • IR <= Memory[PC];
  • PC <= PC + 4;
• Decode/register read
  • A <= Reg[IR[25:21]]; (rs)
  • B <= Reg[IR[20:16]]; (rt)
  • ALUOut <= PC + (sign-extend(IR[15:0])<<2);
    • PC + the number of bytes away the label is (negative for backward branches, positive for forward branches)
• Execution
  • if (A == B) then PC <= ALUOut;
    • If $t0 == $t1, perform the branch
  • Note: the ALU is used to evaluate A == B; we’ll see later that this does not clash with the use of the ALU above.
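The three beq cycles can be sketched in Python as follows. This follows the register-transfer statements literally, so the branch target is formed from the PC that was already incremented during fetch. The name `run_beq`, the helper `encode_i`, and the immediates used are illustrative assumptions; 0x04 is the MIPS opcode for beq.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(field):
    """Sign-extend the low 16 bits of `field` to 32 bits."""
    field &= 0xFFFF
    return (field | 0xFFFF0000) if field & 0x8000 else field

def encode_i(opcode, rs, rt, imm):
    """Assemble an I-format word: opcode | rs | rt | 16-bit immediate."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def run_beq(pc, memory, reg):
    """Run one beq through its three cycles; return the new PC."""
    # Cycle 1: instruction fetch
    ir = memory[pc]
    pc = (pc + 4) & MASK32
    # Cycle 2: decode / register read, plus speculative branch target
    a = reg[(ir >> 21) & 0x1F]
    b = reg[(ir >> 16) & 0x1F]
    alu_out = (pc + (sign_extend16(ir) << 2)) & MASK32
    # Cycle 3: the ALU subtracts; a zero result means A == B
    zero = ((a - b) & MASK32) == 0
    if zero:
        pc = alu_out
    return pc

T0, T1 = 8, 9
memory = {0x00400000: encode_i(0x04, T0, T1, 2)}  # hypothetical immediate of 2
print(hex(run_beq(0x00400000, memory, {T0: 5, T1: 5})))  # 0x40000c (taken)
print(hex(run_beq(0x00400000, memory, {T0: 5, T1: 6})))  # 0x400004 (not taken)
```

The target is ready in ALUOut before the comparison happens, which is why the ALU is free to do the subtraction in cycle 3.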
Execution: single-cycle (reminder) • j • Fetch instruction and add 4 to PC • Take the 26-bit immediate field • Shift left by 2 (to make 28-bit immediate) • Get 4 bits from the current PC and attach to the left of the immediate • Assign the value to PC • BUT, as we’ll see soon, only the instruction fetch takes time (at our level of detail)
A Multi-cycle Datapath • For j • No accesses to registers or memory; no need for ALU
Multi-cycle execution: j (example: j label)
• Instruction fetch
  • IR <= Memory[PC];
  • PC <= PC + 4;
• Decode/register read
  • A <= Reg[IR[25:21]];
  • B <= Reg[IR[20:16]];
  • ALUOut <= PC + (sign-extend(IR[15:0])<<2);
• Execution
  • PC <= {PC[31:28],IR[25:0],2'b00};
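The concatenation that forms the jump target can be written with masks and shifts. A minimal Python sketch, assuming a hypothetical helper name `jump_target` and a sample target address of 0x00400020:

```python
def jump_target(pc, ir):
    """Form {PC[31:28], IR[25:0], 2'b00} from the (already incremented) PC."""
    upper4 = pc & 0xF0000000            # PC[31:28] stays in place
    field26 = ir & 0x03FFFFFF           # IR[25:0]
    return upper4 | (field26 << 2)      # shifting left by 2 appends the two zero bits

# j 0x00400020: opcode 2 in bits 31..26, target >> 2 in bits 25..0
ir = (0x02 << 26) | (0x00400020 >> 2)
print(hex(ir))                             # 0x8100008
print(hex(jump_target(0x00400004, ir)))    # 0x400020
```

No ALU operation is involved: the new PC is pure wiring of existing bits, which matches the point that j needs neither the register file nor the ALU.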
Multi-Cycle Control: What we need to cover • Adding registers after every functional unit • Need to modify the “instruction execution” slides to reflect this • Breaking instruction execution down into cycles • What can be done during the same cycle? What requires a cycle? • Need to modify the “instruction execution” slides again • Timing: Registers/memory updated at the beginning of the next clock cycle • Control signal values • What they are per cycle, per instruction • Finite state machine which determines signals based on instruction type + which cycle it is • Putting it all together
Operations • These take time: • Memory (read/write); register file (read/write); ALU operations • The other connections and logical elements have no latency (for our purposes)
Fig. 5.28 (given on Exam 3 and the Final) • Memory, register file, and ALU take time; the rest of it doesn’t (for our purposes)
Five Execution Steps • Instruction fetch • Instruction decode and register read • Execution, memory address calculation, or branch completion • Memory access or R-type instruction completion • Write-back • Instruction execution takes 3 to 5 cycles, depending on the instruction!
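The cycle count per instruction class follows directly from which of the five steps each class uses. The Python sketch below tabulates this; the table name, step labels, and the four-instruction sequence are assumptions for illustration, not data from the slides.

```python
# Steps used by each instruction class, in order
STEPS = {
    "R-type": ["fetch", "decode", "execute", "completion"],
    "lw":     ["fetch", "decode", "address", "memory read", "write-back"],
    "sw":     ["fetch", "decode", "address", "memory write"],
    "beq":    ["fetch", "decode", "branch completion"],
    "j":      ["fetch", "decode", "jump completion"],
}

def cycles(kind):
    """Number of clock cycles one instruction of this class takes."""
    return len(STEPS[kind])

def total_cycles(program):
    """Total cycles for a sequence of instruction classes."""
    return sum(cycles(kind) for kind in program)

for kind in STEPS:
    print(kind, cycles(kind))

# Hypothetical sequence: lw, one R-type, sw, beq
print(total_cycles(["lw", "R-type", "sw", "beq"]))  # 16
```

Branches and jumps are the 3-cycle cases, loads the 5-cycle case, and R-type and stores take 4.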
Step 1: Instruction Fetch • Access memory with the PC to fetch the instruction and store it in the Instruction Register (IR) • Increment PC by 4 • We can do this because the ALU is not otherwise busy in this step, so we can use it • The PC update takes effect at the next rising clock edge
Step 2: Decode and Reg. Read • Read registers rs and rt • We read both of them regardless of necessity • Compute the branch address in case the instruction is a branch • We can do this as ALU is not busy • ALUOut will keep the target address • We still don’t set any control signals based on the instruction type • Instruction is being decoded now in the control logic!
Step 3: Various Actions
• ALU performs one of three functions based on instruction type
• Memory reference
  • ALUOut <= A + sign-extend(IR[15:0]);
• R-type
  • ALUOut <= A op B;
• Branch
  • if (A==B) PC <= ALUOut;
• Jump
  • PC <= {PC[31:28],IR[25:0],2'b00}; // Verilog notation
Step 4: Memory Access…
• If the instruction is a memory reference
  • MDR <= Memory[ALUOut]; // if it is a load
  • Memory[ALUOut] <= B; // if it is a store
  • Store is complete!
• If the instruction is R-type
  • Reg[IR[15:11]] <= ALUOut;
  • Now the instruction is complete!
Step 5: Register Write Back • Only the load instruction reaches this step • Reg[IR[20:16]] <= MDR;
Traffic Light Control Example
• Two states
  • NSlite: 1 = green light on the North-South road; 0 = red light on the North-South road
  • EWlite: similar
• Two inputs: NScar (a car is sensed on the NS road, going either way); EWcar (similar)
• The current state lasts for 30 seconds, then
  • Switch to the other state if there is a car waiting
  • Stay in the current state for another 30 seconds if not
• So, use a 1/30 Hz clock (about 0.033 Hz)
Traffic Light Control, cont’d • Let’s assign “0” to NSlite and “1” to EWlite • NextState = CurrentState' · EWcar + CurrentState · NScar' (where ' denotes NOT, · is AND, + is OR)
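The next-state equation can be checked against the verbal rules with a short Python sketch. The function names and the input sequence are assumptions for illustration; each loop iteration stands for one 30-second clock tick.

```python
def next_state(current, ns_car, ew_car):
    """NextState = CurrentState'.EWcar + CurrentState.NScar' (0 = NSlite, 1 = EWlite)."""
    return ((1 - current) & ew_car) | (current & (1 - ns_car))

def lights(state):
    """Output function: (NSlite, EWlite) for the given state."""
    return (1, 0) if state == 0 else (0, 1)

def simulate(inputs, state=0):
    """Apply one (NScar, EWcar) input pair per clock tick; return the state after each tick."""
    history = []
    for ns_car, ew_car in inputs:
        state = next_state(state, ns_car, ew_car)
        history.append(state)
    return history

# Hypothetical sensor readings, one pair per 30-second tick
print(simulate([(0, 0), (0, 1), (0, 0), (1, 0)]))  # [0, 1, 1, 0]
```

In state 0 only EWcar matters and in state 1 only NScar matters, so each product term in the equation covers exactly one of the two states.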
Finite State Machine (FSM) • FSM • Memory element to keep current state • Next state function • Output function
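The three FSM components map naturally onto the multi-cycle execution steps. The Python sketch below is one way to picture that, under stated assumptions: the class `ControlFSM`, the state names, and the helper `trace` are hypothetical, and the output function only returns a state label because the actual control signal values are not listed in this part.

```python
class ControlFSM:
    """State register + next-state function + output function."""

    def __init__(self):
        self.state = "FETCH"  # memory element holding the current state

    def next_state(self, kind):
        """Next-state function: depends on the current state and instruction type."""
        if self.state == "FETCH":
            return "DECODE"
        if self.state == "DECODE":
            return "EXECUTE"
        if self.state == "EXECUTE":
            return "FETCH" if kind in ("beq", "j") else "STEP4"
        if self.state == "STEP4":
            return "WRITE_BACK" if kind == "lw" else "FETCH"
        return "FETCH"  # WRITE_BACK always returns to FETCH

    def output(self):
        """Output function: depends only on the current state (placeholder label)."""
        return self.state

    def tick(self, kind):
        """Clock edge: load the next state into the state register."""
        self.state = self.next_state(kind)

def trace(kind):
    """List the states visited while executing one instruction of the given type."""
    fsm = ControlFSM()
    visited = [fsm.output()]
    fsm.tick(kind)
    while fsm.state != "FETCH":
        visited.append(fsm.output())
        fsm.tick(kind)
    return visited

print(trace("lw"))   # ['FETCH', 'DECODE', 'EXECUTE', 'STEP4', 'WRITE_BACK']
print(trace("beq"))  # ['FETCH', 'DECODE', 'EXECUTE']
```

The instruction type first influences the next state after EXECUTE, which reflects that the first two steps are identical for every instruction.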