COE 308 Enhancing Performance with Pipelining

Laundry Example
Student doing laundry (processing one load):
• Washing a single load of laundry
• Drying a single load of laundry
• Folding a single load
• Putting the load in the closet

Sequential Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, processed one after another]
Sequential laundry takes 8 hours for four loads of wash …

Pipelined Laundry
[Figure: timeline from 6 PM to 2 AM; loads A, B, C, D in task order, overlapped]
… while pipelined laundry takes just 3.5 hours

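The 8-hour and 3.5-hour figures can be checked with a short calculation. This is a minimal sketch that assumes every one of the four steps takes 30 minutes; the slides do not state the step duration, but it is the value consistent with both totals. The function names are illustrative only.

```python
def sequential_time(loads, stages, stage_hours):
    """Total time when each load finishes every stage before the next load starts."""
    return loads * stages * stage_hours


def pipelined_time(loads, stages, stage_hours):
    """Total time when a new load enters the first stage every stage_hours.

    The first load needs `stages` slots; each later load adds one more slot.
    """
    return (stages + loads - 1) * stage_hours


# Assumption: 4 loads, 4 steps (wash, dry, fold, put away), 0.5 h per step.
seq = sequential_time(4, 4, 0.5)
pipe = pipelined_time(4, 4, 0.5)
print(seq, pipe)  # 8.0 3.5
```

Under this assumption the speedup for four loads is 8 / 3.5, roughly 2.3, and it approaches the number of stages as the number of loads grows.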
Pipelining Analysis
Pipelining is possible because:
• All four laundry steps use independent stations
  • Washing uses the washer, which is independent of the dryer used in the drying step and of the table used in the folding step.
  • This means that once the washing step is done, the washer can be used for another load while the current load is drying in the dryer.
• All steps are always performed in the same order
  • Washing always occurs before drying, as it is not correct to dry clothes that haven't been washed yet
  • Drying always occurs before folding
  • …

Pipelining Processor Execution
• Processor executes instructions
• Can the instruction execution process be pipelined?
  • Yes, because it can be divided into steps
  • And because the order of the execution steps is the same (most of the time)
• Instruction execution steps
  • Fetch the instruction from memory
  • Read registers while decoding the instruction
  • Execute the operation
  • Access an operand in data memory
  • Write the result into a register

Pipeline Stages
Instruction execution steps are called pipeline stages:
• Instruction Fetch (IF)
• Instruction Decode (ID)
• EXecute operation (EX)
• MEMory access (MEM)
• Write Back the result (WB)
[Figure: the five stages IF, ID, EX, MEM, WB in sequence]

Processor Pipeline
The pipeline is well represented as a timing diagram (as in the laundry example).
The following sequence is represented:
    add  $1, $3, $5
    sub  $3, $1, $4
    and  $2, $5, $1
    or   $7, $1, $9
    addi $10, $6, $3

Clock cycle  1    2    3    4    5    6    7    8    9
add          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB
and                    IF   ID   EX   MEM  WB
or                          IF   ID   EX   MEM  WB
addi                             IF   ID   EX   MEM  WB

Five instructions are executed in 9 cycles.

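The 9-cycle count follows from the general rule that n instructions on a k-stage pipeline with no hazards need n + k - 1 cycles. A minimal sketch that builds the staircase timing diagram for the five-instruction sequence; `timing_diagram` is an illustrative helper, not something defined in the course material.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def timing_diagram(names):
    """Return (rows, total_cycles) for an ideal pipeline with no hazards.

    Instruction number i (0-based) enters IF in cycle i + 1 and advances
    one stage per cycle.
    """
    total = len(names) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(names):
        cells = [""] * total
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((name, cells))
    return rows, total


rows, total = timing_diagram(["add", "sub", "and", "or", "addi"])
for name, cells in rows:
    print(f"{name:<6}" + "".join(f"{c:<5}" for c in cells))
print("cycles:", total)  # cycles: 9
```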
Data Dependency Hazard
Examine the following instructions:
    add $1, $3, $5
    sub $3, $1, $4
    and $2, $5, $1
    or  $7, $1, $9
There is a dependency between add and sub on register $1, as it is used by sub after it is modified by add.
The result of the add instruction is written into register $1 NOT BEFORE the WB stage. However, the sub instruction fetches the value of register $1 during the ID stage.

Clock cycle  1    2    3    4    5    6
add          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB

Problem: The sub instruction will fetch the wrong value of register $1 because the correct value has not been written there yet.

Types of Dependencies
All cases of data dependencies should be analyzed to see whether they cause any malfunction in the pipeline context.
Data dependency cases:
• Read After Write (RAW)
• Read After Read (RAR)
• Write After Write (WAW)
• Write After Read (WAR)

RAW Dependency
    add $1, $3, $5
    sub $3, $1, $4
    and $2, $5, $1
    or  $7, $1, $9
Read After Write (RAW) dependencies
An instruction uses as a source operand a register that is the destination of a previous instruction, so the later instruction needs to read the register while the earlier one has yet to write it.
Problem: The next instruction(s) will fetch the wrong values of the dependent registers because the correct values have not been written back yet.

RAR Dependency
    add $1, $3, $5
    sub $3, $5, $4
    and $2, $4, $1
    or  $7, $1, $9
Read After Read (RAR) dependencies
Two consecutive instructions use the same register as a source operand.
No Problem: As long as the registers are not modified, pipelining does not affect the normal execution process in this case.

WAW Dependency
    add $1, $3, $5
    sub $1, $5, $4
    and $4, $4, $1
    or  $4, $1, $9
Write After Write (WAW) dependencies
Two consecutive instructions use the same register as a destination operand.
No Problem: Writes occur during the last pipeline stage, and no inconsistency results from this situation because the instruction execution order is maintained.

WAR Dependency
    add $1, $3, $5
    sub $3, $5, $2
    and $2, $4, $1
    or  $7, $1, $9
Write After Read (WAR) dependencies
The next instruction uses as its destination a register that a previous instruction uses as a source operand.
No Problem: The read occurs in the ID stage and the write occurs in the WB stage, which means that the order of operations is not altered by the pipeline structure.

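The four dependency types can be told apart mechanically from the destination and source registers of two instructions. A minimal sketch, assuming each instruction is represented as a hypothetical `(dest, sources)` pair of register names:

```python
def dependencies(earlier, later):
    """Classify the register dependencies of `later` on `earlier`.

    Each instruction is a (dest, sources) pair, e.g. ("$1", ("$3", "$5"))
    for add $1, $3, $5. Returns a set of labels.
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes
    if l_dest in e_srcs:
        found.add("WAR")  # later writes what earlier reads
    if e_dest == l_dest:
        found.add("WAW")  # both write the same register
    if set(e_srcs) & set(l_srcs):
        found.add("RAR")  # both read the same register
    return found


add = ("$1", ("$3", "$5"))
print(dependencies(add, ("$3", ("$1", "$4"))))  # sub $3, $1, $4
print(dependencies(add, ("$1", ("$5", "$4"))))  # sub $1, $5, $4
```

Note that one pair of instructions can carry several dependencies at once: `add $1, $3, $5` followed by `sub $3, $1, $4` is both RAW (on $1) and WAR (on $3). Of the four types, only RAW is a problem in the five-stage pipeline discussed here.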
RAW Dependency Cases
    i:   add $1, $3, $5
    i+1: sub $3, $1, $4
    i+2: and $2, $5, $1
    i+3: or  $7, $1, $9
• Case 1: dependency between instruction i and instruction i+1
• Case 2: dependency between instruction i and instruction i+2
• Case 3: dependency between instruction i and instruction i+3
Every case needs to be checked in order to determine whether it poses a real problem or not.

RAW Dependency Case 1
    i:   add $1, $3, $5
    i+1: sub $3, $1, $4
    i+2: and $2, $5, $1
    i+3: or  $7, $1, $9
Case 1: dependency between instruction i and instruction i+1

Clock cycle  1    2    3    4    5    6
add          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB

The operand is fetched BEFORE it is written back.

RAW Dependency Case 2
    i:   add $1, $3, $5
    i+1: sub $3, $1, $4
    i+2: and $2, $5, $1
    i+3: or  $7, $1, $9
Case 2: dependency between instruction i and instruction i+2

Clock cycle  1    2    3    4    5    6    7
add          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB
and                    IF   ID   EX   MEM  WB

The operand is fetched BEFORE it is written back.

RAW Dependency Case 3
    i:   add $1, $3, $5
    i+1: sub $3, $1, $4
    i+2: and $2, $5, $1
    i+3: or  $7, $1, $9
Case 3: dependency between instruction i and instruction i+3

Clock cycle  1    2    3    4    5    6    7    8
add          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB
and                    IF   ID   EX   MEM  WB
or                          IF   ID   EX   MEM  WB

The operand is fetched AT THE SAME TIME it is written back.

Register File Model
Case 3 does not pose a problem because we assume that, in the register file, writes occur BEFORE reads.
This is only true if we use the falling edge of the clock to write.
[Figure: clock waveform over the ID stage, annotated "Write is prepared here", "Read occurs here", "Write occurs here"]

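The three RAW cases reduce to comparing two cycle numbers: the cycle in which the producer reaches WB and the cycle in which the consumer reaches ID. A minimal sketch of that comparison; the function name and the `write_before_read` flag are illustrative, and the flag models the register file assumption stated on this slide.

```python
ID_OFFSET = 1  # ID is one cycle after IF
WB_OFFSET = 4  # WB is four cycles after IF


def raw_is_hazard(distance, write_before_read=True):
    """True if a consumer `distance` instructions after its producer reads a stale value.

    The producer is fetched in cycle 0, so it writes in cycle WB_OFFSET.
    The consumer is fetched in cycle `distance` and reads in cycle
    distance + ID_OFFSET.
    """
    wb_cycle = WB_OFFSET
    id_cycle = distance + ID_OFFSET
    if write_before_read:
        # A read in the same cycle as the write already sees the new value.
        return id_cycle < wb_cycle
    return id_cycle <= wb_cycle


for d in (1, 2, 3):
    print(d, raw_is_hazard(d))  # 1 True, 2 True, 3 False
```

With `write_before_read=False` the distance-3 case would also be a hazard, which is why the register file model matters for Case 3.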
Data Dependency Solutions
• A data dependency between instructions causes operands to be fetched at the wrong time.
• The obvious remedy is to DELAY the fetch of operands until after the correct value is written into the register file
  • In software, by inserting NOP instructions
  • In hardware, by stalling the pipeline

NOP Insertion
Insertion of two NOP instructions will solve the data dependency problem:
    add $1, $3, $5
    nop
    nop
    sub $3, $1, $4
    and $2, $5, $1

Clock cycle  1    2    3    4    5    6    7    8    9
add          IF   ID   EX   MEM  WB
nop               IF   ID   EX   MEM  WB
nop                    IF   ID   EX   MEM  WB
sub                         IF   ID   EX   MEM  WB
and                              IF   ID   EX   MEM  WB

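The software remedy can be sketched as a small pass over the instruction list. This is an illustrative sketch rather than a real compiler pass: instructions are hypothetical `(op, dest, sources)` tuples, and the pass assumes the register file model in which a consumer three instructions after its producer is safe.

```python
NOP = ("nop", None, ())


def insert_nops(program, min_distance=3):
    """Insert nops so every RAW consumer is at least min_distance after its producer."""
    out = []
    for op, dest, srcs in program:
        needed = 0
        # Only the last (min_distance - 1) emitted instructions can be too close.
        for back in range(1, min_distance):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, min_distance - back)
        out.extend([NOP] * needed)
        out.append((op, dest, srcs))
    return out


program = [
    ("add", "$1", ("$3", "$5")),
    ("sub", "$3", ("$1", "$4")),
    ("and", "$2", ("$5", "$1")),
]
padded = insert_nops(program)
print([instr[0] for instr in padded])  # ['add', 'nop', 'nop', 'sub', 'and']
```

The `and` instruction also reads $1, but after the two nops it is already four instructions away from `add`, so no further padding is needed.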
Pipeline Stall
    add  $1, $3, $5
    sub  $3, $1, $4
    and  $2, $5, $1
    or   $7, $1, $9
    addi $10, $6, $3
Delaying the fetch of the operands can be implemented in hardware.

Clock cycle  1    2    3    4    5    6    7    8
add          IF   ID   EX   MEM  WB
sub               IF   IF   IF   ID   EX   MEM  WB
and                              IF   ID   EX   MEM
or                                    IF   ID   EX
addi                                       IF   ID

The instruction sub is maintained in the IF stage for two extra clock cycles.
It is equivalent to …

Pipeline Stall
… inserting bubbles in the pipeline
[Figure: timing diagram in which add proceeds normally, sub is held in IF, and two bubbles travel through ID, EX, MEM and WB ahead of it]
The instruction sub is maintained in the IF stage for two extra clock cycles, while virtual nop instructions are inserted in the pipeline (as bubbles).

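The effect of stalling on the whole sequence can be sketched with a simple in-order schedule. The model below is an assumption-laden illustration: operands are read in ID and written in WB, a read in the same cycle as the write sees the new value, and a stalled instruction holds the IF stage so the next instruction cannot be fetched until it moves on. Instructions are hypothetical `(op, dest, sources)` tuples, with the operands of `addi` taken as written on the slide.

```python
def schedule_with_stalls(program):
    """Return a list of (op, if_start, id_cycle, wb_cycle) for an in-order pipeline."""
    last_wb = {}  # register -> WB cycle of its most recent writer
    schedule = []
    prev_id = 1   # makes the first instruction start IF in cycle 1
    for op, dest, srcs in program:
        if_start = prev_id  # IF frees up when the previous instruction enters ID
        ready = max((last_wb.get(r, 0) for r in srcs), default=0)
        id_cycle = max(if_start + 1, ready)  # wait in IF until operands are written
        wb_cycle = id_cycle + 3              # ID -> EX -> MEM -> WB
        schedule.append((op, if_start, id_cycle, wb_cycle))
        last_wb[dest] = wb_cycle
        prev_id = id_cycle
    return schedule


program = [
    ("add", "$1", ("$3", "$5")),
    ("sub", "$3", ("$1", "$4")),
    ("and", "$2", ("$5", "$1")),
    ("or", "$7", ("$1", "$9")),
    ("addi", "$10", ("$6", "$3")),
]
schedule = schedule_with_stalls(program)
for row in schedule:
    print(row)
print("total cycles:", schedule[-1][3])  # total cycles: 11
```

In this model sub enters IF in cycle 2 and ID in cycle 5, so it occupies IF for two extra cycles, and the five instructions finish in 11 cycles instead of 9.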
Branch Hazard
Examine the following instructions:
    beq $1, $3, Target
    sub $3, $1, $4
    and $2, $5, $1
    ...
    Target: or $3, $5, $9
If the branch is taken, the instructions sub and and are wrongfully executed because they are fetched BEFORE the branch decision is made.

Clock cycle  1    2    3    4    5    6    7    8
beq          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB
and                    IF   ID   EX   MEM  WB
or                          IF   ID   EX   MEM  WB

The branch decision is taken and Target (the or instruction) is fetched only after sub and and have entered the pipeline.
Problem: Modification of the Program Logic: Unacceptable Behavior

Branch Hazard Solution
The solution is either to:
• Not let the instructions after the branch finish execution when the branch is taken
  • Instructions are transformed into nops (in hardware)
• Put after the branch instruction only instructions whose execution does not modify the logic of the program
  • Insertion of nop instructions after each branch instruction (by the compiler)

NOP forcing
After the branch is taken, the following instructions are forced to NOP for the subsequent pipeline stages until the branch target instruction is fetched. A NOP has no effect. It is also said that the instruction execution is killed.

Clock cycle  1    2    3    4    5    6    7    8
beq          IF   ID   EX   MEM  WB
sub               IF   ID   EX   MEM  WB        (transformed into NOP after branch taken)
and                    IF   ID   EX   MEM  WB   (transformed into NOP after branch taken)
or                          IF   ID   EX   MEM  WB

The branch decision is taken and Target (the or instruction) is fetched.

NOP Insertion
    beq $1, $3, Target
    sub $3, $1, $4
    and $2, $5, $1
    ...
    Target: or $3, $5, $9
Insertion of NOP instructions by the compiler after each branch instruction does not disturb the logic of the program.

Clock cycle  1    2    3    4    5    6    7    8
beq          IF   ID   EX   MEM  WB
nop               IF   ID   EX   MEM  WB
nop                    IF   ID   EX   MEM  WB
or                          IF   ID   EX   MEM  WB

Delayed Branch
• Insertion of NOP instructions introduces a substantial overhead that increases the instruction count significantly.
• The idea is to move actual instructions from the area before the branch to the slots after the branch, filling the nop slots without modifying the logic of the program.

Original code:
    xor $2, $2, $5        No dependency
    and $1, $7, $8        Register $1 used by beq
    sub $10, $6, $4       No dependency
    add $3, $6, $7        Register $3 used by beq
    beq $1, $3, Target
    sub $3, $1, $4
    and $2, $5, $1

Transformed code:
    and $1, $7, $8
    add $3, $6, $7
    beq $1, $3, Target
    xor $2, $2, $5
    sub $10, $6, $4
    sub $3, $1, $4
    and $2, $5, $1

Delayed Branch
Consider the transformed code obtained after moving the xor and sub instructions after the beq instruction:
    and $1, $7, $8
    add $3, $6, $7
    beq $1, $3, Target     <- a programmer who reads the code without any idea about the execution will think that the branch occurs here
    xor $2, $2, $5
    sub $10, $6, $4        <- the execution will actually make the branch take effect here
    sub $3, $1, $4
    and $2, $5, $1
So while the xor and the first sub instructions are executed, the second sub and the and instructions are not when the branch is taken.
The branch instruction and the branch execution are separated by a two-instruction delay, which is why it is called a Delayed Branch.

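Whether an instruction may be moved from before the branch into a delay slot comes down to a dependency check against everything it is moved past, including the branch itself. A minimal, conservative sketch using hypothetical `(op, dest, sources)` tuples; a real compiler scheduler would consider more than this.

```python
def can_move_past(instr, later):
    """True if `instr` has no RAW, WAR or WAW dependency with any instruction in `later`."""
    _, dest, srcs = instr
    for _, l_dest, l_srcs in later:
        if dest is not None and dest in l_srcs:
            return False  # RAW: a later instruction reads our result
        if l_dest is not None and l_dest in srcs:
            return False  # WAR: a later instruction overwrites our operand
        if dest is not None and dest == l_dest:
            return False  # WAW: both write the same register
    return True


before_branch = [
    ("xor", "$2", ("$2", "$5")),
    ("and", "$1", ("$7", "$8")),
    ("sub", "$10", ("$6", "$4")),
    ("add", "$3", ("$6", "$7")),
]
beq = ("beq", None, ("$1", "$3"))  # a branch writes no register

movable = [
    instr[0]
    for k, instr in enumerate(before_branch)
    if can_move_past(instr, before_branch[k + 1:] + [beq])
]
print(movable)  # ['xor', 'sub']
```

The check agrees with the annotations on the original code: `and` and `add` produce the registers `beq` compares, so only `xor` and `sub $10, $6, $4` are candidates for the two delay slots.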
Pipelined Datapath
Inserting Pipeline Registers
Writing Back the Result
Destination Register Specifier ?
Branch Logic
Pipelined Control
Data Hazards and Forwarding
Forwarding Unit