190 likes | 213 Views
Reducing Average CPE Time On A Y86 Pipelined Processor. Darren Stikes DS58062. Y86 Processor. Has A pipeline architecture. Has 5 Stages, FDEMW Its Pipeline allows 5 instructions at a time. The Five Stages (Fetch).
E N D
Reducing Average CPE Time On A Y86 Pipelined Processor Darren Stikes DS58062
Y86 Processor • Has A pipeline architecture. • Has 5 Stages, FDEMW • Its Pipeline allows 5 instructions at a time.
The Five Stages(Fetch) • Gets the address at the current program counter and reads the instruction there • Fetches the address of the next instruction
Decode Stage • Reads in registers used in the instruction • Places them in the correct processor registers so they can be used by the execute stage
Execute stage • Computes memory address • uses the ALU to make a computation to registers
Memory Stage • Reads and writes data needed to be used for the current instruction
Write stage • Saves the computed value from the other stages into a register or memory address.
E M W F F D E M W Problems? • Since more than one instruction is being ran though the pipeline at one time, problems occur when data needed from the previous instruction isn’t computed yet. • Some of the ways these problems can be fixed is by adding bubbles or stalling F D E M W F D E M W F D E M W D D E M W F F D E M W
Common Problems • Most Problems in Pipeline architecture can be handled by forwarding, which is using pipeline registers to obtain a value before it is written normally
Unsolved Problems • Load/use Data hazards. • Branch-Missprediction
Happens when something is needed from memory of the previous instruction. This doesn’t work because it simply needs more cycles in between the two. Put instructions between the two that uses registers independent of the one with a problem. This will give time for the information to be collected. Load/Use Data Hazards Solutions
With the Y86 Pipeline Architecture, if a condition branch is in the pipeline, it assumes that the branch will be taken. The problem Arises when the branch ends up not taken and falls through. All the instructions that have started down the pipeline would then not be the correct instructions, and they would need to be removed. If you have an idea of which way the condition codes will be at the time of the branch, you can rearrange your code to where your branch would be taken most of the time. Branch - Miss prediction Solution
Hardware Added instruction iaddl which will allow a number added to a register. This saves lines of code used to simply place a number in a register, just to add it once. Added instruction leave, which takes the place of the two instructions: rrmovl %ebp, %esp popl %ebp with: leave Software Replaced old hardware instructions with the new ones. Rearranged the conditional branch inside the loop to be taken most of the time. Placed a line of code between a use/load hazard. Rearranged the place where I subtracted the length, then removed the and instruction that was specifically designed to set condition codes. Enhancements!
BEFORE: Loop: mrmovl (%ebx), %eax rmmovl %eax, (%ecx) andl %eax, %eax jle Npos irmovl $1, %edi addl %edi, %esi Npos: irmovl $1, %edi subl %edi, %edx irmovl $4, %edi addl %edi, %ebx addl %edi, %ecx andl %edx,%edx jg Loop AFTER: Loop: mrmovl (%ebx), %eax rmmovl %eax, (%ecx) andl %eax, %eax jle Npos addl $1, %esi Npos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx,%edx jg Loop Also added the leave instruction at bottom of code Changes #1
Before: Loop: mrmovl (%ebx), %eax rmmovl %eax, (%ecx) andl %eax, %eax jle Npos addl $1, %esi Npos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx,%edx jg Loop AFTER: Loop: mrmovl (%ebx), %eax rmmovl %eax, (%ecx) iaddl $1, %esi andl %eax, %eax jg pos iaddl $-1, %esi pos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx, %edx jg Loop Changes#2
Before: Loop: mrmovl (%ebx), %eax rmmovl %eax, (%ecx) iaddl $1, %esi andl %eax, %eax jg pos iaddl $-1, %esi pos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx, %edx jg Loop AFTER: Loop: mrmovl (%ebx), %eax iaddl $1, %esi rmmovl %eax, (%ecx) andl %eax, %eax jg pos iaddl $-1, %esi pos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx, %edx jg Loop Changes #3
Before: Loop: mrmovl (%ebx), %eax iaddl $1, %esi rmmovl %eax, (%ecx) andl %eax, %eax jg pos iaddl $-1, %esi pos: iaddl $-1, %edx iaddl $4, %ebx iaddl $4, %ecx andl %edx, %edx jg Loop AFTER: Loop: mrmovl (%ebx), %eax iaddl $1, %esi rmmovl %eax, (%ecx) andl %eax, %eax jg pos iaddl $-1, %esi pos: iaddl $4, %ebx iaddl $4, %ecx iaddl $-1, %edx jg Loop Changes #4
Results • The results after each change are as follows: • This resulted in lowering my CPE time by 36%, from 18.15 to 11.59.