Pipelined Design

Pipelined Design תרגול 9

Latency and Throughput • Latency - Total time to perform a single operation from start to end. • Throughput – operations / latency ratio or GOPS, giga-operations per second. • The clock cycle is always bounded by the slowest operation. • Picoseconds - 10^-12 • Nanoseconds - 10^-9

Latency and Throughput 80ps 30ps 60ps 50ps 70ps 10ps 20ps A B C D E F R E G • How to maximize throughput using an additional register? • Put it between C and D • How to maximize throughput using 2 additional registers? • AB, CD, EF • How to maximize throughput using 3 additional registers? • A, BC, D, EF • What is the cycle time, latency and throughput in that case? • 110ps, 440ps, (1/110) * 1000 = 9.09 • How to get max throughput with min stages? • 5 stages, since we cannot go lower than 100ps for a clock cycle

Pipeline registers • Select A, Since only jxx and call need valP at further stages (E and M resp.) • Predicted PC

Control Hazards • PC prediction (Fetch stage) • jxx , ret → ? • call, jmp → ValC • Other → ValP • Branch prediction strategies • Always-taken • Never-taken • Backward taken, forward not taken • Why are forward branches less common ? • How can we deal with stack prediction (ret) ? 5

Data Hazards • Data dependencies in Processors • Result from one instruction is being used as an operand for another (nop example) • Stalling / forwarding / load-use hazard 6

Data Dependencies: 2 Nop’s 1 2 3 4 5 6 7 8 9 10 # demo-h2.ys F F D D E E M M W W 0x000: irmovl $10,%edx F F D D E E M M W W 0x006: irmovl $3,%eax F F D D E E M M W W 0x00c: nop F F D D E E M M W W 0x00d: nop F F D D E E M M W W 0x00e: addl %edx,%eax F F D D E E M M W W 0x010: halt Cycle 6 W W W  ← R[ ] R[ R[ 3 ] ] 3 3 %eax %eax %eax • • • • • • D D D ←   valA valA valA R[ R[ R[ ] = 10 ] = 10 ] = 10 Error %edx %edx %edx   valB valB valB R[ R[ R[ ] = 0 ] = 0 ] = 0 %eax %eax %eax ←

Stalling solution 1 2 3 4 5 6 7 8 9 10 # demo-h2.ys F D E M W 0x000: irmovl $10,%edx F D E M W 0x006: irmovl $3,%eax F D E M W 0x00c: nop F D E M W 0x00d: nop E M W bubble F D D E M W 0x00e: addl %edx,%eax F F D E M W 0x010: halt If instruction follows too closely after one that writes register,  slow it down Hold instruction in decode  Dynamically inject nop into execute stage 

Data Forwarding Solution 1 2 3 4 5 6 7 8 9 # demo-h2.ys F F D D E E M M W W 0x000: irmovl $10,%edx F F D D E E M M W W 0x006: irmovl $3,%eax F F D D E E M M W W 0x00c: nop F F D D E E M M W W 0x00d: nop F F D D E E M M W W 0x00e: addl%edx,%eax F F D D E E M M W W 0x010: halt Cycle 6 in write- irmovl  W back stage ← R[ ] 3 W_dstE= %eax %eax Destination value in W_valE = 3  W pipeline register • • • Forward as valB for  D decode stage ← valA R[ ] = 10 srcA= %edx %edx ← srcB= %eax valB W_valE= 3

Limitation of Forwarding # demo-luh.ys 1 2 3 4 5 6 7 8 9 10 11 F D E M W 0x000: irmovl $128,%edx F D E M W 0x006: irmovl $3,%ecx F D E M W 0x00c: rmmovl %ecx, 0(%edx) F D E M W 0x012: irmovl $10,%ebx F F D D E E M M W W 0x018: mrmovl 0(%edx),%eax # Load %eax F D E M W 0x01e: addl%ebx,%eax # Use %eax F D E M W 0x020: halt Load-use dependency Cycle 7 Cycle 8 Value needed by end of  M M decode stage in cycle 7 M_dstE= M_dstM= %ebx %eax ← M_valE= 10 m_valM M[128] = 3 Value read from memory in  memory stage of cycle 8 • • • D D Error ← valA valA M_valE= 10 M_valE= 10  ← valB valB R[ R[ ] = 0 ] = 0 %eax %eax 

Avoiding Load/Use Hazard # demo-luh.ys 1 2 3 4 5 6 7 8 9 10 11 F F D D E E M M W W 0x000: irmovl $128,%edx F F D D E E M M W W 0x006: irmovl $3,%ecx F F D D E E M M W W 0x00c: rmmovl %ecx, 0(%edx) F F D D E E M M W W 0x012: irmovl $10,%ebx F F F D D D E E E M M M W W W 0x018: mrmovl 0(%edx),%eax# Load %eax E M W bubble F D D E M W 0x01e:addl%ebx,%eax# Use %eax F F D E M W 0x020: halt Stall using instruction for  Cycle 8 one cycle W W Can then pick up loaded  W_dstE= W_dstE= %ebx %ebx value by forwarding from W_valE= 10 W_valE= 10 memory stage M M M_dstM= M_dstM= %eax %eax ← m_valM0M[128] = 3 m_valM M[128] = 3 • D D • valA valA W_valE= 10 W_valE= 10 ←  valB valB m_valM= 3 m_valM= 3 ← 

Control Logic Processing “ret” Must stall until instruction reaches write back Load/use hazard Must stall between read memory and use Mis-predicted branch removing instructions from the pipe 12

Implementing the forwarding logic • Note: 5 forwarding sources

Pipelined Design

Pipelined Design

Presentation Transcript

Pipelined ADC

Lecture 5. MIPS Processor Design Pipelined MIPS #1

Lecture 5. MIPS Processor Design Pipelined MIPS #2

Pipelined Datapath

Pipelined Processor Design

Design of Low-Power Pipelined ADCs

Pipelined Electronics

Pipelined protocols

Pipelined Implementation

Pipelined Architecture

Pipelined Processor Design

Lecture 9. MIPS Processor Design – Pipelined Processor Design #2

Pipelined ADC

Pipelined Processor Design

PIPELINED PROCESSORS

Pipelined Pattern

Pipelined Processor Design

Pipelined protocols

Pipelined Electronics

Pipelined Computations

Pipelined Processor Design

Pipelined Computations