This article discusses pipelining in processor data paths, its benefits for throughput and efficiency, and the challenges and limitations of implementing pipelined architectures. It also covers the hazards that can arise in a pipeline and ways to mitigate them.
Processor Data Path • A single-cycle processor makes poor use of its functional units:
Processor Data Path • ADD r1, r2, r3 running: as the instruction advances through the data path, only one unit (fetch, decode, ALU, memory access, write-back) is active at any moment, while the rest sit idle
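To make that underutilization concrete, here is a minimal Python sketch. The stage names and picosecond delays are illustrative assumptions (chosen to total the 800ps data path discussed later), not values from the slides:

```python
# Sketch: functional-unit utilization in a single-cycle processor.
stage_delay_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

cycle_ps = sum(stage_delay_ps.values())   # the clock must fit the whole path
print(f"single-cycle clock period: {cycle_ps}ps")

for stage, delay in stage_delay_ps.items():
    print(f"{stage}: busy {delay}ps of {cycle_ps}ps "
          f"-> {delay / cycle_ps:.0%} utilized")
```

Every unit is busy for only a fraction of the long clock period; the rest of the time it waits for the other units to finish.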
Assembly Lines • Single cycle laundry:
Assembly Lines • Assembly line laundry:
Segmented Data Path • Registers to hold values between stages
Pipelined • Each stage can work on different instruction:
Pipeline vs Not • Pipeline: 4 instructions / 8 cycles • No pipeline: 2 instructions / 10 cycles
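The 4-instructions-in-8-cycles figure can be reproduced with a minimal sketch of the segmented data path: model the inter-stage registers as slots that shift one stage per clock. The five stage names are the classic ones and the instruction stream is made up:

```python
# Sketch: inter-stage registers passing instructions down the pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["ADD r1, r2, r3", "LDR r4, [r5]", "SUB r6, r7, r8", "MOV r9, r6"]

latches = [None] * len(STAGES)   # one register slot in front of each stage
for cycle in range(len(program) + len(STAGES) - 1):
    nxt = program[cycle] if cycle < len(program) else None
    latches = [nxt] + latches[:-1]          # everything shifts one stage
    row = " | ".join(f"{s}: {i or '-'}" for s, i in zip(STAGES, latches))
    print(f"cycle {cycle + 1}: {row}")
```

With four instructions and five stages, the last write-back lands in cycle 8, matching the slide.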
Throughput • n-stage pipeline: • n - 1 cycles to "prime" it • Then one instruction per cycle
Throughput • n-stage pipeline: • Time for i instructions in the pipeline: n + (i - 1) cycles • Time for i instructions without pipelining: n · i cycles • Max speedup: n · i / (n + i - 1) → n as i → ∞
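A quick numeric check of those formulas (a sketch counting cycles only, ignoring pipeline-register overhead):

```python
# Sketch: speedup of an n-stage pipeline over a single-cycle design,
# counting cycles only (register overhead ignored).
def speedup(n: int, i: int) -> float:
    pipelined = n + (i - 1)   # n - 1 cycles to prime, then 1 per cycle
    unpipelined = n * i       # each instruction takes a full n-cycle pass
    return unpipelined / pipelined

for i in (1, 4, 100, 1_000_000):
    print(f"n=5, i={i:>9}: speedup = {speedup(5, i):.3f}")
# speedup -> n (here 5) as i grows
```

At i = 4 this gives the 2.5x implied by the 4-in-8 vs 2-in-10 comparison above, and the ratio approaches n = 5 as i grows.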
Pipelining Limits • In theory: n-times speedup for an n-stage pipeline • But: • Only if all stages are balanced • Only if the pipeline can be kept full
Weak Link & Latency • Total data path = 800ps
Weak Link & Latency • Pipelined: can't run faster than the slowest stage: 5 × 200ps = 1000ps • Plus the delay of the memory elements (pipeline registers) between stages
Pipeline vs Not • Clock time: 800ps with no pipeline, 200ps with the pipeline
Weak Link & Latency • First instruction • No pipeline: 800ps / 1 instruction • Pipeline: 1000ps / 1 instruction • "Speedup" on the first instruction: 0.8x (25% slower) → increased latency
Weak Link & Latency • Full pipeline • No pipeline: 800ps / 1 instruction • Pipeline: 1000ps / 5 instructions = 200ps / instruction • Speedup with a full pipeline: 800ps / 200ps = 4x → increased throughput
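Putting the weak-link numbers together in one sketch (the per-stage split below is an assumption; the slides fix only the 800ps total and the 200ps slowest stage):

```python
# Sketch: latency vs throughput with an unbalanced 800ps data path.
stage_ps = [200, 100, 200, 200, 100]    # assumed split; totals 800ps

single_cycle_clock = sum(stage_ps)      # 800ps: whole path in one cycle
pipeline_clock = max(stage_ps)          # 200ps: slowest stage sets the clock
n = len(stage_ps)

first_latency = n * pipeline_clock      # 5 x 200ps = 1000ps
print(f"first instruction: {single_cycle_clock}ps unpipelined vs "
      f"{first_latency}ps pipelined (latency is worse)")

# Once the pipeline is full, one instruction retires every 200ps.
print(f"steady state: {single_cycle_clock}ps vs {pipeline_clock}ps per "
      f"instruction -> {single_cycle_clock // pipeline_clock}x throughput")
```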
Designed for Pipelining • Consistent instruction length • Simple decode logic • No feeding data from memory straight to the ALU (loads and stores are separate instructions)
Hazards • Hazard: a situation preventing the next instruction from continuing in the pipeline • Structural: resource (shared hardware) conflict • Data: needed data not ready • Control: the correct action depends on an earlier instruction
Structural Hazards • What if there is only one memory? IF and MEM access the same unit
Structural Hazards • Conflict between MEM and IF
Dealing with Conflict • Bubble: an unused pipeline stage inserted so a waiting instruction is delayed
Dealing with Conflict • Bubbles to handle shared memory
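A sketch of how those bubbles arise with a single shared memory: when a load occupies memory in its MEM stage, instruction fetch loses the port for that cycle and a bubble enters the pipeline instead. This is a simplified one-port stall rule, not the exact scenario in the original figure:

```python
# Sketch: bubbles caused by a structural hazard on a single shared memory.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["LDR", "ADD", "SUB", "MOV"]

pipe = [None] * 5   # pipe[0] is IF, pipe[4] is WB
fetched = 0
for cycle in range(1, 10):
    # If the instruction entering MEM is a load, it owns the memory port,
    # so IF cannot fetch this cycle and a bubble enters the pipeline.
    if pipe[2] == "LDR":
        pipe = [None] + pipe[:-1]
    else:
        nxt = program[fetched] if fetched < len(program) else None
        fetched += nxt is not None
        pipe = [nxt] + pipe[:-1]
    print(f"cycle {cycle}: " +
          " ".join(f"{s}:{i or '--'}" for s, i in zip(STAGES, pipe)))
```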
Avoiding Structural Hazards • Separate instruction/data caches • Don't feed memory data straight to the ALU
Data Hazards • Consider a sequence of instructions in which a later instruction needs a result from an earlier one:
Data Hazards • RAW: Read After Write • A later instruction depends on the result of an earlier one • ADD writes r1 at time 5 • SUB wants r1 at time 3
Dealing with Data Hazards • Option 1: NOP = no-op = bubble • Assuming the new value of r1 can be read in the same cycle it is written: 2 cycles of bubble (otherwise 3)
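A sketch of that bubble count, assuming the classic five-stage pipeline where results are written in stage 5 (WB) and registers are read in stage 2 (ID):

```python
# Sketch: how many bubbles a RAW hazard costs with no forwarding.
WRITE_STAGE = 5   # producer writes its register in WB (stage 5)
READ_STAGE = 2    # consumer reads its registers in ID (stage 2)

def bubbles_needed(distance: int, same_cycle_ok: bool = True) -> int:
    """distance: how many instructions behind the consumer is (1 = next)."""
    write_cycle = WRITE_STAGE            # producer fetched in cycle 1
    read_cycle = READ_STAGE + distance   # consumer's read, with no stalls
    slack = write_cycle - read_cycle
    return max(0, slack if same_cycle_ok else slack + 1)

print(bubbles_needed(1))                        # 2 bubbles
print(bubbles_needed(1, same_cycle_ok=False))   # 3 bubbles
```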
Dealing with Data Hazards • Option 2: a clever compiler/programmer reorders instructions: • One bubble eliminated by moving the LDR ahead of the SUB
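A sketch of such a reordering pass, in miniature: a made-up three-instruction window where an independent LDR is hoisted between a producer and its dependent consumer:

```python
# Sketch: fill one bubble by hoisting an independent instruction.
# Instructions as (name, dest, sources); a toy three-instruction window.
seq = [("ADD", "r1", {"r2", "r3"}),
       ("SUB", "r6", {"r1", "r5"}),   # depends on ADD's r1
       ("LDR", "r4", {"r7"})]         # independent of both

def independent(a, b):
    return a[1] not in b[2] and b[1] not in a[2] and a[1] != b[1]

# If the last instruction is independent of both others, slot it between them.
if independent(seq[2], seq[0]) and independent(seq[2], seq[1]):
    seq[1], seq[2] = seq[2], seq[1]
print([name for name, _, _ in seq])   # ['ADD', 'LDR', 'SUB']
```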
Reorder = New Problems • While reordering, the compiler must preserve critical orderings: • RAW (Read After Write): ADD r1, r3, r4 then ADD r2, r1, r0 • WAR (Write After Read): ADD r2, r1, r0 then ADD r1, r3, r4 • WAW (Write After Write): ADD r1, r4, r0 then ADD r1, r3, r4
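These three orderings can be checked mechanically. A sketch that classifies the constraint between two instructions, each represented as a made-up (destination, sources) pair rather than a real decoded instruction:

```python
# Sketch: classify the dependence that constrains reordering two instructions.
def dependence(first, second):
    """Each instruction is (dest_reg, set_of_source_regs)."""
    d1, src1 = first
    d2, src2 = second
    kinds = []
    if d1 in src2:
        kinds.append("RAW")   # second reads what first wrote
    if d2 in src1:
        kinds.append("WAR")   # second overwrites what first read
    if d1 == d2:
        kinds.append("WAW")   # both write the same register
    return kinds or ["none"]

add1 = ("r1", {"r3", "r4"})   # ADD r1, r3, r4
add2 = ("r2", {"r1", "r0"})   # ADD r2, r1, r0
print(dependence(add1, add2))  # ['RAW']
print(dependence(add2, add1))  # ['WAR']
print(dependence(("r1", {"r4", "r0"}), ("r1", {"r3", "r4"})))  # ['WAW']
```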
Dealing with Data Hazards • Option 3: forwarding • A shortcut path sends results back to earlier stages
Dealing with Data Hazards • r1’s value forwarded to ALU
Dealing with Data Hazards • Forwarding may not eliminate all bubbles
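A sketch of why: an ALU result exists at the end of EX and can be forwarded to the very next instruction's EX input, but a load's value exists only after MEM, so a load followed immediately by a use still costs one bubble. Cycle numbering assumes the five-stage pipeline used above:

```python
# Sketch: bubbles that remain once results are forwarded to the ALU input.
def bubbles_with_forwarding(producer: str, distance: int) -> int:
    """producer: 'alu' or 'load'; distance: 1 = consumer is the next insn."""
    # Cycle (counting the producer's fetch as 1) at whose END the value
    # exists: ALU results after EX (3), loaded data after MEM (4).
    ready = {"alu": 3, "load": 4}[producer]
    # The consumer needs the value when it enters EX, i.e. at the end of
    # its ID cycle: stage 2 plus its fetch offset.
    needed = 2 + distance
    return max(0, ready - needed)

print(bubbles_with_forwarding("alu", 1))   # 0: ADD result forwarded to SUB
print(bubbles_with_forwarding("load", 1))  # 1: load-use still needs a bubble
print(bubbles_with_forwarding("load", 2))  # 0: one instruction in between
```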
Dealing with Data Hazards • Forwarding requires complex hardware • Potentially slows down the pipeline
Pipeline History