This article discusses pipelining in processor data paths, its benefits for throughput and efficiency, and the challenges and limitations of implementing pipelined architectures. It also covers the hazards that can arise in a pipeline and ways to mitigate them.
Processor Data Path • A single-cycle processor makes poor use of its functional units:
Processor Data Path • ADD r1, r2, r3 running: as the instruction advances through the data path, only one unit (fetch, decode, ALU, memory access, write-back) is active at any moment, while the rest sit idle
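To make that underutilization concrete, here is a minimal Python sketch. The stage names and picosecond delays are illustrative assumptions (chosen to total the 800ps data path discussed later), not values from the slides:

```python
# Sketch: functional-unit utilization in a single-cycle processor.
stage_delay_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

cycle_ps = sum(stage_delay_ps.values())   # the clock must fit the whole path
print(f"single-cycle clock period: {cycle_ps}ps")

for stage, delay in stage_delay_ps.items():
    print(f"{stage}: busy {delay}ps of {cycle_ps}ps "
          f"-> {delay / cycle_ps:.0%} utilized")
```

Every unit is busy for only a fraction of the long clock period; the rest of the time it waits for the other units to finish.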
Assembly Lines • Single cycle laundry:
Assembly Lines • Assembly line laundry:
Segmented Data Path • Registers to hold values between stages
Pipelined • Each stage can work on different instruction:
Pipeline vs Not • Pipeline: 4 instructions / 8 cycles • No pipeline: 2 instructions / 10 cycles
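The 4-instructions-in-8-cycles figure can be reproduced with a minimal sketch of the segmented data path: model the inter-stage registers as slots that shift one stage per clock. The five stage names are the classic ones and the instruction stream is made up:

```python
# Sketch: inter-stage registers passing instructions down the pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["ADD r1, r2, r3", "LDR r4, [r5]", "SUB r6, r7, r8", "MOV r9, r6"]

latches = [None] * len(STAGES)   # one register slot in front of each stage
for cycle in range(len(program) + len(STAGES) - 1):
    nxt = program[cycle] if cycle < len(program) else None
    latches = [nxt] + latches[:-1]          # everything shifts one stage
    row = " | ".join(f"{s}: {i or '-'}" for s, i in zip(STAGES, latches))
    print(f"cycle {cycle + 1}: {row}")
```

With four instructions and five stages, the last write-back lands in cycle 8, matching the slide.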
Throughput • n-stage pipeline: • n - 1 cycles to "prime" it • Then one instruction per cycle
Throughput • n-stage pipeline: • Time for i instructions in the pipeline: n + (i - 1) cycles • Time for i instructions without pipelining: n · i cycles • Max speedup: n · i / (n + i - 1) → n as i → ∞
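A quick numeric check of those formulas (a sketch counting cycles only, ignoring pipeline-register overhead):

```python
# Sketch: speedup of an n-stage pipeline over a single-cycle design,
# counting cycles only (register overhead ignored).
def speedup(n: int, i: int) -> float:
    pipelined = n + (i - 1)   # n - 1 cycles to prime, then 1 per cycle
    unpipelined = n * i       # each instruction takes a full n-cycle pass
    return unpipelined / pipelined

for i in (1, 4, 100, 1_000_000):
    print(f"n=5, i={i:>9}: speedup = {speedup(5, i):.3f}")
# speedup -> n (here 5) as i grows
```

At i = 4 this gives the 2.5x implied by the 4-in-8 vs 2-in-10 comparison above, and the ratio approaches n = 5 as i grows.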
Pipelining Limits • In theory: n-times speedup for an n-stage pipeline • But: • Only if all stages are balanced • Only if the pipeline can be kept full
Weak Link & Latency • Total data path = 800ps
Weak Link & Latency • Pipelined: can't run faster than the slowest stage: 5 × 200ps = 1000ps • Plus the delay of the memory elements (pipeline registers) between stages
Pipeline vs Not • Clock time: 800ps with no pipeline, 200ps with the pipeline
Weak Link & Latency • First instruction • No pipeline: 800ps / 1 instruction • Pipeline: 1000ps / 1 instruction • "Speedup" on the first instruction: 0.8x (25% slower) → increased latency
Weak Link & Latency • Full pipeline • No pipeline: 800ps / 1 instruction • Pipeline: 1000ps / 5 instructions = 200ps / instruction • Speedup with a full pipeline: 800ps / 200ps = 4x → increased throughput
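Putting the weak-link numbers together in one sketch (the per-stage split below is an assumption; the slides fix only the 800ps total and the 200ps slowest stage):

```python
# Sketch: latency vs throughput with an unbalanced 800ps data path.
stage_ps = [200, 100, 200, 200, 100]    # assumed split; totals 800ps

single_cycle_clock = sum(stage_ps)      # 800ps: whole path in one cycle
pipeline_clock = max(stage_ps)          # 200ps: slowest stage sets the clock
n = len(stage_ps)

first_latency = n * pipeline_clock      # 5 x 200ps = 1000ps
print(f"first instruction: {single_cycle_clock}ps unpipelined vs "
      f"{first_latency}ps pipelined (latency is worse)")

# Once the pipeline is full, one instruction retires every 200ps.
print(f"steady state: {single_cycle_clock}ps vs {pipeline_clock}ps per "
      f"instruction -> {single_cycle_clock // pipeline_clock}x throughput")
```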
Designed for Pipelining • Consistent instruction length • Simple decode logic • No feeding data from memory straight to the ALU (loads and stores are separate instructions)
Hazards • Hazard: a situation preventing the next instruction from continuing in the pipeline • Structural: resource (shared hardware) conflict • Data: needed data not ready • Control: the correct action depends on an earlier instruction
Structural Hazards • What if there is only one memory? IF and MEM access the same unit
Structural Hazards • Conflict between MEM and IF
Dealing with Conflict • Bubble: an unused pipeline stage inserted so a waiting instruction is delayed
Dealing with Conflict • Bubbles to handle shared memory
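A sketch of how those bubbles arise with a single shared memory: when a load occupies memory in its MEM stage, instruction fetch loses the port for that cycle and a bubble enters the pipeline instead. This is a simplified one-port stall rule, not the exact scenario in the original figure:

```python
# Sketch: bubbles caused by a structural hazard on a single shared memory.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["LDR", "ADD", "SUB", "MOV"]

pipe = [None] * 5   # pipe[0] is IF, pipe[4] is WB
fetched = 0
for cycle in range(1, 10):
    # If the instruction entering MEM is a load, it owns the memory port,
    # so IF cannot fetch this cycle and a bubble enters the pipeline.
    if pipe[2] == "LDR":
        pipe = [None] + pipe[:-1]
    else:
        nxt = program[fetched] if fetched < len(program) else None
        fetched += nxt is not None
        pipe = [nxt] + pipe[:-1]
    print(f"cycle {cycle}: " +
          " ".join(f"{s}:{i or '--'}" for s, i in zip(STAGES, pipe)))
```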
Avoiding Structural Hazards • Separate instruction/data caches • Don't feed memory data straight to the ALU
Data Hazards • Consider a sequence of instructions in which a later instruction needs a result from an earlier one:
Data Hazards • RAW: Read After Write • A later instruction depends on the result of an earlier one • ADD writes r1 at time 5 • SUB wants r1 at time 3
Dealing with Data Hazards • Option 1: NOP = no-op = bubble • Assuming the new value of r1 can be read in the same cycle it is written: 2 cycles of bubble (otherwise 3)
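A sketch of that bubble count, assuming the classic five-stage pipeline where results are written in stage 5 (WB) and registers are read in stage 2 (ID):

```python
# Sketch: how many bubbles a RAW hazard costs with no forwarding.
WRITE_STAGE = 5   # producer writes its register in WB (stage 5)
READ_STAGE = 2    # consumer reads its registers in ID (stage 2)

def bubbles_needed(distance: int, same_cycle_ok: bool = True) -> int:
    """distance: how many instructions behind the consumer is (1 = next)."""
    write_cycle = WRITE_STAGE            # producer fetched in cycle 1
    read_cycle = READ_STAGE + distance   # consumer's read, with no stalls
    slack = write_cycle - read_cycle
    return max(0, slack if same_cycle_ok else slack + 1)

print(bubbles_needed(1))                        # 2 bubbles
print(bubbles_needed(1, same_cycle_ok=False))   # 3 bubbles
```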
Dealing with Data Hazards • Option 2: a clever compiler/programmer reorders instructions: • One bubble eliminated by moving the LDR ahead of the SUB
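A sketch of such a reordering pass, in miniature: a made-up three-instruction window where an independent LDR is hoisted between a producer and its dependent consumer:

```python
# Sketch: fill one bubble by hoisting an independent instruction.
# Instructions as (name, dest, sources); a toy three-instruction window.
seq = [("ADD", "r1", {"r2", "r3"}),
       ("SUB", "r6", {"r1", "r5"}),   # depends on ADD's r1
       ("LDR", "r4", {"r7"})]         # independent of both

def independent(a, b):
    return a[1] not in b[2] and b[1] not in a[2] and a[1] != b[1]

# If the last instruction is independent of both others, slot it between them.
if independent(seq[2], seq[0]) and independent(seq[2], seq[1]):
    seq[1], seq[2] = seq[2], seq[1]
print([name for name, _, _ in seq])   # ['ADD', 'LDR', 'SUB']
```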
Reorder = New Problems • While reordering, the compiler must preserve critical orderings: • RAW (Read After Write): ADD r1, r3, r4 then ADD r2, r1, r0 • WAR (Write After Read): ADD r2, r1, r0 then ADD r1, r3, r4 • WAW (Write After Write): ADD r1, r4, r0 then ADD r1, r3, r4
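These three orderings can be checked mechanically. A sketch that classifies the constraint between two instructions, each represented as a made-up (destination, sources) pair rather than a real decoded instruction:

```python
# Sketch: classify the dependence that constrains reordering two instructions.
def dependence(first, second):
    """Each instruction is (dest_reg, set_of_source_regs)."""
    d1, src1 = first
    d2, src2 = second
    kinds = []
    if d1 in src2:
        kinds.append("RAW")   # second reads what first wrote
    if d2 in src1:
        kinds.append("WAR")   # second overwrites what first read
    if d1 == d2:
        kinds.append("WAW")   # both write the same register
    return kinds or ["none"]

add1 = ("r1", {"r3", "r4"})   # ADD r1, r3, r4
add2 = ("r2", {"r1", "r0"})   # ADD r2, r1, r0
print(dependence(add1, add2))  # ['RAW']
print(dependence(add2, add1))  # ['WAR']
print(dependence(("r1", {"r4", "r0"}), ("r1", {"r3", "r4"})))  # ['WAW']
```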
Dealing with Data Hazards • Option 3: forwarding • A shortcut path sends results back to earlier stages
Dealing with Data Hazards • r1’s value forwarded to ALU
Dealing with Data Hazards • Forwarding may not eliminate all bubbles
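A sketch of why: an ALU result exists at the end of EX and can be forwarded to the very next instruction's EX input, but a load's value exists only after MEM, so a load followed immediately by a use still costs one bubble. Cycle numbering assumes the five-stage pipeline used above:

```python
# Sketch: bubbles that remain once results are forwarded to the ALU input.
def bubbles_with_forwarding(producer: str, distance: int) -> int:
    """producer: 'alu' or 'load'; distance: 1 = consumer is the next insn."""
    # Cycle (counting the producer's fetch as 1) at whose END the value
    # exists: ALU results after EX (3), loaded data after MEM (4).
    ready = {"alu": 3, "load": 4}[producer]
    # The consumer needs the value when it enters EX, i.e. at the end of
    # its ID cycle: stage 2 plus its fetch offset.
    needed = 2 + distance
    return max(0, ready - needed)

print(bubbles_with_forwarding("alu", 1))   # 0: ADD result forwarded to SUB
print(bubbles_with_forwarding("load", 1))  # 1: load-use still needs a bubble
print(bubbles_with_forwarding("load", 2))  # 0: one instruction in between
```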
Dealing with Data Hazards • Forwarding requires complex hardware • Potentially slows down the pipeline
Pipeline History