This presentation introduces pipelining by comparing sequential and pipelined designs, using the example of building a bicycle. It discusses pipelined performance, unequal and poorly-balanced pipeline stages, and solutions for re-balancing and splitting stages. It also covers pipeline hazards, lessons learned, RISC architectures, and the implementation of pipelining for improved efficiency and performance.
Pipelining See: P&H Chapter 4.5
The Kids • Alice • Bob • They don’t always get along…
The Materials Saw Drill Glue Paint
The Instructions • N pieces, each built following same sequence: Saw Drill Glue Paint
Design 1: Sequential Schedule • Alice owns the room • Bob can enter when Alice is finished • Repeat for remaining tasks • No possibility for conflicts
Sequential Performance • Elapsed time for Alice: 4 hours • Elapsed time for Bob: 4 hours • Total elapsed time: 4*N hours • Can we do better? time: 1 2 3 4 5 6 7 8 … Latency: 4 hours per task Throughput: 1 task every 4 hours Concurrency: 1
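The arithmetic on this slide can be sketched in a few lines of Python (the one-hour-per-step timing follows the slides; the function name is made up for illustration):

```python
# Sequential schedule: one person owns the whole room until their
# bicycle is finished, so tasks run strictly back-to-back.
STEPS = ["saw", "drill", "glue", "paint"]  # 1 hour per step (slide assumption)

def sequential_time(n_tasks, step_hours=1):
    """Total elapsed hours when N tasks run one after another."""
    return n_tasks * len(STEPS) * step_hours

print(sequential_time(1))  # Alice alone: 4 hours
print(sequential_time(2))  # Alice then Bob: 8 hours
```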
Design 2: Pipelined Design • Partition room into stages of a pipeline: Alice Bob Carol Dave • One person owns a stage at a time • 4 stages • 4 people working simultaneously • Everyone moves right in lockstep
Pipelined Performance time: 1 2 3 4 5 6 7 … Latency: 4 hours per task Throughput: 1 task per hour (once the pipeline is full) Concurrency: 4 What if some tasks don’t need drill?
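The lockstep timing above can be sketched as a formula, assuming the four equal one-hour stages from the slides (the function name is invented for illustration):

```python
def pipelined_time(n_tasks, n_stages=4, stage_hours=1):
    # The first task needs n_stages cycles to travel the pipeline;
    # every later task finishes exactly one cycle behind it.
    return (n_stages + n_tasks - 1) * stage_hours

print(pipelined_time(4))    # 7 hours for 4 bicycles (vs 16 sequentially)
print(pipelined_time(100))  # 103 hours: throughput approaches 1 per hour
```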
Unequal Pipeline Stages 90 min 30 min 45 min 15 min 0h 1h 2h 3h … Latency: 4 × 90 min = 6 hours (lockstep runs at the pace of the slowest stage) Throughput: 1 task every 90 min Concurrency: 4
Poorly-balanced Pipeline Stages 90 min 30 min 45 min 15 min 0h 1h 2h 3h … Latency: 6 hours Throughput: still 1 task every 90 min; the 90-min stage is the bottleneck, so the other three stages sit idle for most of each cycle Concurrency: 4
Re-Balanced Pipeline Stages 90 min 30 min 45 min 15 min → merge the three short stages (30 + 45 + 15 = 90 min) into one, giving two 90-min stages Latency: 2 × 90 min = 3 hours Throughput: 1 task every 90 min Concurrency: 2
Splitting Pipeline Stages 45+45 min 30 min 45 min 15 min → split the 90-min stage into two 45-min stages, giving five stages Latency: 5 × 45 min = 3 h 45 min (lockstep) Throughput: 1 task every 45 min Concurrency: 5
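The effect of splitting a stage can be checked with a small helper; this is a sketch under the lockstep assumption, with the stage times taken from the slides:

```python
def lockstep_stats(stage_minutes, n_tasks):
    """Return (cycle time, latency, total time) in minutes for a
    lockstep pipeline whose clock is set by the slowest stage."""
    cycle = max(stage_minutes)
    latency = cycle * len(stage_minutes)
    total = cycle * (len(stage_minutes) + n_tasks - 1)
    return cycle, latency, total

# Unbalanced: the 90-minute saw stage sets the clock.
print(lockstep_stats([90, 30, 45, 15], n_tasks=10))      # (90, 360, 1170)
# Saw split into two 45-minute stages: the clock drops to 45 minutes.
print(lockstep_stats([45, 45, 30, 45, 15], n_tasks=10))  # (45, 225, 630)
```

Splitting adds a stage (slightly longer fill time) but halves the cycle time, so total time for many tasks nearly halves.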
Pipeline Hazards • Q: What if glue step of task 3 depends on output of task 1? 0h 1h 2h 3h … • A: the dependent task must wait (a bubble, or stall, is inserted) until the earlier task’s result is available Latency: grows by the stall time Throughput: drops while bubbles pass through Concurrency: unchanged
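One way to see the cost of such a dependence is a toy stall calculator. This is an assumed model, not from the slides: task i enters stage j in cycle i + j − 1, and bubbles are inserted whenever a needed result is not yet ready.

```python
def entry_cycle(task, stage):
    """Cycle in which `task` enters `stage` (both 1-based, no hazards)."""
    return task + stage - 1

def stall_cycles(task, stage, result_ready_cycle):
    """Bubbles needed so the task enters its stage only once the
    result it depends on is available."""
    return max(0, result_ready_cycle - entry_cycle(task, stage))

# Task 1 leaves the 4-stage pipeline after cycle 4, so its output is
# usable from cycle 5. Task 3 reaches glue (stage 3) in cycle 5 anyway:
print(stall_cycles(3, 3, result_ready_cycle=5))  # 0 bubbles
# Task 2 would reach glue in cycle 4 and must wait one cycle:
print(stall_cycles(2, 3, result_ready_cycle=5))  # 1 bubble
```

How much a hazard costs thus depends on how far apart producer and consumer are in the pipeline, which is exactly why stage placement matters.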
Lessons • Principle: • Latencies can be masked by parallel execution • Pipelining: • Identify pipeline stages • Isolate stages from each other • Resolve pipeline hazards
A Processor • [datapath figure: PC with +4 increment, instruction memory, register file, ALU, control, data memory (din/dout), branch compare (=?), and offset/target/immediate extend feeding the new PC]
A Processor • [same datapath figure, overlaid with the five pipeline stages: Instruction Fetch, Instruction Decode, Execute (compute jump/branch targets), Memory, Write-Back]
Basic Pipeline • Five-stage “RISC” load-store architecture • Instruction Fetch (IF): get instruction from memory, increment PC • Instruction Decode (ID): translate opcode into control signals and read registers • Execute (EX): perform ALU operation, compute jump/branch targets • Memory (MEM): access memory if needed • Writeback (WB): update register file Slides thanks to Sally McKee & Kavita Bala
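The five stages can be visualized with a small snapshot function showing which instruction occupies each stage in a given cycle (a sketch with no hazards; the instruction names are arbitrary examples):

```python
# Classic 5-stage pipeline, stage names as on the slide.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_snapshot(cycle, program):
    """Map each stage to the instruction it holds in `cycle` (1-based)."""
    snap = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s          # index of the instruction in this stage
        if 0 <= i < len(program):
            snap[stage] = program[i]
    return snap

prog = ["lw", "add", "sub", "sw"]
print(pipeline_snapshot(3, prog))  # {'IF': 'sub', 'ID': 'add', 'EX': 'lw'}
```

Reading snapshots for cycles 1, 2, 3, … reproduces the usual pipeline diagram: each instruction marches one stage to the right per cycle.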
Pipelined Implementation • Break instructions across multiple clock cycles (five, in this case) • Design a separate stage for the execution performed during each clock cycle • Add pipeline registers to isolate signals between different stages
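The three bullets above can be sketched in miniature, with pipeline registers modeled as a list of slots between stage functions (the `Pipeline` class and its interface are made up for illustration):

```python
# Sketch: pipeline registers isolate adjacent stages. On each clock
# edge, every stage's output is latched into the register feeding the
# next stage, so all stages work on different items simultaneously.
class Pipeline:
    def __init__(self, stages):
        self.stages = stages                    # one function per stage
        self.regs = [None] * (len(stages) + 1)  # regs[0] is the input latch

    def clock(self, new_input):
        # Update right-to-left so each stage reads last cycle's value.
        for i in reversed(range(len(self.stages))):
            self.regs[i + 1] = (self.stages[i](self.regs[i])
                                if self.regs[i] is not None else None)
        self.regs[0] = new_input
        return self.regs[-1]                    # output of the final stage

# Toy 3-stage pipeline over integers.
p = Pipeline([lambda x: x + 1, lambda x: x * 2, lambda x: x - 3])
outs = [p.clock(v) for v in [10, 20, 30, 40, 50]]
print(outs)  # [None, None, None, 19, 39]: results emerge once the pipe fills
```

Note that each stage reads only its input register and writes only its output register, which is exactly the isolation the slide calls for.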