960 likes | 1.15k Views
CS2100 Computer Organisation http://www.comp.nus.edu.sg/~cs2100/. Pipelining (AY2013/2014) Semester 2. Road Map: Part II. Performance. Assembly Language. Processor: Datapath. Pipelining MIPS Pipeline Pipeline datapath & control Pipeline hazards Branch prediction. Processor:
E N D
CS2100 Computer Organisationhttp://www.comp.nus.edu.sg/~cs2100/ Pipelining (AY2013/2014) Semester 2
Road Map: Part II Performance Assembly Language Processor: Datapath • Pipelining • MIPS Pipeline • Pipeline datapath & control • Pipeline hazards • Branch prediction Processor: Control Pipelining Cache Pipelining
Pipelining: Overview • MIPS Pipeline Stages • Pipelined Datapath and Control • Pipeline Hazards • Structural • Data • Control • Forwarding • Branch prediction Pipelining
Inspiration: Assembly Line Pipelining
A B C D The Laundary Example • Ann, Brian, Cathy, Dave each have one load of clothes to wash, dry, fold and stash • Washertakes 30 minutes • Dryer takes 30 minutes • “Folder” takes 30 minutes • “Stasher” takes 30 minutesto put clothes into drawers Pipelining
2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 Time A B C D Sequential Laundry • Sequential laundry takes 8 hours for 4 loads • Steady state: 1 load every 2 hours • If they learned pipelining, how long would laundry take? Pipelining
2 AM 12 6 PM 1 8 7 11 10 9 Time T a s k O r d e r 30 30 30 30 30 30 30 A B C D Pipelined Laundry • Pipelined laundry takes 3.5 hours for 4 loads! • Steady state: 1 load every 30 min • Potential speedup = 2 hr/30 min = 4 (# stages) • Time to fill pipeline takes 2 hours speedup Pipelining
2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r Time 30 30 30 30 30 30 30 30 30 30 30 A B C D What IF: Slow Dryer • Pipelined laundry now takes 5.5 hours! • Steady state: One load every 1 hr (dryer speed) • Pipeline rate is limited by the slowest stage Pipelining
2 AM 12 6 PM 1 8 7 11 10 9 T a s k O r d e r Time 30 30 30 30 30 30 30 30 A Dependency B Stall C D What IF: Dependency • Brian is using the laundry for the first time; he wants to see the outcome of one wash + dry cycle first before putting in his clothes • Pipelined laundry now takes 4 hours Pipelining
Pipelining Lessons Pipelining • Pipelining doesn’t help latencyof single task: • It helps the throughputof entire workload • Multiple tasks operating simultaneously using different resources • Possible delays: • Pipeline rate limited by slowest pipeline stage • Stall for dependencies ( more on this later )
Datapath and Control MIPS Pipeline Pipelining
MIPS Pipeline Stages (1/2) • FiveExecution Stages • IF: Instruction Fetch • ID: Instruction Decode and Register Read • EX: Execute an operation or calculate an address • MEM: Access an operand in data memory • WB: Write back the result into a register • Idea: • Each execution stage takes 1 clock cycle • General flow of data is from one stage to the next • Exceptions: • Update of PC and write back of register file – more about this later… Pipelining
MIPS Pipeline Stages (2/2) Pipelining
Time IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB IF ID EX MEM WB Program Flow (Instructions) IF ID EX MEM WB Pipelined Execution: Illustration Several instructions are in the pipeline simultaneously! Pipelining
MIPS Pipeline: Datapath(1/3) • Single-cycle implementation: • Update all state elements (PC, register file, data memory) at the end of a clock cycle • Pipelined implementation: • One cycle per pipeline stage • Data required for each stage needs to be stored separately (why?) Pipelining
MIPS Pipeline: Datapath(2/3) • Data used by subsequent instructions: • Store in programmer-visible state elements: PC, register file and memory • Data used by same instruction in later pipeline stages: • Additional registers in datapath called pipeline registers • IF/ID: register between IF and ID • ID/EX: register between ID and EX • EX/MEM: register between EX and MEM • MEM/WB: register between MEM and WB • Why no register at the end of WB stage? Pipelining
MIPS Pipeline: Datapath(3/3) Coloured rectangles are pipeline registers Pipelining
Pipeline Datapath: IF Stage • At the end of a cycle, IF/ID receives (stores): • Instruction read from DataMemory[ PC ] • PC + 4 • PC + 4 • Also connected to one of the MUX's inputs (another coming later) Pipelining
Pipeline Datapath: IDStage Pipelining
Pipeline Datapath: EX Stage Pipelining
Pipeline Datapath: MEMStage Pipelining
Pipeline Datapath: WB Stage Pipelining
Corrected Datapath (1/2) • Observe the “Write register” number • Supplied by the IF/ID pipeline register • It is NOT the correct write register for the instruction now in WB stage! • Solution: • Pass “Write register” number from ID/EX through EX/MEM to MEM/WB pipeline register for use in WB stage • i.e. let the "Write register" number follows the instruction through the pipeline until it is needed in WB stage Pipelining
Corrected Datapath (2/2) Pipelining
Pipeline Control: Main Idea • Same control signals as single-cycle datapath • Difference: Each control signal belongs to a particular pipeline stage Pipelining
Pipeline Control: Try it! Try associating each control signal with the correct pipeline stage. Pipelining
Pipeline Control: Grouping Pipelining Group control signals according to pipeline stage
Pipeline Control: Implementation Signal used in EX stage and onwards Signal generated in ID stage Signal propagated along until utilized Pipelining
MIPS Pipeline: Datapath and Control Pipelining
Exercise: Pipeline Control Signals • Indicate the control signals utilized in each cycle for the following sequence of instructions: add, sub, lw, sw Pipelining
pipeline performance Pipelining
Single Cycle Processor: Performance Pipelining • Cycle time: • = • = Time for operation in stage k • N = Number of stages • Total Execution Time for I instructions: • = =
Single Cycle Processor: Example • Cycle Time • Choose the longest total time = 8ns • To execute 100 instructions: • 100 x 8ns = 800ns Pipelining
Multi-Cycle Processor: Performance Pipelining • Cycle time: • = • Total Execution Time for I instructions: • = = • Average CPI is needed because each instruction take different number of cycles to finish
Multi-Cycle Processor: Example • Cycle Time: • Choose the longest stage time = 2ns • To execute 100 instructions, with a given average CPI of 4.6 • 100 x 4.6 x 2ns = 920ns Pipelining
Pipeline Processor: Performance Pipelining • Cycle Time: • = • = Overhead for pipelining, e.g. pipeline register • Cycles needed for I instructions: • Total Time needed for I instructions : • = =
Pipeline Processor: Example • Cycle Time • assume pipeline register delay of 0.5ns • longest stage time + overhead = 2 + 0.5 = 2.5ns • To execute 100 instructions: • (100 + 5 – 1) x 2.5ns = 260ns Pipelining
Pipeline Processor: Ideal Speedup Pipelining • Assumptions for ideal case: • Every stage takes the same amount of time: = • No pipeline overhead • Number of instructions I, is much larger than number of stages, N • Note: The above also shows how pipeline processor loses performance
Pipeline Processor: Ideal Speedup Conclusion: Pipeline processor can gain N times speedup, where N is the number of pipeline stages Pipelining = =
Cycle 1 Cycle 2 Clk Load Store Waste Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9 Cycle 10 Clk IF ID EX MEM WB IF ID EX MEM IF Load Store R-type IF ID EX MEM WB Load Store IF ID EX MEM WB R-type IF ID EX MEM WB Summary for different implementations Single-Cycle Multi-Cycle Pipeline Pipelining
Review Question • Given this code: add $t0, $s0, $s1 sub $t1, $s0, $s1 sll $t2, $s0, 2 srl $t3, $s1, 2 How many cycles will it take to execute the code on a single-cycle datapath? How long will it take to execute the code on a single-cycle datapath, assuming a 100 MHz clock? How many cycles will it take to execute the code on a 5-stage MIPS pipeline? How long will it take to execute the code on a 5-stage MIPS pipeline? Assume pipeline is completely balanced and the single-cycle datapath could support 100MHz clock frequency. Pipelining
Grinding to a halt….. pipeline hazards Pipelining
Pipeline Hazards • Speedup from pipeline implementation: • Based on the assumption that a new instructions can be "pumped" into pipeline every cycle • However, there are pipeline hazards • Problems that prevent next instruction from immediately following previous instruction • Structural hazards: • Simultaneous use of a hardware resource • Data hazards: • Data dependencies between instructions • Control hazards: • Change in program flow Instruction Dependencies Pipelining
Graphical Notation for Pipeline Pipelining Horizontal = the stages of an instruction Vertical = the instructions in different pipeline stages
Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU ALU Structure Hazard: Example Time (clock cycles) Load and Inst3 access memory in the same cycle! Reg Mem Reg Load Inst 1 Instruction Order Inst 2 Mem Mem Reg Reg Inst 3 Pipelining If there is only a single memory module:
Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU Bubble Bubble Bubble Bubble Solution 1: Stall the Pipeline Time (clock cycles) Delay (Stall) Inst 3 for 1 cycle Reg Mem Load Reg Inst 1 Inst 2 Instruction Order Stall Bubble Inst 3 Pipelining
Mem ALU Mem Mem Reg Reg ALU Mem Mem Reg Reg ALU ALU Solution 2: Separate Memory Time (clock cycles) Load uses Data Memory Reg Mem Reg Load Inst 1 Instruction Order Inst 2 Mem Mem Reg Reg Inst 3 Inst3 uses Instr. Memory Pipelining • Split memory into: • Data and Instruction memory
ALU Mem Mem ALU Mem Reg Reg Reg ALU Mem Reg Mem ALU Quiz: Is there another conflict? Time (clock cycles) Inst 0 Mem Reg Reg Inst 1 Instruction Order Inst 2 Inst 3 Mem Reg Mem Reg Pipelining
Instruction Dependencies Pipelining • Instructions can have relationship that prevent pipeline execution: • Although a partial overlap maybe possible in some cases • When different instructions accesses (read/write) the same register • Register contention is the cause of dependency • Known as data dependency • When the execution of an instruction depends on another instruction • Control flow is the cause of dependency • Known as control dependency • Failure to handle dependencies can affect program correctness!
Data Dependency: RAW i1:add $1, $2, $3 #writes to $1 i2: sub $4, $1, $5 #reads from $1 • Effect of incorrect execution: • If i2 reads register $1 before i1 can write back the result, i2 will get a stale result (old result) Pipelining • "Read-After-Write" Definition: • Occurs when a later instruction reads from the destination register written by an earlier instruction • Also known as true data dependency