Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank



  1. Computer Architecture Lecture Notes Spring 2005 Dr. Michael P. Frank (New) Competency Area 6: Introduction to Pipelining

  2. Basic Pipelining Concepts P&H 3rd ed., Chapter 6 H&P 3rd ed. §A.1

  3. Pipelining - The Basic Concept • In early CPUs, deep combinational logic networks were used in between state updates. • Signal delays may vary widely across different paths. • New input cannot be provided to the network until the slowest paths have finished. • Slow clock speed, slow overall processing rates. • In pipelined design, deep logic networks are subdivided into relatively shallow slices (pipeline stages). • Delays through the network are made uniform. • A new input can be provided to each slice as soon as its quick, shallow network has finished. • Multiple inputs are processed simultaneously across stages. • Clock cycle is only as long as the slowest pipeline stage.
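The throughput argument above can be sketched numerically. This is a minimal illustration; the stage delays below are invented for the example, not taken from the lecture:

```python
# Sketch: throughput of a non-pipelined vs. pipelined logic network.
# Stage delays (in ns) are hypothetical.
stage_delays = [2.0, 1.5, 2.5, 1.0, 2.0]   # delays of the 5 pipeline slices

non_pipelined_cycle = sum(stage_delays)    # clock must cover the whole network
pipelined_cycle = max(stage_delays)        # clock only covers the slowest stage

print(non_pipelined_cycle)                 # 9.0 ns per result
print(pipelined_cycle)                     # 2.5 ns per result (once pipe is full)
print(non_pipelined_cycle / pipelined_cycle)  # 3.6x throughput
```

Note that the speedup (3.6x here) falls short of the stage count (5) whenever stage delays are not perfectly uniform, which is exactly why the slides stress making delays through the network uniform.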

  4. Generic Pipelining Illustration • Let a generic gate symbol represent any of a variety of logic gates • Initial, non-pipelined design for some random block of complex logic: a deep gate network between an input latch and an output latch

  5. Pipelining Illustration cont. • Aggressively pipelined version of same logic: • Insert extra “pipeline registers” periodically • Here, after every 1-2 logic layers • This design can process 5x as much data at once!

  6. Another View of Pipelining • Space-time diagrams: • Here, each colored area shows which parts of the logic network are occupied with data computed from a given input item, at which times. [Figure: space-time diagrams of the non-pipelined vs. pipelined (depth 6) designs; axes are time vs. depth in the logic network.]

  7. Simple Multicycle RISC Datapath [Figure: datapath with stages IF, ID, EX, MEM, WB, showing the Program Counter, Next-PC logic, Instruction Register, and the load-from-memory data path.]

  8. Basic RISC Execution Pipeline • Basic idea of instruction-execution pipelining: • Each instruction spends 1 clock cycle in each of the execution stages (in our example, there are 5). • ⇒ During 1 clock cycle, the pipeline can be processing (different stages of) 5 different instructions simultaneously! [Figure: table of pipeline stages across time.]
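The stage-by-cycle table that the slide describes can be generated mechanically. A small sketch for an ideal (hazard-free) 5-stage pipe, with the stage names from the datapath slide:

```python
# Sketch: classic 5-stage pipeline occupancy table (IF ID EX MEM WB).
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(n_instructions):
    """Return one row per instruction; entry [c] is the stage the
    instruction occupies in cycle c, or blank if it isn't in the pipe."""
    total_cycles = n_instructions + len(STAGES) - 1
    rows = []
    for i in range(n_instructions):
        row = ["   "] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name.ljust(3)   # instruction i is in stage s at cycle i+s
        rows.append(row)
    return rows

for r in pipeline_table(4):
    print(" ".join(r))
```

With 4 instructions the table spans 8 cycles, and in any middle cycle up to 5 different instructions occupy the 5 stages at once, which is the point of the slide.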

  9. Different Visualizations [Figure: two skewed views of the same pipeline diagram: one grouping shows “same time, different places / data items” (same instruction, different steps); the other shows “same place, different times”.]

  10. More Graphical Detail

  11. Adding Pipeline Registers

  12. Description of Pipe Stages

  13. Dependences (from H&P 3rd ed. §3.1)

  14. Dependences • A dependence is a way in which one instruction can depend on (be impacted by) another for scheduling purposes. • Three major dependence types: • Data dependence • Name dependence • Control dependence • I’ll sometimes use the word dependency for a particular instance of one instruction depending on another. • When a dependency exists, the instructions can’t be effectively (as opposed to just syntactically) fully parallelized or reordered.

  15. Data Dependence • Recursive definition: • Instruction B is data dependent on instruction A iff: • B uses a data result produced by instruction A, or • There is another instruction C such that B is data dependent on C, and C is data dependent on A. • When a data dependence is present, there is a potential RAW hazard. • Direct data dependences in a simple example code fragment: Loop: LD F0,0(R1); ADDD F4,F0,F2; SD 0(R1),F4; SUBI R1,R1,#8; BNEZ R1,Loop (ADDD depends on LD via F0, SD on ADDD via F4, BNEZ on SUBI via R1).
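The direct dependences in the loop fragment can be found mechanically by matching each instruction's source registers against the nearest earlier writer. A sketch, using our own encoding of the fragment (each entry lists registers written and read):

```python
# Sketch: finding direct RAW (read-after-write) dependences in the
# lecture's loop fragment. Each entry: (text, regs written, regs read).
code = [
    ("LD F0,0(R1)",   {"F0"}, {"R1"}),
    ("ADDD F4,F0,F2", {"F4"}, {"F0", "F2"}),
    ("SD 0(R1),F4",   set(),  {"R1", "F4"}),
    ("SUBI R1,R1,#8", {"R1"}, {"R1"}),
    ("BNEZ R1,Loop",  set(),  {"R1"}),
]

def raw_dependences(instrs):
    """Return (producer_index, consumer_index, register) triples."""
    deps = []
    for j, (_, _, reads) in enumerate(instrs):
        for i in range(j - 1, -1, -1):     # nearest earlier writer wins
            writes = instrs[i][1]
            for reg in reads & writes:
                deps.append((i, j, reg))
            reads = reads - writes         # don't look past a redefinition
    return deps

for i, j, reg in raw_dependences(code):
    print(code[i][0], "->", code[j][0], "via", reg)
```

The result matches the slide's annotations: ADDD depends on LD (F0), SD on ADDD (F4), and BNEZ on SUBI (R1); SUBI's read of R1 is satisfied before the fragment, so no in-fragment dependence is reported for it.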

  16. Name Dependence • When two instructions access the same data storage location, but are not data dependent. • Also, at least one of the accesses must be a write. • Two sub-types (for inst. B after inst. A): • Antidependence: A reads, then B writes. • Potential for a WAR hazard. • Output dependence: A writes, then B writes. • Potential for a WAW hazard. • Note: Name dependencies can be avoided by changing instructions to use different locations • (Rather than reusing 1 location for 2 purposes.) • This fix is called renaming.
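The renaming fix can be sketched in a few lines: give every write a fresh location, and name dependences vanish while true data dependences survive. The instruction tuples and the "P" physical-register names below are illustrative, not from the lecture:

```python
# Sketch: eliminating name dependences by renaming destinations.
def rename(instrs, fresh_regs):
    """instrs: list of (dest, src1, src2, ...). Allocate a fresh
    register for every destination so WAR/WAW dependences disappear;
    sources are redirected to the latest name of their producer."""
    mapping = {}                                   # arch reg -> latest name
    out = []
    for dest, *srcs in instrs:
        srcs = [mapping.get(s, s) for s in srcs]   # read current names
        new = next(fresh_regs)
        mapping[dest] = new                        # writer gets a fresh name
        out.append((new, *srcs))
    return out

prog = [("R1", "R2", "R3"),   # R1 = R2 op R3
        ("R4", "R1", "R5"),   # true RAW on R1 -- preserved
        ("R1", "R6", "R7")]   # WAW/WAR on R1 -- removed by renaming

renamed = rename(prog, iter(f"P{i}" for i in range(100)))
print(renamed)   # every write now targets a distinct register
```

After renaming, the third instruction no longer conflicts with the first two, so the hardware (or compiler) is free to reorder it.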

  17. Control Dependence • Occurs when the execution of an instruction (as in, will it be executed, or not?) depends on the outcome of some earlier, conditional branch instruction. • We generally can’t easily change which branches an instruction depends on w/o ruining the program’s functional behavior. • However, there are exceptions.

  18. Hazards, Stalls, & Forwarding H&P 3rd ed. §A.2-3

  19. Hazards • Hazards are circumstances which may lead to stalls in the pipeline if not addressed. • Stalls are delays, and may be called “bubbles” • There are three major types of hazards: • Structural hazards: • Not enough HW resources to keep all instrs. moving. • Data hazards • Data results of earlier instrs. not yet avail. when needed. • Control hazards • Control decisions resulting from earlier instrs. (branches) not yet made; don’t know which new instrs. to execute.

  20. Structural Hazard Example Suppose you had a combined instruction+data memory w. only 1 read port

  21. Hazards Produce “Bubbles” [Figure: a bubble rising through the pipe as instructions make progress over time; unskewed view.]

  22. Textual View A pipeline stalled for a structural hazard – a load with only one memory port

  23. Example Data Hazards

  24. Forwarding for Data Hazards

  25. Another Forwarding Example

  26. Three Types of Data Hazards • Let i be an earlier instruction, j a later one. • RAW (read after write) • j is supposed to Read a value After i Writes it, • But instead j tries to read the value before i has written it • WAW (write after write) • j should Write to a given place After i Writes there, • But they end up writing in the wrong order. • Only occurs if >1 pipeline stage can write. • WAR (write after read) • j should Write a new value After i Reads the old, • But instead j writes the new value before i has read the old one. • Only occurs if writes can happen before reads in pipeline.
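The three hazard types reduce to simple set intersections on each instruction's read and write sets. A sketch (the example registers are our own):

```python
# Sketch: classifying the data-hazard type(s) between an earlier
# instruction i and a later instruction j.
def hazard_types(i_writes, i_reads, j_writes, j_reads):
    """Each argument is a set of register names; i precedes j."""
    kinds = set()
    if i_writes & j_reads:
        kinds.add("RAW")   # j reads what i writes
    if i_writes & j_writes:
        kinds.add("WAW")   # both write the same place
    if i_reads & j_writes:
        kinds.add("WAR")   # j overwrites what i still needs to read
    return kinds

# i: ADD R1,R2,R3 ; j: SUB R2,R1,R4
print(hazard_types({"R1"}, {"R2", "R3"}, {"R2"}, {"R1", "R4"}))
# -> {'RAW', 'WAR'}: R1 gives a RAW hazard, R2 a WAR hazard
```

Whether a *potential* hazard becomes an actual stall depends on the pipeline: in the simple 5-stage pipe only RAW can bite, since there is one write stage and reads precede writes.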

  27. An Unavoidable Stall

  28. Stalling in midst of instruction

  29. Data Hazard Prevention • A clever compiler can often reschedule instructions to avoid a stall. • A simple example: • Original code: lw r2, 0(r4); add r1, r2, r3 (note: stall happens here!); lw r5, 4(r4) • Transformed code: lw r2, 0(r4); lw r5, 4(r4); add r1, r2, r3 (no stall needed!)
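The compiler's gain can be checked by counting load-use stalls in each schedule. A sketch, assuming the usual 5-stage interlock where a loaded value stalls only the immediately following dependent instruction by one cycle:

```python
# Sketch: counting 1-cycle load-use stalls in an instruction schedule.
def load_use_stalls(schedule):
    """schedule: list of (op, dest, sources). A load causes a stall
    iff the very next instruction reads the load's destination."""
    stalls = 0
    for (op, dest, _), (_, _, srcs) in zip(schedule, schedule[1:]):
        if op == "lw" and dest in srcs:
            stalls += 1
    return stalls

original = [("lw", "r2", ("r4",)),
            ("add", "r1", ("r2", "r3")),     # uses r2 right after the load
            ("lw", "r5", ("r4",))]
transformed = [("lw", "r2", ("r4",)),
               ("lw", "r5", ("r4",)),        # independent load fills the slot
               ("add", "r1", ("r2", "r3"))]

print(load_use_stalls(original), load_use_stalls(transformed))  # 1 0
```

Moving the independent second load between the first load and its consumer removes the stall without changing the program's results, exactly as on the slide.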

  30. Simple RISC Pipeline Stall Statistics • Note that ~1 in 5 loads causes a stall in many programs! [Figure: percentage of loads that cause a stall, per benchmark.]

  31. Data Hazard Detection

  32. Hazard Detection Logic • Example: Detecting whether an instruction that has just been fetched needs to be stalled 1 cycle because of an immediately preceding load. [Figure: pipeline with registers IF/ID, ID/EX, EX/MEM, MEM/WB; the detection logic compares fields of IF/ID and ID/EX.]
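The detection condition itself is tiny: stall if the instruction sitting in ID/EX is a load whose destination matches a source of the instruction in IF/ID. A sketch of that check (the dictionary field names are our own, not the lecture's signal names):

```python
# Sketch: the classic load-use hazard check read out of the
# IF/ID and ID/EX pipeline registers.
def must_stall(id_ex, if_id):
    """Return True if the freshly fetched instruction in IF/ID must
    stall one cycle behind the load currently in ID/EX."""
    return (id_ex["mem_read"] and
            id_ex["dest"] in (if_id["rs"], if_id["rt"]))

id_ex = {"mem_read": True, "dest": "r2"}   # lw r2, 0(r4) about to execute
if_id = {"rs": "r2", "rt": "r3"}           # add r1, r2, r3 just fetched
print(must_stall(id_ex, if_id))            # True -> insert one bubble
```

When the check fires, the control logic freezes the PC and IF/ID register for one cycle and injects a bubble into ID/EX; after that, forwarding can supply the loaded value.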

  33. Forwarding Situations in DLX

  34. Implementing Forwarding in HW

  35. Control Hazards, Branch Prediction, Delayed Branches H&P 3rd ed., §§A.2-3 & §4.2

  36. Control Hazards • Suppose the new PC value was not computed until the MEM stage (like orig. RISC design). • Then we must stall 3 clocks after every branch!
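The cost of those 3-cycle branch stalls is easy to quantify with the standard CPI formula. A sketch; the 13% branch frequency is an assumption roughly in line with the control-instruction statistics quoted later in these notes:

```python
# Sketch: CPI impact of stalling 3 cycles on every branch when the
# new PC isn't known until MEM. Branch frequency is assumed (~13%).
base_cpi = 1.0          # ideal pipelined CPI
branch_freq = 0.13      # fraction of instructions that are branches
stall_cycles = 3        # cycles lost per branch

cpi = base_cpi + branch_freq * stall_cycles
print(round(cpi, 2))    # 1.39: a 39% slowdown from branches alone
```

This is why the following slides push the branch decision earlier in the pipe and then try to predict or fill the remaining delay.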

  37. Early Branch Resolution

  38. New Pipeline Logic

  39. Control Instruction Statistics • ~10% of dynamic insts. are fwd. cond. branches • only ~3% are backwards cond. branches • a similar percentage are unconditional branches

  40. Stats on Taken Branches • ~67% of cond. branches are taken

  41. Predict-Not-Taken

  42. Delayed Branches • Machine code sequence: • Branch instruction • Delay slot instruction(s) • Post-branch instructions • The branch takes effect (if taken) only after the delay slot instruction(s) have executed.

  43. Filling the Branch-Delay Slot

  44. Static Branch Prediction • Earlier we discussed predict-taken, predict-not-taken static prediction strategies • Applied uniformly across all branches in program • Static analysis in compiler may be able to do better, if it can non-uniformly predict whether each specific branch is likely to be taken or not • One way: Backwards taken, forwards not taken. • If we can do better, it can help with static code scheduling to reduce data hazard stalls… • Also may assist later dynamic prediction
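The "backwards taken, forwards not taken" (BTFNT) rule mentioned above amounts to one comparison: backward branches are usually loop back-edges, so predict them taken. A sketch (the addresses are made up):

```python
# Sketch: BTFNT static branch prediction -- backward branches
# (loop bottoms) predicted taken, forward branches predicted not taken.
def btfnt_predict(branch_pc, target_pc):
    """Return True if the branch is predicted taken."""
    return target_pc < branch_pc   # backward target -> predict taken

print(btfnt_predict(0x400100, 0x4000C0))  # True  (loop back-edge)
print(btfnt_predict(0x400100, 0x400140))  # False (forward skip)
```

Since the direction is computable from the instruction alone, the compiler can apply this rule at compile time to guide code scheduling.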

  45. Prediction Helps Static Scheduling
	LD R1,0(R2)
	DSUBU R1,R1,R3
	BEQZ R1,else
	OR R4,R5,R6
	DADDU R10,R4,R3
	J after
  else:	DADDU R7,R8,R9
	…
  after:
• If-then-else control flow: which way will this branch go? • Code movements to consider: the LD has a potential load delay to fill, and code from the if case or the else case could be hoisted above the branch, subject to the data dependences (DSUBU and BEQZ on R1, DADDU on R4).

  46. Some Static Prediction Schemes • Always predict taken • 34% mispredict rate on SPEC (range 9%-54%) • Backwards predict taken, forwards not taken • In SPEC, more than ½ of forwards are taken! • This does worse than “always predict taken” strategy • Usu. not better than 30-40% misprediction rate • Better than either: Use profile information! • Collect statistics on earlier program runs. • Works well because individual branches tend to be strongly biased (taken or not) given average data • Bias tends to remain stable across multiple runs
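The reason profiling beats the uniform strategies is the per-branch bias mentioned in the last bullets: each branch tends to go one way almost all the time. A sketch with invented per-branch statistics:

```python
# Sketch: profile-based static prediction. Per-branch counts below
# are invented for illustration; predict each branch's majority
# direction from the profile and count the resulting mispredictions.
branches = {              # branch -> (times taken, times executed)
    "loop_bottom": (950, 1000),
    "error_check": (10, 1000),
    "mode_switch": (700, 1000),
}

mispredicts = 0
total = 0
for taken, execs in branches.values():
    predict_taken = taken * 2 > execs          # majority direction
    wrong = execs - taken if predict_taken else taken
    mispredicts += wrong
    total += execs

print(mispredicts / total)   # 0.12, far below always-taken's ~34%
```

Because individual branches are strongly biased and that bias is stable across runs, the majority direction measured on one input remains a good prediction on others.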

  47. Profile-Based Predictor Statistics [Figure: misprediction rates of a profile-based predictor across benchmarks, integer and floating-point.]

  48. Predict-Taken vs. Profile-Based [Figure: instructions executed in between mispredictions, per benchmark, floating-point shown separately (log scale!).]
