Computer Architecture Pipelines & Superscalars

zamir

Presentation Transcript


  1. Computer Architecture Pipelines & Superscalars

  2. Pipelines • Data Hazards • Code:
         lw  $4, 0($1)
         add $15, $1, $1
         sub $2, $1, $3
         and $12, $2, $5
         or  $13, $6, $2
         add $14, $2, $2
         sw  $15, 100($2)
     The last four instructions all depend on $2, the result produced by the sub! • MIPS instructions have the format op dest, srca, srcb
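
The dependence claim on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original lecture: the helper names `parse` and `raw_dependents` are hypothetical, and the parser only handles the three instruction shapes that appear in the listing.

```python
import re

PROGRAM = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def parse(line):
    """Return (op, dest, sources) for one instruction (hypothetical helper).

    Assumes only the forms used on the slide: `op dest, srca, srcb`,
    `lw dest, offset(base)` and `sw src, offset(base)`.
    """
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\d+", operands)
    if op == "sw":                      # a store reads both registers, writes none
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])

def raw_dependents(program, producer):
    """Indices of later instructions that read the producer's result."""
    _, dest, _ = parse(program[producer])
    dependents = []
    for i in range(producer + 1, len(program)):
        _, later_dest, sources = parse(program[i])
        if dest in sources:
            dependents.append(i)
        if later_dest == dest:          # register redefined: dependence chain ends
            break
    return dependents

print(raw_dependents(PROGRAM, 2))       # dependents of `sub $2, $1, $3`
```

With the slide's listing, the call for index 2 (the `sub`) reports indices 3 to 6, i.e. the last four instructions.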

  3. Pipelines - Data Hazards • Examine the pipeline (ignore the first 2 instructions!) • $2 is only updated in time for the add!

  4. Pipelines - Data Hazards • Compiler solution • Insert NOOPs • Inefficient!
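
A rough Python sketch of what the NOOP-inserting compiler pass might look like. It is an illustration under stated assumptions rather than the lecture's own algorithm: `parse` and `insert_nops` are hypothetical names, and `min_distance=3` assumes a pipeline without forwarding in which a consumer must sit at least three slots after its producer (consistent with $2 only being ready in time for the add).

```python
import re

PROGRAM = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def parse(line):
    """Return (op, dest, sources); handles only the forms in the listing."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\d+", operands)
    if op == "sw":                      # a store reads both registers, writes none
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])

def insert_nops(program, min_distance=3):
    """Pad with nops so every reader is >= min_distance slots after its writer."""
    out = []
    last_write = {}                     # register -> position in `out` of latest writer
    for line in program:
        _, dest, sources = parse(line)
        needed = 0
        for reg in sources:
            if reg in last_write:
                gap = len(out) - last_write[reg]
                needed = max(needed, min_distance - gap)
        out.extend(["nop"] * needed)
        if dest is not None:
            last_write[dest] = len(out)
        out.append(line)
    return out

padded = insert_nops(PROGRAM)
print("\n".join(padded))
```

Under these assumptions two nops land between the `sub` and the `and`, so seven useful instructions occupy nine slots, which is the inefficiency the slide points out.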

  5. Pipelines - Data Hazards • Second compiler solution • Reorder
     Original order:
         lw  $4, 0($1)
         add $15, $1, $1
         sub $2, $1, $3
         and $12, $2, $5
         or  $13, $6, $2
         add $14, $2, $2
         sw  $15, 100($2)
     Reordered (sub reads $1 and $3, writes $2):
         sub $2, $1, $3
         lw  $4, 0($1)
         add $15, $1, $1
         and $12, $2, $5
         or  $13, $6, $2
         add $14, $2, $2
         sw  $15, 100($2)
     The two instructions that sub moves past (lw and add) must not define $1 or $3!

  6. Pipelines - Data Hazards • Second compiler solution • Reorder
         sub $2, $1, $3      <- $2 written
         lw  $4, 0($1)
         add $15, $1, $1
         and $12, $2, $5     <- first use of $2 (read)
         or  $13, $6, $2
         add $14, $2, $2
         sw  $15, 100($2)

  7. Pipelines - Data Hazards • Compiler analyses dependencies • Register definitions • Register use • Read After Write (RAW) dependency • No dependencies • Instruction can be moved!
         sub $2, $1, $3      <- $2 written
         lw  $4, 0($1)
         add $15, $1, $1
         and $12, $2, $5     <- uses of $2 from here on
         or  $13, $6, $2
         add $14, $2, $2
         sw  $15, 100($2)
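
The "no dependencies, so the instruction can be moved" test can be sketched as a small legality check. This is a hedged illustration, not the compiler algorithm from the lecture: `parse` and `can_hoist` are hypothetical names, memory dependences between loads and stores are ignored, and only register definitions and uses are compared.

```python
import re

ORIGINAL = [
    "lw $4, 0($1)",
    "add $15, $1, $1",
    "sub $2, $1, $3",
    "and $12, $2, $5",
    "or $13, $6, $2",
    "add $14, $2, $2",
    "sw $15, 100($2)",
]

def parse(line):
    """Return (op, dest, sources); handles only the forms in the listing."""
    op, operands = line.split(None, 1)
    regs = re.findall(r"\$\d+", operands)
    if op == "sw":                      # a store reads both registers, writes none
        return op, None, set(regs)
    return op, regs[0], set(regs[1:])

def can_hoist(program, src, dst):
    """True if program[src] may move up to position dst (dst < src).

    The moved instruction must have no register dependence on any
    instruction it jumps over.
    """
    _, dest, reads = parse(program[src])
    for i in range(dst, src):
        _, other_dest, other_reads = parse(program[i])
        if other_dest is not None and other_dest in reads:
            return False                # RAW: it needs a value defined here
        if dest is not None and dest in other_reads:
            return False                # WAR: this one still reads the old value
        if dest is not None and dest == other_dest:
            return False                # WAW: final value would change
    return True

print(can_hoist(ORIGINAL, 2, 0))        # move sub above lw and add
print(can_hoist(ORIGINAL, 3, 0))        # try to move and above sub
```

The first call succeeds because neither `lw` nor `add` defines $1 or $3 or touches $2; the second fails because `and` reads the $2 that `sub` defines.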

  8. Pipelines - Data Hazards • Hardware solution • Value forwarding • Hardware detects dependency • scoreboard • Forwards result from WB to EX for subsequent use • Hardware • Transparent to software!

  9. Data Hazards - classification • Read after Write (RAW) • Instruction 1 must write before instruction 2 reads • Write after Write (WAW) • Instructions 1 and 2 both write • Instruction 2 must write after 1 • Write after Read (WAR) • Instruction 1 reads • Instruction 2 writes (overwrites) • Instruction 2 must not write before 1 reads • Reordering algorithms must consider all three!
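
The three classes reduce to set intersections between the registers one instruction writes and the registers the other reads or writes. A minimal Python sketch follows; the `Instr` type and `hazards` function are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    text: str
    writes: frozenset
    reads: frozenset

def hazards(first: Instr, second: Instr) -> set:
    """Classify the register hazards when `first` precedes `second`."""
    found = set()
    if first.writes & second.reads:
        found.add("RAW")    # second must read after first has written
    if first.writes & second.writes:
        found.add("WAW")    # second must write after first
    if first.reads & second.writes:
        found.add("WAR")    # second must not write before first reads
    return found

sub = Instr("sub $2, $1, $3", frozenset({"$2"}), frozenset({"$1", "$3"}))
and_ = Instr("and $12, $2, $5", frozenset({"$12"}), frozenset({"$2", "$5"}))

print(hazards(sub, and_))   # sub before and
print(hazards(and_, sub))   # and before sub
```

In program order (`sub` then `and`) the pair has a RAW hazard on $2; if a reordering pass swapped them, the same pair would become a WAR hazard, which is why all three classes have to be checked.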

  10. Lecture 5 - Key Points • Data Hazards • RAW - most common • WAW • WAR • Compiler looks for dependencies • then re-orders • Hardware • Scoreboard • Monitors dependencies • ensures correct operation • Value forwarding hardware • Forwards results from EX stage

  11. Pipelines - Exceptions • Caused by overflow, underflow • Example • add $1, $2, $1 • Overflow detected in EX stage • Causes jump to exception handler • as branch - remainder of pipeline flushed but • Compiler needs original $1 causing overflow • Register must not be overwritten • EX stage needs to squash WB operation • Precise Exception problem - more later!

  12. Pipelines - Depth • Pipeline can’t be too deep • Hazards are frequent • many stalls in deep pipelines • [Figure: Relative Performance (axis 0.5 to 2.5) vs. Pipeline Depth (1, 2, 4, 8, 16), annotated "Too Deep!"]

  13. Pipelines - Depth • Pipeline can’t be too deep • Hazards are frequent • many stalls in deep pipelines • [Figure: Relative Performance (axis 0.5 to 2.5) vs. Pipeline Depth (1, 2, 4, 8, 16), annotated "Too Deep!" and "Superpipelined"]
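
Why performance peaks and then falls can be illustrated with a toy model. Everything numeric here is an assumption made for the sketch, not data from the lecture's plot: the logic delay is normalised to 1, each stage adds a hypothetical latch overhead of 0.05, a hypothetical 20% of instructions hit a hazard, and each hazard is assumed to cost `depth - 1` stall cycles.

```python
def time_per_instruction(depth, latch_overhead=0.05, hazard_rate=0.2):
    """Average time per instruction for a pipeline of the given depth.

    Toy model: all parameter values are illustrative assumptions.
    """
    cycle_time = 1.0 / depth + latch_overhead      # shorter stages, fixed latch cost
    stall_cycles = hazard_rate * (depth - 1)       # deeper pipe, costlier hazards
    return cycle_time * (1.0 + stall_cycles)

def relative_performance(depth):
    """Performance relative to an unpipelined (depth 1) machine."""
    return time_per_instruction(1) / time_per_instruction(depth)

DEPTHS = [1, 2, 4, 8, 16]
for d in DEPTHS:
    print(d, round(relative_performance(d), 2))

best = max(DEPTHS, key=relative_performance)
print("best depth in this toy model:", best)
```

With these particular parameters the curve rises up to a depth of 8 and drops again at 16. The shape, not the exact numbers, is the point: stall cost grows with depth while the cycle-time gain shrinks.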

  14. CISC and pipelines • High Speed CISC processors are pipelined • Overlap IF, EX • Variable • instruction length • running time (number of microcode cycles) • pipeline imbalance • “backup” in pipe stages • complicate hazard detection • Complex addressing modes • auto-increment updates address register • multiple memory accesses required • smooth pipeline flow more difficult!

  15. Instruction Queues • Vital performance determinant • Rate of instruction fetch • High Performance processors • Fetch multiple instructions in each cycle • 2 - 4 common • Use wide datapath to memory • PowerPC 604 128 bits = 4 instructions • Despatch unit • Examine dependencies • Determine which instructions can be despatched

  16. Instruction Queues • Q “matches” fetch/despatch rates • General strategy for matching Producers - Consumers • Use of FIFO-style queues • Absorb asynchronous delivery / consumption rates • Provides elasticity in pipelines • [Figure: Producer -> FIFO -> Consumer, with differing instantaneous rates]
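
The elasticity a FIFO provides can be seen in a small cycle-by-cycle simulation. This is a sketch under assumed numbers: the function name `run`, the bursty fetch pattern, the despatch width of 2 and the queue capacities are all hypothetical, and fetched instructions that do not fit are simply not accepted (standing in for a fetch stall).

```python
from collections import deque

def run(fetch_per_cycle, despatch_width, capacity):
    """Return how many instructions are despatched in each cycle."""
    queue = deque()
    despatched = []
    for fetched in fetch_per_cycle:
        room = capacity - len(queue)
        queue.extend(["instr"] * min(fetched, room))   # producer side
        count = min(despatch_width, len(queue))        # consumer side
        for _ in range(count):
            queue.popleft()
        despatched.append(count)
    return despatched

bursty_fetch = [4, 0, 4, 0, 4, 0]       # 4 instructions every other cycle

print(run(bursty_fetch, despatch_width=2, capacity=8))
print(run(bursty_fetch, despatch_width=2, capacity=2))
```

With room to buffer a whole burst the consumer despatches two instructions every cycle; with a queue only as deep as the despatch width it idles on every second cycle, even though the average fetch rate is the same.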

  17. Superscalar Processors

  18. PowerPC organisation • PowerPC 601 (~1993) • 3-way superscalar • Integer • Branch • Floating Point • A newer machine will have more functional units here! • New - Look in the “Example Processors” section of the Web notes • [Figure: block diagram, with the boundary of the Si die marked]

  19. Superscalar Processors • Multiple Functional Units • PowerPC 604 • 6-way superscalar: Branch Unit, Load/Store Unit, 3 Integer Units, Floating Point Unit • Despatch Unit • Sends “ready” instructions to all free units • PowerPC 604: • potential 4 instructions/cycle (pipeline lengths are different!) • reality: 2-3 instructions/cycle? (program dependent!)

  20. Superscalar Processors • Mix of functional units • Up to 8-way superscalar common now • 2 Floating point units • Usually have ~3 cycle latency • 3 Integer Arithmetic • Branch unit • Load / store unit • + ….? • Marketing departments can play some games with the ‘n’ of an n-way superscalar!

  21. Superscalar – Maximum throughput • Instruction Issue Unit is the key! • If IIU only issues 4 instructions per cycle, • An n-way superscalar (n >> 4) can still only complete 4 instructions / cycle! • IIU has many tasks • Pre-fetch instructions • At least one cache line! • Check dependencies • Has data required by this instruction been computed yet? • Keeps register ‘scoreboard’ • Mark registers which will be written by instructions already issued • It’s a small dataflow machine (see later!) • Check availability of functional units
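
One cycle of the issue unit's work can be sketched as follows. This is a simplified illustration, not a description of any real IIU: the function `issue_cycle`, the tuple layout `(unit, dest, sources)`, strict in-order issue, and the register names in the example are all assumptions made for the sketch.

```python
def issue_cycle(window, pending, free_units, issue_width):
    """Issue instructions in program order for one cycle.

    window      : list of (unit, dest, sources) tuples awaiting issue
    pending     : scoreboard, registers still to be written by issued instructions
    free_units  : dict mapping unit type -> number of free units
    issue_width : maximum instructions the issue unit can send per cycle
    Stops at the first instruction that cannot be issued.
    """
    issued = []
    pending = set(pending)
    free = dict(free_units)
    for unit, dest, sources in window:
        if len(issued) == issue_width:
            break                                   # issue unit is the bottleneck
        if pending & set(sources):
            break                                   # operand not computed yet
        if dest in pending:
            break                                   # earlier write still outstanding
        if free.get(unit, 0) == 0:
            break                                   # no functional unit available
        free[unit] -= 1
        pending.add(dest)                           # mark register on the scoreboard
        issued.append((unit, dest, sources))
    return issued, pending

# Six independent integer instructions, eight free integer units, issue width 4.
independent = [("int", f"${i}", (f"${i + 10}",)) for i in range(6)]
issued, scoreboard = issue_cycle(independent, set(), {"int": 8}, issue_width=4)
print(len(issued))

# The second instruction reads $1, which the first has only just been issued to write.
dependent = [("int", "$1", ("$2",)), ("int", "$3", ("$1",))]
issued_dep, _ = issue_cycle(dependent, set(), {"int": 8}, issue_width=4)
print(len(issued_dep))
```

Even with eight free units only four of the independent instructions go out, matching the slide's point that the issue width caps throughput; in the dependent pair the scoreboard holds back the second instruction.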
