Chapter Six Pipelining

Chapter SixPipelining

Pipelining: analogia com linha de produção  tempo • Tempo de fabricação de um carro: C+M+C+P+A • Taxa de produção (carros/h): medido na saída • throughput: depende do número de estágios e do balanceamento • Objetivo do pipeline: • distribuir e balancear o tempo em cada estágio • otimizar o uso de HW dos estágios (taxa de ocupação) • resumo: aumentar a velocidade e diminuir o hardware

P r o g r a m 2 4 6 8 1 0 1 2 1 4 1 6 1 8 e x e c u t i o n T i m e o r d e r ( i n i n s t r u c t i o n s ) I n s t r u c t i o n D a t a l w $ 1 , 1 0 0 ( $ 0 ) R e g A L U R e g f e t c h a c c e s s I n s t r u c t i o n D a t a l w $ 2 , 2 0 0 ( $ 0 ) R e g A L U R e g 8 n s f e t c h a c c e s s I n s t r u c t i o n l w $ 3 , 3 0 0 ( $ 0 ) 8 n s f e t c h . . . 8 n s P r o g r a m 1 4 2 4 6 8 1 0 1 2 e x e c u t i o n T i m e o r d e r ( i n i n s t r u c t i o n s ) I n s t r u c t i o n D a t a l w $ 1 , 1 0 0 ( $ 0 ) R e g A L U R e g f e t c h a c c e s s I n s t r u c t i o n D a t a l w $ 2 , 2 0 0 ( $ 0 ) R e g A L U R e g 2 n s f e t c h a c c e s s I n s t r u c t i o n D a t a l w $ 3 , 3 0 0 ( $ 0 ) 2 n s R e g A L U R e g f e t c h a c c e s s 2 n s 2 n s 2 n s 2 n s 2 n s Pipelining • Improve perfomance by increasing instruction throughput • tempo de execução de uma instrução: 8 ns e 9 ns (com ou sem pipe) • throughput: 1 instr / 8ns (sem pipeline) ou 1 instr / 2 ns (com) • # de estágios = 5; ganho de 4:1; para obter 5:1 ??

Pipelining • What makes it easy • all instructions are the same length • just a few instruction formats • memory operands appear only in loads and stores • What makes it hard? • structural hazards: suppose we had only one memory • control hazards: need to worry about branch instructions • data hazards: an instruction depends on a previous instruction • We’ll build a simple pipeline and look at these issues • We’ll talk about modern processors and what really makes it hard: • exception handling • trying to improve performance with out-of-order execution, etc.

I F : I n s t r u c t i o n f e t c h I D : I n s t r u c t i o n d e c o d e / E X : E x e c u t e / M E M : M e m o r y a c c e s s W B : W r i t e b a c k r e g i s t e r f i l e r e a d a d d r e s s c a l c u l a t i o n 0 M u x 1 A d d A d d 4 A d d r e s u l t S h i f t l e f t 2 R e a d r e g i s t e r 1 A d d r e s s P C R e a d d a t a 1 R e a d Z e r o r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U 0 R e a d W r i t e d a t a 2 r e s u l t A d d r e s s 1 d a t a r e g i s t e r M I n s t r u c t i o n M u D a t a u m e m o r y W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a 1 6 3 2 S i g n e x t e n d Basic Idea • What do we need to add to actually split the datapath into stages?

T i m e ( i n c l o c k c y c l e s ) P r o g r a m C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 C C 7 e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) A L U l w $ 1 , 1 0 0 ( $ 0 ) I M R e g D M R e g A L U l w $ 2 , 2 0 0 ( $ 0 ) I M R e g D M R e g A L U l w $ 3 , 3 0 0 ( $ 0 ) I M R e g D M R e g Uma representação para o pipeline • Hachurado representa “atividade” • Representação alternativa: imaginando que cada instrução tenha a sua própria via de dados • Outra representação: foto no tempo

0 M u x 1 I F / I D I D / E X E X / M E M M E M / W B A d d A d d 4 A d d r e s u l t S h i f t l e f t 2 n R e a d o i t r e g i s t e r 1 c A d d r e s s P C R e a d u r t d a t a 1 s R e a d n Z e r o I r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U m e m o r y 0 R e a d W r i t e A d d r e s s d a t a 2 1 r e s u l t d a t a r e g i s t e r M M u D a t a u W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a 1 6 3 2 S i g n e x t e n d Pipelined Datapath • O que é produzido em cada estágio é armazenado em um registrador de pipeline para uso pelo próximo estágio • Problemas com o endereço do registrador de escrita?? Caminhando para esquerda?

0 M u x 1 I F / I D I D / E X E X / M E M M E M / W B A d d A d d 4 A d d r e s u l t S h i f t l e f t 2 n R e a d o i t r e g i s t e r 1 c A d d r e s s P C R e a d u r t d a t a 1 s R e a d n Z e r o I r e g i s t e r 2 I n s t r u c t i o n R e g i s t e r s A L U R e a d A L U m e m o r y 0 R e a d W r i t e A d d r e s s d a t a 2 r e s u l t 1 d a t a r e g i s t e r M M D a t a u u W r i t e x m e m o r y x d a t a 1 0 W r i t e d a t a 1 6 3 2 S i g n e x t e n d Corrected Datapath

Exemplo com sub e lw (1)

T i m e ( i n c l o c k c y c l e s ) P r o g r a m C C 1 C C 2 C C 3 C C 4 C C 5 C C 6 e x e c u t i o n o r d e r ( i n i n s t r u c t i o n s ) l w $ 1 0 , 2 0 ( $ 1 ) I M R e g A L U D M R e g s u b $ 1 1 , $ 2 , $ 3 I M R e g D M R e g A L U Graphically Representing Pipelines • Can help with answering questions like: • how many cycles does it take to execute this code? • what is the ALU doing during cycle 4? • use this representation to help understand datapaths

Pipeline Control

Pipeline control • We have 5 stages. What needs to be controlled in each stage? • Instruction Fetch and PC Increment • Instruction Decode / Register Fetch • Execution • Memory Stage • Write Back • How would control be handled in an automobile plant? • a fancy control center telling everyone what to do? • should we use a finite state machine?

Pipeline Control • Pass control signals along just like the data

Datapath with Control

Dependencies • Problem with starting next instruction before first is finished • dependencies that “go backward in time” are data hazards

Software Solution • Have compiler guarantee no hazards • Where do we insert the “nops” ? sub $2, $1, $3 and $12, $2, $5 or $13, $6, $2 add $14, $2, $2 sw $15, 100($2) • Problem: this really slows us down!

what if this $2 was $13? Forwarding • Use temporary results, don’t wait for them to be written • register file forwarding to handle read/write to same register • ALU forwarding

Forwarding

Can't always forward • Load word can still cause a hazard: • an instruction tries to read a register following a load instruction that writes to the same register. • Thus, we need a hazard detection unit to “stall” the load instruction

Stalling • We can stall the pipeline by keeping an instruction in the same stage

Hazard Detection Unit • Stall by letting an instruction that won’t write anything go forward

Branch Hazards • When we decide to branch, other instructions are in the pipeline! • We are predicting “branch not taken” • need to add hardware for flushing instructions if we are wrong

Flushing Instructions

Improving Performance • Try and avoid stalls! E.g., reorder these instructions: lw $t0, 0($t1) lw $t2, 4($t1) sw $t2, 0($t1) sw $t0, 4($t1) • Add a “branch delay slot” • the next instruction after a branch is always executed • rely on compiler to “fill” the slot with something useful • Superscalar: start more than one instruction in the same cycle

Dynamic Scheduling • The hardware performs the “scheduling” • hardware tries to find instructions to execute • out of order execution is possible • speculative execution and dynamic branch prediction • All modern processors are very complicated • DEC Alpha 21264: 9 stage pipeline, 6 instruction issue • PowerPC and Pentium: branch history table • Compiler technology important • This class has given you the background you need to learn more • Video: An Overview of Intel’s Pentium Processor (available from University Video Communications)

Chapter Six Pipelining

Chapter Six Pipelining

Presentation Transcript

CHAPTER SIX

Chapter Six Pipelining

CHAPTER SIX

Chapter 3 Pipelining

Chapter 8. Pipelining

Chapter Six

CHAPTER SIX

Chapter Six

Chapter 8. Pipelining

Pipelining (Chapter 8)

Chapter Six

Pipelining (Chapter 8)

Chapter Six

Chapter 6: Pipelining

Pipelining Chapter 6

Chapter Six

Chapter 8. Pipelining

Chapter 3 Pipelining

Chapter Six Enhancing Performance with Pipelining

Chapter Six