2010 R&E Computer System Education & Research Lecture 9. MIPS Processor Design – Pipelined Processor Design #2 Prof. Taeweon Suh Computer Science Education Korea University
Example for lw instruction: Instruction Fetch (IF)
[Figure: pipelined datapath (PC, instruction memory, registers, sign extend, ALU, data memory) with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers; stage label: Instruction fetch]

Example for lw instruction: Instruction Decode (ID)
[Figure: same pipelined datapath; stage label: Instruction decode]

Example for lw instruction: Execution (EX)
[Figure: same pipelined datapath; stage label: Execution]

Example for lw instruction: Memory (MEM)
[Figure: same pipelined datapath; stage label: Memory]

Example for lw instruction: Writeback (WB)
[Figure: same pipelined datapath; stage label: Writeback]

Example for sw instruction: Memory (MEM)
[Figure: same pipelined datapath; stage label: Memory]

Example for sw instruction: Writeback (WB): do nothing
[Figure: same pipelined datapath; stage label: Writeback]
Corrected Datapath (for lw)
[Figure: pipelined datapath in which the write register number of lw is carried through the ID/EX, EX/MEM, and MEM/WB pipeline registers, so that it reaches the register file together with the loaded data in WB]
Pipelining Example
    add $14, $5, $6
    lw $13, 24($1)
    add $12, $3, $4
    sub $11, $2, $3
    lw $10, 20($1)
[Figure: pipelined datapath with the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers, one instruction per stage]
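The stage occupancy in this example can be sketched in a few lines of Python. This is only an illustration under two assumptions that the slide does not state: the instructions are listed youngest first (so program order starts with lw $10, 20($1)), and no stalls occur. The names STAGES, PROGRAM, and stage_snapshot are made up for this sketch.

```python
# Minimal sketch: which instruction sits in which stage in a given cycle.
# Assumptions (not from the slides): ideal pipeline, no stalls, and the
# program order below, i.e. the slide lists the youngest instruction first.
STAGES = ("IF", "ID", "EX", "MEM", "WB")

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]


def stage_snapshot(program, cycle):
    """Return {stage: instruction} for a 1-based clock cycle."""
    snapshot = {}
    for index, instruction in enumerate(program):
        # Instruction number `index` enters IF in cycle index + 1.
        stage_index = (cycle - 1) - index
        if 0 <= stage_index < len(STAGES):
            snapshot[STAGES[stage_index]] = instruction
    return snapshot


if __name__ == "__main__":
    total_cycles = len(PROGRAM) + len(STAGES) - 1
    for cycle in range(1, total_cycles + 1):
        print(cycle, stage_snapshot(PROGRAM, cycle))
```

In cycle 5 every stage is busy, with add $14 in IF and lw $10 in WB, which is the situation the figure depicts under the assumed program order.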
Pipeline Control
• Note that in this implementation, a branch instruction decides whether to branch in the MEM stage
Pipeline Control
• We have 5 stages
  • IF, ID, EX, MEM, WB
• What needs to be controlled in each stage?
  • Instruction fetch and PC increment
  • Instruction decode / operand fetch
  • Execution stage
    • RegDst
    • ALUOp[1:0]
    • ALUSrc
  • Memory stage
    • Branch
    • MemRead
    • MemWrite
  • Writeback
    • MemtoReg
    • RegWrite (note that this signal drives the register file, which is in the ID stage)
Pipeline Control
• Extend the pipeline registers to include control information (created in ID)
• Pass the control signals along just like the data
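A rough Python sketch of this idea follows. The signal grouping (EX, M, WB) comes from the slides; the class Control, the function advance, and the bit ordering inside each group (RegDst, ALUOp, ALUSrc for EX; Branch, MemRead, MemWrite for M; RegWrite, MemtoReg for WB) are assumptions chosen so that lw prints as WB = 11, M = 010, EX = 0001 and an R-type instruction as WB = 10, M = 000, EX = 1100.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Control:
    # EX-stage signals
    reg_dst: int
    alu_op: int  # 2-bit ALUOp[1:0]
    alu_src: int
    # MEM-stage signals
    branch: int
    mem_read: int
    mem_write: int
    # WB-stage signals
    reg_write: int
    mem_to_reg: int

    def ex_bits(self):
        return f"{self.reg_dst}{self.alu_op:02b}{self.alu_src}"

    def mem_bits(self):
        return f"{self.branch}{self.mem_read}{self.mem_write}"

    def wb_bits(self):
        return f"{self.reg_write}{self.mem_to_reg}"


# Control bundles created in ID (values assumed from the textbook-style table).
LW = Control(reg_dst=0, alu_op=0b00, alu_src=1,
             branch=0, mem_read=1, mem_write=0,
             reg_write=1, mem_to_reg=1)
R_TYPE = Control(reg_dst=1, alu_op=0b10, alu_src=0,
                 branch=0, mem_read=0, mem_write=0,
                 reg_write=1, mem_to_reg=0)


def advance(pipeline, decoded):
    """One clock edge: control moves ID/EX -> EX/MEM -> MEM/WB.

    Simplification: the whole bundle is kept in every register, whereas the
    hardware drops the EX bits after EX and the M bits after MEM.
    """
    return {
        "ID/EX": decoded,
        "EX/MEM": pipeline["ID/EX"],
        "MEM/WB": pipeline["EX/MEM"],
    }


if __name__ == "__main__":
    pipeline = {"ID/EX": None, "EX/MEM": None, "MEM/WB": None}
    for decoded in (LW, R_TYPE, R_TYPE):
        pipeline = advance(pipeline, decoded)
    # After three edges the lw bundle has reached MEM/WB.
    print(pipeline["MEM/WB"].wb_bits())
```

Keeping the bundle next to the data in each pipeline register is what lets RegWrite and MemtoReg for lw be applied several cycles after the instruction was decoded.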
Datapath with Control
IF: lw $10, 9($1)
[Figure: pipelined datapath with the control unit. The EX, M, and WB control fields are stored in ID/EX, M and WB in EX/MEM, and WB in MEM/WB. Signals shown: PCSrc, Branch, RegWrite, ALUSrc, MemWrite, MemRead, MemtoReg, ALUOp, RegDst]

Datapath with Control
IF: sub $11, $2, $3
ID: lw $10, 9($1)
Control unit output for “lw”: WB = 11, M = 010, EX = 0001
[Figure: same datapath with control]

Datapath with Control
IF: and $12, $4, $5
ID: sub $11, $2, $3
EX: lw $10, 9($1)
Control unit output for “sub”: WB = 10, M = 000, EX = 1100
[Figure: same datapath with control; the lw control values have moved into ID/EX]

Datapath with Control
IF: or $13, $6, $7
ID: and $12, $4, $5
EX: sub $11, $2, $3
MEM: lw $10, 9($1)
Control unit output for “and”: WB = 10, M = 000, EX = 1100
[Figure: same datapath with control]

Datapath with Control
IF: add $14, $8, $9
ID: or $13, $6, $7
EX: and $12, $4, $5
MEM: sub $11, ..
WB: lw $10, 9($1)
Control unit output for “or”: WB = 10, M = 000, EX = 1100
[Figure: same datapath with control]

Datapath with Control
IF: xxxx
ID: add $14, $8, $9
EX: or $13, $6, $7
MEM: and $12…
WB: sub $11, ..
Control unit output for “add”: WB = 10, M = 000, EX = 1100
[Figure: same datapath with control]

Datapath with Control
IF: xxxx
ID: xxxx
EX: add $14, $8, $9
MEM: or $13, ..
WB: and $12…
[Figure: same datapath with control]

Datapath with Control
IF: xxxx
ID: xxxx
EX: xxxx
MEM: add $14, ..
WB: or $13…
[Figure: same datapath with control]

Datapath with Control
IF: xxxx
ID: xxxx
EX: xxxx
MEM: xxxx
WB: add $14..
[Figure: same datapath with control]
Dependencies
• Problem with starting (or executing) the next instruction before the first has finished
• Dependencies incur data and control hazards
Data Hazard - Software Solution
• Data hazards
  • Dependencies that “go backward in time”
• Have the compiler guarantee no hazards?
  • Insert nop (no operation) instructions (“0x00000000” is nop in MIPS)
  • Code scheduling
• Where do we insert the “nops”?
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
• Problem?
  • This really slows us down!
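One way to answer "where do we insert the nops" is the small Python sketch below. It is an illustration, not the course's method: the function insert_nops and the parameter gap are hypothetical, forwarding is assumed to be absent, and gap is the assumed number of instructions that must separate a producer from a consumer (2 if the register file is written in the first half of a cycle and read in the second half, 3 if everything happens on the rising edge).

```python
# Each instruction is (text, destination register or None, source registers).
NOP = ("nop", None, ())

PROGRAM = [
    ("sub $2, $1, $3", "$2", ("$1", "$3")),
    ("and $12, $2, $5", "$12", ("$2", "$5")),
    ("or $13, $6, $2", "$13", ("$6", "$2")),
    ("add $14, $2, $2", "$14", ("$2", "$2")),
    ("sw $15, 100($2)", None, ("$15", "$2")),
]


def insert_nops(program, gap):
    """Insert nops so that `gap` instructions separate producer and consumer.

    Assumes a pipeline without forwarding; `gap` is an assumption about the
    register file timing, not a value taken from the slides.
    """
    out = []
    for text, dest, sources in program:
        needed = 0
        recent = out[max(0, len(out) - gap):]
        # distance = number of instructions already between producer and us
        for distance, (_, prev_dest, _) in enumerate(reversed(recent)):
            if prev_dest is not None and prev_dest in sources:
                needed = max(needed, gap - distance)
        out.extend([NOP] * needed)
        out.append((text, dest, sources))
    return [text for text, _, _ in out]


if __name__ == "__main__":
    for line in insert_nops(PROGRAM, gap=2):
        print(line)
```

For this sequence the nops all land between sub and and; once and has been delayed far enough, the later readers of $2 are already at a safe distance.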
Data Hazard - Pipeline Stalls?
[Figure: multi-cycle pipeline diagram (IM, Reg, DM, Reg) for the sequence below; stall cycles (bubbles) are inserted after sub so that the dependent instructions read $2 only after it has been written]
    sub $2, $1, $3
    and $12, $2, $5
    or $13, $6, $2
    add $14, $2, $2
    sw $15, 100($2)
Data Hazard - Forwarding
• Use temporary results; don’t wait for them to be written
  • Register file forwarding to handle a read and a write to the same register in the same cycle
  • ALU forwarding
• OK, then do we have to do register file forwarding?
  • If you are asked to design the CPU using only the rising edge of the clock, then yes: a value written in WB is not yet visible to a read in ID during the same cycle
    • Let’s stick to this for our project
  • If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then no: the read already returns the new value
    • Our textbook follows this
Forwarding (simplified)
[Figure: simplified datapath with the ID/EX, EX/MEM, and MEM/WB pipeline registers, register file, ALU, data memory, and write-back MUX]

Forwarding (from EX/MEM)
[Figure: same datapath with MUXes at the ALU inputs; the value held in EX/MEM is routed back to them]

Forwarding (from MEM/WB)
[Figure: same datapath with MUXes at the ALU inputs; the value held in MEM/WB is routed back to them]

Forwarding (operand selection)
[Figure: same datapath; a forwarding unit controls the MUXes at the ALU inputs]

Forwarding (operand propagation)
[Figure: same datapath; Rs, Rt, and Rd are carried along in the pipeline registers, and the forwarding unit takes Rs, Rt, EX/MEM Rd, and MEM/WB Rd as inputs]
Forwarding
[Figure: pipelined datapath with control and a forwarding unit. IF/ID.RegisterRs, IF/ID.RegisterRt, and IF/ID.RegisterRd are passed into ID/EX as Rs, Rt, and Rd; the forwarding unit takes Rs, Rt, EX/MEM.RegisterRd, and MEM/WB.RegisterRd as inputs and drives the MUXes at the ALU inputs]
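The decision the forwarding unit makes for one ALU operand can be sketched as follows. The slides show only the unit's inputs, so the conditions and the 2-bit encoding here (00 register file, 10 from EX/MEM, 01 from MEM/WB) are assumptions following the common textbook formulation, and forward_select is a hypothetical name.

```python
def forward_select(id_ex_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Return the mux select for one ALU operand (Rs or Rt held in ID/EX).

    0b00: use the value read from the register file
    0b10: forward the ALU result held in EX/MEM
    0b01: forward the value held in MEM/WB
    Register $0 is never forwarded because it always reads as zero.
    """
    # The most recent producer (EX/MEM) takes priority over the older one.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_reg:
        return 0b10
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_reg:
        return 0b01
    return 0b00


if __name__ == "__main__":
    # sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX (Rs = 2).
    print(forward_select(2, ex_mem_reg_write=1, ex_mem_rd=2,
                         mem_wb_reg_write=0, mem_wb_rd=0))
    # One cycle later sub is in WB while or $13, $6, $2 is in EX (Rt = 2).
    print(forward_select(2, ex_mem_reg_write=1, ex_mem_rd=12,
                         mem_wb_reg_write=1, mem_wb_rd=2))
```

The same function would be evaluated twice per cycle, once with ID/EX Rs and once with ID/EX Rt, to drive the two ALU input MUXes.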
Can't always forward
• lw (load word) can still cause a hazard
  • An instruction tries to read a register immediately after a load instruction that writes to the same register
• Thus, we need a hazard detection unit to “stall” the pipeline after the load instruction
Stalling
• We can stall the pipeline by keeping an instruction in the same stage
[Figure: pipeline diagram in which the instructions in ID and IF repeat their stage for one extra cycle]
Hazard Detection Unit
• Stall by letting an instruction that won’t write anything (a bubble) go forward
• Stall the pipeline if the instruction in ID/EX is a load and (ID/EX.rt = IF/ID.rs or ID/EX.rt = IF/ID.rt)
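The stall condition can be written as a short predicate. The sketch below is illustrative only: load_use_hazard and next_controls are hypothetical names, and the outputs pc_write and if_id_write are an assumed way of "keeping an instruction in the same stage" that the slides do not name.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True if the instruction in ID must wait for the load in EX."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def next_controls(hazard, decoded_controls):
    """Return the (assumed) stall outputs for the next clock edge.

    On a hazard the PC and IF/ID are held, and all control signals entering
    ID/EX are zeroed so that a do-nothing bubble moves forward.
    """
    if hazard:
        bubble = {name: 0 for name in decoded_controls}
        return {"pc_write": 0, "if_id_write": 0, "id_ex_controls": bubble}
    return {"pc_write": 1, "if_id_write": 1,
            "id_ex_controls": dict(decoded_controls)}


if __name__ == "__main__":
    # lw $2, 20($1) is in EX (rt = 2); and $4, $2, $5 is in ID (rs = 2, rt = 5).
    hazard = load_use_hazard(id_ex_mem_read=1, id_ex_rt=2, if_id_rs=2, if_id_rt=5)
    print(hazard)
    print(next_controls(hazard, {"reg_write": 1, "mem_read": 0, "mem_write": 0}))
```

After the one-cycle stall, the loaded value can be forwarded from MEM/WB, so a single bubble is enough for this case.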
Control Hazards - Branch
• When we decide to branch, other instructions are in the pipeline!
• Assume the branch is not taken
  • When this assumption fails, flush 3 instructions (the branch is decided in the MEM stage, so three younger instructions have already been fetched)
• We are predicting “branch not taken”
  • Need to add hardware for flushing instructions if we are wrong
Alleviate Branch Hazards
• Move the branch compare to the ID stage of the pipeline
• Add an adder to calculate the branch target in the ID stage
• Add an IF.Flush signal that zeros (squashes) the instruction in the IF/ID pipeline register
• Reduces the penalty to 1 cycle
[Figure: pipeline diagram (IF, ID, EX, MEM, WB) for the sequence below, with annotations marking where the taken target address is known and where the actual condition is generated; the add after the branch becomes a bubble]
    beq $1,$2,L1
    add $1,$2,$3
    …
L1: sub $1,$2,$3
Flushing Instructions
[Figure: pipelined datapath with IF.Flush, the hazard detection unit, the control unit, a register comparator (=) and branch-target adder in the ID stage, and the forwarding unit]

Flushing Instructions (cycle N)
    beq $1, $3, L2
    and $12, $2, $5
    or $13, $12, $1
    …
L2: lw $4, 40($7)
Stage labels in the figure: beq $1, $3, L2 and and $12, $2, $5
[Figure: same datapath]

Flushing Instructions (cycle N)
Same code and stage labels as the previous slide; the figure additionally shows L2 at the PC
[Figure: same datapath]

Flushing Instructions (cycle N+1)
Same code; stage labels in the figure: lw $4, 40($7), beq $1, $3, L2, and nop
[Figure: same datapath]
Improving Performance
• Try to avoid stalls! E.g., reorder these instructions:
    lw $t0, 0($t1)
    lw $t2, 4($t1)
    sw $t2, 0($t1)
    sw $t0, 4($t1)
• Add a “branch delay slot”
  • The next instruction after a branch is always executed
  • Rely on the compiler to “fill” the slot with something useful
• Superscalar
  • Start more than one instruction in the same cycle
• Almost all processors are now pipelined and superscalar
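The effect of reordering can be checked with a small stall counter. This is a sketch under assumptions: load_use_stalls is a hypothetical helper, a stall is counted whenever an instruction reads a register loaded by the immediately preceding lw (including the data register of a sw), and the reordered sequence shown is one possible answer (swapping the two stores, which write different addresses), not necessarily the one intended on the slide.

```python
# Each instruction is (opcode, destination register or None, source registers).
ORIGINAL = [
    ("lw", "$t0", ("$t1",)),          # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),          # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),     # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),     # sw $t0, 4($t1)
]

# Assumed reordering: the two stores are swapped.
REORDERED = [
    ("lw", "$t0", ("$t1",)),          # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),          # lw $t2, 4($t1)
    ("sw", None, ("$t0", "$t1")),     # sw $t0, 4($t1)
    ("sw", None, ("$t2", "$t1")),     # sw $t2, 0($t1)
]


def load_use_stalls(program):
    """Count instructions that use the result of the lw directly before them."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "lw" and dest in current[2]:
            stalls += 1
    return stalls


if __name__ == "__main__":
    print(load_use_stalls(ORIGINAL), load_use_stalls(REORDERED))
```

In the original order sw $t2 directly follows the load of $t2; after the swap, one independent instruction sits between every load and its first use.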
Dynamic Scheduling
• The hardware performs the “scheduling”
  • Hardware tries to find instructions to execute
  • Out-of-order (OOO) execution is possible
  • Speculative execution and dynamic branch prediction
• All modern processors are very complicated
  • DEC Alpha 21264: 9 stage pipeline, 6 instruction issue
  • PowerPC and Pentium: branch history table
  • Compiler technology is important
• This class has given you the background you need to learn more
Exceptions & Interrupts
• The CPU has to prepare for all possible situations it could face
• “Unexpected” events require a change in the flow of control
• Exceptions arise within the CPU
  • Undefined opcode
  • Arithmetic overflow in MIPS
    • Some other architectures (such as x86 and ARM) do not generate an exception on arithmetic overflow; instead, they set bits of the flag register inside the CPU
• Interrupts come from external I/O devices
  • Keyboard, mouse, network card, etc.
• Many architectures and authors do not distinguish between interrupts and exceptions
  • They often use the term “interrupt” to refer to both types of events
Pipelined Performance Example
• Ideally CPI = 1
• But we need to handle stalling (caused by loads and branches)
• SPECINT2000 benchmark:
  • 25% loads
  • 10% stores
  • 11% branches
  • 2% jumps
  • 52% R-type
• Suppose
  • 40% of loads are used by the next instruction
  • 25% of branches are mispredicted
• What is the average CPI?
Pipelined Performance Example
• SPECINT2000 benchmark:
  • 25% loads
  • 10% stores
  • 11% branches
  • 2% jumps
  • 52% R-type
• If there is no stall in the pipelined MIPS, how would you calculate CPI?
  • Average CPI = (0.25)(1) + (0.10)(1) + (0.11)(1) + (0.02)(1) + (0.52)(1) = 1
• Suppose
  • 40% of loads are used by the next instruction
  • 25% of branches are mispredicted
  • All jumps flush the next instruction
• What is the average CPI?
  • Load/branch CPI = 1 when not stalling, 2 when stalling. Thus
    • CPI_lw = 1(0.6) + 2(0.4) = 1.4
    • CPI_beq = 1(0.75) + 2(0.25) = 1.25
    • CPI_jump = 2(1) = 2
  • Average CPI = (0.25)(1.4) + (0.10)(1) + (0.11)(1.25) + (0.02)(2) + (0.52)(1) = 1.1475 ≈ 1.15
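The same arithmetic can be reproduced with a few lines of Python, which also makes it easy to try other mixes. The instruction mix and penalties are the ones given above; the names MIX and average_cpi and the parameter names are made up for this sketch, and each stall is assumed to cost exactly one extra cycle.

```python
# Instruction mix from the SPECINT2000 example above (fractions of all instructions).
MIX = {"load": 0.25, "store": 0.10, "branch": 0.11, "jump": 0.02, "r_type": 0.52}


def average_cpi(mix, load_use_fraction, mispredict_fraction, jump_penalty=1):
    """Weighted average CPI, assuming each stall or flush costs one cycle."""
    cpi = {
        "load": 1 + load_use_fraction,       # 1 cycle, plus 1 when the result is used next
        "store": 1.0,
        "branch": 1 + mispredict_fraction,   # 1 cycle, plus 1 on a misprediction
        "jump": 1 + jump_penalty,            # every jump flushes the next instruction
        "r_type": 1.0,
    }
    return sum(mix[kind] * cpi[kind] for kind in mix)


if __name__ == "__main__":
    # Approximately 1.15 for the numbers in the example.
    print(average_cpi(MIX, load_use_fraction=0.40, mispredict_fraction=0.25))
    # With no stalls at all the result is approximately 1.
    print(average_cpi(MIX, 0.0, 0.0, jump_penalty=0))
```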
Pipelined Performance
• Critical path of the pipelined MIPS processor:
    Tc = max {
        tpcq + tmem + tsetup,                             // IF stage
        2(tRFread + tmux + teq + tAND + tmux + tsetup),   // ID stage
        tpcq + tmux + tmux + tALU + tsetup,               // EX stage
        tpcq + tmemwrite + tsetup,                        // MEM stage
        2(tpcq + tmux + tRFwrite)                         // WB stage
    }
• Where does this “2” come from?
  • If you are asked to design the CPU using only the rising edge of the clock, then? The register file read and write each get a full cycle
    • Let’s stick to this for our project
  • If the register file write occurs in the first half of the clock cycle and the read occurs in the second half, then? Each of them has to fit into half a cycle, hence the factor of 2
    • Our textbook follows this
Pipelined Performance Example
Tc = 2(tRFread + tmux + teq + tAND + tmux + tsetup)
   = 2[150 + 25 + 40 + 15 + 25 + 20] ps
   = 550 ps
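As a quick check of this calculation, the ID-stage term can be evaluated in Python. The delay values are the ones in the example; the dictionary DELAYS_PS and the function id_stage_cycle_time are hypothetical names, and the sketch takes the ID stage to be the critical path, as the example does, because delays for the other stages are not listed here.

```python
# Element delays in picoseconds, taken from the example above.
DELAYS_PS = {
    "tRFread": 150,
    "tmux": 25,
    "teq": 40,
    "tAND": 15,
    "tsetup": 20,
}


def id_stage_cycle_time(delays):
    """Cycle time implied by the ID stage when it must fit in half a cycle."""
    half_cycle_path = (
        delays["tRFread"]
        + delays["tmux"]
        + delays["teq"]
        + delays["tAND"]
        + delays["tmux"]
        + delays["tsetup"]
    )
    return 2 * half_cycle_path


if __name__ == "__main__":
    tc_ps = id_stage_cycle_time(DELAYS_PS)
    print(tc_ps, "ps")
    # Corresponding clock frequency in GHz (1000 / Tc when Tc is in ps).
    print(1000 / tc_ps, "GHz")
```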