This presentation discusses the limitations of static instruction scheduling and explores the benefits of dynamic instruction scheduling. It also examines how hardware can overcome these limitations.
CS3014: Concurrent Systems. Static & Dynamic Instruction Scheduling. Slides originally developed by Drew Hilton, Amir Roth, Milo Martin and Joe Devietti at University of Pennsylvania
Instruction Scheduling & Limitations
Instruction Scheduling
Can Hardware Overcome These Limits?
Example: In-Order Limitations #1
Example: In-Order Limitations #2
Out-of-Order to the Rescue
Out-of-Order Pipeline
In-order front end: Fetch ➜ Decode ➜ Rename ➜ Dispatch
Buffer of instructions
Out-of-order execution: Issue ➜ Reg-read ➜ Execute ➜ Writeback
In-order commit: Commit
Out-of-Order Execution
Dependence types
Step #1: Register Renaming
Out-of-order Pipeline
In-order front end: Fetch ➜ Decode ➜ Rename ➜ Dispatch
Buffer of instructions
Out-of-order execution: Issue ➜ Reg-read ➜ Execute ➜ Writeback
In-order commit: Commit
Notes: Have unique register names. Now put into out-of-order execution structures.
Step #2: Dynamic Scheduling
Diagram labels: I$, BP, D, S, insn buffer, regfile, D$
Instructions in the insn buffer (over time): add p2,p3➜p4; sub p2,p4➜p5; mul p2,p5➜p6; div p4,4➜p7
Dynamic Scheduling/Issue Algorithm
Register Renaming
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Map table: r1=p1, r2=p2, r3=p3, r4=p4, r5=p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ (destination not yet assigned)
Map table: r1=p1, r2=p2, r3=p3, r4=p4, r5=p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6
Map table: r1=p1, r2=p2, r3=p3, r4=p4, r5=p5
Free-list: p6, p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6
Map table: r1=p1, r2=p2, r3=p6, r4=p4, r5=p5
Free-list: p7, p8, p9, p10
CIS 501: Comp. Arch. | Prof. Joe Devietti | Scheduling
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ (destination not yet assigned)
Map table: r1=p1, r2=p2, r3=p6, r4=p4, r5=p5
Free-list: p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Map table: r1=p1, r2=p2, r3=p6, r4=p4, r5=p5
Free-list: p7, p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7
Map table: r1=p1, r2=p2, r3=p6, r4=p7, r5=p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ (destination not yet assigned)
Map table: r1=p1, r2=p2, r3=p6, r4=p7, r5=p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Map table: r1=p1, r2=p2, r3=p6, r4=p7, r5=p5
Free-list: p8, p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8
Map table: r1=p1, r2=p2, r3=p8, r4=p7, r5=p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ (destination not yet assigned)
Map table: r1=p1, r2=p2, r3=p8, r4=p7, r5=p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Map table: r1=p1, r2=p2, r3=p8, r4=p7, r5=p5
Free-list: p9, p10
Renaming example
Original insns: xor r1 ^ r2 ➜ r3; add r3 + r4 ➜ r4; sub r5 - r2 ➜ r3; addi r3 + 1 ➜ r1
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Map table: r1=p9, r2=p2, r3=p8, r4=p7, r5=p5
Free-list: p10
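The renaming walk-through above can be sketched in a few lines of Python. This is a minimal illustration, not the hardware algorithm: the `rename` function, the tuple encoding of instructions, and the use of a `deque` for the free-list are assumptions made for this example. It also ignores returning old physical registers to the free-list at commit, which the example slides do not show.

```python
from collections import deque

def rename(insns, map_table, free_list):
    """Rename (op, src1, src2, dst) tuples in program order (illustrative sketch)."""
    renamed = []
    for op, src1, src2, dst in insns:
        # Sources are looked up BEFORE the destination mapping is updated,
        # so "add r3 + r4 -> r4" reads the old mapping of r4.
        # Immediates (ints) are not in the map table and pass through unchanged.
        p1 = map_table.get(src1, src1)
        p2 = map_table.get(src2, src2)
        # The destination gets the physical register at the head of the free-list.
        pd = free_list.popleft()
        map_table[dst] = pd
        renamed.append((op, p1, p2, pd))
    return renamed

# Initial state from the example: r1..r5 map to p1..p5, p6..p10 are free.
program = [
    ("xor", "r1", "r2", "r3"),
    ("add", "r3", "r4", "r4"),
    ("sub", "r5", "r2", "r3"),
    ("addi", "r3", 1, "r1"),
]
map_table = {f"r{i}": f"p{i}" for i in range(1, 6)}
free_list = deque(f"p{i}" for i in range(6, 11))

renamed = rename(program, map_table, free_list)
for insn in renamed:
    print(insn)
```

After the run, the map table and free-list should match the final slide of the example: r1=p9, r3=p8, r4=p7, and only p10 left free.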
Out-of-order Pipeline
Front end: Fetch ➜ Decode ➜ Rename ➜ Dispatch
Buffer of instructions (reorder buffer)
Execution: Issue ➜ Reg-read ➜ Execute ➜ Writeback
Commit
Notes: Have unique register names. Now put into out-of-order execution structures.
Dispatch
Issue queue entry fields: Insn | Inp1 | R (Ready?) | Inp2 | R (Ready?) | Dst | Bday
Dispatch Steps
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=y, p7=y, p8=y, p9=y
Issue Queue (empty):
Insn | Inp1 | R | Inp2 | R | Dst | Bday
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=n, p7=y, p8=y, p9=y
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=n, p7=n, p8=y, p9=y
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0
add | p6 | n | p4 | y | p7 | 1
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=n, p7=n, p8=n, p9=y
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0
add | p6 | n | p4 | y | p7 | 1
sub | p5 | y | p2 | y | p8 | 2
Dispatch Example
Renamed insns: xor p1 ^ p2 ➜ p6; add p6 + p4 ➜ p7; sub p5 - p2 ➜ p8; addi p8 + 1 ➜ p9
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=n, p7=n, p8=n, p9=n
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0
add | p6 | n | p4 | y | p7 | 1
sub | p5 | y | p2 | y | p8 | 2
addi | p8 | n | --- | y | p9 | 3
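The dispatch example can be mirrored in a short Python sketch. It assumes what the slides show step by step: each instruction gets an issue queue entry, its input ready bits are copied from the ready table, its destination is marked not ready, and it receives an increasing "birthday" (Bday). The `dispatch` function and the dictionary layout are hypothetical names chosen for this illustration; a missing second input (shown as `---` on the slides) is encoded as `None` and treated as always ready.

```python
def dispatch(renamed, ready):
    """Place renamed insns into an issue queue, in program order (illustrative sketch)."""
    queue = []
    for bday, (op, inp1, inp2, dst) in enumerate(renamed):
        entry = {
            "insn": op,
            "inp1": inp1, "r1": ready[inp1],
            # An absent second input ("---" on the slides) counts as ready.
            "inp2": inp2, "r2": True if inp2 is None else ready[inp2],
            "dst": dst,
            "bday": bday,  # age, used later for oldest-first selection
        }
        # The destination is not ready until its producer has issued.
        ready[dst] = False
        queue.append(entry)
    return queue

# Renamed instructions from the example; None stands for "---".
renamed = [
    ("xor", "p1", "p2", "p6"),
    ("add", "p6", "p4", "p7"),
    ("sub", "p5", "p2", "p8"),
    ("addi", "p8", None, "p9"),
]
ready = {f"p{i}": True for i in range(1, 10)}  # all ready before dispatch

queue = dispatch(renamed, ready)
for e in queue:
    print(e)
```

In this sketch `add` and `addi` enter the queue with their first input not ready, because `xor` and `sub` cleared the ready bits of p6 and p8 when they were dispatched earlier.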
Out-of-order pipeline: Issue ➜ Reg-read ➜ Execute ➜ Writeback
Dynamic Scheduling/Issue Algorithm
Issue = Select + Wakeup
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0 (Ready!)
add | p6 | n | p4 | y | p7 | 1
sub | p5 | y | p2 | y | p8 | 2 (Ready!)
addi | p8 | n | --- | y | p9 | 3
Issue = Select + Wakeup
Ready bits: p1=y, p2=y, p3=y, p4=y, p5=y, p6=y, p7=n, p8=y, p9=n
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
xor | p1 | y | p2 | y | p6 | 0
add | p6 | y | p4 | y | p7 | 1
sub | p5 | y | p2 | y | p8 | 2
addi | p8 | y | --- | y | p9 | 3
Note: Content Addressable Memory
Issue
Issue Queue:
Insn | Inp1 | R | Inp2 | R | Dst | Bday
add | p6 | y | p4 | y | p7 | 1
addi | p8 | y | --- | y | p9 | 3
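The select and wakeup steps shown on the issue slides can be sketched as one function per cycle. This is a simplified model under stated assumptions: selection is oldest-first by Bday with a width of 2, and wakeup happens in the same cycle as select (as if every operation had single-cycle latency). The name `issue_cycle` and the dictionary layout are hypothetical; real hardware performs the tag match with a content addressable memory rather than a loop.

```python
def issue_cycle(queue, ready, width=2):
    """One issue cycle: select ready insns, then wake up their dependents (sketch)."""
    # Select: entries with both inputs ready, oldest (smallest bday) first.
    candidates = sorted(
        (e for e in queue if e["r1"] and e["r2"]), key=lambda e: e["bday"]
    )
    selected = candidates[:width]
    for e in selected:
        queue.remove(e)
    # Wakeup: broadcast each selected destination tag to the ready table
    # and to every waiting entry whose input matches the tag.
    for e in selected:
        ready[e["dst"]] = True
        for waiting in queue:
            if waiting["inp1"] == e["dst"]:
                waiting["r1"] = True
            if waiting["inp2"] == e["dst"]:
                waiting["r2"] = True
    return selected

def entry(insn, inp1, r1, inp2, r2, dst, bday):
    return {"insn": insn, "inp1": inp1, "r1": r1,
            "inp2": inp2, "r2": r2, "dst": dst, "bday": bday}

# Issue queue and ready bits as they stand after dispatch in the example.
queue = [
    entry("xor", "p1", True, "p2", True, "p6", 0),
    entry("add", "p6", False, "p4", True, "p7", 1),
    entry("sub", "p5", True, "p2", True, "p8", 2),
    entry("addi", "p8", False, None, True, "p9", 3),
]
ready = {f"p{i}": i <= 5 for i in range(1, 10)}

first = [e["insn"] for e in issue_cycle(queue, ready)]
ready_after_first = dict(ready)
second = [e["insn"] for e in issue_cycle(queue, ready)]
print(first, second)
```

The first cycle selects `xor` and `sub`, which wakes up `add` and `addi`; the second cycle then selects those two, matching the order on the slides.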
OOO execution (2-wide)
Register file: p1=7, p2=3, p3=4, p4=9, p5=6, p6=0, p7=0, p8=0, p9=0
Issue queue: xor (RDY), add, sub (RDY), addi
OOO execution (2-wide)
Register file: p1=7, p2=3, p3=4, p4=9, p5=6, p6=0, p7=0, p8=0, p9=0
Issue queue: add (RDY), addi (RDY)
In the pipeline: xor p1 ^ p2 ➜ p6; sub p5 - p2 ➜ p8
OOO execution (2-wide)
Register file: p1=7, p2=3, p3=4, p4=9, p5=6, p6=0, p7=0, p8=0, p9=0
In the pipeline: xor 7 ^ 3 ➜ p6; sub 6 - 3 ➜ p8; add p6 + p4 ➜ p7; addi p8 + 1 ➜ p9
OOO execution (2-wide)
Register file: p1=7, p2=3, p3=4, p4=9, p5=6, p6=0, p7=0, p8=0, p9=0
In the pipeline: 4 ➜ p6; 3 ➜ p8; add p6 + 9 ➜ p7; addi p8 + 1 ➜ p9
OOO execution (2-wide)
Register file: p1=7, p2=3, p3=4, p4=9, p5=6, p6=4, p7=0, p8=3, p9=0
In the pipeline: 13 ➜ p7; 4 ➜ p9
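The values in the execution example can be checked with a small Python sketch that applies the renamed instructions to the register file, once in the out-of-order issue order from the slides (xor, sub, add, addi) and once in program order. The `execute` helper and the tuple encoding are assumptions made for this illustration; it models only the register values, not pipeline stages or timing.

```python
import operator

# Operation table for the four instructions used in the example.
OPS = {
    "xor": operator.xor,
    "add": operator.add,
    "sub": operator.sub,
    "addi": operator.add,  # second operand is an immediate
}

def execute(insns, regs):
    """Apply (op, src1, src2, dst) tuples to a register file dict, in the given order."""
    for op, a, b, dst in insns:
        va = regs[a]
        vb = regs[b] if isinstance(b, str) else b  # ints are immediates
        regs[dst] = OPS[op](va, vb)
    return regs

# Initial physical register values from the example.
initial = {"p1": 7, "p2": 3, "p3": 4, "p4": 9, "p5": 6,
           "p6": 0, "p7": 0, "p8": 0, "p9": 0}

xor_ = ("xor", "p1", "p2", "p6")
add_ = ("add", "p6", "p4", "p7")
sub_ = ("sub", "p5", "p2", "p8")
addi_ = ("addi", "p8", 1, "p9")

ooo = execute([xor_, sub_, add_, addi_], dict(initial))      # issue order
inorder = execute([xor_, add_, sub_, addi_], dict(initial))  # program order
print(ooo)
```

Both orders produce the same register file, because renaming has already removed the false dependences on r3 and only true dependences constrain the order.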