CSL718: Superscalar Processors
Issue and Despatch
23rd Jan, 2006
Anshul Kumar, CSE IITD

Early proposals/prototypes
(Timeline figure, 1982 to 1989)
• IBM: Cheetah, America project (4)
• DEC: Multititan project (2)
• Stanford U: Match (2), Torch (4)
• Kyushu U: SIMP (4), DSNS (4)
• Introduction of the term "superscalar" is marked on the timeline

Commercial superscalars: RISCs
• Intel 960KA/KB 960CA (3) 1989
• IBM Power 1 RS/6000 (4) 1990
• HP PA7000 PA7100 (2) 1992
• SUN SPARC SuperSparc (3) 1992
• DEC Alpha 21064 (2) 1992
• Motorola MC88100 MC88110 (2) 1993
• Motorola PowerPC 601/603 (3) 1993
• MIPS R4000 R8000 (4) 1994

Commercial superscalars: CISCs
• Intel 80486 Pentium (2) 1993
• Motorola MC68040 MC68060 (2) 1993
• Gmicro Gmicro/100p Gmicro 500 (2) 1993
• AMD K5 (2) – 4 RISC instr 1995
• CYRIX M1 (2) 1995

Tasks of superscalar processing
• Parallel decoding and issue
• Parallel instruction execution
• Preserving the sequential consistency of instruction execution and exception processing

Superscalar decode and issue
• Scalar issue: I-cache -> instruction buffer -> decode & issue (pipeline stages: IF, D/I)
• Superscalar issue: I-cache -> instruction buffer -> decode & issue (pipeline stages: IF, D, I)

Parallel Decoding
• Fetch multiple instructions into the instruction buffer
• Decode multiple instructions in parallel – instruction window
• Possibly check dependencies among these as well as with the instructions already under execution

Pre-decoding
• Do partial decoding while instructions are being loaded into the I-cache
• Decoded information is appended to the instruction
• This includes instruction class, resources required, etc.
(Figure: second level cache or main memory -> N bits/cycle -> pre-decode unit -> N + n bits/cycle -> I-cache)

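The pre-decode step above can be sketched as follows. This is only an illustration: the 32-bit instruction width, the 2-bit tag, the `InstrClass` categories and the opcode ranges in `classify` are all assumptions made for the example, not the encoding of any processor in these slides.

```python
from enum import IntEnum


class InstrClass(IntEnum):
    """Hypothetical instruction classes recorded by the pre-decode unit."""
    ALU = 0
    LOAD_STORE = 1
    BRANCH = 2
    FP = 3


def classify(word):
    """Toy partial decode: the top 6 bits of a 32-bit word pick the class.

    The opcode ranges are invented for this sketch.
    """
    opcode = (word >> 26) & 0x3F
    if opcode < 0x10:
        return InstrClass.ALU
    if opcode < 0x20:
        return InstrClass.LOAD_STORE
    if opcode < 0x30:
        return InstrClass.BRANCH
    return InstrClass.FP


def predecode(word, width=32):
    """Append the class bits above the instruction: N bits in, N + n bits out."""
    return (int(classify(word)) << width) | word


def instr_class(tagged, width=32):
    """What the decoder reads back from the I-cache without re-decoding."""
    return InstrClass(tagged >> width)


word = (0x25 << 26) | 0x123      # opcode 0x25 falls in the BRANCH range
tagged = predecode(word)
print(instr_class(tagged).name)  # BRANCH
```

The point of the design is that the classification work is done once, on the slower path into the I-cache, so the decode stage only has to read the appended bits.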
Number of Pre-decode bits
Processor             No. of pre-decode bits
PA 7200 (1995)        5
PA 8000 (1996)        5
PowerPC 620 (1996)    7
UltraSparc (1995)     4
HAL PM1 (1995)        4
AMD K5 (1995)         5 (per byte)
R 10000 (1996)        4

Issue vs Dispatch
• Blocking issue
  – Decode and issue to EU
  – Instructions may be blocked due to data dependency
• Non-blocking issue
  – Decode and issue to buffer
  – From buffer, dispatch to EU
  – Instructions are not blocked due to data dependency

Blocking Issue
(Figure: instruction buffer with issue window -> decode, check & issue -> EU, EU, EU)

Non-blocking (shelved) Issue
(Figure: instruction buffer -> decode & issue -> three reservation stations, each followed by dependency checking/dispatch -> EU, EU, EU)

Handling of Issue Blockages
• Preserving issue order: in-order / out of order
• Alignment of instruction issue: aligned / unaligned

Issue Order
• Issue in strict program order
  – Issue window (instructions to be issued): e d c b a
  – Instructions issued: a
• Out of order issue
  – Issue window (instructions to be issued): e d c b a
  – Instructions issued: c a
Example: MC 88110, PowerPC 601
(Figure legend: independent instruction, dependent instruction, issued instruction)

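A minimal sketch of the two issue orders, assuming a five-instruction window in which only a and c are independent and an issue rate of 2. Both values are assumptions chosen so that the output matches the instructions issued on the slide; the function name `issue` is hypothetical.

```python
def issue(window, ready, in_order=True, issue_rate=2):
    """Pick the instructions to issue this cycle.

    window: instruction names, oldest first.
    ready:  set of names that have no unresolved dependency.
    """
    issued = []
    for instr in window:
        if len(issued) == issue_rate:
            break
        if instr in ready:
            issued.append(instr)
        elif in_order:
            # In-order issue: a dependent instruction blocks all younger ones.
            break
        # Out-of-order issue: skip the dependent instruction and keep looking.
    return issued


window = ["a", "b", "c", "d", "e"]
ready = {"a", "c"}                            # assumed: b, d, e are dependent
print(issue(window, ready, in_order=True))    # ['a']
print(issue(window, ready, in_order=False))   # ['a', 'c']
```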
Alignment
                      Aligned issue              Unaligned issue
                      (fixed window, then        (gliding window)
                      next window)
checked in cycle 1    h g f e d c b a            h g f e d c b a
issued in cycle 1     a                          a
checked in cycle 2    h g f e d c b              h g f e d c b
issued in cycle 2     c b                        c b
checked in cycle 3    h g f e d                  h g f e d
issued in cycle 3     d                          f e d

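The difference between the two window policies can be sketched with a small simulation. The window size of 4, in-order issue within the window, one-cycle result latency and the dependencies (b on a, d on c, g on f) are all assumptions, picked only because they reproduce the issue sequence listed above; the slide itself does not state them.

```python
def simulate(program, deps, window_size=4, aligned=True, cycles=3):
    """Return the list of instructions issued in each cycle.

    program: instruction names in program order.
    deps:    maps an instruction to the one producer it depends on.
    An instruction may issue only if its producer issued in an earlier cycle.
    """
    issued_before = set()  # instructions issued in earlier cycles
    pos = 0                # index of the oldest unissued instruction
    window_start = 0       # start of the current fixed window (aligned case)
    trace = []
    for _ in range(cycles):
        if aligned:
            # Move to the next window only when the fixed window is drained.
            if pos >= window_start + window_size:
                window_start = pos
            end = window_start + window_size
        else:
            # Gliding window: always window_size instructions from pos.
            end = pos + window_size
        now = []
        for instr in program[pos:end]:
            producer = deps.get(instr)
            if producer is not None and producer not in issued_before:
                break  # in-order issue stops at the first blocked instruction
            now.append(instr)
        issued_before.update(now)
        pos += len(now)
        trace.append(now)
    return trace


program = list("abcdefgh")
deps = {"b": "a", "d": "c", "g": "f"}          # assumed dependencies
print(simulate(program, deps, aligned=True))   # [['a'], ['b', 'c'], ['d']]
print(simulate(program, deps, aligned=False))  # [['a'], ['b', 'c'], ['d', 'e', 'f']]
```

In cycle 3 the aligned scheme can only issue d, because e and f lie outside the fixed window, while the gliding window has already been refilled.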
Design choices in instruction issue
• Coping with false data dependencies: no / register renaming
• Coping with unresolved control dependencies: wait / speculative
• Use of shelving: blocking / shelved
• Handling of issue blockages
• Issue rate (2-6)

Frequently used issue policies in scalar processors
• Traditional scalar issue: i386, MC68030, R3000, Sparc
• Traditional scalar issue with shelving: CDC 6600
• Traditional scalar issue with shelving and renaming: IBM 360/91
• Traditional scalar issue with spec. execution: i486, MC68040, R4000, MicroSparc

Frequently used issue policies in superscalar processors
(speculative execution in all)
• Straightforward superscalar issue (aligned / unaligned)
• Straightforward superscalar issue with shelving
• Straightforward superscalar issue with renaming
• Advanced superscalar issue (renaming + shelving)
Examples, in the order listed on the slide: R10000, PentiumPro, PowerPC602, PA8000, Sparc64, Am29000, K5, MC88110, R8000, MC68060, PA7200, UltraSparc, Pentium, PowerPC601, PA7100, SuperSparc, Alpha21164, PowerPC602

Frequently used issue policies
• Traditional scalar issue
• Traditional scalar issue with spec. execution
• Straightforward superscalar issue (aligned / unaligned)
• Advanced superscalar issue

Design Space of Shelving
• Scope of shelving: partial / full
• Layout of shelving buffers
• Operand fetch policy
• Instruction dispatch scheme

Layout of Shelving Buffers
• Type of the shelving buffers
  – Stand alone (RS)
  – Combined with renaming and reordering
• Number of shelving buffer entries
  – individual: 2-4
  – group: 6-16
  – central: 20
  – total: 15-40
• Number of read and write ports
  – depends on no. of EUs connected

Reservation Stations (RS)
(Figure: three layouts of RSs feeding EUs)
• Individual RSs
• Group RSs
• Central RS

Combined Buffer (for Shelving, Renaming, Reordering)
• DRIS: Deferred scheduling, Register renaming and Instruction Shelving
(Figure: from decode/issue -> DRIS -> EU, EU)

Operand Fetch Policies
• Issue bound fetch
• Dispatch bound fetch

Issue bound operand fetch (with single register file)
(Figure: decode/issue -> RF -> RSs -> EUs; operands are read from the RF at issue and held in the RSs with the instructions. Legend: instruction path, data path)

Dispatch bound operand fetch (with single register file)
(Figure: decode/issue -> RSs -> RF -> EUs; operands are read from the RF at dispatch. Legend: instruction path, data path)

Issue bound operand fetch (with multiple register files)
(Figure: decode/issue -> RF, RF -> RSs -> EUs. Legend: instruction path, data path)

Dispatch bound operand fetch (with multiple register files)
(Figure: decode/issue -> RSs -> RF, RF -> EUs. Legend: instruction path, data path)

Updating RFs and RSs
(Figure: decode/issue, two RFs, four RSs and four EUs, with the paths that update the RFs and the RSs. Legend: instruction path, data path)

Instruction dispatch scheme
• Dispatch policy
• Dispatch rate
  – single instr/cycle: individual RS
  – multiple instr/cycle: group or central RS
• Checking operand availability
• Treatment of empty RS

Dispatch policy
• Selection rule: rule for identifying instructions which are ready for execution (data dependency check)
• Arbitration rule: rule for choosing one out of several ready instructions (earlier instruction has priority)
• Dispatch order

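The selection and arbitration rules can be sketched as two small functions over a reservation station. The `Entry` layout (a program-order sequence number plus one valid flag per source operand) and the function names are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical reservation station entry."""
    seq: int          # position in program order (smaller = earlier)
    opcode: str
    src_valid: tuple  # one flag per source operand


def select_ready(rs):
    """Selection rule: an entry is ready when all its source operands are valid."""
    return [e for e in rs if all(e.src_valid)]


def arbitrate(ready):
    """Arbitration rule: among ready entries, the earliest instruction wins."""
    return min(ready, key=lambda e: e.seq) if ready else None


def dispatch(rs):
    """Dispatch at most one instruction from the reservation station."""
    chosen = arbitrate(select_ready(rs))
    if chosen is not None:
        rs.remove(chosen)
    return chosen


rs = [
    Entry(3, "add", (True, True)),
    Entry(1, "mul", (True, False)),  # still waiting for its second operand
    Entry(2, "sub", (True, True)),
]
print(dispatch(rs).opcode)  # sub
print(dispatch(rs).opcode)  # add
print(dispatch(rs))         # None
```

Here mul is the earliest instruction but is not ready, so it is passed over; this corresponds to out of order dispatch from a single RS.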
Dispatch order
• in-order
• partially out of order
• out of order
(Figure: two RSs, each with a "check" marker)

Checking availability of operands
• Direct check of score-board bits (usual for dispatch bound operand fetch): control flow approach
• Check of explicit status bits in RS (usual for issue bound operand fetch): data flow approach
(control flow / data flow: Flynn's terminology)

Score-board
• Introduced with CDC6600
(Figure: register file with a data field and a one-bit status field per register)

Checking in dispatch bound fetch
• Decoded instruction (Rs1, Rs2, Rd): reset V bit of Rd in the register file
• Reservation station entry: OC, Rs1, Rs2, Rd (OC = opcode)
• Before dispatch: check V bits of the sources
• Register file supplies Os1, Os2 (operand values); OC, Os1, Os2 go to the EU
• EU returns result, Rd: update Rd, set V bit

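The steps above can be sketched as follows. This is a simplified model under stated assumptions: the class and function names are hypothetical, there is one reservation station, and the sketch ignores the cases where Rd is also a source register or where two in-flight instructions write the same Rd.

```python
class RegisterFile:
    """Register values plus one score-board (V) bit per register."""

    def __init__(self, num_regs):
        self.value = [0] * num_regs
        self.valid = [True] * num_regs


def issue(rf, rs, oc, rs1, rs2, rd):
    """Shelve register identifiers only; reset the V bit of Rd."""
    rs.append((oc, rs1, rs2, rd))
    rf.valid[rd] = False


def try_dispatch(rf, rs):
    """Check the V bits of the sources, then fetch operands at dispatch."""
    for entry in rs:
        oc, rs1, rs2, rd = entry
        if rf.valid[rs1] and rf.valid[rs2]:
            rs.remove(entry)
            return oc, rf.value[rs1], rf.value[rs2], rd
    return None


def writeback(rf, rd, result):
    """EU result: update Rd and set its V bit."""
    rf.value[rd] = result
    rf.valid[rd] = True


rf, rs = RegisterFile(8), []
rf.value[1], rf.value[2] = 5, 7
issue(rf, rs, "add", 1, 2, 3)   # r3 = r1 + r2
issue(rf, rs, "mul", 3, 1, 4)   # r4 = r3 * r1, depends on the add
print(try_dispatch(rf, rs))     # ('add', 5, 7, 3)
print(try_dispatch(rf, rs))     # None: V bit of r3 is still reset
writeback(rf, 3, 12)
print(try_dispatch(rf, rs))     # ('mul', 12, 5, 4)
```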
Checking in issue bound fetch
• Decoded instruction (Rs1, Rs2, Rd): reset V bit of Rd in the register file
• Register file supplies Os1, Os2 (operand values) to the reservation station
• Reservation station entry: OC, Os1/Is1, Vs1, Os2/Is2, Vs2, Rd
• Before dispatch: check Vs1, Vs2; OC, Os1, Os2, Rd go to the EU
• EU returns result, Rd:
  – register file: update Rd, set V bit
  – reservation station: associative update of Is1, Is2 with Rd, set Vs bits

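A minimal sketch of the issue bound scheme, for contrast with dispatch bound fetch. The names `RSEntry`, `issue`, `writeback` and `try_dispatch` are hypothetical. The sketch uses the register number itself as the identifier Is, and it assumes no two in-flight instructions write the same Rd; a real design would need renaming or tags to lift that restriction.

```python
from dataclasses import dataclass


class RegisterFile:
    """Register values plus one V bit per register."""

    def __init__(self, num_regs):
        self.value = [0] * num_regs
        self.valid = [True] * num_regs


@dataclass
class RSEntry:
    oc: str
    s1: int    # Os1 (operand value) if v1, else Is1 (register identifier)
    v1: bool
    s2: int    # Os2 if v2, else Is2
    v2: bool
    rd: int


def issue(rf, rs, oc, rs1, rs2, rd):
    """Fetch operands at issue; shelve a value or an identifier per source."""
    def fetch(r):
        return (rf.value[r], True) if rf.valid[r] else (r, False)

    s1, v1 = fetch(rs1)
    s2, v2 = fetch(rs2)
    rs.append(RSEntry(oc, s1, v1, s2, v2, rd))
    rf.valid[rd] = False  # reset V bit of Rd


def writeback(rf, rs, rd, result):
    """Update Rd in the RF and associatively update waiting RS entries."""
    rf.value[rd] = result
    rf.valid[rd] = True
    for e in rs:
        if not e.v1 and e.s1 == rd:
            e.s1, e.v1 = result, True
        if not e.v2 and e.s2 == rd:
            e.s2, e.v2 = result, True


def try_dispatch(rs):
    """Only the status bits inside the RS are checked; the RF is not read."""
    for e in rs:
        if e.v1 and e.v2:
            rs.remove(e)
            return e.oc, e.s1, e.s2, e.rd
    return None


rf, rs = RegisterFile(8), []
rf.value[1], rf.value[2] = 5, 7
issue(rf, rs, "add", 1, 2, 3)   # r3 = r1 + r2
issue(rf, rs, "mul", 3, 1, 4)   # r4 = r3 * r1, shelved with Is1 = 3
print(try_dispatch(rs))         # ('add', 5, 7, 3)
print(try_dispatch(rs))         # None: Vs1 of the mul is not set
writeback(rf, rs, 3, 12)
print(try_dispatch(rs))         # ('mul', 12, 5, 4)
```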
Treatment of an empty RS
• Straightforward approach: at least one cycle stay in RS
• Bypassing RS if empty
Examples listed on the slide: Sparc64, PowerPC 604, Nx586
(Figure: two RS -> EU paths, one per approach)

Approaches in dispatching
• Straightforward: in order, single instr/cycle, individual RSs
  – Power1, PPC603, Nx586, Am29000
• Enhanced: partially out of order, single instr/cycle, individual RSs
  – Power2, PPC604, 620
• Advanced: out of order, multiple instr/cycle, group/central RSs
  – PM1, PentiumPro, PA8000, R10000