Lecture 11: Modern Superscalar Processor Models
Generic Superscalar Models, Issue Queue-based Pipeline, Multiple-Issue Design
Generic Superscalar Processor Models
[Figure: two pipeline diagrams.
Issue queue based: Fetch -> Rename -> Wakeup/select (schedule) -> Regfile -> FU + bypass (execute) -> D-cache -> commit.
Reservation based (already studied): Fetch -> Rename -> Reg + ROB -> Wakeup/select (schedule) -> FU + bypass (execute) -> D-cache -> commit.]
Revised from Palacharla PhD thesis, 1998
Issue Queue Based Pipeline
Fetch -> Rename -> Issue -> Reg-read -> Execute -> Writeback/Commit
Core structure: register mapping table
• Rename: translate architectural registers into physical registers
• Issue: send the instruction to register read and then to execution
• Commit: handle misprediction/exception and update the register mapping
Why study? Used in the Alpha 21264, MIPS R10000, and Intel P4
Compare Reservation Station and Issue Queue
• Pipeline stage sequence
  RS: IF -> REN -> REG/ROB -> SCHD -> …
  IQ: IF -> REN -> SCHD -> REG -> …
• Mapping table vs. status table
  RS: the status table selects either the architectural register or a ROB entry
  IQ: always renames to a physical register
• Register file
  RS: the architectural register file stores the architectural state
  IQ: physical register file only; there is no architectural register file! The mapping table determines the architectural state
Compare Reservation Station and Issue Queue
• Reservation station / issue queue entry
  RS: busy, fu, op, Qj, Qk, Vj, Vk
  IQ: busy, fu, op, Pj, Pk, ReadyJ, ReadyK
• ROB
  RS: stores register values
  IQ: no register contents
Pros and cons of IQ:
• Good: no copying between the ROB and the register file
• Good: efficient use of registers
• Bad: complex mapping table design
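To make the IQ entry fields concrete, here is a minimal Python behavioral sketch (not taken from the lecture) of an issue-queue entry with tag-broadcast wakeup and a simple select. The `dest` field, the functional-unit names, and the "first ready entry in queue order" select policy are assumptions for illustration. Note that the entry holds only physical register tags (Pj, Pk) and ready bits, never operand values, which is why register read comes after scheduling in the IQ pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IQEntry:
    """One issue-queue entry with the IQ fields from the slide.

    `dest` is an assumed extra field (destination physical register);
    the slide does not list it."""
    busy: bool
    fu: str        # functional-unit class, e.g. "alu" (illustrative)
    op: str
    pj: int        # physical register tag of source j
    pk: int        # physical register tag of source k
    ready_j: bool  # has source j been produced?
    ready_k: bool  # has source k been produced?
    dest: int      # hypothetical: destination physical register


def wakeup(queue: List[IQEntry], tag: int) -> None:
    """Broadcast a produced physical register tag; matching sources become ready."""
    for e in queue:
        if e.busy:
            if e.pj == tag:
                e.ready_j = True
            if e.pk == tag:
                e.ready_k = True


def select(queue: List[IQEntry], fu: str) -> Optional[IQEntry]:
    """Issue the first ready entry for `fu` (position-based priority, an assumption).

    Only tags are stored; operand values are read from the physical
    register file in the stage after issue."""
    for e in queue:
        if e.busy and e.fu == fu and e.ready_j and e.ready_k:
            e.busy = False  # entry leaves the queue
            return e
    return None
```

For example, an `add` waiting on P33 cannot be selected until `wakeup(iq, 33)` is broadcast, after which `select(iq, "alu")` returns it.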
Register Mapping Table
Records the mapping from virtual (architectural) registers to physical registers. The mapping is stored in RAM or CAM memories.
Arch reg (virtual) => Phy reg
R1 => P3
R2 => P10
R3 => P6
R4 => P8
R5 => P12
…
Register Renaming Examples
Loop: LW R2, 0(R1)
      ADD R2, R2, 1
      SW R2, 0(R1)
      ADD R1, R1, 4
      BNE R2, R3, LOOP
LW returns 100, R1 = 4000
Renamed dynamic instructions:
…
BNE P2, P3, Loop
LW P32, 0(P1)
ADD P33, P32, 1
SW P33, 0(P1)
ADD P34, P1, 4
BNE P33, P3, LOOP
…
Assume that when the first BNE is renamed, R1-R31 are mapped to P1-P31 and P32-P127 are free. The first BNE may or may not be predicted correctly.
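Register Mapping Status
Mapping after each instruction is renamed, and physical register values when that instruction commits (one possible sequence):
• BNE P2, P3, Loop: R1 => P1, R2 => P2, R3 => P3, R4 => P4, R5 => P5, …; at commit: P1=4000, P2=200, …, P32=100, P33=?, P34=4004
• LW P32, 0(P1): R1 => P1, R2 => P32, R3 => P3, R4 => P4, R5 => P5, …; at commit: P1=4000, P2=200, …, P32=100, P33=101, P34=4004
• ADD P33, P32, 1: R1 => P1, R2 => P33, R3 => P3, R4 => P4, R5 => P5, …; at commit: P1=4000, P2=200, …, P32=100, P33=101, P34=4004
• SW P33, 0(P1): R1 => P1, R2 => P33, R3 => P3, R4 => P4, R5 => P5, … (unchanged); at commit: no change
• ADD P34, P1, 4: R1 => P34, R2 => P33, R3 => P3, R4 => P4, R5 => P5, …; at commit: P1=4000, P2=200, …, P32=100, P33=101, P34=4004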
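The renaming in this example can be reproduced with a small Python sketch. It assumes a RAM-style mapping table (a dict indexed by architectural register) and a FIFO free list; both, and the helper names, are illustrative assumptions rather than any processor's actual structures. Immediates and memory offsets are dropped because they do not affect renaming.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

# (opcode, architectural destination or None, architectural sources).
# Immediates and offsets are omitted (a simplification).
Instr = Tuple[str, Optional[str], List[str]]

LOOP: List[Instr] = [
    ("LW", "R2", ["R1"]),         # LW  R2, 0(R1)
    ("ADD", "R2", ["R2"]),        # ADD R2, R2, 1
    ("SW", None, ["R2", "R1"]),   # SW  R2, 0(R1)
    ("ADD", "R1", ["R1"]),        # ADD R1, R1, 4
    ("BNE", None, ["R2", "R3"]),  # BNE R2, R3, LOOP
]


def initial_state() -> Tuple[Dict[str, str], Deque[str]]:
    """The example's starting point: R1-R31 map to P1-P31, P32-P127 are free."""
    table = {f"R{i}": f"P{i}" for i in range(1, 32)}
    free = deque(f"P{i}" for i in range(32, 128))
    return table, free


def rename(instr: Instr, table: Dict[str, str], free: Deque[str]):
    op, dest, srcs = instr
    # Sources are looked up BEFORE the destination is remapped, so
    # ADD R2, R2, 1 reads the old R2 mapping (P32) and gets a new one (P33).
    psrcs = [table[s] for s in srcs]
    pdest = None
    if dest is not None:
        pdest = free.popleft()  # allocate a free physical register
        table[dest] = pdest     # update the current mapping
    return op, pdest, psrcs
```

Renaming the loop body in order, e.g. `[rename(i, table, free) for i in LOOP]`, yields LW P32 <- P1, ADD P33 <- P32, SW of P33 to 0(P1), ADD P34 <- P1, and BNE P33, P3, matching the renamed sequence above.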
Commit and Rollback
• Successful commit: the next mapping status becomes the committed mapping status, and the physical register previously mapped to the committed instruction's destination is freed
• Misprediction/exception: flush the pipeline and discard all the following mappings
[Figure: the mapping status sequence from the example (R2 => P2, then P32, then P33, then P33 again; R1 => P1 until the last step, where R1 => P34), with a commit point marking the committed mapping and a rename point marking the current mapping. Register values: P1=4000, P2=200, …, P32=100, P33=?, P34=4004]
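A minimal Python sketch of these two rules, assuming one saved mapping status per uncommitted instruction. This is a conceptual model only, not how any particular processor stores its mappings; the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class MappingHistory:
    """Committed mapping plus one mapping status per renamed, uncommitted instruction."""

    def __init__(self, committed: Dict[str, str]):
        self.committed = dict(committed)
        # Program order: (mapping after this instr, old phys of its dest, new phys)
        self.pending: List[Tuple[Dict[str, str], Optional[str], Optional[str]]] = []

    def current(self) -> Dict[str, str]:
        return self.pending[-1][0] if self.pending else self.committed

    def rename(self, dest: Optional[str] = None, new_phys: Optional[str] = None) -> None:
        m = dict(self.current())
        old = None
        if dest is not None:
            old = m[dest]
            m[dest] = new_phys
        self.pending.append((m, old, new_phys if dest is not None else None))

    def commit_oldest(self, free_list: List[str]) -> None:
        m, old, _ = self.pending.pop(0)
        self.committed = m          # next mapping status becomes the committed one
        if old is not None:
            free_list.append(old)   # previous physical register of the dest is dead

    def squash_after(self, index: int, free_list: List[str]) -> None:
        """Misprediction at pending[index]: discard all younger mappings."""
        for _, _, new in self.pending[index + 1:]:
            if new is not None:
                free_list.append(new)  # reclaim speculatively allocated registers
        del self.pending[index + 1:]
```

In the example, squashing after a mispredicted first BNE returns P32 and P33 to the free list and restores R2 => P2. Calling `squash_after(-1, free)` discards every uncommitted mapping, which corresponds to an exception raised by the oldest instruction.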
Program Execution Correctness
• Only committed instructions write to registers and memory?
  Yes, from the programmer's viewpoint: only committed instructions' register outputs become visible.
• Correct data flow is maintained: a child instruction always uses the values from its parents?
  Yes, in renamed form, and unaffected by speculative execution.
• A register/memory location receives the value of the last write?
  Yes, from the programmer's viewpoint: the architectural mapping status is updated in program order.
Note that memory correctness is not affected.
Mapping Table Design – MIPS R10000
[Figure: mapping tables (current mapping, committed mapping) and a branch stack with four entries, each holding the mapping saved at a branch and its alternative PC: (Mapping after Br1, Alternative PC1) … (Mapping after Br4, Alternative PC4).]
RAM-based structure:
• At rename, the mapping is saved automatically, in parallel, on each branch
• On misprediction: restore the saved mapping immediately, flush the pipeline, and restart fetch at the alternative PC
• On commit of a branch instruction: make the corresponding mapping the committed one
• Stall if the branch stack is full
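The branch stack behavior can be sketched in Python as follows. It assumes a four-entry stack (as in the figure), integer branch IDs, and dict copies standing in for the parallel RAM save; the class and method names are hypothetical.

```python
from typing import Dict, List, Tuple


class BranchStack:
    """Shadow copies of the RAM mapping table, one per unresolved branch.

    depth=4 follows the Br1..Br4 entries in the figure."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        # Oldest first: (branch id, mapping saved at the branch, alternative PC)
        self.entries: List[Tuple[int, Dict[str, str], int]] = []

    def save(self, branch_id: int, current: Dict[str, str], alt_pc: int) -> bool:
        """Called when a branch is renamed. Returns False if rename must stall."""
        if len(self.entries) == self.depth:
            return False
        self.entries.append((branch_id, dict(current), alt_pc))  # whole-table copy
        return True

    def mispredict(self, branch_id: int) -> Tuple[Dict[str, str], int]:
        """Restore the mapping saved at this branch; drop it and all younger entries."""
        for i, (bid, saved, alt_pc) in enumerate(self.entries):
            if bid == branch_id:
                del self.entries[i:]
                return dict(saved), alt_pc  # refetch from the alternative PC
        raise KeyError(branch_id)

    def commit(self, branch_id: int) -> Dict[str, str]:
        """Branches commit in order, so the oldest entry must be this branch."""
        bid, saved, _ = self.entries[0]
        if bid != branch_id:
            raise ValueError("branches must commit in program order")
        del self.entries[0]
        return saved  # becomes the committed mapping
```

Because each save copies the whole table at once, recovery is a single restore rather than a walk over individual instructions, which is what makes misprediction recovery fast.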
Mapping Table Design – MIPS R10000
• What about precise exceptions?
• The mapping status cannot be preserved for every instruction
• Solution: record each mapping change in the ROB
• Each ROB entry contains the destination architectural register, the renamed physical register, and the old renamed physical register
• On an exception: roll back the mapping one instruction at a time, four instructions per cycle
• Slow, but how frequent are exceptions? Note that branch mispredictions still recover fast
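The ROB-based rollback can be sketched in Python as below. It assumes the excepting instruction has reached the ROB head, so every ROB entry is squashed, and that every entry (with or without a destination) counts against the four-per-cycle limit. Names are illustrative. Undoing youngest-first matters: when several entries write the same architectural register, the oldest entry's old mapping must be the one left in the table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RobEntry:
    dest_arch: Optional[str]  # destination architectural register (None if no dest)
    new_phys: Optional[str]   # physical register allocated at rename
    old_phys: Optional[str]   # previous mapping of dest_arch


def rollback(mapping: Dict[str, str], rob: List[RobEntry],
             free_list: List[str], width: int = 4) -> int:
    """Undo rename changes youngest-first, `width` ROB entries per cycle.

    `rob` is in program order (oldest first). Returns the cycles spent."""
    cycles = 0
    while rob:
        for _ in range(min(width, len(rob))):
            e = rob.pop()  # youngest remaining entry
            if e.dest_arch is not None:
                mapping[e.dest_arch] = e.old_phys  # restore the older mapping
                free_list.append(e.new_phys)       # reclaim the squashed register
        cycles += 1
    return cycles
```

With the five instructions from the renaming example in the ROB, this model restores R1 => P1 and R2 => P2 in two cycles and frees P32, P33, and P34.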
Mapping Table Design – Alpha 21264
[Figure: a CAM with one row per physical register p0 … pk. Each row holds an Arch. Reg # field plus valid bits, one column of valid bits per mapping (e.g. the committed mapping and the current mapping); a row is selected on "match and valid".]
CAM structure:
• Associative search on the architectural register index; the physical register index is output through an encoder
• One column of valid bits represents one mapping; a column is allocated at rename to each instruction with a register output
• Each destination renaming changes one pair of valid bits
• Fast recovery, even on exceptions
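A behavioral Python sketch of this CAM organization, assuming rows indexed by physical register number and a list of valid-bit vectors as the columns. The class and method names are hypothetical, and the associative search is modeled with a linear scan rather than parallel match lines.

```python
from typing import Dict, List, Optional


class CamMapTable:
    """One CAM row per physical register; each column of valid bits is one mapping."""

    def __init__(self, num_phys: int, initial: Dict[str, int]):
        self.arch: List[Optional[str]] = [None] * num_phys  # Arch. Reg # field per row
        valid = [False] * num_phys
        for a, p in initial.items():
            self.arch[p] = a
            valid[p] = True
        self.columns: List[List[bool]] = [valid]  # columns[0] = committed mapping

    def lookup(self, a: str, col: int = -1) -> int:
        """Associative search on the arch index; the matching row number is the phys reg."""
        v = self.columns[col]
        hits = [p for p, tag in enumerate(self.arch) if tag == a and v[p]]
        if len(hits) != 1:
            raise LookupError(f"expected one match for {a}, got {hits}")
        return hits[0]  # an encoder turns the one-hot match into this index

    def rename_dest(self, a: str, new_p: int) -> int:
        """Allocate a new column: copy the current one and flip one pair of valid bits."""
        v = list(self.columns[-1])
        old_p = self.lookup(a)
        v[old_p] = False
        self.arch[new_p] = a  # new_p is free, so no live column has it valid
        v[new_p] = True
        self.columns.append(v)
        return old_p

    def recover(self, col: int) -> None:
        """Misprediction or exception: make column `col` current again."""
        del self.columns[col + 1:]

    def commit_oldest(self) -> None:
        """The oldest renamed instruction commits: its column becomes the committed one."""
        if len(self.columns) < 2:
            raise RuntimeError("no uncommitted mapping to commit")
        del self.columns[0]
```

Since every renamed instruction keeps its own column, recovery to any instruction boundary is just selecting an older column, which is why this design recovers quickly even on exceptions.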
Multiple Issue Pipelines
If each pipeline stage accepts k instructions per cycle, the processor is k-issue:
• Alpha 21264: 4-issue
• MIPS R10000: 4-issue
• Intel P4: 3-issue
Memory structures must have a number of ports proportional to the issue width!
What if the k instructions at rename have dependences among them? Dependence check logic is needed!
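Dependence Check Logic
[Figure: a rename group of four instructions (Rs0, Rt0, Rd0) … (Rs3, Rt3, Rd3) reads the mapping table in parallel, producing source physical registers for each instruction with no dependence check yet; the destinations receive new physical registers Pd0 … Pd3.]
Does the first instruction's renaming need any change? What changes for the second one? For the third and fourth ones?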
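One way to work through these questions is a Python behavioral sketch of group renaming, assuming a two-source, one-destination format and a list-based free list (the function name and data layout are hypothetical). In this model the first instruction needs no change; the second compares its sources against Rd0; the third against Rd0 and Rd1; the fourth against Rd0 to Rd2, with the youngest matching writer taking priority. Hardware performs these comparisons in parallel with comparators; the nested loop only models the result.

```python
from typing import Dict, List, Optional, Tuple

# (rd, rs, rt): architectural destination (None if none) and two sources
Group = List[Tuple[Optional[str], str, str]]


def rename_group(group: Group, table: Dict[str, str], free: List[str]):
    # Step 1: every instruction reads the mapping table in parallel
    # (the "no dependence check yet" outputs in the figure).
    ps = [table[rs] for _, rs, _ in group]
    pt = [table[rt] for _, _, rt in group]
    pd = [free.pop(0) if rd is not None else None for rd, _, _ in group]
    # Step 2: dependence check. Instruction i compares its sources with
    # Rd0..Rd(i-1); scanning j from oldest to youngest lets the youngest
    # matching writer override the table output.
    for i, (_, rs_i, rt_i) in enumerate(group):
        for j in range(i):
            rd_j = group[j][0]
            if rd_j is None:
                continue
            if rd_j == rs_i:
                ps[i] = pd[j]
            if rd_j == rt_i:
                pt[i] = pd[j]
    # Step 3: write destinations into the table in program order, so when
    # several instructions share an rd the youngest mapping survives.
    for (rd, _, _), p in zip(group, pd):
        if rd is not None:
            table[rd] = p
    return list(zip(pd, ps, pt))
```

For a group where the second instruction reads R2 written by the first, its source tags are replaced with Pd0 instead of the stale table entry; if two group members write R2, a later reader picks the younger writer's Pd.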