Stamatis Vassiliadis Symposium Sept. 28, 2007 J. E. Smith

Future Superscalar Processors Based on Instruction Compounding
Stamatis Vassiliadis Symposium, Sept. 28, 2007
J. E. Smith

Instruction Compounding (Fusing)
Instruction compounding, or “fusing”, has become a key idea in high-performance microprocessors
“A compound instruction reflects the parallel issue of instructions; it comprises some number of independent instructions or interlocked instructions”
“Instructions composing a compound instruction need not be consecutive.”
-- S. Vassiliadis et al., IBM Journal of R and D, Jan. 1994
Future Microprocessors

The Future Processor: Three Key Aspects
• Instruction compounding or fusing
  • Based on S. Vassiliadis's work
  • Employs compounding and 3-input ALU
• Co-designed VM for dynamic translation/fusing
  • Concealed from all software
  • Optimized (fused) instructions held in code-cache
• Dual decoder front-end for fast startup
  • Hardware front-end decoder for fast startup
  • Software translator for sustained high performance

Processor Micro-architecture

Fusible Instruction Set
• RISC-ops with unique features:
  • A fusible bit per instruction fuses two dependent instructions
  • Dense instruction encoding, 16/32-bit ISA design
• Special features to support the x86 ISA
  • Condition codes
  • Addressing modes
  • Aware of long immediate & displacement values
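A minimal sketch of how a decoder might group a RISC-op stream into macro-ops using the fusible bit. The slide does not say which instruction carries the bit or where it sits in the encoding; this sketch assumes the bit is set on the head and marks the immediately following instruction as its tail. The function name `group_macro_ops` and the `(name, fusible)` tuple layout are hypothetical.

```python
from typing import Iterable, List, Tuple

def group_macro_ops(stream: Iterable[Tuple[str, bool]]) -> List[Tuple[str, ...]]:
    """Group (name, fusible_bit) entries into singles and fused pairs.

    Assumption: a set fusible bit on an op means "fuse me with the next op".
    """
    macro_ops: List[Tuple[str, ...]] = []
    pending = None  # head waiting for its tail
    for name, fusible in stream:
        if pending is not None:
            # This op is the tail of the pending head; pairs only, so any
            # fusible bit on the tail itself is ignored.
            macro_ops.append((pending, name))
            pending = None
        elif fusible:
            pending = name
        else:
            macro_ops.append((name,))
    if pending is not None:
        # Head at the end of the stream with no tail: emit it unfused.
        macro_ops.append((pending,))
    return macro_ops

stream = [("ADD", True), ("AND", False), ("ST", False),
          ("LD.zx", False), ("ADD", True), ("LD", False)]
print(group_macro_ops(stream))
```

With this layout the pipeline can treat each returned tuple as one entity, which is what "process & execute fused macro-ops as single instructions" requires of the front end.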

Microarchitecture: Macro-op Execution
• Enhanced OOO superscalar microarchitecture
• Process & execute fused macro-ops as single instructions throughout the entire pipeline

Macro-op Fusing Algorithm
• Objectives:
  • Maximize fused dependent pairs
  • Simple & fast
• Heuristics:
  • Pipelined scheduler: Only single-cycle ALU ops can be a head. Minimize non-fused single-cycle ALU ops
  • Criticality: Fuse instructions that are “close” in the original sequence. ALU-ops criticality is easier to estimate.
  • Simplicity: 2 or fewer distinct register operands per fused pair
• Solution: Two-pass Fusing Algorithm:
  • The 1st pass, forward scan, prioritizes ALU ops, i.e. for each ALU-op tail candidate, look backward in the scan for its head
  • The 2nd pass considers all kinds of RISC-ops as tail candidates

Fusing Algorithm: Example

x86 asm:
-----------------------------------------------------------
1. lea   eax, DS:[edi + 01]
2. mov   [DS:080b8658], eax
3. movzx ebx, SS:[ebp + ecx << 1]
4. and   eax, 0000007f
5. mov   edx, DS:[eax + esi << 0 + 0x7c]

RISC-ops:
-----------------------------------------------------
1. ADD   Reax, Redi, 1
2. ST    Reax, mem[R22]
3. LD.zx Rebx, mem[Rebp + Recx << 1]
4. AND   Reax, 0000007f
5. ADD   R17, Reax, Resi
6. LD    Redx, mem[R17 + 0x7c]

After fusing: Macro-ops
-----------------------------------------------------
1. ADD   R18, Redi, 1 :: AND Reax, R18, 007f
2. ST    R18, mem[R22]
3. LD.zx Rebx, mem[Rebp + Recx << 1]
4. ADD   R17, Reax, Resi :: LD Redx, mem[R17+0x7c]
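The two-pass scan can be sketched roughly as follows, using the six RISC-ops of the example as input. This is an illustration under stated assumptions, not the translator's actual code: the "2 or fewer distinct register operands" rule is read as a limit on source registers (the head's result is forwarded inside the pair and not counted), anti- and output dependences are assumed to be handled by renaming (as the R18 temporary in the example suggests), memory-ordering checks are omitted, and the `window` parameter standing in for "close" is hypothetical, as are all names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class RiscOp:
    name: str
    dst: Optional[str]        # destination register, None for stores
    srcs: Tuple[str, ...]     # source registers (immediates omitted)
    alu: bool                 # True for single-cycle ALU ops

def find_head(ops: List[RiscOp], t: int, fused: Dict[int, int],
              window: int) -> Optional[int]:
    """Scan backward from tail candidate t for a legal head."""
    tail = ops[t]
    for h in range(t - 1, max(t - window, 0) - 1, -1):
        head = ops[h]
        if head.dst is not None and head.dst in tail.srcs:
            # Nearest producer of a tail source: the tail cannot move above
            # it, so this op is the only possible head.
            if h in fused or not head.alu:
                return None
            srcs = set(head.srcs) | (set(tail.srcs) - {head.dst})
            return h if len(srcs) <= 2 else None
    return None

def fuse(ops: List[RiscOp], window: int = 8) -> List[Tuple[str, ...]]:
    fused: Dict[int, int] = {}             # index -> partner index
    for alu_only in (True, False):         # pass 1: ALU tails; pass 2: any tail
        for t, op in enumerate(ops):
            if t in fused or (alu_only and not op.alu):
                continue
            h = find_head(ops, t, fused, window)
            if h is not None:
                fused[h], fused[t] = t, h
    macro_ops: List[Tuple[str, ...]] = []
    for i, op in enumerate(ops):
        if i not in fused:
            macro_ops.append((op.name,))
        elif fused[i] > i:                 # i is a head: emit the pair here
            macro_ops.append((op.name, ops[fused[i]].name))
    return macro_ops

EXAMPLE = [
    RiscOp("ADD",   "Reax", ("Redi",),         True),
    RiscOp("ST",    None,   ("Reax", "R22"),   False),
    RiscOp("LD.zx", "Rebx", ("Rebp", "Recx"),  False),
    RiscOp("AND",   "Reax", ("Reax",),         True),
    RiscOp("ADD",   "R17",  ("Reax", "Resi"),  True),
    RiscOp("LD",    "Redx", ("R17",),          False),
]
print(fuse(EXAMPLE))
```

On this input the first pass pairs the AND with the earlier ADD, which leaves the second ADD without an ALU head; the second pass then uses that ADD as the head for the final load, giving the same four macro-ops as the slide.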

Instruction Fusing Profile
• 55+% fused RISC-ops → increases effective ILP by a factor of 1.4
• Only 6% single-cycle ALU ops left un-fused.
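One way to read the 1.4 figure, assuming f is the fraction of RISC-ops that end up in fused pairs and that effective ILP is counted as RISC-ops per pipeline entity (both are interpretations, not stated on the slide): N RISC-ops occupy (1 - f)N single slots plus fN/2 fused slots.

```latex
\[
\frac{N}{(1-f)\,N + \tfrac{f}{2}\,N} \;=\; \frac{1}{1 - f/2}
\qquad\Rightarrow\qquad
\frac{1}{1 - 0.55/2} \;=\; \frac{1}{0.725} \;\approx\; 1.38 \;\approx\; 1.4
\]
```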

Other DBT Software Profile
• Of all fused macro-ops:
  • 50% → ALU-ALU pairs.
  • 30% → fused condition test & conditional branch pairs.
  • Others → mostly ALU-MEM ops pairs.
• Of all fused macro-ops:
  • 70+% are inter-x86-instruction fusion.
  • 46% access two distinct source registers,
  • only 15% (6% of all instruction entities) write two distinct destination registers.
• Translation Overhead Profile
  • About 1000 instructions per translated hotspot instruction.
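The "15% (6% of all instruction entities)" figure can be cross-checked against the 55% fusing rate from the Instruction Fusing Profile slide, assuming that rate is the fraction f of RISC-ops placed in fused pairs and that an "instruction entity" is either an un-fused RISC-op or a fused pair (an interpretation, not stated on the slide):

```latex
\[
\text{fused share of entities} \;=\; \frac{f/2}{1 - f/2} \;=\; \frac{0.275}{0.725} \;\approx\; 0.38,
\qquad
0.15 \times 0.38 \;\approx\; 0.057 \;\approx\; 6\%
\]
```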

Co-designed x86 Processor Performance

Dual Decoder Front-End

Evaluation: Startup Performance

Activity of HW Assists

Important Research Issues
• Profiling
  • Probe insertion via software translator not feasible
• Multi-core
  • Shared code cache
  • SMT designs
• Memory consistency
  • Stores can be done in-order
  • Re-scheduled loads may be important for performance
• Precise traps
  • Potential HW assist?
