Dynamic Removal of Redundant Computations

ICS'99, Rhodes (Greece) - June 20-25, 1999
Carlos Molina, Antonio González and Jordi Tubella
Universitat Politècnica de Catalunya - Barcelona
{cmolina,antonio,jordit}@ac.upc.es

Motivation
• Quasi-common subexpression
    . . . . .
    R = S / T;
    . . . . .
    X = S / U;
    . . . . .
• Quasi-invariant
    for (i=0; i<N; i++)
        A[i] = B[i]+C[i];

Outline
• Instruction Reuse
• Related Work
• Redundant Computation Buffer
• Performance Results
• Conclusions

Instruction Reuse
[Figure labels: Fetch, Decode & Rename, OOO Execution, Commit; Reuse Mechanism; index]

Related Work
• Instruction Reuse
  - Value Cache for the Tree Machine (Harbison 82)
  - Result Cache (Richardson 92, Oberman et al. 95)
  - Reuse Buffer (Sodani and Sohi 97)
  - Physical Register Reuse (Jourdan et al. 98)
• Trace Reuse
  - Basic blocks (Huang and Lilja 99)
  - General traces (González et al. 99)

Related Work
• Result Cache
  - Richardson 92, Oberman & Flynn 95
  - Special purpose (long latency operations)
  - Indexed by operand values
  - No reuse chaining
  - Can reuse dynamic instances of other static instructions
• Reuse Buffer
  - Sodani & Sohi 97
  - General purpose
  - Indexed by PC
  - Reuse chaining
  - Only reuse dynamic instances of same static instructions
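The indexing difference between the two schemes can be illustrated with a minimal C sketch. It is not the design from either cited paper: the table size `ENTRIES`, the hash in `value_index`, the direct-mapped organisation and all function names are assumptions made for illustration, and reuse chaining is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES 64                /* hypothetical table size */

enum { OP_DIV = 1 };              /* hypothetical opcode encoding */

typedef struct {
    bool     valid;
    uint64_t pc;                  /* tag, used only by the PC-indexed table */
    int      opcode;
    int64_t  opnd1, opnd2;
    int64_t  result;
} entry_t;

/* Result-cache style index: derived from the operand values (assumed hash). */
static unsigned value_index(int opcode, int64_t a, int64_t b)
{
    uint64_t h = (uint64_t)a * 31u + (uint64_t)b * 17u + (uint64_t)opcode;
    return (unsigned)(h % ENTRIES);
}

/* Reuse-buffer style index: derived from the instruction address. */
static unsigned pc_index(uint64_t pc)
{
    return (unsigned)(pc % ENTRIES);
}

void value_insert(entry_t *t, int opcode, int64_t a, int64_t b, int64_t res)
{
    entry_t e = { true, 0, opcode, a, b, res };
    t[value_index(opcode, a, b)] = e;
}

/* Hits for any static instruction that repeats the same opcode and operands. */
bool value_lookup(const entry_t *t, int opcode, int64_t a, int64_t b,
                  int64_t *out)
{
    const entry_t *e = &t[value_index(opcode, a, b)];
    if (e->valid && e->opcode == opcode && e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}

void pc_insert(entry_t *t, uint64_t pc, int opcode, int64_t a, int64_t b,
               int64_t res)
{
    entry_t e = { true, pc, opcode, a, b, res };
    t[pc_index(pc)] = e;
}

/* Hits only when the same static instruction repeats its operands. */
bool pc_lookup(const entry_t *t, uint64_t pc, int opcode, int64_t a,
               int64_t b, int64_t *out)
{
    const entry_t *e = &t[pc_index(pc)];
    if (e->valid && e->pc == pc && e->opcode == opcode &&
        e->opnd1 == a && e->opnd2 == b) {
        *out = e->result;
        return true;
    }
    return false;
}
```

With both tables seeded by a division 8 / 2 executed at PC 10, a second division 8 / 2 at PC 20 hits in the value-indexed table but misses in the PC-indexed one, which is the "other static instructions" versus "same static instructions" distinction in the list above.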

Redundant Computation Buffer
[Figure: three tables. Mtable entries: address tag, result. Vtable entries: Atable pointer. Atable entries: opcode, result/address, opnd1, opnd2, pointer. A Reuse Test produces the Reused Value or the Reused Memory Value.]

RCB (Working Example)
    while (cond) {
        r = s / t;
        ......
        x = s / u;
    }
[Figure sequence showing Vtable and Atable contents; Vtable rows are labelled 10: and 20:]
• Slide 1 - I1: 8 / 2 = 4. Atable entry: div 8 2 4 nil.
• Slides 2 and 3 - I2: 8 / 2 = 4. Atable entry: div 8 2 4.
• Slide 4 - I1: 9 / 3 = 3, I2: 9 / 3 = 3. Atable entries: div 9 3 3 nil and div 8 2 4 nil.
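The working example can be traced with a small C model. This is a sketch of one plausible reading of the slides, not the authors' design: it assumes the Vtable is indexed by PC and stores a pointer (here a slot number) into the Atable, that the Atable is indexed by a hash of the operand values, and that a lookup first follows the Vtable pointer and then falls back to the operand index. The table sizes, the hash, the return codes and the function names are all hypothetical, and the Atable's own `pointer` field and the Mtable are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

#define VT_SIZE 64                /* hypothetical Vtable size */
#define AT_SIZE 64                /* hypothetical Atable size */
#define NIL     (-1)

enum { OP_DIV = 1 };              /* hypothetical opcode encoding */

typedef struct {
    bool    valid;
    int     opcode;
    int64_t opnd1, opnd2;
    int64_t result;
} atable_entry_t;

typedef struct {
    int            vtable[VT_SIZE];   /* PC-indexed: Atable slot or NIL */
    atable_entry_t atable[AT_SIZE];   /* indexed by operand values */
} rcb_t;

void rcb_init(rcb_t *r)
{
    for (int i = 0; i < VT_SIZE; i++) r->vtable[i] = NIL;
    for (int i = 0; i < AT_SIZE; i++) r->atable[i].valid = false;
}

/* Assumed hash of opcode and operand values. */
static unsigned at_index(int opcode, int64_t a, int64_t b)
{
    uint64_t h = (uint64_t)a * 31u + (uint64_t)b * 17u + (uint64_t)opcode;
    return (unsigned)(h % AT_SIZE);
}

/* Reuse test: opcode and both operand values must match. */
static bool at_match(const atable_entry_t *e, int opcode, int64_t a, int64_t b)
{
    return e->valid && e->opcode == opcode && e->opnd1 == a && e->opnd2 == b;
}

/* Returns 0 on a miss, 1 on a hit through the Vtable pointer,
 * 2 on a hit through the operand-indexed lookup. */
int rcb_lookup(rcb_t *r, uint64_t pc, int opcode, int64_t a, int64_t b,
               int64_t *out)
{
    unsigned v = (unsigned)(pc % VT_SIZE);
    int p = r->vtable[v];
    if (p != NIL && at_match(&r->atable[p], opcode, a, b)) {
        *out = r->atable[p].result;
        return 1;
    }
    unsigned i = at_index(opcode, a, b);
    if (at_match(&r->atable[i], opcode, a, b)) {
        r->vtable[v] = (int)i;        /* remember the entry for this PC */
        *out = r->atable[i].result;
        return 2;
    }
    return 0;
}

/* Called after an instruction that missed has executed normally. */
void rcb_update(rcb_t *r, uint64_t pc, int opcode, int64_t a, int64_t b,
                int64_t result)
{
    unsigned i = at_index(opcode, a, b);
    r->atable[i] = (atable_entry_t){ true, opcode, a, b, result };
    r->vtable[pc % VT_SIZE] = (int)i;
}
```

In this model, I1 at PC 10 misses on 8 / 2 and records its result; I2 at PC 20 then reuses it through the operand-indexed lookup even though it is a different static instruction, and a repeat of I2 hits directly through its Vtable pointer. The Vtable carries no PC tag here because the reuse test compares the opcode and operands themselves, so an aliased pointer can only cause a miss, never a wrong value.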

Enhancements to Other Schemes
• Enhanced Result Cache
  - Indexed by operands
  - Mtable (address tag, result) and Atable (opcode, result/address, opnd1, opnd2)
• Enhanced Reuse Buffer
  - Indexed by PC
  - Mtable (address tag, result) and Atable (opcode, result/address, opnd1, opnd2)

Timing Considerations
• Pipeline Stages: fetch, decode & rename, opnd read & dispatch, issue, execute, write back, commit
• Latency of the Reuse Buffer: Atable lookup, reuse test
• Latency of the RCB: 1st Atable lookup, 2nd Atable lookup, reuse test
• Latency of the Result Cache: Atable lookup, reuse test

Experimental Framework
• Simulator
  - Alpha version of the SimpleScalar Toolset
• Benchmarks
  - Spec95
• Maximum Optimization Level
  - DEC C & F77 compilers with -non_shared -O5
• Statistics
  - Collected for 125 million instructions
  - Skipping initializations

Basic Reuse Statistics
• We evaluate different schemes
  - Enhanced Result Cache (ERC)
  - Enhanced Reuse Buffer (ERB)
  - Redundant Computation Buffer (RCB)
• We find best configuration for each scheme
  - Number of entries
  - History depth
• Best configurations will be evaluated
  - Percentage of reuse
  - Speedup

Quasi-Common Subexpressions 32 KB

Study of Reuse (Comparative)
[Figure: horizontal axis is Size in Kbytes, from 8 to 4096 in powers of two]

Performance Evaluation
• Two different capacities are evaluated
  - 32 KB
  - 200 KB
• Best configuration has been chosen for each reuse scheme
• We present a performance evaluation for a superscalar processor
  - Speedup
  - Percentage of reuse

Base Microarchitecture

Speedup (32 KB)
[Figure: speedup axis from 1.00 to 1.20]

Speedup (200 KB)
[Figure: speedup axis from 1.00 to 1.25]

Reuse (32 KB)
[Figure legend: Ops ready]

Reuse (200 KB)
[Figure legend: Ops ready]

Reuse by Instruction Category
[Figure: categories are Load Value, Memory Address, Arithmetic, Cond Branch]

Hybrid Scheme
[Figure labels: Atable entries with opcode, result/address, opnd1, opnd2 and pointer fields; tables indexed by PC and by Opnds]

Speedup (Hybrid Scheme)
[Figure: speedup axis from 1.00 to 1.20]

Reuse (Hybrid Scheme)

Speedup (Perfect Reuse Engine)
[Figure: speedup axis from 1.00 to 2.20]

Conclusions
• Redundant Computation Buffer
  - Quasi-invariants
  - Quasi-common subexpressions
• High reuse coverage and low latency
  - 30% reuse
  - 10% speedup
• Outperforms previous schemes
