Computer Architecture Principles Dr. Mike Frank

Computer Architecture Principles
Dr. Mike Frank
CDA 5155 (UF) / CA 714-R (NTU), Summer 2003
Module #28: Limits to Instruction-Level Parallelism

Limits of ILP (3.8)
• There are limits to the amount of instruction-level parallelism that can be exploited!
• Assume a perfect processor:
  • ∞ virtual registers, so no WAR/WAW hazards
  • Perfect branch/jump prediction
  • All memory addresses known/predicted exactly
  • Perfect caches (no fetch stalls)
  • ∞ issue width for all instruction types
  • 1-cycle execution latency for all instruction types
• The only limits on ILP are then due to true data dependences through registers/memory.
• Caveat: This may not be a true limit, because even these dependences may be reduced somewhat through data value prediction.
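The perfect-processor model above can be sketched in a few lines of Python. This is a minimal illustration, not the methodology behind the measured results: the function name `dataflow_ilp`, the `(dest, srcs)` trace format, and the example trace are all assumptions made here, and only register dependences are tracked (memory dependences are ignored for brevity).

```python
def dataflow_ilp(trace):
    """Idealized ILP of a trace under the perfect-processor assumptions.

    trace: list of (dest, srcs) register names, in program order (hypothetical format).
    Returns (ilp, critical_path_cycles).
    """
    if not trace:
        return 0.0, 0
    ready = {}       # register -> cycle in which its latest value is produced
    last_cycle = 0
    for dest, srcs in trace:
        # With unlimited issue width, an instruction issues one cycle after
        # its slowest producer (1-cycle latency); with no producers, in cycle 1.
        issue = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        # Unlimited renaming: overwriting dest never delays anyone (no WAR/WAW).
        ready[dest] = issue
        last_cycle = max(last_cycle, issue)
    return len(trace) / last_cycle, last_cycle


example = [
    ("r1", []),            # r1 = load
    ("r2", []),            # r2 = load
    ("r3", ["r1", "r2"]),  # r3 = r1 + r2
    ("r4", ["r3"]),        # r4 = r3 * 2
    ("r5", []),            # independent work
    ("r6", ["r5"]),
]
print(dataflow_ilp(example))  # (2.0, 3): six instructions, critical path of three cycles
```

Even with every resource unlimited, the chain r1 → r3 → r4 fixes the execution time at three cycles; that chain is the true-dependence limit the slide refers to.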

Average ILP in a Perfect Processor
Note that ILP, even on a perfect processor, is limited in real applications!

Implications of ILP Limitations
• Part of the historical improvement in computer performance has come from decreased CPI
  • That is, increased IPC, or amount of ILP exploited
• Reaching an ILP limit implies: no more CPI reductions! (For serially written programs.)
• Then, further performance improvements can come only from:
  • Reducing the total # of instructions (more efficient application algorithms)
  • Increased parallelism via other methods:
    • Programmer-visible parallelism, via various styles
    • Thread-level parallelism
    • Explicit vector-based programming styles
  • Improved clock speed:
    • Reduce gate delays per clock (but the minimum is 1)
    • Improve logic gate speed (but this scales only with length)
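The argument above follows from the standard CPU performance equation, written out here for reference. The symbol names ($T_{\text{exec}}$, $N_{\text{instr}}$, $t_{\text{clk}}$, $\text{ILP}_{\max}$) are chosen for this sketch and do not come from the slides.

```latex
\[
T_{\text{exec}}
  = N_{\text{instr}} \times \text{CPI} \times t_{\text{clk}}
  = \frac{N_{\text{instr}} \times t_{\text{clk}}}{\text{IPC}},
\qquad
\text{IPC} \le \text{ILP}_{\max}
\]
```

Once IPC has reached $\text{ILP}_{\max}$ for a serially written program, the only terms left to improve are $N_{\text{instr}}$ (fewer instructions) and $t_{\text{clk}}$ (a faster clock), unless parallelism is obtained by other means.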

Effect of Window Size on ILP
[Figure: IPC chart. Annotations: "Max size to date, typical"; "Used in subsequent analysis"]
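To see why window size matters, here is a toy cycle-by-cycle scheduler in Python. It is a sketch under simplifying assumptions, not the simulator behind the figure: `windowed_ipc`, the `(dest, srcs)` trace format, and the example trace are hypothetical, latency is one cycle, issue width is unlimited, and the window refills in program order at the start of each cycle.

```python
def windowed_ipc(trace, window):
    """IPC of a trace when the scheduler sees at most `window` instructions.

    trace: list of (dest, srcs) register names in program order (hypothetical format).
    """
    if window < 1:
        raise ValueError("window must hold at least one instruction")
    if not trace:
        return 0.0
    # Map each source register to the index of the instruction producing it,
    # so that only true dependences remain (perfect renaming).
    last_writer, producers = {}, []
    for i, (dest, srcs) in enumerate(trace):
        producers.append([last_writer[s] for s in srcs if s in last_writer])
        last_writer[dest] = i

    finished = {}            # instruction index -> cycle in which it executed
    pending = []             # instructions currently in the window
    next_fetch, cycle = 0, 0
    while len(finished) < len(trace):
        cycle += 1
        # Refill the window in program order.
        while len(pending) < window and next_fetch < len(trace):
            pending.append(next_fetch)
            next_fetch += 1
        # Issue every windowed instruction whose producers ran in an earlier cycle.
        ready = [i for i in pending
                 if all(finished.get(p, cycle) < cycle for p in producers[i])]
        for i in ready:
            finished[i] = cycle
        pending = [i for i in pending if i not in finished]
    return len(trace) / cycle


# A four-instruction dependence chain followed by four independent instructions.
example = [("a1", []), ("a2", ["a1"]), ("a3", ["a2"]), ("a4", ["a3"]),
           ("b1", []), ("b2", []), ("b3", []), ("b4", [])]
for w in (1, 2, 8):
    print(w, windowed_ipc(example, w))
```

With a window of 8 the independent instructions overlap the chain and the trace finishes in 4 cycles; with a window of 1 the scheduler never sees them early and needs 8 cycles.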

Imperfect Branch Prediction
Effect on ILP exploitable across basic blocks via speculation

Effect of Finite Register Set

Imperfect Alias Analysis

Ambitious Semi-Realistic ILP
[Figure: IPC chart]
Configuration studied here: up to 64 instructions issued per clock (64 IPC), no issue restrictions; tournament predictor, 1K entries, 16-entry return predictor; perfect dynamic alias analysis; renaming with 64 extra integer and 64 extra FP registers.

Limits on ILP in ‘Perfect’ Scheme
• WAR & WAW hazards through memory
  • E.g., reuse of stack memory in procedure calls
• Unnecessary dependences on:
  • Loop counter variables – fix via unrolling
  • Return address register – fix with a return stack
  • Stack pointer – fix by using non-stack-based code?
• Dataflow limit
  • Might be overcome through value prediction
    • Predicting data values, speculating on result
    • Predicting address values for memory alias elimination
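The loop-counter item can be illustrated with a small Python sketch that compares the dataflow critical path of a rolled loop with an unrolled one. Everything here is a hypothetical model built for illustration: the helper names (`critical_path`, `rolled`, `unrolled`), the `(dest, srcs)` trace format, and the two-instruction loop body are assumptions, and the trip count is assumed to be a multiple of the unroll factor.

```python
def critical_path(trace):
    """Length in cycles of the longest true-dependence chain (1-cycle latency)."""
    ready, longest = {}, 0
    for dest, srcs in trace:
        issue = 1 + max((ready.get(s, 0) for s in srcs), default=0)
        ready[dest] = issue
        longest = max(longest, issue)
    return longest


def rolled(n):
    """n iterations of: i = i + 1; x = load a[i]."""
    trace = []
    for _ in range(n):
        trace.append(("i", ["i"]))  # counter increment: one long serial chain
        trace.append(("x", ["i"]))  # load address depends on the new counter value
    return trace


def unrolled(n, factor):
    """Same loads, but each group addresses a[i + k] from one counter value."""
    trace = []
    for _ in range(n // factor):
        for k in range(factor):
            trace.append((f"x{k}", ["i"]))  # constant offset k folded into the address
        trace.append(("i", ["i"]))          # a single increment by `factor`
    return trace


print(critical_path(rolled(8)))       # 9 cycles: eight chained increments, then a load
print(critical_path(unrolled(8, 4)))  # 2 cycles: only two chained increments remain
```

Unrolling does not remove the counter dependence, but it shortens the chain by the unroll factor, so the loads in each group no longer wait on one another's increments.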
