A First Look at the Interplay of Code Reordering and Configurable Caches

A First Look at the Interplay of Code Reordering and Configurable Caches. Nikil Dutt Center for Embedded Computer Systems School for Information and Computer Science University of California, Irvine. Ann Gordon-Ross and Frank Vahid* Department of Computer Science and Engineering


Presentation Transcript


  1. A First Look at the Interplay of Code Reordering and Configurable Caches
     Nikil Dutt, Center for Embedded Computer Systems, School for Information and Computer Science, University of California, Irvine
     Ann Gordon-Ross and Frank Vahid*, Department of Computer Science and Engineering, University of California, Riverside
     *Also with the Center for Embedded Computer Systems, UC Irvine
     This work was supported by the U.S. National Science Foundation and by the Semiconductor Research Corporation.

  2. Optimizations
     • Optimization is an important part of the design of an application or system
     • Targets: area; performance; power and/or energy

  3. Instruction Cache Optimizations
     • The instruction cache is a good candidate for optimization
     • Instruction caches have predictable spatial and temporal locality: 90% of execution time is spent in 10% of the code (Gordon-Ross '04)
     • Power hungry: 29% of power consumption in the ARM920T (Segars '01)

  4. Instruction Cache Tuning - Code Reordering
     • Tune the instruction stream for increased cache utilization and thus increased performance
     • Reorder the code so that infrequently executed regions of code do not pollute the instruction cache
     • Code reordering is typically applied at link time; runtime methods exist but incur undesirable runtime overhead
     [Figure: application tool flow; labels: App, Compile, obj file, Link, Download, Execute]

  5. Instruction Cache Tuning - Code Reordering
     [Figure: control flow of a "while (input)" loop before and after code reordering. Blocks: Read input; Is the input valid? (yes/no); Process input; Error handling routine; Done. The branches carry execution counts of 100 and 1. In the reordered layout the error handling routine is placed last, after Done.]
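The effect pictured on this slide can be sketched with a toy direct-mapped instruction cache. Everything numeric below is an assumption made for illustration and does not come from the slides: the block sizes, the 64-byte cache, and the trace of 100 valid inputs followed by one invalid input. The sketch only shows how moving a rarely executed block out of the hot path can remove conflict misses.

```python
LINE = 16       # bytes per cache line (hypothetical)
NUM_LINES = 4   # direct-mapped cache with 4 lines = 64 bytes (hypothetical, tiny on purpose)

def place(layout):
    """Assign start addresses to (name, size) blocks laid out back to back."""
    addr, placed = 0, {}
    for name, size in layout:
        placed[name] = (addr, size)
        addr += size
    return placed

def count_misses(layout, trace):
    """Fetch every cache line of each executed block and count misses."""
    placed = place(layout)
    tags = [None] * NUM_LINES          # memory line currently held by each cache line
    misses = 0
    for name in trace:
        start, size = placed[name]
        for addr in range(start, start + size, LINE):
            line_no = addr // LINE
            index = line_no % NUM_LINES
            if tags[index] != line_no:
                tags[index] = line_no
                misses += 1
    return misses

# Hypothetical block sizes in bytes (all multiples of LINE).
original = [("read", 16), ("check", 16), ("error", 48), ("process", 16)]
reordered = [("read", 16), ("check", 16), ("process", 16), ("error", 48)]

# 100 valid inputs, then one invalid input that reaches the error handler.
trace = ["read", "check", "process"] * 100 + ["read", "check", "error"]

misses_original = count_misses(original, trace)
misses_reordered = count_misses(reordered, trace)
print(misses_original, misses_reordered)   # prints "205 6"
```

In the original layout the error handler pushes `process` onto the same cache line as `check`, so the two evict each other on every iteration. After reordering, the hot loop occupies three distinct lines and only misses on first use.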

  6. Instruction Cache Tuning - Configurable Cache Tuning
     • Tune the cache to the instruction stream for decreased energy and/or increased performance
     • Cache tuning can be performed during application/platform design, or in-system at runtime with no runtime overhead (Zhang, DATE'04)

  7. Instruction Cache Tuning - Configurable Cache Tuning
     • Tunable parameters include cache associativity, cache line size, and total cache size

  8. Motivation - Code Reordering + Cache Configuration
     • Code reordering tunes the instruction stream for the cache
     • Cache configuration tunes the cache to the instruction stream
     • How do these optimizations affect each other? Complement? Obviate? Degrade?

  9. Pettis and Hansen Code Reordering
     • Many current code reordering techniques are based heavily on the Pettis and Hansen code reordering algorithm (1990)
     • Reorders basic blocks using edge profiling to increase locality
     • Orders basic blocks so that the most frequently executed path through them is placed as straight-line code

  10. Pettis and Hansen Bottom-up Positioning Algorithm
      • Process arc weights in decreasing order
      • For each arc, merge the basic blocks at its source and destination to form a chain
      • If one of the blocks is already in the middle of a chain, form a new chain
      [Figure: control flow graph of basic blocks annotated with execution frequencies, and the resulting reordered basic block chains]
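The chain-building steps listed on this slide can be sketched as follows. This is a simplified illustration, not the authors' or PLTO's implementation. It assumes that an arc is merged only when its source is the tail of one chain and its destination is the head of a different chain; any other arc is skipped, which is one reading of the third bullet. The control flow graph and its weights are hypothetical, loosely modeled on the input loop of slide 5 with an error that ends the loop.

```python
def build_chains(blocks, arcs):
    """Greedy bottom-up chaining of basic blocks.

    blocks: list of block names in original layout order.
    arcs:   dict mapping (source, destination) to an execution count.
    Returns a list of chains, each a list of block names.
    """
    chain_of = {b: [b] for b in blocks}   # every block starts as its own chain
    # Process arcs in decreasing weight order (ties keep insertion order).
    for (src, dst), _weight in sorted(arcs.items(), key=lambda kv: -kv[1]):
        a, b = chain_of[src], chain_of[dst]
        # Merge only tail-to-head across two different chains.
        if a is not b and a[-1] == src and b[0] == dst:
            a.extend(b)
            for blk in b:
                chain_of[blk] = a
    # Collect distinct chains in order of first appearance.
    chains = []
    for blk in blocks:
        chain = chain_of[blk]
        if not any(chain is seen for seen in chains):
            chains.append(chain)
    return chains

# Hypothetical edge profile.
blocks = ["read", "check", "error", "process", "done"]
arcs = {
    ("read", "check"): 101,
    ("check", "process"): 100,
    ("process", "read"): 100,
    ("check", "error"): 1,
    ("error", "done"): 1,
}

chains = build_chains(blocks, arcs)
layout = [blk for chain in chains for blk in chain]
print(chains)   # prints "[['read', 'check', 'process'], ['error', 'done']]"
```

The hot path `read, check, process` ends up as straight-line code, and the rarely executed `error` block lands in a separate chain. The full algorithm also decides the order in which chains are placed; this sketch simply keeps them in order of first appearance.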

  11. Configurable Cache Architecture • We used the configurable cache architecture proposed by Zhang - ISCA’03

  12. Configurable Cache Architecture
      • The base cache consists of four 2 KByte banks that may be shut down individually for size configuration (way shutdown)
      • Way concatenation allows for configurable associativity
      [Figure: way shutdown and way concatenation examples; labels: 8 KBytes, 4 KBytes, 8 KBytes 2-way]
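The design space implied by four 2 KByte banks can be enumerated with a short sketch. It assumes that each active bank serves as at most one way, so associativity cannot exceed the number of active banks; the slide does not state this constraint explicitly. The line sizes are the ones listed on slide 13.

```python
BANK_KB = 2                  # size of one bank, from the slide
LINE_SIZES = (16, 32, 64)    # bytes, from slide 13

def configurations():
    """List (size_kb, ways, line_bytes) tuples under the assumed bank constraint."""
    configs = []
    for active_banks in (1, 2, 4):        # banks left powered after way shutdown
        size_kb = active_banks * BANK_KB
        ways = 1
        while ways <= active_banks:       # assumption: at most one way per active bank
            for line in LINE_SIZES:
                configs.append((size_kb, ways, line))
            ways *= 2
    return configs

configs = configurations()
print(len(configs))   # prints "18"
```

Under this assumption a 2 KByte cache can only be direct-mapped, a 4 KByte cache can be direct-mapped or 2-way, and only the full 8 KByte cache can be 4-way.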

  13. Configurable Cache Heuristic
      • First tune cache size: 2, 4, and 8 KBytes
      • Then tune cache line size: 16, 32, and 64 bytes
      • Finally tune cache associativity: direct-mapped, 2-way, and 4-way
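The three-step search order can be sketched as a greedy, one-parameter-at-a-time search. The energy function below is a made-up stand-in with no physical meaning, the starting point (16-byte lines, direct-mapped) is an assumption, and the rule that associativity is limited by the number of 2 KByte banks in use is also an assumption rather than something this slide states.

```python
SIZES_KB = (2, 4, 8)
LINE_SIZES = (16, 32, 64)
WAYS = (1, 2, 4)
BANK_KB = 2   # assumed bank size used to bound associativity

def tune_cache(energy):
    """Greedy search: size first, then line size, then associativity.

    energy(size_kb, line_bytes, ways) returns a cost to minimize.
    """
    line, ways = LINE_SIZES[0], WAYS[0]     # assumed starting point
    size = min(SIZES_KB, key=lambda s: energy(s, line, ways))
    line = min(LINE_SIZES, key=lambda l: energy(size, l, ways))
    valid_ways = [w for w in WAYS if w * BANK_KB <= size]   # assumed constraint
    ways = min(valid_ways, key=lambda w: energy(size, line, w))
    return size, line, ways

def toy_energy(size_kb, line, ways):
    """Hypothetical cost with a minimum at 4 KBytes, 32-byte lines, direct-mapped."""
    return abs(size_kb - 4) + abs(line - 32) / 16 + 0.5 * ways

best = tune_cache(toy_energy)
print(best)   # prints "(4, 32, 1)"
```

The search makes at most nine evaluations instead of trying every combination, at the cost of possibly missing the global optimum when the parameters interact; comparing against an exhaustive search is the usual way to check how often that happens.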

  14. Evaluation Framework
      • Benchmarks: Powerstone, MediaBench, EEMBC
      • Code reordering with PLTO*, the Pentium Link Time Optimizer: instrument the executable to gather edge profiles, execute the application to gather the edge profiles, then provide the edge profiles to perform code reordering, yielding a code reordered executable
      • Execute the application, with and without code reordering, to obtain hit and miss ratios for each configuration
      • Energy: cache energy from Cacti; main memory energy from Samsung memory
      • The cache exploration heuristic yields the chosen cache configuration; an exhaustive search is run for comparison purposes
      *Provided by the University of Arizona

  15. Results - Energy Savings
      Base cache = 2 KB, direct-mapped, 16-byte line size
      [Figure: energy for four cases: base cache without code reordering, base cache with code reordering, configured cache without code reordering, configured cache with code reordering]
      • Code reordering alone = 3.5% energy reduction
      • Cache configuration alone = 15% energy reduction
      • Cache configuration + code reordering = 17% energy reduction

  16. Results - Performance Benefits
      [Figure: performance for four cases: base cache without code reordering, base cache with code reordering, configured cache without code reordering, configured cache with code reordering]
      • Code reordering alone = 3.5% performance benefit
      • Cache configuration alone = 17% performance benefit
      • Cache configuration + code reordering = 18.5% performance benefit
      • On average, code reordering gives little additional benefit over cache configuration alone. However, a few benchmarks see added benefits.

  17. Change in Cache Requirements Due to Code Reordering
      [Figure: per-benchmark change in cache requirements due to code reordering. Benchmark suites: *Powerstone, **Mediabench, ***EEMBC. Legend: x = reduction in cache area; further entries: larger line size, smaller cache size.]

  18. Conclusions
      • We explore the interplay of two instruction cache optimization techniques: code reordering and cache configuration
      • Cache configuration largely obviates the need for code reordering with respect to energy and performance
      • Cache configuration applied dynamically at runtime eliminates the need for designer-applied code reordering
      • Code reordering improved cache utilization in 52% of the benchmarks
      • Code reordering reduced instruction cache size by an average of 13% and by as much as 90%, which is beneficial for small custom-synthesized embedded systems where area is critical

  19. Future Work
      • We plan to use a more advanced code reordering methodology that takes into account set associativity or multiple levels of cache
      • We plan to study the iterative interplay of code reordering and cache configuration using a code reordering technique that takes the cache configuration into consideration
