Visualization Enables the Programmer to Reduce Cache Misses

Kristof Beyls, Erik H. D’Hollander, Yijun Yu. Ghent University. PDCS, November 2002.


Presentation Transcript


  1. Visualization Enables the Programmer to Reduce Cache Misses • Kristof Beyls, Erik H. D’Hollander, Yijun Yu • Ghent University • PDCS, November 2002

  2. Overview • Introduction • Reuse Distance Metric • Data Locality Visualization • Case Study: MCF • Conclusion

  4. Introduction • Anti-law of Moore: the speed gap between processor and memory keeps growing. [Figure: speed relative to 1980 (log scale, 1 to 1000) versus year (1980 to 2000), with a PROCESSOR curve, a MEMORY curve and the speed gap between them.]

  5. Cache capacity misses dominate • 3 cache miss types: cold, conflict, capacity.

  6. Optimization at different levels • Cache optimization at 3 levels: • Hardware: only resolves conflict misses. • Compiler: only resolves a tiny portion of the capacity misses. • Algorithm: performed by the programmer, who should try to eliminate the capacity misses that hardware and compiler cannot handle well. • Problem: cache behavior is not obvious in the source code.

  7. Objectives for cache visualization • Cache behavior should be visualized in a program-centric way. • Cache behavior should be described accurately and concisely. • The description should be independent of specific cache parameters. • The reuse distance metric meets these objectives.

  9. Reuse Distance: Definition • Reuse pair • Reuse distance of reuse pair • Backward reuse distance • Backward reuse distance > cache size: capacity miss • Example access trace: A B C D D A F
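The slide only names the terms, so the following is a minimal sketch under the usual definitions, which are assumed here: a reuse pair is two consecutive accesses to the same address, its reuse distance is the number of distinct other addresses accessed in between, and the backward reuse distance of an access is the distance of the reuse pair that ends at it (infinite for a first access). The names `backward_reuse_distances` and `INFINITE_DISTANCE` are hypothetical, and the naive nested scan is for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Marker for a first access, which has no earlier access to pair with
   (hypothetical name, not from the slides). */
#define INFINITE_DISTANCE (-1L)

/* Fills dist[i] with the backward reuse distance of trace[i]: the number of
   distinct addresses touched strictly between trace[i] and the previous
   access to the same address. Each char stands for one address. */
void backward_reuse_distances(const char *trace, size_t n, long *dist)
{
    assert(trace != NULL && dist != NULL);
    for (size_t i = 0; i < n; i++) {
        long distinct = 0;
        dist[i] = INFINITE_DISTANCE;
        /* Walk backwards until the previous access to the same address. */
        for (size_t j = i; j-- > 0; ) {
            if (trace[j] == trace[i]) {
                dist[i] = distinct;
                break;
            }
            /* Count trace[j] only once: skip it if the same address
               occurs again between positions j and i. */
            int seen_later = 0;
            for (size_t k = j + 1; k < i; k++) {
                if (trace[k] == trace[j]) { seen_later = 1; break; }
            }
            if (!seen_later)
                distinct++;
        }
    }
}
```

For the trace on the slide, A B C D D A F, this gives distance 0 for the second D and distance 3 for the second A (B, C and D lie in between); every other access is a first access.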

  11. Visualization: Overview 1. Instrumentation 2. Simulation 3. Filtering 4. Visualization 5. Program Optimization

  12. 1. Instrumentation • For every load, store and prefetch instruction, a call is inserted that records the memory access: profile_memaccess( instr_id, address ) • Implemented in the Open Research Compiler.
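As an illustration of what instrumented code could look like, here is a small sketch. The slides give only the name `profile_memaccess` and its two arguments, so the exact signature, the stub body, the counter `profiled_accesses` and the function `sum_instrumented` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of recorded accesses (hypothetical, only used by this stub). */
unsigned long profiled_accesses = 0;

/* Assumed signature: the slides show only profile_memaccess(instr_id, address).
   A real profiling library would pass the pair to the reuse distance
   simulator; this stub just counts the calls. */
void profile_memaccess(int instr_id, const void *address)
{
    assert(address != NULL);
    (void)instr_id;
    profiled_accesses++;
}

/* The original loop body is "sum += a[i]" (one load per iteration).
   After instrumentation, a call precedes that load. */
long sum_instrumented(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        profile_memaccess(1, &a[i]);  /* inserted by the instrumentation pass */
        sum += a[i];                  /* original load, given instruction id 1 */
    }
    return sum;
}
```

The instrumented function returns the same result as the original; the only visible difference is one recorded access per executed load.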

  13. 2. Simulation • A library implements profile_memaccess. • It uses hash tables and binary treaps to quickly compute: • the reuse distance of each reuse pair • the reuse distance distribution of all reuse pairs between any two instructions. • Only the distribution is stored to disk, as XML.
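The simulation step can be sketched in a few lines if speed is ignored. The version below is not the authors' implementation: it replaces the hash tables and treaps with a linear LRU stack, treats every distinct address as its own location (a real tool would probably map addresses to cache lines), and uses made-up limits `MAX_LINES`, `MAX_INSTR` and `MAX_LOG2`. It keeps only the distribution, bucketed by floor(log2(distance)), per pair of instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_LINES 1024  /* capacity of the toy LRU stack (assumption) */
#define MAX_INSTR 8     /* number of instrumented instructions (assumption) */
#define MAX_LOG2  32    /* number of histogram buckets (assumption) */

/* LRU stack: index 0 holds the most recently used address, together with
   the id of the instruction that touched it last. */
static uintptr_t stack_addr[MAX_LINES];
static int       stack_instr[MAX_LINES];
static int       stack_len = 0;

/* hist[from][to][k]: number of reuse pairs from instruction `from` to
   instruction `to` with floor(log2(distance)) == k; distance 0 is
   counted in bucket 0. */
unsigned long hist[MAX_INSTR][MAX_INSTR][MAX_LOG2];

static int log2_bucket(unsigned long d)
{
    int k = 0;
    while (d > 1) { d >>= 1; k++; }
    return k;
}

void profile_memaccess(int instr_id, const void *address)
{
    uintptr_t addr = (uintptr_t)address;
    int pos = -1;
    assert(instr_id >= 0 && instr_id < MAX_INSTR);

    /* The depth of addr in the LRU stack equals the number of distinct
       addresses touched since its previous access, i.e. the backward
       reuse distance. */
    for (int i = 0; i < stack_len; i++) {
        if (stack_addr[i] == addr) { pos = i; break; }
    }
    if (pos >= 0) {
        hist[stack_instr[pos]][instr_id][log2_bucket((unsigned long)pos)]++;
    } else {
        assert(stack_len < MAX_LINES);
        pos = stack_len++;  /* first access: no reuse pair to record */
    }
    /* Move addr to the top of the stack. */
    memmove(&stack_addr[1], &stack_addr[0], (size_t)pos * sizeof stack_addr[0]);
    memmove(&stack_instr[1], &stack_instr[0], (size_t)pos * sizeof stack_instr[0]);
    stack_addr[0] = addr;
    stack_instr[0] = instr_id;
}
```

Each access costs time linear in the stack depth here, which is exactly the cost the hash tables and treaps mentioned on the slide are meant to avoid.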

  14. 3. Filtering • Only reuses with a distance larger than the cache size generate capacity misses. • Those reuses are extracted with an XSLT filter. E.g.:
      <reference id="pbeampp.c/primal_bea_mpp:21">
        <reuse>
          <log2distance>15</log2distance>
          <fromid>pbeampp.c/primal_bea_mpp:21</fromid>
          <count>24628629</count>
        </reuse>
      </reference>
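The filtering rule itself is simple enough to sketch in C instead of XSLT. The struct below mirrors the XML fields shown on the slide; the function name `long_distance_reuses` and the parameter `log2_cache_lines` are hypothetical. The sketch assumes that a record with `log2distance` k covers distances from 2^k up to 2^(k+1) - 1, and that the cache is a fully associative LRU cache holding a power-of-two number of lines.

```c
#include <assert.h>
#include <stddef.h>

/* One <reuse> entry, with the id of its enclosing <reference>. */
struct reuse_record {
    const char   *reference_id;  /* <reference id="..."> */
    const char   *from_id;       /* <fromid> */
    int           log2distance;  /* <log2distance> */
    unsigned long count;         /* <count> */
};

/* Sums the counts of all records whose distance bucket lies at or beyond a
   cache of 2^log2_cache_lines lines. Under the stated assumptions, these
   are the reuses that cause capacity misses. */
unsigned long long long_distance_reuses(const struct reuse_record *r,
                                        size_t n, int log2_cache_lines)
{
    unsigned long long total = 0;
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (r[i].log2distance >= log2_cache_lines)
            total += r[i].count;
    }
    return total;
}
```

With the record from the slide (log2distance 15, count 24628629), a cache of 2^13 lines keeps all 24628629 reuses after filtering, while a cache of 2^16 lines keeps none.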

  15. 4. Visualization • In our prototype, an XSLT script generates input for the VCG visualizer. [Figure: visualizer output; two labels read 22.12% and 48.09%.]

  17. 5. Optimization • 22.12% + 48.09%: together about 70% of capacity misses
      for( ; arc < stop_arcs; arc += nr_group ) {
        if( arc->ident > BASIC ) {
          red_cost = bea_compute_red_cost( arc );
          if( (red_cost<0 && arc->ident == AT_LOWER)
              || (red_cost>0 && arc->ident == AT_UPPER) ) {
            basket_size++;
            perm[basket_size]->a = arc;
            perm[basket_size]->cost = red_cost;
            perm[basket_size]->abs_cost = ABS(red_cost);
          }
        }
      }

  18. 5. Optimization
      for( ; arc < stop_arcs; arc += nr_group ) {
      #define PREFETCH_DISTANCE 8
        PREFETCH(arc+nr_group*PREFETCH_DISTANCE)
        if( arc->ident > BASIC ) {
          red_cost = bea_compute_red_cost( arc );
          if( (red_cost<0 && arc->ident == AT_LOWER)
              || (red_cost>0 && arc->ident == AT_UPPER) ) {
            basket_size++;
            perm[basket_size]->a = arc;
            perm[basket_size]->cost = red_cost;
            perm[basket_size]->abs_cost = ABS(red_cost);
          }
        }
      }
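The slide does not show how `PREFETCH` is defined. One plausible definition, assumed here, maps it to the `__builtin_prefetch` builtin of GCC and Clang. The self-contained sketch below applies the same pattern, prefetching the element that the loop will reach `PREFETCH_DISTANCE` iterations later, to a hypothetical `struct item` array; `sum_strided` and the bounds check before the prefetch are additions for this example and are not part of the MCF code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition: use the compiler builtin where it is available,
   otherwise expand to a no-op. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(addr) __builtin_prefetch(addr)
#else
#define PREFETCH(addr) ((void)0)
#endif

#define PREFETCH_DISTANCE 8  /* iterations to prefetch ahead, as on the slide */

/* Hypothetical record, padded to roughly one cache line. */
struct item {
    long cost;
    char pad[56];
};

/* Sums the cost of every stride-th item, prefetching the item that will be
   visited PREFETCH_DISTANCE iterations from now. */
long sum_strided(const struct item *items, size_t n, size_t stride)
{
    long sum = 0;
    assert(stride > 0);
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + stride * PREFETCH_DISTANCE;
        if (ahead < n)               /* keep the address inside the array */
            PREFETCH(&items[ahead]);
        sum += items[i].cost;
    }
    return sum;
}
```

A prefetch is only a hint, so the result is the same with or without it. The right distance depends on memory latency and on the work done per iteration, which is why it is worth keeping as a named constant.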

  19. Speedup due to prefetching

  21. Conclusion • Complement hardware and compiler techniques with programmer-driven optimizations. • Reuse distance indicates cache bottlenecks for a wide range of cache configurations. • MCF: speedup between 24% and 48% on CISC, RISC and EPIC processors. • Reuse distance visualization enables portable and platform-independent cache optimizations.

  22. Questions?
