Learn about cache concepts, memory hierarchy, cache coherence, and specific techniques for cache tuning in the Global Cyber Bridges program. Includes exercises and discussion.
Cache Tuning – Global Cyber Bridges
Student: João Gabriel Gazolla
Professor: Dr. S. Masoud Sadjadi
Sections
• Cache Concepts
• Locality
• Cache Hit and Miss
• Memory Hierarchy
• Kinds of Cache
• Cache Coherence
• Specifics
• Thrashing
• Cache Exercises
• Conclusion
• Discussion
Cache Concepts
• Example instruction sequence: ADD A,B,C; MOVE B,A; MUL A,B,C
• CPU time required to perform an operation is: clock cycles executing instructions + clock cycles waiting for memory
Cache Concepts
• The CPU cannot perform useful work while it is waiting for data to arrive from memory.
Cache Concepts
• The memory system is a major factor in determining the performance of your program, and how well you use the cache is a large part of that.
Cache Concepts
• Other Comments:
Interleaving
• The bank cycle time is 4-8 times the CPU clock cycle.
• If several banks can be accessed in parallel, the problem is solved: more data is fetched at once and then put together.
• Sequential elements are together (Fortran style):
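A minimal sketch of why parallel banks help, assuming low-order interleaving (word k lives in bank k MOD number of banks). The slides do not state the mapping, so the function names and this layout are illustrative assumptions only.

```cpp
#include <cstddef>
#include <vector>

// Assumption: low-order interleaving, so consecutive words go to consecutive banks.
std::size_t bank_of(std::size_t word_index, std::size_t num_banks) {
    return word_index % num_banks;
}

// Counts how many distinct banks are used by `count` accesses that start at
// word 0 and advance by `stride` words. The more banks are used, the more
// accesses can overlap while each bank finishes its own (slow) cycle.
std::size_t banks_touched(std::size_t count, std::size_t stride, std::size_t num_banks) {
    std::vector<bool> seen(num_banks, false);
    std::size_t distinct = 0;
    for (std::size_t k = 0; k < count; ++k) {
        const std::size_t bank = bank_of(k * stride, num_banks);
        if (!seen[bank]) {
            seen[bank] = true;
            ++distinct;
        }
    }
    return distinct;
}
```

Under this assumption a unit-stride sweep over 4 banks keeps all 4 busy, while a stride of 4 words hits the same bank every time and gains nothing from interleaving.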
Temporal Locality
“When an item is referenced, it will be referenced again soon”
• 90% of the time is spent in 10% of the code: cache it!
    #include <iostream>
    // ...
    int main() {
        long long a = 0;  // long long: the running sum does not fit in a 32-bit int
        for (int i = 0; i < 987654; i++) {
            a = a + i;    // a and i are referenced again on every iteration
            std::cout << a << std::endl;
        }
        return 0;
    }
Spatial Locality
“When an item is referenced, items whose addresses are nearby will tend to be referenced soon.”
• When fetching data item N, also fetch N+1, N+2, N+3, N+4, but not too many more.
Cache Hit: MAXIMIZE it!
• What is Cache Hit Rate?
Cache Miss: MINIMIZE it!
• What is Cache Miss Rate?
• What is Cache Miss Penalty?
Memory Hierarchy Sizes
• Bytes
• KBytes = 1024 Bytes
• MBytes = 1024 KBytes
• GBytes = 1024 MBytes
There are 3 kinds of cache:
• Direct mapped cache
• Set associative cache
• Fully associative cache
Direct Mapped Cache
• How does it work? It uses the MOD operation: each memory block can go to exactly one cache line, chosen as the block number MOD the number of cache lines.
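A minimal sketch of the MOD mapping, assuming byte addresses, a fixed block size and a cache described only by its number of lines. The struct and function names are hypothetical, chosen for illustration.

```cpp
#include <cstdint>

// Where a memory block lands in a direct mapped cache.
struct CacheSlot {
    std::uint64_t index;  // the single cache line this block may occupy
    std::uint64_t tag;    // distinguishes blocks that share that line
};

// Assumes block_bytes > 0 and num_lines > 0.
CacheSlot direct_mapped_slot(std::uint64_t address,
                             std::uint64_t block_bytes,
                             std::uint64_t num_lines) {
    const std::uint64_t block = address / block_bytes;  // block number in memory
    return CacheSlot{block % num_lines, block / num_lines};
}
```

Two addresses whose block numbers differ by a multiple of the number of lines get the same index and different tags, so they evict each other even if the rest of the cache is empty.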
Thrashing
• The process does not have enough pages
• The page-fault rate is very high
• CPU usage is low
• The system reacts with "let's increase multiprogramming", which makes the problem worse
Fully Associative Cache
Set Associative Cache
• This is a trade-off between direct mapped and fully associative caches.
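The trade-off can be seen in how a block chooses its set. The sketch below assumes the common scheme where the cache lines are divided into equal sets and a block goes to set (block number MOD number of sets); the function name and parameters are illustrative assumptions.

```cpp
#include <cstdint>

// Set chosen for a memory block in a cache with `num_lines` lines in total
// and `ways` lines per set. Assumes ways > 0 and num_lines is a multiple of ways.
//   ways == 1         -> direct mapped (every line is its own set)
//   ways == num_lines -> fully associative (a single set holds everything)
std::uint64_t set_index(std::uint64_t block_number,
                        std::uint64_t num_lines,
                        std::uint64_t ways) {
    const std::uint64_t num_sets = num_lines / ways;
    return block_number % num_sets;
}
```

With 8 lines, block 13 maps to line 5 when direct mapped, to set 1 when 2-way set associative, and to the only set (0) when fully associative.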
Cache Block Replacement
• Direct mapped cache
Cache Block Replacement
• Set associative cache: FIFO, Random, LRU
• LRU follows temporal locality: “When an item is referenced, it will be referenced again soon”
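A minimal sketch of LRU replacement inside one set, assuming the set is identified only by the tags it holds. The class name and interface are hypothetical; real hardware usually approximates LRU with a few status bits rather than keeping an ordered list.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One cache set with LRU replacement. Assumes ways >= 1.
// tags_ is kept ordered from most recently used (front) to least recently used (back).
class LruSet {
public:
    explicit LruSet(std::size_t ways) : ways_(ways) {}

    // Returns true on a hit. On a miss the block is loaded, and if the set
    // is full the least recently used block is evicted first.
    bool access(std::uint64_t tag) {
        auto it = std::find(tags_.begin(), tags_.end(), tag);
        const bool hit = (it != tags_.end());
        if (hit) {
            tags_.erase(it);   // will be re-inserted at the front
        } else if (!tags_.empty() && tags_.size() >= ways_) {
            tags_.pop_back();  // evict the least recently used block
        }
        tags_.insert(tags_.begin(), tag);
        return hit;
    }

private:
    std::size_t ways_;
    std::vector<std::uint64_t> tags_;
};
```

In a 2-way set, the access sequence 1, 2, 1, 3 evicts block 2 rather than block 1, because block 1 was referenced more recently.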
Cache Specifics
• Specifics and their technology for: Itanium, SGI Origin 2000, Pentium III
• Cache Size
• Replacement
• Access Time
• Commands to Measure Performance
• Go to: tinyurl.com/gcbcache2
Cache Coherence
(Figure: Data A in memory, with Copy 1 of Data A, Copy 2 of Data A and Copy 3 of Data A held elsewhere)
Cache Coherence: Snoop Protocol
• Processors P1, P2, P3, ..., PN share MEMORY
• One processor writes on line 4
• The other copies of line 4 are not valid any more
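The invalidation step can be sketched as follows, assuming a write-invalidate scheme where every cache watches a shared bus. Caches are reduced to one valid bit per memory line, with no data and no further states; the class and method names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Toy model: valid_[p][line] says whether processor p holds a valid copy of `line`.
class SnoopingBus {
public:
    SnoopingBus(std::size_t processors, std::size_t lines)
        : valid_(processors, std::vector<bool>(lines, false)) {}

    // A read brings a valid copy of the line into the reader's cache.
    void read(std::size_t cpu, std::size_t line) { valid_[cpu][line] = true; }

    // A write is seen by all caches on the bus: the writer keeps a valid copy,
    // every other cache marks its copy of the line as not valid any more.
    void write(std::size_t cpu, std::size_t line) {
        for (std::size_t p = 0; p < valid_.size(); ++p) {
            valid_[p][line] = (p == cpu);
        }
    }

    bool has_valid_copy(std::size_t cpu, std::size_t line) const {
        return valid_[cpu][line];
    }

private:
    std::vector<std::vector<bool>> valid_;
};
```

After three processors read line 4 and the first one writes it, only the writer still holds a valid copy; the others must fetch the line again before using it.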
Cache Coherence: Directory Based Protocol
• Directory Based Protocol
  • Cache lines contain extra bits that indicate which other processor has a copy of that cache line, and the status of the cache line: clean (the cache line does not need to be sent back to main memory) or dirty (main memory needs to be updated with the content of the cache line).
• Hardware Cache Coherence
  • Cache coherence on the Origin computer is maintained in the hardware, transparent to the programmer.
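The extra bits described above can be pictured as a small record per cache line. This is only a sketch under simplifying assumptions: at most 64 processors, and no modeling of the messages a real protocol exchanges (for instance, forcing a write-back before another processor reads a dirty line). The names are hypothetical and do not describe the Origin hardware.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t kMaxProcessors = 64;  // assumption for this sketch

struct DirectoryEntry {
    std::bitset<kMaxProcessors> sharers;  // bit p set: processor p has a copy
    bool dirty = false;                   // true: main memory is out of date

    // Processor `cpu` obtained a copy of the line.
    void record_read(std::size_t cpu) { sharers.set(cpu); }

    // Processor `cpu` wrote the line: it becomes the only holder and the
    // line is dirty until it is written back.
    void record_write(std::size_t cpu) {
        sharers.reset();
        sharers.set(cpu);
        dirty = true;
    }

    // The line was sent back to main memory, so it is clean again.
    void record_write_back() { dirty = false; }
};
```

Unlike snooping, nothing has to be broadcast: the sharers bits say exactly which processors need to be told about a write.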
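Cache Coherence: False Sharing
    struct foo {
        volatile int x;
        volatile int y;
    };
    foo f;
    // Only reads f.x
    int sum_a() {
        int s = 0;
        for (int i = 0; i < 1000000; ++i)
            s += f.x;
        return s;
    }
    // Only writes f.y
    void inc_b() {
        for (int i = 0; i < 1000000; ++i)
            ++f.y;
    }
• x and y are adjacent, so they normally fall in the same cache line. When sum_a and inc_b run concurrently on different processors, each write to f.y invalidates the line that holds f.x, although the two functions never touch the same variable.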
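One common remedy is to place the two fields in different cache lines. The sketch below assumes a 64-byte cache line, which is typical but not universal; the struct name and the constant are hypothetical, and the right value should be checked for the target machine.

```cpp
#include <cstddef>

constexpr std::size_t kAssumedLineBytes = 64;  // assumption: cache line size in bytes

// Same fields as foo, but each one starts on its own cache line boundary.
struct padded_foo {
    alignas(kAssumedLineBytes) volatile int x;
    alignas(kAssumedLineBytes) volatile int y;
};

// Compile-time check that x and y are at least one full line apart.
static_assert(offsetof(padded_foo, y) - offsetof(padded_foo, x) >= kAssumedLineBytes,
              "x and y must not share a cache line");
```

The price is memory: the struct grows from 8 bytes to 128 bytes under these assumptions, so padding is worth it only for data that different threads write frequently.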
Cache Exercises
    sum = 0;
    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
• Examples of Locality:
• Data
  • Access elements in series: Spatial
  • Reference to sum in each iteration: Temporal
• Instruction
  • Instructions executed in sequence: Spatial
  • Always walking through the loop: Temporal
Cache Exercises
    int sumarrayrows(int a[M][N]) {
        int i, j, sum = 0;
        for (i = 0; i < M; i++)
            for (j = 0; j < N; j++)
                sum += a[i][j];
        return sum;
    }
• Does this function have good locality?
    1 0 0 0
    0 1 0 0
    0 0 1 0
    0 0 0 1
Cache Exercises
    int sumarraycols(int a[M][N]) {
        int i, j, sum = 0;
        for (j = 0; j < N; j++)
            for (i = 0; i < M; i++)
                sum += a[i][j];
        return sum;
    }
• Does this function have good locality?
    1 0 0 0
    0 1 0 0
    0 0 1 0
    0 0 0 1
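One way to reason about the two exercise functions is to count how often each traversal order moves to a different cache line. The sketch below assumes a row-major M x N array of int (as in C and C++) and a deliberately pessimistic cache that remembers only the last line it loaded; the function name and parameters are hypothetical.

```cpp
#include <cstddef>

// Counts line loads for a full sweep over a row-major M x N int array.
// by_rows == true  : i outer, j inner (the order used by sumarrayrows)
// by_rows == false : j outer, i inner (the order used by sumarraycols)
// Assumes ints_per_line > 0.
std::size_t count_line_loads(std::size_t M, std::size_t N,
                             bool by_rows, std::size_t ints_per_line) {
    std::size_t loads = 0;
    std::size_t current_line = static_cast<std::size_t>(-1);  // nothing loaded yet
    const std::size_t outer = by_rows ? M : N;
    const std::size_t inner = by_rows ? N : M;
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t k = 0; k < inner; ++k) {
            const std::size_t i = by_rows ? o : k;
            const std::size_t j = by_rows ? k : o;
            const std::size_t line = (i * N + j) / ints_per_line;  // row-major offset
            if (line != current_line) {
                ++loads;
                current_line = line;
            }
        }
    }
    return loads;
}
```

For an 8 x 8 array with 4 ints per line, the row order loads 16 lines (one per 4 elements) while the column order loads 64 (one per element) in this model, which suggests the row-wise loop has the better spatial locality.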
Conclusions
Sources
• Slides prepared from the CI-Tutor Courses at NCSA by S. Masoud Sadjadi
• Simone Martins. Memória Cache, 2008.
• Wikipedia
• www.ariadne.ac.uk
• parasol.tamu.edu/~rwerger/Courses/654/cachecoherence1.pdf
• www.cs.unc.edu/~montek/teaching/fall-05/lectures/lecture-16.ppt
• http://www.ic.uff.br/~simone/sistemascomp/
• David A. Patterson; John L. Hennessy. Organização e Projeto de Computadores, A Interface Hardware/Software. LTC, 2000.
• Randal E. Bryant and David R. O'Hallaron. Computer Systems: A Programmer's Perspective. Prentice Hall, 2002.
• Many Google Image queries
Questions? Comments? Extras?
• Download of the presentation: www.gabrielgazolla.com/gcbCT.zip