1. Computer Systems Architecture: A Networking Approach
Chapter 12 Introduction
The Memory Hierarchy
CS 147
Nathaniel Gilbert
2. Levels of Performance – You Get What You Pay For
Recall:
Dynamic Random Access Memory (DRAM)
Capacitors to store state (0 or 1)
Periodically refreshed
Relatively cheap
Static Random Access Memory (SRAM)
Transistors to store state
Doesn’t need to be refreshed, faster, and uses less power than DRAM
More expensive than DRAM
3. Levels of Performance cont.
4. Levels of Performance cont.
5. Localization of Access – Exploiting Repetition
Programs tend to access memory locations close to those they accessed recently.
This is partly because programmers organize data in clusters and compilers try to lay out code efficiently.
The memory hierarchy can exploit this localization.
6. Localization of Access cont.
Exploiting localization of memory access:
Keep related data in smaller groups (for example, avoid storing all input and output in a single array when reading from or writing to disk).
Load only the portion of data the CPU is currently using into faster memory.
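To make the benefit of grouping related data concrete, here is a minimal sketch of a direct-mapped cache that only counts misses. The function name `count_misses`, the cache geometry (64 lines of 16 bytes), and the two access patterns are illustrative assumptions and do not come from the slides.

```python
def count_misses(addresses, num_lines=64, block_size=16):
    """Count misses in a toy direct-mapped cache (geometry is an assumption)."""
    tags = [None] * num_lines            # one stored tag per cache line
    misses = 0
    for addr in addresses:
        block = addr // block_size       # memory block that holds this byte
        line = block % num_lines         # the only line this block may occupy
        tag = block // num_lines         # distinguishes blocks sharing a line
        if tags[line] != tag:            # miss: load the block into the line
            misses += 1
            tags[line] = tag
    return misses


N = 4096
sequential = list(range(N))              # neighbouring bytes, one after another
strided = [i * 1024 for i in range(N)]   # same number of accesses, 1024 bytes apart

seq_misses = count_misses(sequential)
strided_misses = count_misses(strided)
print(seq_misses, strided_misses)
```

In this model the sequential pattern misses once per 16-byte block, while the strided pattern misses on every access because each address lands in a different block that maps to the same line.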
7. Localization of Access cont.
8. Localization of Access cont.
On a Sun workstation (200 MHz CPU, 256 Mbyte main memory, 256 kbyte cache, 4 Gbyte local hard drive), the output was:
9. Localization of Access cont.
The time doubles because data has to move up and down the memory hierarchy.
The array is transferred to faster memory in blocks because the 256 kbyte cache cannot hold the whole object.
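One way to picture why an object larger than the cache keeps moving through the hierarchy is a small least-recently-used (LRU) model. This is a sketch under stated assumptions: the replacement policy, the block counts, and the name `misses_per_pass` are hypothetical, and it does not reproduce the program or timings from the slides.

```python
from collections import OrderedDict


def misses_per_pass(array_blocks, cache_blocks, passes=2):
    """Sweep an array several times through a toy LRU cache; return misses per pass."""
    cache = OrderedDict()                    # keys are block numbers, oldest first
    result = []
    for _ in range(passes):
        misses = 0
        for block in range(array_blocks):
            if block in cache:
                cache.move_to_end(block)     # mark as most recently used
            else:
                misses += 1
                cache[block] = True
                if len(cache) > cache_blocks:
                    cache.popitem(last=False)  # evict the least recently used block
        result.append(misses)
    return result


fits = misses_per_pass(array_blocks=100, cache_blocks=128)
too_big = misses_per_pass(array_blocks=256, cache_blocks=128)
print(fits, too_big)
```

When the array fits, the second pass is served entirely from the cache. When it does not, each block has been evicted by the time the sweep returns to it, so every pass pays the full transfer cost again.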
10. Instruction and Data Caches – Matching Memory to CPU Speed
A 2 GHz Pentium CPU accesses program memory on average every 0.5 ns just to fetch instructions.
DDO DRAM responds within 10 ns. If the CPU used only DRAM, it would run about 20 times slower (10 ns / 0.5 ns).
This is where SRAM (cache) comes into play.
Drawbacks of cache:
Misses (when the desired code is not in the cached memory segment) may take longer, because the cache must first be reloaded from main memory.
Negative cache – (depending on architecture) a cache in which negative results (failures) are stored
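The speed gap quoted above (0.5 ns per fetch against a 10 ns DRAM response) can be checked with a few lines of arithmetic. The average-access-time formula and the 95% hit ratio below are assumptions added for illustration; the slides give no hit ratio, and the model ignores any extra penalty a miss may add.

```python
CPU_FETCH_NS = 0.5    # from the slide: a 2 GHz CPU fetches about every 0.5 ns
DRAM_NS = 10.0        # from the slide: DRAM responds within 10 ns

# Slowdown if every fetch had to wait for DRAM.
slowdown = DRAM_NS / CPU_FETCH_NS


def average_access_ns(hit_ratio, cache_ns=CPU_FETCH_NS, dram_ns=DRAM_NS):
    """Simplified model: hits cost cache_ns, misses cost dram_ns."""
    return hit_ratio * cache_ns + (1.0 - hit_ratio) * dram_ns


# Hypothetical hit ratio, chosen only to show the shape of the trade-off.
with_cache = average_access_ns(0.95)
print(slowdown, round(with_cache, 3))
```

Under these assumptions, even a cache that misses one access in twenty brings the average access time to roughly 1 ns, far closer to the CPU's pace than DRAM alone.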
11. Instruction and Data Caches cont.
Cache is built from SRAM chips and is ideally made to match the CPU's system clock speed.
The Cache Controller Unit (CCU) and cache memory are inserted between the CPU and main memory.
Level 1 and Level 2 caches differ in placement.
Level 1 is on the CPU chip.
Level 2 was generally located off the CPU chip and was slowed down by the system bus. Intel successfully integrated a 128 kbyte L2 cache memory onto the CPU and continues to offer integrated chips.
12. Instruction and Data Caches cont.
Generic System Architecture
Level 1 is the microprocessor with three forms of cache:
D-cache – (Data) Fast buffer containing application data
I-cache – (Instruction) Speeds up the fetching of executable instructions
TLB – (Translation Lookaside Buffer) Stores a map of translated virtual page addresses
Level 2 is Unified cache
Memory – DRAM
CPU and Register file reside in Level 1
Register file – Small amount of memory closest to CPU where data is manipulated
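The TLB entry in the list above can be sketched as a small lookup table consulted before the full page table. The 4096-byte page size, the dictionary-based tables, and the function name `translate` are assumptions made for this illustration; real hardware performs the lookup in parallel logic, not in software.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def translate(vaddr, tlb, page_table):
    """Translate a virtual address; return (physical address, whether the TLB hit)."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)   # virtual page number and byte offset
    if vpn in tlb:
        frame = tlb[vpn]                     # fast path: translation already cached
        hit = True
    else:
        frame = page_table[vpn]              # slow path: consult the full page table
        tlb[vpn] = frame                     # remember the translation for next time
        hit = False
    return frame * PAGE_SIZE + offset, hit


page_table = {0: 5, 1: 9}   # hypothetical mapping of virtual pages to physical frames
tlb = {}
first = translate(4100, tlb, page_table)
second = translate(4100, tlb, page_table)
print(first, second)
```

The first lookup misses the TLB and falls back to the page table; the repeated lookup is answered from the TLB.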
13. Thank You