
An Introduction to Cache Design


Presentation Transcript


  1. An Introduction to Cache Design \course\cpeg323-08F\Topic7a

  2. Cache: “A safe place for hiding and storing things.” – Webster's Dictionary

  3. Even with the inclusion of a cache, almost all CPUs are still strictly limited by the cache access time: “In most cases, if the cache access time were decreased, the machine would speed up accordingly.” – Alan Smith. Even more so for multiprocessors!

  4. While one can imagine reference patterns that can defeat existing cache memory designs, it is the author’s experience that cache memories improve performance for any program or workload which actually does useful computation.

  5. Optimizing the design of a cache memory generally has four aspects:
  • Maximizing the probability of finding a memory reference’s target in the cache (the hit ratio).
  • Minimizing the time to access information that is indeed in the cache (access time).
  • Minimizing the delay due to a miss.
  • Minimizing the overheads of updating main memory, maintaining cache coherence, etc.

  6. . = 4 ~ 20 . = 104 ~ 106 Key Factor in Design Decision for VM and Cache Access-timeMainMem Access-timeCache Access-timeSecondaryMem Access-timeMainMem Cache control is usually implemented in hardware!! \course\cpeg323-08F\Topic7a

  7. Technology in the 1990s: Technology in the 2000s?

  8. Technology in 2004: See P&H Fig. pg. 469, 3rd Ed. Technology in 2008?

  9. Technology in 2008: See P&H Fig. pg. 453, 4th Ed.

  10. Cache in the Memory Hierarchy: Processor ↔ Cache ↔ Main Memory ↔ Secondary Memory

  11. Emerging Memory Device Technologies. Source: “Emerging Nanoscale Memory and Logic Devices: A Critical Assessment,” Hutchby et al., IEEE Computer, May 2008

  12. Emerging Memory Device Technologies. Source: “Emerging Nanoscale Memory and Logic Devices: A Critical Assessment,” Hutchby et al., IEEE Computer, May 2008

  13. (figure slide)

  14. Source: Peter Kogge, ACS Productivity Workshop, 2008

  15. Four Questions for Classifying Memory Hierarchies: The fundamental principles that drive all memory hierarchies allow us to use terms that transcend the levels we are talking about. These same principles allow us to pose four questions about any level of the hierarchy:

  16. Four Questions for Classifying Memory Hierarchies
  Q1: Where can a block be placed in the upper level? (Block placement)
  Q2: How is a block found if it is in the upper level? (Block identification)
  Q3: Which block should be replaced on a miss? (Block replacement)
  Q4: What happens on a write? (Write strategy)

  17. These questions will help us understand the different tradeoffs demanded by the relationships of memories at different levels of a hierarchy.

  18. Concept of cache miss and cache hit
  TAGS    DATA (words 0–7)
  0117X   35, 72, 55, 30, 64, 23, 16, 14
  7620X   11, 31, 26, 22, 55, …
  3656X   71, 72, 44, 50, …
  1741X   33, 35, 07, 65, ...
  ADDRESS 01173 → tag 0117 matches a line, word offset 3 → DATA 30 (a hit).

  19. Access Time
  t_eff : effective cache access time
  t_cache : cache access time
  t_main : main memory access time
  h : hit ratio
  t_eff = h·t_cache + (1−h)·t_main

  20. Example. Let t_cache = 10 ns (1–4 clock cycles), t_main = 50 ns (8–32 clock cycles), h = 0.95.
  t_eff = 10 × 0.95 + 50 × 0.05 = 9.5 + 2.5 = 12 ns
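The formula and the example above can be sketched in Python (a minimal sketch; the function name `t_eff` simply follows the slides' notation):

```python
def t_eff(h, t_cache, t_main):
    """Effective access time: hits are served from the cache,
    misses fall through to main memory."""
    return h * t_cache + (1 - h) * t_main

# The slide's example: t_cache = 10 ns, t_main = 50 ns, h = 0.95
print(t_eff(0.95, 10, 50))  # ≈ 12 ns
```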

  21. Hit Ratio
  • Needs to be high enough (say > 90%) to obtain a desirable level of performance
  • Changes in h have an amplifying effect on performance
  • Never a constant, even for the same machine

  22. Sensitivity of Performance w.r.t. h (hit ratio)
  t_eff = h·t_cache + (1−h)·t_main
       = t_cache [ h + (1−h)·(t_main/t_cache) ]
       ≈ t_cache [ 1 + (1−h)·(t_main/t_cache) ]
  Since t_main/t_cache ≈ 10, the magnification factor of changes in h is about 10 times. Conclusion: very sensitive.

  23. Remember: h → 1. Example: let h = 0.90. If Δh = 0.05 (0.90 → 0.95), then (1 − h) = 0.05, and t_eff ≈ t_cache(1 + 0.05 × 10) = t_cache(1 + 0.5).
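The amplification can be checked numerically. A sketch using the slides' ratio t_main/t_cache = 10; the exact values sit slightly below the slides' h ≈ 1 approximation:

```python
def relative_teff(h, ratio):
    # t_eff / t_cache = h + (1 - h) * (t_main / t_cache)
    return h + (1 - h) * ratio

ratio = 10  # t_main / t_cache
print(relative_teff(0.90, ratio))  # ≈ 1.9  (approximation: 1 + 0.10*10 = 2.0)
print(relative_teff(0.95, ratio))  # ≈ 1.45 (approximation: 1 + 0.05*10 = 1.5)
```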

  24. Basic Terminology
  • Cache line (block): “A collection of contiguous data that are treated as a single entity of cache storage.” Typically 1 ~ 16 words — like the size of a room.
  • Cache directory: “The portion of a cache that holds the access keys that support associative access.” — like the keys to the rooms. The cache may use associativity to find the right directory entry by matching.

  25. Cache Organization
  • Fully associative: a block can be placed in any block frame.
  • Direct mapped: a block can be placed in only one block frame.
  • Set-associative: a block can be placed in any frame of one group (set) of block frames.

  26. An Example
  Memory size = 256K words × 4 B/word = 1 MB
  Cache size = 2K words = 8 KB
  Block size = 16 words/block = 64 bytes/block
  So main memory has 256K/16 = 16K blocks (16,384), and the cache has 2K/16 = 128 block frames.
  Address = 18 bits (word address: 256K words = 2^8 × 2^10) + 2 bits (byte in word) = 20 bits.
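The arithmetic of this running example can be sketched as:

```python
from math import log2

mem_words   = 256 * 1024   # 256K words x 4 B/word = 1 MB
cache_words = 2 * 1024     # 2K words = 8 KB
block_words = 16           # 64 bytes/block

mem_blocks   = mem_words   // block_words  # blocks in main memory
cache_frames = cache_words // block_words  # block frames in the cache
word_bits    = int(log2(mem_words))        # word-address width
byte_bits    = word_bits + 2               # +2 bits to select a byte in a word

print(mem_blocks, cache_frames, word_bits, byte_bits)  # 16384 128 18 20
```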

  27. Fully Associative Feature
  • Any block in memory can be placed in any block frame in the cache.
  • All entries (block frames) are compared simultaneously (by associative search).

  28. A Special Case. Simplest example: a block = a word, so the entire memory word address becomes the “tag”, e.g. address 027560. Very “flexible”: a word has a higher probability of residing in cache. Advantage: no thrashing (quick reorganizing). Disadvantage: the overhead of associative search, in both cost and time.
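A minimal sketch of this special case (one-word blocks), using a Python dict to stand in for the hardware's parallel tag comparison; `memory` and its contents are made-up values for illustration:

```python
cache = {}   # tag (the full word address) -> data word

def access(addr, memory):
    """Return (data, hit?). Any word can reside anywhere in the cache."""
    if addr in cache:              # associative search over all tags at once
        return cache[addr], True
    cache[addr] = memory[addr]     # miss: bring the word in from main memory
    return cache[addr], False

memory = {0o27560: 42}             # hypothetical main-memory contents
print(access(0o27560, memory))     # (42, False) -- cold miss
print(access(0o27560, memory))     # (42, True)  -- hit
```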

  29. Fully associative cache organization

  30. Direct Mapping
  • No associative match.
  • From the memory address, we index “directly” to the block frame in the cache where the block should be located. A comparison is then used to determine whether it is a miss or a hit.

  31. Direct Mapping Cont’d. Advantages: simplest; fast (less logic); low cost (only one comparator is needed, so the cache can be built in the form of standard memory). Disadvantage: “thrashing”.

  32. Example. Since the cache has only 128 block frames, the degree of multiplexing is 16384 blocks / 128 frames = 2^7 = 128 blocks per frame, i.e. 128 memory blocks “fall” into each block frame (or set of size 1). The low-order 7 bits of the block address index the corresponding frame, and the high-order 7 bits are used as the tag. Disadvantage: “thrashing”.

  33. Direct Mapping

  34. Direct Mapping Cont’d. Mapping (indexing): block address mod (# of blocks in the cache) — in this case, mod 2^7. Advantage: the low-order log2(# of cache blocks) bits of the block address can be used directly for indexing.
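For the running example (128 block frames), the index/tag split can be sketched as:

```python
CACHE_FRAMES = 128   # 2^7 block frames

def direct_map(block_addr):
    index = block_addr %  CACHE_FRAMES  # low-order 7 bits: the only frame allowed
    tag   = block_addr // CACHE_FRAMES  # high-order 7 bits: stored for comparison
    return index, tag

# Blocks 2, 130, 258, ... all contend for frame 2 ("thrashing")
print(direct_map(2), direct_map(130), direct_map(258))  # (2, 0) (2, 1) (2, 2)
```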

  35. Set-Associative
  • A compromise between direct-mapped and fully associative.
  • The cache is divided into S sets, S = 2, 4, 8, …
  • If the cache has M blocks then, all together, there are E = M/S blocks/set (the number of “buildings” available for indexing).
  In our example, S = 128/2 = 64 sets.

  36. 2-way set associative. The 6-bit set field indexes to the right set; then the 8-bit tag is used for an associative match.

  37. Associativity with 8-block cache

  38. For a 2-way set-associative organization: 2^14 (16K) blocks / 2^6 (64) sets = 2^8 blocks per set of 2 block frames, i.e. 2^8 memory blocks map to each set. Address fields: | 8-bit tag | 6-bit set | 4-bit word | 2-bit byte |. The 6 set bits index into the right set; the higher-order 8 bits are used as the tag, so an associative match of the 8-bit tag against the tags of the 2 blocks in the set is required.
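The address decomposition for this 2-way example can be sketched with shifts and masks (the example address is an arbitrary made-up value):

```python
def split_address(byte_addr):
    """Decompose a 20-bit byte address for the 2-way set-associative example:
       | 8-bit tag | 6-bit set | 4-bit word | 2-bit byte |"""
    byte_off =  byte_addr       & 0x3
    word_off = (byte_addr >> 2) & 0xF
    set_idx  = (byte_addr >> 6) & 0x3F
    tag      =  byte_addr >> 12
    return tag, set_idx, word_off, byte_off

addr = (171 << 12) | (51 << 6) | (7 << 2) | 2   # arbitrary example address
print(split_address(addr))  # (171, 51, 7, 2)
```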

  39. 0 6 7 13 14 17 7 7 4 Sector block word (tag) Sector Mapping Cache • Sector (IBM 360/85) - 16 sector x 16 block/sector • 1 sector = consecutive multiple blocks • Cache miss: sector replacement • Valid bit - one block is moved on demand • Example: A sector in memory can be in any sector in cache \course\cpeg323-08F\Topic7a

  40. Sector Mapping Cache

  41. Cont’d. The cache has 128 blocks / 16 blocks/sector = 8 sectors. Main memory has 16K blocks / 16 blocks/sector = 1K sectors. Sector mapping cache.

  42. Example See P&H Fig. 7.7 3rd Ed or 5.7 4th Ed

  43. Total # of Bits in a Cache
  Total # of bits = # of blocks × (# of tag bits + # of data bits per block + # of bits in the valid field)
  Example: a direct-mapped cache with 4 KB of data, 1-word blocks, and 32-bit addresses.
  4 KB = 1K words = 2^10 words = 2^10 blocks
  # of tag bits = 32 − (10 index + 0 word-offset + 2 byte-offset) = 20
  Total # of bits = 2^10 × (20 + 32×1 + 1) = 53 × 2^10 = 53 Kbits = 6.625 KB

  44. Another example: FastMATH, a fast embedded microprocessor that uses the MIPS architecture and a simple cache implementation. 16 KB of data, 16-word blocks, and 32-bit addresses.
  2^14 bytes × (1 word / 4 bytes) × (1 block / 16 words) = 2^14 / (2^2 × 2^4) = 2^8 blocks
  # of tag bits = 32 − (8 index + 4 word-offset + 2 byte-offset) = 18
  Total # of bits = 2^8 × (18 + 32×16 + 1) = 531 × 2^8 = 135,936 bits = 132.75 Kbits
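Both totals can be checked with a short helper (a sketch assuming a direct-mapped cache, power-of-two block counts, 32-bit words and addresses, and one valid bit per block):

```python
def cache_bits(n_blocks, words_per_block, addr_bits=32, valid_bits=1):
    index_bits  = n_blocks.bit_length() - 1               # log2(# of blocks)
    offset_bits = (words_per_block.bit_length() - 1) + 2  # word + byte offset
    tag_bits    = addr_bits - index_bits - offset_bits
    return n_blocks * (tag_bits + 32 * words_per_block + valid_bits)

print(cache_bits(2**10, 1))   # 54272  = 53 * 2^10 bits (the 4 KB example)
print(cache_bits(2**8, 16))   # 135936 = 531 * 2^8 bits (FastMATH)
```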

  45. Example FastMATH See P&H Fig. 7.9 3rd Ed or 5.9 4th Ed
