
Cache Design

Understand cache parameters (cache size, number of sets, lines per set), replacement policies, and performance evaluation methods. Explore set-associativity trade-offs and the important factors in cache design. Learn the cache design process and how to make good choices for cache size, associativity, and line size. Discover the impact of cache design on hit ratios and access times.


Presentation Transcript


  1. Cache Design • Cache parameters (organization and placement) • Cache replacement policy • Cache performance evaluation method

  2. Cache Parameters • Cache size: Scache (lines) • Set number: N (sets) • Line number per set: K (lines/set) • Scache = K * N (lines) = K * N * L (bytes), where L is the line size in bytes • Such a cache is K-way set-associative
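
To make the relationship concrete, here is a minimal sketch (not from the slides); N, K, and L are assumed example values:

```python
# Minimal sketch (not from the slides): how the cache parameters combine.
# N, K, and L below are assumed example values.
N = 256   # set number (sets)
K = 2     # lines per set -> K-way set-associative
L = 64    # line size in bytes

scache_lines = K * N          # Scache in lines
scache_bytes = K * N * L      # Scache in bytes

print(f"Scache = {scache_lines} lines = {scache_bytes} bytes "
      f"({scache_bytes // 1024} KiB)")
```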

  3. Trade-offs in Set-Associativity • Fully associative: higher hit ratio and concurrent search, but slow access when the associativity is large • Direct-mapped: fast access (on a hit), simple comparison, and a trivial replacement algorithm; however, if two blocks that map to the same cache block frame are used alternately, "thrashing" may occur
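
A minimal sketch of the thrashing case, assuming a direct-mapped cache with example parameters (not from the slides):

```python
# Minimal sketch (not from the slides): thrashing in a direct-mapped cache.
# N and L are assumed example values; K = 1 (direct-mapped).
N, L = 256, 64                         # sets, line size in bytes

def frame(addr):
    return (addr // L) % N             # cache block frame a byte address maps to

resident = {}                          # frame index -> block number currently cached
hits = misses = 0
a, b = 0x0000, N * L                   # two block addresses sharing the same frame
for addr in [a, b] * 8:                # use the two blocks alternately
    f, block = frame(addr), addr // L
    if resident.get(f) == block:
        hits += 1
    else:
        misses += 1
        resident[f] = block            # evict and replace the resident block
print(f"hits = {hits}, misses = {misses}")   # hits = 0: every access misses
```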

  4. Note • Main memory size: Smain (blocks); cache memory size: Scache (blocks) • Let P = Smain / Scache; since P >> 1, you need a search, and the average search length is much greater than 1 • Set-associativity provides a trade-off between concurrency in the search and the average search/access time per block

  5. Set number spectrum: N = 1 is fully associative, 1 < N < Scache is set-associative, and N = Scache is direct-mapped

  6. Important Factors in Cache Design • Address partitioning strategy (three dimensions of freedom) • Total cache size / memory size • Workload

  7. Address Partitioning • With byte addressing and an M-bit address, the address is partitioned into a tag (the directory size per entry) of M - log2 N - log2 L bits, a set-number field of log2 N bits, and an address-within-a-line field of log2 L bits • Data part of the cache memory size = N * K * L (bytes) • Choose the partition to reduce clustering (randomize accesses)
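
A minimal sketch of this partitioning, assuming example values for M, N, and L (not from the slides):

```python
# Minimal sketch (not from the slides): partitioning an M-bit byte address into
# tag | set number | address-within-a-line. M, N, and L are assumed example values.
M = 32        # address width in bits
N = 256       # set number      -> log2(N) = 8 set-index bits
L = 64        # line size bytes -> log2(L) = 6 offset bits

set_bits    = N.bit_length() - 1            # log2 N
offset_bits = L.bit_length() - 1            # log2 L
tag_bits    = M - set_bits - offset_bits    # directory (tag) size per entry

def split(addr):
    offset  = addr & (L - 1)                     # address within the line
    set_idx = (addr >> offset_bits) & (N - 1)    # which set to search
    tag     = addr >> (offset_bits + set_bits)   # compared against the directory
    return tag, set_idx, offset

print(f"tag = {tag_bits} bits;", split(0x1234ABCD))
```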

  8. General Curve Describing Cache Behavior: miss ratio versus cache size. Note: there exists a knee in the curve

  9. …the data are sketchy and highly dependent on the method of gathering... … designer must make critical choices using a combination of “hunches, skills, and experience” as supplement… “a strong intuitive feeling concerning a future event or result.”

  10. Basic Principle • Typical workload study + intelligent estimates of the rest • Good engineering: a small degree of over-design • The “30% rule”: each doubling of the cache size reduces misses by about 30% (A. Smith) • It is only a rough estimate (see the sketch below)
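
A small sketch of the 30% rule as a rough estimate; the starting size and miss count are assumed example values, not data from the slides:

```python
# Minimal sketch (not from the slides): the rough "30% rule": each doubling of
# the cache size reduces misses by about 30%. Starting values are assumed.
size_kb = 8
misses  = 1000.0                  # assumed miss count at the starting size

for _ in range(4):                # four doublings: 8 KB -> 128 KB
    size_kb *= 2
    misses  *= 0.7                # ~30% fewer misses per doubling
    print(f"{size_kb:4d} KB -> ~{misses:.0f} misses")
```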

  11. Cache Design Process • “Typical”, not “standard” • Sensitive to price/performance and technology: main memory access time, cache access time, chip density, bus speed, on-chip cache

  12. Cache Design Process • Start with a small K: pick K = 2 (more likely K = 1) • Choose the cache size: fix K and L, varying N • Choose the line size L: fix the cache size and K, varying L • Choose the associativity K: fix the cache size and L, varying K • If the chosen K equals the old K, stop; otherwise use the new K and repeat

  13. Step 1: Choose the cache size (fix K and L, varying N). Plot: relative number of misses versus cache size (N)

  14. Step 2: Choose the line size L (fix the cache size N * K * L and K, varying L). Plot: relative number of misses versus cache line size (L)

  15. Step 3: Choose the associativity K (fix the cache size and L, varying K). Plot: relative number of misses versus cache associativity factor (K)

  16. N: set number • Number of cache directory entries = N * K • Cache size = N * K * L • Constraint in the selection of N: the page size (N * L is typically kept no larger than a page, so the set index falls within the page offset)

  17. K: Associativity • A bigger K gives a lower miss ratio • A smaller K is better in being faster, cheaper, and simpler • K = 4 ~ 8 gets close to the best miss ratio

  18. L: Line Size • The atomic unit of transmission • Miss ratio: smaller • Average delay: larger • Traffic: less • Average hardware cost for associative search: larger • Possibility of “line crossers”: larger • Workload dependent • Typical range: 16 ~ 128 bytes

  19. Cache Replacement Policy • FIFO (first-in, first-out) • LRU (least recently used) • OPT (furthest future use): do not retain the line whose next occurrence is in the most distant future • Note: LRU performance is close to OPT for frequently encountered program structures

  20. Program Structure: for i = 1 to n, for j = 1 to n, ..., endfor, endfor. The last-in-first-out feature of such nested loops makes the recent past look like the near future.

  21. [Figure] An eight-way cache directory maintained with the OPT policy, with entries ordered from nearest future access to furthest future access: [a] initial state for the future reference string AZBZCADEFGH; [b] after the cache hit on line A; and [c] after the cache miss to line Z

  22. Why are LRU and OPT Close to Each Other? • LRU looks only at the past; OPT looks only at the future • But the recent past approximates the nearest future. Why? (Consider nested loops)

  23. Problem with LRU • Not good at mimicking sequential/cyclic access patterns • Example: A B C D E F, A B C ..., A B C ... with a set size of 3

  24. [Figure] Sequential access: the contents of a three-line set under OPT versus LRU for the cyclic reference stream A B C D E F G, A B C ... G, A B C ... G, ...
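
A small sketch, assuming a single 3-line set and the cyclic stream of slides 23-24 (the trace length and the Python policies below are illustrative, not the course's own code):

```python
# Minimal sketch (not from the slides): hit counts for LRU vs. OPT on the cyclic
# reference stream of slides 23-24, served by a single set of 3 lines.
from collections import OrderedDict

trace = list("ABCDEFG") * 4          # A B C D E F G repeated (assumed 4 cycles)
WAYS  = 3

def lru_hits(trace, ways=WAYS):
    cache, hits = OrderedDict(), 0
    for ref in trace:
        if ref in cache:
            hits += 1
            cache.move_to_end(ref)            # mark as most recently used
        else:
            if len(cache) == ways:
                cache.popitem(last=False)     # evict the least recently used
            cache[ref] = None
    return hits

def opt_hits(trace, ways=WAYS):
    cache, hits = set(), 0
    for i, ref in enumerate(trace):
        if ref in cache:
            hits += 1
            continue
        if len(cache) == ways:
            rest = trace[i + 1:]
            # evict the line whose next use is furthest in the future
            def next_use(line):
                return rest.index(line) if line in rest else float("inf")
            cache.discard(max(cache, key=next_use))
        cache.add(ref)
    return hits

print("LRU hits:", lru_hits(trace), " OPT hits:", opt_hits(trace))
```

For this stream LRU never hits, since each line is evicted before it is reused, while OPT retains a few lines across each cycle; that is the behavior the two slides illustrate.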

  25. Empirical Data: OPT can gain about a 10% ~ 30% improvement over LRU (in terms of miss reduction)

  26. A Comparison • OPT has two candidates for replacement: the most recently referenced line and the line to be referenced furthest in the future • LRU has only one candidate, the least recently used line; it never replaces the most recently referenced line, even if that line is a dead line under LRU

  27. Performance Evaluation Methods for Workload • Analytical modeling • Simulation • Measurement

  28. Cache Analysis Methods • Hardware monitoring: fast and accurate, but not fast enough for high-performance machines; issues of cost and flexibility/repeatability

  29. Cache Analysis Methods (cont’d) • Address traces plus a machine simulator: slow, with accuracy/fidelity concerns, but a cost advantage and flexibility/repeatability • OS and other impacts: how to put them in?

  30. Trace-Driven Simulation for Caches • Workload dependence: difficulty in characterizing the load, and no generally accepted model • Effectiveness: many parameter combinations can be simulated, with repeatability

  31. Problems with Address Traces • Being representative of the actual workload is hard: traces only cover milliseconds of the real workload, and user programs are diverse • Initialization transient: use traces long enough to absorb its impact • Inability to properly model multiprocessor effects

  32. An Example • Assume a two-way set-associative cache with 256 sets: Scache = 2 x 256 = 512 lines • Assume that the difficulty of deciding whether or not to count the initialization causes 512 more misses than actually required • With a trace of length 100,000 and a hit rate of 0.99, only 1,000 misses are generated, so the extra 512 makes a big difference! • If we want the 512 misses to be less than 5% of the miss count, then total misses = 512 / 5% = 10,240; thus, with a hit rate of 0.99, the required trace length is > 1,024,000!
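
The slide's arithmetic restated as a short sketch (the hit rate and the 5% threshold are the slide's own numbers):

```python
# Minimal sketch restating slide 32's arithmetic: how long a trace must be so
# that 512 cold-start misses stay under 5% of all misses.
cold_start_misses = 2 * 256            # one potential extra miss per cache line
max_fraction      = 0.05               # keep the extra misses under 5% of the total
hit_rate          = 0.99               # assumed hit rate, i.e. miss rate = 0.01

total_misses = cold_start_misses / max_fraction     # 10,240 misses needed
trace_length = total_misses / (1 - hit_rate)        # > 1,024,000 references

print(f"total misses >= {total_misses:.0f} -> trace length >= {trace_length:,.0f}")
```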

  33. One may not know the cache parameters beforehand. What to do? Make the trace longer than the minimum acceptable length!

  34. 100,000 references? Too small • Is (10 ~ 100) x 10^6 OK? • Traces of 1000 x 10^6 references or more are being used now
