
Cache Parameters






Presentation Transcript


1. Cache Parameters
• Cache size: Scache (lines)
• Set number: N (sets)
• Lines per set: K (lines/set)
Scache = K × N (lines) = K × N × L (bytes), where L is the line size in bytes. Such a cache is K-way set-associative.
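To make the size relation concrete, here is a minimal sketch; the parameter values K = 4, N = 256, L = 64 are illustrative assumptions, not figures from the slides.

    # Sketch of Scache = K * N (lines) = K * N * L (bytes)
    K = 4        # associativity: lines per set (4-way set-associative)
    N = 256      # number of sets
    L = 64       # line size in bytes

    lines = K * N                 # Scache in lines
    size_bytes = K * N * L        # Scache in bytes
    print(lines, size_bytes)      # 1024 lines, 65536 bytes (64 KB)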

2. Trade-offs in Set-Associativity
Fully associative:
- Higher hit ratio and concurrent search, but slow access when the associativity is large.
Direct mapped:
- Fast access (on a hit) and simple comparison.
- Trivial replacement algorithm.
- Problem with hit ratio, e.g. in the extreme case where two blocks that map to the same cache block frame are used alternately: thrashing may occur.

3. Note
Main memory size: Smain (blocks)
Cache memory size: Scache (blocks)
Let P = Smain / Scache. Since P >> 1, the average search length is much greater than 1: you need to search!
Set-associativity provides a trade-off between:
• Concurrency in search.
• Average search/access time per block.

4. Number of Sets
1 ≤ N ≤ Scache: N = 1 corresponds to a fully associative cache, 1 < N < Scache to a set-associative cache, and N = Scache to a direct-mapped cache.

5. Important Factors in Cache Design
• Address partitioning strategy (three dimensions of freedom).
• Total cache size / memory size.
• Workload.

6. Address Partitioning
An M-bit byte address is partitioned into three fields: a directory (tag) entry, a set number (log2 N bits), and a byte address within a line (log2 L bits).
• Byte addressing mode.
• Cache memory size (data part) = NKL (bytes).
• Directory size (per entry) = M − log2 N − log2 L bits.
• Reduce clustering (randomize accesses).
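As an illustration of this partitioning, the sketch below splits an address into tag, set number, and byte-within-line fields. The function name split_address and the example values (N = 256, L = 64, address 0x12345) are hypothetical choices, assuming power-of-two N and L.

    # Low log2(L) bits: byte address within a line; next log2(N) bits: set
    # number; remaining high bits: the directory (tag) entry.
    def split_address(addr, N=256, L=64):
        offset_bits = L.bit_length() - 1          # log2 L
        set_bits = N.bit_length() - 1             # log2 N
        byte_in_line = addr & (L - 1)
        set_number = (addr >> offset_bits) & (N - 1)
        tag = addr >> (offset_bits + set_bits)    # M - log2 N - log2 L bits wide
        return tag, set_number, byte_in_line

    print(split_address(0x12345))                 # (4, 141, 5)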

7. General Curve Describing Cache Behavior
[Figure: miss ratio versus cache size. Note that there exists a knee in the curve.]

8. "…the data are sketchy and highly dependent on the method of gathering…" "…the designer must make critical choices using a combination of 'hunches, skills, and experience' as a supplement…" "a strong intuitive feeling concerning a future event or result."

9. Basic Principle
• Typical workload study + intelligent estimate of the rest.
• Good engineering: a small degree of over-design.
• "30% rule": each doubling of the cache size reduces misses by about 30% (Alan J. Smith, "Cache Memories," ACM Computing Surveys, Vol. 14, No. 3, Sep. 1982). It is a rough estimate only; a numeric sketch follows below.
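As a rough illustration of the 30% rule, the sketch below scales a miss rate by 0.7 per doubling of cache size; the starting miss rate of 0.10 is an assumed value, made up for the example.

    # Each doubling of cache size cuts misses by ~30%, i.e. multiply by ~0.7.
    base_miss_rate = 0.10
    for doublings in range(5):
        size_factor = 2 ** doublings
        est = base_miss_rate * 0.7 ** doublings
        print(size_factor, round(est, 3))   # 1x 0.1, 2x 0.07, 4x 0.049, 8x 0.034, 16x 0.024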

10. K: Associativity
• Bigger K → lower miss ratio.
• Smaller K is better in being: faster, cheaper, simpler.
• K = 4 ~ 8 gets close to the best miss ratio.

11. L: Line Size
• The line is the atomic unit of transmission.
• With a larger L:
• Smaller miss ratio.
• Larger average delay.
• Less traffic.
• Larger average hardware cost for associative search.
• Larger possibility of "line crossers": memory references spanning the boundary between two cache lines (see the sketch below).
• Workload dependent; typically 16 ~ 128 bytes.
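A small sketch of the line-crosser condition: a reference of size bytes starting at addr spans two lines exactly when it runs past the boundary of the line containing its first byte. The function name crosses_line, the line size, and the example addresses are illustrative assumptions.

    def crosses_line(addr, size, L=64):
        # The reference spills into the next line iff its last byte lies
        # beyond the end of the line containing its first byte.
        return (addr % L) + size > L

    print(crosses_line(0x1000, 4, L=64))   # False: fits in one 64-byte line
    print(crosses_line(0x103E, 4, L=64))   # True: bytes 0x103E..0x1041 span two lines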

12. Cache Replacement Policy
• FIFO (first-in, first-out): replace the block loaded furthest in the past.
• LRU (least recently used): replace the block used furthest in the past.
• OPT (furthest future use): replace the block that will be used furthest in the future, i.e. do not retain lines whose next occurrence is in the most distant future.
Note: LRU performance is close to OPT for frequently encountered program structures.
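Below is a minimal sketch of the three policies acting on a single fully-associative set of K lines, driven by a list of block addresses. The helper names are invented for illustration; the trace reuses the 0, 8, 0, 6, 8 sequence from the following slides.

    def fifo_misses(refs, K):
        q, misses = [], 0
        for b in refs:
            if b not in q:
                misses += 1
                if len(q) == K:
                    q.pop(0)              # evict the block loaded furthest in the past
                q.append(b)
        return misses

    def lru_misses(refs, K):
        order, misses = [], 0
        for b in refs:
            if b in order:
                order.remove(b)           # hit: re-appended below as most recently used
            else:
                misses += 1
                if len(order) == K:
                    order.pop(0)          # evict the block used furthest in the past
            order.append(b)
        return misses

    def opt_misses(refs, K):
        cache, misses = set(), 0
        for i, b in enumerate(refs):
            if b not in cache:
                misses += 1
                if len(cache) == K:
                    # evict the resident block whose next use is furthest in the future
                    def next_use(x):
                        later = [j for j in range(i + 1, len(refs)) if refs[j] == x]
                        return later[0] if later else float("inf")
                    cache.remove(max(cache, key=next_use))
                cache.add(b)
        return misses

    trace = [0, 8, 0, 6, 8]
    print(fifo_misses(trace, 2), lru_misses(trace, 2), opt_misses(trace, 2))   # 4 4 3

On this short trace FIFO and LRU happen to tie, while OPT, which looks at the future, does better; this is consistent with the note that LRU only approximates OPT.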

13. Example: Misses and Associativity
Small cache with four one-word blocks; block-address sequence 0, 8, 0, 6, 8.
• Direct-mapped cache: 5 misses for the 5 accesses.

14. Example: Misses and Associativity (cont'd)
Small cache with four one-word blocks; block-address sequence 0, 8, 0, 6, 8.
• Two-way set-associative cache with LRU replacement: 4 misses for the 5 accesses.

15. Example: Misses and Associativity (cont'd)
Small cache with four one-word blocks; block-address sequence 0, 8, 0, 6, 8.
• Fully associative cache: any memory block can be stored in any cache block. 3 misses for the 5 accesses.
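The three configurations above can be replayed with a small simulation sketch: a 4-block cache with associativity K = 1 (direct mapped), 2 (two-way), and 4 (fully associative), assuming LRU replacement within each set as slide 14 specifies. The helper name count_misses is an invented illustration, not code from the course.

    def count_misses(refs, num_blocks, K):
        num_sets = num_blocks // K
        sets = [[] for _ in range(num_sets)]     # each set kept in LRU order
        misses = 0
        for block in refs:
            s = sets[block % num_sets]           # set index = block address mod number of sets
            if block in s:
                s.remove(block)                  # hit: refresh recency
            else:
                misses += 1
                if len(s) == K:
                    s.pop(0)                     # evict the least-recently-used block
            s.append(block)
        return misses

    refs = [0, 8, 0, 6, 8]
    for K in (1, 2, 4):
        print(K, count_misses(refs, 4, K))       # K=1: 5 misses, K=2: 4, K=4: 3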

16. Program Structure
for i = 1 to n
  for j = 1 to n
  endfor
endfor
The last-in-first-out feature of nested loops makes the recent past look like the near future.

17. Problem with LRU
• Not good at mimicking sequential/cyclic access patterns.
Example: A B C D E F A B C … A B C …
Exercise: with a set size of 3, what is the miss ratio, assuming all 6 addresses map to the same set? (A sketch for checking the answer follows below.)
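A small sketch one could use to check the exercise: replay the cyclic trace against one 3-line set under LRU and compute the miss ratio. The function name and the way the trace is built are illustrative choices.

    def lru_miss_ratio(refs, K):
        order, misses = [], 0
        for b in refs:
            if b in order:
                order.remove(b)
            else:
                misses += 1
                if len(order) == K:
                    order.pop(0)                 # evict the least-recently-used block
            order.append(b)
        return misses / len(refs)

    trace = list("ABCDEF") * 4                   # the cyclic pattern, repeated
    print(lru_miss_ratio(trace, K=3))            # LRU never gets a hit on this pattern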

18. Performance Evaluation Methods for Workload
• Analytical modeling.
• Simulation.
• Measurement.

19. Cache Analysis Methods
• Hardware monitoring:
• Fast and accurate.
• Not fast enough for high-performance machines.
• Cost.
• Flexibility/repeatability.

20. Cache Analysis Methods (cont'd)
• Address traces and machine simulator:
• Slow.
• Accuracy/fidelity.
• Cost advantage.
• Flexibility/repeatability.
• OS/other impacts: how to include them?

21. Trace-Driven Simulation for Cache
• Workload dependence:
• Difficulty in characterizing the load.
• No generally accepted model.
• Effectiveness:
• Simulation possible for many parameters.
• Repeatability.

22. Problems with Address Traces
• Representativeness of the actual workload (hard):
• Traces only cover a small fraction of the real workload.
• Diversity of user programs.
• Initialization transient:
• Use long enough traces to absorb the impact of cold misses.
• Inability to properly model multiprocessor effects.
