Cache Parameters
• Cache size: S_cache (lines)
• Number of sets: N (sets)
• Lines per set: K (lines/set)
S_cache = K N (lines) = K N L (bytes), where L is the line size in bytes.
Such a cache is K-way set-associative.
\course\cpeg324-08F\Topic7c
Trade-offs in Set-Associativity
Fully associative:
- Higher hit ratio and concurrent search, but slow access when associativity is large.
Direct mapped:
- Fast access (on a hit) and simple comparison.
- Trivial replacement algorithm.
- Problem with hit ratio: in the extreme case, if a program alternately uses 2 blocks that map to the same cache block frame, “thrashing” may happen.
Note
Main memory size: S_main (blocks)
Cache memory size: S_cache (blocks)
Let P = S_main / S_cache. Since P >> 1, the average search length is much greater than 1.
• Set-associativity provides a trade-off between:
  - Concurrency in search.
  - Average search/access time per block.
You need search!
Number of sets: 1 <= N <= S_cache
• N = 1: fully associative
• 1 < N < S_cache: set associative
• N = S_cache: direct mapped
Important Factors in Cache Design
• Address partitioning strategy (3 degrees of freedom)
• Total cache size / memory size
• Workload
Address Partitioning
An M-bit address (byte addressing mode) is split into three fields:
  | directory entry (tag): M - log2 N - log2 L bits | set number: log2 N bits | byte address in a line: log2 L bits |
• Cache memory size (data part) = N K L (bytes), where K is the set size
• Directory size (per entry): M - log2 N - log2 L bits
• Reduce clustering (randomize accesses)
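The address split described above can be sketched in a few lines of Python. This is a minimal illustration, assuming N and L are powers of two; the function names `split_address`, `tag_bits`, and `cache_data_bytes` and the default M = 32 are hypothetical, not taken from the slides.

```python
def tag_bits(addr_bits, num_sets, line_size):
    """Directory (tag) width per entry: M - log2(N) - log2(L)."""
    index_bits = num_sets.bit_length() - 1    # log2(N), N a power of two
    offset_bits = line_size.bit_length() - 1  # log2(L), L a power of two
    return addr_bits - index_bits - offset_bits


def split_address(addr, num_sets, line_size):
    """Split a byte address into (tag, set_number, byte_in_line)."""
    index_bits = num_sets.bit_length() - 1
    offset_bits = line_size.bit_length() - 1
    byte_in_line = addr & (line_size - 1)                # low log2(L) bits
    set_number = (addr >> offset_bits) & (num_sets - 1)  # next log2(N) bits
    tag = addr >> (offset_bits + index_bits)             # remaining high bits
    return tag, set_number, byte_in_line


def cache_data_bytes(num_sets, ways, line_size):
    """Data capacity of the cache: N * K * L bytes."""
    return num_sets * ways * line_size


if __name__ == "__main__":
    # Assumed example: M = 32, N = 64 sets, K = 4 ways, L = 32 bytes.
    print(split_address(0x12345678, 64, 32))
    print(tag_bits(32, 64, 32), cache_data_bytes(64, 4, 32))
```

Taking the set number from the bits just above the byte offset spreads consecutive lines over different sets, which is one simple way to reduce clustering.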
General Curve Describing Cache Behavior
[Figure: miss ratio (vertical axis, 0.1 to 1.0) versus cache size (horizontal axis).]
Note: There exists a knee in the curve.
… the data are sketchy and highly dependent on the method of gathering ...
… designer must make critical choices using a combination of “hunches, skills, and experience” as supplement …
Hunch: “a strong intuitive feeling concerning a future event or result.”
Basic Principle
• Typical workload study + intelligent estimate of others
• Good engineering: a small degree of over-design
• “30% rule”:
  - Each doubling of the cache size reduces misses by 30% (Alan J. Smith, “Cache Memories,” Computing Surveys, Vol. 14, No. 3, Sep. 1982).
  - It is a rough estimate only.
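As a rough arithmetic illustration of the 30% rule, the sketch below scales a known miss ratio by 0.7 for each doubling of cache size. The function name `estimated_miss_ratio` and the sample numbers are hypothetical, and the result is only as reliable as the rule of thumb itself.

```python
import math


def estimated_miss_ratio(base_miss_ratio, base_size, new_size, reduction=0.30):
    """Estimate the miss ratio at new_size from a measurement at base_size.

    Applies the rule of thumb that each doubling of cache size removes
    about 30% of the misses. Sizes may be in any unit, as long as both
    use the same one.
    """
    doublings = math.log2(new_size / base_size)
    return base_miss_ratio * (1.0 - reduction) ** doublings


if __name__ == "__main__":
    # Assumed example: 10% miss ratio at size 8, estimate at size 32
    # (two doublings, so 0.10 * 0.7 * 0.7).
    print(estimated_miss_ratio(0.10, 8, 32))
```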
K: Associativity
• Bigger K: lower miss ratio
• Smaller K is better in being:
  - Faster
  - Cheaper
  - Simpler
• K = 4 ~ 8 gets the best miss ratio
L: Line Size
• Atomic unit of transmission
• Affects the miss ratio
• Smaller L:
  - Larger average delay
  - Less traffic
  - Larger average hardware cost for associative search
  - Larger possibility of “line crossers” (memory references spanning the boundary between two cache lines)
• Workload dependent
• Typically 16 ~ 128 bytes
Cache Replacement Policy
• FIFO (first-in-first-out): replace the block loaded furthest in the past.
• LRU (least-recently used): replace the block used furthest in the past.
• OPT (furthest-future used): replace the block that will be used furthest in the future, i.e. do not retain lines whose next occurrence is in the most distant future.
Note: LRU performance is close to OPT for frequently encountered program structures.
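The three policies can be compared on a small trace with a sketch like the one below. It models a single fully associative set; the function name `count_misses` and the sample trace are hypothetical, and OPT is computed by looking ahead in the trace, which is only possible offline.

```python
def count_misses(trace, capacity, policy):
    """Count misses for one fully associative set holding `capacity` blocks.

    policy is "FIFO", "LRU", or "OPT".
    """
    # Oldest entry first: load order for FIFO and OPT, use order for LRU.
    resident = []
    misses = 0
    for t, block in enumerate(trace):
        if block in resident:
            if policy == "LRU":
                # A hit makes the block the most recently used one.
                resident.remove(block)
                resident.append(block)
            continue
        misses += 1
        if len(resident) == capacity:
            if policy == "OPT":
                future = trace[t + 1:]

                def next_use(b):
                    # Blocks never used again are the best victims.
                    return future.index(b) if b in future else float("inf")

                victim = max(resident, key=next_use)
            else:
                victim = resident[0]  # FIFO: oldest load; LRU: oldest use
            resident.remove(victim)
        resident.append(block)
    return misses


if __name__ == "__main__":
    trace = list("ABCADABC")  # assumed sample trace
    for policy in ("FIFO", "LRU", "OPT"):
        print(policy, count_misses(trace, 3, policy))
```

On this sample trace OPT misses least and FIFO most, though the ordering between FIFO and LRU depends on the trace.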
Example: Misses and Associativity
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, 8.
• Direct-mapped cache.
[Figure: cache contents after each access. Blue text: data used at time t. Black text: data used at time t-1.]
Result: 5 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, 8.
• Two-way set-associative cache, LRU replacement policy.
[Figure: cache contents after each access. Blue text: data used at time t. Black text: data used at time t-1.]
Result: 4 misses for the 5 accesses.
Example: Misses and Associativity (cont’d)
Small cache with four one-word blocks. Access sequence: 0, 8, 0, 6, 8.
• Fully associative cache.
• Any memory block can be stored in any cache block.
[Figure: cache contents after each access. Blue text: data used at time t. Black text: data used at time t-1. Red text: data used at time t-2.]
Result: 3 misses for the 5 accesses.
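The miss counts in the three examples can be reproduced with a small simulation. This is a sketch under stated assumptions: the set number is the block address modulo the number of sets, replacement is LRU in all three cases (it has no effect for direct mapping), and the function name `simulate` is hypothetical.

```python
def simulate(trace, num_blocks, ways):
    """Count misses for a cache of `num_blocks` one-word blocks.

    ways = 1 is direct mapped; ways = num_blocks is fully associative.
    Each set is a list ordered from least to most recently used.
    """
    num_sets = num_blocks // ways
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for block in trace:
        lines = sets[block % num_sets]  # assumed mapping: address mod sets
        if block in lines:
            lines.remove(block)         # hit: re-appended below as newest
        else:
            misses += 1
            if len(lines) == ways:
                lines.pop(0)            # evict the least recently used
        lines.append(block)
    return misses


if __name__ == "__main__":
    trace = [0, 8, 0, 6, 8]
    for ways in (1, 2, 4):
        print(f"{ways}-way: {simulate(trace, 4, ways)} misses")
```

Blocks 0 and 8 collide in the direct-mapped cache and evict each other on every access, which is why raising the associativity removes misses here.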
Program Structure
for i = 1 to n
    for j = 1 to n
    endfor
endfor
Last-in-first-out feature makes the recent past look like the near future ….
Problem with LRU
• Not good at handling sequential/cyclic access patterns
Example: ABCDEF ABC…… ABC……
Exercise: With a set size of 3, what is the miss ratio, assuming all 6 addresses map to the same set?
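One way to check an answer to the exercise experimentally is to simulate LRU on a cyclic trace. The sketch below assumes the pattern is exactly ABCDEF repeated, which is an assumption since the slide elides the rest of the sequence; the function name `lru_miss_ratio` and the repeat count are hypothetical.

```python
def lru_miss_ratio(trace, set_size):
    """Miss ratio of one LRU-managed set holding `set_size` blocks."""
    resident = []  # ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in resident:
            resident.remove(block)
        else:
            misses += 1
            if len(resident) == set_size:
                resident.pop(0)  # evict the least recently used block
        resident.append(block)
    return misses / len(trace)


if __name__ == "__main__":
    cyclic = list("ABCDEF") * 10  # assumed: the cycle repeats 10 times
    for set_size in (3, 6):
        print(set_size, lru_miss_ratio(cyclic, set_size))
```

With a cycle longer than the set, LRU always evicts the block that will be needed soonest, so a block is never resident when its turn comes around again.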
Performance Evaluation Methods for Workload
• Analytical modeling
• Simulation
• Measurement
Cache Analysis Methods
• Hardware monitoring:
  - Fast and accurate.
  - Not fast enough (for high-performance machines).
  - Cost.
  - Flexibility/repeatability.
Cache Analysis Methods (cont’d)
• Address traces and machine simulator:
  - Slow.
  - Accuracy/fidelity.
  - Cost advantage.
  - Flexibility/repeatability.
  - OS/other impacts: how to put them in?
Trace-Driven Simulation for Cache
• Workload dependence:
  - Difficulty in characterizing the load.
  - No generally accepted model.
• Effectiveness:
  - Possible simulation for many parameters.
  - Repeatability.
Problems with Address Traces
• Being representative of the actual workload (hard)
  - Traces cover only a small fraction of the real workload.
  - Diversity of user programs.
• Initialization transient
  - Use long enough traces to absorb the impact of cold misses.
• Inability to properly model multiprocessor effects
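The effect of the initialization transient can be seen in a toy trace-driven simulation. The sketch below does not lengthen the trace as the slide suggests; it uses a related technique, leaving an initial warm-up window out of the statistics, to show how much cold misses inflate the miss ratio of a short trace. The function name `miss_ratio_after_warmup`, the fully associative LRU model, and the sample trace are all hypothetical.

```python
def miss_ratio_after_warmup(trace, capacity, warmup):
    """Miss ratio of a fully associative LRU cache, ignoring warm-up.

    The first `warmup` references still update the cache contents,
    but they are not counted in the reported miss ratio.
    """
    resident = []  # ordered from least to most recently used
    misses = 0
    counted = 0
    for i, block in enumerate(trace):
        hit = block in resident
        if hit:
            resident.remove(block)
        elif len(resident) == capacity:
            resident.pop(0)  # evict the least recently used block
        resident.append(block)
        if i >= warmup:
            counted += 1
            misses += not hit  # a bool counts as 0 or 1
    return misses / counted if counted else 0.0


if __name__ == "__main__":
    trace = [0, 1, 2, 3] * 5  # assumed toy trace: working set of 4 blocks
    print(miss_ratio_after_warmup(trace, 4, warmup=0))  # includes cold misses
    print(miss_ratio_after_warmup(trace, 4, warmup=4))  # steady state only
```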